Data Analytics for Business - Steps


I. Ozkan, PhD

Professor
MIS
Cankaya University

iozkan@cankaya.edu.tr

Spring 2025

Data Analysis

The Data Science Process

Data Collection/Usage: Principles

Principles of Data Usage in Analytics

Data File Organization

Steps of Data Analysis/Mining

Data Mining (When)

Lots of Keywords

Learning:

Task and Data

Why Business Deals with Data

Learning From Data

Supervised Learning       | Unsupervised Learning | Reinforcement Learning
{Y;X} available           | {X} available         | Ex: Game
\(E[Y \mid X]\)           | Pattern inside data   |
\(P(Y=y \mid X=x)\)       | Homogeneous Groups    |
Ex: Regression            | Ex: Clustering        |
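
To make the first two columns concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the simulated data, the variable names, and the use of a bare-bones 2-means loop as the clustering example. The supervised part estimates \(E[Y \mid X=x]\) by averaging \(Y\) within each value of \(X\); the unsupervised part sees only one variable and looks for two homogeneous groups.

```python
import numpy as np

rng = np.random.default_rng(2)

# Supervised: {Y; X} available. Estimate E[Y | X = x] for a discrete X
# by averaging Y within each observed value of X.
x = rng.integers(0, 3, size=3000)            # X takes the values 0, 1, 2
y = 10.0 * x + rng.normal(size=3000)         # hypothetical relationship plus noise
cond_mean = {int(v): float(y[x == v].mean()) for v in np.unique(x)}

# Unsupervised: only {X} available. Look for homogeneous groups
# with a simple 2-means loop on one-dimensional data.
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(8.0, 1.0, 200)])
centers = np.array([data.min(), data.max()])  # simple deterministic start
for _ in range(20):
    # Assign each point to its nearest center, then recompute the centers.
    labels = np.abs(data[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([data[labels == j].mean() for j in range(2)])

print("estimated E[Y | X = x]:", cond_mean)
print("cluster centers:", centers)
```

In the supervised case the target \(Y\) guides the estimate; in the unsupervised case the groups come only from the structure of the data itself.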

Data Rich Environment: [Very] High Dimensionality

\(Data=Pattern(s)+Error(s)\)

Example: Standard Regression

\(y=\beta_0+\beta_1 x_1+\beta_2 x_2+ \cdots + \beta_k x_k + \varepsilon\)

for some \(k \gg 2\)

This is equivalent to

\(y=\underbrace{\beta_0+\beta_1 x_1+\beta_2 x_2+ \cdots + \beta_k x_k}_\text{Pattern}+\underbrace{\varepsilon}_\text{Error}\)

Or put in another form:

\(\mu(x)=E[Y \mid X=x]=\hat \beta_0+\hat \beta_1 x_1+\hat \beta_2 x_2+ \cdots +\hat \beta_k x_k\)

given \(E[\varepsilon]=0\), where \(\hat \beta_i\) are the estimated coefficients.

How to find the parameters \(\hat \beta_i\): choose the values that minimize the mean squared error (MSE):

\(MSE=\frac{1}{N+1} \sum_{i=0}^{N} (y_i-\mu(x_i))^2=\frac{1}{N+1} \sum_{i=0}^{N} \varepsilon_i^2\)
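
The regression setup above can be sketched numerically. This is a minimal illustration in Python with NumPy, not a prescribed method: the simulated data, the "true" coefficients, the noise scale, and all variable names are assumptions made for the example. It generates data as pattern plus error, finds the \(\hat \beta_i\) by least squares (which minimizes the MSE), and then computes the MSE from the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 observations, k = 3 regressors, known "true" coefficients.
n_obs, k = 500, 3
true_beta = np.array([1.0, 2.0, -0.5, 0.3])    # [beta_0, beta_1, beta_2, beta_3]
X = rng.normal(size=(n_obs, k))
eps = rng.normal(scale=0.5, size=n_obs)         # error term with E[eps] = 0
design = np.column_stack([np.ones(n_obs), X])   # column of ones for the intercept
y = design @ true_beta + eps                    # Data = Pattern + Error

# Least squares: the beta_hat that minimizes the mean squared error.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

mu_hat = design @ beta_hat                      # fitted pattern, mu(x_i)
residuals = y - mu_hat                          # estimated errors
mse = float(np.mean(residuals ** 2))

print("beta_hat:", np.round(beta_hat, 3))
print("MSE:", round(mse, 4))
```

With this setup the estimated coefficients should land close to the assumed true values, and the MSE should be close to the variance of the simulated error term.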

In Business

Means:

Correlation vs Causation must be discussed

Error structure is important

Behavioral assessment of the model is crucial

Fundamental Table

Data          | Causal   | Predictive
Observational | Good/Bad | Good/Bad
Experimental  | Good/Bad | Good/Bad

Let's consider two variables, \(X\) and \(Y\), and a causal structure in which \(X\) causes \(Y\). The full set of alternatives is:

Causality (Will be back to this topic later)

It is possible then,

\(X \implies Y\)

\(Y\) does not cause \(X\): the sample is split by chance, so chance causes \(X\)

\(Z\) causing both is possible, but only by chance

It could still be by chance

It could be due to selection, but the experimenter should rule this out
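
The points above can be illustrated with a small simulation. This is a sketch under stated assumptions, written in Python with NumPy: the causal effect of 2.0, the strength of the common cause \(Z\), the sample size, and the helper `slope` are all hypothetical choices made for the example. In the observational data \(Z\) drives both \(X\) and \(Y\); in the experimental data \(X\) is assigned by a coin flip, so \(Z\) cannot cause \(X\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
true_effect = 2.0   # hypothetical causal effect of X on Y


def slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))


z = rng.normal(size=n)   # common cause, unobserved by the analyst

# Observational data: Z causes both X and Y.
x_obs = z + rng.normal(size=n)
y_obs = true_effect * x_obs + 3.0 * z + rng.normal(size=n)

# Experimental data: X is assigned by chance, independent of Z.
x_exp = rng.integers(0, 2, size=n).astype(float)
y_exp = true_effect * x_exp + 3.0 * z + rng.normal(size=n)

slope_obs = slope(x_obs, y_obs)
slope_exp = slope(x_exp, y_exp)
print("observational slope:", round(slope_obs, 2))
print("experimental slope:", round(slope_exp, 2))
```

Under these assumptions the observational slope mixes the effect of \(X\) with the effect of \(Z\) and overstates the causal effect, while the experimental slope stays near the assumed value because random assignment breaks the link between \(Z\) and \(X\).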

Fundamental Table

Data          | Causal | Predictive
Observational | Bad    | Good
Experimental  | Good   | Bad