I. Ozkan
Spring 2025
Deals with obtaining the features (inputs) from data
Deals with predictive tasks; useful in settings such as:
Data Rich Environment
Lack of Human Expertise
Difficult to explain Human Expertise
Dynamic Systems, Changing with time
Needs for adaptation
Departments of business where analytics are valuable:
- Finance
- Marketing and Sales
- Supply Chain and Logistics
…
Learning:
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning
Deep Learning (Part of both Supervised and Unsupervised Learning)
etc.
Task and Data:
Regression
Classification
Clustering
Forecasting
etc.
Huge Amount of data
100s of covariates
It has become more fashionable
Its algorithms have become more available
Computers are more powerful
Need to use different data types in modelling
Data to Pattern to [hopefully] theory is promising
…
| Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|
| {Y;X} available | {X} available | Ex: Game |
| \(E[Y \mid X]\) | Pattern inside data | |
| \(P(Y=y \mid X=x)\) | Homogeneous groups | |
| Ex: Regression | Ex: Clustering | |
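The supervised/unsupervised contrast above can be sketched with a toy example: supervised learning fits \(E[Y \mid X]\) from {Y;X} pairs, while unsupervised learning looks for homogeneous groups in {X} alone. A minimal NumPy sketch (all data-generating numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Supervised: {Y;X} available -> estimate E[Y | X] (here, a line)
x = rng.uniform(0, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 200)
slope, intercept = np.polyfit(x, y, 1)      # least-squares fit

# Unsupervised: only {X} available -> look for homogeneous groups
X = np.concatenate([rng.normal(0, 1, 100), rng.normal(8, 1, 100)])
cut = (X.min() + X.max()) / 2               # crude 1-D split point
labels = (X > cut).astype(int)              # two homogeneous groups
```

In practice the grouping step would use a proper clustering algorithm (e.g. k-means); the midpoint cut is only a stand-in that works for well-separated one-dimensional data.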
 
\(Data=Pattern(s)+Error(s)\)
Example: Standard Regression
\(y=\beta_0+\beta_1 x_1+\beta_2 x_2+ \cdots + \beta_k x_k + \varepsilon\)
for some \(k \gg 2\)
This is equivalent to
\(Pattern=\beta_0+\beta_1 x_1+\beta_2 x_2+ \cdots + \beta_k x_k\) and \(Error=\varepsilon\)
(Assumptions are skipped)
Or put in another form:
\(\mu(x)=E[Y|X=x]=\hat \beta_0+\hat \beta_1 x_1+\hat \beta_2 x_2+ \cdots +\hat \beta_k x_k\)
given \(E[\varepsilon]=0\) and \(\hat \beta_i\) are the estimated coefficients.
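The conditional mean \(\mu(x)=E[Y|X=x]\) can be checked by simulation: generate data from a known pattern plus a zero-mean error, then average \(y\) in a narrow window around a chosen \(x\). A sketch with invented parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000
x = rng.uniform(0, 1, N)
eps = rng.normal(0, 1, N)          # E[eps] = 0
y = 1.0 + 2.0 * x + eps            # Data = Pattern + Error

# mu(x0) = E[Y | X = x0]: average y over a narrow window around x0
x0 = 0.5
window = np.abs(x - x0) < 0.01
mu_hat = y[window].mean()          # should be close to 1 + 2*0.5 = 2
```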
How to find the parameters, \(\hat \beta_i\):
\(MSE=\frac{1}{N} \sum_{i=1}^{N} (y_i-\mu(x_i))^2=\frac{1}{N} \sum_{i=1}^{N} \varepsilon_i^2\)
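Minimizing the MSE over the coefficients yields the OLS normal equations \((X'X)\hat\beta = X'y\). A NumPy sketch with invented true coefficients:

```python
import numpy as np

rng = np.random.default_rng(42)
N, k = 500, 3
X = rng.normal(size=(N, k))
beta_true = np.array([1.0, 2.0, -1.5, 0.5])        # beta_0 .. beta_3
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0, 0.5, N)

# Minimizing MSE gives the normal equations: (X'X) beta = X'y
Xd = np.column_stack([np.ones(N), X])              # add intercept column
beta_hat = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)
mse = np.mean((y - Xd @ beta_hat) ** 2)            # close to Var(eps) = 0.25
```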
In most cases the number of observations, \(N\), is greater than the number of covariates (parameters), \(P\): \(N \gg P\)
If \(N \sim P\), the fit may fail due to a lack of degrees of freedom (and overfitting)
If \(N < P\), OLS fails.
\(\implies\) high dimensionality comes with difficulties.
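The \(N < P\) failure is easy to see: the \(P \times P\) matrix \(X'X\) has rank at most \(N\), so it is singular and the normal equations have no unique solution. A small check (dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(7)
N, P = 10, 20                  # fewer observations than parameters
X = rng.normal(size=(N, P))

XtX = X.T @ X                  # P x P, but rank at most N < P
rank = np.linalg.matrix_rank(XtX)
singular = rank < P            # True: (X'X)^{-1} does not exist, so
                               # the normal equations have infinitely
                               # many solutions and OLS fails
```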
Means:
Correlation vs. causation must be discussed (this is the main critique)
The error structure is important
Behavioral assessment of the model is crucial
Goodness of fit is not the main focus (though it is important)
Theories (Thought Exercise: Idea)
Lots of assumptions without a plausible way to test them (many of them are unrealistic)
Theories \(\implies\) Models
Estimate the models (OLS, IV regression, Maximum Likelihood, GMM, etc.)
Conclude with estimated parameters and standard errors
| Data | Causal | Predictive | 
|---|---|---|
| Observational | Good/Bad | Good/Bad | 
| Experimental | Good/Bad | Good/Bad | 
Let us consider two variables, \(Y\) and \(X\), and a causal structure such that \(X\) causes \(Y\). All of the alternatives are:
Experiment to remove the effects of potential confounding factors? (may solve some of the cases)
Split the sample randomly
Then:
\(X \implies Y\)
\(Y\) does not cause \(X\): the sample is split by chance, so \(Y\) causing \(X\) would mean chance causes \(X\)
\(Z\) causing both is still possible, but only by chance
It could still be by chance
It could be by selection, but selection should be excluded by the experimenter
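The argument can be illustrated by simulation: let a hypothetical confounder \(Z\) drive both \(X\) and \(Y\) while \(X\) has no causal effect on \(Y\). The observational regression slope is then nonzero, while assigning \(X\) by a coin flip (a random split) removes the bias. All numbers are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000

# Confounded observational data: Z drives both X and Y;
# X has NO causal effect on Y
Z = rng.normal(size=N)
X_obs = Z + rng.normal(0, 1, N)
Y_obs = 2.0 * Z + rng.normal(0, 1, N)
slope_obs = np.polyfit(X_obs, Y_obs, 1)[0]   # biased: about 1, not 0

# Experiment: X assigned by chance, so neither Y nor Z can cause it
X_exp = rng.binomial(1, 0.5, N).astype(float)
Y_exp = 2.0 * Z + rng.normal(0, 1, N)        # X still has no effect
slope_exp = np.polyfit(X_exp, Y_exp, 1)[0]   # about 0
```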
OPEN DISCUSSION (ONCE MORE)