I. Ozkan
Fall 2025
“Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalise to unseen data, and thus perform tasks without explicit instructions. […] The application of ML to business problems is known as predictive analytics.” — Wikipedia
Deals with obtaining the features (inputs) from data
Deals with predictive tasks such as Classification and/or Regression
Other Fields and Machine Learning (ML):
Artificial intelligence (AI): ML is a subset of AI
Data mining: ML and Data Mining overlap significantly
Statistics: Statistics and ML are closely related (but goals are different)
Computers have been used to automate many business decisions
This is [in general] called digitization
Machine learning is central to the fourth industrial revolution, where computers are used to create intelligence
Data Rich Environment
Human expertise that is difficult to explain
Dynamic Systems, Changing with time
Needs for adaptation
Examples of Recent Success stories:
Speech Recognition 
 NLP
 Translation
 Image Processing
Learning:
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning (Out of the scope of this course)
Reinforcement Learning (Out of the scope of this course)
Deep Learning (Neural Networks will be introduced as the foundation of Deep Learning)
etc.
Task and Data:
Regression
Classification
Clustering
etc.
Example: Loan Applications (digitization vs. ML)
If there are certain known rules that loan officers can apply, one could digitize their activities
If rules are not known ML can be used to determine them
ML can also be used to improve upon the rules for loan decisions
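A minimal sketch of the digitization-vs-ML contrast above. The rule, threshold, and data are hypothetical illustrations, not real loan-approval criteria:

```python
# Digitization: a known rule is hard-coded; ML: a rule is learned from data.
# All names, thresholds, and data below are made-up for illustration.

def digitized_rule(income, debt):
    """A known rule a loan officer might apply, written down by a programmer."""
    return income > 50_000 and debt / income < 0.4

def learn_threshold(incomes, approved):
    """Learn an income cut-off from labelled past decisions by picking the
    candidate threshold that misclassifies the fewest cases."""
    best_t, best_err = None, float("inf")
    for t in sorted(set(incomes)):
        err = sum((inc > t) != lab for inc, lab in zip(incomes, approved))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Toy history of past decisions: (income, approved?)
incomes  = [30_000, 45_000, 52_000, 70_000, 90_000]
approved = [False, False, True, True, True]

print(digitized_rule(60_000, 12_000))   # rule applied by hand -> True
print(learn_threshold(incomes, approved))  # rule recovered from data -> 45000
```

The second function is the ML view: the decision rule is not written down in advance but extracted from labelled examples, and could be re-learned as new data arrive.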
Huge amounts of data
1000s of covariates
ML becomes more fashionable
ML algorithms become more available
Computers are more powerful
Need to utilize different data types in modelling
The Data \(\to\) Pattern \(\to\) [hopefully] Theory path is promising
Traditional Statistics is helpful (and necessary) but new methods/approaches are essential for business decisions
| Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|
| \(\{Y; X\}\) available | \(\{X\}\) available | Actions in a dynamic environment, e.g. a game |
| \(E[Y \mid X]\) | Pattern inside data | |
| \(P(Y=y \mid X=x)\) | Homogeneous groups | |
| Ex: Regression | Ex: Clustering | |
Supervised Learning: Labelled data
\(Data = Pattern + Error, \quad y = f(X) + \varepsilon\)
Unsupervised Learning: Unlabelled data
\(Data \propto Pattern, \quad X \propto Pattern\)
Reinforcement Learning: An intelligent agent should take actions in a dynamic environment to maximize a reward
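The labelled/unlabelled distinction can be sketched on made-up 1-D data (all numbers below are illustrative assumptions):

```python
# Supervised: {Y; X} available -- recover the pattern y = f(x) + error.
import statistics

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]           # labels: roughly y = 2x + noise
xbar, ybar = statistics.mean(xs), statistics.mean(ys)
slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
print(round(slope, 1))               # pattern recovered from labelled pairs

# Unsupervised: only {X} available -- look for homogeneous groups.
X = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
cut = (min(X) + max(X)) / 2          # naive midpoint split into two groups
groups = [x > cut for x in X]
print(groups)                        # [False, False, False, True, True, True]
```

With labels we estimate \(f\); without labels we can only describe structure in \(X\) (here, two homogeneous groups).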
Supervised Learning: main goal is to find:
\(Data=Pattern(s)+Error(s)\)
Example: Standard Regression
\(y=\beta_0+\beta_1 x_1+\beta_2 x_2+ \cdots + \beta_k x_k + \varepsilon\)
for some \(k \gg 2\)
This is equivalent to
\(Pattern=\beta_0+\beta_1 x_1+\beta_2 x_2+ \cdots + \beta_k x_k \quad \text{and} \quad Error=\varepsilon\)
Or put in another form:
\(\mu(x)=E[Y \mid X=x]\), estimated by \(\hat\mu(x)=\hat \beta_0+\hat \beta_1 x_1+\hat \beta_2 x_2+ \cdots +\hat \beta_k x_k\)
given \(E[\varepsilon \mid X]=0\), where \(\hat \beta_i\) are the estimated coefficients.
How to find the parameters, \(\hat \beta_i\): minimize the mean squared error
\(MSE=\frac{1}{N} \sum_{i=1}^{N} (y_i-\hat\mu(x_i))^2=\frac{1}{N} \sum_{i=1}^{N} \hat\varepsilon_i^2\)
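The MSE-minimizing coefficients have a closed form (the least-squares normal equations). A sketch on synthetic data, where the true coefficients and noise level are assumed for illustration:

```python
# Estimate beta by minimizing MSE: solve the normal equations (X'X) b = X'y.
import numpy as np

rng = np.random.default_rng(0)
N, k = 200, 2
beta_true = np.array([1.0, 2.0, -3.0])          # intercept, beta_1, beta_2 (assumed)
X = rng.normal(size=(N, k))
eps = rng.normal(scale=0.1, size=N)             # the error term
y = beta_true[0] + X @ beta_true[1:] + eps

Xd = np.column_stack([np.ones(N), X])           # design matrix with intercept
beta_hat = np.linalg.solve(Xd.T @ Xd, Xd.T @ y) # MSE-minimizing coefficients
mse = np.mean((y - Xd @ beta_hat) ** 2)

print(np.round(beta_hat, 1))                    # close to [ 1.  2. -3.]
print(mse)                                      # close to 0.1**2 = 0.01
```

With \(N \gg P\) and small noise, \(\hat\beta\) lands very near the true coefficients and the MSE approaches the error variance.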
In most cases, the number of observations, \(N\), is much greater than the number of covariates (parameters), \(P\): \(N \gg P\)
If the number of observations is similar to the number of covariates, \(N \sim P\), estimation becomes unreliable because few degrees of freedom remain
If \(N < P\), estimation fails (the normal equations have no unique solution)
\(\implies\) high dimensionality comes with difficulties.
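A small synthetic illustration of why \(N < P\) breaks the estimation: \(X'X\) becomes singular, so the normal equations cannot be solved uniquely.

```python
# Rank of X'X for random Gaussian design matrices of shape (N, P).
import numpy as np

rng = np.random.default_rng(1)

def xtx_rank(N, P):
    X = rng.normal(size=(N, P))
    return np.linalg.matrix_rank(X.T @ X)   # full rank P needed for a unique solution

print(xtx_rank(100, 5))   # N >> P: rank 5, unique least-squares solution
print(xtx_rank(3, 5))     # N < P : rank at most 3 < 5, X'X is singular
```

With \(N < P\), infinitely many coefficient vectors fit the data equally well, which is one concrete face of the difficulties of high dimensionality.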
| Data | Causal | Predictive | 
|---|---|---|
| Observational | Good/Bad | Good/Bad | 
| Experimental | Good/Bad | Good/Bad | 
Let us consider two variables, \(Y\) and \(X\), with a causal structure such that \(X\) causes \(Y\). The possible alternatives are:
Experiment to remove the effect of a potential confounder (a variable that influences both the dependent and the independent variable)
Split the sample randomly
It is then possible to conclude:
\(X \implies Y\)
\(Y\) does not cause \(X\): the sample is split by chance, so chance (not \(Y\)) determines \(X\)
A \(Z\) cannot systematically cause both, since \(X\) is assigned by chance
The result could still be due to chance
It could be due to selection, but selection should be excluded by the experimenter
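The confounder argument above can be checked in a small simulation (all coefficients and distributions are assumed for illustration): when \(Z\) drives both \(X\) and \(Y\), the observational slope of \(Y\) on \(X\) is biased, while randomly assigning \(X\) removes \(Z\)'s influence on it.

```python
# Confounded observational data vs. randomized (experimental) data.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_effect = 1.0                    # assumed causal effect of X on Y

def slope(x, y):
    """Simple regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x)

Z = rng.normal(size=n)                                  # confounder
# Observational: Z influences both X and Y
X_obs = Z + rng.normal(size=n)
Y_obs = true_effect * X_obs + 2.0 * Z + rng.normal(size=n)
# Experimental: X assigned by chance, independent of Z
X_exp = rng.normal(size=n)
Y_exp = true_effect * X_exp + 2.0 * Z + rng.normal(size=n)

print(round(slope(X_obs, Y_obs), 1))   # biased: ~2.0, not 1.0
print(round(slope(X_exp, Y_exp), 1))   # ~1.0: recovers the causal effect
```

Randomization breaks the \(Z \to X\) link, so the experimental slope estimates the causal effect while the observational slope mixes it with the confounder's influence.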
| Data | Causal | Predictive | 
|---|---|---|
| Observational | Bad | Good | 
| Experimental | Good | Bad | 
In economics, observational data sets are used for causal inference
In the \(Theories \implies Models \implies Validate \: with \: Data\) flow, the causal structure is dictated by \(Theories\). Hence the name Econometrics.
This means:
Correlation vs Causation must be discussed (this one is the main critique)
Error structure is important
Behavioral assessment of the model is crucial
Goodness of fit is not the main focus (though it is important)
Theories (Thought Exercise: Idea)
Lots of assumptions without a plausible way to test them (many of them are unrealistic)
Theories \(\implies\) Models
Estimate Models (OLS, IV Regression, Maximum Likelihood, GMM, etc.)
Conclude with estimated parameters and standard errors