MIS 302

Learning From Data: Statistical Learning


I. Ozkan, PhD

Professor
MIS
Cankaya University

iozkan@cankaya.edu.tr

Spring 2025

Reading




Learning Objectives

Keywords:

Predictors and Response (Dependent, Output) Variables

\[Y=f(X)+\varepsilon\]

\[Y=f(X)+\varepsilon=Pattern+Error\]

Function \(f()\)

Why Estimate \(f()\), Prediction

\(X=(X_1, X_2, \cdots,X_p)\) are readily available but \(Y\) cannot be easily obtained.

Predict \(Y\) with \(\hat Y=\hat f(X)\); the error term is dropped since \(E(\varepsilon)=0\)

\(\hat f()\) may be treated as a black box: its exact form is not important as long as it predicts \(Y\) accurately

Reducible error: \(\hat f()\) is not a perfect estimate of \(f\). Irreducible error: even if \(\hat f()\) were a perfect estimate of \(f\), \(Y\) is also a function of \(\varepsilon\), which cannot be predicted from \(X\)

The expected value of the squared difference between the actual and predicted values of \(Y\):

\[E(Y-\hat Y)^2=E[f(X)+\varepsilon -\hat f(X)]^2\] \[=\underbrace{E[\big(f(X) -\hat f(X)\big)^2]}_{reducible} +\underbrace{Var(\varepsilon)}_{irreducible}\]
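The step between the two expressions above can be spelled out as follows. This is a sketch that treats \(\hat f\) as fixed and assumes \(\varepsilon\) is independent of \(X\) with \(E(\varepsilon)=0\):

\[E[f(X)+\varepsilon -\hat f(X)]^2=E[\big(f(X) -\hat f(X)\big)^2]+2E[\big(f(X) -\hat f(X)\big)\varepsilon]+E(\varepsilon^2)\]

Under the stated assumptions the cross term factors as \(2E[f(X) -\hat f(X)]\,E(\varepsilon)=0\), and \(E(\varepsilon^2)=Var(\varepsilon)\) because \(E(\varepsilon)=0\), which leaves

\[E(Y-\hat Y)^2=E[\big(f(X) -\hat f(X)\big)^2]+Var(\varepsilon)\]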

\(\varepsilon\) may contain (i) unmeasured variables and (ii) unmeasurable variation

The focus is on techniques for estimating \(f\) that minimize the reducible error
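The split into reducible and irreducible error can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the "true" function f, the imperfect estimate f_hat, the noise level sigma, and the range of x are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "true" pattern (assumption, for illustration only)
    return 2.0 + 3.0 * x

def f_hat(x):
    # A deliberately imperfect estimate of f (assumption)
    return 2.5 + 2.5 * x

n = 200_000
sigma = 1.0                              # standard deviation of the error term
x = rng.uniform(0.0, 2.0, size=n)
eps = rng.normal(0.0, sigma, size=n)     # E(eps) = 0, independent of x
y = f(x) + eps                           # Y = f(X) + eps

total = np.mean((y - f_hat(x)) ** 2)         # estimate of E(Y - Y_hat)^2
reducible = np.mean((f(x) - f_hat(x)) ** 2)  # estimate of E[(f - f_hat)^2]
irreducible = sigma ** 2                     # Var(eps)

print(f"total error          : {total:.4f}")
print(f"reducible + Var(eps) : {reducible + irreducible:.4f}")
```

The two printed numbers should agree up to simulation noise. Improving f_hat shrinks only the reducible part; the total cannot fall below Var(eps).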

Why Estimate \(f()\), Inference

The main aim is to understand the relationship between \(X\) and \(Y\), not necessarily to make predictions.

\(\hat f()\) should be chosen so that it is interpretable.

Questions are:

Inference: An Example, Advertising Data

   TV   radio   newspaper   sales
230.1    37.8        69.2    22.1
 44.5    39.3        45.1    10.4
 17.2    45.9        69.3     9.3
151.5    41.3        58.5    18.5
180.8    10.8        58.4    12.9
  8.7    48.9        75.0     7.2
 57.5    32.8        23.5    11.8
120.2    19.6        11.6    13.2
  8.6     2.1         1.0     4.8
199.8     2.6        21.2    10.6

– Which media contribute to sales?

– Which media generate the biggest boost in sales?

– How much increase in sales is associated with a given increase in TV advertising?
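One interpretable way to approach these questions is a linear least-squares fit, whose coefficients can be read directly. The sketch below uses only the ten rows listed above, so the numbers it prints are illustrative and should not be read as results for the full Advertising data. The linear form itself is an assumption.

```python
import numpy as np

# The ten rows listed above; columns: TV, radio, newspaper, sales
data = np.array([
    [230.1, 37.8, 69.2, 22.1],
    [44.5, 39.3, 45.1, 10.4],
    [17.2, 45.9, 69.3, 9.3],
    [151.5, 41.3, 58.5, 18.5],
    [180.8, 10.8, 58.4, 12.9],
    [8.7, 48.9, 75.0, 7.2],
    [57.5, 32.8, 23.5, 11.8],
    [120.2, 19.6, 11.6, 13.2],
    [8.6, 2.1, 1.0, 4.8],
    [199.8, 2.6, 21.2, 10.6],
])
X = data[:, :3]
y = data[:, 3]

# Design matrix with an intercept column
A = np.column_stack([np.ones(len(y)), X])

# Ordinary least squares: minimize ||A @ coef - y||^2
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Each slope is the change in sales associated with a one-unit
# increase in that medium, holding the other two fixed
for name, b in zip(["intercept", "TV", "radio", "newspaper"], coef):
    print(f"{name:>10s}: {b: .4f}")
```

With only ten observations the estimates are very uncertain; the point is that each coefficient maps onto one of the questions above.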

How Do We Estimate \(f\)

Parametric Example (Fig 2.4 ISLR)

Example: the relationship between income, education, and seniority, where the true underlying relationship is shown below:

Fig. 2.3
Here is an example of the parametric approach:

\[income=\beta_0 + \beta_1 \times education + \beta_2 \times seniority\]
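The parametric approach has two steps: assume a functional form, then estimate its parameters from data. Here is a minimal sketch using synthetic data; the coefficient values, variable ranges, and noise level are all assumptions made up for illustration and are not taken from the Income data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: every number below is an assumption for illustration
n = 500
education = rng.uniform(10, 22, n)       # hypothetical years of education
seniority = rng.uniform(20, 180, n)      # hypothetical seniority score
beta_true = np.array([-50.0, 6.0, 0.2])  # beta_0, beta_1, beta_2
income = (beta_true[0]
          + beta_true[1] * education
          + beta_true[2] * seniority
          + rng.normal(0.0, 5.0, n))     # error term with mean zero

# Step 1: assume income = b0 + b1 * education + b2 * seniority
A = np.column_stack([np.ones(n), education, seniority])

# Step 2: estimate the three parameters by least squares
beta_hat, *_ = np.linalg.lstsq(A, income, rcond=None)

def f_hat(edu, sen):
    """Estimated function: a plane in (education, seniority)."""
    return beta_hat[0] + beta_hat[1] * edu + beta_hat[2] * sen

print("true coefficients     :", beta_true)
print("estimated coefficients:", np.round(beta_hat, 3))
```

Estimating \(f\) is reduced to estimating three numbers. The price is that the fit can be poor if the assumed linear form is far from the true \(f\).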

The estimated function is then:

Fig. 2.4

How Do We Estimate \(f\)

Fig. 2.5

Prediction Accuracy and Model Interpretability

Supervised vs Unsupervised Learning

Fig 2.8

Regression vs Classification

Assessing Model Accuracy

Low Bias and Low Variance

Fig. 2.9
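ISLR Fig. 2.9 compares training and test mean squared error as model flexibility grows. A rough sketch of the same kind of experiment is given below. The true function, noise level, sample sizes, and the use of polynomial degree as the measure of flexibility are all assumptions, so the numbers will not match the figure.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def f(x):
    # Hypothetical true function (assumption)
    return np.sin(2 * np.pi * x)

def simulate(n):
    # Draw (x, y) pairs from Y = f(X) + eps with sd(eps) = 0.3
    x = rng.uniform(0.0, 1.0, n)
    return x, f(x) + rng.normal(0.0, 0.3, n)

x_train, y_train = simulate(30)
x_test, y_test = simulate(1000)

degrees = list(range(1, 11))   # polynomial degree stands in for flexibility
train_mse, test_mse = [], []
for d in degrees:
    model = Polynomial.fit(x_train, y_train, deg=d)  # least-squares fit
    train_mse.append(np.mean((y_train - model(x_train)) ** 2))
    test_mse.append(np.mean((y_test - model(x_test)) ** 2))

for d, tr, te in zip(degrees, train_mse, test_mse):
    print(f"degree {d:2d}  train MSE {tr:.4f}  test MSE {te:.4f}")
```

Training MSE can only go down as the degree increases, because each model contains the previous one as a special case. Test MSE need not follow it, which is why the training error alone is a poor guide for choosing a model.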

Low Bias and Low Variance

Fig. 2.11

Model Assessment and Model Selection