Spline Regression
Generalized Additive Models

I. Ozkan

Spring 2025

Splines (Spline Regression)

A spline of degree \(d\) is a piecewise polynomial that is:

  1. continuous,
  2. smooth: it has \(d-1\) continuous derivatives,
  3. of degree \(d\): the \(d^{th}\) derivative is constant between knots.

Piecewise Regression

Let the cubic polynomial regression be

\(y_i=\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + \varepsilon_i\)

A piecewise cubic polynomial with a single knot at \(c\) takes the form

\(y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \varepsilon_{1,i} & \quad \text{if } x_i<c \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \varepsilon_{2,i} & \quad \text{if } x_i \geq c \end{cases}\)

Piecewise Regression Example

Wage Data (ISLR2 Package)

wage and age variables (sample rows):

     wage  age
114.02258   26
 65.11085   36
141.77517   59
186.87668   50
173.87949   55
 93.87868   30
Cubic Regression Results (1: all observations, 2: age < 50, 3: age ≥ 50)

Dependent variable: wage

                                   (1)                   (2)                  (3)
poly(age, 3, raw = T)1          -0.339              -91.636*             -525.260
                               (14.837)             (48.915)            (774.593)
poly(age, 3, raw = T)2           0.082                2.778*                8.813
                                (0.339)              (1.398)             (12.969)
poly(age, 3, raw = T)3          -0.001               -0.027**              -0.049
                                (0.002)              (0.013)              (0.072)
Constant                        57.336            1,056.080*           10,487.720
                              (205.923)            (556.584)          (15,295.300)
Observations                        75                    57                   18
R2                               0.088                 0.154                0.048
Adjusted R2                      0.049                 0.106               -0.157
Residual Std. Error   46.184 (df = 71)      41.798 (df = 53)     59.470 (df = 14)
F Statistic         2.283* (df = 3; 71)  3.204** (df = 3; 53)  0.233 (df = 3; 14)
Note: *p<0.1; **p<0.05; ***p<0.01
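The three fits above can be reproduced along these lines. A sketch: the 75-observation subsample and the seed are assumptions (the original sample is not specified), so coefficients will differ.

```r
library(ISLR2)

set.seed(1)  # assumed seed; the original subsample is not specified
wage_s <- Wage[sample(nrow(Wage), 75), ]

# (1) all observations, (2) age < 50, (3) age >= 50
fit_all   <- lm(wage ~ poly(age, 3, raw = TRUE), data = wage_s)
fit_below <- lm(wage ~ poly(age, 3, raw = TRUE), data = subset(wage_s, age <  50))
fit_above <- lm(wage ~ poly(age, 3, raw = TRUE), data = subset(wage_s, age >= 50))
```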

Piecewise Regression Example

To turn the piecewise cubic polynomial into a cubic spline, the fit is constrained to be:

  1. continuous at the knot,

  2. continuous in its first derivative at the knot,

  3. continuous in its second derivative at the knot.

Generalization

\[y=\beta_0 + \sum_{i=1}^{d} \beta_i x^i + \sum_{j=1}^{K} b_j (x-\zeta_j)_+^d\]

where,

\(\zeta_j\) is the \(j^{th}\) knot, \(b_j\) is its coefficient, and the truncated power function is

\((x-\zeta_j)_+^d = \begin{cases} (x-\zeta_j)^d & \quad \text{if } x \geq \zeta_j \\ 0 & \quad \text{if } x < \zeta_j \end{cases}\)

In the examples below, \(b_j(x)\) denotes the \(j^{th}\) polynomial basis function, e.g. \(b_1(x)=x\), \(b_2(x)=x^2\).

Polynomial Spline

Example: Linear Spline with one knot at 50

\(y=\beta_0 + \beta_1 b_1(x) + \beta_{1+1} (x-50)_+\)

or since \(b_1(x)\equiv x^1=x\)

\(y=\beta_0 + \beta_1 x + \beta_{1+1} (x-50)_+\)

where,

\((x_i-50)_+ = \begin{cases} (x_i-50) & \quad \text{if } x_i \geq 50 \\ 0 & \quad \text{if } x_i < 50 \end{cases}\)
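This linear spline can be fit directly with lm(), constructing the truncated term with pmax(). A minimal sketch on the Wage data:

```r
library(ISLR2)

# (age - 50)_+ built with pmax(); the coefficient on this term is
# the change in slope at the knot
fit <- lm(wage ~ age + pmax(age - 50, 0), data = Wage)
coef(fit)
```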

Example: Quadratic Spline with one knot at 50

\(y=\beta_0 + \beta_1 b_1(x) + \beta_2 b_2(x) + \beta_{1+2} (x-50)_+^2\)

\((x_i-50)_+^2 = \begin{cases} (x_i-50)^2 & \quad \text{if } x_i \geq 50 \\ 0 & \quad \text{if } x_i < 50 \end{cases}\)

Example: Cubic Spline with one knot at 50

\(y=\beta_0 + \beta_1 b_1(x) + \beta_2 b_2(x) + \beta_3 b_3(x) + \beta_{1+3} (x-50)_+^3\)

\((x_i-50)_+^3 = \begin{cases} (x_i-50)^3 & \quad \text{if } x_i \geq 50 \\ 0 & \quad \text{if } x_i < 50 \end{cases}\)

Cubic Spline with several knots at \(\zeta_1, \zeta_2, \dots, \zeta_K\)

Since there are \(K\) knots, a truncated power basis term like those above is added for each knot:

\[y=\beta_0 + \beta_1 b_1(x) + \beta_2 b_2(x) + \beta_3 b_3(x) + \beta_{1+3} (x-\zeta_1)_+^3+\cdots+\beta_{K+3} (x-\zeta_K)_+^3\]

There are therefore \(K+4\) regression coefficients to estimate, so fitting a cubic spline (degree 3) with \(K\) knots uses \(K+4\) degrees of freedom.
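In practice the basis is rarely built by hand; bs() from the splines package generates an equivalent (B-spline) basis for lm(). A sketch with three assumed knots:

```r
library(splines)
library(ISLR2)

# Cubic spline with K = 3 interior knots: K + 3 = 6 basis columns,
# i.e. K + 4 = 7 coefficients including the intercept
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
summary(fit)
```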

Boundary Regions

Polynomial pieces are poorly constrained near the boundaries of the data, where observations are scarce, so spline fits can be erratic there; this motivates the natural splines introduced below.

Estimation

The coefficients are estimated by ordinary least squares, using the design matrix

\[ \begin{pmatrix} 1 & x_1 & x_1^2 & x_1^3 & \cdots & x_1^d & (x_1-\zeta_1)_+^d & \cdots & (x_1-\zeta_K)_+^d\\ 1 & x_2 & x_2^2 & x_2^3 & \cdots & x_2^d & (x_2-\zeta_1)_+^d & \cdots & (x_2-\zeta_K)_+^d\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & x_n^3 & \cdots & x_n^d & (x_n-\zeta_1)_+^d & \cdots & (x_n-\zeta_K)_+^d\\ \end{pmatrix}\]
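This design matrix can be constructed directly. A sketch; the helper trunc_power_X() and its default knots are illustrative, not from the original:

```r
# Truncated power basis design matrix for degree d and knots zeta
trunc_power_X <- function(x, d = 3, zeta = c(25, 40, 60)) {
  X_poly  <- outer(x, 1:d, `^`)                        # x, x^2, ..., x^d
  X_trunc <- sapply(zeta, function(z) pmax(x - z, 0)^d)  # (x - zeta_j)_+^d
  cbind(1, X_poly, X_trunc)                            # 1 + d + K columns
}

X <- trunc_power_X(ISLR2::Wage$age)
dim(X)  # n x 7 for d = 3 and K = 3
```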

Natural Splines

A natural spline is a regression spline with the additional constraint that the function is linear beyond the boundary knots, which generally produces more stable estimates in the boundary regions.

Basis Functions

Let’s draw the basis functions for both regression splines and natural splines using R. Here the splines package generates the bases and the ggfortify package is used for plotting.
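If ggfortify is unavailable, the basis columns can also be drawn with base graphics. A minimal sketch with assumed knots at 25, 40 and 60:

```r
library(splines)

age_grid <- seq(18, 80, length.out = 200)

B_bs <- bs(age_grid, knots = c(25, 40, 60))  # cubic spline basis
B_ns <- ns(age_grid, knots = c(25, 40, 60))  # natural cubic spline basis

# One curve per basis column
op <- par(mfrow = c(1, 2))
matplot(age_grid, B_bs, type = "l", lty = 1, xlab = "age", ylab = "basis", main = "bs()")
matplot(age_grid, B_ns, type = "l", lty = 1, xlab = "age", ylab = "basis", main = "ns()")
par(op)
```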

Basis Functions (and Model Matrix)
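The basis also appears as the model matrix of the fitted regression. A sketch with the same assumed knots:

```r
library(splines)
library(ISLR2)

fit <- lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
head(model.matrix(fit))  # intercept + K + 3 = 6 spline basis columns
```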

Choosing Number of Knots
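A common default is to specify the degrees of freedom and let bs() place the knots at quantiles of the predictor. A sketch:

```r
library(splines)
library(ISLR2)

# df = 6 for a cubic spline implies K = df - 3 = 3 interior knots
b <- bs(Wage$age, df = 6)
attr(b, "knots")  # knots placed at the 25th, 50th and 75th percentiles of age
```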

Smoothing Splines

A smoothing spline is the function \(g\) that minimizes

\(\underbrace{\sum_{i=1}^n(y_i-g(x_i))^2}_\text{residual squares}+\underbrace{\lambda\int g''(t)^2dt}_\text{roughness penalty}\)

where \(\lambda \geq 0\) is a tuning parameter: larger values of \(\lambda\) penalize roughness more heavily and yield a smoother fit.
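In R, smooth.spline() selects \(\lambda\) by cross-validation. A minimal sketch on the Wage data:

```r
library(ISLR2)

fit <- smooth.spline(Wage$age, Wage$wage, cv = TRUE)  # LOOCV choice of lambda
fit$lambda  # selected tuning parameter
fit$df      # implied effective degrees of freedom
```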

A side note

Here is the Hodrick-Prescott filter (a discrete-time special case of a smoothing spline):

\(\min_{\{\tau_{t}\}}\sum_{t=1}^{T}(y_t - \tau_t)^2 +\lambda\sum_{t=3}^{T}\left[\left(\tau_{t}-\tau_{t-1}\right)-\left(\tau_{t-1}-\tau_{t-2}\right)\right]^{2}\)
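For illustration, the HP filter is available in the mFilter package (an assumption; any implementation of the penalty above would do):

```r
library(mFilter)

data(unemp)  # quarterly US unemployment series shipped with mFilter
hp <- hpfilter(unemp, freq = 1600, type = "lambda")  # lambda = 1600: quarterly convention
plot(hp)  # original series, trend tau_t, and cyclical component
```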

End of a side note

Generalized Additive Model (GAM)

“Generalized additive models (GAMs) provide a general framework for extending a standard linear model by allowing non-linear functions of each of the variables, while maintaining additivity.”

Linear Regression Model

\(y_i=\beta_0+\beta_1x_{i,1}+\beta_2x_{i,2}+\cdots+\beta_px_{i,p}+\varepsilon_i\)

Each component \(\beta_j x_{i,j}\) can be replaced by a smooth non-linear function \(f_j(x_{i,j})\) to allow for a non-linear relationship between each feature and the response:


\(y_i=\beta_0 + f_1(x_{i,1}) + f_2(x_{i,2}) + \cdots + f_p(x_{i,p}) + \varepsilon_i\)

This is an example of a GAM. It is called additive because a separate \(f_j\) is estimated for each feature and their contributions are then added together.

GAM Example (page 308, ISLR)

\(wage=\beta_0+f_1(year)+f_2(age)+f_3(education)+\varepsilon\)
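The output below comes from a fit of this form using the gam package, with smoothing splines for year and age, as in ISLR:

```r
library(gam)
library(ISLR2)

# Smoothing splines for year (df = 4) and age (df = 5);
# education is qualitative and enters via dummy variables
gam.m3 <- gam(wage ~ s(year, 4) + s(age, 5) + education, data = Wage)
summary(gam.m3)
```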


Call: gam(formula = wage ~ s(year, 4) + s(age, 5) + education, data = Wage)
Deviance Residuals:
    Min      1Q  Median      3Q     Max 
-119.43  -19.70   -3.33   14.17  213.48 

(Dispersion Parameter for gaussian family taken to be 1235.69)

    Null Deviance: 5222086 on 2999 degrees of freedom
Residual Deviance: 3689770 on 2986 degrees of freedom
AIC: 29887.75 

Number of Local Scoring Iterations: NA 

Anova for Parametric Effects
             Df  Sum Sq Mean Sq F value    Pr(>F)    
s(year, 4)    1   27162   27162  21.981 2.877e-06 ***
s(age, 5)     1  195338  195338 158.081 < 2.2e-16 ***
education     4 1069726  267432 216.423 < 2.2e-16 ***
Residuals  2986 3689770    1236                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova for Nonparametric Effects
            Npar Df Npar F  Pr(F)    
(Intercept)                          
s(year, 4)        3  1.086 0.3537    
s(age, 5)         4 32.380 <2e-16 ***
education                            
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Pros and Cons of GAMs

GAMs fit a non-linear \(f_j\) to each \(X_j\) automatically, and because the model is additive, each predictor's contribution can be examined while the others are held fixed. The main limitation is the additivity itself: important interactions between variables can be missed unless interaction terms are added manually.

Note: Multivariate adaptive regression splines (MARS) are outside the scope of this course.

GAMs for Classification

GAMs extend to qualitative responses. For a binary response, the additive terms enter through the logit link:

\(\log\left(\frac{p(X)}{1-p(X)}\right)=\beta_0+f_1(x_1)+f_2(x_2)+\cdots+f_p(x_p)\)
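As in ISLR, a logistic GAM for the event \(wage>250\) can be fit as follows (a sketch):

```r
library(gam)
library(ISLR2)

# Binary response I(wage > 250); the additive terms enter the logit
gam.lr <- gam(I(wage > 250) ~ year + s(age, df = 5) + education,
              family = binomial, data = Wage)
summary(gam.lr)
```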