I. Ozkan
Spring, 2025
A stationary time series is in statistical equilibrium; in other words, its joint probability distribution does not change with time.
What if some properties do change with time? For example, the time series may contain a trend, changing seasonality, or a time-varying variance. Consider the time series
\(X_t=\mu_t + \varepsilon_t\)
where \(\mu_t\) is a time-varying trend and \(\varepsilon_t\) is a stationary \(ARMA\) process.
For the sake of simplicity, let's assume \(\mu_t=\beta_0 + \beta_1 t\) and set \(\varepsilon_t=a_t\), where \(a_t\) is white noise with variance \(\sigma^2\). Then \(X_t=\beta_0+\beta_1 t + a_t\)
If we know the trend structure, then we can estimate it. This is known as a trend-stationary process.
If the trend is stochastic (unit root), or if the trend structure is not known, then differencing can be used.
Set \(W_t=X_t-X_{t-1}=\mu_t-\mu_{t-1}+\varepsilon_t-\varepsilon_{t-1}\).
For the linear trend case,
\(W_t=(\beta_0+\beta_1 t + a_t)-(\beta_0+\beta_1 (t-1) + a_{t-1}) \implies W_t=\beta_1 + a_t - a_{t-1}\)
and both \(E[W_t]=\beta_1\) and \(V(W_t)=2 \sigma^2\) are constant (they do not change with time).
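A minimal R sketch of this result (the values \(\beta_0=2\), \(\beta_1=0.5\), and \(\sigma=1\) are illustrative):

```r
set.seed(123)
n <- 500
t <- 1:n
a <- rnorm(n, mean = 0, sd = 1)   # white noise a_t with sigma = 1
x <- 2 + 0.5 * t + a              # X_t = beta0 + beta1 * t + a_t
w <- diff(x)                      # W_t = X_t - X_{t-1}
mean(w)                           # close to beta1 = 0.5
var(w)                            # close to 2 * sigma^2 = 2
```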
For a purely stochastic trend (a random walk): \(X_t=X_{t-1}+a_t \implies (1-B)X_t=a_t \implies \nabla X_t =a_t\)
where the differencing operator is defined as \(\nabla\equiv(1-B)\) and \(B\) is the backshift operator, \(BX_t=X_{t-1}\).
Similarly, differencing \(d\) times, \(\nabla^d=(1-B)^d\), is also called \(d^{th}\)-order differencing.
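For example, second-order differencing expands as \(\nabla^2 X_t=(1-B)^2X_t=(1-2B+B^2)X_t=X_t-2X_{t-1}+X_{t-2}\); in R this is diff(x, differences = 2).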
We will now discuss when we need to difference more than once. Consider a quadratic trend:
\(X_t=\beta_0+\beta_1 t+\beta_2 t^2 + a_t\)
\(w_t=\nabla X_t=X_t - X_{t-1}\)
\(w_t=(\beta_0+\beta_1 t+\beta_2 t^2 + a_t)-(\beta_0+\beta_1 (t-1)+\beta_2 (t-1)^2 + a_{t-1})\)
\(=\beta_1+\beta_2\left(t^2-(t-1)^2\right)+a_t-a_{t-1}\)
\(\implies w_t=\beta_1-\beta_2+2 \beta_2 t + a_t - a_{t-1}\)
The first difference does not remove the quadratic trend completely, but the remaining trend is now linear. This means it can be removed by differencing once more:
\(w_t-w_{t-1}=2 \beta_2 + a_t - 2 a_{t-1} + a_{t-2}\)
Second-order differencing completely removes the quadratic time trend.
As a generalization, one can conclude that differencing enough times removes any polynomial time trend.
But there is a catch: differencing inflates the variance.
\(V(w_t-w_{t-1})=\sigma^2 + 4 \sigma^2 + \sigma^2=6 \sigma^2\)
The cost of differencing is this inflated variance: \(2\sigma^2\) after one difference and \(6\sigma^2\) after two, versus \(\sigma^2\) for the original noise.
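This inflation is easy to verify on pure white noise (a sketch with illustrative \(\sigma^2=1\)):

```r
set.seed(1)
a <- rnorm(1e5)                  # white noise with sigma^2 = 1
var(diff(a))                     # ~ 2, i.e. 2 * sigma^2
var(diff(a, differences = 2))    # ~ 6, i.e. 6 * sigma^2
```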
Every \(n\)-times differentiable function \(f(t)\) can be approximated by an \(n^{th}\)-order polynomial in \(t\) (Taylor series expansion).
It is then possible to remove any \(n^{th}\)-order polynomial trend by \(n^{th}\)-order differencing.
The AutoRegressive Integrated Moving Average process \(ARIMA(p,d,q)\) is then given as
\(\Phi(B) \nabla^d X_t=\Theta(B) a_t\)
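As a sketch, such a model can be estimated in R with forecast::Arima() (or base arima()); the simulated series and the chosen order below are purely illustrative:

```r
library(forecast)
set.seed(42)
z   <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 300)  # stationary ARMA(1,1)
x   <- cumsum(z)                                             # integrate once: an ARIMA(1,1,1)
fit <- Arima(x, order = c(1, 1, 1))                          # fit ARIMA(p=1, d=1, q=1)
summary(fit)
```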
Now suppose instead that the variance depends on the level of the process:
\(X_t=\mu_t + \varepsilon_t\)
\(V(X_t)=V(\varepsilon_t)=h^2(\mu_t) \sigma^2\)
where \(h(\cdot)\) is an unknown function.
We look for a transformation \(g(\cdot)\) that stabilizes the variance. A first-order Taylor expansion around \(\mu_t\) gives \(g(X_t)\approx g(\mu_t) + g'(\mu_t)(X_t-\mu_t)\), where \(g'(\cdot)\) is the first derivative \(\frac{d}{dx}g(x)\).
\(V(g(X_t))\approx V(g(\mu_t) + g'(\mu_t)(X_t-\mu_t))\)
\(=[g'(\mu_t)]^2 V(X_t) = [g'(\mu_t)]^2 h^2(\mu_t) \sigma^2\)
For this variance to be constant, we need \(g'(\mu_t) h(\mu_t)\) to be constant, so (up to a multiplicative constant) \(g'(\mu_t)=\frac{1}{h(\mu_t)}\)
For example, if \(h(\mu_t)=\mu_t\), i.e., the standard deviation is proportional to the level of the process, then
\(g'(\mu_t)=\frac{1}{\mu_t} \implies g(\cdot)=\ln(\cdot)\)
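A small simulated illustration (all parameter values are arbitrary): when the standard deviation grows proportionally with the level, taking logs yields a roughly constant spread:

```r
set.seed(7)
t  <- 1:200
mu <- exp(0.02 * t)                  # level grows with time
x  <- mu * exp(rnorm(200, 0, 0.1))   # sd(X_t) roughly proportional to mu_t
plot(x, type = "l")                  # spread increases with the level
plot(log(x), type = "l")             # spread is roughly constant
```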
More generally, the Box-Cox transformation is \(g(X_t)=\frac{X_t^\lambda - 1}{\lambda}\), and note that
\(\lim_{\lambda \to 0} g(X_t) = \lim_{\lambda \to 0} \frac{X_t^\lambda - 1}{\lambda} = \ln(X_t)\)
bcPower Transformation to Normality
      Est Power  Rounded Pwr  Wald Lwr Bnd  Wald Upr Bnd
x         0.196          0.2         0.144         0.248

Likelihood ratio test that transformation parameter is equal to 0 (log transformation)
                             LRT  df        pval
LR test, lambda = (0)   68.39832   1  < 2.22e-16

Likelihood ratio test that no transformation is needed
                             LRT  df        pval
LR test, lambda = (1)   511.5032   1  < 2.22e-16
Another Lambda estimation
Lambda: 0.1818
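A sketch of the R calls that produce output of this kind, assuming a positive series x: car::powerTransform() yields the bcPower summary with Wald bounds and LR tests, and forecast::BoxCox.lambda() yields an alternative \(\lambda\) estimate:

```r
library(car)
library(forecast)
summary(powerTransform(x))   # bcPower estimate, Wald bounds, LR tests vs lambda = 0 and lambda = 1
BoxCox.lambda(x)             # alternative lambda estimate (Guerrero method by default)
```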
The forecast package functions tsdisplay() and ggtsdisplay() plot a series together with its \(ACF\) and \(PACF\).
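For example (series x assumed):

```r
library(forecast)
ggtsdisplay(x)   # time plot of x together with its ACF and PACF
```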
First, stabilize the variance if needed with the Box-Cox transformation \(g(X_t)=\frac{X_t^\lambda - 1}{\lambda}\).
From the first step, if necessary, difference the time series \(d\) times to obtain a series that is stationary in mean.
Examine the \(ACF\) and \(PACF\) of the stationary series.
\(ARMA(p,q)\): \(ACF\) and \(PACF\) are both infinite in extent; the \(ACF\) has \(q-p+1\) initial values and then decays exponentially or as a damped sine wave. For example (a simulation sketch follows this list):
\(ARMA(1,1):\;q-p+1=1\): \(ACF\) with one initial value \(\rho_0\), then exponential decay from lag 1.
\(ARMA(1,2):\;q-p+1=2\): two initial values \(\rho_0, \rho_1\), then exponential decay from lag 2.
\(ARMA(2,1):\;q-p+1=0\): no initial values; exponential decay or damped sine wave.
\(ARMA(2,2):\;q-p+1=1\): one initial value \(\rho_0\), then exponential decay or damped sine wave from lag 1.
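These patterns can be checked empirically by simulation (coefficient values are illustrative):

```r
set.seed(3)
x <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 1000)  # an ARMA(1,1) process
acf(x)    # one initial value, then exponential decay from lag 1
pacf(x)   # tails off; infinite in extent
```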
Estimation: if there is no \(MA\) part, one can use least squares (\(LSE\)); otherwise, Maximum Likelihood estimation is used.
Diagnostics
Forecasting
\(Q_{stat}=n \displaystyle\sum_{k=1}^{h} \rho_k^2\)
\(h\): maximum lag to be considered; usually \(h=20\) is selected. If the residuals are white noise, \(Q_{stat} \sim \chi^2\) with \(h-m\) degrees of freedom, where \(m\) is the number of parameters in the model.
\(Q^*_{stat}=n (n+2)\displaystyle\sum_{k=1}^{h} \frac{\rho_k^2}{n-k}\)
Again, \(Q^*_{stat} \sim \chi^2\) with \(h-m\) degrees of freedom.
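In R, both statistics are available via Box.test(); the residual series and fitdf (which corresponds to \(m\)) below assume a hypothetical fitted model fit with two estimated parameters:

```r
res <- residuals(fit)                                     # residuals of a fitted ARIMA model
Box.test(res, lag = 20, type = "Box-Pierce", fitdf = 2)   # Q statistic
Box.test(res, lag = 20, type = "Ljung-Box",  fitdf = 2)   # Q* statistic
```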
There exist several unit-root tests. The most widely used is the Augmented Dickey-Fuller (ADF) test; others include the Elliott-Rothenberg-Stock (ERS), KPSS, Phillips-Perron (PP), Schmidt-Phillips (SP), and Zivot-Andrews (ZA) tests.
The fUnitRoots package contains all of these tests; see the urdfTest(), urersTest(), urkpssTest(), urppTest(), urspTest(), and urzaTest() functions of this package.
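For example (series x assumed; the lag order and deterministic terms are illustrative choices):

```r
library(fUnitRoots)
adfTest(x, lags = 4, type = "ct")   # Augmented Dickey-Fuller test with constant and trend
urkpssTest(x)                       # KPSS test (null hypothesis: stationarity)
```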
Competition Assignment (due in the last week):
Each group should select one of the group members’ stock price series
Create a model for this time series with \(ARIMA(p,d,q)\). To do so: (i) come up with candidate models, (ii) select the best model among them, (iii) diagnose the best model.
This part is left to students as an exercise.
Read (i) Forecasting: Principles and Practice, Section 9.9