I. Ozkan
Spring 2025
Probably the simplest model is nothing but the white noise model: white noise plus some constant \(c\).
\[X_t = c + a_t\]
where \(a_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)\), i.e. purely random draws from a normal distribution, with \(Cov(a_t,a_{t-k})=0\) for all \(k \neq 0\).
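A minimal R sketch of this model (the values of \(c\), \(\sigma\), the length and the seed below are only illustrative):
set.seed(123)
c_const <- 2                       # illustrative constant c
a <- rnorm(200, mean = 0, sd = 1)  # a_t ~ iid N(0, 1)
xt <- ts(c_const + a)              # X_t = c + a_t
plot(xt); abline(h = c_const, lty = 2)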
The stationary \(AR(1)\) model is the simplest autoregressive model. It is given as:
\(X_t=\phi X_{t-1} + a_t\)
where \(|\phi| < 1\) and \(a_t\) is white noise. If we keep substituting for the lagged values:
\(X_t=\phi X_{t-1} + a_t=\phi (\phi X_{t-2}+a_{t-1}) + a_t=\phi^2X_{t-2}+\phi a_{t-1}+a_t\)
\(=\phi^2(\phi X_{t-3}+a_{t-2}) +\phi a_{t-1}+a_t=\phi^3X_{t-3}+\phi^2 a_{t-2}+\phi a_{t-1}+a_t\)
or
\(\cdots=(1+\phi B+\phi^2 B^2+\cdots)a_t\)
Or mathematically this model can be written as
\((1-\phi B)X_t=a_t\)
where \(BX_t \equiv X_{t-1}\)
Let’s re-write the above equation as:
\(X_t=(1-\phi B)^{-1}a_t=\frac{a_t}{(1-\phi B)}=(1+\phi B+\phi^2 B^2+\cdots)a_t\)
This is not difficult to prove (hint: use the Taylor series around \(x=0\) of \(f(x)=(1-x)^{-1}\)).
The expected value of \(X_t\) then,
\(E[X_t]=E[\phi X_{t-1}+ a_t]=E[(1+\phi B+\phi^2 B^2+\cdots)a_t]=0\)
For the variance of \(X_t\),
\(V(X_t)=V(\phi X_{t-1}+ a_t)=V(a_t+\phi a_{t-1}+\phi^2 a_{t-2}+\cdots)=\sigma^2+\phi^2\sigma^2+\phi^4\sigma^2+\cdots\)
This sum converges to a finite value if and only if (iff) \(|\phi|<1\).
Assuming that \(V(X_t)=V(X_{t-k})\) for all \(k\) and \(Cov(X_{t-k},a_t)=0\) for \(k\geq 1\), then,
\(V(X_t)=V(\phi X_{t-1}+ a_t)=V(\phi X_{t-1}) + V(a_t)\implies V(X_t)=\frac{\sigma^2}{1-\phi^2}\)
It means that if \(|\phi|=1\) then \(V(X_t)\) explodes to infinity. The model with \(\phi=1 \implies X_t=X_{t-1}+a_t\) is called a Random Walk.
Example:
The random walk is a special process used in many contexts in finance/economics. The model:
\(X_t=X_{t-1}+a_t\)
Then,
\(E[X_t \mid X_{t-1}, X_{t-2},\cdots]=E[X_{t-1}+a_t \mid X_{t-1}]=X_{t-1}+E[a_t]=X_{t-1}+0\)
It simply means that the expected value of \(X_t\) is the last observation
In other words, the change in \(X_t\) cannot be anticipated from its own past values
Another special Random Walk model is called the Random Walk with Drift, in which a constant is added to the model:
\(X_t=\delta+X_{t-1}+a_t\)
Simulated Series
500 simulated series of Random Walk (\(a_t \sim N(0,1)\)) and Random Walk with Drift (\(\delta=0.2, \; a_t \sim N(0,1)\))
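A minimal sketch of how one realisation of each process can be generated in R (length, seed and colours are only illustrative):
set.seed(1)
n <- 500
rw  <- ts(cumsum(rnorm(n)))         # X_t = X_{t-1} + a_t,        a_t ~ N(0,1)
rwd <- ts(cumsum(0.2 + rnorm(n)))   # X_t = 0.2 + X_{t-1} + a_t   (drift delta = 0.2)
ts.plot(rw, rwd, col = c("black", "red"))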
Ergodicity means that sample moments computed from finite stretches of the realisation approach their population counterparts as the length of the realisation becomes infinite.
A stochastic process is said to be strictly stationary if its properties are unaffected by a change of time origin (T. Mills, Time Series Techniques for Economists, p11).
In simple terms, a time shift of the observations does not change the joint probability distribution
Some math:
Provided \(E[X_t^2] < \infty\), the joint probability distribution satisfies
\(P(X_{t_1},X_{t_2},\cdots,X_{t_m}) = P(X_{t_{k+1}},X_{t_{k+2}},\cdots,X_{t_{k+m}})\)
Here is what this implies, in simple terms, for \(m=1\):
\(E[X_1]=E[X_2]=E[X_3]= \cdots =E[X_t]=\mu\)
\(V(X_1)=V(X_2)=V(X_3)= \cdots =V(X_t)=\sigma_x^2\)
For \(m=2\), the covariances must not depend on time: \(Cov(X_t,X_{t-k})\) must be the same for all \(t\), for each lag \(k \neq 0\)
When these conditions are applied only to the first and second moments, the process is said to be weakly stationary (or covariance stationary). The above covariance is known as the autocovariance and can be written as:
\(\gamma_k=Cov(X_t,X_{t-k})=E[(X_t-\mu)(X_{t-k}-\mu)]=Cov(X_{t-k},X_t)=\gamma_{-k}\)
with \(\gamma_0=Var(X_t)\) and hence the Autocorrelation (ACF) function at lag \(k\),
\(\rho_k=\frac{Cov(X_t,X_{t-k})}{[V(X_t) \times V(X_{t-k})]^{0.5}} = \frac{\gamma_k}{\gamma_0}=\frac{\gamma_{-k}}{\gamma_0}=\rho_{-k}\)
By the Wold decomposition, a stationary process (apart from a deterministic component) can be written as
\(X_t-\mu=a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots=\displaystyle\sum_{j=0}^{j=\infty}\psi_j a_{t-j} \quad \text{with} \quad \psi_0=1\)
\(a_t\) is defined before as white noise and hence,
\(E(a_t)=0\), \(V(a_t)=E(a_t^2)=\sigma^2<\infty\) and \(cov(a_t,a_{t-k})=0 \: \: for \: all \: \: k\neq 0\), or for the sake of simplicity let’s repeat \(a_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)\)
Recall that \(V(X_t)=E[(X_t-\mu)^2]\), and the Wold decomposition allows us to rewrite the variance as:
\(V(X_t)=E[(a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots)^2]\)
\(\implies E[(a_t)^2] + \psi_1^2 E[(a_{t-1})^2] + \psi_2^2 E[(a_{t-2})^2] + \cdots=\sigma^2+\psi_1^2 \sigma^2+\psi_2^2 \sigma^2+ \cdots\)
since \(E(a_t a_{t-k})=0 \: \: for \: k \neq 0\)
Then, \(V(X_t)=\gamma_0=\sigma^2 \displaystyle\sum_{j=0}^{j=\infty}\psi_j^2\)
Similarly, \(Cov(X_t,X_{t-k})=E[(X_t-\mu)(X_{t-k}-\mu)]\), then,
\(E[(a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots+\psi_k a_{t-k}+\cdots)(a_{t-k}+\psi_1 a_{t-k-1}+\psi_2 a_{t-k-2}+\cdots)]\)
\(\sigma^2(\psi_k+\psi_1 \psi_{k+1}+\psi_2 \psi_{k+2}+\cdots)=\sigma^2 \displaystyle\sum_{j=0}^{j=\infty}\psi_j \psi_{j+k}\)
Hence, the autocorrelation at lag \(k\) is
\(\rho_k=\frac{Cov(X_t,X_{t-k})}{[V(X_t) \times V(X_{t-k})]^{0.5}} = \frac{\gamma_k}{\gamma_0}=\frac{\displaystyle\sum_{j=0}^{j=\infty}\psi_j \psi_{j+k}}{\displaystyle\sum_{j=0}^{j=\infty}\psi_j^2}\)
For these expressions to converge, we require
\(\displaystyle\sum_{j=0}^{j=\infty}|\psi_j|<\infty\)
For AR(1) model, recall that,
\(X_t - \phi X_{t-1}=a_t\)
and
\(X_t=(1-\phi B)^{-1}a_t=\frac{a_t}{(1-\phi B)}=(1+\phi B+\phi^2 B^2+\cdots)a_t\)
hence, \(\psi_j=\phi^j\).
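This can be checked numerically with ARMAtoMA() from the stats package (the value \(\phi=0.7\) is only illustrative):
phi <- 0.7
ARMAtoMA(ar = phi, lag.max = 5)  # psi_1, ..., psi_5
phi^(1:5)                        # the same values: phi^j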
If we multiply both sides of the \(AR(1)\) model by \(X_{t-k}\), \(X_t X_{t-k}-\phi X_{t-1}X_{t-k}=a_t X_{t-k}\), and take expectations of both sides,
\(E(X_t X_{t-k})-\phi E(X_{t-1}X_{t-k})=\gamma_k - \phi \gamma_{k-1} =E(a_t X_{t-k})\)
\(\implies \gamma_k - \phi \gamma_{k-1}=0 \implies \gamma_k = \phi \gamma_{k-1} \implies \gamma_k = \phi^k \gamma_0\)
hence the \(ACF\) of the \(AR(1)\) process is \(\rho_k=\phi^k\)
We know that \(|\phi|<1\) for a stationary process. Here are two examples of \(\rho_k\), one for a negative and one for a positive value of \(\phi\)
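Such theoretical \(ACF\)s can be reproduced with ARMAacf() from the stats package; a small sketch with illustrative values of \(\phi\):
ARMAacf(ar = 0.7, lag.max = 10)   # rho_k = 0.7^k: smooth geometric decay
ARMAacf(ar = -0.7, lag.max = 10)  # rho_k = (-0.7)^k: decays while alternating in sign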
Here are four \(AR(1)\) models simulated with \(a_t \stackrel{i.i.d.}{\sim} N(0,2^2)\)
\(X_t = 0.5 X_{t-1} + a_t\)
\(X_t = 0.9 X_{t-1} + a_t\)
\(X_t = -0.5 X_{t-1} + a_t\)
\(X_t = -0.9 X_{t-1} + a_t\)
The simplest moving average \(MA(q)\) process is the first-order moving average, \(MA(1)\), process. It is given as:
\(X_t=a_t - \theta a_{t-1}=(1-\theta B)a_t\)
In this case, \(V(X_t)=\gamma_0=V(a_t - \theta a_{t-1})=\sigma^2+\theta^2 \sigma^2=\sigma^2(1+\theta^2)\).
\(\gamma_1=-\theta \sigma^2\) and \(\gamma_k=0 \: \: for \: all \: k>1\). Hence, \(ACF\) of \(MA(1)\) is then, \(\rho_1=\frac{-\theta}{1+\theta^2}\) and \(\rho_k=0\) for all \(k > 1\)
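A quick numerical check with ARMAacf() (the value of \(\theta\) is only illustrative). Note that R's ARMAacf()/arima.sim() use the \(X_t=a_t+\theta a_{t-1}\) sign convention, so the text's \(\theta\) enters with a minus sign:
theta <- 0.6
ARMAacf(ma = -theta, lag.max = 5)  # rho_1 = -0.4412..., rho_k = 0 for k > 1
-theta / (1 + theta^2)             # matches rho_1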
Here the correlation between two consecutive observations is not zero, but there is no memory at higher lags.
The \(MA(1)\) process can also be written in \(AR(\infty)\) form (this is the invertibility property):
\(a_t=(1-\theta B)^{-1} X_t=(1+\theta B + \theta^2 B^2+\cdots)X_t\)
If \(|\theta| \geq 1\) the process is not invertible and this representation does not converge
Similarly, for \(AR(1)\) recall that the stationarity condition is \(|\phi|<1\); such a process is also called causal, meaning that an \(MA(\infty)\) representation exists
\(X_t=\phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} +a_t\) or,
\(X_t- \phi_1 X_{t-1} - \phi_2 X_{t-2} - \cdots - \phi_p X_{t-p} = a_t\)
\((1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)X_t = \Phi(B)X_t =a_t\)
The function
\(\Phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p\)
is called the characteristic polynomial (setting it equal to zero gives the characteristic equation).
Similarly, the \(MA(q)\) process is
\(X_t=a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}\)
or, in backshift notation,
\(X_t=(1-\theta_1 B - \theta_2 B^2 - \cdots -\theta_q B^q) a_t = \Theta(B) a_t\)
Examples
\(X_t = -0.5 X_{t-1} + a_t\)
\(X_t = 0.5 X_{t-1} - 0.5 X_{t-2} + a_t\)
\(X_t = 0.2 X_{t-1} - 0.4 X_{t-2} + 0.6 X_{t-3} + a_t\)
\(X_t = 0.5 X_{t-1} + 0.4 X_{t-2} - 0.6 X_{t-3} + 0.4 X_{t-4} + a_t\)
\(X_t = a_t - a_{t-1}\)
\(X_t = a_t - a_{t-1} + a_{t-2}\)
\(X_t = a_t - a_{t-1} - a_{t-2} - a_{t-3}\)
\(X_t = a_t - 0.25a_{t-1} - 0.25a_{t-2} - 0.25a_{t-3} - 0.25a_{t-4}\)
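A hedged sketch of how two of these example processes could be simulated with arima.sim() (seed and length are illustrative; with R's sign convention the minus signs of the MA coefficients carry over directly):
set.seed(42)
ar2 <- arima.sim(list(ar = c(0.5, -0.5)), n = 500)                   # X_t = 0.5X_{t-1} - 0.5X_{t-2} + a_t
ma4 <- arima.sim(list(ma = c(-0.25, -0.25, -0.25, -0.25)), n = 500)  # X_t = a_t - 0.25a_{t-1} - ... - 0.25a_{t-4}
acf(ar2); acf(ma4)  # sample ACFs: gradual decay vs. cut-off after lag 4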
In general, a stationary time series will have no predictable patterns in the long-term. Time plots will show the series to be roughly horizontal (although some cyclic behaviour is possible), with constant variance
The stationarity condition required for convergence of the weights is that, writing the characteristic polynomial of the \(AR(p)\) process in factored form,
\[\Phi(B)=(1-g_1 B)(1-g_2 B) \cdots (1-g_p B)\]
the quantities \(g_i\) (the inverse roots of \(\Phi(B)=0\)) must lie inside the unit circle, \(|g_i| <1\) for all \(i=1,2,\ldots,p\); equivalently, the roots of \(\Phi(B)=0\) must lie outside the unit circle.
The \(ACF\) then has the general form
\[\rho_k=A_1 g_1^k + A_2 g_2^k + \cdots + A_p g_p^k\]
and \(|g_i|<1\) means that the \(ACF\) is composed of damped exponentials (real roots) and/or damped sine waves (complex roots)
Example: \(AR(2)\)
\((1 - \phi_1 B - \phi_2 B^2)X_t = \Phi(B)X_t =a_t \implies \Phi(B)=1 - \phi_1 B - \phi_2 B^2\)
The inverse roots are \(g_1,g_2=\frac{\phi_1 \pm \sqrt{\phi_1^2+4 \phi_2}}{2}\)
with \(|g_1|<1\) and \(|g_2|<1\) for stationarity. Then the following should hold:
\(\phi_1+\phi_2<1\), \(-\phi_1+\phi_2<1\), \(-1<\phi_2<1\)
The roots are complex iff \(\phi_1^2+4 \phi_2<0\)
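A minimal sketch of this check in R using polyroot() (the coefficients are only illustrative):
phi1 <- 0.5; phi2 <- -0.8
g <- 1 / polyroot(c(1, -phi1, -phi2))  # inverse roots g_1, g_2 of Phi(B) = 1 - phi1*B - phi2*B^2
Mod(g)                                 # both < 1, so the process is stationary
phi1^2 + 4 * phi2 < 0                  # TRUE: complex roots, ACF is a damped sine wave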
Since the \(ACF\)s of \(AR(p)\) processes damp out (either exponentially or as sine waves), it is difficult to distinguish the order of the process from the \(ACF\) alone
The \(k^{th}\) partial autocorrelation (\(PACF\)) is the coefficient \(\phi_{k,k}\) in the regression:
\(X_t=\phi_{k,1} X_{t-1}+\phi_{k,2} X_{t-2}+\cdots+\phi_{k,k} X_{t-k}+e_t\)
It measures the additional correlation between \(X_t\) and \(X_{t-k}\), after adjustments have been made for the intervening lags
For \(AR(p)\), the \(PACF\) is then zero for lags larger than the order of the process, \(p\).
For the \(MA(q)\) process \(X_t=\Theta(B)a_t\), the \(ACF\) is
\(\rho_k=\frac{-\theta_k+\theta_1 \theta_{k+1}+\cdots+\theta_{q-k} \theta_{q}}{1+\theta_1^2+\cdots+\theta_q^2}\)
for \(k=1,2,\ldots,q\), and \(\rho_k=0\) for \(k>q\).
The memory of the process extends \(q\) periods; in simple terms, the \(ACF\) is zero after the \(q^{th}\) lag.
Recall that every invertible \(MA(q)\) process can be represented as \(AR(\infty)\), and hence the \(PACF\) of an \(MA(q)\) process is infinite in extent. Again it exhibits a mixture of exponential decays and/or damped sine waves
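These patterns can be inspected with ARMAacf(..., pacf = TRUE); a small sketch with illustrative coefficients:
ARMAacf(ar = 0.7, lag.max = 8, pacf = TRUE)   # AR(1): PACF cuts off after lag 1
ARMAacf(ma = -0.6, lag.max = 8, pacf = TRUE)  # MA(1): PACF tails off, never exactly zero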
AR(p) Examples
\(AR(p), p=1,2,3,4\)
The \(ARMA(1,1)\) model combines the autoregressive and moving average parts:
\(X_t-\phi X_{t-1}=a_t - \theta a_{t-1}\)
or
\((1-\phi B) X_t =(1-\theta B) a_t\)
\(MA(\infty)\) representation,
\(X_t=\frac{1-\theta B}{1-\phi B}a_t\)
Similarly \(AR(\infty)\) representation,
\(\frac{1-\phi B}{1-\theta B}X_t=a_t\)
\(ARMA(1,1)\) thus has both autoregressive and moving average representations with an infinite number of lags. Hence, its \(ACF\) behaves similarly (but not identically) to that of an \(AR\) process and its \(PACF\) similarly (again not identically) to that of an \(MA\) process
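A short numerical illustration of this (coefficients are illustrative; with R's sign convention the text's \(\theta=0.4\) enters as ma = -0.4):
ARMAtoMA(ar = 0.7, ma = -0.4, lag.max = 8)              # psi-weights of the MA(infinity) form
ARMAacf(ar = 0.7, ma = -0.4, lag.max = 8)               # ACF tails off
ARMAacf(ar = 0.7, ma = -0.4, lag.max = 8, pacf = TRUE)  # PACF also tails off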
Similarly, \(ARMA(p,q)\) is then
\(X_t-\phi_1 X_{t-1}-\phi_2 X_{t-2}-\cdots-\phi_p X_{t-p}=a_t - \theta_1 a_{t-1}-\theta_2 a_{t-2}-\cdots-\theta_q a_{t-q}\)
\((1-\phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) X_t =(1-\theta_1 B -\theta_2 B^2 - \cdots -\theta_q B^q) a_t\)
\(\Phi(B) X_t =\Theta(B) a_t\)
\(AR(\infty)\) representation
\(\frac{\Phi(B)}{\Theta(B)}X_t=a_t\)
\(MA(\infty)\) representation
\(X_t=\frac{\Theta(B)}{\Phi(B)}a_t\)
subject to both the stationarity condition and the invertibility condition
Theoretical ACF and PACF
$roots
[1] 0.7
$acf
0 1 2 3 4 5
1.0000000000 0.8307692308 0.5815384615 0.4070769231 0.2849538462 0.1994676923
6 7 8 9 10 11
0.1396273846 0.0977391692 0.0684174185 0.0478921929 0.0335245350 0.0234671745
12 13 14 15 16 17
0.0164270222 0.0114989155 0.0080492409 0.0056344686 0.0039441280 0.0027608896
18 19 20
0.0019326227 0.0013528359 0.0009469851
$periodicity
[1] damping period
<0 rows> (or 0-length row.names)
$roots
[1] 0.7
$pacf
[1] 8.307692e-01 -3.506494e-01 1.687500e-01 -8.359133e-02 4.169884e-02
[6] -2.083735e-02 1.041717e-02 -5.208396e-03 2.604175e-03 -1.302084e-03
[11] 6.510418e-04 -3.255208e-04 1.627604e-04 -8.138021e-05 4.069010e-05
[16] -2.034505e-05 1.017253e-05 -5.086263e-06 2.543132e-06 -1.271566e-06
$periodicity
[1] damping period
<0 rows> (or 0-length row.names)
$roots
[1] -0.7
$acf
0 1 2 3 4
1.0000000000 -0.2363636364 0.1654545455 -0.1158181818 0.0810727273
5 6 7 8 9
-0.0567509091 0.0397256364 -0.0278079455 0.0194655618 -0.0136258933
10 11 12 13 14
0.0095381253 -0.0066766877 0.0046736814 -0.0032715770 0.0022901039
15 16 17 18 19
-0.0016030727 0.0011221509 -0.0007855056 0.0005498539 -0.0003848978
20
0.0002694284
$periodicity
[1] damping period
<0 rows> (or 0-length row.names)
$roots
[1] -0.7
$pacf
[1] -2.363636e-01 1.160714e-01 -5.777778e-02 2.885683e-02 -1.442441e-02
[6] 7.211705e-03 -3.605790e-03 1.802887e-03 -9.014426e-04 4.507212e-04
[11] -2.253606e-04 1.126803e-04 -5.634014e-05 2.817007e-05 -1.408504e-05
[16] 7.042518e-06 -3.521259e-06 1.760630e-06 -8.803148e-07 4.401574e-07
$periodicity
[1] damping period
<0 rows> (or 0-length row.names)
$roots
[1] 0.7
$acf
0 1 2 3 4 5
1.0000000000 0.2363636364 0.1654545455 0.1158181818 0.0810727273 0.0567509091
6 7 8 9 10 11
0.0397256364 0.0278079455 0.0194655618 0.0136258933 0.0095381253 0.0066766877
12 13 14 15 16 17
0.0046736814 0.0032715770 0.0022901039 0.0016030727 0.0011221509 0.0007855056
18 19 20
0.0005498539 0.0003848978 0.0002694284
$periodicity
[1] damping period
<0 rows> (or 0-length row.names)
$roots
[1] 0.7
$pacf
[1] 2.363636e-01 1.160714e-01 5.777778e-02 2.885683e-02 1.442441e-02
[6] 7.211705e-03 3.605790e-03 1.802887e-03 9.014426e-04 4.507212e-04
[11] 2.253606e-04 1.126803e-04 5.634014e-05 2.817007e-05 1.408504e-05
[16] 7.042518e-06 3.521259e-06 1.760630e-06 8.803148e-07 4.401574e-07
$periodicity
[1] damping period
<0 rows> (or 0-length row.names)
$roots
[1] -0.7
$acf
0 1 2 3 4
1.0000000000 -0.8307692308 0.5815384615 -0.4070769231 0.2849538462
5 6 7 8 9
-0.1994676923 0.1396273846 -0.0977391692 0.0684174185 -0.0478921929
10 11 12 13 14
0.0335245350 -0.0234671745 0.0164270222 -0.0114989155 0.0080492409
15 16 17 18 19
-0.0056344686 0.0039441280 -0.0027608896 0.0019326227 -0.0013528359
20
0.0009469851
$periodicity
[1] damping period
<0 rows> (or 0-length row.names)
$roots
[1] -0.7
$pacf
[1] -8.307692e-01 -3.506494e-01 -1.687500e-01 -8.359133e-02 -4.169884e-02
[6] -2.083735e-02 -1.041717e-02 -5.208396e-03 -2.604175e-03 -1.302084e-03
[11] -6.510418e-04 -3.255208e-04 -1.627604e-04 -8.138021e-05 -4.069010e-05
[16] -2.034505e-05 -1.017253e-05 -5.086263e-06 -2.543132e-06 -1.271566e-06
$periodicity
[1] damping period
<0 rows> (or 0-length row.names)
Here are examples of \(AR(p)\), \(p=1,2,3,4\), processes and the roots of their characteristic equations.
\(AR(p)\) models can be estimated by least squares estimation (LSE).
For example, the \(AR(1)\) model \(X_t=\phi X_{t-1} + a_t\) is just a standard regression (with or without an intercept), so the estimation can be achieved by lm(xt ~ -1 + xt1). The proper steps may be (assume that \(X_t\) is a ts object, denoted as xt in \(R\)):
xt1 <- lag(xt, -1)
data1 <- ts.union(xt, xt1)
with intercept: lm(xt ~ xt1, data=data1)
without intercept: lm(xt ~ 0 + xt1, data=data1)
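Putting the steps together, a self-contained sketch (the simulated series is only for illustration; stats::lag is written explicitly in case another package masks lag()):
set.seed(123)
xt <- arima.sim(list(ar = 0.7), n = 200)   # simulated AR(1), a ts object
xt1 <- stats::lag(xt, -1)                  # X_{t-1}
data1 <- ts.union(xt, xt1)                 # align X_t and X_{t-1}
fit <- lm(xt ~ 0 + xt1, data = data1)      # LSE without intercept
coef(fit)                                  # should be close to 0.7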
In \(R\) there are other functions we can use. For example, ar() and arima() are available in the base distribution; Arima() and auto.arima() are available in the forecast package; AR(), MA() and ARIMA() are available in the fable package. The best way is to use these functions instead of the lm() function.
Example with \(\phi=0.7\) and \(\phi=-0.9\)
# AR Simulation
library(forecast)   # for ggtsdisplay()
library(ggplot2)    # for theme_bw()
set.seed(12345)
x <- rnorm(1000)                                            # common innovations a_t
sim1 <- arima.sim(list(ar=c(0.7)), n=1000, innov=x)         # phi =  0.7
sim2 <- arima.sim(list(ar=c(-0.9)), n=1000, innov=x)        # phi = -0.9
ggtsdisplay(sim1, main="Simulated AR(1), 0.7", theme = theme_bw())
Call:
ar(x = sim1)
Coefficients:
1
0.7096
Order selected 1 sigma^2 estimated as 0.9985
Call:
arima(x = sim1, order = c(1, 0, 0))
Coefficients:
ar1 intercept
0.7091 0.1503
s.e. 0.0222 0.1082
sigma^2 estimated as 0.9961: log likelihood = -1417.32, aic = 2840.64
Series: sim1
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 mean
0.7091 0.1503
s.e. 0.0222 0.1082
sigma^2 = 0.9981: log likelihood = -1417.32
AIC=2840.64 AICc=2840.66 BIC=2855.36
Call:
ar(x = sim2)
Coefficients:
1
-0.8702
Order selected 1 sigma^2 estimated as 1.008
Call:
arima(x = sim2, order = c(1, 0, 0))
Coefficients:
ar1 intercept
-0.8725 0.0242
s.e. 0.0155 0.0169
sigma^2 estimated as 0.9948: log likelihood = -1417.04, aic = 2840.08
Series: sim2
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 mean
-0.8725 0.0242
s.e. 0.0155 0.0169
sigma^2 = 0.9968: log likelihood = -1417.04
AIC=2840.08 AICc=2840.1 BIC=2854.8
In general, \(ARMA(p,q)\) parameter estimation is performed with Maximum Likelihood Estimation (MLE)
It requires a function to be maximized (the log-likelihood function), initial parameter values and an optimization algorithm.
For example, for the \(MA(1)\) process \(X_t=a_t - \theta a_{t-1}\), the innovations \(a_t\) are not observed; one can set the initial value to zero, \(a_0=0\) (conditional maximum likelihood). Then
find \(\theta\) so that the log-likelihood function \(L(\theta \mid X_t, a_0=0)\) is maximized. To do so, the \(a_t\) are first computed recursively for a given \(\theta\) (starting from \(a_0=0\), or from \(a_T=0\) when working backwards from the end of the sample)
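A minimal sketch of this conditional approach for \(MA(1)\), written as conditional least squares (minimising the sum of squared \(a_t\), which maximises the Gaussian conditional likelihood); the simulated data, the starting value \(a_0=0\) and the search interval are illustrative:
set.seed(1)
x <- arima.sim(list(ma = -0.5), n = 300)  # MA(1) with theta = 0.5 in the text's sign convention
css <- function(theta, x) {               # conditional sum of squares given a_0 = 0
  a <- numeric(length(x))
  a[1] <- x[1]                            # a_1 = X_1 + theta * a_0 = X_1
  for (t in 2:length(x)) a[t] <- x[t] + theta * a[t - 1]  # a_t = X_t + theta * a_{t-1}
  sum(a^2)
}
optimize(css, interval = c(-0.95, 0.95), x = x)$minimum   # estimate of theta, close to 0.5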
The forecast, feasts and fable packages (and some others that you may find) offer tools that are valuable for fitting and assessing time series processes.
Series: sim1
ARIMA(0,1,2)
Coefficients:
ma1 ma2
-0.2449 -0.2318
s.e. 0.0335 0.0406
sigma^2 = 1.094: log likelihood = -1461.73
AIC=2929.46 AICc=2929.48 BIC=2944.18
acf2AR(): stats package
ARMAacf(): stats package
ARMAtoMA(): stats package
ar(): stats package
tseries::arma(): tseries package; its intercept parameterization is different from the arima() function
StructTS(): stats package, see its help page
forecast::auto.arima(): forecast package