MIS 306: AR(p), MA(q) & ARMA(p,q)

I. Ozkan

Spring 2025

Linear Stochastic Stationary Time Series Models

Probably the simplest model is the white noise model: white noise plus some constant \(c\).

\[X_t = c + a_t\]

where \(a_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)\), i.e. \(a_t\) is a purely random sequence of normally distributed numbers with \(Cov(a_t,a_{t-k})=0\) for all \(k \neq 0\).

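As a quick sketch, this model can be simulated in R (the values \(c=5\) and \(\sigma=2\) are arbitrary illustration choices):

```r
# White noise plus a constant: X_t = c + a_t, a_t i.i.d. N(0, sigma^2)
# (c = 5 and sigma = 2 are arbitrary illustration values)
set.seed(123)
c0    <- 5
sigma <- 2
a  <- rnorm(200, mean = 0, sd = sigma)
xt <- c0 + a
mean(xt)   # close to c
var(xt)    # close to sigma^2
```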

AR(1) Model

Stationary stochastic AR(1) model is the simplest AutoRegressive model. It is given as:

\(X_t=\phi X_{t-1} + a_t\)

where \(|\phi| < 1\) and \(a_t\) is white noise. Substituting repeatedly for the lagged values:

\(X_t=\phi X_{t-1} + a_t=\phi (\phi X_{t-2}+a_{t-1}) + a_t=\phi^2X_{t-2}+\phi a_{t-1}+a_t\)

\(=\phi^2(\phi X_{t-3}+a_{t-2}) +\phi a_{t-1}+a_t=\phi^3X_{t-3}+\phi^2 a_{t-2}+\phi a_{t-1}+ a_t\)
or

\(\cdots=(1+\phi B+\phi^2 B^2+\cdots)a_t\)

Or mathematically this model can be written as

\((1-\phi B)X_t=a_t\)

where \(BX_t \equiv X_{t-1}\)

Let’s re-write the above equation as:

\(X_t=(1-\phi B)^{-1}a_t=\frac{a_t}{(1-\phi B)}=(1+\phi B+\phi^2 B^2+\cdots)a_t\)

This is not difficult to prove (hint: use the Taylor series of \(f(x)=(1-x)^{-1}\) around \(x=0\)).
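Sketching that proof: since \(f^{(j)}(x)=j!\,(1-x)^{-(j+1)}\), we have \(f^{(j)}(0)=j!\), so the Taylor series around \(x=0\) is

```latex
f(x)=\frac{1}{1-x}=\sum_{j=0}^{\infty}\frac{f^{(j)}(0)}{j!}x^j=\sum_{j=0}^{\infty}x^j=1+x+x^2+\cdots, \qquad |x|<1
```

Substituting \(x=\phi B\) gives the expansion used above.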

The expected value of \(X_t\) then,

\(E[X_t]=E[\phi X_{t-1}+ a_t]=E[(1+\phi B+\phi^2 B^2+\cdots)a_t]=0\)

For the variance of \(X_t\),

\(V(X_t)=V(\phi X_{t-1}+ a_t)=V(a_t+\phi a_{t-1}+\phi^2 a_{t-2}+\cdots)=\sigma^2+\phi^2\sigma^2+\phi^4\sigma^2+\cdots\)

This sum converges to a finite value if and only if, (iff), \(|\phi|<1\).

Assuming that \(V(X_t)=V(X_{t-k})\) for all \(k\) and \(Cov(X_{t-k},a_t)=0\) for \(k\geq 1\), then,

\(V(X_t)=V(\phi X_{t-1}+ a_t)=V(\phi X_{t-1}) + V(a_t)\implies V(X_t)=\frac{\sigma^2}{1-\phi^2}\)

This means that if \(|\phi|=1\), then \(V(X_t)\) explodes to infinity. The model with \(\phi=1\), i.e. \(X_t=X_{t-1}+a_t\), is called a Random Walk
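A simulation sketch (assuming \(\phi=0.7\) and \(\sigma=1\)) checking \(V(X_t)=\sigma^2/(1-\phi^2)\):

```r
# Check V(X_t) = sigma^2 / (1 - phi^2) by simulating a long AR(1) path
set.seed(42)
phi <- 0.7
x   <- arima.sim(model = list(ar = phi), n = 10000)  # innovations ~ N(0, 1)
var(x)           # sample variance
1 / (1 - phi^2)  # theoretical value, about 1.96
```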

Example:

Random Walk

The random walk is a special process used in many contexts in finance and economics. The model:

\(X_t=X_{t-1}+a_t\)

Then,

\(E[X_t|X_{t-1}, X_{t-2},\cdots]=X_{t-1}+E[a_t]=X_{t-1}\)

Adding a constant \(\delta\) gives the Random Walk with Drift:

\(X_t=\delta+X_{t-1}+a_t\)

Simulated Series

500 simulated series of Random Walk (\(a_t \sim N(0,1)\)) and Random Walk with Drift (\(\delta=0.2, \; a_t \sim N(0,1)\))
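A minimal sketch of those two simulations (one path each of length 500, with the stated \(\delta=0.2\)):

```r
# Random walk: X_t = X_{t-1} + a_t, and random walk with drift delta = 0.2
set.seed(1)
n   <- 500
a   <- rnorm(n)            # a_t ~ N(0, 1)
rw  <- ts(cumsum(a))       # cumulative sum of innovations
rwd <- ts(cumsum(0.2 + a)) # same innovations plus drift
# ts.plot(rw, rwd, col = 1:2)  # the drifting path trends upward
```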

Stationary Series

Stationarity means that sample moments computed from finite stretches of the realisation approach their population counterparts as the length of the realisation becomes infinite.

A stochastic process is said to be strictly stationary if its properties are unaffected by a change of time origin (T. Mills, Time Series Techniques for Economists, p11).

For any \(E[X_t^2] < \infty\), the joint probability distribution satisfies

\(P(X_{t_1},X_{t_2},\cdots,X_{t_m}) = P(X_{t_{k+1}},X_{t_{k+2}},\cdots,X_{t_{k+m}})\)

Here are what we assume in simple terms, for \(m=1\):

\(E[X_1]=E[X_2]=E[X_3] \cdots E[X_t]=\mu\)

\(V(X_1)=V(X_2)=V(X_3) \cdots V(X_t)=\sigma_x^2\)

for \(m=2\), the series must exhibit covariances that are constant over time to be strictly stationary: \(Cov(X_t,X_{t-k})\) should depend only on the lag \(k\), not on \(t\), for all \(k \neq 0\)

As these conditions apply only to the first and second moments, this is known as weak stationarity (or covariance stationarity). The covariance above is known as the autocovariance and can be written as:

\(\gamma_k=Cov(X_t,X_{t-k})=E[(X_t-\mu)(X_{t-k}-\mu)]=Cov(X_{t-k},X_t)=\gamma_{-k}\)

with \(\gamma_0=Var(X_t)\) and hence the Autocorrelation (ACF) function at lag \(k\),

\(\rho_k=\frac{Cov(X_t,X_{t-k})}{[V(X_t) \times V(X_{t-k})]^{0.5}} = \frac{\gamma_k}{\gamma_0}=\frac{\gamma_{-k}}{\gamma_0}=\rho_{-k}\)

Wold Decomposition

\(X_t-\mu=a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots=\displaystyle\sum_{j=0}^{j=\infty}\psi_j a_{t-j} \: \: \: \: with \: \: \: \psi_0=1\)

\(a_t\) is defined before as white noise and hence,

\(E(a_t)=0\), \(V(a_t)=E(a_t^2)=\sigma^2<\infty\) and \(cov(a_t,a_{t-k})=0 \: \: for \: all \: \: k\neq 0\), or for the sake of simplicity let’s repeat \(a_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)\)

Autocorrelation Function (ACF)

Recall that \(V(X_t)=E[(X_t-\mu)^2]\), and the Wold decomposition allows us to rewrite the variance as:

\(V(X_t)=E[(a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots)^2]\)

\(\implies E[(a_t)^2] + \psi_1^2 E[(a_{t-1})^2] + \psi_2^2 E[(a_{t-2})^2] + \cdots=\sigma^2+\psi_1^2 \sigma^2+\psi_2^2 \sigma^2+ \cdots\)

since \(E(a_t a_{t-k})=0 \: \: for \: k \neq 0\)

Then, \(V(X_t)=\gamma_0=\sigma^2 \displaystyle\sum_{j=0}^{j=\infty}\psi_j^2\)

Similarly, \(Cov(X_t,X_{t-k})=E[(X_t-\mu)(X_{t-k}-\mu)]\), then,

\(E[(a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots+\psi_k a_{t-k}+\cdots)(a_{t-k}+\psi_1 a_{t-k-1}+\psi_2 a_{t-k-2}+\cdots)]\)

\(\gamma_k=\sigma^2(\psi_k+\psi_1 \psi_{k+1}+\psi_2 \psi_{k+2}+\cdots)=\sigma^2 \displaystyle\sum_{j=0}^{j=\infty}\psi_j \psi_{j+k}\) (recall \(\psi_0=1\))

Hence, Autocorrelation at lag \(k\)

\(\rho_k=\frac{Cov(X_t,X_{t-k})}{[V(X_t) \times V(X_{t-k})]^{0.5}} = \frac{\gamma_k}{\gamma_0}=\frac{\displaystyle\sum_{j=0}^{j=\infty}\psi_j \psi_{j+k}}{\displaystyle\sum_{j=0}^{j=\infty}\psi_j^2}\)

For these sums to converge, it is required that

\(\displaystyle\sum_{j=0}^{j=\infty}|\psi_j|<\infty\)
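These quantities can be checked numerically with base R's ARMAtoMA(), which returns \(\psi_1, \psi_2, \ldots\) (here for an AR(1) with \(\phi=0.7\) and \(\sigma^2=1\), chosen for illustration):

```r
# psi-weights of an AR(1) with phi = 0.7; for AR(1), psi_j = phi^j
phi <- 0.7
psi <- ARMAtoMA(ar = phi, lag.max = 50)  # psi_1, psi_2, ... (psi_0 = 1 is implicit)
head(psi, 3)                             # 0.7, 0.49, 0.343
gamma0 <- 1 * (1 + sum(psi^2))           # sigma^2 * sum of psi_j^2, j = 0, 1, ...
gamma0                                   # about 1 / (1 - phi^2) = 1.9608
```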

ACF of AR(1)

For AR(1) model, recall that,

\(X_t - \phi X_{t-1}=a_t\)

and

\(X_t=(1-\phi B)^{-1}a_t=\frac{a_t}{(1-\phi B)}=(1+\phi B+\phi^2 B^2+\cdots)a_t\)

hence, \(\psi_j=\phi^j\).

If we multiply both sides of the \(AR(1)\) model by \(X_{t-k}\), \(X_t X_{t-k}-\phi X_{t-1}X_{t-k}=a_t X_{t-k}\), and take expectations of both sides (noting \(E(a_t X_{t-k})=0\) for \(k>0\)):

\(E(X_t X_{t-k})-\phi E(X_{t-1}X_{t-k})=\gamma_k - \phi \gamma_{k-1} =E(a_t X_{t-k})=0\)

\(\implies \gamma_k = \phi \gamma_{k-1} \implies \gamma_k = \phi^k \gamma_0\)

Hence the \(ACF\) of the \(AR(1)\) process is \(\rho_k=\phi^k\)

Since \(|\phi|<1\) for a stationary process, here are two examples of \(\rho_k\), for negative and positive values of \(\phi\):
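The claim \(\rho_k=\phi^k\) can be verified with base R's ARMAacf():

```r
# Theoretical ACF of AR(1): rho_k = phi^k
ARMAacf(ar = 0.7, lag.max = 5)    # 1, 0.7, 0.49, ... (monotone decay)
ARMAacf(ar = -0.7, lag.max = 5)   # 1, -0.7, 0.49, ... (alternating signs)
```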

AR(1) Simulations

Here are four \(AR(1)\) models simulated with \(a_t \stackrel{i.i.d.}{\sim} N(0,2^2)\):

\(X_t = 0.5 X_{t-1} + a_t\)

\(X_t = 0.9 X_{t-1} + a_t\)

\(X_t = -0.5 X_{t-1} + a_t\)

\(X_t = -0.9 X_{t-1} + a_t\)



Moving Average Processes

The simplest moving average process is the first-order moving average, \(MA(1)\), process. It is given as:

\(X_t=a_t - \theta a_{t-1}=(1-\theta B)a_t\)

In this case, \(V(X_t)=\gamma_0=V(a_t - \theta a_{t-1})=\sigma^2+\theta^2 \sigma^2=\sigma^2(1+\theta^2)\).

\(\gamma_1=-\theta \sigma^2\) and \(\gamma_k=0 \: \: for \: all \: k>1\). Hence, \(ACF\) of \(MA(1)\) is then, \(\rho_1=\frac{-\theta}{1+\theta^2}\) and \(\rho_k=0\) for all \(k > 1\)

Here the correlation between two consecutive observations is not zero, but there is no memory at higher lags.
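The \(MA(1)\) cutoff can be checked with ARMAacf(); note R's sign convention (it models \(X_t = a_t + \theta_R a_{t-1}\), so for the text's \(X_t = a_t - \theta a_{t-1}\) we pass ma = -theta):

```r
# Theoretical ACF of MA(1) with theta = 0.5 (illustration value).
# R's convention is X_t = a_t + theta_R a_{t-1}, so pass ma = -theta.
theta <- 0.5
ARMAacf(ma = -theta, lag.max = 4)  # lag 1: -0.4; lags 2-4: exactly 0
-theta / (1 + theta^2)             # -0.4
```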

Causal and Invertible Process

An \(MA(1)\) process is invertible when \(|\theta|<1\); it then has an \(AR(\infty)\) representation:

\(a_t=(1-\theta B)^{-1} X_t=(1+\theta B + \theta^2 B^2+\cdots)X_t\)

AR(p) and MA(q) Processes

\(X_t=\phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} +a_t\) or,

\(X_t- \phi_1 X_{t-1} - \phi_2 X_{t-2} - \cdots - \phi_p X_{t-p} = a_t\)

\((1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)X_t = \Phi(B)X_t =a_t\)

The function

\(\Phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p\)

is called the characteristic polynomial (and \(\Phi(B)=0\) the characteristic equation).

The \(MA(q)\) process is

\(X_t=a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}\)

or, in operator form,

\(X_t=(1-\theta_1 B - \theta_2 B^2 - \cdots -\theta_q B^q) a_t = \Theta(B) a_t\)

AR(p) Examples

Examples

\(X_t = -0.5 X_{t-1} + a_t\)

\(X_t = 0.5 X_{t-1} - 0.5 X_{t-2} + a_t\)

\(X_t = 0.2 X_{t-1} - 0.4 X_{t-2} + 0.6 X_{t-3} + a_t\)

\(X_t = 0.5 X_{t-1} + 0.4 X_{t-2} - 0.6 X_{t-3} + 0.4 X_{t-4} + a_t\)
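Stationarity of such examples can be checked from the roots of \(\Phi(B)\) with polyroot(); check_stationary below is a small helper written for this sketch, not a base R function:

```r
# Roots of Phi(B) = 1 - phi_1 B - ... - phi_p B^p must lie OUTSIDE
# the unit circle for stationarity (equivalently, inverse roots inside it)
check_stationary <- function(phi) {
  roots <- polyroot(c(1, -phi))  # polyroot wants coefficients in increasing order
  all(Mod(roots) > 1)
}
check_stationary(-0.5)                    # TRUE
check_stationary(c(0.5, -0.5))            # TRUE
check_stationary(c(0.5, 0.4, -0.6, 0.4))  # check the fourth example
```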



MA(q) Examples

\(X_t = a_t - a_{t-1}\)

\(X_t = a_t - a_{t-1} + a_{t-2}\)

\(X_t = a_t - a_{t-1} - a_{t-2} - a_{t-3}\)

\(X_t = a_t - 0.25a_{t-1} - 0.25a_{t-2} - 0.25a_{t-3} - 0.25a_{t-4}\)



Stationary Time Series Revisited

\[\Phi(B)=(1-g_1 B)(1-g_2 B) \cdots (1-g_p B)\] For stationarity, the inverse roots \(g_i\) must lie within the unit disk (inside the unit circle): \(|g_i| <1\) for all \(i=1,2,\ldots,p\)

\[\rho_k=A_1 g_1^k + A_2 g_2^k + \cdots + A_p g_p^k\]

and \(|g_i|<1\) means that the \(ACF\) is composed of damped exponentials (real roots) or damped sine waves (complex roots)

Example: \(AR(2)\)

\((1 - \phi_1 B - \phi_2 B^2)X_t = \Phi(B)X_t =a_t \implies \Phi(B)=1 - \phi_1 B - \phi_2 B^2\)

\(\Phi(B)=0 \implies g_1,g_2=\frac{\phi_1 \pm \sqrt{\phi_1^2+4 \phi_2}}{2}\)

with \(|g_1|<1\) and \(|g_2|<1\) for stationarity. Then the following must hold:

\(\phi_1+\phi_2<1\), \(-\phi_1+\phi_2<1\), \(-1<\phi_2<1\)

The roots are complex iff \(\phi_1^2+4 \phi_2<0\)
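A numerical sketch for one complex-root case (\(\phi_1=0.5\), \(\phi_2=-0.8\), chosen so that \(\phi_1^2+4\phi_2<0\)):

```r
# AR(2) inverse roots: g = (phi_1 +/- sqrt(phi_1^2 + 4 phi_2)) / 2
phi1 <- 0.5; phi2 <- -0.8
disc <- phi1^2 + 4 * phi2   # -2.95 < 0: complex roots, damped sine-wave ACF
g <- (phi1 + c(1, -1) * sqrt(as.complex(disc))) / 2
Mod(g)                      # both moduli equal sqrt(-phi2), about 0.894 < 1: stationary
```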

Partial AutoCorrelation Function (PACF)

The lag-\(k\) partial autocorrelation, \(\phi_{k,k}\), is the coefficient of \(X_{t-k}\) in the regression

\(X_t=\phi_{k,1} X_{t-1}+\phi_{k,2} X_{t-2}+\cdots+\phi_{k,k} X_{t-k}+e_t\)

For comparison, recall the \(ACF\) of the \(MA(q)\) process:

\(\rho_k=\frac{-\theta_k+\theta_1 \theta_{k+1}+\cdots+\theta_{q-k} \theta_{q}}{1+\theta_1^2+\cdots+\theta_q^2}\)

for \(k=1,2,\ldots,q\), and \(\rho_k=0\) for \(k>q\).
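ARMAacf(..., pacf = TRUE) gives the theoretical PACF; the AR(1) PACF cuts off after lag 1, while the MA(1) PACF decays gradually:

```r
# PACF: AR(p) cuts off after lag p; MA(q) decays without a cutoff
ARMAacf(ar = 0.7, lag.max = 4, pacf = TRUE)   # 0.7, 0, 0, 0
ARMAacf(ma = -0.5, lag.max = 4, pacf = TRUE)  # decays, no cutoff
```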

ACF and PACF of AR(p) and MA(q)

AR(p) Examples

\(AR(p), p=1,2,3,4\)

ARMA(p,q) Process

\(X_t-\phi X_{t-1}=a_t - \theta a_{t-1}\)

or

\((1-\phi B) X_t =(1-\theta B) a_t\)

\(MA(\infty)\) representation,

\(X_t=\frac{1-\theta B}{1-\phi B}a_t\)

Similarly \(AR(\infty)\) representation,

\(\frac{1-\phi B}{1-\theta B}X_t=a_t\)

\(X_t-\phi_1 X_{t-1}-\phi_2 X_{t-2}-\cdots-\phi_p X_{t-p}=a_t - \theta_1 a_{t-1}-\theta_2 a_{t-2}-\cdots-\theta_q a_{t-q}\)

\((1-\phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) X_t =(1-\theta_1 B -\theta_2 B^2 - \cdots -\theta_q B^q) a_t\)

\(\Phi(B) X_t =\Theta(B) a_t\)

\(AR(\infty)\) representation

\(\frac{\Phi(B)}{\Theta(B)}X_t=a_t\)

\(MA(\infty)\) representation

\(X_t=\frac{\Theta(B)}{\Phi(B)}a_t\)

with both the stationarity condition and the invertibility condition
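For the ARMA(1,1) case, the \(MA(\infty)\) weights can be obtained with ARMAtoMA() (here \(\phi=0.7\), \(\theta=0.4\), chosen for illustration; ma = -theta again because of R's sign convention):

```r
# MA(infinity) weights of ARMA(1,1): psi_j = (phi - theta) * phi^(j-1)
phi <- 0.7; theta <- 0.4
psi <- ARMAtoMA(ar = phi, ma = -theta, lag.max = 5)
psi                         # 0.3, 0.21, 0.147, ...
(phi - theta) * phi^(0:4)   # same values
```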

ARMA(1,1) Simulation Examples

Theoretical ACF and PACF

Below is the printed output for four specifications: each block reports the AR root, the theoretical ACF or PACF values up to lag 20, and an (empty) periodicity table.

$roots
[1] 0.7

$acf
           0            1            2            3            4            5 
1.0000000000 0.8307692308 0.5815384615 0.4070769231 0.2849538462 0.1994676923 
           6            7            8            9           10           11 
0.1396273846 0.0977391692 0.0684174185 0.0478921929 0.0335245350 0.0234671745 
          12           13           14           15           16           17 
0.0164270222 0.0114989155 0.0080492409 0.0056344686 0.0039441280 0.0027608896 
          18           19           20 
0.0019326227 0.0013528359 0.0009469851 

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

$roots
[1] 0.7

$pacf
 [1]  8.307692e-01 -3.506494e-01  1.687500e-01 -8.359133e-02  4.169884e-02
 [6] -2.083735e-02  1.041717e-02 -5.208396e-03  2.604175e-03 -1.302084e-03
[11]  6.510418e-04 -3.255208e-04  1.627604e-04 -8.138021e-05  4.069010e-05
[16] -2.034505e-05  1.017253e-05 -5.086263e-06  2.543132e-06 -1.271566e-06

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)
$roots
[1] -0.7

$acf
            0             1             2             3             4 
 1.0000000000 -0.2363636364  0.1654545455 -0.1158181818  0.0810727273 
            5             6             7             8             9 
-0.0567509091  0.0397256364 -0.0278079455  0.0194655618 -0.0136258933 
           10            11            12            13            14 
 0.0095381253 -0.0066766877  0.0046736814 -0.0032715770  0.0022901039 
           15            16            17            18            19 
-0.0016030727  0.0011221509 -0.0007855056  0.0005498539 -0.0003848978 
           20 
 0.0002694284 

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

$roots
[1] -0.7

$pacf
 [1] -2.363636e-01  1.160714e-01 -5.777778e-02  2.885683e-02 -1.442441e-02
 [6]  7.211705e-03 -3.605790e-03  1.802887e-03 -9.014426e-04  4.507212e-04
[11] -2.253606e-04  1.126803e-04 -5.634014e-05  2.817007e-05 -1.408504e-05
[16]  7.042518e-06 -3.521259e-06  1.760630e-06 -8.803148e-07  4.401574e-07

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)
$roots
[1] 0.7

$acf
           0            1            2            3            4            5 
1.0000000000 0.2363636364 0.1654545455 0.1158181818 0.0810727273 0.0567509091 
           6            7            8            9           10           11 
0.0397256364 0.0278079455 0.0194655618 0.0136258933 0.0095381253 0.0066766877 
          12           13           14           15           16           17 
0.0046736814 0.0032715770 0.0022901039 0.0016030727 0.0011221509 0.0007855056 
          18           19           20 
0.0005498539 0.0003848978 0.0002694284 

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

$roots
[1] 0.7

$pacf
 [1] 2.363636e-01 1.160714e-01 5.777778e-02 2.885683e-02 1.442441e-02
 [6] 7.211705e-03 3.605790e-03 1.802887e-03 9.014426e-04 4.507212e-04
[11] 2.253606e-04 1.126803e-04 5.634014e-05 2.817007e-05 1.408504e-05
[16] 7.042518e-06 3.521259e-06 1.760630e-06 8.803148e-07 4.401574e-07

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)
$roots
[1] -0.7

$acf
            0             1             2             3             4 
 1.0000000000 -0.8307692308  0.5815384615 -0.4070769231  0.2849538462 
            5             6             7             8             9 
-0.1994676923  0.1396273846 -0.0977391692  0.0684174185 -0.0478921929 
           10            11            12            13            14 
 0.0335245350 -0.0234671745  0.0164270222 -0.0114989155  0.0080492409 
           15            16            17            18            19 
-0.0056344686  0.0039441280 -0.0027608896  0.0019326227 -0.0013528359 
           20 
 0.0009469851 

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

$roots
[1] -0.7

$pacf
 [1] -8.307692e-01 -3.506494e-01 -1.687500e-01 -8.359133e-02 -4.169884e-02
 [6] -2.083735e-02 -1.041717e-02 -5.208396e-03 -2.604175e-03 -1.302084e-03
[11] -6.510418e-04 -3.255208e-04 -1.627604e-04 -8.138021e-05 -4.069010e-05
[16] -2.034505e-05 -1.017253e-05 -5.086263e-06 -2.543132e-06 -1.271566e-06

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

Roots of Characteristic Function

Here are the examples of \(AR(p), \; p=1,2,3,4\) processes and the roots of characteristic equation.

Estimation

\(X_t=\phi X_{t-1} + a_t\) is just a standard regression (with or without an intercept). The estimation can be achieved with lm(xt ~ -1+xt1). The proper steps may be (assume that \(X_t\) is a ts object, denoted xt in \(R\)):

xt1 <- lag(xt, -1)
data1 <- ts.union(xt, xt1)

with intercept: lm(xt ~ xt1, data=data1)

without intercept: lm(xt ~ 0 + xt1, data=data1)

In \(R\) there are other functions we can use. For example, ar() and arima() are available in the base distribution; Arima() and auto.arima() are available in the forecast package; and AR(), MA(), ARIMA() are available in the fable package. It is best to use these functions instead of the lm() function.

Example with \(\phi=0.7\) and \(\phi=-0.9\)

# AR Simulation
set.seed(12345)
x<-rnorm(1000)
sim1<-arima.sim(list(ar=c(0.7)),n=1000,innov=x)
sim2<-arima.sim(list(ar=c(-0.9)),n=1000,innov=x)

ggtsdisplay(sim1, main="Simulated AR(1), 0.7", theme = theme_bw())

# Estimation examples..
ar(sim1)

Call:
ar(x = sim1)

Coefficients:
     1  
0.7096  

Order selected 1  sigma^2 estimated as  0.9985
arima(sim1,order=c(1,0,0))

Call:
arima(x = sim1, order = c(1, 0, 0))

Coefficients:
         ar1  intercept
      0.7091     0.1503
s.e.  0.0222     0.1082

sigma^2 estimated as 0.9961:  log likelihood = -1417.32,  aic = 2840.64
Arima(sim1, order = c(1,0,0))
Series: sim1 
ARIMA(1,0,0) with non-zero mean 

Coefficients:
         ar1    mean
      0.7091  0.1503
s.e.  0.0222  0.1082

sigma^2 = 0.9981:  log likelihood = -1417.32
AIC=2840.64   AICc=2840.66   BIC=2855.36



ggtsdisplay(sim2, main="Simulated AR(1), -0.9", theme = theme_bw())

# Estimation examples..
ar(sim2)

Call:
ar(x = sim2)

Coefficients:
      1  
-0.8702  

Order selected 1  sigma^2 estimated as  1.008
arima(sim2,order=c(1,0,0))

Call:
arima(x = sim2, order = c(1, 0, 0))

Coefficients:
          ar1  intercept
      -0.8725     0.0242
s.e.   0.0155     0.0169

sigma^2 estimated as 0.9948:  log likelihood = -1417.04,  aic = 2840.08
Arima(sim2, order = c(1,0,0))
Series: sim2 
ARIMA(1,0,0) with non-zero mean 

Coefficients:
          ar1    mean
      -0.8725  0.0242
s.e.   0.0155  0.0169

sigma^2 = 0.9968:  log likelihood = -1417.04
AIC=2840.08   AICc=2840.1   BIC=2854.8

For \(MA\) models: find \(\theta\) so that the log-likelihood function \(L(\theta|X_t, a_0=0)\) is maximized. To do so, the \(a_t\) are first estimated (starting from \(a_0=0\), or \(a_T=0\); for a given \(\theta\) these can be computed recursively)
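A sketch of that recursion for an MA(1) (conditional on \(a_0=0\); since \(X_t=a_t-\theta a_{t-1}\), we have \(a_t=X_t+\theta a_{t-1}\)):

```r
# Recover the innovations a_t recursively for a given theta (a_0 = 0),
# then evaluate the conditional Gaussian log-likelihood with sigma^2 = 1
set.seed(7)
theta <- 0.5
x <- arima.sim(list(ma = -theta), n = 200)  # R's convention: pass ma = -theta
a <- numeric(length(x))
a[1] <- x[1]                                # a_1 = x_1 + theta * a_0, a_0 = 0
for (t in 2:length(x)) a[t] <- x[t] + theta * a[t - 1]
loglik <- -0.5 * length(x) * log(2 * pi) - 0.5 * sum(a^2)
loglik   # maximize this over theta for the conditional ML estimate
```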

Some Tools from Forecast Package

# Time Series, ACF and PACF together
tsdisplay(sim1)  

# a better one 
ggtsdisplay(sim1)

# auto parameter selection   
auto.arima(sim1)
Series: sim1 
ARIMA(0,1,2) 

Coefficients:
          ma1      ma2
      -0.2449  -0.2318
s.e.   0.0335   0.0406

sigma^2 = 1.094:  log likelihood = -1461.73
AIC=2929.46   AICc=2929.48   BIC=2944.18
# Diagnostics (we discuss this part in more detail later)
tsdiag(Arima(sim1, order=c(1,0,0)))  

sim1 |> 
  as_tsibble() |> 
  model(fable::ARIMA(x ~ pdq(1,0,0))) |> 
  gg_tsresiduals()

# AR Roots plot
autoplot(Arima(sim1, order=c(1,0,0))) + theme_bw() 

List of Some Functions