MIS 306: AR(p), MA(q) & ARMA(p,q)

I. Ozkan

Spring 2025

Linear Stochastic Stationary Time Series Models

Probably the simplest model is the white noise model: white noise plus some constant \(c\).

\[X_t = c + a_t\]

where \(a_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)\), i.e. \(a_t\) is a purely random sequence of normally distributed numbers with \(Cov(a_t,a_{t-k})=0\) for all \(k \neq 0\).

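As a quick sketch, this model can be simulated in R (the values \(c=5\) and \(\sigma=2\) are arbitrary illustration choices):

```r
# White noise plus a constant: X_t = c + a_t, a_t i.i.d. N(0, sigma^2)
# (c = 5 and sigma = 2 are arbitrary illustration values)
set.seed(123)
c0    <- 5
sigma <- 2
a  <- rnorm(200, mean = 0, sd = sigma)
xt <- c0 + a
mean(xt)   # close to c
var(xt)    # close to sigma^2
```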

AR(1) Model

Stationary stochastic AR(1) model is the simplest AutoRegressive model. It is given as:

\(X_t=\phi X_{t-1} + a_t\)

where \(|\phi| < 1\) and \(a_t\) is white noise. Substituting repeatedly for the lagged values:

\(X_t=\phi X_{t-1} + a_t=\phi (\phi X_{t-2}+a_{t-1}) + a_t=\phi^2X_{t-2}+\phi a_{t-1}+a_t\)

\(=\phi^2(\phi X_{t-3}+a_{t-2}) +\phi a_{t-1}+a_t=\phi^3X_{t-3}+\phi^2 a_{t-2}+\phi a_{t-1}+ a_t\)
or

\(\cdots=(1+\phi B+\phi^2 B^2+\cdots)a_t\)

Or mathematically this model can be written as

\((1-\phi B)X_t=a_t\)

where \(BX_t \equiv X_{t-1}\)

Let’s re-write the above equation as:

\(X_t=(1-\phi B)^{-1}a_t=\frac{a_t}{(1-\phi B)}=(1+\phi B+\phi^2 B^2+\cdots)a_t\)

This is not difficult to prove (hint: use the Taylor series of \(f(x)=(1-x)^{-1}\) around \(x=0\)).
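Sketching that proof: since \(f^{(j)}(x)=j!\,(1-x)^{-(j+1)}\), we have \(f^{(j)}(0)=j!\), so the Taylor series around \(x=0\) is

```latex
f(x)=\frac{1}{1-x}=\sum_{j=0}^{\infty}\frac{f^{(j)}(0)}{j!}x^j=\sum_{j=0}^{\infty}x^j=1+x+x^2+\cdots, \qquad |x|<1
```

Substituting \(x=\phi B\) gives the expansion used above.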

The expected value of \(X_t\) then,

\(E[X_t]=E[\phi X_{t-1}+ a_t]=E[(1+\phi B+\phi^2 B^2+\cdots)a_t]=0\)

For the variance of \(X_t\),

\(V(X_t)=V(\phi X_{t-1}+ a_t)=V(a_t+\phi a_{t-1}+\phi^2 a_{t-2}+\cdots)=\sigma^2+\phi^2\sigma^2+\phi^4\sigma^2+\cdots\)

This sum converges to a finite value if and only if, (iff), \(|\phi|<1\).

Assuming that \(V(X_t)=V(X_{t-k})\) for all \(k\) and \(Cov(X_{t-k},a_t)=0\) for \(k\geq 1\), then,

\(V(X_t)=V(\phi X_{t-1}+ a_t)=V(\phi X_{t-1}) + V(a_t)\implies V(X_t)=\frac{\sigma^2}{1-\phi^2}\)

This means that if \(|\phi|=1\), then \(V(X_t)\) explodes to infinity. The model with \(\phi=1\), i.e. \(X_t=X_{t-1}+a_t\), is called a Random Walk
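A simulation sketch (assuming \(\phi=0.7\) and \(\sigma=1\)) checking \(V(X_t)=\sigma^2/(1-\phi^2)\):

```r
# Check V(X_t) = sigma^2 / (1 - phi^2) by simulating a long AR(1) path
set.seed(42)
phi <- 0.7
x   <- arima.sim(model = list(ar = phi), n = 10000)  # innovations ~ N(0, 1)
var(x)           # sample variance
1 / (1 - phi^2)  # theoretical value, about 1.96
```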

Example:

Random Walk

The random walk is a special process used in many contexts in finance and economics. The model:

\(X_t=X_{t-1}+a_t\)

Then,

\(E[X_t|X_{t-1}, X_{t-2},\cdots]=X_{t-1}+E[a_t]=X_{t-1}\)

Adding a constant \(\delta\) gives the Random Walk with Drift:

\(X_t=\delta+X_{t-1}+a_t\)

Simulated Series

500 simulated series of Random Walk (\(a_t \sim N(0,1)\)) and Random Walk with Drift (\(\delta=0.2, \; a_t \sim N(0,1)\))
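A minimal sketch of those two simulations (one path each of length 500, with the stated \(\delta=0.2\)):

```r
# Random walk: X_t = X_{t-1} + a_t, and random walk with drift delta = 0.2
set.seed(1)
n   <- 500
a   <- rnorm(n)            # a_t ~ N(0, 1)
rw  <- ts(cumsum(a))       # cumulative sum of innovations
rwd <- ts(cumsum(0.2 + a)) # same innovations plus drift
# ts.plot(rw, rwd, col = 1:2)  # the drifting path trends upward
```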

Stationary Series

Stationarity means that sample moments computed from finite stretches of the realisation approach their population counterparts as the length of the realisation becomes infinite.

A stochastic process is said to be strictly stationary if its properties are unaffected by a change of time origin (T. Mills, Time Series Techniques for Economists, p11).

For any \(E[X_t^2] < \infty\), the joint probability distribution satisfies

\(P(X_{t_1},X_{t_2},\cdots,X_{t_m}) = P(X_{t_{k+1}},X_{t_{k+2}},\cdots,X_{t_{k+m}})\)

Here are what we assume in simple terms, for \(m=1\):

\(E[X_1]=E[X_2]=E[X_3] \cdots E[X_t]=\mu\)

\(V(X_1)=V(X_2)=V(X_3) \cdots V(X_t)=\sigma_x^2\)

for \(m=2\), the series must exhibit covariances that are constant over time to be strictly stationary: \(Cov(X_t,X_{t-k})\) should depend only on the lag \(k\), not on \(t\), for all \(k \neq 0\)

As these conditions apply only to the first and second moments, this is known as weak stationarity (or covariance stationarity). The covariance above is known as the autocovariance and can be written as:

\(\gamma_k=Cov(X_t,X_{t-k})=E[(X_t-\mu)(X_{t-k}-\mu)]=Cov(X_{t-k},X_t)=\gamma_{-k}\)

with \(\gamma_0=Var(X_t)\) and hence the Autocorrelation (ACF) function at lag \(k\),

\(\rho_k=\frac{Cov(X_t,X_{t-k})}{[V(X_t) \times V(X_{t-k})]^{0.5}} = \frac{\gamma_k}{\gamma_0}=\frac{\gamma_{-k}}{\gamma_0}=\rho_{-k}\)

Wold Decomposition

\(X_t-\mu=a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots=\displaystyle\sum_{j=0}^{j=\infty}\psi_j a_{t-j} \: \: \: \: with \: \: \: \psi_0=1\)

\(a_t\) is defined before as white noise and hence,

\(E(a_t)=0\), \(V(a_t)=E(a_t^2)=\sigma^2<\infty\) and \(cov(a_t,a_{t-k})=0 \: \: for \: all \: \: k\neq 0\), or for the sake of simplicity let’s repeat \(a_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)\)

Autocorrelation Function (ACF)

Recall that \(V(X_t)=E[(X_t-\mu)^2]\), and the Wold decomposition allows us to rewrite the variance as:

\(V(X_t)=E[(a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots)^2]\)

\(\implies E[(a_t)^2] + \psi_1^2 E[(a_{t-1})^2] + \psi_2^2 E[(a_{t-2})^2] + \cdots=\sigma^2+\psi_1^2 \sigma^2+\psi_2^2 \sigma^2+ \cdots\)

since \(E(a_t a_{t-k})=0 \: \: for \: k \neq 0\)

Then, \(V(X_t)=\gamma_0=\sigma^2 \displaystyle\sum_{j=0}^{j=\infty}\psi_j^2\)

Similarly, \(Cov(X_t,X_{t-k})=E[(X_t-\mu)(X_{t-k}-\mu)]\), then,

\(E[(a_t+\psi_1 a_{t-1}+\psi_2 a_{t-2}+\cdots+\psi_k a_{t-k}+\cdots)(a_{t-k}+\psi_1 a_{t-k-1}+\psi_2 a_{t-k-2}+\cdots)]\)

\(\gamma_k=\sigma^2(\psi_k+\psi_1 \psi_{k+1}+\psi_2 \psi_{k+2}+\cdots)=\sigma^2 \displaystyle\sum_{j=0}^{j=\infty}\psi_j \psi_{j+k}\) (recall \(\psi_0=1\))

Hence, Autocorrelation at lag \(k\)

\(\rho_k=\frac{Cov(X_t,X_{t-k})}{[V(X_t) \times V(X_{t-k})]^{0.5}} = \frac{\gamma_k}{\gamma_0}=\frac{\displaystyle\sum_{j=0}^{j=\infty}\psi_j \psi_{j+k}}{\displaystyle\sum_{j=0}^{j=\infty}\psi_j^2}\)

For these sums to converge, it is required that

\(\displaystyle\sum_{j=0}^{j=\infty}|\psi_j|<\infty\)
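These quantities can be checked numerically with base R's ARMAtoMA(), which returns \(\psi_1, \psi_2, \ldots\) (here for an AR(1) with \(\phi=0.7\) and \(\sigma^2=1\), chosen for illustration):

```r
# psi-weights of an AR(1) with phi = 0.7; for AR(1), psi_j = phi^j
phi <- 0.7
psi <- ARMAtoMA(ar = phi, lag.max = 50)  # psi_1, psi_2, ... (psi_0 = 1 is implicit)
head(psi, 3)                             # 0.7, 0.49, 0.343
gamma0 <- 1 * (1 + sum(psi^2))           # sigma^2 * sum of psi_j^2, j = 0, 1, ...
gamma0                                   # about 1 / (1 - phi^2) = 1.9608
```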

ACF of AR(1)

For AR(1) model, recall that,

\(X_t - \phi X_{t-1}=a_t\)

and

\(X_t=(1-\phi B)^{-1}a_t=\frac{a_t}{(1-\phi B)}=(1+\phi B+\phi^2 B^2+\cdots)a_t\)

hence, \(\psi_j=\phi^j\).

If we multiply both sides of the \(AR(1)\) model by \(X_{t-k}\), \(X_t X_{t-k}-\phi X_{t-1}X_{t-k}=a_t X_{t-k}\), and take expectations of both sides (noting \(E(a_t X_{t-k})=0\) for \(k>0\)):

\(E(X_t X_{t-k})-\phi E(X_{t-1}X_{t-k})=\gamma_k - \phi \gamma_{k-1} =E(a_t X_{t-k})=0\)

\(\implies \gamma_k = \phi \gamma_{k-1} \implies \gamma_k = \phi^k \gamma_0\)

Hence the \(ACF\) of the \(AR(1)\) process is \(\rho_k=\phi^k\)

Since \(|\phi|<1\) for a stationary process, here are two examples of \(\rho_k\), for negative and positive values of \(\phi\):
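The claim \(\rho_k=\phi^k\) can be verified with base R's ARMAacf():

```r
# Theoretical ACF of AR(1): rho_k = phi^k
ARMAacf(ar = 0.7, lag.max = 5)    # 1, 0.7, 0.49, ... (monotone decay)
ARMAacf(ar = -0.7, lag.max = 5)   # 1, -0.7, 0.49, ... (alternating signs)
```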

AR(1) Simulations

Here are four \(AR(1)\) models simulated with \(a_t \stackrel{i.i.d.}{\sim} N(0,2^2)\):

\(X_t = 0.5 X_{t-1} + a_t\)

\(X_t = 0.9 X_{t-1} + a_t\)

\(X_t = -0.5 X_{t-1} + a_t\)

\(X_t = -0.9 X_{t-1} + a_t\)



Moving Average Processes

The simplest moving average process is the first-order moving average, \(MA(1)\), process. It is given as:

\(X_t=a_t - \theta a_{t-1}=(1-\theta B)a_t\)

In this case, \(V(X_t)=\gamma_0=V(a_t - \theta a_{t-1})=\sigma^2+\theta^2 \sigma^2=\sigma^2(1+\theta^2)\).

\(\gamma_1=-\theta \sigma^2\) and \(\gamma_k=0 \: \: for \: all \: k>1\). Hence, \(ACF\) of \(MA(1)\) is then, \(\rho_1=\frac{-\theta}{1+\theta^2}\) and \(\rho_k=0\) for all \(k > 1\)

Here the correlation between two consecutive observations is not zero, but there is no memory at higher lags.
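The \(MA(1)\) cutoff can be checked with ARMAacf(); note R's sign convention (it models \(X_t = a_t + \theta_R a_{t-1}\), so for the text's \(X_t = a_t - \theta a_{t-1}\) we pass ma = -theta):

```r
# Theoretical ACF of MA(1) with theta = 0.5 (illustration value).
# R's convention is X_t = a_t + theta_R a_{t-1}, so pass ma = -theta.
theta <- 0.5
ARMAacf(ma = -theta, lag.max = 4)  # lag 1: -0.4; lags 2-4: exactly 0
-theta / (1 + theta^2)             # -0.4
```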

Causal and Invertible Process

An \(MA(1)\) process is invertible when \(|\theta|<1\); it then has an \(AR(\infty)\) representation:

\(a_t=(1-\theta B)^{-1} X_t=(1+\theta B + \theta^2 B^2+\cdots)X_t\)

AR(p) and MA(q) Processes

\(X_t=\phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} +a_t\) or,

\(X_t- \phi_1 X_{t-1} - \phi_2 X_{t-2} - \cdots - \phi_p X_{t-p} = a_t\)

\((1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)X_t = \Phi(B)X_t =a_t\)

The function

\(\Phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p\)

is called the characteristic polynomial (and \(\Phi(B)=0\) the characteristic equation).

The \(MA(q)\) process is

\(X_t=a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}\)

or, in operator form,

\(X_t=(1-\theta_1 B - \theta_2 B^2 - \cdots -\theta_q B^q) a_t = \Theta(B) a_t\)

AR(p) Examples

Examples

\(X_t = -0.5 X_{t-1} + a_t\)

\(X_t = 0.5 X_{t-1} - 0.5 X_{t-2} + a_t\)

\(X_t = 0.2 X_{t-1} - 0.4 X_{t-2} + 0.6 X_{t-3} + a_t\)

\(X_t = 0.5 X_{t-1} + 0.4 X_{t-2} - 0.6 X_{t-3} + 0.4 X_{t-4} + a_t\)
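Stationarity of such examples can be checked from the roots of \(\Phi(B)\) with polyroot(); check_stationary below is a small helper written for this sketch, not a base R function:

```r
# Roots of Phi(B) = 1 - phi_1 B - ... - phi_p B^p must lie OUTSIDE
# the unit circle for stationarity (equivalently, inverse roots inside it)
check_stationary <- function(phi) {
  roots <- polyroot(c(1, -phi))  # polyroot wants coefficients in increasing order
  all(Mod(roots) > 1)
}
check_stationary(-0.5)                    # TRUE
check_stationary(c(0.5, -0.5))            # TRUE
check_stationary(c(0.5, 0.4, -0.6, 0.4))  # check the fourth example
```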



MA(q) Examples

\(X_t = a_t - a_{t-1}\)

\(X_t = a_t - a_{t-1} + a_{t-2}\)

\(X_t = a_t - a_{t-1} - a_{t-2} - a_{t-3}\)

\(X_t = a_t - 0.25a_{t-1} - 0.25a_{t-2} - 0.25a_{t-3} - 0.25a_{t-4}\)



Stationary Time Series Revisited

\[\Phi(B)=(1-g_1 B)(1-g_2 B) \cdots (1-g_p B)\] For stationarity, the inverse roots \(g_i\) must lie within the unit disk (inside the unit circle): \(|g_i| <1\) for all \(i=1,2,\ldots,p\)

\[\rho_k=A_1 g_1^k + A_2 g_2^k + \cdots + A_p g_p^k\]

and \(|g_i|<1\) means that the \(ACF\) is composed of damped exponentials (real roots) or damped sine waves (complex roots)

Example: \(AR(2)\)

\((1 - \phi_1 B - \phi_2 B^2)X_t = \Phi(B)X_t =a_t \implies \Phi(B)=1 - \phi_1 B - \phi_2 B^2\)

\(\Phi(B)=0 \implies g_1,g_2=\frac{\phi_1 \pm \sqrt{\phi_1^2+4 \phi_2}}{2}\)

with \(|g_1|<1\) and \(|g_2|<1\) for stationarity. Then the following must hold:

\(\phi_1+\phi_2<1\), \(-\phi_1+\phi_2<1\), \(-1<\phi_2<1\)

The roots are complex iff \(\phi_1^2+4 \phi_2<0\)
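A numerical sketch for one complex-root case (\(\phi_1=0.5\), \(\phi_2=-0.8\), chosen so that \(\phi_1^2+4\phi_2<0\)):

```r
# AR(2) inverse roots: g = (phi_1 +/- sqrt(phi_1^2 + 4 phi_2)) / 2
phi1 <- 0.5; phi2 <- -0.8
disc <- phi1^2 + 4 * phi2   # -2.95 < 0: complex roots, damped sine-wave ACF
g <- (phi1 + c(1, -1) * sqrt(as.complex(disc))) / 2
Mod(g)                      # both moduli equal sqrt(-phi2), about 0.894 < 1: stationary
```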

Partial AutoCorrelation Function (PACF)

The lag-\(k\) partial autocorrelation, \(\phi_{k,k}\), is the coefficient of \(X_{t-k}\) in the regression

\(X_t=\phi_{k,1} X_{t-1}+\phi_{k,2} X_{t-2}+\cdots+\phi_{k,k} X_{t-k}+e_t\)

For comparison, recall the \(ACF\) of the \(MA(q)\) process:

\(\rho_k=\frac{-\theta_k+\theta_1 \theta_{k+1}+\cdots+\theta_{q-k} \theta_{q}}{1+\theta_1^2+\cdots+\theta_q^2}\)

for \(k=1,2,\ldots,q\), and \(\rho_k=0\) for \(k>q\).
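ARMAacf(..., pacf = TRUE) gives the theoretical PACF; the AR(1) PACF cuts off after lag 1, while the MA(1) PACF decays gradually:

```r
# PACF: AR(p) cuts off after lag p; MA(q) decays without a cutoff
ARMAacf(ar = 0.7, lag.max = 4, pacf = TRUE)   # 0.7, 0, 0, 0
ARMAacf(ma = -0.5, lag.max = 4, pacf = TRUE)  # decays, no cutoff
```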

ACF and PACF of AR(p) and MA(q)

AR(p) Examples

\(AR(p), p=1,2,3,4\)

ARMA(p,q) Process

\(X_t-\phi X_{t-1}=a_t - \theta a_{t-1}\)

or

\((1-\phi B) X_t =(1-\theta B) a_t\)

\(MA(\infty)\) representation,

\(X_t=\frac{1-\theta B}{1-\phi B}a_t\)

Similarly \(AR(\infty)\) representation,

\(\frac{1-\phi B}{1-\theta B}X_t=a_t\)

\(X_t-\phi_1 X_{t-1}-\phi_2 X_{t-2}-\cdots-\phi_p X_{t-p}=a_t - \theta_1 a_{t-1}-\theta_2 a_{t-2}-\cdots-\theta_q a_{t-q}\)

\((1-\phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) X_t =(1-\theta_1 B -\theta_2 B^2 - \cdots -\theta_q B^q) a_t\)

\(\Phi(B) X_t =\Theta(B) a_t\)

\(AR(\infty)\) representation

\(\frac{\Phi(B)}{\Theta(B)}X_t=a_t\)

\(MA(\infty)\) representation

\(X_t=\frac{\Theta(B)}{\Phi(B)}a_t\)

with both the stationarity condition and the invertibility condition
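For the ARMA(1,1) case, the \(MA(\infty)\) weights can be obtained with ARMAtoMA() (here \(\phi=0.7\), \(\theta=0.4\), chosen for illustration; ma = -theta again because of R's sign convention):

```r
# MA(infinity) weights of ARMA(1,1): psi_j = (phi - theta) * phi^(j-1)
phi <- 0.7; theta <- 0.4
psi <- ARMAtoMA(ar = phi, ma = -theta, lag.max = 5)
psi                         # 0.3, 0.21, 0.147, ...
(phi - theta) * phi^(0:4)   # same values
```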

ARMA(1,1) Simulation Examples

Theoretical ACF and PACF

Below is the printed output for four specifications: each block reports the AR root, the theoretical ACF or PACF values up to lag 20, and an (empty) periodicity table.

$roots
[1] 0.7

$acf
           0            1            2            3            4            5 
1.0000000000 0.8307692308 0.5815384615 0.4070769231 0.2849538462 0.1994676923 
           6            7            8            9           10           11 
0.1396273846 0.0977391692 0.0684174185 0.0478921929 0.0335245350 0.0234671745 
          12           13           14           15           16           17 
0.0164270222 0.0114989155 0.0080492409 0.0056344686 0.0039441280 0.0027608896 
          18           19           20 
0.0019326227 0.0013528359 0.0009469851 

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

$roots
[1] 0.7

$pacf
 [1]  8.307692e-01 -3.506494e-01  1.687500e-01 -8.359133e-02  4.169884e-02
 [6] -2.083735e-02  1.041717e-02 -5.208396e-03  2.604175e-03 -1.302084e-03
[11]  6.510418e-04 -3.255208e-04  1.627604e-04 -8.138021e-05  4.069010e-05
[16] -2.034505e-05  1.017253e-05 -5.086263e-06  2.543132e-06 -1.271566e-06

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)
$roots
[1] -0.7

$acf
            0             1             2             3             4 
 1.0000000000 -0.2363636364  0.1654545455 -0.1158181818  0.0810727273 
            5             6             7             8             9 
-0.0567509091  0.0397256364 -0.0278079455  0.0194655618 -0.0136258933 
           10            11            12            13            14 
 0.0095381253 -0.0066766877  0.0046736814 -0.0032715770  0.0022901039 
           15            16            17            18            19 
-0.0016030727  0.0011221509 -0.0007855056  0.0005498539 -0.0003848978 
           20 
 0.0002694284 

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

$roots
[1] -0.7

$pacf
 [1] -2.363636e-01  1.160714e-01 -5.777778e-02  2.885683e-02 -1.442441e-02
 [6]  7.211705e-03 -3.605790e-03  1.802887e-03 -9.014426e-04  4.507212e-04
[11] -2.253606e-04  1.126803e-04 -5.634014e-05  2.817007e-05 -1.408504e-05
[16]  7.042518e-06 -3.521259e-06  1.760630e-06 -8.803148e-07  4.401574e-07

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)
$roots
[1] 0.7

$acf
           0            1            2            3            4            5 
1.0000000000 0.2363636364 0.1654545455 0.1158181818 0.0810727273 0.0567509091 
           6            7            8            9           10           11 
0.0397256364 0.0278079455 0.0194655618 0.0136258933 0.0095381253 0.0066766877 
          12           13           14           15           16           17 
0.0046736814 0.0032715770 0.0022901039 0.0016030727 0.0011221509 0.0007855056 
          18           19           20 
0.0005498539 0.0003848978 0.0002694284 

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

$roots
[1] 0.7

$pacf
 [1] 2.363636e-01 1.160714e-01 5.777778e-02 2.885683e-02 1.442441e-02
 [6] 7.211705e-03 3.605790e-03 1.802887e-03 9.014426e-04 4.507212e-04
[11] 2.253606e-04 1.126803e-04 5.634014e-05 2.817007e-05 1.408504e-05
[16] 7.042518e-06 3.521259e-06 1.760630e-06 8.803148e-07 4.401574e-07

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)
$roots
[1] -0.7

$acf
            0             1             2             3             4 
 1.0000000000 -0.8307692308  0.5815384615 -0.4070769231  0.2849538462 
            5             6             7             8             9 
-0.1994676923  0.1396273846 -0.0977391692  0.0684174185 -0.0478921929 
           10            11            12            13            14 
 0.0335245350 -0.0234671745  0.0164270222 -0.0114989155  0.0080492409 
           15            16            17            18            19 
-0.0056344686  0.0039441280 -0.0027608896  0.0019326227 -0.0013528359 
           20 
 0.0009469851 

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

$roots
[1] -0.7

$pacf
 [1] -8.307692e-01 -3.506494e-01 -1.687500e-01 -8.359133e-02 -4.169884e-02
 [6] -2.083735e-02 -1.041717e-02 -5.208396e-03 -2.604175e-03 -1.302084e-03
[11] -6.510418e-04 -3.255208e-04 -1.627604e-04 -8.138021e-05 -4.069010e-05
[16] -2.034505e-05 -1.017253e-05 -5.086263e-06 -2.543132e-06 -1.271566e-06

$periodicity
[1] damping period 
<0 rows> (or 0-length row.names)

Roots of Characteristic Function

Here are the examples of \(AR(p), \; p=1,2,3,4\) processes and the roots of characteristic equation.

Estimation

\(X_t=\phi X_{t-1} + a_t\) is just a standard regression (with or without an intercept). The estimation can be achieved with lm(xt ~ -1+xt1). The proper steps may be (assume that \(X_t\) is a ts object, denoted xt in \(R\)):

xt1 <- lag(xt, -1)
data1 <- ts.union(xt, xt1)

with intercept: lm(xt ~ xt1, data=data1)

without intercept: lm(xt ~ 0 + xt1, data=data1)

In \(R\) there are other functions we can use. For example, ar() and arima() are available in the base distribution; Arima() and auto.arima() are available in the forecast package; and AR(), MA(), ARIMA() are available in the fable package. It is best to use these functions instead of the lm() function.

Example with \(\phi=0.7\) and \(\phi=-0.9\)

# AR Simulation
set.seed(12345)
x<-rnorm(1000)
sim1<-arima.sim(list(ar=c(0.7)),n=1000,innov=x)
sim2<-arima.sim(list(ar=c(-0.9)),n=1000,innov=x)

ggtsdisplay(sim1, main="Simulated AR(1), 0.7", theme = theme_bw())

# Estimation examples..
ar(sim1)

Call:
ar(x = sim1)

Coefficients:
     1  
0.7096  

Order selected 1  sigma^2 estimated as  0.9985
arima(sim1,order=c(1,0,0))

Call:
arima(x = sim1, order = c(1, 0, 0))

Coefficients:
         ar1  intercept
      0.7091     0.1503
s.e.  0.0222     0.1082

sigma^2 estimated as 0.9961:  log likelihood = -1417.32,  aic = 2840.64
Arima(sim1, order = c(1,0,0))
Series: sim1 
ARIMA(1,0,0) with non-zero mean 

Coefficients:
         ar1    mean
      0.7091  0.1503
s.e.  0.0222  0.1082

sigma^2 = 0.9981:  log likelihood = -1417.32
AIC=2840.64   AICc=2840.66   BIC=2855.36



ggtsdisplay(sim2, main="Simulated AR(1), -0.9", theme = theme_bw())

# Estimation examples..
ar(sim2)

Call:
ar(x = sim2)

Coefficients:
      1  
-0.8702  

Order selected 1  sigma^2 estimated as  1.008
arima(sim2,order=c(1,0,0))

Call:
arima(x = sim2, order = c(1, 0, 0))

Coefficients:
          ar1  intercept
      -0.8725     0.0242
s.e.   0.0155     0.0169

sigma^2 estimated as 0.9948:  log likelihood = -1417.04,  aic = 2840.08
Arima(sim2, order = c(1,0,0))
Series: sim2 
ARIMA(1,0,0) with non-zero mean 

Coefficients:
          ar1    mean
      -0.8725  0.0242
s.e.   0.0155  0.0169

sigma^2 = 0.9968:  log likelihood = -1417.04
AIC=2840.08   AICc=2840.1   BIC=2854.8

For \(MA\) models: find \(\theta\) so that the log-likelihood function \(L(\theta|X_t, a_0=0)\) is maximized. To do so, the \(a_t\) are first estimated (starting from \(a_0=0\), or \(a_T=0\); for a given \(\theta\) these can be computed recursively)
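A sketch of that recursion for an MA(1) (conditional on \(a_0=0\); since \(X_t=a_t-\theta a_{t-1}\), we have \(a_t=X_t+\theta a_{t-1}\)):

```r
# Recover the innovations a_t recursively for a given theta (a_0 = 0),
# then evaluate the conditional Gaussian log-likelihood with sigma^2 = 1
set.seed(7)
theta <- 0.5
x <- arima.sim(list(ma = -theta), n = 200)  # R's convention: pass ma = -theta
a <- numeric(length(x))
a[1] <- x[1]                                # a_1 = x_1 + theta * a_0, a_0 = 0
for (t in 2:length(x)) a[t] <- x[t] + theta * a[t - 1]
loglik <- -0.5 * length(x) * log(2 * pi) - 0.5 * sum(a^2)
loglik   # maximize this over theta for the conditional ML estimate
```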

Some Tools from Forecast Package

# Time Series, ACF and PACF together
tsdisplay(sim1)  

# a better one 
ggtsdisplay(sim1)

# auto parameter selection   
auto.arima(sim1)
Series: sim1 
ARIMA(0,1,2) 

Coefficients:
          ma1      ma2
      -0.2449  -0.2318
s.e.   0.0335   0.0406

sigma^2 = 1.094:  log likelihood = -1461.73
AIC=2929.46   AICc=2929.48   BIC=2944.18
# Diagnostics (we discuss this part in more detail later)
tsdiag(Arima(sim1, order=c(1,0,0)))  

sim1 |> 
  as_tsibble() |> 
  model(fable::ARIMA(x ~ pdq(1,0,0))) |> 
  gg_tsresiduals()

# AR Roots plot
autoplot(Arima(sim1, order=c(1,0,0))) + theme_bw() 

List of Some Functions