Forecaster’s Toolbox

Read:

Chapter 5 of FPP: The forecaster’s toolbox

Data Preparation:

This topic was discussed in different courses. This step includes loading in data, identifying missing values, filtering the time series, and other pre-processing tasks. Useful packages are tsibble and tidyverse

Example: Obtain GDP per capita in global_economy data coming with tsibbledata package

gdppc <- global_economy |>
  mutate(GDP_per_capita = GDP / Population)

Plot the Data: Visualization is an essential part of data analysis

Example: Plot the GDP per capita for Sweden

gdppc |>
  filter(Country == "Sweden") |>
  autoplot(GDP_per_capita) +
  labs(y = "$US", title = "GDP per capita for Sweden") + 
  theme_bw()

Define a model (specify): There are many different time series models that can be used for forecasting, and we will cover some of them in this course

Example: Linear Model Definition (using fable package function). Check the formula specification

TSLM(GDP_per_capita ~ trend())

Train the model (fit, estimate):

One or more model specifications can be estimated using the model() function of fabletools package

fit <- gdppc |>
  model(trend_model = TSLM(GDP_per_capita ~ trend()))

# A mable: 263 x 2
# Key:     Country [263]
   Country             trend_model
   <fct>                   <model>
 1 Afghanistan              <TSLM>
 2 Albania                  <TSLM>
 3 Algeria                  <TSLM>
 4 American Samoa           <TSLM>
 5 Andorra                  <TSLM>
 6 Angola                   <TSLM>
 7 Antigua and Barbuda      <TSLM>
 8 Arab World               <TSLM>
 9 Argentina                <TSLM>
10 Armenia                  <TSLM>
# ℹ 253 more rows

Check model performance (evaluate):

How well the model has performed on the data

Produce forecasts (forecast):

forecast() function with argument horizon, for example 10 observation ahead h=10, can be used to forecast

# A fable: 789 x 5 [1Y]
# Key:     Country, .model [263]
   Country        .model       Year
   <fct>          <chr>       <dbl>
 1 Afghanistan    trend_model  2018
 2 Afghanistan    trend_model  2019
 3 Afghanistan    trend_model  2020
 4 Albania        trend_model  2018
 5 Albania        trend_model  2019
 6 Albania        trend_model  2020
 7 Algeria        trend_model  2018
 8 Algeria        trend_model  2019
 9 Algeria        trend_model  2020
10 American Samoa trend_model  2018
# ℹ 779 more rows
# ℹ 2 more variables: GDP_per_capita <dist>, .mean <dbl>

The forecast can be plotted to observe the performance

Some Simple Forecasting Methods

Mean method

The forecasts of all future values are equal to the average (or “mean”) of the historical data

\(\hat{y}_{T+h|T} = \bar{y} = (y_{1}+\dots+y_{T})/T\)

Notation: \(\hat{y}_{T+h|T}\) means estimate of \(\hat{y}_{T+h}\) based on \(y_1,\dots,y_T\)

Example: Australian clay brick production

Seasonal naïve method

A similar method is useful for highly seasonal data
Each forecast to be equal to the last observed value from the same season

\(\hat{y}_{T+h|T} = y_{T+h-m(k+1)}\)

where \(m\) is the seasonal period, \(k\) is the integer part of \((h-1)/m\) (the number of complete years)

Drift method

A naïve method that allows the forecasts to increase or decrease over time

\(\hat{y}_{T+h|T} = y_{T} + \frac{h}{T-1}\sum_{t=2}^T (y_{t}-y_{t-1}) = y_{T} + h \left( \frac{y_{T} -y_{1}}{T-1}\right)\)

Like drawing a line between the first and last observations, and extrapolating it into the future

Mean, Naïve and Seasonal naïve together

Fitted Values and Residuals

Fitted Values \(\hat{y}_{t|t-1}\)

Each observation in a time series can be forecast using all previous observations, called fitted values \(\hat{y}_{t|t-1}\). Sometimes we write \(\hat{y}_{t}\) instead of \(\hat{y}_{t|t-1}\)
For example, for the mean method, \(\hat{y}_{t}=\hat c\) where \(\hat{c}= (y_T-y_1)/(T-1)\) is the average of \(y_t\) over the period

Residuals

\(e_{t} = y_{t}-\hat{y}_{t}\)

If transformation applied before fitting the model, \(w_t=log(y_t)\), then the residual calculated as \(w_{t}-\hat{w}_{t}\) is called innovation residuals and \(y_{t}-\hat{y}_{t}\) is called residuals.
augment() function can be used to obtain the fitted values and residuals

.model	Quarter	Beer	.fitted	.resid	.innov
Mean	1992 Q1	443	436.45	6.55	6.55
Mean	1992 Q2	410	436.45	-26.45	-26.45
Mean	1992 Q3	420	436.45	-16.45	-16.45
Mean	1992 Q4	532	436.45	95.55	95.55
Mean	1993 Q1	433	436.45	-3.45	-3.45
Mean	1993 Q2	421	436.45	-15.45	-15.45
Mean	1993 Q3	410	436.45	-26.45	-26.45

Residual diagnostics

Good models yield:
- The innovation residuals are uncorrelated
- The innovation residuals have zero mean, If they have a mean other than zero, then the forecasts are biased
In addition to these essential properties, it is useful (but not necessary) to have the following two properties:
- The innovation residuals have constant variance (homoscedasticity)
- The innovation residuals are normally distributed

Example, Google daily closing stock prices

For stock market prices and indexes, the best forecasting method is often the naïve method

\(\hat{y}_{t} = y_{t-1}\)

**Google** 2015 Stock Prices.
Symbol	Date	Open	High	Low	Close	Adj_Close	Volume	day
GOOG	2015-01-02	526.11	528.36	521.23	521.94	521.94	1447600	1
GOOG	2015-01-05	520.40	521.46	510.25	511.06	511.06	2059800	2
GOOG	2015-01-06	512.18	513.35	498.31	499.21	499.21	2899900	3
GOOG	2015-01-07	504.23	504.47	496.92	498.36	498.36	2065100	4
GOOG	2015-01-08	495.26	500.72	488.31	499.93	499.93	3353600	5
GOOG	2015-01-09	502.00	502.16	492.08	493.45	493.45	2069400	6
GOOG	2015-01-12	492.23	493.26	484.89	489.85	489.85	2322400	7

Residual Distribution

The mean of the residuals is close to zero
There is no significant correlation in the residuals series
There appears a one potential outlier, hence the variance can be treated as constant
The histogram suggests that the residuals may not be normal
Forecasts from this method will probably be quite good, (prediction intervals may be inaccurate)

gg_tsresiduals() function may be used to obtain all these graphs

Portmanteau tests for autocorrelation

One of the formal test for autocorrelation by considering a whole set of autocorrelation estimates, \(r_k\), as a group
Box-Pierce test

\(Q = T \sum_{k=1}^\ell r_k^2\) in which first \(\ell\) (it may be set for 10 for non seasonal data and \(2m\) for seasonal data) autocorrelation values are considered. \(T\) is the number of observations

Ljung-Box test

\(Q^* = T(T+2) \sum_{k=1}^\ell (T-k)^{-1}r_k^2\)

If the autocorrelations did come from a white noise series, then both \(Q\) and \(Q^∗\) would have a \(\chi^2\) distribution with \(\ell\) degrees of freedom

Example: Box-Pierce with \(\ell=10\)

# A tibble: 1 × 4
  Symbol .model       bp_stat bp_pvalue
  <chr>  <chr>          <dbl>     <dbl>
1 GOOG   NAIVE(Close)    7.74     0.654

Example: Ljung-Box with \(\ell=10\)

# A tibble: 1 × 4
  Symbol .model       lb_stat lb_pvalue
  <chr>  <chr>          <dbl>     <dbl>
1 GOOG   NAIVE(Close)    7.91     0.637

Both are not significant (p-values large), hence, residuals are not distinguishable from a white noise series

Distributional forecasts and prediction intervals

Prediction intervals

Assume that distribution of future observations is normal \(95\%\) prediction interval

\(\hat{y}_{T+h|T} \pm 1.96 \hat\sigma_h\)

where, \(\hat\sigma_h\) is an estimate of the standard deviation of the \(h\)-step forecast distribution

or in general,

\(\hat{y}_{T+h|T} \pm c \hat\sigma_h\)

\(c\) depends on the coverage probability

One-step prediction intervals

\(\hat{\sigma} = \sqrt{\frac{1}{T-K-M}\sum_{t=1}^T e_t^2}\)

\(K\): Number of parameters estimated in the forecasting method

\(M\): Number of missing values in residuals (\(M=1\) for a naive forecast)

Benchmark method	\(\hat\sigma_h\)
Mean	\(\hat\sigma_h = \hat\sigma\sqrt{1 + 1/T}\)
Naïve	\(\hat\sigma_h = \hat\sigma\sqrt{h}\)
Seasonal Naïve	\(\hat\sigma_h = \hat\sigma\sqrt{k+1}\)
Drift	\(\hat\sigma_h = \hat\sigma\sqrt{h(1+h/(T-1))}\)

**Google** 2015 Stock Prices.
Symbol	.model	day	Close	.mean	80%	95%
GOOG	NAIVE(Close)	253	N(759, 125)	758.88	[744.539977027039, 773.220032972961]80	[736.948824765081, 780.811185234919]95
GOOG	NAIVE(Close)	254	N(759, 250)	758.88	[738.600142955829, 779.159867044171]80	[727.864632472929, 789.895377527071]95
GOOG	NAIVE(Close)	255	N(759, 376)	758.88	[734.042347968872, 783.717662031128]80	[720.89408656317, 796.86592343683]95
GOOG	NAIVE(Close)	256	N(759, 501)	758.88	[730.199949054077, 787.560060945922]80	[715.017644530162, 802.742365469838]95
GOOG	NAIVE(Close)	257	N(759, 626)	758.88	[726.81472765321, 790.94528234679]80	[709.840395167922, 807.919614832078]95
GOOG	NAIVE(Close)	258	N(759, 751)	758.88	[723.754253569008, 794.005756430992]80	[705.159803967437, 812.600206032563]95
GOOG	NAIVE(Close)	259	N(759, 876)	758.88	[720.939857189835, 796.820152810165]80	[700.85555614027, 816.90445385973]95
GOOG	NAIVE(Close)	260	N(759, 1002)	758.88	[718.320280911657, 799.439729088343]80	[696.849259945858, 820.910750054142]95
GOOG	NAIVE(Close)	261	N(759, 1127)	758.88	[715.859921081116, 801.900088918884]80	[693.086464295244, 824.673545704756]95
GOOG	NAIVE(Close)	262	N(759, 1252)	758.88	[713.532854894915, 804.227155105085]80	[689.52752368199, 828.23248631801]95

Prediction intervals from bootstrapped residuals (Optional)

\(e_t = y_t - \hat{y}_{t|t-1}\)

Remember for naïve forecasting method, \(\hat{y}_{t|t-1} = y_{t-1}\) then
\(y_t = y_{t-1} + e_t\)

Assume that future errors will be similar to past errors

\(y^*_{T+1} = y_{T} + e^*_{T+1}\), where \(e^*_{T+1}\) is a randomly sampled error from the past

Adding, the second simulated observation, \(y^*_{T+2} = y_{T+1}^* + e^*_{T+2}\) and continuing the similar fashion, we can obtained the simulated path

# A tsibble: 150 x 5 [1]
# Key:       Symbol, .model, .rep [5]
   Symbol .model         day .rep   .sim
   <chr>  <chr>        <dbl> <chr> <dbl>
 1 GOOG   NAIVE(Close)   253 1      752.
 2 GOOG   NAIVE(Close)   254 1      742.
 3 GOOG   NAIVE(Close)   255 1      729.
 4 GOOG   NAIVE(Close)   256 1      705.
 5 GOOG   NAIVE(Close)   257 1      683.
 6 GOOG   NAIVE(Close)   258 1      682.
 7 GOOG   NAIVE(Close)   259 1      670.
 8 GOOG   NAIVE(Close)   260 1      684.
 9 GOOG   NAIVE(Close)   261 1      679.
10 GOOG   NAIVE(Close)   262 1      689.
# ℹ 140 more rows

Prediction intervals can be computed using percentiles of these simulated values
The result is called a bootstrapped prediction interval

Forecasting with Decomposition

\(y_t = \hat{S}_t + \hat{T}_t + \hat{R}_t\)

\(y_t = \hat{S}_t \times \hat{T}_t \times \hat{R}_t\)

To forecast a decomposed time series
- forecast the seasonal component \(\hat{S}_t\) ( usually assumed that the seasonal component is unchanging)
- forecast seasonally adjusted component \(\hat{A}_t\) where \(\hat{A}_t=\hat{T}_t+\hat{R}_t\) or \(\hat{A}_t=\hat{T}_t \times \hat{R}_t\)

Example: US retail employment, first obtain the seasonally adjusted series by decomposing with STL then forecast with naïve method

Forecast by forecastint additive components

Residuals

Significant autocorrelations due to the naïve method do not capture the changing trend in the seasonally adjusted series

MIS 306 - Data Analysis: Forecasting Forecaster Toolbox Cankaya University

Forecaster’s Toolbox

Some Simple Forecasting Methods

Fitted Values and Residuals

Residual diagnostics

Distributional forecasts and prediction intervals

MIS 306 - Data Analysis: Forecasting

Forecaster Toolbox

Cankaya University