
FTS-3 Wold Decomposition, ARIMA Model

Wold decomposition

Let $\{X_t\}$ be a weakly stationary process. Define $\mathcal{M}_t$ as the linear space spanned by $\{X_s : s \le t\}$. Let $\hat{X}_t$ represent the best linear projection of $X_t$ on $\mathcal{M}_{t-1}$, i.e. $\hat{X}_t = \operatorname*{argmin}_{Y \in \mathcal{M}_{t-1}} \mathbb{E}(X_t - Y)^2$, where $\sigma^2 = \mathbb{E}(X_t - \hat{X}_t)^2$ denotes the prediction error variance.
$\{X_t\}$ is said to be a deterministic process if $\sigma^2 = 0$, i.e. one whose future is perfectly linearly predictable from its past.
If $\sigma^2 > 0$, then the weakly stationary process can be written as a moving average process of infinite order w.r.t. some white noise process $\{\varepsilon_t\}$, i.e.
$$X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} + V_t,$$
where $\psi_0 = 1$, $\sum_{j=0}^{\infty} \psi_j^2 < \infty$, and $\{V_t\}$ is a deterministic process.
This shows that all weakly stationary processes that are not perfectly linearly predictable can be written (up to the deterministic component) as a linear transformation of a white noise process, that is, $X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}$.
However, this is a moving average of infinite order and cannot be used in practice for modeling or inference because it involves an infinite number of parameters with no indication of how they can be estimated.
Remarks:
  • Any time series process $\{X_t\}$ such that $X_t$ only depends on the noise sequence $\{\varepsilon_s\}$ up to time $t$ is called causal: the series does not reference future noise.
  • Although the Wold decomposition is not readily actionable, it shows that "good" finite-parameter approximations (MA, AR, etc. models) are possible under weak stationarity.
Stationarity is closed under linear combinations. Suppose $\{Y_t\}$ is a stationary process with mean zero and autocovariance function $\gamma_Y$. If $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$, then the process
$$X_t = \sum_{j=-\infty}^{\infty} \psi_j Y_{t-j}$$
is stationary with mean zero and autocovariance function
$$\gamma_X(h) = \sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} \psi_j \psi_k\, \gamma_Y(h + j - k).$$
In the special case where $\{Y_t\}$ is $WN(0, \sigma^2)$, we get $\gamma_X(h) = \sigma^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+h}$.

Moving Average Process

Definition and Properties

A process $\{X_t\}$ is said to be a moving average of order $q$, denoted by $MA(q)$, if
$$X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q},$$
where $\{\varepsilon_t\}$ is a white noise process $WN(0, \sigma^2)$ and $\theta_1, \dots, \theta_q$ are fixed constants.
The $MA(q)$ process is a reduction of the Wold decomposition with an infinite number of unknowns, obtained by truncating after $q$ terms and leading to a model with finitely many unknowns $\theta_1, \dots, \theta_q, \sigma^2$. (Because we are modeling noise, the mean is taken to be zero.)
An $MA(q)$ process with any set of coefficients is weakly stationary because it is obtained by truncating the general representation of a weakly stationary sequence. If $\{X_t\}$ is an $MA(q)$ process, then $\mathbb{E}(X_t) = 0$
and $\operatorname{Var}(X_t) = \sigma^2(1 + \theta_1^2 + \dots + \theta_q^2)$.
The autocovariance can be proved to be
$$\gamma(h) = \begin{cases}\sigma^2 \sum_{j=0}^{q-|h|} \theta_j \theta_{j+|h|}, & |h| \le q,\\[2pt] 0, & |h| > q,\end{cases}\qquad \theta_0 = 1.$$
The autocorrelation function drops to zero after lag $q$. If, for a stationary process, we notice that the sample autocorrelation function drops to zero after lag $q$, then $MA(q)$ might be a good fit. However, this might not always be true. For example, consider the process $\{X_t\}$ defined via
$$X_t = g(Z_t, Z_{t-1}, \dots, Z_{t-q}),$$
where $\{Z_t\}$ are iid random variables and $g$ is some arbitrary fixed function. From the independence of the $Z_t$, we get that $X_t$ and $X_{t+h}$ are independent for all $|h| > q$ and hence $\rho(h) = 0$ for $|h| > q$. The behavior of the ACF for such an $\{X_t\}$ is similar to that of a moving average process $MA(q)$, but $\{X_t\}$ is not a moving average process.
Backshift Operator
For any process $\{X_t\}$, the backshift operator $B$ is defined as $BX_t = X_{t-1}$.
Following this, we can write $B^2 X_t = B(BX_t) = X_{t-2}$, and in general $B^k X_t = X_{t-k}$.
Using the backshift operator, we can define the differencing operator as $\nabla = 1 - B$, i.e. $\nabla X_t = X_t - X_{t-1}$.
Following this, we can write $\nabla^2 X_t = (1 - B)^2 X_t = X_t - 2X_{t-1} + X_{t-2}$, and so on.
A time series $\{X_t\}$ follows the $MA(q)$ process if
$$X_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \dots + \theta_q\varepsilon_{t-q}$$
for a white noise process $\{\varepsilon_t\}$. Using the backshift operator, this is $X_t = (1 + \theta_1 B + \dots + \theta_q B^q)\varepsilon_t$.
Define the polynomial $\theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q$;
then the model can be written succinctly as $X_t = \theta(B)\varepsilon_t$.

Invertibility of MA process

Recall that a process $\{X_t\}$ with noise $\{\varepsilon_t\}$ is said to be causal if $X_t$ can be written as a (convergent) linear combination of $\varepsilon_t, \varepsilon_{t-1}, \dots$. Invertibility is a concept dual to causality. A process $\{X_t\}$ with noise $\{\varepsilon_t\}$ is said to be invertible if $\varepsilon_t$ can be written as a (convergent) linear combination of $X_t, X_{t-1}, \dots$.
Whenever we have a choice to model a particular time series in two ways, we should pick the one that is invertible and causal.
$MA(q)$ process: $X_t = \theta(B)\varepsilon_t$.
We can use the product representation of $\theta(B)$,
$$\theta(B) = \prod_{i=1}^{q}\left(1 - r_i^{-1}B\right),$$
where $r_1, \dots, r_q$ are the roots of $\theta(z) = 0$. Then
$$\varepsilon_t = \theta(B)^{-1}X_t = \prod_{i=1}^{q}\left(1 - r_i^{-1}B\right)^{-1}X_t = \prod_{i=1}^{q}\left(\sum_{j=0}^{\infty} r_i^{-j}B^j\right)X_t,$$
which leads to a convergent linear combination of $X_t, X_{t-1}, \dots$ if $|r_i| > 1$ for all $i$.
Therefore, the MA process is invertible iff all the roots of $\theta(z)$ lie outside of the unit circle (in the complex plane).
Note that two different MA processes can show the same autocorrelation structure. For example, the $MA(1)$ processes with
$$X_t = \varepsilon_t + \theta\varepsilon_{t-1} \quad\text{and}\quad X_t = \varepsilon_t + \tfrac{1}{\theta}\varepsilon_{t-1}$$
both share the same autocorrelation $\rho(1) = \theta/(1 + \theta^2)$. But one of them is invertible and the other is not (depending on the value of $\theta$).
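A small sketch (assuming statsmodels is available; the value $\theta = 0.5$ is an arbitrary illustration) showing that the parameterizations $\theta$ and $1/\theta$ share the same lag-1 autocorrelation but differ in invertibility:

```python
from statsmodels.tsa.arima_process import ArmaProcess

# Two MA(1) parameterizations with the same ACF: theta = 0.5 and 1/theta = 2.
ma_a = ArmaProcess(ar=[1.0], ma=[1.0, 0.5])
ma_b = ArmaProcess(ar=[1.0], ma=[1.0, 2.0])

print(ma_a.acf(lags=2), ma_a.isinvertible)   # rho(1) = 0.4, invertible (root -2 outside the unit circle)
print(ma_b.acf(lags=2), ma_b.isinvertible)   # rho(1) = 0.4, not invertible (root -0.5 inside)
```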

Auto-regressive Process

Definition and Properties

A process $\{X_t\}$ is said to be an auto-regressive process of order $p$, denoted by $AR(p)$, if
$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t$$
for some white noise process $\{\varepsilon_t\}$ and constants $\phi_1, \dots, \phi_p$. An $AR(p)$ process shows that the future value can be predicted based on the last $p$ values of the series.
Unlike MA processes which are always stationary, AR processes are not always stationary.
AR using Backshift Operator
The $AR(p)$ equation is the same as $(1 - \phi_1 B - \dots - \phi_p B^p)X_t = \varepsilon_t$.
Define the polynomial $\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p$.
Then the model can be written succinctly as $\phi(B)X_t = \varepsilon_t$.
Recall that
$$(1 - \phi B)^{-1} = \sum_{j=0}^{\infty}\phi^j B^j$$
whenever $|\phi| < 1$. The $AR(1)$ process can be written as
$$X_t = (1 - \phi B)^{-1}\varepsilon_t = \sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j},$$
assuming $|\phi| < 1$.
Extending to $AR(p)$, the product decomposition of $\phi(B)$ is
$$\phi(B) = \prod_{i=1}^{p}\left(1 - r_i^{-1}B\right)$$
for roots $r_1, \dots, r_p$ of $\phi(z) = 0$.
Identification
From the $AR(p)$ equation, if we regress $X_t$ on $X_{t-1}, \dots, X_{t-k}$ for $k > p$, then the coefficient of $X_{t-j}$ will appear to be zero for $j > p$.
The coefficient of $X_{t-k}$ in the regression of $X_t$ on $X_{t-1}, \dots, X_{t-k}$ is called the partial autocorrelation at lag $k$.
For example, regress $X_t$ on $X_{t-1}, X_{t-2}$ and obtain coefficients $\hat{a}_1, \hat{a}_2$. Then $\hat{a}_2$ is the partial autocorrelation at lag 2, but $\hat{a}_1$ is not the partial autocorrelation at lag 1. If we regress $X_t$ on $X_{t-1}$ alone and obtain the coefficient $\hat{b}_1$, then $\hat{b}_1$ is the partial autocorrelation at lag 1.
For the $AR(p)$ model, the PACF drops to zero after lag $p$, the true order of the AR model. The ACF and PACF plots can help us identify the orders of MA and AR models respectively.
statsmodels.graphics.tsaplots.plot_pacf provides the PACF plot.
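A minimal plotting sketch (the AR(2) coefficients are arbitrary illustrative values): simulate an AR series and compare its ACF and PACF plots.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)

# AR(2): X_t = 0.5 X_{t-1} + 0.3 X_{t-2} + eps_t (AR polynomial 1 - 0.5z - 0.3z^2).
ar2 = ArmaProcess(ar=[1.0, -0.5, -0.3], ma=[1.0])
x = ar2.generate_sample(nsample=1000)

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(x, lags=20, ax=axes[0])    # ACF tails off for an AR process
plot_pacf(x, lags=20, ax=axes[1])   # PACF cuts off after lag 2
plt.show()
```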

Stationarity of AR process

$AR(1)$ Process
Suppose $X_t = \phi X_{t-1} + \varepsilon_t$ for a white noise process $\{\varepsilon_t\}$. If $|\phi| < 1$, then
$$X_t = \sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j}$$
is the unique stationary process satisfying the $AR(1)$ equation. This is a causal process, and the representation implies that $\{X_t\}$ is stationary because it shows $X_t$ is an MA process of infinite order. Furthermore, we have $\gamma(h) = \sigma^2\phi^{|h|}/(1 - \phi^2)$.
If $|\phi| > 1$, the right-hand side of the representation above does not converge because $|\phi|^j \to \infty$ as $j \to \infty$. However, we have
$$X_t = -\sum_{j=1}^{\infty}\phi^{-j}\varepsilon_{t+j},$$
which is stationary, but not causal. This solution is considered unnatural since $X_t$ is defined to be correlated with future values of $\varepsilon_t$. It is customary in modeling stationary time series to restrict attention to processes with $|\phi| < 1$.
The processes with $|\phi| > 1$ are called explosive as the values of the time series quickly become large in magnitude.
If $|\phi| = 1$, then the process does not have a stationary solution.
To make the process stationary from a fixed starting time, there is an extra condition on the initial value $X_0$: it must have mean zero and variance $\sigma^2/(1 - \phi^2)$. Then for
$$t \ge 1$$
we have $\operatorname{Var}(X_t) = \sigma^2/(1 - \phi^2)$ for every $t$;
otherwise $\operatorname{Var}(X_t)$ will be a function of $t$.
$AR(p)$ Process
Using the product representation, we can write the $AR(p)$ process as
$$X_t = \phi(B)^{-1}\varepsilon_t = \prod_{i=1}^{p}\left(1 - r_i^{-1}B\right)^{-1}\varepsilon_t.$$
The right-hand side can be expanded as a causal MA process if $|r_i^{-1}| < 1$ (or equivalently, $|r_i| > 1$). Continuing this, we get that if $|r_i| > 1$ for all $i = 1, \dots, p$, then
$$X_t = \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$$
for some absolutely summable coefficients $\psi_j$.
It turns out the condition that all the roots lie outside the unit circle (in the complex plane) is a necessary and sufficient condition for an $AR(p)$ process to be causal and stationary. This can be intuitively understood by writing the $AR(p)$ process in vector form. Define
$$\mathbf{X}_t = \begin{pmatrix} X_t \\ X_{t-1} \\ \vdots \\ X_{t-p+1} \end{pmatrix},\qquad
\Phi = \begin{pmatrix} \phi_1 & \phi_2 & \cdots & \phi_{p-1} & \phi_p \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix},\qquad
\boldsymbol{\varepsilon}_t = \begin{pmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Then the $AR(p)$ equation is the same as $\mathbf{X}_t = \Phi\mathbf{X}_{t-1} + \boldsymbol{\varepsilon}_t$, an example of a vector autoregressive (VAR) process. In $AR(1)$, we need the coefficient $\phi$ to be less than 1 in absolute value. In the $AR(p)$ process, we need the eigenvalues of $\Phi$ to be less than 1 in absolute value.
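A sketch of this check in numpy (the AR(2) coefficients $\phi = (0.5, 0.3)$ are arbitrary): compute the roots of $\phi(z)$ and the eigenvalues of the companion matrix $\Phi$; the two criteria agree because the eigenvalues are the reciprocals of the roots.

```python
import numpy as np

phi = np.array([0.5, 0.3])          # AR(2) coefficients phi_1, phi_2

# Roots of phi(z) = 1 - phi_1 z - phi_2 z^2 (np.roots expects the highest degree first).
roots = np.roots(np.r_[-phi[::-1], 1.0])
print(np.abs(roots))                 # stationary iff all moduli > 1

# Equivalent check via the companion matrix of the VAR(1) representation.
p = len(phi)
companion = np.zeros((p, p))
companion[0, :] = phi
companion[1:, :-1] = np.eye(p - 1)
print(np.abs(np.linalg.eigvals(companion)))   # stationary iff all moduli < 1
```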
Remark: All AR processes are invertible because the defining equation already expresses $\varepsilon_t = X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p}$ as a finite (hence convergent) linear combination of $X_t, X_{t-1}, \dots, X_{t-p}$.
|  | AR | MA |
| --- | --- | --- |
| Invertibility | Always invertible | Not always invertible |
| Stationarity | Not always stationary | Always stationary |

ARMA model

We may have to use a relatively long AR or MA model to capture complex structure, requiring too many parameters to be estimated from the data. An alternative is the ARMA model.
A process $\{X_t\}$ is said to follow $ARMA(p, q)$ if
$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \dots + \theta_q\varepsilon_{t-q}$$
for coefficients $\phi_1, \dots, \phi_p, \theta_1, \dots, \theta_q$ and a white noise process $\{\varepsilon_t\}$.
If $\phi(B)$ and $\theta(B)$ are the AR and MA polynomials in terms of the backshift operator, then the process can be represented as $\phi(B)X_t = \theta(B)\varepsilon_t$.
  • If $p = 0$, the ARMA model is reduced to $MA(q)$.
  • If $q = 0$, the ARMA model is reduced to $AR(p)$.
|  | $AR(p)$ | $MA(q)$ | $ARMA(p, q)$ |
| --- | --- | --- | --- |
| ACF | Tails off | Cuts off after lag $q$ | Tails off |
| PACF | Cuts off after lag $p$ | Tails off | Tails off |
MA & AR representations of ARMA:
  • If all the solutions of $\phi(z) = 0$ lie outside the unit circle, then $X_t = \phi(B)^{-1}\theta(B)\varepsilon_t$, which is an MA (infinite-order) representation of the ARMA process and thus causal and stationary.
  • If all the solutions of $\theta(z) = 0$ lie outside the unit circle, then $\varepsilon_t = \theta(B)^{-1}\phi(B)X_t$, which is an AR (infinite-order) representation of the ARMA process and thus invertible.
When modeling a noise series (a detrended time series), ARMA models are adequate.
ARIMA model
If we want to include the trend in modeling, we can use the ARIMA model. A time series $\{X_t\}$ is said to follow $ARIMA(p, d, q)$ if $\nabla^d X_t = (1 - B)^d X_t$ is $ARMA(p, q)$.
Because of the inclusion of trend, ARIMA is not expected to be a stationary process. An equivalent representation of ARIMA is
$$\phi(B)(1 - B)^d X_t = \theta(B)\varepsilon_t.$$
Because the $d$ roots of $(1 - z)^d$ lie on the unit circle, $\{X_t\}$ is not stationary.
Integrated Process
A univariate time series $\{X_t\}$ is said to be integrated of order $d$, denoted by $I(d)$, if $\nabla^k X_t$ is not stationary for $k < d$ but $\nabla^d X_t$ is stationary.
An $n$-dimensional time series is integrated of order $d$ if at least one of its coordinates is $I(d)$ and all the others are $I(d')$ for some $d' \le d$.
$\{X_t\}$ being integrated of order $d$ means that $\nabla^k X_t$ can be fitted with a stationary ARMA model only if $k \ge d$.
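A minimal fitting sketch (the simulated data and the order $(1, 1, 1)$ are illustrative assumptions, not from the notes): statsmodels' ARIMA differences the series $d$ times internally and fits an ARMA model to the result.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)

# Simulate a stationary ARMA(1,1) series and integrate it once, so the level is I(1).
arma = ArmaProcess(ar=[1.0, -0.6], ma=[1.0, 0.4])
level = np.cumsum(arma.generate_sample(nsample=500))

# Fitting ARIMA(1, 1, 1) differences the series once internally and fits ARMA(1, 1).
res = ARIMA(level, order=(1, 1, 1)).fit()
print(res.summary())
```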

Estimation in ARIMA models

AR models

Least Squares Estimation
Suppose $\{X_t\}$ satisfies
$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t$$
with a white noise process $\{\varepsilon_t\}$.
If the observed sequence is $X_1, \dots, X_n$, then the least squares estimator for $\boldsymbol{\phi} = (\phi_1, \dots, \phi_p)^{\top}$ is obtained by minimizing
$$\sum_{t=p+1}^{n}\left(X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p}\right)^2$$
over all $\boldsymbol{\phi}$. There is a closed-form estimator given by
$$\hat{\boldsymbol{\phi}} = \left(\mathbf{Z}^{\top}\mathbf{Z}\right)^{-1}\mathbf{Z}^{\top}\mathbf{Y}.$$
This is the least squares regression applied to $Y_t = X_t$ on $Z_t = (X_{t-1}, \dots, X_{t-p})$, where $t = p+1, \dots, n$.
If $p = 1$, i.e. the model $X_t = \phi X_{t-1} + \varepsilon_t$, then the least squares estimator is given by
$$\hat{\phi} = \frac{\sum_{t=2}^{n}X_tX_{t-1}}{\sum_{t=2}^{n}X_{t-1}^2}.$$
If $\varepsilon_t$ are iid $N(0, \sigma^2)$, then the least squares estimator is also the (conditional) maximum likelihood estimator (MLE).
Under the assumed model, we have the asymptotics
$$\sqrt{n}\left(\hat{\boldsymbol{\phi}} - \boldsymbol{\phi}\right) \xrightarrow{d} N\!\left(0,\ \sigma^2\Gamma_p^{-1}\right),$$
where $\Gamma_p = [\gamma(i - j)]_{i,j=1}^{p}$ is the autocovariance matrix of the process.
For $p = 1$, this becomes
$$\sqrt{n}\left(\hat{\phi} - \phi\right) \xrightarrow{d} N\!\left(0,\ 1 - \phi^2\right).$$
Note that the asymptotics here assume that the data generating model is a stationary $AR(p)$ process. Without stationarity, the stated asymptotic variance can be negative (for example, $1 - \phi^2 < 0$ when $|\phi| > 1$), so the approximation is meaningless. With estimated parameters and estimated autocovariances, we can obtain approximate confidence intervals for $\phi_1, \dots, \phi_p$.
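A short sketch (the simulated AR(1) data with $\phi = 0.7$ are an arbitrary choice) computing the closed-form AR(1) least squares estimate, an approximate confidence interval from the asymptotic variance $1 - \phi^2$, and a comparison fit with statsmodels' AutoReg:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)
phi_true = 0.7
x = ArmaProcess(ar=[1.0, -phi_true], ma=[1.0]).generate_sample(nsample=500)

# Closed-form AR(1) least squares: sum(X_t X_{t-1}) / sum(X_{t-1}^2).
phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

# Approximate 95% CI from the asymptotic variance (1 - phi^2) / n.
n = len(x)
se = np.sqrt((1 - phi_hat ** 2) / n)
print(phi_hat, (phi_hat - 1.96 * se, phi_hat + 1.96 * se))

# Regression-style fit via statsmodels for comparison (no intercept).
print(AutoReg(x, lags=1, trend="n").fit().params)
```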
Yule-Walker Equations
The method of moments applied to common time series models yields the Yule-Walker equations. Take
$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t.$$
Multiply both sides by $X_{t-k}$, take expectations, and divide by $\gamma(0)$ to get
$$\rho(k) = \phi_1\rho(k-1) + \dots + \phi_p\rho(k-p)$$
for $k = 1, \dots, p$.
There are $p$ unknowns $\phi_1, \dots, \phi_p$. The autocorrelations are unknown, but we can estimate them and then solve the resulting $p$ linear equations.
Take the equations corresponding to $k = 1, \dots, p$ and write them as $R_p\boldsymbol{\phi} = \boldsymbol{\rho}_p$, where $R_p = [\rho(i - j)]_{i,j=1}^{p}$ and $\boldsymbol{\rho}_p = (\rho(1), \dots, \rho(p))^{\top}$.
If the process is stationary, then $R_p$ is invertible and hence, replacing $\rho(\cdot)$ with the empirical autocorrelation function $\hat{\rho}(\cdot)$, we get
$$\hat{\boldsymbol{\phi}} = \hat{R}_p^{-1}\hat{\boldsymbol{\rho}}_p.$$
These are called the Yule-Walker estimates of $\phi_1, \dots, \phi_p$. In practice, the computation of these estimators is done iteratively without inverting the matrix, via the Durbin-Levinson algorithm.
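A usage sketch of statsmodels' yule_walker (the simulated AR(2) coefficients are arbitrary):

```python
import numpy as np
from statsmodels.regression.linear_model import yule_walker
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)
# AR(2) data with phi = (0.5, 0.3).
x = ArmaProcess(ar=[1.0, -0.5, -0.3], ma=[1.0]).generate_sample(nsample=2000)

# Yule-Walker estimates of the AR coefficients and the noise standard deviation.
phi_hat, sigma_hat = yule_walker(x, order=2, method="mle")
print(phi_hat, sigma_hat)
```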

MA models

Conditional Least Squares (CLS)
The goal of least squares is to minimize the sum of squares of the residuals. Consider the $MA(1)$ model
$$X_t = \varepsilon_t + \theta\varepsilon_{t-1}.$$
The observed data are $X_1, \dots, X_n$ and we want to estimate $\theta$.
Assume $\varepsilon_0 = 0$; then $\varepsilon_1 = X_1$, $\varepsilon_2 = X_2 - \theta\varepsilon_1$, and in general $\varepsilon_t = X_t - \theta\varepsilon_{t-1}$.
Following this, we can represent each $\varepsilon_t$ in terms of the known $X_1, \dots, X_t$ and the unknown $\theta$, conditional on $\varepsilon_0 = 0$.
With these representations of $\varepsilon_t$, compute $S(\theta) = \sum_{t=1}^{n}\varepsilon_t^2(\theta)$ and minimize this over $\theta$. This minimization has to be done numerically, e.g. via Newton's method.
The conditional least squares method can be summarized as: write the unobserved errors in terms of the observed data and any unknown parameters, then minimize the sum of squares of the errors (a minimization sketch is given below).
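A conditional least squares sketch for the $MA(1)$ case (assuming scipy is available; the true $\theta = 0.6$ is an arbitrary illustration): the residual recursion conditions on $\varepsilon_0 = 0$ and the sum of squares is minimized numerically.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)
theta_true = 0.6
x = ArmaProcess(ar=[1.0], ma=[1.0, theta_true]).generate_sample(nsample=1000)

def css(theta, x):
    """Conditional sum of squares for MA(1), conditioning on eps_0 = 0."""
    eps = np.zeros_like(x)
    for t in range(len(x)):
        eps[t] = x[t] - theta * (eps[t - 1] if t > 0 else 0.0)
    return np.sum(eps ** 2)

res = minimize_scalar(lambda th: css(th, x), bounds=(-0.99, 0.99), method="bounded")
print(res.x)   # should be close to theta_true
```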
Yule-Walker Equations
In the AR case, the autocorrelation functions expressed in terms of the unknown parameters happen to yield linear equations. For MA models, they do not.
Consider the $MA(1)$ model $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$. The autocorrelation function for this process is given by
$$\rho(1) = \frac{\theta}{1 + \theta^2}.$$
Replacing $\rho(1)$ by its estimator $\hat{\rho}(1)$ and solving this quadratic equation in $\theta$, we obtain the Yule-Walker estimates
$$\hat{\theta} = \frac{1 \pm \sqrt{1 - 4\hat{\rho}(1)^2}}{2\hat{\rho}(1)}.$$
The roots are real only if $|\hat{\rho}(1)| \le 1/2$, in which case the estimator corresponding to the invertible MA is
$$\hat{\theta} = \frac{1 - \sqrt{1 - 4\hat{\rho}(1)^2}}{2\hat{\rho}(1)}.$$
Summary on Estimation of ARIMA models
  • In practice, we do not know $p$ and $q$. We will select $p$ and $q$ depending on the adequacy of the fitted model over different $p$ and $q$. Commonly, we can use information criteria such as AIC and BIC to find adequate $p$ and $q$. We can use arma_order_select_ic to find the orders for fitting ARMA models (see the sketch after this list).
  • Whatever model we end up with, we should always test if the residuals in that model are white noise (using the Ljung-Box or Box-Pierce test).
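A sketch of order selection and residual diagnostics (the simulated ARMA(1,1) data are illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import arma_order_select_ic
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)
y = ArmaProcess(ar=[1.0, -0.6], ma=[1.0, 0.4]).generate_sample(nsample=1000)

# Search over (p, q) using information criteria.
sel = arma_order_select_ic(y, max_ar=3, max_ma=3, ic=["aic", "bic"])
print(sel.aic_min_order, sel.bic_min_order)

# Fit the BIC-selected order and check that the residuals look like white noise.
p, q = sel.bic_min_order
res = ARIMA(y, order=(p, 0, q)).fit()
print(acorr_ljungbox(res.resid, lags=[10]))
```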

Forecasting in ARIMA models

AR models

For the $AR(p)$ model $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + \varepsilon_t$:
Given $X_1, \dots, X_n$, and if $\phi_1, \dots, \phi_p$ are known, then the best prediction of $X_{n+1}$ is given by
$$\hat{X}_{n+1} = \phi_1 X_n + \dots + \phi_p X_{n-p+1},$$
because the white noise process is linearly unpredictable.
With estimated parameters $\hat{\phi}_1, \dots, \hat{\phi}_p$, the forecast is given by
$$\hat{X}_{n+1} = \hat{\phi}_1 X_n + \dots + \hat{\phi}_p X_{n-p+1}.$$
Take the $AR(1)$ model for example: $X_t = \phi X_{t-1} + \varepsilon_t$. We have the $h$-step ahead forecast
$$\hat{X}_{n+h} = \hat{\phi}^h X_n.$$
Assuming a stationary process, $|\hat{\phi}| < 1$ asymptotically and hence, $\hat{X}_{n+h} \to 0$ as $h \to \infty$. The same holds true for forecasting in $AR(p)$: the $h$-step ahead forecast tends to zero (the process mean) for large enough $h$.
Prediction Uncertainty
In the $AR(1)$ model we have
$$X_{n+1} - \hat{X}_{n+1} = \varepsilon_{n+1}$$
(ignoring parameter estimation error). Hence, $\operatorname{Var}(X_{n+1} - \hat{X}_{n+1}) = \sigma^2$.
This implies that an approximate 95% prediction interval for $X_{n+1}$ is $\hat{X}_{n+1} \pm 1.96\,\hat{\sigma}$.
Similarly, for the $h$-step ahead forecast we have
$$\operatorname{Var}(X_{n+h} - \hat{X}_{n+h}) = \sigma^2\left(1 + \phi^2 + \dots + \phi^{2(h-1)}\right),$$
which is increasing in $h$ and converges to $\sigma^2/(1 - \phi^2)$ (the variance of $X_t$) as $h \to \infty$.
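A worked numerical sketch of these formulas (the values of $\hat{\phi}$, $\hat{\sigma}$ and $X_n$ are hypothetical, standing in for an actual AR(1) fit):

```python
import numpy as np

# Assume an AR(1) fit gave phi_hat and sigma_hat, and x_n is the last observation.
phi_hat, sigma_hat, x_n = 0.7, 1.0, 2.5   # hypothetical fitted values

for h in range(1, 6):
    forecast = phi_hat ** h * x_n
    # Var of the h-step forecast error: sigma^2 (1 + phi^2 + ... + phi^(2(h-1))).
    var_h = sigma_hat ** 2 * np.sum(phi_hat ** (2 * np.arange(h)))
    lo, hi = forecast - 1.96 * np.sqrt(var_h), forecast + 1.96 * np.sqrt(var_h)
    print(h, round(forecast, 3), (round(lo, 3), round(hi, 3)))
```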

MA models

Take the $MA(1)$ model for example: $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$. The best forecast for $\varepsilon_{n+1}$ is zero because white noise is linearly unpredictable, and hence, the best forecast for $X_{n+1}$ is
$$\hat{X}_{n+1} = \hat{\theta}\hat{\varepsilon}_n,$$
where $\hat{\varepsilon}_n$ is the estimated residual from the fit of the $MA(1)$ model.
As for $X_{n+2}$, note that $X_{n+2} = \varepsilon_{n+2} + \theta\varepsilon_{n+1}$ and the best forecasts for $\varepsilon_{n+2}$ and $\varepsilon_{n+1}$ are zero. Hence, the $h$-step ahead forecast for $h \ge 2$ is zero for the $MA(1)$ model.
In practice, when fitting the MA model we can include an intercept: $X_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1}$. In this case, the $h$-step ahead forecast for $h \ge 2$ is $\hat{\mu}$.
Consider the $MA(2)$ model $X_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2}$. The best forecast for $\varepsilon_{n+1}$ is zero because white noise is linearly unpredictable and hence, the best forecast for $X_{n+1}$ is
$$\hat{X}_{n+1} = \hat{\theta}_1\hat{\varepsilon}_n + \hat{\theta}_2\hat{\varepsilon}_{n-1},$$
where $\hat{\varepsilon}_n, \hat{\varepsilon}_{n-1}$ are the estimated residuals obtained from the MA model fitting.
As for $X_{n+2}$: since $X_{n+2} = \varepsilon_{n+2} + \theta_1\varepsilon_{n+1} + \theta_2\varepsilon_n$ and the best forecasts for $\varepsilon_{n+2}, \varepsilon_{n+1}$ are zero, the 2-step ahead forecast is
$$\hat{X}_{n+2} = \hat{\theta}_2\hat{\varepsilon}_n.$$
The best $h$-step ahead forecast is zero for $h \ge 3$.
Prediction Uncertainty
Because $X_{n+1} - \hat{X}_{n+1} = \varepsilon_{n+1}$, we get $\operatorname{Var}(X_{n+1} - \hat{X}_{n+1}) = \sigma^2$.
For the 2-step ahead forecast, we get
$$\operatorname{Var}(X_{n+2} - \hat{X}_{n+2}) = \sigma^2\left(1 + \theta_1^2\right).$$
This implies that an approximate 95% prediction interval for $X_{n+2}$ is $\hat{X}_{n+2} \pm 1.96\,\hat{\sigma}\sqrt{1 + \hat{\theta}_1^2}$.
For the $h$-step ahead forecast,
$$\operatorname{Var}(X_{n+h} - \hat{X}_{n+h}) = \sigma^2\left(1 + \theta_1^2 + \dots + \theta_{\min(h-1,\,q)}^2\right),$$
which stops growing once $h$ exceeds the MA order $q$.
Remarks:
  • The uncertainty in the prediction increases as we look more into the future (increasing $h$); see the sketch after these remarks
  • All these forecasts depend on the correctness of the parametric model used
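A sketch of model-based forecasts and intervals using statsmodels (the simulated MA(2) data and coefficients are illustrative); get_forecast returns both the point forecasts and the widening prediction intervals discussed above:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

np.random.seed(0)
y = ArmaProcess(ar=[1.0], ma=[1.0, 0.5, 0.3]).generate_sample(nsample=500)

# Fit an MA(2) (ARIMA(0, 0, 2)) and produce h-step forecasts with intervals.
res = ARIMA(y, order=(0, 0, 2)).fit()
fc = res.get_forecast(steps=5)
print(fc.predicted_mean)    # reverts to the estimated mean after 2 steps
print(fc.conf_int())        # intervals widen with h, then stabilize
```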

Tests for Stationarity

The MA processes are always stationary but AR and ARMA processes may not be. The condition for stationarity (and invertibility) is that the roots of the coefficient polynomials all lie outside of the unit circle.
For example, for the $AR(1)$ process $X_t = \phi X_{t-1} + \varepsilon_t$, the condition for stationarity is $|\phi| < 1$. There are many tests for whether $\phi = 1$, which are called unit root tests.

Dickey-Fuller test

Begin with the $AR(1)$ process $X_t = \phi X_{t-1} + \varepsilon_t$ with $\{\varepsilon_t\}$ being a white noise process. The DF test has $H_0: \phi = 1$ against $H_1: \phi < 1$.
It is implemented as statsmodels.tsa.stattools.adfuller(data, maxlag=0, regression=...). There are three versions of the DF test:
  • Fit $\nabla X_t = \beta X_{t-1} + \varepsilon_t$ and test if $\beta$ is zero. Suitable for non-trending noise series such as log returns. regression='n'.
  • Fit $\nabla X_t = \alpha + \beta X_{t-1} + \varepsilon_t$ and test if $\beta$ is zero. Suitable for non-trending financial time series with constant mean such as interest rates. regression='c'.
  • Fit $\nabla X_t = \alpha + \delta t + \beta X_{t-1} + \varepsilon_t$ and test if $\beta$ is zero. Suitable for trending time series like asset prices or the levels of macroeconomic aggregates like real GDP. regression='ct'. (regression='ctt' uses linear and quadratic trends.)
Under $H_0$, the time series behaves like a random walk and the t-statistic does not have a t-distribution. Fitting the regression model $\nabla X_t = \beta X_{t-1} + \varepsilon_t$, we get the t-statistic $\hat{\tau} = \hat{\beta}/\widehat{\operatorname{se}}(\hat{\beta})$.
Under $H_0: \phi = 1$, or equivalently $\beta = 0$,
$$\hat{\tau} \xrightarrow{d} \frac{\tfrac{1}{2}\left(W(1)^2 - 1\right)}{\left(\int_0^1 W(r)^2\,dr\right)^{1/2}}$$
for the standard Brownian motion $W$. The limiting distribution is not known in closed form and requires simulation to estimate its quantiles. $H_0$ is rejected if $\hat{\tau}$ is smaller than the 5% quantile of the limiting distribution.
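A usage sketch of the three DF variants on a simulated random walk (the data are illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)
random_walk = np.cumsum(np.random.normal(size=500))   # has a unit root

# Dickey-Fuller test (no augmentation): maxlag=0 with the three deterministic specifications.
for reg in ["n", "c", "ct"]:
    stat, pvalue, *_ = adfuller(random_walk, maxlag=0, regression=reg)
    print(reg, round(stat, 3), round(pvalue, 3))   # large p-values: cannot reject a unit root
```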

Augmented Dickey-Fuller test

The Dickey-Fuller test starts with the assumption of an $AR(1)$ model. Including the trend component can better model the data but could lead to a loss of power. Not all time-series variables can be well represented by an $AR(1)$ model. If there is residual autocorrelation in the process, then this test does not control the type I error.
The augmented Dickey-Fuller test starts with an $AR(p)$ model and tests for a unit root by testing $\phi(1) = 0$, i.e. $\phi_1 + \dots + \phi_p = 1$.
Recall that the $AR(p)$ model with polynomial $\phi(z)$ is stationary iff all roots of $\phi(z)$ lie outside the unit circle. This implies that $\phi(1) \ne 0$, i.e. $\phi_1 + \dots + \phi_p \ne 1$, is necessary for the stationarity of $\{X_t\}$.
Take the $AR(2)$ model for example:
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t.$$
The augmented DF test with one extra lag (maxlag=1) fits the model
$$\nabla X_t = \beta X_{t-1} + \gamma_1\nabla X_{t-1} + \varepsilon_t,\qquad \beta = \phi_1 + \phi_2 - 1,\quad \gamma_1 = -\phi_2,$$
and tests if $\beta = 0$.
For general $p$, the augmented DF test uses
$$\nabla X_t = \beta X_{t-1} + \sum_{j=1}^{p-1}\gamma_j\nabla X_{t-j} + \varepsilon_t,$$
where $\beta = \phi_1 + \dots + \phi_p - 1$ and $\gamma_j = -(\phi_{j+1} + \dots + \phi_p)$.
When applying maxlag=k in adfuller, the number of lagged differences is set to $k$, which corresponds to fitting an $AR(k+1)$ model.
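A sketch of the augmented test (the simulated AR(2) series is illustrative): either fix the number of lagged differences via maxlag with autolag=None, or let adfuller pick it by AIC.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

np.random.seed(0)
# A stationary AR(2) series (no unit root).
x = np.zeros(500)
eps = np.random.normal(size=500)
for t in range(2, 500):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + eps[t]

# Augmented DF with exactly one lagged difference (corresponding to an AR(2) fit) ...
print(adfuller(x, maxlag=1, autolag=None, regression="c")[:2])
# ... or let the lag order be chosen automatically by AIC up to a maximum.
print(adfuller(x, maxlag=10, autolag="AIC", regression="c")[:2])
```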

KPSS test

The ADF test is a parametric test as it relies on a parametric (AR) model, which may not be satisfied by real data. A popular non-parametric test for stationarity is the KPSS test.
The KPSS test starts with the model
$$X_t = D_t + \mu_t + u_t,\qquad \mu_t = \mu_{t-1} + \eta_t,\quad \eta_t \sim WN(0, \sigma_\eta^2),$$
where $\mu_t$ is a random walk, $D_t$ contains deterministic components (intercept, trend $t$), and $u_t$ is a stationary process, which can be heteroscedastic.
The hypothesis of stationarity becomes $H_0: \sigma_\eta^2 = 0$ (so that the random-walk component is constant).
Remark:
  • The null hypothesis of the KPSS test is stationarity, while the null hypothesis of the ADF test is non-stationarity (a unit root).
  • If the $p$-value of the KPSS test is less than the significance level (e.g. 0.05), then we reject stationarity and conclude the series is non-stationary (a usage sketch follows).
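A usage sketch of the KPSS test (the simulated series are illustrative); note that statsmodels reports p-values only within a tabulated range and warns when the statistic falls outside it:

```python
import numpy as np
from statsmodels.tsa.stattools import kpss

np.random.seed(0)
stationary = np.random.normal(size=500)
random_walk = np.cumsum(np.random.normal(size=500))

# KPSS with a constant under the null ('c'); use 'ct' to allow a deterministic trend.
for name, series in [("stationary", stationary), ("random walk", random_walk)]:
    stat, pvalue, nlags, crit = kpss(series, regression="c", nlags="auto")
    print(name, round(stat, 3), pvalue)   # small p-value -> reject stationarity
```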
