
FTS-2 Time Series Decomposition, Stationarity, White Noise

Decomposition of time series

In general, a time series can be decomposed into macroscopic components and a microscopic component
  • macro: this component refers to the trend and/or seasonality readily visible from plotting the data
  • micro: this component refers to the noise, which is the main source of autocorrelation or temporal dependence in the time series
We can decompose a time series as
$$X_t = m_t + s_t + Y_t.$$
Trend $m_t$ and seasonality $s_t$ are deterministic functions of time while $Y_t$ is the random noise. Most of time series analysis deals with modeling the noise $Y_t$. If $\hat m_t$ and $\hat s_t$ represent the estimates of trend and seasonality respectively, then the residual time series is given by
$$\hat Y_t = X_t - \hat m_t - \hat s_t.$$
For positive time series, the multiplicative decomposition is
$$X_t = m_t \cdot s_t \cdot Y_t.$$
Note that these decompositions are mostly used for the explanation and interpretation of time series instead of forecasting. Forecasting requires modeling of noise.
The basic decomposition method is available in Python via statsmodels.tsa.seasonal.seasonal_decompose, which allows for both additive and multiplicative decomposition. It uses convolution filtering to obtain the trend and requires an input period to estimate seasonality. But it does not model the noise and is not suitable for forecasting.
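A minimal usage sketch (the simulated series and the period 12 are illustrative assumptions, not from the notes):

import numpy as np
from statsmodels.tsa.seasonal import seasonal_decompose

# simulated monthly-style series with a linear trend and period-12 seasonality (illustrative only)
rng = np.random.default_rng(0)
t = np.arange(120)
x = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=120)

res = seasonal_decompose(x, model="additive", period=12)     # model="multiplicative" for positive series
trend, seasonal, resid = res.trend, res.seasonal, res.resid  # trend has NaNs at both ends (convolution filter)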
An additive model is appropriate if the magnitude of the seasonal fluctuations does not vary with the level (trend) of the time series
The multiplicative model is appropriate if the seasonal fluctuations increase or decrease proportionally with increases and decreases in the level of the series
Note that seasonal_decompose only considers additive and multiplicative decompositions. If we believe a time series might have a more complicated structure (e.g., a mix of additive and multiplicative components), then we might need to carry out the decomposition manually.

Fitting Trend

Assume that there is no seasonality, so that the time series can be written as $X_t = m_t + Y_t$, where $m_t$ is a deterministic function of time $t$ and $Y_t$ is noise. With zero-mean noise, i.e. $E[Y_t] = 0$, we have $E[X_t] = m_t$. We can estimate the trend by regressing the time series $X_t$ on time $t$.
Parametric Modeling
Suppose we know $m_t = \beta_0 + \beta_1 t$ for all $t$ with some unknown values $\beta_0, \beta_1$. Then $\beta_0$ and $\beta_1$ can be estimated by least squares,
$$(\hat\beta_0, \hat\beta_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{n} \big(X_t - \beta_0 - \beta_1 t\big)^2.$$
With these estimators, the estimate of trend is $\hat m_t = \hat\beta_0 + \hat\beta_1 t$.
This can be generalized to fitting a polynomial in $t$, i.e. $m_t = \beta_0 + \beta_1 t + \cdots + \beta_p t^p$ (numpy.polyfit).
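A minimal sketch of a polynomial trend fit with numpy.polyfit; the simulated series and the degree are illustrative choices:

import numpy as np

# x is the observed series, t the time index 1, ..., n (simulated for illustration)
rng = np.random.default_rng(1)
n = 200
t = np.arange(1, n + 1)
x = 0.5 + 0.02 * t + rng.normal(scale=1.0, size=n)

coefs = np.polyfit(t, x, deg=2)   # highest-degree coefficient first
m_hat = np.polyval(coefs, t)      # fitted trend m_hat_t
y_hat = x - m_hat                 # residual (detrended) series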
Remark:
  • parametric modeling assumes a fixed trend for the entire span of the data, which is in general not true
  • linear or polynomial trends can be disastrous in the long run and can be absurd for extrapolation
Filtering
It can be seen as a local-averaging, non-parametric method of fitting the trend. Filtering is also sometimes referred to as moving average smoothing. The filtering estimate of trend is
$$\hat m_t = \sum_{i=-a}^{b} w_i X_{t+i}, \qquad w_i \ge 0, \quad \sum_{i=-a}^{b} w_i = 1.$$
Often $a = b$, so that the smoothing filter uses the same number of time points before and after $t$ to smooth the time series.
This is available in Python via statsmodels.tsa.filters.filtertools.convolution_filter(x, filt, nsides=2), where filt is the vector of weights $(w_{-a}, \dots, w_b)$.
When $a = b$ and $w_i = 1/(2a+1)$ for all $i$, we have the $(2a+1)$-point moving average filter. If $a = 1$, then it’s a simple three-point moving average.
For example, suppose $a = b = 2$; then the 5-point moving average filter is
$$\hat m_t = \tfrac{1}{5}\big(X_{t-2} + X_{t-1} + X_t + X_{t+1} + X_{t+2}\big).$$
The above method makes use of time points both before and after the current time $t$. This can be problematic when we reach the end of the series, and from a forecasting point of view. Setting nsides=1 in the convolution_filter function allows for using only time points before the current time.
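A minimal sketch with convolution_filter, assuming an equally weighted 5-point filter on a simulated series:

import numpy as np
from statsmodels.tsa.filters.filtertools import convolution_filter

rng = np.random.default_rng(2)
t = np.arange(1, 201)
x = 0.01 * t + rng.normal(size=200)

w = np.ones(5) / 5                                       # equal weights w_i = 1/5
m_hat_centered = convolution_filter(x, w, nsides=2)      # centered filter: NaNs at both ends of the series
m_hat_onesided = convolution_filter(x, w, nsides=1)      # one-sided version: does not use future observations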
An alternative method is to use an exponential smoothing filter given by
$$\hat m_t = \alpha X_t + (1-\alpha)\,\hat m_{t-1}, \qquad t \ge 2, \quad \hat m_1 = X_1,$$
for some $\alpha \in (0, 1)$. Empirical experience suggests that an $\alpha$ chosen between 0.1 and 0.3 works well.
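A minimal NumPy sketch of this recursion; the simulated series and the choice $\alpha = 0.2$ are illustrative:

import numpy as np

def exp_smooth(x, alpha=0.2):
    """Exponential smoothing: m[0] = x[0], m[t] = alpha*x[t] + (1-alpha)*m[t-1]."""
    m = np.empty(len(x), dtype=float)
    m[0] = x[0]
    for t in range(1, len(x)):
        m[t] = alpha * x[t] + (1 - alpha) * m[t - 1]
    return m

rng = np.random.default_rng(3)
x = 0.05 * np.arange(300) + rng.normal(size=300)   # simulated series with a mild trend
m_hat = exp_smooth(x, alpha=0.2)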

Fitting Seasonality

Suppose we have a detrended series $D_t = s_t + Y_t$, where $Y_t$ is a noise process and $s_t$ is a seasonal component such that $s_{t+d} = s_t$ for some period $d$. If the seasonality stands out from the noise, then $D_t D_{t+h}$ is expected to be larger whenever $s_t s_{t+h}$ is large. The sample autocorrelations at lag $d$, lag $2d$, … are therefore expected to be the largest other than the autocorrelation at lag 0.
In the presence of seasonality, fitting trend and seasonality are intertwined. The trend and seasonality may not be separated without any assumptions. Thus, we make the assumption that
$$s_{t+d} = s_t \quad \text{for all } t.$$
$d$ is called the period or seasonal index of the time series. Most commonly $d = 7$ (daily data) or $d = 12$ (monthly data).
Define the detrended series as $D_t = X_t - \hat m_t$. The seasonality can be estimated by averaging the detrended values that share the same seasonal position,
$$\hat s_j = \frac{1}{n_j}\sum_{k \ge 0 \,:\, j+kd \le n} D_{j+kd}, \qquad n_j = \#\{k \ge 0 : j + kd \le n\},$$
for $j = 1, \dots, d$, extended periodically by $\hat s_{j+d} = \hat s_j$. And then define the noise estimate $\hat Y_t = X_t - \hat m_t - \hat s_t$.
For the multiplicative decomposition, estimate the trend $\hat m_t$, define the detrended series $D_t = X_t / \hat m_t$, estimate the seasonality $\hat s_t$ from $D_t$ in the same way, and take the noise estimate $\hat Y_t = X_t / (\hat m_t \hat s_t)$.
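A minimal NumPy sketch of these seasonal averages; the detrended series D and the period d = 12 are illustrative, and estimate_seasonality is a hypothetical helper, not a library function:

import numpy as np

def estimate_seasonality(D, d):
    """Average the detrended values at each seasonal position j = 0, ..., d-1,
    then repeat the d estimates to the length of the series."""
    D = np.asarray(D, dtype=float)
    s_hat = np.array([D[j::d].mean() for j in range(d)])
    return np.tile(s_hat, len(D) // d + 1)[:len(D)]

rng = np.random.default_rng(4)
t = np.arange(240)
D = 2 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=240)  # detrended series
s_hat = estimate_seasonality(D, d=12)
Y_hat = D - s_hat                                                     # noise estimate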

Stationarity

With trend and seasonality being deterministic functions of time $t$, a good forecast of $X_t$ requires modeling the noise $Y_t$ well. Given the estimates of trend and seasonality, it suffices to get a good forecast $\hat Y_{n+h}$ of $Y_{n+h}$. Then the forecast for $X_{n+h}$ is given by
$$\hat X_{n+h} = \hat m_{n+h} + \hat s_{n+h} + \hat Y_{n+h}.$$
Note that we can predict $Y_{n+h}$ using $Y_1, \dots, Y_n$ only because of the temporal dependence. If the sequence were IID, we could not predict the future, because the future is independent of the past.
Basics of Linear Prediction
Suppose we have a random variable $Y$ to be predicted based on other random variables $X_1, \dots, X_k$. With respect to the squared error loss, the best linear prediction of $Y$ is given by $\hat Y = \alpha + \sum_{j=1}^{k}\beta_j X_j$, where
$$(\alpha, \beta) = \arg\min_{a,\, b}\; E\Big(Y - a - \sum_{j=1}^{k} b_j X_j\Big)^2.$$
Equivalently, $\beta$ and $\alpha$ can be written as
$$\beta = \big[\mathrm{Cov}(X)\big]^{-1}\mathrm{Cov}(X, Y)$$
and
$$\alpha = E[Y] - \sum_{j=1}^{k}\beta_j E[X_j],$$
where $\mathrm{Cov}(X)$ is the $k \times k$ covariance matrix of $(X_1, \dots, X_k)$ and $\mathrm{Cov}(X, Y)$ is the vector with entries $\mathrm{Cov}(X_j, Y)$.
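A small sketch of these formulas with sample moments plugged in for the population ones (the simulated data are illustrative):

import numpy as np

rng = np.random.default_rng(5)
n, k = 500, 3
X = rng.normal(size=(n, k))
Y = 1.0 + X @ np.array([0.5, -0.2, 0.3]) + rng.normal(scale=0.5, size=n)

cov_X = np.cov(X, rowvar=False)                                   # k x k sample covariance of the predictors
cov_XY = np.array([np.cov(X[:, j], Y)[0, 1] for j in range(k)])   # sample Cov(X_j, Y)
beta = np.linalg.solve(cov_X, cov_XY)                             # beta = Cov(X)^{-1} Cov(X, Y)
alpha = Y.mean() - beta @ X.mean(axis=0)                          # alpha = E[Y] - sum_j beta_j E[X_j]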
When we have iid observations, the expectations and covariances can be estimated using sample averages. However, the mean vector needs $k+1$ estimates while the covariance matrix needs $(k+1)(k+2)/2$ estimates. The computational cost can be expensive.
One solution to this issue is to apply the stationarity:
  • Strong Stationarity:
    • the entire probability distribution of the process is invariant under time shifts
    • the joint distribution of any collection of values is the same no matter when we observe them
    • this is a very strong condition, and often hard to check in practice
  • Weak Stationarity: (three conditions)
    • constant mean: $E[X_t] = \mu$ for all $t$
    • constant variance: $\mathrm{Var}(X_t) = \sigma^2 < \infty$ for all $t$
    • autocovariance depends only on the lag $h$, not on the actual time $t$: $\mathrm{Cov}(X_t, X_{t+h})$ is the same for all $t$
    • Weak stationarity is easier to check from data using sample mean, variance and autocorrelation function (ACF)
When we have weak stationarity, define
$$\gamma(h) = \mathrm{Cov}(X_t, X_{t+h}) \quad\text{and}\quad \rho(h) = \frac{\gamma(h)}{\gamma(0)}.$$
The map $h \mapsto \gamma(h)$ is called the autocovariance function while the map $h \mapsto \rho(h)$ is called the autocorrelation function.
Under weak stationarity, the mean and autocovariance function can be estimated by
$$\hat\mu = \bar X_n = \frac{1}{n}\sum_{t=1}^{n} X_t \quad\text{and}\quad \hat\gamma(h) = \frac{1}{n}\sum_{t=1}^{n-h}\big(X_t - \bar X_n\big)\big(X_{t+h} - \bar X_n\big).$$
One can also use the denominator $n - h$ instead of $n$, which does not make a difference asymptotically. statsmodels.tsa.stattools.acovf has a parameter adjusted: when it’s False the function uses $n$; when it’s True, the function uses $n - h$. An estimate of the autocorrelation function follows from the estimate of the autocovariance function, $\hat\rho(h) = \hat\gamma(h)/\hat\gamma(0)$, and is given by statsmodels.tsa.stattools.acf.
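A minimal usage sketch (simulated data; the lag cutoff 10 is an illustrative choice):

import numpy as np
from statsmodels.tsa.stattools import acovf, acf

rng = np.random.default_rng(6)
x = rng.normal(size=500)

gamma_hat = acovf(x, adjusted=False, nlag=10, fft=False)   # denominator n
gamma_adj = acovf(x, adjusted=True, nlag=10, fft=False)    # denominator n - h
rho_hat = acf(x, nlags=10)                                 # gamma_hat(h) / gamma_hat(0)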
Weak Stationarity and Prediction
Given time series data $X_1, \dots, X_n$, we can compute $\hat\mu$ and $\hat\gamma(h)$ (equivalently $\hat\rho(h)$). Suppose we predict $X_{n+1}$ from the $k$ most recent observations $X_{n-k+1}, \dots, X_n$. This allows us to estimate the best linear prediction coefficients $\beta$ and $\alpha$ as
$$\hat\beta = \hat\Gamma_k^{-1}\hat\gamma_k, \qquad \hat\Gamma_k = \big[\hat\gamma(|i-j|)\big]_{i,j=1}^{k}, \qquad \hat\gamma_k = \big(\hat\gamma(k), \dots, \hat\gamma(1)\big)^\top,$$
and
$$\hat\alpha = \hat\mu\Big(1 - \sum_{j=1}^{k}\hat\beta_j\Big).$$
And thus the best linear prediction of $X_{n+1}$ is
$$\hat X_{n+1} = \hat\alpha + \sum_{j=1}^{k}\hat\beta_j X_{n-k+j}.$$
If we observe $X_{n+1}$, we can update $\hat\mu$ to $\hat\mu'$ and $\hat\gamma(h)$ to $\hat\gamma'(h)$ using the $n+1$ observations. Then a forecast for $X_{n+2}$ is
$$\hat X_{n+2} = \hat\alpha' + \sum_{j=1}^{k}\hat\beta_j' X_{n+1-k+j}.$$
The right hand side involves $X_{n+1}$ in two ways:
  • updating the coefficients $\hat\alpha$ and $\hat\beta$ to $\hat\alpha'$ and $\hat\beta'$
  • appearing as one of the inputs $X_{n+2-k}, \dots, X_{n+1}$ (still $k$ numbers)
Remarks:
  • It might be beneficial to use a $k$ that changes over time
  • On face value, the forecast for $X_{n+1}$ only depends on $X_{n-k+1}, \dots, X_n$, but in obtaining the coefficient estimates we use all of the data
  • This is a model-free linear prediction that is valid under the weak stationarity assumption. It can be combined with conformal prediction intervals for uncertainty quantification
import numpy as np
from statsmodels.tsa.stattools import acf

class TimeSeriesOLS():
    def __init__(self, Xt, alpha=None, beta=None):
        self.alpha = alpha
        self.beta = beta
        self.Xt = np.asarray(Xt, dtype=float)

    def fit(self, T):
        # estimate the mean and the sample autocorrelations up to lag T+1
        self.mu = self.Xt.mean()
        rho_hat = acf(self.Xt, nlags=T+1, adjusted=True)
        # (T+1) x (T+1) autocorrelation matrix of the inputs X_{n-T}, ..., X_n
        autocorr = np.zeros((T+1, T+1))
        for col in range(T+1):
            for row in range(T+1):
                autocorr[row, col] = rho_hat[abs(row-col)]
        # beta solves the linear prediction equations; [::-1] because the inputs are
        # ordered from the oldest (lag T+1 from the target) to the most recent (lag 1)
        self.beta = np.linalg.inv(autocorr) @ rho_hat[1:][::-1]
        self.alpha = self.mu * (1 - self.beta @ np.ones(T+1))

    def predict(self, pred_len):
        # recursive multi-step forecast: feed each prediction back in as the newest input
        preds = np.zeros(pred_len)
        N = self.beta.shape[0]
        input_vec = self.Xt[-N:]
        for i in range(pred_len):
            preds[i] = self.alpha + self.beta @ input_vec
            input_vec = np.r_[input_vec[1:], preds[i]]
        return preds
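A usage sketch for the class above on a simulated AR(1) series; the data and the choice T = 5 are illustrative:

import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t-1] + rng.normal()   # AR(1) with coefficient 0.6

model = TimeSeriesOLS(x)
model.fit(T=5)                           # use the last 6 observations as predictors
forecasts = model.predict(pred_len=10)   # recursive 10-step-ahead forecasts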
Weak Stationarity and Asymptotics
Under iid data, we have several useful results, such as the law of large numbers and the central limit theorem, that yield confidence intervals and hypothesis tests. Under dependence, stationarity provides a framework to derive similarly nice asymptotic results.
Law of Large Numbers: Suppose $(X_t)$ is weakly stationary with mean $\mu$ and autocovariance function $\gamma(\cdot)$. If $\frac{1}{n}\sum_{h=0}^{n-1}|\gamma(h)| \to 0$ as $n \to \infty$, then
$$\bar X_n = \frac{1}{n}\sum_{t=1}^{n} X_t \;\longrightarrow\; \mu \quad \text{in mean square (and hence in probability).}$$
The assumption $\frac{1}{n}\sum_{h=0}^{n-1}|\gamma(h)| \to 0$ holds, in particular, whenever the autocovariance dies off between time points that are more and more separated, i.e., $\gamma(h)$ converges to zero as $h \to \infty$.
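A short derivation sketch of why such a condition suffices, using only weak stationarity:
$$\mathrm{Var}(\bar X_n) = \frac{1}{n^2}\sum_{s=1}^{n}\sum_{t=1}^{n}\mathrm{Cov}(X_s, X_t) = \frac{1}{n}\sum_{|h| < n}\Big(1 - \frac{|h|}{n}\Big)\gamma(h) \le \frac{2}{n}\sum_{h=0}^{n-1}|\gamma(h)| \;\longrightarrow\; 0,$$
so $E(\bar X_n - \mu)^2 = \mathrm{Var}(\bar X_n) \to 0$, which is exactly convergence in mean square.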

White Noise Process

The backbone of much of basic statistics is IID data, in which consecutive random variables are independent. In time series, an analogue of IID data is called a white noise process, where the consecutive random variables are only required to be uncorrelated, rather than independent.
Formally, a white noise (WN) process is a weakly stationary process with mean 0 and autocovariance function given by
$$\gamma(h) = \begin{cases}\sigma^2 & h = 0,\\ 0 & h \ne 0.\end{cases}$$
The autocorrelation function is then $\rho(h) = 1$ if $h = 0$ and $\rho(h) = 0$ otherwise. A white noise process is denoted by $\mathrm{WN}(0, \sigma^2)$.

Martingale Difference Sequences

Consider a time series $(X_t)$. Define the filtration $\mathcal{F}_t = \sigma(X_1, \dots, X_t)$. A sequence $(X_t)$ is said to be a martingale difference sequence (MDS) if $E[X_t \mid \mathcal{F}_{t-1}] = 0$ for all $t$. This implies $E[X_t] = 0$ for any $t$.
Since $E[X_t \mid \mathcal{F}_{t-1}] = 0$ for all $t$, for any $s < t$ we have
$$E[X_s X_t] = E\big[X_s\, E[X_t \mid \mathcal{F}_{t-1}]\big] = 0,$$
because $X_s$ is $\mathcal{F}_{t-1}$-measurable.
Therefore, the martingale difference sequence has no autocorrelation, and an MDS with a constant, finite variance is a white noise process.
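An illustrative example (not from the notes): if $Z_t$ are iid $N(0,1)$ and $X_t = Z_t Z_{t-1}$, then $E[X_t \mid \sigma(Z_s, s \le t-1)] = Z_{t-1} E[Z_t] = 0$, so $(X_t)$ is an MDS and hence white noise, yet $X_t$ and $X_{t+1}$ are dependent (they share $Z_t$). A quick simulation sketch:

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(8)
z = rng.normal(size=5001)
x = z[1:] * z[:-1]                              # X_t = Z_t * Z_{t-1}: uncorrelated but not independent

print(acf(x, nlags=5))                          # sample autocorrelations near 0 at all lags >= 1
print(np.corrcoef(x[1:]**2, x[:-1]**2)[0, 1])   # squared series is correlated, revealing the dependence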

Test for White Noise

If a time series is white noise, then the immediate past does not lead to a better linear prediction of the future; the best linear forecast of the future is just the mean, zero. If we model a time series and arrive at residuals which are white noise, then no further linear modeling is needed to obtain a better forecast. It is thus important to test whether a time series is white noise.
For a stationary time series $(X_t)$, the formal hypothesis of white noise is
$$H_0: \rho(h) = 0 \ \text{ for all } h \ge 1.$$
Because of the difficulty in estimating an infinite number of autocorrelations with finite data, we can restrict to testing
$$H_0: \rho(1) = \rho(2) = \cdots = \rho(K) = 0$$
for some $K \ge 1$.
Bartlett’s Formula
Bartlett assumed that the time series is a linear process,
$$X_t = \mu + \sum_{j=0}^{\infty}\psi_j Z_{t-j},$$
where $\sum_{j}|\psi_j| < \infty$ and $(Z_t)$ is iid with $E[Z_t] = 0$, $E[Z_t^2] = \sigma^2$ and $E[Z_t^4] < \infty$.
In this case,
$$\sqrt{n}\,\big(\hat\rho(1) - \rho(1), \dots, \hat\rho(K) - \rho(K)\big)^\top \;\xrightarrow{d}\; N(0, W),$$
where $W$ is a covariance matrix with entries given by Bartlett’s formula,
$$W_{ij} = \sum_{k=1}^{\infty}\big[\rho(k+i) + \rho(k-i) - 2\rho(i)\rho(k)\big]\big[\rho(k+j) + \rho(k-j) - 2\rho(j)\rho(k)\big].$$
If $X_t$ are iid with mean zero and variance $\sigma^2$, then $\rho(h) = 0$ for all $h \ge 1$ and hence $W = I$, the identity matrix. Therefore, a 95% confidence interval for $\rho(h)$ is $\hat\rho(h) \pm 1.96/\sqrt{n}$, i.e., under the iid hypothesis the sample autocorrelation $\hat\rho(h)$ should fall within the band $\pm 1.96/\sqrt{n}$ about 95% of the time.
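A minimal sketch of checking sample autocorrelations against the $\pm 1.96/\sqrt{n}$ band (simulated iid data; 20 lags is an illustrative choice):

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(9)
x = rng.normal(size=1000)
n = len(x)

rho_hat = acf(x, nlags=20)
band = 1.96 / np.sqrt(n)                                 # 95% band under the iid null
outside = np.where(np.abs(rho_hat[1:]) > band)[0] + 1    # lags whose autocorrelation leaves the band
print(f"lags outside +/- {band:.3f}:", outside)          # expect roughly 1 in 20 by chance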
Box-Pierce test
A simultaneous test of whether a sample autocorrelation function comes from an iid white noise process was first developed by Box and Pierce. For a data set of size $n$, the Box-Pierce test statistic is
$$Q_{BP} = n\sum_{h=1}^{K}\hat\rho(h)^2.$$
Under $H_0$, the test statistic has an asymptotic $\chi^2_K$ distribution. Hence, the test at level $\alpha$ rejects $H_0$ if $Q_{BP} > \chi^2_{K, 1-\alpha}$, the $(1-\alpha)$ quantile of $\chi^2_K$.
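A hand computation sketch of the statistic and rejection rule (simulated data; $K = 10$ and level 0.05 are illustrative):

import numpy as np
from scipy import stats
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(10)
x = rng.normal(size=1000)
n, K = len(x), 10

rho_hat = acf(x, nlags=K)[1:]       # sample autocorrelations at lags 1, ..., K
Q_bp = n * np.sum(rho_hat**2)       # Box-Pierce statistic
crit = stats.chi2.ppf(0.95, df=K)   # chi^2_K critical value at level 0.05
print(Q_bp, crit, Q_bp > crit)      # True means reject the white noise null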
The test is available via statsmodels.stats.diagnostic.acorr_ljungbox with the argument boxpierce=True.
import numpy as np
import statsmodels.api as sm

np.random.seed(2025)
x = np.random.normal(size=1000, loc=0, scale=1)   # simulated iid N(0, 1) data

# Box-Pierce and Ljung-Box statistics and p-values at lags 1, 2, 3
lags_box = [1, 2, 3]
sm.stats.acorr_ljungbox(x, lags=lags_box, return_df=True, boxpierce=True)
Ljung-Box test
Ljung and Box proposed an improved test statistic that is better approximated by the $\chi^2_K$ distribution than the Box-Pierce test statistic:
$$Q_{LB} = n(n+2)\sum_{h=1}^{K}\frac{\hat\rho(h)^2}{n-h}.$$
This is available via statsmodels.stats.diagnostic.acorr_ljungbox with boxpierce=False .
