
FTS-1 Conformal Prediction

Time Series Basics

Notations
A time series, denoted by $\{X_t\}_{t \in T}$, can be discrete or continuous in terms of the domain $T$:
  • if $T \subseteq \mathbb{Z}$ (e.g. $T = \{1, 2, 3, \ldots\}$), it is called a discrete time series
  • if $T = [0, \infty)$ or $T = \mathbb{R}$, it is called a continuous time series
We only focus on discrete time series in this course.
Differences from i.i.d. Data
Traditional statistics deals with i.i.d. data, where we have laws of large numbers, central limit theorems, and hypothesis tests and confidence intervals following from the CLT.
If $X_1, X_2, \ldots$ are i.i.d. random variables, then $\mathbb{E}[X_t]$ and $\mathrm{Var}(X_t)$ stay the same for all $t$, and $\mathrm{Cov}(X_s, X_t) = 0$ for all $s \neq t$.
If we just have independent (not identically distributed) random variables, then $\mathbb{E}[X_t]$ and $\mathrm{Var}(X_t)$ may not be the same for all $t$, but we still have $\mathrm{Cov}(X_s, X_t) = 0$ for $s \neq t$.
The main distinguishing feature of time series is temporal dependence, i.e. $\mathrm{Cov}(X_s, X_t) \neq 0$ for some $s \neq t$.
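A quick way to see temporal dependence numerically is to compare the lag-1 sample autocorrelation of i.i.d. noise with that of a simple autoregressive series. This is an illustrative sketch only; the AR(1) coefficient 0.8 and the sample size are arbitrary choices, not values from these notes.

import numpy as np

rng = np.random.default_rng(0)
n = 1000

# i.i.d. noise: no temporal dependence
iid = rng.normal(size=n)

# AR(1) series X_t = 0.8 * X_{t-1} + noise: strong temporal dependence
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + rng.normal()

def lag1_autocorr(x):
    # sample correlation between (X_1, ..., X_{n-1}) and (X_2, ..., X_n)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

print(lag1_autocorr(iid))   # close to 0
print(lag1_autocorr(ar1))   # close to 0.8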
The main goal of time series analysis is to predict, or forecast, what happens in the future; most time series modeling is ultimately in service of good forecasting. We observe the time series up to time $t$ and predict what $X_{t+1}$ would be, or what $X_{t+h}$ would be for some horizon $h \geq 1$.

Conformal Prediction

Conformal prediction is a generic method for constructing prediction intervals around any given forecast.
We observe a time series $X_1, X_2, \ldots$, taking values in $\mathbb{R}$. The prediction-interval problem is to construct $L_{t+1}$ and $U_{t+1}$ based on $X_1, \ldots, X_t$ such that
$$\mathbb{P}\big(L_{t+1} \le X_{t+1} \le U_{t+1}\big) \ge 1 - \alpha.$$
If we knew the distribution of $X_{t+1}$, then $L_{t+1}$ and $U_{t+1}$ could be obtained using its $\alpha/2$ and $1 - \alpha/2$ quantiles. But in general, we do not know the distribution.
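For instance, if the next value were known to be normal with known mean and variance, the endpoints would just be the two quantiles. A minimal sketch, where the $N(2, 1.5^2)$ distribution and $\alpha = 0.1$ are made-up numbers for illustration:

import scipy.stats as st

alpha = 0.1
mu, sigma = 2.0, 1.5   # pretend we know X_{t+1} ~ N(mu, sigma^2)

lower = st.norm.ppf(alpha / 2, loc=mu, scale=sigma)       # alpha/2 quantile
upper = st.norm.ppf(1 - alpha / 2, loc=mu, scale=sigma)   # 1 - alpha/2 quantile

print(lower, upper)   # contains X_{t+1} with probability 1 - alpha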
Let $s_{t+1}$ be a real-valued map, i.e. $s_{t+1}(x) \in \mathbb{R}$ for all $x$. This is called a score, measuring how far our forecast of $X_{t+1}$ is from $x$. For example, if $\hat{X}_{t+1}$ is a forecast for $X_{t+1}$ based on the data $X_1, \ldots, X_t$, then we can take the non-conformity score as
$$s_{t+1}(x) = \big|x - \hat{X}_{t+1}\big|.$$
The non-conformity score often involves a forecast, which can be computed in many ways. One simple-minded forecast is the running average
$$\hat{X}_{t+1} = \frac{1}{t}\sum_{k=1}^{t} X_k.$$
If we can find $q_t$ such that
$$\mathbb{P}\big(s_{t+1}(X_{t+1}) \le q_t\big) \ge 1 - \alpha,$$
then
$$\mathbb{P}\big(X_{t+1} \in \{x : s_{t+1}(x) \le q_t\}\big) \ge 1 - \alpha,$$
i.e. the set $\{x : s_{t+1}(x) \le q_t\}$ is a valid level-$(1-\alpha)$ prediction set. For example, with $s_{t+1}(x) = |x - \hat{X}_{t+1}|$, this becomes the interval $[\hat{X}_{t+1} - q_t,\ \hat{X}_{t+1} + q_t]$.
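In practice, $q_t$ is often taken to be an empirical quantile of the scores observed so far. A minimal sketch of this idea; the synthetic data, the running-average forecast, and the use of the $(1-\alpha)$ empirical quantile here are illustrative choices, not a prescription from the notes.

import numpy as np

alpha = 0.1
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)   # observed series X_1, ..., X_t

# running-average forecast for X_{t+1}
forecast = x.mean()

# non-conformity scores of past points against their one-step-ahead running averages
scores = np.array([abs(x[k] - x[:k].mean()) for k in range(1, len(x))])

# empirical (1 - alpha) quantile of the past scores
q = np.quantile(scores, 1 - alpha)

lower, upper = forecast - q, forecast + q
print(lower, upper)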
We cannot obtain such a $q_t$ without adequate knowledge of how the time series is generated. Classical time series methods make stringent assumptions to make this possible. Conformal prediction, on the other hand, yields a weaker but still useful guarantee without making any such assumptions.

PID Vanilla

Consider the scheme
$$q_{t+1} = q_t + \eta_t\,(\mathrm{err}_t - \alpha), \qquad \mathrm{err}_t = \mathbb{1}\{s_t > q_t\},$$
where $s_t$ is the non-conformity score at time $t$ and $\eta_t > 0$ is a step size.
Only assume that for all $t$, we have $0 \le s_t \le B$ for some finite constant $B$.
Moreover, for a constant step size $\eta_t \equiv \eta$ and any $1 \le T_1 < T_2$,
$$\left|\frac{1}{T_2 - T_1}\sum_{t=T_1+1}^{T_2}\big(\mathrm{err}_t - \alpha\big)\right| \le \frac{B + \eta}{\eta\,(T_2 - T_1)}.$$
The average miscoverage error on any time interval is small whenever the time interval is long enough.
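To get a feel for the rate, plug in some illustrative numbers ($B = 10$ and $\eta = 1$ are made-up values, not from the notes): over a window of length $T_2 - T_1 = 200$,
$$\frac{B + \eta}{\eta\,(T_2 - T_1)} = \frac{10 + 1}{1 \times 200} = 0.055,$$
so the realized miscoverage rate on that window can deviate from $\alpha$ by at most $0.055$, and the slack shrinks like $1/(T_2 - T_1)$ as the window grows.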
A decent choice of $\eta_t$ in practice is
$$\eta_t = 0.1 \times \max_{t/2 \le k \le t} s_k,$$
i.e. one-tenth of the largest score observed over the most recent half of the data.
import numpy as np
import pandas as pd


def make_forecast(df, target, method):
    # One-step-ahead forecast of the target column from the data observed so far.
    if method == 'running_average':
        return df[target].mean()
    if method == 'least_squares':
        # AR(1) fit without intercept: theta_hat = sum X_k X_{k-1} / sum X_k^2
        numerator = 0
        denominator = 0
        for k in range(1, len(df)):
            numerator += df[target].iloc[k] * df[target].iloc[k - 1]
        for k in range(len(df)):
            denominator += df[target].iloc[k] ** 2
        theta_hat = numerator / denominator
        return df[target].iloc[len(df) - 1] * theta_hat


def select_step_size(scores, t, method):
    # Step size eta_t: 10% of the largest score over the most recent half of the scores.
    if method == 'scaled_max_score':
        return 0.1 * np.max(scores[t // 2:])


def conformal_pid_vanilla(df, alpha, target, forecast_method, step_size_method):
    errs, scores, forecasts = [], [], []
    lowers, uppers, widths = [], [], []
    trailing10_coverages, trailing20_coverages = [], []
    obs_so_far = pd.DataFrame()
    n = len(df)
    for t in range(n - 1):
        new_obs = df.iloc[[t]]
        obs_so_far = pd.concat([obs_so_far, new_obs])
        if t == 0:
            quantile_t = 0
        else:
            # score of the newly observed point against the previous forecast
            score_t = abs(new_obs[target].iloc[0] - forecast_t)
            scores.append(score_t)
            err = score_t > quantile_t   # 1 if the last prediction interval missed
            errs.append(err)
            eta_t = select_step_size(scores, t, step_size_method)
            # vanilla (proportional) update: q_{t+1} = q_t + eta_t * (err_t - alpha)
            quantile_t = quantile_t + eta_t * (err - alpha)
        width_t = 2 * quantile_t
        trailing10_coverages_t = 1 - np.mean(errs[-10:]) if errs else np.nan
        trailing20_coverages_t = 1 - np.mean(errs[-20:]) if errs else np.nan
        # forecast for X_{t+1} based on X_1, ..., X_t, and the interval around it
        forecast_t = make_forecast(obs_so_far, target, forecast_method)
        forecasts.append(forecast_t)
        lowers.append(forecast_t - quantile_t)
        uppers.append(forecast_t + quantile_t)
        widths.append(width_t)
        trailing10_coverages.append(trailing10_coverages_t)
        trailing20_coverages.append(trailing20_coverages_t)
    # attach the one-step-ahead results to the observations they predict (X_2, ..., X_n)
    df = df.iloc[1:].copy()
    df['1-step-ahead forecast'] = forecasts
    df['lower'] = lowers
    df['upper'] = uppers
    df['width'] = widths
    df['trailing10_coverage'] = trailing10_coverages
    df['trailing20_coverage'] = trailing20_coverages
    return df
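A quick way to exercise the function is on a synthetic series. In this sketch, the noisy-constant data, the column name 'y', and $\alpha = 0.1$ are all made-up inputs for illustration.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'y': 10 + rng.normal(scale=2.0, size=300)})   # synthetic series

out = conformal_pid_vanilla(df, alpha=0.1, target='y',
                            forecast_method='running_average',
                            step_size_method='scaled_max_score')

# empirical coverage of the one-step-ahead intervals
covered = (out['y'] >= out['lower']) & (out['y'] <= out['upper'])
print(covered.mean())   # should settle near 1 - alpha = 0.9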

Coverage Guarantee

PID Vanilla has the limitation of requiring a finite score bound $B$ for validity. With $B = \infty$, the coverage bound above is actually infinity, i.e. vacuous.
Recall that this method uses the iteration
$$q_{t+1} = q_t + \eta\,(\mathrm{err}_t - \alpha).$$
Unrolling the recursion (with $q_1 = 0$), this can be rewritten as
$$q_{t+1} = \eta \sum_{s=1}^{t} \big(\mathrm{err}_s - \alpha\big).$$
A simple modification of this iteration can yield a bound on miscoverage without the boundedness requirement:
$$q_{t+1} = r_t\!\left(\sum_{s=1}^{t} \big(\mathrm{err}_s - \alpha\big)\right),$$
where $r_t(x) = B \tan\!\left(\frac{x \log t}{t\,C}\right)$ for some constants $B, C > 0$. Here $B$ is a typical value of the score $s_t$, and $C$ can be taken to be 5 or 6 for time series that are not too long. This is the PID Tangent Integrator.
The updates of the quantiles should then be
$$q_{t+1} = B \tan\!\left(\frac{\log t}{t\,C}\sum_{s=1}^{t} \big(\mathrm{err}_s - \alpha\big)\right).$$
def conformal_pid_tangent(df, alpha, target, forecast_method):
    ...  # same bookkeeping as in conformal_pid_vanilla (lists, obs_so_far, etc.)
    X0 = df[target].iloc[0]
    B = max(5 * abs(X0), 10)   # typical score magnitude
    C = 6
    n = len(df)
    for t in range(n - 1):
        ...  # observe the next point, as in conformal_pid_vanilla
        if t == 0:
            quantile_t = 0
        else:
            score_t = abs(new_obs[target].iloc[0] - forecast_t)
            err = score_t > quantile_t
            errs.append(err)
            errs_alpha = [error - alpha for error in errs]
            # tangent-integrator update: q_{t+1} = B * tan(log(t) / (t*C) * sum_s (err_s - alpha))
            quantile_t = B * np.tan((np.log(t) / (t * C)) * np.sum(errs_alpha))
        ...  # forecast, interval, and trailing-coverage bookkeeping as before
    return df
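The tangent saturation is what lets the quantile react strongly once the running coverage error drifts: the update is nearly linear in the accumulated error when it is small, and grows sharply as the error approaches its extreme. A small standalone sketch of the map, where $B = 10$, $C = 6$, and $t = 100$ are illustrative values:

import numpy as np

B, C, t = 10, 6, 100
scale = np.log(t) / (t * C)

# accumulated coverage error sum_s (err_s - alpha) after t steps;
# it can range roughly from -alpha*t to (1 - alpha)*t
for total_err in [-10, -1, 0, 1, 10, 50]:
    q_next = B * np.tan(scale * total_err)
    print(total_err, q_next)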
Remarks:
  • the conformal prediction sets are sequential in nature: each time we observe the time series at a new point, we update the prediction set for the next one.
  • the advantage of conformal prediction is that the miscoverage bound holds no matter what forecast is used.
PID Tangent Example
Take the forecast to be the running average
$$\hat{X}_{t+1} = \frac{1}{t}\sum_{k=1}^{t} X_k.$$
This says that on average $X_t$ varies around a constant.
Take the score $s_t = |X_t - \hat{X}_t|$ and set $q_1 = 0$. Take $B = \max(5|X_1|, 10)$ and $C = 6$. For $t = 1, \ldots, n-1$, consider the iteration
$$q_{t+1} = B \tan\!\left(\frac{\log t}{t\,C}\sum_{s=1}^{t} \big(\mathrm{err}_s - \alpha\big)\right), \qquad \mathrm{err}_s = \mathbb{1}\{s_s > q_s\}.$$
Return the prediction intervals $[\hat{X}_{t+1} - q_{t+1},\ \hat{X}_{t+1} + q_{t+1}]$.
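Putting the pieces together, a compact self-contained sketch of this example might look as follows. The synthetic data varying around the constant 10, the series length, and $\alpha = 0.1$ are illustrative assumptions, not part of the notes.

import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1
x = 10 + rng.normal(scale=2.0, size=500)   # data varying around a constant

n = len(x)
B = max(5 * abs(x[0]), 10)
C = 6

q = 0.0            # q_1 = 0
errs = []
covered = []
for t in range(1, n):
    forecast = x[:t].mean()                    # running-average forecast of X_{t+1}
    lower, upper = forecast - q, forecast + q
    covered.append(lower <= x[t] <= upper)

    score = abs(x[t] - forecast)               # non-conformity score
    errs.append(score > q)                     # err_t
    # tangent-integrator update for the next quantile
    q = B * np.tan((np.log(t) / (t * C)) * np.sum([e - alpha for e in errs]))

print(np.mean(covered))   # long-run coverage should settle near 1 - alpha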
