Gaussian process regression as an alternative to continuous-time autoregression

For small datasets, Gaussian process regression can be just as effective.

Ben Trueman
2024-09-06

Recently, I’ve been learning about Gaussian processes, and it’s become clear that they offer an alternative to something I’ve spent a fair amount of time on: modeling time series as the sum of one or more smoothing splines and a continuous-time autoregression—see my bgamcar1 package for examples.

I’ll start this post with some simulations that helped me understand the basics of Gaussian processes, and then I’ll fit one to some (simulated) data and compare it with a continuous-time autoregression.

We’ll need the following packages:
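
At a minimum, something like this covers the functions used below (mvtnorm is called via ::, so it doesn’t need attaching):

library(tidyverse) # tibble(), crossing(), map(), slice_sample(), ggplot2, ...
library(withr) # with_seed()
library(brms) # Gaussian process regression
library(here) # file paths
library(bgamcar1) # fit_stan_model(), for the continuous-time autoregression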

The basic idea behind Gaussian process regression is that we can estimate any continuous function as a draw from an \(N\)-dimensional multivariate normal, where \(N\) is the number of observations. For an isotropic Gaussian process, the covariance matrix of the multivariate normal is fully determined by the distances between all pairs of observations, via a covariance function. A popular choice is the exponentiated quadratic: for \(D\)-dimensional observations \(x_1, ..., x_N \in \mathbb{R}^D\), the covariance between \(x_i\) and \(x_j\) is calculated as

\[ K(x \mid \sigma, \rho, \alpha)_{i,j} = \alpha ^ 2 \exp \left( -\frac{1}{2 \rho ^ 2} \sum_{d = 1}^D(x_{i,d} - x_{j,d}) ^ 2 \right) + \sigma ^ 2 \delta_{i,j} \]

where \(\sigma\) is the residual standard deviation (the noise term), \(\rho\) is the length-scale (larger values give smoother functions), \(\alpha\) is the marginal standard deviation (the amplitude of the process), and \(\delta_{i,j}\) is the Kronecker delta, so \(\sigma^2\) is added only along the diagonal. See the Gaussian processes section of the Stan manual for details.

This is what the covariance function looks like for a range of \(\rho\) values:
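
A sketch of the sort of code that would generate a plot like this, using a small helper (kernel(), defined here just for the plot), the \(\rho\) values from the chunk below, and \(\alpha = 1\):

kernel <- function(distance, rho, alpha = 1) {
  # exponentiated quadratic covariance (noise term omitted):
  alpha ^ 2 * exp(-distance ^ 2 / (2 * rho ^ 2))
}

crossing(distance = seq(0, 50, length.out = 200), rho = c(1, 2, 5, 10, 20)) |> 
  mutate(covariance = kernel(distance, rho)) |> 
  ggplot(aes(distance, covariance, col = factor(rho))) + 
  geom_line() + 
  labs(col = expression(rho))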

First, we’ll do a simulation in one dimension with the following parameters:

sigma <- 1e-4 # residual standard deviation (noise term)
rho <- c(1, 2, 5, 10, 20) # length-scales (larger values give smoother functions)
alpha <- 1 # marginal standard deviation (amplitude of the GP)

For each value of \(\rho\), we’ll simulate a multivariate normal draw with zero mean:

N <- 100
x <- seq(N)
distance_matrix <- outer(x, x, "-") 
covariance_matrix <- map(rho, ~ alpha ^ 2 * exp(-distance_matrix ^ 2 / (2 * .x ^ 2))) 
# add sigma squared to diagonal:
covariance_matrix <- covariance_matrix |> 
  map(~ .x + diag(rep(sigma ^ 2, nrow(.x))))
# simulate a multivariate normal draw:
y <- with_seed(12456, {map(covariance_matrix, ~ mvtnorm::rmvnorm(1, sigma = .x))}) 

The parameter \(\rho\) has an important effect on the kinds of functions a Gaussian process can model. In the figure below, realizations are on the left and the covariance matrices that generated them are on the right.
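
A rough sketch of the realization panels (the left-hand side of the figure), reusing the objects from the chunk above:

map2(y, rho, ~ tibble(x = x, y = drop(.x), rho = .y)) |> 
  bind_rows() |> 
  ggplot(aes(x, y)) + 
  geom_line() + 
  facet_wrap(vars(rho), labeller = label_both)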

Next, we’ll do a simulation in two dimensions. Since the covariance matrix depends only on the Euclidean distance between points, the code above generalizes easily (n.b., the distance matrix is already squared, so no need to square it when calculating the covariance matrix).

rho <- c(5, 10, 20)
N <- 35
x <- seq(N)
grid <- crossing(x1 = x, x2 = x)
# squared Euclidean distance in 2D:
distance_matrix <- outer(grid$x1, grid$x1, "-") ^ 2 + 
  outer(grid$x2, grid$x2, "-") ^ 2

Covariance matrices and multivariate normal draws are generated as above, and they should yield something like this:
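
Concretely, that amounts to something like this (note that distance_matrix isn’t squared again; the seed is a placeholder):

covariance_matrix <- map(rho, ~ alpha ^ 2 * exp(-distance_matrix / (2 * .x ^ 2))) |> 
  map(~ .x + diag(rep(sigma ^ 2, nrow(.x))))
y <- with_seed(12456, {map(covariance_matrix, ~ mvtnorm::rmvnorm(1, sigma = .x))})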

Now, on to fitting a Gaussian process! In brms, this is done by estimating the parameters of the covariance function as well as a latent standard normal parameter \(\eta_i \sim N(0, 1)\) for each data point. The \(\eta_i\) are then transformed to yield the target multivariate normal distribution, as described here.
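
Roughly, the transformation is a Cholesky factorization of the covariance matrix; here is a minimal sketch (not brms’s actual code):

# small example covariance matrix from the exponentiated quadratic kernel:
d <- outer(1:10, 1:10, "-")
K <- exp(-d ^ 2 / (2 * 5 ^ 2)) + diag(1e-9, 10) # jitter keeps chol() happy
eta <- rnorm(nrow(K)) # latent standard normal parameters
L_K <- t(chol(K)) # lower-triangular factor, K = L_K %*% t(L_K)
f <- drop(L_K %*% eta) # f is a draw from the multivariate normal with covariance K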

Posterior predictions at new predictor values are calculated analytically, and a function to do so is provided at the above link, under Analytical form of joint predictive inference.
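
In the usual notation, and assuming a zero-mean Gaussian process observed with noise variance \(\sigma^2\), that result is

\[ f_* \mid y \sim N \left( K_*^\top (K + \sigma^2 I)^{-1} y, \; K_{**} - K_*^\top (K + \sigma^2 I)^{-1} K_* \right) \]

where \(K\) is the covariance matrix among the observed inputs, \(K_*\) the covariance between the observed and new inputs, and \(K_{**}\) the covariance among the new inputs.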

Let’s simulate from a Gaussian process, add a linear trend, subsample from it to create an irregularly-spaced time series, and then fit (1) a continuous-time autoregressive model and (2) a Gaussian process regression model to the data.

First, we’ll simulate a one-dimensional Gaussian process like we did above, with the following parameters (and without a call to map(), since we only have one rho):

sigma <- 0.5 # residual standard deviation
rho <- 10 # length-scale
alpha <- 1 # marginal standard deviation
N <- 100 # number of observations before subsampling
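
The draw itself follows the earlier one-dimensional code, without map() (the seed below is a placeholder, not the one used for the figures):

x <- seq(N)
distance_matrix <- outer(x, x, "-")
covariance_matrix <- alpha ^ 2 * exp(-distance_matrix ^ 2 / (2 * rho ^ 2)) + 
  diag(rep(sigma ^ 2, N))
y <- with_seed(101, {mvtnorm::rmvnorm(1, sigma = covariance_matrix)})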

The autocorrelation function looks a lot like it came from a first-order autoregressive model:
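
The plot can be generated with something like:

acf(drop(y)) # sample autocorrelation of the simulated series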

Then, we’ll simulate the data:

beta_0 <- 5 # linear model intercept
beta_1 <- 0.1 # linear model slope

data_simulated <- tibble(
  time = x,
  y = beta_0 + beta_1 * time + t(y)[,1]
)

with_seed(4128128, {
  data_simulated <- slice_sample(data_simulated, prop = 0.5) |> 
    arrange(time) |> 
    mutate(d_x = replace_na(time - lag(time), 0)) # time since the previous observation, used for the continuous-time autoregression
})

Next, we’ll fit the continuous-time autoregressive model using my R package bgamcar1:

model_car1 <- fit_stan_model(
  file = here("_posts/2024-09-04-modeling-time-series-using-gaussian-processes/models/model-car1"),
  seed = 1245,
  bform = y ~ time + ar(time),
  bdata = data_simulated,
  bpriors = c(
    prior(normal(0, 1), class = b), 
    prior(normal(0.5, 2), class = ar, lb = 0, ub = 1)
  ),
  backend = "cmdstanr"
)

It does a good job of modeling the autocorrelation …

… and it recovers the linear model parameters reasonably well:

fixef(model_car1)
            Estimate  Est.Error       Q2.5     Q97.5
Intercept 5.13800326 0.73921859 3.64954625 6.6194333
time      0.09237404 0.02105802 0.04426472 0.1299154

Finally, we’ll fit the Gaussian process regression:

model_gp <- brm(
  y ~ time + gp(time),
  data = data_simulated,
  prior = prior(normal(0, 1), class = b),
  cores = 4,
  backend = "cmdstanr",
  seed = 214,
  control = list(adapt_delta = 0.999, max_treedepth = 14),
  file = here("_posts/2024-09-04-modeling-time-series-using-gaussian-processes/models/model-gp")
)

Unsurprisingly, it models the autocorrelation too:

And it recovers the linear model parameters:

fixef(model_gp)
           Estimate Est.Error       Q2.5     Q97.5
Intercept 4.2608885 1.2682394 1.68248725 6.8221698
time      0.1087675 0.0215588 0.06731881 0.1537618

Here are the predictions from the two models, superimposed on the data:
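
For the Gaussian process model, a sketch of how the predictions can be generated with fitted() (predictions for the CAR(1) model aren’t shown here):

preds_gp <- fitted(model_gp) |> # posterior means and 95% intervals, one row per observation
  as_tibble() |> 
  bind_cols(data_simulated)

ggplot(preds_gp, aes(time)) + 
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5), alpha = 0.3) + 
  geom_line(aes(y = Estimate)) + 
  geom_point(aes(y = y))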

The Gaussian process regression is much slower and more difficult to fit, at least with the default priors. But it is a much, much more flexible model than the continuous-time autoregression, and for some applications it may be preferable.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Trueman (2024, Sept. 6). Benjamin Trueman: Gaussian process regression as an alternative to continuous-time autoregression. Retrieved from https://bentrueman.github.io/posts/2024-09-04-modeling-time-series-using-gaussian-processes/

BibTeX citation

@misc{trueman2024gaussian,
  author = {Trueman, Ben},
  title = {Benjamin Trueman: Gaussian process regression as an alternative to continuous-time autoregression},
  url = {https://bentrueman.github.io/posts/2024-09-04-modeling-time-series-using-gaussian-processes/},
  year = {2024}
}