Censored autoregression with a smoothed time covariate.
I recently came across a new paper titled “Brownification on hold: What traditional analyses miss in extended surface water records” (Eklöf et al. 2021). In it, the authors argue that linear trend analysis misses important nonlinearities in surface water quality records. Specifically, the paper suggests that surface water browning—increases in coloured organic matter—has paused in recent decades and that generalized additive modeling would be better suited than linear regression to describe that pause.
Earlier this year I cowrote an article (Redden et al. 2021) describing a linear trend analysis of surface water time series in Nova Scotia. We found strong evidence of browning in many surface waters, but I thought I’d briefly revisit that analysis here to see whether a nonlinear trend analysis captures anything we missed. This isn’t meant to be a full reanalysis of the data, but it might point in the direction a future analysis could take.
There is a significant roadblock to fitting generalized additive models (GAMs) to the data from our paper: GAMs—at least, as they are implemented in the popular R package mgcv
—do not allow censored autoregression (models of autocorrelated time series where part of the series is censored).
One option for censored autoregression is the carx
package in R. Its primary function, carx::carx()
, also permits a matrix of external covariates that can include a basis expansion—making it possible to fit a GAM to a left-censored, autocorrelated time series.
In this post I demonstrate the package on a single measurement series, one of many we used in Redden et al. (2021). The data are sourced from Environment Canada, as described in the paper.
In the series, 10% of apparent colour values—those below 5 Pt-Co—are left-censored. To generate a regularly spaced time series, I aggregated the data into semesters by taking medians, and I excluded values collected before 1983. When aggregation required taking the midpoint of a left-censored and an observed value, the output was left-censored at the midpoint between the observed value and the censoring limit.
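For concreteness, here is a rough sketch of that aggregation step. It assumes a raw data frame colour_raw with columns date, value (with censored rows recorded at the reporting limit), and censored (TRUE below 5 Pt-Co); the column names are assumptions, and the censoring flag below is a simplification of the midpoint rule just described.

library(dplyr)
library(lubridate)

colour_semester <- colour_raw %>%
  filter(year(date) >= 1983) %>%
  mutate(
    year = year(date),
    sem = if_else(month(date) <= 6, 0, 1), # Jan-Jun = 0, Jul-Dec = 1 (binary semester indicator)
    numeric_date = year + sem / 2          # regular half-year spacing
  ) %>%
  group_by(year, sem, numeric_date) %>%
  summarize(
    value = median(value), # semester median (censored rows enter at the reporting limit)
    # simplified rule: flag the aggregate as left-censored whenever any value in the
    # semester was censored; the midpoint rule described above is more careful
    ci = if_else(any(censored), -1L, 0L),
    .groups = "drop"
  )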
I chose to handle missing values by left-censoring them at a limit of positive infinity, as implemented in carx()
. The response was log-transformed prior to fitting the model, and the matrix of covariates comprised a cubic regression spline basis expansion of the time variable using splines::bs()
with 4 degrees of freedom. The model also included semester as a binary covariate to account for seasonal variation.
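Here is a minimal sketch of what that fit might look like via the default (matrix) interface to carx::carx(), along with the linear-trend comparison introduced in the next paragraph. The data frame model_in and its columns (value, numeric_date, sem, ci, lcl) are assumptions for illustration, not the code from the original analysis.

library(carx)
library(splines)

# covariate matrix: cubic spline basis for time (4 df) plus the semester indicator
X_spline <- cbind(bs(model_in$numeric_date, df = 4), sem = model_in$sem)

fit_gam <- carx(
  y = log(model_in$value),     # log-transformed response
  x = X_spline,
  ci = model_in$ci,            # censoring indicator (-1 = left-censored, 0 = observed)
  lcl = log(model_in$lcl),     # lower censoring limit, on the log scale
  p = 1,                       # AR(1) errors
  y.na.action = "as.censored"  # treat missing responses as left-censored at +Inf
)

# the linear-trend comparison described next: swap the spline basis for time itself
fit_linear <- carx(
  y = log(model_in$value),
  x = cbind(numeric_date = model_in$numeric_date, sem = model_in$sem),
  ci = model_in$ci,
  lcl = log(model_in$lcl),
  p = 1,
  y.na.action = "as.censored"
)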
As a point of comparison, I fit a separate censored autoregression with a linear time covariate. Here are the linear model and GAM predictions along with 95% confidence intervals on the predicted values:
While the GAM does appear to track the series slightly better, the linear model yielded an AIC of -44, whereas the cubic regression spline model yielded a larger (i.e., worse) AIC of -39.
Both models yielded residuals with no obvious deviations from whiteness. For comparison, equivalent models fit using the function lm()
(no autoregression) are shown as well. Autocorrelation at lag 1 is notably lower in the residuals from the censored AR(1) models.
Both autoregressions yielded residuals that were approximately Gaussian, albeit with somewhat fatter tails than expected; the carx
estimation method is robust against mild departures from normality (Wang and Chan 2017).
The variance of the residuals was also reasonably constant, apart from a few outliers and perhaps a slight reduction in later years.
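The checks behind these statements might look roughly like this. This is a sketch only: it assumes the fitted objects from the earlier sketch, that residuals() applies to carx objects, and that a residual is returned for every row of model_in.

# residuals from the censored AR(1) fit with a linear time covariate
res_carx <- residuals(fit_linear)

# equivalent fit with lm() for comparison (no autoregression; censoring ignored here)
fit_lm <- lm(log(value) ~ splines::bs(numeric_date, df = 4) + sem, data = model_in)

# whiteness: residual autocorrelation with and without the AR(1) term
acf(res_carx, na.action = na.pass, main = "censored AR(1)")
acf(residuals(fit_lm), main = "lm(), no autoregression")

# approximate normality
qqnorm(res_carx)
qqline(res_carx)

# roughly constant variance over time
plot(model_in$numeric_date, res_carx, xlab = "date", ylab = "residual")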
But overall, there’s not much evidence here that the GAM describes this particular series any better than the linear model. And a GAM fit using brms
(“Bayesian Regression Models using Stan”)—which accommodates both left-censoring and autoregression—yielded similar results. Here is a sketch of the model one might fit using brms::brm()
. N.B., missing values are handled differently here: see this vignette.
library(dplyr)
library(tidyr)
library(brms)

model_in_brms <- model_in %>%
  mutate(
    # replace missing values in the censoring indicator with 0 (uncensored)
    ci = replace_na(ci, 0)
  )

model_brms <- brm(
  bf(log(value) | cens(ci) + mi() ~
       s(numeric_date) + sem + ar(time = numeric_date, p = 1)),
  data = model_in_brms,
  control = list(adapt_delta = .999),
  seed = 3152,
  file = "model_brms"
)

While I haven’t fully evaluated the fit of this model, it doesn’t suffer from any major issues in terms of convergence. And it is worth noting that it yields a smooth curve that differs little from the censored autoregression with linear time covariate. Fitted values from the latter model—including the AR(1) component—are superimposed in red on the following plot.
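For illustration, here is one way the brms portion of such a plot might be assembled, using posterior expected values from fitted(); the plotting details are a sketch only and the object names carry over from the code above.

library(ggplot2)

# posterior mean and 95% interval of the expected (log) colour, one row per observation
epred <- as.data.frame(fitted(model_brms))

plot_in <- bind_cols(model_in_brms, epred)

ggplot(plot_in, aes(numeric_date, Estimate)) +
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5), alpha = 0.3) +
  geom_line() +
  geom_point(aes(y = log(value)))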
The brms
and carx
models also yielded similar estimates of the AR(1) parameter: 0.21 for the censored autoregression (linear time covariate) and 0.28 for the Bayesian regression model. The two sets of predictions differ most, in percentage terms, over the first part of the series, where most of the censored observations were recorded.
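Those estimates can be read off the fitted objects directly (a sketch, assuming the object names from the earlier sketches):

fit_linear          # printing the carx fit reports the AR and regression coefficient estimates
summary(model_brms) # the AR(1) estimate should appear under "Correlation Structures"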
For attribution, please cite this work as
Trueman (2021, Sept. 8). Benjamin Trueman: A brownification pause?. Retrieved from https://bentrueman.github.io/posts/2021-09-08-a-brownification-pause/
BibTeX citation
@misc{trueman2021a,
  author = {Trueman, Ben},
  title = {Benjamin Trueman: A brownification pause?},
  url = {https://bentrueman.github.io/posts/2021-09-08-a-brownification-pause/},
  year = {2021}
}