--- title: "Seasonal mortality trend decomposition" author: "Roman Popat" date: "11 December 2015" output: html_document layout: post --- I recently wrote a [blog](http://thedatalab.com/Fruit-and-veg-prices) on trends and seasonal variation in fruit and veg wholesale prices provided by DEFRA. It was using a beatiful technique called ‘STL’ or seasonal-trend decomposition via loess[^1]. Just now I spotted a dataset from the Office for National Statistics on [winter mortality](http://visual.ons.gov.uk/excesswintermortality/). ONS highlight that: * last winter was a particularly high winter mortality year and * that overall winter mortality was decreasing. Lets take a look ourselves. We can read their .csv file straight from the site, and then apply the seasonal trend decomposition. For convenience, I am going to construct a new variable that counts the weeks as decimal years. ```{r} dat <- read.csv("http://visual.ons.gov.uk/wp-content/uploads/2015/11/figure45.csv", as.is = T) # the splitN function will split a string vector and # return a vector of the nth element of each split splitN <- function(dd, n, split) { sapply(dd, function(x) strsplit(as.character(x), split)[[1]][n]) } dat\$year <- as.numeric(splitN(dat\$date, 1, "-")) dat\$week <- as.numeric(splitN(dat\$date, 2, "-")) dat\$decimalYears <- dat\$year + (dat\$week/52) ``` Now we can apply the seasonal trend decomposition. ```{r} t_series <- ts(dat\$Deaths..thousands., frequency = 52) stl_series <- stl(t_series, 9) ``` I have prepared a function to plot the results of the seasonal trend decomposition. ```{r, dev='svg', fig.width = 10, fig.height = 6} source("https://raw.githubusercontent.com/rmnppt/FruitAndVeg/master/Scripts/plotSTL.R") plotSTL(stl_series, "winter mortality", interval = 52) # We can add in some labels at January of each year Xindex <- which(dat\$decimalYears %% 1 == 0) axis(1, labels = dat\$decimalYears[Xindex], at = Xindex) ``` First lets test if the overall trend (3rd panel) is decreasing. ```{r} y <- stl_series\$time.series[,2] x <- 1:nrow(stl_series\$time.series) mod <- lm(y ~ x) summary(mod) ``` Seems like an unambiguous negative trend then, although quite weak. The coefficient is `r signif(coef(mod)[2], 4)` and this is in deaths per thousand per week. This equates to `r signif(abs(coef(mod)[2]*1000*52), 4)` fewer deaths per million each year since 1999. Now lets look at 2014 in the distribution of residuals or in the STL language, the remainder. ```{r, dev='svg', fig.width = 8, fig.height = 5} resids <- stl_series\$time.series[,"remainder"] fifteenPeak <- max(resids[dat\$year == 2015]) par(fg = "grey", las = 1) hist(resids, breaks = 50, col = "#ff000050", main = "", xlab = "Weekly mortality residuals") abline(v = fifteenPeak, col = "black") text(fifteenPeak, 100, "2014/15 winter\npeak value", pos = 4) ``` How can we test the hypothesis that 2014 was an unusual year? One way might be to repeatedly sample the remainders (with replacement) each time asking if the 2014 peak value was in the 1% tail. ```{r} N <- 1e5 cutoff <- floor(length(resids)*0.01) inTail <- logical(N) for(i in 1:N){ sampl <- sample(resids, replace = T) inTail[i] <- fifteenPeak > sort(sampl, T)[cutoff] } mean(inTail) ``` The 2014/15 winter peak mortality is in the 1% tail in `r paste(mean(inTail)*100, "%")` of samples. Good evidence that this value is unsusual. Not that surprising since there are only `r length(which(resids > fifteenPeak))` other residuals larger than it. So seasonal mortality winter 2014/15 does appear to have been unusually high. [.Rmd Script](https://raw.githubusercontent.com/rmnppt/rmnppt.github.io/master/_source/2015-12-11-STL-winterMortality.Rmd) for this post. 1. A full description of the method can be found in the following reference. R. B. Cleveland, W. S. Cleveland, J.E. McRae, and I. Terpenning (1990) STL: A Seasonal-Trend Decomposition Procedure Based on Loess. Journal of Official Statistics, 6, 3–73