--- layout: post title: "Age Stratified All-Cause and COVID-19 Associated Mortality" tags: [rstats, dataviz, R, COVID-19, SARS-CoV-2, demography] # bibliography: ~/Literature/Bibtex/jabref.bib header-includes: - \usepackage{bm} comments: true editor_options: chunk_output_type: console --- ```{r,include=FALSE,echo=FALSE,message=FALSE} ##If default fig.path, then set it. if (knitr::opts_chunk$get("fig.path") == "figure/") { knitr::opts_knit$set( base.dir = '/Users/hoehle/Sandbox/Blog/') knitr::opts_chunk$set(fig.path="figure/source/2020-12-28-mort/") } fullFigPath <- paste0(knitr::opts_knit$get("base.dir"),knitr::opts_chunk$get("fig.path")) filePath <- file.path("","Users","hoehle","Sandbox", "Blog", "figure", "source", "2020-12-28-mort") knitr::opts_chunk$set(echo = FALSE, fig.width=8, fig.height=6, fig.cap='', fig.align='center', dpi=72*2)#, global.par = TRUE) options(width=150, scipen=1e3) suppressPackageStartupMessages(library(tidyverse)) suppressPackageStartupMessages(library(scales)) suppressPackageStartupMessages(library(knitr)) suppressPackageStartupMessages(library(kableExtra)) suppressPackageStartupMessages(library(viridis)) # Analyse DeStatis Death data suppressPackageStartupMessages(library(ISOweek)) suppressPackageStartupMessages(library(lubridate)) suppressPackageStartupMessages(library(readxl)) # Non CRAN packages # devtools::install_github("hadley/emo") ##Configuration options(knitr.table.format = "html") theme_set(theme_minimal()) #if there are more than n rows in the tibble, print only the first m rows. options(tibble.print_max = 10, tibble.print_min = 5) # Warning: Code is not yet 2021 ready, once new data arrive. Need to add ISO year to the Week variable. ``` ## Abstract: We consider the age stratified all-cause and COVID-19 associated mortality in Germany during 2020 based on numbers provided by the Federal Statistical Office and the Robert Koch Institute. **Update 2021-03-01**: An up-to-date and methodologocial improved version of the post, including population adjusted excess mortality calculations for 2020 based on the most recent Destatis and RKI data is available [here](https://staff.math.su.se/hoehle/blog/2021/03/01/mortadj.html). ```{r,results='asis',echo=FALSE,fig.cap=""} cat(paste0("

\n")) ``` {% include license.html %} ## Introduction All-cause mortality is one of indicators used to measure the impact of the COVID-19 pandemic, because this indicator is less biased by the testing strategy needed to identify cases to have died while having a recent COVID-19 diagnosis. Since both death and COVID-19 have a strong age component, it appears crucial to take an age-stratified view on both all-cause mortality as well as deaths associated with COVID-19. This will also help to put age-related COVID-19 mortality in a bigger picture. ## Age Stratified Mortality Real-time mortality monitoring is not common in Germany, as can be seen from the coverage of the [EuroMoMo monitoring](https://www.euromomo.eu/) for Germany, where only the two federal states Hesse and Berlin participate. However, as part of the COVID-19 response, the Federal Statistical Office (Destatis) now provides weekly updated [preliminary mortality statistics of all-cause mortality in 2020](https://www.destatis.de/DE/Themen/Querschnitt/Corona/Gesellschaft/kontextinformationen-gesellschaft.html#Sterbe). The methodology behind the numbers as well as age-stratified additional analyses are described in an accompanying [publication](https://www.destatis.de/DE/Methoden/WISTA-Wirtschaft-und-Statistik/2020/04/sonderauswertung-sterbefallzahlen-042020.pdf?__blob=publicationFile) [@zurnieden_etal2020]. The age-stratified analyses are unfortunately not continuously updated, however, up-to-date data are made publicly [available](https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bevoelkerung/Sterbefaelle-Lebenserwartung/Tabellen/sonderauswertung-sterbefaelle.html?nn=209016).

```{r,results='asis',echo=FALSE,fig.cap=""} cat(paste0("

\n")) ``` The reported COVID-19 associated deaths (by week of death) are obtained from an [export of the RKI](https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Projekte_RKI/COVID-19_Todesfaelle.html). However, the COVID-19 deaths are not available in age-stratified form. Furthermore, in order to compensate for reporting delays of deaths, the Destatis analysis only goes until 4 weeks before the time point of analysis, i.e. the 2020-12-18 version shown in the above image only reaches ISO week 47 (spanning the time period 2020-11-16 - 2020-11-22). The aim of the present blog post is to provide an **up-to-date age-stratified view including COVID-19 associated deaths**. As additional data source we use the age-stratified cumulative number of deaths reported every Tuesday in the RKI [situational report](https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsberichte/Dez_2020/2020-12-22-en.pdf?__blob=publicationFile) (p.7). ## Data I/O This section consists of data wrangling the above mentioned data sources. One challenge is for example to align the age classes used in the different sources. See the R code on [GitHub](`r paste0("https://raw.githubusercontent.com/hoehleatsu/hoehleatsu.github.io/master/_source/",current_input())`) for the gory details. ```{r, echo=FALSE, results="hide", warning=FALSE, message=FALSE} ################################################# # Load Destatis age-stratified mortality data ################################################# destatis_file <- file.path(filePath, "sonderauswertung-sterbefaelle-2020-12-18.xlsx") if (!file.exists(destatis_file)) { download.file("https://www.destatis.de/DE/Themen/Gesellschaft-Umwelt/Bevoelkerung/Sterbefaelle-Lebenserwartung/Tabellen/sonderauswertung-sterbefaelle.xlsx?__blob=publicationFile", destfile = destatis_file) } # Special fun with ISO week 53... destatis_deaths <- readxl::read_xlsx(path=destatis_file, sheet="D_2016_2020_KW_AG_Ins", skip=8) names(destatis_deaths)[2:3] <- c("Jahr", "Altersgruppe") destatis_deaths <- destatis_deaths %>% filter(Altersgruppe != "Insgesamt") %>% select(-`Nr.`) %>% mutate(`53` = NA) # Destatis all cause deaths regrouped to age groups to spans of 10y destatis_deaths_long <- destatis_deaths %>% select(-`53`) %>% pivot_longer(cols=-c(Jahr,Altersgruppe), names_to="Woche", values_to="Anzahl") %>% mutate(Woche = as.numeric(Woche), Jahr = as.numeric(Jahr), Altersgruppe = if_else(Altersgruppe == "95 u. mehr", "95-120", Altersgruppe)) %>% mutate(Altersgruppe_mid = unlist(lapply(str_split(Altersgruppe, "-"), function(x) mean(as.numeric(x)))), Altersgruppe10 = cut(Altersgruppe_mid, breaks=c(0,30,40,50,60,70,80,90,Inf),right=FALSE), Altersgruppe10 = fct_recode(Altersgruppe10,`[00,30)`="[0,30)")) # destatis_deaths_long10 <- destatis_deaths_long %>% group_by(Jahr, Woche, Altersgruppe10) %>% summarise(Anzahl=sum(Anzahl)) %>% mutate(Vorjahr = Jahr < 2020) ##################################################### # Load RKI data for deaths dates by week of death (week of the day of death). # Note: These data are without age stratification ##################################################### date_of_analysis <- "2020-12-27" dest_file <- file.path(filePath,str_c("COVID-19_Todesfaelle-", date_of_analysis, ".xlsx")) if (!file.exists(dest_file)) { download.file("https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Projekte_RKI/COVID-19_Todesfaelle.xlsx;jsessionid=3C9D99CA3AC8A1084C70263FBBC1C92B.internet081?__blob=publicationFile", destfile=dest_file) } # Load data and impute "<4" entries (i.e. 1-3) as 1.5 rki_deaths <- readxl::read_xlsx(path=dest_file) %>% mutate(`Anzahl verstorbene COVID-19 Fälle` = as.numeric(ifelse(`Anzahl verstorbene COVID-19 Fälle` == "<4", 1.5, `Anzahl verstorbene COVID-19 Fälle`))) ########################################################################### # Extract 2019 Destatis population data (available by age-year) ########################################################################### # Define age groups to study - these are actually finer than used later. age_breaks <- c(seq(0,90,by=10),Inf) # Load data and remove "Insgesamt" rows pop <- read_csv2(file=file.path(filePath, "12411-0012.csv"), skip=5) names(pop)[1:2] <- c("Stichtag","Altersgruppe") pop <- pop %>% filter(!is.na(Altersgruppe) & Altersgruppe != "Insgesamt") # Implement age-groups on the data by appropriate grouping pop <- pop %>% mutate(alter = str_extract(Altersgruppe, "^[0-9]+")) %>% # Make age group for single digits into two digit, i.e. 0-9 becomes 00-09 for better sorting mutate(altersgrp = sprintf("[%02s,%02s)",as.numeric(alter), as.numeric(alter)+1)) %>% mutate(alter = case_when( Altersgruppe == "unter 1 Jahr" ~ "0", TRUE~alter), altersgrp = case_when( Altersgruppe == "unter 1 Jahr" ~ "[00,01)", Altersgruppe == "90 Jahre und mehr" ~ "[90,Inf)", TRUE ~ altersgrp), agegrp10 = cut(as.numeric(alter), breaks=age_breaks, labels=sprintf("[%02s,%02s)", head(age_breaks,n=-1), tail(age_breaks,n=-1)), right=FALSE)) # Convert to long format pop_total <- pop %>% select(-Stichtag, -Altersgruppe, -alter, -altersgrp) %>% pivot_longer(-agegrp10, names_to="Bundesland", values_to="Anzahl") %>% group_by(agegrp10) %>% summarise(Anzahl=sum(Anzahl,na.rm=TRUE)) # Merge first 3 classes into a [00,30) y class to match the other destatis data pop_total_0030 <- pop_total %>% mutate(agegrp10 = fct_recode(agegrp10,`[00,30)`="[00,10)", `[00,30)`="[10,20)", `[00,30)`="[20,30)")) %>% group_by(agegrp10) %>% summarise(Population = sum(Anzahl)) # See result and do a checksum - should be 83 mio. pop_total_0030 pop_total_0030 %>% pull(Population) %>% sum() ###################################################################### # RKI cumulative COVID-19 deaths. Own work by M. Höhle digitizing the RKI # situation reports. Didn't find these data elsewhere electronically, # but possibly part of Matthias Linden analysis # (https://twitter.com/matthiaslinden/status/1338984728209338369?s=20). ###################################################################### # Load cumulative data and merge the two highest groups, because the RKI # stopped reporting the 100+ group somewhere along the line. covid_deaths_cum <- read_csv(file.path(filePath,"cum_agestrat_coviddeaths_rki.csv")) %>% mutate(`[90,Inf)` = `[90,100)` + `100+`) %>% select(-`[90,100)`,-`100+`) # New incident number of deaths since start of recording. These are obtained # by differencing of the cumulative numbers. covid_deaths <- covid_deaths_cum %>% mutate(across(where(is.numeric), ~ .x - lag(.x))) %>% slice(-1) covid_deaths #COVID deaths in long format and regroup to same age-classes as all-cause mortality data covid_deaths_long <- covid_deaths %>% pivot_longer(-Date, names_to="age_group", values_to="number") %>% mutate(age_group = factor(age_group, levels=names(covid_deaths)[-1])) %>% left_join(pop_total, by=c("age_group"="agegrp10")) %>% mutate(inc = number / Anzahl * 1e5) %>% filter(number >= 0) %>% # Group first 3 age groups to align with Destatis data. mutate(age_group = fct_recode(age_group,`[00,30)`="[00,10)", `[00,30)`="[10,20)", `[00,30)`="[20,30)")) %>% group_by(Date, age_group) %>% # Recalc count and incidence after the summary summarise( number = sum(number), Anzahl = sum(Anzahl)) %>% mutate(inc = number / Anzahl) #################################################### # Compute the span of the 2016-2019 for each week #################################################### destatis_deaths_long10_range <- destatis_deaths_long10 %>% arrange(Woche, Altersgruppe10, Jahr) %>% group_by(Woche, Altersgruppe10, Vorjahr) %>% summarise(min_Anzahl = min(Anzahl), mean_Anzahl= mean(Anzahl), max_Anzahl = max(Anzahl)) # Join the destatis age-stratified deaths and the population counts. # Note: we use the 2019 population for all years 2016-2020! destatis_deaths <- left_join(destatis_deaths_long10_range, pop_total_0030, by=c("Altersgruppe10" = "agegrp10")) %>% mutate(min_inc = min_Anzahl / Population * 1e5, mean_inc = mean_Anzahl / Population * 1e5, max_inc = max_Anzahl / Population * 1e5, Datum = ISOweek::ISOweek2date(str_c("2020-W", sprintf("%02d",Woche), "-1"))) max_destatis_week <- destatis_deaths_long10_range %>% filter(!Vorjahr & !is.na(mean_Anzahl)) %>% pull(Woche) %>% max() ``` ## Up-to-date age-stratified all-cause mortality Available all-cause deaths (by week of death) are available until `r str_c("2020-W", max_destatis_week)`. Note that Destatis stresses the [preliminary character of the data](https://twitter.com/FlorianBurg2/status/1343903541849681922?s=20) - the numbers might change as further deaths arrive. Stratified by age the times series for 2020 compared to the years 2016-2019 looks as follows - please beware of the different y-axes for the age-groups in the plot: ```{r AGEMORT, echo=FALSE, warning=FALSE} ########################################################### # Helper function to construct a polygon # @param col_reponse Name of the column where to store the response # @param col_lower Name of the column containing the lower limit value # @param col_upper Name of the column containing the upper limit value # @return A data.frame representing a polygon using Woche as x-axis column and # col_response for the y-values. ########################################################### make_poly <- function(col_response, col_lower, col_upper) { poly <- NULL col_lower <- sym(col_lower) col_upper <- sym(col_upper) for (ag in unique(destatis_deaths$Altersgruppe10)) { ts <- destatis_deaths %>% filter(Vorjahr) %>% filter(Altersgruppe10 == ag) poly_ag <- tibble( Woche= c(ts %>% pull(Woche), rev(ts %>% pull(Woche))), `__response__` = c(ts %>% pull(!!col_upper), rev(ts %>% pull(!!col_lower))), Altersgruppe10 = ag, Population = pop_total_0030 %>% filter(agegrp10 == ag) %>% pull(Population)) poly <- rbind(poly, poly_ag) } names(poly) <- names(poly) %>% str_replace("__response__", col_response) return(poly) } # Make a polygon containing the span of the 2016-2019 data poly_anzahl <- make_poly(col_response="mean_Anzahl", col_lower="min_Anzahl", col_upper="max_Anzahl") # Make a range plot with 2020 a line p_sterbefaelle <- ggplot(destatis_deaths_long10_range %>% filter(!Vorjahr), aes(x=Woche, y=mean_Anzahl)) + geom_polygon(data=poly_anzahl, aes(fill="Range 2016-2019 All-Cause")) + geom_line(data=destatis_deaths_long10_range %>% filter(Vorjahr), aes(color="Mean 2016-2019")) + geom_line(aes(color="2020 All-Cause")) + scale_fill_manual(values=c("Range 2016-2019 All-Cause"="gray"), name="") + scale_color_manual(values=c("2020 All-Cause"="black", "Mean 2016-2019"="darkgray"), name="") + xlab("Week") + ylab("No. Deaths") + ggtitle("All-Cause Mortality") + theme_minimal() + theme(legend.position = 'bottom') p_sterbefaelle + facet_wrap(~ Altersgruppe10, scales = "free_y") ``` ```{r, results="hide", warning=FALSE, message=FALSE} # Total mean number of deaths per age group total20162019 <- destatis_deaths %>% filter(Vorjahr) %>% group_by(Altersgruppe10) %>% summarise(Anzahl = sum(mean_Anzahl)) %>% mutate(proportion = Anzahl / sum(Anzahl)) total20162019 # Proportion of deaths, which are of age >= 80y prop80 <- total20162019 %>% filter(Altersgruppe10 %in% c("[80,90)", "[90,Inf)")) %>% summarise(proportion = sum(proportion)) %>% as.numeric() # Proportion of deaths, which are of age >= 60y prop60 <- total20162019 %>% filter(Altersgruppe10 %in% c("[60,70)", "[70,80)", "[80,90)", "[90,Inf)")) %>% summarise(proportion = sum(proportion)) %>% as.numeric() ``` Since the age-groups also contain different population sizes, a better comparison between age-groups instead of absolute numbers is by [incidence rate](`r str_c("{{ site.baseurl }}/",knitr::opts_chunk$get("fig.path"),"sterbefaelle_incidence.png")`) (i.e. deaths per 100,000 population in the age group). To underline the age-gradient of mortality: `r scales::percent(prop80)` of the deaths 2016-2019 occured in the age group of 80+ years (`r scales::percent(prop60)` in the age group of 60+). It becomes clear that 2020 mortality in the 80+ age groups was rather low during the first 10-12 weeks and then had a spike in connection with the first COVID-19 wave (March-April). Subsequently, a summer peak (possibly associated with heat) is followed by an increasing and ongoing upward trend. One challenge for a statistical analysis of these numbers is to figure out how much of the upwards trend is "catch-up mortality" due to the lower mortality in the beginning of the year and how much is excess related to COVID-19. ```{r AGEMORTINC, echo=FALSE, warning=FALSE, fig.keep="none"} # Make an incidence polygon poly_inc <- make_poly(col_response="mean_inc", col_lower="min_inc", col_upper="max_inc") # Plot of the incidence p_inc <- ggplot(destatis_deaths %>% filter(!Vorjahr), aes(x=Woche, y=mean_inc)) + geom_polygon(data=poly_inc, aes(x=Woche, y=mean_inc, fill="Range 2016-2019 All-Cause")) + geom_line(aes(color="2020 All-Cause")) + scale_fill_manual(values=c("Range 2016-2019 All-Cause"="gray"), name="") + scale_color_manual(values=c("2020 All-Cause"="black"), name="") + ylab("7D Mortality incidence rate (per 100,000 population)") + ggtitle("All-Cause mortality") + xlab("Week") + theme_minimal() + theme(legend.position = 'bottom') # Make and store the image p_inc + facet_wrap(~ Altersgruppe10, scales = "free_y") ggsave(filename=file.path(filePath,"sterbefaelle_incidence.png"), width=8, height=5, dpi=300) ``` An initial analysis of this question consists of summing the all-cause mortalities from W1 until `r str_c("W", max_destatis_week)` for 2020 (observed) and compare this to the summation of the weekly mean of 2016-2019 for the corresponding time period (expected)[^1]. When we do this by age-group we obtain: ```{r, results="asis", echo=FALSE, message=FALSE} # Range of total per year (with age stratification) total_20162019 <- destatis_deaths_long10 %>% filter(Woche <= max_destatis_week) %>% group_by(Vorjahr, Altersgruppe10, Jahr) %>% summarise(Anzahl=sum(Anzahl)) oe_ratio <- total_20162019 %>% summarise(mean_Anzahl= mean(Anzahl), min_Anzahl = min(Anzahl), max_Anzahl=max(Anzahl)) %>% pivot_wider( id_cols=Altersgruppe10, names_from=Vorjahr, values_from=c(mean_Anzahl, min_Anzahl, max_Anzahl)) %>% select(-min_Anzahl_FALSE, -max_Anzahl_FALSE) %>% rename(observed=`mean_Anzahl_FALSE`, expected_20162019=`mean_Anzahl_TRUE`, min_20162019=`min_Anzahl_TRUE`, max_20162019=`max_Anzahl_TRUE`) #Total per year (up to most recent week) total_year <- total_20162019 %>% group_by(Jahr) %>% summarise(Anzahl = sum(Anzahl)) oe_ratio_total <- oe_ratio %>% ungroup %>% summarise(Altersgruppe10 = "Total", observed=sum(observed), expected_20162019=sum(expected_20162019), min_20162019=total_year %>% pull(Anzahl) %>% min(), max_20162019=total_year %>% pull(Anzahl) %>% max()) oe_ratio_withtotal <- bind_rows(oe_ratio, oe_ratio_total) %>% mutate(ratio = observed / expected_20162019, ratio_percent= scales::percent(ratio-1, accuracy=1)) %>% select(-ratio) prop80_2020 <- oe_ratio_withtotal %>% ungroup %>% filter(Altersgruppe10 != "Total") %>% mutate(proportion = observed / sum(observed)) %>% filter(Altersgruppe10 %in% c("[80,90)", "[90,Inf)")) %>% summarise(proportion = sum(proportion)) %>% as.numeric() # Make the table using kableExtra oe_ratio_withtotal %>% rename(`Age group`=Altersgruppe10, `Percent change` = ratio_percent) %>% mutate(expected_20162019 = round(expected_20162019)) %>% select(`Age group`, observed, expected_20162019, `Percent change`, everything()) %>% kbl(align=('lrrrrr')) %>% kable_classic() %>% pack_rows("Total", 9, 9) %>% footnote(alphabet = c("Min and max for row 'Total' is obtained by first summing each of the years 2016-2019 and then take the min and max.")) ```

So in these numbers the mild mortality in the older age groups during the first weeks balances some, but not all, of the excess in these age-groups since Mar-Apr. The total proportion of 2020-W1 to 2020-W47 mortalities in the 80+ age group is currently `r scales::percent(prop80_2020)`. However, it is also important to realize that the current observed 2020 numbers contain the consequences of all type of effects from the pandemic management, which includes changes in the population behavior due to interventions. Disentangling the complex effects of all-cause mortality and the COVID-19 pandemic is a delicate matter, which takes experts in several disciplines (demographers, statisticians, epidemiologists) to solve. However, should you based on the above numbers happen to think that COVID-19 is not a serious problem, it is insightful to think about the [prevention paradox](https://en.wikipedia.org/wiki/Prevention_paradox) and take a look at the [all-cause mortality statistics](https://www.ft.com/content/a2901ce8-5eb7-4633-b89c-cbdf5b386938) from other countries. ### All-Cause Mortality and COVID-19 Associated Deaths To see, how much of the all-cause mortality is directly contributed by deaths in association with COVID-19, we match the age-stratified all-cause mortality data with the age-stratified COVID-19 deaths reported by the RKI since Sep 2020 (2020-W35). One complication of this matching is that the RKI deaths are reported by the week that the information about the death reached the RKI and not the week of death. In order to match it with the Destatis all-cause mortality time series, we extrapolate week of death from the week of report by the simple assumption that the death occurred 2 weeks before the report[^2]. ```{r PRECALC, echo=FALSE, message=FALSE} corresponding_sit_rep <- covid_deaths_cum %>% filter(Date >= ymd(20201218)) %>% slice(1) total_deaths_destatis <- rki_deaths %>% filter(Sterbewoche <= max_destatis_week) %>% summarise(Anzahl = sum(`Anzahl verstorbene COVID-19 Fälle`)) ``` ```{r,echo=FALSE, results="hide", message=FALSE} # Transform reporting date time series of deaths to a time series of time of death covid_deaths_long2 <- covid_deaths_long %>% mutate( # ISO week of the day the report is published (so covered week actually is -1) Meldewoche = as.numeric(str_extract(ISOweek::date2ISOweek(Date), "(?<=W)[0-9]{2}")), # Adjust for reporting lag between day of death and report received at the RKI. # We assume this takes two weeks, but might even be longer # In https://onlinelibrary.wiley.com/doi/epdf/10.1002/bimj.202000143 the delay between # reports of those reported to the RKI on day $x$ as being infectious and the information # that they died has a median of 14 days. Woche = Meldewoche - 2 ) %>% rename(COVID = number, Bevölkerung=Anzahl) # Join the two datasets deaths_ts <- right_join(destatis_deaths, covid_deaths_long2, by=c("Woche", c("Altersgruppe10"="age_group"))) %>% mutate(COVID = ifelse(Vorjahr, NA, COVID)) # Deaths in RKI statistics by week deaths_ts %>% group_by(Woche) %>% summarise(COVID = sum(COVID, na.rm=TRUE)) # Total Covid-19 deaths in Destatis graphic (as it was computed on the day of query, i.e. 2020-12-18) # compare with truth in the RKI data about that week (late adjustments occur) date_range <- seq(min(deaths_ts %>% pull(Woche)), destatis_deaths_long10_range %>% filter(!Vorjahr & !is.na(mean_Anzahl)) %>% pull(Woche) %>% max(), by=1) # Number of deaths in the time span covered by the age-stratified mortality data using week-of-death rki_deaths %>% filter(Sterbewoche %in% date_range) %>% pull(`Anzahl verstorbene COVID-19 Fälle`) %>% sum(na.rm=TRUE) # Corresponding number of deaths in in transformed reporting date time series deaths_ts %>% filter(Woche %in% date_range) %>% pull(COVID) %>% sum(na.rm=TRUE) # # # Total Covid-19 in age-stratified adjusted RKI dataset until week same week as DeStatis data # # # by artifically generated day-of-death. This should approximately match the above number # deaths_ts %>% filter(Woche %in% date_range) %>% pull(COVID) %>% sum(na.rm=TRUE) # # # By unadjusted week, i.e. day of report - this is naturally less.. # deaths_ts %>% filter(Meldewoche %in% date_range) %>% pull(COVID) %>% sum(na.rm=TRUE) if (FALSE) { #For comparison right_join(deaths_ts %>% filter(!Vorjahr) %>% group_by(Woche) %>% summarise(COVID=sum(COVID)), destatis_graphic, by=c("Woche"="Kalenderwoche")) %>% select(Woche, COVID, `2020 (davon COVID-19)`) } ######################################################################## # Extrapolate COVID-19 mortality on current 2020 all-cause mortality ######################################################################## end <- deaths_ts %>% ungroup %>% filter(!Vorjahr & !is.na(mean_Anzahl)) %>% filter(Woche == max(Woche)) %>% select(Woche, Altersgruppe10, mean_Anzahl, COVID) %>% rename(end_mean_Anzahl = mean_Anzahl, end_COVID = COVID) future <- deaths_ts %>% ungroup %>% filter(!Vorjahr & !is.na(COVID)) %>% filter(Woche >= max(end$Woche)) prediction <- future %>% left_join(end %>% select(-Woche), by=c("Altersgruppe10")) %>% mutate(mean_Anzahl = end_mean_Anzahl + (COVID - end_COVID)) # Last week of the prediction last_week_of_predict <- str_c("2020-W", deaths_ts %>% filter(!Vorjahr) %>% pull(Woche) %>% max()) ``` Furthermore, to avoid a downward bias in the observed numbers by observed-but-not-yet-reported deaths, the previously shown Destatis analyses of all-cause mortality does not include the most recent weeks, where COVID-19 associated mortality increased substantially in Germany: the analysis is only done until `r str_c("2020-W", max_destatis_week)`, even though the date of analysis was 2020-12-18. At this time the RKI in their situational report of `r corresponding_sit_rep %>% pull(Date)` already reported a total of `r corresponding_sit_rep %>% select(-Date) %>% sum()` COVID-19 associated deaths - only `r total_deaths_destatis` have their time of death up to `r str_c("2020-W", max_destatis_week)`. We thus expect the reported excess mortality to increase within the coming weeks. As a simple extrapolation, we assume that all COVID-19 associated mortality in the subsequent weeks above the level in `r str_c("2020-W", max_destatis_week)`, is directly summable to the 2020 all-cause mortality[^3]. With this simple extrapolation, the excess mortality computations can be extended until `r last_week_of_predict` and leads to the following predictions: ```{r AGEMORTWCOVID, echo=FALSE, results="hide", warning=FALSE, message=FALSE} # Palette for plotting pal <- brewer_pal(palette="Dark2")(3)[2:3] #dont use the green # Make a plot with the extrapolations added p_deaths_est <- ggplot(deaths_ts %>% filter(!Vorjahr), aes(x=Woche, y=mean_Anzahl, fill=NA)) + geom_polygon(data=poly_anzahl %>% filter(Woche >= date_range[1]), aes(fill="Range 2016-2019 All-Cause")) + geom_line(aes(color="2020 All-Cause")) + geom_line(data=deaths_ts %>% filter(!Vorjahr), aes(x=Woche, y=COVID, color="2020 with COVID-19")) + geom_line(data=prediction, aes(color="2020 All-Cause", linetype="COVID19 extrapolated")) + scale_color_manual(values=c("2020 with COVID-19"=pal[1], "2020 All-Cause"=pal[2]), name="") + scale_fill_manual(values=c("Range 2016-2019 All-Cause"="gray"), name="") + scale_linetype_manual(values=c("COVID19 extrapolated"=3), name="") + scale_x_continuous(breaks = seq(min(deaths_ts$Woche), max(deaths_ts$Woche, poly_anzahl$Woche), 2), minor_breaks = seq(min(deaths_ts$Woche), max(deaths_ts$Woche, poly_anzahl$Woche), 1)) + ylab("No. Deaths") + xlab("Week") + theme_minimal() + #labs(caption="Data Source: Destatis and RKI.\n Note: Beware the different y-axes.") + theme(legend.position = 'bottom') + facet_wrap(~ Altersgruppe10, scales = "free_y") p_deaths_est ``` ```{r AGEMORTWCOVID-PERCENT, echo=FALSE, fig.keep="none", warning=FALSE} #################### # Proportion #################### death_prop <- deaths_ts %>% pivot_wider(id_cols=c(Woche, Altersgruppe10), names_from=c(Vorjahr), values_from=c(mean_Anzahl, COVID)) %>% mutate(proportion_allcause_historicmean = COVID_FALSE / mean_Anzahl_TRUE, proportion_allcause2020 = COVID_FALSE / mean_Anzahl_FALSE) %>% select(Woche, Altersgruppe10, proportion_allcause_historicmean, proportion_allcause2020) %>% pivot_longer(cols=c(proportion_allcause_historicmean, proportion_allcause2020), names_to="type", values_to="proportion") ggplot(death_prop, aes(x=Woche, y=proportion, linetype=type)) + geom_line() + facet_wrap( ~ Altersgruppe10) + scale_y_continuous(labels=scales::percent) + #scale_color_brewer(palette="Set1", name="") + scale_linetype_manual(labels=c("2020 with COVID-19 / 2016-2019 All-Cause Mean", "2020 with COVID-19 / 2020 All-Cause"), values=c(1, 2), name="") + theme_minimal() + ylab("Proportion") + scale_x_continuous(minor_breaks = seq(min(death_prop$Woche), max(death_prop$Woche), 1)) + theme(legend.position = 'bottom') ggsave(filename=file.path(filePath,"deaths_covid19_proportion.png"), width=8, height=5, dpi=300) ``` ```{r EXCESSEST, echo=FALSE} ac2020 <- destatis_deaths %>% ungroup %>% filter(!Vorjahr) %>% pull(mean_Anzahl) %>% sum(na.rm=TRUE) ac2020_predict <- prediction %>% summarise(mean_Anzahl=sum(mean_Anzahl)) %>% sum(na.rm=TRUE) predicted <- ac2020 + ac2020_predict expected <- destatis_deaths %>% filter(Vorjahr & Woche <= max(prediction$Woche)) %>% ungroup() %>% summarise(mean_Anzahl=sum(mean_Anzahl),.groups="drop") %>% sum() ratio <- predicted / expected ``` We note that the COVID-19 associated deaths in the most recent weeks in the 80+ age groups make up more than [`r scales::percent(death_prop$proportion %>% max(na.rm=TRUE), accuracy=5)` of all deaths reported on average over the years 2016-2019](`r str_c("{{ site.baseurl }}/",knitr::opts_chunk$get("fig.path"),"deaths_covid19_proportion.png")`). This would mean an excess of mortality for the period of 2020-W01 to `r last_week_of_predict` of `r scales::percent(ratio-1, accuracy=1)`, which is likely to increase even further as the remaining weeks of 2020 are added[^4]. ## Discussion Considering all-cause mortality and COVID-19 associated mortality as a measure for the impact of an pandemic is a rather simplistic view of the pandemic. COVID-19 infections can be very mild, but complicated progressions can occur without leading to death (see, e.g., [long COVID](https://en.wikipedia.org/wiki/Long_COVID)). Looking at mortality also ignores the complex interplay between age-groups, where it can be beneficial to reduce infections in a not-so-affected-by-the-disease age-group in order to protect the higher-risk groups. The motivation of this post was primarily to put COVID-19 associated mortality in relation to all-cause mortality in order to get a better understanding of the daily number of COVID-19 deaths. An age-stratified view is necessary for this. We showed that the Destatis reported excess-mortality are expected to increase in the coming weeks. The extrapolations used in the present analysis are simplistic and could be improved by a [nowcasting approach](https://staff.math.su.se/hoehle/blog/2016/07/19/nowCast.html), which extrapolates not-yet-reported deaths from knowledge about the reporting delay [@schneble_etal2020]. For a more modelling based analysis of the German COVID-19 associated mortality data see also the work by @linden_etal2020 ([updated analysis](https://twitter.com/matthiaslinden/status/1338984728209338369?s=20)). More information on real-time mortality monitoring can be obtained from the [EuroMoMo methodology](https://www.euromomo.eu/how-it-works/methods/) page or @hoehle_mazick2010. Comments and feedback to the analysis in this blog post are much appretiated. **Update 2020-03-01**: An up-to-date version of this post, including an analysis of the most recent Destatis and RKI data is available [here](https://staff.math.su.se/hoehle/blog/2021/03/01/mortadj.html). [^1]: More involved ways to compute excess-mortality are imaginable. [^2]: Using two weeks provided the best fit to the unstratified number of observed cases by week of death. More advanced transformation schemes than simply subtracting two week are imaginable. [^3]: Note: This is really a guesstimate and might produce a slight excess, because some of the individuals who would have died in this week in a COVID-free year, by chance now happen to die this week with COVID-19. However, part of inferential statistics is to make predictions (which can be wrong) in timely fashion. If you want the *true numbers*, you will have to wait to end of Jan 2021 or even to mid-2021 (when the official mortality statistics is released). This is not helpful for situational awareness during a pandemic. [^4]: Note that 2020 has an ISO week 53 spanning 2020-12-28 to 2021-01-03, whereas none of years 2016-2019 had an ISO week 53. It will be interesting to see how this week will be handled by Destatis for the excess mortality calculations. ## Literature