--- title: "Section 2: CAR models for spatial disease mapping" author: "Aritz Adin" date: "2024-08-13" date-format: medium format: html: toc: true --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` This lab session focuses on how to use the `bigDM` package to fit the scalable spatial model’s proposals described in [Orozco-Acosta et al. (2021)](https://doi.org/10.1016/j.spasta.2021.100496) using simulated mortality data from the municipalities of continental Spain. ## The `CAR_INLA()` function This function allows fitting (scalable) spatial Poisson mixed models to areal count data, where several conditional autoregressive (CAR) prior distributions can be specified for the spatial random effects. Specifically, the linear predictor is modelled as ```{=tex} \begin{equation} \log r_{i} = \beta_0 + {\bf x}_{i}^{'}\mathbf{\beta} + \xi_i, \quad \mbox{for} \quad i=1,\ldots,I \end{equation} ``` where $\beta_0$ is a global intercept, ${\bf x}_{i}^{'}=(x_{i1},\ldots,x_{ip})$ is a $p$-vector of standardized covariates in the $i$-th area, $\mathbf{\beta}=(\beta_1,\ldots,\beta_p)^{'}$ is the $p$-vector of fixed effect coefficients, and $\xi_i$ is a spatially structured random effect with a CAR prior distribution. ### Main input arguments What follows is a brief description of the main input arguments and functionalities of the `CAR_INLA()` function: - **`carto`**: object of class `sf` or `SpatialPolygonsDataFrame` that contains the data of analysis and its associated cartography file. This object must contain at least the target variables of interest specified in the arguments `ID.area`, `O` and `E`. - **`ID.area`**: character; name of the variable that contains the IDs of spatial areal units. - **`ID.group`**: character; name of the variable that contains the IDs of the spatial partition (grouping variable). Only required if `model="partition"`. - **`O`**: character; name of the variable that contains the observed number of disease cases for each areal units. - **`E`**: character; name of the variable that contains either the expected number of disease cases or the population at risk for each areal unit. - **`X`**: a character vector containing the names of the covariates within the `carto` object to be included in the model as fixed effects, or a matrix object playing the role of the fixed effects design matrix. If `X=NULL` (default), only a global intercept is included in the model as fixed effect. - **`W`**: optional argument with the binary adjacency matrix of the spatial areal units. If `NULL` (default), this object is computed from the `carto` argument (two areas are considered as neighbours if they share a common border). - **`prior`**: one of either `"Leroux"` (default), `"intrinsic"`, `"BYM"` or `"BYM2"`, which specifies the prior distribution considered for the spatial random effect. - **`model`**: one of either `"global"` or `"partition"` (default), which specifies the Global model or one of the scalable model proposal’s (*Disjoint model* and *k-order neighbourhood model*, respectively). - **`k`**: numeric value with the neighbourhood order used for the partition model. Usually k=2 or 3 is enough to get good results. If `k=0` (default) the Disjoint model is considered. Only required if `model="partition"`. - **`compute.DIC`**: logical value; if `TRUE` (default) then approximate values of the Deviance Information Criterion (DIC) and Watanabe-Akaike Information Criterion (WAIC) are computed. - **`compute.fitted.values`**: logical value (default `FALSE`); if `TRUE` transforms the posterior marginal distribution of the linear predictor to the exponential scale (risks or rates). - **`inla.mode`**: one of either `"classic"` (default) or `"compact"`, which specifies the approximation method used by INLA. See `help(inla)` for further details. For further details, please refer to the [reference manual](https://cran.r-project.org/web/packages/bigDM/bigDM.pdf) and the [vignettes](https://github.com/spatialstatisticsupna/bigDM/tree/master?tab=readme-ov-file#basic-use) accompanying this package. ## Example 1: stomach cancer mortality data The main input argument of the `CAR_INLA()` function must be an object of class `sf` (simple feature) or `SpatialPolygonsDataFrame` that contains the data of analysis and its associated cartography file. Standard .shp files (shapefiles) can be loaded as `sf` objects in R using the `sf::st_read()` function. Note that **bigDM** includes the `Carto_SpainMUN` object with the polygons of Spanish municipalities and simulated colorectal cancer mortality data (modified in order to preserve the confidentiality of the original data). ```{r} library(bigDM) library(INLA) library(sf) library(tmap) tmap4 <- packageVersion("tmap") >= "3.99" data("Carto_SpainMUN") head(Carto_SpainMUN) ``` ### Global model We start by fitting the *Global model* described in Equation (1), where the entire neighbourhood graph of the areal units (municipalities) is considered to define the adjacency matrix ${\bf W}$. The `connect_subgraphs()` function computes a fully connected graph and its associated adjacency matrix by merging disjoint connected subgraphs through its nearest polygon centroids. ```{r} ## NOTE: Not necessary (shown for illustrative purposes only) aux <- connect_subgraphs(Carto_SpainMUN, ID.area="ID") ``` The function returns a list with the following two elements: \* nb: the modified neighbours list \* W: associated spatial adjacency matrix of class `dgCMatrix` ```{r} str(aux,1) summary(aux$nb) aux$W[1:10,1:10] ``` The *Global model* with an iCAR/BYM2 prior distribution are fitted using the `CAR_INLA()` function as follows: ```{r} ## Fit the models iCAR.Global <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", O="obs", E="exp", model="global", prior="intrinsic", inla.mode="compact") summary(iCAR.Global) BYM2.Global <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", O="obs", E="exp", model="global", prior="BYM2", inla.mode="compact") summary(BYM2.Global) ## Model comparison compare.DIC <- function(x){ data.frame(mean.deviance=x$dic$mean.deviance, p.eff=x$dic$p.eff, DIC=x$dic$dic, WAIC=x$waic$waic, time=x$cpu.used["Total"]) } do.call(rbind,lapply(list(iCAR=iCAR.Global, BYM2=BYM2.Global), compare.DIC)) plot(iCAR.Global$summary.linear.predictor$`0.5quant`, BYM2.Global$summary.linear.predictor$`0.5quant`, xlim=c(-0.5,0.5), ylim=c(-0.5,0.5), xlab="iCAR model", ylab="BYM2 model", main="Posterior median estimates") lines(c(-1,1),c(-1,1)) ``` ### Partition models A natural choice for defining the partition of the entire spatial domain could be using administrative subdivisions, such as provinces or states. For our example data, the $D=15$ Autonomous Regions of Spain are used as a partition of the $n=7907$ municipalities (`Carto_SpainMUN$region` variable). ```{r} tmap_mode("plot") if(tmap4){ tm_shape(Carto_SpainMUN) + tm_polygons(fill="region", fill.scale=tm_scale(values="brewer.set3"), fill.legend=tm_legend(frame=FALSE)) }else{ tm_shape(Carto_SpainMUN) + tm_polygons(col="region") + tm_layout(legend.outside=TRUE) } ``` The *disjoint* and *k-order neighbourhood models* with an iCAR prior distribution are fitted using the `CAR_INLA()` function as: ```{r} iCAR.k0 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", O="obs", E="exp", model="partition", k=0, ID.group="region", prior="intrinsic", inla.mode="compact") iCAR.k0$cpu.used iCAR.k1 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", O="obs", E="exp", model="partition", k=1, ID.group="region", prior="intrinsic", inla.mode="compact") iCAR.k1$cpu.used iCAR.k2 <- CAR_INLA(carto=Carto_SpainMUN, ID.area="ID", O="obs", E="exp", model="partition", k=2, ID.group="region", prior="intrinsic", inla.mode="compact") iCAR.k2$cpu.used ``` Internally, the `divide_carto()` function is called to compute the overlapping set of regions $\{D_1,\ldots,D_{15}\}$ according to the grouping variable `ID.group="region"`. The neighbourhood order to add polygons at the border of the spatial subdomains is controlled by the `k` argument. ```{r} ## Compute subdomains for k=0,1 and 2 carto.k0 <- divide_carto(carto=Carto_SpainMUN, ID.group="region", k=0) carto.k1 <- divide_carto(carto=Carto_SpainMUN, ID.group="region", k=1) carto.k2 <- divide_carto(carto=Carto_SpainMUN, ID.group="region", k=2) ## Plot the spatial polygons for the autonomous region of Castilla y Leon plot(carto.k2$`Castilla y Leon`$geometry, col="dodgerblue4", main="Castilla y Leon") plot(carto.k1$`Castilla y Leon`$geometry, col="dodgerblue", add=TRUE) plot(carto.k0$`Castilla y Leon`$geometry, col="lightgrey", add=TRUE) ``` ### Compare the results ```{r, message=FALSE, warning=FALSE} library(RColorBrewer) ## Carto object of the Spanish provinces carto.CCAA <- aggregate(Carto_SpainMUN[,"geometry"],list(ID.group=st_drop_geometry(Carto_SpainMUN)$region), head) ## Model selection criteria MODELS <- list(Global=iCAR.Global, k0=iCAR.k0, k1=iCAR.k1, k2=iCAR.k2) do.call(rbind,lapply(MODELS, compare.DIC)) ## Maps with posterior median estimates of log-risks Carto_SpainMUN$Global <- iCAR.Global$summary.linear.predictor$`0.5quant` Carto_SpainMUN$k0 <- iCAR.k0$summary.linear.predictor$`0.5quant` Carto_SpainMUN$k1 <- iCAR.k1$summary.linear.predictor$`0.5quant` Carto_SpainMUN$k2 <- iCAR.k2$summary.linear.predictor$`0.5quant` paleta <- brewer.pal(8,"RdYlGn")[8:1] values <- c(-Inf,log(c(0.83,0.91,0.95,1,1.05,1.10,1.20)),Inf) if(tmap4){ Map.risk <- tm_shape(Carto_SpainMUN) + tm_polygons(fill=c("Global","k0","k1","k2"), fill.free=FALSE, col_alpha=0, fill.scale=tm_scale(values=paleta, breaks=values, interval.closure="left", midpoint=0), fill.legend=tm_legend("log-risks", show=TRUE, reverse=TRUE, position=tm_pos_out("right","center"), frame=FALSE)) + tm_shape(carto.CCAA) + tm_borders(col="gray40") + tm_title("Posterior median estimates") + tm_layout(panel.labels=c("Global model","Disjoint model","1st-order nb","2nd-order nb")) + tm_facets(nrow=2, ncol=2) }else{ Map.risk <- tm_shape(Carto_SpainMUN) + tm_polygons(col=c("Global","k0","k1","k2"), palette=paleta, border.alpha=0, title="log-risks", legend.show=T, legend.reverse=T, style="fixed", breaks=values, interval.closure="left") + tm_shape(carto.CCAA) + tm_borders(col="gray40") + tm_layout(main.title="Posterior median estimates", main.title.position="center", panel.labels=c("Global model","Disjoint model","1st-order nb","2nd-order nb"), legend.outside=T, legend.outside.position="right", legend.frame=F, legend.outside.size=0.2, outer.margins=c(0.02,0.01,0.02,0.01)) + tm_facets(nrow=2, ncol=2) } tmap_mode("plot") print(Map.risk) ## Maps with posterior exceedence probabilities Carto_SpainMUN$Global <- 1-iCAR.Global$summary.linear.predictor$`0cdf` Carto_SpainMUN$k0 <- 1-iCAR.k0$summary.linear.predictor$`0cdf` Carto_SpainMUN$k1 <- 1-iCAR.k1$summary.linear.predictor$`0cdf` Carto_SpainMUN$k2 <- 1-iCAR.k2$summary.linear.predictor$`0cdf` paleta <- brewer.pal(6,"Blues")[-1] values <- c(0,0.1,0.2,0.8,0.9,1) if(tmap4){ Map.prob <- tm_shape(Carto_SpainMUN) + tm_polygons(fill=c("Global","k0","k1","k2"), fill.free=FALSE, col_alpha=0, fill.scale=tm_scale(values=paleta, breaks=values, interval.closure="left", labels=c("[0-0.1)","[0.1-0.2)","[0.2-0.8)","[0.8-0.9)","[0.9-1]")), fill.legend=tm_legend("Prob", show=TRUE, reverse=TRUE, frame=FALSE, position=tm_pos_out("right","center"))) + tm_shape(carto.CCAA) + tm_borders(col="gray40") + tm_title("Posterior exceedence probabilities") + tm_layout(panel.labels=c("Global model","Disjoint model","1st-order nb","2nd-order nb")) + tm_facets(nrow=2, ncol=2) }else{ Map.prob <- tm_shape(Carto_SpainMUN) + tm_polygons(col=c("Global","k0","k1","k2"), palette=paleta, border.alpha=0, title="Prob", legend.show=T, legend.reverse=T, style="fixed", breaks=values, interval.closure="left", labels=c("[0-0.1)","[0.1-0.2)","[0.2-0.8)","[0.8-0.9)","[0.9-1]")) + tm_shape(carto.CCAA) + tm_borders(col="gray40") + tm_layout(main.title="Posterior exceedence probabilities", main.title.position="center", panel.labels=c("Global model","Disjoint model","1st-order nb","2nd-order nb"), legend.outside=T, legend.outside.position="right", legend.frame=F, legend.outside.size=0.2, outer.margins=c(0.02,0.01,0.02,0.01)) + tm_facets(nrow=2, ncol=2) } tmap_mode("plot") print(Map.prob) ``` ## Example 2: ecological regression model In this second example, we are going to fit a spatial model that incorporates both area-level CAR random effects and a set of explanatory variables, with the aim of comparing the regression coefficient estimates obtained from the global and partitioned models. For this, we will use the data contained in the `Data_MultiCancer` object as follows: ```{r} ## Define the data and the spatial covariates ## data("Data_MultiCancer") head(Data_MultiCancer) data <- Data_MultiCancer[Data_MultiCancer$disease==1,] data$X1 <- Data_MultiCancer[Data_MultiCancer$disease==2,"SMR"] data$X2 <- Data_MultiCancer[Data_MultiCancer$disease==3,"SMR"] head(data) ## Define the sf object ## Carto_SpainMUN$obs <- NULL Carto_SpainMUN$exp <- NULL Carto_SpainMUN$SMR <- NULL carto <- merge(Carto_SpainMUN,data,by="ID") head(carto) ``` A character vector with the covariate names or the design matrix of the fixed effects can be included through the `X=...` argument of the `CAR_INLA()` function: ```{r} iCAR.Global <- CAR_INLA(carto=carto, ID.area="ID", O="obs", E="exp", X=c("X1","X2"), model="global", prior="intrinsic", inla.mode="compact") iCAR.k1 <- CAR_INLA(carto=carto, ID.area="ID", O="obs", E="exp", X=c("X1","X2"), model="partition", k=1, ID.group="region", prior="intrinsic", inla.mode="compact") ``` ### Local and global estimates of the fixed effects Summary statistics of the posterior marginal estimates for the global model's fixed effects: ```{r} iCAR.Global$summary.fixed ``` In partition models, local estimates of the fixed effects coefficients are obtained in each province: ```{r, message=FALSE, warning=FALSE} x1 <- grep("^X1",rownames(iCAR.k1$summary.fixed.partition)) head(iCAR.k1$summary.fixed.partition[x1,1:6]) x2 <- grep("^X2",rownames(iCAR.k1$summary.fixed.partition)) head(iCAR.k1$summary.fixed.partition[x2,1:6]) ## Maps with posterior median estimates at each province carto.CCAA <- aggregate(Carto_SpainMUN[,"geometry"],list(ID.group=st_drop_geometry(Carto_SpainMUN)$region), head) carto.CCAA$X1 <- iCAR.k1$summary.fixed.partition$`0.5quant`[x1] carto.CCAA$X2 <- iCAR.k1$summary.fixed.partition$`0.5quant`[x2] paleta <- c(brewer.pal(3,"Reds")[3:2],brewer.pal(3,"Blues")[2:3]) values <- seq(-0.2,0.2,0.1) if(tmap4){ tm_shape(carto.CCAA) + tm_polygons(fill=c("X1","X2"), fill.free=FALSE, fill.scale=tm_scale(values=paleta, breaks=values, interval.closure="left"), fill.legend=tm_legend("", show=TRUE, reverse=TRUE, frame=FALSE, position=tm_pos_out("right","center"))) + tm_title("Posterior median estimates") + tm_layout(panel.labels=c("X1 covariate","X2 covariate")) + tm_facets(nrow=1, ncol=2) }else{ tm_shape(carto.CCAA) + tm_polygons(col=c("X1","X2"), palette=paleta, title="", legend.show=T, legend.reverse=T, style="fixed", breaks=values, interval.closure="left") + tm_layout(main.title="Posterior median estimates", main.title.position="center", legend.outside=T, legend.outside.position="right", legend.frame=F, panel.labels=c("X1 covariate","X2 covariate"), legend.outside.size=0.2, outer.margins=c(0.02,0.01,0.02,0.01)) + tm_facets(nrow=1, ncol=2) } ``` Global estimates can be also computed by combining the marginal estimates in each partition by applying the CMC algorithm: ```{r} rbind(Global=iCAR.Global$summary.fixed["X1",1:5], `k1-CMC`=iCAR.k1$summary.fixed["X1",1:5]) rbind(Global=iCAR.Global$summary.fixed["X2",1:5], `k1-CMC`=iCAR.k1$summary.fixed["X2",1:5]) ``` ```{r, echo=FALSE, fig.width=10} par(mfrow=c(1,2), pty="m") ## X1 covariate ## plot(inla.smarginal(iCAR.Global$marginals.fixed$`X1`), type="l", xlab="", ylab="", main="X1 covariate", col="blue", lwd=2, xlim=c(-0.4,0.4)) for(i in x1){ lines(inla.smarginal(iCAR.k1$marginals.fixed.partition[[i]]), lty=2) } lines(inla.smarginal(iCAR.k1$marginals.fixed$`X1`), col="red", lwd=2) legend("topright", legend=c("Global model","CMC estimate"), lwd=2, bty="n", col=c("blue","red")) ## X2 covariate ## plot(inla.smarginal(iCAR.Global$marginals.fixed$`X2`), type="l", xlab="", ylab="", main="X2 covariate", col="blue", lwd=2, xlim=c(-0.4,0.4)) for(i in x2){ lines(inla.smarginal(iCAR.k1$marginals.fixed.partition[[i]]), lty=2) } lines(inla.smarginal(iCAR.k1$marginals.fixed$`X2`), col="red", lwd=2) legend("topright", legend=c("Global model","CMC estimate"), lwd=2, bty="n", col=c("blue","red")) ```