--- title: "Modeling Areal Data: Intro" output: pdf_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(message = FALSE) library(tidyverse) library(mnormt) library(spdep) set.seed(03202021) ``` ## Spatial Smoothing \vfill \vfill \vfill \vfill \vfill \vfill \newpage ## Areal Data Models: Disease Mapping Areal data with counts is often associated with disease mapping, where there are two quantities for each areal unit: \vfill One way to think about the expected counts is \vfill \vfill However note that $\bar{r},$ and hence, $E_i$ is a not fixed, but is a function of the data. \vfill An alternative is to use some standard rate for a given age group, such that $E_i = \sum_j n_{ij} r_j.$ \vfill ### Traditional Models Often counts are assumed to follow the Poisson model where \vfill Then the MLE of $\eta_i$ is $\frac{Y_i}{E_i}$. \newpage ### Poisson-Gamma Model Consider the following framework \vfill \vfill \vfill For the Poisson sampling model, the gamma prior is conjugate. This means that the posterior distribution $p(\eta_i | y_i)$ is also a gamma distribution, and in particular \vfill The mean of this distribution is \begin{eqnarray*} E(\eta_i | \boldsymbol{y}) = E(\eta_i| y_i) &=& \frac{y_i + a}{E_i + b} = \frac{y_i + \frac{\mu^2}{\sigma^2}}{E_i + \frac{\mu}{\sigma^2}}\\ &=& \frac{E_i (\frac{y_i}{E_i})}{E_i + \frac{\mu}{\sigma^2}} + \frac{(\frac{\mu}{\sigma^2})\mu}{E_i + \frac{\mu}{\sigma2}}\\ &=& w_i SMR_i + (1 - w_i) \mu, \end{eqnarray*} \vfill \newpage ## Poisson-lognormal models The model can be written as \begin{eqnarray*} Y_i | \psi_i &\sim& Poisson(E_i \exp(\psi_i))\\ \psi_i &=& \boldsymbol{x_i^T}\boldsymbol{\beta} + \theta_i + \phi_i \end{eqnarray*} ### Brook's Lemma and Markov Random Fields To consider areal data from a model-based perspective, it is necessary to obtain the joint distribution of the responses $$p(y_1, \dots, y_n).$$ \vfill From the joint distribution, the *full conditional distribution* $$p(y_i|y_j, j \neq i),$$ is uniquely determined. \vfill \vfill When the areal data set is large, working with the full conditional distributions can be preferred to the full joint distribution. \vfill More specifically, the response $Y_i$ should only directly depend on the neighbors, \newpage ## Markov Random Field The idea of using the local specification for determining the global form of the distribution is \vfill An essential element of a MRF is a *clique*, which is a group of units where each unit is a neighbor of all units in the clique \vfill A *potential function* is a function that is exchangeable in the arguments. With continuous data a common potential is $(Y_i - Y_j)^2$ if $i \sim j$ ($i$ is a neighbor of $j$). \vfill ### Gibbs Distribution A joint distribution $p(y_1, \dots, y_n)$ is a Gibbs distribution if it is a function of $Y_i$ only through the potential on cliques. \vfill Mathematically, this can be expressed as: $$p(y_1, \dots, y_n) \propto \exp \left(\gamma \sum_k \sum_{\alpha \in \mathcal{M}_k} \phi^{(k)}(y_{\alpha_1},y_{\alpha_2}, \dots, y_{\alpha_k} ) \right),$$ where $\phi^{(k)}$ is a potential of order $k$, $\mathcal{M}_k$ is the collection of all subsets of size $k = {1, 2, ...}$(typically restricted to 2 in spatial settings), $\alpha$ indexes the set in $\mathcal{M}_k$. \vfill \newpage ### Hammersley-Clifford Theorem The Hammersley-Clifford Theorem demonstrates that if we have a MRF that defines a unique joint distribution, \vfill The converse was later proved, showing that a MRF could be sampled from the associated Gibbs distribution. \vfill ### Model Specification With continuous data, a common choice for the joint distribution is the pairwise difference $$p(y_1, \dots, y_n) \propto \exp \left(-\frac{1}{2\tau^2} \sum_{i,j}(y_i - y_j)^2 I(i \sim j) \right)$$ \vfill Then the full conditional distributions \vfill \vfill \vfill