---
title: "Spatial Statistics Notation"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = F, message = F)
library(tidyverse)
library(knitr)
library(gstat)
library(sp)
set.seed(02192021)
```


\vfill

\vfill


\vfill

\vfill


\vfill


\vfill


\vfill

\newpage

Given that we assume $$\boldsymbol{Y}|\mu, \underline{\theta} \sim MVN_n(\underline{\mu}, \Sigma(\underline{\theta})),$$

\vfill


\vfill

\vfill

## Stationarity

Assume the spatial process, $\boldsymbol{Y}(\boldsymbol{s})$ has a mean, $\underline{\mu}(\boldsymbol{s})$, and that the variance of $\boldsymbol{Y}(\boldsymbol{s})$ exists everywhere.

\vfill


## Strict Stationarity
Is this strictly stationary?


```{r}
# Data
a <- data.frame( x=rnorm(20000, 0, .5), y=rnorm(20000, 1, .5) )
b <- data.frame( x=rnorm(20000, 2, 1), y=rnorm(20000, -.5, 1) )
c <- data.frame( x=rnorm(20000, -.5, 1), y=rnorm(20000, 0, 1) )
data <- rbind(a,b,c)

ggplot(data, aes(x=x, y=y) ) +
  stat_density_2d(aes(fill = ..density..), geom = "raster", contour = FALSE) +
  scale_fill_distiller(palette= "Spectral", direction=1) +
  scale_x_continuous(expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme(
    legend.position='none'
  )
```


\newpage

## Weak Stationarity
Weak stationarity, or second-order stationarity of a spatial process, requires a constant mean and covariance that is a function of $\boldsymbol{h}$, $Cov(\boldsymbol{Y}(\boldsymbol{s}),\boldsymbol{Y}(\boldsymbol{s +h})) = C(\boldsymbol{h}),$ 

\vfill

Typically with spatial models, the process is assumed to be mean zero as covariates explain the mean structure. 


## Weak Stationarity
Is this weakly stationary?


```{r}
# Data
dat <- rep(0,100)
dat[1:100 %%2 == 1] <- sample(x = c(1,-1), size = 50, replace =T)
dat[1:100 %%2 == 0] <- rnorm(50)
comb <- data.frame(dat = dat, n = 1:100)
ggplot(comb, aes(x=n, y=dat) ) +
  geom_point() + geom_line() + theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank()) +
  theme_bw()
```

\newpage

## Intrinsic Stationarity
A third kind of stationarity, known as intrinsic stationarity, describes the behavior of differences in the spatial process, rather than the data.
\vfill

Intrinsic stationarity assumes $E[\boldsymbol{Y}(\boldsymbol{s + h}) -\boldsymbol{Y}(\boldsymbol{s}) ] =0$ then define
$$E[\boldsymbol{Y}(\boldsymbol{s + h}) - \boldsymbol{Y}(\boldsymbol{s})]^2 = Var(\boldsymbol{Y}(\boldsymbol{s + h}) - \boldsymbol{Y}(\boldsymbol{s})) = 2 \gamma(\boldsymbol{h}),$$
which only works, and satisfies intrinsic stationarity, if the equation only depends on $\boldsymbol{h}$.

\newpage

## Variograms
Intrinsic stationary justifies the use of variograms $2\gamma( \boldsymbol{h})$

\vfill

Variograms are often used to visualize spatial patterns:

\vfill

\vfill

\vfill

There is a mathematical link between the covariance function $C(\boldsymbol{h})$ and the variogram $2\gamma(\boldsymbol{h})$.

## Isotropy


\vfill


\vfill


## Variogram to the Covariance Function
\begin{eqnarray*}
2 \gamma (\boldsymbol{h}) &=& Var(\boldsymbol{Y}(\boldsymbol{s+h}) - \boldsymbol{Y}(\boldsymbol{s}))\\
&=& Var(\boldsymbol{Y}(\boldsymbol{s+h})) + Var(\boldsymbol{Y}(\boldsymbol{s})) - 2 Cov(Var(\boldsymbol{Y}(\boldsymbol{s+h}),\boldsymbol{Y}(\boldsymbol{s})))\\
&=&C(\boldsymbol{0}) + C(\boldsymbol{0}) - 2C(\boldsymbol{h})\\
&=& 2 \left[C(\boldsymbol{0}) - C(\boldsymbol{h})\right]
\end{eqnarray*}

Given $C()$, the variogram can easily be recovered

\newpage

## Linear semivariogram
$$\gamma(d)=
\begin{cases}
\tau^2 + \sigma^2d \; \; \text{if } d > 0\\
0 \; \; \text{otherwise}
\end{cases}$$

```{r}
tau.sq <- 1
sigma.sq <- 1
d <- seq(0,3, by =.01)
lin.gam <- tau.sq + sigma.sq * d
lin.var <- data.frame(d=d, lin.gam = lin.gam )
ggplot(data = lin.var, aes(x=d, y=lin.gam)) + geom_line() + ylim(0,4) + ylab('linear variogram') + theme_bw()
```


### Nugget, sill, partial-sill, and range


\vfill


\vfill


\vfill

\newpage

## Spherical semivariogram: 
$$\gamma(d)=
\begin{cases}
\tau^2 + \sigma^2 \; \; \text{if } d \geq 1/ \phi\\
\tau^2 + \sigma^2\left[\frac{3\phi d}{2} - \frac{1}{2} (\phi d)^3 \right]\\
0 \; \; \text{otherwise}
\end{cases}$$

- Sketch, or generate in R, a spherical semivariogram

- On this figure label the nugget, sill, partial sill, and range.

```{r, eval = T}
tau.sq <- 1
sigma.sq <- 2
range <- 2
phi <- 1/range
d <- seq(0,3, by =.01)
sph.gam <- tau.sq + sigma.sq * (3 * phi * d / 2 - .5 * (phi*d)^3)
sph.gam[d >= range] <- tau.sq + sigma.sq
sph.var <- data.frame(d=d, sph.gam = sph.gam )
# ggplot(data = sph.var, aes(x=d, y=sph.gam)) + geom_line() + ylim(0,4) + ylab('spherical variogram') + annotate('text', x=0.1, y = 0.9, label = 'nugget') + annotate('text', x=2.5, y = 3.1, label = 'sill') + annotate('text', x=2.2, y = .1, label = 'range', color = 'blue') +  annotate("segment", x = 2, xend = 2, y = 0, yend = 4,
#   colour = "blue", linetype =3, size = .5) +  annotate("segment", x = 2, xend = 3, y = 1, yend = 1,colour = "red", linetype =3, size = .5) +  annotate("segment", x = 2, xend = 3, y = 3, yend = 3,colour = "red", linetype =3, size = .5) +  annotate("segment", x = 2.5, xend = 2.5, y = 1, yend = 3,colour = "red", linetype =3, size = .5) + annotate('text', x=2.2, y = 2, label = 'partial sill', color = 'red') 
```

\newpage

## Exponential 

We saw the exponential covariance earlier in class, what is the mathematical form of the covariance?

\vfill


\vfill

The variogram is 
$$\gamma(d)=
\begin{cases}
\tau^2 + \sigma^2(1 - \exp(- d / \phi)) \; \; \text{if } d > 0\\
0 \; \; \text{otherwise} 
\end{cases}$$


```{r}
exp.gam <- tau.sq + sigma.sq * (1 - exp( - phi * d))
exp.var <- data.frame(d=d, exp.gam = exp.gam )
# ggplot(data = exp.var, aes(x=d, y=exp.gam)) + geom_line() + ylim(0,4) + ylab('exponential variogram') + annotate('text', x=0.1, y = 0.9, label = 'nugget') + annotate('text', x=2.9, y = 2.65, label = 'sill?') + annotate('text', x=2.2, y = .1, label = 'range?', color = 'blue') +  annotate("segment", x = 2, xend = 2, y = 0, yend = 4,
#   colour = "blue", linetype =3, size = .5) + theme_bw()
```


\vfill

\vfill

\vfill


```{r}
d <- seq(0,8, by = .01)
exp.gam <- tau.sq + sigma.sq * (1 - exp( - phi * d))
exp.var <- data.frame(d=d, exp.gam = exp.gam )
ggplot(data = exp.var, aes(x=d, y=exp.gam)) + geom_line() + ylim(0,4) + ylab('exponential variogram') + annotate('text', x=0.1, y = 0.9, label = 'nugget') + annotate('text', x=7.9, y = 3.1, label = 'sill') + annotate('text', x=6.6, y = .1, label = 'effective range', color = 'blue') +  annotate("segment", x = 6, xend = 6, y = 0, yend = 4,
  colour = "blue", linetype =3, size = .5) + theme_bw()
```

\vfill

\newpage

## More Semivariograms: Equations
Gaussian: 
$$\gamma(d)=
\begin{cases}
\tau^2 + \sigma^2(1 - \exp(-\phi^2 d^2)) \; \; \text{if } d > 0\\
0 \; \; \text{otherwise} 
\end{cases}$$

\vfill

Powered Exponential:
$$\gamma(d)=
\begin{cases}
\tau^2 + \sigma^2(1 - \exp(-|\phi d|^p)) \; \; \text{if } d > 0\\
0 \; \; \text{otherwise} 
\end{cases}$$

\vfill

Mat$\acute{\text{e}}$rn:
$$\gamma(d)=
\begin{cases}
\tau^2 + \sigma^2\left[1 - \frac{(2\sqrt{\nu}d \phi)^\nu}{2^{\nu-1}\Gamma(\nu)} K_\nu (2 \sqrt{\nu} d \phi)\right] \; \; \text{if } d > 0\\
0 \; \; \text{otherwise} 
\end{cases},$$
where $K_\nu$ is a modified Bessel function and $\Gamma()$ is a Gamma function.

\vfill


\newpage

## Variogram Creation: How?

```{r, echo=T,}
data(meuse)
meuse.small <- meuse %>% select(x, y, copper) %>% as_tibble()
meuse.small %>% head(15) %>% kable()
```

## Variogram Creation: Steps

1. Calculate distances between sampling locations
2. Choose grid for distance calculations
3. Calculate empirical semivariogram
$$\hat{\gamma}(d_k) = \frac{1}{2N(d_k)} \sum_{\boldsymbol{s}_i,\boldsymbol{s}_i, \in N(d_k)}\left[ \boldsymbol{Y}(\boldsymbol{s}_i) - \boldsymbol{Y}(\boldsymbol{s}_j)  \right]^2,$$
where $$N(d_k) = \{(\boldsymbol{s}_i,\boldsymbol{s}_j): ||\boldsymbol{s}_i -\boldsymbol{s}_j|| \in I_k\}$$ and $I_k$ is the $k^{th}$ interval.
4. Plot the semivariogram


## Variogram Creation: R function

```{r, eval = T}
coordinates(meuse) = ~x+y
variogram(copper~1, meuse) %>% plot()
```

## Variogram Creation: How - Step 1

Calculate Distances between sampling locations

```{r, echo= T}
#dist(meuse.small)
dist.mat <- dist(meuse.small %>% select(x,y))
```

## Variogram Creation: How - Step 2

Choose grid for distance calculations

```{r, echo= T}
cutoff <- max(dist.mat) / 3 # default maximum distance
num.bins <-  15
bin.width <- cutoff / 15
```

## Variogram Creation: How - Step 3

Calculate empirical semivariogram

```{r, echo = T}
dist.sq <- dist(meuse.small$copper)^2

vario.dat <- data.frame(dist = as.numeric(dist.mat), diff = as.numeric(dist.sq)) %>%
  mutate(bin = floor(dist / bin.width) + 1) %>% filter(bin < 16) %>% 
  group_by(bin) %>% summarize(emp.sv = .5 * mean(diff)) 
vario.dat

```

## Variogram Creation: How - Step 4

Plot empirical semivariogram

```{r}
vario.dat %>% ggplot(aes(x=bin, y= emp.sv)) + 
  geom_point() + xlim(0,15) + ylim(0,800) + 
  scale_x_continuous(breaks=c(5, 10, 15), labels =c('500','1000','1500')) +
  ggtitle('Empirical Semivariogram for Copper in Meuse Data Set') + 
  xlab('distance (m)') + ylab('Empirical Semivariogram') + 
  theme_bw()
```