---
title: "Week 9: Activity Key"
format: gfm
editor: source
editor_options: 
  chunk_output_type: console
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.height = 6, fig.width = 6)
library(tidyverse) 
library(knitr)
library(mnormt)
library(plgp)
library(reshape2)
library(rstan)
eps <- .Machine$double.eps

```


### Last Week's Recap

- Gaussian Process Intro
- Bayesian inference with `stan`
- Correlation functions


### This week

- GPs in 2D
- Bayesian inference with `stan`

---


### Geostatistical Data
At last, we will look at simulated 2-d "spatial" data.


##### 1. Create Sampling Locations

```{r}
#set.seed(03062025)

num.locations <- 50
coords <- data.frame(x = runif(num.locations), y = runif(num.locations))
coords %>% ggplot(aes(x=x,y=y)) + geom_point() + 
  ggtitle('Hypothetical Sampling Locations') + xlim(0,1) + 
  ylim(0,1) + theme_bw()
```

##### 2. Calculate Distances

```{r, echo = T}
dist.mat <- sqrt(plgp::distance(coords))
```

##### 3. Define Covariance Function and Set Parameters

Use exponential covariance with no nugget: 

```{r}
sigma.sq <- 1
phi <- .1
Sigma <- sigma.sq * exp(- dist.mat/phi) + diag(eps, num.locations) 
```


##### 4. Sample realization of the process
- This requires a distributional assumption, we will use the Gaussian distribution

```{r}
Y <- rmnorm(n=1, mean = 0, varcov = Sigma)
```

- What about the rest of the locations on the map?

##### 5. Vizualize Spatial Process

Start with a coarse grid and them move to a finer grid

```{r}
coords %>% mutate(Y = Y) %>% ggplot(aes(x=x,y=y)) + geom_point(aes(color=Y), size=2) + 
  ggtitle(label = 'Simulated Spatial Process', 
          subtitle = 'Exponential Covariance: sigma.sq = 1, phi = .1') +  
  xlim(0,1) + ylim(0,1) +   scale_colour_gradient2() + theme_dark()
```

\newpage

Now we can look at more sampling locations

```{r, echo = F}
dim.grid <- 10
grid.coords <- data.frame(x.grid = rep(seq(.05, .95, length.out=dim.grid), dim.grid),
  y.grid = rep(seq(.05, .95, length.out = dim.grid), each = dim.grid))

dist.grid <- sqrt(plgp::distance(grid.coords))

sigma.sq <- 1
phi <- .1
Sigma <- sigma.sq * exp(- dist.grid/phi) + diag(eps, dim.grid ^ 2) 

Y <- rmnorm(n=1, mean = 0, varcov = Sigma)

grid.coords %>% mutate(Y = Y) %>% ggplot(aes(x=x.grid,y=y.grid)) + geom_point(aes(color=Y), size=3) + ggtitle('Simulated Spatial Process', subtitle = 'Exponential Covariance: sigma.sq = 1, phi = .1') + xlim(0,1) + ylim(0,1) +   scale_colour_gradient2() + theme_dark()

```


```{r, echo = F}
dim.grid <- 50
grid.coords <- data.frame(x.grid = rep(seq(.05, .95, length.out=dim.grid), dim.grid),
  y.grid = rep(seq(.05, .95, length.out = dim.grid), each = dim.grid))

dist.grid <- sqrt(plgp::distance(grid.coords))

sigma.sq <- 1
phi <- .1
Sigma <- sigma.sq * exp(- dist.grid/phi) + diag(eps, dim.grid ^ 2) 

Y <- rmnorm(n=1, mean = 0, varcov = Sigma  )

grid.coords %>% mutate(Y = Y) %>% ggplot(aes(x=x.grid,y=y.grid)) + geom_point(aes(color=Y), size=3) + ggtitle('Simulated Spatial Process', subtitle = 'Exponential Covariance: sigma.sq = 1, phi = .1') + xlim(0,1) + ylim(0,1) +   scale_colour_gradient2() + theme_dark()
```


How does the spatial process change with:

- another draw with same parameters?
- a different value of $\phi$
- a different value of $\sigma^2$

---

### Visual Overview of Bayesian Inference

Using some Bridger Bowl weather data we will provide a visual overview of Bayesian Inference. *The goal will be to model the average winter high temperature at the base of Bridger Bowl.*

1. Prior Specification

\vfill
- First sketch a prior distribution that encapsulates your belief about what you believe the average high temperature would be. *Note this should obey law of total probability*

\vfill

- *Next we generally need to parameterize (perhaps approximately) this belief with some sort of probability distribution.*

\vfill

```{r}
temp_seq <- -10:50
prob_seq <- dnorm(temp_seq, mean = 20, sd = 10)
tibble(temp = temp_seq, prob = prob_seq) %>% ggplot(aes(temp, prob)) + geom_line() + theme_bw() +
  ggtitle("Andy's Prior Belief: N(20, 10^2)")
```


*Formally, my prior is on the mean high temp, which we will denote $\mu$.*

$$\mu \sim N(20, 10^2)$$

\vfill

2. Specify the sampling distribution for the data or perhaps in more familiar language, state the likelihood for the statistical model

\vfill

- *We will assume that the temperature readings are continuous (or "nearly continuous")*

\vfill

- *It seems reasonable to start with a normal distribution, so:*
$$X|\mu, \sigma^2 \sim N(\mu, \sigma^2)$$

\vfill

- Note that we also need to estimate $\sigma$ in this model and need a prior for that parameter too.

\vfill

- Grab some weather data from Bridger Bowl (roughly the first half of January 2021)
```{r}
temp <- c(26, 45, 44, 36, 22, 25, 31, 31, 37, 34, 35, 37, 32, 31)
```

\vfill

- Any concerns about using this data to inform our research question?

\vfill


3. Posterior Inference 

\vfill

- Using classical inference, how would you estimate $\mu$.

\vfill

- *Using maximum likelihood, $\hat{\mu}_{MLE} = \bar{X}$ = `r round(mean(temp))`.*

\vfill

- *With Bayesian inference, our posterior belief is based on the data __and__ our prior belief. Note this can be a blessing or a curse.*

\vfill

- Formally, we have a distribution for the maximum temperature (a posterior distribution):
$p(\mu|x) = \int p(x|\mu,\sigma) \times p(\mu)p(\sigma) /p(\mu)d\sigma$,
note solving this is not trivial and isn't something we will handle in this class.

\vfill

- Luckily, there is an elegant computational procedure that will allow us to approximate $p(\mu|x)$ by taking samples from the distribution. *This is, of course, MCMC.*

\vfill


STAN code for this situation can be written as below. Note that the prior values are hard coded, these could also be passed in as arguments to the model. 

\vfill

```
data {
  int<lower=0> N;
  vector[N] y;
}


parameters {
  real mu;
  real<lower=0> sigma;
}


model {
  y ~ normal(mu, sigma);
  mu ~ normal(20, 10);
}

```

\vfill

```{r, results = F}
temp_data <- stan("normal.stan", data=list(N = length(temp), y=temp))
```

\vfill

```{r}
print(temp_data)
```


```{r}
plot(temp_data)
```
\vfill


We can also view the posterior and prior beliefs together on a single figure.
```{r}
tibble(sims = c(extract(temp_data, pars = 'mu')$mu,rnorm(4000, 20, 10)), 
       Distribution = rep(c('posterior','prior'), each = 4000)) %>% 
  ggplot(aes(x = sims, color = Distribution)) + 
  geom_density() + theme_bw() + 
  xlab('Temperature (F)') + ylab('') + 
  ggtitle("Prior and posterior belief for winter temperature in Bozeman")
```


#### Multivariate Normal Distribution

Next we will segue from standard linear models to analyzing correlated
data.

First we will start with the a bivariate normal distribution: y \~
N(theta,sigma), where theta is a mean vector and sigma = sigmasq \* I is
a covariance matrix.

To provide a motivating context, not consider jointly estimating the
temperature at Bridger Bowl *and* Big Sky Resort.

##### 1. Simulate independent bivariate normal

Simulate a set of temperature values from each location, where the
temperature values are independent (sigma = sigmasq \* I)

```{r}
library(mnormt)
n <- 100
theta <- c(15,25)
sigma <- diag(2) * 100
fake_temperatures <- rmnorm(n, theta , sigma)
```

Then create a few graphs to show marginal distribution of temperature as
well as how the temperatures evolve in time.

```{r}
library(reshape2)
melt(fake_temperatures, value.name = 'temp') %>% 
  rename(location = Var2) %>%
  mutate(location = factor(location)) %>% 
  ggplot(aes(x =temp, fill = location)) + 
  geom_histogram() +
  facet_wrap(.~location) + theme_bw() 
```

```{r}
melt(fake_temperatures, value.name = 'temp') %>% 
  rename(location = Var2, day = Var1) %>%
  mutate(location = factor(location)) %>%
  ggplot(aes(y =temp, x = day, color = location )) + 
  geom_line() + theme_bw() + xlim(0,30) + 
  ggtitle('First 30 observations of independent response')
```

##### 2. Simulate correlated bivariate normal

Simulate a set of temperature values from each location, where the
temperature values are not independent (sigma = sigmasq \* H), where H
is a correlation matrix. (Note there are some constraints we will
discuss later)

```{r}
sigma <- matrix(c(1, .9, .9, 1), nrow = 2, ncol = 2) * 100
fake_temperatures_corr <- rmnorm(n, theta , sigma)
```

Then create a few graphs to show marginal distribution of temperature as
well as how the temperatures evolve in time.

```{r}
melt(fake_temperatures_corr, value.name = 'temp') %>% 
  rename(location = Var2) %>%
  mutate(location = factor(location)) %>% 
  ggplot(aes(x =temp, fill = location)) + geom_histogram() +
  facet_wrap(.~location) + theme_bw()
```


```{r}
melt(fake_temperatures_corr, value.name = 'temp') %>% 
  rename(location = Var2, day = Var1) %>%
  mutate(location = factor(location)) %>%
  ggplot(aes(y =temp, x = day,color = location )) + 
  geom_line() + theme_bw() + xlim(0,30) +
  ggtitle('First 30 observations of correlated response')
```


##### 3. Write STAN code for bivariate normal

Write stan code that will allow you to estimate theta and sigma
(including H)

```
data {
  int<lower=0> p;
  int<lower=0> N;
  matrix[N,p] y;
}

parameters {
  vector[p] theta;
  corr_matrix[p] H;
  real<lower = 0> sigma;
}

model {
  for(i in 1:N){
    y[i,:] ~ multi_normal(theta, sigma*H);
  }
}
```

##### 4. Use STAN to estimate bivariate normal parameters

Use your stan code to estimate theta and sigma (including H and sigmasq)

```{r}
indep_mvn <- stan("multi_norm.stan", 
                  data=list(N = nrow(fake_temperatures), 
                            p = ncol(fake_temperatures), 
                            y=fake_temperatures))
```

```{r}
print(indep_mvn)
```


```{r}
corr_mvn <- stan("multi_norm.stan", 
                 data=list(N = nrow(fake_temperatures_corr),
                           p = ncol(fake_temperatures_corr), 
                           y=fake_temperatures_corr))
```

```{r}
print(corr_mvn)
```

##### 5. Final Thoughts About Correlation

In many statistical models there is an assumption about independence.
When independence is violated, uncertainty is under estimated and in
incorrect inferences can be made.

While lack of independence often has a negative connotation, in spatial
statistics we can actually exploit correlation. For instance, by knowing
the temperature at the weather station at Bozeman High School or Bridger
Bowl, we can estimate temperature at other locations.