---
title: "Week 8: Activity"
format: pdf
editor: source
editor_options: 
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.height = 6, fig.width = 6)
library(tidyverse) 
library(knitr)
library(mnormt)
library(plgp)
library(reshape2)
```

### Last Week's Recap

- Exam 1
- Linear Algebra
- Linear Model Overview
- Simulating Data in R
- Fitting Linear Models in R
- Multivariate Normal distribution
- Partitioned Matrices and Conditional Multivariate normal distribution

### Video Lectures

- `rstan` in R for Bayesian inference

### This week

- Gaussian Process Intro
- Bayesian inference with `stan`
- Correlation functions

---

#### Visual Example

Let $n_1 = 1$ and $n_2 = 1$; then

$$\begin{bmatrix}y_1 \\ y_2 \end{bmatrix} \sim N \left( \begin{bmatrix}\mu_1 \\ \mu_2 \end{bmatrix} , \begin{bmatrix} \sigma^2_{1} & \sigma_{12} \\ \sigma_{12} & \sigma_{2}^2 \end{bmatrix}\right)$$

and

$$y_1|y_2 \sim N \left( \mu_1 + \sigma_{12} (\sigma_{2}^{2})^{-1}\left(y_2 - \mu_2 \right), \sigma_{1}^2 - \sigma_{12} (\sigma_{2}^{2})^{-1} \sigma_{21} \right)$$

\vfill

```{r, echo = F}
#| fig-height: 4
mu1 <- 0
mu2 <- 0
sigmasq1 <- sigmasq2 <- 1
dat_seq <- seq(-4, 4, by = .01)
n_seq <- length(dat_seq)

tibble(group = rep(c('y2 = -2; sigma12 = 0',
                     'y2 = -2; sigma12 = .2',
                     'y2 = -2; sigma12 = .8'), each = n_seq),
       dens = c(dnorm(dat_seq, mu1 + 0 * (1 / sigmasq2) * (-2 - mu2),
                      sqrt(sigmasq1 - 0 * (1 / sigmasq2) * 0)),
                dnorm(dat_seq, mu1 + .2 * (1 / sigmasq2) * (-2 - mu2),
                      sqrt(sigmasq1 - .2 * (1 / sigmasq2) * .2)),
                dnorm(dat_seq, mu1 + .8 * (1 / sigmasq2) * (-2 - mu2),
                      sqrt(sigmasq1 - .8 * (1 / sigmasq2) * .8))),
       y = rep(dat_seq, 3)) %>%
  ggplot(aes(x = y, y = dens, group = group, color = group)) +
  geom_line() +
  theme_bw()
```

\vfill

__Q:__ Calculate and write out the actual distributions for $y_1$ in these three settings.

\vfill

One last note: the marginal distributions for any partition $\underline{y_1}$ are quite simple.
$$\underline{y_1} \sim N \left( X_1\beta, \Sigma_{11} \right)$$

or just

$$y_1 \sim N \left( X_1\beta, \sigma^2_{1} \right)$$

if $y_1$ is scalar.

#### GP Overview

Now let's extend this idea to a Gaussian Process (GP). There are two fundamental ideas to a GP.

\vfill

1. Any finite set of realizations (say $\underline{y_2}$) has a multivariate normal distribution.

\vfill

2. Conditional on a set of realizations, all other locations (say $\underline{y_1}$) have a conditional normal distribution characterized by the mean and, most importantly, the covariance function. Note that the dimension of $\underline{y_1}$ can actually be infinite, such as when it is defined on the real line.

\vfill

The big question is: how do we estimate $\Sigma_{12}$? How many parameters are necessary for this distribution?

\vfill

#### Correlation function

Initially, let's consider correlation as a function of distance in one dimension, or on a line. As a starting point, consider a variant of what is known as the exponential covariance function, which we used earlier. First define $d$ as the Euclidean distance between $x_i$ and $x_j$, such that $d = \sqrt{(x_i - x_j)^2}$. Then

$$\rho_{i,j} = \exp \left(- d \right)$$

\vfill

Create a figure that shows the exponential correlation as a function of the distance between the two points.

\vfill

Using a correlation function can reduce the number of unknown parameters in a covariance matrix. In the unrestricted case, $\Sigma$ has ${n \choose 2} + n$ unknown parameters. Using a correlation function reduces this substantially, generally to fewer than four parameters.

#### Realizations of a Gaussian Process

Recall that a process implies an infinite-dimensional object, so we can generate a line rather than a discrete set of points. (In practice, the line will in fact be generated with a discrete set of points and then connected.)
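As a starting point for the correlation figure prompted above, here is one possible sketch; the distance grid endpoints are arbitrary choices for illustration.

```{r}
# Sketch: exponential correlation rho(d) = exp(-d) over a grid of distances
# (the grid endpoints 0 to 6 are illustrative choices)
d_grid <- seq(0, 6, by = 0.01)

tibble(d = d_grid, rho = exp(-d_grid)) |>
  ggplot(aes(x = d, y = rho)) +
  geom_line() +
  theme_bw() +
  labs(x = 'distance', y = 'correlation',
       title = 'Exponential correlation as a function of distance')
```

Note that the correlation equals 1 at distance zero and decays toward zero as the points move apart.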
\vfill

For this scenario, we will assume a zero-mean GP with covariance equal to the correlation function $\rho_{i,j} = \exp \left(- d \right)$.

\vfill

```{r}
#| echo: false
set.seed(02252025)
x <- seq(0, 10, by = .1)
dist_mat <- as.matrix(dist(x, diag = T, upper = T))
Sigma <- exp(-dist_mat)
y <- rmnorm(n = 1, mean = 0, varcov = Sigma)

tibble(y = y, x = x) |>
  ggplot(aes(y = y, x = x)) +
  theme_bw() +
  geom_line() +
  geom_hline(yintercept = 0, color = 'grey40') +
  ggtitle('Random realization of a GP')
```

Overlay a few realizations of a Gaussian process on the same curve.

#### Connecting a GP to conditional normal

Now consider a discrete set of points, say $\underline{y_2}$. How can we estimate the response for the remainder of the values in the interval $[0,10]$? We can connect the dots (with uncertainty) using:

$$\underline{y_1}|\underline{y_2} \sim N \left( X_1\beta + \Sigma_{12} \Sigma_{22}^{-1}\left(\underline{y_2} - X_2\beta \right), \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)$$

\vfill

__Create a figure that shows the data points, conditional mean, and uncertainty.__

### GP Regression

Now, rather than specifying a zero-mean GP, let the mean be $X \underline{\beta}$. Create this figure.

---

#### Correlation function: more details

Recall the variant of the exponential covariance function that we have previously seen, where $d$ is the Euclidean distance between $x_i$ and $x_j$, such that $d = \sqrt{(x_i - x_j)^2}$:

$$\rho_{i,j} = \exp \left(- d \right)$$

\vfill

Recall that we can view the exponential correlation as a function of the distance between the two points. Now let's consider a more general framework for covariance:

$$\sigma_{i,j} = \sigma^2 \exp \left(- d_{ij} /\phi \right)$$

We have now introduced two new parameters into this function. What do you suppose they do?
- $\sigma^2$: 

\vfill

- $\phi$: 

\vfill

Modify your previous code to adjust $\phi$ and $\sigma^2$ and explore how they differ.

\vfill

We will soon talk about a broader set of correlation functions and another parameter that provides flexibility so that predictions do not have to pass directly through observed points.

\vfill

### Geostatistical Data Exercise

At last, we will look at simulated 2-d "spatial" data.

##### 1. Create Sampling Locations

##### 2. Calculate Distances

##### 3. Define Covariance Function and Set Parameters

##### 4. Sample a realization of the process

- This requires a distributional assumption; we will use the Gaussian distribution.
- Start with a coarse grid and then move to a finer grid.

How does the spatial process change with:

- another draw with the same parameters?
- a different value of $\phi$?
- a different value of $\sigma^2$?
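The four steps above can be sketched end-to-end as follows; the grid resolution, seed, and the values of $\phi$ and $\sigma^2$ are illustrative assumptions, not the only reasonable choices.

```{r}
# Hypothetical sketch of the geostatistical exercise; all parameter values
# and the grid size are illustrative choices.

# 1. Sampling locations on a coarse grid in the unit square
grid <- expand_grid(x1 = seq(0, 1, length.out = 15),
                    x2 = seq(0, 1, length.out = 15))

# 2. Euclidean distances between all pairs of locations
d_mat <- as.matrix(dist(grid))

# 3. Exponential covariance with assumed sigma^2 = 1 and phi = 0.3
sigmasq <- 1
phi <- 0.3
Sigma <- sigmasq * exp(-d_mat / phi)

# 4. One Gaussian realization of the process (rmnorm from mnormt)
set.seed(42)
grid$y <- as.numeric(rmnorm(n = 1, mean = rep(0, nrow(grid)), varcov = Sigma))

ggplot(grid, aes(x = x1, y = x2, fill = y)) +
  geom_tile() +
  scale_fill_viridis_c() +
  theme_bw() +
  ggtitle('Simulated 2-d spatial process')
```

Rerunning the final step with a new seed, or changing $\phi$ and $\sigma^2$ in step 3, is one way to explore the questions above: larger $\phi$ produces smoother, longer-range surfaces, while $\sigma^2$ scales the overall variability.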