---
title: "Week 7: Activity 2"
format: pdf
editor: source
editor_options: 
  chunk_output_type: console
---

\renewcommand{\vec}[1]{\mathbf{#1}}

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.height = 4, fig.width = 6)
library(tidyverse) 
library(knitr)
library(mnormt)
library(plgp)
library(reshape2)

```


### Tuesday's Recap

- Linear Algebra 
- Linear Model Overview
- Simulating Data in R
- Fitting Linear Models in R
- Bayesian Inference

### Today's Key Concepts

- Multivariate Normal distribution
- Conditional Multivariate normal distribution
- Gaussian Process Intro

---

##### Multivariate Normal Distribution

Formally, our matrix notation for the linear model has relied on a multivariate normal distribution.

\begin{equation}
\underline{y} = X \underline{\beta} + \underline{\epsilon},
\end{equation}

where $\underline{\epsilon} \sim N(\underline{0}, \Sigma),$ which also implies $\underline{y} \sim N(X \underline{\beta}, \Sigma)$.

\vfill

Simulate data from a multivariate normal distribution. Use the x sequence created below and define $\Sigma_{ij} = \exp(- d_{ij})$, where $\Sigma_{ij}$ is the element in the $i^{th}$ row and $j^{th}$ column of $\Sigma$ and $d_{ij}$ is the distance between the $i^{th}$ and $j^{th}$ points.

```{r}
x <- seq(0, 10, by = .1)
```
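
One possible sketch, assuming a zero-mean vector for illustration and using `plgp::distance()` and `mnormt::rmnorm()` loaded in the setup chunk:

```{r}
# sketch: one draw from N(0, Sigma) with Sigma_ij = exp(-d_ij)
d <- sqrt(plgp::distance(x))                 # pairwise Euclidean distances
Sigma <- exp(-d)                             # exponential covariance
y_sim <- rmnorm(1, rep(0, length(x)), Sigma) # single multivariate normal draw

tibble(x = x, y = y_sim) %>%
  ggplot(aes(x = x, y = y)) +
  geom_point(size = .5) +
  theme_bw() +
  ggtitle('Simulated multivariate normal draw')
```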


##### Partitioned Matrices

Now consider splitting the sampling units into two partitions such that $\underline{y} = \begin{bmatrix} \underline{y}_1 \\ \underline{y}_2 \end{bmatrix}$. Then,

\vfill

$$\begin{bmatrix}\underline{y}_1 \\ \underline{y}_2 \end{bmatrix} \sim N \left( \begin{bmatrix}X_1 \\ X_2 \end{bmatrix} \underline{\beta}  ,
\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$$

\vfill

Fundamentally, there is no change to the model; we have just created "groups" by partitioning the observations. Do note that $\Sigma_{11}$ is an $n_1 \times n_1$ covariance matrix.

$$\Sigma_{11} = \begin{bmatrix}
\sigma^2_1 & \sigma_{12} & \cdots &\sigma_{1n_1} \\
\sigma_{21} & \sigma^2_2 &   \cdots &\sigma_{2n_1} \\
\sigma_{31} & \sigma_{32} &   \cdots &\sigma_{3n_1} \\
\vdots & \vdots &   \ddots &\vdots \\
\sigma_{n_1 1} & \sigma_{n_1 2} &   \cdots &\sigma^2_{n_1} 
\end{bmatrix}$$

\vfill

However, while $\Sigma_{12} = \Sigma_{21}^T$, neither of these is necessarily a symmetric matrix. They also do not contain any variance components, only covariance terms. $\Sigma_{12}$ will be an $n_1 \times n_2$ matrix.

$$\Sigma_{12} = \begin{bmatrix}
\sigma_{1,n_1 +1} & \sigma_{1,n_1 + 2} & \cdots &\sigma_{1,n_1 + n_2} \\
\sigma_{2, n_1 + 1} & \sigma_{2, n_1 + 2} &   \cdots &\sigma_{2,n_1 + n_2} \\
\vdots & \vdots &   \ddots &\vdots \\
\sigma_{n_1, n_1 + 1} & \sigma_{n_1, n_1 + 2} &   \cdots &\sigma_{n_1, n_1 + n_2} 
\end{bmatrix}$$

\newpage

### Conditional Multivariate Normal

Here is where the magic happens with correlated data. Let $\underline{y_1}|\underline{y_2}$ denote the conditional distribution of $\underline{y_1}$ given that $\underline{y_2}$ is observed. Then

\vfill

$$\underline{y_1}|\underline{y_2} \sim N \left( X_1\beta + \Sigma_{12} \Sigma_{22}^{-1}\left(\underline{y_2} - X_2\beta \right), \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)$$
\vfill

Now let's consider a few special cases (in the context of the DC housing dataset). In each case, what is $\underline{y_1}|\underline{y_2} \sim$ ?

\vfill

1. Let $\Sigma = \sigma^2 I$; then the houses in group 1 are conditionally independent of the houses in group 2.

\vfill

2. Otherwise, let $\Sigma = \sigma^2 H$ and we'll assume $\Sigma_{12}$ has some non-zero elements. 
\vfill

First, a quick interlude about matrix inversion. The inverse of a square matrix $E$ is defined such that $E \times E^{-1} = I$. We can calculate an inverse by hand for a $1 \times 1$ matrix, perhaps a $2 \times 2$ matrix, and maybe even a $3 \times 3$ matrix. Beyond that it is quite challenging and time consuming, and it is also (relatively) time intensive for your computer.
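
A tiny illustration in R; the symmetric matrix below is an arbitrary example chosen just to show `solve()` at work:

```{r}
# sketch: invert a small symmetric matrix and verify E %*% E^{-1} = I
E <- matrix(c(2, .5, .1,
              .5, 1, .3,
              .1, .3, 1.5), nrow = 3, byrow = TRUE)
E_inv <- solve(E)        # numerical inverse
round(E %*% E_inv, 10)   # should be the 3 x 3 identity matrix
```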

\vfill

\newpage

#### Visual Example

Let $n_1 = 1$ and $n_2 = 1$, then

$$\begin{bmatrix}y_1 \\ y_2 \end{bmatrix} \sim N \left( \begin{bmatrix}\mu_1 \\ \mu_2 \end{bmatrix}   ,
\begin{bmatrix} \sigma^2_{1} & \sigma_{12} \\ \sigma_{12} & \sigma_{2}^2 \end{bmatrix}\right)$$

and

$$y_1|y_2 \sim N \left( \mu_1 + \sigma_{12} (\sigma_{2}^{2})^{-1}\left(y_2 - \mu_2 \right), \sigma_{1}^2 - \sigma_{12} (\sigma_{2}^{2})^{-1} \sigma_{21} \right)$$
\vfill

Now consider an illustration for a couple of simple scenarios. Let $\mu_1 = \mu_2 = 0$ and $\sigma^2_1 = \sigma^2_2 = 1$. Now assume $y_2 = -2$ and compare the conditional distribution of $y_1$ for a few values of $\sigma_{12} \in \{0, .2, .8\}$. Plot these three distributions on the same figure.
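
One possible solution sketch, computing each conditional mean and variance from the formula above and plotting the densities with `dnorm()`:

```{r}
# sketch: conditional density of y1 given y2 = -2 for several sigma_12 values
y2_obs <- -2
sigma12 <- c(0, .2, .8)
grid <- seq(-4, 4, by = .01)

map_dfr(sigma12, function(s) {
  tibble(y1 = grid,
         density = dnorm(grid, mean = s * y2_obs, sd = sqrt(1 - s^2)),
         sigma12 = factor(s))
}) %>%
  ggplot(aes(x = y1, y = density, color = sigma12)) +
  geom_line() +
  theme_bw() +
  ggtitle('Conditional distribution of y1 given y2 = -2')
```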



\vfill

#### Marginal Distributions

One last note: the marginal distribution for any partition $\underline{y_1}$ is quite simple.

$$\underline{y_1} \sim N \left( X_1\beta, \Sigma_{11} \right)$$
or just

$$y_1 \sim N \left( X_1\beta, \sigma^2_{1} \right)$$
if $y_1$ is scalar.


#### GP Overview

Now let's extend this idea to a Gaussian Process (GP). There are two fundamental ideas to a GP.

\vfill

1. Any finite set of realizations (say $\underline{y_2}$) has a multivariate normal distribution.

\vfill

2. Conditional on a set of realizations, all other locations (say $\underline{y_1}$) follow a conditional normal distribution characterized by the mean and, most importantly, the covariance function. Note that the dimension of $\underline{y_1}$ can actually be infinite, such as all points on the real line.

\vfill

The big question is how do we estimate $\Sigma_{12}$?

\vfill


\vfill

The fundamental idea of spatial statistics is that things close together tend to be similar.


#### Correlation function

Initially, let's consider correlation as a function of distance, in one dimension or on a line.

\vfill

As a starting point, consider a variant of what is known as the exponential covariance function (we used this earlier). First define $d$ as the Euclidean distance between $x_i$ and $x_j$, such that $d = \sqrt{(x_i - x_j)^2} = |x_i - x_j|$.

$$\rho_{i,j} = \exp \left(- d \right)$$

\vfill

Let's view the exponential correlation as a function of the distance between two points.

```{r, echo = F}
dist <- seq(0, 10, by = .1)

tibble(rho = exp(-dist), dist = dist) %>%
  ggplot(aes(y = rho, x = dist)) + geom_line() +
  theme_bw() + ylab(expression(rho))

```

\vfill

Using a correlation function can greatly reduce the number of unknown parameters in a covariance matrix. In an unrestricted case, $\Sigma$ has ${n \choose 2} + n$ unknown parameters (for example, 5,050 when $n = 100$). A correlation function typically reduces this to a small handful of parameters, generally fewer than four.

\newpage

#### Realizations of a Gaussian Process

Recall that a process implies an infinite-dimensional object, so we can generate an entire curve rather than a discrete set of points. (In practice the curve is generated at a fine, discrete set of points and then connected.)

\vfill

For this scenario we will assume a zero-mean GP with covariance equal to the correlation function $\rho_{i,j} = \exp \left(- d \right)$.

Simulate and visualize a realization from this process.
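
One way to do this (a sketch, using a fine grid and the same exponential correlation; `plgp::distance()` and `rmnorm()` come from the setup chunk):

```{r}
# sketch: one realization of a zero-mean GP with exponential covariance
x_grid <- seq(0, 10, by = .1)
Sigma_grid <- exp(-sqrt(plgp::distance(x_grid)))   # rho_ij = exp(-d_ij)
y_gp <- rmnorm(1, rep(0, length(x_grid)), Sigma_grid)

tibble(x = x_grid, y = y_gp) %>%
  ggplot(aes(x = x, y = y)) +
  geom_line() +
  theme_bw() +
  ggtitle('Realization of a zero-mean GP')
```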



\vfill



#### Connecting a GP to conditional normal

Now consider a discrete set of points, say $\underline{y_2}$. How can we estimate the response for the remainder of the values in the interval [0, 10]?

```{r, echo = F}
x2 <- seq(0, 10, by = .75)                      # observed locations
n <- length(x2)
d2 <- as.matrix(dist(x2, diag = T, upper = T))  # pairwise distances
Sigma22 <- exp(-d2)                             # exponential covariance
y2 <- rmnorm(1, rep(0, n), Sigma22)             # realization at the observed locations
data_fig <- tibble(y = y2, x = x2) %>% ggplot(aes(y = y, x = x)) +
  theme_bw() + ggtitle('Observed Data') +
  geom_point(size = .5)
data_fig
```

\newpage

We can connect the dots (with uncertainty) using:

$$\underline{y_1}|\underline{y_2} \sim N \left( X_1\beta + \Sigma_{12} \Sigma_{22}^{-1}\left(\underline{y_2} - X_2\beta \right), \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)$$
\vfill

Plot the conditional mean curve on the observed data.
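
A sketch of one approach, reusing `x2`, `y2`, `Sigma22`, and `data_fig` from the chunk above and assuming a zero-mean GP (so the $X\underline{\beta}$ terms drop out):

```{r}
# sketch: conditional mean of y1 | y2 evaluated on a fine grid x1
x1 <- seq(0, 10, by = .1)
Sigma12 <- exp(-sqrt(plgp::distance(x1, x2)))    # n1 x n2 cross-covariance
cond_mean <- Sigma12 %*% solve(Sigma22) %*% y2   # E[y1 | y2]

data_fig +
  geom_line(data = tibble(x = x1, y = as.numeric(cond_mean)),
            aes(x = x, y = y), color = 'blue')
```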

\vfill


### GP Regression

Now rather than specifying a zero-mean GP, let the mean be $X \underline{\beta}.$

\vfill

```{r}
x <- seq(0, 10, by = .25)
beta <- 1
n <- length(x)
d <- sqrt(plgp::distance(x))   # pairwise Euclidean distances
H <- exp(-d)                   # exponential correlation matrix
y <- rmnorm(1, x * beta, H)    # one realization with mean X beta
```

```{r, echo = F, fig.width = 8, fig.height = 4}
tibble(y = y, x = x) %>% ggplot(aes(y=y, x=x)) +
 theme_bw() + ggtitle('Random realization of a GP Regression') +
  geom_point(size = .5) + geom_smooth(formula = 'y~x', method = 'lm')
```
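
To connect this back to the conditional normal result, one could also compute the conditional (predictive) mean at new locations by plugging the regression mean into the earlier formula. A sketch, reusing `x`, `y`, `beta`, and `H` from the chunk above; the prediction grid `x_new` is an arbitrary choice:

```{r}
# sketch: conditional mean for the GP regression at new locations x_new
x_new <- seq(0, 10, by = .05)
Sig12 <- exp(-sqrt(plgp::distance(x_new, x)))   # cross-covariance
pred_mean <- x_new * beta + Sig12 %*% solve(H) %*% (y - x * beta)

tibble(x = x_new, y = as.numeric(pred_mean)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_line() +
  geom_point(data = tibble(x_obs = x, y_obs = y),
             mapping = aes(x = x_obs, y = y_obs), size = .5) +
  theme_bw() +
  ggtitle('Conditional mean from the GP regression')
```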
