---
title: "Week 8: Activity"
format: pdf
editor: source
editor_options: 
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.height = 6, fig.width = 6)
library(tidyverse) 
library(knitr)
library(mnormt)
library(plgp)
library(reshape2)
```

### Last Week's Recap

- Exam 1
- Linear Algebra
- Linear Model Overview
- Simulating Data in R
- Fitting Linear Models in R
- Multivariate Normal distribution
- Partitioned Matrices and Conditional Multivariate normal distribution

### Video Lectures

- `rstan` in R for Bayesian inference

### This week

- Gaussian Process Intro
- Bayesian inference with `stan`
- Correlation functions

---

#### Visual Example

Let $n_1 = 1$ and $n_2 = 1$; then

$$\begin{bmatrix}y_1 \\ y_2 \end{bmatrix} \sim N \left( \begin{bmatrix}\mu_1 \\ \mu_2 \end{bmatrix} , \begin{bmatrix} \sigma^2_{1} & \sigma_{12} \\ \sigma_{12} & \sigma_{2}^2 \end{bmatrix}\right)$$

and

$$y_1|y_2 \sim N \left( \mu_1 + \sigma_{12} (\sigma_{2}^{2})^{-1}\left(y_2 - \mu_2 \right), \sigma_{1}^2 - \sigma_{12} (\sigma_{2}^{2})^{-1} \sigma_{21} \right)$$

\vfill

```{r, echo = F}
#| fig-height: 4
mu1 <- 0
mu2 <- 0
sigmasq1 <- sigmasq2 <- 1
dat_seq <- seq(-4, 4, by = .01)
n_seq <- length(dat_seq)

tibble(group = rep(c('y2 = -2; sigma12 = 0',
                     'y2 = -2; sigma12 = .2',
                     'y2 = -2; sigma12 = .8'), each = n_seq),
       dens = c(dnorm(dat_seq, mu1 + 0 * (1 / sigmasq2) * (-2 - mu2),
                      sqrt(sigmasq1 - 0 * (1 / sigmasq2) * 0)),
                dnorm(dat_seq, mu1 + .2 * (1 / sigmasq2) * (-2 - mu2),
                      sqrt(sigmasq1 - .2 * (1 / sigmasq2) * .2)),
                dnorm(dat_seq, mu1 + .8 * (1 / sigmasq2) * (-2 - mu2),
                      sqrt(sigmasq1 - .8 * (1 / sigmasq2) * .8))),
       y = rep(dat_seq, 3)) %>%
  ggplot(aes(x = y, y = dens, group = group, color = group)) +
  geom_line() +
  theme_bw()
```

\vfill

__Q:__ Calculate and write out the actual distributions for $y_1$ in these three settings.

\vfill

One last note: the marginal distributions for any partition $\underline{y_1}$ are quite simple.
$$\underline{y_1} \sim N \left( X_1\beta, \Sigma_{11} \right)$$

or just

$$y_1 \sim N \left( X_1\beta, \sigma^2_{1} \right)$$

if $y_1$ is scalar.

#### GP Overview

Now let's extend this idea to a Gaussian Process (GP). There are two fundamental ideas to a GP.

\vfill

1. Any finite set of realizations (say $\underline{y_2}$) has a multivariate normal distribution.

\vfill

2. Conditional on a set of realizations, all other locations (say $\underline{y_1}$) have a conditional normal distribution characterized by the mean and, most importantly, the covariance function. Note that the dimension of $\underline{y_1}$ can actually be infinite, such as when it is defined on the real line.

\vfill

The big question is: how do we estimate $\Sigma_{12}$? How many parameters are necessary for this distribution?

\vfill

#### Correlation function

Initially, let's consider correlation as a function of distance in one dimension, or on a line. As a starting point, consider a variant of what is known as the exponential covariance function, which we used earlier. First define $d$ as the Euclidean distance between $x_i$ and $x_j$, such that $d = \sqrt{(x_i - x_j)^2}$. Then

$$\rho_{i,j} = \exp \left(- d \right)$$

\vfill

Create a figure that shows the exponential correlation as a function of the distance between the two points.

\vfill

Using a correlation function can reduce the number of unknown parameters in a covariance matrix. In the unrestricted case, $\Sigma$ has ${n \choose 2} + n$ unknown parameters. Using a correlation function reduces this substantially, generally to fewer than four parameters.

#### Realizations of a Gaussian Process

Recall that a process implies an infinite-dimensional object, so we can generate a line rather than a discrete set of points. (In practice, the line will in fact be generated with a discrete set of points and then connected.)
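As a starting point for the correlation figure prompted above, here is one possible sketch; the distance grid endpoints are arbitrary choices for illustration.

```{r}
# Sketch: exponential correlation rho(d) = exp(-d) over a grid of distances
# (the grid endpoints 0 to 6 are illustrative choices)
d_grid <- seq(0, 6, by = 0.01)

tibble(d = d_grid, rho = exp(-d_grid)) |>
  ggplot(aes(x = d, y = rho)) +
  geom_line() +
  theme_bw() +
  labs(x = 'distance', y = 'correlation',
       title = 'Exponential correlation as a function of distance')
```

Note that the correlation equals 1 at distance zero and decays toward zero as the points move apart.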
\vfill

For this scenario, we will assume a zero-mean GP with covariance equal to the correlation function $\rho_{i,j} = \exp \left(- d \right)$.

\vfill

```{r}
#| echo: false
set.seed(02252025)
x <- seq(0, 10, by = .1)
dist_mat <- as.matrix(dist(x, diag = T, upper = T))
Sigma <- exp(-dist_mat)
y <- rmnorm(n = 1, mean = 0, varcov = Sigma)

tibble(y = y, x = x) |>
  ggplot(aes(y = y, x = x)) +
  theme_bw() +
  geom_line() +
  geom_hline(yintercept = 0, color = 'grey40') +
  ggtitle('Random realization of a GP')
```

Overlay a few realizations of a Gaussian process on the same curve.

#### Connecting a GP to conditional normal

Now consider a discrete set of points, say $\underline{y_2}$. How can we estimate the response for the remainder of the values in the interval $[0,10]$? We can connect the dots (with uncertainty) using:

$$\underline{y_1}|\underline{y_2} \sim N \left( X_1\beta + \Sigma_{12} \Sigma_{22}^{-1}\left(\underline{y_2} - X_2\beta \right), \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)$$

\vfill

__Create a figure that shows the data points, conditional mean, and uncertainty.__

### GP Regression

Now, rather than specifying a zero-mean GP, let the mean be $X \underline{\beta}$. Create this figure.

---

#### Correlation function: more details

Recall the variant of the exponential covariance function that we have previously seen, where $d$ is the Euclidean distance between $x_i$ and $x_j$, such that $d = \sqrt{(x_i - x_j)^2}$:

$$\rho_{i,j} = \exp \left(- d \right)$$

\vfill

Recall that we can view the exponential correlation as a function of the distance between the two points. Now let's consider a more general framework for covariance:

$$\sigma_{i,j} = \sigma^2 \exp \left(- d_{ij} /\phi \right)$$

We have now introduced two new parameters into this function. What do you suppose they do?
- $\sigma^2$: 

\vfill

- $\phi$: 

\vfill

Modify your previous code to adjust $\phi$ and $\sigma^2$ and explore how they differ.

\vfill

We will soon talk about a broader set of correlation functions and another parameter that provides flexibility so that predictions do not have to pass directly through observed points.

\vfill

### Geostatistical Data Exercise

At last, we will look at simulated 2-d "spatial" data.

##### 1. Create Sampling Locations

##### 2. Calculate Distances

##### 3. Define Covariance Function and Set Parameters

##### 4. Sample a realization of the process

- This requires a distributional assumption; we will use the Gaussian distribution.
- Start with a coarse grid and then move to a finer grid.

How does the spatial process change with:

- another draw with the same parameters?
- a different value of $\phi$?
- a different value of $\sigma^2$?
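The four steps above can be sketched end-to-end as follows; the grid resolution, seed, and the values of $\phi$ and $\sigma^2$ are illustrative assumptions, not the only reasonable choices.

```{r}
# Hypothetical sketch of the geostatistical exercise; all parameter values
# and the grid size are illustrative choices.

# 1. Sampling locations on a coarse grid in the unit square
grid <- expand_grid(x1 = seq(0, 1, length.out = 15),
                    x2 = seq(0, 1, length.out = 15))

# 2. Euclidean distances between all pairs of locations
d_mat <- as.matrix(dist(grid))

# 3. Exponential covariance with assumed sigma^2 = 1 and phi = 0.3
sigmasq <- 1
phi <- 0.3
Sigma <- sigmasq * exp(-d_mat / phi)

# 4. One Gaussian realization of the process (rmnorm from mnormt)
set.seed(42)
grid$y <- as.numeric(rmnorm(n = 1, mean = rep(0, nrow(grid)), varcov = Sigma))

ggplot(grid, aes(x = x1, y = x2, fill = y)) +
  geom_tile() +
  scale_fill_viridis_c() +
  theme_bw() +
  ggtitle('Simulated 2-d spatial process')
```

Rerunning the final step with a new seed, or changing $\phi$ and $\sigma^2$ in step 3, is one way to explore the questions above: larger $\phi$ produces smoother, longer-range surfaces, while $\sigma^2$ scales the overall variability.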