Generalized Additive Models ============================ css: custom.css transition: none ```{r setup, include=FALSE} library(knitr) library(magrittr) library(viridis) library(ggplot2) library(reshape2) library(animation) opts_chunk$set(cache=TRUE, echo=FALSE, fig.height=10, fig.width=10) ``` ```{r initialmodeletc, echo=FALSE, message=FALSE, warning=FALSE} load("../spermwhaledata/R_import/spermwhale.RData") library(Distance) library(dsm) df <- ds(dist, truncation=6000) dsm_tw_xy_depth <- dsm(count ~ s(x, y) + s(Depth), ddf.obj=df, observation.data=obs, segment.data=segs, family=tw()) ``` Overview ========= - What is a GAM? - What is smoothing? - How do GAMs work? - Fitting GAMs using `dsm` What is a GAM? =============== type:section "gam" ==================== 1. *Collective noun used to refer to a group of whales, or rarely also of porpoises; a pod.* 2. *(by extension) A social gathering of whalers (whaling ships).* (via Natalie Kelly, AAD. Seen in Moby Dick.) Generalized Additive Models ============================ - Generalized: many response distributions - Additive: terms **add** together - Models: well, it's a model... What does a model look like? ============================= - Count $n_j$ distributed according to some count distribution - Model as sum of terms ```{r sumterms, fig.width=15} plot(dsm_tw_xy_depth, pages=1, scheme=1) ``` Mathematically... ================== Taking the previous example... $$ n_j = \color{red}{A_j}\color{blue}{\hat{p}_j} \color{green}{\exp}\left[\color{grey}{ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j)} \right] + \epsilon_j $$ where $\epsilon_j \sim N(0, \sigma^2)$, $\quad n_j\sim$ count distribution - $\color{red}{\text{area of segment - offset}}$ - $\color{blue}{\text{probability of detection in segment}}$ - $\color{green}{\text{link function}}$ - $\color{grey}{\text{model terms}}$ Response ================== $$ \color{red}{n_j} = A_j\hat{p}_j \exp\left[ \beta_0 + s(\text{y}_j) + s(\text{Depth}_j) \right] + \epsilon_j $$
where $\epsilon_j \sim N(0, \sigma^2)$, $\quad \color{red}{n_j\sim \text{count distribution}}$ Count distributions ===================== ```{r countshist} hist(dsm_tw_xy_depth$data$count, xlab="Count", main="") ``` *** - Response is a count (not not always integer) - Often, it's mostly zero (that's complicated) - Want response distribution that deals with that - Flexible mean-variance relationship Tweedie distribution ===================== ```{r tweedie} library(tweedie) library(RColorBrewer) # tweedie y<-seq(0.01,5,by=0.01) pows <- seq(1.2, 1.9, by=0.1) fymat <- matrix(NA, length(y), length(pows)) i <- 1 for(pow in pows){ fymat[,i] <- dtweedie( y=y, power=pow, mu=2, phi=1) i <- i+1 } plot(range(y), range(fymat), type="n", ylab="Density", xlab="x", cex.lab=1.5, main="") rr <- brewer.pal(8,"Dark2") for(i in 1:ncol(fymat)){ lines(y, fymat[,i], type="l", col=rr[i], lwd=2) } ``` *** - $\text{Var}\left(\text{count}\right) = \phi\mathbb{E}(\text{count})^q$ - Common distributions are sub-cases: - $q=1 \Rightarrow$ Poisson - $q=2 \Rightarrow$ Gamma - $q=3 \Rightarrow$ Normal - We are interested in $1 < q < 2$ - (here $q = 1.2, 1.3, \ldots, 1.9$) Negative binomial distribution ================== ```{r negbin} y<-seq(1,12,by=1) disps <- seq(0.001, 1, len=10) fymat <- matrix(NA, length(y), length(disps)) i <- 1 for(disp in disps){ fymat[,i] <- dnbinom(y, size=disp, mu=5) i <- i+1 } plot(range(y), range(fymat), type="n", ylab="Density", xlab="x", cex.lab=1.5, main="") rr <- brewer.pal(8,"Dark2") for(i in 1:ncol(fymat)){ lines(y, fymat[,i], type="l", col=rr[i], lwd=2) } ``` *** - $\text{Var}\left(\text{count}\right) =$ $\mathbb{E}(\text{count}) + \kappa \mathbb{E}(\text{count})^2$ - Estimate $\kappa$ - Is quadratic relationship a "strong" assumption? - Similar to Poisson: $\text{Var}\left(\text{count}\right) =\mathbb{E}(\text{count})$ Smooth terms ================== $$ n_j = A_j\hat{p}_j \exp\left[ \beta_0 + \color{red}{s(\text{y}_j) + s(\text{Depth}_j}) \right] + \epsilon_j $$
where $\epsilon_j \sim N(0, \sigma^2)$, $\quad n_j\sim$ count distribution Okay, but what about these "s" things? ==================================== ```{r n-covar, fig.height=12, fig.width=9} spdat <- dsm_tw_xy_depth$data spdat <- melt(spdat, id.vars = c("Sample.Label","count"), measure.vars = c("x","y")) p <- ggplot(spdat) + geom_point(aes(y=count,x=value)) + theme_minimal() + theme(strip.text=element_text(size=25), axis.text.y=element_text(size=15)) + xlab("") + ylab("Count") + facet_wrap(~variable, ncol=1) print(p) ``` *** - Think $s$=**smooth** - Want to model the covariates flexibly - Covariates and response not necessarily linearly related! - Want some wiggles What is smoothing? =============== type:section Straight lines vs. interpolation ================================= ```{r wiggles} par(cex=1.5, lwd=1.75) library(mgcv) # hacked from the example in ?gam set.seed(2) ## simulate some data... dat <- gamSim(1,n=50,dist="normal",scale=0.5, verbose=FALSE) dat$y <- dat$f2 + rnorm(length(dat$f2), sd = sqrt(0.5)) f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10-mean(dat$y) ylim <- c(-4,6) # fit some models b.justright <- gam(y~s(x2),data=dat) b.sp0 <- gam(y~s(x2, sp=0, k=50),data=dat) b.spinf <- gam(y~s(x2),data=dat, sp=1e10) curve(f2, 0, 1, col="blue", ylim=ylim) points(dat$x2, dat$y-mean(dat$y), pch=19, cex=0.8) ``` *** - Want a line that is "close" to all the data - Don't want interpolation -- we know there is "error" - Balance between interpolation and "fit" Splines ======== left: 55% ```{r results='hide'} par(cex=1.5) set.seed(2) datb <- gamSim(1,n=400,dist="normal",scale=2) bb <- gam(y~s(x0, k=5, bs="cr"),data=datb) # main plot plot(bb, se=FALSE, ylim=c(-1, 1), lwd=3, asp=1/2) # plot each basis cf <- coef(bb) xp <- data.frame(x0=seq(0, 1, length.out=100)) Xp <- predict(bb, newdata=xp, type="lpmatrix") for(i in 1:length(cf)){ cf_c <- cf cf_c[-i] <- 0 cf_c[i] <- 1 lines(xp$x0, as.vector(Xp%*%cf_c), lty=i+1, lwd=2) } ``` *** - Functions made of other, simpler functions - **Basis functions** $b_k$, estimate $\beta_k$ - $s(x) = \sum_{k=1}^K \beta_k b_k(x)$ - Makes the maths much easier Measuring wigglyness ====================== - Visually: - Lots of wiggles == NOT SMOOTH - Straight line == VERY SMOOTH - How do we do this mathematically? - Derivatives! - (Calculus *was* a useful class afterall) Wigglyness by derivatives ========================== ```{r wigglyanim, results="hide", fig.width=12} par(cex=1.5, lwd=1.75) library(numDeriv) f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10 - mean(dat$y) xvals <- seq(0,1,len=100) plot_wiggly <- function(f2, xvals){ # pre-calculate f2v <- f2(xvals) f2vg <- grad(f2,xvals) f2vg2 <- unlist(lapply(xvals, hessian, func=f2)) f2vg2min <- min(f2vg2) -2 # now plot for(i in 1:length(xvals)){ par(mfrow=c(1,3)) plot(xvals, f2v, type="l", main="function", ylab="f") points(xvals[i], f2v[i], pch=19, col="red") plot(xvals, f2vg, type="l", main="derivative", ylab="df/dx") points(xvals[i], f2vg[i], pch=19, col="red") plot(xvals, f2vg2, type="l", main="2nd derivative", ylab="d2f/dx2") points(xvals[i], f2vg2[i], pch=19, col="red") polygon(x=c(0,xvals[1:i], xvals[i],f2vg2min), y=c(f2vg2min,f2vg2[1:i],f2vg2min,f2vg2min), col = "grey") ani.pause() } } saveGIF(plot_wiggly(f2, xvals), "wiggly.gif", interval = 0.2, ani.width = 800, ani.height = 400) ``` ![Animation of derivatives](wiggly.gif) Making wigglyness matter ========================= - Integration of derivative (squared) gives wigglyness - Fit needs to be **penalised** - **Penalty matrix** gives the wigglyness - Estimate the $\beta_k$ terms but penalise objective - "closeness to data" + penalty Penalty matrix =============== - For each $b_k$ calculate the penalty - Penalty is a function of $\beta$ - $\lambda \beta^\text{T}S\beta$ - $S$ calculated once - smoothing parameter ($\lambda$) dictates influence Smoothing parameter ======================= ```{r wiggles-plot, fig.width=18, fig.height=10} # make three plots, w. estimated smooth, truth and data on each par(mfrow=c(1,3), lwd=2.6, cex=1.6, pch=19, cex.main=1.8) plot(b.justright, se=FALSE, ylim=ylim, main=expression(lambda*plain("= estimated"))) points(dat$x2, dat$y-mean(dat$y)) curve(f2,0,1, col="blue", add=TRUE) plot(b.sp0, se=FALSE, ylim=ylim, main=expression(lambda*plain("=")*0)) points(dat$x2, dat$y-mean(dat$y)) curve(f2,0,1, col="blue", add=TRUE) plot(b.spinf, se=FALSE, ylim=ylim, main=expression(lambda*plain("=")*infinity)) points(dat$x2, dat$y-mean(dat$y)) curve(f2,0,1, col="blue", add=TRUE) ``` How wiggly are things? ======================== - We can set **basis complexity** or "size" ($k$) - Maximum wigglyness - Smooths have **effective degrees of freedom** (EDF) - EDF < $k$ - Set $k$ "large enough" Why GAMs are cool... ================================================ ![](images/igam.jpg) *** - Fancy smooths (cyclic, boundaries, ...) - Fancy responses (exp family and beyond!) - Random effects (by equivalence) - Markov random fields - Correlation structures - See Wood (2006/2017) for a handy intro Okay, that was a lot of theory... ================================== type:section Example data ========================= type:section Example data ============ Example data ============ Sperm whales off the US east coast ==================================== *** - Hang out near canyons, eat squid - Surveys in 2004, US east coast - Combination of data from 2 NOAA cruises - Thanks to Debi Palka (NOAA NEFSC), Lance Garrison (NOAA SEFSC) for data. Jason Roberts (Duke University) for data prep. Model formulation ================= - Pure spatial, pure environmental, mixed? - May have some prior knowledge - Biology/ecology - What are drivers of distribution? - Inferential aim - Abundance - Ecology Fitting GAMs using dsm ========================= type:section Translating maths into R ========================== $$ n_j = A_j\hat{p}_j \exp\left[ \beta_0 + s(\text{y}_j) \right] + \epsilon_j $$
where $\epsilon_j \sim N(0, \sigma^2)$, $\quad n_j\sim$ count distribution
- inside the link: `formula=count ~ s(y)` - response distribution: `family=nb()` or `family=tw()` - detectability: `ddf.obj=df_hr` - offset, data: `segment.data=segs, observation.data=obs` Your first DSM =============== ```{r firstdsm, echo=TRUE} library(dsm) dsm_x_tw <- dsm(count~s(x), ddf.obj=df, segment.data=segs, observation.data=obs, family=tw()) ``` `dsm` is based on `mgcv` by Simon Wood What did that do? =================== ```{r echo=TRUE} summary(dsm_x_tw) ``` Plotting ================ ```{r plotsmooth} plot(dsm_x_tw) ``` *** - `plot(dsm_x_tw)` - Dashed lines indicate +/- 2 standard errors - Rug plot - On the link scale - EDF on $y$ axis Adding a term =============== - Just use `+` ```{r xydsm, echo=TRUE} dsm_xy_tw <- dsm(count ~ s(x) + s(y), ddf.obj=df, segment.data=segs, observation.data=obs, family=tw()) ``` Summary =================== ```{r echo=TRUE} summary(dsm_xy_tw) ``` Plotting ================ ```{r plotsmooth-xy1, eval=FALSE, echo=TRUE} plot(dsm_xy_tw, pages=1) ``` ```{r plotsmooth-xy2, fig.width=25, echo=FALSE} par(cex.axis=2, lwd=2, cex.lab=2) plot(dsm_xy_tw, pages=1) ``` - `scale=0`: each plot on different scale - `pages=1`: plot together Bivariate terms ================ - Assumed an additive structure - No interaction - We can specify `s(x,y)` (and `s(x,y,z,...)`) Thin plate regression splines ================================ - Default basis - One basis function per data point - Reduce # basis functions (eigendecomposition) - Fitting on reduced problem - Multidimensional Thin plate splines (2-D) ==================== Thin plate regression spline basis functions. Taken from Wood 2006. Bivariate spatial term ======================= ```{r xy-biv-dsm, echo=TRUE} dsm_xyb_tw <- dsm(count ~ s(x, y), ddf.obj=df, segment.data=segs, observation.data=obs, family=tw()) ``` Summary =================== ```{r echo=TRUE} summary(dsm_xyb_tw) ``` Plotting... erm... ================ ```{r plotsmooth-xy-biv1, eval=TRUE, fig.width=15, fig.height=17} par(cex.axis=2, lwd=2, cex.lab=2, cex=2) plot(dsm_xyb_tw, asp=1) ``` *** ```{r plotsmooth-xy-biv2, eval=FALSE, echo=TRUE} plot(dsm_xyb_tw) ``` Let's try something different =============================== ```{r twodee-p, echo=TRUE, eval=FALSE} plot(dsm_xyb_tw, select=1, scheme=2, asp=1) ``` - Still on link scale - `too.far` excludes points far from data *** ```{r twodee, echo=FALSE, fig.height=12} par(cex.axis=1.2, lwd=2, cex.lab=1.5, cex=2) plot(dsm_xyb_tw, select=1, scheme=2, asp=1) ``` Comparing bivariate and additive models ======================================== ```{r xy-x-y, fig.width=28, fig.height=15} dsm_xy_nb <- dsm(count~s(x,y), ddf.obj=df, segment.data=segs, observation.data=obs, family=nb()) dsm_x_y_nb <- dsm(count~s(x) +s(y), ddf.obj=df, segment.data=segs, observation.data=obs, family=nb()) par(cex.axis=1.2, cex.main=4, lwd=2, cex.lab=1.8, cex=2, mfrow=c(1,2)) vis.gam(dsm_xy_nb, plot.type = "contour", view=c("x","y"), zlim = c(-11,1), too.far=0.1, asp=1, main="Bivariate") vis.gam(dsm_x_y_nb, plot.type = "contour", view=c("x","y"), zlim = c(-11,1), too.far=0.1, asp=1, main="Additive") ``` Let's have a go... ============================== type:section