Multivariate smoothing, model selection
========================================
css: custom.css
transition: none
```{r setup, include=FALSE}
library(knitr)
library(magrittr)
library(viridis)
library(ggplot2)
library(reshape2)
library(dsm)
library(animation)
opts_chunk$set(cache=TRUE, echo=FALSE, fig.height=10, fig.width=10)
```
```{r initialmodeletc, echo=FALSE, message=FALSE, warning=FALSE}
load("../spermwhaledata/R_import/spermwhale.RData")
library(Distance)
library(dsm)
df_hr <- ds(dist, truncation=6000, key="hr")
dsm_tw_xy_depth <- dsm(count ~ s(x, y) + s(Depth), ddf.obj=df_hr, observation.data=obs, segment.data=segs, family=tw())
```
Recap
======
- How GAMs work
- How to include detection info
- Simple spatial-only models
- How to check those models
Univariate models are fun, but...
==================================
type: section
Ecology is not univariate
============================
- Many variables affect distribution
- Want to model the **right** ones
- Select between possible models
  - Smooth term selection
  - Response distribution
- Large literature on model selection
Models with multiple smooths
===========================
type:section
Adding smooths
===============
- Already know that `+` is our friend
- Add everything then remove smooth terms?
```{r add-all, echo=TRUE}
dsm_all <- dsm(count ~ s(x, y) +
                       s(Depth) +
                       s(DistToCAS) +
                       s(SST) +
                       s(EKE) +
                       s(NPP),
               ddf.obj=df_hr,
               segment.data=segs, observation.data=obs,
               family=tw())
```
Now we have a huge model, what do we do?
=========================================
type: section
Smooth term selection
================
- Classically, two main approaches
- Both have problems
- Both usually rely on $p$-values

**Stepwise selection** - suffers from path dependence

**All possible subsets** - computationally expensive ("fishing"?)
p-values
=================
- $p$-values can be calculated
  - they test for zero effect of a smooth
- They are **approximate** for GAMs (but still useful)
- Reported in `summary`
p-values example
================
```{r p-summary, echo=TRUE}
summary(dsm_all)
```
Shrinkage or extra penalties
=============================
- Use a penalty to remove terms during fitting
- Two methods
  - Basis `s(..., bs="ts")` - thin plate regression splines with shrinkage
    - the nullspace is shrunk less than the wiggly part
  - `dsm(..., select=TRUE)` - extra penalty
    - no assumption about how much to shrink the nullspace
Shrinkage example
=================
```{r add-ts, echo=TRUE}
dsm_ts_all <- dsm(count ~ s(x, y, bs="ts") +
                          s(Depth, bs="ts") +
                          s(DistToCAS, bs="ts") +
                          s(SST, bs="ts") +
                          s(EKE, bs="ts") +
                          s(NPP, bs="ts"),
                  ddf.obj=df_hr,
                  segment.data=segs, observation.data=obs,
                  family=tw())
```
Shrinkage example
=================
```{r ts-summary, echo=TRUE}
summary(dsm_ts_all)
```
Extra penalty example
=================
```{r add-select, echo=TRUE}
dsm_sel <- dsm(count ~ s(x, y) +
                       s(Depth) +
                       s(DistToCAS) +
                       s(SST) +
                       s(EKE) +
                       s(NPP),
               ddf.obj=df_hr,
               segment.data=segs, observation.data=obs,
               family=tw(), select=TRUE)
```
Extra penalty example
=================
```{r select-summary, echo=TRUE}
summary(dsm_sel)
```
EDF comparison
==============
```{r edf-comp}
aa <- cbind(summary(dsm_all)$edf,
            summary(dsm_sel)$edf,
            summary(dsm_ts_all)$edf)
rownames(aa) <- rownames(summary(dsm_sel)$s.table)
colnames(aa) <- c("allterms", "select", "ts")
print(round(aa, 4), digits=4)
```
Double penalty can be slow
===========================
- Lots of smoothing parameters to estimate
```{r numberofsmoopars, echo=TRUE}
length(dsm_ts_all$sp)
length(dsm_sel$sp)
```
Let's employ a mixture of these techniques
===========================================
type: section
How do we select smooth terms?
===============================
1. Look at the EDFs
   - Terms with EDF < 1 may not be useful
   - These can usually be removed
2. Remove non-significant terms by $p$-value
   - Decide on a significance level and use it as a rule

(In some sense leaving "shrunk" terms in is more "consistent", but it can be computationally annoying)
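As a sketch of applying both rules programmatically (assuming the `dsm_ts_all` model fitted earlier in these slides), the per-smooth EDFs and $p$-values live in the summary's `s.table`:

```{r edf-pvalue-rule, echo=TRUE}
# per-smooth EDFs and (approximate) p-values from the fitted model
st <- summary(dsm_ts_all)$s.table

# candidates for removal under each rule
rownames(st)[st[, "edf"] < 1]        # rule 1: shrunk terms (EDF < 1)
rownames(st)[st[, "p-value"] > 0.05] # rule 2: non-significant at chosen level
```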
Example of selection
======================
type: section
Selecting smooth terms
========================
```{r add-term-summary}
summary(dsm_ts_all)
```
Shrinkage in action
====================
```{r smooth-shrinkage, fig.width=15}
plot(dsm_ts_all, pages=1, scale=0)
```
Same model with no shrinkage
=============================
```{r smooth-no-shrinkage, fig.width=15}
plot(dsm_all, pages=1, scale=0)
```
Let's remove some smooth terms & refit
=======================================
```{r add-term-rm-terms, echo=TRUE}
dsm_all_tw_rm <- dsm(count ~ s(x, y, bs="ts") +
                             s(Depth, bs="ts") +
                             #s(DistToCAS, bs="ts") +
                             #s(SST, bs="ts") +
                             s(EKE, bs="ts"),#+
                             #s(NPP, bs="ts"),
                     ddf.obj=df_hr,
                     segment.data=segs,
                     observation.data=obs,
                     family=tw())
```
What does that look like?
===========================
```{r add-term-summary-rm}
summary(dsm_all_tw_rm)
```
Removing EKE...
=================
```{r add-term-rm-terms2}
dsm_all_tw_rm <- dsm(count ~ s(x, y, bs="ts") +
                             s(Depth, bs="ts"),# +
                             #s(DistToCAS, bs="ts") +
                             #s(SST, bs="ts") +
                             #s(EKE, bs="ts"), #+
                             #s(NPP, bs="ts"),
                     ddf.obj=df_hr,
                     segment.data=segs, observation.data=obs,
                     family=tw(), method="REML")
summary(dsm_all_tw_rm)
```
General strategy
=================
For each response distribution and non-nested model structure:
1. Build a model with the smooths you want
2. Make sure that smooths are flexible enough (`k=...`)
3. Remove smooths that have been shrunk
4. Remove non-significant smooths
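For step 2, `gam.check()` from mgcv (which applies to `dsm` objects, since they inherit from `gam`) reports a basis-size check alongside its residual plots; a sketch using the model fitted above:

```{r check-basis-size, echo=TRUE}
# k' vs. EDF check: an EDF close to k' together with a low k-index
# suggests increasing k for that smooth
gam.check(dsm_ts_all)
```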
Comparing models
=================
type: section
Comparing models
=================
- Usually have >1 option
- How can we pick?
- Even if we have 1 model, is it any good?
Nested vs. non-nested models
==============================
- Compare `~s(x) + s(Depth)` with `~s(x)`
  - these models are nested
- What about `s(x) + s(y)` vs. `s(x, y)`?
  - we don't want both in the same model
  - these models are not nested
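For the nested case, mgcv's `anova` method gives an approximate comparison. A sketch, reusing the detection function, segment, and observation data from earlier (the test is approximate for the same reasons the $p$-values are):

```{r nested-anova-sketch, echo=TRUE}
# fit the smaller model, then compare with the x/y + Depth model from setup
dsm_xy <- dsm(count ~ s(x, y),
              ddf.obj=df_hr,
              segment.data=segs, observation.data=obs,
              family=tw())
anova(dsm_xy, dsm_tw_xy_depth, test="F")
```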
Measures of "fit"
===================
- Two listed in `summary`
  - Deviance explained
  - Adjusted $R^2$
- Deviance is a generalisation of $R^2$
  - log-likelihood of the *saturated* model (the highest achievable) minus that of the fitted model
- (These are usually not very high for DSMs)
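Both measures can be pulled straight out of the `summary` object (a sketch using the full model fitted earlier):

```{r fit-measures, echo=TRUE}
summ <- summary(dsm_all)
summ$dev.expl # proportion of deviance explained
summ$r.sq     # adjusted R-squared
```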
AIC
===
- Can get AIC from our model
- Comparing AICs is fine (but not the end of the story)
```{r aic, echo=TRUE}
AIC(dsm_all)
AIC(dsm_ts_all)
```
A quick note about REML scores
================================
- We use REML to select smoothness
- The REML score can also be used for model selection
- **BUT** only compare models with the same fixed effects
  - (i.e., the same "linear terms" in the model)
- $\Rightarrow$ **all terms** must be penalised
  - i.e. `bs="ts"` or `select=TRUE`
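The score itself is stored in the `gcv.ubre` element of the fitted model (an mgcv convention: it holds the REML score when the model is fitted with `method="REML"`). A sketch, assuming the shrinkage and extra-penalty models fitted earlier:

```{r reml-scores, echo=TRUE}
# lower is better; comparable here only because all terms are penalised
dsm_ts_all$gcv.ubre
dsm_sel$gcv.ubre
```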
Selecting between response distributions
==========================================
type: section
Goodness of fit tests
======================
- Q-Q plots
- Closer to the line == better
```{r gof-qq, fig.width=18}
dsm_x_nb <- dsm(count ~ s(x, bs="ts"),
                ddf.obj=df_hr,
                segment.data=segs, observation.data=obs,
                family=nb(), method="REML")
par(mfrow=c(1, 2), cex.main=3, cex.axis=1.5, cex.lab=1.5, lwd=2)
qq.gam(dsm_all, asp=1, main="Tweedie", cex=5)
qq.gam(dsm_x_nb, asp=1, main="Negative binomial", cex=5)
```
Tobler's first law of geography
==================================
type:section
> "Everything is related to everything else, but near things are more related than distant things"
>
> -- Tobler (1970)
Implications of Tobler's law
==============================
```{r pairrrrs, fig.width=10}
plot(segs[,c("x","y","SST","EKE","NPP","Depth")], pch=19, cex=0.4)
```
blah
=====
type:section
title: false
Covariates are not only correlated (linearly)...

...they are also "concurve"

"How much can one smooth be approximated by one or more other smooths?"
Concurvity (model/smooth)
=========================
```{r concurvity, echo=TRUE}
concurvity(dsm_all)
```
Concurvity between smooths
===========================
```{r concurvity-all, echo=TRUE}
concurvity(dsm_all, full=FALSE)$estimate
```
Visualising concurvity between terms
==========================
```{r concurvity-all-vis}
vis.concurvity(dsm_ts_all)
```
***
- Previous matrix output visualised
- High values (yellow) = BAD
Path dependence
==================
type:section
Sensitivity
============
- General path dependency?
- What if there are highly concurve smooths?
- Is the model sensitive to them?
What can we do?
=================
- Fit model variations excluding each smooth
  - especially concurve terms that were excluded early on
- The appendix of Winiarski et al. (2014) has an example
Sensitivity example
======================
- `s(Depth)` and `s(x, y)` are highly concurve (`r round(concurvity(dsm_all_tw_rm, full=FALSE)$estimate[2,3],4)`)
- Refit removing `Depth` first
```{r nodepth}
dsm_no_depth <- dsm(count ~ s(x, y, bs="ts") +
                            #s(DistToCAS, bs="ts") +
                            #s(SST, bs="ts") +
                            s(EKE, bs="ts") +
                            s(NPP, bs="ts"),
                    ddf.obj=df_hr,
                    segment.data=segs, observation.data=obs,
                    family=tw(), method="REML")
```
```{r sensitivity-anova}
cat("# with depth")
summary(dsm_all_tw_rm)$s.table
cat("# without depth")
summary(dsm_no_depth)$s.table
```
Comparison of spatial effects
===============================
```{r sensitivity-vis, fig.width=18}
par(mfrow=c(1,2))
vis.gam(dsm_all_tw_rm, view=c("x","y"), plot.type="contour", main="s(x,y) + s(Depth)", asp=1, too.far=0.1, zlim=c(-6, 2))
vis.gam(dsm_no_depth, view=c("x","y"), plot.type="contour", main="s(x,y)+s(EKE)+s(NPP)", asp=1, too.far=0.1, zlim=c(-6, 2))
```
Sensitivity example
======================
- Refit removing `x` and `y`...
```{r noxy}
dsm_no_xy <- dsm(count ~
                   #s(DistToCAS, bs="ts") +
                   s(SST, bs="ts") +
                   s(Depth, bs="ts"),# +
                   #s(EKE, bs="ts") +
                   #s(NPP, bs="ts"),
                 ddf.obj=df_hr,
                 segment.data=segs, observation.data=obs,
                 family=tw(), method="REML")
```
```{r sensitivity-anova-noxy, fig.width=12}
cat("# without xy")
summary(dsm_no_xy)$s.table
cat("# with xy")
summary(dsm_all_tw_rm)$s.table
```
Comparison of depth smooths
============================
```{r sensitivity-depth, fig.width=12}
par(mfrow=c(1,2))
plot(dsm_all_tw_rm, select=2, ylim=c(-5,5))
plot(dsm_no_xy, select=2, ylim=c(-5,5))
```
Comparing those three models...
================================
```{r sensitivity-table, results="asis"}
sens <- data.frame(Model = c("`s(x,y) + s(Depth)`",
                             "`s(x,y)+s(EKE)+s(NPP)`",
                             "`s(SST)+s(Depth)`"),
                   AIC = c(AIC(dsm_all_tw_rm),
                           AIC(dsm_no_depth),
                           AIC(dsm_no_xy)),
                   Deviance = c(summary(dsm_all_tw_rm)$dev.expl,
                                summary(dsm_no_depth)$dev.expl,
                                summary(dsm_no_xy)$dev.expl))
sens$Deviance <- sens$Deviance*100
sens[,2] <- round(sens[,2],4)
sens[,3] <- round(sens[,3],2)
kable(sens)
```
- "Full" model still explains most deviance
- No depth model requires spatial smooth to "mop up" extra variation
- We'll come back to this when we do prediction
Recap
========
type: section
Recap
=======
- Adding smooths
- Removing smooths
- $p$-values
- shrinkage/extra penalties
- Comparing models
- Comparing response distributions
- Sensitivity