Multivariate smoothing, model selection
========================================
css: custom.css
transition: none

```{r setup, include=FALSE}
library(knitr)
library(magrittr)
library(viridis)
library(ggplot2)
library(reshape2)
library(dsm)
library(animation)
opts_chunk$set(cache=TRUE, echo=FALSE, fig.height=10, fig.width=10)
```

```{r initialmodeletc, echo=FALSE, message=FALSE, warning=FALSE}
load("../spermwhaledata/R_import/spermwhale.RData")
library(Distance)
library(dsm)
df_hr <- ds(dist, truncation=6000, key="hr")
dsm_tw_xy_depth <- dsm(count ~ s(x, y) + s(Depth), ddf.obj=df_hr, observation.data=obs, segment.data=segs, family=tw())
```


Recap
======

- How GAMs work
- How to include detection info
- Simple spatial-only models
- How to check those models


Univariate models are fun, but...
==================================
type: section


Ecology is not univariate
============================

- Many variables affect distribution
- Want to model the **right** ones
- Select between possible models
  - Smooth term selection
  - Response distribution
- Large literature on model selection

Models with multiple smooths
===========================
type:section

Adding smooths
===============

- Already know that `+` is our friend
- Add everything then remove smooth terms?

```{r add-all, echo=TRUE}
dsm_all <- dsm(count~s(x, y) +
                        s(Depth) +
                        s(DistToCAS) +
                        s(SST) +
                        s(EKE) +
                        s(NPP),
                  ddf.obj=df_hr,
                  segment.data=segs, observation.data=obs,
                  family=tw())
```

Now we have a huge model, what do we do?
=========================================
type: section

Smooth term selection
================

- Classically, two main approaches
- Both have problems
- Usually use $p$-values

**Stepwise selection** - path dependence

**All possible subsets** - computationally expensive (fishing?)


p-values
=================

- $p$-values can calculate
- Test for zero effect of a smooth
- They are **approximate** for GAMs (but useful)
- Reported in `summary`

p-values example
================

```{r p-summary, echo=TRUE}
summary(dsm_all)
```


Shrinkage or extra penalties
=============================

- Use penalty to remove terms during fitting
- Two methods
- Basis `s(..., bs="ts")` - thin plate splines with shrinkage
  - nullspace should be shrunk less than the wiggly part
- `dsm(..., select=TRUE)` - extra penalty
  - no assumption of how much to shrink the nullspace
  
  
Shrinkage example
=================

```{r add-ts, echo=TRUE}
dsm_ts_all <- dsm(count~s(x, y, bs="ts") +
                        s(Depth, bs="ts") +
                        s(DistToCAS, bs="ts") +
                        s(SST, bs="ts") +
                        s(EKE, bs="ts") +
                        s(NPP, bs="ts"),
                  ddf.obj=df_hr,
                  segment.data=segs, observation.data=obs,
                  family=tw())
```

Shrinkage example
=================

```{r ts-summary, echo=TRUE}
summary(dsm_ts_all)
```

Extra penalty example
=================

```{r add-select, echo=TRUE}
dsm_sel <- dsm(count~s(x, y) +
                     s(Depth) +
                     s(DistToCAS) +
                     s(SST) +
                     s(EKE) +
                     s(NPP),
                  ddf.obj=df_hr,
                  segment.data=segs, observation.data=obs,
                  family=tw(), select=TRUE)
```

Extra penalty example
=================

```{r select-summary, echo=TRUE}
summary(dsm_sel)
```

EDF comparison
==============

```{r edf-comp}
aa <- cbind(summary(dsm_all)$edf,
            summary(dsm_sel)$edf,
            summary(dsm_ts_all)$edf)
rownames(aa) <- rownames(summary(dsm_sel)$s.table)
colnames(aa) <- c("allterms", "select", "ts")
print(round(aa, 4), digits=4)
```

Double penalty can be slow
===========================

- Lots of smoothing parameters to estimate

```{r numberofsmoopars, echo=TRUE}
length(dsm_ts_all$sp)
length(dsm_sel$sp)
```


Let's employ a mixture of these techniques
===========================================
type: section

How do we select smooth terms?
===============================

1. Look at EDF
  - Terms with EDF<1 may not be useful
  - These can usually be removed
2. Remove non-significant terms by $p$-value
  - Decide on a significance level and use that as a rule

(In some sense leaving "shrunk" terms in is more "consistent", but can be computationally annoying)

Example of selection
======================
type: section

Selecting smooth terms
========================

```{r add-term-summary}
summary(dsm_ts_all)
```

  
Shrinkage in action
====================

```{r smooth-shrinkage, fig.width=15}
plot(dsm_ts_all, pages=1, scale=0)
```

Same model with no shrinkage
=============================

```{r smooth-no-shrinkage, fig.width=15}
plot(dsm_all, pages=1, scale=0)
```


Let's remove some smooth terms & refit
=======================================

```{r add-term-rm-terms, echo=TRUE}
dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") +
                        s(Depth, bs="ts") +
                        #s(DistToCAS, bs="ts") +
                        #s(SST, bs="ts") +
                        s(EKE, bs="ts"),#+
                        #s(NPP, bs="ts"),
                     ddf.obj=df_hr,
                     segment.data=segs,
                     observation.data=obs,
                     family=tw())
```

What does that look like?
===========================

```{r add-term-summary-rm}
summary(dsm_all_tw_rm)
```

Removing EKE...
=================

```{r add-term-rm-terms2}
dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") +
                        s(Depth, bs="ts"),# +
                        #s(DistToCAS, bs="ts") +
                        #s(SST, bs="ts") +
                        #s(EKE, bs="ts"), #+
                        #s(NPP, bs="ts"),
                      ddf.obj=df_hr,
                      segment.data=segs, observation.data=obs,
                      family=tw(), method="REML")
summary(dsm_all_tw_rm)
```

General strategy
=================

For each response distribution and non-nested model structure:

1. Build a model with the smooths you want
2. Make sure that smooths are flexible enough (`k=...`)
3. Remove smooths that have been shrunk
4. Remove non-significant smooths


Comparing models
=================
type: section

Comparing models
=================

- Usually have >1 option
- How can we pick?
- Even if we have 1 model, is it any good?

Nested vs. non-nested models
==============================

- Compare `~s(x)+s(depth)` with `~s(x)`
  - nested models
- What about `s(x) + s(y)` vs. `s(x, y)`
  - don't want to have all these in the model
  - not nested models

Measures of "fit"
===================

- Two listed in `summary`
  - Deviance explained
  - Adjusted $R^2$
- Deviance is a generalisation of $R^2$
- Highest likelihood value (*saturated* model) minus estimated model value
- (These are usually not very high for DSMs)

AIC
===

- Can get AIC from our model
- Comparison of AIC fine (but not the end of the story)

```{r aic, echo=TRUE}
AIC(dsm_all)
AIC(dsm_ts_all)
```

A quick note about REML scores
================================

- Use REML to select the smoothness
- Can also use the score to do model selection
- **BUT** only compare models with the same fixed effects
  - (i.e., same "linear terms" in the model)
- $\Rightarrow$ **All terms** must be penalised
  - `bs="ts"` or `select=TRUE`


Selecting between response distributions
==========================================
type: section

Goodness of fit tests
======================

- Q-Q plots
- Closer to the line == better
```{r gof-qq, fig.width=18}
dsm_x_nb <- dsm(count~s(x, bs="ts"),
                ddf.obj=df_hr,
                segment.data=segs, observation.data=obs,
                family=nb(), method="REML")
par(mfrow=c(1,2), cex.title=3, cex.axis=1.5, cex.lab=1.5, lwd=2)
qq.gam(dsm_all, asp=1, main="Tweedie", cex=5)
qq.gam(dsm_x_nb, asp=1, main="Negative binomial", cex=5)
```


Tobler's first law of geography
==================================
type:section

"Everything is related to everything else, but near things are more related than distant things"

Tobler (1970)

Implications of Tobler's law
==============================

```{r pairrrrs, fig.width=10}
plot(segs[,c("x","y","SST","EKE","NPP","Depth")], pch=19, cex=0.4)
```

blah
=====
type:section
title:none

Covariates are not only correlated (linearly)...
<br/>
<br/>
<br/>
<br/>
...they are also "concurve"

"How much can one smooth be approximated by one or more other smooths?"

Concurvity (model/smooth)
=========================

```{r concurvity, echo=TRUE}
concurvity(dsm_all)
```


Concurvity between smooths
===========================

```{r concurvity-all, echo=TRUE}
concurvity(dsm_all, full=FALSE)$estimate
```

Visualising concurvity between terms
==========================

```{r concurvity-all-vis}
vis.concurvity(dsm_ts_all)
```
***
- Previous matrix output visualised
- High values (yellow) = BAD

Path dependence
==================
type:section

Sensitivity
============

- General path dependency?
- What if there are highly concurve smooths?
- Is the model is sensitive to them?


What can we do?
=================

- Fit variations excluding smooths
  - Concurve terms that are excluded early on
- Appendix of Winiarski et al (2014) has an example


Sensitivity example
======================

- `s(Depth)` and `s(x, y)` are highly concurve (`r round(concurvity(dsm_all_tw_rm, full=FALSE)$estimate[2,3],4)`)
- Refit removing `Depth` first

```{r nodepth}
dsm_no_depth <- dsm(count~s(x, y, bs="ts") +
                        #s(DistToCAS, bs="ts") +
                        #s(SST, bs="ts") +
                        s(EKE, bs="ts") +
                        s(NPP, bs="ts"),
                      ddf.obj=df_hr,
                      segment.data=segs, observation.data=obs,
                      family=tw(), method="REML")
```
```{r sensitivity-anova}
cat("# with depth")
summary(dsm_all_tw_rm)$s.table
cat("# without depth")
summary(dsm_no_depth)$s.table
```

Comparison of spatial effects
===============================

```{r sensitivity-vis, fig.width=18}
par(mfrow=c(1,2))
vis.gam(dsm_all_tw_rm, view=c("x","y"), plot.type="contour", main="s(x,y) + s(Depth)", asp=1, too.far=0.1, zlim=c(-6, 2))
vis.gam(dsm_no_depth, view=c("x","y"), plot.type="contour", main="s(x,y)+s(EKE)+s(NPP)", asp=1, too.far=0.1, zlim=c(-6, 2))
```

Sensitivity example
======================

- Refit removing `x` and `y`...

```{r noxy}
dsm_no_xy <- dsm(count~
                        #s(DistToCAS, bs="ts") +
                        s(SST, bs="ts") +
                        s(Depth, bs="ts"),# +
                        #s(EKE, bs="ts") +
                        #s(NPP, bs="ts"),
                      ddf.obj=df_hr,
                      segment.data=segs, observation.data=obs,
                      family=tw(), method="REML")
```
```{r sensitivity-anova-noxy, fig.width=12}
cat("# without xy")
summary(dsm_no_xy)$s.table
cat("# with xy")
summary(dsm_all_tw_rm)$s.table
```

Comparison of depth smooths
============================

```{r sensitivity-depth, fig.width=12}
par(mfrow=c(1,2))
plot(dsm_all_tw_rm, select=2, ylim=c(-5,5))
plot(dsm_no_xy, select=2, ylim=c(-5,5))
```

Comparing those three models...
================================

```{r sensitivity-table, results="asis"}
sens <- data.frame(Model    = c("`s(x,y) + s(Depth)`",
                                "`s(x,y)+s(EKE)+s(NPP)`",
                                "`s(SST)+s(Depth)`"),
                   AIC      = c(AIC(dsm_all_tw_rm),
                                AIC(dsm_no_depth),
                                AIC(dsm_no_xy)),
                   Deviance = c(summary(dsm_all_tw_rm)$dev.expl,
                                summary(dsm_no_depth)$dev.expl,
                                summary(dsm_no_xy)$dev.expl))
sens$Deviance <- sens$Deviance*100
sens[,2] <- round(sens[,2],4)
sens[,3] <- round(sens[,3],2)

kable(sens)
```
- "Full" model still explains most deviance
- No depth model requires spatial smooth to "mop up" extra variation
- We'll come back to this when we do prediction


Recap
========
type: section


Recap
=======

- Adding smooths
- Removing smooths
  - $p$-values
  - shrinkage/extra penalties
- Comparing models
- Comparing response distributions
- Sensitivity