Multivariate smoothing, model selection ======================================== css: custom.css transition: none ```{r setup, include=FALSE} library(knitr) library(magrittr) library(viridis) library(ggplot2) library(reshape2) library(dsm) library(animation) opts_chunk$set(cache=TRUE, echo=FALSE, fig.height=10, fig.width=10) ``` ```{r initialmodeletc, echo=FALSE, message=FALSE, warning=FALSE} load("../spermwhaledata/R_import/spermwhale.RData") library(Distance) library(dsm) df_hr <- ds(dist, truncation=6000, key="hr") dsm_tw_xy_depth <- dsm(count ~ s(x, y) + s(Depth), ddf.obj=df_hr, observation.data=obs, segment.data=segs, family=tw()) ``` Recap ====== - How GAMs work - How to include detection info - Simple spatial-only models - How to check those models Univariate models are fun, but... ================================== type: section Ecology is not univariate ============================ - Many variables affect distribution - Want to model the **right** ones - Select between possible models - Smooth term selection - Response distribution - Large literature on model selection Models with multiple smooths =========================== type:section Adding smooths =============== - Already know that `+` is our friend - Add everything then remove smooth terms? ```{r add-all, echo=TRUE} dsm_all <- dsm(count~s(x, y) + s(Depth) + s(DistToCAS) + s(SST) + s(EKE) + s(NPP), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw()) ``` Now we have a huge model, what do we do? ========================================= type: section Smooth term selection ================ - Classically, two main approaches - Both have problems - Usually use $p$-values **Stepwise selection** - path dependence **All possible subsets** - computationally expensive (fishing?) p-values ================= - $p$-values can calculate - Test for zero effect of a smooth - They are **approximate** for GAMs (but useful) - Reported in `summary` p-values example ================ ```{r p-summary, echo=TRUE} summary(dsm_all) ``` Shrinkage or extra penalties ============================= - Use penalty to remove terms during fitting - Two methods - Basis `s(..., bs="ts")` - thin plate splines with shrinkage - nullspace should be shrunk less than the wiggly part - `dsm(..., select=TRUE)` - extra penalty - no assumption of how much to shrink the nullspace Shrinkage example ================= ```{r add-ts, echo=TRUE} dsm_ts_all <- dsm(count~s(x, y, bs="ts") + s(Depth, bs="ts") + s(DistToCAS, bs="ts") + s(SST, bs="ts") + s(EKE, bs="ts") + s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw()) ``` Shrinkage example ================= ```{r ts-summary, echo=TRUE} summary(dsm_ts_all) ``` Extra penalty example ================= ```{r add-select, echo=TRUE} dsm_sel <- dsm(count~s(x, y) + s(Depth) + s(DistToCAS) + s(SST) + s(EKE) + s(NPP), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), select=TRUE) ``` Extra penalty example ================= ```{r select-summary, echo=TRUE} summary(dsm_sel) ``` EDF comparison ============== ```{r edf-comp} aa <- cbind(summary(dsm_all)$edf, summary(dsm_sel)$edf, summary(dsm_ts_all)$edf) rownames(aa) <- rownames(summary(dsm_sel)$s.table) colnames(aa) <- c("allterms", "select", "ts") print(round(aa, 4), digits=4) ``` Double penalty can be slow =========================== - Lots of smoothing parameters to estimate ```{r numberofsmoopars, echo=TRUE} length(dsm_ts_all$sp) length(dsm_sel$sp) ``` Let's employ a mixture of these techniques =========================================== type: section How do we select smooth terms? =============================== 1. Look at EDF - Terms with EDF<1 may not be useful - These can usually be removed 2. Remove non-significant terms by $p$-value - Decide on a significance level and use that as a rule (In some sense leaving "shrunk" terms in is more "consistent", but can be computationally annoying) Example of selection ====================== type: section Selecting smooth terms ======================== ```{r add-term-summary} summary(dsm_ts_all) ``` Shrinkage in action ==================== ```{r smooth-shrinkage, fig.width=15} plot(dsm_ts_all, pages=1, scale=0) ``` Same model with no shrinkage ============================= ```{r smooth-no-shrinkage, fig.width=15} plot(dsm_all, pages=1, scale=0) ``` Let's remove some smooth terms & refit ======================================= ```{r add-term-rm-terms, echo=TRUE} dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") + s(Depth, bs="ts") + #s(DistToCAS, bs="ts") + #s(SST, bs="ts") + s(EKE, bs="ts"),#+ #s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw()) ``` What does that look like? =========================== ```{r add-term-summary-rm} summary(dsm_all_tw_rm) ``` Removing EKE... ================= ```{r add-term-rm-terms2} dsm_all_tw_rm <- dsm(count~s(x, y, bs="ts") + s(Depth, bs="ts"),# + #s(DistToCAS, bs="ts") + #s(SST, bs="ts") + #s(EKE, bs="ts"), #+ #s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") summary(dsm_all_tw_rm) ``` General strategy ================= For each response distribution and non-nested model structure: 1. Build a model with the smooths you want 2. Make sure that smooths are flexible enough (`k=...`) 3. Remove smooths that have been shrunk 4. Remove non-significant smooths Comparing models ================= type: section Comparing models ================= - Usually have >1 option - How can we pick? - Even if we have 1 model, is it any good? Nested vs. non-nested models ============================== - Compare `~s(x)+s(depth)` with `~s(x)` - nested models - What about `s(x) + s(y)` vs. `s(x, y)` - don't want to have all these in the model - not nested models Measures of "fit" =================== - Two listed in `summary` - Deviance explained - Adjusted $R^2$ - Deviance is a generalisation of $R^2$ - Highest likelihood value (*saturated* model) minus estimated model value - (These are usually not very high for DSMs) AIC === - Can get AIC from our model - Comparison of AIC fine (but not the end of the story) ```{r aic, echo=TRUE} AIC(dsm_all) AIC(dsm_ts_all) ``` A quick note about REML scores ================================ - Use REML to select the smoothness - Can also use the score to do model selection - **BUT** only compare models with the same fixed effects - (i.e., same "linear terms" in the model) - $\Rightarrow$ **All terms** must be penalised - `bs="ts"` or `select=TRUE` Selecting between response distributions ========================================== type: section Goodness of fit tests ====================== - Q-Q plots - Closer to the line == better ```{r gof-qq, fig.width=18} dsm_x_nb <- dsm(count~s(x, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=nb(), method="REML") par(mfrow=c(1,2), cex.title=3, cex.axis=1.5, cex.lab=1.5, lwd=2) qq.gam(dsm_all, asp=1, main="Tweedie", cex=5) qq.gam(dsm_x_nb, asp=1, main="Negative binomial", cex=5) ``` Tobler's first law of geography ================================== type:section "Everything is related to everything else, but near things are more related than distant things" Tobler (1970) Implications of Tobler's law ============================== ```{r pairrrrs, fig.width=10} plot(segs[,c("x","y","SST","EKE","NPP","Depth")], pch=19, cex=0.4) ``` blah ===== type:section title:none Covariates are not only correlated (linearly)...



...they are also "concurve" "How much can one smooth be approximated by one or more other smooths?" Concurvity (model/smooth) ========================= ```{r concurvity, echo=TRUE} concurvity(dsm_all) ``` Concurvity between smooths =========================== ```{r concurvity-all, echo=TRUE} concurvity(dsm_all, full=FALSE)$estimate ``` Visualising concurvity between terms ========================== ```{r concurvity-all-vis} vis.concurvity(dsm_ts_all) ``` *** - Previous matrix output visualised - High values (yellow) = BAD Path dependence ================== type:section Sensitivity ============ - General path dependency? - What if there are highly concurve smooths? - Is the model is sensitive to them? What can we do? ================= - Fit variations excluding smooths - Concurve terms that are excluded early on - Appendix of Winiarski et al (2014) has an example Sensitivity example ====================== - `s(Depth)` and `s(x, y)` are highly concurve (`r round(concurvity(dsm_all_tw_rm, full=FALSE)$estimate[2,3],4)`) - Refit removing `Depth` first ```{r nodepth} dsm_no_depth <- dsm(count~s(x, y, bs="ts") + #s(DistToCAS, bs="ts") + #s(SST, bs="ts") + s(EKE, bs="ts") + s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") ``` ```{r sensitivity-anova} cat("# with depth") summary(dsm_all_tw_rm)$s.table cat("# without depth") summary(dsm_no_depth)$s.table ``` Comparison of spatial effects =============================== ```{r sensitivity-vis, fig.width=18} par(mfrow=c(1,2)) vis.gam(dsm_all_tw_rm, view=c("x","y"), plot.type="contour", main="s(x,y) + s(Depth)", asp=1, too.far=0.1, zlim=c(-6, 2)) vis.gam(dsm_no_depth, view=c("x","y"), plot.type="contour", main="s(x,y)+s(EKE)+s(NPP)", asp=1, too.far=0.1, zlim=c(-6, 2)) ``` Sensitivity example ====================== - Refit removing `x` and `y`... ```{r noxy} dsm_no_xy <- dsm(count~ #s(DistToCAS, bs="ts") + s(SST, bs="ts") + s(Depth, bs="ts"),# + #s(EKE, bs="ts") + #s(NPP, bs="ts"), ddf.obj=df_hr, segment.data=segs, observation.data=obs, family=tw(), method="REML") ``` ```{r sensitivity-anova-noxy, fig.width=12} cat("# without xy") summary(dsm_no_xy)$s.table cat("# with xy") summary(dsm_all_tw_rm)$s.table ``` Comparison of depth smooths ============================ ```{r sensitivity-depth, fig.width=12} par(mfrow=c(1,2)) plot(dsm_all_tw_rm, select=2, ylim=c(-5,5)) plot(dsm_no_xy, select=2, ylim=c(-5,5)) ``` Comparing those three models... ================================ ```{r sensitivity-table, results="asis"} sens <- data.frame(Model = c("`s(x,y) + s(Depth)`", "`s(x,y)+s(EKE)+s(NPP)`", "`s(SST)+s(Depth)`"), AIC = c(AIC(dsm_all_tw_rm), AIC(dsm_no_depth), AIC(dsm_no_xy)), Deviance = c(summary(dsm_all_tw_rm)$dev.expl, summary(dsm_no_depth)$dev.expl, summary(dsm_no_xy)$dev.expl)) sens$Deviance <- sens$Deviance*100 sens[,2] <- round(sens[,2],4) sens[,3] <- round(sens[,3],2) kable(sens) ``` - "Full" model still explains most deviance - No depth model requires spatial smooth to "mop up" extra variation - We'll come back to this when we do prediction Recap ======== type: section Recap ======= - Adding smooths - Removing smooths - $p$-values - shrinkage/extra penalties - Comparing models - Comparing response distributions - Sensitivity