![](ISL_fig_7_3.pdf){width=600px height=600px}

### Linear splines

- A **linear spline** with knots at $\xi_k$, $k = 1,\ldots,K$, is a piecewise linear polynomial continuous at each knot.

- We can represent this model as
$$
y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+1} b_{K+1}(x_i) + \epsilon_i,
$$
where the $b_k$ are **basis functions**:
\begin{eqnarray*}
b_1(x_i) &=& x_i \\
b_{k+1}(x_i) &=& (x_i - \xi_k)_+, \quad k=1,\ldots,K.
\end{eqnarray*}
Here $(\cdot)_+$ denotes the positive part:
$$
(x_i - \xi_k)_+ = \begin{cases}
x_i - \xi_k & \text{if } x_i > \xi_k \\
0 & \text{otherwise}
\end{cases}.
$$

### Cubic splines

- A **cubic spline** with knots at $\xi_k$, $k = 1,\ldots,K$, is a piecewise cubic polynomial with continuous derivatives up to order 2 at each knot.

- Again we can represent this model with **truncated power basis functions**
$$
y_i = \beta_0 + \beta_1 b_1(x_i) + \beta_2 b_2(x_i) + \cdots + \beta_{K+3} b_{K+3}(x_i) + \epsilon_i,
$$
with
\begin{eqnarray*}
b_1(x_i) &=& x_i \\
b_2(x_i) &=& x_i^2 \\
b_3(x_i) &=& x_i^3 \\
b_{k+3}(x_i) &=& (x_i - \xi_k)_+^3, \quad k = 1,\ldots,K,
\end{eqnarray*}
where
$$
(x_i - \xi_k)_+^3 = \begin{cases}
(x_i - \xi_k)^3 & \text{if } x_i > \xi_k \\
0 & \text{otherwise}
\end{cases}.
$$

- A cubic spline with $K$ knots costs $K+4$ parameters or degrees of freedom: $4(K+1)$ polynomial coefficients minus $3K$ constraints (continuity of the function and its first two derivatives at each knot).

- While the truncated power basis is conceptually simple, it is not attractive numerically: powers of large numbers can lead to severe rounding problems. In practice, **B-spline basis functions** are preferred for their numerical stability. See ESL Chapter 5 Appendix. A minimal `numpy` construction of the truncated power basis is sketched below, after the notes on natural cubic splines.

```{python}
#| code-fold: true
from sklearn.preprocessing import SplineTransformer

# Cubic B-spline basis for age
X_age = np.array(X['age']).reshape(-1, 1)
x_plot = np.linspace(start = 15, stop = 85, num = 70)
X_plot = x_plot[:, np.newaxis]

# Cubic B-splines with 5 knots, polynomial extrapolation beyond the boundary
bs_plot = SplineTransformer(
    degree = 3,
    # knots = np.array([25, 40, 60]).reshape(3, 1),
    n_knots = 5,
    extrapolation = 'continue',
    # include_bias = False
).fit(X_age).transform(X_plot)

# Same basis, but with linear extrapolation beyond the boundary knots
ns_plot = SplineTransformer(
    degree = 3,
    # knots = np.array([25, 40, 60]).reshape(3, 1),
    n_knots = 5,
    extrapolation = 'linear',
    # include_bias = False
).fit(X_age).transform(X_plot)

# Plot
fig, axes = plt.subplots(ncols = 2, figsize = (20, 6))
axes[0].plot(x_plot, bs_plot)
axes[0].set_title("B-splines")
axes[1].plot(x_plot, ns_plot)
axes[1].set_title("B-splines with linearity at boundary")
plt.show()
```

### Natural cubic splines

- Splines can have high variance at the outer range of the predictors.

- A **natural cubic spline** extrapolates linearly beyond the boundary knots. This adds $4 = 2 \times 2$ extra constraints and allows us to put more internal knots for the same degrees of freedom as a regular cubic spline.

- A natural cubic spline with $K$ knots has $K$ degrees of freedom.
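To make the truncated power basis concrete, here is a minimal `numpy` sketch, assuming the `Wage` data frame loaded in earlier chunks; the knots at 25, 40, 60 are arbitrary illustrative choices. The condition number of the design matrix shows why this basis is numerically unattractive.

```{python}
#| code-fold: true
import numpy as np

# Truncated power basis for a cubic spline in age, K = 3 knots
age = Wage['age'].to_numpy(dtype = float)
knots = np.array([25.0, 40.0, 60.0])

# Columns: x, x^2, x^3, then (x - xi_k)_+^3 for k = 1, ..., K
tpb = np.column_stack(
    [age, age**2, age**3] +
    [np.clip(age - xi, 0.0, None)**3 for xi in knots]
)

# Intercept plus K + 3 basis functions: K + 4 = 7 parameters
X_tpb = np.column_stack([np.ones_like(age), tpb])
beta, *_ = np.linalg.lstsq(X_tpb, Wage['wage'].to_numpy(), rcond = None)

# Very large condition number: one reason B-splines are preferred in practice
np.linalg.cond(X_tpb)
```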
::: {.panel-tabset}

#### R

```{r}
#| code-fold: true
library(splines)

# Plot wage ~ age
Wage %>%
  ggplot(mapping = aes(x = age, y = wage)) +
  geom_point(alpha = 0.25) +
  # Cubic spline
  geom_smooth(
    method = "lm",
    formula = y ~ bs(x, knots = c(25, 40, 60)),
    color = "blue"
  ) +
  # Natural cubic spline
  geom_smooth(
    method = "lm",
    formula = y ~ ns(x, knots = c(25, 40, 60)),
    color = "red"
  ) +
  labs(
    title = "Natural cubic spline fit (red) vs cubic spline fit (blue)",
    x = "Age",
    y = "Wage (k$)"
  )
```

:::

### Knot placement

- One strategy is to decide $K$, the number of knots, and then place them at appropriate quantiles of the observed $X$.

- In practice, users often specify the degrees of freedom and let software choose the number of knots and their locations.

## Smoothing splines

- Consider this criterion for fitting a smooth function $g(x)$ to some data:
$$
\text{minimize} \quad \sum_{i=1}^n (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt.
$$

- The first term is the RSS and tries to make $g(x)$ match the data at each $x_i$.

- The second term is a **roughness penalty** and controls how wiggly $g(x)$ is. It is modulated by the tuning parameter $\lambda \ge 0$.
    * The smaller $\lambda$, the more wiggly the function, eventually interpolating the $y_i$ when $\lambda = 0$.
    * As $\lambda \to \infty$, the function $g(x)$ becomes linear.

- The solution is a (shrunken) natural cubic spline with a knot at every unique value of $x_i$. The penalty still controls the roughness via $\lambda$.

- Smoothing splines avoid the knot-selection issue, leaving a single $\lambda$ to be chosen.

- The vector of $n$ fitted values can be written as $\hat{g}_\lambda = S_\lambda y$, where $S_{\lambda}$ is an $n \times n$ matrix (determined by the $x_i$ and $\lambda$).

- The **effective degrees of freedom** are given by
$$
\text{df}_{\lambda} = \sum_{i=1}^n S_{\lambda,ii}.
$$
Thus we can specify `df` rather than $\lambda$.

- The leave-one-out (LOO) cross-validated error is given by
$$
\text{RSS}_{\text{CV}}(\lambda) = \sum_{i=1}^n \left[ \frac{y_i - \hat{g}_\lambda(x_i)}{1 - S_{\lambda,ii}} \right]^2.
$$

::: {.panel-tabset}

#### R

The `ggformula` package supplies the `geom_spline` function for displaying smoothing spline fits.

```{r}
#| code-fold: true
library(ggformula)
library(splines)

# Plot wage ~ age
Wage %>%
  ggplot(mapping = aes(x = age, y = wage)) +
  geom_point(alpha = 0.25) +
  # Smoothing spline with df = 16
  geom_spline(
    df = 16,
    color = "red"
  ) +
  # Smoothing spline with LOOCV-tuned df
  geom_spline(
    # df = 6.8,
    cv = TRUE,
    color = "blue"
  ) +
  labs(
    title = "Smoothing spline with df=16 (red) vs LOOCV tuned df=6.8 (blue)",
    x = "Age",
    y = "Wage (k$)"
  )
```

:::
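A minimal sketch of the role of $\lambda$ on synthetic data, assuming SciPy $\geq$ 1.10, where `scipy.interpolate.make_smoothing_spline` is available (it requires strictly increasing $x$ values); passing `lam = None` asks SciPy to choose $\lambda$ by generalized cross-validation (GCV).

```{python}
#| code-fold: true
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import make_smoothing_spline

# Synthetic data with strictly increasing x
rng = np.random.default_rng(425)
x = np.linspace(0, 10, 100)
y = np.sin(x) + rng.normal(scale = 0.5, size = x.size)

# Small lambda -> wiggly fit, large lambda -> heavily smoothed fit,
# lam = None -> lambda chosen by GCV
fig, ax = plt.subplots()
ax.scatter(x, y, alpha = 0.25)
for lam, label in [(1e-4, 'small lambda'), (1e3, 'large lambda'), (None, 'GCV lambda')]:
    spl = make_smoothing_spline(x, y, lam = lam)
    ax.plot(x, spl(x), label = label)
ax.legend()
plt.show()
```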
## Local regression

![](ISL_fig_7_9.pdf){width=600px height=600px}

- With a sliding weight function, we fit separate linear fits over the range of $X$ by weighted least squares.

- At $X = x_0$,
$$
\text{minimize} \quad \sum_{i=1}^n K(x_i, x_0) (y_i - \beta_0 - \beta_1 x_i)^2,
$$
where $K$ is a weighting function that assigns heavier weight to $x_i$ close to $x_0$ and zero weight to the points furthest from $x_0$.

- **Locally weighted linear regression**: the `loess` function in R and `lowess` in Python's `statsmodels` (a short sketch follows).

- Anecdotally, loess gives a better-looking fit, but is $O(N^2)$ in memory, so it does not work for larger data sets.
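A minimal sketch using `statsmodels`' `lowess` (locally weighted linear regression) on the `Wage` data; the span `frac = 0.5` is an arbitrary choice.

```{python}
#| code-fold: true
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

# Local linear fit of wage on age; frac is the span (fraction of the data
# used in each local fit)
fit = lowess(Wage['wage'], Wage['age'], frac = 0.5)

# lowess returns the fitted values sorted by age in columns [age, fitted]
plt.scatter(Wage['age'], Wage['wage'], alpha = 0.25)
plt.plot(fit[:, 0], fit[:, 1], color = 'red')
plt.xlabel('Age')
plt.ylabel('Wage (k$)')
plt.show()
```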
## Generalized additive model (GAM)

- Generalized additive models (GAMs) allow for flexible nonlinearities in several variables, but retain the additive structure of linear models:
$$
y_i = \beta_0 + f_1(x_{i1}) + f_2(x_{i2}) + \cdots + f_p (x_{ip}) + \epsilon_i.
$$

- We can fit a GAM simply using, e.g., natural splines.

- The coefficients are not that interesting; the fitted functions are.

- We can mix terms: some linear, some nonlinear, and use ANOVA to compare models (a sketch of such a comparison follows the tabs below).

- We can use smoothing splines or local regression as well. In R: `gam(wage ~ s(year, df = 5) + lo(age, span = 0.5) + education)`.

- GAMs are additive, although low-order interactions can be included in a natural way using, e.g., bivariate smoothers or interactions of the form (in R) `ns(age, df = 5):ns(year, df = 5)`.

::: {.panel-tabset}

#### Python (sklearn)

```{python}
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, SplineTransformer

# Spline features for the year and age predictors, one-hot encoding for education
ns_tf = make_column_transformer(
    (SplineTransformer(
        n_knots = 4,
        # knots = 'quantile',
        degree = 3,
        extrapolation = 'linear',  # linear extrapolation beyond boundary knots
        # include_bias = False
    ), ['year']),
    (SplineTransformer(
        n_knots = 5,
        # knots = 'quantile',
        degree = 3,
        extrapolation = 'linear',  # linear extrapolation beyond boundary knots
        # include_bias = False
    ), ['age']),
    (OneHotEncoder(drop = 'first'), ['education']),
    remainder = 'drop'
)

# Define pipeline
pipe = Pipeline(steps = [
    ("ns_tf", ns_tf),
    ("model", LinearRegression())
])

# Fit pipeline to the Wage data
X = Wage.drop('wage', axis = 1)
y = Wage.wage
pipe.fit(X, y)

# R^2
pipe.score(X, y)
```

#### Python (statsmodels)

```{python}
from statsmodels.gam.api import GLMGam, BSplines

# Create spline bases for year and age
x_spline = Wage[['year', 'age']]
bs = BSplines(x_spline, df = [4, 5], degree = [3, 3])

# Fit GAM: education enters linearly, year and age via smooth terms
gam_mod = GLMGam.from_formula('wage ~ education', data = Wage, smoother = bs).fit()
gam_mod.summary()
```

```{python}
# Plot smooth components
for i in [0, 1]:
    plt.figure()
    gam_mod.plot_partial(i, cpr = True)
    plt.show()
```

#### R

Natural splines for `year` and `age`.

```{r}
gam_mod <- lm(
  wage ~ ns(year, df = 4) + ns(age, df = 5) + education,
  data = Wage
)
summary(gam_mod)
```

Smoothing splines for `year` and `age`.

```{r}
library(gam)

gam_mod <- gam(
  wage ~ s(year, 4) + s(age, 5) + education,
  data = Wage
)
summary(gam_mod)
plot(gam_mod, se = TRUE, col = "red")
```

:::
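The ANOVA comparison mentioned above can be sketched in Python as well. This is one possible analogue, not the book's method: the GAMs are built from regression-spline terms (`patsy`'s `bs()`, standing in for R's `ns()`) so the nested models are ordinary linear models comparable with `anova_lm`; the df choices mirror the R fits above.

```{python}
#| code-fold: true
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Three nested GAMs: m1 excludes year, m2 adds a linear year term,
# m3 replaces it with a nonlinear (spline) year term
m1 = smf.ols('wage ~ bs(age, df = 5) + education', data = Wage).fit()
m2 = smf.ols('wage ~ year + bs(age, df = 5) + education', data = Wage).fit()
m3 = smf.ols('wage ~ bs(year, df = 4) + bs(age, df = 5) + education', data = Wage).fit()

# Sequential F-tests: is each additional year term warranted?
anova_lm(m1, m2, m3)
```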