--- title: "Chapter 10 Regularization Methods" date: "`r Sys.Date()`" output: html_document: toc: true toc_depth: 3 toc_float: collapsed: true smooth_scroll: true --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE) ``` This notebook illustrates how to implement regularization methods using R. # Load R Packages ```{r, message = FALSE, warning=FALSE, results='hide'} # install packages from CRAN p_needed <- c('caret', 'elasticnet', 'glmnet', 'devtools', 'MASS', 'grplasso') packages <- rownames(installed.packages()) p_to_install <- p_needed[!(p_needed %in% packages)] if (length(p_to_install) > 0) { install.packages(p_to_install) } lapply(p_needed, require, character.only = TRUE) # install packages from GitHub p_needed_gh <- c('NetlifyDS') if (! p_needed_gh %in% packages) { devtools::install_github("netlify/NetlifyDS") } library(NetlifyDS) ``` # Ridge Regression ```{r} dat <- read.csv("http://bit.ly/2P5gTw4") # data cleaning: delete wrong observations # expense can't be negative dat <- subset(dat, store_exp > 0 & online_exp > 0) # get predictors trainx <- dat[ , grep("Q", names(dat))] # get response trainy <- dat$store_exp + dat$online_exp ``` In Ridge regression, there is a tuning parameter $\lambda$ that needs to set for the right level of regularization. The value of lambda can be determined by looking at the performance in the cross-validation data. ```{r} # set cross validation ctrl <- trainControl(method = "cv", number = 10) # set the parameter range ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 20)) set.seed(100) ridgeRegTune <- train(trainx, trainy, method = "ridge", tuneGrid = ridgeGrid, trControl = ctrl, ## center and scale predictors preProc = c("center", "scale")) ridgeRegTune ``` ```{r} plot(ridgeRegTune) ``` Once we have determined the value for tuning parameter lambda, we can use different ways to fit the ridge regression model, for example, using `enet()` function. ```{r} ridgefit = enet(x = as.matrix(trainx), y = trainy, lambda = 0.01, # center and scale predictors normalize = TRUE) ``` Once the model is fit, we can use the `predict()` function to predict new dataset. ```{r} ridgePred <- predict(ridgefit, newx = as.matrix(trainx), s = 1, mode = "fraction", type = "fit") ``` ```{r} names(ridgePred) ``` ```{r} head(ridgePred$fit) ``` ```{r} ridgeCoef<-predict(ridgefit,newx = as.matrix(trainx), s=1, mode="fraction", type="coefficients") ridgeCoef$coefficients ``` # LASSO LASSO method will shrink certain parameters to be zero which can be used as a variable selection tool. In LASSO, there is also a tuning parameter called fraction which can be determined by performance of the cross validation data set. ```{r} ctrl <- trainControl(method = "cv", number = 10) lassoGrid <- data.frame(fraction = seq(.8, 1, length = 20)) set.seed(100) lassoTune <- train(trainx, trainy, ## set the method to be lasso method = "lars", tuneGrid = lassoGrid, trControl = ctrl, ## standardize the predictors preProc = c("center", "scale")) lassoTune ``` ```{r} plot(lassoTune) ``` Once the tuning parameter fraction is set, we can use different functions to fit LASSO model, such as `lars()` in `lars`, `enet()` in `elasticnet`, and `glmnet()` in `glmnet`. 
```{r}
lassoModel <- enet(x = as.matrix(trainx), y = trainy,
                   lambda = 0, normalize = TRUE)
```

```{r}
lassoFit <- predict(lassoModel, newx = as.matrix(trainx),
                    s = 0.957, mode = "fraction", type = "fit")
```

```{r}
head(lassoFit$fit)
```

```{r}
lassoCoef <- predict(lassoModel, newx = as.matrix(trainx),
                     s = 0.95, mode = "fraction", type = "coefficients")
lassoCoef$coefficients
```

# Elastic Net

Elastic net is a generalization of LASSO and ridge regression: it combines the two penalties. For elastic net, there are two tuning parameters, lambda and fraction.

```{r}
enetGrid <- expand.grid(.lambda = seq(0, 0.2, length = 20),
                        .fraction = seq(.8, 1, length = 20))
set.seed(100)
enetTune <- train(trainx, trainy,
                  method = "enet",
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enetTune
```

# Penalized Generalized Linear Model

Adding penalties is a general technique that can be applied to many methods other than linear regression. In this section, we introduce the penalized generalized linear model using `glmnet()`.

## Introduction to `glmnet` package

The default family option in the function `glmnet()` is `gaussian`, which corresponds to the linear regression we have discussed so far in this chapter. But the parameterization is a little different in the generalized linear model framework (we have $\alpha$ and $\lambda$). Let's start from our previous example, using the same training data but fitting the model with `glmnet()`:

```{r}
dat <- read.csv("http://bit.ly/2P5gTw4")
# data cleaning: delete wrong observations with expense < 0
dat <- subset(dat, store_exp > 0 & online_exp > 0)
# get predictors
trainx <- dat[, grep("Q", names(dat))]
# get response
trainy <- dat$store_exp + dat$online_exp
glmfit = glmnet(as.matrix(trainx), trainy)
```

We can extract useful information from the fitted model using the `plot()` and `print()` functions.

```{r}
plot(glmfit, label = T)
```

```{r}
print(glmfit)
```

We can also get coefficient estimates for a specific value of $\lambda$ in the LASSO penalty by setting `s` in the `coef()` function. For example, the following code sets $\lambda$ to 1200, which leaves three coefficients with non-zero estimates (`Q1`, `Q2`, and `Q3`).

```{r}
coef(glmfit, s = 1200)
```

You can apply the model with different values of the tuning parameter to new data using `predict()`:

```{r}
newdat = matrix(sample(1:9, 30, replace = T), nrow = 3)
predict(glmfit, newdat, s = c(1741, 2000))
```

The tuning parameter $\lambda$ can be chosen based on cross-validation performance, for example, the value that minimizes the cross-validation error or the largest value within one standard error of the minimum.

```{r}
cvfit = cv.glmnet(as.matrix(trainx), trainy)
plot(cvfit)
```

```{r}
# lambda with minimum mean cross-validated error
cvfit$lambda.min
```

```{r}
# lambda with one standard error of the minimum
cvfit$lambda.1se
```

```{r}
# coefficient estimates for model with the error
# that is within one standard error of the minimum
coef(cvfit, s = "lambda.1se")
```

## Penalized Logistic Regression

In this section, we go over penalized logistic regression using `glmnet()` with "`binomial`" for the "`family`" parameter. As before, the default is the LASSO penalty, and the tuning parameter can be determined from cross-validation performance: the minimum error or one standard error from the minimum.
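To make the penalty explicit, with the response $y_i$ coded as 0/1, `glmnet()` with `family = "binomial"` minimizes the penalized negative log-likelihood

$$\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i(\beta_0 + x_i^\top\beta) - \log\left(1 + e^{\beta_0 + x_i^\top\beta}\right)\right] + \lambda\left[(1-\alpha)\frac{\|\beta\|_2^2}{2} + \alpha\|\beta\|_1\right]$$

where $\alpha = 1$ (the default) gives the LASSO penalty and $\alpha = 0$ gives the ridge penalty.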
```{r}
dat <- read.csv("http://bit.ly/2KXb1Qi")
trainx = dplyr::select(dat, -y)
trainy = dat$y
fit <- glmnet(as.matrix(trainx), trainy, family = "binomial")
```

```{r}
levels(as.factor(trainy))
```

```{r}
newdat = as.matrix(trainx[1:3, ])
predict(fit, newdat, type = "link", s = c(2.833e-02, 3.110e-02))
```

```{r}
cvfit = cv.glmnet(as.matrix(trainx), trainy,
                  family = "binomial", type.measure = "class")
plot(cvfit)
```

```{r}
cvfit$lambda.min
```

```{r}
cvfit$lambda.1se
```

## Group LASSO Logistic Regression

Group LASSO selects or drops whole groups of predictors together. In the following example, each predictor's group is given by the prefix of its variable name (the part before the first dot).

```{r}
data("sim1_da1")
```

```{r}
# the last column of sim1_da1 is the response variable y
# trainx is the explanatory variable matrix
trainx = dplyr::select(sim1_da1, -y)
# save the response variable as trainy
trainy = sim1_da1$y
# get the group indicator
index <- gsub("\\..*", "", names(trainx))
```

```{r}
index[1:50]
```

```{r, results='hide', message=FALSE}
# tune over 100 values of lambda
nlam <- 100
# number of cross-validation folds
kfold <- 10
# set the type of prediction:
# - `link`: return the predicted link function
# - `response`: return the predicted probability
cv_fit <- cv_glasso(trainx, trainy, nlam = nlam, kfold = kfold, type = "link")
```

```{r}
str(cv_fit, width = 60, strict.width = "cut", list.len = 9)
```

```{r}
plot(cv_fit)
```

```{r}
fitgl <- fitglasso(trainx, trainy, lambda = 0.922, na_action = na.pass)
```

```{r}
head(coef(fitgl))
```

```r
prey <- predict_glasso(fitgl, trainx)
```
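If the returned predictions are on the link (log-odds) scale, matching the `type = "link"` setting used during tuning, a minimal sketch for turning them into probabilities and class labels could look like the following (not evaluated; it assumes `prey` holds link-scale values, the response is coded 0/1, and the 0.5 cutoff is an arbitrary choice):

```r
# minimal sketch, assuming prey contains link-scale (log-odds) predictions
prob <- plogis(prey)                    # logistic transform: log-odds -> probability
pred_class <- ifelse(prob > 0.5, 1, 0)  # classify at an arbitrary 0.5 cutoff
table(pred_class, trainy)               # in-sample confusion table
```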