--- title: "Chapter 10 Regularization Methods" date: "`r Sys.Date()`" output: html_document: toc: true toc_depth: 3 toc_float: collapsed: true smooth_scroll: true --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE) ``` This notebook illustrates how to implement regularization methods using R. # Load R Packages ```{r, message = FALSE, warning=FALSE, results='hide'} # install packages from CRAN p_needed <- c('caret', 'elasticnet', 'glmnet', 'devtools', 'MASS', 'grplasso') packages <- rownames(installed.packages()) p_to_install <- p_needed[!(p_needed %in% packages)] if (length(p_to_install) > 0) { install.packages(p_to_install) } lapply(p_needed, require, character.only = TRUE) # install packages from GitHub p_needed_gh <- c('NetlifyDS') if (! p_needed_gh %in% packages) { devtools::install_github("netlify/NetlifyDS") } library(NetlifyDS) ``` # Ridge Regression ```{r} dat <- read.csv("http://bit.ly/2P5gTw4") # data cleaning: delete wrong observations # expense can't be negative dat <- subset(dat, store_exp > 0 & online_exp > 0) # get predictors trainx <- dat[ , grep("Q", names(dat))] # get response trainy <- dat$store_exp + dat$online_exp ``` In Ridge regression, there is a tuning parameter $\lambda$ that needs to set for the right level of regularization. The value of lambda can be determined by looking at the performance in the cross-validation data. ```{r} # set cross validation ctrl <- trainControl(method = "cv", number = 10) # set the parameter range ridgeGrid <- data.frame(.lambda = seq(0, .1, length = 20)) set.seed(100) ridgeRegTune <- train(trainx, trainy, method = "ridge", tuneGrid = ridgeGrid, trControl = ctrl, ## center and scale predictors preProc = c("center", "scale")) ridgeRegTune ``` ```{r} plot(ridgeRegTune) ``` Once we have determined the value for tuning parameter lambda, we can use different ways to fit the ridge regression model, for example, using `enet()` function. ```{r} ridgefit = enet(x = as.matrix(trainx), y = trainy, lambda = 0.01, # center and scale predictors normalize = TRUE) ``` Once the model is fit, we can use the `predict()` function to predict new dataset. ```{r} ridgePred <- predict(ridgefit, newx = as.matrix(trainx), s = 1, mode = "fraction", type = "fit") ``` ```{r} names(ridgePred) ``` ```{r} head(ridgePred$fit) ``` ```{r} ridgeCoef<-predict(ridgefit,newx = as.matrix(trainx), s=1, mode="fraction", type="coefficients") ridgeCoef$coefficients ``` # LASSO LASSO method will shrink certain parameters to be zero which can be used as a variable selection tool. In LASSO, there is also a tuning parameter called fraction which can be determined by performance of the cross validation data set. ```{r} ctrl <- trainControl(method = "cv", number = 10) lassoGrid <- data.frame(fraction = seq(.8, 1, length = 20)) set.seed(100) lassoTune <- train(trainx, trainy, ## set the method to be lasso method = "lars", tuneGrid = lassoGrid, trControl = ctrl, ## standardize the predictors preProc = c("center", "scale")) lassoTune ``` ```{r} plot(lassoTune) ``` Once the tuning parameter fraction is set, we can use different functions to fit LASSO model, such as `lars()` in `lars`, `enet()` in `elasticnet`, and `glmnet()` in `glmnet`. 
```{r}
lassoModel <- enet(x = as.matrix(trainx), y = trainy,
                   lambda = 0, normalize = TRUE)
```

```{r}
lassoFit <- predict(lassoModel, newx = as.matrix(trainx),
                    s = 0.957, mode = "fraction", type = "fit")
```

```{r}
head(lassoFit$fit)
```

```{r}
lassoCoef <- predict(lassoModel, newx = as.matrix(trainx),
                     s = 0.95, mode = "fraction", type = "coefficients")
lassoCoef$coefficients
```

# Elastic Net

Elastic net is a generalization of LASSO and ridge regression: it combines the two penalties. For elastic net, there are two tuning parameters, lambda and fraction.

```{r}
enetGrid <- expand.grid(.lambda = seq(0, 0.2, length = 20),
                        .fraction = seq(.8, 1, length = 20))
set.seed(100)
enetTune <- train(trainx, trainy,
                  method = "enet",
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enetTune
```

# Penalized Generalized Linear Model

Adding penalties is a general technique that can be applied to many methods other than linear regression. In this section, we introduce the penalized generalized linear model using `glmnet()`.

## Introduction to `glmnet` package

The default family option in the function `glmnet()` is `gaussian`, which corresponds to the linear regression we have discussed so far in this chapter. But the parameterization is a little different in the generalized linear model framework (we have $\alpha$ and $\lambda$). Let's start from our previous example, using the same training data but fitting the model with `glmnet()`:

```{r}
dat <- read.csv("http://bit.ly/2P5gTw4")
# data cleaning: delete wrong observations with expense < 0
dat <- subset(dat, store_exp > 0 & online_exp > 0)
# get predictors
trainx <- dat[, grep("Q", names(dat))]
# get response
trainy <- dat$store_exp + dat$online_exp
glmfit = glmnet(as.matrix(trainx), trainy)
```

We can extract useful information from the fitted model using the `plot()` and `print()` functions.

```{r}
plot(glmfit, label = T)
```

```{r}
print(glmfit)
```

We can also get coefficient estimates for a specific value of $\lambda$ in the LASSO penalty by setting `s` in the `coef()` function. For example, the following code sets $\lambda$ to 1200, which leaves three coefficients with non-zero estimates (`Q1`, `Q2`, and `Q3`).

```{r}
coef(glmfit, s = 1200)
```

You can apply the model with different values of the tuning parameter to new data using `predict()`:

```{r}
newdat = matrix(sample(1:9, 30, replace = T), nrow = 3)
predict(glmfit, newdat, s = c(1741, 2000))
```

The tuning parameter $\lambda$ can be chosen based on cross-validation performance, for example, the value that minimizes the cross-validation error or the largest value within one standard error of the minimum.

```{r}
cvfit = cv.glmnet(as.matrix(trainx), trainy)
plot(cvfit)
```

```{r}
# lambda with minimum mean cross-validated error
cvfit$lambda.min
```

```{r}
# lambda with one standard error of the minimum
cvfit$lambda.1se
```

```{r}
# coefficient estimates for model with the error
# that is within one standard error of the minimum
coef(cvfit, s = "lambda.1se")
```

## Penalized Logistic Regression

In this section, we go over penalized logistic regression using `glmnet()` with "`binomial`" for the "`family`" parameter. As before, the default is the LASSO penalty, and the tuning parameter can be determined from cross-validation performance: the minimum error or one standard error from the minimum.
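To make the penalty explicit, with the response $y_i$ coded as 0/1, `glmnet()` with `family = "binomial"` minimizes the penalized negative log-likelihood

$$\min_{\beta_0,\,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i(\beta_0 + x_i^\top\beta) - \log\left(1 + e^{\beta_0 + x_i^\top\beta}\right)\right] + \lambda\left[(1-\alpha)\frac{\|\beta\|_2^2}{2} + \alpha\|\beta\|_1\right]$$

where $\alpha = 1$ (the default) gives the LASSO penalty and $\alpha = 0$ gives the ridge penalty.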
```{r}
dat <- read.csv("http://bit.ly/2KXb1Qi")
trainx = dplyr::select(dat, -y)
trainy = dat$y
fit <- glmnet(as.matrix(trainx), trainy, family = "binomial")
```

```{r}
levels(as.factor(trainy))
```

```{r}
newdat = as.matrix(trainx[1:3, ])
predict(fit, newdat, type = "link", s = c(2.833e-02, 3.110e-02))
```

```{r}
cvfit = cv.glmnet(as.matrix(trainx), trainy,
                  family = "binomial", type.measure = "class")
plot(cvfit)
```

```{r}
cvfit$lambda.min
```

```{r}
cvfit$lambda.1se
```

## Group LASSO Logistic Regression

Group LASSO selects or drops whole groups of predictors together. In the following example, each predictor's group is given by the prefix of its variable name (the part before the first dot).

```{r}
data("sim1_da1")
```

```{r}
# the last column of sim1_da1 is the response variable y
# trainx is the explanatory variable matrix
trainx = dplyr::select(sim1_da1, -y)
# save the response variable as trainy
trainy = sim1_da1$y
# get the group indicator
index <- gsub("\\..*", "", names(trainx))
```

```{r}
index[1:50]
```

```{r, results='hide', message=FALSE}
# tune over 100 values of lambda
nlam <- 100
# number of cross-validation folds
kfold <- 10
# set the type of prediction:
# - `link`: return the predicted link function
# - `response`: return the predicted probability
cv_fit <- cv_glasso(trainx, trainy, nlam = nlam, kfold = kfold, type = "link")
```

```{r}
str(cv_fit, width = 60, strict.width = "cut", list.len = 9)
```

```{r}
plot(cv_fit)
```

```{r}
fitgl <- fitglasso(trainx, trainy, lambda = 0.922, na_action = na.pass)
```

```{r}
head(coef(fitgl))
```

```r
prey <- predict_glasso(fitgl, trainx)
```
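If the returned predictions are on the link (log-odds) scale, matching the `type = "link"` setting used during tuning, a minimal sketch for turning them into probabilities and class labels could look like the following (not evaluated; it assumes `prey` holds link-scale values, the response is coded 0/1, and the 0.5 cutoff is an arbitrary choice):

```r
# minimal sketch, assuming prey contains link-scale (log-odds) predictions
prob <- plogis(prey)                    # logistic transform: log-odds -> probability
pred_class <- ifelse(prob > 0.5, 1, 0)  # classify at an arbitrary 0.5 cutoff
table(pred_class, trainy)               # in-sample confusion table
```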