---
title: "Partial Residual Plots"
author: "Dan McGlinn"
date: '`r paste("First created on 2015-01-29. Updated on", Sys.Date())`'
output:
  html_document:
    fig_width: 10
    fig_height: 5
---

Home Page - http://dmcglinn.github.io/quant_methods/
GitHub Repo - https://github.com/dmcglinn/quant_methods

### Source Code Link
https://raw.githubusercontent.com/dmcglinn/quant_methods/gh-pages/lessons/partial_residual_plots.Rmd

When working with multiple regression models we often spend a lot of time examining tabular summaries and not enough time examining graphical fits of the models to the data. Partial residual plots help us to visualize the fit of each independent variable in a multiple regression model after controlling for the other variables. Mathematically, partial residuals are defined as:

$$\text{Residuals} + \hat{\beta}_i X_i \text{ versus } X_i,$$

where

* $\text{Residuals}$ = residuals from the full model,
* $\hat{\beta}_i$ = regression coefficient of the $i$-th independent variable in the full model, and
* $X_i$ = the $i$-th independent variable.

We will use the function `termplot` to examine partial residual plots in R. We demonstrate it here for `lm` models, but it also works for `glm` models. Before we go further we should mention two caveats:

1. This approach is not well suited for models that contain interaction effects, because examining the partial effect of a single term that is included in an interaction does not make sense.
2. This approach can be misleading when there is strong multicollinearity, i.e., when the independent variables are highly correlated with one another. In that case the variance indicated by the partial residual plot can be much smaller than the actual variance.

We will use a simple simulated example with three independent variables.

```{r}
set.seed(1)
x1 <- rnorm(100)                          # continuous variable 1
x2 <- rnorm(100)                          # continuous variable 2
x3 <- as.factor(rep(c('cnt', 'trt'), 50)) # categorical variable
# y depends on x1 quadratically, on x2 linearly, and on x3 as a group offset
y <- 0.5 * x1^2 + 0.75 * x2 + ifelse(x3 == 'cnt', -0.5, 0.5) + rnorm(100, 0, 0.1)
```

We can visually examine how each simulated variable relates to `y` prior to modeling.

```{r}
par(mfrow = c(1, 3))
plot(y ~ x1)
plot(y ~ x2)
plot(y ~ x3)
```

Even though in this simulated case we know the 'true' model because we designed it, visually examining the relationship between `y` and each variable individually suggests that `x1` has no relationship, `x2` has a positive relationship, and the `trt` samples of `x3` have higher values of `y`.

Let's carry out multiple regression modeling to see if we can uncover more information:

```{r}
mod <- lm(y ~ x1 + x2 + x3)
summary(mod)
```

OK, we fit the model, and the tabular output seems to agree with our initial graphical exploration prior to model fitting. Let's now examine the partial residual plots for this example:

```{r}
par(mfrow = c(1, 3))
termplot(mod, partial.resid = TRUE, se = TRUE, smooth = panel.smooth,
         pch = 19, cex = 0.75, col.res = 'black',
         col.term = 'dodgerblue', lwd.term = 2,
         col.se = 'dodgerblue', lwd.se = 2, lty.se = 3,
         col.smth = 'red', lty.smth = 2)
```

First let's get oriented on these graphics. The y-axis is the partial residual of the response variable `y` with respect to each independent variable. There are several sets of lines on the graphs. The light blue lines are the linear model fits with their associated standard errors (note that there is more uncertainty towards the ends of the best-fit line). The graphs also include a localized smoother, or loess curve (in red), which follows the data as closely as possible.
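To connect these plots back to the definition above, here is a small sketch that reconstructs the partial residuals for `x1` by hand (the object name `pres_x1` is just illustrative). Base R can also return the same quantities via `residuals` with `type = 'partial'`:

```{r}
# partial residuals for x1: residuals from the full model plus
# the contribution of x1's fitted linear term
pres_x1 <- residuals(mod) + coef(mod)['x1'] * x1
# built-in equivalent: one column of partial residuals per model term
head(residuals(mod, type = 'partial'))
# plotting them against x1 mirrors the first termplot panel
# (termplot centers each term, so its y-axis differs by a constant shift)
plot(x1, pres_x1, pch = 19, cex = 0.75,
     xlab = 'x1', ylab = 'partial residual')
abline(0, coef(mod)['x1'], col = 'dodgerblue', lwd = 2)
```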
In the first panel we are examining the relationship `y ~ x1 | x2 + x3`, where the `|` stands for "given". In other words, this formula asks: what is the strength of the relationship between `x1` and `y` given, or after controlling for, `x2` and `x3`? That is, approximately:

```{r}
lm(residuals(lm(y ~ x2 + x3)) ~ x1)
```

Note that the leftmost panel above tells us something really useful: it shows clearly that the regression model (light blue line) has the wrong functional form. It isn't that `x1` has no relationship to `y`; it is just that the relationship is not linear. The loess smoother (red line) shows this clearly. The other two panels show the partial effects of `x2` and `x3` on `y`. These panels look good and indicate that our linear model assumption for the relationship between `y` and `x2` seems pretty reasonable.

Let's re-specify our model and re-examine the partial residual plots.

```{r}
mod2 <- lm(y ~ I(x1^2) + x2 + x3)
summary(mod2)
par(mfrow = c(1, 3))
termplot(mod2, partial.resid = TRUE, se = TRUE, smooth = panel.smooth,
         pch = 19, cex = 0.75, col.res = 'black',
         col.term = 'dodgerblue', lwd.term = 2,
         col.se = 'dodgerblue', lwd.se = 2, lty.se = 3,
         col.smth = 'red', lty.smth = 2)
```

Now that the model is properly specified, the fits look better and have lower uncertainty associated with them.
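As a quick numerical check on that visual impression, we can compare the two fits. The models are not nested (one contains `x1`, the other `I(x1^2)`), so rather than an F-test this minimal sketch compares AIC and $R^2$:

```{r}
# lower AIC and higher R-squared both favor the re-specified model
AIC(mod, mod2)
c(mod = summary(mod)$r.squared, mod2 = summary(mod2)$r.squared)
```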