---
title: "What happens when you set the intercept to 0 in regression models"
description: |
  What happens when you force the intercept to be 0 in a regression model and why you should (generally) never do it
preview: https://raw.githubusercontent.com/hauselin/rtutorialsite/master/_posts/2019-07-24-what-happens-when-you-remove-or-set-the-intercept-to-0-in-regression-models/what-happens-when-you-remove-or-set-the-intercept-to-0-in-regression-models_files/figure-html5/unnamed-chunk-6-1.png
author:
  - name: Hause Lin
    url: {}
date: 07-24-2019
categories:
  - regression
  - general linear model
output:
  radix::radix_article:
    toc: true
    self_contained: false
draft: false
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE, comment = NA, message = FALSE, warning = FALSE)
```
Get source code for this RMarkdown script [here](https://raw.githubusercontent.com/hauselin/rtutorialsite/master/_posts/2019-07-24-what-happens-when-you-remove-or-set-the-intercept-to-0-in-regression-models/what-happens-when-you-remove-or-set-the-intercept-to-0-in-regression-models.Rmd).
## Consider being a patron and supporting my work?
[Donate and become a patron](https://donorbox.org/support-my-teaching): If you find value in what I do and have learned something from my site, please consider becoming a patron. It takes me many hours to research, learn, and put together tutorials. Your support really matters.
This article answers the questions below. I will use the built-in dataset `mtcars`.
* What is the intercept in a regression model?
* What happens when you remove or set the intercept to 0 in a regression model?
* Why should you (generally) never remove the intercept or set it to 0?
* What are the effects of mean-centering a regressor/predictor?
If you need a refresher on how to interpret regression coefficients, see my other [article](https://hausetutorials.netlify.com/posts/2019-07-06-interpreting-interaction-regression-coefficients/).
```{r}
library(ggplot2) # plot regression lines
```
Have a look at the built-in `mtcars` dataset.
```{r}
head(mtcars)
```
## Fit simple regression model (with intercept)
```{r}
model1 <- lm(mpg ~ disp, mtcars) # the intercept is included by default: lm(mpg ~ 1 + disp, mtcars)
coef1 <- coef(model1) # get coefficients
coef1
```
Regression equation:
$$mpg_{i} = 29.60 - 0.04*disp_{i}$$
Interpretation
* For every 1 unit increase in the predictor `disp`, the outcome `mpg` changes by -0.04. That is, as `disp` increases, `mpg` decreases.
* When `disp = 0`, `mpg = 29.60`. That is, the intercept is the predicted outcome when the predictor equals 0.
$$mpg = 29.60 - 0.04*0$$
$$mpg = 29.60$$
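As a quick check (a small sketch using `predict()`), the fitted model's prediction at `disp = 0` should recover the intercept exactly:

```{r}
# sanity check: the prediction at disp = 0 equals the intercept, coef(model1)[1]
model1 <- lm(mpg ~ disp, mtcars)
predict(model1, newdata = data.frame(disp = 0))
```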
```{r fig1}
ggplot(mtcars, aes(disp, mpg)) +
  geom_point() +
  geom_vline(xintercept = 0, col = 'grey') +
  geom_hline(yintercept = 0, col = 'grey') +
  scale_x_continuous(limits = c(-10, max(mtcars$disp))) +
  scale_y_continuous(limits = c(-10, max(mtcars$mpg))) +
  geom_abline(intercept = coef(model1)[1], slope = coef(model1)[2]) # manually plot regression line
```
## Fit simple regression model without intercept
```{r}
model0 <- lm(mpg ~ 0 + disp, mtcars) # equivalent syntax: lm(mpg ~ -1 + disp, mtcars)
coef0 <- coef(model0) # get coefficients
coef0
```
Note that after setting the intercept to 0, the relationship between `mpg` and `disp` is now **POSITIVE**, rather than negative (see above model with intercept).
Regression equation:
$$mpg_{i} = 0 + 0.059*disp_{i}$$
Interpretation
* For every 1 unit increase in the predictor `disp`, the outcome `mpg` changes by 0.059. That is, as `disp` increases, `mpg` **increases**.
* When `disp = 0`, `mpg = 0`. **By removing the intercept (i.e., setting it to 0), we are forcing the regression line to go through the origin (the point where disp = 0 and mpg = 0).**
$$mpg = 0 + 0.059*0$$
$$mpg = 0$$
The regression line is forced to pass through the origin (0, 0). Therefore, unless your regressors are standardized or mean-centered, it's not a good idea to set the intercept to 0 when fitting the model. Even when your regressors are standardized or mean-centered, you should still include the intercept.
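To see why the sign can flip, note that the through-origin least-squares slope has the closed form $\sum_i x_i y_i / \sum_i x_i^2$, with no mean-centering of $x$ or $y$ anywhere. A small sketch verifying this against `lm()`:

```{r}
# no-intercept OLS slope: sum(x*y) / sum(x^2), no mean-centering involved
x <- mtcars$disp
y <- mtcars$mpg
sum(x * y) / sum(x^2)             # ~0.059
coef(lm(mpg ~ 0 + disp, mtcars))  # same value
```

Because `disp` and `mpg` are both strictly positive, $\sum_i x_i y_i$ is positive, so the through-origin slope must be positive even though the actual relationship is negative.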
```{r}
ggplot(mtcars, aes(disp, mpg)) +
  geom_point() +
  geom_vline(xintercept = 0, col = 'grey') +
  geom_hline(yintercept = 0, col = 'grey') +
  scale_x_continuous(limits = c(-10, max(mtcars$disp))) +
  scale_y_continuous(limits = c(-10, max(mtcars$mpg))) +
  geom_abline(intercept = 0, slope = coef(model0)[1]) # manually plot regression line
```
## Fit simple regression models with mean-centered regressor
Mean-center regressor
```{r}
mtcars$dispC <- mtcars$disp - mean(mtcars$disp) # create mean-centered variable
mean(mtcars$dispC) # mean of dispC is 0 (with some rounding error)
```
Fit model with intercept and mean-centered regressor
```{r}
model1c <- lm(mpg ~ dispC, mtcars)
coef(model1c)
```
Fit model **without** intercept and with mean-centered regressor
```{r}
model0c <- lm(mpg ~ 0 + dispC, mtcars)
coef(model0c)
```
After mean-centering the regressor/predictor, fitting the model with or without the intercept gives the same `dispC` coefficient: -0.04
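Two related facts can be checked directly (a small sketch): with a mean-centered predictor, the fitted intercept equals the mean of the outcome, and the slope is the same whether or not the intercept is estimated:

```{r}
# with a centered predictor, the intercept is mean(mpg), and
# the slope is identical with or without the intercept
mtcars$dispC <- mtcars$disp - mean(mtcars$disp)
m1 <- lm(mpg ~ dispC, mtcars)      # with intercept
m0 <- lm(mpg ~ 0 + dispC, mtcars)  # without intercept
all.equal(unname(coef(m1)[1]), mean(mtcars$mpg))           # TRUE
all.equal(unname(coef(m1)["dispC"]), unname(coef(m0)[1]))  # TRUE
```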
```{r}
ggplot(mtcars, aes(dispC, mpg)) +
  geom_point() +
  geom_vline(xintercept = 0, col = 'grey') +
  geom_hline(yintercept = 0, col = 'grey') +
  scale_x_continuous(limits = c(min(mtcars$dispC), max(mtcars$dispC))) +
  scale_y_continuous(limits = c(-10, max(mtcars$mpg))) +
  geom_abline(intercept = coef(model1c)[1], slope = coef(model1c)[2])
```
Note that the regression slope is identical to the one in the first figure; the only difference is that the points have been shifted to the left (by the mean of `disp`).
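A direct check (a small sketch) that centering moves the points but leaves the slope untouched:

```{r}
# centering shifts x values but does not change the slope
mtcars$dispC <- mtcars$disp - mean(mtcars$disp)
c(uncentered = unname(coef(lm(mpg ~ disp, mtcars))["disp"]),
  centered   = unname(coef(lm(mpg ~ dispC, mtcars))["dispC"]))
```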
## Additional resources
* [StackExchange: When is it ok to remove the intercept in a linear regression model?](https://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-a-linear-regression-model)
* [Interpreting regression coefficients](https://hausetutorials.netlify.com/posts/2019-07-06-interpreting-interaction-regression-coefficients/).
## Support my work
[Support my work and become a patron here](https://donorbox.org/support-my-teaching)!