---
title: "What happens when you set the intercept to 0 in regression models"
description: |
  What happens when you force the intercept to be 0 in a regression model and why you should (generally) never do it
preview: https://raw.githubusercontent.com/hauselin/rtutorialsite/master/_posts/2019-07-24-what-happens-when-you-remove-or-set-the-intercept-to-0-in-regression-models/what-happens-when-you-remove-or-set-the-intercept-to-0-in-regression-models_files/figure-html5/unnamed-chunk-6-1.png
author:
  - name: Hause Lin
    url: {}
date: 07-24-2019
categories:
  - regression
  - general linear model
output:
  radix::radix_article:
    toc: true
    self_contained: false
draft: false
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE, comment = NA, message = FALSE, warning = FALSE)
```

Get source code for this RMarkdown script [here](https://raw.githubusercontent.com/hauselin/rtutorialsite/master/_posts/2019-07-24-what-happens-when-you-remove-or-set-the-intercept-to-0-in-regression-models/what-happens-when-you-remove-or-set-the-intercept-to-0-in-regression-models.Rmd).

## Consider being a patron and supporting my work?

[Donate and become a patron](https://donorbox.org/support-my-teaching): If you find value in what I do and have learned something from my site, please consider becoming a patron. It takes me many hours to research, learn, and put together tutorials. Your support really matters.

This article answers the questions below. I will use the built-in dataset `mtcars`.

* What is the intercept in a regression model?
* What happens when you remove the intercept or set it to 0 in a regression model?
* Why should you (generally) never remove the intercept or set it to 0?
* What are the effects of mean-centering a regressor/predictor?

If you need a refresher on how to interpret regression coefficients, see my other [article](https://hausetutorials.netlify.com/posts/2019-07-06-interpreting-interaction-regression-coefficients/).

```{r}
library(ggplot2) # plot regression lines
```

Have a look at the built-in `mtcars` dataset.

```{r}
head(mtcars)
```

## Fit simple regression model (with intercept)

```{r}
# the intercept is included by default; equivalent syntax: lm(mpg ~ 1 + disp, mtcars)
model1 <- lm(mpg ~ disp, mtcars)
coef1 <- coef(model1) # get coefficients
coef1
```

Regression equation: $$mpg_{i} = 29.60 - 0.04 \times disp_{i}$$

Interpretation

* For every 1-unit increase in the predictor `disp`, the outcome `mpg` changes by -0.04. That is, as `disp` increases, `mpg` decreases.
* When `disp = 0`, `mpg = 29.60`.

$$mpg = 29.60 - 0.04 \times 0$$

$$mpg = 29.60$$

```{r fig1}
ggplot(mtcars, aes(disp, mpg)) +
  geom_point() +
  geom_vline(xintercept = 0, col = 'grey') +
  geom_hline(yintercept = 0, col = 'grey') +
  scale_x_continuous(limits = c(-10, max(mtcars$disp))) +
  scale_y_continuous(limits = c(-10, max(mtcars$mpg))) +
  geom_abline(intercept = coef(model1)[1], slope = coef(model1)[2]) # manually plot regression line
```

## Fit simple regression model without intercept

```{r}
# equivalent syntax: lm(mpg ~ -1 + disp, mtcars)
model0 <- lm(mpg ~ 0 + disp, mtcars)
coef0 <- coef(model0) # get coefficients
coef0
```

Note that after setting the intercept to 0, the estimated relationship between `mpg` and `disp` is now **POSITIVE**, rather than negative (see the model with intercept above).

Regression equation: $$mpg_{i} = 0 + 0.059 \times disp_{i}$$

Interpretation

* For every 1-unit increase in the predictor `disp`, the outcome `mpg` changes by 0.059. That is, as `disp` increases, `mpg` **increases**.
* When `disp = 0`, `mpg = 0`. **By removing the intercept (i.e., setting it to 0), we are forcing the regression line to go through the origin (the point where disp = 0 and mpg = 0).**

$$mpg = 0 + 0.059 \times 0$$

$$mpg = 0$$

Because the regression line is forced to pass through the origin (0, 0), the slope is whatever value fits the data best under that constraint, which here flips its sign. Therefore, unless your regressors are standardized or mean-centered, it's not a good idea to set the intercept to 0 when fitting the model.
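
One way to see how much the origin constraint costs is to compare the residual sum of squares (RSS) of the two fits. The chunk below is only a sketch and is not part of the original analysis: it refits both models under the illustrative names `fit_with` and `fit_without`, and relies on `deviance()`, which returns the RSS for `lm` objects.

```{r}
# Sketch only: fit_with, fit_without, and rss are illustrative names, not from the article
fit_with    <- lm(mpg ~ disp, data = mtcars)      # intercept estimated from the data
fit_without <- lm(mpg ~ 0 + disp, data = mtcars)  # intercept forced to 0

# deviance() on an lm object returns the residual sum of squares (RSS)
rss <- c(with_intercept    = deviance(fit_with),
         without_intercept = deviance(fit_without))
rss

# slopes side by side: the sign differs between the two fits
c(with_intercept    = unname(coef(fit_with)["disp"]),
  without_intercept = unname(coef(fit_without)["disp"]))
```

A larger RSS means the line sits farther from the data points. RSS is used here rather than the R-squared reported by `summary()` because R computes R-squared differently for models without an intercept, so the two R-squared values are not directly comparable.
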
Even when your regressors are standardized or mean-centered, you should still include the intercept.

```{r}
ggplot(mtcars, aes(disp, mpg)) +
  geom_point() +
  geom_vline(xintercept = 0, col = 'grey') +
  geom_hline(yintercept = 0, col = 'grey') +
  scale_x_continuous(limits = c(-10, max(mtcars$disp))) +
  scale_y_continuous(limits = c(-10, max(mtcars$mpg))) +
  geom_abline(intercept = 0, slope = coef(model0)[1]) # manually plot regression line
```

## Fit simple regression models with mean-centered regressor

Mean-center regressor

```{r}
mtcars$dispC <- mtcars$disp - mean(mtcars$disp) # create mean-centered variable
mean(mtcars$dispC) # mean of dispC is 0 (with some rounding error)
```

Fit model with intercept and mean-centered regressor

```{r}
model1c <- lm(mpg ~ dispC, mtcars)
coef(model1c)
```

Fit model **without** intercept and with mean-centered regressor

```{r}
model0c <- lm(mpg ~ 0 + dispC, mtcars)
coef(model0c)
```

After mean-centering the regressor/predictor, fitting the model with or without the intercept gives the same `dispC` coefficient: -0.04.

```{r}
ggplot(mtcars, aes(dispC, mpg)) +
  geom_point() +
  geom_vline(xintercept = 0, col = 'grey') +
  geom_hline(yintercept = 0, col = 'grey') +
  scale_x_continuous(limits = c(min(mtcars$dispC), max(mtcars$dispC))) +
  scale_y_continuous(limits = c(-10, max(mtcars$mpg))) +
  geom_abline(intercept = coef(model1c)[1], slope = coef(model1c)[2])
```

Note that the regression slope is identical to the one in the first figure. The only difference is that the points have been shifted to the left by the mean of `disp`, so `dispC = 0` now corresponds to the average `disp` and the intercept is the predicted `mpg` at that average.

## Additional resources

* [StackExchange: When is it ok to remove the intercept in a linear regression model?](https://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-a-linear-regression-model)
* [Interpreting regression coefficients](https://hausetutorials.netlify.com/posts/2019-07-06-interpreting-interaction-regression-coefficients/)


## Support my work

[Support my work and become a patron here](https://donorbox.org/support-my-teaching)!