## 3.7 Feature importance: Correlation

[Slides](https://www.slideshare.net/AlexeyGrigorev/ml-zoomcamp-3-machine-learning-for-classification)

## Notes

The **correlation coefficient** measures the degree of dependency between two variables. It ranges from -1 to 1. It is negative when one variable tends to decrease as the other grows, and positive when both tend to increase together. Depending on its absolute value, the dependency between the two variables can be low, moderate, or strong. This makes it a way to measure the importance of numerical variables.

If `r` is the correlation coefficient, then the correlation between two variables is:

- LOW when `r` is in (-0.2, 0] or [0, 0.2), i.e. |r| < 0.2
- MEDIUM when `r` is in (-0.5, -0.2] or [0.2, 0.5)
- STRONG when `r` is in [-1.0, -0.5] or [0.5, 1.0]

**Positive Correlation vs. Negative Correlation**

* When `r` is positive, an increase in x tends to come with an increase in y.
* When `r` is negative, an increase in x tends to come with a decrease in y.
* When `r` is 0, there is no linear relationship between x and y.

**Functions and methods:**

* `df[cols].corrwith(y)` - returns the correlation between each column in `cols` and the series `y`. This is a pandas `DataFrame` method.

The entire code of this project is available in [this jupyter notebook](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/chapter-03-churn-prediction/03-churn.ipynb).
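The thresholds and the `corrwith` call described above can be sketched in Python. This is a minimal sketch that assumes Pearson correlation, which is the default `method` of `corrwith`. The toy DataFrame and its values are made up for illustration, with column names only loosely modeled on the churn dataset. `correlation_strength` and `pearson_r` are hypothetical helpers, not part of the course code.

```python
import numpy as np
import pandas as pd


def correlation_strength(r: float) -> str:
    """Hypothetical helper: label |r| using the 0.2 / 0.5 thresholds from the notes."""
    a = abs(r)
    if a < 0.2:
        return "LOW"
    if a < 0.5:
        return "MEDIUM"
    return "STRONG"


def pearson_r(x, y) -> float:
    """Hypothetical helper: Pearson r computed by hand as cov(x, y) / (std(x) * std(y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up toy data standing in for the churn dataset's numerical columns
df = pd.DataFrame({
    "tenure": [1, 5, 12, 24, 36, 48, 60, 72],
    "monthlycharges": [70.0, 85.0, 60.0, 90.0, 55.0, 100.0, 65.0, 95.0],
    "churn": [1, 1, 1, 0, 0, 0, 0, 0],
})
numerical = ["tenure", "monthlycharges"]

# corrwith: correlation of each selected column with the target Series,
# returned as a Series indexed by column name
corr = df[numerical].corrwith(df["churn"])
for col, r in corr.items():
    print(f"{col}: r = {r:.3f} ({correlation_strength(r)})")

# Sanity check: the hand-written formula agrees with pandas
assert np.isclose(pearson_r(df["tenure"], df["churn"]), corr["tenure"])
```

Selecting a list of columns (`df[numerical]`) matters here: it yields a `DataFrame`, which has `corrwith`, whereas selecting a single column yields a `Series`, which does not (a `Series` offers `corr` instead).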
⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
* [Notes from Peter Ernicke](https://knowmledge.com/2023/09/29/ml-zoomcamp-2023-machine-learning-for-classification-part-7/)

## Navigation

* [Machine Learning Zoomcamp course](../)
* [Session 3: Machine Learning for Classification](./)
* Previous: [Feature importance: Mutual information](06-mutual-info.md)
* Next: [One-hot encoding](08-ohe.md)