{ "cells": [ { "cell_type": "markdown", "id": "5625d8cf", "metadata": {}, "source": [ "
\n", "\n", "# Regression Evaluation Metrics\n", "\n", "Descriptions and analysis of popular evalution metrics for measuring performance of regresssion models.\n", "\n", "In general the most common metrics for regression model evaluation are:\n", "- $ R^2 $: Coefficient of determination\n", "- **Adjusted $ R^2 $:** $ R^2 $ with adjustments for overfitting\n", "- **MSE:** Mean Squared Error\n", "- **RMSE:** Root Mean Squared Error\n", "- **RMSLE:** Root Mean Squared Logrithmic Error\n", "- **MAE:** Mean Absolute Error\n", "\n", "Resources for More information:\n", "\n", "- [Evalution Metrics](https://towardsdatascience.com/what-are-the-best-metrics-to-evaluate-your-regression-model-418ca481755b)\n", "- [MSE vs R2](https://vitalflux.com/mean-square-error-r-squared-which-one-to-use/)\n", "- [Kaggle Evaluation Metrics](https://www.kaggle.com/vipulgandhi/how-to-choose-right-metric-for-evaluating-ml-model#-Regression-Metrices-)\n", "- [R2 vs Adjusted R2](https://www.analyticsvidhya.com/blog/2020/07/difference-between-r-squared-and-adjusted-r-squared/)\n", "- [R squared Does Not Measure Predictive Capacity or Statistical Adequacy](https://www.kdnuggets.com/2020/07/r-squared-predictive-capacity-statistical-adequacy.html)\n", "- [Why R-Squared shouldn't be used for model performance](https://towardsdatascience.com/avoid-r-squared-to-judge-regression-model-performance-5c2bc53c8e2e)" ] }, { "cell_type": "markdown", "id": "fc346475", "metadata": {}, "source": [ "# Equations\n", "Note: To reduce clutter we're just using Sigma for summation from 1 to N, so $ \\sum = \\sum_{i = 1}^{N} $\n", "# $ R^2 = 1 - \\frac {\\sum (y_i - \\hat y_i)^2} {\\sum (y_i - \\bar y)^2} $\n", "## $ \\text{Adjusted} R^2 = 1 - \\frac {(1-R^2)(n-1)} {(n - k - 1)} $\n", "## $ \\text{MSE} = \\frac{1}{N} \\sum (y_i - \\hat y_i)^2 $\n", "## $ \\text{RMSE} = \\sqrt{\\frac{1}{N} \\sum (y_i - \\hat y_i)^2} $\n", "## $ \\text{RMSLE} = \\sqrt{\\frac{1}{N} \\sum (\\log y_i - \\log \\hat y_i)^2} $\n", "## $ \\text{MAE} = \\frac{1}{N} \\sum |(y_i - \\hat y_i)| $" ] }, { "cell_type": "markdown", "id": "a940ab70", "metadata": {}, "source": [ "
\n", "\n", "# Residuals\n", "\n", "Lets first look at **Residuals**. \n", "\n", "A **Residual** is the difference between the **actual** value and the **predicted** value we when evaluate the model.\n", "\n", "\n", "### $$ y = \\text{Actual Value} $$\n", "### $$ \\hat y = \\text{Predicted Value} $$ \n", "### $$ \\text {Residual} = (y - \\hat y) $$\n", "\n", "So for evaluting model performance many of these metrics use some verions of the **Residual sum of squares** or **RSS** for short.\n", "\n", "## $$ RSS = \\sum (y_i - \\hat y_i)^2 $$\n", "\n", "**Note:** RSS should not be confused with RegSS (Regression Sum of Squares)" ] }, { "cell_type": "markdown", "id": "0992e5c8", "metadata": {}, "source": [ "# Deeper Dive on $ R^2 $\n", "\n", "The $ R^2 $ statistic or coefficient of determination is a scale invariant statistic that gives the **proportion of variation** in target variable explained by the model.\n", "\n", "Lets look at the **R2** Equation written in a couple of different ways.\n", "\n", "# $ R^2 = 1 - \\frac {\\sum (y_i - \\hat y_i)^2} {\\sum (y_i - \\bar y)^2} $\n", "\n", "Total variation in target variable is the sum of squares of the difference between the actual values and their mean. The definition of the Total Sum of Squares (TSS) which gives the total variation in Y is:\n", "\n", "### $ TSS = \\sum (y_i - \\bar y)^2 $\n", "
\n", "\n", "Given our previous definition of RSS and this definition of TSS we can write the equation as:\n", "\n", "## $ R^2 = 1 - \\frac {RSS} {TSS} $\n", "\n", "or in human terms:\n", "## $ R^2 = 1 - \\frac {\\text {Unexplained Variation}} {\\text {Total Variation}} $\n", "\n", "\n", "### Suggested Reading:\n", "- [R squared Does Not Measure Predictive Capacity or Statistical Adequacy](https://www.kdnuggets.com/2020/07/r-squared-predictive-capacity-statistical-adequacy.html)\n", "- [Why R-Squared shouldn't be used for model performance](https://towardsdatascience.com/avoid-r-squared-to-judge-regression-model-performance-5c2bc53c8e2e)" ] }, { "cell_type": "markdown", "id": "3ce478e5", "metadata": {}, "source": [ "# Adjusted $ R^2 $\n", "\n", "One of the issues with the $ R^2 $ metric is that it improves with more independant variables (features). So the more features that are added the better the metric does. This can lead to issues with overfitting. Overfitting is where the model performs well on the training data but then performs poorly on the test data.\n", "\n", "In order to address this issue, the **Adjusted $ R^2 $** metric takes into account the number of independent variables used for predicting the target variable. \n", "\n", "# $$ \\text{Adjusted} R^2 = 1 - \\frac {(1-R^2)(n-1)} {(n - k - 1)} $$\n", "\n", "- **n** = number of data points in the dataset\n", "- **k:** = number of independent variables\n", "- $ R^2 $ = $ R^2 $values (see above)\n", "\n", "We're not going to cover the **Adjusted $ R^2 $** metric in detail for this notebook. \n", "\n", "### Suggested Reading:\n", "- [R2 vs Adjusted R2](https://www.analyticsvidhya.com/blog/2020/07/difference-between-r-squared-and-adjusted-r-squared/)" ] }, { "cell_type": "markdown", "id": "9fdcaabb", "metadata": {}, "source": [ "# MSE: Mean Squared Error\n", "\n", "In statistics, the mean squared error of an estimator measures the mean of the squares of the errors. The MSE is a measure of the quality of an estimator. The measure is a positive value that decreases as the error approaches zero.\n", "\n", "## $$ \\text{MSE} = \\frac{1}{N} \\sum (y_i - \\hat y_i)^2 $$\n", "**Advantages:**\n", "- Easy to compute and understand.\n", "- The function is differentiable so it can be used as a loss function for Gradient Descent based algorithms (Neural Networks, Deep Learning Packages).\n", "\n", "**Disadvantages:**\n", "- Since the metric computes the square of the error the units aren't the same as the problem domain. For example, differences in dollars results in a metric of dollars squared.\n", "- As a square of the error this metric may be senstive to outliers*\n", "\n", "**\\*Note:** Depending on the problem domain you may want a metric that penalizes large errors. So this disadvantage may actually be an advantage for some use cases." ] }, { "cell_type": "markdown", "id": "2a831cac", "metadata": {}, "source": [ "# RMSE: Root Mean Squared Error\n", "\n", "RMSE is simply the square root of the MSE Metric. RMSE is certainly one of the most popular evaluation metrics.\n", "Many of the Kaggle challenges use RMSE as the model performance metric. 
\n", "\n", "**Advantages:**\n", "- Easy to compute and understand.\n", "- Used quite often in Kaggle competitions (they also use MSE and RMSLE in some cases).\n", "- The function is differentiable, so it can be used as a loss function for Gradient Descent based algorithms (Neural Networks, Deep Learning Packages).\n", "\n", "**Disadvantages:**\n", "- Because the errors are squared, this metric may be sensitive to outliers.*\n", "\n", "**\\*Note:** Depending on the problem domain you may want a metric that penalizes large errors, so this disadvantage may actually be an advantage for some use cases." ] }, { "cell_type": "markdown", "id": "8f06b620", "metadata": {}, "source": [ "# RMSLE: Root Mean Squared Logarithmic Error\n", "\n", "RMSLE is simply the RMSE metric with a logarithm applied to both the actual and predicted values.\n", "\n", "## $ \\text{RMSLE} = \\sqrt{\\frac{1}{N} \\sum (\\log y_i - \\log \\hat y_i)^2} $\n", "\n", "**Advantages (compared to RMSE):**\n", "- Penalizes relative (ratio) errors rather than absolute differences, so large-valued targets don't dominate the metric.\n", "\n", "**Disadvantages (compared to RMSE):**\n", "- Can only be used when the target values are strictly positive, since the log is undefined for zero or negative values.\n", "- For example, a LogS (aqueous solubility) target typically ranges from about -13 to +2, so RMSLE can't be used there. :(\n", "\n", "### Example\n", "We have data that ranges from 1 to 10 and two actual vs. predicted cases:\n", "\n", "- Case 1:\n", "    - actual: 1.0\n", "    - pred: 2.0\n", "- Case 2:\n", "    - actual: 10\n", "    - pred: 11\n", "\n", "From a human's perspective, a value of 1.0 with a prediction of 2.0 feels 'way off', whereas \n", "a value of 10 with a prediction of 11 feels 'pretty close'. **RMSE** gives both cases the same penalty, but you'd like Case 1 to have a higher one. **RMSLE** does give a higher (relative) penalty to Case 1." ] }, { "cell_type": "markdown", "id": "1ffe8065", "metadata": {}, "source": [ "# MAE: Mean Absolute Error\n", "\n", "MAE is also closely related to the MSE metric; it simply takes the absolute value of the error instead of\n", "squaring the error. One of the main reasons to use MAE is that it's robust to outliers.\n", "\n", "## $$ \\text{MAE} = \\frac{1}{N} \\sum |(y_i - \\hat y_i)| $$\n", "**Advantages:**\n", "- Easy to compute and understand.\n", "- Robust to outliers.\n", "\n", "**Disadvantages:**\n", "- Not differentiable at zero, which can make it less convenient as a loss function for gradient-based training (though it is still used in practice as an L1 loss).\n", "\n", "### Suggested Reading\n", "- [MAE vs RMSE](https://medium.com/human-in-a-machine-world/mae-and-rmse-which-metric-is-better-e60ac3bde13d)" ] }, { "cell_type": "markdown", "id": "490c31ff", "metadata": {}, "source": [ "\n", "\n", "# Which one should I choose?\n", "\n", "Given the popularity of **RMSE** and the fact that Kaggle uses it for many of its competitions, we suggest starting with that one. If you want a metric that's robust to outliers, then perhaps go with **MAE**. If your target variable has a large range (like from 1 to 100), then **RMSLE** might be the way to go.\n", "\n", "Hopefully this overview of the various metrics for regression model performance was helpful.\n", "\n", "
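As a final illustration, here's a small comparison (with made-up toy values) of how RMSE, MAE, and RMSLE react to a single large outlier:\n", "\n", "```python\n", "import numpy as np\n", "from sklearn.metrics import mean_squared_error, mean_absolute_error\n", "\n", "# Made-up toy values: the second set of predictions has one value that is way off\n", "y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])\n", "preds = {\"clean\": np.array([12.0, 18.0, 33.0, 41.0, 48.0]),\n", "         \"with outlier\": np.array([12.0, 18.0, 33.0, 41.0, 120.0])}\n", "\n", "for name, y_pred in preds.items():\n", "    rmse = np.sqrt(mean_squared_error(y_true, y_pred))\n", "    mae = mean_absolute_error(y_true, y_pred)\n", "    rmsle = np.sqrt(np.mean((np.log(y_true) - np.log(y_pred)) ** 2))\n", "    print(f\"{name:>12}  RMSE: {rmse:6.2f}  MAE: {mae:5.2f}  RMSLE: {rmsle:5.3f}\")\n", "```\n", "\n", "Note how RMSE reacts much more strongly to the single bad prediction than MAE does, which is exactly the trade-off discussed above.\n", "\n", "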
If you'd like additional details we suggest the follow up reading links and the resources at the top.\n", "\n", "If you liked this notebook please visit [SCP Labs](https://github.com/SuperCowPowers/scp-labs) for more notebooks and examples, or visit our company page for consulting and development services [SuperCowPowers](https://www.supercowpowers.com)" ] }, { "cell_type": "code", "execution_count": 1, "id": "ce818a93", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This cell is simply for adding some CSS (Ignore it :)\n", "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }