{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# \ud83d\udcc3 Solution for Exercise M4.01\n",
    "\n",
    "The aim of this exercise is two-fold:\n",
    "\n",
    "* understand the parametrization of a linear model;\n",
    "* quantify the fitting accuracy of a set of such models.\n",
    "\n",
    "We will reuse part of the code of the course to:\n",
    "\n",
    "* load data;\n",
    "* create the function representing a linear model.\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "### Data loading"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"admonition note alert alert-info\">\n",
    "<p class=\"first admonition-title\" style=\"font-weight: bold;\">Note</p>\n",
    "<p class=\"last\">If you want a deeper overview regarding this dataset, you can refer to the\n",
    "Appendix - Datasets description section at the end of this MOOC.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "penguins = pd.read_csv(\"../datasets/penguins_regression.csv\")\n",
    "feature_name = \"Flipper Length (mm)\"\n",
    "target_name = \"Body Mass (g)\"\n",
    "data, target = penguins[[feature_name]], penguins[target_name]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "source": [
    "### Model definition"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def linear_model_flipper_mass(\n",
    "    flipper_length, weight_flipper_length, intercept_body_mass\n",
    "):\n",
    "    \"\"\"Linear model of the form y = a * x + b\"\"\"\n",
    "    body_mass = weight_flipper_length * flipper_length + intercept_body_mass\n",
    "    return body_mass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Main exercise\n",
    "\n",
    "Define a vector `weights = [...]` and a vector `intercepts = [...]` of the\n",
    "same length. Each pair of entries `(weights[i], intercepts[i])` tags a\n",
    "different model. Use these vectors along with the vector\n",
    "`flipper_length_range` to plot several linear models that could possibly fit\n",
    "our data. Use the above helper function to visualize both the models and the\n",
    "real samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "flipper_length_range = np.linspace(data.min(), data.max(), num=300)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solution\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "\n",
    "weights = [-40, 45, 90]\n",
    "intercepts = [15000, -5000, -14000]\n",
    "\n",
    "ax = sns.scatterplot(\n",
    "    data=penguins, x=feature_name, y=target_name, color=\"black\", alpha=0.5\n",
    ")\n",
    "\n",
    "label = \"{0:.2f} (g / mm) * flipper length + {1:.2f} (g)\"\n",
    "for weight, intercept in zip(weights, intercepts):\n",
    "    predicted_body_mass = linear_model_flipper_mass(\n",
    "        flipper_length_range, weight, intercept\n",
    "    )\n",
    "\n",
    "    ax.plot(\n",
    "        flipper_length_range,\n",
    "        predicted_body_mass,\n",
    "        label=label.format(weight, intercept),\n",
    "    )\n",
    "_ = ax.legend(loc=\"center left\", bbox_to_anchor=(-0.25, 1.25), ncol=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "lines_to_next_cell": 2
   },
   "source": [
    "In the previous question, you were asked to create several linear models. The\n",
    "visualization allowed you to qualitatively assess if a model was better than\n",
    "another.\n",
    "\n",
    "Now, you should come up with a quantitative measure which indicates the\n",
    "goodness of fit of each linear model and allows you to select the best model.\n",
    "Define a function `goodness_fit_measure(true_values, predictions)` that takes\n",
    "as inputs the true target values and the predictions and returns a single\n",
    "scalar as output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solution\n",
    "def goodness_fit_measure(true_values, predictions):\n",
    "    # we compute the error between the true values and the predictions of our\n",
    "    # model\n",
    "    errors = np.ravel(true_values) - np.ravel(predictions)\n",
    "    # We have several possible strategies to reduce all errors to a single value.\n",
    "    # Computing the mean error (sum divided by the number of element) might seem\n",
    "    # like a good solution. However, we have negative errors that will misleadingly\n",
    "    # reduce the mean error. Therefore, we can either square each\n",
    "    # error or take the absolute value: these metrics are known as mean\n",
    "    # squared error (MSE) and mean absolute error (MAE). Let's use the MAE here\n",
    "    # as an example.\n",
    "    return np.mean(np.abs(errors))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can now copy and paste the code below to show the goodness of fit for each\n",
    "model.\n",
    "\n",
    "```python\n",
    "for model_idx, (weight, intercept) in enumerate(zip(weights, intercepts)):\n",
    "    target_predicted = linear_model_flipper_mass(data, weight, intercept)\n",
    "    print(f\"Model #{model_idx}:\")\n",
    "    print(f\"{weight:.2f} (g / mm) * flipper length + {intercept:.2f} (g)\")\n",
    "    print(f\"Error: {goodness_fit_measure(target, target_predicted):.3f}\\n\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# solution\n",
    "for model_idx, (weight, intercept) in enumerate(zip(weights, intercepts)):\n",
    "    target_predicted = linear_model_flipper_mass(data, weight, intercept)\n",
    "    print(f\"Model #{model_idx}:\")\n",
    "    print(f\"{weight:.2f} (g / mm) * flipper length + {intercept:.2f} (g)\")\n",
    "    print(f\"Error: {goodness_fit_measure(target, target_predicted):.3f}\\n\")"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}