{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Introduction to Python for Data Sciences\n", "\n", "Franck Iutzeler\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Chap. 4 - Scikit Learn
\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1- Scikit Learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "Now that we explored data structures provided by the Pandas library, we will investigate how to learn over it using **Scikit-learn**.\n", "\n", "Scikit-learn is ont of the most celebrated and used machine learning library. It features a complete set of efficiently implemented machine learning algorithms for classification, regression, and clustering. Scikit-learn is designed to operate over Numpy, Scipy, and Pandas data structures. \n", "\n", "**Links:** [Scikit-learn webpage](http://scikit-learn.org) [Wikipedia article](https://en.wikipedia.org/wiki/Scikit-learn)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Machine Learning problems" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Machine learning* is the task of predicting properties out of some data. The *dataset* consists in several *examples* or *samples* and the associated target properties can be available, partially available, or not at all; we respectively call these setting *supervised*, *semi-supervised*, *unsupervised*. The examples are made out of one or several *features* or *attributes* that can be of different types (real number, discretes values, strings, booleans, etc.). \n", "\n", "Learning problems can be broadly divided in a few categories:\n", "* **supervised learning** \n", " * **classification:** Place incoming data into a finite number or classes by learning over labeled data. Example: Classifying iris into species based on recorded petal and sentil sizes from the 3 species. \n", " * **regression:** Predict a value from example data. 
Unlike classification, the output value is continuous. Example: Predict the carbon monoxide concentration for the coming years based on previous measurements.\n", "* **unsupervised learning**\n", " * **clustering:** Place the data (both new points and the existing dataset) into a finite number of classes. Unlike classification, no labeled data is provided. Example: Create market segments from customer information for targeted advertising.\n", " * **dimension reduction:** Discard uninformative features for the purpose of visualization or efficient storage. Example: Creation of eigenfaces in face recognition. \n", " \n", "\n", "The following flowchart can be found on the [Scikit Learn website](http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html):\n", "\n", "![Scikit Learn Algorithm cheatsheet](img/ml_map.png \"Scikit Learn Algorithm cheatsheet\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Learning with Scikit Learn\n", "\n", "The process of learning and predicting with Scikit Learn follows three main steps:
\n", "**1. Selecting and adjusting a model**
\n", "**2. Fitting the model to the data**
\n", "**3. Predicting from this fitted model**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will illustrate this process on a simple *linear model*\n", "$$ y = a x + b + \\nu$$\n", "where \n", "* $(x,y)\\in\\mathbb{R}^m\\times\\mathbb{R}^m$ are the data points. $x$ contains the examples and $y$ the associated outputs \n", "* $a,b$ are the model coefficients to estimate\n", "* $\\nu$ is a standard centered white Gaussian noise" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD8CAYAAABuHP8oAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAWWUlEQVR4nO3df5DcdX3H8deLyyHHj3pQr5lw4UqmOsmoKVzdIjYdC0GEiiNntCqjNrbMpJ2pLVKLhv4DbXUSGyr6h+M0CppOKeJADAxYkSGxjEwn7YVLIRAYKP7KGkgYcmqaiJfw7h/3vWSz2dv97t7++H7v+3zMZLL7vb3d9w748sPn+/58Po4IAQDy55ReFwAAaA0BDgA5RYADQE4R4ACQUwQ4AOQUAQ4AOZU6wG332Z6wfX/yfInt7bafs32X7VM7VyYAoFozI/DrJO2ueP45SbdGxOslHZB0bTsLAwDUlyrAbS+WdJWkrybPLWmlpLuTl2ySNNaB+gAAs1iQ8nVfkPQpSWclz39d0mREHEme75E0XOsXba+RtEaSzjjjjLcsW7as5WIBoIh27NjxUkQMVV9vGOC23y1pX0TssH1Jsx8cERslbZSkUqkU4+Pjzb4FABSa7R/Vup5mBL5C0ntsv0vSaZJ+TdIXJQ3aXpCMwhdLKrerWABAYw3nwCPixohYHBHnS/qQpK0R8WFJ2yS9P3nZakn3dqxKAMBJ5tIH/mlJf237OU3Pid/WnpIAAGmkvYkpSYqI70n6XvL4eUkXtb8kAEAarMQEgJxqagQOAGhsy0RZGx58Rj+dPKxzBwd0wxVLNTZas9N6TghwAGijLRNl3bj5CR2eOipJKk8e1o2bn5Cktoc4UygA0EYbHnzmWHjPODx1VBsefKbtn0WAA0Ab/XTycFPX54IAB4A2OndwoKnrc0GAA0Ab3XDFUg30951wbaC/TzdcsbTtn0WAA0AbjY0Oa92q5RoeHJAlDQ7067T+U3T9XTu1Yv1WbZlo364jBDgAtNnY6LAeXbtSt37wQr1y5FUdODSl0PGOlHaFOAEOAB3S6Y4U+sABoEq7FuJ0uiOFETgAVJhZiFOePHxs2uMTd+3U6N9/t+mpj053pDACB4AKtaY9JOnAoSnduPkJjf/oZW17ev9Jo/Nao/Ybrlh6wqpMqb0dKY6ItrxRGpzIAyDrlqx9QM2k4kB/n973lmHds6N8UlCvW7VckuY8HWN7R0SUqq8zAgeACucODqjcxBz14amjunP7T3S0ajA8c7Py0bUrO7KRlcQcOACcoNZCnEaqw3tGJ5bPV2IEDmDea6arZOb6zfc9qcnDU6nev8+uGeK
dWD5fiRE4gHmtVldJo8U0Y6PD2nnTO/WFD154bEVln13ztZZ0zVvP69ry+UoEOIB5rdXFNNWj9lohbUkfvnhEnxlbfsLy+eHBAa1btbxjc9/HPr9RF4rt0yQ9Iuk1mp5yuTsibrL9dUl/IOlnyUs/FhE7670XXSgAuq1eV8nwLNMp1YcySMe7TWq1EHbaXLpQXpG0MiIO2u6X9H3b/5787IaIuLudhQJAOw2e3q8Dh2rPZc92Ws5so/ZtT+/Xo2tXdq7YJjWcQolpB5On/cmf7jWPA0CLtkyUdfCXR+q+ptZ0SjcPZZiLVHPgtvts75S0T9JDEbE9+dFnbT9u+1bbr5nld9fYHrc9vn///vZUDQApbHjwGU292ni8WR3M3TyUYS5SBXhEHI2ICyUtlnSR7TdLulHSMkm/K+kcSZ+e5Xc3RkQpIkpDQ0PtqRoAUkg7Yq4O5m4eyjAXTfWBR8Sk7W2SroyIW5LLr9j+mqS/aXt1ANBAvR7vNKsqawXzzO+3Y0fCTmoY4LaHJE0l4T0g6XJJn7O9KCL22rakMUm7OlsqAJyouluk+qZkrc2k+k+xzjxtgSYPTdUN5rHR4cwFdrU0I/BFkjbZ7tP0lMs3I+J+21uTcLeknZL+vHNlAsDJ6vV4VwZw1kfSrWoY4BHxuKTRGtez00sDoJDSdIvkYSTdKlZiAsitvHSLdAoBDiC38tIt0insRgggt+b7HHcjBDiAXJvPc9yNMIUCADlFgANATjGFAiDTmjlNp2gIcACZ1WilZdExhQIgs1o9TacoCHAAmZWXfbl7hQAHkFlFX2nZCAEOILOKvtKyEW5iAsisoq+0bIQAB5BpRV5p2QgBDqCt6NvuHgIcQNvQt91d3MQE0Db0bXcXAQ6gbejb7q6GAW77NNv/Zft/bD9p+++S60tsb7f9nO27bJ/a+XIBZBl9292VZgT+iqSVEXGBpAslXWn7Ykmfk3RrRLxe0gFJ13asSgC5QN92dzUM8Jh2MHnan/wJSSsl3Z1c3yRprBMFAsiPsdFhrVu1XMODA7Kk4cEBrVu1nBuYHZKqC8V2n6Qdkl4v6UuS/lfSZEQcSV6yR1LNf0K210haI0kjIyNzrRdAxtG33T2pAjwijkq60PagpG9JWpb2AyJio6SNklQqlaKFGgFkBD3e2dJUH3hETNreJultkgZtL0hG4YsllTtRIIBsoMc7e9J0oQwlI2/ZHpB0uaTdkrZJen/ystWS7u1QjQAygB7v7EkzAl8kaVMyD36KpG9GxP22n5L0DdufkTQh6bYO1gmgx+jxzp6GAR4Rj0sarXH9eUkXdaIoANlz7uCAyjXCmh7v3mElJoBU6PHOHjazAgqsma4S9ubOHgIcKKhWukro8c4WplCAgqKrJP8IcKCg6CrJPwIcKCh2Dsw/AhwoqEuXDclV1+gqyRcCHCigLRNl3bOjrMrNiSzpfW/hJmWeEOBAAdW6gRmSHnh8b28KQksIcKCAZrtReeDQlLZMsC9dXhDgQI5tmShrxfqtWrL2Aa1YvzV1+Na7UUkbYX4Q4EBOzSzEKU8eVuj4Qpw0IV7vRiVthPlBgAM5dfN9T7a8EGdsdFiDA/01f0YbYX4Q4EAObZkoa/LwVM2fpR1B3/yeN7E5Vc6xFwqQQ/VG2WlH0GxOlX8EOJBD9UbZzYyg2Zwq3whwIIcGT+/XgUMnT6Gcffr0vPaK9VsZVRcAAQ7kzJaJsg7+8shJ1/v7rKt+exEHDxcINzGBnNnw4DOaejVOun7GqQu07en9bBFbIGlOpT/P9jbbT9l+0vZ1yfWbbZdt70z+vKvz5QKYbf77Z4en2CK2YNJMoRyR9MmIeMz2WZJ22H4o+dmtEXFL58oDUK3R4cIcPFwcDUfgEbE3Ih5LHv9C0m5JTKYBLWp1+fuMeocLc/BwsTR1E9P2+ZJGJW2XtELSx23/saRxTY/SD9T4nTWS1kjSyMjIXOsFcq2Vcyirpenfpre7GBxx8s2Qmi+0z5T
0H5I+GxGbbS+U9JKmd6H8B0mLIuJP671HqVSK8fHxOZYM5NeK9VtrTnEMDw7o0bUre1AR8sD2jogoVV9P1YViu1/SPZLuiIjNkhQRL0bE0Yh4VdJXJF3UzoKB+YibjGinNF0olnSbpN0R8fmK64sqXvZeSbvaXx4wv3AOJdopzQh8haSPSlpZ1TL4j7afsP24pEslXd/JQoH5gJuMaKeGNzEj4vvSSWefStK3218OML/VuwG5ZaLMzUc0haX0QJfV2kCqHd0pKB6W0gMZUOuQYZbAoxECHMgAulPQCgIcyAC6U9AKAhzIALpT0ApuYgIZwPFmaAUBDmQEx5uhWUyhAEBOMQIHJBbRIJcIcBQei2iQV0yhoPBYRIO8IsBReLX25653HcgKAhyFVu84sz7X2sMNyA4CHIU1M/c9m6MpT6sCeoUAR2HVmvuuNMwydmQcAY7CqrdRFMvYkQcEOAprto2i+mytW7WcFkJkHgGOwpptA6l/+sAFhDdyIc2hxufZ3mb7KdtP2r4uuX6O7YdsP5v8fXbnywXaZ2x0WOtWLdfw4ICs6TlvRt7IE0eDO+3J6fOLIuIx22dJ2iFpTNLHJL0cEettr5V0dkR8ut57lUqlGB8fb0vhAFAUtndERKn6eppDjfdK2ps8/oXt3ZKGJV0t6ZLkZZskfU9S3QAHOoF9TFBUTe2FYvt8SaOStktamIS7JL0gaWF7SwMaYx8TFFnqm5i2z5R0j6RPRMTPK38W0/MwNedibK+xPW57fP/+/XMqFqjGPiYoslQBbrtf0+F9R0RsTi6/mMyPz8yT76v1uxGxMSJKEVEaGhpqR83AMRwGjCJL04ViSbdJ2h0Rn6/40X2SViePV0u6t/3lAfVxGDCKLM0IfIWkj0paaXtn8uddktZLutz2s5LekTwHuorDgFFkabpQvi9ptm3ZLmtvOUBz6h0GTHcK5jtO5EHu1ToMmO4UFAEBjp7q1Ci5XncKAY75ggBHz3RylEx3CoqAzazQM53s4aY7BUVAgKNnOjlKpjsFRUCAo2c6OUpmp0EUAXPg6Jkbrlh6why41N5Rcq3uFGA+IcDRM/V6uAE0RoCjpxglA60jwJFZrKQE6iPAkUmspAQaowsFmcQ+30BjBDgyiZWUQGMEODJly0RZK9ZvrX28k6RTbG2ZKHe1JiCrmANHZlTPe9dyNIK5cCDBCByZUWveuxbmwoFpBDgyo5n5bebCAQIcGdLMHijsKggQ4MiQWjsI9p9i9fedeKIfuwoC09KcSn+77X22d1Vcu9l2ueqQY2BOau0guOGPLtCG91/AroJADY6YrWEreYH9dkkHJf1LRLw5uXazpIMRcUszH1YqlWJ8fLzFUgGgmGzviIhS9fU0p9I/Yvv8jlSF3GO/EqB35jIH/nHbjydTLGfP9iLba2yP2x7fv3//HD4OWTPTt12ePKzQ8f1KWGgDdEerAf5lSb8l6UJJeyX902wvjIiNEVGKiNLQ0FCLH4csYr8SoLdaCvCIeDEijkbEq5K+Iumi9paFPGC/EqC3Wgpw24sqnr5X0q7ZXov5i5Pfgd5qeBPT9p2SLpH0Ott7JN0k6RLbF0oKST+U9GedKxFZM3Pjsjx5WJZO2HiKHm2ge9J0oVxT4/JtHagFOVC94VRIx0J8mC4UoKvYjRBNqXXjcia8H127sjdFAQXFUno0hRuXQHYQ4GgKNy6B7CDA0ZRaG05x4xLoDebA0ZSZG5Qsnwd6jwBH08ZGhwlsIAOYQgGAnCLAASCnmEIpOLaDBfKLAC+w6lWVM9vBSiLEgRxgCqXA2A4WyDcCvMBYVQnkGwFeYKyqBPKNAC8wVlUC+cZNzAJjVSWQbwR4wbGqEsgvplAAIKcIcADIqYYBbvt22/ts76q4do7th2w/m/x9dmfLBABUSzMC/7qkK6uurZX0cES8QdLDyXMAQBc1DPCIeETSy1WXr5a0KXm8SdJYe8sCADTS6hz4wojYmzx+QdLCNtUDAEhpzm2
EERG2Y7af214jaY0kjYyMzPXj5h12AwTQqlZH4C/aXiRJyd/7ZnthRGyMiFJElIaGhlr8uPlpZjfA8uRhhY7vBrhlotzr0gDkQKsBfp+k1cnj1ZLubU85xcJugADmIk0b4Z2S/lPSUtt7bF8rab2ky20/K+kdyXM0qTzLrn+zXQeASg3nwCPimll+dFmba5nXas11A8BcsBdKF9Q7+QYAWsVS+i6Yba57Nn12p0sCMA8Q4F3Q7Ak317z1vA5VAmA+IcC7YLYTboYHB/SRi0eOjbj7bH3k4hF9Zmz5sddsmShrxfqtWrL2Aa1Yv5UWQwDHMAfeBTdcsfSEOXDp+Mk3Y6PDJwR2JU6NB1APAV6hU6siWz35pl6fOAEOgABPdHq028rJN5waD6Ae5sATaVdFdnNOmlPjAdRDgCfSjHa7vXcJp8YDqIcAT6QZ7XZ775Kx0WGtW7Vcw4MDsqa7VtatWs78NwBJzIEfU69TZEYv5qQ5NR7AbBiBJ9KMdpmTBpAljMArNBrtphmlA0C3EOBNaLWfGwA6gQBXcwt4mJMGkBWFD3CWqwPIq8LfxORYMwB5VfgAZ7k6gLwqfIDTGgggr+YU4LZ/aPsJ2zttj7erqG5iuTqAvGrHTcxLI+KlNrxPT9AaCCCvCt+FItEaCCCf5hrgIem7tkPSP0fExuoX2F4jaY0kjYyMzPHjOqNTBzkAQCfNNcB/PyLKtn9D0kO2n46IRypfkIT6RkkqlUoxx887pjJ0XzvQL1uaPDTVdADTBw4gr+Z0EzMiysnf+yR9S9JF7Siqkep9uScPT+nAoamW9uimDxxAXrUc4LbPsH3WzGNJ75S0q12F1VMrdCs1E8D0gQPIq7lMoSyU9C3bM+/zbxHxnbZU1UCacK0+SWe2Oe5zBwdUrvF+9IEDyLqWR+AR8XxEXJD8eVNEfLadhdWTJlxnXtPoGDT6wAHkVeZXYtY6RLhW6FaqDOBGc9wcWwYgrxzRtsaQhkqlUoyPp1+wWd0hIk2H87pVyyUpVRfKkrUPqNY3tKQfrL9qLl8HALrC9o6IKFVfz/RCnnqj50fXrkw1SmaOG8B8lekplHZ0iDDHDWC+ynSAt2OnQOa4AcxXmZ5Cadchwux1AmA+ynSA19op8NJlQ9rw4DO6/q6d7FsCoNAyHeDSiaNn9i0BgOMyPQdejX1LAOC4XAU4+5YAwHG5CnDOrwSA43IV4PR0A8Bxmb+JWYnzKwHguFwFuERPNwDMyNUUCgDgOAIcAHKKAAeAnCLAASCnCHAAyKmunshje7+kH7Xwq6+T9FKby8mLon73on5vqbjfvajfW2r83X8zIoaqL3Y1wFtle7zWcUJFUNTvXtTvLRX3uxf1e0utf3emUAAgpwhwAMipvAT4xl4X0ENF/e5F/d5Scb97Ub+31OJ3z8UcOADgZHkZgQMAqhDgAJBTmQ9w21fafsb2c7bX9rqebrB9nu1ttp+y/aTt63pdU7fZ7rM9Yfv+XtfSLbYHbd9t+2nbu22/rdc1dYvt65N/13fZvtP2ab2uqVNs3257n+1dFdfOsf2Q7WeTv89O816ZDnDbfZK+JOkPJb1R0jW239jbqrriiKRPRsQbJV0s6S8K8r0rXSdpd6+L6LIvSvpORCyTdIEK8v1tD0v6K0mliHizpD5JH+ptVR31dUlXVl1bK+nhiHiDpIeT5w1lOsAlXSTpuYh4PiJ+Jekbkq7ucU0dFxF7I+Kx5PEvNP0/5MJsgm57saSrJH2117V0i+3XSnq7pNskKSJ+FRGTPS2quxZIGrC9QNLpkn7a43o6JiIekfRy1eWrJW1KHm+SNJbmvbIe4MOSflLxfI8KFGSSZPt8SaOStve4lG76gqRPSXq1x3V00xJJ+yV9LZk6+qrtM3pdVDdERFnSLZJ+LGmvpJ9FxHd7W1XXLYyIvcnjFyQtTPNLWQ/wQrN9pqR7JH0iIn7e63q6wfa7Je2LiB29rqXLFkj6HUlfjohRSf+nlP8ZnXf
JfO/Vmv4/sXMlnWH7I72tqndiurc7VX931gO8LOm8iueLk2vznu1+TYf3HRGxudf1dNEKSe+x/UNNT5mttP2vvS2pK/ZI2hMRM/+ldbemA70I3iHpBxGxPyKmJG2W9Hs9rqnbXrS9SJKSv/el+aWsB/h/S3qD7SW2T9X0jY37elxTx9m2pudCd0fE53tdTzdFxI0RsTgiztf0P++tETHvR2MR8YKkn9hemly6TNJTPSypm34s6WLbpyf/7l+mgtzArXCfpNXJ49WS7k3zS5k+1Dgijtj+uKQHNX1n+vaIeLLHZXXDCkkflfSE7Z3Jtb+NiG/3riR0wV9KuiMZrDwv6U96XE9XRMR223dLekzTHVgTmsfL6m3fKekSSa+zvUfSTZLWS/qm7Ws1veX2B1K9F0vpASCfsj6FAgCYBQEOADlFgANAThHgAJBTBDgA5BQBDgA5RYADQE79P8YZtRCd8beXAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "a = np.random.randn()*5 # Randomly draw the slope\n", "b = np.random.rand()*10 # Randomly draw the intercept (value at the origin)\n", "\n", "m = 50 # Number of points\n", "\n", "x = np.random.rand(m,1)*10 # Randomly draw the abscissas\n", "y = a*x + b + np.random.randn(m,1) # y = a*x + b + noise\n", "\n", "plt.scatter(x, y)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### 1. Selecting and adjusting a model\n", "\n", "As we want to fit a linear model $y=ax+b$ through the data, we will import the `LinearRegression` class from Scikit Learn with `from sklearn.linear_model import LinearRegression`.\n", "\n", "As our model takes a nonzero value at the origin, it needs an *intercept*. This can be set, along with several other parameters; see Scikit Learn's [linear_model documentation](http://Scikit-Learn.org/stable/modules/linear_model.html)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LinearRegression()\n" ] } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "model = LinearRegression(fit_intercept=True)\n", "print(model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This completes our model selection. Notice that we have only described our model; no learning or fitting has been done yet." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Fitting the model to the data\n", "\n", "\n", "Fitting our model to the data $(x,y)$ is done using the `fit` method." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression()" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.fit(x,y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the model is fitted, one can observe the learned coefficients:\n", "* `coef_` for the model coefficients ($a$ here)\n", "* `intercept_` for the intercept ($b$ here)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Learned coefficients: a = 3.637057 \t b = 2.559807\n", "True coefficients: a = 3.674435 \t b = 2.385000\n" ] } ], "source": [ "# .item() extracts the scalar from a size-1 array\n", "print(\"Learned coefficients: a = {:.6f} \\t b = {:.6f}\".format(model.coef_.item(), model.intercept_.item()))\n", "print(\"True coefficients: a = {:.6f} \\t b = {:.6f}\".format(a,b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Predicting from this fitted model\n", "\n", "Given a feature matrix, the `predict` method returns the outputs predicted by the fitted model. 
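As a minimal sketch of this step, the block below fits a `LinearRegression` on noiseless points along y = 2x + 1 and then calls `predict` on a 2D feature matrix with one row per example. The toy data and the names `x_toy`, `y_toy`, `toy_model` and `x_new` are hypothetical and only serve the illustration; they are not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data (not the notebook's random sample): y = 2*x + 1 without noise
x_toy = np.arange(5, dtype=float).reshape(-1, 1)  # feature matrix of shape (5, 1)
y_toy = 2.0 * x_toy + 1.0                         # targets of shape (5, 1)

# Steps 1 and 2: select the model, then fit it
toy_model = LinearRegression(fit_intercept=True)
toy_model.fit(x_toy, y_toy)

# Step 3: predict. The input is 2D, one row per example, even for a few points
x_new = np.array([[10.0], [20.0]])
y_new = toy_model.predict(x_new)

print(y_new.shape)    # one predicted row per input row
print(y_new.ravel())  # close to 2*x + 1 because the toy data is noiseless
```

Passing a 1D array such as `np.array([10.0, 20.0])` to `predict` raises an error, which is presumably why the notebook reshapes `xFit` with `reshape(-1, 1)`.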
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "xFit = np.linspace(-2,12,21).reshape(-1, 1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "yFit = model.predict(xFit)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXAAAAD4CAYAAAD1jb0+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAoI0lEQVR4nO3deZzO5f7H8ddlDEZobMmS5UT2ZWpIBifJ1mbSIsc5CFHpZ+soUlEppFOprEWpY0nSUNmXsrVYBmNJUeoYRDIhg1mu3x/fGc0+w9z3fO975v18PHqY+57vfd+f8XDe55rre12fy1hrERER/1PI7QJEROTyKMBFRPyUAlxExE8pwEVE/JQCXETETxXOyw8rV66crV69el5+pIiI39u6detv1tryaZ/P0wCvXr06W7ZsycuPFBHxe8aYnzN6XlMoIiJ+SgEuIuKnFOAiIn4qT+fAMxIXF8ehQ4c4d+6c26W4qlixYlSpUoXAwEC3SxERP+F6gB86dIiSJUtSvXp1jDFul+MKay0nTpzg0KFD1KhRw+1yRMRPuD6Fcu7cOcqWLVtgwxvAGEPZsmUL/G8hInJpXA9woECHdzL9HYjIpfKJABcRybd++QUGD4b4eI+/tQLcw6pXr85vv/2W62tExM8lJsLkyVC/Prz9Nmzf7vGPUICLiHja99/DzTfDgAFw002wezeEhnr8YxTgwMGDB6lTpw69evXiuuuuo3v37qxatYqwsDBq1arFt99+y++//054eDiNGjWiefPm7Ny5E4ATJ07Qvn176tevT9++fUl5wtF///tfmjVrRpMmTejfvz8JCQlu/Ygikhfi42H8eGjUCKKi4N13Yfly8FIPKNeXEaYyeLDnf81o0gRefz3by/bv389HH33EzJkzadq0KXPmzGHDhg0sXryYl156iWuuuYaQkBAiIiJYs2YNPXr0YPv27Tz33HO0bNmSZ599ls8//5wZM2YAsHfvXj788EM2btxIYGAgjz76KLNnz6ZHjx6e/flExDds3w59+sC2bdClC0yaBFdf7dWP9K0Ad1GNGjVo2LAhAPXr16dt27YYY2jYsCEHDx7k559/5uOPPwbglltu4cSJE5w6dYp169axcOFCAG6//XZKly4NwOrVq9m6dStNmzYFIDY2lquuusqFn0xEvOrcOXjhBWfkXa4cLFgA99yTJx/tWwGeg5GytxQtWvTi14UKFbr4uFChQsTHx1/yDklrLT179mTs2LEerVNEfMimTc6o+7vvoGdPePVVKFMmzz5ec+A51KpVK2bPng3AF198Qbly5ShVqhStW7dmzpw5ACxdupSTJ08C0LZtWxYsWMCxY8cA+P333/n55ww7QoqIvzlzBgYOhJYt4exZWLYM3nsvT8MbfG0E7sNGjx5N7969adSoEcWLF2fWrFkAjBo1im7dulG/fn1atGhB1apVAahXrx5jxoyhffv2JCYmEhgYyKRJk6hWrZqbP4aI5NaKFdCvn7O++7HH4MUXoWRJV0oxKVdNeFtoaKh
Ne6DD3r17qVu3bp7V4Mv0dyHiw06ehKFDnZF27dowYwaEheXJRxtjtlpr061D1BSKiEh2Fi6EevXggw/gqaecFSd5FN5Z0RSKiEhmjh51pkk+/hhCQmDpUmdpso/QCFxEJC1rnamSevXgs89g7Fj45hufCm/QCFxEJLWDB6F/f+dmZcuW8M47zpy3D9IIXEQEnOZTb74JDRo467snTYIvv/TZ8AaNwEVEnI04ffvCxo3QsSNMnQp+sORXI/A0Ro8ezSuvvJLp9yMiItizZ08eViQiXhMXBy+9BI0bw9698P77sGSJX4Q3KMAvmQJcJJ/Ytg2aNYORI6FzZ9izB/71L/Cj07H8LsAjIqMJG7eGGsM/J2zcGiIio3P9ni+++CLXXXcdLVu2ZN++fQC8/fbbNG3alMaNG3PPPfdw9uxZNm3axOLFixk2bBhNmjThwIEDGV4nIj4sNhZGjHDC++hRZ433/PlQoYLblV0yvwrwiMhoRiyMIjomFgtEx8QyYmFUrkJ869atzJs3j+3bt7NkyRI2b94MQJcuXdi8eTM7duygbt26zJgxgxYtWnDXXXcxYcIEtm/fzrXXXpvhdSLio9avd5YCjhsHvXo5o+6773a7qsvmVwE+Yfk+YuNSH4oQG5fAhOX7Lvs9169fz913303x4sUpVaoUd911FwC7du2iVatWNGzYkNmzZ7N79+4MX5/T60TERadPO6fjtG4NFy7AypXO8sCk9s/+yq8C/HBM7CU9nxu9evXirbfeIioqilGjRnHu3LlcXSciLlm61DmXcsoUGDQIdu2CW291uyqPyHGAG2MCjDGRxpjPkh7XMMZ8Y4zZb4z50BhTxHtlOioFB13S8znRunVrIiIiiI2N5fTp03z66acAnD59mooVKxIXF3exjSxAyZIlOX369MXHmV0nIi47cQJ69IDbboMSJZwlgq+/Dldc4XZlHnMpI/BBwN4Uj8cDr1lrawIngT6eLCwjwzrUJigwINVzQYEBDOtw+Qvtr7/+erp27Urjxo3p1KnTxRN0XnjhBW688UbCwsKoU6fOxesfeOABJkyYQEhICAcOHMj0OhFxibXw0UfONvi5c+GZZyAy0jlcOJ/JUTtZY0wVYBbwIjAUuBM4DlxtrY03xtwEjLbWdsjqfTzRTjYiMpoJy/dxOCaWSsFBDOtQm/CQyjl+vS9TO1mRXDp82JnrjoiAG26AmTOdA4b9XGbtZHO6E/N14AkguWt5WSDGWhuf9PgQkGGKGmP6Af2Ai4cd5EZ4SOV8E9gi4iHWOmH9+ONw/jy8/DIMGQKF8/dm82ynUIwxdwDHrLVbL+cDrLXTrbWh1trQ8uXLX85biIhk7scfoV07Zyt848awcycMG5bvwxtyNgIPA+4yxtwGFANKAROBYGNM4aRReBXgshdjW2sxfrT7yRvy8mQkkXwhIcFpPjVyJAQEOKtM+vWDQn61uC5Xsv1JrbUjrLVVrLXVgQeANdba7sBa4N6ky3oCiy6ngGLFinHixIkCHWDWWk6cOEGxYsXcLkXEP+zZ47R6HTIE2rSB3bvh4YcLVHhD7roRPgnMM8aMASKBy9qCWKVKFQ4dOsTx48dzUYr/K1asGFWqVHG7DBGfkOlihQsXYPx4eOEFKFUKZs+Gbt38qn+JJ7l+qLGISErJLTNS7roOCgxgSp1Ebh4/HKKi4IEH4I03oIDcV8vtKhQRkTyRtmVGsbhzDFo7h1YvRkDFq2HRIkhqeVHQKcBFxKekbI1x4y9RjFv2BjVOHmFO447848t5cOWVLlbnWxTgIuJTKgUHcerX3xj+xbt0376Mg8EV6fbAS3xbvTHFfzxDeIgCPJkCXER8yivFD1FjxlDKnznJ9KZ382qr7pwLLAbWMmJhFIA28yVRgItInsi2Dcbx4zB4MDfNmcMfNWtz790jiax4Xar3SG4frQB3FKxFkyLiiowOYxn84XZCnl9BxLZDMG+e03zqo49g9Giu3L2T7WnCO5k32kf
7K43ARcTrMjqMBaDo0SOU6PoM7P+W3VXqsPrpaXwYG8zhZ1dSyBgSMljmnJv20fmNAlxEvC7tqNnYRB7YsYIRa2cSmJjAC7f05d0b7iTxSADgXJtReOe2fXR+owAXEa+rFBxEdFKIVzt5mHHL3uSmX6LYVLURwzv+H7+UrpjpawOMIdHafNc+2hMU4CLiEVndpBzWoTYjF2yn21cLeXz9bOIKBfBkx//jw0bts90Gn2gtP427PS9+BL+jABeRXEu7/T06JjbVkr/wwJOEzh9Olf27WFmzGU+3f5RfS5bL0XtrzjtzCnARybXnPt2d7iZlbFwCI+Zsoc7U/3DdzLcIKlKcx+56gs/qtMpx8ynNeWdNAS4iuRIRGc3Js3Hpnm9yeB/jl06k9m+/8Fmjtjzz996cLJ75LkoDtLi2DAdPxObLIxO9QQEuIrkyYfm+VI+DLpzj8fUf0HvLYo6WLMuD945i7bVNM329AYX1ZVKAi8glSXuzMjrFEsGbft7BuGVvUi3mKB+E3Mb4v/fiTNHimb5X5eAgNg6/JS/KzpcU4CKSYxndrDRAyXNnGLF2Jt12ruDH0pXo2m0s31RtePF1pYsHci4uMV2Pb81v544CXERyLKMdlbf+8DVjVkym3J8xTL3xHl4L+wfnA4te/H5QYACj7qx/8fWa3/YcBbiI5FjKHZVl/4xh9Kpp3PndevaWr85TPcewpkRVrgwKpLiBmLNx6YJage1ZCnARybFKwUFEnzxL+J4vGLVqOsXjYnml1T9Z3OFfrBvZ3u3yChwFuIjk2LONS1Js4HD+vn8zWyvV4clOA4muWIOxt9V3u7QCSQEuItlLTIRp0+jw5JPEx8Xz+h2P8mbdDlxdpgRjNZftGgW4iGQoeblgkR/38+qqSYT8tBNuvZXC06czuEYNBrtdoCjARSS9iMhonl6wne6bFjBkwxzOBwTy1B1DaPbcUMJrVHG7PEmiABeRdD55bwlz5r9Mo6P7WV6rOU+3f5TjJcrw5YrvCb9eAe4rFOAi8pfz52HMGN55aywxxUryaOfhLKkddrH5lI4z8y0KcBFxfPUV9OkDe/eyKqQdI1o9SExQqVSXqLWrb9GhxiIF3ZkzMHgwhIU5Xy9ZwvkZ73K+VOlUl2nru+/RCFwkn8vqpBxWroR+/eDgQRgwAMaOhZIlCU96rba++zYFuEg+ltlJOYGnYrj9/Vdh5kyoVQvWrYNWrVK9NjyksgLbxynARfKxjJpPtd69nhsnToWzf8Dw4fDssxCkuW1/lG2AG2OKAeuAoknXL7DWjjLG1ADmAWWBrcC/rLUXvFmsiFyalKtGyv15ktErp3HHvg3suaoG5daugBtucLE6ya2c3MQ8D9xirW0MNAE6GmOaA+OB16y1NYGTQB+vVSkil6VScBBYS5ddq1n1ziO02/81L7fuwcMDpyq884FsR+DWWgucSXoYmPSfBW4B/pH0/CxgNDDF8yWKyOV6tlEJggY+SesDW9hSuS5PdhrI4aurq/lUPpGjOXBjTADONElNYBJwAIix1sYnXXIIyPBuhzGmH9APoGrVqrmtV0QykXK1SeVSRZl06hs6TBpHfEIir935GG/WbU/F0leo+VQ+kqMAt9YmAE2MMcHAJ0CdnH6AtXY6MB0gNDTUXkaNIpKNlKtN/nbiEONmv0HjQ3s41rw1V82dxZDq1RnidpHicZe0CsVaG2OMWQvcBAQbYwonjcKrANHeKFBEsjdh+T7izp3nkc2fMHjDHGIDi/L4bUP4uuXtbKxe3e3yxEtysgqlPBCXFN5BQDucG5hrgXtxVqL0BBZ5s1ARyVzwvl1MW/oGDX49wJLrWjCq3SMcL1Ea88c5t0sTL8rJCLwiMCtpHrwQMN9a+5kxZg8wzxgzBogEZnixTpECL8MdlXXLwgsvsOj9cZwMKsXD4SNYVjvs4mvUuyR/y8kqlJ1ASAbP/wg080ZRIpJaRjsqP5z4IW2/nELJg/uJvvN+ul53L0cLF7/
4GvUuyf/UzErED6TcUVn8QiyjVk1j9qx/cybmNCxfTrXFHzK8ewsqBwdhgMrBQYzt0lCrTfI5baUX8QPRSTsqW/20jbHL3qLSqePMuuEOXmndg93tndPg1buk4FGAi/i4iMhogmNPM3LNDO7btYr9ZapwX/fxbK1Sj8qa4y7QFOAiPm7zf95mZcRESp89xZs3deWtFl05X7gIBjTHXcApwEVckmWfboAjR+Cxx3hx4UJ2VbiWnvc9z54Kf7v4bQuaMingFOAiLsisTzdAeJNKMGsWDBkCsbGMv7kX05veTUKhgFTvEZB0TqUUXFqFIuKC0Yt3p+vTHRuXwAfzvoQOHeDBB6FhQ9ixgyk33psuvAESrDpTFHQagYvksYjIaGJi41I9VygxgR7bPmfYuvehaGGYNAkefhgKFaJycPTFVSgp6QamaAQukscmLN+X6vG1v/2P+XOGM3r1dLZcU5+wf71B2Kk6ROw4Ajg3KoMCU4/AtUlHQCNwkTyXfEpO4YR4+n/zMQM3zeVsYBBDbh/KJ/XbgDGQck486UalDhiWtBTgInksuHgglX7cy4SlE6l37Cc+q92S0e3689sVpVNdFxuXwITl+y5u0FFgS1oKcJG8FBvL/62YQY9NCzhxRTD97h7JiutuyvTywxnMfYskU4CLeEC2a7oB1q2Dvn3p/cMPzGvUnpfa9OZUsRJZvq+6CUpWFOAiuZTlmu6QynDqFIwYAZMnQ40aDOwzgcXl6qZ7H4OzOSeZblRKdrQKRSSXUnYKTJY8f83SpdCgAUyZAoMHQ1QUtwzoluGqku7Nq6qboFwSjcBFcimjeerSZ//g35/9B0ashXr1YNMmaN4cgPCQKwCtKpHcU4CL5FKl4KC/NtpYy+3fbeC5VVO58twZJrboxiedejC46DWEp3iNVpWIJyjARXJpWIfajFgYRcnfjzFm5RTa//A1O66uxT+7juG7q2rAmYR0a7pFPEEBLpJL4U0qUS1iLjVnjiYwPo6xbXrzTmjnVP1LUq7pFvEU3cQUyY0ff4R27Qh5fhglbwyl2N7dTG/WJcPmU1rTLZ6mABe5HAkJ8NprTsfAb7+FqVNhzRqoWTPTtdta0y2epgAXuVS7d0NYGAwdCm3awJ490L8/FHL+56TmU5JXFOAiOXXhAjz/PISEwIEDMGcOfPopVKmS6rLwkMqM7dJQa7rF63QTUwqsHG1/T7Z5M/TpA1FR0K0bTJwI5ctn+t5aJih5QSNwKZAiIqMZ9tEOomNisTjb34d9tIOIyOjUF549C8OGOZtwfv8dFi92Rt5ZhLdIXlGAS4ETERnNkPnbiUtMfSRZXKJl9OLdfz3xxRfQqBG88gr07evMfd95Z94WK5IFBbgUKMmNpzI7TjImNg7++MO5KdmmjfPkmjUwbRpceWXeFSqSA5oDlwIlo8ZTKd2y/1uo3x+OHIHHH3duWhYvnocViuScAlwKlMw205Q5+wejVk2n894vne6BCxdCs2Z5XJ3IpdEUihQo6TbTWMtde75k5TuP0GnfRvb2Hwpbtyq8xS9kG+DGmGuMMWuNMXuMMbuNMYOSni9jjFlpjPkh6c/S2b2XiNtSbrK5+tRvvPPx87zx6QQOl6nI+nnLqDv1P1CkiMtViuRMTqZQ4oHHrbXbjDElga3GmJVAL2C1tXacMWY4MBx40nulimQvu7Xd4SGVITGR78e8yiNLphFoE4ka+iwNX34WAtL3LxHxZdkGuLX2CHAk6evTxpi9QGWgM3Bz0mWzgC9QgIuLsj3aDGD/fsL//ZCzRLBNG3j7bRpee61LFYvkziXNgRtjqgMhwDdAhaRwBzgKVMjkNf2MMVuMMVuOHz+em1pFspTl0Wbx8c567oYNYds2ePttWL0aFN7ix3Ic4MaYEsDHwGBr7amU37PWWlKfx5rye9OttaHW2tDy2r0mXpTZCpOSP+yFFi2cHZXt2jnNp/r2BWPyuEIRz8pRgBtjAnHCe7a1dmHS078aYyomfb8
icMw7JYrkTNoVJkXi4xiyfjafvjeIk3t+YPPYybBoEVRWjxLJH3KyCsUAM4C91tpXU3xrMdAz6euewCLPlycFWURkNGHj1lBj+OeEjVuTvk9JGilXmDQ5vI9PZw1i0Ka5fFq3Nbf0nkSPP2sQsf1wXpQukidysgolDPgXEGWM2Z703FPAOGC+MaYP8DNwv1cqlAIpRzck0wgPqUxA7Fn+fGIE929cyNGSZXnw3lGsvbapc4GONZN8JierUDYAmU0WtvVsOSKOrG5IZhrAa9Zw50MPwY8/8kHIbYz/ey/OFE29DV7Hmkl+op2Y4pMyC9oMn4+JgYcegrZtnVNxvviCqfc/ni68QceaSf6iABeflONzJRcvhvr1YeZMZ5XJzp3w97/rWDMpEBTg4pOyDeBjx+CBB6BzZyhXDr75Bl5+GYKcgNexZlIQqBuh+KTkoE23Lb5JJbaMeYNaLz1NsQuxzGr3IBXGPEvn0OoZvocCW/IzBbj4rHQB/L//cbT1rYRuWMO2SrV5otMg9perStCn32EDAxXWUuBoCkV8X2IiTJkC9etz5TebGN22H/d2f5n95aoCKbbLixQwGoGLb/v+e2eFybp1bK55PUPaPsqh4KvTXRat5YFSAGkELr4pPt65Kdm4MRcid/DUHUO4r8tzGYY3OBsVstupKZLfaAQuvmfHDujd2+kaePfd3FunGzsTsz6X0oJ2WUqBoxG4+I7z5+GZZyA0FA4dgo8+go8/Jiqb8E6mXZZS0CjAxTd89RWEhMCYMdC9u9Py9d57wZgc757ULkspaBTg4q4zZ2DwYAgLgz//hKVL4b33oGzZi5dktKknLe2ylIJIc+CS55LPrfxb5CbGr5xEpZNH4bHH4KWXoGTJdNdntKmnTZ3yrP3ueKZnX4oUBApwyVMRkdGMnb2Jx1dM5/6oVRwoU4XuPSZwX+9uhGcQ3sm0q1IkPQW45KlvXp3Bp5+8TpmzfzCp+X28EdaN84WLcFArSEQumQJc8sTSlZEUHTqYsbvWsfuqv/HgvaPYfXXNi9/XChKRS6cAF++ylq0vTOSmsc8QFHeel1v3YHqzLsQHpP6npxUkIpdOAS7e8/PP0L8/NyxfzubK9Rje6f84UPaadJdpBYnI5VGAi+clJsLkyTB8OACjbu3P+9ffjjXpV61W1goSkcumABfP2rcP+vSBjRuhQweYNo1Vcw9gM5jjrhwcxMbht7hQpEj+oI084hlxcTB2LDRu7OyifO89Z1NOtWo63kzESzQCl9yLjHSaT23fDvfcA2+9BVf/1TUw09N1NG0ikivGWptnHxYaGmq3bNmSZ58nXnbuHDz3HEyYwLngMjzfcQBzq4QqoEU8zBiz1VobmvZ5jcDl8mzY4Mx1f/89P9/Vlftr3cOvhZ2ugdExsYxYGAWgEBfxIs2By6U5fdrpW9KqFVy4AMuX84+b+l0M72Q65kzE+xTgknPLl0ODBs4SwYEDISoK2rfPdBeldleKeJcCXLL3++/Qsyd07AjFizvTJxMnQokSQOa7KLW7UsS7FOCStQULoG5dmDMHRo50Vpy0aJHqEi0TFHGHbmJKxo4cgQED4JNP4PrrnemTJk0yvFTLBEXcoQCX1Kx1NuEMHeosExw/3vm6cNb/VNSvWyTvZTuFYoyZaYw5ZozZleK5MsaYlcaYH5L+LO3dMiVP/PQTtG/vbMpp2NA5Hf6JJ7INbxFxR07mwN8DOqZ5bjiw2lpbC1id9Fj8VUICvPGGs8Lk66+dVSZffAHXXed2ZSKShWyHVtbadcaY6mme7gzcnPT1LOAL4ElPFiael3wWZap56mKnnA05X30FnTrB1KlQtarbpYpIDlzu78YVrLVHkr4+ClTI7EJjTD+gH0BVBYNrIiKjGbEwiti4BAB+PXGag0OfImHjPAJKlYQPPoDu3cEYlysVkZzK9eSmtdYaYzJtqGKtnQ5MB6cXSm4/Ty7PhOX7LoZ3g6P7mbDkdeoeP8iqRjdz68oP4aq
rXK5QRC7V5Qb4r8aYitbaI8aYisAxTxYlnpFyysQCRePOM3jjXB76diEnrgjmoS5Ps7JWcw4qvEX80uUG+GKgJzAu6c9FHqtIPCLtlEmz/+1i3NI3+NvJw8xt1J6xbXpzqlgJl6sUkdzINsCNMXNxbliWM8YcAkbhBPd8Y0wf4Gfgfm8WKZcuecqkxPmzPPHlLHpEfs4vV1bgH13HsKl6E7fLExEPyMkqlG6ZfKuth2sRDzocE8vNB7bw4vJJXH3mBG83DefVlv8ktkixVNcFGJPx6hRtyhHxedqh4cMuO1hPnGDyiol0ilzJ92Wrcs8/J7C9UsZ9SZr/rXSqqRb18hbxH2pm5aOS57Cjk25AJgdrRGR05i+yFubPh7p16RC1lrdadeeOXhMvhnch89cqwQBj+Gfzqhw8EXsxvJOpl7eIf9AI3EelXPaXLGWwphuZVzDw6KOwaBGEhlJo1SqqJJSlfDYj+BrDP8/w89XLW8T3KcB9VGYBmjwSvzjlcfIsW555hdu/nElg/AWYMAEGD4bChQkn+2mQSsFBRGfwWerlLeL7NIXiozIL0ABjLob3NTFHmf3hSMZ8/jpR5Ws4J+T8+9+X1HxKvbxF/JcC3EdlFqwJ1lIoMYHemxexfOYAGh35gac6DODe+16AmjUv+XPCQyoztktDKgcHYYDKwUGM7dJQNzBF/ICmUHxUZockLHh/OY9/+DIhR/ax+tqmjGw/gKOlylE5F1Me6uUt4p8U4D4o7fLB17o2Ibx+eRg3jrveGkNMkSAG3jmMxXVbgzGa8hApoBTgPibtFvjomFhmv7WANuumcuX+7yjUrRtf9x3O1m9/w2jjjUiBpgD3MSmXDxaLO8eQDXPouzmCEyXLwOLFcOed3Abcdou7dYqI+xTgPiZ5+WDzX3Yydtmb1Dh5hNlNOjL+5gfZeeedLlcnIr5EAe5jahVLoFfEFP6xYxkHgyvS7YGX+Kpao1zdpBSR/EkB7ks+/ZRFU/pT5PgxpjXrwmst/8G5wGIYoE2d8m5XJyI+RgHuZckrSqJjYgkwhgRrqZz2xuPx4zBoEMydS1CDBkx5/D+8/PuVJB9fZIGPt0YTWq2MblaKyEXayONFKRtSASRYJ5IvNqbadgjmzIG6dWHBAnjuOdi6lf8mXk3as+fUYEpE0lKAe1FGDamSXXniV8p1vw+6d2d38ato/6/XCSvSkojdxzPtg6IGUyKSkqZQPCztOZRpGZtItx3LGbF2JoUTExnXrh/Tm9xOYqEASBqZBxcP5OTZuHSvVYMpEUlJAe5BaTfhpFXt5GHGLXuTm36JYmO1RozsNJCDV16d6prYuASKFi5EUGBAqvfRbksRSUsB7kGZTZkEJDWfenzDf7lQqDBPdvw/Ft/Qidj4xAzf54/YOF7r2kTHnIlIlhTgHpTRHHWdYz8xfukbND76Aytr3sjT7R+h8DXXMLZD7YurU9KqFBykBlMiki0FuAelPByhSHwcA76az6Nfz+dMUEmYN492999Pu+QzzZKknXLRVImI5JRWoXhQcg/vkOjv+Oy9QQzaNJfP6rbmlt6TCPupPBHbD6e6Xr24RSQ3NAL3oPDrgmnw00L+NucdjpYoy4P3jmLttU0BOJnJae+aKhGRy6UA95TVq+Ghh6j500/wyCP0vKoTP5xL/QtO8mYcBbaIeIKmUHIrJgb69oVbb3XOovzyS5g8mf3nMv6r1WYcEfEUBXhuLFoE9erBu+/CE0/Ajh3QujWQ+aYbbcYREU9RgF+OX3+Frl0hPBzKl4dvvoHx4yHor3DWae8i4m2aA78U1sLs2U7nwDNnYMwYZ+QdGJju0swOJdb8t4h4igI8p375BR5+GJYuhZtughkznC6CWdAKExHxplxNoRhjOhpj9hlj9htjhnuqKJ+SmAiTJ0P9+s4Nytdfh/Xrsw1vERFvu+wRuDEmAJgEtAMOAZuNMYuttXs8VZzrvv/eWWGyfj20awfTpkGNGm5XJSIC5G4E3gz
Yb6390Vp7AZgHdPZMWS6Lj3duSjZqBFFRMHMmLF+u8BYRn5KbOfDKwP9SPD4E3Ji7cnzAjh3Quzds2wZ33w2TJkHFim5XJSKSjteXERpj+hljthhjthw/ftzbH3f5zp2Dp5+G0FCIjnaOOFu4UOEtIj4rNwEeDVyT4nGVpOdSsdZOt9aGWmtDy5f30ZPVN22CkBB48UXo3h327IF77nG7KhGRLOUmwDcDtYwxNYwxRYAHgMWeKSuPnDkDAwdCy5Zw9iwsWwbvvQdlyrhdmYhIti57DtxaG2+MeQxYDgQAM621uz1WmbetWAH9+jnruwcMgJdegpIl3a5KRCTHcrWRx1q7BFjioVryxsmTMHSoM9KuXdtZIhgW5nZVIiKXrGD1Qlm40Gk+9cEH8NRTsH27wltE/FbB2Ep/9Cg89hh8/LFzs3LpUmjSxO2qRERyJX+PwK2FWbOcUfdnn8HYsU7nQIW3iOQD+XcEfvAg9O/v3KwMC3OaT9VWK1cRyT/y3wg8MRHefBMaNHDWd7/1Fqxbp/AWkXwnf43Av/vOaT61cSN07AhTp0K1am5XJSLiFfljBB4X56zjbtwY9u6F99+HJUsU3iKSr/n/CHzbNujTx1kSeN99zvRJhQpuVyUi4nX+OwKPjYURI6BZM2eZ4MKFMH++wltECgz/HIFv2OCMur//3mn9+sorULq021WJiOQp/xqBnz7tbMhp1QouXICVK53lgQpvESmA/CfAly1zlgZOnuycCh8VBbfe6nZVIiKu8Y8plP79Yfp05yDhjRudU+FFRAo4/xiB16zpnJYTGanwFhFJ4h8j8GHD3K5ARMTn+McIXERE0lGAi4j4KQW4iIifUoCLiPgpBbiIiJ9SgIuI+CkFuIiIn1KAi4j4KWOtzbsPM+Y48PNlvrwc8JsHy/E2f6pXtXqPP9XrT7WCf9Wb21qrWWvLp30yTwM8N4wxW6y1oW7XkVP+VK9q9R5/qtefagX/qtdbtWoKRUTETynARUT8lD8F+HS3C7hE/lSvavUef6rXn2oF/6rXK7X6zRy4iIik5k8jcBERSUEBLiLip/wqwI0xE4wx3xljdhpjPjHGBLtdU1rGmI7GmH3GmP3GmOFu15MVY8w1xpi1xpg9xpjdxphBbteUHWNMgDEm0hjzmdu1ZMcYE2yMWZD0b3avMcZnj5MyxgxJ+jewyxgz1xhTzO2aUjLGzDTGHDPG7ErxXBljzEpjzA9Jf/rE6eaZ1OqV7PKrAAdWAg2stY2A74ERLteTijEmAJgEdALqAd2MMfXcrSpL8cDj1tp6QHNggI/XCzAI2Ot2ETk0EVhmra0DNMZH6zbGVAYGAqHW2gZAAPCAu1Wl8x7QMc1zw4HV1tpawOqkx77gPdLX6pXs8qsAt9ausNbGJz38GqjiZj0ZaAbst9b+aK29AMwDOrtcU6astUestduSvj6NEzCV3a0qc8aYKsDtwDtu15IdY8yVQGtgBoC19oK1NsbVorJWGAgyxhQGigOHXa4nFWvtOuD3NE93BmYlfT0LCM/LmjKTUa3eyi6/CvA0egNL3S4ijcrA/1I8PoQPB2JKxpjqQAjwjculZOV14Akg0eU6cqIGcBx4N2nK5x1jzBVuF5URa2008ArwC3AE+MNau8LdqnKkgrX2SNLXR4EKbhZzCTyWXT4X4MaYVUnzcGn/65zimpE4v/7Pdq/S/MMYUwL4GBhsrT3ldj0ZMcbcARyz1m51u5YcKgxcD0yx1oYAf+I7v+KnkjR33Bnn/3QqAVcYY/7pblWXxjrroX1+TbSns8vnTqW31t6a1feNMb2AO4C21vcWsUcD16R4XCXpOZ9ljAnECe/Z1tqFbteThTDgLmPMbUAxoJQx5r/WWl8NmkPAIWtt8m80C/DRAAduBX6y1h4HMMYsBFoA/3W1quz9aoypaK09YoypCBxzu6CseCO7fG4EnhVjTEecX6HvstaedbueDGwGahljahhjiuDcCFrsck2ZMsYYnDn
avdbaV92uJyvW2hHW2irW2uo4f69rfDi8sdYeBf5njKmd9FRbYI+LJWXlF6C5MaZ40r+JtvjoDdc0FgM9k77uCSxysZYseSu7/GonpjFmP1AUOJH01NfW2oddLCmdpBHi6zh38mdaa190t6LMGWNaAuuBKP6aV37KWrvEvaqyZ4y5Gfi3tfYOl0vJkjGmCc4N1yLAj8CD1tqTrhaVCWPMc0BXnF/vI4G+1trz7lb1F2PMXOBmnLasvwKjgAhgPlAVp031/dbatDc681wmtY7AC9nlVwEuIiJ/8aspFBER+YsCXETETynARUT8lAJcRMRPKcBFRPyUAlxExE8pwEVE/NT/AyPICGk3qh+gAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(x, y , label=\"data\")\n", "plt.plot(xFit, yFit , label=\"model\",color=\"r\")\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preprocessing Data\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data format\n", "\n", "Scikit Learn accepts several input formats (*i.e.* objects passed to `fit` and `predict`), including:\n", "* NumPy arrays. **Warning:** the feature matrix *has* to be **2D**, even if there is only one example or one feature.\n", "* Pandas dataframes.\n", "* SciPy sparse matrices.\n", "\n", "The *examples/samples* of the dataset are stored as *rows*.
\n", "The *features* are the *columns*.\n", "\n", "### Training/Testing sets\n", "\n", "In order to *cross-validate* our model, it is customary to split the dataset into training and testing subsets. It can be done manually but there is also a dedicated method." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "\n", "xTrain, xTest, yTrain, yTest = train_test_split(x,y)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(37, 1) (37, 1)\n", "(13, 1) (13, 1)\n" ] } ], "source": [ "print(xTrain.shape,yTrain.shape)\n", "print(xTest.shape,yTest.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us use cross validation to compare linear model and linear model with intercept." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Testing Error with intercept: 3.3147332805841767 \t without intercept: 7.139630719389703\n" ] } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "model1 = LinearRegression(fit_intercept=True)\n", "model2 = LinearRegression(fit_intercept=False)\n", "\n", "model1.fit(xTrain,yTrain)\n", "yPre1 = model1.predict(xTest)\n", "error1 = np.linalg.norm(yTest-yPre1)\n", "\n", "model2.fit(xTrain,yTrain)\n", "yPre2 = model2.predict(xTest)\n", "error2 = np.linalg.norm(yTest-yPre2)\n", "\n", "print(\"Testing Error with intercept:\", error1, \"\\t without intercept:\" ,error2)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(xTrain, yTrain , label=\"Train data\")\n", "plt.scatter(xTest, yTest , color= 'k' , label=\"Test data\")\n", "plt.plot(xTest, yPre1 , color='r', label=\"model w/ intercept (err = {:.1f})\".format(error1))\n", "plt.plot(xTest, yPre2 , color='m', label=\"model w/o intercept (err = {:.1f})\".format(error2))\n", "plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Performance metrics\n", "\n", "In order to quantitatively evaluate the models, Scikit Learn provides a wide range of [metrics](http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics); we will see some of them in the following examples." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "position": { "height": "462px", "left": "1160px", "right": "47px", "top": "174px", "width": "553px" }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }