{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression\n", "\n", "By Jen Selby and Carl Shan\n", "\n", "This Jupyter Notebook will show you how to make a Linear Regression model using the scikit-learn (aka `sklearn`) Python library.\n", "\n", "You can see a basic example here:\n", "> http://scikit-learn.org/stable/modules/linear_model.html#ordinary-least-squares\n", "\n", "and the full documentation for `sklearn.linear_model.LinearRegression` here:\n", "> http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html\n", "\n", "# Instructions\n", "\n", "0. Make sure you've read and learned a bit about the Linear Regression model. [Click here for course notes.](https://jennselby.github.io/MachineLearningCourseNotes/#linear-regression)\n", "1. Read through the instructions and code behind the following sections:\n", "\n", " * [Setup](#Setup)\n", " * [Fake Data Generation](#Fake-Data-Generation)\n", " * [Training](#Training)\n", " * [Results and Visualization](#Results-and-Visualization)\n", "2. Then, pick and complete at least one of the exercise options below (Standard or Advanced Difficulty), writing code and answers for each of its questions.\n", " * [Option 1 - Standard Difficulty](#Exercise-Option-#1---Standard-Difficulty)\n", " * [Option 2 - Standard Difficulty](#Exercise-Option-#2---Standard-Difficulty)\n", " * [Option 3 - Advanced Difficulty](#Exercise-Option-#3---Advanced-Difficulty)\n", " * [Option 4 - Advanced Difficulty](#Exercise-Option-#4---Advanced-Difficulty)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "First, make sure you have installed all of the necessary Python libraries, following [the instructions here](https://jennselby.github.io/MachineLearningCourseNotes/#setting-up-python3).\n", "\n", "You should have `sklearn`, `numpy`, `matplotlib` and `pandas` installed.\n", "\n", "If you haven't installed them, run `pip install` followed by the package names in your Terminal, for example `pip install scikit-learn numpy matplotlib pandas` (the `sklearn` module comes from the `scikit-learn` package).\n", "\n", "Next, we want to make sure we can display our graphs in this notebook and import all of the libraries we'll need into the notebook." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# We're going to be doing some plotting, and we want to be able to see these plots.\n", "# To display graphs in this notebook, run this cell.\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# We're now going to import some important libraries\n", "\n", "import numpy.random # for generating a noisy data set\n", "from sklearn import linear_model # for training a linear model\n", "\n", "import matplotlib.pyplot # for plotting in general\n", "from mpl_toolkits.mplot3d import Axes3D # for 3D plotting\n", "\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fake Data Generation\n", "\n", "We're going to generate some fake data to test out our ideas about linear regression. 
These constant variables decide some of the characteristics of our data: the `x` range (which will also be used to set the size of the graph later) and how many inputs we should generate." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Set the range of our x values and how many of them to generate\n", "\n", "MIN_X = -10\n", "MAX_X = 10\n", "NUM_INPUTS = 50" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fake Dataset 1 - Single x Variable\n", "\n", "Our first dataset has just one input feature. We are going to pick `NUM_INPUTS` (50) random real numbers, spread uniformly between `MIN_X` and `MAX_X`. Then, we will generate one output for each of these inputs following the function $y = 0.3x + 1$." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 6.2042874 ]\n", " [-6.44558848]\n", " [ 6.15347981]\n", " [-5.84754016]\n", " [ 1.43601348]\n", " [-4.31411709]\n", " [-9.82494627]\n", " [ 8.48626601]\n", " [-7.62915955]\n", " [-3.29137353]\n", " [-9.99398847]\n", " [-8.37608792]\n", " [ 5.07202459]\n", " [ 5.50636949]\n", " [ 6.09568009]\n", " [-4.30089789]\n", " [-8.88273978]\n", " [ 9.12468103]\n", " [-7.73938696]\n", " [-9.33474834]\n", " [-3.49694032]\n", " [-8.9676608 ]\n", " [-2.80176355]\n", " [-5.03206763]\n", " [-0.68356 ]\n", " [ 1.73552019]\n", " [ 7.9379289 ]\n", " [-7.70543788]\n", " [-1.45995305]\n", " [ 5.09314035]\n", " [ 5.99847056]\n", " [ 3.34302821]\n", " [-8.10582136]\n", " [-2.26602336]\n", " [-2.27335965]\n", " [-4.09892983]\n", " [-8.99217476]\n", " [ 8.90280292]\n", " [-8.6455045 ]\n", " [-4.26283741]\n", " [ 0.11768981]\n", " [ 5.15041637]\n", " [ 8.15758258]\n", " [-5.45726117]\n", " [-6.92202854]\n", " [-9.78166627]\n", " [ 5.57196798]\n", " [ 4.4655849 ]\n", " [ 3.24344148]\n", " [ 5.48035288]]\n" ] } ], "source": [ "# randomly pick numbers for x\n", "x_one_x = numpy.random.uniform(low=MIN_X, high=MAX_X, size=(NUM_INPUTS, 1))\n", 
"\n", "print(x_one_x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's store this data in a `pandas` `DataFrame` object and name the column `'x'`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " x\n", "0 6.204287\n", "1 -6.445588\n", "2 6.153480\n", "3 -5.847540\n", "4 1.436013" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_one_x = pd.DataFrame(data=x_one_x, columns=['x'])\n", "data_one_x.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cool. Now we have some fake `x` data.\n", "\n", "Let's make the fake `y` data now, using the equation $y = 0.3x + 1$." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "data_one_x['y'] = 0.3 * data_one_x['x'] + 1" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data_one_x.plot.scatter(x='x', y='y')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay. That looks *too* perfect.\n", "\n", "Most real-world data look less linear than that.\n", "\n", "So let's add a little bit of noise. Noise is the random variation that real-world data naturally contain, and we will simulate it here. Without noise, fitting our linear model would be too easy.\n", "\n", "**Note:** We can generate noise by picking numbers from a [normal distribution (also called a bell curve)](http://www.statisticshowto.com/probability-and-statistics/normal-distributions/) centered on zero. By default, `numpy.random.normal` uses a mean of 0 and a standard deviation of 1." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# First, let's create some noise to make our data a little bit more spread out.\n", "\n", "# generate some normally distributed noise\n", "noise_one_x = numpy.random.normal(size=NUM_INPUTS)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Now let's add the noise to the existing 'y' column.\n", "# pandas adds them element by element, one noise value per row.\n", "data_one_x['y'] = data_one_x['y'] + noise_one_x" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "data_one_x.plot.scatter(x='x', y='y')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great!\n", "\n", "This looks more like real data now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training\n", "\n", "Now that we have our data, we can train our model to find the best-fit line. We will use the `linear_model` module from the scikit-learn library to do this.\n", "\n", "Note: you may get a warning about LAPACK. According to [this discussion on the SciPy GitHub page](https://github.com/scipy/scipy/issues/5998), this is safe to ignore." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# This creates an \"empty\" linear model\n", "\n", "model_one_x = linear_model.LinearRegression()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we need to reshape our data.\n", "\n", "Currently, our data looks like the following:\n", "\n", "```python\n", "# data_one_x['x'] looks something like\n", "[-3.44342026, 9.60082542, 4.99683803, 7.11339915, 9.69287893, ...]\n", "\n", "```\n", "\n", "In other words, it's one-dimensional, like a flat list.\n", "\n", "However, this isn't sufficient.\n", "\n", "That's because later on, we will use a method called `.fit()`, which expects the inputs to be two-dimensional: a list of lists, with one inner list (row) per data point and one entry per input feature.\n", "\n", "For example:\n", "\n", "```python\n", "[[-3.44342026],\n", "[ 9.60082542],\n", "[ 4.99683803],\n", "[ 7.11339915],\n", "[ 9.69287893],\n", "[-5.1383316 ],\n", "[ 8.96638209],\n", "...\n", "[-9.12492363]]\n", "```\n", "\n", "We will use NumPy's `.reshape()` method to do this. In `.reshape(-1, 1)`, the `1` asks for exactly one column, and the `-1` tells NumPy to work out the number of rows from the data." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Convert each column into a 2D array with one column and NUM_INPUTS rows\n", "x_one_x = data_one_x['x'].values.reshape(-1, 1)\n", "y_one_x = data_one_x['y'].values.reshape(-1, 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There we go. 
Now we can \"fit\" the data.\n", "\n", "\"Fitting\" the data means giving the \"empty\" model real data and asking it to find the parameters (here, the slope and intercept of a line) that best fit the data. For the ordinary least squares method that `LinearRegression` uses, the best fit is the line with the smallest sum of squared differences between the predicted and actual `y` values.\n", "\n", "Using the amazing `sklearn` library, it's as easy as calling the `.fit()` method." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the best-fit line for our data\n", "model_one_x.fit(X=x_one_x, y=y_one_x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results and Visualization\n", "\n", "Now, let's see what our model learned. We can look at the results numerically:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def print_model_fit(model):\n", " # Print out the parameters of the best-fit line (or plane), which .fit() stored on the model\n", " print('Intercept: {i} Coefficients: {c}'.format(i=model.intercept_, c=model.coef_))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Intercept: [1.11825823] Coefficients: [[0.32606958]]\n" ] } ], "source": [ "print_model_fit(model_one_x)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.71719265]\n", " [1.33346415]\n", " [1.76387599]]\n" ] } ], "source": [ "## How would this model make predictions?\n", "\n", "# Let's make some new x values and ask the model to predict their corresponding 'y' values.\n", "# Like the training data, each x value goes in its own inner list (one row per data point).\n", "new_x_values = [ [-1.23], [0.66], [1.98] 
]\n", "\n", "predictions = model_one_x.predict(new_x_values)\n", "\n", "print(predictions)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model prediction for -1.23: [0.71719265]\n", "Model prediction for 0.66: [1.33346415]\n", "Model prediction for 1.98: [1.76387599]\n" ] } ], "source": [ "# Let's print them a little bit nicer\n", "for datapoint, prediction in zip(new_x_values, predictions):\n", " print('Model prediction for {}: {}'.format(datapoint[0], prediction))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also look at the data and the best-fit line graphically." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "def plot_best_fit_line(model, x, y):\n", " # create the figure\n", " fig = matplotlib.pyplot.figure(1)\n", " fig.suptitle('Data and Best-Fit Line')\n", " matplotlib.pyplot.xlabel('x values')\n", " matplotlib.pyplot.ylabel('y values')\n", "\n", " # put the generated dataset points on the graph\n", " matplotlib.pyplot.scatter(x, y)\n", " \n", " # Now plot the best-fit line: predict y for evenly spaced x values\n", " # across the graph and connect the predictions.\n", " X = numpy.linspace(MIN_X, MAX_X) # 50 evenly spaced x values (the default count)\n", " Y = model.predict(X.reshape(-1, 1)) # one row per x value, as .predict() expects\n", " matplotlib.pyplot.plot(X, Y)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_best_fit_line(model_one_x, x_one_x, y_one_x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise Option #1 - Standard Difficulty\n", "\n", "Answer the following questions about [Fake Dataset 1](#Fake-Dataset-1---Single-x-Variable):\n", "1. Take a look at the output of the `print_model_fit()` function in the \"Results and Visualization\" section above. What numbers did you expect to see printed if the linear regression code was working, and why?\n", "1. What numbers did you expect the model to predict when we gave it our new x values, -1.23, 0.66, and 1.98, and why?\n", "1. What did you expect to see on the graph if the linear regression code was working, and why?\n", "1. Pick some lines of code that you could change to continue testing that the linear regression worked properly. What lines did you choose and how did you change them? How did the output change, and why does that tell you that the code is working correctly?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fake Dataset 2 - Two x Values\n", "\n", "Let's look at a dataset that has two inputs, like [the tree example in our notes](https://jennselby.github.io/MachineLearningCourseNotes/#linear-regression). This time, the outputs follow $y = 0.5x_1 - 2.7x_2 - 2$, plus noise.\n", "\n", "**NOTE**: This will make it a little harder to visualize, particularly because the `%matplotlib inline` graphs in this notebook are static images that you cannot rotate. If you are interested in looking more closely at this graph, you can copy the code from the next several cells into a file and run it with Python directly. This will open a graph window that lets you drag to rotate the graph." 
] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# generate some normally distributed noise\n", "noise_two_x = numpy.random.normal(size=NUM_INPUTS)\n", "\n", "# randomly pick pairs of numbers for x\n", "x1_two_x = numpy.random.uniform(low=MIN_X, high=MAX_X, size=NUM_INPUTS)\n", "x2_two_x = numpy.random.uniform(low=MIN_X, high=MAX_X, size=NUM_INPUTS)\n", "\n", "y_two_x = 0.5 * x1_two_x - 2.7 * x2_two_x - 2 + noise_two_x" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "data_two_x = pd.DataFrame(data=x1_two_x, columns=['x1'])" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "data_two_x['x2'] = x2_two_x\n", "data_two_x['y'] = y_two_x" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " x1 x2 y\n", "0 2.557584 4.564261 -13.463797\n", "1 -0.881504 3.287239 -12.344193\n", "2 -0.231691 9.680882 -27.295643\n", "3 8.284173 1.276975 -2.805065\n", "4 3.542796 2.611530 -7.216126" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_two_x.head()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Intercept: -2.0817409227903294 Coefficients: [ 0.48804114 -2.72217748]\n" ] } ], "source": [ "# use scikit-learn's linear regression model and fit it to our data\n", "model_two_x = linear_model.LinearRegression()\n", "model_two_x.fit(data_two_x[['x1', 'x2']], data_two_x['y'])\n", "\n", "# Print out the parameters for the best-fit plane\n", "print_model_fit(model_two_x)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "## Now create a function that can plot in 3D\n", "\n", "def plot_3d(model, x1, x2, y):\n", " # 3D Plot\n", " # create the figure\n", " fig = matplotlib.pyplot.figure(1)\n", " fig.suptitle('3D Data and Best-Fit Plane')\n", " \n", " # add a set of 3D axes to the figure\n", " axes = fig.add_subplot(111, projection='3d')\n", " axes.set_xlabel('x1')\n", " axes.set_ylabel('x2')\n", " axes.set_zlabel('y')\n", " \n", " \n", " # put the generated points on the graph\n", " axes.scatter(x1, x2, y)\n", "\n", " # predict for input points across the graph to find the best-fit plane\n", " # and arrange them into a grid for matplotlib\n", " X1 = X2 = numpy.arange(MIN_X, MAX_X, 0.05)\n", " X1, X2 = numpy.meshgrid(X1, X2)\n", " Y = numpy.array(model.predict(list(zip(X1.flatten(), X2.flatten())))).reshape(X1.shape)\n", "\n", " # put the predicted plane on the graph\n", " axes.plot_surface(X1, X2, Y, alpha=0.1)\n", "\n", " # show the plots\n", " matplotlib.pyplot.show()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { 
"image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Now let's use the function\n", "plot_3d(model_two_x, x1_two_x, x2_two_x, y_two_x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise Option #2 - Standard Difficulty\n", "\n", "Now, answer the following questions about [Fake Dataset 2](#Fake-Dataset-2---Two-x-Values):\n", "1. Take a look at the output of the `print_model_fit()` function for this dataset above. What output did you expect to see printed if the linear regression code was working, and why?\n", "1. What did you expect to see on the graph if the linear regression code was working, and why?\n", "1. Pick some lines of code that you could change to continue testing that the linear regression worked properly. What lines did you choose and how did you change them? How did the output change, and why does that tell you that the code is working correctly?\n", "1. Explain any differences you noticed between working with dataset 1 and dataset 2." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fake Dataset 3 - Quadratic\n", "\n", "The new equation we'll try to model is $y = 0.7x^2 - 0.4x + 1.5$.\n", "\n", "This dataset still has just one input, so the code is very similar to our first one. However, now the generating function is quadratic, so this one will be trickier to deal with: its graph is a curve, not a straight line.\n", "\n", "Again, we'll go through dataset generation, training, and visualization." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# randomly pick numbers for x\n", "x_quadratic = numpy.random.uniform(low=MIN_X, high=MAX_X, size=(NUM_INPUTS, 1))\n", "\n", "data_quadratic = pd.DataFrame(data=x_quadratic, columns=['x'])" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# Let's create some noise to make our data a little bit more spread out.\n", "# generate some normally distributed noise\n", "noise_quadratic = numpy.random.normal(size=NUM_INPUTS)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# Let's generate the y values\n", "# Our equation, plus noise:\n", "# y = 0.7x^2 - 0.4x + 1.5\n", "data_quadratic['y'] = 0.7 * data_quadratic['x'] * data_quadratic['x'] - 0.4 * data_quadratic['x'] + 1.5 + noise_quadratic\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Intercept: [26.1204316] Coefficients: [[-0.22266468]]\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# reshape x and y into 2D arrays with one column, as .fit() expects\n", "x_quadratic = data_quadratic['x'].values.reshape(-1, 1)\n", "y_quadratic = data_quadratic['y'].values.reshape(-1, 1)\n", "\n", "# Let's try using scikit-learn's linear regression model and fit it to our data\n", "model_quadratic = linear_model.LinearRegression()\n", "model_quadratic.fit(x_quadratic, y_quadratic)\n", "\n", "# show results\n", "print_model_fit(model_quadratic)\n", "plot_best_fit_line(model_quadratic, x_quadratic, y_quadratic)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise Option #3 - Advanced Difficulty\n", "\n", "First, look over and understand the data for [Fake Dataset 3](#Fake-Dataset-3---Quadratic).\n", "\n", "There are some issues here. Clearly the linear model that we have isn't working well.\n", "\n", "Your challenge is to write some new code that will better fit a linear model to this data. There are a couple of different ways to do this, but all of them will involve some new code. If you have ideas but just aren't sure how to translate them into code, please ask for help!" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "### Your code here\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise Option #4 - Advanced Difficulty\n", "\n", "Try adding some [regularization](https://jennselby.github.io/MachineLearningCourseNotes/#regularization-ridge-lasso-and-elastic-net) to your linear regression model. 
This will get you some practice in using the scikit-learn documentation to find new functions and figure out how to use them.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "### Your code here\n", "\n", "\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }