{ "cells": [ { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "db86c5af-b769-4d51-afe0-c5b27884bf3b" } }, "source": [ "# Least-Squares Regression Workbook \n", "## CH EN 2450 - Numerical Methods\n", "**Prof. Tony Saad (www.tsaad.net)
Department of Chemical Engineering
University of Utah**" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "59bc4550-5ef3-4891-8817-9b7525d8d84d" } }, "source": [ "# Example 1\n", "The height of a person vs femur length is given by the following data:\n", "\n", "| femur length (cm) | height (cm) |\n", "| :----------------: |:-----------: |\n", "| 40\t | 163|\n", "|41\t|165|\n", "|43\t|167|\n", "|43\t|164|\n", "|44\t|168|\n", "|44\t|169|\n", "|45\t|170|\n", "|45\t|167|\n", "|48\t|170|\n", "|51\t|175|\n", "\n", "Plot this data and then perform regression to a line and a quadratic using the standard derivation as well as the normal equations. Also use `numpy`'s `polyfit` and compare to you solutions." ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "2af0a0eb-9013-4aca-a8f6-f47bde2594c9" } }, "source": [ "Let's first get the boiler plate out of the way and import matplotlib and numpy" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "nbpresent": { "id": "acff7541-c478-484b-be35-84becaf643e8" } }, "outputs": [], "source": [ "%matplotlib inline\n", "%config InlineBackend.figure_format = 'svg'\n", "import matplotlib.pyplot as plt\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "ce4a31b8-73b5-499f-b0a7-b287eb4b5bba" } }, "source": [ "Now place the input data in `numpy` arrays. We can enter these manually or import from a text file." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "nbpresent": { "id": "ab215b66-b879-4f0c-bf43-87ba09d16247" } }, "outputs": [], "source": [ "xi = np.array([40,41,43,43,44,44,45,45,48,51])\n", "yi = np.array([163,165,167,164,168,169,170,167,170,175])" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "e2adf352-d57b-461c-9d45-a1b3f47b2a23" } }, "source": [ "Now plot the data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "nbpresent": { "id": "732e1de5-36c8-45bc-a0e6-8033b5538339" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2022-08-28T19:40:43.206383\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.5.1, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(xi,yi,'ro')\n", "plt.xlabel('Femur length (cm)')\n", "plt.ylabel('Person\\' height (cm)')\n", "plt.title('Height vs femur length')\n", "plt.grid()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "dca69fd3-5aee-4ee9-a1ad-5e55d7e0a196" } }, "source": [ "## Regression to a straight line\n", "Here, we regress the data to a straight line $a_0 + a_1 x$. Recall that, for regression to a straight line, we must solve the following system of equations:\n", "\\begin{equation}\n", "\\label{eq:regression-line}\n", "\\left[\n", "\\begin{array}{*{20}{c}}\n", "N& \\sum{x_i}\\\\\n", "\\sum{x_i} &\\sum{x_i^2}\n", "\\end{array}\n", "\\right]\n", "\\left(\n", "\\begin{array}{*{20}{c}}\n", "a_0\\\\\n", "a_1\n", "\\end{array} \\right)\n", "= \\left( \n", "\\begin{array}{*{20}{c}}\n", "\\sum{y_i}\\\\\n", "\\sum {x_i y_i}\n", "\\end{array} \n", "\\right)\n", "\\end{equation}" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "c3ecee70-b5bb-42bf-87f3-024869e268f1" } }, "source": [ "first build the coefficient matrix as a list of lists [ [a,b], [c,d]]. We will use `numpy.sum` to compute the various summations in the system \\eqref{eq:regression-line}." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "nbpresent": { "id": "f51d4697-448b-48a2-a1d2-384685c2f3f7" } }, "outputs": [], "source": [ "N = len(xi)\n", "A = np.array([[N, np.sum(xi)],\n", " [np.sum(xi), np.sum(xi**2)]])" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "a00d339d-8502-4b5e-a3ce-361154353ac0" } }, "source": [ "next, construct the right-hand-side using a 1D numpy array" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "nbpresent": { "id": "e8bad887-71cb-46df-9359-1958f1d32d7c" } }, "outputs": [], "source": [ "b = np.array([np.sum(yi), np.sum(xi*yi)])" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "af48547e-4860-4271-925d-45c7e2651e3f" } }, "source": [ "now find the solution of $[\\mathbf{A}]\\mathbf{a} = \\mathbf{b}$, where $a = (a_0, a_1)$ are the coefficients of the regressed line. Use `numpy`'s built-in linear solver." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "nbpresent": { "id": "3b498fcd-5f23-49d8-88c6-1273e612d8b8" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[123.20779221 1.004329 ]\n" ] } ], "source": [ "sol = np.linalg.solve(A,b)\n", "# print the solution. Note that in this case, the solution should contain two values, a0 and a1, respectively.\n", "print(sol)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "0964208c-5b97-4832-9474-2cd06a97bd0a" } }, "source": [ "The variable `sol` contains the solution of the system of equations. It is a list of two entries corresponding to the coefficients $a_0$ and $a_1$ of the regressed line." ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "b3162bda-d35e-41b0-88fb-abad44b694fd" } }, "source": [ "We can now plot the line, $a_0 + a_1 x$ that we just fitted into the data" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "nbpresent": { "id": "d9ca569b-9b76-4e43-8144-9c270c5d3fc0" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2022-08-28T19:40:43.325674\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.5.1, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# first get the coefficients a0 and a1 from the variable sol\n", "a0 = sol[0]\n", "a1 = sol[1]\n", "\n", "# construct a function f(x) for the regressed line\n", "y_line = lambda x: a0 + a1*x\n", "\n", "# now plot the regressed line as a function of the input data xi\n", "plt.plot(xi, y_line(xi), label='least-squares fit')\n", "# plot the original data\n", "plt.plot(xi, yi,'ro', label='original data')\n", "plt.xlabel('Femur length (cm)')\n", "plt.ylabel('Person\\' height (cm)')\n", "plt.title('Height vs femur length')\n", "plt.legend()\n", "plt.grid()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Standard Error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The standard error of the model quantifies the spread of the data around the regression curve. It is given by\n", "\\begin{equation}\n", "S_{y/x} = \\sqrt{\\frac{S_r}{n-2}} = \\sqrt{\\frac{\\sum{(y_i - f_i)^2}}{n-2}}\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "167.8\n", "-164.3103327124527\n", "1.4317065166379392\n" ] } ], "source": [ "ybar = np.average(yi)\n", "print(ybar)\n", "fi = y_line(xi)\n", "stdev = np.sqrt(np.sum((yi - ybar)**2)/(N-1))\n", "print(stdev - ybar)\n", "Sr = np.sum((yi-fi)**2)\n", "Syx = np.sqrt(Sr/(N-2))\n", "print(Syx)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "07602836-2df6-466b-bfbb-b8ce8b4198b1" } }, "source": [ "### R2 Value" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "0fadb3e3-b20b-4614-b697-7d6f11f3fbaa" } }, "source": [ "The $R^2$ value can be computed as:\n", "\\begin{equation}R^2 = 1 - \\frac{\\sum{(y_i - f_i)^2}}{\\sum{(y_i - \\bar{y})^2}}\\end{equation}\n", "where\n", "\\begin{equation}\n", "\\bar{y} = \\frac{1}{N}\\sum{y_i}\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "nbpresent": { "id": "99b84ec8-525d-45a0-96c5-97e479c7e378" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8503807627895225\n" ] } ], "source": [ "ybar = np.average(yi)\n", "fi = y_line(xi)\n", "rsq = 1 - np.sum( (yi - fi)**2 )/np.sum( (yi - ybar)**2 )\n", "print(rsq)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "7ffbfebf-4308-4100-a21f-6d450d639f2f" } }, "source": [ "To enable quick calculation of the R2 value for other functions, let's declare a function that computes the R2 value for an arbitrary model fit." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "nbpresent": { "id": "6f72de3a-cd1e-4b16-8371-c345d7f2bdf7" } }, "outputs": [], "source": [ "def rsquared(xi,yi,ymodel):\n", " '''\n", " xi: vector of length n representing the known x values.\n", " yi: vector of length n representing the known y values that correspond to xi.\n", " ymodel: a python function (of x only) that can be evaluated at xi and represents a model fit of \n", " the data (e.g. a regressed curve).\n", " '''\n", " ybar = np.average(yi)\n", " fi = ymodel(xi)\n", " result = 1 - np.sum( (yi - fi)**2 )/np.sum( (yi - ybar)**2 )\n", " return result" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "nbpresent": { "id": "21b5ccd3-68bd-46b2-9af4-ff8981534d7c" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8503807627895225\n" ] } ], "source": [ "rsq = rsquared(xi,yi,y_line)\n", "print(rsq)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "300ef303-812a-474a-92d4-f2746324ba28" } }, "source": [ "## Regression to a quadratic\n", "Here, we regress the data to a quadratic polynomial $a_0 + a_1 x + a_2 x^2$. Recall that, for regression to a quadratic, we must solve the following system of equations:\n", "\\begin{equation}\n", "\\label{eq:regression-line}\n", "\\left[\n", "\\begin{array}{*{20}{c}}\n", "N & \\sum{x_i} & \\sum{x_i^2}\\\\\n", "\\sum{x_i} &\\sum{x_i^2} & \\sum{x_i^3} \\\\\n", "\\sum{x_i^2} & \\sum{x_i^3} & \\sum{x_i^4}\n", "\\end{array}\n", "\\right]\n", "\\left(\n", "\\begin{array}{*{20}{c}}\n", "a_0\\\\\n", "a_1 \\\\\n", "a_2\n", "\\end{array} \\right)\n", "= \\left( \n", "\\begin{array}{*{20}{c}}\n", "\\sum{y_i}\\\\\n", "\\sum {x_i y_i}\\\\\n", "\\sum {x_i^2 y_i}\n", "\\end{array} \n", "\\right)\n", "\\end{equation}" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "nbpresent": { "id": "a9d898a6-ee44-42c1-9981-9720cdd0a721" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 10 444 19806]\n", " [ 444 19806 887796]\n", " [ 19806 887796 39994422]] [ 1678 74596 3331896]\n" ] } ], "source": [ "A = np.array([[len(xi), np.sum(xi) , np.sum(xi**2) ],\n", " [np.sum(xi), np.sum(xi**2), np.sum(xi**3)],\n", " [np.sum(xi**2), np.sum(xi**3), np.sum(xi**4)] ])\n", "b = np.array([np.sum(yi), np.sum(xi*yi), np.sum(xi*xi*yi)])\n", "print(A,b)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "nbpresent": { "id": "fc224dbb-beec-4581-82ca-c8bd0d976b29" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.28368173e+02 7.76379556e-01 2.50458155e-03]\n" ] } ], "source": [ "sol = np.linalg.solve(A,b)\n", "print(sol)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "nbpresent": { "id": "a0534903-3768-4383-b94b-92fff0564c06" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2022-08-28T19:40:43.440961\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.5.1, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "a0 = sol[0]\n", "a1 = sol[1]\n", "a2 = sol[2]\n", "y_quad = lambda x: a0 + a1*x + a2*x**2\n", "# now plot the regressed line as a function of the input data xi\n", "plt.plot(xi, y_quad(xi), label='least-squares fit (quadratic)')\n", "# plot the original data\n", "plt.plot(xi, yi,'ko', label='original data')\n", "plt.xlabel('Femur length (cm)')\n", "plt.ylabel('Person\\' height (cm)')\n", "plt.title('Height vs femur length')\n", "plt.legend()\n", "plt.grid()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "d239f7f9-ca4c-40ee-b996-fa3adee2f994" } }, "source": [ "Let's now compute the R2 value for the quadratic fit" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "nbpresent": { "id": "d2d9cb7d-cea3-4a57-bf53-3aeb87b14e7f" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8504537705463826\n" ] } ], "source": [ "rsq = rsquared(xi,yi,y_quad)\n", "print(rsq)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "1a38a298-65f3-499e-9eb4-431149ec7464" } }, "source": [ "This value of 85% is not much different than the one we got with a straight line fit. The data is very likely to be distributed linearly." ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "ecf3be42-4ab9-464e-8a75-0dfdb530b122" } }, "source": [ "## Using the normal equations\n", "Fit a straight line to the data using the normal equations" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "nbpresent": { "id": "bb35d348-82ad-4826-a21e-8150f5faa8ce" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 40.]\n", " [ 1. 41.]\n", " [ 1. 43.]\n", " [ 1. 43.]\n", " [ 1. 44.]\n", " [ 1. 44.]\n", " [ 1. 45.]\n", " [ 1. 45.]\n", " [ 1. 48.]\n", " [ 1. 51.]]\n", "[123.20779221 1.004329 ]\n" ] } ], "source": [ "A = np.array([np.ones(len(xi)),xi]).T\n", "print(A)\n", "ATA = A.T @ A\n", "sol = np.linalg.solve(ATA,A.T@yi)\n", "print(sol)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "37f2ec84-b50c-4aec-ad0c-559a2cf2efa1" } }, "source": [ "For a quadratic fit using the normal equations" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "nbpresent": { "id": "a2cebedb-d56a-4c4f-a048-68229fd18b78" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1.28368173e+02 7.76379556e-01 2.50458155e-03]\n" ] } ], "source": [ "A = np.array([np.ones(len(xi)),xi,xi**2]).T\n", "ATA = A.T @ A\n", "sol = np.linalg.solve(ATA,A.T@yi)\n", "print(sol)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "2374f277-4ae9-4842-a415-ec84c61c750f" } }, "source": [ "## Using Numpy's Polyfit\n", "It is possible to repeat the previous analysis using `numpy's` `polyfit` function. Let's try it out and compare to our results" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "4152a896-950d-442c-94a3-332c3e3d4f72" } }, "source": [ "We first compare a straight line fit" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "nbpresent": { "id": "8a6985a7-38c7-4fab-9547-47b7288e931b" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.004329 123.20779221]\n" ] } ], "source": [ "coefs = np.polyfit(xi,yi,1)\n", "print(coefs)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "93e122ed-772c-46dd-b124-5286cc3f0fc2" } }, "source": [ "Indeed, the coefficients are the same as the ones we obtained from our straight-line fit. Polyfit returns the coefs sorted in reverse order." ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "8d56bfa1-f4aa-4980-b7b5-8c21aca135fe" } }, "source": [ "To use the data returned by `polyfit`, we can construct a polynomial using `poly1d` and simply pass the coefficients returned by `polyfit`" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "nbpresent": { "id": "2bea01ff-a1cb-44bf-946b-51af84670b6e" } }, "outputs": [], "source": [ "p = np.poly1d(coefs)" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "b7fabc50-9ef0-41a6-9720-03f3d1b92263" } }, "source": [ "no, simply call `p` on any value, say `p(43)`" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "nbpresent": { "id": "82fde909-25c9-45b5-b136-692f62738f5b" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "166.39393939393938\n" ] } ], "source": [ "print(p(43))" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "3a22c7ea-2e29-493b-bc90-539e0bac09f6" } }, "source": [ "We can also plot the fit" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "nbpresent": { "id": "747a448a-75f0-49e2-a04f-8ea8767ddfa8" } }, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2022-08-28T19:40:43.555987\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.5.1, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# now plot the regressed line as a function of the input data xi\n", "plt.plot(xi, p(xi), label='least-squares fit (quadratic)')\n", "# plot the original data\n", "plt.plot(xi, yi,'ko', label='original data')\n", "plt.xlabel('Femur length (cm)')\n", "plt.ylabel('Person\\' height (cm)')\n", "plt.title('Height vs femur length')\n", "plt.legend()\n", "plt.grid()" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "122e9579-fa22-40b7-9071-f377f6202506" } }, "source": [ "We can do the same for the quadratic fit - simply change the degree of the polynomial in `polyfit`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "nbpresent": { "id": "255ac291-12b7-400a-8415-8c6683edc504" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "166.383465689269\n" ] } ], "source": [ "coefs = np.polyfit(xi,yi,2)\n", "pquad = np.poly1d(coefs)\n", "print(pquad(43))" ] }, { "cell_type": "markdown", "metadata": { "nbpresent": { "id": "aa6babef-4eff-4c84-99a6-dadc1d240002" } }, "source": [ "## Using numpy's Least Squares" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "nbpresent": { "id": "06256be1-8054-48df-b64e-efebf3605271" } }, "outputs": [ { "data": { "text/plain": [ "(array([1.28368173e+02, 7.76379556e-01, 2.50458155e-03]),\n", " array([16.39026675]),\n", " 3,\n", " array([6.32567302e+03, 9.94238708e+00, 1.72620823e-02]))" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.lstsq(A,yi,rcond=None)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "nbpresent": { "id": "17b00c90-ab55-4e22-a85f-2f09e710ca04" } }, "outputs": [ { "data": { "text/html": [ "CSS style adapted from https://github.com/barbagroup/CFDPython. Copyright (c) Barba group\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.display import HTML\n", "def css_styling():\n", " styles = open(\"../../styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "hide_input": false, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" }, "toc": { "base_numbering": 1, "nav_menu": { "height": "66px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }