{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "db86c5af-b769-4d51-afe0-c5b27884bf3b"
}
},
"source": [
"# Least-Squares Regression Workbook \n",
"## CH EN 2450 - Numerical Methods\n",
"**Prof. Tony Saad (www.tsaad.net ) Department of Chemical Engineering University of Utah**"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "59bc4550-5ef3-4891-8817-9b7525d8d84d"
}
},
"source": [
"# Example 1\n",
"The height of a person vs femur length is given by the following data:\n",
"\n",
"| femur length (cm) | height (cm) |\n",
"| :----------------: |:-----------: |\n",
"| 40\t | 163|\n",
"|41\t|165|\n",
"|43\t|167|\n",
"|43\t|164|\n",
"|44\t|168|\n",
"|44\t|169|\n",
"|45\t|170|\n",
"|45\t|167|\n",
"|48\t|170|\n",
"|51\t|175|\n",
"\n",
"Plot this data and then perform regression to a line and a quadratic using the standard derivation as well as the normal equations. Also use `numpy`'s `polyfit` and compare to you solutions."
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "2af0a0eb-9013-4aca-a8f6-f47bde2594c9"
}
},
"source": [
"Let's first get the boiler plate out of the way and import matplotlib and numpy"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"nbpresent": {
"id": "acff7541-c478-484b-be35-84becaf643e8"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'svg'\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "ce4a31b8-73b5-499f-b0a7-b287eb4b5bba"
}
},
"source": [
"Now place the input data in `numpy` arrays. We can enter these manually or import from a text file."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"nbpresent": {
"id": "ab215b66-b879-4f0c-bf43-87ba09d16247"
}
},
"outputs": [],
"source": [
"xi = np.array([40,41,43,43,44,44,45,45,48,51])\n",
"yi = np.array([163,165,167,164,168,169,170,167,170,175])"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "e2adf352-d57b-461c-9d45-a1b3f47b2a23"
}
},
"source": [
"Now plot the data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"nbpresent": {
"id": "732e1de5-36c8-45bc-a0e6-8033b5538339"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" 2022-08-28T19:40:43.206383 \n",
" image/svg+xml \n",
" \n",
" \n",
" Matplotlib v3.5.1, https://matplotlib.org/ \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.plot(xi,yi,'ro')\n",
"plt.xlabel('Femur length (cm)')\n",
"plt.ylabel('Person\\' height (cm)')\n",
"plt.title('Height vs femur length')\n",
"plt.grid()"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "dca69fd3-5aee-4ee9-a1ad-5e55d7e0a196"
}
},
"source": [
"## Regression to a straight line\n",
"Here, we regress the data to a straight line $a_0 + a_1 x$. Recall that, for regression to a straight line, we must solve the following system of equations:\n",
"\\begin{equation}\n",
"\\label{eq:regression-line}\n",
"\\left[\n",
"\\begin{array}{*{20}{c}}\n",
"N& \\sum{x_i}\\\\\n",
"\\sum{x_i} &\\sum{x_i^2}\n",
"\\end{array}\n",
"\\right]\n",
"\\left(\n",
"\\begin{array}{*{20}{c}}\n",
"a_0\\\\\n",
"a_1\n",
"\\end{array} \\right)\n",
"= \\left( \n",
"\\begin{array}{*{20}{c}}\n",
"\\sum{y_i}\\\\\n",
"\\sum {x_i y_i}\n",
"\\end{array} \n",
"\\right)\n",
"\\end{equation}"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "c3ecee70-b5bb-42bf-87f3-024869e268f1"
}
},
"source": [
"first build the coefficient matrix as a list of lists [ [a,b], [c,d]]. We will use `numpy.sum` to compute the various summations in the system \\eqref{eq:regression-line}."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"nbpresent": {
"id": "f51d4697-448b-48a2-a1d2-384685c2f3f7"
}
},
"outputs": [],
"source": [
"N = len(xi)\n",
"A = np.array([[N, np.sum(xi)],\n",
" [np.sum(xi), np.sum(xi**2)]])"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "a00d339d-8502-4b5e-a3ce-361154353ac0"
}
},
"source": [
"next, construct the right-hand-side using a 1D numpy array"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"nbpresent": {
"id": "e8bad887-71cb-46df-9359-1958f1d32d7c"
}
},
"outputs": [],
"source": [
"b = np.array([np.sum(yi), np.sum(xi*yi)])"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "af48547e-4860-4271-925d-45c7e2651e3f"
}
},
"source": [
"now find the solution of $[\\mathbf{A}]\\mathbf{a} = \\mathbf{b}$, where $a = (a_0, a_1)$ are the coefficients of the regressed line. Use `numpy`'s built-in linear solver."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"nbpresent": {
"id": "3b498fcd-5f23-49d8-88c6-1273e612d8b8"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[123.20779221 1.004329 ]\n"
]
}
],
"source": [
"sol = np.linalg.solve(A,b)\n",
"# print the solution. Note that in this case, the solution should contain two values, a0 and a1, respectively.\n",
"print(sol)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "0964208c-5b97-4832-9474-2cd06a97bd0a"
}
},
"source": [
"The variable `sol` contains the solution of the system of equations. It is a list of two entries corresponding to the coefficients $a_0$ and $a_1$ of the regressed line."
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "b3162bda-d35e-41b0-88fb-abad44b694fd"
}
},
"source": [
"We can now plot the line, $a_0 + a_1 x$ that we just fitted into the data"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"nbpresent": {
"id": "d9ca569b-9b76-4e43-8144-9c270c5d3fc0"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" 2022-08-28T19:40:43.325674 \n",
" image/svg+xml \n",
" \n",
" \n",
" Matplotlib v3.5.1, https://matplotlib.org/ \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# first get the coefficients a0 and a1 from the variable sol\n",
"a0 = sol[0]\n",
"a1 = sol[1]\n",
"\n",
"# construct a function f(x) for the regressed line\n",
"y_line = lambda x: a0 + a1*x\n",
"\n",
"# now plot the regressed line as a function of the input data xi\n",
"plt.plot(xi, y_line(xi), label='least-squares fit')\n",
"# plot the original data\n",
"plt.plot(xi, yi,'ro', label='original data')\n",
"plt.xlabel('Femur length (cm)')\n",
"plt.ylabel('Person\\' height (cm)')\n",
"plt.title('Height vs femur length')\n",
"plt.legend()\n",
"plt.grid()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Standard Error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The standard error of the model quantifies the spread of the data around the regression curve. It is given by\n",
"\\begin{equation}\n",
"S_{y/x} = \\sqrt{\\frac{S_r}{n-2}} = \\sqrt{\\frac{\\sum{(y_i - f_i)^2}}{n-2}}\n",
"\\end{equation}"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"167.8\n",
"-164.3103327124527\n",
"1.4317065166379392\n"
]
}
],
"source": [
"ybar = np.average(yi)\n",
"print(ybar)\n",
"fi = y_line(xi)\n",
"stdev = np.sqrt(np.sum((yi - ybar)**2)/(N-1))\n",
"print(stdev - ybar)\n",
"Sr = np.sum((yi-fi)**2)\n",
"Syx = np.sqrt(Sr/(N-2))\n",
"print(Syx)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "07602836-2df6-466b-bfbb-b8ce8b4198b1"
}
},
"source": [
"### R2 Value"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "0fadb3e3-b20b-4614-b697-7d6f11f3fbaa"
}
},
"source": [
"The $R^2$ value can be computed as:\n",
"\\begin{equation}R^2 = 1 - \\frac{\\sum{(y_i - f_i)^2}}{\\sum{(y_i - \\bar{y})^2}}\\end{equation}\n",
"where\n",
"\\begin{equation}\n",
"\\bar{y} = \\frac{1}{N}\\sum{y_i}\n",
"\\end{equation}"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"nbpresent": {
"id": "99b84ec8-525d-45a0-96c5-97e479c7e378"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.8503807627895225\n"
]
}
],
"source": [
"ybar = np.average(yi)\n",
"fi = y_line(xi)\n",
"rsq = 1 - np.sum( (yi - fi)**2 )/np.sum( (yi - ybar)**2 )\n",
"print(rsq)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "7ffbfebf-4308-4100-a21f-6d450d639f2f"
}
},
"source": [
"To enable quick calculation of the R2 value for other functions, let's declare a function that computes the R2 value for an arbitrary model fit."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"nbpresent": {
"id": "6f72de3a-cd1e-4b16-8371-c345d7f2bdf7"
}
},
"outputs": [],
"source": [
"def rsquared(xi,yi,ymodel):\n",
" '''\n",
" xi: vector of length n representing the known x values.\n",
" yi: vector of length n representing the known y values that correspond to xi.\n",
" ymodel: a python function (of x only) that can be evaluated at xi and represents a model fit of \n",
" the data (e.g. a regressed curve).\n",
" '''\n",
" ybar = np.average(yi)\n",
" fi = ymodel(xi)\n",
" result = 1 - np.sum( (yi - fi)**2 )/np.sum( (yi - ybar)**2 )\n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"nbpresent": {
"id": "21b5ccd3-68bd-46b2-9af4-ff8981534d7c"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.8503807627895225\n"
]
}
],
"source": [
"rsq = rsquared(xi,yi,y_line)\n",
"print(rsq)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "300ef303-812a-474a-92d4-f2746324ba28"
}
},
"source": [
"## Regression to a quadratic\n",
"Here, we regress the data to a quadratic polynomial $a_0 + a_1 x + a_2 x^2$. Recall that, for regression to a quadratic, we must solve the following system of equations:\n",
"\\begin{equation}\n",
"\\label{eq:regression-line}\n",
"\\left[\n",
"\\begin{array}{*{20}{c}}\n",
"N & \\sum{x_i} & \\sum{x_i^2}\\\\\n",
"\\sum{x_i} &\\sum{x_i^2} & \\sum{x_i^3} \\\\\n",
"\\sum{x_i^2} & \\sum{x_i^3} & \\sum{x_i^4}\n",
"\\end{array}\n",
"\\right]\n",
"\\left(\n",
"\\begin{array}{*{20}{c}}\n",
"a_0\\\\\n",
"a_1 \\\\\n",
"a_2\n",
"\\end{array} \\right)\n",
"= \\left( \n",
"\\begin{array}{*{20}{c}}\n",
"\\sum{y_i}\\\\\n",
"\\sum {x_i y_i}\\\\\n",
"\\sum {x_i^2 y_i}\n",
"\\end{array} \n",
"\\right)\n",
"\\end{equation}"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"nbpresent": {
"id": "a9d898a6-ee44-42c1-9981-9720cdd0a721"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 10 444 19806]\n",
" [ 444 19806 887796]\n",
" [ 19806 887796 39994422]] [ 1678 74596 3331896]\n"
]
}
],
"source": [
"A = np.array([[len(xi), np.sum(xi) , np.sum(xi**2) ],\n",
" [np.sum(xi), np.sum(xi**2), np.sum(xi**3)],\n",
" [np.sum(xi**2), np.sum(xi**3), np.sum(xi**4)] ])\n",
"b = np.array([np.sum(yi), np.sum(xi*yi), np.sum(xi*xi*yi)])\n",
"print(A,b)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"nbpresent": {
"id": "fc224dbb-beec-4581-82ca-c8bd0d976b29"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1.28368173e+02 7.76379556e-01 2.50458155e-03]\n"
]
}
],
"source": [
"sol = np.linalg.solve(A,b)\n",
"print(sol)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"nbpresent": {
"id": "a0534903-3768-4383-b94b-92fff0564c06"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" 2022-08-28T19:40:43.440961 \n",
" image/svg+xml \n",
" \n",
" \n",
" Matplotlib v3.5.1, https://matplotlib.org/ \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"a0 = sol[0]\n",
"a1 = sol[1]\n",
"a2 = sol[2]\n",
"y_quad = lambda x: a0 + a1*x + a2*x**2\n",
"# now plot the regressed line as a function of the input data xi\n",
"plt.plot(xi, y_quad(xi), label='least-squares fit (quadratic)')\n",
"# plot the original data\n",
"plt.plot(xi, yi,'ko', label='original data')\n",
"plt.xlabel('Femur length (cm)')\n",
"plt.ylabel('Person\\' height (cm)')\n",
"plt.title('Height vs femur length')\n",
"plt.legend()\n",
"plt.grid()"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "d239f7f9-ca4c-40ee-b996-fa3adee2f994"
}
},
"source": [
"Let's now compute the R2 value for the quadratic fit"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"nbpresent": {
"id": "d2d9cb7d-cea3-4a57-bf53-3aeb87b14e7f"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.8504537705463826\n"
]
}
],
"source": [
"rsq = rsquared(xi,yi,y_quad)\n",
"print(rsq)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "1a38a298-65f3-499e-9eb4-431149ec7464"
}
},
"source": [
"This value of 85% is not much different than the one we got with a straight line fit. The data is very likely to be distributed linearly."
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "ecf3be42-4ab9-464e-8a75-0dfdb530b122"
}
},
"source": [
"## Using the normal equations\n",
"Fit a straight line to the data using the normal equations"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"nbpresent": {
"id": "bb35d348-82ad-4826-a21e-8150f5faa8ce"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[ 1. 40.]\n",
" [ 1. 41.]\n",
" [ 1. 43.]\n",
" [ 1. 43.]\n",
" [ 1. 44.]\n",
" [ 1. 44.]\n",
" [ 1. 45.]\n",
" [ 1. 45.]\n",
" [ 1. 48.]\n",
" [ 1. 51.]]\n",
"[123.20779221 1.004329 ]\n"
]
}
],
"source": [
"A = np.array([np.ones(len(xi)),xi]).T\n",
"print(A)\n",
"ATA = A.T @ A\n",
"sol = np.linalg.solve(ATA,A.T@yi)\n",
"print(sol)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "37f2ec84-b50c-4aec-ad0c-559a2cf2efa1"
}
},
"source": [
"For a quadratic fit using the normal equations"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"nbpresent": {
"id": "a2cebedb-d56a-4c4f-a048-68229fd18b78"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1.28368173e+02 7.76379556e-01 2.50458155e-03]\n"
]
}
],
"source": [
"A = np.array([np.ones(len(xi)),xi,xi**2]).T\n",
"ATA = A.T @ A\n",
"sol = np.linalg.solve(ATA,A.T@yi)\n",
"print(sol)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "2374f277-4ae9-4842-a415-ec84c61c750f"
}
},
"source": [
"## Using Numpy's Polyfit\n",
"It is possible to repeat the previous analysis using `numpy's` `polyfit` function. Let's try it out and compare to our results"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "4152a896-950d-442c-94a3-332c3e3d4f72"
}
},
"source": [
"We first compare a straight line fit"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"nbpresent": {
"id": "8a6985a7-38c7-4fab-9547-47b7288e931b"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1.004329 123.20779221]\n"
]
}
],
"source": [
"coefs = np.polyfit(xi,yi,1)\n",
"print(coefs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "93e122ed-772c-46dd-b124-5286cc3f0fc2"
}
},
"source": [
"Indeed, the coefficients are the same as the ones we obtained from our straight-line fit. Polyfit returns the coefs sorted in reverse order."
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "8d56bfa1-f4aa-4980-b7b5-8c21aca135fe"
}
},
"source": [
"To use the data returned by `polyfit`, we can construct a polynomial using `poly1d` and simply pass the coefficients returned by `polyfit`"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"nbpresent": {
"id": "2bea01ff-a1cb-44bf-946b-51af84670b6e"
}
},
"outputs": [],
"source": [
"p = np.poly1d(coefs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "b7fabc50-9ef0-41a6-9720-03f3d1b92263"
}
},
"source": [
"no, simply call `p` on any value, say `p(43)`"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"nbpresent": {
"id": "82fde909-25c9-45b5-b136-692f62738f5b"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"166.39393939393938\n"
]
}
],
"source": [
"print(p(43))"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "3a22c7ea-2e29-493b-bc90-539e0bac09f6"
}
},
"source": [
"We can also plot the fit"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"nbpresent": {
"id": "747a448a-75f0-49e2-a04f-8ea8767ddfa8"
}
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" 2022-08-28T19:40:43.555987 \n",
" image/svg+xml \n",
" \n",
" \n",
" Matplotlib v3.5.1, https://matplotlib.org/ \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n"
],
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# now plot the regressed line as a function of the input data xi\n",
"plt.plot(xi, p(xi), label='least-squares fit (quadratic)')\n",
"# plot the original data\n",
"plt.plot(xi, yi,'ko', label='original data')\n",
"plt.xlabel('Femur length (cm)')\n",
"plt.ylabel('Person\\' height (cm)')\n",
"plt.title('Height vs femur length')\n",
"plt.legend()\n",
"plt.grid()"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "122e9579-fa22-40b7-9071-f377f6202506"
}
},
"source": [
"We can do the same for the quadratic fit - simply change the degree of the polynomial in `polyfit`"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"nbpresent": {
"id": "255ac291-12b7-400a-8415-8c6683edc504"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"166.383465689269\n"
]
}
],
"source": [
"coefs = np.polyfit(xi,yi,2)\n",
"pquad = np.poly1d(coefs)\n",
"print(pquad(43))"
]
},
{
"cell_type": "markdown",
"metadata": {
"nbpresent": {
"id": "aa6babef-4eff-4c84-99a6-dadc1d240002"
}
},
"source": [
"## Using numpy's Least Squares"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"nbpresent": {
"id": "06256be1-8054-48df-b64e-efebf3605271"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(array([1.28368173e+02, 7.76379556e-01, 2.50458155e-03]),\n",
" array([16.39026675]),\n",
" 3,\n",
" array([6.32567302e+03, 9.94238708e+00, 1.72620823e-02]))"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.linalg.lstsq(A,yi,rcond=None)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"nbpresent": {
"id": "17b00c90-ab55-4e22-a85f-2f09e710ca04"
}
},
"outputs": [
{
"data": {
"text/html": [
"CSS style adapted from https://github.com/barbagroup/CFDPython. Copyright (c) Barba group\n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.core.display import HTML\n",
"def css_styling():\n",
" styles = open(\"../../styles/custom.css\", \"r\").read()\n",
" return HTML(styles)\n",
"css_styling()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"hide_input": false,
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
},
"toc": {
"base_numbering": 1,
"nav_menu": {
"height": "66px",
"width": "252px"
},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": "block",
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}