{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Least-squares" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a least-squares, or linear regression, problem, we have measurements $A \\in \\mathcal{R}^{m \\times n}$ and $b \\in \\mathcal{R}^m$ and seek a vector $x \\in \\mathcal{R}^{n}$ such that $Ax$ is close to $b$. Closeness is defined as the sum of the squared differences:\n", "$$ \\sum_{i=1}^m (a_i^Tx - b_i)^2, $$\n", "also known as the $\\ell_2$-norm squared, $\\|Ax - b\\|_2^2$.\n", "\n", "For example, we might have a dataset of $m$ users, each represented by $n$ features. Each row $a_i^T$ of $A$ is the features for user $i$, while the corresponding entry $b_i$ of $b$ is the measurement we want to predict from $a_i^T$, such as ad spending. The prediction is given by $a_i^Tx$.\n", "\n", "We find the optimal $x$ by solving the optimization problem\n", "$$ \n", " \\begin{array}{ll}\n", " \\mbox{minimize} & \\|Ax - b\\|_2^2.\n", " \\end{array}\n", "$$\n", "Let $x^\\star$ denote the optimal $x$. The quantity $r = Ax^\\star - b$ is known as the residual. If $\\|r\\|_2 = 0$, we have a perfect fit." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following code, we solve a least-squares problem with CVXPY." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "The optimal value is 7.005909828287484\n", "The optimal x is\n", "[ 0.17492418 -0.38102551 0.34732251 0.0173098 -0.0845784 -0.08134019\n", " 0.293119 0.27019762 0.17493179 -0.23953449 0.64097935 -0.41633637\n", " 0.12799688 0.1063942 -0.32158411]\n", "The norm of the residual is 2.6468679280023557\n" ] } ], "source": [ "# Import packages.\n", "import cvxpy as cp\n", "import numpy as np\n", "\n", "# Generate data.\n", "m = 20\n", "n = 15\n", "np.random.seed(1)\n", "A = np.random.randn(m, n)\n", "b = np.random.randn(m)\n", "\n", "# Define and solve the CVXPY problem.\n", "x = cp.Variable(n)\n", "cost = cp.sum_squares(A*x - b)\n", "prob = cp.Problem(cp.Minimize(cost))\n", "prob.solve()\n", "\n", "# Print result.\n", "print(\"\\nThe optimal value is\", prob.value)\n", "print(\"The optimal x is\")\n", "print(x.value)\n", "print(\"The norm of the residual is \", cp.norm(A*x - b, p=2).value)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }