# Least Squares via Calculus

In [1]:
# for QR codes use inline
%matplotlib inline
qr_setting = 'url'
#
# for lecture use notebook
# %matplotlib notebook
# qr_setting = None
#
%config InlineBackend.figure_format='retina'
# import libraries
import numpy as np
import matplotlib as mp
import pandas as pd
import matplotlib.pyplot as plt
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
from importlib import reload
from datetime import datetime
from IPython.display import Image
from IPython.display import display_html
from IPython.display import display
from IPython.display import Math
from IPython.display import Latex
from IPython.display import HTML;

__One more proof.__

As mentioned, one way to express the least squares problem is:

$$\hx = \arg\min_{\vx}\Vert A\vx - \vb\Vert$$

When we write it like this, it is clear that this is a _minimization_ problem (that is, an _optimization_ problem).

We want to find the $\vx$ that minimizes the function $g(\vx) = \Vert A\vx - \vb\Vert$.

In fact, we can use standard methods from calculus to find the minimum of this function.

Let's see how to do it. It is simpler to minimize 

$$f(\vx) = \Vert A\vx - \vb\Vert^2$$ 

which is minimized by the same $\vx$ as $g(\vx)$, because squaring is an increasing function on nonnegative values.
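As a quick sanity check of this claim, here is a minimal sketch that evaluates both functions on a grid for a one-column $A$. The matrix, the vector, and the grid are made-up illustration values, not data from the lecture.

```python
import numpy as np

# Hypothetical one-unknown example: A is a single column, so x is a scalar.
A = np.array([[1.0], [1.0], [1.0]])
b = np.array([1.0, 2.0, 6.0])

# Evaluate g(x) = ||Ax - b|| and f(x) = ||Ax - b||^2 on a grid of candidate x values.
xs = np.linspace(0.0, 6.0, 601)
g_vals = np.array([np.linalg.norm(A @ np.array([x]) - b) for x in xs])
f_vals = g_vals ** 2

# The two functions have different values but the same minimizing x.
i_g = np.argmin(g_vals)
i_f = np.argmin(f_vals)
print(xs[i_g], xs[i_f])
```

The minimum values differ (one is the square of the other), but the grid point at which each is attained is the same.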

Let's assume the function is convex (opens upward). In following lectures, we'll see that this is always true (because $A^TA$ is always _positive semidefinite_, and in fact _positive definite_ when the columns of $A$ are linearly independent). Then its minimum occurs where its derivative is zero.
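The convexity assumption can be probed numerically. The sketch below uses a randomly generated $A$ and $\vb$ (assumptions for illustration only). It checks that the eigenvalues of $A^TA$ are not negative and that $f$ at the midpoint of two points does not exceed the average of $f$ at those points.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # illustrative random matrix
b = rng.standard_normal(6)        # illustrative random right-hand side

def f(x):
    """Squared residual norm ||Ax - b||^2."""
    r = A @ x - b
    return r @ r

# A^T A is symmetric, so eigvalsh applies; its eigenvalues should be >= 0.
min_eig = np.linalg.eigvalsh(A.T @ A).min()

# Midpoint convexity: f((x + y)/2) <= (f(x) + f(y))/2 for any x, y.
x = rng.standard_normal(3)
y = rng.standard_normal(3)
mid_gap = 0.5 * f(x) + 0.5 * f(y) - f(0.5 * (x + y))

print(min_eig, mid_gap)
```

Both printed numbers should be nonnegative up to rounding error. A single random trial is evidence, not a proof.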

$$\frac{d}{d\vx}f(\vx) = 0$$

So:

$$ f(\vx) = \Vert A\vx - \vb\Vert^2 $$

$$ = (A\vx - \vb)^T(A\vx - \vb)$$

$$ = (\vx^TA^T - \vb^T)(A\vx - \vb)$$

$$ = \vx^TA^TA\vx - \vb^TA\vx - \vx^TA^T\vb + \vb^T\vb$$

$$ = \vx^TA^TA\vx - 2\vx^TA^T\vb + \vb^T\vb$$

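The two cross terms combine because $\vb^TA\vx$ is a scalar and therefore equals its own transpose, $\vx^TA^T\vb$. Now, differentiating term by term,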

$$ \frac{d}{d\vx}f(\vx) = 2A^TA\vx - 2A^T\vb$$
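One way to gain confidence in this formula is to compare it with a finite-difference approximation of the gradient. The following is a minimal sketch on random data. The step size `eps` and the problem dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # illustrative random matrix
b = rng.standard_normal(5)
x = rng.standard_normal(3)        # point at which to check the gradient

def f(x):
    """Squared residual norm ||Ax - b||^2."""
    r = A @ x - b
    return r @ r

# Gradient from the closed-form expression 2 A^T A x - 2 A^T b.
grad_formula = 2 * A.T @ A @ x - 2 * A.T @ b

# Central differences: perturb one coordinate at a time.
eps = 1e-6
grad_numeric = np.zeros_like(x)
for i in range(x.size):
    e = np.zeros_like(x)
    e[i] = eps
    grad_numeric[i] = (f(x + e) - f(x - e)) / (2 * eps)

max_err = np.max(np.abs(grad_formula - grad_numeric))
print(max_err)
```

The two gradients should agree to within rounding error, since central differences are exact for quadratics apart from floating-point effects.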

Setting the derivative to zero and dividing by 2, we recover the normal equations:

$$ A^TA\vx = A^T\vb$$
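As an illustration, the normal equations can be solved directly with NumPy and compared against `np.linalg.lstsq`. The small line-fitting data set below is invented for this sketch, and the direct solve assumes that the columns of $A$ are linearly independent, so that $A^TA$ is invertible.

```python
import numpy as np

# Hypothetical data: fit a line c0 + c1 * t to four points.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's least-squares routine.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

# The gradient 2 A^T A x - 2 A^T b should vanish at the solution.
grad_at_x_hat = 2 * A.T @ A @ x_hat - 2 * A.T @ b

print(x_hat, x_ref, grad_at_x_hat)
```

In practice, forming $A^TA$ explicitly can lose accuracy when $A$ is ill-conditioned, which is one reason library routines such as `lstsq` use other factorizations. For a small, well-conditioned example like this one, the two answers should match.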

To justify the first term, let's see how to take the derivative of $\vx^TA^TA\vx$. For simplicity, we'll do this for the case in which $A^TA$ is $2\times 2$.

So $$A^TA = \mat{{cc}a_{11}&a_{12}\\a_{21}&a_{22}}.$$ 

Now $$A^TA\vx = \mat{{c}a_{11}x_1 + a_{12}x_2\\a_{21}x_1+a_{22}x_2}$$

So $$\vx^TA^TA\vx = a_{11}x_1^2 + a_{12}x_1x_2 + a_{21}x_2x_1 + a_{22}x_2^2$$

Let's denote this function $$h(x_1, x_2) = a_{11}x_1^2 + a_{12}x_1x_2 + a_{21}x_2x_1 + a_{22}x_2^2$$

Now $\frac{d}{d\vx} \vx^TA^TA\vx = \frac{d}{d\vx} h(x_1,x_2)$ is defined as a two-element vector whose entries are the derivatives of $h$ with respect to each variable:

$$\mat{{c}\frac{d}{dx_1}h(x_1,x_2)\\\frac{d}{dx_2}h(x_1,x_2)}$$

So $$\frac{d}{d\vx} h(x_1,x_2) = \mat{{c}2a_{11}x_1+a_{12}x_2+a_{21}x_2\\2a_{22}x_2+a_{12}x_1+a_{21}x_1}$$

Now $A^TA$ is symmetric, so $a_{12} = a_{21}$. So the above is the same as:

$$\frac{d}{d\vx} h(x_1,x_2) = \mat{{c}2a_{11}x_1+2a_{12}x_2\\2a_{21}x_1+2a_{22}x_2}= 2A^TA \vx.$$
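The $2\times 2$ calculation above can be replayed numerically. In this sketch, `M` is a made-up symmetric matrix standing in for $A^TA$, and `x` is an arbitrary test point. The code evaluates the entry-by-entry partial derivatives and compares them with the compact form $2A^TA\vx$.

```python
import numpy as np

# Hypothetical symmetric 2x2 matrix playing the role of A^T A.
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([0.5, -1.5])   # arbitrary test point (x1, x2)

def h(x1, x2):
    """The quadratic form written out entry by entry."""
    return (M[0, 0] * x1**2 + M[0, 1] * x1 * x2
            + M[1, 0] * x2 * x1 + M[1, 1] * x2**2)

# Partial derivatives of h, term by term as in the derivation.
partials = np.array([
    2 * M[0, 0] * x[0] + M[0, 1] * x[1] + M[1, 0] * x[1],
    2 * M[1, 1] * x[1] + M[0, 1] * x[0] + M[1, 0] * x[0],
])

# Compact matrix form, valid because M is symmetric.
compact = 2 * M @ x

# The written-out h should also match the matrix expression x^T M x.
quad_form = x @ M @ x

print(partials, compact, h(*x), quad_form)
```

If `M` were not symmetric, the entry-by-entry partials would still be correct, but they would no longer equal `2 * M @ x`. The symmetry of $A^TA$ is what makes the compact form valid.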