{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "You can read an overview of this Numerical Linear Algebra course in [this blog post](http://www.fast.ai/2017/07/17/num-lin-alg/). The course was originally taught in the [University of San Francisco MS in Analytics](https://www.usfca.edu/arts-sciences/graduate-programs/analytics) graduate program. Course lecture videos are [available on YouTube](https://www.youtube.com/playlist?list=PLtmWHNX-gukIc92m1K0P6bIOnZb-mg0hY) (note that the notebook numbers and video numbers do not line up, since some notebooks took longer than 1 video to cover).\n", "\n", "You can ask questions about the course on [our fast.ai forums](http://forums.fast.ai/c/lin-alg)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 6. How to Implement Linear Regression" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "In the previous lesson, we calculated the least squares linear regression for a diabetes dataset, using scikit learn's implementation. Today, we will look at how we could write our own implementation." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn import datasets, linear_model, metrics\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import PolynomialFeatures\n", "import math, scipy, numpy as np\n", "from scipy import linalg" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "np.set_printoptions(precision=6)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data = datasets.load_diabetes()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "feature_names=['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "trn,test,y_trn,y_test = train_test_split(data.data, data.target, test_size=0.2)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((353, 10), (89, 10))" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trn.shape, test.shape" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "def regr_metrics(act, pred):\n", " return (math.sqrt(metrics.mean_squared_error(act, pred)), \n", " metrics.mean_absolute_error(act, pred))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How did sklearn do it?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How is sklearn doing this? By checking [the source code](https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/linear_model/base.py#L417), you can see that in the dense case, it calls [scipy.linalg.lstqr](https://github.com/scipy/scipy/blob/v0.19.0/scipy/linalg/basic.py#L892-L1058), which is calling a LAPACK method:\n", "\n", " Options are ``'gelsd'``, ``'gelsy'``, ``'gelss'``. Default\n", " (``'gelsd'``) is a good choice. However, ``'gelsy'`` can be slightly\n", " faster on many problems. ``'gelss'`` was used historically. It is\n", " generally slow but uses less memory.\n", "\n", "- [gelsd](https://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_lapack_examples/_gelsd.htm): uses SVD and a divide-and-conquer method\n", "- [gelsy](https://software.intel.com/en-us/node/521113): uses QR factorization\n", "- [gelss](https://software.intel.com/en-us/node/521114): uses SVD" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "#### Scipy Sparse Least Squares" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "We will not get into too much detail about the sparse version of least squares. Here is a bit of info if you are interested: " ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "[Scipy sparse lsqr](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.linalg.lsqr.html#id1) uses an iterative method called [Golub and Kahan bidiagonalization](https://web.stanford.edu/class/cme324/paige-saunders2.pdf)." ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "from [scipy sparse lsqr source code](https://github.com/scipy/scipy/blob/v0.14.0/scipy/sparse/linalg/isolve/lsqr.py#L96):\n", " Preconditioning is another way to reduce the number of iterations. If it is possible to solve a related system ``M*x = b`` efficiently, where M approximates A in some helpful way (e.g. M - A has low rank or its elements are small relative to those of A), LSQR may converge more rapidly on the system ``A*M(inverse)*z = b``, after which x can be recovered by solving M\\*x = z.\n", " \n", "If A is symmetric, LSQR should not be used! Alternatives are the symmetric conjugate-gradient method (cg) and/or SYMMLQ. SYMMLQ is an implementation of symmetric cg that applies to any symmetric A and will converge more rapidly than LSQR. If A is positive definite, there are other implementations of symmetric cg that require slightly less work per iteration than SYMMLQ (but will take the same number of iterations)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### linalg.lstqr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sklearn implementation handled adding a constant term (since the y-intercept is presumably not 0 for the line we are learning) for us. We will need to do that by hand now:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "trn_int = np.c_[trn, np.ones(trn.shape[0])]\n", "test_int = np.c_[test, np.ones(test.shape[0])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since `linalg.lstsq` lets us specify which LAPACK routine we want to use, lets try them all and do some timing comparisons:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "290 µs ± 9.24 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "%timeit coef, _,_,_ = linalg.lstsq(trn_int, y_trn, lapack_driver=\"gelsd\")" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "140 µs ± 91.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%timeit coef, _,_,_ = linalg.lstsq(trn_int, y_trn, lapack_driver=\"gelsy\")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "199 µs ± 228 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "%timeit coef, _,_,_ = linalg.lstsq(trn_int, y_trn, lapack_driver=\"gelss\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Naive Solution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that we want to find $\\hat{x}$ that minimizes: \n", "$$ \\big\\vert\\big\\vert Ax - b \\big\\vert\\big\\vert_2$$\n", "\n", "Another way to think about this is that we are interested in where vector $b$ is closest to the subspace spanned by $A$ (called the *range of* $A$). This is the projection of $b$ onto $A$. Since $b - A\\hat{x}$ must be perpendicular to the subspace spanned by $A$, we see that\n", "\n", "$$A^T (b - A\\hat{x}) = 0 $$\n", "\n", "(we are using $A^T$ because we want to multiply each column of $A$ by $b - A\\hat{x}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This leads us to the *normal equations*:\n", "$$ x = (A^TA)^{-1}A^T b $$" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def ls_naive(A, b):\n", " return np.linalg.inv(A.T @ A) @ A.T @ b" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "45.8 µs ± 4.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%timeit coeffs_naive = ls_naive(trn_int, y_trn)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(57.94102134545707, 48.053565198516438)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coeffs_naive = ls_naive(trn_int, y_trn)\n", "regr_metrics(y_test, test_int @ coeffs_naive)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Normal Equations (Cholesky)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normal equations:\n", "$$ A^TA x = A^T b $$\n", "\n", "If $A$ has full rank, the pseudo-inverse $(A^TA)^{-1}A^T$ is a **square, hermitian positive definite** matrix. The standard way of solving such a system is *Cholesky Factorization*, which finds upper-triangular R s.t. $A^TA = R^TR$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following steps are based on Algorithm 11.1 from Trefethen:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "A = trn_int" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true }, "outputs": [], "source": [ "b = y_trn" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "AtA = A.T @ A\n", "Atb = A.T @ b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Warning:** Numpy and Scipy default to different upper/lower for Cholesky" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "R = scipy.linalg.cholesky(AtA)" ] }, { "cell_type": "code", "execution_count": 196, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.9124, 0.1438, 0.1511, 0.3002, 0.2228, 0.188 ,\n", " -0.051 , 0.1746, 0.22 , 0.2768, -0.2583],\n", " [ 0. , 0.8832, 0.0507, 0.1826, -0.0251, 0.0928,\n", " -0.3842, 0.2999, 0.0911, 0.15 , 0.4393],\n", " [ 0. , 0. , 0.8672, 0.2845, 0.2096, 0.2153,\n", " -0.2695, 0.3181, 0.3387, 0.2894, -0.005 ],\n", " [ 0. , 0. , 0. , 0.7678, 0.0762, -0.0077,\n", " 0.0383, 0.0014, 0.165 , 0.166 , 0.0234],\n", " [ 0. , 0. , 0. , 0. , 0.8288, 0.7381,\n", " 0.1145, 0.4067, 0.3494, 0.158 , -0.2826],\n", " [ 0. , 0. , 0. , 0. , 0. , 0.3735,\n", " -0.3891, 0.2492, -0.3245, -0.0323, -0.1137],\n", " [ 0. , 0. , 0. , 0. , 0. , 0. ,\n", " 0.6406, -0.511 , -0.5234, -0.172 , -0.9392],\n", " [ 0. , 0. , 0. , 0. , 0. , 0. ,\n", " 0. , 0.2887, -0.0267, -0.0062, 0.0643],\n", " [ 0. , 0. , 0. , 0. , 0. , 0. ,\n", " 0. , 0. , 0.2823, 0.0636, 0.9355],\n", " [ 0. , 0. , 0. , 0. , 0. , 0. ,\n", " 0. , 0. , 0. , 0.7238, 0.0202],\n", " [ 0. , 0. , 0. , 0. , 0. , 0. ,\n", " 0. , 0. , 0. , 0. , 18.7319]])" ] }, "execution_count": 196, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.set_printoptions(suppress=True, precision=4)\n", "R" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "check our factorization:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.5140158187158533e-16" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.norm(AtA - R.T @ R)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ A^T A x = A^T b $$\n", "$$ R^T R x = A^T b $$\n", "$$ R^T w = A^T b $$\n", "$$ R x = w $$" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "w = scipy.linalg.solve_triangular(R, Atb, lower=False, trans='T')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's always good to check that our result is what we expect it to be: (in case we entered the wrong params, the function didn't return what we thought, or sometimes the docs are even outdated)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.1368683772161603e-13" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.norm(R.T @ w - Atb)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "coeffs_chol = scipy.linalg.solve_triangular(R, w, lower=False)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6.861429794408013e-14" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.norm(R @ coeffs_chol - w)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def ls_chol(A, b):\n", " R = scipy.linalg.cholesky(A.T @ A)\n", " w = scipy.linalg.solve_triangular(R, A.T @ b, trans='T')\n", " return scipy.linalg.solve_triangular(R, w)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "111 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ "%timeit coeffs_chol = ls_chol(trn_int, y_trn)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(57.9410213454571, 48.053565198516438)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coeffs_chol = ls_chol(trn_int, y_trn)\n", "regr_metrics(y_test, test_int @ coeffs_chol)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### QR Factorization" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ A x = b $$\n", "$$ A = Q R $$\n", "$$ Q R x = b $$\n", "\n", "$$ R x = Q^T b $$" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def ls_qr(A,b):\n", " Q, R = scipy.linalg.qr(A, mode='economic')\n", " return scipy.linalg.solve_triangular(R, Q.T @ b)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "205 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "%timeit coeffs_qr = ls_qr(trn_int, y_trn)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(57.94102134545711, 48.053565198516452)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coeffs_qr = ls_qr(trn_int, y_trn)\n", "regr_metrics(y_test, test_int @ coeffs_qr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SVD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ A x = b $$\n", "\n", "$$ A = U \\Sigma V $$\n", "\n", "$$ \\Sigma V x = U^T b $$\n", "\n", "$$ \\Sigma w = U^T b $$\n", "\n", "$$ x = V^T w $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SVD gives the pseudo-inverse" ] }, { "cell_type": "code", "execution_count": 253, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def ls_svd(A,b):\n", " m, n = A.shape\n", " U, sigma, Vh = scipy.linalg.svd(A, full_matrices=False, lapack_driver='gesdd')\n", " w = (U.T @ b)/ sigma\n", " return Vh.T @ w" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.11 ms ± 320 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "%timeit coeffs_svd = ls_svd(trn_int, y_trn)" ] }, { "cell_type": "code", "execution_count": 254, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "266 µs ± 8.49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" ] } ], "source": [ "%timeit coeffs_svd = ls_svd(trn_int, y_trn)" ] }, { "cell_type": "code", "execution_count": 255, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(57.941021345457244, 48.053565198516687)" ] }, "execution_count": 255, "metadata": {}, "output_type": "execute_result" } ], "source": [ "coeffs_svd = ls_svd(trn_int, y_trn)\n", "regr_metrics(y_test, test_int @ coeffs_svd)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Random Sketching Technique for Least Squares Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Linear Sketching](http://researcher.watson.ibm.com/researcher/files/us-dpwoodru/journal.pdf) (Woodruff)\n", "\n", "1. Sample a r x n random matrix S, r << n\n", "2. Compute S A and S b\n", "3. Find exact solution x to regression SA x = Sb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Timing Comparison" ] }, { "cell_type": "code", "execution_count": 244, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import timeit\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 245, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def scipylstq(A, b):\n", " return scipy.linalg.lstsq(A,b)[0]" ] }, { "cell_type": "code", "execution_count": 246, "metadata": { "collapsed": true }, "outputs": [], "source": [ "row_names = ['Normal Eqns- Naive',\n", " 'Normal Eqns- Cholesky', \n", " 'QR Factorization', \n", " 'SVD', \n", " 'Scipy lstsq']\n", "\n", "name2func = {'Normal Eqns- Naive': 'ls_naive', \n", " 'Normal Eqns- Cholesky': 'ls_chol', \n", " 'QR Factorization': 'ls_qr',\n", " 'SVD': 'ls_svd',\n", " 'Scipy lstsq': 'scipylstq'}" ] }, { "cell_type": "code", "execution_count": 247, "metadata": { "collapsed": true }, "outputs": [], "source": [ "m_array = np.array([100, 1000, 10000])\n", "n_array = np.array([20, 100, 1000])" ] }, { "cell_type": "code", "execution_count": 248, "metadata": { "collapsed": true }, "outputs": [], "source": [ "index = pd.MultiIndex.from_product([m_array, n_array], names=['# rows', '# cols'])" ] }, { "cell_type": "code", "execution_count": 249, "metadata": { "collapsed": true }, "outputs": [], "source": [ "pd.options.display.float_format = '{:,.6f}'.format\n", "df = pd.DataFrame(index=row_names, columns=index)\n", "df_error = pd.DataFrame(index=row_names, columns=index)" ] }, { "cell_type": "code", "execution_count": 256, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# %%prun\n", "for m in m_array:\n", " for n in n_array:\n", " if m >= n: \n", " x = np.random.uniform(-10,10,n)\n", " A = np.random.uniform(-40,40,[m,n]) # removed np.asfortranarray\n", " b = np.matmul(A, x) + np.random.normal(0,2,m)\n", " for name in row_names:\n", " fcn = name2func[name]\n", " t = timeit.timeit(fcn + '(A,b)', number=5, globals=globals())\n", " df.set_value(name, (m,n), t)\n", " coeffs = locals()[fcn](A, b)\n", " reg_met = regr_metrics(b, A @ coeffs)\n", " df_error.set_value(name, (m,n), reg_met[0])" ] }, { "cell_type": "code", "execution_count": 257, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
# rows100100010000
# cols201001000201001000201001000
Normal Eqns- Naive0.0012760.003634NaN0.0009600.0051720.2931260.0022260.0212481.164655
Normal Eqns- Cholesky0.0016600.003958NaN0.0016650.0040070.0936960.0019280.0104560.399464
QR Factorization0.0021740.006486NaN0.0042350.0177730.2132320.0192290.1161222.208129
SVD0.0038800.021737NaN0.0046720.0269501.2804900.0181380.1306523.433003
Scipy lstsq0.0043380.020198NaN0.0043200.0211991.0838040.0122000.0884672.134780
\n", "
" ], "text/plain": [ "# rows 100 1000 \\\n", "# cols 20 100 1000 20 100 1000 \n", "Normal Eqns- Naive 0.001276 0.003634 NaN 0.000960 0.005172 0.293126 \n", "Normal Eqns- Cholesky 0.001660 0.003958 NaN 0.001665 0.004007 0.093696 \n", "QR Factorization 0.002174 0.006486 NaN 0.004235 0.017773 0.213232 \n", "SVD 0.003880 0.021737 NaN 0.004672 0.026950 1.280490 \n", "Scipy lstsq 0.004338 0.020198 NaN 0.004320 0.021199 1.083804 \n", "\n", "# rows 10000 \n", "# cols 20 100 1000 \n", "Normal Eqns- Naive 0.002226 0.021248 1.164655 \n", "Normal Eqns- Cholesky 0.001928 0.010456 0.399464 \n", "QR Factorization 0.019229 0.116122 2.208129 \n", "SVD 0.018138 0.130652 3.433003 \n", "Scipy lstsq 0.012200 0.088467 2.134780 " ] }, "execution_count": 257, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 252, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
# rows100100010000
# cols201001000201001000201001000
Normal Eqns- Naive1.7027420.000000NaN1.9707671.9048730.0000001.9783831.9804491.884440
Normal Eqns- Cholesky1.7027420.000000NaN1.9707671.9048730.0000001.9783831.9804491.884440
QR Factorization1.7027420.000000NaN1.9707671.9048730.0000001.9783831.9804491.884440
SVD1.7027420.000000NaN1.9707671.9048730.0000001.9783831.9804491.884440
Scipy lstsq1.7027420.000000NaN1.9707671.9048730.0000001.9783831.9804491.884440
\n", "
" ], "text/plain": [ "# rows 100 1000 \\\n", "# cols 20 100 1000 20 100 1000 \n", "Normal Eqns- Naive 1.702742 0.000000 NaN 1.970767 1.904873 0.000000 \n", "Normal Eqns- Cholesky 1.702742 0.000000 NaN 1.970767 1.904873 0.000000 \n", "QR Factorization 1.702742 0.000000 NaN 1.970767 1.904873 0.000000 \n", "SVD 1.702742 0.000000 NaN 1.970767 1.904873 0.000000 \n", "Scipy lstsq 1.702742 0.000000 NaN 1.970767 1.904873 0.000000 \n", "\n", "# rows 10000 \n", "# cols 20 100 1000 \n", "Normal Eqns- Naive 1.978383 1.980449 1.884440 \n", "Normal Eqns- Cholesky 1.978383 1.980449 1.884440 \n", "QR Factorization 1.978383 1.980449 1.884440 \n", "SVD 1.978383 1.980449 1.884440 \n", "Scipy lstsq 1.978383 1.980449 1.884440 " ] }, "execution_count": 252, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_error" ] }, { "cell_type": "code", "execution_count": 618, "metadata": { "collapsed": true }, "outputs": [], "source": [ "store = pd.HDFStore('least_squares_results.h5')" ] }, { "cell_type": "code", "execution_count": 619, "metadata": { "collapsed": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\rache\\Anaconda3\\lib\\site-packages\\IPython\\core\\interactiveshell.py:2881: PerformanceWarning: \n", "your performance may suffer as PyTables will pickle object types that it cannot\n", "map directly to c-types [inferred_type->floating,key->block0_values] [items->[(100, 20), (100, 100), (100, 1000), (1000, 20), (1000, 100), (1000, 1000), (5000, 20), (5000, 100), (5000, 1000)]]\n", "\n", " exec(code_obj, self.user_global_ns, self.user_ns)\n" ] } ], "source": [ "store['df'] = df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Notes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I used the magick %prun to profile my code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternative: least absolute deviation (L1 regression)\n", "- Less sensitive to outliers than least squares.\n", "- No closed form solution, but can solve with linear programming" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "## Conditioning & stability" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "#### Condition Number" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "*Condition number* is a measure of how small changes to the input cause the output to change.\n", "\n", "**Question**: Why do we care about behavior with small changes to the input in numerical linear algebra?" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "The *relative condition number* is defined by\n", "\n", "$$ \\kappa = \\sup_{\\delta x} \\frac{\\|\\delta f\\|}{\\| f(x) \\|}\\bigg/ \\frac{\\| \\delta x \\|}{\\| x \\|} $$\n", " \n", "where $\\delta x$ is infinitesimal\n", "\n", "According to Trefethen (pg. 91), a problem is *well-conditioned* if $\\kappa$ is small (e.g. $1$, $10$, $10^2$) and *ill-conditioned* if $\\kappa$ is large (e.g. $10^6$, $10^{16}$)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "**Conditioning**: perturbation behavior of a mathematical problem (e.g. least squares)\n", "\n", "**Stability**: perturbation behavior of an algorithm used to solve that problem on a computer (e.g. least squares algorithms, householder, back substitution, gaussian elimination)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "#### Conditioning example" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "The problem of computing eigenvalues of a non-symmetric matrix is often ill-conditioned" ] }, { "cell_type": "code", "execution_count": 178, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "A = [[1, 1000], [0, 1]]\n", "B = [[1, 1000], [0.001, 1]]" ] }, { "cell_type": "code", "execution_count": 179, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "wA, vrA = scipy.linalg.eig(A)\n", "wB, vrB = scipy.linalg.eig(B)" ] }, { "cell_type": "code", "execution_count": 180, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "(array([ 1.+0.j, 1.+0.j]),\n", " array([ 2.00000000e+00+0.j, -2.22044605e-16+0.j]))" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wA, wB" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "#### Condition Number of a Matrix" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "The product $\\| A\\| \\|A^{-1} \\|$ comes up so often it has its own name: the *condition number* of $A$. Note that normally we talk about the conditioning of problems, not matrices.\n", "\n", "The *condition number* of $A$ relates to:\n", "- computing $b$ given $A$ and $x$ in $Ax = b$\n", "- computing $x$ given $A$ and $b$ in $Ax = b$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loose ends from last time" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Full vs Reduced Factorizations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**SVD**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Diagrams from Trefethen:\n", "\n", "\"\"\n", "\n", "\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**QR Factorization exists for ALL matrices**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just like with SVD, there are full and reduced versions of the QR factorization.\n", "\n", "\"\"\n", "\n", "\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matrix Inversion is Unstable" ] }, { "cell_type": "code", "execution_count": 197, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from scipy.linalg import hilbert" ] }, { "cell_type": "code", "execution_count": 229, "metadata": { "collapsed": true }, "outputs": [], "source": [ "n = 14\n", "A = hilbert(n)\n", "x = np.random.uniform(-10,10,n)\n", "b = A @ x" ] }, { "cell_type": "code", "execution_count": 230, "metadata": { "collapsed": true }, "outputs": [], "source": [ "A_inv = np.linalg.inv(A)" ] }, { "cell_type": "code", "execution_count": 233, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5.0516495470543212" ] }, "execution_count": 233, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.norm(np.eye(n) - A @ A_inv)" ] }, { "cell_type": "code", "execution_count": 231, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.2271635826494112e+17" ] }, "execution_count": 231, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linalg.cond(A)" ] }, { "cell_type": "code", "execution_count": 232, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1. , 0. , -0.0001, 0.0005, -0.0006, 0.0105, -0.0243,\n", " 0.1862, -0.6351, 2.2005, -0.8729, 0.8925, -0.0032, -0.0106],\n", " [ 0. , 1. , -0. , 0. , 0.0035, 0.0097, -0.0408,\n", " 0.0773, -0.0524, 1.6926, -0.7776, -0.111 , -0.0403, -0.0184],\n", " [ 0. , 0. , 1. , 0.0002, 0.0017, 0.0127, -0.0273,\n", " 0. , 0. , 1.4688, -0.5312, 0.2812, 0.0117, 0.0264],\n", " [ 0. , 0. , -0. , 1.0005, 0.0013, 0.0098, -0.0225,\n", " 0.1555, -0.0168, 1.1571, -0.9656, -0.0391, 0.018 , -0.0259],\n", " [-0. , 0. , -0. , 0.0007, 1.0001, 0.0154, 0.011 ,\n", " -0.2319, 0.5651, -0.2017, 0.2933, -0.6565, 0.2835, -0.0482],\n", " [ 0. , -0. , 0. , -0.0004, 0.0059, 0.9945, -0.0078,\n", " -0.0018, -0.0066, 1.1839, -0.9919, 0.2144, -0.1866, 0.0187],\n", " [-0. , 0. , -0. , 0.0009, -0.002 , 0.0266, 0.974 ,\n", " -0.146 , 0.1883, -0.2966, 0.4267, -0.8857, 0.2265, -0.0453],\n", " [ 0. , 0. , -0. , 0.0002, 0.0009, 0.0197, -0.0435,\n", " 1.1372, -0.0692, 0.7691, -1.233 , 0.1159, -0.1766, -0.0033],\n", " [ 0. , 0. , -0. , 0.0002, 0. , -0.0018, -0.0136,\n", " 0.1332, 0.945 , 0.3652, -0.2478, -0.1682, 0.0756, -0.0212],\n", " [ 0. , -0. , -0. , 0.0003, 0.0038, -0.0007, 0.0318,\n", " -0.0738, 0.2245, 1.2023, -0.2623, -0.2783, 0.0486, -0.0093],\n", " [-0. , 0. , -0. , 0.0004, -0.0006, 0.013 , -0.0415,\n", " 0.0292, -0.0371, 0.169 , 1.0715, -0.09 , 0.1668, -0.0197],\n", " [ 0. , -0. , 0. , 0. , 0.0016, 0.0062, -0.0504,\n", " 0.1476, -0.2341, 0.8454, -0.7907, 1.4812, -0.15 , 0.0186],\n", " [ 0. , -0. , 0. , -0.0001, 0.0022, 0.0034, -0.0296,\n", " 0.0944, -0.1833, 0.6901, -0.6526, 0.2556, 0.8563, 0.0128],\n", " [ 0. , 0. , 0. , -0.0001, 0.0018, -0.0041, -0.0057,\n", " -0.0374, -0.165 , 0.3968, -0.2264, -0.1538, -0.0076, 1.005 ]])" ] }, "execution_count": 232, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A @ A_inv" ] }, { "cell_type": "code", "execution_count": 237, "metadata": { "collapsed": true }, "outputs": [], "source": [ "row_names = ['Normal Eqns- Naive',\n", " 'QR Factorization', \n", " 'SVD', \n", " 'Scipy lstsq']\n", "\n", "name2func = {'Normal Eqns- Naive': 'ls_naive', \n", " 'QR Factorization': 'ls_qr',\n", " 'SVD': 'ls_svd',\n", " 'Scipy lstsq': 'scipylstq'}" ] }, { "cell_type": "code", "execution_count": 238, "metadata": { "collapsed": true }, "outputs": [], "source": [ "pd.options.display.float_format = '{:,.9f}'.format\n", "df = pd.DataFrame(index=row_names, columns=['Time', 'Error'])" ] }, { "cell_type": "code", "execution_count": 239, "metadata": { "collapsed": true }, "outputs": [], "source": [ "for name in row_names:\n", " fcn = name2func[name]\n", " t = timeit.timeit(fcn + '(A,b)', number=5, globals=globals())\n", " coeffs = locals()[fcn](A, b)\n", " df.set_value(name, 'Time', t)\n", " df.set_value(name, 'Error', regr_metrics(b, A @ coeffs)[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### SVD is best here!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "DO NOT RERUN" ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeError
Normal Eqns- Naive0.0013343393.598901966
QR Factorization0.0021661390.000000000
SVD0.0015569370.000000000
Scipy lstsq0.0018715900.000000000
\n", "
" ], "text/plain": [ " Time Error\n", "Normal Eqns- Naive 0.001334339 3.598901966\n", "QR Factorization 0.002166139 0.000000000\n", "SVD 0.001556937 0.000000000\n", "Scipy lstsq 0.001871590 0.000000000" ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeError
Normal Eqns- Naive0.0013343393.598901966
QR Factorization0.0021661390.000000000
SVD0.0015569370.000000000
Scipy lstsq0.0018715900.000000000
\n", "
" ], "text/plain": [ " Time Error\n", "Normal Eqns- Naive 0.001334339 3.598901966\n", "QR Factorization 0.002166139 0.000000000\n", "SVD 0.001556937 0.000000000\n", "Scipy lstsq 0.001871590 0.000000000" ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Another reason not to take inverse**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even if $A$ is incredibly sparse, $A^{-1}$ is generally dense. For large matrices, $A^{-1}$ could be so dense as to not fit in memory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Runtime" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Matrix Inversion: $2n^3$\n", "\n", "Matrix Multiplication: $n^3$\n", "\n", "Cholesky: $\\frac{1}{3}n^3$\n", "\n", "QR, Gram Schmidt: $2mn^2$, $m\\geq n$ (chapter 8 of Trefethen)\n", "\n", "QR, Householder: $2mn^2 - \\frac{2}{3}n^3$ (chapter 10 of Trefethen)\n", "\n", "Solving a triangular system: $n^2$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Why Cholesky Factorization is Fast:**\n", "\n", "\"\"\n", "(source: [Stanford Convex Optimization: Numerical Linear Algebra Background Slides](http://stanford.edu/class/ee364a/lectures/num-lin-alg.pdf))" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### A Case Where QR is the Best" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "m=100\n", "n=15\n", "t=np.linspace(0, 1, m)" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "# Vandermonde matrix\n", "A=np.stack([t**i for i in range(n)], 1)" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "b=np.exp(np.sin(4*t))\n", "\n", "# This will turn out to normalize the solution to be 1\n", "b /= 2006.787453080206" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "from matplotlib import pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAD8CAYAAABpcuN4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlcVdX+//HXh3lQQBRREUVyBGdxrCwrc0jDTCsbtG43\n8+u1urfR6qY222TdhmtXm7RumZUl5VTZoKWWOAKCgjiAguCEAwIC6/cHp/szMjjKsM/weT4ePM45\n+6x1zmdFnjdnr73XFmMMSiml1J/xsLoApZRSjk2DQimlVJU0KJRSSlVJg0IppVSVNCiUUkpVSYNC\nKaVUlTQolFJKVUmDQimlVJXsCgoRGSoi20UkQ0SmnuV5EZFXbc9vFZGe1fUVkbEikiIi5SISd5bX\nbCUiJ0Tk/vMdnFJKqZrzqq6BiHgCbwCDgWxgvYgkGGO2ndFsGNDO9tMXmA30raZvMjAa+M+fvPUs\nYJk9g2jSpImJioqyp6lSSimbDRs2HDTGhFXXrtqgAPoAGcaYTAARWQDEA2cGRTww31SsB7JOREJE\npDkQ9Wd9jTGptm1/eEMRGQXsAk7aUR9RUVEkJiba01QppZSNiOyxp509u54igKwzHmfbttnTxp6+\nvyMiDYCHgMeraTdRRBJFJDE/P7/KASillDp/jjiZPQN42RhzoqpGxpg5xpg4Y0xcWFi135yUUkqd\nJ3t2Pe0DIs943NK2zZ423nb0rawvMEZEngdCgHIRKTLGvG5HrUoppWqZPUGxHmgnIm2o+JC/Abix\nUpsEYIptDqIvUGCMyRGRfDv6/o4x5uLf7ovIDOCEhoRSSlmn2qAwxpSKyBRgBeAJvGOMSRGRSbbn\n3wSWAsOBDKAQuK2qvgAicg3wGhAGLBGRzcaYIbU9QKWUUjUjrnDhori4OKNHPSml1LkRkQ3GmD+c\nx1aZI05mK6WUciD2zFEo5RROFJeyPfc4ew6d5ERxKSeLyygpLSfY34tGgT40aeBLx2YNadzA1+pS\nlXIqGhTKaZ0oLmX1jny+Tc3jl12HyD5yyq5+ESH+dIsMZlCHplwZ04zgAO86rlQp56ZBoZyKMYZf\ndx1m/to9fLPtACVl5QT7e3NR2yaM69OKDuENiQ4LJMjfm0AfL7w9hWNFpRw+WULesSKS9xewJbuA\njXuOsDQpl4c9krjQ1ndwTDieHn9cKUApd6eT2coplJcbErbs580fd5KWe5xgf29G94xgSGwz4lo3\nwsvz3KbbjDFszS5gaXIOX23JYd/RU0Q1DuD2i6MZ26slft6edTQSpRyHvZPZGhTK4a3akc/MZWls\nyzlGx2YNue3CKK7uFoG/T+18mJeWlbMi5QBzVu1kS3YBESH+PDK8E8O7NDvrWmRKuQoNCuX0cguK\nePTzJFam5dGykT8PDOnAyK4t8Kij3UPGGH7OOMRTS7aRlnucPlGhPDEqlo7Ngurk/ZSymgaFclrG\nGD5JzObJJds4XVbOvYPbM2FAFL5e9bM7qKzcsDAxixdXbOdY0WnuHdyBiQOjdf5CuRx7g0Ins5VD\nOVpYwn0Lt7AyLY++bUJ5fkxXWjcOrNcaPD2EcX1aMSS2GY9+nsRzy9NYmXqAWdd1p1XjgHqtRSlH\noCfcKYeRlF3AiNd+YnX6QWaMjOGjO/rVe0icKTTQh3/f1JOXr+/G9gPHGfHaan7YnmdZPUpZRYNC\nOYSF67O4dvYayssNCyf159YL29TZXMS5EBGu6dGSpXdfTIsQf257bz3//iEDV9hlq5S9NCiUpYwx\nvLAijQc/20rf6FC+uvtiukeGWF3WH0SGBrBo8gCu6tKc55dv554FmykuLbO6LKXqhc5RKMuUlJYz\n9bOtLNq0j3F9InkyvvM5nw9RnwJ8vHhtXA86NQ/ihRXbOXyyhDdv6UUDX/1npFyb4/6rVC7tVEkZ\nt89bz6JN+7hvcHueuaaLQ4fEb0SEvw1qywtjurI28xDj5qzj4Iliq8tSqk45/r9M5XJOFpdy23u/\n8nPGQZ4f05W7Lm/ndCe2jY2LZM4tvUjPO851/1lL3rEiq0tSqs5oUKh6daK4lNveXc+vuw7z8vXd\nuS4usvpODuryTuG8f3tfcguKGDd3HfnH9ZuFck0aFKreFJaUcus7v7Jh7xFeHdeD+O4RVpdUY72j\nQnn31t7sP1rEjXN1N5RyTRoUql6UlJYz6YONbNx7hNfG9WBE1xZWl1Rr+kY35t3bepN95BQ3v/UL\nBYWnrS5JqVqlQaHqXFm54d6FmysW9xvdleFdmltdUq3rF92YtybEkZl/kjvmJ1J0Wg+dVa5Dg0LV\nKWMM0xOS+WprDg8P68h1vZ13TqI6F7Ztwqzru7F+z2Hu/mgTpWXlVpekVK3QoFB16j+rMvlg3V7u\nvCSaOy+5wOpy6tyIri2YMTKWr7cd4LHFyXoGt3IJeqaQqjPLknKYuSyNkd1aMHVoR6vLqTcTBkSR\nd7yIN77fSZsmgUwc6PoBqVybBoWqE5uzjvL3jzfTs1UIL4zp6nTnSdTUfYM7sPtgIc8uS6NNkwYM\njgm3uiSlzpvuelK1LqfgFH+dl0jTIF/mjo9zy8uKengIL47tRpeIYO5ZsInUnGNWl6TUedOgULWq\n6HQZkz7YSNHpMt6Z0JvGDXytLsky/j6ezB0fR5CfN3+dl8ghPcdCOSkNClVrjDFMW5zMlqyjvHRd\nN9qFN7S6JMuFB/kxd3wc+SeKuXuBHgmlnJNdQSEiQ0Vku4hkiMjUszwvIvKq7fmtItKzur4iMlZE\nUkSkXETiztg+WEQ2iEiS7faymg5S1Y///rKXhYnZ3HVZW4bENrO6HIfRpWUwT43qzM8Zh3jx6x1W\nl6PUOas2KETEE3gDGAbEAONEJKZSs2FAO9vPRGC2HX2TgdHAqkqvdRAYaYzpAkwA3j/3Yan6tnHv\nER7/MoVBHcL4+xXtrS7H4VwXF8m4Pq1488edLE/Osbocpc6JPd8o+gAZxphMY0wJsACIr9QmHphv\nKqwDQkSkeVV9jTGpxpjtld/MGLPJGLPf9jAF8BcR993R7QSOFpZw14ebCA/y45Xre+DpAFemc0Qz\nro6hW2QI93+ylV0HT1pdjlJ2sycoIoCsMx5n27bZ08aevlW5FthojPnDLKCITBSRRBFJzM/PP4eX\nVLXJGMP9n2ypOG/gxp4EB3hbXZLD8vXy5N839cTLU5jy4Ua9Qp5yGg47mS0iscBzwJ1ne94YM8cY\nE2eMiQsLC6vf4tT/vLV6F9+m5vHI8E50c8BLmDqaiBB/XhzTjZT9x3h2aZrV5ShlF3uCYh9w5gI9\nLW3b7GljT98/EJGWwOfAeGPMTjtqVBbYtPcIzy1PY0hsOLcOiLK6HKdxRUw4f7mwDe+t2c3y5Fyr\ny1GqWvYExXqgnYi0EREf4AYgoVKbBGC87einfkCBMSbHzr6/IyIhwBJgqjHm53Mcj6onJ4pLuWfB\nZsKD/Hh+TDe3O/O6pqYO60jXlsE8+OkW9h09ZXU5SlWp2qAwxpQCU4AVQCqw0BiTIiKTRGSSrdlS\nIBPIAOYCk6vqCyAi14hINtAfWCIiK2yvNQVoC0wTkc22n6a1M1xVW2YkpJB9pJBXbuhOsL/OS5wr\nHy8PXhvXo2IJ9o83U1auiwcqxyWusLplXFycSUxMtLoMt/HV1v1M+XATd1/Wlnuv7GB1OU7tk8Qs\nHvh0Kw8N7cj/XaqLB6r6JSIbjDFx1bVz2Mls5Zj2HT3FI4uS6NEqhLsvb2d1OU5vTK+WDO/SjFnf\nbCd5X4HV5Sh1VhoUym7l5YYHPtlCWbnhX9f3wMtT//epKRHh6VFdCA304Z4FmzhVoofMKsej/9KV\n3eav3c2anYd4bEQMrRoHWF2Oy2gU6MNLY7uzM/8kz6/QQ2aV49GgUHbJzD/BzOVpXNaxKde78OVM\nrXJRuyZM6N+ad3/ezdqdh6wuR6nf0aBQ1SotK+fehVvw8/Zk5ugueihsHXloWEeiGgfwwKdbOFFc\nanU5Sv2PBoWq1pzVmWzOOsqT8Z1pGuRndTkuK8DHixfHdmPf0VM8szTV6nKU+h8NClWl9APHeeWb\ndIZ3acbIbi2sLsflxUWFcsfF0Xz4y15W7dA1zJRj0KBQf6qs3PDAp1sJ9PXkifjOVpfjNu4d3J7o\nsEAeXpSku6CUQ9CgUH/qnZ92sTnrKDOujqWJG1/StL75eXvywpiu7C84xfPL9SgoZT0NCnVWmfkn\nePHr7QyOCedq3eVU73q1DuXWAVHMX7uHXzL1KChlLQ0K9Qfl5Yapi5Lw9fLg6VGd9SgnizwwpAOR\nof489NlWPRFPWUqDQv3BgvVZ/LrrMP+8KkaPcrJQgI8Xz43uyu5DhbyyUq+1rayjQaF+58CxIp5d\nmsqACxozNq6l1eW4vQFtm3B9XCRvrd6la0Epy2hQqN+ZvjiFkrJynrlGT6xzFI8M70SjAB8eXpRE\naVm51eUoN6RBof5neXIuy1Ny+cfg9kQ1CbS6HGUTHODNjKtjSNpXwHtrdltdjnJDGhQKgONFp5me\nkExM8yD+elEbq8tRlVzVpTmXdWzKS1/vIOtwodXlKDejQaEAeOnrHeQdL+bZ0V10+XAHJCI8Oaoz\nIjBtcTKucMEx5Tz0E0GxJeso89buZkL/KLpFhlhdjvoTESH+3Du4Pd9vz2dZcq7V5Sg3okHh5krL\nynl4URJNG/py35XtrS5HVePWAVHENA/i8S9TOF502upylJvQoHBz763ZzbacYzx+dSwN/bytLkdV\nw8vTg2dGdyHveDEvfa3nVqj6oUHhxvYfPcWsb3ZwecemDIltZnU5yk7dI0MY368189buZkvWUavL\nUW5Ag8KNPfHlNsqNYcbVsXrOhJO5b0gHwhr48s8vkikr14ltVbc0KNzU92l5LE/J5a7L2hEZqte/\ndjZBft48NqLi3Ir//rLH6nKUi9OgcEOnSsqYlpDMBWGB3HFxtNXlqPM0omtzLmrbhBdWbCfveJHV\n5SgXpkHhht74PoOsw6d4clRnfLz0fwFnJSI8ER9L8elynlmil05VdUc/JdxMZv4J5qzK5JoeEQy4\noInV5agaig5rwJ2XRPPF5v2syThodTnKRdkVFCIyVES2i0iGiEw9y/MiIq/ant8qIj2r6ysiY0Uk\nRUTKRSSu0us9bGu/XUSG1GSA6v8zxjA9IQVfLw8eHt7R6nJULfnboLZEhvozLSGFklJdNFDVvmqD\nQkQ8gTeAYUAMME5EYio1Gwa0s/1MBGbb0TcZGA2sqvR+McANQCwwFPi37XVUDS1LzmV1+kHuu7I9\nTRvqdSZchZ+3JzNGxpKRd4J3f95ldTnKBdnzjaIPkGGMyTTGlAALgPhKbeKB+abCOiBERJpX1dcY\nk2qM2X6W94sHFhhjio0xu4AM2+uoGjhZXMoTX24jpnkQN/drbXU5qpZd3imcKzqF86+V6eQUnLK6\nHOVi7AmKCCDrjMfZtm32tLGn7/m8nzpHr36XTu6xIp4c1VkX/XNR00fGUFZueOorndhWtctpPzFE\nZKKIJIpIYn5+vtXlOLSMvOO8vXoX18W1pFfrRlaXo+pIZGgAUwa1ZUlSDqvT9d+Eqj32BMU+IPKM\nxy1t2+xpY0/f83k/jDFzjDFxxpi4sLCwal7Sff02gR3g48lDQ3UC29XdMTCaqMYBTNeJbVWL7AmK\n9UA7EWkjIj5UTDQnVGqTAIy3Hf3UDygwxuTY2beyBOAGEfEVkTZUTJD/eg5jUmdYmpTLzxmHuH9I\nBxo38LW6HFXH/Lw9mX51LJn5J3lHJ7ZVLak2KIwxpcAUYAWQCiw0xqSIyCQRmWRrthTIpGLieS4w\nuaq+ACJyjYhkA/2BJSKywtYnBVgIbAOWA38zxpTV0njdysniUp5aUjGBfVNfncB2F4M6NGVwTDiv\n6sS2qiXiClfKiouLM4mJiVaX4XBmLkvjzR938tn/9adX61Cry1H1KOtwIVfM+pHBMeG8fmPP6jso\ntyQiG4wxcdW1c9rJbFW1nfknePunTK7t2VJDwg1FhgYw+dK2fLU1hzU79YxtVTMaFC7IGMOMhBT8\nvD2ZOkwnsN3VnZdE0yo0gOmLUzhdphPb6vxpULigr7cdYHX6Qe4d3J6whjqB7a78vD2ZNiKG9LwT\nzFuz2+pylBPToHAxp0rKeOLLbXQIb8gtega227u8U1MGdQjjlW/TyTumS5Gr86NB4WJm/7iTfUdP\n8Xh8rJ6BrRARpo+MpaS0nJnL0qwuRzkp/SRxIXsPFfLmjzu5ulsL+kU3troc5SCimgQycWA0izbt\nY/3uw1aXo5yQBoULeeKrbXh7CI9e1cnqUpSDmTzoAloE+zFtcYpeY1udMw0KF/F9Wh7fph7grsvb\nER6kS4ir3wvw8eKfI2JIzTmm19hW50yDwgUUl5bx+JcpRDcJ5C8XtrG6HOWghnVuxoVtG/Piiu0c\nOlFsdTnKiWhQuIC3Vu9i96FCZlwdq9fAVn9KRJgxMpbCkjJeWHG2S8EodXb6qeLk9h89xevfZTAk\nNpyB7XUVXVW1duENue3CKD5OzGJz1lGry1FOQoPCyT29NJVyY/jnVZWvTqvU2d19eTuaNPBl+uJk\nynViW9lBg8KJrck4yJKtOUy+tC2RoQFWl6OcREM/bx4Z3pEt2QV8siGr+g7K7WlQOKnTZeVMT0ih\nVWgAd14SbXU5ysmM6h5B76hGPLd8O0cLS6wuRzk4DQonNW/NbtLzTjBtRAx+3p5Wl6OcjIjw+NWd\nOVpYwqxvdlhdjnJwGhROKO9YEa98m86gDmFc3qmp1eUoJxXTIohb+rXmg3V7SNlfYHU5yoFpUDih\nZ5amUlJazrSRsYiI1eUoJ3bv4A40CvBh2uIUXOEiZqpuaFA4mV8yD/HF5v1MHBhNmyaBVpejnFxw\ngDcPDe3Ihj1HWLRxn9XlKAelQeFESm0T2BEh/vxtUFury1EuYkyvlnSPDOHZZWkcKzptdTnKAWlQ\nOJH5a/eQlnucx0bE4O+jE9iqdnh4CE/Gd+bQyWJe1oltdRYaFE4i73gRL3+zg4HtwxgSG251OcrF\ndGkZzI19WjFvzW5Sc45ZXY5yMBoUTmLm0jSKS8uZMTJGJ7BVnXhgSAeC/b2ZrhPbqhINCifwS+Yh\nFm3ax8SB0USHNbC6HOWiQgJ8eGhoR37dfZjPN+nEtvr/NCgc3OmycqYt1glsVT+ui4uke2QIzyxN\npeCUTmyrChoUDm7emt1sP3CcaSN1AlvVPQ8P4alRnTl8soRZX+tS5KqCBoUDO3DGGdhXxugEtqof\nnSOCublfa95ft4fkfXrGtrIzKERkqIhsF5EMEZl6ludFRF61Pb9VRHpW11dEQkXkGxFJt902sm33\nFpF5IpIkIqki8nBtDNQZPbUklZKycqbrGdiqnt13ZcUZ2//8QpciV3YEhYh4Am8Aw4AYYJyIVL74\nwTCgne1nIjDbjr5TgZXGmHbASttjgLGArzGmC9ALuFNEos5zfE7rp/SDfLllP5MvvYAoPQNb1bNg\nf28eGd6JzVlHWZioS5G7O3u+UfQBMowxmcaYEmABEF+pTTww31RYB4SISPNq+sYD82z35wGjbPcN\nECgiXoA/UAK41YHdxaVlTFucTOvGAUy65AKry1FuanTPCPpEhTJzeRqHT+pS5O7MnqCIAM78kyLb\nts2eNlX1DTfG5Nju5wK/7YT/FDgJ5AB7gReNMYftqNNlzF2VSebBkzwR31mXEFeWERGeuqYzJ4pK\nmbks1epylIUcYjLbVJzd89uO0D5AGdACaAPcJyJ/uDKPiEwUkUQRSczPz6+/YutY1uFCXvsug6u6\nNOcSvQa2slj78IbcfnEbFiZmk7jbrf5eU2ewJyj2AZFnPG5p22ZPm6r6HrDtnsJ2m2fbfiOw3Bhz\n2hiTB/wMxFUuyhgzxxgTZ4yJCwtzjQ9UYwyPLU7Gy0P454hOVpejFAB3X9aOFsF+/POLZE6XlVtd\njrKAPUGxHmgnIm1ExAe4AUio1CYBGG87+qkfUGDbrVRV3wRggu3+BGCx7f5e4DIAEQkE+gFp5zU6\nJ7MsOZcftudz75UdaB7sb3U5SgEQ6OvF9KtjScs9zrs/77K6HGWBaoPCGFMKTAFWAKnAQmNMiohM\nEpFJtmZLgUwgA5gLTK6qr63PTGCwiKQDV9geQ8VRUg1EJIWKoHnXGLO1xiN1cMeLTvP4lynEtghi\nQv/WVpej1O9cGRPOFZ2a8vI36WQfKbS6HFXPxBUW/4qLizOJiYlWl1EjMxJSmLd2N59PvpDukSFW\nl6PUH+w7eorBs36kf3Rj3poQp+f2uAAR2WCM+cOu/cocYjLb3SVlFzB/7W5u7ttaQ0I5rIgQf/5x\nRXtWpuWxIiXX6nJUPdKgsFhpWTkPf76Vxg18uX9IB6vLUapKt10YRafmQcxI2MZxvRqe29CgsNh7\na3aTvO8YM0bGEuzvbXU5SlXJy9ODZ67pzIHjRbz0tV4Nz11oUFgo+0ghL329g8s6NmV4l2ZWl6OU\nXXq0asT4fq2Zt3Y3m/YesbocVQ80KCxijGHa4hRE4Il4XfRPOZf7h3QgvKEfDy9K0nMr3IAGhUWW\nJOXwXVoe9w5uT8tGAVaXo9Q5aejnzePxFedWzF2daXU5qo5pUFjgaGEJMxJS6BIRzK0DoqwuR6nz\nMiS2GUNiw/nXt+nsPnjS6nJUHdKgsMDTS1I5Uniamdd2wctTfwXKeT1+dWd8PD145PMkXOGcLHV2\n+ilVz35KP8gnG7K5c2A0sS2CrS5HqRppFuzH1OEdWbPzkF63woVpUNSjUyVlPPJ5Em2aBHL35e2s\nLkepWjGudyv6tAnlqSWpHDhWZHU5qg5oUNSjl7/dwd7DhTw7uoteZ0K5DA8PYeboLpSUljNtcbLV\n5ag6oEFRTzbtPcJbqzMZ16cV/aIbW12OUrUqOqwBf7+iPStSDrA0Kaf6DsqpaFDUg+LSMh78dCvh\nQX48Mryj1eUoVSfuuLgNnSOCmLY4WS+d6mI0KOrBayszSM87wTOju9DQT5fpUK7Jy9ODF8Z0o+BU\nxZL5ynVoUNSx5H0FzP5xJ9f2bMmgDk2tLkepOtWpeRB/G9SWxZv387WuMOsyNCjqUElpOfd/soXQ\nQB8e00ubKjcx+dK2dGzWkEe/SOZooe6CcgUaFHXo9e/SScs9zjPXdCEkwMfqcpSqFz5eHrw4thuH\nT5bwxJfbrC5H1QINijqSlF3AGz/sZHTPCAbHhFtdjlL1qnNEMH8b1JZFm/bpRY5cgAZFHSguLeO+\nTzbTpIEP00fGWl2OUpaYMqgtMc2DePTzJD0KyslpUNSBV75NZ8eBE8y8tqtejEi5LR8vD2ZdX3EU\n1GNf6Il4zkyDopYl7j7Mf37cyfVxkXqUk3J7HZsF8fcr2rMkKYeELfutLkedJw2KWnSiuJR7F24h\nopE/j42MsbocpRzCnQOj6dEqhMe+SCa3QNeCckYaFLXo6SXbyDpSyKzrutPA18vqcpRyCF6eHsy6\nrjslpeU88OkWyst1OXJno0FRS1amHuCjX7O4c+AF9I4KtbocpRxKmyaBPDYihtXpB5m/drfV5ahz\npEFRC/KPF/PQZ1vp2Kwh/xisy4crdTbj+kRyWcemPLssjfQDx60uR50DDYoaMsbw4KdbOF5Uyqvj\neuDrpcuHK3U2IsLMa7sQ6OvFPQs2U1xaZnVJyk4aFDX0/ro9fL89n0eGd6J9eEOry1HKoTVt6Mdz\n13ZlW84xXlyx3epylJ3sCgoRGSoi20UkQ0SmnuV5EZFXbc9vFZGe1fUVkVAR+UZE0m23jc54rquI\nrBWRFBFJEhG/mg60Luw4cJynl6QyqEMY4/u3trocpZzC4JhwbunXmrmrd7FqR77V5Sg7VBsUIuIJ\nvAEMA2KAcSJS+djPYUA7289EYLYdfacCK40x7YCVtseIiBfwATDJGBMLXAqcPv8h1o2i02Xc/dEm\nGvh68fyYboiI1SUp5TQevaoT7Zo24L5PtnDoRLHV5ahq2PONog+QYYzJNMaUAAuA+Ept4oH5psI6\nIEREmlfTNx6YZ7s/Dxhlu38lsNUYswXAGHPIGONwOzOfXpJKWu5xXhzbjbCGvlaXo5RT8fP25NVx\nPSg4dZoHPt2KMXrIrCOzJygigKwzHmfbttnTpqq+4caY366ZmAv8tnJee8CIyAoR2SgiD56tKBGZ\nKCKJIpKYn1+/X1+XJ+fw/ro93HFxGwZ11LOvlTofnZoH8ciwjnyXlsfbP+2yuhxVBYeYzDYVf078\n9ieFF3ARcJPt9hoRufwsfeYYY+KMMXFhYWH1Vmv2kUIe/HQrXVsG88AQvaypUjUxYUAUV8aE89zy\nNLZkHbW6HPUn7AmKfUDkGY9b2rbZ06aqvgdsu6ew3ebZtmcDq4wxB40xhcBSoCcO4HRZOXd/tIly\nA6+N64GPl0PkrFJOS0R4YUw3mjb0Y8pHGzlW5HDTkQr7gmI90E5E2oiID3ADkFCpTQIw3nb0Uz+g\nwLZbqaq+CcAE2/0JwGLb/RVAFxEJsE1sXwI4xNVPnl+exsa9R3l2dBdaNw60uhylXEJwgDevjuvB\n/qNFTP1M5yscUbVBYYwpBaZQ8QGeCiw0xqSIyCQRmWRrthTIBDKAucDkqvra+swEBotIOnCF7THG\nmCPALCpCZjOw0RizpBbGWiMrUnKZu3oX4/u3ZmS3FlaXo5RL6dW6EQ8O6cDSpFzeW7Pb6nJUJeIK\n6R0XF2cSExPr7PX3HirkqtdW06ZJIJ9M6q9nXytVB4wxTHx/A9+n5fHxnf3p1bpR9Z1UjYjIBmNM\nXHXtdCd7NYpOlzH5ww0I8MaNPTUklKojIsKLY7vRPMSPKR9u1PMrHIgGRRWMMTz2RTLJ+44x67ru\nRIYGWF2SUi4t2N+b2Tf14tDJEv7+8WbKdElyh6BBUYUPf93LJxuyueuytlwRE159B6VUjXWOCObJ\n+FhWpx/kxa91PShHoEHxJzbtPcKMhBQuaR/G369ob3U5SrmV63u3YlyfVsz+YSfLknKq76DqlAbF\nWeQfL+b/PthIs2A//nVDdzw9dB0nperbjKtj6NEqhPs/2aLXr7CYBkUlxaVlTPpgA0dPlfDmzb0I\nCfCxuiSIb236AAANYElEQVSl3JKvlyezb+qFv48XE9/fQEGhnoxnFQ2KMxhjmPZFChv2HOHFsd2I\nbRFsdUlKubVmwX7Mvrkn2UcKmfLRRkrLyq0uyS1pUJxh/to9fJyYxZRBbRnRVU+qU8oR9I4K5alR\nnVmdfpBnlqZZXY5b8rK6AEexOj2fJ77axhWdwrl3sE5eK+VIru/dirTc47zz8y46NmvIdb0jq++k\nao1+owDSDxxn8n830q5pA165oTseOnmtlMN5dHgnLm7XhEe/SGLtzkNWl+NW3D4oDp0o5i/z1uPr\n5cnbt/amga9+yVLKEXl5evD6jT1p3TiQSR9sYGf+CatLchtuHRRFp8uY+P4G8o4V89aEOCJC/K0u\nSSlVhWB/b969tTdeHsJf3lvP4ZMlVpfkFtw6KLZkHSVpXwEvX9+d7pEhVpejlLJDZGgAcyfEkVtQ\nxMT5iRSddrgrJbsctw6KvtGNWf3gIIZ3aW51KUqpc9CzVSNmXdedxD1H+IeuCVXn3DooAMKD/Kwu\nQSl1Hq7q2pzHRsSwLDmXJ75M0Qse1SGduVVKOa3bL2pDbsEp5q7eRbNgf/7v0gusLsklaVAopZza\nw8M6ceBYMc8tT6NxoI+eY1EHNCiUUk7Nw6PigkdHCkuYumgrQf5eDO2s8461ye3nKJRSzs/Hy4P/\n3NKL7pEh3P3RZn5KP2h1SS5Fg0Ip5RICfLx499Y+RIcFMvH9RDbsOWx1SS5Dg0Ip5TKCA7yZf3sf\nwoP8uPWd9WzJOmp1SS5Bg0Ip5VKaNvTjwzv6EhLozS1v/0LyvgKrS3J6GhRKKZfTPNifD//aj4Z+\nFWGRmnPM6pKcmgaFUsolRYYG8OEdffHz9mTc3HX6zaIGNCiUUi6rdeNAPp7Yn0AfL2566xeSsjUs\nzocGhVLKpbVqHMCCif0I8vfixrfWsXHvEatLcjp2BYWIDBWR7SKSISJTz/K8iMirtue3ikjP6vqK\nSKiIfCMi6bbbRpVes5WInBCR+2syQKWUigwN4OOJ/Wkc6MPNb/2i51mco2qDQkQ8gTeAYUAMME5E\nYio1Gwa0s/1MBGbb0XcqsNIY0w5YaXt8plnAsvMYk1JK/UGLEH8WTupPq9AA/vLeelak5FpdktOw\n5xtFHyDDGJNpjCkBFgDxldrEA/NNhXVAiIg0r6ZvPDDPdn8eMOq3FxORUcAuIOU8x6WUUn/QtKEf\nCyb2IzYiiMn/3cgniVlWl+QU7AmKCODM/5rZtm32tKmqb7gxJsd2PxcIBxCRBsBDwON21KaUUuck\nJMCHD27vy4ALGvPAp1t5/bt0XaK8Gg4xmW0qfku//aZmAC8bY6q8IK6ITBSRRBFJzM/Pr+sSlVIu\nJNDXi7cn9OaaHhG8+PUOHlucrBc/qoI9q8fuA85ct7elbZs9bbyr6HtARJobY3Jsu6nybNv7AmNE\n5HkgBCgXkSJjzOtnvqExZg4wByAuLk5/w0qpc+Lj5cFLY7vRNMiX//yYSW5BEf+6oQeBvrqodmX2\nfKNYD7QTkTYi4gPcACRUapMAjLcd/dQPKLDtVqqqbwIwwXZ/ArAYwBhzsTEmyhgTBbwCPFM5JJRS\nqjZ4eAgPD+vEE/GxfJeWx9g315JTcMrqshxOtUFhjCkFpgArgFRgoTEmRUQmicgkW7OlQCaQAcwF\nJlfV19ZnJjBYRNKBK2yPlVKq3o3vH8Xbt/Zm7+FCRr3xs56YV4m4wiROXFycSUxMtLoMpZSTS8s9\nxu3vJXLwRDHPj+lKfPfKx+24FhHZYIyJq66dQ0xmK6WUI+jYLIjFUy6kW2QI9yzYzDNLU3WSGw0K\npZT6nSYNfPnvX/syvn9r5qzKZMI7v3LoRLHVZVlKg0IppSrx9vTgifjOPH9tV9bvPsxVr/5E4m73\nvWKeBoVSSv2J63pHsmjyAHy9Pbhhzjrmrsqk3A13RWlQKKVUFWJbBJMw5SIu79SUp5em8pd56zno\nZruiNCiUUqoawf7evHlzL56Mj2XNzkMMfWU1q9PdZ0UIDQqllLKDiHBL/ygSplxIowBvbnn7V2Yk\npHCqpMzq0uqcBoVSSp2Djs2C+PKui7jtwijeW7Obq15dzSYXvxiSBoVSSp0jP29Ppo+M5cO/9qXo\ndBnXzl7DM0tTXfbbhQaFUkqdpwFtm7D8HwO5vnckc1ZlMuxfq1iXecjqsmqdBoVSStVAkJ83z47u\nyod/7Uu5gRvmrOP+T7a41El6GhRKKVULBrRtwoq/D2TypRfwxaZ9XD7rRz76da9LnHehQaGUUrXE\n38eTB4d2ZNk9F9M+vCEPL0oi/o2fnf6sbg0KpZSqZe3CG/LxxH7864bu5B8vZsyba7nro01kHS60\nurTzopdyUkqpOiAixHePYHBMOG/+sJP/rMpkRXIu4/u3ZsplbQkJ8LG6RLvp9SiUUqoe5BScYtbX\nO/h0YzYNfL24c2A0t17YhgYWXnrV3utRaFAopVQ9Sss9xgvLt7MyLY/QQB8mXRLNzf1aE+BT/4Gh\nQaGUUg5s094jzPpmB6vTDxIa6MPtF7Xhlv6tCfLzrrcaNCiUUsoJbNhzhDe+z+C7tDwa+nlxY99W\n3DagDc2C/er8vTUolFLKiSTvK2D2DztZlpyDhwhXd2vBXy5qQ+eI4Dp7Tw0KpZRyQlmHC3n7p10s\nTMyisKSMnq1CmDAgiqGdm+Hr5Vmr76VBoZRSTqzg1Gk+3ZDN+2t3s/tQIaGBPlzbM4Lre7eibdMG\ntfIeGhRKKeUCyssNqzMO8tEve/k29QCl5YZerRsxumcEI7q0IDjg/Ce/NSiUUsrF5B8v5rON2Xy2\nIZv0vBP4eHkwvl9r/jki5rxez96g0DOzlVLKSYQ19GXSJRdw58Bokvcd47ON2UQ08q/z99WgUEop\nJyMidGkZTJeWdXdE1JnsWhRQRIaKyHYRyRCRqWd5XkTkVdvzW0WkZ3V9RSRURL4RkXTbbSPb9sEi\nskFEkmy3l9XGQJVSSp2faoNCRDyBN4BhQAwwTkQq7xAbBrSz/UwEZtvRdyqw0hjTDlhpewxwEBhp\njOkCTADeP+/RKaWUqjF7vlH0ATKMMZnGmBJgARBfqU08MN9UWAeEiEjzavrGA/Ns9+cBowCMMZuM\nMftt21MAfxHxPc/xKaWUqiF7giICyDrjcbZtmz1tquobbozJsd3PBcLP8t7XAhuNMa5zTUGllHIy\nDjGZbYwxIvK743RFJBZ4DrjybH1EZCIVu7lo1apVndeolFLuyp5vFPuAyDMet7Rts6dNVX0P2HZP\nYbvN+62RiLQEPgfGG2N2nq0oY8wcY0ycMSYuLCzMjmEopZQ6H/YExXqgnYi0EREf4AYgoVKbBGC8\n7einfkCBbbdSVX0TqJisxna7GEBEQoAlwFRjzM81GJtSSqlaUO2uJ2NMqYhMAVYAnsA7xpgUEZlk\ne/5NYCkwHMgACoHbqupre+mZwEIRuR3YA1xn2z4FaAtME5Fptm1XGmP+941DKaVU/XGJJTxEJJ+K\nsDlfTag4LNdduNt4QcfsLnTM56a1MabaffcuERQ1JSKJ9qx34ircbbygY3YXOua6YdeZ2UoppdyX\nBoVSSqkqaVBUmGN1AfXM3cYLOmZ3oWOuAzpHoZRSqkr6jUIppVSV3CYoarJUurOyY8w32caaJCJr\nRKSbFXXWpurGfEa73iJSKiJj6rO+umDPmEXkUhHZLCIpIvJjfddY2+z4fztYRL4UkS22Md9mRZ21\nRUTeEZE8EUn+k+fr9vPLGOPyP1Sc7LcTiAZ8gC1ATKU2w4FlgAD9gF+srrsexjwAaGS7P8wdxnxG\nu++oOFF0jNV118PvOQTYBrSyPW5qdd31MOZHgOds98OAw4CP1bXXYMwDgZ5A8p88X6efX+7yjaIm\nS6U7q2rHbIxZY4w5Ynu4joq1uJyZPb9ngLuAzzhjfTEnZs+YbwQWGWP2AhjnX+XAnjEboKGICNCA\niqAord8ya48xZhUVY/gzdfr55S5BUZOl0p3VuY7ndir+InFm1Y5ZRCKAa7BdXMsF2PN7bg80EpEf\nbFeNHF9v1dUNe8b8OtAJ2A8kAfcYY8rrpzxL1Onnl0MsM66sJSKDqAiKi6yupR68AjxkjCmv+GPT\nLXgBvYDLAX9grYisM8bssLasOjUE2AxcBlwAfCMiq40xx6wtyzm5S1DUZKl0Z2XXeESkK/AWMMwY\nc6ieaqsr9ow5DlhgC4kmwHARKTXGfFE/JdY6e8acDRwyxpwETorIKqAb4KxBYc+YbwNmmood+Bki\nsgvoCPxaPyXWuzr9/HKXXU81WSrdWVU7ZhFpBSwCbnGRvy6rHbMxpo0xJsoYEwV8Ckx24pAA+/7f\nXgxcJCJeIhIA9AVS67nO2mTPmPdS8Q0KEQkHOgCZ9Vpl/arTzy+3+EZharBUurOyc8zTgMbAv21/\nYZcaJ15Qzc4xuxR7xmyMSRWR5cBWoBx4yxhz1sMsnYGdv+cngfdEJImKI4EeMsY47aqyIvIRcCnQ\nRESygemAN9TP55eema2UUqpK7rLrSSml1HnSoFBKKVUlDQqllFJV0qBQSilVJQ0KpZRSVdKgUEop\nVSUNCqWUUlXSoFBKKVWl/wd+zHJ2fWHvpQAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(t, b)" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Check that we get 1:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "1.4137685733217609e-07" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 - ls_qr(A, b)[14]" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "Bad condition number:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "5.827807196683593e+17" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kappa = np.linalg.cond(A); kappa" ] }, { "cell_type": "code", "execution_count": 181, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "row_names = ['Normal Eqns- Naive',\n", " 'QR Factorization', \n", " 'SVD', \n", " 'Scipy lstsq']\n", "\n", "name2func = {'Normal Eqns- Naive': 'ls_naive', \n", " 'QR Factorization': 'ls_qr',\n", " 'SVD': 'ls_svd',\n", " 'Scipy lstsq': 'scipylstq'}" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "pd.options.display.float_format = '{:,.9f}'.format\n", "df = pd.DataFrame(index=row_names, columns=['Time', 'Error'])" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "for name in row_names:\n", " fcn = name2func[name]\n", " t = timeit.timeit(fcn + '(A,b)', number=5, globals=globals())\n", " coeffs = locals()[fcn](A, b)\n", " df.set_value(name, 'Time', t)\n", " df.set_value(name, 'Error', np.abs(1 - coeffs[-1]))" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "hidden": true, "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeError
Normal Eqns- Naive0.0015650991.357066025
QR Factorization0.0026321040.000000116
SVD0.0035037850.000000116
Scipy lstsq0.0027635020.000000116
\n", "
" ], "text/plain": [ " Time Error\n", "Normal Eqns- Naive 0.001565099 1.357066025\n", "QR Factorization 0.002632104 0.000000116\n", "SVD 0.003503785 0.000000116\n", "Scipy lstsq 0.002763502 0.000000116" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": { "hidden": true }, "source": [ "\n", "The solution for least squares via the normal equations is unstable in general, although stable for problems with small condition numbers." ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Low-rank" ] }, { "cell_type": "code", "execution_count": 258, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "m = 100\n", "n = 10\n", "x = np.random.uniform(-10,10,n)\n", "A2 = np.random.uniform(-40,40, [m, int(n/2)]) # removed np.asfortranarray\n", "A = np.hstack([A2, A2])" ] }, { "cell_type": "code", "execution_count": 259, "metadata": { "hidden": true }, "outputs": [ { "data": { "text/plain": [ "((100, 10), (100, 5))" ] }, "execution_count": 259, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A.shape, A2.shape" ] }, { "cell_type": "code", "execution_count": 260, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "b = A @ x + np.random.normal(0,1,m)" ] }, { "cell_type": "code", "execution_count": 263, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "row_names = ['Normal Eqns- Naive',\n", " 'QR Factorization', \n", " 'SVD', \n", " 'Scipy lstsq']\n", "\n", "name2func = {'Normal Eqns- Naive': 'ls_naive', \n", " 'QR Factorization': 'ls_qr',\n", " 'SVD': 'ls_svd',\n", " 'Scipy lstsq': 'scipylstq'}" ] }, { "cell_type": "code", "execution_count": 264, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "pd.options.display.float_format = '{:,.9f}'.format\n", "df = pd.DataFrame(index=row_names, columns=['Time', 'Error'])" ] }, { "cell_type": "code", "execution_count": 265, "metadata": { "collapsed": true, "hidden": true }, "outputs": [], "source": [ "for name in row_names:\n", " fcn = name2func[name]\n", " t = timeit.timeit(fcn + '(A,b)', number=5, globals=globals())\n", " coeffs = locals()[fcn](A, b)\n", " df.set_value(name, 'Time', t)\n", " df.set_value(name, 'Error', regr_metrics(b, A @ coeffs)[0])" ] }, { "cell_type": "code", "execution_count": 266, "metadata": { "hidden": true, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeError
Normal Eqns- Naive0.001227640300.658979382
QR Factorization0.0023159200.876019803
SVD0.0017456471.584746056
Scipy lstsq0.0020679890.804750398
\n", "
" ], "text/plain": [ " Time Error\n", "Normal Eqns- Naive 0.001227640 300.658979382\n", "QR Factorization 0.002315920 0.876019803\n", "SVD 0.001745647 1.584746056\n", "Scipy lstsq 0.002067989 0.804750398" ] }, "execution_count": 266, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our results from above:" ] }, { "cell_type": "code", "execution_count": 257, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
# rows100100010000
# cols201001000201001000201001000
Normal Eqns- Naive0.0012760.003634NaN0.0009600.0051720.2931260.0022260.0212481.164655
Normal Eqns- Cholesky0.0016600.003958NaN0.0016650.0040070.0936960.0019280.0104560.399464
QR Factorization0.0021740.006486NaN0.0042350.0177730.2132320.0192290.1161222.208129
SVD0.0038800.021737NaN0.0046720.0269501.2804900.0181380.1306523.433003
Scipy lstsq0.0043380.020198NaN0.0043200.0211991.0838040.0122000.0884672.134780
\n", "
" ], "text/plain": [ "# rows 100 1000 \\\n", "# cols 20 100 1000 20 100 1000 \n", "Normal Eqns- Naive 0.001276 0.003634 NaN 0.000960 0.005172 0.293126 \n", "Normal Eqns- Cholesky 0.001660 0.003958 NaN 0.001665 0.004007 0.093696 \n", "QR Factorization 0.002174 0.006486 NaN 0.004235 0.017773 0.213232 \n", "SVD 0.003880 0.021737 NaN 0.004672 0.026950 1.280490 \n", "Scipy lstsq 0.004338 0.020198 NaN 0.004320 0.021199 1.083804 \n", "\n", "# rows 10000 \n", "# cols 20 100 1000 \n", "Normal Eqns- Naive 0.002226 0.021248 1.164655 \n", "Normal Eqns- Cholesky 0.001928 0.010456 0.399464 \n", "QR Factorization 0.019229 0.116122 2.208129 \n", "SVD 0.018138 0.130652 3.433003 \n", "Scipy lstsq 0.012200 0.088467 2.134780 " ] }, "execution_count": 257, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From Trefethen (page 84):\n", "\n", "Normal equations/Cholesky is fastest when it works. Cholesky can only be used on symmetric, positive definite matrices. Also, normal equations/Cholesky is unstable for matrices with high condition numbers or with low-rank.\n", "\n", "Linear regression via QR has been recommended by numerical analysts as the standard method for years. It is natural, elegant, and good for \"daily use\"." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }