{
 "metadata": {
  "name": "",
  "signature": "sha256:c664aaccd628e799654d187fc537bc02c6b5f9877793da347c9e0d8b8e74e511"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Finding A Trend Line Of Data\n",
      "\n",
      "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n",
      "- **Date:** -\n",
      "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n",
      "- **Note:** Based on: SciPy And NumPy"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Import modules"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "from scipy.optimize import curve_fit"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Creating some clean data"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# define the model function func(): a straight line\n",
      "def func(x, a, b):\n",
      "    # multiply x by the slope a, then add the intercept b\n",
      "    return a * x + b"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# create 100 elements, evenly spaced between 0 and 10\n",
      "x = np.linspace(0, 10, 100)\n",
      "\n",
      "# compute the clean y values by applying func() with a = 1 and b = 2\n",
      "y = func(x, 1, 2)\n",
      "\n",
      "# view the first ten entries\n",
      "y[0:10]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 18,
       "text": [
        "array([ 2.        ,  2.1010101 ,  2.2020202 ,  2.3030303 ,  2.4040404 ,\n",
        "        2.50505051,  2.60606061,  2.70707071,  2.80808081,  2.90909091])"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Add noise to the data"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# add noise to y: 100 values drawn from the standard normal distribution, scaled by 0.9\n",
      "yn = y + 0.9 * np.random.normal(size=len(x))\n",
      "\n",
      "# view the first ten entries\n",
      "yn[0:10]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 20,
       "text": [
        "array([ 3.63606696,  3.51496281,  2.68882029,  2.11995938,  2.36446115,\n",
        "        3.41454019,  1.66117548,  2.59706936,  3.22359943,  2.16499171])"
       ]
      }
     ],
     "prompt_number": 20
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Fitting a line to the data\n",
      "\n",
      "- func = the model function\n",
      "- x = the independent variable\n",
      "- yn = the dependent variable (the noisy data)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# return an array of fitted parameter values (popt) that minimizes the squared error of the line\n",
      "# also return a 2d array (pcov) with the estimated covariance of popt\n",
      "popt, pcov = curve_fit(func, x, yn)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 25
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### What is popt?\n",
      "\n",
      "popt holds the values of the parameters a and b of func() that give the smallest squared error between func(x, a, b) and yn (the dependent variable)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "popt"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 26,
       "text": [
        "array([ 0.92825153,  2.25863607])"
       ]
      }
     ],
     "prompt_number": 26
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This means that func() best fits the noisy data when a is about 0.93 and b is about 2.26, close to the values a = 1 and b = 2 used to generate the clean data"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}