{ "metadata": { "name": "", "signature": "sha256:c664aaccd628e799654d187fc537bc02c6b5f9877793da347c9e0d8b8e74e511" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding A Trend Line Of Data\n", "\n", "- **Author:** [Chris Albon](http://www.chrisalbon.com/), [@ChrisAlbon](https://twitter.com/chrisalbon)\n", "- **Date:** -\n", "- **Repo:** [Python 3 code snippets for data science](https://github.com/chrisalbon/code_py)\n", "- **Note:** Based on: SciPy And NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "from scipy.optimize import curve_fit" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating some clean data" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# create a function called func(),\n", "def func(x, a, b):\n", " # that multiplies a and x, then adds b\n", " return a * x + b" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "# create 100 elements, evenly spaced between 0 and 10\n", "x = np.linspace(0, 10, 100)\n", "\n", "# mix up the data by applying func()\n", "y = func(x, 1, 2)\n", "\n", "# view the first few entries\n", "y[0:10]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "array([ 2. , 2.1010101 , 2.2020202 , 2.3030303 , 2.4040404 ,\n", " 2.50505051, 2.60606061, 2.70707071, 2.80808081, 2.90909091])" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add noise to the data" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Add some randomness to the data by multiplying y by 100 values drawn from the normal distribution\n", "yn = y + 0.9 * np.random.normal(size=len(x))\n", "\n", "# view the first few entries\n", "yn[0:10]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 20, "text": [ "array([ 3.63606696, 3.51496281, 2.68882029, 2.11995938, 2.36446115,\n", " 3.41454019, 1.66117548, 2.59706936, 3.22359943, 2.16499171])" ] } ], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fitting a line to the data\n", "\n", "- func = the model function\n", "- x = the independent variable\n", "- y = the dependent variable" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# return an array of values (popt) so that the squared error of the line is minimized\n", "# also return a 2d array with eh covariance of popt\n", "popt, pcov = curve_fit(func, x, yn)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is popt?\n", "\n", "popt returns the values of x (the independent variable) and yn (the dependent variable) with the smallest squared error" ] }, { "cell_type": "code", "collapsed": false, "input": [ "popt" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 26, "text": [ "array([ 0.92825153, 2.25863607])" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means that if x = 0.92 and y = 2.25, the function (called func()) best fits the data" ] } ], "metadata": {} } ] }