{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Feature Selection and Feature Reduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objectives\n", "\n", "* Review core concepts around why, when, and how to select features\n", "* Use machine learning models to decide which features to keep\n", "* See how data transformations like PCA reduce your data while retaining most of its variance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Class Notes\n", "\n", "We'll primarily be concerned with two sklearn modules today: \n", "\n", "* A deeper dive into [feature_selection](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection)\n", "* Exploring use cases for [dimensionality reduction](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What makes a good feature?\n", "\n", "For the sake of simplicity and generality, strong features can be described as having the following attributes:\n", "\n", "* **high variance**: a feature that takes only one value carries no information; features that vary more tend to be more useful\n", "* **correlation**: a positive or negative relationship with a target variable; low correlation suggests a lack of that relationship. (Keep in mind that correlation does not imply causation)\n", "* **predictive power**: features with large coefficients and importances are great; features with coefficients or importances close to 0, not so great\n", "\n", "### What makes a good model?\n", "\n", "* **good features!**\n", "* and.... 
**simplicity**\n", "\n", "\n", "_while we haven't used polynomials, there's still a balance for our models between simplicity and feature dependence_\n", "\n", "#### Why?\n", "We should aim to keep our models as simple as possible so that gains in performance can be attributed to specific features. \n", "Simple models are also much easier to understand." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How do we reduce the number of features in our data?\n", "\n", "There are a number of techniques available in sklearn that automate these processes for us:\n", "\n", "sklearn_helper | technique\n", "---------------|----------\n", "`VarianceThreshold` | Remove features with low variance, based on a tolerance level\n", "`SelectKBest` | Select the k features that score highest on a univariate statistical test (e.g. `f_classif`, `chi2`) against the target. k (as usual) is something you search for and define.\n", "`L1 and Trees` | Using `fit_transform` on any supervised learning algorithm that supports it can drop features with low coefficients or importances.\n", "\n", "While sklearn also has a `pipeline` module to _further_ automate this process for you, it is better to explore the data first to get a sense of what you are working with. There's no magic button that says \"solve my problem,\" but if you are interested in automating a model fit (say, a nightly procedure on a deployed model with constantly updated data), then it might be something worth exploring. \n", "\n", "For each technique below we'll work through Iris and notice how it picks out the best features for us. We'll use Iris because the data is well scaled (which otherwise requires fine-tuning) and relatively predictive (we know some features are more predictive than others).\n", "\n", "For each code sample below:\n", "\n", "1. Review what the code is doing. Consider opening up the help function or reading the documentation on sklearn.\n", "2. Find the `.shape` of the new array returned and compare to the original dataset. 
What columns did it end up keeping, vs removing?\n", "3. Adjust the parameters. Do results change?\n", "4. ** \\* **These are all considered data preprocessing steps. In your final project, what and where might you consider adding one of these processes?" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import pandas as pd\n", "def make_irisdf():\n", " from sklearn.datasets import load_iris\n", " from pandas import DataFrame\n", " iris = load_iris()\n", " df = DataFrame(iris.data, columns=iris.feature_names)\n", " df['target'] = iris.target\n", " return df\n", "\n", "iris = make_irisdf()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn import feature_selection" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `VarianceThreshold`\n", "\n", "Goals:\n", "\n", "1. What is variance?\n", "2. How does changing the threshold change the fit_transform?" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sepal length (cm) 0.685694\n", "sepal width (cm) 0.188004\n", "petal length (cm) 3.113179\n", "petal width (cm) 0.582414\n", "dtype: float64\n", " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n", "0 5.1 3.5 1.4 0.2\n", "1 4.9 3.0 1.4 0.2\n", "2 4.7 3.2 1.3 0.2\n", "3 4.6 3.1 1.5 0.2\n", "4 5.0 3.6 1.4 0.2\n", "[[ 5.1 1.4]\n", " [ 4.9 1.4]\n", " [ 4.7 1.3]\n", " [ 4.6 1.5]\n", " [ 5. 1.4]]\n" ] } ], "source": [ "print iris.ix[:,:4].apply(lambda x: x.var())\n", "print iris.ix[:,:4].head()\n", "print feature_selection.VarianceThreshold(threshold=.6).fit_transform(iris.ix[:,:4])[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `SelectKBest`\n", "Goals:\n", "\n", "1. while f test and chi2 are different tests, are the results the same?\n", "2. How might you solve for k?" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_math sidebar:_\n", "\n", "$\\chi^2 = \\sum \\dfrac{(O-E)^2}{E}$\n", "\n", "where $O$ is the observed frequency and $E$ is the expected frequency in each category, summed over all categories.
" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n", "0 5.1 3.5 1.4 0.2\n", "1 4.9 3.0 1.4 0.2\n", "2 4.7 3.2 1.3 0.2\n", "3 4.6 3.1 1.5 0.2\n", "4 5.0 3.6 1.4 0.2\n", "sepal length (cm) 119.264502\n", "sepal width (cm) 47.364461\n", "petal length (cm) 1179.034328\n", "petal width (cm) 959.324406\n", "dtype: float64\n", "[[ 5.1 1.4 0.2]\n", " [ 4.9 1.4 0.2]\n", " [ 4.7 1.3 0.2]\n", " [ 4.6 1.5 0.2]\n", " [ 5. 1.4 0.2]]\n", "sepal length (cm) 10.817821\n", "sepal width (cm) 3.594499\n", "petal length (cm) 116.169847\n", "petal width (cm) 67.244828\n", "dtype: float64\n", "[[ 5.1 1.4 0.2]\n", " [ 4.9 1.4 0.2]\n", " [ 4.7 1.3 0.2]\n", " [ 4.6 1.5 0.2]\n", " [ 5. 1.4 0.2]]\n" ] } ], "source": [ "print iris.ix[:,:4].head()\n", "ftest = feature_selection.SelectKBest(score_func=feature_selection.f_classif, k=3)\n", "print pd.Series(ftest.fit(iris.ix[:,:4], iris['target']).scores_, index=iris.ix[:,:4].columns)\n", "print ftest.fit_transform(iris.ix[:,:4], iris['target'])[:5]\n", "\n", "chi = feature_selection.SelectKBest(score_func=feature_selection.chi2, k=3)\n", "print pd.Series(chi.fit(iris.ix[:,:4], iris['target']).scores_, index=iris.ix[:,:4].columns)\n", "print chi.fit_transform(iris.ix[:,:4], iris['target'])[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `LogisticRegression`\n", "Goals:\n", "\n", "1. How is L1 deciding to keep features?\n", "2. How does changing C change the fit_transform results?" 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n", "0 5.1 3.5 1.4 0.2\n", "1 4.9 3.0 1.4 0.2\n", "2 4.7 3.2 1.3 0.2\n", "3 4.6 3.1 1.5 0.2\n", "4 5.0 3.6 1.4 0.2\n", " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n", "0 0.000000 1.124342 -1.344433 0\n", "1 0.000000 -0.386422 0.122768 0\n", "2 -0.987901 0.000000 1.277067 0\n", "[[ 3.5 1.4]\n", " [ 3. 1.4]\n", " [ 3.2 1.3]\n", " [ 3.1 1.5]\n", " [ 3.6 1.4]]\n" ] } ], "source": [ "from sklearn import linear_model as lm\n", "clf = lm.LogisticRegression(penalty='L1', C=0.1)\n", "print iris.ix[:,:4].head()\n", "\n", "print pd.DataFrame(clf.fit(iris.ix[:,:4], iris['target']).coef_, columns=iris.ix[:,:4].columns)\n", "print clf.fit_transform(iris.ix[:,:4], iris['target'])[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### `DecisionTreeClassifier`\n", "Goals:\n", "\n", "1. What is Gini Importance?\n", "2. How does fit_transform decide what features to keep?\n", "3. How does changing the tree depth (or other preprocessing tools) change the result?" 
] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n", "0 5.1 3.5 1.4 0.2\n", "1 4.9 3.0 1.4 0.2\n", "2 4.7 3.2 1.3 0.2\n", "3 4.6 3.1 1.5 0.2\n", "4 5.0 3.6 1.4 0.2\n", "sepal length (cm) 0.013514\n", "sepal width (cm) 0.000000\n", "petal length (cm) 0.558165\n", "petal width (cm) 0.428322\n", "dtype: float64\n", "[[ 0.2]\n", " [ 0.2]\n", " [ 0.2]\n", " [ 0.2]\n", " [ 0.2]]\n" ] } ], "source": [ "from sklearn import tree\n", "clf = tree.DecisionTreeClassifier(max_depth=4)\n", "print iris.ix[:,:4].head()\n", "print pd.Series(clf.fit(iris.ix[:,:4], iris['target']).feature_importances_, index=iris.ix[:,:4].columns)\n", "print clf.fit_transform(iris.ix[:,:4], iris['target'])[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What if we believe there are hidden features in our data?\n", "_I don't want to get rid of them!_\n", "\n", "Then Principal Component Analysis to the rescue!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "

### What is principal component analysis?\n", "\n", "_Figure: comparing how we measure residuals in regression vs. noise in PCA._\n", "\n", "**Consider Broadway...**\n", "\n", "Manhattan is built on a grid system, with the exception of a couple of key areas:\n", "\n", "* West Village (that's its own story)\n", "* Broadway\n", "\n", "If we needed to get from Herald Square to Eataly, which is easier to explain?\n", "\n", "1. Walk down 6th Avenue until 24th Street and then walk east until the park\n", "2. Walk down Broadway until you get to Eataly at 24th\n", "\n", "Why is that one easier to explain?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### When should we use it?\n", "PCA is a common technique already used in many everyday applications:\n", "\n", "* Compressing images or files\n", "* Reducing computational expense\n", "* Recognition (signal processing, speech, computer vision)\n", "* Bioinformatics (microarray analysis)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How does it work?\n", "\n", "Recall that variance is a 1-dimensional metric describing the average squared distance from the mean. **Covariance** extends this idea to pairs of features: it measures how two features vary together.\n", "\n", "If variance summarizes a single feature, and a correlation matrix is square (the relationships of features against each other), what shape do we expect the covariance matrix to have?\n", "\n", "\n", "\n", "We can interpret the covariance matrix as:\n", "\n", "* **diagonals**: the variance of a given feature (the covariance of the feature with itself)\n", "* **off-diagonals**: the covariance between two given features\n", "\n", "Principal Component Analysis is, essentially, the _decomposition_ of the _covariance_ matrix. We are interested in finding the eigenvalues of this square matrix, which, for PCA, represent the amount of variance explained by each principal component." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Technical Details: finding the decomposition of a square matrix\n", "_here be demons_\n", "\n", "An eigenvalue and its eigenvector are defined so that:\n", "\n", "$Av = \\lambda v$\n", "\n", "where $A$ is the original square matrix, $\\lambda$ is the _eigenvalue_, and $v$ is the _eigenvector_. \n", "\n", "We can rewrite this as:\n", "\n", "$(A - \\lambda I)v = 0$\n", "\n", "where $I$ is the identity matrix with the same shape as $A$.\n", "\n", "Since we are looking for a nonzero vector $v$, the determinant of $A - \\lambda I$ must be 0. 
Solving this equation gives our _eigenvalues_.\n", "\n", "We can then find the eigenvectors by substituting each eigenvalue back into $(A - \\lambda I)v = 0$. For a 2x2 matrix $A = \\begin{bmatrix} a & b \\\\ c & d \\end{bmatrix}$ this is:\n", "\n", "$\\begin{bmatrix} a - \\lambda & b \\\\ c & d - \\lambda \\end{bmatrix} v = 0$\n", "\n", "Applying each _eigenvalue_ in turn gives the corresponding _eigenvector_." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Back to explaining Variance\n", "\n", "So if the eigenvalues from the covariance matrix represent how much variance each component explains (and, once sorted, rank the components from most to least explanatory)... how do we decide how much to keep?\n", "\n", "What does this remind us of?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Examples\n", "\n", "Let's walk through a few examples of decomposition with simple toy data. We'll start with data where every feature is identical, so we expect each covariance value to equal the variance of each feature. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 6.66666667 6.66666667 6.66666667]\n", " [ 6.66666667 6.66666667 6.66666667]\n", " [ 6.66666667 6.66666667 6.66666667]]\n", "6.66666666667\n" ] } ], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "random_data = pd.DataFrame({\n", " 'x': range(1, 10),\n", " 'y': range(1, 10),\n", " 'z': range(1, 10),\n", " })\n", "print np.cov(random_data.T, bias=1)\n", "\n", "print np.var(random_data.x.T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we want to pull out the eigenvalues and eigenvectors:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.77635684e-15 2.00000000e+01 0.00000000e+00]\n", "[[-0.81649658 0.57735027 0. 
]\n", " [ 0.40824829 0.57735027 -0.70710678]\n", " [ 0.40824829 0.57735027 0.70710678]]\n" ] } ], "source": [ "eig, Q = np.linalg.eig(np.cov(random_data.T, bias=1))\n", "# sort for largest eigenvalue\n", "print eig\n", "print Q" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### How do we evaluate the performance of it?\n", "\n", "1. **It's linear.**\n", "2. Principal Components are sorted from most explanatory to least explanatory\n", "3. We want to maximize the variance explained, and cut off the least informative part\n", "4. What did we learn about that would help solve for this?" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAfMAAAFkCAYAAAA0bNKwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAHv1JREFUeJzt3XmYHAWd//F3LiJHAJEHwkoEBf1KRFBjBFQkiOdGBGF1\n", "fxoQMV7rhS7IKq4JIYJ4gKzXIhB+ETxYUVyDi8AKkYAxnBEF8QssCwKiRDCJiSGEZPaPrtFhSGZq\n", "hqnuru7363nmmT6rvzOdzmequvpTo3p6epAkSfU1utUDSJKkJ8cwlySp5gxzSZJqzjCXJKnmDHNJ\n", "kmrOMJckqebGVrXgiBgHnAvsAowHPg3cB/wIuL242b9n5nermkGSpG5QWZgDM4BlmXlkRDwVuBmY\n", "A5yWmadX+LiSJHWVKsP8QuB7xenRwDpgChARcQhwB/CRzFxV4QySJHW8UVU3wEXEBOCHwFnAU4Cb\n", "M3NpRJwAPDUzPzbAfccDU4EHgPWVDipJUuuNAXYCrs/MtWXvVOWaORExCbgI+GpmXhAR22TmiuLq\n", "/wS+NMgipgJXVzmjJEltaH/gmrI3rnIHuB2By4H3Z+bC4uJLI+LDmXk9cBBwwyCLeQDgW9/6FhMn\n", "TqxqVEmS2sLvf/97ZsyYAUX+lVXlmvkJwDbArIiYVVz2EeCLEbGOxqDvGWQZ6wEmTpzIzjvvXNmg\n", "kiS1mSG9tVxZmGfmMcAxG7nq5VU9piRJ3cjSGEmSas4wlySp5gxzSZJqzjCXJKnmDHNJkmrOMJck\n", "qeYMc0mSas4wlySp5gxzSZJqzjCXJKnmDHNJkmrOMJckqeYMc0mSas4wlySp5gxzSZJqzjCXJKnm\n", "DHNJkmrOMJckqeYMc0mSas4wlySp5gxzSZJqzjCXJKnmDHNJkmrOMJckqeYMc0mSas4wlySp5gxz\n", "SZJqzjCXJKnmDHNJkmrOMJckqeYMc0mSas4wlySp5gxzSZJqzjCXJKnmDHNJkmrOMJckqeYMc0mS\n", "as4wlySp5gxzSZJqzjCXJKnmDHNJkmrOMJckqeYMc0mSas4wlySp5gxzSZJqzjCXJKnmDHNJkmrO\n", "MJckqeYMc0mSas4wlySp5gxzSZJqzjCXJKnmDHNJkmrOMJckqebGVrXgiBgHnAvsAowHPg3cBswH\n", 
"NgC3AB/IzJ6qZpAkqRtUuWY+A1iWma8AXgd8FTgNOKG4bBRwSIWPL0lSV6gyzC8EZvV5nHXAizJz\n", "UXHZj4FXVfj4kiR1hcrCPDNXZ+aqiJhAI9j/td/jrQK2KbOs+x5cVcGEkiR1hkp3gIuIScCVwHmZ\n", "+R0a75X3mgAsL7OcL37nRv7w8F8qmFCSpPqrLMwjYkfgcuD4zJxfXLw0Ig4oTr8eWLSx+/a3YtWj\n", "zD5rMStWrR35QSVJqrkq18xPoLEZfVZELIyIhTQ2tc+JiMU09qT/XpkFvW6/Xbl/2WrmnLOENWsf\n", "q25iSZJqqLKPpmXmMcAxG7lq2lCXdfiBu8Nmy7ji+ns5Zf51zJq5L+PG+hF5SZKgJqUxo0aN4kNv\n", "fgFTJ+/IL25fxhkX3MSGDX48XZIkqEmYA4wZM5rjj3wxe+y6HYuW3s85C26hp8dAlySpNmEO8JTN\n", "xvKpmfvwjIkTuPjqu/jelXe0eiRJklquVmEOMGGLzZjz7v3YftvNOe+S27j82ntaPZIkSS1VuzAH\n", "2H7bzTnpPfsxYYvN+OqFv+DaWx5o9UiSJLVMLcMcYNKOE5j9rn0YN24Mnzv/Bm6966FWjyRJUkvU\n", "NswBYpft+MRRU1m/oYe585Zw9wMrWz2SJElNV+swB5jy3B35yP97IasfeYzZZy229lWS1HVqH+YA\n", "06ZMYuYb9+ThlWutfZUkdZ2OCHOAQw/YjcMP3N3aV0lS1+mYMAc4avpkDpo6iTvuXc4p869j3WMb\n", "Br+TJEk111Fhbu2rJKkbdVSYg7WvkqTu03FhDta+SpK6S0eGOVj7KknqHh0b5mDtqySpO3R0mIO1\n", "r5KkztfxYQ7WvkqSOltXhDlY+ypJ6lxdE+Zg7askqTN1VZiDta+SpM7TdWEO1r5KkjpLV4a5ta+S\n", "pE7SlWEO1r5KkjpH14Y5WPsqSeoMXR3mYO2rJKn+uj7MwdpXSVK9GeYFa18lSXVlmPdh7askqY4M\n", "836sfZUk1Y1hvhHWvkqS6sQw3wRrXyVJdWGYD8DaV0lSHRjmA7D2VZJUB4b5IPrXvs6z9lWS1GYM\n", "8xL61r4usPZVktRmDPOSrH2VJLUrw3wIrH2VJLUjw3yIrH2VJLUbw3wYrH2VJLUTw3yYrH2VJLWL\n", "QcM8IvaJiGMjYnxEXB4Rf4yIf2jGcO3O2ldJUjsos2b+JeAG4HBgDfAi4ONVDlUn1r5KklqtTJiP\n", "zsyrgOnA9zPzt8CYaseqF2tfJUmtVCbM/xIRxwEHAT+KiGOAP1c7Vr1Y+ypJaqUyYT4D2AI4LDMf\n", "BiYCb6t0qhqy9lWS1CqDhnlm3gdcCewVEU8BLi0uUz/WvkqSWqHM3uwfAeYC/wxMAM6MiI9VPVhd\n", "WfsqSWq2MpvZ3wG8DlidmcuAlwDvrHKourP2VZLUTGXCfH1m9v0A9RrAz18NwtpXSVKzlAnzqyLi\n", "NGCriDgUWEDjPXQNwtpXSVIzlAnzjwF3ADcDbwcuAY6tcqhOYu2rJKlqZcJ8C2BsZv4D8GFgR2Cz\n", "SqfqMNa+SpKqVCbMvw3sVJxeWdzn/LIPUHS7LyxOvzAi7ouIhcXXW4Y8cU1Z+ypJqsrYErfZJTMP\n", "BsjMlcAnI+LmMguPiOOBI4BVxUVTgNMz8/ThDFt3R02fzPJVa7ni+ns5Zf51zJq5L+PGeuA6SdKT\n", "UyZJNkTEXr1nImIP4NGSy78TOAwYVZyfAkyPiKsi4pyI2GpI09acta+SpCqUCfPjgMsj4saIuBG4\n", "jJI7wGXmRTz+Y2zXAsdl5gHAXcDsIc5be9a+SpJGWpk6158AzwDeAxwNPDszFw3z8X6QmUuL0/8J\n", 
"vHCYy6k1a18lSSOpTJ3rrsApwAeAj9Kocz13mI93aURMLU4fROM46V3J2ldJ0kgps5n9u8X3RcBP\n", "gauKr6Ho3Y78PuCLxd7t+wGfHuJyOoq1r5KkkVBmb/axmXnccB8gM+8GXlqcvhl4+XCX1Yl6a18/\n", "eeZiPnf+DZz03pfyvGc9rdVjSZJqpMya+TUR8caIsCimIta+SpKejDJh/mYaO6s9EhEbiq/1Fc/V\n", "dax9lSQN16Cb2TNzp8Fuo5Exbcoklq96lHkLbmH2WYv57Af3Z5utxrd6LElSmxs0zCNiR2AGsCWN\n", "8pcxwDMz8+0Vz9aVDj1gN5b/+RG+v/BO5pyzhJP/6WVsPr7Mrg2SpG5VZjP7RcDewJE0Av2NwH1V\n", "DtXtjpo+mYOmTuKOe5dzyvzrWPfYhlaPJElqY2XCfPvMPAq4GPgBMA2YOuA99KRY+ypJGooyYf5w\n", "8T2BvTJzBbB9dSMJrH2VJJVXJsyvjIgLKTrZI+LrgAfkbgJrXyVJZZTpZv8k8PHMvAd4G/AbGkdC\n", "UxNY+ypJGswmwzwiDi6+HwW8rPi+J43N7q9qzngCa18lSQMbaM38xcX3acCBxfdpfc6riXprX8eN\n", "G8Pnzr+BW+96qNUjSZLaxCY/wJyZvcca/12xqV0t1lv7Onfetcydt4RTP7g/u+60davHkiS1WJkd\n", "4A6OiDK3UxNY+ypJ6q9MtdhDwG8i4iZgTXFZT2a+s7qxNBBrXyVJfZUJ829s5DI/8Nxi1r5KknqV\n", "+WjafBrtb1cCC4FFWOfaFqx9lSRBiTCPiM8Ad9H4fPk1wJ3AJyqeSyVY+ypJgnI7wL0VeAbwXRof\n", "SzsI+N8KZ9IQWPsqSSoT5g8Ufey/Al6QmQuB51U7lobC2ldJ6m5lwnxFRBwJ3ATMiIj9gB2qHUtD\n", "Ze2rJHWvMmE+E9ihWCP/X+BM4F8rnUrDYu2rJHWnMmH+FuCbAJl5bGbunZkXVDuWhsvaV0nqPmXC\n", "/OnAkoi4LCKOiIgtqh5KT05v7ev6DT3MnbeEux9Y2eqRJEkVKvM58+OAZwEnA/sCN0fEN6seTE+O\n", "ta+S1D2G0rk+DtgM2ACsrWYcjaRpUyYx84178vDKtcw+azErVvm0SVInKlMa82Xgt8BHgCuAvTNz\n", "ZtWDaWQcesBuHH7g7ty/bDVzzlnCmrWPtXokSdIIK1PmfQfwosxcVvUwqsZR0yezfNVarrj+Xk6Z\n", "fx2zZu7LuLEeCE+SOkWZ98y/ZJDXm7WvktTZXD3rEta+SlLnMsy7iLWvktSZNvmeeUTM7nO2BxjV\n", "93xmnlTZVKpMb+3rx758NeddchvbbDWe1+yzS6vHkiQ9CQOtma8GVgEvBN4ArAAeBl4JRPWjqSrW\n", "vkpSZ9lkmGfmFzLzNGAi8IrMPCMzvwy8Gnh2swZUNax9laTOUeY98+2AMX3Obw5sU804aiZrXyWp\n", "M5QJ868DN0bEFyLidOBG4IvVjqVmsfZVkuqvzOfMTwOOAB4A7gMOz8wzqx5MzWPtqyTVW9mPpgWN\n", "ze1nAXtXN45axdpXSaqvMt3snwX+HjiMxsFWji42t6vDHDV9MgdNncQd9y7nlPnXse6xDa0eSZJU\n", "Qpk189cCRwKPZOafaOzN/vpKp1JLWPsqSfVUJszX9zs/fiOXqUNY+ypJ9VMmzC8ELgC2i4iPAlcD\n", "36l0KrWUta+SVC9l9mY/FTiXRqhPAmZl5slVD6bW6q193X7bzTnvktu4/Np7Wj2SJGkTyu7Nfh+w\n", "APgh8OeIeEV1I6ldWPsqSfVQZm/2rwKXACcBJxZfcyqdSm3D2ldJan+bPGpaH68BIjPXVD2M2lNv\n", 
"7evcedcyd94STv3g/uy609atHkuSVCizmf2ukrdTB7P2VZLaV5k18z8Bv46IxcAjxWU9mfnO6sZS\n", "O5o2ZRLLVz3KvAW3MPusxXz2g/uzzVbjWz2WJHW9MmF+afHVlx887lKHHrAby//8CN9feCdzzlnC\n", "yf/0MjYfX+afkSSpKpvcfB4RE4uTC4Eri+99v9SlrH2VpPYy0CrVPGA6cBUbXxN/ZiUTqe311r6u\n", "XP0o1//6D5xxwU0c+7YpjB49qtWjSVJX2mSYZ+b04vuuTZtGtdFb+zrr6z9n0dL72Xar8bzrkD0Z\n", "NcpAl6RmG/TNzoh4LvB+YEtgVHGfXTPT4pgu11v7+vGvXsOCq+9i2wnjefNBz2n1WJLUdcp85Ow/\n", "aOzR/kLgF8AOwI+rHEr1Ye2rJLVemTAfnZmzgcuAm4BDaBwWtZSI2CciFhand4+IayJiUUR8LSLc\n", "JtsBrH2VpNYqE+arI2I8cDswJTPXAtuXWXhEHA+cTeOwqQCnAycUm+hH0fjDQB3A2ldJap0yYf5N\n", "4EfF14cj4lLgdyWXfydwGI3gBnhRZi4qTv8YeNUQZlWb6619Xb+hh7nzlnD3AytbPZIkdYUyh0D9\n", "CnBYZi4DpgFfB95UZuGZeRHwWJ+L+m5WXwVsU3pS1YK1r5LUfJvcmz0iZvc73/fs82kcRW2o+raL\n", "TACWD2MZanPWvkpScw20Zj7QzmnD3XFtaUQcUJx+PbBooBurvg49YDcOP3B37l+2mjnnLGHN2scG\n", "v5MkaVgGKo05sfd0ROwIvJzGJvNFmfmnIT5Ob4PcscDZEbEZ8Gvge0NcjmrkqOmTWb5qLVdcfy+f\n", "mX8dn5q5L+PGegA+SRppZUpjjgC+APyMxpr8v0fEuzPzv8o8QGbeDby0OH0Hjffd1QWsfZWk5iiz\n", "mvQpGh9JOzwz3wTsB5xa7VjqFL21r3vsuh2Llt7PvAW30NPjQfckaSSVCfOVwF9bQDLzHmBtZROp\n", "4/TWvj5j4gQWXH0X37vyjlaPJEkdpcyBqJcCCyLibGA98Fbg/oh4C0BmfrfC+dQhemtfP/blqznv\n", "ktvYZqvxvGafXVo9liR1hDJr5psBfwQOBQ4H1gEP09gb/fXVjaZOY+2rJFWjzJr5JzPz/r4XRMQ+\n", "mXltRTOpg/XWvn7yzMV87vwbOOm9L+V5z3paq8eSpFors2a+pHeTekRsFhGfBdy0rmGz9lWSRlaZ\n", "MD8Q+GBE/AdwPbAFsGelU6njWfsqSSOnTJjfC1wF7A88FbgiM/9c6VTqCtOmTGLmG/fk4ZVrmX3W\n", "Ylas8kMSkjQcZcL8V8AkYA/g1cC/RMRFlU6lrmHtqyQ9eWXC/LjMfEdmrsjMpFHruqTiudRFjpo+\n", "mYOmTuKOe5fzmfnXse6xDYPfSZL0V4PuzZ6ZCyJiBjAZ+AyNw6F+rvLJ1DWsfZWkJ2fQNfNi7/W/\n", "Bw4DxgFHR8TpVQ+m7mLtqyQNX5nN7K8FjgQeKY6W9mosi1EFrH2VpOEpE+br+50fv5HLpBHRW/u6\n", "/babc94lt3H5tfe0eiRJantlwvxC4AJgu4j4KHA18J1Kp1JXs/ZVkoZm0DDPzFOBc2mE+iRgVmae\n", "XPVg6m69ta/jxo3hc+ffwK13PdTqkSSpbZXpZiczLwUurXgW6XF6a1/nzruWufOWcOoH92fXnbZu\n", "9ViS1HbKbGaXWsbaV0kanGGutmftqyQNzDBXLVj7KkmbZpirNqx9laSNM8xVG721r1Mn78jS25dx\n", "xgU3sWGDLXGSZJirVqx9laQnMsxVO9a+StLjGeaqJWtfJelvDHPVlrWvktRgmKvWrH2VJMNcHaC3\n", 
"9nX9hh7mnnstdz+wstUjSVJTGebqCH+tfV2zjtln/ZwHrX2V1EUMc3WMv9W+PsIsa18ldRHDXB3F\n", "2ldJ3cgwV8ex9lVStzHM1XGsfZXUbQxzdSRrXyV1E8NcHcvaV0ndwjBXR7P2VVI3MMzV8ax9ldTp\n", "DHN1BWtfJXUyw1xdw9pXSZ3KMFdXsfZVUicyzNV1rH2V1GkMc3Ula18ldRLDXF3L2ldJncIwV9ey\n", "9lVSpzDM1dWsfZXUCQxzdT1rXyXVnWEuYe2rpHozzKWCta+S6sowl/qw9lVSHRnmUj/WvkqqG8Nc\n", "2ghrXyXViWEubYK1r5LqwjCXBmDtq6Q6MMylQVj7KqndjW3Fg0bETcCK4uxdmTmzFXNIZfTWvq5c\n", "/SjX//oPnHHBTRz7timMHj2q1aNJEtCCMI+IpwBk5oHNfmxpuHprX2d9/ecsWno/2241nncdsiej\n", "RhnoklqvFZvZ9wa2iIjLIuKKiNinBTNIQ2btq6R21YowXw18PjNfC7wP+FZE+N69aqF/7et/W/sq\n", "qQ20IkRvB74FkJl3AA8BO7VgDmlY+ta+fsXaV0ltoBVhfjRwGkBE/B2wNeD/hqoVa18ltZNWhPk8\n", "YOuIWARcABydmX7WR7Vj7aukdtH0vdkz8zHgyGY/rlSF3trX0759E7PP+jmf/9D+7LDdFq0eS1KX\n", "cccz6Umy9lVSqxnm0giw9lVSKxnm0gix9lVSqxjm0gjprX2dOnlHlt6+jDMuuIkNG3paPZakLmCY\n", "SyOot/Z1j123Y9HS+5m34BZ6egx0SdUyzKURZu2rpGYzzKUKWPsqqZkMc6ki1r5KahbDXKqQta+S\n", "msEwlypm7aukqhnmUhP01r6uXrOO2Wf9nAcf/kurR5LUQQxzqUmsfZVUFcNcaiJrXyVVwTCXmsza\n", "V0kjzTCXmszaV0kjzTCXWsDaV0kjyTCXWsTaV0kjxTCXWsjaV0kjwTCXWszaV0lPlmEutQFrXyU9\n", "GYa51CasfZU0XIa51EasfZU0HIa51GasfZU0VIa51IasfZU0FIa51KasfZVUlmEutSlrXyWVZZhL\n", "bczaV0llGOZSm7P2VdJgDHOpBqx9lTQQw1yqCWtfJW2KYS7ViLWvkjbGMJdqxtpXSf0Z5lINWfsq\n", "qS/DXKopa18l9TLMpRqz9lUSGOZS7Vn7Kskwl2rO2ldJhrnUAax9lbqbYS51CGtfpe5lmEsdxNpX\n", "qTsZ5lKHsfZV6j6GudSBrH2VuothLnUoa1+l7mGYSx3M2lepOxjmUoez9lXqfIa51AWsfZU6m2Eu\n", "dQlrX6XOZZhLXcLaV6lzGeZSF7H2VepMhrnUZax9lTqPYS51IWtfpc5imEtdytpXqXMY5lIXs/ZV\n", "6gxND/OIGB0RZ0bE4ohYGBG7NXsGSX9j7atUf61YMz8U2CwzXwp8HDitBTNI6sPaV6nexrbgMV8G\n", "XAqQmddGxItbMIOkfqZNmcTyVY8yb8EtzDprMSe84yU8ZbNW/Bchda+HVjwyrPu14pW6NdB3O976\n", "iBidmRuroxoD8Pvf/74pg0nd7sW7jefuvbfm0p/fzXvm3NvqcaSus27N8t6TY4Zyv1aE+UpgQp/z\n", "mwpygJ0AZsyYUflQkiS1kZ2A/yl741aE+c+Ag4ELI2Jf4JcD3PZ6YH/gAWB9E2aTJKmVxtAI8uuH\n", "cqdRza5yjIhRwNeAvYqLjs7M25s6hCRJHaTpYS5JkkaWpTGSJNWcYS5JUs0Z5pIk1ZxhLklSzbVV\n", "vVNE7AOcmpkH9rv8YOBTwGPAuZl5Tivm06YN8Nx9FJgJLCsueq+fXmgfETEOOBfYBRgPfDozL+5z\n", 
"va+9Nlbi+fP118YiYgxwNvAcoAd4X2be2uf60q+/tgnziDgeOAJY1e/yccDpwIuBvwA/i4gFmflg\n", "86fUxmzquSu8CDgyM5c2dyqVNANYlplHRsRTgV8AF4OvvZrY5PNX8PXX3t4AbMjMl0fEAcDJNI5f\n", "MuTXXzttZr8TOAwY1e/yPYA7M3NFZq4DrgFe0ezhNKBNPXcAU4ATIuLqiPh4c8dSCRcCs4rTo2ms\n", "AfTytdf+Bnr+wNdfW8vMHwLvLc7uCvypz9VDev21TZhn5kU88R8iNLrcV/Q5/2dgm6YMpVIGeO4A\n", "vkPjH+srgZdHxPSmDaZBZebqzFwVERNoBMMn+1zta6/NDfL8ga+/tpeZ6yNiPvAl4Nt9rhrS669t\n", "wnwAK3h8l/sEHv/Xi9rbv2Xmw8Vflv8FvLDVA+nxImIScCVwXmZe0OcqX3s1MMDzB77+aiEz30Hj\n", "ffOzI2Lz4uIhvf7a5j3zAfwGeHbxftBqGpsZPt/akVRGRGwD/DIiJtN4z+eVwLzWTqW+ImJH4HLg\n", "/Zm5sN/Vvvba3EDPn6+/9hcRRwI7Z+ZngDXABho7wsEQX3/tGOY9ABHxVmCrzDw7Iv4ZuIzGloR5\n", "mflAKwfUJm3sufs4sBBYC/wkMy9t5YB6ghNobLqbFRG9772eDWzpa68WBnv+fP21t+8B8yPiKmAc\n", "cAzwpogYcvbZzS5JUs3V4T1zSZI0AMNckqSaM8wlSao5w1ySpJozzCVJqjnDXJKkmjPMpRaKiIMj\n", "Ys4w7jclIs4e5mOeGBGzh3PfuoiIl0TEqa2eQ2qWdiyNkbpGcbjKiwe94RPvdyPw7mE+bDeUS0wG\n", "dmz1EFKzGOZSBSJiGvCvxdmdgeuAdwF/B1xK4/jSjwDfBKZl5tERcTdwHvBaYEvg7Zl5U0S8APg6\n", "sDnwMI3DXj4bmJ2ZB0bET4FfAS8FngJ8JDP/OyL2pHHwhq2AHYDTMvPLA8z8NhoH6ugBrqfxx8Jm\n", "NBrF9qJRNfmFzDw/It4BTC9+np2BM4Bn0KgMfQh4PbATcBFwL7AbcA9wRGb+KSLeAMylsXXwLhrH\n", "2X5wgN/B7sDXgKfRqCb9UGb+ojhAxXIaRwfbGZgD/AA4CdgyIj5RVGVKHc3N7FJ19qVxxKo9aITs\n", "B4rLnwPMyMxXF+d7+nz/Y2buA5xJo6oT4FvAnMzcC7iARuVj37XrHmBsZk6hEfTfKI6FPBOYm5kv\n", "oRGyJxe3f8KhaiPi6TSOnfzqzNwTGEMjrE+kcbzs5xfLODEinl/cbSqN0N0fOA24JDP3Lq57bfF9\n", "b+CzxTJvK+6/Q/HzHVLc/mfAVwb5HXwDOL74Gd9b/B567ZyZ+wMH0/hjYwXwKeCHBrm6hWEuVecn\n", "mfk/mdkDnE8jDHuABzPzt8VtRvH4cO3tzr4V2C4ingZMzMxLADLzzMw8nicG8pnF9b8AHgCeDxwL\n", "bFH0c59MY013U/YDrsnM3xXLeXtxrOUDKQ7OkZkPAT8EphU/x88yc1Wfn+WK4vs9wLbFbX6VmYuL\n", "y79R/A6mAtf1ud/ZwEED/A62LO7z/yNiKY0/braMiO2Kx7i87+2L0/1/r1JHczO7VJ2+x3gf0+f8\n", "mgHu80jxvYdGGK3re2VEjAeevpH7re9zenRx/kIam7wvprEm+499lt3fo/QJv4jYvjg/mseH4mj+\n", "9v/Go30XkJkbNrLcjf0O+q9EjOLx/xf1/x2MAdZk5l8P3xkRkzLz4YiAxkFEyMye4rzUdVwzl6pz\n", "YERMjIjRwNuBSxji2mJmrgTujYhXFRe9ncb7wn0DeRSNzetExItprBX/CngVjffVL6axNk0xy8Zm\n", 
"uAHYpzikJsC/AW+kcZzsmcV9twcOoXEUrsF+jt41472K9+4BjqbxO7gW2Dcidikuf0/xOAP9Du6I\n", "iN6f8dXATwd5/MdwZUVdxDCXqnM/jU3Ct9LYCeyc4vL+73dvbE257+VHALOLTcxvBo7rt5weYPeI\n", "uJHG5vZ/LNaSTwSuiYifAc+l8Z71Mzf2mMXm9WOAyyLiV8Aq4FwaO5JtFxG/BK4CPl1syu+/jP4/\n", "Q+/1DwKnRMStwPbF/R+kEeA/iIhbaByn+X2D/A5mAO+KiJtpvGXwlk08du/p3j8YTtnIcqWO4yFQ\n", "pQoUe7P/S2a+vgmPtbB4rOuqfqyhiIhdgR9n5h6tnkXqdK6ZS9XY1Bp3t/F3IDWBa+aSJNWca+aS\n", "JNWcYS5JUs0Z5pIk1ZxhLklSzRnmkiTV3P8BeH16Onwt62UAAAAASUVORK5CYII=\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "from __future__ import division\n", "eigsort = np.sort(eig)[::-1]\n", "sns.set_style('white')\n", "plt.figure()\n", "plt.plot(range(1, len(eigsort) + 1), eigsort)\n", "plt.xlabel('principal component')\n", "plt.ylabel('explained variances')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAfYAAAFkCAYAAADSRRn0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAIABJREFUeJzt3XucXAV99/HPJpAIAaIh3BElTfgJSkAFQe4x8mqVYBH0\n", "qRik+ICiKBa1VbT1Xi3FIEUsXoC0ctEa+kgNlQIVoUgsgVAjEOWXghgEQW5p0iiJJOT548zAsCS7\n", "Zzd7dmbOfN6vV17szJkz89tdJt+cy3xP3/r165EkSfUwpt0DSJKkkWOwS5JUIwa7JEk1YrBLklQj\n", "BrskSTVisEuSVCOVB3tEHBARN2zg/qMj4taI+HFEnFL1HJIk9YJKgz0iPgJcCIzvd//mwJeAI4HD\n", "gXdHxPZVziJJUi+oeov9HuBYoK/f/XsC92Tmisx8CrgZOKziWSRJqr3NqnzyzPxuRLx0A4u2AVa0\n", "3P5fYOLGnicixgP7Aw8B60ZyRkmSOtRYYCfgtsxcU3alSoN9ACuArVtubw0sH+Dx+wM/qnQiSZI6\n", "06EUe7ZLaVew3w1Mi4gXAb+l2A3/xQEe/xDArq99D6977cv506P2GoURJUlqn4cffpjZs2dDIwPL\n", "Gq1gXw8QEccDW2XmhRHxIeBaiuP8F2fmQIOvA5jy0hdzy9InOeyAPg7dd5fKh5YkqQMM6RB0Xzdc\n", "3a1xnP6+y+fN56x/+m82G9PHeR+ewQ6Ttmz3aJIkVeKBBx5g5syZALtn5i/LrtdVBTU7bjuB97x5\n", "b367ei1zLlvEunVPt3skSZI6SlcFO8DM/XfjsH134e5ly/nWddnucSRJ6ihdF+x9fX2c9pZ92H7S\n", "llxx/VLuuOfRdo8kSVLH6LpgB5iwxeb8xQmvpq+vj3Mu/y9WrCr98T5JkmqtK4Md4GUvmcQJf/Qy\n", 
"nli5mi9/ZzHdcBKgJElV69pgBzhuxjT2mTaZW3/2MN9fcF+7x5Ekqe26OtjHjOnjg8e/im0mjGPu\n", "VUu479crBl9JkqQa6+pgB9h24hac8bZX8tTapzn70kWsXrO23SNJktQ2XR/sAPvvtSNvOnQKDzyy\n", "igu/d1e7x5EkqW1qEewAJ83aiyk7T+S6hcv40eIH2z2OJEltUZtg33yzsfzFO17N+HFj+fsrFvOb\n", "J37X7pEkSRp1tQl2gF2339rKWUlST6tVsIOVs5Kk3la7YLdyVpLUy2oX7GDlrCSpd9Uy2MHKWUlS\n", "b6ptsIOVs5Kk3lPrYLdyVpLUa2od7GDlrCSpt9Q+2MHKWUlS7+iJYAcrZyVJvaFngt3KWUlSL+iZ\n", "YAcrZyVJ9ddTwQ7PrZz9tpWzkqSa6blgb62cnWflrCSpZnou2MHKWUlSffVksIOVs5KkeurZYAcr\n", "ZyVJ9dPTwW7lrCSpbno62MHKWUlSvfR8sIOVs5Kk+jDYG6yclSTVgcHeYOWsJKkODPYWVs5Kkrqd\n", "wd6PlbOSpG5msPdj5awkqZsZ7Btg5awkqVsZ7Bth5awkqRsZ7AOwclaS1G0M9gFYOStJ6jYG+yCs\n", "nJUkdRODvQQrZyVJ3cJgL8nKWUlSNzDYS7JyVpLUDQz2IbByVpLU6Qz2IbJyVpLUyQz2IbJyVpLU\n", "yQz2YbByVpLUqQz2YbJyVpLUiQz2TWDlrCSp0xjsm8DKWUlSpzHYN5GVs5KkTmKwjwArZyVJncJg\n", "HyFWzkqSOsFmVT1xRIwBLgCmA2uAUzLz3pblbwY+DqwH5mbm16qaZTQ0K2fPOPc/+PsrFrPHbi9i\n", "h0lbtnssSVKPqXKL/RhgXGYeBJwJnNNv+ZeAI4GDgQ9HxMQKZxkVVs5KktqtymA/GLgGIDMXAvv1\n", "W/4U8EJgC6CPYsu961k5K0lqpyqDfRtgZcvtdY3d803nALcDdwFXZWbrY7uWlbOSpHaqMthXAlu3\n", "vlZmPg0QEbsB7wdeArwU2CEi3lLhLKPKyllJUrtUGewLgDcCRMSBwB0ty14ArAPWNML+EYrd8rVh\n", "5awkqR2qDPYrgdURsYBit/sHI+L4iHhXZi4Fvgn8OCJ+BEwE/rHCWdrCyllJ0mir7ONumbkeeG+/\n", "u5e2LD8XOLeq1+8EzcrZD5xzI3OvWsLLp2zL7jt3/cn/kqQOZkFNxayclSSNJoN9FFg5K0kaLQb7\n", "KLFyVpI0Ggz2UdKsnB0/bix/f8VifvPE79o9kiSphgz2UWTlrCSpagb7KLNyVpJUJYN9lFk5K0mq\n", "ksHeBlbOSpKqYrC3iZWzkqQqGOxtZOWsJGmkGext1Kyc3WbCOOZetYT7fr2i3SNJkrqcwd5mVs5K\n", "kkaSwd4BrJyVJI0Ug71DWDkrSRoJBnuHsHJWkjQSDPYOYuWsJGlTGewdxspZSdKmMNg7TP/K2Tvv\n", "eazdI0mSuojB3oFaK2fnXH67lbOSpNIM9g5l5awkaTgM9g5m5awkaagM9g5m5awkaagM9g5n5awk\n", "aSgM9i5g5awkqSyDvUtYOStJKsNg7xJWzkqSyjDYu4iVs5KkwRjsXcbKWUnSQAz2LmPlrCRpIAZ7\n", "F7JyVpK0MQZ7l7JyVpK0IQZ7F7NyVpLUn8HexayclST1Z7B3OStnJUmtDPYasHJWktRksNeElbOS\n", "JCgZ7BGxVURMj4gxETGh6qE0dFbOSpKgRLBHxExgMTAf2AlYFhF/WPVgGjorZyVJZbbY/wY4FFie\n", "mQ8ChwNfrHQqDZuVs5LU28oE+5jMfKh5IzOXALahdCgrZyWpt5UJ9l9FxNEAEfHCiPhL4P5qx9Km\n", 
"sHJWknpXmWB/DzAbeDHwC+CVwLurHEqbzspZSepNgwZ7Zv4GODszJwNTgK+27ppX5zp2xjSmT7Vy\n", "VpJ6SZmz4s8C/rZxcwvgExHxmUqn0ogYO6aPD73dyllJ6iVldsUfDfwRQGNL/fXAcVUOpZFj5awk\n", "9ZYywT4W2LLl9njAD0h3kdbK2YvmWzkrSXW2WYnHfB24PSLmA33AG4CvVDqVRtxJs/birnsf59pb\n", "lrHvHttxyD67tHskSVIFypw8dy5wAvAQsAyYnZkXVD2YRlZr5exX5lk5K0l1Vebkuc2B7YFHgRXA\n", "9Ig4serBNPKsnJWk+itzjP1bwCeA1wFHNP7MqG4kVcnKWUmqtzLH2PcG9sxMG05qoFk5e/f9y5l3\n", "/VL2mbYde0+d3O6xJEkjpMwW+88pruqmmrByVpLqq0ywTwAyIv4zIm5o/Plh1YOpWlbOSlI9ldkV\n", "/4UN3DdoCkTEGOACYDqwBjglM+9tWb4/cA7FR+geBE7MzN+XGVoj49gZ01i89NFnKmdnHTKl3SNJ\n", "kjZRmY+73QisBNZRFNOMAf6gxHMfA4zLzIOAMylCHICI6AO+AZyUmYcC1wO7D3V4bRorZyWpfsp8\n", "3O0S4DvA9yi23ucDR5Z47oOBawAycyGwX8uyPYDHgQ9FxI3ACzPTU7TbwMpZSaqXMsfYDwNeDlwB\n", "nAocUHK9bSi29JvWNXbPA0wGDgLOp+ienxkRfoSuTayclaT6KBPQv24c+/45MD0zlwAvKbHeSmDr\n", "1tfKzGYjyuPAPVlYS7Flv1//J9DoOWnWXkzZeSLX3rKMm3/6YLvHkSQNU5lgfzAiPgb8GDg1Io4H\n", "XlhivQXAGwEi4kDgjpZlvwC2iojmsfpDATcV28jKWUmqhzLBfjJwX2beCvw/4G3Ae0usdyWwOiIW\n", "UJw498GIOD4i3tXYA3Ay8K2IuBW4PzP/bXjfgkaKlbOS1P02+nG3iNgxMx+m2Dr/z4jYjeLEufmU\n", "+Lhbo6mu/z8AlrYsv4HieL06yMz9d+Mn+Sg3LX6Qb1+XnPCGPds9kiRpCAb6HPvFwFHATTw/yNcD\n", "fui5hqyclaTuttFd8Zl5VOPL92fm7v3+GOo1ZuWsJHWvMsfYz658CnUcK2clqTuVqZS9NyLmAguB\n", "1Y371mfmJdWNpU5g5awkdZ8yW+yPNx53IM9ei90ymR5g5awkdZ9Bt9gz86T+90XElpVMo47TrJz9\n", "7MULOfvSRZx7xuG8YHyZHT2SpHYY9G/oiHgL8EmKy7eOAcYC44Edqh1NnaJZOTv/R7/govl38f63\n", "7tvukSRJG1H25LkzKCpl3w7MBb5Y5VDqPFbOSlJ3KBPsyzPzh8AtwMTM/DTw5kqnUsexclaSukOZ\n", "YP9dROwB3A0cERHuhu9RVs5KUucrE+x/BXweuAqYCfwG+Jcqh1Lnmrn/bhy27y7cvWw5374u2z2O\n", "JKmfMqc3r8jMtza+3j8iJmXmE1UOpc5l5awkdbYyW+wXRcSdEXFmRLzYUJeVs5LUuQYN9szcDzgW\n", "GAdcHRE3RsQplU+mjmblrCR1pjJb7GTmfwNfAs4CtgHOrHIodYdjZ0xj+tTJz1TOSpLab9Bgj4jj\n", "IuIKis+xH0JxtbeplU+mjmflrCR1njJb7G8HLgf+IDPfm5k/rngmdZFm5exTa5/m7EsXsXrN2naP\n", "JEk9rcwx9uMy818y8/ejMZC6T7Ny9oFHVnHR/LvaPY4k9bRSx9ilwVg5K0mdwWDXiLByVpI6w0YL\n", "aiLicGCjn2HKzJsqmUhdq1k5e953FjPnskWc9b5DGDvWfztK0mgaqHnuLyiCfSdgD+CHwFrgCOAO\n", 
"4HVVD6fuM3P/3fhJPspNix/k29clJ7xhz3aPJEk9ZaObU5k5KzOPBh4HpmfmMZn5FmBvBtiSV29r\n", "Vs5uP2lL5l2/lDvveazdI0lSTymzn3S3zPxly+2HgV2qGUd1YOWsJLVPmWC/NSIui4hZEfEmYB5w\n", "Q8VzqctZOStJ7VEm2N8N/AQ4FTgFuAk4vcqhVA9WzkrS6CtTULMG+C7wdeA44PuZab2YBmXlrCSN\n", "vjJd8W8D5gPnAdsCCyLiHVUPpnqwclaSRleZXfEfBQ4GVmbmw8CrgI9VOpVqxcpZSRo9ZYJ9XWau\n", "bN7IzIeAddWNpDqyclaSRkeZYF8SEacD4yJi34j4BrC44rlUM1bOStLoKBPs76P43PqTwFxgJXBa\n", "lUOpnpqVs79dvZY5ly1i3bqn2z2SJNXOQJWyAGTmKuDMUZhFPcDKWUmq1qDBHhEnAXOASS13r8/M\n", "sVUNpfpqVs7eff9y5l2/lH2mbcfeUye3eyxJqo0yu+I/RXHhl7GZOabxx1DXsFk5K0nVKRPsD2Tm\n", "XZlpJ6hGjJWzklSNQXfFA7dHxD8D1wHNTav1mXlJdWOpFxw7YxqLlz76TOXsrEOmtHskSep6ZbbY\n", "XwisAl5LsUt+RuOPtEmsnJWkkVfmrPiTRmEO9ahm5exnL17I2Zcu4twzDucF48vsSJIkbchG/waN\n", "iO9n5lERsaHLcq3PTPebakQ0K2fn/+gXXDT/Lt7/1n3bPZIkda2BNo3e1fjvhna7e6aTRtRJs/bi\n", "rnsf59pblrHvHttxyD67tHskSepKGz3Gnpm/bnzZvPDLocBhFEF/cvWjqZdYOStJI6PMyXPfBU4H\n", "/gb4I+BzFJdvlUaUlbOStOnKBHsArwOuBL4IvAbYrcqh1Ltm7r8bh+27C3cvW863r8t2jyNJXadM\n", "sP+mUU5zNzC9sYt+x2rHUq9qVs5uP2lL5l2/lDvveazdI0lSVyl72dbzgRuBMyLiY8D4SqdST7Ny\n", "VpKGr0ywvweYl5lLKHrjdwTeXulU6nlWzkrS8Az0OfbDefZjbX0RcRiwguJkukkbW08aKVbOStLQ\n", "DfQ59s8w8OfVrZVVpZqVsx8450bmXrWEl0/Zlt13ntjusSSpo2002DPziNbbEbEtsC4z/6fqoaQm\n", "K2claWgGPcYeEftGxE+BpcB9EbEgIqZWP5pUaFbOPvDIKi6af1e7x5Gkjlbm5Lm5wF9m5raZ+SJg\n", "DvAP1Y4lPddJs/Ziys4TufaWZdz80wfbPY4kdawywU5m/mvL11cCW1U2kbQBVs5KUjllDlbeEBFn\n", "Al8F1gGzgZ9FxPYAmfnIhlaKiDHABcB0YA1wSmbeu4HHfQN4PDM/NrxvQb2iWTl73ncWM+eyRZz1\n", "vkMYO7bUv00lqWeU+VvxOOBUYDFwJ3AmcBCwELhlgPWOAcZl5kGNdc7p/4CIOBV4BV4tTiVZOStJ\n", "AyuzxT4lM59zNY6ImJiZKwZZ72DgGoDMXBgR+/V7joMoeue/Drys/MjqZc3K2bvvX86865eyz7Tt\n", "2Hvq5HaPJUkdo8wW+80RsXvzRkS8AfhpifW2AVa23F7X2D1PROwEfBJ4P9BXflzJyllJGkiZYP8K\n", "xXH290bExcDngbeWWG8lsHXra7Vs+b8FmAxcDXwUeHtEnFh+bPW61srZ8+dZOStJTYPuis/Mb0XE\n", "euBy4BHgwMz8ZYnnXgAcDVwREQcCd7Q85/nA+QAR8afAyzLzkqGPr17WrJxduMTKWUlqKlNQcwnw\n", "CeC1FFvXN0XEB0o895XA6ohYQHHi3Acj4viIeNcGHuvmloasWTm7zYRxzL1qCff9erDTPiSp/sqc\n", 
"PPcY8KrMXA0sjIgfUJzw9uWBVmpcw/29/e5euoHHfbPkrNLzWDkrSc816BZ7Zn4I2CkijoqIzSk+\n", "wjar+tGkcqyclaRnldkV/zZgPsUW+rbAjyPiHVUPJg2FlbOSVChzVvxHKT6TvjIzHwZeBdgSp45i\n", "5awkFcoE+7rMfObz6Jn5EEW1rNRRmpWzv129ljmXLWLduqcHX0mSaqZMsC+JiNOBcY1LuH6Dol5W\n", "6jhWzkrqdWWC/X3ALsCTFJdwXQmcVuVQ0nA1K2e3n7Ql865fyp33PNbukSRpVJUpqFlFcREXqSs0\n", "K2c/+pWbmXP57Xz5w0cwcavx7R5LkkaF17xULVk5K6lXGeyqrWNnTGP61MnPVM5KUi8YUrBHxBYR\n", "sfXgj5Taz8pZSb2odLBHxMnALcB/RsTnqhtJGjnNytmn1j7N2ZcuYvWate0eSZIqtdFgj4hX9Lvr\n", "mMzcJzNfAby52rGkkWPlrKReMtBZ8adGxDjgs5n5ILA4Iq4FngL821Fd5aRZe3HXvY9z7S3L2HeP\n", "7Thkn13aPZIkVWKjW+yZeTrF5Vb/NiK+AMwBPgB8PDPfNkrzSSPCyllJvWLAY+yZuTQzTwD+FbgU\n", "eCPw89EYTBppVs5K6gUDHWM/LSLujYilwM6Z+SZgGfCvETF71CaURpCVs5LqbqAt9tOAAF4JfBwg\n", "M78LHAVsU/1o0sizclZS3Q0U7A8Bf0dxHfZndr9n5trM/GrVg0lVaVbO9vX1Mefy21mxak27R5Kk\n", "ETNQsB8N/DtwBXDi6IwjjQ4rZyXV1UY/7paZq4HvjeIs0qg6dsY0Fi999JnK2VmHTGn3SJK0yeyK\n", "V8+yclZSHRns6mlWzkqqG4NdPc/KWUl1YrBLFJWzU3aeyLW3LOPmnz7Y7nEkadgMdomicvbPT3i2\n", "cvYRK2cldSmDXWp48Q5bc+oxjcrZy2+3clZSVzLYpRavf01ROfvzXz5h5aykrmSwSy2snJXU7Qx2\n", "qR8rZyV1M4Nd2gArZyV1K4Nd2ohjZ0xj+tTJz1TOSlI3MNiljbByVlI3MtilAVg5K6nbGOzSIKyc\n", "ldRNDHapBCtnJXULg10qwcpZSd3CYJdKsnJWUjcw2KUhsHJWUqcz2KUhsHJWUqcz2KUhsnJWUicz\n", "2KVhsHJWUqcy2KVhaq2cvdrKWUkdwmCXhqm1cvZiK2cldQiDXdoEVs5K6jQGu7SJrJyV1EkMdmkE\n", "WDkrqVMY7NIIsHJWUqcw2KURYuWspE5gsEsjyMpZSe1msEsjyMpZSe1msEsjzMpZSe1ksEsVsHJW\n", "UrsY7FJFrJyV1A4Gu1QRK2cltcNmVT1xRIwBLgCmA2uAUzLz3pblxwN/BqwF7gROy0z3V6pWmpWz\n", "n714IWdfuohzzzicF4yv7G0nSZVusR8DjMvMg4AzgXOaCyJiC+BzwBGZeQgwEZhV4SxS21g5K2k0\n", "VRnsBwPXAGTmQmC/lmWrgddm5urG7c2AJyucRWorK2cljZYqg30bYGXL7XWN3fNk5vrMfBQgIk4H\n", "JmTmDyqcRWorK2cljZYqg30lsHXra2XmMx2bETEmIuYAM4HjKpxD6ghWzkoaDVUG+wLgjQARcSBw\n", "R7/lXwfGA29u2SUv1ZqVs5KqVuXpuVcCR0bEgsbtdzbOhN8KWAT8X+Am4IcRAXBeZv5LhfNIbdes\n", "nL37/uXMu34p+0zbjr2nTm73WJJqpLJgb3x07b397l7a8vXYql5b6mTNytmPfuVm5lx+O1/+8BFM\n", "3Gp8u8eSVBMW1EhtYOWspKoY7FKbWDkrqQoGu9QmVs5KqoLBLrVRs3L2qbVPc/ali1i9Zm27R5LU\n", 
"5Qx2qc2snJU0kgx2qQNYOStppBjsUgewclbSSDHYpQ5h5aykkWCwSx3EyllJm8pglzpIs3J2+0lb\n", "Mu/6pdx5z2PtHklSlzHYpQ7TrJzt6+tjzuW3s2LVmnaPJKmLGOxSB7JyVtJwGexSh7JyVtJwGOxS\n", "h7JyVtJwGOxSB7NyVtJQGexSh7NyVtJQGOxSF7ByVlJZBrvUBayclVSWwS51CStnJZVhsEtdxMpZ\n", "SYMx2KUuYuWspMEY7FKXsXJW0kAMdqkLWTkraWMMdqlLWTkraUMMdqlLWTkraUMMdqmLWTkrqT+D\n", "XepyVs5KamWwSzVg5aykJoNdqgErZyU1GexSTVg5KwkMdqlWrJyVZLBLNWLlrCSDXaoZK2el3maw\n", "SzVk5azUuwx2qaasnJV6k8Eu1ZSVs1JvMtilGmutnP3iZVbOSr3AYJdqrlk5+6vfWDkr9QKDXeoB\n", "Vs5KvcNgl3qAlbNS7zDYpR5h5azUGwx2qYdYOSvVn8Eu9RArZ6X6M9ilHmPlrFRvBrvUg6yclerL\n", "YJd6lJWzUj0Z7FKPsnJWqieDXephVs5K9WOwSz3OylmpXgx2SVbOSjVisEuyclaqEYNdEmDlrFQX\n", "BrukZ1g5K3W/zap64ogYA1wATAfWAKdk5r0ty48GPgGsBeZm5kVVzSKpnGbl7N33L2fe9UvZZ9p2\n", "7D11crvHkjQEVW6xHwOMy8yDgDOBc5oLImJz4EvAkcDhwLsjYvsKZ5FUkpWzUnerbIsdOBi4BiAz\n", "F0bEfi3L9gTuycwVABFxM3AY8M8VziOppGbl7CVX/5zz5y3m3cfs3e6RpJ6zfOXqYa1XZbBvA6xs\n", "ub0uIsZk5tONZa01V/8LTBzgucYCPPzwwyM+pKQNO2CPLVhwG9x828+4+baftXscqec89eT/NL8c\n", "O5T1qgz2lcDWLbeboQ5FqLcu2xpYPsBz7QQwe/bsER1QkqQusBNw76CPaqgy2BcARwNXRMSBwB0t\n", "y+4GpkXEi4DfUuyG/+IAz3UbcCjwELCumnElSeooYylC/bahrNRX1eUaI6KPZ8+KB3gn8Gpgq8y8\n", "MCJmAZ+kOIHv4sz8aiWDSJLUQyoLdkmSNPosqJEkqUYMdkmSasRglySpRgx2SZJqpMqPuw1bRBwA\n", "nJWZM/rdb798Fxjg9/dB4GTg0cZdp2bm0tGeTxvWqHqeC7wEGA/8dWZe1bLc918HK/H78/3XoSJi\n", "LHAhsAewHnhPZi5pWT6k917HBXtEfAQ4AVjV7/5mv/x+wO+ABRExPzMfGf0ptTEb+/01vAp4R2b+\n", "ZHSnUkmzgUcz8x2NjonFwFXg+69LbPT31+D7r3PNAp7OzEMi4nDg8xTXWxnWe68Td8XfAxwL9PW7\n", "/5l++cx8Cmj2y6uzbOz3B0WPwccj4kcRcebojqUSrqDoloDi74a1Lct8/3W+gX5/4PuvY2Xm94BT\n", "GzdfynObWIf83uu4YM/M7/L8/yFh6P3yaoMBfn8A36b4n/d1wCERcdSoDaZBZeZvM3NVRGxNERJ/\n", "2bLY91+HG+T3B77/OlpmrouIfwS+DHyrZdGQ33sdF+wDGGq/vDrPeZn5RONfnd8HXtnugfRcEfFi\n", "4IfAJZn5Ty2LfP91gQF+f+D7r+Nl5kkUx9kvjIgtGncP+b3XccfYBzDUfnl1kIiYCNwREXtRHCd6\n", "HXBxe6dSq4jYAbgOOC0zb+i32Pdfhxvo9+f7r7NFxDuAXTPzb4AngacpTqKDYbz3OjnY1wNExPE8\n", "2y//IeBanu2Xf6idA2pAG/r9nQncAKwBfpCZ17RzQD3Pxyl28X0yIprHai8EJvj+6wqD/f58/3Wu\n", 
"fwb+MSL+A9gc+DPgzRExrOyzK16SpBrppmPskiRpEAa7JEk1YrBLklQjBrskSTVisEuSVCMGuyRJ\n", "NWKwSx0gIo6OiM8MY71XR8SFw3zNT0fEp4azbreIiNdExFntnkMaTZ1cUCP1jMblNa8a9IHPX+92\n", "4F3DfNleKLHYC9ih3UNIo8lglyoUEUcAf9W4uStwK3AKsDNwDcW1sVcDlwFHZOY7I+KXwCXAHwIT\n", "gBMz878iYl/g68AWwBMUl+mcBnwqM2dExI3AncBBwAuAMzLz3yPiFRQXltgK2B44JzPPH2Dmt1Nc\n", "QGQ9cBvFPxzGUbSYTaeou5yTmZdGxEnAUY3vZ1fg74DdKCpLHwfeAOwEfBf4FfAHwDLghMxcHhGz\n", "gM9R7D38BcU1wh8Z4GcwFbgA2JaiGvX0zFzcuHjG/1BcwWxX4DPAlcBngQkR8bFGXadUe+6Kl6p3\n", "IMVVtfakCNz3Ne7fA5idmUc2bq9v+e9jmXkA8DWKqlCAy4HPZOZ04J8oaidbt7rXA5tl5qspQv+b\n", "jWs5nwx8LjNfQxG4n288/nmX1o2IXSiu/XxkZr4CGEsR3J+muNb33o3n+HRE7N1YbX+KAD4UOAe4\n", "OjP3aSz7w8Z/9wH+tvGcP2+sv33j+/vjxuMXAF8Z5GfwTeAjje/x1MbPoWnXzDwUOJriHx4rgE8A\n", "3zPU1UsMdql6P8jMezNzPXApRTCuBx7JzPsbj+njuUHb7PFeAkyKiG2BHTPzaoDM/FpmfoTnh/PX\n", "GssXAw8BewMfBrZsdIV/nmILeGNeC9ycmb9uPM+JjWtFz6Bx0ZDMfBz4HnBE4/tYkJmrWr6X6xv/\n", "XQa8sPGYOzPzx437v9n4GewP3Nqy3oXAzAF+BhMa6/xDRPyE4h86EyJiUuM1rmt9fOPr/j9Xqfbc\n", "FS9Vr/X69GNbbj85wDqrG/9dTxFMT7UujIjxwC4bWG9dy9djGrevoNgtfhXFFu6ftDx3f7+nJQgj\n", "YnLj9hgr6FTBAAABsklEQVSeG5BjePbvj9+3PkFmPr2B593Qz6D/hkUfz/07qf/PYCzwZGY+c7nR\n", "iHhxZj4REVBc3ITMXN+4LfUkt9il6s2IiB0jYgxwInA1Q9yKzMyVwK8i4vWNu06kOI7cGs59FLvg\n", "iYj9KLaW7wReT3Ec/iqKrWwas2xohkXAAY1LgAKcB7yJ4hrfJzfWnQz8McWVwgb7PppbzNMbx/oB\n", "3knxM1gIHBgRL2nc/+7G6wz0M/jviGh+j0cCNw7y+mtxA0Y9xmCXqvcgxW7jJRQnkF3UuL//8fEN\n", "bUG33n8C8KnGbui3An/e73nWA1Mj4naKXfJ/0th6/jRwc0QsAF5GcYx79w29ZmMX/J8B10bEncAq\n", "YC7FSWiTIuIO4D+Av27s7u//HP2/h+byR4AvRMQSYHJj/UcowvzKiLiL4jrT7xnkZzAbOCUifkpx\n", "WOH/bOS1m183//HwhQ08r1RLXrZVqlDjrPiPZuYbRuG1bmi81q1Vv9ZQRMRLgX/LzD3bPYvUC9xi\n", "l6q1sS3xXuPPQBolbrFLklQjbrFLklQjBrskSTVisEuSVCMGuyRJNWKwS5JUI/8fvAXjgPk/QWYA\n", "AAAASUVORK5CYII=\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure()\n", "plt.plot(range(1, len(eigsort) + 1), eigsort / sum(eigsort))\n", "plt.xlabel('principal component')\n", "plt.ylabel('% explained variance')\n", 
"plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we'd expect with the same data: PC1 will explain **all** of the variance in this data set, since all other features are literally the same. But what does this feature look like?" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.73205081 3.46410162 5.19615242 6.92820323 8.66025404\n", " 10.39230485 12.12435565 13.85640646 15.58845727]\n" ] } ], "source": [ "# ordering eigenvalues and vectors together\n", "ordered = sorted(zip(eig, Q.T), reverse=True)\n", "eig = np.array([_[0] for _ in ordered])\n", "Q = np.column_stack((_[1] for _ in ordered))\n", "\n", "# transforming data: We take the dot multiplication of the eigenvectors by the random data\n", "X_transformed = np.dot(Q.T, random_data.T)\n", "print X_transformed[0]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAdsAAAFVCAYAAACnwEWcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAADz1JREFUeJzt3X+I7XWdx/GXejXtknbZSu0HNyL83C1Yg35HTYqkBQW7\n", "S7B/tG1KSNK2IyyMixYSUrA0u//MChGaeQOjKPqxEaSg4aj/1BZFBPO2Lbh/WfRjNsm0Uu/+cc6t\n", "WLpzRve859x77uMBF2bufBleF+fOc77f77lfTzt69GgAgD6nL3oAACw7sQWAZmILAM3EFgCaiS0A\n", "NBNbAGi2b6cPjjHOSnJrkpcm+X2S1ar63l4MA4BlMevM9uokv6mqN0zfvq1/EgAsl1mxfVmSrydJ\n", "VT2Y5AVjjHPbVwHAEtnxMnKS7yZ5e5IvjzFel+S5SfYnefj/HjjGeEaSVyd5KMkTc94JACeiM5Jc\n", "mORbVfXb4x00K7a3JfnLMcZ9SR5I8mCSXx7n2Fcnue9pDAWAk92bktx/vA/Oiu1rktxTVf88xnhV\n", "ktfsUO6HkuSOO+7IBRdc8LSWAsDJ5Mab78wDX/pYMm3g8cyKbSX53BjjhiSPZfIiqeN5IkkuuOCC\n", "vPCFL3wKUwHg5HTTB67IZZPY7nj7dMfYVtUvk7xljrsA4JTjoRYA0ExsAaCZ2AJAM7EFgGZiCwDN\n", "xBYAmoktADQTWwBoJrYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADQTWwBoJrYA0ExsAaCZ2AJA\n", 
"M7EFgGZiCwDNxBYAmoktADQTWwBoJrYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADTbt9MHxxin\n", "J7k1yUVJnkxydVXVXgwDgGUx68z28iT7q+qNSW5K8tH+SQAsi7WNzaxtbC56xsLNiu2jSc4bY5yW\n", "5Lwkv+ufBMAyWNvYzNaR7Wwd2T7lg7vjZeQkDyQ5O8lWkr9I8o72RQCwZGad2V6X5IGqGklekeTw\n", "GOOs/lkAnOzWV1dy6OCBHDp4IOurK4ues1Czzmz3J3l4+vZ2kjOTnNG6CIClcapH9phZsV1P8qkx\n", "xn2ZhPb6qnq0fxYALI8dY1tV/5Pkb/ZoCwAsJQ+1AIBmYgsAzcQWAJqJLQA0E1sAaCa2ANBMbAGg\n", "mdgCQDOxBYBmYgsAzcQWAJqJLQA0E1sAaCa2ANBMbAGgmdgCQDOxBYBmYgsAzcQWAJqJLQA0E1sA\n", "aCa2ANBMbAGgmdgCQDOxBYBmYgsAzcQWAJqJLQA0E1sAaCa2ANBs36wDxhjvSXLl9N1zklyc5Pyq\n", "erhxFwAsjZmxrarDSQ4nyRjj5iS3Ci0A7N6uLyOPMV6V5OVVdWvjHoBTytrGZtY2Nhc9g2ZP5Z7t\n", "DUk+3LQD4JSztrGZrSPb2TqyLbhLblexHWM8O8lFVXVv8x4AWDq7PbNdSXJ35xCAU8366koOHTyQ\n", "QwcPZH11ZdFzaDTzBVJTFyX5UecQgFORyJ4adhXbqvq37iEAsKw81AIAmoktADQTWwBoJrYA0Exs\n", "AaCZ2AJAM7EFgGZiCwDNxBYAmoktADQTWwBoJrYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADQT\n", "WwBoJrYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADQTWwBoJrYA0ExsAaCZ2AJAs32zDhhjXJ/k\n", "HUnOTHJzVR1uXwUAS2THM9sxxiVJXl9Vb0hySZKX7MEmAFgqs85sL0/y/THGl5Ocm2StfxLAH61t\n", "bCZJ1ldXFrwEnr5Z92yfm+SVSd6Z5Jokd7QvApha29jM1pHtbB3Z/kN04WQ0K7Y/T3JXVT1eVQ8m\n", "eWyM8Zw92AUAS2NWbO9P8tYkGWM8P8n+JL/oHgWQTC4dHzp4IIcOHnAZmZPajvdsq+prY4yVMcY3\n", "Mwnz+6vq6N5MA3CvluUw85/+VNW/7MUQAFhWHmoBAM3EFgCaiS0ANBNbAGgmtgDQTGwBoJnYAkAz\n", "sQWAZmILAM3EFgCaiS0ANBNbAGgmtgDQTGwBoJnYAkAzsQWAZmILAM3EFgCaiS0ANBNbAGgmtgDQ\n", "TGwBoJnYAkAzsQWAZmILAM3EFgCaiS0ANBNbAGgmtgDQTGwBoJnYAkCzfbMOGGN8J8mvpu/+uKre\n", "2zsJAJbLjrEdY5ydJFV16d7MAZ6qtY3NJMn66sqClwDHM+sy8sVJnjnGuHOMcfcY47V7MQrYnbWN\n", "zWwd2c7Wke0/RBc48cyK7SNJ1qvqiiTXJLljjOE+LwA8BbPC+WCSO5Kkqn6Y5BdJLuweBezO+upK\n", "Dh08kEMHD7iMDCewWS+QuirJXyX5xzHG85Ocm+Sh9lXAroksnPhmxfaTST41xjh2M+iqqnqyeRMA\n", "LJUdY1tVjyd59x5tAYCl5MVOANBMbAGgmdgCQDOxBYBmYgsAzcQWAJqJLQA0E1sAaCa2ANBMbAGg\n", "mdgCQDOxBYBmYgsAzcQWAJqJLQA0E1sAaCa2ANBMbAGgmdgCQDOxBYBmYgsAzcQWAJqJLQA0E1sA\n", "aCa2ANBMbAGgmdgCQDOxBYBmYgsAzcQWAJrt281BY4znJfl2ksuq6sHeSQCwXGae2Y4xzkzyiSSP\n", 
"9M8BgOWzm8vI60k+nuSh5i3QZm1jM2sbm4ueAZyidoztGOPKJD+rqrumv3Va+yKYs7WNzWwd2c7W\n", "kW3BBRZi1pntVUneMsb4RpJXJDk8xji/fxYALI8dXyBVVW8+9vY0uO+rqp+2r4I5Wl9d+cMZ7frq\n", "yoLXAKeiXb0aGU52Igss0q5jW1WXdg4BgGXloRYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADQT\n", "WwBoJrYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADQTWwBoJrYA0ExsAaCZ2AJAM7EFgGZiCwDN\n", "xBYAmoktADQTWwBoJrYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmu2bdcAY44wktyS5KMnRJNdU1Q+6\n", "hwHAstjNme3bkzxZVW9M8qEkH+2dBADLZWZsq+orSd43fffFSbY7B7EYaxubWdvYXPQMgKW0q3u2\n", "VfXEGOP2JBtJPtO6iD23trGZrSPb2TqyLbgADXb9AqmqujKT+7a3jDHOaVsEAEtmZmzHGO8eY1w/\n", "fffRJE9Of7Ek1ldXcujggRw6eCDrqyuLngOwdGa+GjnJF5LcPsa4N8mZSa6tqt/2zmKviSxAn5mx\n", "rapHk/zdHmwBgKXkoRYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADQTWwBoJrYA0ExsAaCZ2AJA\n", "M7EFgGZiCwDNxBYAmoktADQTWwBoJrYA0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADQTWwBoJrYA\n", "0ExsAaCZ2AJAM7EFgGZiCwDNxBYAmoktADTbt9MHxxhnJrktycEkz0jykar66l4MA4BlMevM9l1J\n", "flZVK0nemuTm/kknprWNzaxtbC56BgAnoVmx/XySG//k2Md755yY1jY2s3VkO1tHtgUXgKdsx8vI\n", "VfVIkowxnpVJeD+4F6MAYJnMfIHUGONFSe5J8umq+mz/pBPP+upKDh08kEMHD2R9dWXRcwA4ycx6\n", "gdT5Se5K8v6q+sbeTDoxiSwAT9eOsU1yQ5Lzktw4xjh27/ZtVfVY7ywAWB6z7tlem+TaPdoCAEvJ\n", "Qy0AoJnYAkAzsQWAZmILAM3EFgCaiS0ANBNbAGgmtgDQTGwBoJnYAkAzsQWAZmILAM3EFgCaiS0A\n", "NBNbAGgmtgDQTGwBoJnYAkAzsQWAZmILAM3EFgCaiS0ANBNbAGgmtgDQTGwBoJnYAkAzsQWAZmIL\n", "AM3EFgCaiS0ANHtKsR1jvHaM8Y2uMQCwjPbt9sAxxnVJ/j7Jr/vmAMDyeSpntv+d5G+TnNa0BQCW\n", "0q7PbKvqi2OMF+9wyBlJcuPNd+amD1zx/90FACe8n/zkJ8fePGOn43Yd2124MEke+NLHctmXPjbH\n", "TwsAJ7wLk/zoeB+cZ2y/leRNSR5K8sQcPy8AnKjOyCS039rpoKcT26N/7jer6rdJ7n8anw8ATmbH\n", "PaM95rSjR/9sOwGAOfFQCwBoJrYA0ExsAaCZ2AJAs3n+05+MMV6b5F+r6tJ5ft5FG2OcmeS2JAeT\n", "PCPJR6rqq4tdNR9jjDOS3JLkokxeaX5NVf1gsavmb4zxvCTfTnJZVT246D3zNMb4TpJfTd/9cVW9\n", "d5F75m2McX2SdyQ5M8nNVXV4wZPmZozxniRXTt89J8nFSc6vqocXNmpOxhinJ7k1k+8tTya5uqpq\n", "savmZ4xxViZ/vpcm+X2S1ar63vGOn9uZ7fTZybdkEqNl864kP6uqlSRvTXLzgvfM09uTPFlVb0zy\n", "oSQfXfCeuZv+sPSJJI8sesu8jTHOTpKqunT6a9lCe0mS11fVG5JckuQlCx00Z1V1+Nh/uyT/leSf\n", 
"liG0U5cn2T/93nJTlu97y9VJfjP92rw6kxOy45rnZeRlfnby55PcOH379CSPL3DLXFXVV5K8b/ru\n", "i5NsL25Nm/UkH8/kgSvL5uIkzxxj3DnGuHt6dWmZXJ7k+2OMLyf5apL/XPCeFmOMVyV5eVXduugt\n", "c/RokvPGGKclOS/J7xa8Z95eluTrSTK9WvaCMca5xzt4brGtqi9miSL0p6rqkar69RjjWZmE94OL\n", "3jRPVfXEGOP2JBtJPrPgOXM1xrgyk6sSd01/a9l+GHwkyXpVXZHkmiR3TC/fLYvnJnllkndm+udb\n", "7Jw2NyT58KJHzNkDSc5OspXJlaX/WOycuftuJlcGM8Z4XSZfq/uPd/Ay/aVsNcZ4UZJ7kny6qj67\n", "6D3zVlVXZnJv5ZYxxjkLnjNPVyV5y/T/w/yKJIfHGOcveNM8PZhpgKrqh0l+kelzypfEz5PcVVWP\n", "T88eHhtjPGfRo+ZpjPHsJBdV1b2L3jJn1yV5oKpG/vh376wFb5qn25I8PMa4L8lfZ/J38ZfHO1hs\n", "d2H6zfmuJNdV1e0LnjNXY4x3T1+Akkwu+zw5/bUUqurNVXXJ9J7Yd5P8Q1X9dNG75uiqJP+eJGOM\n", "5yc5N8t1ufz+TF4ncezPtz+THyiWyUqSuxc9osH+JMfuP29n8gK3Hf/POCeZ1yS5p6relOQLSR6a\n", "Prb4z5rrq5GnlvH5jzdkcs/hxjHGsXu3b6uqxxa4aV6+kOT2Mca9mfxluHanLxhOOJ9M8qkxxub0\n", "/auqapl+WPraGGNljPHNTE4O3l9Vy/Y95qLs4tm6J6H1TL4278vke8v1VfXogjfNUyX53BjjhiSP\n", "ZfIiqePybGQAaOYyMgA0E1sAaCa2ANBMbAGgmdgCQDOxBYBmYgsAzf4XX6HqHBAVnJEAAAAASUVO\n", "RK5CYII=\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure()\n", "plt.plot(random_data.y, random_data.x, '.')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAd8AAAFVCAYAAACuK+XmAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAED5JREFUeJzt3X+MZWddx/HPbrctWGvZ8KMFabZB0mejRKr8qKww7Qap\n", "gBC1MZGwiK2KIOhiILPYopUQGhonEh0bG9ICLVolgVSkIWoNNAytia2IASTzLcV0TUyLUAcamrak\n", "7frH3IUN7M7s3N773M6Z1yvZ5NyzZ3q/m+7e9zznnntm26FDhwIA9LN91gMAwFYjvgDQmfgCQGfi\n", "CwCdiS8AdCa+ANDZjvUOaK2dm+SKqtrbWntakquTPCnJtiSvr6q7pjsiAAzLmivf1tqBrMb25NGu\n", "P0nyV1V1XpLLkjxnuuMBwPCsd9r5ziQXZnWVmyR7kpzZWvvnJPuSfHqKswHAIK152rmqbmitnXXE\n", "rrOS/F9Vvay19kdJ3pHkj4/19a21k5O8IMndSR55zNMCwOPbCUmenuT2qnroWAet+57v97k3ySdG\n", 
"2zcmuXyd41+Q5LMbfA4A2OxekuSWY/3mRuN7S5JfSPLXSc5L8qV1jr87Sa6//vqcccYZG3wqANhc\n", "7rnnnuzbty8Z9e9Yjje+h3/6wtuTXNNa+50k30zy2nW+7pEkOeOMM/LMZz7zOJ8KADa9Nd9qXTe+\n", "o48S7Rlt/3eSCyYyFgBsUW6yAQCdiS8AdCa+ADAh7732tuM6TnwBYALmF5fy1f/51nEdK74A0Jn4\n", "AsAELOyfy4/96GnHdaz4AsCEXHLRC4/rOPEFgM7EFwA6E18A6Ex8AaAz8QWAzsQXADoTXwDoTHwB\n", "oDPxBYDOxBcAOhNfAOhMfAGgM/EFgM7EFwA6E18A6Ex8AaAz8QWAzsQXADoTXwDoTHwBoDPxBYDO\n", "xBcAOhNfAOhMfAGgs3Xj21o7t7V28/fte21r7V+mNxYADNeOtX6ztXYgyeuSfPuIfT+V5DemPBcA\n", "DNZ6K987k1yYZFuStNaenOTyJL9/eB8AsDFrxreqbkjycJK01rYn+UCSt+WIlTAAsDEbueDqeUme\n", "neSqJH+b5Mdba++bylQADNL84lLmF5dmPcbMHXd8q+r2qnpOVe1N8pokX66qt01vNACGZH5xKcsH\n", "V7J8cGXLB/h443vo+x5vO8o+AOA4rHm1c5JU1V1J9qy3DwDWsrB/7rsr3oX9czOeZrbWjS8ATMpW\n", "j+5h7nAFAJ2JLwB0Jr4A0Jn4AkBn4gsAnYkvAHQmvgDQmfgCQGfiCwCdiS8AdCa+ANCZ+AJAZ+IL\n", "AJ2JLwB0Jr4A0Jn4AkBn4gsAnYkvAHQmvgDQmfgCQGfiCwCdiS8AdCa+ANCZ+AJAZ+ILAJ2JLwB0\n", "Jr4A0Jn4AkBn4gvwODO/uJT5xaVZj8EU7VjvgNbauUmuqKq9rbVzkiwmeSTJQ0leX1X/O+UZAbaM\n", "+cWlLB9c+e72wv65GU/ENKy58m2tHUhydZKTR7v+LMnvVtXeJDckecd0xwOA4VnvtPOdSS5Msm30\n", "+DVV9YXR9olJHpjWYABb0cL+uezetTO7d+206h2wNU87V9UNrbWzjnh8T5K01vYkeUuSl0x1OoAt\n", "SHSHb8MXXLXWfjXJVUleWVX3Tn4kABi2dS+4OlJr7XVJfjvJ+VW1Mp2RAGDYjnfle6i1tj3Jnyf5\n", "4SQ3tNZubq29a2qTAcBArbvyraq7kuwZPXzyVKcBgC3ATTYAoDPxBYDOxBcAOhNfAOhMfAGgM/EF\n", "gM7EFwA6E18A6Ex8AaAz8QWAzsQXADoTXwDoTHwBoDPxBYDOxBcAOhNfAOhMfAGgM/EFgM7EFwA6\n", "E18A6Ex8AaAz8QWAzsQXADoTXwDoTHyBTWd+cSnzi0uzHgPGJr7ApjK/uJTlgytZPrgiwGxa4gsA\n", "nYkvsKks7J/L7l07s3vXzizsn5v1ODCWHbMeAGCjRJfNzsoXADpbd+XbWjs3yRVVtbe19uwk1yZ5\n", "NMmXkrylqg5Nd0QAGJY1V76ttQNJrk5y8mjX+5JcWlVzSbYl+cXpjgcAw7Peaec7k1yY1dAmyU9X\n", "1eFr+/8hyc9NazAAGKo141tVNyR5+Ihd247Y/naS06YxFAAM2UYvuHr0iO1Tk3xzgrMAwJaw0fh+\n", "vrV23mj7FUncXgYANuh4P+d7+Irmtye5urV2UpIvJ/nYVKYCgAFbN75VdVeSPaPtryQ5f7ojAcCw\n", "uckGAHQmvgDQmfgCQGfiCwCdiS8AdCa+ANCZ+AJAZ+ILAJ2JLwB0Jr4A0Jn4AkBn4gsAnYkvAHQm\n", "vgDQmfgCQGfiCwCdiS8AdCa+ANCZ+AJAZ+ILAJ2JLwzQ/OJS5heXZj0GcAziCwMzv7iU5YMrWT64\n", 
"IsDwOCW+ANCZ+MLALOyfy+5dO7N7184s7J+b9TjAUeyY9QDA5IkuPL5Z+QJAZ+ILAJ2JLwB0Jr4A\n", "0Jn4AkBn4gsAnW34o0atte1JrklydpJHk7yhqmrSgwHAUI2z8r0gySlV9eIk705y+WRHAoBhGye+\n", "DyQ5rbW2LclpSb4z2ZEAYNjGucPVrUmekGQ5yZOTvHqiEwHAwI2z8j2Q5NaqaknOSXJda+2kyY4F\n", "AMM1TnxPSXLfaHslyYlJTpjYRAAwcOOcdl5I8qHW2mezGt5LquqByY4FAMO14fhW1TeT/PIUZgGA\n", "LcFNNgCgM/EFgM7EFwA6E18A6Ex8AaAz8QWAzsQXADoTXwDoTHwBoDPxBYDOxBcAOhNfAOhMfAGg\n", "M/EFgM7Ely1pfnEp84tLsx4D2KLEly1nfnEpywdXsnxwRYCBmRBfAOhMfNlyFvbPZfeundm9a2cW\n", "9s/NehxgC9ox6wFgFkQXmCUrXwDoTHwBoDPxBYDOxBcAOhNfAOhMfAGgM/EFgM7EFwA6E18A6Ex8\n", "AaAz8QWAzsa6t3Nr7ZIkr05yYpIrq+q6iU4FAAO24ZVva+38JC+qqj1Jzk/yrAnPBACDNs7K94Ik\n", "X2ytfTzJjySZn+xIADBs48T3qUnOTPKqrK56P5Fk9ySHAoAhG+eCq28kuamqHq6qO5I82Fp7yoTn\n", "AoDBGie+tyR5eZK01p6R5JQk905yKAAYsg3Ht6o+meTzrbXbsnrK+c1VdWjikwHAQI31UaOqesek\n", "BwGArcJNNgCgM/EFgM7EFwA6E18A6Ex8AaAz8QWAzsQXADoTXwDoTHwBoDPx5ajmF5cyv7g06zEA\n", "Bkl8+QHzi0tZPriS5YMrAgwwBeILAJ2JLz9gYf9cdu/amd27dmZh/9ysxwEYnLF+qhHDJ7oA02Pl\n", "CwCdiS8AdCa+ANCZ+AJAZ+ILAJ2JLwB0Jr4A0Jn4AkBn4gsAnYkvAHQmvgDQmfgCQGfiCwCdiS8A\n", "dCa+ANCZ+AJAZzvG/cLW2tOSfC7JS6vqjsmNBADDNtbKt7V2YpL3J7l/suMAwPCNe9p5IclVSe6e\n", "4CwAsCVsOL6ttYuSfL2qbhrt2jbRiQBg4MZZ+V6c5GWttZuTnJPkutba6ZMdCwCGa8MXXFXVeYe3\n", "RwF+Y1V9baJTAcCA+agRAHQ29keNkqSq9k5qEADYKqx8AaAz8QWAzsQXADoTXwDoTHwfg/nFpcwv\n", "Ls16DAA2GfEd0/ziUpYPrmT54IoAA7Ah4gsAnYnvmBb2z2X3rp3ZvWtnFvbPzXocADaRx3STja1O\n", "dAEYh5UvAHQmvgDQmfgCQGfiCwCdiS8AdCa+ANCZ+AJAZ+ILAJ2JLwB0Jr4A0Jn4AkBn4gsAnYkv\n", "AHQmvgDQmfgCQGfiCwCdiS8AdCa+ANCZ+AJAZ+ILAJ2JLwB0Jr4A0NmOjX5Ba+3EJB9MsivJyUne\n", "U1U3TnowABiqcVa++5J8varmkrw8yZWTHQkAhm3DK98kH03ysdH29iQPT24cABi+Dce3qu5Pktba\n", "qVkN8TsnPRQADNlYF1y11s5M8ukkH66qj0x2JAAYtnEuuDo9yU1J3lxVN09+JAAYtnHe8700yWlJ\n", "LmutXTba94qqenByYwHAcI3znu9bk7x1CrMAwJbQ5SYb7732th5PAwCbgjtcAUBnXeJ7yUUv7PE0\n", "ALApWPkCQGfiCwCdiS8AdCa+ANCZ+AJAZ+ILAJ2JLwB0Jr4A0Jn4AkBn4gsAnYkvAHQmvgDQmfgC\n", "QGfiCwCdiS8AdCa+ANCZ+AJAZ+ILAJ2JLwB0Jr4A0Jn4AkBn4gsAnYkvAHQmvgDQmfgCQGfiCwCd\n", 
"iS8AdLZjo1/QWtue5C+T/GSSh5L8VlV9ddKDAcBQjbPy/aUkJ1XVniR/kORPJzsSAAzbOPH92ST/\n", "mCRV9a9Jnj/RiQBg4DZ82jnJjyS574jHj7TWtlfVo0c59oQkueeee8aZDQA2lSN6d8Jax40T3/uS\n", "nHrE42OFN0meniT79u0b42kAYNN6epJjXg81TnxvTfLqJB9trf1Mki+sceztSV6S5O4kj4zxXACw\n", "mZyQ1fDevtZB2w4dOrSh/2prbVu+d7VzklxcVXeMMyEAbEUbji8A8Ni4yQYAdCa+ANCZ+AJAZ+IL\n", "AJ2N81GjdW2V+z+31s5NckVV7Z31LJPUWjsxyQeT7EpycpL3VNWNs51qclprJyS5OsnZSQ4leVNV\n", "/edsp5q81trTknwuyUuH9ImE1tq/J/nW6OF/VdVvznKeSWutXZLVj3OemOTKqrpuxiNNTGvt15Nc\n", "NHr4xCTPTXJ6Vd13zC/aREbtuyarry2PJnlDVdXRjp3Wynfw939urR3I6gv4ybOeZQr2Jfl6Vc0l\n", "eXmSK2c8z6S9KsmjVfXiJH+Y5PIZzzNxo2+g3p/k/lnPMkmttSckSVXtHf0aWnjPT/Ki0Wvn+Ume\n", "NdOBJqyqrjv8/y7JvyX5vaGEd+SCJKeMXlvenTVeW6YV361w/+c7k1yYZNusB5mCjya5bLS9PcnD\n", "M5xl4qrq75O8cfTwrCQrs5tmahaSXJXVG9wMyXOT/FBr7Z9aa58anX0akguSfLG19vEkNyb5xIzn\n", "mYrW2vOT/ERVXTPrWSbsgSSnje6HcVqS7xzrwGnF96j3f57Sc81EVd2QgUXpsKq6v6q+3Vo7Nash\n", "fuesZ5q0qnqktXZtksUkfzPjcSaqtXZRVs9c3DTaNaRvEO9PslBVP5/kTUmuH9hry1OTPC/Jr2T0\n", "55vtOFNzaZJ3zXqIKbg1yROSLGf1zNNfHOvAaf2l3cj9n3kcaq2dmeTTST5cVR+Z9TzTUFUXZfW9\n", "matba0+c8TiTdHGSl7XWbk5yTpLrWmunz3imSbkjoyBV1VeS3JvRPeQH4htJbqqqh0fv0z/YWnvK\n", "rIeapNbak5KcXVWfmfUsU3Agya1V1fK9f3snHe3AacX31iSvTJLjuP8zjzOjF+qbkhyoqmtnPM7E\n", "tdZ+bXRRS7J6mujR0a9BqKrzqur80ftq/5Hk9VX1tVnPNSEXZ3QNSWvtGVk9yzakU+u3ZPU6i8N/\n", "vlOy+g3GkMwl+dSsh5iSU/K9s74rWb1o7qg/3WgqVzsn+busfud96+jxxVN6nseDId6f89Ksvl9x\n", "WWvt8Hu/r6iqB2c40yR9LMm1rbXPZPUfx1ur6qEZz8Tx+UCSD7XWlkaPLx7SWbWq+mRrba61dltW\n", "F0dvrqqhvcacnTV+2s8mt5DVv5+fzepryyVV9cDRDnRvZwDobEgXKgDApiC+ANCZ+AJAZ+ILAJ2J\n", "LwB0Jr4A0Jn4AkBn/w8QKCFRo+ekIgAAAABJRU5ErkJggg==\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure()\n", "plt.plot(X_transformed[0], '.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And for sanity, let's compare our PC1 vs what sklearn would spit out." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAeAAAAFVCAYAAAA30zxTAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAD/VJREFUeJzt3X2IZXd9x/HPJptEDUlcfEi0kS2W5rd90Ag+1MdJpJiq\n", "KMTS/pVqE6JopW7SlrFG21BEa2GwhalorUYTq9USqVYraIqKk4RqrG3RSvcba8v+UYxEGQ0NeSDJ\n", "9o+50SC7a/buuflm7rxesHDvnXP3fJfdve/5nXvumV2HDh0KAPDQOqF7AADYiQQYABoIMAA0EGAA\n", "aCDAANBAgAGgwe55njTGOCHJ+5Ock+S+JK+pqppyMABYZvOugC9IcmpVPT/JW5O8fbqRAGD5zRvg\n", "O5KcMcbYleSMJHdPNxIALL+5DkEnuTHJI5IcSPKYJC8/0oZjjFOSPDPJd5LcO+f+AGC7ODHJE5J8\n", "taruOtJG8wb4jUlurKq3jDHOTvKFMcYvV9XhVsLPTHL9nPsBgO3qBUluONIX5w3wqUlum93eTHJS\n", "top/ON9Jko985CM566yz5twdAGwPt9xySy666KJk1r8jmTfAa0k+OMa4PlvxvaKq7jjCtvcmyVln\n", "nZWzzz57zt0BwLZz1Ldd5wpwVf0gySvmGgcAcCEOAOggwADQQIABoIEAA0ADAQaABgIMAA0EGAAa\n", "CDAANBBgAGggwADQQIABoIEAA0ADAQaABgIMAA0EGAAm9I6rb3pQ2wkwAExkdX0j3/7fHz6obQUY\n", "ABoIMABMZG3/Sn7uZ854UNsKMABM6IqLn/WgthNgAGggwADQQIABoIEAA0ADAQaABgIMAA0EGAAa\n", "CDAANBBgAGggwADQQIABoIEAA0CD3fM+cYxxRZKXJzkpybuq6prJpgKAJTfXCniMcX6S51TVc5Oc\n", "n+TJE84EAEtv3hXwBUm+Mcb4ZJLTk6xONxIALL95A/y4JE9K8rJsrX4/lWTfVEMBwLKb9ySs7yW5\n", "rqruqaqbk9w5xnjshHMBwFKbN8A3JHlxkowxnpjk1CTfn2ooAFh2cwW4qj6T5N/GGDdl6/Dz66vq\n", "0KSTAcASm/tjSFX1h1MOAgA7iQtxAEADAQaABgIMAA0EGAAaCDAANBBgAGggwADQQIABoIEAA0AD\n", "AQbgIbW6vpHV9Y3uMdoJMAAPmdX1jRw4uJkDBzd3fIQFGAAaCDAAD5m1/SvZt3dP9u3dk7X9K93j\n", "tJr7pyEBwDx2enjvZwUMAA0EGAAaCDAANBBgAGggwADQQIABoIEAA0ADAQaABgIMAA0EGAAaCDAA\n", "NBBgAGggwADQQIABoIEAA0ADAQaABruP58ljjMcn+VqSX62qm6cZCQCW39wr4DHGSUnem+T26cYB\n", "gJ3heA5BryV5T5LvTDQLAOwYcwV4jHFxklur6rrZQ7smmwgAdoB5V8CXJHnRGOOLSZ6W5JoxxpnT\n", "jQUAy22uk7Cq6rz7b88i/Nqq+u5kUwHAkvMxJABocFwfQ0qSqnrhFIMAwE5iBQwADQQYABoIMAA0\n", "EGAAaCDAAA8zq+sbWV3f6B6DBRNggIeR1fWNHDi4mQMHN0V4yQkwADQQYICHkbX9K9m3d0/27d2T\n", 
"tf0r3eOwQMd9IQ4ApiW8O4MVMAA0EGAAaCDAANBAgAGggQADQAMBBoAGAgwADQQYABoIMAA0EGAA\n", "aCDAANBAgAGggQADQAMBBoAGAgwADQQYABoIMAA0EGAAaCDAANBAgAGggQADQIPd8zxpjHFSkg8k\n", "2ZvklCRvq6pPTzkYACyzeVfAFyW5tapWkrw4ybumGwkAlt9cK+Ak1yb5+Oz2CUnumWYcANgZ5gpw\n", "Vd2eJGOM07IV47dMORQALLu5T8IaYzwpyReSfKiqPjbdSACw/OY9CevMJNcleX1VfXHakQBg+c37\n", "HvCbk5yR5MoxxpWzx15SVXdOMxYALLd53wO+LMllE88C8KCtrm8kSdb2rzRPAvNxIQ5g21ld38iB\n", "g5s5cHDzRyGG7UaAAaCBAAPbztr+lezbuyf79u5xCJpta96TsABaCS/bnRUwADQQYABoIMAA0ECA\n", "AaCBAANAAwEGgAYCDAANBBgAGggwADQQYABoIMAA0ECAAaCBAANAAwEGgAYCDAANBBgAGggwADQQ\n", "YABoIMAA0ECAAaCBAANAAwEGgAYCDAANBBgAGggwADQQYABoIMAA0GD3PE8aY5yQ5N1JnprkriSv\n", "rqpvTzkYACyzeVfAFyY5uaqem+RNSd453UjAFFbXN7K6vtE9BnAE8wb4eUk+myRV9ZUkz5hsIuC4\n", "ra5v5MDBzRw4uCnC8DA1b4BPT3LbA+7fOzssDQA8CPNG87Ykpz3w96mq+yaYB5jA2v6V7Nu7J/v2\n", "7sna/pXucYDDmOskrCQ3Jnl5kmvHGM9O8vXpRgKmILzw8DZvgD+R5EVjjBtn9y+ZaB4A2BHmCnBV\n", "HUryOxPPAgA7hhOnAKCBAANAAwEGgAYCDAANBBgAGggwADQQYABoIMAA0ECAAaCBAANAAwEGgAYC\n", "DAANBBgAGggwADQQYABoIMAA0ECAAaCBAANAAwEGgAYCDAANBBgAGggwADQQYABoIMAA0ECAAaCB\n", "AANAAwEGgAYCzI61ur6R1fWN7jGAHUqA2ZFW1zdy4OBmDhzcFGGghQADQIPdx/qEMcYZST6c5LQk\n", "Jyf5/ar68tSDwSKt7V/50cp3bf9K8zTATnTMAU7ye0n+qarWxxjnJPlokqdPOxYsnvACneYJ8F8k\n", "uWt2+6Qkd0w3DgDsDEcN8Bjj0iSX/8TDF1fV18YYZyX5mySXLWo4AFhWRw1wVV2V5KqffHyM8ZRs\n", "HXr+g6q6fkGzAcDSmuckrF9Mcm2S36yqb0w/EgAsv3neA/7TbJ39vD7GSJIfVNUrJp0KAJbcMQe4\n", "qi5cxCAAsJO4EAcANBBgAGggwADQQIABoIEAA0ADAQaABgIMAA0EGAAaCDAANBBgAGggwADQQIAB\n", "oIEAA0ADAQaABgIMAA0EGAAaCDAANBBgAGggwADQQIABoIEAA0ADAQaABgLMEa2ub2R1faN7DICl\n", "JMAc1ur6Rg4c3MyBg5siDLAAAgwADQSYw1rbv5J9e/dk3949Wdu/0j0OwNLZ3T0AD1/CC7A4VsAA\n", "0ECAAaCBAANAg7nfAx5j7Evy5SSPr6q7pxsJAJbfXCvgMcbpSd6Z5M5pxwGAneGYAzzG2JXkvUmu\n", "SHLH5BMBwA5w1EPQY4xLk1z+Ew8fTPKxqvr6GCNJdi1oNgBYWkcNcFVdleSqBz42xvhWkktncT4r\n", "yeeSnL+oAQFgGR3zSVhV9fP33x5j/E+SCyadCAB2gOP9GNKhSaYAgB3muC5FWVVPnmoQANhJXIgD\n", "ABoIMAA0EGAAaCDAANBAgAGggQADQAMBBoAGAgwADQQYABoIMAA0EGAAaCDAANBAgAGggQADQAMB\n", 
"BoAGAgwADQQYABoI8HFYXd/I6vpG9xgAbEMCPKfV9Y0cOLiZAwc3RRiAYybAANBAgOe0tn8l+/bu\n", "yb69e7K2f6V7HAC2md3dA2xnwgvAvKyAAaCBAANAAwEGgAYCDAANBBgAGggwADQQYABoIMAA0ECA\n", "AaDBMV8Ja4xxYpI/T/L0JCcnubKqPjv1YACwzOZZAb8yye6qen6SC5P8wrQjAcDym+da0Bck+Y8x\n", "xj8m2ZXkDdOOBADL76gBHmNcmuTyn3j41iR3VNXLxhgrST6Y5LwFzQcAS+moAa6qq5Jc9cDHxhgf\n", "TfKZ2dc3xhjnLG48AFhO87wHfEOSlybJGOPcJAcnnQgAdoB5Avy+JLvGGP+c5K+SvG7akQBg+R3z\n", "SVhVdXeSSxcwCwDsGC7EAQANBBgAGggwADQQYABoIMAA0ECAAaCBAANAAwEGgAYCDAANBBgAGggw\n", "ADQQYABo8JAF+B1X3/RQ7QoAHvasgAGgwUMW4CsuftZDtSsAeNizAgaABgIMAA0EGAAaCDAANBBg\n", "AGggwADQQIABoIEAA0ADAQaABgIMAA0EGAAaCDAANBBgAGggwADQQIABoMHuY33CGONRST6a5NFJ\n", "7k7yW1X13akHA4BlNs8K+FVJ/rOqzkvyd0lWpx0JAJbfPAG+I8ljZrfPyNYqGAA4Bkc9BD3GuDTJ\n", "5Q946FCS303ypjHGN5PsSbLyU/ZxYpLccsstxzEmAGwPD+jdiUfbbtehQ4eO6TceY/x1kq9W1fvG\n", "GE9J8uGqOvco2z8/yfXHtBMA2P5eUFU3HOmLx3wSVpJTk9w2u31rktN/yvZfTfKCJN9Jcu8c+wOA\n", "7eTEJE/IVv+OaJ4V8N4k70vyiGwF/I+r6vNzDgkAO9IxBxgAOH4uxAEADQQYABoIMAA0EGAAaDDP\n", "x5AelDHGCUneneSpSe5K8uqq+vai9tdljPErSf6sql7YPcuUxhgnJflAkr1JTknytqr6dO9U0xlj\n", "nJits/nPydYFZl5XVd/snWpaY4zHJ/lakl+tqpu755nSGONfk/xwdve/q+rSznmmNsa4IsnLk5yU\n", "5F1VdU3zSJMZY/x2kotndx+Z5NwkZ1bVbUd80jYya9/7s/Xacl+S11RVHW7bRa6AL0xyclU9N8mb\n", "krxzgftqMcZ4Y7ZexE/pnmUBLkpya1WtJHlxknc1zzO1lyW5r6qen+SPkry9eZ5Jzb6Bem+S27tn\n", "mdoY4xFJUlUvnP1atvien+Q5s9fO85M8uXWgiVXVNff/3SX5lyRvWJb4zlyQ5NTZa8tbc5TXlkUG\n", "+HlJPpskVfWVJM9Y4L66/FeSX0+yq3uQBbg2yZWz2yckuadxlslV1T8kee3s7s8m2eybZiHWkrwn\n", "WxfAWTbnJnnUGONzY4zPz45CLZMLknxjjPHJJJ9O8qnmeRZijPGMJL9UVe/vnmVidyQ5Y4yxKz/l\n", "5yUsMsCn58dXzEqSe2dL86VRVX+fJQvT/arq9qr6vzHGadmK8Vu6Z5paVd07xrg6yXqSv20eZzJj\n", "jIuzdfTiutlDy/YN4u1J1qrq15K8LslHluy15XFJnp7kNzL78/WOszBvTvIn3UMswI3ZulDVgWwd\n", "hfrLI224yH+0tyU57YH7qqr7Frg/JjbGeFKSLyT5UFV9rHueRaiqi7P1Xs37xhiPbB5nKpckedEY\n", "44tJnpbkmjHGmc0zTenmzKJUVd9K8v1sXfZvWXwvyXVVdc/svfs7xxiP7R5qSmOMRyc5p6q+1D3L\n", "ArwxyY1VNfLj/38nH27DRQb4xiQvTZIxxrOTfH2B+2Jisxfs65K8saqubh5ncmOMV85OdEm2Dhnd\n", 
"N/u17VXVeVV1/uw9tn9P8qqq+m73XBO6JLNzSsYYT8zW0bZlOtR+Q7bOu7j/z3dqtr7JWCYrSZb1\n", "EsYP/HkJm9k6ke6wPxVpYWdBJ/lEtr4Lv3F2/5IF7qvbMl7P883Zev/iyjHG/e8Fv6Sq7mycaUof\n", "T3L1GONL2foPcllV3dU8Ew/OVUk+OMbYmN2/ZJmOrlXVZ8YYK2OMm7K1SHp9VS3ba8w5SZbuUzEz\n", "a9n693l9tl5brqiqOw63oWtBA0CDZTpxAQC2DQEGgAYCDAANBBgAGggwADQQYABoIMAA0OD/AXTf\n", "BizRtjQzAAAAAElFTkSuQmCC\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn import decomposition\n", "plt.plot(decomposition.PCA().fit_transform(random_data).T[0], '.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### How do we explain it?\n", "\n", "1. This isn't \"made up data.\" We're using your original features to help explain the relationships.\n", "2. Our shape is different because this is a _linear projection_ of our original data. It represents the same information, but is displayed differently.\n", "3. Instead of using rules, we can use something else to explain relationships: correlations with the dependent variable.\n", "4. Did we have features that were highly correlated? This may help us understand where our explained variances comes from.\n", "5. The order of principal components is also related to the order of explained covariance; Therefore, PC1 would likely mostly be related to the features mostly correlated." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Practice it!\n", "\n", "What happens when we start introducing noise into our data? 
Run PCA on our new dataset below and evaluate what changes:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " x y z\n", "0 1 -1.878842 6.794887\n", "1 2 -8.625472 44.035247\n", "2 3 26.777612 12.021532\n", "3 4 6.347470 -4.309537\n", "4 5 11.734814 5.665158\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAesAAAFVCAYAAADPM8ekAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAFnNJREFUeJzt3X+sZGV9x/H3LrIQye7GGHW1GKwxfrVJCUEpirK7BEQx\n", "JShp0zbbGoxirFSoJaBsLKkGfyRbrN1YqVnUldb6BwRJDUFpkLq4TcUQWiXI12qExARSbXZdIMiv\n", "vf3jzMjlMvfuzJkzM8855/1KjHNn75x55pnLfOb7PM95zrqlpSUkSVK51i+6AZIkaW2GtSRJhTOs\n", "JUkqnGEtSVLhDGtJkgpnWEuSVLjnjfNLEXEq8OnMPCMiTgJ2A08DjwPvysz/jYgLgfcBTwFXZebN\n", "s2q0JEl9csTKOiIuB/YAxwzu+izwF5l5BnAj8OGIeAnwQeA04K3ApyJiw2yaLElSv4wzDP4T4Hxg\n", "3eDnP87MHwxuHw08BvwesD8zn8zMQ4PHnNh0YyVJ6qMjDoNn5o0R8YplPz8EEBGnARcBpwNvA361\n", "7GEPA5tXO2ZEHAOcAjxINZwuSVLXHQW8FPh+Zj4+yQPHmrNeKSL+CNgJvD0z/y8iDgEbl/3KRuDA\n", "Goc4BbijznNLktRypwPfneQBE4d1RPwp1UKy7Zk5DOQ7gU8MKuZjgdcC96xxmAcBvvrVr7Jly5ZJ\n", "myBJUus89NBD7NixAwYZOIlJwnopItYDfw88ANwYEQD/npkfi4jdVNXyemBnZj6xxrGeBtiyZQvH\n", "H3/8pG2WJKnNJp7+HSusM/N+qpXeAC9c5XeuBa6dtAGSJGltbooiSVLhDGtJkgpnWEuSVDjDWpKk\n", "whnWkiQVzrCWJKlwhrUkSYUzrCVJKpxhLUlS4QxrSZIKZ1hLklQ4w1qSpMIZ1pIkFc6wliSpcIa1\n", "JEmFM6wlSSqcYS1JUuEMa0mSCmdYS5JUOMNakqTCGdaSJBXOsJYkqXCGtSRJhTOsJUkqnGEtSVLh\n", "DGtJkgpnWEuSVDjDWpKkwhnWkiQVzrCWJKlwhrUkSYUzrCVJKpxhLUlS4QxrSZIKZ1hLklQ4w1qS\n", "pMI9b5xfiohTgU9n5hkR8SpgL3AYuAe4KDOXIuJC4H3AU8BVmXnzjNosSVKvHLGyjojLgT3AMYO7\n", "PgPszMytwDrgvIjYAnwQOA14K/CpiNgwmyZLklSWy3bv47Ld+2Z2/HGGwX8CnE8VzAAnZ+awRbcA\n", "ZwGnAPsz88nMPDR4zIlNN1aSpNJctnsf9z1wgPseODCzwD5iWGfmjVRD20Prlt1+GNgMbAJ+NeJ+\n", "SZI0pToLzA4vu70JOAgcAjYuu38jcGCKdkmS1Aq7Lt7Ka054Aa854QXsunjrTJ5jrAVmK9wdEdsy\n", 
"8zvAOcBtwJ3AJyLiGOBY4LVUi88kSeq8WYX00CRhvTT4/0uBPYMFZPcCNwxWg+8G7qCq1ndm5hPN\n", "NlWSpH4aK6wz836qld5k5v8A20f8zrXAtQ22TZIk4aYokiQVz7CWJKlhTZ93bVhLkjpp1huVrPW8\n", "TZ93bVhLkjpnHhuVzJNhLUkq2qIq5Lpmcd61YS1JasQsQrVuhTyPjUqO9PxNPm+dTVEkSXqWYagO\n", "by8iIFcqoQ1NsbKWJBVr0RVyKaysJUlT23Xx1t8MUzcdqosO6Vm9rkkY1pKkRiw6VGehlOF9h8El\n", "SXPTtpXdpTCsJUlz0cZzn0uZM3cYXJKkNZQwvG9lLUmai1Kq1DayspYkzY0hXY+VtSRJhTOsJUkq\n", "nGEtSVLhDGtJkgpnWEtSoY60gYgbjPSHYS1JBTrSBiLTbDBSWsiX1p4SGdaS1COl7SJWtz19C3jD\n", "WpIKNGoDkeUB1ecNRkr7wjEPbooiSYVaHsKjrv5UJ6RneSnLOka1p6T2lcKwlqSeKS0Ej/SlZNTv\n", "9y3QDWtJaoE+BtRa+tYHhrUktUQfAsovJaMZ1pKkohjSz+VqcEmSCmdYS5JUOMNakqTCGdaS1GF9\n", "2+mrqwxrSeqoEnb68stCMwxrSVqALoTYOFcFW/SXha4wrCVpCnVCd14hNsv9ww3i+TKsJammNgRW\n", "3T3Em3ru0i820pYRjlqbokTEeuBa4NXAYeBC4Glg7+Dne4CLMnOpmWZKUnd0YZeucV9Dya9vnH3I\n", "S1F3B7OzgeMy880RcRbwycGxdmbmvoi4BjgPuKmhdkpScaYJ3ZKDYVxdeA1tUTesHwM2R8Q6YDPw\n", "BHBqZg7HEm6hCnTDWlKnNRVYba+026hNIxx1w3o/cCxwH/BC4Fxg+St9hCrEJUlH0Kbh2K5pS1/X\n", "XWB2ObA/MwM4CbgOOHrZv28EDk7ZNklSIdqyEKur6ob1ccChwe0DVBX63RGxbXDfOYDvqqSFaFuw\n", "lL5qug2r3ruu7jD4LuDLEXEHVUV9BXAXsCciNgD3Ajc000RJGl8JQ8p15kFLDGmVo1ZYZ+ZB4J0j\n", "/mn7VK2RpJYr4ctC09q0EGstbX4NbooiqVNKH1Juq0VurtKEtg/l1x0Gl6RiLTJUulKFqiyGtSQ1\n", "zJAuT9u/RBnWkqReaGNIDzlnLUkTaNtpYX3R9ffFsJakMbV9kVJX9eF9MawlSSqcYS1JY/K0sDL1\n", "4X1xgZkkTaCrYdB283xfFrGq3MpakqQxLWp+3LCWJKlwhrWkuen66TXqvkXNjztnLWkuuniBC/XT\n", "Iv52rawlqUMcvegmw1rSXNQdPjR8xteHzUFmrdS/N8Na0txMepnFJsOn1A9hlaPkLzuGtaTOK/lD\n", "uEl92Bykr1xgJqlYbb+s4SLYT/WV/PdmWEsqWhMfmiV/CKsspf59GNaSemGcD2EDXaVyzlqS6M+8\n", "ttrJsJbUS64OV5sY1pJ6Z1QV7Upqlcw5a0kaMKRVKitrSb1jFa22sbKW1EuGtNrEylqSpMIZ1pK0\n", "iqZWjLvyXNMyrKWW8AN/vpo679rzt9UEw1pqAT/wpX4zrCVphKZWjLvyXE1wNbjUAl6IYjGa6us2\n", "vmf+vZXFsJZawg9Nzctw2mV427+9xXMYXJKkwhnWklrPlfLNcp69PA6DS2o1h2xnw34sS+2wjogr\n", "gHOBo4HPAfuBvcBh4B7gosxcaqCNkiT1Wq1h8IjYDrwxM08DtgOvBK4GdmbmVmAdcF5DbZSkVTlk\n", "qz6oW1mfDfwwIm4CNgGXAe/JzOGk0S2D37lp+iZK0toMaXVd3bB+EfBy4PepqupvUFXTQ48Am6dr\n", 
"miRJgvph/UvgR5n5FPDjiPg18FvL/n0jcHDaxknqNjfekMZT99St7wJvA4iIlwHPB26LiG2Dfz8H\n", "8DwKSatyv3NpfLUq68y8OSK2RsSdVIH/AeB+YE9EbADuBW5orJWSJPVY7VO3MvPDI+7eXr8pkvrE\n", "/c7L4ftQPjdFkbQwhsPiualMO7jdqCRJhTOsJTVi5f7c7tfdDm4q0w4Og0ua2sqhVKD20Krzp/Nn\n", "X5fPylpSMTydSxrNylrS1Eat7LZClppjWEua2KggXhnKdUJ63NO5/CKgvnEYXNJEZj1UvevirUcM\n", "aofK1TeGtSRJhTOsJU1k0af6LPr5pUVwzlqt4lzldOr238rHLbr/F/380rxZWas1nKucTt3+s9+l\n", "xTOsJUkqnGGt1nCucjp1+89+lxbPOWu1ShfCYpHz7uM85zjnUEuaLytraY5Kn/8tvX1SXxnWWhiv\n", "yiRJ4zGstRB9reBKn/8tvX1SXzlnLc1ZaSFY2jnUkp7LyloLYQVXhr6OcEhtY2WthTGkm+XublJ3\n", "WVlLHVC3QnaEQ2oHK2up5wxpqXxW1lIHWCFL3WZlLXWEIS11l5W1JEmFM6wlSSqcYS1JUuEMa0mS\n", "CmdYS6vwQiOSSmFYSyOM2mTE8Ja0KIa1NIZ576HtFwNJyxnW0giL3GTEi2tIWslNUaRVLA/pXRdv\n", "9UIZkhbGsJbGNK+Q9ouBpJUMa6lAhrSk5aYK64h4MXAXcCZwGNg7+P97gIsyc2naBkqS1He1F5hF\n", "xNHAF4BHgXXAZ4Cdmbl18PN5jbRQUm2uKpe6YZrV4LuAa4AHBz+fnJnDT4VbgLOmaZik6biqXOqO\n", "WmEdERcAv8jMWwd3rRv8b+gRYPN0TZMkSVC/sn438JaIuB04CfgK8KJl/74RODhl2yRNYZHniktq\n", "Vq0FZpm5bXh7ENjvB3ZFxLbM/A5wDnBbM02UVJchLXVDU6duLQGXAnsiYgNwL3BDQ8eWJKnXpg7r\n", "zDxj2Y/bpz2epPG4cYrUH+4NLtG+U5xc6S31i2Gt3jP4JJXOsJZayJXeUr+4N7h6r60XzmhTWyVN\n", "x7CWMPgklc1hcEmSCmdYS5JUOMNakqTCGdaSJBXOsFbrtW1DE0malGGtVnNDE0l9YFhLklQ4w1qt\n", "tuidvByClzQPboqi1lvUhibDIfjhbTdWkTQrVtaaCytQSarPsG6BtgddVxeBLXoIXlJ/OAxeOIda\n", "yzbO+9HGi4RIKouVtWauzxVoV0cVJM2XlXXh2nr5xpXa3HZJWjTDugUMusmU9OWmK1+2JC2WYa1O\n", "GXeOf54BakhLmpZz1uod55EltU2rwrrtpzB1VUnvS58Xs0nqrtaEtdVQmUp8X3ZdvHXNoC4x0Ev6\n", "wiOpPM5Zq5dKCWnwXHpJR9aayrrEakjNvi9Wl5I02rqlpaW5P2lEvAL42W233cbxxx8/9+dXeZZX\n", "l338QubpXVL3/fznP+fMM88E+O3MvH+SxzoMLhXAkJa0ltYMg6vbnOaQpNVZWasYhrQkjWZlLUlS\n", "4QxrrWnUCm1XbUvSfBnWC1Zy8I3a8KTETVAkqesM6wUy+CRJ4zCstapRK7RdtS1J81drNXhEHA18\n", "CTgBOAa4CvgRsBc4DNwDXJSZ899xpUXacK3jUe0qta2S1FV1T93aAfwiM/8sIl4A/DdwN7AzM/dF\n", "xDXAecBNDbWzs5oKvtJDX5JUX91h8OuBK5cd40ng5MwcTrzeApw1Zds0Jue+JanbalXWmfkoQERs\n", "pArujwJ/u+xXHgE2T906rcpKWpL6o/YCs4h4OfBt4LrM/BrVXPXQRuDglG3rpCZO1VpZSbvoS5K6\n", 
"rVZYR8RLgFuByzNz7+DuuyNi2+D2OYDjsSuMO1xdJ9B3XbzVoJakjqpbWe+kGua+MiJuj4jbqYbC\n", "PxYR/0E1vH5DQ23slXEC3Upakvql7pz1JcAlI/5p+1St6bgmT9UypCWpP7zqFvNdrHWk52jDudeS\n", "pPnqfVgPh52Ht0sIyBLaIEkqh9uNSpJUuN6HtYu1JEml6/0wODjsXDLn7yXJylozMovNX9qg5OuT\n", "S2ovw1qNa2PINqGvr1vS7BnWKpbrCSSp4px1Ry1yrrevm794jrykWTGsxzTOh3ApH9QlnDu+6D5Y\n", "lL6+bkmz5TD4GMaZi3S+UpI0K4Z1BznX+2yu0JbUdp0L61l8MI8TfqUFpJfMrDjiIakLOjVnPcu5\n", "2nGOZThKkmahc5W1tFxpIx6SVEenKmtPndEo/i1IartOhTX4wSxJ6h6HwWfIVciSpCYY1jPiKmRJ\n", "UlMMa0mSCmdYz4irkCVJTencArOStDGkXU0vSeWxstZvOM8uSWUyrCVJKlyrw9pTo5rlPLsklam1\n", "c9YlXLO5i+xHSSpPqyvrOvpcjff5tUtSm7U2rOsM2fZ5AVWfX7sktV1rh8HBIVtJUj+0trKuo88L\n", "qPr82iWp7VpdWY9j5SYffQ6qla/dDVAkqR06XVk7T7s6+0aS2qPTYS1JUhd0Oqydp12dfSNJ7dH5\n", "OWuDaHX2jSS1Q6cra0mSuqDRyjoi1gOfB04EHgfem5k/bfI5JEnqm6Yr63cAGzLzNOAjwNUNH78Y\n", "bt0pSZqXpsP6TcA3ATLze8DrGz5+ETztSZI0T02H9Sbg0LKfnx4MjbeKVbMkqSRNB+khYOPy42fm\n", "4YafY6bGqZo97UmSNE9Nn7q1HzgXuD4i3gD8oOHjF8OQliTNS9OV9deBX0fEfqrFZR8a94GlDD1b\n", "NUuSStNoZZ2ZS8CfT/q44dDz8PaiQ3LRzy9J0nKtW/wlSVLfFBHWDj1LkrS6YvYGN6QlSRqtiMpa\n", "kiStzrCWJKlwhrUkSYUzrCVJKpxhLUlS4QxrSZIKZ1hLklQ4w1qSpMIZ1pIkFc6wliSpcMWGdSmX\n", "zJQkadGKDOvhJTPve+CAgS1J6r0iw1qSJD2jyLD2kpmSJD2jmEtkrmRIS5JUKbKyliRJzzCsJUkq\n", "nGEtSVLhDGtJkgpnWEuSVDjDWpKkwhnWkiQVzrCWJKlwhrUkSYUzrCVJKpxhLUlS4QxrSZIKZ1hL\n", "klQ4w1qSpMIZ1pIkFc6wliSpcIa1JEmFM6wlSSrc8yZ9QERsBv4Z2AhsAP4qM/8zIt4AfBZ4Crg1\n", "Mz/eaEslSeqpOpX1h4B/y8ztwAXAPwzu/0fgTzLzzcCpEXFSIy2UJKnnJq6sgb8DHh/cPhp4LCI2\n", "Ahsy82eD+78FnAX81/RNlCSp39YM64h4D/CXK+6+IDPviogtwD8BlwCbgUPLfudh4JVrHPoogIce\n", "emjiBkuS1EbLMu+oSR+7Zlhn5heBL668PyJ+F/gacGlm3hERm6jmsIc2AQfXOPRLAXbs2DFpeyVJ\n", "aruXAj+d5AF1Fpj9DnA98IeZ+UOAzDwUEU9ExCuBnwFnA3+zxmG+D5wOPAg8PWkbJElqoaOogvr7\n", "kz5w3dLS0kQPiIibgBOBBwZ3HczMd0bEqVSrwY8CvpWZfz1pYyRJ0nNNHNaSJGm+3BRFkqTCGdaS\n", "JBXOsJYkqXCGtSRJhauzg9lUImI98HmqFeWPA+/NzInON9NzRcTRwJeAE4BjgKuAHwF7gcPAPcBF\n", "memKwilFxIuBu4Azqfp2L/ZxoyLiCuBcql0SPwfsx35uzOBz+Frg1VR9eiHVabR7sY+nNjg76tOZ\n", 
"eUZEvIoR/RoRFwLvo7qexlWZefNax1xEZf0Oqq1JTwM+Aly9gDZ00Q7gF5m5FXgb1Z7tVwM7B/et\n", "A85bYPs6YfCl6AvAo1R9+hns40ZFxHbgjYPPiO1UuyH6t9yss4HjBtdy+DjwSezjRkTE5cAeqqIJ\n", "RnxGDHYA/SBwGvBW4FMRsWGt4y4irN8EfBMgM78HvH4Bbeii64ErB7fXA08CJ2fmvsF9t1Dt167p\n", "7AKuodrQB+zjWTgb+OFgT4dvAP8KvM5+btRjwOaIWEe1XfQT2MdN+QlwPlUww+jPiFOA/Zn5ZGYe\n", "GjzmxLUOuoiw3sSz9xF/ejAkoylk5qOZ+cjgoirXAx/l2e/vI1T/UaqmiLiAavTi1sFd63jmP0iw\n", "j5vyIuB1wB8A7wf+Bfu5afuBY4H7qEaKdmMfNyIzb6Qa2h5a3q8PU/XrJuBXI+5f1SJC8hDP3kd8\n", "fWYeXkA7OiciXg58G7guM79GNUcytJG192vXkb0beEtE3A6cBHyFKliG7ONm/BK4NTOfyswfA7/m\n", "2R9k9vP0Lqeq7ILqb/k6qvUBQ/Zxc5Z/Dg+vm7EyBzcCB9Y6yCLCej/wdoCIeAPwgwW0oXMi4iXA\n", "rcDlmbl3cPfdEbFtcPscYN+ox2o8mbktM7dn5hlUl399F/BN+7hx36Vad0FEvAx4PnCb/dyo43hm\n", "hPMA1WJjPy9mY1S/3gmcHhHHRMRm4LVUi89WNffV4MDXqaqT/YOf372ANnTRTqrq48qIGM5dXwLs\n", "HixcuBe4YVGN66gl4FJgj33cnMy8OSK2RsSdVAXFB4D7sZ+btAv4ckTcQVVRX0F1hoN93JzhSvrn\n", "fEYMVoPvBu6g+hvfmZlPrHUw9waXJKlwLuySJKlwhrUkSYUzrCVJKpxhLUlS4QxrSZIKZ1hLklQ4\n", "w1qSpML9P9hsuG/IsDVwAAAAAElFTkSuQmCC\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "random_data_scattered = pd.DataFrame({\n", " 'x': range(1, 100),\n", " 'y': range(1, 100),\n", " 'z': range(1, 100),\n", " })\n", "\n", "random_data_scattered['y'] = random_data_scattered.y.apply(lambda y: y + np.random.normal(scale=10))\n", "random_data_scattered['z'] = random_data_scattered.z.apply(lambda z: z + np.random.normal(scale=20))\n", "\n", "print random_data_scattered.head()\n", "\n", "plt.plot(random_data_scattered.x, random_data_scattered.y, '.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Apply PCA to the above data using the PCA() class.\n", "2. Evaluate the eigenvalues: how many principal components do we need?\n", "3. Which feature[s] do the principal component[s] we keep explain?\n", "4. 
Try:\n", " * changing the scatter on y and z\n", " * adding more features with more scatter\n", " * adding scatter to x\n", " * with the iris data set:\n", " 1. how many principal components do we need in this real data application?\n", " 2. how do you \"name\" the new principal components? what do they explain?\n", " * with your own data set (where would this make sense?) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### What if we need a nonlinear solution?\n", "\n", "Like SVMs, PCA can be used with kernels, because sometimes the structure we are looking for is not linear. The strategy and technique are the same (in sklearn, see `KernelPCA`). There's a sample script saved as `scripts/kernel_pca.py`; feel free to run the code and experiment." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Review / Reading / Next Steps\n", "\n", "* If you're unsure what is so \"magical\" about solving for eigenvectors and eigenvalues, this [Khan Academy](https://www.khanacademy.org/math/linear-algebra/alternate_bases/eigen_everything/v/linear-algebra-introduction-to-eigenvalues-and-eigenvectors) set of videos is very informative!\n", "* If you are looking for more detail, then [this paper](http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf) with included MATLAB code covers PCA from a very approachable mathematical level.\n", "* Or if you prefer a slide deck, [this one](http://www.cabnr.unr.edu/Saito/classes/ers701/pca2.pdf) is also very informative.\n", "* Another common technique is using singular value decomposition as the basis of principal component analysis.\n", " * The [Wikipedia article](http://en.wikipedia.org/wiki/Singular_value_decomposition), while mathy, is a good read.
The math is also very much required.\n", " * A further explanation on why SVD is the [sexiest matrix decomposition](http://www.quora.com/Why-is-SVD-considered-a-highlight-of-linear-algebra) (need we say more?)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }