{ "metadata": { "name": "", "signature": "sha256:5bfa2c41117defa6e32c7aec94a0870ac58c3762c536c86fb9dc2d7b2292d226" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Feature Scaling and Model Evaluation" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Objectives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Learn and implement different approaches to feature scaling, and when to use each\n", "* Build some grasp of when to use which machine learning algorithm\n", "* Apply model evaluation skills to a difficult data set" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Data Objectives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll be working with a dataset whose target can be difficult to predict accurately! We'll learn why it's important to go beyond accuracy alone on classification problems, hypothesize some approaches to working with the features we have, and discuss what we gain and what we lose with each." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Class Notes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Data\n", "\n", "Included in the repository today is a used-car sales data set. 
Many features relate to cost, though they are all defined in the data dictionary here:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "lines = open('../data/lemons_description.txt')\n", "for line in lines:\n", " print line.strip()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Field Name Definition\n", "RefID Unique (sequential) number assigned to vehicles\n", "IsBadBuy Identifies if the kicked vehicle was an avoidable purchase\n", "PurchDate The Date the vehicle was Purchased at Auction\n", "Auction Auction provider at which the vehicle was purchased\n", "VehYear The manufacturer's year of the vehicle\n", "VehicleAge The Years elapsed since the manufacturer's year\n", "Make Vehicle Manufacturer\n", "Model Vehicle Model\n", "Trim Vehicle Trim Level\n", "Submodel Vehicle Submodel\n", "Color Vehicle Color\n", "Transmission Vehicles transmission type (Automatic, Manual)\n", "WheelTypeID The type id of the vehicle wheel\n", "WheelType The vehicle wheel type description (Alloy, Covers)\n", "VehOdo The vehicles odometer reading\n", "Nationality The Manufacturer's country\n", "Size The size category of the vehicle (Compact, SUV, etc.)\n", "TopThreeAmericanName Identifies if the manufacturer is one of the top three American manufacturers\n", "MMRAcquisitionAuctionAveragePrice Acquisition price for this vehicle in average condition at time of purchase\n", "MMRAcquisitionAuctionCleanPrice Acquisition price for this vehicle in the above Average condition at time of purchase\n", "MMRAcquisitionRetailAveragePrice Acquisition price for this vehicle in the retail market in average condition at time of purchase\n", "MMRAcquisitonRetailCleanPrice Acquisition price for this vehicle in the retail market in above average condition at time of purchase\n", "MMRCurrentAuctionAveragePrice Acquisition price for this vehicle in average condition as of current day\n", "MMRCurrentAuctionCleanPrice Acquisition price 
for this vehicle in the above condition as of current day\n", "MMRCurrentRetailAveragePrice Acquisition price for this vehicle in the retail market in average condition as of current day\n", "MMRCurrentRetailCleanPrice Acquisition price for this vehicle in the retail market in above average condition as of current day\n", "PRIMEUNIT Identifies if the vehicle would have a higher demand than a standard purchase\n", "AcquisitionType Identifies how the vehicle was aquired (Auction buy, trade in, etc)\n", "AUCGUART The level guarntee provided by auction for the vehicle (Green light - Guaranteed/arbitratable, Yellow Light - caution/issue, red light - sold as is)\n", "KickDate Date the vehicle was kicked back to the auction\n", "BYRNO Unique number assigned to the buyer that purchased the vehicle\n", "VNZIP Zipcode where the car was purchased\n", "VNST State where the the car was purchased\n", "VehBCost Acquisition cost paid for the vehicle at time of purchase\n", "IsOnlineSale Identifies if the vehicle was originally purchased online\n", "WarrantyCost Warranty price (term=36month and millage=36K)\n" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "from sklearn import tree\n", "from sklearn.cross_validation import cross_val_score\n", "\n", "# Load in data and create sets. 
NA columns are dropped from the live data set in the next code cell.\n", "lemons = pd.read_csv('../data/lemons.csv')\n", "lemons_oos = pd.read_csv('../data/lemons_oos.csv')\n", "print lemons.dtypes" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "RefId int64\n", "IsBadBuy int64\n", "PurchDate object\n", "Auction object\n", "VehYear int64\n", "VehicleAge int64\n", "Make object\n", "Model object\n", "Trim object\n", "SubModel object\n", "Color object\n", "Transmission object\n", "WheelTypeID float64\n", "WheelType object\n", "VehOdo int64\n", "Nationality object\n", "Size object\n", "TopThreeAmericanName object\n", "MMRAcquisitionAuctionAveragePrice float64\n", "MMRAcquisitionAuctionCleanPrice float64\n", "MMRAcquisitionRetailAveragePrice float64\n", "MMRAcquisitonRetailCleanPrice float64\n", "MMRCurrentAuctionAveragePrice float64\n", "MMRCurrentAuctionCleanPrice float64\n", "MMRCurrentRetailAveragePrice float64\n", "MMRCurrentRetailCleanPrice float64\n", "PRIMEUNIT object\n", "AUCGUART object\n", "BYRNO int64\n", "VNZIP1 int64\n", "VNST object\n", "VehBCost float64\n", "IsOnlineSale int64\n", "WarrantyCost int64\n", "dtype: object\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a very simple \"benchmark\" script. One common test in model evaluation is whether we can do better than random guessing, which we can check with scikit-learn's DummyClassifier." ] }, { "cell_type": "code", "collapsed": false, "input": [ "lemons = lemons.dropna(axis=1)\n", "# Generating a list of numeric features from the columns of the describe() dataframe. 
\n", "# Then, removing the two non-features (RefId is an index, IsBadBuy is the prediction target)\n", "features = list(lemons.describe().columns)\n", "features.remove('RefId')\n", "features.remove('IsBadBuy')\n", "\n", "# Search tree depths 1-9 and keep the depth with the best mean cross-validated ROC AUC.\n", "best_score = -1\n", "for depth in range(1, 10):\n", "    scores = cross_val_score(tree.DecisionTreeClassifier(max_depth=depth, random_state=1234),\n", "                             lemons[features],\n", "                             lemons.IsBadBuy,\n", "                             scoring='roc_auc',\n", "                             cv=5)\n", "    if scores.mean() > best_score:\n", "        best_depth = depth\n", "        best_score = scores.mean()\n", "\n", "# Is the best score we have better than each DummyClassifier type?\n", "from sklearn import dummy, metrics\n", "for strat in ['stratified', 'most_frequent', 'uniform']:\n", "    dummyclf = dummy.DummyClassifier(strategy=strat).fit(lemons[features], lemons.IsBadBuy)\n", "    print 'did better than %s?' % strat, metrics.roc_auc_score(lemons.IsBadBuy, dummyclf.predict(lemons[features])) < best_score\n", "\n", "# seems so!\n", "\n", "# Create a classifier with the best depth found above (not the last depth tried) and predict.\n", "clf = tree.DecisionTreeClassifier(max_depth=best_depth, random_state=1234).fit(lemons[features], lemons.IsBadBuy)\n", "\n", "y_pred = clf.predict(lemons_oos[features])\n", "\n", "# Create a submission\n", "submission = pd.DataFrame({ 'RefId' : lemons_oos.RefId, 'prediction' : y_pred })\n", "submission.to_csv('submission.csv')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "did better than stratified? True\n", "did better than most_frequent? True\n", "did better than uniform? True\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a good start for us!\n", "\n", "To improve this model, we'll have to keep exploring the available data, impute missing values, create new features, and scale numerical values. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Feature Scaling\n", "\n", "Feature scaling can play a significant role in the performance of our model. Many of the techniques that follow matter most for algorithms that do not learn a weight per feature and so cannot compensate for differences in scale on their own; distance-based classifiers such as k-nearest neighbors are a typical example.\n", "\n", "### Zeroing on the mean / Centering a feature\n", "\n", "One common technique is to subtract the mean from all values in a feature. This effectively \"zeroes\" the feature, so that every centered feature shares the same center and the model can compare them more easily.\n", "\n", "$x' = x - \\operatorname{mean}(x)$\n", "\n", "\n", "### Normalizing the feature\n", "\n", "Another common technique takes the above one step further: not only do we center the data on 0, we also bound the range the data resides in (either -1 to 1, or 0 to 1, are typical).\n", "\n", "**Normalizing to 0 and 1 (where 0 remains 0)**\n", "\n", "$x' = \\dfrac{x}{\\max(x)}$\n", "\n", "\n", "**Normalizing to 0 and 1 (where min == 0)**\n", "\n", "$x' = \\dfrac{x - \\min(x)}{\\max(x) - \\min(x)}$\n", "\n", "**Normalizing to -1 and 1 (where mean == 0)**\n", "\n", "$x' = \\dfrac{x - \\operatorname{mean}(x)}{\\max(x) - \\operatorname{mean}(x)}$\n", "\n", "This maps the maximum to 1; the minimum lands on -1 only when the data is symmetric about the mean.\n", "\n", "**Standardization using mean and standard deviation (where mean = 0)**\n", "\n", "Standardization is a slightly different kind of normalization: each value is expressed as a number of standard deviations from the mean.\n", "\n", "$x' = \\dfrac{x - \\operatorname{mean}(x)}{\\operatorname{std}(x)}$\n", "\n", "### Practice: Writing Python functions to handle these transformations\n", "\n", "Assuming the input is an array, how would we write code to handle each of these transformations?" 
] }, { "cell_type": "code", "collapsed": false, "input": [ "from __future__ import division\n", "import numpy as np\n", "class Transformations(object):\n", "    \"\"\"since these transformations are all related, we'll nest them all under a feature norm class\"\"\"\n", "    def mean_at_zero(self, arr):\n", "        return np.asarray(arr) - np.mean(arr)  # x - mean(x)\n", "\n", "    def norm_to_min_zero(self, arr):\n", "        return np.asarray(arr) / np.max(arr)  # x / max(x): 0 maintains its 0 value\n", "    \n", "    def norm_to_absolute_min_zero(self, arr):\n", "        \"\"\"should be a range of 0 to 1, where the minimum value maps to 0\"\"\"\n", "    \n", "    def norm_to_neg_pos(self, arr):\n", "        \"\"\"should be a range of -1 to 1, where 0 represents the mean\"\"\"\n", "    \n", "    def norm_by_std(self, arr):\n", "        \"\"\"should be a range where 0 represents the mean\"\"\"\n", "\n", "## tests to make sure we built this correctly:\n", "transformer = Transformations()\n", "a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])\n", "print transformer.mean_at_zero(a) == np.array([-2, -1, 0, 1, 2])\n", "print transformer.norm_to_min_zero(a) == np.array([0.2, 0.4, 0.6, 0.8, 1.0])\n", "print transformer.norm_to_absolute_min_zero(a) == np.array([0.0, 0.25, 0.5, 0.75, 1.0])\n", "print transformer.norm_to_neg_pos(a) == np.array([-1.0, -0.5, 0.0, 0.5, 1.0])\n", "print transformer.norm_by_std(a) == np.array([-1.414213562373095, -0.7071067811865475, 0.0, 0.7071067811865475, 1.414213562373095])\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[ True True True True True]\n", "[ True True True True True]\n", "False\n", "False\n", "False\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "-c:28: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.\n", "-c:29: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.\n", "-c:30: FutureWarning: comparison to `None` will result in an elementwise object comparison 
in the future.\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "scikit-learn also has functions that handle some of this; `preprocessing.scale` covers centering and standardization:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn import preprocessing\n", "print a\n", "print preprocessing.scale(a, with_mean=True, with_std=False)\n", "print preprocessing.scale(a, with_mean=True, with_std=True)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[ 1. 2. 3. 4. 5.]\n", "[-2. -1. 0. 1. 2.]\n", "[-1.41421356 -0.70710678 0. 0.70710678 1.41421356]\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solving Power Laws" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In class, we previously learned about transformations that make a relationship linear, particularly taking logs when the data looks linear on a log-log plot.\n", "\n", "\n", "\n", "However, a log transform won't always land us a perfect linear fit. Instead, we'd ideally like to solve for the power law $y = \\mathrm{amp} \\cdot x^{\\mathrm{index}}$, which means identifying the curve's amplitude and index. Taking $\\log_{10}$ of both sides gives $\\log_{10} y = \\log_{10} \\mathrm{amp} + \\mathrm{index} \\cdot \\log_{10} x$, a straight line whose intercept and slope recover the amplitude and index. So we need one extra step: fitting a linear model to the log10-transformed data by minimizing the error. SciPy has a handy least-squares function to solve this for us.\n", "\n", "Another option could be to experiment with the [plfit](https://github.com/keflavich/plfit) library (not included in Anaconda)." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "import scipy as sp\n", "import scipy.optimize  # 'import scipy' alone is not guaranteed to load sp.optimize\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "class PowerLaw(object):\n", "    def fit(self, x, y, transform=True):\n", "        \"\"\"\n", "        fits y = amp * x**index and returns the amplitude and index.\n", "        with transform=True (the default) the data is assumed not to be log10 transformed yet.\n", "        return: [amp, index], also stored on the instance\n", "        \"\"\"\n", "        if transform:\n", "            x = np.log10(x)\n", "            y = np.log10(y)\n", "        # define our (line) fitting function and error function to optimize on\n", "        fitfunc = lambda p, x: p[0] + p[1] * x\n", "        errfunc = lambda p, x, y: (y - fitfunc(p, x))\n", "        # defines a starting point to optimize from.\n", "        p_init = [1.0, -1.0]\n", "        out = sp.optimize.leastsq(errfunc, p_init, args=(x, y), full_output=1)\n", "        result = out[0]\n", "        # slope of the log-log line is the index; intercept is log10(amp)\n", "        self.index = result[1]\n", "        self.amp = 10.0**result[0]\n", "        return np.array([self.amp, self.index])\n", "    \n", "    def transform(self, x):\n", "        \"\"\"returns the fitted power-law values amp * x**index for x\"\"\"\n", "        return self.amp * (x**self.index)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "xdata=np.array([ 0.00010851, 0.00021701, 0.00043403, 0.00086806, 0.00173611, 0.00347222])\n", "ydata=np.array([ 29.56241016, 29.82245508, 25.33930469, 19.97075977, 12.61276074, 7.12695312])\n", "\n", "powerlaw = PowerLaw()\n", "powerlaw.fit(xdata, ydata)\n", "print 'amp:',powerlaw.amp, 'index', powerlaw.index\n", "\n", "sns.set_style('white')\n", "plt.figure()\n", "plt.subplot(2, 1, 1)\n", "plt.plot(xdata, powerlaw.transform(xdata))\n", "plt.plot(xdata, ydata)\n", "plt.text(0.0020, 30, 'Ampli = %5.2f' % powerlaw.amp)\n", "plt.text(0.0020, 25, 'Index = %5.2f' % powerlaw.index)\n", "plt.xlabel('X')\n", "plt.ylabel('Y')\n", "plt.subplot(2, 1, 2)\n", "plt.loglog(xdata, powerlaw.transform(xdata))\n", "plt.plot(xdata, ydata)\n", "plt.xlabel('X (log 
scale)')\n", "plt.ylabel('Y (log scale)')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "amp: 0.895599738306 index -0.409432655356\n" ] }, { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAZIAAAEWCAYAAABMoxE0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xd8XNWd///X9KY66rJky/XYxmCwDcaYYhucCiHJksAu\nYQMkEPZLIIQs2QRIIOxuCoS6KZTAjwRIwSywAUIKxQaDsbExGBeOuy0ZWcUa9T4zvz/u1WgkjWTJ\nmtGM5M/z8dBjyi1zdBnrzTn3FEs4HEYIIYQ4VtZkF0AIIcT4JkEihBBiVCRIhBBCjIoEiRBCiFGR\nIBFCCDEqEiRCCCFGxZ6sD1ZK5QObgHOBEPC4+bgVuFZrLf2ShRBiHEhKjUQp5QAeAloAC3APcLPW\n+mzz9YXJKJcQQoiRS1bT1l3Ar4FK8/UCrfUb5vOXgfOSUiohhBAjNuZBopS6HKjRWv/dfMti/vRo\nBjLHulxCCCGOTTLukVwBhJVS5wEnA78F8qK2pwP1Q51AKeUCTsWo0QQTVE4hhJhobEAR8K7WuiNe\nJx3zINFan9PzXCn1OnANcJdS6hyt9Rrg08CrRznNqcCbiSulEEJMaGcBa+N1sqT12ooSBr4DPKKU\ncgLbgWeOckwlwFNPPUVhYWGCiyeEEBPD4cOHufTSS6H3/nRcJDVItNbLo14uG8GhQYDCwkJKSkri\nWiYhhDgOxPWWgAxIFEIIMSoSJEIIIUZlQgdJbX0bm3V1soshhBAT2oQOkmde28UPH17Hvo8bkl0U\nIYSYsCZ0kMyfmQvAKxsOJrkkQggxcU3oIDl1biFZaS5e31ROV7eMWxRCiESY0EFit1lZvqiUptYu\n1m87nOziCCHEhDTm40iUUjbgEWAWxmDEawAn8CKw09zt11rrp+PxeStPm8xzq3fzj/UHOXP+pHic\nUgghRJRkDEg8Hwhprc9USp0D/DfwAnC31vqeeH9YaUE6s6dks3lnNTWBNvKyPfH+CCGEOK6NedOW\n1vr/gG+YL8swJmhcCHxWKbVGKfUbpVRaPD9z5eIphMPw6ka56S6EEPGWlHskWuugUupx4H7gKWAD\n8O/mhI57gdvi+Xlnzi/G7bTxjw0HCYVk4UUhhIinpN1s11pfDiiM+yV/11pvNjc9D5wSz8/yuh2c\nOX8S1XWtfLinNp6nFkKI414yFra6TCn1ffNlG8Y67c8qpU413zsX2Bjvzz3vtMkA/GO9NG8JIUQ8\nJeNm+zPA40qpNYAD+BZwEPilUqoLY3rjq+P9oXOn+pmUl8bbH35Mc+uJpHmd8f4IIYQ4LiVjYas2\n4OIYm85M5OdaLBZWnjaZx1/azprNh/js0qmJ/DghhDhuTOgBif2tWFSK1WrhHxsOJLsoQggxYRxX\nQZKd4ebUOQXsqWhg7yGZyFEIIeLhuAoSMEa6A1IrEUKIODnugmThnAKy0l2s3lRBZ5dM5CiEEKN1\n3AWJ3Wbl3EWlNLd1sX6rTOQoxrdHHnmEM888k87OzlGdZ8WKFXR2dvLwww+zZcuWUZ3rF7/4BV/6\n0pe45JJLYp5r3bp1fPGLX+SSSy7hzjvvHPZxInUdd0ECvWNK/i7NW2Kc+/Of/8z555/PSy+9FJfz\nXX311Zx00knHfPy2bdt
49913WbVqFffeey933HFHn+2hUIhbbrmFBx54gD/+8Y/U1NTwyiuvHPU4\nkdqSMY5ksBmAO4DHMQYobgWu1VonZD6Tkvx05pT5+WBXDdV1reT7vYn4GCESav369ZSVlXHxxRdz\n00038YUvfIHLLruM2bNns2vXLrxeL4sWLWLt2rU0Njby2GOP8corr/Dmm28SCAQIBAJcd911nHfe\neZFzfu973+Ozn/0sZ511VuS9++67j02bNkVeWywWHn30URwOx4Aybdq0iTPPNHryFxUVEQwGqaur\nw+/3AxAIBEhPT6ekpASABQsWsGHDBkpLS1m6dGmf4wKBANnZ2fG/cCLuklUjicwADNwK/Bi4G7hZ\na302YAEuTGQBPrF4sjGR47sy0l2MT6tWreKiiy5i6tSpOJ3OSHPQ/Pnzefzxx+ns7MTj8fDYY48x\nY8YMNmzYgMViIRQK8fjjj/Ob3/yGH//4xwSDvfcKLRbLgM+54YYbeOKJJyI/v/vd72KGCEBLSwtp\nab1zrvp8PlpaWiKv/X4/7e3t7N27l2AwyJo1a2hra6O5uZn09PQ+xzU3N4/6GomxkZQaidb6/5RS\nL5ovy4AAcJ7W+g3zvZeBT2DMu5UQS+dP4uHnP+SVdw9y8UqF1TrwH5AQqaqhoSFSs3jiiSdobm7m\nySefBGDu3LkAZGRkMGPGjMjznvsoS5YsASAvL4/09HQCgcCQn3Xvvffy3nvv9XnvscceixkmaWlp\nfYKjpaWlT0BYLBbuvPNObr/9dpxOJzNnzsThcBz1OJHakhIk0GcG4M8DXwJWRm1uBjIT+fkel50z\n50/iHxsOsmV3DSfPyk/kxwkRV3/+85+56KKLuOmmmwBob29nxYoV+P3+mLWKaB9++CGXXHIJtbW1\ntLe3R5qdBvPtb3972OVasGABd911F1/72teorKwkFAqRlZXVZ58333yTRx99FLvdzrXXXsvll1+O\nz+c76nEidSUtSMCYAVgpVYAxjbw7alM6xjoloxIOhznSGqCxo5nmzhaaO1to6jAfO1tozg3gnFXB\n/Zs2kbvPxTlli/n0zOXYrLbRfrQQCfXMM89w1113RV673W4++clP8swzzxz12AMHDnD55ZfT3NzM\n7bffjtVq7RM+RwuioZxwwgksWrSIiy++mFAoxG23GStCvPPOO2zatIlrr72WgoICvvzlL2O32zn3\n3HM57bTTAGIeJ8YHSzg89utzKKUuA0q01j9RSmUA7wO7gB9rrdcopR4EXtVarxrk+DJg36uvvhq5\naRfL01tf5JltR+/NEg5bcNhsdIe6mZpVytWnXsp0/5Rj+M2ESG3PPfccgUCAK6+8MtlFEUlQUVHB\nueeeCzBVa70/XudNVo0k1gzAHwGPKKWcwHZzn1GZXziH2tY6vA4PaU4f6U4faS4v6c400pxe0lxp\nvP5OFU/+ZScXf24G1e73WL1/HTe/8jM+NWMZF594AV6HLM0rJpbR1DiEiCUpNZLRGm6NZDgCTe1c\nccffmVKYwf3fWcbWKs0jm35PZVM1fk8WVy64mNNKTo5LuYUQIpkSVSM5LgckRstOd3Pq3AL2ftzA\n7vJ65hUo7vrkrVx0wmdp7Gjm5289xJ1rH6S2tS7ZRRVCiJR03AcJwCdPLwPgzic3UhNow2lz8OV5\n53PXJ29hbt5MNh76gG+/fAcv6VcJhmR+LiGEiCZBAiycnc/F582israF7/9qLVV1rQBMyijktuXf\n5t9OvQyH1c5v33+Gm1/5GXvrZGoVIYToIUGCcfPxK5+ew6Wfmk1VXSvf/9VaKmtbItuWTzuD+z59\nG2eXLWZfoJzvv/IzHt+8irau9iSXXAghkk+CJMolKxX/+pk51ATa+P6v1lJR3RTZluFO55uLL+cH\ny75FgS+Xv+x8jRtfvoN3D32QxBKL41FFRQUXXxxrterYrrnmGg4dOpTAEvU13Fl8H3zwQ
W688cY+\n7x04cIALLrgg0UUUcSZB0s+Xzp3F1z53Akca2rn5V29x8HBjn+0nFszm55/6Af809zPUdzRy19oH\nuWvtgxxpHXqaCSGSaay6/A53Ft81a9awZs2aPuV6/vnnufHGG486ZYtIPUkd2Z6qPn/ODOw2Kw89\n9yE3//ot/uuapZQVZUS2O20OLj7xApZOXsTDG5/i3UMf8GHVR1xy4uf41IxlWK2Sz2JsXHbZZcyZ\nM4ddu3bR3NzM/fffT3FxMQ888ACrV68mPz+fyspKAJqamrjllluorzcmjbj11ltJT0/n8ssv58kn\nn2T37t384he/4Iknnoh8hzdt2sR9993X5zOvvPJKli9fHrM8sWb/7T+L74EDB3j66ae5/vrrWbWq\nd8xxVlYWTz75JCtXrhxwXpHaxjxIlFIO4DFgCuAC/guoAF4Edpq7/Vpr/fRYly3a+WdOw2az8qtn\nPuDmX73Ff35jCdNL+s79U5JZxO0rbmT1vnU88cGzPL55FW/u38DVp17K1OzSJJVcHG/mz5/PzTff\nzL333suLL77I0qVLWb9+Pc8++ywdHR2cf/75hMNhHnzwQZYsWcI///M/s3//fm6++WZ+//vfc9NN\nN/Ef//EfHDlyhIcffrjP/wgtXLiQJ554YthlaWlp6TNHVs8svj1B0tLSwh133MGdd97J7t27+xy7\nbNmy0V0IkTTJqJFcCtRorS9TSmUDHwA/Au7WWt+ThPIM6tNLyrBbLfzPqve55cG3uePqJcya3Hd9\nBKvFyoppS1lYfCK/ff9/WXtgA9/7x0/47MwVfHne+bgd7kHOLkR8zJkzBzBqALW1tezbt48TTjgB\nAJfLxYknngjAzp07Wb9+PX/5y18AaGw0mm3PPfdc7r33Xs444wwKCgr6nDtWjeSKK65gxYoVkdfX\nXHMNLS0tKKWYMmXKkLP4vv3229TW1nLDDTfQ1NREdXU1jzzyCFdddVW8LodIgmQEySp6pz+xAl3A\nQkAppS7EmHPrBq11SixGsHLxFOx2K/f94T1+8NDb/OiqJcwuGzhbaqY7g+tPv4JlZafzyKY/8OLO\nV3m7fBPnTT+Lc8oWk+fLSULpxfGg//2PGTNm8OSTTxIKheju7mb79u0ATJ8+nXnz5nH++edTVVXF\nCy+8ABhTwi9dupQtW7bwwQcfMH/+/Mi5hlMjefDBByPPt23bNuQsvitXrow0XW3YsIE//vGPEiIT\nwJgHida6BUAplY4RKrdgzPz7iNZ6s1LqZuA24KaxLttgli8sxWa1cPfv3+OHD7/NbV9fwgnTYgfD\nSYVzuPuTt/K/21/mpZ2v8vTWF3h66wvMy1csm7qE00pOxm13jfFvICaaoW6ez549mxUrVnDRRReR\nk5NDdnY2FouFa665hltuuYU//elPtLS0cN111/Hhhx/y0ksv8ac//YmDBw9y/fXX86c//anP4lQj\nMZzZf4f7e4jxI1mz/5YCzwK/1Fo/rpTK1Fo3mNvmAg9orc8b4vgy4jTX1ki8teVj7npiI3a7ldu+\ndjonzsgdcv/WrjbeKd/Mmv3r2FFjtAe77S6WlC5k2dTTmZ07Q/4hCSHGzISZ/ddcf+TvwP/TWr9u\nvv1XpdT1Wut3gXOBjWNdruFYelIx9q+eyk9/9y63/+Ydbr3iNE5Rgy+I5XV4WDHtDFZMO4PDTdWs\n2b+eNfvf4fV9b/P6vrcp8OVyztQl0vQlhBjXxrxGopS6H2NFRB319vcw1mzvAiqBq4e6R5KsGkmP\njTuq+PHjGwC4+fLTWDSn4ChH9AqFQ2yv3sXq/etYX76ZjqCx/Kk0fQkhEi1RNZLjfhr5Y7VZV/Nf\nj60nFIbv/esiFs8rGvE52rraeaf8PVbvf4cdNbsAo+nr9NIFLCtbwpw8afoSQsRPygSJUsrXc8M8\nWVIhSAC27K7hjkfX090d4qbLFrH0pOJjPtfh5hre2
P8Oa/a9Q405Zb3R9HU6Z5edTr40fQkhRimV\ngmQPcIXW+o14FWKkUiVIALbtPcKPfrOOjq4Q3/mXBZx9yujKM1jT1wn5s1hWtoTFpadI05cQ4pik\nUpB8AvgV8H/AzVrrjngVZgRlKCNFggTgo/113PbIOto7uvnWJQtYsSg+o9ql6UsIEU8pEyRgNG8B\ndwDnAdcBkQJprQ/Gq3BDfH4ZKRQkADsPBvjhw+tobe/iui+dzMrFU+J6/qrmGqPX17510vQlhDgm\nKRUkAEopL/A4sBKo73lfaz01LiUb+rPLSLEgAdhTUc8PHlpHU2sn/++i+Xx6SVncPyMUDrGjZjer\n963jnfL3pOlLCDFsKRUkSqnzgV8CfwO+o7VuOsohcZWqQQKwv7KRWx98i4bmTq66cB4XnDUtYc1P\nbV3trK/YzOp969guTV9CiKNImSBRSq3CmBvrKq31qyP9wEFm/92BUbsJAVuBa7XWgxYslYME4ODh\nRm598G0CTR0U5/pYsaiUZQtLKfB7E/aZkaav/e9Q03IEkKYvIURfqRQkvwD+41i7ACulLgdO0lrf\nGDX772aM2X/fUEr9Gvib1vr5Ic5RRgoHCUBlbQtP/nUH73xYSWd3CIATp+eyYlEJZ5xUjNftSMjn\nStOXEGIwKRMko2XeqLdorZuVUjnABsCptS41t38O+ITW+ptDnKOMFA+SHq3tXbz1wce8tqmcrXuM\nmoLTYWPJvCJWLCpl/qw8bNbkNH3NzpuO1SKLcAlxvJgwc23FmP33VuDnUbs0A5ljXa5E8bodrFw8\nhZWLp1BV18rrm8p5bWM5azZXsGZzBf4MF8sWlLJiUSlTolZhjAePw82yqUtYNnVJn6av1fvWsXrf\nOgp8uZxdtpglpQuZlFEo91OEEMckVWb/LY+qkVwInKe1vm6I48sYJzWSWMLhMPpAgNc2lvPG+4do\naesCYNqkTFYsKuWcU0rISk9M81Ofpq+KzXR0G8OAMt0ZzMufxbx8xQkFigJfrgSLEBPMRGraKgBW\nEzX7r1Lqzxj3SNYopR4EXtVarxriHGWM4yCJ1tkV5N0dVbz2bjmbPqoiGApjtVpYODufFYtKOW1u\nIU6HLSGf3d7VzvqK9/ng8Ha2Vmvq2xsj23K9fublK+YVKE7In0WON3uIMwkhxoOJFCSxZv/9FvAA\n4AS2Y/QIG7e9to5VfVMHb7xfwesby9ld0QCAz23nzJMnsWJRKXPK/AmrJYTDYQ41HWZb1U62Vmu2\nVe+kubO3P0VRWj4nFCijxpI/k0x3fJvhhBCJN2GCJB4mapBEO3C4kdc3lvP6pgrqGtsBKMrxsXxR\nKcsXllCY40vo54fCIQ7WH2JrtREsO6p30dbdHtlemlkcqbHMyZtBmjOx5RFCjJ4ESZTjIUh6BENh\ntuyq4bVN5az7sJKOziAAJ0zLYfnCUs6cX4zPk5iuxH3LEWRv4CBbq4zayke1u+kMGvd2LFiYml0a\naQabkzsDt8Od8DIJIUZGgiTK8RQk0Vrbu1j3YSWvbSznwz21hMPgtFtZbHYlPmVWHjbb2HTn7Qp2\nsevI/kgz2M4jewmGjJCzWazM8JdFmsJm5U7DaUt82AkhhiZBEuV4DZJo1YFWVm+q4LWN5RyqMRaT\nzEp3sWxBCSsWlTK1eGx7UHd0d6Jr9xjBUqXZHThAz3fLYbUzK3dapClsur8MuzUxHQiEEIOTIIki\nQdIrHA6zq7ze6Eq8uYKmVqO5qawow5iaZUEJ2Rlj38zU2tnGjtrdZlOYZn99RWSby+5iTu505pk1\nlrKsUqxWGRgpRKJJkESRIImtqzvIxh1VvLaxnI07qugOhrFa4BRldCVePK8IV4K6Eh9NY0cz26t7\ne4Qdajwc2eZzeJiTPysyjqUks0hG3AuRABNmZLtIHIfdxpITi1lyYjENzR2sff8Qr24sZ9NH1Wz6\nqBqv287Sk4pZs
aiUuVNzsCZoapZYMlxpnF66gNNLFwAQaGtgW7WO3LzfeOgDNh76ILLvCflmV+OC\nWRSl5cvgSCFSmNRIjgPlVU28vqmc1zeWU9tgdOEt8HtZvrCU5YtKKM5NS3IJobrlCNuqNFurjZ9A\nW0Nkm9+T1WdwZJ7MZCzEMZlwTVtKqcXAT7XWy5VSpwAvALvMzb/WWj89xLFlSJCMWCgU5sM9tby2\nsZy3t3xMu9mVuDDHS2GOj6JcH0VRjwU5XtzOsa+0hsNhKpur2WoGy7bqnTR1NEe2F/hyIz3C5uXP\nIsszYaZmEyKhJlSQKKW+C3wFaNZan6GU+jqQobW+Z5jHlyFBMirtHd2s21rJ6vcq2HeogUBTR8z9\n/BnuSLAU5nopzkmjMNdLUW4aaWMwfgWMwZEVDZVGbaVKs71mF61dbZHtkzIKIzWWuXkzSXclv4Yl\nRCqaaPdIdgNfBJ4wXy8EZpkTNu4CbtBaNw92sBg9t8tuNG0tLAWgraObw0daqKxt4fCRFj42HyuP\ntLJj3xG27T0y4BzpXgdFuT6jNmPWZApzfBTn+shKd8XtvobVYmVy1iQmZ03iM7NWEAqF2FdfHqmx\nfFSzm7/tXsPfdq/BgoUpWZMiwTI7bwZehycu5RBCxJaUINFaP2vWKnqsBx7WWm9WSt0M3AbclIyy\nHa88LjtTizNjjj/p6g5RHWilstYImkozcCprW9h7qJGdB+sHHON22vo0lxXm+ig2H3OzPKNag8Vq\ntTLdP4Xp/ilcOOcTdAe72V13wGwG0+ys3cv++gpe3PkqVouV6dmTI01hKnc6LrvzmD9bCDFQqvTa\nek5r3XN39XmMCRxFinDYrUzKS2NS3sAmo2AozJH6NiqPtERqNJVRNZv9lY0DjrHbLBT4B96XKczx\nUZjjxWEfWRdlu83O7LzpzM6bzkUnfIbO7k52HtlrNoXtZHfdfnbV7ef5HX/DbrUzM2eq0dW4QDHT\nPxW7LVX+GQgxPqXKv6C/KqWu11q/C5wLbEx2gcTw2KwW8v1e8v1e5s/M67MtHA5T39zB4dpWKo80\nU1nb2idsDtVUDzifxQK5WZ4+4RKp1eR4h7VEsdPuZF7BbOYVzIYTjZUiPzIHR/Y0he2o2cWqbS/h\ntDmYnTsjMjhyanYpNhl1L8SIJDtIeu70XwP8UinVBVQCVyevSCJeLBYL2elustPdzJnqH7C9pa2r\nT+0lujazZXctW3bXDjgmK81lBoxxw78oxxsJnAyfM+Z9GY/DzSlF8zilaB4AzR0tbK/ZFZnOZUvV\nDrZU7YjsOydvptkjTDE5q1gGRwpxFDKORKSkjq4gVZFwaaWytpnDR4z7NFWBVkKhgd9br9ve54Z/\ndG3Gn+EedABmfXujMererLEcbq6JbEt3+phrjrifV6AoTi+QwZFi3JpovbaEGJLLYWNyYQaTCwcu\noNUdDFET6HdfxqzNlFc1s6eiYcAxTruVgpzocTJGjaYw10t+dhpnTF7EGZMXAVDbWhdZ4GtrtWZ9\nxWbWV2wGIMudEQmVefmK/LTcxF4IIcYBCRIx7thtViMMcgcuphUKhQk0tfcJl8NmjaaytoXyqqYB\nx1itFvKzPb29y3J9FOZM4fzJc7nyZA8NnYHIAl/bqjRrD77L2oPvApDn9UcNjlT4vVkJ//2FSDUS\nJGJCsVot5GR6yMn0MG9639pCOBymqbWr7ziZqMDZvLMGdtYMOGfvoMwFnJdzJq7CNpqsH3Oo7QC6\nbjer961j9b51AKS70shyZ5DtziTLk0GWO5NsdwZZHvM9dwZZnkw8drc0kYkJQ4JEHDcsFgsZPicZ\nPiezJmcP2D7ooMzalkEGZRaT5p1MQWEnzux6Ol3VtAUbqWo6QnnDx0OWxWlz9AmWLHcG2eZjlrv3\ndYYrTXqRiZQnQSKEaehBmUGq6lojN/yjB2VWHAjRvdcPRPVMswaxODqwODrAfLQ
4OrA4O7G7Ouly\ndlDd0UxV8xGwDN7hxYIFj81LmiOdTFc6me4M/N4scn1Z5KVl4/f21H4ycdtdCbgqQhydBIkQw+Cw\n2yjJT6ckP33AtuhBmY3NnbR2dNPW0U1be1fU8+7e543GY2tHF23tXXTSboZMVOA4OsDRicXZQbOj\ngxZHDdXth2FgP4IIS8iOPezBgRe3xYvH5sNnTyfdmUamK4MsdwY53mz83nS8bidetx2Py47HfLSP\n0TLNYuJJWpD0m/13BvA4EAK2Atdqrcdfv2RxXIoelHksuoOhSNi0dXTTaj4az7sizxvbWmnoaKSp\ns5nm7ibagi20h1ropJVuSxtBWzud9nY67U20WjBGaXWZPy29nxcOW6DLSbjLZfx0Go+2oAunxYvb\nkmaEkCMNn8uNNypsep87+r7fE0rm85HOTiDGt6QESfTsv+Zb9wA3a63fUEr9GrgQY6oUISY8u81K\nutdJunf0c4AFQ2Fa2jqoaqynpjlAbUuAutYGAu2NNHY00tTZREt3M63WFjqcLYToO4VNN8Y/ymag\nBgh3242wqXcR7nJCVPBE/9DtAHo7D9htFjwuBx63GT5m6PR97uj7/oCwsuN1O3DardIxIcWlyuy/\nC7TWb5jPXwY+gQSJECNms1rI8LnJ8BUyk8Ih9w2Hw7R1tVPfbgRNfXsDgbZG6tsbqW9rINDeQKC1\ngfr2Rpq76oY8lwUrjrAHe9iDNeiGLhfBThcdHU6a2+x01NoJmQFEeGRNaFarZWCtp1/YRNeGYj83\n9nE7bRJKCZAqs/9G/5dtBmSlIiESzGKx4HV68Do9FGcMHTrdwW4aOpoItDVQ326Ei/G8kUB7Iw1t\nPWFUT7e1GxxAVEtfdF3La/eS5kjDZ0/DbfXhsvhwhN3Ywh4s3UYIhbtcdLRbae8I9mniCzS2c6i9\nm2CMmQ2G9zsTCZfBwmY4geR123E77WO6XHUqS5Wb7aGo5+nAwHnJhRBJY7fZyfFmk+Md2G06Wjgc\npqWz1QyYBurbzEezlhN5v72R6raBk3ZGc3jtZPmN7tFl0V2lXdmkO9NxW3048WEPu+nsCve5n9T3\nPtPA+02t7d00tnRSVddKV3doyHIMxe20DRpGsZv1Bm/us43jzg6pEiSblVLnaK3XAJ8GXk12gYQQ\nI2exWEhz+Uhz+SjJLBpy385gFw1RNZvemo753Gxm21t3gGB48D/2Fiyku3zG4E9PBpnuDLKzjeAp\n8/QdqxNrIGhXdyhm54ZYYTTYPi3t3dTUt9PZFTzma+d02PoEzGhqTQ772IZSsoOkp376HeARpZQT\n2A48k7wiCSHGgtPmIM+XQ54vZ8j9QuEQzZ2tvTUa8/5N/1pOTesRDjYcGvJcLpuz7yDQqBkIegaB\nFmVnkOnKxmod+R/joNkDL2a378G6g/fUmMx92jq6qW9up63j2EPJbrP2CZjlC0v54vIZx3y+o35e\nws58FObMk2eYz3cBy5JVFiFE6rJarGS40shwpTGZSUPu29HdGdVpoLemE92JoL69kZ1H9jLUzOcW\ni4VMV3okXDLdvVPcZPer5UQPBLXZrKR5naTFoQdeKBSmvbNfjajdHH80ghpTdaCVj2sTu3J5smsk\nQggRNy4Tm0OlAAAf5UlEQVS7k4K0PArS8obcLxQK0djZ3KeWE33/puf9yuYa9tdXDHkuj93dr5bT\n+7ynuS3LnUG6K21Ea9tYrRa8bgdet4OcFO9+JEEihDjuWK1W8w/9wGUK+mvvau/XPbonbPoGz+Hm\nGsIMXsuxWqxkutMHzrHmzhxQy3Hajr4SaCqRIBFCiCG4HW6KHG6K0vOH3C8YCtLQ0TRELccIofLG\nSvYGDg55Lp/DY9y3MWszse7lZLkzSHP6UmJcjASJEELEgc1qw+/Jwu8Zek2anoGggX7jcepjdJc+\n1HT4qJ+Z5c6IWcuJXrrA781K6JLREiRCCDG
GogeCThrGQND6jt7azGADQffXV9Ad6h70PKeVnMy/\nL/1GvH+VCAkSIYRIUXabnVyvn1yvf8j9jjYQdH7h3MSWM6FnF0IIkXAjGQiaCCkVJEqp9+hdcWGv\n1vprySyPEEKIo0uZIFFKuQG01suTXRYhhBDDlzJBAswHvEqpv2GU62at9fokl0kIIcRRpFKQtAB3\naa0fVUrNBF5WSs3SWsearc0GcPjw0F3jhBBC9Ir6mxnXJSxTKUh2Yix4hdZ6l1LqCFAExJqFrQjg\n0ksvHbvSCSHExFEE7InXyVIpSK4ATgKuVUoVAxlA5SD7vgucZW4/9ikyhRDi+GLDCJF343lSy1Az\nYI4lpZQd+P+AKeZb39Vav5PEIgkhhBiGlAkSIYQQ49P4XdtRCCFESpAgEUIIMSoSJEIIIUYlqb22\nlFJW4FcYvbU6gK9rrfdEbb8A+AHQDTymtf7NYMcopWYAjwMhYCtwrdY6rJS6CrjaPMd/aa1fGmfl\nvx9YCjRhrHH/ea11Y6qVP+qYe4GPtNYPma8Tcv3HqOzj4torpU4GHsDowdgB/KvWunocffcHK/94\nuf5zgYfNQ3eZ7wfH0fUfrPzDvv7JrpF8HnBqrc8Avgfc3bNBKeUA7gFWAucAVyul8s1jXDGOuQdj\nNPzZgAW4UClVCFyHsTb8J4GfKKVGv5jyGJXffH8B8Amt9XKt9Yp4/UOKd/mVUnlKqZeBCzC+dCT4\n+ie07KZxce2B+4BvmtMLPQv8h1KqgPHz3R9QfvP98XL9/xv4ntb6TPP1BePsb8+A8puPw77+yQ6S\npcBfAczpUBZFbZsD7NZaN2itu4C1wNnmMS/HOGaB1voN8/nLwHnAqcBbWusu8yLsxkjjcVF+pZQF\nmAk8opRaq5S6Io5lj3f5fcBtwBMYQQhwGom7/gktu/l/b+Pl2l+itd5iPncAbST22ie8/OPsu/9P\nWuu1ZlAUAvWMr+s/oPwj/f4nO0gygOiUC5q/QM+2hqhtTUDmIMfY6P3j1X/fWOeIl0SVv9nc14dR\n5b8U+BTw/5RSJ6Zg+a1a6/1a6w39zp8+yDniIdFl9zJ+rv1hAKXUGcC1wL1DnGO8lH88ffdDSqnJ\nwDYgB9hCYr/7Y1H+EX3/kx0kjRgXvIdV986t1dBvWzpG0sc6Johxb6FHxiD7pgOB+BQdBilLPMrf\ns28r8IDWul1r3Qy8hjG5ZaqVP9Z8aLHOH8/rn+iyj6trr5S6GPg18Bmt9ZEY+6bqd3+w8o+r66+1\nPqi1ngk8hNGsNK6uf4zyj+j6JztI3gI+A6CUOh0jCXt8BMxUSmWbVa6zgbeHOGazUuoc8/mngTeA\nDcBZSimXUioTo8q3dRyVfxawVillNds9zwQ2pWj5Y3mXxF3/RJddMU6uvVLqKxj/J79Ma73fPMe4\n+e4PUv7xdP3/rIzOMmC0JgQZX9c/VvlH9LcnqSPbzXbQnl4EmRhVq2LgVa317Uqp84EfYgTeo1rr\nX/c7BuAKrfVOZcwY/AjgBLYDV2mj19PXMXpOWIH/1lo/l6DygzFf2EIgTWv9SJzKfyNwMdAF/FZr\n/Ugqlj/qnLcBlVrrh83XCbn+Y1T2lL/2GBPvVQMH6G3OWK21/tF4+O4fpfwpf/3Nf7tLgLuAToxZ\nzL+uta4aD9f/KOUf9vVPmSlSlFLna61fVEZXwPO01j9PdpmEEEIcXcoECYBSygf8D8aEjbXJLo8Q\nQoijG5MBiUqpxcBPtdbL1eCDYnKBO4EfSogIIcT4kfCb7Uqp72K0/bvMtwYbSHM3UIAxcOefEl0u\nIYQQ8TEWNZLdwBcxBnuBcfc/MpBGKbXIfP7VMSiLEEKIOEt4kGitn1VKlUW9lc4gg2KGe06llAtj\n1LqskCiEEMMXWSFRa90Rr5MmY9LGkQwKG8ypwJvxK5IQQhxXzsKYOiUukhEkb2FMCrZqGIPCBlMJ\n8NRTT1F
YWBjPsgkhxIR1+PBhLr30UjD/hsbLWAZJTz/j54CVSqm3zNfHMhlbEKCwsJCSkpJ4lE0I\nIY4ncb0lMCZBYk57cIb5PAz821h8rhBCiMRL9lxbQgghxjkJEiGEEKMiQSKEEGJUJEiEEEKMigSJ\nEEKIUZEgEUIIMSoSJEIIIUZFgkQIIcSoSJAIIYQYFQkSIYQQoyJBIoQQYlQkSIQQQoyKBIkQQohR\nkSARQggxKhIkQgghRkWCRAghxKhMqCAJBkMEQ+Gj7yiEECJuhrVColLqJGAmxvKMu7XWWxNaqmPQ\n1R3k6//9DxpbOsnL8pLv95Cf7SXf7yU/20uB30tetoecTA82qyXZxRVCiAlj0CBRSlmBbwA3AM3A\nAaALmKqUygTuAx7SWofGoqBHY7dZOfuUEnbsq6Mq0MoHu2pj7mezWsjN8lBgBkx+tscIG7+Xgmwv\nOZlubLYJVVETQoiEGqpGsgp4BThdax2I3qCUygK+CjwPfC5xxRs+i8XC1z43L/K6oytITaCV6ro2\nqgKtVNe1Uh31uGV37KCxWi3kZrr71GQiYZPtJTfLg12CRgghIoYKkq9qrZtjbdBa1wP3K6UeTUyx\nRs/lsFGSn05JfnrM7Z1dQWrr26gyg6WqrpWaQO/rbXuPsDV8ZMBxVgv4Mz0DAqbAbEbLzfLgsEvQ\nCCGOH4MGSXSIKKUuBeYCPwG+qLX+Xf99xhunw0ZxXhrFeWkxt3d1h6itb6O6rrVvjcYMmx37jrBt\n78DjLBbIyXCT11ObiWpC67lP47DbEvzbCSHE2DnqzXal1M+AEmAB8HPgCqXUyVrrGxNduJEIh8M8\n9O6THGqqgnCYMBAm3O+58dj3OcPfPzNMOBPSpoTxAcFQiO5giGAoRDAYNh5DYVpDIfaFw+wLAbVA\nbRgi9/fDWCwWrFYjdKwWCxaL8RzzMQzYLVbSnD7jx+UjzemNvE53me/3vGe+9jk82KwSUkKIsTWc\nXlufxAiRTVrrgFJqJfAhkFJBEgwF+ah2D5XN1ViwGH+3LZZ+zzFe9XnOMe5vPHfYrTgxUwF69wNC\nIQiGwsZP0PwJhenuDtHdZYQTPUdFPbdZLYTsUNfZSo0lQIjgsK+D1+EhPRJAXnxOX+/rqOBJj3rt\nc3olgIQQx2w4QdL/r5grxntJZ7fZue8ztye7GMMWDIWpa2iP3J+p7tMhoI2a+la6g2a6WIJg78Ji\n78Ji78SXBukZ4PGFcblD2F3dWGxdBC0ddITbae1qpbzxYzqDXcMuj9fh6VPrGVAL6vdeugSQEMI0\nnCBZBfwR8Culvg1cBvwhoaU6DtisFvKyPeRlezhhWs6A7cFQmPqmdiNkIvdp2noDZ3cb3cHYPa8z\nfE7jxn+2g6wsGxkZ4PWFcXlC2JzddIbaae5sobmz1XxsobnDeH2o8TAdwc5h/x4ehztS20l3pkXV\nerykma+zPZn4PVn4PVl4HR4sFhnHI8REctQg0Vr/VCn1KeAgUAr8UGv9YsJLdpyzWS3kZBoDKOdO\nHRg0oVCYQFN7n55mPaFTHWjjQGUju8tjB02610G+P5v87EnkZ3uZ6fdQkNfbMcDhpE+4RMKmJ3w6\nWmiKft3ZwseNVXQEy4/6ezltDvyeLLI9WfjNgMk2Q8ZvvpftycRhc4z6GgohxsZQAxLPobflvg14\nIWrb2VrrNxJcNjEEa1TQzC7zD9geCoVpaO6gKtBKTdRYmqpAKzWBVsqrmtlT0RDz3D6Pg4Jso4eZ\n0fMsnfzsAqaYr30eR8xaRWewi5ao4GnqaKGpo5lAeyN1bfUE2urNxwY+qtnd09UhpnRXGn53Jn7v\nYKGTSborDatFuloLkWxD1Uh+BEP8S4flcS6LiCOr1UJ2hpvsDDezpwzcHg6HaWjuNLs0myFT19u9\n+VBtM3s/jh00Xrfd7NJsTEXTO0uAUaspycg4avNVdyhIfXsDgbYG6trqq
WutJ9DeQF1rb9hUtxzh\nQMOhQc9hs9rIdvcETG/zWeS5Nwu/OxO3wz2iayeEGJmhxpEsG8NyiDFmsVjISneRle5i1uTsAdvD\n4TCNLZ2Rm//VgdY+Y2qq6lrYX9kY89wel438bG/vWJqouc4K/F4yfE7sVhu5Xj+53oG1qWhtXe2R\nmkxdT+iYQdPzfE/dfoLhwWfq8Tjc+N1Z+L2Z/ZrRekMny50hHQeEOEbDGUdyFnAT4MOYLdgGTNZa\nlyW2aCKZLBYLmWkuMtNczCyNHTTNbV1R92WM2kxvzaaVA4ebYp7b5bT1DRezJtPzOivNFanReBxu\nPI5CijMKBy1rKByisaO5T60m0F7fp3ZT11bPoabDg/++WMh0pw8ImOimtPy0XNx21wivpBAT33B6\nbf0G+BnG3FoPAJ8B/jeRhRKpz2KxkO51ku51MqMkK+Y+zW1dVB1pMQImqgmtZ/6z8qrYQeN02PpM\nP5MfuVdjTEWTle7q03RmtVjJcmeQ5c4YssydwS7qo2oysWo45Y2V7A0cHPR3LkkvZKp/MtOzpzDN\nP5myrFJcducwr5oQE9NwgqRNa/2YUqoMCABXAWuA+xNZMDH+pXkcpJVkMX2QoGlp6+rTZBbdA626\nrpWK6tgz8DjsViNoopYJ6AmZfL+H7HQ31hhLBThtDvLTcslPyx20zOFwmJau1gH3bOra6qlorGRf\noJzyxkre2L8eMMMloygSLNOyJ1OWVYJTwkUcR4YVJEopP6CB04HXgbyElkocF3weB1M9mUwtzoy5\nvbW9ywiXQG+35uj7NIdqamIeZ7dZ+zSZRdamMZvTsjPcg65JY7FYIoMwJzNpwPZQKMTHzVXsrTvI\n3roD7A0cNMKl4WNW718HGDWkkoyiSLBM909hSuYkCRcxYQ0nSO4Bnga+AGwEvgK8l8hCCQHgdTuY\nUuRgSlHsJqv2ju4+E2nW9JkloI33dw0WNMaaNPkxJtbM93uHXPzMajVCoiSjiLPLFgNmuDRVsTdw\nkD1muOwPlHOw4RCr9/WGS2lmsRksk5mWPYXJWZNwyngZMQEMZ0DiKqXU81rrLqXUqcDJwNrEF02I\noblddiYXZjC5cJCg6eymJhA1G0C/Gs1ga9L0LH4W6d7crwktt9/iZ1arlZLMIkoy+4bLoabDkWDZ\nW3eQ/fXlHKiv4PV9bxufExUu0/xTmJY9mSlZk2Qwphh3htNr68vAD4F5QD7we+CbGItaCZGy3E47\npQXplBYMviZNTc+aNFE1mZ5ZArburSW8Z+BxPYuf9e3e7Im87ln8rDSzmNLMYpZNXQIYE4seajwc\nCZY9gQPsr69gf30Fr/WEi9XG5IziSLBM809mcmaxhItIacNp2voBcC6A1nq3UmoB8A8kSMQ453TY\nmJSXxqRB16Qxgsbo0txmNJ1Fep61sn3fEbbtHXzxs56mst4ajYd8fyZLS08bEC69NZcD7G84xL76\ncl41z2ez2picWRx1Q38KkzOLsduG889XiMQbzjfRobWu6nmhta5WSiWwSEKkBofdRnFuGsW5gy9+\ndqShrc/EmtE9zz7aX8f2fXUDjrNYwJ/h7jc7QDGnZM/gU5O9ZGc4qW6tNoIlcIC9dQc5UF/BvkA5\nmIup2a12Jmf21lym+6dQmlEk4SKSYjjfureUUn8AnsJYbuPLwLqElkqIccBht1KY46Mwxxdze3fQ\nWGWz/8SaPT3R9MEAO/YPDBoAf4bLaCrLnonyz2fpJBcWTxOtliPUdh7mQEM5B+oP9RnzYrfamZI5\nqU9vsZLMYuwyYl8k2HCC5FrgOuAbQBfwBvCrRBVIKbUC+Get9VWJ+gwhxoLd1hs0J8bYHgyGONLQ\nHplIs6quLWqWgFZ2l9ejDwRiHJlGVvrJFGafRrq/E3taIx2OOhpDNeyvr2BP4EBkT4fVzpSskj43\n9EsyiyRcRFwNp9dWu1LqKa31XUqps
4GTAAcw/EUrhkkpNR2jV5jMsicmPJvNao5z8cbcHr34Wd+J\nNY1OAfsONdN9MIyx1lyR8WMJYfE04fO34MpsIeypZ0/dQXbX7Qez44DT5kDlTmNu3ixOyJ/FdP8U\nuZkvRmU4vbYeBEJKqV9iNG/9HWPm33+Kd2G01nuAe5RST8T73EKMN30WP2PwNWmi16Hpsy5Npbn4\nmRkuVl8DVl8jobR6PgxqPqzSAFixk+8sZnrWdE4sVCycPJNMX+xwEyKW4TRtnQYsBG4DHtNa36aU\n2jjSD1JKLQZ+qrVerpSyYjSPnQR0AF83Q0QIMUzWYSx+Vt/c0bcmE2ij6kgLh2vqORI8RNhbizUj\nwGEOcrj6IG9Vv074fSuWVj9poUIKnKWUZU6myJ/eZyxNmkdqMKLXcILEav5cCFyjlPIBI/rfFaXU\ndzFGxPdMnvR5wKm1PsMMmLvN94QQcWK1WvBnuPFnuGMufhYO9wbNgZojbK/ZxYGmfdSGD9GRVksz\ntTSzld1tVkIfZRFq8hNq9BNqycLncvYJlp6xND2zBQy2+JmYmIYTJL8DKoG3tdbrlVLbgYdH+Dm7\ngS8CPU1WZwJ/BTDPuSh6Z631ZSM8vxBihCwWC9npbrLT3agpfj7BzMi2po5mtlfv4v2PP2Jb9U4O\nWw9jyzR6mFnCNmwdOVQGsjhwMIvQ9iwI912p0uOy9y545vf0m1jTS7pXgmYiGc7N9nuUUg9orbvN\nt87WWseeW2Lwczxrzh7cIx2IXhUpqJSyaq0HX51ICDFm0l1pLC49hcWlpwBGsOyo2c326p1sq9nF\ngfoKLEXVuIqMbseF7hKyLcU4OvLoqM+gNtBBVV3roIufuZ22qJqMt8+yAT2Ln0nQjB/DGr0UFSKM\nNEQG0YgRJj0kRIRIYemuNE4rOZnTSk4GBgbLwfoDVLAfAEemg1nTpnJW3kymZ8zERx6Bhq7e2QGi\nlgo4OOTiZ/1qMj21G7+3z+JnIvmSNQz2LeACYJVS6nRgS5LKIYQ4Bv2DpbmjhR21u9lWvZPt1TvZ\nXr2LbdU7AWMsy6zcaczNm8nSObOYkTM7Mutxc1tXb7jE6HlWXhV7TRqn3UpBjpcrL5jHojkFY/NL\ni0GNdZCEzcfngJVKqbfM11eMcTmEEHGU5vJx6qT5nDppPjB4sKza9lKfYDkhfxYzCqYOuSZN/yWc\ne2ozdY0dNDR3jOWvKQZhCYfDQ+6glHodIwB66pFhoA3YDvxYax1r6G1Cmfdb9r366quUlJSM9ccL\nIUaoubOFj2p2s616F9urd7K/voKw+f+VDqudmTlTmZtvDJCcmTNV1mlJkIqKCs4991yAqVrr/fE6\n73BqJDswRrE/hhEm/wKUYPTkehSjN5YQQgwqzelj0aT5LOqpsfQLlh01u9les4tnzBqLBMv4Mpwg\nOV1rvSDq9QdKqY1a60uVUtJNVwgxYkMGS03fYLGbwXJC/kzm5s1iVs5UWbY4xQwnSOxKqXla660A\nSql5gFUp5QXkv6YQYtRiB8ses1fYTj6q2c2Oml3AX/oEyydnnEOmO/YKmWLsDCdIrgdeVkpVYYxw\nz8YYpX4bxmBFIYSIKyNYTmLRpJMAaOlsjXQ33l6zi49qjWAJhUNccuKFSS6tGM6AxNVKqanAiUAQ\n2GGu3/621nroO/VCCBEHPqd3QLDsC5QzLXtykksmwKhhDEkp1bNO+yvAm8DTSqkCCREhRLL4nF7m\nFSi8Tk+yiyIYRpAADwEbgKnAFIzVER9NZKGEEEKMH8O5RzJNa/2FqNd3KqX+NVEFEkIIMb4Mp0YS\nUkpFGiKVUlNIwOqIQgghxqfh1Eh+ALytlNpgvj4duDpxRRJCCDGeDKfX1otKqQXAqRg1mGu01tUJ\nL5kQQohxYdAgUUrdNsimBUqpsNb6jgSVSQghxDgy1D2SoSb7l4UAhBBCAEM3bT2kta4c6mClVNHR\n9
hFCCDGxDRUkP1FKHQJ+q7XeGb1BKTUHuBIowpguRQghxHFq0CDRWl+ulDofeEQpNQv4GOjGmEJ+\nD3CX1vqFsSmmEEKIVDVkry2t9YvAi0opPzAdCAH7tNZ1Y1E4IYQQqW9YS+2awSHhIYQQYoDhjGwX\nQgghBjVokCilfGNZECGEEOPTUDWSLUqps8esJEIIIcaloYLk34DHlFJ3K6VcY1UgIYQQ48ugQaK1\n/jsw33y5QSl1tlJqcs/P2BRPCCFEqjta998WpdQPgFLg/4D6qM1TE1kwIYQQ48OQQWIOSPwl8Ddg\nsta6aUxKJYQQYtwYavbfVcBC4Eqt9atjVyQhhBDjyVA1kirgRK11y1gVRgghxPgz1Fxb3xzLgggh\nhBifZGS7EEKIUZEgEUIIMSoSJEIIIUZFgkQIIcSoSJAIIYQYFQkSIYQQoyJBIoQQYlQkSIQQQoyK\nBIkQQohRkSARQggxKhIkQgghRkWCRAghxKgMuR7JWFJKnQFcbb78lta6IZnlEUIIMTypVCO5CiNI\nHgUuTnJZhBBCDFMqBYlNa90JVAJFyS6MEEKI4RmTpi2l1GLgp1rr5UopK/Ar4CSgA/i61noP0KqU\ncgLFwOGxKJcQQojRS3iNRCn1XeARwGW+9XnAqbU+A/gecLf5/sPAQxhNXE8kulxCCCHiYyxqJLuB\nL9IbDmcCfwXQWq9XSi0yn78HXDEG5RFCCBFHCQ8SrfWzSqmyqLfSgcao10GllFVrHRrBaW0Ahw9L\nC5gQQgxX1N9MWzzPm4zuv40YYdJjpCEC5s34Sy+9NG6FEkKI40gRsCdeJ0tGkLwFXACsUkqdDmw5\nhnO8C5yF0cMrGMeyCSHERGbDCJF343nSsQySsPn4HLBSKfWW+XrE90W01h3A2ngVTAghjiNxq4n0\nsITD4aPvJYQQQgwilQYkCiGEGIckSIQQQoyKBIkQQohRSZnZf+NBKVUAvKi1PjXZZREimlJqIfBN\nwAJ8V2tdneQiCQGAUupcjIlyvcCdWusR96SdaDWSm4D9yS6EEDG4gBuAl4AlSS6LENE8WuurgZ8D\nnziWE0yYIFFK/RvwJNCe7LII0Z/W+m1gLvDvwPtJLo4QEVrrF5VSPuB64PFjOUdKd/8dzqzBSqk7\ngJlAPrATWAHcrLX+32SVWxwfRvj9vAcjQNKB27TW30pWucXEN8zv5n8CM4BvAT8Ffqi1rjiWz0vZ\neyTmrMFfAZrNtyKzBpsX6W7g81rrH/Y77ncSIiLRRvr9VEotBx4DOjFmuRYiIUbw3fyBuf9vgVzg\nJ0qp54/l72fKBgnDnDW4P631v45N8cRxbkTfT63168DrY1pCcbwa6Xfzq6P9wJS9R6K1fhbojnor\n5qzBY1sqIQzy/RSpKhnfzfH0RY/HrMFCJIp8P0WqSvh3czwFyVvAZwBGMWuwEIki30+RqhL+3Uzl\neyQ94jZrsBAJIN9PkarG7LuZ0t1/hRBCpL7x1LQlhBAiBUmQCCGEGBUJEiGEEKMiQSKEEGJUJEiE\nEEKMigSJEEKIUZEgEUIIMSoSJEIIIUZlPIxsF+KYKKWWAb8H5muta8z3/h04XWt9UYz9rwf2AU0Y\na4YsH8PixqSUuh0Ia61/NMQ+vwW+r7X+eMwKJkQUqZGICUtrvRpj1cxHIDLP0NXAlf33VUoVABdo\nrV8YyzIOw3CmnvgZcG+iCyLEYKRGIia6W4ANZm3jm8BlWuvGGPtdC6zq/6ZSahbwMJANtADXa603\nKqVKgKeALOBD4BytdWm/Y8/F+CMfBgLAP2utjyilvg18AwgCL2itv6eUmgc8AKRhrPZ5t9b6f/qd\n71PAjwAHRs3pKq11ndZ6u1KqTCk1TWu991gukhCjITUSMaFprbuASzGWuv2D1nr9ILteALwR4/0n\ngfu01vOBbwPPKKWcwP3m+eYDzwCTYhx7C/ANrfWpwAvAAqXUacC
/AadiLH26UCm1APga8J9a69Mw\nlov+b/McFgClVB7wE+ATWusFwN8xQqrHWuD8o10PIRJBgkQcD84EajBmQLUNss9MoM961UopHzBd\na/08GKvLAXWAAs7DXIHO3F4f45x/Bp5XSv0PsENr/Q/gbODPWusmrXVQa71Sa/0e8B3Aq5T6HkaI\n+Mxz9DRtnQZMBlYrpTZj1KBmRH3WAfN3EGLMSZCICU0pNRe4HVgCdAC3DrJriL6ryoHx78PS7z0L\nRpNwEBgslADQWt8HLMNY+vROpdTNGGu2R86plCpWSmVhNKtdCGwDvh/jc23AWq31KVrrUzCC5ctR\n27vM30GIMSdBIiYspZQb+BPw71rr/cBXgeuUUotj7L4HKIt+Q2vdBOxRSn3BPN/pQAGwFfgH8C/m\n+5/GuFfS//PfBtK11vcD9wGnAG8Cn1ZK+ZRSdoxeZQsxaji3mTf7l5nHRwfZemCJUqqn1nErcGfU\nx00Ddg3nuggRbxIkYiK7B/hAa/17AK31QeAG4EmllLffvi8APd19w/Q2KX0FuF4ptQXjZvgXzfsu\nNwD/pJR6D6NmEKtp61bgcaXURuDrGEGxGfgFsA54H1ijtX4Vo9a01lx8aDawA5jaUxatdRVGb7On\nzbKcAtwY9Vlnm7+DEGNOFrYSgkj336e11ucMc//rgFe01jvMm+UPmTfVx5xSaj5ws9b64mR8vhBS\nIxECMP+P/zml1IXDPGQX8AezRvIL4KqEFe7obsK4WS9EUkiNRAghxKhIjUQIIcSoSJAIIYQYFQkS\nIYQQoyJBIoQQYlQkSIQQQoyKBIkQQohR+f8B7pdyLzXNPh0AAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reviewing Model performance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have introduced several different concepts in class for understanding model performance over the last few weeks; here is one primary location that we can refer to for regression and classification problems.\n", "\n", "### Regressions\n", "\n", "**R-Squared**\n", "\n", "Definition: On a scale that typically runs from 0 to 1, how much of the variance in our data does this regression explain? \n", "_note: can be negative when the model fits worse than simply predicting the mean of y, which can happen on held-out data_\n", "\n", "math: $R^2 = 1 - \\dfrac{SS_{res}}{SS_{tot}}$\n", "\n", "* we like this because: It's easily explainable on a scale, independent of what your y represents. 
That makes it very universal.\n", "* we don't like this because: it doesn't express the size of the error in the units of y\n", "\n", "**Root Mean Squared Error (RMSE)**\n", "\n", "Definition: The square root of the mean of the squared errors, where squared error = $(y_{true} - y_{pred})^{2}$\n", "\n", "math: $\\sqrt{\\dfrac{1}{n}\\sum(y_{true} - y_{pred})^{2}}$\n", "\n", "* we like this because: It expresses error in the units of y, unlike $R^2$, and punishes larger errors more heavily\n", "* we don't like this because: its value depends on the scale of y, so it's tough to compare results across models whose targets are on different scales\n", "\n", "### Classification\n", "\n", "**Confusion Matrix**\n", "\n", "Definition: A table that counts every observation by its true class label versus its predicted class label.\n", "\n", "* we like this because: It's the core object from which a variety of classification metrics are computed\n", "* we don't like this because: we actually just like this\n", "* **business prop**: in a business decision, what does each kind of error cost? How much is a true positive worth to a business vs a false positive or false negative? 
Consider weighting each kind of error against some business goal.\n", "\n", "**Accuracy**\n", "\n", "Definition and math: $\\dfrac{TP + TN}{TP + TN + FP + FN}$\n", "\n", "* we like this because: It is a nice \"overall\" number to look at.\n", "\n", "**Misclassification Rate**\n", "\n", "Definition and math: $\\dfrac{FP + FN}{TP + TN + FP + FN}$\n", "\n", "* we like this because: it is exactly 1 - accuracy, a number we want to drive down\n", "\n", "**False positive rate (fpr)**\n", "\n", "Definition: The percentage of actual negatives that were predicted as positive (how often is the predictor wrong on negatives?)\n", "\n", "Math: $\\dfrac{FP}{FP + TN}$\n", "\n", "**True positive rate/recall (tpr)**\n", "\n", "Definition: The percentage of actual positives that were correctly predicted as positive\n", "\n", "Math: $\\dfrac{TP}{TP + FN}$\n", "\n", "**ROC Curve and AUC**\n", "\n", "Definition: The ROC curve plots tpr against fpr on a [0, 1] by [0, 1] plot, and AUC is the area under that curve. With a single pair of rates, the curve is the line drawn through [0, 0], [fpr, tpr], and [1, 1]\n", "\n", "Note there are two routes to go with using AUC in Python:\n", "\n", "* **plotting classes**: calculate the ROC using the predicted binary class labels to gain a single point on the plot and visualize\n", "* **plotting scores/probabilities**: calculate the ROC using the predicted probabilities for class $y_1$. Helpful to visualize when the classes are imbalanced (the a priori probability of the positive class is not around .5)\n", "\n", "All of sklearn's model metrics are in the [sklearn metrics page](http://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics). Keep this around as a reference point so you know how to run the metrics you need to use!" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Guided Practice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Back to the data problem at hand!\n", "\n", "We'll start by working through an explanation of each column in the dataset together as a class and come up with some ideas on how to handle each column." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "print lemons.groupby('Auction').Auction.count()\n", "print lemons.groupby('Auction').IsBadBuy.mean()\n", "\n", "# the ADESA auction looks noticeably worse for bad buys\n", "# (0.154 vs 0.116 at the other auctions combined, roughly a third higher)\n", "# it may help to create a new column that specifically flags \"is_adesa\"\n", "\n", "lemons['auct_adesa'] = lemons.Auction.apply(lambda x: 1 if x == 'ADESA' else 0)\n", "\n", "print lemons.groupby('auct_adesa').IsBadBuy.mean()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Auction\n", "ADESA 10128\n", "MANHEIM 28645\n", "OTHER 12315\n", "Name: Auction, dtype: int64\n", "Auction\n", "ADESA 0.153732\n", "MANHEIM 0.114400\n", "OTHER 0.118149\n", "Name: IsBadBuy, dtype: float64\n", "auct_adesa\n", "0 0.115527\n", "1 0.153732\n", "Name: IsBadBuy, dtype: float64" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "print plt.hist(lemons.VehicleAge)\n", "\n", "print lemons.groupby('VehicleAge').IsBadBuy.mean()\n", "\n", "# the bad-buy rate rises steadily as vehicles get older.\n", "# is there anything we should do here?" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(array([ 1.00000000e+00, 2.15900000e+03, 5.93600000e+03,\n", " 1.11580000e+04, 1.19690000e+04, 9.06400000e+03,\n", " 5.57700000e+03, 3.23400000e+03, 1.53200000e+03,\n", " 4.58000000e+02]), array([ 0. , 0.9, 1.8, 2.7, 3.6, 4.5, 5.4, 6.3, 7.2, 8.1, 9. 
]), )\n", "VehicleAge\n", "0 0.000000\n", "1 0.044465\n", "2 0.062163\n", "3 0.083259\n", "4 0.110118\n", "5 0.146293\n", "6 0.180384\n", "7 0.219852\n", "8 0.253916\n", "9 0.316594\n", "Name: IsBadBuy, dtype: float64\n" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAYMAAAECCAYAAAAciLtvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEVhJREFUeJzt3X9sVed9x/G3STDBibGyjB9tUpVElb9Ck1CULEsDCRCV\nJQOtylZN6UqWppVCVMqiTotEh8fQGtHSKWvWsWq0AbVAkm5SUVY1QiFM7TaIlxVUJbQo6bcjsid1\nCiSggJ3w0+D9cY4bjxr/uJh7jHm/JMS5z33O/T4HmfPxOc+55zT09vYiSbq8Tah6AJKk6hkGkiTD\nQJJkGEiSMAwkSRgGkiTgyqE6RMTtwFcz8+6IuBlYB5wBTgKfzsy3ImIp8AjQA6zJzG0RMRl4BpgK\ndAMPZeahiPgo8PWy747MfPyibJkkadgGPTKIiBXABmBS2fR14E8z827gOeCLETEdeBSYA9wLrI2I\nRmAZsDcz5wFbgFXlZ3wT+FRm3gncXgaMJKlCQ50m2g98AmgoX/9xZv60XJ4IHAd+B2jPzNOZ2VWu\nMxuYC2wv+24HFkZEM9CYmR1l+4vAwlHZEklSzQYNg8x8juJ0Tt/rAwARMQdYDvwdMAU42m+1bqCl\nbO8apK1/uySpQkPOGZwrIj4JtAGLM/NwRHQBzf26NANHKHb6zYO0QREOR4aoNwm4DXiTYq5CkjS4\nK4APAHsy8+RwVhhRGETEn1BMFC/IzHfK5t3Al8ud9lXALGAf0A4sBvYAi4CdmdkdEaci4iagA7gH\n+Oshyt4G7BrJOCVJANwFvDScjsMNg96ImAD8PfA/wHMRAfDvmfmliFhHscOeALRl5smIWA9sjohd\nFFceLSk/63PAsxTJ9WJm7hmi9psAzz77LDNmzBjmcHWp6OjooG19O1e3TK9r3feOHuQry+Zy4403\n1rWuVA8HDhzggQcegHL/ORxDhkFmdlJcKQRw3Xn6bAQ2ntN2HLh/gL4/Bu4Y7gApTw3NmDGDG264\nYQSr6VJw7NgxrrxqChObfqOuda88eZzp06f7M6Xxbtin1v3SmSTJMJAkGQaSJAwDSRKGgSQJw0CS\nhGEgScIwkCRhGEiSMAwkSRgGkiRquIW1xqdTp07R2dlZ97odHR1Dd5J00RkGAqCzs5MHV36XppZp\nda17+Jevc90Ns+paU9KvMwz0K00t07jm2uvrWvPY0YN1rSdpYM4ZSJIMA0mSYSBJwjCQJGEYSJIw\nDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiSG8TyDiLgd+Gpm3h0RHwE2AWeB\nfcDyzOyNiKXAI0APsCYzt0XEZOAZYCrQDTyUmYci4qPA18u+OzLz8YuxYZKk4Rv0yCAiVgAbgEll\n05NAW2bOAxqA+yJiBvAoMAe4F1gbEY3AMmBv2XcLsKr8jG8Cn8rMO4HbI+LmUd4mSdIIDXWaaD/w\nCYodP8AtmbmzXH4BWAjcBrRn5unM7CrXmQ3MBbaXfbcDCyOiGWjMzL4H375YfoYkqUKDhkFmPkdx\nOqdPQ7/lbqAFmAIcPU971yBt/dslSRUa6QTy2X7LU4AjFDv35n7tzQO0D9TW/zMkSRUaaRi8EhHz\ny+VFwE5gN3BXREyKiBZgFsXkcjuwuH/fzOwGTkXETRHRANxTfoYkqUJDXk1U6i3/fgzYUE4QvwZs\nL
a8mWgfsogiXtsw8GRHrgc0RsQs4CSwpP+NzwLPAFcCLmblnlLZFklSjIcMgMzsprhQiM/8bWDBA\nn43AxnPajgP3D9D3x8AdNY1WknRR+KUzSZJhIEkyDCRJGAaSJAwDSRKGgSQJw0CShGEgScIwkCRh\nGEiSMAwkSRgGkiQMA0kSw7+FtTSunD3TQ0dHx9AdL4KZM2fS2NhYSW3pfAwDXZZOvHuY1U+9TFPL\nG3Wte+zoWzy9dgmtra11rSsNxTDQZaupZRrXXHt91cOQxgTnDCRJhoEkyTCQJGEYSJIwDCRJGAaS\nJAwDSRKGgSQJw0CShGEgScIwkCRhGEiSMAwkSRgGkiRquIV1REwANgKtwFlgKXAG2FS+3gcsz8ze\niFgKPAL0AGsyc1tETAaeAaYC3cBDmXloFLZFklSjWo4M7gGuzsw7gceBrwBfA9oycx7QANwXETOA\nR4E5wL3A2ohoBJYBe8u+W4BVF74ZkqQLUUsYHAdaIqIBaAFOAbdm5s7y/ReAhcBtQHtmns7MLmA/\nMBuYC2wv+24v+0qSKlTLk87agauAnwPXAR8H5vV7v5siJKYAR8/T3nVOmySpQrUcGayg+I0/gJsp\nTvVM7Pf+FOAIxQ6/uV978wDtfW2SpArVEgZX8/5v9u9QHF28EhHzy7ZFwE5gN3BXREyKiBZgFsXk\ncjuw+Jy+kqQK1XKa6AngOxGxi+KIYCXwE2BDOUH8GrC1vJpoHbCLInTaMvNkRKwHNpfrnwSWjMaG\nSJJqN+IwyMwjwB8O8NaCAfpupLgMtX/bceD+kdaVJF08fulMkmQYSJIMA0kShoEkCcNAkoRhIEnC\nMJAkYRhIkjAMJEkYBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CS\nhGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkgRcWctKEbES+DgwEfgG0A5sAs4C+4Dlmdkb\nEUuBR4AeYE1mbouIycAzwFSgG3goMw9d6IZIkmo34iODiFgA3JGZc4AFwE3A14C2zJwHNAD3RcQM\n4FFgDnAvsDYiGoFlwN6y7xZg1ShshyTpAtRymuge4GcR8X3geeAHwK2ZubN8/wVgIXAb0J6ZpzOz\nC9gPzAbmAtvLvtvLvpKkCtVymmgq8CHg9ymOCp6nOBro0w20AFOAo+dp7zqnTZJUoVrC4BDwemb2\nAL+IiBPA9f3enwIcodjhN/drbx6gva9NpVOnTtHZ2Vn3uh0dHXWvKWnsqCUMXgK+ADwZER8EmoAf\nRsT8zPwPYBHwQ2A38OWImARcBcyimFxuBxYDe8q+O3+9xOWrs7OTB1d+l6aWaXWte/iXr3PdDbPq\nWlPS2DHiMCivCJoXEbsp5hw+D3QCG8oJ4teAreXVROuAXWW/tsw8GRHrgc0RsQs4CSwZpW0ZN5pa\npnHNtdcP3XEUHTt6sK71JI0tNV1amplfHKB5wQD9NgIbz2k7DtxfS11J0sXhl84kSYaBJMkwkCRR\n45yBpNqcPdNT6WW8M2fOpLGxsbL6GrsMA6mOTrx7mNVPvUxTyxt1r33s6Fs8vXYJra2tda+tsc8w\nkOqsikuHpaE4ZyBJMgwkSYaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJw0CShGEg\nScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEnCMJAkYRhIkoAra10xIqYBPwE+BpwFNpV/\n7wOWZ2ZvRCwFHgF6gDWZuS0iJgPPAFOBbuChzDx0QVshSbogNR0ZRMRE4FvAe0AD8CTQlpnzytf3\nRcQM4FFgDnAvsDYiGoFlwN6y7xZg1QVvhSTpgtR6mugJYD3wZvn6lszcWS6/ACwEbgPaM/N0ZnYB\n+4HZwFxge9l3e9lXklShEYdBRHwGeDszd5RNDeWfPt1ACzAFOHq
e9q5z2iRJFaplzuCzQG9ELARu\nBjZTnP/vMwU4QrHDb+7X3jxAe1+bJKlCIz4yyMz5mbkgM+8GXgU+DWyPiPlll0XATmA3cFdETIqI\nFmAWxeRyO7D4nL6SpAqNxqWlvcBjwJci4j8pjja2ZuZBYB2wC/ghxQTzSYq5ht+KiF3Aw8CXRmEM\nkqQLUPOlpQDl0UGfBQO8vxHYeE7bceD+C6krSRpdfulMkmQYSJIMA0kShoEkCcNAkoRhIEnCMJAk\nYRhIkjAMJEkYBpIkLvB2FJIuHWfP9NDR0VFJ7ZkzZ9LY2FhJbQ2PYSBdJk68e5jVT71MU8sbda17\n7OhbPL12Ca2trXWtq5ExDKTLSFPLNK659vqqh6ExyDkDSZJhIEkyDCRJGAaSJAwDSRKGgSQJw0CS\nhGEgScIwkCRhGEiSMAwkSRgGkiQMA0kShoEkCcNAkoRhIEmihofbRMRE4NvAh4FJwBrgdWATcBbY\nByzPzN6IWAo8AvQAazJzW0RMBp4BpgLdwEOZeWgUtkWSVKNannT2APB2Zj4YEdcCe4FXgLbM3BkR\n64H7IuK/gEeBW4HJwEsR8a/AMmBvZj4eEZ8EVgF/NhobM5pOnTpFZ2dn3etW9YxaSZe3WsLge8DW\ncnkCcBq4JTN3lm0vAPcAZ4D2zDwNnI6I/cBsYC7wN2Xf7cBf1Tj2i6qzs5MHV36XppZpda17+Jev\nc90Ns+paU5JGHAaZ+R5ARDRTBMMq4G/7dekGWoApwNHztHed0zYmVfG82GNHD9a1niRBjRPIEfEh\n4EfAlsz8J4q5gj5TgCMUO/zmfu3NA7T3tUmSKjTiMIiI6cAOYEVmbiqbX4mI+eXyImAnsBu4KyIm\nRUQLMIticrkdWHxOX0lShWqZM2ijOLWzOiJWl21fANZFRCPwGrC1vJpoHbCLInTaMvNkOcG8OSJ2\nASeBJRe8FZKkC1LLnMEXKHb+51owQN+NwMZz2o4D94+0riTp4vFLZ5Ikw0CSZBhIkjAMJEnUdjWR\nJA3b2TM9ld1mZebMmTQ2NlZS+1JjGEi6qE68e5jVT71MU8sbda177OhbPL12Ca2trXWte6kyDCRd\ndFXc2kUj45yBJMkwkCQZBpIkDANJEoaBJAnDQJKEYSBJwjCQJGEYSJIwDCRJGAaSJAwDSRKGgSQJ\nw0CShGEgScIwkCRhGEiS8ElnksYpn708MoaBpHHJZy+PjGEgadzy2cvD55yBJMkwkCRVdJooIiYA\n/wjMBk4CD2dmfU/sSZJ+paojgz8AGjNzDvAXwNcqGockierCYC6wHSAzfwz8dkXjkCRRXRhMAbr6\nvT5TnjqSJFWgqktLu4Dmfq8nZObZ8/S9AuDPH3uMSZMmXfSB9WlqaqL77WvoOdE1dOdRdOyd/+XM\nqfesa91xU/tyq3u8+xAHDx6kqamprnX7O3DgQN/iFcNdp6owaAc+DnwvIj4K/HSQvh8A2Pvqq/UY\n16/prqDmKetad5zVvtzqPvzw8xVUHdAHgGFdnFNVGPwL8LsR0V6+/uwgffcAdwFvAmcu9sAkaRy4\ngiII9gx3hYbe3t6LNxxJ0iXBSVtJkmEgSTIMJEkYBpIkxvAtrL1/0fsiYiLwbeDDwCRgTWaOmWvX\nqhAR04CfAB/LzF9UPZ6qRMRKisu0JwLfyMzNFQ+pEuX+YiPQCpwFlmZmVjuq+ouI24GvZubdEfER\nYBPFv8c+YHlmnveKobF8ZOD9i973APB2Zs4Dfg/4RsXjqVQZjt8C3qt6LFWKiAXAHeX/kQXATZUO\nqFr3AFdn5p3A48CXKx5P3UXECmADxS+MAE8CbeV+owG4b7D1x3IYeP+i930PWF0uTwB6KhzLWPAE\nsJ7iuyeXs3uAn0XE94HngR9UPJ4qHQdaIqIBaKH4vtnlZj/wCYodP8AtmbmzXH4BWDjYymM5DLx/\nUSkz38vMdyOimSIY/rLqMVU
lIj5DcZS0o2xqGKT7eDcVuBX4I+BzwLPVDqdS7cBVwM8pjhr/odrh\n1F9mPsf//0Wx//+NdylC8rzG8s51JPcvGvci4kPAj4AtmfnPVY+nQp+l+Pb6vwE3A5sjYnrFY6rK\nIWBHZvaU8yYnIuI3qx5URVYA7ZkZvP9zcWk9kX709d9fNgNHBus8lsOgHVgMMIz7F41r5c5uB7Ai\nMzdVPJxKZeb8zFyQmXcDrwKfzsyDVY+rIi9RzCERER8ErgYOVzqi6lzN+2cS3qGYUB/2TdrGqVci\nYn65vAjYOVjnMXs1ESO7f9F410ZxiLc6IvrmDhZl5okKx6SKZea2iJgXEbspfrH7/GBXi4xzTwDf\niYhdFEGwMjOPVzymqvT9DDwGbCiPkF4Dtg62kvcmkiSN6dNEkqQ6MQwkSYaBJMkwkCRhGEiSMAwk\nSRgGkiQMA0kS8H/Q6AWOQsCrUwAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Group Work\n", "\n", "1. Continue to parse through each column and determine relationships against IsBadBuy.\n", " 1. Is there missing data? How should it be imputed? What does the missing data mean?\n", " 2. Are there any clear relationships?\n", " 3. Do we need to be concerned with feature scaling?\n", "\n", "2. Generate a model in your group. The goal should be a cross validated model that, on average, performs better than the benchmark on the training data.\n", "3. Once you've created a model your team is comfortable with, generate a \"submission\" csv file on the out of sample data. Ed, Julia, and Pooja will post \"scores\" against the actual values for the out of sample data.\n", "\n", "Once you've done so, use the rest of the class today to work on your projects individually. Use this time to practice everything we learned today and to improve your project 2 for Monday." ] } ], "metadata": {} } ] }