{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cooling Tower Identification: MODA Analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Import dependencies\n", "\n", "* This was written in Python 2.7" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/srimmele/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n", " \"This module will be removed in 0.20.\", DeprecationWarning)\n" ] } ], "source": [ "import sys as sys\n", "import os\n", "import warnings\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "with warnings.catch_warnings():\n", " warnings.filterwarnings(\"ignore\")\n", " from patsy import dmatrices\n", " from sklearn.linear_model import LogisticRegression;\n", " from sklearn.cross_validation import train_test_split;\n", " from sklearn import metrics\n", "\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from matplotlib import style\n", "style.use('seaborn-notebook')\n", "%matplotlib inline\n", "\n", "\n", "\n", "np.random.seed(100)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/srimmele/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (7,8,9,11,28,51,53,54,78,80,84,92) have mixed types. Specify dtype option on import or set low_memory=False.\n", " interactivity=interactivity, compiler=compiler, result=result)\n" ] } ], "source": [ "data_path = \"../data/tower_data.csv\"\n", "\n", "if os.path.isfile(data_path):\n", " tower_data = pd.read_csv(data_path)\n", "else:\n", " print 'run Tower-Identification-OSA-Data notebook first'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "###Some helper functions\n", "\n", "def Remove_Dummy_BINS(df):\n", " '''\n", " removes illegitimate BINs from input dataframe.\n", " '''\n", " dummy_BINs = [1000000 , 2000000, 3000000, 4000000, 5000000]\n", " df = df[~df.BIN.isin(dummy_BINs)]\n", " return df\n", "\n", "\n", "def CleanInputData(df):\n", " '''\n", " Performs cleaning on input tower_data, see comments below\n", " '''\n", " exclude_BC = ['A','B','V'] # Removing single-family, 2-family, vacant land\n", " testset = df[~df['BC'].isin(exclude_BC)] \n", " testset = testset[testset.BldgArea != 0] # removing any buildings with no reported area\n", " testset = testset[testset.NumFloors != 0] # removing any buildings with no report floors\n", " return testset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploratory Analysis\n", "\n", "There are so many potentially relevant characteristics available in PLUTO. One very relevant factor is a building's class, which can indicate a lot about what's inside a building (and what needs to be kept cool).\n", "\n", "\n", "Building Class Codes can be found in the appendix of the [PLUTO Data Dictionary](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwiNgdrMp-nWAhUFOiYKHYWoACMQFggoMAA&url=https%3A%2F%2Fwww1.nyc.gov%2Fassets%2Fplanning%2Fdownload%2Fpdf%2Fdata-maps%2Fopen-data%2Fpluto_datadictionary.pdf%3Fr%3D16v2&usg=AOvVaw1-CEPaXRsh0VS-bz9UKkb9)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYoAAAEVCAYAAAD+TqKGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmcXEW5xvHfZLJJAggKQgDFhfuwyKKskS0YQFxQJERl\nUUG4KCKCiAIuEEBFryAqcGWRRUW87AgIGLYIChhEEVB8QDaBiIRFIEISksz9o6pDZ9Ldc7qne3p6\n5v3ymQ99TtepU92ZOe+p5VR19fT0EEIIIVQzot0FCCGEMLhFoAghhFBTBIoQQgg1RaAIIYRQUwSK\nEEIINUWgCCGEUNPIdhcgDF2STgMOsz1H0nm296rj2LcA04E5tjeq8P7mwDeBN5FueB4Bvm771n6U\ntwdYA9gU2Nn2pxrNqyzPfYDD8+aqwHzgmbx9oO0b+nuOEFotAkVopeVykBgJvFLnsVsC/7S9de83\nJG0EXA3sa/vyvO+DwDWS3mX7L/0ptO3LgMv6k0dZXucA5wBIOhf4u+1vNCPvEAZKBIrQSl35/2sB\nf6+UQNJU4GjS7+Is4L+BlYH/AZaT9GfbG/Y67KvA6aUgAWD7Ckm7Ak9Vy9f2g5LGAt8HtgMWkQLO\nl20vLCvT3sBetrfPF/dHgXcB/wXcD3zI9kuS3gP8GJgDnAScAGxg+5GiX5Ck1wA/BLbJ5bkSOBI4\nD/i17XMlrQY8Drzb9k2SNgNOs/1OSQcABwNjgd8C+9meK+k84F/AjsBRwIPAGcBywCjgJNv/W7Sc\nYXiLPorQdJIOkXQtsImky4GfAjtLOrBXujcCZwK72F4b+BUpANxGuljeViFIAGyb0y7B9g22Z1fL\nNyc7hNS8tB7wTmBrYPc+PtJU4KPAW4GVgA9L6gZ+Auxvex1SMBzXRz6VfJEUGNcDNgYm5/PdBEzM\nabYBbifVsshlvkHSdqQgMAl4MzAXmFaW93bAJrmGdAxwsu11cz47ShrVQHnDMBSBIjSd7e8DPyL1\nGewCzACm2j61V9IdgJtsl2obPwa2y01VtaxIuluupla+7wfOsL3A9svAz0l33bX8yvazthcA9wBv\nJNUuxti+Jqc5mcb+nsrL8xJwfi5PeaDYmvR9lgLFVsANwM7AL2w/absHOA3YtSzv623Py6+fAqZK\negcw2/YututtDgzDVASK0CobAzPz6zVsP1YhzUrAc6UN28+Tmqte30feTwOr1Xi/Vr5LvJdfr9zH\n+Z4ve70Q6AZW6JXPrD7yKFTWUnlsPwAsL2lZUsC4GHiTpBHAZsAtwGuBj0v6m6S/kYLM6LK8ni17\nfRhwX87nH5I+3WB5wzAUfRSh6STdCbwN+IgkgAn5QtZ7lM+/ePWuGUkrkNrpn+7jFDcBU4Df9Drv\nPqQ7/lr5/gt4Xdlhr6N27aSaF4DxZdurNJAHfZTnZuB9wPzcJ/JX0ud+0PZ/JM0CzrJ9RF8nsf0i\nqTnvyNzH8WtJ19l+qMFyh2EkahSh6WxvDFyZ+wc+APzA9toVhoJeB2yTh8ICfAaYnpt4avkGsJek\nT5Z2SPow8G3SBbxWvlcB+0rqljQO+DgV+jsKeAAYJWlS2TkamYr5KmC/XJ7xwF5l5bkJ+AJQGvJ7\nG3AoqdkJ4JfAbpJeByBpV0mHVTqJpKslrZ037yV9TzF1dCgkAkVoOklrkUbZQGpfn1Epne3Hgf2A\nX+YaxzZAn00iefjrDqRg8ZCk+4BPAZNt399HvicDjwF/Af5AulBfVO9nzG3/BwDnSrqLNBpqEfVf\nfL8PPJnLcwdpWG5paO5NwOa8GihuBbYAbsxluAP4LnBL/g4+D1xR5Tw/BC7M6e4Evm/74TrLGoap\nrliPIoT+y7WTOcBrc59ICENG1ChCaJCkOyR9NG9+FLgvgkQYiqIzO4TGfQE4VdJxpDb/T/aRPoSO\nFE1PIYQQaoqmpxBCCDVF01Mvcxd0xpDBOXP7GkH6qnkLFhVOu+K40X0nyrq6+k4D9ZV13Jjiv5J/\n/9ecQunWWmV834myl+Yt7DtRtsyY7sJpX55fLN8RRb9UYFEdrQGjuovdE47sLn7+V+r4vRo1svn3\npE88+3LhtCsvN6Zw2nr+Xl4/fmTxL6yK17zjc4X/IV/+0yn9Pl8jokYRQgihpqhRhBBCO3UN/vv1\nCBQhhNBOI4o3YbZLBIoQQminOvql2iUCRQghtFM0PYUQQqgpahQhhBBqihpF/+Rpl68gzcG/MfAt\n0sIxV9s+rsZxGwCnkmbzfA7YA/gOaYnJ/Wxf1eKihxBCMR1QoxjsoWwacKbt50jTJE/h1fV+161x\n3MnAF21vS1o3YG/bBwHXtri8IYRQnxHdxX/aVcS2nbkPksYCu5Hm0H8L8Kztx2wvAq4mLUJfzc62\nS8twzmbJFcRCCGHw6BpR/KdNBm2gIK0LfLfthaRlJmeXvfcUsGq1A22/AIvXCPgEaZ3gEEIYfLq6\niv+0yWDuo5gAPF7lvT6/sRwkrgBOsH1fMwsWQghNE53Z/VaaLGsWSy5ev1reV5GkkaT1hM+3fW7L\nShdCCP0VgaJfZgGrA9h+RNJyktYk1TI+AOxZ49jDgRm2z2p5KUMIoT+6YwqP/pgJbCipO/dTHAD8\nIr93ge37Ja0CHGP7072OPRB4RNL2eftG28cOTLFDCKEOTex7kPR2UmvKSbZPkbQGcA4wCngF2Mv2\nk5JeAX5XdujkfJ2taNAGCttzJV1KGvl0ge2bgYm90jwp6YUKx04YoGKGEEL/NKnpKffLngzcULb7\nG8AZti+UdCBwKPBl4Hnbk4rmPdgbx44G9pe0QqU3JY0GriuSkaSTgZ2aWLYQQui/5o16mge8jyX7\nbz8LXJJfN/yowKCtUQDYfpEaz0vYng9ML5jXQcBBTSpaCCE0R5NqFLYXAAskle/7D4CkblKTfKkJ\nfqyk84E3AZfY/l6tvAd7jSKEEIa2Fj9HkYPEz0h9taVmqcOA/YEdgT0lbVIrj0FdowghhCGv9VNz\nnAM8YPuY0g7bp5VeS7oBWB/4Q7UMIlCEEEI7tfA5Ckl7AvNtH122T6T+3z2BbtL8eTVnr4hAEUII\n7dSk4bGSNgZOBNYEXpG0G7AyMFfSjJzsr7Y/K+kx0iMIi4AryubGqygCxQDo6ek7TUnR35llRhev\nro4ZWfyOpZ7f2QULi32wua9UHZ69lLGjmv+5WvH915vvS/OLfQet+nddVLCs9Xym7hHtnR774Mvu\nLZz2ok9tWjht10DPqdS8zuw7gUkF0x5eT94RKEIIoZ1iCo8QQgg1tXGdiaIiUIQQQjt1wAp3EShC\nCKGdoukphBBCTVGjCCGEUMuAj7JqwKAOFJLGk1apmwJsBBwPLAQM7JfXz651/KeBI22vmScFnJqP\nu6q1JQ8hhGI6IVAM9saxacCZtp8DzgB2s70lsCx9zAQraWVg19J2nhTw2tYVNYQQ6tc1oqvwT7sM\n2kAhaSxpLYoL866NbZfW0C4yXe7/AEe1qHghhNAUXV1dhX/aZdAGCmAz4O7Sqku2XwCQtCppxsOr\nqx0oaRLwsu3fD0A5QwihYZ0QKAZzH8UE0vrYi+XmpCuBz9p+ptJBeTGjY4EPtbyEIYTQT9FH0X+L\nZ56RtBxwDfA127UWK3oH8AbgGkm3A6tK+r/WFjOEEBrUVcdPmwzmGsUsYPWy7RNJC4bX7JDOzU2L\nl3iS9Ijtj7WmiCGE0D+dUKMYzIFiJrBhXp1pDPAJYC1J++X3zycNnT3G9qfbVMYQQuiXESMGe8PO\nIA4UtudKupQ0JPYCUrBYiqQX+shnzRYUL4QQmqITahSDPZQdDewvaYVKb+aO6+uKZJQfuKv57EUI\nIQy46KPoH9svApNrvD8fqNWxXZ72IOCgJhUthBCaohNqFIM6UIQQwlAXgSKEEEJN7Zyao6gIFCGE\n0EZRowh1m/dKzQlxFxszqvg4hBF13LEsWtTTd6I6PTNnfuG0K44bXTjtSstVHAjXLyPr+K56eop/\nV/fOer5Qune9pa8pzBoz67mXC6V74+uXKZxnuy9w337/OoXTzl9Q7O8KoL4b/P5/B+3+HouIQBFC\nCG0UgSKEEEJNEShCCCHUNvjjRASKEEJop5jCI4QQQk3R9BRCCKG2wR8nIlCEEEI7RY2inySNJ00l\nPsX2c3nf8cBE25NqHLc88H/AisATwO7ACcBUYD/bV7W46CGEUEgnBIrB3osyDTizLEisC2xT4Liv\nAtNtbw7cBWyYJwWsuehRCCEMtFgzux8kjQV2Aw4v230iKQhM6+PwnYFtAWwf24ryhRBCM8RcT/2z\nGXC37YUAkvYGfgM8UuDYVYDPSNoB+CvwedvzWlTOEEJoWDNrCpLeDvyStGz0KZLWAH4GdAP/BD5u\ne56kPYFDgEXAGbbPqpXvYG56mgA8DiBpRWAfUo2iiLHAdba3Jn3G/fpIH0IIbdGspidJ44CTgRvK\ndh8LnJqvhX8HPpXTHQVsD0wCvpCvsVUN5kABUJp17d3ASsAtwGXAOyWdVOO4x2zfll9PB9ZrXRFD\nCKFxXV3Ff/owD3gfMKts3yTSgCCAK0nBYXPgDtvP234Z+B2wZa2MB3OgmAWsDmD7Ytvr2t4C+DDw\nR9tfqHHsjZK2y683BtzaooYQQmOaVaOwvSBf+MuNK2t2fwpYldQ0P7ssTWl/VYM5UMwENpTUXS2B\npFUknV7hra8DR0q6BXgb8OMWlTGEEPplxIiuwj/9VC2DPjMetJ3ZtudKupQ08umCsv2PkKpT2H5S\n0gsVjp0N7DgwJQ0hhMa1eNTrHEmvyTWN1UgtNbNItYqS1YDba2UymGsUAEcD+0taodKbkkYD1xXJ\nSNLJwE5NLFsIIfRbi2sU1wNT8usppGfJfg9sKum1+aHmLUn9v1UN2hoFgO0Xgck13p9P6qwuktdB\nwEFNKloIITRFs2oUkjYmjQxdE3hF0m7AnsC5kj4NPAr8xPYrko4Afk0aMHSM7ZpLMA7qQBFCCENd\ns56jsH0nuVm+lx0qpL0YuLho3hEoQgihjTpgqqcIFCGE0E6xcFEA6rtjmL9wUdPPP6q7eAHqqgb3\n9PSdBlh27KjCWdbTYTeyBXPkdNeR56JiHx+A8aOK/anVc/56vquF9RS2oIL//EBr7ppfO67479Xo\nkcUvxj31fLAmiBpFCCGEmjphmvE+A4WkNYHVbP9O0n8DWwAn2L6v1YULIYShrgPiRKHnKM4B5kt6\nB2lyvUuAH7a0VCGEMEx0wnoURQJFj+07SHMsnWL7ajpildcQQhj8mjgpYMsU6aMYL2lT0lQa20oa\nA1R8UjqEEEJ9mjCHU8sVqVGcAJwJnJ7nUJoGnN/KQoUQwnDRCU1PRWoUL9reqGz7q7abP4azgjwP\nyRWkOUrGA78ARpOmGf9MjeOmAIcB84EngL1Jj7ZPBfazfVVrSx5CCMUMlc7sQyUtDigDFSSyacCZ\ntp8jXehPtL0ZsFDSG2sc90NgJ9vbAnOAXfNcT9e2usAhhFCPoVKj+DfwV0l/JN2hA2D7Ey0rFSBp\nLKlf5HBJI4Ctgd3zuQ/s4/BngdcCz+f/P93CooYQQsM6oUZRJFBclX8G2mbA3bYXSnoD8CJwkqR3\nArfYPrLGsQcBf5L0b+BPtq8fgPKGEELdhkRntu2fAL8hXahfAG7K+1ptAvB4ft1FWlzjB8C2wDsk\nvb/SQbn28UNgU+CtpGaqD7a+uCGEUL9OaHrqM1BI+gxwE/Ax0tzmMyR9stUFy0qTrjwNPGr7QdsL\ngRuA9aocsxLQldP25LSbtL6oIYRQvyERKICPA+vY/ojt3YD1gaojjppoFrA6pEXDgYckrZXf2xhw\nleOeBlaQtFLe3hR4oJUFDSGERnXCA3dFAsUC23NLG7b/Q1mndgvNBDaU1J23DwHOkXQrqZP6Skmr\nSDq9/KBc4zgwv/8bUj/M/w1AeUMIoW6dUKMo0pn9WF5vurQ29XuAf7SuSIntuZIuJY18usD234Gt\neiV7UtILFY79JfDLVpcxhBD6qxNGPRWpUexPemhtH9KDa4/mfQPhaGB/SRWnDJE0mlcDWE052O3U\nxLKFEEK/jRjRVfinXfqsUdh+qaxGsSjt8sstL1k60YvA5BrvzwemF8zrINKw2RBCGDRGdECVosio\np12AvwM/Is35dL+k97a6YCGEMBx0Qmd2kT6KLwEb5AkBkTQBuBi4ppUFCyGE4aATVrgr0kcxvxQk\nAGzPAua1rkghhDB8jOgq/tMuRWoUcyR9kSVHPb3YuiKFEMLw0QlTeBQJFPsCxwJ7kZ6Uvj3vC200\nZlSRymBrzV9QbCLhea8sLJxnT0/faUqKVtnrqdnX1QywqHhhx48t8qfWGc0QJe0uaqu+q1cW1vFL\n2ITFPrs6YMHQIqOenmJgnsQOIYRhpwMqFNUDhaTHeHWupaXYrrUeRAghhAI6oRZZq0bR+ynoEEII\nTdYBcaJmoDjY9qGlDUlftH3iAJQphBCGjU5/4G6jXtsV138IIYTQuE6fwqN3qQZ/2AshhA7TARWK\nmoGid0d2PWPG6iJpPHAFMAXYgzQUdyHwB9uH1DhuBPAtYF/bK5Xt/xIwNZf5GODPwM+Bt9t+fas+\nRwgh1KtZTU+S9iWtH1SyCWkWjY2BZ/K+79r+Vb151woUEyR9qmx71fJt22fXe7IappHmkVpImjLk\nbbYXSJouaQvbt1c57gjSlOeLv2lJbyatxjcRWB64BVjP9iRJTzexzCGE0G/NqlDYPgs4C0DStsBH\ngHHAkbav6k/etfoobgO2Lvu5vex100ZESRpLWnPiQtKCSPOB8ZJGAssAz9Y4/GTb/9tr33bANbZL\nU488CqzbrPKGEEIztWjhoqOA45pVxqo1Ctv7NOskfdgMuDuvTLdQ0jHAQ8DLwP/Zvr9GGStNJbIK\nMLts+ylgVeCe5hU5hBCao9l91JI2BR6z/aQkgM9JOpR0Lfyc7bpbVto/DwRMAB4HkLQc8BXgv4A3\nA5tL2rCf+XdAV1EIYbhqwain/YBz8+ufAUfYfjdwF6mZv27FJqBpvVJH+TrAQ6WIJ+kWUkfMn+vI\naxagsu3V8r4QQhh0WvBk9iTyIm22byjbfwVpXaG6Va1RSNon/3+/RjKuwyxg9fz6EWAdSa/J25sA\nD9SZ343A+yWNzmtnrAb8tRkFDSGEZmvmNOP5mjcnr/6JpEskvSW/PQm4t5Ey1qpRfC2vSX2IpKWm\nCW3iqKeZwIaSum3/S9J3gZskLQButX2LpFWAY2x/uvzAvETr+sDykmYAV9j+nqQzgZtJNZUDbBeb\n5jSEEAZYk2sUq5L6IkpOAS6Q9BIwB2io77lWoPgS8D7gtaSRTuV6gKYECttzJV1KGvl0ge3TgdN7\npXlS0gsVjq24Brbtk4GTm1G+EEJopWaGCdt3Au8t274J2LS/+dYa9XQpcKmkKbYv6e+J+nA0cLmk\n6baf6/1mrtlct/RhxUhajfTAXQghDCrdHTDPeJHO7NsknUWKSqWFi75Wvjxqf+VhrpNrvD8fmN6P\n/J8gtc+FEMKg0gnTjBcZHns68Edgd2BP4D7y038hhBD6p6ur+E+7FKlRLGP71LLteyV9sFUFCiGE\n4aTTpxkvGSdp1dKGpNWBsa0rUgghDB9DpUZxHHCnpCdJHfQrAfu2tFQhhDBMdEIfRZ+BwvavJL2V\nNK1GD3C/7bktL9kw9ftHnuk7ETBZbyicZ6t+D59+cV6hdPMXFn+Mpaen+Gz2xdMW/wLqOH1d9jnn\njkLpbj58u8J5juwu/rk+c8FdhdL9+qAtC+fZbnPnLyyeeFzrytFf3UMhUADYfpn6ptEIIYRQQAeM\njh00cz2FEMKw1AmBos/ObEkd8DFCCKEztWg9iqYqMurpxpaXIoQQhqlmTgrYKkWanu6SdCxwK2n1\nOQBsRwAJIYR+6oC+7EKBYqP8//KJAXuImkYIIfTbyA6IFEWGx24Hqa/Cdr8HD0oaT1pAYwppudPT\ngfVsb1KW5iRgC1JAOth21bGFktYALgNm2D4s75tGmm7kiZzsZ/n/XwKuKqULIYR264A40XegyEuR\nngWMB9aW9HVguu3fN3jOacCZtp/L60ncBaxXdr5tgbVsT5S0Dmk684k18jsbuAHo7rX/B7ZP6fVZ\nFgJvb7DcIYTQdENlCo9TgE8B/8zbFwDfa+RkksaS1p24MO/6Cqk2UG4ycDmA7fuAFfJa2tXsSpqo\nMIQQOk4nTOFRJFC8Yvvu0obt+4EFDZ5vM+Bu2wtzXi9WSLMKUD6F+ey8r6IqeQBMlXSdpKskvbnB\n8oYQQkt1wqinIoFiQb7Q9gBIei+NL8o0AXi8zmMaOdfVwNdt7wCcR6x2F0IYpLpHdBX+aZcio56+\nCPwSUF6O9GHgk/04Z18d4rNYsgYxgVebvQqxPbNs8wrgO/UcH0IIA6UTnswuMurpHmADSSsB82wv\ntXZ1HWYBq/eRZjpwDHC6pHcCs2o0L1Uk6QfAxbZvIa1sd28DZQ0hhJbrauqq2a1RZNTTuqSRSusB\nPZLuAabZdgPnmwlsKKnb9kJJFwFrpNNoBnCG7fMl3SnpVmARcGAux97A87YXd36XrYW9CmndjE2A\nzwI/JgWaV3Ie/91AWUMIoeWGRI0C+Cnwv8BRpP6CrUjt/pvWezLbcyVdShr5dIHtqVXSHVFh90xS\nZ3h5ulprYb+r3vKFEMJAGyqBYo7ts8u275M0pR/nPBq4XNJ028/Vcdw44JpGTyppX/IDd43mEUII\nzdbRCxdJKo2Iul7SrsD1pGacycDNjZ4w9zdMbuC4Yiu/VD/+LNKDgyGEMGh0Fxl72ma1ahQLSCOU\nKoW7BcC3WlKiEEIYRjrhyeyqgcJ2B8S5EELobEOij0LSBFLn8/KU1S5sH9vCcoUQwrDQARWKQp3Z\n1wB/pP4nqkMDjr/m/kLptl/7DS0uSd9enFtsJpc3LD+mcJ4jOuH2KqunrBPXX7VQulY9fbvjBu3/\nfWm2FcePLpy2novx6JED25gyYig8RwE8Y3uflpckhBCGoaFSo7hM0p7AbZRNBmj7Hy0rVQghDBMj\nO6AWXSRQbEBaBOiZsn09wBtbUqIQQhhGhkqNYgtgBdvzWl2YEEIYbpo1PFbSJOAi4C951z3A/5BW\n+OwmTa768Uau5UV6be4AxtabcQghhL41eeGi39ielH8OAo4FTrW9NfB30iJ0dStSo1gdeETSfSzZ\nR7FNIycMIYTwqhaPsZoEfCa/vhI4DPhRvZkUCRTfrDfTEEIIxTT5yex1JV0BrEharmFcWVPTU0Cx\ncdq9FAkU3Y1kXI2k8aTFhKYALwOnA+vZ3iS/P4le7Wy5ClUtv4NJne1dwDm2/1fS8sD5pIcE5wB7\nAB8mTwpo+7BmfqYQQmhUEwPFA6TgcCHwFuAmlrzGN3yiIoHi62WvR5PWpfgdcGOD55wGnGn7OUkn\nA3flPMv9xvZufWUk6S3APsAmpBrc/ZJ+DhwCzLD9XUn7A4fbPlzSQuDtDZY7hBCarllhIi+7cEHe\nfFDSk8Cmkl5j+2VgNdLicXUrssLdduXbklYGjm/kZJLGkqYDOTzv+grwOlKNoBGPAFvZXpDzfwlY\njjQ7banT5kpiavEQwiDVrApFft5tVdsnSFoFeANwDqn15rz8/2sbybvufhTbTwHrNHIy0sJDd9te\nmPOqtsTpupKukPRbSTvUKMsi23MAJO0IPG37MdKKd7Nzsobb5UIIodW6uroK//ThCmBbSbcAvwQO\nAL4KfDLvWxH4SSNlLDIp4M9ID9iVrAEsbORkwAT6njNqqXY2SW+zPb9GGbcATgDeX+HtDnicJYQw\nXDVr1FO+8d65wltVb7aLKtJHcX3Z6x7gBWB6P87ZU+vNKu1sqwEPV0ovaUPSGtkfyLUJSO1wqwDP\n0492uRBCaLWOXo+ixHZDVZUqZpGey6iqSjvbE1XSdgNnA1NsP1L21nRgKvAN+tEuF0IIrdbpS6E+\nzJJ3/115ewywiu1Ghs3OBDaU1G17oaSLSE1ZkjQDOIPUzna+pA+RRlkdYHu+pL2B521fVpbfZODN\nwOmSSvu+DPwQOC+3y/0b2KuBsoYQQst1wgpxtVa4e3PvfZJ2IY14OruRk9meK+lS0sinC2xPrZK0\nUjvbTFJneHl+00kdNJXs0kgZQwhhIHV0jaKcpLVId+nzgffbfqgf5zwauFzSdNvP1XHcONIiSg2R\ntC/5gbtG8wghhGYb/GGij0AhaRxwFGk00ZdsN3yhLsk985MbOO6Ofp73LOCs/uQRQgjN1t0BNYqq\nzWOSdgfuBJ4F3tGMIBFCCGFJTZ49tiVq1Sh+DtwP7AS8p6yzuAvosf3uFpcthBCGvK4OaHyqFSiW\n6swOIYTQXB3Q8lRz1NOjA1mQkKz7phXaXYTCVhg3ulC65V8zqnCePTUfx1xSK0aLtOqP9uwTzy+U\n7oQPHNeS84/qgHWZ6zV2VFMntm6bER1eowghhNBiHV2jCCGE0HpDYgqPEEIIrdMJrYIRKEIIoY06\nfdRTCCGEFuuAlqcIFCGE0E5Ro6hA0njSDLFTgI1IkwwuBAzsZ3uRpJOALUiz1R5cbfoOSSOAU4AN\ngFHAGbbPkrQ8cD6wPDAH2AP4MHmuJ9uHtfAjhhBCYZ3QR9GOGW6nAWfmCQHPAHazvSWwLLCTpG2B\ntWxPBPYlTUZYzbuAV2xvRZo/6vgcPA4BZuT9lwKH57mevt2qDxVCCI0Y0dVV+KdtZRzIk0kaS5pi\n/MK8a2PbpaVRZwOvI13wLwewfR+wgqTlKuVn+7e2D86bKwPP2l6U8yitW3ElsH2zP0sIITRDVx0/\n7TLQNYrNgLttLwSw/QKApFWBHYGrSUuYzi47ZnbeV1VeAOl3wIF5V3keTwGrNqn8IYTQVFGjWNoE\n4PHyHZJWJt31f9b2MxWO6fPbyQsgbQGcKmnZeo8PIYR2iRpFZYtn88lNStcAX8ur1UFaV7u8BjEB\n+GeljCStLWkdWDw31UPAOr3yWC1vhxDC4NMBkWKgA8UsYPWy7ROBk2xfW7ZvOqkfA0nvBGblxY4q\nWQf4Vk67DCDg4ZxHaZnVKcC1FY8OIYQ264Smp4EeHjsT2FBSNzAG+ASwlqT98vvn2z5D0p2SbgUW\nkfsdJO1YXJbKAAAT2klEQVQNPG/7srL8LgfendOOAb5te7akHwLnSboF+Dew10B8uBBCqFcntI0P\naKCwPVfSpaQhsReQLu6V0h1RYfdMUmd4eboe4KAKx88Bdul/iUMIocU6IFK0o4/iaGB/SfUuvDCO\n1J/REEn7ApUCUAghtE1XHf+1y4A/mZ37GyY3cFzFp7PrOP4s4Kz+5BFCCM0Wcz2FEEKoqQPiRASK\nEEJop1Ys6dtsEShCCKGNOiBORKAIIYR26oA4EYFisNl9g86Zluo1o4sNmhs1sh2D6waZ/zxXKFmr\n7i5vf+jfhdJ9fuvWnL8VWvVd9fT09J3o1VL0/4RN/ByS/gfYmnRtPx74ILAxUJoe6bu2f1VvvhEo\nQgihjZo17FXSdsDbbU+U9DrgT8CNwJG2r+pP3hEoQgihjZpYM7qZ9GAypBkpxgHdzcg4AkUIIbRR\nswJFXr7hP3lzX9KyDQuBz0k6lLTkwudsP11v3tF4HEIIbdTsJ7MlfYgUKD4H/Aw4wva7gbtIK4zW\nLWoUIYTQRs3slJf0HuCrwE62nwduKHv7CuBHjeQbNYoQQmijZi1HIWl54LvAB2w/m/ddIuktOckk\n4N5GyjjgNQpJ40mRbQpp3Yl9Se1ofwYOtN0j6STSinU9wMHV5nmStCZwD3Bn3jXb9tT8hZ0PLA/M\nAfYAPgx8CbjK9mEt+nghhFCf5tUoPgq8HrhQUmnfOcAFkl4iXQv3aSTjdjQ9TQPOBOYBHwO2tv2K\npBuBiZJGAWvlIV7rAGcDE2vkZ9uTeu07BJhh+7uS9gcOt324pIXA25v8eUIIoWHNWpDI9hnAGRXe\n+kl/8x7QpidJY0m1iAttv2R7cg4Sy5Du/p8kzSx7OYDt+4AV8pKp9ZgMlBY4uhLYvikfIIQQmqwD\nVkId8D6KzYC78zAuACQdATxICh4Pkda6nl12zGyWXEO7t1UkXSzpVkl7lvaV5fEU0DmPO4cQhpcO\niBQDHSgmAI+X77D9beAtwE6StqxwTK2v5xng68DupEfVj5PUOyh0wlQqIYRhKhYuqqwHQNKKpMfN\nb7b9sqRrgC2BWSxZg5gA/LNSRnkRpHPy5tOS/gCsXZbH88BqeTuEEAadTpg9dqBrFLOA1fPrUcC5\neRQUpGYpA9NJ/RhIeicwKweEpUjaTtL38utxwEbA/TmPqTnZFODa5n+UEELovw5oeRrwQDET2FBS\nt+1/AccCN0m6DXgauML2rcCdkm4FfggcCCBpb0kf7pXfLcCK+fibgONtP5GP20TSLcB2pLHFIYQw\n6HR1dRX+aZcBbXqyPVfSpaQawwW2zwXOrZDuiAqHzyTVOsrTLQD2rnD8HGCX/pc4hBBaK5qeKjsa\n2F/SCnUeNw64ptGTStoXqBSAQgihbTqh6WnAO7Nzf8PkBo6r+HR2HcefBZzVnzxCCKHpOqBGEZMC\nhhBCG7Vz2GtREShCCKGNOqGPIgJFCCG00YgIFCGEEGob/JGiq6enp91lGFTmLqCtX8iiRcVOP2IQ\n3IYsWFisrCO721/WdnvqhXmF0q283JiWnP9vsyo+s7qUtScs25LzD1VjR/b/Kv/Ev+cXvuas9trR\nbfljihpFCCG0USfcRkWgCCGENorO7BBCCDW1c2qOoiJQhBBCGw3+MBGBIoQQ2qoDKhQRKEIIoZ2G\n9ZPZeZ2JK0jrQfwJeAwoLYG6p+0nJJ0EbEFazOjgWvM5SVqDtA72DNuH5X3LA+eT1tueA+xh+1lJ\n2wPfyue72vZxki4CdgC2sn1v8z9xCCE0YPDHiZbOHjsNONP2c3n7vbYn5Z8nJG0LrGV7IrAvaQ2J\nWs4Gbui17xBS4NgKuBQ4PO//ISlAbQnsKGld21OBu/r9qUIIoYk6YfbYlgQKSWNJa05cWCPZZOBy\nANv3AStIWq5G+l2B+yrkcVl+fSWwvaS3AM/afsz2IuBqGpitNoQQBsKIrq7CP+3SqqanzYC7bS8s\n23eapDWB3wJHkta0vrPs/dl53wuVMrT9oqTeu1fJxwE8Bazaa19p/1sb+hQhhNBiw7kzewLweNn2\nUaR1q58l1SKmVDimv19XteM74J8hhBAGr1aOelo8f4ntn5ZeS7oaWB+YRbr7L5kA/LPOc5TyeB5Y\nLW/3zre0P4QQBp1OqFG0qjN7FrA6pJFJkn4taXR+b1vgXmA6qR8DSe8EZuXV7+oxHZiaX08BrrX9\nCLCcpDUljQQ+kNOFEMKg01XHf+3SqhrFTGBDSd22n8+1iNslvUwaKnux7R5Jd0q6FVgEHAggaW/g\nedulTmokrQb8nFRTGCdpE+CzpNFN50m6Bfg3sFc+5ADgF/n1Bbbvb9HnDCGEfumEGkVLAoXtuZIu\nJdUYLrD9A+AHFdIdUeHwmaTO8PJ0TwCTqpxulwr53gxMrLPYIYQw4DohULTyOYqjgf0lrVDnceOA\na5pdmPzA3UbNzjeEEPqjE5qeYuGiXmLhouJi4aLiYuGioakZCxf9Z37xi/C40e2pf8RcTyGE0EbN\nvPLXMy1SPVrZ9BRCCKEvTZrDo4FpkQqLQBFCCG3UxCk86p0WqbBoeuqlGW2O/dNB7fkjO6isbfbG\nFVvT91DURm+MvofBqonXnLqmRapH1ChCCGFoatqdXASKEEIYGpoxLVJFEShCCGFoaMa0SBXFcxQh\nhDBESPo2sA15WiTbf25GvhEoQggh1BRNTyGEEGqKQBFCCKGmeI6iD5J2B34KrGr76Rrp1gTuIY1j\n7gHGAl+y/dsKadcCvg+sBHQDtwKH2Z7XK115nl3AAuBbtm/o4/zldrX9bIG0d9k+pFe6twHfA96Q\ndz0KfLbS95DzvNj2JmX7pgFP2z6ld/pqx1Qj6RrgHcB+tq/qKy9JHwK+COxQ5Xvt87w53cPARNu3\nl+2/A/iL7b17pX0QeIftu/O+vQFsn9sr37eS/v1XIf37/w74su2Xq5Sh/HdgDPCd8mn4c7o3k57E\nXYV0A3gzcKTtuX3kSc7zHuCAXssXI+lA4OPAPOA1wFdsX18hzxOBjfP5x+Xv4lnbu1ZIOwn4nO3d\nyvZNo9fviqTbcro7y/Ydn9OdmLfvAXax/WDe/ivpb+nqvH0ZcJrtX5fl8TVgGdtfydsjgD8Cnyj9\n25Wl3Rg4sWzXm4GrbR/Q+3MNZREo+rYH6Zd+N+C0PtLa9iQASdsAXwfeU55AUjdwCXCQ7d9I6iL9\ngR8FfLWPPN8KXCnpY71/oXunLaBm2rJyHlgKdpIOz2Xdo+A5msb2eyWdWyStpPWBY4HJvYNEAx4C\ndgduz3m/Dag2I/JfgW8D76tRthGk7/WLpYAv6YvAGaQLciXlvwMrAn+SdG0psOQ8LyVdIMvzPB34\nZF955vTnkv5df1a2b03gv4FNbb+Sb3B+DCwVKGx/MR+zN/B224dV+w7qcD7wEZa8oZkCbFe2fROp\n8/ZBSa8nBaltgKvz+5vz6jo1JScCd0k6NS9hsA/w+0p/UzlITQKQNI60DMJ3+/exOk80PdWQ/yg3\nI92Z7l7n4W8Anqiwfwfgb7Z/A2C7B/gy6cJWU75r+iZ5kacW2wG4t1eN6LtUv5gNCvli8VPgY7Vq\ngHW4HdghB06Aj1F9xcQ7gTmS3l0jvx2B+3vVCr8HbC5p5b4Kk2uH/2TJ8fI7AA9UyHOipJX6yjP7\nPbBWr33Lk2rGo/O5H7C9bcH8muECYHGNJN/dP5Ev7iWlQAGwFSnQTczp1wEetv2f8kxzgD0O+Iak\nZYDDSDd1fTkOONf2Q419nM4VgaK2qcBVwLXAWnmlvVokaYak20l/qCdUSLM2cFf5Dtsv13Hn+wdg\n3YJp+2NtUnPEYrYX9W6a6KX0+WdImgHs3cLyVTKKdLd+YZ7rphleIV1ES3exH+LVu9VKvgp8M9cU\nK1mbtMrjYvlm4V6WvlAvJd/lvw54rGCe/1Ugz1Gkz/XHXnn8mXQH/bCkcyV9JC8vPCBsPwU8JKm0\nkNlHSLWMcr8hBQiArUm1nW5JryEFkJuqZP9zYB3gTNLF/6laZcmram4NnFTv5xgKoumptj2A42wv\nlHQx8FFSAKimvIlgbeAiSe+wvaAsTQ+pXbpRywLVLtbKF+jy8ny6YNrrbH+zbHsRZb8fkn5JusNc\nHdjA9ksV8uzdnDGtyrlbRaTa3yGSfmb78SblexGwu6QnSbXEOdUS2n5A0h9JvyuVVPv376Lvf9cu\nYC6pLb38d2pEjTyrBazyf/8NSP0el/dOZPsT+c78PaSa7wGS3p0DUbNVyvN80nc5E/gg8K5e5XtW\n0px8E7c58LWcdgvShf2cSifKSzF/BTgP+FStQuXgeBrw6V7f+7ARgaIKSauTfvFOlNQDLENal7tW\noFjM9t/yGuFrkDpES/4GfK7XucaQpge+t0DWm9Dr7nHJ0zanjwL4C/D5ssQfApD0CANYE5X0WuAl\n2/PzeWv9od5r+1RJ/wJ+ni9otWpARV0PnEJq8rm4QPpjgV8Dp5JqJOX+RlrTfbFc+1gPqLa2e1//\nVn8DlrghyHmuC7ivPPNN0FLnznmMybWz+ySdnM/1RtLAhkbNBl7ba99KQKV+t0uBr0j6BanJ7rkK\naW4iBbIe2y9L+i0poGxG6mOp5iHS08t91eYPA2aUd6oPN9H0VN3uwKm2N7S9EeludcXcodyn3L+x\nKkv3U1wHvEnSzjndCOA7VL8DLc/zrcChDEz190ZgjVI58/nfSe0aTSucCnw4X7TWpvqFbzHbF5MG\nIBzVjALkIHUzaY7/Kwuk/xdpuudKtbnrgDdLKu/w/gJwS+/RaXWYDqxTIc/bbM8ucPyXgG/n9vpy\n+wJnlDWjLU+6ZtRspingfmD1PDCA3I+yHWn01xKcpqC4G/gKSzc7ldxE+q5vy9u/BT4A/LPSSLJ6\n5DJ+nCb9LnWqqFFUtzvwidJGrqr+hNSZ+c0qx5RX58eShvbNL09ge5Gk95D+AI8G5pMuHsf0kecY\nUvPCgbb/UeD8JV+2PbNK+qry590JOEXSUbmc/wF27u8fX52mkTqnDyYNS3y4dvLFPg/8QdJNtmdU\neL/3d9XX93QRsJLt5yUVOf8J9Ko5wBL//qdJOpZ04f0DZbW3euWm0Z2An+bho12kIdefKXj8w5Iu\nITXbfKXsrXNIwfn3kuaQ+oA+399//zyCak/S38CIXN7P5wBbyfmk34E9q7x/M2lo7jdz/k/lG7Vf\n9Kec2WHAeODqsn/3J2xXK8uQFFN4hDCESHoXqXn0XbYXtbs8YWiIpqcQhhDbt5JGad0paWq7yxOG\nhqhRhBBCqClqFCGEEGqKQBFCCKGmCBQhhBBqiuGxoaXylBMmjXHvIQ01vIE0s2nVDjJJ3yfN27Ms\n8A3bW/V6/23A9bbXlHQEcI/tXzWhvB8nDcV9hTQk+XfA4bZfyg8bbm/77/09T4XzrkqaS2t9oLR8\n5TRXmKm1Rh7nkb6Tc5tdvjC8RY0iDITZtifZ3o40YdsUYMNaB9g+pOiTsLa/3aQg8X7SuPmdbU8k\nPdk7gvRUdsvkB9ouJz0gt2EOigcA5xV9wDOEVooaRRhoK5Ie3PoXLJ4SZHvbf1dap+AbtrfKD8N9\ng7IpO/IzAqeRpoAoX6PgXNLTuNcDV5Cmz9icVBt5v+1Zkj4FHJKPvSWfc4laCnAkqfbwTwDbCyQd\nSq95lPJ00z/Nn2VZ4CLb35E0gTTZXBdp7YbTbZ8t6WDSVNcv5Z+9bD9TluVk0vQTp5Z22L5H0jq2\nn8sz136f9FBZD3Cj7a/nh9XOItVCHiVNsV0q40eAg3JZZpPW8Sg/ZwiFRY0iDISV8oyyNwP3AWeW\nLsZ1OoF0IZ8MPFklzbqk2UC3Ic3S+1FJy5GadXbIx1abUXU94I7yHbbnVZgAcWXg8lxD2pI0F9Fy\npGlY/pbnUNqWND8YpLmfPpCn6P4+MKGv8+Zzl+Y1+ghpwZwtSTOi7ihpW2B70pPTm5KmmdgQQNIa\npFlsS8FwBks+cR1CXSJQhIFQanrahjT/1TqSPtfXQRWsT6o5QJqLqpKnbf8lv36UdNf/X8CjZVNE\nXFLl2IUUm9n3KWBrSbeSai9j83muAbbPNZydSQsHQbrrv1bSV0nrI9zTK7++zrs5qe+hJ09yeAsp\nOKwP3Jr3v0R60A5S896qwK9zzexjeTuEhkSgCAMqz311Ea+u/FfeoT26j8O7SNOfQ/ULa+/ZZbtI\nv+fl01lUm9TwHtJd+2KSRua1CModQuro3jLXHl6ENGMwqUZzHuluf0befyiwC/AscLmk91Y477t6\n7UPS+rmZq3enf1feV/59wKvfyTxgZg7Ok2xPtD3gqxKGoSMCRWiHbUiL6gC8QJqKHaDWynCQlhqd\nmF9vX8f5HgTeKqm0hOmHq6T7FnC8pDfB4uVgT2Tpyf3eAPw1T5z4QVIT0xhJe5CWDb0e+CzwRkkr\n5XU5HrP9I9JsuJuVZ+a02uGLefQW+dzrkfpbVufVVfa68toI2+Z9fwW2yPuXJdU8IDVjbSZplZzX\nVKU1xENoSHRmh4GwUtlMraNJ6wCUpuA+EThL0v1UmGa6ly+TZrP9B9XX5FiK7WckfRP4naRHSR3h\nb6qQ7jpJXwAukVSqmVxHWgyp3NnAL/IssL8kdWD/HNiPNCvsPNLd/ndsz84X8TskPUcadrtvhWK+\nH/iepHuBZ0gLFH3UtiU9QKpx/JZUa7jc9u9yINuT1OT0KHma7dx5fzBwlaRSB3q1tbND6FPM9RSG\nhfx8xK/yimiHAnL11f9CCGWiRhGGi/HAjZKeJ93V79Pm8oTQMaJGEUIIoabozA4hhFBTBIoQQgg1\nRaAIIYRQUwSKEEIINUWgCCGEUNP/A2jMFqtuZG7CAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "floor_bin = np.concatenate((np.arange(0,11,2) , [15,20,30,50,100]), axis=0)\n", "\n", "tower_data['FLOOR_BIN'] = pd.cut(tower_data.NumFloors,floor_bin)\n", "\n", "pivot = tower_data.pivot_table(index='FLOOR_BIN',columns='BC',values='Identified',aggfunc='sum')\n", "\n", "hmap = sns.heatmap(pivot.fillna(0), cmap=\"Blues\");\n", "\n", "hmap.set(xlabel='Building Class Code', ylabel='Number of Floors', alpha=0.8, title = '# of Cooling Towers');\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More Data Preparation\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "246335\n" ] } ], "source": [ "## Cleaning the input data before running any models. We don't want to include people's homes who would not have\n", "## commercial cooling units. \n", "\n", "testset = CleanInputData(tower_data)\n", "print len(testset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Specify a model\n", "\n", "It it tempting to include every available variable, but this can lead to a model that fits the training data extremely well but does not generalize." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Identified ~ Log_BldgArea + NumFloors + OfficeArea + RetailArea + ComArea + C(BC) + C(BoroCode)\n", "246335\n", "246334\n" ] } ], "source": [ "# Lots of variables available, Adding too many leads to overfitting.\n", "\n", "## Numerical Variables\n", "I_Vars = [ 'Log_BldgArea', 'NumFloors', 'OfficeArea' , 'RetailArea', 'ComArea'] #,'LotArea','ComArea','ResArea','OfficeArea','RetailArea',\\\n", " #'GarageArea','StrgeArea','FactryArea','OtherArea', 'OtherArea','NumBldgs',\\\n", " #'UnitsRes','UnitsTotal','LotFront','BldgFront','LotDepth','BldgDepth','AssessLand',\\\n", " #'ExemptLand','YearBuilt','BuiltFAR','ResidFAR','FacilFAR','CommFAR']\n", "\n", "## Categorial Variables\n", "C_Vars = ['BC', 'BoroCode']#,'LotType','BsmtCode'\n", "\n", "C_Vars_format = [ 'C('+ c + ')' for c in C_Vars ]\n", "\n", "s = ' + '\n", "\n", "Deps = s.join(I_Vars)\n", "Cats = s.join(C_Vars_format)\n", "\n", "Spec = 'Identified ~ ' + Deps + ' + ' + Cats\n", "Spec_I = 'Identified ~ ' + Deps \n", "Spec_C = 'Identified ~ ' + Cats \n", "\n", "\n", "print Spec # this is the model string for logistic regression\n", "\n", "\n", "print len(testset)\n", "\n", "## Dropping records that aren't valid for the variables in question\n", "testset.dropna(subset=C_Vars, inplace=True)\n", "testset.dropna(subset=I_Vars, inplace=True)\n", "\n", "print len(testset) # appears to be only one \n", "\n", "\n", "for c in C_Vars:\n", " testset[[c]] = testset[c].astype(str)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Splitting into training and testing data (30% testing)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/srimmele/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py:3863: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n", " raise_on_error=True)\n" ] } ], "source": [ "y, X = dmatrices(Spec,testset,return_type=\"dataframe\")\n", "\n", "y = np.ravel(y)\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)\n", "\n", "\n", "\n", "## choice to normalize variables if wanted, makes a difference with the SVM classifier\n", "normed = X_train[I_Vars] \n", "normed = (normed - normed.mean()) / normed.std()\n", "X_train.update(normed)\n", "\n", "normed = X_test[I_Vars] \n", "normed = (normed - normed.mean()) / normed.std()\n", "X_test.update(normed)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing a simple logistic regression with varying regularization\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(inverse) regularization: 1e-05\n", " precision recall f1-score support\n", "\n", " 0.0 0.99 1.00 1.00 73355\n", " 1.0 0.82 0.14 0.24 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.959589649212\n", "(inverse) regularization: 0.0001\n", " precision recall f1-score support\n", "\n", " 0.0 0.99 1.00 1.00 73355\n", " 1.0 0.79 0.20 0.32 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.969697651768\n", "(inverse) regularization: 0.001\n", " precision recall f1-score support\n", "\n", " 0.0 0.99 1.00 1.00 73355\n", " 1.0 0.73 0.26 0.38 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.983183976862\n", "(inverse) regularization: 0.01\n", " precision recall f1-score support\n", "\n", " 0.0 0.99 1.00 1.00 73355\n", " 1.0 0.71 0.30 0.42 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.986435988068\n", "(inverse) regularization: 0.1\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 1.00 1.00 73355\n", " 1.0 0.68 0.35 0.46 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.988877774124\n", "(inverse) regularization: 1.0\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 1.00 1.00 73355\n", " 1.0 0.67 0.37 0.48 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.990118091982\n", "(inverse) regularization: 10.0\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 1.00 1.00 73355\n", " 1.0 0.67 0.37 0.47 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.99034816886\n", "(inverse) regularization: 100.0\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 1.00 1.00 73355\n", " 1.0 0.67 0.37 0.47 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.990365146861\n", "(inverse) regularization: 1000.0\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 1.00 1.00 73355\n", " 1.0 0.67 0.37 0.47 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.990366045696\n", "(inverse) regularization: 10000.0\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 1.00 1.00 73355\n", " 1.0 0.67 0.37 0.47 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", "\n", " score: 0.990366819693\n" ] } ], "source": [ "for c in np.arange(-5,5):\n", " \n", " model = LogisticRegression(C=10.0**c, penalty='l2')\n", " model.fit(X_train, y_train)\n", " \n", " predicted = model.predict(X_test)\n", " print '(inverse) regularization: ' + str(10.0**c)\n", " \n", " print metrics.classification_report(y_test,predicted )\n", " probs = model.predict_proba(X_test)\n", " \n", " print '\\n score: ' + str(metrics.roc_auc_score(y_test, probs[:, 1]))\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def ModelSummary(y_test, predicted):\n", "\n", " TP = float(metrics.confusion_matrix(y_test, predicted)[1][1] ) # true positive\n", " FP = float(metrics.confusion_matrix(y_test, predicted)[0][1] ) # false positive\n", " FN = float(metrics.confusion_matrix(y_test, predicted)[1][0] ) # false negative\n", " TN = float(metrics.confusion_matrix(y_test, predicted)[0][0] ) # true negative\n", "\n", " print 'Recall %.3f' % (TP/(TP+FN))\n", " print 'Precision %.3f' % (TP/(TP+FP))\n", " print '%.3f of buildings sampled have towers' % (sum(y)/len(y))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Results\n", "\n", "* We are identifying on the order of 30% of true towers in the testing set, not ideal\n", "* But, 65-70% of our predicted towers are true towers, not bad when compared to picking randomly (< 1%)\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Recall 0.366\n", "Precision 0.669\n", "0.007 of buildings sampled have towers\n" ] } ], "source": [ "ModelSummary(y_test, predicted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Another way of showing this with a ROC curve\n", "\n", "Aiming for higher recall (y-axis) will dramatically increase the the false-positive-rate (x-axis). The false positive problem is particularly important in the cooling towers case, because the classes are so imbalanced - less than 1% of buildings are identified with a cooling tower. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEVCAYAAAALsCk2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xd8FMX7wPHPpUEgAQKEKoK0oUnvTQRUFOGLilS7iAgo\nICVKE0E6SFNEUBQLYuEHIlZQQYooRREpg0gHgdBCQurl7vfHbsKReoFcNpc879eLF3e3e7vPzV32\n2ZnZnbE5nU6EEEIIVz5WByCEECL3keQghBAiFUkOQgghUpHkIIQQIhVJDkIIIVKR5CCEECIVP6sD\nyM+UUk7gX8BuvuQHbARe0FpfNdcpC0wFWpvrxQKLtNaLXLYTAIwHugM289/nwKta6/g09pul9T1J\nKbUBqA5cSbHoDq312Rvc5k6gMOAPVAa0uWi/1vqBGwz1piil+gHdtdad0lj2BPCI1rrjTWz/Ga31\nEqWUL7AX4/dyGVgPVATuB1YCrbXW529g+2WBRlrrtUqpFsA4rfV9Nxpvim1/hPFbLK21jnB5vR3w\nM/Co1vqjLGyvHfCO1rpqBuv4AQlAWa31mRsMPU+T5GC9dlrrkwBKqQLACmA0MEYpVRgjWawAamit\n7UqpSsD/KaVKa61fNbfxIcbBsLnW+rJSqjjwAfAe0DeNfWZ1fU8blZU//sxorRsBmGV1SGtdI7u2\nnRsppW4BhgNLtNaJQA3z9duAVkCg1tqe9PoN6oCRcNZqrX8FsiUxuDgHPAQsdXmtN3Aim/cj3CTJ\nIRfRWscppb4DupovPQGc01qPd1nnqFLqcWCbUmoucAvGH2pFrfVlc52LSqmngNtT7kMpVTuz9c2z\n+XeSDtiuz83azmgztjVAQa318+Z6JYFjQDmgPPAWUBaIA57UWu/ISnkopQoCc4E7AQfwDUYiSVRK\nHcU4kPQF7tJaH8/CdjsAs4BAjLPr54Bo4HutdSVznSVAda31Hebzb4BFwJ/m56oGODFqed8rpaoC\nPwGrgdpa6w4Z7N8XeAPoDPwHbHZZFgK8CTTG+PucoLX+wOVM91GMRFAGmKK1XgBsBUorpQ4A9TBq\nlxUxag2+wN9KqZ5m7GW11meUUqOBfhi10S+11iPN/U8A+pj7/ht4BCOpzAV8lVLBGCcRb2itayil\nAoH5QFuM7+gr4GXzOzoJvAo8A1QAPtBah6VTLN9iJIOlZhz+wF3ANpeyqQQsAW41y2Kq1vpjc9kr\n5ucJB9a6vKcgxnd9NxAAvKW1np7edyOukT6HXMQ8MPTB+GMHuAOXH3oSrfUejDOtpuY627TWF1Os\nc05r/WMau8nq+mmxaa0VRlNUF5fXuwA/ApEYB8kPtNbVgQHAl+YBLiuGYhxUagMNgTYYB5Akt2it\nVRYTQzDwGTDArFHMAZYDB4EAs/kEoD5QSCnlr5TywSjrjRi1rt/Nz9UFWG5+bwClzWXpJgZTZ6Ad\nxkH3TozvJMlcIMZc1gKYqpSq6bK8hta6AfAgMM2M7SngiPl5Es314oF7gHjz9b0uZdAOeAyoi3FC\n0F4p9YBSqhnwLNAII/kFAwO11r9jJMZPtdYpa5bDgVIY31EjjBrGwy7L2wDNzfJ7USlVJp0y2WKE\nlrz8LvM112bOd4AfzN9eF2ChUqqCUqouMNjcfxOggct7XjY/Sx3zX2+lVKqmPZGaJAfrbVBKHVBK\nHQaOYBxck85simOcCaXlrLm8uPnYXVldPy1rAcyDhk0pVc98/QGMA28NjAPGUnO9LRifo2U625th\nlkHSv93m652BxVpru9Y6BvgY4wzwujiyqCVwWGv9m/n8M4yaTgWM9u0WSqlSQBTGmXNDjAPoIXP9\nNhgJBa31QeBX4F5zWQCwyo0Y2mI0z0RrraMxkmySLsA8rbXD7HNZhVGuST40/98FFAJKuPOhU7jP\n3H+U1jrO/ExrzDKpqLWONJuntmL02WTE9TuKxki0rt/Rx+ZnOQGcxyjntDgw+kR6ms97AZ8mLTRr\nAO0xam1orY8Av2Ak17bABvMEJxFwbaLsAryptY7XWkeZyx7M5DMJpFkpN2intT5pNskcxDg7S+qg\nPo9x4EpLaYzaQwhGE467zmdx/bS41jpWAl2VUocw2qT7YhxMCwH7lVJJ6xUh/QNZen0OocAll+eX\nMJJOWnG467ptaq2dSqkIc7s/Y5yt+2AcGI9htNnHYyTtohid97+7fK4gjOYugLikCwkyURw47PLc\n9TMWxehTSvoNBAKfuCxP6rBNqiH4urG/lEq67t88qGP2cc1VSrU1F5XAqAFmJLPvKMLlcWIm8X4C\nzFdKvY1xwO+HkSSSYrabB/iU+yqUYj+u8RQDFiilZpjPC3CtZi4yIMkhl9Ban1dKzQdmAP8zX/4W\neAGY5LquUqoOxgHmd4w26zlKqXJa69Mu6xQDXgRe0Vq7jq64IbP1Sf1HHEL6vgDmYTRbbNRaRyql\nTgNXsqEj+CzXJ5QS3Hyt57ptms0yIebrP2P0pQQA3wPHMdrME4DFwBmMM9wGZk0Gl+2ke2VMGi5h\nJIEkoS6P/wPu11ofSLH97PxbPY9xsE3adgmM/pPBQCWgodb6qlJqOpnXTLLtO9Ja/26eJPUDftRa\nx7sk4XDATylVRGuddGVb0r6KcX3/mmt5ngZe01p/57qvbC7PPEmalXKX2UBLpVRSG/RHGH8Qs80O\nOpRStwLLgEla66vmQeRTYIVSqrS5TnGMs7CSKRIDbq7/H0bHJuZli9UziPlXjFrMExhNNGCccZ9U\nSnU3t1FSKfWJeWaaFWuBp5VSvuZ7HwW+zuI2UtoG3KqUamI+74txOfFJrfW/GGeibTE+1z6gJkYb\n9lbzMt/vMNrlUUoVVkq9p5TKak3sV6CTUirQ/FzdXZZ9idFHg9nfMU8pVT+T7SUAQWZHtzvWAN2U\nUkXN39VXQEeMz77fTAy3YTSXBbnso1ga21oL9DO/oyCMDuyb+Y4+Bcbi0qQExsUawDqgP4BSqhpG\nE+GPGOXZVilVwjzou/aLfAk8Y8ZnU0q9opRybfYS6ZDkkItorSOBacAspZTNbD+9C6OWcMC8GmUN\nxhUXs1ze+gzGWe8mc52N5vPB6ewqs/VfBzorpfZjdFz+kEHMToymh44YB5mk13oBg83t/4JxJuhO\nk4urBRiXMu4FdmAciD7P8B2ZMMu4B7DIjO0ZoI9LEv0V8NdaX9JaO8z9n9Bax5rL+wN3me/dCRzU\nWp/KYhirMWp9BzHK3bXvZAxQSimlMT63A9iTyfb+BK5i1GwyTVRa680Ytb2/zH1swyjXt4CO5meb\nDgwD7lFKPY9Rk7pHKfVris3NNfe7F9iO0UfiTr9Lej7BqLX+lMay/sDdZnwrMa6AO21eBfcuRjls\nx/i9JZmPUXvYi3G/S1Vcrg4T6bPJfA5CCCFSkpqDEEKIVCQ5CCGESEWSgxBCiFQkOQghhEjFa671\ntdsTnZcuRVsdRq4QElIIKQuDlMU1UhbXSFlcExoabLuR93lNzcHP70ZuBM2bpCyukbK4RsriGimL\nm+c1yUEIIUTOkeQghBAiFUkOQgghUpHkIIQQIhVJDkIIIVKR5CCEECIVj97nYM478CUwR2v9Ropl\nHYEpGHMHfKO1npTGJoQQQljAY8nBHKd+AcZ462mZjzHH7Slgo1JqpdZ6n6fiyQ2cTifnLsWQ6Li5\nkXBjHXDxYlZHv86bpCyukbK4RsrC4HQ6CQ0NvqH3erLmEIcxV21YygVKqcrARXNeWZRS32BMTO6V\nySHR4cB15PMLV2LZf/TaTIXhl2P4698LnDovP1YhhOc5nQ5O/P0jJ/b+yMVT+29oGx5LDuY8yHaX\naf5clcGY9i/JOaBKZtu80Qx4s06ei+T85ZhUr6/77Th7/j3Ppci4LG2vbYPyFC7on13hCSFEslPH\n/mHFktc4rHdToGChG95Obhlbya2xP8LDI7N1p06nk0uRcTjMZp5d/5xnxY//EBJcIHmdmDg7sfGJ\n6W0iWUhwAcqVuPZFxNkdtG94bVKuwAA/bq9cAh+fGxrm5DqhocHZXhbeSsriGimLa/JzWfTo8SKH\n9W66dOnGa69Nu+HtWJUcTmPUHpKUN1/LUR/+cJANf6Se4fFSZByhxQoCEFzIn0IF/ShaOID6VUte\nt57TCdUqFKNi6WAKFcwteVYIkd/s37+PmjVrATB16kyOHDlMx4733NQ2LTmiaa2PKqWKKKUqASeB\n+7l+UvBs43Q6Wbv1KOcjjCmAo2Pt7DwYft06NSuGUDy4AA4nlCkeyL3NK+LnK1f5CiFyt9OnTzF6\n9Ci+/XYt33yznkaNmlClSjWqVKl209v25NVKjYDZQCUgQSnVHVgDHNFarwKew5hMHOBTrfVBT8Tx\ny+7TrNp0JM1lVW8pSmjRQJ7pUssTuxZCCI+w2+28884ipk+fwtWrUTRv3pIiRYpm6z482SG9E2iX\nwfJfgBae2n+iw8H73x5gy54zADRWoTzUzujz9vWxUaJIQWy2m2//F0KInLRz53ZGjBjK3r17KF68\nOFOmLKRXr77ZfjzLsw3lz8zYkPzY18fGs/+rja+PNBUJIbzb6tX/x969e+jd+xHGj59EiRIlPLKf\nPJkczkdcu+y03/01aV67DD5SSxBCeCGn08mPP/5A+/Z34ePjQ1jYGDp37kLz5i09ut88eSq9+a//\nAKheoRgt65SVxCCE8EqHDx/i4Ye70afPw3z88QcABAUFeTwxQB6tOazZchSAO+qXszYQIYS4AXFx\nccyf/zrz579OXFwcHTrcRZs2d+RoDHkuOazfcSL5cYNqJTNYUwghcp+tWzczfPgL/PvvIcqUKcvk\nydO5//7/5fgFNHkuOSxf/w8A9aqUoGBAnvt4Qog87sSJ4xw5cpj+/Z8jLGwMwcFFLIkjTx09Y+Ls\nyY9f6F7XwkiEEMI9DoeDFSs+pnPnLhQtWowePXpTv35DlKphaVx5qkN64rIdAJQrWVjuYRBC5Hp/\n/72Hzp3vYujQQcyaZYyDZLPZLE8MkMdqDmcvRgPQu8PN3zouhBCeEhUVxcyZU1m8eCGJiYl06/Yg\ngwYNsTqs6+SZ5OB0mVCh9m3FLYxECCHSt2XLJgYPfpZTp05SsWIlpk+fTfv2d1kdVip5Jjn8tv8s\nAGWK3/j45UII4WnBwcFcvHiBF18cyZAhIwgMDLQ6pDTlmeTw/e/GJawNqsvlq0KI3CMhIYHFi9+i\nffuO1KxZi7p167Nr1z6PDXuRXfJMcjh2xpjYo2OjChZHIoQQht9//42RI4eyf/9efvvtVz74wBiI\nOrcnBshjVysBFAsKsDoEIUQ+d+nSRYYPH8L999/F/v17eeSRx5k79w2rw8qSPFFzsCc6AChZVIbh\nFkJY6/fff+OJJ3pz/vx5atasxYwZc2nWrLnVYWVZnkgOC1f9DUCBAF+LIxFC5HdVq1alYMFAxo2b\nyIABg/D397c6pBvi9cnB4XTy56HzAHRuUdHiaIQQ+U1sbCzz5s2mVq06dOnyP4oXL8G2bX8QEODd\nTdxenxwiouIBo9bQvFYZi6MRQuQnGzb8RFjYixw5cpiGDRtx//1dsdlsXp8YIA90SJ8KjwKgUulg\niyMRQuQXZ8+eZcCAp+jRoxvHjx9jwIDBfPHFmjzV5+n1NYfD/10BoEKpIIsjEULkB/v376NLl3u4\nciWChg0bMXPmPG6/Pe8N9On1yUEfvwyAurWYxZEIIfKD6tUV9eo14P77u/LYY0/i65s3L4Tx6uSQ\nYE9k/7FLAKhbQyyORgiRF0VFRTJ9+mQKFw7ipZfG4uvryxdffJmnmpDS4tV9DsfORiU/Dgr0zsvF\nhBC5k9PpZO3aNbRq1YS3317I11+vIT7euAAmrycG8PLkcPKckRy6tb7N4kiEEHnJ8ePHeOSRHjz1\n1CNcuHCeESNeYt26X/LEVUju8upmpbiERONB3k/iQogccvbsGdq2bUZ0dDRt2tzB9OmvU7Vq/psj\nxruTQ7yRHKqUK2pxJEIIb2e32/Hz86N06TI88UQ/ateuQ/fuPfNFE1JavDo56BPGlUp+vvnzyxNC\n3LyLFy8wadIrnDt3lo8++gybzcaECa9ZHZblvLrPIekGuNBiuXOyDCFE7uV0Olmx4mNatWrMxx9/\nwKlTp7h8+ZLVYeUaXp0cigYVACAkuIDFkQghvMnBg5oHHujMCy88R0xMDBMmTGb9+l8ICZEphpN4\ndbPSiXNRFC0ckG/bBIUQWRcdHU3Xrvdw8eJFOnXqzJQpM7jlFpkkLCWvTQ5OpxOAiKvxFkcihPAG\nly9folixEAoVKsSECZMpWrQY997b2eqwci2vTQ4xcXYAasiwGUKIDJw58x/jxr3M7t1/sHHjNgID\nA+nVq6/VYeV6Hk0OSqk5QHPACQzRWm93WTYIeARIBHZorYdmZdsnzBvgfH29uttECOEhiYmJvPfe\nEqZMmURUVCSNGzfl4sULlC9/i9WheQWPHVmVUncA1bTWLYCngfkuy4oAI4E2WuvWQC2lVJbm0dt+\n4BwAZYsXyraYhRB5w86dO+nUqT2jR4/C19eXWbPmsXbtD5IYssCTp90dgNUAWuv9QIiZFADizX9B\nSik/oBBwMSsbP33+KgC1KsnVBUKIaxwOB48//ji7d//Bww/3YuvWnTz22JP4+EgrQ1Z4slmpDLDT\n5Xm4+doVrXWsUupV4DAQA6zQWh/MbIOhodcm9Ikx745uVq88RQrnn/FOkriWRX4nZXFNfi0Lp9PJ\noUOHqFbNGOZi8eLFxMbG0r59e4sj81452SGdfL2pWYMYDVQHrgA/KaXqaa13Z7SB8PDI5McOh3G1\nUkxULHHRcZ6IN9cKDQ2+rizyMymLa/JrWRw9eoSXXx7B1q2b2bTpd269tSItW7YkPDwyX5ZHSjd6\nwuDJetZpjJpCknLAf+bjmsBhrfV5rXU8sAlolJWNnzgXRYC/Dz4+co+DEPlRfHw8c+fOom3bZvz4\n4zqaNMlSt6XIhCeTww9AdwClVEPgtNY6KY0fBWoqpZLGvWgM/OPuhh3mPQ5CiPzp11+30L59K6ZM\nmUhwcBEWLXqXzz9fza23VrQ6tDzDY81KWuutSqmdSqmtgAMYpJR6AojQWq9SSs0EflZK2YGtWutN\n7m7bbncAUL2C3OMgRH60ePFb/PPPQZ58sh+jR4+naFE5FmQ3j/Y5aK1fSvHSbpdlbwNv38h2r8Ya\nN8BdjUm44diEEN7D4XCwdetmWrduC8DkydMZPHgIjRo1sTiyvMsrr+2KNyf5KR0i9zgIkdcdOLCf\nbt3u48EH72fDhp8AKFeuvCQGD/PK4TOizaEzCheUeaOFyKuio6N5/fUZLFw4H7vdTufOXaleXVkd\nVr7hlckhae7omHi7xZEIITzhxx9/ICxsOMePH6NChVuZOnUmd999r9Vh5StemRx+Txo6o4Q0KwmR\nF/3xxy5Onz7F888P48UXR1G4cGGrQ8p3vDI5FAzwBaBqeZk7Woi8wG638/nnK+jevSf+/v48//ww\n7r//f9SoUdPq0PItr+yQTrqUtWRRmR5UCG+3a9cO7rnnToYMGci77xoXMBYoUEASg8W8suZwwpw7\nOriQdEgL4a2uXIlgypSJvPfeOzidTnr27EP37r2sDkuY3Ko5KKVKKKUam48tr20EBRpJIcDf1+JI\nhBA34vvvv6Vly8YsXbqEqlWrsWrV1yxYsIiSJUtaHZowZXqgV0r1BrYB75svLVBKPe3JoDJz/GwU\nRaTWIITXcjqdXLkSwcsvj+Pnn7fSqlUbq0MSKbhTC3gRqIcx5DbACKC/xyLKRNLc0dFxiVaFIITI\nori4OObNm825c8aVhp063cf27X8xbNhIAgLy35D73sCd5BChtY5OeqK1jsGYqMcSkeaQGSWKFLAq\nBCFEFmze/At33tmSyZNfZfbsacmvly5dJoN3Cau50yF9Xin1OBBojq7ak2u1iBwXEWXkpWJBkhyE\nyM3Cw8OZMGEMn3++ApvNxtNP9+fll8dZHZZwkzvJYQDwGhAMvANsxpgT2hLRsUbNoXK5IpmsKYSw\nyvr13zNw4DNcvnyZunXrM2vWXOrXb2h1WCIL3EkOnbTWg11fUEoNABZ5JqSMnQw35o5OMO91EELk\nPrfdVhlfX18mT57OU0/1x9dXriz0NukmB6VUA6AhMEIp5TpOhT8wHouSwwlzXKVyJeV2eiFyi6tX\nrzJr1jQ6d+5C48ZNqVKlGrt27SMwUG5U9VYZ1RxigdJAMcD1OjMHMNKTQWXkYmQsABXL5M+J1IXI\nbb7//ltefnkEJ0+e4N9/D/HBB58ASGLwcukmB631fmC/UuonrfU212VKqYc8Hlk6ihY2LntLGl9J\nCGGNU6dOMnr0KL79di1+fn4MGTKcYcMsO28U2cydPofTSqkZQNKtiwWA9sBKj0WVgRjz/obAAl45\n8ocQecLWrZvp0+dhoqOv0rx5S2bMmCNjIeUx7tzn8CFwEWgB7ARCgUc9GVRG9PFLABSQoTOEsEzd\nuvWpUqUq8+Yt5Msvv5XEkAe5kxzsWutpwFmt9ZtAV2CQZ8NKn7+fEbLUHITIORERlxk1ahgffbQM\ngKCgINav/4XevR/BZrNZHJ3wBHeSQ6BS6hbAoZSqDCQAlTwaVQYirsYnD7wnhPAsp9PJypWf0bJl\nY95//10+/XR58hA2khTyNneSwwygAzAT+BM4D2z1ZFAZKRjgR6JD7nEQwtP+/fcfunf/H88914+o\nqEjGjp3AypVfSVLIJzJtm9Far056rJQqDgRrrS95NKp0OBxOYuLscne0EB62b99e7r77DuLj4+nQ\n4S6mTZtNxYqVrA5L5KCMboLzAZ4B6gBbtdafaK3tSqk4pdSbWusc73eIibcDcOWqZeP+CZGnORwO\nfHx8qFmzFl26dOO++7pw//1dpbaQD2VUc1gAFAd+BQYopUoCe4HFwKociC2VpOlBbysrNQchstO5\nc+eYMGEMwcHBTJ/+OjabjbfeesfqsISFMkoO9bXWrQCUUu8Cx4CjQE+t9c4ciC2VCLPGIOMqCZE9\nHA4HH374Pq+9NoGIiMs0bNiI+Ph4mWNBZNghndx2o7W+CmigmVWJAcCeaFwlUaigXMYqxM36++89\ndO58FyNHDiUxMZGpU2fy9dfrJTEIIOOagzPF8zittaXTryVdpRQSLHM5CHEzzp49y733ticuLo5u\n3R5k4sSplClT1uqwRC6SUXIop5R6yuV5WdfnWuulngsrbbHxRm7y9ZHOMSFuRFRUFEFBQZQuXZpR\no8ZQu3Zt2re/y+qwRC6UUXL4letHY93m8twJ5HhyiDOTw9UYe07vWgivduLEccaMGcXFixdZs+Y7\nfHx8eP75oVaHJXKxjEZlfTInA3GHw7wzs0yJQpmsKYQASEhIYPHit5g5cwrR0dG0bNmaiIjLhIQU\ntzo0kct5Vc9u+OUYQIbrFsId27f/xsiRw9i3729KlCjB9Omv06NHb7lnQbjFo8lBKTUHaI7RDDVE\na73dZVkF4BMgANiltR6Q2faSLmWV37YQGYuJieHxx/tw/nw4jzzyOGPHTqB48RJWhyW8iDtjK90Q\npdQdQDWtdQvgaWB+ilVmA7O11k2BRKXUrZlt08/HCDckSK5WEiIlp9PJyZMnAGMWttdfX8CaNd/z\n+usLJDGILMs0OSil6imldiilDpjPxymlmrmx7Q7AakieVS5EKVXE3IYPRuf2GnP5IK318cw2GJdg\ndEgXKSzXYQvh6tChf+jQoQP33tuBK1ciAOjU6T6aN29hcWTCW7lTc3gDeAr4z3z+KfC6G+8rA4S7\nPA83XwNjwqBIYI5SarNSaqo7wR757woA/jLRjxAAxMbGMn36ZNq1a8HPP/9MvXr1iYmJtToskQe4\n0+eQoLX+SykFgNb6oFLqRq4ltaV4XB6YhzEkx9dKqc5a668z2kCRoAJAJJVvLZ7vJ/sJDQ22OoRc\nI7+Wxbp16xg4cCCHDh2ifPnyLFiwgG7dukmHsym//i6yiztHWLtS6jbMO6aVUvdy/YE+Pae5VlMA\nKMe12sd54JjW+l9zmz8CtYEMk8PR00Z1OTIimqh8/AcQGhpMeHik1WHkCvm1LJxOJ6NGhXH48GGe\nfXYQYWGjue22cvmyLNKSX38XabnRJOlOs9Jw4EuglVIqApgGPO/G+34AugMopRoCp7XWkQBaaztw\nWClVzVy3EcbYTRkqbM4AJ2dGIj9KTExk507jgj+bzcbcuQtZt24jkyZNJShIzpJF9nKn5hCvta6r\nlArFGF/pijsb1lpvVUrtVEptBRzAIKXUE0CE1noVMBR43+yc3gN8ldk2/7sQTfEicqWSyH/27NnN\nyJFD+euv3fz00xZq1KhJrVq1rQ5L5GHuJIevlFKXgY8w7ktwm9b6pRQv7XZZdghonZXtFQzw5eKV\nuKy8RQivFhUVyfTpk1myZBEOh4MHH3xY7m4WOSLTZiWtdXXgOYwO5K1KqbVKqZ4ejywNiQ6nTPQj\n8o21a9fQqlUT3n57IRUrVuKzz1azaNG7lC5d2urQRD7g1k1wWuudWuswjHsTjgEfejSqdNgTHfj5\nSn+DyB/WrfuOCxfOM2LES2zcuI127dpbHZLIRzJtVlJKlQUeAh7GuD9hBVDLw3GlkpjowOkEP1+P\n3dQthKUSEhJYu/ZLunV7CJvNxvjxk3j++WFUrVot8zcLkc3c6XPYgXHj23Ct9Q4Px5OupKlB4xMs\nnW9ICI/Ytu1XRo0ayoED+/Hz86dLl/9RokQJSpSQYS+ENdJNDkqpslrr/4A7Abv5WuWk5Vrrw54P\n75qkoTOk5iDykosXLzBp0it8/PEHADz22FO0adPW4qiEyLjmMBvoA3yPcQOca2O/E6ic1ps8JSk5\nyKWsIq9YteoLRo8eyYULF6hZszazZs2lSRN3hi0TwvMymuynj/nwPnPgvGRKqRwfzSsiyriENeXE\n1kJ4q/Pnw4mJiWHChMk888wA/P39rQ5JiGQZNSsVA0oAS5VSfbhWc/AHlgHVPR/eNWcuRAMQ4CeD\n7gnvFBMTw5Ili3jmmQEEBgby1FP96dy5K+XKlbc6NCFSyahZqQUwDKgP/OTyugOjqckSxYJkuG7h\nfX76aR3PHAI2AAAgAElEQVRhYcM5duwoTqeDIUOG4+vrK4lB5FoZNSt9C3yrlBqgtV6UgzGlyZ5o\nXK1UvEhBiyMRwn1nzvzHuHEv8+WX/4evry8DB77A008/a3VYQmQqo2alJ7XW7wHllVITUy7XWo/3\naGQp2M1LWX195CY44R1WrfqCESOGEhl5hcaNmzJz5lxq165jdVhCuCWjZiWH+f+NzN2Q7Y6eMcb7\nk+QgvEXp0mXw9fVh1qx5PPLI4/j4yGXYwntk1Ky0zPz/VaVUsNY6UilVGqMjektOBZikYIARaoDM\nAidyqcjIK8yYMZV+/Z6lYsVKtGzZml279spw2sIruTOH9AKgh1KqOLAVGAy85enAUjpiTvQTEiz3\nOYjcxel0smbNKlq2bMzbb7/JG2/MS14miUF4K3fquQ201u8CPYD3tdY9gaqeDSu1YkFGUigYIDUH\nkXscPXqEPn2606/f41y+fIlRo0bz2mvTrA5LiJvmzthKSY389wNjzcc5fvqe6DBuf5PhM0Ru8fXX\nX/Hcc08TGxtL27Z3MmPGbCpXzvHzJiE8wp3kcFAptQ8I11r/qZR6DLjo4bhScTiN5CAd0iK3aNCg\nIWXLliMsbAwPPNBdpq8VeYo7yaEfcDuwz3y+F1jjsYjS4Ug0koOPJAdhkQsXLjBx4ji6dXuIO+/s\nQLly5dm6dSe+vtLUKfIed5JDINAFmKiUcgLbgLkejSoNdodxZa0kB5HTHA4HK1Z8zKuvjuXSpUtE\nRkZy550dACQxiDzLnQb8JUAR4G3zcWnz/xx1+JRxtZI0K4mcdODAfrp1u4+hQwcRH5/ApElTWbz4\nPavDEsLj3Kk5lNZa93Z5vlYptcFD8aQrtFghzlyIJrCAOyELcfM2bdpIz54PYLfb6dy5K5MnT5ex\nkES+4U7NobBSqlDSE6VUYSDHBzhKsCfi62PDRzr9hIc5zYsfmjZtTps2d/DRR5/y3nsfSWIQ+Yo7\np+FvAweUUklThDYCxnkupLRFXI2Xy1iFR50+fYoxY8Jo1KgJgwcPoUCBAnz66SqrwxLCEpkebbXW\nS4FWGHM4vA+01Fp/4OG4UrlyNZ5EhyPzFYXIIrvdzuLFC2nVqglff72GX375Obn2IER+lWHNQSl1\nH1AD2Ky1/jJnQkpb0cIBnI5JsDIEkQf98cdORowYyp49uwkJCeG1196gd+9H5J4Fke+lW3NQSk0A\nxgDlgCVKqb45FVRaEh1OmT9aZKv9+/fRqVN79uzZTc+efdiyZSd9+z4mo6cKQcY1h3uANlpru1Kq\nKLAS+Dhnwkot0eGUzmhx05xOJ7GxsQQGBlKzZi2efXYQ99xzL61atbE6NCFylYxOkWK11nYArXUE\nYOndPg6HQ+5xEDfl8OF/6dnzAYYNG5z82sSJUyQxCJGGjJJDyh45S3voEh1OuTta3JC4uDhmz57O\nHXc0Z8OGn7h06SJxcXFWhyVErpZRs1ItpdQH6T3XWj/mubBSsydKchBZt3nzL4waNYxDh/6hVKnS\nTJ48na5dH5AOZyEykVFyCEvx/EdPBpKZqzEJhAQFWBmC8DLnzp2jd++HiI+P5+mn+/Pyy+MoUqSo\n1WEJ4RUynSY0N7kcFW91CCKXczgchIeHU7p0aUqVKsX06a9Tq1Zt6tdvaHVoQngVrxqoqEKpIKtD\nELnYvn17GTlyKFFRUaxf/wv+/v706fOo1WEJ4ZU8mhyUUnOA5hid2UO01tvTWGcq0EJr3S6z7cnw\nGSItV69eZdasaSxa9AaJiYl07foA0dFXKVq0mNWhCeG13DraKqVKKKUam4/dfc8dQDWtdQvgaWB+\nGuvUAtq6G6yfr3Qiiut99dVXtGnTlDffnEf58hVYvvxz3nlnmSQGIW5Spgd6pVRvjAl+3jdfWqCU\netqNbXcAVgNorfcDIUqpIinWmY1xF7ZbIq5Kn4O4JjY2lkGDBnH27BmGDh3BL79so2PHe6wOS4g8\nwZ1mpReBesDX5vMRwAbg3UzeVwbY6fI83HztCoBS6glgI3DU3WDLhQYRGhrs7up5Wn4tB7vdzr59\n+6hbty4QzEcffUTJkiWpVauW1aHlCvn1d5EWKYub405yiNBaRyulANBaxyilbuQUPrlNSClVHHgS\n6Ai4PUi+zekkPDzyBnadt4SGBufLctix43dGjhzGyZMn2LJlB6VKlaJt27aEh0fmy/JIKb/+LtIi\nZXHNjSZJd/oPziulHgcClVINlVLTMWoBmTmNUVNIUg74z3zcHggFNgGrgIZm53WG/PykQzo/unz5\nEiNHDqNz57vYu3cPnTt3wd/fqy60E8LruHO0HQA0AYKBd4BAoJ8b7/sB6A6glGoInNZaRwJorb/Q\nWtfSWjcHHgB2aa2HZbZBGVspf3E6naxc+RktWzZm2bJ3qV5dsWbNd8yd+yYhIcWtDk+IPC3T0y+t\n9WVgcGbrpfG+rUqpnUqprYADGGT2M0RorW9oeq0IuQku31m+/EOuXo1i7NgJDBgwmIAAuUteiJyQ\naXJQSp0gjUH3tNa3ZvZerfVLKV7ancY6R4F2mW0LoFRIoDurCS8WGxvLpk0buOuuTthsNmbPno/N\nZqNixUpWhyZEvuJOw21rl8cBGJeoWnKUlvkc8raNG38mLOxFjhw5zDffrKdRoyZUqnSb1WEJkS+5\n06x0LMVL/yilvgcy7UDObjIqa9507tw5xo9/mf/7v8/x8fHhmWcGUL26sjosIfI1d5qV2qd4qQJQ\nxTPhZExyQ97z4Yfv8+qr47hyJYL69Rswa9Y86tatb3VYQuR77jQrjXN57MS4iW2AZ8LJmNQc8p6D\nBzVOp5OpU2fxxBNP4+tr6YSDQgiTO8lhuNZ6l8cjcYP0OXi/qKgoli//gH79BuDj40NY2BgGDx5C\n6dJlMn+zECLHuJMcZmHctGY5mb3Lu33zzVpGjx7J6dOnKFYshB49ehMUFERQkAzFLkRu405yOK6U\n2oAx+F7yjQZa6/GeCio90qzknU6cOM6YMaP47rtv8Pf358UXR9GlSzerwxJCZMCd5HDE/Ge5yGi5\nCc7bLFu2lFdeGU10dDStWrVhxow5VKtW3eqwhBCZSDc5KKX6aq0/1lq/mpMBZaR4cAGrQxBZVLBg\nQQIDA5k+/XV69OgtTYNCeImMxlZyZ86GnCUHllzv0qWLjB8/moiIywD06NGbbdv+oGfPPpIYhPAi\nXjXMqRxbci+n08lnn31Cq1aNWbToDd55523AuIhAZmUTwvtk1OfQUil1PI3XbYDTnbGVspvkhtzp\n0KF/GDVqGJs3/0KhQoUYP34Szz470OqwhBA3IaPk8AfQK6cCcYc0S+Q+y5d/yKhRw4iPj+eee+5l\nypSZVKiQ4+cNQohsllFyiE1jXCVLSWrIferUuZ0yZcoyceJU7r23syRwIfKIjPocfs+xKNwkBx7r\nnT17huee68f+/fsAqFu3Ptu2/cF9990v348QeUi6yUFrHZaTgbhDjj3WSUxMZOnSJbRs2ZiVKz/j\n/fffSV7m5ydTdgqR13jVX7UkB2v89defjBw5lD/+2EWRIkWZMWMOjz76hNVhCSE8yLuSg/Q65LjV\nq1cyYMDTOBwOHnzwYV59dQqlS5e2OiwhhId5V3KQ3JAjnE5jVlibzUbbtu1o2LAxYWFjuOOOOy2O\nTAiRU7zqJjipOHjesWNH6dv3YdasWQVA8eIl+Oab9ZIYhMhnvCo5yHwOnpOQkMD8+a/Ttm0z1q//\nge+//9bqkIQQFvKyZiVJDp6wbduvjBo1lAMH9lOyZCizZ8/noYd6WB2WEMJC3pUcrA4gD/rllw10\n794Vm83G448/zZgx4ylWLMTqsIQQFvOu5CDZIVs4nU7sdjv+/v60atWGnj378PjjT9G4cVOrQxNC\n5BJe1ecgdYebp/UBunW7j5kzpwLg6+vLggWLJDEIIa7jVTWHmDi71SF4rejoaObOncWbb84jISGB\n0NBSOJ1O6ccRQqTJq5JDUKC/1SF4pZ9+WseoUcM5fvwot9xSgSlTZtKp031WhyWEyMW8KjlIq1LW\naX2AXr0ewtfXl0GDhjB8eBhBQUFWhyWEyOW8KjlIbnBPYmIikZFXKFYsBKVqMG7cRNq370jt2nWs\nDk0I4SW8qkNa2scz9+efu+jUqT3PPdcveRiM558fKolBCJElXpYcrI4g97pyJYKXXx7BPffcye7d\nf1C8eAni4uKsDksI4aWkWcnLOZ1O1qxZxdixL3H27BmqVq3GjBlzaN26rdWhCSG8mEeTg1JqDtAc\ncAJDtNbbXZbdCUwFEgEN9NNaOzLanjQrpXbhwgWGDh2M3Z5AWNgYBg8eSoECBawOSwjh5TzWrKSU\nugOoprVuATwNzE+xymKgu9a6FRAMdMp0o5IbAIiPj+fw4UMAlCxZkoULl7Bx4zaGDw+TxCCEyBae\n7HPoAKwG0FrvB0KUUkVcljfSWp80H4cDJTLboOQG2Lp1M/Xr16dnzweJiYkB4N57O1O5chWLIxNC\n5CWebFYqA+x0eR5uvnYFQGt9BUApVRa4GxiX2QZDQgoRGhqc/ZF6gfDwcEaOHMmyZcuw2WwMHDiQ\nkJBAgoPzZ3m4yq+/ibRIWVwjZXFzcrJDOtWJv1KqFPAVMFBrfSGzDURcjiE8PNITseVaDoeDTz75\niIkTx3Hp0iXq1KnLu+8u4bbbahIbC7Gx+as8UgoNDc53v4n0SFlcI2VxzY0mSU82K53GqCkkKQf8\nl/TEbGL6Fhirtf7BnQ3mx/5ou93OwoXziY9PYNKkqfzwwwaaNpVB8oQQnuXJmsMPwKvA20qphsBp\nrbVrKp8NzNFaf+fuBvPL1UpXr15l9+4/aNmyNQEBASxatJQSJUpQrlx5q0MTQuQTHksOWuutSqmd\nSqmtgAMYpJR6AogAvgceA6oppfqZb1mutV7sqXi8xbp13/HSSyM4fz6cTZt+59ZbK3L77XWtDksI\nkc94tM9Ba/1Sipd2uzzO8jWXebnicPr0KcaMCePrr9fg5+fHwIEvULJkqNVhCSHyKe+6QzoPZgen\n08nixQuZNm0yV69G0axZC2bOnEuNGjWtDk0IkY95V3KwOgAPsNlsbNv2KwEB/kye/Ca9evXFx8er\nhrwSQuRBXpUc8kp2iIi4zDffrKV370cAmDZtNr6+vpQsWdLiyIQQwuBVycHHy5uVnE4nq1evZNy4\nlzl37iwVKtxK69ZtKV26tNWhCSHEdbwqOXizw4f/JSzsRTZu/JmCBQsyevR4mjZtbnVYQgiRJq9K\nDomJTqtDuCELFsxlxozJxMXF0b59R6ZNm02lSrdZHZYQQqTLq5KDv593dtTGxcVSrFgIkydPp0uX\nbnnyqishRN7iVUdbbzmmhoeHM3nyqyQkJADw/PPD2LJlO127PiCJQQjhFbwqOeR2DoeDDz98n1at\nGjFv3mw+/3wFAAUKFKBIkaIWRyeEEO7zqmal3HzWvXfv34wcOZQdO34nKCiYqVNn0rNnH6vDEkKI\nG+JdycHqANLx5pvzee21V0hMTKRr1wd47bVplClT1uqwhBDihnlVcsit2aFKlaqUL1+B6dNn0aHD\n3VaHI4QQN82r+hxyS244efIEzz3Xj3PnzgHQqdN9bNmyXRKDECLP8KrkYPXlSgkJCSxcuIDWrZuy\ncuVnfPTR+8nLChTI8iCzQgiRa3lVs5KVqWHHjt8ZMWIo+/b9TfHixZk2bZZ0OAsh8iyvqjlYVXF4\n993FdO58F/v2/U3fvo+xdetOevXqm6uvnhJCiJvhVTUHq7Rrdyd169Zn0qRpNG/ewupwhBDC47ys\n5pAzZ+qHDv1D9+7/Y/v23wCoUqUaP/ywQRKDECLf8Kqag6dTQ2xsLPPmzWbBgjnEx8dTs2YtmjRp\nZuxbmpBELvbff6e5++5eVK9eAzAunqhcuSojRryEr68vsbGxLFjwOvv2/Y2fnx8hISUYPjyM0qXL\nAHDixHHmz5/N5cuXSEx0cPvtdRk0aCgBAQGWfabExETCwoYxbNgoype/BYA+fR6iWbOWDBkyHDA+\n99ixYbz77ofJ7/vmm684c+YETz01kMGD+xMbG0vBggWTlw8f/hK33VYZgB9++I5PP/0YPz8/7HY7\njz76BO3adQDAbrezZMlb/P77rxQsGIi/vz9DhoygSpWqWf4s8+fPZu/ev7HZbAwZMpyaNWtft3zT\npg0sW7YUf39/Ona8m4ce6onD4WDmzKkcOfIvfn5+jBw5mtDQUMLCXmTq1NkEBQVlOY6s8Krk4Mns\nsHHjz4waNYwjRw5TpkxZJk+ewf33d/XcDoXIZrfddhtvvLE4+fnkyRNYt+47OnXqzIIFr1OyZCjv\nvbccgL/++pPhw1/g/feXY7PZGDt2FEOHjqRBg0Y4nU7mzp3Je+8t4dlnB1n1cVi9+gvq1WuQnBgO\nHNiP0+lkw4Yfef75YW7PmDh69HgqVzYO6Lt27WDu3JnMm/cWf//9F599tpw5c96gSJGiXL0axYgR\nQwgKCqZx46YsX/4BUVGRLF36MTabjT17djN69Ag+/vgL/PzcP3T+8cdOTp48wdtvv8fRo0eYOnUi\nb7/9XvJyh8PBnDkzeffdjyhatCgjRrxAmzbt2L9/H1evRrFo0VJOnTrJvHmzmDFjLt2792Lx4jd5\n8cWwLJRm1nlVcvBUbvjii08ZOPAZfHx8ePbZgYSFjSEoKNhDexN53Wc/HWL7gXPZus0mNUrRo33W\nzlhr1arDyZMniI6+yrZtW/n009XJy+rWrU+tWrXZtGkDgYGFuPXWSjRo0AgwaskDB76AzXb9wddu\nt/Paa69w9ux/BAQUYOzYV9m+/TcOH/6XwYOHEh0dzWOP9eSLL76iV68HaN68FSEhIXz77desWPF/\nAHz77VoOHTpI796PMnXqJOz2BHx8fAgLG0eZMmWu298XX3x63UF03brv6NKlG5s2beDPP3fRsGHj\nLJUHQO3adThx4jgAn3/+CU8/3T953LPChYN49tlBLF/+AY0bN2X16pUsW7YiudXg9tvr8c47H16X\nGMLDz/Hqq2NTlHttBg4ckvx8587ttGnTDoBKlW4jMvIKV69GUbiwceYfEXGZoKAgQkJCAGjUqAk7\ndvzOpUsXk2sY5cvfwpkz/5GYmEjbtu14660FREdHU6hQoSyXgbu8Kjlk5+VKDocDAB8fHzp16sx9\n93Vh+PBR3H57vWzbhxBWsdvtbNq0kW7dHuLUqZNUrFgp1dlutWqK48ePERgYSLVq1a9bVqBAQVL6\n9tu1lChRggkTJrN+/fds3vxLuvf32O12mjdvSfPmLdm1aweHD/9L5cpV2LRpI717P8KSJW/Rq1df\nmjRpxq+/bmbZsncIC7t2kD1z5gwBAQHJB26Hw8HPP69n4cJ3KVCgAOvXf39DyeHnn39EKaPp7dix\nY1SrViNFmVTn+PFjREVFERBQgODg608SUz4PDS11XW0tLRcuXEjeJ0CxYiFcuHAhOTkUKxZCdHQ0\nJ04cp2zZcuzatZMGDRpSpUo1PvtsOT169ObUqROcPn2KiIjLFC9egho1arJ37180aeK5CcO8Kjlk\nV2rYs+cvRo0aSu/ej/LYY08SFBTE++9/nE1bF/ldj/ZVs3yWnx2OHDnC4MH9Afj330P07fsYbdu2\n459/DpKY6Ei1vtPpxMfHF7AlnyxlROsDNG7cBICOHe8BjPb99NSqZZz1tm17J1u2bKJ8+Vs4cuRf\n6tSpy7Rpkzh+/BjLlr2Lw+GgWLGQ6957/nw4oaGlkp//+ecuSpcuQ5kyZWjf/i6WLVvqdrPKlCkT\nKViwIOfPn6dcuXKMHj0BMM41HY7E69Z1Oklurkq5LLs4nddPWmaz2RgzZgJTp04kKCiIsmXL4XRC\nixat2LNnN4MHP0OVKtWoWPG25PeWKlWKs2fPeiS+JPkqOURFRTJ9+hSWLHkLh8NBnTpSSxB5h2uf\nw9ixo6hQoSIA5cuX58SJYyQkJODv75+8/qFDB2nbth3+/gGsXPnZdduKj4/n5MnjyW31AL6+Pjgc\nqQ9sSex2+3XL/PyMfd1xx52MG/cSlStXoVmzFthsNvz8/Jk0aTolS5ZM9/O4bnvduu84c+Y/nnjC\nuPE0NjaW7du3Ua9eQ65ejbrufZcvX6JUqWuJJanPYcuWTXz11arkfd56ayUOHNhPqVLX5nD/5x9N\npUqVCQoKwm63c/HiBYoXL5G8XOsDVK+ukmNzp1mpZMmSXLhwIfn5+fPnU33uBg0asXDhOwAsWvQG\nZcsaA3f27z8weZ0ePf5HSEjxdMsru3nVpaw3mh2cTidff/0VrVs35e233+TWWyvy6aermDlzTvbG\nJ0QuMXDgEBYtWkBsbCyFChWmZcs2LF16rfljz57dHDyoadGiNU2aNOPs2f/YvPkXwGjCeeutBfz4\n47rrtlmjRi127doOwJYtm/jgg6UUKlSYCxfOA0Ynd1pKlgzFZrOxfv33yVcC1apVh02bNgBGm/wP\nP3yX6j1JY5clJCSwZcsm3n9/efK/YcNGsn799xQqVIhixULYvdvYd0xMDD//vJ6WLVumiqNVqzbE\nx8ezdetmAB5+uDdLly7m0qVLAERHX2Xx4oXJIx889FAP5s9/PTnp/fXXn0yZMoH4+PjkbSY1K7n+\nc00MAE2bNmfDhh8BI7mULFmSQoUKX7fO8OEvcOnSRWJiYtiy5RcaN27GP/8cZMqUVwHYtm0r1avX\nSK7VhIeHX5fUPMHLag43lh02bdrIk0/2xd/fnxdfHMWQIcMJDAzM5uiEyD3KlStPu3YdWLbsXZ59\ndhBDhgxn0aIFPP54bwIC/ClWLIRJk6bh6+sLwOzZbzBjxmTee28J/v7+NGnSjCeffOa6bXbseA87\ndvzO4MH98fX1Y+zYCRQqVIgPPljK4MH9admydapO7CStW7fl889XMG7cRACefro/U6a8yvr132Oz\n2Rg9+pXr1i9TpgxxcXFcuXKF3bt3UbduPYoWLZa8/M47O7J48ULi4uIYN24ic+bMYMmSWBIT7fTs\n2RelFOHhkanieP75Fxk9egSNGjWhTp3b6d9/IMOHP4+/vz92u52HH+5FvXoNAOjT5zE++GApTz3V\nlyJFihIUFMS0aa9neRy122+vh1I1GTDgKWw2W3Jz2DfffEXhwkHccceddO3ajWHDBmOzwaOPPkmx\nYsUoUqQITqeTZ555jICAAowfPwkwTnb379/HyJGjsxRHVtlStn/lVl2Gf+mcNbAlxYuk7ihLS0JC\nAvHx8RQuXBin08mkSa/Qu/cjqTrevFFoaHCaP/z8SMrimrxWFp9/voK4uFgeeeSJLL83r5WFq02b\nNvDbb9sYMeIlt9YPDQ2+obNq72pWctNvv22jY8c2yW2BNpuN8eMn5onEIER+8cAD3fnzz12cOnXS\n6lByjejoq3z22SfX9UV4ilclh8zuUr506SIvvvg8Xbrczf79+3A6U18ZIITwDn5+fsyaNT/5JjgB\nhQoVZsGCtylSpIjH9+VVfQ7pcTqdfPbZJ0yYMIYLFy5Qs2ZtZs6cS9OmzawOTQghvJJXJYf0Kg7/\n/nuIIUMGUrBgQV555TX693/uukv2hBBCZI13JQeXxzExMVy6dJFy5cpTtWo15s59k1at2lChwq2W\nxSeEEHmFV/U5JFUdfvppPW3bNuOZZ55IvrOzV6++khiEECKbeLTmoJSaAzQHnMAQrfV2l2UdgSlA\nIvCN1npSZtsLP3eGkVPHs3r1/+Hr68t993UhISFB5m8WQohs5rHkoJS6A6imtW6hlKoJLAVcZ8uZ\nD9wDnAI2KqVWaq33pbe9o39+w91LPiEqMpJGjZowc+Zc6tS53VPhCyFEvubJZqUOwGoArfV+IEQp\nVQRAKVUZuKi1PqG1dgDfmOun6+CvK/D18WHmzLl8/fU6SQxCCOFBnmxWKgPsdHkebr52xfw/3GXZ\nOaBKRhuLi46QqdhchIbKfBNJpCyukbK4Rsri5uRkh3RGB3c58AshRC7iyeRwGqOGkKQc8F86y8qb\nrwkhhMgFPJkcfgC6AyilGgKntdaRAFrro0ARpVQlpZQfcL+5vhBCiFzAo6OyKqWmAW0BBzAIaABE\naK1XKaXaAtPNVVdqrWd5LBAhhBBZ4jVDdgshhMg53nWHtBBCiBwhyUEIIUQquXLgvewedsObZVIW\ndwJTMcpCA/3MmwrznIzKwWWdqUALrXW7HA4vR2Xym6gAfAIEALu01gOsiTJnZFIWg4BHMP4+dmit\nh1oTZc5RStUBvgTmaK3fSLEsS8fOXFdzcB12A3gaY5gNV/OBh4BWwN1KqVo5HGKOcaMsFgPdtdat\ngGCgUw6HmCPcKAfM30HbnI4tp7lRFrOB2VrrpkCiUirPjkaZUVmYozGMBNporVsDtZRSza2JNGco\npQoDC4Af01klS8fOXJccyOZhN7xcumVhaqS1TppDMRwokcPx5ZTMygGMg+KYnA7MAhn9ffgAbYA1\n5vJBWuvjVgWaAzL6XcSb/4LMy+ULARctiTLnxAH3kcY9Yzdy7MyNySHl0BpJw26ktewcUDaH4rJC\nRmWB1voKgFKqLHA3xheeF2VYDkqpJ4CNwNEcjcoaGZVFKBAJzFFKbTab2fKydMtCax0LvAocBo4B\nv2mtD+Z4hDlIa23XWsekszjLx87cmBxSkmE3rkn1eZVSpYCvgIFa6ws5H5IlkstBKVUceBKj5pAf\n2VI8Lg/MA+4AGiilOlsSlTVcfxdFgNFAdeA2oJlSqp5VgeVCmR47c2NykGE3rsmoLJL+AL4Fxmqt\n8/Id5hmVQ3uMM+ZNwCqgodlJmVdlVBbngWNa63+11okYbc+1czi+nJRRWdQEDmutz2ut4zF+H41y\nOL7cJMvHztyYHGTYjWvSLQvTbIyrEr6zIrgclNFv4gutdS2tdXPgAYwrdIZZF6rHZVQWduCwUqqa\nuW4jjKvY8qqM/j6OAjWVUoHm88bAPzkeYS5xI8fOXHmHtAy7cU16ZQF8D1wCfnVZfbnWenGOB5kD\nMizl+jkAAASqSURBVPpNuKxTCXg/H1zKmtHfR1XgfYwTvz3Ac3n18mbItCyexWhytANbtdajrIvU\n85RSjTBOGCsBCRgTqa0BjtzIsTNXJgchhBDWyo3NSkIIISwmyUEIIUQqkhyEEEKkIslBCCFEKpIc\nhBBCpJIrR2UV+Y95Garm+ktzAYZqrf9M5z0TAD+t9dib2G87jFEs/zBfKgjswhjhMyGL2+qEMd7V\nZKVUS+CM1vqwUmou8KHWeudNxDkB47LMI+ZLfsBJ4FmtdUQG7ysH1NBa/3Sj+xb5kyQHkZuEW3SP\nwp6k/SqlbMAK4FngjYzelJJ5M2LSDYlPAp9i3KWbXUNFf+iaCJVS0zGGiAjL4D13YtwtLMlBZIkk\nB5HrKaVqAG9j3MxUBGO4kO9dlvsB7wAKY1z/P7TWg5RSAcCbQFWMIc0/0VpnOAaT1tqplNoM1DC3\n3RkYD0Sb//prrU+ZN1+1xxgJ8xTwONAb6AisBB4Gmiqlhpnvfw1j7o0hWuut5rbXY9y0tBdYiDFy\naBAwWmu93o2i2Qr0N7fVGuMGpzhzOwMxbpKcDNiUUhcxkl2WykPkX9LnILxBGWCc1roD8ALGAc/V\n7UAzrXULrXVL4P/bu5cQHeMojuPfcYmUJhozm7GR+q2IEkmzMOxcihmXlCQbWbivpMilyKUsJCUL\nGozLWIwI0WhcJqKJxFFYsBg1McrOJYvzf3l6nmneic1Mcz6797m//8V73uf8n+ecTkmVwCa8pMJc\nYBawUtLUvk4kaTSwCGiXNAYPOg3pGDeAfZLG4W/jzjazOqAFqCkdI7213Qlsy6Vzmvhb7qEa/0d/\nCziB92CoBxYDp1LA6+s6RwCr+JuGq8Lfhq7HC+/tMLP3+NvSZ83s6L+MRxi64s4hDCQTJLXlli3D\ni6kdkrQf73BWldvmFdAt6TpeofaimX1NnfJqU1MY8PmEycDz3P5TcudtNbNmSdOAT5meGW3AejP7\nIukmcE/SVaDZzD5KKvf9LgAPgK14kLhkZj/TdY6VtCtt9x2oplgYbXW6Q6jAy0QcAw6kdV3A4RTc\nKvG7hrz+jkcIERzCgNLrnIOkc3gK5HRqg3gtuz7V7q9LxdcWAk8kzcFTLHvM7HKZ877o7bx4iiqr\norTMzBpTumsBHiQayn05M+uS9E7STGAFHiRI17nUzLrLHOLPnIOkVrwC64/SOnxy+q6khcD2Xvbv\n73iEEGmlMCjU4Hl58B/VUdmVkmZIWmNmz8xsD/AUr+N/H1iethkm6Wjq/9Bfb4DqTKvN+UCHpEmS\ntpjZ65SzbwHyvQJ+ASN7OWYT3tJyfObppex1VqWnm8rZAOyWVJs+1wAvJQ3H77ZKY5S9jv8djzCE\nRHAIg8ER4ExK5dwHPkvKTqS+BRolPZR0F+jB0zfHgW+SHgEdQI+Z9btVZOqqtQ5oTmmnecBO/BHS\n6ZIeS7qDN5O5ktv9NnBS0tLc8hZ8ruB8ZtlGYImkdrybX9kni8zsAz4BXarCezDt14rPM0yUtBnv\nY7BW0l7+czzC0BJVWUMIIRTEnUMIIYSCCA4hhBAKIjiEEEIoiOAQQgihIIJDCCGEgggOIYQQCiI4\nhBBCKPgNT8aYpE4qggkAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "n_classes = 2\n", "\n", "FPR = dict()\n", "TPR = dict()\n", "\n", "roc_auc = dict()\n", "for i in range(n_classes):\n", " FPR[i], TPR[i], _ = metrics.roc_curve(y_test, probs[:, i])\n", " roc_auc[i] = metrics.auc(FPR[i], TPR[i])\n", "\n", "\n", " \n", "plt.figure()\n", "plt.plot(FPR[1], TPR[1], label='ROC curve (AUROC = %0.2f)' % roc_auc[1])\n", "plt.plot([0, 1], [0, 1], 'k--')\n", "plt.xlim([0.0, 1.0])\n", "plt.ylim([0.0, 1.05])\n", "\n", "plt.xlabel('False Positive Rate')\n", "plt.ylabel('True Positive Rate')\n", "plt.title('ROC Curve For Tower Identification Model ')\n", "plt.legend(loc=\"lower right\")\n", "\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Another classifier: support vector machine \n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0.0 0.99 1.00 1.00 73355\n", " 1.0 0.76 0.27 0.40 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 0.99 1.00 73355\n", " 1.0 0.46 0.58 0.51 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 0.99 0.99 73355\n", " 1.0 0.33 0.72 0.46 546\n", "\n", "avg / total 0.99 0.99 0.99 73901\n", "\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 0.99 0.99 73355\n", " 1.0 0.29 0.79 0.42 546\n", "\n", "avg / total 0.99 0.98 0.99 73901\n", "\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 0.98 0.99 73355\n", " 1.0 0.25 0.85 0.38 546\n", "\n", "avg / total 0.99 0.98 0.99 73901\n", "\n", " precision recall f1-score support\n", "\n", " 0.0 1.00 0.98 0.99 73355\n", " 1.0 0.23 0.89 0.36 546\n", "\n", "avg / total 0.99 0.98 0.98 73901\n", "\n" ] } ], "source": [ "from sklearn import svm\n", "\n", "# Option to give more class-weight to identified cooling towers because they are very sparse in the dataset of total buildings.\n", "ClassWeights = [1, 5, 10, 15, 20, 30] \n", "\n", "\n", "for weight in ClassWeights:\n", " svm_model = svm.LinearSVC(class_weight={1: weight, 0 : 1})\n", " svm_model.fit(X_train, y_train)\n", " predicted = svm_model.predict(X_test)\n", " print metrics.classification_report(y_test,predicted )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Support Vector Classifier Results\n", "\n", "* Much higher recall, finding nearly 90% of cooling towers\n", "* We have to pay a price in precision for finding more towers, precision is only 23%\n", "* Still much better than picking randomly\n", "* Can tune class weights to dial-in the precision vs. recall we want\n", "\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Recall 0.886\n", "Precision 0.228\n", "0.007 of buildings sampled have towers\n" ] } ], "source": [ "ModelSummary(y_test, predicted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Postscript\n", "\n", "One obvious conclusion is the trade-off between precision and recall, which amounts to asking oneself *\"how many false positives am I willing to tolerate in order to find every true positive (cooling tower).\"* The answer to this kind of question is more often than not dependant on a (non-technical) operational constraint, which could stem from many factors such as the resources available or relative priorities of competing goals. \n", "\n", "### Try it for yourself\n", "\n", "These models are very basic and a reflection of time and resource constraints during the Legionnaires' disease activation. Playing with different types of models, cross-validation, hyperparameter tuning, etc. would surely produce an improved model. If you find one, let us know!\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 1 }