{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import seaborn as sb\n", "import matplotlib.pyplot as plt\n", "import sklearn\n", "\n", "from pandas import Series, DataFrame\n", "from pylab import rcParams\n", "from sklearn import preprocessing\n", "from sklearn.linear_model import LogisticRegression\n", "# train_test_split lives in sklearn.model_selection; sklearn.cross_validation was deprecated in 0.18\n", "from sklearn.model_selection import train_test_split\n", "from sklearn import metrics\n", "from sklearn.metrics import classification_report" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "rcParams['figure.figsize'] = 10, 8\n", "sb.set_style('whitegrid')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logistic regression on the Titanic dataset\n", "The first thing we are going to do is read in the dataset using pandas' read_csv() function. We will put this data into a pandas DataFrame called \"titanic\" and name each of the columns." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>PassengerId</th><th>Survived</th><th>Pclass</th><th>Name</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Ticket</th><th>Fare</th><th>Cabin</th><th>Embarked</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>0</td><td>3</td><td>Braund, Mr. Owen Harris</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>A/5 21171</td><td>7.2500</td><td>NaN</td><td>S</td></tr>\n",
"<tr><th>1</th><td>2</td><td>1</td><td>1</td><td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>PC 17599</td><td>71.2833</td><td>C85</td><td>C</td></tr>\n",
"<tr><th>2</th><td>3</td><td>1</td><td>3</td><td>Heikkinen, Miss. Laina</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>STON/O2. 3101282</td><td>7.9250</td><td>NaN</td><td>S</td></tr>\n",
"<tr><th>3</th><td>4</td><td>1</td><td>1</td><td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>113803</td><td>53.1000</td><td>C123</td><td>S</td></tr>\n",
"<tr><th>4</th><td>5</td><td>0</td><td>3</td><td>Allen, Mr. William Henry</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>373450</td><td>8.0500</td><td>NaN</td><td>S</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "url = 'https://raw.githubusercontent.com/BigDataGal/Python-for-Data-Science/master/titanic-train.csv'\n", "titanic = pd.read_csv(url)\n", "titanic.columns = ['PassengerId','Survived','Pclass','Name','Sex','Age','SibSp','Parch','Ticket','Fare','Cabin','Embarked']\n", "titanic.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just a quick fyi (we will examine these variables more closely in a minute):\n", "\n", "##### VARIABLE DESCRIPTIONS\n", "\n", "Survived - Survival (0 = No; 1 = Yes)
\n", "Pclass - Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
\n", "Name - Name
\n", "Sex - Sex
\n", "Age - Age
\n", "SibSp - Number of Siblings/Spouses Aboard
\n", "Parch - Number of Parents/Children Aboard
\n", "Ticket - Ticket Number
\n", "Fare - Passenger Fare (British pound)
\n", "Cabin - Cabin
\n", "Embarked - Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Checking that your target variable is binary\n", "Since we are building a model to predict the survival of passengers from the Titanic, our target is going to be the \"Survived\" variable from the titanic dataframe. To make sure that it's a binary variable, let's use Seaborn's countplot() function." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "[Figure: count plot of the Survived column]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sb.countplot(x='Survived', data=titanic, palette='hls')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, so we see that the Survived variable is binary (0 - did not survive / 1 - survived).\n", "\n", "### Checking for 
missing values\n", "It's easy to count the missing values in each column: call the isnull() method, then chain the sum() method onto the result to tally the True values that isnull() returns." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PassengerId 0\n", "Survived 0\n", "Pclass 0\n", "Name 0\n", "Sex 0\n", "Age 177\n", "SibSp 0\n", "Parch 0\n", "Ticket 0\n", "Fare 0\n", "Cabin 687\n", "Embarked 2\n", "dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, how many records are there in the data frame anyway?" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 891 entries, 0 to 890\n", "Data columns (total 12 columns):\n", "PassengerId 891 non-null int64\n", "Survived 891 non-null int64\n", "Pclass 891 non-null int64\n", "Name 891 non-null object\n", "Sex 891 non-null object\n", "Age 714 non-null float64\n", "SibSp 891 non-null int64\n", "Parch 891 non-null int64\n", "Ticket 891 non-null object\n", "Fare 891 non-null float64\n", "Cabin 204 non-null object\n", "Embarked 889 non-null object\n", "dtypes: float64(2), int64(5), object(5)\n", "memory usage: 83.6+ KB\n" ] } ], "source": [ "titanic.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, so there are only 891 rows in the titanic data frame. Cabin is missing in 687 of those rows, so we can drop that variable completely, but what about age? Age seems like a relevant predictor for survival, right? We'd want to keep the variable, but it has 177 missing values. Yikes!! We are going to need to find a way to approximate those missing values!"
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Taking care of missing values\n", "##### Dropping missing values\n", "So let's just go ahead and drop all the variables that aren't relevant for predicting survival. We should at least keep the following:\n", "- Survived - This variable is obviously relevant.\n", "- Pclass - Does a passenger's class on the boat affect their survivability?\n", "- Sex - Could a passenger's gender impact their survival rate?\n", "- Age - Does a person's age impact their survival rate?\n", "- SibSp - Does the number of relatives on the boat (that are siblings or a spouse) affect a person's survivability? Probably.\n", "- Parch - Does the number of relatives on the boat (that are children or parents) affect a person's survivability? Probably.\n", "- Fare - Does the fare a person paid affect their survivability? Maybe - let's keep it.\n", "- Embarked - Does a person's point of embarkation matter? It depends on how the boat was filled... Let's keep it.\n", "\n", "What about a person's name, ticket number, and passenger ID number? They're irrelevant for predicting survivability. And as you recall, the cabin variable is almost all missing values, so we can just drop all of these." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>Survived</th><th>Pclass</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Fare</th><th>Embarked</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0</td><td>3</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>7.2500</td><td>S</td></tr>\n",
"<tr><th>1</th><td>1</td><td>1</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>71.2833</td><td>C</td></tr>\n",
"<tr><th>2</th><td>1</td><td>3</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>7.9250</td><td>S</td></tr>\n",
"<tr><th>3</th><td>1</td><td>1</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>53.1000</td><td>S</td></tr>\n",
"<tr><th>4</th><td>0</td><td>3</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>8.0500</td><td>S</td></tr>\n",
"</tbody>\n",
"</table>" ], "text/plain": [ " Survived Pclass Sex Age SibSp Parch Fare Embarked\n", "0 0 3 male 22.0 1 0 7.2500 S\n", "1 1 1 female 38.0 1 0 71.2833 C\n", "2 1 3 female 26.0 0 0 7.9250 S\n", "3 1 1 female 35.0 1 0 53.1000 S\n", "4 0 3 male 35.0 0 0 8.0500 S" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pass axis=1 by keyword so the listed labels are dropped as columns\n", "titanic_data = titanic.drop(['PassengerId','Name','Ticket','Cabin'], axis=1)\n", "titanic_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the dataframe is reduced to only the relevant variables, but we still need to deal with the missing values in the age variable.\n", "\n", "#### Imputing missing values\n", "Let's look at how a passenger's age is related to their class on the boat." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "[Figure: box plot of Age by Pclass]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sb.boxplot(x='Pclass', y='Age', data=titanic_data, palette='hls')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>Survived</th><th>Pclass</th><th>Sex</th><th>Age</th><th>SibSp</th><th>Parch</th><th>Fare</th><th>Embarked</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0</td><td>3</td><td>male</td><td>22.0</td><td>1</td><td>0</td><td>7.2500</td><td>S</td></tr>\n",
"<tr><th>1</th><td>1</td><td>1</td><td>female</td><td>38.0</td><td>1</td><td>0</td><td>71.2833</td><td>C</td></tr>\n",
"<tr><th>2</th><td>1</td><td>3</td><td>female</td><td>26.0</td><td>0</td><td>0</td><td>7.9250</td><td>S</td></tr>\n",
"<tr><th>3</th><td>1</td><td>1</td><td>female</td><td>35.0</td><td>1</td><td>0</td><td>53.1000</td><td>S</td></tr>\n",
"<tr><th>4</th><td>0</td><td>3</td><td>male</td><td>35.0</td><td>0</td><td>0</td><td>8.0500</td><td>S</td></tr>\n",
"</tbody>\n",
"</table>" ], "text/plain": [ " Survived Pclass Sex Age SibSp Parch Fare Embarked\n", "0 0 3 male 22.0 1 0 7.2500 S\n", "1 1 1 female 38.0 1 0 71.2833 C\n", "2 1 3 female 26.0 0 0 7.9250 S\n", "3 1 1 female 35.0 1 0 53.1000 S\n", "4 0 3 male 35.0 0 0 8.0500 S" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Speaking roughly, we could say that the younger a passenger is, the more likely it is for them to be in 3rd class. The older a passenger is, the more likely it is for them to be in 1st class. So there is a loose relationship between these variables. So, let's write a function that approximates a passenger's age based on their class. From the box plot, it looks like the typical age of 1st class passengers is about 37, of 2nd class passengers about 29, and of 3rd class passengers about 24.\n", "\n", "So let's write a function that finds each null value in the Age variable, and for each null, checks the value of Pclass and assigns an age according to the typical age of passengers in that class." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def age_approx(cols):\n", "    # cols is one row holding that passenger's Age and Pclass\n", "    Age = cols['Age']\n", "    Pclass = cols['Pclass']\n", "\n", "    # Fill a missing age with the typical age for the passenger's class\n", "    if pd.isnull(Age):\n", "        if Pclass == 1:\n", "            return 37\n", "        elif Pclass == 2:\n", "            return 29\n", "        else:\n", "            return 24\n", "    else:\n", "        return Age" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we apply the function and check again for null values, we see that there are no more null values in the age variable."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Survived 0\n", "Pclass 0\n", "Sex 0\n", "Age 0\n", "SibSp 0\n", "Parch 0\n", "Fare 0\n", "Embarked 2\n", "dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic_data['Age'] = titanic_data[['Age', 'Pclass']].apply(age_approx, axis=1)\n", "titanic_data.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 2 null values in the Embarked variable. We can drop those 2 records without losing much important information from our dataset, so we will do that." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Survived 0\n", "Pclass 0\n", "Sex 0\n", "Age 0\n", "SibSp 0\n", "Parch 0\n", "Fare 0\n", "Embarked 0\n", "dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic_data.dropna(inplace=True)\n", "titanic_data.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Converting categorical variables to dummy indicators\n", "The next thing we need to do is reformat our variables so that they work with the model. Specifically, we need to convert the Sex and Embarked variables into numeric variables." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
male
01
10
20
30
41
\n", "
" ], "text/plain": [ " male\n", "0 1\n", "1 0\n", "2 0\n", "3 0\n", "4 1" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gender = pd.get_dummies(titanic_data['Sex'],drop_first=True)\n", "gender.head()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
QS
001
100
201
301
401
\n", "
" ], "text/plain": [ " Q S\n", "0 0 1\n", "1 0 0\n", "2 0 1\n", "3 0 1\n", "4 0 1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "embark_location = pd.get_dummies(titanic_data['Embarked'],drop_first=True)\n", "embark_location.head()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassSexAgeSibSpParchFareEmbarked
003male22.0107.2500S
111female38.01071.2833C
213female26.0007.9250S
311female35.01053.1000S
403male35.0008.0500S
\n", "
" ], "text/plain": [ " Survived Pclass Sex Age SibSp Parch Fare Embarked\n", "0 0 3 male 22.0 1 0 7.2500 S\n", "1 1 1 female 38.0 1 0 71.2833 C\n", "2 1 3 female 26.0 0 0 7.9250 S\n", "3 1 1 female 35.0 1 0 53.1000 S\n", "4 0 3 male 35.0 0 0 8.0500 S" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic_data.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassAgeSibSpParchFare
00322.0107.2500
11138.01071.2833
21326.0007.9250
31135.01053.1000
40335.0008.0500
\n", "
" ], "text/plain": [ " Survived Pclass Age SibSp Parch Fare\n", "0 0 3 22.0 1 0 7.2500\n", "1 1 1 38.0 1 0 71.2833\n", "2 1 3 26.0 0 0 7.9250\n", "3 1 1 35.0 1 0 53.1000\n", "4 0 3 35.0 0 0 8.0500" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic_data.drop(['Sex', 'Embarked'],axis=1,inplace=True)\n", "titanic_data.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassAgeSibSpParchFaremaleQS
00322.0107.2500101
11138.01071.2833000
21326.0007.9250001
31135.01053.1000001
40335.0008.0500101
\n", "
" ], "text/plain": [ " Survived Pclass Age SibSp Parch Fare male Q S\n", "0 0 3 22.0 1 0 7.2500 1 0 1\n", "1 1 1 38.0 1 0 71.2833 0 0 0\n", "2 1 3 26.0 0 0 7.9250 0 0 1\n", "3 1 1 35.0 1 0 53.1000 0 0 1\n", "4 0 3 35.0 0 0 8.0500 1 0 1" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic_dmy = pd.concat([titanic_data,gender,embark_location],axis=1)\n", "titanic_dmy.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have a dataset with all the variables in the correct format!\n", "\n", "### Checking for independence between features" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAikAAAHRCAYAAACvuin3AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl0FGW+//FPdRYSSAAhEFwmyCLgxlU4gIwDM7IqKrJK\nAEFEYHTEo8gFcUPIUWBgWAYFHBUYWWQTVATlOuCCgzszqCCCMEgQbhZkTQjZ+vn9wc++kxE6TUh1\ndVW/X+f0OZ2uTtWnoDv55vs8T7VljDECAACIMD6nAwAAAJwNRQoAAIhIFCkAACAiUaQAAICIRJEC\nAAAiEkUKAACISLF27vw+63I7dx8Wv9u2xekIlaJLw4ucjlApYnyW0xEuWGLBT05HqBS+/V85HeGC\nDd5R1+kIlWJB32ucjlApDpwscTpCpbiibnLYjmXX79kXzA+27Pd80UkBAAARydZOCgAAsE+M+5vL\nQdFJAQAAEYlOCgAALhVjebuVQicFAABEJDopAAC4lNfnpFCkAADgUgz3AAAAOIBOCgAALuX14R46\nKQAAICLRSQEAwKW8PieFIgUAAJdiuAcAAMABdFIAAHAprw/30EkBAAARiU4KAAAu5fVOA0UKAAAu\nxXAPAACAA+ikAADgUixBBgAAcACdFAAAXIo5KQAAAA4I2kn54osvzrmtVatWlR4GAACEzutzUoIW\nKcuWLZMkZWZmqri4WNdee62+/fZbVatWTYsXLw5LQAAAcHZeH+4JWqTMmDFDkjRixAjNnTtXsbGx\nKi0t1YgRI8ISDgAARK+QJs7m5uYG7peWlurIkSO2BQIAAKGJ6uGen/Xp00e33nqrmjRpou+//17D\nhw+3OxcAAIhyIRUpAwcO1M0336zMzEzVr19ftWrVsjsXAAAoR1TPSfnZ999/r6efflonTpxQ9+7d\ndcUVV+imm26yOxsAAAjC68M9IV0n5ZlnntHkyZN10UUXqU+fPnruuefszgUAAKJcyFecrV+/vizL\nUq1atVStWjU7MwEAgBDQSZFUo0YNLV++XAUFBVq/fr2qV69udy4AABDlQipSJk2apB9//FEXXXSR\ntm/frmeffdbuXAAAoBwxlmXLLVKENNwze/Zs3XnnnWrcuLHdeQAAQIi8PtwTUpHSsmVLTZs2Tfn5\n
+erVq5e6deumhIQEu7MBAIAoFtJwT9euXfWXv/xFM2bM0EcffaTf/OY3ducCAADlYLhH0qFDh/T6\n66/r3Xff1VVXXaWXXnrJ7lwAACDKhVSkPPjgg+rbt6+WLl2qpKQkuzMBAIAQRPWclKysLNWrV0/T\npk2TZVnKzc0NfNhggwYNwhIQAACcXSQNzdghaJGycOFCPfbYY3r66afLPG5ZlhYtWmRrMAAAEJn8\nfr8mTJigXbt2KT4+Xs8884zq168f2L527VotXLhQPp9PvXv31oABAyp0nKBFymOPPSZJuvvuu9Wh\nQwf5fCHNswUAAGHg1HDPxo0bVVRUpBUrVmjbtm2aMmWK5s2bF9g+depUrVu3TlWrVtWtt96qW2+9\nVTVq1Djv44RUdXzyySe64447NHPmTB04cOC8DwIAALxj69atateunSTpuuuu0/bt28tsb9q0qU6e\nPKmioiIZY2RVcFgqpImzTz31lIqKirRp0yZlZGSouLhYf/3rXyt0QAAAUDmcmpOSl5dXZiFNTEyM\nSkpKFBt7pqy44oor1Lt3byUmJqpz584V/jidkMdvvv76a/3973/XTz/9pLZt21boYAAAwP2SkpKU\nn58f+Nrv9wcKlO+++04ffPCBNm3apPfee09HjhzRO++8U6HjhNRJ6datm5o1a6a+ffvyuT0AAEQI\nn0OdlBYtWuj9999Xt27dtG3bNjVp0iSwLTk5WQkJCapSpYpiYmJUq1YtnThxokLHCalI6dWrl4YN\nG1ahAwAAAHtYDs2c7dy5s7Zs2aL09HQZYzRp0iS99dZbOnXqlPr166d+/fppwIABiouLU1pamnr2\n7Fmh44RUpGzevFn33HOPYmJiKnQQAADgHT6fTxkZGWUea9SoUeB+//791b9//ws+TkhFytGjR9Wu\nXTtddtllsixLlmVp+fLlF3xwAABQcT6PX3I2pCLlhRdeqNDOf7dtS4W+L5J8cN2NTkeoFNl/f8/p\nCJXi/qbxTke4YCbx/K8VEIkGflPH6QgXbGmXiq04iDTWvs+cjlAp6tdrUv6TXCHZ6QCeEVKR8vrr\nr//isZEjR1Z6GAAAEDorxtsXWQ2pSElJSZEkGWP07bffyu/32xoKAACUz6mJs+ESUpGSnp5e5mtW\n+gAAALuFVKTs27cvcD8nJ0eHDh2yLRAAAAgNE2cljR8/XpZl6fjx46pZs6bGjRtndy4AABDlgs64\n2bFjh3r06KH58+frrrvuUk5OjrKyslRcXByufAAA4Bwsn8+WW6QI2kmZOnWqpkyZovj4eM2aNUsv\nv/yy6tevr2HDhqljx47hyggAAM4iqod7/H6/mjVrpuzsbBUUFOjqq6+WdOZKcwAAAHYKWqT8/ImG\nH330UeCTj4uLi8t88iEAAHBGVC9Bbtu2rdLT05WVlaV58+YpMzNTGRkZ6tatW7jyAQCAKBW0SBkx\nYoQ6duyopKQkpaamKjMzU/369VPnzp3DlQ8AAJxD1F9x9t8/1TAtLU1paWm2BgIAAJBCvE4KAACI\nPFG9ugcAAEQuy+ftIsXbg1kAAMC16KQAAOBSPo9PnPX22QEAANeikwIAgEtF9cXcAABA5PJ6kcJw\nDwAAiEh0UgAAcCkmzv5/fr9fpaWl+vLLL1VUVGRnJgAAgNA6Kc8++6waNWqkQ4cOaceOHUpJSdEf\n//hHu7MBAIAgmJMi6ZtvvlF6err++c9/av78+crKyrI7FwAAKIfPZ9lyixQhFSl+v1/bt2/XZZdd\npqKiIuXn59udCwAARLmQhnvuuOMOTZw4UZMmTdK0adPUr18/u3MBAIByWB6fOBtSkTJw4EANHDhQ\nkjR06FBdfPHFtoYCAAAIqUh5+eWXVb16dZ04cUJr1qxRu3bt9Nhjj9mdDQAABOFj4qz07rvvqkeP\nHtq8ebPefvttffvtt3bnAgAAUS6kTorP59Phw4eVkpIiSSosLL
Q1FAAAKJ/XlyCHVKS0adNGgwYN\n0rRp0zRp0iT99re/tTsXAAAoBxNnJY0aNUqjRo2SJF177bWKi4uzNRQAAEBIRcqmTZv06quvqri4\nWMYYHTt2TG+99Zbd2QAAQBBMnJU0a9YsjRw5UhdffLF69uyppk2b2p0LAABEuZCKlLp16+r666+X\nJPXq1UvZ2dm2hgIAAOWzfJYtt0gR0nBPXFycvvjiC5WUlOijjz7S0aNH7c4FAADK4fP4xNmQzm7i\nxIkqKSnR/fffr5UrV+r++++3OxcAAIhyQTsp+/btC9yvV6+epDMrfSwrclpBAABEq6i+Tsr48eMD\n9y3LkjEmUKAsWrTI3mQAACCqBS1SFi9eLOnMFWb37t2rq666Shs3buRibgAARACvX8wtpLMbM2aM\ndu7cKenMENC4ceNsDQUAAMpn+Xy23CJFSEmys7PVu3dvSdLw4cOVk5NjaygAAICQliBblqV9+/ap\nQYMGyszMlN/vtzsXAAAoh9eXIJdbpOTl5Wn06NEaNWqUDh8+rLp16yojIyMc2QAAQBQLWqQsWbJE\nCxYsUGxsrJ588km1b98+XLkAAEA5onri7Lp167RhwwYtX76cJccAACCsgnZS4uPjFR8fr1q1aqm4\nuDhcmQAAQAi83kkJaeKsJBljznvnXRpedN7fE2my//6e0xEqxc7fdHA6QqX4bO9WpyNcsDql5/9e\nikRLu7r//e2PS3Q6QqU4cmkrpyNUippxTidwn0haLmyHoEXKnj17NHr0aBljAvd/Nn36dNvDAQCA\n6BW0SJk1a1bgfnp6uu1hAABA6KyYGKcj2CpokdK6detw5QAAACgj5DkpAAAgsjBxFgAARCSfxyfO\nevvsAACAa9FJAQDApbw+3OPtswMAAK5FJwUAAJfyeieFIgUAAJfy+hVnvX12AADAteikAADgUl4f\n7vH22QEAANeikwIAgEvRSQEAAHAAnRQAAFzK5/FOCkUKAAAuxRJkAAAAB9BJAQDApZg4CwAA4ICQ\nOik//PCD9u/fr6ZNmyo1NVWWZdmdCwAAlMPrnZRyi5QlS5bob3/7m44fP64ePXooMzNT48ePD0c2\nAAAQRNRPnF2/fr0WLlyo5ORkDRkyRF999VU4cgEAgChXbifFGCPLsgJDPPHx8baHAgAA5fPFxDgd\nwVblFim33nqrBg4cqEOHDmn48OHq1KlTOHIBAIAoV26RMmjQIP3617/W7t271bBhQzVt2jQcuQAA\nQDmifuLsY489Fri/efNmxcXFqV69eho4cKBq1KhhazgAAHBuXi9Syj27wsJC1a1bV926ddOll16q\n7OxsFRUV6dFHHw1HPgAAEKXKLVKOHDmiUaNGqV27dho5cqSKi4v18MMP6+TJk+HIBwAAzsHy+Wy5\nRYpyk+Tl5Wnv3r2SpL179+rUqVM6evSoTp06ZXs4AAAQefx+v8aPH69+/fpp0KBB2r9//1mf99RT\nT+lPf/pThY9T7pyU8ePHa8yYMcrJyVFCQoJ69uypt99+W/fdd1+FDwoAAC6cU3NSNm7cqKKiIq1Y\nsULbtm3TlClTNG/evDLPWb58uXbv3q1WrVpV+Djlnl3z5s01YcIE/frXv1ZBQYF++uknDRw4UF27\ndq3wQQEAgHtt3bpV7dq1kyRdd9112r59e5nt//jHP/TVV1+pX79+F3Scc3ZSioqKtH79ei1dulTx\n8fHKy8vTpk2blJCQcEEHBAAAlcOpTkpeXp6SkpICX8fExKikpESxsbHKycnRnDlz9Pzzz+udd965\noOOcs0jp0KGDbrvtNv3pT3/S5ZdfrmHDhlGgAAAQQZya5JqUlKT8/PzA136/X7GxZ0qKDRs26OjR\noxoxYoRyc3N1+vRpNWzYUL169Trv45yzSLn77rv11ltv6eDBg+rTp4+MMRU4DQAA4DUtWrTQ+++/\nr27dumnbtm1q0qRJYNvgwY
M1ePBgSdKaNWv0r3/9q0IFihSkSBk+fLiGDx+uzz//XKtWrdL27ds1\nbdo03XHHHWXCAAAAZ1g+Zz67p3PnztqyZYvS09NljNGkSZP01ltv6dSpUxc8D+Xflbu6p3Xr1mrd\nurVOnDihN998U2PHjtUbb7xRaQEAAIC7+Hw+ZWRklHmsUaNGv3heRTsoPyu3SPlZ9erVNWjQIA0a\nNOiCDggAACqJQ52UcAm5SAEAABEmgq4Oawdvnx0AAHAtOikAALiUFePt4R46KQAAICLRSQEAwK2Y\nOAsAACKSx4sUhnsAAEBEopMCAIBLOfXZPeHi7bMDAACuZWsnJcZn2bn7sLi/abzTESrFZ3u3Oh2h\nUixt1NLpCBds6oLBTkeoFFaH3k5HuGBZVS52OkKluCjOG39v+gqPOx2hclStFr5jMScFAAAg/JiT\nAgCAW3m8k0KRAgCASzFxFgAAwAF0UgAAcCuPD/fQSQEAABGJTgoAAG7l8U4KRQoAAC5lxXi7SGG4\nBwAARCQ6KQAAuBVLkAEAAMKPTgoAAG7FxFkAABCJLI8XKQz3AACAiEQnBQAAt2LiLAAAQPjRSQEA\nwKW8PieFIgUAALfyeJHCcA8AAIhIdFIAAHArJs4CAACEX8idFL/fryNHjqh27dqyLMvOTAAAIAR8\nCrKkd999V506ddKwYcPUpUsXbdmyxe5cAAAgyoXUSZk7d65WrVql2rVr6/Dhw7rvvvt044032p0N\nAAAE4/HVPSEVKTVr1lTt2rUlSSkpKUpKSrI1FAAACAFFilStWjXde++9atWqlbZv367Tp09rxowZ\nkqRHHnnE1oAAACA6hVSkdOrUKXA/NTXVtjAAACB0lseXIJdbpHz33Xfq2bOnioqKtGrVKsXHx6t3\n797yefwfBgAAOCtopbFw4UI99dRTKikp0dSpU7Vlyxbt2rVLkyZNClc+AABwLr4Ye24RImgnZcOG\nDVq+fLksy9K6dev07rvvqnr16kpPTw9XPgAAcC6Wt0c1gp5dtWrVFBMTo507d+pXv/qVqlevLkky\nxoQlHAAAiF5BOymWZWnfvn16/fXX1aFDB0nSDz/8oBiPX+EOAABXiOZOykMPPaSxY8fq4MGDGjx4\nsD7//HPdfffdGjt2bLjyAQCAKBW0k9K8eXOtWrUq8PV1112njRs3Ki4uzvZgAAAgOBPNnZSfffPN\nN+rVq5c6deqkQYMGadeuXXbnAgAA5bF89twiREgXc3v22Wc1depUNW7cWLt27dLEiRP16quv2p0N\nAABEsZCKlCpVqqhx48aSpKZNmzLcAwBAJLAspxPYKmiRsmLFijNPio3VhAkT1KpVK3399dd8wCAA\nALBd0CIlNzdXknT99ddLkvbt26fk5GRdeeWV9icDAADBefwjaoIWKX369FG9evW0b9++cOUBAACQ\nVE6RsnDhQj322GMaP368LMvS8ePHFRMTo6SkJC1atChcGQEAwFlE9RLk7t27q0ePHpo/f77uuusu\n5eTkKD8/X3fffXe48gEAgHPx+BLkoEmmTp2qKVOmKD4+XrNmzdLLL7+s1atX66WXXgpXPgAAEKWC\nDvf4/X41a9ZM2dnZKigo0NVXXy3pzGf6AAAAh0VQ18MOQc8uNvZMDfPRRx+pbdu2kqTi4mKdOnXK\n/mQAACCqBe2ktG3bVunp6crKytK8efOUmZmpjIwMdevWLVz5AADAuXi8kxK0SBkxYoQ6duyopKQk\npaamKjMzU/369VPnzp3DlQ8AAJyD11f3lHtZ/EaNGgXup6WlKS0tzdZAAAAAUoif3VNRiQU/2bn7\nsDCJNZyOUCnqlBqnI1SKqQsGOx3hgo0d6o1rDM3Z6f6O6v7SQqcjVIr42glOR6gUVTzeFbCFx//N\nvH12AADAtWztpAAAABt5/JIgFCkAALgVwz0AAADhRycFAACX8voSZG+fHQAAcC06KQAAuJXP
270G\nb58dAABwLTopAAC4lcfnpFCkAADgVh4vUrx9dgAAwLXopAAA4FZ0UgAAAMKPTgoAAC7l9Yu5UaQA\nAOBWHi9SvH12AADAteikAADgVpbldAJb0UkBAAARiU4KAABu5fE5KRQpAAC4lFOre/x+vyZMmKBd\nu3YpPj5ezzzzjOrXrx/Y/t5772nOnDmKjY1V7969deedd1boON4uwQAAQKXbuHGjioqKtGLFCo0e\nPVpTpkwJbCsuLtbkyZO1YMECLV68WCtWrNDhw4crdJyQOik//vij/ud//kcFBQWBx0aOHFmhAwIA\ngEriUCdl69atateunSTpuuuu0/bt2wPb9u7dq7S0NNWoUUOS1LJlS33xxRe65ZZbzvs4IZ3d6NGj\nVVBQoJSUlMANAABEp7y8PCUlJQW+jomJUUlJSWBbcnJyYFu1atWUl5dXoeOE1ElJSEigcwIAQIQx\nDi1BTkpKUn5+fuBrv9+v2NjYs27Lz88vU7Scj6CdlH379mnfvn1KSUnRW2+9pX/961+BxwAAQHRq\n0aKFNm/eLEnatm2bmjRpEtjWqFEj7d+/X8eOHVNRUZG+/PJLXX/99RU6TtBOyvjx4wP3V65cGbhv\nWZYWLVpUoQMCAIDKYYwzx+3cubO2bNmi9PR0GWM0adIkvfXWWzp16pT69euncePG6d5775UxRr17\n91ZqamqFjhO0SFm8eLEkqbCwUHv37tVVV12ljRs36re//W2FDgYAACqP36EqxefzKSMjo8xjjRo1\nCtzv0KGDOnTocOHHCeVJY8aM0c6dOyWdGQIaN27cBR8YAAAgmJCKlOzsbPXu3VuSNHz4cOXk5Nga\nCgAAlM/YdIsUIRUplmUFJstmZmbK7/fbGgoAACCkJciPP/64Ro0apcOHD6tu3bq/GIcCAADh54+k\ntocNQipSvvjiC73xxht2ZwEAAOfBOLW8J0xCGu758MMPVVpaancWAACAgJA6KUePHlW7du102WWX\nybIsWZal5cuX250NAAAEwXCPpBdeeMHuHAAAAGWEVKSUlJRow4YNKi4uliTl5OQweRYAAId5vJES\n+qcgS9I//vEP/fjjjzp27JitoQAAQPn8xp5bpAipSKlatap+//vfKzU1VVOmTNHhw4ftzgUAAKJc\nSMM9lmUpNzdX+fn5OnXqlE6dOmV3LgAAUI6oX4Kcl5enkSNHauPGjbrjjjvUqVMntW3bNhzZAABA\nFAvaSVmyZIkWLFig2NhYPfnkk2rfvr06duwYrmwAACAIr39ITdBOyrp167RhwwYtX75cixYtClcm\nAACA4J2U+Ph4xcfHq1atWoHlxwAAIDJ4fEpKaBNnJe9PzgEAwG0iabmwHYIWKXv27NHo0aNljAnc\n/9n06dNtDwcAAKJX0CJl1qxZgfvp6em2hwEAAKHz+ihH0CKldevW4coBAABQhmVsLMNKtq63a9dh\nM/CbOk5HqBRLu17kdIRKYRV54EKCBSecTlApHrhykNMRLtiDh752OkKlSK4S0sXDI15BiTcW1Dat\nWz1sx8o8kmfLftNqJdmy3/MV8sRZAAAQWTw+2hPaZ/cAAACEG50UAABcyu/xVgqdFAAAEJHopAAA\n4FLe7qNQpAAA4Fpev+Iswz0AACAi0UkBAMClPD5vlk4KAACITHRSAABwKb/Hp87SSQEAABGJTgoA\nAC7l9TkpFCkAALgUS5ABAAAcQCcFAACX8vpwD50UAAAQkeikAADgUl5fgkyRAgCASzHcAwAA4AA6\nKQAAuJTf460UOikAACAihdxJycvL048//qi0tDRVrVrVzkwAACAEpX6nE9grpCJlw4YNeuGFF1Ra\nWqqbb75ZlmXpD3/4g93ZAABAEAz3SPrrX/+qlStXqmbNmvrDH/6gjRs32p0LAABEuZA6KTExMYqP\nj5dlWbIsS4mJiXbnAgAA5SilkyK1bNlSo0ePVnZ2tsaP
H69rr73W7lwAACDKhdRJGT58uP75z3/q\nyiuvVMOGDdWhQwe7cwEAgHJ4fU5KSEXKiBEjtGzZMrVv397uPAAAAJJCLFJq1KihV155RQ0aNJDP\nd2aE6De/+Y2twQAAQHAsQZZ00UUX6bvvvtN3330XeIwiBQAAZzHcI2ny5Mllvs7JybElDAAAwM9C\nKlL+/Oc/a9myZSouLtbp06d1+eWXa/369XZnAwAAQbAEWdJ7772nzZs36/bbb9fbb7+t1NRUu3MB\nAIAoF1InpU6dOoqPj1d+fr7q16+v4uJiu3MBAIBy+L3dSAmtSKlXr55ee+01JSYmavr06Tpx4oTd\nuQAAQDlKPV6lBB3umTt3riQpIyNDjRo10tixY1W3bl1Nnz49LOEAAED0ClqkfPrpp2ee5PNp5syZ\nSkpK0qBBg9S4ceOwhAMAAOfmN8aWW6QIWqSYfwtqIig0AADwvqBzUizLOut9AADgvFKP9w+CFik7\nduxQenq6jDHas2dP4L5lWVq+fHm4MgIAgLOIpKEZOwQtUtauXRuuHAAAAGUELVIuvfTScOUAAADn\nKaqXIAMAADglpIu5AQCAyOP1OSl0UgAAQESikwIAgEtF9RJkAAAQubw+3GNrkTJ4R107dx8WS7tU\ndzpCpfDHJTodoVJkVbnY6QgXbH9podMRKsWDh752OsIFe+6S5k5HqBQZx3Y4HaFSXFKY7XSESuKN\n3xuRgE4KAAAu5WcJMgAAQPjRSQEAwKWYOAsAACKS1yfOMtwDAAAiEp0UAABcqpROCgAAQPjRSQEA\nwKW8vgSZIgUAAJfy+uoehnsAAEBEokgBAMCl/MbYcquI06dP68EHH9SAAQM0fPhwHTly5OyZ/X4N\nGzZMy5YtK3efFCkAAOCCLVu2TE2aNNGrr76qHj16aO7cuWd93qxZs3TixImQ9kmRAgCAS5UaY8ut\nIrZu3ap27dpJktq3b69PPvnkF8/ZsGGDLMsKPK88TJwFAMClSh1a3bNq1Sq98sorZR6rXbu2kpOT\nJUnVqlXTyZMny2zfvXu31q1bp9mzZ2vOnDkhHYciBQAAnJe+ffuqb9++ZR4bOXKk8vPzJUn5+fmq\nXr16me1vvPGGsrOzdffdd+vgwYOKi4vTpZdeqvbt25/zOBQpAAC4lFOdlLNp0aKFPvzwQzVv3lyb\nN29Wy5Yty2wfO3Zs4P5zzz2nlJSUoAWKxJwUAABQCfr376/vv/9e/fv314oVKzRy5EhJ0sKFC7Vp\n06YK7ZNOCgAALhVJnZTExETNnj37F4/fc889v3jswQcfDGmfdFIAAEBEopMCAIBLRVInxQ4UKQAA\nuJTXixSGewAAQEQKuZPyww8/aP/+/WratKlSU1NlWZaduQAAQDm83kkJqUhZsmSJ/va3v+n48ePq\n0aOHMjMzNX78eLuzAQCAKBbScM/69eu1cOFCJScna8iQIfrqq6/szgUAAMpR6je23CJFSJ0UY4ws\nywoM8cTHx9saCgAAlC+SCgo7hFSk3HbbbRo4cKAOHTqk4cOHq1OnTnbnAgAAUS6kIuWuu+5S27Zt\ntXv3bjVo0EDNmjWzOxcAAChHVHdSpk+f/otVPDt37tTbb7+tRx55xNZgAAAgugUtUho2bBiuHAAA\n4DxFdSelZ8+ekqSSkhJ98803KikpkTFGOTk5YQkHAADOrSSai5SfjRw5UsXFxcrJyVFpaanq1q2r\n2267ze5sAAAgioV0nZSjR49q/vz5at68udasWaPCwkK7cwEAgHJ4/TopIRUpCQkJkqSCgoLAfQAA\nADuFNNzTpUsXzZkzR82aNVO/fv2UmJhody4AAFCOSOp62CGkIqVevXr6+9//ruLiYiUkJCgmJsbu\nXAAAIMqFVKRMnTpVGRkZqlGjht15AABAiEoNnRRdccUVatOmjd1ZAADAeWC4R1LHjh3Vr1+/Mhd3\nmzx5sm2hAAAAQipS
Fi9erGHDhik5OdnuPAAAIER0UiSlpKSoW7dudmcBAAAICKlISUhI0L333qur\nrroq8IGDfMAgAADOopMi6aabbrI7BwAAOE+lfr/TEWwVUpHy8wcNAgAAhEtIRQoAAIg8Xh/uCemz\newAAAMIT24A+AAAS3klEQVSNTgoAAC7l9U4KRQoAAC5VQpFScQv6XmPn7sPC2veZ0xEqxZFLWzkd\noVJcFOf+Ecr42glOR6gURaXu/+GYcWyH0xEqxfiaVzsdoVLEvrbW6QiV4vneTifwDjopAAC4lNeH\ne9z/ZykAAPAkOikAALgUnRQAAAAH0EkBAMClvN5JoUgBAMClvF6kMNwDAAAiEp0UAABcik4KAACA\nA+ikAADgUsbjnRSKFAAAXMrv8SKF4R4AABCR6KQAAOBSxtBJAQAACDs6KQAAuBQTZwEAQERi4iwA\nAIAD6KQAAOBSxu90AnvRSQEAABGJTgoAAC7FEmRJRUVFOnjwoE6fPi1JOnHihAoKCmwNBgAAolvQ\nTkpxcbEmT56sDz/8UCkpKfrf//1f/e53v1NxcbHuueceNWnSJFw5AQDAf/D66p6gRcqcOXNUu3Zt\nbdq0SZLk9/v15JNP6qeffqJAAQDAYVF9nZTPPvtMy5YtC3zt8/mUnZ2to0eP2h4MAABEt6BzUny+\nX26eOXOmEhISbAsEAABCY/zGllukCFqkJCQkKDMzs8xjx44dU2Jioq2hAAAAgg73jBo1Svfdd5/u\nvPNOXXbZZTpw4IBee+01TZs2LVz5AADAOfijeQnyNddco4ULF6qoqEibN29WYWGh5s+fr6uuuipc\n+QAAwDl4fbin3Iu5paamasSIEeHIAgAAEMAVZwEAcKlI6nrYgc/uAQAAEYlOCgAALhXVV5wFAACR\niw8YBAAAcACdFAAAXMr4nU5gLzopAAAgItFJAQDApbw+cZZOCgAAiEh0UgAAcCmvX8yNIgUAAJfy\nepHCcA8AAIhIdFIAAHApPxdzAwAACD86KQAAuJTX56RQpAAA4FJeL1IY7gEAABGJTgoAAC7l9SvO\n2lqkHDhZYufuw6J+vSZOR6gUNeOcTlA5fIXHnY5wwapY3mhg7itJcDrCBbukMNvpCJUi9rW1Tkeo\nFCV9ujsdoXKYH5xO4Bl0UgAAcCkTQUuQT58+rTFjxuinn35StWrV9Mc//lG1atUq85wFCxZo3bp1\nsixL9913nzp37hx0n974kw4AgChk/MaWW0UsW7ZMTZo00auvvqoePXpo7ty5ZbafOHFCixYt0vLl\ny7VgwQJNmjSp3H1SpAAAgAu2detWtWvXTpLUvn17ffLJJ2W2JyYm6pJLLlFBQYEKCgpkWVa5+2S4\nBwAAl3Jq4uyqVav0yiuvlHmsdu3aSk5OliRVq1ZNJ0+e/MX3XXzxxbr11ltVWlqq3//+9+UehyIF\nAACcl759+6pv375lHhs5cqTy8/MlSfn5+apevXqZ7Zs3b1ZOTo42bdokSbr33nvVokULNW/e/JzH\noUgBAMCljL/U6QgBLVq00IcffqjmzZtr8+bNatmyZZntNWrUUEJCguLj42VZlpKTk3XixImg+6RI\nAQAAF6x///569NFH1b9/f8XFxWn69OmSpIULFyotLU0dO3bUxx9/rDvvvFM+n08tWrTQjTfeGHSf\nlrFx/dL3Ob8cj3Kb+rHuPwdJ8ifUcDpCpfAVeuD/wyvXSSly/3VSGpV64zopD39c4HSESuGV66S8\nEMbrpNQf+qot+92/YIAt+z1fdFIAAHCpSBrusYM3/qQDAACeQycFAACXMqV0UgAAAMKOTgoAAC7l\n9TkpFCkAALiU14sUhnsAAEBEopMCAIBL0UkBAABwAJ0UAABcyuudFIoUAABcyutFCsM9AAAgIp1X\nJ+XEiRPy+XxKSkqyKw8AAAiRP5o7KTt27FCPHj1UXFysd999V127dlXv3r313nvvhS
sfAACIUkE7\nKVOnTtWUKVMUFxenWbNm6eWXX1b9+vU1bNgwdejQIVwZAQDAWXh9TkrQIsXv96tZs2bKzs5WQUGB\nrr76akmSz8dUFgAAYK+gRUps7JnNH330kdq2bStJKi4uVn5+vv3JAABAUFHdSWnbtq3S09OVlZWl\nefPmKTMzUxkZGerWrVu48gEAgHMwpVFcpIwYMUIdO3ZUUlKSUlNTlZmZqX79+qlz587hygcAAKJU\nuUuQGzVqFLiflpamtLQ0WwMBAIDQeH24hxmwAAAgInFZfAAAXMrrnRSKFAAAXMrrRQrDPQAAICLR\nSQEAwKWM3+90BFvRSQEAABGJTgoAAC7l9TkpFCkAALiU14sUhnsAAEBEopMCAIBL+emkAAAAhB+d\nFAAAXMrrn4JMJwUAAEQkOikAALiU11f3UKQAAOBSXi9SGO4BAAARiU4KAAAuRScFAADAAXRSAABw\nKa93UixjjHE6BAAAwH9iuAcAAEQkihQAABCRKFIAAEBEokgBAAARiSIFAABEJIoUAAAQkRy7TsqL\nL76ojz/+WCUlJbIsS48++qiuueaaCu3r2Wef1T333KNLLrmkQt8/atQopaenq02bNhX6/p999tln\nevjhh9W4cWNJUmFhoW6//XYNGjToF88dNGiQJkyYoEaNGl3QMZ3w0ksv6ZVXXtGmTZtUpUoVp+OU\n62yvtTfffFP33HOPVq9erZSUFPXv37/M93z99deaNWuW/H6/8vPzdcstt2jo0KEOncH5vbZCEQmv\nvx9//FHdu3fX1VdfHXisTZs2GjlypGOZ7LZmzRr961//0n//9387HSXqVObvHISPI0XKnj179N57\n72nZsmWyLEs7d+7Uo48+qrVr11Zof0888UQlJ6y4G264QTNnzpQkFRUV6eabb9Ydd9yh6tWrO5ys\n8qxdu1bdunXT+vXr1atXL6fjBFXR11pGRob++Mc/qlGjRiouLlZ6erpuuOEGXXXVVWFK/ktefG01\nbtxYixcvdjoGgti+fbtmzJihgoICGWPUpk0bPfDAA4qPj3c6Wsgq+3cOwseRIiU5OVmHDh3Sa6+9\npvbt2+vKK6/Ua6+9Vuavu2XLlunw4cPq2bOn7r//ftWsWVPt27fXmjVr9Pbbb8uyLGVkZKht27Za\ntGiRJkyYoDFjxmj27Nm67LLLtGHDBn355Zd66KGH9MQTT+jo0aOSpCeffFJNmzbV0qVLtWrVKtWp\nU0c//fSTLeeZl5cnn8+n7777TtOnT5ff71dqaqr+9Kc/BZ6TlZWlCRMmqLCwULm5uXr44YfVqVMn\nzZw5U5999plKSkrUpUsXjRgxQkuXLtUbb7whn8+na6+9Vk8++aQtuYP57LPPlJaWpvT0dI0ZM0a9\nevXS119/rYkTJ6patWqqXbu2qlSpoilTpmjx4sVat26dLMtSt27dNHjw4LDnLe+1JkkbN27UO++8\no9OnT+vJJ59U8+bNlZKSoqVLl6pXr1668sortWzZMsXHx2vNmjXauHGj8vPzdfToUT3wwAPq2rVr\n2M/r319bzz//vIwxys/P1/Tp0xUXF1fmPdO6dWtNmjTpF6+/OXPm6PDhwyooKNCMGTP0q1/9Kuzn\n8Z9KS0s1fvx4ZWVlKScnRx06dNCoUaM0btw4HTt2TMeOHdNf/vIXvfzyy/ryyy/l9/s1ZMgQ3XLL\nLWHPumbNGr3//vs6ffq0cnNzNXjwYG3atEnff/+9xo4dq6ysLL377rsqKCjQRRddpOeff77M90fC\n+6M8WVlZGjNmjObOnasGDRrIGKM5c+Zo8uTJevrpp52OF7Jz/RyACxiHbN++3YwbN8789re/NV27\ndjUbNmwwd911l9mzZ48xxphXX33VzJ492xw4cMC0adPGFBYWGmOMeeihh8znn39uCgsLTbdu3Uxx\ncXHg+5YuXWqee+45Y4wxw4cPN7t27TJTp041S5
cuNcYYs2/fPpOenm5yc3NNly5dTGFhoSkqKjK3\n3Xab+fTTTy/4nD799FNzww03mLvuussMGjTIDB061HzwwQeme/fugfNauXKl2b59eyDzli1bAsfe\nunWrGTJkiDHGmJtuuskcOHDAFBYWmmXLlhljjOnVq5f56quvjDHGLF261BQXF19w5vM1evRo8/77\n7xtjjElPTzfbtm0zPXr0MLt37zbGGDNjxgzz6KOPmu+//96kp6ebkpISU1JSYgYNGmT27t0b9rzG\nBH+tzZ492zz11FPGGGN2795tevToYYwx5uTJk+b55583vXv3Nq1btzYZGRmmsLDQrF692gwZMsSU\nlpaa3Nxc87vf/S4s/w/nem0tWbLEZGVlGWOMmTdvnpk7d+4v3jPnev298cYbxhhjZs+ebV588UXb\nz+E/HThwwFx//fXmrrvuCty+/PJLs3LlSmOMMadPnzatW7c2xhjz6KOPmoULFxpjjPnggw/Mww8/\nHHhO9+7dzfHjx8Oef/Xq1eaee+4xxhizbt0606dPH+P3+80nn3xifv/735vnnnvOlJaWGmOMGTp0\nqPnyyy/N6tWrzbRp0yLq/RHMCy+8YObPn1/mMb/fb2666SZTUFDgUKqKOdvPAUQ+Rzop+/fvV1JS\nkiZPnixJ+uabbzR8+HDVqVPn34unwP3LLrss0Fq888479frrrys3N1cdOnRQbOz/ncLtt9+uAQMG\nqG/fvsrLy1OTJk20e/duffrpp3rnnXckScePH1dmZqYaN24c2Gfz5s0r7dz+vSX/s8cffzww9t+3\nb98y2+rUqaN58+bptddek2VZKikpkSRNmzZN06dP1+HDh9WuXTtJ0uTJk7VgwQJNnTpV1113XZl/\no3A4fvy4Nm/erCNHjmjx4sXKy8vTkiVLlJOToyuuuEKS1LJlS7399tvavXu3Dh06pCFDhgS+d//+\n/WrYsGFYM4fyWmvVqpUk6YorrlBubq4KCwu1Y8cOPfDAA3rggQd07NgxPfbYY1qxYoWqVaumVq1a\nyefzKSUlRdWrV9eRI0dUt25d28/lbK+tjRs36tlnn1XVqlWVnZ2tFi1aSCr7njl8+PBZX38/j8en\npKTo8OHDtuc/m/8c7snLy9Obb76pTz/9VElJSSoqKgpsa9CggSRp9+7d2rFjR2A+TklJiQ4ePOjI\nsNeVV14p6cxf6o0aNZJlWapRo4aKi4sVFxenRx55RFWrVlVWVlbgvf3zOUTC+6M8Bw8eDPz8+Zll\nWUpJSVFubm5EdN9Cca6fA23atFHNmjUdTodgHFnds2vXLmVkZAR+ADVo0EDVq1dXzZo1lZubK0n6\n9ttv/y+k7/9itm3bVjt37tTq1at/8Qs/OTlZ11xzjSZPnhyYK9GwYUMNGTJEixcv1qxZs9S9e3dd\nfvnl2rNnj06fPq3S0lLt3LnT1vOtW7eufvjhB0lnJm/97W9/C2z785//rDvuuEPTpk1TmzZtZIxR\nUVGRNmzYoBkzZmjRokV6/fXXdfDgQa1cuVITJ07UkiVLtHPnTv3zn/+0Nfd/Wrt2rXr37q0FCxZo\n/vz5WrlypbZs2aIqVapoz549kqSvvvpK0pl/98aNG2vRokVavHixevXqpaZNm4Y1r3Tu11pMTEzg\nOV9//XXguZdccoksy9KYMWO0b98+SVLNmjV16aWXBn7p79ixQ9KZX/55eXmqXbt2OE+pjKeeekqT\nJk3SlClTVLdu3UDh+u/vmWCvv0izZs0aJScna/r06Ro6dKhOnz4dOCfLsiSdeW21adNGixcv1iuv\nvKJbbrnFsV+WP2f6T8XFxdq4caNmzZqlp556Sn6/v8wfFZHy/ijPxRdfrAMHDpR5zO/369ChQ46+\n7s9XKD8HEJkc6aR06dJFe/fuVZ8+fVS1alUZYzR27FjFxcVp4sSJuuSSS875l6llWeratas+/vhj\npaWl/WJ737
59NWzYME2aNEmSdN999+mJJ57QypUrlZeXp5EjR6pWrVoaPny40tPTVatWLSUmJtp6\nvhMnTtTjjz8un8+nOnXqaMiQIVq0aJEk6eabb9bUqVP14osvql69ejp69Kji4+NVo0YN3XnnnUpI\nSNCNN96oSy65RE2bNtWAAQNUrVo1paam6r/+679szf2fVq1apalTpwa+TkxMVJcuXZSSkqLHH39c\nVatWVVxcnFJTU9WsWTO1bdtW/fv3V1FRkZo3b67U1NSw5pXO/Vp75ZVXAs/58ccfNXjwYBUVFSkj\nI0Px8fGaNWuWHn/88cBKgGuvvVa9e/fW2rVrdfjwYd199906efKknn76aUd/0HXv3l0DBw5UYmKi\nUlJSlJOT84vnBHv9RZq2bdtq9OjR2rZtm+Lj41W/fv1fnFOHDh30+eefa8CAATp16pQ6deqkpKQk\nhxKfXWxsrBITE5Weni7pTMf0388jUt4f5enRo4eGDh2qDh06qFatWnr44YeVmpqqm266SVWrVnU6\nXsjO9XMgOTnZ6WgoB5+CjAu2dOlS3XLLLapVq5ZmzpypuLg4zy4jZQkpos327ds1c+ZM5efn6/Tp\n00pJSVFKSorGjRvHUAls59h1UuAdtWvX1tChQ1W1alUlJydrypQpTkcCUEmuueYazZ8/v8xj3333\nneLi4hxKhGhCJwUAAEQkLosPAAAiEkUKAACISBQpAAAgIlGkAACAiESRAgAAIhJFCgAAiEj/D+dK\nvCTma2foAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sb.heatmap(titanic_dmy.corr()) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fare and Pclass are not independent of each other, so I am going to drop these." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedAgeSibSpParchmaleQS
0022.010101
1138.010000
2126.000001
3135.010001
4035.000101
\n", "
" ], "text/plain": [ " Survived Age SibSp Parch male Q S\n", "0 0 22.0 1 0 1 0 1\n", "1 1 38.0 1 0 0 0 0\n", "2 1 26.0 0 0 0 0 1\n", "3 1 35.0 1 0 0 0 1\n", "4 0 35.0 0 0 1 0 1" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic_dmy.drop(['Fare', 'Pclass'],axis=1,inplace=True)\n", "titanic_dmy.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Checking that your dataset size is sufficient\n", "We have 6 predictive features that remain. The rule of thumb is 50 records per feature... so we need to have at least 300 records in this dataset. Let's check again." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 889 entries, 0 to 890\n", "Data columns (total 7 columns):\n", "Survived 889 non-null int64\n", "Age 889 non-null float64\n", "SibSp 889 non-null int64\n", "Parch 889 non-null int64\n", "male 889 non-null uint8\n", "Q 889 non-null uint8\n", "S 889 non-null uint8\n", "dtypes: float64(1), int64(3), uint8(3)\n", "memory usage: 37.3 KB\n" ] } ], "source": [ "titanic_dmy.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, we have 889 records so we are fine." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\piers\\Anaconda3\\lib\\site-packages\\ipykernel_launcher.py:1: DeprecationWarning: \n", ".ix is deprecated. 
Please use\n", ".loc for label based indexing or\n", ".iloc for positional indexing\n", "\n", "See the documentation here:\n", "http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate_ix\n", " \"\"\"Entry point for launching an IPython kernel.\n" ] } ], "source": [ "# Features: every column after Survived (Age, SibSp, Parch, male, Q, S)\n", "X = titanic_dmy.iloc[:, [1, 2, 3, 4, 5, 6]].values\n", "# Target: the Survived column\n", "y = titanic_dmy.iloc[:, 0].values" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Deploying and evaluating the model" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", " verbose=0, warm_start=False)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "LogReg = LogisticRegression()\n", "LogReg.fit(X_train, y_train)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y_pred = LogReg.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[137, 27],\n", " [ 34, 69]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import confusion_matrix\n", "# Use a separate name so the imported function is not overwritten\n", "cm = confusion_matrix(y_test, y_pred)\n", "cm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the confusion matrix, rows are the true classes and columns are the predicted classes. The diagonal holds the correct predictions: 137 passengers correctly predicted as not surviving and 69 correctly predicted as surviving. The off-diagonal entries are the errors: 27 non-survivors predicted as survivors and 34 survivors predicted as non-survivors." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0 0.80 0.84 0.82 164\n", " 1 0.72 0.67 0.69 103\n", "\n", "avg / total 0.77 0.77 0.77 267\n", "\n" ] } ], "source": [ "print(classification_report(y_test, y_pred))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 1 }