{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic regression exercise with Titanic data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "- Data from Kaggle's Titanic competition: [data](https://github.com/justmarkham/DAT8/blob/master/data/titanic.csv), [data dictionary](https://www.kaggle.com/c/titanic/data)\n", "- **Goal**: Predict survival based on passenger characteristics\n", "- `titanic.csv` is already in our repo, so there is no need to download the data from the Kaggle website" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Read the data into Pandas" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
PassengerId
103Braund, Mr. Owen Harrismale2210A/5 211717.2500NaNS
211Cumings, Mrs. John Bradley (Florence Briggs Th...female3810PC 1759971.2833C85C
313Heikkinen, Miss. Lainafemale2600STON/O2. 31012827.9250NaNS
411Futrelle, Mrs. Jacques Heath (Lily May Peel)female351011380353.1000C123S
503Allen, Mr. William Henrymale35003734508.0500NaNS
\n", "
" ], "text/plain": [ " Survived Pclass \\\n", "PassengerId \n", "1 0 3 \n", "2 1 1 \n", "3 1 3 \n", "4 1 1 \n", "5 0 3 \n", "\n", " Name Sex Age \\\n", "PassengerId \n", "1 Braund, Mr. Owen Harris male 22 \n", "2 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38 \n", "3 Heikkinen, Miss. Laina female 26 \n", "4 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35 \n", "5 Allen, Mr. William Henry male 35 \n", "\n", " SibSp Parch Ticket Fare Cabin Embarked \n", "PassengerId \n", "1 1 0 A/5 21171 7.2500 NaN S \n", "2 1 0 PC 17599 71.2833 C85 C \n", "3 0 0 STON/O2. 3101282 7.9250 NaN S \n", "4 1 0 113803 53.1000 C123 S \n", "5 0 0 373450 8.0500 NaN S " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/titanic.csv'\n", "titanic = pd.read_csv(url, index_col='PassengerId')\n", "titanic.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Create X and y\n", "\n", "Define **Pclass** and **Parch** as the features, and **Survived** as the response." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "feature_cols = ['Pclass', 'Parch']\n", "X = titanic[feature_cols]\n", "y = titanic.Survived" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Split the data into training and testing sets" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.cross_validation import train_test_split\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 4: Fit a logistic regression model and examine the coefficients\n", "\n", "Confirm that the coefficients make intuitive sense." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[('Pclass', -0.88188860564511296), ('Parch', 0.34239215857498861)]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "logreg = LogisticRegression(C=1e9)\n", "logreg.fit(X_train, y_train)\n", "zip(feature_cols, logreg.coef_[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 5: Make predictions on the testing set and calculate the accuracy" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# class predictions (not predicted probabilities)\n", "y_pred_class = logreg.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.668161434978\n" ] } ], "source": [ "# calculate classification accuracy\n", "from sklearn import metrics\n", "print metrics.accuracy_score(y_test, y_pred_class)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 6: Compare your testing accuracy to the null accuracy" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 0.573991\n", "dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this works regardless of the number of classes\n", "y_test.value_counts().head(1) / len(y_test)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.5739910313901345" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this only works for binary classification problems coded as 0/1\n", "max(y_test.mean(), 1 - y_test.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Confusion matrix of Titanic predictions" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[105 23]\n", " [ 51 44]]\n" ] } ], "source": [ "# print confusion matrix\n", "print metrics.confusion_matrix(y_test, y_pred_class)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# save confusion matrix and slice into four pieces\n", "confusion = metrics.confusion_matrix(y_test, y_pred_class)\n", "TP = confusion[1][1]\n", "TN = confusion[0][0]\n", "FP = confusion[0][1]\n", "FN = confusion[1][0]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True Positives: 44\n", "True Negatives: 105\n", "False Positives: 23\n", "False Negatives: 51\n" ] } ], "source": [ "print 'True Positives:', TP\n", "print 'True Negatives:', TN\n", "print 'False Positives:', FP\n", "print 'False Negatives:', FN" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.463157894737\n", "0.463157894737\n" ] } ], "source": [ "# calculate the sensitivity\n", "print TP / float(TP + FN)\n", "print 44 / float(44 + 51)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.8203125\n", "0.8203125\n" ] } ], "source": [ "# calculate the specificity\n", "print TN / float(TN + FP)\n", "print 105 / float(105 + 23)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# store the predicted probabilities\n", "y_pred_prob = logreg.predict_proba(X_test)[:, 1]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEPCAYAAABY9lNGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF8tJREFUeJzt3X2UJXV95/H3hxkJ4oIEUUDBAEYUE0FAEZ9ia0hEgoDR\noEYNEp9WEiWeqAtmI0POiVGz8SkeXUXBURcEBBE8rDISW90oAiKIIqJGjIgMPuEisjgj3/2jqplL\n8+uZ29Pd996Zeb/O6TN16/6q6ntruutz6+lXqSokSZptq3EXIEmaTAaEJKnJgJAkNRkQkqQmA0KS\n1GRASJKaliwgkpyaZHWSqwfG7ZhkVZLrklyUZIeB905M8u0k1yb546WqS5I0nKXcgzgNOHTWuBOA\nVVW1N3Bx/5okjwCeAzyin+bdSdy7kaQxWrKNcFV9Afj5rNFHACv74ZXAUf3wkcAZVbWmqq4HvgMc\ntFS1SZI2bNTf0neuqtX98Gpg5374gcANA+1uAB40ysIkSXc3tsM41fXxsb5+PuwDRJLGaPmIl7c6\nyS5VdVOSXYGb+/E/BHYfaLdbP+5ukhgakrQRqirznWbUexDnA8f0w8cA5w2Mf26SrZPsCTwUuLQ1\ng6ryp4qTTjpp7DVMyo/rwnXhulj/z8Zasj2IJGcATwZ2SvID4A3Am4CzkrwYuB44GqCqrklyFnAN\nsBY4rhbyqSRJC7ZkAVFVz5vjrUPmaP9G4I1LVY8kaX6812ATNTU1Ne4SJobrYh3XxTqui4XLpnQk\nJ4lHniRpnpJQm8BJaknSJsKAkCQ1GRCSpCYDQpLUZEBIkpoMCElSkwEhSWoyICRJTQaEJKnJgJAk\nNY36eRBaAnfccQdr165dlHktW7aMbbbZZlHmJWnTZl9Mm4GXvvQ4Tj31/Wy11b0WNJ+qtRx55LM4\n55zTF6kySZNgY/ticg9iM7B2Ldx559u5887jFjin0/n1rz+5KDVJ2vR5DkKS1GRASJKaDAhJUpMB\nIUlqMiAkSU0GhCSpyYCQJDUZEJKkJgNCktRkQEiSmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCS\npCYDQpLUZEBIkpoMCElSkwEhSWoyICRJTWMJiCQnJvlGkquTnJ7kt5LsmGRVkuuSXJRkh3HUJknq\njDwgkuwBvBQ4oKoeCSwDngucAKyqqr2Bi/vXkqQxGccexP8F1gDbJlkObAvcCBwBrOzbrASOGkNt\nkqTeyAOiqn4G/Avwn3TBcEtVrQJ2rqrVfbPVwM6jrk2StM7yUS8wyUOAvwH2AH4BnJ3kBYNtqqqS\nVGv6FStW3DU8NTXF1NTUUpUqSZuk6elppqenFzyfkQcE8Gjgi1X1U4Ak5wKPA25KsktV3ZRkV+Dm\n1sSDASFJuqfZX55PPvnkjZrPOM5BXAscnOTeSQIcAlwDXAAc07c5BjhvDLVJknoj34OoqquSfAi4\nHLgTuAJ4H7AdcFaSFwPXA0ePujZJ0jrjOMREVb0FeMus0T+j25uQJE0A76SWJDUZEJKkJgNCktRk\nQEiSmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUZEBIkpoMCElSkwEhSWoyICRJTQaE\nJKnJgJAkNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlqMiAkSU0GhCSpyYCQJDUZEJKkJgNCktRkQEiS\nmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUNJaASLJDko8l+WaSa5I8NsmOSVYluS7J\nRUl2GEdtkqTOuPYg3gFcWFX7APsC1wInAKuqam/g4v61JGlMRh4QSe4LPKmqTgWoqrVV9QvgCGBl\n32wlcNSoa5MkrbPBgEjyyEVe5p7Aj5OcluSKJKckuQ+wc1Wt7tusBnZe5OVKkuZhmD2I9yS5LMlx\n/bf/hVoOHAC8u6oOAG5j1uGkqiqgFmFZkqSNtHxDDarqiUn2Bv4SuCLJpcBpVXXRRi7zBuCGqrqs\nf/0x4ETgpiS7VNVNSXYFbm5NvGLFiruGp6ammJqa2sgyJGnzND09zfT09ILnk+7L+hANk+V05wXe\nCfyCbu/j9VV1zrwXmnweeElVXZdkBbBt/9ZPq+rNSU4AdqiqE2ZNV8PWuyU59tjj+OAHfx84boFz\nOp3DD/8kF1xw+mKUJWlCJKGqMt/pNrgHkWQ/4EXA4cAq4PCquiLJA4FLgHkHBPBK4H8l2Rr4LnAs\nsAw4K8mLgeuBozdivpKkRbLBgKDbY/gA8HdV9auZkVV1Y5L/vjELraqrgMc03jpkY+YnSVp8wwTE\nnwC3V9VvAJIsA7apqtuq6kNLWp0kaWyGuYrpM8C9B15vS3eoSZK0GRsmILapql/OvKiqW1l3UlmS\ntJkaJiBuS3LgzIskjwZuX7qSJEmTYJhzEH9Dd3XRj/rXuwLPWbqSJEmTYJgb5S5Lsg/wMLq7m79V\nVWuWvDJJ0lgNswcB8Gi6PpSWAwf0N114BZMkbcaGuVHuI8BewJXAbwbeMiAkaTM2zB7EgcAj7ONC\nkrYsw1zF9HW6E9OSpC3IMHsQ9weu6XtxvaMfV1V1xNKVJUkat2ECYkX/bwEZGJYkbcaGucx1Oske\nwO9W1WeSbDvMdJKkTdswjxx9GXA28N5+1G7Ax5eyKEnS+A2zJ/BXwEF0z36gf8jPA5a0Kknzksz7\nWTBz8oJFzRgmIO6oqjtmfgH7J8v5GyRNnMX4s1y8oNGmb5jLXD+X5O+AbZP8Ed3hpguWtixJ0rgN\nExAnAD8GrgZeDlwIbNST5CRJm45hrmL6DfC+/keStIUYpi+m7zVGV1XttQT1SJImxDAnqR8zMLwN\n8GzgfktTjiRpUmzwHERV/WTg54aqejvwJyOoTZI0RsMcYjqQddfPbUX3bIhlS1mUJGn8hjnE9C+s\nC4i1wPXA0UtVkCRpMgxzFdPUCOqQJE2YYQ4x/S33vEXzrl5dq+qti16VJGnshn2i3GOA8+mC4XDg\nMuC6JaxLkjRmwwTE7sABVXUrQJKTgAur6vlLWpkkaayG6WrjAcCagddr+nGSpM3YMHsQHwIuTXIu\n3SGmo4CVS1qVJGnshrmK6R+TfAp4Yj/qRVX11aUtS5I0bsMcYgLYFri1qt4B3JBkzyWsSZI0AYZ5\n5OgK4HV03X4DbA18ZAlrkiRNgGH2IJ4JHAncBlBVPwS2W8qiJEnjN0xA3FFVd868SHKfJaxHkjQh\nhgmIs5O8F9ghycuAi4H3L21ZkqRxW+9VTEkCnAk8HLgV2Bv4+6patdAFJ1kGXA7cUFXPSLJjv6zf\noe8QsKpuWehyJEkbZ5g9iAur6qKqek3/s+Bw6B0PXMO6fp5OAFZV1d50eyknzDWhJGnprTcgqqqA\nryQ5aDEXmmQ34DC6Q1UzHf8dwbob8FbS3ZAnSRqTYe6kPhh4QZLv01/JRJcd+y5guW8DXgtsPzBu\n56pa3Q+vBnZewPwlSQs0Z0AkeXBV/SfwNLrDQJmr7XwkORy4uaq+mmSq1aaqKsnsLsYlSSO0vj2I\nTwD7V9X1Sc6pqmct0jIfDxyR5DBgG2D7JB8GVifZpapuSrIrcHNr4hUrVtw1PDU1xdTU1CKVJUmb\nh+npaaanpxc8n2EOMQHsteAl9arq9cDrAZI8GXhNVb0wyVuAY4A39/+e15p+MCAkSfc0+8vzySef\nvFHzGbYvpqU0cyjpTcAfJbkOeGr/WpI0Juvbg9g3ya398L0HhqE7TbB9a6L5qKrPAZ/rh38GHLLQ\neUqSFsecAVFVy0ZZiCRpskzCISZJ0gQyICRJTQaEJKlp2MtcpbHp+oxcHF3vMZKGYUBoE7EYG/bF\nCxppS+AhJklSkwEhSWoyICRJTQaEJKnJgJAkNRkQkqQmA0KS1GRASJKaDAhJUpMBIUlqMiAkSU0G\nhCSpyYCQJDUZEJKkJgNCktRkQEiSmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUZEBI\nkpoMCElSkwEhSWoyICRJTQaEJKnJgJAkNRkQkqSmkQdEkt2TfDbJN5J8Pcmr+vE7JlmV5LokFyXZ\nYdS1SZLWGccexBrg1VX1e8DBwF8l2Qc4AVhVVXsDF/evJUljMvKAqKqbqurKfviXwDeBBwFHACv7\nZiuBo0ZdmyRpnbGeg0iyB7A/8GVg56pa3b+1Gth5TGVJkoDl41pwkv8CnAMcX1W3JrnrvaqqJNWa\nbsWKFXcNT01NMTU1tbSFStImZnp6munp6QXPZywBkeRedOHw4ao6rx+9OskuVXVTkl2Bm1vTDgaE\nJOmeZn95PvnkkzdqPuO4iinAB4BrqurtA2+dDxzTDx8DnDd7WknS6IxjD+IJwAuAryX5aj/uROBN\nwFlJXgxcDxw9htokSb2RB0RV/R/m3nM5ZJS1SJLm5p3UkqQmA0KS1GRASJKaDAhJUpMBIUlqMiAk\nSU0GhCSpyYCQJDUZEJKkJgNCktRkQEiSmgwISVKTASFJahrbE+U0mT75yTNIzhh3GZImgAGhhubT\nXucpizSfmXlJGjUPMUmSmgwISVKTASFJajIgJElNBoQkqcmAkCQ1GRCSpCYDQpLUZEBIkpq8k1rS\noksW7+73qsW6I1/zZUBIWiKL1WWLxsVDTJKkJgNCktRkQEiSmgwISVKTASFJavIqJm1RFuvyy8W4\n9HIxLwVdTJNal0bPgNAWZtIuvZzEp+5N2jrSuHiISZLUZEBIkpomKiCSHJrk2iTfTvLfxl2PJG3J\nJiYgkiwD3gUcCjwCeF6SfcZb1eSanp4edwkTZHrcBUyQ6XEXMDH8G1m4iQkI4CDgO1V1fVWtAT4K\nHDnmmiaWv/yDpsddwASZHncBE8O/kYWbpIB4EPCDgdc39OMkSWMwSZe52qfvRtpqK9hmm/ew9db/\ne0HzWbPmBm6/fZGKkrTJy6T0tZ7kYGBFVR3avz4RuLOq3jzQZjKKlaRNTFXN++aUSQqI5cC3gD8E\nbgQuBZ5XVd8ca2GStIWamENMVbU2yV8DnwaWAR8wHCRpfCZmD0KSNFkm6Sqmuwxzw1ySd/bvX5Vk\n/1HXOCobWhdJnt+vg68l+fck+46jzlEY9kbKJI9JsjbJn46yvlEa8m9kKslXk3w9yfSISxyZIf5G\ndkryqSRX9uviRWMoc8klOTXJ6iRXr6fN/LabVTVRP3SHl74D7AHcC7gS2GdWm8OAC/vhxwKXjLvu\nMa6LxwH37YcP3ZLXxUC7fwM+CTxr3HWP8fdiB+AbwG79653GXfcY18UK4J9m1gPwU2D5uGtfgnXx\nJGB/4Oo53p/3dnMS9yCGuWHuCGAlQFV9Gdghyc6jLXMkNrguqupLVfWL/uWXgd1GXOOoDHsj5SuB\njwE/HmVxIzbMuvhz4JyqugGgqn4y4hpHZZh18SNg+354e+CnVbV2hDWORFV9Afj5eprMe7s5iQEx\nzA1zrTab44ZxvjcPvhi4cEkrGp8NroskD6LbOLynH7W5nmAb5vfiocCOST6b5PIkLxxZdaM1zLo4\nBfi9JDcCVwHHj6i2STPv7ebEXMU0YNg/6tnX9G6OG4OhP1OSpwB/CTxh6coZq2HWxduBE6qq0j31\nZnN9KMEw6+JewAF0l41vC3wpySVV9e0lrWz0hlkXrweurKqpJA8BViXZr6puXeLaJtG8tpuTGBA/\nBHYfeL07XdKtr81u/bjNzTDrgv7E9CnAoVW1vl3MTdkw6+JA4KP9E9F2Ap6eZE1VnT+aEkdmmHXx\nA+AnVXU7cHuSzwP7AZtbQAyzLh4P/CNAVX03yfeAhwGXj6TCyTHv7eYkHmK6HHhokj2SbA08B5j9\nB34+8Bdw1x3Yt1TV6tGWORIbXBdJHgycC7ygqr4zhhpHZYProqr2qqo9q2pPuvMQr9gMwwGG+xv5\nBPDEJMuSbEt3UvKaEdc5CsOsi2uBQwD6Y+4PA/5jpFVOhnlvNyduD6LmuGEuycv7999bVRcmOSzJ\nd4DbgGPHWPKSGWZdAG8Afht4T//NeU1VHTSumpfKkOtiizDk38i1ST4FfA24Ezilqja7gBjy9+KN\nwGlJrqL7Uvy6qvrZ2IpeIknOAJ4M7JTkB8BJdIcaN3q76Y1ykqSmSTzEJEmaAAaEJKnJgJAkNRkQ\nkqQmA0KS1GRASJKaDAjdQ5Lf9N1EX53krCT3XsC8PpjkWf3wKUn2WU/bJyd53EYs4/okO25sjYs1\n3yQrkvxtY/wDk5zdD08luaAffsZM99RJjlrfupln3Q/vu7b+SpI9F2OeG1jegUnesZHTvijJvy52\nTVocBoRaflVV+1fVI4FfA/918M10j4cdVvU/VNVLa/1PCXwKXbcI87XRN/MkWbaB+c6nP6dmHVV1\nY1X9WWP8BbXumetHAY+Yx7LW5yjg7Ko6sKq+txgzXN//eVV9pao2tgM8b8SaYAaENuQLwO/23+6/\nkOQTwNeTbJXkn5Nc2j985GUA6byrf4DLKuABMzNKMp3kwH740P4b7pVJViX5HeDlwKv7vZcnJLl/\nko/1y7g0yeP7ae+X5KJ0D385hTk24kl+meStfbvPJNlpoI63JbkMOD7JHya5It1Dlz7Qd9kw43X9\n+C/3Hb3NfPO/pJ9mVZIHDLTfL8kXk1yX5CV9+z3SeIjLzLfnfq/pGcA/9/PcK8lXBto9dPD1wPhH\n9XVcleTcJDskOYyut9JXJPm3We2X9Xt0V/ef6fjG/8tO6foqmqnv/CQXA59JckY//5n5fTDJs2b2\nivr/++8lue9Am2/3/4/rW2eaUAaE5tR/azyMrrsG6B5G8qqqejjwErq+XA6i65P/pUn2AJ4J7A3s\nQ9fvy+AeQQGV5P7A+4A/rapHAX9WVd8H/ifw1n7v5d+BdwBv65fxbOD9/XxOAj5fVb8PfBx48Bwf\nYVvgsr7d5/rpZuq4V1U9Bng3cBpwdFXtS9f9zCsG5nFLP/5ddL3FAnyhqg6uqgOAM4HXzawyYF+6\nPaHHAW9Issscta1bKVVfousn5zVVdUBV/QfwiyT79U2OBU5tTPoh4LVVtR9wNXBSVV3IuvX41Fnt\nHwU8sKoe2X+m0wbWx1zf5Pene/DSVP9ZjwboQ/SpdA9mmvkcRdcH1DP7No8FvldVP2b960wTyoBQ\ny72TfBW4DLiebuMU4NJ+Qw7wx8Bf9O0uAXakewbBk4DTq/Mjuqe7DQpwMN0G/vsAVXXLrPdnHAK8\nq1/GJ4DtktynX8ZH+mkvZO6HpNxJtzGib//Egfdmxj+MbiM209HhSuAPBtqd0f/7UbqNPsDu/R7M\n14DXsO7QUAHnVdUdVfVT4LN0neQNa/Czvx84NslWdBvl0+/WsPuWft/+ITGz656rq/PvAnule+zk\n04Bhuru+aOD/51PAU/pweDrwuaq6Y1b7M+k6zAN4LuvW81zrTBPMgFDL7f23+P2r6vj+SV3QdfA1\n6K8H2j2kqlb14zf0rXA+z/x47MAydq+q2wbem4/MWu7szzJXu0Ez4/8VeGf/LfzlwPpO4t85jxoH\nl3sO3Ub4cODyIbpxH1wfc50LuYVuD2ea7rzSzB7ZWtZtC7aZNdmvBqb/f/20T6MLrTO5p0voDknu\nRPfwpnP78fNZZ5oQBoQ21qeB4/rDUCTZO1230p8HntOfo9iV7nDLoKLbiPxBf0iKrLtS6FZgu4G2\nFwGvmnkxcMjl83SP1CTJ0+l6s23ZCpg5OfzndOdT7ppd/++3gD1mzi8AL6Q7HDXTZubb8HOAL/bD\n2wM39sMvmjXPI5P8VpL7AVN0e2HDuJV1j8Wk/2b+abqn4502u3H/mNmfJ5nZK3oh3cZ78LPdTV/T\n8qo6F/h7usNH0O0lProffvYG6jyT7sFUT6Lbo5hdV9Ed9nsbcM1AsM21zjTBDAi1tL6Bzj5O/X66\n5wtc0Z+AfQ+wrKo+TvdQmmvoDnt88R4z6p6P/DLg3CRXsu4wzgXAM2dOUtOFw6P7k7DfoPvmCXAy\nXcB8ne549/dpuw04qK9vCviH2Z+x/1Z8LHB2f/hjLd0x/Jk2v52um+hXAq/ux6/o219O9+zrGmj/\nNbpDS18C/qGqbhpc3qzhwXX6UeC1ufulqafT7YFcNMfnO4buxPZVdHsGM59vrnMKDwI+2x+y+zBw\nYj/+f9Cd1L4CuN8c9c24iO5Q1qqB5zrPbncm8HzuvoexgrnXmVcyTSi7+9ZmK8mtVbXdhltOpiSv\nAbarqpM22FhaAhP3wCBpEW2y336SfBzYk+5KIWks3IOQJDV5DkKS1GRASJKaDAhJUpMBIUlqMiAk\nSU0GhCSp6f8DzF7yGFpruC0AAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# histogram of predicted probabilities\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.hist(y_pred_prob)\n", "plt.xlim(0, 1)\n", "plt.xlabel('Predicted probability of survival')\n", "plt.ylabel('Frequency')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# increase sensitivity by lowering the threshold for predicting survival\n", "import numpy as np\n", "y_pred_class = np.where(y_pred_prob > 0.3, 1, 0)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[105 23]\n", " [ 51 44]]\n" ] } ], "source": [ "# old confusion matrix\n", "print confusion" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[72 56]\n", " [32 63]]\n" ] } ], "source": [ "# new confusion matrix\n", "print metrics.confusion_matrix(y_test, y_pred_class)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.663157894737\n" ] } ], "source": [ "# new sensitivity (higher than before)\n", "print 63 / float(63 + 32)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.5625\n" ] } ], "source": [ "# new specificity (lower than before)\n", "print 72 / float(72 + 56)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }