{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Supervised Learning: Logistic Regression\n", "\n", "#### Overview\n", "\n", "In this lecture we will use logistic regression to classify a data set. We will focus on binary classification, where there are only two possible outputs: positive or negative (1 or 0). For example, we could classify email as either spam or not spam, or tumors as either malignant or benign. In both cases we have a set of data and features, but only two possible outputs. It is possible to have more than two classes, but for now we will focus on binary classification.\n", "\n", "To perform this classification we will use the logistic function.\n", "\n", "Here is an overview of what we will do throughout this lecture:\n", "\n", " 1. Basic Mathematical Overview of the Logistic Function\n", " 2. Extra Math Resources\n", " 3. Dataset Analysis\n", " 4. Data Visualization\n", " 5. Data Preparation\n", " 6. Multicollinearity Consideration\n", " 7. Logistic Regression with SciKit Learn\n", " 8. Testing and Training Data Sets\n", " 9. Conclusion and More Resources\n", " \n", "We'll start with our imports before continuing the lecture, because we want to plot a few things during the explanation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----------\n", "#### IMPORTS\n", "\n", "##### Module Install Notice!\n", "You'll need to install a new module we haven't used before: [Statsmodels](http://statsmodels.sourceforge.net/).\n", "\n", "You can install it with `pip install statsmodels` or `conda install statsmodels`, depending on your Python installation. 
In this lecture we will only use one of its datasets, but it can do quite a bit more, including many of the statistical computations that SciKit Learn performs.\n" ] }, { "cell_type": "code", "execution_count": 255, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Data Imports\n", "import numpy as np\n", "import pandas as pd\n", "from pandas import Series, DataFrame\n", "\n", "# Math\n", "import math\n", "\n", "# Plot imports\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "sns.set_style('whitegrid')\n", "%matplotlib inline\n", "\n", "# Machine Learning Imports\n", "from sklearn.linear_model import LogisticRegression\n", "# train_test_split lives in sklearn.model_selection;\n", "# the old sklearn.cross_validation module was removed in scikit-learn 0.20\n", "from sklearn.model_selection import train_test_split\n", "\n", "# For evaluating our ML results\n", "from sklearn import metrics\n", "\n", "# Dataset Import\n", "import statsmodels.api as sm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 1: Basic Mathematical Overview\n", "\n", "First, let's take a look at the [Logistic Function](http://en.wikipedia.org/wiki/Logistic_function). The logistic function can take any input from negative to positive infinity, and it always has an output between 0 and 1. 
The logistic function is defined as:\n", "$$ \\sigma (t)= \\frac{1}{1+e^{-t}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A graph of the logistic function looks like this (following code):" ] }, { "cell_type": "code", "execution_count": 224, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 224, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAXMAAAEKCAYAAADgl7WbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAH3pJREFUeJzt3XmcHFW9/vHPZGPLAkoCKiCichSVxQkSArKDgpQFXAFR\n", "QCLwQ1aRwst2wSAIXLkFIpussl1RokBZoIAKKEaBnw0ILnwxARRkX0JICFnn/nFqoJnMTE8y3XO6\n", "q5/369Wvnu7qTD8FyTNnTled6ujq6kJERFrbsNABRERk8FTmIiIloDIXESkBlbmISAmozEVESkBl\n", "LiJSAiNCB5Dyc87dDZxvZj+r0/d7ENjazGb3sX0ccJOZbTeQ1/eRdx3gtaqnu8zsk4MKXuecItVU\n", "5jIUuopbXZjZJjVeshqw6TK8vqcu4Fgzu3FZsy2jweYUeYvKXIJyzv0/4EhgMfA8cISZ/cM5Nx74\n", "IbAe8HKx7REzO9U5twRYHRgFXAO8u/h2t5rZKcWfW8k59wAwEVgErG5mrzjnTgD2L577B3BAHyPh\n", "jj7y3k3VbxnF4++b2Y3OuTeBM4EdgfcC55nZecXrlnrfGjlPBr5YPPdY8d/l+eL9/gBsgf/t4R7g\n", "K2ams//anObMJRjn3HbAN4FtzGxj4EfAzcXm7+PLewNgT2Bz3jm67wAOBmaaWSfwaeDDzrkx+KKc\n", "Z2afNLMlVe/3eeArwCQz+wTwBHBEL9E6gLOdcw9W3T5bbOv5W0b116OAF81sS+ALwFnOuVF9vO/h\n", "/eScAnwWmGhmGwF/Aa6qep/1zGxr4BPAdsDWveyDtBmNzCWkzwI/NrOXAczsaufcec65dYGdgU2K\n", "559zzv20x5/tAn4J/MI5tw7wa+B4M3vdOfdultYB7ADcYGavFd836SPXYKZZsuL+QWAFYHRf71vs\n", "Z285dwauNLN5xXPfB05yzo0ssuXF95njnJuBn66RNqeRuYTUwdLTGR3ASPz0QvXfzyU9XoeZ/Qn4\n", "AHApsC5wv3Nu837eb2H1A+fc2OIHwbLo6pFrVI/t84ps3SP2juV4357/XYbhB17dz82r2tZFH1NC\n", "0l5U5jJUeiuc24G9nXOrw1vTCy8BM4BbgQOL598N7EaPaRbn3FnAyWaWAUcDfwU+jP9BMLzHe3Xh\n", "R+97FFMxAN8G+hqd9+VF/Pw2zrkPAhvWeH1f73sMvuR7y3k7MMU5t3Lx3FHAb81sQfFY5S1L0TSL\n", "DJVrnXNXVT2+wMxOcM6dC9zpnBsGvADsamZdzrlvAJc75x7GfwD6T+CN4s92z1ufC1ztnHsEmA88\n", "BFyP/zD1Aefc34Ati9diZr90zm0ATHfOgZ+LPngZ9+P04j0/BzwK/LZqW88PIWu977zecgJXAGvj\n", 
"f9MYhv/A9Mv9vI8IHVoCV5qRc+5Q4EEzu9c5twLwO+AUM7s9cDSRpjSgkblzbjPgLDPbtsfzEXAy\n", "/tfaK83s8vpHlDb1N+B859xw/Lz0DSpykb7VHJk75/4T2BeYY2aTq54fif8HNxH/6+90/K/ILzQu\n", "roiI9GYgH4DOAPZg6Q9dPgrMMLPXzGwh8HtgqzrnExGRAahZ5sWxtot62TSWd65d8Towrk65RERk\n", "GQzmaJbXgDFVj8cAr/b3ByqVij5tFRFZDp2dnf0ekjqYMn8Uf/r0asBc/BTL2YMN1MoqlUqX9q91\n", "af9aV1/7FiXZSsD7ittaxe19wBrA+Krb6ix9zH9/FuAHtK8Bc/CfG/a8ze3luXnAm/hDaRcM9H7q\n", "l9aaUyvQspR5F4Bzbh9gtJld5pw7Bn+CwzDgCjN7dhm+n4jIoERJNgJYZ79tV2dqkn0N+BDwQfyZ\n", "wWvx9iJsfZmFPxFsZnHffXup2PZab7c8jd+s/970rVKp1HzNgMrczJ4EJhdfX1/1/C3ALcsXT0Rk\n", "YKIkG4k/u/fjPW4fAEZce9dLABdX/ZE5wFPAA8C/gad73D8HvJSn8TuWWmhlOgNURJpKUdyfAD6F\n", "X+99Iv7ouZE9XvoK8P+BmVt/fMy+v/3L6/vhR9gzgRfzNG6rz+hU5iISVJRk4/DLGWxT3G+CX3Gy\n", "2xv4EfZfetye7y7sSqWy77FTtrtuCGM3HZW5iAypKMlWxBf39sX9J3n7MOlFwCPA/fhR9/3A3/M0\n", "7u3waKmiMheRhouS7H3A54Bd8SXevSLkQvyVk+4ubn/M0/iNXr6F1KAyF5GGKAp8T/zl7zar2vR3\n", "/IETt6PyrhuVuYjUTZRkq+LL+4v4c0868EsS/xr4OXBrnsaPh0tYXipzERmUKMk68IcuHwzsBaxU\n", "bLoHv778z/I01gJ8DaYyF5HlUpxduR/wdWCD4umZwOXAdXkaPx0qWztSmYvIMomSbDxwGHA4/lT4\n", "hcBP8NdivTtP46Wu1yqNpzIXkQGJkmwN4DjgUGBF/OnuZwIX5Gn8TMhsojIXkRqiJFsd+CZwBP6Q\n", "wn8B/wP8ME/jmgtAydBQmYtIr4qTe44GTsQvcf0MvtSvyNN4fshssjSVuYi8Q3F0SgykwHr4FQRP\n", "AS7J03heyGzSN5W5iLwlSjIHXADsgD+1/hzgtDyNZwUNJjWpzEWke13wBDgVv8jVL4Bj8jS2oMFk\n", "wFTmIm0uSrKNgCvxC149Bxyep/GNYVPJslKZi7SpKMmG4w81PBXfBVfhR+P9XstXmpPKXKQNFYtg\n", "XYdfgvbfwIF5Gt8eNJQMyrDaLxGRMomSbFfgYXyR3wxspCJvfSpzkTaxpKuLKMmmAjn+5J/DgD3y\n", "NH45aDCpC02ziLSBKMnGrf/eFQG+BTwJ7J6n8UNBQ0ldaWQuUnJRkn0IuP+xZ94E+BUwUUVePipz\n", "kRKLkmwS8Edg/ckfHQ2wi6ZVykllLlJSUZLtDtwFrAYcstMmq6ILI5eXylykhKIkOxT4Gf6SbZ/P\n", "0/jSwJGkwfQBqEjJREl2LHA28AKwc57GDwSOJENAZS5SEsVqh98qbk8DO2htlfahMhcpgaLIzwL+\n", "E3gc2D5P4yeDhpIhpTlzkXI4FV/kBmylIm8/GpmLtLgoyU4ATgZmAtvpepztSSNzkRYWJdnRwBn4\n", "63JuryJvXypzkRYVJdn+wLnAs/gR+T8DR5KAVOYiLShKsp2AK4BZ+KNWZgaOJIGpzEVaTJRkm/DO\n", "E4L+FjiSNAF9ACrSQqIkWxd/fc5VgL3yNL4nbCJpFipzkRYRJdkY/FrkawJH52n808CRpIn0W+bO\n", 
"uWHARcCGwHzgIDObWbV9d+BEoAu40sx+0MCsIm0rSrJhwDXAx4EL8zQ+L3AkaTK15sx3A0aZ2WTg\n", "eCDtsf0cYEdgCyBxzo2rf0QRAabi/z3eBXwjbBRpRrXKfAvgNgAzuw+Y2GP7QmBVYCWgAz9CF5E6\n", "ipJsL/xJQU8Ae+ZpvDBwJGlCtcp8LDC76vHiYuqlWwpUgL8AuZlVv1ZEBilKsg2AHwJz8Eeu6MIS\n", "0quOrq6+B9POuRS418ymFY+fMrO1i6/XAW4FNgfeAK4DbjSzPj+UqVQqGrmLDNCCRUu49LYXeGn2\n", "Ivbc8l18bJ2VQ0eSgDo7Ozv6217raJbpQARMc85NAh6u2rYi/jjX+Wa2xDn3An7KZVCBWlmlUunS\n", "/rWuZtq/YhXEq4H9gO/vv/unvz7Y79lM+1dvZd43GNhAuFaZ3wTs6JybXjye4pzbBxhtZpc5564G\n", "/uCcexOYAVw1mMAi8pYD8UV+P/DNwFmkBfRb5mbWBRza4+nHqrafi18bQkTqJEqyjYELgFeBvfM0\n", "XhA4krQAnTQk0kSiJFsFuAFYAfiC1iWXgdLaLCLNJQU+DKR5Gt8SOoy0DpW5SJOIkmxX4BD8gQYn\n", "BY4jLUZlLtIEoiSbgF/SdgGwb57G8wNHkhajOXORwIrDEC8DJgBJnsaPBI4kLUgjc5HwDgQ+D9wJ\n", "fC9wFmlRKnORgKIkez/+8N5ZwAF5Gi8JHElalKZZRAIpplcuAUbji/ypwJGkhWlkLhLOfsBngNvx\n", "a5WLLDeVuUgAUZKtiZ8fnwMckqexFqGTQdE0i0gY5wOrAUfkafzP0GGk9WlkLjLEoiTbA/gCflXS\n", "iwPHkZJQmYsMoSjJxgEX4q+pe6COXpF6UZmLDK3TgDWB0/M0ttBhpDxU5iJDJEqyTwKH45eRPjtw\n", "HCkZlbnIEIiSbBhwEf7f3OFae0XqTWUuMjQOBDYDfpKn8a9Dh5HyUZmLNFiUZOOB/wZeB44JHEdK\n", "SseZizTeWfhjyr+Rp/EzocNIOWlkLtJAUZJ9Cvgq/oITFwSOIyWmMhdpkGIhre4lbY/K03hRyDxS\n", "bipzkcbZB9gc+Fmexr8NHUbKTWUu0gBRkq2M/9BzPvDNwHGkDajMRRrjm8BawDl5Gj8ROoyUn8pc\n", "pM6iJFsbOA54DjgzcBxpEzo0UaT+zgRWwp/p+XroMNIeNDIXqaMoySYBXwYeAK4OHEfaiMpcpE6K\n", "QxHPKR4ereVtZSipzEXqZzf8oYg35Wl8T+gw0l5U5iJ1ECXZCPxc+WLghMBxpA2pzEXq46uAAy7X\n", "RSckBJW5yCBFSbYKcCrwRnEvMuRU5iKDdzT+UnDn5Gn8bOgw0p5U5iKDUKxVfhzwEroUnASkk4ZE\n", "BuckYAxwUp7Gs0OHkfalkbnIcoqSbD3gMOBx4JLAcaTNqcxFlt9pwEj8qHxB6DDS3vqdZnHOdV9R\n", "fEP8Up4HmdnMqu2bAinQAfwb2N/M9JdaSi9Ksg2BL+FP278hcByRmiPz3YBRZjYZOB5f3AA45zqA\n", "S4EDzOzTwG+ADzQqqEiTmVrcn6TT9qUZ1CrzLYDbAMzsPmBi1bb1gZeBY5xzdwOrmplOlpDSi5Js\n", "E2B34I/A7YHjiAC1y3wsUP0J/eJi6gVgdWAycD6wA7C9c27b+kcUaTpTi/tv5WncFTKISLdahybO\n", "xh921W2YmXX/SvkyMKN7NO6cuw0/cr+rv29YqVRK/Zdf+9faau3fv1/2HwmtM34UU3YYf0elUhmS\n", "XPVS5v9/Zd63gahV5tOBCJjmnJsEPFy17XFgtHPug8WHop8GLq/1hp2dnR3LG7bZVSqVLu1f6xrI\n", 
"/k1NsluBXf714oLtJk6c2O/ApdmU+f9fmfcNBvaDqlaZ3wTs6JybXjye4pzbBxhtZpc55w4EflR8\n", "GDrdzH45uMgizau48MQuwN15GrdUkUv59VvmZtYFHNrj6ceqtt8FbNaAXCLNaGpx/62QIUR6o5OG\n", "RAYgSrItgM8Av8nT+Heh84j0pDIXGZjupW01KpempDIXqSFKsq2B7YHb8zSeXuv1IiGozEVq06hc\n", "mp7KXKQfUZJtC2wN/CJP4/tC5xHpi8pcpA9RknUA3y4ealQuTU1lLtK3HYAtgZ/nafyn0GFE+qMy\n", "F+lFj1H51IBRRAZEZS7Su88Ak4Ab8zR+MHQYkVpU5iI99BiVn9rfa0WahcpcZGmfAzYFpuVp/HCt\n", "F4s0A5W5SJWqUXkXGpVLC1GZi7xTDGwC/DhP47+GDiMyUCpzkcKSri7wR64s4e05c5GWoDIXKTz6\n", "1DyAjYAf5Wn8aOA4Isuk1sUpRNpClGTDxo8bAbAYjcqlBWlkLuLt+eJriwCuzdP4H6HDiCwrlbm0\n", "vSjJhgNTO/wVJE8Lm0Zk+ajMRWBv4CMbr7cyeRo/HjqMyPJQmUtbi5JsBH5FxEVbfWxs6Dgiy01l\n", "Lu3uS8D6wBWrjdbxANK6VObStqIkGwmcAiwAzggcR2RQVObSzvYDPghcnqfxv0KHERkMlbm0pWJU\n", "fjIwHzgzcByRQVOZS7s6AFgXuCRP46fDRhEZPJW5tJ0oyVYA/gt4EzgrcByRulCZSzv6KrAOcFGe\n", "xs+GDiNSDypzaStRkq0InAjMA74bOI5I3ajMpd0cDKwFXJCn8fOhw4jUi8pc2kaUZCsDJwFz0Khc\n", "SkZlLu3kMGAN4Lw8jV8KHUaknlTm0haiJBsDHAe8BqSB44jUncpc2sWRwOrAOXkavxo6jEi9qcyl\n", "9KIkGwccC7wCfC9wHJGGUJlLO/gGsBpwdp7Gs0OHEWkElbmUWpRk78KX+YvABYHjiDSMFnCWsjsW\n", "GAskeRrPCR1GpFH6LXPn3DDgImBD/OpyB5nZzF5edynwspmd0JCUIsshSrLxwFHAs8DFgeOINFSt\n", "aZbdgFFmNhk4nl4O6XLOHQJ8HOiqfzyRQTkOWAU4I0/jeaHDiDRSrTLfArgNwMzuAyZWb3TOTQY+\n", "BVwCdDQioMjyiJLsvcDhwFPAZYHjiDRcrTIfC1R/+r+4mHrBOfce/CW3jkBFLs3nBGBF4PQ8jeeH\n", "DiPSaB1dXX3PjjjnUuBeM5tWPH7KzNYuvj4S+ArwOrAmsDJwspld09f3q1QqmoqRhnt1ziIuuOU5\n", "xqw0nCOjNRk+TGMNaX2dnZ39/kWudTTLdCACpjnnJgEPd28ws/OB8wGcc18BPtJfkQ80UCurVCpd\n", "2r/woiS7Bthv1tzF+35q04n/O9A/1yr7t7zKvH9l3jcY2EC4VpnfBOzonJtePJ7inNsHGG1mPech\n", "NeqW4KIk2xDYFz/wuD5wHJEh02+Zm1kXcGiPpx/r5XVX1zOUyCB8B/8Zzgl5Gi8JHUZkqOgMUCmN\n", "KMm2BHYFfgf8MnAckSGlMpdSiJKsA/jv4uFxeRpr2k/aispcyiICJgM352l8b+gwIkNNZS4tL0qy\n", "4cAZwBL8xZpF2o7KXMpgX+BjwFV5Gv89dBiREFTm0tKiJFsR+DZ+IbipYdOIhKMyl1Z3JLAOcEGe\n", "xk+FDiMSispcWlaxxO1/4S8H953AcUSC0sUppJVNxS8Gd5Qu0iztTiNzaUlRkm0AHAIY8IPAcUSC\n", "U5lLqzobGA4cm6fxwtBhREJTmUvLiZJsJ2AX4DfArYHjiDQFlbm0lCjJRgDn4FfpTHTavoinMpdW\n", 
"81X8CUJX5mn859BhRJqFylxaRpRkqwKnAXPxhySKSEGHJkorORWYAJyYp/FzocOINBONzKUlFFcQ\n", "OgL4B37OXESqqMyl6RVrlV+I//t6ZJ7G8wNHEmk6KnNpBV8GtgRuytP49tBhRJqRylyaWpRkY/En\n", "CL0JHBM4jkjTUplLs5sKrAmckafxk2GjiDQvlbk0rSjJNgaOAh7Hj85FpA8qc2lKxaXgLsOvv3Jo\n", "nsZvBo4k0tRU5tKsjgImAtflaXxH6DAizU5lLk0nSrJ1gdOBl9GHniIDojNApakUx5RfBKyMn155\n", "MXAkkZagkbk0m72BnYFfAdcGziLSMlTm0jSKa3qeB8wDvqblbUUGTtMs0hSK6ZWL8QtpHZun8eOB\n", "I4m0FI3MpVnsA/wHcA/wvcBZRFqOylyCi5LsffiFtOYCB+RpvDhwJJGWo2kWCaqYXrkcWBU/T67p\n", "FZHloJG5hHYw8FngDuDSwFlEWpbKXIKJkmx9/IUmXgMO1NErIstP0ywSRJRkKwA/BlYB9snT+OnA\n", "kURamkbmEspZwCbAlXka/zh0GJFW1+/I3Dk3DH9q9YbAfOAgM5tZtX0f4OvAIuAR4DAz06/K0q8o\n", "yXYFjgYexS+oJSKDVGtkvhswyswmA8cDafcG59xKwGnANma2JTAO2LVRQaUcisMQr8IPDvbO03hu\n", "2EQi5VCrzLcAbgMws/vwS5J2exPY3My615kegT8NW6RXUZKNAn4KvBtI8jR+OHAkkdKoVeZjgdlV\n", "jxcXUy+YWZeZvQjgnDsSWMXMft2YmFIS5wKTgB/hp+9EpE46urr6nuJ2zqXAvWY2rXj8lJmtXbV9\n", "GPBd4EPAF6tG6b2qVCqaT29TDz0+l5vvfZUJq47koJ3GM2qEPnsXWRadnZ0d/W2vdWjidCACpjnn\n", "JgE9fy2+BD/dsvtAP/isFaiVVSqVLu3f0qIk2wT4AzD/hVkLJ26+2aYz6p9u8PT/r3WVed9gYAPh\n", "WmV+E7Cjc2568XhKcQTLaOBPwFeB3wF3OucAzjOzm5c/spRNlGRrADcDKwJ75mnclEUu0ur6LfNi\n", "tH1oj6cfq/p6eN0TSWlESbYSkAHrACfnaXxL4EgipaWJS2mIKMmGAT8ENsNfMeg7YROJlJvKXBrl\n", "W/hLwP0eOFjrrog0lspc6i5KsoOBU4DHgd3zNJ4fOJJI6anMpa6iJNsD+AHwErBLnsYvBY4k0hZU\n", "5lI3UZJtC1wPvAHsnKexBY4k0jZU5lIXUZJ14o9c6cBPrfwpcCSRtqL1zGXQoiT7JPAr/PkHX8zT\n", "WMs6iAwxjcxlUIoi/zX+Gp5fydP4hsCRRNqSylyWW3GafneRH5Cn8bWBI4m0LZW5LJcoybYA7sQX\n", "+ZQ8ja8JHEmkranMZZlFSbYLfo58DLB/nsZXB44k0vZU5rJMoiT7Em8ftRLnaXxd4EgigspcBihK\n", "so57/job4H+BucBOeRrfGjaViHRTmUtNUZKtAFz1mz/PBngK2CpP43vCphKRaipz6VeUZOPx8+P7\n", "v+/dIwE+pWt3ijQfnTQkfYqSbBIwDVgLuOGA7SfsNWmzic8FjiUivVCZy1KiJOsAjgRS/G9vJwFn\n", "jRzRsVfQYCLSJ5W5vEOUZO/CX9v1C8ALwD55Gt8JUKlUQkYTkX6ozOUtUZLtCFwFvBe4B7/OyjNB\n", "Q4nIgKjMhSjJVgHOAI4CFgEnAt/N03hx0GAiMmAq8zZXnM15EfB+4O/AvnkaPxA2lYgsK5V5m4qS\n", "7D3AecCe+NH4mcBpeRrPCxpMRJaLyrzNREm2Iv5Ilf8CxgJ/AA7J0/gvQYOJyKCozNtElGTDgL3x\n", 
"I/D3A68AXwMuy9N4SchsIjJ4KvOSK44Z/yxwKrApsAA4GzgjT+NZIbOJSP2ozEuqKPEIOBmYWDz9\n", "E+CEPI2fCBZMRBpCZV4yxaJYewPHABsBXfhT8k/Xmioi5aUyL4koydYEDsXPg08AlgA/Ar6Tp/Hf\n", "QmYTkcZTmbewKMlG4ufDD8BPqYwEZgH/A1yYp/GTwcKJyJBSmbeYYi58Q2B/YF/8KBzgEeBi4Jo8\n", "jecGiicigajMW0BR4JsC/wHsAXyo2PQKcD7wQ+ChPI27wiQUkdBU5k0qSrJxwLbATvgplLWKTXOB\n", "G4rbLXkazw+TUESaicq8SRRHoXQCO+ALfBIwvNg8C7gGuBG4Q6fci0hPKvNAoiSbAGwObAFMxh8L\n", "vkKxeQlwH3AH/pJt9+dpvDBEThFpDSrzBivmu9cFNgE2rrqtXfWyxcCf8euk3A3cmafxq0MaVERa\n", "msq8TqIkW+lrO08gSrI9gfUBV9x/FL+gVbVngV8Af8QX+P15Gs8ZyrwiUi4q8wGKkmw0sA5+kaqe\n", "9+8H1v7BL18A/8Fkt4XADHxxPwg8BPw5T+Pnhy65iLSDfsvcOTcMf+GCDYH5wEFmNrNqe/faH4uA\n", "K83s8gZmrZti6mMlYDVg1eJ+jarbhB6P1wBG9/HtlgDPAHd1fmiVbSsz5h4DPAYY8GSexosauCsi\n", "IkDtkfluwCgzm+yc2wx/tfbdAJxzI4Fz8B/cvQFMd8793MxeqHfIYvnWFYrbKsVtdNXXfT03mrfL\n", "uuf9yAG89WL8RY1nAM8B/wL+Wdx3f/1M94eTlUqla+qhO5w76B0WEVlGtcp8C+A2ADO7zzk3sWrb\n", "R4EZZvYagHPu98BWwE/7+4ZRkv2Et4t5BWBUj8e9PV+P6aCFwKvF7YniflbV/QvA8z1ur2itbxFp\n", "BbVKciwwu+rxYufcMDNbUmx7rWrb68C4AbznXj0eL8ZP4Swo7ucX36v6cfVtbtVtzgAezypu83SG\n", "pIiUVa0ynw2MqXrcXeTgi7x62xj8KLdfU7+0Vs+nhgMrF7eGqlQqjX4LKpVKqX9gaP9aW5n3r8z7\n", "NhC1ynw6/lTyac65SUD1etiPAh92zq2GHwVvhb+CTZ86Ozs7BpFVRET60NHV1fcPM+dcB28fzQIw\n", "BX/K+Wgzu8w5tytwCjAMuMLMLm5wXhER6UW/ZS4iIq1hWOgAIiIyeCpzEZESUJmLiJTAkK7N4pwb\n", "jj9rtBN/UtApZnbbUGYYCs65jwD3AhPMbEHoPPXinBsHXIc/DHUUcIyZ3Rs21eDUWrKi1RVnal+J\n", "Xz9oBeB0M8vDpqo/59wEoAJsb2aPhc5TT865E3j7Gr8XmNnVvb1uqEfm+wEjzGxL/LIAHx3i9284\n", "59xY/LIHb4bO0gDfAH5lZtvgLyJ9YdA09fHWkhXA8fj/d2XyZeBFM9sKf/HvCwLnqbviB9Yl+EOk\n", "S8U5tw2wefH3cxtgvb5eO9RlvhPwb+fcLcBlQDbE799QxaGclwAnAGW8GtC5wKXF1yMpxz6+Y8kK\n", "/FpDZTINf/gw+H/vZVz47Wz8xcyfDR2kAXYCHnHO3QzkwM/7emHDplmccwcCR/d4+kVgnpnt6pzb\n", "Cn8h4q0blaGR+ti/fwI/NrOHnXMALXuSVB/7d4CZVZxzawLXAl8f+mR119+SFS3PzOYCOOfG4Iv9\n", "pLCJ6ss5dwD+N487iumIlv0314fx+AvZ7Ioflf8c+EhvLxzS48ydc9cD08zsxuLxs2b2niEL0GDO\n", "uX8ATxcPJwH3FVMSpeGc+wRwPZCY2e2h8wyWcy4F7jWzacXjp8xs7Rp/rKU459bGXz/2QjO7KnCc\n", 
"unLO/RboKm4b45eejs2sFNcMcM6dif9hdU7x+CFgBzN7qedrh/riFL8HdgFudM5thB/JloaZfbj7\n", "a+fcE/hfkUrDObcBfnS3p5k9EjpPnfS3ZEXLc86tgb+W7GFmdlfoPPVmZm/9Zu+cuws4pCxFXvg9\n", "/jfgc5xz78Uv7f1yby8c6jK/DLjYOffH4vHXhvj9h1IZT609A38Uy/eLaaRZZrZ72EiDdhOwo3Nu\n", "evF4SsgwDXAifjXTU5xz3XPnO5tZGT+gLx0zu9U5t5Vz7n78Zx6HmVmv3aLT+UVESkAnDYmIlIDK\n", "XESkBFTmIiIloDIXESkBlbmISAmozEVESkBlLiJSAipzEZES+D+vR5775jhCZQAAAABJRU5ErkJg\n", "gg==\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Logistic Function\n", "def logistic(t):\n", " return 1.0 / (1 + math.exp((-1.0)*t) )\n", "\n", "# Set t from -6 to 6 ( 500 elements, linearly spaced)\n", "t = np.linspace(-6,6,500)\n", "\n", "# Set up y values (using list comprehension)\n", "y = np.array([logistic(ele) for ele in t])\n", "\n", "# Plot\n", "plt.plot(t,y)\n", "plt.title(' Logistic Function ')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we remember back to the Linear Regression Lectures, we could describe a [Linear Regression](http://en.wikipedia.org/wiki/Linear_regression) Function model as:\n", "$$ y_i = \\beta _1 x_{i1} + ... + \\beta _i x_{ip}$$\n", "\n", "Which was basically an expanded linear equation (y=mx+b) for various x data features. In the case of the above equation, we presume a data set of 'n' number of units, so that the data set would have the form:\n", "$$ [ y_i, x_{i1},...,x_{ip}]^{n}_{i=1}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our logistic function, if we view *t* as a linear function with a variable *x* we could express t as:\n", "$$ t = \\beta _0 + \\beta _1 x $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, we've basically just substituted a linear function (form similar to y=mx+b) for t. 
We could then rewrite our logistic function equation as:\n", "$$ F(x)= \\frac{1}{1+e^{-(\\beta _0 + \\beta _1 x)}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can interpret F(x) as the probability that the dependent variable is a \"success\" case. This is a similar style of thinking to the Binomial Distribution, in which we had successes and failures. The formula for F(x) states that the probability of the dependent variable being a \"success\" is equal to the value of the logistic function applied to the linear regression expression (the linear equation we used to replace *t*).\n", "\n", "The linear regression expression itself can take any value from negative to positive infinity, but after the logistic transformation the output F(x) always lies between 0 and 1.\n", "\n", "We can now perform a binary classification based on where F(x) lies: values from 0 to 0.5 are assigned to one class, and values from 0.5 to 1 to the other.\n", "\n", "### Part 2: Extra Math Resources\n", "\n", "This is a very basic overview of binary classification using Logistic Regression. If you're interested in a deeper dive into the mathematics, check out these sources:\n", "\n", "1.) [Andrew Ng's class notes](http://cs229.stanford.edu/notes/cs229-notes1.pdf) on Logistic Regression (Note: Scroll down) \n", "\n", "2.) [CMU notes](http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf) Note: Advanced math notation.\n", "\n", "3.) [Wikipedia](http://en.wikipedia.org/wiki/Logistic_regression) has a very extensive look at logistic regression.\n", "\n", "Scroll down to the bottom for more resources similar to this lecture!" 
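] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before moving on to the data, the classification rule from Part 1 can be made concrete with a short worked derivation. This is only a sketch: the coefficient values used below are hypothetical, chosen for illustration and not fitted to any data set.\n", "\n", "Writing $p = F(x)$ and solving the logistic equation for the linear part gives the log-odds form:\n", "$$ \\ln \\left( \\frac{p}{1-p} \\right) = \\beta _0 + \\beta _1 x $$\n", "\n", "so $F(x) \\geq 0.5$ exactly when $\\beta _0 + \\beta _1 x \\geq 0$. For example, assuming $\\beta _0 = -1$ and $\\beta _1 = 2$:\n", "$$ F(1) = \\frac{1}{1+e^{-1}} \\approx 0.73, \\qquad F(0) = \\frac{1}{1+e^{1}} \\approx 0.27 $$\n", "\n", "With a 0.5 threshold, $x = 1$ would be classified as 1 and $x = 0$ as 0."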
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-----------\n", "### Part 3: Dataset Analysis\n", "Let us go ahead and take a look at the [dataset](http://statsmodels.sourceforge.net/stable/datasets/generated/fair.html).\n", "\n", "The dataset is packaged within Statsmodels. It comes from a 1974 survey of women by Redbook magazine, in which married women were asked whether they had had extramarital affairs. The published work on the data set can be found in:\n", "\n", "[Fair, Ray. 1978. “A Theory of Extramarital Affairs,” `Journal of Political Economy`, February, 45-61.](http://fairmodel.econ.yale.edu/rayfair/pdf/1978a200.pdf)\n", "\n", "It is important to note that this data comes from a self-reported survey, which can raise many issues with the accuracy of the data. This analysis also isn't trying to promote any agenda concerning women or marriage. The data is simply interesting, but its accuracy should be met with a healthy dose of skepticism.\n", "\n", "We'll set those issues aside and focus on the logistic regression aspects of the analysis." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will approach this as a classification problem by asking the question:\n", "\n", "*Given certain variables for each woman, can we classify her as either having participated in an affair or not having participated in an affair?*\n", "\n", "\n", "#### Dataset Description\n", "\n", "From the [Statsmodels website](http://statsmodels.sourceforge.net/stable/datasets/generated/fair.html) we have the following information about the data:\n", "\n", "Number of observations: 6366\n", "Number of variables: 9\n", "Variable name definitions:\n", "\n", " rate_marriage : How rate marriage, 1 = very poor, 2 = poor, 3 = fair,\n", " 4 = good, 5 = very good\n", " age : Age\n", " yrs_married : No. years married. Interval approximations. See\n", " original paper for detailed explanation.\n", " children : No. 
children\n", " religious : How religious, 1 = not, 2 = mildly, 3 = fairly,\n", " 4 = strongly\n", " educ : Level of education, 9 = grade school, 12 = high\n", " school, 14 = some college, 16 = college graduate,\n", " 17 = some graduate school, 20 = advanced degree\n", " occupation : 1 = student, 2 = farming, agriculture; semi-skilled,\n", " or unskilled worker; 3 = white-collar; 4 = teacher\n", " counselor social worker, nurse; artist, writers;\n", " technician, skilled worker, 5 = managerial,\n", " administrative, business, 6 = professional with\n", " advanced degree\n", " occupation_husb : Husband's occupation. Same as occupation.\n", " affairs : measure of time spent in extramarital affairs\n", "\n", "See the original paper for more details.\n", "\n", "*Why a Statsmodels data set?* So that the example datasets included in SciKit Learn stay available for you to work through with its own tutorials." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "------------\n", "### Part 4: Data Visualization\n", "\n", "Now that we've done a quick overview of some math and the data we will be working with, let's go ahead and dive into the code!\n", "\n", "We will start by loading the data and visualizing it." ] }, { "cell_type": "code", "execution_count": 225, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Standard method of loading a Statsmodels dataset into a pandas DataFrame.\n", "# Note: 'fair' is the module for the affairs dataset from Ray Fair's 1978 paper.\n", "df = sm.datasets.fair.load_pandas().data" ] }, { "cell_type": "code", "execution_count": 226, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>rate_marriage</th><th>age</th><th>yrs_married</th><th>children</th><th>religious</th><th>educ</th><th>occupation</th><th>occupation_husb</th><th>affairs</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>3</td><td>32</td><td>9.0</td><td>3</td><td>3</td><td>17</td><td>2</td><td>5</td><td>0.111111</td></tr>\n",
"    <tr><th>1</th><td>3</td><td>27</td><td>13.0</td><td>3</td><td>1</td><td>14</td><td>3</td><td>4</td><td>3.230769</td></tr>\n",
"    <tr><th>2</th><td>4</td><td>22</td><td>2.5</td><td>0</td><td>1</td><td>16</td><td>3</td><td>5</td><td>1.400000</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>37</td><td>16.5</td><td>4</td><td>3</td><td>16</td><td>5</td><td>5</td><td>0.727273</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>27</td><td>9.0</td><td>1</td><td>1</td><td>14</td><td>3</td><td>4</td><td>4.666666</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " rate_marriage age yrs_married children religious educ occupation \\\n", "0 3 32 9.0 3 3 17 2 \n", "1 3 27 13.0 3 1 14 3 \n", "2 4 22 2.5 0 1 16 3 \n", "3 4 37 16.5 4 3 16 5 \n", "4 5 27 9.0 1 1 14 3 \n", "\n", " occupation_husb affairs \n", "0 5 0.111111 \n", "1 4 3.230769 \n", "2 5 1.400000 \n", "3 5 0.727273 \n", "4 4 4.666666 " ] }, "execution_count": 226, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Preview\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! Let's start our classification by creating a new column called 'Had_Affair'. We will set this column to 0 if the affairs column is 0 (meaning no time spent in affairs); otherwise it will be set to 1, indicating that the woman had an affair." ] }, { "cell_type": "code", "execution_count": 227, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Return 1 if any time was spent in affairs, otherwise 0\n", "def affair_check(x):\n", "    if x != 0:\n", "        return 1\n", "    else:\n", "        return 0\n", "\n", "# Apply to the 'affairs' column to build the binary target column\n", "df['Had_Affair'] = df['affairs'].apply(affair_check)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's go ahead and see the result!" ] }, { "cell_type": "code", "execution_count": 228, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>rate_marriage</th><th>age</th><th>yrs_married</th><th>children</th><th>religious</th><th>educ</th><th>occupation</th><th>occupation_husb</th><th>affairs</th><th>Had_Affair</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>3</td><td>32.0</td><td>9.0</td><td>3.0</td><td>3</td><td>17</td><td>2</td><td>5</td><td>0.111111</td><td>1</td></tr>\n",
"    <tr><th>1</th><td>3</td><td>27.0</td><td>13.0</td><td>3.0</td><td>1</td><td>14</td><td>3</td><td>4</td><td>3.230769</td><td>1</td></tr>\n",
"    <tr><th>2</th><td>4</td><td>22.0</td><td>2.5</td><td>0.0</td><td>1</td><td>16</td><td>3</td><td>5</td><td>1.400000</td><td>1</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>37.0</td><td>16.5</td><td>4.0</td><td>3</td><td>16</td><td>5</td><td>5</td><td>0.727273</td><td>1</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>27.0</td><td>9.0</td><td>1.0</td><td>1</td><td>14</td><td>3</td><td>4</td><td>4.666666</td><td>1</td></tr>\n",
"    <tr><th>5</th><td>4</td><td>27.0</td><td>9.0</td><td>0.0</td><td>2</td><td>14</td><td>3</td><td>4</td><td>4.666666</td><td>1</td></tr>\n",
"    <tr><th>6</th><td>5</td><td>37.0</td><td>23.0</td><td>5.5</td><td>2</td><td>12</td><td>5</td><td>4</td><td>0.852174</td><td>1</td></tr>\n",
"    <tr><th>7</th><td>5</td><td>37.0</td><td>23.0</td><td>5.5</td><td>2</td><td>12</td><td>2</td><td>3</td><td>1.826086</td><td>1</td></tr>\n",
"    <tr><th>8</th><td>3</td><td>22.0</td><td>2.5</td><td>0.0</td><td>2</td><td>12</td><td>3</td><td>3</td><td>4.799999</td><td>1</td></tr>\n",
"    <tr><th>9</th><td>3</td><td>27.0</td><td>6.0</td><td>0.0</td><td>1</td><td>16</td><td>3</td><td>5</td><td>1.333333</td><td>1</td></tr>\n",
"    <tr><th>10</th><td>2</td><td>27.0</td><td>6.0</td><td>2.0</td><td>1</td><td>16</td><td>3</td><td>5</td><td>3.266665</td><td>1</td></tr>\n",
"    <tr><th>11</th><td>5</td><td>27.0</td><td>6.0</td><td>2.0</td><td>3</td><td>14</td><td>3</td><td>5</td><td>2.041666</td><td>1</td></tr>\n",
"    <tr><th>12</th><td>3</td><td>37.0</td><td>16.5</td><td>5.5</td><td>1</td><td>12</td><td>2</td><td>3</td><td>0.484848</td><td>1</td></tr>\n",
"    <tr><th>13</th><td>5</td><td>27.0</td><td>6.0</td><td>0.0</td><td>2</td><td>14</td><td>3</td><td>2</td><td>2.000000</td><td>1</td></tr>\n",
"    <tr><th>14</th><td>4</td><td>22.0</td><td>6.0</td><td>1.0</td><td>1</td><td>14</td><td>4</td><td>4</td><td>3.266665</td><td>1</td></tr>\n",
"    <tr><th>15</th><td>4</td><td>37.0</td><td>9.0</td><td>2.0</td><td>2</td><td>14</td><td>3</td><td>6</td><td>1.361111</td><td>1</td></tr>\n",
"    <tr><th>16</th><td>4</td><td>27.0</td><td>6.0</td><td>1.0</td><td>1</td><td>12</td><td>3</td><td>5</td><td>2.000000</td><td>1</td></tr>\n",
"    <tr><th>17</th><td>1</td><td>37.0</td><td>23.0</td><td>5.5</td><td>4</td><td>14</td><td>5</td><td>2</td><td>1.826086</td><td>1</td></tr>\n",
"    <tr><th>18</th><td>2</td><td>42.0</td><td>23.0</td><td>2.0</td><td>2</td><td>20</td><td>4</td><td>4</td><td>1.826086</td><td>1</td></tr>\n",
"    <tr><th>19</th><td>4</td><td>37.0</td><td>6.0</td><td>0.0</td><td>2</td><td>16</td><td>5</td><td>4</td><td>2.041666</td><td>1</td></tr>\n",
"    <tr><th>20</th><td>5</td><td>22.0</td><td>2.5</td><td>0.0</td><td>2</td><td>14</td><td>3</td><td>4</td><td>7.839996</td><td>1</td></tr>\n",
"    <tr><th>21</th><td>3</td><td>37.0</td><td>16.5</td><td>5.5</td><td>2</td><td>9</td><td>3</td><td>2</td><td>2.545454</td><td>1</td></tr>\n",
"    <tr><th>22</th><td>3</td><td>42.0</td><td>23.0</td><td>5.5</td><td>3</td><td>12</td><td>5</td><td>4</td><td>0.532609</td><td>1</td></tr>\n",
"    <tr><th>23</th><td>2</td><td>27.0</td><td>9.0</td><td>2.0</td><td>4</td><td>20</td><td>3</td><td>4</td><td>0.622222</td><td>1</td></tr>\n",
"    <tr><th>24</th><td>4</td><td>27.0</td><td>6.0</td><td>1.0</td><td>2</td><td>12</td><td>5</td><td>4</td><td>0.583333</td><td>1</td></tr>\n",
"    <tr><th>25</th><td>5</td><td>27.0</td><td>2.5</td><td>0.0</td><td>3</td><td>16</td><td>4</td><td>1</td><td>4.799999</td><td>1</td></tr>\n",
"    <tr><th>26</th><td>2</td><td>27.0</td><td>6.0</td><td>2.0</td><td>2</td><td>12</td><td>2</td><td>5</td><td>0.166667</td><td>1</td></tr>\n",
"    <tr><th>27</th><td>5</td><td>37.0</td><td>13.0</td><td>1.0</td><td>3</td><td>12</td><td>3</td><td>4</td><td>0.615385</td><td>1</td></tr>\n",
"    <tr><th>28</th><td>2</td><td>32.0</td><td>16.5</td><td>2.0</td><td>2</td><td>12</td><td>4</td><td>2</td><td>1.187878</td><td>1</td></tr>\n",
"    <tr><th>29</th><td>3</td><td>27.0</td><td>6.0</td><td>1.0</td><td>1</td><td>14</td><td>3</td><td>6</td><td>11.199999</td><td>1</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>6336</th><td>5</td><td>42.0</td><td>23.0</td><td>4.0</td><td>3</td><td>14</td><td>5</td><td>4</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6337</th><td>5</td><td>27.0</td><td>6.0</td><td>0.0</td><td>4</td><td>14</td><td>4</td><td>4</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6338</th><td>5</td><td>42.0</td><td>23.0</td><td>2.0</td><td>3</td><td>12</td><td>2</td><td>2</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6339</th><td>4</td><td>32.0</td><td>13.0</td><td>3.0</td><td>3</td><td>16</td><td>4</td><td>2</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6340</th><td>5</td><td>27.0</td><td>13.0</td><td>3.0</td><td>3</td><td>16</td><td>4</td><td>2</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6341</th><td>5</td><td>27.0</td><td>9.0</td><td>1.0</td><td>2</td><td>14</td><td>4</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6342</th><td>4</td><td>22.0</td><td>2.5</td><td>0.0</td><td>2</td><td>16</td><td>4</td><td>1</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6343</th><td>5</td><td>17.5</td><td>2.5</td><td>0.0</td><td>4</td><td>12</td><td>3</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6344</th><td>4</td><td>32.0</td><td>16.5</td><td>2.0</td><td>2</td><td>12</td><td>3</td><td>4</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6345</th><td>5</td><td>27.0</td><td>9.0</td><td>1.0</td><td>3</td><td>12</td><td>3</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6346</th><td>4</td><td>22.0</td><td>2.5</td><td>0.0</td><td>4</td><td>14</td><td>4</td><td>2</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6347</th><td>5</td><td>22.0</td><td>2.5</td><td>1.0</td><td>2</td><td>12</td><td>3</td><td>2</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6348</th><td>5</td><td>27.0</td><td>0.5</td><td>0.0</td><td>4</td><td>20</td><td>4</td><td>4</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6349</th><td>5</td><td>37.0</td><td>16.5</td><td>3.0</td><td>3</td><td>14</td><td>5</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6350</th><td>5</td><td>32.0</td><td>13.0</td><td>2.0</td><td>4</td><td>14</td><td>3</td><td>6</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6351</th><td>4</td><td>22.0</td><td>0.5</td><td>0.0</td><td>2</td><td>16</td><td>3</td><td>1</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6352</th><td>5</td><td>42.0</td><td>23.0</td><td>2.0</td><td>4</td><td>12</td><td>3</td><td>2</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6353</th><td>5</td><td>22.0</td><td>2.5</td><td>2.0</td><td>2</td><td>14</td><td>3</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6354</th><td>5</td><td>42.0</td><td>23.0</td><td>4.0</td><td>4</td><td>12</td><td>3</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6355</th><td>4</td><td>27.0</td><td>6.0</td><td>0.0</td><td>3</td><td>12</td><td>3</td><td>4</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6356</th><td>5</td><td>32.0</td><td>13.0</td><td>3.0</td><td>3</td><td>12</td><td>3</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6357</th><td>5</td><td>32.0</td><td>13.0</td><td>4.0</td><td>2</td><td>14</td><td>4</td><td>4</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6358</th><td>3</td><td>27.0</td><td>6.0</td><td>2.0</td><td>4</td><td>14</td><td>3</td><td>1</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6359</th><td>4</td><td>22.0</td><td>2.5</td><td>0.0</td><td>3</td><td>16</td><td>5</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6360</th><td>5</td><td>22.0</td><td>2.5</td><td>0.0</td><td>2</td><td>14</td><td>3</td><td>3</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6361</th><td>5</td><td>32.0</td><td>13.0</td><td>2.0</td><td>3</td><td>17</td><td>4</td><td>3</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6362</th><td>4</td><td>32.0</td><td>13.0</td><td>1.0</td><td>1</td><td>16</td><td>5</td><td>5</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6363</th><td>5</td><td>22.0</td><td>2.5</td><td>0.0</td><td>2</td><td>14</td><td>3</td><td>1</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6364</th><td>5</td><td>32.0</td><td>6.0</td><td>1.0</td><td>3</td><td>14</td><td>3</td><td>4</td><td>0.000000</td><td>0</td></tr>\n",
"    <tr><th>6365</th><td>4</td><td>22.0</td><td>2.5</td><td>0.0</td><td>2</td><td>16</td><td>2</td><td>4</td><td>0.000000</td><td>0</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>6366 rows × 10 columns</p>\n",
"</div>
" ], "text/plain": [ " rate_marriage age yrs_married children religious educ occupation \\\n", "0 3 32.0 9.0 3.0 3 17 2 \n", "1 3 27.0 13.0 3.0 1 14 3 \n", "2 4 22.0 2.5 0.0 1 16 3 \n", "3 4 37.0 16.5 4.0 3 16 5 \n", "4 5 27.0 9.0 1.0 1 14 3 \n", "5 4 27.0 9.0 0.0 2 14 3 \n", "6 5 37.0 23.0 5.5 2 12 5 \n", "7 5 37.0 23.0 5.5 2 12 2 \n", "8 3 22.0 2.5 0.0 2 12 3 \n", "9 3 27.0 6.0 0.0 1 16 3 \n", "10 2 27.0 6.0 2.0 1 16 3 \n", "11 5 27.0 6.0 2.0 3 14 3 \n", "12 3 37.0 16.5 5.5 1 12 2 \n", "13 5 27.0 6.0 0.0 2 14 3 \n", "14 4 22.0 6.0 1.0 1 14 4 \n", "15 4 37.0 9.0 2.0 2 14 3 \n", "16 4 27.0 6.0 1.0 1 12 3 \n", "17 1 37.0 23.0 5.5 4 14 5 \n", "18 2 42.0 23.0 2.0 2 20 4 \n", "19 4 37.0 6.0 0.0 2 16 5 \n", "20 5 22.0 2.5 0.0 2 14 3 \n", "21 3 37.0 16.5 5.5 2 9 3 \n", "22 3 42.0 23.0 5.5 3 12 5 \n", "23 2 27.0 9.0 2.0 4 20 3 \n", "24 4 27.0 6.0 1.0 2 12 5 \n", "25 5 27.0 2.5 0.0 3 16 4 \n", "26 2 27.0 6.0 2.0 2 12 2 \n", "27 5 37.0 13.0 1.0 3 12 3 \n", "28 2 32.0 16.5 2.0 2 12 4 \n", "29 3 27.0 6.0 1.0 1 14 3 \n", "... ... ... ... ... ... ... ... 
\n", "6336 5 42.0 23.0 4.0 3 14 5 \n", "6337 5 27.0 6.0 0.0 4 14 4 \n", "6338 5 42.0 23.0 2.0 3 12 2 \n", "6339 4 32.0 13.0 3.0 3 16 4 \n", "6340 5 27.0 13.0 3.0 3 16 4 \n", "6341 5 27.0 9.0 1.0 2 14 4 \n", "6342 4 22.0 2.5 0.0 2 16 4 \n", "6343 5 17.5 2.5 0.0 4 12 3 \n", "6344 4 32.0 16.5 2.0 2 12 3 \n", "6345 5 27.0 9.0 1.0 3 12 3 \n", "6346 4 22.0 2.5 0.0 4 14 4 \n", "6347 5 22.0 2.5 1.0 2 12 3 \n", "6348 5 27.0 0.5 0.0 4 20 4 \n", "6349 5 37.0 16.5 3.0 3 14 5 \n", "6350 5 32.0 13.0 2.0 4 14 3 \n", "6351 4 22.0 0.5 0.0 2 16 3 \n", "6352 5 42.0 23.0 2.0 4 12 3 \n", "6353 5 22.0 2.5 2.0 2 14 3 \n", "6354 5 42.0 23.0 4.0 4 12 3 \n", "6355 4 27.0 6.0 0.0 3 12 3 \n", "6356 5 32.0 13.0 3.0 3 12 3 \n", "6357 5 32.0 13.0 4.0 2 14 4 \n", "6358 3 27.0 6.0 2.0 4 14 3 \n", "6359 4 22.0 2.5 0.0 3 16 5 \n", "6360 5 22.0 2.5 0.0 2 14 3 \n", "6361 5 32.0 13.0 2.0 3 17 4 \n", "6362 4 32.0 13.0 1.0 1 16 5 \n", "6363 5 22.0 2.5 0.0 2 14 3 \n", "6364 5 32.0 6.0 1.0 3 14 3 \n", "6365 4 22.0 2.5 0.0 2 16 2 \n", "\n", " occupation_husb affairs Had_Affair \n", "0 5 0.111111 1 \n", "1 4 3.230769 1 \n", "2 5 1.400000 1 \n", "3 5 0.727273 1 \n", "4 4 4.666666 1 \n", "5 4 4.666666 1 \n", "6 4 0.852174 1 \n", "7 3 1.826086 1 \n", "8 3 4.799999 1 \n", "9 5 1.333333 1 \n", "10 5 3.266665 1 \n", "11 5 2.041666 1 \n", "12 3 0.484848 1 \n", "13 2 2.000000 1 \n", "14 4 3.266665 1 \n", "15 6 1.361111 1 \n", "16 5 2.000000 1 \n", "17 2 1.826086 1 \n", "18 4 1.826086 1 \n", "19 4 2.041666 1 \n", "20 4 7.839996 1 \n", "21 2 2.545454 1 \n", "22 4 0.532609 1 \n", "23 4 0.622222 1 \n", "24 4 0.583333 1 \n", "25 1 4.799999 1 \n", "26 5 0.166667 1 \n", "27 4 0.615385 1 \n", "28 2 1.187878 1 \n", "29 6 11.199999 1 \n", "... ... ... ... 
\n", "6336 4 0.000000 0 \n", "6337 4 0.000000 0 \n", "6338 2 0.000000 0 \n", "6339 2 0.000000 0 \n", "6340 2 0.000000 0 \n", "6341 5 0.000000 0 \n", "6342 1 0.000000 0 \n", "6343 5 0.000000 0 \n", "6344 4 0.000000 0 \n", "6345 5 0.000000 0 \n", "6346 2 0.000000 0 \n", "6347 2 0.000000 0 \n", "6348 4 0.000000 0 \n", "6349 5 0.000000 0 \n", "6350 6 0.000000 0 \n", "6351 1 0.000000 0 \n", "6352 2 0.000000 0 \n", "6353 5 0.000000 0 \n", "6354 5 0.000000 0 \n", "6355 4 0.000000 0 \n", "6356 5 0.000000 0 \n", "6357 4 0.000000 0 \n", "6358 1 0.000000 0 \n", "6359 5 0.000000 0 \n", "6360 3 0.000000 0 \n", "6361 3 0.000000 0 \n", "6362 5 0.000000 0 \n", "6363 1 0.000000 0 \n", "6364 4 0.000000 0 \n", "6365 4 0.000000 0 \n", "\n", "[6366 rows x 10 columns]" ] }, "execution_count": 228, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# DataFrame Check\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's go ahead and group by the newly created 'Had_Affair' column, then call the mean aggregate function to compare the average of every other column across the two classes." ] }, { "cell_type": "code", "execution_count": 229, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Had_Affair  rate_marriage  age        yrs_married  children  religious  educ       occupation  occupation_husb  affairs
0           4.329701       28.390679  7.989335     1.238813  2.504521   14.322977  3.405286    3.833758         0.000000
1           3.647345       30.537019  11.152460    1.728933  2.261568   13.972236  3.463712    3.884559         2.187243
\n", "
" ], "text/plain": [ " rate_marriage age yrs_married children religious \\\n", "Had_Affair \n", "0 4.329701 28.390679 7.989335 1.238813 2.504521 \n", "1 3.647345 30.537019 11.152460 1.728933 2.261568 \n", "\n", " educ occupation occupation_husb affairs \n", "Had_Affair \n", "0 14.322977 3.405286 3.833758 0.000000 \n", "1 13.972236 3.463712 3.884559 2.187243 " ] }, "execution_count": 229, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Groupby Had Affair column\n", "df.groupby('Had_Affair').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at this brief glance of the data, it seems that the women who had affairs were slightly older,married longer, and slightly less religious and less educated. However, the mean values of both classes are very close for all variables.\n", "\n", "Let's go ahead and try to visualize some of this data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First some histograms." ] }, { "cell_type": "code", "execution_count": 230, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 230, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAZMAAAFhCAYAAAC1RkdzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAG89JREFUeJzt3X+0XWV95/H3QZKATkJnhJDJKN6Mkq+oi7HcKDVoIBX5\n", "kSoomRFrrUpHaBUZoa7BemW5qg3gaIWY1rIcooLVjgiXWlz80FW0gGkreMbRMtAvxSaUMWIMv4IU\n", "IYEzf+yd9nhzc+9JnnPOPefm/Vori7Of85znPE82uZ+797P3sxutVgtJkkrsN9MdkCQNP8NEklTM\n", "MJEkFTNMJEnFDBNJUjHDRJJUbP9eNh4RRwMfy8yVbWVvBd6bmcvr7TOBs4AdwJrMvD4iDgS+CBwC\n", "PAa8IzO39rKvkqS917Mjk4g4H7gcmNdW9svAb7VtLwLOAZYDJwIXR8Rc4N3A9zNzBfAF4IJe9VOS\n", "VK6Xp7nuBU4DGgAR8VzgQuDcnWXAK4ENmbk9M7fVnzkSOAa4qa5zE3B8D/spSSrUszDJzGupTl0R\n", "EfsBnwV+F/hZW7UFwKNt248BB9Xl2yaUSZIGVE/nTNqMAi8CLgMOAF4SEZcA3wLmt9WbDzxCFSTz\n", "J5RNqdlszrp1YbZv387mzZu72ubixYuZM2dOV9uUNLXR0dHG9LWGW1/CJDPvAF4GEBEvAL6cmb9b\n", 
"z5lcGBHzqELmCOBOYAOwCrgDOBm4tZPvmW07rNFoLB279IZcuHhJV9rbsnkjq1e/Klqt1j1daVCS\n", "av0Ik4lHDI2dZZn5QESsA26jOuU2lplPRsRlwJURcRvwJPDWPvRzIC1cvITFhy2d6W5I0pR6GiaZ\n", "uYnqSq3dlmXmemD9hDpPAG/uZd8kSd3jTYuSpGKGiSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkq\n", "ZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKGiSSpmGEiSSpmmEiSihkmkqRifXkGvPZdjUZj\n", "LjDS5WY3tVqtp7rcpqQChol6bWTs0hty4eIlXWlsy+aNXHTeqgDu6UqDkrrCMFHPLVy8hMWHLZ3p\n", "bkjqIedMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTM\n", "MJEkFTNMJEnFDBNJUjHDRJJUzDCRJBXr6cOxIuJo4GOZuTIiXg6sA54GngTenplbIuJM4CxgB7Am\n", "M6+PiAOBLwKHAI8B78jMrb3sqyRp7/XsyCQizgcuB+bVRWuB92bmSuBa4AMRcShwDrAcOBG4OCLm\n", "Au8Gvp+ZK4AvABf0qp+SpHK9PM11L3Aa0Ki335KZP6hfzwGeAF4JbMjM7Zm5rf7MkcAxwE113ZuA\n", "43vYT0lSoZ6d5srMayNipG37AYCIWA6cDbwGOAl4tO1jjwEHAQuAbRPKptVsNlvFHR8g4+Pj3L+j\n", "621ms9nsbqNTf9/Qj0EqNTo62pi+1nDr6ZzJRBFxOjAGrMrMByNiGzC/rcp84BGqIJk/oWxas22H\n", "LVu2bOnaq+7Obra5evXqaLVa93SzzanMhjFIml7fwiQi3kY10X5cZj5cF98OXBgR84ADgCOAO4EN\n", "wCrgDuBk4NZ+9VOStOf6ESatiNgP+BRwH3BtRAD8VWZ+JCLWAbdRzd+MZeaTEXEZcGVE3EZ15ddb\n", "+9BPSdJe6mmYZOYmqiu1AJ67mzrrgfUTyp4A3tzLvkmSusebFiVJxQwTSVIxw0SSVMwwkSQVM0wk\n", "ScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wk\n", "ScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wk\n", "ScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUbP9eNh4RRwMfy8yVEfEi4ArgGeBO4OzMbEXE\n", "mcBZwA5gTWZeHxEHAl8EDgEeA96RmVt72VdJ0t7r2ZFJRJwPXA7Mq4suAcYycwXQAE6NiEXAOcBy\n", "4ETg4oiYC7wb+H5d9wvABb3qpySpXC9Pc90LnEYVHABHZeat9esbgeOBVwAbMnN7Zm6rP3MkcAxw\n", "U133prquJGlA9SxMMvNaqlNXOzXaXj8GHAQsAB7dTfm2CWWSpAHV0zmTCZ5pe70AeIQqMOa3lc+f\n", "pHxn2bSazWarvJuDY3x8nPt3TF9vD9vMZrPZ3Uan/r6hH4NUanR0tDF9reHWzzD5XkQcm5m3ACcD\n", "NwO3AxdGxDzgAOAIqsn5DcAq4I667q2TN/mLZtsOW7Zs2dK1V92d3Wxz9erV0Wq17ulmm1OZDWOQ\n", "NL1+XBq882jh/cBHIuKvqULsmsz8CbAOuI0qXMYy80ngMuClEXEb8C7gI33opyRpL/X0yCQzN1Fd\n", "qUVm/gNw3CR11gPrJ5Q9Aby5l32TJHWPNy1KkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKGiSSp\n", 
"mGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKGiSSp\n", "mGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKGiSSp\n", "mGEiSSpmmEiSihkmkqRihokkqdj+/fyyiNgPWA8sBZ4BzgSeBq6ot+8Ezs7MVkScCZwF7ADWZOb1\n", "/eyrJKlz/T4yOQF4Tma+GvgocBHwSWAsM1cADeDUiFgEnAMsB04ELo6IuX3uqySpQ309MgGeAA6K\n", "iAZwEPAUcHRm3lq/fyNV4DwNbMjM7cD2iLgXOBL4bp/7K0nqQL/DZANwAPD3wHOBNwAr2t5/jCpk\n", "FgCPTlIuSRpA/Q6T86mOOD4UEc8DvgXMaXt/AfAIsA2Y31Y+H3h4usabzWari32dcePj49y/o+tt\n", "ZrPZ7G6jU3/f0I9BKjU6OtqY6T70Wr/D5DlUQQFVOOwPfC8ijs3MW4CTgZuB24ELI2Ie1ZHMEVST\n", "81OabTts2bJlS9dedXd2s83Vq1dHq9W6p5ttTmU2jEHS9PodJp8APh8Rt1EdkXwQaAKX1xPsdwHX\n", "1FdzrQNuo7pIYCwzn+pzXyVJHeprmGTmI8CbJnnruEnqrqe6jFiSNOC8aVGSVMwwkSQVM0wkScUM\n", "E0lSMcNEklTMMJEkFTNMJEnFpg2TiPijScqu7E13JEnDaLc3LUbEeuCFwLKIeNmEz/xSrzsmSRoe\n", "U90BfyHwAmAd8PtUzxqB6mFVd/W2W5KkYbLbMMnMjcBG4MiIWEC1BPzOQPk3wEO9754kaRhMuzZX\n", "RIwBv0cVHu1LvC/pVackScOlk4Ue3wW8MDN/2uvOSJKGUyeXBt9HBw+mkiR1JiKOi4jLJpTd3cHn\n", "Xh4Rn5+mzkhE/DwiXtFW9sKI+H5E/EFEvC8imhFxzG4+v1dX63ZyZHIv8O2I+CbwZF3WysyP7s0X\n", "SpLo5VNh3w78EdVZpTvqsmOAr2TmhRFxM/CGzNw82Ycz8x1786WdhMmP6j87zaqnGUrSoIiIlwMf\n", "p/rZ/CzgDVTB82XgQGAr8Pg0zbwJWAH8TUQcWH9uDJgTERuBo4CvRMRJwGXAwvrPWGbeGBF3Z+YR\n", "EdEEfgx8PzM/NF3fpw2TzPz96epIkvZIAzglIl7cVnYY8GLgXZn5T/VpsFcDLwH+MjMvjYh3Asfu\n", "rtGIWA7838x8LCK+Bvx6Zn4uIi4GDs3MP4uIM4HTgUOAr2XmVyLiaOD9wI1tzf074LTMvK+TAXVy\n", "NdczkxRvzszndfIFkqRdtIDrMvPdOwvqOZMHgI9HxD8DRwA3AYcDX6qr/Q1ThAnwDuDFEXEj1S0c\n", "xwKfowqviWeVHgZOiIhfq7cn5sFTnQbJZB/eRWb+yyR9RMwB3ggs7/QLJEm7mGy6oAFcAhwPPEJ1\n", "lNAAEjgauBVYtrsGI2Ie8KvAizPz6brsjoh4KbvO0TSogueuzLwkIn4TWD2hzmQHEru1Rws9Zub2\n", "zLy67rAkae+02PUH/M65kduAv6C6knYR1bzGqyLiW8CJk3xupzcA39wZJLU/Bc5sa7/9u74J/FZE\n", "/CXV0c9zJ6nXsU5Oc7XP7DeAl/KvV3VJkvZQZt4C3DKh7Ij65ccn+chpHbR5DXDNhLJ1k9RbWb/c\n", "ArxskvdfMqE/Henkaq6V/GtStaiuJjh9T75EktQ9EfEVqgn0dtdk5qdnoj/Q2ZzJOyNiLhB1/Tsz\n", "c3vPeyZJmlRmvnmm+zBRJ88zWQbcA1xJdVXAfRHxK73umCRpeHRymmsdcHpmfgegDpJ1wCt72TFJ\n", 
"0vDo5Gqu5+wMEoDM/FvggN51SZI0bDo5Mnk4It6YmV8FiIg3AQ/2tluStG9pNBpzgZEuN7up1Wo9\n", "1eU2J9VJmJwFfC0iPkt1afAzVIuGSZK6Z2Ts0hty4eLuPCpqy+aNXHTeqqCa855UROwH/AlwJNUt\n", "H+/KzB/uzfd1EiYnAf9MtW7MC4GrgeOo7sqUJHXJwsVLWHzY0n5+5RuBuZm5vF6f65N12R7rJEx+\n", "G3hlZj4O/CAifhm4HfjM3nyhNIx6cAqib6cfpCkcQ7X+F5n5nfrq3b3SSZjsD7T/T/8Ue7hmizQL\n", "dO0URCenH6Q+WQBsa9t+OiL2y8w9/hnfSZh8FfhmRFxFNWdyGnDdnn6R1A07djwFMNJodPWxOh0d\n", "JczAKQip17YB89u29ypIoLM74D8QEf+F6mEr24FP7byyS+q3h7b8iOvWnPv1kUUHd6W9TQ9s5ZQL\n", "1nqUoH3VBqoFIq+u7yH8wd421MmRCfVKwVfv7ZdI3TSy6GAOf96ime6G1HVbNm/sd1t/DrwuIjbU\n", "22fs7fd1FCaSpJ7bVM+ldbXNqd7MzBbw7qnqdMowkaQBUM/bDe3p1r6HSUR8kOoc3Rzgj6nO2V1B\n", "dYXYncDZmdmqn1N8FrADWJOZ1/e7r5KkzuzRkxZLRcRxwKsycznVjY//keommbHMXEF1tdipEbEI\n", "OIfq8cAnAhfXy+BLkgZQX8MEOAH4u4j4KvA1qkuMRzPz1vr9G6mef/wKYEP9mOBtwL1Ut/tLkgZQ\n", "v09zHQI8H3g91VHJ16iORnZ6DDiI6kaaRycplyQNoH6HyVbg7szcAdwTET8H/kPb+wuAR9j1Rpr5\n", "wMPTNd5sNlvT1Rkm4+Pj3L+j621ms9nsbqNTf1/Xx9BtnfyddHsc/d4Pmlmjo6PT3mW7L6wa3E3f\n", "Bt4HXBIRi4FnAzdHxLGZeQtwMnAz1dpfF0bEPKpnpxxBNTk/pU522DBZtmzZ0rVX3d3VBTVXr14d\n", "rVarb1eM9GIM3dbJ30m3x9Hv/aChMHLdmnOz3zfk1gs8fiwzV5Z8X1/DJDOvj4gVEXE71XzNe6iu\n", "g768nmC/C7imvpprHXBbXW8sM10UT9Ks1u8bciPifOBtwM9K2+r7pcGZ+YFJio+bpN56YH3POyRJ\n", "+657qdZb/NPShvp9NZckaUBk5rVU9/IVM0wkScUME0lSMdfmkqQBsemBrTPVVvFtFYaJJA2GTfWl\n", "vF1tc7oKmbmJaumqIoaJJA2AYV812DkTSVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEkFTNM\n", "JEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxVw2WNDQajcZcYKTLzW6qV+xVAcNE0jAZGbv0hly4\n", "eElXGtuyeSMXnbcqGOKl3weFYSJpqCxcvITFhy2d6W5oAudMJEnFDBNJUjHDRJJUzDCRJBUzTCRJ\n", "xQwTSVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMtbmkPtux4ymAkUaj0c1mXflWM8owkfrsoS0/\n", "4ro15359ZNHBXWlv0wNbOeWCta58qxk1I2ESEQuBJvBa4Bngivq/dwJnZ2YrIs4EzgJ2AGsy8/qZ\n", "6KvUCyOLDubw5y2a6W5IXdP3OZOImAN8BngcaACXAGOZuaLePjUiFgHnAMuBE4GLI2Juv/sqSerM\n", "TByZfAK4DPhgvX1UZt5av74ROAF4GtiQmduB7RFxL3Ak8N1+d3Y28Vy9pF7pa5hExDuBn2bmNyLi\n", "g1RHIu0/2R4DDgIWAI9OUq4CnquX1Cv9PjI5A2hFxPHAy4ErgUPa3l8APAJsA+a3lc8HHp6u8Waz\n", 
"2epeV2fe+Pg49+/obpvdPlc/Pj6ezWZzqve7PoZum24MdZ2BHkcnY5gNerEf+vF3Nzo62tXTAYOo\n", "r2GSmcfufB0R3wJ+B/hERBybmbcAJwM3A7cDF0bEPOAA4AiqyfkpzbYdtmzZsqVrr7o7Z7ofU1m9\n", "enW0Wq3dHpnMhjHA4I+jkzHMBr3YD/vK312vzfSlwS3g/cDl9QT7XcA19dVc64DbqC4SGMtMz8tL\n", "0oCasTDJzJVtm8dN8v56YH3fOiRJ2msupyJJKmaYSJKKGSaSpGIzPQEvqU8ajcZcYKTLzXrTqgDD\n", "RNqXjIxdekMuXLykK41t2byRi85b5U2rAgwTaZ+ycPESFh+2dKa7oVnIORNJUjHDRJJUzDCRJBUz\n", "TCRJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lSMe+Al7TP2rHjKYCRRqOrD2ndJ9crM0wk7bMe2vIj\n", "rltz7tdHFh3clfY2PbCVUy5Yu0+uV2aYSNqnjSw6mMOft2imuzH0nDORJBUzTCRJxQwTSVIxw0SS\n", "VMwwkSQV82ouSXulB/do7JP3Z8wWhomkvdLNezT25fszZgvDRNJe8x4N7eSciSSpmGEiSSpmmEiS\n", "ihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKtbX5VQiYg7wOeAFwDxgDXA3cAXwDHAn\n", "cHZmtiLiTOAsYAewJjOv72dfJUmd6/eRyW8AP83MFcBJwKeBTwJjdVkDODUiFgHnAMuBE4GLI2Ju\n", "n/sqSepQvxd6vBq4pn69H7AdOCozb63LbgROAJ4GNmTmdmB7RNwLHAl8t8/9lSR1oK9hkpmPA0TE\n", "fKpguQD4w7YqjwEHAQuARycpn1Kz2Wx1rbMDYHx8nPt3zHQvpjY+Pp7NZnOq94d+DHWdgR6HYxgc\n", "k41jdHS0aw99GVR9X4I+Ip4PXAt8OjP/V0R8vO3tBcAjwDZgflv5fODh6dqebTts2bJlS9dedXfO\n", "dD+msnr16mi1Wrt9BsVsGAMM/jgcw+DoZByzUV/nTCLiUOAbwPmZeUVd/L2IOLZ+fTJwK3A78JqI\n", "mBcRBwFHUE3OS5IGUL+PTMaoTld9OCI+XJe9D1hXT7DfBVxTX821DriNKvDGMtPHeUrSgOr3nMn7\n", "qMJjouMmqbseWN/rPkmSynnToiSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJ\n", "JKmYYSJJKtb3VYP7pdFozAVGutzsplar5RphkjTBrA0TYGTs0hty4eIlXWlsy+aNXHTeqgD2uaWl\n", "JWk6szlMWLh4CYsPWzrT3ZCkWc85E0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJ\n", "xQwTSVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJ\n", "xQwTSVIxw0SSVGz/me7A7kTEfsCfAEcCTwLvyswfzlR/dux4CmCk0Wh0q8lNrVbrqW41JkkzaWDD\n", "BHgjMDczl0fE0cAn67IZ8dCWH3HdmnO/PrLo4OK2Nj2wlVMuWBvAPeU9k6SZN8hhcgxwE0Bmfici\n", "ls1wfxhZdDCHP2/RTHdDkgbOIIfJAmBb2/bTEbFfZj7TaQNbNm/sWmce3PL/2PT41q60temBztsZ\n", "1DFA5+OYDWOA7o1jNowB/DcxmT0Zx2zTaLVaM92HSUXEJ4G/zcyr6+37M/P5u6vfbDYHcyCSBIyO\n", "jnZtwnUQDfKRyQbgDcDVEfErwA+mqjzbd5QkDbJBDpM/B14XERvq7TNmsjOSpN0b2NNckqTh4U2L\n", 
"kqRihokkqZhhIkkqZphIkooN8tVcfVcv2/KxzFwZEV8GDq3fWgL8dWa+dUL9/w08Wm/+Y2b+1/71\n", "dlcRMQf4HPACYB6wBrgfWAc8TbXG2dszc0vbZwZqDbTdjOGtwM6lB3bZFwM4hmcBlwNLgRbwO8Ac\n", "hms/TDaGCxii/bBTRCwEmsBrgWczRPthmHhkUouI86n+8cwDyMy3ZOZK4E3Aw8B5E+ofUNdbWf+Z\n", "0SCp/Qbw08xcAZwEfBq4FHhvPZZrgQ9M+My/rIEG/B7VGmgzaeIY/jgzf32qfcHgjeH1wDOZ+Wqq\n", "H8AXAWsZrv0wcQwXDuF+2PnLyWeAx4EGw7cfhoZh8q/uBU6j+h+u3UeBdZn5kwnl/wl4dkR8PSJu\n", "ro9qZtrVwIfr1/sB24G3ZObOGz7nAE9M+MwvrIEGzPQaaBPHsKPtvd3ti4EaQ2b+BfDb9eYI8BBw\n", "+jDth0nG8HDb20OxH2qfAC4Dfkx1hDVU+2GYGCa1zLyWX/zBtfPw+FeBKyb5yOPAJzLzRKpTAF+q\n", "D5FnTGY+npk/i4j5VD+UP7TzH3xELAfOpjpSaTfpGmh96fAkJhsDTLsvBmoMAJn5dERcQXVK5c+G\n", "bT/ArmOA4doPEfFOqqPcb9RFjWHcD8PCv6Sp/WfgS5k52Z2d9wBfAsjMfwAeBP59H/s2qYh4PvBN\n", "4AuZ+eW67HSq385WZeaDEz6yDZjftr1Hi2n2wmRjYOp9MXBjAMjMd1LNOVweEc8etv0Au46B4doP\n", "Z1CtovEt4OXAlRFx6DDuh2FgmEzttcCNu3nvDOrzqRGxmOo3mh/3qV+TiohDgW8A52fmFXXZ26h+\n", "AzsuMzdN8rENwKq67rRroPXaZGOoTbUvBm0MvxkRH6w3nwCeAVYzXPth4hiephrH8QzJfsjMYzPz\n", "uHp+5P8AbwdexxDth2Hi1Vy7av+NK4B/bH8zIq6kOvXyWeDzEXFr/dYZA/AbzBhwEPDhiPgw8Czg\n", "ZcAm4NqIAPirzPxI2zgGbQ20iWNoUf3jnmpfDNoYrgGuiIhbqM7Lnwt8HriP4dkPu4whM38eEUsZ\n", "nv3QrkX18+5TDNd+GBquzSVJKuZpLklSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJ\n", "xbwDXvuciNifam2ml1I9syapVow+C3gv8Ajw98AP67ujTwI+QnUn+EbgzMx8aCb6Lg0qj0y0L3oV\n", "8PP6mRUvAg4EzgfeAxwFvAY4HGhFxCHAxcAJmXkU1bph/2NGei0NMJdT0T4pIl4CrAReTPWwp/8J\n", "zM/M/16//9+Afwt8F/gC8E/1R58FPFgvHiip5mku7XMi4hSq01ZrqR4R/FyqU1u/1FZt50PSngV8\n", "OzNPrT97AL+4RLkkPM2lfdNrga9k5pXAT4AVdfmqiJgfEXOplox/BvgO8KqIOLyucwHw8X53WBp0\n", "nubSPiciXkb15MAngQeoTmE9RPU8mvcAPwO2Ui1P/ocR8XrgD6iOUu4H3paZD0/WtrSvMkwkoD7y\n", "+LXMXFtvfxW4PDOvn9meScPBOROpch/wioj4O6oHKd1kkEid88hEklTMCXhJUjHDRJJUzDCRJBUz\n", "TCRJxQwTSVKx/w/C6mf2r2ZSOQAAAABJRU5ErkJggg==\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Factorplot for age with Had Affair hue\n", "sns.factorplot('age',data=df,hue='Had_Affair',palette='coolwarm')" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "This suggests a higher probability of an affair as age increases. Let's check the number of years married." ] }, { "cell_type": "code", "execution_count": 231, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 231, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAZMAAAFhCAYAAAC1RkdzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAHo5JREFUeJzt3X+UXGWd5/F3MySBYRKcETAbEcMo+Yp4WKVbGYObwAyK\n", "oPiDdmX8MaI7gDIMo457UCPrym4irI4Y4w92loiAOIrQ6MJBog6ohPgDLF0UxW/EIcKIiAgkARES\n", "qP3j3oai6KQ7/XRVVyXv1zk5dD331nO/dHXVp+597n3uQLPZRJKkEjtNdwGSpP5nmEiSihkmkqRi\n", "hokkqZhhIkkqZphIkort3MnOI+Jg4MzMPCwi9gLOAZ4EDABvysx1EXECcCKwGViamVdExK7AhcCe\n", "wEbguMy8q5O1SpImr2N7JhFxKlV4zKqbPgR8NjMXA+8HnhMRc4FTgIXAEcAZETETOAm4ITMXARcA\n", "p3WqTklSuU4e5roZOIZqLwSqwHhaRHwdeANwNfACYE1mbsrMDfVzDgQOAVbVz1sFHN7BOiVJhToW\n", "Jpl5KdWhq1Hzgbsz88XArcC7gdnA+pZ1NgK7A3OADW1tkqQe1dExkza/Ay6rf74cWAZ8nypQRs0G\n", "7qUKktltbVvVaDT6bl6YTZs2cfvttxf3M2/ePGbMmDEFFUnqhMHBwYHx1+pv3QyTa4GXUQ2sLwZu\n", "BK4DlkXELGAXYP+6fQ1wFHA9cCRwzUQ20G8v2MDAwIIlH/1K7jVv30n3cefttzA8/MJoNptrp7A0\n", "Sdom3QiT0T2GdwErI+Ikqj2N12fm+ohYAaymOuS2JDMfjIizgfMjYjXwIPD6LtQ5Lfaaty/z9lkw\n", "3WVIUpGOhklmrqMaeCczbwVeMsY6K4GVbW0PAK/tZG2SpKnjRYuSpGKGiSSpmGEiSSpmmEiSihkm\n", "kqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKGiSSpmGEiSSpmmEiSihkm\n", "kqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKGiSSpmGEiSSpmmEiSihkm\n", "kqRihokkqdjOnew8Ig4GzszMw1raXg/8fWYurB+fAJwIbAaWZuYVEbErcCGwJ7AROC4z7+pkrZKk\n", "yevYnklEnAqcA8xqaXse8F9aHs8FTgEWAkcAZ0TETOAk4IbMXARcAJzWqTolSeU6eZjrZuAYYAAg\n", "Ip4MLAPeMdoGvABYk5mbMnND/ZwDgUOAVfU6q4DDO1inJKlQx8IkMy+lOnRFROwEfBr4R+C+ltXm\n", "AOtbHm8Edq/bN7S1SZJ6VEfHTFoMAs8EzgZ2AZ4dEWcB3wBmt6w3G7iXKkhmt7WNq9FoNKeq4G4Y\n", "GRnhts1T0k82Go3yjiR1xODg4MD4a/W3roRJZl4PPAcgIp4OfCEz/7EeM1kWEbOoQmZ/4EZgDXAU\n", 
"cD1wJHDNRLbTby/Y0NDQguUX3ZSl/QwPD0ez2Vw7FTVJ0mR049Tg9r2FgdG2zLwDWAGsBq4ClmTm\n", "g1R7MAdExGrgeOD0LtQpSZqkju6ZZOY6qjO1ttiWmSuBlW3rPAC8tpO1SZKmjhctSpKKGSaSpGKG\n", "iSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKG\n", "iSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKG\n", "iSSpmGEiSSpmmEiSihkmkqRiO3ey84g4GDgzMw+LiOcCK4CHgQeBN2XmnRFxAnAisBlYmplXRMSu\n", "wIXAnsBG4LjMvKuTtUqSJq9jeyYRcSpwDjCrbloO/H1mHgZcCrw7Ip4CnAIsBI4AzoiImcBJwA2Z\n", "uQi4ADitU3VKksp18jDXzcAxwED9+K8z80f1zzOAB4AXAGsyc1NmbqifcyBwCLCqXncVcHgH65Qk\n", "FepYmGTmpVSHrkYf3wEQEQuBk4GPAnOA9S1P2wjsXrdvaGuTJPWojo6ZtIuIY4ElwFGZ+buI2ADM\n", "blllNnAvVZDMbmsbV6PRaE5huR03MjLCbZvHX28C/WSj0SjvSFJHDA4ODoy/Vn/rWphExBupBtoP\n", "zcx76ubrgGURMQvYBdgfuBFYAxwFXA8cCVwzkW302ws2NDS0YPlFN2VpP8PDw9FsNtdORU2SNBnd\n", "CJNmROwEfAz4JXBpRAB8MzNPj4gVwGqqQ25LMvPBiDgbOD8iVlOd+fX6LtQpSZqkjoZJZq6jOlML\n", "4MlbWGclsLKt7QHgtZ2sTZI0dbxoUZJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lS\n", "McNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lS\n", "McNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQV27mTnUfEwcCZ\n", "mXlYRDwTOA94BLgRODkzmxFxAnAisBlYmplXRMSuwIXAnsBG4LjMvKuTtUqSJq9jeyYRcSpwDjCr\n", "bjoLWJKZi4AB4JURMRc4BVgIHAGcEREzgZOAG+p1LwBO61SdkqRynTzMdTNwDFVwAByUmdfUP18J\n", "HA48H1iTmZsyc0P9nAOBQ4BV9bqr6nUlST2qY2GSmZdSHboaNdDy80Zgd2AOsH4L7Rva2iRJPaqj\n", "YyZtHmn5eQ5wL1VgzG5pnz1G+2jbuBqNRrO8zO4ZGRnhts3jrzeBfrLRaJR3JKkjBgcHB8Zfq791\n", "M0x+GBGLM/NbwJHAVcB1wLKImAXsAuxPNTi/BjgKuL5e95qxu3y8fnvBhoaGFiy/6KYs7Wd4eDia\n", "zebaqahJkiajG6cGj+4tvAs4PSK+TRVil2Tmb4AVwGqqcFmSmQ8CZwMHRMRq4Hjg9C7UKUmapI7u\n", "mWTmOqoztcjMnwOHjrHOSmBlW9sDwGs7WZskaep40aIkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJ\n", "KmaYSJKKGSaSpGKGiSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJ\n", "KmaYSJKKGSaSpGKGiSSpmGEiSSpmmEiSio0bJhHx8THazu9MOZKkfrTzlhZExErgGcBQRDyn7TlP\n", "6nRhkqT+scUwAZYBTwdWAB8ABur2zcBPO1uWJKmfbDFMMvMW4BbgwIiYA+zOY4HyJ8DdnS9PktQP\n", 
"trZnAkBELAHeQxUezZZF+3aqKElSfxk3TIDjgWdk5m87XYwkqT9N5NTgXwL3dLoQSdpRRMShEXF2\n", "W9tNE3jecyPiM+OsMz8i/hARz29pe0ZE3BAR/zMi3h4RjYg4ZAvPn9TZuhPZM7kZuDYirgYerNua\n", "mfk/tnVjEbETsBJYADwCnAA8DJxXP74RODkzmxFxAnAi1YD/0sy8Ylu3J0k9qjn+KpP2JuDjVEeV\n", "rq/bDgG+mJnLIuIq4OjMvH2sJ2fmcZPZ6ETC5Ff1v1EDW1pxAl4C7JaZL4qIw4EP1jUsycxr6qR+\n", "ZUR8FzgFGAR2pQqzr2fmQwXblqSeFhHPBT5E9bn4R8DRVMHzBarPwruA+8fp5tXAIuA7EbFr/bwl\n", "wIyIuAU4CPhiRLwUOBvYq/63JDOvjIibMnP/iGgAvwZuyMz3jVf7uGGSmR8Yb51t8ACwe0QMUJ0d\n", "9hBwcGZeUy+/kipwHgbWZOYmYFNE3AwcCHx/CmuRpOkyALwiIp7V0rYP8Czg+My8tf5y/SLg2cC/\n", "ZuZHI+LNwOItdRoRC4GfZObGiLgceF1mnhsRZwBPycx/qY/6HAvsCVyemV+MiIOBd1F9Bo/6M+CY\n", "zPzlRP6HJnI21yNjNN+emXtPZANt1gC7AD8DnkyVuotalm+kCpk5wPox2iVpe9AELsvMk0Yb6jGT\n", "O4APRcTvgf2BVcB+wOfq1b7DVsIEOA54VkRcSXUJx2LgXKrwaj+qdA/wkoh4Wf24PQ8emmiQjPXk\n", "J8jMRwfpI2IG8Cpg4UQ30OZUqj2O90XE3sA3gBkty+cA9wIbgNkt7bOZwEkAjUajk8chp9zIyAi3\n", "bZ6SfrLRaJR3JKkjBgcH2z/IxxouGADOAg6n+hy8sm5L4GDgGmBoS9uIiFnAXwLPysyH67brI+IA\n", "njhGM0AVPD/NzLMi4m+A4bZ1xtqR2KKJjJk8qj7sdHFEnLYtz2uxG1VQQBUOOwM/jIjFmfkt4Ejg\n", "KuA6YFn9y9mFKqFvHK/zMV6wnjY0NLRg+UU3ZWk/w8PD0Ww2105FTZK6oskTP+BHx0ZWA/9GdSbt\n", "XKpxjc9FxDeA26hOShrL0cDVo0FS+yzViU4/bNteE7ga+HxEHAV8m+poEUzy5ICBZnPrz4uI1pH9\n", "AeAAYHFmvmBbNxYRTwI+A+xBtUeyHGgA5wAzqaZpOaE+m+t4qrO5dgKWZeaXttZ3o9Fo9luYDAwM\n", "LFh+0U05b58Fk+7j9lvX8o5j9zdMJE2rieyZHMZjSdWkOpvg2MlsLDPvpTrToN2hY6y7kuo0YklS\n", "i4j4ItUAeqtLMvOT01EPTGzM5M0RMROIev0b68NdkqRpkJmvne4a2k3kfiZDwFrgfKqzAn4ZEX/R\n", "6cIkSf1jIoe5VgDHZub3AOogWQFs85iJJGn7NJG5uXYbDRKAzPwu1RlWkiQBE9szuSciXpWZXwaI\n", "iFcDv+tsWeo1AwMDM4H5U9DVumaz6bQ4UpspfI+16tr7bSJhciJweUR8murU4EeoJg3TjmX+ko9+\n", "JfeaN/nb2Nx5+y188J1HBdUYnKTHK36PtZrI+62efPdTVNNVPUg1lcsvJrO9iYTJS4HfU80b8wzg\n", "YqpTeYsvtlN/2WvevpRcEyNp66bhPfYqYGZmLqzn5/pI3bbNJjJm8lbgRZl5f2b+CHge1Yy+kqT+\n", "dgjV/F/UY+NbnK5lPBMJk52pZvcd9RDbOGeLJKknzeGxKa4AHq4PfW2ziRzm+jJwdURcRDVmcgxw\n", "2WQ2JknqKe2T6u6UmZPaWRg3gTLz3VTXlQSwL/CxzJzsRI+SpN6xBjgKHr2G8EeT7WhCswZn5sVU\n", 
"A++SpA658/Zbut3Xl4AXR8Sa+vFbJru9bZqCXpLUMevqU3mntM+tLczMJnDS1taZKMNEknpAfXFh\n", "316DNalRe0mSWhkmkqRihokkqZhhIkkq5gC8JPWAHWHWYElS582/bOk7cv7cPaaks3V33MUrTls+\n", "7izd9QSPZ2bmYSXbM0wkqUfMn7sH++09t2vbi4hTgTcC95X25ZiJJO24bqaab3GgtCPDRJJ2UJl5\n", "KbB5KvoyTCRJxQwTSVIxB+AlqUesu+Ou6eqrWbo9w0SSesO6+lTeKe1zvBUycx2wsHRDhokk9QBn\n", "DZYk7fAME0lSsa4f5oqI9wJHAzOAT1Ddg/g84BHgRuDkzGxGxAnAiVTnQC/NzCu6XaskaWK6umcS\n", "EYcCL8zMhcChwJ8DHwGWZOYiqqswXxkRc4FTqAaFjgDOiIiZ3axVkjRx3T7M9RLgxxHxZeBy4DJg\n", "MDOvqZdfCRwOPB9Yk5mbMnMD1SX/B3a5VknSBHX7MNeewNOAl1PtlVzO4+eE2QjsDswB1o/RLknq\n", "Qd0Ok7uAmzJzM7A2Iv4APLVl+RzgXmADMLulfTZwz3idNxqN4gtvumlkZITbpmBWnJGRkWw0GuUd\n", "bX0bfVOr1GsGBweLJ1Lsdd0Ok2uBtwNnRcQ84I+BqyJicWZ+CzgSuAq4DlgWEbOAXYD9qQbnt6rf\n", "XrChoaEFyy+6KUv7GR4ejmaz2dHz0/upVknd19UwycwrImJRRFxHNV7zd1RXaJ5TD7D/FLikPptr\n", "BbC6Xm9JZnblbmGSpG3X9VODM/PdYzQfOsZ6K4GVHS9IklTMixYlScUME0lSMcNEklTMMJEkFTNM\n", "JEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEkFTNM\n", "JEnFDBNJUjHDRJJUzDCRJBXbeboLkKSpMDAwMBOYX9jNumaz+dAUlLPDMUwkbS/mL/noV3KveftO\n", "6sl33n4LH3znUQGsndqydgyGiaTtxl7z9mXePgumu4wdkmMmkqRi7plI2qIpGocAxyK2e4ZJn9u8\n", "+SGA+QMDA6Vd+WbXWIrGIcCxiB2FYdLn7r7zV1y29B1fnT93j0n3se6Ou3jFact9s2tMjkNoIgyT\n", "7cD8uXuw395zp7sMSTswB+AlScUME0lSsWk5zBURewEN4K+AR4Dz6v/eCJycmc2IOAE4EdgMLM3M\n", "K6ajVknS+Lq+ZxIRM4B/Bu4HBoCzgCWZuah+/MqImAucAiwEjgDOiIiZ3a5VkjQx03GY68PA2cCv\n", "68cHZeY19c9XAocDzwfWZOamzNwA3Awc2PVKJUkT0tUwiYg3A7/NzK/VTQP1v1Ebgd2BOcD6Mdol\n", "ST2o22MmbwGaEXE48FzgfGDPluVzgHuBDcDslvbZwD3jdd5oNJpTV2rnjYyMcNvm6a6iMjIyko1G\n", "Y2vLp6TW8baj3tJPr/tU1NqpOgcHB4uvKu51XQ2TzFw8+nNEfAN4G/DhiFicmd8CjgSuAq4DlkXE\n", "LGAXYH+qwfmt6rcXbGhoaMHyi27K6a4DYHh4OJrN5hYvWpyKWjdvfojhY4ePANYVdOOV+l00VX+j\n", "4/19TYWpqLUbdW6vpvuixSbwLuCceoD9p8Al9dlcK4DVVIfilmSmHyB9rvRqfa/Ul3rXtIVJZh7W\n", "8vDQMZavBFZ2rSB1hVfrS9snL1qUJBWb7sNckrZzzmy9YzBMJHWUM1vvGAwTSR3nWNn2b7sME+8O\n", "J0ndtV2GCd4dbofVL18k+qXOHYljO2W21zDx7nA7rn75ItEvde4wHNsps92GiXZc/fJFol/q3JE4\n", 
"tjN5XmciSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKG\n", "iSSpmGEiSSpmmEiSihkmkqRi3s9EauMd96RtZ5hIbbzjnrTtDBNpDP1wxz33oNRLDBOpT7kHpV5i\n", "mEh9rB/2oLRj8GwuSVIxw0SSVKyrh7kiYgZwLvB0YBawFLgJOA94BLgRODkzmxFxAnAisBlYmplX\n", "dLNWSdLEdXvP5A3AbzNzEfBS4JPAR4AlddsA8MqImAucAiwEjgDOiIiZXa5VkjRB3R6Avxi4pP55\n", "J2ATcFBmXlO3XQm8BHgYWJOZm4BNEXEzcCDw/S7XK0magK6GSWbeDxARs6mC5TTgn1pW2QjsDswB\n", "1o/RLknqQV0/NTgingZcCnwyMz8fER9qWTwHuBfYAMxuaZ8N3DNe341GowkwMjLCbZvLax0ZGclG\n", "o1He0Zb7n5I6p8J4/6+9Umu/1Albr9U6t12/vPZj1Tk4OFh8ZWmv6/YA/FOArwF/l5nfqJt/GBGL\n", "M/NbwJHAVcB1wLKImAXsAuxPNTi/VaMv2NDQ0ILlF92UpfUODw9Hs9ns2MVcU1XnVBjv/7VXau2X\n", "OmHrtVrntuuX177Tnxu9qtt7JkuoDle9PyLeX7e9HVhRD7D/FLikPptrBbCaamxlSWY63YMk9ahu\n", "j5m8nSo82h06xrorgZWdrkmSVM6LFiVJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEk\n", "FTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEk\n", "FTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lSsZ2nu4AtiYid\n", "gE8BBwIPAsdn5i+6tf3Nmx8CmD8wMFDa1bpms/lQeUWS1Lt6NkyAVwEzM3NhRBwMfKRu64q77/wV\n", "ly19x1fnz91j0n2su+MuXnHa8gDWTl1lktR7ejlMDgFWAWTm9yJiqNsFzJ+7B/vtPbfbm5WkvtPL\n", "YTIH2NDy+OGI2CkzH5nIk++8/Zaijf/uzn9n3f13FfWx7o7xn98vdcL019ovdcL29dr3S51QVms3\n", "69weDTSbzemuYUwR8RHgu5l5cf34tsx82pbWbzQavfk/IknA4OBg8QBsL+vlPZM1wNHAxRHxF8CP\n", "trby9v5CSVIv6+Uw+RLw4ohYUz9+y3QWI0nasp49zCVJ6h9etChJKmaYSJKKGSaSpGKGiSSpWC+f\n", "zdU1480DFhHvBP4W+G3d9NbM7PoUKRExAzgXeDowC1iamZf3Wp11Le+lOrV7BvCJzDy/ZdnRwH8D\n", "NgPnZubKaapxJrASeCawCfiHzLyhl+qspxI6MzMPi4hnA/+nXvRzqr/Th1vWnbb57Nrq3As4B3gS\n", "MAC8KTPXta3/A2B9/fDfMvNvu1DjE94/wC/o0d9pv3HPpPLoPGDAe6jmAWt1EPA3mXlY/W+65tp6\n", "A/DbzFwEvBT4RNvynqgzIg4FXlj/Pg8F/rxl2QzgLODFwGLgxPrDZzqcAPy+rvMEqg8aoDfqjIhT\n", "qT6UZ9VNy4D3ZOaL6sdHtz1lvL/jbtX5IeCzmbkYeD/wnLb1dwFo+TvteJDU2t8/n6QKlJ77nfYj\n", "w6TyuHnAgPZ5wAaBJRGxOiLe0+3iWlxM9eaE6rXb3La8V+p8CfDjiPgycDlwWcuy/YGbM3N9Zm4C\n", "rgUWTUONAM/msdd9LfDUiJhTL+uFOm8GjqH6dg8wnJnX1ntUc4F729Yf7++4W3UuBJ4WEV+n+gC/\n", 
"um39/wj8cUR8NSKuqvdquqH9/bOJ3v2d9h3DpDLmPGAtjz8PvBX4S+BFEfGybhY3KjPvz8z7ImI2\n", "1RvjfW2r9ESdwJ5UwfYa4G3A51qWzeGxwxsAG4Hdu1fa4/w/4OUA9SwLewK71cumvc7MvJSWLwyZ\n", "+UhE7APcCDyZJ84KMd7fcVfqBOYDd2fmi4FbgXe3PeV+4MOZeQT130eX6nzC+yczm734O+1H/lIq\n", "G4DZLY/bJ5T8WGbeXX9DvQJ4XleraxERT6P6pndBZn6hbXGv1HkX8LXM3Fx/4/9DRIzO5b+ex/+u\n", "ZwP3dLvA2rnAhohYTXU4Yy1wd72sl+p8VGbempkLgH+mOgzXary/4275HY/tjV7OE7/Nr6X+gpGZ\n", "P6/X/w/dKGys90+f/E57nmFSWQMcBY9+Q33020lE7E51yGa3iBig+tb//ekoMiKeAnwNODUzz2tb\n", "1jN1Uh0Semld1zyqb/ujH9I/A/aLiD+tDy0sAr4zLVXCC4CrM/M/AZcAv87MB3uwTgAi4rKIeGb9\n", "8D7g4bZVtvh33GXXAqN7xYupvvW3egv12EP99zEH+HWnixrr/dNHv9Oe53QqQP3hO3rGBlR/7IPA\n", "n2TmORHxOuCdVGdz/Gtmnj5NdX4M+M9AtjSfA+zWS3UCRMT/Ag6j+sLyXmAPHvt9vpzq2PVOwKcz\n", "8+xpqvHPgIuowu4B4ESqgOmZOiNiPvAv9U3iXgh8GHiI6lDR8Zn5m4g4n+qQ569o+zvu1kkYbXXu\n", "Q3WW3G5UYxCvz8z1LXXeAXyG6qwqqD7cv9uFGsd6/7yP6oSBnvud9hvDRJJUzMNckqRihokkqZhh\n", "IkkqZphIkooZJpKkYoaJJKmYYSJNoYg4vZ5teKLrvyYiPtPJmqRucAp6aQpl5n+f7hqk6WCYqOdF\n", "xAXA6sw8p378DeBg4ErgAOCNwNvrnwE+tbV7j0TEB4B9qK5q3gs4jWr6mYOBGzLzryNiZ+Dsus+n\n", "UF01fQzVzLKrqO4Z8wfgQuDNVJMEXg7MA76ZmedHxJvqunYCGsDJmflgRLyh3uZ9VDPu/qHsNyRN\n", "Pw9zqR+cSxUYRMTTqWb3/R7VB/+zqKbt+NPMPAg4nGra8PEcQDV1yhvr/s+kuu/GQRFxIPBC4A/1\n", "fSyeCexKPUcTsAB4Qz0r7gDwVOC5mTk6i3MzIg4Ajqe6r8vzqMLnv9ZzUf0T1X1eDq77dRoK9T3D\n", "RP3gW8C8OkjeBFxQt3+v/u+PgYiIVVTh0D7lebsm8PV69tdbqSZ4/Fl9h71fAU/KzNXA/46Ik4EV\n", "wH48Nj39nZl5a0t/P2ibSXaAal6y/YDvRcQPgVcAQRVS387M39TPOY/H7gMi9S3DRD0vM5vA+cDr\n", "qSbq+2y96IF6+d1Uexofp/rA/kE9i/LWbGr5uf0mYwMR8QqqQ1j3Ue25XMNjH/oPtKzbbHs8aifg\n", "i5n5vHrP5GDgH+r1W8OjfZZaqS8ZJuoX51HdSOnWzHzcdOX17L4XZuYVVGMU9wF7F27vr6jC4Hzg\n", "N1RT0P/RGOttaa/im8CrI2LPelbqs6nC5FrghRGxd93+usI6pZ5gmKgvZOa/A7+kCpV2q4DfR8RP\n", "qA59jWTmT8bpsrmFn0cfnwO8LiKup7pp0v8F9q2XtT/3Cc/PzB8Bp1PdiGn0fh5nZuadwElU99W4\n", "nmrw3TET9T2noFdfqAeuvwkcUN9JUlIP8dRg9byIeA3VDYreNtEgiYh3AMeNsehXmfnyqaxPknsm\n", "kqQp4JiJJKmYYSJJKmaYSJKKGSaSpGKGiSSp2P8HBrGBvo/sSRwAAAAASUVORK5CYII=\n" ], "text/plain": [ "" ] }, "metadata": 
{}, "output_type": "display_data" } ], "source": [ "# Factorplot for years married with Had Affair hue\n", "sns.factorplot('yrs_married',data=df,hue='Had_Affair',palette='coolwarm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like probability of having an affair increases with the number of years married. Let's check the number of children." ] }, { "cell_type": "code", "execution_count": 232, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 232, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAAZMAAAFhCAYAAAC1RkdzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAALEgAACxIB0t1+/AAAF4RJREFUeJzt3X2UXVV5x/HvIASiJrQWY5oCTpYlj6iLCpOqCyiR1jdo\n", "QSWrsnwrsBTUsqjQdlFXjNba4Gt5aazSlhehai1C0EIRUIESTKvQ0eJC6AOpiaUijeElBNSQwO0f\n", "5wxcJzeZm9n3zr135vtZaxb37nPOPnvPhPu75+xz9hlqNBpIklRit143QJI0+AwTSVIxw0SSVMww\n", "kSQVM0wkScUME0lSsd27VXFE7AFcDDwf2BNYAdwFXAI8CdwBnJqZjYg4GTgF2AasyMxrImI28Hng\n", "ucBm4ITM3Nit9kqSJq+bRyZvBX6SmUcArwM+DZwNLKvLhoDXR8R84DTgUOC1wEcjYhbwHuD2et1/\n", "AJZ3sa2SpALdDJPLgQ827WcrcEhmrq7LrgVeBfwmsCYzt2bmI8Ba4CDgMOC6et3r6nUlSX2oa6e5\n", "MvMxgIiYQxUsy4G/alplM7A3MBfYtIPyR8aVSZL6UNfCBCAi9gOuBD6dmV+MiE80LZ4LPEwVGHOa\n", "yue0KB8r26nR0VHnhpHUd0ZGRoZ63YZu6+YA/POArwF/mJk31cXfjYglmXkzcBRwA3ArcFZE7Ans\n", "BRxINTi/BjgauK1edzVtmAl/NEnqN908MllGdWrqgxExNnbyXmBlPcB+J3BFfTXXSuAWqrGVZZm5\n", "JSLOBy6NiFuALcBbuthWSVKBoek0a/Do6GjDIxNJmnretChJKmaYSJKKGSaSpGKGiSSpmGEiSSpm\n", "mEiSihkmkqRihokkqVhX5+bqtaGhoVnAcAerXN9oNB7vYH2SNC1M6zABhped+9Wct2BhcUUb7lvH\n", "R844OoC7y5slSdPLdA8T5i1YyIL9F/W6GZI0rTlmIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKK\n", "GSaSpGKGiSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKK\n", "GSaSpGKGiSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKK\n", "GSaSpGKGiSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKK\n", 
"GSaSpGKGiSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKK\n", "GSaSpGKGiSSpmGEiSSq2e7d3EBEvBz6WmUdGxMHA1cA99eLPZOblEXEycAqwDViRmddExGzg88Bz\n", "gc3ACZm5sdvtlSTtuq6GSUScCbwNeLQuGgHOycxzmtaZD5xWL5sNfDMivg68B7g9Mz8cEccDy4HT\n", "u9leSdLkdPvIZC1wHPC5+v0IsCgiXk91dHI68DJgTWZuBbZGxFrgIOAw4OP1dtcBH+hyWyVJk9TV\n", "MZPMvJLq1NWYbwN/mplLgB8Afw7MATY1rbMZ2BuYCzwyrkyS1Ie6PmYyzpczcyw4vgx8ClhNFShj\n", "5gAPUwXJnHFlExodHW2MvV61ahX3btvZ2rtm1apVOTo62rkKJc0IIyMjQ71uQ7dNdZhcFxF/lJm3\n", "Aa8C/gO4FTgrIvYE9gIOBO4A1gBHA7cBR1GFzoSa/2iLFy9edN5ld2WnGr906dJoNBp3d6o+SZou\n", "pipMxo4W3g18OiK2Aj8GTsnMRyNiJXAL1Wm3ZZm5JSLOBy6NiFuALcBbpqitkqRd1PUwycz1wKH1\n", "69uBw1uscyFw4biynwFv6nb7JEnlvGlRklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwT\n", "SVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwT\n", "SVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwT\n", "SVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwT\n", "SVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwT\n", "SVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklRswjCJiE+1KLu0O82RJA2i3Xe0ICIuBF4ALI6Il4zb\n", "5pe63TBJ0uDYYZgAZwHPB1YCHwKG6vJtwJ3dbZYkaZDsMEwycx2wDjgoIuYCe/N0oDwbeLD7zZMk\n", "DYKdHZkAEBHLgPdRhUejadHCbjVKkjRYJgwT4J3ACzLzJ91ujCRpMLVzafAPgYe63RBJmiki4pUR\n", "cf64srva2O6lEfHZCdYZjoifR8RvNpW9ICJuj4i/jIj3RsRoRBy2g+0ndbVuO0cma4FvRsSNwJa6\n", "rJGZH25nBxHxcuBjmXlkRPw6cAnwJHAHcGpmNiLiZOAUqsH9FZl5TUTMBj4PPBfYDJyQmRt3oW+S\n", "1K8aE68yaX8AfIrqrNJtddlhwJcy86yIuAE4JjPva7VxZp4wmZ22EyY/qn/GDO1oxfEi4kzgbcCj\n", "ddE5wLLMXF2n8usj4lvAacAIMJsquL4OvAe4PTM/HBHHA8uB09vdtyQNmoh4KfAJqs/mZwDHUAXP\n", "P1F9Pm4EHpugmjcCRwD/Xn8pnw0sA/aIiHXAIcCXIuJ1wPnAvPpnWWZeGxF3ZeaBETEK/Jjqc/j9\n", "E7V9wjDJzA9NtM5OrAWOAz5Xvz8kM1fXr68FXgM8AazJzK3A1ohYCxxElaQfr9e9DvhAQTskqZ8M\n", "AcdGxAubyvYHXgi8MzP/p/7CfTjwIuAbmXluRJwILNlRpRFxKPD9zNwcEVcDb87MiyPio8DzMvMf\n", "6zNBx1Od9bk6M79Un0H6E6rP5THPAY7LzB+206F2ruZ6skXxfZm570TbZuaVETHcVNR8VLOZ6nLj\n", 
"ucCmHZQ/Mq5MkqaDBnBVZr5nrKAeM7kf+ERE/BQ4kOqL9AHAF+rV/p2dhAlwAvDCiLiW6haOJcDF\n", "VJ+9488qPQS8JiJ+t34/Pg8ebzdIWm28ncx8apA+IvYA3gAc2u4OxmkOprnAw1SBMaepfE6L8rGy\n", "CY2Ojj51LnLVqlXcu22SLW1h1apVOTo62rkKJc0IIyMj4z/IWw0XDFENBbyK6vPu2rosgZcDq4HF\n", "O9pHROwJ/Dbwwsx8oi67LSJezPZjNENUwXNnZp4TEW8Hlo5bp9WBxA61M2bylPpU1OURsXxXtmvy\n", "3YhYkpk3A0cBNwC3AmfVv4i9qNL4DmANcDTVANJRVL/ICTX/0RYvXrzovMvuykm2dTtLly6NRqNx\n", "d6fqkzRjNdj+A35sbOQW4AdUV9LOpxrX+EJE3ATcS3WhUivHADeOBUntc8DJwHfH7a8B3Ah8MSKO\n", "Bv4N+JWmZbusndNczSP7Q8CLefqqrnaNNe5PgAsiYhbVlCxX1FdzraT6Be5GNQi0pT5feGlE3FLv\n", "7y27uE9J6kv1F+qbx5UdWL/8RItNjmujziuAK8aVrWyx3pH1yw3AS1osf9G49rSlnSOTI3k6DBpU\n", "VxMc3+4OMnM99WmxzLwHeGWLdS4ELhxX9jPgTe3uR5Jmioj4EtUAerMrMvPTvWgPtDdmcmJ9JBH1\n", "+nfUp7skST2QmX33Rbud55ksBu4GLqW6KuCHEfGKbjdMkjQ42jnNtRI4PjO/DVAHyUrgZd1smCRp\n", "cLQzN9ezxoIEIDO/RXXVlSRJQHtHJg9FxBsy8ysAEfFG4IHuNkuSZpahoaFZwHCHq13faDQe73Cd\n", "LbUTJqcAV0fERVSXBj9JNdWJJKlzhped+9Wct6Azj4racN86PnLG0UE15t1SROwGfIZqCqstVFO5\n", "/Pdk9tdOmLwO+CnVvDEvAC6nury3YzcDSpJg3oKFLNh/0VTu8g3ArMw8tJ6f6+y6bJe1M2byLuDw\n", "zHwsM78HHEw1y68kabAdRjX/F/XY+A6na5lIO2GyO9B8zu1xdnHOFklSX2qeUBfgifrU1y5r5zTX\n", "V4AbI+IyqjGT44CrJrMzSVJfGT/R7m6ZOamDhQkTKDP/jOq+kgAWAn+dmZOd6FGS1D/GJtQdu4fw\n", "e5OtqK1ZgzPzcqqBd0lSl2y4b91U1/Vl4NURsaZ+f9Jk97dLU9BLkrpmfX0pb0fr3NnCzGxQPSK9\n", "mGEiSX2gvrlwYJ+XZJio6wb9zl5JEzNMNBWm/M5eSVPLMNGU6MGdvZKm0KRuTpEkqZlHJpLUBwZ9\n", "bNEwkaT+MHzVitNzeP4+Hals/f0bOXb5eROOLdYTPH4sM48s2Z9hIkl9Ynj+Phyw7/wp219EnAm8\n", "DXi0tC7HTCRp5lpLNd/iUGlFhokkzVCZeSWwrRN1GSaSpGKGiSSpmAPwktQn1t+/sVd1NUr3Z5hI\n", "Un9YX1/K29E6J1ohM9cDh5buyDCRpD4w6LMGO2YiSSpmmEiSihkmkqRihokkqZhhIkkq5tVcfa4L\n", "01L7uFtJHWeY9L+OPfLWx91K6hbDZAD4yFtJ/c4xE0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJU\n", "zDCRJBUzTCRJxQwTSVIxw0SSVMwwkSQVM0wkScUME0lSMcNEklTMMJEkFTNMJEnFDBNJUjHDRJJU\n", "zMf2auBs2/Y4wPDQ0FCnqlzfaDQe71Rl0kxkmGjgPLjhR1y14vTrh+fvU1zX+vs3cuzy8wK4u7xl\n", "0sxlmGggDc/fhwP2nd/rZkiqOWYiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkooZJpKkYoaJJKmY\n", 
"YSJJKmaYSJKKGSaSpGI9mZsrIr4DbKrf/gD4KHAJ8CRwB3BqZjYi4mTgFGAbsCIzr+lBcyVJE5jy\n", "MImIvQAy88imsquAZZm5OiLOB14fEd8CTgNGgNnANyPi65npVOGS1Gd6cWTyG8AzI+L6ev/vBw7J\n", "zNX18muB1wBPAGsycyuwNSLWAgcB/9GDNkuSdqIXYfIY8MnMvCgiDgCuG7d8M7A3MJenT4U1l/dE\n", "Fx7IBD6USdI00YswuRtYC5CZ90TEA8DBTcvnAg8DjwBzmsrnAA9NVPno6Ghj7PWqVau4d1snmtzZ\n", "BzJB9VCmrQf+FqOjoztdr5N9qOvLifbZaZ3uQ6f14neimWVkZKSj30L7US/C5CSq01WnRsQCqpD4\n", "WkQsycybgaOAG4BbgbMiYk9gL+BAqsH5nWr+oy1evHjReZfdlZ1qeKcfyHTQ0qXRaDR2+oS/Tvdh\n", "aRv77LRO96HTevE7kaabXoTJRcBnI2JsjOQk4AHggoiYBdwJXFFfzbUSuIXqEuZlDr5LUn+a8jDJ\n", "zG3A21ssemWLdS8ELux2myRJZbxpUZJUrCc3LUqDZmhoaBYw3OFqvZpP04ZhIrVneNm5X815CxZ2\n", "pLIN963jI2ccHVRXN0oDzzCR2jRvwUIW7L+o182Q+pJjJpKkYoaJJKmYYSJJKmaYSJKKOQA/gzhZ\n", "paRuMUxmkG5MVnns8vO8vFWSYTLTdHqySkkCx0wkSR1gmEiSihkmkqRihokkqZhhIkkqZphIkooZ\n", "JpKkYoaJJKmYYSJJKmaYSJKKGSaSpGLOzSX1QBdmcHb2ZvWUYSL1QCdncHb2ZvUDw0TqEWdw1nTi\n", "mIkkqZhhIkkqZphIkooZJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKGiSSpmGEiSSpmmEiSihkmkqRi\n", "hokkqZhhIkkqZphIkooZJpKkYj4cS5ohhoaGZgHDHa7WxwULMEykmWR42blfzXkLFnaksg33reMj\n", "Zxzt44IFGCbSjDJvwUIW7L+o183QNOSYiSSpmGEiSSpmmEiSihkmkqRihokkqZhhIkkqZphIkop5\n", "n4mkSdm27XGA4aGhoU5W6x31A8owkTQpD274EVetOP364fn7dKS+9fdv5Njl5+30jnqnhOlfhomk\n", "SRuevw8H7Dt/SnfplDD9yTCRNFCcEqY/OQAvSSpmmEiSihkmkqRihokkqZgD8JJmLO+V6RzDRNKM\n", "1Yt7ZaYrw0TSjNaDe2WmJcdMJEnFDBNJUjHDRJJUzDCRJBUzTCRJxQwTSVKxvr40OCJ2Az4DHARs\n", "Ad6Zmf/d21ZJksbr9yOTNwCzMvNQ4H3A2T1ujySphX4Pk8OA6wAy89vA4t42R5LUSl+f5gLmAo80\n", "vX8iInbLzCfbrWDDfes60pAHNvwv6x/b2JG6oJp2oV324Rd1sh/2YfJ69e9pOvRhOhpqNBq9bsMO\n", "RcTZwLcy8/L6/b2Zud+O1h8dHe3fzkia0UZGRjo6m2S/6fcjkzXAMcDlEfEK4Hs7W3m6/7EkqV/1\n", "e5h8GXh1RKyp35/Uy8ZIklrr69NckqTB0O9Xc0mSBoBhIkkqZphIkooZJpKkYv1+NVdPTDQnWEQc\n", "A3wA2AZcnJkX9qShE4iIlwMfy8wjx5UPSvv3AC4Gng/sCazIzKublvd9PyLiGcAFwCKgAbw7M7/f\n", "tLzv+zAmIuYBo8DvZObdTeUD0YeI+A6wqX77g8x8R9OyM4B3AD+pi97V3EdNzCOT1nY4J1j9AXcO\n", "8GpgCXBK/T9ZX4mIM6k+xPYcVz4Q7a+9FfhJZh4BvA74m7EFA9SP3wOezMzDgeXAWWMLBqgPY239\n", 
"O+CxFuV934eI2AsgM4+sf94xbpVDgLc3LTdIdpFh0trO5gQ7EFibmZsycyvwTeCIqW/ihNYCxwHj\n", "b+QclPYDXA58sH69G9U33zED0Y/M/GfgXfXbYeChpsUD0YfaJ4HzgR+PKx+UPvwG8MyIuD4ibqiP\n", "2puNAMsi4paIeF8P2jfwDJPWWs4J1rRsU9OyzcDeU9WwdmXmlfzih++YgWg/QGY+lpmPRsQcqmB5\n", "f9PiQerHExFxCbAS+MemRQPRh4g4keoI8Wt1UfMXlIHoA9UR1Scz87XAu4EvNP0/DfBFqtD/beDw\n", "iPjdHrRxoBkmrT0CzGl63zy55KZxy+bwi982+91AtT8i9gNuBP4hM/+padFA9SMzT6QaN7kgImbX\n", "xYPSh5OoZqK4CXgpcGnTqaxB6cPdwBcAMvMe4AHgV5uW/3VmPlgfXV0DHDz1TRxsDsC3trM5wf4L\n", "OCAifpnq284RVKcABsXAtD8ingd8DfjDzLxp3OKB6EdEvB3YNzM/CvwMeJJqIB4GpA+ZuWTsdR0o\n", "78rMDXXRQPSBKhAPAk6NiAVUR1T3A0TE3sD3IuJFwE+pjk4u6lVDB5Vh0tp2c4JFxJuBZ2fmBRHx\n", "x8D1VEd2F2Xm+PPI/aQBMKDtX0Z1yuSDETE2dnIB8KwB6scVwCURcTOwB/Be4I0RMWh/i2ZDA/jv\n", "6SLgsxGxun5/EvCmpr/D+4CbqK7e/EZmXterhg4q5+aSJBVzzESSVMwwkSQVM0wkScUME0lSMcNE\n", "klTMMJEkFTNMNK1FxL9GxJIW5X8REb8XEcMRsW4H2z7a/RZK04M3LWq6a3kjVWb+OUBEDO/qtpK2\n", "Z5hoWomIj1M9QmAb8Pd18Tsj4mzgl4H3Zua/1BMv3gTc3LTt84HPU80v9R3qI/eI+BDwCmA/4FPA\n", "N6ied/MrVNNvnJaZ/1nX+TDVDLT7An+RmZd0r7dS//A0l6aNiPh94FDgJcDLgBOB+cBDmbkY+COe\n", "ntK+wfZHHn9DNaHkS6km+5vdtGxWZr44M/8WuBQ4MzNHqGaabZ6Act/M/C2qud3+qoPdk/qaRyaa\n", "To4ALqtnft0KHFxPTPiVevmdwD472f6VwJsBMnNVRDQ/huDbABHxbKrn23w2IsaWPSsinkMVTmPT\n", "tH8feE5ph6RBYZhoOtlK07M26vGQZ/H0c10abP+wsGYNfvFovXm7n9evnwH8PDOfmqI8IvbLzAfr\n", "cNkCkJmNprCRpj1Pc2k6WQ0cFxG7R8QzqWay/bUdrDvE9sHydapTY0TEa3n6yOKp9TJzE3BPRLy1\n", "Xu/VwL92qP3SwDJMNG1k5leonkXzHeBWqmeT57jVGk3/bf4BOBU4NiJup3r+/P+NW3fMW6kG9W+n\n", "eqb7m1rUP/61NK05Bb0kqZhHJpKkYoaJJKmYYSJJKmaYSJKKGSaSpGKGiSSpmGEiSSr2/0rJDNh7\n", "mvTeAAAAAElFTkSuQmCC\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Factorplot for number of children with Had Affair hue\n", "sns.factorplot('children',data=df,hue='Had_Affair',palette='coolwarm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty strong evidence suggesting that less children results in a lower probability of an affair. 
Finally, let's check the education level." ] }, { "cell_type": "code", "execution_count": 233, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 233, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "[Figure: factorplot of educ with Had_Affair hue]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Factorplot for education level with Had Affair hue\n", "sns.factorplot('educ',data=df,hue='Had_Affair',palette='coolwarm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Interesting! Go ahead and finish the remaining columns if you wish. For now, we will move on to the Logistic Regression analysis."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 5: Data Preparation\n", "\n", "If we look at the data, we'll notice that two columns are unlike the others: Occupation and Husband's Occupation. These columns are in a format known as *Categorical Variables*. Each value is a category label, so 1.0 and 2.0 are separate categories, not values along a spectrum that goes from 1 to 2 (e.g. there is no 1.5 in the occupation column). Pandas has a built-in method, get_dummies, for creating [dummy variables](http://en.wikipedia.org/wiki/Dummy_variable_%28statistics%29): it builds one new 0/1 column per category." ] }, { "cell_type": "code", "execution_count": 234, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 1 2 3 4 5 6\n", "0 0 1 0 0 0 0\n", "1 0 0 1 0 0 0\n", "2 0 0 1 0 0 0\n", "3 0 0 0 0 1 0\n", "4 0 0 1 0 0 0" ] }, "execution_count": 234, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create new DataFrames for the Categorical Variables\n", "occ_dummies = pd.get_dummies(df['occupation'])\n", "hus_occ_dummies = pd.get_dummies(df['occupation_husb'])\n", "\n", "# Let's take a quick look at the results\n", "occ_dummies.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great! Now let's name the columns something a little more readable." ] }, { "cell_type": "code", "execution_count": 235, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Create column names for the new DataFrames\n", "occ_dummies.columns = ['occ1','occ2','occ3','occ4','occ5','occ6']\n", "hus_occ_dummies.columns = ['hocc1','hocc2','hocc3','hocc4','hocc5','hocc6']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will create the X and Y data sets for our logistic regression!" ] }, { "cell_type": "code", "execution_count": 236, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Set X as new DataFrame without the occupation columns or the Y target\n", "X = df.drop(['occupation','occupation_husb','Had_Affair'],axis=1)" ] }, { "cell_type": "code", "execution_count": 237, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Concatenate the dummy DataFrames together\n", "dummies = pd.concat([occ_dummies,hus_occ_dummies],axis=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will concatenate all the DataFrames together." ] }, { "cell_type": "code", "execution_count": 238, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " rate_marriage age yrs_married children religious educ affairs occ1 \\\n", "0 3 32 9.0 3 3 17 0.111111 0 \n", "1 3 27 13.0 3 1 14 3.230769 0 \n", "2 4 22 2.5 0 1 16 1.400000 0 \n", "3 4 37 16.5 4 3 16 0.727273 0 \n", "4 5 27 9.0 1 1 14 4.666666 0 \n", "\n", " occ2 occ3 occ4 occ5 occ6 hocc1 hocc2 hocc3 hocc4 hocc5 hocc6 \n", "0 1 0 0 0 0 0 0 0 0 1 0 \n", "1 0 1 0 0 0 0 0 0 1 0 0 \n", "2 0 1 0 0 0 0 0 0 0 1 0 \n", "3 0 0 0 1 0 0 0 0 0 1 0 \n", "4 0 1 0 0 0 0 0 0 1 0 0 " ] }, "execution_count": 238, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Now concatenate the X DataFrame with the dummy variables\n", "X = pd.concat([X,dummies],axis=1)\n", "\n", "# Preview of Result\n", "X.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's go ahead and set up the Y." ] }, { "cell_type": "code", "execution_count": 259, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 1\n", "2 1\n", "3 1\n", "4 1\n", "Name: Had_Affair, dtype: int64" ] }, "execution_count": 259, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Set Y as Target class, Had Affair\n", "Y = df.Had_Affair\n", "\n", "# Preview\n", "Y.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 6: Multicollinearity Consideration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we need to get rid of a few columns. We will be dropping the occ1 and hocc1 columns to avoid [multicollinearity](http://en.wikipedia.org/wiki/Multicollinearity#Remedies_for_multicollinearity). Multicollinearity arises from the [dummy variables](http://en.wikipedia.org/wiki/Dummy_variable_(statistics)) we created. Within each set of dummy columns every row contains exactly one 1, so any one of the dummy variables can be linearly predicted from the others, and this distorts the model's coefficient estimates. 
We take care of this problem by dropping one of the dummy variables from each set. The dropped category then acts as the baseline for the remaining dummy variables in that set.\n", "\n", "The other column we will drop is the affairs column. This is because it is basically a repeat of what will be our Y target: instead of 0 or 1 it holds 0 or a positive number, so leaving it in X would hand the answer to the model." ] }, { "cell_type": "code", "execution_count": 240, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Dropping one column of each dummy variable set to avoid multicollinearity\n", "X = X.drop('occ1',axis=1)\n", "X = X.drop('hocc1',axis=1)\n", "\n", "# Drop affairs column so Y target makes sense\n", "X = X.drop('affairs',axis=1)\n", "\n", "# Preview\n", "X.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SciKit Learn expects the target Y as a 1-D array. Y is currently a pandas Series, so we will \"flatten\" it into a plain NumPy array. Numpy has a built-in method for this called [ravel](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ravel.html). Let's use it!" ] }, { "cell_type": "code", "execution_count": 242, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([1, 1, 1, ..., 0, 0, 0], dtype=int64)" ] }, "execution_count": 242, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Flatten array\n", "Y = np.ravel(Y)\n", "\n", "# Check result\n", "Y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 7: Logistic Regression with SciKit Learn\n", "\n", "Awesome! Now let's go ahead and run the logistic regression. This is a very similar process to the Linear Regression from the previous lecture. We'll create the model, then fit the data into the model, and check our accuracy score. Then we'll split the data into testing and training sets and see if our results improve.\n", "\n", "Let's start by instantiating the model!"
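] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before fitting, the Part 6 reasoning about dropping one dummy column per set can be checked numerically. This is only a minimal sketch on a made-up occupation column (not the affairs data), and it assumes a pandas version that supports the drop_first option of get_dummies. It shows that a full set of dummy columns plus an intercept column is rank deficient, and that dropping one dummy restores full column rank.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for an occupation code (not the real data)
occupation = pd.Series([1.0, 2.0, 3.0, 2.0, 1.0, 3.0], name='occupation')

# Full one-hot encoding: every row has exactly one 1, so each row sums to 1
full = pd.get_dummies(occupation).astype(float)
row_sums = full.sum(axis=1)

# With an intercept column of ones, the dummy columns add up to the intercept,
# so the matrix has more columns than its rank
with_intercept = np.column_stack([np.ones(len(full)), full.values])
rank_full = np.linalg.matrix_rank(with_intercept)

# Dropping one dummy (here the first category) removes the redundancy
reduced = pd.get_dummies(occupation, drop_first=True).astype(float)
reduced_with_intercept = np.column_stack([np.ones(len(reduced)), reduced.values])
rank_reduced = np.linalg.matrix_rank(reduced_with_intercept)

print('all dummies: rank %d, columns %d' % (rank_full, with_intercept.shape[1]))
print('one dropped: rank %d, columns %d' % (rank_reduced, reduced_with_intercept.shape[1]))
```

In this notebook the same effect comes from dropping occ1 and hocc1 by hand, which makes occupation 1 the baseline category. With that checked, the next cell creates and fits the model."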
] }, { "cell_type": "code", "execution_count": 247, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.72588752748978946" ] }, "execution_count": 247, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create LogisticRegression model\n", "log_model = LogisticRegression()\n", "\n", "# Fit our data\n", "log_model.fit(X,Y)\n", "\n", "# Check our accuracy\n", "log_model.score(X,Y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like we got a 73% accuracy rating on the data the model was fit on. Let's go ahead and compare this to the original Y data. Since Y only contains 1s and 0s, its mean is the fraction of women who reported having affairs. Comparing our score against the accuracy of always guessing the most common class is known as checking the [null error rate](http://en.wikipedia.org/wiki/Type_I_and_type_II_errors)." ] }, { "cell_type": "code", "execution_count": 249, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.32249450204209867" ] }, "execution_count": 249, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check percentage of women that had affairs\n", "Y.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means that if our model had simply guessed \"no affair\" every time, it would have had 1-0.32=0.68 (or 68%) accuracy. So while we are doing better than this baseline, we aren't doing that much better.\n", "\n", "Let's go ahead and check the coefficients of our model to see which features seem to be the stronger predictors." ] }, { "cell_type": "code", "execution_count": 245, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1\n", "0 rate_marriage [-0.69751024616]\n", "1 age [-0.0561916647628]\n", "2 yrs_married [0.10377681236]\n", "3 children [0.0182911397779]\n", "4 religious [-0.36832290498]\n", "5 educ [0.00890195753587]\n", "6 occ2 [0.296111837283]\n", "7 occ3 [0.606052332054]\n", "8 occ4 [0.343259482442]\n", "9 occ5 [0.94006396288]\n", "10 occ6 [0.919085673732]\n", "11 hocc2 [0.219634563784]\n", "12 hocc3 [0.323455986054]\n", "13 hocc4 [0.189038509875]\n", "14 hocc5 [0.212503714395]\n", "15 hocc6 [0.212868942457]" ] }, "execution_count": 245, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pair each column name with its coefficient (np.transpose turns the 1 x n coef_ array into a column)\n", "coeff_df = DataFrame(list(zip(X.columns, np.transpose(log_model.coef_))))\n", "\n", "# Display the result\n", "coeff_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at the coefficients, we can see that a positive coefficient corresponds to an increased likelihood of having an affair as the value of that feature increases, while a negative coefficient corresponds to a decreased likelihood.\n", "\n", "As you might expect, an increased marriage rating corresponded to a decrease in the likelihood of having an affair. Increased religiousness also seems to correspond to a decrease in the likelihood of having an affair.\n", "\n", "Since the coefficients of all the dummy variables (the wife and husband occupations) are positive, the lowest likelihood of having an affair corresponds to the baseline occupation we dropped (1-Student)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 8: Testing and Training Data Sets\n", "\n", "Just like we did in the Linear Regression Lecture, we should split our data into training and testing data sets. We'll follow a very similar procedure by using SciKit Learn's built-in train_test_split method."
] }, { "cell_type": "code", "execution_count": 253, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", " intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)" ] }, "execution_count": 253, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Split the data (by default 25% of the rows are held out for testing)\n", "X_train, X_test, Y_train, Y_test = train_test_split(X, Y)\n", "\n", "# Make a new log_model\n", "log_model2 = LogisticRegression()\n", "\n", "# Now fit the new model\n", "log_model2.fit(X_train, Y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use predict to predict classification labels for the test set, then we will reevaluate our accuracy score!" ] }, { "cell_type": "code", "execution_count": 258, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.735552763819\n" ] } ], "source": [ "# Predict the classes of the testing data set\n", "class_predict = log_model2.predict(X_test)\n", "\n", "# Compare the predicted classes to the actual test classes\n", "print(metrics.accuracy_score(Y_test,class_predict))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have a 73.56% accuracy score, which is basically the same as our previous accuracy score, 72.59%.\n", "\n", "### Part 9: Conclusion and More Resources\n", "\n", "So what could we do to try to further improve our Logistic Regression model? We could try tuning the [regularization](http://en.wikipedia.org/wiki/Regularization_%28mathematics%29#Regularization_in_statistics_and_machine_learning) (the model above already uses the default L2 penalty with C=1.0) or use a non-linear model.\n", "\n", "I'll leave the Logistic Regression topic here for you to explore more possibilities on your own. Here are several more resources and tutorials with other data sets to explore:\n", "\n", "1.) 
Here's another great post on how to do logistic regression analysis using Statsmodels from [yhat](http://blog.yhathq.com/posts/logistic-regression-and-python.html)!\n", "\n", "2.) The SciKit Learn documentation includes several [examples](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) at the bottom of the page.\n", "\n", "3.) DataRobot has a great overview of [Logistic Regression](http://www.datarobot.com/blog/classification-with-scikit-learn/).\n", "\n", "4.) Fantastic resource from [aimotion.blogspot](http://aimotion.blogspot.com/2011/11/machine-learning-with-python-logistic.html) on Logistic Regression and the mathematics of how it relates to the cost function and gradient!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.7" } }, "nbformat": 4, "nbformat_minor": 0 }