{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# GLM: Logistic Regression\n", "\n", "* This is a reproduction, with a few slight alterations, of [Bayesian Log Reg](http://jbencook.github.io/portfolio/bayesian_logistic_regression.html) by J. Benjamin Cook\n", "\n", "* Authors: Peadar Coyle and J. Benjamin Cook\n", "* How likely am I to make more than $50,000 (USD) per year?\n", "* It also explores model selection techniques: I use DIC and WAIC to select the best model.\n", "* The convenience functions are all taken from Jon Sedar's work.\n", "* This example also explores the features, so it serves as a good example of Exploratory Data Analysis and how that can guide the model creation and model selection process." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline\n", "import pandas as pd\n", "import numpy as np\n", "import pymc3 as pm\n", "import matplotlib.pyplot as plt\n", "import seaborn\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "from collections import OrderedDict\n", "from time import time\n", "\n", "from scipy.optimize import fmin_powell\n", "from scipy import integrate\n", "\n", "import theano as thno\n", "import theano.tensor as T\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def run_models(df, upper_order=5):\n", "    '''\n", "    Convenience function:\n", "    Fit a range of pymc3 models of increasing polynomial complexity.\n", "    Suggest limit to max order 5 since calculation time is exponential.\n", "    '''\n", "\n", "    models, traces = OrderedDict(), OrderedDict()\n", "\n", "    for k in range(1,upper_order+1):\n", "\n", "        nm = 'k{}'.format(k)\n", "        fml = create_poly_modelspec(k)\n", "\n", "        with pm.Model() as models[nm]:\n", "\n", "            print('\\nRunning: {}'.format(nm))\n", "            pm.glm.glm(fml, df, family=pm.glm.families.Normal())\n", "\n", "            start_MAP = pm.find_MAP(fmin=fmin_powell, disp=False)\n", "            traces[nm] = pm.sample(2000, start=start_MAP, step=pm.NUTS(), progressbar=True)\n", "\n", "    return models, traces\n", "\n", "def plot_traces(traces, retain=1000):\n", "    '''\n", "    Convenience function:\n", "    Plot traces with overlaid means and values\n", "    '''\n", "\n", "    ax = pm.traceplot(traces[-retain:], figsize=(12,len(traces.varnames)*1.5),\n", "        lines={k: v['mean'] for k, v in pm.df_summary(traces[-retain:]).iterrows()})\n", "\n", "    for i, mn in enumerate(pm.df_summary(traces[-retain:])['mean']):\n", "        ax[i,0].annotate('{:.2f}'.format(mn), xy=(mn,0), xycoords='data'\n", "                    ,xytext=(5,10), textcoords='offset points', rotation=90\n", "                    ,va='bottom', fontsize='large', color='#AA0022')\n", "\n", "def create_poly_modelspec(k=1):\n", "    '''\n", "    Convenience function:\n", "    Create a polynomial modelspec string for patsy\n", "    '''\n", "    return ('income ~ educ + hours + age ' + ' '.join(['+ np.power(age,{})'.format(j)\n", "                                     for j in range(2,k+1)])).strip()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [Adult Data Set](http://archive.ics.uci.edu/ml/datasets/Adult) is commonly used to benchmark machine learning algorithms. The goal is to use demographic features, or variables, to predict whether an individual makes more than \\\\$50,000 per year. 
The data set is almost 20 years old and therefore not ideal for determining the probability that I will make more than \\$50K, but it is a nice, simple data set that can be used to showcase a few benefits of using Bayesian logistic regression over its frequentist counterpart.\n", "\n", "\n", "My motivation for reproducing this piece of work was to learn how to use odds ratios in Bayesian regression." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "data = pd.read_csv(\"https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data\", header=None, names=['age', 'workclass', 'fnlwgt', \n", " 'education-categorical', 'educ', \n", " 'marital-status', 'occupation',\n", " 'relationship', 'race', 'sex', \n", " 'captial-gain', 'capital-loss', \n", " 'hours', 'native-country', \n", " 'income'])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
| | age | workclass | fnlwgt | education-categorical | educ | marital-status | occupation | relationship | race | sex | captial-gain | capital-loss | hours | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
| 5 | 37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 6 | 49 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0 | 0 | 16 | Jamaica | <=50K |
| 7 | 52 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States | >50K |
| 8 | 31 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States | >50K |
| 9 | 42 | Private | 159449 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178 | 0 | 40 | United-States | >50K |
| 10 | 37 | Private | 280464 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | 0 | 0 | 80 | United-States | >50K |
| 11 | 30 | State-gov | 141297 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | India | >50K |
| 12 | 23 | Private | 122272 | Bachelors | 13 | Never-married | Adm-clerical | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K |
| 13 | 32 | Private | 205019 | Assoc-acdm | 12 | Never-married | Sales | Not-in-family | Black | Male | 0 | 0 | 50 | United-States | <=50K |
| 14 | 40 | Private | 121772 | Assoc-voc | 11 | Married-civ-spouse | Craft-repair | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | ? | >50K |
| 15 | 34 | Private | 245487 | 7th-8th | 4 | Married-civ-spouse | Transport-moving | Husband | Amer-Indian-Eskimo | Male | 0 | 0 | 45 | Mexico | <=50K |
| 16 | 25 | Self-emp-not-inc | 176756 | HS-grad | 9 | Never-married | Farming-fishing | Own-child | White | Male | 0 | 0 | 35 | United-States | <=50K |
| 17 | 32 | Private | 186824 | HS-grad | 9 | Never-married | Machine-op-inspct | Unmarried | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 18 | 38 | Private | 28887 | 11th | 7 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 19 | 43 | Self-emp-not-inc | 292175 | Masters | 14 | Divorced | Exec-managerial | Unmarried | White | Female | 0 | 0 | 45 | United-States | >50K |
| 20 | 40 | Private | 193524 | Doctorate | 16 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 60 | United-States | >50K |
| 21 | 54 | Private | 302146 | HS-grad | 9 | Separated | Other-service | Unmarried | Black | Female | 0 | 0 | 20 | United-States | <=50K |
| 22 | 35 | Federal-gov | 76845 | 9th | 5 | Married-civ-spouse | Farming-fishing | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 23 | 43 | Private | 117037 | 11th | 7 | Married-civ-spouse | Transport-moving | Husband | White | Male | 0 | 2042 | 40 | United-States | <=50K |
| 24 | 59 | Private | 109015 | HS-grad | 9 | Divorced | Tech-support | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 25 | 56 | Local-gov | 216851 | Bachelors | 13 | Married-civ-spouse | Tech-support | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 26 | 19 | Private | 168294 | HS-grad | 9 | Never-married | Craft-repair | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 27 | 54 | ? | 180211 | Some-college | 10 | Married-civ-spouse | ? | Husband | Asian-Pac-Islander | Male | 0 | 0 | 60 | South | >50K |
| 28 | 39 | Private | 367260 | HS-grad | 9 | Divorced | Exec-managerial | Not-in-family | White | Male | 0 | 0 | 80 | United-States | <=50K |
| 29 | 49 | Private | 193366 | HS-grad | 9 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32531 | 30 | ? | 33811 | Bachelors | 13 | Never-married | ? | Not-in-family | Asian-Pac-Islander | Female | 0 | 0 | 99 | United-States | <=50K |
| 32532 | 34 | Private | 204461 | Doctorate | 16 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 60 | United-States | >50K |
| 32533 | 54 | Private | 337992 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | Asian-Pac-Islander | Male | 0 | 0 | 50 | Japan | >50K |
| 32534 | 37 | Private | 179137 | Some-college | 10 | Divorced | Adm-clerical | Unmarried | White | Female | 0 | 0 | 39 | United-States | <=50K |
| 32535 | 22 | Private | 325033 | 12th | 8 | Never-married | Protective-serv | Own-child | Black | Male | 0 | 0 | 35 | United-States | <=50K |
| 32536 | 34 | Private | 160216 | Bachelors | 13 | Never-married | Exec-managerial | Not-in-family | White | Female | 0 | 0 | 55 | United-States | >50K |
| 32537 | 30 | Private | 345898 | HS-grad | 9 | Never-married | Craft-repair | Not-in-family | Black | Male | 0 | 0 | 46 | United-States | <=50K |
| 32538 | 38 | Private | 139180 | Bachelors | 13 | Divorced | Prof-specialty | Unmarried | Black | Female | 15020 | 0 | 45 | United-States | >50K |
| 32539 | 71 | ? | 287372 | Doctorate | 16 | Married-civ-spouse | ? | Husband | White | Male | 0 | 0 | 10 | United-States | >50K |
| 32540 | 45 | State-gov | 252208 | HS-grad | 9 | Separated | Adm-clerical | Own-child | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32541 | 41 | ? | 202822 | HS-grad | 9 | Separated | ? | Not-in-family | Black | Female | 0 | 0 | 32 | United-States | <=50K |
| 32542 | 72 | ? | 129912 | HS-grad | 9 | Married-civ-spouse | ? | Husband | White | Male | 0 | 0 | 25 | United-States | <=50K |
| 32543 | 45 | Local-gov | 119199 | Assoc-acdm | 12 | Divorced | Prof-specialty | Unmarried | White | Female | 0 | 0 | 48 | United-States | <=50K |
| 32544 | 31 | Private | 199655 | Masters | 14 | Divorced | Other-service | Not-in-family | Other | Female | 0 | 0 | 30 | United-States | <=50K |
| 32545 | 39 | Local-gov | 111499 | Assoc-acdm | 12 | Married-civ-spouse | Adm-clerical | Wife | White | Female | 0 | 0 | 20 | United-States | >50K |
| 32546 | 37 | Private | 198216 | Assoc-acdm | 12 | Divorced | Tech-support | Not-in-family | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32547 | 43 | Private | 260761 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | Mexico | <=50K |
| 32548 | 65 | Self-emp-not-inc | 99359 | Prof-school | 15 | Never-married | Prof-specialty | Not-in-family | White | Male | 1086 | 0 | 60 | United-States | <=50K |
| 32549 | 43 | State-gov | 255835 | Some-college | 10 | Divorced | Adm-clerical | Other-relative | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32550 | 43 | Self-emp-not-inc | 27242 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 32551 | 32 | Private | 34066 | 10th | 6 | Married-civ-spouse | Handlers-cleaners | Husband | Amer-Indian-Eskimo | Male | 0 | 0 | 40 | United-States | <=50K |
| 32552 | 43 | Private | 84661 | Assoc-voc | 11 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 45 | United-States | <=50K |
| 32553 | 32 | Private | 116138 | Masters | 14 | Never-married | Tech-support | Not-in-family | Asian-Pac-Islander | Male | 0 | 0 | 11 | Taiwan | <=50K |
| 32554 | 53 | Private | 321865 | Masters | 14 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32555 | 22 | Private | 310152 | Some-college | 10 | Never-married | Protective-serv | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K |
| 32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | >50K |

32561 rows × 15 columns
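
The next table has 29170 rows rather than 32561. The index values it skips (for example 4, 6, 11, 14, 15 and 27) are the rows above whose `native-country` is not United-States, which is consistent with keeping only United-States rows. The filtering step itself is not shown here, so the following is only a minimal sketch of how such a filter could look. The toy frame copies three rows from the table above; the leading spaces in the string values are an assumption about how the raw CSV fields are parsed, and `toy`, `is_us` and `us_only` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for the full data set: rows 3, 4 and 5 of the table above.
# The leading spaces are an ASSUMPTION about the parsed CSV values.
toy = pd.DataFrame(
    {
        'age': [53, 28, 37],
        'native-country': [' United-States', ' Cuba', ' United-States'],
        'income': [' <=50K', ' <=50K', ' <=50K'],
    },
    index=[3, 4, 5],
)

# Strip surrounding whitespace before comparing, so the filter works
# whether or not the values carry a leading space.
is_us = toy['native-country'].str.strip() == 'United-States'

# Boolean indexing keeps the original index labels, so dropped rows
# show up as gaps in the index (here, label 4 disappears).
us_only = toy[is_us]
print(us_only.index.tolist())
```

Because boolean indexing preserves index labels, gaps in the index of the filtered frame identify which rows were dropped, which matches the pattern in the second table.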

| | age | workclass | fnlwgt | education-categorical | educ | marital-status | occupation | relationship | race | sex | captial-gain | capital-loss | hours | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 5 | 37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 7 | 52 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 45 | United-States | >50K |
| 8 | 31 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084 | 0 | 50 | United-States | >50K |
| 9 | 42 | Private | 159449 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178 | 0 | 40 | United-States | >50K |
| 10 | 37 | Private | 280464 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | Black | Male | 0 | 0 | 80 | United-States | >50K |
| 12 | 23 | Private | 122272 | Bachelors | 13 | Never-married | Adm-clerical | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K |
| 13 | 32 | Private | 205019 | Assoc-acdm | 12 | Never-married | Sales | Not-in-family | Black | Male | 0 | 0 | 50 | United-States | <=50K |
| 16 | 25 | Self-emp-not-inc | 176756 | HS-grad | 9 | Never-married | Farming-fishing | Own-child | White | Male | 0 | 0 | 35 | United-States | <=50K |
| 17 | 32 | Private | 186824 | HS-grad | 9 | Never-married | Machine-op-inspct | Unmarried | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 18 | 38 | Private | 28887 | 11th | 7 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 19 | 43 | Self-emp-not-inc | 292175 | Masters | 14 | Divorced | Exec-managerial | Unmarried | White | Female | 0 | 0 | 45 | United-States | >50K |
| 20 | 40 | Private | 193524 | Doctorate | 16 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 60 | United-States | >50K |
| 21 | 54 | Private | 302146 | HS-grad | 9 | Separated | Other-service | Unmarried | Black | Female | 0 | 0 | 20 | United-States | <=50K |
| 22 | 35 | Federal-gov | 76845 | 9th | 5 | Married-civ-spouse | Farming-fishing | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 23 | 43 | Private | 117037 | 11th | 7 | Married-civ-spouse | Transport-moving | Husband | White | Male | 0 | 2042 | 40 | United-States | <=50K |
| 24 | 59 | Private | 109015 | HS-grad | 9 | Divorced | Tech-support | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 25 | 56 | Local-gov | 216851 | Bachelors | 13 | Married-civ-spouse | Tech-support | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 26 | 19 | Private | 168294 | HS-grad | 9 | Never-married | Craft-repair | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 28 | 39 | Private | 367260 | HS-grad | 9 | Divorced | Exec-managerial | Not-in-family | White | Male | 0 | 0 | 80 | United-States | <=50K |
| 29 | 49 | Private | 193366 | HS-grad | 9 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 30 | 23 | Local-gov | 190709 | Assoc-acdm | 12 | Never-married | Protective-serv | Not-in-family | White | Male | 0 | 0 | 52 | United-States | <=50K |
| 31 | 20 | Private | 266015 | Some-college | 10 | Never-married | Sales | Own-child | Black | Male | 0 | 0 | 44 | United-States | <=50K |
| 32 | 45 | Private | 386940 | Bachelors | 13 | Divorced | Exec-managerial | Own-child | White | Male | 0 | 1408 | 40 | United-States | <=50K |
| 33 | 30 | Federal-gov | 59951 | Some-college | 10 | Married-civ-spouse | Adm-clerical | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 34 | 22 | State-gov | 311512 | Some-college | 10 | Married-civ-spouse | Other-service | Husband | Black | Male | 0 | 0 | 15 | United-States | <=50K |
| 36 | 21 | Private | 197200 | Some-college | 10 | Never-married | Machine-op-inspct | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32528 | 31 | Private | 292592 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32529 | 29 | Private | 125976 | HS-grad | 9 | Separated | Sales | Unmarried | White | Female | 0 | 0 | 35 | United-States | <=50K |
| 32530 | 35 | ? | 320084 | Bachelors | 13 | Married-civ-spouse | ? | Wife | White | Female | 0 | 0 | 55 | United-States | >50K |
| 32531 | 30 | ? | 33811 | Bachelors | 13 | Never-married | ? | Not-in-family | Asian-Pac-Islander | Female | 0 | 0 | 99 | United-States | <=50K |
| 32532 | 34 | Private | 204461 | Doctorate | 16 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 0 | 0 | 60 | United-States | >50K |
| 32534 | 37 | Private | 179137 | Some-college | 10 | Divorced | Adm-clerical | Unmarried | White | Female | 0 | 0 | 39 | United-States | <=50K |
| 32535 | 22 | Private | 325033 | 12th | 8 | Never-married | Protective-serv | Own-child | Black | Male | 0 | 0 | 35 | United-States | <=50K |
| 32536 | 34 | Private | 160216 | Bachelors | 13 | Never-married | Exec-managerial | Not-in-family | White | Female | 0 | 0 | 55 | United-States | >50K |
| 32537 | 30 | Private | 345898 | HS-grad | 9 | Never-married | Craft-repair | Not-in-family | Black | Male | 0 | 0 | 46 | United-States | <=50K |
| 32538 | 38 | Private | 139180 | Bachelors | 13 | Divorced | Prof-specialty | Unmarried | Black | Female | 15020 | 0 | 45 | United-States | >50K |
| 32539 | 71 | ? | 287372 | Doctorate | 16 | Married-civ-spouse | ? | Husband | White | Male | 0 | 0 | 10 | United-States | >50K |
| 32540 | 45 | State-gov | 252208 | HS-grad | 9 | Separated | Adm-clerical | Own-child | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32541 | 41 | ? | 202822 | HS-grad | 9 | Separated | ? | Not-in-family | Black | Female | 0 | 0 | 32 | United-States | <=50K |
| 32542 | 72 | ? | 129912 | HS-grad | 9 | Married-civ-spouse | ? | Husband | White | Male | 0 | 0 | 25 | United-States | <=50K |
| 32543 | 45 | Local-gov | 119199 | Assoc-acdm | 12 | Divorced | Prof-specialty | Unmarried | White | Female | 0 | 0 | 48 | United-States | <=50K |
| 32544 | 31 | Private | 199655 | Masters | 14 | Divorced | Other-service | Not-in-family | Other | Female | 0 | 0 | 30 | United-States | <=50K |
| 32545 | 39 | Local-gov | 111499 | Assoc-acdm | 12 | Married-civ-spouse | Adm-clerical | Wife | White | Female | 0 | 0 | 20 | United-States | >50K |
| 32546 | 37 | Private | 198216 | Assoc-acdm | 12 | Divorced | Tech-support | Not-in-family | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32548 | 65 | Self-emp-not-inc | 99359 | Prof-school | 15 | Never-married | Prof-specialty | Not-in-family | White | Male | 1086 | 0 | 60 | United-States | <=50K |
| 32549 | 43 | State-gov | 255835 | Some-college | 10 | Divorced | Adm-clerical | Other-relative | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32550 | 43 | Self-emp-not-inc | 27242 | Some-college | 10 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 32551 | 32 | Private | 34066 | 10th | 6 | Married-civ-spouse | Handlers-cleaners | Husband | Amer-Indian-Eskimo | Male | 0 | 0 | 40 | United-States | <=50K |
| 32552 | 43 | Private | 84661 | Assoc-voc | 11 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 45 | United-States | <=50K |
| 32554 | 53 | Private | 321865 | Masters | 14 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32555 | 22 | Private | 310152 | Some-college | 10 | Never-married | Protective-serv | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K |
| 32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | >50K |

29170 rows × 15 columns
\n", "