{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Does Trivers-Willard apply to people?\n",
    "\n",
    "This notebook contains a \"one-day paper\", my attempt to pose a research question, answer it, and publish the results in one work day.\n",
    "\n",
    "Copyright 2016 Allen B. Downey\n",
    "\n",
    "MIT License: https://opensource.org/licenses/MIT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function, division\n",
    "\n",
    "import thinkstats2\n",
    "import thinkplot\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "import statsmodels.formula.api as smf\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Trivers-Willard\n",
    "\n",
    "[According to Wikipedia](https://en.wikipedia.org/wiki/Trivers%E2%80%93Willard_hypothesis), the Trivers-Willard hypothesis:\n",
    "\n",
    ">\"...suggests that female mammals are able to adjust offspring sex ratio in response to their maternal condition. For example, it may predict greater parental investment in males by parents in 'good conditions' and greater investment in females by parents in 'poor conditions' (relative to parents in good condition).\"\n",
    "\n",
    "For humans, the hypothesis suggests that people with relatively high social status might be more likely to have boys.  Some studies have shown evidence for this hypothesis, but based on my very casual survey, it is not persuasive.\n",
    "\n",
    "To test whether the T-W hypothesis holds up in humans, I downloaded [birth data for the nearly 4 million babies born in the U.S. in 2014](http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm#Births).\n",
    "\n",
    "I selected variables that seemed likely to be related to social status and used logistic regression to identify variables associated with sex ratio."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Summary of results**\n",
    "\n",
    "1.  Running regression with one variable at a time, many of the variables have a statistically significant effect on sex ratio, with the sign of the effect generally in the direction predicted by T-W.\n",
    "\n",
    "2.  However, many of the variables are also correlated with race.  If we control for either the mother's race or the father's race, or both, most other variables have no additional predictive power.\n",
    "\n",
    "3.  Contrary to other reports, the age of the parents seems to have no predictive power.\n",
    "\n",
    "4.  Strangely, the variable that shows the strongest and most consistent relationship with sex ratio is the number of prenatal visits.  Although it seems obvious that prenatal visits are a proxy for quality of health care and general socioeconomic status, the sign of the effect is opposite what T-W predicts; that is, more prenatal visits is a strong predictor of lower sex ratio (more girls).\n",
    "\n",
    "Following convention, I report sex ratio in terms of boys per 100 girls.  The overall sex ratio at birth is about 105; that is, 105 boys are born for every 100 girls."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data cleaning\n",
    "\n",
    "Here's how I loaded the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "names = ['year', 'mager9', 'restatus', 'mbrace', 'mhisp_r',\n",
    "        'mar_p', 'dmar', 'meduc', 'fagerrec11', 'fbrace', 'fhisp_r', 'feduc', \n",
    "        'lbo_rec', 'previs_rec', 'wic', 'height', 'bmi_r', 'pay_rec', 'sex']\n",
    "colspecs = [(15, 18),\n",
    "            (93, 93),\n",
    "            (138, 138),\n",
    "            (143, 143),\n",
    "            (148, 148),\n",
    "            (152, 152),\n",
    "            (153, 153),\n",
    "            (155, 155),\n",
    "            (186, 187),\n",
    "            (191, 191),\n",
    "            (195, 195),\n",
    "            (197, 197),\n",
    "            (212, 212),\n",
    "            (272, 273),\n",
    "            (281, 281),\n",
    "            (555, 556),\n",
    "            (533, 533),\n",
    "            (413, 413),\n",
    "            (436, 436),\n",
    "           ]\n",
    "\n",
    "colspecs = [(start-1, end) for start, end in colspecs]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>mager9</th>\n",
       "      <th>restatus</th>\n",
       "      <th>mbrace</th>\n",
       "      <th>mhisp_r</th>\n",
       "      <th>mar_p</th>\n",
       "      <th>dmar</th>\n",
       "      <th>meduc</th>\n",
       "      <th>fagerrec11</th>\n",
       "      <th>fbrace</th>\n",
       "      <th>fhisp_r</th>\n",
       "      <th>feduc</th>\n",
       "      <th>lbo_rec</th>\n",
       "      <th>previs_rec</th>\n",
       "      <th>wic</th>\n",
       "      <th>height</th>\n",
       "      <th>bmi_r</th>\n",
       "      <th>pay_rec</th>\n",
       "      <th>sex</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2012</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2012</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2012</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2012</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>9</td>\n",
       "      <td>7</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2012</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>7</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year  mager9  restatus  mbrace  mhisp_r mar_p  dmar  meduc  fagerrec11  \\\n",
       "0  2012       6         1       1        0   NaN     1    NaN           5   \n",
       "1  2012       3         1       3        0   NaN     2    NaN           4   \n",
       "2  2012       2         1       2        0   NaN     2    NaN           3   \n",
       "3  2012       3         1       1        0   NaN     1    NaN           3   \n",
       "4  2012       4         1       4        0   NaN     1    NaN           4   \n",
       "\n",
       "   fbrace  fhisp_r  feduc  lbo_rec  previs_rec  wic  height  bmi_r  pay_rec  \\\n",
       "0       1        0    NaN        2           6  NaN     NaN    NaN      NaN   \n",
       "1       3        0    NaN        2           5  NaN     NaN    NaN      NaN   \n",
       "2       2        0    NaN        1           7  NaN     NaN    NaN      NaN   \n",
       "3       1        0    NaN        9           7  NaN     NaN    NaN      NaN   \n",
       "4       1        0    NaN        3           7  NaN     NaN    NaN      NaN   \n",
       "\n",
       "  sex  \n",
       "0   M  \n",
       "1   F  \n",
       "2   M  \n",
       "3   M  \n",
       "4   F  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "filename = 'Nat2012PublicUS.r20131217.gz'\n",
    "#df = pd.read_fwf(filename, compression='gzip', header=None, names=names, colspecs=colspecs)\n",
    "#df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/downey/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:3066: PerformanceWarning: \n",
      "your performance may suffer as PyTables will pickle object types that it cannot\n",
      "map directly to c-types [inferred_type->mixed,key->block2_values] [items->['mar_p', 'wic', 'sex']]\n",
      "\n",
      "  exec(code_obj, self.user_global_ns, self.user_ns)\n"
     ]
    }
   ],
   "source": [
    "# store the dataframe for faster loading\n",
    "\n",
    "#store = pd.HDFStore('store.h5')\n",
    "#store['births2013'] = df\n",
    "#store.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# load the dataframe\n",
    "\n",
    "store = pd.HDFStore('store.h5')\n",
    "df = store['births2013']\n",
    "store.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def series_to_ratio(series):\n",
    "    \"\"\"Takes a boolean series and computes sex ratio.\n",
    "    \"\"\"\n",
    "    boys = np.mean(series)\n",
    "    return np.round(100 * boys / (1-boys)).astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I have to recode sex as `0` or `1` to make `logit` happy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1935228\n",
       "1    2025568\n",
       "Name: boy, dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['boy'] = (df.sex=='M').astype(int)\n",
    "df.boy.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All births are from 2014."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2012    3960796\n",
       "Name: year, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.year.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's age:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1       3676\n",
       "2     305837\n",
       "3     918221\n",
       "4    1126139\n",
       "5    1015784\n",
       "6     473533\n",
       "7     109807\n",
       "8       7187\n",
       "9        612\n",
       "Name: mager9, dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mager9.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mager9</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>112</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "mager9     \n",
       "1       112\n",
       "2       106\n",
       "3       104\n",
       "4       105\n",
       "5       105\n",
       "6       105\n",
       "7       105\n",
       "8       100\n",
       "9       112"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mager9'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mager9.isnull().mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.078144140723228367, 0.029692516352773535)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['youngm'] = df.mager9<=2\n",
    "df['oldm'] = df.mager9>=7\n",
    "df.youngm.mean(), df.oldm.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Residence status (1=resident)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    2874513\n",
       "2     993222\n",
       "3      85106\n",
       "4       7955\n",
       "Name: restatus, dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.restatus.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>restatus</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          boy\n",
       "restatus     \n",
       "1         105\n",
       "2         105\n",
       "3         105\n",
       "4         108"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'restatus'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's race (1=White, 2=Black, 3=American Indian or Alaskan Native, 4=Asian or Pacific Islander)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    3007229\n",
       "2     634411\n",
       "3      46105\n",
       "4     273051\n",
       "Name: mbrace, dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mbrace.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mbrace</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "mbrace     \n",
       "1       105\n",
       "2       103\n",
       "3       104\n",
       "4       106"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mbrace'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's Hispanic origin (0=Non-Hispanic)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    3015510\n",
       "1     562250\n",
       "2      67192\n",
       "3      17400\n",
       "4     131955\n",
       "5     135597\n",
       "Name: mhisp_r, dtype: int64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mhisp_r.replace([9], np.nan, inplace=True)\n",
    "df.mhisp_r.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def copy_null(df, oldvar, newvar):\n",
    "    df.loc[df[oldvar].isnull(), newvar] = np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.0077994423343186571, 0.23267591269405055)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['mhisp'] = df.mhisp_r > 0\n",
    "copy_null(df, 'mhisp_r', 'mhisp')\n",
    "df.mhisp.isnull().mean(), df.mhisp.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mhisp</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "mhisp     \n",
       "0      105\n",
       "1      104"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mhisp'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Marital status (1=Married)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    2349102\n",
       "2    1611694\n",
       "Name: dmar, dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dmar.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dmar</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      boy\n",
       "dmar     \n",
       "1     105\n",
       "2     104"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'dmar'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Paternity acknowledged, if unmarried (Y=yes, N=no, X=not applicable, U=unknown).\n",
    "\n",
    "I recode X (not applicable because married) as Y (paternity acknowledged)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "N     430123\n",
       "Y    3058398\n",
       "Name: mar_p, dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mar_p.replace(['U'], np.nan, inplace=True)\n",
    "df.mar_p.replace(['X'], 'Y', inplace=True)\n",
    "df.mar_p.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mar_p</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>N</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Y</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "mar_p     \n",
       "N      103\n",
       "Y      105"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mar_p'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's education level"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    144045\n",
       "2    443007\n",
       "3    858548\n",
       "4    732444\n",
       "5    266066\n",
       "6    644497\n",
       "7    282351\n",
       "8     81074\n",
       "Name: meduc, dtype: int64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.meduc.replace([9], np.nan, inplace=True)\n",
    "df.meduc.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>meduc</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "meduc     \n",
       "1      103\n",
       "2      104\n",
       "3      105\n",
       "4      105\n",
       "5      105\n",
       "6      105\n",
       "7      105\n",
       "8      107"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'meduc'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.12844993784077746, 0.17005983722051243)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['lowed'] = df.meduc <= 2\n",
    "copy_null(df, 'meduc', 'lowed')\n",
    "df.lowed.isnull().mean(), df.lowed.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's age, in 10 ranges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1        422\n",
       "2     104428\n",
       "3     527157\n",
       "4     871442\n",
       "5     977564\n",
       "6     591733\n",
       "7     257619\n",
       "8      84016\n",
       "9      26361\n",
       "10     11389\n",
       "Name: fagerrec11, dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fagerrec11.replace([11], np.nan, inplace=True)\n",
    "df.fagerrec11.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fagerrec11</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>101</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            boy\n",
       "fagerrec11     \n",
       "1           101\n",
       "2           106\n",
       "3           105\n",
       "4           105\n",
       "5           105\n",
       "6           105\n",
       "7           105\n",
       "8           104\n",
       "9           104\n",
       "10          107"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'fagerrec11'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.12842494286502007, 0.03037254379975731)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['youngf'] = df.fagerrec11<=2\n",
    "copy_null(df, 'fagerrec11', 'youngf')\n",
    "df.youngf.isnull().mean(), df.youngf.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.12842494286502007, 0.03527270546801382)"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['oldf'] = df.fagerrec11>=8\n",
    "copy_null(df, 'fagerrec11', 'oldf')\n",
    "df.oldf.isnull().mean(), df.oldf.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's race"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    2475018\n",
       "2     469930\n",
       "3      35175\n",
       "4     227463\n",
       "Name: fbrace, dtype: int64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fbrace.replace([9], np.nan, inplace=True)\n",
    "df.fbrace.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fbrace</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "fbrace     \n",
       "1       105\n",
       "2       103\n",
       "3       105\n",
       "4       107"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'fbrace'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's Hispanic origin (0=non-Hispanic, other values indicate country of origin)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2603738\n",
       "1     500926\n",
       "2      57417\n",
       "3      16953\n",
       "4     105376\n",
       "5     116056\n",
       "Name: fhisp_r, dtype: int64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fhisp_r.replace([9], np.nan, inplace=True)\n",
    "df.fhisp_r.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.14146903804184816, 0.23429965187124352)"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['fhisp'] = df.fhisp_r > 0\n",
    "copy_null(df, 'fhisp_r', 'fhisp')\n",
    "df.fhisp.isnull().mean(), df.fhisp.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fhisp</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "fhisp     \n",
       "0      105\n",
       "1      104"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'fhisp'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's education level"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    142003\n",
       "2    336801\n",
       "3    852923\n",
       "4    574580\n",
       "5    201888\n",
       "6    544487\n",
       "7    210268\n",
       "8     97452\n",
       "Name: feduc, dtype: int64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.feduc.replace([9], np.nan, inplace=True)\n",
    "df.feduc.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>feduc</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "feduc     \n",
       "1      103\n",
       "2      105\n",
       "3      105\n",
       "4      105\n",
       "5      105\n",
       "6      105\n",
       "7      106\n",
       "8      105"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'feduc'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Live birth order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    1574534\n",
       "2    1248053\n",
       "3     651817\n",
       "4     276179\n",
       "5     106197\n",
       "6      43907\n",
       "7      19899\n",
       "8      20268\n",
       "Name: lbo_rec, dtype: int64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.lbo_rec.replace([9], np.nan, inplace=True)\n",
    "df.lbo_rec.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>lbo_rec</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>101</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         boy\n",
       "lbo_rec     \n",
       "1        105\n",
       "2        105\n",
       "3        104\n",
       "4        103\n",
       "5        104\n",
       "6        101\n",
       "7        106\n",
       "8        103"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'lbo_rec'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.0050348465308488492, 0.04828166686713083)"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['highbo'] = df.lbo_rec >= 5\n",
    "copy_null(df, 'lbo_rec', 'highbo')\n",
    "df.highbo.isnull().mean(), df.highbo.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Number of prenatal visits, in 11 ranges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1       53862\n",
       "2       39409\n",
       "3       90791\n",
       "4      191909\n",
       "5      361056\n",
       "6      809787\n",
       "7     1023277\n",
       "8      659674\n",
       "9      385390\n",
       "10      98941\n",
       "11     124582\n",
       "Name: previs_rec, dtype: int64"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.previs_rec.replace([12], np.nan, inplace=True)\n",
    "df.previs_rec.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# The mean category is close to 7, so subtract 7 to roughly center the variable\n",
    "df.previs_rec.mean()\n",
    "df['previs'] = df.previs_rec - 7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>previs</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>-6</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-5</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-4</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-3</th>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-2</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-1</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "previs     \n",
       "-6      106\n",
       "-5      107\n",
       "-4      107\n",
       "-3      108\n",
       "-2      107\n",
       "-1      106\n",
       " 0      105\n",
       " 1      103\n",
       " 2      102\n",
       " 3      100\n",
       " 4      102"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'previs'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.030831681308504656, 0.014031393099395157)"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['no_previs'] = df.previs_rec <= 1\n",
    "copy_null(df, 'previs_rec', 'no_previs')\n",
    "df.no_previs.isnull().mean(), df.no_previs.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whether the mother received WIC (a nutrition assistance program for women, infants, and children)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "N    1820030\n",
       "Y    1591601\n",
       "Name: wic, dtype: int64"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.wic.replace(['U'], np.nan, inplace=True)\n",
    "df.wic.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>wic</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>N</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Y</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     boy\n",
       "wic     \n",
       "N    105\n",
       "Y    104"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'wic'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's height in inches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30        14\n",
       "31         3\n",
       "32         2\n",
       "34         1\n",
       "36        17\n",
       "37         5\n",
       "38         9\n",
       "39         4\n",
       "40        13\n",
       "41        18\n",
       "42         8\n",
       "43         8\n",
       "44         6\n",
       "45        15\n",
       "46         9\n",
       "47        21\n",
       "48       732\n",
       "49       505\n",
       "50       335\n",
       "51       414\n",
       "52       480\n",
       "53      1384\n",
       "54      1434\n",
       "55      2561\n",
       "56      6587\n",
       "57     17396\n",
       "58     19343\n",
       "59     71557\n",
       "60    190472\n",
       "61    240815\n",
       "62    424926\n",
       "63    442238\n",
       "64    505897\n",
       "65    404563\n",
       "66    390878\n",
       "67    303110\n",
       "68    174629\n",
       "69    116518\n",
       "70     56687\n",
       "71     30085\n",
       "72     14269\n",
       "73      4971\n",
       "74      2381\n",
       "75       895\n",
       "76       526\n",
       "77       584\n",
       "78      1011\n",
       "Name: height, dtype: int64"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.height.replace([99], np.nan, inplace=True)\n",
    "df.height.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.13443257365438666, 0.03584275286903034)"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['mshort'] = df.height<60\n",
    "copy_null(df, 'height', 'mshort')\n",
    "df.mshort.isnull().mean(), df.mshort.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.13443257365438666, 0.03249652309458583)"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['mtall'] = df.height>=70\n",
    "copy_null(df, 'height', 'mtall')\n",
    "df.mtall.isnull().mean(), df.mtall.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mshort</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "mshort     \n",
       "0       105\n",
       "1       103"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mshort'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mtall</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "mtall     \n",
       "0      105\n",
       "1      104"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mtall'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's BMI in 6 ranges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1     129937\n",
       "2    1573715\n",
       "3     849357\n",
       "4     442695\n",
       "5     206615\n",
       "6     141411\n",
       "Name: bmi_r, dtype: int64"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.bmi_r.replace([9], np.nan, inplace=True)\n",
    "df.bmi_r.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bmi_r</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "bmi_r     \n",
       "1      104\n",
       "2      105\n",
       "3      105\n",
       "4      104\n",
       "5      104\n",
       "6      104"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'bmi_r'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.15579343142136076, 0.23647872286338908)"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['obese'] = df.bmi_r >= 4\n",
    "copy_null(df, 'bmi_r', 'obese')\n",
    "df.obese.isnull().mean(), df.obese.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Payment method (1=Medicaid, 2=Private insurance, 3=Self pay, 4=Other)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    1497162\n",
       "2    1628336\n",
       "3     147475\n",
       "4     174821\n",
       "Name: pay_rec, dtype: int64"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pay_rec.replace([9], np.nan, inplace=True)\n",
    "df.pay_rec.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pay_rec</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         boy\n",
       "pay_rec     \n",
       "1        104\n",
       "2        105\n",
       "3        105\n",
       "4        105"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'pay_rec'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sex of baby"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "F    1935228\n",
       "M    2025568\n",
       "Name: sex, dtype: int64"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sex.value_counts().sort_index()"
   ]
  },
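  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough check on these counts, assuming the ratio is expressed as boys per 100 girls (as the group tables above appear to do), the overall ratio works out to:\n",
    "\n",
    "$$100 \\times \\frac{2025568}{1935228} \\approx 104.7$$\n",
    "\n",
    "so the values near 105 in most of the tables are close to the population-wide baseline."
   ]
  },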
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Regression models\n",
    "\n",
    "Here are some functions I'll use to interpret the results of logistic regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def logodds_to_ratio(logodds):\n",
    "    \"\"\"Convert from log odds to a sex ratio (boys per 100 girls).\"\"\"\n",
    "    odds = np.exp(logodds)\n",
    "    return 100 * odds\n",
    "\n",
    "def summarize(results):\n",
    "    \"\"\"Summarize parameters in terms of birth ratio.\"\"\"\n",
    "    inter_or = results.params['Intercept']\n",
    "    inter_rat = logodds_to_ratio(inter_or)\n",
    "    \n",
    "    for value, lor in results.params.items():\n",
    "        if value=='Intercept':\n",
    "            continue\n",
    "        \n",
    "        rat = logodds_to_ratio(inter_or + lor)\n",
    "        code = '*' if results.pvalues[value] < 0.05 else ' '\n",
    "        \n",
    "        print('%-20s   %0.1f   %0.1f' % (value, inter_rat, rat), code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now I'll run models with each variable, one at a time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's age seems to have no predictive value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692887\n",
      "         Iterations 3\n",
      "mager9                 104.9   104.8  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3960796</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3960794</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.778e-08</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:54:16</td>     <th>  Log-Likelihood:    </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.5733</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0475</td> <td>    0.004</td> <td>   13.358</td> <td> 0.000</td> <td>    0.041     0.055</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mager9</th>    <td>   -0.0005</td> <td>    0.001</td> <td>   -0.563</td> <td> 0.573</td> <td>   -0.002     0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3960796\n",
       "Model:                          Logit   Df Residuals:                  3960794\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               5.778e-08\n",
       "Time:                        14:54:16   Log-Likelihood:            -2.7444e+06\n",
       "converged:                       True   LL-Null:                   -2.7444e+06\n",
       "                                        LLR p-value:                    0.5733\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0475      0.004     13.358      0.000         0.041     0.055\n",
       "mager9        -0.0005      0.001     -0.563      0.573        -0.002     0.001\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ mager9', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The estimated ratio for young mothers is higher, and the ratio for older mothers is lower. The difference for young mothers is marginally significant (p = 0.011); the difference for older mothers is not (p = 0.739)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692886\n",
      "         Iterations 3\n",
      "youngm[T.True]         104.6   105.6 *\n",
      "oldm[T.True]           104.6   104.4  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3960796</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3960793</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.205e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:54:22</td>     <th>  Log-Likelihood:    </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.03667</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "         <td></td>           <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>      <td>    0.0449</td> <td>    0.001</td> <td>   42.231</td> <td> 0.000</td> <td>    0.043     0.047</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>youngm[T.True]</th> <td>    0.0095</td> <td>    0.004</td> <td>    2.529</td> <td> 0.011</td> <td>    0.002     0.017</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>oldm[T.True]</th>   <td>   -0.0020</td> <td>    0.006</td> <td>   -0.334</td> <td> 0.739</td> <td>   -0.014     0.010</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3960796\n",
       "Model:                          Logit   Df Residuals:                  3960793\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.205e-06\n",
       "Time:                        14:54:22   Log-Likelihood:            -2.7444e+06\n",
       "converged:                       True   LL-Null:                   -2.7444e+06\n",
       "                                        LLR p-value:                   0.03667\n",
       "==================================================================================\n",
       "                     coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "----------------------------------------------------------------------------------\n",
       "Intercept          0.0449      0.001     42.231      0.000         0.043     0.047\n",
       "youngm[T.True]     0.0095      0.004      2.529      0.011         0.002     0.017\n",
       "oldm[T.True]      -0.0020      0.006     -0.334      0.739        -0.014     0.010\n",
       "==================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ youngm + oldm', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
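  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check on how to read these summaries, the cell below converts the fitted log-odds into sex ratios (boys per 100 girls). This is a minimal sketch: it assumes the two columns printed by `summarize` are `100 * exp(log-odds)` for the baseline and for the group in question, and the helper name `sex_ratio` is hypothetical, not part of the notebook. The coefficients are copied by hand from the `boy ~ youngm + oldm` table above, so the results are only as precise as the rounded values shown there."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def sex_ratio(log_odds):\n",
    "    # Convert the log-odds of a boy into boys per 100 girls.\n",
    "    return 100 * np.exp(log_odds)\n",
    "\n",
    "# Rounded coefficients copied from the summary table above (illustrative).\n",
    "intercept, youngm, oldm = 0.0449, 0.0095, -0.0020\n",
    "\n",
    "# Baseline group: neither youngm nor oldm is true.\n",
    "print('baseline', round(sex_ratio(intercept), 1))\n",
    "# Each indicator shifts the log-odds by its coefficient.\n",
    "print('young', round(sex_ratio(intercept + youngm), 1))\n",
    "print('old', round(sex_ratio(intercept + oldm), 1))"
   ]
  },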
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Residence status does not seem to have predictive value; none of its categories differs significantly from the baseline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692887\n",
      "         Iterations 3\n",
      "C(restatus)[T.2]       104.6   104.7  \n",
      "C(restatus)[T.3]       104.6   105.4  \n",
      "C(restatus)[T.4]       104.6   108.2  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3960796</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3960792</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>6.393e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:54:48</td>     <th>  Log-Likelihood:    </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.3196</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0452</td> <td>    0.001</td> <td>   38.300</td> <td> 0.000</td> <td>    0.043     0.048</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(restatus)[T.2]</th> <td>    0.0008</td> <td>    0.002</td> <td>    0.338</td> <td> 0.735</td> <td>   -0.004     0.005</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(restatus)[T.3]</th> <td>    0.0078</td> <td>    0.007</td> <td>    1.126</td> <td> 0.260</td> <td>   -0.006     0.021</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(restatus)[T.4]</th> <td>    0.0335</td> <td>    0.022</td> <td>    1.493</td> <td> 0.136</td> <td>   -0.011     0.078</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3960796\n",
       "Model:                          Logit   Df Residuals:                  3960792\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               6.393e-07\n",
       "Time:                        14:54:48   Log-Likelihood:            -2.7444e+06\n",
       "converged:                       True   LL-Null:                   -2.7444e+06\n",
       "                                        LLR p-value:                    0.3196\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0452      0.001     38.300      0.000         0.043     0.048\n",
       "C(restatus)[T.2]     0.0008      0.002      0.338      0.735        -0.004     0.005\n",
       "C(restatus)[T.3]     0.0078      0.007      1.126      0.260        -0.006     0.021\n",
       "C(restatus)[T.4]     0.0335      0.022      1.493      0.136        -0.011     0.078\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(restatus)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
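    "Mother's race seems to have predictive value.  Relative to white mothers, black mothers have more girls and Asian mothers have more boys.  The estimate for Native American mothers also points toward more girls, but it is not statistically significant."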
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692881\n",
      "         Iterations 3\n",
      "C(mbrace)[T.2]         104.8   103.3 *\n",
      "C(mbrace)[T.3]         104.8   104.0  \n",
      "C(mbrace)[T.4]         104.8   106.3 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3960796</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3960792</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.640e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:55:15</td>     <th>  Log-Likelihood:    </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>2.829e-10</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "         <td></td>           <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>      <td>    0.0471</td> <td>    0.001</td> <td>   40.838</td> <td> 0.000</td> <td>    0.045     0.049</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.2]</th> <td>   -0.0149</td> <td>    0.003</td> <td>   -5.382</td> <td> 0.000</td> <td>   -0.020    -0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.3]</th> <td>   -0.0075</td> <td>    0.009</td> <td>   -0.799</td> <td> 0.424</td> <td>   -0.026     0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.4]</th> <td>    0.0143</td> <td>    0.004</td> <td>    3.567</td> <td> 0.000</td> <td>    0.006     0.022</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3960796\n",
       "Model:                          Logit   Df Residuals:                  3960792\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               8.640e-06\n",
       "Time:                        14:55:15   Log-Likelihood:            -2.7444e+06\n",
       "converged:                       True   LL-Null:                   -2.7444e+06\n",
       "                                        LLR p-value:                 2.829e-10\n",
       "==================================================================================\n",
       "                     coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "----------------------------------------------------------------------------------\n",
       "Intercept          0.0471      0.001     40.838      0.000         0.045     0.049\n",
       "C(mbrace)[T.2]    -0.0149      0.003     -5.382      0.000        -0.020    -0.009\n",
       "C(mbrace)[T.3]    -0.0075      0.009     -0.799      0.424        -0.026     0.011\n",
       "C(mbrace)[T.4]     0.0143      0.004      3.567      0.000         0.006     0.022\n",
       "==================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(mbrace)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hispanic mothers have more girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692884\n",
      "         Iterations 3\n",
      "mhisp                  105.0   103.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3929904</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3929902</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.225e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:55:20</td>     <th>  Log-Likelihood:    </th> <td>-2.7230e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7230e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>9.580e-08</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0485</td> <td>    0.001</td> <td>   42.133</td> <td> 0.000</td> <td>    0.046     0.051</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mhisp</th>     <td>   -0.0127</td> <td>    0.002</td> <td>   -5.335</td> <td> 0.000</td> <td>   -0.017    -0.008</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3929904\n",
       "Model:                          Logit   Df Residuals:                  3929902\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               5.225e-06\n",
       "Time:                        14:55:20   Log-Likelihood:            -2.7230e+06\n",
       "converged:                       True   LL-Null:                   -2.7230e+06\n",
       "                                        LLR p-value:                 9.580e-08\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0485      0.001     42.133      0.000         0.046     0.051\n",
       "mhisp         -0.0127      0.002     -5.335      0.000        -0.017    -0.008\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ mhisp', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the mother is married, or unmarried with paternity acknowledged, the sex ratio is higher (more boys)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692875\n",
      "         Iterations 3\n",
      "C(mar_p)[T.Y]          103.4   104.9 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3488521</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3488519</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.062e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:55:45</td>     <th>  Log-Likelihood:    </th> <td>-2.4171e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.4171e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>9.370e-06</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "        <td></td>           <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>     <td>    0.0338</td> <td>    0.003</td> <td>   11.071</td> <td> 0.000</td> <td>    0.028     0.040</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mar_p)[T.Y]</th> <td>    0.0144</td> <td>    0.003</td> <td>    4.431</td> <td> 0.000</td> <td>    0.008     0.021</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3488521\n",
       "Model:                          Logit   Df Residuals:                  3488519\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               4.062e-06\n",
       "Time:                        14:55:45   Log-Likelihood:            -2.4171e+06\n",
       "converged:                       True   LL-Null:                   -2.4171e+06\n",
       "                                        LLR p-value:                 9.370e-06\n",
       "=================================================================================\n",
       "                    coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "---------------------------------------------------------------------------------\n",
       "Intercept         0.0338      0.003     11.071      0.000         0.028     0.040\n",
       "C(mar_p)[T.Y]     0.0144      0.003      4.431      0.000         0.008     0.021\n",
       "=================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(mar_p)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Being unmarried predicts more girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692885\n",
      "         Iterations 3\n",
      "C(dmar)[T.2]           105.0   104.2 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3960796</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3960794</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.561e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:56:11</td>     <th>  Log-Likelihood:    </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7444e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.0001776</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "        <td></td>          <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>    <td>    0.0487</td> <td>    0.001</td> <td>   37.345</td> <td> 0.000</td> <td>    0.046     0.051</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(dmar)[T.2]</th> <td>   -0.0077</td> <td>    0.002</td> <td>   -3.749</td> <td> 0.000</td> <td>   -0.012    -0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3960796\n",
       "Model:                          Logit   Df Residuals:                  3960794\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               2.561e-06\n",
       "Time:                        14:56:11   Log-Likelihood:            -2.7444e+06\n",
       "converged:                       True   LL-Null:                   -2.7444e+06\n",
       "                                        LLR p-value:                 0.0001776\n",
       "================================================================================\n",
       "                   coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "--------------------------------------------------------------------------------\n",
       "Intercept        0.0487      0.001     37.345      0.000         0.046     0.051\n",
       "C(dmar)[T.2]    -0.0077      0.002     -3.749      0.000        -0.012    -0.004\n",
       "================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(dmar)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each level of mother's education predicts a small increase in the probability of a boy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692874\n",
      "         Iterations 3\n",
      "meduc                  103.4   103.7 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3452032</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3452030</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.742e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:56:15</td>     <th>  Log-Likelihood:    </th> <td>-2.3918e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3918e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.599e-07</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0330</td> <td>    0.003</td> <td>   11.862</td> <td> 0.000</td> <td>    0.028     0.038</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>meduc</th>     <td>    0.0032</td> <td>    0.001</td> <td>    5.241</td> <td> 0.000</td> <td>    0.002     0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3452032\n",
       "Model:                          Logit   Df Residuals:                  3452030\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               5.742e-06\n",
       "Time:                        14:56:15   Log-Likelihood:            -2.3918e+06\n",
       "converged:                       True   LL-Null:                   -2.3918e+06\n",
       "                                        LLR p-value:                 1.599e-07\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0330      0.003     11.862      0.000         0.028     0.038\n",
       "meduc          0.0032      0.001      5.241      0.000         0.002     0.004\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ meduc', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692875\n",
      "         Iterations 3\n",
      "lowed                  105.0   103.7 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3452032</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3452030</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.472e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:56:19</td>     <th>  Log-Likelihood:    </th> <td>-2.3918e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3918e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>4.594e-05</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0484</td> <td>    0.001</td> <td>   40.975</td> <td> 0.000</td> <td>    0.046     0.051</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>lowed</th>     <td>   -0.0117</td> <td>    0.003</td> <td>   -4.075</td> <td> 0.000</td> <td>   -0.017    -0.006</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3452032\n",
       "Model:                          Logit   Df Residuals:                  3452030\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               3.472e-06\n",
       "Time:                        14:56:19   Log-Likelihood:            -2.3918e+06\n",
       "converged:                       True   LL-Null:                   -2.3918e+06\n",
       "                                        LLR p-value:                 4.594e-05\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0484      0.001     40.975      0.000         0.046     0.051\n",
       "lowed         -0.0117      0.003     -4.075      0.000        -0.017    -0.006\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ lowed', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
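  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two numbers that `summarize` prints next to each variable look like sex ratios (boys per 100 girls) derived from the fitted log-odds. As a rough check, assuming that reading is right and that `lowed` is a 0/1 indicator, the rounded coefficients in the table above give:\n",
    "\n",
    "$$100 \\, e^{0.0484} \\approx 105.0, \\qquad 100 \\, e^{0.0484 - 0.0117} \\approx 103.7$$\n",
    "\n",
    "These match the printed values, so the model predicts about 105.0 boys per 100 girls when `lowed` is 0 and about 103.7 when it is 1."
   ]
  },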
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Older fathers are slightly more likely to have girls, but with p = 0.11 this apparent effect could be due to chance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692865\n",
      "         Iterations 3\n",
      "fagerrec11             105.5   105.3  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3452131</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3452129</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.250e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:56:23</td>     <th>  Log-Likelihood:    </th> <td>-2.3919e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3919e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.1130</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "       <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>  <td>    0.0533</td> <td>    0.004</td> <td>   13.960</td> <td> 0.000</td> <td>    0.046     0.061</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fagerrec11</th> <td>   -0.0012</td> <td>    0.001</td> <td>   -1.585</td> <td> 0.113</td> <td>   -0.003     0.000</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3452131\n",
       "Model:                          Logit   Df Residuals:                  3452129\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               5.250e-07\n",
       "Time:                        14:56:23   Log-Likelihood:            -2.3919e+06\n",
       "converged:                       True   LL-Null:                   -2.3919e+06\n",
       "                                        LLR p-value:                    0.1130\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0533      0.004     13.960      0.000         0.046     0.061\n",
       "fagerrec11    -0.0012      0.001     -1.585      0.113        -0.003     0.000\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ fagerrec11', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692865\n",
      "         Iterations 3\n",
      "youngf                 104.9   105.8  \n",
      "oldf                   104.9   104.2  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3452131</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3452128</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>7.160e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:56:28</td>     <th>  Log-Likelihood:    </th> <td>-2.3919e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3919e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.1804</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0474</td> <td>    0.001</td> <td>   42.574</td> <td> 0.000</td> <td>    0.045     0.050</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>youngf</th>    <td>    0.0088</td> <td>    0.006</td> <td>    1.405</td> <td> 0.160</td> <td>   -0.003     0.021</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>oldf</th>      <td>   -0.0068</td> <td>    0.006</td> <td>   -1.156</td> <td> 0.248</td> <td>   -0.018     0.005</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3452131\n",
       "Model:                          Logit   Df Residuals:                  3452128\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               7.160e-07\n",
       "Time:                        14:56:28   Log-Likelihood:            -2.3919e+06\n",
       "converged:                       True   LL-Null:                   -2.3919e+06\n",
       "                                        LLR p-value:                    0.1804\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0474      0.001     42.574      0.000         0.045     0.050\n",
       "youngf         0.0088      0.006      1.405      0.160        -0.003     0.021\n",
       "oldf          -0.0068      0.006     -1.156      0.248        -0.018     0.005\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ youngf + oldf', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
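  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Predictions based on father's race are similar to those based on mother's race: more girls for black fathers and more boys for Asian fathers. The estimate for Native American fathers also points toward more girls, but it is small and not statistically significant (p = 0.74)."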
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692850\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.0   103.4 *\n",
      "C(fbrace)[T.3.0]       105.0   104.7  \n",
      "C(fbrace)[T.4.0]       105.0   107.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3207586</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3207582</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.138e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:56:53</td>     <th>  Log-Likelihood:    </th> <td>-2.2224e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.2224e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>6.021e-11</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0492</td> <td>    0.001</td> <td>   38.677</td> <td> 0.000</td> <td>    0.047     0.052</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0161</td> <td>    0.003</td> <td>   -5.070</td> <td> 0.000</td> <td>   -0.022    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0035</td> <td>    0.011</td> <td>   -0.328</td> <td> 0.743</td> <td>   -0.025     0.018</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0191</td> <td>    0.004</td> <td>    4.360</td> <td> 0.000</td> <td>    0.011     0.028</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3207586\n",
       "Model:                          Logit   Df Residuals:                  3207582\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.138e-05\n",
       "Time:                        14:56:53   Log-Likelihood:            -2.2224e+06\n",
       "converged:                       True   LL-Null:                   -2.2224e+06\n",
       "                                        LLR p-value:                 6.021e-11\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0492      0.001     38.677      0.000         0.047     0.052\n",
       "C(fbrace)[T.2.0]    -0.0161      0.003     -5.070      0.000        -0.022    -0.010\n",
       "C(fbrace)[T.3.0]    -0.0035      0.011     -0.328      0.743        -0.025     0.018\n",
       "C(fbrace)[T.4.0]     0.0191      0.004      4.360      0.000         0.011     0.028\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(fbrace)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the father is Hispanic, the model predicts more girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692864\n",
      "         Iterations 3\n",
      "fhisp                  105.2   103.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3400466</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3400464</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.006e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:56:57</td>     <th>  Log-Likelihood:    </th> <td>-2.3561e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3561e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>8.137e-10</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0508</td> <td>    0.001</td> <td>   41.012</td> <td> 0.000</td> <td>    0.048     0.053</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>     <td>   -0.0157</td> <td>    0.003</td> <td>   -6.142</td> <td> 0.000</td> <td>   -0.021    -0.011</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3400466\n",
       "Model:                          Logit   Df Residuals:                  3400464\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               8.006e-06\n",
       "Time:                        14:56:57   Log-Likelihood:            -2.3561e+06\n",
       "converged:                       True   LL-Null:                   -2.3561e+06\n",
       "                                        LLR p-value:                 8.137e-10\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0508      0.001     41.012      0.000         0.048     0.053\n",
       "fhisp         -0.0157      0.003     -6.142      0.000        -0.021    -0.011\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ fhisp', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
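  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To put these coefficients on a probability scale, the inverse logit can be applied to the fitted log-odds. This is only a rough check: it uses the rounded values from the table above and assumes `fhisp` is a 0/1 indicator.\n",
    "\n",
    "$$p(x) = \\frac{1}{1 + e^{-x}}, \\qquad p(0.0508) \\approx 0.5127, \\qquad p(0.0508 - 0.0157) \\approx 0.5088$$\n",
    "\n",
    "That is a difference of roughly 0.4 percentage points in the predicted probability of a boy."
   ]
  },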
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's education level predicts slightly more boys; the effect is small, but with p < 0.001 it is unlikely to be due to chance alone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692855\n",
      "         Iterations 3\n",
      "feduc                  103.9   104.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2960402</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2960400</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.476e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:00</td>     <th>  Log-Likelihood:    </th> <td>-2.0511e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.0511e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.0001591</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0379</td> <td>    0.003</td> <td>   12.866</td> <td> 0.000</td> <td>    0.032     0.044</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>feduc</th>     <td>    0.0025</td> <td>    0.001</td> <td>    3.776</td> <td> 0.000</td> <td>    0.001     0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2960402\n",
       "Model:                          Logit   Df Residuals:                  2960400\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               3.476e-06\n",
       "Time:                        14:57:00   Log-Likelihood:            -2.0511e+06\n",
       "converged:                       True   LL-Null:                   -2.0511e+06\n",
       "                                        LLR p-value:                 0.0001591\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0379      0.003     12.866      0.000         0.032     0.044\n",
       "feduc          0.0025      0.001      3.776      0.000         0.001     0.004\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ feduc', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
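  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a numeric predictor like `feduc`, the coefficient can also be read as an odds ratio per unit. A quick sketch using the rounded coefficient from the table above:\n",
    "\n",
    "$$e^{0.0025} \\approx 1.0025$$\n",
    "\n",
    "So each one-unit increase in `feduc` multiplies the predicted odds of a boy by about 1.0025, an increase of roughly 0.25%."
   ]
  },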
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Babies with high birth order are slightly more likely to be girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692885\n",
      "         Iterations 3\n",
      "lbo_rec                105.5   105.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3940854</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3940852</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.164e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:05</td>     <th>  Log-Likelihood:    </th> <td>-2.7306e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7306e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.855e-06</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0536</td> <td>    0.002</td> <td>   27.348</td> <td> 0.000</td> <td>    0.050     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>lbo_rec</th>   <td>   -0.0038</td> <td>    0.001</td> <td>   -4.769</td> <td> 0.000</td> <td>   -0.005    -0.002</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3940854\n",
       "Model:                          Logit   Df Residuals:                  3940852\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               4.164e-06\n",
       "Time:                        14:57:05   Log-Likelihood:            -2.7306e+06\n",
       "converged:                       True   LL-Null:                   -2.7306e+06\n",
       "                                        LLR p-value:                 1.855e-06\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0536      0.002     27.348      0.000         0.050     0.057\n",
       "lbo_rec       -0.0038      0.001     -4.769      0.000        -0.005    -0.002\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ lbo_rec', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692887\n",
      "         Iterations 3\n",
      "highbo                 104.7   103.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3940854</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3940852</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.626e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:10</td>     <th>  Log-Likelihood:    </th> <td>-2.7306e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7306e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.02997</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0460</td> <td>    0.001</td> <td>   44.570</td> <td> 0.000</td> <td>    0.044     0.048</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>highbo</th>    <td>   -0.0102</td> <td>    0.005</td> <td>   -2.171</td> <td> 0.030</td> <td>   -0.019    -0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3940854\n",
       "Model:                          Logit   Df Residuals:                  3940852\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               8.626e-07\n",
       "Time:                        14:57:10   Log-Likelihood:            -2.7306e+06\n",
       "converged:                       True   LL-Null:                   -2.7306e+06\n",
       "                                        LLR p-value:                   0.02997\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0460      0.001     44.570      0.000         0.044     0.048\n",
       "highbo        -0.0102      0.005     -2.171      0.030        -0.019    -0.001\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ highbo', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Strangely, more prenatal visits are associated with an increased probability of girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692859\n",
      "         Iterations 3\n",
      "previs                 104.5   103.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3838678</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3838676</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.565e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:15</td>     <th>  Log-Likelihood:    </th> <td>-2.6597e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6598e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>9.364e-55</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0436</td> <td>    0.001</td> <td>   42.437</td> <td> 0.000</td> <td>    0.042     0.046</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>    <td>   -0.0086</td> <td>    0.001</td> <td>  -15.583</td> <td> 0.000</td> <td>   -0.010    -0.007</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3838678\n",
       "Model:                          Logit   Df Residuals:                  3838676\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               4.565e-05\n",
       "Time:                        14:57:15   Log-Likelihood:            -2.6597e+06\n",
       "converged:                       True   LL-Null:                   -2.6598e+06\n",
       "                                        LLR p-value:                 9.364e-55\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0436      0.001     42.437      0.000         0.042     0.046\n",
       "previs        -0.0086      0.001    -15.583      0.000        -0.010    -0.007\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ previs', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
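  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough guide to reading these tables (a sketch, assuming the two numbers printed by `summarize` are boys per 100 girls at `previs = 0` and `previs = 1`), the logit model is\n",
    "\n",
    "$$\\log \\frac{p}{1-p} = \\beta_0 + \\beta_1 x$$\n",
    "\n",
    "where $p$ is the probability of a boy. Exponentiating the right-hand side gives the odds of a boy, which is the sex ratio:\n",
    "\n",
    "$$100 \\, e^{0.0436} \\approx 104.5, \\qquad 100 \\, e^{0.0436 - 0.0086} \\approx 103.6$$\n",
    "\n",
    "These match the printed values. Under this model, each additional unit of `previs` multiplies the odds of a boy by $e^{-0.0086} \\approx 0.991$, a decrease of a bit less than 1%."
   ]
  },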
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect seems to be non-linear at zero, so I'm adding a boolean for no prenatal visits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692856\n",
      "         Iterations 3\n",
      "no_previs              104.5   99.7 *\n",
      "previs                 104.5   103.5 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3838678</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3838675</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.047e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:21</td>     <th>  Log-Likelihood:    </th> <td>-2.6597e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6598e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>5.053e-59</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0440</td> <td>    0.001</td> <td>   42.713</td> <td> 0.000</td> <td>    0.042     0.046</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th> <td>   -0.0473</td> <td>    0.009</td> <td>   -5.061</td> <td> 0.000</td> <td>   -0.066    -0.029</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>    <td>   -0.0097</td> <td>    0.001</td> <td>  -16.347</td> <td> 0.000</td> <td>   -0.011    -0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3838678\n",
       "Model:                          Logit   Df Residuals:                  3838675\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               5.047e-05\n",
       "Time:                        14:57:21   Log-Likelihood:            -2.6597e+06\n",
       "converged:                       True   LL-Null:                   -2.6598e+06\n",
       "                                        LLR p-value:                 5.053e-59\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0440      0.001     42.713      0.000         0.042     0.046\n",
       "no_previs     -0.0473      0.009     -5.061      0.000        -0.066    -0.029\n",
       "previs        -0.0097      0.001    -16.347      0.000        -0.011    -0.009\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ no_previs + previs', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the mother received WIC food assistance, she is more likely to have a girl."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692878\n",
      "         Iterations 3\n",
      "wic[T.Y]               105.2   104.2 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3411631</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3411629</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.607e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:47</td>     <th>  Log-Likelihood:    </th> <td>-2.3638e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3639e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>3.635e-05</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0504</td> <td>    0.001</td> <td>   33.979</td> <td> 0.000</td> <td>    0.047     0.053</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>wic[T.Y]</th>  <td>   -0.0090</td> <td>    0.002</td> <td>   -4.130</td> <td> 0.000</td> <td>   -0.013    -0.005</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3411631\n",
       "Model:                          Logit   Df Residuals:                  3411629\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               3.607e-06\n",
       "Time:                        14:57:47   Log-Likelihood:            -2.3638e+06\n",
       "converged:                       True   LL-Null:                   -2.3639e+06\n",
       "                                        LLR p-value:                 3.635e-05\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0504      0.001     33.979      0.000         0.047     0.053\n",
       "wic[T.Y]      -0.0090      0.002     -4.130      0.000        -0.013    -0.005\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ wic', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's height seems to have little predictive value: the effect is statistically significant (p = 0.026) but very small."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692877\n",
      "         Iterations 3\n",
      "height                 99.3   99.3 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3428336</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3428334</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.043e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:51</td>     <th>  Log-Likelihood:    </th> <td>-2.3754e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3754e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.02598</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>   -0.0075</td> <td>    0.024</td> <td>   -0.309</td> <td> 0.757</td> <td>   -0.055     0.040</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>height</th>    <td>    0.0008</td> <td>    0.000</td> <td>    2.226</td> <td> 0.026</td> <td>    0.000     0.002</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3428336\n",
       "Model:                          Logit   Df Residuals:                  3428334\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.043e-06\n",
       "Time:                        14:57:51   Log-Likelihood:            -2.3754e+06\n",
       "converged:                       True   LL-Null:                   -2.3754e+06\n",
       "                                        LLR p-value:                   0.02598\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept     -0.0075      0.024     -0.309      0.757        -0.055     0.040\n",
       "height         0.0008      0.000      2.226      0.026         0.000     0.002\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ height', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
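  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To put the size of this effect in context, here is a rough consistency check using the rounded values in the table above. The pseudo R-squared that statsmodels reports for `Logit` is McFadden's:\n",
    "\n",
    "$$R^2_{\\text{McF}} = 1 - \\frac{LL}{LL_{\\text{null}}}$$\n",
    "\n",
    "so the likelihood-ratio statistic behind the LLR p-value can be written as\n",
    "\n",
    "$$2 \\, (LL - LL_{\\text{null}}) = 2 \\, R^2_{\\text{McF}} \\, |LL_{\\text{null}}|$$\n",
    "\n",
    "For the height model that is about 2 × 1.043e-06 × 2.3754e+06 ≈ 4.96, which agrees closely with z² = 2.226² ≈ 4.96. Compared against a chi-squared distribution with one degree of freedom, this corresponds to the reported p-value of about 0.026. With more than 3 million observations, even an effect this small can clear the 0.05 threshold."
   ]
  },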
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692876\n",
      "         Iterations 3\n",
      "mtall                  104.8   104.0  \n",
      "mshort                 104.8   103.3 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3428336</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3428333</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.593e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:55</td>     <th>  Log-Likelihood:    </th> <td>-2.3754e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3754e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.02272</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0472</td> <td>    0.001</td> <td>   42.200</td> <td> 0.000</td> <td>    0.045     0.049</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mtall</th>     <td>   -0.0076</td> <td>    0.006</td> <td>   -1.249</td> <td> 0.212</td> <td>   -0.020     0.004</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mshort</th>    <td>   -0.0145</td> <td>    0.006</td> <td>   -2.494</td> <td> 0.013</td> <td>   -0.026    -0.003</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3428336\n",
       "Model:                          Logit   Df Residuals:                  3428333\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.593e-06\n",
       "Time:                        14:57:55   Log-Likelihood:            -2.3754e+06\n",
       "converged:                       True   LL-Null:                   -2.3754e+06\n",
       "                                        LLR p-value:                   0.02272\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0472      0.001     42.200      0.000         0.045     0.049\n",
       "mtall         -0.0076      0.006     -1.249      0.212        -0.020     0.004\n",
       "mshort        -0.0145      0.006     -2.494      0.013        -0.026    -0.003\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ mtall + mshort', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mothers with higher BMI are more likely to have girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692879\n",
      "         Iterations 3\n",
      "bmi_r                  105.4   105.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3343730</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3343728</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.109e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:57:59</td>     <th>  Log-Likelihood:    </th> <td>-2.3168e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3168e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.02338</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0523</td> <td>    0.003</td> <td>   18.191</td> <td> 0.000</td> <td>    0.047     0.058</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>bmi_r</th>     <td>   -0.0021</td> <td>    0.001</td> <td>   -2.267</td> <td> 0.023</td> <td>   -0.004    -0.000</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3343730\n",
       "Model:                          Logit   Df Residuals:                  3343728\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.109e-06\n",
       "Time:                        14:57:59   Log-Likelihood:            -2.3168e+06\n",
       "converged:                       True   LL-Null:                   -2.3168e+06\n",
       "                                        LLR p-value:                   0.02338\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0523      0.003     18.191      0.000         0.047     0.058\n",
       "bmi_r         -0.0021      0.001     -2.267      0.023        -0.004    -0.000\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ bmi_r', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692878\n",
      "         Iterations 3\n",
      "obese                  104.9   104.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3343730</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3343728</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.833e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:58:03</td>     <th>  Log-Likelihood:    </th> <td>-2.3168e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3168e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.003567</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0481</td> <td>    0.001</td> <td>   38.389</td> <td> 0.000</td> <td>    0.046     0.051</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>obese</th>     <td>   -0.0075</td> <td>    0.003</td> <td>   -2.914</td> <td> 0.004</td> <td>   -0.013    -0.002</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3343730\n",
       "Model:                          Logit   Df Residuals:                  3343728\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.833e-06\n",
       "Time:                        14:58:03   Log-Likelihood:            -2.3168e+06\n",
       "converged:                       True   LL-Null:                   -2.3168e+06\n",
       "                                        LLR p-value:                  0.003567\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0481      0.001     38.389      0.000         0.046     0.051\n",
       "obese         -0.0075      0.003     -2.914      0.004        -0.013    -0.002\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ obese', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If payment was made by Medicaid, the baby is more likely to be a girl.  Private insurance, self-payment, and other payment methods are associated with more boys, although only the private insurance effect is statistically significant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692877\n",
      "         Iterations 3\n",
      "C(pay_rec)[T.2.0]      104.4   105.1 *\n",
      "C(pay_rec)[T.3.0]      104.4   105.3  \n",
      "C(pay_rec)[T.4.0]      104.4   104.7  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3447794</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3447790</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.074e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:58:29</td>     <th>  Log-Likelihood:    </th> <td>-2.3889e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3889e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.01934</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>             <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>    0.0427</td> <td>    0.002</td> <td>   26.107</td> <td> 0.000</td> <td>    0.039     0.046</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.2.0]</th> <td>    0.0067</td> <td>    0.002</td> <td>    2.944</td> <td> 0.003</td> <td>    0.002     0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.3.0]</th> <td>    0.0094</td> <td>    0.005</td> <td>    1.720</td> <td> 0.085</td> <td>   -0.001     0.020</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.4.0]</th> <td>    0.0033</td> <td>    0.005</td> <td>    0.645</td> <td> 0.519</td> <td>   -0.007     0.013</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3447794\n",
       "Model:                          Logit   Df Residuals:                  3447790\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               2.074e-06\n",
       "Time:                        14:58:29   Log-Likelihood:            -2.3889e+06\n",
       "converged:                       True   LL-Null:                   -2.3889e+06\n",
       "                                        LLR p-value:                   0.01934\n",
       "=====================================================================================\n",
       "                        coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------\n",
       "Intercept             0.0427      0.002     26.107      0.000         0.039     0.046\n",
       "C(pay_rec)[T.2.0]     0.0067      0.002      2.944      0.003         0.002     0.011\n",
       "C(pay_rec)[T.3.0]     0.0094      0.005      1.720      0.085        -0.001     0.020\n",
       "C(pay_rec)[T.4.0]     0.0033      0.005      0.645      0.519        -0.007     0.013\n",
       "=====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(pay_rec)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
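  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A note on reading these tables.  Judging from the output (the definition of `summarize` appears earlier in the notebook, so this is an inference, not a statement of how it is implemented), the two columns it prints seem to be sex ratios in boys per 100 girls: the first for the baseline group and the second for the group indicated by each term.  Logistic regression models the log-odds of `boy`, and the odds of a boy are the number of boys per girl, so the conversion would be:\n",
    "\n",
    "$$\\text{boys per 100 girls} = 100 \\cdot e^{\\beta_0 + \\beta_j}$$\n",
    "\n",
    "where $\\beta_0$ is the intercept and $\\beta_j$ is the coefficient of the term (symbols introduced here for illustration only).  For the payment model above, $100 \\cdot e^{0.0427} \\approx 104.4$ for the baseline and $100 \\cdot e^{0.0427 + 0.0067} \\approx 105.1$ for `C(pay_rec)[T.2.0]`, which matches the printed values."
   ]
  },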
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding controls\n",
    "\n",
    "However, none of the previous results should be taken too seriously.  We only tested one variable at a time, and many of these apparent effects disappear when we add control variables.\n",
    "\n",
    "In particular, if we control for father's race and Hispanic origin, the mother's race and Hispanic origin have no statistically significant additional predictive value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692846\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.5   103.3 *\n",
      "C(fbrace)[T.3.0]       105.5   104.1  \n",
      "C(fbrace)[T.4.0]       105.5   107.0  \n",
      "C(mbrace)[T.2]         105.5   105.7  \n",
      "C(mbrace)[T.3]         105.5   106.9  \n",
      "C(mbrace)[T.4]         105.5   105.6  \n",
      "fhisp                  105.5   104.1 *\n",
      "mhisp                  105.5   105.0  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3184121</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3184112</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     8</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.935e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:59:16</td>     <th>  Log-Likelihood:    </th> <td>-2.2061e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.2061e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>3.988e-15</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0531</td> <td>    0.001</td> <td>   35.736</td> <td> 0.000</td> <td>    0.050     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0211</td> <td>    0.006</td> <td>   -3.688</td> <td> 0.000</td> <td>   -0.032    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0125</td> <td>    0.013</td> <td>   -1.002</td> <td> 0.316</td> <td>   -0.037     0.012</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0142</td> <td>    0.007</td> <td>    1.936</td> <td> 0.053</td> <td>   -0.000     0.029</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.2]</th>   <td>    0.0022</td> <td>    0.006</td> <td>    0.367</td> <td> 0.714</td> <td>   -0.010     0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.3]</th>   <td>    0.0140</td> <td>    0.013</td> <td>    1.076</td> <td> 0.282</td> <td>   -0.012     0.040</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.4]</th>   <td>    0.0013</td> <td>    0.007</td> <td>    0.186</td> <td> 0.853</td> <td>   -0.012     0.015</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0132</td> <td>    0.004</td> <td>   -2.951</td> <td> 0.003</td> <td>   -0.022    -0.004</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mhisp</th>            <td>   -0.0046</td> <td>    0.004</td> <td>   -1.045</td> <td> 0.296</td> <td>   -0.013     0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3184121\n",
       "Model:                          Logit   Df Residuals:                  3184112\n",
       "Method:                           MLE   Df Model:                            8\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.935e-05\n",
       "Time:                        14:59:16   Log-Likelihood:            -2.2061e+06\n",
       "converged:                       True   LL-Null:                   -2.2061e+06\n",
       "                                        LLR p-value:                 3.988e-15\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0531      0.001     35.736      0.000         0.050     0.056\n",
       "C(fbrace)[T.2.0]    -0.0211      0.006     -3.688      0.000        -0.032    -0.010\n",
       "C(fbrace)[T.3.0]    -0.0125      0.013     -1.002      0.316        -0.037     0.012\n",
       "C(fbrace)[T.4.0]     0.0142      0.007      1.936      0.053        -0.000     0.029\n",
       "C(mbrace)[T.2]       0.0022      0.006      0.367      0.714        -0.010     0.014\n",
       "C(mbrace)[T.3]       0.0140      0.013      1.076      0.282        -0.012     0.040\n",
       "C(mbrace)[T.4]       0.0013      0.007      0.186      0.853        -0.012     0.015\n",
       "fhisp               -0.0132      0.004     -2.951      0.003        -0.022    -0.004\n",
       "mhisp               -0.0046      0.004     -1.045      0.296        -0.013     0.004\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + C(mbrace) + mhisp')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, once we control for father's race and Hispanic origin, almost every other variable is no longer statistically significant, including acknowledged paternity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692837\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.1   103.0 *\n",
      "C(fbrace)[T.3.0]       105.1   104.0  \n",
      "C(fbrace)[T.4.0]       105.1   106.6 *\n",
      "mar_p[T.Y]             105.1   105.6  \n",
      "fhisp                  105.1   103.3 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2798315</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2798309</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.968e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:00:03</td>     <th>  Log-Likelihood:    </th> <td>-1.9388e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.9388e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>4.935e-15</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0497</td> <td>    0.014</td> <td>    3.433</td> <td> 0.001</td> <td>    0.021     0.078</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0201</td> <td>    0.003</td> <td>   -5.761</td> <td> 0.000</td> <td>   -0.027    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0104</td> <td>    0.012</td> <td>   -0.858</td> <td> 0.391</td> <td>   -0.034     0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0144</td> <td>    0.005</td> <td>    3.013</td> <td> 0.003</td> <td>    0.005     0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mar_p[T.Y]</th>       <td>    0.0045</td> <td>    0.014</td> <td>    0.310</td> <td> 0.757</td> <td>   -0.024     0.033</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0177</td> <td>    0.003</td> <td>   -5.694</td> <td> 0.000</td> <td>   -0.024    -0.012</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2798315\n",
       "Model:                          Logit   Df Residuals:                  2798309\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.968e-05\n",
       "Time:                        15:00:03   Log-Likelihood:            -1.9388e+06\n",
       "converged:                       True   LL-Null:                   -1.9388e+06\n",
       "                                        LLR p-value:                 4.935e-15\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0497      0.014      3.433      0.001         0.021     0.078\n",
       "C(fbrace)[T.2.0]    -0.0201      0.003     -5.761      0.000        -0.027    -0.013\n",
       "C(fbrace)[T.3.0]    -0.0104      0.012     -0.858      0.391        -0.034     0.013\n",
       "C(fbrace)[T.4.0]     0.0144      0.005      3.013      0.003         0.005     0.024\n",
       "mar_p[T.Y]           0.0045      0.014      0.310      0.757        -0.024     0.033\n",
       "fhisp               -0.0177      0.003     -5.694      0.000        -0.024    -0.012\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + mar_p')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Being married still predicts slightly more boys, but the effect is not statistically significant at the 5% level (p = 0.096)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692846\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       104.9   102.7 *\n",
      "C(fbrace)[T.3.0]       104.9   104.1  \n",
      "C(fbrace)[T.4.0]       104.9   106.6 *\n",
      "fhisp                  104.9   103.0 *\n",
      "dmar                   104.9   105.3  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3188403</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3188397</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.937e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:00:29</td>     <th>  Log-Likelihood:    </th> <td>-2.2091e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.2091e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>5.665e-17</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0478</td> <td>    0.003</td> <td>   13.880</td> <td> 0.000</td> <td>    0.041     0.055</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0209</td> <td>    0.003</td> <td>   -6.174</td> <td> 0.000</td> <td>   -0.028    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0079</td> <td>    0.011</td> <td>   -0.728</td> <td> 0.467</td> <td>   -0.029     0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0159</td> <td>    0.004</td> <td>    3.589</td> <td> 0.000</td> <td>    0.007     0.025</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0177</td> <td>    0.003</td> <td>   -5.947</td> <td> 0.000</td> <td>   -0.024    -0.012</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>dmar</th>             <td>    0.0043</td> <td>    0.003</td> <td>    1.667</td> <td> 0.096</td> <td>   -0.001     0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3188403\n",
       "Model:                          Logit   Df Residuals:                  3188397\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.937e-05\n",
       "Time:                        15:00:29   Log-Likelihood:            -2.2091e+06\n",
       "converged:                       True   LL-Null:                   -2.2091e+06\n",
       "                                        LLR p-value:                 5.665e-17\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0478      0.003     13.880      0.000         0.041     0.055\n",
       "C(fbrace)[T.2.0]    -0.0209      0.003     -6.174      0.000        -0.028    -0.014\n",
       "C(fbrace)[T.3.0]    -0.0079      0.011     -0.728      0.467        -0.029     0.013\n",
       "C(fbrace)[T.4.0]     0.0159      0.004      3.589      0.000         0.007     0.025\n",
       "fhisp               -0.0177      0.003     -5.947      0.000        -0.024    -0.012\n",
       "dmar                 0.0043      0.003      1.667      0.096        -0.001     0.009\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + dmar')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of education disappears."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692836\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.6   103.6 *\n",
      "C(fbrace)[T.3.0]       105.6   104.6  \n",
      "C(fbrace)[T.4.0]       105.6   107.1 *\n",
      "fhisp                  105.6   103.9 *\n",
      "lowed                  105.6   105.0  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2777435</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2777429</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.992e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:00:55</td>     <th>  Log-Likelihood:    </th> <td>-1.9243e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.9243e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>4.189e-15</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0546</td> <td>    0.002</td> <td>   34.777</td> <td> 0.000</td> <td>    0.052     0.058</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0198</td> <td>    0.004</td> <td>   -5.634</td> <td> 0.000</td> <td>   -0.027    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0100</td> <td>    0.012</td> <td>   -0.823</td> <td> 0.410</td> <td>   -0.034     0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0141</td> <td>    0.005</td> <td>    2.925</td> <td> 0.003</td> <td>    0.005     0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0163</td> <td>    0.003</td> <td>   -4.999</td> <td> 0.000</td> <td>   -0.023    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>lowed</th>            <td>   -0.0055</td> <td>    0.004</td> <td>   -1.471</td> <td> 0.141</td> <td>   -0.013     0.002</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2777435\n",
       "Model:                          Logit   Df Residuals:                  2777429\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.992e-05\n",
       "Time:                        15:00:55   Log-Likelihood:            -1.9243e+06\n",
       "converged:                       True   LL-Null:                   -1.9243e+06\n",
       "                                        LLR p-value:                 4.189e-15\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0546      0.002     34.777      0.000         0.052     0.058\n",
       "C(fbrace)[T.2.0]    -0.0198      0.004     -5.634      0.000        -0.027    -0.013\n",
       "C(fbrace)[T.3.0]    -0.0100      0.012     -0.823      0.410        -0.034     0.014\n",
       "C(fbrace)[T.4.0]     0.0141      0.005      2.925      0.003         0.005     0.024\n",
       "fhisp               -0.0163      0.003     -4.999      0.000        -0.023    -0.010\n",
       "lowed               -0.0055      0.004     -1.471      0.141        -0.013     0.002\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + lowed')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of birth order disappears."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692847\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.5   103.5 *\n",
      "C(fbrace)[T.3.0]       105.5   104.7  \n",
      "C(fbrace)[T.4.0]       105.5   107.1 *\n",
      "fhisp                  105.5   103.8 *\n",
      "highbo                 105.5   104.8  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3175026</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3175020</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.881e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:01:20</td>     <th>  Log-Likelihood:    </th> <td>-2.1998e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1998e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>2.209e-16</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0531</td> <td>    0.001</td> <td>   36.240</td> <td> 0.000</td> <td>    0.050     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0192</td> <td>    0.003</td> <td>   -5.879</td> <td> 0.000</td> <td>   -0.026    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0074</td> <td>    0.011</td> <td>   -0.683</td> <td> 0.495</td> <td>   -0.029     0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0154</td> <td>    0.004</td> <td>    3.457</td> <td> 0.001</td> <td>    0.007     0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0163</td> <td>    0.003</td> <td>   -5.586</td> <td> 0.000</td> <td>   -0.022    -0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>highbo</th>           <td>   -0.0062</td> <td>    0.005</td> <td>   -1.127</td> <td> 0.260</td> <td>   -0.017     0.005</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3175026\n",
       "Model:                          Logit   Df Residuals:                  3175020\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.881e-05\n",
       "Time:                        15:01:20   Log-Likelihood:            -2.1998e+06\n",
       "converged:                       True   LL-Null:                   -2.1998e+06\n",
       "                                        LLR p-value:                 2.209e-16\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0531      0.001     36.240      0.000         0.050     0.056\n",
       "C(fbrace)[T.2.0]    -0.0192      0.003     -5.879      0.000        -0.026    -0.013\n",
       "C(fbrace)[T.3.0]    -0.0074      0.011     -0.683      0.495        -0.029     0.014\n",
       "C(fbrace)[T.4.0]     0.0154      0.004      3.457      0.001         0.007     0.024\n",
       "fhisp               -0.0163      0.003     -5.586      0.000        -0.022    -0.011\n",
       "highbo              -0.0062      0.005     -1.127      0.260        -0.017     0.005\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + highbo')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After controlling for father's race and Hispanic origin, WIC is no longer associated with more girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692838\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.5   103.4 *\n",
      "C(fbrace)[T.3.0]       105.5   104.7  \n",
      "C(fbrace)[T.4.0]       105.5   107.1 *\n",
      "wic[T.Y]               105.5   105.6  \n",
      "fhisp                  105.5   103.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2735525</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2735519</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.029e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:02:07</td>     <th>  Log-Likelihood:    </th> <td>-1.8953e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.8953e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>3.710e-15</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0539</td> <td>    0.002</td> <td>   31.172</td> <td> 0.000</td> <td>    0.050     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0209</td> <td>    0.004</td> <td>   -5.723</td> <td> 0.000</td> <td>   -0.028    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0078</td> <td>    0.012</td> <td>   -0.636</td> <td> 0.525</td> <td>   -0.032     0.016</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0148</td> <td>    0.005</td> <td>    3.044</td> <td> 0.002</td> <td>    0.005     0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>wic[T.Y]</th>         <td>    0.0007</td> <td>    0.003</td> <td>    0.264</td> <td> 0.792</td> <td>   -0.004     0.006</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0181</td> <td>    0.003</td> <td>   -5.484</td> <td> 0.000</td> <td>   -0.025    -0.012</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2735525\n",
       "Model:                          Logit   Df Residuals:                  2735519\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               2.029e-05\n",
       "Time:                        15:02:07   Log-Likelihood:            -1.8953e+06\n",
       "converged:                       True   LL-Null:                   -1.8953e+06\n",
       "                                        LLR p-value:                 3.710e-15\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0539      0.002     31.172      0.000         0.050     0.057\n",
       "C(fbrace)[T.2.0]    -0.0209      0.004     -5.723      0.000        -0.028    -0.014\n",
       "C(fbrace)[T.3.0]    -0.0078      0.012     -0.636      0.525        -0.032     0.016\n",
       "C(fbrace)[T.4.0]     0.0148      0.005      3.044      0.002         0.005     0.024\n",
       "wic[T.Y]             0.0007      0.003      0.264      0.792        -0.004     0.006\n",
       "fhisp               -0.0181      0.003     -5.484      0.000        -0.025    -0.012\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + wic')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the same controls, the effect of obesity is no longer statistically significant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692838\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.7   103.5 *\n",
      "C(fbrace)[T.3.0]       105.7   104.2  \n",
      "C(fbrace)[T.4.0]       105.7   107.2 *\n",
      "fhisp                  105.7   103.9 *\n",
      "obese                  105.7   105.1  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2686167</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2686161</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.202e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:02:31</td>     <th>  Log-Likelihood:    </th> <td>-1.8611e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.8611e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>3.274e-16</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0552</td> <td>    0.002</td> <td>   32.697</td> <td> 0.000</td> <td>    0.052     0.059</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0210</td> <td>    0.004</td> <td>   -5.842</td> <td> 0.000</td> <td>   -0.028    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0137</td> <td>    0.012</td> <td>   -1.109</td> <td> 0.267</td> <td>   -0.038     0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0145</td> <td>    0.005</td> <td>    2.949</td> <td> 0.003</td> <td>    0.005     0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0174</td> <td>    0.003</td> <td>   -5.490</td> <td> 0.000</td> <td>   -0.024    -0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>obese</th>            <td>   -0.0052</td> <td>    0.003</td> <td>   -1.770</td> <td> 0.077</td> <td>   -0.011     0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2686167\n",
       "Model:                          Logit   Df Residuals:                  2686161\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               2.202e-05\n",
       "Time:                        15:02:31   Log-Likelihood:            -1.8611e+06\n",
       "converged:                       True   LL-Null:                   -1.8611e+06\n",
       "                                        LLR p-value:                 3.274e-16\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0552      0.002     32.697      0.000         0.052     0.059\n",
       "C(fbrace)[T.2.0]    -0.0210      0.004     -5.842      0.000        -0.028    -0.014\n",
       "C(fbrace)[T.3.0]    -0.0137      0.012     -1.109      0.267        -0.038     0.011\n",
       "C(fbrace)[T.4.0]     0.0145      0.005      2.949      0.003         0.005     0.024\n",
       "fhisp               -0.0174      0.003     -5.490      0.000        -0.024    -0.011\n",
       "obese               -0.0052      0.003     -1.770      0.077        -0.011     0.001\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + obese')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of payment method is diminished: self-payment is still associated with more boys, but the difference is no longer statistically significant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692835\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.9   103.6 *\n",
      "C(fbrace)[T.3.0]       105.9   104.7  \n",
      "C(fbrace)[T.4.0]       105.9   107.4 *\n",
      "C(pay_rec)[T.2.0]      105.9   105.3  \n",
      "C(pay_rec)[T.3.0]      105.9   107.0  \n",
      "C(pay_rec)[T.4.0]      105.9   105.7  \n",
      "fhisp                  105.9   103.8 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2763347</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2763339</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     7</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.100e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:03:17</td>     <th>  Log-Likelihood:    </th> <td>-1.9145e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.9146e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.132e-14</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>             <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>    0.0571</td> <td>    0.002</td> <td>   22.914</td> <td> 0.000</td> <td>    0.052     0.062</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th>  <td>   -0.0214</td> <td>    0.004</td> <td>   -5.920</td> <td> 0.000</td> <td>   -0.028    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th>  <td>   -0.0113</td> <td>    0.012</td> <td>   -0.915</td> <td> 0.360</td> <td>   -0.035     0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th>  <td>    0.0142</td> <td>    0.005</td> <td>    2.955</td> <td> 0.003</td> <td>    0.005     0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.2.0]</th> <td>   -0.0050</td> <td>    0.003</td> <td>   -1.839</td> <td> 0.066</td> <td>   -0.010     0.000</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.3.0]</th> <td>    0.0103</td> <td>    0.007</td> <td>    1.580</td> <td> 0.114</td> <td>   -0.002     0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.4.0]</th> <td>   -0.0016</td> <td>    0.006</td> <td>   -0.274</td> <td> 0.784</td> <td>   -0.013     0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>             <td>   -0.0193</td> <td>    0.003</td> <td>   -5.917</td> <td> 0.000</td> <td>   -0.026    -0.013</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2763347\n",
       "Model:                          Logit   Df Residuals:                  2763339\n",
       "Method:                           MLE   Df Model:                            7\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               2.100e-05\n",
       "Time:                        15:03:17   Log-Likelihood:            -1.9145e+06\n",
       "converged:                       True   LL-Null:                   -1.9146e+06\n",
       "                                        LLR p-value:                 1.132e-14\n",
       "=====================================================================================\n",
       "                        coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------\n",
       "Intercept             0.0571      0.002     22.914      0.000         0.052     0.062\n",
       "C(fbrace)[T.2.0]     -0.0214      0.004     -5.920      0.000        -0.028    -0.014\n",
       "C(fbrace)[T.3.0]     -0.0113      0.012     -0.915      0.360        -0.035     0.013\n",
       "C(fbrace)[T.4.0]      0.0142      0.005      2.955      0.003         0.005     0.024\n",
       "C(pay_rec)[T.2.0]    -0.0050      0.003     -1.839      0.066        -0.010     0.000\n",
       "C(pay_rec)[T.3.0]     0.0103      0.007      1.580      0.114        -0.002     0.023\n",
       "C(pay_rec)[T.4.0]    -0.0016      0.006     -0.274      0.784        -0.013     0.010\n",
       "fhisp                -0.0193      0.003     -5.917      0.000        -0.026    -0.013\n",
       "=====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + C(pay_rec)')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But the number of prenatal visits is still a strong predictor of more girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692809\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.5   103.0 *\n",
      "C(fbrace)[T.3.0]       105.5   104.1  \n",
      "C(fbrace)[T.4.0]       105.5   107.0 *\n",
      "fhisp                  105.5   103.4 *\n",
      "previs                 105.5   104.4 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3097584</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3097578</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>7.830e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:03:43</td>     <th>  Log-Likelihood:    </th> <td>-2.1460e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1462e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.719e-70</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0532</td> <td>    0.001</td> <td>   36.168</td> <td> 0.000</td> <td>    0.050     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0237</td> <td>    0.003</td> <td>   -7.129</td> <td> 0.000</td> <td>   -0.030    -0.017</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0129</td> <td>    0.011</td> <td>   -1.170</td> <td> 0.242</td> <td>   -0.035     0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0141</td> <td>    0.005</td> <td>    3.112</td> <td> 0.002</td> <td>    0.005     0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0193</td> <td>    0.003</td> <td>   -6.533</td> <td> 0.000</td> <td>   -0.025    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0103</td> <td>    0.001</td> <td>  -16.043</td> <td> 0.000</td> <td>   -0.012    -0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3097584\n",
       "Model:                          Logit   Df Residuals:                  3097578\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               7.830e-05\n",
       "Time:                        15:03:43   Log-Likelihood:            -2.1460e+06\n",
       "converged:                       True   LL-Null:                   -2.1462e+06\n",
       "                                        LLR p-value:                 1.719e-70\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0532      0.001     36.168      0.000         0.050     0.056\n",
       "C(fbrace)[T.2.0]    -0.0237      0.003     -7.129      0.000        -0.030    -0.017\n",
       "C(fbrace)[T.3.0]    -0.0129      0.011     -1.170      0.242        -0.035     0.009\n",
       "C(fbrace)[T.4.0]     0.0141      0.005      3.112      0.002         0.005     0.023\n",
       "fhisp               -0.0193      0.003     -6.533      0.000        -0.025    -0.014\n",
       "previs              -0.0103      0.001    -16.043      0.000        -0.012    -0.009\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the effect is even stronger if we add a boolean variable, `no_previs`, to capture the nonlinearity at 0 visits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692805\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.5   103.1 *\n",
      "C(fbrace)[T.3.0]       105.5   104.1  \n",
      "C(fbrace)[T.4.0]       105.5   107.0 *\n",
      "fhisp                  105.5   103.5 *\n",
      "previs                 105.5   104.3 *\n",
      "no_previs              105.5   99.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3097584</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3097577</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     6</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.320e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:04:09</td>     <th>  Log-Likelihood:    </th> <td>-2.1460e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1462e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>4.542e-74</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0536</td> <td>    0.001</td> <td>   36.382</td> <td> 0.000</td> <td>    0.051     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0235</td> <td>    0.003</td> <td>   -7.087</td> <td> 0.000</td> <td>   -0.030    -0.017</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0131</td> <td>    0.011</td> <td>   -1.188</td> <td> 0.235</td> <td>   -0.035     0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0139</td> <td>    0.005</td> <td>    3.070</td> <td> 0.002</td> <td>    0.005     0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0191</td> <td>    0.003</td> <td>   -6.468</td> <td> 0.000</td> <td>   -0.025    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0113</td> <td>    0.001</td> <td>  -16.666</td> <td> 0.000</td> <td>   -0.013    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th>        <td>   -0.0573</td> <td>    0.012</td> <td>   -4.587</td> <td> 0.000</td> <td>   -0.082    -0.033</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3097584\n",
       "Model:                          Logit   Df Residuals:                  3097577\n",
       "Method:                           MLE   Df Model:                            6\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               8.320e-05\n",
       "Time:                        15:04:09   Log-Likelihood:            -2.1460e+06\n",
       "converged:                       True   LL-Null:                   -2.1462e+06\n",
       "                                        LLR p-value:                 4.542e-74\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0536      0.001     36.382      0.000         0.051     0.057\n",
       "C(fbrace)[T.2.0]    -0.0235      0.003     -7.087      0.000        -0.030    -0.017\n",
       "C(fbrace)[T.3.0]    -0.0131      0.011     -1.188      0.235        -0.035     0.009\n",
       "C(fbrace)[T.4.0]     0.0139      0.005      3.070      0.002         0.005     0.023\n",
       "fhisp               -0.0191      0.003     -6.468      0.000        -0.025    -0.013\n",
       "previs              -0.0113      0.001    -16.666      0.000        -0.013    -0.010\n",
       "no_previs           -0.0573      0.012     -4.587      0.000        -0.082    -0.033\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### More controls"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now if we control for father's race and Hispanic origin as well as number of prenatal visits, the effect of marriage disappears."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692808\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.2   102.6 *\n",
      "C(fbrace)[T.3.0]       105.2   103.8  \n",
      "C(fbrace)[T.4.0]       105.2   106.7 *\n",
      "fhisp                  105.2   103.1 *\n",
      "previs                 105.2   104.1 *\n",
      "dmar                   105.2   105.4  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3097584</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3097577</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     6</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>7.846e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:04:35</td>     <th>  Log-Likelihood:    </th> <td>-2.1460e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1462e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.058e-69</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0506</td> <td>    0.004</td> <td>   14.449</td> <td> 0.000</td> <td>    0.044     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0245</td> <td>    0.003</td> <td>   -7.072</td> <td> 0.000</td> <td>   -0.031    -0.018</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0136</td> <td>    0.011</td> <td>   -1.227</td> <td> 0.220</td> <td>   -0.035     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0142</td> <td>    0.005</td> <td>    3.151</td> <td> 0.002</td> <td>    0.005     0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0198</td> <td>    0.003</td> <td>   -6.561</td> <td> 0.000</td> <td>   -0.026    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0103</td> <td>    0.001</td> <td>  -15.969</td> <td> 0.000</td> <td>   -0.012    -0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>dmar</th>             <td>    0.0022</td> <td>    0.003</td> <td>    0.828</td> <td> 0.408</td> <td>   -0.003     0.007</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3097584\n",
       "Model:                          Logit   Df Residuals:                  3097577\n",
       "Method:                           MLE   Df Model:                            6\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               7.846e-05\n",
       "Time:                        15:04:35   Log-Likelihood:            -2.1460e+06\n",
       "converged:                       True   LL-Null:                   -2.1462e+06\n",
       "                                        LLR p-value:                 1.058e-69\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0506      0.004     14.449      0.000         0.044     0.057\n",
       "C(fbrace)[T.2.0]    -0.0245      0.003     -7.072      0.000        -0.031    -0.018\n",
       "C(fbrace)[T.3.0]    -0.0136      0.011     -1.227      0.220        -0.035     0.008\n",
       "C(fbrace)[T.4.0]     0.0142      0.005      3.151      0.002         0.005     0.023\n",
       "fhisp               -0.0198      0.003     -6.561      0.000        -0.026    -0.014\n",
       "previs              -0.0103      0.001    -15.969      0.000        -0.012    -0.009\n",
       "dmar                 0.0022      0.003      0.828      0.408        -0.003     0.007\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + dmar')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
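  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the other variables in the model, the effect of payment method disappears: none of the `pay_rec` categories is statistically significant."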
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692799\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.7   103.1 *\n",
      "C(fbrace)[T.3.0]       105.7   104.0  \n",
      "C(fbrace)[T.4.0]       105.7   107.0 *\n",
      "C(pay_rec)[T.2.0]      105.7   105.6  \n",
      "C(pay_rec)[T.3.0]      105.7   105.7  \n",
      "C(pay_rec)[T.4.0]      105.7   105.3  \n",
      "fhisp                  105.7   103.6 *\n",
      "previs                 105.7   104.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2679860</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2679851</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     8</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>7.905e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:05:21</td>     <th>  Log-Likelihood:    </th> <td>-1.8566e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.8568e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>9.714e-59</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>             <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>    0.0553</td> <td>    0.003</td> <td>   21.819</td> <td> 0.000</td> <td>    0.050     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th>  <td>   -0.0248</td> <td>    0.004</td> <td>   -6.723</td> <td> 0.000</td> <td>   -0.032    -0.018</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th>  <td>   -0.0166</td> <td>    0.012</td> <td>   -1.326</td> <td> 0.185</td> <td>   -0.041     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th>  <td>    0.0128</td> <td>    0.005</td> <td>    2.610</td> <td> 0.009</td> <td>    0.003     0.022</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.2.0]</th> <td>   -0.0012</td> <td>    0.003</td> <td>   -0.436</td> <td> 0.663</td> <td>   -0.007     0.004</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.3.0]</th> <td> 3.729e-05</td> <td>    0.007</td> <td>    0.006</td> <td> 0.996</td> <td>   -0.013     0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.4.0]</th> <td>   -0.0035</td> <td>    0.006</td> <td>   -0.589</td> <td> 0.556</td> <td>   -0.015     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>             <td>   -0.0203</td> <td>    0.003</td> <td>   -6.114</td> <td> 0.000</td> <td>   -0.027    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>            <td>   -0.0103</td> <td>    0.001</td> <td>  -14.715</td> <td> 0.000</td> <td>   -0.012    -0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2679860\n",
       "Model:                          Logit   Df Residuals:                  2679851\n",
       "Method:                           MLE   Df Model:                            8\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               7.905e-05\n",
       "Time:                        15:05:21   Log-Likelihood:            -1.8566e+06\n",
       "converged:                       True   LL-Null:                   -1.8568e+06\n",
       "                                        LLR p-value:                 9.714e-59\n",
       "=====================================================================================\n",
       "                        coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------\n",
       "Intercept             0.0553      0.003     21.819      0.000         0.050     0.060\n",
       "C(fbrace)[T.2.0]     -0.0248      0.004     -6.723      0.000        -0.032    -0.018\n",
       "C(fbrace)[T.3.0]     -0.0166      0.012     -1.326      0.185        -0.041     0.008\n",
       "C(fbrace)[T.4.0]      0.0128      0.005      2.610      0.009         0.003     0.022\n",
       "C(pay_rec)[T.2.0]    -0.0012      0.003     -0.436      0.663        -0.007     0.004\n",
       "C(pay_rec)[T.3.0]  3.729e-05      0.007      0.006      0.996        -0.013     0.013\n",
       "C(pay_rec)[T.4.0]    -0.0035      0.006     -0.589      0.556        -0.015     0.008\n",
       "fhisp                -0.0203      0.003     -6.114      0.000        -0.027    -0.014\n",
       "previs               -0.0103      0.001    -14.715      0.000        -0.012    -0.009\n",
       "=====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + C(pay_rec)')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's a version with the addition of a boolean for no prenatal visits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692805\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.5   103.1 *\n",
      "C(fbrace)[T.3.0]       105.5   104.1  \n",
      "C(fbrace)[T.4.0]       105.5   107.0 *\n",
      "fhisp                  105.5   103.5 *\n",
      "previs                 105.5   104.3 *\n",
      "no_previs              105.5   99.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3097584</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3097577</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     6</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.320e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:05:46</td>     <th>  Log-Likelihood:    </th> <td>-2.1460e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1462e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>4.542e-74</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0536</td> <td>    0.001</td> <td>   36.382</td> <td> 0.000</td> <td>    0.051     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0235</td> <td>    0.003</td> <td>   -7.087</td> <td> 0.000</td> <td>   -0.030    -0.017</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0131</td> <td>    0.011</td> <td>   -1.188</td> <td> 0.235</td> <td>   -0.035     0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0139</td> <td>    0.005</td> <td>    3.070</td> <td> 0.002</td> <td>    0.005     0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0191</td> <td>    0.003</td> <td>   -6.468</td> <td> 0.000</td> <td>   -0.025    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0113</td> <td>    0.001</td> <td>  -16.666</td> <td> 0.000</td> <td>   -0.013    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th>        <td>   -0.0573</td> <td>    0.012</td> <td>   -4.587</td> <td> 0.000</td> <td>   -0.082    -0.033</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3097584\n",
       "Model:                          Logit   Df Residuals:                  3097577\n",
       "Method:                           MLE   Df Model:                            6\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               8.320e-05\n",
       "Time:                        15:05:46   Log-Likelihood:            -2.1460e+06\n",
       "converged:                       True   LL-Null:                   -2.1462e+06\n",
       "                                        LLR p-value:                 4.542e-74\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0536      0.001     36.382      0.000         0.051     0.057\n",
       "C(fbrace)[T.2.0]    -0.0235      0.003     -7.087      0.000        -0.030    -0.017\n",
       "C(fbrace)[T.3.0]    -0.0131      0.011     -1.188      0.235        -0.035     0.009\n",
       "C(fbrace)[T.4.0]     0.0139      0.005      3.070      0.002         0.005     0.023\n",
       "fhisp               -0.0191      0.003     -6.468      0.000        -0.025    -0.013\n",
       "previs              -0.0113      0.001    -16.666      0.000        -0.013    -0.010\n",
       "no_previs           -0.0573      0.012     -4.587      0.000        -0.082    -0.033\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
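  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A note on reading the two columns that `summarize` prints.  Its definition is outside this part of the notebook, so what follows is an inference from the numbers, not a description of the code.  The first column appears to be the baseline sex ratio implied by the intercept, and the second the ratio when one term is added, both in boys per 100 girls.  Since logistic regression models the log odds of `boy`, the conversion would be:\n",
    "\n",
    "$$\\text{ratio}_0 = 100\\, e^{\\beta_0}, \\qquad \\text{ratio}_j = 100\\, e^{\\beta_0 + \\beta_j}$$\n",
    "\n",
    "For the model above, $\\beta_0 = 0.0536$ gives $100\\, e^{0.0536} \\approx 105.5$, and adding the `no_previs` coefficient gives $100\\, e^{0.0536 - 0.0573} \\approx 99.6$.  Both match the printed values."
   ]
  },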
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, surprisingly, the mother's age has a small effect, although it is not statistically significant (p=0.116)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692805\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       106.2   103.7 *\n",
      "C(fbrace)[T.3.0]       106.2   104.8  \n",
      "C(fbrace)[T.4.0]       106.2   107.8 *\n",
      "fhisp                  106.2   104.2 *\n",
      "previs                 106.2   105.0 *\n",
      "no_previs              106.2   100.3 *\n",
      "mager9                 106.2   106.1  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3097584</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3097576</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     7</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.378e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:06:13</td>     <th>  Log-Likelihood:    </th> <td>-2.1460e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1462e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.081e-73</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0603</td> <td>    0.004</td> <td>   13.417</td> <td> 0.000</td> <td>    0.051     0.069</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0241</td> <td>    0.003</td> <td>   -7.209</td> <td> 0.000</td> <td>   -0.031    -0.018</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0139</td> <td>    0.011</td> <td>   -1.255</td> <td> 0.209</td> <td>   -0.036     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0144</td> <td>    0.005</td> <td>    3.176</td> <td> 0.001</td> <td>    0.006     0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0196</td> <td>    0.003</td> <td>   -6.592</td> <td> 0.000</td> <td>   -0.025    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0113</td> <td>    0.001</td> <td>  -16.525</td> <td> 0.000</td> <td>   -0.013    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th>        <td>   -0.0571</td> <td>    0.012</td> <td>   -4.578</td> <td> 0.000</td> <td>   -0.082    -0.033</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mager9</th>           <td>   -0.0015</td> <td>    0.001</td> <td>   -1.573</td> <td> 0.116</td> <td>   -0.003     0.000</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3097584\n",
       "Model:                          Logit   Df Residuals:                  3097576\n",
       "Method:                           MLE   Df Model:                            7\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               8.378e-05\n",
       "Time:                        15:06:13   Log-Likelihood:            -2.1460e+06\n",
       "converged:                       True   LL-Null:                   -2.1462e+06\n",
       "                                        LLR p-value:                 1.081e-73\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0603      0.004     13.417      0.000         0.051     0.069\n",
       "C(fbrace)[T.2.0]    -0.0241      0.003     -7.209      0.000        -0.031    -0.018\n",
       "C(fbrace)[T.3.0]    -0.0139      0.011     -1.255      0.209        -0.036     0.008\n",
       "C(fbrace)[T.4.0]     0.0144      0.005      3.176      0.001         0.006     0.023\n",
       "fhisp               -0.0196      0.003     -6.592      0.000        -0.025    -0.014\n",
       "previs              -0.0113      0.001    -16.525      0.000        -0.013    -0.010\n",
       "no_previs           -0.0571      0.012     -4.578      0.000        -0.082    -0.033\n",
       "mager9              -0.0015      0.001     -1.573      0.116        -0.003     0.000\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + mager9')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So does the father's age.  But both age effects are small: the father's is borderline significant (p=0.037), and the mother's does not reach significance at the 5% level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692804\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       106.4   103.8 *\n",
      "C(fbrace)[T.3.0]       106.4   105.0  \n",
      "C(fbrace)[T.4.0]       106.4   107.9 *\n",
      "fhisp                  106.4   104.3 *\n",
      "previs                 106.4   105.2 *\n",
      "no_previs              106.4   100.4 *\n",
      "fagerrec11             106.4   106.2 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3088740</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3088732</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     7</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.510e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:06:39</td>     <th>  Log-Likelihood:    </th> <td>-2.1399e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1401e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.099e-74</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0620</td> <td>    0.004</td> <td>   14.546</td> <td> 0.000</td> <td>    0.054     0.070</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0243</td> <td>    0.003</td> <td>   -7.284</td> <td> 0.000</td> <td>   -0.031    -0.018</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0137</td> <td>    0.011</td> <td>   -1.236</td> <td> 0.217</td> <td>   -0.035     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0143</td> <td>    0.005</td> <td>    3.143</td> <td> 0.002</td> <td>    0.005     0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0197</td> <td>    0.003</td> <td>   -6.622</td> <td> 0.000</td> <td>   -0.026    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0113</td> <td>    0.001</td> <td>  -16.639</td> <td> 0.000</td> <td>   -0.013    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th>        <td>   -0.0581</td> <td>    0.013</td> <td>   -4.637</td> <td> 0.000</td> <td>   -0.083    -0.034</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fagerrec11</th>       <td>   -0.0017</td> <td>    0.001</td> <td>   -2.082</td> <td> 0.037</td> <td>   -0.003    -0.000</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3088740\n",
       "Model:                          Logit   Df Residuals:                  3088732\n",
       "Method:                           MLE   Df Model:                            7\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               8.510e-05\n",
       "Time:                        15:06:39   Log-Likelihood:            -2.1399e+06\n",
       "converged:                       True   LL-Null:                   -2.1401e+06\n",
       "                                        LLR p-value:                 1.099e-74\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0620      0.004     14.546      0.000         0.054     0.070\n",
       "C(fbrace)[T.2.0]    -0.0243      0.003     -7.284      0.000        -0.031    -0.018\n",
       "C(fbrace)[T.3.0]    -0.0137      0.011     -1.236      0.217        -0.035     0.008\n",
       "C(fbrace)[T.4.0]     0.0143      0.005      3.143      0.002         0.005     0.023\n",
       "fhisp               -0.0197      0.003     -6.622      0.000        -0.026    -0.014\n",
       "previs              -0.0113      0.001    -16.639      0.000        -0.013    -0.010\n",
       "no_previs           -0.0581      0.013     -4.637      0.000        -0.083    -0.034\n",
       "fagerrec11          -0.0017      0.001     -2.082      0.037        -0.003    -0.000\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + fagerrec11')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
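  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the significance of the two age effects concrete: the p-value reported for each coefficient is the two-sided tail probability of its z statistic under a standard normal distribution, where $\\Phi$ is the standard normal CDF:\n",
    "\n",
    "$$p = 2\\,\\bigl(1 - \\Phi(|z|)\\bigr)$$\n",
    "\n",
    "With $z = -1.573$ for `mager9` this gives $p \\approx 0.116$, and with $z = -2.082$ for `fagerrec11` it gives $p \\approx 0.037$.  With about 3 million observations, even an effect this small can fall below the conventional 0.05 threshold."
   ]
  },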
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What's up with prenatal visits?\n",
    "\n",
    "The predictive power of prenatal visits is still surprising to me.  To make sure we've controlled for race, I'll select cases where both parents are white:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2381977"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "white = df[(df.mbrace==1) & (df.fbrace==1)]\n",
    "len(white)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And compute sex ratios for each level of `previs`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>previs</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>-6</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-5</th>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-4</th>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-3</th>\n",
       "      <td>109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-2</th>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-1</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "previs     \n",
       "-6      106\n",
       "-5      110\n",
       "-4      108\n",
       "-3      109\n",
       "-2      108\n",
       "-1      107\n",
       " 0      105\n",
       " 1      103\n",
       " 2      102\n",
       " 3      100\n",
       " 4      103"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'previs'\n",
    "white[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
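  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect holds up.  Among white parents, people with fewer than average prenatal visits are substantially more likely to have boys: the sex ratio is 106 to 110 for the below-average levels of `previs`, compared with 100 to 103 for the above-average levels."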
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692804\n",
      "         Iterations 3\n",
      "previs                 105.1   103.8 *\n",
      "no_previs              105.1   98.9 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2320227</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2320224</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>6.584e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:06:43</td>     <th>  Log-Likelihood:    </th> <td>-1.6075e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6076e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.073e-46</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0493</td> <td>    0.001</td> <td>   37.359</td> <td> 0.000</td> <td>    0.047     0.052</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>    <td>   -0.0116</td> <td>    0.001</td> <td>  -14.535</td> <td> 0.000</td> <td>   -0.013    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th> <td>   -0.0608</td> <td>    0.015</td> <td>   -3.966</td> <td> 0.000</td> <td>   -0.091    -0.031</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2320227\n",
       "Model:                          Logit   Df Residuals:                  2320224\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               6.584e-05\n",
       "Time:                        15:06:43   Log-Likelihood:            -1.6075e+06\n",
       "converged:                       True   LL-Null:                   -1.6076e+06\n",
       "                                        LLR p-value:                 1.073e-46\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0493      0.001     37.359      0.000         0.047     0.052\n",
       "previs        -0.0116      0.001    -14.535      0.000        -0.013    -0.010\n",
       "no_previs     -0.0608      0.015     -3.966      0.000        -0.091    -0.031\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Logistic regression of boy on previs and no_previs\n",
    "formula = 'boy ~ previs + no_previs'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.04929183382635937, -0.011584489975776435)"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Fitted intercept and previs slope, both on the log-odds scale\n",
    "inter = results.params['Intercept']\n",
    "slope = results.params['previs']\n",
    "inter, slope"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 111.31727637,  110.03516315,  108.76781686,  107.51506742,\n",
       "        106.2767467 ,  105.05268853,  103.84272863,  102.64670462,\n",
       "        101.46445599,  100.29582409])"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Predicted sex ratio (boys per 100 girls) for previs from -5 to 4, with no_previs = 0\n",
    "previs = np.arange(-5, 5)\n",
    "logodds = inter + slope * previs\n",
    "odds = np.exp(logodds)\n",
    "odds * 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692845\n",
      "         Iterations 3\n",
      "dmar                   105.2   105.1  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2381977</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2381975</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.675e-08</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:06:46</td>     <th>  Log-Likelihood:    </th> <td>-1.6503e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6503e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.7276</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0505</td> <td>    0.004</td> <td>   12.847</td> <td> 0.000</td> <td>    0.043     0.058</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>dmar</th>      <td>   -0.0010</td> <td>    0.003</td> <td>   -0.348</td> <td> 0.728</td> <td>   -0.007     0.005</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2381977\n",
       "Model:                          Logit   Df Residuals:                  2381975\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               3.675e-08\n",
       "Time:                        15:06:46   Log-Likelihood:            -1.6503e+06\n",
       "converged:                       True   LL-Null:                   -1.6503e+06\n",
       "                                        LLR p-value:                    0.7276\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0505      0.004     12.847      0.000         0.043     0.058\n",
       "dmar          -0.0010      0.003     -0.348      0.728        -0.007     0.005\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = 'boy ~ dmar'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692830\n",
      "         Iterations 3\n",
      "lowed                  105.3   103.9 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2089901</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2089899</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.146e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:06:48</td>     <th>  Log-Likelihood:    </th> <td>-1.4479e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.4480e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.0005303</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0520</td> <td>    0.001</td> <td>   35.035</td> <td> 0.000</td> <td>    0.049     0.055</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>lowed</th>     <td>   -0.0142</td> <td>    0.004</td> <td>   -3.465</td> <td> 0.001</td> <td>   -0.022    -0.006</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2089901\n",
       "Model:                          Logit   Df Residuals:                  2089899\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               4.146e-06\n",
       "Time:                        15:06:48   Log-Likelihood:            -1.4479e+06\n",
       "converged:                       True   LL-Null:                   -1.4480e+06\n",
       "                                        LLR p-value:                 0.0005303\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0520      0.001     35.035      0.000         0.049     0.055\n",
       "lowed         -0.0142      0.004     -3.465      0.001        -0.022    -0.006\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = 'boy ~ lowed'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692845\n",
      "         Iterations 3\n",
      "highbo                 105.1   104.1  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2373894</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2373892</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>6.498e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:06:50</td>     <th>  Log-Likelihood:    </th> <td>-1.6447e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6447e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.1437</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0496</td> <td>    0.001</td> <td>   37.359</td> <td> 0.000</td> <td>    0.047     0.052</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>highbo</th>    <td>   -0.0095</td> <td>    0.006</td> <td>   -1.462</td> <td> 0.144</td> <td>   -0.022     0.003</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2373894\n",
       "Model:                          Logit   Df Residuals:                  2373892\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               6.498e-07\n",
       "Time:                        15:06:50   Log-Likelihood:            -1.6447e+06\n",
       "converged:                       True   LL-Null:                   -1.6447e+06\n",
       "                                        LLR p-value:                    0.1437\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0496      0.001     37.359      0.000         0.047     0.052\n",
       "highbo        -0.0095      0.006     -1.462      0.144        -0.022     0.003\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = 'boy ~ highbo'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692836\n",
      "         Iterations 3\n",
      "wic[T.Y]               105.3   104.8  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2059437</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2059435</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.267e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:07:06</td>     <th>  Log-Likelihood:    </th> <td>-1.4269e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.4269e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.05720</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0519</td> <td>    0.002</td> <td>   29.448</td> <td> 0.000</td> <td>    0.048     0.055</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>wic[T.Y]</th>  <td>   -0.0055</td> <td>    0.003</td> <td>   -1.902</td> <td> 0.057</td> <td>   -0.011     0.000</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2059437\n",
       "Model:                          Logit   Df Residuals:                  2059435\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               1.267e-06\n",
       "Time:                        15:07:06   Log-Likelihood:            -1.4269e+06\n",
       "converged:                       True   LL-Null:                   -1.4269e+06\n",
       "                                        LLR p-value:                   0.05720\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0519      0.002     29.448      0.000         0.048     0.055\n",
       "wic[T.Y]      -0.0055      0.003     -1.902      0.057        -0.011     0.000\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
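   ],
   "source": [
    "# wic holds string codes, so patsy dummy-codes it; wic[T.Y] compares 'Y' with the reference level\n",
    "formula = 'boy ~ wic'\n",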
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692834\n",
      "         Iterations 3\n",
      "obese                  105.2   104.8  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2029161</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2029159</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.153e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:07:08</td>     <th>  Log-Likelihood:    </th> <td>-1.4059e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.4059e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.2798</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0509</td> <td>    0.002</td> <td>   31.979</td> <td> 0.000</td> <td>    0.048     0.054</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>obese</th>     <td>   -0.0037</td> <td>    0.003</td> <td>   -1.081</td> <td> 0.280</td> <td>   -0.010     0.003</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2029161\n",
       "Model:                          Logit   Df Residuals:                  2029159\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               4.153e-07\n",
       "Time:                        15:07:08   Log-Likelihood:            -1.4059e+06\n",
       "converged:                       True   LL-Null:                   -1.4059e+06\n",
       "                                        LLR p-value:                    0.2798\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0509      0.002     31.979      0.000         0.048     0.054\n",
       "obese         -0.0037      0.003     -1.081      0.280        -0.010     0.003\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = 'boy ~ obese'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 109,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692834\n",
      "         Iterations 3\n",
      "C(pay_rec)[T.2.0]      105.0   105.2  \n",
      "C(pay_rec)[T.3.0]      105.0   105.8  \n",
      "C(pay_rec)[T.4.0]      105.0   104.8  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2077652</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2077648</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.425e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:07:23</td>     <th>  Log-Likelihood:    </th> <td>-1.4395e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.4395e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.6681</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>             <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>    0.0486</td> <td>    0.002</td> <td>   20.446</td> <td> 0.000</td> <td>    0.044     0.053</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.2.0]</th> <td>    0.0021</td> <td>    0.003</td> <td>    0.684</td> <td> 0.494</td> <td>   -0.004     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.3.0]</th> <td>    0.0076</td> <td>    0.007</td> <td>    1.036</td> <td> 0.300</td> <td>   -0.007     0.022</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.4.0]</th> <td>   -0.0020</td> <td>    0.007</td> <td>   -0.296</td> <td> 0.767</td> <td>   -0.015     0.011</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2077652\n",
       "Model:                          Logit   Df Residuals:                  2077648\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               5.425e-07\n",
       "Time:                        15:07:23   Log-Likelihood:            -1.4395e+06\n",
       "converged:                       True   LL-Null:                   -1.4395e+06\n",
       "                                        LLR p-value:                    0.6681\n",
       "=====================================================================================\n",
       "                        coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------\n",
       "Intercept             0.0486      0.002     20.446      0.000         0.044     0.053\n",
       "C(pay_rec)[T.2.0]     0.0021      0.003      0.684      0.494        -0.004     0.008\n",
       "C(pay_rec)[T.3.0]     0.0076      0.007      1.036      0.300        -0.007     0.022\n",
       "C(pay_rec)[T.4.0]    -0.0020      0.007     -0.296      0.767        -0.015     0.011\n",
       "=====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 109,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C() tells patsy to treat the numeric pay_rec codes as categories rather than a linear term\n",
    "formula = 'boy ~ C(pay_rec)'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692845\n",
      "         Iterations 3\n",
      "mager9                 105.8   105.6  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2381977</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2381975</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>6.201e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:07:27</td>     <th>  Log-Likelihood:    </th> <td>-1.6503e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6503e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.1525</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0559</td> <td>    0.005</td> <td>   11.397</td> <td> 0.000</td> <td>    0.046     0.066</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mager9</th>    <td>   -0.0016</td> <td>    0.001</td> <td>   -1.431</td> <td> 0.153</td> <td>   -0.004     0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2381977\n",
       "Model:                          Logit   Df Residuals:                  2381975\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               6.201e-07\n",
       "Time:                        15:07:27   Log-Likelihood:            -1.6503e+06\n",
       "converged:                       True   LL-Null:                   -1.6503e+06\n",
       "                                        LLR p-value:                    0.1525\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0559      0.005     11.397      0.000         0.046     0.066\n",
       "mager9        -0.0016      0.001     -1.431      0.153        -0.004     0.001\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Logit model of boy on mager9 (mother's age, recoded into 9 categories)\n",
    "formula = 'boy ~ mager9'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692844\n",
      "         Iterations 3\n",
      "youngm[T.True]         105.0   106.0  \n",
      "oldm[T.True]           105.0   104.9  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2381977</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2381974</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>9.503e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:07:30</td>     <th>  Log-Likelihood:    </th> <td>-1.6503e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6503e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.2084</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "         <td></td>           <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>      <td>    0.0486</td> <td>    0.001</td> <td>   35.884</td> <td> 0.000</td> <td>    0.046     0.051</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>youngm[T.True]</th> <td>    0.0101</td> <td>    0.006</td> <td>    1.766</td> <td> 0.077</td> <td>   -0.001     0.021</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>oldm[T.True]</th>   <td>   -0.0004</td> <td>    0.008</td> <td>   -0.055</td> <td> 0.956</td> <td>   -0.015     0.014</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2381977\n",
       "Model:                          Logit   Df Residuals:                  2381974\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               9.503e-07\n",
       "Time:                        15:07:30   Log-Likelihood:            -1.6503e+06\n",
       "converged:                       True   LL-Null:                   -1.6503e+06\n",
       "                                        LLR p-value:                    0.2084\n",
       "==================================================================================\n",
       "                     coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "----------------------------------------------------------------------------------\n",
       "Intercept          0.0486      0.001     35.884      0.000         0.046     0.051\n",
       "youngm[T.True]     0.0101      0.006      1.766      0.077        -0.001     0.021\n",
       "oldm[T.True]      -0.0004      0.008     -0.055      0.956        -0.015     0.014\n",
       "==================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Logit model of boy on the youngm and oldm indicators\n",
    "formula = 'boy ~ youngm + oldm'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692843\n",
      "         Iterations 3\n",
      "youngf                 105.1   105.6  \n",
      "oldf                   105.1   104.0  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2376438</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2376435</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Wed, 18 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>7.327e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:07:34</td>     <th>  Log-Likelihood:    </th> <td>-1.6465e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6465e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.2993</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0495</td> <td>    0.001</td> <td>   37.030</td> <td> 0.000</td> <td>    0.047     0.052</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>youngf</th>    <td>    0.0053</td> <td>    0.008</td> <td>    0.652</td> <td> 0.514</td> <td>   -0.011     0.021</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>oldf</th>      <td>   -0.0107</td> <td>    0.008</td> <td>   -1.390</td> <td> 0.164</td> <td>   -0.026     0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2376438\n",
       "Model:                          Logit   Df Residuals:                  2376435\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Wed, 18 May 2016   Pseudo R-squ.:               7.327e-07\n",
       "Time:                        15:07:34   Log-Likelihood:            -1.6465e+06\n",
       "converged:                       True   LL-Null:                   -1.6465e+06\n",
       "                                        LLR p-value:                    0.2993\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0495      0.001     37.030      0.000         0.047     0.052\n",
       "youngf         0.0053      0.008      0.652      0.514        -0.011     0.021\n",
       "oldf          -0.0107      0.008     -1.390      0.164        -0.026     0.004\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Same model using youngf and oldf in place of youngm and oldm\n",
    "formula = 'boy ~ youngf + oldf'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}