{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Does Trivers-Willard apply to people?\n",
    "\n",
    "This notebook contains a \"one-day paper\", my attempt to pose a research question, answer it, and publish the results in one work day.\n",
    "\n",
    "Copyright 2016 Allen B. Downey\n",
    "\n",
    "MIT License: https://opensource.org/licenses/MIT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function, division\n",
    "\n",
    "import thinkstats2\n",
    "import thinkplot\n",
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "import statsmodels.formula.api as smf\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Trivers-Willard\n",
    "\n",
    "[According to Wikipedia](https://en.wikipedia.org/wiki/Trivers%E2%80%93Willard_hypothesis), the Trivers-Willard hypothesis:\n",
    "\n",
    ">\"...suggests that female mammals are able to adjust offspring sex ratio in response to their maternal condition. For example, it may predict greater parental investment in males by parents in 'good conditions' and greater investment in females by parents in 'poor conditions' (relative to parents in good condition).\"\n",
    "\n",
    "For humans, the hypothesis suggests that people with relatively high social status might be more likely to have boys.  Some studies have shown evidence for this hypothesis, but based on my very casual survey, it is not persuasive.\n",
    "\n",
    "To test whether the T-W hypothesis holds up in humans, I downloaded [birth data for the nearly 4 million babies born in the U.S. in 2014](http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm#Births).\n",
    "\n",
    "I selected variables that seemed likely to be related to social status and used logistic regression to identify variables associated with sex ratio."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Summary of results**\n",
    "\n",
    "1.  Running regression with one variable at a time, many of the variables have a statistically significant effect on sex ratio, with the sign of the effect generally in the direction predicted by T-W.\n",
    "\n",
    "2.  However, many of the variables are also correlated with race.  If we control for either the mother's race or the father's race, or both, most other variables have no additional predictive power.\n",
    "\n",
    "3.  Contrary to other reports, the age of the parents seems to have no predictive power.\n",
    "\n",
    "4.  Strangely, the variable that shows the strongest and most consistent relationship with sex ratio is the number of prenatal visits.  Although it seems obvious that prenatal visits are a proxy for quality of health care and general socioeconomic status, the sign of the effect is opposite what T-W predicts; that is, more prenatal visits is a strong predictor of lower sex ratio (more girls).\n",
    "\n",
    "Following convention, I report sex ratio in terms of boys per 100 girls.  The overall sex ratio at birth is about 105; that is, 105 boys are born for every 100 girls."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data cleaning\n",
    "\n",
    "Here's how I loaded the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "names = ['year', 'mager9', 'mnativ', 'restatus', 'mbrace', 'mhisp_r',\n",
    "        'mar_p', 'dmar', 'meduc', 'fagerrec11', 'fbrace', 'fhisp_r', 'feduc', \n",
    "        'lbo_rec', 'previs_rec', 'wic', 'height', 'bmi_r', 'pay_rec', 'sex']\n",
    "colspecs = [(9, 12),\n",
    "            (79, 79),\n",
    "            (84, 84),\n",
    "            (104, 104),\n",
    "            (110, 110),\n",
    "            (115, 115),\n",
    "            (119, 119),\n",
    "            (120, 120),\n",
    "            (124, 124),\n",
    "            (149, 150),\n",
    "            (156, 156),\n",
    "            (160, 160),\n",
    "            (163, 163),\n",
    "            (179, 179),\n",
    "            (242, 243),\n",
    "            (251, 251),\n",
    "            (280, 281),\n",
    "            (287, 287),\n",
    "            (436, 436),\n",
    "            (475, 475),\n",
    "           ]\n",
    "\n",
    "colspecs = [(start-1, end) for start, end in colspecs]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df = None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "filename = 'Nat2014PublicUS.c20150514.r20151022.txt.gz'\n",
    "#df = pd.read_fwf(filename, compression='gzip', header=None, names=names, colspecs=colspecs)\n",
    "#df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# store the dataframe for faster loading\n",
    "\n",
    "#store = pd.HDFStore('store.h5')\n",
    "#store['births2014'] = df\n",
    "#store.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# load the dataframe\n",
    "\n",
    "store = pd.HDFStore('store.h5')\n",
    "df = store['births2014']\n",
    "store.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def series_to_ratio(series):\n",
    "    \"\"\"Takes a boolean series and computes sex ratio.\n",
    "    \"\"\"\n",
    "    boys = np.mean(series)\n",
    "    return np.round(100 * boys / (1-boys)).astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I have to recode sex as `0` or `1` to make `logit` happy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1952273\n",
       "1    2045902\n",
       "Name: boy, dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['boy'] = (df.sex=='M').astype(int)\n",
    "df.boy.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All births are from 2014."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2014    3998175\n",
       "Name: year, dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.year.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's age:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1       2777\n",
       "2     249581\n",
       "3     884246\n",
       "4    1148469\n",
       "5    1084064\n",
       "6     510214\n",
       "7     110318\n",
       "8       7750\n",
       "9        756\n",
       "Name: mager9, dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mager9.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mager9</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "mager9     \n",
       "1       109\n",
       "2       105\n",
       "3       105\n",
       "4       105\n",
       "5       105\n",
       "6       105\n",
       "7       104\n",
       "8       104\n",
       "9       102"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mager9'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mager9.isnull().mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.06311829772333627, 0.029719559549044251)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['youngm'] = df.mager9<=2\n",
    "df['oldm'] = df.mager9>=7\n",
    "df.youngm.mean(), df.oldm.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's nativity (1 = born in the U.S.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    3106689\n",
       "2     881662\n",
       "Name: mnativ, dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mnativ.replace([3], np.nan, inplace=True)\n",
    "df.mnativ.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mnativ</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "mnativ     \n",
       "1       105\n",
       "2       105"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mnativ'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Residence status (1=resident)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    2873404\n",
       "2    1025766\n",
       "3      88906\n",
       "4      10099\n",
       "Name: restatus, dtype: int64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.restatus.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>restatus</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          boy\n",
       "restatus     \n",
       "1         105\n",
       "2         105\n",
       "3         106\n",
       "4         106"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'restatus'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's race (1=White, 2=Black, 3=American Indian or Alaskan Native, 4=Asian or Pacific Islander)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    3029013\n",
       "2     641089\n",
       "3      44962\n",
       "4     283111\n",
       "Name: mbrace, dtype: int64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mbrace.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mbrace</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "mbrace     \n",
       "1       105\n",
       "2       103\n",
       "3       103\n",
       "4       106"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mbrace'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's Hispanic origin (0=Non-Hispanic)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    3045419\n",
       "1     553738\n",
       "2      69894\n",
       "3      20165\n",
       "4     136785\n",
       "5     141497\n",
       "Name: mhisp_r, dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mhisp_r.replace([9], np.nan, inplace=True)\n",
    "df.mhisp_r.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def copy_null(df, oldvar, newvar):\n",
    "    df.loc[df[oldvar].isnull(), newvar] = np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.0076727506925034546, 0.23240818268843488)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['mhisp'] = df.mhisp_r > 0\n",
    "copy_null(df, 'mhisp_r', 'mhisp')\n",
    "df.mhisp.isnull().mean(), df.mhisp.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mhisp</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "mhisp     \n",
       "0      105\n",
       "1      104"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mhisp'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Marital status (1=Married)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    2390630\n",
       "2    1607545\n",
       "Name: dmar, dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dmar.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dmar</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      boy\n",
       "dmar     \n",
       "1     105\n",
       "2     104"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'dmar'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Paternity acknowledged, if unmarried (Y=yes, N=no, X=not applicable, U=unknown).\n",
    "\n",
    "I recode X (not applicable because married) as Y (paternity acknowledged)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "N     462627\n",
       "Y    3386542\n",
       "Name: mar_p, dtype: int64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mar_p.replace(['U'], np.nan, inplace=True)\n",
    "df.mar_p.replace(['X'], 'Y', inplace=True)\n",
    "df.mar_p.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mar_p</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>N</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Y</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "mar_p     \n",
       "N      103\n",
       "Y      105"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mar_p'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's education level"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    138589\n",
       "2    437081\n",
       "3    957265\n",
       "4    815688\n",
       "5    308384\n",
       "6    732661\n",
       "7    326800\n",
       "8     94057\n",
       "Name: meduc, dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.meduc.replace([9], np.nan, inplace=True)\n",
    "df.meduc.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>meduc</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "meduc     \n",
       "1      104\n",
       "2      104\n",
       "3      105\n",
       "4      105\n",
       "5      105\n",
       "6      105\n",
       "7      105\n",
       "8      104"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'meduc'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.046933913598079122, 0.15107367095085322)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['lowed'] = df.meduc <= 2\n",
    "copy_null(df, 'meduc', 'lowed')\n",
    "df.lowed.isnull().mean(), df.lowed.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's age, in 10 ranges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1         277\n",
       "2       84852\n",
       "3      498779\n",
       "4      869280\n",
       "5     1025631\n",
       "6      631685\n",
       "7      262169\n",
       "8       87432\n",
       "9       28465\n",
       "10      12490\n",
       "Name: fagerrec11, dtype: int64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fagerrec11.replace([11], np.nan, inplace=True)\n",
    "df.fagerrec11.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fagerrec11</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>109</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            boy\n",
       "fagerrec11     \n",
       "1           102\n",
       "2           106\n",
       "3           106\n",
       "4           105\n",
       "5           105\n",
       "6           105\n",
       "7           105\n",
       "8           105\n",
       "9           104\n",
       "10          109"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'fagerrec11'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.12433547806186572, 0.024315207394332003)"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['youngf'] = df.fagerrec11<=2\n",
    "copy_null(df, 'fagerrec11', 'youngf')\n",
    "df.youngf.isnull().mean(), df.youngf.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.12433547806186572, 0.036670893957829916)"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['oldf'] = df.fagerrec11>=8\n",
    "copy_null(df, 'fagerrec11', 'oldf')\n",
    "df.oldf.isnull().mean(), df.oldf.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's race"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    2497901\n",
       "2     482433\n",
       "3      35408\n",
       "4     238394\n",
       "Name: fbrace, dtype: int64"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fbrace.replace([9], np.nan, inplace=True)\n",
    "df.fbrace.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fbrace</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "fbrace     \n",
       "1       105\n",
       "2       103\n",
       "3       103\n",
       "4       107"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'fbrace'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's Hispanic origin (0=non-hispanic, other values indicate country of origin)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2649007\n",
       "1     493497\n",
       "2      59137\n",
       "3      19128\n",
       "4     108111\n",
       "5     124172\n",
       "Name: fhisp_r, dtype: int64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fhisp_r.replace([9], np.nan, inplace=True)\n",
    "df.fhisp_r.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.13634295647389122, 0.23285053338322156)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['fhisp'] = df.fhisp_r > 0\n",
    "copy_null(df, 'fhisp_r', 'fhisp')\n",
    "df.fhisp.isnull().mean(), df.fhisp.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>fhisp</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "fhisp     \n",
       "0      105\n",
       "1      104"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'fhisp'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's education level"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    141654\n",
       "2    342061\n",
       "3    951980\n",
       "4    643118\n",
       "5    232622\n",
       "6    616187\n",
       "7    242022\n",
       "8    109482\n",
       "Name: feduc, dtype: int64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.feduc.replace([9], np.nan, inplace=True)\n",
    "df.feduc.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>feduc</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "feduc     \n",
       "1      104\n",
       "2      105\n",
       "3      105\n",
       "4      105\n",
       "5      106\n",
       "6      105\n",
       "7      105\n",
       "8      105"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'feduc'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Live birth order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    1555006\n",
       "2    1270496\n",
       "3     669016\n",
       "4     284435\n",
       "5     110708\n",
       "6      46093\n",
       "7      20786\n",
       "8      21610\n",
       "Name: lbo_rec, dtype: int64"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.lbo_rec.replace([9], np.nan, inplace=True)\n",
    "df.lbo_rec.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>lbo_rec</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         boy\n",
       "lbo_rec     \n",
       "1        105\n",
       "2        105\n",
       "3        105\n",
       "4        105\n",
       "5        104\n",
       "6        104\n",
       "7        104\n",
       "8        102"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'lbo_rec'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.0050085351441595226, 0.050072772519889897)"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['highbo'] = df.lbo_rec >= 5\n",
    "copy_null(df, 'lbo_rec', 'highbo')\n",
    "df.highbo.isnull().mean(), df.highbo.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Number of prenatal visits, in 11 ranges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1      59670\n",
       "2      44923\n",
       "3      98141\n",
       "4     201032\n",
       "5     366887\n",
       "6     826908\n",
       "7     998330\n",
       "8     684997\n",
       "9     379305\n",
       "10     99067\n",
       "11    128805\n",
       "Name: previs_rec, dtype: int64"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.previs_rec.replace([12], np.nan, inplace=True)\n",
    "df.previs_rec.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "df.previs_rec.mean()\n",
    "# Shift so that 0 corresponds to category 7, the most common range\n",
    "df['previs'] = df.previs_rec - 7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>previs</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>-6</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-5</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-4</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-3</th>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-2</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-1</th>\n",
       "      <td>106</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "previs     \n",
       "-6      105\n",
       "-5      107\n",
       "-4      107\n",
       "-3      108\n",
       "-2      107\n",
       "-1      106\n",
       " 0      105\n",
       " 1      103\n",
       " 2      102\n",
       " 3      102\n",
       " 4      102"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'previs'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.027540065154726845, 0.015346965650008423)"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['no_previs'] = df.previs_rec <= 1\n",
    "copy_null(df, 'previs_rec', 'no_previs')\n",
    "df.no_previs.isnull().mean(), df.no_previs.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whether the mother received WIC (Women, Infants, and Children) food benefits during the pregnancy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "N    2124143\n",
       "Y    1634978\n",
       "Name: wic, dtype: int64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.wic.replace(['U'], np.nan, inplace=True)\n",
    "df.wic.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>wic</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>N</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Y</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     boy\n",
       "wic     \n",
       "N    105\n",
       "Y    104"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'wic'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's height in inches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "30        28\n",
       "31         1\n",
       "34         2\n",
       "36        14\n",
       "37         7\n",
       "38         7\n",
       "39         7\n",
       "40         6\n",
       "41        10\n",
       "42        13\n",
       "43         3\n",
       "44         8\n",
       "45        11\n",
       "46        14\n",
       "47        22\n",
       "48       857\n",
       "49       544\n",
       "50       357\n",
       "51       422\n",
       "52       493\n",
       "53      1503\n",
       "54      1414\n",
       "55      2762\n",
       "56      6678\n",
       "57     18359\n",
       "58     21019\n",
       "59     81588\n",
       "60    209490\n",
       "61    269142\n",
       "62    474306\n",
       "63    485840\n",
       "64    559249\n",
       "65    453503\n",
       "66    429253\n",
       "67    334485\n",
       "68    189690\n",
       "69    127789\n",
       "70     62364\n",
       "71     33428\n",
       "72     15323\n",
       "73      5200\n",
       "74      2538\n",
       "75      1019\n",
       "76       590\n",
       "77       593\n",
       "78       941\n",
       "Name: height, dtype: int64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.height.replace([99], np.nan, inplace=True)\n",
    "df.height.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.051844404009329256, 0.0359147662344377)"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['mshort'] = df.height<60\n",
    "copy_null(df, 'height', 'mshort')\n",
    "df.mshort.isnull().mean(), df.mshort.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.051844404009329256, 0.03218134412692316)"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['mtall'] = df.height>=70\n",
    "copy_null(df, 'height', 'mtall')\n",
    "df.mtall.isnull().mean(), df.mtall.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mshort</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "mshort     \n",
       "0       105\n",
       "1       104"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mshort'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mtall</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "mtall     \n",
       "0      105\n",
       "1      104"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'mtall'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's BMI in 6 ranges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1     140142\n",
       "2    1702519\n",
       "3     949075\n",
       "4     506017\n",
       "5     242957\n",
       "6     168515\n",
       "Name: bmi_r, dtype: int64"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.bmi_r.replace([9], np.nan, inplace=True)\n",
    "df.bmi_r.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bmi_r</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       boy\n",
       "bmi_r     \n",
       "1      105\n",
       "2      105\n",
       "3      105\n",
       "4      104\n",
       "5      104\n",
       "6      104"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'bmi_r'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.07227047340349034, 0.2473532880857861)"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['obese'] = df.bmi_r >= 4\n",
    "copy_null(df, 'bmi_r', 'obese')\n",
    "df.obese.isnull().mean(), df.obese.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Payment method (1=Medicaid, 2=Private insurance, 3=Self pay, 4=Other)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    1665161\n",
       "2    1824151\n",
       "3     162650\n",
       "4     167806\n",
       "Name: pay_rec, dtype: int64"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pay_rec.replace([9], np.nan, inplace=True)\n",
    "df.pay_rec.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pay_rec</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>104</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         boy\n",
       "pay_rec     \n",
       "1        104\n",
       "2        105\n",
       "3        107\n",
       "4        105"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'pay_rec'\n",
    "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sex of baby"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "F    1952273\n",
       "M    2045902\n",
       "Name: sex, dtype: int64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sex.value_counts().sort_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Regression models\n",
    "\n",
    "Here are some functions I'll use to interpret the results of logistic regression:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def logodds_to_ratio(logodds):\n",
    "    \"\"\"Convert from log odds to a ratio of boys per 100 girls.\"\"\"\n",
    "    odds = np.exp(logodds)\n",
    "    return 100 * odds\n",
    "\n",
    "def summarize(results):\n",
    "    \"\"\"Summarize parameters in terms of birth ratio.\"\"\"\n",
    "    inter_or = results.params['Intercept']\n",
    "    inter_rat = logodds_to_ratio(inter_or)\n",
    "    \n",
    "    for value, lor in results.params.items():\n",
    "        if value=='Intercept':\n",
    "            continue\n",
    "        \n",
    "        rat = logodds_to_ratio(inter_or + lor)\n",
    "        code = '*' if results.pvalues[value] < 0.05 else ' '\n",
    "        \n",
    "        print('%-20s   %0.1f   %0.1f' % (value, inter_rat, rat), code)"
   ]
  },
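  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of the arithmetic behind `logodds_to_ratio`: if $p$ is the probability that a baby is a boy, a logistic model estimates the log odds of a boy, and exponentiating the log odds gives the expected number of boys per girl. The symbols $p$, $\\beta_0$, $\\beta_1$, and $x$ are notation introduced here for illustration; they do not appear in the code.\n",
    "\n",
    "$$\\log\\frac{p}{1-p} = \\beta_0 + \\beta_1 x, \\qquad \\text{boys per 100 girls} = 100 \\times e^{\\beta_0 + \\beta_1 x}$$\n",
    "\n",
    "For example, an intercept of 0.0496 with $x = 0$ corresponds to $100 \\times e^{0.0496} \\approx 105.1$ boys per 100 girls. For each explanatory variable, `summarize` prints this ratio at the intercept alone and again with that variable's coefficient added, which is the ratio when the variable increases by one and any other variables stay at zero."
   ]
  },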
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now I'll run models with each variable, one at a time."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's age seems to have no predictive value:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692873\n",
      "         Iterations 3\n",
      "mager9                 105.1   105.0  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3998175</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3998173</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.129e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:18:28</td>     <th>  Log-Likelihood:    </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.4290</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0496</td> <td>    0.004</td> <td>   13.550</td> <td> 0.000</td> <td>    0.042     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mager9</th>    <td>   -0.0007</td> <td>    0.001</td> <td>   -0.791</td> <td> 0.429</td> <td>   -0.002     0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3998175\n",
       "Model:                          Logit   Df Residuals:                  3998173\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.129e-07\n",
       "Time:                        14:18:28   Log-Likelihood:            -2.7702e+06\n",
       "converged:                       True   LL-Null:                   -2.7702e+06\n",
       "                                        LLR p-value:                    0.4290\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0496      0.004     13.550      0.000         0.042     0.057\n",
       "mager9        -0.0007      0.001     -0.791      0.429        -0.002     0.001\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ mager9', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The estimated ratio for young mothers is higher, and the ratio for older mothers is lower, but neither difference is statistically significant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692873\n",
      "         Iterations 3\n",
      "youngm[T.True]         104.8   104.9  \n",
      "oldm[T.True]           104.8   103.9  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3998175</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3998172</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.813e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:18:33</td>     <th>  Log-Likelihood:    </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.3478</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "         <td></td>           <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>      <td>    0.0470</td> <td>    0.001</td> <td>   44.772</td> <td> 0.000</td> <td>    0.045     0.049</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>youngm[T.True]</th> <td>    0.0010</td> <td>    0.004</td> <td>    0.240</td> <td> 0.810</td> <td>   -0.007     0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>oldm[T.True]</th>   <td>   -0.0084</td> <td>    0.006</td> <td>   -1.421</td> <td> 0.155</td> <td>   -0.020     0.003</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3998175\n",
       "Model:                          Logit   Df Residuals:                  3998172\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               3.813e-07\n",
       "Time:                        14:18:33   Log-Likelihood:            -2.7702e+06\n",
       "converged:                       True   LL-Null:                   -2.7702e+06\n",
       "                                        LLR p-value:                    0.3478\n",
       "==================================================================================\n",
       "                     coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "----------------------------------------------------------------------------------\n",
       "Intercept          0.0470      0.001     44.772      0.000         0.045     0.049\n",
       "youngm[T.True]     0.0010      0.004      0.240      0.810        -0.007     0.009\n",
       "oldm[T.True]      -0.0084      0.006     -1.421      0.155        -0.020     0.003\n",
       "==================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ youngm + oldm', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Whether the mother was born in the U.S. has no apparent predictive value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692873\n",
      "         Iterations 3\n",
      "C(mnativ)[T.2.0]       104.8   104.9  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3988351</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3988349</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.566e-08</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:19:00</td>     <th>  Log-Likelihood:    </th> <td>-2.7634e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7634e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.6154</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0466</td> <td>    0.001</td> <td>   41.050</td> <td> 0.000</td> <td>    0.044     0.049</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mnativ)[T.2.0]</th> <td>    0.0012</td> <td>    0.002</td> <td>    0.502</td> <td> 0.615</td> <td>   -0.004     0.006</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3988351\n",
       "Model:                          Logit   Df Residuals:                  3988349\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               4.566e-08\n",
       "Time:                        14:19:00   Log-Likelihood:            -2.7634e+06\n",
       "converged:                       True   LL-Null:                   -2.7634e+06\n",
       "                                        LLR p-value:                    0.6154\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0466      0.001     41.050      0.000         0.044     0.049\n",
       "C(mnativ)[T.2.0]     0.0012      0.002      0.502      0.615        -0.004     0.006\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(mnativ)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Neither does residence status."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692872\n",
      "         Iterations 3\n",
      "C(restatus)[T.2]       104.8   104.7  \n",
      "C(restatus)[T.3]       104.8   106.0  \n",
      "C(restatus)[T.4]       104.8   106.2  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3998175</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3998171</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>6.716e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:19:28</td>     <th>  Log-Likelihood:    </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.2932</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0468</td> <td>    0.001</td> <td>   39.653</td> <td> 0.000</td> <td>    0.044     0.049</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(restatus)[T.2]</th> <td>   -0.0010</td> <td>    0.002</td> <td>   -0.418</td> <td> 0.676</td> <td>   -0.005     0.004</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(restatus)[T.3]</th> <td>    0.0117</td> <td>    0.007</td> <td>    1.718</td> <td> 0.086</td> <td>   -0.002     0.025</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(restatus)[T.4]</th> <td>    0.0132</td> <td>    0.020</td> <td>    0.663</td> <td> 0.507</td> <td>   -0.026     0.052</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3998175\n",
       "Model:                          Logit   Df Residuals:                  3998171\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               6.716e-07\n",
       "Time:                        14:19:28   Log-Likelihood:            -2.7702e+06\n",
       "converged:                       True   LL-Null:                   -2.7702e+06\n",
       "                                        LLR p-value:                    0.2932\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0468      0.001     39.653      0.000         0.044     0.049\n",
       "C(restatus)[T.2]    -0.0010      0.002     -0.418      0.676        -0.005     0.004\n",
       "C(restatus)[T.3]     0.0117      0.007      1.718      0.086        -0.002     0.025\n",
       "C(restatus)[T.4]     0.0132      0.020      0.663      0.507        -0.026     0.052\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(restatus)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's race seems to have predictive value.  Relative to white mothers, black and Native American mothers have a lower sex ratio (fewer boys per 100 girls); Asian mothers have a higher one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692863\n",
      "         Iterations 3\n",
      "C(mbrace)[T.2]         105.1   102.9 *\n",
      "C(mbrace)[T.3]         105.1   103.1 *\n",
      "C(mbrace)[T.4]         105.1   106.3 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3998175</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3998171</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.401e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:19:55</td>     <th>  Log-Likelihood:    </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.007e-16</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "         <td></td>           <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>      <td>    0.0497</td> <td>    0.001</td> <td>   43.250</td> <td> 0.000</td> <td>    0.047     0.052</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.2]</th> <td>   -0.0214</td> <td>    0.003</td> <td>   -7.770</td> <td> 0.000</td> <td>   -0.027    -0.016</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.3]</th> <td>   -0.0195</td> <td>    0.010</td> <td>   -2.049</td> <td> 0.041</td> <td>   -0.038    -0.001</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.4]</th> <td>    0.0109</td> <td>    0.004</td> <td>    2.777</td> <td> 0.005</td> <td>    0.003     0.019</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3998175\n",
       "Model:                          Logit   Df Residuals:                  3998171\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.401e-05\n",
       "Time:                        14:19:55   Log-Likelihood:            -2.7702e+06\n",
       "converged:                       True   LL-Null:                   -2.7702e+06\n",
       "                                        LLR p-value:                 1.007e-16\n",
       "==================================================================================\n",
       "                     coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "----------------------------------------------------------------------------------\n",
       "Intercept          0.0497      0.001     43.250      0.000         0.047     0.052\n",
       "C(mbrace)[T.2]    -0.0214      0.003     -7.770      0.000        -0.027    -0.016\n",
       "C(mbrace)[T.3]    -0.0195      0.010     -2.049      0.041        -0.038    -0.001\n",
       "C(mbrace)[T.4]     0.0109      0.004      2.777      0.005         0.003     0.019\n",
       "==================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(mbrace)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hispanic mothers have a slightly lower sex ratio (relatively more girls)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692874\n",
      "         Iterations 3\n",
      "mhisp                  105.0   104.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3967498</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3967496</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.998e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:19:59</td>     <th>  Log-Likelihood:    </th> <td>-2.7490e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7490e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.0009174</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0485</td> <td>    0.001</td> <td>   42.263</td> <td> 0.000</td> <td>    0.046     0.051</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mhisp</th>     <td>   -0.0079</td> <td>    0.002</td> <td>   -3.315</td> <td> 0.001</td> <td>   -0.013    -0.003</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3967498\n",
       "Model:                          Logit   Df Residuals:                  3967496\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.998e-06\n",
       "Time:                        14:19:59   Log-Likelihood:            -2.7490e+06\n",
       "converged:                       True   LL-Null:                   -2.7490e+06\n",
       "                                        LLR p-value:                 0.0009174\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0485      0.001     42.263      0.000         0.046     0.051\n",
       "mhisp         -0.0079      0.002     -3.315      0.001        -0.013    -0.003\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ mhisp', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the mother is married, or is unmarried but paternity is acknowledged, the sex ratio is higher (more boys)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692864\n",
      "         Iterations 3\n",
      "C(mar_p)[T.Y]          102.8   105.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3849169</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3849167</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>9.129e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:20:27</td>     <th>  Log-Likelihood:    </th> <td>-2.6670e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6670e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>2.990e-12</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "        <td></td>           <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>     <td>    0.0278</td> <td>    0.003</td> <td>    9.446</td> <td> 0.000</td> <td>    0.022     0.034</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mar_p)[T.Y]</th> <td>    0.0219</td> <td>    0.003</td> <td>    6.978</td> <td> 0.000</td> <td>    0.016     0.028</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3849169\n",
       "Model:                          Logit   Df Residuals:                  3849167\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               9.129e-06\n",
       "Time:                        14:20:27   Log-Likelihood:            -2.6670e+06\n",
       "converged:                       True   LL-Null:                   -2.6670e+06\n",
       "                                        LLR p-value:                 2.990e-12\n",
       "=================================================================================\n",
       "                    coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "---------------------------------------------------------------------------------\n",
       "Intercept         0.0278      0.003      9.446      0.000         0.022     0.034\n",
       "C(mar_p)[T.Y]     0.0219      0.003      6.978      0.000         0.016     0.028\n",
       "=================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(mar_p)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Being unmarried predicts a slightly lower sex ratio (relatively more girls)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692871\n",
      "         Iterations 3\n",
      "C(dmar)[T.2]           105.1   104.3 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3998175</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3998173</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.001e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:20:54</td>     <th>  Log-Likelihood:    </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7702e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>4.555e-05</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "        <td></td>          <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>    <td>    0.0502</td> <td>    0.001</td> <td>   38.789</td> <td> 0.000</td> <td>    0.048     0.053</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(dmar)[T.2]</th> <td>   -0.0083</td> <td>    0.002</td> <td>   -4.077</td> <td> 0.000</td> <td>   -0.012    -0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3998175\n",
       "Model:                          Logit   Df Residuals:                  3998173\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               3.001e-06\n",
       "Time:                        14:20:54   Log-Likelihood:            -2.7702e+06\n",
       "converged:                       True   LL-Null:                   -2.7702e+06\n",
       "                                        LLR p-value:                 4.555e-05\n",
       "================================================================================\n",
       "                   coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "--------------------------------------------------------------------------------\n",
       "Intercept        0.0502      0.001     38.789      0.000         0.048     0.053\n",
       "C(dmar)[T.2]    -0.0083      0.002     -4.077      0.000        -0.012    -0.004\n",
       "================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(dmar)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each level of mother's education predicts a small increase in the probability of a boy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692874\n",
      "         Iterations 3\n",
      "meduc                  104.1   104.2 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3810525</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3810523</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.416e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:20:59</td>     <th>  Log-Likelihood:    </th> <td>-2.6402e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6402e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.006248</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0398</td> <td>    0.003</td> <td>   14.711</td> <td> 0.000</td> <td>    0.034     0.045</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>meduc</th>     <td>    0.0016</td> <td>    0.001</td> <td>    2.734</td> <td> 0.006</td> <td>    0.000     0.003</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3810525\n",
       "Model:                          Logit   Df Residuals:                  3810523\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.416e-06\n",
       "Time:                        14:20:59   Log-Likelihood:            -2.6402e+06\n",
       "converged:                       True   LL-Null:                   -2.6402e+06\n",
       "                                        LLR p-value:                  0.006248\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0398      0.003     14.711      0.000         0.034     0.045\n",
       "meduc          0.0016      0.001      2.734      0.006         0.000     0.003\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ meduc', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692874\n",
      "         Iterations 3\n",
      "lowed                  104.9   104.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3810525</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3810523</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.431e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:21:03</td>     <th>  Log-Likelihood:    </th> <td>-2.6402e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6402e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.005983</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0478</td> <td>    0.001</td> <td>   43.002</td> <td> 0.000</td> <td>    0.046     0.050</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>lowed</th>     <td>   -0.0079</td> <td>    0.003</td> <td>   -2.749</td> <td> 0.006</td> <td>   -0.013    -0.002</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3810525\n",
       "Model:                          Logit   Df Residuals:                  3810523\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.431e-06\n",
       "Time:                        14:21:03   Log-Likelihood:            -2.6402e+06\n",
       "converged:                       True   LL-Null:                   -2.6402e+06\n",
       "                                        LLR p-value:                  0.005983\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0478      0.001     43.002      0.000         0.046     0.050\n",
       "lowed         -0.0079      0.003     -2.749      0.006        -0.013    -0.002\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ lowed', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
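  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two numbers printed next to each variable appear to be sex ratios, in boys per 100 girls, implied by the fitted coefficients. This reading is inferred from the output rather than from the definition of `summarize`: exponentiating the log-odds reproduces the printed values. For the model above, with intercept $\\beta_0 = 0.0478$ and coefficient $\\beta_1 = -0.0079$ for `lowed`:\n",
    "\n",
    "$$100 \\, e^{\\beta_0} = 100 \\, e^{0.0478} \\approx 104.9$$\n",
    "\n",
    "$$100 \\, e^{\\beta_0 + \\beta_1} = 100 \\, e^{0.0399} \\approx 104.1$$\n",
    "\n",
    "So the first number is the ratio for the reference group and the second is the ratio when the variable increases by one unit. The asterisk seems to mark coefficients with a p-value below 0.05."
   ]
  },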
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Older fathers are slightly more likely to have girls, but the effect is tiny and only marginally significant (p = 0.046), so it could be due to chance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692840\n",
      "         Iterations 3\n",
      "fagerrec11             105.9   105.7 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3501060</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3501058</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.226e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:21:08</td>     <th>  Log-Likelihood:    </th> <td>-2.4257e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.4257e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.04575</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "       <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>  <td>    0.0570</td> <td>    0.004</td> <td>   14.707</td> <td> 0.000</td> <td>    0.049     0.065</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fagerrec11</th> <td>   -0.0015</td> <td>    0.001</td> <td>   -1.998</td> <td> 0.046</td> <td>   -0.003  -2.9e-05</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3501060\n",
       "Model:                          Logit   Df Residuals:                  3501058\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               8.226e-07\n",
       "Time:                        14:21:08   Log-Likelihood:            -2.4257e+06\n",
       "converged:                       True   LL-Null:                   -2.4257e+06\n",
       "                                        LLR p-value:                   0.04575\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0570      0.004     14.707      0.000         0.049     0.065\n",
       "fagerrec11    -0.0015      0.001     -1.998      0.046        -0.003  -2.9e-05\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ fagerrec11', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
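  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why the father's-age result is borderline, the likelihood-ratio statistic can be recovered from the summary above. This is a rough check, not part of the original analysis; it assumes statsmodels reports McFadden's pseudo $R^2 = 1 - LL / LL_{null}$, and it uses the rounded values shown in the table:\n",
    "\n",
    "$$G = 2 \\, (LL - LL_{null}) = 2 \\, R^2 \\, |LL_{null}| \\approx 2 \\times 8.226 \\times 10^{-7} \\times 2.4257 \\times 10^{6} \\approx 3.99$$\n",
    "\n",
    "With one degree of freedom, $G$ is compared to a chi-squared distribution, and $G \\approx z^2 = (-1.998)^2 \\approx 3.99$, which is consistent with the reported p-value of 0.046. The 5% critical value is about 3.84, so the result only just clears the threshold."
   ]
  },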
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692840\n",
      "         Iterations 3\n",
      "youngf                 105.1   106.3  \n",
      "oldf                   105.1   105.0  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3501060</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3501057</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.807e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:21:12</td>     <th>  Log-Likelihood:    </th> <td>-2.4257e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.4257e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.2445</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0493</td> <td>    0.001</td> <td>   44.656</td> <td> 0.000</td> <td>    0.047     0.051</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>youngf</th>    <td>    0.0116</td> <td>    0.007</td> <td>    1.673</td> <td> 0.094</td> <td>   -0.002     0.025</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>oldf</th>      <td>   -0.0005</td> <td>    0.006</td> <td>   -0.086</td> <td> 0.932</td> <td>   -0.012     0.011</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3501060\n",
       "Model:                          Logit   Df Residuals:                  3501057\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               5.807e-07\n",
       "Time:                        14:21:12   Log-Likelihood:            -2.4257e+06\n",
       "converged:                       True   LL-Null:                   -2.4257e+06\n",
       "                                        LLR p-value:                    0.2445\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0493      0.001     44.656      0.000         0.047     0.051\n",
       "youngf         0.0116      0.007      1.673      0.094        -0.002     0.025\n",
       "oldf          -0.0005      0.006     -0.086      0.932        -0.012     0.011\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ youngf + oldf', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Predictions based on father's race are similar to those based on mother's race: more girls for black and Native American fathers; more boys for Asian fathers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692818\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.5   103.1 *\n",
      "C(fbrace)[T.3.0]       105.5   102.9 *\n",
      "C(fbrace)[T.4.0]       105.5   106.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3254136</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3254132</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.504e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:21:38</td>     <th>  Log-Likelihood:    </th> <td>-2.2545e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.2546e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.256e-14</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0533</td> <td>    0.001</td> <td>   42.144</td> <td> 0.000</td> <td>    0.051     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0227</td> <td>    0.003</td> <td>   -7.221</td> <td> 0.000</td> <td>   -0.029    -0.017</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0250</td> <td>    0.011</td> <td>   -2.335</td> <td> 0.020</td> <td>   -0.046    -0.004</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0106</td> <td>    0.004</td> <td>    2.479</td> <td> 0.013</td> <td>    0.002     0.019</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3254136\n",
       "Model:                          Logit   Df Residuals:                  3254132\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.504e-05\n",
       "Time:                        14:21:38   Log-Likelihood:            -2.2545e+06\n",
       "converged:                       True   LL-Null:                   -2.2546e+06\n",
       "                                        LLR p-value:                 1.256e-14\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0533      0.001     42.144      0.000         0.051     0.056\n",
       "C(fbrace)[T.2.0]    -0.0227      0.003     -7.221      0.000        -0.029    -0.017\n",
       "C(fbrace)[T.3.0]    -0.0250      0.011     -2.335      0.020        -0.046    -0.004\n",
       "C(fbrace)[T.4.0]     0.0106      0.004      2.479      0.013         0.002     0.019\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(fbrace)', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Having a Hispanic father predicts slightly more girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692839\n",
      "         Iterations 3\n",
      "fhisp                  105.4   104.0 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3453052</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3453050</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.800e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:21:42</td>     <th>  Log-Likelihood:    </th> <td>-2.3924e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.3924e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.378e-07</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0525</td> <td>    0.001</td> <td>   42.696</td> <td> 0.000</td> <td>    0.050     0.055</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>     <td>   -0.0134</td> <td>    0.003</td> <td>   -5.268</td> <td> 0.000</td> <td>   -0.018    -0.008</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3453052\n",
       "Model:                          Logit   Df Residuals:                  3453050\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               5.800e-06\n",
       "Time:                        14:21:42   Log-Likelihood:            -2.3924e+06\n",
       "converged:                       True   LL-Null:                   -2.3924e+06\n",
       "                                        LLR p-value:                 1.378e-07\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0525      0.001     42.696      0.000         0.050     0.055\n",
       "fhisp         -0.0134      0.003     -5.268      0.000        -0.018    -0.008\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ fhisp', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Father's education level might predict more boys, but the effect falls just short of conventional significance (p = 0.056), so it could be due to chance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692840\n",
      "         Iterations 3\n",
      "feduc                  104.6   104.7  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3279126</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3279124</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.046e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:21:46</td>     <th>  Log-Likelihood:    </th> <td>-2.2719e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.2719e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.05587</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0445</td> <td>    0.003</td> <td>   15.630</td> <td> 0.000</td> <td>    0.039     0.050</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>feduc</th>     <td>    0.0012</td> <td>    0.001</td> <td>    1.912</td> <td> 0.056</td> <td>-3.02e-05     0.002</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3279126\n",
       "Model:                          Logit   Df Residuals:                  3279124\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               8.046e-07\n",
       "Time:                        14:21:46   Log-Likelihood:            -2.2719e+06\n",
       "converged:                       True   LL-Null:                   -2.2719e+06\n",
       "                                        LLR p-value:                   0.05587\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0445      0.003     15.630      0.000         0.039     0.050\n",
       "feduc          0.0012      0.001      1.912      0.056     -3.02e-05     0.002\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ feduc', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Babies with high birth order are slightly more likely to be girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692872\n",
      "         Iterations 3\n",
      "lbo_rec                105.3   105.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3978150</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3978148</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.576e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:21:51</td>     <th>  Log-Likelihood:    </th> <td>-2.7563e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7564e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.003206</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0518</td> <td>    0.002</td> <td>   26.529</td> <td> 0.000</td> <td>    0.048     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>lbo_rec</th>   <td>   -0.0023</td> <td>    0.001</td> <td>   -2.947</td> <td> 0.003</td> <td>   -0.004    -0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3978150\n",
       "Model:                          Logit   Df Residuals:                  3978148\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.576e-06\n",
       "Time:                        14:21:51   Log-Likelihood:            -2.7563e+06\n",
       "converged:                       True   LL-Null:                   -2.7564e+06\n",
       "                                        LLR p-value:                  0.003206\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0518      0.002     26.529      0.000         0.048     0.056\n",
       "lbo_rec       -0.0023      0.001     -2.947      0.003        -0.004    -0.001\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ lbo_rec', data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692872\n",
      "         Iterations 3\n",
      "highbo                 104.9   103.4 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3978150</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3978148</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.647e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:21:56</td>     <th>  Log-Likelihood:    </th> <td>-2.7563e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.7564e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.002584</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0475</td> <td>    0.001</td> <td>   46.200</td> <td> 0.000</td> <td>    0.046     0.050</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>highbo</th>    <td>   -0.0139</td> <td>    0.005</td> <td>   -3.013</td> <td> 0.003</td> <td>   -0.023    -0.005</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3978150\n",
       "Model:                          Logit   Df Residuals:                  3978148\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.647e-06\n",
       "Time:                        14:21:56   Log-Likelihood:            -2.7563e+06\n",
       "converged:                       True   LL-Null:                   -2.7564e+06\n",
       "                                        LLR p-value:                  0.002584\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0475      0.001     46.200      0.000         0.046     0.050\n",
       "highbo        -0.0139      0.005     -3.013      0.003        -0.023    -0.005\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ highbo', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Strangely, more prenatal visits are associated with a higher probability of girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692847\n",
      "         Iterations 3\n",
      "previs                 104.6   103.8 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3888065</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3888063</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.975e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:22:01</td>     <th>  Log-Likelihood:    </th> <td>-2.6938e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6939e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.677e-48</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0449</td> <td>    0.001</td> <td>   43.933</td> <td> 0.000</td> <td>    0.043     0.047</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>    <td>   -0.0079</td> <td>    0.001</td> <td>  -14.634</td> <td> 0.000</td> <td>   -0.009    -0.007</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3888065\n",
       "Model:                          Logit   Df Residuals:                  3888063\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               3.975e-05\n",
       "Time:                        14:22:01   Log-Likelihood:            -2.6938e+06\n",
       "converged:                       True   LL-Null:                   -2.6939e+06\n",
       "                                        LLR p-value:                 1.677e-48\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0449      0.001     43.933      0.000         0.043     0.047\n",
       "previs        -0.0079      0.001    -14.634      0.000        -0.009    -0.007\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ previs', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
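  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two numbers printed before each regression table appear to be sex ratios (boys per 100 girls) recovered from the fitted log-odds. `summarize` is defined earlier in the notebook, so this reading is an assumption, checked only against the printed values. With intercept $\\beta_0$ and coefficient $\\beta_1$, the logistic model gives\n",
    "\n",
    "$$\\log\\frac{p}{1-p} = \\beta_0 + \\beta_1 x, \\qquad \\text{sex ratio} = 100 \\, \\frac{p}{1-p} = 100 \\, e^{\\beta_0 + \\beta_1 x}.$$\n",
    "\n",
    "For the prenatal-visits model above, $\\beta_0 = 0.0449$ and $\\beta_1 = -0.0079$, so the baseline ($x = 0$) and a one-unit increase in the predictor ($x = 1$) give\n",
    "\n",
    "$$100 \\, e^{0.0449} \\approx 104.6, \\qquad 100 \\, e^{0.0449 - 0.0079} \\approx 103.8,$$\n",
    "\n",
    "which agrees with the printed line `previs 104.6 103.8`."
   ]
  },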
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect seems to be non-linear at zero, so I'm adding a boolean for no prenatal visits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692842\n",
      "         Iterations 3\n",
      "no_previs              104.6   98.9 *\n",
      "previs                 104.6   103.7 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3888065</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3888062</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.717e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:22:07</td>     <th>  Log-Likelihood:    </th> <td>-2.6938e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6939e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>6.538e-56</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0454</td> <td>    0.001</td> <td>   44.310</td> <td> 0.000</td> <td>    0.043     0.047</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th> <td>   -0.0564</td> <td>    0.009</td> <td>   -6.322</td> <td> 0.000</td> <td>   -0.074    -0.039</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>    <td>   -0.0093</td> <td>    0.001</td> <td>  -15.938</td> <td> 0.000</td> <td>   -0.010    -0.008</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3888065\n",
       "Model:                          Logit   Df Residuals:                  3888062\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               4.717e-05\n",
       "Time:                        14:22:07   Log-Likelihood:            -2.6938e+06\n",
       "converged:                       True   LL-Null:                   -2.6939e+06\n",
       "                                        LLR p-value:                 6.538e-56\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0454      0.001     44.310      0.000         0.043     0.047\n",
       "no_previs     -0.0564      0.009     -6.322      0.000        -0.074    -0.039\n",
       "previs        -0.0093      0.001    -15.938      0.000        -0.010    -0.008\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ no_previs + previs', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the mother received WIC food assistance, she is more likely to have a girl."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692869\n",
      "         Iterations 3\n",
      "wic[T.Y]               105.2   104.3 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3759121</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3759119</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.051e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:22:35</td>     <th>  Log-Likelihood:    </th> <td>-2.6046e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6046e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>6.700e-05</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0506</td> <td>    0.001</td> <td>   36.886</td> <td> 0.000</td> <td>    0.048     0.053</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>wic[T.Y]</th>  <td>   -0.0083</td> <td>    0.002</td> <td>   -3.987</td> <td> 0.000</td> <td>   -0.012    -0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3759121\n",
       "Model:                          Logit   Df Residuals:                  3759119\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               3.051e-06\n",
       "Time:                        14:22:35   Log-Likelihood:            -2.6046e+06\n",
       "converged:                       True   LL-Null:                   -2.6046e+06\n",
       "                                        LLR p-value:                 6.700e-05\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0506      0.001     36.886      0.000         0.048     0.053\n",
       "wic[T.Y]      -0.0083      0.002     -3.987      0.000        -0.012    -0.004\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ wic', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mother's height seems to have no predictive value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692873\n",
      "         Iterations 3\n",
      "height                 102.4   102.5  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3790892</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3790890</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.853e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:22:39</td>     <th>  Log-Likelihood:    </th> <td>-2.6266e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6266e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.3238</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0240</td> <td>    0.023</td> <td>    1.038</td> <td> 0.299</td> <td>   -0.021     0.069</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>height</th>    <td>    0.0004</td> <td>    0.000</td> <td>    0.987</td> <td> 0.324</td> <td>   -0.000     0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3790892\n",
       "Model:                          Logit   Df Residuals:                  3790890\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.853e-07\n",
       "Time:                        14:22:39   Log-Likelihood:            -2.6266e+06\n",
       "converged:                       True   LL-Null:                   -2.6266e+06\n",
       "                                        LLR p-value:                    0.3238\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0240      0.023      1.038      0.299        -0.021     0.069\n",
       "height         0.0004      0.000      0.987      0.324        -0.000     0.001\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ height', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692872\n",
      "         Iterations 3\n",
      "mtall                  104.8   104.1  \n",
      "mshort                 104.8   104.3  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3790892</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3790889</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.560e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:22:43</td>     <th>  Log-Likelihood:    </th> <td>-2.6266e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6266e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.3019</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0473</td> <td>    0.001</td> <td>   44.433</td> <td> 0.000</td> <td>    0.045     0.049</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mtall</th>     <td>   -0.0071</td> <td>    0.006</td> <td>   -1.212</td> <td> 0.226</td> <td>   -0.018     0.004</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mshort</th>    <td>   -0.0056</td> <td>    0.006</td> <td>   -1.005</td> <td> 0.315</td> <td>   -0.016     0.005</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3790892\n",
       "Model:                          Logit   Df Residuals:                  3790889\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               4.560e-07\n",
       "Time:                        14:22:43   Log-Likelihood:            -2.6266e+06\n",
       "converged:                       True   LL-Null:                   -2.6266e+06\n",
       "                                        LLR p-value:                    0.3019\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0473      0.001     44.433      0.000         0.045     0.049\n",
       "mtall         -0.0071      0.006     -1.212      0.226        -0.018     0.004\n",
       "mshort        -0.0056      0.006     -1.005      0.315        -0.016     0.005\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ mtall + mshort', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Mothers with higher BMI are more likely to have girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692870\n",
      "         Iterations 3\n",
      "bmi_r                  105.7   105.4 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3709225</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3709223</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.168e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:22:48</td>     <th>  Log-Likelihood:    </th> <td>-2.5700e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.5700e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.0008442</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0554</td> <td>    0.003</td> <td>   20.336</td> <td> 0.000</td> <td>    0.050     0.061</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>bmi_r</th>     <td>   -0.0029</td> <td>    0.001</td> <td>   -3.338</td> <td> 0.001</td> <td>   -0.005    -0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3709225\n",
       "Model:                          Logit   Df Residuals:                  3709223\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.168e-06\n",
       "Time:                        14:22:48   Log-Likelihood:            -2.5700e+06\n",
       "converged:                       True   LL-Null:                   -2.5700e+06\n",
       "                                        LLR p-value:                 0.0008442\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0554      0.003     20.336      0.000         0.050     0.061\n",
       "bmi_r         -0.0029      0.001     -3.338      0.001        -0.005    -0.001\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ bmi_r', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692870\n",
      "         Iterations 3\n",
      "obese                  105.0   104.2 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3709225</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3709223</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.347e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:22:53</td>     <th>  Log-Likelihood:    </th> <td>-2.5700e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.5700e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.0005139</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0491</td> <td>    0.001</td> <td>   40.976</td> <td> 0.000</td> <td>    0.047     0.051</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>obese</th>     <td>   -0.0084</td> <td>    0.002</td> <td>   -3.473</td> <td> 0.001</td> <td>   -0.013    -0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3709225\n",
       "Model:                          Logit   Df Residuals:                  3709223\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.347e-06\n",
       "Time:                        14:22:53   Log-Likelihood:            -2.5700e+06\n",
       "converged:                       True   LL-Null:                   -2.5700e+06\n",
       "                                        LLR p-value:                 0.0005139\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0491      0.001     40.976      0.000         0.047     0.051\n",
       "obese         -0.0084      0.002     -3.473      0.001        -0.013    -0.004\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ obese', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If payment was made by Medicaid, the baby is more likely to be a girl.  Private insurance and self-payment are associated with more boys; the difference for other payment methods is not statistically significant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692869\n",
      "         Iterations 3\n",
      "C(pay_rec)[T.2.0]      104.2   105.1 *\n",
      "C(pay_rec)[T.3.0]      104.2   106.6 *\n",
      "C(pay_rec)[T.4.0]      104.2   104.7  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3819768</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3819764</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>5.306e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:23:19</td>     <th>  Log-Likelihood:    </th> <td>-2.6466e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.6466e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>3.482e-06</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>             <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>    0.0416</td> <td>    0.002</td> <td>   26.840</td> <td> 0.000</td> <td>    0.039     0.045</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.2.0]</th> <td>    0.0085</td> <td>    0.002</td> <td>    3.982</td> <td> 0.000</td> <td>    0.004     0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.3.0]</th> <td>    0.0222</td> <td>    0.005</td> <td>    4.272</td> <td> 0.000</td> <td>    0.012     0.032</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.4.0]</th> <td>    0.0047</td> <td>    0.005</td> <td>    0.925</td> <td> 0.355</td> <td>   -0.005     0.015</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3819768\n",
       "Model:                          Logit   Df Residuals:                  3819764\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               5.306e-06\n",
       "Time:                        14:23:19   Log-Likelihood:            -2.6466e+06\n",
       "converged:                       True   LL-Null:                   -2.6466e+06\n",
       "                                        LLR p-value:                 3.482e-06\n",
       "=====================================================================================\n",
       "                        coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------\n",
       "Intercept             0.0416      0.002     26.840      0.000         0.039     0.045\n",
       "C(pay_rec)[T.2.0]     0.0085      0.002      3.982      0.000         0.004     0.013\n",
       "C(pay_rec)[T.3.0]     0.0222      0.005      4.272      0.000         0.012     0.032\n",
       "C(pay_rec)[T.4.0]     0.0047      0.005      0.925      0.355        -0.005     0.015\n",
       "=====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = smf.logit('boy ~ C(pay_rec)', data=df)    \n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
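  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough check on how to read these numbers, the ratios in the short summary appear to be the odds of a boy, scaled by 100, computed from the fitted coefficients.  This is an inference from the printed values, not a statement about how `summarize` is implemented.  Writing $\\beta_0$ for the intercept and $\\beta_k$ for the coefficient of payment category $k$:\n",
    "\n",
    "$$\\text{ratio}_{\\text{baseline}} = 100 \\, e^{\\beta_0}, \\qquad \\text{ratio}_k = 100 \\, e^{\\beta_0 + \\beta_k}$$\n",
    "\n",
    "For the model above:\n",
    "\n",
    "$$100 \\, e^{0.0416} \\approx 104.2, \\qquad 100 \\, e^{0.0416 + 0.0085} \\approx 105.1, \\qquad 100 \\, e^{0.0416 + 0.0222} \\approx 106.6, \\qquad 100 \\, e^{0.0416 + 0.0047} \\approx 104.7$$\n",
    "\n",
    "These match the printed values.  A coefficient of 0.0085 corresponds to a change of less than one boy per 100 girls, so even the statistically significant differences here are small."
   ]
  },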
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding controls\n",
    "\n",
    "However, none of the previous results should be taken too seriously.  We only tested one variable at a time, and many of these apparent effects disappear when we add control variables.\n",
    "\n",
    "In particular, if we control for father's race and Hispanic origin, the mother's race has no additional predictive value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692816\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.8   103.1 *\n",
      "C(fbrace)[T.3.0]       105.8   103.5  \n",
      "C(fbrace)[T.4.0]       105.8   106.9  \n",
      "C(mbrace)[T.2]         105.8   105.9  \n",
      "C(mbrace)[T.3]         105.8   104.5  \n",
      "C(mbrace)[T.4]         105.8   105.6  \n",
      "fhisp                  105.8   104.2 *\n",
      "mhisp                  105.8   106.0  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3231530</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3231521</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     8</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.087e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:24:08</td>     <th>  Log-Likelihood:    </th> <td>-2.2389e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.2389e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>9.292e-17</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0566</td> <td>    0.001</td> <td>   38.234</td> <td> 0.000</td> <td>    0.054     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0260</td> <td>    0.006</td> <td>   -4.668</td> <td> 0.000</td> <td>   -0.037    -0.015</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0221</td> <td>    0.012</td> <td>   -1.793</td> <td> 0.073</td> <td>   -0.046     0.002</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0097</td> <td>    0.007</td> <td>    1.344</td> <td> 0.179</td> <td>   -0.004     0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.2]</th>   <td>    0.0004</td> <td>    0.006</td> <td>    0.075</td> <td> 0.940</td> <td>   -0.011     0.012</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.3]</th>   <td>   -0.0130</td> <td>    0.013</td> <td>   -0.994</td> <td> 0.320</td> <td>   -0.039     0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(mbrace)[T.4]</th>   <td>   -0.0026</td> <td>    0.007</td> <td>   -0.375</td> <td> 0.708</td> <td>   -0.016     0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0156</td> <td>    0.004</td> <td>   -3.591</td> <td> 0.000</td> <td>   -0.024    -0.007</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mhisp</th>            <td>    0.0018</td> <td>    0.004</td> <td>    0.422</td> <td> 0.673</td> <td>   -0.007     0.010</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3231530\n",
       "Model:                          Logit   Df Residuals:                  3231521\n",
       "Method:                           MLE   Df Model:                            8\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.087e-05\n",
       "Time:                        14:24:08   Log-Likelihood:            -2.2389e+06\n",
       "converged:                       True   LL-Null:                   -2.2389e+06\n",
       "                                        LLR p-value:                 9.292e-17\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0566      0.001     38.234      0.000         0.054     0.060\n",
       "C(fbrace)[T.2.0]    -0.0260      0.006     -4.668      0.000        -0.037    -0.015\n",
       "C(fbrace)[T.3.0]    -0.0221      0.012     -1.793      0.073        -0.046     0.002\n",
       "C(fbrace)[T.4.0]     0.0097      0.007      1.344      0.179        -0.004     0.024\n",
       "C(mbrace)[T.2]       0.0004      0.006      0.075      0.940        -0.011     0.012\n",
       "C(mbrace)[T.3]      -0.0130      0.013     -0.994      0.320        -0.039     0.013\n",
       "C(mbrace)[T.4]      -0.0026      0.007     -0.375      0.708        -0.016     0.011\n",
       "fhisp               -0.0156      0.004     -3.591      0.000        -0.024    -0.007\n",
       "mhisp                0.0018      0.004      0.422      0.673        -0.007     0.010\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + C(mbrace) + mhisp')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In fact, once we control for father's race and Hispanic origin, almost every other variable becomes statistically insignificant, including acknowledged paternity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692814\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       108.2   105.5 *\n",
      "C(fbrace)[T.3.0]       108.2   105.2 *\n",
      "C(fbrace)[T.4.0]       108.2   109.1  \n",
      "mar_p[T.Y]             108.2   105.8  \n",
      "fhisp                  108.2   106.7 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3112362</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3112356</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.117e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:24:56</td>     <th>  Log-Likelihood:    </th> <td>-2.1563e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1563e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>3.558e-18</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0792</td> <td>    0.015</td> <td>    5.155</td> <td> 0.000</td> <td>    0.049     0.109</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0258</td> <td>    0.003</td> <td>   -7.860</td> <td> 0.000</td> <td>   -0.032    -0.019</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0283</td> <td>    0.011</td> <td>   -2.594</td> <td> 0.009</td> <td>   -0.050    -0.007</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0074</td> <td>    0.004</td> <td>    1.662</td> <td> 0.097</td> <td>   -0.001     0.016</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mar_p[T.Y]</th>       <td>   -0.0225</td> <td>    0.015</td> <td>   -1.464</td> <td> 0.143</td> <td>   -0.053     0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0148</td> <td>    0.003</td> <td>   -4.982</td> <td> 0.000</td> <td>   -0.021    -0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3112362\n",
       "Model:                          Logit   Df Residuals:                  3112356\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.117e-05\n",
       "Time:                        14:24:56   Log-Likelihood:            -2.1563e+06\n",
       "converged:                       True   LL-Null:                   -2.1563e+06\n",
       "                                        LLR p-value:                 3.558e-18\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0792      0.015      5.155      0.000         0.049     0.109\n",
       "C(fbrace)[T.2.0]    -0.0258      0.003     -7.860      0.000        -0.032    -0.019\n",
       "C(fbrace)[T.3.0]    -0.0283      0.011     -2.594      0.009        -0.050    -0.007\n",
       "C(fbrace)[T.4.0]     0.0074      0.004      1.662      0.097        -0.001     0.016\n",
       "mar_p[T.Y]          -0.0225      0.015     -1.464      0.143        -0.053     0.008\n",
       "fhisp               -0.0148      0.003     -4.982      0.000        -0.021    -0.009\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + mar_p')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Being married still predicts more boys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692814\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.0   102.2 *\n",
      "C(fbrace)[T.3.0]       105.0   101.9 *\n",
      "C(fbrace)[T.4.0]       105.0   105.9  \n",
      "fhisp                  105.0   103.4 *\n",
      "dmar                   105.0   105.7 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3235798</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3235792</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.183e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:25:22</td>     <th>  Log-Likelihood:    </th> <td>-2.2418e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.2419e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.485e-19</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0492</td> <td>    0.003</td> <td>   14.375</td> <td> 0.000</td> <td>    0.042     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0278</td> <td>    0.003</td> <td>   -8.324</td> <td> 0.000</td> <td>   -0.034    -0.021</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0301</td> <td>    0.011</td> <td>   -2.778</td> <td> 0.005</td> <td>   -0.051    -0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0081</td> <td>    0.004</td> <td>    1.871</td> <td> 0.061</td> <td>   -0.000     0.017</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0156</td> <td>    0.003</td> <td>   -5.270</td> <td> 0.000</td> <td>   -0.021    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>dmar</th>             <td>    0.0062</td> <td>    0.003</td> <td>    2.416</td> <td> 0.016</td> <td>    0.001     0.011</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3235798\n",
       "Model:                          Logit   Df Residuals:                  3235792\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.183e-05\n",
       "Time:                        14:25:22   Log-Likelihood:            -2.2418e+06\n",
       "converged:                       True   LL-Null:                   -2.2419e+06\n",
       "                                        LLR p-value:                 1.485e-19\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0492      0.003     14.375      0.000         0.042     0.056\n",
       "C(fbrace)[T.2.0]    -0.0278      0.003     -8.324      0.000        -0.034    -0.021\n",
       "C(fbrace)[T.3.0]    -0.0301      0.011     -2.778      0.005        -0.051    -0.009\n",
       "C(fbrace)[T.4.0]     0.0081      0.004      1.871      0.061        -0.000     0.017\n",
       "fhisp               -0.0156      0.003     -5.270      0.000        -0.021    -0.010\n",
       "dmar                 0.0062      0.003      2.416      0.016         0.001     0.011\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + dmar')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of education disappears."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692816\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.8   103.1 *\n",
      "C(fbrace)[T.3.0]       105.8   102.8 *\n",
      "C(fbrace)[T.4.0]       105.8   106.5  \n",
      "fhisp                  105.8   104.2 *\n",
      "lowed                  105.8   106.0  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3091385</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3091379</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.076e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:25:47</td>     <th>  Log-Likelihood:    </th> <td>-2.1418e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1418e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.130e-17</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0566</td> <td>    0.001</td> <td>   37.993</td> <td> 0.000</td> <td>    0.054     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0259</td> <td>    0.003</td> <td>   -7.838</td> <td> 0.000</td> <td>   -0.032    -0.019</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0287</td> <td>    0.011</td> <td>   -2.624</td> <td> 0.009</td> <td>   -0.050    -0.007</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0067</td> <td>    0.004</td> <td>    1.487</td> <td> 0.137</td> <td>   -0.002     0.015</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0152</td> <td>    0.003</td> <td>   -4.927</td> <td> 0.000</td> <td>   -0.021    -0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>lowed</th>            <td>    0.0017</td> <td>    0.004</td> <td>    0.462</td> <td> 0.644</td> <td>   -0.006     0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3091385\n",
       "Model:                          Logit   Df Residuals:                  3091379\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.076e-05\n",
       "Time:                        14:25:47   Log-Likelihood:            -2.1418e+06\n",
       "converged:                       True   LL-Null:                   -2.1418e+06\n",
       "                                        LLR p-value:                 1.130e-17\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0566      0.001     37.993      0.000         0.054     0.060\n",
       "C(fbrace)[T.2.0]    -0.0259      0.003     -7.838      0.000        -0.032    -0.019\n",
       "C(fbrace)[T.3.0]    -0.0287      0.011     -2.624      0.009        -0.050    -0.007\n",
       "C(fbrace)[T.4.0]     0.0067      0.004      1.487      0.137        -0.002     0.015\n",
       "fhisp               -0.0152      0.003     -4.927      0.000        -0.021    -0.009\n",
       "lowed                0.0017      0.004      0.462      0.644        -0.006     0.009\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + lowed')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of birth order disappears."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692816\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.8   103.2 *\n",
      "C(fbrace)[T.3.0]       105.8   102.9 *\n",
      "C(fbrace)[T.4.0]       105.8   106.6  \n",
      "fhisp                  105.8   104.4 *\n",
      "highbo                 105.8   105.6  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3221819</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3221813</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.029e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:26:13</td>     <th>  Log-Likelihood:    </th> <td>-2.2321e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.2322e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>5.072e-18</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0566</td> <td>    0.001</td> <td>   38.815</td> <td> 0.000</td> <td>    0.054     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0253</td> <td>    0.003</td> <td>   -7.841</td> <td> 0.000</td> <td>   -0.032    -0.019</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0284</td> <td>    0.011</td> <td>   -2.616</td> <td> 0.009</td> <td>   -0.050    -0.007</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0077</td> <td>    0.004</td> <td>    1.758</td> <td> 0.079</td> <td>   -0.001     0.016</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0139</td> <td>    0.003</td> <td>   -4.785</td> <td> 0.000</td> <td>   -0.020    -0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>highbo</th>           <td>   -0.0026</td> <td>    0.005</td> <td>   -0.483</td> <td> 0.629</td> <td>   -0.013     0.008</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3221819\n",
       "Model:                          Logit   Df Residuals:                  3221813\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.029e-05\n",
       "Time:                        14:26:13   Log-Likelihood:            -2.2321e+06\n",
       "converged:                       True   LL-Null:                   -2.2322e+06\n",
       "                                        LLR p-value:                 5.072e-18\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0566      0.001     38.815      0.000         0.054     0.060\n",
       "C(fbrace)[T.2.0]    -0.0253      0.003     -7.841      0.000        -0.032    -0.019\n",
       "C(fbrace)[T.3.0]    -0.0284      0.011     -2.616      0.009        -0.050    -0.007\n",
       "C(fbrace)[T.4.0]     0.0077      0.004      1.758      0.079        -0.001     0.016\n",
       "fhisp               -0.0139      0.003     -4.785      0.000        -0.020    -0.008\n",
       "highbo              -0.0026      0.005     -0.483      0.629        -0.013     0.008\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + highbo')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "WIC is no longer associated with more girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692813\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.8   103.0 *\n",
      "C(fbrace)[T.3.0]       105.8   103.0 *\n",
      "C(fbrace)[T.4.0]       105.8   106.6  \n",
      "wic[T.Y]               105.8   106.1  \n",
      "fhisp                  105.8   104.1 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3040527</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3040521</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.175e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:27:01</td>     <th>  Log-Likelihood:    </th> <td>-2.1065e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1066e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>3.031e-18</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0564</td> <td>    0.002</td> <td>   34.772</td> <td> 0.000</td> <td>    0.053     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0271</td> <td>    0.003</td> <td>   -7.892</td> <td> 0.000</td> <td>   -0.034    -0.020</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0267</td> <td>    0.011</td> <td>   -2.405</td> <td> 0.016</td> <td>   -0.048    -0.005</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0076</td> <td>    0.005</td> <td>    1.670</td> <td> 0.095</td> <td>   -0.001     0.016</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>wic[T.Y]</th>         <td>    0.0025</td> <td>    0.003</td> <td>    0.975</td> <td> 0.330</td> <td>   -0.002     0.007</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0161</td> <td>    0.003</td> <td>   -5.153</td> <td> 0.000</td> <td>   -0.022    -0.010</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3040527\n",
       "Model:                          Logit   Df Residuals:                  3040521\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.175e-05\n",
       "Time:                        14:27:01   Log-Likelihood:            -2.1065e+06\n",
       "converged:                       True   LL-Null:                   -2.1066e+06\n",
       "                                        LLR p-value:                 3.031e-18\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0564      0.002     34.772      0.000         0.053     0.060\n",
       "C(fbrace)[T.2.0]    -0.0271      0.003     -7.892      0.000        -0.034    -0.020\n",
       "C(fbrace)[T.3.0]    -0.0267      0.011     -2.405      0.016        -0.048    -0.005\n",
       "C(fbrace)[T.4.0]     0.0076      0.005      1.670      0.095        -0.001     0.016\n",
       "wic[T.Y]             0.0025      0.003      0.975      0.330        -0.002     0.007\n",
       "fhisp               -0.0161      0.003     -5.153      0.000        -0.022    -0.010\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + wic')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
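  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A note on reading the output above: the two columns printed by `summarize` appear to be sex ratios, in boys per 100 girls, for the baseline group and for the group with the given variable switched on. This is inferred from the printed numbers rather than from the definition of `summarize`, so treat it as an assumption. Because the model is fit on the log-odds of `boy`, exponentiating a sum of coefficients gives the odds of a boy, which is the number of boys per girl:\n",
    "\n",
    "$$\\text{ratio}_j = 100 \\exp(\\beta_0 + \\beta_j)$$\n",
    "\n",
    "With the intercept $\\beta_0 = 0.0564$ and the `fhisp` coefficient $\\beta_j = -0.0161$ from the summary table:\n",
    "\n",
    "$$100 \\exp(0.0564) \\approx 105.8, \\qquad 100 \\exp(0.0564 - 0.0161) \\approx 104.1$$\n",
    "\n",
    "These agree with the `fhisp` row of the printed output, 105.8 and 104.1."
   ]
  },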
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the same controls, the effect of obesity disappears."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692815\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.9   103.3 *\n",
      "C(fbrace)[T.3.0]       105.9   103.1 *\n",
      "C(fbrace)[T.4.0]       105.9   106.5  \n",
      "fhisp                  105.9   104.3 *\n",
      "obese                  105.9   105.7  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3005073</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3005067</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.947e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:27:26</td>     <th>  Log-Likelihood:    </th> <td>-2.0820e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.0820e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>5.013e-16</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0571</td> <td>    0.002</td> <td>   35.622</td> <td> 0.000</td> <td>    0.054     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0247</td> <td>    0.003</td> <td>   -7.305</td> <td> 0.000</td> <td>   -0.031    -0.018</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0266</td> <td>    0.011</td> <td>   -2.410</td> <td> 0.016</td> <td>   -0.048    -0.005</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0056</td> <td>    0.005</td> <td>    1.217</td> <td> 0.224</td> <td>   -0.003     0.015</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0151</td> <td>    0.003</td> <td>   -4.996</td> <td> 0.000</td> <td>   -0.021    -0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>obese</th>            <td>   -0.0014</td> <td>    0.003</td> <td>   -0.524</td> <td> 0.600</td> <td>   -0.007     0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3005073\n",
       "Model:                          Logit   Df Residuals:                  3005067\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.947e-05\n",
       "Time:                        14:27:26   Log-Likelihood:            -2.0820e+06\n",
       "converged:                       True   LL-Null:                   -2.0820e+06\n",
       "                                        LLR p-value:                 5.013e-16\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0571      0.002     35.622      0.000         0.054     0.060\n",
       "C(fbrace)[T.2.0]    -0.0247      0.003     -7.305      0.000        -0.031    -0.018\n",
       "C(fbrace)[T.3.0]    -0.0266      0.011     -2.410      0.016        -0.048    -0.005\n",
       "C(fbrace)[T.4.0]     0.0056      0.005      1.217      0.224        -0.003     0.015\n",
       "fhisp               -0.0151      0.003     -4.996      0.000        -0.021    -0.009\n",
       "obese               -0.0014      0.003     -0.524      0.600        -0.007     0.004\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + obese')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of payment method is diminished, but self-payment is still associated with more boys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692812\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       106.1   103.3 *\n",
      "C(fbrace)[T.3.0]       106.1   103.0 *\n",
      "C(fbrace)[T.4.0]       106.1   106.7  \n",
      "C(pay_rec)[T.2.0]      106.1   105.7  \n",
      "C(pay_rec)[T.3.0]      106.1   108.3 *\n",
      "C(pay_rec)[T.4.0]      106.1   105.4  \n",
      "fhisp                  106.1   104.4 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3086812</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3086804</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     7</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.500e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:28:14</td>     <th>  Log-Likelihood:    </th> <td>-2.1386e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1386e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>3.965e-20</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>             <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>    0.0593</td> <td>    0.002</td> <td>   25.249</td> <td> 0.000</td> <td>    0.055     0.064</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th>  <td>   -0.0271</td> <td>    0.003</td> <td>   -7.980</td> <td> 0.000</td> <td>   -0.034    -0.020</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th>  <td>   -0.0297</td> <td>    0.011</td> <td>   -2.696</td> <td> 0.007</td> <td>   -0.051    -0.008</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th>  <td>    0.0056</td> <td>    0.004</td> <td>    1.239</td> <td> 0.216</td> <td>   -0.003     0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.2.0]</th> <td>   -0.0043</td> <td>    0.003</td> <td>   -1.680</td> <td> 0.093</td> <td>   -0.009     0.001</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.3.0]</th> <td>    0.0203</td> <td>    0.006</td> <td>    3.331</td> <td> 0.001</td> <td>    0.008     0.032</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.4.0]</th> <td>   -0.0063</td> <td>    0.006</td> <td>   -1.094</td> <td> 0.274</td> <td>   -0.018     0.005</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>             <td>   -0.0167</td> <td>    0.003</td> <td>   -5.378</td> <td> 0.000</td> <td>   -0.023    -0.011</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3086812\n",
       "Model:                          Logit   Df Residuals:                  3086804\n",
       "Method:                           MLE   Df Model:                            7\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.500e-05\n",
       "Time:                        14:28:14   Log-Likelihood:            -2.1386e+06\n",
       "converged:                       True   LL-Null:                   -2.1386e+06\n",
       "                                        LLR p-value:                 3.965e-20\n",
       "=====================================================================================\n",
       "                        coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------\n",
       "Intercept             0.0593      0.002     25.249      0.000         0.055     0.064\n",
       "C(fbrace)[T.2.0]     -0.0271      0.003     -7.980      0.000        -0.034    -0.020\n",
       "C(fbrace)[T.3.0]     -0.0297      0.011     -2.696      0.007        -0.051    -0.008\n",
       "C(fbrace)[T.4.0]      0.0056      0.004      1.239      0.216        -0.003     0.014\n",
       "C(pay_rec)[T.2.0]    -0.0043      0.003     -1.680      0.093        -0.009     0.001\n",
       "C(pay_rec)[T.3.0]     0.0203      0.006      3.331      0.001         0.008     0.032\n",
       "C(pay_rec)[T.4.0]    -0.0063      0.006     -1.094      0.274        -0.018     0.005\n",
       "fhisp                -0.0167      0.003     -5.378      0.000        -0.023    -0.011\n",
       "=====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + C(pay_rec)')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But the number of prenatal visits is still a strong predictor of more girls."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692778\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.8   102.8 *\n",
      "C(fbrace)[T.3.0]       105.8   102.3 *\n",
      "C(fbrace)[T.4.0]       105.8   106.4  \n",
      "fhisp                  105.8   104.0 *\n",
      "previs                 105.8   104.8 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3155440</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3155434</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     5</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>7.997e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:28:40</td>     <th>  Log-Likelihood:    </th> <td>-2.1860e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1862e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>2.081e-73</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0567</td> <td>    0.001</td> <td>   38.800</td> <td> 0.000</td> <td>    0.054     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0295</td> <td>    0.003</td> <td>   -9.008</td> <td> 0.000</td> <td>   -0.036    -0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0341</td> <td>    0.011</td> <td>   -3.114</td> <td> 0.002</td> <td>   -0.056    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0058</td> <td>    0.004</td> <td>    1.314</td> <td> 0.189</td> <td>   -0.003     0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0172</td> <td>    0.003</td> <td>   -5.862</td> <td> 0.000</td> <td>   -0.023    -0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0102</td> <td>    0.001</td> <td>  -16.235</td> <td> 0.000</td> <td>   -0.011    -0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3155440\n",
       "Model:                          Logit   Df Residuals:                  3155434\n",
       "Method:                           MLE   Df Model:                            5\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               7.997e-05\n",
       "Time:                        14:28:40   Log-Likelihood:            -2.1860e+06\n",
       "converged:                       True   LL-Null:                   -2.1862e+06\n",
       "                                        LLR p-value:                 2.081e-73\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0567      0.001     38.800      0.000         0.054     0.060\n",
       "C(fbrace)[T.2.0]    -0.0295      0.003     -9.008      0.000        -0.036    -0.023\n",
       "C(fbrace)[T.3.0]    -0.0341      0.011     -3.114      0.002        -0.056    -0.013\n",
       "C(fbrace)[T.4.0]     0.0058      0.004      1.314      0.189        -0.003     0.014\n",
       "fhisp               -0.0172      0.003     -5.862      0.000        -0.023    -0.011\n",
       "previs              -0.0102      0.001    -16.235      0.000        -0.011    -0.009\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And the effect is even stronger if we add a boolean to capture the nonlinearity at 0 visits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692776\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.9   102.8 *\n",
      "C(fbrace)[T.3.0]       105.9   102.3 *\n",
      "C(fbrace)[T.4.0]       105.9   106.5  \n",
      "fhisp                  105.9   104.1 *\n",
      "previs                 105.9   104.7 *\n",
      "no_previs              105.9   101.0 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3155440</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3155433</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     6</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.351e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:29:06</td>     <th>  Log-Likelihood:    </th> <td>-2.1860e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1862e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>8.674e-76</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0570</td> <td>    0.001</td> <td>   38.973</td> <td> 0.000</td> <td>    0.054     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0294</td> <td>    0.003</td> <td>   -8.984</td> <td> 0.000</td> <td>   -0.036    -0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0342</td> <td>    0.011</td> <td>   -3.123</td> <td> 0.002</td> <td>   -0.056    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0056</td> <td>    0.004</td> <td>    1.270</td> <td> 0.204</td> <td>   -0.003     0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0171</td> <td>    0.003</td> <td>   -5.817</td> <td> 0.000</td> <td>   -0.023    -0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0111</td> <td>    0.001</td> <td>  -16.625</td> <td> 0.000</td> <td>   -0.012    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th>        <td>   -0.0469</td> <td>    0.012</td> <td>   -3.936</td> <td> 0.000</td> <td>   -0.070    -0.024</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3155440\n",
       "Model:                          Logit   Df Residuals:                  3155433\n",
       "Method:                           MLE   Df Model:                            6\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               8.351e-05\n",
       "Time:                        14:29:06   Log-Likelihood:            -2.1860e+06\n",
       "converged:                       True   LL-Null:                   -2.1862e+06\n",
       "                                        LLR p-value:                 8.674e-76\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0570      0.001     38.973      0.000         0.054     0.060\n",
       "C(fbrace)[T.2.0]    -0.0294      0.003     -8.984      0.000        -0.036    -0.023\n",
       "C(fbrace)[T.3.0]    -0.0342      0.011     -3.123      0.002        -0.056    -0.013\n",
       "C(fbrace)[T.4.0]     0.0056      0.004      1.270      0.204        -0.003     0.014\n",
       "fhisp               -0.0171      0.003     -5.817      0.000        -0.023    -0.011\n",
       "previs              -0.0111      0.001    -16.625      0.000        -0.012    -0.010\n",
       "no_previs           -0.0469      0.012     -3.936      0.000        -0.070    -0.024\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### More controls"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now if we control for father's race and Hispanic origin as well as the number of prenatal visits, the effect of marriage disappears."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692778\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.3   102.1 *\n",
      "C(fbrace)[T.3.0]       105.3   101.7 *\n",
      "C(fbrace)[T.4.0]       105.3   106.0  \n",
      "fhisp                  105.3   103.5 *\n",
      "previs                 105.3   104.3 *\n",
      "dmar                   105.3   105.7  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3155440</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3155433</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     6</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.045e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:29:32</td>     <th>  Log-Likelihood:    </th> <td>-2.1860e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1862e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>6.525e-73</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0521</td> <td>    0.003</td> <td>   15.015</td> <td> 0.000</td> <td>    0.045     0.059</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0309</td> <td>    0.003</td> <td>   -9.058</td> <td> 0.000</td> <td>   -0.038    -0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0353</td> <td>    0.011</td> <td>   -3.210</td> <td> 0.001</td> <td>   -0.057    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0062</td> <td>    0.004</td> <td>    1.394</td> <td> 0.163</td> <td>   -0.002     0.015</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0181</td> <td>    0.003</td> <td>   -6.033</td> <td> 0.000</td> <td>   -0.024    -0.012</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0102</td> <td>    0.001</td> <td>  -16.122</td> <td> 0.000</td> <td>   -0.011    -0.009</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>dmar</th>             <td>    0.0037</td> <td>    0.003</td> <td>    1.446</td> <td> 0.148</td> <td>   -0.001     0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3155440\n",
       "Model:                          Logit   Df Residuals:                  3155433\n",
       "Method:                           MLE   Df Model:                            6\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               8.045e-05\n",
       "Time:                        14:29:32   Log-Likelihood:            -2.1860e+06\n",
       "converged:                       True   LL-Null:                   -2.1862e+06\n",
       "                                        LLR p-value:                 6.525e-73\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0521      0.003     15.015      0.000         0.045     0.059\n",
       "C(fbrace)[T.2.0]    -0.0309      0.003     -9.058      0.000        -0.038    -0.024\n",
       "C(fbrace)[T.3.0]    -0.0353      0.011     -3.210      0.001        -0.057    -0.014\n",
       "C(fbrace)[T.4.0]     0.0062      0.004      1.394      0.163        -0.002     0.015\n",
       "fhisp               -0.0181      0.003     -6.033      0.000        -0.024    -0.012\n",
       "previs              -0.0102      0.001    -16.122      0.000        -0.011    -0.009\n",
       "dmar                 0.0037      0.003      1.446      0.148        -0.001     0.009\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + dmar')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
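    "The effect of payment method disappears: with `fbrace`, `fhisp`, and `previs` in the model, none of the `pay_rec` categories is statistically significant."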
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692777\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.8   102.8 *\n",
      "C(fbrace)[T.3.0]       105.8   102.2 *\n",
      "C(fbrace)[T.4.0]       105.8   106.3  \n",
      "C(pay_rec)[T.2.0]      105.8   105.9  \n",
      "C(pay_rec)[T.3.0]      105.8   106.9  \n",
      "C(pay_rec)[T.4.0]      105.8   105.0  \n",
      "fhisp                  105.8   104.0 *\n",
      "previs                 105.8   104.8 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3009712</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3009703</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     8</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.163e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:30:20</td>     <th>  Log-Likelihood:    </th> <td>-2.0851e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.0852e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.004e-68</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>             <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>    0.0566</td> <td>    0.002</td> <td>   23.765</td> <td> 0.000</td> <td>    0.052     0.061</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th>  <td>   -0.0295</td> <td>    0.003</td> <td>   -8.509</td> <td> 0.000</td> <td>   -0.036    -0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th>  <td>   -0.0345</td> <td>    0.011</td> <td>   -3.090</td> <td> 0.002</td> <td>   -0.056    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th>  <td>    0.0046</td> <td>    0.005</td> <td>    1.012</td> <td> 0.312</td> <td>   -0.004     0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.2.0]</th> <td>    0.0005</td> <td>    0.003</td> <td>    0.174</td> <td> 0.862</td> <td>   -0.005     0.006</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.3.0]</th> <td>    0.0100</td> <td>    0.006</td> <td>    1.619</td> <td> 0.105</td> <td>   -0.002     0.022</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.4.0]</th> <td>   -0.0074</td> <td>    0.006</td> <td>   -1.260</td> <td> 0.208</td> <td>   -0.019     0.004</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>             <td>   -0.0178</td> <td>    0.003</td> <td>   -5.687</td> <td> 0.000</td> <td>   -0.024    -0.012</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>            <td>   -0.0101</td> <td>    0.001</td> <td>  -15.540</td> <td> 0.000</td> <td>   -0.011    -0.009</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3009712\n",
       "Model:                          Logit   Df Residuals:                  3009703\n",
       "Method:                           MLE   Df Model:                            8\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               8.163e-05\n",
       "Time:                        14:30:20   Log-Likelihood:            -2.0851e+06\n",
       "converged:                       True   LL-Null:                   -2.0852e+06\n",
       "                                        LLR p-value:                 1.004e-68\n",
       "=====================================================================================\n",
       "                        coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------\n",
       "Intercept             0.0566      0.002     23.765      0.000         0.052     0.061\n",
       "C(fbrace)[T.2.0]     -0.0295      0.003     -8.509      0.000        -0.036    -0.023\n",
       "C(fbrace)[T.3.0]     -0.0345      0.011     -3.090      0.002        -0.056    -0.013\n",
       "C(fbrace)[T.4.0]      0.0046      0.005      1.012      0.312        -0.004     0.014\n",
       "C(pay_rec)[T.2.0]     0.0005      0.003      0.174      0.862        -0.005     0.006\n",
       "C(pay_rec)[T.3.0]     0.0100      0.006      1.619      0.105        -0.002     0.022\n",
       "C(pay_rec)[T.4.0]    -0.0074      0.006     -1.260      0.208        -0.019     0.004\n",
       "fhisp                -0.0178      0.003     -5.687      0.000        -0.024    -0.012\n",
       "previs               -0.0101      0.001    -15.540      0.000        -0.011    -0.009\n",
       "=====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + C(pay_rec)')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's a version that adds a boolean, `no_previs`, indicating no prenatal visits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692776\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       105.9   102.8 *\n",
      "C(fbrace)[T.3.0]       105.9   102.3 *\n",
      "C(fbrace)[T.4.0]       105.9   106.5  \n",
      "fhisp                  105.9   104.1 *\n",
      "previs                 105.9   104.7 *\n",
      "no_previs              105.9   101.0 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3155440</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3155433</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     6</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.351e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:30:47</td>     <th>  Log-Likelihood:    </th> <td>-2.1860e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1862e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>8.674e-76</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0570</td> <td>    0.001</td> <td>   38.973</td> <td> 0.000</td> <td>    0.054     0.060</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0294</td> <td>    0.003</td> <td>   -8.984</td> <td> 0.000</td> <td>   -0.036    -0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0342</td> <td>    0.011</td> <td>   -3.123</td> <td> 0.002</td> <td>   -0.056    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0056</td> <td>    0.004</td> <td>    1.270</td> <td> 0.204</td> <td>   -0.003     0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0171</td> <td>    0.003</td> <td>   -5.817</td> <td> 0.000</td> <td>   -0.023    -0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0111</td> <td>    0.001</td> <td>  -16.625</td> <td> 0.000</td> <td>   -0.012    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th>        <td>   -0.0469</td> <td>    0.012</td> <td>   -3.936</td> <td> 0.000</td> <td>   -0.070    -0.024</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3155440\n",
       "Model:                          Logit   Df Residuals:                  3155433\n",
       "Method:                           MLE   Df Model:                            6\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               8.351e-05\n",
       "Time:                        14:30:47   Log-Likelihood:            -2.1860e+06\n",
       "converged:                       True   LL-Null:                   -2.1862e+06\n",
       "                                        LLR p-value:                 8.674e-76\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0570      0.001     38.973      0.000         0.054     0.060\n",
       "C(fbrace)[T.2.0]    -0.0294      0.003     -8.984      0.000        -0.036    -0.023\n",
       "C(fbrace)[T.3.0]    -0.0342      0.011     -3.123      0.002        -0.056    -0.013\n",
       "C(fbrace)[T.4.0]     0.0056      0.004      1.270      0.204        -0.003     0.014\n",
       "fhisp               -0.0171      0.003     -5.817      0.000        -0.023    -0.011\n",
       "previs              -0.0111      0.001    -16.625      0.000        -0.012    -0.010\n",
       "no_previs           -0.0469      0.012     -3.936      0.000        -0.070    -0.024\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, surprisingly, the mother's age (`mager9`) has a small effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692775\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       106.8   103.6 *\n",
      "C(fbrace)[T.3.0]       106.8   103.1 *\n",
      "C(fbrace)[T.4.0]       106.8   107.4  \n",
      "fhisp                  106.8   104.9 *\n",
      "previs                 106.8   105.6 *\n",
      "no_previs              106.8   101.9 *\n",
      "mager9                 106.8   106.6 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3155440</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3155432</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     7</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.440e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:31:14</td>     <th>  Log-Likelihood:    </th> <td>-2.1860e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1862e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>1.043e-75</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0656</td> <td>    0.005</td> <td>   14.344</td> <td> 0.000</td> <td>    0.057     0.075</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0300</td> <td>    0.003</td> <td>   -9.123</td> <td> 0.000</td> <td>   -0.036    -0.024</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0351</td> <td>    0.011</td> <td>   -3.200</td> <td> 0.001</td> <td>   -0.057    -0.014</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0062</td> <td>    0.004</td> <td>    1.413</td> <td> 0.158</td> <td>   -0.002     0.015</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0176</td> <td>    0.003</td> <td>   -5.974</td> <td> 0.000</td> <td>   -0.023    -0.012</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0110</td> <td>    0.001</td> <td>  -16.456</td> <td> 0.000</td> <td>   -0.012    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th>        <td>   -0.0468</td> <td>    0.012</td> <td>   -3.926</td> <td> 0.000</td> <td>   -0.070    -0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mager9</th>           <td>   -0.0019</td> <td>    0.001</td> <td>   -1.970</td> <td> 0.049</td> <td>   -0.004 -9.69e-06</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3155440\n",
       "Model:                          Logit   Df Residuals:                  3155432\n",
       "Method:                           MLE   Df Model:                            7\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               8.440e-05\n",
       "Time:                        14:31:14   Log-Likelihood:            -2.1860e+06\n",
       "converged:                       True   LL-Null:                   -2.1862e+06\n",
       "                                        LLR p-value:                 1.043e-75\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0656      0.005     14.344      0.000         0.057     0.075\n",
       "C(fbrace)[T.2.0]    -0.0300      0.003     -9.123      0.000        -0.036    -0.024\n",
       "C(fbrace)[T.3.0]    -0.0351      0.011     -3.200      0.001        -0.057    -0.014\n",
       "C(fbrace)[T.4.0]     0.0062      0.004      1.413      0.158        -0.002     0.015\n",
       "fhisp               -0.0176      0.003     -5.974      0.000        -0.023    -0.012\n",
       "previs              -0.0110      0.001    -16.456      0.000        -0.012    -0.010\n",
       "no_previs           -0.0468      0.012     -3.926      0.000        -0.070    -0.023\n",
       "mager9              -0.0019      0.001     -1.970      0.049        -0.004 -9.69e-06\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + mager9')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So does the father's age (`fagerrec11`).  But both age effects are small and only borderline significant (p = 0.049 for the mother's age, p = 0.023 for the father's)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692775\n",
      "         Iterations 3\n",
      "C(fbrace)[T.2.0]       106.9   103.7 *\n",
      "C(fbrace)[T.3.0]       106.9   103.2 *\n",
      "C(fbrace)[T.4.0]       106.9   107.6  \n",
      "fhisp                  106.9   105.0 *\n",
      "previs                 106.9   105.7 *\n",
      "no_previs              106.9   101.8 *\n",
      "fagerrec11             106.9   106.7 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>3148537</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>3148529</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     7</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>8.517e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:32:34</td>     <th>  Log-Likelihood:    </th> <td>-2.1812e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-2.1814e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>2.924e-76</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>            <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>        <td>    0.0663</td> <td>    0.004</td> <td>   15.399</td> <td> 0.000</td> <td>    0.058     0.075</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.2.0]</th> <td>   -0.0299</td> <td>    0.003</td> <td>   -9.100</td> <td> 0.000</td> <td>   -0.036    -0.023</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.3.0]</th> <td>   -0.0348</td> <td>    0.011</td> <td>   -3.170</td> <td> 0.002</td> <td>   -0.056    -0.013</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(fbrace)[T.4.0]</th> <td>    0.0067</td> <td>    0.004</td> <td>    1.518</td> <td> 0.129</td> <td>   -0.002     0.015</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fhisp</th>            <td>   -0.0176</td> <td>    0.003</td> <td>   -5.974</td> <td> 0.000</td> <td>   -0.023    -0.012</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>           <td>   -0.0110</td> <td>    0.001</td> <td>  -16.545</td> <td> 0.000</td> <td>   -0.012    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th>        <td>   -0.0483</td> <td>    0.012</td> <td>   -4.039</td> <td> 0.000</td> <td>   -0.072    -0.025</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>fagerrec11</th>       <td>   -0.0019</td> <td>    0.001</td> <td>   -2.278</td> <td> 0.023</td> <td>   -0.003    -0.000</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              3148537\n",
       "Model:                          Logit   Df Residuals:                  3148529\n",
       "Method:                           MLE   Df Model:                            7\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               8.517e-05\n",
       "Time:                        14:32:34   Log-Likelihood:            -2.1812e+06\n",
       "converged:                       True   LL-Null:                   -2.1814e+06\n",
       "                                        LLR p-value:                 2.924e-76\n",
       "====================================================================================\n",
       "                       coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------------\n",
       "Intercept            0.0663      0.004     15.399      0.000         0.058     0.075\n",
       "C(fbrace)[T.2.0]    -0.0299      0.003     -9.100      0.000        -0.036    -0.023\n",
       "C(fbrace)[T.3.0]    -0.0348      0.011     -3.170      0.002        -0.056    -0.013\n",
       "C(fbrace)[T.4.0]     0.0067      0.004      1.518      0.129        -0.002     0.015\n",
       "fhisp               -0.0176      0.003     -5.974      0.000        -0.023    -0.012\n",
       "previs              -0.0110      0.001    -16.545      0.000        -0.012    -0.010\n",
       "no_previs           -0.0483      0.012     -4.039      0.000        -0.072    -0.025\n",
       "fagerrec11          -0.0019      0.001     -2.278      0.023        -0.003    -0.000\n",
       "====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + fagerrec11')\n",
    "model = smf.logit(formula, data=df)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What's up with prenatal visits?\n",
    "\n",
    "The predictive power of prenatal visits is still surprising to me.  To make sure we've controlled for race, I'll select cases where both parents are white:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2400787"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "white = df[(df.mbrace==1) & (df.fbrace==1)]\n",
    "len(white)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And compute sex ratios for each level of `previs`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 111,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>boy</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>previs</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>-6</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-5</th>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-4</th>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-3</th>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-2</th>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>-1</th>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>103</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        boy\n",
       "previs     \n",
       "-6      107\n",
       "-5      110\n",
       "-4      108\n",
       "-3      110\n",
       "-2      108\n",
       "-1      107\n",
       " 0      105\n",
       " 1      103\n",
       " 2      103\n",
       " 3      102\n",
       " 4      103"
      ]
     },
     "execution_count": 111,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "var = 'previs'\n",
    "white[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
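    "The effect holds up.  People with fewer than average prenatal visits are substantially more likely to have boys: the sex ratio is 107 to 110 for negative values of `previs`, compared with 102 to 103 for positive values."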
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692749\n",
      "         Iterations 3\n",
      "previs                 105.5   104.3 *\n",
      "no_previs              105.5   100.4 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2346785</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2346782</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>6.418e-05</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>14:40:39</td>     <th>  Log-Likelihood:    </th> <td>-1.6257e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6258e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>4.790e-46</td> \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0534</td> <td>    0.001</td> <td>   40.728</td> <td> 0.000</td> <td>    0.051     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>previs</th>    <td>   -0.0113</td> <td>    0.001</td> <td>  -14.378</td> <td> 0.000</td> <td>   -0.013    -0.010</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>no_previs</th> <td>   -0.0490</td> <td>    0.015</td> <td>   -3.352</td> <td> 0.001</td> <td>   -0.078    -0.020</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2346785\n",
       "Model:                          Logit   Df Residuals:                  2346782\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               6.418e-05\n",
       "Time:                        14:40:39   Log-Likelihood:            -1.6257e+06\n",
       "converged:                       True   LL-Null:                   -1.6258e+06\n",
       "                                        LLR p-value:                 4.790e-46\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0534      0.001     40.728      0.000         0.051     0.056\n",
       "previs        -0.0113      0.001    -14.378      0.000        -0.013    -0.010\n",
       "no_previs     -0.0490      0.015     -3.352      0.001        -0.078    -0.020\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 112,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Logistic regression of boy on previs and no_previs together.\n",
    "formula = 'boy ~ previs + no_previs'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0.053449172473506806, -0.011302385985286368)"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pull out the fitted intercept and the previs slope (both on the log-odds scale).\n",
    "inter = results.params['Intercept']\n",
    "slope = results.params['previs']\n",
    "inter, slope"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 114,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 111.62346508,  110.36895641,  109.12854687,  107.90207798,\n",
       "        106.68939307,  105.49033723,  104.30475728,  103.13250177,\n",
       "        101.97342096,  100.82736677])"
      ]
     },
     "execution_count": 114,
     "metadata": {},
     "output_type": "execute_result"
    }
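   ],
   "source": [
    "# Predicted sex ratio (boys per 100 girls) for previs from -5 to 4,\n",
    "# leaving out the no_previs term (that is, holding no_previs at 0).\n",
    "previs = np.arange(-5, 5)\n",
    "logodds = inter + slope * previs\n",
    "odds = np.exp(logodds)\n",
    "odds * 100"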
   ]
  },
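  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To spell out the arithmetic in the cell above, here is a sketch that assumes `no_previs` is held at 0 and writes $x$ for `previs`; the symbols $p$, $x$, $\\beta_0$, and $\\beta_1$ are notation introduced here for illustration.  The logit model says\n",
    "\n",
    "$$\\log \\frac{p}{1-p} = \\beta_0 + \\beta_1 x$$\n",
    "\n",
    "where $p$ is the probability of a boy.  Exponentiating gives the odds, and multiplying by 100 gives the sex ratio in boys per 100 girls:\n",
    "\n",
    "$$\\mathrm{ratio}(x) = 100 \\, e^{\\beta_0 + \\beta_1 x}$$\n",
    "\n",
    "With the fitted values $\\beta_0 \\approx 0.0534$ and $\\beta_1 \\approx -0.0113$:\n",
    "\n",
    "$$\\mathrm{ratio}(0) = 100 \\, e^{0.0534} \\approx 105.5, \\qquad \\mathrm{ratio}(-5) = 100 \\, e^{0.0534 + 0.0565} \\approx 111.6$$\n",
    "\n",
    "Under this model, each one-unit increase in $x$ multiplies the ratio by $e^{\\beta_1} \\approx 0.989$, a drop of about 1.1%."
   ]
  },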
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692788\n",
      "         Iterations 3\n",
      "dmar                   105.3   105.5  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2400787</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2400785</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>7.406e-08</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:27:21</td>     <th>  Log-Likelihood:    </th> <td>-1.6632e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6632e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.6196</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0518</td> <td>    0.004</td> <td>   13.234</td> <td> 0.000</td> <td>    0.044     0.059</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>dmar</th>      <td>    0.0014</td> <td>    0.003</td> <td>    0.496</td> <td> 0.620</td> <td>   -0.004     0.007</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2400787\n",
       "Model:                          Logit   Df Residuals:                  2400785\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               7.406e-08\n",
       "Time:                        15:27:21   Log-Likelihood:            -1.6632e+06\n",
       "converged:                       True   LL-Null:                   -1.6632e+06\n",
       "                                        LLR p-value:                    0.6196\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0518      0.004     13.234      0.000         0.044     0.059\n",
       "dmar           0.0014      0.003      0.496      0.620        -0.004     0.007\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 116,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Does marital status (dmar) predict the sex of the baby?\n",
    "formula = 'boy ~ dmar'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692788\n",
      "         Iterations 3\n",
      "lowed                  105.6   105.0  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2301234</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2301232</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.759e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:28:01</td>     <th>  Log-Likelihood:    </th> <td>-1.5943e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.5943e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.2180</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0542</td> <td>    0.001</td> <td>   38.603</td> <td> 0.000</td> <td>    0.051     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>lowed</th>     <td>   -0.0051</td> <td>    0.004</td> <td>   -1.232</td> <td> 0.218</td> <td>   -0.013     0.003</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2301234\n",
       "Model:                          Logit   Df Residuals:                  2301232\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               4.759e-07\n",
       "Time:                        15:28:01   Log-Likelihood:            -1.5943e+06\n",
       "converged:                       True   LL-Null:                   -1.5943e+06\n",
       "                                        LLR p-value:                    0.2180\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0542      0.001     38.603      0.000         0.051     0.057\n",
       "lowed         -0.0051      0.004     -1.232      0.218        -0.013     0.003\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Same single-variable model, now with lowed as the explanatory variable.\n",
    "formula = 'boy ~ lowed'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692788\n",
      "         Iterations 3\n",
      "highbo                 105.5   105.6  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2391630</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2391628</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>4.564e-09</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:28:25</td>     <th>  Log-Likelihood:    </th> <td>-1.6569e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6569e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.9021</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0535</td> <td>    0.001</td> <td>   40.493</td> <td> 0.000</td> <td>    0.051     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>highbo</th>    <td>    0.0008</td> <td>    0.006</td> <td>    0.123</td> <td> 0.902</td> <td>   -0.012     0.013</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2391630\n",
       "Model:                          Logit   Df Residuals:                  2391628\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               4.564e-09\n",
       "Time:                        15:28:25   Log-Likelihood:            -1.6569e+06\n",
       "converged:                       True   LL-Null:                   -1.6569e+06\n",
       "                                        LLR p-value:                    0.9021\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0535      0.001     40.493      0.000         0.051     0.056\n",
       "highbo         0.0008      0.006      0.123      0.902        -0.012     0.013\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Same single-variable model, now with highbo as the explanatory variable.\n",
    "formula = 'boy ~ highbo'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692786\n",
      "         Iterations 3\n",
      "wic[T.Y]               105.6   105.3  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2266424</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2266422</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>3.840e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:28:57</td>     <th>  Log-Likelihood:    </th> <td>-1.5701e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.5701e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.2721</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0548</td> <td>    0.002</td> <td>   33.369</td> <td> 0.000</td> <td>    0.052     0.058</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>wic[T.Y]</th>  <td>   -0.0031</td> <td>    0.003</td> <td>   -1.098</td> <td> 0.272</td> <td>   -0.009     0.002</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2266424\n",
       "Model:                          Logit   Df Residuals:                  2266422\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               3.840e-07\n",
       "Time:                        15:28:57   Log-Likelihood:            -1.5701e+06\n",
       "converged:                       True   LL-Null:                   -1.5701e+06\n",
       "                                        LLR p-value:                    0.2721\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0548      0.002     33.369      0.000         0.052     0.058\n",
       "wic[T.Y]      -0.0031      0.003     -1.098      0.272        -0.009     0.002\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
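   ],
   "source": [
    "# wic holds strings, so the formula treats it as categorical and reports\n",
    "# the effect of 'Y' relative to the baseline level.\n",
    "formula = 'boy ~ wic'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"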
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 120,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692788\n",
      "         Iterations 3\n",
      "obese                  105.6   105.3  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2244349</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2244347</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.725e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:29:20</td>     <th>  Log-Likelihood:    </th> <td>-1.5549e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.5549e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.4639</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0542</td> <td>    0.002</td> <td>   35.607</td> <td> 0.000</td> <td>    0.051     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>obese</th>     <td>   -0.0023</td> <td>    0.003</td> <td>   -0.732</td> <td> 0.464</td> <td>   -0.009     0.004</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2244349\n",
       "Model:                          Logit   Df Residuals:                  2244347\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.725e-07\n",
       "Time:                        15:29:20   Log-Likelihood:            -1.5549e+06\n",
       "converged:                       True   LL-Null:                   -1.5549e+06\n",
       "                                        LLR p-value:                    0.4639\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0542      0.002     35.607      0.000         0.051     0.057\n",
       "obese         -0.0023      0.003     -0.732      0.464        -0.009     0.004\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 120,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Same single-variable model, now with obese as the explanatory variable.\n",
    "formula = 'boy ~ obese'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692786\n",
      "         Iterations 3\n",
      "C(pay_rec)[T.2.0]      105.4   105.5  \n",
      "C(pay_rec)[T.3.0]      105.4   107.1 *\n",
      "C(pay_rec)[T.4.0]      105.4   105.3  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2295681</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2295677</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     3</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.666e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:30:06</td>     <th>  Log-Likelihood:    </th> <td>-1.5904e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.5904e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.1511</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "          <td></td>             <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>         <td>    0.0529</td> <td>    0.002</td> <td>   23.356</td> <td> 0.000</td> <td>    0.048     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.2.0]</th> <td>    0.0004</td> <td>    0.003</td> <td>    0.147</td> <td> 0.883</td> <td>   -0.005     0.006</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.3.0]</th> <td>    0.0159</td> <td>    0.007</td> <td>    2.235</td> <td> 0.025</td> <td>    0.002     0.030</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>C(pay_rec)[T.4.0]</th> <td>   -0.0013</td> <td>    0.007</td> <td>   -0.197</td> <td> 0.844</td> <td>   -0.015     0.012</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2295681\n",
       "Model:                          Logit   Df Residuals:                  2295677\n",
       "Method:                           MLE   Df Model:                            3\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.666e-06\n",
       "Time:                        15:30:06   Log-Likelihood:            -1.5904e+06\n",
       "converged:                       True   LL-Null:                   -1.5904e+06\n",
       "                                        LLR p-value:                    0.1511\n",
       "=====================================================================================\n",
       "                        coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "-------------------------------------------------------------------------------------\n",
       "Intercept             0.0529      0.002     23.356      0.000         0.048     0.057\n",
       "C(pay_rec)[T.2.0]     0.0004      0.003      0.147      0.883        -0.005     0.006\n",
       "C(pay_rec)[T.3.0]     0.0159      0.007      2.235      0.025         0.002     0.030\n",
       "C(pay_rec)[T.4.0]    -0.0013      0.007     -0.197      0.844        -0.015     0.012\n",
       "=====================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
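   ],
   "source": [
    "# C() treats pay_rec as categorical, so each level is compared\n",
    "# with the baseline level (1.0).\n",
    "formula = 'boy ~ C(pay_rec)'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"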
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692786\n",
      "         Iterations 3\n",
      "mager9                 107.0   106.7 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2400787</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2400785</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     1</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.516e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:30:32</td>     <th>  Log-Likelihood:    </th> <td>-1.6632e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6632e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>  <td>0.003813</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0677</td> <td>    0.005</td> <td>   13.452</td> <td> 0.000</td> <td>    0.058     0.078</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>mager9</th>    <td>   -0.0032</td> <td>    0.001</td> <td>   -2.893</td> <td> 0.004</td> <td>   -0.005    -0.001</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2400787\n",
       "Model:                          Logit   Df Residuals:                  2400785\n",
       "Method:                           MLE   Df Model:                            1\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.516e-06\n",
       "Time:                        15:30:32   Log-Likelihood:            -1.6632e+06\n",
       "converged:                       True   LL-Null:                   -1.6632e+06\n",
       "                                        LLR p-value:                  0.003813\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0677      0.005     13.452      0.000         0.058     0.078\n",
       "mager9        -0.0032      0.001     -2.893      0.004        -0.005    -0.001\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Logistic regression of baby's sex on mager9 (mother's age, 9-category recode)\n",
    "formula = 'boy ~ mager9'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692787\n",
      "         Iterations 3\n",
      "youngm[T.True]         105.6   105.5  \n",
      "oldm[T.True]           105.6   103.8 *\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2400787</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2400784</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>1.549e-06</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:31:04</td>     <th>  Log-Likelihood:    </th> <td>-1.6632e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6632e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.07608</td>  \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "         <td></td>           <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th>      <td>    0.0542</td> <td>    0.001</td> <td>   40.370</td> <td> 0.000</td> <td>    0.052     0.057</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>youngm[T.True]</th> <td>   -0.0011</td> <td>    0.006</td> <td>   -0.170</td> <td> 0.865</td> <td>   -0.013     0.011</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>oldm[T.True]</th>   <td>   -0.0173</td> <td>    0.008</td> <td>   -2.268</td> <td> 0.023</td> <td>   -0.032    -0.002</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2400787\n",
       "Model:                          Logit   Df Residuals:                  2400784\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               1.549e-06\n",
       "Time:                        15:31:04   Log-Likelihood:            -1.6632e+06\n",
       "converged:                       True   LL-Null:                   -1.6632e+06\n",
       "                                        LLR p-value:                   0.07608\n",
       "==================================================================================\n",
       "                     coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "----------------------------------------------------------------------------------\n",
       "Intercept          0.0542      0.001     40.370      0.000         0.052     0.057\n",
       "youngm[T.True]    -0.0011      0.006     -0.170      0.865        -0.013     0.011\n",
       "oldm[T.True]      -0.0173      0.008     -2.268      0.023        -0.032    -0.002\n",
       "==================================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Replace mager9 with the boolean indicators youngm and oldm\n",
    "formula = 'boy ~ youngm + oldm'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.692787\n",
      "         Iterations 3\n",
      "youngf                 105.5   106.4  \n",
      "oldf                   105.5   105.7  \n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>Logit Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>        <td>boy</td>       <th>  No. Observations:  </th>   <td>2396141</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>               <td>Logit</td>      <th>  Df Residuals:      </th>   <td>2396138</td>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>               <td>MLE</td>       <th>  Df Model:          </th>   <td>     2</td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>          <td>Tue, 17 May 2016</td> <th>  Pseudo R-squ.:     </th>  <td>2.717e-07</td> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>              <td>15:31:50</td>     <th>  Log-Likelihood:    </th> <td>-1.6600e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>converged:</th>           <td>True</td>       <th>  LL-Null:           </th> <td>-1.6600e+06</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th> </th>                      <td> </td>        <th>  LLR p-value:       </th>   <td>0.6370</td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "      <td></td>         <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Intercept</th> <td>    0.0534</td> <td>    0.001</td> <td>   40.229</td> <td> 0.000</td> <td>    0.051     0.056</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>youngf</th>    <td>    0.0082</td> <td>    0.009</td> <td>    0.924</td> <td> 0.355</td> <td>   -0.009     0.026</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>oldf</th>      <td>    0.0018</td> <td>    0.008</td> <td>    0.242</td> <td> 0.809</td> <td>   -0.013     0.017</td>\n",
       "</tr>\n",
       "</table>"
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                           Logit Regression Results                           \n",
       "==============================================================================\n",
       "Dep. Variable:                    boy   No. Observations:              2396141\n",
       "Model:                          Logit   Df Residuals:                  2396138\n",
       "Method:                           MLE   Df Model:                            2\n",
       "Date:                Tue, 17 May 2016   Pseudo R-squ.:               2.717e-07\n",
       "Time:                        15:31:50   Log-Likelihood:            -1.6600e+06\n",
       "converged:                       True   LL-Null:                   -1.6600e+06\n",
       "                                        LLR p-value:                    0.6370\n",
       "==============================================================================\n",
       "                 coef    std err          z      P>|z|      [95.0% Conf. Int.]\n",
       "------------------------------------------------------------------------------\n",
       "Intercept      0.0534      0.001     40.229      0.000         0.051     0.056\n",
       "youngf         0.0082      0.009      0.924      0.355        -0.009     0.026\n",
       "oldf           0.0018      0.008      0.242      0.809        -0.013     0.017\n",
       "==============================================================================\n",
       "\"\"\""
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Same model, using the youngf and oldf indicators instead\n",
    "formula = 'boy ~ youngf + oldf'\n",
    "model = smf.logit(formula, data=white)\n",
    "results = model.fit()\n",
    "summarize(results)\n",
    "results.summary()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}