{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Does Trivers-Willard apply to people?\n", "\n", "This notebook contains a \"one-day paper\", my attempt to pose a research question, answer it, and publish the results in one work day.\n", "\n", "Copyright 2016 Allen B. Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import print_function, division\n", "\n", "import thinkstats2\n", "import thinkplot\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "import statsmodels.formula.api as smf\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trivers-Willard\n", "\n", "[According to Wikipedia](https://en.wikipedia.org/wiki/Trivers%E2%80%93Willard_hypothesis), the Trivers-Willard hypothesis:\n", "\n", ">\"...suggests that female mammals are able to adjust offspring sex ratio in response to their maternal condition. For example, it may predict greater parental investment in males by parents in 'good conditions' and greater investment in females by parents in 'poor conditions' (relative to parents in good condition).\"\n", "\n", "For humans, the hypothesis suggests that people with relatively high social status might be more likely to have boys. Some studies have reported evidence for this hypothesis but, based on my very casual survey, the evidence is not persuasive.\n", "\n", "To test whether the T-W hypothesis holds up in humans, I downloaded [birth data for the nearly 4 million babies born in the U.S. in 2012](http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm#Births).\n", "\n", "I selected variables that seemed likely to be related to social status and used logistic regression to identify variables associated with sex ratio." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Summary of results**\n", "\n", "1. 
Running regression with one variable at a time, many of the variables have a statistically significant effect on sex ratio, with the sign of the effect generally in the direction predicted by T-W.\n", "\n", "2. However, many of the variables are also correlated with race. If we control for either the mother's race or the father's race, or both, most other variables have no additional predictive power.\n", "\n", "3. Contrary to other reports, the age of the parents seems to have no predictive power.\n", "\n", "4. Strangely, the variable that shows the strongest and most consistent relationship with sex ratio is the number of prenatal visits. Although it seems obvious that prenatal visits are a proxy for quality of health care and general socioeconomic status, the sign of the effect is opposite what T-W predicts; that is, more prenatal visits is a strong predictor of lower sex ratio (more girls).\n", "\n", "Following convention, I report sex ratio in terms of boys per 100 girls. The overall sex ratio at birth is about 105; that is, 105 boys are born for every 100 girls." 
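, "\n", "As a quick illustration of this convention, here is a minimal sketch of the conversion between the fraction of births that are boys and the sex ratio. The two function names are hypothetical (they are not defined elsewhere in this notebook), and the counts are the totals that appear in the value counts of the boy column later in the notebook.

```python
def ratio_from_fraction(p_boy):
    '''Convert the fraction of births that are boys to boys per 100 girls.'''
    return 100 * p_boy / (1 - p_boy)


def fraction_from_ratio(ratio):
    '''Convert boys per 100 girls back to the fraction of births that are boys.'''
    return ratio / (100 + ratio)


# Counts taken from the value counts of the boy column later in the notebook.
boys, girls = 2025568, 1935228
p_boy = boys / (boys + girls)

sex_ratio = ratio_from_fraction(p_boy)
print(round(sex_ratio, 1))                 # 104.7, which rounds to 105
print(round(fraction_from_ratio(105), 3))  # 0.512
```

A ratio of 105 corresponds to about 51.2% of births being boys."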
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data cleaning\n", "\n", "Here's how I loaded the data:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "names = ['year', 'mager9', 'restatus', 'mbrace', 'mhisp_r',\n", " 'mar_p', 'dmar', 'meduc', 'fagerrec11', 'fbrace', 'fhisp_r', 'feduc', \n", " 'lbo_rec', 'previs_rec', 'wic', 'height', 'bmi_r', 'pay_rec', 'sex']\n", "colspecs = [(15, 18),\n", " (93, 93),\n", " (138, 138),\n", " (143, 143),\n", " (148, 148),\n", " (152, 152),\n", " (153, 153),\n", " (155, 155),\n", " (186, 187),\n", " (191, 191),\n", " (195, 195),\n", " (197, 197),\n", " (212, 212),\n", " (272, 273),\n", " (281, 281),\n", " (555, 556),\n", " (533, 533),\n", " (413, 413),\n", " (436, 436),\n", " ]\n", "\n", "colspecs = [(start-1, end) for start, end in colspecs]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = None" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " year mager9 restatus mbrace mhisp_r mar_p dmar meduc fagerrec11 \\\n", "0 2012 6 1 1 0 NaN 1 NaN 5 \n", "1 2012 3 1 3 0 NaN 2 NaN 4 \n", "2 2012 2 1 2 0 NaN 2 NaN 3 \n", "3 2012 3 1 1 0 NaN 1 NaN 3 \n", "4 2012 4 1 4 0 NaN 1 NaN 4 \n", "\n", " fbrace fhisp_r feduc lbo_rec previs_rec wic height bmi_r pay_rec \\\n", "0 1 0 NaN 2 6 NaN NaN NaN NaN \n", "1 3 0 NaN 2 5 NaN NaN NaN NaN \n", "2 2 0 NaN 1 7 NaN NaN NaN NaN \n", "3 1 0 NaN 9 7 NaN NaN NaN NaN \n", "4 1 0 NaN 3 7 NaN NaN NaN NaN \n", "\n", " sex \n", "0 M \n", "1 F \n", "2 M \n", "3 M \n", "4 F " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filename = 'Nat2012PublicUS.r20131217.gz'\n", "#df = pd.read_fwf(filename, compression='gzip', header=None, names=names, colspecs=colspecs)\n", "#df.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/downey/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:3066: PerformanceWarning: \n", "your performance may suffer as PyTables will pickle object types that it cannot\n", "map directly to c-types [inferred_type->mixed,key->block2_values] [items->['mar_p', 'wic', 'sex']]\n", "\n", " exec(code_obj, self.user_global_ns, self.user_ns)\n" ] } ], "source": [ "# store the dataframe for faster loading\n", "\n", "#store = pd.HDFStore('store.h5')\n", "#store['births2013'] = df\n", "#store.close()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# load the dataframe\n", "\n", "store = pd.HDFStore('store.h5')\n", "df = store['births2013']\n", "store.close()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def series_to_ratio(series):\n", " \"\"\"Takes a boolean series and computes sex ratio.\n", " \"\"\"\n", " boys = np.mean(series)\n", " return 
np.round(100 * boys / (1-boys)).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I have to recode sex as `0` or `1` to make `logit` happy." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 1935228\n", "1 2025568\n", "Name: boy, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['boy'] = (df.sex=='M').astype(int)\n", "df.boy.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All births are from 2012." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2012 3960796\n", "Name: year, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.year.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's age, in 9 ranges" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 3676\n", "2 305837\n", "3 918221\n", "4 1126139\n", "5 1015784\n", "6 473533\n", "7 109807\n", "8 7187\n", "9 612\n", "Name: mager9, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mager9.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "mager9 \n", "1 112\n", "2 106\n", "3 104\n", "4 105\n", "5 105\n", "6 105\n", "7 105\n", "8 100\n", "9 112" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mager9'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mager9.isnull().mean()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.078144140723228367, 0.029692516352773535)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['youngm'] = df.mager9<=2\n", "df['oldm'] = df.mager9>=7\n", "df.youngm.mean(), df.oldm.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Residence status (1=resident)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 2874513\n", "2 993222\n", "3 85106\n", "4 7955\n", "Name: restatus, dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.restatus.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "restatus \n", "1 105\n", "2 105\n", "3 105\n", "4 108" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'restatus'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's race (1=White, 2=Black, 3=American Indian or Alaskan Native, 4=Asian or Pacific Islander)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 3007229\n", "2 634411\n", "3 46105\n", "4 273051\n", "Name: mbrace, dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mbrace.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "mbrace \n", "1 105\n", "2 103\n", "3 104\n", "4 106" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mbrace'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's Hispanic origin (0=Non-Hispanic)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 3015510\n", "1 562250\n", "2 67192\n", "3 17400\n", "4 131955\n", "5 135597\n", "Name: mhisp_r, dtype: int64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mhisp_r.replace([9], np.nan, inplace=True)\n", "df.mhisp_r.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def copy_null(df, oldvar, newvar):\n", " df.loc[df[oldvar].isnull(), newvar] = np.nan" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.0077994423343186571, 0.23267591269405055)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['mhisp'] = df.mhisp_r > 0\n", "copy_null(df, 'mhisp_r', 'mhisp')\n", "df.mhisp.isnull().mean(), df.mhisp.mean()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "mhisp \n", "0 105\n", "1 104" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mhisp'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Marital status (1=Married)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 2349102\n", "2 1611694\n", "Name: dmar, dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dmar.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "dmar \n", "1 105\n", "2 104" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'dmar'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Paternity acknowledged, if unmarried (Y=yes, N=no, X=not applicable, U=unknown).\n", "\n", "I recode X (not applicable because married) as Y (paternity acknowledged)." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "N 430123\n", "Y 3058398\n", "Name: mar_p, dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mar_p.replace(['U'], np.nan, inplace=True)\n", "df.mar_p.replace(['X'], 'Y', inplace=True)\n", "df.mar_p.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "mar_p \n", "N 103\n", "Y 105" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mar_p'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's education level" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 144045\n", "2 443007\n", "3 858548\n", "4 732444\n", "5 266066\n", "6 644497\n", "7 282351\n", "8 81074\n", "Name: meduc, dtype: int64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.meduc.replace([9], np.nan, inplace=True)\n", "df.meduc.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "meduc \n", "1 103\n", "2 104\n", "3 105\n", "4 105\n", "5 105\n", "6 105\n", "7 105\n", "8 107" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'meduc'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.12844993784077746, 0.17005983722051243)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['lowed'] = df.meduc <= 2\n", "copy_null(df, 'meduc', 'lowed')\n", "df.lowed.isnull().mean(), df.lowed.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's age, in 10 ranges" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 422\n", "2 104428\n", "3 527157\n", "4 871442\n", "5 977564\n", "6 591733\n", "7 257619\n", "8 84016\n", "9 26361\n", "10 11389\n", "Name: fagerrec11, dtype: int64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fagerrec11.replace([11], np.nan, inplace=True)\n", "df.fagerrec11.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "fagerrec11 \n", "1 101\n", "2 106\n", "3 105\n", "4 105\n", "5 105\n", "6 105\n", "7 105\n", "8 104\n", "9 104\n", "10 107" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'fagerrec11'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.12842494286502007, 0.03037254379975731)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['youngf'] = df.fagerrec11<=2\n", "copy_null(df, 'fagerrec11', 'youngf')\n", "df.youngf.isnull().mean(), df.youngf.mean()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.12842494286502007, 0.03527270546801382)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['oldf'] = df.fagerrec11>=8\n", "copy_null(df, 'fagerrec11', 'oldf')\n", "df.oldf.isnull().mean(), df.oldf.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's race" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 2475018\n", "2 469930\n", "3 35175\n", "4 227463\n", "Name: fbrace, dtype: int64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fbrace.replace([9], np.nan, inplace=True)\n", "df.fbrace.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "fbrace \n", "1 105\n", "2 103\n", "3 105\n", "4 107" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'fbrace'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's Hispanic origin (0=non-hispanic, other values indicate country of origin)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 2603738\n", "1 500926\n", "2 57417\n", "3 16953\n", "4 105376\n", "5 116056\n", "Name: fhisp_r, dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fhisp_r.replace([9], np.nan, inplace=True)\n", "df.fhisp_r.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.14146903804184816, 0.23429965187124352)" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['fhisp'] = df.fhisp_r > 0\n", "copy_null(df, 'fhisp_r', 'fhisp')\n", "df.fhisp.isnull().mean(), df.fhisp.mean()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "fhisp \n", "0 105\n", "1 104" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'fhisp'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's education level" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 142003\n", "2 336801\n", "3 852923\n", "4 574580\n", "5 201888\n", "6 544487\n", "7 210268\n", "8 97452\n", "Name: feduc, dtype: int64" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.feduc.replace([9], np.nan, inplace=True)\n", "df.feduc.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "feduc \n", "1 103\n", "2 105\n", "3 105\n", "4 105\n", "5 105\n", "6 105\n", "7 106\n", "8 105" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'feduc'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Live birth order." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 1574534\n", "2 1248053\n", "3 651817\n", "4 276179\n", "5 106197\n", "6 43907\n", "7 19899\n", "8 20268\n", "Name: lbo_rec, dtype: int64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.lbo_rec.replace([9], np.nan, inplace=True)\n", "df.lbo_rec.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "lbo_rec \n", "1 105\n", "2 105\n", "3 104\n", "4 103\n", "5 104\n", "6 101\n", "7 106\n", "8 103" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'lbo_rec'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.0050348465308488492, 0.04828166686713083)" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['highbo'] = df.lbo_rec >= 5\n", "copy_null(df, 'lbo_rec', 'highbo')\n", "df.highbo.isnull().mean(), df.highbo.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Number of prenatal visits, in 11 ranges" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 53862\n", "2 39409\n", "3 90791\n", "4 191909\n", "5 361056\n", "6 809787\n", "7 1023277\n", "8 659674\n", "9 385390\n", "10 98941\n", "11 124582\n", "Name: previs_rec, dtype: int64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.previs_rec.replace([12], np.nan, inplace=True)\n", "df.previs_rec.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df.previs_rec.mean()\n", "df['previs'] = df.previs_rec - 7" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "previs \n", "-6 106\n", "-5 107\n", "-4 107\n", "-3 108\n", "-2 107\n", "-1 106\n", " 0 105\n", " 1 103\n", " 2 102\n", " 3 100\n", " 4 102" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'previs'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.030831681308504656, 0.014031393099395157)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['no_previs'] = df.previs_rec <= 1\n", "copy_null(df, 'previs_rec', 'no_previs')\n", "df.no_previs.isnull().mean(), df.no_previs.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whether the mother is eligible for food stamps" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "N 1820030\n", "Y 1591601\n", "Name: wic, dtype: int64" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.wic.replace(['U'], np.nan, inplace=True)\n", "df.wic.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "wic \n", "N 105\n", "Y 104" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'wic'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's height in inches" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "30 14\n", "31 3\n", "32 2\n", "34 1\n", "36 17\n", "37 5\n", "38 9\n", "39 4\n", "40 13\n", "41 18\n", "42 8\n", "43 8\n", "44 6\n", "45 15\n", "46 9\n", "47 21\n", "48 732\n", "49 505\n", "50 335\n", "51 414\n", "52 480\n", "53 1384\n", "54 1434\n", "55 2561\n", "56 6587\n", "57 17396\n", "58 19343\n", "59 71557\n", "60 190472\n", "61 240815\n", "62 424926\n", "63 442238\n", "64 505897\n", "65 404563\n", "66 390878\n", "67 303110\n", "68 174629\n", "69 116518\n", "70 56687\n", "71 30085\n", "72 14269\n", "73 4971\n", "74 2381\n", "75 895\n", "76 526\n", "77 584\n", "78 1011\n", "Name: height, dtype: int64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.height.replace([99], np.nan, inplace=True)\n", "df.height.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.13443257365438666, 0.03584275286903034)" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['mshort'] = df.height<60\n", "copy_null(df, 'height', 'mshort')\n", "df.mshort.isnull().mean(), df.mshort.mean()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.13443257365438666, 0.03249652309458583)" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['mtall'] = df.height>=70\n", "copy_null(df, 'height', 'mtall')\n", "df.mtall.isnull().mean(), 
df.mtall.mean()" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "mshort \n", "0 105\n", "1 103" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mshort'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " boy\n", "mtall \n", "0 105\n", "1 104" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mtall'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's BMI in 6 ranges" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 129937\n", "2 1573715\n", "3 849357\n", "4 442695\n", "5 206615\n", "6 141411\n", "Name: bmi_r, dtype: int64" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.bmi_r.replace([9], np.nan, inplace=True)\n", "df.bmi_r.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
" ], "text/plain": [ " boy\n", "bmi_r \n", "1 104\n", "2 105\n", "3 105\n", "4 104\n", "5 104\n", "6 104" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'bmi_r'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.15579343142136076, 0.23647872286338908)" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['obese'] = df.bmi_r >= 4\n", "copy_null(df, 'bmi_r', 'obese')\n", "df.obese.isnull().mean(), df.obese.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Payment method (1=Medicaid, 2=Private insurance, 3=Self pay, 4=Other)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 1497162\n", "2 1628336\n", "3 147475\n", "4 174821\n", "Name: pay_rec, dtype: int64" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pay_rec.replace([9], np.nan, inplace=True)\n", "df.pay_rec.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
" ], "text/plain": [ " boy\n", "pay_rec \n", "1 104\n", "2 105\n", "3 105\n", "4 105" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'pay_rec'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sex of baby" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "F 1935228\n", "M 2025568\n", "Name: sex, dtype: int64" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sex.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regression models\n", "\n", "Here are some functions I'll use to interpret the results of logistic regression" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def logodds_to_ratio(logodds):\n", " \"\"\"Convert from log odds to probability.\"\"\"\n", " odds = np.exp(logodds)\n", " return 100 * odds\n", "\n", "def summarize(results):\n", " \"\"\"Summarize parameters in terms of birth ratio.\"\"\"\n", " inter_or = results.params['Intercept']\n", " inter_rat = logodds_to_ratio(inter_or)\n", " \n", " for value, lor in results.params.iteritems():\n", " if value=='Intercept':\n", " continue\n", " \n", " rat = logodds_to_ratio(inter_or + lor)\n", " code = '*' if results.pvalues[value] < 0.05 else ' '\n", " \n", " print('%-20s %0.1f %0.1f' % (value, inter_rat, rat), code)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I'll run models with each variable, one at a time." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's age seems to have no predictive value:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692887\n", " Iterations 3\n", "mager9 104.9 104.8 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3960796\n", "Model: Logit Df Residuals: 3960794\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 5.778e-08\n", "Time: 14:54:16 Log-Likelihood: -2.7444e+06\n", "converged: True LL-Null: -2.7444e+06\n", " LLR p-value: 0.5733\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0475 0.004 13.358 0.000 0.041 0.055\n", "mager9 -0.0005 0.001 -0.563 0.573 -0.002 0.001\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ mager9', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The estimated ratios for young mothers is higher, and the ratio for older mothers is lower, but neither is statistically significant." ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692886\n", " Iterations 3\n", "youngm[T.True] 104.6 105.6 *\n", "oldm[T.True] 104.6 104.4 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3960796\n", "Model: Logit Df Residuals: 3960793\n", "Method: MLE Df Model: 2\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.205e-06\n", "Time: 14:54:22 Log-Likelihood: -2.7444e+06\n", "converged: True LL-Null: -2.7444e+06\n", " LLR p-value: 0.03667\n", "==================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "----------------------------------------------------------------------------------\n", "Intercept 0.0449 0.001 42.231 0.000 0.043 0.047\n", "youngm[T.True] 0.0095 0.004 2.529 0.011 0.002 0.017\n", "oldm[T.True] -0.0020 0.006 -0.334 0.739 -0.014 0.010\n", "==================================================================================\n", "\"\"\"" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ youngm + oldm', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neither does residence status" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692887\n", " Iterations 3\n", "C(restatus)[T.2] 104.6 104.7 \n", "C(restatus)[T.3] 104.6 105.4 \n", "C(restatus)[T.4] 104.6 108.2 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3960796\n", "Model: Logit Df Residuals: 3960792\n", "Method: MLE Df Model: 3\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 6.393e-07\n", "Time: 14:54:48 Log-Likelihood: -2.7444e+06\n", "converged: True LL-Null: -2.7444e+06\n", " LLR p-value: 0.3196\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0452 0.001 38.300 0.000 0.043 0.048\n", "C(restatus)[T.2] 0.0008 0.002 0.338 0.735 -0.004 0.005\n", "C(restatus)[T.3] 0.0078 0.007 1.126 0.260 -0.006 0.021\n", "C(restatus)[T.4] 0.0335 0.022 1.493 0.136 -0.011 0.078\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(restatus)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's race seems to have predictive value. Relative to whites, black and Native American mothers have more girls; Asians have more boys." ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692881\n", " Iterations 3\n", "C(mbrace)[T.2] 104.8 103.3 *\n", "C(mbrace)[T.3] 104.8 104.0 \n", "C(mbrace)[T.4] 104.8 106.3 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3960796\n", "Model: Logit Df Residuals: 3960792\n", "Method: MLE Df Model: 3\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 8.640e-06\n", "Time: 14:55:15 Log-Likelihood: -2.7444e+06\n", "converged: True LL-Null: -2.7444e+06\n", " LLR p-value: 2.829e-10\n", "==================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "----------------------------------------------------------------------------------\n", "Intercept 0.0471 0.001 40.838 0.000 0.045 0.049\n", "C(mbrace)[T.2] -0.0149 0.003 -5.382 0.000 -0.020 -0.009\n", "C(mbrace)[T.3] -0.0075 0.009 -0.799 0.424 -0.026 0.011\n", "C(mbrace)[T.4] 0.0143 0.004 3.567 0.000 0.006 0.022\n", "==================================================================================\n", "\"\"\"" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(mbrace)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hispanic mothers have more girls." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692884\n", " Iterations 3\n", "mhisp 105.0 103.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3929904\n", "Model: Logit Df Residuals: 3929902\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 5.225e-06\n", "Time: 14:55:20 Log-Likelihood: -2.7230e+06\n", "converged: True LL-Null: -2.7230e+06\n", " LLR p-value: 9.580e-08\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0485 0.001 42.133 0.000 0.046 0.051\n", "mhisp -0.0127 0.002 -5.335 0.000 -0.017 -0.008\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ mhisp', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the mother is married or unmarried but paternity is acknowledged, the sex ratio is higher (more boys)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692875\n", " Iterations 3\n", "C(mar_p)[T.Y] 103.4 104.9 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3488521\n", "Model: Logit Df Residuals: 3488519\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 4.062e-06\n", "Time: 14:55:45 Log-Likelihood: -2.4171e+06\n", "converged: True LL-Null: -2.4171e+06\n", " LLR p-value: 9.370e-06\n", "=================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "---------------------------------------------------------------------------------\n", "Intercept 0.0338 0.003 11.071 0.000 0.028 0.040\n", "C(mar_p)[T.Y] 0.0144 0.003 4.431 0.000 0.008 0.021\n", "=================================================================================\n", "\"\"\"" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(mar_p)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Being unmarried predicts more girls." ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692885\n", " Iterations 3\n", "C(dmar)[T.2] 105.0 104.2 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3960796\n", "Model: Logit Df Residuals: 3960794\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 2.561e-06\n", "Time: 14:56:11 Log-Likelihood: -2.7444e+06\n", "converged: True LL-Null: -2.7444e+06\n", " LLR p-value: 0.0001776\n", "================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "--------------------------------------------------------------------------------\n", "Intercept 0.0487 0.001 37.345 0.000 0.046 0.051\n", "C(dmar)[T.2] -0.0077 0.002 -3.749 0.000 -0.012 -0.004\n", "================================================================================\n", "\"\"\"" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(dmar)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each level of mother's education predicts a small increase in the probability of a boy." ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692874\n", " Iterations 3\n", "meduc 103.4 103.7 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3452032\n", "Model: Logit Df Residuals: 3452030\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 5.742e-06\n", "Time: 14:56:15 Log-Likelihood: -2.3918e+06\n", "converged: True LL-Null: -2.3918e+06\n", " LLR p-value: 1.599e-07\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0330 0.003 11.862 0.000 0.028 0.038\n", "meduc 0.0032 0.001 5.241 0.000 0.002 0.004\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ meduc', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692875\n", " Iterations 3\n", "lowed 105.0 103.7 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3452032\n", "Model: Logit Df Residuals: 3452030\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 3.472e-06\n", "Time: 14:56:19 Log-Likelihood: -2.3918e+06\n", "converged: True LL-Null: -2.3918e+06\n", " LLR p-value: 4.594e-05\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0484 0.001 40.975 0.000 0.046 0.051\n", "lowed -0.0117 0.003 -4.075 0.000 -0.017 -0.006\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ lowed', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Older fathers are slightly more likely to have girls (but this apparent effect could be due to chance)." ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692865\n", " Iterations 3\n", "fagerrec11 105.5 105.3 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3452131\n", "Model: Logit Df Residuals: 3452129\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 5.250e-07\n", "Time: 14:56:23 Log-Likelihood: -2.3919e+06\n", "converged: True LL-Null: -2.3919e+06\n", " LLR p-value: 0.1130\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0533 0.004 13.960 0.000 0.046 0.061\n", "fagerrec11 -0.0012 0.001 -1.585 0.113 -0.003 0.000\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ fagerrec11', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692865\n", " Iterations 3\n", "youngf 104.9 105.8 \n", "oldf 104.9 104.2 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3452131\n", "Model: Logit Df Residuals: 3452128\n", "Method: MLE Df Model: 2\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 7.160e-07\n", "Time: 14:56:28 Log-Likelihood: -2.3919e+06\n", "converged: True LL-Null: -2.3919e+06\n", " LLR p-value: 0.1804\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0474 0.001 42.574 0.000 0.045 0.050\n", "youngf 0.0088 0.006 1.405 0.160 -0.003 0.021\n", "oldf -0.0068 0.006 -1.156 0.248 -0.018 0.005\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ youngf + oldf', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Predictions based on father's race are similar to those based on mother's race: more girls for black and Native American fathers; more boys for Asian fathers." ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692850\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.0 103.4 *\n", "C(fbrace)[T.3.0] 105.0 104.7 \n", "C(fbrace)[T.4.0] 105.0 107.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3207586\n", "Model: Logit Df Residuals: 3207582\n", "Method: MLE Df Model: 3\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.138e-05\n", "Time: 14:56:53 Log-Likelihood: -2.2224e+06\n", "converged: True LL-Null: -2.2224e+06\n", " LLR p-value: 6.021e-11\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0492 0.001 38.677 0.000 0.047 0.052\n", "C(fbrace)[T.2.0] -0.0161 0.003 -5.070 0.000 -0.022 -0.010\n", "C(fbrace)[T.3.0] -0.0035 0.011 -0.328 0.743 -0.025 0.018\n", "C(fbrace)[T.4.0] 0.0191 0.004 4.360 0.000 0.011 0.028\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(fbrace)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the father is Hispanic, that predicts more girls." ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692864\n", " Iterations 3\n", "fhisp 105.2 103.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3400466\n", "Model: Logit Df Residuals: 3400464\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 8.006e-06\n", "Time: 14:56:57 Log-Likelihood: -2.3561e+06\n", "converged: True LL-Null: -2.3561e+06\n", " LLR p-value: 8.137e-10\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0508 0.001 41.012 0.000 0.048 0.053\n", "fhisp -0.0157 0.003 -6.142 0.000 -0.021 -0.011\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ fhisp', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's education level might predict more boys, but the apparent effect could be due to chance." ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692855\n", " Iterations 3\n", "feduc 103.9 104.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2960402
Model: Logit Df Residuals: 2960400
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 3.476e-06
Time: 14:57:00 Log-Likelihood: -2.0511e+06
converged: True LL-Null: -2.0511e+06
LLR p-value: 0.0001591
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0379 0.003 12.866 0.000 0.032 0.044
feduc 0.0025 0.001 3.776 0.000 0.001 0.004
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2960402\n", "Model: Logit Df Residuals: 2960400\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 3.476e-06\n", "Time: 14:57:00 Log-Likelihood: -2.0511e+06\n", "converged: True LL-Null: -2.0511e+06\n", " LLR p-value: 0.0001591\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0379 0.003 12.866 0.000 0.032 0.044\n", "feduc 0.0025 0.001 3.776 0.000 0.001 0.004\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ feduc', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Babies with high birth order are slightly more likely to be girls." ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692885\n", " Iterations 3\n", "lbo_rec 105.5 105.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3940854
Model: Logit Df Residuals: 3940852
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 4.164e-06
Time: 14:57:05 Log-Likelihood: -2.7306e+06
converged: True LL-Null: -2.7306e+06
LLR p-value: 1.855e-06
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0536 0.002 27.348 0.000 0.050 0.057
lbo_rec -0.0038 0.001 -4.769 0.000 -0.005 -0.002
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3940854\n", "Model: Logit Df Residuals: 3940852\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 4.164e-06\n", "Time: 14:57:05 Log-Likelihood: -2.7306e+06\n", "converged: True LL-Null: -2.7306e+06\n", " LLR p-value: 1.855e-06\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0536 0.002 27.348 0.000 0.050 0.057\n", "lbo_rec -0.0038 0.001 -4.769 0.000 -0.005 -0.002\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ lbo_rec', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692887\n", " Iterations 3\n", "highbo 104.7 103.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3940854
Model: Logit Df Residuals: 3940852
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 8.626e-07
Time: 14:57:10 Log-Likelihood: -2.7306e+06
converged: True LL-Null: -2.7306e+06
LLR p-value: 0.02997
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0460 0.001 44.570 0.000 0.044 0.048
highbo -0.0102 0.005 -2.171 0.030 -0.019 -0.001
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3940854\n", "Model: Logit Df Residuals: 3940852\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 8.626e-07\n", "Time: 14:57:10 Log-Likelihood: -2.7306e+06\n", "converged: True LL-Null: -2.7306e+06\n", " LLR p-value: 0.02997\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0460 0.001 44.570 0.000 0.044 0.048\n", "highbo -0.0102 0.005 -2.171 0.030 -0.019 -0.001\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ highbo', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Strangely, prenatal visits are associated with an increased probability of girls." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692859\n", " Iterations 3\n", "previs 104.5 103.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3838678
Model: Logit Df Residuals: 3838676
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 4.565e-05
Time: 14:57:15 Log-Likelihood: -2.6597e+06
converged: True LL-Null: -2.6598e+06
LLR p-value: 9.364e-55
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0436 0.001 42.437 0.000 0.042 0.046
previs -0.0086 0.001 -15.583 0.000 -0.010 -0.007
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3838678\n", "Model: Logit Df Residuals: 3838676\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 4.565e-05\n", "Time: 14:57:15 Log-Likelihood: -2.6597e+06\n", "converged: True LL-Null: -2.6598e+06\n", " LLR p-value: 9.364e-55\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0436 0.001 42.437 0.000 0.042 0.046\n", "previs -0.0086 0.001 -15.583 0.000 -0.010 -0.007\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ previs', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect seems to be non-linear at zero, so I'm adding a boolean for no prenatal visits." ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692856\n", " Iterations 3\n", "no_previs 104.5 99.7 *\n", "previs 104.5 103.5 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3838678
Model: Logit Df Residuals: 3838675
Method: MLE Df Model: 2
Date: Wed, 18 May 2016 Pseudo R-squ.: 5.047e-05
Time: 14:57:21 Log-Likelihood: -2.6597e+06
converged: True LL-Null: -2.6598e+06
LLR p-value: 5.053e-59
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0440 0.001 42.713 0.000 0.042 0.046
no_previs -0.0473 0.009 -5.061 0.000 -0.066 -0.029
previs -0.0097 0.001 -16.347 0.000 -0.011 -0.009
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3838678\n", "Model: Logit Df Residuals: 3838675\n", "Method: MLE Df Model: 2\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 5.047e-05\n", "Time: 14:57:21 Log-Likelihood: -2.6597e+06\n", "converged: True LL-Null: -2.6598e+06\n", " LLR p-value: 5.053e-59\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0440 0.001 42.713 0.000 0.042 0.046\n", "no_previs -0.0473 0.009 -5.061 0.000 -0.066 -0.029\n", "previs -0.0097 0.001 -16.347 0.000 -0.011 -0.009\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ no_previs + previs', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the mother received WIC benefits, she is more likely to have a girl." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692878\n", " Iterations 3\n", "wic[T.Y] 105.2 104.2 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3411631
Model: Logit Df Residuals: 3411629
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 3.607e-06
Time: 14:57:47 Log-Likelihood: -2.3638e+06
converged: True LL-Null: -2.3639e+06
LLR p-value: 3.635e-05
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0504 0.001 33.979 0.000 0.047 0.053
wic[T.Y] -0.0090 0.002 -4.130 0.000 -0.013 -0.005
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3411631\n", "Model: Logit Df Residuals: 3411629\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 3.607e-06\n", "Time: 14:57:47 Log-Likelihood: -2.3638e+06\n", "converged: True LL-Null: -2.3639e+06\n", " LLR p-value: 3.635e-05\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0504 0.001 33.979 0.000 0.047 0.053\n", "wic[T.Y] -0.0090 0.002 -4.130 0.000 -0.013 -0.005\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ wic', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's height seems to have no predictive value." ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692877\n", " Iterations 3\n", "height 99.3 99.3 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3428336
Model: Logit Df Residuals: 3428334
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 1.043e-06
Time: 14:57:51 Log-Likelihood: -2.3754e+06
converged: True LL-Null: -2.3754e+06
LLR p-value: 0.02598
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept -0.0075 0.024 -0.309 0.757 -0.055 0.040
height 0.0008 0.000 2.226 0.026 0.000 0.002
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3428336\n", "Model: Logit Df Residuals: 3428334\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.043e-06\n", "Time: 14:57:51 Log-Likelihood: -2.3754e+06\n", "converged: True LL-Null: -2.3754e+06\n", " LLR p-value: 0.02598\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept -0.0075 0.024 -0.309 0.757 -0.055 0.040\n", "height 0.0008 0.000 2.226 0.026 0.000 0.002\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ height', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692876\n", " Iterations 3\n", "mtall 104.8 104.0 \n", "mshort 104.8 103.3 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3428336
Model: Logit Df Residuals: 3428333
Method: MLE Df Model: 2
Date: Wed, 18 May 2016 Pseudo R-squ.: 1.593e-06
Time: 14:57:55 Log-Likelihood: -2.3754e+06
converged: True LL-Null: -2.3754e+06
LLR p-value: 0.02272
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0472 0.001 42.200 0.000 0.045 0.049
mtall -0.0076 0.006 -1.249 0.212 -0.020 0.004
mshort -0.0145 0.006 -2.494 0.013 -0.026 -0.003
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3428336\n", "Model: Logit Df Residuals: 3428333\n", "Method: MLE Df Model: 2\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.593e-06\n", "Time: 14:57:55 Log-Likelihood: -2.3754e+06\n", "converged: True LL-Null: -2.3754e+06\n", " LLR p-value: 0.02272\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0472 0.001 42.200 0.000 0.045 0.049\n", "mtall -0.0076 0.006 -1.249 0.212 -0.020 0.004\n", "mshort -0.0145 0.006 -2.494 0.013 -0.026 -0.003\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ mtall + mshort', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mothers with higher BMI are more likely to have girls." ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692879\n", " Iterations 3\n", "bmi_r 105.4 105.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3343730
Model: Logit Df Residuals: 3343728
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 1.109e-06
Time: 14:57:59 Log-Likelihood: -2.3168e+06
converged: True LL-Null: -2.3168e+06
LLR p-value: 0.02338
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0523 0.003 18.191 0.000 0.047 0.058
bmi_r -0.0021 0.001 -2.267 0.023 -0.004 -0.000
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3343730\n", "Model: Logit Df Residuals: 3343728\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.109e-06\n", "Time: 14:57:59 Log-Likelihood: -2.3168e+06\n", "converged: True LL-Null: -2.3168e+06\n", " LLR p-value: 0.02338\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0523 0.003 18.191 0.000 0.047 0.058\n", "bmi_r -0.0021 0.001 -2.267 0.023 -0.004 -0.000\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ bmi_r', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692878\n", " Iterations 3\n", "obese 104.9 104.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3343730
Model: Logit Df Residuals: 3343728
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 1.833e-06
Time: 14:58:03 Log-Likelihood: -2.3168e+06
converged: True LL-Null: -2.3168e+06
LLR p-value: 0.003567
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0481 0.001 38.389 0.000 0.046 0.051
obese -0.0075 0.003 -2.914 0.004 -0.013 -0.002
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3343730\n", "Model: Logit Df Residuals: 3343728\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.833e-06\n", "Time: 14:58:03 Log-Likelihood: -2.3168e+06\n", "converged: True LL-Null: -2.3168e+06\n", " LLR p-value: 0.003567\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0481 0.001 38.389 0.000 0.046 0.051\n", "obese -0.0075 0.003 -2.914 0.004 -0.013 -0.002\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ obese', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If payment was made by Medicaid, the baby is more likely to be a girl. Private insurance is associated with more boys; self-payment and other payment methods point the same way, but those differences are not statistically significant." ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692877\n", " Iterations 3\n", "C(pay_rec)[T.2.0] 104.4 105.1 *\n", "C(pay_rec)[T.3.0] 104.4 105.3 \n", "C(pay_rec)[T.4.0] 104.4 104.7 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3447794
Model: Logit Df Residuals: 3447790
Method: MLE Df Model: 3
Date: Wed, 18 May 2016 Pseudo R-squ.: 2.074e-06
Time: 14:58:29 Log-Likelihood: -2.3889e+06
converged: True LL-Null: -2.3889e+06
LLR p-value: 0.01934
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0427 0.002 26.107 0.000 0.039 0.046
C(pay_rec)[T.2.0] 0.0067 0.002 2.944 0.003 0.002 0.011
C(pay_rec)[T.3.0] 0.0094 0.005 1.720 0.085 -0.001 0.020
C(pay_rec)[T.4.0] 0.0033 0.005 0.645 0.519 -0.007 0.013
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3447794\n", "Model: Logit Df Residuals: 3447790\n", "Method: MLE Df Model: 3\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 2.074e-06\n", "Time: 14:58:29 Log-Likelihood: -2.3889e+06\n", "converged: True LL-Null: -2.3889e+06\n", " LLR p-value: 0.01934\n", "=====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------\n", "Intercept 0.0427 0.002 26.107 0.000 0.039 0.046\n", "C(pay_rec)[T.2.0] 0.0067 0.002 2.944 0.003 0.002 0.011\n", "C(pay_rec)[T.3.0] 0.0094 0.005 1.720 0.085 -0.001 0.020\n", "C(pay_rec)[T.4.0] 0.0033 0.005 0.645 0.519 -0.007 0.013\n", "=====================================================================================\n", "\"\"\"" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(pay_rec)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding controls\n", "\n", "However, none of the previous results should be taken too seriously. We only tested one variable at a time, and many of these apparent effects disappear when we add control variables.\n", "\n", "In particular, if we control for father's race and Hispanic origin, the mother's race has no additional predictive value." 
] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692846\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.5 103.3 *\n", "C(fbrace)[T.3.0] 105.5 104.1 \n", "C(fbrace)[T.4.0] 105.5 107.0 \n", "C(mbrace)[T.2] 105.5 105.7 \n", "C(mbrace)[T.3] 105.5 106.9 \n", "C(mbrace)[T.4] 105.5 105.6 \n", "fhisp 105.5 104.1 *\n", "mhisp 105.5 105.0 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3184121
Model: Logit Df Residuals: 3184112
Method: MLE Df Model: 8
Date: Wed, 18 May 2016 Pseudo R-squ.: 1.935e-05
Time: 14:59:16 Log-Likelihood: -2.2061e+06
converged: True LL-Null: -2.2061e+06
LLR p-value: 3.988e-15
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0531 0.001 35.736 0.000 0.050 0.056
C(fbrace)[T.2.0] -0.0211 0.006 -3.688 0.000 -0.032 -0.010
C(fbrace)[T.3.0] -0.0125 0.013 -1.002 0.316 -0.037 0.012
C(fbrace)[T.4.0] 0.0142 0.007 1.936 0.053 -0.000 0.029
C(mbrace)[T.2] 0.0022 0.006 0.367 0.714 -0.010 0.014
C(mbrace)[T.3] 0.0140 0.013 1.076 0.282 -0.012 0.040
C(mbrace)[T.4] 0.0013 0.007 0.186 0.853 -0.012 0.015
fhisp -0.0132 0.004 -2.951 0.003 -0.022 -0.004
mhisp -0.0046 0.004 -1.045 0.296 -0.013 0.004
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3184121\n", "Model: Logit Df Residuals: 3184112\n", "Method: MLE Df Model: 8\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.935e-05\n", "Time: 14:59:16 Log-Likelihood: -2.2061e+06\n", "converged: True LL-Null: -2.2061e+06\n", " LLR p-value: 3.988e-15\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0531 0.001 35.736 0.000 0.050 0.056\n", "C(fbrace)[T.2.0] -0.0211 0.006 -3.688 0.000 -0.032 -0.010\n", "C(fbrace)[T.3.0] -0.0125 0.013 -1.002 0.316 -0.037 0.012\n", "C(fbrace)[T.4.0] 0.0142 0.007 1.936 0.053 -0.000 0.029\n", "C(mbrace)[T.2] 0.0022 0.006 0.367 0.714 -0.010 0.014\n", "C(mbrace)[T.3] 0.0140 0.013 1.076 0.282 -0.012 0.040\n", "C(mbrace)[T.4] 0.0013 0.007 0.186 0.853 -0.012 0.015\n", "fhisp -0.0132 0.004 -2.951 0.003 -0.022 -0.004\n", "mhisp -0.0046 0.004 -1.045 0.296 -0.013 0.004\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + C(mbrace) + mhisp')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, once we control for father's race and Hispanic origin, almost every other variable becomes statistically insignificant, including acknowledged paternity." 
] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692837\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.1 103.0 *\n", "C(fbrace)[T.3.0] 105.1 104.0 \n", "C(fbrace)[T.4.0] 105.1 106.6 *\n", "mar_p[T.Y] 105.1 105.6 \n", "fhisp 105.1 103.3 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2798315
Model: Logit Df Residuals: 2798309
Method: MLE Df Model: 5
Date: Wed, 18 May 2016 Pseudo R-squ.: 1.968e-05
Time: 15:00:03 Log-Likelihood: -1.9388e+06
converged: True LL-Null: -1.9388e+06
LLR p-value: 4.935e-15
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0497 0.014 3.433 0.001 0.021 0.078
C(fbrace)[T.2.0] -0.0201 0.003 -5.761 0.000 -0.027 -0.013
C(fbrace)[T.3.0] -0.0104 0.012 -0.858 0.391 -0.034 0.013
C(fbrace)[T.4.0] 0.0144 0.005 3.013 0.003 0.005 0.024
mar_p[T.Y] 0.0045 0.014 0.310 0.757 -0.024 0.033
fhisp -0.0177 0.003 -5.694 0.000 -0.024 -0.012
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2798315\n", "Model: Logit Df Residuals: 2798309\n", "Method: MLE Df Model: 5\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.968e-05\n", "Time: 15:00:03 Log-Likelihood: -1.9388e+06\n", "converged: True LL-Null: -1.9388e+06\n", " LLR p-value: 4.935e-15\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0497 0.014 3.433 0.001 0.021 0.078\n", "C(fbrace)[T.2.0] -0.0201 0.003 -5.761 0.000 -0.027 -0.013\n", "C(fbrace)[T.3.0] -0.0104 0.012 -0.858 0.391 -0.034 0.013\n", "C(fbrace)[T.4.0] 0.0144 0.005 3.013 0.003 0.005 0.024\n", "mar_p[T.Y] 0.0045 0.014 0.310 0.757 -0.024 0.033\n", "fhisp -0.0177 0.003 -5.694 0.000 -0.024 -0.012\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + mar_p')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Being married still predicts more boys." 
] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692846\n", " Iterations 3\n", "C(fbrace)[T.2.0] 104.9 102.7 *\n", "C(fbrace)[T.3.0] 104.9 104.1 \n", "C(fbrace)[T.4.0] 104.9 106.6 *\n", "fhisp 104.9 103.0 *\n", "dmar 104.9 105.3 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3188403
Model: Logit Df Residuals: 3188397
Method: MLE Df Model: 5
Date: Wed, 18 May 2016 Pseudo R-squ.: 1.937e-05
Time: 15:00:29 Log-Likelihood: -2.2091e+06
converged: True LL-Null: -2.2091e+06
LLR p-value: 5.665e-17
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0478 0.003 13.880 0.000 0.041 0.055
C(fbrace)[T.2.0] -0.0209 0.003 -6.174 0.000 -0.028 -0.014
C(fbrace)[T.3.0] -0.0079 0.011 -0.728 0.467 -0.029 0.013
C(fbrace)[T.4.0] 0.0159 0.004 3.589 0.000 0.007 0.025
fhisp -0.0177 0.003 -5.947 0.000 -0.024 -0.012
dmar 0.0043 0.003 1.667 0.096 -0.001 0.009
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3188403\n", "Model: Logit Df Residuals: 3188397\n", "Method: MLE Df Model: 5\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.937e-05\n", "Time: 15:00:29 Log-Likelihood: -2.2091e+06\n", "converged: True LL-Null: -2.2091e+06\n", " LLR p-value: 5.665e-17\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0478 0.003 13.880 0.000 0.041 0.055\n", "C(fbrace)[T.2.0] -0.0209 0.003 -6.174 0.000 -0.028 -0.014\n", "C(fbrace)[T.3.0] -0.0079 0.011 -0.728 0.467 -0.029 0.013\n", "C(fbrace)[T.4.0] 0.0159 0.004 3.589 0.000 0.007 0.025\n", "fhisp -0.0177 0.003 -5.947 0.000 -0.024 -0.012\n", "dmar 0.0043 0.003 1.667 0.096 -0.001 0.009\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + dmar')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of education disappears." 
] }, { "cell_type": "code", "execution_count": 87, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692836\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.6 103.6 *\n", "C(fbrace)[T.3.0] 105.6 104.6 \n", "C(fbrace)[T.4.0] 105.6 107.1 *\n", "fhisp 105.6 103.9 *\n", "lowed 105.6 105.0 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2777435\n", "Model: Logit Df Residuals: 2777429\n", "Method: MLE Df Model: 5\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.992e-05\n", "Time: 15:00:55 Log-Likelihood: -1.9243e+06\n", "converged: True LL-Null: -1.9243e+06\n", " LLR p-value: 4.189e-15\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0546 0.002 34.777 0.000 0.052 0.058\n", "C(fbrace)[T.2.0] -0.0198 0.004 -5.634 0.000 -0.027 -0.013\n", "C(fbrace)[T.3.0] -0.0100 0.012 -0.823 0.410 -0.034 0.014\n", "C(fbrace)[T.4.0] 0.0141 0.005 2.925 0.003 0.005 0.024\n", "fhisp -0.0163 0.003 -4.999 0.000 -0.023 -0.010\n", "lowed -0.0055 0.004 -1.471 0.141 -0.013 0.002\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + lowed')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of birth order disappears." 
] }, { "cell_type": "code", "execution_count": 88, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692847\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.5 103.5 *\n", "C(fbrace)[T.3.0] 105.5 104.7 \n", "C(fbrace)[T.4.0] 105.5 107.1 *\n", "fhisp 105.5 103.8 *\n", "highbo 105.5 104.8 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3175026\n", "Model: Logit Df Residuals: 3175020\n", "Method: MLE Df Model: 5\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.881e-05\n", "Time: 15:01:20 Log-Likelihood: -2.1998e+06\n", "converged: True LL-Null: -2.1998e+06\n", " LLR p-value: 2.209e-16\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0531 0.001 36.240 0.000 0.050 0.056\n", "C(fbrace)[T.2.0] -0.0192 0.003 -5.879 0.000 -0.026 -0.013\n", "C(fbrace)[T.3.0] -0.0074 0.011 -0.683 0.495 -0.029 0.014\n", "C(fbrace)[T.4.0] 0.0154 0.004 3.457 0.001 0.007 0.024\n", "fhisp -0.0163 0.003 -5.586 0.000 -0.022 -0.011\n", "highbo -0.0062 0.005 -1.127 0.260 -0.017 0.005\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + highbo')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "WIC is no longer associated with more girls." 
] }, { "cell_type": "code", "execution_count": 89, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692838\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.5 103.4 *\n", "C(fbrace)[T.3.0] 105.5 104.7 \n", "C(fbrace)[T.4.0] 105.5 107.1 *\n", "wic[T.Y] 105.5 105.6 \n", "fhisp 105.5 103.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2735525\n", "Model: Logit Df Residuals: 2735519\n", "Method: MLE Df Model: 5\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 2.029e-05\n", "Time: 15:02:07 Log-Likelihood: -1.8953e+06\n", "converged: True LL-Null: -1.8953e+06\n", " LLR p-value: 3.710e-15\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0539 0.002 31.172 0.000 0.050 0.057\n", "C(fbrace)[T.2.0] -0.0209 0.004 -5.723 0.000 -0.028 -0.014\n", "C(fbrace)[T.3.0] -0.0078 0.012 -0.636 0.525 -0.032 0.016\n", "C(fbrace)[T.4.0] 0.0148 0.005 3.044 0.002 0.005 0.024\n", "wic[T.Y] 0.0007 0.003 0.264 0.792 -0.004 0.006\n", "fhisp -0.0181 0.003 -5.484 0.000 -0.025 -0.012\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + wic')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of obesity disappears." 
] }, { "cell_type": "code", "execution_count": 90, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692838\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.7 103.5 *\n", "C(fbrace)[T.3.0] 105.7 104.2 \n", "C(fbrace)[T.4.0] 105.7 107.2 *\n", "fhisp 105.7 103.9 *\n", "obese 105.7 105.1 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2686167\n", "Model: Logit Df Residuals: 2686161\n", "Method: MLE Df Model: 5\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 2.202e-05\n", "Time: 15:02:31 Log-Likelihood: -1.8611e+06\n", "converged: True LL-Null: -1.8611e+06\n", " LLR p-value: 3.274e-16\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0552 0.002 32.697 0.000 0.052 0.059\n", "C(fbrace)[T.2.0] -0.0210 0.004 -5.842 0.000 -0.028 -0.014\n", "C(fbrace)[T.3.0] -0.0137 0.012 -1.109 0.267 -0.038 0.011\n", "C(fbrace)[T.4.0] 0.0145 0.005 2.949 0.003 0.005 0.024\n", "fhisp -0.0174 0.003 -5.490 0.000 -0.024 -0.011\n", "obese -0.0052 0.003 -1.770 0.077 -0.011 0.001\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + obese')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of payment method is diminished, but self-payment is still associated with more boys." 
] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692835\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.9 103.6 *\n", "C(fbrace)[T.3.0] 105.9 104.7 \n", "C(fbrace)[T.4.0] 105.9 107.4 *\n", "C(pay_rec)[T.2.0] 105.9 105.3 \n", "C(pay_rec)[T.3.0] 105.9 107.0 \n", "C(pay_rec)[T.4.0] 105.9 105.7 \n", "fhisp 105.9 103.8 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2763347\n", "Model: Logit Df Residuals: 2763339\n", "Method: MLE Df Model: 7\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 2.100e-05\n", "Time: 15:03:17 Log-Likelihood: -1.9145e+06\n", "converged: True LL-Null: -1.9146e+06\n", " LLR p-value: 1.132e-14\n", "=====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------\n", "Intercept 0.0571 0.002 22.914 0.000 0.052 0.062\n", "C(fbrace)[T.2.0] -0.0214 0.004 -5.920 0.000 -0.028 -0.014\n", "C(fbrace)[T.3.0] -0.0113 0.012 -0.915 0.360 -0.035 0.013\n", "C(fbrace)[T.4.0] 0.0142 0.005 2.955 0.003 0.005 0.024\n", "C(pay_rec)[T.2.0] -0.0050 0.003 -1.839 0.066 -0.010 0.000\n", "C(pay_rec)[T.3.0] 0.0103 0.007 1.580 0.114 -0.002 0.023\n", "C(pay_rec)[T.4.0] -0.0016 0.006 -0.274 0.784 -0.013 0.010\n", "fhisp -0.0193 0.003 -5.917 0.000 -0.026 -0.013\n", "=====================================================================================\n", "\"\"\"" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + C(pay_rec)')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But the effect of prenatal visits is still a strong predictor of more girls." 
] }, { "cell_type": "code", "execution_count": 92, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692809\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.5 103.0 *\n", "C(fbrace)[T.3.0] 105.5 104.1 \n", "C(fbrace)[T.4.0] 105.5 107.0 *\n", "fhisp 105.5 103.4 *\n", "previs 105.5 104.4 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3097584\n", "Model: Logit Df Residuals: 3097578\n", "Method: MLE Df Model: 5\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 7.830e-05\n", "Time: 15:03:43 Log-Likelihood: -2.1460e+06\n", "converged: True LL-Null: -2.1462e+06\n", " LLR p-value: 1.719e-70\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0532 0.001 36.168 0.000 0.050 0.056\n", "C(fbrace)[T.2.0] -0.0237 0.003 -7.129 0.000 -0.030 -0.017\n", "C(fbrace)[T.3.0] -0.0129 0.011 -1.170 0.242 -0.035 0.009\n", "C(fbrace)[T.4.0] 0.0141 0.005 3.112 0.002 0.005 0.023\n", "fhisp -0.0193 0.003 -6.533 0.000 -0.025 -0.014\n", "previs -0.0103 0.001 -16.043 0.000 -0.012 -0.009\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the effect is even stronger if we add a boolean to capture the nonlinearity at 0 visits." 
] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692805\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.5 103.1 *\n", "C(fbrace)[T.3.0] 105.5 104.1 \n", "C(fbrace)[T.4.0] 105.5 107.0 *\n", "fhisp 105.5 103.5 *\n", "previs 105.5 104.3 *\n", "no_previs 105.5 99.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3097584\n", "Model: Logit Df Residuals: 3097577\n", "Method: MLE Df Model: 6\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 8.320e-05\n", "Time: 15:04:09 Log-Likelihood: -2.1460e+06\n", "converged: True LL-Null: -2.1462e+06\n", " LLR p-value: 4.542e-74\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0536 0.001 36.382 0.000 0.051 0.057\n", "C(fbrace)[T.2.0] -0.0235 0.003 -7.087 0.000 -0.030 -0.017\n", "C(fbrace)[T.3.0] -0.0131 0.011 -1.188 0.235 -0.035 0.009\n", "C(fbrace)[T.4.0] 0.0139 0.005 3.070 0.002 0.005 0.023\n", "fhisp -0.0191 0.003 -6.468 0.000 -0.025 -0.013\n", "previs -0.0113 0.001 -16.666 0.000 -0.013 -0.010\n", "no_previs -0.0573 0.012 -4.587 0.000 -0.082 -0.033\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More controls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now if we control for father's race and Hispanic origin as well as number of prenatal visits, the effect of marriage disappears." 
] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692808\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.2 102.6 *\n", "C(fbrace)[T.3.0] 105.2 103.8 \n", "C(fbrace)[T.4.0] 105.2 106.7 *\n", "fhisp 105.2 103.1 *\n", "previs 105.2 104.1 *\n", "dmar 105.2 105.4 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3097584\n", "Model: Logit Df Residuals: 3097577\n", "Method: MLE Df Model: 6\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 7.846e-05\n", "Time: 15:04:35 Log-Likelihood: -2.1460e+06\n", "converged: True LL-Null: -2.1462e+06\n", " LLR p-value: 1.058e-69\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0506 0.004 14.449 0.000 0.044 0.057\n", "C(fbrace)[T.2.0] -0.0245 0.003 -7.072 0.000 -0.031 -0.018\n", "C(fbrace)[T.3.0] -0.0136 0.011 -1.227 0.220 -0.035 0.008\n", "C(fbrace)[T.4.0] 0.0142 0.005 3.151 0.002 0.005 0.023\n", "fhisp -0.0198 0.003 -6.561 0.000 -0.026 -0.014\n", "previs -0.0103 0.001 -15.969 0.000 -0.012 -0.009\n", "dmar 0.0022 0.003 0.828 0.408 -0.003 0.007\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + dmar')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of payment method disappears." 
] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692799\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.7 103.1 *\n", "C(fbrace)[T.3.0] 105.7 104.0 \n", "C(fbrace)[T.4.0] 105.7 107.0 *\n", "C(pay_rec)[T.2.0] 105.7 105.6 \n", "C(pay_rec)[T.3.0] 105.7 105.7 \n", "C(pay_rec)[T.4.0] 105.7 105.3 \n", "fhisp 105.7 103.6 *\n", "previs 105.7 104.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2679860\n", "Model: Logit Df Residuals: 2679851\n", "Method: MLE Df Model: 8\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 7.905e-05\n", "Time: 15:05:21 Log-Likelihood: -1.8566e+06\n", "converged: True LL-Null: -1.8568e+06\n", " LLR p-value: 9.714e-59\n", "=====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------\n", "Intercept 0.0553 0.003 21.819 0.000 0.050 0.060\n", "C(fbrace)[T.2.0] -0.0248 0.004 -6.723 0.000 -0.032 -0.018\n", "C(fbrace)[T.3.0] -0.0166 0.012 -1.326 0.185 -0.041 0.008\n", "C(fbrace)[T.4.0] 0.0128 0.005 2.610 0.009 0.003 0.022\n", "C(pay_rec)[T.2.0] -0.0012 0.003 -0.436 0.663 -0.007 0.004\n", "C(pay_rec)[T.3.0] 3.729e-05 0.007 0.006 0.996 -0.013 0.013\n", "C(pay_rec)[T.4.0] -0.0035 0.006 -0.589 0.556 -0.015 0.008\n", "fhisp -0.0203 0.003 -6.114 0.000 -0.027 -0.014\n", "previs -0.0103 0.001 -14.715 0.000 -0.012 -0.009\n", "=====================================================================================\n", "\"\"\"" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + C(pay_rec)')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a version with the addition of a boolean for no prenatal visits." 
] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692805\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.5 103.1 *\n", "C(fbrace)[T.3.0] 105.5 104.1 \n", "C(fbrace)[T.4.0] 105.5 107.0 *\n", "fhisp 105.5 103.5 *\n", "previs 105.5 104.3 *\n", "no_previs 105.5 99.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3097584\n", "Model: Logit Df Residuals: 3097577\n", "Method: MLE Df Model: 6\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 8.320e-05\n", "Time: 15:05:46 Log-Likelihood: -2.1460e+06\n", "converged: True LL-Null: -2.1462e+06\n", " LLR p-value: 4.542e-74\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0536 0.001 36.382 0.000 0.051 0.057\n", "C(fbrace)[T.2.0] -0.0235 0.003 -7.087 0.000 -0.030 -0.017\n", "C(fbrace)[T.3.0] -0.0131 0.011 -1.188 0.235 -0.035 0.009\n", "C(fbrace)[T.4.0] 0.0139 0.005 3.070 0.002 0.005 0.023\n", "fhisp -0.0191 0.003 -6.468 0.000 -0.025 -0.013\n", "previs -0.0113 0.001 -16.666 0.000 -0.013 -0.010\n", "no_previs -0.0573 0.012 -4.587 0.000 -0.082 -0.033\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, surprisingly, the mother's age has a small effect." 
] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692805\n", " Iterations 3\n", "C(fbrace)[T.2.0] 106.2 103.7 *\n", "C(fbrace)[T.3.0] 106.2 104.8 \n", "C(fbrace)[T.4.0] 106.2 107.8 *\n", "fhisp 106.2 104.2 *\n", "previs 106.2 105.0 *\n", "no_previs 106.2 100.3 *\n", "mager9 106.2 106.1 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3097584\n", "Model: Logit Df Residuals: 3097576\n", "Method: MLE Df Model: 7\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 8.378e-05\n", "Time: 15:06:13 Log-Likelihood: -2.1460e+06\n", "converged: True LL-Null: -2.1462e+06\n", " LLR p-value: 1.081e-73\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0603 0.004 13.417 0.000 0.051 0.069\n", "C(fbrace)[T.2.0] -0.0241 0.003 -7.209 0.000 -0.031 -0.018\n", "C(fbrace)[T.3.0] -0.0139 0.011 -1.255 0.209 -0.036 0.008\n", "C(fbrace)[T.4.0] 0.0144 0.005 3.176 0.001 0.006 0.023\n", "fhisp -0.0196 0.003 -6.592 0.000 -0.025 -0.014\n", "previs -0.0113 0.001 -16.525 0.000 -0.013 -0.010\n", "no_previs -0.0571 0.012 -4.578 0.000 -0.082 -0.033\n", "mager9 -0.0015 0.001 -1.573 0.116 -0.003 0.000\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + mager9')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So does the father's age. But both age effects are small and borderline significant." 
] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692804\n", " Iterations 3\n", "C(fbrace)[T.2.0] 106.4 103.8 *\n", "C(fbrace)[T.3.0] 106.4 105.0 \n", "C(fbrace)[T.4.0] 106.4 107.9 *\n", "fhisp 106.4 104.3 *\n", "previs 106.4 105.2 *\n", "no_previs 106.4 100.4 *\n", "fagerrec11 106.4 106.2 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3088740
Model: Logit Df Residuals: 3088732
Method: MLE Df Model: 7
Date: Wed, 18 May 2016 Pseudo R-squ.: 8.510e-05
Time: 15:06:39 Log-Likelihood: -2.1399e+06
converged: True LL-Null: -2.1401e+06
LLR p-value: 1.099e-74
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0620 0.004 14.546 0.000 0.054 0.070
C(fbrace)[T.2.0] -0.0243 0.003 -7.284 0.000 -0.031 -0.018
C(fbrace)[T.3.0] -0.0137 0.011 -1.236 0.217 -0.035 0.008
C(fbrace)[T.4.0] 0.0143 0.005 3.143 0.002 0.005 0.023
fhisp -0.0197 0.003 -6.622 0.000 -0.026 -0.014
previs -0.0113 0.001 -16.639 0.000 -0.013 -0.010
no_previs -0.0581 0.013 -4.637 0.000 -0.083 -0.034
fagerrec11 -0.0017 0.001 -2.082 0.037 -0.003 -0.000
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3088740\n", "Model: Logit Df Residuals: 3088732\n", "Method: MLE Df Model: 7\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 8.510e-05\n", "Time: 15:06:39 Log-Likelihood: -2.1399e+06\n", "converged: True LL-Null: -2.1401e+06\n", " LLR p-value: 1.099e-74\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0620 0.004 14.546 0.000 0.054 0.070\n", "C(fbrace)[T.2.0] -0.0243 0.003 -7.284 0.000 -0.031 -0.018\n", "C(fbrace)[T.3.0] -0.0137 0.011 -1.236 0.217 -0.035 0.008\n", "C(fbrace)[T.4.0] 0.0143 0.005 3.143 0.002 0.005 0.023\n", "fhisp -0.0197 0.003 -6.622 0.000 -0.026 -0.014\n", "previs -0.0113 0.001 -16.639 0.000 -0.013 -0.010\n", "no_previs -0.0581 0.013 -4.637 0.000 -0.083 -0.034\n", "fagerrec11 -0.0017 0.001 -2.082 0.037 -0.003 -0.000\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + fagerrec11')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What's up with prenatal visits?\n", "\n", "The predictive power of prenatal visits is still surprising to me. 
To make sure we've controlled for race, I'll select cases where both parents are white:" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2381977" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "white = df[(df.mbrace==1) & (df.fbrace==1)]\n", "len(white)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And compute sex ratios for each level of `previs`:" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
boy
previs
-6106
-5110
-4108
-3109
-2108
-1107
0105
1103
2102
3100
4103
\n", "
" ], "text/plain": [ " boy\n", "previs \n", "-6 106\n", "-5 110\n", "-4 108\n", "-3 109\n", "-2 108\n", "-1 107\n", " 0 105\n", " 1 103\n", " 2 102\n", " 3 100\n", " 4 103" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'previs'\n", "white[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect holds up. People with fewer than average prenatal visits are substantially more likely to have boys." ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692804\n", " Iterations 3\n", "previs 105.1 103.8 *\n", "no_previs 105.1 98.9 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2320227
Model: Logit Df Residuals: 2320224
Method: MLE Df Model: 2
Date: Wed, 18 May 2016 Pseudo R-squ.: 6.584e-05
Time: 15:06:43 Log-Likelihood: -1.6075e+06
converged: True LL-Null: -1.6076e+06
LLR p-value: 1.073e-46
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0493 0.001 37.359 0.000 0.047 0.052
previs -0.0116 0.001 -14.535 0.000 -0.013 -0.010
no_previs -0.0608 0.015 -3.966 0.000 -0.091 -0.031
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2320227\n", "Model: Logit Df Residuals: 2320224\n", "Method: MLE Df Model: 2\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 6.584e-05\n", "Time: 15:06:43 Log-Likelihood: -1.6075e+06\n", "converged: True LL-Null: -1.6076e+06\n", " LLR p-value: 1.073e-46\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0493 0.001 37.359 0.000 0.047 0.052\n", "previs -0.0116 0.001 -14.535 0.000 -0.013 -0.010\n", "no_previs -0.0608 0.015 -3.966 0.000 -0.091 -0.031\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ previs + no_previs')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.04929183382635937, -0.011584489975776435)" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inter = results.params['Intercept']\n", "slope = results.params['previs']\n", "inter, slope" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 111.31727637, 110.03516315, 108.76781686, 107.51506742,\n", " 106.2767467 , 105.05268853, 103.84272863, 102.64670462,\n", " 101.46445599, 100.29582409])" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "previs = np.arange(-5, 5)\n", "logodds = inter + slope * previs\n", "odds = 
np.exp(logodds)\n", "odds * 100" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692845\n", " Iterations 3\n", "dmar 105.2 105.1 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2381977
Model: Logit Df Residuals: 2381975
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 3.675e-08
Time: 15:06:46 Log-Likelihood: -1.6503e+06
converged: True LL-Null: -1.6503e+06
LLR p-value: 0.7276
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0505 0.004 12.847 0.000 0.043 0.058
dmar -0.0010 0.003 -0.348 0.728 -0.007 0.005
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2381977\n", "Model: Logit Df Residuals: 2381975\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 3.675e-08\n", "Time: 15:06:46 Log-Likelihood: -1.6503e+06\n", "converged: True LL-Null: -1.6503e+06\n", " LLR p-value: 0.7276\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0505 0.004 12.847 0.000 0.043 0.058\n", "dmar -0.0010 0.003 -0.348 0.728 -0.007 0.005\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ dmar')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692830\n", " Iterations 3\n", "lowed 105.3 103.9 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2089901
Model: Logit Df Residuals: 2089899
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 4.146e-06
Time: 15:06:48 Log-Likelihood: -1.4479e+06
converged: True LL-Null: -1.4480e+06
LLR p-value: 0.0005303
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0520 0.001 35.035 0.000 0.049 0.055
lowed -0.0142 0.004 -3.465 0.001 -0.022 -0.006
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2089901\n", "Model: Logit Df Residuals: 2089899\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 4.146e-06\n", "Time: 15:06:48 Log-Likelihood: -1.4479e+06\n", "converged: True LL-Null: -1.4480e+06\n", " LLR p-value: 0.0005303\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0520 0.001 35.035 0.000 0.049 0.055\n", "lowed -0.0142 0.004 -3.465 0.001 -0.022 -0.006\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ lowed')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692845\n", " Iterations 3\n", "highbo 105.1 104.1 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2373894
Model: Logit Df Residuals: 2373892
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 6.498e-07
Time: 15:06:50 Log-Likelihood: -1.6447e+06
converged: True LL-Null: -1.6447e+06
LLR p-value: 0.1437
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0496 0.001 37.359 0.000 0.047 0.052
highbo -0.0095 0.006 -1.462 0.144 -0.022 0.003
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2373894\n", "Model: Logit Df Residuals: 2373892\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 6.498e-07\n", "Time: 15:06:50 Log-Likelihood: -1.6447e+06\n", "converged: True LL-Null: -1.6447e+06\n", " LLR p-value: 0.1437\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0496 0.001 37.359 0.000 0.047 0.052\n", "highbo -0.0095 0.006 -1.462 0.144 -0.022 0.003\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ highbo')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692836\n", " Iterations 3\n", "wic[T.Y] 105.3 104.8 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2059437
Model: Logit Df Residuals: 2059435
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 1.267e-06
Time: 15:07:06 Log-Likelihood: -1.4269e+06
converged: True LL-Null: -1.4269e+06
LLR p-value: 0.05720
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0519 0.002 29.448 0.000 0.048 0.055
wic[T.Y] -0.0055 0.003 -1.902 0.057 -0.011 0.000
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2059437\n", "Model: Logit Df Residuals: 2059435\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 1.267e-06\n", "Time: 15:07:06 Log-Likelihood: -1.4269e+06\n", "converged: True LL-Null: -1.4269e+06\n", " LLR p-value: 0.05720\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0519 0.002 29.448 0.000 0.048 0.055\n", "wic[T.Y] -0.0055 0.003 -1.902 0.057 -0.011 0.000\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ wic')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692834\n", " Iterations 3\n", "obese 105.2 104.8 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2029161
Model: Logit Df Residuals: 2029159
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 4.153e-07
Time: 15:07:08 Log-Likelihood: -1.4059e+06
converged: True LL-Null: -1.4059e+06
LLR p-value: 0.2798
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0509 0.002 31.979 0.000 0.048 0.054
obese -0.0037 0.003 -1.081 0.280 -0.010 0.003
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2029161\n", "Model: Logit Df Residuals: 2029159\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 4.153e-07\n", "Time: 15:07:08 Log-Likelihood: -1.4059e+06\n", "converged: True LL-Null: -1.4059e+06\n", " LLR p-value: 0.2798\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0509 0.002 31.979 0.000 0.048 0.054\n", "obese -0.0037 0.003 -1.081 0.280 -0.010 0.003\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ obese')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692834\n", " Iterations 3\n", "C(pay_rec)[T.2.0] 105.0 105.2 \n", "C(pay_rec)[T.3.0] 105.0 105.8 \n", "C(pay_rec)[T.4.0] 105.0 104.8 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2077652
Model: Logit Df Residuals: 2077648
Method: MLE Df Model: 3
Date: Wed, 18 May 2016 Pseudo R-squ.: 5.425e-07
Time: 15:07:23 Log-Likelihood: -1.4395e+06
converged: True LL-Null: -1.4395e+06
LLR p-value: 0.6681
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0486 0.002 20.446 0.000 0.044 0.053
C(pay_rec)[T.2.0] 0.0021 0.003 0.684 0.494 -0.004 0.008
C(pay_rec)[T.3.0] 0.0076 0.007 1.036 0.300 -0.007 0.022
C(pay_rec)[T.4.0] -0.0020 0.007 -0.296 0.767 -0.015 0.011
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2077652\n", "Model: Logit Df Residuals: 2077648\n", "Method: MLE Df Model: 3\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 5.425e-07\n", "Time: 15:07:23 Log-Likelihood: -1.4395e+06\n", "converged: True LL-Null: -1.4395e+06\n", " LLR p-value: 0.6681\n", "=====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------\n", "Intercept 0.0486 0.002 20.446 0.000 0.044 0.053\n", "C(pay_rec)[T.2.0] 0.0021 0.003 0.684 0.494 -0.004 0.008\n", "C(pay_rec)[T.3.0] 0.0076 0.007 1.036 0.300 -0.007 0.022\n", "C(pay_rec)[T.4.0] -0.0020 0.007 -0.296 0.767 -0.015 0.011\n", "=====================================================================================\n", "\"\"\"" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(pay_rec)')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692845\n", " Iterations 3\n", "mager9 105.8 105.6 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2381977
Model: Logit Df Residuals: 2381975
Method: MLE Df Model: 1
Date: Wed, 18 May 2016 Pseudo R-squ.: 6.201e-07
Time: 15:07:27 Log-Likelihood: -1.6503e+06
converged: True LL-Null: -1.6503e+06
LLR p-value: 0.1525
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0559 0.005 11.397 0.000 0.046 0.066
mager9 -0.0016 0.001 -1.431 0.153 -0.004 0.001
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2381977\n", "Model: Logit Df Residuals: 2381975\n", "Method: MLE Df Model: 1\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 6.201e-07\n", "Time: 15:07:27 Log-Likelihood: -1.6503e+06\n", "converged: True LL-Null: -1.6503e+06\n", " LLR p-value: 0.1525\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0559 0.005 11.397 0.000 0.046 0.066\n", "mager9 -0.0016 0.001 -1.431 0.153 -0.004 0.001\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ mager9')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692844\n", " Iterations 3\n", "youngm[T.True] 105.0 106.0 \n", "oldm[T.True] 105.0 104.9 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2381977
Model: Logit Df Residuals: 2381974
Method: MLE Df Model: 2
Date: Wed, 18 May 2016 Pseudo R-squ.: 9.503e-07
Time: 15:07:30 Log-Likelihood: -1.6503e+06
converged: True LL-Null: -1.6503e+06
LLR p-value: 0.2084
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0486 0.001 35.884 0.000 0.046 0.051
youngm[T.True] 0.0101 0.006 1.766 0.077 -0.001 0.021
oldm[T.True] -0.0004 0.008 -0.055 0.956 -0.015 0.014
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2381977\n", "Model: Logit Df Residuals: 2381974\n", "Method: MLE Df Model: 2\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 9.503e-07\n", "Time: 15:07:30 Log-Likelihood: -1.6503e+06\n", "converged: True LL-Null: -1.6503e+06\n", " LLR p-value: 0.2084\n", "==================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "----------------------------------------------------------------------------------\n", "Intercept 0.0486 0.001 35.884 0.000 0.046 0.051\n", "youngm[T.True] 0.0101 0.006 1.766 0.077 -0.001 0.021\n", "oldm[T.True] -0.0004 0.008 -0.055 0.956 -0.015 0.014\n", "==================================================================================\n", "\"\"\"" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ youngm + oldm')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692843\n", " Iterations 3\n", "youngf 105.1 105.6 \n", "oldf 105.1 104.0 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2376438
Model: Logit Df Residuals: 2376435
Method: MLE Df Model: 2
Date: Wed, 18 May 2016 Pseudo R-squ.: 7.327e-07
Time: 15:07:34 Log-Likelihood: -1.6465e+06
converged: True LL-Null: -1.6465e+06
LLR p-value: 0.2993
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0495 0.001 37.030 0.000 0.047 0.052
youngf 0.0053 0.008 0.652 0.514 -0.011 0.021
oldf -0.0107 0.008 -1.390 0.164 -0.026 0.004
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2376438\n", "Model: Logit Df Residuals: 2376435\n", "Method: MLE Df Model: 2\n", "Date: Wed, 18 May 2016 Pseudo R-squ.: 7.327e-07\n", "Time: 15:07:34 Log-Likelihood: -1.6465e+06\n", "converged: True LL-Null: -1.6465e+06\n", " LLR p-value: 0.2993\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0495 0.001 37.030 0.000 0.047 0.052\n", "youngf 0.0053 0.008 0.652 0.514 -0.011 0.021\n", "oldf -0.0107 0.008 -1.390 0.164 -0.026 0.004\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ youngf + oldf')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }