{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Does Trivers-Willard apply to people?\n",
"\n",
"This notebook contains a \"one-day paper\", my attempt to pose a research question, answer it, and publish the results in one work day.\n",
"\n",
"Copyright 2016 Allen B. Downey\n",
"\n",
"MIT License: https://opensource.org/licenses/MIT"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from __future__ import print_function, division\n",
"\n",
"import thinkstats2\n",
"import thinkplot\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import statsmodels.formula.api as smf\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trivers-Willard\n",
"\n",
"[According to Wikipedia](https://en.wikipedia.org/wiki/Trivers%E2%80%93Willard_hypothesis), the Trivers-Willard hypothesis:\n",
"\n",
">\"...suggests that female mammals are able to adjust offspring sex ratio in response to their maternal condition. For example, it may predict greater parental investment in males by parents in 'good conditions' and greater investment in females by parents in 'poor conditions' (relative to parents in good condition).\"\n",
"\n",
"For humans, the hypothesis suggests that people with relatively high social status might be more likely to have boys. Some studies have shown evidence for this hypothesis, but based on my very casual survey, it is not persuasive.\n",
"\n",
"To test whether the T-W hypothesis holds up in humans, I downloaded [birth data for the nearly 4 million babies born in the U.S. in 2014](http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm#Births).\n",
"\n",
"I selected variables that seemed likely to be related to social status and used logistic regression to identify variables associated with sex ratio."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Summary of results**\n",
"\n",
"1. Running regression with one variable at a time, many of the variables have a statistically significant effect on sex ratio, with the sign of the effect generally in the direction predicted by T-W.\n",
"\n",
"2. However, many of the variables are also correlated with race. If we control for either the mother's race or the father's race, or both, most other variables have no additional predictive power.\n",
"\n",
"3. Contrary to other reports, the age of the parents seems to have no predictive power.\n",
"\n",
"4. Strangely, the variable that shows the strongest and most consistent relationship with sex ratio is the number of prenatal visits. Although it seems obvious that prenatal visits are a proxy for quality of health care and general socioeconomic status, the sign of the effect is opposite what T-W predicts; that is, more prenatal visits is a strong predictor of lower sex ratio (more girls).\n",
"\n",
"Following convention, I report sex ratio in terms of boys per 100 girls. The overall sex ratio at birth is about 105; that is, 105 boys are born for every 100 girls."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data cleaning\n",
"\n",
"Here's how I loaded the data:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"names = ['year', 'mager9', 'restatus', 'mbrace', 'mhisp_r',\n",
" 'mar_p', 'dmar', 'meduc', 'fagerrec11', 'fbrace', 'fhisp_r', 'feduc', \n",
" 'lbo_rec', 'previs_rec', 'wic', 'height', 'bmi_r', 'pay_rec', 'sex']\n",
"colspecs = [(15, 18),\n",
" (93, 93),\n",
" (138, 138),\n",
" (143, 143),\n",
" (148, 148),\n",
" (152, 152),\n",
" (153, 153),\n",
" (155, 155),\n",
" (186, 187),\n",
" (191, 191),\n",
" (195, 195),\n",
" (197, 197),\n",
" (212, 212),\n",
" (272, 273),\n",
" (281, 281),\n",
" (555, 556),\n",
" (533, 533),\n",
" (413, 413),\n",
" (436, 436),\n",
" ]\n",
"\n",
"colspecs = [(start-1, end) for start, end in colspecs]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = None"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n",
" \n",
" \n",
" | \n",
" year | \n",
" mager9 | \n",
" restatus | \n",
" mbrace | \n",
" mhisp_r | \n",
" mar_p | \n",
" dmar | \n",
" meduc | \n",
" fagerrec11 | \n",
" fbrace | \n",
" fhisp_r | \n",
" feduc | \n",
" lbo_rec | \n",
" previs_rec | \n",
" wic | \n",
" height | \n",
" bmi_r | \n",
" pay_rec | \n",
" sex | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2012 | \n",
" 6 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" NaN | \n",
" 1 | \n",
" NaN | \n",
" 5 | \n",
" 1 | \n",
" 0 | \n",
" NaN | \n",
" 2 | \n",
" 6 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" M | \n",
"
\n",
" \n",
" 1 | \n",
" 2012 | \n",
" 3 | \n",
" 1 | \n",
" 3 | \n",
" 0 | \n",
" NaN | \n",
" 2 | \n",
" NaN | \n",
" 4 | \n",
" 3 | \n",
" 0 | \n",
" NaN | \n",
" 2 | \n",
" 5 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" F | \n",
"
\n",
" \n",
" 2 | \n",
" 2012 | \n",
" 2 | \n",
" 1 | \n",
" 2 | \n",
" 0 | \n",
" NaN | \n",
" 2 | \n",
" NaN | \n",
" 3 | \n",
" 2 | \n",
" 0 | \n",
" NaN | \n",
" 1 | \n",
" 7 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" M | \n",
"
\n",
" \n",
" 3 | \n",
" 2012 | \n",
" 3 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" NaN | \n",
" 1 | \n",
" NaN | \n",
" 3 | \n",
" 1 | \n",
" 0 | \n",
" NaN | \n",
" 9 | \n",
" 7 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" M | \n",
"
\n",
" \n",
" 4 | \n",
" 2012 | \n",
" 4 | \n",
" 1 | \n",
" 4 | \n",
" 0 | \n",
" NaN | \n",
" 1 | \n",
" NaN | \n",
" 4 | \n",
" 1 | \n",
" 0 | \n",
" NaN | \n",
" 3 | \n",
" 7 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" F | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" year mager9 restatus mbrace mhisp_r mar_p dmar meduc fagerrec11 \\\n",
"0 2012 6 1 1 0 NaN 1 NaN 5 \n",
"1 2012 3 1 3 0 NaN 2 NaN 4 \n",
"2 2012 2 1 2 0 NaN 2 NaN 3 \n",
"3 2012 3 1 1 0 NaN 1 NaN 3 \n",
"4 2012 4 1 4 0 NaN 1 NaN 4 \n",
"\n",
" fbrace fhisp_r feduc lbo_rec previs_rec wic height bmi_r pay_rec \\\n",
"0 1 0 NaN 2 6 NaN NaN NaN NaN \n",
"1 3 0 NaN 2 5 NaN NaN NaN NaN \n",
"2 2 0 NaN 1 7 NaN NaN NaN NaN \n",
"3 1 0 NaN 9 7 NaN NaN NaN NaN \n",
"4 1 0 NaN 3 7 NaN NaN NaN NaN \n",
"\n",
" sex \n",
"0 M \n",
"1 F \n",
"2 M \n",
"3 M \n",
"4 F "
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"filename = 'Nat2012PublicUS.r20131217.gz'\n",
"#df = pd.read_fwf(filename, compression='gzip', header=None, names=names, colspecs=colspecs)\n",
"#df.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/downey/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py:3066: PerformanceWarning: \n",
"your performance may suffer as PyTables will pickle object types that it cannot\n",
"map directly to c-types [inferred_type->mixed,key->block2_values] [items->['mar_p', 'wic', 'sex']]\n",
"\n",
" exec(code_obj, self.user_global_ns, self.user_ns)\n"
]
}
],
"source": [
"# store the dataframe for faster loading\n",
"\n",
"#store = pd.HDFStore('store.h5')\n",
"#store['births2013'] = df\n",
"#store.close()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# load the dataframe\n",
"\n",
"store = pd.HDFStore('store.h5')\n",
"df = store['births2013']\n",
"store.close()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def series_to_ratio(series):\n",
" \"\"\"Takes a boolean series and computes sex ratio.\n",
" \"\"\"\n",
" boys = np.mean(series)\n",
" return np.round(100 * boys / (1-boys)).astype(int)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I have to recode sex as `0` or `1` to make `logit` happy."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 1935228\n",
"1 2025568\n",
"Name: boy, dtype: int64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['boy'] = (df.sex=='M').astype(int)\n",
"df.boy.value_counts().sort_index()"
]
},
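{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on these counts (a worked example using only the two numbers above): if $p$ is the fraction of births that are boys, the sex ratio in boys per 100 girls is\n",
"\n",
"$$\\text{ratio} = 100 \\cdot \\frac{p}{1-p} = 100 \\cdot \\frac{2025568}{1935228} \\approx 104.7,$$\n",
"\n",
"which rounds to 105. This is the quantity `series_to_ratio` computes, since $p/(1-p)$ is just the number of boys divided by the number of girls."
]
},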
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All births are from 2014."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2012 3960796\n",
"Name: year, dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.year.value_counts().sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mother's age:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 3676\n",
"2 305837\n",
"3 918221\n",
"4 1126139\n",
"5 1015784\n",
"6 473533\n",
"7 109807\n",
"8 7187\n",
"9 612\n",
"Name: mager9, dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.mager9.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" mager9 | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 112 | \n",
"
\n",
" \n",
" 2 | \n",
" 106 | \n",
"
\n",
" \n",
" 3 | \n",
" 104 | \n",
"
\n",
" \n",
" 4 | \n",
" 105 | \n",
"
\n",
" \n",
" 5 | \n",
" 105 | \n",
"
\n",
" \n",
" 6 | \n",
" 105 | \n",
"
\n",
" \n",
" 7 | \n",
" 105 | \n",
"
\n",
" \n",
" 8 | \n",
" 100 | \n",
"
\n",
" \n",
" 9 | \n",
" 112 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"mager9 \n",
"1 112\n",
"2 106\n",
"3 104\n",
"4 105\n",
"5 105\n",
"6 105\n",
"7 105\n",
"8 100\n",
"9 112"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'mager9'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.mager9.isnull().mean()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.078144140723228367, 0.029692516352773535)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['youngm'] = df.mager9<=2\n",
"df['oldm'] = df.mager9>=7\n",
"df.youngm.mean(), df.oldm.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Residence status (1=resident)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 2874513\n",
"2 993222\n",
"3 85106\n",
"4 7955\n",
"Name: restatus, dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.restatus.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" restatus | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 105 | \n",
"
\n",
" \n",
" 2 | \n",
" 105 | \n",
"
\n",
" \n",
" 3 | \n",
" 105 | \n",
"
\n",
" \n",
" 4 | \n",
" 108 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"restatus \n",
"1 105\n",
"2 105\n",
"3 105\n",
"4 108"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'restatus'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mother's race (1=White, 2=Black, 3=American Indian or Alaskan Native, 4=Asian or Pacific Islander)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 3007229\n",
"2 634411\n",
"3 46105\n",
"4 273051\n",
"Name: mbrace, dtype: int64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.mbrace.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" mbrace | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 105 | \n",
"
\n",
" \n",
" 2 | \n",
" 103 | \n",
"
\n",
" \n",
" 3 | \n",
" 104 | \n",
"
\n",
" \n",
" 4 | \n",
" 106 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"mbrace \n",
"1 105\n",
"2 103\n",
"3 104\n",
"4 106"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'mbrace'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mother's Hispanic origin (0=Non-Hispanic)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 3015510\n",
"1 562250\n",
"2 67192\n",
"3 17400\n",
"4 131955\n",
"5 135597\n",
"Name: mhisp_r, dtype: int64"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.mhisp_r.replace([9], np.nan, inplace=True)\n",
"df.mhisp_r.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def copy_null(df, oldvar, newvar):\n",
" df.loc[df[oldvar].isnull(), newvar] = np.nan"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.0077994423343186571, 0.23267591269405055)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['mhisp'] = df.mhisp_r > 0\n",
"copy_null(df, 'mhisp_r', 'mhisp')\n",
"df.mhisp.isnull().mean(), df.mhisp.mean()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" mhisp | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 105 | \n",
"
\n",
" \n",
" 1 | \n",
" 104 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"mhisp \n",
"0 105\n",
"1 104"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'mhisp'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Marital status (1=Married)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 2349102\n",
"2 1611694\n",
"Name: dmar, dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dmar.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" dmar | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 105 | \n",
"
\n",
" \n",
" 2 | \n",
" 104 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"dmar \n",
"1 105\n",
"2 104"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'dmar'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Paternity acknowledged, if unmarried (Y=yes, N=no, X=not applicable, U=unknown).\n",
"\n",
"I recode X (not applicable because married) as Y (paternity acknowledged)."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"N 430123\n",
"Y 3058398\n",
"Name: mar_p, dtype: int64"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.mar_p.replace(['U'], np.nan, inplace=True)\n",
"df.mar_p.replace(['X'], 'Y', inplace=True)\n",
"df.mar_p.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" mar_p | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" N | \n",
" 103 | \n",
"
\n",
" \n",
" Y | \n",
" 105 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"mar_p \n",
"N 103\n",
"Y 105"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'mar_p'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mother's education level"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 144045\n",
"2 443007\n",
"3 858548\n",
"4 732444\n",
"5 266066\n",
"6 644497\n",
"7 282351\n",
"8 81074\n",
"Name: meduc, dtype: int64"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.meduc.replace([9], np.nan, inplace=True)\n",
"df.meduc.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" meduc | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 103 | \n",
"
\n",
" \n",
" 2 | \n",
" 104 | \n",
"
\n",
" \n",
" 3 | \n",
" 105 | \n",
"
\n",
" \n",
" 4 | \n",
" 105 | \n",
"
\n",
" \n",
" 5 | \n",
" 105 | \n",
"
\n",
" \n",
" 6 | \n",
" 105 | \n",
"
\n",
" \n",
" 7 | \n",
" 105 | \n",
"
\n",
" \n",
" 8 | \n",
" 107 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"meduc \n",
"1 103\n",
"2 104\n",
"3 105\n",
"4 105\n",
"5 105\n",
"6 105\n",
"7 105\n",
"8 107"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'meduc'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.12844993784077746, 0.17005983722051243)"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['lowed'] = df.meduc <= 2\n",
"copy_null(df, 'meduc', 'lowed')\n",
"df.lowed.isnull().mean(), df.lowed.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Father's age, in 10 ranges"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 422\n",
"2 104428\n",
"3 527157\n",
"4 871442\n",
"5 977564\n",
"6 591733\n",
"7 257619\n",
"8 84016\n",
"9 26361\n",
"10 11389\n",
"Name: fagerrec11, dtype: int64"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.fagerrec11.replace([11], np.nan, inplace=True)\n",
"df.fagerrec11.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" fagerrec11 | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 101 | \n",
"
\n",
" \n",
" 2 | \n",
" 106 | \n",
"
\n",
" \n",
" 3 | \n",
" 105 | \n",
"
\n",
" \n",
" 4 | \n",
" 105 | \n",
"
\n",
" \n",
" 5 | \n",
" 105 | \n",
"
\n",
" \n",
" 6 | \n",
" 105 | \n",
"
\n",
" \n",
" 7 | \n",
" 105 | \n",
"
\n",
" \n",
" 8 | \n",
" 104 | \n",
"
\n",
" \n",
" 9 | \n",
" 104 | \n",
"
\n",
" \n",
" 10 | \n",
" 107 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"fagerrec11 \n",
"1 101\n",
"2 106\n",
"3 105\n",
"4 105\n",
"5 105\n",
"6 105\n",
"7 105\n",
"8 104\n",
"9 104\n",
"10 107"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'fagerrec11'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.12842494286502007, 0.03037254379975731)"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['youngf'] = df.fagerrec11<=2\n",
"copy_null(df, 'fagerrec11', 'youngf')\n",
"df.youngf.isnull().mean(), df.youngf.mean()"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.12842494286502007, 0.03527270546801382)"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['oldf'] = df.fagerrec11>=8\n",
"copy_null(df, 'fagerrec11', 'oldf')\n",
"df.oldf.isnull().mean(), df.oldf.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Father's race"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 2475018\n",
"2 469930\n",
"3 35175\n",
"4 227463\n",
"Name: fbrace, dtype: int64"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.fbrace.replace([9], np.nan, inplace=True)\n",
"df.fbrace.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" fbrace | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 105 | \n",
"
\n",
" \n",
" 2 | \n",
" 103 | \n",
"
\n",
" \n",
" 3 | \n",
" 105 | \n",
"
\n",
" \n",
" 4 | \n",
" 107 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"fbrace \n",
"1 105\n",
"2 103\n",
"3 105\n",
"4 107"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'fbrace'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Father's Hispanic origin (0=non-hispanic, other values indicate country of origin)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 2603738\n",
"1 500926\n",
"2 57417\n",
"3 16953\n",
"4 105376\n",
"5 116056\n",
"Name: fhisp_r, dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.fhisp_r.replace([9], np.nan, inplace=True)\n",
"df.fhisp_r.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.14146903804184816, 0.23429965187124352)"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['fhisp'] = df.fhisp_r > 0\n",
"copy_null(df, 'fhisp_r', 'fhisp')\n",
"df.fhisp.isnull().mean(), df.fhisp.mean()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" fhisp | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 105 | \n",
"
\n",
" \n",
" 1 | \n",
" 104 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"fhisp \n",
"0 105\n",
"1 104"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'fhisp'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Father's education level"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 142003\n",
"2 336801\n",
"3 852923\n",
"4 574580\n",
"5 201888\n",
"6 544487\n",
"7 210268\n",
"8 97452\n",
"Name: feduc, dtype: int64"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.feduc.replace([9], np.nan, inplace=True)\n",
"df.feduc.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" feduc | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 103 | \n",
"
\n",
" \n",
" 2 | \n",
" 105 | \n",
"
\n",
" \n",
" 3 | \n",
" 105 | \n",
"
\n",
" \n",
" 4 | \n",
" 105 | \n",
"
\n",
" \n",
" 5 | \n",
" 105 | \n",
"
\n",
" \n",
" 6 | \n",
" 105 | \n",
"
\n",
" \n",
" 7 | \n",
" 106 | \n",
"
\n",
" \n",
" 8 | \n",
" 105 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"feduc \n",
"1 103\n",
"2 105\n",
"3 105\n",
"4 105\n",
"5 105\n",
"6 105\n",
"7 106\n",
"8 105"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'feduc'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Live birth order."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 1574534\n",
"2 1248053\n",
"3 651817\n",
"4 276179\n",
"5 106197\n",
"6 43907\n",
"7 19899\n",
"8 20268\n",
"Name: lbo_rec, dtype: int64"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.lbo_rec.replace([9], np.nan, inplace=True)\n",
"df.lbo_rec.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" lbo_rec | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 105 | \n",
"
\n",
" \n",
" 2 | \n",
" 105 | \n",
"
\n",
" \n",
" 3 | \n",
" 104 | \n",
"
\n",
" \n",
" 4 | \n",
" 103 | \n",
"
\n",
" \n",
" 5 | \n",
" 104 | \n",
"
\n",
" \n",
" 6 | \n",
" 101 | \n",
"
\n",
" \n",
" 7 | \n",
" 106 | \n",
"
\n",
" \n",
" 8 | \n",
" 103 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"lbo_rec \n",
"1 105\n",
"2 105\n",
"3 104\n",
"4 103\n",
"5 104\n",
"6 101\n",
"7 106\n",
"8 103"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'lbo_rec'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.0050348465308488492, 0.04828166686713083)"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['highbo'] = df.lbo_rec >= 5\n",
"copy_null(df, 'lbo_rec', 'highbo')\n",
"df.highbo.isnull().mean(), df.highbo.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Number of prenatal visits, in 11 ranges"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 53862\n",
"2 39409\n",
"3 90791\n",
"4 191909\n",
"5 361056\n",
"6 809787\n",
"7 1023277\n",
"8 659674\n",
"9 385390\n",
"10 98941\n",
"11 124582\n",
"Name: previs_rec, dtype: int64"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.previs_rec.replace([12], np.nan, inplace=True)\n",
"df.previs_rec.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"df.previs_rec.mean()\n",
"df['previs'] = df.previs_rec - 7"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" previs | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" -6 | \n",
" 106 | \n",
"
\n",
" \n",
" -5 | \n",
" 107 | \n",
"
\n",
" \n",
" -4 | \n",
" 107 | \n",
"
\n",
" \n",
" -3 | \n",
" 108 | \n",
"
\n",
" \n",
" -2 | \n",
" 107 | \n",
"
\n",
" \n",
" -1 | \n",
" 106 | \n",
"
\n",
" \n",
" 0 | \n",
" 105 | \n",
"
\n",
" \n",
" 1 | \n",
" 103 | \n",
"
\n",
" \n",
" 2 | \n",
" 102 | \n",
"
\n",
" \n",
" 3 | \n",
" 100 | \n",
"
\n",
" \n",
" 4 | \n",
" 102 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"previs \n",
"-6 106\n",
"-5 107\n",
"-4 107\n",
"-3 108\n",
"-2 107\n",
"-1 106\n",
" 0 105\n",
" 1 103\n",
" 2 102\n",
" 3 100\n",
" 4 102"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'previs'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.030831681308504656, 0.014031393099395157)"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['no_previs'] = df.previs_rec <= 1\n",
"copy_null(df, 'previs_rec', 'no_previs')\n",
"df.no_previs.isnull().mean(), df.no_previs.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Whether the mother is eligible for food stamps"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"N 1820030\n",
"Y 1591601\n",
"Name: wic, dtype: int64"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.wic.replace(['U'], np.nan, inplace=True)\n",
"df.wic.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" wic | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" N | \n",
" 105 | \n",
"
\n",
" \n",
" Y | \n",
" 104 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"wic \n",
"N 105\n",
"Y 104"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'wic'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mother's height in inches"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"30 14\n",
"31 3\n",
"32 2\n",
"34 1\n",
"36 17\n",
"37 5\n",
"38 9\n",
"39 4\n",
"40 13\n",
"41 18\n",
"42 8\n",
"43 8\n",
"44 6\n",
"45 15\n",
"46 9\n",
"47 21\n",
"48 732\n",
"49 505\n",
"50 335\n",
"51 414\n",
"52 480\n",
"53 1384\n",
"54 1434\n",
"55 2561\n",
"56 6587\n",
"57 17396\n",
"58 19343\n",
"59 71557\n",
"60 190472\n",
"61 240815\n",
"62 424926\n",
"63 442238\n",
"64 505897\n",
"65 404563\n",
"66 390878\n",
"67 303110\n",
"68 174629\n",
"69 116518\n",
"70 56687\n",
"71 30085\n",
"72 14269\n",
"73 4971\n",
"74 2381\n",
"75 895\n",
"76 526\n",
"77 584\n",
"78 1011\n",
"Name: height, dtype: int64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.height.replace([99], np.nan, inplace=True)\n",
"df.height.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.13443257365438666, 0.03584275286903034)"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['mshort'] = df.height<60\n",
"copy_null(df, 'height', 'mshort')\n",
"df.mshort.isnull().mean(), df.mshort.mean()"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.13443257365438666, 0.03249652309458583)"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['mtall'] = df.height>=70\n",
"copy_null(df, 'height', 'mtall')\n",
"df.mtall.isnull().mean(), df.mtall.mean()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" boy\n",
"mshort \n",
"0 105\n",
"1 103"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'mshort'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" boy\n",
"mtall \n",
"0 105\n",
"1 104"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'mtall'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mother's BMI in 6 ranges"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 129937\n",
"2 1573715\n",
"3 849357\n",
"4 442695\n",
"5 206615\n",
"6 141411\n",
"Name: bmi_r, dtype: int64"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.bmi_r.replace([9], np.nan, inplace=True)\n",
"df.bmi_r.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" boy\n",
"bmi_r \n",
"1 104\n",
"2 105\n",
"3 105\n",
"4 104\n",
"5 104\n",
"6 104"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'bmi_r'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.15579343142136076, 0.23647872286338908)"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['obese'] = df.bmi_r >= 4\n",
"copy_null(df, 'bmi_r', 'obese')\n",
"df.obese.isnull().mean(), df.obese.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Payment method (1=Medicaid, 2=Private insurance, 3=Self pay, 4=Other)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"1 1497162\n",
"2 1628336\n",
"3 147475\n",
"4 174821\n",
"Name: pay_rec, dtype: int64"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.pay_rec.replace([9], np.nan, inplace=True)\n",
"df.pay_rec.value_counts().sort_index()"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" boy\n",
"pay_rec \n",
"1 104\n",
"2 105\n",
"3 105\n",
"4 105"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'pay_rec'\n",
"df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sex of baby"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"F 1935228\n",
"M 2025568\n",
"Name: sex, dtype: int64"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sex.value_counts().sort_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regression models\n",
"\n",
"Here are some functions I'll use to interpret the results of logistic regression"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def logodds_to_ratio(logodds):\n",
"    \"\"\"Convert log odds of a boy to a sex ratio (boys per 100 girls).\"\"\"\n",
"    odds = np.exp(logodds)\n",
"    return 100 * odds\n",
"\n",
"def summarize(results):\n",
"    \"\"\"Summarize parameters in terms of birth ratio.\"\"\"\n",
"    # the intercept is the log odds of a boy for the baseline group\n",
"    inter_or = results.params['Intercept']\n",
"    inter_rat = logodds_to_ratio(inter_or)\n",
"    \n",
"    # items() works in old and new pandas; iteritems() was removed in pandas 2.0\n",
"    for value, lor in results.params.items():\n",
"        if value=='Intercept':\n",
"            continue\n",
"        \n",
"        # ratio for the baseline shifted by this parameter\n",
"        rat = logodds_to_ratio(inter_or + lor)\n",
"        # flag parameters with p < 0.05\n",
"        code = '*' if results.pvalues[value] < 0.05 else ' '\n",
"        \n",
"        print('%-20s %0.1f %0.1f' % (value, inter_rat, rat), code)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I'll run models with each variable, one at a time."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mother's age seems to have no predictive value:"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692887\n",
" Iterations 3\n",
"mager9 104.9 104.8 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3960796\n",
"Model: Logit Df Residuals: 3960794\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 5.778e-08\n",
"Time: 14:54:16 Log-Likelihood: -2.7444e+06\n",
"converged: True LL-Null: -2.7444e+06\n",
" LLR p-value: 0.5733\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0475 0.004 13.358 0.000 0.041 0.055\n",
"mager9 -0.0005 0.001 -0.563 0.573 -0.002 0.001\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ mager9', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
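{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a check on how to read the line printed by `summarize`, here is a minimal sketch that redoes the conversion by hand from the rounded coefficients in the table above (intercept 0.0475, `mager9` -0.0005). The helper name `ratio_from_logodds` is made up for this illustration; it mirrors `logodds_to_ratio` but uses only the standard library.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def ratio_from_logodds(logodds):\n",
"    \"\"\"Convert log odds of a boy to boys per 100 girls (illustrative helper).\"\"\"\n",
"    return 100 * math.exp(logodds)\n",
"\n",
"# Rounded values copied from the regression table above.\n",
"intercept = 0.0475\n",
"slope = -0.0005\n",
"\n",
"# Ratio at the baseline, and after adding the mager9 coefficient once.\n",
"baseline = ratio_from_logodds(intercept)\n",
"shifted = ratio_from_logodds(intercept + slope)\n",
"\n",
"print('%0.1f %0.1f' % (baseline, shifted))\n",
"```\n",
"\n",
"The two values should round to 104.9 and 104.8, the same pair shown in the summary line, so the columns can be read as boys per 100 girls at the baseline and after adding the coefficient."
]
},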
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The estimated ratio for young mothers is higher, and that difference is statistically significant (p = 0.011). The ratio for older mothers is lower, but not significantly so."
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692886\n",
" Iterations 3\n",
"youngm[T.True] 104.6 105.6 *\n",
"oldm[T.True] 104.6 104.4 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3960796\n",
"Model: Logit Df Residuals: 3960793\n",
"Method: MLE Df Model: 2\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.205e-06\n",
"Time: 14:54:22 Log-Likelihood: -2.7444e+06\n",
"converged: True LL-Null: -2.7444e+06\n",
" LLR p-value: 0.03667\n",
"==================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"----------------------------------------------------------------------------------\n",
"Intercept 0.0449 0.001 42.231 0.000 0.043 0.047\n",
"youngm[T.True] 0.0095 0.004 2.529 0.011 0.002 0.017\n",
"oldm[T.True] -0.0020 0.006 -0.334 0.739 -0.014 0.010\n",
"==================================================================================\n",
"\"\"\""
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ youngm + oldm', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Residence status seems to have no predictive value either."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692887\n",
" Iterations 3\n",
"C(restatus)[T.2] 104.6 104.7 \n",
"C(restatus)[T.3] 104.6 105.4 \n",
"C(restatus)[T.4] 104.6 108.2 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3960796\n",
"Model: Logit Df Residuals: 3960792\n",
"Method: MLE Df Model: 3\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 6.393e-07\n",
"Time: 14:54:48 Log-Likelihood: -2.7444e+06\n",
"converged: True LL-Null: -2.7444e+06\n",
" LLR p-value: 0.3196\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0452 0.001 38.300 0.000 0.043 0.048\n",
"C(restatus)[T.2] 0.0008 0.002 0.338 0.735 -0.004 0.005\n",
"C(restatus)[T.3] 0.0078 0.007 1.126 0.260 -0.006 0.021\n",
"C(restatus)[T.4] 0.0335 0.022 1.493 0.136 -0.011 0.078\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ C(restatus)', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
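"Mother's race seems to have predictive value. Relative to whites, black mothers have more girls and Asian mothers have more boys. The estimate for Native American mothers also points toward more girls, but it is not statistically significant."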
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692881\n",
" Iterations 3\n",
"C(mbrace)[T.2] 104.8 103.3 *\n",
"C(mbrace)[T.3] 104.8 104.0 \n",
"C(mbrace)[T.4] 104.8 106.3 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3960796\n",
"Model: Logit Df Residuals: 3960792\n",
"Method: MLE Df Model: 3\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 8.640e-06\n",
"Time: 14:55:15 Log-Likelihood: -2.7444e+06\n",
"converged: True LL-Null: -2.7444e+06\n",
" LLR p-value: 2.829e-10\n",
"==================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"----------------------------------------------------------------------------------\n",
"Intercept 0.0471 0.001 40.838 0.000 0.045 0.049\n",
"C(mbrace)[T.2] -0.0149 0.003 -5.382 0.000 -0.020 -0.009\n",
"C(mbrace)[T.3] -0.0075 0.009 -0.799 0.424 -0.026 0.011\n",
"C(mbrace)[T.4] 0.0143 0.004 3.567 0.000 0.006 0.022\n",
"==================================================================================\n",
"\"\"\""
]
},
"execution_count": 63,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ C(mbrace)', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hispanic mothers have more girls."
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692884\n",
" Iterations 3\n",
"mhisp 105.0 103.6 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3929904\n",
"Model: Logit Df Residuals: 3929902\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 5.225e-06\n",
"Time: 14:55:20 Log-Likelihood: -2.7230e+06\n",
"converged: True LL-Null: -2.7230e+06\n",
" LLR p-value: 9.580e-08\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0485 0.001 42.133 0.000 0.046 0.051\n",
"mhisp -0.0127 0.002 -5.335 0.000 -0.017 -0.008\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ mhisp', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the mother is married, or unmarried but with paternity acknowledged, the sex ratio is higher (more boys)."
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692875\n",
" Iterations 3\n",
"C(mar_p)[T.Y] 103.4 104.9 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3488521\n",
"Model: Logit Df Residuals: 3488519\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 4.062e-06\n",
"Time: 14:55:45 Log-Likelihood: -2.4171e+06\n",
"converged: True LL-Null: -2.4171e+06\n",
" LLR p-value: 9.370e-06\n",
"=================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"---------------------------------------------------------------------------------\n",
"Intercept 0.0338 0.003 11.071 0.000 0.028 0.040\n",
"C(mar_p)[T.Y] 0.0144 0.003 4.431 0.000 0.008 0.021\n",
"=================================================================================\n",
"\"\"\""
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ C(mar_p)', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Being unmarried predicts more girls."
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692885\n",
" Iterations 3\n",
"C(dmar)[T.2] 105.0 104.2 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3960796\n",
"Model: Logit Df Residuals: 3960794\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 2.561e-06\n",
"Time: 14:56:11 Log-Likelihood: -2.7444e+06\n",
"converged: True LL-Null: -2.7444e+06\n",
" LLR p-value: 0.0001776\n",
"================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"--------------------------------------------------------------------------------\n",
"Intercept 0.0487 0.001 37.345 0.000 0.046 0.051\n",
"C(dmar)[T.2] -0.0077 0.002 -3.749 0.000 -0.012 -0.004\n",
"================================================================================\n",
"\"\"\""
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ C(dmar)', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each level of mother's education predicts a small increase in the probability of a boy."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692874\n",
" Iterations 3\n",
"meduc 103.4 103.7 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3452032\n",
"Model: Logit Df Residuals: 3452030\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 5.742e-06\n",
"Time: 14:56:15 Log-Likelihood: -2.3918e+06\n",
"converged: True LL-Null: -2.3918e+06\n",
" LLR p-value: 1.599e-07\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0330 0.003 11.862 0.000 0.028 0.038\n",
"meduc 0.0032 0.001 5.241 0.000 0.002 0.004\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ meduc', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
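{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a numeric predictor like `meduc`, the summary line only shows the ratio after one step. A rough sketch of how the effect compounds over several steps, using the rounded coefficients from the table above (intercept 0.0330, slope 0.0032), might look like this. The function name `ratio_at` and the step counts beyond 1 are assumptions for illustration, not values taken from the data.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def ratio_at(intercept, slope, steps):\n",
"    \"\"\"Boys per 100 girls after moving `steps` units along a numeric predictor.\"\"\"\n",
"    return 100 * math.exp(intercept + slope * steps)\n",
"\n",
"# Rounded coefficients copied from the regression table above.\n",
"intercept, slope = 0.0330, 0.0032\n",
"\n",
"# steps = 0 and steps = 1 correspond to the two numbers printed by summarize();\n",
"# the larger step counts are hypothetical and only show how the effect adds up.\n",
"for steps in [0, 1, 4, 8]:\n",
"    print(steps, round(ratio_at(intercept, slope, steps), 1))\n",
"```\n",
"\n",
"Because the model is linear in log odds, each additional step multiplies the odds by the same factor, `exp(slope)`."
]
},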
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692875\n",
" Iterations 3\n",
"lowed 105.0 103.7 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3452032\n",
"Model: Logit Df Residuals: 3452030\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 3.472e-06\n",
"Time: 14:56:19 Log-Likelihood: -2.3918e+06\n",
"converged: True LL-Null: -2.3918e+06\n",
" LLR p-value: 4.594e-05\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0484 0.001 40.975 0.000 0.046 0.051\n",
"lowed -0.0117 0.003 -4.075 0.000 -0.017 -0.006\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ lowed', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Older fathers are slightly more likely to have girls (but this apparent effect could be due to chance)."
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692865\n",
" Iterations 3\n",
"fagerrec11 105.5 105.3 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3452131\n",
"Model: Logit Df Residuals: 3452129\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 5.250e-07\n",
"Time: 14:56:23 Log-Likelihood: -2.3919e+06\n",
"converged: True LL-Null: -2.3919e+06\n",
" LLR p-value: 0.1130\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0533 0.004 13.960 0.000 0.046 0.061\n",
"fagerrec11 -0.0012 0.001 -1.585 0.113 -0.003 0.000\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ fagerrec11', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692865\n",
" Iterations 3\n",
"youngf 104.9 105.8 \n",
"oldf 104.9 104.2 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3452131\n",
"Model: Logit Df Residuals: 3452128\n",
"Method: MLE Df Model: 2\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 7.160e-07\n",
"Time: 14:56:28 Log-Likelihood: -2.3919e+06\n",
"converged: True LL-Null: -2.3919e+06\n",
" LLR p-value: 0.1804\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0474 0.001 42.574 0.000 0.045 0.050\n",
"youngf 0.0088 0.006 1.405 0.160 -0.003 0.021\n",
"oldf -0.0068 0.006 -1.156 0.248 -0.018 0.005\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 70,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ youngf + oldf', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Predictions based on father's race are similar to those based on mother's race: more girls for black fathers and more boys for Asian fathers. The estimate for Native American fathers also points toward more girls, but it is not statistically significant."
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692850\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.0 103.4 *\n",
"C(fbrace)[T.3.0] 105.0 104.7 \n",
"C(fbrace)[T.4.0] 105.0 107.1 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3207586\n",
"Model: Logit Df Residuals: 3207582\n",
"Method: MLE Df Model: 3\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.138e-05\n",
"Time: 14:56:53 Log-Likelihood: -2.2224e+06\n",
"converged: True LL-Null: -2.2224e+06\n",
" LLR p-value: 6.021e-11\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0492 0.001 38.677 0.000 0.047 0.052\n",
"C(fbrace)[T.2.0] -0.0161 0.003 -5.070 0.000 -0.022 -0.010\n",
"C(fbrace)[T.3.0] -0.0035 0.011 -0.328 0.743 -0.025 0.018\n",
"C(fbrace)[T.4.0] 0.0191 0.004 4.360 0.000 0.011 0.028\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ C(fbrace)', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
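{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on reading the two numbers printed next to each variable: they appear consistent with sex ratios (boys per 100 girls) obtained by exponentiating the log-odds, first for the intercept alone and then for the intercept plus the coefficient. The sketch below reproduces that arithmetic from the rounded coefficients in the father's-race table above. The helper name `sex_ratio` is hypothetical; it is not the notebook's `summarize` function.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def sex_ratio(log_odds):\n",
"    # Odds of a boy are exp(log_odds); scale to boys per 100 girls.\n",
"    return 100 * math.exp(log_odds)\n",
"\n",
"# Rounded coefficients copied from the father's-race table above.\n",
"intercept = 0.0492\n",
"coefs = {'C(fbrace)[T.2.0]': -0.0161,\n",
"         'C(fbrace)[T.3.0]': -0.0035,\n",
"         'C(fbrace)[T.4.0]': 0.0191}\n",
"\n",
"baseline = sex_ratio(intercept)\n",
"for name, coef in coefs.items():\n",
"    print(name, round(baseline, 1), round(sex_ratio(intercept + coef), 1))\n",
"```\n",
"\n",
"For a continuous variable, the same arithmetic would give the ratio after a one-unit increase from zero, so the second number is best read as a direction and rough size rather than a prediction for a typical case."
]
},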
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the father is Hispanic, that predicts more girls."
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692864\n",
" Iterations 3\n",
"fhisp 105.2 103.6 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3400466\n",
"Model: Logit Df Residuals: 3400464\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 8.006e-06\n",
"Time: 14:56:57 Log-Likelihood: -2.3561e+06\n",
"converged: True LL-Null: -2.3561e+06\n",
" LLR p-value: 8.137e-10\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0508 0.001 41.012 0.000 0.048 0.053\n",
"fhisp -0.0157 0.003 -6.142 0.000 -0.021 -0.011\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ fhisp', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
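"Father's education level predicts slightly more boys. The effect is small, but in this single-variable model it is statistically significant (p < 0.001), so it is unlikely to be due to chance alone."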
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692855\n",
" Iterations 3\n",
"feduc 103.9 104.1 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2960402\n",
"Model: Logit Df Residuals: 2960400\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 3.476e-06\n",
"Time: 14:57:00 Log-Likelihood: -2.0511e+06\n",
"converged: True LL-Null: -2.0511e+06\n",
" LLR p-value: 0.0001591\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0379 0.003 12.866 0.000 0.032 0.044\n",
"feduc 0.0025 0.001 3.776 0.000 0.001 0.004\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ feduc', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
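{
"cell_type": "markdown",
"metadata": {},
"source": [
"The pseudo R-squared in the summary above is tiny, yet the LLR p-value is small. With roughly 3 million observations, a very weak association can still be distinguishable from chance. As a rough check, the likelihood-ratio statistic can be recovered from the rounded values in the summary and converted to a p-value with the standard library alone. This sketch assumes McFadden's pseudo R-squared (1 - LL / LL-Null) and a single degree of freedom; the function name `lr_pvalue_df1` is hypothetical.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def lr_pvalue_df1(pseudo_r2, ll_null):\n",
"    # McFadden: pseudo_r2 = 1 - ll / ll_null, so ll - ll_null = -pseudo_r2 * ll_null.\n",
"    stat = 2 * (-pseudo_r2 * ll_null)\n",
"    # Survival function of a chi-squared variable with 1 degree of freedom.\n",
"    return stat, math.erfc(math.sqrt(stat / 2))\n",
"\n",
"# Rounded values copied from the father's-education summary above.\n",
"stat, p = lr_pvalue_df1(pseudo_r2=3.476e-06, ll_null=-2.0511e+06)\n",
"print(stat, p)\n",
"```\n",
"\n",
"Because the inputs are rounded, the result should only be expected to land near the reported LLR p-value, not to match it exactly."
]
},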
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Babies with high birth order are slightly more likely to be girls."
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692885\n",
" Iterations 3\n",
"lbo_rec 105.5 105.1 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3940854\n",
"Model: Logit Df Residuals: 3940852\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 4.164e-06\n",
"Time: 14:57:05 Log-Likelihood: -2.7306e+06\n",
"converged: True LL-Null: -2.7306e+06\n",
" LLR p-value: 1.855e-06\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0536 0.002 27.348 0.000 0.050 0.057\n",
"lbo_rec -0.0038 0.001 -4.769 0.000 -0.005 -0.002\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ lbo_rec', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692887\n",
" Iterations 3\n",
"highbo 104.7 103.6 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3940854\n",
"Model: Logit Df Residuals: 3940852\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 8.626e-07\n",
"Time: 14:57:10 Log-Likelihood: -2.7306e+06\n",
"converged: True LL-Null: -2.7306e+06\n",
" LLR p-value: 0.02997\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0460 0.001 44.570 0.000 0.044 0.048\n",
"highbo -0.0102 0.005 -2.171 0.030 -0.019 -0.001\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ highbo', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Strangely, prenatal visits are associated with an increased probability of girls."
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692859\n",
" Iterations 3\n",
"previs 104.5 103.6 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3838678\n",
"Model: Logit Df Residuals: 3838676\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 4.565e-05\n",
"Time: 14:57:15 Log-Likelihood: -2.6597e+06\n",
"converged: True LL-Null: -2.6598e+06\n",
" LLR p-value: 9.364e-55\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0436 0.001 42.437 0.000 0.042 0.046\n",
"previs -0.0086 0.001 -15.583 0.000 -0.010 -0.007\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ previs', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect seems to be non-linear at zero, so I'm adding a boolean for no prenatal visits."
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692856\n",
" Iterations 3\n",
"no_previs 104.5 99.7 *\n",
"previs 104.5 103.5 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3838678\n",
"Model: Logit Df Residuals: 3838675\n",
"Method: MLE Df Model: 2\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 5.047e-05\n",
"Time: 14:57:21 Log-Likelihood: -2.6597e+06\n",
"converged: True LL-Null: -2.6598e+06\n",
" LLR p-value: 5.053e-59\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0440 0.001 42.713 0.000 0.042 0.046\n",
"no_previs -0.0473 0.009 -5.061 0.000 -0.066 -0.029\n",
"previs -0.0097 0.001 -16.347 0.000 -0.011 -0.009\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ no_previs + previs', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the mother received WIC food assistance during the pregnancy, she is slightly more likely to have a girl."
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692878\n",
" Iterations 3\n",
"wic[T.Y] 105.2 104.2 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3411631\n",
"Model: Logit Df Residuals: 3411629\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 3.607e-06\n",
"Time: 14:57:47 Log-Likelihood: -2.3638e+06\n",
"converged: True LL-Null: -2.3639e+06\n",
" LLR p-value: 3.635e-05\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0504 0.001 33.979 0.000 0.047 0.053\n",
"wic[T.Y] -0.0090 0.002 -4.130 0.000 -0.013 -0.005\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ wic', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mother's height has little predictive value: the coefficient is positive but very small, and only marginally significant (p = 0.026)."
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692877\n",
" Iterations 3\n",
"height 99.3 99.3 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3428336\n",
"Model: Logit Df Residuals: 3428334\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.043e-06\n",
"Time: 14:57:51 Log-Likelihood: -2.3754e+06\n",
"converged: True LL-Null: -2.3754e+06\n",
" LLR p-value: 0.02598\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept -0.0075 0.024 -0.309 0.757 -0.055 0.040\n",
"height 0.0008 0.000 2.226 0.026 0.000 0.002\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ height', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692876\n",
" Iterations 3\n",
"mtall 104.8 104.0 \n",
"mshort 104.8 103.3 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3428336\n",
"Model: Logit Df Residuals: 3428333\n",
"Method: MLE Df Model: 2\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.593e-06\n",
"Time: 14:57:55 Log-Likelihood: -2.3754e+06\n",
"converged: True LL-Null: -2.3754e+06\n",
" LLR p-value: 0.02272\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0472 0.001 42.200 0.000 0.045 0.049\n",
"mtall -0.0076 0.006 -1.249 0.212 -0.020 0.004\n",
"mshort -0.0145 0.006 -2.494 0.013 -0.026 -0.003\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ mtall + mshort', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Mothers with higher BMI are more likely to have girls."
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692879\n",
" Iterations 3\n",
"bmi_r 105.4 105.1 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3343730\n",
"Model: Logit Df Residuals: 3343728\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.109e-06\n",
"Time: 14:57:59 Log-Likelihood: -2.3168e+06\n",
"converged: True LL-Null: -2.3168e+06\n",
" LLR p-value: 0.02338\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0523 0.003 18.191 0.000 0.047 0.058\n",
"bmi_r -0.0021 0.001 -2.267 0.023 -0.004 -0.000\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ bmi_r', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692878\n",
" Iterations 3\n",
"obese 104.9 104.1 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3343730\n",
"Model: Logit Df Residuals: 3343728\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.833e-06\n",
"Time: 14:58:03 Log-Likelihood: -2.3168e+06\n",
"converged: True LL-Null: -2.3168e+06\n",
" LLR p-value: 0.003567\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0481 0.001 38.389 0.000 0.046 0.051\n",
"obese -0.0075 0.003 -2.914 0.004 -0.013 -0.002\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ obese', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If payment was made by Medicaid, the baby is more likely to be a girl. Private insurance, self-payment, and other payment methods are associated with more boys, although only the difference for private insurance is statistically significant."
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692877\n",
" Iterations 3\n",
"C(pay_rec)[T.2.0] 104.4 105.1 *\n",
"C(pay_rec)[T.3.0] 104.4 105.3 \n",
"C(pay_rec)[T.4.0] 104.4 104.7 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3447794\n",
"Model: Logit Df Residuals: 3447790\n",
"Method: MLE Df Model: 3\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 2.074e-06\n",
"Time: 14:58:29 Log-Likelihood: -2.3889e+06\n",
"converged: True LL-Null: -2.3889e+06\n",
" LLR p-value: 0.01934\n",
"=====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"-------------------------------------------------------------------------------------\n",
"Intercept 0.0427 0.002 26.107 0.000 0.039 0.046\n",
"C(pay_rec)[T.2.0] 0.0067 0.002 2.944 0.003 0.002 0.011\n",
"C(pay_rec)[T.3.0] 0.0094 0.005 1.720 0.085 -0.001 0.020\n",
"C(pay_rec)[T.4.0] 0.0033 0.005 0.645 0.519 -0.007 0.013\n",
"=====================================================================================\n",
"\"\"\""
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = smf.logit('boy ~ C(pay_rec)', data=df) \n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding controls\n",
"\n",
"However, none of the previous results should be taken too seriously. We only tested one variable at a time, and many of these apparent effects disappear when we add control variables.\n",
"\n",
"In particular, if we control for father's race and Hispanic origin, the mother's race has no additional predictive value."
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692846\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.5 103.3 *\n",
"C(fbrace)[T.3.0] 105.5 104.1 \n",
"C(fbrace)[T.4.0] 105.5 107.0 \n",
"C(mbrace)[T.2] 105.5 105.7 \n",
"C(mbrace)[T.3] 105.5 106.9 \n",
"C(mbrace)[T.4] 105.5 105.6 \n",
"fhisp 105.5 104.1 *\n",
"mhisp 105.5 105.0 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3184121\n",
"Model: Logit Df Residuals: 3184112\n",
"Method: MLE Df Model: 8\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.935e-05\n",
"Time: 14:59:16 Log-Likelihood: -2.2061e+06\n",
"converged: True LL-Null: -2.2061e+06\n",
" LLR p-value: 3.988e-15\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0531 0.001 35.736 0.000 0.050 0.056\n",
"C(fbrace)[T.2.0] -0.0211 0.006 -3.688 0.000 -0.032 -0.010\n",
"C(fbrace)[T.3.0] -0.0125 0.013 -1.002 0.316 -0.037 0.012\n",
"C(fbrace)[T.4.0] 0.0142 0.007 1.936 0.053 -0.000 0.029\n",
"C(mbrace)[T.2] 0.0022 0.006 0.367 0.714 -0.010 0.014\n",
"C(mbrace)[T.3] 0.0140 0.013 1.076 0.282 -0.012 0.040\n",
"C(mbrace)[T.4] 0.0013 0.007 0.186 0.853 -0.012 0.015\n",
"fhisp -0.0132 0.004 -2.951 0.003 -0.022 -0.004\n",
"mhisp -0.0046 0.004 -1.045 0.296 -0.013 0.004\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + C(mbrace) + mhisp')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In fact, once we control for father's race and Hispanic origin, almost every other variable becomes statistically insignificant, including acknowledged paternity."
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692837\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.1 103.0 *\n",
"C(fbrace)[T.3.0] 105.1 104.0 \n",
"C(fbrace)[T.4.0] 105.1 106.6 *\n",
"mar_p[T.Y] 105.1 105.6 \n",
"fhisp 105.1 103.3 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2798315\n",
"Model: Logit Df Residuals: 2798309\n",
"Method: MLE Df Model: 5\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.968e-05\n",
"Time: 15:00:03 Log-Likelihood: -1.9388e+06\n",
"converged: True LL-Null: -1.9388e+06\n",
" LLR p-value: 4.935e-15\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0497 0.014 3.433 0.001 0.021 0.078\n",
"C(fbrace)[T.2.0] -0.0201 0.003 -5.761 0.000 -0.027 -0.013\n",
"C(fbrace)[T.3.0] -0.0104 0.012 -0.858 0.391 -0.034 0.013\n",
"C(fbrace)[T.4.0] 0.0144 0.005 3.013 0.003 0.005 0.024\n",
"mar_p[T.Y] 0.0045 0.014 0.310 0.757 -0.024 0.033\n",
"fhisp -0.0177 0.003 -5.694 0.000 -0.024 -0.012\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + mar_p')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
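{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lines printed before the summary table (for example `C(fbrace)[T.2.0] 105.1 103.0 *`) appear to pair a baseline sex ratio, in boys per 100 girls, with the ratio implied when one variable is switched on. The `summarize` helper is defined earlier in the notebook and is not shown here, so the following is only a sketch of that conversion, assuming the ratio is 100 times the exponentiated log-odds. The function name `sex_ratio` is hypothetical, and the coefficients are copied from the table above.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def sex_ratio(log_odds):\n",
"    # Odds of a boy, expressed as boys per 100 girls.\n",
"    return 100 * math.exp(log_odds)\n",
"\n",
"# Coefficients copied from the model with C(fbrace), fhisp and mar_p.\n",
"intercept = 0.0497\n",
"coefs = {\n",
"    'C(fbrace)[T.2.0]': -0.0201,\n",
"    'mar_p[T.Y]': 0.0045,\n",
"    'fhisp': -0.0177,\n",
"}\n",
"\n",
"baseline = sex_ratio(intercept)\n",
"for name, coef in sorted(coefs.items()):\n",
"    # Ratio when this variable is 1 and the others stay at their reference level.\n",
"    print(name, round(baseline, 1), round(sex_ratio(intercept + coef), 1))\n",
"```\n",
"\n",
"If the assumption holds, each printed pair should agree with the two numbers reported for that variable above, which makes the small size of the effects easier to see than the raw log-odds do."
]
},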
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Being married still predicts slightly more boys, but the effect is no longer statistically significant at the 5% level (p = 0.096)."
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692846\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 104.9 102.7 *\n",
"C(fbrace)[T.3.0] 104.9 104.1 \n",
"C(fbrace)[T.4.0] 104.9 106.6 *\n",
"fhisp 104.9 103.0 *\n",
"dmar 104.9 105.3 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3188403\n",
"Model: Logit Df Residuals: 3188397\n",
"Method: MLE Df Model: 5\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.937e-05\n",
"Time: 15:00:29 Log-Likelihood: -2.2091e+06\n",
"converged: True LL-Null: -2.2091e+06\n",
" LLR p-value: 5.665e-17\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0478 0.003 13.880 0.000 0.041 0.055\n",
"C(fbrace)[T.2.0] -0.0209 0.003 -6.174 0.000 -0.028 -0.014\n",
"C(fbrace)[T.3.0] -0.0079 0.011 -0.728 0.467 -0.029 0.013\n",
"C(fbrace)[T.4.0] 0.0159 0.004 3.589 0.000 0.007 0.025\n",
"fhisp -0.0177 0.003 -5.947 0.000 -0.024 -0.012\n",
"dmar 0.0043 0.003 1.667 0.096 -0.001 0.009\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + dmar')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
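{
"cell_type": "markdown",
"metadata": {},
"source": [
"The claim about marital status hinges on the p-value for `dmar` in the table above. As a rough check, the sketch below recomputes a two-sided p-value from a z statistic, assuming the usual normal approximation for Wald tests. `two_sided_p` is a hypothetical helper, not part of statsmodels, and the z values are copied from the table.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def two_sided_p(z):\n",
"    # Probability that a standard normal variable exceeds |z| in absolute value.\n",
"    return math.erfc(abs(z) / math.sqrt(2))\n",
"\n",
"# z statistics copied from the summary table above.\n",
"for name, z in [('fhisp', -5.947), ('dmar', 1.667)]:\n",
"    p = two_sided_p(z)\n",
"    verdict = 'significant at 5%' if p < 0.05 else 'not significant at 5%'\n",
"    print(name, round(p, 3), verdict)\n",
"```\n",
"\n",
"With nearly 3.2 million observations, even a coefficient as small as the one for `dmar` comes close to the 5% threshold, so the size of an effect matters here as much as its p-value."
]
},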
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of education disappears."
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692836\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.6 103.6 *\n",
"C(fbrace)[T.3.0] 105.6 104.6 \n",
"C(fbrace)[T.4.0] 105.6 107.1 *\n",
"fhisp 105.6 103.9 *\n",
"lowed 105.6 105.0 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2777435\n",
"Model: Logit Df Residuals: 2777429\n",
"Method: MLE Df Model: 5\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.992e-05\n",
"Time: 15:00:55 Log-Likelihood: -1.9243e+06\n",
"converged: True LL-Null: -1.9243e+06\n",
" LLR p-value: 4.189e-15\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0546 0.002 34.777 0.000 0.052 0.058\n",
"C(fbrace)[T.2.0] -0.0198 0.004 -5.634 0.000 -0.027 -0.013\n",
"C(fbrace)[T.3.0] -0.0100 0.012 -0.823 0.410 -0.034 0.014\n",
"C(fbrace)[T.4.0] 0.0141 0.005 2.925 0.003 0.005 0.024\n",
"fhisp -0.0163 0.003 -4.999 0.000 -0.023 -0.010\n",
"lowed -0.0055 0.004 -1.471 0.141 -0.013 0.002\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + lowed')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of birth order disappears."
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692847\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.5 103.5 *\n",
"C(fbrace)[T.3.0] 105.5 104.7 \n",
"C(fbrace)[T.4.0] 105.5 107.1 *\n",
"fhisp 105.5 103.8 *\n",
"highbo 105.5 104.8 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3175026\n",
"Model: Logit Df Residuals: 3175020\n",
"Method: MLE Df Model: 5\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.881e-05\n",
"Time: 15:01:20 Log-Likelihood: -2.1998e+06\n",
"converged: True LL-Null: -2.1998e+06\n",
" LLR p-value: 2.209e-16\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0531 0.001 36.240 0.000 0.050 0.056\n",
"C(fbrace)[T.2.0] -0.0192 0.003 -5.879 0.000 -0.026 -0.013\n",
"C(fbrace)[T.3.0] -0.0074 0.011 -0.683 0.495 -0.029 0.014\n",
"C(fbrace)[T.4.0] 0.0154 0.004 3.457 0.001 0.007 0.024\n",
"fhisp -0.0163 0.003 -5.586 0.000 -0.022 -0.011\n",
"highbo -0.0062 0.005 -1.127 0.260 -0.017 0.005\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 88,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + highbo')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"WIC is no longer associated with more girls."
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692838\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.5 103.4 *\n",
"C(fbrace)[T.3.0] 105.5 104.7 \n",
"C(fbrace)[T.4.0] 105.5 107.1 *\n",
"wic[T.Y] 105.5 105.6 \n",
"fhisp 105.5 103.6 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2735525\n",
"Model: Logit Df Residuals: 2735519\n",
"Method: MLE Df Model: 5\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 2.029e-05\n",
"Time: 15:02:07 Log-Likelihood: -1.8953e+06\n",
"converged: True LL-Null: -1.8953e+06\n",
" LLR p-value: 3.710e-15\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0539 0.002 31.172 0.000 0.050 0.057\n",
"C(fbrace)[T.2.0] -0.0209 0.004 -5.723 0.000 -0.028 -0.014\n",
"C(fbrace)[T.3.0] -0.0078 0.012 -0.636 0.525 -0.032 0.016\n",
"C(fbrace)[T.4.0] 0.0148 0.005 3.044 0.002 0.005 0.024\n",
"wic[T.Y] 0.0007 0.003 0.264 0.792 -0.004 0.006\n",
"fhisp -0.0181 0.003 -5.484 0.000 -0.025 -0.012\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 89,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + wic')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of obesity disappears."
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692838\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.7 103.5 *\n",
"C(fbrace)[T.3.0] 105.7 104.2 \n",
"C(fbrace)[T.4.0] 105.7 107.2 *\n",
"fhisp 105.7 103.9 *\n",
"obese 105.7 105.1 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2686167\n",
"Model: Logit Df Residuals: 2686161\n",
"Method: MLE Df Model: 5\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 2.202e-05\n",
"Time: 15:02:31 Log-Likelihood: -1.8611e+06\n",
"converged: True LL-Null: -1.8611e+06\n",
" LLR p-value: 3.274e-16\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0552 0.002 32.697 0.000 0.052 0.059\n",
"C(fbrace)[T.2.0] -0.0210 0.004 -5.842 0.000 -0.028 -0.014\n",
"C(fbrace)[T.3.0] -0.0137 0.012 -1.109 0.267 -0.038 0.011\n",
"C(fbrace)[T.4.0] 0.0145 0.005 2.949 0.003 0.005 0.024\n",
"fhisp -0.0174 0.003 -5.490 0.000 -0.024 -0.011\n",
"obese -0.0052 0.003 -1.770 0.077 -0.011 0.001\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 90,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + obese')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of payment method is diminished: self-payment is still associated with more boys, but the association is not statistically significant (p = 0.114)."
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692835\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.9 103.6 *\n",
"C(fbrace)[T.3.0] 105.9 104.7 \n",
"C(fbrace)[T.4.0] 105.9 107.4 *\n",
"C(pay_rec)[T.2.0] 105.9 105.3 \n",
"C(pay_rec)[T.3.0] 105.9 107.0 \n",
"C(pay_rec)[T.4.0] 105.9 105.7 \n",
"fhisp 105.9 103.8 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2763347\n",
"Model: Logit Df Residuals: 2763339\n",
"Method: MLE Df Model: 7\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 2.100e-05\n",
"Time: 15:03:17 Log-Likelihood: -1.9145e+06\n",
"converged: True LL-Null: -1.9146e+06\n",
" LLR p-value: 1.132e-14\n",
"=====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"-------------------------------------------------------------------------------------\n",
"Intercept 0.0571 0.002 22.914 0.000 0.052 0.062\n",
"C(fbrace)[T.2.0] -0.0214 0.004 -5.920 0.000 -0.028 -0.014\n",
"C(fbrace)[T.3.0] -0.0113 0.012 -0.915 0.360 -0.035 0.013\n",
"C(fbrace)[T.4.0] 0.0142 0.005 2.955 0.003 0.005 0.024\n",
"C(pay_rec)[T.2.0] -0.0050 0.003 -1.839 0.066 -0.010 0.000\n",
"C(pay_rec)[T.3.0] 0.0103 0.007 1.580 0.114 -0.002 0.023\n",
"C(pay_rec)[T.4.0] -0.0016 0.006 -0.274 0.784 -0.013 0.010\n",
"fhisp -0.0193 0.003 -5.917 0.000 -0.026 -0.013\n",
"=====================================================================================\n",
"\"\"\""
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + C(pay_rec)')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But the number of prenatal visits is still a strong predictor of more girls."
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692809\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.5 103.0 *\n",
"C(fbrace)[T.3.0] 105.5 104.1 \n",
"C(fbrace)[T.4.0] 105.5 107.0 *\n",
"fhisp 105.5 103.4 *\n",
"previs 105.5 104.4 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3097584\n",
"Model: Logit Df Residuals: 3097578\n",
"Method: MLE Df Model: 5\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 7.830e-05\n",
"Time: 15:03:43 Log-Likelihood: -2.1460e+06\n",
"converged: True LL-Null: -2.1462e+06\n",
" LLR p-value: 1.719e-70\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0532 0.001 36.168 0.000 0.050 0.056\n",
"C(fbrace)[T.2.0] -0.0237 0.003 -7.129 0.000 -0.030 -0.017\n",
"C(fbrace)[T.3.0] -0.0129 0.011 -1.170 0.242 -0.035 0.009\n",
"C(fbrace)[T.4.0] 0.0141 0.005 3.112 0.002 0.005 0.023\n",
"fhisp -0.0193 0.003 -6.533 0.000 -0.025 -0.014\n",
"previs -0.0103 0.001 -16.043 0.000 -0.012 -0.009\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + previs')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And the effect is even stronger if we add a boolean variable, `no_previs`, to capture the nonlinearity at 0 visits."
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692805\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.5 103.1 *\n",
"C(fbrace)[T.3.0] 105.5 104.1 \n",
"C(fbrace)[T.4.0] 105.5 107.0 *\n",
"fhisp 105.5 103.5 *\n",
"previs 105.5 104.3 *\n",
"no_previs 105.5 99.6 *\n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3097584\n",
"Model: Logit Df Residuals: 3097577\n",
"Method: MLE Df Model: 6\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 8.320e-05\n",
"Time: 15:04:09 Log-Likelihood: -2.1460e+06\n",
"converged: True LL-Null: -2.1462e+06\n",
" LLR p-value: 4.542e-74\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0536 0.001 36.382 0.000 0.051 0.057\n",
"C(fbrace)[T.2.0] -0.0235 0.003 -7.087 0.000 -0.030 -0.017\n",
"C(fbrace)[T.3.0] -0.0131 0.011 -1.188 0.235 -0.035 0.009\n",
"C(fbrace)[T.4.0] 0.0139 0.005 3.070 0.002 0.005 0.023\n",
"fhisp -0.0191 0.003 -6.468 0.000 -0.025 -0.013\n",
"previs -0.0113 0.001 -16.666 0.000 -0.013 -0.010\n",
"no_previs -0.0573 0.012 -4.587 0.000 -0.082 -0.033\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### More controls"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now if we control for father's race and Hispanic origin as well as the number of prenatal visits, the effect of marriage disappears."
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692808\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.2 102.6 *\n",
"C(fbrace)[T.3.0] 105.2 103.8 \n",
"C(fbrace)[T.4.0] 105.2 106.7 *\n",
"fhisp 105.2 103.1 *\n",
"previs 105.2 104.1 *\n",
"dmar 105.2 105.4 \n"
]
},
{
"data": {
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3097584\n",
"Model: Logit Df Residuals: 3097577\n",
"Method: MLE Df Model: 6\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 7.846e-05\n",
"Time: 15:04:35 Log-Likelihood: -2.1460e+06\n",
"converged: True LL-Null: -2.1462e+06\n",
" LLR p-value: 1.058e-69\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0506 0.004 14.449 0.000 0.044 0.057\n",
"C(fbrace)[T.2.0] -0.0245 0.003 -7.072 0.000 -0.031 -0.018\n",
"C(fbrace)[T.3.0] -0.0136 0.011 -1.227 0.220 -0.035 0.008\n",
"C(fbrace)[T.4.0] 0.0142 0.005 3.151 0.002 0.005 0.023\n",
"fhisp -0.0198 0.003 -6.561 0.000 -0.026 -0.014\n",
"previs -0.0103 0.001 -15.969 0.000 -0.012 -0.009\n",
"dmar 0.0022 0.003 0.828 0.408 -0.003 0.007\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + previs + dmar')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of payment method disappears."
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692799\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.7 103.1 *\n",
"C(fbrace)[T.3.0] 105.7 104.0 \n",
"C(fbrace)[T.4.0] 105.7 107.0 *\n",
"C(pay_rec)[T.2.0] 105.7 105.6 \n",
"C(pay_rec)[T.3.0] 105.7 105.7 \n",
"C(pay_rec)[T.4.0] 105.7 105.3 \n",
"fhisp 105.7 103.6 *\n",
"previs 105.7 104.6 *\n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 2679860 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 2679851 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 8 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 7.905e-05 | \n",
"
\n",
"\n",
" Time: | 15:05:21 | Log-Likelihood: | -1.8566e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -1.8568e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 9.714e-59 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0553 | 0.003 | 21.819 | 0.000 | 0.050 0.060 | \n",
"
\n",
"\n",
" C(fbrace)[T.2.0] | -0.0248 | 0.004 | -6.723 | 0.000 | -0.032 -0.018 | \n",
"
\n",
"\n",
" C(fbrace)[T.3.0] | -0.0166 | 0.012 | -1.326 | 0.185 | -0.041 0.008 | \n",
"
\n",
"\n",
" C(fbrace)[T.4.0] | 0.0128 | 0.005 | 2.610 | 0.009 | 0.003 0.022 | \n",
"
\n",
"\n",
" C(pay_rec)[T.2.0] | -0.0012 | 0.003 | -0.436 | 0.663 | -0.007 0.004 | \n",
"
\n",
"\n",
" C(pay_rec)[T.3.0] | 3.729e-05 | 0.007 | 0.006 | 0.996 | -0.013 0.013 | \n",
"
\n",
"\n",
" C(pay_rec)[T.4.0] | -0.0035 | 0.006 | -0.589 | 0.556 | -0.015 0.008 | \n",
"
\n",
"\n",
" fhisp | -0.0203 | 0.003 | -6.114 | 0.000 | -0.027 -0.014 | \n",
"
\n",
"\n",
" previs | -0.0103 | 0.001 | -14.715 | 0.000 | -0.012 -0.009 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2679860\n",
"Model: Logit Df Residuals: 2679851\n",
"Method: MLE Df Model: 8\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 7.905e-05\n",
"Time: 15:05:21 Log-Likelihood: -1.8566e+06\n",
"converged: True LL-Null: -1.8568e+06\n",
" LLR p-value: 9.714e-59\n",
"=====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"-------------------------------------------------------------------------------------\n",
"Intercept 0.0553 0.003 21.819 0.000 0.050 0.060\n",
"C(fbrace)[T.2.0] -0.0248 0.004 -6.723 0.000 -0.032 -0.018\n",
"C(fbrace)[T.3.0] -0.0166 0.012 -1.326 0.185 -0.041 0.008\n",
"C(fbrace)[T.4.0] 0.0128 0.005 2.610 0.009 0.003 0.022\n",
"C(pay_rec)[T.2.0] -0.0012 0.003 -0.436 0.663 -0.007 0.004\n",
"C(pay_rec)[T.3.0] 3.729e-05 0.007 0.006 0.996 -0.013 0.013\n",
"C(pay_rec)[T.4.0] -0.0035 0.006 -0.589 0.556 -0.015 0.008\n",
"fhisp -0.0203 0.003 -6.114 0.000 -0.027 -0.014\n",
"previs -0.0103 0.001 -14.715 0.000 -0.012 -0.009\n",
"=====================================================================================\n",
"\"\"\""
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + previs + C(pay_rec)')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
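{
"cell_type": "markdown",
"metadata": {},
"source": [
"A note on reading the output of `summarize`, inferred from the printed numbers rather than from its definition: the two columns appear to be sex ratios (boys per 100 girls) recovered from the fitted log-odds. Writing $\\beta_0$ for the intercept and $\\beta_i$ for the coefficient of variable $i$ (symbols introduced here for illustration), the printed values are consistent with\n",
"\n",
"$$\\text{baseline} = 100 \\, e^{\\beta_0}, \\qquad \\text{shifted} = 100 \\, e^{\\beta_0 + \\beta_i}$$\n",
"\n",
"For example, in the model above $\\beta_0 = 0.0553$ and the coefficient of `C(fbrace)[T.2.0]` is $-0.0248$, which gives $100 \\, e^{0.0553} \\approx 105.7$ and $100 \\, e^{0.0305} \\approx 103.1$, matching the first printed row. The asterisk seems to mark coefficients with $p < 0.05$."
]
},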
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a version with the addition of a boolean for no prenatal visits."
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692805\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 105.5 103.1 *\n",
"C(fbrace)[T.3.0] 105.5 104.1 \n",
"C(fbrace)[T.4.0] 105.5 107.0 *\n",
"fhisp 105.5 103.5 *\n",
"previs 105.5 104.3 *\n",
"no_previs 105.5 99.6 *\n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 3097584 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 3097577 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 6 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 8.320e-05 | \n",
"
\n",
"\n",
" Time: | 15:05:46 | Log-Likelihood: | -2.1460e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -2.1462e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 4.542e-74 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0536 | 0.001 | 36.382 | 0.000 | 0.051 0.057 | \n",
"
\n",
"\n",
" C(fbrace)[T.2.0] | -0.0235 | 0.003 | -7.087 | 0.000 | -0.030 -0.017 | \n",
"
\n",
"\n",
" C(fbrace)[T.3.0] | -0.0131 | 0.011 | -1.188 | 0.235 | -0.035 0.009 | \n",
"
\n",
"\n",
" C(fbrace)[T.4.0] | 0.0139 | 0.005 | 3.070 | 0.002 | 0.005 0.023 | \n",
"
\n",
"\n",
" fhisp | -0.0191 | 0.003 | -6.468 | 0.000 | -0.025 -0.013 | \n",
"
\n",
"\n",
" previs | -0.0113 | 0.001 | -16.666 | 0.000 | -0.013 -0.010 | \n",
"
\n",
"\n",
" no_previs | -0.0573 | 0.012 | -4.587 | 0.000 | -0.082 -0.033 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3097584\n",
"Model: Logit Df Residuals: 3097577\n",
"Method: MLE Df Model: 6\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 8.320e-05\n",
"Time: 15:05:46 Log-Likelihood: -2.1460e+06\n",
"converged: True LL-Null: -2.1462e+06\n",
" LLR p-value: 4.542e-74\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0536 0.001 36.382 0.000 0.051 0.057\n",
"C(fbrace)[T.2.0] -0.0235 0.003 -7.087 0.000 -0.030 -0.017\n",
"C(fbrace)[T.3.0] -0.0131 0.011 -1.188 0.235 -0.035 0.009\n",
"C(fbrace)[T.4.0] 0.0139 0.005 3.070 0.002 0.005 0.023\n",
"fhisp -0.0191 0.003 -6.468 0.000 -0.025 -0.013\n",
"previs -0.0113 0.001 -16.666 0.000 -0.013 -0.010\n",
"no_previs -0.0573 0.012 -4.587 0.000 -0.082 -0.033\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 96,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, surprisingly, the mother's age has a small effect."
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692805\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 106.2 103.7 *\n",
"C(fbrace)[T.3.0] 106.2 104.8 \n",
"C(fbrace)[T.4.0] 106.2 107.8 *\n",
"fhisp 106.2 104.2 *\n",
"previs 106.2 105.0 *\n",
"no_previs 106.2 100.3 *\n",
"mager9 106.2 106.1 \n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 3097584 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 3097576 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 7 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 8.378e-05 | \n",
"
\n",
"\n",
" Time: | 15:06:13 | Log-Likelihood: | -2.1460e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -2.1462e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 1.081e-73 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0603 | 0.004 | 13.417 | 0.000 | 0.051 0.069 | \n",
"
\n",
"\n",
" C(fbrace)[T.2.0] | -0.0241 | 0.003 | -7.209 | 0.000 | -0.031 -0.018 | \n",
"
\n",
"\n",
" C(fbrace)[T.3.0] | -0.0139 | 0.011 | -1.255 | 0.209 | -0.036 0.008 | \n",
"
\n",
"\n",
" C(fbrace)[T.4.0] | 0.0144 | 0.005 | 3.176 | 0.001 | 0.006 0.023 | \n",
"
\n",
"\n",
" fhisp | -0.0196 | 0.003 | -6.592 | 0.000 | -0.025 -0.014 | \n",
"
\n",
"\n",
" previs | -0.0113 | 0.001 | -16.525 | 0.000 | -0.013 -0.010 | \n",
"
\n",
"\n",
" no_previs | -0.0571 | 0.012 | -4.578 | 0.000 | -0.082 -0.033 | \n",
"
\n",
"\n",
" mager9 | -0.0015 | 0.001 | -1.573 | 0.116 | -0.003 0.000 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3097584\n",
"Model: Logit Df Residuals: 3097576\n",
"Method: MLE Df Model: 7\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 8.378e-05\n",
"Time: 15:06:13 Log-Likelihood: -2.1460e+06\n",
"converged: True LL-Null: -2.1462e+06\n",
" LLR p-value: 1.081e-73\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0603 0.004 13.417 0.000 0.051 0.069\n",
"C(fbrace)[T.2.0] -0.0241 0.003 -7.209 0.000 -0.031 -0.018\n",
"C(fbrace)[T.3.0] -0.0139 0.011 -1.255 0.209 -0.036 0.008\n",
"C(fbrace)[T.4.0] 0.0144 0.005 3.176 0.001 0.006 0.023\n",
"fhisp -0.0196 0.003 -6.592 0.000 -0.025 -0.014\n",
"previs -0.0113 0.001 -16.525 0.000 -0.013 -0.010\n",
"no_previs -0.0571 0.012 -4.578 0.000 -0.082 -0.033\n",
"mager9 -0.0015 0.001 -1.573 0.116 -0.003 0.000\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + mager9')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So does the father's age. But both age effects are small and borderline significant."
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692804\n",
" Iterations 3\n",
"C(fbrace)[T.2.0] 106.4 103.8 *\n",
"C(fbrace)[T.3.0] 106.4 105.0 \n",
"C(fbrace)[T.4.0] 106.4 107.9 *\n",
"fhisp 106.4 104.3 *\n",
"previs 106.4 105.2 *\n",
"no_previs 106.4 100.4 *\n",
"fagerrec11 106.4 106.2 *\n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 3088740 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 3088732 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 7 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 8.510e-05 | \n",
"
\n",
"\n",
" Time: | 15:06:39 | Log-Likelihood: | -2.1399e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -2.1401e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 1.099e-74 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0620 | 0.004 | 14.546 | 0.000 | 0.054 0.070 | \n",
"
\n",
"\n",
" C(fbrace)[T.2.0] | -0.0243 | 0.003 | -7.284 | 0.000 | -0.031 -0.018 | \n",
"
\n",
"\n",
" C(fbrace)[T.3.0] | -0.0137 | 0.011 | -1.236 | 0.217 | -0.035 0.008 | \n",
"
\n",
"\n",
" C(fbrace)[T.4.0] | 0.0143 | 0.005 | 3.143 | 0.002 | 0.005 0.023 | \n",
"
\n",
"\n",
" fhisp | -0.0197 | 0.003 | -6.622 | 0.000 | -0.026 -0.014 | \n",
"
\n",
"\n",
" previs | -0.0113 | 0.001 | -16.639 | 0.000 | -0.013 -0.010 | \n",
"
\n",
"\n",
" no_previs | -0.0581 | 0.013 | -4.637 | 0.000 | -0.083 -0.034 | \n",
"
\n",
"\n",
" fagerrec11 | -0.0017 | 0.001 | -2.082 | 0.037 | -0.003 -0.000 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 3088740\n",
"Model: Logit Df Residuals: 3088732\n",
"Method: MLE Df Model: 7\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 8.510e-05\n",
"Time: 15:06:39 Log-Likelihood: -2.1399e+06\n",
"converged: True LL-Null: -2.1401e+06\n",
" LLR p-value: 1.099e-74\n",
"====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------------\n",
"Intercept 0.0620 0.004 14.546 0.000 0.054 0.070\n",
"C(fbrace)[T.2.0] -0.0243 0.003 -7.284 0.000 -0.031 -0.018\n",
"C(fbrace)[T.3.0] -0.0137 0.011 -1.236 0.217 -0.035 0.008\n",
"C(fbrace)[T.4.0] 0.0143 0.005 3.143 0.002 0.005 0.023\n",
"fhisp -0.0197 0.003 -6.622 0.000 -0.026 -0.014\n",
"previs -0.0113 0.001 -16.639 0.000 -0.013 -0.010\n",
"no_previs -0.0581 0.013 -4.637 0.000 -0.083 -0.034\n",
"fagerrec11 -0.0017 0.001 -2.082 0.037 -0.003 -0.000\n",
"====================================================================================\n",
"\"\"\""
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + fagerrec11')\n",
"model = smf.logit(formula, data=df)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
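{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder of where the significance calls for the two age variables come from, here is a short sketch of the Wald test behind the `z` and `P>|z|` columns. The symbols $\\hat{\\beta}$ (the estimated coefficient), $\\mathrm{SE}$ (its standard error), and $\\Phi$ (the standard normal CDF) are introduced here for illustration:\n",
"\n",
"$$z = \\frac{\\hat{\\beta}}{\\mathrm{SE}(\\hat{\\beta})}, \\qquad p = 2 \\, \\bigl(1 - \\Phi(\\lvert z \\rvert)\\bigr)$$\n",
"\n",
"A coefficient is significant at the 5% level when $\\lvert z \\rvert > 1.96$. For `mager9`, $z = -1.573$ gives $p \\approx 0.116$; for `fagerrec11`, $z = -2.082$ gives $p \\approx 0.037$, just past the threshold."
]
},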
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What's up with prenatal visits?\n",
"\n",
"The predictive power of prenatal visits is still surprising to me. To make sure we're controlled for race, I'll select cases where both parents are white:"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"2381977"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"white = df[(df.mbrace==1) & (df.fbrace==1)]\n",
"len(white)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And compute sex ratios for each level of `previs`"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
\n",
" \n",
" \n",
" | \n",
" boy | \n",
"
\n",
" \n",
" previs | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" -6 | \n",
" 106 | \n",
"
\n",
" \n",
" -5 | \n",
" 110 | \n",
"
\n",
" \n",
" -4 | \n",
" 108 | \n",
"
\n",
" \n",
" -3 | \n",
" 109 | \n",
"
\n",
" \n",
" -2 | \n",
" 108 | \n",
"
\n",
" \n",
" -1 | \n",
" 107 | \n",
"
\n",
" \n",
" 0 | \n",
" 105 | \n",
"
\n",
" \n",
" 1 | \n",
" 103 | \n",
"
\n",
" \n",
" 2 | \n",
" 102 | \n",
"
\n",
" \n",
" 3 | \n",
" 100 | \n",
"
\n",
" \n",
" 4 | \n",
" 103 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" boy\n",
"previs \n",
"-6 106\n",
"-5 110\n",
"-4 108\n",
"-3 109\n",
"-2 108\n",
"-1 107\n",
" 0 105\n",
" 1 103\n",
" 2 102\n",
" 3 100\n",
" 4 103"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"var = 'previs'\n",
"white[[var, 'boy']].groupby(var).aggregate(series_to_ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect holds up. People with fewer than average prenatal visits are substantially more likely to have boys."
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692804\n",
" Iterations 3\n",
"previs 105.1 103.8 *\n",
"no_previs 105.1 98.9 *\n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 2320227 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 2320224 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 2 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 6.584e-05 | \n",
"
\n",
"\n",
" Time: | 15:06:43 | Log-Likelihood: | -1.6075e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -1.6076e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 1.073e-46 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0493 | 0.001 | 37.359 | 0.000 | 0.047 0.052 | \n",
"
\n",
"\n",
" previs | -0.0116 | 0.001 | -14.535 | 0.000 | -0.013 -0.010 | \n",
"
\n",
"\n",
" no_previs | -0.0608 | 0.015 | -3.966 | 0.000 | -0.091 -0.031 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2320227\n",
"Model: Logit Df Residuals: 2320224\n",
"Method: MLE Df Model: 2\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 6.584e-05\n",
"Time: 15:06:43 Log-Likelihood: -1.6075e+06\n",
"converged: True LL-Null: -1.6076e+06\n",
" LLR p-value: 1.073e-46\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0493 0.001 37.359 0.000 0.047 0.052\n",
"previs -0.0116 0.001 -14.535 0.000 -0.013 -0.010\n",
"no_previs -0.0608 0.015 -3.966 0.000 -0.091 -0.031\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ previs + no_previs')\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(0.04929183382635937, -0.011584489975776435)"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"inter = results.params['Intercept']\n",
"slope = results.params['previs']\n",
"inter, slope"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 111.31727637, 110.03516315, 108.76781686, 107.51506742,\n",
" 106.2767467 , 105.05268853, 103.84272863, 102.64670462,\n",
" 101.46445599, 100.29582409])"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"previs = np.arange(-5, 5)\n",
"logodds = inter + slope * previs\n",
"odds = np.exp(logodds)\n",
"odds * 100"
]
},
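{
"cell_type": "markdown",
"metadata": {},
"source": [
"The array above follows directly from the fitted logit model. As a sketch, writing $x$ for the value of `previs`, $\\beta_0$ for `inter`, and $\\beta_1$ for `slope` (symbols introduced here for illustration), and holding `no_previs` at 0, the predicted sex ratio in boys per 100 girls is\n",
"\n",
"$$R(x) = 100 \\, e^{\\beta_0 + \\beta_1 x}$$\n",
"\n",
"so each additional prenatal visit multiplies the predicted ratio by $e^{\\beta_1} = e^{-0.0116} \\approx 0.988$, a decrease of a little over 1%. At $x = 0$ the prediction is $100 \\, e^{0.0493} \\approx 105.1$, and at $x = -5$ it is $100 \\, e^{0.0493 + 0.058} \\approx 111.3$; these are the sixth and first entries of the array."
]
},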
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692845\n",
" Iterations 3\n",
"dmar 105.2 105.1 \n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 2381977 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 2381975 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 1 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 3.675e-08 | \n",
"
\n",
"\n",
" Time: | 15:06:46 | Log-Likelihood: | -1.6503e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -1.6503e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 0.7276 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0505 | 0.004 | 12.847 | 0.000 | 0.043 0.058 | \n",
"
\n",
"\n",
" dmar | -0.0010 | 0.003 | -0.348 | 0.728 | -0.007 0.005 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2381977\n",
"Model: Logit Df Residuals: 2381975\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 3.675e-08\n",
"Time: 15:06:46 Log-Likelihood: -1.6503e+06\n",
"converged: True LL-Null: -1.6503e+06\n",
" LLR p-value: 0.7276\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0505 0.004 12.847 0.000 0.043 0.058\n",
"dmar -0.0010 0.003 -0.348 0.728 -0.007 0.005\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ dmar')\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692830\n",
" Iterations 3\n",
"lowed 105.3 103.9 *\n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 2089901 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 2089899 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 1 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 4.146e-06 | \n",
"
\n",
"\n",
" Time: | 15:06:48 | Log-Likelihood: | -1.4479e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -1.4480e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 0.0005303 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0520 | 0.001 | 35.035 | 0.000 | 0.049 0.055 | \n",
"
\n",
"\n",
" lowed | -0.0142 | 0.004 | -3.465 | 0.001 | -0.022 -0.006 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2089901\n",
"Model: Logit Df Residuals: 2089899\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 4.146e-06\n",
"Time: 15:06:48 Log-Likelihood: -1.4479e+06\n",
"converged: True LL-Null: -1.4480e+06\n",
" LLR p-value: 0.0005303\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0520 0.001 35.035 0.000 0.049 0.055\n",
"lowed -0.0142 0.004 -3.465 0.001 -0.022 -0.006\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ lowed')\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692845\n",
" Iterations 3\n",
"highbo 105.1 104.1 \n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 2373894 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 2373892 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 1 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 6.498e-07 | \n",
"
\n",
"\n",
" Time: | 15:06:50 | Log-Likelihood: | -1.6447e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -1.6447e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 0.1437 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0496 | 0.001 | 37.359 | 0.000 | 0.047 0.052 | \n",
"
\n",
"\n",
" highbo | -0.0095 | 0.006 | -1.462 | 0.144 | -0.022 0.003 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2373894\n",
"Model: Logit Df Residuals: 2373892\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 6.498e-07\n",
"Time: 15:06:50 Log-Likelihood: -1.6447e+06\n",
"converged: True LL-Null: -1.6447e+06\n",
" LLR p-value: 0.1437\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0496 0.001 37.359 0.000 0.047 0.052\n",
"highbo -0.0095 0.006 -1.462 0.144 -0.022 0.003\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ highbo')\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692836\n",
" Iterations 3\n",
"wic[T.Y] 105.3 104.8 \n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 2059437 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 2059435 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 1 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 1.267e-06 | \n",
"
\n",
"\n",
" Time: | 15:07:06 | Log-Likelihood: | -1.4269e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -1.4269e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 0.05720 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0519 | 0.002 | 29.448 | 0.000 | 0.048 0.055 | \n",
"
\n",
"\n",
" wic[T.Y] | -0.0055 | 0.003 | -1.902 | 0.057 | -0.011 0.000 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2059437\n",
"Model: Logit Df Residuals: 2059435\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 1.267e-06\n",
"Time: 15:07:06 Log-Likelihood: -1.4269e+06\n",
"converged: True LL-Null: -1.4269e+06\n",
" LLR p-value: 0.05720\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0519 0.002 29.448 0.000 0.048 0.055\n",
"wic[T.Y] -0.0055 0.003 -1.902 0.057 -0.011 0.000\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ wic')\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692834\n",
" Iterations 3\n",
"obese 105.2 104.8 \n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 2029161 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 2029159 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 1 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 4.153e-07 | \n",
"
\n",
"\n",
" Time: | 15:07:08 | Log-Likelihood: | -1.4059e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -1.4059e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 0.2798 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0509 | 0.002 | 31.979 | 0.000 | 0.048 0.054 | \n",
"
\n",
"\n",
" obese | -0.0037 | 0.003 | -1.081 | 0.280 | -0.010 0.003 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2029161\n",
"Model: Logit Df Residuals: 2029159\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 4.153e-07\n",
"Time: 15:07:08 Log-Likelihood: -1.4059e+06\n",
"converged: True LL-Null: -1.4059e+06\n",
" LLR p-value: 0.2798\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0509 0.002 31.979 0.000 0.048 0.054\n",
"obese -0.0037 0.003 -1.081 0.280 -0.010 0.003\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ obese')\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692834\n",
" Iterations 3\n",
"C(pay_rec)[T.2.0] 105.0 105.2 \n",
"C(pay_rec)[T.3.0] 105.0 105.8 \n",
"C(pay_rec)[T.4.0] 105.0 104.8 \n"
]
},
{
"data": {
"text/html": [
"\n",
"Logit Regression Results\n",
"\n",
" Dep. Variable: | boy | No. Observations: | 2077652 | \n",
"
\n",
"\n",
" Model: | Logit | Df Residuals: | 2077648 | \n",
"
\n",
"\n",
" Method: | MLE | Df Model: | 3 | \n",
"
\n",
"\n",
" Date: | Wed, 18 May 2016 | Pseudo R-squ.: | 5.425e-07 | \n",
"
\n",
"\n",
" Time: | 15:07:23 | Log-Likelihood: | -1.4395e+06 | \n",
"
\n",
"\n",
" converged: | True | LL-Null: | -1.4395e+06 | \n",
"
\n",
"\n",
" | | LLR p-value: | 0.6681 | \n",
"
\n",
"
\n",
"\n",
"\n",
" | coef | std err | z | P>|z| | [95.0% Conf. Int.] | \n",
"
\n",
"\n",
" Intercept | 0.0486 | 0.002 | 20.446 | 0.000 | 0.044 0.053 | \n",
"
\n",
"\n",
" C(pay_rec)[T.2.0] | 0.0021 | 0.003 | 0.684 | 0.494 | -0.004 0.008 | \n",
"
\n",
"\n",
" C(pay_rec)[T.3.0] | 0.0076 | 0.007 | 1.036 | 0.300 | -0.007 0.022 | \n",
"
\n",
"\n",
" C(pay_rec)[T.4.0] | -0.0020 | 0.007 | -0.296 | 0.767 | -0.015 0.011 | \n",
"
\n",
"
"
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2077652\n",
"Model: Logit Df Residuals: 2077648\n",
"Method: MLE Df Model: 3\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 5.425e-07\n",
"Time: 15:07:23 Log-Likelihood: -1.4395e+06\n",
"converged: True LL-Null: -1.4395e+06\n",
" LLR p-value: 0.6681\n",
"=====================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"-------------------------------------------------------------------------------------\n",
"Intercept 0.0486 0.002 20.446 0.000 0.044 0.053\n",
"C(pay_rec)[T.2.0] 0.0021 0.003 0.684 0.494 -0.004 0.008\n",
"C(pay_rec)[T.3.0] 0.0076 0.007 1.036 0.300 -0.007 0.022\n",
"C(pay_rec)[T.4.0] -0.0020 0.007 -0.296 0.767 -0.015 0.011\n",
"=====================================================================================\n",
"\"\"\""
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"formula = ('boy ~ C(pay_rec)')\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692845\n",
" Iterations 3\n",
"mager9 105.8 105.6 \n"
]
},
{
"data": {
"text/html": [
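"<table class=\"simpletable\">\n",
"<caption>Logit Regression Results</caption>\n",
"<tr>\n",
"  <th>Dep. Variable:</th> <td>boy</td> <th>No. Observations:</th> <td>2381977</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Model:</th> <td>Logit</td> <th>Df Residuals:</th> <td>2381975</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Method:</th> <td>MLE</td> <th>Df Model:</th> <td>1</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Date:</th> <td>Wed, 18 May 2016</td> <th>Pseudo R-squ.:</th> <td>6.201e-07</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Time:</th> <td>15:07:27</td> <th>Log-Likelihood:</th> <td>-1.6503e+06</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>converged:</th> <td>True</td> <th>LL-Null:</th> <td>-1.6503e+06</td>\n",
"</tr>\n",
"<tr>\n",
"  <th></th> <td></td> <th>LLR p-value:</th> <td>0.1525</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
"  <td></td> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th>\n",
"</tr>\n",
"<tr>\n",
"  <th>Intercept</th> <td>0.0559</td> <td>0.005</td> <td>11.397</td> <td>0.000</td> <td>0.046 0.066</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>mager9</th> <td>-0.0016</td> <td>0.001</td> <td>-1.431</td> <td>0.153</td> <td>-0.004 0.001</td>\n",
"</tr>\n",
"</table>"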
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2381977\n",
"Model: Logit Df Residuals: 2381975\n",
"Method: MLE Df Model: 1\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 6.201e-07\n",
"Time: 15:07:27 Log-Likelihood: -1.6503e+06\n",
"converged: True LL-Null: -1.6503e+06\n",
" LLR p-value: 0.1525\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0559 0.005 11.397 0.000 0.046 0.066\n",
"mager9 -0.0016 0.001 -1.431 0.153 -0.004 0.001\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 110,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Logistic regression of `boy` on mager9, treated as a numeric predictor (white subset)\n",
"formula = 'boy ~ mager9'\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692844\n",
" Iterations 3\n",
"youngm[T.True] 105.0 106.0 \n",
"oldm[T.True] 105.0 104.9 \n"
]
},
{
"data": {
"text/html": [
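"<table class=\"simpletable\">\n",
"<caption>Logit Regression Results</caption>\n",
"<tr>\n",
"  <th>Dep. Variable:</th> <td>boy</td> <th>No. Observations:</th> <td>2381977</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Model:</th> <td>Logit</td> <th>Df Residuals:</th> <td>2381974</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Method:</th> <td>MLE</td> <th>Df Model:</th> <td>2</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Date:</th> <td>Wed, 18 May 2016</td> <th>Pseudo R-squ.:</th> <td>9.503e-07</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Time:</th> <td>15:07:30</td> <th>Log-Likelihood:</th> <td>-1.6503e+06</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>converged:</th> <td>True</td> <th>LL-Null:</th> <td>-1.6503e+06</td>\n",
"</tr>\n",
"<tr>\n",
"  <th></th> <td></td> <th>LLR p-value:</th> <td>0.2084</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
"  <td></td> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th>\n",
"</tr>\n",
"<tr>\n",
"  <th>Intercept</th> <td>0.0486</td> <td>0.001</td> <td>35.884</td> <td>0.000</td> <td>0.046 0.051</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>youngm[T.True]</th> <td>0.0101</td> <td>0.006</td> <td>1.766</td> <td>0.077</td> <td>-0.001 0.021</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>oldm[T.True]</th> <td>-0.0004</td> <td>0.008</td> <td>-0.055</td> <td>0.956</td> <td>-0.015 0.014</td>\n",
"</tr>\n",
"</table>"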
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2381977\n",
"Model: Logit Df Residuals: 2381974\n",
"Method: MLE Df Model: 2\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 9.503e-07\n",
"Time: 15:07:30 Log-Likelihood: -1.6503e+06\n",
"converged: True LL-Null: -1.6503e+06\n",
" LLR p-value: 0.2084\n",
"==================================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"----------------------------------------------------------------------------------\n",
"Intercept 0.0486 0.001 35.884 0.000 0.046 0.051\n",
"youngm[T.True] 0.0101 0.006 1.766 0.077 -0.001 0.021\n",
"oldm[T.True] -0.0004 0.008 -0.055 0.956 -0.015 0.014\n",
"==================================================================================\n",
"\"\"\""
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Logistic regression of `boy` on the boolean indicators youngm and oldm (white subset)\n",
"formula = 'boy ~ youngm + oldm'\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.692843\n",
" Iterations 3\n",
"youngf 105.1 105.6 \n",
"oldf 105.1 104.0 \n"
]
},
{
"data": {
"text/html": [
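"<table class=\"simpletable\">\n",
"<caption>Logit Regression Results</caption>\n",
"<tr>\n",
"  <th>Dep. Variable:</th> <td>boy</td> <th>No. Observations:</th> <td>2376438</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Model:</th> <td>Logit</td> <th>Df Residuals:</th> <td>2376435</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Method:</th> <td>MLE</td> <th>Df Model:</th> <td>2</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Date:</th> <td>Wed, 18 May 2016</td> <th>Pseudo R-squ.:</th> <td>7.327e-07</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>Time:</th> <td>15:07:34</td> <th>Log-Likelihood:</th> <td>-1.6465e+06</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>converged:</th> <td>True</td> <th>LL-Null:</th> <td>-1.6465e+06</td>\n",
"</tr>\n",
"<tr>\n",
"  <th></th> <td></td> <th>LLR p-value:</th> <td>0.2993</td>\n",
"</tr>\n",
"</table>\n",
"<table class=\"simpletable\">\n",
"<tr>\n",
"  <td></td> <th>coef</th> <th>std err</th> <th>z</th> <th>P>|z|</th> <th>[95.0% Conf. Int.]</th>\n",
"</tr>\n",
"<tr>\n",
"  <th>Intercept</th> <td>0.0495</td> <td>0.001</td> <td>37.030</td> <td>0.000</td> <td>0.047 0.052</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>youngf</th> <td>0.0053</td> <td>0.008</td> <td>0.652</td> <td>0.514</td> <td>-0.011 0.021</td>\n",
"</tr>\n",
"<tr>\n",
"  <th>oldf</th> <td>-0.0107</td> <td>0.008</td> <td>-1.390</td> <td>0.164</td> <td>-0.026 0.004</td>\n",
"</tr>\n",
"</table>"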
],
"text/plain": [
"\n",
"\"\"\"\n",
" Logit Regression Results \n",
"==============================================================================\n",
"Dep. Variable: boy No. Observations: 2376438\n",
"Model: Logit Df Residuals: 2376435\n",
"Method: MLE Df Model: 2\n",
"Date: Wed, 18 May 2016 Pseudo R-squ.: 7.327e-07\n",
"Time: 15:07:34 Log-Likelihood: -1.6465e+06\n",
"converged: True LL-Null: -1.6465e+06\n",
" LLR p-value: 0.2993\n",
"==============================================================================\n",
" coef std err z P>|z| [95.0% Conf. Int.]\n",
"------------------------------------------------------------------------------\n",
"Intercept 0.0495 0.001 37.030 0.000 0.047 0.052\n",
"youngf 0.0053 0.008 0.652 0.514 -0.011 0.021\n",
"oldf -0.0107 0.008 -1.390 0.164 -0.026 0.004\n",
"==============================================================================\n",
"\"\"\""
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Logistic regression of `boy` on the indicators youngf and oldf (white subset)\n",
"formula = 'boy ~ youngf + oldf'\n",
"model = smf.logit(formula, data=white)\n",
"results = model.fit()\n",
"summarize(results)\n",
"results.summary()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 0
}