{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Does Trivers-Willard apply to people?\n", "\n", "This notebook contains a \"one-day paper\", my attempt to pose a research question, answer it, and publish the results in one work day.\n", "\n", "Copyright 2016 Allen B. Downey\n", "\n", "MIT License: https://opensource.org/licenses/MIT" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from __future__ import print_function, division\n", "\n", "import thinkstats2\n", "import thinkplot\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "import statsmodels.formula.api as smf\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trivers-Willard\n", "\n", "[According to Wikipedia](https://en.wikipedia.org/wiki/Trivers%E2%80%93Willard_hypothesis), the Trivers-Willard hypothesis:\n", "\n", ">\"...suggests that female mammals are able to adjust offspring sex ratio in response to their maternal condition. For example, it may predict greater parental investment in males by parents in 'good conditions' and greater investment in females by parents in 'poor conditions' (relative to parents in good condition).\"\n", "\n", "For humans, the hypothesis suggests that people with relatively high social status might be more likely to have boys. Some studies have shown evidence for this hypothesis, but based on my very casual survey, it is not persuasive.\n", "\n", "To test whether the T-W hypothesis holds up in humans, I downloaded [birth data for the nearly 4 million babies born in the U.S. in 2014](http://www.cdc.gov/nchs/data_access/vitalstatsonline.htm#Births).\n", "\n", "I selected variables that seemed likely to be related to social status and used logistic regression to identify variables associated with sex ratio." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Summary of results**\n", "\n", "1. 
When I run a regression with one variable at a time, many of the variables have a statistically significant effect on sex ratio, with the sign of the effect generally in the direction predicted by T-W.\n", "\n", "2. However, many of the variables are also correlated with race. If we control for either the mother's race or the father's race, or both, most other variables have no additional predictive power.\n", "\n", "3. Contrary to other reports, the age of the parents seems to have no predictive power.\n", "\n", "4. Strangely, the variable that shows the strongest and most consistent relationship with sex ratio is the number of prenatal visits. Although it seems obvious that prenatal visits are a proxy for quality of health care and general socioeconomic status, the sign of the effect is opposite to what T-W predicts; that is, a higher number of prenatal visits is a strong predictor of a lower sex ratio (more girls).\n", "\n", "Following convention, I report sex ratio in terms of boys per 100 girls. The overall sex ratio at birth is about 105; that is, 105 boys are born for every 100 girls." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data cleaning\n", "\n", "Here's how I loaded the data:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "names = ['year', 'mager9', 'mnativ', 'restatus', 'mbrace', 'mhisp_r',\n", " 'mar_p', 'dmar', 'meduc', 'fagerrec11', 'fbrace', 'fhisp_r', 'feduc', \n", " 'lbo_rec', 'previs_rec', 'wic', 'height', 'bmi_r', 'pay_rec', 'sex']\n", "colspecs = [(9, 12),\n", " (79, 79),\n", " (84, 84),\n", " (104, 104),\n", " (110, 110),\n", " (115, 115),\n", " (119, 119),\n", " (120, 120),\n", " (124, 124),\n", " (149, 150),\n", " (156, 156),\n", " (160, 160),\n", " (163, 163),\n", " (179, 179),\n", " (242, 243),\n", " (251, 251),\n", " (280, 281),\n", " (287, 287),\n", " (436, 436),\n", " (475, 475),\n", " ]\n", "\n", "colspecs = [(start-1, end) for start, end in colspecs]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = None" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "filename = 'Nat2014PublicUS.c20150514.r20151022.txt.gz'\n", "#df = pd.read_fwf(filename, compression='gzip', header=None, names=names, colspecs=colspecs)\n", "#df.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# store the dataframe for faster loading\n", "\n", "#store = pd.HDFStore('store.h5')\n", "#store['births2014'] = df\n", "#store.close()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# load the dataframe\n", "\n", "store = pd.HDFStore('store.h5')\n", "df = store['births2014']\n", "store.close()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def series_to_ratio(series):\n", " \"\"\"Takes a boolean series and computes sex ratio.\n", " 
\"\"\"\n", " boys = np.mean(series)\n", " return np.round(100 * boys / (1-boys)).astype(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I have to recode sex as `0` or `1` to make `logit` happy." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 1952273\n", "1 2045902\n", "Name: boy, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['boy'] = (df.sex=='M').astype(int)\n", "df.boy.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All births are from 2014." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2014 3998175\n", "Name: year, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.year.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's age:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 2777\n", "2 249581\n", "3 884246\n", "4 1148469\n", "5 1084064\n", "6 510214\n", "7 110318\n", "8 7750\n", "9 756\n", "Name: mager9, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mager9.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>mager9</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>109</td></tr>\n",
"<tr><th>2</th><td>105</td></tr>\n",
"<tr><th>3</th><td>105</td></tr>\n",
"<tr><th>4</th><td>105</td></tr>\n",
"<tr><th>5</th><td>105</td></tr>\n",
"<tr><th>6</th><td>105</td></tr>\n",
"<tr><th>7</th><td>104</td></tr>\n",
"<tr><th>8</th><td>104</td></tr>\n",
"<tr><th>9</th><td>102</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "mager9 \n", "1 109\n", "2 105\n", "3 105\n", "4 105\n", "5 105\n", "6 105\n", "7 104\n", "8 104\n", "9 102" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mager9'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mager9.isnull().mean()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.06311829772333627, 0.029719559549044251)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['youngm'] = df.mager9<=2\n", "df['oldm'] = df.mager9>=7\n", "df.youngm.mean(), df.oldm.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's nativity (1 = born in the U.S.)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 3106689\n", "2 881662\n", "Name: mnativ, dtype: int64" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mnativ.replace([3], np.nan, inplace=True)\n", "df.mnativ.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
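<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>mnativ</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>105</td></tr>\n",
"<tr><th>2</th><td>105</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>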
" ], "text/plain": [ " boy\n", "mnativ \n", "1 105\n", "2 105" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mnativ'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Residence status (1=resident)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 2873404\n", "2 1025766\n", "3 88906\n", "4 10099\n", "Name: restatus, dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.restatus.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>restatus</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>105</td></tr>\n",
"<tr><th>2</th><td>105</td></tr>\n",
"<tr><th>3</th><td>106</td></tr>\n",
"<tr><th>4</th><td>106</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "restatus \n", "1 105\n", "2 105\n", "3 106\n", "4 106" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'restatus'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's race (1=White, 2=Black, 3=American Indian or Alaskan Native, 4=Asian or Pacific Islander)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 3029013\n", "2 641089\n", "3 44962\n", "4 283111\n", "Name: mbrace, dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mbrace.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>mbrace</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>105</td></tr>\n",
"<tr><th>2</th><td>103</td></tr>\n",
"<tr><th>3</th><td>103</td></tr>\n",
"<tr><th>4</th><td>106</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "mbrace \n", "1 105\n", "2 103\n", "3 103\n", "4 106" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mbrace'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's Hispanic origin (0=Non-Hispanic)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 3045419\n", "1 553738\n", "2 69894\n", "3 20165\n", "4 136785\n", "5 141497\n", "Name: mhisp_r, dtype: int64" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mhisp_r.replace([9], np.nan, inplace=True)\n", "df.mhisp_r.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def copy_null(df, oldvar, newvar):\n", " df.loc[df[oldvar].isnull(), newvar] = np.nan" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.0076727506925034546, 0.23240818268843488)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['mhisp'] = df.mhisp_r > 0\n", "copy_null(df, 'mhisp_r', 'mhisp')\n", "df.mhisp.isnull().mean(), df.mhisp.mean()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
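<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>mhisp</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>105</td></tr>\n",
"<tr><th>1</th><td>104</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>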
" ], "text/plain": [ " boy\n", "mhisp \n", "0 105\n", "1 104" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mhisp'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Marital status (1=Married)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 2390630\n", "2 1607545\n", "Name: dmar, dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dmar.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
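<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>dmar</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>105</td></tr>\n",
"<tr><th>2</th><td>104</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>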
" ], "text/plain": [ " boy\n", "dmar \n", "1 105\n", "2 104" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'dmar'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Paternity acknowledged, if unmarried (Y=yes, N=no, X=not applicable, U=unknown).\n", "\n", "I recode X (not applicable because married) as Y (paternity acknowledged)." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "N 462627\n", "Y 3386542\n", "Name: mar_p, dtype: int64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.mar_p.replace(['U'], np.nan, inplace=True)\n", "df.mar_p.replace(['X'], 'Y', inplace=True)\n", "df.mar_p.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
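<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>mar_p</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>N</th><td>103</td></tr>\n",
"<tr><th>Y</th><td>105</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>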
" ], "text/plain": [ " boy\n", "mar_p \n", "N 103\n", "Y 105" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mar_p'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's education level" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 138589\n", "2 437081\n", "3 957265\n", "4 815688\n", "5 308384\n", "6 732661\n", "7 326800\n", "8 94057\n", "Name: meduc, dtype: int64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.meduc.replace([9], np.nan, inplace=True)\n", "df.meduc.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>meduc</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>104</td></tr>\n",
"<tr><th>2</th><td>104</td></tr>\n",
"<tr><th>3</th><td>105</td></tr>\n",
"<tr><th>4</th><td>105</td></tr>\n",
"<tr><th>5</th><td>105</td></tr>\n",
"<tr><th>6</th><td>105</td></tr>\n",
"<tr><th>7</th><td>105</td></tr>\n",
"<tr><th>8</th><td>104</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "meduc \n", "1 104\n", "2 104\n", "3 105\n", "4 105\n", "5 105\n", "6 105\n", "7 105\n", "8 104" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'meduc'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.046933913598079122, 0.15107367095085322)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['lowed'] = df.meduc <= 2\n", "copy_null(df, 'meduc', 'lowed')\n", "df.lowed.isnull().mean(), df.lowed.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's age, in 10 ranges" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 277\n", "2 84852\n", "3 498779\n", "4 869280\n", "5 1025631\n", "6 631685\n", "7 262169\n", "8 87432\n", "9 28465\n", "10 12490\n", "Name: fagerrec11, dtype: int64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fagerrec11.replace([11], np.nan, inplace=True)\n", "df.fagerrec11.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>fagerrec11</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>102</td></tr>\n",
"<tr><th>2</th><td>106</td></tr>\n",
"<tr><th>3</th><td>106</td></tr>\n",
"<tr><th>4</th><td>105</td></tr>\n",
"<tr><th>5</th><td>105</td></tr>\n",
"<tr><th>6</th><td>105</td></tr>\n",
"<tr><th>7</th><td>105</td></tr>\n",
"<tr><th>8</th><td>105</td></tr>\n",
"<tr><th>9</th><td>104</td></tr>\n",
"<tr><th>10</th><td>109</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "fagerrec11 \n", "1 102\n", "2 106\n", "3 106\n", "4 105\n", "5 105\n", "6 105\n", "7 105\n", "8 105\n", "9 104\n", "10 109" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'fagerrec11'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.12433547806186572, 0.024315207394332003)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['youngf'] = df.fagerrec11<=2\n", "copy_null(df, 'fagerrec11', 'youngf')\n", "df.youngf.isnull().mean(), df.youngf.mean()" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.12433547806186572, 0.036670893957829916)" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['oldf'] = df.fagerrec11>=8\n", "copy_null(df, 'fagerrec11', 'oldf')\n", "df.oldf.isnull().mean(), df.oldf.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's race" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 2497901\n", "2 482433\n", "3 35408\n", "4 238394\n", "Name: fbrace, dtype: int64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fbrace.replace([9], np.nan, inplace=True)\n", "df.fbrace.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>fbrace</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>105</td></tr>\n",
"<tr><th>2</th><td>103</td></tr>\n",
"<tr><th>3</th><td>103</td></tr>\n",
"<tr><th>4</th><td>107</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "fbrace \n", "1 105\n", "2 103\n", "3 103\n", "4 107" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'fbrace'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's Hispanic origin (0=non-hispanic, other values indicate country of origin)" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "0 2649007\n", "1 493497\n", "2 59137\n", "3 19128\n", "4 108111\n", "5 124172\n", "Name: fhisp_r, dtype: int64" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fhisp_r.replace([9], np.nan, inplace=True)\n", "df.fhisp_r.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.13634295647389122, 0.23285053338322156)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['fhisp'] = df.fhisp_r > 0\n", "copy_null(df, 'fhisp_r', 'fhisp')\n", "df.fhisp.isnull().mean(), df.fhisp.mean()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
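<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>fhisp</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>105</td></tr>\n",
"<tr><th>1</th><td>104</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>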
" ], "text/plain": [ " boy\n", "fhisp \n", "0 105\n", "1 104" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'fhisp'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's education level" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 141654\n", "2 342061\n", "3 951980\n", "4 643118\n", "5 232622\n", "6 616187\n", "7 242022\n", "8 109482\n", "Name: feduc, dtype: int64" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.feduc.replace([9], np.nan, inplace=True)\n", "df.feduc.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>feduc</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>104</td></tr>\n",
"<tr><th>2</th><td>105</td></tr>\n",
"<tr><th>3</th><td>105</td></tr>\n",
"<tr><th>4</th><td>105</td></tr>\n",
"<tr><th>5</th><td>106</td></tr>\n",
"<tr><th>6</th><td>105</td></tr>\n",
"<tr><th>7</th><td>105</td></tr>\n",
"<tr><th>8</th><td>105</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "feduc \n", "1 104\n", "2 105\n", "3 105\n", "4 105\n", "5 106\n", "6 105\n", "7 105\n", "8 105" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'feduc'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Live birth order." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 1555006\n", "2 1270496\n", "3 669016\n", "4 284435\n", "5 110708\n", "6 46093\n", "7 20786\n", "8 21610\n", "Name: lbo_rec, dtype: int64" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.lbo_rec.replace([9], np.nan, inplace=True)\n", "df.lbo_rec.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>lbo_rec</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>105</td></tr>\n",
"<tr><th>2</th><td>105</td></tr>\n",
"<tr><th>3</th><td>105</td></tr>\n",
"<tr><th>4</th><td>105</td></tr>\n",
"<tr><th>5</th><td>104</td></tr>\n",
"<tr><th>6</th><td>104</td></tr>\n",
"<tr><th>7</th><td>104</td></tr>\n",
"<tr><th>8</th><td>102</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "lbo_rec \n", "1 105\n", "2 105\n", "3 105\n", "4 105\n", "5 104\n", "6 104\n", "7 104\n", "8 102" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'lbo_rec'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.0050085351441595226, 0.050072772519889897)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['highbo'] = df.lbo_rec >= 5\n", "copy_null(df, 'lbo_rec', 'highbo')\n", "df.highbo.isnull().mean(), df.highbo.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Number of prenatal visits, in 11 ranges" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 59670\n", "2 44923\n", "3 98141\n", "4 201032\n", "5 366887\n", "6 826908\n", "7 998330\n", "8 684997\n", "9 379305\n", "10 99067\n", "11 128805\n", "Name: previs_rec, dtype: int64" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.previs_rec.replace([12], np.nan, inplace=True)\n", "df.previs_rec.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df.previs_rec.mean()\n", "df['previs'] = df.previs_rec - 7" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>previs</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>-6</th><td>105</td></tr>\n",
"<tr><th>-5</th><td>107</td></tr>\n",
"<tr><th>-4</th><td>107</td></tr>\n",
"<tr><th>-3</th><td>108</td></tr>\n",
"<tr><th>-2</th><td>107</td></tr>\n",
"<tr><th>-1</th><td>106</td></tr>\n",
"<tr><th>0</th><td>105</td></tr>\n",
"<tr><th>1</th><td>103</td></tr>\n",
"<tr><th>2</th><td>102</td></tr>\n",
"<tr><th>3</th><td>102</td></tr>\n",
"<tr><th>4</th><td>102</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "previs \n", "-6 105\n", "-5 107\n", "-4 107\n", "-3 108\n", "-2 107\n", "-1 106\n", " 0 105\n", " 1 103\n", " 2 102\n", " 3 102\n", " 4 102" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'previs'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.027540065154726845, 0.015346965650008423)" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['no_previs'] = df.previs_rec <= 1\n", "copy_null(df, 'previs_rec', 'no_previs')\n", "df.no_previs.isnull().mean(), df.no_previs.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whether the mother is eligible for food stamps" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "N 2124143\n", "Y 1634978\n", "Name: wic, dtype: int64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.wic.replace(['U'], np.nan, inplace=True)\n", "df.wic.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
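<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>wic</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>N</th><td>105</td></tr>\n",
"<tr><th>Y</th><td>104</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>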
" ], "text/plain": [ " boy\n", "wic \n", "N 105\n", "Y 104" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'wic'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's height in inches" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "30 28\n", "31 1\n", "34 2\n", "36 14\n", "37 7\n", "38 7\n", "39 7\n", "40 6\n", "41 10\n", "42 13\n", "43 3\n", "44 8\n", "45 11\n", "46 14\n", "47 22\n", "48 857\n", "49 544\n", "50 357\n", "51 422\n", "52 493\n", "53 1503\n", "54 1414\n", "55 2762\n", "56 6678\n", "57 18359\n", "58 21019\n", "59 81588\n", "60 209490\n", "61 269142\n", "62 474306\n", "63 485840\n", "64 559249\n", "65 453503\n", "66 429253\n", "67 334485\n", "68 189690\n", "69 127789\n", "70 62364\n", "71 33428\n", "72 15323\n", "73 5200\n", "74 2538\n", "75 1019\n", "76 590\n", "77 593\n", "78 941\n", "Name: height, dtype: int64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.height.replace([99], np.nan, inplace=True)\n", "df.height.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.051844404009329256, 0.0359147662344377)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['mshort'] = df.height<60\n", "copy_null(df, 'height', 'mshort')\n", "df.mshort.isnull().mean(), df.mshort.mean()" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.051844404009329256, 0.03218134412692316)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['mtall'] = df.height>=70\n", "copy_null(df, 'height', 'mtall')\n", "df.mtall.isnull().mean(), 
df.mtall.mean()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
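<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>mshort</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>105</td></tr>\n",
"<tr><th>1</th><td>104</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>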
" ], "text/plain": [ " boy\n", "mshort \n", "0 105\n", "1 104" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mshort'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
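<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>mtall</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>105</td></tr>\n",
"<tr><th>1</th><td>104</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>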
" ], "text/plain": [ " boy\n", "mtall \n", "0 105\n", "1 104" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'mtall'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's BMI in 6 ranges" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 140142\n", "2 1702519\n", "3 949075\n", "4 506017\n", "5 242957\n", "6 168515\n", "Name: bmi_r, dtype: int64" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.bmi_r.replace([9], np.nan, inplace=True)\n", "df.bmi_r.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th></th><th>boy</th></tr>\n",
"<tr><th>bmi_r</th><th></th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1</th><td>105</td></tr>\n",
"<tr><th>2</th><td>105</td></tr>\n",
"<tr><th>3</th><td>105</td></tr>\n",
"<tr><th>4</th><td>104</td></tr>\n",
"<tr><th>5</th><td>104</td></tr>\n",
"<tr><th>6</th><td>104</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " boy\n", "bmi_r \n", "1 105\n", "2 105\n", "3 105\n", "4 104\n", "5 104\n", "6 104" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'bmi_r'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.07227047340349034, 0.2473532880857861)" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['obese'] = df.bmi_r >= 4\n", "copy_null(df, 'bmi_r', 'obese')\n", "df.obese.isnull().mean(), df.obese.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Payment method (1=Medicaid, 2=Private insurance, 3=Self pay, 4=Other)" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "1 1665161\n", "2 1824151\n", "3 162650\n", "4 167806\n", "Name: pay_rec, dtype: int64" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pay_rec.replace([9], np.nan, inplace=True)\n", "df.pay_rec.value_counts().sort_index()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
" ], "text/plain": [ " boy\n", "pay_rec \n", "1 104\n", "2 105\n", "3 107\n", "4 105" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'pay_rec'\n", "df[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sex of baby" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "F 1952273\n", "M 2045902\n", "Name: sex, dtype: int64" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.sex.value_counts().sort_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Regression models\n", "\n", "Here are some functions I'll use to interpret the results of logistic regression." ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def logodds_to_ratio(logodds):\n", "    \"\"\"Convert log odds to a sex ratio (boys per 100 girls).\"\"\"\n", "    odds = np.exp(logodds)\n", "    return 100 * odds\n", "\n", "def summarize(results):\n", "    \"\"\"Summarize parameters in terms of birth ratio.\"\"\"\n", "    inter_or = results.params['Intercept']\n", "    inter_rat = logodds_to_ratio(inter_or)\n", "\n", "    # iterate over (parameter name, log odds ratio) pairs\n", "    for value, lor in results.params.items():\n", "        if value == 'Intercept':\n", "            continue\n", "\n", "        rat = logodds_to_ratio(inter_or + lor)\n", "        # mark parameters that are significant at the 5% level\n", "        code = '*' if results.pvalues[value] < 0.05 else ' '\n", "\n", "        print('%-20s %0.1f %0.1f' % (value, inter_rat, rat), code)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I'll run models with each variable, one at a time." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's age seems to have no predictive value:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692873\n", " Iterations 3\n", "mager9 105.1 105.0 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3998175\n", "Model: Logit Df Residuals: 3998173\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.129e-07\n", "Time: 14:18:28 Log-Likelihood: -2.7702e+06\n", "converged: True LL-Null: -2.7702e+06\n", " LLR p-value: 0.4290\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0496 0.004 13.550 0.000 0.042 0.057\n", "mager9 -0.0007 0.001 -0.791 0.429 -0.002 0.001\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ mager9', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The estimated ratio for young mothers is higher, and the ratio for older mothers is lower, but neither difference is statistically significant." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692873\n", " Iterations 3\n", "youngm[T.True] 104.8 104.9 \n", "oldm[T.True] 104.8 103.9 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3998175\n", "Model: Logit Df Residuals: 3998172\n", "Method: MLE Df Model: 2\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 3.813e-07\n", "Time: 14:18:33 Log-Likelihood: -2.7702e+06\n", "converged: True LL-Null: -2.7702e+06\n", " LLR p-value: 0.3478\n", "==================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "----------------------------------------------------------------------------------\n", "Intercept 0.0470 0.001 44.772 0.000 0.045 0.049\n", "youngm[T.True] 0.0010 0.004 0.240 0.810 -0.007 0.009\n", "oldm[T.True] -0.0084 0.006 -1.421 0.155 -0.020 0.003\n", "==================================================================================\n", "\"\"\"" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ youngm + oldm', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Whether the mother was born in the U.S. has no predictive value" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692873\n", " Iterations 3\n", "C(mnativ)[T.2.0] 104.8 104.9 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3988351\n", "Model: Logit Df Residuals: 3988349\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 4.566e-08\n", "Time: 14:19:00 Log-Likelihood: -2.7634e+06\n", "converged: True LL-Null: -2.7634e+06\n", " LLR p-value: 0.6154\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0466 0.001 41.050 0.000 0.044 0.049\n", "C(mnativ)[T.2.0] 0.0012 0.002 0.502 0.615 -0.004 0.006\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(mnativ)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Neither does residence status" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692872\n", " Iterations 3\n", "C(restatus)[T.2] 104.8 104.7 \n", "C(restatus)[T.3] 104.8 106.0 \n", "C(restatus)[T.4] 104.8 106.2 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3998175\n", "Model: Logit Df Residuals: 3998171\n", "Method: MLE Df Model: 3\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 6.716e-07\n", "Time: 14:19:28 Log-Likelihood: -2.7702e+06\n", "converged: True LL-Null: -2.7702e+06\n", " LLR p-value: 0.2932\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0468 0.001 39.653 0.000 0.044 0.049\n", "C(restatus)[T.2] -0.0010 0.002 -0.418 0.676 -0.005 0.004\n", "C(restatus)[T.3] 0.0117 0.007 1.718 0.086 -0.002 0.025\n", "C(restatus)[T.4] 0.0132 0.020 0.663 0.507 -0.026 0.052\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(restatus)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's race seems to have predictive value. Relative to whites, black and Native American mothers have more girls; Asians have more boys." ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692863\n", " Iterations 3\n", "C(mbrace)[T.2] 105.1 102.9 *\n", "C(mbrace)[T.3] 105.1 103.1 *\n", "C(mbrace)[T.4] 105.1 106.3 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3998175\n", "Model: Logit Df Residuals: 3998171\n", "Method: MLE Df Model: 3\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.401e-05\n", "Time: 14:19:55 Log-Likelihood: -2.7702e+06\n", "converged: True LL-Null: -2.7702e+06\n", " LLR p-value: 1.007e-16\n", "==================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "----------------------------------------------------------------------------------\n", "Intercept 0.0497 0.001 43.250 0.000 0.047 0.052\n", "C(mbrace)[T.2] -0.0214 0.003 -7.770 0.000 -0.027 -0.016\n", "C(mbrace)[T.3] -0.0195 0.010 -2.049 0.041 -0.038 -0.001\n", "C(mbrace)[T.4] 0.0109 0.004 2.777 0.005 0.003 0.019\n", "==================================================================================\n", "\"\"\"" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(mbrace)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hispanic mothers have more girls." ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692874\n", " Iterations 3\n", "mhisp 105.0 104.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3967498\n", "Model: Logit Df Residuals: 3967496\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.998e-06\n", "Time: 14:19:59 Log-Likelihood: -2.7490e+06\n", "converged: True LL-Null: -2.7490e+06\n", " LLR p-value: 0.0009174\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0485 0.001 42.263 0.000 0.046 0.051\n", "mhisp -0.0079 0.002 -3.315 0.001 -0.013 -0.003\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ mhisp', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the mother is married, or is unmarried but paternity is acknowledged, the sex ratio is higher (more boys)." ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692864\n", " Iterations 3\n", "C(mar_p)[T.Y] 102.8 105.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3849169\n", "Model: Logit Df Residuals: 3849167\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 9.129e-06\n", "Time: 14:20:27 Log-Likelihood: -2.6670e+06\n", "converged: True LL-Null: -2.6670e+06\n", " LLR p-value: 2.990e-12\n", "=================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "---------------------------------------------------------------------------------\n", "Intercept 0.0278 0.003 9.446 0.000 0.022 0.034\n", "C(mar_p)[T.Y] 0.0219 0.003 6.978 0.000 0.016 0.028\n", "=================================================================================\n", "\"\"\"" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(mar_p)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Being unmarried predicts more girls." ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692871\n", " Iterations 3\n", "C(dmar)[T.2] 105.1 104.3 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3998175\n", "Model: Logit Df Residuals: 3998173\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 3.001e-06\n", "Time: 14:20:54 Log-Likelihood: -2.7702e+06\n", "converged: True LL-Null: -2.7702e+06\n", " LLR p-value: 4.555e-05\n", "================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "--------------------------------------------------------------------------------\n", "Intercept 0.0502 0.001 38.789 0.000 0.048 0.053\n", "C(dmar)[T.2] -0.0083 0.002 -4.077 0.000 -0.012 -0.004\n", "================================================================================\n", "\"\"\"" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(dmar)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each level of mother's education predicts a small increase in the probability of a boy." ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692874\n", " Iterations 3\n", "meduc 104.1 104.2 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3810525\n", "Model: Logit Df Residuals: 3810523\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.416e-06\n", "Time: 14:20:59 Log-Likelihood: -2.6402e+06\n", "converged: True LL-Null: -2.6402e+06\n", " LLR p-value: 0.006248\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0398 0.003 14.711 0.000 0.034 0.045\n", "meduc 0.0016 0.001 2.734 0.006 0.000 0.003\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ meduc', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692874\n", " Iterations 3\n", "lowed 104.9 104.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3810525\n", "Model: Logit Df Residuals: 3810523\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.431e-06\n", "Time: 14:21:03 Log-Likelihood: -2.6402e+06\n", "converged: True LL-Null: -2.6402e+06\n", " LLR p-value: 0.005983\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0478 0.001 43.002 0.000 0.046 0.050\n", "lowed -0.0079 0.003 -2.749 0.006 -0.013 -0.002\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ lowed', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Older fathers are slightly more likely to have girls (but this apparent effect could be due to chance)." ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692840\n", " Iterations 3\n", "fagerrec11 105.9 105.7 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3501060\n", "Model: Logit Df Residuals: 3501058\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 8.226e-07\n", "Time: 14:21:08 Log-Likelihood: -2.4257e+06\n", "converged: True LL-Null: -2.4257e+06\n", " LLR p-value: 0.04575\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0570 0.004 14.707 0.000 0.049 0.065\n", "fagerrec11 -0.0015 0.001 -1.998 0.046 -0.003 -2.9e-05\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ fagerrec11', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692840\n", " Iterations 3\n", "youngf 105.1 106.3 \n", "oldf 105.1 105.0 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3501060\n", "Model: Logit Df Residuals: 3501057\n", "Method: MLE Df Model: 2\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 5.807e-07\n", "Time: 14:21:12 Log-Likelihood: -2.4257e+06\n", "converged: True LL-Null: -2.4257e+06\n", " LLR p-value: 0.2445\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0493 0.001 44.656 0.000 0.047 0.051\n", "youngf 0.0116 0.007 1.673 0.094 -0.002 0.025\n", "oldf -0.0005 0.006 -0.086 0.932 -0.012 0.011\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ youngf + oldf', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Predictions based on father's race are similar to those based on mother's race: more girls for black and Native American fathers; more boys for Asian fathers." ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692818\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.5 103.1 *\n", "C(fbrace)[T.3.0] 105.5 102.9 *\n", "C(fbrace)[T.4.0] 105.5 106.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3254136\n", "Model: Logit Df Residuals: 3254132\n", "Method: MLE Df Model: 3\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.504e-05\n", "Time: 14:21:38 Log-Likelihood: -2.2545e+06\n", "converged: True LL-Null: -2.2546e+06\n", " LLR p-value: 1.256e-14\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0533 0.001 42.144 0.000 0.051 0.056\n", "C(fbrace)[T.2.0] -0.0227 0.003 -7.221 0.000 -0.029 -0.017\n", "C(fbrace)[T.3.0] -0.0250 0.011 -2.335 0.020 -0.046 -0.004\n", "C(fbrace)[T.4.0] 0.0106 0.004 2.479 0.013 0.002 0.019\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(fbrace)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the father is Hispanic, that predicts more girls." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692839\n", " Iterations 3\n", "fhisp 105.4 104.0 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3453052\n", "Model: Logit Df Residuals: 3453050\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 5.800e-06\n", "Time: 14:21:42 Log-Likelihood: -2.3924e+06\n", "converged: True LL-Null: -2.3924e+06\n", " LLR p-value: 1.378e-07\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0525 0.001 42.696 0.000 0.050 0.055\n", "fhisp -0.0134 0.003 -5.268 0.000 -0.018 -0.008\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ fhisp', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Father's education level might predict more boys, but the apparent effect could be due to chance." ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692840\n", " Iterations 3\n", "feduc 104.6 104.7 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3279126
Model: Logit Df Residuals: 3279124
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.046e-07
Time: 14:21:46 Log-Likelihood: -2.2719e+06
converged: True LL-Null: -2.2719e+06
LLR p-value: 0.05587
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0445 0.003 15.630 0.000 0.039 0.050
feduc 0.0012 0.001 1.912 0.056 -3.02e-05 0.002
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3279126\n", "Model: Logit Df Residuals: 3279124\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 8.046e-07\n", "Time: 14:21:46 Log-Likelihood: -2.2719e+06\n", "converged: True LL-Null: -2.2719e+06\n", " LLR p-value: 0.05587\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0445 0.003 15.630 0.000 0.039 0.050\n", "feduc 0.0012 0.001 1.912 0.056 -3.02e-05 0.002\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ feduc', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Babies with high birth order are slightly more likely to be girls." ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692872\n", " Iterations 3\n", "lbo_rec 105.3 105.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3978150
Model: Logit Df Residuals: 3978148
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.576e-06
Time: 14:21:51 Log-Likelihood: -2.7563e+06
converged: True LL-Null: -2.7564e+06
LLR p-value: 0.003206
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0518 0.002 26.529 0.000 0.048 0.056
lbo_rec -0.0023 0.001 -2.947 0.003 -0.004 -0.001
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3978150\n", "Model: Logit Df Residuals: 3978148\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.576e-06\n", "Time: 14:21:51 Log-Likelihood: -2.7563e+06\n", "converged: True LL-Null: -2.7564e+06\n", " LLR p-value: 0.003206\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0518 0.002 26.529 0.000 0.048 0.056\n", "lbo_rec -0.0023 0.001 -2.947 0.003 -0.004 -0.001\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ lbo_rec', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692872\n", " Iterations 3\n", "highbo 104.9 103.4 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3978150
Model: Logit Df Residuals: 3978148
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.647e-06
Time: 14:21:56 Log-Likelihood: -2.7563e+06
converged: True LL-Null: -2.7564e+06
LLR p-value: 0.002584
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0475 0.001 46.200 0.000 0.046 0.050
highbo -0.0139 0.005 -3.013 0.003 -0.023 -0.005
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3978150\n", "Model: Logit Df Residuals: 3978148\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.647e-06\n", "Time: 14:21:56 Log-Likelihood: -2.7563e+06\n", "converged: True LL-Null: -2.7564e+06\n", " LLR p-value: 0.002584\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0475 0.001 46.200 0.000 0.046 0.050\n", "highbo -0.0139 0.005 -3.013 0.003 -0.023 -0.005\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ highbo', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Strangely, prenatal visits are associated with an increased probability of girls." ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692847\n", " Iterations 3\n", "previs 104.6 103.8 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3888065
Model: Logit Df Residuals: 3888063
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.975e-05
Time: 14:22:01 Log-Likelihood: -2.6938e+06
converged: True LL-Null: -2.6939e+06
LLR p-value: 1.677e-48
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0449 0.001 43.933 0.000 0.043 0.047
previs -0.0079 0.001 -14.634 0.000 -0.009 -0.007
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3888065\n", "Model: Logit Df Residuals: 3888063\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 3.975e-05\n", "Time: 14:22:01 Log-Likelihood: -2.6938e+06\n", "converged: True LL-Null: -2.6939e+06\n", " LLR p-value: 1.677e-48\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0449 0.001 43.933 0.000 0.043 0.047\n", "previs -0.0079 0.001 -14.634 0.000 -0.009 -0.007\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ previs', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect seems to be non-linear at zero, so I'm adding a boolean for no prenatal visits." ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692842\n", " Iterations 3\n", "no_previs 104.6 98.9 *\n", "previs 104.6 103.7 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3888065
Model: Logit Df Residuals: 3888062
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.717e-05
Time: 14:22:07 Log-Likelihood: -2.6938e+06
converged: True LL-Null: -2.6939e+06
LLR p-value: 6.538e-56
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0454 0.001 44.310 0.000 0.043 0.047
no_previs -0.0564 0.009 -6.322 0.000 -0.074 -0.039
previs -0.0093 0.001 -15.938 0.000 -0.010 -0.008
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3888065\n", "Model: Logit Df Residuals: 3888062\n", "Method: MLE Df Model: 2\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 4.717e-05\n", "Time: 14:22:07 Log-Likelihood: -2.6938e+06\n", "converged: True LL-Null: -2.6939e+06\n", " LLR p-value: 6.538e-56\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0454 0.001 44.310 0.000 0.043 0.047\n", "no_previs -0.0564 0.009 -6.322 0.000 -0.074 -0.039\n", "previs -0.0093 0.001 -15.938 0.000 -0.010 -0.008\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ no_previs + previs', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the mother qualifies for food stamps, she is more likely to have a girl." ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692869\n", " Iterations 3\n", "wic[T.Y] 105.2 104.3 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3759121
Model: Logit Df Residuals: 3759119
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.051e-06
Time: 14:22:35 Log-Likelihood: -2.6046e+06
converged: True LL-Null: -2.6046e+06
LLR p-value: 6.700e-05
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0506 0.001 36.886 0.000 0.048 0.053
wic[T.Y] -0.0083 0.002 -3.987 0.000 -0.012 -0.004
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3759121\n", "Model: Logit Df Residuals: 3759119\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 3.051e-06\n", "Time: 14:22:35 Log-Likelihood: -2.6046e+06\n", "converged: True LL-Null: -2.6046e+06\n", " LLR p-value: 6.700e-05\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0506 0.001 36.886 0.000 0.048 0.053\n", "wic[T.Y] -0.0083 0.002 -3.987 0.000 -0.012 -0.004\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ wic', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's height seems to have no predictive value." ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692873\n", " Iterations 3\n", "height 102.4 102.5 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3790892
Model: Logit Df Residuals: 3790890
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.853e-07
Time: 14:22:39 Log-Likelihood: -2.6266e+06
converged: True LL-Null: -2.6266e+06
LLR p-value: 0.3238
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0240 0.023 1.038 0.299 -0.021 0.069
height 0.0004 0.000 0.987 0.324 -0.000 0.001
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3790892\n", "Model: Logit Df Residuals: 3790890\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.853e-07\n", "Time: 14:22:39 Log-Likelihood: -2.6266e+06\n", "converged: True LL-Null: -2.6266e+06\n", " LLR p-value: 0.3238\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0240 0.023 1.038 0.299 -0.021 0.069\n", "height 0.0004 0.000 0.987 0.324 -0.000 0.001\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ height', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692872\n", " Iterations 3\n", "mtall 104.8 104.1 \n", "mshort 104.8 104.3 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3790892
Model: Logit Df Residuals: 3790889
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.560e-07
Time: 14:22:43 Log-Likelihood: -2.6266e+06
converged: True LL-Null: -2.6266e+06
LLR p-value: 0.3019
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0473 0.001 44.433 0.000 0.045 0.049
mtall -0.0071 0.006 -1.212 0.226 -0.018 0.004
mshort -0.0056 0.006 -1.005 0.315 -0.016 0.005
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3790892\n", "Model: Logit Df Residuals: 3790889\n", "Method: MLE Df Model: 2\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 4.560e-07\n", "Time: 14:22:43 Log-Likelihood: -2.6266e+06\n", "converged: True LL-Null: -2.6266e+06\n", " LLR p-value: 0.3019\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0473 0.001 44.433 0.000 0.045 0.049\n", "mtall -0.0071 0.006 -1.212 0.226 -0.018 0.004\n", "mshort -0.0056 0.006 -1.005 0.315 -0.016 0.005\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ mtall + mshort', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mother's with higher BMI are more likely to have girls." ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692870\n", " Iterations 3\n", "bmi_r 105.7 105.4 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3709225
Model: Logit Df Residuals: 3709223
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.168e-06
Time: 14:22:48 Log-Likelihood: -2.5700e+06
converged: True LL-Null: -2.5700e+06
LLR p-value: 0.0008442
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0554 0.003 20.336 0.000 0.050 0.061
bmi_r -0.0029 0.001 -3.338 0.001 -0.005 -0.001
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3709225\n", "Model: Logit Df Residuals: 3709223\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.168e-06\n", "Time: 14:22:48 Log-Likelihood: -2.5700e+06\n", "converged: True LL-Null: -2.5700e+06\n", " LLR p-value: 0.0008442\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0554 0.003 20.336 0.000 0.050 0.061\n", "bmi_r -0.0029 0.001 -3.338 0.001 -0.005 -0.001\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ bmi_r', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692870\n", " Iterations 3\n", "obese 105.0 104.2 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3709225
Model: Logit Df Residuals: 3709223
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.347e-06
Time: 14:22:53 Log-Likelihood: -2.5700e+06
converged: True LL-Null: -2.5700e+06
LLR p-value: 0.0005139
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0491 0.001 40.976 0.000 0.047 0.051
obese -0.0084 0.002 -3.473 0.001 -0.013 -0.004
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3709225\n", "Model: Logit Df Residuals: 3709223\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.347e-06\n", "Time: 14:22:53 Log-Likelihood: -2.5700e+06\n", "converged: True LL-Null: -2.5700e+06\n", " LLR p-value: 0.0005139\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0491 0.001 40.976 0.000 0.047 0.051\n", "obese -0.0084 0.002 -3.473 0.001 -0.013 -0.004\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ obese', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If payment was made by Medicaid, the baby is more likely to be a girl. Private insurance, self-payment, and other payment method are associated with more boys." ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692869\n", " Iterations 3\n", "C(pay_rec)[T.2.0] 104.2 105.1 *\n", "C(pay_rec)[T.3.0] 104.2 106.6 *\n", "C(pay_rec)[T.4.0] 104.2 104.7 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3819768
Model: Logit Df Residuals: 3819764
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 5.306e-06
Time: 14:23:19 Log-Likelihood: -2.6466e+06
converged: True LL-Null: -2.6466e+06
LLR p-value: 3.482e-06
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0416 0.002 26.840 0.000 0.039 0.045
C(pay_rec)[T.2.0] 0.0085 0.002 3.982 0.000 0.004 0.013
C(pay_rec)[T.3.0] 0.0222 0.005 4.272 0.000 0.012 0.032
C(pay_rec)[T.4.0] 0.0047 0.005 0.925 0.355 -0.005 0.015
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3819768\n", "Model: Logit Df Residuals: 3819764\n", "Method: MLE Df Model: 3\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 5.306e-06\n", "Time: 14:23:19 Log-Likelihood: -2.6466e+06\n", "converged: True LL-Null: -2.6466e+06\n", " LLR p-value: 3.482e-06\n", "=====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------\n", "Intercept 0.0416 0.002 26.840 0.000 0.039 0.045\n", "C(pay_rec)[T.2.0] 0.0085 0.002 3.982 0.000 0.004 0.013\n", "C(pay_rec)[T.3.0] 0.0222 0.005 4.272 0.000 0.012 0.032\n", "C(pay_rec)[T.4.0] 0.0047 0.005 0.925 0.355 -0.005 0.015\n", "=====================================================================================\n", "\"\"\"" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model = smf.logit('boy ~ C(pay_rec)', data=df) \n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding controls\n", "\n", "However, none of the previous results should be taken too seriously. We only tested one variable at a time, and many of these apparent effects disappear when we add control variables.\n", "\n", "In particular, if we control for father's race and Hispanic origin, the mother's race has no additional predictive value." 
] }, { "cell_type": "code", "execution_count": 88, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692816\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.8 103.1 *\n", "C(fbrace)[T.3.0] 105.8 103.5 \n", "C(fbrace)[T.4.0] 105.8 106.9 \n", "C(mbrace)[T.2] 105.8 105.9 \n", "C(mbrace)[T.3] 105.8 104.5 \n", "C(mbrace)[T.4] 105.8 105.6 \n", "fhisp 105.8 104.2 *\n", "mhisp 105.8 106.0 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3231530
Model: Logit Df Residuals: 3231521
Method: MLE Df Model: 8
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.087e-05
Time: 14:24:08 Log-Likelihood: -2.2389e+06
converged: True LL-Null: -2.2389e+06
LLR p-value: 9.292e-17
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.001 38.234 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0260 0.006 -4.668 0.000 -0.037 -0.015
C(fbrace)[T.3.0] -0.0221 0.012 -1.793 0.073 -0.046 0.002
C(fbrace)[T.4.0] 0.0097 0.007 1.344 0.179 -0.004 0.024
C(mbrace)[T.2] 0.0004 0.006 0.075 0.940 -0.011 0.012
C(mbrace)[T.3] -0.0130 0.013 -0.994 0.320 -0.039 0.013
C(mbrace)[T.4] -0.0026 0.007 -0.375 0.708 -0.016 0.011
fhisp -0.0156 0.004 -3.591 0.000 -0.024 -0.007
mhisp 0.0018 0.004 0.422 0.673 -0.007 0.010
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3231530\n", "Model: Logit Df Residuals: 3231521\n", "Method: MLE Df Model: 8\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.087e-05\n", "Time: 14:24:08 Log-Likelihood: -2.2389e+06\n", "converged: True LL-Null: -2.2389e+06\n", " LLR p-value: 9.292e-17\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0566 0.001 38.234 0.000 0.054 0.060\n", "C(fbrace)[T.2.0] -0.0260 0.006 -4.668 0.000 -0.037 -0.015\n", "C(fbrace)[T.3.0] -0.0221 0.012 -1.793 0.073 -0.046 0.002\n", "C(fbrace)[T.4.0] 0.0097 0.007 1.344 0.179 -0.004 0.024\n", "C(mbrace)[T.2] 0.0004 0.006 0.075 0.940 -0.011 0.012\n", "C(mbrace)[T.3] -0.0130 0.013 -0.994 0.320 -0.039 0.013\n", "C(mbrace)[T.4] -0.0026 0.007 -0.375 0.708 -0.016 0.011\n", "fhisp -0.0156 0.004 -3.591 0.000 -0.024 -0.007\n", "mhisp 0.0018 0.004 0.422 0.673 -0.007 0.010\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + C(mbrace) + mhisp')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, once we control for father's race and Hispanic origin, almost every other variable becomes statistically insignificant, including acknowledged paternity." 
] }, { "cell_type": "code", "execution_count": 89, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692814\n", " Iterations 3\n", "C(fbrace)[T.2.0] 108.2 105.5 *\n", "C(fbrace)[T.3.0] 108.2 105.2 *\n", "C(fbrace)[T.4.0] 108.2 109.1 \n", "mar_p[T.Y] 108.2 105.8 \n", "fhisp 108.2 106.7 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3112362
Model: Logit Df Residuals: 3112356
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.117e-05
Time: 14:24:56 Log-Likelihood: -2.1563e+06
converged: True LL-Null: -2.1563e+06
LLR p-value: 3.558e-18
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0792 0.015 5.155 0.000 0.049 0.109
C(fbrace)[T.2.0] -0.0258 0.003 -7.860 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0283 0.011 -2.594 0.009 -0.050 -0.007
C(fbrace)[T.4.0] 0.0074 0.004 1.662 0.097 -0.001 0.016
mar_p[T.Y] -0.0225 0.015 -1.464 0.143 -0.053 0.008
fhisp -0.0148 0.003 -4.982 0.000 -0.021 -0.009
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3112362\n", "Model: Logit Df Residuals: 3112356\n", "Method: MLE Df Model: 5\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.117e-05\n", "Time: 14:24:56 Log-Likelihood: -2.1563e+06\n", "converged: True LL-Null: -2.1563e+06\n", " LLR p-value: 3.558e-18\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0792 0.015 5.155 0.000 0.049 0.109\n", "C(fbrace)[T.2.0] -0.0258 0.003 -7.860 0.000 -0.032 -0.019\n", "C(fbrace)[T.3.0] -0.0283 0.011 -2.594 0.009 -0.050 -0.007\n", "C(fbrace)[T.4.0] 0.0074 0.004 1.662 0.097 -0.001 0.016\n", "mar_p[T.Y] -0.0225 0.015 -1.464 0.143 -0.053 0.008\n", "fhisp -0.0148 0.003 -4.982 0.000 -0.021 -0.009\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + mar_p')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Being married still predicts more boys." 
] }, { "cell_type": "code", "execution_count": 90, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692814\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.0 102.2 *\n", "C(fbrace)[T.3.0] 105.0 101.9 *\n", "C(fbrace)[T.4.0] 105.0 105.9 \n", "fhisp 105.0 103.4 *\n", "dmar 105.0 105.7 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3235798
Model: Logit Df Residuals: 3235792
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.183e-05
Time: 14:25:22 Log-Likelihood: -2.2418e+06
converged: True LL-Null: -2.2419e+06
LLR p-value: 1.485e-19
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0492 0.003 14.375 0.000 0.042 0.056
C(fbrace)[T.2.0] -0.0278 0.003 -8.324 0.000 -0.034 -0.021
C(fbrace)[T.3.0] -0.0301 0.011 -2.778 0.005 -0.051 -0.009
C(fbrace)[T.4.0] 0.0081 0.004 1.871 0.061 -0.000 0.017
fhisp -0.0156 0.003 -5.270 0.000 -0.021 -0.010
dmar 0.0062 0.003 2.416 0.016 0.001 0.011
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3235798\n", "Model: Logit Df Residuals: 3235792\n", "Method: MLE Df Model: 5\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.183e-05\n", "Time: 14:25:22 Log-Likelihood: -2.2418e+06\n", "converged: True LL-Null: -2.2419e+06\n", " LLR p-value: 1.485e-19\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0492 0.003 14.375 0.000 0.042 0.056\n", "C(fbrace)[T.2.0] -0.0278 0.003 -8.324 0.000 -0.034 -0.021\n", "C(fbrace)[T.3.0] -0.0301 0.011 -2.778 0.005 -0.051 -0.009\n", "C(fbrace)[T.4.0] 0.0081 0.004 1.871 0.061 -0.000 0.017\n", "fhisp -0.0156 0.003 -5.270 0.000 -0.021 -0.010\n", "dmar 0.0062 0.003 2.416 0.016 0.001 0.011\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + dmar')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of education disappears." 
] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692816\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.8 103.1 *\n", "C(fbrace)[T.3.0] 105.8 102.8 *\n", "C(fbrace)[T.4.0] 105.8 106.5 \n", "fhisp 105.8 104.2 *\n", "lowed 105.8 106.0 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3091385
Model: Logit Df Residuals: 3091379
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.076e-05
Time: 14:25:47 Log-Likelihood: -2.1418e+06
converged: True LL-Null: -2.1418e+06
LLR p-value: 1.130e-17
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.001 37.993 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0259 0.003 -7.838 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0287 0.011 -2.624 0.009 -0.050 -0.007
C(fbrace)[T.4.0] 0.0067 0.004 1.487 0.137 -0.002 0.015
fhisp -0.0152 0.003 -4.927 0.000 -0.021 -0.009
lowed 0.0017 0.004 0.462 0.644 -0.006 0.009
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3091385\n", "Model: Logit Df Residuals: 3091379\n", "Method: MLE Df Model: 5\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.076e-05\n", "Time: 14:25:47 Log-Likelihood: -2.1418e+06\n", "converged: True LL-Null: -2.1418e+06\n", " LLR p-value: 1.130e-17\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0566 0.001 37.993 0.000 0.054 0.060\n", "C(fbrace)[T.2.0] -0.0259 0.003 -7.838 0.000 -0.032 -0.019\n", "C(fbrace)[T.3.0] -0.0287 0.011 -2.624 0.009 -0.050 -0.007\n", "C(fbrace)[T.4.0] 0.0067 0.004 1.487 0.137 -0.002 0.015\n", "fhisp -0.0152 0.003 -4.927 0.000 -0.021 -0.009\n", "lowed 0.0017 0.004 0.462 0.644 -0.006 0.009\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + lowed')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of birth order disappears." 
] }, { "cell_type": "code", "execution_count": 92, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692816\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.8 103.2 *\n", "C(fbrace)[T.3.0] 105.8 102.9 *\n", "C(fbrace)[T.4.0] 105.8 106.6 \n", "fhisp 105.8 104.4 *\n", "highbo 105.8 105.6 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3221819
Model: Logit Df Residuals: 3221813
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.029e-05
Time: 14:26:13 Log-Likelihood: -2.2321e+06
converged: True LL-Null: -2.2322e+06
LLR p-value: 5.072e-18
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.001 38.815 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0253 0.003 -7.841 0.000 -0.032 -0.019
C(fbrace)[T.3.0] -0.0284 0.011 -2.616 0.009 -0.050 -0.007
C(fbrace)[T.4.0] 0.0077 0.004 1.758 0.079 -0.001 0.016
fhisp -0.0139 0.003 -4.785 0.000 -0.020 -0.008
highbo -0.0026 0.005 -0.483 0.629 -0.013 0.008
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3221819\n", "Model: Logit Df Residuals: 3221813\n", "Method: MLE Df Model: 5\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.029e-05\n", "Time: 14:26:13 Log-Likelihood: -2.2321e+06\n", "converged: True LL-Null: -2.2322e+06\n", " LLR p-value: 5.072e-18\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0566 0.001 38.815 0.000 0.054 0.060\n", "C(fbrace)[T.2.0] -0.0253 0.003 -7.841 0.000 -0.032 -0.019\n", "C(fbrace)[T.3.0] -0.0284 0.011 -2.616 0.009 -0.050 -0.007\n", "C(fbrace)[T.4.0] 0.0077 0.004 1.758 0.079 -0.001 0.016\n", "fhisp -0.0139 0.003 -4.785 0.000 -0.020 -0.008\n", "highbo -0.0026 0.005 -0.483 0.629 -0.013 0.008\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + highbo')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "WIC is no longer associated with more girls." 
] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692813\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.8 103.0 *\n", "C(fbrace)[T.3.0] 105.8 103.0 *\n", "C(fbrace)[T.4.0] 105.8 106.6 \n", "wic[T.Y] 105.8 106.1 \n", "fhisp 105.8 104.1 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3040527
Model: Logit Df Residuals: 3040521
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.175e-05
Time: 14:27:01 Log-Likelihood: -2.1065e+06
converged: True LL-Null: -2.1066e+06
LLR p-value: 3.031e-18
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0564 0.002 34.772 0.000 0.053 0.060
C(fbrace)[T.2.0] -0.0271 0.003 -7.892 0.000 -0.034 -0.020
C(fbrace)[T.3.0] -0.0267 0.011 -2.405 0.016 -0.048 -0.005
C(fbrace)[T.4.0] 0.0076 0.005 1.670 0.095 -0.001 0.016
wic[T.Y] 0.0025 0.003 0.975 0.330 -0.002 0.007
fhisp -0.0161 0.003 -5.153 0.000 -0.022 -0.010
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3040527\n", "Model: Logit Df Residuals: 3040521\n", "Method: MLE Df Model: 5\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.175e-05\n", "Time: 14:27:01 Log-Likelihood: -2.1065e+06\n", "converged: True LL-Null: -2.1066e+06\n", " LLR p-value: 3.031e-18\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0564 0.002 34.772 0.000 0.053 0.060\n", "C(fbrace)[T.2.0] -0.0271 0.003 -7.892 0.000 -0.034 -0.020\n", "C(fbrace)[T.3.0] -0.0267 0.011 -2.405 0.016 -0.048 -0.005\n", "C(fbrace)[T.4.0] 0.0076 0.005 1.670 0.095 -0.001 0.016\n", "wic[T.Y] 0.0025 0.003 0.975 0.330 -0.002 0.007\n", "fhisp -0.0161 0.003 -5.153 0.000 -0.022 -0.010\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + wic')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of obesity disappears." 
] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692815\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.9 103.3 *\n", "C(fbrace)[T.3.0] 105.9 103.1 *\n", "C(fbrace)[T.4.0] 105.9 106.5 \n", "fhisp 105.9 104.3 *\n", "obese 105.9 105.7 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3005073
Model: Logit Df Residuals: 3005067
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.947e-05
Time: 14:27:26 Log-Likelihood: -2.0820e+06
converged: True LL-Null: -2.0820e+06
LLR p-value: 5.013e-16
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0571 0.002 35.622 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0247 0.003 -7.305 0.000 -0.031 -0.018
C(fbrace)[T.3.0] -0.0266 0.011 -2.410 0.016 -0.048 -0.005
C(fbrace)[T.4.0] 0.0056 0.005 1.217 0.224 -0.003 0.015
fhisp -0.0151 0.003 -4.996 0.000 -0.021 -0.009
obese -0.0014 0.003 -0.524 0.600 -0.007 0.004
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3005073\n", "Model: Logit Df Residuals: 3005067\n", "Method: MLE Df Model: 5\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.947e-05\n", "Time: 14:27:26 Log-Likelihood: -2.0820e+06\n", "converged: True LL-Null: -2.0820e+06\n", " LLR p-value: 5.013e-16\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0571 0.002 35.622 0.000 0.054 0.060\n", "C(fbrace)[T.2.0] -0.0247 0.003 -7.305 0.000 -0.031 -0.018\n", "C(fbrace)[T.3.0] -0.0266 0.011 -2.410 0.016 -0.048 -0.005\n", "C(fbrace)[T.4.0] 0.0056 0.005 1.217 0.224 -0.003 0.015\n", "fhisp -0.0151 0.003 -4.996 0.000 -0.021 -0.009\n", "obese -0.0014 0.003 -0.524 0.600 -0.007 0.004\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + obese')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of payment method is diminished, but self-payment is still associated with more boys." 
] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692812\n", " Iterations 3\n", "C(fbrace)[T.2.0] 106.1 103.3 *\n", "C(fbrace)[T.3.0] 106.1 103.0 *\n", "C(fbrace)[T.4.0] 106.1 106.7 \n", "C(pay_rec)[T.2.0] 106.1 105.7 \n", "C(pay_rec)[T.3.0] 106.1 108.3 *\n", "C(pay_rec)[T.4.0] 106.1 105.4 \n", "fhisp 106.1 104.4 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3086812
Model: Logit Df Residuals: 3086804
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.500e-05
Time: 14:28:14 Log-Likelihood: -2.1386e+06
converged: True LL-Null: -2.1386e+06
LLR p-value: 3.965e-20
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0593 0.002 25.249 0.000 0.055 0.064
C(fbrace)[T.2.0] -0.0271 0.003 -7.980 0.000 -0.034 -0.020
C(fbrace)[T.3.0] -0.0297 0.011 -2.696 0.007 -0.051 -0.008
C(fbrace)[T.4.0] 0.0056 0.004 1.239 0.216 -0.003 0.014
C(pay_rec)[T.2.0] -0.0043 0.003 -1.680 0.093 -0.009 0.001
C(pay_rec)[T.3.0] 0.0203 0.006 3.331 0.001 0.008 0.032
C(pay_rec)[T.4.0] -0.0063 0.006 -1.094 0.274 -0.018 0.005
fhisp -0.0167 0.003 -5.378 0.000 -0.023 -0.011
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3086812\n", "Model: Logit Df Residuals: 3086804\n", "Method: MLE Df Model: 7\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.500e-05\n", "Time: 14:28:14 Log-Likelihood: -2.1386e+06\n", "converged: True LL-Null: -2.1386e+06\n", " LLR p-value: 3.965e-20\n", "=====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------\n", "Intercept 0.0593 0.002 25.249 0.000 0.055 0.064\n", "C(fbrace)[T.2.0] -0.0271 0.003 -7.980 0.000 -0.034 -0.020\n", "C(fbrace)[T.3.0] -0.0297 0.011 -2.696 0.007 -0.051 -0.008\n", "C(fbrace)[T.4.0] 0.0056 0.004 1.239 0.216 -0.003 0.014\n", "C(pay_rec)[T.2.0] -0.0043 0.003 -1.680 0.093 -0.009 0.001\n", "C(pay_rec)[T.3.0] 0.0203 0.006 3.331 0.001 0.008 0.032\n", "C(pay_rec)[T.4.0] -0.0063 0.006 -1.094 0.274 -0.018 0.005\n", "fhisp -0.0167 0.003 -5.378 0.000 -0.023 -0.011\n", "=====================================================================================\n", "\"\"\"" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + C(pay_rec)')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But the effect of prenatal visits is still a strong predictor of more girls." 
] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692778\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.8 102.8 *\n", "C(fbrace)[T.3.0] 105.8 102.3 *\n", "C(fbrace)[T.4.0] 105.8 106.4 \n", "fhisp 105.8 104.0 *\n", "previs 105.8 104.8 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155434
Method: MLE Df Model: 5
Date: Tue, 17 May 2016 Pseudo R-squ.: 7.997e-05
Time: 14:28:40 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 2.081e-73
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0567 0.001 38.800 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0295 0.003 -9.008 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0341 0.011 -3.114 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0058 0.004 1.314 0.189 -0.003 0.014
fhisp -0.0172 0.003 -5.862 0.000 -0.023 -0.011
previs -0.0102 0.001 -16.235 0.000 -0.011 -0.009
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3155440\n", "Model: Logit Df Residuals: 3155434\n", "Method: MLE Df Model: 5\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 7.997e-05\n", "Time: 14:28:40 Log-Likelihood: -2.1860e+06\n", "converged: True LL-Null: -2.1862e+06\n", " LLR p-value: 2.081e-73\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0567 0.001 38.800 0.000 0.054 0.060\n", "C(fbrace)[T.2.0] -0.0295 0.003 -9.008 0.000 -0.036 -0.023\n", "C(fbrace)[T.3.0] -0.0341 0.011 -3.114 0.002 -0.056 -0.013\n", "C(fbrace)[T.4.0] 0.0058 0.004 1.314 0.189 -0.003 0.014\n", "fhisp -0.0172 0.003 -5.862 0.000 -0.023 -0.011\n", "previs -0.0102 0.001 -16.235 0.000 -0.011 -0.009\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the effect is even stronger if we add a boolean to capture the nonlinearity at 0 visits." 
] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692776\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.9 102.8 *\n", "C(fbrace)[T.3.0] 105.9 102.3 *\n", "C(fbrace)[T.4.0] 105.9 106.5 \n", "fhisp 105.9 104.1 *\n", "previs 105.9 104.7 *\n", "no_previs 105.9 101.0 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155433
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.351e-05
Time: 14:29:06 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 8.674e-76
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0570 0.001 38.973 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0294 0.003 -8.984 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0342 0.011 -3.123 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0056 0.004 1.270 0.204 -0.003 0.014
fhisp -0.0171 0.003 -5.817 0.000 -0.023 -0.011
previs -0.0111 0.001 -16.625 0.000 -0.012 -0.010
no_previs -0.0469 0.012 -3.936 0.000 -0.070 -0.024
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3155440\n", "Model: Logit Df Residuals: 3155433\n", "Method: MLE Df Model: 6\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 8.351e-05\n", "Time: 14:29:06 Log-Likelihood: -2.1860e+06\n", "converged: True LL-Null: -2.1862e+06\n", " LLR p-value: 8.674e-76\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0570 0.001 38.973 0.000 0.054 0.060\n", "C(fbrace)[T.2.0] -0.0294 0.003 -8.984 0.000 -0.036 -0.023\n", "C(fbrace)[T.3.0] -0.0342 0.011 -3.123 0.002 -0.056 -0.013\n", "C(fbrace)[T.4.0] 0.0056 0.004 1.270 0.204 -0.003 0.014\n", "fhisp -0.0171 0.003 -5.817 0.000 -0.023 -0.011\n", "previs -0.0111 0.001 -16.625 0.000 -0.012 -0.010\n", "no_previs -0.0469 0.012 -3.936 0.000 -0.070 -0.024\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More controls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now if we control for father's race and Hispanic origin as well as number of prenatal visits, the effect of marriage disappears." 
] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692778\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.3 102.1 *\n", "C(fbrace)[T.3.0] 105.3 101.7 *\n", "C(fbrace)[T.4.0] 105.3 106.0 \n", "fhisp 105.3 103.5 *\n", "previs 105.3 104.3 *\n", "dmar 105.3 105.7 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155433
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.045e-05
Time: 14:29:32 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 6.525e-73
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0521 0.003 15.015 0.000 0.045 0.059
C(fbrace)[T.2.0] -0.0309 0.003 -9.058 0.000 -0.038 -0.024
C(fbrace)[T.3.0] -0.0353 0.011 -3.210 0.001 -0.057 -0.014
C(fbrace)[T.4.0] 0.0062 0.004 1.394 0.163 -0.002 0.015
fhisp -0.0181 0.003 -6.033 0.000 -0.024 -0.012
previs -0.0102 0.001 -16.122 0.000 -0.011 -0.009
dmar 0.0037 0.003 1.446 0.148 -0.001 0.009
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3155440\n", "Model: Logit Df Residuals: 3155433\n", "Method: MLE Df Model: 6\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 8.045e-05\n", "Time: 14:29:32 Log-Likelihood: -2.1860e+06\n", "converged: True LL-Null: -2.1862e+06\n", " LLR p-value: 6.525e-73\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0521 0.003 15.015 0.000 0.045 0.059\n", "C(fbrace)[T.2.0] -0.0309 0.003 -9.058 0.000 -0.038 -0.024\n", "C(fbrace)[T.3.0] -0.0353 0.011 -3.210 0.001 -0.057 -0.014\n", "C(fbrace)[T.4.0] 0.0062 0.004 1.394 0.163 -0.002 0.015\n", "fhisp -0.0181 0.003 -6.033 0.000 -0.024 -0.012\n", "previs -0.0102 0.001 -16.122 0.000 -0.011 -0.009\n", "dmar 0.0037 0.003 1.446 0.148 -0.001 0.009\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + dmar')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect of payment method disappears." 
] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692777\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.8 102.8 *\n", "C(fbrace)[T.3.0] 105.8 102.2 *\n", "C(fbrace)[T.4.0] 105.8 106.3 \n", "C(pay_rec)[T.2.0] 105.8 105.9 \n", "C(pay_rec)[T.3.0] 105.8 106.9 \n", "C(pay_rec)[T.4.0] 105.8 105.0 \n", "fhisp 105.8 104.0 *\n", "previs 105.8 104.8 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3009712
Model: Logit Df Residuals: 3009703
Method: MLE Df Model: 8
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.163e-05
Time: 14:30:20 Log-Likelihood: -2.0851e+06
converged: True LL-Null: -2.0852e+06
LLR p-value: 1.004e-68
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0566 0.002 23.765 0.000 0.052 0.061
C(fbrace)[T.2.0] -0.0295 0.003 -8.509 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0345 0.011 -3.090 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0046 0.005 1.012 0.312 -0.004 0.014
C(pay_rec)[T.2.0] 0.0005 0.003 0.174 0.862 -0.005 0.006
C(pay_rec)[T.3.0] 0.0100 0.006 1.619 0.105 -0.002 0.022
C(pay_rec)[T.4.0] -0.0074 0.006 -1.260 0.208 -0.019 0.004
fhisp -0.0178 0.003 -5.687 0.000 -0.024 -0.012
previs -0.0101 0.001 -15.540 0.000 -0.011 -0.009
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3009712\n", "Model: Logit Df Residuals: 3009703\n", "Method: MLE Df Model: 8\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 8.163e-05\n", "Time: 14:30:20 Log-Likelihood: -2.0851e+06\n", "converged: True LL-Null: -2.0852e+06\n", " LLR p-value: 1.004e-68\n", "=====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------\n", "Intercept 0.0566 0.002 23.765 0.000 0.052 0.061\n", "C(fbrace)[T.2.0] -0.0295 0.003 -8.509 0.000 -0.036 -0.023\n", "C(fbrace)[T.3.0] -0.0345 0.011 -3.090 0.002 -0.056 -0.013\n", "C(fbrace)[T.4.0] 0.0046 0.005 1.012 0.312 -0.004 0.014\n", "C(pay_rec)[T.2.0] 0.0005 0.003 0.174 0.862 -0.005 0.006\n", "C(pay_rec)[T.3.0] 0.0100 0.006 1.619 0.105 -0.002 0.022\n", "C(pay_rec)[T.4.0] -0.0074 0.006 -1.260 0.208 -0.019 0.004\n", "fhisp -0.0178 0.003 -5.687 0.000 -0.024 -0.012\n", "previs -0.0101 0.001 -15.540 0.000 -0.011 -0.009\n", "=====================================================================================\n", "\"\"\"" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + C(pay_rec)')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a version with the addition of a boolean for no prenatal visits." 
] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692776\n", " Iterations 3\n", "C(fbrace)[T.2.0] 105.9 102.8 *\n", "C(fbrace)[T.3.0] 105.9 102.3 *\n", "C(fbrace)[T.4.0] 105.9 106.5 \n", "fhisp 105.9 104.1 *\n", "previs 105.9 104.7 *\n", "no_previs 105.9 101.0 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155433
Method: MLE Df Model: 6
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.351e-05
Time: 14:30:47 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 8.674e-76
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0570 0.001 38.973 0.000 0.054 0.060
C(fbrace)[T.2.0] -0.0294 0.003 -8.984 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0342 0.011 -3.123 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0056 0.004 1.270 0.204 -0.003 0.014
fhisp -0.0171 0.003 -5.817 0.000 -0.023 -0.011
previs -0.0111 0.001 -16.625 0.000 -0.012 -0.010
no_previs -0.0469 0.012 -3.936 0.000 -0.070 -0.024
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3155440\n", "Model: Logit Df Residuals: 3155433\n", "Method: MLE Df Model: 6\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 8.351e-05\n", "Time: 14:30:47 Log-Likelihood: -2.1860e+06\n", "converged: True LL-Null: -2.1862e+06\n", " LLR p-value: 8.674e-76\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0570 0.001 38.973 0.000 0.054 0.060\n", "C(fbrace)[T.2.0] -0.0294 0.003 -8.984 0.000 -0.036 -0.023\n", "C(fbrace)[T.3.0] -0.0342 0.011 -3.123 0.002 -0.056 -0.013\n", "C(fbrace)[T.4.0] 0.0056 0.004 1.270 0.204 -0.003 0.014\n", "fhisp -0.0171 0.003 -5.817 0.000 -0.023 -0.011\n", "previs -0.0111 0.001 -16.625 0.000 -0.012 -0.010\n", "no_previs -0.0469 0.012 -3.936 0.000 -0.070 -0.024\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, surprisingly, the mother's age has a small effect." 
] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692775\n", " Iterations 3\n", "C(fbrace)[T.2.0] 106.8 103.6 *\n", "C(fbrace)[T.3.0] 106.8 103.1 *\n", "C(fbrace)[T.4.0] 106.8 107.4 \n", "fhisp 106.8 104.9 *\n", "previs 106.8 105.6 *\n", "no_previs 106.8 101.9 *\n", "mager9 106.8 106.6 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3155440
Model: Logit Df Residuals: 3155432
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.440e-05
Time: 14:31:14 Log-Likelihood: -2.1860e+06
converged: True LL-Null: -2.1862e+06
LLR p-value: 1.043e-75
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0656 0.005 14.344 0.000 0.057 0.075
C(fbrace)[T.2.0] -0.0300 0.003 -9.123 0.000 -0.036 -0.024
C(fbrace)[T.3.0] -0.0351 0.011 -3.200 0.001 -0.057 -0.014
C(fbrace)[T.4.0] 0.0062 0.004 1.413 0.158 -0.002 0.015
fhisp -0.0176 0.003 -5.974 0.000 -0.023 -0.012
previs -0.0110 0.001 -16.456 0.000 -0.012 -0.010
no_previs -0.0468 0.012 -3.926 0.000 -0.070 -0.023
mager9 -0.0019 0.001 -1.970 0.049 -0.004 -9.69e-06
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3155440\n", "Model: Logit Df Residuals: 3155432\n", "Method: MLE Df Model: 7\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 8.440e-05\n", "Time: 14:31:14 Log-Likelihood: -2.1860e+06\n", "converged: True LL-Null: -2.1862e+06\n", " LLR p-value: 1.043e-75\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0656 0.005 14.344 0.000 0.057 0.075\n", "C(fbrace)[T.2.0] -0.0300 0.003 -9.123 0.000 -0.036 -0.024\n", "C(fbrace)[T.3.0] -0.0351 0.011 -3.200 0.001 -0.057 -0.014\n", "C(fbrace)[T.4.0] 0.0062 0.004 1.413 0.158 -0.002 0.015\n", "fhisp -0.0176 0.003 -5.974 0.000 -0.023 -0.012\n", "previs -0.0110 0.001 -16.456 0.000 -0.012 -0.010\n", "no_previs -0.0468 0.012 -3.926 0.000 -0.070 -0.023\n", "mager9 -0.0019 0.001 -1.970 0.049 -0.004 -9.69e-06\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + mager9')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So does the father's age. But both age effects are small and borderline significant." 
] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692775\n", " Iterations 3\n", "C(fbrace)[T.2.0] 106.9 103.7 *\n", "C(fbrace)[T.3.0] 106.9 103.2 *\n", "C(fbrace)[T.4.0] 106.9 107.6 \n", "fhisp 106.9 105.0 *\n", "previs 106.9 105.7 *\n", "no_previs 106.9 101.8 *\n", "fagerrec11 106.9 106.7 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 3148537
Model: Logit Df Residuals: 3148529
Method: MLE Df Model: 7
Date: Tue, 17 May 2016 Pseudo R-squ.: 8.517e-05
Time: 14:32:34 Log-Likelihood: -2.1812e+06
converged: True LL-Null: -2.1814e+06
LLR p-value: 2.924e-76
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0663 0.004 15.399 0.000 0.058 0.075
C(fbrace)[T.2.0] -0.0299 0.003 -9.100 0.000 -0.036 -0.023
C(fbrace)[T.3.0] -0.0348 0.011 -3.170 0.002 -0.056 -0.013
C(fbrace)[T.4.0] 0.0067 0.004 1.518 0.129 -0.002 0.015
fhisp -0.0176 0.003 -5.974 0.000 -0.023 -0.012
previs -0.0110 0.001 -16.545 0.000 -0.012 -0.010
no_previs -0.0483 0.012 -4.039 0.000 -0.072 -0.025
fagerrec11 -0.0019 0.001 -2.278 0.023 -0.003 -0.000
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 3148537\n", "Model: Logit Df Residuals: 3148529\n", "Method: MLE Df Model: 7\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 8.517e-05\n", "Time: 14:32:34 Log-Likelihood: -2.1812e+06\n", "converged: True LL-Null: -2.1814e+06\n", " LLR p-value: 2.924e-76\n", "====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------------\n", "Intercept 0.0663 0.004 15.399 0.000 0.058 0.075\n", "C(fbrace)[T.2.0] -0.0299 0.003 -9.100 0.000 -0.036 -0.023\n", "C(fbrace)[T.3.0] -0.0348 0.011 -3.170 0.002 -0.056 -0.013\n", "C(fbrace)[T.4.0] 0.0067 0.004 1.518 0.129 -0.002 0.015\n", "fhisp -0.0176 0.003 -5.974 0.000 -0.023 -0.012\n", "previs -0.0110 0.001 -16.545 0.000 -0.012 -0.010\n", "no_previs -0.0483 0.012 -4.039 0.000 -0.072 -0.025\n", "fagerrec11 -0.0019 0.001 -2.278 0.023 -0.003 -0.000\n", "====================================================================================\n", "\"\"\"" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(fbrace) + fhisp + previs + no_previs + fagerrec11')\n", "model = smf.logit(formula, data=df)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What's up with prenatal visits?\n", "\n", "The predictive power of prenatal visits is still surprising to me. 
To make sure we've controlled for race, I'll select cases where both parents are white:" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "2400787" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "white = df[(df.mbrace==1) & (df.fbrace==1)]\n", "len(white)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And compute sex ratios for each level of `previs`:" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
previs boy
-6 107
-5 110
-4 108
-3 110
-2 108
-1 107
0 105
1 103
2 103
3 102
4 103
\n", "
" ], "text/plain": [ " boy\n", "previs \n", "-6 107\n", "-5 110\n", "-4 108\n", "-3 110\n", "-2 108\n", "-1 107\n", " 0 105\n", " 1 103\n", " 2 103\n", " 3 102\n", " 4 103" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "var = 'previs'\n", "white[[var, 'boy']].groupby(var).aggregate(series_to_ratio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The effect holds up. People with fewer than average prenatal visits are substantially more likely to have boys." ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692749\n", " Iterations 3\n", "previs 105.5 104.3 *\n", "no_previs 105.5 100.4 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2346785
Model: Logit Df Residuals: 2346782
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 6.418e-05
Time: 14:40:39 Log-Likelihood: -1.6257e+06
converged: True LL-Null: -1.6258e+06
LLR p-value: 4.790e-46
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0534 0.001 40.728 0.000 0.051 0.056
previs -0.0113 0.001 -14.378 0.000 -0.013 -0.010
no_previs -0.0490 0.015 -3.352 0.001 -0.078 -0.020
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2346785\n", "Model: Logit Df Residuals: 2346782\n", "Method: MLE Df Model: 2\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 6.418e-05\n", "Time: 14:40:39 Log-Likelihood: -1.6257e+06\n", "converged: True LL-Null: -1.6258e+06\n", " LLR p-value: 4.790e-46\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0534 0.001 40.728 0.000 0.051 0.056\n", "previs -0.0113 0.001 -14.378 0.000 -0.013 -0.010\n", "no_previs -0.0490 0.015 -3.352 0.001 -0.078 -0.020\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ previs + no_previs')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(0.053449172473506806, -0.011302385985286368)" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inter = results.params['Intercept']\n", "slope = results.params['previs']\n", "inter, slope" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([ 111.62346508, 110.36895641, 109.12854687, 107.90207798,\n", " 106.68939307, 105.49033723, 104.30475728, 103.13250177,\n", " 101.97342096, 100.82736677])" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "previs = np.arange(-5, 5)\n", "logodds = inter + slope * previs\n", "odds = 
np.exp(logodds)\n", "# Express the odds of a boy as boys per 100 girls\n", "odds * 100" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692788\n", " Iterations 3\n", "dmar 105.3 105.5 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2400787
Model: Logit Df Residuals: 2400785
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 7.406e-08
Time: 15:27:21 Log-Likelihood: -1.6632e+06
converged: True LL-Null: -1.6632e+06
LLR p-value: 0.6196
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0518 0.004 13.234 0.000 0.044 0.059
dmar 0.0014 0.003 0.496 0.620 -0.004 0.007
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2400787\n", "Model: Logit Df Residuals: 2400785\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 7.406e-08\n", "Time: 15:27:21 Log-Likelihood: -1.6632e+06\n", "converged: True LL-Null: -1.6632e+06\n", " LLR p-value: 0.6196\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0518 0.004 13.234 0.000 0.044 0.059\n", "dmar 0.0014 0.003 0.496 0.620 -0.004 0.007\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ dmar')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692788\n", " Iterations 3\n", "lowed 105.6 105.0 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2301234
Model: Logit Df Residuals: 2301232
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.759e-07
Time: 15:28:01 Log-Likelihood: -1.5943e+06
converged: True LL-Null: -1.5943e+06
LLR p-value: 0.2180
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0542 0.001 38.603 0.000 0.051 0.057
lowed -0.0051 0.004 -1.232 0.218 -0.013 0.003
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2301234\n", "Model: Logit Df Residuals: 2301232\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 4.759e-07\n", "Time: 15:28:01 Log-Likelihood: -1.5943e+06\n", "converged: True LL-Null: -1.5943e+06\n", " LLR p-value: 0.2180\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0542 0.001 38.603 0.000 0.051 0.057\n", "lowed -0.0051 0.004 -1.232 0.218 -0.013 0.003\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ lowed')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692788\n", " Iterations 3\n", "highbo 105.5 105.6 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2391630
Model: Logit Df Residuals: 2391628
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 4.564e-09
Time: 15:28:25 Log-Likelihood: -1.6569e+06
converged: True LL-Null: -1.6569e+06
LLR p-value: 0.9021
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0535 0.001 40.493 0.000 0.051 0.056
highbo 0.0008 0.006 0.123 0.902 -0.012 0.013
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2391630\n", "Model: Logit Df Residuals: 2391628\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 4.564e-09\n", "Time: 15:28:25 Log-Likelihood: -1.6569e+06\n", "converged: True LL-Null: -1.6569e+06\n", " LLR p-value: 0.9021\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0535 0.001 40.493 0.000 0.051 0.056\n", "highbo 0.0008 0.006 0.123 0.902 -0.012 0.013\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ highbo')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692786\n", " Iterations 3\n", "wic[T.Y] 105.6 105.3 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2266424
Model: Logit Df Residuals: 2266422
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 3.840e-07
Time: 15:28:57 Log-Likelihood: -1.5701e+06
converged: True LL-Null: -1.5701e+06
LLR p-value: 0.2721
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0548 0.002 33.369 0.000 0.052 0.058
wic[T.Y] -0.0031 0.003 -1.098 0.272 -0.009 0.002
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2266424\n", "Model: Logit Df Residuals: 2266422\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 3.840e-07\n", "Time: 15:28:57 Log-Likelihood: -1.5701e+06\n", "converged: True LL-Null: -1.5701e+06\n", " LLR p-value: 0.2721\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0548 0.002 33.369 0.000 0.052 0.058\n", "wic[T.Y] -0.0031 0.003 -1.098 0.272 -0.009 0.002\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ wic')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692788\n", " Iterations 3\n", "obese 105.6 105.3 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2244349
Model: Logit Df Residuals: 2244347
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.725e-07
Time: 15:29:20 Log-Likelihood: -1.5549e+06
converged: True LL-Null: -1.5549e+06
LLR p-value: 0.4639
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0542 0.002 35.607 0.000 0.051 0.057
obese -0.0023 0.003 -0.732 0.464 -0.009 0.004
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2244349\n", "Model: Logit Df Residuals: 2244347\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.725e-07\n", "Time: 15:29:20 Log-Likelihood: -1.5549e+06\n", "converged: True LL-Null: -1.5549e+06\n", " LLR p-value: 0.4639\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0542 0.002 35.607 0.000 0.051 0.057\n", "obese -0.0023 0.003 -0.732 0.464 -0.009 0.004\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ obese')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 123, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692786\n", " Iterations 3\n", "C(pay_rec)[T.2.0] 105.4 105.5 \n", "C(pay_rec)[T.3.0] 105.4 107.1 *\n", "C(pay_rec)[T.4.0] 105.4 105.3 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2295681
Model: Logit Df Residuals: 2295677
Method: MLE Df Model: 3
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.666e-06
Time: 15:30:06 Log-Likelihood: -1.5904e+06
converged: True LL-Null: -1.5904e+06
LLR p-value: 0.1511
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0529 0.002 23.356 0.000 0.048 0.057
C(pay_rec)[T.2.0] 0.0004 0.003 0.147 0.883 -0.005 0.006
C(pay_rec)[T.3.0] 0.0159 0.007 2.235 0.025 0.002 0.030
C(pay_rec)[T.4.0] -0.0013 0.007 -0.197 0.844 -0.015 0.012
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2295681\n", "Model: Logit Df Residuals: 2295677\n", "Method: MLE Df Model: 3\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.666e-06\n", "Time: 15:30:06 Log-Likelihood: -1.5904e+06\n", "converged: True LL-Null: -1.5904e+06\n", " LLR p-value: 0.1511\n", "=====================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "-------------------------------------------------------------------------------------\n", "Intercept 0.0529 0.002 23.356 0.000 0.048 0.057\n", "C(pay_rec)[T.2.0] 0.0004 0.003 0.147 0.883 -0.005 0.006\n", "C(pay_rec)[T.3.0] 0.0159 0.007 2.235 0.025 0.002 0.030\n", "C(pay_rec)[T.4.0] -0.0013 0.007 -0.197 0.844 -0.015 0.012\n", "=====================================================================================\n", "\"\"\"" ] }, "execution_count": 123, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ C(pay_rec)')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 124, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692786\n", " Iterations 3\n", "mager9 107.0 106.7 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2400787
Model: Logit Df Residuals: 2400785
Method: MLE Df Model: 1
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.516e-06
Time: 15:30:32 Log-Likelihood: -1.6632e+06
converged: True LL-Null: -1.6632e+06
LLR p-value: 0.003813
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0677 0.005 13.452 0.000 0.058 0.078
mager9 -0.0032 0.001 -2.893 0.004 -0.005 -0.001
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2400787\n", "Model: Logit Df Residuals: 2400785\n", "Method: MLE Df Model: 1\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.516e-06\n", "Time: 15:30:32 Log-Likelihood: -1.6632e+06\n", "converged: True LL-Null: -1.6632e+06\n", " LLR p-value: 0.003813\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0677 0.005 13.452 0.000 0.058 0.078\n", "mager9 -0.0032 0.001 -2.893 0.004 -0.005 -0.001\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ mager9')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 125, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692787\n", " Iterations 3\n", "youngm[T.True] 105.6 105.5 \n", "oldm[T.True] 105.6 103.8 *\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2400787
Model: Logit Df Residuals: 2400784
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 1.549e-06
Time: 15:31:04 Log-Likelihood: -1.6632e+06
converged: True LL-Null: -1.6632e+06
LLR p-value: 0.07608
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0542 0.001 40.370 0.000 0.052 0.057
youngm[T.True] -0.0011 0.006 -0.170 0.865 -0.013 0.011
oldm[T.True] -0.0173 0.008 -2.268 0.023 -0.032 -0.002
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2400787\n", "Model: Logit Df Residuals: 2400784\n", "Method: MLE Df Model: 2\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 1.549e-06\n", "Time: 15:31:04 Log-Likelihood: -1.6632e+06\n", "converged: True LL-Null: -1.6632e+06\n", " LLR p-value: 0.07608\n", "==================================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "----------------------------------------------------------------------------------\n", "Intercept 0.0542 0.001 40.370 0.000 0.052 0.057\n", "youngm[T.True] -0.0011 0.006 -0.170 0.865 -0.013 0.011\n", "oldm[T.True] -0.0173 0.008 -2.268 0.023 -0.032 -0.002\n", "==================================================================================\n", "\"\"\"" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ youngm + oldm')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.692787\n", " Iterations 3\n", "youngf 105.5 106.4 \n", "oldf 105.5 105.7 \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
Logit Regression Results
Dep. Variable: boy No. Observations: 2396141
Model: Logit Df Residuals: 2396138
Method: MLE Df Model: 2
Date: Tue, 17 May 2016 Pseudo R-squ.: 2.717e-07
Time: 15:31:50 Log-Likelihood: -1.6600e+06
converged: True LL-Null: -1.6600e+06
LLR p-value: 0.6370
\n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
coef std err z P>|z| [95.0% Conf. Int.]
Intercept 0.0534 0.001 40.229 0.000 0.051 0.056
youngf 0.0082 0.009 0.924 0.355 -0.009 0.026
oldf 0.0018 0.008 0.242 0.809 -0.013 0.017
" ], "text/plain": [ "\n", "\"\"\"\n", " Logit Regression Results \n", "==============================================================================\n", "Dep. Variable: boy No. Observations: 2396141\n", "Model: Logit Df Residuals: 2396138\n", "Method: MLE Df Model: 2\n", "Date: Tue, 17 May 2016 Pseudo R-squ.: 2.717e-07\n", "Time: 15:31:50 Log-Likelihood: -1.6600e+06\n", "converged: True LL-Null: -1.6600e+06\n", " LLR p-value: 0.6370\n", "==============================================================================\n", " coef std err z P>|z| [95.0% Conf. Int.]\n", "------------------------------------------------------------------------------\n", "Intercept 0.0534 0.001 40.229 0.000 0.051 0.056\n", "youngf 0.0082 0.009 0.924 0.355 -0.009 0.026\n", "oldf 0.0018 0.008 0.242 0.809 -0.013 0.017\n", "==============================================================================\n", "\"\"\"" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "formula = ('boy ~ youngf + oldf')\n", "model = smf.logit(formula, data=white)\n", "results = model.fit()\n", "summarize(results)\n", "results.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }