{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Living Arrangements Examples from CPS\n", "\n", "September 1, 2020\n", "\n", "Attempt to replicate [this FEDS Note](https://www.federalreserve.gov/econres/notes/feds-notes/an-early-evaluation-of-the-effects-of-the-pandemic-on-living-arrangements-and-household-formation-20200807.htm).\n", "\n", "---- \n", "\n", "**Update**: September 17, 2020\n", "\n", "The gap between the published results and my local calculations seems to come from how headship is defined. In the CPS, the household ID is the person ID of the wife in husband-wife households and the reference person in all other households. In the published data, headship is instead defined by a process that assigns an average headship to each person in the CPS.\n", "\n", "I am currently trying to implement a new variable in the bd CPS, `HEAD`, which is the average headship rate for an individual age 16 or older, based on Paciorek (2013, 2016).\n", "\n", "Also, because the published results are seasonally adjusted and the raw data have a clear seasonal pattern, it would be helpful to give an overview of the seasonal factors. 
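
To make the difference between the two headship definitions concrete, here is a minimal sketch on made-up data. The column names follow the ones used in this notebook (`QSTNUM`, `PWSSWGT`, `HHWGT`, `HEAD`), but every value is invented for illustration, including the assumption that the household weight equals the weight of one particular member. It is not the Paciorek procedure itself, only a comparison of the two ratios.

```python
import numpy as np
import pandas as pd

# Made-up example: three households, five adults (all values hypothetical)
toy = pd.DataFrame({
    'QSTNUM':  [1, 2, 2, 3, 3],                    # household identifier
    'PWSSWGT': [100.0, 90.0, 110.0, 80.0, 120.0],  # person weights
    'HHWGT':   [100.0, 110.0, 110.0, 80.0, 80.0],  # household weight, repeated per member
    'HEAD':    [1.0, 0.5, 0.5, 0.5, 0.5],          # individual headship shares
})

# Aggregate approach: weighted households divided by weighted persons
aggregate = toy.groupby('QSTNUM').HHWGT.first().sum() / toy.PWSSWGT.sum()

# Average approach: person-weighted mean of the individual headship shares
average = np.average(toy.HEAD, weights=toy.PWSSWGT)

print(f'Aggregate: {aggregate * 100:.2f}')
print(f'Average:   {average * 100:.2f}')
```

In this toy example the aggregate rate is 58.00 and the average rate is 60.00. The two differ whenever a household weight is not equal to the headship-weighted sum of the person weights in that household.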
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-09-18T01:11:41.038438Z", "start_time": "2020-09-18T01:11:40.850556Z" } }, "outputs": [], "source": [ "import pandas as pd\n", "comp_data = pd.read_csv('fed_hh_example.csv')\n", "\n", "import os\n", "os.chdir('/home/brian/Documents/CPS/data/')\n", "os.environ['X13PATH'] = '/home/brian/Documents/econ_data/micro/x13as/'\n", "\n", "import re, struct\n", "import numpy as np\n", "from statsmodels.tsa.x13 import x13_arima_analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Aggregate approach" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "ExecuteTime": { "end_time": "2020-09-18T02:54:56.224716Z", "start_time": "2020-09-18T02:54:46.250243Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/brian/miniconda3/lib/python3.8/site-packages/statsmodels/tsa/x13.py:187: X13Warning: WARNING: At least one visually significant trading day peak has been\n", " found in one or more of the estimated spectra.\n", " warn(errors, X13Warning)\n" ] } ], "source": [ "cols = ['QSTNUM', 'AGE', 'YEAR', 'MONTH', 'HHWGT', 'PWSSWGT']\n", "\n", "df = pd.concat([pd.read_feather(f'clean/cps{year}.ft', columns=cols)\n", " .query('AGE > 15') \n", " for year in range(1996, 2021)])\n", "\n", "headship_rate = (lambda grp: grp.groupby('QSTNUM').HHWGT.first().sum()\n", " / grp.PWSSWGT.sum())\n", "\n", "data = (df.groupby(['YEAR', 'MONTH']).apply(headship_rate)).reset_index()\n", "data['DATE'] = pd.to_datetime(dict(year=data.YEAR, month=data.MONTH, day=1))\n", "data = data.set_index('DATE').drop(['YEAR', 'MONTH'], axis=1) * 100\n", "\n", "sm = x13_arima_analysis(data[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \"Average\" approach" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "ExecuteTime": { "end_time": "2020-09-18T02:47:04.218571Z", "start_time": "2020-09-18T02:46:59.822246Z" } }, "outputs": [ 
{ "name": "stderr", "output_type": "stream", "text": [ "/home/brian/miniconda3/lib/python3.8/site-packages/statsmodels/tsa/x13.py:187: X13Warning: WARNING: At least one visually significant seasonal peak has been found\n", " in the estimated spectrum of the regARIMA residuals.\n", " warn(errors, X13Warning)\n" ] } ], "source": [ "cols = ['QSTNUM', 'AGE', 'YEAR', 'MONTH', 'PWSSWGT', 'HEAD']\n", "\n", "df = pd.concat([pd.read_feather(f'clean/cps{year}.ft', columns=cols)\n", " .query('AGE > 15') \n", " for year in range(1996, 2021)])\n", "\n", "headship_rate = (lambda grp: np.average(grp.HEAD, weights=grp.PWSSWGT))\n", "data = (df.groupby(['YEAR', 'MONTH']).apply(headship_rate)).reset_index()\n", "data['DATE'] = pd.to_datetime(dict(year=data.YEAR, month=data.MONTH, day=1))\n", "data = data.set_index('DATE').drop(['YEAR', 'MONTH'], axis=1) * 100\n", "\n", "sm = x13_arima_analysis(data[0])" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "ExecuteTime": { "end_time": "2020-09-18T02:55:03.469956Z", "start_time": "2020-09-18T02:55:03.216779Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "comp_data['DATE'] = pd.to_datetime(comp_data['date'])\n", "comb = (comp_data.set_index('DATE')\n", " .join(sm.seasadj)\n", " .rename({'headship': 'Paciorek', 'seasadj': 'Dew'}, axis=1))\n", "comb[['Paciorek', 'Dew']].plot(title='Aggregate headship rate, percent');" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "ExecuteTime": { "end_time": "2020-09-18T02:55:08.857435Z", "start_time": "2020-09-18T02:55:08.711597Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "(comb['Dew'] - comb['Paciorek']).plot(title='Difference');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Average monthly pattern from x13as adjustment" ] }, { "cell_type": "code", "execution_count": 126, "metadata": { "ExecuteTime": { "end_time": "2020-09-18T03:05:23.739234Z", "start_time": "2020-09-18T03:05:23.614542Z" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "((sm.observed - sm.seasadj).reset_index()\n", " .assign(MONTH = lambda x: x.DATE.dt.month)\n", " .groupby('MONTH')[0].mean().plot());" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Looking at individual months to figure out difference\n", "\n", "### Update: Try taking average of individual \"headship rate\"\n", "\n", "Assign an average headship to each person in the CPS, attempting to replicate Paciorek (2013, 2016)." ] }, { "cell_type": "code", "execution_count": 127, "metadata": { "ExecuteTime": { "end_time": "2020-09-18T03:07:17.962314Z", "start_time": "2020-09-18T03:07:17.431135Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "May 2020\n", " Result (average): 49.42 \n", "\n", " Result (aggregate): 49.13 \n", "\n", " Goal: 49.12\n", "\n", " Households with incorrect headship sum: 0\n" ] } ], "source": [ "date = '2020-05-01'\n", "cpsdt = pd.to_datetime(date).strftime('%b%y').lower()\n", "textdt = pd.to_datetime(date).strftime('%B %Y')\n", "print(textdt)\n", "\n", "# manually list out the IDs for series of interest \n", "var_names = ['PWSSWGT', 'QSTNUM', 'PRTAGE', 'PULINENO', \n", " 'PESPOUSE', 'HURESPLI'] \n", "\n", "# read the record layout (data dictionary) as text\n", "dd = '2020_Basic_CPS_Public_Use_Record_Layout_plus_IO_Code_list.txt'\n", "with open(dd, 'r', encoding='iso-8859-1') as f:\n", " data_dict = f.read()\n", "\n", "# raw f-string so the regex escapes are passed to re unchanged\n", "p = rf'\\n({\"|\".join(var_names)})\\s+(\\d+)\\s+.*?\\t+.*?(\\d\\d*).*?(\\d\\d+)'\n", "\n", "# variable name -> [start position, end position, struct width code]\n", "d = {s[0]: [int(s[2])-1, int(s[3]), f'{s[1]}s']\n", " for s in re.findall(p, data_dict)}\n", "\n", "start, end, width = zip(*d.values())\n", "skip = ([f'{s - e}x' for s, e in zip(start, [0] + list(end[:-1]))])\n", "unpack_fmt = ''.join([j for i in zip(skip, width) for j in i])\n", "unpacker = struct.Struct(unpack_fmt).unpack_from \n", "\n", "# read the fixed-width monthly file and unpack the selected fields\n", "file = f'{cpsdt}pub.dat'\n", "with open(file, 'rb') as f:\n", " raw_data = f.readlines()\n", "data = [[*map(int, unpacker(row))] for row in raw_data]\n", "df = 
pd.DataFrame(data, columns=d.keys())\n", "\n", "hh16 = lambda x: np.where(x.PRTAGE > 15, 1, 0)\n", "\n", "hhsize = lambda x: x.groupby('QSTNUM').HH16.transform('sum') # count of age 16+\n", "\n", "head1 = (lambda x: (np.where((x.HURESPLI == x.PULINENO) & (x.HHSIZE == 1), 1, # alone (1)\n", " np.where((x.HURESPLI == x.PULINENO) & (x.PESPOUSE > 0), 0.5, # spouse1 (0.5)\n", " np.where((x.HURESPLI != -1) & (x.HURESPLI == x.PESPOUSE), 0.5, # spouse2 (0.5)\n", " np.nan)))))\n", "\n", "hhsum = lambda x: x.groupby('QSTNUM').HEAD1.transform('sum')\n", "\n", "head2 = (lambda x: (np.where(x.HEAD1.notnull(), np.nan, # already identified\n", " np.where(x.HH16 == 0, np.nan, # under 16\n", " np.where((x.HHSUM == 1) & (x.HEAD1.isnull()), 0, # living with family (0)\n", " 1/x.HHSIZE))))) # roommates (1/nr)\n", "\n", "head = (lambda x: (np.where(x.HEAD1.notnull(), x.HEAD1, \n", " np.where(x.HEAD2.notnull(), x.HEAD2, np.nan))))\n", "\n", "hhsum2 = lambda x: x.groupby('QSTNUM').HEAD.transform('sum')\n", "\n", "droplist = ['HH16', 'HHSIZE', 'HEAD1', 'HHSUM', 'HEAD2', 'HURESPLI']\n", "\n", "data = df.assign(HH16 = hh16, \n", " HHSIZE = hhsize, \n", " HEAD1 = head1, \n", " HHSUM = hhsum, \n", " HEAD2 = head2, \n", " HEAD = head,\n", " HHSUM2 = hhsum2\n", " ).drop(droplist, axis=1)\n", "\n", "d1 = data.query('PRTAGE > 15')\n", "result = np.average(d1.HEAD, weights=d1.PWSSWGT)\n", "print(f' Result (average): {result*100:.2f}',\n", " '\\n\\n','Result (aggregate): ', comb.loc[date, 'Dew'].round(2),\n", " '\\n\\n','Goal: ', comb.loc[date, 'Paciorek'])\n", "\n", "# count distinct households (not person rows) whose headship does not sum to 1\n", "incorrect = d1[~d1.HHSUM2.between(0.99, 1.01)].QSTNUM.nunique()\n", "print(f'\\n Households with incorrect headship sum: {incorrect}')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 79, "metadata": { "ExecuteTime": { "end_time": "2020-09-18T02:24:43.798969Z", "start_time": "2020-09-18T02:24:43.776072Z" } }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", " <thead>\n",
" <tr><th></th><th>PRTAGE</th><th>PESPOUSE</th><th>PULINENO</th><th>PWSSWGT</th><th>QSTNUM</th><th>HEAD</th></tr>\n",
" </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>72</td><td>-1</td><td>1</td><td>17479947</td><td>11906</td><td>1.0</td></tr>\n",
" <tr><th>1</th><td>80</td><td>2</td><td>1</td><td>17816015</td><td>11437</td><td>0.5</td></tr>\n",
" <tr><th>2</th><td>80</td><td>1</td><td>2</td><td>18182721</td><td>11437</td><td>0.5</td></tr>\n",
" <tr><th>4</th><td>50</td><td>-1</td><td>1</td><td>24249454</td><td>24833</td><td>0.5</td></tr>\n",
" <tr><th>5</th><td>85</td><td>-1</td><td>2</td><td>23134010</td><td>24833</td><td>0.5</td></tr>\n",
" <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
" <tr><th>123350</th><td>60</td><td>1</td><td>2</td><td>3273148</td><td>21248</td><td>0.5</td></tr>\n",
" <tr><th>123351</th><td>76</td><td>2</td><td>1</td><td>2613962</td><td>29576</td><td>0.5</td></tr>\n",
" <tr><th>123352</th><td>77</td><td>1</td><td>2</td><td>2532948</td><td>29576</td><td>0.5</td></tr>\n",
" <tr><th>123357</th><td>37</td><td>2</td><td>1</td><td>5620033</td><td>36237</td><td>0.5</td></tr>\n",
" <tr><th>123358</th><td>35</td><td>1</td><td>2</td><td>8175390</td><td>36237</td><td>0.5</td></tr>\n",
" </tbody>\n", "</table>\n",
"<p>76441 rows × 6 columns</p>\n",
"</div>
" ], "text/plain": [ " PRTAGE PESPOUSE PULINENO PWSSWGT QSTNUM HEAD\n", "0 72 -1 1 17479947 11906 1.0\n", "1 80 2 1 17816015 11437 0.5\n", "2 80 1 2 18182721 11437 0.5\n", "4 50 -1 1 24249454 24833 0.5\n", "5 85 -1 2 23134010 24833 0.5\n", "... ... ... ... ... ... ...\n", "123350 60 1 2 3273148 21248 0.5\n", "123351 76 2 1 2613962 29576 0.5\n", "123352 77 1 2 2532948 29576 0.5\n", "123357 37 2 1 5620033 36237 0.5\n", "123358 35 1 2 8175390 36237 0.5\n", "\n", "[76441 rows x 6 columns]" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2020-09-11T23:37:04.244044Z", "start_time": "2020-09-11T23:37:04.237842Z" } }, "source": [ "### Living with Family -- Does that match?" 
] }, { "cell_type": "code", "execution_count": 11, "metadata": { "ExecuteTime": { "end_time": "2020-09-15T05:18:07.636690Z", "start_time": "2020-09-15T05:18:07.627495Z" } }, "outputs": [ { "data": { "text/plain": [ "36.39200602822098" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(data.query('HWHHWTLN != PULINENO and PERRP in [48, 49, 50, 51, 52, 53]').PWSSWGT.sum() / \n", " data.PWSSWGT.sum()) * 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goal: 20.76" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2020-09-15T05:18:07.646736Z", "start_time": "2020-09-15T05:18:07.637634Z" } }, "outputs": [ { "data": { "text/plain": [ "18.20508384690031" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(data.query('PRFAMREL not in [0, 1, 2]').PWCMPWGT.sum() / \n", " data.PWCMPWGT.sum()) * 100" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-09-15T16:51:36.829131Z", "start_time": "2020-09-15T16:51:36.827183Z" } }, "outputs": [], "source": [ "# Working area\n", "\n", "#d1['HHEAD'] = d1.groupby('QSTNUM').HEAD.transform('sum')\n", "#d1[d1.HHEAD < 0.99]\n", "#d1.query('QSTNUM == 37931')\n", "\n", "\n", " # More detailed headship indicator for those 16+\n", " #spcheck = lambda x: 1 if x.HURESPLI.iloc[0] in x.SPOUSE.to_list() else 0\n", " #dfm['SPCHECK'] = dfm.QSTNUM.map(dfm.groupby('QSTNUM').apply(spcheck).to_dict())\n", " \n", " # identify husband-wife household\n", " #hhcat = lambda x: np.where((x.HRHTYPE.isin([1, 2])) & (x.SPCHECK == 1), 1, 0)\n", "\n", " # identify those age 16 or older\n", " 
#hh16 = lambda x: np.where(x.AGE > 15, 1, 0)\n", "\n", " # count of age 16+\n", " #hhsize = lambda x: x.groupby('QSTNUM').HH16.transform('sum') \n", "\n", " # from Paciorek (2013, 2016)\n", " #head = (lambda x: (np.where((x.HURESPLI == x.PULINENO) & (x.HHSIZE == 1), 1, \n", " # np.where((x.HURESPLI == x.PULINENO) & (x.SPOUSE > 0), 0.5, \n", " # np.where(x.HURESPLI == x.SPOUSE, 0.5, \n", " # np.where((x.HHCAT == 1) & (x.HH16 == 1), 0, \n", " # np.where(x.HH16 == 1, 1/x.HHSIZE, np.nan))))))) \n", "\n", " #dfm = (dfm.assign(HHCAT=hhcat, HH16=hh16, HHSIZE=hhsize, HEAD=head)\n", " # .drop(['HHCAT', 'HH16', 'HHSIZE', 'HURESPLI', 'HRHTYPE'], axis=1))\n", " \n", " \n", " \n", "#sp1 = df[df.PESPOUSE > 0].groupby('QSTNUM').PESPOUSE.max()\n", "#sp1.name = 'SPOUSE1'\n", "#sp2 = df[df.PESPOUSE > 0].groupby('QSTNUM').PESPOUSE.min()\n", "#sp2.name = 'SPOUSE2'\n", "#df = df.merge(sp1.reset_index()).merge(sp2.reset_index())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }