{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "977e01fb-9600-4466-ad78-7e215d1c0359",
   "metadata": {},
   "source": [
    "# Expectated vs Realized Income Growth in A Standard Life Cycle Model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d62a3a23-f587-4e72-a42b-f876074efd55",
   "metadata": {},
   "source": [
    "This notebook uses the income process in [Cocco, Gomes & Maenhout (2005)](https://academic.oup.com/rfs/article/18/2/491/1599892?login=true) to demonstrate that estimates of a regression of expected income changes on realized income changes are sensitive to the size of transitory shocks.\n",
    "\n",
    "We first load some tools from the [HARK toolkit](https://github.com/econ-ark/HARK)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d4d379f7-1993-47af-912d-f7aa547c3557",
   "metadata": {},
   "outputs": [],
   "source": [
    "import statsmodels.api as sm\n",
    "from linearmodels.panel.model import PanelOLS\n",
    "from HARK.distribution import calc_expectation\n",
    "from HARK.ConsumptionSaving.ConsIndShockModel import (\n",
    "    IndShockConsumerType,\n",
    "    init_lifecycle,\n",
    ")\n",
    "\n",
    "from HARK.Calibration.Income.IncomeTools import (\n",
    "    parse_income_spec,\n",
    "    parse_time_params,\n",
    "    CGM_income,\n",
    ")\n",
    "\n",
    "from HARK.datasets.life_tables.us_ssa.SSATools import parse_ssa_life_table\n",
    "import pandas as pd\n",
    "from copy import copy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "304179a2-d6ec-4ff5-8850-c33c17fdf3d6",
   "metadata": {},
   "source": [
    "We now create a population of agents with the income process of [Cocco, Gomes & Maenhout (2005)](https://academic.oup.com/rfs/article/18/2/491/1599892?login=true), which is implemented as a default calibration in the toolkit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5d2760b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "birth_age = 21\n",
    "death_age = 66\n",
    "adjust_infl_to = 1992\n",
    "income_calib = CGM_income\n",
    "education = \"HS\"\n",
    "\n",
    "# Income specification\n",
    "income_params = parse_income_spec(\n",
    "    age_min=birth_age,\n",
    "    age_max=death_age,\n",
    "    adjust_infl_to=adjust_infl_to,\n",
    "    **income_calib[education],\n",
    "    SabelhausSong=True,\n",
    ")\n",
    "\n",
    "# We need survival probabilities only up to death_age-1, because survival\n",
    "# probability at death_age is 1.\n",
    "liv_prb = parse_ssa_life_table(\n",
    "    female=True, cross_sec=True, year=2004, min_age=birth_age, max_age=death_age - 1\n",
    ")\n",
    "\n",
    "# Parameters related to the number of periods implied by the calibration\n",
    "time_params = parse_time_params(age_birth=birth_age, age_death=death_age)\n",
    "\n",
    "# Update all the new parameters\n",
    "params = copy(init_lifecycle)\n",
    "params.update(time_params)\n",
    "# params.update(dist_params)\n",
    "params.update(income_params)\n",
    "params.update(\n",
    "    {\n",
    "        \"LivPrb\": liv_prb,\n",
    "        \"pLvlInitStd\": 0.0,\n",
    "        \"PermGroFacAgg\": 1.0,\n",
    "        \"UnempPrb\": 0.0,\n",
    "        \"UnempPrbRet\": 0.0,\n",
    "        \"track_vars\": [\"pLvl\", \"t_age\", \"PermShk\", \"TranShk\"],\n",
    "        \"AgentCount\": 200,\n",
    "        \"T_sim\": 500,\n",
    "    }\n",
    ")\n",
    "\n",
    "Agent = IndShockConsumerType(**params)\n",
    "Agent.solve()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36ec1e8d-024f-4bfa-bdf6-7a6bbe76ba3f",
   "metadata": {},
   "source": [
    "We simulate a population of agents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f5e966f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the simulations\n",
    "Agent.initialize_sim()\n",
    "Agent.simulate();"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5aba367-bb4e-4cce-8d51-4c19c24afb4e",
   "metadata": {},
   "source": [
    "\n",
    "$\\newcommand{\\Ex}{\\mathbb{E}}$\n",
    "$\\newcommand{\\PermShk}{\\psi}$\n",
    "$\\newcommand{\\pLvl}{\\mathbf{p}}$\n",
    "$\\newcommand{\\pLvl}{P}$\n",
    "$\\newcommand{\\yLvl}{\\mathbf{y}}$\n",
    "$\\newcommand{\\yLvl}{Y}$\n",
    "$\\newcommand{\\PermGroFac}{\\Gamma}$\n",
    "$\\newcommand{\\UnempPrb}{\\wp}$\n",
    "$\\newcommand{\\TranShk}{\\theta}$\n",
    "\n",
    "We assume a standard income process with transitory and permanent shocks:  The consumer's Permanent noncapital income $\\pLvl$ grows by a predictable factor $\\PermGroFac$ and is subject to an unpredictable multiplicative shock $\\Ex_{t}[\\PermShk_{t+1}]=1$,\n",
    "\n",
    "\\begin{eqnarray}\n",
    "\\pLvl_{t+1} & = & \\pLvl_{t} \\PermGroFac_{t+1} \\PermShk_{t+1}, \\notag\n",
    "\\end{eqnarray}\n",
    "and, if the consumer is employed, actual income $Y$ is permanent income multiplied by a transitory shock $\\Ex_{t}[\\TranShk_{t+1}]=1$,\n",
    "\\begin{eqnarray}\n",
    "\\yLvl_{t+1} & = & \\pLvl_{t+1} \\TranShk_{t+1}, \\notag\n",
    "\\end{eqnarray}\n",
    "\n",
    "<!--- There is also a probability $\\UnempPrb$ that the consumer will be temporarily unemployed and experience income of $\\TranShk^{\\large u}  = 0$.  We construct $\\TranShk^{\\large e}$ so that its mean value is $1/(1-\\UnempPrb)$ because in that case the mean level of the transitory shock (accounting for both unemployed and employed states) is exactly\n",
    "\n",
    "\\begin{eqnarray}\n",
    "\\Ex_{t}[\\TranShk_{t+1}] & = & \\TranShk^{\\large{u}}  \\times \\UnempPrb + (1-\\UnempPrb) \\times \\Ex_{t}[\\TranShk^{\\large{e}}_{t+1}] \\notag\n",
    "\\\\ & = & 0 \\times \\UnempPrb + (1-\\UnempPrb) \\times 1/(1-\\UnempPrb)  \\notag\n",
    "\\\\ & = & 1. \\notag\n",
    "\\end{eqnarray}\n",
    "--->\n",
    "\n",
    "$\\Gamma_{t}$ captures the predictable life cycle profile of income growth (faster when young, slower when old).  See [our replication of CGM-2005](https://github.com/econ-ark/CGMPortfolio/blob/master/Code/Python/CGMPortfolio.ipynb) for a detailed account of how these objects map to CGM's notation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d028fadf-2423-4965-a828-3caee7d624bc",
   "metadata": {},
   "source": [
    "Now define $\\newcommand{\\yLog}{y}\\newcommand{\\pLog}{p}\\yLog = \\log \\yLvl,\\pLog=\\log \\pLvl$ and similarly for other variables.\n",
    "\n",
    "Using this notation, we construct all the necessary inputs to the regressors. The main input is the expected income growth of every agent at every time period, which is given by\n",
    "\\begin{equation}\n",
    "\\begin{split}\n",
    "\\Ex_t[\\yLvl_{t+1}/\\yLvl_{t}] &= \\mathbb{E}_t[\\left(\\frac{\\theta_{t+1}\\pLvl_{t} \\PermGroFac_{t+1} \\PermShk_{t+1}}{\\theta_{t}P_{t}}\\right)]\\\\\n",
    " &= \\left(\\frac{\\PermGroFac_{t+1}}{\\theta_{t}}\\right)\\\\\n",
    "\\Ex_t[\\yLog_{t+1} - \\yLog_{t}] & = \\log \\Gamma_{t+1}-\\log \\theta_t\n",
    "\\end{split}\n",
    "\\end{equation}\n"
   ]
  },
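  {
   "cell_type": "markdown",
   "id": "e3a1f0c2-level-exp-change",
   "metadata": {},
   "source": [
    "The next cell works with income changes in levels rather than logs. As a sketch under the same process, and assuming next-period shocks are independent of time-$t$ information, the expected level change is\n",
    "\n",
    "\\begin{eqnarray}\n",
    "\\mathbb{E}_t[Y_{t+1} - Y_{t}] & = & P_{t} \\Gamma_{t+1} \\mathbb{E}_t[\\psi_{t+1} \\theta_{t+1}] - P_{t} \\theta_{t} \\notag \\\\\n",
    " & = & P_{t} \\left( \\Gamma_{t+1} \\mathbb{E}_t[\\psi_{t+1} \\theta_{t+1}] - \\theta_{t} \\right). \\notag\n",
    "\\end{eqnarray}\n",
    "\n",
    "Here $\\mathbb{E}_t[\\psi_{t+1} \\theta_{t+1}]$ appears to correspond to `exp_prod`, which is computed numerically from the discretized joint shock distribution with `calc_expectation`, and the expression in parentheses matches the construction of `ExpIncChange`. With independent mean-one shocks this expectation should equal 1, up to discretization error."
   ]
  },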
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d0cb9753",
   "metadata": {},
   "outputs": [],
   "source": [
    "exp = [\n",
    "    calc_expectation(Agent.IncShkDstn[i], func=lambda x: x[0] * x[1])\n",
    "    for i in range(Agent.T_cycle)\n",
    "]\n",
    "exp_df = pd.DataFrame(\n",
    "    {\n",
    "        \"exp_prod\": exp,\n",
    "        \"PermGroFac\": Agent.PermGroFac,\n",
    "        \"Age\": [x + birth_age for x in range(Agent.T_cycle)],\n",
    "    }\n",
    ")\n",
    "\n",
    "raw_data = {\n",
    "    \"Age\": Agent.history[\"t_age\"].T.flatten() + birth_age - 1,\n",
    "    \"pLvl\": Agent.history[\"pLvl\"].T.flatten(),\n",
    "    \"PermShk\": Agent.history[\"PermShk\"].T.flatten(),\n",
    "    \"TranShk\": Agent.history[\"TranShk\"].T.flatten(),\n",
    "}\n",
    "\n",
    "Data = pd.DataFrame(raw_data)\n",
    "\n",
    "# Create an individual id\n",
    "Data[\"id\"] = (Data[\"Age\"].diff(1) < 0).cumsum()\n",
    "\n",
    "Data[\"Y\"] = Data.pLvl * Data.TranShk\n",
    "\n",
    "# Find Et[Yt+1 - Yt]\n",
    "Data = Data.join(exp_df.set_index(\"Age\"), on=\"Age\", how=\"left\")\n",
    "Data[\"ExpIncChange\"] = Data[\"pLvl\"] * (\n",
    "    Data[\"PermGroFac\"] * Data[\"exp_prod\"] - Data[\"TranShk\"]\n",
    ")\n",
    "\n",
    "Data[\"Y_change\"] = Data.groupby(\"id\")[\"Y\"].diff(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88a8c5bb-6f88-4d3f-8350-94d9e3a06bfd",
   "metadata": {},
   "source": [
    "A corresponding version of this relationship can be estimated in simulated data:\n",
    "\\begin{equation*}\n",
    "        \\Ex_t[\\Delta y_{i,t+1}] = \\gamma_{0} + \\gamma_{1} \\Delta y_{i,t} + f_i + \\epsilon_{i,t}\n",
    "\\end{equation*}\n",
    "We now estimate an analogous regression in our simulated population."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "2729d74b-8c59-40b3-8183-3c0cffc8b388",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                          PanelOLS Estimation Summary                           \n",
      "================================================================================\n",
      "Dep. Variable:                 ExpBin   R-squared:                        0.1955\n",
      "Estimator:                   PanelOLS   R-squared (Between):             -0.0730\n",
      "No. Observations:              100000   R-squared (Within):               0.1955\n",
      "Date:                Fri, Nov 24 2023   R-squared (Overall):              0.1889\n",
      "Time:                        15:15:16   Log-likelihood                -1.288e+05\n",
      "Cov. Estimator:            Unadjusted                                           \n",
      "                                        F-statistic:                   2.371e+04\n",
      "Entities:                        2422   P-value                           0.0000\n",
      "Avg Obs:                       41.288   Distribution:                 F(1,97577)\n",
      "Min Obs:                       2.0000                                           \n",
      "Max Obs:                       46.000   F-statistic (robust):          2.371e+04\n",
      "                                        P-value                           0.0000\n",
      "Time periods:                      45   Distribution:                 F(1,97577)\n",
      "Avg Obs:                       2222.2                                           \n",
      "Min Obs:                       1951.0                                           \n",
      "Max Obs:                       2425.0                                           \n",
      "                                                                                \n",
      "                             Parameter Estimates                              \n",
      "==============================================================================\n",
      "            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI\n",
      "------------------------------------------------------------------------------\n",
      "const          0.1368     0.0028     48.687     0.0000      0.1313      0.1423\n",
      "ChangeBin     -0.4404     0.0029    -153.98     0.0000     -0.4460     -0.4348\n",
      "==============================================================================\n",
      "\n",
      "F-test for Poolability: 1.2900\n",
      "P-value: 0.0000\n",
      "Distribution: F(2421,97577)\n",
      "\n",
      "Included effects: Entity\n"
     ]
    }
   ],
   "source": [
    "Data = Data.set_index([\"id\", \"Age\"])\n",
    "\n",
    "# Create the variables they actually use\n",
    "Data[\"ExpBin\"] = 0\n",
    "Data.loc[Data[\"ExpIncChange\"] > 0, \"ExpBin\"] = 1\n",
    "Data.loc[Data[\"ExpIncChange\"] < 0, \"ExpBin\"] = -1\n",
    "\n",
    "Data[\"ChangeBin\"] = 0\n",
    "Data.loc[Data[\"Y_change\"] > 0, \"ChangeBin\"] = 1\n",
    "Data.loc[Data[\"Y_change\"] < 0, \"ChangeBin\"] = -1\n",
    "\n",
    "mod = PanelOLS(Data.ExpBin, sm.add_constant(Data.ChangeBin), entity_effects=True)\n",
    "fe_res = mod.fit()\n",
    "print(fe_res)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7772169e-d8b3-498d-b0e1-2767d03f60b8",
   "metadata": {},
   "source": [
    "The estimated $\\hat{\\gamma}_{1}$ is negative because in usual life-cycle calibrations, transitory shocks are volatile enough that mean reversion of transitory fluctuations is a stronger force than persistent trends in income age-profiles.\n",
    "\n",
    "However, with less volatile transitory shocks, the regression coefficient would be positive. We demonstrate this by shutting off transitory shocks, simulating another population of agents, and re-running the regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "018deabf-56d2-4586-bf51-5973a041a3c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "params_no_transitory = copy(params)\n",
    "params_no_transitory.update({\"TranShkStd\": [0.0] * len(params[\"TranShkStd\"])})\n",
    "\n",
    "# Create agent\n",
    "Agent_nt = IndShockConsumerType(**params_no_transitory)\n",
    "Agent_nt.solve()\n",
    "# Run the simulations\n",
    "Agent_nt.initialize_sim()\n",
    "Agent_nt.simulate();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "534f4132-5125-4194-8971-f88da52cd614",
   "metadata": {},
   "outputs": [],
   "source": [
    "exp = [\n",
    "    calc_expectation(Agent_nt.IncShkDstn[i], func=lambda x: x[0] * x[1])\n",
    "    for i in range(Agent_nt.T_cycle)\n",
    "]\n",
    "exp_df = pd.DataFrame(\n",
    "    {\n",
    "        \"exp_prod\": exp,\n",
    "        \"PermGroFac\": Agent_nt.PermGroFac,\n",
    "        \"Age\": [x + birth_age for x in range(Agent.T_cycle)],\n",
    "    }\n",
    ")\n",
    "\n",
    "raw_data = {\n",
    "    \"Age\": Agent_nt.history[\"t_age\"].T.flatten() + birth_age - 1,\n",
    "    \"pLvl\": Agent_nt.history[\"pLvl\"].T.flatten(),\n",
    "    \"PermShk\": Agent_nt.history[\"PermShk\"].T.flatten(),\n",
    "    \"TranShk\": Agent_nt.history[\"TranShk\"].T.flatten(),\n",
    "}\n",
    "\n",
    "Data = pd.DataFrame(raw_data)\n",
    "\n",
    "# Create an individual id\n",
    "Data[\"id\"] = (Data[\"Age\"].diff(1) < 0).cumsum()\n",
    "\n",
    "Data[\"Y\"] = Data.pLvl * Data.TranShk\n",
    "\n",
    "# Find Et[Yt+1 - Yt]\n",
    "Data = Data.join(exp_df.set_index(\"Age\"), on=\"Age\", how=\"left\")\n",
    "Data[\"ExpIncChange\"] = Data[\"pLvl\"] * (\n",
    "    Data[\"PermGroFac\"] * Data[\"exp_prod\"] - Data[\"TranShk\"]\n",
    ")\n",
    "\n",
    "Data[\"Y_change\"] = Data.groupby(\"id\")[\"Y\"].diff(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "cff4ce49-b290-4594-9cb3-6f2c6f1f3eb6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                          PanelOLS Estimation Summary                           \n",
      "================================================================================\n",
      "Dep. Variable:                 ExpBin   R-squared:                        0.0086\n",
      "Estimator:                   PanelOLS   R-squared (Between):             -0.0259\n",
      "No. Observations:              100000   R-squared (Within):               0.0086\n",
      "Date:                Fri, Nov 24 2023   R-squared (Overall):              0.0088\n",
      "Time:                        15:15:18   Log-likelihood                -1.394e+05\n",
      "Cov. Estimator:            Unadjusted                                           \n",
      "                                        F-statistic:                      848.82\n",
      "Entities:                        2422   P-value                           0.0000\n",
      "Avg Obs:                       41.288   Distribution:                 F(1,97577)\n",
      "Min Obs:                       2.0000                                           \n",
      "Max Obs:                       46.000   F-statistic (robust):             848.82\n",
      "                                        P-value                           0.0000\n",
      "Time periods:                      45   Distribution:                 F(1,97577)\n",
      "Avg Obs:                       2222.2                                           \n",
      "Min Obs:                       1951.0                                           \n",
      "Max Obs:                       2425.0                                           \n",
      "                                                                                \n",
      "                             Parameter Estimates                              \n",
      "==============================================================================\n",
      "            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI\n",
      "------------------------------------------------------------------------------\n",
      "const          0.1129     0.0031     36.103     0.0000      0.1067      0.1190\n",
      "ChangeBin      0.0934     0.0032     29.135     0.0000      0.0871      0.0997\n",
      "==============================================================================\n",
      "\n",
      "F-test for Poolability: 1.1591\n",
      "P-value: 0.0000\n",
      "Distribution: F(2421,97577)\n",
      "\n",
      "Included effects: Entity\n"
     ]
    }
   ],
   "source": [
    "# Create variables\n",
    "Data[\"ExpBin\"] = 0\n",
    "Data.loc[Data[\"ExpIncChange\"] > 0, \"ExpBin\"] = 1\n",
    "Data.loc[Data[\"ExpIncChange\"] < 0, \"ExpBin\"] = -1\n",
    "\n",
    "Data[\"ChangeBin\"] = 0\n",
    "Data.loc[Data[\"Y_change\"] > 0, \"ChangeBin\"] = 1\n",
    "Data.loc[Data[\"Y_change\"] < 0, \"ChangeBin\"] = -1\n",
    "\n",
    "Data = Data.set_index([\"id\", \"Age\"])\n",
    "mod = PanelOLS(Data.ExpBin, sm.add_constant(Data.ChangeBin), entity_effects=True)\n",
    "fe_res = mod.fit()\n",
    "print(fe_res)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c828d415-db10-4274-ae35-5efd5c60c178",
   "metadata": {},
   "source": [
    "The estimated $\\hat{\\gamma}_{1}$ when there are no transitory shocks is positive."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "ExecuteTime,collapsed,jupyter,tags,title,-autoscroll",
   "formats": "ipynb,py:percent",
   "notebook_metadata_filter": "all,-widgets,-varInspector"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}