{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "653ea87d",
   "metadata": {},
   "source": [
    "## A Demonstration of the Harmenberg (2021) Aggregation Method\n",
    "\n",
    "   - [\"Aggregating heterogeneous-agent models with permanent income shocks\"](https://doi.org/10.1016/j.jedc.2021.104185)\n",
    "\n",
    "## Authors: [Christopher D. Carroll](http://www.econ2.jhu.edu/people/ccarroll/), [Mateo Velásquez-Giraldo](https://mv77.github.io/)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c03a1161",
   "metadata": {},
   "source": [
    "`# Set Up the Computational Environment: (in JupyterLab, click the dots)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a09f2e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Preliminaries\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from matplotlib import pyplot as plt\n",
    "from copy import deepcopy\n",
    "\n",
    "from HARK.distribution import calc_expectation\n",
    "from HARK.ConsumptionSaving.ConsIndShockModel import (\n",
    "    IndShockConsumerType,\n",
    "    init_idiosyncratic_shocks,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d940f86a",
   "metadata": {},
   "source": [
    "# Description of the problem\n",
    "\n",
    "$\\newcommand{\\pLvl}{\\mathbf{p}}$\n",
    "$\\newcommand{\\mLvl}{\\mathbf{m}}$\n",
    "$\\newcommand{\\mNrm}{m}$\n",
    "$\\newcommand{\\CLvl}{\\mathbf{C}}$\n",
    "$\\newcommand{\\MLvl}{\\mathbf{M}}$\n",
    "$\\newcommand{\\CLvlest}{\\widehat{\\CLvl}}$\n",
    "$\\newcommand{\\MLvlest}{\\widehat{\\MLvl}}$\n",
    "$\\newcommand{\\mpLvlDstn}{\\mu}$\n",
    "$\\newcommand{\\mWgtDstnMarg}{\\tilde{\\mu}^{m}}$\n",
    "$\\newcommand{\\PermGroFac}{\\pmb{\\Phi}}$\n",
    "$\\newcommand{\\PermShk}{\\pmb{\\Psi}}$\n",
    "$\\newcommand{\\def}{:=}$\n",
    "$\\newcommand{\\kernel}{\\Lambda}$\n",
    "$\\newcommand{\\pShkNeutDstn}{\\tilde{f}_{\\PermShk}}$\n",
    "$\\newcommand{\\Ex}{\\mathbb{E}}$\n",
    "$\\newcommand{\\cFunc}{\\mathrm{c}}$\n",
    "$\\newcommand{\\Rfree}{\\mathsf{R}}$\n",
    "\n",
    "Macroeconomic models with heterogeneous agents sometimes incorporate a microeconomic income process with a permanent component ($\\pLvl_t$) that follows a geometric random walk. To find an aggregate characteristic of these economies such as aggregate consumption $\\CLvl_t$, one must integrate over permanent income (and all the other relevant state variables):\n",
    "\\begin{equation*}\n",
    "\\CLvl_t = \\int_{\\pLvl} \\int_{\\mLvl} \\mathrm{c}(\\mLvl,\\pLvl) \\times f_t(\\mLvl,\\pLvl) \\, d \\mLvl\\, d\\pLvl,\n",
    "\\end{equation*}\n",
    "where $\\mLvl$ denotes any other state variables that consumption might depend on, $\\cFunc(\\cdot,\\cdot)$ is the individual consumption function, and $f_t(\\cdot,\\cdot)$ is the joint density function of permanent income and the other state variables at time $t$.\n",
    "\n",
    "Under the usual assumption of Constant Relative Risk Aversion utility and standard assumptions about the budget constraint, [such models are homothetic](https://econ-ark.github.io/BufferStockTheory/BufferStockTheory3.html#The-Problem-Can-Be-Normalized-By-Permanent-Income). This means that for a state variable $\\mLvl$ one can solve for a normalized policy function $\\cFunc(\\cdot)$ such that\n",
    "\\begin{equation*}\n",
    "    \\mathrm{c}(\\mLvl,\\pLvl) = \\mathrm{c}\\left(\\mLvl/\\pLvl\\right)\\times \\pLvl\n",
    "\\end{equation*}\n",
    "\n",
    "\n",
    "In practice, this implies that one can define a normalized state vector $\\mNrm = \\mLvl/\\pLvl$ and solve for the normalized policy function. This eliminates one dimension of the optimization problem, $\\pLvl$.\n",
    "\n",
    "While convenient for solving the agents' optimization problem, homotheticity has not simplified our aggregation calculations, as we still have\n",
    "\n",
    "\\begin{equation*}\n",
    "\\begin{split}\n",
    "\\CLvl_t =& \\int \\int \\cFunc(\\mLvl,\\pLvl) \\times f_t(\\mLvl,\\pLvl) \\, d\\mLvl\\, d\\pLvl\\\\\n",
    "=& \\int \\int \\cFunc\\left(\\frac{1}{\\pLvl}\\times \\mLvl\\right)\\times \\pLvl \\times f_t(\\mLvl,\\pLvl) \\, d\\mLvl\\, d\\pLvl,\n",
    "\\end{split}\n",
    "\\end{equation*}\n",
    "\n",
    "which depends on $\\pLvl$.\n",
    "\n",
    "To further complicate matters, we usually do not have analytical expressions for $\\cFunc(\\cdot)$ or $f_t(\\mLvl,\\pLvl)$. What we often do in practice is to simulate a population $I$ of agents for a large number of periods $T$ using the model's policy functions and transition equations. The result is a set of observations $\\{\\mLvl_{i,t},\\pLvl_{i,t}\\}_{i\\in I, 0\\leq t\\leq T}$ which we then use to approximate\n",
    "\\begin{equation*}\n",
    "\\CLvl_t \\approx \\frac{1}{|I|}\\sum_{i \\in I} \\cFunc\\left(\\mLvl_{i,t}/\\pLvl_{i,t}\\right)\\times \\pLvl_{i,t}.\n",
    "\\end{equation*}\n",
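    "\n",
    "The sample average above can be sketched in a few lines of NumPy. This is a minimal illustration rather than HARK code: `c_func` is a hypothetical concave rule standing in for the model's normalized consumption function, and the simulated levels are made-up lognormal draws.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def aggregate_consumption_base(c_func, mLvl, pLvl):\n",
    "    # Normalize resources, apply the normalized policy, then scale back up by pLvl\n",
    "    mNrm = mLvl / pLvl\n",
    "    return np.mean(c_func(mNrm) * pLvl)\n",
    "\n",
    "\n",
    "def c_func(m):\n",
    "    # Hypothetical normalized consumption function (not a model solution)\n",
    "    return np.minimum(m, 1.0 + 0.2 * (m - 1.0))\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "pLvl = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)\n",
    "mLvl = pLvl * rng.lognormal(mean=0.5, sigma=0.3, size=10_000)\n",
    "C_hat = aggregate_consumption_base(c_func, mLvl, pLvl)\n",
    "```\n",
    "\n",
    "Every term in the sum is multiplied by `pLvl`, which is why agents with large permanent incomes dominate the estimate.\n",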
    "\n",
    "At least two features of the previous strategy are unpleasant:\n",
    "- We have to simulate the distribution of permanent income, even though the model's solution does not depend on it.\n",
    "- As a geometric random walk, permanent income might have an unbounded distribution. Since $\\pLvl_{i,t}$ appears multiplicatively in our approximation, agents with high permanent incomes will be the most important in determining levels of aggregate variables. Therefore, it is important for our simulated population to achieve a good approximation of the distribution of permanent income among the small number of agents with very high permanent income, which will require us to use many agents (large $I$, requiring considerable computational resources).\n",
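    "\n",
    "To see the second point concretely, the sketch below simulates permanent income as a mean-one geometric random walk (lognormal shocks with an assumed log standard deviation of 0.1 per period, no growth, no mortality) and reports the share of total permanent income held by the top 1% of agents. These parameter values are illustrative assumptions, not the notebook's calibration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def top_share(x, q=0.01):\n",
    "    # Share of the total held by the top fraction q of observations\n",
    "    x_sorted = np.sort(x)[::-1]\n",
    "    k = max(1, int(q * x.size))\n",
    "    return x_sorted[:k].sum() / x_sorted.sum()\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "n_agents, n_periods, sigma = 10_000, 200, 0.1\n",
    "# Log shocks with mean -sigma**2 / 2, so that E[Psi] = 1\n",
    "log_psi = rng.normal(-0.5 * sigma**2, sigma, size=(n_periods, n_agents))\n",
    "pLvl_paths = np.exp(np.cumsum(log_psi, axis=0))\n",
    "\n",
    "share_early = top_share(pLvl_paths[0])\n",
    "share_late = top_share(pLvl_paths[-1])\n",
    "print(f'Top 1% share after 1 period: {share_early:.3f}; after {n_periods} periods: {share_late:.3f}')\n",
    "```\n",
    "\n",
    "As the horizon lengthens the dispersion of log permanent income grows, so an ever smaller group of agents carries most of the weight in the aggregates.\n",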
    "\n",
    "[Harmenberg (2021)](https://www.sciencedirect.com/science/article/pii/S0165188921001202?via%3Dihub) solves both problems. His solution constructs a distribution $\\tilde{f}(\\cdot)$ of the normalized state vector that he calls **the permanent-income-weighted distribution** and which has the convenient property that\n",
    "\\begin{equation*}\n",
    "\\begin{split}\n",
    "\\CLvl_t =& \\int \\int \\cFunc\\left(\\frac{1}{\\pLvl}\\times \\mLvl\\right)\\times \\pLvl \\times f_t(\\mLvl,\\pLvl) \\, d\\mLvl\\, d\\pLvl\\\\\n",
    "=& \\int \\cFunc\\left(\\mNrm\\right) \\times \\tilde{f}(\\mNrm) \\, d\\mNrm.\n",
    "\\end{split}\n",
    "\\end{equation*}\n",
    "\n",
    "Therefore, his solution allows us to calculate aggregate variables without the need to keep track of the distribution of permanent income. Additionally, the method eliminates the outsized influence that a small number of agents in the tail have on our approximation, which makes it much more precise.\n",
    "\n",
    "This notebook briefly describes Harmenberg's method and demonstrates its implementation in the HARK toolkit."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41ec855e",
   "metadata": {},
   "source": [
    "# Description of the method\n",
    "\n",
    "To illustrate Harmenberg's idea, consider a [buffer stock saving](https://econ-ark.github.io/BufferStockTheory) model in which:\n",
    "- The individual agent's problem has two state variables:\n",
    "    - Market resources $\\mLvl_{i,t}$.\n",
    "    - Permanent income $\\pLvl_{i,t}$.\n",
    "\n",
    "- The agent's problem is homothetic in permanent income, so that we can define $m_t = \\mLvl_t/\\pLvl_t$ and find a normalized policy function $\\cFunc(\\cdot)$ such that\n",
    "\\begin{equation*}\n",
    "\\cFunc(\\mNrm_t) \\times \\pLvl_t = \\cFunc(\\mLvl_t, \\pLvl_t) \\,\\,\\qquad \\forall(\\mLvl_t, \\pLvl_t)\n",
    "\\end{equation*}\n",
    "where $\\cFunc(\\cdot,\\cdot)$ is the optimal consumption function.\n",
    "\n",
    "- $\\pLvl_t$ evolves according to $$\\pLvl_{t+1} = \\PermGroFac \\PermShk_{t+1} \\pLvl_t,$$ where $\\PermShk_{t+1}$ is a shock with density function $f_{\\PermShk}(\\cdot)$ satisfying $\\Ex_t[\\PermShk_{t+1}] = 1$.\n",
    "\n",
    "To compute aggregate consumption $\\CLvl_t$ in this model, we would follow the approach from above:\n",
    "\\begin{equation*}\n",
    "\\CLvl_t = \\int \\int \\cFunc(\\mNrm)\\times\\pLvl \\times \\mpLvlDstn_t(\\mNrm,\\pLvl) \\, d\\mNrm \\, d\\pLvl,\n",
    "\\end{equation*}\n",
    "where $\\mpLvlDstn_t(\\mNrm,\\pLvl)$ is the measure of agents with normalized resources $\\mNrm$ and permanent income $\\pLvl$.\n",
    "\n",
    "## First insight\n",
    "\n",
    "The first of Harmenberg's insights is that the previous integral can be rearranged as\n",
    "\\begin{equation*}\n",
    "\\CLvl_t = \\int_{\\mNrm} \\cFunc(\\mNrm)\\left(\\int \\pLvl \\times \\mpLvlDstn_t(\\mNrm,\\pLvl) \\, d\\pLvl\\right) \\, d\\mNrm.\n",
    "\\end{equation*}\n",
    "The inner integral, $\\int_{\\pLvl} \\pLvl \\times \\mpLvlDstn_t(\\mNrm,\\pLvl) \\, d\\pLvl$, is a function of $\\mNrm$ and it measures *the total amount of permanent income accruing to agents with normalized market resources of* $\\mNrm$. De-trending this object by the deterministic component of growth in permanent income $\\PermGroFac$, Harmenberg defines the *permanent-income-weighted distribution* $\\mWgtDstnMarg(\\cdot)$ as\n",
    "\n",
    "\\begin{equation*}\n",
    "\\mWgtDstnMarg_{t}(\\mNrm) \\def \\PermGroFac^{-t}\\int_{\\pLvl} \\pLvl \\times \\mpLvlDstn_t(\\mNrm,\\pLvl) \\, d\\pLvl.\n",
    "\\end{equation*}\n",
    "\n",
    "\n",
    "The definition allows us to rewrite\n",
    "\\begin{equation}\\label{eq:aggC}\n",
    "\\CLvl_{t} = \\PermGroFac^t \\int_{m} \\cFunc(\\mNrm) \\times \\mWgtDstnMarg_t(\\mNrm) \\, dm.\n",
    "\\end{equation}\n",
    "\n",
    "There are no computational advances yet: we have merely hidden the joint distribution of $(\\mNrm,\\pLvl)$ inside the $\\mWgtDstnMarg$ object we have defined. This helps us notice that $\\mWgtDstnMarg$ is the only object besides the solution that we need in order to compute aggregate consumption. But we still have no practical way of computing or approximating $\\mWgtDstnMarg$.\n",
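    "\n",
    "On a discrete grid, this rewriting amounts to reordering a double sum. A minimal sketch, in which the grids, the joint probability table `mu`, and `c_func` are all hypothetical and the detrending factor is set to one (period zero): it builds the permanent-income-weighted marginal from the joint table and checks that it reproduces the direct aggregate. Note that it still needs the joint distribution as an input.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "m_grid = np.linspace(0.5, 5.0, 40)  # hypothetical grid for normalized resources\n",
    "p_grid = np.exp(np.linspace(-2.0, 2.0, 30))  # hypothetical grid for permanent income\n",
    "mu = rng.random((m_grid.size, p_grid.size))\n",
    "mu /= mu.sum()  # joint probabilities mu[i, j] of (m_grid[i], p_grid[j])\n",
    "\n",
    "\n",
    "def c_func(m):\n",
    "    # Hypothetical normalized consumption function (not a model solution)\n",
    "    return np.minimum(m, 1.0 + 0.2 * (m - 1.0))\n",
    "\n",
    "\n",
    "# Direct aggregation over the joint distribution: sum of c(m) * p * mu(m, p)\n",
    "C_direct = np.sum(c_func(m_grid)[:, None] * p_grid[None, :] * mu)\n",
    "\n",
    "# Permanent-income-weighted marginal: total permanent income at each m\n",
    "mu_tilde = mu @ p_grid\n",
    "C_weighted = np.sum(c_func(m_grid) * mu_tilde)\n",
    "```\n",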
    "\n",
    "## Second insight\n",
    "\n",
    "Harmenberg's second insight produces a simple way of generating simulated counterparts of $\\mWgtDstnMarg$ without having to simulate permanent incomes.\n",
    "\n",
    "We start with the density function of $\\mNrm_{t+1}$ given $\\mNrm_t$ and $\\PermShk_{t+1}$, $\\kernel(\\mNrm_{t+1}|\\mNrm_t,\\PermShk_{t+1})$. This density will depend on the model's transition equations and draws of random variables like transitory shocks to income in $t+1$ or random returns to savings between $t$ and $t+1$. If we can simulate those things, then we can sample from $\\kernel(\\cdot|\\mNrm_t,\\PermShk_{t+1})$.\n",
    "\n",
    "Harmenberg shows that\n",
    "\\begin{equation*}\\label{eq:transition}\n",
    "\\texttt{transition:    }\\mWgtDstnMarg_{t+1}(\\mNrm_{t+1}) = \\int \\kernel(\\mNrm_{t+1}|\\mNrm_t, \\PermShk_{t+1}) \\pShkNeutDstn(\\PermShk_{t+1}) \\mWgtDstnMarg_t(\\mNrm_t)\\, d\\mNrm_t\\, d\\PermShk_{t+1},\n",
    "\\end{equation*}\n",
    "where $\\pShkNeutDstn$ is an altered density function for the permanent income shocks $\\PermShk$, which he calls the *permanent-income-neutral* measure, and which relates to the original density $f_{\\PermShk}$ through $$\\pShkNeutDstn(\\PermShk_{t+1})\\def \\PermShk_{t+1}f_{\\PermShk}(\\PermShk_{t+1})\\,\\,\\, \\forall \\PermShk_{t+1}.$$\n",
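    "\n",
    "For a discrete approximation of the shock, this change of measure multiplies each probability by its atom. The sketch below assumes a seven-point Gauss-Hermite discretization of a mean-one lognormal shock with an assumed log standard deviation of 0.1. The atoms are rescaled so their mean is exactly one, which is what makes the neutral probabilities sum to one. It also checks that an expectation under the neutral measure equals `E[Psi * g(Psi)]` under the original one.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from numpy.polynomial.hermite_e import hermegauss\n",
    "\n",
    "sigma, n_points = 0.1, 7\n",
    "# Nodes and weights for a standard normal (weight function exp(-z**2 / 2))\n",
    "z, w = hermegauss(n_points)\n",
    "pmv = w / w.sum()\n",
    "atoms = np.exp(sigma * z)\n",
    "atoms /= np.dot(pmv, atoms)  # rescale so that E[Psi] = 1 exactly\n",
    "\n",
    "# Permanent-income-neutral probabilities: f_tilde(Psi) = Psi * f(Psi)\n",
    "pmv_ntrl = atoms * pmv\n",
    "\n",
    "\n",
    "def g(psi):\n",
    "    # Any function of the shock; here its logarithm\n",
    "    return np.log(psi)\n",
    "\n",
    "\n",
    "E_ntrl = np.dot(pmv_ntrl, g(atoms))\n",
    "E_weighted = np.dot(pmv, atoms * g(atoms))\n",
    "```\n",
    "\n",
    "Large shocks gain probability and small shocks lose it, since each agent's draw is weighted by how much it raises that agent's permanent income.\n",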
    "\n",
    "What's remarkable about this equation is that it gives us a way to obtain a distribution $\\mWgtDstnMarg_{t+1}$ from $\\mWgtDstnMarg_t$:\n",
    "- Start with a population whose $\\mNrm$ is distributed according to $\\mWgtDstnMarg_t$.\n",
    "- Give that population permanent income shocks with distribution $\\pShkNeutDstn$.\n",
    "- Apply the transition equations and other shocks of the model to obtain $\\mNrm_{t+1}$ from $\\mNrm_{t}$ and $\\PermShk_{t+1}$ for every agent.\n",
    "- The distribution of $\\mNrm$ across the resulting population will be $\\mWgtDstnMarg_{t+1}$.\n",
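    "\n",
    "These steps can be checked in a toy setting. The sketch below is not the HARK model: it assumes a hypothetical linear consumption rule `c(m) = kappa * m`, two-point permanent and transitory shocks, and made-up values for `R`, `Phi`, and `kappa`. It simulates once with the original shock probabilities while tracking permanent income, and once with the neutral probabilities. The permanent-income-weighted mean of normalized resources from the first run, detrended by `Phi**T`, should be close to the plain mean from the second run.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical parameters and shocks (not HARK defaults)\n",
    "R, Phi, kappa = 1.03, 1.01, 0.3\n",
    "psi_atoms = np.array([0.9, 1.1])  # permanent shocks with mean one\n",
    "psi_pmv = np.array([0.5, 0.5])\n",
    "theta_atoms = np.array([0.8, 1.2])  # transitory shocks\n",
    "theta_pmv = np.array([0.5, 0.5])\n",
    "psi_pmv_ntrl = psi_atoms * psi_pmv  # neutral measure: weight by the atom\n",
    "\n",
    "\n",
    "def simulate(n_agents, n_periods, psi_probs, seed):\n",
    "    # Everyone starts with m = 1 and p = 1\n",
    "    rng = np.random.default_rng(seed)\n",
    "    m = np.ones(n_agents)\n",
    "    p = np.ones(n_agents)\n",
    "    for _ in range(n_periods):\n",
    "        psi = rng.choice(psi_atoms, size=n_agents, p=psi_probs)\n",
    "        theta = rng.choice(theta_atoms, size=n_agents, p=theta_pmv)\n",
    "        a = m - kappa * m  # end-of-period assets under the assumed rule\n",
    "        m = R * a / (Phi * psi) + theta\n",
    "        p = Phi * psi * p  # only meaningful under the original measure\n",
    "    return m, p\n",
    "\n",
    "\n",
    "T, N = 50, 100_000\n",
    "m_base, p_base = simulate(N, T, psi_pmv, seed=0)\n",
    "m_ntrl, _ = simulate(N, T, psi_pmv_ntrl, seed=1)\n",
    "\n",
    "# Base: weight by simulated permanent income and detrend by Phi**T\n",
    "M_base = np.mean(m_base * p_base) / Phi**T\n",
    "# Neutral: plain average of normalized resources, no permanent income needed\n",
    "M_ntrl = np.mean(m_ntrl)\n",
    "print(M_base, M_ntrl)\n",
    "```\n",
    "\n",
    "The two numbers differ only by sampling noise; the neutral run reaches them without ever using `p`.\n",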
    "\n",
    "Notice that the only change in these steps from how we would usually simulate the model is that we now draw permanent income shocks from $\\pShkNeutDstn$ instead of $f_{\\PermShk}$. Therefore, with this procedure we can approximate $\\mWgtDstnMarg_t$ and compute aggregates using the formula $\\CLvl_{t} = \\PermGroFac^t \\int \\cFunc(\\mNrm) \\times \\mWgtDstnMarg_t(\\mNrm) \\, d\\mNrm$ derived above, all without tracking permanent income and with few changes to the code we use to simulate the model."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fb35e3a",
   "metadata": {},
   "source": [
    "# Harmenberg's method in HARK\n",
    "\n",
    "Harmenberg's method for simulating under the permanent-income-neutral measure is available in [HARK's `IndShockConsumerType` class](https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsIndShockModel.py) and the (many) models that inherit its income process, such as [`PortfolioConsumerType`](https://github.com/econ-ark/HARK/blob/master/HARK/ConsumptionSaving/ConsPortfolioModel.py).\n",
    "\n",
    "As the cell below illustrates, using Harmenberg's method in [HARK](https://github.com/econ-ark/HARK) simply requires setting an agent's property `agent.neutral_measure = True` and then computing the discrete approximation to the income process. After these steps, `agent.simulate` will simulate the model using Harmenberg's permanent-income-neutral measure."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c2b8ea9",
   "metadata": {},
   "source": [
    "`# Implementation in HARK:`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a4445de-0e32-4cb0-b2ec-5c9b617e8d31",
   "metadata": {},
   "source": [
    "#### Farther down in the notebook, code like this solves the standard model:\n",
    "\n",
    "```python\n",
    "# Create a population with the default parametrization\n",
    "\n",
    "popn = IndShockConsumerType(**init_idiosyncratic_shocks)\n",
    "\n",
    "# Specify which variables to track in the simulation\n",
    "popn.track_vars=[\n",
    "    'mNrm',  # mLvl normalized by permanent income (mLvl = market resources)\n",
    "    'cNrm',  # cLvl normalized by permanent income (cLvl = consumption)\n",
    "    'pLvl']  # pLvl: permanent income\n",
    "\n",
    "popn.cycles = 0  # No life cycles -- an infinite horizon\n",
    "\n",
    "# Solve for the consumption function\n",
    "popn.solve()\n",
    "\n",
    "# Simulate under the base measure\n",
    "popn.initialize_sim()\n",
    "popn.simulate()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "272bef2d-7f8e-48f2-b6d2-4617e665a721",
   "metadata": {},
   "source": [
    "#### Later, code like this simulates using the permanent-income-neutral measure\n",
    "```python\n",
    "# Harmenberg permanent-income-neutral simulation\n",
    "\n",
    "# Make a copy of the solved base population\n",
    "ntrl = deepcopy(popn)\n",
    "\n",
    "# Change the income process to use the neutral measure\n",
    "\n",
    "ntrl.neutral_measure = True\n",
    "ntrl.update_income_process()\n",
    "\n",
    "# Simulate\n",
    "ntrl.initialize_sim()\n",
    "ntrl.simulate()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4ecbd12",
   "metadata": {},
   "source": [
    "All we had to do differently to simulate using the permanent-income-neutral measure was to set the agent's attribute `neutral_measure=True` and rebuild its income process with `update_income_process()`.\n",
    "\n",
    "The change of measure takes effect when the method `update_income_process` re-constructs the agent's income process. The specific lines that achieve it in HARK are [here](https://github.com/econ-ark/HARK/blob/760df611a6ec2ff147d00b7d866dbab6fc4e18a1/HARK/ConsumptionSaving/ConsIndShockModel.py#L2734-L2735), and are reproduced below:\n",
    "\n",
    "```python\n",
    "if self.neutral_measure == True:\n",
    "    PermShkDstn_t.pmv = PermShkDstn_t.atoms*PermShkDstn_t.pmv\n",
    "```\n",
    "\n",
    "Simple!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a4de660",
   "metadata": {},
   "source": [
    "# The efficiency gain from using Harmenberg's method\n",
    "\n",
    "To demonstrate the gain in efficiency from using Harmenberg's method, we will set up the following experiment.\n",
    "\n",
    "Consider an economy populated by [Buffer-Stock](https://econ-ark.github.io/BufferStockTheory/) savers, whose individual-level state variables are market resources $\\mLvl_t$ and permanent income $\\pLvl_t$. Such agents have a [homothetic consumption function](https://econ-ark.github.io/BufferStockTheory/#The-Problem-Can-Be-Normalized-By-Permanent-Income), so that we can define normalized market resources $\\mNrm_t \\def \\mLvl_t / \\pLvl_t$, solve for a normalized consumption function $\\cFunc(\\cdot)$, and express the consumption function as $\\cFunc(\\mLvl,\\pLvl) = \\cFunc(\\mNrm)\\times\\pLvl$.\n",
    "\n",
    "Assume further that mortality, impatience, and permanent income growth are such that the economy converges to a stable joint distribution of $\\mNrm$ and $\\pLvl$ characterized by the density function $f(\\cdot,\\cdot)$. Under these conditions, define the stable level of aggregate market resources and consumption as\n",
    "\\begin{equation}\n",
    "    \\MLvl \\def \\int \\int \\mNrm \\times \\pLvl \\times f(\\mNrm, \\pLvl)\\,d\\mNrm \\,d\\pLvl, \\,\\,\\,    \\CLvl \\def \\int \\int \\cFunc(\\mNrm) \\times \\pLvl \\times f(\\mNrm, \\pLvl)\\,d\\mNrm \\,d\\pLvl.\n",
    "\\end{equation}\n",
    "\n",
    "If we could simulate the economy with a continuum of agents we would find that, over time, our estimate of aggregate market resources $\\MLvlest_t$ would converge to $\\MLvl$ and $\\CLvlest_t$ would converge to $\\CLvl$. Therefore, if we computed our aggregate estimates at different periods in time we would find them to be close:\n",
    "\\begin{equation*}\n",
    "    \\MLvlest_t \\approx \\MLvlest_{t+n} \\approx \\MLvl \\,\\,\n",
    "    \\text{and} \\,\\,\n",
    "    \\CLvlest_t \\approx \\CLvlest_{t+n} \\approx \\CLvl, \\,\\,\n",
    "    \\text{for } n>0 \\text{ and } t \\text{ large enough}.\n",
    "\\end{equation*}\n",
    "\n",
    "In practice, however, we rely on approximations using a finite number of agents $I$. Our estimates of aggregate market resources and consumption at time $t$ are\n",
    "\\begin{equation}\n",
    "\\MLvlest_t \\def \\frac{1}{I} \\sum_{i=1}^{I} m_{i,t}\\times\\pLvl_{i,t}, \\,\\,\\, \\CLvlest_t \\def \\frac{1}{I} \\sum_{i=1}^{I} \\cFunc(m_{i,t})\\times\\pLvl_{i,t},\n",
    "\\end{equation}\n",
    "\n",
    "under the basic simulation strategy or\n",
    "\n",
    "\\begin{equation}\n",
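    "\\MLvlest_t \\def \\PermGroFac^t \\frac{1}{I} \\sum_{i=1}^{I} \\tilde{m}_{i,t}, \\,\\,\\, \\CLvlest_t \\def \\PermGroFac^t \\frac{1}{I} \\sum_{i=1}^{I} \\cFunc(\\tilde{m}_{i,t}),\n",
    "\\end{equation}\n",
    "\n",
    "if we use Harmenberg's method to simulate the distribution of normalized market resources under the permanent-income-neutral measure. The factor $\\PermGroFac^t$ comes from the definition of $\\mWgtDstnMarg_t$; it only adds a constant to log growth rates, so it does not affect the variances of growth rates examined below.\n",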
    "\n",
    "If we do not use enough agents, our simulated distributions of agents over state variables will be noisy approximations of their continuous counterparts. Additionally, they will depend on the particular sequences of shocks that the agents receive. With a finite sample, the stochasticity of the draws will cause fluctuations in $\\MLvlest_t$ and $\\CLvlest_t$. Therefore, an informal way to measure the precision of our approximations is to examine the amplitude of these fluctuations.\n",
    "\n",
    "The procedure is as follows:\n",
    "1. Simulate the economy for a sufficiently long \"burn in\" time $T_0$.\n",
    "2. Sample our aggregate estimates at regular intervals after $T_0$. Letting the sampling times be $\\mathcal{T}\\def \\{T_0 + \\Delta t\\times n\\}_{n=0,1,...,N}$, obtain $\\{\\MLvlest_t\\}_{t\\in\\mathcal{T}}$ and $\\{\\CLvlest_t\\}_{t\\in\\mathcal{T}}$.\n",
    "3. Compute the variance of approximation samples $\\text{Var}\\left(\\{\\MLvlest_t\\}_{t\\in\\mathcal{T}}\\right)$ and $\\text{Var}\\left(\\{\\CLvlest_t\\}_{t\\in\\mathcal{T}}\\right)$.\n",
    "    - Other measures of uncertainty (like standard deviation) could also be computed\n",
    "    - But variance is the natural choice [because it is closely related to expected welfare](http://www.econ2.jhu.edu/people/ccarroll/papers/candcwithstickye/#Utility-Costs-Of-Sticky-Expectations)\n",
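    "\n",
    "The steps above can be sketched as a small function. The helper name `sampled_aggregate_variance` is hypothetical; it assumes a `(periods, agents)` array like the entries of a HARK agent's `history` dictionary. In the experiment below the same idea is applied to growth rates of the aggregates rather than to their levels.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def sampled_aggregate_variance(history, burn_in, sample_every, n_sample):\n",
    "    # history: array of shape (periods, agents), e.g. simulated cLvl\n",
    "    # Step 2: sampling times burn_in + sample_every * n for n = 0, ..., n_sample - 1\n",
    "    periods = burn_in + sample_every * np.arange(n_sample)\n",
    "    # One aggregate estimate (cross-sectional mean) per sampled period\n",
    "    aggregates = history[periods].mean(axis=1)\n",
    "    # Step 3: variance of the sampled aggregates over time\n",
    "    return np.var(aggregates, ddof=1)\n",
    "\n",
    "\n",
    "# Usage with a toy history whose aggregate equals the period index\n",
    "toy_history = np.arange(10.0)[:, None] * np.ones((10, 3))\n",
    "v = sampled_aggregate_variance(toy_history, burn_in=2, sample_every=2, n_sample=4)\n",
    "```\n",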
    "\n",
    "We will now perform exactly this exercise, examining the fluctuations in aggregates when they are approximated using the basic simulation strategy and Harmenberg's permanent-income-neutral measure. Since each approximation can be made arbitrarily good by increasing the number of agents it uses, we will examine the variances of aggregates for various sample sizes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a12309c",
   "metadata": {},
   "source": [
    "`# Setup computational environment:`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0d51173",
   "metadata": {},
   "outputs": [],
   "source": [
    "# How long to run the economies without sampling? T_0\n",
    "# Because the population starts at mBalLvl, which turns out to be close\n",
    "# to MBalLvl, we don't need a long burn-in period\n",
    "burn_in = 200\n",
    "# Fixed intervals between sampling aggregates, Δt\n",
    "sample_every = 1  # periods - increase this if worried about serial correlation\n",
    "# How many times to sample the aggregates? n\n",
    "n_sample = 200  # times; minimum\n",
    "\n",
    "# Create a vector with all the times at which we'll sample\n",
    "sample_periods_lvl = np.arange(\n",
    "    start=burn_in, stop=burn_in + sample_every * n_sample, step=sample_every, dtype=int\n",
    ")\n",
    "# Corresponding periods when object is first difference not level\n",
    "sample_periods_dff = np.arange(\n",
    "    start=burn_in,\n",
    "    stop=burn_in + sample_every * n_sample - 1,  # 1 fewer diff\n",
    "    step=sample_every,\n",
    "    dtype=int,\n",
    ")\n",
    "\n",
    "# Maximum number of agents that we will use for our approximations\n",
    "max_agents = 100000\n",
    "# Minimum number of agents for comparing methods in plots\n",
    "min_agents = 100"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2827536",
   "metadata": {},
   "source": [
    "`# Define tool to calculate summary statistics:`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91221385",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Now create a function that takes HARK's simulation output\n",
    "# and computes all the summary statistics we need\n",
    "\n",
    "\n",
    "def sumstats(sims, sample_periods):\n",
    "    # sims will be an array in the shape of [economy].history elements\n",
    "    # Columns are different agents and rows are different times.\n",
    "\n",
    "    # Subset the times at which we'll sample and transpose.\n",
    "    samples_lvl = pd.DataFrame(sims[sample_periods,].T)\n",
    "\n",
    "    # Get averages over agents. This will tell us what our\n",
    "    # aggregate estimate would be if we had each possible sim size\n",
    "    avgs_lvl = samples_lvl.expanding(1).mean()\n",
    "\n",
    "    # Now get the mean and standard deviations across time with\n",
    "    # every number of agents\n",
    "    mean_lvl = avgs_lvl.mean(axis=1)\n",
    "    vars_lvl = avgs_lvl.std(axis=1) ** 2\n",
    "\n",
    "    # Also return the full sample on the last simulation period\n",
    "    return {\n",
    "        \"mean_lvl\": mean_lvl,\n",
    "        \"vars_lvl\": vars_lvl,\n",
    "        \"dist_last\": sims[-1,],\n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1b63dbd",
   "metadata": {},
   "source": [
    "We now configure and solve a buffer-stock agent with a default parametrization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "022940b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create and solve agent\n",
    "\n",
    "popn = IndShockConsumerType(**init_idiosyncratic_shocks)\n",
    "\n",
    "# Modify default parameters\n",
    "popn.T_sim = max(sample_periods_lvl) + 1\n",
    "popn.AgentCount = max_agents\n",
    "popn.track_vars = [\"mNrm\", \"cNrm\", \"pLvl\"]\n",
    "popn.LivPrb = [1.0]\n",
    "popn.cycles = 0\n",
    "\n",
    "# Solve (but do not yet simulate)\n",
    "popn.solve()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e5b545bb",
   "metadata": {},
   "source": [
    "Under the basic simulation strategy, we have to de-normalize market resources and consumption by multiplying them by permanent income. Only then do we construct our statistics of interest.\n",
    "\n",
    "Note that our time-sampling strategy requires that, after enough time has passed, the economy settles on a stable distribution of its agents across states. How can we know this will be the case? [Szeidl (2013)](http://www.personal.ceu.hu/staff/Adam_Szeidl/papers/invariant.pdf) and [Harmenberg (2021)](https://www.sciencedirect.com/science/article/pii/S0165188921001202?via%3Dihub) provide conditions that can give us some reassurance.$\\newcommand{\\Rfree}{\\mathsf{R}}$\n",
    "\n",
    "1. [Szeidl (2013)](http://www.personal.ceu.hu/staff/Adam_Szeidl/papers/invariant.pdf) shows that if $$\\log \\left[\\frac{(\\Rfree\\beta)^{1/\\rho}}{\\PermGroFac}\n",
    "\\right] < \\Ex[\\log \\PermShk],$$ then there is a stable invariant distribution of normalized market resources $\\mNrm$.\n",
    "2. [Harmenberg (2021)](https://www.sciencedirect.com/science/article/pii/S0165188921001202?via%3Dihub) repurposes the Szeidl proof to argue that if the same condition is satisfied when the expectation is taken with respect to the permanent-income-neutral measure ($\\pShkNeutDstn$), then there is a stable invariant permanent-income-weighted distribution ($\\mWgtDstnMarg$).\n",
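    "\n",
    "Both conditions compare the same left-hand side with an expectation of the log permanent shock, taken under different measures. A minimal sketch with assumed parameter values (typed in by hand, not read from the agent) and the same kind of mean-one lognormal discretization as before. Because the shock has mean one, Jensen's inequality makes the original-measure expectation at most zero and the neutral-measure expectation at least zero, so in this setting Szeidl's condition implies Harmenberg's.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from numpy.polynomial.hermite_e import hermegauss\n",
    "\n",
    "# Assumed parameter values for illustration; check them against your agent\n",
    "Rfree, DiscFac, CRRA, PermGroFac, PermShkStd = 1.03, 0.96, 2.0, 1.01, 0.1\n",
    "\n",
    "# Mean-one lognormal permanent shock, 7-point Gauss-Hermite discretization\n",
    "z, w = hermegauss(7)\n",
    "pmv = w / w.sum()\n",
    "atoms = np.exp(PermShkStd * z)\n",
    "atoms /= np.dot(pmv, atoms)\n",
    "\n",
    "log_GPFacRaw = np.log((Rfree * DiscFac) ** (1.0 / CRRA) / PermGroFac)\n",
    "e_log_psi = np.dot(pmv, np.log(atoms))  # expectation under f_Psi\n",
    "e_log_psi_ntrl = np.dot(pmv, atoms * np.log(atoms))  # under the neutral measure\n",
    "\n",
    "szeidl_ok = log_GPFacRaw < e_log_psi\n",
    "harmenberg_ok = log_GPFacRaw < e_log_psi_ntrl\n",
    "print(szeidl_ok, harmenberg_ok)\n",
    "```\n",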
    "\n",
    "We now check both conditions with our parametrization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc204d95",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Two versions for different HARK versions\n",
    "try:  # This works with HARK 2.0 pre-alpha\n",
    "    Bilt = popn.solution[0].Bilt\n",
    "    APFac = Bilt.APFac\n",
    "    GPFacRaw = Bilt.GPFacRaw\n",
    "    soln = popn.solution[0]\n",
    "    soln.check_GICSdl(soln, quietly=False)\n",
    "    soln.check_GICHrm(soln, quietly=False)\n",
    "except:  # This one for HARK 0.12 or later\n",
    "    # APFac = popn.thorn\n",
    "    GPFacRaw = popn.GPFRaw\n",
    "    e_log_PermShk_popn = calc_expectation(popn.PermShkDstn[0], func=lambda x: np.log(x))\n",
    "    e_log_PermShk_ntrl = calc_expectation(\n",
    "        popn.PermShkDstn[0], func=lambda x: x * np.log(x)\n",
    "    )\n",
    "    szeidl_cond = np.log(GPFacRaw) < e_log_PermShk_popn\n",
    "    harmen_cond = np.log(GPFacRaw) < e_log_PermShk_ntrl\n",
    "    if szeidl_cond:\n",
    "        print(\n",
    "            \"Szeidl's condition is satisfied, there is a stable invariant distribution of normalized market resources\"\n",
    "        )\n",
    "    else:\n",
    "        print(\"Warning: Szeidl's condition is not satisfied\")\n",
    "    if harmen_cond:\n",
    "        print(\n",
    "            \"Harmenberg's condition is satisfied, there is a stable invariant permanent-income-weighted distribution\"\n",
    "        )\n",
    "    else:\n",
    "        print(\"Warning: Harmenberg's condition is not satisfied\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a306679",
   "metadata": {},
   "source": [
    "Knowing that the conditions are satisfied, we are ready to perform our experiments.\n",
    "\n",
    "First, we simulate using the traditional approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "048daf16-208f-4741-b367-e7ab14ab00d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Base simulation\n",
    "\n",
    "# Start assets at m balanced growth (levels) point\n",
    "try:  # Accommodate syntax for old and new versions of HARK\n",
    "    Bilt = popn.solution[0].Bilt\n",
    "    popn.aNrmInitMean = np.log(Bilt.mBalLvl - 1)\n",
    "except:\n",
    "    popn.aNrmInitMean = np.log(popn.solution[0].mNrmStE - 1)\n",
    "\n",
    "popn.aNrmInitStd = 0.0\n",
    "\n",
    "popn.initialize_sim()\n",
    "popn.simulate()\n",
    "\n",
    "# Retrieve history\n",
    "mNrm_popn = popn.history[\"mNrm\"]\n",
    "mLvl_popn = popn.history[\"mNrm\"] * popn.history[\"pLvl\"]\n",
    "cLvl_popn = popn.history[\"cNrm\"] * popn.history[\"pLvl\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78ba82a7",
   "metadata": {},
   "source": [
    "Next, we update the agent and simulate using Harmenberg's strategy. This time, we do not multiply by permanent income."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7bf55cf3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Harmenberg permanent income neutral simulation\n",
    "\n",
    "# Start by duplicating the previous setup\n",
    "ntrl = deepcopy(popn)\n",
    "\n",
    "# Recompute income process to use neutral measure\n",
    "ntrl.neutral_measure = True\n",
    "ntrl.update_income_process()\n",
    "\n",
    "ntrl.initialize_sim()\n",
    "ntrl.simulate()\n",
    "\n",
    "# Retrieve history\n",
    "cLvl_ntrl = ntrl.history[\"cNrm\"]\n",
    "mLvl_ntrl = ntrl.history[\"mNrm\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a84d07e",
   "metadata": {},
   "source": [
    "# Now Compare the Variances of Simulated Outcomes\n",
    "\n",
    "Harmenberg (2021) and Szeidl (2013) prove that with an infinite population size, models of this kind will have constant and identical growth rates of aggregate consumption, market resources, and noncapital income.\n",
    "\n",
    "One way to compare the efficiency of the two methods is therefore to calculate the variance of the simulated aggregate variables and see how many agents must be simulated with each of them to achieve a given variance. (An infinite number of agents would be required to achieve zero variance.)\n",
    "\n",
    "The plots below show the estimated variances of the aggregate growth rates for the two methods as a function of the number of agents, with both axes on log scales."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "499e0e33",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plots\n",
    "\n",
    "# Construct aggregate levels and growth rates\n",
    "# Cumulate levels and divide by number of agents to get average\n",
    "CAvg_popn = np.cumsum(cLvl_popn, axis=1) / np.arange(1, max_agents + 1)\n",
    "CAvg_ntrl = np.cumsum(cLvl_ntrl, axis=1) / np.arange(1, max_agents + 1)\n",
    "MAvg_popn = np.cumsum(mLvl_popn, axis=1) / np.arange(1, max_agents + 1)\n",
    "MAvg_ntrl = np.cumsum(mLvl_ntrl, axis=1) / np.arange(1, max_agents + 1)\n",
    "# First difference the logs to get aggregate growth rates\n",
    "CGro_popn = np.diff(np.log(CAvg_popn).T).T\n",
    "CGro_ntrl = np.diff(np.log(CAvg_ntrl).T).T\n",
    "MGro_popn = np.diff(np.log(MAvg_popn).T).T\n",
    "MGro_ntrl = np.diff(np.log(MAvg_ntrl).T).T\n",
    "# Calculate statistics for them\n",
    "CGro_popn_stats = sumstats(CGro_popn, sample_periods_dff)\n",
    "CGro_ntrl_stats = sumstats(CGro_ntrl, sample_periods_dff)\n",
    "MGro_popn_stats = sumstats(MGro_popn, sample_periods_dff)\n",
    "MGro_ntrl_stats = sumstats(MGro_ntrl, sample_periods_dff)\n",
    "\n",
    "# Count the agents\n",
    "nagents = np.arange(1, max_agents + 1, 1)\n",
    "\n",
    "# Plot\n",
    "fig, axs = plt.subplots(2, figsize=(10, 7), constrained_layout=True)\n",
    "fig.suptitle(\"Variances of Aggregate Growth Rates\", fontsize=16)\n",
    "axs[0].plot(\n",
    "    nagents[min_agents:],\n",
    "    np.array(CGro_popn_stats[\"vars_lvl\"])[min_agents:],\n",
    "    label=\"Base\",\n",
    ")\n",
    "axs[0].plot(\n",
    "    nagents[min_agents:],\n",
    "    np.array(CGro_ntrl_stats[\"vars_lvl\"])[min_agents:],\n",
    "    label=\"Perm. Inc. Neutral\",\n",
    ")\n",
    "axs[0].set_yscale(\"log\")\n",
    "axs[0].set_xscale(\"log\")\n",
    "axs[0].set_title(\"Consumption\", fontsize=14)\n",
    "axs[0].set_ylabel(\n",
    "    r\"$\\mathrm{var}\\left(\\Delta \\log \\{\\hat{C}_t\\}_{t\\in\\mathcal{T}}\\right)$\", fontsize=14\n",
    ")\n",
    "axs[0].set_xlabel(\"Number of Agents\", fontsize=10)\n",
    "axs[0].grid()\n",
    "axs[0].legend(fontsize=12)\n",
    "\n",
    "axs[1].plot(\n",
    "    nagents[min_agents:],\n",
    "    np.array(MGro_popn_stats[\"vars_lvl\"])[min_agents:],\n",
    "    label=\"Base\",\n",
    ")\n",
    "axs[1].plot(\n",
    "    nagents[min_agents:],\n",
    "    np.array(MGro_ntrl_stats[\"vars_lvl\"])[min_agents:],\n",
    "    label=\"Perm. Inc. Neutral\",\n",
    ")\n",
    "axs[1].set_yscale(\"log\")\n",
    "axs[1].set_xscale(\"log\")\n",
    "axs[1].set_title(\"Market Resources\", fontsize=14)\n",
    "axs[1].set_ylabel(\n",
    "    r\"$\\mathrm{var}\\left(\\Delta \\log \\{\\hat{M}_t\\}_{t\\in\\mathcal{T}}\\right)$\", fontsize=14\n",
    ")\n",
    "axs[1].set_xlabel(\"Number of Agents\", fontsize=10)\n",
    "axs[1].grid()\n",
    "axs[1].legend(fontsize=12)\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbb44a41",
   "metadata": {},
   "source": [
    "# Harmenberg's Method Produces Large Gains in Efficiency\n",
    "\n",
    "To find the number of agents each method requires to achieve a given variance, pick that variance on the vertical axis and read off, on the horizontal axis, the points where the two loci reach it.\n",
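    "\n",
    "The read-off procedure can be sketched in a few lines.  The helper `agents_needed` is hypothetical (it is not part of HARK or of this notebook), and the variance curves it is applied to are synthetic $c/N$ loci with illustrative constants, not the simulated results plotted above:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "def agents_needed(n_agents, variances, target):\n",
    "    # Hypothetical helper: smallest agent count whose estimated variance is\n",
    "    # at or below the target, or None if the target is never reached\n",
    "    below = np.asarray(variances) <= target\n",
    "    if not below.any():\n",
    "        return None\n",
    "    # argmax on a boolean array returns the index of the first True entry\n",
    "    return int(np.asarray(n_agents)[np.argmax(below)])\n",
    "\n",
    "\n",
    "# Synthetic variance loci of the form c / N (constants are illustrative only)\n",
    "n_agents = np.arange(1, 100001)\n",
    "var_base = 1.0 / n_agents\n",
    "var_ntrl = 0.01 / n_agents\n",
    "\n",
    "target = 1e-4\n",
    "n_base = agents_needed(n_agents, var_base, target)\n",
    "n_ntrl = agents_needed(n_agents, var_ntrl, target)\n",
    "print(n_base, n_ntrl, n_base / n_ntrl)\n",
    "```\n",
    "\n",
    "With these made-up constants, the noisier locus needs roughly 100 times as many agents to reach the target variance, which is the horizontal gap one would read off a log-log plot.\n",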
    "\n",
    "The upper plot shows that the efficiency gains are very large for consumption: the horizontal gap between the two loci is generally more than two orders of magnitude.  That is, for a given precision, Harmenberg's method requires less than **one-hundredth** as many agents as the standard method.  Alternatively, for a given number of agents it is typically more than 10 times as precise, in terms of the standard deviation of the estimates.\n",
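    "\n",
    "One way to connect the horizontal gap to precision at a fixed number of agents is to assume, as is standard for averages of independent draws, that each method's sampling variance falls in proportion to $1/N$ with a method-specific constant $\\sigma_j^2$.  The notation $V_j$, $N_j^*$ and $\\sigma_j$ is introduced here only for this illustration:\n",
    "\n",
    "$$\n",
    "V_j(N) \\approx \\frac{\\sigma_j^2}{N} \\quad\\Longrightarrow\\quad N_j^*(V) = \\frac{\\sigma_j^2}{V}, \\qquad j \\in \\{\\text{base}, \\text{ntrl}\\},\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\frac{N_{\\text{base}}^*(V)}{N_{\\text{ntrl}}^*(V)} = \\frac{\\sigma_{\\text{base}}^2}{\\sigma_{\\text{ntrl}}^2} = \\frac{V_{\\text{base}}(N)}{V_{\\text{ntrl}}(N)}, \\qquad \\frac{\\text{sd}_{\\text{base}}(N)}{\\text{sd}_{\\text{ntrl}}(N)} = \\frac{\\sigma_{\\text{base}}}{\\sigma_{\\text{ntrl}}}.\n",
    "$$\n",
    "\n",
    "Under this approximation, a horizontal gap of a factor of 100 corresponds to a variance ratio of 100 and a standard-deviation ratio of 10 at any common $N$.\n",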
    "\n",
    "The improvement in variance is smaller for market resources, likely because in a buffer-stock model the point of consumers' actions is to use assets to absorb shocks.  But even for $\\MLvl$, the Harmenberg method attains any given level of precision (measured by $\\text{var}\\left(\\Delta \\log \\{\\MLvlest_t\\}_{t\\in\\mathcal{T}}\\right)$) with roughly **one-tenth** of the agents the standard method needs to achieve that same level.\n",
    "\n",
    "Of course, these results apply only to the particular parameter values that are the defaults in the HARK toolkit (values that were chosen long before Harmenberg developed his method).  The degree of improvement will vary with the calibration; for example, if permanent shocks are small or absent, the method will yield little or no improvement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "329e8bbf-d6ed-4c3e-a416-4dc71c82ade6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uncomment and execute the line below (after setting burn_in to zero above) to see that\n",
    "# there is little drift when the simulation starts from mBalLvl; this means burn_in does not need to be large.\n",
    "\n",
    "# plt.plot(np.arange(1, len(np.mean(MAvg_ntrl, axis=1)) + 1), np.mean(MAvg_ntrl, axis=1))"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "ExecuteTime,collapsed,title,code_folding,tags,incorrectly_encoded_metadata,jp-MarkdownHeadingCollapsed,-autoscroll",
   "encoding": "# -*- coding: utf-8 -*-",
   "formats": "ipynb,py:percent",
   "notebook_metadata_filter": "all,-widgets,-varInspector"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}