{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Setup model and install packages\n", "**Note:** This package is a proof of concept. While the code will remain working with proper use of a [Julia manifest](https://pkgdocs.julialang.org/v1/), we can't guarantee ongoing support and maintenance.\n", "\n", "Remove comment below to install the exact version of packages listed in the manifest." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "# import Pkg; Pkg.instantiate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The core package is `DifferentiableStateSpaceModels.jl` where `DifferenceEquations.jl` provides compatible functionality for simulating and calculating likelihoods" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]\n", "└ @ Base loading.jl:1423\n" ] } ], "source": [ "using DifferentiableStateSpaceModels, DifferenceEquations, LinearAlgebra, Zygote, Distributions, Plots, DiffEqBase, Symbolics, BenchmarkTools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Model Class\n", "The model follows [Schmitt-Grohe and Uribe (2004)](http://www.columbia.edu/~mu2166/2nd_order/2nd_order.pdf) timing convention. The system takes a nonlinear expectational difference equation including all first-order conditions for decisions and the system evolution equations,\n", "\n", "$$\n", "\\mathbb{E}_{t}\\mathcal{H}\\left(y',y,x',x;p\\right)=0\n", "$$\n", "where $y$ are the control variables, $x$ are the states, and $p$ is a vector of deep parameters of interest. Expectations are taken over forward-looking variables and an underlying random process $\\epsilon'$.\n", "\n", "In addition, we consider an observation equation - which might be noisy, for\n", "$$\n", "z = Q \\cdot \\begin{bmatrix}y &x\\end{bmatrix}^{\\top} + \\nu\n", "$$\n", "where $\\nu$ may or may not be normally distributed but $\\mathbb{E}(\\nu) = 0$ and $\\mathbb{V}(\\nu) = \\Omega(p) \\Omega(p)^{\\top}$.\n", "\n", "Assume that there is a non-stochastic steady state of this problem as $y_{ss}, x_{ss}$.\n", "\n", "## Perturbation Solution\n", "Define the deviation from the non-stochastic steady state as $\\hat{x} \\equiv x - x_{ss}, \\hat{y} \\equiv y - y_{ss},$ and $\\hat{z} \\equiv z - z_{ss}$.\n", "\n", "The solution finds the first or second order perturbation around that non-stochastic steady state, and yields\n", "\n", "$$\n", "x' = h(x; p) + \\eta \\ \\Gamma(p)\\ \\epsilon'\n", "$$\n", "\n", "where $\\eta$ describes how shocks affect the law of motion and $\\mathbb{E}(\\epsilon') = 0$. Frequently this would be organized such that $\\mathbb{V}(\\epsilon)= I$, but that is not required. In addition, it could instead be interpreted as for $x' = h(x; p) + \\eta \\ \\epsilon'$ with $\\mathbb{V}(\\epsilon') = \\Gamma(p) \\Gamma(p)^{\\top}$.\n", "\n", "and with the policy equation,\n", "\n", "$$\n", "y = g(x; p)\n", "$$\n", "\n", "and finally, substitution in for the observation equation\n", "\n", "$$\n", "z= Q \\begin{bmatrix} g(x;p) \\\\ x\\end{bmatrix} + \\nu\n", "$$\n", "\n", "## First Order Solutions\n", "Perturbation approximates the above model equations, where $h$ and $g$ are not available explicitly, by a Taylor expansion around the steady state. 
For example, in the case of the first-order model the solution finds\n", "\n", "$$\n", "\hat{x}' = A(p)\ \hat{x} + B(p) \epsilon'\n", "$$\n", "\n", "and\n", "\n", "$$\n", "\hat{y} = g_x(p) \ \hat{x}\n", "$$\n", "\n", "and \n", "\n", "$$\n", "\hat{z} = C(p)\ \hat{x} + \nu\n", "$$\n", "where $C(p)\equiv Q \begin{bmatrix} g_x(p) \\ I\end{bmatrix}$, $B(p) \equiv \eta \Gamma(p)$, and $\mathbb{V}(\nu) = D(p) D(p)^{\top}$. Normality of $\nu$ or $\epsilon'$ is not required in general.\n", "\n", "This is a linear state-space model, and if the priors and shocks are Gaussian, a marginal likelihood can be evaluated with classic methods such as a Kalman filter. The output of the perturbation can be used manually, or in conjunction with [DifferenceEquations.jl](https://github.com/SciML/DifferenceEquations.jl).\n", "\n", "## Second Order Solutions\n", "This package also solves for second-order perturbations, with all of the same features. The canonical form of the second-order solution implements state-space pruning as in [Andreasen, Fernandez-Villaverde, and Rubio-Ramirez (2017)](https://www.sas.upenn.edu/~jesusfv/Pruning.pdf). Given the same formulation of the problem, the solution finds the canonical form,\n", "\n", "$$\n", "\hat{x}' = A_0(p) + A_1(p)\ \hat{x} + \hat{x}^{\top} A_2(p)\ \hat{x} + B(p) \epsilon'\n", "$$\n", "\n", "and\n", "\n", "$$\n", "\hat{y} = g_{\sigma\sigma} + g_x \ \hat{x} + \hat{x}^{\top}\ g_{xx}\ \hat{x}\n", "$$\n", "\n", "and \n", "\n", "$$\n", "\hat{z} = C_0(p) + C_1(p)\ \hat{x} + \hat{x}^{\top} C_2(p)\ \hat{x} + \nu\n", "$$\n", "\n", "where $B(p) \equiv \eta \Gamma(p)$, $\mathbb{V}(\nu) = D(p) D(p)^{\top}$, and the $C_0, C_1, C_2, A_0, A_1, A_2$ all reflect the state-space pruning.\n", "\n", "## Gradients\n", "All of the above use standard solution methods. The primary contribution of this package is that all of these model elements are **differentiable**. Hence, these gradients can be composed for use in applications such as optimization, gradient-based estimation methods, and with [DifferenceEquations.jl](https://github.com/SciML/DifferenceEquations.jl), which provides differentiable simulations and likelihoods for state-space models. That is, if we think of a perturbation solver as a map $\mathbf{P}$ from $p$ to solutions (e.g., in first order, $\mathbf{P}(p) = (A, B, C, D)$), then we can find the gradients $\partial_p \mathbf{P}(p), \partial_p A(p)$, etc. Or, when these gradients are available for use with reverse-mode auto-differentiation, it can take \"wobbles\" in $A, B, C, D$ and go back to the \"wiggles\" of the underlying $p$ through $\mathbf{P}$. See [ChainRules.jl](https://juliadiff.org/ChainRulesCore.jl/stable/maths/propagators.html) for more details on AD. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Model Primitives\n", "Models are defined in a Dynare-style DSL using [Symbolics.jl](https://github.com/JuliaSymbolics/Symbolics.jl). The primitives are:\n", "1. The list of variables for the controls $y$, states $x$, and deep parameters $p$.\n", "2. The set of equations $H$ as a function of $p, y(t), y(t+1), x(t),$ and $x(t+1)$. No $t-1$ timing is allowed.\n", "3. The loading of shocks $\eta$ as a fixed matrix of constants.\n", "4. The shock covariance Cholesky factor $\Gamma$ as a function of parameters $p$.\n", "5. The observation equation $Q$ as a fixed matrix.\n", "6. The Cholesky factor of the observation errors, $\Omega$, as a function of parameters $p$. 
At this point only a diagonal matrix is supported.\n", "7. Either the steady state equations for all of $y$ and $x$ in closed form as a function of $p$, or initial conditions, as functions of $p$, for a nonlinear solver to find the steady state." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# RBC Example\n", "This example implements the model primitives to build a simple version of the RBC model with two observables." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Perturbation Model: n_y = 2, n_x = 2, n_p = 6, n_ϵ = 1, n_z = 2\n", " y = [:c, :q] \n", " x = [:k, :z] \n", " p = [:α, :β, :ρ, :δ, :σ, :Ω_1]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "∞ = Inf\n", "@variables α, β, ρ, δ, σ, Ω_1\n", "@variables t::Integer, k(..), z(..), c(..), q(..)\n", "\n", "x = [k, z] # states\n", "y = [c, q] # controls\n", "p = [α, β, ρ, δ, σ, Ω_1] # parameters\n", "\n", "H = [1 / c(t) - (β / c(t + 1)) * (α * exp(z(t + 1)) * k(t + 1)^(α - 1) + (1 - δ)),\n", " c(t) + k(t + 1) - (1 - δ) * k(t) - q(t),\n", " q(t) - exp(z(t)) * k(t)^α,\n", " z(t + 1) - ρ * z(t)] # system of model equations\n", "\n", "# analytic solutions for the steady state. Alternatively, pass initial values via steady_states_iv and let a nonlinear solver find the steady state\n", "steady_states = [k(∞) ~ (((1 / β) - 1 + δ) / α)^(1 / (α - 1)),\n", " z(∞) ~ 0,\n", " c(∞) ~ (((1 / β) - 1 + δ) / α)^(α / (α - 1)) -\n", " δ * (((1 / β) - 1 + δ) / α)^(1 / (α - 1)),\n", " q(∞) ~ (((1 / β) - 1 + δ) / α)^(α / (α - 1))]\n", "\n", "\n", "Γ = [σ;;] # matrix for the single shock. The [;;] notation just makes it a matrix rather than a vector in Julia\n", "η = [0; -1;;] # η is an n_x by n_ϵ matrix. The [;;] notation just makes it a matrix rather than a vector in Julia\n", "\n", "# observation matrix. order is \"y\" then \"x\" variables, so [c,q,k,z] in this example\n", "Q = [1.0 0 0 0; # select c as first \"z\" observable\n", " 0 0 1.0 0] # select k as second \"z\" observable\n", "\n", "# diagonal Cholesky factor of the covariance matrix for observation noise (so these are standard deviations). Non-diagonal observation noise is not currently supported\n", "Ω = [Ω_1, Ω_1]\n", "\n", "# Generates the files and includes them if required. If the model has already been created, it is simply loaded\n", "overwrite_model_cache = true\n", "model_rbc = @make_and_include_perturbation_model(\"rbc_notebook_example\", H, (; t, y, x, p, steady_states, Γ, Ω, η, Q, overwrite_model_cache)) # Convenience macro. Saves as \".function_cache/rbc_notebook_example.jl\"\n", "\n", "# After generation could just include the files directly, or add the following to prevent overwriting the models\n", "# isdefined(Main, :rbc_notebook_example) || include(joinpath(pkgdir(DifferentiableStateSpaceModels),\n", "# \"test/generated_models/rbc_notebook_example.jl\"))\n", "# Alternatively, this is also the same example as a prebuilt example\n", "# model_rbc = @include_example_module(DifferentiableStateSpaceModels.Examples.rbc_observables)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As commented above, the convenience macro will take this model, generate files, and include them. It calls the function [`make_perturbation_model`](https://github.com/HighDimensionalEconLab/DifferentiableStateSpaceModels.jl/blob/main/src/make_perturbation_model.jl#L6-L13) with the associated options in that file. Once generated, these can be included as normal `.jl` files. 
At this point, there are no runtime-generated functions, though that functionality could be added later. The following queries some of the properties of the loaded model." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/latex": [ "$\begin{equation}\n", "\left[\n", "\begin{array}{c}\n", "\frac{ - \beta \left( 1 - \delta + \left( k\left( 1 + t \right) \right)^{-1 + \alpha} \alpha e^{z\left( 1 + t \right)} \right)}{c\left( 1 + t \right)} + \frac{1}{c\left( t \right)} \\\n", " - q\left( t \right) - \left( 1 - \delta \right) k\left( t \right) + c\left( t \right) + k\left( 1 + t \right) \\\n", " - \left( k\left( t \right) \right)^{\alpha} e^{z\left( t \right)} + q\left( t \right) \\\n", " - \rho z\left( t \right) + z\left( 1 + t \right) \\\n", "\end{array}\n", "\right]\n", "\end{equation}\n", "$" ], "text/plain": [ "L\"$\begin{equation}\n", "\left[\n", "\begin{array}{c}\n", "\frac{ - \beta \left( 1 - \delta + \left( k\left( 1 + t \right) \right)^{-1 + \alpha} \alpha e^{z\left( 1 + t \right)} \right)}{c\left( 1 + t \right)} + \frac{1}{c\left( t \right)} \\\n", " - q\left( t \right) - \left( 1 - \delta \right) k\left( t \right) + c\left( t \right) + k\left( 1 + t \right) \\\n", " - \left( k\left( t \right) \right)^{\alpha} e^{z\left( t \right)} + q\left( t \right) \\\n", " - \rho z\left( t \right) + z\left( 1 + t \right) \\\n", "\end{array}\n", "\right]\n", "\end{equation}\n", "$\"" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_H_latex(model_rbc)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/latex": [ "$\begin{align}\n", "k\left( \infty \right) =& \left( \frac{-1 + \delta + \frac{1}{\beta}}{\alpha} \right)^{\frac{1}{-1 + \alpha}} \\\n", "z\left( \infty \right) =& 0 \\\n", "c\left( \infty \right) =& \left( \frac{-1 + \delta + \frac{1}{\beta}}{\alpha} \right)^{\frac{\alpha}{-1 + \alpha}} - \left( \frac{-1 + \delta + \frac{1}{\beta}}{\alpha} \right)^{\frac{1}{-1 + \alpha}} \delta \\\n", "q\left( \infty \right) =& \left( \frac{-1 + \delta + \frac{1}{\beta}}{\alpha} \right)^{\frac{\alpha}{-1 + \alpha}}\n", "\end{align}\n", "$" ], "text/plain": [ "L\"$\begin{align}\n", "k\left( \infty \right) =& \left( \frac{-1 + \delta + \frac{1}{\beta}}{\alpha} \right)^{\frac{1}{-1 + \alpha}} \\\n", "z\left( \infty \right) =& 0 \\\n", "c\left( \infty \right) =& \left( \frac{-1 + \delta + \frac{1}{\beta}}{\alpha} \right)^{\frac{\alpha}{-1 + \alpha}} - \left( \frac{-1 + \delta + \frac{1}{\beta}}{\alpha} \right)^{\frac{1}{-1 + \alpha}} \delta \\\n", "q\left( \infty \right) =& \left( \frac{-1 + \delta + \frac{1}{\beta}}{\alpha} \right)^{\frac{\alpha}{-1 + \alpha}}\n", "\end{align}\n", "$\"" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model_steady_states_latex(model_rbc) # this is the closed-form steady state; could instead pass in steady_states_iv to give approximate initial conditions for the nonlinear solver" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(model_rbc.n_x, model_rbc.n_y, model_rbc.n_ϵ, model_rbc.n_z) = (2, 2, 1, 2)\n" ] }, { "data": { "text/plain": [ "(2, 2, 1, 2)" ] }, "execution_count": 6, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "@show model_rbc.n_x, model_rbc.n_y, model_rbc.n_ϵ, model_rbc.n_z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solving Perturbations\n", "\n", "Given a model definition, you can solve the expectational difference equation for a given set of parameters. All of the `p` defined above must be in either the `p_d` or the `p_f` arguments below.\n", "\n", "The distinction between these is that the `p_d` parameters will be available for differentiation where the `p_f` will be fixed. That is, for any parameters in the `p_d` the model will support calculation of gradients of the perturbation solution. The only downside with having extra values in `p_d` vs. `p_f` is extra computation. Otherwise, pass in `nothing` for `p_f` if you are differentiating all of the parameters" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(sol.retcode, sol_2.retcode, verify_steady_state(m, p_d, p_f)) = (:Success, :Success, true)\n" ] }, { "data": { "text/plain": [ "(:Success, :Success, true)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters\n", "p_d = (α = 0.5, β = 0.95) # Pseudo-true values\n", "m = model_rbc # ensure notebook executed above\n", "sol = generate_perturbation(model_rbc, p_d, p_f) # Solution to the first-order RBC\n", "sol_2 = generate_perturbation(model_rbc, p_d, p_f, Val(2)); # Solution to the second-order RBC\n", "@show sol.retcode, sol_2.retcode, verify_steady_state(m, p_d, p_f) # the final call checks that the analytically provided steady-state solution is correct" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The perturbation solution (in the canonical form described in the top section) can be queried from the resulting solution. 
A few examples for the first-order solution are below." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(sol.y, sol.x) = ([5.936252888048732, 6.884057971014497], [47.39025414828823, 0.0])\n", "sol.g_x = [0.09579643002416627 0.6746869652586192; 0.07263157894736873 6.884057971014506]\n", "(sol.A, sol.B) = ([0.9568351489232028 6.2093710057558855; 1.5076865909646354e-18 0.20000000000000004], [0.0; -0.01;;])\n", "(sol.C, sol.D) = ([0.09579643002416627 0.6746869652586192; 1.0 0.0], [0.0001, 0.0001])\n", "sol.x_ergodic_var = [0.07005411173180152 0.00015997603451513398; 0.00015997603451513398 0.00010416666666666667]\n" ] } ], "source": [ "@show sol.y, sol.x # steady states y_ss and x_ss. These are the values such that y ≡ ŷ + sol.y and x ≡ x̂ + sol.x\n", "@show sol.g_x # the first-order policy, so that ŷ = g_x x̂\n", "@show sol.A, sol.B # the evolution equation of the state, so that x̂' = A x̂ + B ϵ\n", "@show sol.C, sol.D; # the observation equation, so that ẑ = C x̂ + ν with variance of ν as D D'.\n", "@show sol.x_ergodic_var; # covariance matrix of the ergodic distribution of x̂, which is mean zero since x̂ ≡ x - x_ss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition, you can query some metadata about the solution to aid in debugging and generic programming." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(sol.Q, sol.η, sol.Γ) = ([1.0 0.0 0.0 0.0; 0.0 0.0 1.0 0.0], [0; -1;;], [0.01;;])\n", "(sol.n_y, sol.n_x, sol.n_ϵ, sol.n_z, sol.n_p) = (2, 2, 1, 2, 6)\n", "(sol.x_symbols, sol.y_symbols, sol.p_symbols) = ([:k, :z], [:c, :q], [:α, :β, :ρ, :δ, :σ, :Ω_1])\n" ] } ], "source": [ "@show sol.Q, sol.η, sol.Γ; # various matrices for loading of shocks and observations\n", "@show sol.n_y, sol.n_x, sol.n_ϵ, sol.n_z, sol.n_p # sizes of the model\n", "@show sol.x_symbols, sol.y_symbols, sol.p_symbols; # the order of symbols for the x, y, and p vectors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions of Perturbation Solutions (and their Derivatives)\n", "The core feature of this library is to enable gradients of the perturbation solutions with respect to parameters (i.e., anything in the `p_d` vector). To show this, we will construct a function that uses the resulting law of motion, and then find the gradient of its output with respect to those parameters." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.052773688889493325" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function IRF(p_d, ϵ_0; m, p_f, steps)\n", " sol = generate_perturbation(m, p_d, p_f) # First-order perturbation by default, pass Val(2) as an additional argument to do 2nd order.\n", " x = sol.B * ϵ_0 # start after applying the impulse with the model's shock loading η and Γ(p)\n", " for _ in 1:steps\n", " # A note: you cannot use mutating expressions here with most AD code. i.e. 
x .= sol.A * x wouldn't work\n", " # For more elaborate simulations, you would want to use DifferenceEquations.jl in practice\n", " x = sol.A * x # iterate forward using the first-order state evolution equation\n", " end \n", " return [0, 1]' * sol.C * x # select the second observable using the model's first-order observation equation C\n", "end\n", "\n", "m = model_rbc # ensure notebook executed above\n", "p_f = (ρ=0.2, δ=0.02, σ=0.01, Ω_1=0.01) # not differentiated \n", "p_d = (α=0.5, β=0.95) # differentiated parameters\n", "steps = 10 # steps ahead to forecast\n", "ϵ_0 = [1.0] # shock size\n", "IRF(p_d, ϵ_0; m, p_f, steps) # Function works on its own, calculating the perturbation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, the real benefit is that this function can itself be differentiated to find gradients with respect to the deep parameters `p_d` and the shock `ϵ_0`." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((α = -0.5810083808068425, β = -1.182312351872585), [-0.05277368888949331])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Using the Zygote auto-differentiation library already loaded above\n", "p_d = (α=0.5, β=0.95) # differentiated parameters\n", "ϵ_0 = [1.0] # shock size\n", "IRF_grad = gradient((p_d, ϵ_0) -> IRF(p_d, ϵ_0; m, p_f, steps), p_d, ϵ_0) # \"closes\" over the m, p_f, and steps to create a new function, and differentiates it with respect to the other arguments" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "d/dα IRF(p_d, ϵ0) = -0.5810083808068425, d/dβ IRF(p_d, ϵ0) = -1.182312351872585, d/dϵ0 IRF(p_d, ϵ0) = [-0.05277368888949331]\n" ] } ], "source": [ "println(\"d/dα IRF(p_d, ϵ0) = $(IRF_grad[1].α), d/dβ IRF(p_d, ϵ0) = $(IRF_grad[1].β), d/dϵ0 IRF(p_d, ϵ0) = $(IRF_grad[2])\")" ] },
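{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check, these reverse-mode gradients can be compared against a forward finite-difference approximation. This is a minimal sketch using only the `IRF` function defined above; the step size `h` is an arbitrary choice, and `fd_α`/`fd_β` are hypothetical names, not part of the package API." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Finite-difference check of the IRF gradients above (illustrative sketch only)\n", "h = 1e-6 # arbitrary step size\n", "fd_α = (IRF((α = 0.5 + h, β = 0.95), ϵ_0; m, p_f, steps) - IRF((α = 0.5, β = 0.95), ϵ_0; m, p_f, steps)) / h\n", "fd_β = (IRF((α = 0.5, β = 0.95 + h), ϵ_0; m, p_f, steps) - IRF((α = 0.5, β = 0.95), ϵ_0; m, p_f, steps)) / h\n", "(fd_α, fd_β) # should be close to IRF_grad[1] up to O(h) error" ] },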
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "IRF_cache(p_d, ϵ_0; m, p_f, steps, cache) = -0.052773688889493325\n" ] }, { "data": { "text/plain": [ "((α = -0.5810083808068425, β = -1.182312351872585), [-0.05277368888949331])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function IRF_cache(p_d, ϵ_0; m, p_f, steps, cache)\n", " sol = generate_perturbation(m, p_d, p_f; cache) # only difference is passing in a pre-allocated cache to the perturbation solver\n", " x = sol.B * ϵ_0\n", " for _ in 1:steps\n", " x = sol.A * x\n", " end \n", " return [0, 1]' * sol.C * x\n", "end\n", "\n", "# To call it\n", "m = model_rbc # ensure notebook executed above\n", "p_d = (α=0.5, β=0.95) # Differentiated parameters\n", "p_f = (ρ=0.2, δ=0.02, σ=0.01, Ω_1=0.01)\n", "steps = 10 # steps ahead to forecast\n", "ϵ_0 = [1.0] # shock size\n", "cache = SolverCache(m, Val(1), p_d) # the p_d passed into the cache just reuses the symbols, and the values in p_d are unused\n", "@show IRF_cache(p_d, ϵ_0; m, p_f, steps, cache)\n", "gradient((p_d, ϵ_0) -> IRF_cache(p_d, ϵ_0; m, p_f, steps, cache), p_d, ϵ_0) # as before, do not try to differentiate with respect to the cache values itself." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 64.608 μs (214 allocations: 86.59 KiB)\n", " 55.104 μs (125 allocations: 75.72 KiB)\n" ] }, { "data": { "text/plain": [ "-0.052773688889493325" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# However, in a problem this small you will not see much of a difference in performance\n", "@btime IRF($p_d, $ϵ_0; m = $m, p_f = $p_f, steps = $steps)\n", "@btime IRF_cache($p_d, $ϵ_0; m = $m, p_f = $p_f, steps = $steps, cache = $cache)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulating with DifferenceEquations.jl\n", "The manual iteration of the state-space model from the perturbation solution is possible, but can be verbose and difficult to achieve efficiency for gradients. One benefit of this package is that it creates state-space models in a form consistent with [DifferenceEquations.jl](https://github.com/SciML/DifferenceEquations.jl) which can be easily simulated, visualized, and estimated.\n", "\n", "To do this, we will calculate a perturbation solution then simulate it for various `x_0` drawn from the ergodic solution." 
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Precompiling GR_jll [d2c73de3-f751-5644-a686-071e5b155ba9]\n", "└ @ Base loading.jl:1423\n" ] } ], "source": [ "p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters\n", "p_d = (α = 0.5, β = 0.95) # Pseudo-true values\n", "m = model_rbc # ensure notebook executed above\n", "sol = generate_perturbation(m, p_d, p_f) # Solution to the first-order RBC\n", "\n", "# Simulate T observations from a random initial condition\n", "T = 20\n", "\n", "# draw from ergodic distribution for the initial condition\n", "x_iv = MvNormal(sol.x_ergodic_var)\n", "problem = LinearStateSpaceProblem(sol, x_iv, (0, T))\n", "plot(solve(problem))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `LinearStateSpaceProblem` type is automatically constructed from the underlying perturbation. However, we can override any of these options, or pass in our own noise rather than simulate it for a particular experiment" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "noise = Matrix([1.0; zeros(T-1)]') # the ϵ shocks are \"noise\" in DifferenceEquations for SciML compatibility\n", "x_iv = [0.0, 0.0] # can pass in a single value rather than a distribution \n", "problem = LinearStateSpaceProblem(sol, x_iv, (0, T); noise)\n", "plot(solve(problem))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To demonstrate the composition of gradients between DifferenceEquations and DifferentiableStateSpaceModels lets adapt this function to simulate an impulse with fixed noise and look at the final observable" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.001039731618424578" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function last_observable(p_d, noise, x_iv; m, p_f, T)\n", " sol = generate_perturbation(m, p_d, p_f)\n", " problem = LinearStateSpaceProblem(sol, x_iv, (0, T);noise, observables_noise = nothing) # removing observation noise\n", " return solve(problem).z[end][2] # return 2nd argument of last observable\n", "end\n", "T = 100\n", "noise = Matrix([1.0; zeros(T-1)]') # the ϵ shocks are \"noise\" in DifferenceEquations for SciML compatibility\n", "x_iv = [0.0, 0.0] # can pass in a single value rather than a distribution\n", "p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters\n", "p_d = (α = 0.5, β = 0.95) # Pseudo-true values\n", "m = model_rbc # ensure notebook executed above\n", "last_observable(p_d, noise, x_iv; m, p_f, T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And, as before, we 
{ "cell_type": "markdown", "metadata": {}, "source": [ "And, as before, we can calculate gradients with respect to the underlying `p_d` parameters, but also with respect to the noise. This demonstrates a key benefit of these methods: they let us work with the joint likelihood of the latent variables in cases where they cannot be easily marginalized out (e.g., non-Gaussian or nonlinear models). Note that the dimensionality of this gradient is over 100." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((α = -0.02350208860044909, β = -0.08002797800781031), [-0.0010397316184245775 -0.0010866361040296902 … -0.06209371005755886 0.0], [0.012125846203919335, 0.09948517579554432])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gradient((p_d, noise, x_iv) -> last_observable(p_d, noise, x_iv; m, p_f, T), p_d, noise, x_iv)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can use a simple utility function to investigate an IRF." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "(SVG output omitted: impulse response plot)\n" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters\n", "p_d = (α = 0.5, β = 0.95) # Pseudo-true values\n", "m = model_rbc\n", "sol = generate_perturbation(model_rbc, p_d, p_f)\n", "ϵ0 = [1.0]\n", "T = 10\n", "sim = irf(sol, ϵ0, T)\n", "plot(sim)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sneak Peek at SciML-Compatible Functionality\n", "Finally, a variety of SciML features are supported, such as parallel simulation of ensembles and the associated summary statistics." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "t: 0:10\n", "u: 11-element Vector{Vector{Float64}}:\n", " [-0.06849616749168119, 0.002430490689003771]\n", " [-0.018232243727516317, 4.178362879834265e-5]\n", " [-0.006130734877711555, -0.0012801506033949567]\n", " [-0.029692362571749217, 0.0029167124862915397]\n", " [-0.01418677984731015, 8.880061768457687e-5]\n", " [-0.012168309195379916, 0.0003679002721679268]\n", " [-0.02903024314533037, -0.0017796657403380003]\n", " [-0.017594841603510616, -0.003123852451232536]\n", " [-0.042044233027366255, 9.676460362403633e-5]\n", " [-0.03947618577501135, 0.0007834629342420161]\n", " [-0.053775744337300646, 0.005120733305282458]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Simulate multiple trajectories with T observations\n", "trajectories = 40\n", "x_iv = MvNormal(sol.x_ergodic_var)\n", "problem = LinearStateSpaceProblem(sol, x_iv, (0, T))\n", "\n", "# Solve multiple trajectories and plot an ensemble\n", "ensemble_results = solve(EnsembleProblem(problem), DirectIteration(), EnsembleThreads();\n", " trajectories)\n", "summ = EnsembleSummary(ensemble_results) # see SciML documentation. 
Calculates the median and other quantiles automatically.\n", "summ.med # median values for the \"x\" simulated ensembles" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "(SVG output omitted: ensemble summary with median and quantile ribbons)\n" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "plot(summ, fillalpha = 0.2) # plots by default show the median and quantiles of both variables. Modifying the transparency as an example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate sequence of observables\n", "We can use the underlying state-space model to easily simulate states and observables." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2×21 Matrix{Float64}:\n", " -0.0367643 -0.0463864 -0.0366824 … 0.0156731 -0.00214294 0.00295132\n", " -0.172727 -0.257091 -0.311023 0.0203809 0.0373975 0.0446603" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Simulate T observations\n", "T = 20\n", "\n", "p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters\n", "p_d = (α = 0.5, β = 0.95) # Pseudo-true values\n", "sol = generate_perturbation(model_rbc, p_d, p_f) # Solution to the first-order RBC\n", "\n", "x_iv = MvNormal(sol.x_ergodic_var) # draw initial conditions from the ergodic distribution\n", "problem = LinearStateSpaceProblem(sol, x_iv, (0, T))\n", "sim = solve(problem, DirectIteration())\n", "ϵ = sim.W # store the underlying noise in the simulation\n", "\n", "# Collapse to simulated observables as a matrix - as required by the current DifferenceEquations.jl likelihood\n", "# see https://github.com/SciML/DifferenceEquations.jl/issues/55 for direct support of this data structure\n", "z_rbc = hcat(sim.z...) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Bayesian estimation with Turing \n", "The combination of this package and DifferenceEquations enables gradient-based estimation methods such as NUTS/HMC. We will show this with Julia's Turing package, which provides a probabilistic programming language (PPL) in the same spirit as Stan.\n", "\n", "For gradients, we use the Zygote.jl reverse-mode auto-differentiation package, which is compatible with both DSSM and DifferenceEquations.\n", "\n", "We will consider two likelihoods: a Kalman filter, where we marginalize out the latent dimensions for linear-Gaussian problems, and a joint likelihood of the latents and the deep parameters. In both cases, gradients of the underlying likelihoods are provided by the packages." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "using Turing\n", "using Turing: @addlogprob!\n", "Turing.setadbackend(:zygote); # Especially when we sample the latent noise, we will require high-dimensional gradients with reverse-mode AD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First-order, marginal likelihood approach\n", "This case samples the deep parameters, calculates the state-space model from the first-order perturbation, then evaluates the likelihood using a Kalman filter."
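] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before wrapping this in a Turing model, note that the same marginal likelihood (and its gradient, which NUTS uses internally) can be evaluated directly. A minimal sketch with the `z_rbc` observables simulated above; `kalman_loglik` is a hypothetical helper name, not part of the package API." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Direct evaluation of the Kalman-filter marginal log-likelihood and its gradient\n", "# (an illustrative sketch of what the Turing model below does internally)\n", "function kalman_loglik(p_d; m, p_f, z)\n", " sol = generate_perturbation(m, p_d, p_f)\n", " problem = LinearStateSpaceProblem(sol, zeros(2), (0, size(z, 2)), observables = z)\n", " return solve(problem, KalmanFilter()).logpdf\n", "end\n", "@show kalman_loglik((α = 0.5, β = 0.95); m = model_rbc, p_f, z = z_rbc)\n", "gradient(p_d -> kalman_loglik(p_d; m = model_rbc, p_f, z = z_rbc), (α = 0.5, β = 0.95))"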
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Found initial step size\n", "│ ϵ = 0.025\n", "└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188\n", "┌ Warning: The current proposal will be rejected due to numerical error(s).\n", "│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)\n", "└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47\n", "\u001b[32mSampling: 7%|██▊ | ETA: 0:00:22\u001b[39m┌ Warning: The current proposal will be rejected due to numerical error(s).\n", "│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true)\n", "└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47\n", "\u001b[32mSampling: 100%|█████████████████████████████████████████| Time: 0:00:17\u001b[39m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Sampling. Ignore any 'rejected due to numerical errors' warnings\n" ] }, { "data": { "text/plain": [ "Chains MCMC chain (1000×14×1 Array{Float64, 3}):\n", "\n", "Iterations = 251:1:1250\n", "Number of chains = 1\n", "Samples per chain = 1000\n", "Wall duration = 79.41 seconds\n", "Compute duration = 79.41 seconds\n", "parameters = α, β\n", "internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n", "\n", "Summary Statistics\n", " \u001b[1m parameters \u001b[0m \u001b[1m mean \u001b[0m \u001b[1m std \u001b[0m \u001b[1m naive_se \u001b[0m \u001b[1m mcse \u001b[0m \u001b[1m ess \u001b[0m \u001b[1m rhat \u001b[0m \u001b[1m e\u001b[0m ⋯\n", " \u001b[90m Symbol \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m \u001b[0m ⋯\n", "\n", " α 0.4844 0.0207 0.0007 0.0010 450.9653 1.0088 ⋯\n", " β 0.9506 0.0050 0.0002 0.0003 346.2868 1.0066 ⋯\n", "\u001b[36m 1 column omitted\u001b[0m\n", "\n", "Quantiles\n", " \u001b[1m parameters \u001b[0m \u001b[1m 2.5% \u001b[0m \u001b[1m 25.0% \u001b[0m \u001b[1m 50.0% \u001b[0m \u001b[1m 75.0% \u001b[0m \u001b[1m 97.5% \u001b[0m\n", " \u001b[90m Symbol \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m\n", "\n", " α 0.4432 0.4714 0.4840 0.4983 0.5260\n", " β 0.9416 0.9470 0.9505 0.9537 0.9602\n" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Turing model definition\n", "@model function rbc_kalman(z, m, p_f, settings)\n", " α ~ Uniform(0.2, 0.8) # priors\n", " β ~ Uniform(0.5, 0.99) \n", " p_d = (; α, β)\n", " T = size(z, 2)\n", " # also passes in the p_f for fixed values\n", " sol = generate_perturbation(m, p_d, p_f, Val(1); settings) # first-order perturbation\n", " if !(sol.retcode == :Success) # if the perturbation failed, we want to return -infinity for the likelihood and resample\n", " @addlogprob! -Inf\n", " return\n", " end\n", " problem = LinearStateSpaceProblem(sol, zeros(2), (0, T), observables = z) # utility constructs a LinearStateSpaceProblem from the sol.A, sol.B, etc.\n", " @addlogprob! solve(problem, KalmanFilter()).logpdf # Linear-Gaussian so can marginalize with kalman filter. 
It should automatically choose the KalmanFilter in this case if not provided.\n", "end\n", "cache = SolverCache(model_rbc, Val(1), [:α, :β])\n", "settings = PerturbationSolverSettings(; print_level = 0)\n", "p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters\n", "z = z_rbc # simulated in previous steps\n", "turing_model = rbc_kalman(z, model_rbc, p_f, settings) # passing observables from before \n", "\n", "n_samples = 1000\n", "n_adapts = 250\n", "δ = 0.65\n", "alg = NUTS(n_adapts,δ)\n", "println(\"Sampling. Ignore any 'rejected due to numerical errors' warnings\") # At this point, Turing can't turn off those warnings\n", "chain_1_marginal = sample(turing_model, alg, n_samples; progress = true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First-order, joint likelihood approach\n", "The second case instead draws all of the latents and evaluates the joint likelihood of all of them. This is slower than the marginal case of a Kalman filter when everything is linear-Gaussian, but it enables non-Gaussian shocks and more flexibility. This approach is used for the second-order perturbations in the next case, instead of marginalizing with a particle filter.\n", "\n", "It is possible that the joint likelihood becomes more competitive with the Kalman filter for large models, but for now consider this a demonstration that sampling the high-dimensional latent variables gives the correct answer and is not unusably slow." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Warning: The current proposal will be rejected due to numerical error(s).\n", "│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)\n", "└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47\n", "┌ Warning: The current proposal will be rejected due to numerical error(s).\n", "│ isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false)\n", "└ @ AdvancedHMC /Users/dchilder/.julia/packages/AdvancedHMC/51xgc/src/hamiltonian.jl:47\n", "┌ Info: Found initial step size\n", "│ ϵ = 0.00625\n", "└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188\n", "\u001b[32mSampling: 100%|█████████████████████████████████████████| Time: 0:04:25\u001b[39m\n" ] }, { "data": { "text/plain": [ "Chains MCMC chain (300×35×1 Array{Float64, 3}):\n", "\n", "Iterations = 51:1:350\n", "Number of chains = 1\n", "Samples per chain = 300\n", "Wall duration = 332.18 seconds\n", "Compute duration = 332.18 seconds\n", "parameters = α, β, ϵ_draw[1], ϵ_draw[2], ϵ_draw[3], ϵ_draw[4], ϵ_draw[5], ϵ_draw[6], ϵ_draw[7], ϵ_draw[8], ϵ_draw[9], ϵ_draw[10], ϵ_draw[11], ϵ_draw[12], ϵ_draw[13], ϵ_draw[14], ϵ_draw[15], ϵ_draw[16], ϵ_draw[17], ϵ_draw[18], ϵ_draw[19], ϵ_draw[20], ϵ_draw[21]\n", "internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n", "\n", "Summary Statistics\n", " \u001b[1m parameters \u001b[0m \u001b[1m mean \u001b[0m \u001b[1m std \u001b[0m \u001b[1m naive_se \u001b[0m \u001b[1m mcse \u001b[0m \u001b[1m ess \u001b[0m \u001b[1m rhat \u001b[0m \u001b[1m e\u001b[0m ⋯\n", " \u001b[90m Symbol \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m \u001b[0m ⋯\n", "\n", " α 0.5229 0.0181 0.0010 0.0022 
69.6873 0.9972 ⋯\n", " β 0.9477 0.0051 0.0003 0.0004 209.0060 0.9969 ⋯\n", " ϵ_draw[1] 3.3707 0.5082 0.0293 0.0637 63.0585 0.9988 ⋯\n", " ϵ_draw[2] 0.2689 0.2019 0.0117 0.0053 314.7188 0.9967 ⋯\n", " ϵ_draw[3] 0.7577 0.2222 0.0128 0.0183 145.6377 0.9967 ⋯\n", " ϵ_draw[4] -0.3051 0.2072 0.0120 0.0125 279.1289 0.9969 ⋯\n", " ϵ_draw[5] 0.4963 0.2128 0.0123 0.0124 274.8647 0.9975 ⋯\n", " ϵ_draw[6] 0.2234 0.2034 0.0117 0.0096 319.6311 1.0020 ⋯\n", " ϵ_draw[7] 0.7579 0.2149 0.0124 0.0157 185.8446 0.9972 ⋯\n", " ϵ_draw[8] -1.0655 0.2382 0.0138 0.0249 116.7922 0.9978 ⋯\n", " ϵ_draw[9] 0.2374 0.2057 0.0119 0.0154 229.3082 0.9969 ⋯\n", " ϵ_draw[10] 0.0883 0.1932 0.0112 0.0117 405.8351 0.9968 ⋯\n", " ϵ_draw[11] 0.9973 0.2473 0.0143 0.0185 173.9337 0.9990 ⋯\n", " ϵ_draw[12] -0.3812 0.2214 0.0128 0.0099 364.7857 0.9975 ⋯\n", " ϵ_draw[13] 0.2082 0.2131 0.0123 0.0099 337.4811 0.9979 ⋯\n", " ϵ_draw[14] -0.3477 0.1845 0.0107 0.0106 201.1109 0.9971 ⋯\n", " ϵ_draw[15] -1.0082 0.2325 0.0134 0.0222 117.0747 0.9984 ⋯\n", " ϵ_draw[16] -1.8721 0.3316 0.0191 0.0370 82.6932 0.9968 ⋯\n", " ϵ_draw[17] 0.0170 0.1936 0.0112 0.0115 218.6334 0.9971 ⋯\n", " ϵ_draw[18] -0.4615 0.2105 0.0122 0.0117 282.9331 0.9967 ⋯\n", " ϵ_draw[19] -0.1628 0.1887 0.0109 0.0111 326.3768 0.9970 ⋯\n", " ϵ_draw[20] -0.0303 0.2101 0.0121 0.0079 398.9178 0.9977 ⋯\n", " ϵ_draw[21] -0.0619 0.9117 0.0526 0.1126 63.0430 0.9974 ⋯\n", "\u001b[36m 1 column omitted\u001b[0m\n", "\n", "Quantiles\n", " \u001b[1m parameters \u001b[0m \u001b[1m 2.5% \u001b[0m \u001b[1m 25.0% \u001b[0m \u001b[1m 50.0% \u001b[0m \u001b[1m 75.0% \u001b[0m \u001b[1m 97.5% \u001b[0m\n", " \u001b[90m Symbol \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m\n", "\n", " α 0.4872 0.5094 0.5236 0.5349 0.5554\n", " β 0.9374 0.9445 0.9476 0.9510 0.9575\n", " ϵ_draw[1] 2.4592 2.9460 3.3925 3.7234 4.3613\n", " ϵ_draw[2] -0.1099 0.1333 0.2589 0.3936 0.6971\n", " ϵ_draw[3] 0.3833 0.5889 0.7498 0.9022 1.2460\n", " ϵ_draw[4] -0.7119 -0.4320 -0.2715 -0.1825 0.1024\n", " ϵ_draw[5] 0.0757 0.3503 0.4837 0.6273 0.9723\n", " ϵ_draw[6] -0.1415 0.0891 0.2006 0.3422 0.6636\n", " ϵ_draw[7] 0.3246 0.6242 0.7338 0.8935 1.1931\n", " ϵ_draw[8] -1.5653 -1.2191 -1.0338 -0.8900 -0.6837\n", " ϵ_draw[9] -0.1407 0.0931 0.2308 0.3688 0.6677\n", " ϵ_draw[10] -0.3329 -0.0311 0.0836 0.2166 0.4600\n", " ϵ_draw[11] 0.5293 0.8480 0.9825 1.1291 1.5695\n", " ϵ_draw[12] -0.8768 -0.5111 -0.3606 -0.2229 0.0133\n", " ϵ_draw[13] -0.1808 0.0588 0.2068 0.3394 0.6536\n", " ϵ_draw[14] -0.7252 -0.4677 -0.3344 -0.2255 0.0101\n", " ϵ_draw[15] -1.4794 -1.1560 -0.9809 -0.8436 -0.6249\n", " ϵ_draw[16] -2.5968 -2.0884 -1.8550 -1.6155 -1.2956\n", " ϵ_draw[17] -0.3888 -0.1104 0.0171 0.1467 0.3788\n", " ϵ_draw[18] -0.8963 -0.5928 -0.4580 -0.3050 -0.1086\n", " ϵ_draw[19] -0.5132 -0.2866 -0.1720 -0.0235 0.2068\n", " ϵ_draw[20] -0.4202 -0.1648 -0.0474 0.1009 0.3734\n", " ϵ_draw[21] -1.8476 -0.6901 -0.0752 0.4894 2.0249\n" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Turing model definition\n", "@model function rbc_1_joint(z, m, p_f, cache, settings)\n", " α ~ Uniform(0.2, 0.8)\n", " β ~ Uniform(0.5, 0.99)\n", " p_d = (; α, β)\n", " T = size(z, 2)\n", " ϵ_draw ~ MvNormal(m.n_ϵ * T, 1.0)\n", " ϵ = reshape(ϵ_draw, m.n_ϵ, T)\n", " sol = generate_perturbation(m, p_d, p_f, Val(1); cache, settings)\n", " if !(sol.retcode == :Success)\n", " @addlogprob! 
-Inf\n", " return\n", " end\n", " problem = LinearStateSpaceProblem(sol, zeros(2), (0, T), observables = z, noise=ϵ)\n", " @addlogprob! solve(problem, DirectIteration()).logpdf # should choose DirectIteration() by default if not provided\n", "end\n", "cache = SolverCache(model_rbc, Val(1), [:α, :β])\n", "settings = PerturbationSolverSettings(; print_level = 0)\n", "p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters\n", "z = z_rbc # simulated in previous steps\n", "turing_model = rbc_1_joint(z, model_rbc, p_f, cache, settings) # passing observables from before \n", "\n", "n_samples = 300\n", "n_adapts = 50\n", "δ = 0.65\n", "alg = NUTS(n_adapts,δ)\n", "chain_1_joint = sample(turing_model, alg, n_samples; progress = true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second-order, joint likelihood approach\n", "Finally, this calculates a second order perturbation and then samples the latents and deep parameters. Of course, the Kalman Filter would not provide the correct likelihood so the proper comparison is something like a particle filter." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "┌ Info: Found initial step size\n", "│ ϵ = 0.0125\n", "└ @ Turing.Inference /Users/dchilder/.julia/packages/Turing/S4Y4B/src/inference/hmc.jl:188\n", "\u001b[32mSampling: 100%|█████████████████████████████████████████| Time: 0:03:44\u001b[39m\n" ] }, { "data": { "text/plain": [ "Chains MCMC chain (300×35×1 Array{Float64, 3}):\n", "\n", "Iterations = 51:1:350\n", "Number of chains = 1\n", "Samples per chain = 300\n", "Wall duration = 273.11 seconds\n", "Compute duration = 273.11 seconds\n", "parameters = α, β, ϵ_draw[1], ϵ_draw[2], ϵ_draw[3], ϵ_draw[4], ϵ_draw[5], ϵ_draw[6], ϵ_draw[7], ϵ_draw[8], ϵ_draw[9], ϵ_draw[10], ϵ_draw[11], ϵ_draw[12], ϵ_draw[13], ϵ_draw[14], ϵ_draw[15], ϵ_draw[16], ϵ_draw[17], ϵ_draw[18], ϵ_draw[19], ϵ_draw[20], ϵ_draw[21]\n", "internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size\n", "\n", "Summary Statistics\n", " \u001b[1m parameters \u001b[0m \u001b[1m mean \u001b[0m \u001b[1m std \u001b[0m \u001b[1m naive_se \u001b[0m \u001b[1m mcse \u001b[0m \u001b[1m ess \u001b[0m \u001b[1m rhat \u001b[0m \u001b[1m e\u001b[0m ⋯\n", " \u001b[90m Symbol \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m \u001b[0m ⋯\n", "\n", " α 0.5250 0.0180 0.0010 0.0021 48.1201 1.0333 ⋯\n", " β 0.9477 0.0057 0.0003 0.0003 809.0493 0.9977 ⋯\n", " ϵ_draw[1] 3.3547 0.5340 0.0308 0.0701 38.0331 1.0474 ⋯\n", " ϵ_draw[2] 0.2523 0.1901 0.0110 0.0081 267.7918 1.0010 ⋯\n", " ϵ_draw[3] 0.7605 0.2202 0.0127 0.0166 138.2808 1.0039 ⋯\n", " ϵ_draw[4] -0.3092 0.2105 0.0122 0.0153 212.3421 0.9968 ⋯\n", " ϵ_draw[5] 0.4908 0.2067 0.0119 0.0143 248.2859 0.9973 ⋯\n", " ϵ_draw[6] 0.2183 0.2034 0.0117 0.0152 236.9790 1.0027 ⋯\n", " ϵ_draw[7] 0.7239 0.2318 0.0134 0.0221 125.7547 1.0008 ⋯\n", " ϵ_draw[8] -1.0306 0.2295 0.0132 0.0222 92.2911 1.0086 ⋯\n", " ϵ_draw[9] 0.2083 0.2068 0.0119 0.0103 302.6394 0.9968 ⋯\n", " ϵ_draw[10] 0.0971 0.1909 0.0110 0.0130 259.3403 1.0017 ⋯\n", " ϵ_draw[11] 0.9924 0.2447 0.0141 0.0209 147.6565 1.0076 ⋯\n", " ϵ_draw[12] -0.3877 0.2087 0.0121 0.0134 271.4525 0.9967 ⋯\n", " ϵ_draw[13] 0.2118 0.2060 
0.0119 0.0142 255.3336 1.0000 ⋯\n", " ϵ_draw[14] -0.3364 0.2150 0.0124 0.0112 232.4814 0.9986 ⋯\n", " ϵ_draw[15] -1.0031 0.2613 0.0151 0.0257 62.9911 1.0307 ⋯\n", " ϵ_draw[16] -1.7959 0.3273 0.0189 0.0314 79.7611 1.0077 ⋯\n", " ϵ_draw[17] 0.0039 0.1958 0.0113 0.0146 207.4630 1.0076 ⋯\n", " ϵ_draw[18] -0.4767 0.1978 0.0114 0.0138 205.4055 0.9997 ⋯\n", " ϵ_draw[19] -0.1388 0.2154 0.0124 0.0199 223.3459 0.9970 ⋯\n", " ϵ_draw[20] -0.0447 0.2332 0.0135 0.0188 206.5141 1.0035 ⋯\n", " ϵ_draw[21] 0.0016 0.7230 0.0417 0.0844 43.5613 1.0244 ⋯\n", "\u001b[36m 1 column omitted\u001b[0m\n", "\n", "Quantiles\n", " \u001b[1m parameters \u001b[0m \u001b[1m 2.5% \u001b[0m \u001b[1m 25.0% \u001b[0m \u001b[1m 50.0% \u001b[0m \u001b[1m 75.0% \u001b[0m \u001b[1m 97.5% \u001b[0m\n", " \u001b[90m Symbol \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m \u001b[90m Float64 \u001b[0m\n", "\n", " α 0.4921 0.5140 0.5252 0.5375 0.5587\n", " β 0.9349 0.9442 0.9479 0.9513 0.9580\n", " ϵ_draw[1] 2.4768 2.9778 3.2626 3.6505 4.5892\n", " ϵ_draw[2] -0.1108 0.1226 0.2431 0.3856 0.6346\n", " ϵ_draw[3] 0.3850 0.5965 0.7473 0.9134 1.2168\n", " ϵ_draw[4] -0.7323 -0.4438 -0.3015 -0.1730 0.0512\n", " ϵ_draw[5] 0.0843 0.3569 0.4806 0.6170 0.9349\n", " ϵ_draw[6] -0.1683 0.0915 0.2181 0.3569 0.6012\n", " ϵ_draw[7] 0.3236 0.5608 0.7098 0.8609 1.2712\n", " ϵ_draw[8] -1.5092 -1.1558 -1.0130 -0.8786 -0.6073\n", " ϵ_draw[9] -0.1816 0.0750 0.2114 0.3291 0.6813\n", " ϵ_draw[10] -0.2711 -0.0368 0.0946 0.2235 0.4602\n", " ϵ_draw[11] 0.5584 0.8224 0.9656 1.1663 1.4695\n", " ϵ_draw[12] -0.7866 -0.5384 -0.3778 -0.2229 -0.0195\n", " ϵ_draw[13] -0.1534 0.0597 0.2070 0.3587 0.6425\n", " ϵ_draw[14] -0.7686 -0.4789 -0.3317 -0.1854 0.0814\n", " ϵ_draw[15] -1.5411 -1.1619 -0.9703 -0.8259 -0.5259\n", " ϵ_draw[16] -2.5215 -1.9819 -1.7608 -1.5387 -1.2839\n", " ϵ_draw[17] -0.3643 -0.1147 -0.0020 0.1181 0.3983\n", " ϵ_draw[18] -0.8702 -0.5914 -0.4742 -0.3403 -0.1091\n", " ϵ_draw[19] -0.5752 -0.2878 -0.1399 0.0075 0.3161\n", " ϵ_draw[20] -0.5372 -0.1825 -0.0365 0.0931 0.4135\n", " ϵ_draw[21] -1.4457 -0.5054 0.0329 0.5084 1.3862\n" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Turing model definition\n", "@model function rbc_2_joint(z, m, p_f, cache, settings)\n", " α ~ Uniform(0.2, 0.8)\n", " β ~ Uniform(0.5, 0.99)\n", " p_d = (; α, β)\n", " T = size(z, 2)\n", " ϵ_draw ~ MvNormal(m.n_ϵ * T, 1.0) # add noise to the estimation\n", " ϵ = reshape(ϵ_draw, m.n_ϵ, T)\n", " sol = generate_perturbation(m, p_d, p_f, Val(2); cache, settings)\n", " if !(sol.retcode == :Success)\n", " @addlogprob! -Inf\n", " return\n", " end\n", " problem = QuadraticStateSpaceProblem(sol, zeros(2), (0, T), observables = z, noise=ϵ)\n", " @addlogprob! 
solve(problem, DirectIteration()).logpdf\n", "end\n", "cache = SolverCache(model_rbc, Val(2), [:α, :β])\n", "settings = PerturbationSolverSettings(; print_level = 0)\n", "z = z_rbc # simulated in previous steps\n", "p_f = (ρ = 0.2, δ = 0.02, σ = 0.01, Ω_1 = 0.01) # Fixed parameters\n", "turing_model = rbc_2_joint(z, model_rbc, p_f, cache, settings) # passing observables from before \n", "\n", "n_samples = 300\n", "n_adapts = 50\n", "δ = 0.65\n", "alg = NUTS(n_adapts,δ)\n", "chain_2_joint = sample(turing_model, alg, n_samples; progress = true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting the estimated latent states\n", "Below is a graph which shows how well the high-dimensional sampled latents do relative to the pseudo-true latent values." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "(SVG output omitted: first-order joint estimated latents vs. true values)\n" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "symbol_to_int(s) = parse(Int, string(s)[9:end-1])\n", "ϵ_chain = sort(chain_1_joint[:, [Symbol(\"ϵ_draw[$a]\") for a in 1:21], 1], lt = (x,y) -> symbol_to_int(x) < symbol_to_int(y))\n", "tmp = describe(ϵ_chain)\n", "ϵ_mean = tmp[1][:, 2]\n", "ϵ_std = tmp[1][:, 3]\n", "plot(ϵ_mean[2:end], ribbon=2 * ϵ_std[2:end], label=\"Posterior mean\", title = \"First-Order Joint: Estimated Latents\")\n", "plot!(ϵ', label=\"True values\")" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "(SVG output omitted: second-order joint estimated latents vs. true values)\n" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ϵ_chain = sort(chain_2_joint[:, [Symbol(\"ϵ_draw[$a]\") for a in 1:21], 1], lt = (x,y) -> symbol_to_int(x) < symbol_to_int(y))\n", "tmp = describe(ϵ_chain)\n", "ϵ_mean = tmp[1][:, 2]\n", "ϵ_std = tmp[1][:, 3]\n", "plot(ϵ_mean[2:end], ribbon=2 * ϵ_std[2:end], label=\"Posterior mean\", title = \"Second-Order Joint: Estimated Latents\")\n", "plot!(ϵ', label=\"True values\")" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.7.2", "language": "julia", "name": "julia-1.7" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.7.2" } }, "nbformat": 4, "nbformat_minor": 4 }