{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "91e845f5",
   "metadata": {},
   "source": [
    "\n",
    "<a id='multivariate-normal-v11'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d4fec87",
   "metadata": {},
   "source": [
    "# Multivariate Normal Distribution"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a69db931",
   "metadata": {},
   "source": [
    "## Contents\n",
    "\n",
    "- [Multivariate Normal Distribution](#Multivariate-Normal-Distribution)  \n",
    "  - [Overview](#Overview)  \n",
    "  - [The Multivariate Normal Distribution](#The-Multivariate-Normal-Distribution)  \n",
    "  - [Bivariate Example](#Bivariate-Example)  \n",
    "  - [Trivariate Example](#Trivariate-Example)  \n",
    "  - [One Dimensional Intelligence (IQ)](#One-Dimensional-Intelligence-%28IQ%29)  \n",
    "  - [Information as Surprise](#Information-as-Surprise)  \n",
    "  - [Cholesky Factor Magic](#Cholesky-Factor-Magic)  \n",
    "  - [Math and Verbal  Intelligence](#Math-and-Verbal--Intelligence)  \n",
    "  - [Univariate Time Series Analysis](#Univariate-Time-Series-Analysis)  \n",
    "  - [Stochastic Difference Equation](#Stochastic-Difference-Equation)  \n",
    "  - [Application to Stock Price Model](#Application-to-Stock-Price-Model)  \n",
    "  - [Filtering Foundations](#Filtering-Foundations)  \n",
    "  - [Classic Factor Analysis Model](#Classic-Factor-Analysis-Model)  \n",
    "  - [PCA and Factor Analysis](#PCA-and-Factor-Analysis)  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de5f9a1a",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "This lecture describes a workhorse in probability theory, statistics, and economics, namely,\n",
    "the **multivariate normal distribution**.\n",
    "\n",
    "In this lecture, you will learn formulas for\n",
    "\n",
    "- the joint distribution of a random vector $ x $ of length $ N $  \n",
    "- marginal distributions for all subvectors of $ x $  \n",
    "- conditional distributions for subvectors of $ x $ conditional on other subvectors of $ x $  \n",
    "\n",
    "\n",
    "We will use  the multivariate normal distribution to formulate some useful models:\n",
    "\n",
    "- a factor analytic model of an intelligence quotient, i.e., IQ  \n",
    "- a factor analytic model of two independent inherent abilities, say, mathematical and verbal.  \n",
    "- a more general factor analytic model  \n",
    "- Principal Components Analysis (PCA) as an approximation to a factor analytic model  \n",
    "- time series generated by linear stochastic difference equations  \n",
    "- optimal linear filtering theory  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0072a273",
   "metadata": {},
   "source": [
    "## The Multivariate Normal Distribution\n",
    "\n",
    "This lecture defines a Python class `MultivariateNormal` to be used\n",
    "to generate **marginal** and **conditional** distributions associated\n",
    "with a multivariate normal distribution.\n",
    "\n",
    "For a multivariate normal distribution it is very convenient that\n",
    "\n",
    "- conditional expectations equal linear least squares projections  \n",
    "- conditional distributions are characterized by multivariate linear\n",
    "  regressions  \n",
    "\n",
    "\n",
    "We apply our Python class to some examples.\n",
    "\n",
    "We  use the following imports:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1fc237b6",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "plt.rcParams[\"figure.figsize\"] = (11, 5)  #set default figure size\n",
    "import numpy as np\n",
    "from numba import njit\n",
    "import statsmodels.api as sm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c372c2d",
   "metadata": {},
   "source": [
    "Assume that an $ N \\times 1 $ random vector $ z $ has a\n",
    "multivariate normal probability density.\n",
    "\n",
    "This means that the probability density takes the form\n",
    "\n",
    "$$\n",
    "f\\left(z;\\mu,\\Sigma\\right)=\\left(2\\pi\\right)^{-\\left(\\frac{N}{2}\\right)}\\det\\left(\\Sigma\\right)^{-\\frac{1}{2}}\\exp\\left(-.5\\left(z-\\mu\\right)^{\\prime}\\Sigma^{-1}\\left(z-\\mu\\right)\\right)\n",
    "$$\n",
    "\n",
    "where $ \\mu=Ez $ is the mean of the random vector $ z $ and\n",
    "$ \\Sigma=E\\left(z-\\mu\\right)\\left(z-\\mu\\right)^\\prime $ is the\n",
    "covariance matrix of $ z $.\n",
    "\n",
    "The covariance matrix $ \\Sigma $ is symmetric and positive definite."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91985e81",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "@njit\n",
    "def f(z, μ, Σ):\n",
    "    \"\"\"\n",
    "    The density function of multivariate normal distribution.\n",
    "\n",
    "    Parameters\n",
    "    ---------------\n",
    "    z: ndarray(float, dim=2)\n",
    "        random vector, N by 1\n",
    "    μ: ndarray(float, dim=1 or 2)\n",
    "        the mean of z, N by 1\n",
    "    Σ: ndarray(float, dim=2)\n",
    "        the covarianece matrix of z, N by 1\n",
    "    \"\"\"\n",
    "\n",
    "    z = np.atleast_2d(z)\n",
    "    μ = np.atleast_2d(μ)\n",
    "    Σ = np.atleast_2d(Σ)\n",
    "\n",
    "    N = z.size\n",
    "\n",
    "    temp1 = np.linalg.det(Σ) ** (-1/2)\n",
    "    temp2 = np.exp(-.5 * (z - μ).T @ np.linalg.inv(Σ) @ (z - μ))\n",
    "\n",
    "    return (2 * np.pi) ** (-N/2) * temp1 * temp2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15747142",
   "metadata": {},
   "source": [
    "For some integer $ k\\in \\{1,\\dots, N-1\\} $, partition\n",
    "$ z $ as\n",
    "\n",
    "$$\n",
    "z=\\left[\\begin{array}{c} z_{1}\\\\ z_{2} \\end{array}\\right],\n",
    "$$\n",
    "\n",
    "where $ z_1 $ is an $ \\left(N-k\\right)\\times1 $ vector and $ z_2 $\n",
    "is a $ k\\times1 $ vector.\n",
    "\n",
    "Let\n",
    "\n",
    "$$\n",
    "\\mu=\\left[\\begin{array}{c}\n",
    "\\mu_{1}\\\\\n",
    "\\mu_{2}\n",
    "\\end{array}\\right],\\quad\\Sigma=\\left[\\begin{array}{cc}\n",
    "\\Sigma_{11} & \\Sigma_{12}\\\\\n",
    "\\Sigma_{21} & \\Sigma_{22}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "be corresponding partitions of $ \\mu $ and $ \\Sigma $.\n",
    "\n",
    "The **marginal** distribution of $ z_1 $ is\n",
    "\n",
    "- multivariate normal with mean $ \\mu_1 $ and covariance matrix\n",
    "  $ \\Sigma_{11} $.  \n",
    "\n",
    "\n",
    "The **marginal** distribution of $ z_2 $ is\n",
    "\n",
    "- multivariate normal with mean $ \\mu_2 $ and covariance matrix\n",
    "  $ \\Sigma_{22} $.  \n",
    "\n",
    "\n",
    "The distribution of $ z_1 $ **conditional** on $ z_2 $ is\n",
    "\n",
    "- multivariate normal with mean  \n",
    "\n",
    "\n",
    "$$\n",
    "\\hat{\\mu}_1 = \\mu_1 + \\beta \\left(z_2 -\\mu_2\\right)\n",
    "$$\n",
    "\n",
    "and covariance matrix\n",
    "\n",
    "$$\n",
    "\\hat{\\Sigma}_{11}=\\Sigma_{11}-\\Sigma_{12}\\Sigma_{22}^{-1}\\Sigma_{21}=\\Sigma_{11}-\\beta\\Sigma_{22}\\beta^{\\prime}\n",
    "$$\n",
    "\n",
    "where\n",
    "\n",
    "$$\n",
    "\\beta = \\Sigma_{12}\\Sigma_{22}^{-1}\n",
    "$$\n",
    "\n",
    "is an $ \\left(N-k\\right) \\times k $ matrix of **population\n",
    "regression coefficients** of the $ (N -k) \\times 1 $ random vector $ z_1 - \\mu_1 $ on the $ k \\times 1 $ random vector $ z_2 - \\mu_2 $.\n",
    "\n",
    "The following class constructs a multivariate normal distribution\n",
    "instance with two methods.\n",
    "\n",
    "- a method `partition` computes $ \\beta $, taking $ k $ as an\n",
    "  input  \n",
    "- a method `cond_dist` computes either the distribution of\n",
    "  $ z_1 $ conditional on $ z_2 $ or the distribution of\n",
    "  $ z_2 $ conditional on $ z_1 $  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "027665a4",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "class MultivariateNormal:\n",
    "    \"\"\"\n",
    "    Class of multivariate normal distribution.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    μ: ndarray(float, dim=1)\n",
    "        the mean of z, N by 1\n",
    "    Σ: ndarray(float, dim=2)\n",
    "        the covarianece matrix of z, N by 1\n",
    "\n",
    "    Arguments\n",
    "    ---------\n",
    "    μ, Σ:\n",
    "        see parameters\n",
    "    μs: list(ndarray(float, dim=1))\n",
    "        list of mean vectors μ1 and μ2 in order\n",
    "    Σs: list(list(ndarray(float, dim=2)))\n",
    "        2 dimensional list of covariance matrices\n",
    "        Σ11, Σ12, Σ21, Σ22 in order\n",
    "    βs: list(ndarray(float, dim=1))\n",
    "        list of regression coefficients β1 and β2 in order\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self, μ, Σ):\n",
    "        \"initialization\"\n",
    "        self.μ = np.array(μ)\n",
    "        self.Σ = np.atleast_2d(Σ)\n",
    "\n",
    "    def partition(self, k):\n",
    "        \"\"\"\n",
    "        Given k, partition the random vector z into a size k vector z1\n",
    "        and a size N-k vector z2. Partition the mean vector μ into\n",
    "        μ1 and μ2, and the covariance matrix Σ into Σ11, Σ12, Σ21, Σ22\n",
    "        correspondingly. Compute the regression coefficients β1 and β2\n",
    "        using the partitioned arrays.\n",
    "        \"\"\"\n",
    "        μ = self.μ\n",
    "        Σ = self.Σ\n",
    "\n",
    "        self.μs = [μ[:k], μ[k:]]\n",
    "        self.Σs = [[Σ[:k, :k], Σ[:k, k:]],\n",
    "                   [Σ[k:, :k], Σ[k:, k:]]]\n",
    "\n",
    "        self.βs = [self.Σs[0][1] @ np.linalg.inv(self.Σs[1][1]),\n",
    "                   self.Σs[1][0] @ np.linalg.inv(self.Σs[0][0])]\n",
    "\n",
    "    def cond_dist(self, ind, z):\n",
    "        \"\"\"\n",
    "        Compute the conditional distribution of z1 given z2, or reversely.\n",
    "        Argument ind determines whether we compute the conditional\n",
    "        distribution of z1 (ind=0) or z2 (ind=1).\n",
    "\n",
    "        Returns\n",
    "        ---------\n",
    "        μ_hat: ndarray(float, ndim=1)\n",
    "            The conditional mean of z1 or z2.\n",
    "        Σ_hat: ndarray(float, ndim=2)\n",
    "            The conditional covariance matrix of z1 or z2.\n",
    "        \"\"\"\n",
    "        β = self.βs[ind]\n",
    "        μs = self.μs\n",
    "        Σs = self.Σs\n",
    "\n",
    "        μ_hat = μs[ind] + β @ (z - μs[1-ind])\n",
    "        Σ_hat = Σs[ind][ind] - β @ Σs[1-ind][1-ind] @ β.T\n",
    "\n",
    "        return μ_hat, Σ_hat"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5b06ff7",
   "metadata": {},
   "source": [
    "Let’s put this code to work on a suite of examples.\n",
    "\n",
    "We begin with a simple bivariate example; after that we’ll turn to a\n",
    "trivariate example.\n",
    "\n",
    "We’ll compute population moments of some conditional distributions using\n",
    "our `MultivariateNormal` class.\n",
    "\n",
    "For fun we’ll also compute sample analogs of the associated population\n",
    "regressions by generating simulations and then computing linear least\n",
    "squares regressions.\n",
    "\n",
    "We’ll compare those linear least squares regressions for the simulated\n",
    "data to their population counterparts."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56436525",
   "metadata": {},
   "source": [
    "## Bivariate Example\n",
    "\n",
    "We start with a bivariate normal distribution pinned down by\n",
    "\n",
    "$$\n",
    "\\mu=\\left[\\begin{array}{c}\n",
    ".5 \\\\\n",
    "1.0\n",
    "\\end{array}\\right],\\quad\\Sigma=\\left[\\begin{array}{cc}\n",
    "1 & .5\\\\\n",
    ".5 & 1\n",
    "\\end{array}\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2fd7856f",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μ = np.array([.5, 1.])\n",
    "Σ = np.array([[1., .5], [.5 ,1.]])\n",
    "\n",
    "# construction of the multivariate normal instance\n",
    "multi_normal = MultivariateNormal(μ, Σ)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f6808b7",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "k = 1 # choose partition\n",
    "\n",
    "# partition and compute regression coefficients\n",
    "multi_normal.partition(k)\n",
    "multi_normal.βs[0],multi_normal.βs[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8696e3d",
   "metadata": {},
   "source": [
    "Let’s illustrate the fact that you *can regress anything on anything else*.\n",
    "\n",
    "We have computed everything we need to compute two regression lines, one of $ z_2 $ on $ z_1 $, the other of $ z_1 $ on $ z_2 $.\n",
    "\n",
    "We’ll represent  these regressions as\n",
    "\n",
    "$$\n",
    "z_1 = a_1 + b_1 z_2 + \\epsilon_1\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "z_2 = a_2 + b_2 z_1 + \\epsilon_2\n",
    "$$\n",
    "\n",
    "where we have the population least squares orthogonality conditions\n",
    "\n",
    "$$\n",
    "E \\epsilon_1 z_2 = 0\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "E \\epsilon_2 z_1 = 0\n",
    "$$\n",
    "\n",
    "Let’s  compute $ a_1, a_2, b_1, b_2 $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a91d691",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "beta = multi_normal.βs\n",
    "\n",
    "a1 = μ[0] - beta[0]*μ[1]\n",
    "b1 = beta[0]\n",
    "\n",
    "a2 = μ[1] - beta[1]*μ[0]\n",
    "b2 = beta[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fd44ed4b",
   "metadata": {},
   "source": [
    "Let’s print out the intercepts and slopes.\n",
    "\n",
    "For the regression of $ z_1 $ on $ z_2 $ we have"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c94eddd0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "print (\"a1 = \", a1)\n",
    "print (\"b1 = \", b1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0eac4edf",
   "metadata": {},
   "source": [
    "For the regression of $ z_2 $ on $ z_1 $ we have"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4f6afad",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "print (\"a2 = \", a2)\n",
    "print (\"b2 = \", b2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b490d9d4",
   "metadata": {},
   "source": [
    "Now let’s plot the two regression lines and stare at them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eaf206b4",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "z2 = np.linspace(-4,4,100)\n",
    "\n",
    "\n",
    "a1 = np.squeeze(a1)\n",
    "b1 = np.squeeze(b1)\n",
    "\n",
    "a2 = np.squeeze(a2)\n",
    "b2 = np.squeeze(b2)\n",
    "\n",
    "z1  = b1*z2 + a1\n",
    "\n",
    "\n",
    "z1h = z2/b2 - a2/b2\n",
    "\n",
    "\n",
    "fig = plt.figure(figsize=(12,12))\n",
    "ax = fig.add_subplot(1, 1, 1)\n",
    "ax.set(xlim=(-4, 4), ylim=(-4, 4))\n",
    "ax.spines['left'].set_position('center')\n",
    "ax.spines['bottom'].set_position('zero')\n",
    "ax.spines['right'].set_color('none')\n",
    "ax.spines['top'].set_color('none')\n",
    "ax.xaxis.set_ticks_position('bottom')\n",
    "ax.yaxis.set_ticks_position('left')\n",
    "plt.ylabel('$z_1$', loc = 'top')\n",
    "plt.xlabel('$z_2$,', loc = 'right')\n",
    "plt.title('two regressions')\n",
    "plt.plot(z2,z1, 'r', label = \"$z_1$ on $z_2$\")\n",
    "plt.plot(z2,z1h, 'b', label = \"$z_2$ on $z_1$\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "756e516c",
   "metadata": {},
   "source": [
    "The red line is the  expectation of $ z_1 $ conditional on $ z_2 $.\n",
    "\n",
    "The intercept and slope of the red line are"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb24e113",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "print(\"a1 = \", a1)\n",
    "print(\"b1 = \", b1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "92f35a87",
   "metadata": {},
   "source": [
    "The blue line is the expectation of $ z_2 $ conditional on $ z_1 $.\n",
    "\n",
    "The intercept and slope of the blue line are"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "527cf395",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "print(\"-a2/b2 = \", - a2/b2)\n",
    "print(\"1/b2 = \", 1/b2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "76118b31",
   "metadata": {},
   "source": [
    "We can use these regression lines or our code to compute conditional expectations.\n",
    "\n",
    "Let’s  compute the mean and variance of the distribution of $ z_2 $\n",
    "conditional on $ z_1=5 $.\n",
    "\n",
    "After that we’ll reverse what are on the left and right sides of the regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bb4e499b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# compute the cond. dist. of z1\n",
    "ind = 1\n",
    "z1 = np.array([5.]) # given z1\n",
    "\n",
    "μ2_hat, Σ2_hat = multi_normal.cond_dist(ind, z1)\n",
    "print('μ2_hat, Σ2_hat = ', μ2_hat, Σ2_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43ed887c",
   "metadata": {},
   "source": [
    "Now let’s compute the mean and variance of the distribution of $ z_1 $\n",
    "conditional on $ z_2=5 $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f9f4514",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# compute the cond. dist. of z1\n",
    "ind = 0\n",
    "z2 = np.array([5.]) # given z2\n",
    "\n",
    "μ1_hat, Σ1_hat = multi_normal.cond_dist(ind, z2)\n",
    "print('μ1_hat, Σ1_hat = ', μ1_hat, Σ1_hat)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aeb8a4ba",
   "metadata": {},
   "source": [
    "Let’s compare the preceding population mean and variance with outcomes\n",
    "from drawing a large sample and then regressing $ z_1 - \\mu_1 $ on\n",
    "$ z_2 - \\mu_2 $.\n",
    "\n",
    "We know that\n",
    "\n",
    "$$\n",
    "E z_1 | z_2 = \\left(\\mu_1 - \\beta \\mu_2 \\right) + \\beta z_2\n",
    "$$\n",
    "\n",
    "which can be arranged to\n",
    "\n",
    "$$\n",
    "z_1 - \\mu_1 = \\beta \\left( z_2 - \\mu_2 \\right) + \\epsilon,\n",
    "$$\n",
    "\n",
    "We anticipate that for larger and larger sample sizes, estimated OLS\n",
    "coefficients will converge to $ \\beta $ and the estimated variance\n",
    "of $ \\epsilon $ will converge to $ \\hat{\\Sigma}_1 $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74ff2202",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "n = 1_000_000 # sample size\n",
    "\n",
    "# simulate multivariate normal random vectors\n",
    "data = np.random.multivariate_normal(μ, Σ, size=n)\n",
    "z1_data = data[:, 0]\n",
    "z2_data = data[:, 1]\n",
    "\n",
    "# OLS regression\n",
    "μ1, μ2 = multi_normal.μs\n",
    "results = sm.OLS(z1_data - μ1, z2_data - μ2).fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b1bcf92",
   "metadata": {},
   "source": [
    "Let’s compare the preceding population $ \\beta $ with the OLS sample\n",
    "estimate on $ z_2 - \\mu_2 $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ece8d89",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal.βs[0], results.params"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff6f5276",
   "metadata": {},
   "source": [
    "Let’s compare our population $ \\hat{\\Sigma}_1 $ with the\n",
    "degrees-of-freedom adjusted estimate of the variance of $ \\epsilon $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9fb562dd",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Σ1_hat, results.resid @ results.resid.T / (n - 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b3e503b",
   "metadata": {},
   "source": [
    "Lastly, let’s compute the estimate of $ \\hat{E z_1 | z_2} $ and\n",
    "compare it with $ \\hat{\\mu}_1 $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae3aa870",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μ1_hat, results.predict(z2 - μ2) + μ1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f5fc0f1",
   "metadata": {},
   "source": [
    "Thus, in each case, for our very large sample size, the sample analogues\n",
    "closely approximate their population counterparts.\n",
    "\n",
    "A Law of Large\n",
    "Numbers explains why  sample analogues approximate  population objects."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82cb745d",
   "metadata": {},
   "source": [
    "## Trivariate Example\n",
    "\n",
    "Let’s apply our code to a trivariate example.\n",
    "\n",
    "We’ll specify the mean vector and the covariance matrix as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4861f880",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μ = np.random.random(3)\n",
    "C = np.random.random((3, 3))\n",
    "Σ = C @ C.T # positive semi-definite\n",
    "\n",
    "multi_normal = MultivariateNormal(μ, Σ)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5365562e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μ, Σ"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc8a9d87",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "k = 1\n",
    "multi_normal.partition(k)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66396e8f",
   "metadata": {},
   "source": [
    "Let’s compute the distribution of $ z_1 $ conditional on\n",
    "$ z_{2}=\\left[\\begin{array}{c} 2\\\\ 5 \\end{array}\\right] $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98a2525a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "ind = 0\n",
    "z2 = np.array([2., 5.])\n",
    "\n",
    "μ1_hat, Σ1_hat = multi_normal.cond_dist(ind, z2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5367ceb6",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "n = 1_000_000\n",
    "data = np.random.multivariate_normal(μ, Σ, size=n)\n",
    "z1_data = data[:, :k]\n",
    "z2_data = data[:, k:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f7bf25f",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μ1, μ2 = multi_normal.μs\n",
    "results = sm.OLS(z1_data - μ1, z2_data - μ2).fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c252512d",
   "metadata": {},
   "source": [
    "As above, we compare population and sample regression coefficients, the\n",
    "conditional covariance matrix, and the conditional mean vector in that\n",
    "order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ccc3c40",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal.βs[0], results.params"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e24173f",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Σ1_hat, results.resid @ results.resid.T / (n - 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6aea2e0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μ1_hat, results.predict(z2 - μ2) + μ1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41565269",
   "metadata": {},
   "source": [
    "Once again, sample analogues do a good job of approximating their\n",
    "populations counterparts."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88f59190",
   "metadata": {},
   "source": [
    "## One Dimensional Intelligence (IQ)\n",
    "\n",
    "Let’s move closer to a real-life example, namely, inferring a\n",
    "one-dimensional measure of intelligence called IQ from a list of test\n",
    "scores.\n",
    "\n",
    "The $ i $th test score $ y_i $ equals the sum of an unknown\n",
    "scalar IQ $ \\theta $ and a random variable $ w_{i} $.\n",
    "\n",
    "$$\n",
    "y_{i} = \\theta + \\sigma_y w_i, \\quad i=1,\\dots, n\n",
    "$$\n",
    "\n",
    "The distribution of IQ’s for a cross-section of people is a normal\n",
    "random variable described by\n",
    "\n",
    "$$\n",
    "\\theta = \\mu_{\\theta} + \\sigma_{\\theta} w_{n+1}.\n",
    "$$\n",
    "\n",
    "We assume that the noises $ \\{w_i\\}_{i=1}^N $  in the test scores are IID and not correlated with\n",
    "IQ.\n",
    "\n",
    "We also assume that $ \\{w_i\\}_{i=1}^{n+1} $ are i.i.d. standard\n",
    "normal:\n",
    "\n",
    "$$\n",
    "\\boldsymbol{w}=\n",
    "\\left[\\begin{array}{c}\n",
    "w_{1}\\\\\n",
    "w_{2}\\\\\n",
    "\\vdots\\\\\n",
    "w_{n}\\\\\n",
    "w_{n+1}\n",
    "\\end{array}\\right]\\sim N\\left(0,I_{n+1}\\right)\n",
    "$$\n",
    "\n",
    "The following system describes the $ (n+1) \\times 1 $ random vector $ X $ that\n",
    "interests us:\n",
    "\n",
    "$$\n",
    "X=\\left[\\begin{array}{c}\n",
    "y_{1}\\\\\n",
    "y_{2}\\\\\n",
    "\\vdots\\\\\n",
    "y_{n}\\\\\n",
    "\\theta\n",
    "\\end{array}\\right]=\\left[\\begin{array}{c}\n",
    "\\mu_{\\theta}\\\\\n",
    "\\mu_{\\theta}\\\\\n",
    "\\vdots\\\\\n",
    "\\mu_{\\theta}\\\\\n",
    "\\mu_{\\theta}\n",
    "\\end{array}\\right]+\\left[\\begin{array}{ccccc}\n",
    "\\sigma_{y} & 0 & \\cdots & 0 & \\sigma_{\\theta}\\\\\n",
    "0 & \\sigma_{y} & \\cdots & 0 & \\sigma_{\\theta}\\\\\n",
    "\\vdots & \\vdots & \\ddots & \\vdots & \\vdots\\\\\n",
    "0 & 0 & \\cdots & \\sigma_{y} & \\sigma_{\\theta}\\\\\n",
    "0 & 0 & \\cdots & 0 & \\sigma_{\\theta}\n",
    "\\end{array}\\right]\\left[\\begin{array}{c}\n",
    "w_{1}\\\\\n",
    "w_{2}\\\\\n",
    "\\vdots\\\\\n",
    "w_{n}\\\\\n",
    "w_{n+1}\n",
    "\\end{array}\\right],\n",
    "$$\n",
    "\n",
    "or equivalently,\n",
    "\n",
    "$$\n",
    "X=\\mu_{\\theta}\\boldsymbol{1}_{n+1}+D\\boldsymbol{w}\n",
    "$$\n",
    "\n",
    "where $ X = \\begin{bmatrix} y \\cr \\theta \\end{bmatrix} $,\n",
    "$ \\boldsymbol{1}_{n+1} $ is a vector of $ 1 $s of size\n",
    "$ n+1 $, and $ D $ is an $ n+1 $ by $ n+1 $ matrix.\n",
    "\n",
    "Let’s define a Python function that constructs the mean $ \\mu $ and\n",
    "covariance matrix $ \\Sigma $ of the random vector $ X $ that we\n",
    "know is governed by a multivariate normal distribution.\n",
    "\n",
    "As arguments, the function takes the number of tests $ n $, the mean\n",
    "$ \\mu_{\\theta} $ and the standard deviation $ \\sigma_\\theta $ of\n",
    "the IQ distribution, and the standard deviation of the randomness in\n",
    "test scores $ \\sigma_{y} $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a8ba299",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def construct_moments_IQ(n, μθ, σθ, σy):\n",
    "\n",
    "    μ_IQ = np.full(n+1, μθ)\n",
    "\n",
    "    D_IQ = np.zeros((n+1, n+1))\n",
    "    D_IQ[range(n), range(n)] = σy\n",
    "    D_IQ[:, n] = σθ\n",
    "\n",
    "    Σ_IQ = D_IQ @ D_IQ.T\n",
    "\n",
    "    return μ_IQ, Σ_IQ, D_IQ"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a3a8ca9",
   "metadata": {},
   "source": [
    "Now let’s consider a specific instance of this model.\n",
    "\n",
    "Assume we have recorded $ 50 $ test scores and we know that\n",
    "$ \\mu_{\\theta}=100 $, $ \\sigma_{\\theta}=10 $, and\n",
    "$ \\sigma_{y}=10 $.\n",
    "\n",
    "We can compute the mean vector and covariance matrix of $ X $ easily\n",
    "with our `construct_moments_IQ` function as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08318b70",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "n = 50\n",
    "μθ, σθ, σy = 100., 10., 10.\n",
    "\n",
    "μ_IQ, Σ_IQ, D_IQ = construct_moments_IQ(n, μθ, σθ, σy)\n",
    "μ_IQ, Σ_IQ, D_IQ"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fdfaf26",
   "metadata": {},
   "source": [
    "We can now use our `MultivariateNormal` class to construct an\n",
    "instance, then partition the mean vector and covariance matrix as we\n",
    "wish.\n",
    "\n",
    "We want to regress IQ, the random variable $ \\theta $ (*what we don’t know*), on the vector $ y $  of test scores (*what we do know*).\n",
    "\n",
    "We choose `k=n` so that $ z_{1} = y $ and $ z_{2} = \\theta $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c190436",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal_IQ = MultivariateNormal(μ_IQ, Σ_IQ)\n",
    "\n",
    "k = n\n",
    "multi_normal_IQ.partition(k)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0fb3da1b",
   "metadata": {},
   "source": [
    "Using the generator `multivariate_normal`, we can make one draw of the\n",
    "random vector from our distribution and then compute the distribution of\n",
    "$ \\theta $ conditional on our test scores.\n",
    "\n",
    "Let’s do that and then print out some pertinent quantities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b21b795",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "x = np.random.multivariate_normal(μ_IQ, Σ_IQ)\n",
    "y = x[:-1] # test scores\n",
    "θ = x[-1]  # IQ"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5267e87",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# the true value\n",
    "θ"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b50ed16",
   "metadata": {},
   "source": [
    "The method `cond_dist` takes test scores $ y $ as input and returns the\n",
    "conditional normal distribution of the IQ $ \\theta $.\n",
    "\n",
    "In the following code, `ind` sets the variables on the right side of the regression.\n",
    "\n",
    "Given the way we have defined the vector $ X $, we want  to set `ind=1` in order to make $ \\theta $ the left side variable in the\n",
    "population regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1c52cd35",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "ind = 1\n",
    "multi_normal_IQ.cond_dist(ind, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53ef8b40",
   "metadata": {},
   "source": [
    "The first number is the conditional mean $ \\hat{\\mu}_{\\theta} $ and\n",
    "the second is the conditional variance $ \\hat{\\Sigma}_{\\theta} $.\n",
    "\n",
    "How do additional test scores affect our inferences?\n",
    "\n",
    "To shed light on this, we compute a sequence of conditional\n",
    "distributions of $ \\theta $ by varying the number of test scores in\n",
    "the conditioning set from $ 1 $ to $ n $.\n",
    "\n",
    "We’ll make a pretty graph showing how our judgment of the person’s IQ\n",
    "change as more test results come in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6bfef2ed",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# array for containing moments\n",
    "μθ_hat_arr = np.empty(n)\n",
    "Σθ_hat_arr = np.empty(n)\n",
    "\n",
    "# loop over number of test scores\n",
    "for i in range(1, n+1):\n",
    "    # construction of multivariate normal distribution instance\n",
    "    μ_IQ_i, Σ_IQ_i, D_IQ_i = construct_moments_IQ(i, μθ, σθ, σy)\n",
    "    multi_normal_IQ_i = MultivariateNormal(μ_IQ_i, Σ_IQ_i)\n",
    "\n",
    "    # partition and compute conditional distribution\n",
    "    multi_normal_IQ_i.partition(i)\n",
    "    scores_i = y[:i]\n",
    "    μθ_hat_i, Σθ_hat_i = multi_normal_IQ_i.cond_dist(1, scores_i)\n",
    "\n",
    "    # store the results\n",
    "    μθ_hat_arr[i-1] = μθ_hat_i[0]\n",
    "    Σθ_hat_arr[i-1] = Σθ_hat_i[0, 0]\n",
    "\n",
    "# transform variance to standard deviation\n",
    "σθ_hat_arr = np.sqrt(Σθ_hat_arr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c225b466",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μθ_hat_lower = μθ_hat_arr - 1.96 * σθ_hat_arr\n",
    "μθ_hat_higher = μθ_hat_arr + 1.96 * σθ_hat_arr\n",
    "\n",
    "plt.hlines(θ, 1, n+1, ls='--', label='true $θ$')\n",
    "plt.plot(range(1, n+1), μθ_hat_arr, color='b', label='$\\hat{μ}_{θ}$')\n",
    "plt.plot(range(1, n+1), μθ_hat_lower, color='b', ls='--')\n",
    "plt.plot(range(1, n+1), μθ_hat_higher, color='b', ls='--')\n",
    "plt.fill_between(range(1, n+1), μθ_hat_lower, μθ_hat_higher,\n",
    "                 color='b', alpha=0.2, label='95%')\n",
    "\n",
    "plt.xlabel('number of test scores')\n",
    "plt.ylabel('$\\hat{θ}$')\n",
    "plt.legend()\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2764266",
   "metadata": {},
   "source": [
    "The solid blue line in the plot above shows $ \\hat{\\mu}_{\\theta} $\n",
    "as a function of the number of test scores that we have recorded and\n",
    "conditioned on.\n",
    "\n",
    "The blue area shows the span that comes from adding or subtracting\n",
    "$ 1.96 \\hat{\\sigma}_{\\theta} $ from $ \\hat{\\mu}_{\\theta} $.\n",
    "\n",
    "Therefore, $ 95\\% $ of the probability mass of the conditional\n",
    "distribution falls in this range.\n",
    "\n",
    "The value of the random $ \\theta $ that we drew is shown by the\n",
    "black dotted line.\n",
    "\n",
    "As more and more test scores come in, our estimate of the person’s\n",
    "$ \\theta $ become more and more reliable.\n",
    "\n",
    "By staring at the changes in the conditional distributions, we see that\n",
    "adding more test scores makes $ \\hat{\\theta} $ settle down and\n",
    "approach $ \\theta $.\n",
    "\n",
    "Thus, each $ y_{i} $ adds information about $ \\theta $.\n",
    "\n",
    "If we were to drive the number of tests $ n \\rightarrow + \\infty $, the\n",
    "conditional standard deviation $ \\hat{\\sigma}_{\\theta} $ would\n",
    "converge to $ 0 $ at  rate $ \\frac{1}{n^{.5}} $."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2711334a",
   "metadata": {},
   "source": [
    "## Information as Surprise\n",
    "\n",
    "By using a different representation, let’s look at things from a\n",
    "different perspective.\n",
    "\n",
    "We can represent the random vector $ X $ defined above as\n",
    "\n",
    "$$\n",
    "X = \\mu_{\\theta} \\boldsymbol{1}_{n+1} + C \\epsilon, \\quad \\epsilon \\sim N\\left(0, I\\right)\n",
    "$$\n",
    "\n",
    "where $ C $ is a lower triangular **Cholesky factor** of\n",
    "$ \\Sigma $ so that\n",
    "\n",
    "$$\n",
    "\\Sigma \\equiv DD^{\\prime} = C C^\\prime\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "E \\epsilon \\epsilon' = I .\n",
    "$$\n",
    "\n",
    "It follows that\n",
    "\n",
    "$$\n",
    "\\epsilon \\sim N(0, I) .\n",
    "$$\n",
    "\n",
    "Let $ G=C^{-1} $\n",
    "\n",
    "$ G $ is also lower triangular.\n",
    "\n",
    "We can compute $ \\epsilon $ from the formula\n",
    "\n",
    "$$\n",
    "\\epsilon = G \\left( X - \\mu_{\\theta} \\boldsymbol{1}_{n+1} \\right)\n",
    "$$\n",
    "\n",
    "This formula confirms that the orthonormal vector $ \\epsilon $\n",
    "contains the same information as the non-orthogonal vector\n",
    "$ \\left( X - \\mu_{\\theta} \\boldsymbol{1}_{n+1} \\right) $.\n",
    "\n",
    "We can say that $ \\epsilon $ is an orthogonal basis for\n",
    "$ \\left( X - \\mu_{\\theta} \\boldsymbol{1}_{n+1} \\right) $.\n",
    "\n",
    "Let $ c_{i} $ be the $ i $th element in the last row of\n",
    "$ C $.\n",
    "\n",
    "Then we can write\n",
    "\n",
    "\n",
    "<a id='equation-mnv-1'></a>\n",
    "$$\n",
    "\\theta = \\mu_{\\theta} + c_1 \\epsilon_1 + c_2 \\epsilon_2 + \\dots + c_n \\epsilon_n + c_{n+1} \\epsilon_{n+1} \\tag{12.1}\n",
    "$$\n",
    "\n",
    "The mutual orthogonality of the $ \\epsilon_i $’s provides us with an\n",
    "informative way to interpret them in light of equation [(12.1)](#equation-mnv-1).\n",
    "\n",
    "Thus, relative to what is known from tests $ i=1, \\ldots, n-1 $,\n",
    "$ c_i \\epsilon_i $ is the amount of **new information** about\n",
    "$ \\theta $ brought by the test number $ i $.\n",
    "\n",
    "Here **new information** means **surprise** or what could not be\n",
    "predicted from earlier information.\n",
    "\n",
    "Formula [(12.1)](#equation-mnv-1) also provides us with an enlightening way to express\n",
    "conditional means and conditional variances that we computed earlier.\n",
    "\n",
    "In particular,\n",
    "\n",
    "$$\n",
    "E\\left[\\theta \\mid y_1, \\dots, y_k\\right] = \\mu_{\\theta} + c_1 \\epsilon_1 + \\dots + c_k \\epsilon_k\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "Var\\left(\\theta \\mid y_1, \\dots, y_k\\right) = c^2_{k+1} + c^2_{k+2} + \\dots + c^2_{n+1}.\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3db2c282",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "C = np.linalg.cholesky(Σ_IQ)\n",
    "G = np.linalg.inv(C)\n",
    "\n",
    "ε = G @ (x - μθ)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09fc7ba0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "cε = C[n, :] * ε\n",
    "\n",
    "# compute the sequence of μθ and Σθ conditional on y1, y2, ..., yk\n",
    "μθ_hat_arr_C = np.array([np.sum(cε[:k+1]) for k in range(n)]) + μθ\n",
    "Σθ_hat_arr_C = np.array([np.sum(C[n, i+1:n+1] ** 2) for i in range(n)])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d11231dc",
   "metadata": {},
   "source": [
    "To confirm that these formulas give the same answers that we computed\n",
    "earlier, we can compare the means and variances of $ \\theta $\n",
    "conditional on $ \\{y_i\\}_{i=1}^k $ with what we obtained above using\n",
    "the formulas implemented in the class `MultivariateNormal` built on\n",
    "our original representation of conditional distributions for\n",
    "multivariate normal distributions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "028d054d",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# conditional mean\n",
    "np.max(np.abs(μθ_hat_arr - μθ_hat_arr_C)) < 1e-10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa497c27",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# conditional variance\n",
    "np.max(np.abs(Σθ_hat_arr - Σθ_hat_arr_C)) < 1e-10"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54180f5d",
   "metadata": {},
   "source": [
    "## Cholesky Factor Magic\n",
    "\n",
    "Evidently, the Cholesky factorizations automatically computes the\n",
    "population  **regression coefficients** and associated statistics\n",
    "that are produced by our `MultivariateNormal` class.\n",
    "\n",
    "The Cholesky factorization  computes these things **recursively**.\n",
    "\n",
    "Indeed, in formula [(12.1)](#equation-mnv-1),\n",
    "\n",
    "- the random variable $ c_i \\epsilon_i $ is information about\n",
    "  $ \\theta $ that is not contained by the information in\n",
    "  $ \\epsilon_1, \\epsilon_2, \\ldots, \\epsilon_{i-1} $  \n",
    "- the coefficient $ c_i $ is the simple population regression\n",
    "  coefficient of $ \\theta - \\mu_\\theta $ on $ \\epsilon_i $  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7a9108a",
   "metadata": {},
   "source": [
    "## Math and Verbal  Intelligence\n",
    "\n",
    "We can alter the preceding example to be more realistic.\n",
    "\n",
    "There is ample evidence that IQ is not a scalar.\n",
    "\n",
    "Some people are good in math skills but poor in language skills.\n",
    "\n",
    "Other people are good in language skills but poor in math skills.\n",
    "\n",
    "So now we shall assume that there are two dimensions of IQ,\n",
    "$ \\theta $ and $ \\eta $.\n",
    "\n",
    "These determine average performances in math and language tests,\n",
    "respectively.\n",
    "\n",
    "We observe math scores $ \\{y_i\\}_{i=1}^{n} $ and language scores\n",
    "$ \\{y_i\\}_{i=n+1}^{2n} $.\n",
    "\n",
    "When $ n=2 $, we assume that outcomes are draws from a multivariate\n",
    "normal distribution with representation\n",
    "\n",
    "$$\n",
    "X=\\left[\\begin{array}{c}\n",
    "y_{1}\\\\\n",
    "y_{2}\\\\\n",
    "y_{3}\\\\\n",
    "y_{4}\\\\\n",
    "\\theta\\\\\n",
    "\\eta\n",
    "\\end{array}\\right]=\\left[\\begin{array}{c}\n",
    "\\mu_{\\theta}\\\\\n",
    "\\mu_{\\theta}\\\\\n",
    "\\mu_{\\eta}\\\\\n",
    "\\mu_{\\eta}\\\\\n",
    "\\mu_{\\theta}\\\\\n",
    "\\mu_{\\eta}\n",
    "\\end{array}\\right]+\\left[\\begin{array}{cccccc}\n",
    "\\sigma_{y} & 0 & 0 & 0 & \\sigma_{\\theta} & 0\\\\\n",
    "0 & \\sigma_{y} & 0 & 0 & \\sigma_{\\theta} & 0\\\\\n",
    "0 & 0 & \\sigma_{y} & 0 & 0 & \\sigma_{\\eta}\\\\\n",
    "0 & 0 & 0 & \\sigma_{y} & 0 & \\sigma_{\\eta}\\\\\n",
    "0 & 0 & 0 & 0 & \\sigma_{\\theta} & 0\\\\\n",
    "0 & 0 & 0 & 0 & 0 & \\sigma_{\\eta}\n",
    "\\end{array}\\right]\\left[\\begin{array}{c}\n",
    "w_{1}\\\\\n",
    "w_{2}\\\\\n",
    "w_{3}\\\\\n",
    "w_{4}\\\\\n",
    "w_{5}\\\\\n",
    "w_{6}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "where\n",
    "$ w \\begin{bmatrix} w_1 \\cr w_2 \\cr \\vdots \\cr w_6 \\end{bmatrix} $\n",
    "is a standard normal random vector.\n",
    "\n",
    "We construct a Python function `construct_moments_IQ2d` to construct\n",
    "the mean vector and covariance matrix of the joint normal distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75c0a35e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def construct_moments_IQ2d(n, μθ, σθ, μη, ση, σy):\n",
    "\n",
    "    μ_IQ2d = np.empty(2*(n+1))\n",
    "    μ_IQ2d[:n] = μθ\n",
    "    μ_IQ2d[2*n] = μθ\n",
    "    μ_IQ2d[n:2*n] = μη\n",
    "    μ_IQ2d[2*n+1] = μη\n",
    "\n",
    "\n",
    "    D_IQ2d = np.zeros((2*(n+1), 2*(n+1)))\n",
    "    D_IQ2d[range(2*n), range(2*n)] = σy\n",
    "    D_IQ2d[:n, 2*n] = σθ\n",
    "    D_IQ2d[2*n, 2*n] = σθ\n",
    "    D_IQ2d[n:2*n, 2*n+1] = ση\n",
    "    D_IQ2d[2*n+1, 2*n+1] = ση\n",
    "\n",
    "    Σ_IQ2d = D_IQ2d @ D_IQ2d.T\n",
    "\n",
    "    return μ_IQ2d, Σ_IQ2d, D_IQ2d"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "769e366b",
   "metadata": {},
   "source": [
    "Let’s put the function to work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "673b97aa",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "n = 2\n",
    "# mean and variance of θ, η, and y\n",
    "μθ, σθ, μη, ση, σy = 100., 10., 100., 10, 10\n",
    "\n",
    "μ_IQ2d, Σ_IQ2d, D_IQ2d = construct_moments_IQ2d(n, μθ, σθ, μη, ση, σy)\n",
    "μ_IQ2d, Σ_IQ2d, D_IQ2d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93b734d3",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# take one draw\n",
    "x = np.random.multivariate_normal(μ_IQ2d, Σ_IQ2d)\n",
    "y1 = x[:n]\n",
    "y2 = x[n:2*n]\n",
    "θ = x[2*n]\n",
    "η = x[2*n+1]\n",
    "\n",
    "# the true values\n",
    "θ, η"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eec9ca5d",
   "metadata": {},
   "source": [
    "We first compute the joint normal distribution of\n",
    "$ \\left(\\theta, \\eta\\right) $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9aaf2c7",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal_IQ2d = MultivariateNormal(μ_IQ2d, Σ_IQ2d)\n",
    "\n",
    "k = 2*n # the length of data vector\n",
    "multi_normal_IQ2d.partition(k)\n",
    "\n",
    "multi_normal_IQ2d.cond_dist(1, [*y1, *y2])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c9e0d34",
   "metadata": {},
   "source": [
    "Now let’s compute distributions of $ \\theta $ and $ \\mu $\n",
    "separately conditional on various subsets of test scores.\n",
    "\n",
    "It will be fun to compare outcomes with the help of an auxiliary function\n",
    "`cond_dist_IQ2d` that we now construct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c493aa9",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def cond_dist_IQ2d(μ, Σ, data):\n",
    "\n",
    "    n = len(μ)\n",
    "\n",
    "    multi_normal = MultivariateNormal(μ, Σ)\n",
    "    multi_normal.partition(n-1)\n",
    "    μ_hat, Σ_hat = multi_normal.cond_dist(1, data)\n",
    "\n",
    "    return μ_hat, Σ_hat"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "122eea23",
   "metadata": {},
   "source": [
    "Let’s see how things work for an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e16d3410",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "for indices, IQ, conditions in [([*range(2*n), 2*n], 'θ', 'y1, y2, y3, y4'),\n",
    "                                ([*range(n), 2*n], 'θ', 'y1, y2'),\n",
    "                                ([*range(n, 2*n), 2*n], 'θ', 'y3, y4'),\n",
    "                                ([*range(2*n), 2*n+1], 'η', 'y1, y2, y3, y4'),\n",
    "                                ([*range(n), 2*n+1], 'η', 'y1, y2'),\n",
    "                                ([*range(n, 2*n), 2*n+1], 'η', 'y3, y4')]:\n",
    "\n",
    "    μ_hat, Σ_hat = cond_dist_IQ2d(μ_IQ2d[indices], Σ_IQ2d[indices][:, indices], x[indices[:-1]])\n",
    "    print(f'The mean and variance of {IQ} conditional on {conditions: <15} are ' +\n",
    "          f'{μ_hat[0]:1.2f} and {Σ_hat[0, 0]:1.2f} respectively')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63ac32b7",
   "metadata": {},
   "source": [
    "Evidently, math tests provide no information about $ \\mu $ and\n",
    "language tests provide no information about $ \\eta $."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "062c2d08",
   "metadata": {},
   "source": [
    "## Univariate Time Series Analysis\n",
    "\n",
    "We can use the multivariate normal distribution and a little matrix\n",
    "algebra to present foundations of univariate linear time series\n",
    "analysis.\n",
    "\n",
    "Let $ x_t, y_t, v_t, w_{t+1} $ each be scalars for $ t \\geq 0 $.\n",
    "\n",
    "Consider the following model:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "x_0 & \\sim  N\\left(0, \\sigma_0^2\\right) \\\\\n",
    "x_{t+1} & = a x_{t} + b w_{t+1}, \\quad w_{t+1} \\sim N\\left(0, 1\\right), t \\geq 0  \\\\\n",
    "y_{t} & = c x_{t} + d v_{t}, \\quad v_{t} \\sim N\\left(0, 1\\right), t \\geq 0\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "We can compute the moments of $ x_{t} $\n",
    "\n",
    "1. $ E x_{t+1}^2 = a^2 E x_{t}^2 + b^2, t \\geq 0 $, where\n",
    "  $ E x_{0}^2 = \\sigma_{0}^2 $  \n",
    "1. $ E x_{t+j} x_{t} = a^{j} E x_{t}^2, \\forall t \\ \\forall j $  \n",
    "\n",
    "\n",
    "Given some $ T $, we can formulate the sequence\n",
    "$ \\{x_{t}\\}_{t=0}^T $ as a random vector\n",
    "\n",
    "$$\n",
    "X=\\left[\\begin{array}{c}\n",
    "x_{0}\\\\\n",
    "x_{1}\\\\\n",
    "\\vdots\\\\\n",
    "x_{T}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "and the covariance matrix $ \\Sigma_{x} $ can be constructed using\n",
    "the moments we have computed above.\n",
    "\n",
    "Similarly, we can define\n",
    "\n",
    "$$\n",
    "Y=\\left[\\begin{array}{c}\n",
    "y_{0}\\\\\n",
    "y_{1}\\\\\n",
    "\\vdots\\\\\n",
    "y_{T}\n",
    "\\end{array}\\right], \\quad\n",
    "v=\\left[\\begin{array}{c}\n",
    "v_{0}\\\\\n",
    "v_{1}\\\\\n",
    "\\vdots\\\\\n",
    "v_{T}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "and therefore\n",
    "\n",
    "$$\n",
    "Y = C X + D V\n",
    "$$\n",
    "\n",
    "where $ C $ and $ D $ are both diagonal matrices with constant\n",
    "$ c $ and $ d $ as diagonal respectively.\n",
    "\n",
    "Consequently, the covariance matrix of $ Y $ is\n",
    "\n",
    "$$\n",
    "\\Sigma_{y} = E Y Y^{\\prime} = C \\Sigma_{x} C^{\\prime} + D D^{\\prime}\n",
    "$$\n",
    "\n",
    "By stacking $ X $ and $ Y $, we can write\n",
    "\n",
    "$$\n",
    "Z=\\left[\\begin{array}{c}\n",
    "X\\\\\n",
    "Y\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "\\Sigma_{z} = EZZ^{\\prime}=\\left[\\begin{array}{cc}\n",
    "\\Sigma_{x} & \\Sigma_{x}C^{\\prime}\\\\\n",
    "C\\Sigma_{x} & \\Sigma_{y}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "Thus, the stacked sequences $ \\{x_{t}\\}_{t=0}^T $ and\n",
    "$ \\{y_{t}\\}_{t=0}^T $ jointly follow the multivariate normal\n",
    "distribution $ N\\left(0, \\Sigma_{z}\\right) $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "de012d62",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# as an example, consider the case where T = 3\n",
    "T = 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dbf1724a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# variance of the initial distribution x_0\n",
    "σ0 = 1.\n",
    "\n",
    "# parameters of the equation system\n",
    "a = .9\n",
    "b = 1.\n",
    "c = 1.0\n",
    "d = .05"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29a9d55e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# construct the covariance matrix of X\n",
    "Σx = np.empty((T+1, T+1))\n",
    "\n",
    "Σx[0, 0] = σ0 ** 2\n",
    "for i in range(T):\n",
    "    Σx[i, i+1:] = Σx[i, i] * a ** np.arange(1, T+1-i)\n",
    "    Σx[i+1:, i] = Σx[i, i+1:]\n",
    "\n",
    "    Σx[i+1, i+1] = a ** 2 * Σx[i, i] + b ** 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bb805c8",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Σx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9f17d3d",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# construct the covariance matrix of Y\n",
    "C = np.eye(T+1) * c\n",
    "D = np.eye(T+1) * d\n",
    "\n",
    "Σy = C @ Σx @ C.T + D @ D.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25d6073a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# construct the covariance matrix of Z\n",
    "Σz = np.empty((2*(T+1), 2*(T+1)))\n",
    "\n",
    "Σz[:T+1, :T+1] = Σx\n",
    "Σz[:T+1, T+1:] = Σx @ C.T\n",
    "Σz[T+1:, :T+1] = C @ Σx\n",
    "Σz[T+1:, T+1:] = Σy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c16b209b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Σz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff5e38a5",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# construct the mean vector of Z\n",
    "μz = np.zeros(2*(T+1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fe54d36",
   "metadata": {},
   "source": [
    "The following Python code lets us sample random vectors $ X $ and\n",
    "$ Y $.\n",
    "\n",
    "This is going to be very useful for doing the conditioning to be used in\n",
    "the fun exercises below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9e68358",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "z = np.random.multivariate_normal(μz, Σz)\n",
    "\n",
    "x = z[:T+1]\n",
    "y = z[T+1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5bec3b8",
   "metadata": {},
   "source": [
    "### Smoothing Example\n",
    "\n",
    "This is an instance of a classic `smoothing` calculation whose purpose\n",
    "is to compute $ E X \\mid Y $.\n",
    "\n",
    "An interpretation of this example is\n",
    "\n",
    "- $ X $ is a random sequence of hidden Markov state variables\n",
    "  $ x_t $  \n",
    "- $ Y $ is a sequence of observed signals $ y_t $ bearing\n",
    "  information about the hidden state  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31da8706",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# construct a MultivariateNormal instance\n",
    "multi_normal_ex1 = MultivariateNormal(μz, Σz)\n",
    "x = z[:T+1]\n",
    "y = z[T+1:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f9c6269",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# partition Z into X and Y\n",
    "multi_normal_ex1.partition(T+1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d73fefc",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# compute the conditional mean and covariance matrix of X given Y=y\n",
    "\n",
    "print(\"X = \", x)\n",
    "print(\"Y = \", y)\n",
    "print(\" E [ X | Y] = \", )\n",
    "\n",
    "multi_normal_ex1.cond_dist(0, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a991623b",
   "metadata": {},
   "source": [
    "### Filtering Exercise\n",
    "\n",
    "Compute $ E\\left[x_{t} \\mid y_{t-1}, y_{t-2}, \\dots, y_{0}\\right] $.\n",
    "\n",
    "To do so, we need to first construct the mean vector and the covariance\n",
    "matrix of the subvector\n",
    "$ \\left[x_{t}, y_{0}, \\dots, y_{t-2}, y_{t-1}\\right] $.\n",
    "\n",
    "For example, let’s say that we want the conditional distribution of\n",
    "$ x_{3} $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1956eacb",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "t = 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e215346",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# mean of the subvector\n",
    "sub_μz = np.zeros(t+1)\n",
    "\n",
    "# covariance matrix of the subvector\n",
    "sub_Σz = np.empty((t+1, t+1))\n",
    "\n",
    "sub_Σz[0, 0] = Σz[t, t] # x_t\n",
    "sub_Σz[0, 1:] = Σz[t, T+1:T+t+1]\n",
    "sub_Σz[1:, 0] = Σz[T+1:T+t+1, t]\n",
    "sub_Σz[1:, 1:] = Σz[T+1:T+t+1, T+1:T+t+1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fadcc7b2",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "sub_Σz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93fa3fcf",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal_ex2 = MultivariateNormal(sub_μz, sub_Σz)\n",
    "multi_normal_ex2.partition(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e596207",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "sub_y = y[:t]\n",
    "\n",
    "multi_normal_ex2.cond_dist(0, sub_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a463f50a",
   "metadata": {},
   "source": [
    "### Prediction Exercise\n",
    "\n",
    "Compute $ E\\left[y_{t} \\mid y_{t-j}, \\dots, y_{0} \\right] $.\n",
    "\n",
    "As what we did in exercise 2, we will construct the mean vector and\n",
    "covariance matrix of the subvector\n",
    "$ \\left[y_{t}, y_{0}, \\dots, y_{t-j-1}, y_{t-j} \\right] $.\n",
    "\n",
    "For example, we take a case in which $ t=3 $ and $ j=2 $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef3bc2df",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "t = 3\n",
    "j = 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "746c3217",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "sub_μz = np.zeros(t-j+2)\n",
    "sub_Σz = np.empty((t-j+2, t-j+2))\n",
    "\n",
    "sub_Σz[0, 0] = Σz[T+t+1, T+t+1]\n",
    "sub_Σz[0, 1:] = Σz[T+t+1, T+1:T+t-j+2]\n",
    "sub_Σz[1:, 0] = Σz[T+1:T+t-j+2, T+t+1]\n",
    "sub_Σz[1:, 1:] = Σz[T+1:T+t-j+2, T+1:T+t-j+2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cd8c88a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "sub_Σz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cb2fc51",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal_ex3 = MultivariateNormal(sub_μz, sub_Σz)\n",
    "multi_normal_ex3.partition(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b6122e4",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "sub_y = y[:t-j+1]\n",
    "\n",
    "multi_normal_ex3.cond_dist(0, sub_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bff181d",
   "metadata": {},
   "source": [
    "### Constructing a Wold Representation\n",
    "\n",
    "Now we’ll apply Cholesky decomposition to decompose\n",
    "$ \\Sigma_{y}=H H^{\\prime} $ and form\n",
    "\n",
    "$$\n",
    "\\epsilon = H^{-1} Y.\n",
    "$$\n",
    "\n",
    "Then we can represent $ y_{t} $ as\n",
    "\n",
    "$$\n",
    "y_{t} = h_{t,t} \\epsilon_{t} + h_{t,t-1} \\epsilon_{t-1} + \\dots + h_{t,0} \\epsilon_{0}.\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c355d5e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "H = np.linalg.cholesky(Σy)\n",
    "\n",
    "H"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "491ac361",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "ε = np.linalg.inv(H) @ y\n",
    "\n",
    "ε"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9949c1c",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56da4e62",
   "metadata": {},
   "source": [
    "This example is an instance of what is known as a **Wold representation** in time series analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0db8218",
   "metadata": {},
   "source": [
    "## Stochastic Difference Equation\n",
    "\n",
    "Consider the stochastic second-order linear difference equation\n",
    "\n",
    "$$\n",
    "y_{t} = \\alpha_{0} + \\alpha_{1} y_{y-1} + \\alpha_{2} y_{t-2} + u_{t}\n",
    "$$\n",
    "\n",
    "where $ u_{t} \\sim N \\left(0, \\sigma_{u}^{2}\\right) $ and\n",
    "\n",
    "$$\n",
    "\\left[\\begin{array}{c}\n",
    "y_{-1}\\\\\n",
    "y_{0}\n",
    "\\end{array}\\right]\\sim N\\left(\\mu_{\\tilde{y}},\\Sigma_{\\tilde{y}}\\right)\n",
    "$$\n",
    "\n",
    "It can be written as a stacked system\n",
    "\n",
    "$$\n",
    "\\underset{\\equiv A}{\\underbrace{\\left[\\begin{array}{cccccccc}\n",
    "1 & 0 & 0 & 0 & \\cdots & 0 & 0 & 0\\\\\n",
    "-\\alpha_{1} & 1 & 0 & 0 & \\cdots & 0 & 0 & 0\\\\\n",
    "-\\alpha_{2} & -\\alpha_{1} & 1 & 0 & \\cdots & 0 & 0 & 0\\\\\n",
    "0 & -\\alpha_{2} & -\\alpha_{1} & 1 & \\cdots & 0 & 0 & 0\\\\\n",
    "\\vdots & \\vdots & \\vdots & \\vdots & \\cdots & \\vdots & \\vdots & \\vdots\\\\\n",
    "0 & 0 & 0 & 0 & \\cdots & -\\alpha_{2} & -\\alpha_{1} & 1\n",
    "\\end{array}\\right]}}\\left[\\begin{array}{c}\n",
    "y_{1}\\\\\n",
    "y_{2}\\\\\n",
    "y_{3}\\\\\n",
    "y_{4}\\\\\n",
    "\\vdots\\\\\n",
    "y_{T}\n",
    "\\end{array}\\right]=\\underset{\\equiv b}{\\underbrace{\\left[\\begin{array}{c}\n",
    "\\alpha_{0}+\\alpha_{1}y_{0}+\\alpha_{2}y_{-1}\\\\\n",
    "\\alpha_{0}+\\alpha_{2}y_{0}\\\\\n",
    "\\alpha_{0}\\\\\n",
    "\\alpha_{0}\\\\\n",
    "\\vdots\\\\\n",
    "\\alpha_{0}\n",
    "\\end{array}\\right]}} +\\underset{\\equiv u}{\\underbrace{\\left[\\begin{array}{c}\n",
    "u_{1} \\\\\n",
    "u_2 \\\\\n",
    "u_3\\\\\n",
    "u_4\\\\\n",
    "\\vdots\\\\\n",
    "u_T\n",
    "\\end{array}\\right]}}\n",
    "$$\n",
    "\n",
    "We can compute $ y $ by solving the system\n",
    "\n",
    "$$\n",
    "y = A^{-1} \\left(b + u\\right)\n",
    "$$\n",
    "\n",
    "We have\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\mu_{y} = A^{-1} \\mu_{b} \\\\\n",
    "\\Sigma_{y} &= A^{-1} E \\left[\\left(b - \\mu_{b} + u \\right) \\left(b - \\mu_{b} + u \\right)^{\\prime}\\right] \\left(A^{-1}\\right)^{\\prime} \\\\\n",
    "           &= A^{-1} \\left(\\Sigma_{b} + \\Sigma_{u} \\right) \\left(A^{-1}\\right)^{\\prime}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "where\n",
    "\n",
    "$$\n",
    "\\mu_{b}=\\left[\\begin{array}{c}\n",
    "\\alpha_{0}+\\alpha_{1}\\mu_{y_{0}}+\\alpha_{2}\\mu_{y_{-1}}\\\\\n",
    "\\alpha_{0}+\\alpha_{2}\\mu_{y_{0}}\\\\\n",
    "\\alpha_{0}\\\\\n",
    "\\vdots\\\\\n",
    "\\alpha_{0}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\Sigma_{b}=\\left[\\begin{array}{cc}\n",
    "C\\Sigma_{\\tilde{y}}C^{\\prime} & \\boldsymbol{0}_{N-2\\times N-2}\\\\\n",
    "\\boldsymbol{0}_{N-2\\times2} & \\boldsymbol{0}_{N-2\\times N-2}\n",
    "\\end{array}\\right],\\quad C=\\left[\\begin{array}{cc}\n",
    "\\alpha_{2} & \\alpha_{1}\\\\\n",
    "0 & \\alpha_{2}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\Sigma_{u}=\\left[\\begin{array}{cccc}\n",
    "\\sigma_{u}^{2} & 0 & \\cdots & 0\\\\\n",
    "0 & \\sigma_{u}^{2} & \\cdots & 0\\\\\n",
    "\\vdots & \\vdots & \\vdots & \\vdots\\\\\n",
    "0 & 0 & \\cdots & \\sigma_{u}^{2}\n",
    "\\end{array}\\right]\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0b0e147a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# set parameters\n",
    "T = 80\n",
    "T = 160\n",
    "# coefficients of the second order difference equation\n",
    "𝛼0 = 10\n",
    "𝛼1 = 1.53\n",
    "𝛼2 = -.9\n",
    "\n",
    "# variance of u\n",
    "σu = 1.\n",
    "σu = 10.\n",
    "\n",
    "# distribution of y_{-1} and y_{0}\n",
    "μy_tilde = np.array([1., 0.5])\n",
    "Σy_tilde = np.array([[2., 1.], [1., 0.5]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac6f5fd6",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# construct A and A^{\\prime}\n",
    "A = np.zeros((T, T))\n",
    "\n",
    "for i in range(T):\n",
    "    A[i, i] = 1\n",
    "\n",
    "    if i-1 >= 0:\n",
    "        A[i, i-1] = -𝛼1\n",
    "\n",
    "    if i-2 >= 0:\n",
    "        A[i, i-2] = -𝛼2\n",
    "\n",
    "A_inv = np.linalg.inv(A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1bac2c5",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# compute the mean vectors of b and y\n",
    "μb = np.full(T, 𝛼0)\n",
    "μb[0] += 𝛼1 * μy_tilde[1] + 𝛼2 * μy_tilde[0]\n",
    "μb[1] += 𝛼2 * μy_tilde[1]\n",
    "\n",
    "μy = A_inv @ μb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af923d16",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# compute the covariance matrices of b and y\n",
    "Σu = np.eye(T) * σu ** 2\n",
    "\n",
    "Σb = np.zeros((T, T))\n",
    "\n",
    "C = np.array([[𝛼2, 𝛼1], [0, 𝛼2]])\n",
    "Σb[:2, :2] = C @ Σy_tilde @ C.T\n",
    "\n",
    "Σy = A_inv @ (Σb + Σu) @ A_inv.T"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d378dc3d",
   "metadata": {},
   "source": [
    "## Application to Stock Price Model\n",
    "\n",
    "Let\n",
    "\n",
    "$$\n",
    "p_{t} = \\sum_{j=0}^{T-t} \\beta^{j} y_{t+j}\n",
    "$$\n",
    "\n",
    "Form\n",
    "\n",
    "$$\n",
    "\\underset{\\equiv p}{\\underbrace{\\left[\\begin{array}{c}\n",
    "p_{1}\\\\\n",
    "p_{2}\\\\\n",
    "p_{3}\\\\\n",
    "\\vdots\\\\\n",
    "p_{T}\n",
    "\\end{array}\\right]}}=\\underset{\\equiv B}{\\underbrace{\\left[\\begin{array}{ccccc}\n",
    "1 & \\beta & \\beta^{2} & \\cdots & \\beta^{T-1}\\\\\n",
    "0 & 1 & \\beta & \\cdots & \\beta^{T-2}\\\\\n",
    "0 & 0 & 1 & \\cdots & \\beta^{T-3}\\\\\n",
    "\\vdots & \\vdots & \\vdots & \\vdots & \\vdots\\\\\n",
    "0 & 0 & 0 & \\cdots & 1\n",
    "\\end{array}\\right]}}\\left[\\begin{array}{c}\n",
    "y_{1}\\\\\n",
    "y_{2}\\\\\n",
    "y_{3}\\\\\n",
    "\\vdots\\\\\n",
    "y_{T}\n",
    "\\end{array}\\right]\n",
    "$$\n",
    "\n",
    "we have\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\mu_{p} = B \\mu_{y} \\\\\n",
    "\\Sigma_{p} = B \\Sigma_{y} B^{\\prime}\n",
    "\\end{aligned}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "411531b9",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "β = .96"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc42e574",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# construct B\n",
    "B = np.zeros((T, T))\n",
    "\n",
    "for i in range(T):\n",
    "    B[i, i:] = β ** np.arange(0, T-i)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1715d2b5",
   "metadata": {},
   "source": [
    "Denote\n",
    "\n",
    "$$\n",
    "z=\\left[\\begin{array}{c}\n",
    "y\\\\\n",
    "p\n",
    "\\end{array}\\right]=\\underset{\\equiv D}{\\underbrace{\\left[\\begin{array}{c}\n",
    "I\\\\\n",
    "B\n",
    "\\end{array}\\right]}} y\n",
    "$$\n",
    "\n",
    "Thus, $ \\{y_t\\}_{t=1}^{T} $ and $ \\{p_t\\}_{t=1}^{T} $ jointly\n",
    "follow the multivariate normal distribution\n",
    "$ N \\left(\\mu_{z}, \\Sigma_{z}\\right) $, where\n",
    "\n",
    "$$\n",
    "\\mu_{z}=D\\mu_{y}\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\Sigma_{z}=D\\Sigma_{y}D^{\\prime}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c5764f7",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "D = np.vstack([np.eye(T), B])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf2b1c21",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μz = D @ μy\n",
    "Σz = D @ Σy @ D.T"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e76aedde",
   "metadata": {},
   "source": [
    "We can simulate paths of $ y_{t} $ and $ p_{t} $ and compute the\n",
    "conditional mean $ E \\left[p_{t} \\mid y_{t-1}, y_{t}\\right] $ using\n",
    "the `MultivariateNormal` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b65788d",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "z = np.random.multivariate_normal(μz, Σz)\n",
    "y, p = z[:T], z[T:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "56416e2a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "cond_Ep = np.empty(T-1)\n",
    "\n",
    "sub_μ = np.empty(3)\n",
    "sub_Σ = np.empty((3, 3))\n",
    "for t in range(2, T+1):\n",
    "    sub_μ[:] = μz[[t-2, t-1, T-1+t]]\n",
    "    sub_Σ[:, :] = Σz[[t-2, t-1, T-1+t], :][:, [t-2, t-1, T-1+t]]\n",
    "\n",
    "    multi_normal = MultivariateNormal(sub_μ, sub_Σ)\n",
    "    multi_normal.partition(2)\n",
    "\n",
    "    cond_Ep[t-2] = multi_normal.cond_dist(1, y[t-2:t])[0][0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc328b94",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plt.plot(range(1, T), y[1:], label='$y_{t}$')\n",
    "plt.plot(range(1, T), y[:-1], label='$y_{t-1}$')\n",
    "plt.plot(range(1, T), p[1:], label='$p_{t}$')\n",
    "plt.plot(range(1, T), cond_Ep, label='$Ep_{t}|y_{t}, y_{t-1}$')\n",
    "\n",
    "plt.xlabel('t')\n",
    "plt.legend(loc=1)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e08361a",
   "metadata": {},
   "source": [
    "In the above graph, the green line is what the price of the stock would\n",
    "be if people had perfect foresight about the path of dividends while the\n",
    "green line is the conditional expectation $ E p_t | y_t, y_{t-1} $, which is what the price would\n",
    "be if people did not have perfect foresight but were optimally\n",
    "predicting future dividends on the basis of the information\n",
    "$ y_t, y_{t-1} $ at time $ t $."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba58be2f",
   "metadata": {},
   "source": [
    "## Filtering Foundations\n",
    "\n",
    "Assume that $ x_0 $ is an $ n \\times 1 $ random vector and that\n",
    "$ y_0 $ is a $ p \\times 1 $ random vector determined by the\n",
    "*observation equation*\n",
    "\n",
    "$$\n",
    "y_0 = G x_0 + v_0  , \\quad x_0 \\sim {\\mathcal N}(\\hat x_0, \\Sigma_0), \\quad v_0 \\sim {\\mathcal N}(0, R)\n",
    "$$\n",
    "\n",
    "where $ v_0 $ is orthogonal to $ x_0 $, $ G $ is a\n",
    "$ p \\times n $ matrix, and $ R $ is a $ p \\times p $\n",
    "positive definite matrix.\n",
    "\n",
    "We consider the problem of someone who\n",
    "\n",
    "- *observes* $ y_0 $  \n",
    "- does not observe $ x_0 $,  \n",
    "- knows $ \\hat x_0, \\Sigma_0, G, R $ and therefore  the joint probability distribution of the vector $ \\begin{bmatrix} x_0 \\cr y_0 \\end{bmatrix} $  \n",
    "- wants to infer $ x_0 $ from $ y_0 $ in light of what he knows about that\n",
    "  joint probability distribution.  \n",
    "\n",
    "\n",
    "Therefore, the person wants to construct the probability distribution of\n",
    "$ x_0 $ conditional on the random vector $ y_0 $.\n",
    "\n",
    "The joint distribution of\n",
    "$ \\begin{bmatrix} x_0 \\cr y_0 \\end{bmatrix} $ is multivariate normal\n",
    "$ {\\mathcal N}(\\mu, \\Sigma) $ with\n",
    "\n",
    "$$\n",
    "\\mu = \\begin{bmatrix} \\hat x_0 \\cr G \\hat x_0 \\end{bmatrix} , \\quad\n",
    "  \\Sigma = \\begin{bmatrix} \\Sigma_0 & \\Sigma_0 G' \\cr\n",
    "                          G \\Sigma_0 & G \\Sigma_0 G' + R \\end{bmatrix}\n",
    "$$\n",
    "\n",
    "By applying an appropriate instance of the above formulas for the  mean vector $ \\hat \\mu_1 $ and covariance matrix\n",
    "$ \\hat \\Sigma_{11} $ of $ z_1 $ conditional on $ z_2 $, we find that the probability distribution of\n",
    "$ x_0 $ conditional on $ y_0 $ is\n",
    "$ {\\mathcal N}(\\tilde x_0, \\tilde \\Sigma_0) $ where\n",
    "\n",
    "$$\n",
    "\\begin{aligned} \\beta_0  & = \\Sigma_0 G' (G \\Sigma_0 G' + R)^{-1} \\cr\n",
    "\\tilde x_0 & = \\hat x_0 + \\beta_0 ( y_0 - G \\hat x_0) \\cr\n",
    " \\tilde \\Sigma_0 & = \\Sigma_0 - \\Sigma_0 G' (G \\Sigma_0 G' + R)^{-1} G \\Sigma_0\n",
    "  \\end{aligned}\n",
    "$$\n",
    "\n",
    "We can express our finding that the probability distribution of\n",
    "$ x_0 $ conditional on $ y_0 $ is $ {\\mathcal N}(\\tilde x_0, \\tilde \\Sigma_0) $ by representing $ x_0 $\n",
    "as\n",
    "\n",
    "\n",
    "<a id='equation-eq-x0rep2'></a>\n",
    "$$\n",
    "x_0 = \\tilde x_0 + \\zeta_0 \\tag{12.2}\n",
    "$$\n",
    "\n",
    "where $ \\zeta_0 $ is a Gaussian random vector that is orthogonal to $ \\tilde x_0 $ and $ y_0 $ and that\n",
    "has mean vector  $ 0 $ and conditional covariance matrix $ E [\\zeta_0 \\zeta_0' | y_0] = \\tilde \\Sigma_0 $."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5604b9f",
   "metadata": {},
   "source": [
    "### Step toward dynamics\n",
    "\n",
    "Now suppose that we are in a time series setting and that we have the\n",
    "one-step state transition equation\n",
    "\n",
    "$$\n",
    "x_1 = A x_0 + C w_1 ,  \\quad w_1 \\sim {\\mathcal N}(0, I )\n",
    "$$\n",
    "\n",
    "where $ A $ is an $ n \\times n $ matrix and $ C $ is an\n",
    "$ n \\times m $ matrix.\n",
    "\n",
    "Using equation [(12.2)](#equation-eq-x0rep2), we can also represent $ x_1 $ as\n",
    "\n",
    "$$\n",
    "x_1 = A (\\tilde x_0 + \\zeta_0) + C w_1\n",
    "$$\n",
    "\n",
    "It follows that\n",
    "\n",
    "$$\n",
    "E x_1 | y_0 = A \\tilde x_0\n",
    "$$\n",
    "\n",
    "and that the corresponding conditional covariance matrix $ E (x_1 - E x_1| y_0)  (x_1 - E x_1| y_0)' \\equiv \\Sigma_1 $ is\n",
    "\n",
    "$$\n",
    "\\Sigma_1 = A \\tilde \\Sigma_0 A' + C C'\n",
    "$$\n",
    "\n",
    "or\n",
    "\n",
    "$$\n",
    "\\Sigma_1 =  A \\Sigma_0 A' - A \\Sigma_0 G' (G \\Sigma_0 G' + R)^{-1} G \\Sigma_0 A'\n",
    "$$\n",
    "\n",
    "We can write the mean of $ x_1 $ conditional on $ y_0 $ as\n",
    "\n",
    "$$\n",
    "\\hat x_1 = A \\hat x_0 + A \\Sigma_0 G' (G \\Sigma_0 G' + R)^{-1} (y_0 - G \\hat x_0)\n",
    "$$\n",
    "\n",
    "or\n",
    "\n",
    "$$\n",
    "\\hat x_1 = A \\hat x_0 + K_0 (y_0 - G \\hat x_0)\n",
    "$$\n",
    "\n",
    "where\n",
    "\n",
    "$$\n",
    "K_0 = A \\Sigma_0 G' (G \\Sigma_0 G' + R)^{-1}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4dea9b70",
   "metadata": {},
   "source": [
    "### Dynamic version\n",
    "\n",
    "Suppose now that for $ t \\geq 0 $,\n",
    "$ \\{x_{t+1}, y_t\\}_{t=0}^\\infty $ are governed by the equations\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "x_{t+1} & = A x_t + C w_{t+1} \\cr\n",
    "y_t & = G x_t + v_t\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "where as before $ x_0 \\sim {\\mathcal N}(\\hat x_0, \\Sigma_0) $,\n",
    "$ w_{t+1} $ is the $ t+1 $th component of an i.i.d. stochastic\n",
    "process distributed as $ w_{t+1} \\sim {\\mathcal N}(0, I) $, and\n",
    "$ v_t $ is the $ t $th component of an i.i.d. process\n",
    "distributed as $ v_t \\sim {\\mathcal N}(0, R) $ and the\n",
    "$ \\{w_{t+1}\\}_{t=0}^\\infty $ and $ \\{v_t\\}_{t=0}^\\infty $\n",
    "processes are orthogonal at all pairs of dates.\n",
    "\n",
    "The logic and\n",
    "formulas that we applied above imply that the probability distribution\n",
    "of $ x_t $ conditional on\n",
    "$ y_0, y_1, \\ldots , y_{t-1} = y^{t-1} $ is\n",
    "\n",
    "$$\n",
    "x_t | y^{t-1} \\sim {\\mathcal N}(A \\tilde x_t , A \\tilde \\Sigma_t A' + C C' )\n",
    "$$\n",
    "\n",
    "where $ \\{\\tilde x_t, \\tilde \\Sigma_t\\}_{t=1}^\\infty $ can be\n",
    "computed by iterating on the following equations starting from\n",
    "$ t=1 $ and initial conditions for\n",
    "$ \\tilde x_0, \\tilde \\Sigma_0 $ computed as we have above:\n",
    "\n",
    "$$\n",
    "\\begin{aligned} \\Sigma_t & = A  \\tilde \\Sigma_{t-1} A' + C C' \\cr\n",
    "               \\hat x_t & = A \\tilde x_{t-1} \\cr\n",
    "\\beta_t & = \\Sigma_t G' (G \\Sigma_t G' + R)^{-1} \\cr\n",
    "\\tilde x_t & = \\hat x_t + \\beta_t ( y_t - G \\hat x_t) \\cr\n",
    " \\tilde \\Sigma_t & = \\Sigma_t - \\Sigma_t G' (G \\Sigma_t G' + R)^{-1} G \\Sigma_t\n",
    "  \\end{aligned}\n",
    "$$\n",
    "\n",
    "If we shift the first equation forward one period and then substitute the expression for $ \\tilde \\Sigma_t $  on the right side of the fifth equation\n",
    "into it we obtain\n",
    "\n",
    "$$\n",
    "\\Sigma_{t+1}= C C' + A \\Sigma_t A' - A \\Sigma_t G' (G \\Sigma_t G' +R)^{-1} G \\Sigma_t A' .\n",
    "$$\n",
    "\n",
    "This is a matrix Riccati difference equation that is closely related to another matrix Riccati difference equation that appears in  a quantecon lecture on the basics of linear quadratic control theory.\n",
    "\n",
    "That equation has the form\n",
    "\n",
    "$$\n",
    "P_{t-1} =R + A' P_t A  - A' P_t B\n",
    "(B' P_t B + Q)^{-1}  B' P_t A  .\n",
    "$$\n",
    "\n",
    "Stare at the two preceding equations for a moment or two, the first being a matrix difference equation for a conditional covariance matrix, the\n",
    "second being a matrix difference equation in the matrix appearing in a quadratic form for an intertemporal cost of value function.\n",
    "\n",
    "Although the  two equations are not identical, they display striking family resemblences.\n",
    "\n",
    "- the first equation tells dynamics that work **forward**  in time  \n",
    "- the second equation tells dynamics that work  **backward** in time  \n",
    "- while many of the terms are similar, one equation seems to apply matrix transformations to some matrices that play similar roles in the other equation  \n",
    "\n",
    "\n",
    "The family resemblences of these two equations reflects a transcendent **duality** that prevails between control theory and filtering theory."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93b8727e",
   "metadata": {},
   "source": [
    "### An example\n",
    "\n",
    "We can use the Python class *MultivariateNormal* to construct examples.\n",
    "\n",
    "Here is an example for a single period problem at time $ 0 $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4545d1d2",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "G = np.array([[1., 3.]])\n",
    "R = np.array([[1.]])\n",
    "\n",
    "x0_hat = np.array([0., 1.])\n",
    "Σ0 = np.array([[1., .5], [.3, 2.]])\n",
    "\n",
    "μ = np.hstack([x0_hat, G @ x0_hat])\n",
    "Σ = np.block([[Σ0, Σ0 @ G.T], [G @ Σ0, G @ Σ0 @ G.T + R]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4625d51d",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# construction of the multivariate normal instance\n",
    "multi_normal = MultivariateNormal(μ, Σ)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9cbe936",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal.partition(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4224f282",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# the observation of y\n",
    "y0 = 2.3\n",
    "\n",
    "# conditional distribution of x0\n",
    "μ1_hat, Σ11 = multi_normal.cond_dist(0, y0)\n",
    "μ1_hat, Σ11"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0821a569",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "A = np.array([[0.5, 0.2], [-0.1, 0.3]])\n",
    "C = np.array([[2.], [1.]])\n",
    "\n",
    "# conditional distribution of x1\n",
    "x1_cond = A @ μ1_hat\n",
    "Σ1_cond = C @ C.T + A @ Σ11 @ A.T\n",
    "x1_cond, Σ1_cond"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27339579",
   "metadata": {},
   "source": [
    "### Code for Iterating\n",
    "\n",
    "Here is code for solving a dynamic filtering problem by iterating on our\n",
    "equations, followed by an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e51d3594",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def iterate(x0_hat, Σ0, A, C, G, R, y_seq):\n",
    "\n",
    "    p, n = G.shape\n",
    "\n",
    "    T = len(y_seq)\n",
    "    x_hat_seq = np.empty((T+1, n))\n",
    "    Σ_hat_seq = np.empty((T+1, n, n))\n",
    "\n",
    "    x_hat_seq[0] = x0_hat\n",
    "    Σ_hat_seq[0] = Σ0\n",
    "\n",
    "    for t in range(T):\n",
    "        xt_hat = x_hat_seq[t]\n",
    "        Σt = Σ_hat_seq[t]\n",
    "        μ = np.hstack([xt_hat, G @ xt_hat])\n",
    "        Σ = np.block([[Σt, Σt @ G.T], [G @ Σt, G @ Σt @ G.T + R]])\n",
    "\n",
    "        # filtering\n",
    "        multi_normal = MultivariateNormal(μ, Σ)\n",
    "        multi_normal.partition(n)\n",
    "        x_tilde, Σ_tilde = multi_normal.cond_dist(0, y_seq[t])\n",
    "\n",
    "        # forecasting\n",
    "        x_hat_seq[t+1] = A @ x_tilde\n",
    "        Σ_hat_seq[t+1] = C @ C.T + A @ Σ_tilde @ A.T\n",
    "\n",
    "    return x_hat_seq, Σ_hat_seq"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bfb0766a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "iterate(x0_hat, Σ0, A, C, G, R, [2.3, 1.2, 3.2])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71947fc3",
   "metadata": {},
   "source": [
    "The iterative algorithm just described is a version of the celebrated **Kalman filter**.\n",
    "\n",
    "We describe the Kalman filter  and some applications of it in [A First Look at the Kalman Filter](https://python.quantecon.org/kalman.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c28af658",
   "metadata": {},
   "source": [
    "## Classic Factor Analysis Model\n",
    "\n",
    "The factor analysis model widely used in psychology and other fields can\n",
    "be represented as\n",
    "\n",
    "$$\n",
    "Y = \\Lambda f + U\n",
    "$$\n",
    "\n",
    "where\n",
    "\n",
    "1. $ Y $ is $ n \\times 1 $ random vector,\n",
    "  $ E U U^{\\prime} = D $ is a diagonal matrix,  \n",
    "1. $ \\Lambda $ is $ n \\times k $ coefficient matrix,  \n",
    "1. $ f $ is $ k \\times 1 $ random vector,\n",
    "  $ E f f^{\\prime} = I $,  \n",
    "1. $ U $ is $ n \\times 1 $ random vector, and $ U \\perp f $ (i.e., $ E U f' = 0 $ )  \n",
    "1. It is presumed that $ k $ is small relative to $ n $; often\n",
    "  $ k $ is only $ 1 $ or $ 2 $, as in our IQ examples.  \n",
    "\n",
    "\n",
    "This implies that\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\Sigma_y = E Y Y^{\\prime} = \\Lambda \\Lambda^{\\prime} + D \\\\\n",
    "E Y f^{\\prime} = \\Lambda \\\\\n",
    "E f Y^{\\prime} = \\Lambda^{\\prime}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "Thus, the covariance matrix $ \\Sigma_Y $ is the sum of a diagonal\n",
    "matrix $ D $ and a positive semi-definite matrix\n",
    "$ \\Lambda \\Lambda^{\\prime} $ of rank $ k $.\n",
    "\n",
    "This means that all covariances among the $ n $ components of the\n",
    "$ Y $ vector are intermediated by their common dependencies on the\n",
    "$ k< $ factors.\n",
    "\n",
    "Form\n",
    "\n",
    "$$\n",
    "Z=\\left(\\begin{array}{c}\n",
    "f\\\\\n",
    "Y\n",
    "\\end{array}\\right)\n",
    "$$\n",
    "\n",
    "the covariance matrix of the expanded random vector $ Z $ can be\n",
    "computed as\n",
    "\n",
    "$$\n",
    "\\Sigma_{z} = EZZ^{\\prime}=\\left(\\begin{array}{cc}\n",
    "I & \\Lambda^{\\prime}\\\\\n",
    "\\Lambda & \\Lambda\\Lambda^{\\prime}+D\n",
    "\\end{array}\\right)\n",
    "$$\n",
    "\n",
    "In the following, we first construct the mean vector and the covariance\n",
    "matrix for the case where $ N=10 $ and $ k=2 $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07f8e101",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "N = 10\n",
    "k = 2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "324546ce",
   "metadata": {},
   "source": [
    "We set the coefficient matrix $ \\Lambda $ and the covariance matrix\n",
    "of $ U $ to be\n",
    "\n",
    "$$\n",
    "\\Lambda=\\left(\\begin{array}{cc}\n",
    "1 & 0\\\\\n",
    "\\vdots & \\vdots\\\\\n",
    "1 & 0\\\\\n",
    "0 & 1\\\\\n",
    "\\vdots & \\vdots\\\\\n",
    "0 & 1\n",
    "\\end{array}\\right),\\quad D=\\left(\\begin{array}{cccc}\n",
    "\\sigma_{u}^{2} & 0 & \\cdots & 0\\\\\n",
    "0 & \\sigma_{u}^{2} & \\cdots & 0\\\\\n",
    "\\vdots & \\vdots & \\vdots & \\vdots\\\\\n",
    "0 & 0 & \\cdots & \\sigma_{u}^{2}\n",
    "\\end{array}\\right)\n",
    "$$\n",
    "\n",
    "where the first half of the first column of $ \\Lambda $ is filled\n",
    "with $ 1 $s and $ 0 $s for the rest half, and symmetrically\n",
    "for the second column.\n",
    "\n",
    "$ D $ is a diagonal matrix with parameter\n",
    "$ \\sigma_{u}^{2} $ on the diagonal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c290b088",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Λ = np.zeros((N, k))\n",
    "Λ[:N//2, 0] = 1\n",
    "Λ[N//2:, 1] = 1\n",
    "\n",
    "σu = .5\n",
    "D = np.eye(N) * σu ** 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "075e3275",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# compute Σy\n",
    "Σy = Λ @ Λ.T + D"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f56a2115",
   "metadata": {},
   "source": [
    "We can now construct the mean vector and the covariance matrix for\n",
    "$ Z $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a746091d",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "μz = np.zeros(k+N)\n",
    "\n",
    "Σz = np.empty((k+N, k+N))\n",
    "\n",
    "Σz[:k, :k] = np.eye(k)\n",
    "Σz[:k, k:] = Λ.T\n",
    "Σz[k:, :k] = Λ\n",
    "Σz[k:, k:] = Σy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0d408c0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "z = np.random.multivariate_normal(μz, Σz)\n",
    "\n",
    "f = z[:k]\n",
    "y = z[k:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8546eaae",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal_factor = MultivariateNormal(μz, Σz)\n",
    "multi_normal_factor.partition(k)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8dd9f53c",
   "metadata": {},
   "source": [
    "Let’s compute the conditional distribution of the hidden factor\n",
    "$ f $ on the observations $ Y $, namely, $ f \\mid Y=y $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fe25f531",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal_factor.cond_dist(0, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8879afb2",
   "metadata": {},
   "source": [
    "We can verify that the conditional mean\n",
    "$ E \\left[f \\mid Y=y\\right] = B Y $ where\n",
    "$ B = \\Lambda^{\\prime} \\Sigma_{y}^{-1} $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1089051",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "B = Λ.T @ np.linalg.inv(Σy)\n",
    "\n",
    "B @ y"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edb0d87a",
   "metadata": {},
   "source": [
    "Similarly, we can compute the conditional distribution $ Y \\mid f $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8e41191",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "multi_normal_factor.cond_dist(1, f)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ca27a41",
   "metadata": {},
   "source": [
    "It can be verified that the mean is\n",
    "$ \\Lambda I^{-1} f = \\Lambda f $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "339c4c78",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Λ @ f"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb175c45",
   "metadata": {},
   "source": [
    "## PCA and Factor Analysis\n",
    "\n",
    "To learn about Principal Components Analysis (PCA), please see this lecture [Singular Value Decompositions](https://python.quantecon.org/svd_intro.html).\n",
    "\n",
    "For fun, let’s apply a PCA decomposition\n",
    "to a covariance matrix $ \\Sigma_y $ that in fact is governed by our factor-analytic\n",
    "model.\n",
    "\n",
    "Technically, this means that the PCA model is misspecified. (Can you\n",
    "explain why?)\n",
    "\n",
    "Nevertheless, this exercise will let us study how well the first two\n",
    "principal components from a PCA can approximate the conditional\n",
    "expectations $ E f_i | Y $ for our two factors $ f_i $,\n",
    "$ i=1,2 $ for the factor analytic model that we have assumed truly\n",
    "governs the data on $ Y $ we have generated.\n",
    "\n",
    "So we compute the PCA decomposition\n",
    "\n",
    "$$\n",
    "\\Sigma_{y} = P \\tilde{\\Lambda} P^{\\prime}\n",
    "$$\n",
    "\n",
    "where $ \\tilde{\\Lambda} $ is a diagonal matrix.\n",
    "\n",
    "We have\n",
    "\n",
    "$$\n",
    "Y = P \\epsilon\n",
    "$$\n",
    "\n",
    "and\n",
    "\n",
    "$$\n",
    "\\epsilon = P^\\prime Y\n",
    "$$\n",
    "\n",
    "Note that we will arrange the eigenvectors in $ P $ in the\n",
    "*descending* order of eigenvalues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "859c183b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "𝜆_tilde, P = np.linalg.eigh(Σy)\n",
    "\n",
    "# arrange the eigenvectors by eigenvalues\n",
    "ind = sorted(range(N), key=lambda x: 𝜆_tilde[x], reverse=True)\n",
    "\n",
    "P = P[:, ind]\n",
    "𝜆_tilde = 𝜆_tilde[ind]\n",
    "Λ_tilde = np.diag(𝜆_tilde)\n",
    "\n",
    "print('𝜆_tilde =', 𝜆_tilde)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab5890dc",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# verify the orthogonality of eigenvectors\n",
    "np.abs(P @ P.T - np.eye(N)).max()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fea567ba",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# verify the eigenvalue decomposition is correct\n",
    "P @ Λ_tilde @ P.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79fc8bb3",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "ε = P.T @ y\n",
    "\n",
    "print(\"ε = \", ε)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf77926b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# print the values of the two factors\n",
    "\n",
    "print('f = ', f)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d8c56a1",
   "metadata": {},
   "source": [
    "Below we’ll plot several things\n",
    "\n",
    "- the $ N $ values of $ y $  \n",
    "- the $ N $ values of the principal components $ \\epsilon $  \n",
    "- the value of the first factor $ f_1 $ plotted only for the first\n",
    "  $ N/2 $ observations of $ y $ for which it receives a\n",
    "  non-zero loading in $ \\Lambda $  \n",
    "- the value of the second factor $ f_2 $ plotted only for the final\n",
    "  $ N/2 $ observations for which it receives a non-zero loading in\n",
    "  $ \\Lambda $  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "371c48de",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plt.scatter(range(N), y, label='y')\n",
    "plt.scatter(range(N), ε, label='$\\epsilon$')\n",
    "plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$')\n",
    "plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$')\n",
    "plt.legend()\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78e7c9c8",
   "metadata": {},
   "source": [
    "Consequently, the first two $ \\epsilon_{j} $ correspond to the\n",
    "largest two eigenvalues.\n",
    "\n",
    "Let’s look at them, after which we’ll look at $ E f | y = B y $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc4b4d51",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "ε[:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79afeaf7",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# compare with Ef|y\n",
    "B @ y"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e379ce4",
   "metadata": {},
   "source": [
    "The fraction of variance in $ y_{t} $ explained by the first two\n",
    "principal components can be computed as below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9157c43b",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "𝜆_tilde[:2].sum() / 𝜆_tilde.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93181a01",
   "metadata": {},
   "source": [
    "Compute\n",
    "\n",
    "$$\n",
    "\\hat{Y} = P_{j} \\epsilon_{j} + P_{k} \\epsilon_{k}\n",
    "$$\n",
    "\n",
    "where $ P_{j} $ and $ P_{k} $ correspond to the largest two\n",
    "eigenvalues."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73fb186c",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "y_hat = P[:, :2] @ ε[:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c4c941d",
   "metadata": {},
   "source": [
    "In this example, it turns out that the projection $ \\hat{Y} $ of\n",
    "$ Y $ on the first two principal components does a good job of\n",
    "approximating $ Ef \\mid y $.\n",
    "\n",
    "We confirm this in the following plot of $ f $,\n",
    "$ E y \\mid f $, $ E f \\mid y $, and $ \\hat{y} $ on the\n",
    "coordinate axis versus $ y $ on the ordinate axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f726e0c",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plt.scatter(range(N), Λ @ f, label='$Ey|f$')\n",
    "plt.scatter(range(N), y_hat, label='$\\hat{y}$')\n",
    "plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$')\n",
    "plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$')\n",
    "\n",
    "Efy = B @ y\n",
    "plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label='$Ef_{1}|y$')\n",
    "plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label='$Ef_{2}|y$')\n",
    "plt.legend()\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "728268a6",
   "metadata": {},
   "source": [
    "The covariance matrix of $ \\hat{Y} $ can be computed by first\n",
    "constructing the covariance matrix of $ \\epsilon $ and then use the\n",
    "upper left block for $ \\epsilon_{1} $ and $ \\epsilon_{2} $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19440835",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Σεjk = (P.T @ Σy @ P)[:2, :2]\n",
    "\n",
    "Pjk = P[:, :2]\n",
    "\n",
    "Σy_hat = Pjk @ Σεjk @ Pjk.T\n",
    "print('Σy_hat = \\n', Σy_hat)"
   ]
  }
 ],
 "metadata": {
  "date": 1714442508.429578,
  "filename": "multivariate_normal.md",
  "kernelspec": {
   "display_name": "Python",
   "language": "python3",
   "name": "python3"
  },
  "title": "Multivariate Normal Distribution"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}