{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Foundations of Computational Economics #33\n",
    "\n",
    "by Fedor Iskhakov, ANU\n",
    "\n",
    "<img src=\"_static/img/dag3logo.png\" style=\"width:256px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "## Random numbers in Python, Monte Carlo\n",
    "\n",
    "<img src=\"_static/img/lecture.png\" style=\"width:64px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<img src=\"_static/img/youtube.png\" style=\"width:65px;\">\n",
    "\n",
    "[https://youtu.be/YX8lPKRETLQ](https://youtu.be/YX8lPKRETLQ)\n",
    "\n",
    "Description: Random number generation in Python. Inverse transform sampling. Monte Carlo simulations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Random numbers and simulation\n",
    "\n",
    "Has to do with three different fields:\n",
    "\n",
    "- Theory of probabilities - analysis with known distribution laws  \n",
    "- Mathematical statistics - analysis of data from unknown distributions  \n",
    "- Econometrics - application of statistics to economics  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Probability space\n",
    "\n",
    "$$\n",
    "\\tilde{X} = \\big( \\Omega, \\mathfrak{A}, P \\big)\n",
    "$$\n",
    "\n",
    "- $ \\Omega $ set of all outcomes (sample space)  \n",
    "- $ \\mathfrak{A} $ (sigma-algebra) a collection of subsets of $ \\Omega $\n",
    "  called a $ \\sigma $-field if it satisfies the following conditions:  \n",
    "  - $ \\emptyset \\in \\mathfrak{A} $  \n",
    "  - if $ A_1,A_2,...\\in \\mathfrak{A} $, then $ \\cup_{i=1}^{\\infty}A_i\\in \\mathfrak{A} $  \n",
    "  - if $ A \\in \\mathfrak{A} $, then $ A^c \\in \\mathfrak{A} $  \n",
    "- $ P $ a probability measure on $ \\mathfrak{A} $ is a function\n",
    "  $ P: \\mathfrak{A} \\rightarrow [0,1] $ satisfying:  \n",
    "  - $ P(\\emptyset)=0 $ and $ P(\\Omega)=1 $  \n",
    "  - If $ A_1,A_2,... $ is a collection of disjoint members of $ \\mathfrak{A} $, in that $ A_i \\cap A_j=\\emptyset $ for all pairs $ i \\ne j $, then $ P(\\cup_{i=1}^\\infty A_i)=\\sum_{i=1}^{\\infty}P(A_i) $  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Random variables\n",
    "\n",
    "A real-valued random variable is a function\n",
    "\n",
    "$$\n",
    "\\tilde{X}: \\Omega \\rightarrow \\mathbb{R}\n",
    "$$\n",
    "\n",
    "with the property that $ \\{\\omega \\in \\Omega: X(\\omega) \\leq x\\} \\in \\mathfrak{A} $ for each $ x \\in \\mathbb{R} $. Such function is said to be $ \\mathfrak{A} $-measurable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Discrete random variables\n",
    "\n",
    "- $ \\Omega = \\{x_1,\\dots,x_n\\} $ is finite with $ n $ elements  \n",
    "- $ \\mathfrak{A} $ is set of all subsets of $ \\Omega $  \n",
    "- $ P $ is constructed from $ (p_1,p_2,\\dots,p_n) $, such that  \n",
    "\n",
    "\n",
    "$$\n",
    "P(A) = \\sum_{i: x_i \\in A} p_i \\text{ for all } A \\in \\mathfrak{A}\n",
    "$$\n",
    "\n",
    "$ F(x) = P(\\{\\tilde{X} \\le x\\}) $ cumulative distribution function, stepwise constant, discontinuous"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Continuous random variables\n",
    "\n",
    "- $ \\Omega $ is subset of $ \\mathbb{R} $  \n",
    "- $ \\mathfrak{A} $ is Borel set on $ \\Omega $  \n",
    "- $ P([a,b]) = F(b) - F(a) $  \n",
    "\n",
    "\n",
    "$$\n",
    "F(x) = \\int_{-\\infty}^x f(t) dt\n",
    "$$\n",
    "\n",
    "- $ F(x) = P(\\{\\tilde{X} \\le x\\}) $ cumulative distribution function, continuous  \n",
    "- $ f(x) $ probability density function  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Moments of random variables\n",
    "\n",
    "- First, mean, measure of *central tendency*  \n",
    "\n",
    "\n",
    "$$\n",
    "E(\\tilde{X}) = \\int_{\\Omega} x \\,dF(x) = \\int_{\\Omega} x f(x) \\,dx \\quad\\text{or}\\quad =\\sum_{i=1}^{n} x_i p_i\n",
    "$$\n",
    "\n",
    "- Second central, variance, measure of *variability*  \n",
    "\n",
    "\n",
    "$$\n",
    "D(\\tilde{X}) = V(\\tilde{X}) = Var (\\tilde{X}) = E\\big(\\tilde{X}- E(\\tilde{X})\\big)^2\n",
    "$$\n",
    "\n",
    "- Standard deviation = square root of variance (same units as mean)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Random sample\n",
    "\n",
    "Realizations of a random variable $ \\tilde{X} $ drawn independently one by one\n",
    "\n",
    "$$\n",
    "\\{x_1,x_2,\\dots,x_n\\} \\sim F_{\\tilde{X}}(x)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Simulation $ \\sim $ Random sample\n",
    "\n",
    "- simulation of the random variable is the same as random sampling  \n",
    "- economic model (with shocks), as a function of the random variable, is a random variable itself  \n",
    "- the complex induced distribution can be studied by simulation *of the model*  \n",
    "- parameters of the model alter the model induced distribution  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Generating random numbers\n",
    "\n",
    "Every language has a **rand()** function that generates uniform number\n",
    "on $ [0,1] $\n",
    "\n",
    "In fact, **pseudo-random number generator**, i.e. deterministic\n",
    "algorithm that returns a sequence of numbers that look close enough to\n",
    "being random\n",
    "\n",
    "- must be fast  \n",
    "- must have large periodicity  \n",
    "- must satisfy a battery of statistical tests for independence and stationarity  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Random draws pictured\n",
    "\n",
    "<img src=\"_static/img/randbitmap_computer.png\" style=\"width:600px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### True random number generators\n",
    "\n",
    "Use physical processes as origin for randomness (RANDOM.ORG, hardware\n",
    "RNGs)\n",
    "\n",
    "Pseudo random number generator state may be initiated by true source of\n",
    "randomness\n",
    "\n",
    "- thermal noise  \n",
    "- electrical noise  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### True random draws pictured\n",
    "\n",
    "<img src=\"_static/img/randbitmap_true.png\" style=\"width:600px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Random generators in Python\n",
    "\n",
    "Modules:\n",
    "- **random** - basic functionality, scalar numbers\n",
    "- **NumPy.random** - vectorized, many distributions\n",
    "- **SciPy.stat** - more probability and statistics functions\n",
    "\n",
    "**x=random.random()** uniform on $ [0,1] $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "# Simulating a scalar random variable (one by one or in bulk)\n",
    "import random\n",
    "x = [random.random() for i in range(100)]\n",
    "print(*x[:10],sep='\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "def hist(data,bins='auto',range=None,theoretical=None,cdf=False):\n",
    "    '''Draws histogram of data, imposes a theoretical distribution if given'''\n",
    "    fig, ax = plt.subplots(figsize=(10,6))\n",
    "    if cdf:\n",
    "        # plot CDF instead of histogram\n",
    "        plt.hist(data,bins=len(data),range=range,cumulative=True,density=True,align='right',histtype='step',color='black')\n",
    "    else:\n",
    "        plt.hist(data,bins=bins,range=range,density=True,histtype='bar',color='lightgrey',edgecolor='k')\n",
    "    if theoretical and len(data)>0:\n",
    "        # add theoretical distribution\n",
    "        x = (np.linspace(range[0],range[-1],100) if range else np.linspace(min(data),max(data),100))\n",
    "        y = theoretical(x)\n",
    "        plt.plot(x,y,'r-')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "import scipy.stats\n",
    "hist(x,theoretical=scipy.stats.uniform.pdf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "for i in range(1,6):\n",
    "    n=10**i\n",
    "    data = np.random.random(n) #NumPy\n",
    "    hist(data,bins=25,theoretical=scipy.stats.uniform.pdf)\n",
    "    plt.title('%d realizations'%n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Other distributions\n",
    "\n",
    "- Normal  \n",
    "- Log-normal  \n",
    "- Exponential  \n",
    "- Fisher  \n",
    "- $ \\chi^2 $-distribution  \n",
    "\n",
    "\n",
    "and many other\n",
    "\n",
    "[https://numpy.org/doc/stable/reference/random/generator.html](https://numpy.org/doc/stable/reference/random/generator.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "for i in range(1,6):\n",
    "    n=10**i\n",
    "    data = np.random.lognormal(size=n)\n",
    "    hist(data,bins=25,range=(0,5),theoretical=lambda x: scipy.stats.lognorm.pdf(x,1.0))\n",
    "    plt.title('%d realizations'%n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Simulating from any probability distribution\n",
    "\n",
    "- Inverse transform sampling  \n",
    "- Accept-reject method (Rejection sampling)  \n",
    "- [Ziggurat algorithm](https://en.wikipedia.org/wiki/Ziggurat_algorithm)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Inverse transform sampling\n",
    "\n",
    "Let $ F(x) $ be cdf of the random variable of interest\n",
    "$ \\tilde{X} $, with inverse $ F^{(-1)}(x) $\n",
    "\n",
    "To simulate $ \\tilde{X} $:\n",
    "\n",
    "1. simulate $ (u_1,\\dots,u_n) $ from standard uniform distribution  \n",
    "1. return $ \\big( F^{(-1)}(u_1),\\dots,F^{(-1)}(u_n) \\big) $  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Let $ \\tilde{X} = F^{(-1)}(\\tilde{U}) $, and denote its cdf $ F_{\\tilde{X}}(x) $.\n",
    "\n",
    "Also let $ F_{U}(x) = \\min(\\max(0,U),1) $ denote cdf of standard uniform distribution.\n",
    "Then\n",
    "\n",
    "$$\n",
    "F_{\\tilde{X}}(x) = P(\\{ \\tilde{X} \\le x \\}) = P(\\{ F^{(-1)}(\\tilde{U}) \\le x \\}) =\n",
    "$$\n",
    "\n",
    "$$\n",
    "= P(\\{ \\tilde{U} \\le F(x) \\}) = F_U \\big(F(x)\\big) = F(x)\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Inverse CDF\n",
    "\n",
    "<img src=\"_static/img/invcdf.png\" style=\"width:600px;\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Example: Gumbel (Extreme value type I) distribution\n",
    "\n",
    "$$\n",
    "F(x) = \\exp \\big( -\\exp (-\\frac{x-\\mu}{\\beta} ) \\big)\n",
    "$$\n",
    "\n",
    "$$\n",
    "F^{-1}(x) = \\mu - \\beta \\log( -\\log(x))\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Repeatability of simulations\n",
    "\n",
    "- Sometimes it is important to be able to repeat the simulation’s random number sequence  \n",
    "- There are ways to control the psuedo-random number generator state  \n",
    "\n",
    "\n",
    "`seed()`\n",
    "\n",
    "`get_state()`\n",
    "\n",
    "`set_state()`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Monte Carlo method\n",
    "\n",
    "Solving deterministic problems using random numbers\n",
    "\n",
    "Example: Compute $ \\pi $\n",
    "\n",
    "Approach:\n",
    "\n",
    "<img src=\"_static/img/pimc.jpg\" style=\"\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "def pimc(n=100):\n",
    "    '''Compute pi using Monte Carlo with sample size n'''\n",
    "    d = np.random.uniform(size=(2,int(n)))\n",
    "    d2 = np.sum(d**2,axis=0)\n",
    "    n1 = np.sum(d2<1)\n",
    "    s4 = n1/n\n",
    "    return 4*s4\n",
    "\n",
    "d = pimc(n=1e6)\n",
    "print('Estimate of pi is %1.5f, bias %1.3e'%(d,d-np.pi))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "data=[]\n",
    "for i in range(8):\n",
    "    n = 10**i\n",
    "    d = pimc(n)\n",
    "    print('Estimate of pi is %1.5f, %1.0e points, bias %1.3e'%(d,n,d-np.pi))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "data=[]\n",
    "for i in range(1000):\n",
    "    data.append(pimc(n=10000))\n",
    "hist(data)\n",
    "d = np.mean(data)\n",
    "print('Estimate of pi is %1.5f, bias %1.3e'%(d,d-np.pi))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Thought on Monte Carlo\n",
    "\n",
    "- Convergence is slow  \n",
    "- Fitted for some tasks better than for other  \n",
    "  \n",
    "  1. Simulating random variables with complex distributions (economic models)  \n",
    "  1. Testing econometric procedures and estimators  \n",
    "  1. Computing high dimensional integrals  \n",
    "  1. Reducing computational load for very hard problems by randomization  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Monte Carlo assistant for analytic proofs\n",
    "\n",
    "1. Assume some non-informative initial distributions for all parameters in the proposition (uniform with wide bounds usually)  \n",
    "1. Generate a *large* number of parameter vectors  \n",
    "1. Check if a theoretical proposition is true or not.  \n",
    "1. Analyze the patterns of parameters when the proposition does or does not hold (sometime finding just a single counter example is sufficient)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Example\n",
    "\n",
    "Let $ f(x) = x ^ k $ and\n",
    "$ g(x) = 1 + \\frac{1}{k}\\cos(2 \\pi x) $.\n",
    "\n",
    "*Proposition:* $ g(x)>f(x) $ on $ [0,1] $ for all\n",
    "$ k \\ge 1.5 $?\n",
    "\n",
    "More broadly, what are the values of $ k $ when the inequality holds?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide-output": false,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "x = np.linspace(0,1,1000).reshape([-1,1]) # space for check points on [0,1], column\n",
    "N = 100 # number of random draws of parameters\n",
    "k = np.random.uniform(low=1.5,high=10,size=N)  # generate k\n",
    "f = x ** k\n",
    "g = 1 + np.cos(x*2*np.pi) / k\n",
    "\n",
    "# plot the functions\n",
    "fig, ax = plt.subplots()\n",
    "ax.plot(x,f,color='b',alpha=0.5)\n",
    "ax.plot(x,g,color='r',alpha=0.5)\n",
    "plt.show()\n",
    "\n",
    "check = np.array(np.all(g>f,axis=0)) # simulate the proposition\n",
    "fk = k[np.logical_not(check)]\n",
    "answer = np.all(check)\n",
    "print('The simulated proposition is:',answer)\n",
    "if not answer:\n",
    "    print('Failing values of k:',fk,sep=' ')\n",
    "\n",
    "# plot the functions\n",
    "fig, ax = plt.subplots()\n",
    "ax.scatter(k[check],np.ones(k[check].shape),color='g')\n",
    "ax.scatter(fk,np.ones(fk.shape),color='r')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Conclusion: Even though with relatively low $ N=100 $ the statement may\n",
    "seem to be correct (depending on the lowest realized $ k $), running more\n",
    "simulations, for example, $ N=500 $, reveals that there are values of\n",
    "$ k $ close to 1.5 where the statement is incorrect.\n",
    "Running even more simulations may help to establish an approximate threshold\n",
    "somewhat above 1.5, such that for $ k $ above the threshold, the inequality\n",
    "is true."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Further learning resources\n",
    "\n",
    "- Docs on **SciPy.random**\n",
    "  [https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html)  \n",
    "- Docs on **SciPy.stats**\n",
    "  [https://docs.scipy.org/doc/scipy/reference/stats.html](https://docs.scipy.org/doc/scipy/reference/stats.html)  \n",
    "- Random number generators [https://www.random.org/analysis](https://www.random.org/analysis)  \n",
    "- 📖 Kevin Sheppard “Introduction to Python for Econometrics, Statistics and Data Analysis.” *Chapter: 19*  "
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "date": 1634828600.452924,
  "filename": "33_stochastic.rst",
  "kernelspec": {
   "display_name": "Python",
   "language": "python3",
   "name": "python3"
  },
  "title": "Foundations of Computational Economics #33"
 },
 "nbformat": 4,
 "nbformat_minor": 4
}