{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Foundations of Computational Economics #33\n", "\n", "by Fedor Iskhakov, ANU\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## Random numbers in Python, Monte Carlo\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "\n", "[https://youtu.be/YX8lPKRETLQ](https://youtu.be/YX8lPKRETLQ)\n", "\n", "Description: Random number generation in Python. Inverse transform sampling. Monte Carlo simulations." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Random numbers and simulation\n", "\n", "Has to do with three different fields:\n", "\n", "- Theory of probabilities - analysis with known distribution laws \n", "- Mathematical statistics - analysis of data from unknown distributions \n", "- Econometrics - application of statistics to economics " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Probability space\n", "\n", "$$\n", "\\tilde{X} = \\big( \\Omega, \\mathfrak{A}, P \\big)\n", "$$\n", "\n", "- $ \\Omega $ set of all outcomes (sample space) \n", "- $ \\mathfrak{A} $ (sigma-algebra) a collection of subsets of $ \\Omega $\n", " called a $ \\sigma $-field if it satisfies the following conditions: \n", " - $ \\emptyset \\in \\mathfrak{A} $ \n", " - if $ A_1,A_2,...\\in \\mathfrak{A} $, then $ \\cup_{i=1}^{\\infty}A_i\\in \\mathfrak{A} $ \n", " - if $ A \\in \\mathfrak{A} $, then $ A^c \\in \\mathfrak{A} $ \n", "- $ P $ a probability measure on $ \\mathfrak{A} $ is a function\n", " $ P: \\mathfrak{A} \\rightarrow [0,1] $ satisfying: \n", " - $ P(\\emptyset)=0 $ and $ P(\\Omega)=1 $ \n", " - If $ A_1,A_2,... $ is a collection of disjoint members of $ \\mathfrak{A} $, in that $ A_i \\cap A_j=\\emptyset $ for all pairs $ i \\ne j $, then $ P(\\cup_{i=1}^\\infty A_i)=\\sum_{i=1}^{\\infty}P(A_i) $ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Random variables\n", "\n", "A real-valued random variable is a function\n", "\n", "$$\n", "\\tilde{X}: \\Omega \\rightarrow \\mathbb{R}\n", "$$\n", "\n", "with the property that $ \\{\\omega \\in \\Omega: X(\\omega) \\leq x\\} \\in \\mathfrak{A} $ for each $ x \\in \\mathbb{R} $. Such function is said to be $ \\mathfrak{A} $-measurable" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Discrete random variables\n", "\n", "- $ \\Omega = \\{x_1,\\dots,x_n\\} $ is finite with $ n $ elements \n", "- $ \\mathfrak{A} $ is set of all subsets of $ \\Omega $ \n", "- $ P $ is constructed from $ (p_1,p_2,\\dots,p_n) $, such that \n", "\n", "\n", "$$\n", "P(A) = \\sum_{i: x_i \\in A} p_i \\text{ for all } A \\in \\mathfrak{A}\n", "$$\n", "\n", "$ F(x) = P(\\{\\tilde{X} \\le x\\}) $ cumulative distribution function, stepwise constant, discontinuous" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Continuous random variables\n", "\n", "- $ \\Omega $ is subset of $ \\mathbb{R} $ \n", "- $ \\mathfrak{A} $ is Borel set on $ \\Omega $ \n", "- $ P([a,b]) = F(b) - F(a) $ \n", "\n", "\n", "$$\n", "F(x) = \\int_{-\\infty}^x f(t) dt\n", "$$\n", "\n", "- $ F(x) = P(\\{\\tilde{X} \\le x\\}) $ cumulative distribution function, continuous \n", "- $ f(x) $ probability density function " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Moments of random variables\n", "\n", "- First, mean, measure of *central tendency* \n", "\n", "\n", "$$\n", "E(\\tilde{X}) = \\int_{\\Omega} x \\,dF(x) = \\int_{\\Omega} x f(x) \\,dx \\quad\\text{or}\\quad =\\sum_{i=1}^{n} x_i p_i\n", "$$\n", "\n", "- Second central, variance, measure of *variability* \n", "\n", "\n", "$$\n", "D(\\tilde{X}) = V(\\tilde{X}) = Var (\\tilde{X}) = E\\big(\\tilde{X}- E(\\tilde{X})\\big)^2\n", "$$\n", "\n", "- Standard deviation = square root of variance (same units as mean) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Random sample\n", "\n", "Realizations of a random variable $ \\tilde{X} $ drawn independently one by one\n", "\n", "$$\n", "\\{x_1,x_2,\\dots,x_n\\} \\sim F_{\\tilde{X}}(x)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Simulation $ \\sim $ Random sample\n", "\n", "- simulation of the random variable is the same as random sampling \n", "- economic model (with shocks), as a function of the random variable, is a random variable itself \n", "- the complex induced distribution can be studied by simulation *of the model* \n", "- parameters of the model alter the model induced distribution " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Generating random numbers\n", "\n", "Every language has a **rand()** function that generates uniform number\n", "on $ [0,1] $\n", "\n", "In fact, **pseudo-random number generator**, i.e. deterministic\n", "algorithm that returns a sequence of numbers that look close enough to\n", "being random\n", "\n", "- must be fast \n", "- must have large periodicity \n", "- must satisfy a battery of statistical tests for independence and stationarity " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Random draws pictured\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### True random number generators\n", "\n", "Use physical processes as origin for randomness (RANDOM.ORG, hardware\n", "RNGs)\n", "\n", "Pseudo random number generator state may be initiated by true source of\n", "randomness\n", "\n", "- thermal noise \n", "- electrical noise " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### True random draws pictured\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Random generators in Python\n", "\n", "Modules:\n", "- **random** - basic functionality, scalar numbers\n", "- **NumPy.random** - vectorized, many distributions\n", "- **SciPy.stat** - more probability and statistics functions\n", "\n", "**x=random.random()** uniform on $ [0,1] $" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Simulating a scalar random variable (one by one or in bulk)\n", "import random\n", "x = [random.random() for i in range(100)]\n", "print(*x[:10],sep='\\n')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "def hist(data,bins='auto',range=None,theoretical=None,cdf=False):\n", " '''Draws histogram of data, imposes a theoretical distribution if given'''\n", " fig, ax = plt.subplots(figsize=(10,6))\n", " if cdf:\n", " # plot CDF instead of histogram\n", " plt.hist(data,bins=len(data),range=range,cumulative=True,density=True,align='right',histtype='step',color='black')\n", " else:\n", " plt.hist(data,bins=bins,range=range,density=True,histtype='bar',color='lightgrey',edgecolor='k')\n", " if theoretical and len(data)>0:\n", " # add theoretical distribution\n", " x = (np.linspace(range[0],range[-1],100) if range else np.linspace(min(data),max(data),100))\n", " y = theoretical(x)\n", " plt.plot(x,y,'r-')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import scipy.stats\n", "hist(x,theoretical=scipy.stats.uniform.pdf)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "for i in range(1,6):\n", " n=10**i\n", " data = np.random.random(n) #NumPy\n", " hist(data,bins=25,theoretical=scipy.stats.uniform.pdf)\n", " plt.title('%d realizations'%n)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Other distributions\n", "\n", "- Normal \n", "- Log-normal \n", "- Exponential \n", "- Fisher \n", "- $ \\chi^2 $-distribution \n", "\n", "\n", "and many other\n", "\n", "[https://numpy.org/doc/stable/reference/random/generator.html](https://numpy.org/doc/stable/reference/random/generator.html)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "for i in range(1,6):\n", " n=10**i\n", " data = np.random.lognormal(size=n)\n", " hist(data,bins=25,range=(0,5),theoretical=lambda x: scipy.stats.lognorm.pdf(x,1.0))\n", " plt.title('%d realizations'%n)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Simulating from any probability distribution\n", "\n", "- Inverse transform sampling \n", "- Accept-reject method (Rejection sampling) \n", "- [Ziggurat algorithm](https://en.wikipedia.org/wiki/Ziggurat_algorithm) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Inverse transform sampling\n", "\n", "Let $ F(x) $ be cdf of the random variable of interest\n", "$ \\tilde{X} $, with inverse $ F^{(-1)}(x) $\n", "\n", "To simulate $ \\tilde{X} $:\n", "\n", "1. simulate $ (u_1,\\dots,u_n) $ from standard uniform distribution \n", "1. return $ \\big( F^{(-1)}(u_1),\\dots,F^{(-1)}(u_n) \\big) $ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let $ \\tilde{X} = F^{(-1)}(\\tilde{U}) $, and denote its cdf $ F_{\\tilde{X}}(x) $.\n", "\n", "Also let $ F_{U}(x) = \\min(\\max(0,U),1) $ denote cdf of standard uniform distribution.\n", "Then\n", "\n", "$$\n", "F_{\\tilde{X}}(x) = P(\\{ \\tilde{X} \\le x \\}) = P(\\{ F^{(-1)}(\\tilde{U}) \\le x \\}) =\n", "$$\n", "\n", "$$\n", "= P(\\{ \\tilde{U} \\le F(x) \\}) = F_U \\big(F(x)\\big) = F(x)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Inverse CDF\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Example: Gumbel (Extreme value type I) distribution\n", "\n", "$$\n", "F(x) = \\exp \\big( -\\exp (-\\frac{x-\\mu}{\\beta} ) \\big)\n", "$$\n", "\n", "$$\n", "F^{-1}(x) = \\mu - \\beta \\log( -\\log(x))\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Repeatability of simulations\n", "\n", "- Sometimes it is important to be able to repeat the simulation’s random number sequence \n", "- There are ways to control the psuedo-random number generator state \n", "\n", "\n", "`seed()`\n", "\n", "`get_state()`\n", "\n", "`set_state()`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Monte Carlo method\n", "\n", "Solving deterministic problems using random numbers\n", "\n", "Example: Compute $ \\pi $\n", "\n", "Approach:\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "def pimc(n=100):\n", " '''Compute pi using Monte Carlo with sample size n'''\n", " d = np.random.uniform(size=(2,int(n)))\n", " d2 = np.sum(d**2,axis=0)\n", " n1 = np.sum(d2<1)\n", " s4 = n1/n\n", " return 4*s4\n", "\n", "d = pimc(n=1e6)\n", "print('Estimate of pi is %1.5f, bias %1.3e'%(d,d-np.pi))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "data=[]\n", "for i in range(8):\n", " n = 10**i\n", " d = pimc(n)\n", " print('Estimate of pi is %1.5f, %1.0e points, bias %1.3e'%(d,n,d-np.pi))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "data=[]\n", "for i in range(1000):\n", " data.append(pimc(n=10000))\n", "hist(data)\n", "d = np.mean(data)\n", "print('Estimate of pi is %1.5f, bias %1.3e'%(d,d-np.pi))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Thought on Monte Carlo\n", "\n", "- Convergence is slow \n", "- Fitted for some tasks better than for other \n", " \n", " 1. Simulating random variables with complex distributions (economic models) \n", " 1. Testing econometric procedures and estimators \n", " 1. Computing high dimensional integrals \n", " 1. Reducing computational load for very hard problems by randomization " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Monte Carlo assistant for analytic proofs\n", "\n", "1. Assume some non-informative initial distributions for all parameters in the proposition (uniform with wide bounds usually) \n", "1. Generate a *large* number of parameter vectors \n", "1. Check if a theoretical proposition is true or not. \n", "1. Analyze the patterns of parameters when the proposition does or does not hold (sometime finding just a single counter example is sufficient) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Example\n", "\n", "Let $ f(x) = x ^ k $ and\n", "$ g(x) = 1 + \\frac{1}{k}\\cos(2 \\pi x) $.\n", "\n", "*Proposition:* $ g(x)>f(x) $ on $ [0,1] $ for all\n", "$ k \\ge 1.5 $?\n", "\n", "More broadly, what are the values of $ k $ when the inequality holds?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "x = np.linspace(0,1,1000).reshape([-1,1]) # space for check points on [0,1], column\n", "N = 100 # number of random draws of parameters\n", "k = np.random.uniform(low=1.5,high=10,size=N) # generate k\n", "f = x ** k\n", "g = 1 + np.cos(x*2*np.pi) / k\n", "\n", "# plot the functions\n", "fig, ax = plt.subplots()\n", "ax.plot(x,f,color='b',alpha=0.5)\n", "ax.plot(x,g,color='r',alpha=0.5)\n", "plt.show()\n", "\n", "check = np.array(np.all(g>f,axis=0)) # simulate the proposition\n", "fk = k[np.logical_not(check)]\n", "answer = np.all(check)\n", "print('The simulated proposition is:',answer)\n", "if not answer:\n", " print('Failing values of k:',fk,sep=' ')\n", "\n", "# plot the functions\n", "fig, ax = plt.subplots()\n", "ax.scatter(k[check],np.ones(k[check].shape),color='g')\n", "ax.scatter(fk,np.ones(fk.shape),color='r')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Conclusion: Even though with relatively low $ N=100 $ the statement may\n", "seem to be correct (depending on the lowest realized $ k $), running more\n", "simulations, for example, $ N=500 $, reveals that there are values of\n", "$ k $ close to 1.5 where the statement is incorrect.\n", "Running even more simulations may help to establish an approximate threshold\n", "somewhat above 1.5, such that for $ k $ above the threshold, the inequality\n", "is true." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Further learning resources\n", "\n", "- Docs on **SciPy.random**\n", " [https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.random.html) \n", "- Docs on **SciPy.stats**\n", " [https://docs.scipy.org/doc/scipy/reference/stats.html](https://docs.scipy.org/doc/scipy/reference/stats.html) \n", "- Random number generators [https://www.random.org/analysis](https://www.random.org/analysis) \n", "- 📖 Kevin Sheppard “Introduction to Python for Econometrics, Statistics and Data Analysis.” *Chapter: 19* " ] } ], "metadata": { "celltoolbar": "Slideshow", "date": 1634828600.452924, "filename": "33_stochastic.rst", "kernelspec": { "display_name": "Python", "language": "python3", "name": "python3" }, "title": "Foundations of Computational Economics #33" }, "nbformat": 4, "nbformat_minor": 4 }