{ "metadata": { "name": "", "signature": "sha256:b1defb57ac4b84efc1aa0b618a3a17cf695496ebe8025a45c5287ec3cd62a3c2" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "

Multilanguage Monte Carlo

\n", "\n", "

Python

\n", "\n", "

Statistical Computing Languages, RSS

\n", "
\n", "\n", "

\n", "

[Mike Croucher](http://www.walkingrandomly.com/) [@WalkingRandomly](http://twitter.com/walkingrandomly)

\n", "

University of Manchester

\n", "

Presented by Neil D. Lawrence on 21st November 2014

\n", "\n", "

\n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Multi Language Monte Carlo\n", "\n", "We want to estimate the integral\n", "\n", "$$\n", "\\int^5_{-5} \\frac{\\exp(-u^2/2)}{\\sqrt{2\\pi}}\\text{d}u\n", "$$\n", "\n", "using a simple Monte Carlo rejection algorithm. The function is the standard normal distribution, so the integral is approximately equal to 1. \n", "\n", "Here, we use the IPython notebook (Soon to be called Project Jupyter) to demonstrate possible solutions in several languages. Once you have this notebook set up correctly on your machine, all of the code below is editable and executable from within the web browser." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# rerun Fernando's script to load in css\n", "%run talktools" ], "language": "python", "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "html": [ "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python\n", "\n", "A first attempt at a solution using pure Python might look like this" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import random\n", "from math import exp, sqrt, pi\n", "\n", "def f(u):\n", " return(exp(-u**2/2)/sqrt(2*pi))\n", "\n", "def loopy_monte(N):\n", " hits = 0\n", " for count in range(N):\n", " x = random.uniform(-5,5)\n", " y = random.uniform(0,0.5)\n", " hits = hits + (y < f(x))\n", " \n", " estimate = hits / float(N) * (0.5*10)\n", " return(estimate)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's run the calculation for 1 million iterations" ] }, { "cell_type": "code", "collapsed": false, "input": [ "loopy_monte(10**6)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "1.001425" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Import Libraries\n", "\n", "Note the **import** statements. Unlike languages such as `MATLAB` or `R`, Python has few functions built into its global namespace at startup. This means that even simple functions such as **sqrt** need to be explicity imported from a module. In the above code, we use the modules **random** and **math** which are part of the standard Python library, the set of modules that are included with every Python installation. \n", "\n", "It is possible to export everything from a module in one hit. Instead of\n", "\n", "`from math import exp,sqrt,pi`\n", "\n", "we might have done\n", "\n", "`from math import *`\n", "\n", "This idiom is commonly used but is considered bad practice. Python programmers would say that it is not *Pythonic*. The principle objection to this practice is that it pollutes your namespace with many functions -- many of which the user is probably unaware of. If you do multiple module imports in this manner, there is a good chance that function names from one module would overwrite those of a previous module." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Timing It\n", "\n", "Let's time our function using IPython's `%timeit` magic" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit loopy_monte(10**6)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 3.61 s per loop\n" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "This might not seem too bad for 1 million iterations but we can do significantly better. As with many high-level languages, loops are very expensive in Python. One approach to making things faster is to **vectorise** our code. That is, we operate on entire vectors at once rather than on scalars. This can be achieved in Python using the **numpy** module. \n", "\n", "Numpy is not part of the Python standard library and needs to be installed separately. It is, however, considered the de-facto standard for numerical computing in Python and is widely used. If you use a Python distribution such as Anaconda or Enthought, numpy will be available to import out of the box." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Vectored Code" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "\n", "def numpy_f(u):\n", " return(np.exp(-u**2/2)/np.sqrt(2*np.pi))\n", "\n", "def numpy_monte(N):\n", " x = np.random.uniform(-5,5,N)\n", " y = np.random.uniform(0,0.5,N)\n", " hits = np.sum(y # start the clock\r\n", "> start_time = proc.time()\r\n", "> f = function(u) exp(-u^2/2)/sqrt(2*pi)\r\n", "> \r\n", "> simulate_pt = function(i){\r\n", "+ x = runif(1,-5,5); y = runif(1,0,0.5);\r\n", "+ y < f(x)\r\n", "+ }\r\n", "> \r\n", "> N = 10^6; hits = 0\r\n", "> for(i in 1:N)\r\n", "+ hits = hits + simulate_pt() \r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "> (estimate = hits/N*(0.5*10))\r\n", "[1] 0.996925\r\n", "> \r\n", "> #Stop the clock\r\n", "> print(proc.time() - start_time)\r\n", " user system elapsed \r\n", " 10.180 0.050 10.303 \r\n", "> print(estimate)\r\n", "[1] 0.996925\r\n", "> \r\n" ] } ], "prompt_number": 92 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, 11.68 seconds via the R connector and 10.3 seconds if run directly in the shell. Either way, it's the slowest solution so far." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Vectorisation is pretty much essential in R. Here's the vecorisd version given by Colin. Although the R connector is nice, I'll run it in the shell to give a better idea of the speed difference." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%writefile vector_monte.R\n", "start_time = proc.time()\n", "N=10^6\n", "f = function(u) exp(-u^2/2)/sqrt(2*pi)\n", "estimate = sum(runif(N, 0, 0.5) < f(runif(N, -5, 5)))/N*(0.5*10)\n", "\n", "print(proc.time() - start_time)\n", "print(estimate)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Overwriting vector_monte.R\n" ] } ], "prompt_number": 98 }, { "cell_type": "code", "collapsed": false, "input": [ "!R --vanilla < vector_monte.R" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\r\n", "R version 3.1.2 (2014-10-31) -- \"Pumpkin Helmet\"\r\n", "Copyright (C) 2014 The R Foundation for Statistical Computing\r\n", "Platform: x86_64-apple-darwin13.4.0 (64-bit)\r\n", "\r\n", "R is free software and comes with ABSOLUTELY NO WARRANTY.\r\n", "You are welcome to redistribute it under certain conditions.\r\n", "Type 'license()' or 'licence()' for distribution details.\r\n", "\r\n", " Natural language support but running in an English locale\r\n", "\r\n", "R is a collaborative project with many contributors.\r\n", "Type 'contributors()' for more information and\r\n", "'citation()' on how to cite R or R packages in publications.\r\n", "\r\n", "Type 'demo()' for some demos, 'help()' for on-line help, or\r\n", "'help.start()' for an HTML browser interface to help.\r\n", "Type 'q()' to quit R.\r\n", "\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "> start_time = proc.time()\r\n", "> N=10^6\r\n", "> f = function(u) exp(-u^2/2)/sqrt(2*pi)\r\n", "> estimate = sum(runif(N, 0, 0.5) < f(runif(N, -5, 5)))/N*(0.5*10)\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "> \r\n", "> print(proc.time() - start_time)\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " user system elapsed \r\n", " 0.246 0.013 0.260 \r\n", "> print(estimate)\r\n", "[1] 0.999495\r\n", "> \r\n" ] } ], "prompt_number": 100 }, { "cell_type": "markdown", "metadata": {}, "source": [ "On my machine, this comes out at 0.26 seconds giving the slowest vectorised solution we've seen so far." ] } ], "metadata": {} } ] }