{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Random walk\n",
    "<div id=\"diffu:randomwalk\"></div>\n",
    "\n",
    "\n",
    "Models leading to diffusion equations, see the section [diffu:app](#diffu:app), are\n",
    "usually based on reasoning with *averaged* physical quantities such as\n",
    "concentration, temperature, and velocity.  The underlying physical\n",
    "processes involve complicated microscopic movement of atoms and\n",
    "molecules, but an average of a large number of molecules is performed\n",
    "in a small volume before the modeling starts, and the averaged\n",
    "quantity inside this volume is assigned as a point value at the\n",
    "centroid of the volume. This means that concentration, temperature,\n",
    "and velocity at a space-time point represent averages around the\n",
    "point in a small time interval and small spatial volume.\n",
    "\n",
    "Random walk is a principally different kind of modeling procedure\n",
    "compared to the reasoning behind partial differential equations.  The\n",
    "idea in random walk is to have a large number of \"particles\" that\n",
    "undergo random movements. Averaging can then be used afterwards to\n",
    "compute macroscopic quantities like concentration. The \"particles\"\n",
    "and their random movement represent a very simplified microscopic\n",
    "behavior of molecules, much simpler and computationally much more\n",
    "efficient than direct [molecular simulation](https://en.wikipedia.org/wiki/Molecular_dynamics), yet the random\n",
    "walk model has been very powerful to describe a wide range of\n",
    "phenomena, including heat conduction, quantum mechanics, polymer\n",
    "chains, population genetics, neuroscience, hazard games, and pricing\n",
    "of financial instruments.\n",
    "\n",
    "mathcal{I}_t can be shown that random walk, when averaged, produces models that\n",
    "are mathematically equivalent to diffusion equations. This is the\n",
    "primary reason why we treat random walk in this chapter: two very\n",
    "different algorithms (finite difference stencils and random walk)\n",
    "solve the same type of problems. The simplicity of the random walk\n",
    "algorithm makes it particularly attractive for solving diffusion\n",
    "equations on massively parallel computers.\n",
    "The exposition here is as simple as possible, and good thorough\n",
    "derivation of the models is provided by Hjorth-Jensen [[hjorten]](#hjorten).\n",
    "\n",
    "## Random walk in 1D\n",
    "<div id=\"diffu:randomwalk:1D\"></div>\n",
    "\n",
    "Imagine that we have some particles that perform random moves, either\n",
    "to the right or to the left. We may flip a coin to decide the movement\n",
    "of each particle, say head implies movement to the right and tail\n",
    "means movement to the left. Each move is one unit length.  Physicists\n",
    "use the term *random walk* for this type of movement.\n",
    "The movement is also known as [drunkard's walk](https://en.wikipedia.org/wiki/The_Drunkard%27s_Walk).\n",
    "You may try this yourself: flip the coin and make one step to the left\n",
    "or right, and repeat the process.\n",
    "\n",
    "We introduce the symbol $N$ for the number of steps in a random walk.\n",
    "[Figure](#diffu:randomwalk:1D:fig:ensemble) shows four different\n",
    "random walks with $N=200$.\n",
    "\n",
    "<!-- dom:FIGURE: [fig-diffu/rw1D_ensemble4.png, width=800 frac=1] Ensemble of 4 random walks, each with 200 steps. <div id=\"diffu:randomwalk:1D:fig:ensemble\"></div> -->\n",
    "<!-- begin figure -->\n",
    "<div id=\"diffu:randomwalk:1D:fig:ensemble\"></div>\n",
    "\n",
    "<p>Ensemble of 4 random walks, each with 200 steps.</p>\n",
    "<img src=\"fig-diffu/rw1D_ensemble4.png\" width=800>\n",
    "\n",
    "<!-- end figure -->\n",
    "\n",
    "\n",
    "\n",
    "## Statistical considerations\n",
    "<div id=\"diffu:randomwalk:1D:EVar\"></div>\n",
    "\n",
    "\n",
    "Let $S_k$ be the stochastic variable representing a step to the left\n",
    "or to the right in step number $k$. We have that $S_k=-1$ with\n",
    "probability $p$ and $S_k=1$ with probability $q=1-p$. The variable\n",
    "$S_k$ is known as a [Bernoulli variable](https://en.wikipedia.org/wiki/Bernoulli_distribution). The\n",
    "expectation of $S_k$ is"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\E{S_k} = p\\cdot (-1) + q\\cdot 1 = 1 - 2p,\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and the variance is"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\Var{S_k} = \\E{S_k^2} - \\E{S_k}^2 = 1 - (1-2p)^2 = 4p(1-p)\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The position after $k$ steps is another stochastic variable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\bar X_k = \\sum_{i=0}^{k-1} S_i\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The expected position is"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\E{\\bar X_k} =\n",
    "\\E{\\sum_{i=0}^{k-1} S_i} = \\sum_{i=0}^{k-1} \\E{S_i}= k(1-2p)\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All the $S_k$ variables are independent. The variance therefore becomes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\Var{\\bar X_k} = \\Var{\\sum_{i=0}^{k-1} S_i} = \\sum_{i=0}^{k-1} \\Var{S_i}=\n",
    "k4p(1-p)\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that $\\Var{\\bar X_k}$ is proportional with the number of steps $k$.\n",
    "For the very important case $p=q=\\frac{1}{2}$, $\\E{\\bar X_k}=0$ and\n",
    "$\\Var{\\bar X_k}=k$.\n",
    "\n",
    "\n",
    "How can we estimate $\\E{\\bar X_k}=0$ and $\\Var{\\bar X_k}=N$?\n",
    "We must have many random walks of the type in\n",
    "[Figure](#diffu:randomwalk:1D:fig:ensemble). For a given $k$, say $k=100$,\n",
    "we find all the values of $\\bar X_k$, name them $\\bar x_{0,k}$, $\\bar x_{1,k}$,\n",
    "$\\bar x_{2,k}$, and so on. The empirical estimate of $\\E{\\bar X_k}$ is the\n",
    "average,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\E{\\bar X_k} \\approx \\frac{1}{W}\\sum_{j=0}^{W-1} \\bar x_{j,k},\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "while an empirical estimate of $\\Var{\\bar X_k}$ is"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\Var{\\bar X_k} \\approx \\frac{1}{W}\\sum_{j=0}^{W-1} (\\bar x_{j,k})^2 -\n",
    "\\left(\\frac{1}{W}\\sum_{j=0}^{W-1} \\bar x_{j,k}\\right)^2\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That is, we take the statistics for a given $K$ across the ensemble\n",
    "of random walks (\"vertically\" in\n",
    "[Figure](#diffu:randomwalk:1D:fig:ensemble)). The key quantities\n",
    "to record are $\\sum_i \\bar x_{i,k}$ and $\\sum_i \\bar x_{i,k}^2$.\n",
    "\n",
    "## Playing around with some code\n",
    "<div id=\"diffu:randomwalk:1D:code1\"></div>\n",
    "\n",
    "\n",
    "### Scalar code\n",
    "\n",
    "Python has a `random` module for drawing random numbers, and this\n",
    "module has a function `uniform(a, b)` for drawing a uniformly\n",
    "distributed random number in the interval $[a,b)$.  If an event\n",
    "happens with probability $p$, we can simulate this on the computer by\n",
    "drawing a random number $r$ in $[0,1)$, because then $r\\leq p$ with\n",
    "probability $p$ and $r>p$ with probability $1-p$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "r = random.uniform(0, 1)\n",
    "if r <= p:\n",
    "    # Event happens\n",
    "else:\n",
    "    # Event does not happen"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A random walk with $N$ steps, starting at $x_0$, where we move\n",
    "to the left with probability $p$ and to the right\n",
    "with probability $1-p$ can now be implemented by"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random, numpy as np\n",
    "\n",
    "def random_walk1D(x0, N, p):\n",
    "    \"\"\"1D random walk with 1 particle.\"\"\"\n",
    "    # Store position in step k in position[k]\n",
    "    position = np.zeros(N)\n",
    "    position[0] = x0\n",
    "    current_pos = x0\n",
    "    for k in range(N-1):\n",
    "        r = random.uniform(0, 1)\n",
    "        if r <= p:\n",
    "            current_pos -= 1\n",
    "        else:\n",
    "            current_pos += 1\n",
    "        position[k+1] = current_pos\n",
    "    return position"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Vectorized code\n",
    "\n",
    "Since $N$ is supposed to be large and we want to repeat the process for\n",
    "many particles, we should speed up the code as much as possible.\n",
    "Vectorization is the obvious technique here: we draw all the random\n",
    "numbers at once with aid of `numpy`, and then we formulate vector\n",
    "operations to get rid of the loop over the steps (`k`).\n",
    "The `numpy.random` module has vectorized versions of the functions in\n",
    "Python's built-in `random` module. For example, `numpy.random.uniform(a, b, N)`\n",
    "returns `N` random numbers uniformly distributed between `a` (included)\n",
    "and `b` (not included).\n",
    "\n",
    "We can then make an array of all the steps in a random walk: if\n",
    "the random number is less than or equal to $p$, the step is $-1$,\n",
    "otherwise the step is $1$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "r = np.random.uniform(0, 1, size=N)\n",
    "steps = np.where(r <= p, -1, 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The value of `position[k]` is the sum of all steps up to step `k`.\n",
    "Such sums are often needed in vectorized algorithms and therefore\n",
    "available by the `numpy.cumsum` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "np.cumsum(np.array([1,3,4,6]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The resulting array in this demo has elements $1$, $1+3=4$, $1+3+4=8$,\n",
    "and $1+3+4+6=14$.\n",
    "\n",
    "We can now vectorize the `random_walk1D` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walk1D_vec(x0, N, p):\n",
    "    \"\"\"Vectorized version of random_walk1D.\"\"\"\n",
    "    # Store position in step k in position[k]\n",
    "    position = np.zeros(N+1)\n",
    "    position[0] = x0\n",
    "    r = np.random.uniform(0, 1, size=N)\n",
    "    steps = np.where(r <= p, -1, 1)\n",
    "    position[1:] = x0 + np.cumsum(steps)\n",
    "    return position"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This code runs about 10 times faster than the scalar version.\n",
    "With a parallel `numpy` library, the code can also automatically take\n",
    "advantage of hardware for parallel computing because each of the four\n",
    "array operations can be trivially parallelized.\n",
    "\n",
    "\n",
    "### Fixing the random sequence\n",
    "\n",
    "During software development with random numbers it is advantageous to\n",
    "always generate the same sequence of random numbers, as this may help\n",
    "debugging processes. To fix the sequence, we set a *seed* of the random\n",
    "number generator to some chosen integer, e.g.,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.random.seed(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calls to `random_walk1D_vec` give positions of the particle as\n",
    "depicted in [Figure](#diffu:randomwalk:1D:code1:fig1). The particle starts\n",
    "at the origin and moves with $p=\\frac{1}{2}$. Since the seed is the same,\n",
    "the plot to the left is just a magnification of the first 1,000 steps in\n",
    "the plot to the right.\n",
    "<!-- demo_random_walk1D produced the plots -->\n",
    "\n",
    "<!-- dom:FIGURE: [fig-diffu/rw1D_1sample.png, width=800 frac=1] 1,000 (left) and 50,000 (right) steps of a random walk. <div id=\"diffu:randomwalk:1D:code1:fig1\"></div> -->\n",
    "<!-- begin figure -->\n",
    "<div id=\"diffu:randomwalk:1D:code1:fig1\"></div>\n",
    "\n",
    "<p>1,000 (left) and 50,000 (right) steps of a random walk.</p>\n",
    "<img src=\"fig-diffu/rw1D_1sample.png\" width=800>\n",
    "\n",
    "<!-- end figure -->\n",
    "\n",
    "\n",
    "\n",
    "### Verification\n",
    "\n",
    "When we have a scalar and a vectorized code, it is always a good idea to\n",
    "develop a unit test for checking that they produce the same result.\n",
    "A problem in the present context is that the two versions apply two different\n",
    "random number generators. For a test to be meaningful, we need to fix\n",
    "the seed and use the same generator. This means that the scalar version\n",
    "must either use `np.random` or have this as an option. An option\n",
    "is the most flexible choice:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "        import random\n",
    "        \n",
    "        def random_walk1D(x0, N, p, random=random):\n",
    "            ...\n",
    "            r = random.uniform(0, 1)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using `random=np.random`, the `r` variable gets computed\n",
    "by `np.random.uniform`, and the sequence of random numbers will be\n",
    "the same as in the vectorized version that employs the same generator\n",
    "(given that the seed is also the same). A proper test function may be\n",
    "to check that the positions in the walk are the same in the scalar and\n",
    "vectorized implementations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test_random_walk1D():\n",
    "    # For fixed seed, check that scalar and vectorized versions\n",
    "    # produce the same result\n",
    "    x0 = 2;  N = 4;  p = 0.6\n",
    "    np.random.seed(10)\n",
    "    scalar_computed = random_walk1D(x0, N, p, random=np.random)\n",
    "    np.random.seed(10)\n",
    "    vectorized_computed = random_walk1D_vec(x0, N, p)\n",
    "    assert (scalar_computed == vectorized_computed).all()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that we employ `==` for arrays with real numbers, which is normally\n",
    "an inadequate test due to rounding errors, but in the present case,\n",
    "all arithmetics consists of adding or subtracting one, so these operations\n",
    "are expected to have no rounding errors. Comparing two `numpy` arrays\n",
    "with `==` results in a boolean array, so we need to call the `all()`\n",
    "method to ensure that all elements are `True`, i.e., that all elements\n",
    "in the two arrays match each other pairwise.\n",
    "\n",
    "\n",
    "## Equivalence with diffusion\n",
    "<div id=\"diffu:randomwalk:1D:pde\"></div>\n",
    "\n",
    "\n",
    "The original random walk algorithm can be said to work with\n",
    "dimensionless coordinates $\\bar x_i = -N + i$, $i=0,1,\\ldots, 2N+1$\n",
    "($i\\in [-N,N]$), and $\\bar t_n=n$, $n=0,1,\\ldots,N$.  A mesh with\n",
    "spacings $\\Delta x$ and $\\Delta t$ with dimensions can be introduced\n",
    "by"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "x_i = X_0 + \\bar x_i \\Delta x,\\quad t_n = \\bar t_n\\Delta t\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we implement the algorithm with dimensionless coordinates, we can just\n",
    "use this rescaling to obtain the movement in a coordinate system\n",
    "without unit spacings.\n",
    "\n",
    "Let $P^{n+1}_i$ be the probability of finding the particle at mesh point\n",
    "$\\bar x_i$ at time $\\bar t_{n+1}$. We can reach mesh point $(i,n+1)$ in two\n",
    "ways: either coming in from the left from $(i-1,n)$ or from the\n",
    "right ($i+1,n)$. Each has probability $\\frac{1}{2}$ (if we assume\n",
    "$p=q=\\frac{1}{2}$). The fundamental equation for $P^{n+1}_i$ is"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- Equation labels as ordinary links -->\n",
    "<div id=\"diffu:randomwalk:1D:pde:Markov\"></div>\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "P^{n+1}_i = \\frac{1}{2} P^{n}_{i-1} + \\frac{1}{2} P^{n}_{i+1}\\thinspace .\n",
    "\\label{diffu:randomwalk:1D:pde:Markov} \\tag{1}\n",
    "\\end{equation}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(This equation is easiest to understand if one looks at the random walk\n",
    "as a Markov process and applies the transition probabilities, but this is\n",
    "beyond scope of the present text.)\n",
    "\n",
    "Subtracting $P^{n}_i$ from ([1](#diffu:randomwalk:1D:pde:Markov)) results\n",
    "in"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "P^{n+1}_i - P^{n}_i = \\frac{1}{2} (P^{n}_{i-1} -2P^{n}_i + \\frac{1}{2} P^{n}_{i+1})\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Readers who have seen the Forward Euler discretization of a 1D\n",
    "diffusion equation recognize this scheme as very close to such a\n",
    "discretization. We have"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\frac{\\partial}{\\partial t}P(x_i,t_{n})\n",
    "= \\frac{P^{n+1}_i - P^{n}_i}{\\Delta t} + \\Oof{\\Delta t},\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or in dimensionless coordinates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\frac{\\partial}{\\partial\\bar t}P(\\bar x_i,\\bar t_n)\n",
    "\\approx P^{n+1}_i - P^{n}_i\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, we have"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\begin{align*}\n",
    "\\frac{\\partial^2}{\\partial x^2}P(x_i,t_n) &=\n",
    "\\frac{P^{n}_{i-1} -2P^{n}_i + \\frac{1}{2} P^{n}_{i+1}}{\\Delta x^2}\n",
    "+ \\Oof{\\Delta x^2},\\\\\n",
    "\\frac{\\partial^2}{\\partial x^2}P(\\bar x_i,\\bar t_n) &\\approx\n",
    "P^{n}_{i-1} -2P^{n}_i + \\frac{1}{2} P^{n}_{i+1}\\thinspace .\n",
    "\\end{align*}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Equation ([1](#diffu:randomwalk:1D:pde:Markov)) is therefore equivalent with\n",
    "the dimensionless diffusion equation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- Equation labels as ordinary links -->\n",
    "<div id=\"diffu:randomwalk:1D:pde:dimless\"></div>\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\frac{\\partial P}{\\partial\\bar t} = \\frac{1}{2}\n",
    "\\frac{\\partial^2 P}{\\partial \\bar x^2},\n",
    "\\label{diffu:randomwalk:1D:pde:dimless} \\tag{2}\n",
    "\\end{equation}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or the diffusion equation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- Equation labels as ordinary links -->\n",
    "<div id=\"diffu:randomwalk:1D:pde:dim\"></div>\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\frac{\\partial P}{\\partial t} = D\\frac{\\partial^2 P}{\\partial x^2},\n",
    "\\label{diffu:randomwalk:1D:pde:dim} \\tag{3}\n",
    "\\end{equation}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "with diffusion coefficient"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "D = \\frac{\\Delta x^2}{2\\Delta t}\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This derivation shows the tight link between random walk and diffusion.\n",
    "If we keep track of where the particle is, and repeat the process\n",
    "many times, or run the algorithms for lots of particles, the histogram\n",
    "of the positions will approximate the solution of the diffusion equation\n",
    "for the local probability $P^n_i$.\n",
    "\n",
    "Suppose all the random walks start at the origin. Then the initial\n",
    "condition for the probability distribution is the Dirac delta\n",
    "function $\\delta(x)$. The solution of ([2](#diffu:randomwalk:1D:pde:dimless))\n",
    "can be shown to be"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- Equation labels as ordinary links -->\n",
    "<div id=\"diffu:randomwalk:1D:pde:dimless:sol\"></div>\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\bar P(\\bar x,\\bar t) = \\frac{1}{\\sqrt{4\\pi\\dfc t}}e^{-\\frac{x^2}{4\\dfc t}},\n",
    "\\label{diffu:randomwalk:1D:pde:dimless:sol} \\tag{4}\n",
    "\\end{equation}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where $\\dfc = \\frac{1}{2}$.\n",
    "\n",
    "## Implementation of multiple walks\n",
    "\n",
    "Our next task is to implement an ensemble of walks (for statistics,\n",
    "see the section [Statistical considerations](#diffu:randomwalk:1D:EVar)) and also provide data from\n",
    "the walks such that we can compute the probabilities of the positions\n",
    "as introduced in the previous section. An appropriate representation\n",
    "of probabilities $P^n_i$ are histograms (with $i$ along the $x$ axis)\n",
    "for a few selected values of $n$.\n",
    "\n",
    "To estimate the expectation and variance of the random walks,\n",
    "the section [Statistical considerations](#diffu:randomwalk:1D:EVar) points to recording\n",
    "$\\sum_j x_{j,k}$ and $\\sum_j x_{j,k}^2$, where $x_{j,k}$ is the\n",
    "position at time/step level $k$ in random walk number $j$.\n",
    "The histogram of positions needs the individual values $x_{i,k}$\n",
    "for all $i$ values and some selected $k$ values.\n",
    "\n",
    "We introduce `position[k]` to hold $\\sum_j x_{j,k}$,\n",
    "`position2[k]` to hold $\\sum_j (x_{j,k})^2$, and\n",
    "`pos_hist[i,k]` to hold $x_{i,k}$. A selection of $k$ values can be\n",
    "specified by saying how many, `num_times`, and let them be equally\n",
    "spaced through time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "pos_hist_times = [(N//num_times)*i for i in range(num_times)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is one of the few situations where we want integer division (`//`) or\n",
    "real division rounded to an integer.\n",
    "\n",
    "### Scalar version\n",
    "\n",
    "Our scalar implementation of running `num_walks` random walks may go\n",
    "like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walks1D(x0, N, p, num_walks=1, num_times=1,\n",
    "                   random=random):\n",
    "    \"\"\"Simulate num_walks random walks from x0 with N steps.\"\"\"\n",
    "    position = np.zeros(N+1)    # Accumulated positions\n",
    "    position[0] = x0*num_walks\n",
    "    position2 = np.zeros(N+1)   # Accumulated positions**2\n",
    "    position2[0] = x0**2*num_walks\n",
    "    # Histogram at num_times selected time points\n",
    "    pos_hist = np.zeros((num_walks, num_times))\n",
    "    pos_hist_times = [(N//num_times)*i for i in range(num_times)]\n",
    "    #print 'save hist:', post_hist_times\n",
    "\n",
    "    for n in range(num_walks):\n",
    "        num_times_counter = 0\n",
    "        current_pos = x0\n",
    "        for k in range(N):\n",
    "            if k in pos_hist_times:\n",
    "                #print 'save, k:', k, num_times_counter, n\n",
    "                pos_hist[n,num_times_counter] = current_pos\n",
    "                num_times_counter += 1\n",
    "            # current_pos corresponds to step k+1\n",
    "            r = random.uniform(0, 1)\n",
    "            if r <= p:\n",
    "                current_pos -= 1\n",
    "            else:\n",
    "                current_pos += 1\n",
    "            position [k+1] += current_pos\n",
    "            position2[k+1] += current_pos**2\n",
    "    return position, position2, pos_hist, np.array(pos_hist_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Vectorized version\n",
    "\n",
    "We have already vectorized a single random walk. The additional\n",
    "challenge here is to vectorize the computation of the data for the\n",
    "histogram, `pos_hist`, but given the selected steps in `pos_hist_times`,\n",
    "we can find the corresponding positions by indexing with the\n",
    "list `pos_hist_times`: `position[post_hist_times]`, which are to be\n",
    "inserted in `pos_hist[n,:]`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walks1D_vec1(x0, N, p, num_walks=1, num_times=1):\n",
    "    \"\"\"Vectorized version of random_walks1D.\"\"\"\n",
    "    position  = np.zeros(N+1)    # Accumulated positions\n",
    "    position2 = np.zeros(N+1)    # Accumulated positions**2\n",
    "    walk = np.zeros(N+1)         # Positions of current walk\n",
    "    walk[0] = x0\n",
    "    # Histogram at num_times selected time points\n",
    "    pos_hist = np.zeros((num_walks, num_times))\n",
    "    pos_hist_times = [(N//num_times)*i for i in range(num_times)]\n",
    "\n",
    "    for n in range(num_walks):\n",
    "        r = np.random.uniform(0, 1, size=N)\n",
    "        steps = np.where(r <= p, -1, 1)\n",
    "        walk[1:] = x0 + np.cumsum(steps)  # Positions of this walk\n",
    "        position  += walk\n",
    "        position2 += walk**2\n",
    "        pos_hist[n,:] = walk[pos_hist_times]\n",
    "    return position, position2, pos_hist, np.array(pos_hist_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Improved vectorized version\n",
    "\n",
    "Looking at the vectorized version above, we still have one potentially\n",
    "long Python loop over `n`. Normally, `num_walks` will be much larger than `N`.\n",
    "The vectorization of the loop over `N` certainly speeds up the program,\n",
    "but if we think of vectorization as also a way to parallelize the code,\n",
    "all the independent walks (the `n` loop) can be executed in parallel.\n",
    "Therefore, we should include this loop as well in the vectorized\n",
    "expressions, at the expense of using more memory.\n",
    "\n",
    "We introduce the array `walks` to hold the $N+1$ steps of all the walks:\n",
    "each row represents the steps in one walk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "walks = np.zeros((num_walks, N+1))  # Positions of each walk\n",
    "walks[:,0] = x0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since all the steps are independent, we can just make one long\n",
    "vector of enough random numbers (`N*num_walks`), translate these\n",
    "numbers to $\\pm 1$, then we reshape the array such that the steps\n",
    "of each walk are stored in the rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "r = np.random.uniform(0, 1, size=N*num_walks)\n",
    "steps = np.where(r <= p, -1, 1).reshape(num_walks, N)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is to sum up the steps in each walk. We need the\n",
    "`np.cumsum` function for this, with the argument `axis=1` for\n",
    "indicating a sum across the columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(6).reshape(2,3)\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.cumsum(a, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now `walks` can be computed by"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "walks[:,1:] = x0 + np.cumsum(steps, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `position` vector is the sum of all the walks. That is, we want to\n",
    "sum all the rows, obtained by"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "position  = np.sum(walks, axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A corresponding expression computes the squares of the positions.\n",
    "Finally, we need to compute `pos_hist`, but that is a matter of\n",
    "grabbing some of the walks (according to `pos_hist_times`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "pos_hist[:,:] = walks[:,pos_hist_times]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The complete vectorized algorithm without any loop can now be\n",
    "summarized:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walks1D_vec2(x0, N, p, num_walks=1, num_times=1):\n",
    "    \"\"\"Vectorized version of random_walks1D; no loops.\"\"\"\n",
    "    position  = np.zeros(N+1)    # Accumulated positions\n",
    "    position2 = np.zeros(N+1)    # Accumulated positions**2\n",
    "    walks = np.zeros((num_walks, N+1))  # Positions of each walk\n",
    "    walks[:,0] = x0\n",
    "    # Histogram at num_times selected time points\n",
    "    pos_hist = np.zeros((num_walks, num_times))\n",
    "    pos_hist_times = [(N//num_times)*i for i in range(num_times)]\n",
    "\n",
    "    r = np.random.uniform(0, 1, size=N*num_walks)\n",
    "    steps = np.where(r <= p, -1, 1).reshape(num_walks, N)\n",
    "    walks[:,1:] = x0 + np.cumsum(steps, axis=1)\n",
    "    position  = np.sum(walks,    axis=0)\n",
    "    position2 = np.sum(walks**2, axis=0)\n",
    "    pos_hist[:,:] = walks[:,pos_hist_times]\n",
    "    return position, position2, pos_hist, np.array(pos_hist_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What is the gain of the vectorized implementations? One important gain\n",
    "is that each vectorized operation can be automatically parallelized\n",
    "if one applies a parallel `numpy` library like [Numba](http://numba.pydata.org). On a single CPU, however, the speed up of the vectorized operations\n",
    "is also significant. With $N=1,000$ and 50,000 repeated walks,\n",
    "the two vectorized versions run about 25 and 18 times faster than the scalar\n",
    "version, with `random_walks1D_vec1` being fastest.\n",
    "<!-- The downside of vectorization of random walks is the large -->\n",
    "<!-- arrays that arise, especially the version in `random_walks1D_vec2`. -->\n",
    "<!-- CPU: -->\n",
    "<!-- N=1000, 50K walks -->\n",
    "<!-- scalar: 42, vec1: 1.7, vec2: 1.7, factors 25 and 25 -->\n",
    "\n",
    "### Remark on vectorized code and parallelization\n",
    "\n",
    "Our first attempt on vectorization removed the loop over the $N$ steps in\n",
    "a single walk. However, the number of walks is usually much larger than\n",
    "$N$, because of the need for accurate statistics. Therefore, we should\n",
    "rather remove the loop over all walks. mathcal{I}_t turns out, from our efficiency\n",
    "experiments, that the function `random_walks1D_vec2` (with no loops) is\n",
    "slower than `random_walks1D_vec1`. This is a bit surprising and may be\n",
    "explained by less efficiency in the statements involving very large\n",
    "arrays, containing all steps for all walks at once.\n",
    "\n",
    "From a parallelization and improved vectorization point of view, it\n",
    "would be more natural to switch the sequence of the loops in the\n",
    "serial code such that the shortest loop is the outer loop:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walks1D2(x0, N, p, num_walks=1, num_times=1, ...):\n",
    "    ...\n",
    "    current_pos = x0 + np.zeros(num_walks)\n",
    "    num_times_counter = -1\n",
    "\n",
    "    for k in range(N):\n",
    "        if k in pos_hist_times:\n",
    "\t    num_times_counter += 1\n",
    "\t    store_hist = True\n",
    "\telse:\n",
    "\t    store_hist = False\n",
    "\n",
    "        for n in range(num_walks):\n",
    "            # current_pos corresponds to step k+1\n",
    "            r = random.uniform(0, 1)\n",
    "\t    if r <= p:\n",
    "                current_pos[n] -= 1\n",
    "            else:\n",
    "                current_pos[n] += 1\n",
    "            position [k+1] += current_pos[n]\n",
    "            position2[k+1] += current_pos[n]**2\n",
    "            if store_hist:\n",
    "                pos_hist[n,num_times_counter] = current_pos[n]\n",
    "    return position, position2, pos_hist, np.array(pos_hist_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The vectorized version of this code, where we just vectorize the\n",
    "loop over `n`, becomes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walks1D2_vec1(x0, N, p, num_walks=1, num_times=1):\n",
    "    \"\"\"Vectorized version of random_walks1D2.\"\"\"\n",
    "    position  = np.zeros(N+1)    # Accumulated positions\n",
    "    position2 = np.zeros(N+1)    # Accumulated positions**2\n",
    "    # Histogram at num_times selected time points\n",
    "    pos_hist = np.zeros((num_walks, num_times))\n",
    "    pos_hist_times = [(N//num_times)*i for i in range(num_times)]\n",
    "\n",
    "    current_pos = np.zeros(num_walks)\n",
    "    current_pos[0] = x0\n",
    "    num_times_counter = -1\n",
    "\n",
    "    for k in range(N):\n",
    "        if k in pos_hist_times:\n",
    "\t    num_times_counter += 1\n",
    "\t    store_hist = True  # Store histogram data for this k\n",
    "\telse:\n",
    "\t    store_hist = False\n",
    "\n",
    "        # Move all walks one step\n",
    "        r = np.random.uniform(0, 1, size=num_walks)\n",
    "        steps = np.where(r <= p, -1, 1)\n",
    "        current_pos += steps\n",
    "        position[k+1]  = np.sum(current_pos)\n",
    "        position2[k+1] = np.sum(current_pos**2)\n",
    "        if store_hist:\n",
    "            pos_hist[:,num_times_counter] = current_pos\n",
    "    return position, position2, pos_hist, np.array(pos_hist_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This function runs significantly faster than the `random_walks1D_vec1`\n",
    "function above, typically 1.7 times faster. The code is also more appropriate\n",
    "in a parallel computing context since each vectorized statement\n",
    "can work with data of size `num_walks` over the compute units, repeated `N`\n",
    "times (compared with data of size `N`, repeated `num_walks` times, in\n",
    "`random_walks1D_vec1`).\n",
    "\n",
    "The scalar code with switched loops, `random_walks1D2` runs a bit slower\n",
    "than the original code in `random_walks1D`, so with the longest loop as\n",
    "the inner loop, the vectorized function `random_walks1D2_vec1`\n",
    "is almost 60 times faster than the scalar counterpart, while the\n",
    "code `random_walks1D_vec2` without loops is only around 18 times faster.\n",
    "Taking into account the very large arrays required by the latter function,\n",
    "we end up with `random_walks1D2_vec1` as the preferred implementation.\n",
    "\n",
    "### Test function\n",
    "\n",
    "During program development, it is highly recommended to carry out\n",
    "computations by hand for, e.g., `N=4` and `num_walks=3`.  Normally,\n",
    "this is done by executing the program with these parameters and\n",
    "checking with pen and paper that the computations make sense.  The\n",
    "next step is to use this test for correctness in a formal test\n",
    "function.\n",
    "\n",
    "First, we need to check that the simulation of multiple random walks\n",
    "reproduces the results of `random_walk1D`, `random_walk1D_vec1`, and\n",
    "`random_walk1D_vec2` for the first walk, if the seed is the\n",
    "same. Second, we run three random walks (`N=4`) with the scalar and\n",
    "the two vectorized versions and check that the returned arrays are\n",
    "identical.\n",
    "\n",
    "For this type of test to be successful, we must be sure that exactly\n",
    "the same set of random numbers are used in the three versions, a fact\n",
    "that requires the same random number generator and the same seed, of\n",
    "course, but also the same sequence of computations.  This is not\n",
    "obviously the case with the three `random_walk1D*` functions we\n",
    "have presented. The critical issue in `random_walk1D_vec1` is that the\n",
    "first random numbers are used for the first walk, the second set of\n",
    "random numbers is used for the second walk and so on, to be compatible\n",
    "with how the random numbers are used in the function `random_walk1D`.\n",
    "For the function `random_walk1D_vec2` the situation is a bit more\n",
    "complicated since we generate all the random numbers at once.\n",
    "However, the critical step now is the reshaping of the array returned\n",
    "from `np.where`: we must reshape as `(num_walks, N)` to ensure that\n",
    "the first `N` random numbers are used for the first walk, the next `N`\n",
    "numbers are used for the second walk, and so on.\n",
    "\n",
    "We arrive at the test function below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test_random_walks1D():\n",
    "    # For fixed seed, check that scalar and vectorized versions\n",
    "    # produce the same result\n",
    "    x0 = 0;  N = 4;  p = 0.5\n",
    "\n",
    "    # First, check that random_walks1D for 1 walk reproduces\n",
    "    # the walk in random_walk1D\n",
    "    num_walks = 1\n",
    "    np.random.seed(10)\n",
    "    computed = random_walks1D(\n",
    "        x0, N, p, num_walks, random=np.random)\n",
    "    np.random.seed(10)\n",
    "    expected = random_walk1D(\n",
    "        x0, N, p, random=np.random)\n",
    "    assert (computed[0] == expected).all()\n",
    "\n",
    "    # Same for vectorized versions\n",
    "    np.random.seed(10)\n",
    "    computed = random_walks1D_vec1(x0, N, p, num_walks)\n",
    "    np.random.seed(10)\n",
    "    expected = random_walk1D_vec(x0, N, p)\n",
    "    assert (computed[0] == expected).all()\n",
    "    np.random.seed(10)\n",
    "    computed = random_walks1D_vec2(x0, N, p, num_walks)\n",
    "    np.random.seed(10)\n",
    "    expected = random_walk1D_vec(x0, N, p)\n",
    "    assert (computed[0] == expected).all()\n",
    "\n",
    "    # Second, check multiple walks: scalar == vectorized\n",
    "    num_walks = 3\n",
    "    num_times = N\n",
    "    np.random.seed(10)\n",
    "    serial_computed = random_walks1D(\n",
    "        x0, N, p, num_walks, num_times, random=np.random)\n",
    "    np.random.seed(10)\n",
    "    vectorized1_computed = random_walks1D_vec1(\n",
    "        x0, N, p, num_walks, num_times)\n",
    "    np.random.seed(10)\n",
    "    vectorized2_computed = random_walks1D_vec2(\n",
    "        x0, N, p, num_walks, num_times)\n",
    "    # positions: [0, 1, 0, 1, 2]\n",
    "    # Can test without tolerance since everything is +/- 1\n",
    "    return_values = ['pos', 'pos2', 'pos_hist', 'pos_hist_times']\n",
    "    for s, v, r in zip(serial_computed,\n",
    "                       vectorized1_computed,\n",
    "                       return_values):\n",
    "        msg = '%s: %s (serial) vs %s (vectorized)' % (r, s, v)\n",
    "        assert (s == v).all(), msg\n",
    "    for s, v, r in zip(serial_computed,\n",
    "                       vectorized2_computed,\n",
    "                       return_values):\n",
    "        msg = '%s: %s (serial) vs %s (vectorized)' % (r, s, v)\n",
    "        assert (s == v).all(), msg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Such test functions are indispensable for further development of the code\n",
    "as we can at any time test whether the basic computations remain correct or not.\n",
    "This is particularly important in stochastic simulations since without\n",
    "test functions and fixed seeds, we always experience variations from run to\n",
    "run, and it can be very difficult to spot bugs through averaged statistical\n",
    "quantities.\n",
    "\n",
    "## Demonstration of multiple walks\n",
    "\n",
    "Assuming now that the code works, we can just scale up the number of\n",
    "steps in each walk and the number of walks. The latter influences the\n",
    "accuracy of the statistical estimates.  [Figure](#diffu:randomwalk:1D:fig:demo1:EX) shows the impact of the number\n",
    "of walks on the expectation, which should approach zero.  [Figure](#diffu:randomwalk:1D:fig:demo1:VarX) displays the corresponding\n",
    "estimate of the variance of the position, which should grow linearly\n",
    "with the number of steps. mathcal{I}_t does, seemingly very accurately, but\n",
    "notice that the scale on the $y$ axis is so much larger than for the\n",
    "expectation, so irregularities due to the stochastic nature of the\n",
    "process become so much less visible in the variance plots.  The\n",
    "probability of finding a particle at a certain position at time (or\n",
    "step) 800 is shown in [Figure](#diffu:randomwalk:1D:fig:demo1:HistX). The dashed red line is the\n",
    "theoretical distribution ([4](#diffu:randomwalk:1D:pde:dimless:sol))\n",
    "arising from solving the diffusion equation\n",
    "([2](#diffu:randomwalk:1D:pde:dimless)) instead. As always, we realize\n",
    "that one needs significantly more statistical samples to estimate a\n",
    "histogram accurately than the expectation or variance.\n",
    "\n",
    "<!-- dom:FIGURE: [fig-diffu/rw1D_EX_100_10000_100000_1000000.png, width=800 frac=1] Estimated expected value for 1000 steps, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right). <div id=\"diffu:randomwalk:1D:fig:demo1:EX\"></div> -->\n",
    "<!-- begin figure -->\n",
    "<div id=\"diffu:randomwalk:1D:fig:demo1:EX\"></div>\n",
    "\n",
    "<p>Estimated expected value for 1000 steps, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right).</p>\n",
    "<img src=\"fig-diffu/rw1D_EX_100_10000_100000_1000000.png\" width=800>\n",
    "\n",
    "<!-- end figure -->\n",
    "\n",
    "\n",
    "<!-- dom:FIGURE: [fig-diffu/rw1D_VarX_100_10000_100000_1000000.png, width=800 frac=1] Estimated variance over 1000 steps, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right). <div id=\"diffu:randomwalk:1D:fig:demo1:VarX\"></div> -->\n",
    "<!-- begin figure -->\n",
    "<div id=\"diffu:randomwalk:1D:fig:demo1:VarX\"></div>\n",
    "\n",
    "<p>Estimated variance over 1000 steps, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right).</p>\n",
    "<img src=\"fig-diffu/rw1D_VarX_100_10000_100000_1000000.png\" width=800>\n",
    "\n",
    "<!-- end figure -->\n",
    "\n",
    "\n",
    "<!-- dom:FIGURE: [fig-diffu/rw1D_HistX_100_10000_100000_1000000.png, width=800 frac=1] Estimated probability distribution at step 800, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right). <div id=\"diffu:randomwalk:1D:fig:demo1:HistX\"></div> -->\n",
    "<!-- begin figure -->\n",
    "<div id=\"diffu:randomwalk:1D:fig:demo1:HistX\"></div>\n",
    "\n",
    "<p>Estimated probability distribution at step 800, using 100 walks (upper left), 10,000 (upper right), 100,000 (lower left), and 1,000,000 (lower right).</p>\n",
    "<img src=\"fig-diffu/rw1D_HistX_100_10000_100000_1000000.png\" width=800>\n",
    "\n",
    "<!-- end figure -->\n",
    "\n",
    "\n",
    "\n",
    "## Ascii visualization of 1D random walk\n",
    "<div id=\"diffu:randomwalk:1D:avplotter\"></div>\n",
    "\n",
    "\n",
    "If we want to study (very) long time series of random walks, it can be\n",
    "convenient to plot the position in a terminal window with the time axis\n",
    "pointing downwards. The module `avplotter` in SciTools has a class `Plotter`\n",
    "for plotting functions in the terminal window with the aid of ascii symbols\n",
    "only. Below is the code required to visualize a simple random walk,\n",
    "starting at the origin, and considered over\n",
    "when the point $x=-1$ is reached. We use a spacing $\\Delta x = 0.05$ (so\n",
    "$x=-1$ corresponds to $i=-20$)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "def run_random_walk():\n",
    "    from scitools.avplotter import Plotter\n",
    "    import time, numpy as np\n",
    "    p = Plotter(-1, 1, width=75)   # Horizontal axis: 75 chars wide\n",
    "    dx = 0.05\n",
    "    np.random.seed(10)\n",
    "\n",
    "    x = 0\n",
    "    while True:\n",
    "        random_step = 1 if np.random.random() > 0.5 else -1\n",
    "        x = x + dx*random_step\n",
    "        if x < -1:\n",
    "            break                  # Destination reached!\n",
    "        print p.plot(0, x)\n",
    "\n",
    "        # Allow Ctrl+c to abort the simulation\n",
    "        try:\n",
    "            time.sleep(0.1)  # Wait for interrupt\n",
    "        except KeyboardInterrupt:\n",
    "            print 'Interrupted by Ctrl+c'\n",
    "            break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Observe that we implement an infinite loop, but allow a smooth interrupt\n",
    "of the program by `Ctrl+c` through Python's `KeyboardInterrupt`\n",
    "exception. This is a useful recipe that can be used in many occasions!\n",
    "\n",
    "The output looks typically like"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "                                    *         |\n",
    "                                      *       |\n",
    "                                    *         |\n",
    "                                      *       |\n",
    "                                    *         |\n",
    "                                      *       |\n",
    "                                        *     |\n",
    "                                      *       |\n",
    "                                    *         |\n",
    "                                  *           |\n",
    "                                *             |\n",
    "                              *               |\n",
    "                                *             |\n",
    "                              *               |\n",
    "                             *                |\n",
    "                           *                  |\n",
    "                         *                    |\n",
    "                       *                      |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Positions beyond the limits of the $x$ axis appear with a value.\n",
    "[A long file](http://bit.ly/1UbULeH)\n",
    "contains the complete ascii plot corresponding to the\n",
    "function `run_random_walk` above.\n",
    "<!-- \"https://github.com/hplgit/fdm-book/blob/master/doc/.src/chapters/diffu/fig-diffu/rw_ascii.txt\" -->\n",
    "\n",
    "## Random walk as a stochastic equation\n",
    "<div id=\"diffu:randomwalk:1D:ode\"></div>\n",
    "\n",
    "\n",
    "The (dimensionless) position in a random walk, $\\bar X_k$, can be expressed as\n",
    "a stochastic difference equation:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- Equation labels as ordinary links -->\n",
    "<div id=\"diffu:randomwalk:1D:ode:x\"></div>\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\bar X_k = \\bar X_{k-1} + s, \\quad x_0=0,\n",
    "\\label{diffu:randomwalk:1D:ode:x} \\tag{5}\n",
    "\\end{equation}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where $s$ is a [Bernoulli variable](https://en.wikipedia.org/wiki/Bernoulli_distribution),\n",
    "taking on the two values $s=-1$ and $s=1$\n",
    "with equal probability:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\hbox{P}(s=1)=\\frac{1}{2},\\quad \\hbox{P}(s=-1)=\\frac{1}{2}\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The $s$ variable in a step is independent of the $s$ variable in other steps.\n",
    "\n",
    "The difference equation expresses essentially the sum of independent\n",
    "Bernoulli variables.  Because of the central limit theorem, $X_k$,\n",
    "will then be normally distributed with expectation $k\\E{s}$ and\n",
    "$k\\Var{s}$.  The expectation and variance of a Bernoulli variable with\n",
    "values $r=0$ and $r=1$ are $p$ and $p(1-p)$, respectively.\n",
    "The variable $s=2r-1$ then has expectation\n",
    "$2\\E{r}-1=2p-1=0$ and variance $2^2\\Var{r}=4p(1-p)=1$.  The position\n",
    "$X_k$ is normally distributed with zero expectation and variance $k$,\n",
    "as we found in the section [Statistical considerations](#diffu:randomwalk:1D:EVar).\n",
    "\n",
    "The central limit theorem tells that as long as $k$ is not small,\n",
    "the distribution of $X_k$ remains the same if\n",
    "we replace the Bernoulli variable $s$ by any other stochastic variable with\n",
    "the same expectation and variance. In particular, we may let $s$ be a\n",
    "standardized Gaussian variable (zero mean, unit variance).\n",
    "\n",
    "<!-- Let us introduce -->\n",
    "<!-- !bt -->\n",
    "<!-- x_k = \\Delta x\\bar x_k,\\] -->\n",
    "<!-- !et -->\n",
    "<!-- such that we take steps of length $\\Delta x$. -->\n",
    "Dividing ([5](#diffu:randomwalk:1D:ode:x)) by $\\Delta t$ gives"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "\\frac{\\bar X_k - \\bar X_{k-1}}{\\Delta t} = \\frac{1}{\\Delta t} s\\thinspace .\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the limit $\\Delta t\\rightarrow 0$, $s/\\Delta t$ approaches a white noise\n",
    "stochastic process.\n",
    "With $\\bar X(t)$ as the continuous process in the limit\n",
    "$\\Delta t\\rightarrow 0$ ($X_k\\rightarrow X(t_k)$),\n",
    "we formally get the stochastic differential equation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- Equation labels as ordinary links -->\n",
    "<div id=\"_auto1\"></div>\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "d\\bar X = dW,\n",
    "\\label{_auto1} \\tag{6}\n",
    "\\end{equation}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "where $W(t)$ is a [Wiener process](https://en.wikipedia.org/wiki/Wiener_process). Then $X$ is also a\n",
    "Wiener process. mathcal{I}_t follows from the stochastic ODE $dX=dW$ that the\n",
    "probability distribution of $X$ is given by the [Fokker-Planck\n",
    "equation](https://en.wikipedia.org/wiki/Fokker-Planck_equation)\n",
    "([2](#diffu:randomwalk:1D:pde:dimless)).  In other words, the key\n",
    "results for random walk we found earlier can alternatively be\n",
    "derived via a stochastic ordinary differential equation and its\n",
    "related Fokker-Planck equation.\n",
    "\n",
    "\n",
    "\n",
    "## Random walk in 2D\n",
    "\n",
    "The most obvious generalization of 1D random walk to two spatial\n",
    "dimensions is to allow movements to the north, east, south, and west,\n",
    "with equal probability $\\frac{1}{4}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walk2D(x0, N, p, random=random):\n",
    "    \"\"\"2D random walk with 1 particle and N moves: N, E, W, S.\"\"\"\n",
    "    # Store position in step k in position[k]\n",
    "    d = len(x0)\n",
    "    position = np.zeros((N+1, d))\n",
    "    position[0,:] = x0\n",
    "    current_pos = np.array(x0, dtype=float)\n",
    "    for k in range(N):\n",
    "        r = random.uniform(0, 1)\n",
    "        if r <= 0.25:\n",
    "            current_pos += np.array([0, 1])   # Move north\n",
    "        elif 0.25 < r <= 0.5:\n",
    "            current_pos += np.array([1, 0])   # Move east\n",
    "        elif 0.5 < r <= 0.75:\n",
    "            current_pos += np.array([0, -1])  # Move south\n",
    "        else:\n",
    "            current_pos += np.array([-1, 0])  # Move west\n",
    "        position[k+1,:] = current_pos\n",
    "    return position"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The left plot in [Figure](#diffu:randomwalk:2D:fig:rect_vs_diag) provides\n",
    "an example on 200 steps with this kind of walk. We may refer to this walk\n",
    "as a walk on a *rectangular mesh* as we move from any spatial\n",
    "mesh point $(i,j)$ to one of its four neighbors in the rectangular directions:\n",
    "$(i+1,j)$, $(i-1,j)$, $(i,j+1)$, or $(i,j-1)$.\n",
    "\n",
    "<!-- dom:FIGURE: [fig-diffu/rw2D_sample200.png, width=800 frac=1] Random walks in 2D with 200 steps: rectangular mesh (left) and diagonal mesh (right). <div id=\"diffu:randomwalk:2D:fig:rect_vs_diag\"></div> -->\n",
    "<!-- begin figure -->\n",
    "<div id=\"diffu:randomwalk:2D:fig:rect_vs_diag\"></div>\n",
    "\n",
    "<p>Random walks in 2D with 200 steps: rectangular mesh (left) and diagonal mesh (right).</p>\n",
    "<img src=\"fig-diffu/rw2D_sample200.png\" width=800>\n",
    "\n",
    "<!-- end figure -->\n",
    "\n",
    "\n",
    "\n",
    "## Random walk in any number of space dimensions\n",
    "\n",
    "From a programming point of view, especially when implementing a random\n",
    "walk in any number of dimensions, it is more natural to consider a walk\n",
    "in the diagonal directions NW, NE, SW, and SE. On a two-dimensional spatial mesh\n",
    "it means that we go from $(i,j)$ to either $(i+1,j+1)$, $(i-1,j+1)$,\n",
    "$(i+1,j-1)$, or $(i-1,j-1)$. We can with such a *diagonal mesh*\n",
    "(see right plot in [Figure](#diffu:randomwalk:2D:fig:rect_vs_diag))\n",
    "draw a Bernoulli variable for the step in each spatial direction and\n",
    "trivially write code that works in any number of spatial directions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walkdD(x0, N, p, random=random):\n",
    "    \"\"\"Any-D (diagonal) random walk with 1 particle and N moves.\"\"\"\n",
    "    # Store position in step k in position[k]\n",
    "    d = len(x0)\n",
    "    position = np.zeros((N+1, d))\n",
    "    position[0,:] = x0\n",
    "    current_pos = np.array(x0, dtype=float)\n",
    "    for k in range(N):\n",
    "        for i in range(d):\n",
    "            r = random.uniform(0, 1)\n",
    "            if r <= p:\n",
    "                current_pos[i] -= 1\n",
    "            else:\n",
    "                current_pos[i] += 1\n",
    "        position[k+1,:] = current_pos\n",
    "    return position"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A vectorized version is desired. We follow the ideas from the section [Playing around with some code](#diffu:randomwalk:1D:code1), but each step is now a vector in $d$\n",
    "spatial dimensions. We therefore need to draw $Nd$ random numbers in `r`,\n",
    "compute steps in the various directions through `np.where(r <=p, -1, 1)`\n",
    "(each step being $-1$ or $1$),\n",
    "and then we can reshape this array to an $N\\times d$ array of step\n",
    "*vectors*. Doing an `np.cumsum` summation along axis 0 will add\n",
    "the vectors, as this demo shows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(6).reshape(3,2)\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.cumsum(a, axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With such summation of step vectors, we get all the positions to be\n",
    "filled in the `position` array:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walkdD_vec(x0, N, p):\n",
    "    \"\"\"Vectorized version of random_walkdD.\"\"\"\n",
    "    d = len(x0)\n",
    "    # Store position in step k in position[k]\n",
    "    position = np.zeros((N+1,d))\n",
    "    position[0] = np.array(x0, dtype=float)\n",
    "    r = np.random.uniform(0, 1, size=N*d)\n",
    "    steps = np.where(r <= p, -1, 1).reshape(N,d)\n",
    "    position[1:,:] = x0 + np.cumsum(steps, axis=0)\n",
    "    return position"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- dom:FIGURE: [fig-diffu/rw2D_samples_5000.png, width=800 frac=1] Four random walks with 5000 steps in 2D. <div id=\"diffu:randomwalk:2D:fig:samples\"></div> -->\n",
    "<!-- begin figure -->\n",
    "<div id=\"diffu:randomwalk:2D:fig:samples\"></div>\n",
    "\n",
    "<p>Four random walks with 5000 steps in 2D.</p>\n",
    "<img src=\"fig-diffu/rw2D_samples_5000.png\" width=800>\n",
    "\n",
    "<!-- end figure -->\n",
    "\n",
    "\n",
    "## Multiple random walks in any number of space dimensions\n",
    "\n",
    "As we did in 1D, we extend one single walk to a number of walks (`num_walks`\n",
    "in the code).\n",
    "\n",
    "### Scalar code\n",
    "\n",
    "As always, we start with implementing the scalar case:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walksdD(x0, N, p, num_walks=1, num_times=1,\n",
    "                   random=random):\n",
    "    \"\"\"Simulate num_walks random walks from x0 with N steps.\"\"\"\n",
    "    d = len(x0)\n",
    "    position  = np.zeros((N+1, d))   # Accumulated positions\n",
    "    position2 = np.zeros((N+1, d))   # Accumulated positions**2\n",
    "    # Histogram at num_times selected time points\n",
    "    pos_hist = np.zeros((num_walks, num_times, d))\n",
    "    pos_hist_times = [(N//num_times)*i for i in range(num_times)]\n",
    "\n",
    "    for n in range(num_walks):\n",
    "        num_times_counter = 0\n",
    "        current_pos = np.array(x0, dtype=float)\n",
    "        for k in range(N):\n",
    "            if k in pos_hist_times:\n",
    "                pos_hist[n,num_times_counter,:] = current_pos\n",
    "                num_times_counter += 1\n",
    "            # current_pos corresponds to step k+1\n",
    "            for i in range(d):\n",
    "                r = random.uniform(0, 1)\n",
    "                if r <= p:\n",
    "                    current_pos[i] -= 1\n",
    "                else:\n",
    "                    current_pos[i] += 1\n",
    "            position [k+1,:] += current_pos\n",
    "            position2[k+1,:] += current_pos**2\n",
    "    return position, position2, pos_hist, np.array(pos_hist_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Vectorized code\n",
    "\n",
    "Significant speed-ups can be obtained by vectorization. We get rid of\n",
    "the loops in the previous function and arrive at the following vectorized\n",
    "code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "def random_walksdD_vec(x0, N, p, num_walks=1, num_times=1):\n",
    "    \"\"\"Vectorized version of random_walks1D; no loops.\"\"\"\n",
    "    d = len(x0)\n",
    "    position  = np.zeros((N+1, d))  # Accumulated positions\n",
    "    position2 = np.zeros((N+1, d))  # Accumulated positions**2\n",
    "    walks = np.zeros((num_walks, N+1, d))  # Positions of each walk\n",
    "    walks[:,0,:] = x0\n",
    "    # Histogram at num_times selected time points\n",
    "    pos_hist = np.zeros((num_walks, num_times, d))\n",
    "    pos_hist_times = [(N//num_times)*i for i in range(num_times)]\n",
    "\n",
    "    r = np.random.uniform(0, 1, size=N*num_walks*d)\n",
    "    steps = np.where(r <= p, -1, 1).reshape(num_walks, N, d)\n",
    "    walks[:,1:,:] = x0 + np.cumsum(steps, axis=1)\n",
    "    position  = np.sum(walks,    axis=0)\n",
    "    position2 = np.sum(walks**2, axis=0)\n",
    "    pos_hist[:,:,:] = walks[:,pos_hist_times,:]\n",
    "    return position, position2, pos_hist, np.array(pos_hist_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- 2D N=4M -->\n",
    "<!-- scalar/vec (1 walk): 25 -->\n",
    "\n",
    "<!-- 2D N=5000 -->\n",
    "<!-- num_walks=10K -->\n",
    "<!-- scalar/vec=50 -->\n",
    "<!-- 3D: 40 -->\n",
    "<!-- 1D: 100 -->\n",
    "<!-- scalar is about the same for 1D, 2D, 3D, but vec is 0.3, 0.7, 1.0 -->"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}