{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# CR\u00dcCIAL P\u0178THON Week 5: Outers"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%pylab inline\n",
      "\n",
      "# We'll be using some pretty-printing later on\n",
      "from pprint import pprint"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Populating the interactive namespace from numpy and matplotlib\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.core.display import Image \n",
      "Image(url='http://labrosa.ee.columbia.edu/crucialpython/logo.png', width=600) "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<img src=\"http://labrosa.ee.columbia.edu/crucialpython/logo.png\" width=\"600\"/>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "<IPython.core.display.Image at 0x410a550>"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Cast ye out, tight loops!\n",
      "\n",
      "High-level programs sometimes have computational bottleneck at a tight loop, or even nested loops.  This comes up frequently when comparing two sets of objects, say to build a similarity matrix or do collision detection.\n",
      "\n",
      "A common way to eliminate these bottlenecks is to implement the critical portion in a low-level language, such as C, so that the python interpreter doesn't have to work as hard.  Today we'll look at NumPy's `outer` functionality, which when used correctly, can replace many tight-loop patterns that occur in practice.\n",
      "\n",
      "## Example: point range search\n",
      "\n",
      "Say we have two lists of numbers `a` and `b`, and we wish to find all the pairs `a[i], b[j]` such that `a[i]` is within some distance `r` of `b[i]`.  That is, find all $i$ and $j$ such that\n",
      "\n",
      "$$\n",
      "    |a_i - b_i| \\leq r\n",
      "$$\n",
      "\n",
      "This can be done trivially by nested iteration, as seen below."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Let's make a function to generate some example data\n",
      "def make_point_data(n, m):\n",
      "    a = np.arange(-n/2.0, 1 + n / 2.0)\n",
      "    b = np.random.randn(m)\n",
      "\n",
      "    # For convenience, let's sort b\n",
      "    b = np.sort(b)\n",
      "\n",
      "    return a, b"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 13
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Build a list and iterate over all pairs\n",
      "def find_similar_points(a, b, r):\n",
      "    '''Given two lists of numbers a and b, and a threshold r,\n",
      "       returns a matrix `hits` such that:\n",
      "       \n",
      "       |a[ hits[i, 0] ] - b[ hits[j, 1] ]| <= r\n",
      "       \n",
      "    '''\n",
      "    \n",
      "    hits = []\n",
      "    for i, ai in enumerate(a):\n",
      "        for j, bj in enumerate(b):\n",
      "            if np.abs(ai - bj) <= r:\n",
      "                hits.append( (i, j) )\n",
      "                \n",
      "    # Return indices in ndarray format\n",
      "    return np.asarray(hits)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Generate and display the example data\n",
      "a, b = make_point_data(5, 10)\n",
      "# And a threshold\n",
      "r = 0.25\n",
      "\n",
      "print 'a: ', a\n",
      "print 'b: ', b\n",
      "print 'r: ', r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "a:  [-2.5 -1.5 -0.5  0.5  1.5  2.5]\n",
        "b:  [-0.81584428  0.27568022  0.3787181   0.39691845  0.45836127  0.8646873\n",
        "  1.00730788  1.44877987  1.56445537  2.01475412]\n",
        "r:  0.25\n"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# What'd we get?\n",
      "print 'Hits: '\n",
      "print find_similar_points(a, b, r)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Hits: \n",
        "[[3 1]\n",
        " [3 2]\n",
        " [3 3]\n",
        " [3 4]\n",
        " [4 7]\n",
        " [4 8]]\n"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---\n",
      "\n",
      "# Example 2: interval intersection\n",
      "Let's consider a slightly more involved example.  Say we have a set of real line intervals $\\{(s_i, t_i)\\}$ where each ordered pair denotes the start and end of the interval.\n",
      "\n",
      "One quantity of interest is the length of intersection between any two intervals:\n",
      "\n",
      "$$\n",
      "    c(i, j) = |(s_i, t_i) \\cap (s_j, t_j)| = \\max\\left(0, \\min(t_i, t_j) - \\max(s_i, s_j) \\right)\n",
      "$$\n",
      "\n",
      "Again, this can be computed by nested iteration, as seen below."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def make_interval_data(n):\n",
      "    # Generate random start times\n",
      "    S = np.random.randn(n)\n",
      "    \n",
      "    # Sort it, for convenience\n",
      "    S = np.sort(S)\n",
      "\n",
      "    # Add a random positive length to each one\n",
      "    T = S + np.abs(np.random.randn(n))\n",
      "    \n",
      "    return S, T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 17
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def find_interval_overlap(S, T):\n",
      "    '''S and T are vectors that encode intervals (S[i], T[i])\n",
      "    \n",
      "    Returns an n-by-n matrix C such that C[i, j] is the length of the overlap between intervals i and j.\n",
      "    '''\n",
      "    \n",
      "    # Get the shape\n",
      "    n = S.shape[0]\n",
      "    \n",
      "    # Build an output matrix.\n",
      "    C = np.zeros( (n, n) )\n",
      "    \n",
      "    for i in range(n):\n",
      "        for j in range(n):\n",
      "            C[i, j] = max(0, min(T[i], T[j]) - max(S[i], S[j]))\n",
      "            \n",
      "    return C"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Let's make up some test data\n",
      "S, T = make_interval_data(5)\n",
      "\n",
      "# What does it look like?\n",
      "pprint( zip(S, T) )"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "ename": "NameError",
       "evalue": "name 'pprint' is not defined",
       "output_type": "pyerr",
       "traceback": [
        "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mNameError\u001b[0m                                 Traceback (most recent call last)",
        "\u001b[1;32m<ipython-input-10-d3e12f50d08d>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      3\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      4\u001b[0m \u001b[1;31m# What does it look like?\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 5\u001b[1;33m \u001b[0mpprint\u001b[0m\u001b[1;33m(\u001b[0m \u001b[0mzip\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mS\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mT\u001b[0m\u001b[1;33m)\u001b[0m \u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
        "\u001b[1;31mNameError\u001b[0m: name 'pprint' is not defined"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print find_interval_overlap(S, T)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[[ 1.53571796  0.26408445  0.99869268  0.          0.        ]\n",
        " [ 0.26408445  0.26408445  0.20020356  0.          0.        ]\n",
        " [ 0.99869268  0.20020356  1.95900701  0.46699172  0.48550766]\n",
        " [ 0.          0.          0.46699172  0.46699172  0.29865544]\n",
        " [ 0.          0.          0.48550766  0.29865544  0.53138621]]\n"
       ]
      }
     ],
     "prompt_number": 40
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---\n",
      "In both of the examples above, we're doing an all-pairs comparison between two sets of objects.\n",
      "\n",
      "While conceptually simple, the nested iteration approach leads to somewhat cumbersome code, which can also be computationally inefficient.\n",
      "\n",
      "Fortunately, NumPy provides some tools to help us simplify and vectorize this kind of pairwise operation.  The secret ingredient is called [`outer`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.outer.html#numpy.ufunc.outer), and is a property of NumPy's unversal function [ufunc](http://docs.scipy.org/doc/numpy/reference/ufuncs.html) abstraction.\n",
      "\n",
      "Quoth the manual,\n",
      "\n",
      "> Universal functions (ufunc)\n",
      ">\n",
      "> A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion, supporting array  broadcasting, type casting, and several other standard features. That is, a ufunc is a \u201cvectorized\u201d wrapper for a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs.\n",
      "\n",
      "A ufunc `f(a, b)` implement the `outer` method, which returns a matrix `X[i, j] = f(a[i], b[j])`.  The benefit of this approach is that ufuncs are typically implemented in C, and are often much more efficient in practice than implementing the equivalent loop in python.\n",
      "\n",
      "Some examples of potentially interesting ufuncs include:\n",
      "\n",
      "  * add, subtract\n",
      "  * multiply, divide\n",
      "  * less, less_equal\n",
      "  * bitwise_and, bitwise_or,\n",
      "  * etc.\n",
      "\n",
      "A list of [available ufuncs](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs) can be found in the `numpy` documentation.\n",
      "\n",
      "---"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Here's how to write our similar_points function using the subtraction ufunc\n",
      "def fast_similar_points(a, b, r):\n",
      "    \n",
      "    # subtract.outer builds a matrix of a[i] - b[j]\n",
      "    # abs computes the absolute value\n",
      "    difference = np.abs(np.subtract.outer(a, b))\n",
      "    \n",
      "    # <= r does a pointwise comparison against the threshold\n",
      "    # argwhere finds all indices (i, j) such that hits[i, j] == True\n",
      "    return np.argwhere(difference <= r)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 41
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Make some more test data, 5 points in a and 10 in b\n",
      "a, b = make_point_data(5, 10)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 42
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print a\n",
      "print b\n",
      "print r"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[-2.5 -1.5 -0.5  0.5  1.5  2.5]\n",
        "[-1.96563268 -1.68353098 -0.77402719 -0.27807597 -0.16635048  0.08998779\n",
        "  0.12748518  0.79521268  0.98956769  1.98828443]\n",
        "0.25\n"
       ]
      }
     ],
     "prompt_number": 43
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# For comparison purposes, our previous implementation\n",
      "print find_similar_points(a, b, r)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[[1 1]\n",
        " [2 3]]\n"
       ]
      }
     ],
     "prompt_number": 44
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# And the new one\n",
      "print fast_similar_points(a, b, r)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[[1 1]\n",
        " [2 3]]\n"
       ]
      }
     ],
     "prompt_number": 45
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Timing benchmark\n",
      "%timeit find_similar_points(a, b, r)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1000 loops, best of 3: 162 \u00b5s per loop\n"
       ]
      }
     ],
     "prompt_number": 46
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit fast_similar_points(a, b, r)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10000 loops, best of 3: 28.9 \u00b5s per loop\n"
       ]
      }
     ],
     "prompt_number": 47
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# How does it scale? Let's increase the size of our data\n",
      "a, b = make_point_data(500, 1000)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 48
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit find_similar_points(a, b, r)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1 loops, best of 3: 993 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 49
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit fast_similar_points(a, b, r)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100 loops, best of 3: 4.96 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 50
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Similarly, our interval-overlap calculator can be simplified by using outers to compute the intermediate steps\n",
      "def fast_interval_overlap(S, T):\n",
      "    \n",
      "    # For each pair of intervals, find the later start: max(S[i], S[j])\n",
      "    max_start = np.maximum.outer(S, S)\n",
      "    \n",
      "    # And the earlier end: min(T[i], T[j])\n",
      "    min_end   = np.minimum.outer(T, T)\n",
      "    \n",
      "    # Subtract min_end[i, j] - max_start[i, j]\n",
      "    # If the result is negative, then the intersection is 0\n",
      "    return np.maximum(0, min_end - max_start)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 51
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "S, T = make_interval_data(5)\n",
      "pprint( zip(S, T) )"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[(-1.047725458139019, -0.33908813387328129),\n",
        " (-0.40815662861575197, 0.41324316199846467),\n",
        " (-0.25504141148370885, 1.0775303283016173),\n",
        " (0.98266375287621432, 2.8308856990815423),\n",
        " (1.06482027877006, 2.1899451748162431)]\n"
       ]
      }
     ],
     "prompt_number": 52
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print fast_interval_overlap(S, T)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[[ 0.70863732  0.06906849  0.          0.          0.        ]\n",
        " [ 0.06906849  0.82139979  0.66828457  0.          0.        ]\n",
        " [ 0.          0.66828457  1.33257174  0.09486658  0.01271005]\n",
        " [ 0.          0.          0.09486658  1.84822195  1.1251249 ]\n",
        " [ 0.          0.          0.01271005  1.1251249   1.1251249 ]]\n"
       ]
      }
     ],
     "prompt_number": 53
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit find_interval_overlap(S, T)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10000 loops, best of 3: 61.3 \u00b5s per loop\n"
       ]
      }
     ],
     "prompt_number": 54
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit fast_interval_overlap(S, T)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100000 loops, best of 3: 14.4 \u00b5s per loop\n"
       ]
      }
     ],
     "prompt_number": 55
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Scaling up?\n",
      "S, T = make_interval_data(1000)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 56
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit find_interval_overlap(S, T)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1 loops, best of 3: 2.32 s per loop\n"
       ]
      }
     ],
     "prompt_number": 57
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit fast_interval_overlap(S, T)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 21.6 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 58
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---\n",
      "# Note:\n",
      "\n",
      "Outers are great for relatively small comparison sets and simple operations.  However, they require $\\Theta(nm)$ storage for an $n$-by-$m$ comparison, which can quickly eat up memory if you're not careful.  \n",
      "\n",
      "For many applications, though, the speedup due to vectorization is worth the increased memory complexity."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}