{
 "metadata": {
  "name": "choose k from n benchmark"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Choose k from n benchmark\n",
      "I recently learned that NumPy 1.7/1.8 will have a `choice` function that randomly chooses k of n elements, with or without replacement, with or without weighted probabilities. I was interested in the simplest case of choosing without replacement and without weights (uniformly).\n",
      "\n",
      "I took a look at [the code](https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx) and saw the implementation for this case is basically:\n",
      "\n",
      "```\n",
      "return np.random.permutation(n)[:k]\n",
      "```\n",
      "\n",
      "which surprised me because if `n` is much larger than `k` than it seems you are doing a lot of work for nothing permuting that big array.\n",
      "\n",
      "I stumbled upon a different suggestion on *stackoverflow* (see the comment by *Sa\u0161a \u0160ijak* on [this question](http://stackoverflow.com/questions/306400/how-do-i-randomly-select-an-item-from-a-list-using-python) which suggested using `random.sample` - that is built-in random, **not** *NumPy*'s random.\n",
      "\n",
      "So I made a quick benchmark for **my specific case**:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import random\n",
      "import numpy as np\n",
      "def choose_no_rep_python(n,k):\n",
      "    return random.sample(xrange(n), k)    \n",
      "def choose_no_rep_numpy(n, k):\n",
      "    return np.random.permutation(n)[:k]\n",
      "def choose_no_rep_numpy_take1(n, k):\n",
      "    return np.random.permutation(n).take(range(k))\n",
      "def choose_no_rep_numpy_take2(n, k):\n",
      "    return np.random.permutation(n).take(arange(k))\n",
      "def choose_no_rep_numpy_take3(n, k):\n",
      "    return np.random.permutation(n).take(xrange(k))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit -n 10000 choose_no_rep_python(1000, 4)\n",
      "%timeit -n 10000 choose_no_rep_numpy(1000, 4)\n",
      "%timeit -n 10000 choose_no_rep_numpy_take1(1000, 4)\n",
      "%timeit -n 10000 choose_no_rep_numpy_take2(1000, 4)\n",
      "%timeit -n 10000 choose_no_rep_numpy_take3(1000, 4)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10000 loops, best of 3: 6.73 us per loop\n",
        "10000 loops, best of 3: 320 us per loop"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "10000 loops, best of 3: 453 us per loop"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "10000 loops, best of 3: 439 us per loop"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "10000 loops, best of 3: 456 us per loop"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It seems that the naive python implementation is ~50-fold faster than the NumPy implementation, which surprised me...\n",
      "\n",
      "## Technical details"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import os, sys, platform\n",
      "print platform.system(), platform.release()\n",
      "print \"python version\", sys.version\n",
      "fin = os.popen(\"ipython -V\")\n",
      "print \"ipython version\", fin.read(),\n",
      "fin.close()\n",
      "print \"numpy version\", np.version.version"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Windows 7\n",
        "python version 2.7.3 |EPD 7.3-2 (64-bit)| (default, Apr 12 2012, 15:20:16) [MSC v.1500 64 bit (AMD64)]\n",
        "ipython version "
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "0.13\n",
        "numpy version 1.6.1\n"
       ]
      }
     ],
     "prompt_number": 29
    }
   ],
   "metadata": {}
  }
 ]
}