{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Solution to a problem posted here:\n",
    "\n",
    "http://stackoverflow.com/questions/36455104/create-a-random-order-of-x-y-pairs-without-repeating-subsequent-xs#\n",
    "\n",
    ">Say I have a list of valid X = [1, 2, 3, 4, 5] and a list of valid Y = [1, 2, 3, 4, 5].\n",
    "\n",
    ">I need to generate all combinations of every element in X and every element in Y (in this case, 25) and get those combinations in random order.\n",
    "\n",
    ">This in itself would be simple, but there is an additional requirement: In this random order, there cannot be a repetition of the same x in succession. For example, this is okay:\n",
    "\n",
    "    [1, 3]\n",
    "    [2, 5]\n",
    "    [1, 2]\n",
    "    ...\n",
    "    [1, 4]\n",
    "    \n",
    ">This is not:\n",
    "\n",
    "    [1, 3]\n",
    "    [1, 2]  <== the \"1\" cannot repeat, because there was already one before\n",
    "    [2, 5]\n",
    "    ...\n",
    "    [1, 4]\n",
    "    \n",
    ">Now, the least efficient idea would be to simply randomize the full set as long as there are no more repetitions. My approach was a bit different, repeatedly creating a shuffled variant of X, and a list of all Y * X, then picking a random next one from that. So far, I've come up with this:\n",
    "\n",
    "    [...]\n",
    "\n",
    ">But I'm sure this can be done even more efficiently or in a more succinct way?\n",
    "\n",
    ">Also, my solution first goes through all X values before continuing with the full set again, which is not perfectly random. I can live with that for my particular application case."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "My solution uses NumPy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here are some example values for x and y.  I assume that there are no repeated values in x."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[10 11 12 13] [20 21 22 23 24]\n"
     ]
    }
   ],
   "source": [
    "xs = np.arange(10, 14)\n",
    "ys = np.arange(20, 25)\n",
    "print(xs, ys)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`indices` is the list of indices I'll choose from at random:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "n = len(xs)\n",
    "m = len(ys)\n",
    "indices = np.arange(n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now I'll make an array to hold the values of y:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[20 21 22 23 24]\n",
      " [20 21 22 23 24]\n",
      " [20 21 22 23 24]\n",
      " [20 21 22 23 24]]\n"
     ]
    }
   ],
   "source": [
    "array = np.tile(ys, (n, 1))\n",
    "print(array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And shuffle the rows independently"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[21 23 22 24 20]\n",
      " [22 23 20 21 24]\n",
      " [23 20 21 22 24]\n",
      " [20 24 21 22 23]]\n"
     ]
    }
   ],
   "source": [
    "[np.random.shuffle(array[i]) for i in range(n)] \n",
    "print(array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I'll keep track of how many unused ys there are in each row"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5 5 5 5]\n"
     ]
    }
   ],
   "source": [
    "counts = np.full_like(xs, m)\n",
    "print(counts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now I'll choose a row, using the counts as weights"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 0.25  0.25  0.25  0.25]\n"
     ]
    }
   ],
   "source": [
    "weights = np.array(counts, dtype=float)\n",
    "weights /= np.sum(weights)\n",
    "print(weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`i` is the row I chose, which corresponds to a value of x."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2\n"
     ]
    }
   ],
   "source": [
    "i = np.random.choice(indices, p=weights)\n",
    "print(i)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now I decrement the counter associated with `i`, assemble a pair by choosing a value of x and a value of y.\n",
    "\n",
    "I also clobber the array value I used, which is not necessary, but helps with visualization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(12, 24)\n"
     ]
    }
   ],
   "source": [
    "counts[i] -= 1\n",
    "pair = xs[i], array[i, counts[i]]\n",
    "array[i, counts[i]] = -1\n",
    "print(pair)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can check that the counts got decremented"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5 5 4 5]\n"
     ]
    }
   ],
   "source": [
    "print(counts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And one of the values in array got used"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[21 23 22 24 20]\n",
      " [22 23 20 21 24]\n",
      " [23 20 21 22 -1]\n",
      " [20 24 21 22 23]]\n"
     ]
    }
   ],
   "source": [
    "print(array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next time through is almost the same, except that when we assemble the weights, we give zero weight to the index we just used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 0.33333333  0.33333333  0.          0.33333333]\n"
     ]
    }
   ],
   "source": [
    "weights = np.array(counts, dtype=float)\n",
    "weights[i] = 0\n",
    "weights /= np.sum(weights)\n",
    "print(weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Everything else is the same"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(11, 24)\n"
     ]
    }
   ],
   "source": [
    "i = np.random.choice(indices, p=weights)\n",
    "counts[i] -= 1\n",
    "pair = xs[i], array[i, counts[i]]\n",
    "array[i, counts[i]] = -1\n",
    "print(pair)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[5 4 4 5]\n"
     ]
    }
   ],
   "source": [
    "print(counts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[21 23 22 24 20]\n",
      " [22 23 20 21 -1]\n",
      " [23 20 21 22 -1]\n",
      " [20 24 21 22 23]]\n"
     ]
    }
   ],
   "source": [
    "print(array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can wrap all that up in a function, using a special value for `i` during the first iteration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def generate_pairs(xs, ys):\n",
    "    n = len(xs)\n",
    "    m = len(ys)\n",
    "    indices = np.arange(n)\n",
    "    \n",
    "    array = np.tile(ys, (n, 1))\n",
    "    [np.random.shuffle(array[i]) for i in range(n)]\n",
    "    \n",
    "    counts = np.full_like(xs, m)\n",
    "    i = -1\n",
    "\n",
    "    for _ in range(n * m):\n",
    "        weights = np.array(counts, dtype=float)\n",
    "        if i != -1:\n",
    "            weights[i] = 0\n",
    "        weights /= np.sum(weights)\n",
    "\n",
    "        i = np.random.choice(indices, p=weights)\n",
    "        counts[i] -= 1\n",
    "        pair = xs[i], array[i, counts[i]]\n",
    "        array[i, counts[i]] = -1\n",
    "        \n",
    "        yield pair"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here's how it works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(13, 24)\n",
      "(12, 24)\n",
      "(11, 21)\n",
      "(12, 22)\n",
      "(10, 22)\n",
      "(11, 24)\n",
      "(10, 21)\n",
      "(13, 20)\n",
      "(10, 23)\n",
      "(13, 21)\n",
      "(12, 20)\n",
      "(10, 24)\n",
      "(11, 22)\n",
      "(10, 20)\n",
      "(13, 23)\n",
      "(12, 23)\n",
      "(13, 22)\n",
      "(11, 20)\n",
      "(12, 21)\n",
      "(11, 23)\n"
     ]
    }
   ],
   "source": [
    "for pairs in generate_pairs(xs, ys):\n",
    "    print(pairs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Inside the loop, we have to copy the weights, add them up, and choose a random index using the weights.  These are all linear in `n`.  So the overall complexity to generate all pairs is `O(n^2 m)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}