{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Solution to a problem posted here:\n", "\n", "http://stackoverflow.com/questions/36455104/create-a-random-order-of-x-y-pairs-without-repeating-subsequent-xs#\n", "\n", ">Say I have a list of valid X = [1, 2, 3, 4, 5] and a list of valid Y = [1, 2, 3, 4, 5].\n", "\n", ">I need to generate all combinations of every element in X and every element in Y (in this case, 25) and get those combinations in random order.\n", "\n", ">This in itself would be simple, but there is an additional requirement: In this random order, there cannot be a repetition of the same x in succession. For example, this is okay:\n", "\n", " [1, 3]\n", " [2, 5]\n", " [1, 2]\n", " ...\n", " [1, 4]\n", " \n", ">This is not:\n", "\n", " [1, 3]\n", " [1, 2] <== the \"1\" cannot repeat, because there was already one before\n", " [2, 5]\n", " ...\n", " [1, 4]\n", " \n", ">Now, the least efficient idea would be to simply randomize the full set as long as there are no more repetitions. My approach was a bit different, repeatedly creating a shuffled variant of X, and a list of all Y * X, then picking a random next one from that. So far, I've come up with this:\n", "\n", " [...]\n", "\n", ">But I'm sure this can be done even more efficiently or in a more succinct way?\n", "\n", ">Also, my solution first goes through all X values before continuing with the full set again, which is not perfectly random. I can live with that for my particular application case." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My solution uses NumPy" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some example values for x and y. I assume that there are no repeated values in x." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[10 11 12 13] [20 21 22 23 24]\n" ] } ], "source": [ "xs = np.arange(10, 14)\n", "ys = np.arange(20, 25)\n", "print(xs, ys)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`indices` is the list of indices I'll choose from at random:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "n = len(xs)\n", "m = len(ys)\n", "indices = np.arange(n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I'll make an array to hold the values of y:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[20 21 22 23 24]\n", " [20 21 22 23 24]\n", " [20 21 22 23 24]\n", " [20 21 22 23 24]]\n" ] } ], "source": [ "array = np.tile(ys, (n, 1))\n", "print(array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And shuffle the rows independently" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[21 23 22 24 20]\n", " [22 23 20 21 24]\n", " [23 20 21 22 24]\n", " [20 24 21 22 23]]\n" ] } ], "source": [ "[np.random.shuffle(array[i]) for i in range(n)] \n", "print(array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'll keep track of how many unused ys there are in each row" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5 5 5 5]\n" ] } ], "source": [ "counts = np.full_like(xs, m)\n", "print(counts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I'll choose a row, using the counts as weights" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.25 0.25 0.25 0.25]\n" ] } ], "source": [ "weights = np.array(counts, dtype=float)\n", "weights /= np.sum(weights)\n", "print(weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`i` is the row I chose, which corresponds to a value of x." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n" ] } ], "source": [ "i = np.random.choice(indices, p=weights)\n", "print(i)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I decrement the counter associated with `i`, assemble a pair by choosing a value of x and a value of y.\n", "\n", "I also clobber the array value I used, which is not necessary, but helps with visualization." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(12, 24)\n" ] } ], "source": [ "counts[i] -= 1\n", "pair = xs[i], array[i, counts[i]]\n", "array[i, counts[i]] = -1\n", "print(pair)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check that the counts got decremented" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5 5 4 5]\n" ] } ], "source": [ "print(counts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And one of the values in array got used" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[21 23 22 24 20]\n", " [22 23 20 21 24]\n", " [23 20 21 22 -1]\n", " [20 24 21 22 23]]\n" ] } ], "source": [ "print(array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next time through is almost the same, except that when we assemble the weights, we give zero weight to the index we just used." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.33333333 0.33333333 0. 0.33333333]\n" ] } ], "source": [ "weights = np.array(counts, dtype=float)\n", "weights[i] = 0\n", "weights /= np.sum(weights)\n", "print(weights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Everything else is the same" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(11, 24)\n" ] } ], "source": [ "i = np.random.choice(indices, p=weights)\n", "counts[i] -= 1\n", "pair = xs[i], array[i, counts[i]]\n", "array[i, counts[i]] = -1\n", "print(pair)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[5 4 4 5]\n" ] } ], "source": [ "print(counts)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[21 23 22 24 20]\n", " [22 23 20 21 -1]\n", " [23 20 21 22 -1]\n", " [20 24 21 22 23]]\n" ] } ], "source": [ "print(array)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can wrap all that up in a function, using a special value for `i` during the first iteration." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def generate_pairs(xs, ys):\n", " n = len(xs)\n", " m = len(ys)\n", " indices = np.arange(n)\n", " \n", " array = np.tile(ys, (n, 1))\n", " [np.random.shuffle(array[i]) for i in range(n)]\n", " \n", " counts = np.full_like(xs, m)\n", " i = -1\n", "\n", " for _ in range(n * m):\n", " weights = np.array(counts, dtype=float)\n", " if i != -1:\n", " weights[i] = 0\n", " weights /= np.sum(weights)\n", "\n", " i = np.random.choice(indices, p=weights)\n", " counts[i] -= 1\n", " pair = xs[i], array[i, counts[i]]\n", " array[i, counts[i]] = -1\n", " \n", " yield pair" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's how it works:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(13, 24)\n", "(12, 24)\n", "(11, 21)\n", "(12, 22)\n", "(10, 22)\n", "(11, 24)\n", "(10, 21)\n", "(13, 20)\n", "(10, 23)\n", "(13, 21)\n", "(12, 20)\n", "(10, 24)\n", "(11, 22)\n", "(10, 20)\n", "(13, 23)\n", "(12, 23)\n", "(13, 22)\n", "(11, 20)\n", "(12, 21)\n", "(11, 23)\n" ] } ], "source": [ "for pairs in generate_pairs(xs, ys):\n", " print(pairs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inside the loop, we have to copy the weights, add them up, and choose a random index using the weights. These are all linear in `n`. So the overall complexity to generate all pairs is `O(n^2 m)`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }