{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%load_ext watermark" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sebastian Raschka \n", "last updated: 2016-06-23 \n", "\n", "CPython 3.5.1\n", "IPython 4.2.0\n" ] } ], "source": [ "%watermark -a 'Sebastian Raschka' -u -d -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Divide-and-Conquer Algorithms" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The subfamily of *Divide-and-Conquer* algorithms is one of the main paradigms of algorithmic problem solving next to *Dynamic Programming* and *Greedy Algorithms*. The main goal behind greedy algorithms is to implement an efficient procedure for often computationally more complex, often infeasible brute-force methods such as exhaustive search algorithms by splitting a task into subtasks that can be solved indpendently and in parallel; later, the solutions are combined to yield the final result.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 1 -- Binary Search" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say we want to implement an algorithm that returns the index position of an item that we are looking for in an array. \n", "in an array. Here, we assume that the array is alreadt sorted. The simplest (and computationally most expensive) approach would be to check each element in the array iteratively, until we find the desired match or return -1:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def linear_search(lst, item):\n", " for i in range(len(lst)):\n", " if lst[i] == item:\n", " return i\n", " return -1" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "0\n", "-1\n", "-1\n" ] } ], "source": [ "lst = [1, 5, 8, 12, 13]\n", "\n", "for k in [8, 1, 23, 11]:\n", " print(linear_search(lst=lst, item=k))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The runtime of linear search is obviously $O(n)$ since we are checking each element in the array -- remember that big-Oh is our upper bound. Now, a cleverer way of implementing a search algorithm would be *binary search*, which is a simple, yet nice example of a *divide-and-conquer* algorithm.\n", "\n", "The idea behind divide-and-conquer algorithm is to break a problem down into non-overlapping subproblems of the original problem, which we can then solve recursively. Once, we processed these recursive subproblems, we combine the solutions into the end result.\n", "\n", "Using a divide-and-conquer approach, we can implement an $O(\\log n)$ search algorithm called *binary search*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The idea behind binary search is quite simple:\n", "\n", "1. We take the midpoint of an array and compare it to its search key\n", "2. If the search key is equal to the midpoint, we are done, else\n", " 3. search key < midpoint?\n", " 4. Yes: repeat search (back to step 1) with subarray that ends at index position `midpoint - 1` \n", " 5. No: repeat search (back step 1) with subarray that starts `midpoint + 1 `\n", " \n", " \n", "Assuming that we are looking for the search key *k=5*, the individual steps of binary search can be illustrated as follows:\n", "\n", "![](./images/binary-search-1.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And below follows our Python implementation of this idea:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def binary_search(lst, item):\n", " first = 0\n", " last = len(lst) - 1\n", " found = False\n", "\n", " while first <= last and not found:\n", " midpoint = (first + last) // 2\n", " if lst[midpoint] == item:\n", " found = True\n", " else:\n", " if item < lst[midpoint]:\n", " last = midpoint - 1\n", " else:\n", " first = midpoint + 1\n", " \n", " if found:\n", " return midpoint\n", " else:\n", " return -1" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2\n", "0\n", "-1\n", "-1\n" ] } ], "source": [ "for k in [8, 1, 23, 11]:\n", " print(binary_search(lst=lst, item=k))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example 2 -- Finding the Majority Element" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Finding the Majority Element\" is a problem where we want to find an element in an array positive integers with length *n* that occurs more than *n/2* in that array. For example, if we have an array $a = [1, 2, 3, 3, 3]$, $3$ would be the majority element. In another array, b = [1, 2, 3, 3] there exists no majority element, since $2$ (where $2$ is the the count of element $3$) is not greater than $n / 2$.\n", "\n", "Let's start with a simple implementation where we count how often each unique element occurs in the array. Then, we return the element that meets the criterion \"$\\text{occurences } > n / 2$\", and if such an element does not exist, we return -1. Note that we return a tuple of three items: (element, number_occurences, count_dictionary), which we will use later ..." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[] -> -1\n", "[1, 2, 3, 4, 4, 5] -> -1\n", "[1, 2, 4, 4, 4, 5] -> -1\n", "[4, 2, 4, 4, 4, 5] -> 4\n", "[5, 4, 4, 4, 2, 4] -> 4\n", "[2, 3, 9, 2, 2] -> 2\n", "[2, 2, 9, 3, 2] -> 2\n", "[0, 0, 2, 2, 2] -> 2\n", "[2, 2, 2, 0, 0] -> 2\n" ] } ], "source": [ "def majority_ele_lin(lst): \n", " cnt = {}\n", " for ele in lst:\n", " if ele not in cnt:\n", " cnt[ele] = 1\n", " else:\n", " cnt[ele] += 1\n", " for ele, c in cnt.items():\n", " if c > (len(lst) // 2):\n", " return (ele, c, cnt)\n", " return (-1, -1, cnt)\n", "\n", "###################################################\n", "\n", "lst0 = []\n", "print(lst0, '->', majority_ele_lin(lst=lst0)[0])\n", "\n", "lst1 = [1, 2, 3, 4, 4, 5]\n", "print(lst1, '->', majority_ele_lin(lst=lst1)[0])\n", "\n", "lst2 = [1, 2, 4, 4, 4, 5]\n", "print(lst2, '->', majority_ele_lin(lst=lst2)[0])\n", "\n", "lst3 = [4, 2, 4, 4, 4, 5]\n", "print(lst3, '->', majority_ele_lin(lst=lst3)[0])\n", "print(lst3[::-1], '->', majority_ele_lin(lst=lst3[::-1])[0])\n", "\n", "lst4 = [2, 3, 9, 2, 2]\n", "print(lst4, '->',majority_ele_lin(lst=lst4)[0])\n", "print(lst4[::-1], '->', majority_ele_lin(lst=lst4[::-1])[0])\n", "\n", "lst5 = [0, 0, 2, 2, 2]\n", "print(lst5, '->',majority_ele_lin(lst=lst5)[0])\n", "print(lst5[::-1], '->', majority_ele_lin(lst=lst5[::-1])[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, \"finding the majority element\" is a nice task for a Divide and Conquer algorithm. Here, we use the fact that if a list has a majority element it is also the majority element of one of its two sublists, if we split it into 2 halves. \n", "\n", "More concretely, what we do is:\n", "\n", "1. Split the array into 2 halves\n", "2. Run the majority element search on each of the two halves\n", "3. Combine the 2 subresults\n", " 1. Neither of the 2 sub-arrays has a majority element; thus, the combined list can't have a majority element so that we return -1\n", " 2. The right sub-array has a majority element, whereas the left sub-array hasn't. Now, we need to take the count of this \"right\" majority element, add the number of times it occurs in the left sub-array, and check if the combined count satisfies the \"$\\text{occurences} > \\frac{n}{2}$\" criterion.\n", " 3. Same as above but with \"left\" and \"right\" sub-array swapped in the comparison.\n", " 4. Both sub-arrays have an majority element. Compute the combined count of each of the elements as before and check whether one of these elements satisfies the \"$\\text{occurences} > \\frac{n}{2}$\" criterion." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[] -> -1\n", "[1, 2, 3, 4, 4, 5] -> -1\n", "[1, 2, 4, 4, 4, 5] -> -1\n", "[4, 2, 4, 4, 4, 5] -> 4\n", "[5, 4, 4, 4, 2, 4] -> 4\n", "[2, 3, 9, 2, 2] -> 2\n", "[2, 2, 9, 3, 2] -> 2\n", "[0, 0, 2, 2, 2] -> 3\n", "[2, 2, 2, 0, 0] -> 3\n" ] } ], "source": [ "def majority_ele_dac(lst): \n", " \n", " n = len(lst)\n", " left = lst[:n // 2]\n", " right = lst[n // 2:]\n", " \n", " l_maj = majority_ele_lin(left)\n", " r_maj = majority_ele_lin(right)\n", " \n", " # case 3A\n", " if l_maj[0] == -1 and r_maj[0] == -1:\n", " return -1\n", " \n", " # case 3B\n", " elif l_maj[0] == -1 and r_maj[0] > -1:\n", " cnt = r_maj[1]\n", " if r_maj[0] in l_maj[2]:\n", " cnt += l_maj[2][r_maj[0]]\n", " if cnt > n // 2:\n", " return r_maj[0]\n", " \n", " # case 3C\n", " elif r_maj[0] == -1 and l_maj[0] > -1:\n", " cnt = l_maj[1]\n", " if l_maj[0] in r_maj[2]:\n", " cnt += r_maj[2][l_maj[0]]\n", " if cnt > n // 2:\n", " return l_maj[0]\n", " \n", " # case 3D\n", " else: \n", " c1, c2 = l_maj[1], r_maj[1]\n", " if l_maj[0] in r_maj[2]:\n", " c1 = l_maj[1] + r_maj[2][l_maj[0]]\n", " if r_maj[0] in l_maj[2]:\n", " c2 = r_maj[1] + l_maj[2][r_maj[0]]\n", " m = max(c1, c2)\n", " if m > n // 2:\n", " return m\n", " return -1\n", "\n", "###################################################\n", "\n", "lst0 = []\n", "print(lst0, '->', majority_ele_dac(lst=lst0))\n", "\n", "lst1 = [1, 2, 3, 4, 4, 5]\n", "print(lst1, '->', majority_ele_dac(lst=lst1))\n", "\n", "lst2 = [1, 2, 4, 4, 4, 5]\n", "print(lst2, '->', majority_ele_dac(lst=lst2))\n", "\n", "lst3 = [4, 2, 4, 4, 4, 5]\n", "print(lst3, '->', majority_ele_dac(lst=lst3))\n", "print(lst3[::-1], '->', majority_ele_dac(lst=lst3[::-1]))\n", "\n", "lst4 = [2, 3, 9, 2, 2]\n", "print(lst4, '->',majority_ele_dac(lst=lst4))\n", "print(lst4[::-1], '->', majority_ele_dac(lst=lst4[::-1]))\n", "\n", "lst5 = [0, 0, 2, 2, 2]\n", "print(lst5, '->',majority_ele_dac(lst=lst5))\n", "print(lst5[::-1], '->', majority_ele_dac(lst=lst5[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In algorithms such as binary search that we saw at the beginning of this notebook, we recursively break down our problem into smaller subproblems. Thus, we have a recurrence problem with time complexity\n", "\n", "$T(n) = T(\\frac{2}{n}) + O(1) \\rightarrow T(n) = O(\\log n).$\n", "\n", "In this example, finding the majority element, we break our problem down into only 2 subproblems. Thus, the complexity of our algorithm is\n", "\n", "$T(n) = 2T (\\frac{2}{n} + O(n) \\rightarrow T(n) = O(n \\log n).$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Adding multiprocessing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our Divide and Conquer approach above is actually a good candidate for multi-processing, since we can parallelize the majority element search in the two sub-lists. So, let's make a simple modification and use Python's `multiprocessing` module for that. Here, we use the `apply_async` method from the `Pool` class, which doesn't return the results in order (in contrast to the `apply` method). Thus, the left sublist and right sublist may be swapped in the variable assignment `l_maj, r_maj = [p.get() for p in results]`. However, for our implementation, this doesn't make a difference." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[] -> -1\n", "[1, 2, 3, 4, 4, 5] -> -1\n", "[1, 2, 4, 4, 4, 5] -> -1\n", "[4, 2, 4, 4, 4, 5] -> 4\n", "[5, 4, 4, 4, 2, 4] -> 4\n", "[2, 3, 9, 2, 2] -> 2\n", "[2, 2, 9, 3, 2] -> 2\n", "[0, 0, 2, 2, 2] -> 3\n", "[2, 2, 2, 0, 0] -> 3\n" ] } ], "source": [ "import multiprocessing as mp\n", "\n", "def majority_ele_dac_mp(lst): \n", " \n", " n = len(lst)\n", " left = lst[:n // 2]\n", " right = lst[n // 2:]\n", " \n", " results = (pool.apply_async(majority_ele_lin, args=(x,)) \n", " for x in (left, right))\n", " l_maj, r_maj = [p.get() for p in results]\n", " \n", " if l_maj[0] == -1 and r_maj[0] == -1:\n", " return -1\n", " \n", " elif l_maj[0] == -1 and r_maj[0] > -1:\n", " cnt = r_maj[1]\n", " if r_maj[0] in l_maj[2]:\n", " cnt += l_maj[2][r_maj[0]]\n", " if cnt > n // 2:\n", " return r_maj[0]\n", " \n", " elif r_maj[0] == -1 and l_maj[0] > -1:\n", " cnt = l_maj[1]\n", " if l_maj[0] in r_maj[2]:\n", " cnt += r_maj[2][l_maj[0]]\n", " if cnt > n // 2:\n", " return l_maj[0]\n", " \n", " else: \n", " c1, c2 = l_maj[1], r_maj[1]\n", " if l_maj[0] in r_maj[2]:\n", " c1 = l_maj[1] + r_maj[2][l_maj[0]]\n", " if r_maj[0] in l_maj[2]:\n", " c2 = r_maj[1] + l_maj[2][r_maj[0]]\n", " m = max(c1, c2)\n", " if m > n // 2:\n", " return m\n", " return -1\n", "\n", "###################################################\n", "\n", "lst0 = []\n", "print(lst0, '->', majority_ele_dac(lst=lst0))\n", "\n", "lst1 = [1, 2, 3, 4, 4, 5]\n", "print(lst1, '->', majority_ele_dac(lst=lst1))\n", "\n", "lst2 = [1, 2, 4, 4, 4, 5]\n", "print(lst2, '->', majority_ele_dac(lst=lst2))\n", "\n", "lst3 = [4, 2, 4, 4, 4, 5]\n", "print(lst3, '->', majority_ele_dac(lst=lst3))\n", "print(lst3[::-1], '->', majority_ele_dac(lst=lst3[::-1]))\n", "\n", "lst4 = [2, 3, 9, 2, 2]\n", "print(lst4, '->',majority_ele_dac(lst=lst4))\n", "print(lst4[::-1], '->', majority_ele_dac(lst=lst4[::-1]))\n", "\n", "lst5 = [0, 0, 2, 2, 2]\n", "print(lst5, '->',majority_ele_dac(lst=lst5))\n", "print(lst5[::-1], '->', majority_ele_dac(lst=lst5[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# ... to be continued" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" } }, "nbformat": 4, "nbformat_minor": 0 }