{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analysis of Algorithms\n", "\n", "Code examples from [Think Complexity, 2nd edition](https://thinkcomplex.com).\n", "\n", "Copyright 2017 Allen Downey, [MIT License](https://opensource.org/licenses/MIT)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import seaborn as sns\n", "\n", "import os\n", "import string\n", "\n", "from utils import decorate, savefig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Empirical order of growth\n", "\n", "Sometimes we can figure out what order of growth a function belongs to by running it with a range of problem sizes and measuring the run time.\n", "\n", "To measure runtimes, we'll use `etime`, which uses `os.times` to compute the total time used by a process, including \"user time\" and \"system time\". User time is time spent running your code; system time is time spent running operating system code on your behalf." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def etime():\n", " \"\"\"Measures user and system time this process has used.\n", "\n", " Returns the sum of user and system time.\"\"\"\n", " user, sys, chuser, chsys, real = os.times()\n", " return user+sys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`time_func` takes a function object and a problem size, `n`, runs the function, and returns the elapsed time." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def time_func(func, n):\n", " \"\"\"Run a function and return the elapsed time.\n", " \n", " func: function\n", " n: problem size\n", " \n", " returns: user+sys time in seconds\n", " \"\"\"\n", " start = etime()\n", " func(n)\n", " end = etime()\n", " elapsed = end - start\n", " return elapsed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example: a function that makes a list with the given length using `append`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def append_list(n):\n", " t = []\n", " for i in range(n):\n", " t.append(i)\n", " return t\n", "\n", "time_func(append_list, 1000000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`run_timing_test` takes a function, runs it with a range of problem sizes, and returns two lists: problem sizes and times." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def run_timing_test(func, max_time=1):\n", " \"\"\"Tests the given function with a range of values for n.\n", " \n", " func: function object\n", "\n", " returns: list of ns and a list of run times.\n", " \"\"\"\n", " ns = []\n", " ts = []\n", " for i in range(10, 28):\n", " n = 2**i\n", " t = time_func(func, n)\n", " print(n, t)\n", " if t > 0:\n", " ns.append(n)\n", " ts.append(t)\n", " if t > max_time:\n", " break\n", "\n", " return ns, ts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example with `append_list`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "ns, ts = run_timing_test(append_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`fit` takes the lists of ns and ts and fits it with a curve of the form `a * n**exp`, where `exp` is a given exponent and `a` is chosen so that the line goes through a particular point in the sequence, usually the last. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def fit(ns, ts, exp=1.0, index=-1):\n", " \"\"\"Fits a curve with the given exponent.\n", " \n", " ns: sequence of problem sizes\n", " ts: sequence of times\n", " exp: exponent of the fitted curve\n", " index: index of the element the fitted line should go through\n", " \n", " returns: sequence of fitted times\n", "\n", " \n", " \"\"\"\n", " # Use the element with the given index as a reference point, \n", " # and scale all other points accordingly.\n", " nref = ns[index]\n", " tref = ts[index]\n", "\n", " tfit = []\n", " for n in ns:\n", " ratio = n / nref\n", " t = ratio**exp * tref\n", " tfit.append(t)\n", "\n", " return tfit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`plot_timing_test` plots the results." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "def plot_timing_test(ns, ts, label='', color='blue', exp=1.0, scale='log'):\n", " \"\"\"Plots data and a fitted curve.\n", "\n", " ns: sequence of n (problem size)\n", " ts: sequence of t (run time)\n", " label: string label for the data curve\n", " color: string color for the data curve\n", " exp: exponent (slope) for the fitted curve\n", " \"\"\"\n", " tfit = fit(ns, ts, exp)\n", " fit_label = 'exp = %d' % exp\n", " plt.plot(ns, tfit, label=fit_label, color='0.7', linestyle='dashed')\n", " plt.plot(ns, ts, 'o-', label=label, color=color, alpha=0.7)\n", " plt.xlabel('Problem size (n)')\n", " plt.ylabel('Runtime (seconds)')\n", " plt.xscale(scale)\n", " plt.yscale(scale)\n", " plt.legend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the results from `append_list`. When we plot `ts` versus `ns` on a log-log scale, we should get a straight line. If the order of growth is linear, the slope of the line should be 1. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "plot_timing_test(ns, ts, 'append_list', exp=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For small values of `n`, the runtime is so short that we're probably not getting an accurate measurement of just the operation we're interested in. But as `n` increases, runtime seems to converge to a line with slope 1. \n", "\n", "That suggests that performing append `n` times is linear, which suggests that a single append is constant time. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### list.pop\n", "\n", "Now let's try that with `pop`, and specifically with `pop(0)`, which pops from the left side of the list." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def pop_left_list(n):\n", " t= []\n", " for i in range(n):\n", " t.append(i)\n", " for _ in range(n):\n", " t.pop(0)\n", " return t\n", "\n", "ns, ts = run_timing_test(pop_left_list)\n", "plot_timing_test(ns, ts, 'pop_left_list', exp=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That doesn't look very good. The runtimes increase more steeply than the line with slope 1. Let's try slope 2." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "plot_timing_test(ns, ts, 'pop_left_list', exp=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last few points converge on the line with slope 2, which suggests that `pop(0)` is quadratic." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** What happens if you pop from the end of the list? Write a function called `pop_right_list` that pops the last element instead of the first. Use `run_timing_test` to characterize its performance. Then use `plot_timing_test` with a few different values of `exp` to find the slope that best matches the data. What conclusion can you draw about the order of growth for popping an element from the end of a list?" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sorting\n", "\n", "We expect sorting to be `n log n`. On a log-log scale, that doesn't look like a straight line, so there's no simple test whether it's really `n log n`. Nevertheless, we can plot results for sorting lists with different lengths, and see what it looks like." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def sort_list(n):\n", " t = np.random.random(n)\n", " t.sort()\n", " return t\n", "\n", "ns, ts = run_timing_test(sort_list)\n", "plot_timing_test(ns, ts, 'sort_list', exp=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It sure looks like sorting is linear, so that's surprising. But remember that `log n` changes much more slowly than `n`. Over a wide range of values, `n log n` can be hard to distinguish from an algorithm with linear growth. As `n` gets bigger, we would expect this curve to be steeper than slope 1. But often, for practical problem sizes, `n log n` is as good as linear." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bisection search\n", "\n", "We expect bisection search to be `log n`, which is so fast it is hard to measure the way we've been doing it.\n", "\n", "To make it work, I create the sorted list ahead of time and use the parameter `hi` to specify which part of the list to search. Also, I have to run each search 100,000 times so it takes long enough to measure. " ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "t = np.random.random(16777216)\n", "t.sort()\n", "\n", "from bisect import bisect\n", "\n", "def search_sorted_list(n):\n", " for i in range(100000):\n", " index = bisect(t, 0.1, hi=n) \n", " return index\n", "\n", "ns, ts = run_timing_test(search_sorted_list, max_time=0.2)\n", "plot_timing_test(ns, ts, 'search_sorted_list', exp=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like the runtime increases slowly as `n` increases, but it's definitely not linear. To see if it's constant time, we can compare it to the line with slope 0." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "plot_timing_test(ns, ts, 'search_sorted_list', exp=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Nope, looks like it's not constant time, either. We can't really conclude that it's `log n` based on this curve alone, but the results are certainly consistent with that theory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dictionary methods\n", "\n", "**Exercise:** Write methods called `add_dict` and `lookup_dict`, based on `append_list` and `pop_left_list`. What is the order of growth for adding and looking up elements in a dictionary?" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Implementing a hash table\n", "\n", "The reason Python dictionaries can add and look up elements in constant time is that they are based on hash tables. In this section, we'll implement a hash table in Python. Remember that this example is for educational purposes only. There is no practical reason to write a hash table like this in Python.\n", "\n", "We'll start with a simple linear map, which is a list of key-value pairs." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "class LinearMap(object):\n", " \"\"\"A simple implementation of a map using a list of tuples\n", " where each tuple is a key-value pair.\"\"\"\n", "\n", " def __init__(self):\n", " self.items = []\n", "\n", " def add(self, k, v):\n", " \"\"\"Adds a new item that maps from key (k) to value (v).\n", " Assumes that they keys are unique.\"\"\"\n", " self.items.append((k, v))\n", "\n", " def get(self, k):\n", " \"\"\"Looks up the key (k) and returns the corresponding value,\n", " or raises KeyError if the key is not found.\"\"\"\n", " for key, val in self.items:\n", " if key == k:\n", " return val\n", " raise KeyError" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First let's make sure it works:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": true }, "outputs": [], "source": [ "def test_map(m):\n", " s = string.ascii_lowercase\n", "\n", " for k, v in enumerate(s):\n", " m.add(k, v)\n", "\n", " for k in range(len(s)):\n", " print(k, m.get(k))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "m = LinearMap()\n", "test_map(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's see how long it takes to add `n` elements." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def add_linear_map(n):\n", " d = LinearMap()\n", " for i in range(n):\n", " d.add(i, 1)\n", " return d\n", "\n", "ns, ts = run_timing_test(add_linear_map)\n", "plot_timing_test(ns, ts, 'add_linear_map', exp=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Adding `n` elements is linear, so each add is constant time. How about lookup?" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "def lookup_linear_map(n):\n", " d = LinearMap()\n", " for i in range(n):\n", " d.add(i, 1)\n", " total = 0\n", " for i in range(n):\n", " total += d.get(i)\n", " return d\n", "\n", "ns, ts = run_timing_test(lookup_linear_map)\n", "plot_timing_test(ns, ts, 'lookup_linear_map', exp=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking up `n` elements is $O(n^2)$ (notice that `exp=2`). So each lookup is linear.\n", "\n", "Let's see what happens if we break the list of key-value pairs into 100 lists." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "class BetterMap(object):\n", " \"\"\"A faster implementation of a map using a list of LinearMaps\n", " and the built-in function hash() to determine which LinearMap\n", " to put each key into.\"\"\"\n", "\n", " def __init__(self, n=100):\n", " \"\"\"Appends (n) LinearMaps onto (self).\"\"\"\n", " self.maps = []\n", " for i in range(n):\n", " self.maps.append(LinearMap())\n", "\n", " def find_map(self, k):\n", " \"\"\"Finds the right LinearMap for key (k).\"\"\"\n", " index = hash(k) % len(self.maps)\n", " return self.maps[index]\n", "\n", " def add(self, k, v):\n", " \"\"\"Adds a new item to the appropriate LinearMap for key (k).\"\"\"\n", " m = self.find_map(k)\n", " m.add(k, v)\n", "\n", " def get(self, k):\n", " \"\"\"Finds the right LinearMap for key (k) and looks up (k) in it.\"\"\"\n", " m = self.find_map(k)\n", " return m.get(k)\n" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "m = BetterMap()\n", "test_map(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The run time is better (we get to a larger value of `n` before we run out of time). " ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "def lookup_better_map(n):\n", " d = BetterMap()\n", " for i in range(n):\n", " d.add(i, 1)\n", " total = 0\n", " for i in range(n):\n", " total += d.get(i)\n", " return d\n", "\n", "ns, ts = run_timing_test(lookup_better_map)\n", "plot_timing_test(ns, ts, 'lookup_better_map', exp=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The order of growth is hard to characterize. It looks steeper than the line with slope 1. Let's try slope 2." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "plot_timing_test(ns, ts, 'lookup_better_map', exp=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It might be converging to the line with slope 2, but it's hard to say anything conclusive without running larger problem sizes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Go back and run `run_timing_test` with a larger value of `max_time` and see if the run time converges to the line with slope 2. Just be careful not to make `max_time` to big." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we're ready for a complete implementation of a hash map." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "class HashMap(object):\n", " \"\"\"An implementation of a hashtable using a BetterMap\n", " that grows so that the number of items never exceeds the number\n", " of LinearMaps.\n", "\n", " The amortized cost of add should be O(1) provided that the\n", " implementation of sum in resize is linear.\"\"\"\n", "\n", " def __init__(self):\n", " \"\"\"Starts with 2 LinearMaps and 0 items.\"\"\"\n", " self.maps = BetterMap(2)\n", " self.num = 0\n", "\n", " def get(self, k):\n", " \"\"\"Looks up the key (k) and returns the corresponding value,\n", " or raises KeyError if the key is not found.\"\"\"\n", " return self.maps.get(k)\n", "\n", " def add(self, k, v):\n", " \"\"\"Resize the map if necessary and adds the new item.\"\"\"\n", " if self.num == len(self.maps.maps):\n", " self.resize()\n", "\n", " self.maps.add(k, v)\n", " self.num += 1\n", "\n", " def resize(self):\n", " \"\"\"Makes a new map, twice as big, and rehashes the items.\"\"\"\n", " new_map = BetterMap(self.num * 2)\n", "\n", " for m in self.maps.maps:\n", " for k, v in m.items:\n", " new_map.add(k, v)\n", "\n", " self.maps = new_map" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "m = HashMap()\n", "test_map(m)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Exercise:** Write a function called `lookup_hash_map`, based on `lookup_better_map`, and characterize its run time.\n", "\n", "If things go according to plan, the results should converge to a line with slope 1. Which means that `n` lookups is linear, which means that each lookup is constant time. Which is pretty much magic." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "# Solution goes here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }