{ "metadata": { "name": "recitation9" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CS1001.py\n", "\n", "## Extended Introduction to Computer Science with Python, Tel-Aviv University, Spring 2013\n", "\n", "# Recitation 9 - 9-13.5.2013\n", "\n", "## Last update: 13.5.2013" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data structures\n", "\n", "Python's:\n", "\n", "- `list`\n", "- `dict`\n", "- `string`\n", "- `tuple`\n", "- `set`\n", "\n", "Course implementations:\n", "\n", "- `tree`\n", "- `matrix`\n", "- `hashtable`\n", "- **`linked list`**\n", "\n", "## Linked list\n", "\n", "A dynamic structure, location in memory is not consecutive. \n", "\n", "Some operations become $O(1)$, on the expence of others (especially access).\n", "\n", "[![Linked List view](http://greenteapress.com/thinkpython/html/illustrations/link2.png)](http://greenteapress.com/thinkpython/html/chap17.html)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class Node:\n", " def __init__(self, val):\n", " self.value = val\n", " self.next = None\n", " \n", " def __repr__(self):\n", " return str(self.value)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "node1 = Node(1)\n", "node2 = Node(2)\n", "node1.next = node2\n", "node3 = Node(3)\n", "node2.next = node3" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "class LinkedList:\n", " def __init__(self):\n", " self.first = None\n", "\n", " def __repr__(self):\n", " out = \"[\"\n", " curr = self.first\n", " while curr:\n", " out += str(curr) + \", \"\n", " curr = curr.next\n", " return out[:-2] + \"]\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "lst = LinkedList()\n", "lst.first = node1\n", "lst" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 6, "text": [ "[1, 2, 3]" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterating the list, as done is `__repr__`, can be done by advancing the current node to the next untill reaching a `None`:\n", "\n", "![Iterating a linked list](http://greenteapress.com/thinkpython/html/illustrations/link3.png)\n", "\n", "Now we will implement some more methods:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class LinkedList:\n", " def __init__(self):\n", " self.first = None\n", "\n", " def __repr__(self):\n", " out = \"[\"\n", " curr = self.first\n", " while curr:\n", " out += str(curr) + \", \"\n", " curr = curr.next\n", " return out[:-2] + \"]\"\n", " \n", " def length(self):\n", " curr = self.first\n", " i = 0\n", " while curr:\n", " i += 1\n", " curr = curr.next\n", " return i\n", " \n", " def prepend(self, value):\n", " node = Node(value)\n", " if not self.first:\n", " self.first = node\n", " else:\n", " self.first, node.next = node, self.first\n", " \n", " def append(self, value):\n", " node = Node(value)\n", " curr = self.first\n", " if not curr:\n", " self.first = node\n", " else:\n", " while curr.next:\n", " curr = curr.next\n", " curr.next = node\n", "\n", " def insert(self, index, value):\n", " curr = self.first\n", " for i in range(index - 1):\n", " if not curr or not curr.next:\n", " raise IndexError()\n", " curr = curr.next\n", " node = Node(value)\n", " curr.next, node.next = node, curr.next\n", "\n", " def remove(self, index):\n", " curr = self.first\n", " # iterate to the one before the one to be removed\n", " for i in range(index - 1):\n", " if not curr or not curr.next:\n", " raise IndexError\n", " curr = curr.next\n", " curr.next = curr.next.next\n", " \n", " def get(self, index):\n", " curr = self.first\n", " for i in range(index):\n", " if not curr or not curr.next:\n", " raise IndexError\n", " curr = curr.next\n", " return curr.value\n", " \n", " def index(self, value):\n", " curr = self.first\n", " i = 0\n", " while curr:\n", " if curr.value == value:\n", " return i\n", " curr = curr.next\n", " i += 1\n", " raise ValueError()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "node1 = Node(1)\n", "node2 = Node(2)\n", "node1.next = node2\n", "node3 = Node(3)\n", "node2.next = node3\n", "lst = LinkedList()\n", "lst.first = node1\n", "\n", "print(lst)\n", "print(lst.length())\n", "lst.prepend(0)\n", "print(lst)\n", "lst.append(4)\n", "print(lst)\n", "lst.insert(2, 1.5)\n", "print(lst)\n", "lst.remove(2)\n", "print(lst)\n", "print(lst.get(2))\n", "print(lst.index(2))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[1, 2, 3]\n", "3\n", "[0, 1, 2, 3]\n", "[0, 1, 2, 3, 4]\n", "[0, 1, 1.5, 2, 3, 4]\n", "[0, 1, 2, 3, 4]\n", "2\n", "2\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reverse\n", "\n", "We want to reverse a linked list. We implement this as a mehtod just so we don't copy-paste the entire code from above - this can and should be a method." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def reverse(lst):\n", " prev = None\n", " curr = lst.first\n", " while curr:\n", " curr.next, prev, curr = prev, curr, curr.next\n", " lst.first = prev" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "print(lst)\n", "reverse(lst)\n", "print(lst)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[0, 1, 2, 3, 4]\n", "[4, 3, 2, 1, 0]\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Queue\n", "\n", "We want to have a first-in-first-out (FIFO) data structure so that we can:\n", "\n", "-\tpush/enqueue: insert an element to the top of the queue\n", "-\tpop/dequeue: remove an element from the bottom of the queue\n", "\n", "How do our current ordered data structure deal with these tasks?\n", "\n", "- hashtable\n", "- list\n", "- linked list\n", "\n", "Lets try it. \n", "\n", "With a `list` implementation `push` and `pop` must be $O(1)$ and $O(n)$ or vice versa, as `append` and `pop()` are $O(1)$ but `insert(0)` and `pop(0)` are $O(n)$.\n", "\n", "We will implement it so that `push` is cheap:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class QueueList:\n", " def __init__(self):\n", " self.lst = []\n", " \n", " def __repr__(self):\n", " return self.lst.__repr__()\n", "\n", " def push(self, value):\n", " self.lst.append(Node(value))\n", "\n", " def pop(self):\n", " return self.lst.pop(0)\n", "\n", "q1 = QueueList()\n", "q1.push(1)\n", "q1.push(2)\n", "q1.push(3)\n", "print(q1.pop())\n", "q1" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n" ] }, { "output_type": "pyout", "prompt_number": 11, "text": [ "[2, 3]" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "With a linked list it's the other way around - `prepend` and `remove(0)` are $O(1)$ but `append` and `remove(-1)` are $O(n)$. We will push to the head so that we get a cheap push:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class QueueLinkedList: \n", " def __init__(self): \n", " self.head = None \n", "\n", " def __repr__(self):\n", " out = \"[\"\n", " curr = self.head\n", " while curr:\n", " out += str(curr) + \", \"\n", " curr = curr.next\n", " return out[:-2] + \"]\"\n", "\n", " def push(self, value): \n", " node = Node(value)\n", " if self.head == None:\n", " self.head = node\n", " else:\n", " self.head, node.next = node, self.head\n", " \n", " def pop(self):\n", " if self.head != None:\n", " curr, prev = self.head, None\n", " while curr.next:\n", " curr, prev = curr.next, curr\n", " prev.next = None\n", " return curr.value\n", "\n", "q2 = QueueLinkedList()\n", "q2.push(1)\n", "q2.push(2)\n", "q2.push(3)\n", "print(q2.pop())\n", "q2" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n" ] }, { "output_type": "pyout", "prompt_number": 12, "text": [ "[3, 2]" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets check performence:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "q1 = QueueList()\n", "q2 = QueueLinkedList()\n", "for i in range(10**5): \n", " q1.push(1)\n", " q2.push(1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit -n 100 q1.push(1)\n", "%timeit -n 100 q2.push(1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 1.12 us per loop\n", "100 loops, best of 3: 1.37 us per loop\n" ] } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit -n 100 q1.pop()\n", "%timeit -n 100 q2.pop()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 68.3 us per loop\n", "100 loops, best of 3: 23.1 ms per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So indeed `push` is cheap and `pop` is expensive, much more so in the linked list because `list` is implemented in C." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want a data structure where both `push` and `pop` are $O(1)$.\n", "\n", "- linked list with tail reference!\n", "\n", "We will keep track of the `head` and the `tail` and so both `push` and `pop` can be $O(1)$:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class TailedQueueLinkedList: \n", " def __init__(self): \n", " self.head = None \n", " self.tail = None \n", "\n", " def __repr__(self):\n", " out = \"[\"\n", " curr = self.head\n", " while curr:\n", " out += str(curr) + \", \"\n", " curr = curr.next\n", " if len(out) > 2:\n", " out = out[:-2]\n", " return out + \"]\"\n", "\n", " def push(self, value): \n", " node = Node(value)\n", " if self.head == None:\n", " # empty queue\n", " self.head, self.tail = node, node\n", " else:\n", " # push to the tail. can you think of a way to push from the head? where will you pop from?\n", " self.tail.next, self.tail = node, node\n", " \n", " def pop(self):\n", " # pop from the head\n", " if self.head != None:\n", " node = self.head\n", " self.head = self.head.next \n", " return node.value\n", "\n", "q3 = TailedQueueLinkedList()\n", "q3.push(1)\n", "q3.push(2)\n", "q3.push(3)\n", "print(q3.pop())\n", "q3" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n" ] }, { "output_type": "pyout", "prompt_number": 14, "text": [ "[2, 3]" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "q1 = QueueList()\n", "q2 = QueueLinkedList()\n", "q3 = TailedQueueLinkedList()\n", "for i in range(10**5): \n", " q1.push(1)\n", " q2.push(1)\n", " q3.push(1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 41 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit -n 100 q1.push(1)\n", "%timeit -n 100 q2.push(1)\n", "%timeit -n 100 q3.push(1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 1.13 us per loop\n", "100 loops, best of 3: 1.38 us per loop\n", "100 loops, best of 3: 1.37 us per loop\n" ] } ], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit -n 100 q1.pop()\n", "%timeit -n 100 q2.pop()\n", "%timeit -n 100 q3.pop()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100 loops, best of 3: 68.5 us per loop\n", "100 loops, best of 3: 24.9 ms per loop" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "100 loops, best of 3: 1.04 us per loop\n" ] } ], "prompt_number": 43 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And indeed the tailed linked list keeps a low time for both `push` and `pop`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Iterators\n", "\n", "We want to be able to do:\n", "\n", " for element in seq:\n", " ...\n", "\n", "`seq` must be iterable -> must have a method named `__iter__`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "class A:\n", " def __init__(self):\n", " self.values = [1,2,3,4]\n", "\n", " def __repr__(self):\n", " return str(self.values)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 66 }, { "cell_type": "code", "collapsed": false, "input": [ "a = A()\n", "print(a)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[1, 2, 3, 4]\n" ] } ], "prompt_number": 67 }, { "cell_type": "code", "collapsed": false, "input": [ "for x in a:\n", " print(x)" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'A' object is not iterable", "output_type": "pyerr", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[1;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[1;32min\u001b[0m \u001b[0ma\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mTypeError\u001b[0m: 'A' object is not iterable" ] } ], "prompt_number": 68 }, { "cell_type": "code", "collapsed": false, "input": [ "class A:\n", " ''' using the built-in iterator of list '''\n", " def __init__(self):\n", " self.values = [1,2,3,4]\n", "\n", " def __repr__(self):\n", " return str(self.values)\n", "\n", " def __iter__(self): #now A is iterable\n", " return iter(self.values) \t#returns the iterator class of a list" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 69 }, { "cell_type": "code", "collapsed": false, "input": [ "a = A()\n", "for x in a:\n", " print(x)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n", "2\n", "3\n", "4\n" ] } ], "prompt_number": 70 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can write our own iterator:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class B:\n", " ''' using our own iterator class: BIterator '''\n", " def __init__(self):\n", " self.values = [1,2,3,4]\n", " self.counts = [3,4,0,1]\n", "\n", " def __repr__(self):\n", " return str(self.values) + \"\\n\" + str(self.count)\n", " \n", " def __iter__(self):\n", " return BIterator(self.values, self.counts)\n", "\n", "class BIterator():\n", " ''' iterator of B, must have __next__ '''\n", " def __init__(self, values, counts):\n", " self.index = 0\n", " self.lst = [values[i] for i in range(len(values)) for j in range(counts[i])]\n", " \n", " def __next__(self):\n", " if self.index < len(self.lst):\n", " res = self.lst[self.index]\n", " self.index += 1\n", " return res\n", " raise StopIteration" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 75 }, { "cell_type": "code", "collapsed": false, "input": [ "b = B()\n", "for x in b:\n", " print(x, end=\", \")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1, 1, 1, 2, 2, 2, 2, 4, " ] } ], "prompt_number": 74 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generators\n", "\n", "Instead a separate class for the iterator, `__iter__` can use `yield`.\n", "\n", "This is called a **generator** (as it generates an iterator):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class C:\n", " ''' a generator example using yield '''\n", " \n", " def __init__(self):\n", " self.values = [1,2,3,4]\n", " self.counts = [3,4,0,1]\n", "\n", " def __repr__(self):\n", " return str(self.values) + \"\\n\" + str(self.count)\n", " \n", " def __iter__(self):\n", " for i in range(len(self.values)):\n", " for j in range(self.counts[i]):\n", " yield self.values[i]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 76 }, { "cell_type": "code", "collapsed": false, "input": [ "c = C()\n", "for x in c:\n", " print(x, end=\", \")" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1, 1, 1, 2, 2, 2, 2, 4, " ] } ], "prompt_number": 78 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generators involve lazy evaluation \u2013 one element is created at a time.\n", "\n", "`range` is a generator we already know. Lets compare `range` with `list(range)`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "for i in list(range(10**8)):\n", " pass" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(10**8):\n", " pass" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "![List vs Generator](https://raw.github.com/yoavram/CS1001.py/master/list_vs_generator.png)\n", "\n", "If you want to try this at home, you need to run this while having a process monitor tool open (Windows: Task Manager, Linux: top):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use generators to run an \"infinite\" for loop.\n", "\n", "Suppose you want to iterate the natural numbers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def natural_numbers():\n", " n = 0\n", " while True:\n", " n += 1\n", " yield n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "for n in natural_numbers():\n", " if n % 667 == 0 and n**2 % 766 == 0:\n", " break\n", "print(n)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "510922\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use \"generator compehension\" instead of \"list comprehension\" to save memory and to perform lazy init to save runtime because you know you won't use all of them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "evens_less_than_1e6_list = [x for x in range(1, 10**6) if x % 2 == 0]\n", "%timeit -n 3 [x for x in range(1, 10**6) if x % 2 == 0]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "3 loops, best of 3: 637 ms per loop\n" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "evens_less_than_1e6_generator = (x for x in range(1, 10**6) if x % 2 == 0)\n", "%timeit -n 3 (x for x in range(1, 10**6) if x % 2 == 0)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "3 loops, best of 3: 6.64 us per loop\n" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "import sys\n", "sys.getsizeof(evens_less_than_1e6_list), sys.getsizeof(evens_less_than_1e6_generator)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 32, "text": [ "(4290016, 72)" ] } ], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fin\n", "This notebook is part of the [Extended introduction to computer science](http://tau-cs1001-py.wikidot.com/) course at Tel-Aviv University.\n", "\n", "The notebook was written using Python 3.2 and IPython 0.13.1.\n", "\n", "The code is available at .\n", "\n", "The notebook can be viewed online at .\n", "\n", "This work is licensed under a [Creative Commons Attribution-ShareAlike 3.0 Unported License](http://creativecommons.org/licenses/by-sa/3.0/)." ] } ], "metadata": {} } ] }