{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1><center>cs1001.py , Tel Aviv University, Fall 2017-2018</center></h1>\n",
    "<img src=\"http://www.pngall.com/wp-content/uploads/2016/05/Python-Logo-PNG-Image-180x180.png\" width=50/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recitation 10\n",
    "\n",
    "We discussed hash tables and iterators + generators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### Takeaways:\n",
    "- Hash tables can be useful for many algorithms, including memoization. \n",
    "- Make sure you understand the complexity analysis for hash tables (see the links below).\n",
    "- Generators function is a function that contains the yield command and returns a genertor object. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Code for printing several outputs in one cell (not part of the recitation):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.interactiveshell import InteractiveShell\n",
    "InteractiveShell.ast_node_interactivity = \"all\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Hash"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We wish to have a data structure that implements the operations: insert, search and delete in **expected** $O(1)$ time. \n",
    "\n",
    "Summarizing the insert and search complexity of the data structures that we have seen already:\n",
    "<img src=\"tbl_ds.PNG\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A detailed summary on the complexity of insert/search operations using hash tables can be found <a href=\"http://tau-cs1001-py.wdfiles.com/local--files/recitation-logs-2017a/hashtable_find_and_insert_complexity.pdf\"> here </a>. Make sure you read it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise: \n",
    "given a string $st$ of length $n$ and a small integer $\\ell$, write a function that checks whether there is a substring in $st$ of length $\\ell$ that appears more than once.\n",
    "\n",
    "\n",
    "Make sure you read the following <a href=\"http://tau-cs1001-py.wdfiles.com/local--files/recitation-logs-2016b/m_10_repeating_substring_additional_material.pdf\">summary</a> that includes a detailed explanation on the experiments."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Naive solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The complexity is $O(\\ell(n-\\ell)^2)$. \n",
    "There $O((n-\\ell)^2)$ iterations (make sure you undersand why) and in each iteration we perform operations in $O(\\ell)$ time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def repeat_naive(st, l): \n",
    "    for i in range(len(st)-l+1):\n",
    "        for j in range(i+1,len(st)-l+1):\n",
    "            if st[i:i+l]==st[j:j+l]:\n",
    "                return True\n",
    "    return False\n",
    "\n",
    "repeat_naive(\"hello\", 1)\n",
    "repeat_naive(\"hello\"*10, 45)\n",
    "repeat_naive(\"hello\"*10, 46)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A function that generates a random string of a given size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import random\n",
    "def gen_str(size, alphabet = \"abcdefghijklmnopqrstuvwxyz\"):\n",
    "    ''' Generate a random string of length size over alphabet '''\n",
    "    s=\"\"\n",
    "    for i in range(size):\n",
    "        s += random.choice(alphabet)\n",
    "    return s\n",
    "rndstr = gen_str(1000)\n",
    "repeat_naive(rndstr, 3)\n",
    "repeat_naive(rndstr, 10)\n",
    "\n",
    "rndstr1 = gen_str(10000)\n",
    "repeat_naive(rndstr1, 10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The class Hashtable from the lectures"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Hashtable:\n",
    "    def __init__(self, m, hash_func=hash):\n",
    "        \"\"\" initial hash table, m empty entries \"\"\"\n",
    "        ##bogus initialization #1:\n",
    "        #self.table = [[]*m]\n",
    "        ##bogus initialization #2:\n",
    "        #empty=[]\n",
    "        #self.table = [empty for i in range(m)]\n",
    "        \n",
    "        self.table = [ [] for i in range(m)]\n",
    "        self.hash_mod = lambda x: hash_func(x) % m # using python hash function\n",
    "\n",
    "    def __repr__(self):\n",
    "        L = [self.table[i] for i in range(len(self.table))]\n",
    "        return \"\".join([str(i) + \" \" + str(L[i]) + \"\\n\" for i in range(len(self.table))])\n",
    "    \n",
    "    def find(self, item):\n",
    "        \"\"\" returns True if item in hashtable, False otherwise  \"\"\"\n",
    "        i = self.hash_mod(item)\n",
    "        return item in self.table[i]\n",
    "        #if item in self.table[i]:\n",
    "        #    return True\n",
    "        #else:\n",
    "        #    return False\n",
    "\n",
    "    def insert(self, item):\n",
    "        \"\"\" insert an item into table \"\"\"\n",
    "        i = self.hash_mod(item)\n",
    "        if item not in self.table[i]:\n",
    "            self.table[i].append(item)\n",
    "      \n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Solution using the class Hashtable"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The expected (average) complexity is: $O(\\ell(n-\\ell))$\n",
    "\n",
    "Creating the table takes $O(n-\\ell)$ time, and there are $O(n-\\ell)$ iterations, each taking expected $O(\\ell)$ time.\n",
    "\n",
    "\n",
    "\n",
    "The worst case complexity is: $O(\\ell(n-\\ell)^2)$\n",
    "\n",
    "Creating the table takes $O(n-\\ell)$ time, and the time for executing the loop is\n",
    "$\\ell\\cdot\\sum_{i=0}^{n-\\ell}{i}= O(\\ell(n-\\ell)^2)$ \n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def repeat_hash1(st, l):\n",
    "    m=len(st)-l+1\n",
    "    htable = Hashtable(m)\n",
    "    for i in range(len(st)-l+1):\n",
    "        if htable.find(st[i:i+l])==False:\n",
    "            htable.insert(st[i:i+l])\n",
    "        else:\n",
    "            return True\n",
    "    return False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Which of Python's naitive DS fits the solution?\n",
    "<img src=\"tbl_container.PNG\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Solution using Python's set implementation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def repeat_hash2(st, l):\n",
    "    htable = set() #Python sets use hash functions for fast lookup\n",
    "    for i in range(len(st)-l+1):\n",
    "        if st[i:i+l] not in htable:\n",
    "            htable.add(st[i:i+l])\n",
    "        else: return True\n",
    "    return False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Competition between the 3 solutions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a random string of size $n=1000$ and for $l=10$ the running time of repeat_hash2 is the smallest, while the one for repeat_naive is the largest.\n",
    "\n",
    "When increasing $n$ to 2000, the running time of repeat_naive increases by ~4, while the running time of repeat_hash1, repeat_hash2 increases by ~2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "str_len= 1000 repeating substring len= 10\n",
      "repeat_naive 0.17578042042896413 found? False\n",
      "repeat_hash1 0.0020885685707980883 found? False\n",
      "repeat_hash2 0.0005123723312863149 found? False\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "str_len=1000\n",
    "st=gen_str(str_len)\n",
    "l=10\n",
    "print(\"str_len=\",str_len, \"repeating substring len=\",l)\n",
    "for f in [repeat_naive,repeat_hash1,repeat_hash2]:\n",
    "    t0=time.clock()\n",
    "    res=f(st, l)\n",
    "    t1=time.clock()\n",
    "    print(f.__name__, t1-t0, \"found?\",res)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "str_len= 2000 repeating substring len= 10\n",
      "repeat_naive 0.7582052599900635 found? False\n",
      "repeat_hash1 0.0047206939198076725 found? False\n",
      "repeat_hash2 0.0010855345899472013 found? False\n"
     ]
    }
   ],
   "source": [
    "str_len=2000\n",
    "st=gen_str(str_len)\n",
    "l=10\n",
    "print(\"str_len=\",str_len, \"repeating substring len=\",l)\n",
    "for f in [repeat_naive,repeat_hash1,repeat_hash2]:\n",
    "    t0=time.clock()\n",
    "    res=f(st, l)\n",
    "    t1=time.clock()\n",
    "    print(f.__name__, t1-t0, \"found?\",res)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When $st$ is \"a\"$*1000$, repeat_hash1 is the slowest, since it spends time on creating an empty table of size 991.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "str_len= 2000 repeating substring len= 10\n",
      "repeat_naive 1.1842188541777432e-05 found? True\n",
      "repeat_hash1 0.0003122392372461036 found? True\n",
      "repeat_hash2 8.684277418069541e-06 found? True\n"
     ]
    }
   ],
   "source": [
    "st=\"a\"*1000\n",
    "l=10\n",
    "print(\"str_len=\",str_len, \"repeating substring len=\",l)\n",
    "for f in [repeat_naive,repeat_hash1,repeat_hash2]:\n",
    "    t0=time.clock()\n",
    "    res=f(st, l)\n",
    "    t1=time.clock()\n",
    "    print(f.__name__, t1-t0, \"found?\",res)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The effect of table size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our solution, with control over the table size"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "def repeat_hash1_var_size(st, l, m=0):\n",
    "    if m==0: #default hash table size is ~number of substrings to be inserted\n",
    "        m=len(st)-l+1\n",
    "    htable = Hashtable(m)\n",
    "    for i in range(len(st)-l+1):\n",
    "        if htable.find(st[i:i+l])==False:\n",
    "            htable.insert(st[i:i+l])\n",
    "        else:\n",
    "            return True\n",
    "    return False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Comparing different table sizes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "str_len= 1000 repeating substring len= 10\n",
      "0.0392379310214892 found? False table size= 1\n",
      "0.00573320165858604 found? False table size= 10\n",
      "0.0035834484006045386 found? False table size= 100\n",
      "0.0035980537650175393 found? False table size= 1000\n",
      "0.003550684981746599 found? False table size= 1500\n",
      "0.0036312119191279635 found? False table size= 10000\n",
      "0.045343372359639034 found? False table size= 100000\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "str_len=1000\n",
    "st=gen_str(str_len)\n",
    "l=10\n",
    "print(\"str_len=\",str_len, \"repeating substring len=\",l)\n",
    "for m in [1, 10, 100, 1000, 1500, 10000, 100000]:\n",
    "    t0=time.clock()\n",
    "    res=repeat_hash1_var_size(st, l, m)\n",
    "    t1=time.clock()\n",
    "    print(t1-t0, \"found?\",res, \"table size=\",m)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Iterators and Generators"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "list_iterator"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "l = [1,2,3]\n",
    "li = iter(l)\n",
    "type(li)\n",
    "li2 = iter(l)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "z is 3\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "1"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "before loop\n",
      "2\n",
      "3\n"
     ]
    }
   ],
   "source": [
    "next(li)\n",
    "next(li)\n",
    "z = next(li)\n",
    "print(\"z is\", z)\n",
    "next(li2)\n",
    "#next(li)\n",
    "print(\"before loop\")\n",
    "for elem in li2:\n",
    "    print(elem)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "ename": "StopIteration",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mStopIteration\u001b[0m                             Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-20-c9b8845252db>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mnext\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mli\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mStopIteration\u001b[0m: "
     ]
    }
   ],
   "source": [
    "next(li)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A function which produces a list of all positive even numbers up to $n$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def Evens_list(n):\n",
    "    ''' a list of evens up to n '''\n",
    "    return [num for num in range(n) if num%2==0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A generator function (includes a \"yield\" statement) that returns a generator that generates all positive even numbers up to $n$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def Evens_gen(n):\n",
    "    ''' returns a generator of evens up to n '''\n",
    "    for i in range(n):\n",
    "        print(\"current i:\", i)\n",
    "        if i%2 == 0:\n",
    "            print(\"before yield\")\n",
    "            yield i\n",
    "            print(\"after yield\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "g = Evens_gen(10)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "current i: 0\n",
      "before yield\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "next(g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "after yield\n",
      "current i: 1\n",
      "current i: 2\n",
      "before yield\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "2"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "next(g)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A generator function which produces the **infinite** sequence of all positive even numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def All_evens_gen():\n",
    "    i=0\n",
    "    while True:\n",
    "        yield i\n",
    "        i+=2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Solving this <a href=\"http://tau-cs1001-py.wdfiles.com/local--files/recitation-logs-2017a/a10_geneators_exam_q.pdf\"> exam question about generators</a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Section (b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def SomePairs():\n",
    "    i=0\n",
    "    while True:\n",
    "        for j in range(i):\n",
    "            yield(i,j)\n",
    "        i=i+1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Section (c)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def RevGen(PairsGen):\n",
    "    pairs = PairsGen()\n",
    "    while True:\n",
    "        pair = next(pairs)\n",
    "        yield(pair[1],pair[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Section (d1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def UnionGenerators(gen1, gen2):\n",
    "    while True:\n",
    "        yield next(gen1)\n",
    "        yield next(gen2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Section (d2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def EqPairs():\n",
    "    i=0\n",
    "    while True:\n",
    "        yield (i,i)\n",
    "        i=i+1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def AllPairs():\n",
    "    return UnionGenerators(SomePairs(),\n",
    "                           UnionGenerators(EqPairs(),\n",
    "                                           RevGen(SomePairs)))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}