{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1><center>cs1001.py , Tel Aviv University, Fall 2017-2018</center></h1>\n",
    "<img src=\"http://www.pngall.com/wp-content/uploads/2016/05/Python-Logo-PNG-Image-180x180.png\" width=50/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Recitation 11\n",
    "\n",
    "We discussed the \"string matching\" problem and the Karp-Rabin(KR) algorithm for solving it. Then we solved a question about KR using generators."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###### Takeaways:\n",
    "- Make sure you read our <a href=\"http://tau-cs1001-py.wdfiles.com/local--files/recitation-logs-2017b/KR-summary_new.pdf\">summary.</a>\n",
    "- A naive solution for the string-matching problem has $O(m(n-m))$ time complexity.\n",
    "- By allowing \"false-positives\" (with small probability), we can obtain a linear time solution for the string-matching problem.\n",
    "- Make sure you understand the way the algorithm works, and in particular the \"rolling hash mechanism\", that is, how to compute the fingerprint of the next substring in $O(1)$ time, given the fingerprint of the current substring.\n",
    "- Make sure you understand the \"aritmetization\" step used by the algorithm.\n",
    "- Make sure you understand the question we solved."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Code for printing several outputs in one cell (not part of the recitation):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.core.interactiveshell import InteractiveShell\n",
    "InteractiveShell.ast_node_interactivity = \"all\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The string-matching problem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given a string $text$ of length $n$, and a short(er) string $pattern$ of length $m$ ($m\\leq n$), report all occurrances of $pattern$ in $text$.\n",
    "\n",
    "Example:\n",
    "\n",
    "$text = $\"abracadabra\",  $pattern = $\"abr\"\n",
    "\n",
    "The requested output should be $[0,7]$, since $pattern$ appears in $text$ in indices $0,7$.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Karp-Rabin Algorithm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fingerprint(text, basis=2**16, r=2**32-3):\n",
    "    \"\"\" used to compute karp-rabin fingerprint of the pattern\n",
    "        employs Horner method (modulo r) \"\"\"\n",
    "    partial_sum = 0\n",
    "    for ch in text:\n",
    "        partial_sum =(partial_sum*basis + ord(ch)) % r\n",
    "    return partial_sum\n",
    "\n",
    "def text_fingerprint(text, m, basis=2**16, r=2**32-3):\n",
    "    \"\"\" computes karp-rabin fingerprint of the text \"\"\"\n",
    "    f=[]\n",
    "    b_power = pow(basis, m-1, r)\n",
    "    list.append(f, fingerprint(text[0:m], basis, r))\n",
    "    # f[0] equals first text fingerprint \n",
    "    for s in range(1, len(text)-m+1):\n",
    "        new_fingerprint = ( (f[s-1] - ord(text[s-1])*b_power)*basis\n",
    "                         +ord(text[s+m-1]) ) % r\n",
    "            # compute f[s], based on f[s-1]\n",
    "        list.append(f,new_fingerprint)# append f[s] to existing f       \n",
    "    return f\n",
    "\n",
    "def find_matches_KR(pattern, text, basis=2**16, r=2**32-3):\n",
    "    \"\"\" find all occurances of pattern in text\n",
    "        using coin flipping Karp-Rabin algorithm \"\"\"\n",
    "    if len(pattern) > len(text):\n",
    "        return []\n",
    "    p = fingerprint(pattern, basis, r)\n",
    "    f = text_fingerprint(text, len(pattern), basis, r)\n",
    "    matches = [s for (s,f_s) in enumerate(f) if f_s == p]\n",
    "    # list comprehension \n",
    "    return matches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "text = \"abracadabra\"\n",
    "pattern = \"abr\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6422933"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fingerprint(\"abr\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "416618250354"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "6422933"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "base = 2**16\n",
    "arit = ord(\"a\")*(base**2) + ord(\"b\")*(base**1) + ord(\"r\")*(base**0)\n",
    "arit\n",
    "r = 2**32 - 3\n",
    "fp = arit%r\n",
    "fp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[6422933,\n",
       " 7471495,\n",
       " 6357433,\n",
       " 6488452,\n",
       " 6357389,\n",
       " 6553988,\n",
       " 6357390,\n",
       " 6422933,\n",
       " 7471495]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text_fingerprint(text, 3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 7]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "find_matches_KR(pattern, text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Safe version\n",
    "Makes sure no false positives occur. In the worst case, when all $n-m$ possible substrings are indeed matches, behaves as the naive solution in terms of time complexity."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def find_matches_KR_safe(pattern, text, basis=2**16, r=2**32-3):\n",
    "    \"\"\" a safe version of KR\n",
    "        checks every suspect for a match \"\"\"\n",
    "\n",
    "    if len(pattern) > len(text):\n",
    "        return []\n",
    "    p = fingerprint(pattern, basis, r)\n",
    "    f = text_fingerprint(text, len(pattern), basis, r)\n",
    "    matches = [s for (s,f_s) in enumerate(f) if f_s == p \\\n",
    "               and text[s:s+len(pattern)]==pattern]\n",
    "    #note that python performs \"cleaver evaluation\" of the 'and' statement\n",
    "    return matches"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Competition between versions on single char string.\n",
    "This is the worst-case scenario for the safe version.\n",
    "Changing $m$ has a greater effect on the safe version than on the standard KR."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "text = 'a'* 100000\n",
      "pattern = 'a'* 100\n",
      "find_matches_KR_safe 0.17724202722919818\n",
      "find_matches_KR 0.1276535441773806\n",
      "\n",
      "pattern = 'a'* 1000\n",
      "find_matches_KR_safe 0.18844752517457397\n",
      "find_matches_KR 0.12176520046895167\n",
      "\n",
      "pattern = 'a'* 10000\n",
      "find_matches_KR_safe 0.27398147088196856\n",
      "find_matches_KR 0.13507782308521266\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "text = \"a\"*100000\n",
    "print(\"text = 'a'*\",len(text))\n",
    "for pattern in [\"a\"*100, \"a\"*1000, \"a\"*10000]:\n",
    "#for pattern in [\"a\"*30000, \"a\"*40000, \"a\"*50000, \"a\"*60000, \"a\"*70000]: #max in m=n/2\n",
    "    print(\"pattern = 'a'*\",len(pattern))\n",
    "    for f in [find_matches_KR_safe, find_matches_KR]:\n",
    "        t0=time.clock()\n",
    "        res=f(pattern, text)\n",
    "        t1=time.clock()\n",
    "        print (f.__name__, t1-t0)\n",
    "    print(\"\") #newline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Competition between versions on random strings. \n",
    "\n",
    "Note that the standard and safe versions of KR has similar running times. Moreover, as $m$ increases, the running time slightly decreases since there are less candidates to consider."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "random str of len n= 100000  , random pattern of length m= 1000\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "find_matches_KR 0.17595477853319608\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "find_matches_KR_safe 0.1846639377035899\n",
      "random str of len n= 100000  , random pattern of length m= 10000\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "find_matches_KR 0.1361657278466737\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "find_matches_KR_safe 0.17290264515031595\n"
     ]
    }
   ],
   "source": [
    "import random\n",
    "def gen_str(size):\n",
    "    ''' Generate a random lowercase English string of length size'''\n",
    "    s=\"\"\n",
    "    for i in range(size):\n",
    "        s+=random.choice(\"abcdefghijklmnopqrstuvwxyz\")\n",
    "    return s\n",
    "\n",
    "\n",
    "n=100000\n",
    "m=1000\n",
    "text = gen_str(n)\n",
    "pattern = gen_str(m)\n",
    "print(\"random str of len n=\", n, \" , random pattern of length m=\",m)\n",
    "for f in [find_matches_KR, find_matches_KR_safe]:\n",
    "    t0=time.clock()\n",
    "    f(pattern, text)\n",
    "    t1=time.clock()\n",
    "    print (f.__name__, t1-t0)\n",
    "    \n",
    "n=100000\n",
    "m=10000\n",
    "text = gen_str(n)\n",
    "pattern = gen_str(m)\n",
    "print(\"random str of len n=\", n, \" , random pattern of length m=\",m)\n",
    "for f in [find_matches_KR, find_matches_KR_safe]:\n",
    "    t0=time.clock()\n",
    "    f(pattern, text)\n",
    "    t1=time.clock()\n",
    "    print (f.__name__, t1-t0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Choice of $r$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By setting $r$ to be a power of the base, say $r=base$, we will obtain more false-positives. This may serve as an intuition for choosing a prime $r$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[2, 4, 6, 9]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "[6]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "find_matches_KR(\"da\", \"abracadabra\", basis=2**16, r=2**16)\n",
    "find_matches_KR_safe(\"da\", \"abracadabra\", basis=2**16, r=2**16)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "6553697"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "6488161"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fingerprint(\"da\", 2**16, r=2**16)\n",
    "ord(\"d\")*(2**16)**1 + ord(\"a\")\n",
    "ord(\"a\")\n",
    "\n",
    "fingerprint(\"ca\", 2**16, r=2**16)\n",
    "ord(\"c\")*(2**16)**1 + ord(\"a\")\n",
    "(ord(\"c\")*(2**16)**1 + ord(\"a\") )%2**16"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "420913348705"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "420913348705"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "97"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "425208316001"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "base = 2**16\n",
    "r = 2**16\n",
    "fingerprint(\"bda\", base, r)\n",
    "ord(\"b\")*(base**2) + ord(\"d\")*(base**1) + ord(\"a\")\n",
    "(ord(\"b\")*base + ord(\"d\"))*base + ord(\"a\")\n",
    "((ord(\"b\")*base + ord(\"d\"))*base + ord(\"a\"))%r == ord(\"a\")%r\n",
    "\n",
    "\n",
    "fingerprint(\"cda\", base, r)\n",
    "(ord(\"c\")*base + ord(\"d\"))*base + ord(\"a\")\n",
    "((ord(\"c\")*base + ord(\"d\"))*base + ord(\"a\"))%r == ord(\"a\")%r"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 7]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "find_matches_KR(\"Humpty\", \"Humpty Dumpty\", r=2**(16*5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2158299737877522940025"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "2158299737877522940025"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fingerprint(\"Humpty\", r=2**(16*5))\n",
    "fingerprint(\"Dumpty\", r=2**(16*5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text_fingerprint(\"Humpty Dumpty\", 6, r=2**(16*5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "find_matches_KR(\"Humpty\", \"Humpty Dumpty\", r=2**(16*6))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Alternative fingerprint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fingerprint_sum(string, r=2**32-3):\n",
    "    ''' a different kind of fingerprint: sum of all ords '''\n",
    "    partial_sum=0\n",
    "    for x in string:\n",
    "        partial_sum = (partial_sum + ord(x)) % r\n",
    "    return partial_sum\n",
    "\n",
    "def text_fingerprint_sum(string, length, r=2**32-3):\n",
    "    \"\"\" used to compute a simple sum fingerprint of the text \"\"\"\n",
    "    f =[]\n",
    "    list.append(f, fingerprint_sum(string[0:length], r))\n",
    "    for s in range (1, len(string)-length+1):\n",
    "        new_fingerprint = (f[s-1] - ord(string[s-1]) + ord(string[s+length-1])) % r\n",
    "        # compute f[s], based on f[s -1]\n",
    "        list.append(f, new_fingerprint) # append f[s] to existing f\n",
    "    return f"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[309, 309, 310, 293, 296, 294, 295, 309, 309]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "text_fingerprint_sum(\"abracadabra\", 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Solving question 2  in <a href=\"http://tau-cs1001-py.wdfiles.com/local--files/home-assignments-2017b/HW6_2017b.pdf\"> this exercise</a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fingerprint(string, basis=2**16, r=2**32-3):\n",
    "    \"\"\" used to computes karp-rabin fingerprint of the pattern\n",
    "    employs Horner method (modulo r) \"\"\"\n",
    "    partial_sum=0\n",
    "    for x in string:\n",
    "        partial_sum=(partial_sum*basis+ord(x)) % r\n",
    "    return partial_sum\n",
    "\n",
    "def slide(prev_fp,prev_char,next_char,b_power,basis=2**16,r=2**32-3):\n",
    "    new_fp=((prev_fp-ord(prev_char)*b_power)*basis+ord(next_char)) % r\n",
    "    return new_fp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Section (a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "def kr_gen(text,length,basis=2**16,r=2**32-3):\n",
    "    fp = fingerprint(text[:length])\n",
    "    yield fp\n",
    "    b_power = pow(basis,length-1,r)\n",
    "    for s in range(1,len(text)-length+1):\n",
    "        fp = slide(fp,text[s-1],text[s-1+length],b_power)\n",
    "        yield fp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6422933"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "7471495"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gen = kr_gen(\"abracadabra\", 3)\n",
    "next(gen)\n",
    "next(gen)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6357433"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "6488452"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "6357389"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "6553988"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "6357390"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "6422933"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "7471495"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "ename": "StopIteration",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mStopIteration\u001b[0m                             Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-28-6fd1aa460113>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      6\u001b[0m \u001b[0mnext\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      7\u001b[0m \u001b[0mnext\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 8\u001b[1;33m \u001b[0mnext\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      9\u001b[0m \u001b[0mnext\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m     10\u001b[0m \u001b[0mnext\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mgen\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mStopIteration\u001b[0m: "
     ]
    }
   ],
   "source": [
    "next(gen)\n",
    "next(gen)\n",
    "next(gen)\n",
    "next(gen)\n",
    "next(gen)\n",
    "next(gen)\n",
    "next(gen)\n",
    "next(gen)\n",
    "next(gen)\n",
    "next(gen)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Section (b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_shared_substrings(text1,text2,length):\n",
    "    g1 = kr_gen(text1, length)\n",
    "    i1 = 0\n",
    "    for fp1 in g1:\n",
    "        g2 = kr_gen(text2, length)\n",
    "        i2 = 0\n",
    "        for fp2 in g2:\n",
    "            if fp1 == fp2:\n",
    "                yield(i1, i2)\n",
    "            i2 += 1\n",
    "        i1 += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2, 1)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "text/plain": [
       "(3, 2)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "g = generate_shared_substrings(\"abcdef\",\"xcdefx\",3)\n",
    "next(g)\n",
    "next(g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "ename": "StopIteration",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mStopIteration\u001b[0m                             Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-31-e734f8aca5ac>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mnext\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mg\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mStopIteration\u001b[0m: "
     ]
    }
   ],
   "source": [
    "next(g)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}