{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Using Recursive Bisection Search to Detect Characters in Strings\n", "\n", "*January 30, 2016*\n", "\n", "*An article about a mini-assignment from the edX course [\"Introduction to Computer Science and Programming Using Python\"](https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x-6).*\n", "\n", "
\n", "\n", "The problem is as follows:\n", "\n", ">Given a character and a string of alphabetized characters, write a program that returns “True” if the character is present in the string and “False” if it is not.\n", "\n", "The assignment gives you a few hints and specifications (naturally you cannot use Python’s 'in' statement).\n", "\n", "My first attempt failed horribly. It started well, I could solve the test case (find 'a' in 'abc'), but I did not sufficiently understand what the program was doing. If something didn’t work, I’d quickly determine a possible cause and apply a patch. Eventually the code turned into spaghetti; it looked bad and could not handle the inputs it should for unknown reasons.\n", "\n", "Still, I learned more about the problem and what I had to pay attention to. I realized that there were three base cases, not one. I also realized that I did not know how to slice strings.\n", "\n", "Attempt #2 was more serious. I broke out pencil and paper and methodically solved two cases: find 'k' in 'hkqruv' and find 'u' in 'hkqruv'. Why these two? The first requires bisecting left, while the second requires bisecting right.\n", "\n", "## The First Case\n", "\n", "I have the string 'hkqruv' (aStr) and want to find 'k'. Since the string is alphabetized, bisection search is possible. We can take the length of the string (6) and divide it by 2 to get the index position of our first guess.\n", "\n", "```python\n", "aStrLength = len(aStr)\n", "guessIndex = aStrLength // 2\n", "guessChar = aStr[guessIndex]\n", "```\n", "\n", "We can test if this character equals the character we are searching for (k) and whether it is higher or lower in the alphabet than 'k'. Based on the result of our tests, the program will either return True or carry out a left or right bisection.\n", "\n", "![https://i.imgur.com/Qy9Kdkr.png](https://i.imgur.com/Qy9Kdkr.png)\n", "\n", "In the above graphic, we see our guess character in bold surrounded by two intervals/bisections. Since we are looking for 'k' and know that r > k, we should look in the left bisection.\n", "\n", "```python\n", "# search the left bisection\n", "elif guessStringValue > char:\n", " section = aStr[0:guessIndexValue]\n", " return isIn(char, section)\n", "```\n", "\n", "And the above code does exactly that. If our guessCharacter is greater than the character we are searching for, we slice up the input string ('hkqruv') so that it matches the interval we want to search ('hkq') and then call the function again.\n", "\n", "## The Second Case\n", "\n", "The second case is very similar:\n", "\n", "```python\n", "# search the right bisection\n", "elif guessStringValue < char:\n", " section = aStr[guessIndexValue+1:]\n", " return isIn(char, section)\n", "```\n", "\n", "The whole process repeats itself until reaching one of three base cases:\n", "\n", "1. Input string is an empty string (return False)\n", "2. Input string is length 1 (return False; no more bisections can be made)\n", "3. Our guessCharacter matches the input character (return True)\n", "\n", "In the future, I should ensure I understand how to solve the problem computationally prior to implementing it and to test incrementally. There were a few times where I moved forward on bad assumptions. For instance, I thought:\n", "\n", "```python\n", "inputString[guessIndexValue:]\n", "```\n", "\n", "Would output \"bc\", when it outputted \"abc\". What I actually wanted was:\n", "\n", "```python\n", "inputString[guessIndexValue+1:]\n", "```\n", "\n", "If I had taken a minute to think about it or confirm how slicing works, much frustration would have been averted.\n", "\n", "## The Finished Product" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def isIn(char, aStr):\n", " '''\n", " char: a single character\n", " aStr: an alphabetized string\n", "\n", " returns: True if char is in aStr; False otherwise\n", " '''\n", " aStrLength = len(aStr)\n", "\n", " # returns False when char not found or empty string is entered\n", " if aStr == \"\" or aStrLength == 0:\n", " return False\n", "\n", " else:\n", " guessIndexValue = aStrLength // 2\n", " guessStringValue = aStr[guessIndexValue]\n", "\n", " if guessStringValue == char:\n", " return True\n", "\n", " # search the left bisection\n", " elif guessStringValue > char:\n", " section = aStr[0:guessIndexValue]\n", " return isIn(char, section)\n", "\n", " # search the right bisection\n", " elif guessStringValue < char:\n", " section = aStr[guessIndexValue+1:]\n", " return isIn(char, section)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isIn(\"k\", \"hkqruv\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isIn(\"z\", \"hkqruv\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }