{
 "metadata": {
  "name": "",
  "signature": "sha256:4283212f2482f03f8462ac697c38a741854b56f5e9544bd98909248328f7b1f3"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Strings\n",
      "## Manipulating strings\n",
      "\n",
      "A string is a sequence of characters. There are three ways to create strings:\n",
      "\n",
      "1. double quotes\n",
      "2. single quotes\n",
      "3. the `str()` function, which converts another value (such as a number) to a string\n",
      "\n",
      "Here is an example of all three: "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dessert = \"chocolate\"\n",
      "topping = 'cherries'\n",
      "\n",
      "print dessert\n",
      "print topping\n",
      "print str(3.1415)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "chocolate\n",
        "cherries\n",
        "3.1415\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The function `len()` returns the length of a string, i.e. the number of characters it contains. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "len(dessert)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 9,
       "text": [
        "9"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## String operators\n",
      "The string operators `+` and `*` concatenate and repeat strings. For example, the first line below combines the two strings `chocolate` and `cake` to make the new string `chocolatecake`. The second line repeats the string `chocolate` three times to make the new string `chocolatechocolatechocolate`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print 'chocolate' + 'cake'\n",
      "print 'chocolate' * 3"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "chocolatecake\n",
        "chocolatechocolatechocolate\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Bracket operator\n",
      "To access a particular character in a string, use the bracket operator `[]`. The index must be an integer. In Python, indexing starts at 0 (not 1), so the first character has index 0. For example, to extract the first character of the string `chocolate`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dessert[0]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "'c'"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* To count from the end of the string, use negative indices (e.g. `dessert[-1]` is the last character, `dessert[-2]` is the second to last character, and so on).\n",
      "* To extract multiple characters (a segment), use the bracket operator with a colon (`:`), also known as the slice operator. For example, `[m:n]` starts at position m and ends at position n-1 (i.e. the slice extracts up to but does not include position n). If n is omitted, the slice runs from position m to the end of the string; if m is omitted, it runs from the start of the string up to position n-1.\n",
      "* The slice operator accepts an optional third value, `[m:n:s]`, where s is the step size between characters. A step size of -1 goes through the string backwards. For example, the following returns 'chocolate' reversed: "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dessert[::-1]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "'etalocohc'"
       ]
      }
     ],
     "prompt_number": 12
    },
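    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The indexing and slicing rules above can be checked with a short sketch. The variable `word` and the names on the left are introduced here only for illustration; the value of each expression is shown as a comment.\n",
      "\n",
      "```python\n",
      "word = 'chocolate'\n",
      "\n",
      "last = word[-1]          # 'e', the last character\n",
      "second_last = word[-2]   # 't', the second to last character\n",
      "middle = word[2:5]       # 'oco', positions 2, 3 and 4 (position 5 is excluded)\n",
      "tail = word[5:]          # 'late', from position 5 to the end\n",
      "head = word[:5]          # 'choco', from the start up to position 4\n",
      "every_other = word[::2]  # 'cooae', every second character\n",
      "```\n",
      "\n",
      "Slicing never modifies the original string; each expression returns a new string."
     ]
    },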
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### In operator\n",
      "To test whether one string is a substring of another string, use the boolean operator `in`: "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print 'late' in 'chocolate'\n",
      "print 'date' in 'chocolate'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "True\n",
        "False\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `not in` operator tests the opposite condition: it returns `True` when the first string is not a substring of the second. Both `in` and `not in` can be used in conditional statements such as `if` / `else` statements."
     ]
    },
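    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A minimal sketch of `in` and `not in` inside a conditional is shown below. The variable names `message` and `missing` are illustrative, and `print(...)` is written with parentheses so that the snippet runs under both Python 2 and Python 3.\n",
      "\n",
      "```python\n",
      "dessert = 'chocolate'\n",
      "\n",
      "# `in` used as the condition of an if / else statement\n",
      "if 'late' in dessert:\n",
      "    message = 'found it'\n",
      "else:\n",
      "    message = 'not found'\n",
      "\n",
      "# `not in` is True when the substring is absent\n",
      "missing = 'date' not in dessert\n",
      "\n",
      "print(message)   # found it\n",
      "print(missing)   # True\n",
      "```"
     ]
    },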
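    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Relational operators\n",
      "The operator `==` tests whether two strings are equal. Do not use `is` for this: `is` tests whether two names refer to the same object, not whether two strings have the same contents. The operators `<` and `>` compare strings in lexicographic order based on character codes, so all upper case letters sort before all lower case letters. \n",
      "\n",
      "## Loop through strings\n",
      "To traverse all the characters in a given string, you can use a `for` or `while` loop. Here we build the names of the duckling statues in the Public Garden in downtown Boston: Jack, Kack, Lack, Mack, Nack, Oack, Pack, Qack. (The real ducklings are named Ouack and Quack, so this simple loop misspells two of them.) "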
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "prefixes = 'JKLMNOPQ'\n",
      "suffix = 'ack'\n",
      "for letter in prefixes:\n",
      "    print letter + suffix"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Jack\n",
        "Kack\n",
        "Lack\n",
        "Mack\n",
        "Nack\n",
        "Oack\n",
        "Pack\n",
        "Qack\n"
       ]
      }
     ],
     "prompt_number": 14
    },
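    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The same traversal can be sketched with a `while` loop and an explicit index, the second option mentioned above. The list `names` and the variable `index` are introduced here for illustration only.\n",
      "\n",
      "```python\n",
      "prefixes = 'JKLMNOPQ'\n",
      "suffix = 'ack'\n",
      "\n",
      "names = []\n",
      "index = 0\n",
      "while index < len(prefixes):\n",
      "    # prefixes[index] is the character at the current position\n",
      "    names.append(prefixes[index] + suffix)\n",
      "    index = index + 1\n",
      "\n",
      "print(names)\n",
      "```\n",
      "\n",
      "The `for` loop is shorter when only the characters are needed; the `while` form is useful when the position itself matters."
     ]
    },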
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If you need to track more than one position at a time, such as comparing characters from opposite ends of a word, a `while` loop with explicit indices is convenient. For example, to determine whether a word is a palindrome: "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def is_palindrome(word):\n",
      "    i = 0\n",
      "    j = len(word) - 1\n",
      "    while i < j:\n",
      "        if word[i] != word[j]:\n",
      "            return False\n",
      "        i = i + 1\n",
      "        j = j - 1\n",
      "    return True\n",
      "\n",
      "is_palindrome('tot')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 15
    },
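    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As an alternative sketch, the same check can be written with the slice `[::-1]` introduced earlier: a word is a palindrome if it equals its own reverse. The function name `is_palindrome_slice` is hypothetical and is used only to distinguish it from the version above.\n",
      "\n",
      "```python\n",
      "def is_palindrome_slice(word):\n",
      "    # word[::-1] is the word written backwards\n",
      "    return word == word[::-1]\n",
      "\n",
      "print(is_palindrome_slice('tot'))        # True\n",
      "print(is_palindrome_slice('chocolate'))  # False\n",
      "```\n",
      "\n",
      "The slice version is shorter but builds a reversed copy of the word, whereas the `while` loop can stop at the first mismatch."
     ]
    },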
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## String methods\n",
      "Python provides a set of methods that are called on a string and return a value (similar to a function, but with different syntax). The syntax is the name of the string, followed by a dot (or period), followed by the name of the method and its arguments in parentheses. \n",
      "\n",
      "### List of string methods\n",
      "* `strip()` = returns the string with leading and trailing white space removed\n",
      "* `upper()` = returns the string in all upper case letters\n",
      "* `lower()` = returns the string in all lower case letters\n",
      "* `find()` = returns the index of the first occurrence of a substring, or -1 if the substring is not found\n",
      "\n",
      "For example, to convert a word to all upper case letters, use the `upper()` method. To find the position of the first 'o' in a word, use `find()`: "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print dessert.upper()\n",
      "print dessert.find('o')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CHOCOLATE\n",
        "2\n"
       ]
      }
     ],
     "prompt_number": 16
    },
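    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The other methods in the list, `strip()` and `lower()`, can be sketched the same way, along with the value `find()` returns when there is no match. The string `messy` is an illustrative value, not part of the examples above.\n",
      "\n",
      "```python\n",
      "messy = '  Chocolate Cake  '\n",
      "\n",
      "stripped = messy.strip()      # 'Chocolate Cake' (leading and trailing spaces removed)\n",
      "lowered = stripped.lower()    # 'chocolate cake'\n",
      "first_c = lowered.find('c')   # 0, the index of the first 'c'\n",
      "no_match = lowered.find('z')  # -1, because 'z' does not occur\n",
      "```\n",
      "\n",
      "Each method returns a new string or number and leaves `messy` unchanged."
     ]
    },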
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `find()` method can also take an optional starting position and stopping position that limit where it searches (remember that indexing starts at 0, and the stopping position itself is not searched). \n",
      "\n",
      "* `split(delimiter)` = splits a string into a list of substrings. If no argument is provided, it splits on white space. If a delimiter is provided as an argument, it splits at each occurrence of that delimiter."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\"Howdy! How are you today?\".split()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "['Howdy!', 'How', 'are', 'you', 'today?']"
       ]
      }
     ],
     "prompt_number": 17
    },
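    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The optional arguments to `find()` and `split()` described above can be sketched as follows. The string `menu` and the comma delimiter are illustrative choices.\n",
      "\n",
      "```python\n",
      "dessert = 'chocolate'\n",
      "menu = 'cake,pie,ice cream'\n",
      "\n",
      "second_o = dessert.find('o', 3)         # 4: the search starts at index 3\n",
      "not_in_range = dessert.find('o', 5, 9)  # -1: no 'o' from index 5 up to index 8\n",
      "items = menu.split(',')                 # ['cake', 'pie', 'ice cream']\n",
      "```\n",
      "\n",
      "Note that `find()` still reports positions relative to the start of the whole string, not relative to the starting position."
     ]
    },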
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `split(delimiter, num)` method can take an optional second argument, which is the maximum number of splits to perform (useful if you want to search a string for a substring and then work with everything before the substring). \n",
      "\n",
      "* `ljust(length)` and `rjust(length)` = pad the string with spaces up to the given length. `ljust()` left-justifies the string (padding is added on the right side) and `rjust()` right-justifies it (padding is added on the left side).\n",
      "\n",
      "* `replace('potato', 'tomato')` = returns a copy of the string in which every occurrence of 'potato' is replaced with 'tomato'.  "
     ]
    }
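    ,
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A minimal sketch of the methods above is shown below. The strings `label` and `recipe` are illustrative values, and the value of each expression is shown as a comment.\n",
      "\n",
      "```python\n",
      "label = 'pie'\n",
      "\n",
      "left = label.ljust(6)    # 'pie   ' (padded on the right to length 6)\n",
      "right = label.rjust(6)   # '   pie' (padded on the left to length 6)\n",
      "\n",
      "recipe = 'potato soup with potato bread'\n",
      "swapped = recipe.replace('potato', 'tomato')  # 'tomato soup with tomato bread'\n",
      "\n",
      "# Split at most once, then keep everything before the first ' with '\n",
      "before = recipe.split(' with ', 1)[0]         # 'potato soup'\n",
      "```"
     ]
    }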
   ],
   "metadata": {}
  }
 ]
}