{ "metadata": { "name": "", "signature": "sha256:4283212f2482f03f8462ac697c38a741854b56f5e9544bd98909248328f7b1f3" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Strings\n", "## Manipulating strings\n", "\n", "A string is a sequence of characters. There are three ways to create strings: \n", "\n", "1. double quotes \n", "2. single quotes\n", "3. the `str()` function\n", "\n", "Here is an example of all three: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "dessert = \"chocolate\"\n", "topping = 'cherries'\n", "\n", "print dessert\n", "print topping\n", "print str(3.1415)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "chocolate\n", "cherries\n", "3.1415\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function `len()` returns the length of a string (its number of characters). " ] }, { "cell_type": "code", "collapsed": false, "input": [ "len(dessert)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "9" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## String operators\n", "The string operators `+` and `*` concatenate and repeat strings, respectively. For example, the first line below combines the two strings `chocolate` and `cake` to make the new string `chocolatecake`. The second line repeats the string `chocolate` three times to make the new string `chocolatechocolatechocolate`." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'chocolate' + 'cake'\n", "print 'chocolate' * 3" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "chocolatecake\n", "chocolatechocolatechocolate\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bracket operator\n", "To access a particular character in a string, use the bracket operator `[]` with an integer index. In Python, indexing starts at 0 (not 1), so the first character of a string has index 0. For example, to extract the first character of the string `chocolate`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dessert[0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "'c'" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "* To extract characters starting from the end of the string, use negative indices (e.g. `dessert[-1]` for the last character, `dessert[-2]` for the second to last character, etc.). \n", "* To extract multiple characters (a segment), use the bracket operator with a colon (`:`), also known as the slice operator. For example, `[m:n]` starts at position m and ends at position n-1 (i.e. the slice extracts up to but not including position n). If n is omitted, the characters from position m to the end of the string are extracted; if m is omitted, the slice starts at the beginning of the string. \n", "* The bracket operator can take a third argument, `[m:n:s]`, where s is the step size between characters. A step size of -1 goes through the word backwards. 
For example, the following returns 'chocolate' reversed: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "dessert[::-1]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "'etalocohc'" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### In operator\n", "To test whether one string is a substring of another, use the boolean operator `in`: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "print 'late' in 'chocolate'\n", "print 'date' in 'chocolate'" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "True\n", "False\n" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `not in` operator tests the opposite condition. Both operators can be used in conditional statements such as `if` / `else` statements." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Relational operators\n", "The operator `==` tests whether two strings are equal. (The `is` operator tests whether two names refer to the same object, so it should not be used to compare string values.) The operators `<` and `>` compare strings in alphabetical order; note that all uppercase letters sort before all lowercase letters. \n", "\n", "## Loop through strings\n", "To traverse all the characters in a string, you can use a `for` or `while` loop. Here we create names modeled on the duckling statues in the Public Garden in downtown Boston: Jack, Kack, Lack, Mack, Nack, Oack, Pack, Qack. (The real ducklings are named Ouack and Quack; this simple loop produces Oack and Qack instead.) 
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "prefixes = 'JKLMNOPQ'\n", "suffix = 'ack'\n", "for letter in prefixes:\n", " print letter + suffix" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Jack\n", "Kack\n", "Lack\n", "Mack\n", "Nack\n", "Oack\n", "Pack\n", "Qack\n" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to compare adjacent letters, you may want to use a `while` loop. For example, if you want to determine if a word is a palindrome: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "def is_palindrome(word):\n", " i = 0\n", " j = len(word) - 1\n", " while i < j:\n", " if word[i] != word[j]:\n", " return False\n", " i = i + 1\n", " j = j - 1\n", " return True\n", "\n", "is_palindrome('tot')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "True" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## String methods\n", "There are a set of methods in Python that take in a string and return a value (similar to a function, but different syntax). The syntax is the name of the string followed by a dot (or period) followed by the name of the method. \n", "\n", "#### List of string methods\n", "* `strip()` = gets rid of the white space in a string\n", "* `upper()` = take in a string and return the string in all upper case letters\n", "* `lower()` = take in a string and return the string in all lower case letters\n", "* `find()` = find all the substrings in a string\n", "\n", "For example, if you want to , use the word return all upper case letters, use the `upper()` method. 
If you want to find the position of the first 'o' in a word, use `find()`: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "print dessert.upper()\n", "print dessert.find('o')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "CHOCOLATE\n", "2\n" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `find()` method can also take a starting position and a stopping position that limit where it searches (remember that indexing starts at 0). \n", "\n", "* `split(delimiter)` = splits a string into a list of substrings. If no argument is provided, it splits on white space; otherwise it splits on the given delimiter." ] }, { "cell_type": "code", "collapsed": false, "input": [ "\"Howdy! How are you today?\".split()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "['Howdy!', 'How', 'are', 'you', 'today?']" ] } ], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `split(delimiter, num)` method takes an optional second argument, the maximum number of splits to perform (useful if you want to search a string for a substring and then work with everything before the substring). \n", "\n", "* `ljust(length)` and `rjust(length)` = pad the string with spaces on the right (`ljust`) or on the left (`rjust`) until it reaches the given length\n", "\n", "* `replace('potato', 'tomato')` = returns a copy of the string in which every occurrence of 'potato' is replaced with 'tomato'. " ] } ], "metadata": {} } ] }