{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python basics: Expressions and strings\n", "\n", "By [Allison Parrish](http://www.decontextualize.com/)\n", "\n", "In this tutorial, I introduce the basics of how to use Python to process text, starting with the concept of expressions and evaluation. I go into particular detail on Python's string manipulation functions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A note on Python versions\n", "\n", "There are two main \"branches\" of Python in current use: Python 2 and Python 3. Both of these branches have their own versions: the latest version of Python 2 (as of this writing) is Python 2.7.x, and the latest version of Python 3 is Python 3.7.x. The branches and versions all have slightly different capabilities, and their syntax and structure are slightly different. Python 2.7.x still has a larger number of users overall, and many new projects continue to support it. But most data scientists and data journalists using Python today use the newer version, and following their lead, we'll be using Python 3.6 or later in this course (specifically, the version included with the latest version of [Anaconda](https://www.anaconda.com/download/)). \n", "\n", "(The main reason you need to know this information is that you should be careful when looking up Python information on the Internet---make sure whatever tutorial you're looking at is about Python 3, not Python 2.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Expressions and evaluation\n", "\n", "Let's start with a very high-level description of how computer programming works. When you're writing a computer program, you're describing to the computer what you want, and then asking the computer to figure that thing out for you. Your description of what you want is called an *expression*. 
The process that the computer uses to turn your expression into whatever that expression means is called *evaluation.*\n", "\n", "Think of a science fiction movie where a character asks the computer, out loud, \"What's the square root of nine billion?\" or \"How many people older than 50 live in Paris, France?\" Those are examples of expressions. The process that the computer uses to transform those expressions into a response is evaluation.\n", "\n", "When the process of evaluation is complete, you're left with a single \"value\". Think of it schematically like so:\n", "\n", "![Expression -> Evaluation -> Value](http://static.decontextualize.com/snaps/expressiondiagram.png)\n", "\n", "What makes computer programs powerful is that they make it possible to write very precise and sophisticated expressions. And importantly, you can embed the results of evaluating one expression inside of another expression, or save the results of evaluating an expression for later in your program.\n", "\n", "Unfortunately, computers can't understand and intuit your desires simply from a verbal description. That's why we need computer programming languages: to give us a way to write expressions in a way that the computer can understand. Because programming languages are designed to be precise, they can also be persnickety (and frustrating). And every programming language is different. It's tricky, but worth it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arithmetic expressions\n", "\n", "Let's start with simple arithmetic expressions. The way that you write arithmetic expressions in Python is very similar to the way that you write arithmetic expressions in, say, grade school arithmetic, or algebra. In the example below, `1 + 5` is the expression. You can tell Python to evaluate the expression and display its value simply by typing the expression into a new notebook cell and pressing CTRL+ENTER." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "6" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "1 + 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arithmetic expressions in Python can be much more sophisticated than this, of course. We won't go over all of the details right now, but one thing you should know immediately is that Python arithmetic operations are evaluated using the typical order of operations, which you can override with parentheses:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "34" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "4 + 5 * 6" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "54" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(4 + 5) * 6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can write arithmetic expressions with or without spaces between the numbers and the operators (but usually it's considered better style to include spaces):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "60" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "10+20+30" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Expressions in Python can also be very simple. 
In fact, a number on its own is its own expression, which Python evaluates to that number itself:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "19" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "19" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you write an expression that Python doesn't understand, then you'll get an error. Here's what that looks like:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m + 20 19\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "+ 20 19" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Expressions of inequality\n", "\n", "You can also ask Python whether two expressions evaluate to the same value, or if one expression evaluates to a value greater than another expression, using similarly familiar syntax. When evaluating such expressions, Python will return one of two special values: either `True` or `False`.\n", "\n", "The `==` operator compares the expression on its left side to the expression on its right side. It evaluates to `True` if the values are equal, and `False` if they're not equal." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "3 * 5 == 9 + 6" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "20 == 7 * 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `<` operator compares the expression on its left side to the expression on its right side, evaluating to `True` if the left-side expression is less than the right-side expression, `False` otherwise. The `>` operator does the same thing, except checking to see if the left-side expression is greater than the right-side expression:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "17 < 18" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "17 > 18" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `>=` and `<=` operators translate to \"greater than or equal to\" and \"less than or equal to,\" respectively:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "22 >= 22" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "22 <= 22" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure to get the order of the angle bracket and the 
equal sign right!" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 22 =< 22\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "22 =< 22" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variables\n", "\n", "You can save the result of evaluating an expression for later using the `=` operator (called the \"assignment operator\"). On the left-hand side of the `=`, write a word that you'd like to use to refer to the value of the expression, and on the right-hand side, write the expression itself. After you've assigned a value like this, whenever you include that word in your Python code, Python will evaluate the word and replace it with the value you assigned to it earlier. Like so:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "x = (4 + 5) * 6" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "54" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Notice that the line `x = (4 + 5) * 6` didn't cause Python to display anything. That's because an assignment in Python isn't an expression, it's a \"statement\"---we'll discuss the difference later.)\n", "\n", "Now, whenever you use the variable `x` in your program, it \"stands in\" for the result of the expression that you assigned to it." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "9.0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x / 6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create as many variables as you want!" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "another_variable = (x + 2) * 4" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "224" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "another_variable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Variable names can contain letters, numbers and underscores, but must begin with a letter or underscore. There are other, more technical constraints on variable names; you can review them [here](http://en.wikibooks.org/wiki/Think_Python/Variables,_expressions_and_statements#Variable_names_and_keywords).\n", "\n", "If you attempt to use the name of a variable that you haven't defined in the notebook, Python will raise an error:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'voldemort' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mvoldemort\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'voldemort' is not defined" ] } ], "source": [ "voldemort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you assign a value to a variable, and then assign a value to it again, the previous value of the variable will be overwritten:" ] }, { 
"cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "x = 15" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "x = 42" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "42" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fact that variables can be overwritten with new values can be helpful in some contexts (e.g., if you're writing a program and you're using the variable to keep track of some value that changes over time). But it can also be annoying if you use the same variable name twice on accident and overwrite values in one part of your program that another part of your program is using the same variable name to keep track of!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Types\n", "\n", "Another important thing to know is that when Python evaluates an expression, it assigns the result to a \"type.\" A type is a description of what kind of thing a value is, and Python uses that information to determine later what you can do with that value, and what kinds of expressions that value can be used in. 
You can ask Python what type it thinks a particular expression evaluates to, or what type a particular value is, using the `type()` function:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(100 + 1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The word `int` stands for \"integer.\" (\"Integers\" are numbers that don't have a fractional component, such as -2, -1, 0, 1, 2, and so on.) Python has many, many other types, and lots of (sometimes arcane) rules for how those types interact with each other when used in the same expression. For example, you can create a floating point type (i.e., a number with a decimal point in it) by writing a number with a decimal point in it:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(3.14)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Interestingly, the result of adding a floating-point number and an integer number together is always a floating point number:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(3.14 + 17)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... 
and the result of dividing one integer by another integer is a floating point number:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(4 / 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Throwing an expression into the `type()` function is a good way to know whether or not the value you're working with is the value you were expecting to work with. We'll use it for debugging some example code later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Strings\n", "\n", "Another type of value in Python is called a \"string.\" Strings are a way of representing in our computer programs stretches of text: one or more letters in sequential order. To make an expression that evaluates to a string in Python, simply enclose some text inside of quotes and put it into the interactive interpreter:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Suppose there is a pigeon, suppose there is.'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"Suppose there is a pigeon, suppose there is.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Asking Python for the type of a string returns `str`:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(\"Suppose there is a pigeon, suppose there is.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use single quotes or double quotes to enclose strings (I tend to use them interchangeably), as long as the opening quote matches the closing quote:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"'Suppose there is a pigeon, suppose there is.'" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "'Suppose there is a pigeon, suppose there is.'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(When you ask Python to evaluate a string expression, it will display it with single quotes surrounding it.)\n", "\n", "You can assign strings to variables, just like any other value:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "roastbeef = \"Suppose there is a pigeon, suppose there is.\"" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Suppose there is a pigeon, suppose there is.'" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "roastbeef" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In versions of Python previous to Python 3, it could be tedious to use any characters inside of strings that weren't ASCII characters (i.e., the letters, numbers and punctuation used most commonly when writing English). In Python 3, you can easily include whatever characters you want by typing them into the string directly:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "cat_message = \"我爱猫!😻\"" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'我爱猫!😻'" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cat_message" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### \"Escaping\" special characters in strings\n", "\n", "Normally, if there are any characters you want in your string, all you have to do to put them there is type the characters in on your keyboard, or paste in the text that you want from some other source. 
There are some characters, however, that require special treatment and can't be typed into a string directly.\n", "\n", "For example, say you have a double-quoted string. Now, the rule about quoting strings (as outlined above) is that the quoted string begins with a double-quote character and ends with a double-quote character. But what if you want to include a double-quote character INSIDE the string? You might think you could do this:\n", "\n", " \"And then he said, \"I think that's a cool idea,\" and vanished.\"\n", " \n", "But that won't work:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m \"And then he said, \"I think that's a cool idea,\" and vanished.\"\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "\"And then he said, \"I think that's a cool idea,\" and vanished.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It doesn't work because Python interprets the first double-quote it sees after the beginning of the string as the double-quote that marks the end of the string. Then it sees all of the stuff after the string and says, \"okay, the programmer must not be having a good day?\" and displays a syntax error. 
Clearly, we need a way to tell Python \"I want you to interpret this character not with the special meaning it has in Python, but LITERALLY as the thing that I typed.\"\n", "\n", "We can do this exact thing by putting a backslash in front of the characters that we want Python to interpret literally, like so:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'And then he said, \"I think that\\'s a cool idea,\" and vanished.'" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"And then he said, \\\"I think that's a cool idea,\\\" and vanished.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A character indicated in this way is called an \"escape\" character (because you've \"escaped\" from the typical meaning of the character). There are several other useful escape characters to know about:\n", "\n", "* I showed `\\\"` above, but you can also use `\\'` in a single-quoted string.\n", "* Use `\\n` if you want to include a new line in your string.\n", "* Use `\\t` instead of hitting the tab key to put a tab in your string.\n", "* Because `\\` is itself the character used to escape other characters, you need to type `\\\\` if you actually want a backslash in your string.\n", "\n", "### Printing vs. evaluating\n", "\n", "There are two ways to see the result of an expression in the interactive interpreter. 
You can either type the expression directly:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "22" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "7 + 15" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'\\tA \"string\" with escape\\ncharacters.'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"\\tA \\\"string\\\" with escape\\ncharacters.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can \"print\" the expression using the `print()` function by putting the expression inside the parentheses:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "22\n" ] } ], "source": [ "print(7 + 15)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\tA \"string\" with escape\n", "characters.\n" ] } ], "source": [ "print(\"\\tA \\\"string\\\" with escape\\ncharacters.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the `print()` function doesn't make a huge difference when displaying the result of an arithmetic expression. But it *does* make a difference when displaying a string. When you simply type an expression that evaluates to a string in order to display it, without the `print()` function, Python won't \"interpolate\" any special characters in the string. (\"Interpolate\" is a fancy computer programming term that means \"replace symbols in something with whatever those symbols represent.\") The `print()` function, on the other hand, *will* perform the interpolation.\n", "\n", "Typing the expression itself results in Python showing you *exactly* the code you'd need to copy and paste in order to replicate the value. 
Typing the expression into `print()` tells Python to do its best to make the result of the expression look \"nice.\" (The `print()` function also sends the result of the expression to standard output, which will be important to know when we're writing our own Python programs on the command line later on.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Asking questions about strings\n", "\n", "Now that we can get some text into our program, let's talk about some of the ways Python allows us to do interesting things with that text.\n", "\n", "Let's talk about the `len()` function first. If you take an expression that evaluates to a string and put it inside the parentheses of `len()`, you get an integer value that indicates how long the string is. Like so:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "44" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(\"Suppose there is a pigeon, suppose there is.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The value that `len()` evaluates to can itself be used in other expressions (just like any other value!):" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "16" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(\"Camembert\") + len(\"Cheddar\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next up: the `in` operator, which lets us check to see if a particular string is found inside of another string." 
] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"foo\" in \"buffoon\"" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"foo\" in \"reginald\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `in` operator takes one expression evaluating to a string on the left and another on the right, and returns `True` if the string on the left occurs somewhere inside of the string on the right.\n", "\n", "We can check to see if a string begins with or ends with another string using that string's `.startswith()` and `.endswith()` methods, respectively:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"foodie\".startswith(\"foo\")" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"foodie\".endswith(\"foo\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.isdigit()` method returns `True` if the string isn't empty and every character in it is a digit, and `False` otherwise:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"foodie\".isdigit()" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"\"4567\".isdigit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.isdigit()` method (along with many of the other methods discussed in this section) works not just for ASCII characters but generally across Unicode. For example, it returns `True` for a full-width digit:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"7\".isdigit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the `.islower()` and `.isupper()` methods return `True` if the string is in all lower case or all upper case, respectively (and `False` otherwise)." ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"foodie\".islower()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"foodie\".isupper()" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"YELLING ON THE INTERNET\".islower()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"YELLING ON THE INTERNET\".isupper()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `in` operator discussed above will tell us if a substring occurs in some other string. If we want to know *where* that substring occurs, we can use the `.find()` method. 
The `.find()` method takes a single parameter between its parentheses: an expression evaluating to a string, which will be searched for within the string whose `.find()` method was called. If the substring is found, the entire expression will evaluate to the index at which the substring is found. If the substring is not found, the expression evaluates to `-1`. To demonstrate:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"Now is the winter of our discontent\".find(\"win\")" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-1" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"Now is the winter of our discontent\".find(\"lose\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.count()` method will return the number of times a particular substring is found within the larger string:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"I got rhythm, I got music, I got my man, who could ask for anything more\".count(\"I got\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, remember the `==` operator that we discussed earlier? 
You can use that in Python to check to see if two strings contain the same characters in the same order:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"pants\" == \"pants\"" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"pants\" == \"trousers\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simple string transformations\n", "\n", "Python strings have a number of different methods which, when called on a string, return a copy of that string with a simple transformation applied to it. These are helpful for normalizing and cleaning up data, or preparing it to be displayed.\n", "\n", "Let's start with `.lower()`, which evaluates to a copy of the string in all lower case:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'argumentation! disagreement! strife!'" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"ARGUMENTATION! DISAGREEMENT! STRIFE!\".lower()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The converse of `.lower()` is `.upper()`:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'E.E. CUMMINGS IS. NOT. HAPPY ABOUT THIS.'" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"e.e. cummings is. not. 
happy about this.\".upper()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method `.title()` evaluates to a copy of the string it's called on, replacing every letter at the beginning of a word in the string with a capital letter:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Dr. Strangelove, Or, How I Learned To Love The Bomb'" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"dr. strangelove, or, how I learned to love the bomb\".title()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.strip()` method removes any whitespace from the beginning or end of the string (but not between characters later in the string):" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'got some random whitespace in some places here'" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\" got some random whitespace in some places here \".strip()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, the `.replace()` method takes two parameters: a string to find, and a string to replace that string with whenever it's found. You can use this to make sad stories." 
] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'I used to have rhythm, I used to have music, I used to have my man, who could ask for anything more'" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"I got rhythm, I got music, I got my man, who could ask for anything more\".replace(\"I got\", \"I used to have\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.replace()` method works with non-ASCII characters as well, of course:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'我爱狗!'" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"我爱猫!\".replace(\"猫\", \"狗\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reading in the contents of a file as a string\n", "\n", "So far we've just been typing our strings directly into the interactive interpreter by writing *string literals* (i.e., characters in between quotation marks). This is nice but for larger chunks of text it's desirable to be able to read files from your file system directly. Fortunately, Python makes it easy to do this! The code below will read the contents of the file `sea_rose.txt` into a variable called `text`:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "scrolled": true }, "outputs": [], "source": [ "text = open(\"sea_rose.txt\").read()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can change the name of the variable to whatever you want, of course, and you can choose a different file name as well. Once the text is loaded, it's just a regular string, and you can do whatever you want with it! 
You could just print it out:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rose, harsh rose, \n", "marred and with stint of petals, \n", "meagre flower, thin, \n", "spare of leaf,\n", "\n", "more precious \n", "than a wet rose \n", "single on a stem -- \n", "you are caught in the drift.\n", "\n", "Stunted, with small leaf, \n", "you are flung on the sand, \n", "you are lifted \n", "in the crisp sand \n", "that drives in the wind.\n", "\n", "Can the spice-rose \n", "drip such acrid fragrance \n", "hardened in a leaf?\n", "\n" ] } ], "source": [ "print(text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can ask questions about it:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text.count(\"you\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can transform it:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rose, haaaarsh rose, \n", "maaaarred aaaand with stint of petaaaals, \n", "meaaaagre flower, thin, \n", "spaaaare of leaaaaf,\n", "\n", "more precious \n", "thaaaan aaaa wet rose \n", "single on aaaa stem -- \n", "you aaaare caaaaught in the drift.\n", "\n", "Stunted, with smaaaall leaaaaf, \n", "you aaaare flung on the saaaand, \n", "you aaaare lifted \n", "in the crisp saaaand \n", "thaaaat drives in the wind.\n", "\n", "Caaaan the spice-rose \n", "drip such aaaacrid fraaaagraaaance \n", "haaaardened in aaaa leaaaaf?\n", "\n" ] } ], "source": [ "print(text.replace(\"a\", \"aaaa\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some caveats:\n", "\n", "* The file you specify must be located in the same directory as the interactive interpreter.\n", "* 
The file needs to be in *plain text* format. [More information on plain text](http://air.decontextualize.com/plain-text/)\n", "* The file needs to be in either ASCII or UTF-8 encoding. (We'll talk more about encodings later, but if the text you want to work with isn't in UTF-8 format, most text editors will allow you to modify the encoding of a file when you save it.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Functions and methods\n", "\n", "Okay, we're getting somewhere together! But I've still been using a lot of jargon when explaining this stuff. One thing that might confuse you: what's a \"function\" and what's a \"method\"?\n", "\n", "We've talked about two \"functions\" so far: `len()` and `type()`. A function is a special word that you can use in Python expressions that runs some pre-defined code: you put your expression inside the parentheses, and Python sends the result of evaluating that expression to the code in the function. That code operates on the value that you gave it, and then itself evaluates to another value. Using a function in this way is usually called \"calling\" it or \"invoking\" it. The stuff that you put inside the parentheses is called a \"parameter\" or \"argument\"; the value that the function gives back is called its \"return value.\"\n", "\n", "![Function diagram](http://static.decontextualize.com/snaps/functiondiagram.png)\n", "\n", "The `len()` and `type()` functions are two of what are called \"built-in functions,\" i.e. functions that come with Python and are available whenever you're writing Python code. In Python, built-in functions tend to be able to take many different types of value as parameters. ([There are a lot of other built-in functions](https://docs.python.org/3/library/functions.html), not just `len()` and `type()`! We'll discuss them as the need arises.)\n", "\n", "> NOTE: You can also write your own functions---we'll learn how to do this later in the class. 
Writing functions is a good way to avoid repetition in your code and to compartmentalize it.\n", "\n", "\"Methods\" work a lot like functions, except in how they look when you use them. Instead of putting the expression that you want to use them with inside the parentheses, you put the call to the method directly AFTER the expression that you want to call it on, following a period (`.`). Methods, unlike built-in functions, are usually only valid for one type of value; e.g., values of the string type have a `.strip()` method, but integer values don't.\n", "\n", "It's important to remember that methods can be called both on an expression that evaluates to a particular value AND on a variable that contains that value. So you can do this:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"hello\".find('e')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...and this:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = \"hello\"\n", "s.find('e')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting help in the interactive interpreter\n", "\n", "The interactive interpreter has all kinds of nuggets to help you program in Python. The first one worth mentioning is the `help()` function. 
Pass any function or method as a parameter to `help()` and you'll get a handy description of the method or function and what it does:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function len in module builtins:\n", "\n", "len(obj, /)\n", " Return the number of items in a container.\n", "\n" ] } ], "source": [ ">>> help(len)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember above when we were talking about how certain types of value have certain \"methods\" that you can only use with that type of value? Sometimes it's helpful to be reminded of exactly which methods an object supports. You can find this out right in the interactive interpreter without having to look it up in the documentation using the `dir()` built-in function. Just pass the value that you want to know more about to `dir()`:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['__add__',\n", " '__class__',\n", " '__contains__',\n", " '__delattr__',\n", " '__dir__',\n", " '__doc__',\n", " '__eq__',\n", " '__format__',\n", " '__ge__',\n", " '__getattribute__',\n", " '__getitem__',\n", " '__getnewargs__',\n", " '__gt__',\n", " '__hash__',\n", " '__init__',\n", " '__init_subclass__',\n", " '__iter__',\n", " '__le__',\n", " '__len__',\n", " '__lt__',\n", " '__mod__',\n", " '__mul__',\n", " '__ne__',\n", " '__new__',\n", " '__reduce__',\n", " '__reduce_ex__',\n", " '__repr__',\n", " '__rmod__',\n", " '__rmul__',\n", " '__setattr__',\n", " '__sizeof__',\n", " '__str__',\n", " '__subclasshook__',\n", " 'capitalize',\n", " 'casefold',\n", " 'center',\n", " 'count',\n", " 'encode',\n", " 'endswith',\n", " 'expandtabs',\n", " 'find',\n", " 'format',\n", " 'format_map',\n", " 'index',\n", " 'isalnum',\n", " 'isalpha',\n", " 'isdecimal',\n", " 'isdigit',\n", " 'isidentifier',\n", " 'islower',\n", " 'isnumeric',\n", " 
'isprintable',\n", " 'isspace',\n", " 'istitle',\n", " 'isupper',\n", " 'join',\n", " 'ljust',\n", " 'lower',\n", " 'lstrip',\n", " 'maketrans',\n", " 'partition',\n", " 'replace',\n", " 'rfind',\n", " 'rindex',\n", " 'rjust',\n", " 'rpartition',\n", " 'rsplit',\n", " 'rstrip',\n", " 'split',\n", " 'splitlines',\n", " 'startswith',\n", " 'strip',\n", " 'swapcase',\n", " 'title',\n", " 'translate',\n", " 'upper',\n", " 'zfill']" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ ">>> dir(\"hello\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a list of all of the methods that the string type supports. (Ignore anything that begins with two underscores (`__`) for now---those are special weird built-in methods that aren't very useful to call on their own.) If you want to know more about one method in particular, you can type this (note again that you need to NOT include the parentheses after the method):" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on built-in function swapcase:\n", "\n", "swapcase(...) method of builtins.str instance\n", " S.swapcase() -> str\n", " \n", " Return a copy of S with uppercase characters converted to lowercase\n", " and vice versa.\n", "\n" ] } ], "source": [ "help(\"hello\".swapcase)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hey awesome! We've learned something about another string method. Let's try this method out:" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'nEW yORK uNIVERSITY'" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"New York University\".swapcase()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> EXERCISE: Use `dir()` and `help()` to find and research a string method that isn't mentioned in the notes. 
Then write an expression using that method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## String indexing\n", "\n", "Python has some powerful language constructions that allow you to access parts of the string by their numerical position in the string. You can get an individual character of a string by putting square brackets (`[]`) right after an expression that evaluates to a string, and putting inside the square brackets the number that represents which character you want. Here's an example:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'n'" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"bungalow\"[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also do this with variables that contain string values, of course:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'n'" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message = \"bungalow\"\n", "message[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we were to say this expression out loud, it might read, \"I have a string, consisting of the characters `b`, `u`, `n`, `g`, `a`, `l`, `o` and `w`, in that order. Give me back the second item in that string.\" Python evaluates that expression to `n`, which is indeed the second letter in the word \"bungalow.\"\n", "\n", "### The second letter? Am I seeing things. \"u\" is clearly the second letter.\n", "\n", "You're right---good catch. But for reasons too complicated to go into here, Python (along with many other programming languages!) starts counting at 0, instead of 1. So what looks like the third letter of the string to human eyes is actually the second letter to Python. 
The first letter of the string is accessed using index 0, like so:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'b'" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The way I like to conceptualize this is to think of string indexes not as specifying the number of the character you want, but instead specifying how \"far away\" from the beginning of the string to look for that character.\n", "\n", "If you attempt to use a value for the index of a string that is beyond the end of the string (i.e., the value you use is higher than the last index in the string), Python gives you an error:" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "string index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmessage\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m17\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: string index out of range" ] } ], "source": [ "message[17]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An individual character from a string still has the same type as the string it came from:" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(message[3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And, of course, a string containing an individual character has a length of 1:" ] }, { "cell_type": "code", "execution_count": 
90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(message[3])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Indexes can be expressions too\n", "\n", "The thing that goes inside of the index brackets doesn't have to be a number that you've just typed in there. Any Python expression that evaluates to an integer can go in there." ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "'o'" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[2 * 3]" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'g'" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 3\n", "message[x]" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'a'" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[message.find(\"a\")]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Negative indexes\n", "\n", "If you use `-1` as the value inside of the brackets, something interesting happens:" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "'w'" ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The expression evaluates to the *last* character in the string. This is essentially the same thing as the following code:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'w'" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[len(message) - 1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... except easier to write. In fact, you can use any negative integer in the brackets, and Python will count that many items from the end of the string, and the expression evaluates to that item." ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'l'" ] }, "execution_count": 96, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[-3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the value in the brackets would \"go past\" the beginning of the list, Python will raise an error:" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "ename": "IndexError", "evalue": "string index out of range", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmessage\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m987\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mIndexError\u001b[0m: string index out of range" ] } ], "source": [ "message[-987]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## String slices\n", 
"\n", "The index bracket syntax explained above allows you to write an expression that evaluates to a character in a string, based on its position in the string. Python also has a powerful way for you to write expressions that return a *section* of a string, starting from a particular index and ending with another index. In Python parlance we'll call this section a *slice*.\n", "\n", "Writing an expression to get a slice of a string looks a lot like writing an expression to get a single character. The difference is that instead of putting one number between square brackets, we put *two* numbers, separated by a colon. The first number tells Python where to begin the slice, and the second number tells Python where to end it." ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ung'" ] }, "execution_count": 98, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[1:4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the value after the colon specifies at which index the slice should end, but the slice does *not* include the value at that index. 
I would translate the expression above as saying \"give me characters one through four of the string in the \"message\" variable, NOT INCLUDING character four.\"\n", "\n", "The fact that slice indexes aren't inclusive means that you can tell how long the slice will be by subtracting the value before the colon from the value after it:" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ung'" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[1:4]" ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 100, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(message[1:4])" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "4 - 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also note that---as always!---any expression that evaluates to an integer can be used for either value in the brackets. 
For example:" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ga'" ] }, "execution_count": 102, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 3\n", "message[x:x+2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, note that the type of a slice is still `str`:" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(message[5:7])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Omitting slice values\n", "\n", "Because it's so common to use the slice syntax to get a string that is either a slice starting at the beginning of the string or a slice ending at the end of the string, Python has a special shortcut. Instead of writing:" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bun'" ] }, "execution_count": 104, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[0:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can leave out the `0` and write this instead:" ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bun'" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Likewise, if you wanted a slice that starts at index 4 and goes to the end of the string, you might write:" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'alow'" ] }, "execution_count": 106, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[4:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Negative index values in slices\n", "\n", "Now for some 
tricky stuff: You can use negative index values in slice brackets as well! For example, to get a slice of a string from the fourth-to-last element of the string up to (but not including) the second-to-last element of the string:" ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'al'" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[-4:-2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(Even with negative slice indexes, the numbers have the property that subtracting the first from the second yields the length of the slice, i.e. `-2 - (-4)` is `2`.)\n", "\n", "To get everything except the last three elements of the string (if you want only the last three, write `message[-3:]` instead, which evaluates to `'low'`):" ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bunga'" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "message[:-3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> EXERCISE: Write an expression, or a series of expressions, that prints out \"Sea Rose\" from the first occurrence of the string `sand` up until the end of the poem. (Hint: Use the `.find()` method, discussed above.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Putting strings together\n", "\n", "Earlier, we discussed how the `+` operator can be used to create an expression that evaluates to the sum of two numbers. E.g.:" ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "109" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "17 + 92" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `+` operator can also be used to create a new string from two other strings. 
This is called \"concatenation\":" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Spiderman'" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"Spider\" + \"man\"" ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Nickel, what is nickel, it is originally rid of a cover.'" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "part1 = \"Nickel, what is nickel, \"\n", "part2 = \"it is originally rid of a cover.\"\n", "part1 + part2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can combine as many strings as you want this way, using the `+` operator multiple times in the same expression:" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'basketball'" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\"bas\" + \"ket\" + \"ball\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> EXERCISE: Write an expression that evaluates to a string containing the first fifty characters of \"Sea Rose\" followed by the last fifty characters of \"Sea Rose.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Strings and numbers\n", "\n", "It's important to remember that a string that contains what looks like a number does *not* behave like an actual integer or floating point number does. 
For example, attempting to subtract one string containing a number from another string containing a number will cause an error to be raised:" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unsupported operand type(s) for -: 'str' and 'str'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;34m\"15\"\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0;34m\"4\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for -: 'str' and 'str'" ] } ], "source": [ "\"15\" - \"4\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The \"unsupported operand type(s)\" error means that you tried to use an operator (in this case `-`) with two types that the operator in question doesn't know how to work with. (Python is saying: \"You asked me to subtract a string from another string. 
That doesn't make sense to me.\")\n", "\n", "Attempting to add an integer or floating-point number to a string that has (what looks like) a number inside of it will raise a similar error:" ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unsupported operand type(s) for +: 'int' and 'str'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;36m16\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m\"8.9\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for +: 'int' and 'str'" ] } ], "source": [ "16 + \"8.9\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fortunately, there are built-in functions whose purpose is to convert from one type to another; notably, you can put a string inside the parentheses of the `int()` and `float()` functions, and it will evaluate to (what Python interprets as) the integer and floating-point values (respectively) of the string: " ] }, { "cell_type": "code", "execution_count": 115, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 115, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(\"17\")" ] }, { "cell_type": "code", "execution_count": 116, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "int(\"17\")" ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "int" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(int(\"17\"))" ] }, { "cell_type": "code", "execution_count": 118, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(\"3.14159\")" ] }, { "cell_type": "code", "execution_count": 119, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.14159" ] }, "execution_count": 119, "metadata": {}, "output_type": "execute_result" } ], "source": [ "float(\"3.14159\")" ] }, { "cell_type": "code", "execution_count": 120, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "float" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(float(\"3.14159\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you give a string to one of these functions that Python can't interpret as an integer or floating-point number, Python will raise an error:" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "ename": "ValueError", "evalue": "invalid literal for int() with base 10: 'shumai'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"shumai\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'shumai'" ] } ], "source": [ "int(\"shumai\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Strings with multiple lines\n", "\n", "Sometimes we want to work with strings that have more than one \"line\" of text in them. 
The problem with this is that Python interprets your pressing \"Enter\" as a sign that you've finished your input, so if you try to copy and paste in some text with newline characters, you'll get an error:" ] }, { "cell_type": "code", "execution_count": 122, "metadata": {}, "outputs": [ { "ename": "SyntaxError", "evalue": "EOL while scanning string literal (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m poem = \"Rose, harsh rose,\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m EOL while scanning string literal\n" ] } ], "source": [ "poem = \"Rose, harsh rose, \n", "marred and with stint of petals, \n", "meagre flower, thin, \n", "spare of leaf,\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(`EOL while scanning string literal` is Python's way of saying \"you hit enter too soon.\" In Python 3.10 and later, the same mistake is reported as `unterminated string literal`.) One way to work around this is to include `\\n` (the newline character) inside the string when we type it into our program:" ] }, { "cell_type": "code", "execution_count": 123, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rose, harsh rose,\n", "marred and with stint of petals,\n", "meagre flower, thin,\n", "spare of leaf,\n" ] } ], "source": [ "poem = \"Rose, harsh rose,\\nmarred and with stint of petals,\\nmeagre flower, thin,\\nspare of leaf,\"\n", "print(poem)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works, but it's kind of inconvenient! A better solution is to use a different way of quoting strings in Python, the triple-quote. 
It looks like this:" ] }, { "cell_type": "code", "execution_count": 124, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Rose, harsh rose, \n", "marred and with stint of petals, \n", "meagre flower, thin, \n", "spare of leaf,\n" ] } ], "source": [ "poem = \"\"\"Rose, harsh rose, \n", "marred and with stint of petals, \n", "meagre flower, thin, \n", "spare of leaf,\"\"\"\n", "print(poem)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you open and close a string with three quotation marks instead of one, Python allows you to put newline characters directly into the string. Nice! We'll be using this for some of the examples below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Exercise: Create a variable called `poem` and assign the text of \"Sea Rose\" to that variable. Use the `len()` function to find out how many characters are in it. Then, use the `count()` method to find out how many times the string `rose` occurs within it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "This section introduces many of the basic building blocks you'll need in order to use computer programs to write poems. 
We've talked about how to use the interactive interpreter, and about expressions and values, and about the distinction between functions and methods; and we've discussed the details of how strings work and how to manipulate them.\n", "\n", "Further reading:\n", "\n", "* From [Think Python](http://www.greenteapress.com/thinkpython/html/index.html): [Variables, expressions and statements](http://greenteapress.com/thinkpython2/html/thinkpython2003.html); [Strings](http://greenteapress.com/thinkpython2/html/thinkpython2009.html).\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }