{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# List, Set, and Dictionary Comprehensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our prior session we discussed a variety of loop patterns. \n", "\n", "One of the most common patterns that we encounter in practice is the need to iterate through a list of values, transform the elements of the list using some operations, filter out the results, and return back a new list of values.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's examine again our example with the NBA teams and franchise names:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nba_teams = [\n", " \"Atlanta Hawks\", \"Boston Celtics\", \"Brooklyn Nets\", \"Charlotte Hornets\",\n", " \"Chicago Bulls\", \"Cleveland Cavaliers\", \"Dallas Mavericks\",\n", " \"Denver Nuggets\", \"Detroit Pistons\", \"Golden State Warriors\",\n", " \"Houston Rockets\", \"Indiana Pacers\", \"LA Clippers\", \"Los Angeles Lakers\",\n", " \"Memphis Grizzlies\", \"Miami Heat\", \"Milwaukee Bucks\",\n", " \"Minnesota Timberwolves\", \"New Orleans Pelicans\", \"New York Knicks\",\n", " \"Oklahoma City Thunder\", \"Orlando Magic\", \"Philadelphia 76ers\",\n", " \"Phoenix Suns\", \"Portland Trail Blazers\", \"Sacramento Kings\",\n", " \"San Antonio Spurs\", \"Toronto Raptors\", \"Utah Jazz\", \"Washington Wizards\"\n", "]\n", "print(\"The list contains\", len(nba_teams), \"teams\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "franchise_names = [] # We create an empty list\n", "for team in nba_teams: # We iterate over all elements of the list\n", " # Do some operation on the list element \"team\"\n", " # and get back the result \"franchise\"\n", " franchise = team.split()[-1] \n", " # Append the \"franchise\" element in the list that we created before the loop\n", " franchise_names.append(franchise)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And below we re-write the code above as a **list comprehension**. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "franchise_names = [ team.split()[-1] for team in nba_teams ] " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In other words, list comprehensions give us the ability to write a very common loop pattern as a one-liner. However, it is not just about brevity; when we see code that uses a list comprehension we understand quickly that the code is processing one list to create another, and the various elements are together in a very specific order. Such a clarity is not guaranteed with a loop, as loops may have many uses.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Defining List Comprehensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The syntax of list comprehensions is based on the way mathematicians define sets and lists, a syntax that leaves it clear what the contents should be. \n", "\n", "For example `S` is a set of the square of all integer numbers from 0 to 9. In math notation, we write:\n", "\n", "+ `S = {x² : x in {0 ... 9}}`\n", "\n", "Python's list comprehensions give a very natural way to write statements just like these. It may look strange early on, but it becomes a very natural and concise way of creating lists, without having to write for-loops.\n", "\n", "Let's see again the comparison with for loops:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# This code below will create a list with the squares\n", "# of the numbers from 0 to 9 \n", "S = [] # we create an empty list\n", "for i in range(10): # We iterate over all numbers from 0 to 9\n", " S.append(i*i) # We add in the list the square of the number i\n", "print(S )# we print(the list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "S = [i*i for i in range(10)]\n", "print(S)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's do one more example. The `V` is the powers of 2 from $2^0$ until $2^{12}$:\n", "\n", "+ `V = (1, 2, 4, 8, ..., 2¹²)`\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "V=[] # Create a list\n", "for i in range(13): # Change i to be from 0 to 12\n", " V.append(2**i) # Add 2**i in the new list\n", "print(V)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# And rewritten as a list comprehension:\n", "V = [2**i for i in range(13)]\n", "print(V)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again notice the structure:\n", "```python\n", "newlist = []\n", "for i in somelist:\n", " x = do_something_with(i)\n", " newlist.append(x)\n", "```\n", "gets rewritten as\n", "```python\n", "newlist = [do_something_with(i) for i in somelist]\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The *if* statement within a list comprehension\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's consider the following case. We want to process the list of NBA teams, and keep in a list the teams that have a franchise name that contains a given substring. \n", "\n", "In the example below, we will try to find all the teams that start with the letter `B`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nba_teams = [\n", " \"Atlanta Hawks\", \"Boston Celtics\", \"Brooklyn Nets\", \"Charlotte Hornets\",\n", " \"Chicago Bulls\", \"Cleveland Cavaliers\", \"Dallas Mavericks\",\n", " \"Denver Nuggets\", \"Detroit Pistons\", \"Golden State Warriors\",\n", " \"Houston Rockets\", \"Indiana Pacers\", \"LA Clippers\", \"Los Angeles Lakers\",\n", " \"Memphis Grizzlies\", \"Miami Heat\", \"Milwaukee Bucks\",\n", " \"Minnesota Timberwolves\", \"New Orleans Pelicans\", \"New York Knicks\",\n", " \"Oklahoma City Thunder\", \"Orlando Magic\", \"Philadelphia 76ers\",\n", " \"Phoenix Suns\", \"Portland Trail Blazers\", \"Sacramento Kings\",\n", " \"San Antonio Spurs\", \"Toronto Raptors\", \"Utah Jazz\", \"Washington Wizards\"\n", "]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "franchise_names = []\n", "look_for = 'B' #looking\n", "for team in nba_teams:\n", " franchise = team.split()[-1]\n", " if franchise.startswith(look_for):\n", " franchise_names.append(franchise)\n", "print(franchise_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This pattern, where we do not add *all* the elements in the resulting list is also very common. List comprehensions allow such patterns to be also expressed as list comprehensions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "look_for = 'B'\n", "franchise_names = [team.split()[-1] for team in nba_teams if team.split()[-1].startswith(look_for)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(franchise_names)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Alternatively, you can even break the lines within a comprehension\n", "# This may help with readability\n", "franchise_names = [team.split()[-1] \n", " for team in nba_teams \n", " if team.split()[-1].startswith(look_for)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(franchise_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is another example, with a list comprehension. We have `S` is a set of the square of all integer numbers from 0 to 9, and we define `M` to be all the elements in `S` that are even. In math notation:\n", "\n", "+ `S = {x² : x in {0 ... 9}}`\n", "+ `M = {x | x in S and x even}`\n", "\n", "Now let's write the above as list comprehensions. **Note the list comprehension for deriving M uses a \"if statement\" to filter out those values that aren't of interest**, restricting to only the even squares." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "S = [i*i for i in range(10)]\n", "print(S)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M = []\n", "for i in S: # iterate through all elements in S\n", " if i%2 == 0: # if i is an event number\n", " M.append(i) # ..add it to the list\n", "print(M)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "M = [x for x in S if x%2 == 0]\n", "print(M)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are simple examples, using numerical compuation. Let's see a more \"practical\" use: In the following operation we transform a string into an list of values, a more complex operation: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sentence = 'The quick brown fox jumps over the lazy dog'\n", "words = [(w.upper(), w.lower(), len(w)) for w in sentence.split()]\n", "words" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, what the code does here? It takes as input the string `sentence`, creates a list of words, and for each word it creates a tuple, with the word in uppercase, lowercase, together with the length of the word." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set and Dictionary Comprehensions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to _list_ comprehensions, we also have the same principle for sets and dictionaries. We can create sets and dictionaries in the same way, but now we do not use square brackets to surround the comprehension, but use braces instead. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Creating a set instead of a list.\n", "S = {i*i for i in range(10)}\n", "S" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Dictionary comprehension, where team name becomes the key, and franchise name the value\n", "teams_franchise = {team:team.split()[-1] for team in nba_teams}\n", "teams_franchise" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Dictionary comprehension, where team name becomes the key, and franchise name the value\n", "words = {w:len(w) for w in sentence.split()}\n", "words" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are given the sentence 'The quick brown fox jumps over the lazy dog', " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sentence = 'The quick brown fox jumps over the lazy dog'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "**Question 1**: List each word and its length from the string 'The quick brown fox jumps over the lazy dog', conditioned on the length of the word being four characters and above\n", "\n", "**Question 2**: List only words with the letter o in them\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# List each word and its length from the string \n", "# 'The quick brown fox jumps over the lazy dog', \n", "# conditioned on the length of the word being four characters and above\n" ] }, { "cell_type": "markdown", "metadata": { "solution2": "shown", "solution2_first": true }, "source": [ "### Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "shown" }, "outputs": [], "source": [ "[ (word, len(word)) for word in sentence.split() if len(word)>=4]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# List only words with the letter o in them\n" ] }, { "cell_type": "markdown", "metadata": { "solution2": "shown", "solution2_first": true }, "source": [ "### Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "shown" }, "outputs": [], "source": [ "[ word for word in sentence.split() if 'o' in word]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will work now on a more challenging exercise. This will not only require the use of comprehensions, but will also ask you to put together things that we learned earlier in the course, especially when we studied strings.\n", "\n", "**Question 1**: You are given the `wsj` article below. Write a list comprehension for getting the words that appear more than once. \n", " * Use the `.split()` command for splitting, without passing a parameter.\n", " * When counting words, case does not matter (i.e., YAHOO is the same as Yahoo).\n", "\n", "**Question 2**: Find all the *characters* in the article that are not letters or numbers. You can use the isdigit() and isalpha() functions, which work on strings. (e.g, `\"Panos\".isalpha()` and `\"1234\".isdigit()` return True) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "wsj = \"\"\"\n", "Yahoo Inc. disclosed a massive security breach by a “state-sponsored actor” affecting at least 500 million users, potentially the largest such data breach on record and the latest hurdle for the beaten-down internet company as it works through the sale of its core business.\n", "Yahoo said certain user account information—including names, email addresses, telephone numbers, dates of birth, hashed passwords and, in some cases, encrypted or unencrypted security questions and answers—was stolen from the company’s network in late 2014 by what it believes is a state-sponsored actor.\n", "Yahoo said it is notifying potentially affected users and has taken steps to secure their accounts by invalidating unencrypted security questions and answers so they can’t be used to access an account and asking potentially affected users to change their passwords.\n", "Yahoo recommended users who haven’t changed their passwords since 2014 do so. It also encouraged users change their passwords as well as security questions and answers for any other accounts on which they use the same or similar information used for their Yahoo account.\n", "The company, which is working with law enforcement, said the continuing investigation indicates that stolen information didn't include unprotected passwords, payment-card data or bank account information.\n", "With 500 million user accounts affected, this is the largest-ever publicly disclosed data breach, according to Paul Stephens, director of policy and advocacy with Privacy Rights Clearing House, a not-for-profit group that compiles information on data breaches.\n", "No evidence has been found to suggest the state-sponsored actor is currently in Yahoo’s network, and Yahoo didn’t name the country it suspected was involved. In August, a hacker called “Peace” appeared in online forums, offering to sell 200 million of the company’s usernames and passwords for about $1,900 in total. Peace had previously sold data taken from breaches at Myspace and LinkedIn Corp.\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# getting the words that appear more than once\n" ] }, { "cell_type": "markdown", "metadata": { "solution2": "shown", "solution2_first": true }, "source": [ "### Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "shown" }, "outputs": [], "source": [ "words = wsj.lower().split()\n", "recurring = [w for w in words if words.count(w)>1]\n", "print(recurring)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(sorted(set(recurring)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Find all the *characters* in the article that are not letters or numbers" ] }, { "cell_type": "markdown", "metadata": { "solution2": "shown", "solution2_first": true }, "source": [ "### Solution" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "solution2": "shown" }, "outputs": [], "source": [ "# Let's use a set comprehension here, to eliminate duplicates\n", "nonalphanumeric = {c for c in wsj if not c.isdigit() and not c.isalpha()}\n", "print(nonalphanumeric)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }