{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "Created by [Nathan Kelber](http://nkelber.com) and Ted Lawless for [JSTOR Labs](https://labs.jstor.org/) under [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/)
\n", "For questions/comments/improvements, email nathan.kelber@ithaka.org.
\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Basics III\n", "\n", "**Description:** This lesson describes the basics of [lists](https://docs.tdm-pilot.org/key-terms/#list) and [dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary) including:\n", "\n", "* The `in` and `not in` [operators](https://docs.tdm-pilot.org/key-terms/#operator)\n", "* [Lists](https://docs.tdm-pilot.org/key-terms/#list)\n", "* [List](https://docs.tdm-pilot.org/key-terms/#list) methods (`index()`, `append()`, `insert()`, `sort()`)\n", "* [Dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary)\n", "* [Dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) methods (`update()`, `keys()`, `values()`, `items()`, `get()`)\n", "\n", "This is part 3 of 3 in the series *Python Basics* that will prepare you to do text analysis using the [Python](https://docs.tdm-pilot.org/key-terms/#python) programming language.\n", "\n", "**Use Case:** For Learners (Detailed explanation, not ideal for researchers)\n", "\n", "**Difficulty:** Beginner\n", "\n", "**Completion Time:** 90 minutes\n", "\n", "**Knowledge Required:** \n", "* [Getting Started with Jupyter Notebooks](./getting-started-with-jupyter.ipynb)\n", "* [Python Basics I](./python-basics-1.ipynb)\n", "* [Python Basics II](./python-basics-2.ipynb)\n", "\n", "**Knowledge Recommended:** None\n", "\n", "**Data Format:** None\n", "\n", "**Libraries Used:** None\n", "\n", "**Research Pipeline:** None\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In *Python Basics* I, we learned about three data types: [integers](https://docs.tdm-pilot.org/key-terms/#integer), [floats](https://docs.tdm-pilot.org/key-terms/#float), and [strings](https://docs.tdm-pilot.org/key-terms/#string). In this lesson, we will learn about two additional data types: [lists](https://docs.tdm-pilot.org/key-terms/#list) and [dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary). 
[Lists](https://docs.tdm-pilot.org/key-terms/#list) and [dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary) help us store many values inside a single [variable](https://docs.tdm-pilot.org/key-terms/#variable). This is helpful for a few reasons:\n", "\n", "* We can store many items in a single [list](https://docs.tdm-pilot.org/key-terms/#list) or [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary), making it easier to keep the data together \n", "* [Lists](https://docs.tdm-pilot.org/key-terms/#list) and [dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary) only require a single [assignment statement](https://docs.tdm-pilot.org/key-terms/#assignment-statement)\n", "* [Lists](https://docs.tdm-pilot.org/key-terms/#list) and [dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary) have additional capabilities that will make organizing our data easier\n", "\n", "The fundamental difference between a [list](https://docs.tdm-pilot.org/key-terms/#list) and a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) is that a [list](https://docs.tdm-pilot.org/key-terms/#list) stores items in sequential order (starting from 0) while a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) stores items in [key/value pairs](https://docs.tdm-pilot.org/key-terms/#key-value-pair). When we want to retrieve an item from a list, we use an [index number](https://docs.tdm-pilot.org/key-terms/#index-number) or a set of [index numbers](https://docs.tdm-pilot.org/key-terms/#index-number) called a [slice](https://docs.tdm-pilot.org/key-terms/#slice) as a reference. When we want to retrieve an item from a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary), we supply a [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair) that returns the [value](https://docs.tdm-pilot.org/key-terms/#key-value-pair) (or set of values) associated with that [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair). 
Each of these approaches can be beneficial depending on what kind of data we are working with (and what we intend to do with the data). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lists\n", "\n", "A [list](https://docs.tdm-pilot.org/key-terms/#list) can store anywhere from zero to millions of items. The items that can be stored in a [list](https://docs.tdm-pilot.org/key-terms/#list) include the data types we have already learned: [integers](https://docs.tdm-pilot.org/key-terms/#integer), [floats](https://docs.tdm-pilot.org/key-terms/#float), and [strings](https://docs.tdm-pilot.org/key-terms/#string). A [list](https://docs.tdm-pilot.org/key-terms/#list) [assignment statement](https://docs.tdm-pilot.org/key-terms/#assignment-statement) takes the form:\n", "`my_list = [item1, item2, item3, item4...]`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A list containing integers\n", "my_favorite_numbers = [7, 21, 100]\n", "print(my_favorite_numbers)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A list containing strings\n", "my_inspirations = ['Harriet Tubman', 'Rosa Parks', 'Pauli Murray']\n", "print(my_inspirations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both `my_favorite_numbers` and `my_inspirations` have three items, but we could also have initialized them with no items (`my_favorite_numbers = []`) or with many more items. Each item has an [index number](https://docs.tdm-pilot.org/key-terms/#index-number) that depends on its position in the list. The first item has index 0, the second item has index 1, the third item has index 2, etc. In the `my_inspirations` list, `'Pauli Murray'` is item 2." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Retrieving an item in a list\n", "my_inspirations[2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**What happens if you change the index number to 1? What about 3? 2.0?**\n", "\n", "___\n", "\n", "\n", "[Lists](https://docs.tdm-pilot.org/key-terms/#list) can also contain other [lists](https://docs.tdm-pilot.org/key-terms/#list). To retrieve a value from a [list](https://docs.tdm-pilot.org/key-terms/#list) within a [list](https://docs.tdm-pilot.org/key-terms/#list), we use two [indexes](https://docs.tdm-pilot.org/key-terms/#index-number) (or indices). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Retrieving an item from a list within a list\n", "my_inspirations = [['Harriet Tubman', 'Rosa Parks', 'Pauli Murray'], ['Martin Luther King Jr.', 'Frederick Douglass', 'Malcolm X']]\n", "my_inspirations[0][2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**Can you change the index for `my_inspirations` to retrieve `'Malcolm X'`?**\n", "___\n", "\n", "We can also select items from a [list](https://docs.tdm-pilot.org/key-terms/#list) beginning from the end/right side of a [list](https://docs.tdm-pilot.org/key-terms/#list) by using negative [index numbers](https://docs.tdm-pilot.org/key-terms/#index-number)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Retrieving an item from the end of a list by using negative indices\n", "my_inspirations = ['Harriet Tubman', 'Rosa Parks', 'Pauli Murray']\n", "my_inspirations[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also retrieve a group of consecutive items from a [list](https://docs.tdm-pilot.org/key-terms/#list) using [slices](https://docs.tdm-pilot.org/key-terms/#slice) instead of a single [index number](https://docs.tdm-pilot.org/key-terms/#index-number). We create a [slice](https://docs.tdm-pilot.org/key-terms/#slice) by indicating a starting and ending [index number](https://docs.tdm-pilot.org/key-terms/#index-number). The [slice](https://docs.tdm-pilot.org/key-terms/#slice) is a smaller [list](https://docs.tdm-pilot.org/key-terms/#list) containing all the items between our starting and stopping [index number](https://docs.tdm-pilot.org/key-terms/#index-number)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Taking a slice of a list\n", "historical_periods = ['Classical Antiquity', 'Early Middle Ages', 'High Middle Ages', 'Late Middle Ages', 'Early Modern Period', 'Late Modern Period', 'Contemporary History']\n", "historical_periods[3:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice in our [slice](https://docs.tdm-pilot.org/key-terms/#slice) that the second [index](https://docs.tdm-pilot.org/key-terms/#index-number) in a [slice](https://docs.tdm-pilot.org/key-terms/#slice) is the stopping point. 
That is, the returned [list](https://docs.tdm-pilot.org/key-terms/#list) contains `historical_periods[3]` (`'Late Middle Ages'`) and `historical_periods[4]` (`'Early Modern Period'`), but it does not include `historical_periods[5]` (`'Late Modern Period'`). This can be confusing if you were expecting three items instead of two. One way to remember this is by subtracting the [indexes](https://docs.tdm-pilot.org/key-terms/#index-number) in your head (5 - 3 = 2 items).\n", "\n", "It is not uncommon for [lists](https://docs.tdm-pilot.org/key-terms/#list) to be hundreds or thousands of items long. It would be a chore to count all those items to create a [slice](https://docs.tdm-pilot.org/key-terms/#slice). If you want to know the length of a [list](https://docs.tdm-pilot.org/key-terms/#list), you can use the `len()` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the len() function to discover the number of items in the list\n", "historical_periods = ['Classical Antiquity', 'Early Middle Ages', 'High Middle Ages', 'Late Middle Ages', 'Early Modern Period', 'Late Modern Period', 'Contemporary History']\n", "len(historical_periods)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `historical_periods` [list](https://docs.tdm-pilot.org/key-terms/#list) is 7 items long, meaning the whole [list](https://docs.tdm-pilot.org/key-terms/#list) is within the [slice](https://docs.tdm-pilot.org/key-terms/#slice) `historical_periods[0:7]` (the stopping index 7 is not itself included, so the slice covers indexes 0 through 6). When we take a [slice](https://docs.tdm-pilot.org/key-terms/#slice) of a [list](https://docs.tdm-pilot.org/key-terms/#list) we can also leave out the first [index number](https://docs.tdm-pilot.org/key-terms/#index-number) (0 is assumed) or the stopping [index number](https://docs.tdm-pilot.org/key-terms/#index-number) (the end of the list is assumed)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Taking a slice of a list without a beginning index\n", "historical_periods = ['Classical Antiquity', 'Early Middle Ages', 'High Middle Ages', 'Late Middle Ages', 'Early Modern Period', 'Late Modern Period', 'Contemporary History']\n", "historical_periods[:2]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Taking a slice of a list without a stopping index\n", "historical_periods = ['Classical Antiquity', 'Early Middle Ages', 'High Middle Ages', 'Late Middle Ages', 'Early Modern Period', 'Late Modern Period', 'Contemporary History']\n", "historical_periods[4:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `in` and `not in` Operators\n", "\n", "If we have a long [list](https://docs.tdm-pilot.org/key-terms/#list), it may be helpful to check whether a value is in the [list](https://docs.tdm-pilot.org/key-terms/#list). We can do this with the `in` and `not in` operators, which return a [boolean value](https://docs.tdm-pilot.org/key-terms/#boolean-value): **True** or **False**." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Checking whether an item is in a list using the `in` operator\n", "\n", "staff = ['Tara Richards',\n", " 'Tammy French',\n", " 'Justin Douglas',\n", " 'Lauren Marquez',\n", " 'Aaron Wilson',\n", " 'Dennis Howell',\n", " 'Brandon Reed',\n", " 'Kelly Baker',\n", " 'Justin Howard',\n", " 'Sarah Myers',\n", " 'Vanessa Burgess',\n", " 'Timothy Davidson',\n", " 'Jessica Lee',\n", " 'Christopher Miller',\n", " 'Lisa Grant',\n", " 'Ryan Chan',\n", " 'Gary Carson',\n", " 'Anthony Mitchell',\n", " 'Jacob Turner',\n", " 'Jennifer Bonilla',\n", " 'Rachel Gonzalez',\n", " 'Andrew Clark',\n", " 'Richard Pearson',\n", " 'Glenn Allen',\n", " 'Jacqueline Gallagher',\n", " 'Carlos Mcdowell',\n", " 'Jeffrey Harris',\n", " 'Danielle Griffith',\n", " 'Sarah Craig',\n", " 'Vernon Vasquez',\n", " 'Anthony Burton',\n", " 'Erica Bryant',\n", " 'Patricia Walker',\n", " 'Karen Brown',\n", " 'Terri Walker',\n", " 'Michelle Knight',\n", " 'Kathleen Douglas',\n", " 'Debbie Estrada',\n", " 'Jennifer Brewer',\n", " 'Taylor Rodriguez',\n", " 'Lisa Turner',\n", " 'Julie Hudson',\n", " 'Christina Cox',\n", " 'Nancy Patrick',\n", " 'Rita Mosley',\n", " 'Nicholas Gordon',\n", " 'Wanda Vasquez',\n", " 'Jason Lopez',\n", " 'Anna Powers',\n", " 'Tyler Perez']\n", "\n", "'Patricia Walker' in staff" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**Is Rita McDowell in `staff`? What about Erica Bryant?**\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can change the value of any item in a [list](https://docs.tdm-pilot.org/key-terms/#list) using an [assignment statement](https://docs.tdm-pilot.org/key-terms/#assignment-statement) that contains the item's [index number](https://docs.tdm-pilot.org/key-terms/#index-number)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Changing the value of an item in a list\n", "historical_periods = ['Classical Antiquity', 'Dark Ages', 'Modern Period']\n", "print(historical_periods)\n", "historical_periods[1] = 'Middle Ages'\n", "print(historical_periods)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we know how to change an item in a [list](https://docs.tdm-pilot.org/key-terms/#list), and we can use the `in` and `not in` operators to determine whether an item is in the [list](https://docs.tdm-pilot.org/key-terms/#list). But if we don't know the [index number](https://docs.tdm-pilot.org/key-terms/#index-number) of the item, we won't know which [index number](https://docs.tdm-pilot.org/key-terms/#index-number) to change. We need a way to pass both the name of the [list](https://docs.tdm-pilot.org/key-terms/#list) and the value of the item to a [function](https://docs.tdm-pilot.org/key-terms/#function) at the same time. To do that, we'll use a special kind of [function](https://docs.tdm-pilot.org/key-terms/#function) called a [method](https://docs.tdm-pilot.org/key-terms/#method). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# List Methods\n", "\n", "A [method](https://docs.tdm-pilot.org/key-terms/#method) is a kind of [function](https://docs.tdm-pilot.org/key-terms/#function). (Remember, [functions](https://docs.tdm-pilot.org/key-terms/#function) end in parentheses.) 
[Methods](https://docs.tdm-pilot.org/key-terms/#method), however, act on objects (like [lists](https://docs.tdm-pilot.org/key-terms/#list)) so they have a slightly different written form. We will take a look at five useful [methods](https://docs.tdm-pilot.org/key-terms/#method) for working with [lists](https://docs.tdm-pilot.org/key-terms/#list).\n", "\n", "|Method Name | Purpose | Form |\n", "|---|---|---|\n", "|index()| search for an item in a list and return the index number | list_name.index(item_name)|\n", "|append()| add an item to the end of a list | list_name.append(item_name)|\n", "|insert()| insert an item in the middle of a list | list_name.insert(index_number, item_name)|\n", "|remove()| remove an item from a list based on value | list_name.remove('item_value')|\n", "|sort()| sort the order of a list | list_name.sort()|\n", "\n", "## The `index()` Method\n", "\n", "The `index()` [method](https://docs.tdm-pilot.org/key-terms/#method) checks to see if a value is in a [list](https://docs.tdm-pilot.org/key-terms/#list). If the value is found, it returns the [index number](https://docs.tdm-pilot.org/key-terms/#index-number) for the first item with that value. (Keep in mind, there could be multiple items with the same value in a [list](https://docs.tdm-pilot.org/key-terms/#list).) If the value is not found, the `index()` [method](https://docs.tdm-pilot.org/key-terms/#method) raises a `ValueError`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the index() method to return the index number of an item by passing a value\n", "staff = ['Tara Richards',\n", " 'Tammy French',\n", " 'Justin Douglas',\n", " 'Lauren Marquez',\n", " 'Aaron Wilson',\n", " 'Dennis Howell',\n", " 'Brandon Reed',\n", " 'Kelly Baker',\n", " 'Justin Howard',\n", " 'Sarah Myers',\n", " 'Vanessa Burgess',\n", " 'Timothy Davidson',\n", " 'Jessica Lee',\n", " 'Christopher Miller',\n", " 'Lisa Grant',\n", " 'Ryan Chan',\n", " 'Gary Carson',\n", " 'Anthony Mitchell',\n", " 'Jacob Turner',\n", " 'Jennifer Bonilla',\n", " 'Rachel Gonzalez',\n", " 'Andrew Clark',\n", " 'Richard Pearson',\n", " 'Glenn Allen']\n", "\n", "staff.index('Justin Howard') # Returns the index number of the first item with the value 'Justin Howard'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**What is the index number of Lisa Grant in `staff`?**\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `append()` Method\n", "\n", "The `append()` [method](https://docs.tdm-pilot.org/key-terms/#method) adds a value to the end of a [list](https://docs.tdm-pilot.org/key-terms/#list)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**Can you add your name to `staff`?**\n", "___" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the append() method to add an item to the end of a list\n", "staff = ['Tara Richards',\n", " 'Tammy French',\n", " 'Justin Douglas',\n", " 'Lauren Marquez',\n", " 'Aaron Wilson',\n", " 'Dennis Howell',\n", " 'Brandon Reed',\n", " 'Kelly Baker',\n", " 'Justin Howard',\n", " 'Sarah Myers',\n", " 'Vanessa Burgess',\n", " 'Timothy Davidson',\n", " 'Jessica Lee',\n", " 'Christopher Miller',\n", " 'Lisa Grant',\n", " 'Ryan Chan',\n", " 'Gary Carson',\n", " 'Anthony Mitchell',\n", " 'Jacob Turner',\n", " 'Jennifer Bonilla',\n", " 'Rachel Gonzalez',\n", " 'Andrew Clark',\n", " 'Richard Pearson',\n", " 'Glenn Allen']\n", "# Write your append statement under this comment\n", "\n", "list(staff) # Prints `staff` in a vertical list format to check that the name was added. The list() function will print our list in an easier to read format than the print() function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also add an item to the end of a [list](https://docs.tdm-pilot.org/key-terms/#list) by using an [assignment statement](https://docs.tdm-pilot.org/key-terms/#assignment-statement)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Adding an item to a list using an assignment statement\n", "staff = staff + ['Einhorn Finkle'] # Concatenate `staff` with the list ['Einhorn Finkle']\n", "list(staff)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `insert()` Method\n", "\n", "The `insert()` [method](https://docs.tdm-pilot.org/key-terms/#method) is similar to `append()`, but it takes an additional [argument](https://docs.tdm-pilot.org/key-terms/#argument) that lets us choose the [index number](https://docs.tdm-pilot.org/key-terms/#index-number) at which to insert the new item." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the insert() method to add an item at a specific index number\n", "staff = ['Tara Richards',\n", " 'Tammy French',\n", " 'Justin Douglas',\n", " 'Lauren Marquez',\n", " 'Aaron Wilson',\n", " 'Dennis Howell',\n", " 'Brandon Reed',\n", " 'Kelly Baker',\n", " 'Justin Howard',\n", " 'Sarah Myers',\n", " 'Vanessa Burgess',\n", " 'Timothy Davidson',\n", " 'Jessica Lee',\n", " 'Christopher Miller',\n", " 'Lisa Grant',\n", " 'Ryan Chan',\n", " 'Gary Carson',\n", " 'Anthony Mitchell',\n", " 'Jacob Turner',\n", " 'Jennifer Bonilla',\n", " 'Rachel Gonzalez',\n", " 'Andrew Clark',\n", " 'Richard Pearson',\n", " 'Glenn Allen']\n", "\n", "staff.insert(5, 'Arya Stark') # Insert the name 'Arya Stark' at index 5 (The sixth name on the list)\n", "list(staff) # Prints `staff` to check that the name was added at the right spot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**Can you make your name the third item on `staff`?**\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `remove()` Method\n", "\n", "The `remove()` [method](https://docs.tdm-pilot.org/key-terms/#method) removes the first item from the [list](https://docs.tdm-pilot.org/key-terms/#list) that has a matching value. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the remove() method to remove the first item with a matching value\n", "staff = ['Tara Richards',\n", " 'Tammy French',\n", " 'Justin Douglas',\n", " 'Lauren Marquez',\n", " 'Aaron Wilson',\n", " 'Dennis Howell',\n", " 'Brandon Reed',\n", " 'Kelly Baker',\n", " 'Justin Howard',\n", " 'Sarah Myers',\n", " 'Vanessa Burgess',\n", " 'Timothy Davidson',\n", " 'Jessica Lee',\n", " 'Christopher Miller',\n", " 'Lisa Grant',\n", " 'Ryan Chan',\n", " 'Gary Carson',\n", " 'Anthony Mitchell',\n", " 'Jacob Turner',\n", " 'Jennifer Bonilla',\n", " 'Rachel Gonzalez',\n", " 'Andrew Clark',\n", " 'Richard Pearson',\n", " 'Glenn Allen']\n", "# Write your remove statement under this comment\n", "\n", "list(staff) # Prints `staff` to check that the name was removed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**Can you remove Glenn Allen from `staff`?**\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you know the value you wish to remove, then the `remove()` [method](https://docs.tdm-pilot.org/key-terms/#method) is the best option. If you know the [index number](https://docs.tdm-pilot.org/key-terms/#index-number) of the item, you can use a `del` statement to delete [list](https://docs.tdm-pilot.org/key-terms/#list) items." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using a `del` statement to delete a list item\n", "staff = ['Tara Richards',\n", " 'Tammy French',\n", " 'Justin Douglas',\n", " 'Lauren Marquez',\n", " 'Aaron Wilson',\n", " 'Dennis Howell',\n", " 'Brandon Reed',\n", " 'Kelly Baker',\n", " 'Justin Howard',\n", " 'Sarah Myers',\n", " 'Vanessa Burgess',\n", " 'Timothy Davidson',\n", " 'Jessica Lee',\n", " 'Christopher Miller',\n", " 'Lisa Grant',\n", " 'Ryan Chan',\n", " 'Gary Carson',\n", " 'Anthony Mitchell',\n", " 'Jacob Turner',\n", " 'Jennifer Bonilla',\n", " 'Rachel Gonzalez',\n", " 'Andrew Clark',\n", " 'Richard Pearson',\n", " 'Glenn Allen']\n", "del staff[-1] # Delete the final item in the list. In this case, 'Glenn Allen'.\n", "list(staff)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `sort()` Method\n", "\n", "The `sort()` [method](https://docs.tdm-pilot.org/key-terms/#method) sorts a [list](https://docs.tdm-pilot.org/key-terms/#list) in place. Strings are sorted in alphabetical order, but capital letters come before lowercase letters: strings beginning with capital letters are sorted A-Z first, then strings beginning with lowercase letters are sorted a-z." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the `sort()` method to sort a list in alpha-numeric order\n", "staff = ['Tara Richards',\n", " 'Tammy French',\n", " 'Justin Douglas',\n", " 'Lauren Marquez',\n", " 'Aaron Wilson',\n", " 'Dennis Howell',\n", " 'Brandon Reed',\n", " 'Kelly Baker',\n", " 'Justin Howard',\n", " 'Sarah Myers',\n", " 'Vanessa Burgess',\n", " 'Timothy Davidson',\n", " 'Jessica Lee',\n", " 'Christopher Miller',\n", " 'Lisa Grant',\n", " 'Ryan Chan',\n", " 'Gary Carson',\n", " 'Anthony Mitchell',\n", " 'Jacob Turner',\n", " 'Jennifer Bonilla',\n", " 'Rachel Gonzalez',\n", " 'Andrew Clark',\n", " 'Richard Pearson',\n", " 'Glenn Allen']\n", "staff.sort()\n", "list(staff)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**The `sort()` method can take the argument `reverse=True`. Try applying it to the `sort()` method above. What does it do?**\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Coding Challenge! < / >

\n", "\n", "**Using your knowledge of flow control statements and lists, can you write a program that transforms a list of strings into a single string of comma-separated values?**\n", "___" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Level 1 Challenge\n", "# A challenge to transform a list of books into a single string of comma-separated values\n", "# The result should be the string: 'Piers Plowman, The Canterbury Tales, Revelations of Divine Love, The Decameron, Le Morte d'Arthur'\n", "book_list = ['Piers Plowman', 'The Canterbury Tales', 'Revelations of Divine Love', 'The Decameron', \"Le Morte d'Arthur\"]\n", "\n", "# Level 2 Challenge\n", "# For an even greater challenge, transform a list of lists containing books and authors into a single string of comma-separated values with an 'and' before the final value. Oxford comma please.\n", "# The result should be the string: 'Piers Plowman by William Langland, The Canterbury Tales by Geoffrey Chaucer, Revelations of Divine Love by Julian of Norwich, The Decameron by Giovanni Boccaccio, and Le Morte d'Arthur by Sir Thomas Malory'\n", "\n", "books_with_authors = [['Piers Plowman', 'William Langland'], ['The Canterbury Tales', 'Geoffrey Chaucer'], ['Revelations of Divine Love', 'Julian of Norwich'], ['The Decameron', 'Giovanni Boccaccio'], [\"Le Morte d'Arthur\", 'Sir Thomas Malory']]\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dictionaries\n", "\n", "Like a [list](https://docs.tdm-pilot.org/key-terms/#list), a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) can hold many values within a single [variable](https://docs.tdm-pilot.org/key-terms/#variable). We have seen that the items of a [list](https://docs.tdm-pilot.org/key-terms/#list) are stored in a strictly-ordered fashion, starting from item 0. 
In a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary), each [value](https://docs.tdm-pilot.org/key-terms/#key-value-pair) is stored in relation to a descriptive [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair) forming a [key/value pair](https://docs.tdm-pilot.org/key-terms/#key-value-pair). Technically, as of [Python](https://docs.tdm-pilot.org/key-terms/#python) 3.7 (June 2018), [dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary) are also ordered by insertion. In practice, however, the most useful aspect of a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) is the ability to supply a [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair) and receive a [value](https://docs.tdm-pilot.org/key-terms/#key-value-pair) without reference to [indices](https://docs.tdm-pilot.org/key-terms/#index-number). Whereas a [list](https://docs.tdm-pilot.org/key-terms/#list) is typed with brackets `[]`, a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) is typed with braces `{}`. The [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair) and/or [value](https://docs.tdm-pilot.org/key-terms/#key-value-pair) can be an [integer](https://docs.tdm-pilot.org/key-terms/#integer), [float](https://docs.tdm-pilot.org/key-terms/#float), or [string](https://docs.tdm-pilot.org/key-terms/#string).\n", "\n", "`example_dictionary = {key1 : value1, key2 : value2, key3 : value3}`\n", "\n", "We could imagine, for example, a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) with our professional contacts' names as [keys](https://docs.tdm-pilot.org/key-terms/#key-value-pair) and their occupations as [values](https://docs.tdm-pilot.org/key-terms/#key-value-pair)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# An example of a dictionary storing names and occupations\n", "contacts = {\n", " 'Amanda Bennett': 'Engineer, electrical',\n", " 'Bryan Miller': 'Radiation protection practitioner',\n", " 'Christopher Garrison': 'Planning and development surveyor',\n", " 'Debra Allen': 'Intelligence analyst',\n", " 'Donna Decker': 'Architect',\n", " 'Heather Bullock': 'Media planner',\n", " 'Jason Brown': 'Energy manager',\n", " 'Jason Soto': 'Lighting technician, broadcasting/film/video',\n", " 'Marissa Munoz': 'Further education lecturer',\n", " 'Matthew Mccall': 'Chief Technology Officer',\n", " 'Michael Norman': 'Translator',\n", " 'Nicole Leblanc': 'Financial controller',\n", " 'Noah Delgado': 'Engineer, land',\n", " 'Rachel Charles': 'Physicist, medical',\n", " 'Stephanie Petty': 'Architect'}\n", "from pprint import pprint # We import the pretty print function which prints out dictionaries in a neater fashion than the built-in print() function\n", "pprint(contacts) # Use the pretty print function to print `contacts`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can add a new [key/value pair](https://docs.tdm-pilot.org/key-terms/#key-value-pair) to our [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) using an [assignment statement](https://docs.tdm-pilot.org/key-terms/#assignment-statement)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Adding the key 'Nathan Kelber' with the value 'Digital Humanities Fellow' to the dictionary `contacts`\n", "contacts['Nathan Kelber'] = 'Digital Humanities Fellow'\n", "\n", "pprint(contacts) # Use the pretty print function to print `contacts`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**Can you add your name to the contacts dictionary?**\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar to deleting an item from a [list](https://docs.tdm-pilot.org/key-terms/#list), we can use a `del` statement to delete a [key/value pair](https://docs.tdm-pilot.org/key-terms/#key-value-pair). " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Deleting a key/value pair from the dictionary contacts\n", "del contacts['Bryan Miller'] \n", "\n", "pprint(contacts) # Use the pretty print function to print `contacts`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dictionary Methods\n", "\n", "We'll take a look at five useful [methods](https://docs.tdm-pilot.org/key-terms/#method) for working with [dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary): `update()`, `keys()`, `values()`, `items()`, and `get()`.\n", "\n", "|Method Name | Purpose | Form |\n", "|---|---|---|\n", "|update()| add new key/value pairs to a dictionary | dict_name.update(\\[(key1, value1), (key2, value2)])|\n", "|   | combine two dictionaries |dict_name.update(dict_name2)|\n", "|keys()| check if a key is in a dictionary (True/False) | key_name in dict_name.keys()|\n", "|   | Loop through the keys in a dictionary | for k in dict.keys():|\n", "|values()| check if a value is in a dictionary (True/False) | value_name in dict_name.values()|\n", "|  | Loop through the values in a dictionary | for v in dict.values():|\n", "|items()| Loop through the keys and values in a dictionary | for k, v in dict.items():|\n", "|get()| retrieve the value for a specific key | dict_name.get(key_name) |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `update()` Method\n", "\n", "The `update()` [method](https://docs.tdm-pilot.org/key-terms/#method) is useful for adding many [key/value pairs](https://docs.tdm-pilot.org/key-terms/#key-value-pair) to a 
[dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) at once. The `update()` [method](https://docs.tdm-pilot.org/key-terms/#method) accepts a single [key/value pair](https://docs.tdm-pilot.org/key-terms/#key-value-pair), multiple pairs, or even other [dictionaries](https://docs.tdm-pilot.org/key-terms/#dictionary)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Adding several key/value pairs to the dictionary contacts using the update() method\n", "contacts.update([('Matthew Kirschenbaum', 'Professor'), ('Sarah Morris', 'Librarian')])\n", "contacts.update([('Jason Hanover', 'Animal Trainer')])\n", "pprint(contacts) # Use the pretty print function to print `contacts`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `keys()` and `values()` Methods\n", "\n", "The `keys()` and `values()` [methods](https://docs.tdm-pilot.org/key-terms/#method) are useful for checking whether a particular [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair) or [value](https://docs.tdm-pilot.org/key-terms/#key-value-pair) exists in a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary). We can pair them with the `in` or `not in` operators to check whether a key or value is in our [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary) (just like we did with [lists](https://docs.tdm-pilot.org/key-terms/#list)). 
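
As a minimal sketch of these membership checks, the example below uses a small stand-in dictionary. The names `sample_contacts`, `has_donna`, `has_translator`, and `no_physicist` are made up for illustration and are not part of the lesson's `contacts` data.

```python
# A small, made-up dictionary used only for this sketch
sample_contacts = {
    'Donna Decker': 'Architect',
    'Michael Norman': 'Translator',
}

# Is a particular name one of the keys?
has_donna = 'Donna Decker' in sample_contacts.keys()

# Is a particular occupation one of the values?
has_translator = 'Translator' in sample_contacts.values()

# `not in` gives True when the item is absent
no_physicist = 'Physicist, medical' not in sample_contacts.values()

print(has_donna, has_translator, no_physicist)
```

Writing `'Donna Decker' in sample_contacts` without `.keys()` performs the same key check, because `in` looks at a dictionary's keys by default.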
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Checking whether a string exists within a key, value, or both.\n", "contacts ={\n", " 'Amanda Bennett': 'Engineer, electrical',\n", " 'Bryan Miller': 'Radiation protection practitioner',\n", " 'Christopher Garrison': 'Planning and development surveyor',\n", " 'Debra Allen': 'Intelligence analyst',\n", " 'Donna Decker': 'Architect',\n", " 'Heather Bullock': 'Media planner',\n", " 'Jason Brown': 'Energy manager',\n", " 'Jason Soto': 'Lighting technician, broadcasting/film/video',\n", " 'Marissa Munoz': 'Further education lecturer',\n", " 'Matthew Mccall': 'Chief Technology Officer',\n", " 'Michael Norman': 'Translator',\n", " 'Nicole Leblanc': 'Financial controller',\n", " 'Noah Delgado': 'Engineer, land',\n", " 'Rachel Charles': 'Physicist, medical',\n", " 'Stephanie Petty': 'Architect'}\n", "\n", "contacts" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Checking if the value Architect is in the contacts dictionary\n", "# Do I know an architect?\n", "'Architect' in contacts.values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `get()` Method\n", "\n", "If we are sure a [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair) exists, we can use\n", "\n", "`dict_name[key_name]` \n", "\n", "to return a value. However, if the [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair) is not found, the result will be a `KeyError`. The more robust approach is usually to use the `get()` [method](https://docs.tdm-pilot.org/key-terms/#method). If the [key](https://docs.tdm-pilot.org/key-terms/#key-value-pair) is not found, the `None` value will be returned. 
(Optionally, we can also specify a default message to return.)\n", "\n", "`dict_name.get('key_name', 'key_not_found_message')`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Using the get() method to retrieve the value for the key 'Marissa Munoz'\n", "contacts.get('Marissa Munoz', 'No contact with that name')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "

Try it! < / >

\n", "\n", "**Try searching for a name not in contacts with `contacts['name']` and then with `contacts.get('name', 'No contact with that name')`**\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining `keys()`, `values()`, and `items()` with Flow Control Statements\n", "\n", "It is often useful to combine `for` loops with the `keys()`, `values()`, or `items()` [methods](https://docs.tdm-pilot.org/key-terms/#method) to repeat a task for each entry in a [dictionary](https://docs.tdm-pilot.org/key-terms/#dictionary)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print every key in our contacts dictionary\n", "for k in contacts.keys(): # The variable `k` here is just a convenient shorthand. It could easily be named `key` or something else.\n", " print(k)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print every value in our contacts dictionary\n", "for v in contacts.values(): # The variable `v` here is just a convenient shorthand. It could easily be named `value` or something else.\n", " print(v)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print every key and value in our contacts dictionary\n", "for k, v in contacts.items():\n", " print('Key: ' + k + ' Value: ' + v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___\n", "# Lesson Complete\n", "\n", "Congratulations! You've completed the *Python Basics* series. \n", "\n", "Considering the amount of material in *Python Basics* I, II, and III, there's a good chance you won't retain it all. That's okay. Programmers often need to look up things to accomplish a task they haven't done in a while, particularly if it is in a language they don't often use. When you're working on a project, you can always come back to these lessons as reference materials. 
In other words, you've learned an incredible amount, so don't be surprised if it doesn't all stick at first.\n", "\n", "If you want to help yourself retain what you've learned, the best way is to start putting it into practice. Try your hand at creating some small Python projects and recognize that the things you've learned here will cement with time and practice. When you do forget a particular thing—as we all do—a quick web search often turns up some useful examples.\n", "\n", "\n", "## Start Another Python Skills Lesson: \n", "* [Working with Dataset Files](./working-with-dataset-files.ipynb)\n", "* [Pandas I](./pandas-1.ipynb)\n", "\n", "## Start a Text Analysis Lesson:\n", "* [Exploring Metadata](./exploring-metadata.ipynb)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Coding Challenge! Solutions\n", "\n", "There are often many ways to solve programming problems. Here are a few possible ways to solve the challenges, but there are certainly more!\n", "\n", "### Challenge Level 1\n", "\n", "Transform a list of books into a single string of comma-separated values. 
Using the list:\n", "\n", "`book_list = ['Piers Plowman', 'The Canterbury Tales', 'Revelations of Divine Love', 'The Decameron', \"Le Morte d'Arthur\"]`\n", "\n", "Print the string:\n", "`'Piers Plowman, The Canterbury Tales, Revelations of Divine Love, The Decameron, Le Morte d'Arthur'`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Assign list to book_list\n", "book_list = ['Piers Plowman', 'The Canterbury Tales', 'Revelations of Divine Love', 'The Decameron', \"Le Morte d'Arthur\"]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution 1\n", "# For loop\n", "\n", "for book in book_list[:-1]:\n", " print(book + ', ', end='')\n", "print(book_list[-1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution 2\n", "# For i in range()\n", "\n", "for i in range(len(book_list) - 1):\n", " print(book_list[i] + ', ', end='')\n", "print(book_list[-1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution 3\n", "# While loop\n", "\n", "i = 0\n", "while i < (len(book_list) - 1):\n", " print(book_list[i] + ', ', end='')\n", " i += 1\n", "print(book_list[-1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution 4\n", "# String join method\n", "# (We did not learn this join method, but it does exactly what we want)\n", "print(', '.join(book_list))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Challenge Level 2\n", "\n", "Transform a list of lists containing books and authors into a single string of comma-separated values with an 'and' before the final value. 
Oxford comma please.\n", "\n", "Using the list:\n", "\n", "`[['Piers Plowman', 'William Langland'], ['The Canterbury Tales', 'Geoffrey Chaucer'], ['Revelations of Divine Love', 'Julian of Norwich'], ['The Decameron', 'Giovanni Boccaccio'], [\"Le Morte d'Arthur\", 'Sir Thomas Malory']]`\n", "\n", "Print the string:\n", "\n", "`'Piers Plowman by William Langland, The Canterbury Tales by Geoffrey Chaucer, Revelations of Divine Love by Julian of Norwich, The Decameron by Giovanni Boccaccio, and Le Morte d'Arthur by Sir Thomas Malory'`\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Assign list to books_with_authors\n", "books_with_authors = [['Piers Plowman', 'William Langland'], ['The Canterbury Tales', 'Geoffrey Chaucer'], ['Revelations of Divine Love', 'Julian of Norwich'], ['The Decameron', 'Giovanni Boccaccio'], [\"Le Morte d'Arthur\", 'Sir Thomas Malory']]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution 1\n", "# For loop\n", "for b in books_with_authors[:-1]:\n", " print(b[0] + ' by ' + b[1] + ', ', end='')\n", "print('and ' + books_with_authors[-1][0] + ' by ' + books_with_authors[-1][1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution 2\n", "# For i in range()\n", "\n", "for i in range(len(books_with_authors) - 1):\n", " print(books_with_authors[i][0] + ' by ' + books_with_authors[i][1] + ', ', end='')\n", "print('and ' + books_with_authors[-1][0] + ' by ' + books_with_authors[-1][1])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution 3\n", "# While loop\n", "i=0\n", "while i < (len(books_with_authors) - 1):\n", " print(books_with_authors[i][0] + ' by ' + books_with_authors[i][1] + ', ', end='')\n", " i += 1\n", "print('and ' + books_with_authors[-1][0] + ' by ' + books_with_authors[-1][1])" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Solution 4\n", "# String join method \n", "# (We did not learn this method, but it does exactly what we want.)\n", "\n", "new_list = []\n", "for b in books_with_authors[:-1]:\n", " new_list.append(str(b[0] + ' by ' + b[1]))\n", "\n", "print(', '.join(new_list) + ', and ' + books_with_authors[-1][0] + ' by ' + books_with_authors[-1][1])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "354px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }