{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Chapter 10 - Dictionaries\n", "*This notebook uses code snippets and explanation from [this course](https://github.com/kadarakos/python-course/blob/master/Chapter%205%20-%20Lists.ipynb)*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The last type of container we will introduce in this topic is **dictionaries**. Programming is mostly about solving real-world problems as efficiently as possible, but it is also important to write and organize code in a human-readable fashion. A dictionary offers a kind of abstraction that comes in handy often: it is a type of \"associative memory\" or key:value storage. It allows you to describe two pieces of data and their relationship. \n", "\n", "**At the end of this chapter, you will:**\n", "* understand the relevance of dictionaries\n", "* know how to create a dictionary\n", "* know how to add items to a dictionary\n", "* know how to inspect/extract items from a dictionary\n", "* know how to count with a dictionary\n", "* know how to create nested dictionaries\n", "\n", "**If you want to learn more about these topics, you might find the following links useful:**\n", "* [Python documentation](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have **questions** about this chapter, please contact us **(cltl.python.course@gmail.com)**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Dictionaries\n", "Imagine that you are a teacher, and you've graded exams (everyone got high grades, of course). You would like to store this information so that you can *ask* the program for the grade of a particular student. After some thought, you first try to accomplish this using a list." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades = ['Frank', 8, 'Susan', 7, 'Guido', 10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student = 'Frank'\n", "index_of_student = student_grades.index(student) # list.index returns the position of the first match\n", "print('grade of', student, 'is', student_grades[index_of_student + 1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, you're not happy with this solution. Every time you request a grade, you\n", "first need to determine the position of the student in the list and then use that index + 1 to obtain the grade. That's pretty inefficient, and it only works as long as names and grades strictly alternate. The take-home message here is that **lists are not well suited to storing two pieces of information that belong together**. Dictionaries to the rescue!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades['Frank']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. How to create a dictionary\n", "Let's take another look at the **student_grades** dictionary. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* A dictionary is surrounded by curly brackets, and the key/value pairs are separated by commas.\n", "* A dictionary consists of zero or more **key:value pairs**. The key is the \"identifier\" or \"name\" that is used to describe the value.\n", "* The **keys** in a dictionary are unique.\n", "* The syntax for a key/value pair is: KEY : VALUE\n", "* The keys (e.g. 
'Frank') in a dictionary have to be **immutable**\n", "* The values (e.g. 8) in a dictionary can be **any Python object**\n", "* A dictionary can be empty" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please note that **keys** in a dictionary have to be **immutable**. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works (strings as keys):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This does not (a list as key); it raises a TypeError because lists are mutable:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_dict = {['a', 'list']: 8}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please note that the values in a dictionary can be **any Python object**." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works (integers as values):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_dict = {'Frank': 8, 'Susan': 7}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But this works as well (lists as values):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "another_dict = {'Frank' : [8], 'Susan' : [7]}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please note that a dictionary can be empty (use **dict()**):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "an_empty_dict = dict()\n", "another_empty_dict = {} # This works too; note that {} creates an empty dictionary, not an empty set\n", "print(type(another_empty_dict), type(an_empty_dict))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. How to add items to a dictionary\n", "There is one very simple way to add a **key:value** pair to a dictionary. 
Please look at the following code snippet:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_dict = dict()\n", "print(a_dict)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_dict['Frank'] = 8\n", "print(a_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please note that dictionary keys should be **unique** identifiers for the values in the dictionary. A **key:value** pair gets overwritten if you assign a different value to an existing key." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_dict = dict()\n", "a_dict['Frank'] = 8\n", "print(a_dict)\n", "a_dict['Frank'] = 7\n", "print(a_dict)\n", "a_dict['Frank'] = 9\n", "print(a_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. How to access data in a dictionary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most basic operation on a dictionary is a **look-up**. Simply enter the key and the dictionary returns the value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(student_grades['Frank'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If the key is not in the dictionary, Python raises a KeyError."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades['Piet']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To avoid this error, you can first check whether the key is in the dictionary with an **if-statement**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "key = 'Piet'\n", "if key in student_grades:\n", "    print(student_grades[key])\n", "else:\n", "    print(key, 'not in dictionary')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "key = 'Frank'\n", "if key in student_grades:\n", "    print(student_grades[key])\n", "else:\n", "    print(key, 'not in dictionary')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **keys** method returns the keys in a dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "the_keys = student_grades.keys()\n", "print(the_keys)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **values** method returns the values in a dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "the_values = student_grades.values()\n", "print(the_values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use built-in functions to inspect the keys and values. 
For example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "the_values = student_grades.values()\n", "print(len(the_values)) # number of values in the dictionary\n", "print(max(the_values)) # highest value in the dictionary\n", "print(min(the_values)) # lowest value in the dictionary\n", "print(sum(the_values)) # sum of all values in the dictionary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, what if we want to know which students got an 8 or higher? The **items** method is very useful for this scenario. Please look carefully at the following code snippet." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "student_grades = {'Frank': 8, 'Susan': 7, 'Guido': 10}\n", "print(student_grades.items())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **items** method returns a view of the dictionary's key:value pairs, each of which is a (key, value) tuple. We can combine what we have learnt about looping and tuples to access the keys (the students' names) and values (their grades):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for key, value in student_grades.items(): # please note the tuple unpacking\n", "    print(key, value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This also makes it possible to detect which students obtained a grade of 8 or higher." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for student, grade in student_grades.items():\n", "    if grade >= 8:\n", "        print(student, grade)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Counting with a dictionary\n", "Dictionaries are very useful for deriving statistics. For example, we can easily determine the frequency of each letter in a word."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "letter2freq = dict()\n", "word = 'hippo'\n", "\n", "for letter in word:\n", "\n", "    if letter in letter2freq: # add 1 to the count if the key exists\n", "        letter2freq[letter] += 1 # note: x += 1 does the same as x = x + 1\n", "    else:\n", "        letter2freq[letter] = 1 # set the count to 1 if the key does not exist yet\n", "\n", "    print(letter, letter2freq)\n", "\n", "print()\n", "print(letter2freq)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can do the same with the items of a list:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_sentence = ['Obama', 'was', 'the', 'president', 'of', 'the', 'USA']\n", "word2freq = dict()\n", "\n", "for word in a_sentence:\n", "\n", "    if word in word2freq: # add 1 to the count if the key exists\n", "        word2freq[word] += 1\n", "    else:\n", "        word2freq[word] = 1 # set the count to 1 if the key does not exist yet\n", "\n", "    print(word, word2freq)\n", "\n", "print()\n", "print(word2freq)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python actually has a module that is very useful for counting. It's called [collections](https://docs.python.org/3/library/collections.html#collections.Counter), and it provides the **Counter** class." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import Counter" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "word_freq = Counter(['Obama', 'was', 'the', 'president', 'of', 'the', 'USA'])\n", "print(word_freq)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Feel free to start using this module **after** the assignment of this block." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 
Nested dictionaries\n", "Since dictionaries consist of **key:value** pairs, and a value can be any Python object, we can make another dictionary the **value** of a dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_nested_dictionary = {\n", "    'a_key': {\n", "        'nested_key1': 1,\n", "        'nested_key2': 2,\n", "        'nested_key3': 3,\n", "    }\n", "}\n", "print(a_nested_dictionary)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please note that the value is in fact a dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(a_nested_dictionary['a_key'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to access the nested value, we must do a look-up for each key on each nested level:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "the_nested_value = a_nested_dictionary['a_key']['nested_key1']\n", "print(the_nested_value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Practice questions:\n", "\n", "* What do sets and dictionaries have in common?\n", "* What do lists and tuples have in common?\n", "* Can you add things to a list?\n", "* Can you add things to a tuple?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An overview:\n", " \n", "| property | set | list | tuple | dict keys | dict values | \n", "|------------------------------- |-------------------|-----------------|-------------|-----------|-------------|\n", "| **mutable** (can you add/remove?) 
| yes | yes | no | yes | yes | \n", "| **can** contain duplicates | no | yes | yes | no | yes |\n", "| **ordered** | no | yes | yes | yes (insertion order, guaranteed since Python 3.7) | depends on type of value |\n", "| **finding** element(s) | quick | slow | slow | quick | depends on type of value |\n", "| **can** contain | immutables | all | all | immutables | all |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1:\n", "\n", "You are trying to keep track of your groceries using a Python dictionary. Please add 'tomatoes', 'bread', 'chocolate bars' and 'pineapples' to your shopping dictionary and assign values according to how many items of each you would like to buy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2:\n", "\n", "Print the number of *pineapples* you would like to buy by using only one line of code and without printing the entire dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3:\n", "\n", "Use a loop and unpacking to print the items and numbers on your shopping list in the following format:\n", "\n", "Item: [Item], number: [number]\n", "\n", "e.g. Item: tomatoes, number: 3" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 4:\n", "\n", "Which container would you use to count the frequency of each word in a text?"
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 4 }