{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1.5 Dictionaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Estimated time for this notebook: 10 minutes*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.5.1 The Python Dictionary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python supports a container type called a dictionary."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is also known as an \"associative array\", \"map\" or \"hash\" in other languages."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a list, we use a number to look up an element:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "names = \"Martin Luther King\".split(\" \")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Luther'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "names[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In a dictionary, we look up an element using **another object of our choice**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "me = {\"name\": \"James\", \"age\": 39, \"Jobs\": [\"Programmer\", \"Teacher\"]}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'name': 'James', 'age': 39, 'Jobs': ['Programmer', 'Teacher']}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "me"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Programmer', 'Teacher']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "me[\"Jobs\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "39"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "me[\"age\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(me)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Keys and Values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The things we can use to look up with are called **keys**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['name', 'age', 'Jobs'])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "me.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The things we can look up are called **values**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_values(['James', 39, ['Programmer', 'Teacher']])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "me.values()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When we test for containment on a `dict` we test on the **keys**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"Jobs\" in me"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"James\" in me"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"James\" in me.values()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Immutable Keys Only"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The way in which dictionaries work is one of the coolest things in computer science:\n",
    "the \"hash table\". The details of this are beyond the scope of this course, but we will consider some aspects in the section on performance programming. \n",
    "\n",
    "One consequence of this implementation is that you can only use **immutable** things as keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "good_match = {(\"Lamb\", \"Mint\"): True, (\"Bacon\", \"Chocolate\"): False}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "but:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "unhashable type: 'list'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "Cell \u001b[0;32mIn [14], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m illegal \u001b[38;5;241m=\u001b[39m {\n\u001b[1;32m      2\u001b[0m     [\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mLamb\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mMint\u001b[39m\u001b[38;5;124m\"\u001b[39m]: \u001b[38;5;28;01mTrue\u001b[39;00m,\n\u001b[1;32m      3\u001b[0m     [\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mBacon\u001b[39m\u001b[38;5;124m\"\u001b[39m, \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mChocolate\u001b[39m\u001b[38;5;124m\"\u001b[39m]: \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[1;32m      4\u001b[0m }\n",
      "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'"
     ]
    }
   ],
   "source": [
    "illegal = {[\"Lamb\", \"Mint\"]: True, [\"Bacon\", \"Chocolate\"]: False}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember -- square brackets denote lists, round brackets denote `tuple`s."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dictionary Order"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dictionaries will retain the order of the elements as they are defined (in Python versions >= 3.7)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}\n",
      "dict_values([0, 1, 2, 3, 4])\n"
     ]
    }
   ],
   "source": [
    "my_dict = {\"0\": 0, \"1\": 1, \"2\": 2, \"3\": 3, \"4\": 4}\n",
    "print(my_dict)\n",
    "print(my_dict.values())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'4': 4, '3': 3, '2': 2, '1': 1, '0': 0}\n",
      "dict_values([4, 3, 2, 1, 0])\n"
     ]
    }
   ],
   "source": [
    "rev_dict = {\"4\": 4, \"3\": 3, \"2\": 2, \"1\": 1, \"0\": 0}\n",
    "print(rev_dict)\n",
    "print(rev_dict.values())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python does not consider the order of the elements relevant to equality:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "my_dict == rev_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.5.2 Sets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A set is a `list` which cannot contain the same element twice.\n",
    "We make one by calling `set()` on any sequence, e.g. a list or string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "name = \"James Hetherington\"\n",
    "unique_letters = set(name)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{' ', 'H', 'J', 'a', 'e', 'g', 'h', 'i', 'm', 'n', 'o', 'r', 's', 't'}"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unique_letters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or by defining a literal like a dictionary, but without the colons:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "primes_below_ten = {2, 3, 5, 7}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "set"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(unique_letters)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "set"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(primes_below_ten)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{' ', 'H', 'J', 'a', 'e', 'g', 'h', 'i', 'm', 'n', 'o', 'r', 's', 't'}"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "unique_letters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will be easier to read if we turn the set of letters back into a string, with `join`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'es mgtanorHiJh'"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\".join(unique_letters)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A set has no particular order, but is really useful for checking or storing **unique** values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set operations work as in mathematics:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = set(\"Hello\")\n",
    "y = set(\"Goodbye\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'e', 'o'}"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x & y  # Intersection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'G', 'H', 'b', 'd', 'e', 'l', 'o', 'y'}"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x | y  # Union"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'G', 'b', 'd', 'y'}"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y - x  # y intersection with complement of x: letters in Goodbye but not in Hello"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your programs will be faster and more readable if you use the appropriate container type for your data's meaning.\n",
    "Always use a set for lists which can't in principle contain the same data twice, always use a dictionary for anything\n",
    "which feels like a mapping from keys to values."
   ]
  }
 ],
 "metadata": {
  "jekyll": {
   "display_name": "Dictionaries"
  },
  "kernelspec": {
   "display_name": "Python 3.8.13 ('rse-course')",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13"
  },
  "vscode": {
   "interpreter": {
    "hash": "f650e974019050aa85dfa4360d34a701a94ec69355227db1f7582fa6641cb3af"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}