"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Colab setup ------------------\n",
"import os, sys, subprocess\n",
"if \"google.colab\" in sys.modules:\n",
" cmd = \"pip install --upgrade watermark\"\n",
" process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)\n",
"# ------------------------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"**Dictionaries**, which is Python's built-in mapping object. A **mapping object** allows an arbitrary collection of objects to be indexed by an arbitrary collection of values. That's a mouthful. It is easier to understand instead by comparing to a sequence.\n",
"\n",
"Let's take a sequence of two strings, say a tuple containing a first and last name.\n",
"\n",
" name = ('jeffrey', 'lebowski')\n",
"\n",
"We are restricted on how we reference the sequence. I.e., the first name is `name[0]`, and the last name is `name[1]`. A mapping object could instead be indexed like `name['first name']` and `name['last name']`. You can imagine this would be very useful! A classic example in biology might be looking up amino acids that are coded for by given codons. E.g., you might want\n",
"\n",
" aa['CTT']\n",
"\n",
"to give you `'Leucine'`.\n",
"\n",
"Python's only build-in mapping type is a **dictionary**. You might imagine that the Oxford English Dictionary might conveniently be stored as a dictionary (obviously). I.e., you would not want to store definitions that have to be referenced like\n",
"\n",
" oed[103829]\n",
" \n",
"Rather, you would like to get definitions like this:\n",
"\n",
" oed['computer']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionary syntax\n",
"\n",
"The syntax for creating a dictionary, as usual, is best seen through example."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': 6, 'b': 7, 'c': 27.6}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_dict = {'a': 6, 'b': 7, 'c': 27.6}\n",
"my_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A dictionary is created using curly braces (`{}`). Each entry has a **key**, followed by a colon, and then the value associated with the key. In the example above, the keys are all strings, which is the most common use case. Note that the items can be of any type; in the above example, they are `int`s and a `float`.\n",
"\n",
"We could also create the dictionary using the built-in `dict()` function, which can take a tuple of 2-tuples, each one containing a key-value pair."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': 6, 'b': 7, 'c': 27.6}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict((('a', 6), ('b', 7), ('c', 27.6)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can make a dictionary with keyword arguments to the `dict()` function."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': 'yes', 'b': 'no', 'c': 'maybe'}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dict(a='yes', b='no', c='maybe')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We do not need to have strings as the keys. In fact, any *immutable* object can be a key."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: 'zero',\n",
" 1.7: [1, 2, 3],\n",
" (5, 6, 'dummy string'): 3.14,\n",
" 'strings are immutable': 42}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_dict = {\n",
" 0: 'zero',\n",
" 1.7: [1, 2, 3],\n",
" (5, 6, 'dummy string'): 3.14,\n",
" 'strings are immutable': 42\n",
"}\n",
"\n",
"my_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, mutable objects cannot be used as keys."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'list'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m my_dict = {\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;34m'immutable is ok'\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'mutable'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'not'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'ok'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m }\n",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'"
]
}
],
"source": [
"my_dict = {\n",
" 'immutable is ok': 1,\n",
" ['mutable', 'not', 'ok']: 5\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A dictionary use case in bioinformatics\n",
"\n",
"It might be useful to quickly look up 3-letter amino acid codes. Dictionaries are particularly useful for this."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"aa_dict = {\n",
" \"A\": \"Ala\",\n",
" \"R\": \"Arg\",\n",
" \"N\": \"Asn\",\n",
" \"D\": \"Asp\",\n",
" \"C\": \"Cys\",\n",
" \"Q\": \"Gln\",\n",
" \"E\": \"Glu\",\n",
" \"G\": \"Gly\",\n",
" \"H\": \"His\",\n",
" \"I\": \"Ile\",\n",
" \"L\": \"Leu\",\n",
" \"K\": \"Lys\",\n",
" \"M\": \"Met\",\n",
" \"F\": \"Phe\",\n",
" \"P\": \"Pro\",\n",
" \"S\": \"Ser\",\n",
" \"T\": \"Thr\",\n",
" \"W\": \"Trp\",\n",
" \"Y\": \"Tyr\",\n",
" \"V\": \"Val\",\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another useful dictionary would contain the set of codons and the amino acids they code for. This is built in the code below using the `zip()` function we learned before. To see the logic on how this is constructed, see the codon table [here](https://en.wikipedia.org/wiki/DNA_codon_table)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'TTT': 'F',\n",
" 'TTC': 'F',\n",
" 'TTA': 'L',\n",
" 'TTG': 'L',\n",
" 'TCT': 'S',\n",
" 'TCC': 'S',\n",
" 'TCA': 'S',\n",
" 'TCG': 'S',\n",
" 'TAT': 'Y',\n",
" 'TAC': 'Y',\n",
" 'TAA': '*',\n",
" 'TAG': '*',\n",
" 'TGT': 'C',\n",
" 'TGC': 'C',\n",
" 'TGA': '*',\n",
" 'TGG': 'W',\n",
" 'CTT': 'L',\n",
" 'CTC': 'L',\n",
" 'CTA': 'L',\n",
" 'CTG': 'L',\n",
" 'CCT': 'P',\n",
" 'CCC': 'P',\n",
" 'CCA': 'P',\n",
" 'CCG': 'P',\n",
" 'CAT': 'H',\n",
" 'CAC': 'H',\n",
" 'CAA': 'Q',\n",
" 'CAG': 'Q',\n",
" 'CGT': 'R',\n",
" 'CGC': 'R',\n",
" 'CGA': 'R',\n",
" 'CGG': 'R',\n",
" 'ATT': 'I',\n",
" 'ATC': 'I',\n",
" 'ATA': 'I',\n",
" 'ATG': 'M',\n",
" 'ACT': 'T',\n",
" 'ACC': 'T',\n",
" 'ACA': 'T',\n",
" 'ACG': 'T',\n",
" 'AAT': 'N',\n",
" 'AAC': 'N',\n",
" 'AAA': 'K',\n",
" 'AAG': 'K',\n",
" 'AGT': 'S',\n",
" 'AGC': 'S',\n",
" 'AGA': 'R',\n",
" 'AGG': 'R',\n",
" 'GTT': 'V',\n",
" 'GTC': 'V',\n",
" 'GTA': 'V',\n",
" 'GTG': 'V',\n",
" 'GCT': 'A',\n",
" 'GCC': 'A',\n",
" 'GCA': 'A',\n",
" 'GCG': 'A',\n",
" 'GAT': 'D',\n",
" 'GAC': 'D',\n",
" 'GAA': 'E',\n",
" 'GAG': 'E',\n",
" 'GGT': 'G',\n",
" 'GGC': 'G',\n",
" 'GGA': 'G',\n",
" 'GGG': 'G'}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The set of DNA bases\n",
"bases = ['T', 'C', 'A', 'G']\n",
"\n",
"# Build list of codons\n",
"codon_list = []\n",
"for first_base in bases:\n",
" for second_base in bases:\n",
" for third_base in bases:\n",
" codon_list += [first_base + second_base + third_base]\n",
"\n",
"# The amino acids that are coded for (* = STOP codon)\n",
"amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'\n",
"\n",
"# Build dictionary from tuple of 2-tuples (technically an iterator, but it works)\n",
"codons = dict(zip(codon_list, amino_acids))\n",
"\n",
"# Show that we did it\n",
"codons"
]
},
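{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of how such a lookup table might be used, the sketch below translates a short DNA sequence by looking up one codon at a time. It rebuilds the `codons` dictionary so that it is self-contained. The function name `translate` and the example sequence are made up for this illustration and are not part of the lesson.\n",
"\n",
"```python\n",
"# Rebuild the codon dictionary in the same order as above\n",
"bases = ['T', 'C', 'A', 'G']\n",
"codon_list = [b1 + b2 + b3 for b1 in bases for b2 in bases for b3 in bases]\n",
"amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'\n",
"codons = dict(zip(codon_list, amino_acids))\n",
"\n",
"\n",
"def translate(seq):\n",
"    \"\"\"Translate a DNA sequence codon by codon (hypothetical helper).\"\"\"\n",
"    protein = ''\n",
"\n",
"    # Step through three bases at a time, ignoring a trailing partial codon\n",
"    for i in range(0, len(seq) - len(seq) % 3, 3):\n",
"        protein += codons[seq[i:i+3]]\n",
"\n",
"    return protein\n",
"\n",
"\n",
"translate('ATGGCTTGGTAA')\n",
"```\n",
"\n",
"For this made-up sequence, the result is `'MAW*'`, where `*` marks the stop codon."
]
},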
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A dictionary is an implementation of a hash table\n",
"\n",
"It is useful to stop and think about how a dictionary works. Let's create a dictionary and look at where the values are stored in memory."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"140482662461504\n",
"4364753408\n",
"4364753440\n",
"140482662789552\n"
]
}
],
"source": [
"# Create dictionary\n",
"my_dict = {'a': 6, 'b': 7, 'c':12.6}\n",
"\n",
"# Find where they are stored\n",
"print(id(my_dict))\n",
"print(id(my_dict['a']))\n",
"print(id(my_dict['b']))\n",
"print(id(my_dict['c']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, each entry in the dictionary is stored at a different location in memory. The dictionary itself also has its own address. So, when I index a dictionary with a key, how does the dictionary know which address in memory to use to fetch the value I am interested in?\n",
"\n",
"Dictionaries use a **hash function** to do this. A hash function converts its input to an integer. For example, we can use Python's built-in hash function to convert the keys to integers."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(-3286243246506062721, -4637256982569203119, 4044209238390845933)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash('a'), hash('b'), hash('c')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the hood, Python then converts these integers to integers that could correspond to locations in memory. A collection of elements that can be indexed this way is called a **hash table**. This is a very common data structure in computing. Wikipedia has a [pretty good discussion on them](https://en.wikipedia.org/wiki/Hash_table).\n",
"\n",
"Given what you know about how dictionaries work, why do you think mutable objects are not acceptable as keys?"
]
},
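{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea of a hash table concrete, here is a toy sketch. It is not how Python's `dict` is actually implemented; the class name `ToyHashTable`, its methods, and the choice of eight buckets are all assumptions made for illustration. The hash of the key, taken modulo the number of buckets, picks the bucket in which the key-value pair is stored.\n",
"\n",
"```python\n",
"class ToyHashTable:\n",
"    \"\"\"A minimal hash table using lists as buckets (illustration only).\"\"\"\n",
"\n",
"    def __init__(self, n_buckets=8):\n",
"        # Each bucket is a list of [key, value] pairs\n",
"        self.buckets = [[] for _ in range(n_buckets)]\n",
"\n",
"    def _bucket(self, key):\n",
"        # Reduce the hash of the key to a valid bucket index\n",
"        return self.buckets[hash(key) % len(self.buckets)]\n",
"\n",
"    def set(self, key, value):\n",
"        bucket = self._bucket(key)\n",
"        for pair in bucket:\n",
"            if pair[0] == key:\n",
"                # Key already present: overwrite its value\n",
"                pair[1] = value\n",
"                return\n",
"        bucket.append([key, value])\n",
"\n",
"    def get(self, key):\n",
"        for k, v in self._bucket(key):\n",
"            if k == key:\n",
"                return v\n",
"        raise KeyError(key)\n",
"\n",
"\n",
"table = ToyHashTable()\n",
"table.set('a', 6)\n",
"table.set('b', 7)\n",
"table.set('a', 12.6)\n",
"\n",
"table.get('a'), table.get('b')\n",
"```\n",
"\n",
"Looking up a key only requires hashing it and searching a single bucket, so on average a lookup does not slow down in proportion to the number of entries."
]
},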
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionaries are mutable\n",
"\n",
"Dictionaries are mutable. This means that they can be changed in place. For example, if we want to add an element to a dictionary, we use simple syntax."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'a': 6, 'b': 7, 'c': 12.6}\n",
"{'a': 6, 'b': 7, 'c': 12.6, 'd': 'Data analysis is so much fun!'}\n",
"{'a': 'I was not satisfied with entry a.', 'b': 7, 'c': 12.6, 'd': 'Data analysis is so much fun!'}\n"
]
}
],
"source": [
"# Remind ourselves what the dictionary is\n",
"print(my_dict)\n",
"\n",
"# Add an entry\n",
"my_dict['d'] = 'Data analysis is so much fun!'\n",
"\n",
"# Look at dictionary again\n",
"print(my_dict)\n",
"\n",
"# Change an entry\n",
"my_dict['a'] = 'I was not satisfied with entry a.'\n",
"\n",
"# Look at it again\n",
"print(my_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Membership operators with dictionaries\n",
"\n",
"The `in` and `not in` operators work with dictionaries, but both only query keys and _not_ values. We see this again by example."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(True, False, True)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Make a fresh my_dict\n",
"my_dict = {'a': 1, 'b': 2, 'c': 3}\n",
"\n",
"# in works with keys\n",
"'b' in my_dict, 'd' in my_dict, 'e' not in my_dict"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Try it with values\n",
"2 in my_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yup! We get `False`. Why? Because `2` is not a *key* in `my_dict`. We can also iterate over the keys in a dictionary using the `in` keyword."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a : 1\n",
"b : 2\n",
"c : 3\n"
]
}
],
"source": [
"for key in my_dict:\n",
" print(key, ':', my_dict[key])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The best, and preferred, method, is to iterate over `key`,`value` pairs in a dictionary using the `items()` method of a dictionary."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a 1\n",
"b 2\n",
"c 3\n"
]
}
],
"source": [
"for key, value in my_dict.items():\n",
" print(key, value)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note, however, that like lists, the `item`s that come out of the `my_dict.items()` iterator are *not* items in the dictionary, but copies of them. If you make changes within the `for` loop, you will not change entries in the dictionary."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': 1, 'b': 2, 'c': 3}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for key, value in my_dict.items():\n",
" value = 'this string will not be in dictionary.'\n",
" \n",
"my_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You will, though, if you use the keys."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': 'this will be in the dictionary.',\n",
" 'b': 'this will be in the dictionary.',\n",
" 'c': 'this will be in the dictionary.'}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for key, _ in my_dict.items():\n",
" my_dict[key] = 'this will be in the dictionary.'\n",
" \n",
"my_dict"
]
},
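{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between rebinding a name and changing an object can be seen in a small sketch. The dictionary `counts` and its contents are invented for this example. Assigning to `value` in the loop leaves the dictionary alone, but a method that changes a mutable value in place does affect the dictionary, because `value` refers to the very same list that the dictionary holds.\n",
"\n",
"```python\n",
"# Hypothetical dictionary whose values are (mutable) lists\n",
"counts = {'a': [1], 'b': [2]}\n",
"\n",
"for key, value in counts.items():\n",
"    # Changes the list stored in the dictionary in place\n",
"    value.append(0)\n",
"\n",
"    # Only rebinds the name `value`; the dictionary is unaffected\n",
"    value = 'this string will not be in the dictionary'\n",
"\n",
"counts\n",
"```\n",
"\n",
"After the loop, `counts` is `{'a': [1, 0], 'b': [2, 0]}`: the appended zeros are there, but the string is not."
]
},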
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Built-in functions for dictionaries\n",
"\n",
"The built-in `len()` function and `del` operation work on dictionaries. \n",
"\n",
"* `len(d)` gives the number of entries in dictionary `d`\n",
"* `del d[k]` deletes entry with key `k` from dictionary `d`\n",
"\n",
"This is the first time we've encountered the `del` keyword. This keyword is used to delete variables and their values from memory. The `del` keyword can also be to delete items from lists. Let's see things in practice."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"my_dict: {'a': 1, 'b': 2, 'c': 3, 'd': 4}\n",
"my_list: [1, 2, 3, 4]\n",
"length of my_dict: 4\n",
"length of my_list: 4\n",
"post-deleted my_dict: {'a': 1, 'c': 3, 'd': 4}\n",
"post-deleted my_list: [1, 3, 4]\n"
]
}
],
"source": [
"# Create my_list and my_dict for reference\n",
"my_dict = dict(a=1, b=2, c=3, d=4)\n",
"my_list = [1, 2, 3, 4]\n",
"\n",
"# Print them\n",
"print('my_dict:', my_dict)\n",
"print('my_list:', my_list)\n",
"\n",
"# Get lengths\n",
"print('length of my_dict:', len(my_dict))\n",
"print('length of my_list:', len(my_list))\n",
"\n",
"# Delete a key from my_dict\n",
"del my_dict['b']\n",
"\n",
"# Delete entry from my_list\n",
"del my_list[1]\n",
"\n",
"# Show post-deleted objects\n",
"print('post-deleted my_dict:', my_dict)\n",
"print('post-deleted my_list:', my_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note, though, that you cannot delete an item from a tuple, since it's immutable."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "'tuple' object doesn't support item deletion",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mmy_tuple\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mdel\u001b[0m \u001b[0mmy_tuple\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m: 'tuple' object doesn't support item deletion"
]
}
],
"source": [
"my_tuple = (1, 2, 3, 4)\n",
"del my_tuple[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionary methods\n",
"\n",
"Dictionaries have several built-in methods in addition to the `items()` you have already seen. Following are a few of them, assuming the dictionary is `d`.\n",
"\n",
"| method | effect |\n",
"|:-------|:-------|\n",
"|`d.keys()`|return keys|\n",
"|`d.pop(key)` | return value associated with `key` and delete `key` from `d`|\n",
"|`d.values()` | return the values in `d`|\n",
"\n",
"Let's try these out."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_keys(['a', 'b', 'c', 'd'])"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_dict = dict(a=1, b=2, c=3, d=4)\n",
"\n",
"my_dict.keys()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this is a `dict_keys` object. We cannot index it. If, say, we wanted to sort the keys and have them index-able, we would have to convert them to a list."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['a', 'b', 'c', 'd']"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sorted(list(my_dict.keys()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is not a usual use case, though, and be warned that doing then when this is not explicitly what you want can lead to bugs. Now let's try popping an entry out of the dictionary."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_dict.pop('c')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'a': 1, 'b': 2, 'd': 4}"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...and, as we expect, key `'c'` is now deleted, and its value was returned in the call to `my_dict.pop('c')`. Finally, we can look at the values."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dict_values([1, 2, 4])"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_dict.values()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We get a `dict_values` object, similar to the `dict_keys` object we got with the `my_dict.keys()` method.\n",
"\n",
"You can get more information about build-in methods from the [Python documentation](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionary comprehensions\n",
"\n",
"Like lists could be created using list comprehensions, so too can dictionaries be created using dictionary comprehensions. Recall from when we made a dictionary of codon/amino acid codes before that we build them using"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"codons = dict(zip(codon_list, amino_acids))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder, here are the first five entries of `codon_list` and `amino_acids`."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"codon list: ['TTT', 'TTC', 'TTA', 'TTG', 'TCT']\n",
"amino_acids: FFLLS\n"
]
}
],
"source": [
"print('codon list:', codon_list[:5])\n",
"print('amino_acids:', amino_acids[:5])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We could create the same dictionary `codons` with a dictionary comprehension as follows."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
"codons = {key: value for key, value in zip(codon_list, amino_acids)}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A list comprehension works analogously to a list comprehension. The only differences are that the dictionary comprehension is enclosed in curly braces and each entry is given by a key and value *pair* (as opposed to a single entry for a list comprehension) using the syntax `key: value`. (The names of the variables can be anything, e.g., `k: v` is legal.) Of course, the iterators in the dictionary comprehension also need to emit both keys and values as well."
]
},
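{
"cell_type": "markdown",
"metadata": {},
"source": [
"As another small sketch, a dictionary comprehension can also build a new dictionary from an existing one. The example below inverts a few entries of a one-letter to three-letter amino acid lookup, and then uses an `if` clause, which works as it does in a list comprehension, to keep only some entries. The variable names `three_to_one` and `starts_with_a` are chosen just for this example.\n",
"\n",
"```python\n",
"# A few entries of a one-letter to three-letter code lookup\n",
"aa_dict = {'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp', 'C': 'Cys'}\n",
"\n",
"# Swap keys and values to look up one-letter codes from three-letter codes\n",
"three_to_one = {three: one for one, three in aa_dict.items()}\n",
"\n",
"# Keep only the entries whose three-letter code starts with 'A'\n",
"starts_with_a = {k: v for k, v in aa_dict.items() if v.startswith('A')}\n",
"\n",
"three_to_one['Asn'], starts_with_a\n",
"```\n",
"\n",
"Here, `three_to_one['Asn']` is `'N'`, and `starts_with_a` contains every entry except the one for cysteine."
]
},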
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using dictionaries as kwargs\n",
"\n",
"A nifty feature of dictionaries is that they can be passed into functions as keyword arguments. We covered named keyword arguments in the lesson on functions. In addition to the named keyword arguments, a function can take in arbitrary keyword arguments (**not** arbitrary non-keyword arguments). This is specified in the function definition by including a last argument with a double-asterisk, `**`. The kwargs with the double-asterisk get passed in as a dictionary."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"def concatenate_sequences(a, b, **kwargs):\n",
" \"\"\"Concatenate (combine) 2 or more sequences.\"\"\"\n",
" seq = a + b\n",
"\n",
" for key in kwargs:\n",
" seq += kwargs[key]\n",
" \n",
" return seq"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try it!"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'TGACACCAGGGAGGGGGGGGGAAAATTTTT'"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"concatenate_sequences('TGACAC', 'CAGGGA', c='GGGGGGGGG', d='AAAATTTTT')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, imagine we have a dictionary that contains our values."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"my_dict = {\n",
" 'a': 'TGACAC', \n",
" 'b': 'CAGGGA', \n",
" 'c': 'GGGGGGGGG', \n",
" 'd': 'AAAATTTTT'\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now pass this directly into the function by preceding it with a double asterisk."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'TGACACCAGGGAGGGGGGGGGAAAATTTTT'"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"concatenate_sequences(**my_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beautiful! This example is kind of trivial, but you can imagine that it can come in handy, e.g. with large sets of sequence fragments that you read in from a file. Many packages, particularly plotting packages, use `**kwargs` extensively."
]
},
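{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common pattern along these lines, sketched here with made-up names, is a function that keeps default settings in a dictionary and lets the caller override any of them with keyword arguments. The function `describe_plot` and its settings are hypothetical and do not belong to any real plotting package.\n",
"\n",
"```python\n",
"def describe_plot(title, **kwargs):\n",
"    \"\"\"Return the settings a plot would use (illustration only).\"\"\"\n",
"    # Default settings, stored in a dictionary\n",
"    settings = {'color': 'black', 'line_width': 1}\n",
"\n",
"    # Entries in kwargs overwrite defaults that have the same key\n",
"    settings.update(kwargs)\n",
"\n",
"    return title, settings\n",
"\n",
"\n",
"# Settings can be collected in a dictionary and passed with a double asterisk\n",
"my_settings = {'color': 'orange', 'alpha': 0.5}\n",
"\n",
"describe_plot('growth curve', **my_settings)\n",
"```\n",
"\n",
"The returned settings are `{'color': 'orange', 'line_width': 1, 'alpha': 0.5}`: the default color was overridden, the default line width was kept, and the extra `alpha` entry was added."
]
},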
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing environment"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPython 3.8.3\n",
"IPython 7.16.1\n",
"\n",
"jupyterlab 2.1.5\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -v -p jupyterlab"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}