{ "metadata": { "name": "", "signature": "sha256:99bd286178abac125d920fd879ba8a3bb37e5c1031cf3bf018b8048f16aef3d2" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Hands-on: Python Fundamentals -- Dicts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Objectives:**\n", "\n", "Upon completion of this lesson, you should be able to:\n", "\n", "* Describe the characteristics of the `dict` in Python\n", "\n", "* Perform basic operations with `dict`s including creation, \"querying\", updates, and traversing\n", "\n", "* Recognize the situations in which `dict`s are and should be used" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "The dictionary data structure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In Python, a dictionary (or `dict`) is a mapping between a set of\n", "indices (keys) and a set of values\n", "\n", "\n", "* The items in a dictionary are key-value pairs" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "The dictionary data structure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Keys can be of almost any Python data type, **BUT**\n", "\n", "\n", "* Because **keys** are used for indexing, they **must be hashable**, which in practice means **immutable** (numbers, strings, or tuples of immutable items)\n", "\n", "\n", "* Values can be any Python data type\n", "\n", "\n", "* Values can be mutable or immutable" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Creating a dictionary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* There are a number of ways to create and fill a dictionary. E.g. 
you can create an empty one and keep assigning new values" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Create an empty dictionary\n", "eng2sp = dict()\n", "eng2sp = {}  # equivalent\n", "print eng2sp" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "eng2sp['one'] = 'uno'\n", "print eng2sp" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "eng2sp['two'] = 'dos'\n", "print eng2sp" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Creating a dictionary the \"hardcoded\" way" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* In general, the order of items in a dictionary is unpredictable" ] }, { "cell_type": "code", "collapsed": false, "input": [ "eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}\n", "print eng2sp" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "so never rely on any order of keys in the dictionary." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "**Note**\n", "\n", "If you need to maintain the order in which elements were added to a dictionary, use `collections.OrderedDict` (see\n", "https://docs.python.org/2/library/collections.html#collections.OrderedDict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "**Exercise**\n", "\n", "Try to create a dictionary with a key being\n", "- int\n", "- float\n", "- string\n", "- list of ints\n", "- some other dict\n", "\n", "Which ones would work, and which ones would fail? What is the error message?" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Creating a dictionary from a list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create a dict from an iterable (e.g. 
list) which provides `(key, value)` pairs:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "eng2sp = dict([('one', 'uno'), ('two', 'dos'), ('three', 'tres')])\n", "print eng2sp" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "**Exercise**\n", "\n", "Quite frequently you will already have two lists, one holding your keys and the other your values. You can easily build the required list of pairs from them to produce a dictionary -- which function would you use? Do it:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "eng = ['one', 'two', 'three']\n", "sp = ['uno', 'dos', 'tres']\n", "dict(TODO)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Dictionary comprehension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar to list comprehensions, there are dict comprehensions, which allow fast, flexible, and concise dynamic creation of dicts. E.g. 
if we wanted a dictionary only for those English words with 'e' in them:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "{e: s for e, s in zip(eng, sp) if 'e' in e}" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Dictionary indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Dictionaries are indexed by keys, not by a positional index as lists are (although your keys can be integers too)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "eng2sp['three']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* If the index is not a key in the dictionary, Python raises an exception (a `KeyError`)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "eng2sp['five']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Dictionary indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* You can check for the existence of a key using the *in* operator (which also works for e.g. 
*lists*) so you can avoid that error:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "if 'five' in eng2sp:\n", "    print eng2sp['five']" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or use the **.get** method and provide an alternative default:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print eng2sp.get('five', \"sorry -- no Spanish\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "The in operator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Note that the *in* operator works a little bit differently for dictionaries than for sequences\n", "\n", "\n", "* For offset-indexed sequences (strings, lists, tuples), `x in y` checks whether `x` is an item in the sequence (for strings, whether `x` is a substring)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "6 in [4, 5, 6, 7]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* For dictionaries, `x in y` checks whether `x` is a key in the dictionary" ] }, { "cell_type": "code", "collapsed": false, "input": [ "'two' in {'one': 'uno', 'two': 'dos', 'three': 'tres'}" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "'dos' in {'one': 'uno', 'two': 'dos', 'three': 'tres'}" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Deleting items" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Let's add a new item\n", "eng2sp['five'] = 'cinco'\n", "print eng2sp" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "del eng2sp['five']\n", "print eng2sp" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, if you want to 
obtain the value and remove it from the dictionary -- use `.pop`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "eng2sp['five'] = 'cinco'\n", "five_sp = eng2sp.pop('five')\n", "print five_sp\n", "print eng2sp" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Keys and values. \"(key, value)\" is an \"item\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The `keys` method returns a list of the keys in a dictionary" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print eng2sp.keys()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The `values` method returns a list of the values" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print eng2sp.values()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The `items` method returns a list of `(key, value)` tuples, one for each key-value pair in a\n", "dictionary" ] }, { "cell_type": "code", "collapsed": false, "input": [ "print eng2sp.items()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- So if you need to check whether your dictionary contains a specific value, check among its `values()`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "'dos' in {'one': 'uno', 'two': 'dos', 'three': 'tres'}.values()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Example: histogram.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's imagine we need to count the occurrences of every unique element of a sequence, e.g. 
of a string:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def histogram(seq):\n", "    d = dict()\n", "    for element in seq:\n", "        if element not in d:\n", "            d[element] = 1\n", "        else:\n", "            d[element] += 1\n", "    return d\n", "\n", "h = histogram('brontosaurus')\n", "print h" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And let's create a helper function to print such a histogram:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def print_hist(hist):\n", "    for key in hist:\n", "        print key, hist[key]\n", "\n", "h = histogram('brontosaurus')\n", "print_hist(h)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have learned so much, let's change the `print_hist` function to use `(key, value)` pairs:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def print_hist(hist):\n", "    for key, value in hist:\n", "        print key, value\n", "\n", "h = histogram('brontosaurus')\n", "print_hist(h)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question:** What happened?\n", "\n", "**Exercises:** \n", "\n", "1. Fix `print_hist`\n", "2. Modify `histogram` to make use of `dict.get` and make the implementation simpler and more concise" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Sorting the keys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Do you remember that `dict`s do not preserve the order of items? So what if we wanted to get the histogram in sorted order?" 
] }, { "cell_type": "code", "collapsed": false, "input": [ "def print_hist(hist):\n", "    for key in sorted(hist.keys()):\n", "        print key, hist[key]\n", "\n", "h = histogram('brontosaurus')\n", "print_hist(h)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "**Mini-Exercise**\n", "\n", "Use the example code above, but adjust it to use the `.sort()` method of the list you obtain from `hist.keys()` instead of the `sorted` function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "**Exercise**\n", "\n", "Develop a function `invert_dict` which inverts the keys and values of a dictionary, i.e. creates the inverse mapping. Here is a code \"stub\" and a test case:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def invert_dict(d):\n", "    # TODO magic, try to come up with a 1 line solution\n", "    pass  # placeholder so the stub parses; replace it with your solution\n", "\n", "sp2eng = invert_dict(eng2sp)\n", "assert sp2eng == {'dos': 'two', 'tres': 'three', 'uno': 'one'}" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Keyword arguments to the function are passed as a dict" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just as a tuple can absorb all extra positional arguments to a function (`*args`), a `dict` can absorb all extra keyword arguments:\n", "\n", "* A parameter name that begins with `**` gathers all the remaining keyword arguments into a dictionary\n", "\n", "* This allows functions to take a variable number of keyword arguments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might frequently see a pattern such as" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# some function, this one is just for example\n", "def func(x, param=0):\n", "    print \"got x=%s param=%s\" % (x, param)\n", "\n", "# another function which passes most or all of its arguments\n", "# on to func\n", "def func2(a, *args, **kwargs):\n", "    print a, args, kwargs\n", 
"    func(*args, **kwargs)\n", "\n", "\n", "func2(1, 'three')\n", "func2(1, 'four', param=\"123\")" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "\n", "**Question**\n", "\n", "What are the disadvantages of such a construct?" ] } ], "metadata": {} } ] }