{
 "metadata": {
  "name": "",
  "signature": "sha256:99bd286178abac125d920fd879ba8a3bb37e5c1031cf3bf018b8048f16aef3d2"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Hands-on: Python Fundamentals -- Dicts"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**Objectives:**\n",
      "\n",
      "Upon completion of this lesson, you should be able to:\n",
      "\n",
      "* Describe the characteristics of the `dict` in Python\n",
      "\n",
      "* Perform basic operations with `dict`s including creation, \"querying\", updating, and traversal\n",
      "\n",
      "* Recognize the situations in which `dict`s are, and should be, used"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "The dictionary data structure"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* In Python, a dictionary (or `dict`) is a mapping between a set of indices (keys) and a set of values\n",
      "\n",
      "* The items in a dictionary are key-value pairs"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "The dictionary data structure"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Keys can be of many Python data types, **BUT**\n",
      "\n",
      "\n",
      "* Because **keys** are used for indexing, they **must be hashable**, which in practice means immutable (e.g. numbers, strings, or tuples of immutable items)\n",
      "\n",
      "\n",
      "* Values can be any Python data type\n",
      "\n",
      "\n",
      "* Values can be mutable or immutable"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Creating a dictionary"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* There are a number of ways to create and fill a dictionary.  E.g. you can create an empty one and keep assigning new values"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Create an empty dictionary\n",
      "eng2sp = dict()\n",
      "eng2sp = {}     # equivalent\n",
      "print eng2sp"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "eng2sp['one'] = 'uno'\n",
      "print eng2sp"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "eng2sp['two'] = 'dos'\n",
      "print eng2sp"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Creating a dictionary the \"hardcoded\" way"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* In general, the order of items in a dictionary is unpredictable (this is the case in Python 2, which this lesson uses; since Python 3.7, dictionaries preserve insertion order)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}\n",
      "print eng2sp"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "so never rely on any order of keys in the dictionary."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- - -\n",
      "**Note**\n",
      "\n",
      "If you need to maintain the order in which elements were added to a dictionary, use `collections.OrderedDict`. See:\n",
      "\n",
      "https://docs.python.org/2/library/collections.html#collections.OrderedDict"
     ]
    },
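    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A minimal sketch of the note above, assuming only the standard-library `collections.OrderedDict`. The variable names `ordered` and `ordered_keys` are made up for illustration, and the snippet is written so that it runs under both Python 2 and Python 3:\n",
      "\n",
      "```python\n",
      "from collections import OrderedDict\n",
      "\n",
      "# Hypothetical example: build the mapping one item at a time\n",
      "ordered = OrderedDict()\n",
      "ordered['one'] = 'uno'\n",
      "ordered['two'] = 'dos'\n",
      "ordered['three'] = 'tres'\n",
      "\n",
      "# Keys are reported in the order in which they were inserted\n",
      "ordered_keys = list(ordered.keys())\n",
      "print(ordered_keys)\n",
      "```\n",
      "\n",
      "Apart from remembering insertion order, an `OrderedDict` supports the same indexing, `in`, and `.get` operations as a regular `dict`."
     ]
    },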
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- - -\n",
      "**Exercise**\n",
      "\n",
      "Try to create a dictionary with a key being\n",
      "- int\n",
      "- float\n",
      "- string\n",
      "- list of ints\n",
      "- some other dict\n",
      "\n",
      "Which ones would work, and which ones would fail?  What is the error message?"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Creating a dictionary from a list"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can create a dict from an iterable (e.g. list) which provides `(key, value)` pairs:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "eng2sp = dict([('one', 'uno'), ('two', 'dos'), ('three', 'tres')])\n",
      "print eng2sp"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- - -\n",
      "**Exercise**\n",
      "\n",
      "Quite frequently you already have two lists, one containing your keys and the other your values.  You can easily build the necessary list of pairs from them to produce a dictionary -- which function will you use?  Do it:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "eng = ['one', 'two', 'three']\n",
      "sp = ['uno', 'dos', 'tres']\n",
      "dict(TODO)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Dictionary comprehension"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Similarly to list comprehensions, there are dict comprehensions, which allow for fast, flexible, and concise dynamic creation of dictionaries.  E.g. if we wanted a dictionary only for those English words with 'e' in them:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "{e:s for e,s in zip(eng, sp) if 'e' in e}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Dictionary indexing"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Dictionaries are indexed by keys, not by a positional index as lists are (but your keys could be integers as well)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "eng2sp['three']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* If the index is not a key in the dictionary, Python raises a `KeyError` exception"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "eng2sp['five']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Dictionary indexing"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* You can check for the existence of a key using the `in` operator (which also works on e.g. lists), so you can avoid that error:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "if 'five' in eng2sp:\n",
      "    print eng2sp['five']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "or use the `.get` method and provide an alternative default:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print eng2sp.get('five', \"sorry -- no spanish\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
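    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A third option, sketched here as an illustration rather than a recommendation, is to attempt the lookup and catch the `KeyError`. The names `lookup` and `translation` are made up for this example; `lookup` is only a stand-in for the lesson's `eng2sp`:\n",
      "\n",
      "```python\n",
      "# Hypothetical stand-in for the lesson's eng2sp dictionary\n",
      "lookup = {'one': 'uno', 'two': 'dos', 'three': 'tres'}\n",
      "\n",
      "try:\n",
      "    translation = lookup['five']\n",
      "except KeyError:\n",
      "    # Reached because 'five' is not a key of lookup\n",
      "    translation = 'sorry -- no spanish'\n",
      "\n",
      "print(translation)\n",
      "```\n",
      "\n",
      "For a simple fallback value `.get` is shorter; catching the exception can be handy when a missing key needs more elaborate handling."
     ]
    },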
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "The in operator"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Note that the `in` operator works a little bit differently for dictionaries than for sequences\n",
      "\n",
      "\n",
      "* For offset-indexed sequences (strings, lists, tuples), `x in y` checks whether `x` is an item in the sequence"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "6 in [4, 5, 6, 7]  # evaluates to True"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* For dictionaries, `x in y` checks whether `x` is a key in the dictionary (values are not considered)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "'two' in {'one':'uno', 'two':'dos', 'three':'tres'}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "'dos' in {'one':'uno', 'two':'dos', 'three':'tres'}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Deleting items"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Let's add some new value\n",
      "eng2sp['five'] = 'cinco'\n",
      "print eng2sp"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "del eng2sp['five']\n",
      "print eng2sp"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Or, if you want to obtain the value and remove it from the dictionary in one step, use `.pop`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "eng2sp['five'] = 'cinco'\n",
      "five_sp = eng2sp.pop('five')\n",
      "print five_sp\n",
      "print eng2sp"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
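    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Like `.get`, `.pop` accepts an optional second argument, which is returned when the key is missing instead of raising a `KeyError`. A small sketch; the dictionary `numbers` and the variable names are made up for illustration:\n",
      "\n",
      "```python\n",
      "# Hypothetical example dictionary\n",
      "numbers = {'one': 'uno', 'two': 'dos'}\n",
      "\n",
      "# Key present: the value is returned and the item is removed\n",
      "removed = numbers.pop('two', None)\n",
      "\n",
      "# Key absent: the default (None here) is returned, nothing is removed\n",
      "missing = numbers.pop('five', None)\n",
      "\n",
      "print(removed)\n",
      "print(missing)\n",
      "print(numbers)\n",
      "```"
     ]
    },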
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Keys and values: a \"(key, value)\" pair is an \"item\""
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* The `keys` method returns a list of the keys in a dictionary"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print eng2sp.keys()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* The `values` method returns a list of the values"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print eng2sp.values()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* The `items` method returns a list of `(key, value)` tuples, one for each key-value pair in the dictionary"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print eng2sp.items()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- So if you need to check whether your dictionary contains a specific value, check among its `values()`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "'dos' in {'one':'uno', 'two':'dos', 'three':'tres'}.values()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Example: histogram.py"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's imagine we need to count the occurrences of every unique element of a sequence, e.g. of every character of a string:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def histogram(seq):\n",
      "    d = dict()\n",
      "    for element in seq:\n",
      "        if element not in d:\n",
      "            d[element] = 1\n",
      "        else:\n",
      "            d[element] += 1\n",
      "    return d\n",
      "\n",
      "h = histogram('brontosaurus')\n",
      "print h"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And let's create a helper function to print such a histogram:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def print_hist(hist):\n",
      "    for key in hist:\n",
      "        print key, hist[key]\n",
      "\n",
      "h = histogram('brontosaurus')\n",
      "print_hist(h)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now that we have learned so much, let's change the `print_hist` function to use `(key, value)` pairs:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def print_hist(hist):\n",
      "    for key, value in hist:\n",
      "        print key, value\n",
      "\n",
      "h = histogram('brontosaurus')\n",
      "print_hist(h)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**Question:** What happened?\n",
      "\n",
      "**Exercises:**\n",
      "\n",
      "1. Fix `print_hist`\n",
      "2. Modify `histogram` to make use of `dict.get`, making the implementation simpler and more concise"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Sorting the keys"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Do you remember that `dict`s do not preserve the order of items?  So what if we wanted to print the histogram sorted by key?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def print_hist(hist):\n",
      "    for key in sorted(hist.keys()):\n",
      "        print key, hist[key]\n",
      "\n",
      "h = histogram('brontosaurus')\n",
      "print_hist(h)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- - -\n",
      "**Mini-exercise**\n",
      "\n",
      "Take the example code above, but adjust it to use the `.sort()` method of the list you obtain from `hist.keys()` instead of the `sorted` function"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- - -\n",
      "**Exercise**\n",
      "\n",
      "Develop a function `invert_dict` which, given a dictionary, swaps its keys and values, i.e. creates the inverse mapping.  Here is a code \"stub\" and a test case:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def invert_dict(d):\n",
      "    # TODO magic, try to come up with a 1-line solution\n",
      "    pass  # placeholder so the stub parses; replace it with your solution\n",
      "\n",
      "sp2eng = invert_dict(eng2sp)\n",
      "assert sp2eng == {'dos': 'two', 'tres': 'three', 'uno': 'one'}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Keyword arguments to a function are passed as a dict"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Just as a tuple can absorb all extra positional arguments to a function (`*args`), a `dict` can absorb all extra keyword arguments (`**kwargs`):\n",
      "\n",
      "* A parameter name that begins with `**` gathers all the remaining keyword arguments into a dictionary\n",
      "\n",
      "* This allows functions to take a variable number of keyword arguments"
     ]
    },
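    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To make the bullets above concrete, here is a minimal sketch. The function name `show_kwargs` and the keyword arguments passed to it are made up for illustration; the function simply returns what it gathered so you can check that it really is a plain `dict`:\n",
      "\n",
      "```python\n",
      "def show_kwargs(**kwargs):\n",
      "    # kwargs holds every keyword argument passed by the caller\n",
      "    return kwargs\n",
      "\n",
      "collected = show_kwargs(lang='es', count=3)\n",
      "print(type(collected) is dict)\n",
      "print(sorted(collected.items()))\n",
      "```\n",
      "\n",
      "Calling `show_kwargs()` with no arguments yields an empty dictionary."
     ]
    },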
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You might frequently see a pattern such as"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# some function, this one is just for example\n",
      "def func(x, param=0):\n",
      "    print \"got x=%s param=%s\" % (x, param)\n",
      "\n",
      "# another function which passes most or all of its arguments\n",
      "# on to func\n",
      "def func2(a, *args, **kwargs):\n",
      "    print a, args, kwargs\n",
      "    func(*args, **kwargs)\n",
      "\n",
      "\n",
      "func2(1, 'three')\n",
      "func2(1, 'four', param=\"123\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- - -\n",
      "\n",
      "**Question**\n",
      "\n",
      "What are the disadvantage(s) of such a construct?"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}