{
 "metadata": {
  "name": "Day_17_Midterm_with_Key"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Working with Open Data Midterm (March 19, 2013)\n",
      "\n",
      "*There are **84** points in this exam, but the test will be scored out of a base total of **60 points**.*\n",
      "\n",
      "Name: ______________________________________\n",
      "\n",
      "Date: ______________________________________\n",
      "\n",
      "\n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "1. Open data and the census (Total: 7)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**1a.**  <span style=\"font-weight:bold; color:red\">[7]</span>  What is **open data**?  Use the US Census data set, specifically the Census Quickfacts (<http://quickfacts.census.gov/qfd/download_data.html>) that we've been studying in this course, to illustrate your definition of open data.  "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* A piece of content or data is open if anyone is free to use, reuse, and redistribute it \u2014 subject only, at most, to the requirement to attribute and/or share-alike. (3)\n",
      "\n",
      "* US Census data is free of copyright as a work of the US federal government and is free of charge. (2)\n",
      "\n",
      "* some illustration of how census data can be used (2)\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "2. CourtListener (Total: 7)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**2a.** <span style=\"font-weight:bold; color:red\">[7]</span>  What problems is http://www.courtlistener.com/ trying to solve?   Why does CourtListener involve web scraping? "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* 4 for a description of CourtListener: what it aggregates, that it is an alert service, that it offers bulk downloads, etc.\n",
      "* 3 for what scraping is, the lack of standards around how court data can be presented in structured form or around APIs for accessing court cases, or how alerts are done."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "3. Project Questions (Total: 14)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**3a.** <span style=\"font-weight:bold; color:red\">[2]</span>  In a sentence or two, describe what you are aiming to accomplish in your project."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**3b.** <span style=\"font-weight:bold; color:red\">[8]</span>  What open data set are you using in your project?  How is that data open?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**3c.** <span style=\"font-weight:bold; color:red\">[4]</span> What is an immediate next step in your project?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "4. Verifying population totals in Census (Total: 20)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Consider the following code to calculate the population of the US, state-like entities, and county-like entities.\n",
      "\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "from pandas import Series, DataFrame\n",
      "\n",
      "from itertools import islice\n",
      "\n",
      "import datetime\n",
      "\n",
      "import codecs\n",
      "import re\n",
      "import csv\n",
      "\n",
      "import os\n",
      "\n",
      "if os.getcwd() == '/home/picloud/notebook':\n",
      "    ON_PICLOUD = True\n",
      "    DATA_DIR = '/home/picloud/working-open-data/data'\n",
      "    PYDATA_DIR = '/home/picloud/pydata-book/'\n",
      "else:\n",
      "    ON_PICLOUD = False\n",
      "    DATA_DIR = os.path.join(os.pardir, \"data\")\n",
      "    PYDATA_DIR = os.path.join(os.pardir, \"pydata-book\")\n",
      "    \n",
      "dataset_fname = os.path.join (DATA_DIR, \"census/DataSet.txt\")\n",
      "datadict_fname = os.path.join (DATA_DIR, \"census/DataDict.txt\")\n",
      "fips_fname = os.path.join (DATA_DIR, \"census/FIPS_CountyName.txt\")\n",
      "\n",
      "assert os.path.exists(DATA_DIR)\n",
      "assert os.path.exists(PYDATA_DIR)\n",
      "assert os.path.exists(dataset_fname)\n",
      "assert os.path.exists(datadict_fname)\n",
      "assert os.path.exists(fips_fname)\n",
      "\n",
      "\n",
      "# read in fips code\n",
      "fips_file = codecs.open(fips_fname, encoding='iso-8859-1')\n",
      "\n",
      "fips = dict()\n",
      "for row in islice(fips_file, None):\n",
      "    fips[row[:5]] = row[6:-1]\n",
      "    \n",
      "\n",
      "# read in data set\n",
      "ds_file = codecs.open(dataset_fname, encoding='iso-8859-1')\n",
      "reader = csv.DictReader(ds_file)\n",
      "dataset = dict([(row[\"fips\"], row) for row in islice(reader, None)])\n",
      "    \n",
      "states_fips = sorted([k for k in fips.keys() if k[-3:] == '000' and k != '00000'])\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "And look at the following outputs to remind you of the content of the data set:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!head $fips_fname"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "00000 UNITED STATES\r\n",
        "01000 ALABAMA\r\n",
        "01001 Autauga County, AL\r\n",
        "01003 Baldwin County, AL\r\n",
        "01005 Barbour County, AL\r\n",
        "01007 Bibb County, AL\r\n",
        "01009 Blount County, AL\r\n",
        "01011 Bullock County, AL\r\n",
        "01013 Butler County, AL\r\n",
        "01015 Calhoun County, AL\r\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dataset[\"00000\"][\"POP010210\"]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "'308745538'"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for f in states_fips[:5]:\n",
      "    print f, fips[f]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "01000 ALABAMA\n",
        "02000 ALASKA\n",
        "04000 ARIZONA\n",
        "05000 ARKANSAS\n",
        "06000 CALIFORNIA\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "list(islice(sorted(dataset.keys()),5))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "['00000', '01000', '01001', '01003', '01005']"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "fips['06000']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "u'CALIFORNIA'"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Questions for Section 4"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# number of counties in CA\n",
      "\n",
      "from collections import Counter\n",
      "print Counter([k[0:2] for k in dataset.keys() if k[2:5] != '000'])['06']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "58\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**4a.** <span style=\"font-weight:bold; color:red\">[6]</span>  Explain how the following code (which produces `58`) shows that the number of counties in California is 58:\n",
      "\n",
      "    from collections import Counter\n",
      "    print Counter([k[0:2] for k in dataset.keys() if k[2:5] != '000'])['06']\n",
      "\n",
      "Include in your explanation:\n",
      "\n",
      "* what `Counter` does\n",
      "* the significance of `k[0:2]`\n",
      "* why do we have `k[2:5] != '000'`\n",
      "* why `['06']` is part of the code"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* list comprehension of state prefixes by filtering on counties\n",
      "* Counter then tallies those state prefixes\n",
      "* state prefix from 1st 2 characters of fips code\n",
      "* counties don't end in '000'\n",
      "* California prefix is 06\n",
      "* what Counter does: Counter takes an iterable and creates a count for each distinct value in the iterable -- these values become the keys"
     ]
    },
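    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A minimal sketch of the same pattern on a small, made-up list of FIPS codes may make the role of each piece easier to see. The `sample_keys` list below is hypothetical and is not the real `dataset`; only the logic is the same."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from collections import Counter\n",
      "\n",
      "# hypothetical FIPS codes: the US, two state-like entities, and a few counties\n",
      "sample_keys = ['00000', '01000', '01001', '01003', '06000', '06001', '06003', '06005']\n",
      "\n",
      "# keep county-like codes only (last three characters are not '000'),\n",
      "# then reduce each one to its two-character state prefix\n",
      "prefixes = [k[0:2] for k in sample_keys if k[2:5] != '000']\n",
      "\n",
      "# Counter tallies how often each prefix occurs; the prefixes become its keys\n",
      "counts = Counter(prefixes)\n",
      "\n",
      "# look up the tally for the California prefix\n",
      "print(counts['06'])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },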
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**4b.** <span style=\"font-weight:bold; color:red\">[4]</span>  Explain what the following piece of code calculates (and how), and what the answer is:\n",
      "\n",
      "    len([k for k in dataset.keys() if k[2:5] == '000' and k != '00000'])"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* 2 for answer\n",
      "* 1 for explaining \"grammar\" of the statement -- e.g., `len` gives number of elements\n",
      "* 1 for semantics -- e.g., 50 states + DC = 51; exclude USA ('00000'); include state-like entities (ending in '000')"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**4c.** <span style=\"font-weight:bold; color:red\">[5]</span> What is the answer to the following?\n",
      "\n",
      "    sum([int(dataset[k][\"POP010210\"]) for k in dataset.keys() if k[2:5] == '000' and k != '00000']) \n",
      "\n",
      "\n",
      "Explain how you came up with the answer."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* quote the exact pop of USA\n",
      "* get states by looking for '000' ending but exclude US ('00000')\n",
      "* dataset[k]['POP010210'] holds census pop for fips code k \n",
      "* need int() coercion\n",
      "* sum -- to add up all in list"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**4d.** <span style=\"font-weight:bold; color:red\">[5]</span> What is the answer to the following?\n",
      "\n",
      "    sum([int(dataset[k][\"POP010210\"]) for k in dataset.keys() if k[2:5] != '000'])\n",
      "\n",
      "Explain how you came up with the answer."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* quote the exact pop of USA\n",
      "* get counties by looking for codes that do not end in '000' (this also excludes the US, '00000')\n",
      "* dataset[k]['POP010210'] holds census pop for fips code k \n",
      "* need int() coercion\n",
      "* sum -- to add up all in list\n",
      "* ok to say that the logic of 4d is the same as 4c except that it selects counties -- no need to write it all out again"
     ]
    },
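    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The reasoning behind 4b, 4c, and 4d can be checked on a toy data set. The sketch below assumes a hypothetical `toy` dictionary (not the real `dataset`) whose state-like rows and county-like rows were each constructed to add up to the national row, which is the property the exam questions rely on."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# hypothetical miniature data set with the same shape as `dataset`\n",
      "toy = {\n",
      "    '00000': {'POP010210': '100'},  # national row\n",
      "    '01000': {'POP010210': '60'},   # state-like rows\n",
      "    '02000': {'POP010210': '40'},\n",
      "    '01001': {'POP010210': '25'},   # county-like rows\n",
      "    '01003': {'POP010210': '35'},\n",
      "    '02013': {'POP010210': '40'},\n",
      "}\n",
      "\n",
      "# 4b pattern: state-like entities end in '000', excluding the national row\n",
      "state_keys = [k for k in toy.keys() if k[2:5] == '000' and k != '00000']\n",
      "\n",
      "# 4c pattern: the values are strings, so int() is needed before summing\n",
      "state_total = sum([int(toy[k]['POP010210']) for k in state_keys])\n",
      "\n",
      "# 4d pattern: county-like entities do not end in '000'\n",
      "county_total = sum([int(toy[k]['POP010210']) for k in toy.keys() if k[2:5] != '000'])\n",
      "\n",
      "national_total = int(toy['00000']['POP010210'])\n",
      "\n",
      "print(len(state_keys))\n",
      "print(state_total == county_total == national_total)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },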
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "5. Slice notation (Total: 7)\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Consider the following code using slice notation:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import string\n",
      "alphabet = string.lowercase\n",
      "\n",
      "alphabet\n",
      "\n",
      "print \"alphabet:\", alphabet\n",
      "print \"alphabet[0]:\", alphabet[0]\n",
      "print \"alphabet[-1], alphabet[0:5], alphabet[-2:]:\", alphabet[-1], alphabet[0:5], alphabet[-2:]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "alphabet: abcdefghijklmnopqrstuvwxyz\n",
        "alphabet[0]: a\n",
        "alphabet[-1], alphabet[0:5], alphabet[-2:]: z abcde yz\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Calculate the following:"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**5a.** <span style=\"font-weight:bold; color:red\">[1]</span>  `alphabet[5]`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "alphabet[5]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 9,
       "text": [
        "'f'"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**5b.** <span style=\"font-weight:bold; color:red\">[1]</span>  `alphabet[0:3]`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "alphabet[0:3]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 10,
       "text": [
        "'abc'"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**5c.** <span style=\"font-weight:bold; color:red\">[2]</span>  `alphabet[1:4:2]`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "alphabet[1:4:2]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "'bd'"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**5d.** <span style=\"font-weight:bold; color:red\">[1]</span> `alphabet[-6:]`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "alphabet[-6:]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "'uvwxyz'"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**5e.** <span style=\"font-weight:bold; color:red\">[2]</span>  `alphabet[-1:-3:-1]`"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "alphabet[-1:-3:-1]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 13,
       "text": [
        "'zy'"
       ]
      }
     ],
     "prompt_number": 13
    },
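    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One way to check an extended slice such as `alphabet[-1:-3:-1]` by hand is to expand it into the indices it visits. The sketch below is an illustration only; it uses `string.ascii_lowercase` in place of `string.lowercase` (an assumption made so the code runs on both Python 2 and Python 3) and the built-in `slice.indices` method."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import string\n",
      "\n",
      "letters = string.ascii_lowercase  # the same 26 letters as `alphabet` above\n",
      "\n",
      "# slice.indices(length) converts start:stop:step into explicit values for that length\n",
      "start, stop, step = slice(-1, -3, -1).indices(len(letters))\n",
      "\n",
      "# list the positions the slice visits: begin at start, move by step, stop before stop\n",
      "visited = list(range(start, stop, step))\n",
      "\n",
      "# pick out those positions one at a time and join them back into a string\n",
      "by_hand = ''.join([letters[i] for i in visited])\n",
      "\n",
      "print(visited)\n",
      "print(by_hand == letters[-1:-3:-1])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },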
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "6. ndarray (Total: 3)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from numpy import array\n",
      "a = array([0,1,2,3])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "a + 5"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "array([5, 6, 7, 8])"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**6a.** <span style=\"font-weight:bold; color:red\">[3]</span> Given that \n",
      "\n",
      "    a = array([0,1,2,3])\n",
      "\n",
      "and that \n",
      "\n",
      "    a + 5 \n",
      "\n",
      "is:\n",
      "\n",
      "    array([5, 6, 7, 8])\n",
      "\n",
      "what is:\n",
      "\n",
      "    sum(2*a)\n",
      "\n",
      "\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "sum(2*a)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 16,
       "text": [
        "12"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "7. Chemical elements DataFrame (Total: 18)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Consider the following `DataFrame` holding information about the lightest chemical elements:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# round off atomic weight\n",
      "\n",
      "elements = DataFrame([{'number': 1, 'name': 'hydrogen', 'weight':1}, \n",
      "                      {'number': 2, 'name': 'helium', 'weight':4},\n",
      "                      {'number': 3, 'name': 'lithium', 'weight':7},\n",
      "                      {'number': 4, 'name': 'beryllium', 'weight':9},\n",
      "                      {'number': 5, 'name': 'boron', 'weight':11},\n",
      "                      {'number': 6, 'name': 'carbon', 'weight':12},\n",
      "                     ], index= ['H', 'He', 'Li', 'Be', 'B', 'C'])\n",
      "\n",
      "# add group information\n",
      "\n",
      "elements['group'] = Series([1, 18, 1, 2, 13, 14], index = ['H', 'He', 'Li', 'Be', 'B', 'C'])\n",
      "\n",
      "elements\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>name</th>\n",
        "      <th>number</th>\n",
        "      <th>weight</th>\n",
        "      <th>group</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>H</th>\n",
        "      <td>  hydrogen</td>\n",
        "      <td> 1</td>\n",
        "      <td>  1</td>\n",
        "      <td>  1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>He</th>\n",
        "      <td>    helium</td>\n",
        "      <td> 2</td>\n",
        "      <td>  4</td>\n",
        "      <td> 18</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Li</th>\n",
        "      <td>   lithium</td>\n",
        "      <td> 3</td>\n",
        "      <td>  7</td>\n",
        "      <td>  1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Be</th>\n",
        "      <td> beryllium</td>\n",
        "      <td> 4</td>\n",
        "      <td>  9</td>\n",
        "      <td>  2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>B</th>\n",
        "      <td>     boron</td>\n",
        "      <td> 5</td>\n",
        "      <td> 11</td>\n",
        "      <td> 13</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>C</th>\n",
        "      <td>    carbon</td>\n",
        "      <td> 6</td>\n",
        "      <td> 12</td>\n",
        "      <td> 14</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "         name  number  weight  group\n",
        "H    hydrogen       1       1      1\n",
        "He     helium       2       4     18\n",
        "Li    lithium       3       7      1\n",
        "Be  beryllium       4       9      2\n",
        "B       boron       5      11     13\n",
        "C      carbon       6      12     14"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Calculate the following  (showing how you arrive at your answer)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**7a.** <span style=\"font-weight:bold; color:red\">[1]</span> "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "len(elements.index)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 18,
       "text": [
        "6"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**7b.** <span style=\"font-weight:bold; color:red\">[4]</span> "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "elements[elements.number > 4][\"weight\"].sum()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 19,
       "text": [
        "23"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**7c.** <span style=\"font-weight:bold; color:red\">[4]</span> "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "set(elements[elements['group'] == 1].name)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 20,
       "text": [
        "set(['lithium', 'hydrogen'])"
       ]
      }
     ],
     "prompt_number": 20
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**7d.** <span style=\"font-weight:bold; color:red\">[4]</span> "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "elements.sort_index(by='weight')['number'][::-1][:2].sum()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 21,
       "text": [
        "11"
       ]
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Now we add comments to the DataFrame"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "comments = Series(['first and most common element', 'the C in organic'], index=['H', 'C'])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 22
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "elements['comments'] = comments\n",
      "elements"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>name</th>\n",
        "      <th>number</th>\n",
        "      <th>weight</th>\n",
        "      <th>group</th>\n",
        "      <th>comments</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>H</th>\n",
        "      <td>  hydrogen</td>\n",
        "      <td> 1</td>\n",
        "      <td>  1</td>\n",
        "      <td>  1</td>\n",
        "      <td> first and most common element</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>He</th>\n",
        "      <td>    helium</td>\n",
        "      <td> 2</td>\n",
        "      <td>  4</td>\n",
        "      <td> 18</td>\n",
        "      <td>                           NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Li</th>\n",
        "      <td>   lithium</td>\n",
        "      <td> 3</td>\n",
        "      <td>  7</td>\n",
        "      <td>  1</td>\n",
        "      <td>                           NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Be</th>\n",
        "      <td> beryllium</td>\n",
        "      <td> 4</td>\n",
        "      <td>  9</td>\n",
        "      <td>  2</td>\n",
        "      <td>                           NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>B</th>\n",
        "      <td>     boron</td>\n",
        "      <td> 5</td>\n",
        "      <td> 11</td>\n",
        "      <td> 13</td>\n",
        "      <td>                           NaN</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>C</th>\n",
        "      <td>    carbon</td>\n",
        "      <td> 6</td>\n",
        "      <td> 12</td>\n",
        "      <td> 14</td>\n",
        "      <td>              the C in organic</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "output_type": "pyout",
       "prompt_number": 23,
       "text": [
        "         name  number  weight  group                       comments\n",
        "H    hydrogen       1       1      1  first and most common element\n",
        "He     helium       2       4     18                            NaN\n",
        "Li    lithium       3       7      1                            NaN\n",
        "Be  beryllium       4       9      2                            NaN\n",
        "B       boron       5      11     13                            NaN\n",
        "C      carbon       6      12     14               the C in organic"
       ]
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Calculate the following, again showing how you arrive at your answer"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**7e.** <span style=\"font-weight:bold; color:red\">[1]</span> "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "elements.comments.dropna().count()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 24,
       "text": [
        "2"
       ]
      }
     ],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**7f.** <span style=\"font-weight:bold; color:red\">[4]</span> "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\"\".join(elements.comments.dropna().apply(lambda x: x[0]).values)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 25,
       "text": [
        "'ft'"
       ]
      }
     ],
     "prompt_number": 25
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Hints for Question 7f"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\"\".join(['a','b'])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 26,
       "text": [
        "'ab'"
       ]
      }
     ],
     "prompt_number": 26
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "elements.number.apply(lambda x: 2*x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 27,
       "text": [
        "H      2\n",
        "He     4\n",
        "Li     6\n",
        "Be     8\n",
        "B     10\n",
        "C     12\n",
        "Name: number"
       ]
      }
     ],
     "prompt_number": 27
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\"\".join(elements.number.apply(lambda x: str(2*x)).values)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 28,
       "text": [
        "'24681012'"
       ]
      }
     ],
     "prompt_number": 28
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "8. Matching the capitals to latitude, longitude pairs (Total: 4)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<img src=\"https://www.evernote.com/shard/s1/sh/b5dd80cb-740d-4aa2-a2a7-151fd0e3119b/77a2a008cbee7050a323c5302075ab1b/res/b777ee6b-b293-4e2c-b4e8-2313e6db7af0/Google_Maps-20130317-171357.jpg.jpg?resizeSmall&width=500\">"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**8a.** <span style=\"font-weight:bold; color:red\">[4]</span>  Match each of the following capital cities to its (latitude, longitude) pair by writing the corresponding letter next to each number:\n",
      "\n",
      "1.  Ottawa, Canada:          ____________\n",
      "2.  Moscow, Russia:          ____________\n",
      "3.  Manila, Philippines:     ____________\n",
      "4.  Buenos Aires, Argentina: ____________\n",
      "\n",
      "Lat/long:\n",
      "\n",
      "    A (14.583333, 120.966667)\n",
      "    B (45.420833, -75.69)\n",
      "    C (-34.603333, -58.381667)\n",
      "    D (55.75, 37.616667)\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "1: B, 2: D, 3: A, 4: C"
     ]
    },
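    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "One way to arrive at this matching is to read the signs of the coordinates: a negative latitude lies south of the equator and a negative longitude lies west of the prime meridian. The sketch below is only an illustration of that reasoning; `hemispheres` is a hypothetical helper and is not part of the exam.\n",
      "\n",
      "```python\n",
      "# Hypothetical helper: name the hemispheres of a (lat, lon) pair from its signs.\n",
      "def hemispheres(lat, lon):\n",
      "    ns = 'N' if lat >= 0 else 'S'   # negative latitude is south of the equator\n",
      "    ew = 'E' if lon >= 0 else 'W'   # negative longitude is west of Greenwich\n",
      "    return ns + ew\n",
      "\n",
      "# The four lettered pairs from question 8a.\n",
      "pairs = {\n",
      "    'A': (14.583333, 120.966667),\n",
      "    'B': (45.420833, -75.69),\n",
      "    'C': (-34.603333, -58.381667),\n",
      "    'D': (55.75, 37.616667),\n",
      "}\n",
      "\n",
      "for letter in sorted(pairs):\n",
      "    lat, lon = pairs[letter]\n",
      "    print('%s %s' % (letter, hemispheres(lat, lon)))\n",
      "```\n",
      "\n",
      "Only B falls in the north-west (Ottawa) and only C falls in the south (Buenos Aires). A and D are both north-east, and the much lower latitude of A points to tropical Manila, which leaves D for Moscow."
     ]
    },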
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "9. datetime (Total: 4)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import datetime"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 29
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**9a.** <span style=\"font-weight:bold; color:red\">[2]</span> What is the value of the following expression?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "(datetime.datetime.now() + datetime.timedelta(days=20)).month"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "4 (assuming the code is run on the exam date, March 19, 2013; 20 days later is April 8)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**9b.** <span style=\"font-weight:bold; color:red\">[2]</span>  "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "What does the following code print:\n",
      "\n",
      "    dt = datetime.datetime.fromtimestamp(24*60*60*5) - datetime.datetime.fromtimestamp(0)\n",
      "    print(dt.days)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "5 (the two timestamps are 24\\*60\\*60\\*5 seconds apart, which is exactly 5 days)"
     ]
    },
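    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Both `datetime` answers can be checked with fixed inputs. The sketch below makes two assumptions that are not in the questions themselves: it pins the run date for 9a to the exam date (March 19, 2013) instead of calling `now()`, and for 9b it builds the `timedelta` directly from the number of seconds instead of subtracting two `fromtimestamp` results, which sidesteps the local time zone.\n",
      "\n",
      "```python\n",
      "import datetime\n",
      "\n",
      "# 9a, with the run date pinned to the exam date (an assumption).\n",
      "exam_day = datetime.datetime(2013, 3, 19)\n",
      "later = exam_day + datetime.timedelta(days=20)\n",
      "print(later.month)\n",
      "\n",
      "# 9b, expressed as a timedelta of 5 * 24 * 60 * 60 seconds.\n",
      "gap = datetime.timedelta(seconds=24*60*60*5)\n",
      "print(gap.days)\n",
      "```\n",
      "\n",
      "Run on a different date, the expression in 9a can give a different month, so the key's answer holds only for the exam date."
     ]
    },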
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "\n",
      "\n",
      "\n",
      "\n",
      "\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}