{
 "metadata": {
  "name": "",
  "signature": "sha256:ff12a90f2c21bc3d6c8d81668d584b4dd381ece6ecd0664a33318293c3ada0c7"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Homework assignment 2\n",
      "\n",
      "To turn in this assignment, follow the same procedure as last week: download a copy of this notebook, fill in the blanks, and e-mail it to Dan.\n",
      "\n",
      "## Problem set 1: Working with dictionaries\n",
      "\n",
      "In the following code cell, I've made a dictionary mapping the names of several states to their capitals, called `state_capitals`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "state_capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix'}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In the blank below, write an expression that evaluates to `'Juneau'`, using square brackets to get a value from the dictionary."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "state_capitals['Alaska']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 2,
       "text": [
        "'Juneau'"
       ]
      }
     ],
     "prompt_number": 2
    },
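    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Square brackets only work for keys that are actually in the dictionary. The sketch below contrasts them with the `.get()` method and the `in` operator. `'Texas'` is just an illustrative key that isn't in `state_capitals`.\n",
      "\n",
      "```python\n",
      "state_capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix'}\n",
      "\n",
      "# Square brackets return the value for a key that exists...\n",
      "print(state_capitals['Arizona'])                # Phoenix\n",
      "\n",
      "# ...but raise KeyError for a key that doesn't ('Texas' is only an example)\n",
      "try:\n",
      "    state_capitals['Texas']\n",
      "except KeyError:\n",
      "    print('no entry for Texas')\n",
      "\n",
      "# .get() returns None, or a default you supply, instead of raising\n",
      "print(state_capitals.get('Texas'))              # None\n",
      "print(state_capitals.get('Texas', 'unknown'))   # unknown\n",
      "\n",
      "# 'in' tests whether a key is present\n",
      "print('Alaska' in state_capitals)               # True\n",
      "```"
     ]
    },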
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, write an expression that evaluates to the number of keys in the dictionary."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "len(state_capitals.keys())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "3"
       ]
      }
     ],
     "prompt_number": 3
    },
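    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Calling `len()` directly on a dictionary also counts its keys, so the `.keys()` call is optional. In the small sketch below, the loop at the end uses `sorted()` only to get a predictable order, because Python 2 dictionaries don't keep their keys in any particular order.\n",
      "\n",
      "```python\n",
      "state_capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix'}\n",
      "\n",
      "# len() on a dictionary counts its keys\n",
      "print(len(state_capitals))                                # 3\n",
      "print(len(state_capitals) == len(state_capitals.keys()))  # True\n",
      "\n",
      "# .items() yields (key, value) pairs; sorted() orders them by key\n",
      "for state, capital in sorted(state_capitals.items()):\n",
      "    print(state + ': ' + capital)\n",
      "```"
     ]
    },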
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In the following code cell, I've made a list of strings and assigned it to a variable called `cheeses`:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cheeses = [\"cheddar\", \"emmental\", \"gouda\", \"brie\", \"camembert\"]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In the blank below, I've provided the skeleton of a `for` loop. Replace the `???` in the loop with a statement that fills in the empty dictionary `cheese_name_lengths`, so that the dictionary has a key for every string in the `cheeses` list and each key maps to the length of that string. The final line of the code compares `cheese_name_lengths` to the known correct value for the dictionary; when you run the code cell, it should print `True`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cheese_name_lengths = {}\n",
      "for cheese in cheeses:\n",
      "    cheese_name_lengths[cheese] = len(cheese)\n",
      "print cheese_name_lengths == {'emmental': 8, 'gouda': 5, 'cheddar': 7, 'brie': 4, 'camembert': 9}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "True\n"
       ]
      }
     ],
     "prompt_number": 6
    },
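    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can also build the same dictionary in a single expression with a dictionary comprehension, which is available in Python 2.7 and later. This is a sketch of an alternative to the loop, not a required change to it.\n",
      "\n",
      "```python\n",
      "cheeses = ['cheddar', 'emmental', 'gouda', 'brie', 'camembert']\n",
      "\n",
      "# One key per cheese, mapped to the length of its name\n",
      "cheese_name_lengths = {cheese: len(cheese) for cheese in cheeses}\n",
      "print(cheese_name_lengths == {'emmental': 8, 'gouda': 5, 'cheddar': 7, 'brie': 4, 'camembert': 9})  # True\n",
      "```"
     ]
    },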
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Problem set 2: The New York Times API\n",
      "\n",
      "This one is tough, but I have faith in you. You're smart, and capable, and the outfit you're doing your homework in is *great*.\n",
      "\n",
      "Get a key for the [Campaign Finance API](http://developer.nytimes.com/docs/campaign_finance_api). Write a Python program in the cell below that calculates and prints out the *total dollar amount* of presidential campaign contributions from contributors in New York state, to any candidate, in the 2012 election cycle. (Hint: Use the [Presidential State/Zip URI structure](http://developer.nytimes.com/docs/campaign_finance_api#h3-pres-state-zip). Make use of the [API tool](http://prototype.nytimes.com/gst/apitool/index.html) as appropriate.) I've already filled in the appropriate `import` statements for you."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": true,
     "input": [
      "import urllib\n",
      "import json\n",
      "\n",
      "api_key = \"your api key here\"\n",
      "\n",
      "url = \"http://api.nytimes.com/svc/elections/us/v3/finances/2012/president/states/NY.json?api-key=\" + api_key\n",
      "response_str = urllib.urlopen(url).read()\n",
      "response_dict = json.loads(response_str)\n",
      "\n",
      "sum([float(rec['total']) for rec in response_dict['results']])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 9,
       "text": [
        "19022925.18"
       ]
      }
     ],
     "prompt_number": 9
    },
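    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can try the parsing and summing step without a network connection or an API key. This sketch assumes, as the cell above does, that the response has a `results` list whose records each carry a `total`. The records below are made up, and `json.dumps()` merely stands in for `urllib.urlopen(url).read()`. One total is written as a string and one as a number, to show that `float()` accepts both.\n",
      "\n",
      "```python\n",
      "import json\n",
      "\n",
      "# Made-up records with the shape the cell above assumes; the numbers are placeholders\n",
      "fake_response = {'results': [{'total': '100.50'}, {'total': 250.25}]}\n",
      "response_str = json.dumps(fake_response)  # stands in for urllib.urlopen(url).read()\n",
      "\n",
      "response_dict = json.loads(response_str)\n",
      "# float() accepts numeric strings as well as numbers\n",
      "total = sum(float(rec['total']) for rec in response_dict['results'])\n",
      "print(total)  # 350.75\n",
      "```"
     ]
    },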
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Problem set 3: Working with strings\n",
      "\n",
      "In the cell below, I've created a list of strings and assigned it to a variable `capitalize_me`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "capitalize_me = ['an abacus', 'bitter beefsteak', 'comfy culottes']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In the following blank code cell, write a short program (or a single expression!) that evaluates to another list, containing copies of these strings with their first letter capitalized. In other words, your filled-in code cell should display this when you run it:\n",
      "\n",
      "    ['An abacus', 'Bitter beefsteak', 'Comfy culottes']\n",
      "\n",
      "Use string slices and the `.upper()` method in your solution."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "[s[0].upper() + s[1:] for s in capitalize_me]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "['An abacus', 'Bitter beefsteak', 'Comfy culottes']"
       ]
      }
     ],
     "prompt_number": 12
    },
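    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This small sketch shows two details worth knowing about the slicing approach; the extra strings are made up for illustration. First, `s[0]` raises `IndexError` on an empty string, while `s[:1]` does not. Second, the built-in `str.capitalize()` is not a drop-in replacement, because it also lowercases the rest of the string.\n",
      "\n",
      "```python\n",
      "capitalize_me = ['an abacus', 'bitter beefsteak', 'comfy culottes']\n",
      "\n",
      "# s[:1] is '' for an empty string, so this version also copes with ''\n",
      "capitalized = [s[:1].upper() + s[1:] for s in capitalize_me + ['']]\n",
      "print(capitalized)  # ['An abacus', 'Bitter beefsteak', 'Comfy culottes', '']\n",
      "\n",
      "# str.capitalize() lowercases everything after the first character\n",
      "s = 'an ABACUS'\n",
      "print(s.capitalize())          # An abacus\n",
      "print(s[:1].upper() + s[1:])   # An ABACUS\n",
      "```"
     ]
    },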
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Problem set 4: Regular expressions\n",
      "\n",
      "We're going to work with the Enron e-mail subject lines in this problem set. Make sure you have a copy of the corpus downloaded to your machine by running the following code cell:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
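      "import urllib\n",
      "urllib.urlretrieve(\"https://raw.githubusercontent.com/ledeprogram/courses/master/databases/data/enronsubjects.txt\", \"enronsubjects.txt\")\n",
      "# subjects: one string per subject line, surrounding whitespace stripped\n",
      "with open(\"enronsubjects.txt\") as f:\n",
      "    subjects = [x.strip() for x in f.readlines()]\n",
      "# all_subjects: the whole file as a single string\n",
      "with open(\"enronsubjects.txt\") as f:\n",
      "    all_subjects = f.read()"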
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The variable `subjects` now contains a list, with each item in the list being a string that has a single subject line in it. The `all_subjects` variable contains a big string with all of the subject lines in it.\n",
      "\n",
      "In the following cell, write a list comprehension that evaluates to a list of all subject lines that contain a US phone number (i.e., in the format 555-555-1212). Use the `re.search()` function to accomplish this task. (Hint: there should be 28 of them.) I've included the appropriate `import` statement for you."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import re\n",
      "[subj for subj in subjects if re.search(r\"\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d\", subj)]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 16,
       "text": [
        "['Call Chris 713-853-4743',\n",
        " \"FW: Birgit's Contact Info: 713-222-7667\",\n",
        " \"Birgit's Contact Info: 713-222-7667\",\n",
        " \"RE: Birgit's Contact Info: 713-222-7667\",\n",
        " \"Birgit's Contact Info: 713-222-7667\",\n",
        " \"RE: Birgit's Contact Info: 713-222-7667\",\n",
        " \"RE: Birgit's Contact Info: 713-222-7667\",\n",
        " \"RE: Birgit's Contact Info: 713-222-7667\",\n",
        " \"FW: Birgit's Contact Info: 713-222-7667\",\n",
        " \"RE: Birgit's Contact Info: 713-222-7667\",\n",
        " \"RE: Birgit's Contact Info: 713-222-7667\",\n",
        " 'Terry 281-296-0573',\n",
        " 'Re: 713-851-2499',\n",
        " \"FW: Mark's number is 713-345-7896\",\n",
        " \"RE: Mark's number is 713-345-7896\",\n",
        " \"Mark's number is 713-345-7896\",\n",
        " \"RE: Mark's number is 713-345-7896\",\n",
        " \"Mark's number is 713-345-7896\",\n",
        " 'Re: Fw: KU Calendar please call for map 1-281-367-8953 or',\n",
        " 'Bill F 713-528-0759',\n",
        " 'Call Jonathon Fairbanks 713-850-9002w/713-703-8294c and Freddy',\n",
        " 'Call Ken Kirk re CGAS lawsuit 614-888-9588',\n",
        " 'Call Alisa Johnston at Dynegy 713-767-8686 re Debbie Chance',\n",
        " 'Re: Set up meeting w/Teldata /Tracy Ashmore-303-571-6135',\n",
        " 'Re: Kaye Ellis - 281-537-9334 (home)',\n",
        " 'Interconnection Issues Discussion Paper Conf Call 1-800-937-6563,',\n",
        " \"Tentative: EPSA Cost/Benefit Analysis MEETING  Julie Simon to support FERC's RTO policies.  Dial 1-800-937-6563 and ask for the Julie Simon/EPSA call.\",\n",
        " 'CA Pacific NW Refund Conf Call (Alvarez) 1-888-296-1938, HC:']"
       ]
      }
     ],
     "prompt_number": 16
    },
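    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "You can write the pattern above more compactly with `{n}` quantifiers. Optionally, you can also tighten it so that it won't match inside longer runs of digits. The sketch below does both with `re.compile()`. Two of the sample strings come from the output above; the other two are made up.\n",
      "\n",
      "```python\n",
      "import re\n",
      "\n",
      "# Same pattern as above, written with {n} quantifiers: exactly 3, 3, then 4 digits\n",
      "phone_re = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}')\n",
      "\n",
      "# A tighter variant: the lookarounds reject matches embedded in longer digit runs\n",
      "strict_re = re.compile(r'(?<![0-9])[0-9]{3}-[0-9]{3}-[0-9]{4}(?![0-9])')\n",
      "\n",
      "samples = ['Call Chris 713-853-4743',   # from the corpus output above\n",
      "           'Re: 713-851-2499',          # from the corpus output above\n",
      "           'order 12345-678-90123',     # made-up string, not from the corpus\n",
      "           'no number here']            # made-up string, not from the corpus\n",
      "\n",
      "# The loose pattern also finds 345-678-9012 inside 12345-678-90123\n",
      "print([s for s in samples if phone_re.search(s)])\n",
      "# The strict pattern keeps only the first two samples\n",
      "print([s for s in samples if strict_re.search(s)])\n",
      "```"
     ]
    },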
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now use the `re.findall()` function to create an expression that evaluates to a list of *just* the phone numbers."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "re.findall(r\"\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d\", all_subjects)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "['713-853-4743',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '713-222-7667',\n",
        " '281-296-0573',\n",
        " '713-851-2499',\n",
        " '713-345-7896',\n",
        " '713-345-7896',\n",
        " '713-345-7896',\n",
        " '713-345-7896',\n",
        " '713-345-7896',\n",
        " '281-367-8953',\n",
        " '713-528-0759',\n",
        " '713-850-9002',\n",
        " '713-703-8294',\n",
        " '614-888-9588',\n",
        " '713-767-8686',\n",
        " '303-571-6135',\n",
        " '281-537-9334',\n",
        " '800-937-6563',\n",
        " '800-937-6563',\n",
        " '888-296-1938']"
       ]
      }
     ],
     "prompt_number": 17
    }
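    ,
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "`re.findall()` returns the whole matches when the pattern has no groups, but returns tuples of the captured pieces when it does. The sketch below runs on a short string built from three numbers in the output above.\n",
      "\n",
      "```python\n",
      "import re\n",
      "\n",
      "text = 'Call Chris 713-853-4743; Terry 281-296-0573; Bill F 713-528-0759'\n",
      "\n",
      "# No groups: a list of the full matched strings\n",
      "print(re.findall(r'[0-9]{3}-[0-9]{3}-[0-9]{4}', text))\n",
      "# ['713-853-4743', '281-296-0573', '713-528-0759']\n",
      "\n",
      "# Two groups: one (area code, rest of number) tuple per match\n",
      "pairs = re.findall(r'([0-9]{3})-([0-9]{3}-[0-9]{4})', text)\n",
      "print(pairs)\n",
      "# [('713', '853-4743'), ('281', '296-0573'), ('713', '528-0759')]\n",
      "\n",
      "# For example, the distinct area codes\n",
      "print(sorted(set(area for area, rest in pairs)))  # ['281', '713']\n",
      "```"
     ]
    }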
   ],
   "metadata": {}
  }
 ]
}