{ "metadata": { "name": "", "signature": "sha256:ff12a90f2c21bc3d6c8d81668d584b4dd381ece6ecd0664a33318293c3ada0c7" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Homework assignment 2\n", "\n", "To turn in this assignment, use the same method we used last week. (Download a copy of this notebook, fill in the blanks, and e-mail it to Dan.)\n", "\n", "##Problem set 1: Working with dictionaries\n", "\n", "In the following code cell, I've made a dictionary called `state_capitals` that maps the names of several states to their capitals." ] }, { "cell_type": "code", "collapsed": false, "input": [ "state_capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix'}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the blank below, write an expression that evaluates to `Juneau`, using square brackets to get a value from the dictionary." ] }, { "cell_type": "code", "collapsed": false, "input": [ "state_capitals['Alaska']" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "'Juneau'" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, write an expression that evaluates to the number of keys in the dictionary." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "len(state_capitals.keys())" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "3" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following code cell, I've made a list of strings and assigned it to a variable called `cheeses`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "cheeses = [\"cheddar\", \"emmental\", \"gouda\", \"brie\", \"camembert\"]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the blank below, I've provided the skeleton of a `for` loop. Replace the `???` in the `for` loop with a statement that will cause the `for` loop to fill in the blank dictionary `cheese_name_lengths`, such that the dictionary has a key for every string in the `cheeses` list, and each key maps to a value that is the length of that string. The final line of the code compares `cheese_name_lengths` to the known correct value for the dictionary; when you run the code cell, it should print out `True`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "cheese_name_lengths = {}\n", "for cheese in cheeses:\n", " cheese_name_lengths[cheese] = len(cheese)\n", "print cheese_name_lengths == {'emmental': 8, 'gouda': 5, 'cheddar': 7, 'brie': 4, 'camembert': 9}" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "True\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Problem set 2: the New York Times API\n", "\n", "This one is tough, but I have faith in you. You're smart, and capable, and the outfit you're wearing for doing homework in is *great*.\n", "\n", "Get a key for the [Campaign Finance API](http://developer.nytimes.com/docs/campaign_finance_api). 
Write a Python program in the cell below that calculates and prints out the *total dollar amount* of presidential campaign contributions from contributors in New York state, to any candidate, in the 2012 election cycle. (Hint: Use the [Presidential State/Zip URI structure](http://developer.nytimes.com/docs/campaign_finance_api#h3-pres-state-zip). Make use of the [API tool](http://prototype.nytimes.com/gst/apitool/index.html) as appropriate.) I've already filled in the appropriate `import` statements for you." ] }, { "cell_type": "code", "collapsed": true, "input": [ "import urllib\n", "import json\n", "\n", "api_key = \"your api key here\"\n", "\n", "url = \"http://api.nytimes.com/svc/elections/us/v3/finances/2012/president/states/NY.json?api-key=\" + api_key\n", "response_str = urllib.urlopen(url).read()\n", "response_dict = json.loads(response_str)\n", "\n", "sum([float(rec['total']) for rec in response_dict['results']])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "19022925.18" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Problem set 3: Working with strings\n", "\n", "In the cell below, I've created a list of strings and assigned it to a variable `capitalize_me`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "capitalize_me = ['an abacus', 'bitter beefsteak', 'comfy culottes']" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following blank code cell, write a short program (or a single expression!) that evaluates to another list, containing copies of these strings with their first letter capitalized. In other words, your filled-in code cell should display this when you run it:\n", "\n", " ['An abacus', 'Bitter beefsteak', 'Comfy culottes']\n", "\n", "Use string slices and the `.upper()` method in your solution." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "[s[0].upper() + s[1:] for s in capitalize_me]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "['An abacus', 'Bitter beefsteak', 'Comfy culottes']" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Problem set 4: Regular expressions\n", "\n", "We're going to work with the Enron e-mail subject lines in this problem set. Make sure you have a copy of the corpus downloaded to your machine by running the following code cell:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import urllib\n", "urllib.urlretrieve(\"https://raw.githubusercontent.com/ledeprogram/courses/master/databases/data/enronsubjects.txt\", \"enronsubjects.txt\")\n", "subjects = [x.strip() for x in open(\"enronsubjects.txt\").readlines()]\n", "all_subjects = open(\"enronsubjects.txt\").read()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variable `subjects` now contains a list, with each item in the list being a string that has a single subject line in it. The `all_subjects` variable contains a big string with all of the subject lines in it.\n", "\n", "In the following cell, write a list comprehension that evaluates to a list of all subject lines that contain a US phone number (i.e., in the format 555-555-1212). Use the `re.search()` function to accomplish this task. (Hint: there should be 28 of them.) I've included the appropriate `import` statement for you." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "import re\n", "[subj for subj in subjects if re.search(r\"\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d\", subj)]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "['Call Chris 713-853-4743',\n", " \"FW: Birgit's Contact Info: 713-222-7667\",\n", " \"Birgit's Contact Info: 713-222-7667\",\n", " \"RE: Birgit's Contact Info: 713-222-7667\",\n", " \"Birgit's Contact Info: 713-222-7667\",\n", " \"RE: Birgit's Contact Info: 713-222-7667\",\n", " \"RE: Birgit's Contact Info: 713-222-7667\",\n", " \"RE: Birgit's Contact Info: 713-222-7667\",\n", " \"FW: Birgit's Contact Info: 713-222-7667\",\n", " \"RE: Birgit's Contact Info: 713-222-7667\",\n", " \"RE: Birgit's Contact Info: 713-222-7667\",\n", " 'Terry 281-296-0573',\n", " 'Re: 713-851-2499',\n", " \"FW: Mark's number is 713-345-7896\",\n", " \"RE: Mark's number is 713-345-7896\",\n", " \"Mark's number is 713-345-7896\",\n", " \"RE: Mark's number is 713-345-7896\",\n", " \"Mark's number is 713-345-7896\",\n", " 'Re: Fw: KU Calendar please call for map 1-281-367-8953 or',\n", " 'Bill F 713-528-0759',\n", " 'Call Jonathon Fairbanks 713-850-9002w/713-703-8294c and Freddy',\n", " 'Call Ken Kirk re CGAS lawsuit 614-888-9588',\n", " 'Call Alisa Johnston at Dynegy 713-767-8686 re Debbie Chance',\n", " 'Re: Set up meeting w/Teldata /Tracy Ashmore-303-571-6135',\n", " 'Re: Kaye Ellis - 281-537-9334 (home)',\n", " 'Interconnection Issues Discussion Paper Conf Call 1-800-937-6563,',\n", " \"Tentative: EPSA Cost/Benefit Analysis MEETING Julie Simon to support FERC's RTO policies. Dial 1-800-937-6563 and ask for the Julie Simon/EPSA call.\",\n", " 'CA Pacific NW Refund Conf Call (Alvarez) 1-888-296-1938, HC:']" ] } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now use the `re.findall()` function to create an expression that evaluates to a list of *just* the phone numbers." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "re.findall(r\"\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d\", all_subjects)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "['713-853-4743',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '713-222-7667',\n", " '281-296-0573',\n", " '713-851-2499',\n", " '713-345-7896',\n", " '713-345-7896',\n", " '713-345-7896',\n", " '713-345-7896',\n", " '713-345-7896',\n", " '281-367-8953',\n", " '713-528-0759',\n", " '713-850-9002',\n", " '713-703-8294',\n", " '614-888-9588',\n", " '713-767-8686',\n", " '303-571-6135',\n", " '281-537-9334',\n", " '800-937-6563',\n", " '800-937-6563',\n", " '888-296-1938']" ] } ], "prompt_number": 17 } ], "metadata": {} } ] }