{ "metadata": { "name": "Data and Databases Homework Assignment 6" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": "#Homework Assignment #6\n\nThese problem sets focus on MongoDB and Tornado.\n\n##Problem set #1: Hu*mongo*us soup\n\nFor this first problem set, we're going to build off of your solution to Problem Set #2 (\"Of Widgets and Pandas\") in last week's homework assignment. Specifically, we're going to be working with the widgets listed on [this page](http://static.decontextualize.com/widgets.html). You'll be creating a MongoDB database that has a document for every listed widget. But first, in the cell below, connect to your local MongoDB instance and make a variable `collection` that points to a collection called `widgets` in the `lede_program` database. I've done the appropriate `import` statements for you, and included a line at the end that prints the full name of the collection (i.e., its database name plus the name of the collection.) The cell's output should be the string `lede_program.widgets`." }, { "cell_type": "code", "collapsed": false, "input": "import pymongo\n\n# your code here\n\n# end your code\n\nprint collection.full_name", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "In the cell below, write a Python statement that will remove all documents from this collection. We want to start fresh! I've left in a line that prints out the number of records in the collection. This number should be `0`." }, { "cell_type": "code", "collapsed": false, "input": "# your code here\n\n# end your code\nprint collection.count()", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Great! Now, the tough part. In the cell below, duplicate the code in the second code cell from Problem Set #2 in Assignment #5. There should be one key difference in your code, however: instead of creating an empty list, and adding each document to the list, you should instead *insert* each document into the `widgets` collection. After you've executed the code, evaluating the expression `list(collection.find())` should look something like this (your `ObjectId` numbers will be different):\n\n [{u'_id': ObjectId('53b4c0c92735fe3ff2977816'),\n u'partno': u'C1-9476',\n u'price': 2.7,\n u'quantity': 512,\n u'widgetname': u'Skinner Widget'},\n {u'_id': ObjectId('53b4c0c92735fe3ff2977817'),\n u'partno': u'JDJ-32/V',\n u'price': 9.36,\n u'quantity': 967,\n u'widgetname': u'Widget For Furtiveness'},\n ... some widgets omitted for brevity ...\n {u'_id': ObjectId('53b4c0c92735fe3ff297781e'),\n u'partno': u'5B-941/F',\n u'price': 13.26,\n u'quantity': 919,\n u'widgetname': u'Widget For Cinema'}]\n \n(Hint: Pay attention to types! Make sure that `price` is an integer and `quantity` is a floating-point number when inserting the document into MongoDB.) I've included some scaffolding for you, including the Beautiful Soup import statement and the code to fetch the contents of `widgets.html` into a variable called `html_str`. I've also included, at the very end, the expression to show all documents in the collection." }, { "cell_type": "code", "collapsed": false, "input": "from bs4 import BeautifulSoup\nimport urllib\n\nhtml_str = urllib.urlopen(\"http://static.decontextualize.com/widgets.html\").read()\n\n# your code here\n\n# end your code\n\nlist(collection.find())", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Nice. Your work is how I like my burgers: well done. (It's also how I like my steak: rare, and of the highest quality.)\n\nFinally, in the cell below, write an expression that checks to ensure that the number of documents in the collection is equal to the number of widgets in `widgets.html`. The cell should contain a single expression that evaluates to `True`." }, { "cell_type": "code", "collapsed": false, "input": "", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "##Problem set #2: An inquiry into the nature of widgets\n\nThis problem set focuses on exercising your ability to write expressions with Pymongo that filter, limit, and sort lists of documents in a MongoDB collection, using the `.find()`, `.sort()` and `.limit()` methods.\n\nFirst problem. In the cell below, write a statement that performs a MongoDB query returning a list containing one document: the least expensive widget in the catalog. I.e., your code, when run, should evaluate to this (keeping in mind that your `ObjectId` will be different):\n\n [{u'_id': ObjectId('53b4c0c92735fe3ff297781b'),\n u'partno': u'MZ-556/B',\n u'price': 2.35,\n u'quantity': 948,\n u'widgetname': u'Yellow-Tipped Widget'}]\n " }, { "cell_type": "code", "collapsed": false, "input": "", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Now, in the cell below, write an expression that returns a list of widget documents where the quantity of available widgets is greater than `900`. These documents should only have a subset of available fields, namely `partno` and `quantity`. Your code, when run, should evaluate to this (again, keeping in mind that your `ObjectId`s will be different; the order of documents in the list might also be different):\n\n [{u'partno': u'JDJ-32/V', u'quantity': 967},\n {u'partno': u'MZ-556/B', u'quantity': 948},\n {u'partno': u'5B-941/F', u'quantity': 919}]" }, { "cell_type": "code", "collapsed": false, "input": "", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Cool. Finally, in the cell below, write an expression that returns a list of widget documents where the word \"Widget\" occurs at the end of the `widgetname` string. Use the `$regex` query selector. The documents in the list should include only the `widgetname` field, and should be sorted by the `widgetname` field. I.e., your code, when run, should evaluate to this (again, your `ObjectId`s will be different):\n\n [{u'widgetname': u'Infinite Widget'},\n {u'widgetname': u'Manicurist Widget'},\n {u'widgetname': u'Self-Knowledge Widget'},\n {u'widgetname': u'Skinner Widget'},\n {u'widgetname': u'Unshakable Widget'},\n {u'widgetname': u'Yellow-Tipped Widget'}]" }, { "cell_type": "code", "collapsed": false, "input": "", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "##Problem set #3: It's a twister!\n\nIn this problem set, you'll make a very simple web API with Tornado.\n\nThis problem set works a little bit different from the others! You'll be pasting into the cell below a program that you've written elsewhere. (As discussed in class, it's difficult to run a Tornado application inside of iPython Notebook.)\n\nHere's how your web API should work. A request to the resource `/oz` should return the following response (as a JSON string):\n\n {\"result\": \"Toto, I've a feeling we're not in Kansas anymore.\"}\n \nIf the parameters `pet` and `place` are included in the query string, then the string in the response should include the strings specified as the values of those keys in place of `Toto` and `Kansas`, respectively. For example, the following request with `curl`, assuming your web service is running on `localhost` port `8000`...\n\n curl -s \"http://localhost:8000/oz?pet=Fluffy&place=Brooklyn\"\n \n... should print the following response:\n\n {\"result\": \"Fluffy, I've a feeling we're not in Brooklyn anymore.\"}\n \nI've included the basic framework for the application for you. The part you need to fill in is in the definition of the `get()` method." }, { "cell_type": "code", "collapsed": false, "input": "import tornado.httpserver\nimport tornado.ioloop\nimport tornado.options\nimport tornado.web\n\nfrom tornado.options import define, options\n\ndefine(\"port\", default=8000, help=\"run on the given port\", type=int)\ntornado.options.parse_command_line()\n\nclass OzHandler(tornado.web.RequestHandler):\n def get(self):\n # your code here!\n\n # end your code\n\napplication = tornado.web.Application(handlers=[(r\"/oz\", OzHandler)])\nhttp_server = tornado.httpserver.HTTPServer(application)\nhttp_server.listen(options.port)\ntornado.ioloop.IOLoop.instance().start()", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "Great job! I enjoyed writing this homework assignment and I hope you enjoyed completing it." } ], "metadata": {} } ] }