{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true, "deletable": true, "editable": true }, "source": [ "Application Programming Interfaces (API's) are one of the standard ways for interacting with data and software services on the internet. Learning how to use them with your programming is one of the fundamental steps in becoming a fluent developer. Here we will explore one API in particular, the one for the Digital Public Library of America (DPLA). But, first, what exactly is an API?" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Imagine the following scenario: you have just accomplished a big task in putting the entire run of your university's literary journal online. People can explore the full text of each issue, and they can also download the images for your texts. Hooray! As we just learned in the lesson on web scraping, an interested digital humanist could use just this information to pull down your materials. They might scrape each page for the full texts, titles, and dates of your journal run, and put together their own little corpus for analysis. But that's a lot of work. Web scraping seems fun and all at first, but the novelty quickly wears off. We wouldn't want to scrape _every_ resource from the web. Surely there must be a better way, and there is! What if we were to package all that data up in a more usable way for our users to consume with their programs? That's where API's come in." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "API's are a way for exchanging information and services from one piece of software to another. In this case, we're theorizing an API that would provide data. 
When a user comes to our journal site, we might imagine them saying, \"hey - could you give me all the journals published in the 1950's?\" And then our fledgling API would respond with something like the following:" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "[{\"ArticleID\":[42901,42902,42903,42904,42905,42906,42907,42908,42909,42910,42911,42912],\"ID\":1524,\"Issue\":1,\"IssueLabel\":\"1\",\"Season\":\"Spring\",\"Volume\":1,\"Year\":1950,\"YearLabel\":\"1950\"},{\"ArticleID\":[42913,42914,42915,42916,42917,42918,42919,42920,42921,42922],\"ID\":1525,\"Issue\":2,\"IssueLabel\":\"2\",\"Season\":\"Summer\",\"Volume\":1,\"Year\":1950,\"YearLabel\":\"1950\"},{\"ArticleID\":[42923,42924,42925,42926,42927,42928,42929,42930,42931,42932],\"ID\":1526,\"Issue\":3,\"IssueLabel\":\"3\",\"Season\":\"Winter\",\"Volume\":1,\"Year\":1950,\"YearLabel\":\"1950\"},……]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "As we've discussed all along, computers are pretty bad at inferring things, so our API neatly structures a way for your programming to interface with an application that we've made (API - get it?) more easily. The results give us a list of all the articles in each issue, as well as relevant metadata for the issue. In this case, we learn the year, volume, season, and issue number. With this information, we could make several other API requests for particular articles. But data isn't the only thing you can get from API's - they can also do things for us as well! Have you ever used a social media account to log in to a different website - say using Facebook to log into the New York Times website? Behind the scenes, the NY Times is using the Facebook API to authenticate you and prove that you're a user. API's let you do an awful lot, and they let you build on the work that others have done." 
] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "But this isn't a lesson about how to build API's - we're going to talk about how to use them. There are a couple different ways in which we can do this: from scratch or with a wrapper. In the former, we go through all the different steps of putting together a request for information from the DPLA API. In the latter, we use someone else's code to do the heavy lifting for us. First, we'll do things the easy way by working with DPyLA, a Python wrapper for the DPLA API. Let's pull in the relevant Python pieces. Notice the $, which indicates that we're working on the command line and not in Python. We'll need to install the wrapper first." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "$ pip install DPLA" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Now that the DPLA wrapper is installed, we'll import it into our Python script. Remember, the \"from X.Y import Z\" will enable us to keep from writing X.Y.Z every time. In this case, it keeps us from writing dpla.api.DPLA when we would rather just write DPLA." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "from dpla.api import DPLA" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "deletable": true, "editable": true }, "source": [ "API's generally require you to prove that you are an authentic user (not a bot), and, in some cases, that you have permission to access their interface. You generally do this by authenticating through the service using credentials that you have registered with them. DPLA lets you register by sending a request through the terminal. Below, change \"YOUR_EMAIL@example.com\" to be an email address of your choice. 
" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "deletable": true, "editable": true }, "source": [ "$ curl -v -XPOST https://api.dp.la/v2/api_key/YOUR_EMAIL@example.com" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "After running the command you should get an email with your API key. You'll then need to include this API key in every request you send to the DPLA API. For the sake of not sharing my own API key, I won't write it here. In fact, Python has a handy way for making sure that we don't share our login details in situations just like this. What we'll do is store our API key locally as an environment variable, kept out of our code and hidden away from GitHub. Python will read in that variable from our system and have access to the key in a safe way. This process is sometimes called **sanitizing**, because you're cleaning your code to make sure that sensitive information is hidden. Run the following terminal command:" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "deletable": true, "editable": true }, "source": [ "$ export API_KEY=YOUR_API_KEY_HERE" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Now our API key is stored locally, so we'll pull it into Python. To do that, we will pull in 'os', a Python module for interacting with the operating system on your computer, including its environment variables. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "import os\n", "my_api_key = os.getenv('API_KEY')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "deletable": true, "editable": true }, "source": [ "Now you should have your own API key stored, and we can use it to make requests. The DPyLA wrapper makes this easy. First we open a connection to the DPLA API. Notice how we're calling it with our stored my_api_key." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "dpla_connection = DPLA(my_api_key)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "If you follow along with the [documentation for the wrapper on GitHub](https://github.com/bibliotechy/DPyLA), you can actually see that the wrapper gives us a handy way of requesting our own API key for the first time. We could have done this instead of calling a command from the terminal to get that email sent to us. This is the line of code the documentation gives us for doing so:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'{\"message\":\"API key created and sent via email. Be sure to check your Spam folder, too.\"}'\n" ] } ], "source": [ "DPLA.new_key(\"your.email.address@here.com\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "But we did this from the command line instead. This is a good first indication of the ways that wrappers can make your lives easier. They provide easy shortcuts for things that we would have to do from scratch otherwise. Now that we're all set up with the API, we can use our dpla_connection object to get information from their API! Let's do a quick search for something." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "result = dpla_connection.search('austen')\n", "print(type(result))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Python's built-in type() function tells us what we're dealing with - notice that the API did not return us a list of items as you might expect. Instead, it's returned a Results object. This means that we can do all sorts of things to what we've gotten back, and simply dumping out the list of the search results is only one such choice. To see all the different commands that we might call on this object, you can call one of the built-in commands. Or, if you're working from iTerm2, you can type \"result.\" and hit tab twice to see options." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'dpla': , 'request': , 'count': 1586, 'limit': 10, 'start': 0, 'items': [{'@context': 'http://dp.la/api/items/context', '_id': 'nypl--510d47dc-7ea2-a3d9-e040-e00a18064a99', 'rights': 'http://rightsstatements.org/vocab/UND/1.0/', 'admin': {'object_status': 1}, '@id': 'http://dp.la/api/items/04625819cd330e54ada5a2c67dff70b8', 'object': 'http://images.nypl.org/index.php?id=1103430&t=t', 'aggregatedCHO': '#sourceResource', 'ingestDate': '2018-04-11T18:39:12.744966Z', '@type': 'ore:Aggregation', 'ingestionSequence': None, 'isShownAt': 'http://digitalcollections.nypl.org/items/510d47dc-7ea2-a3d9-e040-e00a18064a99', 'provider': {'@id': 'http://dp.la/api/contributor/nypl', 'name': 'The New York Public Library'}, 'sourceResource': {'hasType': '', '@id': 'http://dp.la/api/items/04625819cd330e54ada5a2c67dff70b8#sourceResource', 'format': ['Portraits', 'Clippings'], 'collection': 
{'@id': 'http://dp.la/api/collections/3f3\n" ] } ], "source": [ "print(str(result.__dict__)[:1000])" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The __dict__ attribute shows us a range of options. We can get the number of search results, the list of all items, and a couple other bits about the particular connection we've opened up. You can actually use these same tricks - __dict__ and dot + tabbing - to explore virtually every other thing that you will encounter in Python. They give you information about the objects that you're working with, which is half the battle in any Python situation. But for now let's get some more information about the API results we see. We'll take a look at the first object here." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "deletable": true, "editable": true, "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'@context': 'http://dp.la/api/items/context',\n", " '@id': 'http://dp.la/api/items/04625819cd330e54ada5a2c67dff70b8',\n", " '@type': 'ore:Aggregation',\n", " '_id': 'nypl--510d47dc-7ea2-a3d9-e040-e00a18064a99',\n", " 'admin': {'object_status': 1},\n", " 'aggregatedCHO': '#sourceResource',\n", " 'dataProvider': 'The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Print Collection. 
The New York Public Library',\n", " 'id': '04625819cd330e54ada5a2c67dff70b8',\n", " 'ingestDate': '2018-04-11T18:39:12.744966Z',\n", " 'ingestType': 'item',\n", " 'ingestionSequence': None,\n", " 'isShownAt': 'http://digitalcollections.nypl.org/items/510d47dc-7ea2-a3d9-e040-e00a18064a99',\n", " 'object': 'http://images.nypl.org/index.php?id=1103430&t=t',\n", " 'originalRecord': {'_id': '510d47dc-7ea2-a3d9-e040-e00a18064a99',\n", " 'collection': {'@id': 'http://dp.la/api/collections/3f371a92211aa17caadbb21fb98f3bd4',\n", " 'id': '3f371a92211aa17caadbb21fb98f3bd4'},\n", " 'genre': [{'#text': 'Clippings',\n", " 'authority': 'lctgm',\n", " 'valueURI': 'http://id.loc.gov/vocabulary/graphicMaterials/tgm002169'},\n", " {'#text': 'Portraits',\n", " 'authority': 'lctgm',\n", " 'valueURI': 'http://id.loc.gov/vocabulary/graphicMaterials/tgm008085'}],\n", " 'identifier': [{'#text': '242',\n", " 'displayLabel': 'Hades Collection Guide ID (legacy)',\n", " 'type': 'local_hades_collection'},\n", " {'#text': 'Portrait File',\n", " 'displayLabel': 'Other local Identifier',\n", " 'type': 'local_other'},\n", " {'#text': '348630',\n", " 'displayLabel': 'Hades struc ID (legacy)',\n", " 'type': 'local_hades'},\n", " {'#text': '36e7b8b0-c533-012f-49cc-58d385a7bc34', 'type': 'uuid'}],\n", " 'location': [{'physicalLocation': [{'#text': 'nn',\n", " 'authority': 'marcorg',\n", " 'type': 'repository'},\n", " {'#text': 'The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Print Collection',\n", " 'type': 'division'},\n", " {'#text': 'Wallach Division: Print Collection',\n", " 'type': 'division_short_name'},\n", " {'#text': 'PRN', 'type': 'code'}]},\n", " {'physicalLocation': [{'#text': 'The Miriam and Ira D. 
Wallach Division of Art, Prints and Photographs: Print Collection',\n", " 'type': 'division'},\n", " {'#text': 'Wallach Division: Print Collection',\n", " 'type': 'division_short_name'},\n", " {'#text': 'PRN', 'type': 'code'}]}],\n", " 'provider': {'@id': 'http://dp.la/api/contributor/nypl',\n", " 'name': 'The New York Public Library'},\n", " 'relatedItem': {'identifier': [{'#text': '34e5a7f0-c533-012f-aa43-58d385a7bc34',\n", " 'type': 'uuid'},\n", " {'#text': '348620', 'type': 'local_hades'}],\n", " 'relatedItem': {'identifier': [{'#text': '17513b20-c52e-012f-78fb-58d385a7bc34',\n", " 'type': 'uuid'},\n", " {'#text': '591659', 'type': 'local_hades'}],\n", " 'relatedItem': {'identifier': [{'#text': '16ad5350-c52e-012f-aecf-58d385a7bc34',\n", " 'type': 'uuid'},\n", " {'#text': '287739 local_hades_collection', 'type': 'local_hades'}],\n", " 'titleInfo': {'title': 'Print Collection portrait file'},\n", " 'type': 'host'},\n", " 'titleInfo': {'title': 'A'},\n", " 'type': 'host'},\n", " 'titleInfo': {'title': 'Jane Austen.'},\n", " 'type': 'host'},\n", " 'rightsStatementURI': 'http://rightsstatements.org/vocab/UND/1.0/',\n", " 'schemaLocation': 'http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd',\n", " 'subject': {'authority': '', 'topic': 'Public figures'},\n", " 'titleInfo': {'supplied': 'no', 'title': 'J. Austen.', 'usage': 'primary'},\n", " 'tmp_high_res_link': None,\n", " 'tmp_image_id': '1103430',\n", " 'tmp_item_link': 'http://digitalcollections.nypl.org/items/510d47dc-7ea2-a3d9-e040-e00a18064a99',\n", " 'tmp_rights_statement': 'The copyright and related rights status of this item has been reviewed by The New York Public Library, but we were unable to make a conclusive determination as to the copyright status of the item. 
You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use.',\n", " 'typeOfResource': 'still image',\n", " 'version': '3.4'},\n", " 'provider': {'@id': 'http://dp.la/api/contributor/nypl',\n", " 'name': 'The New York Public Library'},\n", " 'rights': 'http://rightsstatements.org/vocab/UND/1.0/',\n", " 'score': 7.755912,\n", " 'sourceResource': {'@id': 'http://dp.la/api/items/04625819cd330e54ada5a2c67dff70b8#sourceResource',\n", " 'collection': {'@id': 'http://dp.la/api/collections/3f371a92211aa17caadbb21fb98f3bd4',\n", " 'id': '3f371a92211aa17caadbb21fb98f3bd4',\n", " 'title': 'Print Collection portrait file'},\n", " 'format': ['Portraits', 'Clippings'],\n", " 'hasType': '',\n", " 'relation': ['Jane Austen', 'A', 'Print Collection portrait file'],\n", " 'stateLocatedIn': [{'name': 'New York'}],\n", " 'subject': [{'name': 'Public figures'}],\n", " 'title': 'J. Austen',\n", " 'type': ['image']}}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item = result.items[0]\n", "item" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We get a _lot_ of information from a source like this - far more than you probably wanted to know about this individual object. This is what makes API's both useful and tricky to work with. They often want to set up users with everything they could possibly need, but they can't know what it is that users will be interested in. So very often they seem to err on the side of completeness, which can sometimes make it difficult to parse the results. One difficult piece here is that the information is hierarchical - the data is organized a bit like a tree. So you have to respect that hierarchy by unfolding it as it expects. The first line below does not work, but the second does. Can you see why?" 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "ename": "KeyError", "evalue": "'stateLocatedIn'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mitem\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'stateLocatedIn'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mKeyError\u001b[0m: 'stateLocatedIn'" ] } ], "source": [ "item['stateLocatedIn']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "[{'name': 'New York'}]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item['sourceResource']['stateLocatedIn']" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "There is no top-level key for 'stateLocatedIn'. That data is actually organized under 'sourceResource', so we have to tell the script exactly where we want to look. We can confirm this by walking down the tree towards the data we're interested in. 
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "{'@id': 'http://dp.la/api/items/04625819cd330e54ada5a2c67dff70b8#sourceResource',\n", " 'collection': {'@id': 'http://dp.la/api/collections/3f371a92211aa17caadbb21fb98f3bd4',\n", " 'id': '3f371a92211aa17caadbb21fb98f3bd4',\n", " 'title': 'Print Collection portrait file'},\n", " 'format': ['Portraits', 'Clippings'],\n", " 'hasType': '',\n", " 'relation': ['Jane Austen', 'A', 'Print Collection portrait file'],\n", " 'stateLocatedIn': [{'name': 'New York'}],\n", " 'subject': [{'name': 'Public figures'}],\n", " 'title': 'J. Austen',\n", " 'type': ['image']}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.items[0]['sourceResource']" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "This confirms what we had suggested before - things are nested in ways that can be difficult to parse. Imagine saying, \"to find Y, first you have to look under X\" rather than saying \"look at Y. Why can't you find it?\" We can get more information by querying 'item.keys()'. We're dealing with a dictionary object, so we can use all the normal dictionary commands." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "dict_keys(['@context', '_id', 'rights', 'admin', '@id', 'object', 'aggregatedCHO', 'ingestDate', '@type', 'ingestionSequence', 'isShownAt', 'provider', 'sourceResource', 'ingestType', 'dataProvider', 'originalRecord', 'id', 'score'])" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "item.keys()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Another helpful tool is [http://jsonviewer.stack.hu/](http://jsonviewer.stack.hu/), which can help you format things a little bit to make JSON more workable. Let's loop over the first nine items here to get some interesting information about them. Notice here that 'item' is our name for the individual object that we are looking at in each iteration of the loop." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'name': 'New York'}]\n", "[{'name': 'New York'}]\n" ] }, { "ename": "KeyError", "evalue": "'stateLocatedIn'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mitem\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;36m9\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m 
\u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mitem\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'sourceResource'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'stateLocatedIn'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mKeyError\u001b[0m: 'stateLocatedIn'" ] } ], "source": [ "for item in result.items[:9]:\n", " print(item['sourceResource']['stateLocatedIn'])" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Several New York Objects, and then an error. Let's look at the fifth object to see what is wrong with it. " ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "{'@id': 'http://dp.la/api/items/280187e66f0c3af7a6216953cd4517c0#sourceResource',\n", " 'creator': ['Firkins, Oscar W., 1864-1932'],\n", " 'date': {'begin': '1965', 'displayDate': '1965 [c1920]', 'end': '1965'},\n", " 'extent': ['ix, 254 p. 23 cm.'],\n", " 'format': ['Electronic resource', 'Language material'],\n", " 'identifier': ['(MiU)000426718',\n", " 'sdr-miu000426718',\n", " '(OCoLC)356048',\n", " '(CaOTULAS)160214876',\n", " '(RLIN)MIUG86-B56754',\n", " 'LC call number: PR4036 .F5 1965',\n", " 'Hathi: 000426718'],\n", " 'language': [{'iso639_3': 'eng', 'name': 'English'}],\n", " 'publisher': ['New York, Russell & Russell'],\n", " 'rights': 'Public domain. 
Learn more at http://www.hathitrust.org/access_use',\n", " 'specType': ['Book'],\n", " 'subject': [{'name': 'Austen, Jane, 1775-1817'}],\n", " 'title': ['Jane Austen'],\n", " 'type': ['text']}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.items[4]['sourceResource']" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The first several results all appear to be held by libraries, while the fifth result is an electronic resource. It makes sense that this resource would not be held in a particular state. Let's make our results a bit more nuanced so as to account for these edge cases." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'name': 'New York'}]\n", "[{'name': 'New York'}]\n", "Photographs\n", "[{'name': 'New York'}]\n", "['Electronic resource', 'Language material']\n", "[{'name': 'New York'}]\n", "['Electronic resource', 'Language material']\n", "['Biography', 'Electronic resource', 'Language material']\n", "['Electronic resource', 'Language material']\n" ] } ], "source": [ "for item in result.items[:9]:\n", " if 'stateLocatedIn' in item['sourceResource']:\n", " print(item['sourceResource']['stateLocatedIn'])\n", " else:\n", " print(item['sourceResource']['format'])\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Above we checked to see if the 'sourceResource' dictionary has a particular key, which allows us to skip over electronic resources. And notice how we have a couple different formats for state names already! The first four list the full state, while the last item lists an abbreviation. This can get very tricky very quickly, and it points to why data cleaning is one of the most important tasks you do as a programming humanist. 
If we were interested in working across these state names, but they are formatted inconsistently, we would have to clean them up." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "API's frequently limit the number of requests you can make to their service during a particular time period. For example, Twitter limits the number of requests you can make to 15 requests per 15 minutes. This ensures that you don't accidentally blow up their system with requests while you're learning, but it also helps to ensure that people are using their service for legitimate scripts rather than incessant spam bots." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "That's the easy way to do things. When you're interested in using the data provided by a service, you should always look to see if they have an API. And whenever there is an API, it is worth looking to see whether there is also a wrapper for you to use. There are often different wrappers for different programming languages as well, but Python is a pretty common and popular language. So you'll often find someone else's work that you can build on." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Before we move on, I want to give just a taste of how to do things the hard way, if you didn't have a wrapper for this particular API. There are a few things you need to know:\n", "\n", "* API Endpoint: the base URL that will be responding to your requests. Think of API's like data that live at particular URLs. If you've ever looked at the URL for a page you're on and seen something like shoppingbaseurl.com/q?=shoes&type=mens&cost=expensive, you're using something similar to an API. Basically, using an API entails constructing a URL that points to exactly the data you want. 
The API consists of the baseurl that gets you to the root/heart/doorway of the API, and then you give params to nuance your search/request.\n", "* Search Parameters (Params for Short): the particular things you send with your request to get back the information you want. Remember how Python dictionaries used key: value pairs? We'll do the same thing here. In the case of the previous example, you have three params: a search query, a type, and a cost. If this were a Python dictionary, we might write that as {'search': 'shoes', 'type': 'mens', 'cost': 'expensive'}.\n", "* API key: we've already covered this, but the API key is what authenticates you with the API you're using so that they will allow you to use it. Sometimes, you simply pass your key as an additional search parameter. Other times, you might have to authenticate with a separate service (like OAuth).\n", "\n", "First, let's import the Python libraries that we'll need:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "import requests" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "'requests' is a library that allows you to make requests to an API. Now we'll store our API endpoint so that we know where we will be making requests to. In this case, we can find our API endpoint by looking at DPLA's great [documentation](https://dp.la/info/developers/codex/api-basics/). Not every API provides you with such great guides to their work, so thanks to the wonderful people at DPLA for making this information available!" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "endpoint = 'https://api.dp.la/v2/items'" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Now we will set up our search params. 
If we're still working in the same terminal session, we should have our API key stored in 'my_api_key'. And it's important to note that not just any search parameters would work. The API documentation specifies which pieces are allowed. If we sent over 'how_great_is_ethan' as a part of the request, it would not function." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "params = {\n", " 'api_key': my_api_key,\n", " 'q': 'Austin, Texas',\n", " }" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "I've set up a basic search here for information about Austin, Texas, and I've given the params my personal API key to authenticate me. I've also specified that I want to get items back. The handy thing about the 'requests' library is that it will mash all this together for us in a valid URL." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "'https://api.dp.la/v2/items?api_key=a21b41e2e2d41cc3f488e6597078bafc&q=Austin%2C+Texas'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requested_the_hard_way = requests.get(endpoint, params)\n", "requested_the_hard_way.status_code\n", "requested_the_hard_way.url" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "You might be more familiar with a 404 code, which is what you get if you go to a webpage that doesn't exist. A 200 code is one that we don't often see, because it means that things went OK! For the sake of comparison, I've pulled in our old connection that used the wrapper and renamed it alongside our request done from scratch.\n", "\n", "Why is this harder?\n", "\n", "For one, we had to find out the correct API endpoint and parameters. Sometimes this is easier said than done. 
For example, my first attempt at using the API actually failed - I assumed that 'items' was part of the parameters rather than the endpoint, because you can actually get different types of things beyond items - collections, for example. Let's take a look at the things we could do with our 'requested_the_hard_way' object:\n", "```\n", "res.apparent_encoding res.elapsed res.is_redirect res.ok res.status_code\n", "res.close( res.encoding res.iter_content( res.raise_for_status( res.text\n", "res.connection res.headers res.iter_lines( res.raw res.url\n", "res.content res.history res.json( res.reason\n", "res.cookies res.is_permanent_redirect res.links res.request\n", "```\n", "As we can see, there's nothing here that actually relates to the DPLA yet - all we have are actions that relate to the API - ways to check what kind of data we asked for, where we asked for it, and how. The data we got in response is here, though, in two places - .json and .text. The latter gives a long string version of the response, while .json gives us a formatted version of the data. JSON stands for JavaScript Object Notation, and we can interact with it in ways similar to a Python dictionary." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "81195" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requested_the_hard_way.json()['count']" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Our search returned 81195 results (at the time of writing - your results might be different)! Let's take a closer look. 
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "{'@context': 'http://dp.la/api/items/context',\n", " '@id': 'http://dp.la/api/items/c2db8605123c7b7d396c1e3b164d1e51',\n", " '@type': 'ore:Aggregation',\n", " '_id': 'smithsonian--http://collections.si.edu/search/results.htm?q=record_ID%3Anasm_A19960136000&repo=DPLA',\n", " 'aggregatedCHO': '#sourceResource',\n", " 'dataProvider': 'National Air and Space Museum',\n", " 'id': 'c2db8605123c7b7d396c1e3b164d1e51',\n", " 'ingestDate': '2018-04-10T10:24:40.597319Z',\n", " 'ingestType': 'item',\n", " 'ingestionSequence': None,\n", " 'isShownAt': 'http://collections.si.edu/search/results.htm?q=record_ID%3Anasm_A19960136000&repo=DPLA',\n", " 'object': 'https://airandspace.si.edu/webimages/collections/thumbnails/a19960136000cp04.jpg',\n", " 'originalRecord': {'_id': 'nasm_A19960136000',\n", " 'collection': [{'@id': 'http://dp.la/api/collections/c349831fbbb138b7176675dc83b12460',\n", " 'id': 'c349831fbbb138b7176675dc83b12460',\n", " 'title': 'National Air and Space Museum Collection'}],\n", " 'descriptiveNonRepeating': {'data_source': 'National Air and Space Museum',\n", " 'online_media': {'@mediaCount': '1',\n", " 'media': {'#text': 'https://airandspace.si.edu/webimages/collections/full/a19960136000cp04.jpg',\n", " '@idsId': 'https://airandspace.si.edu/webimages/collections/full/a19960136000cp04.jpg',\n", " '@thumbnail': 'https://airandspace.si.edu/webimages/collections/thumbnails/a19960136000cp04.jpg',\n", " '@type': 'Images'}},\n", " 'record_ID': 'nasm_A19960136000',\n", " 'record_link': 'http://collections.si.edu/search/results.htm?q=record_ID%3Anasm_A19960136000&repo=DPLA',\n", " 'title': {'#text': 'Austin', '@label': 'Title'},\n", " 'title_sort': 'AUSTIN',\n", " 'unit_code': 'NASM'},\n", " 'freetext': {'dataSource': {'#text': 'National Air and Space Museum',\n", " '@label': 'Data Source'},\n", " 
'identifier': {'#text': 'A19960136000', '@label': 'Inventory Number'},\n", " 'name': {'#text': 'Muse Air', '@label': 'Sponsor'},\n", " 'notes': [{'#text': 'Muse Air makes flying beautiful AUSTIN. Color, offset photolithograph advertising air travel to Austin, Texas. A white paper sculpture of the capital building on a blue background.',\n", " '@label': 'Physical Description'},\n", " {'#text': 'Fly Now: The National Air and Space Museum Poster Collection',\n", " '@label': 'Summary'},\n", " {'#text': 'Throughout their history, posters have been a significant means of mass communication, often with striking visual effect. Wendy Wick Reaves, the Smithsonian Portrait Gallery Curator of Prints and Drawings, comments that \"sometimes a pictorial poster is a decorative masterpiece-something I can\\'t walk by without a jolt of aesthetic pleasure. Another might strike me as extremely clever advertising … But collectively, these \\'pictures of persuasion,\\' as we might call them, offer a wealth of art, history, design, and popular culture for us to understand. The poster is a familiar part of our world, and we intuitively understand its role as propaganda, promotion, announcement, or advertisement.\"',\n", " '@label': 'Summary'},\n", " {'#text': \"Reaves' observations are especially relevant for the impressive array of aviation posters in the National Air and Space Museum's 1300+ artifact collection. Quite possibly the largest publicly-held collection of its kind in the United States, the National Air and Space Museum's posters focus primarily on advertising for aviation-related products and activities. 
Among other areas, the collection includes 19th-century ballooning exhibition posters, early 20th-century airplane exhibition and meet posters, and twentieth-century airline advertisements.\",\n", " '@label': 'Summary'},\n", " {'#text': 'The posters in the collection represent printing technologies that include original lithography, silkscreen, photolithography, and computer-generated imagery. The collection is significant both for its aesthetic value and because it is a unique representation of the cultural, commercial and military history of aviation. The collection represents an intense interest in flight, both public and private, during a significant period of its technological and social development.',\n", " '@label': 'Summary'},\n", " {'#text': 'Copyright Disclosure for Orphaned Works', '@label': 'Summary'},\n", " {'#text': \"Whenever possible, the museum provides factual information about copyright owners and related matters in its records and other texts related to the collections. For many of the images in this collection, some of which were created for or by corporate entities that no longer exist, the museum does not own any copyrights. Therefore, it generally does not grant or deny permission to copy, distribute or otherwise use material in this collection. If identified, permission and possible fees may be required from the copyright owner independently of the museum. It is the user's obligation to determine and satisfy copyright or other use restrictions when copying, distributing or otherwise using materials found in the museum's collections. Transmission or reproduction of protected materials beyond that allowed by fair use requires the written permission of the copyright owners. 
Users must make their own assessments of rights in light of their intended use.\",\n", " '@label': 'Summary'},\n", " {'#text': \"If you have any more information about an item you've seen in the Fly Now: The National Air and Space Museum Poster Collection, or if you are a copyright owner and believe we have not properly attributed your work to you or have used it without permission, we want to hear from you. Please contact pisanod@si.edu with your contact information and a link to the relevant content.\",\n", " '@label': 'Summary'},\n", " {'#text': \"View more information about the Smithsonian's general copyright policies at http://www.si.edu/termsofuse\",\n", " '@label': 'Summary'}],\n", " 'objectRights': {'#text': 'Do not reproduce without permission from the Smithsonian Institution, National Air and Space Museum',\n", " '@label': 'Restrictions & Rights'},\n", " 'objectType': {'#text': 'ART-Posters, Original Art Quality',\n", " '@label': 'Type'},\n", " 'physicalDescription': [{'#text': 'Poster, Advertising, Commercial Aviation',\n", " '@label': 'Medium'},\n", " {'#text': '2-D - Unframed (H x W): 81.3 x 61cm (32 x 24 in.)',\n", " '@label': 'Dimensions'}],\n", " 'place': {'#text': 'United States of America',\n", " '@label': 'Country of Origin'},\n", " 'setName': {'#text': 'National Air and Space Museum Collection',\n", " '@label': 'See more items in'}},\n", " 'indexedStructured': {'name': 'Muse Air',\n", " 'object_type': ['Posters', 'Works of art'],\n", " 'online_media_type': 'Images',\n", " 'place': 'United States of America',\n", " 'usage_flag': ['Aviation', 'Commercial', 'Art']},\n", " 'provider': {'@id': 'http://dp.la/api/contributor/smithsonian',\n", " 'name': 'Smithsonian Institution'}},\n", " 'provider': {'@id': 'http://dp.la/api/contributor/smithsonian',\n", " 'name': 'Smithsonian Institution'},\n", " 'score': 6.5518055,\n", " 'sourceResource': {'@id': 'http://dp.la/api/items/c2db8605123c7b7d396c1e3b164d1e51#sourceResource',\n", " 'collection': [{'@id': 
'http://dp.la/api/collections/c349831fbbb138b7176675dc83b12460',\n", " 'id': 'c349831fbbb138b7176675dc83b12460',\n", " 'title': 'National Air and Space Museum Collection'}],\n", " 'description': ['Muse Air makes flying beautiful AUSTIN. Color, offset photolithograph advertising air travel to Austin, Texas. A white paper sculpture of the capital building on a blue background.',\n", " 'Fly Now: The National Air and Space Museum Poster Collection',\n", " 'Throughout their history, posters have been a significant means of mass communication, often with striking visual effect. Wendy Wick Reaves, the Smithsonian Portrait Gallery Curator of Prints and Drawings, comments that \"sometimes a pictorial poster is a decorative masterpiece-something I can\\'t walk by without a jolt of aesthetic pleasure. Another might strike me as extremely clever advertising … But collectively, these \\'pictures of persuasion,\\' as we might call them, offer a wealth of art, history, design, and popular culture for us to understand. The poster is a familiar part of our world, and we intuitively understand its role as propaganda, promotion, announcement, or advertisement.',\n", " \"Reaves' observations are especially relevant for the impressive array of aviation posters in the National Air and Space Museum's 1300+ artifact collection. Quite possibly the largest publicly-held collection of its kind in the United States, the National Air and Space Museum's posters focus primarily on advertising for aviation-related products and activities. Among other areas, the collection includes 19th-century ballooning exhibition posters, early 20th-century airplane exhibition and meet posters, and twentieth-century airline advertisements.\",\n", " 'The posters in the collection represent printing technologies that include original lithography, silkscreen, photolithography, and computer-generated imagery. 
The collection is significant both for its aesthetic value and because it is a unique representation of the cultural, commercial and military history of aviation. The collection represents an intense interest in flight, both public and private, during a significant period of its technological and social development.',\n", " 'Copyright Disclosure for Orphaned Works',\n", " \"Whenever possible, the museum provides factual information about copyright owners and related matters in its records and other texts related to the collections. For many of the images in this collection, some of which were created for or by corporate entities that no longer exist, the museum does not own any copyrights. Therefore, it generally does not grant or deny permission to copy, distribute or otherwise use material in this collection. If identified, permission and possible fees may be required from the copyright owner independently of the museum. It is the user's obligation to determine and satisfy copyright or other use restrictions when copying, distributing or otherwise using materials found in the museum's collections. Transmission or reproduction of protected materials beyond that allowed by fair use requires the written permission of the copyright owners. Users must make their own assessments of rights in light of their intended use.\",\n", " \"If you have any more information about an item you've seen in the Fly Now: The National Air and Space Museum Poster Collection, or if you are a copyright owner and believe we have not properly attributed your work to you or have used it without permission, we want to hear from you. 
Please contact pisanod@si.edu with your contact information and a link to the relevant content.\",\n", " \"View more information about the Smithsonian's general copyright policies at http://www.si.edu/termsofuse\"],\n", " 'extent': ['2-D - Unframed (H x W): 81.3 x 61cm (32 x 24 in.)'],\n", " 'format': 'Poster, Advertising, Commercial Aviation',\n", " 'spatial': [{'country': 'United States', 'name': 'United States'}],\n", " 'stateLocatedIn': [{'name': 'Washington, D.C.'}],\n", " 'subject': [{'name': 'Muse Air'}],\n", " 'title': 'Austin',\n", " 'type': ['image']}}" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "requested_the_hard_way.json()['docs'][0]" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The API limits our results to ten per page by default, so in order to get an analysis of the full sweep of the search results, we'll need to step through each page and add the results together, like so.\n", "\n", "We use the math.ceil function to round up the number we get. In this expanded version of the API call we set page_size to 500, so each page holds up to 500 results. By dividing the total number of results by 500, we'll get how many pages we need to iterate over. We round up because the last page will usually be only partly full. To do this over the full collection of results, we would go all the way up to the total page count - 163 pages for the 81195 results above. We'll just go 20 pages in to save bandwidth and time. Surely 10,000 results is enough to do something interesting." 
] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "163\n", "1\n", "2\n", "3\n", "4\n", "5\n", "6\n", "7\n", "8\n", "9\n", "10\n", "11\n", "12\n", "13\n", "14\n", "15\n", "16\n", "17\n", "18\n", "19\n", "20\n", "10000\n" ] } ], "source": [ "import math\n", "\n", "total_hard_way_results = []\n", "params = {\n", " 'api_key': my_api_key,\n", " 'q': 'Austin, Texas',\n", " 'page_size': '500'\n", " }\n", "total_results = requested_the_hard_way.json()['count']\n", "number_of_pages = math.ceil(total_results / 500)\n", "print(number_of_pages)\n", "list_of_page_numbers = range(1, 21, 1)\n", "for page_number in list_of_page_numbers:\n", " print(page_number)\n", " params['page'] = page_number\n", " for result in requests.get(endpoint, params).json()['docs']:\n", " total_hard_way_results.append(result)\n", " \n", "print(len(total_hard_way_results))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "deletable": true, "editable": true, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'other_format': 3012, 'Washington, D.C.': 9, 'Massachusetts': 15, 'New York': 17, 'Indiana': 8, 'South Carolina': 3, 'Montana': 1, 'Illinois': 9, 'Virginia': 2, 'North Carolina': 6, 'Wisconsin': 4, 'Tennessee': 2, 'Michigan': 2, 'UT': 28, 'Missouri': 18}\n" ] } ], "source": [ "state_results = {}\n", "state_results['other_format'] = 0\n", "for item in total_hard_way_results:\n", " if 'stateLocatedIn' in item['sourceResource']:\n", " if item['sourceResource']['stateLocatedIn'][0]['name'] in state_results:\n", " state_results[item['sourceResource']['stateLocatedIn'][0]['name']] += 1\n", " else: \n", " state_results[item['sourceResource']['stateLocatedIn'][0]['name']] = 1\n", " elif 'format' in item['sourceResource'] and item['sourceResource']['format'] != 'Text':\n", " 
state_results['other_format'] += 1\n", " else:\n", " pass\n", "print(state_results)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We see a number of interesting things here. For one, we see that Texas has the largest number of holdings about Texas. This makes sense. But look how low those numbers are. We can start to get a sense of how difficult it is to work with this API data. We have 10,000 records we're looking at, but only a small percentage of them have physical locations. We counted many of the remaining materials under the other_format key, which covers images, electronic resources, and more. But where are all the other search results coming from? More information would be required, but this tells us that the DPLA is just that - Digital. While it might aggregate materials for some physical holdings, it primarily deals in materials that live electronically." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Exercises\n", "\n", "1. Write your own script to use the DPLA API for a search item using the Python wrapper.\n", "2. Do the same search, but from scratch rather than using the wrapper.\n", "3. Manipulate the DPLA metadata to do something interesting with the results." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "\n", "## Potential Answers" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "'http://ark.digitalcommonwealth.org/ark:/50959/5h73tn08m'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1. Write your own script to use the DPLA API for a search item using the Python wrapper.\n", "# Let's find a dog-related item! 
We'll dig through the JSON to find out the URL for the original item.\n", "\n", "\n", "from dpla.api import DPLA\n", "import os\n", "my_api_key = os.getenv('API_KEY')\n", "dpla_connection = DPLA(my_api_key)\n", "result = dpla_connection.search('dogs')\n", "item = result.items[0]\n", "item\n", "# item['@id']\n", "item['originalRecord']['metadata']['mods:mods']['mods:identifier'][1]['#text']" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/plain": [ "'http://ark.digitalcommonwealth.org/ark:/50959/5h73tn08m'" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 2. Do the same search, but from scratch rather than using the wrapper.\n", "import requests\n", "endpoint = 'https://api.dp.la/v2/items'\n", "params = {\n", " 'api_key': my_api_key,\n", " 'q': 'dogs',\n", " }\n", " \n", "requested_the_hard_way = requests.get(endpoint, params)\n", "requested_the_hard_way.status_code\n", "requested_the_hard_way.url\n", "item = requested_the_hard_way.json()['docs'][0]\n", "item['originalRecord']['metadata']['mods:mods']['mods:identifier'][1]['#text']" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Military', 'Military and War', 'Bellefield', 'Wildlife resources', 'Sciuridae', 'Van Nuys (Los Angeles, Calif.)', 'Trees', 'Mountains', 'Sheep', 'Textiles', 'Itasca State Park (Minn.).', 'Museum exhibits--Texas--Austin', 'Snow', 'Dog leashes--California--San Fernando', 'Dogs--California--San Fernando', 'Facial Nerve Diseases', 'Prairies', 'Business, Economics and Finance - Journalism', 'Fire fighters--Indiana--Fort Wayne', 'Saint Bernard dog', 'Deer Creek (Ill.)', 'Police', 'Dwellings--California--Reseda (Los Angeles)', 'Farming and Farms', 'Ashton, Benjamin', 'Searches and seizures', 
'Recreation Park (San Fernando, Calif.)', 'Corrals', 'Cats--Juvenile literature', 'Anthropology', 'Classrooms', 'Burns, Robert, 1759-1796', 'Eating and drinking.', 'Cooking (Meat)', 'Singers', 'De Heredia, Carlos, -1918', 'Appleton, Joan Egleston, 1912-2006', 'Portrait photographs', 'Relief prints', 'Celebrations', 'Meat', 'Underwood & Underwood', 'Mexican American art', 'Bridges', 'Peale, Titian Ramsay, 1799-1885', 'Night photographs--1951-1960', 'Piers and wharves--Tahoe, Lake, Region (Calif. and Nev.)', 'Dogs--Folklore', 'Ethnic', 'Animal training', 'Two Dogs', 'Children (people by age group)--People by age group', 'Animals--Dogs', 'Filipino', 'Bear hunting', 'Graffiti--Georgia--Savannah', 'Basco, Alice', 'Borglum, Solon H', 'Trees--California--Los Angeles', 'Country life', 'Bird banding.', 'State of being', 'Durr, Louis Monroe--Family', 'Homes and haunts', 'Sleep', 'Canidae', 'Williams, John Harvey--Family', 'Islands of the Pacific', 'Legacy Digitization', 'Wagons', 'Dogs--Georgia', 'Photographs', 'Man', 'Dogs in art', 'Pets--California--San Fernando', 'Tables--California--San Fernando', 'Dogs--California--San Fernando Valley', 'Bromfield, Louis, 1896-1956', 'San Fernando (Calif.)', 'Filipinos', 'Veterinarians--California--Los Angeles', 'Paintings', 'Azara, Felix de, 1746-1811', 'Vodka', 'Women', 'Universities and colleges', 'Endangered species--Protection', 'Girls--California--Pasadena', 'Cats--Diseases', 'Natural History', \"Publishers' advertisements--New Hampshire--Concord\", 'Full length', 'H.C. 
White Co', 'Signs and signboards', 'Dogs--California--Van Nuys (Los Angeles)', 'Ski Archives', 'Utah', '636.7M57H', \"All Souls' Day--Texas--Austin--Photographs\", 'Animal Rescue League of Hennepin County', 'Zola, Émile, 1840-1902', 'Paul Singer collection', 'Ladd, Lawrence W', 'Pointing dogs', 'Eskimos', 'Indians of North America', 'Farm life', 'Plantations--South Carolina', 'Albritton, Joe W', 'Dwellings--California--Van Nuys (Los Angeles)', 'Mansions', '1910-1919', 'Ethnology', 'Ponies', 'Men--California--Los Angeles', 'Sharp, Joseph Henry', 'Double portraits', 'College campuses', 'Women--Portraits', 'Prairie dogs,Montana', 'Horses', 'Textile industry--North Carolina--Greensboro', 'DeKalb County (Ga.)', 'Baskets', 'Estates', 'Portraits--Bonfils, Winifred', 'Printed wrappers (Binding)', 'Oklahoma', 'Landscapes', 'North Dakota.', 'Black newspapers - Tennessee - Memphis', 'Art', 'Schipperke', 'Fences--California--Los Angeles', 'Mammals', 'Dogs--California--Los Angeles', 'Forests and forestry', 'Dogs, Breeds Of Spaniel Cocker #1', 'Valley Times Collection photographs', 'Ski patrollers--Utah--Alta--Photographs', 'Signs and signboards--Georgia--Savannah', 'World War, 1939-1945', 'Shoes', 'Clark County (Idaho)', 'Rodentia', 'Boys--California--San Fernando', 'Bears', 'Cone Mills Corporation', 'Fences', 'Blind', 'Dennis, Morgan, 1892-1960', '1940-1949', 'Day of the Dead', 'Ranches and ranching', 'Foxes', 'Entertainers', 'Study', 'Juvenile literature--1843', 'Houston, Henry Howard, II, 1895-1918', 'Boys--California--Los Angeles', 'Couples--Portraits', 'Spectators--California--San Fernando', 'Dog walking', 'Barr, Burt', 'Navajo Indian--Education', 'Wildlife management', 'Costume', 'Preserving Historic Hobo Day', 'The Hans Syz Collection', 'Animals, Laboratory--Bibliography', 'Circus animals', 'Ewing, Albert J. 
(1870-1934)', 'Meissen Manufactory', 'Recreation', 'Shop signs--Georgia--Savannah', 'Durr, Louis Monroe', 'Men', 'Malabar Farm', 'Dogs--Training--California--Los Angeles', 'Pekingese dog', 'Architecture-Out Buildings', 'Girls--California--San Fernando', 'Dogs--California--Oakland', 'Setters (Dogs)', 'Alta', 'Maniaci, Bonnie', 'Hunting and Fishing', 'Georgetown County (S.C.)--Pictorial Works', 'Nature', 'Cameras--California--Los Angeles', 'Drinking fountains', 'Other', 'Animals in human situations', 'Dogs--California--Pasadena', 'Southern Illinois University at Carbondale', 'MPR News Feature', 'Landscape photographs', 'Natural resources', 'Houses', 'Manufacturing', 'Watchdogs', 'Williams, John Harvey', 'Pointer (Dog breed)', 'Soldiers', 'Philadelphia (Pa.). Police Department', 'Portrait male', '1900-1909', 'Animal behavior', 'Dogs--California--Long Beach', 'Hunting', 'Ashdale Farm', 'Occupation', 'Skis and skiing', 'Liebler, H. Baxter (Harold Baxter), 1889-1982', 'Herding dogs', 'Laurie, Annie,, 1869-1936', 'Public figures', 'Hobo Day', 'History', '1950s', 'Morris Animal Refuge (Philadelphia, Pa)', 'Oakland (Calif.)--Photographs', 'Dogs', 'Historic buildings', 'Animal stories', 'Books--Reviews', 'Circus [dogs]', 'Northridge (Los Angeles, Calif.)', 'College Students', 'Social and Family Life', 'Chapbooks--Specimens', 'Silos', 'African-Americans', 'Sports in art', 'Wire fencing--California--Los Angeles', 'Scottish terrier', 'Prairie dogs,Poisons,Montana', 'Camels', 'Children in popular culture', 'Memphis World', 'Dogs as laboratory animals', 'Dog shows', 'Ski resorts', 'Figure male', 'Rabies vaccines--California--Los Angeles', 'Wildlife conservation', 'Pasadena (Calif.)', 'Trained animals', 'Cook, Henry Harvey, 1822-1905', 'Radium', 'Pet shows--California--San Fernando', 'Bird dogs', 'Saluki', 'SPCA International', 'University of Virginia', 'Authors', 'Toys--California--San Fernando Valley', 'Dwellings--California--Eagle Rock (Los Angeles)', 'Animals in art', 'Hurdles 
(Fences)--California--Los Angeles', 'CNS Lyme Disease', 'Greyhounds', 'Chicanos', 'Award winners--California--San Fernando', 'Dogs--Diseases', 'Ribbons', 'Animals--Abnormalities', 'Wagon trains', 'Dog owners--California--Los Angeles', 'Humphreys Co. (Miss.)', 'Georgetown Street Scenes, (not Front Street)', 'Yorkshire terrier--California--Los Angeles', 'Dog adoption', 'Puppies--California--Los Angeles', 'Logging', 'Outdoors', 'Kent County Humane Society', 'Joe Munroe', 'Lawns--California--Los Angeles', 'Individuals', 'Alcoholic beverages', 'Police dogs', 'Glass negatives', 'Hillside', 'Landscape', 'Mardi Gras, 1938', 'Urine--Analysis', 'Dirt roads', 'Chinese', 'Duquet, Julie', 'Sleeping', 'Forest', 'Children and animals', 'Rabies--Vaccination--California--Los Angeles', 'Birds.', 'South Dakota State University', 'Prairie dogs,Montana,Poisons', 'World War, 1914-1918', 'Moseley, Vonnie', 'Radios', 'Indians of North America--Religion', 'Chapbooks--New Hampshire--Concord--19th century', 'Natural history', 'Wagons--Wheels', 'Collies', 'Hunting dogs--North Carolina--Greenville', 'Cottages', 'McMorrough, W. A.--Family', 'Women Societies and clubs', 'Fox hunting', 'Sheet-metal, Corrugated', 'Plantation life--South Carolina', 'Pups in Basket', 'Dogs--Breeding', 'Animal', 'Blepharoptosis', 'Dachshunds', 'Dogs.', 'Portaits', 'Women--California--San Fernando', 'Philippines', 'Animals', 'Postcards', 'Sports', 'Dogs--Obedience trials--Novice classes', 'Rifles', 'Parks for dogs', 'Puppies', 'Loggers', 'Migration.', 'Signs and signboards--California--Los Angeles', 'Hunting--South Carolina', 'Endangered species--Habitat', 'Animals. Pets. Dogs.', 'Domestic Furnishings', 'Navajo', 'Minority business enterprises--Georgia--Savannah', 'Dog', 'Animal cages', 'The Hans C. 
Syz Collection', 'Erickson, Marilyn', 'Minnesota State Fair', 'Dogs--Pictorial works', 'De Heredia, Georgie Cook, -1946', 'Parks--California--San Fernando', 'Exercise--Bibliography', 'Dogs--Mammals', 'Church, James Edward, 1869-1959', 'Weapons', 'Environment & Nature People', 'Dogs--Training', 'Mexicans', 'Barns', 'Portrait photography--United States--History', 'Dogs--Bibliography', 'Minnesota.', 'University of Pennsylvania. Meyerson Hall', 'McDonald, Walter S. 1921-1976', 'Chinese Art', 'Lakes and Rivers', 'Transportation', 'Children and animals--California--Los Angeles', 'Glaspell, Susan, 1876-1948', 'Chapbooks--1843', 'Borders (Ornament Areas)', 'United States-South Carolina-Georgetown County', 'Dogs--Exercise--Bibliography', 'Reseda (Los Angeles, Calif.)', 'Lawns--California--San Fernando Valley', 'Día de los Muertos', 'Newton Co. (Miss.)', 'Photography--Equipment and supplies', 'Homes', 'Customs', 'San Fernando Valley (Calif.)', 'Awards--California--San Fernando Valley', 'Working class--United States', 'Livestock protection dogs', 'Basset hound', 'Animals - Dogs', 'Pets', 'Valley Obedience Club', 'Environmental education', 'Dogs--Legends and stories', 'Father Liebler', 'Shades of L.A. Korean American photographs', 'Floats (Parade)', 'China', 'Outdoor recreation', 'Animals in logging', 'Weir, J. Alden', 'Electric lines--Poles and towers', 'Farming', 'Criminals', 'Poems--1843', 'Environmental protection', 'Cone Mills', 'Snowmobiles--Photographs', 'Bastian, Harry', 'Episcopal Church', 'Motion picture theaters', 'Dog breeds', 'Farm life--Mississippi', 'Wicker furniture', 'Dogs--Photographs', '100 Years of Hobo Day', 'Chicano art', 'Social Life and Customs', 'Prairie dogs', 'Sled dogs--Greenland', 'Dogs --Food', 'Carl Sandburg', 'Homes & haunts', 'Bell, William', 'Georgetown County', 'Moore, Riley Dunning, Aid, U.S. 
National Museum', 'Leisure', 'Endangered species', 'Restaurants.', 'Birds', 'Dissertations, Academic', 'Stone, Cora', 'Neighborhoods--Georgia--Savannah', 'Geese', 'Dresses', 'Sheepherders', 'Birdcages--California--San Fernando', 'Alta (Utah)--Photographs', 'Sports. Hunting.', 'South Carolina', 'Electric lines', 'Robertson, Anabel Graves', 'Main Course', 'Pekingese dog--California--Los Angeles', 'Portraits', 'Hunting dogs', 'Sheepdogs', 'Native Americans', 'Newspaper photographs', 'Children playing--People by age group and activity', 'Children.', '1890-1899', 'Men--Portraits', 'Animal shelters', 'Hobcaw Barony', 'Parades', 'Community relations', 'Saint Bernard dog--California--Los Angeles', 'Pets--California--Los Angeles', 'Sculpture study', 'Note-taking', 'Group portraits', 'Dogs--War use', 'Pets--California--San Fernando Valley', 'Dogs--Juvenile literature', 'Carriages Horses', 'Women dog owners--California--Los Angeles', 'Stags (Deer)', 'Lyme Disease', 'Girls', 'Baruch Foundation', 'England', 'Children', 'Competition', 'Country homes', 'Trails Rivers', 'Cooking (Sour cream and milk)', 'Champion Paper and Fibre Company--Employees', 'Shades of L.A. 
Collection photographs', 'San Francisco (Calif.)--Photographs', 'Walking', 'Women--California--Los Angeles', 'Group portraits--Visual works by subject types', 'Architecture', 'Mexican Americans', 'Performing arts', 'Dogs Children', 'Concert tours', 'Animals-Dogs', 'German shepherd dog--California--San Fernando', 'Meissen Porcelain', 'People', 'Bonnets', 'Streams', 'World War, 1939-1945--Photographs', 'Mueller, Stephen', 'Plants', 'Poodles--California--Los Angeles', 'Indians of North America--Social life and customs', 'Farm buildings', 'Ranches', 'Circus', 'Grasses', 'Outsider art--Georgia--Savannah', 'Albritton, Joe W.--Family', 'Baby animals', 'South Carolina, Georgetown County, Pawleys Island', 'Transparencies, Slides', 'Plantations', 'People associated with education and communication', 'Book review', 'Ocean', 'Hoot Owl Canyon', 'Cats', 'Events and Entertainment', 'Missionaries', 'Rescue dogs--Utah--Photographs', 'Environment', 'Bile acids--Analysis', 'Hobo Day Parade', 'Wheatleigh', 'Clothing and dress', 'Indian', 'Trails', 'Dogs--Training--Equipment and supplies', 'McMorrough, W. A', 'Wildlife Conservation.', 'South Carolina, Georgetown County', 'Anatomical study', 'Lawns', 'Lincoln Co. (Miss.)', 'Sport and play', 'Circuses & shows', 'Crime', 'Great Britain', 'Children--California--Los Angeles', 'Eagle Rock (Los Angeles, Calif.)', 'Occupations', 'Clothing and dress. Men. 1900-1909.', 'Dwellings', 'Trees--California--San Fernando', 'Greenland--Discovery and Exploration', 'Little Cottonwood Canyon'}\n" ] } ], "source": [ "# 3. Manipulate the DPLA metadata to do something interesting with the results.\n", "# I'm going to use the wrapper here - so the template is exercise 1.\n", "# Let's look at the first 1000 dog-related objects and look at the other subjects associated\n", "# with those objects. 
We'll get duplicate items, so let's take that large list and turn it \n", "# into a set.\n", "\n", "from dpla.api import DPLA\n", "import os\n", "my_api_key = os.getenv('API_KEY')\n", "dpla_connection = DPLA(my_api_key)\n", "result = dpla_connection.search('dogs', page_size=1000)\n", "length = result.count\n", "\n", "dog_categories = []\n", "\n", "for item in result.items:\n", " if 'subject' in item['sourceResource']:\n", " for name in item['sourceResource']['subject']:\n", " dog_categories.append(name['name'])\n", "\n", "print(set(dog_categories))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }