{ "metadata": { "name": "Day_25_dpla" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "From the docs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "http://dp.la/info/developers/codex/ :\n", " \n", "* http://api.dp.la/v2 is the base URL of the DPLA API.\n", "* **items and collections are the two resource types you can request.**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "launch of dp.la API v2: \n", "https://cyber.law.harvard.edu/lists/arc/dpla-tech/2013-04/msg00004.html\n", "\n", "Need a key, which you can get, using [incantation from Tom Morris](https://cyber.law.harvard.edu/lists/arc/dpla-tech/2013-04/msg00007.html):\n", "\n", " curl -v -XPOST http://api.dp.la/v2/api_key/you@your_email.com\n", "\n", "As Tom wrote: \"If you use Gmail, check your spam folder. The key is sent immediately.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Would be nice to compare to Europeana API: http://pro.europeana.eu/api" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Use special library to parse json-ld?\n", "\n", "\n", "Reading to get the lowdown on json-ld Should I use one of the Python libs for json-ld? if so, which one?\n", "\n", "OK, I'll try because it's being actively developed:\n", "\n", " git clone git://github.com/digitalbazaar/pyld.git\n", " cd pyld/\n", " python setup.py install\n", "\n", "but I ran into [installation problems](https://github.com/digitalbazaar/pyld/issues/18) -- so I'll let go of looking at json-ld right now. (There might be a fix: https://github.com/digitalbazaar/pyld/commit/1173af0db20a1a27ba2fcf15bde531c0bf1fca2b )" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# from pyld import jsonld" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "# Goal: feed a bunch of search terms to try to get at some collections\n", "\n", "# API doc: https://github.com/dpla/platform/\n", "# test data sources: http://dp.la/wiki/Platform_test_data_sources\n", "\n", "import requests\n", "import json\n", "import urllib\n", "from itertools import islice\n", "\n", "from CREDENTIALS import DPLA_KEY\n", "\n", "# http://api.dp.la/v2/items?api_key=YOUR_API_KEY\n", "\n", "# Retrieve an item by ID\n", "# http://api.dp.la/v2/items/a4e2346032cae75b0832abe064c14bcb\n", "\n", "# Retrieve multiple items by ID\n", "# http://api.dp.la/v1/items/a4e2346032cae75b0832abe0644e9b26,a4e2346032cae75b0832abe064c14bcb\n", "\n", "\n", "def dpla_query(**kw_input):\n", " \n", " kwargs = {\"page_size\": 20, \"page\": 1, \"sort_order\":\"asc\", \"api_key\":DPLA_KEY}\n", " \n", " # fudgy -- allow an extra parameter to allow for ones that can fit kw_input -- e.g., spatial.coordinates\n", " extras = kw_input.pop('extras',{})\n", " kw_input.update(extras)\n", " \n", " kwargs.update(kw_input)\n", " kwargs = dict([(k,v) for (k,v) in kwargs.items() if v is not None])\n", " \n", " # asc vs desc\n", " \n", " # available text search fields\n", " text_search_fields = (\"title\", \"description\", \"dplaContributor\", \"creator\", \"sourceResource.type\", \"publisher\", \"format\", \"rights\", \"contributor\", \"spatial\")\n", " expected_doc_fields = ['title','description', 'creator', 'type', 'publisher', 'format', 'rights', 'contributor', 'created', 'spatial', 'temporal', 'source']\n", " \n", " # temporal fields\n", " # http://api.dp.la/v1/items?temporal.after=1963-11-01&temporal.before=1963-11-30\n", " \n", " # location available...not implemented here\n", " more_items = True\n", " \n", " # content[\"count\"], content[\"start\"], content[\"limit\"]\n", " \n", " while more_items:\n", " \n", " url = \"http://api.dp.la/v2/items?\" + urllib.urlencode(kwargs)\n", " #print url\n", " r = requests.get(url)\n", " content = json.loads(r.content)\n", " \n", " if len(content.get(\"docs\", [])):\n", " for doc in content[\"docs\"]:\n", " yield (doc, content[\"count\"])\n", " if kwargs['sort_order'] == 'desc':\n", " kwargs['page'] -= 1\n", " else:\n", " kwargs['page'] += 1\n", " else:\n", " more_items = False\n", "\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "# search terms to feed in \n", "\n", "SEARCH_TERMS = [\"Bach\", \"tree\", \"horse\", \"cow\", \"Gore\"]\n", "\n", "# figure out collections \n", "collections = set()\n", "\n", "for term in islice(SEARCH_TERMS,1):\n", " results = list(islice(dpla_query(q=term),100)) \n", " \n", " for (i, (doc, count)) in enumerate(results):\n", " collections.add(doc.get('isPartOf', {'title':None}).get('title'))\n", " \n", "print len(collections)\n", "\n", "# for each collection let's figure out the number of items in the collection\n", "\n", "from IPython.display import HTML\n", "from jinja2 import Template\n", "\n", "num_items = []\n", "\n", "for (i, collection) in enumerate(sorted(collections)):\n", " if collection is not None:\n", " size_collection = list(islice(dpla_query(isPartOf=collection),1))[0][1] if collection is not None else 0\n", " url = \"http://api.dp.la/v1/items?\" + urllib.urlencode({'isPartOf': collection})\n", " num_items.append((collection, size_collection, url))\n", " else:\n", " num_items.append((None, 0, \"\"))\n", " \n", "TABLE_TEMPLATE = \"\"\"\n", " \n", " \n", " \n", " \n", " \n", " {% for num_item in num_items %}\n", " \n", " \n", " \n", " \n", "{% endfor %}\n", " \n", "\"\"\"\n", " \n", "template = Template(TABLE_TEMPLATE)\n", "HTML(template.render(num_items=num_items)) " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n" ] }, { "html": [ "
CollectionNumber of itemsAPI
{{num_item.0}}{{num_item.1}}{{num_item.2}}
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " " ], "output_type": "pyout", "prompt_number": 3, "text": [ "" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "r = dpla_query(**{'q':'tiger', 'sourceResource.type':'image'})" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "r0 = list(islice(r,10))[0]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "\n", "\n", "print \"keys\", r0[0].keys()\n", "\n", "print \"count\", r0[1]\n", "print \"item_url\", \"http://dp.la/item/{0}\".format(r0[0]['id']) \n", "print \"id\", r0[0]['id'] \n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "keys [u'_id', u'hasView', u'sourceResource', u'_rev', u'object', u'ingestDate', u'originalRecord', u'score', u'isShownAt', u'provider', u'@context', u'ingestType', u'dataProvider', u'@id', u'id']\n", "count 717\n", "item_url http://dp.la/item/84a6b96a7a9e3d17562c6d7c8eac4acb\n", "id 84a6b96a7a9e3d17562c6d7c8eac4acb\n" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "HTML(\"\"\"item\"\"\".format(\"http://dp.la/item/{0}\".format(r0[0]['id'])))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "item" ], "output_type": "pyout", "prompt_number": 7, "text": [ "" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "# namespaces \n", "r0[0]['@context']" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 8, "text": [ "{u'@vocab': u'http://purl.org/dc/terms/',\n", " u'LCSH': u'http://id.loc.gov/authorities/subjects',\n", " u'aggregatedDigitalResource': u'dpla:aggregatedDigitalResource',\n", " u'begin': {u'@id': u'dpla:dateRangeStart', u'@type': u'xsd:date'},\n", " u'collection': u'dpla:aggregation',\n", " u'coordinates': u'dpla:coordinates',\n", " u'dataProvider': u'edm:dataProvider',\n", " u'dpla': u'http://dp.la/terms/',\n", " u'edm': u'http://www.europeana.eu/schemas/edm/',\n", " u'end': {u'@id': u'dpla:end', u'@type': u'xsd:date'},\n", " u'hasView': u'edm:hasView',\n", " u'isShownAt': u'edm:isShownAt',\n", " u'name': u'xsd:string',\n", " u'object': u'edm:object',\n", " u'originalRecord': u'dpla:originalRecord',\n", " u'provider': u'edm:provider',\n", " u'sourceResource': u'edm:sourceResource',\n", " u'state': u'dpla:state',\n", " u'stateLocatedIn': u'dpla:stateLocatedIn'}" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "r0[0]['dataProvider']" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 9, "text": [ "u'National Archives at College Park - Still Pictures'" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "r0[0]['hasView']" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 10, "text": [ "[{u'format': u'image/jpeg',\n", " u'url': u'http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15515.jpeg'}]" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "r0[0]['object']" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 11, "text": [ "u'http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15515_t.jpeg'" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "results = dpla_query(**{'q':'tiger', 'sourceResource.type':'image'})\n", "items = list([result[0] for result in islice(results,100)])\n", "\n", "for item in items:\n", " print item.get('object', None)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15515_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15215_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15552_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15540_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12182_t.jpeg\n", "http://media.artstor.net/imgstor/size2/yale/british/yale_british_526_8b_srgb.jpg\n", "http://content.lib.utah.edu/utils/getthumbnail/collection/VE_Photos/id/8126\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12177_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12179_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15217_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15196_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15513_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15526_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15213_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12181_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15210_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15220_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15195_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15547_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15527_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15220_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12180_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15219_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15189_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15514_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15207_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15539_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15519_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15216_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15554_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15193_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15545_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15194_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15546_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15555_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15533_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15211_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15201_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15535_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15203_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15522_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15535_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15203_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15528_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15548_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15537_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15205_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15517_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15532_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15200_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15544_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15192_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15524_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15541_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15209_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15521_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00170_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15529_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15197_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15549_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15197_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15549_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15208_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15218_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15188_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15520_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15202_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15534_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15212_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00171_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15530_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15550_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15198_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00170_t.jpeg\n", "http://digital.library.unlv.edu/cgi-bin/thumbnail.exe?CISOROOT=/snv&CISOPTR=289\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12178_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00485_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-00171_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-17143_t.jpeg\n", "http://content.lib.utah.edu/utils/getthumbnail/collection/coa/id/5443\n", "http://content.lib.utah.edu/utils/getthumbnail/collection/coa/id/5373\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-17895_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DA-SD-05-06503_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/1991/DF-ST-91-06474_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2007/DN-SD-07-07719_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-16908_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-17245_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00309_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-00180_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00454_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-00139_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-00339_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00164_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-17141_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DM-SD-03-14162_t.jpeg\n", "http://media.nara.gov/media/images/21/30/21-2989t.gif\n", "http://media.nara.gov/stillpix/330-cfd/1986/DF-ST-86-04924_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-17889_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/2007/DN-SD-07-07722_t.jpeg\n", "http://media.nara.gov/stillpix/330-cfd/1982/DF-ST-82-00473_t.jpeg\n" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "results = dpla_query(**{'q':'tiger', 'sourceResource.type':'image'})\n", "\n", "items = list([result[0] for result in islice(results,10)])\n", "\n", "TABLE_TEMPLATE = \"\"\"\n", " {% for item in items %}\n", "\n", " {% endfor %}\n", "\"\"\"\n", " \n", "template = Template(TABLE_TEMPLATE)\n", "HTML(template.render(items=items)) " ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " \n", "\n", " " ], "output_type": "pyout", "prompt_number": 13, "text": [ "" ] } ], "prompt_number": 13 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Rights challenge" ] }, { "cell_type": "code", "collapsed": false, "input": [ "results = dpla_query(**{'q':'tiger', 'sourceResource.type':'image'})\n", "\n", "for result in islice(results,10):\n", " print result[0]['sourceResource'].get('rights')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Restrictions: Unrestricted; Use status: Unrestricted\n", "Restrictions: Unrestricted; Use status: Unrestricted\n", "Restrictions: Unrestricted; Use status: Unrestricted\n", "Restrictions: Unrestricted; Use status: Unrestricted\n", "Restrictions: Unrestricted; Use status: Unrestricted\n", "None\n", "Digital Image, copyright 2010 Uintah County Library\n", "Restrictions: Unrestricted; Use status: Unrestricted\n", "Restrictions: Unrestricted; Use status: Unrestricted\n", "Restrictions: Unrestricted; Use status: Unrestricted\n" ] } ], "prompt_number": 14 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Follow-up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Revise my code to get reliable access to object URI: http://dp.la/info/forums/topic/how-to-reliably-find-url-for-returned-object-in-api/#post-8327" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }
CollectionNumber of itemsAPI
None0