{ "metadata": { "name": "", "signature": "sha256:b277e04e01e8870c1aa4ef7d1076911f627bd458c73fc32904d134a53f4e735d" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[Open Context: Data Publication for Cultural Heritage and Field\n", "Research](http://opencontext.org/): \"Open Context reviews, edits, and\n", "publishes archaeological research data and archives data with\n", "university-backed repositories, including the California Digital\n", "Library.\"\n", "\n", "I often think of OpenContext as an exemplar \u2013 a model from the future:\n", "academic data archiving done right. Some cool features ([About Open\n", "Context: Technologies](http://opencontext.org/about/technology)):\n", "\n", "- use of Atom feeds\n", "- JSON\n", "- KML\n", "- use of [timemap - Javascript library to help use a SIMILE timeline\n", "  with online maps](https://code.google.com/p/timemap/) to map\n", "  events/objects in time and space (though I wonder whether this\n", "  technology has been superseded).\n", "- contextualization by making ties to other data services\n", "    - putting things on maps\n", "    - ties to controlled vocabulary around biological taxa and\n", "      archaeological terminology.\n", "    - use of linked open data: Examples?\n", "    - would be great to tie in any technology we develop for the\n", "      visualization of large image collections into OpenContext.\n", "\n", "We want to provide for the long-term citability and availability of this\n", "data.\n", "\n", "Also contextualization.\n", "\n", "What was the excellent presentation/paper Eric Kansa gave to us at WwOD13?\n", "\n", "- Questions\n", "    - What are the costs of archiving data on OpenContext? (How are the costs\n", "      shared among depositor, OpenContext, CDL, and funding agencies?)\n", "      I think [About Open Context: Estimate Data Management +\n", "      Publication Costs](http://opencontext.org/about/estimate) gives\n", "      some clues.\n", "    - What guarantees are made about the 
data once it's archived at\n", "      OpenContext and CDL? [About Open\n", "      Context](http://opencontext.org/about/): \"Data safeguarded and\n", "      preserved though archiving with the University of California's\n", "      California Digital Library\"\n", "    - How do you cite data in OpenContext?\n", "    - Who is reusing this data? Examples?\n", "    - IP rights of images (and other data)? Can we tie to Wikipedia\n", "      and to Wikimedia Commons? (Some insights in [About Open Context:\n", "      Intellectual\n", "      Property](http://opencontext.org/about/intellectual-property)\n", "      but the whole range of issues is complex. It seems like\n", "      there will be varying levels of openness and restriction in\n", "      OpenContext. I will want to dive in to look at specific\n", "      examples.)\n", "        - For example, I don't see any explicit copyright statement at\n", "          [Open Context Image Lightbox: (1021 Images\n", "          Showing)](http://opencontext.org/lightbox/?proj=Asian+Stoneware+Jars)\n", "          or\n", "          [http://opencontext.org/sets/.json?proj=Asian+Stoneware+Jars](http://opencontext.org/sets/.json?proj=Asian+Stoneware+Jars)\n", "    - Can we bulk download data from OpenContext?\n", "- open data in science, specifically archaeology\n", "    - OpenContext has an API: [About Open Context: Web Services and\n", "      APIs](http://opencontext.org/about/services)\n", "- [Open Context: Data Publication for Cultural Heritage and Field\n", "  Research](http://opencontext.org/)\n", "- Eric Kansa ([@ekansa](https://twitter.com/ekansa)) is a former I\n", "  School Adjunct Prof, and we've done work together on open government,\n", "  particularly on the Recovery Act. 
Eric was recently honored (in\n", "  2013) by the White House as an [Open Science Champion of\n", "  Change](http://www.neh.gov/files/divisions/odh/images/wh_neh_champions.jpg).\n", "- possible project ideas\n", "    - visualizing the image collections in OpenContext.\n", "    - making ties to Encyclopedia of Life\n", "    - thinking about challenges of archiving data, reconciling data,\n", "      aligning metadata to standards.\n", "    - I see a \"suggested citation\" in pages like [Open Context view of\n", "      Item: Trench\n", "      6](http://opencontext.org/subjects/E5B52F10-333F-4CB8-C397-7DFAD00A3719).\n", "      Would it be a good idea to embed metadata in the page so that Zotero\n", "      knows how to grab the citation?\n", "\n", "There are so many possibilities here; we can work iteratively with Eric Kansa to\n", "develop a good project without having it all figured out upfront.\n", "\n", "Eric has mentioned to me the idea of time span facets.\n" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Studying the UI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How to reproduce data represented by the map on [Open Context](http://opencontext.org/)?\n", "\n", "How to use the API to get a list of projects?\n", "\n", "http://opencontext.org/sets/.json returns a JSON representation of items, but\n", "http://opencontext.org/projects/.json doesn't work for getting a list of all projects. 
Answer: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A quick jump into the API of opencontext.org\n", "\n", "Let's use a specific project to focus on:\n", "\n", "* \n", "* \n", "\n", "\n", "The API documentation: " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Open Context: Data Publication for Cultural Heritage and Field Research](http://opencontext.org/)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# using an example in the API documentation to confirm that we can get json representation from API\n", "\n", "import requests\n", "json_url = \"http://opencontext.org/sets/Palestinian+Authority/Tell+en-Nasbeh/.json?proj=Bade+Museum\"\n", "\n", "r = requests.get(json_url)\n", "\n", "# what are the top level keys of response?\n", "r.json().keys()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "[u'updated',\n", " u'sorting',\n", " u'numFound',\n", " u'facets',\n", " u'offset',\n", " u'geoCount',\n", " u'chronoTileFacets',\n", " u'summary',\n", " u'paging',\n", " u'qstring',\n", " u'published',\n", " u'results',\n", " u'paramCount',\n", " u'geoTileFacets']" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "# Now let's apply same logic to the Asian Stoneware Jars project\n", "\n", "json_url = \"http://opencontext.org/sets/.json?proj=Asian+Stoneware+Jars\"\n", "\n", "request = requests.get(json_url)\n", "request_json = request.json()\n", "\n", "results= request_json['results']\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "request_json.keys()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "[u'updated',\n", " u'sorting',\n", " u'numFound',\n", " u'facets',\n", " u'offset',\n", " u'geoCount',\n", " u'chronoTileFacets',\n", " u'summary',\n", " 
u'paging',\n", " u'qstring',\n", " u'published',\n", " u'results',\n", " u'paramCount',\n", " u'geoTileFacets']" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "# number of results matches what is on human UI\n", "request_json['numFound']" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "1008" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "# we get back the first page of 10\n", "len(results)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "10" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "results[0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "{u'catIcon': u'http://opencontext.org/database/ui_images/med_oc_icons/ceramic_artifacts_50x50.jpg',\n", " u'category': u'Pottery',\n", " u'context': u'
\\nContext: Philippines / San Diego
\\n
',\n", " u'geoTime': {u'geoLat': 13.539201,\n", " u'geoLong': 121.168213,\n", " u'timeBegin': False,\n", " u'timeEnd': False},\n", " u'label': u'UNE 104',\n", " u'project': u'Asian Stoneware Jars',\n", " u'thumbIcon': u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/Copy%20(2)%20of%20une104%20copy.jpg',\n", " u'uri': u'http://opencontext.org/subjects/EAFD5A63-83C0-43A3-691C-08878757A66D',\n", " u'var_vals': {u'Artefact Type': u'intact jar',\n", " u'Compositional Group': u'1',\n", " u'Dataset Type': u'A Ship',\n", " u'Diameter (mm)': u'440',\n", " u'Donor Institution/sample Source': u'National Museum of the Philippines',\n", " u'Group (INAA)': u'1',\n", " u'Group (icp)': u'1',\n", " u'Height (mm)': u'530',\n", " u'ICP - Ba': u'475.54',\n", " u'ICP - Ca': u'1366.57',\n", " u'ICP - Ce': u'120.03',\n", " u'ICP - Cu': u'27.82',\n", " u'ICP - Fe': u'14855.69',\n", " u'ICP - Ga': u'27.29',\n", " u'ICP - Hf': u'4.82',\n", " u'ICP - K': u'21721.65',\n", " u'ICP - La': u'88.31',\n", " u'ICP - Li': u'25.39',\n", " u'ICP - Mg': u'5677.31',\n", " u'ICP - Na': u'2328.68',\n", " u'ICP - Ni': u'8.6',\n", " u'ICP - Sc': u'9.74',\n", " u'ICP - Sr': u'53.75',\n", " u'ICP - Ti': u'3030.14',\n", " u'ICP - V': u'39.88',\n", " u'ICP - Yb': u'3.83',\n", " u'ICP - Zn': u'88.53',\n", " u'Museum No.': u'706',\n", " u'NAA validation - As': u'6.9',\n", " u'NAA validation - Au': u'0',\n", " u'NAA validation - Ba': u'526',\n", " u'NAA validation - Br': u'0',\n", " u'NAA validation - Ca': u'0',\n", " u'NAA validation - Ce': u'128',\n", " u'NAA validation - Co': u'20.9',\n", " u'NAA validation - Cr': u'23.4',\n", " u'NAA validation - Cs': u'6.5',\n", " u'NAA validation - Eu': u'1.36',\n", " u'NAA validation - Fe': u'1.53',\n", " u'NAA validation - Hf': u'11',\n", " u'NAA validation - K': u'2.59',\n", " u'NAA validation - La': u'74.5',\n", " u'NAA validation - Lu': u'0.59',\n", " u'NAA validation - Na': u'0.244',\n", " u'NAA validation - Rb': u'154',\n", " 
u'NAA validation - Sb': u'0.49',\n", " u'NAA validation - Sc': u'11.4',\n", " u'NAA validation - Sm': u'7.8',\n", " u'NAA validation - Ta': u'2.74',\n", " u'NAA validation - Tb': u'1.2',\n", " u'NAA validation - Th': u'37.6',\n", " u'NAA validation - U': u'10.5',\n", " u'NAA validation - Yb': u'4',\n", " u'NAA validation - Zn': u'90.2',\n", " u'PIXE - Al(1014)': u'129585',\n", " u'PIXE - Ca': u'1601',\n", " u'PIXE - F(area)': u'183.9',\n", " u'PIXE - Fe': u'14970',\n", " u'PIXE - K': u'25205',\n", " u'PIXE - Li(478)': u'10.2',\n", " u'PIXE - Mg(585)': u'0.0001',\n", " u'PIXE - Mn': u'405',\n", " u'PIXE - Na(440)': u'2659.3',\n", " u'PIXE - Rb': u'153',\n", " u'PIXE - Si': u'434771',\n", " u'PIXE - Sr': u'44',\n", " u'PIXE - Ti': u'4227',\n", " u'PIXE - V': u'74',\n", " u'PIXE - Zr': u'312',\n", " u'Photograph No. - Located In Photographs Folder.': u'UNE 104',\n", " u'Rel: http://www.cidoc-crm.org/rdfs/cidoc-crm#P2.has_type': u'http://collection.britishmuseum.org/id/thesauri/x7402',\n", " u'Rel: http://www.cidoc-crm.org/rdfs/cidoc-crm#P45F.consists_of': u'http://collection.britishmuseum.org/id/thesauri/x10539',\n", " u'Sample Source Person': u'Eusebio Dizon',\n", " u'Sample Weight (g)': u'4.5',\n", " u'Vessel Part Sampled': u'base',\n", " u'Year': u'1600'}}" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "# list the URLs for the thumbnails\n", "[result.get('thumbIcon') for result in results]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "[u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/Copy%20(2)%20of%20une104%20copy.jpg',\n", " u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/UNE373%20copy.jpg',\n", " u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une343%20copy.jpg',\n", " 
u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une342%20copy.jpg',\n", " u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une338%20copy.jpg',\n", " u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une233%20copy.jpg',\n", " u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une267%20copy.jpg',\n", " u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/une375%20copy.jpg',\n", " u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/Edited%20Copies/UNE115%20copy.jpg',\n", " u'http://artiraq.org/static/opencontext/stoneware-media/thumbs/photographs/UNE149.JPG']" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "# do a quick display\n", "\n", "from IPython.display import HTML\n", "from jinja2 import Template\n", "\n", "CSS = \"\"\"\n", "\n", "\"\"\"\n", "\n", "IMAGES_TEMPLATE = CSS + \"\"\"\n", "
\n", "    {% for item in items %}<img src=\"{{ item.thumbIcon }}\" alt=\"{{ item.label }}\"/>{% endfor %}\n", "
\n", "\"\"\"\n", " \n", "template = Template(IMAGES_TEMPLATE)\n", "HTML(template.render(items=results)) " ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "
\n", " \n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "" ] } ], "prompt_number": 8 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Parsing http://opencontext.org/sets/.json" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import requests\n", "url = \"http://opencontext.org/sets/.json\"\n", "\n", "r = requests.get(url)\n", "r.json().keys()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "[u'updated',\n", " u'sorting',\n", " u'numFound',\n", " u'facets',\n", " u'offset',\n", " u'geoCount',\n", " u'chronoTileFacets',\n", " u'summary',\n", " u'paging',\n", " u'qstring',\n", " u'published',\n", " u'results',\n", " u'paramCount',\n", " u'geoTileFacets']" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "r.json()['numFound']" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "840420" ] } ], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "r.json()['paging']['prev']" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "False" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "# write a generator for all items in http://opencontext.org/sets/.json\n", "\n", "import requests\n", "\n", "def opencontext_items():\n", "    # start at the first page and follow the 'next' paging links\n", "    url = \"http://opencontext.org/sets/.json\"\n", "    more_items = True\n", "\n", "    while more_items:\n", "        # parse each page once and reuse the decoded JSON\n", "        page = requests.get(url).json()\n", "        for item in page['results']:\n", "            yield item\n", "\n", "        # stop once the paging block has no 'next' URL\n", "        url = page['paging']['next']\n", "        if not url:\n", "            more_items = False\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "from itertools import islice\n", "results = 
list(islice(opencontext_items(), 25))\n", "HTML(template.render(items=results)) " ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "
\n", " \n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "" ] } ], "prompt_number": 13 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Parsing http://opencontext.org/projects/.atom" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import requests\n", "from lxml import etree\n", "\n", "url = \"http://opencontext.org/projects/.atom\"\n", "r = requests.get(url)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "doc = etree.fromstring(r.content)\n", "doc" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 15, "text": [ "" ] } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "# get list of titles; Atom elements live in the Atom XML namespace\n", "\n", "ATOM = '{http://www.w3.org/2005/Atom}'\n", "project_titles = [e.find(ATOM + 'title').text for e in doc.findall(ATOM + 'entry')]\n", "for (i, title) in enumerate(project_titles):\n", "    print i+1, title" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 Faunal Data from Neolithic Mente\u015fe: (Overview)\n", "2 Kentucky Site Files: (Overview)\n", "3 Illinois Site Files: (Overview)\n", "4 Iowa Site Files: (Overview)\n", "5 Indiana Site Files: (Overview)\n", "6 Missouri Site Files: (Overview)\n", "7 South Carolina SHPO: (Overview)\n", "8 Georgia Archaeological Site File (GASF): (Overview)\n", "9 Florida Site Files: (Overview)\n", "10 Pyla-Koutsopetria Archaeological Project: (Overview)\n", "11 Balance Pan Weights from Nippur: (Overview)\n", "12 Osteometric Database of South American Camelids: (Overview)\n", "13 Ceramics, Trade, Provenience and Geology: Cyprus in the Late Bronze Age: (Overview)\n", "14 Archaeology of Mesoamerican Animals: (Overview)\n", "15 \u00c7atalh\u00f6y\u00fck Zooarchaeology: (Overview)\n", "16 \u00c7atalh\u00f6y\u00fck Area TP Zooarchaeology: 
(Overview)\n", "17 Il\u0131p\u0131nar Zooarchaeology: (Overview)\n", "18 Zooarchaeology of Neolithic Ulucak: (Overview)\n", "19 \u00c7ukuri\u00e7i H\u00f6y\u00fck Zooarchaeology: (Overview)\n", "20 Bar\u00e7\u0131n H\u00f6y\u00fck Zooarchaeology: (Overview)\n", "21 K\u00f6\u015fk H\u00f6y\u00fck Faunal Data: (Overview)\n", "22 Erbaba H\u00f6y\u00fck and Suberde Zooarchaeology: (Overview)\n", "23 Mikt\u2019sqaq Angayuk Finds: (Overview)\n", "24 Asian Stoneware Jars: (Overview)\n", "25 Zooarchaeology of \u00d6k\u00fczini Cave: (Overview)\n", "26 Zooarchaeology of Karain Cave B: (Overview)\n", "27 West Stow West Zooarchaeology: (Overview)\n", "28 Murlo: (Overview)\n", "29 Hacksilber Project: (Overview)\n", "30 Kenan Tepe: (Overview)\n", "31 Rough Cilicia: (Overview)\n", "32 Dhiban Excavation and Development Project: (Overview)\n", "33 Tal-e Malyan Zooarchaeology: Tal-e Malyan Zooarchaeology\n", "34 Zooarchaeology of Medieval Emden: (Overview)\n", "35 Chogha Mish Fauna: (Overview)\n", "36 Khirbat al-Mudayna al-Aliya: (Overview)\n", "37 Dove Mountain Groundstone: (Overview)\n", "38 Bade Museum: (Overview)\n", "39 San Diego Archaeological Center: (Overview)\n", "40 Presidio of San Francisco: (Overview)\n", "41 Aegean Archaeomalacology: (Overview)\n", "42 Petra Great Temple Excavations: (Overview)\n", "43 Iraq Heritage Program: (Overview)\n", "44 Lake Carlos Beach Site, 1992 and 1996: (Overview)\n", "45 Corneal Ulceration in South East Asia: (Overview)\n", "46 Harvard Peabody Mus. Zooarchaeology: (Overview)\n", "47 Hazor: Zooarchaeology: (Overview)\n", "48 Hayonim: Micromorphology: (Overview)\n", "49 Geissenklosterle: Micromorphology: (Overview)\n", "50 P\u0131narba\u015f\u0131 1994: Animal Bones: (Overview)\n", "51 Domuztepe Excavations: (Overview)\n" ] } ], "prompt_number": 16 } ], "metadata": {} } ] }