{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "The [Portable Antiquities Scheme](http://finds.org.uk) (PAS) is a programme run by the British Museum and the National Museum of Wales. Small artefacts are often found in the course of gardening, metal detecting and other activities; the Scheme allows those finds to be recorded so that the objects become known. The Scheme has a database at [finds.org.uk/database](http://finds.org.uk/database) containing well over 1 million objects. The database exposes its records in a variety of ways to encourage scholarly re-use. Daniel Pett, who [designed and built](https://finds.org.uk/info) the database and its web API, wrote the following R code to demonstrate how to query the API and retrieve photographic records." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Loading required package: bitops\n" ] } ], "source": [ "library(jsonlite)\n", "library(RCurl)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we set the base URL for PAS, because we'll need it later. We're going to search the database, which will return results to us in JSON format. Some of the key:value pairs will be fields such as 'filename' and 'imagedir'; to download the images, we'll take those values and append them to the base URL to create a download path." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# The base URL for PAS\n", "base <- 'https://finds.org.uk/'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we set up our query. Open a new browser tab, go to the PAS website and try some simple searches to see the kind of information available. On the results page, scroll down to see the formats in which the data can be returned to you. 
Click on 'json' and note the URL in your browser's address bar; that is what we set up in the next cell:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "## Set up your query\n", "# The important parameters to include in a search are:\n", "# q/{queryString} - your free-text or parameterised search, e.g. q/gold/broadperiod/BRONZE+AGE\n", "# /thumbnail/1 - ask for records with images\n", "# /format/json - ask for a JSON response\n", "##\n", "url <- \"https://finds.org.uk/database/search/results/q/gold/broadperiod/BRONZE+AGE/thumbnail/1/format/json\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next line uses the `fromJSON()` function from the 'jsonlite' package to fetch the URL we set up above and parse the response into R objects." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Fetch the JSON and parse it\n", "json <- fromJSON(url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you look at the results of your search in the other browser tab, where you clicked on 'json' at the bottom of the page, you'll see a long list of key:value pairs. Below, we grab some of the values from the `meta` section, which describe the search as a whole." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Total number of results available\n", "total <- json$meta$totalResults\n", "\n", "# Number of results returned per page\n", "results <- json$meta$resultsPerPage\n", "\n", "# Number of pages needed to cover every result\n", "pagination <- ceiling(total/results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're now going to set up a variable that specifies which fields we wish to keep; at the end of this process, those fields will be written out to a CSV file." 
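] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before choosing, it can help to list every field the API returned. The cell below is a minimal sketch, assuming (as the cells that follow also do) that `fromJSON()` has simplified `json$results` into a data frame, which is jsonlite's default for an array of records." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Assumes json$results is a data frame (jsonlite's default simplification)\n", "# List every column available in this page of results\n", "names(json$results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From those, we'll keep the following fields:"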
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Set which fields to keep\n", "keeps <- c(\n", " \"id\", \"objecttype\", \"old_findID\",\n", " \"broadperiod\", \"institution\", \"imagedir\",\n", " \"filename\"\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "data <- json$results\n", "# Keep only the columns named in keeps\n", "data <- data[, names(data) %in% keeps]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at what we've got. We could just type `data` to print everything, but we'll use `head(data)` instead to see only the first six rows. Any guesses as to what you'd type to see the _last_ few rows?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
id | old_findID | objecttype | broadperiod | institution | filename | imagedir |
---|---|---|---|---|---|---|
904260 | PAS-011055 | PENANNULAR RING | BRONZE AGE | PAS | 2016T920b.jpg | images/ianr/ |
899611 | CORN-3237B8 | FLAT AXEHEAD | BRONZE AGE | CORN | DSCN0203.JPG | images/atyacke/ |
899113 | YORYM-057F37 | AXEHEAD | BRONZE AGE | YORYM | SWW0001.jpg | images/bmorris/ |
878840 | HESH-83416C | FLAT AXEHEAD | BRONZE AGE | HESH | HESH83416C.jpg | images/preavill/ |
878015 | HAMP-1248F2 | PENANNULAR RING | BRONZE AGE | HAMP | HAMP1248F2.jpg | images/khindshamp/ |
871932 | SUSS-1CBAB0 | PENANNULAR RING | BRONZE AGE | SUSS | RingSUSS1CBAB0.jpg | images/EdwinWood/ |