{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# An Introduction to the LibCrowds Annotations Data Model\n", "\n", "The purpose of this notebook is to introduce the data structures used to store [LibCrowds](https://www.libcrowds.com/)' results data and the public API that can be used to access that data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Web Annotations\n", "\n", "All LibCrowds results are stored as Web Annotations, a W3C standard used to make data more easily reusable online. The abstract from the [Web Annotation Data Model](https://www.w3.org/TR/annotation-model) is presented below:\n", "\n", "> Annotations are typically used to convey information about a resource or associations between resources. Simple examples include a comment or tag on a single web page or image, or a blog post about a news article.\n", ">\n", "> The Web Annotation Data Model specification describes a structured model and format to enable annotations to be shared and reused across different hardware and software platforms. Common use cases can be modeled in a manner that is simple and convenient, while at the same time enabling more complex requirements, including linking arbitrary content to a particular data point or to segments of timed multimedia resources.\n", ">\n", "> The specification provides a specific JSON format for ease of creation and consumption of annotations based on the conceptual model that accommodates these use cases, and the vocabulary of terms that represents it.\n", "\n", "By using this standardised structure for our final results we aim to make the crowdsourced data generated via the LibCrowds platform more easily reusable online, providing ways for researchers to answer specific questions via programmatic means. As well as the results data, we use Web Annotations to store additional user-generated data, such as image tags. 
\n", "\n", "All of these annotations are available via a public API that complies with a [standardised protocol](https://www.w3.org/TR/annotation-protocol/). The API can be used to gain programmatic access to our current results data. Depending on the objective, this method of consuming and analysing the data is likely to present advantages when compared with performing similar analyses using offline datasets (i.e. those downloaded to your computer). Importantly, data requested via the API will be up to date, as the contributions are analysed and the result made available via the API very soon after a task is completed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The API\n", "\n", "The LibCrowds Annotation API is available at the following location:\n", "\n", "https://annotations.libcrowds.com/annotations/\n", "\n", "Annotations are returned in containers called Annotation Collections and by following the link above we can see all of the Annotation Collections available on the server. Annotation Collections are serialised as JSON-LD, a method of encoding Linked Data using JSON, where each key has some semantic meaning.\n", "\n", "The identity of each collection is provided against the `id` key as a URI that can be followed to access all the Annotations in that Annotation Collection. For instance, the *In the Spotlight* results data is available via the following endpoint:\n", "\n", "https://annotations.libcrowds.com/annotations/playbills-results\n", "\n", "To avoid responses from the API becoming too large, each Annotation Collection is split up into Annotation Pages. 
The first page of the *In the Spotlight* results data is available via the following endpoint:\n", "\n", "https://annotations.libcrowds.com/annotations/playbills-results?page=0\n", "\n", "An example of this page is presented below (the `items` array, which would usually contain the Annotations, has been left empty for brevity).\n", "\n", "```json-ld\n", "{\n", "  \"@context\": \"http://www.w3.org/ns/anno.jsonld\",\n", "  \"id\": \"https://annotations.libcrowds.com/annotations/playbills-results/?page=0\",\n", "  \"type\": \"AnnotationPage\",\n", "  \"items\": [],\n", "  \"next\": \"https://annotations.libcrowds.com/annotations/playbills-results/?page=1\",\n", "  \"partOf\": {\n", "    \"id\": \"https://annotations.libcrowds.com/annotations/playbills-results/\",\n", "    \"type\": [\n", "      \"AnnotationCollection\",\n", "      \"BasicContainer\"\n", "    ],\n", "    \"label\": \"In the Spotlight Results\",\n", "    \"created\": \"2018-05-30T10:48:14Z\",\n", "    \"creator\": \"https://www.libcrowds.com/api/category/22\",\n", "    \"generated\": \"2018-07-16T14:04:23Z\",\n", "    \"modified\": \"2018-07-16T03:37:41Z\",\n", "    \"total\": 48540\n", "  },\n", "  \"startIndex\": 0\n", "}\n", "```\n", "\n", "To avoid overloading this notebook with new concepts we won't go into the semantic meaning of every key. For now, it is enough to understand that there are consistent ways to programmatically navigate these data structures and pull out the information required, regardless of where they are stored or how they were produced. For example, the `next` key above contains a link to the next page of Annotations. So, if we wanted to download all Annotations in an Annotation Collection, we could request the first page and then keep following the `next` link, if present, until we reach the end of the Annotation Collection."
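,
"\n",
"\n",
"The paging approach described above can be sketched in Python as follows. This is a minimal illustration rather than an official client: the helper names `fetch_json` and `iter_annotations` are hypothetical, and it assumes the server returns JSON-LD in response to a plain GET request with no special headers.\n",
"\n",
"```python\n",
"import json\n",
"from urllib.request import urlopen\n",
"\n",
"\n",
"def fetch_json(url):\n",
"    # Assumption: a plain GET returns the JSON-LD body without special headers.\n",
"    with urlopen(url) as response:\n",
"        return json.load(response)\n",
"\n",
"\n",
"def iter_annotations(first_page_url, fetch=fetch_json):\n",
"    # Yield each Annotation, following `next` links until a page has none.\n",
"    url = first_page_url\n",
"    while url:\n",
"        page = fetch(url)\n",
"        yield from page.get('items', [])\n",
"        url = page.get('next')\n",
"```\n",
"\n",
"Since `iter_annotations` is a generator, each page is requested only when the loop reaches it, for example `for annotation in iter_annotations('https://annotations.libcrowds.com/annotations/playbills-results?page=0'): ...`."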
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The LibCrowds data model\n", "\n", "The LibCrowds data model describes how we use Web Annotations to model our results data.\n", "\n", "An example annotation is presented below. Again, don't worry if this doesn't make any sense just yet; the relevant sections will be explored in a bit more detail later. For now, note that Annotations are also stored as JSON-LD, a method of encoding Linked Data using JSON.\n", "\n", "```json-ld\n", "{\n", "  \"@context\": \"http://www.w3.org/ns/anno.jsonld\",\n", "  \"id\": \"https://annotations.libcrowds.com/annotations/playbills-results/7640ddcd-6e48-4a9c-a360-3383032593b6\",\n", "  \"type\": \"Annotation\",\n", "  \"motivation\": \"describing\",\n", "  \"created\": \"2018-02-08T22:15:07.152Z\",\n", "  \"generated\": \"2018-02-08T22:15:07.152Z\",\n", "  \"generator\": [\n", "    {\n", "      \"id\": \"https://github.com/LibCrowds/libcrowds\",\n", "      \"type\": \"Software\",\n", "      \"name\": \"LibCrowds\",\n", "      \"homepage\": \"https://www.libcrowds.com\"\n", "    },\n", "    {\n", "      \"id\": \"https://backend.libcrowds.com/api/task/42\",\n", "      \"type\": \"Software\"\n", "    }\n", "  ],\n", "  \"body\": [\n", "    {\n", "      \"type\": \"TextualBody\",\n", "      \"purpose\": \"tagging\",\n", "      \"value\": \"title\"\n", "    },\n", "    {\n", "      \"type\": \"TextualBody\",\n", "      \"purpose\": \"describing\",\n", "      \"value\": \"King Lear\",\n", "      \"format\": \"text/plain\"\n", "    }\n", "  ],\n", "  \"target\": {\n", "    \"source\": \"https://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022589096.0x0002b7\",\n", "    \"selector\": {\n", "      \"conformsTo\": \"http://www.w3.org/TR/media-frags/\",\n", "      \"type\": \"FragmentSelector\",\n", "      \"value\": \"?xywh=7,1191,1962,359\"\n", "    }\n", "  }\n", "}\n", "```\n", "\n", "As with Annotation Collections, each Annotation key presented above has some semantic meaning, making it easier for other hardware and software platforms to consume the data.\n", "\n", "While these keys all serve a 
purpose and are essential for proper integration with other software platforms, it is likely that we will often only be interested in a few core parts of the dataset: namely, the transcriptions provided by our volunteers, along with an indication of what they were transcribing. In Web Annotation terms, these would be referred to as the **body** and **target**, respectively, as summarised in the following extract from the [Web Annotation Data Model](https://www.w3.org/TR/annotation-model/):\n", "\n", "> An Annotation is a Web Resource. Typically, an Annotation has a single Body, which is a comment or other descriptive resource, and a single Target that the Body is somehow \"about\". The Annotation likely also has additional descriptive properties.\n", "\n", "Among these additional descriptive properties is the **motivation**, which specifies the reason for the Annotation's creation. LibCrowds' results are generated with one of three possible motivations: tagging, describing and commenting. For many of our analyses, we will only be interested in annotations with a particular motivation. For instance, we use Annotations with the *describing* motivation to store our transcription data. The decision to use *describing* as the motivation here (rather than, say, *transcribing*) was taken because *describing* has a semantic meaning defined by the standard [Web Annotation Vocabulary](https://www.w3.org/TR/annotation-vocab/#describing):\n", "\n", "> **2.3.5 describing:** The motivation for when the user intends to describe the Target, as opposed to (for example) a comment about it.\n", "\n", "We won't delve any further into the meaning of each key here. The important thing to note is that there are standardised, programmatic ways to locate the specific data we might require.\n", "\n", "The [documentation](https://docs.libcrowds.com/data/model/) contains more details of the different types of Annotation produced via the platform."
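,
"\n",
"\n",
"As a rough illustration of locating the body and target programmatically, the sketch below pulls the tag, the transcription and the target source out of a single Annotation. The helper name `summarise` is hypothetical, and the code assumes that bodies carry a `purpose` of `tagging` or `describing` as in the example Annotation in this section; other types of Annotation may be structured differently.\n",
"\n",
"```python\n",
"def summarise(annotation):\n",
"    # Return (tag, transcription, target source) for one Annotation.\n",
"    bodies = annotation.get('body', [])\n",
"    if isinstance(bodies, dict):\n",
"        # A single body may be given as an object rather than a list.\n",
"        bodies = [bodies]\n",
"    by_purpose = {b.get('purpose'): b.get('value') for b in bodies}\n",
"    target = annotation.get('target', {})\n",
"    # A target may be a plain URI string or an object with a `source`.\n",
"    source = target.get('source') if isinstance(target, dict) else target\n",
"    return by_purpose.get('tagging'), by_purpose.get('describing'), source\n",
"\n",
"\n",
"# A trimmed copy of the example Annotation from this section.\n",
"example = {\n",
"    'motivation': 'describing',\n",
"    'body': [\n",
"        {'type': 'TextualBody', 'purpose': 'tagging', 'value': 'title'},\n",
"        {'type': 'TextualBody', 'purpose': 'describing', 'value': 'King Lear'},\n",
"    ],\n",
"    'target': {\n",
"        'source': 'https://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022589096.0x0002b7',\n",
"    },\n",
"}\n",
"\n",
"print(summarise(example))\n",
"```\n",
"\n",
"To restrict an analysis to transcriptions, the Annotations could first be filtered with a check such as `annotation.get('motivation') == 'describing'`."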
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "In this notebook, we took a brief look at the data structures used to model LibCrowds results. By using the LibCrowds Annotations API we can access the live results data and navigate it, programmatically or otherwise, to locate the aspects of that data that we require.\n", "\n", "To see how we can begin analysing this data, see [*An Introduction to Analysing LibCrowds Results Data Using Python*](intro_to_analysing_data.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }