{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Accessing Ghent University Library IIIF API\n",
"The [Ghent University Library](https://lib.ugent.be/en), provides open access and open data programmes to enhance access to research. They produce high-resolution scans of historic documents, print journals and promote open access for academic publications.\n",
"\n",
"This notebook introduces how to explore the repository, basically read a metadata record, obtain the fulltext and create a CSV dataset. \n",
"\n",
"The content used in this notebook is based on *la Russie illustrée* which is a periodical with 15 volumes and 748 issues. The digital content can be retrieved [here](https://lib.ugent.be/viewer/collection/RUG01-001643403#?c=&m=&s=&cv=&xywh=-2290%2C-224%2C7504%2C4200). \n",
"\n",
"Additional information about the collection is accessible [here](https://www.ghentcdh.ugent.be/content/blogpost-phaedra-claeys-spreadsheet-nightmares-and-database-dreams).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up things"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import requests, csv\n",
"import json\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Glogal configuration\n",
"In this section, we can add item that we want to use by providing its manifest URI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"manifestUrl = 'https://adore.ugent.be/IIIF/collections/RUG01-001643403'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieve the main collection element that corresponds to *La Russie illustrée*.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"responseManifest = requests.get(manifestUrl)\n",
"print(responseManifest.url)\n",
"\n",
"# retrieving the metadata\n",
"m = json.loads(responseManifest.text)\n",
"\n",
"# the title\n",
"print('label:' + m['label'])\n",
"print('attribution:' + m['attribution'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A manifest has a field with called manifests will all the elements"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in m['manifests']:\n",
" print(i['@id'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Let's analyse the manifests one by one"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a CSV file to store the metadata\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"csv_out = csv.writer(open('gent_records.csv', 'w'), delimiter = ',', quotechar = '\"', quoting = csv.QUOTE_MINIMAL)\n",
"csv_out.writerow(['title', 'label', 'date', 'thumbnail', 'publisher', 'attribution', 'provenance', 'manifestItemUrl'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for i in m['manifests']:\n",
" title = label = date = thumbnail = publisher = attribution = provenance = manifestItemUrl = ''\n",
" manifestItemUrl = i['@id']\n",
" \n",
" responseManifestItem = requests.get(manifestItemUrl)\n",
" \n",
" # retrieving the metadata\n",
" manifestItem = json.loads(responseManifestItem.text)\n",
" date = manifestItem['navDate']\n",
" attribution = manifestItem['attribution']\n",
" label = manifestItem['label']\n",
" \n",
" thumbnail = manifestItem['thumbnail']['@id']\n",
" \n",
" for metadata in manifestItem['metadata']:\n",
" \n",
" if metadata['label'] == 'Title' and not title: # first title\n",
" title = metadata['value']\n",
" elif metadata['label'] == 'Publisher':\n",
" publisher = metadata['value']\n",
" elif metadata['label'] == 'Provenance':\n",
" provenance = metadata['value']\n",
" else: pass\n",
" print(label + \" \" + thumbnail)\n",
" csv_out.writerow([title, label, date, thumbnail, publisher, attribution, provenance, manifestItemUrl]) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load the CSV file from GitHub.\n",
"# This puts the data in a Pandas DataFrame\n",
"df = pd.read_csv('gent_records.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Have a peek"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How many items are there?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# How many images?\n",
"df['thumbnail'].count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Creating a chart to visualize the results\n",
"This chart shows the number of resources by year"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# First we create a new column in pandas with the year\n",
"df['year'] = pd.DatetimeIndex(df['date']).year\n",
"\n",
"ax = df['year'].value_counts().plot(kind='bar',\n",
" figsize=(14,8),\n",
" title=\"Number of resources per date\")\n",
"ax.set_xlabel(\"Dates\")\n",
"ax.set_ylabel(\"Resources\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Showing the thumbnails as a gallery\n",
"\n",
"Once we have queried the repository and we have the metadata as a CSV file, let's show the results as a thumbnail gallery."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import HTML, Image\n",
"\n",
"def _src_from_data(data):\n",
" \"\"\"Base64 encodes image bytes for inclusion in an HTML img element\"\"\"\n",
" img_obj = Image(data=data)\n",
" for bundle in img_obj._repr_mimebundle_():\n",
" for mimetype, b64value in bundle.items():\n",
" if mimetype.startswith('image/'):\n",
" return f'data:{mimetype};base64,{b64value}'\n",
"\n",
"def gallery(images, row_height='auto'):\n",
" \"\"\"Shows a set of images in a gallery that flexes with the width of the notebook.\n",
" \n",
" Parameters\n",
" ----------\n",
" images: list of str or bytes\n",
" URLs or bytes of images to display\n",
"\n",
" row_height: str\n",
" CSS height value to assign to all images. Set to 'auto' by default to show images\n",
" with their native dimensions. Set to a value like '250px' to make all rows\n",
" in the gallery equal height.\n",
" \"\"\"\n",
" figures = []\n",
" for image in images:\n",
" if isinstance(image, bytes):\n",
" src = _src_from_data(image)\n",
" caption = ''\n",
" else:\n",
" src = image\n",
" caption = f'