{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Accessing Ghent University Library IIIF API\n", "The [Ghent University Library](https://lib.ugent.be/en), provides open access and open data programmes to enhance access to research. They produce high-resolution scans of historic documents, print journals and promote open access for academic publications.\n", "\n", "This notebook introduces how to explore the repository, basically read a metadata record, obtain the fulltext and create a CSV dataset. \n", "\n", "The content used in this notebook is based on *la Russie illustrée* which is a periodical with 15 volumes and 748 issues. The digital content can be retrieved [here](https://lib.ugent.be/viewer/collection/RUG01-001643403#?c=&m=&s=&cv=&xywh=-2290%2C-224%2C7504%2C4200). \n", "\n", "Additional information about the collection is accessible [here](https://www.ghentcdh.ugent.be/content/blogpost-phaedra-claeys-spreadsheet-nightmares-and-database-dreams).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up things" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "import requests, csv\n", "import json\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Glogal configuration\n", "In this section, we can add item that we want to use by providing its manifest URI." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "manifestUrl = 'https://adore.ugent.be/IIIF/collections/RUG01-001643403'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieve the main collection element that corresponds to *La Russie illustrée*.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "responseManifest = requests.get(manifestUrl)\n", "print(responseManifest.url)\n", "\n", "# retrieving the metadata\n", "m = json.loads(responseManifest.text)\n", "\n", "# the title\n", "print('label:' + m['label'])\n", "print('attribution:' + m['attribution'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A manifest has a field with called manifests will all the elements" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in m['manifests']:\n", " print(i['@id'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's analyse the manifests one by one" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a CSV file to store the metadata\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "csv_out = csv.writer(open('gent_records.csv', 'w'), delimiter = ',', quotechar = '\"', quoting = csv.QUOTE_MINIMAL)\n", "csv_out.writerow(['title', 'label', 'date', 'thumbnail', 'publisher', 'attribution', 'provenance', 'manifestItemUrl'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in m['manifests']:\n", " title = label = date = thumbnail = publisher = attribution = provenance = manifestItemUrl = ''\n", " manifestItemUrl = i['@id']\n", " \n", " responseManifestItem = requests.get(manifestItemUrl)\n", " \n", " # retrieving the metadata\n", " manifestItem = json.loads(responseManifestItem.text)\n", " date = manifestItem['navDate']\n", " attribution = manifestItem['attribution']\n", " label = manifestItem['label']\n", " \n", " thumbnail = manifestItem['thumbnail']['@id']\n", " \n", " for metadata in manifestItem['metadata']:\n", " \n", " if metadata['label'] == 'Title' and not title: # first title\n", " title = metadata['value']\n", " elif metadata['label'] == 'Publisher':\n", " publisher = metadata['value']\n", " elif metadata['label'] == 'Provenance':\n", " provenance = metadata['value']\n", " else: pass\n", " print(label + \" \" + thumbnail)\n", " csv_out.writerow([title, label, date, thumbnail, publisher, attribution, provenance, manifestItemUrl]) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load the CSV file from GitHub.\n", "# This puts the data in a Pandas DataFrame\n", "df = pd.read_csv('gent_records.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Have a peek" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How many items are there?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# How many images?\n", "df['thumbnail'].count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating a chart to visualize the results\n", "This chart shows the number of resources by year" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# First we create a new column in pandas with the year\n", "df['year'] = pd.DatetimeIndex(df['date']).year\n", "\n", "ax = df['year'].value_counts().plot(kind='bar',\n", " figsize=(14,8),\n", " title=\"Number of resources per date\")\n", "ax.set_xlabel(\"Dates\")\n", "ax.set_ylabel(\"Resources\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Showing the thumbnails as a gallery\n", "\n", "Once we have queried the repository and we have the metadata as a CSV file, let's show the results as a thumbnail gallery." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import HTML, Image\n", "\n", "def _src_from_data(data):\n", " \"\"\"Base64 encodes image bytes for inclusion in an HTML img element\"\"\"\n", " img_obj = Image(data=data)\n", " for bundle in img_obj._repr_mimebundle_():\n", " for mimetype, b64value in bundle.items():\n", " if mimetype.startswith('image/'):\n", " return f'data:{mimetype};base64,{b64value}'\n", "\n", "def gallery(images, row_height='auto'):\n", " \"\"\"Shows a set of images in a gallery that flexes with the width of the notebook.\n", " \n", " Parameters\n", " ----------\n", " images: list of str or bytes\n", " URLs or bytes of images to display\n", "\n", " row_height: str\n", " CSS height value to assign to all images. Set to 'auto' by default to show images\n", " with their native dimensions. Set to a value like '250px' to make all rows\n", " in the gallery equal height.\n", " \"\"\"\n", " figures = []\n", " for image in images:\n", " if isinstance(image, bytes):\n", " src = _src_from_data(image)\n", " caption = ''\n", " else:\n", " src = image\n", " caption = f'
{image}
'\n", " figures.append(f'''\n", "
\n", " \n", " \n", "
\n", " ''')\n", " return HTML(data=f'''\n", "
\n", " {''.join(figures)}\n", "
\n", " ''')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gallery(df['thumbnail'], row_height='150px')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 2 }