{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Accessing Ghent University Library IIIF API\n",
    "The [Ghent University Library](https://lib.ugent.be/en), provides open access and open data programmes to enhance access to research. They produce high-resolution scans of historic documents, print journals and promote open access for academic publications.\n",
    "\n",
    "This notebook introduces how to explore the repository, basically read a metadata record, obtain the fulltext and create a CSV dataset. \n",
    "\n",
    "The content used in this notebook is based on *la Russie illustrée* which is a periodical with 15 volumes and 748 issues. The digital content can be retrieved [here](https://lib.ugent.be/viewer/collection/RUG01-001643403#?c=&m=&s=&cv=&xywh=-2290%2C-224%2C7504%2C4200). \n",
    "\n",
    "Additional information about the collection is accessible [here](https://www.ghentcdh.ugent.be/content/blogpost-phaedra-claeys-spreadsheet-nightmares-and-database-dreams).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting up things"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import requests, csv\n",
    "import json\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Glogal configuration\n",
    "In this section, we can add item that we want to use by providing its manifest URI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "manifestUrl = 'https://adore.ugent.be/IIIF/collections/RUG01-001643403'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retrieve the main collection element that corresponds to *La Russie illustrée*.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "responseManifest = requests.get(manifestUrl)\n",
    "print(responseManifest.url)\n",
    "\n",
    "# retrieving the metadata\n",
    "m = json.loads(responseManifest.text)\n",
    "\n",
    "# the title\n",
    "print('label:' + m['label'])\n",
    "print('attribution:' + m['attribution'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A manifest has a field with called manifests will all the elements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in m['manifests']:\n",
    "    print(i['@id'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Let's analyse the manifests one by one"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a CSV file to store the metadata\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "csv_out = csv.writer(open('gent_records.csv', 'w'), delimiter = ',', quotechar = '\"', quoting = csv.QUOTE_MINIMAL)\n",
    "csv_out.writerow(['title', 'label', 'date', 'thumbnail', 'publisher', 'attribution', 'provenance', 'manifestItemUrl'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in m['manifests']:\n",
    "    title = label = date = thumbnail = publisher = attribution = provenance = manifestItemUrl = ''\n",
    "    manifestItemUrl = i['@id']\n",
    "    \n",
    "    responseManifestItem = requests.get(manifestItemUrl)\n",
    "    \n",
    "    # retrieving the metadata\n",
    "    manifestItem = json.loads(responseManifestItem.text)\n",
    "    date = manifestItem['navDate']\n",
    "    attribution = manifestItem['attribution']\n",
    "    label = manifestItem['label']\n",
    "    \n",
    "    thumbnail = manifestItem['thumbnail']['@id']\n",
    "    \n",
    "    for metadata in manifestItem['metadata']:\n",
    "        \n",
    "        if metadata['label'] == 'Title' and not title: # first title\n",
    "            title = metadata['value']\n",
    "        elif metadata['label'] == 'Publisher':\n",
    "            publisher = metadata['value']\n",
    "        elif metadata['label'] == 'Provenance':\n",
    "            provenance = metadata['value']\n",
    "        else: pass\n",
    "    print(label + \" \" + thumbnail)\n",
    "    csv_out.writerow([title, label, date, thumbnail, publisher, attribution, provenance, manifestItemUrl])    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the CSV file from GitHub.\n",
    "# This puts the data in a Pandas DataFrame\n",
    "df = pd.read_csv('gent_records.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Have a peek"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How many items are there?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# How many images?\n",
    "df['thumbnail'].count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating a chart to visualize the results\n",
    "This chart shows the number of resources by year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# First we create a new column in pandas with the year\n",
    "df['year'] = pd.DatetimeIndex(df['date']).year\n",
    "\n",
    "ax = df['year'].value_counts().plot(kind='bar',\n",
    "                                    figsize=(14,8),\n",
    "                                    title=\"Number of resources per date\")\n",
    "ax.set_xlabel(\"Dates\")\n",
    "ax.set_ylabel(\"Resources\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Showing the thumbnails as a gallery\n",
    "\n",
    "Once we have queried the repository and we have the metadata as a CSV file, let's show the results as a thumbnail gallery."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import HTML, Image\n",
    "\n",
    "def _src_from_data(data):\n",
    "    \"\"\"Base64 encodes image bytes for inclusion in an HTML img element\"\"\"\n",
    "    img_obj = Image(data=data)\n",
    "    for bundle in img_obj._repr_mimebundle_():\n",
    "        for mimetype, b64value in bundle.items():\n",
    "            if mimetype.startswith('image/'):\n",
    "                return f'data:{mimetype};base64,{b64value}'\n",
    "\n",
    "def gallery(images, row_height='auto'):\n",
    "    \"\"\"Shows a set of images in a gallery that flexes with the width of the notebook.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    images: list of str or bytes\n",
    "        URLs or bytes of images to display\n",
    "\n",
    "    row_height: str\n",
    "        CSS height value to assign to all images. Set to 'auto' by default to show images\n",
    "        with their native dimensions. Set to a value like '250px' to make all rows\n",
    "        in the gallery equal height.\n",
    "    \"\"\"\n",
    "    figures = []\n",
    "    for image in images:\n",
    "        if isinstance(image, bytes):\n",
    "            src = _src_from_data(image)\n",
    "            caption = ''\n",
    "        else:\n",
    "            src = image\n",
    "            caption = f'<figcaption style=\"font-size: 0.6em\">{image}</figcaption>'\n",
    "        figures.append(f'''\n",
    "            <figure style=\"margin: 5px !important;\">\n",
    "              <img src=\"{src}\" style=\"height: {row_height}\">\n",
    "              \n",
    "            </figure>\n",
    "        ''')\n",
    "    return HTML(data=f'''\n",
    "        <div style=\"display: flex; flex-flow: row wrap; text-align: center;\">\n",
    "        {''.join(figures)}\n",
    "        </div>\n",
    "    ''')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gallery(df['thumbnail'], row_height='150px')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}