{ "cells": [ { "cell_type": "markdown", "id": "022acf10", "metadata": {}, "source": [ "# Exploring the Moving Image Archive dataset\n", "\n", "Created in October-December 2022 for the National Library of Scotland's Data Foundry by [Gustavo Candela, National Librarian’s Research Fellowship in Digital Scholarship 2022-23](https://data.nls.uk/projects/the-national-librarians-research-fellowship-in-digital-scholarship-2022-23/)" ] }, { "cell_type": "markdown", "id": "320fcb74", "metadata": {}, "source": [ "### About the Moving Image Archive Dataset\n", "\n", "This dataset contains the descriptive metadata from the catalogue of the Moving Image Archive, Scotland’s national collection of moving images.\n", "\n", "- Data format: metadata available as MARCXML and Dublin Core\n", "- Data source: https://data.nls.uk/data/metadata-collections/moving-image-archive/" ] }, { "cell_type": "markdown", "id": "b7c45d56", "metadata": {}, "source": [ "### Table of contents\n", "\n", "- [Preparation](#Preparation)\n", "- [Loading the RDF dataset](#Loading-the-RDF-dataset)\n", "- [Retrieving the geographic locations](#Retrieving-the-geographic-locations)\n", "- [Map visualisation](#Map-visualisation)\n", "- [Retrieving the Wikidata identifiers](#Retrieving-the-Wikidata-identifiers)\n", "- [Wikidata visualisation](#Wikidata-visualisation)" ] }, { "cell_type": "markdown", "id": "13ac8304", "metadata": {}, "source": [ "### Citations\n", "\n", "- Candela, G., Sáez, M. D., Escobar, P., & Marco-Such, M. (2022). Reusing digital collections from GLAM institutions. Journal of Information Science, 48(2), 251–267. https://doi.org/10.1177/0165551520950246" ] }, { "cell_type": "markdown", "id": "66f1d878", "metadata": {}, "source": [ "### Preparation\n", "\n", "Import the libraries required to create a map based on the geographic locations provided by the dataset." 
] }, { "cell_type": "code", "execution_count": 17, "id": "7c36353b", "metadata": {}, "outputs": [], "source": [ "import folium\n", "from rdflib import Graph\n", "\n", "# Silence library log messages below CRITICAL\n", "import logging\n", "logger = logging.getLogger()\n", "logger.setLevel(logging.CRITICAL)" ] }, { "cell_type": "markdown", "id": "8957dc4d", "metadata": {}, "source": [ "### Loading the RDF dataset" ] }, { "cell_type": "code", "execution_count": 18, "id": "100b4503", "metadata": {}, "outputs": [], "source": [ "# Parse the enriched Turtle file into an rdflib Graph\n", "g = Graph().parse(\"../rdf/datasetEnriched.ttl\")" ] }, { "cell_type": "markdown", "id": "e536b859", "metadata": {}, "source": [ "### Retrieving the geographic locations\n", "\n", "The following SPARQL query retrieves the geographic locations provided by the RDF dataset." ] }, { "cell_type": "code", "execution_count": 20, "id": "8d616561", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "##### edm:Place resources\n" ] } ], "source": [ "print('##### edm:Place resources')\n", "\n", "# Query the data in g using SPARQL\n", "# This query returns the label, coordinates, and Wikidata and GeoNames links of all ``edm:Place`` instances\n", "# rdf:, owl: and edm: are not declared below; they are assumed to be bound in the graph,\n", "# whose prefixes rdflib passes to the query\n", "q = \"\"\"\n", "    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n", "    PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n", "\n", "    SELECT DISTINCT ?p ?lat ?long ?lbl ?wikidata ?geonames\n", "    WHERE {\n", "        ?p rdf:type edm:Place .\n", "        ?p skos:prefLabel ?lbl .\n", "        ?p wgs:long ?long .\n", "        ?p wgs:lat ?lat .\n", "        ?p owl:sameAs ?wikidata .\n", "        FILTER ( strstarts(str(?wikidata), \"https://www.wikidata.org/wiki/\") )\n", "        ?p owl:sameAs ?geonames .\n", "        FILTER ( strstarts(str(?geonames), \"https://www.geonames.org/\") )\n", "    }\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "5d9b058e", "metadata": {}, "source": [ "### Map visualisation\n", "\n", "The Python library [folium](https://python-visualization.github.io/folium/) can be used to create a map. 
The query is applied to the graph, and each result is added to the map as a circle.\n" ] }, { "cell_type": "code", "execution_count": 21, "id": "20dd1e76", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Apply the query to the graph and iterate through results\n", "map_circles = folium.Map(location=[55.86,-4.25], tiles=\"OpenStreetMap\", zoom_start=5)\n", "\n", "for r in g.query(q):\n", " idwikidata = r['wikidata']\n", " lat = r['lat']\n", " lon = r['long']\n", " idgeonames = r['geonames']\n", " label = r['lbl']\n", " \n", " text_popup = \"Records in \" + label + \"\"\n", "\n", " folium.Circle(\n", " location=[lat, lon],\n", " popup=text_popup,\n", " #radius=float(total)/10,\n", " color='crimson',\n", " fill=True,\n", " fill_color='crimson'\n", " ).add_to(map_circles)\n", "\n", "map_circles" ] }, { "cell_type": "markdown", "id": "d80984c5", "metadata": {}, "source": [ "### Retrieving the Wikidata identifiers" ] }, { "cell_type": "code", "execution_count": 22, "id": "e46efd11", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "##### edm:Place resources linked to Wikidata\n" ] } ], "source": [ "print('##### edm:Place resources linked to Wikidata')\n", "\n", "# Query the data in g using SPARQL\n", "# This query returns the 'name' of all ``schema:Place`` instances\n", "q = \"\"\"\n", " PREFIX skos: \n", " PREFIX schema: \n", " PREFIX wgs: \n", "\n", " SELECT distinct ?wikidata\n", " WHERE {\n", " ?p rdf:type edm:Place .\n", " ?p owl:sameAs ?wikidata . 
FILTER ( strstarts(str(?wikidata), \"https://www.wikidata.org/wiki/\") ).\n", " }\n", "\"\"\"" ] }, { "cell_type": "code", "execution_count": 23, "id": "e06a71dd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://www.wikidata.org/wiki/Q980084\n", "https://www.wikidata.org/wiki/Q207268\n", "https://www.wikidata.org/wiki/Q376914\n", "https://www.wikidata.org/wiki/Q550606\n", "https://www.wikidata.org/wiki/Q1061313\n", "https://www.wikidata.org/wiki/Q1247396\n", "https://www.wikidata.org/wiki/Q786649\n", "https://www.wikidata.org/wiki/Q202177\n", "https://www.wikidata.org/wiki/Q54809\n", "https://www.wikidata.org/wiki/Q864668\n", "https://www.wikidata.org/wiki/Q664892\n", "https://www.wikidata.org/wiki/Q206934\n", "https://www.wikidata.org/wiki/Q182923\n", "https://www.wikidata.org/wiki/Q978599\n", "https://www.wikidata.org/wiki/Q1247435\n", "https://www.wikidata.org/wiki/Q207257\n", "https://www.wikidata.org/wiki/Q1147435\n", "https://www.wikidata.org/wiki/Q2421\n", "https://www.wikidata.org/wiki/Q47134\n", "https://www.wikidata.org/wiki/Q1229763\n", "https://www.wikidata.org/wiki/Q9177476\n", "https://www.wikidata.org/wiki/Q100166\n", "https://www.wikidata.org/wiki/Q1247384\n", "https://www.wikidata.org/wiki/Q80967\n", "https://www.wikidata.org/wiki/Q81052\n", "https://www.wikidata.org/wiki/Q204940\n", "https://www.wikidata.org/wiki/Q123709\n", "https://www.wikidata.org/wiki/Q17582129\n", "https://www.wikidata.org/wiki/Q203000\n", "https://www.wikidata.org/wiki/Q189912\n", "https://www.wikidata.org/wiki/Q652539\n", "https://www.wikidata.org/wiki/Q36405\n", "https://www.wikidata.org/wiki/Q201149\n", "https://www.wikidata.org/wiki/Q530296\n", "https://www.wikidata.org/wiki/Q793283\n", "https://www.wikidata.org/wiki/Q211091\n", "https://www.wikidata.org/wiki/Q23436\n", "https://www.wikidata.org/wiki/Q4093\n" ] } ], "source": [ "for r in g.query(q):\n", " idwikidata = r['wikidata']\n", " print(idwikidata)" ] }, { "cell_type": 
"markdown", "id": "b80b7fc3", "metadata": {}, "source": [ "### Wikidata visualisation\n", "\n", "The [following link](https://w.wiki/5qa4) presents a map as a result of a SPARL query that retrieves all the geographic locations provided by the dataset and linked to Wikidata." ] }, { "cell_type": "code", "execution_count": 24, "id": "6f1b1bc8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import IFrame\n", "\n", "IFrame(src='https://w.wiki/5qa4', width=900, height=700)" ] }, { "cell_type": "code", "execution_count": null, "id": "cd7d6d6c", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }