# Exploring the Moving Image Archive dataset

Created in October-December 2022 for the National Library of Scotland's Data Foundry by [Gustavo Candela, National Librarian’s Research Fellowship in Digital Scholarship 2022-23](https://data.nls.uk/projects/the-national-librarians-research-fellowship-in-digital-scholarship-2022-23/)

### About the Moving Image Archive Dataset

This dataset represents the descriptive metadata from the Moving Image Archive catalogue, which is Scotland’s national collection of moving images.

- Data format: metadata available as MARCXML and Dublin Core
- Data source: https://data.nls.uk/data/metadata-collections/moving-image-archive/

### Table of contents

- [Preparation](#Preparation)
- [Loading the RDF dataset](#Loading-the-RDF-dataset)
- [Retrieving the geographic locations](#Retrieving-the-geographic-locations)
- [Map visualisation](#Map-visualisation)
- [Retrieving the Wikidata identifiers](#Retrieving-the-Wikidata-identifiers)
- [Wikidata visualisation](#Wikidata-visualisation)

### Citations

- Candela, G., Sáez, M. D., Escobar, P., & Marco-Such, M. (2022). Reusing digital collections from GLAM institutions. Journal of Information Science, 48(2), 251–267. https://doi.org/10.1177/0165551520950246

### Preparation

Import the libraries required to create a map based on the geographic locations provided by the dataset.

In [17]:
import folium
from rdflib import Graph

# Suppress log messages below CRITICAL (for example, rdflib parser warnings)
import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

### Loading the RDF dataset

In [18]:
# Create a graph and load the enriched dataset (Turtle format)
g = Graph().parse("../rdf/datasetEnriched.ttl")

### Retrieving the geographic locations

The following SPARQL query retrieves the geographic locations provided by the RDF dataset.

In [20]:
print('##### edm:Place resources')

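# Query the data in g using SPARQL
# This query returns the label, coordinates, and Wikidata and GeoNames links of all ``edm:Place`` instances
# rdf, owl and edm are declared explicitly so the query does not depend on the prefixes bound in the Turtle file
q = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX edm: <http://www.europeana.eu/schemas/edm/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>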

    SELECT distinct ?p ?lat ?long ?lbl ?wikidata ?geonames
    WHERE {
        ?p rdf:type edm:Place .
        ?p skos:prefLabel ?lbl .
        ?p wgs:long ?long .
        ?p wgs:lat ?lat .
        ?p owl:sameAs ?wikidata . FILTER ( strstarts(str(?wikidata), "https://www.wikidata.org/wiki/") ).
        ?p owl:sameAs ?geonames . FILTER ( strstarts(str(?geonames), "https://www.geonames.org/") )
    }
"""

##### edm:Place resources


### Map visualisation

The Python library [folium](https://python-visualization.github.io/folium/) can be used to create a map. The query is run against the graph, and each result is added to the map as a circle.


In [21]:
# Create a map centred near Glasgow, then run the query and add one circle per result
map_circles = folium.Map(location=[55.86,-4.25], tiles="OpenStreetMap", zoom_start=5)

for r in g.query(q):
    # Convert rdflib terms to plain Python values before handing them to folium
    idwikidata = str(r['wikidata'])
    lat = float(r['lat'])
    lon = float(r['long'])
    idgeonames = str(r['geonames'])
    label = str(r['lbl'])

    text_popup = "Records in <a href='" + idwikidata + "'>" + label + "</a>"

    folium.Circle(
      location=[lat, lon],
      popup=text_popup,
      #radius=float(total)/10,
      color='crimson',
      fill=True,
      fill_color='crimson'
    ).add_to(map_circles)

map_circles
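
Labels and coordinates come straight from the dataset, so it may be worth validating them before they reach the map. The sketch below is one possible approach, using only the standard library: `build_marker` is a hypothetical helper (not part of folium or rdflib) that converts coordinates to floats, rejects out-of-range values, and escapes the label and URL used in the popup. The sample values are invented.

```python
from html import escape


def build_marker(lat, lon, label, url):
    """Return (location, popup_html) built from plain Python values.

    Hypothetical helper: not part of folium or rdflib.
    """
    location = [float(lat), float(lon)]
    if not (-90 <= location[0] <= 90 and -180 <= location[1] <= 180):
        raise ValueError(f"coordinates out of range: {location}")
    # Escape the label and URL so characters such as & or < cannot break the HTML
    popup_html = f'Records in <a href="{escape(url, quote=True)}">{escape(label)}</a>'
    return location, popup_html


# Invented sample values standing in for one query result
location, popup_html = build_marker(
    "55.86", "-4.25", "Sample & district", "https://www.wikidata.org/wiki/Q000000"
)
print(location)
print(popup_html)
```

The returned values could then be passed as `location=` and `popup=` to `folium.Circle`.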

### Retrieving the Wikidata identifiers

In [22]:
print('##### edm:Place resources linked to Wikidata')

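# Query the data in g using SPARQL
# This query returns the Wikidata links of all ``edm:Place`` instances
# rdf, owl and edm are declared explicitly so the query does not depend on the prefixes bound in the Turtle file
q = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX edm: <http://www.europeana.eu/schemas/edm/>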

    SELECT distinct ?wikidata
    WHERE {
        ?p rdf:type edm:Place .
        ?p owl:sameAs ?wikidata . FILTER ( strstarts(str(?wikidata), "https://www.wikidata.org/wiki/") ).
    }
"""

##### edm:Place resources linked to Wikidata


In [23]:
for r in g.query(q):
    idwikidata = r['wikidata']
    print(idwikidata)

https://www.wikidata.org/wiki/Q980084
https://www.wikidata.org/wiki/Q207268
https://www.wikidata.org/wiki/Q376914
https://www.wikidata.org/wiki/Q550606
https://www.wikidata.org/wiki/Q1061313
https://www.wikidata.org/wiki/Q1247396
https://www.wikidata.org/wiki/Q786649
https://www.wikidata.org/wiki/Q202177
https://www.wikidata.org/wiki/Q54809
https://www.wikidata.org/wiki/Q864668
https://www.wikidata.org/wiki/Q664892
https://www.wikidata.org/wiki/Q206934
https://www.wikidata.org/wiki/Q182923
https://www.wikidata.org/wiki/Q978599
https://www.wikidata.org/wiki/Q1247435
https://www.wikidata.org/wiki/Q207257
https://www.wikidata.org/wiki/Q1147435
https://www.wikidata.org/wiki/Q2421
https://www.wikidata.org/wiki/Q47134
https://www.wikidata.org/wiki/Q1229763
https://www.wikidata.org/wiki/Q9177476
https://www.wikidata.org/wiki/Q100166
https://www.wikidata.org/wiki/Q1247384
https://www.wikidata.org/wiki/Q80967
https://www.wikidata.org/wiki/Q81052
https://www.wikidata.org/wiki/Q204940
https://www
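
The URLs printed above end in the Wikidata item identifier (QID), which is the form needed when querying Wikidata directly. As a minimal sketch, the QIDs could be extracted with a regular expression; `extract_qids` is a hypothetical helper name, and the GeoNames URL in the sample list is a placeholder.

```python
import re

# Matches Wikidata page URLs of the form used in the dataset and captures the QID
WIKIDATA_URL = re.compile(r"^https://www\.wikidata\.org/wiki/(Q\d+)$")


def extract_qids(urls):
    """Return the unique QIDs found in urls, keeping first-seen order."""
    qids = []
    for url in urls:
        match = WIKIDATA_URL.match(str(url))
        if match and match.group(1) not in qids:
            qids.append(match.group(1))
    return qids


urls = [
    "https://www.wikidata.org/wiki/Q980084",
    "https://www.wikidata.org/wiki/Q207268",
    "https://www.wikidata.org/wiki/Q980084",  # duplicate, kept once
    "https://www.geonames.org/000000",        # placeholder, not a Wikidata URL, skipped
]
qids = extract_qids(urls)
print(qids)
```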

### Wikidata visualisation

The [following link](https://w.wiki/5qa4) shows a map generated by a SPARQL query that retrieves all the geographic locations in the dataset that are linked to Wikidata.

In [24]:
from IPython.display import IFrame

IFrame(src='https://w.wiki/5qa4', width=900, height=700)
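
The query behind the short link is not reproduced in this notebook. As an assumption, a comparable map could be requested from the Wikidata Query Service by listing the QIDs in a `VALUES` clause and asking for each item's coordinate location (`wdt:P625`), with the `#defaultView:Map` directive selecting the map view. The sketch below only builds the query text; `build_map_query` is a hypothetical helper, and the query is illustrative rather than the one used for the link above.

```python
def build_map_query(qids):
    """Build a SPARQL query for the Wikidata Query Service map view.

    Assumption: an illustrative query, not the one behind the short link.
    """
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "#defaultView:Map\n"
        "SELECT ?place ?placeLabel ?coord WHERE {\n"
        f"  VALUES ?place {{ {values} }}\n"
        "  ?place wdt:P625 ?coord .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


# Two QIDs taken from the output of the previous section
query = build_map_query(["Q980084", "Q207268"])
print(query)
```

The resulting text can be pasted into the Wikidata Query Service editor to render the map.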