{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Querying ISA investigations with SparQL and GraphQL\n", "\n", "## Abstract\n", "The ISA api comes packaged with a graphQL interface and a JSON-LD serializer to help users query investigations.\n", "The aim of this notebook is to:\n", " - learn to load an ISA Investigation from a json file.\n", " - learn to execute a graphQL query on the ISA Investigation.\n", " - learn to serialize an ISA Investigation to JSON-LD with different contexts\n", " - generate an RDF graph from the JSON-LD\n", " - execute a sparQL query on that graph.\n", "\n", "To illustrate this notebook, we will try to get the names of all the protocols types stored in an ISA investigation.\n", "\n", "## 1. Getting the tools" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Let's first import all the packages we need" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "from os import path\n", "import json\n", "\n", "from rdflib import Graph, Namespace\n", "\n", "from isatools.isajson import load\n", "from isatools.model import set_context" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Reading and loading an ISA Investigation in memory from an ISA-JSON instance" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "filepath = path.join('json', 'BII-S-3', 'BII-S-3.json')\n", "with open(filepath, 'r') as f:\n", " investigation = load(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Write a graphQL query" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['sample collection', 'nucleic acid extraction', 'reverse transcription', 'library construction', 'nucleic acid sequencing', 'data transformation']\n" ] } ], "source": [ "query = \"\"\"\n", "{\n", " studies {\n", " protocols {\n", " type: protocolType { annotationValue }\n", " }\n", " }\n", "}\n", "\"\"\"\n", "protocols_graphql = []\n", "results = investigation.execute_query(query)\n", "for study in results.data['studies']:\n", " protocols = study['protocols']\n", " for protocol in protocols:\n", " value = protocol['type']['annotationValue']\n", " if value not in protocols_graphql:\n", " protocols_graphql.append(value)\n", "print(protocols_graphql)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Setting options for the contexts binding" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "set_context(vocab='wd', local=True, prepend_url='https://example.com', all_in_one=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `set_context()` method takes five parameters:\n", " - vocab: to choose the vocabulary to use between `sdo`, `obo`, `wdt`, `wd` and `sio`\n", " - local: if `True`, uses local files else the GitHub contexts\n", " - prepend_url: the url to prepend to the isa identifiers (this is the URL of your SPARQL endpoint)\n", " - all_in_on: if `True`, all the contexts are pulled from a single file instead of separate context files\n", " - include_context: if `True`, the context is included in the JSON-LD serialization, else it only contains the URL or local path to the context file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Generate a JSON-LD serialization" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "ld = investigation.to_dict(ld=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The investigation can be serialized to json with the `to_dict()` method. By passing the optional parameter `ld=True`, the serializer binds the `@type`, `@context` and `@id` to each object in the JSON." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Generate an RDF graph\n", "\n", "Before we can generate a graph we need to create the proper namespaces and transform the `ld` variable into a string" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Creating the namespace\n", "WD = Namespace(\"http://www.wikidata.org/entity/\")\n", "ISA = Namespace('https://isa.org/')\n", "\n", "ld_string = json.dumps(ld) # Get a string representation of the ld variable\n", "graph = Graph() # Create an empty graph\n", "graph.parse(data=ld_string, format='json-ld') # Load the data into the graph\n", "\n", "# Finally, bind the namespaces to the graph\n", "graph.bind('wdt', WD)\n", "graph.bind('isa', ISA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Create a small sparQL query and execute it" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['sample collection', 'nucleic acid extraction', 'reverse transcription', 'library construction', 'nucleic acid sequencing', 'data transformation']\n" ] } ], "source": [ "query = \"\"\"\n", "PREFIX rdf: \n", "PREFIX owl: \n", "PREFIX rdfs: \n", "PREFIX xsd: \n", "PREFIX wd: \n", "\n", "SELECT distinct ?protocolTypeName\n", "WHERE {\n", " ?p rdf:type wd:Q41689629 . # Is a protocol\n", " ?p wd:P7793 ?protocolType .\n", " ?protocolType wd:P527 ?protocolTypeName . # Get each protocol type name\n", " FILTER (?protocolTypeName!=\"\"^^wd:Q1417099) # Filter out empty protocol type name\n", "}\n", "\"\"\"\n", "protocols_sparql = []\n", "for node in graph.query(query):\n", " n = node.asdict()\n", " for fieldName in n:\n", " fieldVal = str(n[fieldName].toPython())\n", " if fieldVal not in protocols_sparql:\n", " protocols_sparql.append(fieldVal)\n", "print(protocols_sparql)\n", "assert(protocols_sparql == protocols_graphql)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 1 }