# Querying ISA investigations with SPARQL and GraphQL

## Abstract
The ISA API comes packaged with a GraphQL interface and a JSON-LD serializer to help users query investigations.
The aim of this notebook is to:
 - learn to load an ISA Investigation from a JSON file.
 - learn to execute a GraphQL query on the ISA Investigation.
 - learn to serialize an ISA Investigation to JSON-LD with different contexts.
 - generate an RDF graph from the JSON-LD.
 - execute a SPARQL query on that graph.

As a running example, we will retrieve the names of all the protocol types stored in an ISA investigation, first with GraphQL and then with SPARQL.

## 1. Getting the tools

In [17]:
# Let's first import all the packages we need

In [18]:
from os import path
import json

from rdflib import Graph, Namespace

from isatools.isajson import load
from isatools.model import set_context

## 2. Reading and loading an ISA Investigation in memory from an ISA-JSON instance

In [19]:
filepath = path.join('json', 'BII-S-3', 'BII-S-3.json')
with open(filepath, 'r') as f:
    investigation = load(f)

## 3. Write a GraphQL query

In [20]:
query = """
{
    studies {
        protocols {
            type: protocolType { annotationValue }
        }
    }
}
"""
protocols_graphql = []
results = investigation.execute_query(query)
# Collect each protocol type name once, in first-seen order
for study in results.data['studies']:
    protocols = study['protocols']
    for protocol in protocols:
        value = protocol['type']['annotationValue']
        if value not in protocols_graphql:
            protocols_graphql.append(value)
print(protocols_graphql)

['sample collection', 'nucleic acid extraction', 'reverse transcription', 'library construction', 'nucleic acid sequencing', 'data transformation']
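
The loop above deduplicates by testing list membership on every value. A minimal sketch of the same idea using `dict.fromkeys`, which also keeps first-seen order, is shown here. The `sample_data` dictionary and the `unique_protocol_types` helper are hypothetical: the data only mimics the shape of `results.data` and is not real output from BII-S-3.

```python
# Hypothetical stand-in for `results.data`; the shape mirrors the query above.
sample_data = {
    "studies": [
        {"protocols": [
            {"type": {"annotationValue": "sample collection"}},
            {"type": {"annotationValue": "nucleic acid extraction"}},
        ]},
        {"protocols": [
            {"type": {"annotationValue": "sample collection"}},
            {"type": {"annotationValue": "data transformation"}},
        ]},
    ]
}


def unique_protocol_types(data):
    """Return protocol type names in first-seen order, without duplicates."""
    names = (
        protocol["type"]["annotationValue"]
        for study in data["studies"]
        for protocol in study["protocols"]
    )
    # Dictionary keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(names))


print(unique_protocol_types(sample_data))
```

For a handful of protocols the difference is cosmetic, but the dictionary-based version avoids rescanning the list for every value.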


## 4. Setting options for the context binding

In [21]:
set_context(vocab='wd', local=True, prepend_url='https://example.com', all_in_one=False)

The `set_context()` function takes five parameters:
 - vocab: the vocabulary to use, one of `sdo`, `obo`, `wdt`, `wd` and `sio`
 - local: if `True`, uses local context files; otherwise uses the contexts hosted on GitHub
 - prepend_url: the URL to prepend to the ISA identifiers (this is the URL of your SPARQL endpoint)
 - all_in_one: if `True`, all the contexts are pulled from a single file instead of separate context files
 - include_context: if `True`, the context is included in the JSON-LD serialization; otherwise the serialization only contains the URL or local path to the context file.

## 5. Generate a JSON-LD serialization

In [22]:
ld = investigation.to_dict(ld=True)

The investigation can be serialized to JSON with the `to_dict()` method. By passing the optional parameter `ld=True`, the serializer binds the `@type`, `@context` and `@id` to each object in the JSON.
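
To make the three keywords concrete, here is a hand-written JSON-LD object. This is only a sketch: the property name and every URL in it are illustrative assumptions, not what `to_dict(ld=True)` actually emits.

```python
import json

# Hand-written illustration only: the keys and URLs are assumptions,
# not the actual isatools output.
protocol_ld = {
    # @context maps plain JSON keys to IRIs
    "@context": {"name": "https://example.com/vocab/name"},
    # @id identifies this node in the graph
    "@id": "https://example.com/protocol/1",
    # @type states which class the node belongs to
    "@type": "https://example.com/vocab/Protocol",
    "name": "sample collection",
}

# JSON-LD keywords start with "@"; everything else is a regular property.
keywords = sorted(key for key in protocol_ld if key.startswith("@"))
print(keywords)
print(json.dumps(protocol_ld, indent=2))
```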

## 6. Generate an RDF graph

Before we can generate a graph, we need to create the proper namespaces and transform the `ld` variable into a string.

In [23]:
# Creating the namespace
WD = Namespace("http://www.wikidata.org/entity/")
ISA = Namespace('https://isa.org/')

ld_string = json.dumps(ld) # Get a string representation of the ld variable
graph = Graph() # Create an empty graph
graph.parse(data=ld_string, format='json-ld') # Load the data into the graph

# Finally, bind the namespaces to the graph
graph.bind('wd', WD)  # 'wd' is the prefix the SPARQL query uses for this namespace
graph.bind('isa', ISA)

## 7. Create a small SPARQL query and execute it

In [24]:
query = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT distinct ?protocolTypeName
WHERE {
    ?p rdf:type wd:Q41689629 . # Is a protocol
    ?p wd:P7793 ?protocolType .
    ?protocolType wd:P527 ?protocolTypeName . # Get each protocol type name
    FILTER (?protocolTypeName!=""^^wd:Q1417099) # Filter out empty protocol type name
}
"""
protocols_sparql = []
for node in graph.query(query):
    n = node.asdict()  # Map each selected variable name to its RDF term
    for fieldName in n:
        fieldVal = str(n[fieldName].toPython())
        if fieldVal not in protocols_sparql:
            protocols_sparql.append(fieldVal)
print(protocols_sparql)
# Both query languages should return the same protocol type names
assert protocols_sparql == protocols_graphql

['sample collection', 'nucleic acid extraction', 'reverse transcription', 'library construction', 'nucleic acid sequencing', 'data transformation']
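
The final `assert` compares two lists, so it only passes when both queries return the names in the same order. SPARQL does not guarantee any result order unless the query has an `ORDER BY` clause, so an order-insensitive comparison can be more robust. A minimal sketch with hypothetical sample lists (not real query output):

```python
# Hypothetical results: same names, different order.
from_graphql = ["sample collection", "data transformation"]
from_sparql = ["data transformation", "sample collection"]

same_order = from_graphql == from_sparql              # list equality is order-sensitive
same_content = set(from_graphql) == set(from_sparql)  # set equality ignores order
print(same_order, same_content)
```

Since both lists are already deduplicated, comparing them as sets loses no information.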
