{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Querying ISA investigations with SparQL and GraphQL\n",
    "\n",
    "## Abstract\n",
    "The ISA api comes packaged with a graphQL interface and a JSON-LD serializer to help users query investigations.\n",
    "The aim of this notebook is to:\n",
    "   - learn to load an ISA Investigation from a json file.\n",
    "   - learn to execute a graphQL query on the ISA Investigation.\n",
    "   - learn to serialize an ISA Investigation to JSON-LD with different contexts\n",
    "   - generate an RDF graph from the JSON-LD\n",
    "   - execute a sparQL query on that graph.\n",
    "\n",
    "To illustrate this notebook, we will try to get the names of all the protocols types stored in an ISA investigation.\n",
    "\n",
    "## 1. Getting the tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let's first import all the packages we need"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "from os import path\n",
    "import json\n",
    "\n",
    "from rdflib import Graph, Namespace\n",
    "\n",
    "from isatools.isajson import load\n",
    "from isatools.model import set_context"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Reading and loading an ISA Investigation in memory from an ISA-JSON instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "filepath = path.join('json', 'BII-S-3', 'BII-S-3.json')\n",
    "with open(filepath, 'r') as f:\n",
    "    investigation = load(f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Write a graphQL query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['sample collection', 'nucleic acid extraction', 'reverse transcription', 'library construction', 'nucleic acid sequencing', 'data transformation']\n"
     ]
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "{\n",
    "    studies {\n",
    "        protocols {\n",
    "            type: protocolType { annotationValue }\n",
    "        }\n",
    "    }\n",
    "}\n",
    "\"\"\"\n",
    "protocols_graphql = []\n",
    "results = investigation.execute_query(query)\n",
    "for study in results.data['studies']:\n",
    "    protocols = study['protocols']\n",
    "    for protocol in protocols:\n",
    "        value = protocol['type']['annotationValue']\n",
    "        if value not in protocols_graphql:\n",
    "            protocols_graphql.append(value)\n",
    "print(protocols_graphql)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Setting options for the contexts binding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "set_context(vocab='wd', local=True, prepend_url='https://example.com', all_in_one=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `set_context()` method takes five parameters:\n",
    "   - vocab: to choose the vocabulary to use between `sdo`, `obo`, `wdt`, `wd` and `sio`\n",
    "   - local: if `True`, uses local files else the GitHub contexts\n",
    "   - prepend_url: the url to prepend to the isa identifiers (this is the URL of your SPARQL endpoint)\n",
    "   - all_in_on: if `True`, all the contexts are pulled from a single file instead of separate context files\n",
    "   - include_context: if `True`, the context is included in the JSON-LD serialization, else it only contains the URL or local path to the context file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Generate a JSON-LD serialization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "ld = investigation.to_dict(ld=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The investigation can be serialized to json with the `to_dict()` method. By passing the optional parameter `ld=True`, the serializer binds the `@type`, `@context` and `@id` to each object in the JSON."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Generate an RDF graph\n",
    "\n",
    "Before we can generate a graph we need to create the proper namespaces and transform the `ld` variable into a string"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Creating the namespace\n",
    "WD = Namespace(\"http://www.wikidata.org/entity/\")\n",
    "ISA = Namespace('https://isa.org/')\n",
    "\n",
    "ld_string = json.dumps(ld) # Get a string representation of the ld variable\n",
    "graph = Graph() # Create an empty graph\n",
    "graph.parse(data=ld_string, format='json-ld') # Load the data into the graph\n",
    "\n",
    "# Finally, bind the namespaces to the graph\n",
    "graph.bind('wdt', WD)\n",
    "graph.bind('isa', ISA)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Create a small sparQL query and execute it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['sample collection', 'nucleic acid extraction', 'reverse transcription', 'library construction', 'nucleic acid sequencing', 'data transformation']\n"
     ]
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n",
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n",
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n",
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n",
    "PREFIX wd: <http://www.wikidata.org/entity/>\n",
    "\n",
    "SELECT distinct ?protocolTypeName\n",
    "WHERE {\n",
    "    ?p rdf:type wd:Q41689629 . # Is a protocol\n",
    "    ?p wd:P7793 ?protocolType .\n",
    "    ?protocolType wd:P527 ?protocolTypeName . # Get each protocol type name\n",
    "    FILTER (?protocolTypeName!=\"\"^^wd:Q1417099) # Filter out empty protocol type name\n",
    "}\n",
    "\"\"\"\n",
    "protocols_sparql = []\n",
    "for node in graph.query(query):\n",
    "    n = node.asdict()\n",
    "    for fieldName in n:\n",
    "        fieldVal = str(n[fieldName].toPython())\n",
    "        if fieldVal not in protocols_sparql:\n",
    "            protocols_sparql.append(fieldVal)\n",
    "print(protocols_sparql)\n",
    "assert(protocols_sparql == protocols_graphql)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}