## From ISA JSON to RDF Linked Data with ISALDSerializer function:

### Getting ISAtools and importing the latest module for conversion to JSON-LD from ISA-JSON

### Abstract:

The goal of this tutorial is to show how to go from an ISA document to an equivalent RDF representation using Python tools, to highlight some of the limitations of existing libraries, and to point to alternative options for completing a meaningful conversion to the RDF Turtle format.

This notebook mainly highlights the new functionality that comes with the latest ISA-API release (rc10.3), which converts ISA-JSON to ISA-JSON-LD with a choice of 3 popular ontological frameworks for semantic anchoring. These are:
- [obofoundry](http://www.obofoundry.org), a set of interoperable ontologies for the biological domain.
- [schema.org](https://schema.org), the search-engine-oriented ontology developed by companies such as Yandex, Bing and Google.
- [wikidata](https://wikidata.org), a set of semantic concepts backing Wikipedia and Wikidata resources.

These frameworks have been chosen for interoperability.


This notebook has a companion notebook which goes over the exploration of the resulting RDF representations using a set of SPARQL queries.
Check it out [here](http://localhost:8888/notebooks/isa-cookbook/content/notebooks/isa-jsonld%20exploration%20with%20SPARQL.ipynb)


In [None]:
import os
import json
from json import load
import datetime
import isatools

from isatools.convert.json2jsonld import ISALDSerializer

### 1. Loading an ISA-JSON document in memory with `json.load()` function

Prior to invoking the `ISALDSerializer` function, we need to do 3 things:
* First, pass a URL or a path to the ISA-JSON instance to convert to JSON-LD.
* Second, select the ontology framework used for the semantic conversion. One may choose from the following 3 options:
 - obofoundry ontologies, abbreviated as `obo`
 - schema.org ontology, abbreviated as `sdo`
 - wikidata ontology, abbreviated as `wdt`
* Third, choose whether to embed the @context in the output or to rely on URLs pointing to the individual contexts. By default, the converter embeds the 'all in one' context information, because many of the Python libraries that support RDF parsing (e.g. rdflib) lack support for the JSON-LD 1.1 specification.
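
The difference between the two context strategies in the third point can be pictured with plain dictionaries. The sketch below is not how `ISALDSerializer` works internally. The context content, the context URL and the two helper names are made up for illustration. It only relies on the fact that a JSON-LD `@context` may be either an inline object or a URL string.

```python
import json

# Hypothetical context and URL, used only for illustration.
EXAMPLE_CONTEXT = {"name": "http://schema.org/name"}
EXAMPLE_CONTEXT_URL = "https://example.org/contexts/investigation.jsonld"


def embed_context(document, context):
    """Return a copy of `document` carrying the full context inline."""
    return {"@context": context, **document}


def reference_context(document, context_url):
    """Return a copy of `document` that points at a remote context by URL."""
    return {"@context": context_url, **document}


# Toy document, not a real ISA-JSON instance.
doc = {"@type": "Investigation", "name": "BII-S-3"}
embedded = embed_context(doc, EXAMPLE_CONTEXT)
referenced = reference_context(doc, EXAMPLE_CONTEXT_URL)

print(json.dumps(embedded, indent=2))
print(json.dumps(referenced, indent=2))
```

An embedded context makes the file self-describing, so a parser never has to fetch anything, at the cost of a larger document. A referenced context keeps the file small but requires the parser to resolve the URL.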


In [None]:
instance_path = os.path.join("./output/BII-S-3-synth/", "isa-new_ids.json")

with open(instance_path, 'r') as instance_file:
    # the context manager closes the file, so no explicit close() is needed
    instance = load(instance_file)
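
Since the ISA-JSON instance may live at a URL or at a local path, a small loader that accepts both can be handy. The following is a minimal sketch using only the standard library. `load_isa_json` is a hypothetical helper name, not part of isatools, and the sample document is a toy stand-in for a real ISA-JSON file.

```python
import json
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import urlopen


def load_isa_json(source):
    """Load a JSON document from a local path or an http(s) URL."""
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source) as response:
            return json.load(response)
    with open(source, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration with a throwaway local file (toy content).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json")
    with open(sample_path, "w", encoding="utf-8") as handle:
        json.dump({"identifier": "BII-S-3"}, handle)
    sample = load_isa_json(sample_path)

print(sample["identifier"])
```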

### 2. Transforming ISA-JSON to ISA JSON-LD with `ISALDSerializer` function

In [None]:
# we now invoke the ISALDSerializer function

ontology = "wdt"

serializer = ISALDSerializer(instance)
serializer.set_ontology(ontology)
serializer.set_instance(instance)

jsonldcontent = serializer.output

Now that the conversion is performed, we can write the resulting ISA-JSON-LD to file:

### 3. Writing ISA JSON-LD to file

In [None]:
isa_json_ld_path = os.path.join("./output/BII-S-3-synth/", "BII-S-3-isa-rdf-" + ontology + "-v3.jsonld")

# ensure_ascii=False may emit non-ASCII characters, so write the file as UTF-8
with open(isa_json_ld_path, 'w', encoding='utf-8') as outfile:
    json.dump(jsonldcontent, outfile, ensure_ascii=False, indent=4)

### Converting ISA-JSON-LD instance to RDF Turtle using RDFLib (>= 6.0.2)

In [None]:
from rdflib import Graph

In [None]:
graph = Graph()
# state the format explicitly rather than relying on file-extension guessing
graph.parse(isa_json_ld_path, format="json-ld")

In [None]:
print(f"The graph has {len(graph)} statements.")

In [None]:
# Write turtle file
rdf_path = os.path.join("./output/BII-S-3-synth/", "BII-S-3-isa-rdf-" + ontology + "-v3.ttl")
with open(rdf_path, 'w', encoding='utf-8') as rdf_file:
    rdf_file.write(graph.serialize(format='turtle'))

### Packaging the ISA archive and its various serializations (ISA-Tab, ISA-JSON, ISA-JSON-LD) as a Research Object Crate 

In [None]:
from rocrate.rocrate import ROCrate
from rocrate.model.person import Person
from rocrate.model.dataset import Dataset
from rocrate.model.softwareapplication import SoftwareApplication
from rocrate.model.computationalworkflow import ComputationalWorkflow
from rocrate.model.computerlanguage import ComputerLanguage
from rocrate import rocrate_api
import uuid
import hashlib

#### Instantiating a Research Object and providing basic metadata

In [None]:
ro_id = uuid.uuid4()
print(ro_id)

In [None]:
a_crate_for_isa = ROCrate()
# a_crate_for_isa.id = "#research_object/" + str(ro_id)
a_crate_for_isa.name = "ISA JSON-LD representation of BII-S-3"
a_crate_for_isa.description = "ISA study serialized as JSON-LD using " + ontology + " ontology mapping"
a_crate_for_isa.keywords = ["ISA", "JSON-LD"]
a_crate_for_isa.license = "https://creativecommons.org/licenses/by/4.0/"
a_crate_for_isa.creator = Person(a_crate_for_isa, "https://www.orcid.org/0000-0001-9853-5668", {"name": "Philippe Rocca-Serra"})

#### Adding the two ISA RDF serializations to the newly created Research Object

In [None]:
# add both RDF serializations written above: JSON-LD and Turtle
files = [isa_json_ld_path, rdf_path]
for file in files:
    a_crate_for_isa.add_file(file)

#### Now adding a dataset to the Research Object, which is meant to describe a bag of associated images.

In [None]:
ds = Dataset(a_crate_for_isa, "raw_images")
ds.format_id="http://edamontology.org/format_3604"
ds.datePublished=datetime.datetime.now()
ds.as_jsonld=isa_json_ld_path
a_crate_for_isa.add(ds)

In [None]:
wf = ComputationalWorkflow(a_crate_for_isa, "metagenomics-sequence-analysis.cwl")
wf.language="http://edamontology.org/format_3857"
wf.datePublished=datetime.datetime.now()

# compute a SHA-256 checksum of the workflow file
with open("metagenomics-sequence-analysis.cwl", "rb") as f:
    content = f.read()  # avoid shadowing the built-in `bytes`
    new_hash = hashlib.sha256(content).hexdigest()

wf.hash = new_hash
a_crate_for_isa.add(wf)
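
Reading the whole file into memory is fine for a small workflow description, but the same checksum can be computed in chunks for larger payloads. The sketch below uses only `hashlib` from the standard library. `sha256_of_file`, the chunk size and the demo file content are illustrative assumptions, not part of the notebook's workflow.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file with toy content.
payload = b"cwlVersion: v1.2\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.cwl")
    with open(demo_path, "wb") as handle:
        handle.write(payload)
    streamed = sha256_of_file(demo_path, chunk_size=4)

print(streamed)
```

The chunked digest is identical to hashing the full content in one call, so the two approaches are interchangeable as far as the recorded value is concerned.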

#### Finally, writing the Research Object Crate

In [None]:
ro_outpath = "./output/BII-S-3-synth/ISA_in_a_ROcrate"
a_crate_for_isa.write_crate(ro_outpath)

#### Peeking into the RO-crate JSON-LD

In [None]:
with open(os.path.join(ro_outpath,"ro-crate-metadata.json"), 'r') as handle:
 parsed = json.load(handle)

print(json.dumps(parsed, indent=4, sort_keys=True))
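
Beyond printing the whole document, it can help to summarize which kinds of entities the crate describes. RO-Crate metadata keeps its entities in a top-level `@graph` list, each with an `@id` and an `@type`. The sketch below counts entities per type. `count_entity_types` is a hypothetical helper, and the metadata shown is a toy example rather than the output of the cells above.

```python
from collections import Counter


def count_entity_types(metadata):
    """Count the entities in an RO-Crate style '@graph' by their '@type'."""
    counts = Counter()
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", "untyped")
        # '@type' may be a single string or a list of strings
        if isinstance(types, str):
            types = [types]
        counts.update(types)
    return counts


# Toy metadata, made up for illustration; a real crate has more entities.
toy_metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "./", "@type": "Dataset"},
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork"},
        {
            "@id": "workflow.cwl",
            "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
        },
    ],
}

type_counts = count_entity_types(toy_metadata)
print(type_counts)
```

Applied to the notebook, the same function could be called on the `parsed` dictionary loaded from `ro-crate-metadata.json`.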


#### Alternatively, a zipped archive can be created as follows:

In [None]:
a_crate_for_isa.write_zip(ro_outpath)
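
To check what ended up in a zipped crate, the archive can be listed with the standard `zipfile` module. This is a minimal sketch: `list_zip_members` is a hypothetical helper, and the demo archive is built on the fly with placeholder members. The exact file name produced by `write_zip` is not assumed here, so pass whatever path the archive was written to.

```python
import os
import tempfile
import zipfile


def list_zip_members(zip_path):
    """Return the sorted member names of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Demonstration on a throwaway archive with placeholder members.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo_crate.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("ro-crate-metadata.json", "{}")
        archive.writestr("data/example.jsonld", "{}")
    members = list_zip_members(demo_zip)

print(members)
```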

### Conclusion:

With this recipe, we have briefly introduced RO-Crate as a mechanism to package data and associated
metadata, using a Python library that offers a minimal implementation of the specification.
The current iteration of the Python library has certain limitations. For instance, it does not provide the
functionality needed to record `Provenance` information. However, this can be added by
extending the code.
The key message behind this recipe is that RO-Crate improves on simply zipping a bunch of files
together by attaching some semantics to the different parts that make up an archive.
It is also important to bear in mind that the Research Object Crate is nascent, and more work is needed to define
best practices and implementation profiles.


## About this notebook

- authors: Philippe Rocca-Serra (philippe.rocca-serra@oerc.ox.ac.uk), Dominique Batista (dominique.batista@oerc.ox.ac.uk)
- license: CC-BY 4.0
- support: isatools@googlegroups.com
- issue tracker: https://github.com/ISA-tools/isa-api/issues
