{ "cells": [ { "cell_type": "markdown", "id": "bdf247b3", "metadata": {}, "source": [ "## From ISA JSON to RDF Linked Data with ISALDSerializer function:" ] }, { "cell_type": "markdown", "id": "dfdbdc70", "metadata": {}, "source": [ "### Getting ISAtools and importing the latest module for conversion to JSON-LD from ISA-JSON" ] }, { "cell_type": "markdown", "id": "c6871290", "metadata": {}, "source": [ "### Abstract:\n", "\n", "The goal of this tutorial is to show how to go from an ISA document to an equivalent RDF representation using python tools but also to highlight some of the limitations of existing libraries and point to alternative options to complete a meainingful conversion to RDF Turtle format.\n", "\n", "This notebook mainly highlights the new functionality coming with ISA-API rc10.3 latest release which allows to convert ISA-JSON to ISA-JSON-LD, with the choice of 3 popular ontological frameworks for semantic anchoring. These are:\n", "- [obofoundry](http://www.obofoundry.org), a set of interoperable ontologies for the biological domain.\n", "- [schema.org](https://schema.org), the search engine orientated ontology developed by companies such as Yandex,Bing,Google \n", "- [wikidata](https://wikidata.org), a set of semantic concepts backing wikipedia and wikidata resources.\n", "\n", "These frameworks have been chosen for interoperability.\n", "\n", "\n", "This notebook has a companion notebook which goes over the exploration of the resulting RDF representations using a set of SPARQL queries.\n", "Check it out [here](http://localhost:8888/notebooks/isa-cookbook/content/notebooks/isa-jsonld%20exploration%20with%20SPARQL.ipynb)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1866022b", "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "from json import load\n", "import datetime\n", "import isatools\n", "\n", "from isatools.convert.json2jsonld import ISALDSerializer" ] }, { "cell_type": "markdown", "id": "f2f467d0", 
"metadata": {}, "source": [ "### 1. Loading an ISA-JSON document in memory with `json.load()` function" ] }, { "cell_type": "markdown", "id": "417c45fb", "metadata": {}, "source": [ "Prior to invoking the ISALDserializer function, we need to do 3 things.\n", "* First, pass a url or a path to the ISA-JSON instance to convert to JSON-LD\n", "* Second, select the ontology framework used for the semantic conversion. One may choose from the following 3 options:\n", " - obofoundry ontologies, abbreviated as `obo`\n", " - schema.org ontology, abbreviated as `sdo`\n", " - wikidata ontology, abbreviated as `wdt`\n", "* Third, choose if to rely on embedding the @context file in the output or relying on url to individual contexts. By default, the converter will embedded the 'all in one' context information. The reason for this is the lack of support for JSON-LD 1.1 specifications in many of the python libraries supported RDF parsing (e.g. rdflib)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ffbe4c54", "metadata": {}, "outputs": [], "source": [ "instance_path = os.path.join(\"./output/BII-S-3-synth/\", \"isa-new_ids.json\")\n", "\n", "with open(instance_path, 'r') as instance_file:\n", " instance = load(instance_file)\n", " instance_file.close()" ] }, { "cell_type": "markdown", "id": "17796cdc", "metadata": {}, "source": [ "### 2. 
Transforming ISA-JSON to ISA JSON-LD with the `ISALDSerializer` function" ] }, { "cell_type": "code", "execution_count": null, "id": "b699f208", "metadata": {}, "outputs": [], "source": [ "# we now invoke the ISALDSerializer function\n", "\n", "ontology = \"wdt\"\n", "\n", "serializer = ISALDSerializer(instance)\n", "serializer.set_ontology(ontology)\n", "serializer.set_instance(instance)\n", "\n", "jsonldcontent = serializer.output" ] }, { "cell_type": "markdown", "id": "907178a8", "metadata": {}, "source": [ "Now that the conversion is performed, we can write the resulting ISA-JSON-LD to file:" ] }, { "cell_type": "markdown", "id": "7fd2e196", "metadata": {}, "source": [ "### 3. Writing ISA JSON-LD to file" ] }, { "cell_type": "code", "execution_count": null, "id": "0382b1b3", "metadata": {}, "outputs": [], "source": [ "isa_json_ld_path = os.path.join(\"./output/BII-S-3-synth/\", \"BII-S-3-isa-rdf-\" + ontology + \"-v3.jsonld\")\n", "\n", "with open(isa_json_ld_path, 'w') as outfile:\n", " json.dump(jsonldcontent, outfile, ensure_ascii=False, indent=4)" ] }, { "cell_type": "markdown", "id": "d68913f7", "metadata": {}, "source": [ "### Converting the ISA-JSON-LD instance to RDF Turtle using RDFLib (>= 6.0.2)" ] }, { "cell_type": "code", "execution_count": null, "id": "90f1c325", "metadata": {}, "outputs": [], "source": [ "from rdflib import Graph" ] }, { "cell_type": "code", "execution_count": null, "id": "7ae475be", "metadata": {}, "outputs": [], "source": [ "graph = Graph()\n", "# State the input format explicitly rather than relying on the file extension\n", "graph.parse(isa_json_ld_path, format=\"json-ld\")" ] }, { "cell_type": "code", "execution_count": null, "id": "27b79db2", "metadata": {}, "outputs": [], "source": [ "print(f\"The graph has {len(graph)} statements.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "f619b690", "metadata": {}, "outputs": [], "source": [ "# Write turtle file\n", "rdf_path = os.path.join(\"./output/BII-S-3-synth/\", \"BII-S-3-isa-rdf-\" + ontology + \"-v3.ttl\")\n", "with open(rdf_path, 'w') as rdf_file:\n", " 
rdf_file.write(graph.serialize(format='turtle'))" ] }, { "cell_type": "markdown", "id": "0e9bf011", "metadata": {}, "source": [ "### Packaging the ISA archive and its various serializations (ISA-Tab, ISA-JSON, ISA-JSON-LD) as a Research Object Crate " ] }, { "cell_type": "code", "execution_count": null, "id": "773a98c7", "metadata": {}, "outputs": [], "source": [ "from rocrate.rocrate import ROCrate\n", "from rocrate.model.person import Person\n", "from rocrate.model.dataset import Dataset\n", "from rocrate.model.softwareapplication import SoftwareApplication\n", "from rocrate.model.computationalworkflow import ComputationalWorkflow\n", "from rocrate.model.computerlanguage import ComputerLanguage\n", "from rocrate import rocrate_api\n", "import uuid\n", "import hashlib" ] }, { "cell_type": "markdown", "id": "0a50acef", "metadata": {}, "source": [ "#### Instantiating a Research Object and providing basic metadata" ] }, { "cell_type": "code", "execution_count": null, "id": "dbcd356b", "metadata": {}, "outputs": [], "source": [ "ro_id = uuid.uuid4()\n", "print(ro_id)" ] }, { "cell_type": "code", "execution_count": null, "id": "f1ef3974", "metadata": {}, "outputs": [], "source": [ "a_crate_for_isa = ROCrate()\n", "# a_crate_for_isa.id = \"#research_object/\" + str(ro_id)\n", "a_crate_for_isa.name = \"ISA JSON-LD representation of BII-S-3\"\n", "a_crate_for_isa.description = \"ISA study serialized as JSON-LD using \" + ontology + \" ontology mapping\"\n", "a_crate_for_isa.keywords = [\"ISA\", \"JSON-LD\"]\n", "a_crate_for_isa.license = \"https://creativecommons.org/licenses/by/4.0/\"\n", "a_crate_for_isa.creator = Person(a_crate_for_isa, \"https://www.orcid.org/0000-0001-9853-5668\", {\"name\": \"Philippe Rocca-Serra\"})" ] }, { "cell_type": "markdown", "id": "e3a27a39", "metadata": {}, "source": [ "#### Adding the two ISA RDF serializations to the newly created Research Object" ] }, { "cell_type": "code", "execution_count": null, "id": "8d455a99", "metadata": {}, 
"outputs": [], "source": [ "files = [isa_json_ld_path]\n", "[a_crate_for_isa.add_file(file) for file in files]" ] }, { "cell_type": "markdown", "id": "2bb09cbb", "metadata": {}, "source": [ "#### Now adding a dataset to the Research Object, which is meant to describe a bag of associated images." ] }, { "cell_type": "code", "execution_count": null, "id": "561c452c", "metadata": {}, "outputs": [], "source": [ "ds = Dataset(a_crate_for_isa, \"raw_images\")\n", "ds.format_id=\"http://edamontology.org/format_3604\"\n", "ds.datePublished=datetime.datetime.now()\n", "ds.as_jsonld=isa_json_ld_path\n", "a_crate_for_isa.add(ds)" ] }, { "cell_type": "code", "execution_count": null, "id": "80ac9d57", "metadata": {}, "outputs": [], "source": [ "wf = ComputationalWorkflow(a_crate_for_isa, \"metagenomics-sequence-analysis.cwl\")\n", "wf.language=\"http://edamontology.org/format_3857\"\n", "wf.datePublished=datetime.datetime.now()\n", "\n", "with open(\"metagenomics-sequence-analysis.cwl\",\"rb\") as f:\n", " bytes = f.read() \n", " new_hash = hashlib.sha256(bytes).hexdigest()\n", " \n", "wf.hash=new_hash\n", "a_crate_for_isa.add(wf)" ] }, { "cell_type": "markdown", "id": "d43c6dd0", "metadata": {}, "source": [ "#### Finally, writing the Research Object Crate" ] }, { "cell_type": "code", "execution_count": null, "id": "6262f675", "metadata": {}, "outputs": [], "source": [ "ro_outpath = \"./output/BII-S-3-synth/ISA_in_a_ROcrate\"\n", "a_crate_for_isa.write_crate(ro_outpath)" ] }, { "cell_type": "markdown", "id": "93dfc86b", "metadata": {}, "source": [ "#### Peaking into the RO-crate JSON-LD" ] }, { "cell_type": "code", "execution_count": null, "id": "e36d28cb", "metadata": {}, "outputs": [], "source": [ "with open(os.path.join(ro_outpath,\"ro-crate-metadata.json\"), 'r') as handle:\n", " parsed = json.load(handle)\n", "\n", "print(json.dumps(parsed, indent=4, sort_keys=True))\n" ] }, { "cell_type": "markdown", "id": "86c3c677", "metadata": {}, "source": [ "#### Alternately, a 
zipped archive can be created as follows:" ] }, { "cell_type": "code", "execution_count": null, "id": "7841165b", "metadata": {}, "outputs": [], "source": [ "a_crate_for_isa.write_zip(ro_outpath)" ] }, { "cell_type": "markdown", "id": "95468e0e", "metadata": {}, "source": [ "### Conclusion:\n", "\n", "With this recipe, we have briefly introduced the notion of RO-Crate as a mechanism to package data and associated\n", "metadata, using a Python library that offers a minimal implementation of the specifications.\n", "The current iteration of the Python library presents certain limitations. For instance, it does not provide the\n", "necessary functionality to allow recording of `Provenance` information. However, this can be accomplished by\n", "extending the code.\n", "The key message behind this recipe is that RO-Crate improves on simply zipping a bunch of files\n", "together by providing some semantics about the different parts making up an archive.\n", "Also, it is important to bear in mind that the Research Object Crate is nascent and more work is needed to define\n", "best practices and implementation profiles.\n" ] }, { "cell_type": "markdown", "id": "c1752636", "metadata": {}, "source": [ "## About this notebook\n", "\n", "- authors: Philippe Rocca-Serra (philippe.rocca-serra@oerc.ox.ac.uk), Dominique Batista (dominique.batista@oerc.ox.ac.uk)\n", "- license: CC-BY 4.0\n", "- support: isatools@googlegroups.com\n", "- issue tracker: https://github.com/ISA-tools/isa-api/issues\n" ] } ], "metadata": { "kernelspec": { "display_name": "isa-api-py39", "language": "python", "name": "isa-api-py39" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 5 }