{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating a basic ISA descriptor: Dataset Maturity Level 1\n", "\n", "
\n", "
\n", "\n", "````{panels_fairplus}\n", ":identifier_text: FCB063\n", ":identifier_link: 'https://w3id.org/faircookbook/FCB063'\n", ":difficulty_level: 2\n", ":recipe_type: hands_on\n", ":reading_time_minutes: 15\n", ":intended_audience: principal_investigator, data_manager, data_scientist\n", ":maturity_level: 1\n", ":maturity_indicator: 1\n", ":has_executable_code: yeah\n", ":recipe_name: Minimal Metadata Maturity with ISA\n", "````\n", "\n", "## Abstract:\n", "\n", "In this recipe, we will show how to programmatically create a minimal metadata for a single study ISA descriptor.\n", "It is minimal for the following reasons:\n", "- no community annotation requirements are met\n", "- very limited used of ontology to refine annotations\n", "- serialization to TAB format\n", "\n", "The recipe then shows how to write (serialize) the ISA objects generated with the\n", "[ISA-API](https://github.com/ISA-tools/isa-api/) to [ISA-Tab format]().\n", "\n", "- support: [isatools@googlegroups.com](mailto:isatools@googlegroups.com)\n", "- issue tracker: https://github.com/ISA-tools/isa-api/issues\n", "\n", "If executing the notebooks on [Google Colab](https://research.google.com/colaboratory/), uncomment the following command\n", "and run it to install the required python libraries. Also, make the test datasets available." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# !pip install -r requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's get ready and import all the necessary components" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from isatools.model import (Investigation,\\\n", "Study, Protocol, Publication, Person, Source, Sample, OntologySource, OntologyAnnotation, \\\n", "Process, Assay, Material, DataFile, Characteristic, plink, batch_create_materials)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Declaring key ISA objects: Investigation, Study, Protocols, Ontologies, Contacts metadata\n", "\n", "Creating the Investigation object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "investigation = Investigation()\n", "investigation.identifier = \"i1\"\n", "investigation.title = \"My Simple ISA Investigation\"\n", "investigation.description = \"We could alternatively use the class constructor's parameters to set some default \" \\\n", " \"values at the time of creation, however we want to demonstrate how to use the \" \\\n", " \"object's instance variables to set values.\"\n", "investigation.submission_date = \"2022-11-03\"\n", "investigation.public_release_date = \"2022-11-03\"" ] }, { "cell_type": "markdown", "source": [ "Creating a Study object and set some values.\n", "The ISA Study object **must** have a filename.\n", "We must also add the current ISA study to the Investigation by adding it to the `investigation` object and its\n", "associated list of studies." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "study = Study(filename=\"s_study.txt\")\n", "study.identifier = \"s1\"\n", "study.title = \"An exemplar ISA Study\"\n", "study.description = \"Like with the Investigation, we could use the class constructor to set some default values, \" \\\n", " \"but have chosen to demonstrate in this example the use of instance variables to set initial \" \\\n", " \"values.\"\n", "study.submission_date = \"2022-11-03\"\n", "study.public_release_date = \"2022-11-03\"\n", "\n", "investigation.studies.append(study)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1 Declaring and using `Ontology Resources`:\n", "\n", "Some instance variables are typed with different objects and lists of objects. For example, a Study can have a\n", "list of design descriptors. A design descriptor is an `ISA Ontology Annotation` object describing the kind of study at hand.\n", "Ontology Annotations should typically reference an Ontology Source. We demonstrate a mix of using the class\n", "constructors and setting values with instance variables. Note that the OntologyAnnotation object\n", "`intervention_design` links its `term_source` directly to the `obi` object instance. To ensure the `OntologySource`\n", "is encapsulated in the descriptor, it is added to a list of `ontology_source_references` in the Investigation\n", "object. The `intervention_design` object is then added to the list of `design_descriptors` held by the Study\n", "object.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ncbitaxon = OntologySource(name='NCBITaxon', description=\"NCBI Taxonomy\")\n", "investigation.ontology_source_references.append(ncbitaxon) # remember to add the newly declared ontology source to the parent investigation\n", "\n", "intervention_design = OntologyAnnotation(term = \"intervention design\")\n", "study.design_descriptors.append(intervention_design)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.2. Declaring Contacts and Publications\n", "\n", "Other attributes to both `Investigation` and `Study` objects include 'contacts' and 'publications',\n", "each filled by lists of corresponding `Person` and `Publication` objects." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "contact = Person(first_name=\"Alice\",\n", " last_name=\"Robertson\",\n", " affiliation=\"University of Life\",\n", " roles=[OntologyAnnotation(term='submitter')])\n", "study.contacts.append(contact)\n", "\n", "publication = Publication(title=\"Experiments with Elephants\",\n", " author_list=\"A. Robertson, B. Robertson\",\n", " pubmed_id=\"12345678\",\n", " status= OntologyAnnotation(term=\"published\"))\n", "study.publications.append(publication)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Building the ISA biomaterial creation graph\n", "\n", "To create the ISA study graph that captures the biological materials used as study subjects, and which corresponds to\n", "the contents of the study table file (the s_*.txt file), we need to create a process sequence.\n", "To do this we use the `Process` class and attach it to the Study object's `process_sequence` list instance variable.\n", "Each process must be linked with a `Protocol` object that is attached to a Study object's 'protocols' list instance\n", "variable. The `sample collection Process` object usually has as input a `Source material` and as output a `Sample material`.\n", "\n", "Here, we create one `Source` material object and attach it to our study." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "source = Source(name='source_material')\n", "study.sources.append(source)" ] }, { "cell_type": "markdown", "source": [ "Then, we create three `Sample` objects, with organism as `Homo Sapiens`, and attach them to the study.\n", "We use the utility function `batch_create_material()` to clone a prototype material object.\n", "The function automatically appends an index to the material name.\n", "In this case, three samples will be created, with the names 'sample_material-0', 'sample_material-1' and 'sample_material-2'." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "prototype_sample = Sample(name='sample_material', derives_from=[source])\n", "\n", "\n", "\n", "characteristic_organism = Characteristic(category=OntologyAnnotation(term=\"Organism\"),\n", " value=OntologyAnnotation(term=\"Homo Sapiens\",\n", " term_source=ncbitaxon,\n", " term_accession=\"http://purl.bioontology.org/ontology/NCBITAXON/9606\"))\n", "prototype_sample.characteristics.append(characteristic_organism)\n", "\n", "study.samples = batch_create_materials(prototype_sample, n=3) # creates a batch of 3 samples" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Now, we create a single `Protocol` object that represents our sample collection protocol, and attach it to the\n", "study object.\n", "Protocols must be declared **before** we describe Processes, as a processing event of some sort\n", "must execute some defined protocol.\n", "In the case of the class model, Protocols should therefore be declared before Processes in order for the Process\n", "to be linked to one." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "sample_collection_protocol = Protocol(name=\"sample collection\",\n", " protocol_type=OntologyAnnotation(term=\"sample collection\"))\n", "study.protocols.append(sample_collection_protocol)\n", "sample_collection_process = Process(executes_protocol=sample_collection_protocol)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Next, we link our materials to the Process.\n", "In this particular case, we are describing a sample collection process that takes one source material, and produces\n", "three different samples.\n", "\n", "(source_material)->(sample collection)->[(sample_material-0), (sample_material-1), (sample_material-2)]" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "for src in study.sources:\n", " sample_collection_process.inputs.append(src)\n", "\n", "for sam in study.samples:\n", " sample_collection_process.outputs.append(sam)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "Finally, attach the finished Process object to the study process_sequence.\n", "This can be done many times to describe multiple sample collection events." ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "study.process_sequence.append(sample_collection_process)\n" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "#IMPORTANT: remember to list all Characteristics used in the study object: do as follows:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "study.characteristic_categories.append(characteristic_organism.category)\n" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Creating an ISA Assay with all associated objects\n", "\n", "Next, we build n Assay object and attach two protocols, extraction and sequencing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assay = Assay(filename=\"a_assay.txt\")\n", "\n", "extraction_protocol = Protocol(name='extraction', protocol_type=OntologyAnnotation(term=\"material extraction\"))\n", "study.protocols.append(extraction_protocol)\n", "\n", "sequencing_protocol = Protocol(name='sequencing', protocol_type=OntologyAnnotation(term=\"nucleic acid sequencing\"))\n", "study.protocols.append(sequencing_protocol)" ] }, { "cell_type": "markdown", "source": [ "To build out assay graphs, we enumerate the samples from the study-level, and for each sample, we create an\n", "*extraction process* and a *sequencing process*.\n", "- The extraction process takes as input a sample material, and produces\n", "an extract material.\n", "- The sequencing process takes the extract material and produces a data file. This will\n", "produce three graphs, from sample material through to data, as follows:\n", "\n", ">- (sample_material-0)->(extraction)->(extract-0)->(sequencing)->(sequenced-data-0)\n", ">- (sample_material-1)->(extraction)->(extract-1)->(sequencing)->(sequenced-data-1)\n", ">- (sample_material-2)->(extraction)->(extract-2)->(sequencing)->(sequenced-data-2)\n", "\n", "```{Note}\n", "The extraction processes and sequencing processes are distinctly separate instances, where the three\n", "graphs are NOT interconnected.\n", "```" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "source": [ "for i, sample in enumerate(study.samples):\n", "\n", " # create an extraction process that executes the extraction protocol\n", "\n", " extraction_process = Process(executes_protocol=extraction_protocol)\n", "\n", " # extraction process takes as input a sample, and produces an extract material as output\n", "\n", " extraction_process.inputs.append(sample)\n", " material = Material(name=\"extract-{}\".format(i))\n", " material.type = \"Extract Name\"\n", " extraction_process.outputs.append(material)\n", "\n", " # create a sequencing process that executes the sequencing protocol\n", "\n", " sequencing_process = Process(executes_protocol=sequencing_protocol)\n", " sequencing_process.name = \"assay-name-{}\".format(i)\n", " sequencing_process.inputs.append(extraction_process.outputs[0])\n", "\n", " # Sequencing process usually has an output data file\n", "\n", " datafile = DataFile(filename=\"sequenced-data-{}\".format(i),\n", " label=\"Raw Data File\")\n", " sequencing_process.outputs.append(datafile)\n", "\n", " # Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set\n", " # these links for you. It is found in the isatools.model package\n", "\n", " plink(extraction_process, sequencing_process)\n", "\n", " # make sure the extract, data file, and the processes are attached to the assay\n", "\n", " assay.data_files.append(datafile)\n", " assay.samples.append(sample)\n", " assay.other_material.append(material)\n", " assay.process_sequence.append(extraction_process)\n", " assay.process_sequence.append(sequencing_process)\n", " assay.measurement_type = OntologyAnnotation(term=\"gene sequencing\")\n", " assay.technology_type = OntologyAnnotation(term=\"nucleotide sequencing\")\n" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we attach the ISA assay object to the ISA study object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "study.assays.append(assay)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Writing our objects as an ISA-Tab document\n", "\n", "To do this, we simply use the `isatab.dumps()` function, as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from isatools.isatab import dumps\n", "print(dumps(investigation))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "et VoilĂ !\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Authors\n", "````{authors_fairplus}\n", "Philippe: Writing - Original Draft\n", "````\n", "\n", "## License\n", "````{license_fairplus}\n", "CC-BY-4.0\n", "````\n" ] } ], "metadata": { "kernelspec": { "display_name": "isa-api-py39", "language": "python", "name": "isa-api-py39" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 1 }