{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Known Issue: unused ISA Source aren't serialized to ISA-Tab\n", "\n", "## Abtract:\n", "\n", "This notebook documents a behavior of the ISA-Tab writer which results in declared but unused ISA Source objects not to be serialized in the ISA-Tab file.\n", "The ISA objects are serialized fine if using the ISA-JSON write.\n", "The future releases of the ISA-API will see to address the issue.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's get the tools" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# If executing the notebooks on `Google Colab`,uncomment the following command \n", "# and run it to install the required python libraries. Also, make the test datasets available.\n", "\n", "# !pip install -r requirements.txt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json\n", "import datetime\n", "from isatools.model import (\n", " Investigation,\n", " Study,\n", " Assay,\n", " Person,\n", " Material,\n", " DataFile,\n", " OntologySource,\n", " OntologyAnnotation,\n", " Sample,\n", " Source,\n", " Characteristic,\n", " Protocol,\n", " Process,\n", " plink\n", ")\n", "from isatools import isatab\n", "from isatools.isajson import ISAJSONEncoder\n", "\n", "final_dir = os.path.abspath(os.path.join('notebook-output', 'issue-brapi'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating an ISA Study and boilerplate information" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "investigation = Investigation()\n", "investigation.identifier = \"BRAPI-test-unused-source\"\n", "investigation.title = \"BRAPI-test-unused-source\"\n", "investigation.description = \"this is test to understand the conditions under which ISA-API will serialize or not serialize a Source entity declared but not used in a workflow. Note: while the python ISA-API does not serialize in the Tab format, the information is available from ISA-JSON.\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "prs_test_study = Study(filename=\"s_prs_test.txt\")\n", "\n", "prs_test_study.identifier = \"PRS\"\n", "prs_test_study.title = \"Unused Sources\"\n", "prs_test_study.description = \"testing if the python ISA-API supports unusued Sources in ISA-Tab serialization\"\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "T\n" ] } ], "source": [ "prs = Person(last_name=\"Rocca-Serra\", first_name=\"Philippe\", mid_initials=\"T\", affiliation=\"OeRC\", email=\"prs@hotmail.com\" )\n", "prs_test_study.contacts.append(prs)\n", "print(prs.mid_initials)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "ncbi_taxon = OntologySource(name='NCBITaxon', description=\"NCBI Taxonomy\")\n", "human_characteristic= Characteristic(category=OntologyAnnotation(term=\"Organism\"),\n", " value=OntologyAnnotation(term=\"Homo Sapiens\", term_source=ncbi_taxon,\n", " term_accession=\"http://purl.bioontology.org/ontology/NCBITAXON/9606\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating ISA Sources" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "subject_0 = Source(name='human individual-0', characteristics=[human_characteristic]) \n", "subject_1 = Source(name='human individual-1', characteristics=[human_characteristic]) \n", "subject_2 = Source(name='human individual-2', characteristics=[human_characteristic]) \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating ISA Samples" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "sample_0 = Sample(name='SBJ0_sample1')\n", "# note that 2 samples are generated from subject_1\n", "sample_1 = Sample(name='SBJ1_sample1')\n", "sample_2 = Sample(name='SBJ1_sample2')\n", "# note that no sample is generated from subject_\n", "sample_3 = Sample(name='SBJ2')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Associating Sources and Samples to the ISA.Study. object" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "prs_test_study.sources.append(subject_0)\n", "prs_test_study.sources.append(subject_1)\n", "prs_test_study.sources.append(subject_2)\n", "\n", "prs_test_study.samples.append(sample_0)\n", "prs_test_study.samples.append(sample_1)\n", "prs_test_study.samples.append(sample_2)\n", "#prs_test_study.samples.append(subject_2)\n", "prs_test_study.samples.append(sample_3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Declaring Protocol objects" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Don't forget the protocol_type should be declared as an ISA Ontology Annotation" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "prs_protocol = Protocol(name=\"sample collection\",\n", " protocol_type=OntologyAnnotation(term=\"sample collection\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding the newly created protocol to the list of protocols associated with an ISA Study Object" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "prs_test_study.protocols.append(prs_protocol)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the ProtocolApplication events connecting Parent to Children Materials" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### executing a protocol minimally or maximally by specifying date and performer of execution" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "prs_process0 = Process(executes_protocol=prs_protocol)\n", "now = str(datetime.datetime.now().strftime(\"%Y-%m-%d\"))\n", "prs_process1 = Process(executes_protocol=prs_protocol, performer=prs.first_name, date_=now)\n", "prs_process2 = Process(executes_protocol=prs_protocol, performer=prs.first_name, date_=now)\n", "prs_process3 = Process(executes_protocol=prs_protocol, performer=prs.first_name, date_=now)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setting input and outputs of each sample collection protocol application" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "prs_process0.inputs.append(subject_0)\n", "prs_process0.outputs.append(sample_0)\n", "prs_process1.inputs.append(subject_1)\n", "prs_process1.outputs.append(sample_1)\n", "prs_process2.inputs.append(subject_1)\n", "prs_process2.outputs.append(sample_2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now the following tests if setting a protocol application with no output jinxes the ISA-API" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "prs_process3.inputs.append(subject_2)\n", "prs_process3.outputs.append(sample_3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Here, the ISA Study object is updated by associating all the processes/protocol_applications to the process_sequence attribute.\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "prs_test_study.process_sequence.append(prs_process0)\n", "prs_test_study.process_sequence.append(prs_process1)\n", "prs_test_study.process_sequence.append(prs_process2)\n", "prs_test_study.process_sequence.append(prs_process3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating an ISA Assay object - This is to test associating an ISA Source as input to an Assay" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step1 - Create the ISA Assay Object" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "assay_on_source = Assay(measurement_type=OntologyAnnotation(term=\"phenotyping\"),\n", " technology_type=OntologyAnnotation(term=\"\"),\n", " filename=\"a_assay-test.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step2 - Create a new ISA Protocol the type of which is `data acquisition`" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "assay_protocol = Protocol(name=\"assay-on-source\",\n", " protocol_type=\"data acquisition\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step3 - Remember to add the new protocol to the ISA.Study object" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "prs_test_study.protocols.append(assay_protocol)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step4 - Create the Protocol Application event which generates an ISA DataFile" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "assay_process = Process(executes_protocol=assay_protocol, performer=prs.first_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step5 - Create an ISA DataFile object" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "dummy_file= DataFile(filename=\"dummy.txt\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step6 - Set ProtocolApplication/Process inputs and outputs testing if an ISA.Source can be used in an ISA.Assay" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "assay_process.inputs.append(sample_0)\n", "assay_process.outputs.append(dummy_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step7 - Associate the newly created ISA.DataFile with the ISA.Assay object" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "assay_on_source.data_files.append(dummy_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step8 - Link and Connect ISA.ProtocolApplications via the process_sequence attribute" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "assay_on_source.process_sequence.append(prs_process3)\n", "assay_on_source.process_sequence.append(assay_process)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step9 - Update the ISA.Assay Object by listing all ISA.Materials used in the Assay associated ProtocolApplications" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "assay_on_source.samples.append(sample_3)\n", "#assay_on_source.other_material.append(subject_2)\n", "plink(prs_process3, assay_process)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "prs_test_study.process_sequence.append(assay_process)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step10 - Update the ISA.Study Object by adding the ISA Assay object to the ISA.Study assays attribute" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "prs_test_study.assays.append(assay_on_source)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "investigation.studies.append(prs_test_study)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2021-07-21 17:44:03,166 [INFO]: isatab.py(_all_end_to_end_paths:1131) >> [0, 1, 2]\n", "2021-07-21 17:44:03,166 [WARNING]: isatab.py(write_study_table_files:1194) >> [7, 3, 0, 8, 4, 1, 9, 5, 10, 6, 2, 11]\n", "2021-07-21 17:44:03,167 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[1, 8, 4], [1, 9, 5], [2, 10, 6]]\n", "2021-07-21 17:44:03,187 [INFO]: isatab.py(_all_end_to_end_paths:1131) >> [6, 3]\n", "2021-07-21 17:44:03,189 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[3, 11]]\n", "2021-07-21 17:44:03,189 [INFO]: isatab.py(_longest_path_and_attrs:1091) >> [[3, 11]]\n" ] } ], "source": [ "dataframes = isatab.dump_tables_to_dataframes(investigation)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | Source Name | \n", "Characteristics[Organism] | \n", "Term Source REF | \n", "Term Accession Number | \n", "Protocol REF | \n", "Date | \n", "Performer | \n", "Sample Name | \n", "
|---|---|---|---|---|---|---|---|---|
| 0 | \n", "human individual-1 | \n", "Homo Sapiens | \n", "NCBITaxon | \n", "http://purl.bioontology.org/ontology/NCBITAXON... | \n", "sample collection | \n", "2021-07-21 | \n", "Philippe | \n", "SBJ1_sample1 | \n", "
| 1 | \n", "human individual-1 | \n", "Homo Sapiens | \n", "NCBITaxon | \n", "http://purl.bioontology.org/ontology/NCBITAXON... | \n", "sample collection | \n", "2021-07-21 | \n", "Philippe | \n", "SBJ1_sample2 | \n", "
| 2 | \n", "human individual-2 | \n", "Homo Sapiens | \n", "NCBITaxon | \n", "http://purl.bioontology.org/ontology/NCBITAXON... | \n", "sample collection | \n", "2021-07-21 | \n", "Philippe | \n", "SBJ2 | \n", "