## How To Pool and Merge Nodes (Material or Data) with ISA-API

- author: https://orcid.org/0000-0001-9853-5668
- email: philippe.rocca-serra@oerc.ox.ac.uk
- license: CC-BY-4.0
- createdOn: 2021-04-27

This example shows how to use the process sequence (`process_sequence`, a list of ISA `Process` objects) to build an ISA graph with node merging (pooling) events.
The notebook shows 2 examples:
- pooling samples, as in the case of using two Source Materials to create one pooled Sample (for example, pooling soil samples)
- pooling data, as in the case of a normalization data transformation event acting on 2 raw data files

The notebook also shows how to serialize (write) the ISA Model content to the ISA-Tab and ISA-JSON formats.
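
To make the idea of a node merging event concrete before turning to ISA-API, here is a minimal, library-independent sketch. The `Step` class and the `is_merge` helper are hypothetical stand-ins and are not part of isatools; they only model the assumption that a pooling step consumes several input nodes and yields a single output node.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical stand-in for a protocol application (not an isatools class)."""
    protocol: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def is_merge(step: Step) -> bool:
    """Treat a step as a merge (pooling) when several inputs yield one output."""
    return len(step.inputs) > 1 and len(step.outputs) == 1


# Material pooling: two sources combined into one sample.
sample_pooling = Step("sample collection", ["source1", "source2"], ["sample1"])
# Data pooling: two raw data files normalized into one derived file.
data_pooling = Step("data transformation", ["file-1", "file-2"], ["analysis-output1.txt"])
# One-to-one step for contrast: one extract yields one raw data file.
acquisition = Step("data collection", ["extract-1"], ["file-1"])

for step in (sample_pooling, data_pooling, acquisition):
    print(step.protocol, "->", "merge" if is_merge(step) else "no merge")
```

The rest of the notebook expresses the same two shapes with real ISA `Process` objects, whose `inputs` and `outputs` lists play the role of the lists used here.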

In [70]:
from isatools.model import *
from isatools.create.model import *
import datetime

### Creating basic ISA objects: Investigation, Study, Protocols

In [71]:
# creating an ISA.Investigation object
investigation = Investigation()

# creating an ISA.Study object
study = Study(filename="s_study.txt")
study.identifier = "S1"
study.title = "ISA Study example: creating sample pools"
study.description = "a jupytern notebook showing how to create pooled samples (a node merging event with material nodes)"

# creating the necessary ISA.Protocol objects
study.protocols = [Protocol(name="sample collection", protocol_type="pooling"),
                   Protocol(name="intracellular fraction extraction",
                            protocol_type=OntologyAnnotation(term="extraction"),
                            parameters=[ProtocolParameter(parameter_name=OntologyAnnotation(term="concentration")),
                                        ProtocolParameter(parameter_name=OntologyAnnotation(term="sample QC"))]),
                   Protocol(name="data collection",
                            protocol_type=OntologyAnnotation(term="data acquisition")),
                   Protocol(name="data transformation",
                            protocol_type=OntologyAnnotation(term="data normalization"))
                   ]

In [72]:
# creating 4 ISA.Source objects
study.sources = [Source(name="source1"),Source(name="source2"),Source(name="source3"),Source(name="source4")]

# creating 2 ISA.Sample objects
study.samples = [Sample(name="sample1"),Sample(name="sample2")]

# creating an ISA.ProtocolApplication pooling Source1 and Source2 into Sample1
study.process_sequence = [Process(executes_protocol=study.protocols[0],
                                  inputs=[study.sources[0], study.sources[1]],
                                  outputs=[study.samples[0]])]

# doing the same again for pooling Source3 and Source4 into Sample2
study.process_sequence.append(Process(executes_protocol=study.protocols[0],
                                      inputs=[study.sources[2], study.sources[3]],
                                      outputs=[study.samples[1]]))

investigation.studies = [study]

### Writing the Study without Assay to ISA-Tab

In [73]:
# let's check how this looks in ISA-Tab
from isatools.isatab import dumps
print(dumps(investigation))

/var/folders/5n/rl6lqnks4rqb59pbtpvvntqw0000gr/T/tmp5xyrzxr6/i_investigation.txt
ONTOLOGY SOURCE REFERENCE
Term Source Name
Term Source File
Term Source Version
Term Source Description
INVESTIGATION
Investigation Identifier	
Investigation Title	
Investigation Description	
Investigation Submission Date	
Investigation Public Release Date	
INVESTIGATION PUBLICATIONS
Investigation PubMed ID
Investigation Publication DOI
Investigation Publication Author List
Investigation Publication Title
Investigation Publication Status
Investigation Publication Status Term Accession Number
Investigation Publication Status Term Source REF
INVESTIGATION CONTACTS
Investigation Person Last Name
Investigation Person First Name
Investigation Person Mid Initials
Investigation Person Email
Investigation Person Phone
Investigation Person Fax
Investigation Person Address
Investigation Person Affiliation
Investigation Person Roles
Investigation Person Roles Term Accession Number
Investigation Person Roles Term Sour
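
In the tabular study file, a pooling event is usually visible as several rows that share the same output node. The sketch below illustrates this with a small hand-written table; it is an assumption about what such a table looks like, not the actual output of the cell above, and it relies only on the Python standard library.

```python
import csv
import io
from collections import defaultdict

# Hand-written illustration of a study table, NOT the notebook's actual output.
study_table = (
    "Source Name\tProtocol REF\tSample Name\n"
    "source1\tsample collection\tsample1\n"
    "source2\tsample collection\tsample1\n"
    "source3\tsample collection\tsample2\n"
    "source4\tsample collection\tsample2\n"
)

# Group the sources by the sample they end up in.
pools = defaultdict(list)
for row in csv.DictReader(io.StringIO(study_table), delimiter="\t"):
    pools[row["Sample Name"]].append(row["Source Name"])

# A sample listed with more than one source is the result of a pooling event.
for sample, sources in pools.items():
    print(f"{sample} pools {sources}")
```

Grouping rows by the output column is a quick way to check that a serialized table really encodes the merge that was built in memory.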

#### Writing the ISA Study object to ISA-JSON

In [74]:
import json
from isatools.isajson import ISAJSONEncoder
print(json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': ')))

{
    "comments": [],
    "description": "",
    "identifier": "",
    "ontologySourceReferences": [],
    "people": [],
    "publicReleaseDate": "",
    "publications": [],
    "studies": [
        {
            "assays": [],
            "characteristicCategories": [],
            "comments": [],
            "description": "a jupytern notebook showing how to create pooled samples (a node merging event with material nodes)",
            "factors": [],
            "filename": "s_study.txt",
            "identifier": "S1",
            "materials": {
                "otherMaterials": [],
                "samples": [
                    {
                        "@id": "#sample/5116543904",
                        "characteristics": [],
                        "factorValues": [],
                        "name": "sample1"
                    },
                    {
                        "@id": "#sample/5116545584",
                        "characteristics": [],
                        "f

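In the JSON serialization, a merge can be spotted by looking for processes that list more than one input. The fragment below is hand-written and heavily simplified, with made-up identifiers; the key names `processSequence`, `inputs` and `outputs` are assumed to match the ISA-JSON study layout, and `merging_processes` is a hypothetical helper rather than an isatools function.

```python
import json

# Hand-written, simplified fragment; the identifiers are made up for illustration.
fragment = json.loads("""
{
  "processSequence": [
    {
      "@id": "#process/pool-1",
      "inputs": [{"@id": "#source/source1"}, {"@id": "#source/source2"}],
      "outputs": [{"@id": "#sample/sample1"}]
    },
    {
      "@id": "#process/extract-1",
      "inputs": [{"@id": "#sample/sample1"}],
      "outputs": [{"@id": "#material/extract-1"}]
    }
  ]
}
""")


def merging_processes(study_json):
    """Return the ids of processes that take more than one input node."""
    return [
        process["@id"]
        for process in study_json.get("processSequence", [])
        if len(process.get("inputs", [])) > 1
    ]


print(merging_processes(fragment))
```

The same function could be pointed at a study dictionary loaded from a real ISA-JSON file, provided the assumed key names hold for that file.
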
### Creating ISA Assays, Data Acquisition and Data Transformation events

Let's now augment the ISA.Study by adding an Assay table in which:
- `raw data` is collected *independently* from each of the samples created in the previous step.
- `derived data` results from a data transformation acting on the raw data (node merging).

In [75]:

# This creates intermediate ISA.Materials (Extracts) from Samples. 
# The extracts will be used as input to the next protocol application
extraction_process1 = Process(executes_protocol=study.protocols[1])
extraction_process1.inputs.append(study.samples[0])

material1 = Material(name="extract-1")
material1.type = "Extract Name"

extraction_process2 = Process(executes_protocol=study.protocols[1])
extraction_process2.inputs.append(study.samples[1])

material2 = Material(name="extract-2")
material2.type = "Extract Name"

extraction_process1.outputs=[material1]
extraction_process2.outputs=[material2]



### Data Acquisition Events

In [76]:
metprof_assay = Assay(measurement_type=OntologyAnnotation(term="metabolite profiling"),
                      technology_type=OntologyAnnotation(term="mass spectrometry"),filename="a_mp_by_ms.txt")

metprof_assay.samples.append(study.samples[0])
metprof_assay.samples.append(study.samples[1])

# metprof_assay.data_files.append(DataFile(filename="sequenced-data-1", label="Raw Data File"))

datafile1=DataFile(filename="file-1",label="Spectral Raw Data File")
datafile2=DataFile(filename="file-2",label="Spectral Raw Data File")
metprof_assay.data_files.append(datafile1)
metprof_assay.data_files.append(datafile2)

metprof_assay.other_material.append(material1)
metprof_assay.other_material.append(material2)
                      
metprof_assay.process_sequence.append(extraction_process1)
metprof_assay.process_sequence.append(extraction_process2)

da_process1 = Process(executes_protocol=study.protocols[2],inputs=[material1], outputs=[datafile1], date_="2021-03-30", performer="Bob Louis")
da_process1.name = "assay-name-test-1"
da_process2 = Process(executes_protocol=study.protocols[2],inputs=[material2], outputs=[datafile2], date_="2021-04-10", performer="Yu Wong")
da_process2.name = "assay-name-test-2"
                      

metprof_assay.process_sequence.append(da_process1)
metprof_assay.process_sequence.append(da_process2)

# IMPORTANT: explicitly set the linking/sequence between processes
# NOTE: one-to-one mapping between protocol applications
plink(extraction_process1, da_process1)
plink(extraction_process2, da_process2)




### Data Transformation Event acting on 2 ISA Data Nodes and resulting in 1 ISA Data Node

In [77]:
datafile3 = DataFile(filename="analysis-output1.txt", label="Derived Spectral Data File")
dt_process1 = Process(executes_protocol=study.protocols[3], inputs=[datafile1,datafile2],outputs=[datafile3], date_="2021-04-25", performer="Data Science Officer")

dt_process1.name = "data transformation 1"

metprof_assay.process_sequence.append(dt_process1)

# IMPORTANT: explicitly set the linking/sequence between processes
# NOTE: many-to-one mapping between protocol applications ~ pooling/merging event
plink(da_process1,dt_process1)
plink(da_process2,dt_process1)


study.assays.append(metprof_assay)
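
The assay graph built so far can be summarized outside ISA-API to check that the many-to-one step sits downstream of both acquisitions. The sketch below is a library-independent illustration using `graphlib` from the standard library (Python 3.9+). The `processes` dictionary is a hand-written summary of the cells above, and the labels "extraction 1" and "extraction 2" are hypothetical, since the notebook leaves those processes unnamed. It derives the ordering from shared input and output nodes, which is a different technique from the explicit `plink` calls used in the notebook.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hand-written summary of the assay's processes: label -> (inputs, outputs).
processes = {
    "extraction 1": (["sample1"], ["extract-1"]),
    "extraction 2": (["sample2"], ["extract-2"]),
    "assay-name-test-1": (["extract-1"], ["file-1"]),
    "assay-name-test-2": (["extract-2"], ["file-2"]),
    "data transformation 1": (["file-1", "file-2"], ["analysis-output1.txt"]),
}

# Map every node to the process that produced it.
producer = {out: name for name, (_, outs) in processes.items() for out in outs}

# A process depends on whichever processes produced its inputs.
dependencies = {
    name: {producer[node] for node in ins if node in producer}
    for name, (ins, _) in processes.items()
}

# Any valid execution order places a process after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# The merging step is the only one with two upstream processes.
print(sorted(dependencies["data transformation 1"]))
```

A process with more than one entry in `dependencies` is a merge point; here that is only the data transformation, which mirrors the two `plink` calls that target `dt_process1`.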

### Writing the full ISA Study, complete with Assay, to ISA-Tab

In [78]:
from isatools.isatab import dumps
print(dumps(investigation))

/var/folders/5n/rl6lqnks4rqb59pbtpvvntqw0000gr/T/tmp3hybi37n/i_investigation.txt
ONTOLOGY SOURCE REFERENCE
Term Source Name
Term Source File
Term Source Version
Term Source Description
INVESTIGATION
Investigation Identifier	
Investigation Title	
Investigation Description	
Investigation Submission Date	
Investigation Public Release Date	
INVESTIGATION PUBLICATIONS
Investigation PubMed ID
Investigation Publication DOI
Investigation Publication Author List
Investigation Publication Title
Investigation Publication Status
Investigation Publication Status Term Accession Number
Investigation Publication Status Term Source REF
INVESTIGATION CONTACTS
Investigation Person Last Name
Investigation Person First Name
Investigation Person Mid Initials
Investigation Person Email
Investigation Person Phone
Investigation Person Fax
Investigation Person Address
Investigation Person Affiliation
Investigation Person Roles
Investigation Person Roles Term Accession Number
Investigation Person Roles Term Sour

#### Writing the same ISA Study to ISA-JSON

In [79]:
import json
from isatools.isajson import ISAJSONEncoder
print(json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': ')))

{
    "comments": [],
    "description": "",
    "identifier": "",
    "ontologySourceReferences": [],
    "people": [],
    "publicReleaseDate": "",
    "publications": [],
    "studies": [
        {
            "assays": [
                {
                    "characteristicCategories": [],
                    "comments": [],
                    "dataFiles": [
                        {
                            "@id": "#data/spectralrawdatafile-5120399872",
                            "comments": [],
                            "name": "file-1",
                            "type": "Spectral Raw Data File"
                        },
                        {
                            "@id": "#data/spectralrawdatafile-5119233712",
                            "comments": [],
                            "name": "file-2",
                            "type": "Spectral Raw Data File"
                        }
                    ],
                    "filename": "a_mp_by_ms.txt",
    