# Reading ISA-Tab from files and validating ISA-Tab files

## Abstract:

The aim of this notebook is to:
 - show the essential function for reading and loading an ISA-Tab file into memory.
 - navigate key objects and pull key attributes.
 - learn how to invoke the ISA-Tab validation function.
 - interpret the output of the validation report.


## 1. Getting the tools

In [1]:
# If executing the notebooks on Google Colab, uncomment the following command
# and run it to install the required Python libraries. Also make sure the test datasets are available.

# !pip install -r requirements.txt

In [2]:
import isatools
import os
import sys
from isatools import isatab

## 2. Reading and loading an ISA Investigation into memory from an ISA-Tab instance

In [3]:
with open(os.path.join('./s_DO_transcriptome', 'i_DO_transcriptome.txt')) as fp:
    ISA = isatab.load(fp)
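
ISA-Tab investigation files conventionally carry the `i_` prefix (`i_DO_transcriptome.txt` above), so the file to pass to `isatab.load` can be located instead of hard-coded. The following is a minimal sketch that uses only the standard library; the helper name `find_investigation_file` is hypothetical and is not part of `isatools`.

```python
import glob
import os


def find_investigation_file(directory):
    """Return the path of the single i_*.txt file found in `directory`.

    Hypothetical helper for illustration; not part of isatools.
    """
    matches = sorted(glob.glob(os.path.join(directory, 'i_*.txt')))
    if len(matches) != 1:
        # Zero or several candidates: refuse to guess which one is meant.
        raise FileNotFoundError(
            f'expected exactly one i_*.txt file in {directory!r}, '
            f'found {len(matches)}')
    return matches[0]
```

The returned path can then be opened in a `with` block and handed to `isatab.load`, exactly as in the cell above.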

In [None]:
with open(os.path.join('./BII-S-3', 'i_gilbert.txt')) as fp:
    ISA = isatab.load(fp)

### Let's check the description of the first study object present in an ISA Investigation object

### Let's check the protocols declared in the ISA Study (using a Python list comprehension):

In [5]:
[protocol.description for protocol in ISA.studies[0].protocols]

['Tissues from aproximately 18 week old mice dissected and frozen',
 "RNA was isolated using the miRNeasy Mini kit (Qiagen) according to manufacturer's protocols",
 "Stranded libraries were prepared using the KAPA mRNA HyperPrep Kit (KAPA Biosystems), according to the manufacturer's instructions",
 'Illumina',
 'GBRS']

### Let's now check which ISA Assay Measurement and Technology Types are used in this ISA Study object

In [5]:
[f'{assay.measurement_type.term} using {assay.technology_type.term}' for assay in ISA.studies[0].assays]

['transcription profiling assay using DNA sequencer']

### Let's now check the `ISA Study Source` Material:

In [6]:
[source.name for source in ISA.studies[0].sources]

['DO021',
 'DO022',
 'DO023',
 'DO024',
 'DO025',
 'DO026',
 'DO027',
 'DO028',
 'DO030',
 'DO031',
 'DO032',
 'DO033',
 'DO034',
 'DO035',
 'DO036',
 'DO037',
 'DO038',
 'DO039',
 'DO040',
 'DO041',
 'DO042',
 'DO043',
 'DO044',
 'DO046',
 'DO047',
 'DO048',
 'DO049',
 'DO050',
 'DO051',
 'DO052',
 'DO053',
 'DO054',
 'DO055',
 'DO056',
 'DO057',
 'DO058',
 'DO059',
 'DO060',
 'DO061',
 'DO062',
 'DO063',
 'DO064',
 'DO065',
 'DO066',
 'DO067',
 'DO068',
 'DO069',
 'DO070',
 'DO071',
 'DO072',
 'DO073',
 'DO074',
 'DO075',
 'DO076',
 'DO078',
 'DO079',
 'DO080',
 'DO081',
 'DO082',
 'DO083',
 'DO084',
 'DO085',
 'DO086',
 'DO087',
 'DO088',
 'DO089',
 'DO090',
 'DO091',
 'DO092',
 'DO093',
 'DO094',
 'DO095',
 'DO096',
 'DO097',
 'DO098',
 'DO099',
 'DO100',
 'DO101',
 'DO102',
 'DO103',
 'DO104',
 'DO105',
 'DO106',
 'DO107',
 'DO108',
 'DO109',
 'DO111',
 'DO112',
 'DO113',
 'DO114',
 'DO115',
 'DO116',
 'DO118',
 'DO119',
 'DO120',
 'DO121',
 'DO122',
 'DO123',
 'DO124',
 'DO125',


#### Let's check what the first `ISA Study Source property` is:

In [7]:
# here, we get all the characteristics of the first Source object
first_source_characteristics = ISA.studies[0].sources[0].characteristics

In [8]:
first_source_characteristics[0].category.term

'organism'

#### Let's now check the `value` associated with that first `ISA Study Source property`:

In [9]:
first_source_characteristics[0].value.term

'Mus musculus'

#### Let's now check all the properties associated with this first `ISA Study Source`:

In [10]:
[char.category.term for char in first_source_characteristics]

['organism', 'strain', 'sex', 'generation number', 'diet', 'tissue']

#### And the corresponding values are:

In [11]:
[char.value for char in first_source_characteristics]

[isatools.model.OntologyAnnotation(term='Mus musculus', term_source=isatools.model.OntologySource(name='NCBITaxon', file='https://bioportal.bioontology.org/ontologies/NCBITAXON', version='2020AB ', description='The NCBI Taxonomy Database', comments=[]), term_accession='http://purl.obolibrary.org/obo/NCBITaxon_10090', comments=[]),
 isatools.model.OntologyAnnotation(term='J:DO', term_source=isatools.model.OntologySource(name='Jax Registry', file='', version='', description='Jax Registry', comments=[]), term_accession='JR009376', comments=[]),
 'F',
 isatools.model.OntologyAnnotation(term='17', term_source=isatools.model.OntologySource(name='SIO', file='https://bioportal.bioontology.org/ontologies/SIO', version='1.51', description='Semanticscience Integrated Ontology', comments=[]), term_accession='http://semanticscience.org/resource/SIO_010061', comments=[]),
 'Envigo Teklad HFHS TD.08811',
 isatools.model.OntologyAnnotation(term='Adipose', term_source=isatools.model.OntologySource(name
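
As the output shows, a characteristic value is either an `OntologyAnnotation` (which has a `term` attribute) or a plain string such as `'F'`. A small sketch for turning both kinds into printable text follows. `value_as_text` is a hypothetical helper, and `SimpleNamespace` objects stand in for the `isatools` model classes so that the snippet runs without loading any dataset.

```python
from types import SimpleNamespace


def value_as_text(value):
    """Return value.term when that attribute exists, otherwise str(value)."""
    return str(getattr(value, 'term', value))


# Stand-ins mimicking two characteristics of the first source
# (illustrative objects, not real isatools instances).
demo_characteristics = [
    SimpleNamespace(category=SimpleNamespace(term='organism'),
                    value=SimpleNamespace(term='Mus musculus')),
    SimpleNamespace(category=SimpleNamespace(term='sex'), value='F'),
]

summary = {c.category.term: value_as_text(c.value) for c in demo_characteristics}
print(summary)  # {'organism': 'Mus musculus', 'sex': 'F'}
```

With a loaded investigation, the same comprehension could presumably be applied to `first_source_characteristics` in place of the stand-in list.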

## 3. Invoking the python ISA-Tab Validator

In [None]:
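# Open the investigation file in a `with` block so the handle is closed after validation
with open(os.path.join('./BII-I-1/', 'i_investigation.txt')) as fp:
    my_json_report_bii_i_1 = isatab.validate(fp)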

In [None]:
with open(os.path.join('./BII-S-3/', 'i_gilbert.txt')) as fp:
    my_json_report_bii_s_3 = isatab.validate(fp)

In [None]:
with open(os.path.join('./BII-S-4/', 'i_investigation.txt')) as fp:
    my_json_report_bii_s_4 = isatab.validate(fp)

In [None]:
with open(os.path.join('./BII-S-7/', 'i_matteo.txt')) as fp:
    my_json_report_bii_s_7 = isatab.validate(fp)

In [None]:
my_json_report_bii_s_7

- This `Validation Report` shows that no error has been logged.
- The rest of the report consists of warnings meant to draw the curator's attention to elements that may be provided but that do not break the ISA syntax.
- Notice the `study group` information reported for both study and assay files. If ISA `Factor Value[]` fields are present in the `ISA Study` or `ISA Assay` tables, the validator tries to identify the set of unique `Factor Value` combinations, each of which defines a `Study Group`.
 - When no `Factor Value` is found in an ISA `Study` or `Assay` table, the value is left at its default, -1, which means that no `Study Group` has been found.
 - ISA **strongly** encourages declaring Study Groups using ISA Factor Values to unambiguously identify the independent variables of an experiment.
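
The notion of a study group as a unique combination of factor values can be illustrated with the standard library alone. This is a sketch of the concept, not the validator's actual implementation: the function name `count_study_groups` and the sample table are made up for the example, and the `-1` return value simply mirrors the default described above.

```python
import csv
import io


def count_study_groups(table_text):
    """Count unique Factor Value[...] combinations in a tab-delimited table.

    Returns -1 when the table has no Factor Value columns.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter='\t')
    factor_columns = [name for name in (reader.fieldnames or [])
                      if name.startswith('Factor Value[')]
    if not factor_columns:
        return -1
    # Each distinct tuple of factor values is treated as one study group.
    combinations = {tuple(row[name] for name in factor_columns) for row in reader}
    return len(combinations)


# Made-up study table: two factors, three samples, two distinct combinations.
demo_table = (
    'Sample Name\tFactor Value[dose]\tFactor Value[time]\n'
    's1\tlow\t1h\n'
    's2\tlow\t1h\n'
    's3\thigh\t1h\n'
)
print(count_study_groups(demo_table))               # 2
print(count_study_groups('Sample Name\ns1\ns2\n'))  # -1
```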
 

## 4. What does a validation failure look like?

### BII-S-5 contains an error located in the `i_investigation.txt` file of the submission

In [None]:
with open(os.path.join('./BII-S-5/', 'i_investigation.txt')) as fp:
    my_json_report_bii_s_5 = isatab.validate(fp)

In [None]:
my_json_report_bii_s_5["errors"]

- The validator reports that the `errors` array is not empty and shows the root cause of the syntactic validation error.
- There is a typo in the Investigation file that affects 2 positions in the file, one for the Investigation object and one for the Study object: Publication **l**ist vs. Publication **L**ist.
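
A validation report can be triaged with a few lines of Python. The sketch below assumes the report is a dictionary with `errors` and `warnings` lists whose entries are dictionaries carrying a `message` key. Only the `errors` key is confirmed by the cell above, so treat the rest of the layout as an assumption and check it against a real report. The function name `summarize_report` and the sample report are invented for illustration.

```python
def summarize_report(report):
    """Return (is_valid, lines) for a validation-report-like dictionary."""
    errors = report.get('errors', [])
    warnings = report.get('warnings', [])
    lines = [f'{len(errors)} error(s), {len(warnings)} warning(s)']
    for entry in errors:
        # Assumed entry layout: a dict with a 'message' key; otherwise use str().
        if isinstance(entry, dict):
            message = entry.get('message', str(entry))
        else:
            message = str(entry)
        lines.append(f'ERROR: {message}')
    return (not errors, lines)


# Invented report for illustration; not actual validator output.
demo_report = {
    'errors': [{'message': 'Example error message'}],
    'warnings': [],
}
is_valid, lines = summarize_report(demo_report)
print(is_valid)
print('\n'.join(lines))
```

Under the same assumptions, calling `summarize_report(my_json_report_bii_s_5)` would give a quick pass/fail answer before reading the full report.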

## About this notebook

- authors: philippe.rocca-serra@oerc.ox.ac.uk, massimiliano.izzo@oerc.ox.ac.uk
- license: CC-BY 4.0
- support: isatools@googlegroups.com
- issue tracker: https://github.com/ISA-tools/isa-api/issues