{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Abstract:\n",
    "\n",
    "The aim of this notebook is to show to create an ISA document for depositing Stable Isotope Resolved Metabolomics Study metadata using the ISA API.\n",
    "\n",
    "This notebook highlights key steps of the deposition, including:\n",
    "- declaration of study variables and treatment groups\n",
    "- declaration of SIRM specific protocols, assays and annotation requirements for a given data modality.\n",
    "- ISA roundtrip (write, reading, writing).\n",
    "- Serialization to TAB and JSON\n",
    "- Validation\n",
    "   \n",
    " Stable Isotope Resolved Metabolomics Studies are a type of studies using MS and NMR acquisition techniques to decypher biochemical reactions using `tracer molecule`, i.e. molecules for which certain positions carry an isotope (e.g. 13C, 15N). Specific data acquisition and data processing techniques are required and dedicated software is used to make sense of the data. Software such as `IsoSolve` [1], `Ramid`[2](for primary processing of 13C mass isotopomer data obtained with GCMS) or `midcor`[3] (for natural abundance correction processes on13C mass isotopomers spectra), may be used to accomplish those tasks. The output of such tools are tables which may comply with a new specifications devised to better support the reporting of SIRM study results.\n",
    " \n",
    " \n",
    " - [1]. IsoSolve https://doi.org/10.1021/acs.analchem.1c01064\n",
    " - [2]. https://github.com/seliv55/ramid\n",
    " - [3]. https://github.com/seliv55/midcor\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading the ISA-API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from hashlib import md5, sha1, sha256, blake2b\n",
    "\n",
    "from isatools.model import (\n",
    "    Comment,\n",
    "    Investigation,\n",
    "    Study,\n",
    "    StudyFactor,\n",
    "    FactorValue,\n",
    "    OntologyAnnotation,\n",
    "    Characteristic,\n",
    "    OntologySource,\n",
    "    Material,\n",
    "    Sample,\n",
    "    Source,\n",
    "    Protocol,\n",
    "    ProtocolParameter,\n",
    "    ParameterValue,\n",
    "    Process,\n",
    "    Publication,\n",
    "    Person,\n",
    "    Assay,\n",
    "    DataFile,\n",
    "    plink\n",
    ")\n",
    "\n",
    "\n",
    "HASH_FUNCTIONS = {\n",
    "    \"md5\": md5,\n",
    "    \"sha1\": sha1,\n",
    "    \"sha256\": sha256,\n",
    "    \"blake2\": blake2b,\n",
    "}\n",
    "\n",
    "\n",
    "def compute_hash(file_path, file, hash_func):\n",
    "    \"\"\"a subfunction generating the hash using hashlib functions\n",
    "\n",
    "    :param file_path:\n",
    "    :param file:\n",
    "    :param hash_func:\n",
    "    :return:\n",
    "    \"\"\"\n",
    "\n",
    "    with open(os.path.join(file_path, file), \"rb\") as f:\n",
    "        for byte_block in iter(lambda: f.read(4096), b\"\"):\n",
    "            hash_func.update(byte_block)\n",
    "    return hash_func.hexdigest()\n",
    "\n",
    "\n",
    "def update_checksum(file_path, isa_file_object: DataFile, checksum_type):\n",
    "    \"\"\" a helper function to compute file checksum given a file path, an isa data file name and a type of algorithm\n",
    "\n",
    "    :param file_path:\n",
    "    :param isa_file_object:\n",
    "    :param checksum_type: enum\n",
    "    :return: isa_file_object:\n",
    "    :raises ValueError: when the checksum is invalid\n",
    "    \"\"\"\n",
    "    if checksum_type in HASH_FUNCTIONS.keys():\n",
    "        hash_type = HASH_FUNCTIONS[checksum_type]()\n",
    "        file_checksum = compute_hash(file_path, isa_file_object.filename, hash_type)\n",
    "        isa_file_object.comments.append(Comment(name=\"checksum type\", value=checksum_type))\n",
    "    else:\n",
    "        raise ValueError(\"Invalid checksum type\")\n",
    "    isa_file_object.comments.append(Comment(name=\"checksum\", value=file_checksum))\n",
    "\n",
    "    return isa_file_object\n",
    "\n",
    "\n",
    "def create_directories() -> None:\n",
    "    \"\"\" Creates all the directories required by the notebook \"\"\"\n",
    "    here_path: str = os.getcwd()\n",
    "    bh2023_output_path: str = os.path.join(here_path, \"output\", \"ISA-BH2023-ALL\")\n",
    "\n",
    "    directories: dict[str, list[str]] = {\n",
    "        'TAB': ['BH23-ISATAB_FROM_TAB'],\n",
    "        'JSON': ['BH23-ISATAB', 'BH23-ISATAB_FROM_JSON'],\n",
    "        'DERIVED_FILES': [],\n",
    "        'RAW_FILES': []\n",
    "    }\n",
    "\n",
    "    for directory, subdirectories in directories.items():\n",
    "        directory_path: str = os.path.join(bh2023_output_path, directory)\n",
    "        if not os.path.exists(directory_path):\n",
    "            os.makedirs(directory_path)\n",
    "        for subdirectory in subdirectories:\n",
    "            sub_directory_path: str = os.path.join(directory_path, subdirectory)\n",
    "            if not os.path.exists(sub_directory_path):\n",
    "                os.makedirs(sub_directory_path)\n",
    "\n",
    "\n",
    "create_directories()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Programmatic reporting of a 13C Stable Isotope Resolved Metabolomics (SIRM) study"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "#### Declaring the Ontologies and Vocabularies used in the ISA Study"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "investigation = Investigation()\n",
    "\n",
    "chebi = OntologySource(\n",
    "    name=\"CHEBI\",\n",
    "    description=\"Chemical Entity of Biological Interest\",\n",
    "    version=\"1.0\",\n",
    "    file=\"https://www.example.org/CHEBI\"\n",
    ")\n",
    "efo = OntologySource(name=\"EFO\", description=\"Experimental Factor Ontology\")\n",
    "msio = OntologySource(name=\"MSIO\", description=\"Metabolomics Standards Initiative Ontology\")\n",
    "obi = OntologySource(name='OBI', description=\"Ontology for Biomedical Investigations\")\n",
    "pato = OntologySource(name='PATO', description=\"Phenotype and Trait Ontology\")\n",
    "uo = OntologySource(name=\"UO\", description=\"Unit Ontology\")\n",
    "ncbitaxon = OntologySource(name=\"NCIBTaxon\", description=\"NCBI Taxonomy\")\n",
    "ncbitaxon.comments.append(Comment(name=\"onto-test\", value=\"onto-value\"))\n",
    "\n",
    "investigation.ontology_source_references = [chebi, efo, obi, pato, ncbitaxon, msio, uo]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Declaring Units to be used at study or assay levels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mag_field_unit = OntologyAnnotation(term=\"Tesla\", term_source=uo, term_accession=\"https://purl.org/\")\n",
    "mass_unit = OntologyAnnotation(term=\"mg\", term_source=uo, term_accession=\"https://purl.org/\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Basic Study description: declaring Study Factor and and Study Design type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "study = Study(filename=\"s_BH2023-study.txt\")\n",
    "study.identifier= \"BH2023\"\n",
    "study.title = \"[U-13C6]-D-glucose labeling experiment in MCF7 cancer cell line\"\n",
    "study.description = \"Probing cancer pathways of MCF7 cell line using 13C stable isotope resolved metabolomics study using isotopologue distribution analysis with mass spectrometry and isotopomer analysis by 1D 1H NMR.\"\n",
    "study.submission_date = \"2021-08-15\"\n",
    "study.public_release_date = \"2021-08-15\"\n",
    "\n",
    "# These EMBL-EBI Metabolights (MTBLS) related ISA Comments fields may be used for deposition to EMBL-EBI\n",
    "SRA_comments = [\n",
    "    {\"name\": \"EMBL Broker Name\", \"value\": \"OXFORD\"},\n",
    "    {\"name\": \"EMBL Center Name\", \"value\": \"OXFORD\"},\n",
    "    {\"name\": \"EMBL Center Project Name\", \"value\": \"OXFORD\"},\n",
    "    {\"name\": \"EMBL Lab Name\", \"value\": \"Oxford e-Research Centre\"},\n",
    "    {\"name\": \"EMBL Submission Action\", \"value\": \"ADD\"}\n",
    "]\n",
    "\n",
    "Funders_comments = [\n",
    "    {\"name\": \"Study Funding Agency\", \"value\": \"\"},\n",
    "    {\"name\": \"Study Grant Number\", \"value\": \"\"}    \n",
    "]\n",
    "for cmt in SRA_comments:\n",
    "    sra_comment = Comment(name=cmt[\"name\"], value=cmt[\"value\"])\n",
    "    study.comments.append(sra_comment)\n",
    "    \n",
    "for cmt in Funders_comments:\n",
    "    funder_cmt = Comment(name=cmt[\"name\"], value=cmt[\"value\"])\n",
    "    study.comments.append(funder_cmt)\n",
    "\n",
    "# Adding a Study Design descriptor to the ISA Study object\n",
    "intervention_design = OntologyAnnotation(term_source=obi)\n",
    "intervention_design.term = \"intervention design\"\n",
    "intervention_design.term_accession = \"http://purl.obolibrary.org/obo/OBI_0000115\"\n",
    "\n",
    "study_design = OntologyAnnotation(term_source=msio)\n",
    "study_design.term = \"stable isotope resolved metabolomics study\"\n",
    "study_design.term_accession = \"http://purl.obolibrary.org/obo/MSIO_0000096\"\n",
    "\n",
    "study.design_descriptors.append(intervention_design)\n",
    "study.design_descriptors.append(study_design)\n",
    "\n",
    "\n",
    "# Declaring the Study Factors\n",
    "agent_ft_annot = OntologyAnnotation(term=\"chemical substance\",\n",
    "                                    term_accession=\"http://purl.obolibrary.org/obo/CHEBI_59999\",\n",
    "                                    term_source=chebi)\n",
    "intensity_ft_annot = OntologyAnnotation(term=\"dose\",\n",
    "                                        term_accession=\"http://www.ebi.ac.uk/efo/EFO_0000428\",\n",
    "                                        term_source=efo)\n",
    "duration_ft_annot = OntologyAnnotation(term=\"time\",\n",
    "                                       term_accession=\"http://purl.obolibrary.org/obo/PATO_0000165\",\n",
    "                                       term_source=pato)\n",
    "study.factors = [\n",
    "    StudyFactor(name=\"compound\",factor_type=agent_ft_annot),\n",
    "    StudyFactor(name=\"dose\",factor_type=intensity_ft_annot),\n",
    "    StudyFactor(name=\"duration\",factor_type=duration_ft_annot)\n",
    "]\n",
    "\n",
    "# Associating the levels to each of the Study Factor.\n",
    "\n",
    "agent_fvalue_annot = OntologyAnnotation(term=\"dioxygen\", term_source=obi, term_accession=\"https://purl.org/\")\n",
    "intensity_fvalue_annot1 = OntologyAnnotation(term=\"high\", term_source=obi, term_accession=\"https://purl.org/\")\n",
    "intensity_fvalue_annot2 =OntologyAnnotation(term=\"normal\", term_source=obi, term_accession=\"https://purl.org/\")\n",
    "duration_fvalue_annot =OntologyAnnotation(term=\"hour\", term_source=obi, term_accession=\"https://purl.org/\")\n",
    "\n",
    "fv1 = FactorValue(factor_name=study.factors[0], value=agent_fvalue_annot)\n",
    "fv2 = FactorValue(factor_name=study.factors[1], value=intensity_fvalue_annot1)\n",
    "fv3 = FactorValue(factor_name=study.factors[1], value=intensity_fvalue_annot2)\n",
    "fv4 = FactorValue(factor_name=study.factors[2], value=duration_fvalue_annot)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Adding the publications associated to the study"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "status_annot_value = OntologyAnnotation(term=\"indexed in PubMed\", term_source=obi, term_accession=\"https://purl.org/\")\n",
    "\n",
    "study.publications = [\n",
    "    Publication(doi=\"10.1371/journal.pone.0000000\",pubmed_id=\"36007233\",\n",
    "                title=\"Decyphering new cancer pathways with stable isotope resolved metabolomics in MCF7 cell lines\",\n",
    "                status=status_annot_value,\n",
    "                author_list=\"Min,W. and Everest H\"),\n",
    "   \n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Adding the authors of the study"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "\n",
    "study.contacts = [\n",
    "    Person(first_name=\"Weng\", last_name=\"Min\", affiliation=\"Beijing Institute of Metabolism\", email=\"weng.min@bim.edu.cn\",\n",
    "           address=\"Prospect Street, Beijing, People's Republic of China\",\n",
    "           comments=[Comment(name=\"Study Person REF\", value=\"\")],\n",
    "            roles=[OntologyAnnotation(term=\"principal investigator role\"),\n",
    "                   OntologyAnnotation(term=\"SRA Inform On Status\"),\n",
    "                   OntologyAnnotation(term=\"SRA Inform On Error\")]\n",
    "    ),\n",
    "    Person(first_name=\"Hillary\", last_name=\"Everest\", affiliation=\"Centre for Cell Metabolism\",\n",
    "           address=\"CCM, Edinborough, United Kingdom\",\n",
    "           comments=[Comment(name=\"Study Person REF\", value=\"\")],\n",
    "           roles=[OntologyAnnotation(term=\"principal investigator role\")]\n",
    "    )\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Declaring all the protocols used in the ISA study. Note also the declaration of Protocol Parameters when needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "study.protocols = [ \n",
    "    #Protocol #0\n",
    "    Protocol(name=\"cell culture and isotopic labeling\",\n",
    "             description=\"SOP for growing MCF7 cells and incubating them with the tracer molecule\",\n",
    "             protocol_type=OntologyAnnotation(term=\"sample collection\"),\n",
    "             parameters=[\n",
    "              ProtocolParameter(parameter_name=OntologyAnnotation(term=\"tracer molecule\"))\n",
    "             ]\n",
    "            ),\n",
    "    #Protocol #1\n",
    "    Protocol(\n",
    "        name=\"intracellular metabolite extraction\",\n",
    "        description=\"SOP for extracting metabolites from harvested cells\",\n",
    "        protocol_type=OntologyAnnotation(term=\"extraction\")\n",
    "    ),\n",
    "    #Protocol #2\n",
    "    Protocol(\n",
    "        name=\"extracellular metabolite extraction\",\n",
    "        description=\"SOP for extracting metabolites from cell culture supernatant\",\n",
    "        protocol_type=OntologyAnnotation(term=\"extraction\")\n",
    "    ),\n",
    "    #Protocol #3\n",
    "    Protocol(\n",
    "        name=\"liquid chromatography mass spectrometry\",\n",
    "        description=\"SOP for LC-MS data acquisition\",\n",
    "        protocol_type=OntologyAnnotation(term=\"mass spectrometry\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"chromatography column\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"mass spectrometry instrument\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"mass analyzer\"))\n",
    "        ]\n",
    "    ),\n",
    "    #Protocol #4\n",
    "    Protocol(\n",
    "        name=\"1D 13C NMR spectroscopy for isotopomer analysis\",\n",
    "        description=\"SOP for 1D 13C NMR data acquisition for isotopomer analysis\",\n",
    "        protocol_type=OntologyAnnotation(term=\"NMR spectroscopy\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"magnetic field strength\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"nmr tube\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"pulse sequence\"))\n",
    "        ]\n",
    "    ),\n",
    "    #Protocol #5\n",
    "    Protocol(\n",
    "        name=\"1D 13C NMR spectroscopy for metabolite profiling\",\n",
    "        description=\"SOP for 1D 13C NMR data acquisition for metabolite profiling\",\n",
    "        protocol_type=OntologyAnnotation(term=\"NMR spectroscopy\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"magnetic field strength\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"nmr tube\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"pulse sequence\"))\n",
    "        ]\n",
    "    ),\n",
    "    #Protocol #6\n",
    "    Protocol(\n",
    "        name=\"MS metabolite identification\",\n",
    "        description=\"SOP for MS signal processing and metabolite and isotopologue identification\",\n",
    "        protocol_type=OntologyAnnotation(term=\"metabolite identification\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"ms software\"))\n",
    "        ]\n",
    "    ),\n",
    "    #Protocol #7\n",
    "    Protocol(\n",
    "        name=\"NMR metabolite identification\",\n",
    "        description=\"SOP for NMR signal processing and metabolite and isotopomer identification\",\n",
    "        uri=\"https://doi.org/10.1021/acs.analchem.1c01064\",\n",
    "        protocol_type=OntologyAnnotation(term=\"data transformation\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"nmr software\"))\n",
    "        ]\n",
    "    ),\n",
    "    \n",
    "    #Protocol #8\n",
    "    Protocol(\n",
    "        name=\"mRNA extraction\",\n",
    "        description=\"procedure for isolating messenger RNA for transcriptomics analysis\",\n",
    "        uri=\"\",\n",
    "        protocol_type=OntologyAnnotation(term=\"material separation\")\n",
    "    ),\n",
    "    \n",
    "    #Protocol #9\n",
    "    Protocol(\n",
    "        name=\"gDNA extraction\",\n",
    "        description=\"procedure for isolating genomic DNA for copy number variation analysis\",\n",
    "        uri=\"\",\n",
    "        protocol_type=OntologyAnnotation(term=\"material separation\")\n",
    "    ),\n",
    "    \n",
    "    #Protocol #10\n",
    "    Protocol(\n",
    "        name=\"gDNA library preparation\",\n",
    "        description=\"procedure for isolating genoic DNA for copy number variation analysis\",\n",
    "        uri=\"\",\n",
    "        protocol_type=OntologyAnnotation(term=\"library construction\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library strategy\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library selection\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library source\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library orientation\"))          \n",
    "        ]\n",
    "    ),\n",
    "    \n",
    "    #Protocol #11\n",
    "    Protocol(\n",
    "        name=\"mRNA library preparation\",\n",
    "        description=\"procedure for isolating genoic DNA for gene expression analysis\",\n",
    "        uri=\"\",\n",
    "        protocol_type=OntologyAnnotation(term=\"library construction\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library strategy\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library selection\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library source\")),\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library orientation\"))          \n",
    "        ]\n",
    "    ),\n",
    "    \n",
    "    #Protocol #12\n",
    "    Protocol(\n",
    "        name=\"nucleic acid sequencing\",\n",
    "        description=\"SOP for nucleic acid sequencing\",\n",
    "        uri=\"\",\n",
    "        protocol_type=OntologyAnnotation(term=\"nucleic acid sequencing\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"sequencing instrument\"))\n",
    "        ]\n",
    "    ),\n",
    "    \n",
    "    #Protocol #13\n",
    "    Protocol(\n",
    "        name=\"transcription analysis\",\n",
    "        description=\"SOP for transcriptomics analysis\",\n",
    "        uri=\"\",\n",
    "        protocol_type=OntologyAnnotation(term=\"data transformation\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"sequence analysis software\"))\n",
    "        ]\n",
    "    ),\n",
    "    \n",
    "    #Protocol #14\n",
    "    Protocol(\n",
    "        name=\"CNV analysis\",\n",
    "        description=\"SOP for CNV \",\n",
    "        uri=\"\",\n",
    "        protocol_type=OntologyAnnotation(term=\"data transformation\"),\n",
    "        parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"variant calling software\"))\n",
    "        ]\n",
    "    )\n",
    "    \n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now, creating ISA Source and Sample objects and building the ISA study table\n",
    "\n",
    "\n",
    "In this fictional study, we assume the following underlying experimental setup:\n",
    "\n",
    "- Human MCF-7 breast cancer cell line will be grown in 2 distinct conditions, namely \"normal concentration of dioxygen for 72 hours\" and \"high concentration of dioxygen for 72 hours\".\n",
    "- Each cell culture batch will be grown in the presence of 80% [1-13C1]-D-glucose + 20% [U-13C6]-D-glucose tracer molecules\n",
    "- For each cell culture, 4 samples will be collected for characterisation.\n",
    "- 3 assay modalities will be used on each of the collected samples, namely:\n",
    "    - isotopologue distribution analysis using LC-MS\n",
    "    - isotopomer analysis using 1D 13C NMR spectrometry with 3 distinct pulse sequences (HSQC, ZQF-TOCSY, HCNA, and HACO-DIPSY)\n",
    "    - metabolite profiling using 1D 13C NMR spectrometry with one pulse sequence (CPMG)\n",
    "- Data analysis for each data modality will be performed with dedicated data analysis protocols\n",
    "- Integrative analysis (cutting accross results coming from each assay modality) will be performed using a dedicated workflow."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Let's start by building the ISA Source and ISA Samples reflecting the experimental plan:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Creating the ISA Source Materials\n",
    "study.sources = [Source(name=\"culture-1\"), Source(name=\"culture-2\")]\n",
    "\n",
    "src_characteristic_biosamplexref = Characteristic(category=OntologyAnnotation(term=\"namespace:biosample:src\"),\n",
    "                                     value=OntologyAnnotation(term=\"SRC:\" ,\n",
    "                                                              term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "characteristic_organism = Characteristic(category=OntologyAnnotation(term=\"Organism\"),\n",
    "                                     value=OntologyAnnotation(term=\"Homo sapiens\",\n",
    "                                                              term_source=ncbitaxon,\n",
    "                                                              term_accession=\"http://purl.obolibrary.org/obo/NCBITaxon_9606\"))\n",
    "\n",
    "characteristic_cell = Characteristic(category=OntologyAnnotation(term=\"cell line\"),\n",
    "                                     value=OntologyAnnotation(term=\"MCF-7\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "\n",
    "\n",
    "study.characteristic_categories.append(src_characteristic_biosamplexref.category)\n",
    "study.characteristic_categories.append(characteristic_organism.category)\n",
    "study.characteristic_categories.append(characteristic_cell.category)\n",
    "\n",
    "\n",
    "for i in range(len(study.sources)):  \n",
    "    study.sources[i].characteristics.append(src_characteristic_biosamplexref)\n",
    "    study.sources[i].characteristics.append(characteristic_organism)\n",
    "    study.sources[i].characteristics.append(characteristic_cell)\n",
    "    \n",
    "\n",
    "# Note how the treatment groups are defined as sets of factor values attached to the ISA.Sample object\n",
    "treatment_1 = [fv1,fv2,fv4]\n",
    "treatment_2 = [fv1,fv3,fv4]\n",
    "\n",
    "\n",
    "# Ensuring the Tracer Molecule(s) used for the SIRM study is properly reported\n",
    "tracer_mol_C = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"tracer molecule\",\n",
    "                                                                                           term_source=\"\",\n",
    "                                                                                           term_accession=\"\")),\n",
    "                            value=OntologyAnnotation(term=\"80% [1-13C1]-D-glucose + 20% [U-13C6]-D-glucose\",\n",
    "                                                     term_source=chebi,\n",
    "                                                     term_accession=\"https://purl.org/chebi_1212\"))\n",
    "\n",
    "\n",
    "tracers = [tracer_mol_C]\n",
    "\n",
    "# the number of samples collected from each culture condition\n",
    "replicates = 4\n",
    "# Now creating a Process showing a `Protocol Application` using Source as input and producing Sample as output.\n",
    "\n",
    "for k in range(replicates):\n",
    "    \n",
    "    smp_characteristics_biosamplexref = Characteristic(category=OntologyAnnotation(term=\"namespace:biosample:smp\"),\n",
    "                                                       value=OntologyAnnotation(term=(\"SAME:\" + str(k)), term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "    \n",
    "    study.characteristic_categories.append(smp_characteristics_biosamplexref.category)\n",
    "    \n",
    "    study.samples.append(Sample(name=(study.sources[0].name + \"-sample-\" + str(k)),\n",
    "                                characteristics=[smp_characteristics_biosamplexref],\n",
    "                                factor_values=treatment_1))\n",
    "    \n",
    "    study.samples.append(Sample(name=(study.sources[1].name + \"-sample-\" + str(k)), \n",
    "                                characteristics=[smp_characteristics_biosamplexref],\n",
    "                                factor_values=treatment_2))\n",
    "\n",
    "sample_collection_mbx = Process(name=\"sample-collection-process-mbx\",\n",
    "        executes_protocol=study.protocols[0], # a sample collection\n",
    "        inputs=[study.sources[0]],\n",
    "        outputs=[study.samples[0],study.samples[2],study.samples[4],study.samples[6]],\n",
    "        parameter_values= [tracer_mol_C])\n",
    "\n",
    "sample_collection_gtx = Process(name=\"sample-collection-process-gtx\",\n",
    "        executes_protocol=study.protocols[0], # a sample collection\n",
    "        inputs=[study.sources[1]],\n",
    "        outputs=[study.samples[1],study.samples[3],study.samples[5],study.samples[7]],\n",
    "        parameter_values= [tracer_mol_C])\n",
    "\n",
    "study.process_sequence.append(sample_collection_mbx)\n",
    "\n",
    "study.process_sequence.append(sample_collection_gtx)\n",
    "\n",
    "study.units = []\n",
    "                          \n",
    "# Now appending the ISA Study object to the ISA Investigation object    \n",
    "investigation.studies = [study]\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now, creating the ISA objects needed to represent assays and raw data acquisition."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Let's start by declaring the 5 modalities as 5 ISA Assays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#Starting by declaring the 2 types of assays used in BII-S-3 as coded with ISAcreator tool\n",
    "# assay = Assay(filename=\"a_\"+ study.identifier + \"-isotopologue-ms-assay.txt\")\n",
    "# assay.measurement_type = OntologyAnnotation(term=\"isotopologue distribution analysis\",term_accession=\"http://purl.obolibrary.org/obo/msio.owl#mass_isotopologue_distribution_analysis\", term_source=msio)\n",
    "# assay.technology_type = OntologyAnnotation(term=\"mass spectrometry\", term_accession=\"http://purl.obolibrary.org/obo/CHMO_0000470\", term_source=msio)\n",
    "# assay.comments.append(Comment(name=\"target repository\", value=\"metabolights\"))\n",
    "\n",
    "# assay_nmr_topo = Assay(filename=\"a_\"+ study.identifier + \"-isotopomer-nmr-assay.txt\")\n",
    "# assay_nmr_topo.measurement_type = OntologyAnnotation(term=\"isotopomer analysis\",term_accession=\"http://purl.obolibrary.org/obo/msio.owl#isotopomer_analysis\", term_source=msio)\n",
    "# assay_nmr_topo.technology_type = OntologyAnnotation(term=\"NMR spectroscopy\",term_accession=\"http://purl.obolibrary.org/obo/CHMO_0000591\", term_source=msio)\n",
    "# assay_nmr_topo.comments.append(Comment(name=\"target repository\", value=\"metabolights\"))\n",
    "#\n",
    "assay_nmr_metpro = Assay(filename=\"a_\"+ study.identifier + \"-metabolite-profiling-nmr-assay.txt\")\n",
    "assay_nmr_metpro.measurement_type = OntologyAnnotation(term=\"metabolite profiling\",term_accession=\"http://purl.obolibrary.org/obo/MSIO_0000101\", term_source=msio)\n",
    "assay_nmr_metpro.technology_type = OntologyAnnotation(term=\"NMR spectroscopy\",term_accession=\"http://purl.obolibrary.org/obo/CHMO_0000591\", term_source=msio)\n",
    "assay_nmr_metpro.comments.append(Comment(name=\"target repository\", value=\"metabolights\"))\n",
    "\n",
    "assay_cnv_seq = Assay(filename=\"a_\"+ study.identifier + \"-cnv_seq-assay.txt\")\n",
    "assay_cnv_seq.measurement_type = OntologyAnnotation(term=\"copy number variation profiling\",term_accession=\"https://purl.org\", term_source=msio)\n",
    "assay_cnv_seq.technology_type = OntologyAnnotation(term=\"nucleotide sequencing\",term_accession=\"https://purl.org\", term_source=msio)\n",
    "assay_cnv_seq.comments.append(Comment(name=\"target repository\", value=\"ega\"))\n",
    "\n",
    "assay_rna_seq = Assay(filename=\"a_\"+ study.identifier + \"-rna-seq-assay.txt\")\n",
    "assay_rna_seq.measurement_type = OntologyAnnotation(term=\"transcription profiling\", term_accession=\"https://purl.org\", term_source=msio)\n",
    "assay_rna_seq.technology_type = OntologyAnnotation(term=\"nucleotide sequencing\", term_accession=\"https://purl.org\", term_source=msio)\n",
    "assay_rna_seq.comments.append(Comment(name=\"target repository\", value=\"arrayexpress\"))\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Warning**\n",
    "\n",
    "- The current release of ISA-API throws an error if Assay `technology type` OntologyAnnotation.term is left empty\n",
    "- The coming release 10.13 will address this issue.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The mass isotopologue distribution analysis assay using MS acquisitions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "main_path = \"./output/ISA-BH2023-ALL/\"\n",
    "data_path = \"./output/\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE**\n",
    "make sure to used `ISA API plink function` to connects the protocols in a chain."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conventional Metabolite Profiling  using dedicated 1D 13C NMR acquisitions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "nmr_sw = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"nmr software\")),\n",
    "                                value=OntologyAnnotation(term=\"Batman\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "nmr_derivedDF = DataFile(filename=\"metpro-analysis.txt\", label=\"Derived Spectral Data File\")\n",
    "f=open(os.path.join(main_path, \"DERIVED_FILES/\",\"metpro-analysis.txt\"),\"w+\")\n",
    "f.write(\"metpro-analysis.txt\")\n",
    "f.close\n",
    "\n",
    "nmr_da_process = Process(\n",
    "    name = \"NMR-metpro-DT-ident\",\n",
    "    executes_protocol=study.protocols[7],\n",
    "    parameter_values=[nmr_sw],\n",
    "    outputs=[nmr_derivedDF]\n",
    ")\n",
    "\n",
    "assay_nmr_metpro.data_files.append(nmr_derivedDF)\n",
    "\n",
    "for i, sample in enumerate(study.samples):\n",
    "    \n",
    "#    extraction process takes as input a sample, and produces an extract material as output\n",
    "\n",
    "    material_nmr_metpro = Material(name=\"extract-nmr-metpro-{}\".format(i),\n",
    "                                   type_=\"Extract Name\")\n",
    "    \n",
    "    extraction_process_nmr_metpro = Process(\n",
    "        name=\"extract-process-{}\".format(i),\n",
    "        executes_protocol=study.protocols[1], \n",
    "        inputs=[sample],\n",
    "        outputs=[material_nmr_metpro]\n",
    "    )\n",
    "    \n",
    "    \n",
    "    # create a nmr acquisition process that executes the nmr protocol\n",
    "    magnet = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"magnetic field strength\")),\n",
    "                            value=6.5,\n",
    "                            unit=mag_field_unit)\n",
    "    tube = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"nmr tube\")),\n",
    "                            value=OntologyAnnotation(term=\"Brucker 14 mm Oscar\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "    pulse_a = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"pulse sequence\")),\n",
    "                            value=OntologyAnnotation(term=\"CPMG\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "#     pulses=[pulse_a]\n",
    "\n",
    "#     for j in range(len(pulses)):\n",
    "\n",
    "    metpro_process = Process(executes_protocol=study.protocols[5],parameter_values=[magnet,tube,pulse_a])\n",
    "    metpro_process.name = \"assay-name-nmr-metpro-\"+ pulse_a.value.term +\"-{}\".format(i+1)\n",
    "    metpro_process.inputs.append(extraction_process_nmr_metpro.outputs[0])\n",
    "\n",
    "        # a Data acquisition process usually has an output data file\n",
    "\n",
    "    datafile_nmr_metpro = DataFile(filename=\"nmr-data-metpro-\"+pulse_a.value.term +\"-{}.nmrml\".format(i+1), label=\"Free Induction Decay Data File\")\n",
    "    f=open(os.path.join(main_path,\"RAW_FILES/\",\"nmr-data-metpro-\"+ pulse_a.value.term +\"-{}.nmrml\".format(i+1)),\"w+\")\n",
    "    f.write(\"nmr-data-metpro-\"+ pulse_a.value.term +\"-{}.nmrml\".format(i+1))\n",
    "    f.close\n",
    "\n",
    "    metpro_process.outputs.append(datafile_nmr_metpro)\n",
    "    nmr_da_process.inputs.append(datafile_nmr_metpro)\n",
    "\n",
    "        # Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set\n",
    "        # these links for you. It is found in the isatools.model package\n",
    "\n",
    "    assay_nmr_metpro.samples.append(sample)\n",
    "    assay_nmr_metpro.other_material.append(material_nmr_metpro)\n",
    "    assay_nmr_metpro.data_files.append(datafile_nmr_metpro)\n",
    "\n",
    "    assay_nmr_metpro.process_sequence.append(extraction_process_nmr_metpro)\n",
    "    assay_nmr_metpro.process_sequence.append(metpro_process)\n",
    "    assay_nmr_metpro.process_sequence.append(nmr_da_process)        \n",
    "\n",
    "    # plink(sample_collection_mbx, extraction_process_nmr_metpro)\n",
    "    # plink(extraction_process_nmr_metpro, metpro_process)\n",
    "    # # plink(metpro_process, nmr_da_process)\n",
    "# make sure the extract, data file, and the processes are attached to the assay\n",
    "\n",
    "\n",
    "    \n",
    "assay_nmr_metpro.units.append(mag_field_unit)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Transcriptomics profiling using sequencing assay:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#TODO: this is static: take it out of the for loop\n",
    "char_ext_rna_seq = Characteristic(category=OntologyAnnotation(term=\"Stuff Type\"),\n",
    "                                  value=OntologyAnnotation(term=\"mRNA\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "#TODO: this is static: take it out of the for loop\n",
    "rna_strat = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library strategy\")),\n",
    "                        value=OntologyAnnotation(term=\"RNA-SEQ\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "rna_sel = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library selection\")),\n",
    "                        value=OntologyAnnotation(term=\"OTHER\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "rna_src = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library source\")),\n",
    "                        value=OntologyAnnotation(term=\"TRANSCRIPTOMICS\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "rna_ori = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library orientation\")),\n",
    "                        value=OntologyAnnotation(term=\"SINGLE\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "\n",
    "rna_label = Characteristic(category=OntologyAnnotation(term=\"Label\"), value=OntologyAnnotation(term=\"AAAAAAAAAA\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "seq_instrument = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"sequencing instrument\")),\n",
    "                                    value=OntologyAnnotation(term=\"Illumina MiSeq\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "rna_sw = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"sequence analysis software\")),\n",
    "                            value=OntologyAnnotation(term=\"DESeq2\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "for i, sample in enumerate(study.samples):\n",
    "\n",
    "    # extraction process takes as input a sample, and produces an extract material as output\n",
    "    \n",
    "    material_rna_seq = Material(name=\"extract-rna-seq-{}\".format(i))\n",
    "    material_rna_seq.type = \"Extract Name\"\n",
    "    material_rna_seq.characteristics.append(char_ext_rna_seq)\n",
    "    # print(char_ext_rna_seq.to_dict())\n",
    "    \n",
    "    # create an extraction process that executes the extraction protocol\n",
    "\n",
    "    extraction_process_rna_seq = Process(\n",
    "        name=\"extract-process-rna-seq-{}\".format(i),\n",
    "        executes_protocol=study.protocols[8],\n",
    "        inputs=[sample],\n",
    "        outputs=[material_rna_seq]\n",
    "    )\n",
    "    \n",
    "\n",
    "    # create a library contruction process that executes the gDNA library construction protocol\n",
    "\n",
    "\n",
    "    rna_library = Material(name=\"rna-library-name-{}\".format(i))\n",
    "    rna_library.type = \"Labeled Extract Name\"\n",
    "    rna_library.characteristics.append(rna_label)\n",
    "    \n",
    "    rna_lib_process = Process(\n",
    "        name = \"rna-library-name-{}\".format(i),\n",
    "        executes_protocol=study.protocols[11],\n",
    "        parameter_values=[rna_strat,rna_sel, rna_src, rna_ori],\n",
    "        inputs=[extraction_process_rna_seq.outputs[0]],\n",
    "        outputs=[rna_library]\n",
    "    )\n",
    "\n",
    "    # rna seq acquisition process usually has an output fastq data file\n",
    "    rna_datafile = DataFile(filename=\"rna-seq-data-{}.fastq\".format(i), label=\"Raw Data File\")\n",
    "    f=open(os.path.join(main_path, \"rna-seq-data-{}.fastq\".format(i)),\"w+\")\n",
    "    f.write(\"rna-seq-data-{}.fastq\".format(i))\n",
    "    # f.close\n",
    "\n",
    "\n",
    "    updated_rna_datafile = update_checksum(main_path, rna_datafile, \"md5\")\n",
    "\n",
    "    rna_data_comment = Comment(name=\"export\",value=\"yes\")\n",
    "    # rna_data_comment1 = Comment(name=\"checksum\", value=md5)\n",
    "    # rna_data_comment2 = Comment(name=\"checksum type\", value=\"MD5\")\n",
    "\n",
    "    updated_rna_datafile.comments.append(rna_data_comment)\n",
    "    # rna_datafile.comments.append(rna_data_comment1)\n",
    "    # rna_datafile.comments.append(rna_data_comment2)\n",
    "    #\n",
    "    \n",
    "    # create a sequencing process that executes the sequencing protoco\n",
    "    rna_seq_process = Process(\n",
    "        name = \"assay-name-rna-seq-{}\".format(i),\n",
    "        executes_protocol=study.protocols[12],\n",
    "        parameter_values=[seq_instrument],\n",
    "        inputs=[rna_lib_process.outputs[0]],\n",
    "        outputs=[updated_rna_datafile]\n",
    "    )\n",
    "\n",
    "    # Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set\n",
    "    # these links for you. It is found in the isatools.model package\n",
    "\n",
    "    assay_rna_seq.samples.append(sample)\n",
    "    assay_rna_seq.other_material.append(material_rna_seq)\n",
    "    assay_rna_seq.other_material.append(rna_library)\n",
    "    assay_rna_seq.data_files.append(updated_rna_datafile)\n",
    "\n",
    "    \n",
    "    rnaseq_drvdf = DataFile(filename=\"rna-seq-DEA.txt\", label=\"Derived Data File\")\n",
    "    dvf=open(os.path.join(main_path,\"rna-seq-DEA.txt\"),\"w+\")\n",
    "    dvf.write(\"rna-seq-DEA.txt\")\n",
    "    dvf.close\n",
    "    \n",
    "\n",
    "    rna_drvdata_comment  = Comment(name=\"export\", value=\"yes\")\n",
    "    updated_rnaseq_drvdf = update_checksum(main_path, rnaseq_drvdf, \"md5\")\n",
    "    updated_rnaseq_drvdf.comments.append(rna_drvdata_comment)\n",
    "\n",
    "    \n",
    "    rna_da_process = Process(\n",
    "        name = \"RNASEQ-DT\",\n",
    "        executes_protocol=study.protocols[13],\n",
    "        parameter_values=[rna_sw],\n",
    "        inputs=[rna_datafile],\n",
    "        outputs=[updated_rnaseq_drvdf]\n",
    "    )\n",
    "    \n",
    "    assay_rna_seq.process_sequence.append(extraction_process_rna_seq)\n",
    "    assay_rna_seq.process_sequence.append(rna_lib_process)\n",
    "    assay_rna_seq.process_sequence.append(rna_seq_process)\n",
    "    assay_rna_seq.process_sequence.append(rna_da_process)\n",
    "    \n",
    "    # plink(sample_collection_gtx, extraction_process_rna_seq)\n",
    "    # plink(extraction_process_rna_seq, rna_lib_process)\n",
    "    # plink(rna_lib_process, rna_seq_process)\n",
    "  # plink(rna_seq_process, rna_da_process)\n",
    "\n",
    "    \n",
    "assay_rna_seq.characteristic_categories.append(char_ext_rna_seq.category)\n"
   ]
  },
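  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loops in this notebook write small placeholder files and then compute their checksums with the `update_checksum` helper. A checksum only reflects the intended content once the file has been flushed and closed. The following is a minimal sketch of that pattern using only the Python standard library; `write_placeholder` and `file_md5` are hypothetical helper names chosen for illustration and are not part of the ISA-API.\n",
    "\n",
    "```python\n",
    "import hashlib\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "\n",
    "def write_placeholder(directory, filename):\n",
    "    # hypothetical helper: create a small placeholder file and return its path\n",
    "    path = os.path.join(directory, filename)\n",
    "    # the context manager flushes and closes the file on exit\n",
    "    with open(path, 'w') as handle:\n",
    "        handle.write(filename)\n",
    "    return path\n",
    "\n",
    "\n",
    "def file_md5(path, chunk_size=8192):\n",
    "    # hypothetical helper: hexadecimal MD5 digest of a file, read in chunks\n",
    "    digest = hashlib.md5()\n",
    "    with open(path, 'rb') as handle:\n",
    "        for chunk in iter(lambda: handle.read(chunk_size), b''):\n",
    "            digest.update(chunk)\n",
    "    return digest.hexdigest()\n",
    "\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    placeholder = write_placeholder(tmp_dir, 'rna-seq-data-0.fastq')\n",
    "    checksum = file_md5(placeholder)\n",
    "    print(checksum)\n",
    "```\n",
    "\n",
    "Reading the file in chunks keeps memory use flat, which matters for real sequencing files rather than placeholders."
   ]
  },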
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Copy Number Variation profiling using sequencing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "char_ext_cnv_seq = Characteristic(category=OntologyAnnotation(term=\"Stuff Type\", term_source=\"\", term_accession=\"\"),\n",
    "                                     value=OntologyAnnotation(term=\"gDNA\", term_source=obi, term_accession=\"https://purl.org/OBOfoundry/obi:123414\"))\n",
    "\n",
    "# create a library contruction process that executes the gDNA library construction protocol\n",
    "cnv_strat = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library strategy\")),\n",
    "                        value=OntologyAnnotation(term=\"WGS\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "cnv_sel = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library selection\")),\n",
    "                        value=OntologyAnnotation(term=\"OTHER\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "cnv_src = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library source\")),\n",
    "                        value=OntologyAnnotation(term=\"GENOMICS\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "cnv_ori = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"library orientation\")),\n",
    "                        value=OntologyAnnotation(term=\"SINGLE\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "cnv_label = Characteristic(category=OntologyAnnotation(term=\"Label\") , value=OntologyAnnotation(term=\"Not Applicable\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "seq_instrument = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"sequencing instrument\")),\n",
    "                            value=OntologyAnnotation(term=\"AB SOLiD 5500xl\", term_source=obi, term_accession=\"https://purl.org/\")) \n",
    "\n",
    "cnv_sw = ParameterValue(category=ProtocolParameter(parameter_name=OntologyAnnotation(term=\"variant calling software\")),\n",
    "                            value=OntologyAnnotation(term=\"VCF caller\", term_source=obi, term_accession=\"https://purl.org/\"))\n",
    "\n",
    "for i, sample in enumerate(study.samples): \n",
    "\n",
    "    # extraction process takes as input a sample, and produces an extract material as output\n",
    "        \n",
    "    material_cnv_seq = Material(name=\"extract-cnv-seq-{}\".format(i))\n",
    "    material_cnv_seq.type = \"Extract Name\"\n",
    "    material_cnv_seq.characteristics.append(char_ext_cnv_seq)\n",
    "    print(material_cnv_seq.characteristics)\n",
    "    \n",
    "     # create an extraction process that executes the extraction protocol\n",
    "    extraction_process_cnv_seq = Process(\n",
    "        name=\"extract-process-cnv-seq-{}\".format(i),\n",
    "        executes_protocol=study.protocols[9],\n",
    "        inputs=[sample],\n",
    "        outputs=[material_cnv_seq]\n",
    "    )\n",
    "\n",
    "    cnv_library = Material(name=\"cnv-library-name-{}\".format(i))\n",
    "    cnv_library.type = \"Labeled Extract Name\"\n",
    "    cnv_library.characteristics.append(cnv_label)\n",
    "    \n",
    "    cnv_lib_process = Process(\n",
    "        name = \"cnv-library-name-{}\".format(i),\n",
    "        executes_protocol=study.protocols[10],\n",
    "        parameter_values=[cnv_strat,cnv_sel, cnv_src, cnv_ori],\n",
    "        inputs=[extraction_process_cnv_seq.outputs[0]],\n",
    "        outputs=[cnv_library]\n",
    "    )\n",
    "\n",
    "\n",
    "    # cnv seq acquisition process usually has an output fastq data file\n",
    "\n",
    "    cnv_datafile = DataFile(filename=\"cnv-seq-data-{}.fastq\".format(i), label=\"Raw Data File\")\n",
    "    f=open(os.path.join(main_path,\"cnv-seq-data-{}.fastq\".format(i)), \"w+\")\n",
    "\n",
    "    cnv_data_comment = Comment(name=\"export\", value=\"yes\")\n",
    "    updated_cnv_datafile = update_checksum(main_path, cnv_datafile, \"md5\")\n",
    "    updated_cnv_datafile.comments.append(cnv_data_comment)\n",
    "    \n",
    "    \n",
    "    # create a sequencing process that executes the sequencing protocol\n",
    "    cnv_seq_process = Process(\n",
    "        name = \"assay-name-cnv-seq-{}\".format(i),\n",
    "        executes_protocol=study.protocols[12],\n",
    "        parameter_values=[seq_instrument],\n",
    "        inputs=[cnv_lib_process.outputs[0]],\n",
    "        outputs=[updated_cnv_datafile]\n",
    "    )\n",
    "\n",
    "    # Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set\n",
    "    # these links for you. It is found in the isatools.model package\n",
    "\n",
    "    assay_cnv_seq.samples.append(sample)\n",
    "    assay_cnv_seq.other_material.append(material_cnv_seq)\n",
    "    assay_cnv_seq.other_material.append(cnv_library)\n",
    "    assay_cnv_seq.data_files.append(updated_cnv_datafile)\n",
    "\n",
    "\n",
    "    cnvseq_drvdf = DataFile(filename=\"cnv-seq-derived-data.vcf\", label=\"Derived Data File\")\n",
    "    dvf=open(os.path.join(main_path,\"cnv-seq-derived-data.vcf\"),\"w+\")\n",
    "    dvf.write(\"cnv-seq-derived-datav.vcf\")\n",
    "    # dvf.close\n",
    "    \n",
    "    \n",
    "    cnvseq_drvdf = DataFile(filename=\"cnv-seq-data-{}.vcf\".format(i), label=\"Derived Data File\")\n",
    "    dvf=open(os.path.join(main_path,\"cnv-seq-data-{}.vcf\".format(i)),\"w+\")\n",
    "    dvf.write(\"cnv-seq-data-{}.vcf\".format(i))\n",
    "    dvf.close\n",
    "    \n",
    "\n",
    "    cnv_drvdata_comment = Comment(name=\"export\",value=\"yes\")    \n",
    "    updated_cnvseq_drvdf = update_checksum(main_path, cnvseq_drvdf, \"md5\")\n",
    "    updated_cnvseq_drvdf.comments.append(cnv_drvdata_comment)\n",
    "    \n",
    "    \n",
    "    \n",
    "    cnv_da_process = Process(\n",
    "        name = \"VCF-DT\",\n",
    "        executes_protocol=study.protocols[14],\n",
    "        parameter_values=[cnv_sw],\n",
    "        inputs=[cnv_datafile],\n",
    "        outputs=[cnvseq_drvdf]\n",
    "    )\n",
    "    \n",
    "    \n",
    "    assay_cnv_seq.process_sequence.append(extraction_process_cnv_seq)\n",
    "    assay_cnv_seq.process_sequence.append(cnv_lib_process)\n",
    "    assay_cnv_seq.process_sequence.append(cnv_seq_process)\n",
    "    assay_cnv_seq.process_sequence.append(cnv_da_process)\n",
    "    \n",
    "    # plink(sample_collection_gtx, extraction_process_cnv_seq)\n",
    "    # plink(extraction_process_cnv_seq, cnv_lib_process)\n",
    "    # plink(cnv_lib_process, cnv_seq_process)\n",
    "    # plink(cnv_seq_process, cnv_da_process)\n",
    "#\n",
    "assay_cnv_seq.characteristic_categories.append(char_ext_cnv_seq.category)\n",
    "# print(assay_cnv_seq.other_material[0].characteristics[0].value.term)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding all ISA Assays declarations to the ISA Study object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# study.assays.append(assay)\n",
    "#study.assays.append(assay_nmr_topo)\n",
    "study.assays.append(assay_nmr_metpro)\n",
    "study.assays.append(assay_rna_seq)\n",
    "study.assays.append(assay_cnv_seq)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reporting a cross-technique integrative analysis by referencing a workflow (e.g. snakemake, galaxy) with an ISA protocol  and using ISA.Protocol.uri attribute to do so."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#Protocol #*\n",
    "workflow_ref =Protocol(\n",
    "    name=\"13C SIRM MS and NMR integrative analysis\",\n",
    "    description=\"a workflow for integrating data from NMR and MS acquisition into a consolidated result\",\n",
    "    uri=\"https://doi.org/10.1021/acs.analchem.1c01064\",\n",
    "    protocol_type=OntologyAnnotation(term=\"data transformation\"),\n",
    "    parameters=[\n",
    "            ProtocolParameter(parameter_name=OntologyAnnotation(term=\"software\"))\n",
    "        ])\n",
    "study.protocols.append(workflow_ref)\n",
    "\n",
    "print(investigation.ontology_source_references[4].comments[0])\n",
    "print(study.assays[0].comments[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Serializing (writing) the ISA object representation to TAB with the ISA-API `dump` function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from isatools.isatab import dump\n",
    "\n",
    "# note the use of the flag for explicit serialization on factor values on assay tables\n",
    "dump(investigation, os.path.join(main_path,'TAB'), write_factor_values_in_assay_table=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading ISA objects from TAB and writing back to TAB (roundtrip from TAB to TAB)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from isatools.isatab import load\n",
    "with open(os.path.join(main_path,\"TAB\", \"i_investigation.txt\")) as isa_sirm_test:\n",
    "    roundtrip = load(isa_sirm_test)\n",
    "\n",
    "# note the use of the flag for explicit serialization on factor values on assay tables\n",
    "dump(roundtrip, os.path.join(main_path,'TAB/BH23-ISATAB_FROM_TAB'), write_factor_values_in_assay_table=False)\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Serializing (writing) the ISA object representation to JSON with the ISA-API "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from isatools.isajson.dump import ISAJSONEncoder\n",
    "from json import dumps, loads\n",
    "\n",
    "\n",
    "inv_j = dumps(investigation, cls=ISAJSONEncoder)\n",
    "print(main_path)\n",
    "with open(os.path.join(main_path, 'isa-bh2023-all.json'), 'w') as out_fp:\n",
    "    out_fp.write(inv_j)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Validating the ISA object representation  with the ISA `validate` function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from isatools import isatab\n",
    "\n",
    "my_json_report_isa_flux = isatab.validate(open(os.path.join(main_path,\"TAB\",\"i_investigation.txt\")))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "my_json_report_isa_flux[\"errors\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "NOTE: The error report indicates the need to add new configurations files matching the assay definitions.\n"
   ]
  },
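  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The validation report is a dictionary, and the cell above reads its `errors` entry. When the list is long, grouping identical messages makes it easier to see which configuration is missing. The sketch below uses only the standard library. The `sample_report` content is made up for illustration, `count_messages` is a hypothetical helper, and the presence of a `message` key in each entry is an assumption to check against the report produced by your ISA-API version.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "# made-up report, shaped like the dictionary read in the cell above (assumption)\n",
    "sample_report = {\n",
    "    'errors': [\n",
    "        {'message': 'Missing configuration', 'supplemental': 'assay A'},\n",
    "        {'message': 'Missing configuration', 'supplemental': 'assay B'},\n",
    "        {'message': 'Unknown protocol', 'supplemental': 'assay C'},\n",
    "    ],\n",
    "    'warnings': [],\n",
    "}\n",
    "\n",
    "\n",
    "def count_messages(report, level='errors'):\n",
    "    # count how many entries of the given level share the same message\n",
    "    return Counter(entry.get('message', '') for entry in report.get(level, []))\n",
    "\n",
    "\n",
    "error_counts = count_messages(sample_report)\n",
    "for message, count in error_counts.most_common():\n",
    "    print(count, message)\n",
    "```\n",
    "\n",
    "The same helper can be pointed at the `warnings` entry by passing `level='warnings'`."
   ]
  },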
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading from ISA-TAB from disk and converting to ISA-JSON"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from isatools.isatab import load\n",
    "with open(os.path.join(main_path,\"TAB\", \"i_investigation.txt\")) as isa_sirm_test:\n",
    "    roundtrip = load(isa_sirm_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from isatools.convert import isatab2json\n",
    "import json\n",
    "\n",
    "isa_json = isatab2json.convert(os.path.join(main_path, \"TAB\"), validate_first=True, use_new_parser=True)\n",
    "\n",
    "print(isa_json[\"studies\"][0][\"assays\"][0][\"technologyType\"][\"annotationValue\"])\n",
    "print(isa_json[\"studies\"][0][\"assays\"][0][\"processSequence\"][10][\"name\"])\n",
    "print([process['name'] for process in isa_json[\"studies\"][0][\"assays\"][0][\"processSequence\"]])\n",
    "\n",
    "\n",
    "output_path = os.path.join(main_path, 'JSON', 'isa-bh2023-t2j.json')\n",
    "with open(output_path, 'w') as out_fp:\n",
    "    json.dump(isa_json, out_fp)\n",
    "\n",
    "with open(output_path) as out_fp:\n",
    "    new_investigation_dict = json.loads(out_fp.read())\n",
    "    new_investigation =  Investigation()\n",
    "    new_investigation.from_dict(new_investigation_dict)\n",
    "    print(new_investigation.studies[0].assays[0].process_sequence[0])\n"
   ]
  },
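  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A roundtrip is most informative when the reloaded content is compared with what was written. The sketch below uses only the standard library and a made-up dictionary standing in for an ISA-JSON document (it is not a valid ISA-JSON instance): it writes the dictionary to disk, reads it back and checks that both are equal.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "# made-up stand-in for an ISA-JSON document (assumption, for illustration only)\n",
    "document = {'identifier': 'example', 'studies': [{'assays': []}]}\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    path = os.path.join(tmp_dir, 'roundtrip.json')\n",
    "    # write the dictionary to disk\n",
    "    with open(path, 'w') as out_fp:\n",
    "        json.dump(document, out_fp)\n",
    "    # read it back into a new dictionary\n",
    "    with open(path) as in_fp:\n",
    "        reloaded = json.load(in_fp)\n",
    "\n",
    "# the roundtrip is lossless if both dictionaries compare equal\n",
    "roundtrip_ok = reloaded == document\n",
    "print(roundtrip_ok)\n",
    "```\n",
    "\n",
    "For real ISA documents, a plain equality test may be too strict if identifiers are regenerated on load, so comparing selected fields can be a more practical check."
   ]
  },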
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Loading from ISA-JSON from disk and converting to ISA-TAB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from isatools.convert import json2isatab\n",
    "with open(os.path.join(main_path,'JSON','isa-bh2023-t2j.json')) as in_fp:\n",
    "    out_path = os.path.join(main_path,'JSON', 'BH23-ISATAB_FROM_JSON')\n",
    "    print(out_path)\n",
    "    json2isatab.convert(in_fp, out_path)\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Loading from PTMM ISA-JSON and converting to ISA-TAB (no assays in the input)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from isatools.convert import json2isatab\n",
    "with open(os.path.join(main_path, 'isa-v2.json')) as in_fp:\n",
    "    out_path = os.path.join(main_path,'JSON', 'BH23-ISATAB')\n",
    "    json2isatab.convert(in_fp, out_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## About this notebook\n",
    "\n",
    "- authors: Philippe Rocca-Serra (philippe.rocca-serra@oerc.ox.ac.uk)\n",
    "- license: CC-BY 4.0\n",
    "- support: isatools@googlegroups.com\n",
    "- issue tracker: https://github.com/ISA-tools/isa-api/issues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}