{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create ISA-API Investigation from Datascriptor Study Design configuration\n",
    "# Crossover Study with two drug treatments on human subjects\n",
    "\n",
    "In this notebook I will show you how to use a study design configuration in JSON format, as produced by Datascriptor (https://gitlab.com/datascriptor/datascriptor), to generate a single-study ISA investigation, and how to then serialise it in JSON and tabular (i.e. ISA-Tab) format.\n",
    "\n",
    "Our study design configuration consists of:\n",
    "- a 4-arm study design. Each arm has 10 subjects\n",
    "- subjects are humans. There is an observational factor, named \"status\", with two values: \"healthy\" and \"diseased\"\n",
    "- a crossover of two drug treatments, an active treatment (\"hypertena\" 20 mg/day for 14 days) and a control treatment (\"placebo\" 20 mg/day for 14 days)\n",
    "- three non-treatment phases: screen (7 days), washout (14 days) and follow-up (180 days)\n",
    "- two sample types collected: blood and saliva\n",
    "- two assay types:\n",
    "    - DNA methylation profiling using nucleic acid sequencing on saliva samples\n",
    "    - clinical chemistry with marker panel on blood samples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Setup\n",
    "\n",
    "Let's import all the required libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "from time import time\n",
    "import os\n",
    "import json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "## ISA-API related imports\n",
    "from isatools.model import Investigation, Study"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "## ISA-API create mode related imports\n",
    "from isatools.create.model import StudyDesign\n",
    "from isatools.create.connectors import generate_study_design\n",
    "\n",
    "# serializer from ISA Investigation to JSON\n",
    "from isatools.isajson import ISAJSONEncoder\n",
    "\n",
    "# ISA-Tab serialisation\n",
    "from isatools import isatab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "## ISA-API create mode related imports\n",
    "from isatools.create import model\n",
    "from isatools import isajson"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load the Study Design JSON configuration\n",
    "\n",
    "First of all we load the study design configurator with all the specs defined above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(os.path.abspath(os.path.join(\n",
    "    \"isa-study-design-as-json\", \"datascriptor\", \"crossover-study-human.json\"\n",
    ")), \"r\") as config_file:\n",
    "    study_design_config = json.load(config_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Generate the ISA Study Design from the JSON configuration\n",
    "To perform the conversion we just need to call the function `generate_study_design()` (the name is possibly subject to change), imported above from `isatools.create.connectors`. It returns a `StudyDesign` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "study_design = generate_study_design(study_design_config)\n",
    "assert isinstance(study_design, StudyDesign)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Generate the ISA Study from the StudyDesign and embed it into an ISA Investigation\n",
    "\n",
    "The `StudyDesign.generate_isa_study()` method returns the complete ISA-API `Study` object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The generation of the study design took 2.02 s.\n"
     ]
    }
   ],
   "source": [
    "start = time()\n",
    "study = study_design.generate_isa_study()\n",
    "end = time()\n",
    "print('The generation of the study design took {:.2f} s.'.format(end - start))\n",
    "assert isinstance(study, Study)\n",
    "investigation = Investigation(identifier='inv01', studies=[study])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Serialize and save the JSON representation of the generated ISA Investigation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The JSON serialisation of the ISA investigation took 0.55 s.\n"
     ]
    }
   ],
   "source": [
    "start = time()\n",
    "inv_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))\n",
    "end = time()\n",
    "print('The JSON serialisation of the ISA investigation took {:.2f} s.'.format(end - start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# output directory for the serialised investigation (created if missing)\n",
    "directory = os.path.abspath(os.path.join('output', 'crossover-2-treatments-human'))\n",
    "os.makedirs(directory, exist_ok=True)\n",
    "with open(os.path.abspath(os.path.join(directory, 'isa-investigation-crossover-2-treatments-human.json')), 'w') as out_fp:\n",
    "    json.dump(json.loads(inv_json), out_fp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Dump the ISA Investigation to ISA-Tab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Tab serialisation of the ISA investigation took 21.58 s.\n"
     ]
    }
   ],
   "source": [
    "start = time()\n",
    "isatab.dump(investigation, directory)\n",
    "end = time()\n",
    "print('The Tab serialisation of the ISA investigation took {:.2f} s.'.format(end - start))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To work with the tables directly in the notebook, we can also dump them to pandas DataFrames, using the `dump_tables_to_dataframes` function rather than `dump`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataframes = isatab.dump_tables_to_dataframes(investigation)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(dataframes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Check the correctness of the ISA-Tab DataFrames\n",
    "\n",
    "We have 1 study file and 2 assay files (one for DNA methylation profiling and one for clinical chemistry). Let's check the names:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'s_study_01.txt'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "'a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "'a_AT11_clinical-chemistry_marker-panel.txt'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "for key in dataframes.keys():\n",
    "    display(key)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.1 Count of subjects and samples\n",
    "\n",
    "We have 10 subjects in each of the 4 arms, for a total of 40 subjects.\n",
    "\n",
    "We collect:\n",
    "- 5 blood samples per subject (50 samples * 4 arms = 200 total samples)\n",
    "- 2 saliva samples per subject (20 samples * 4 arms = 80 total samples)\n",
    "\n",
    "Across the 4 study arms a total of 280 samples are collected (70 samples per arm)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 70 samples in the GRP0 arm (i.e. group)\n",
      "There are 70 samples in the GRP1 arm (i.e. group)\n",
      "There are 70 samples in the GRP2 arm (i.e. group)\n",
      "There are 70 samples in the GRP3 arm (i.e. group)\n"
     ]
    }
   ],
   "source": [
    "study_frame = dataframes['s_study_01.txt']\n",
    "count_arm0_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP0' in el)])\n",
    "count_arm1_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP1' in el)])\n",
    "count_arm2_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP2' in el)])\n",
    "count_arm3_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP3' in el)])\n",
    "print(\"There are {} samples in the GRP0 arm (i.e. group)\".format(count_arm0_samples))\n",
    "print(\"There are {} samples in the GRP1 arm (i.e. group)\".format(count_arm1_samples))\n",
    "print(\"There are {} samples in the GRP2 arm (i.e. group)\".format(count_arm2_samples))\n",
    "print(\"There are {} samples in the GRP3 arm (i.e. group)\".format(count_arm3_samples))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2 Study Table Overview\n",
    "\n",
    "The study table provides an overview of the subjects (sources) and samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Source Name</th>\n",
       "      <th>Characteristics[Study Subject]</th>\n",
       "      <th>Term Accession Number</th>\n",
       "      <th>Characteristics[status]</th>\n",
       "      <th>Protocol REF</th>\n",
       "      <th>Parameter Value[Sampling order]</th>\n",
       "      <th>Parameter Value[Study cell]</th>\n",
       "      <th>Date</th>\n",
       "      <th>Performer</th>\n",
       "      <th>Sample Name</th>\n",
       "      <th>Characteristics[organism part]</th>\n",
       "      <th>Term Accession Number.1</th>\n",
       "      <th>Comment[study step with treatment]</th>\n",
       "      <th>Factor Value[Sequence Order]</th>\n",
       "      <th>Factor Value[DURATION]</th>\n",
       "      <th>Unit</th>\n",
       "      <th>Factor Value[AGENT]</th>\n",
       "      <th>Factor Value[INTENSITY]</th>\n",
       "      <th>Unit.1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>GRP0_SBJ01</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>healthy</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>031</td>\n",
       "      <td>A0E3</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP0_SBJ01_A0E3_SMP-Blood-Sample-1</td>\n",
       "      <td>Blood Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C17610</td>\n",
       "      <td>YES</td>\n",
       "      <td>3</td>\n",
       "      <td>14</td>\n",
       "      <td>days</td>\n",
       "      <td>placebo</td>\n",
       "      <td>20.0</td>\n",
       "      <td>mg/day</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>GRP0_SBJ01</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>healthy</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>001</td>\n",
       "      <td>A0E1</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1</td>\n",
       "      <td>Saliva Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C174119</td>\n",
       "      <td>YES</td>\n",
       "      <td>1</td>\n",
       "      <td>14</td>\n",
       "      <td>days</td>\n",
       "      <td>hypertena</td>\n",
       "      <td>20.0</td>\n",
       "      <td>mg/day</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>GRP0_SBJ01</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>healthy</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>042</td>\n",
       "      <td>A0E4</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP0_SBJ01_A0E4_SMP-Blood-Sample-2</td>\n",
       "      <td>Blood Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C17610</td>\n",
       "      <td>NO</td>\n",
       "      <td>4</td>\n",
       "      <td>180</td>\n",
       "      <td>days</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>GRP0_SBJ01</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>healthy</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>011</td>\n",
       "      <td>A0E1</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP0_SBJ01_A0E1_SMP-Blood-Sample-1</td>\n",
       "      <td>Blood Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C17610</td>\n",
       "      <td>YES</td>\n",
       "      <td>1</td>\n",
       "      <td>14</td>\n",
       "      <td>days</td>\n",
       "      <td>hypertena</td>\n",
       "      <td>20.0</td>\n",
       "      <td>mg/day</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>GRP0_SBJ01</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>healthy</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>043</td>\n",
       "      <td>A0E4</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP0_SBJ01_A0E4_SMP-Blood-Sample-3</td>\n",
       "      <td>Blood Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C17610</td>\n",
       "      <td>NO</td>\n",
       "      <td>4</td>\n",
       "      <td>180</td>\n",
       "      <td>days</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>275</th>\n",
       "      <td>GRP3_SBJ10</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>diseased</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>242</td>\n",
       "      <td>A3E3</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP3_SBJ10_A3E3_SMP-Blood-Sample-1</td>\n",
       "      <td>Blood Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C17610</td>\n",
       "      <td>YES</td>\n",
       "      <td>3</td>\n",
       "      <td>14</td>\n",
       "      <td>days</td>\n",
       "      <td>hypertena</td>\n",
       "      <td>20.0</td>\n",
       "      <td>mg/day</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>276</th>\n",
       "      <td>GRP3_SBJ10</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>diseased</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>256</td>\n",
       "      <td>A3E4</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP3_SBJ10_A3E4_SMP-Blood-Sample-3</td>\n",
       "      <td>Blood Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C17610</td>\n",
       "      <td>NO</td>\n",
       "      <td>4</td>\n",
       "      <td>180</td>\n",
       "      <td>days</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>277</th>\n",
       "      <td>GRP3_SBJ10</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>diseased</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>254</td>\n",
       "      <td>A3E4</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP3_SBJ10_A3E4_SMP-Blood-Sample-1</td>\n",
       "      <td>Blood Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C17610</td>\n",
       "      <td>NO</td>\n",
       "      <td>4</td>\n",
       "      <td>180</td>\n",
       "      <td>days</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>278</th>\n",
       "      <td>GRP3_SBJ10</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>diseased</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>232</td>\n",
       "      <td>A3E3</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1</td>\n",
       "      <td>Saliva Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C174119</td>\n",
       "      <td>YES</td>\n",
       "      <td>3</td>\n",
       "      <td>14</td>\n",
       "      <td>days</td>\n",
       "      <td>hypertena</td>\n",
       "      <td>20.0</td>\n",
       "      <td>mg/day</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>279</th>\n",
       "      <td>GRP3_SBJ10</td>\n",
       "      <td>Homo sapiens</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCBITaxon_9606</td>\n",
       "      <td>diseased</td>\n",
       "      <td>sample collection</td>\n",
       "      <td>212</td>\n",
       "      <td>A3E1</td>\n",
       "      <td>2021-06-30</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>GRP3_SBJ10_A3E1_SMP-Saliva-Sample-1</td>\n",
       "      <td>Saliva Sample</td>\n",
       "      <td>http://purl.obolibrary.org/obo/NCIT_C174119</td>\n",
       "      <td>YES</td>\n",
       "      <td>1</td>\n",
       "      <td>14</td>\n",
       "      <td>days</td>\n",
       "      <td>placebo</td>\n",
       "      <td>20.0</td>\n",
       "      <td>mg/day</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>280 rows × 19 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    Source Name Characteristics[Study Subject]  \\\n",
       "0    GRP0_SBJ01                   Homo sapiens   \n",
       "1    GRP0_SBJ01                   Homo sapiens   \n",
       "2    GRP0_SBJ01                   Homo sapiens   \n",
       "3    GRP0_SBJ01                   Homo sapiens   \n",
       "4    GRP0_SBJ01                   Homo sapiens   \n",
       "..          ...                            ...   \n",
       "275  GRP3_SBJ10                   Homo sapiens   \n",
       "276  GRP3_SBJ10                   Homo sapiens   \n",
       "277  GRP3_SBJ10                   Homo sapiens   \n",
       "278  GRP3_SBJ10                   Homo sapiens   \n",
       "279  GRP3_SBJ10                   Homo sapiens   \n",
       "\n",
       "                             Term Accession Number Characteristics[status]  \\\n",
       "0    http://purl.obolibrary.org/obo/NCBITaxon_9606                 healthy   \n",
       "1    http://purl.obolibrary.org/obo/NCBITaxon_9606                 healthy   \n",
       "2    http://purl.obolibrary.org/obo/NCBITaxon_9606                 healthy   \n",
       "3    http://purl.obolibrary.org/obo/NCBITaxon_9606                 healthy   \n",
       "4    http://purl.obolibrary.org/obo/NCBITaxon_9606                 healthy   \n",
       "..                                             ...                     ...   \n",
       "275  http://purl.obolibrary.org/obo/NCBITaxon_9606                diseased   \n",
       "276  http://purl.obolibrary.org/obo/NCBITaxon_9606                diseased   \n",
       "277  http://purl.obolibrary.org/obo/NCBITaxon_9606                diseased   \n",
       "278  http://purl.obolibrary.org/obo/NCBITaxon_9606                diseased   \n",
       "279  http://purl.obolibrary.org/obo/NCBITaxon_9606                diseased   \n",
       "\n",
       "          Protocol REF Parameter Value[Sampling order]  \\\n",
       "0    sample collection                             031   \n",
       "1    sample collection                             001   \n",
       "2    sample collection                             042   \n",
       "3    sample collection                             011   \n",
       "4    sample collection                             043   \n",
       "..                 ...                             ...   \n",
       "275  sample collection                             242   \n",
       "276  sample collection                             256   \n",
       "277  sample collection                             254   \n",
       "278  sample collection                             232   \n",
       "279  sample collection                             212   \n",
       "\n",
       "    Parameter Value[Study cell]        Date Performer  \\\n",
       "0                          A0E3  2021-06-30   Unknown   \n",
       "1                          A0E1  2021-06-30   Unknown   \n",
       "2                          A0E4  2021-06-30   Unknown   \n",
       "3                          A0E1  2021-06-30   Unknown   \n",
       "4                          A0E4  2021-06-30   Unknown   \n",
       "..                          ...         ...       ...   \n",
       "275                        A3E3  2021-06-30   Unknown   \n",
       "276                        A3E4  2021-06-30   Unknown   \n",
       "277                        A3E4  2021-06-30   Unknown   \n",
       "278                        A3E3  2021-06-30   Unknown   \n",
       "279                        A3E1  2021-06-30   Unknown   \n",
       "\n",
       "                             Sample Name Characteristics[organism part]  \\\n",
       "0     GRP0_SBJ01_A0E3_SMP-Blood-Sample-1                   Blood Sample   \n",
       "1    GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1                  Saliva Sample   \n",
       "2     GRP0_SBJ01_A0E4_SMP-Blood-Sample-2                   Blood Sample   \n",
       "3     GRP0_SBJ01_A0E1_SMP-Blood-Sample-1                   Blood Sample   \n",
       "4     GRP0_SBJ01_A0E4_SMP-Blood-Sample-3                   Blood Sample   \n",
       "..                                   ...                            ...   \n",
       "275   GRP3_SBJ10_A3E3_SMP-Blood-Sample-1                   Blood Sample   \n",
       "276   GRP3_SBJ10_A3E4_SMP-Blood-Sample-3                   Blood Sample   \n",
       "277   GRP3_SBJ10_A3E4_SMP-Blood-Sample-1                   Blood Sample   \n",
       "278  GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1                  Saliva Sample   \n",
       "279  GRP3_SBJ10_A3E1_SMP-Saliva-Sample-1                  Saliva Sample   \n",
       "\n",
       "                         Term Accession Number.1  \\\n",
       "0     http://purl.obolibrary.org/obo/NCIT_C17610   \n",
       "1    http://purl.obolibrary.org/obo/NCIT_C174119   \n",
       "2     http://purl.obolibrary.org/obo/NCIT_C17610   \n",
       "3     http://purl.obolibrary.org/obo/NCIT_C17610   \n",
       "4     http://purl.obolibrary.org/obo/NCIT_C17610   \n",
       "..                                           ...   \n",
       "275   http://purl.obolibrary.org/obo/NCIT_C17610   \n",
       "276   http://purl.obolibrary.org/obo/NCIT_C17610   \n",
       "277   http://purl.obolibrary.org/obo/NCIT_C17610   \n",
       "278  http://purl.obolibrary.org/obo/NCIT_C174119   \n",
       "279  http://purl.obolibrary.org/obo/NCIT_C174119   \n",
       "\n",
       "    Comment[study step with treatment] Factor Value[Sequence Order]  \\\n",
       "0                                  YES                            3   \n",
       "1                                  YES                            1   \n",
       "2                                   NO                            4   \n",
       "3                                  YES                            1   \n",
       "4                                   NO                            4   \n",
       "..                                 ...                          ...   \n",
       "275                                YES                            3   \n",
       "276                                 NO                            4   \n",
       "277                                 NO                            4   \n",
       "278                                YES                            3   \n",
       "279                                YES                            1   \n",
       "\n",
       "    Factor Value[DURATION]  Unit Factor Value[AGENT] Factor Value[INTENSITY]  \\\n",
       "0                       14  days             placebo                    20.0   \n",
       "1                       14  days           hypertena                    20.0   \n",
       "2                      180  days                                               \n",
       "3                       14  days           hypertena                    20.0   \n",
       "4                      180  days                                               \n",
       "..                     ...   ...                 ...                     ...   \n",
       "275                     14  days           hypertena                    20.0   \n",
       "276                    180  days                                               \n",
       "277                    180  days                                               \n",
       "278                     14  days           hypertena                    20.0   \n",
       "279                     14  days             placebo                    20.0   \n",
       "\n",
       "     Unit.1  \n",
       "0    mg/day  \n",
       "1    mg/day  \n",
       "2            \n",
       "3    mg/day  \n",
       "4            \n",
       "..      ...  \n",
       "275  mg/day  \n",
       "276          \n",
       "277          \n",
       "278  mg/day  \n",
       "279  mg/day  \n",
       "\n",
       "[280 rows x 19 columns]"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "study_frame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3 First Assay: DNA Methylation Profiling using nucleic acid sequencing\n",
    "\n",
    "This assay takes saliva samples as input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sample Name</th>\n",
       "      <th>Comment[study step with treatment]</th>\n",
       "      <th>Protocol REF</th>\n",
       "      <th>Parameter Value[cross linking]</th>\n",
       "      <th>Parameter Value[DNA fragmentation]</th>\n",
       "      <th>Parameter Value[DNA fragment size]</th>\n",
       "      <th>Parameter Value[immunoprecipitation antibody]</th>\n",
       "      <th>Performer</th>\n",
       "      <th>Extract Name</th>\n",
       "      <th>Characteristics[extract type]</th>\n",
       "      <th>Protocol REF.1</th>\n",
       "      <th>Parameter Value[instrument]</th>\n",
       "      <th>Parameter Value[library_orientation]</th>\n",
       "      <th>Parameter Value[library_strategy]</th>\n",
       "      <th>Parameter Value[library_selection]</th>\n",
       "      <th>Parameter Value[multiplex identifier]</th>\n",
       "      <th>Performer.1</th>\n",
       "      <th>Raw Data File</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>di-tert-butyl peroxide</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S81-Extract-R1</td>\n",
       "      <td>gDNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>single</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S81-raw_data_file-R2.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>uv-light</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S1-Extract-R1</td>\n",
       "      <td>DNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>paired</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S1-raw_data_file-R1.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>di-tert-butyl peroxide</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S81-Extract-R1</td>\n",
       "      <td>gDNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>paired</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S81-raw_data_file-R3.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>uv-light</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S1-Extract-R1</td>\n",
       "      <td>DNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>paired</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S1-raw_data_file-R2.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>di-tert-butyl peroxide</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S81-Extract-R1</td>\n",
       "      <td>gDNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>paired</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S81-raw_data_file-R4.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1275</th>\n",
       "      <td>GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>di-tert-butyl peroxide</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S152-Extract-R1</td>\n",
       "      <td>gDNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>single</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S152-raw_data_file-R1.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1276</th>\n",
       "      <td>GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>di-tert-butyl peroxide</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S152-Extract-R2</td>\n",
       "      <td>DNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>single</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S152-raw_data_file-R5.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1277</th>\n",
       "      <td>GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>di-tert-butyl peroxide</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S152-Extract-R2</td>\n",
       "      <td>DNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>single</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S152-raw_data_file-R6.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1278</th>\n",
       "      <td>GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>di-tert-butyl peroxide</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S152-Extract-R1</td>\n",
       "      <td>gDNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>paired</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S152-raw_data_file-R3.raw</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1279</th>\n",
       "      <td>GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>extraction</td>\n",
       "      <td>uv-light</td>\n",
       "      <td>nebulization</td>\n",
       "      <td>a</td>\n",
       "      <td>d</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S72-Extract-R2</td>\n",
       "      <td>gDNA</td>\n",
       "      <td>library_preparation</td>\n",
       "      <td>GridION</td>\n",
       "      <td>single</td>\n",
       "      <td>MBD-Seq</td>\n",
       "      <td>MF</td>\n",
       "      <td>f</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT5-S72-raw_data_file-R6.raw</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1280 rows × 18 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                              Sample Name Comment[study step with treatment]  \\\n",
       "0     GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1                                YES   \n",
       "1     GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1                                YES   \n",
       "2     GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1                                YES   \n",
       "3     GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1                                YES   \n",
       "4     GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1                                YES   \n",
       "...                                   ...                                ...   \n",
       "1275  GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1                                YES   \n",
       "1276  GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1                                YES   \n",
       "1277  GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1                                YES   \n",
       "1278  GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1                                YES   \n",
       "1279  GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1                                YES   \n",
       "\n",
       "     Protocol REF Parameter Value[cross linking]  \\\n",
       "0      extraction         di-tert-butyl peroxide   \n",
       "1      extraction                       uv-light   \n",
       "2      extraction         di-tert-butyl peroxide   \n",
       "3      extraction                       uv-light   \n",
       "4      extraction         di-tert-butyl peroxide   \n",
       "...           ...                            ...   \n",
       "1275   extraction         di-tert-butyl peroxide   \n",
       "1276   extraction         di-tert-butyl peroxide   \n",
       "1277   extraction         di-tert-butyl peroxide   \n",
       "1278   extraction         di-tert-butyl peroxide   \n",
       "1279   extraction                       uv-light   \n",
       "\n",
       "     Parameter Value[DNA fragmentation] Parameter Value[DNA fragment size]  \\\n",
       "0                          nebulization                                  a   \n",
       "1                          nebulization                                  a   \n",
       "2                          nebulization                                  a   \n",
       "3                          nebulization                                  a   \n",
       "4                          nebulization                                  a   \n",
       "...                                 ...                                ...   \n",
       "1275                       nebulization                                  a   \n",
       "1276                       nebulization                                  a   \n",
       "1277                       nebulization                                  a   \n",
       "1278                       nebulization                                  a   \n",
       "1279                       nebulization                                  a   \n",
       "\n",
       "     Parameter Value[immunoprecipitation antibody] Performer  \\\n",
       "0                                                d   Unknown   \n",
       "1                                                d   Unknown   \n",
       "2                                                d   Unknown   \n",
       "3                                                d   Unknown   \n",
       "4                                                d   Unknown   \n",
       "...                                            ...       ...   \n",
       "1275                                             d   Unknown   \n",
       "1276                                             d   Unknown   \n",
       "1277                                             d   Unknown   \n",
       "1278                                             d   Unknown   \n",
       "1279                                             d   Unknown   \n",
       "\n",
       "             Extract Name Characteristics[extract type]       Protocol REF.1  \\\n",
       "0      AT5-S81-Extract-R1                          gDNA  library_preparation   \n",
       "1       AT5-S1-Extract-R1                           DNA  library_preparation   \n",
       "2      AT5-S81-Extract-R1                          gDNA  library_preparation   \n",
       "3       AT5-S1-Extract-R1                           DNA  library_preparation   \n",
       "4      AT5-S81-Extract-R1                          gDNA  library_preparation   \n",
       "...                   ...                           ...                  ...   \n",
       "1275  AT5-S152-Extract-R1                          gDNA  library_preparation   \n",
       "1276  AT5-S152-Extract-R2                           DNA  library_preparation   \n",
       "1277  AT5-S152-Extract-R2                           DNA  library_preparation   \n",
       "1278  AT5-S152-Extract-R1                          gDNA  library_preparation   \n",
       "1279   AT5-S72-Extract-R2                          gDNA  library_preparation   \n",
       "\n",
       "     Parameter Value[instrument] Parameter Value[library_orientation]  \\\n",
       "0                        GridION                               single   \n",
       "1                        GridION                               paired   \n",
       "2                        GridION                               paired   \n",
       "3                        GridION                               paired   \n",
       "4                        GridION                               paired   \n",
       "...                          ...                                  ...   \n",
       "1275                     GridION                               single   \n",
       "1276                     GridION                               single   \n",
       "1277                     GridION                               single   \n",
       "1278                     GridION                               paired   \n",
       "1279                     GridION                               single   \n",
       "\n",
       "     Parameter Value[library_strategy] Parameter Value[library_selection]  \\\n",
       "0                              MBD-Seq                                 MF   \n",
       "1                              MBD-Seq                                 MF   \n",
       "2                              MBD-Seq                                 MF   \n",
       "3                              MBD-Seq                                 MF   \n",
       "4                              MBD-Seq                                 MF   \n",
       "...                                ...                                ...   \n",
       "1275                           MBD-Seq                                 MF   \n",
       "1276                           MBD-Seq                                 MF   \n",
       "1277                           MBD-Seq                                 MF   \n",
       "1278                           MBD-Seq                                 MF   \n",
       "1279                           MBD-Seq                                 MF   \n",
       "\n",
       "     Parameter Value[multiplex identifier] Performer.1  \\\n",
       "0                                        f     Unknown   \n",
       "1                                        f     Unknown   \n",
       "2                                        f     Unknown   \n",
       "3                                        f     Unknown   \n",
       "4                                        f     Unknown   \n",
       "...                                    ...         ...   \n",
       "1275                                     f     Unknown   \n",
       "1276                                     f     Unknown   \n",
       "1277                                     f     Unknown   \n",
       "1278                                     f     Unknown   \n",
       "1279                                     f     Unknown   \n",
       "\n",
       "                      Raw Data File  \n",
       "0      AT5-S81-raw_data_file-R2.raw  \n",
       "1       AT5-S1-raw_data_file-R1.raw  \n",
       "2      AT5-S81-raw_data_file-R3.raw  \n",
       "3       AT5-S1-raw_data_file-R2.raw  \n",
       "4      AT5-S81-raw_data_file-R4.raw  \n",
       "...                             ...  \n",
       "1275  AT5-S152-raw_data_file-R1.raw  \n",
       "1276  AT5-S152-raw_data_file-R5.raw  \n",
       "1277  AT5-S152-raw_data_file-R6.raw  \n",
       "1278  AT5-S152-raw_data_file-R3.raw  \n",
       "1279   AT5-S72-raw_data_file-R6.raw  \n",
       "\n",
       "[1280 rows x 18 columns]"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataframes['a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7.3.1 Nucleic Acid Sequencing Stats\n",
    "\n",
    "For this assay we have 80 saliva samples. The extraction step yields 320 distinct DNA extracts, an average of four per sample, covering two cross-linking options (di-tert-butyl peroxide and uv-light) and two extract types (DNA and gDNA). Library preparation is then run on a GridION instrument with both single and paired library orientation, for a total of 1280 library preparation processes and 1280 raw data files, one per row of the assay table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Sample Name                                        80\n",
       "Comment[study step with treatment]                  1\n",
       "Protocol REF                                        1\n",
       "Parameter Value[cross linking]                      2\n",
       "Parameter Value[DNA fragmentation]                  1\n",
       "Parameter Value[DNA fragment size]                  1\n",
       "Parameter Value[immunoprecipitation antibody]       1\n",
       "Performer                                           1\n",
       "Extract Name                                      320\n",
       "Characteristics[extract type]                       2\n",
       "Protocol REF.1                                      1\n",
       "Parameter Value[instrument]                         1\n",
       "Parameter Value[library_orientation]                2\n",
       "Parameter Value[library_strategy]                   1\n",
       "Parameter Value[library_selection]                  1\n",
       "Parameter Value[multiplex identifier]               1\n",
       "Performer.1                                         1\n",
       "Raw Data File                                    1280\n",
       "dtype: int64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataframes['a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt'].nunique(axis=0, dropna=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.4 Second Assay: Clinical Chemistry Marker Panel\n",
    "\n",
    "This assay takes blood samples as input"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sample Name</th>\n",
       "      <th>Comment[study step with treatment]</th>\n",
       "      <th>Protocol REF</th>\n",
       "      <th>Performer</th>\n",
       "      <th>Raw Data File</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>GRP0_SBJ01_A0E1_SMP-Blood-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S1-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>GRP0_SBJ01_A0E3_SMP-Blood-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S11-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>GRP0_SBJ01_A0E4_SMP-Blood-Sample-1</td>\n",
       "      <td>NO</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S21-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>GRP0_SBJ01_A0E4_SMP-Blood-Sample-2</td>\n",
       "      <td>NO</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S22-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>GRP0_SBJ01_A0E4_SMP-Blood-Sample-3</td>\n",
       "      <td>NO</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S23-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>195</th>\n",
       "      <td>GRP3_SBJ10_A3E1_SMP-Blood-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S152-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196</th>\n",
       "      <td>GRP3_SBJ10_A3E3_SMP-Blood-Sample-1</td>\n",
       "      <td>YES</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S162-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>197</th>\n",
       "      <td>GRP3_SBJ10_A3E4_SMP-Blood-Sample-1</td>\n",
       "      <td>NO</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S174-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>198</th>\n",
       "      <td>GRP3_SBJ10_A3E4_SMP-Blood-Sample-2</td>\n",
       "      <td>NO</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S175-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>199</th>\n",
       "      <td>GRP3_SBJ10_A3E4_SMP-Blood-Sample-3</td>\n",
       "      <td>NO</td>\n",
       "      <td>sample preparation</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>AT11-S176-raw_data_file-R1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>200 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                            Sample Name Comment[study step with treatment]  \\\n",
       "0    GRP0_SBJ01_A0E1_SMP-Blood-Sample-1                                YES   \n",
       "1    GRP0_SBJ01_A0E3_SMP-Blood-Sample-1                                YES   \n",
       "2    GRP0_SBJ01_A0E4_SMP-Blood-Sample-1                                 NO   \n",
       "3    GRP0_SBJ01_A0E4_SMP-Blood-Sample-2                                 NO   \n",
       "4    GRP0_SBJ01_A0E4_SMP-Blood-Sample-3                                 NO   \n",
       "..                                  ...                                ...   \n",
       "195  GRP3_SBJ10_A3E1_SMP-Blood-Sample-1                                YES   \n",
       "196  GRP3_SBJ10_A3E3_SMP-Blood-Sample-1                                YES   \n",
       "197  GRP3_SBJ10_A3E4_SMP-Blood-Sample-1                                 NO   \n",
       "198  GRP3_SBJ10_A3E4_SMP-Blood-Sample-2                                 NO   \n",
       "199  GRP3_SBJ10_A3E4_SMP-Blood-Sample-3                                 NO   \n",
       "\n",
       "           Protocol REF Performer               Raw Data File  \n",
       "0    sample preparation   Unknown    AT11-S1-raw_data_file-R1  \n",
       "1    sample preparation   Unknown   AT11-S11-raw_data_file-R1  \n",
       "2    sample preparation   Unknown   AT11-S21-raw_data_file-R1  \n",
       "3    sample preparation   Unknown   AT11-S22-raw_data_file-R1  \n",
       "4    sample preparation   Unknown   AT11-S23-raw_data_file-R1  \n",
       "..                  ...       ...                         ...  \n",
       "195  sample preparation   Unknown  AT11-S152-raw_data_file-R1  \n",
       "196  sample preparation   Unknown  AT11-S162-raw_data_file-R1  \n",
       "197  sample preparation   Unknown  AT11-S174-raw_data_file-R1  \n",
       "198  sample preparation   Unknown  AT11-S175-raw_data_file-R1  \n",
       "199  sample preparation   Unknown  AT11-S176-raw_data_file-R1  \n",
       "\n",
       "[200 rows x 5 columns]"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataframes['a_AT11_clinical-chemistry_marker-panel.txt']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 7.4.1 Marker Panel Stats\n",
    "\n",
    "For this assay we use 200 blood samples. One sample preparation process is run for each sample, producing a total of 200 sample preparation processes and 200 raw data files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Sample Name                           200\n",
       "Comment[study step with treatment]      2\n",
       "Protocol REF                            1\n",
       "Performer                               1\n",
       "Raw Data File                         200\n",
       "dtype: int64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataframes['a_AT11_clinical-chemistry_marker-panel.txt'].nunique(axis=0, dropna=True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}