{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create an ISA-API Investigation from a Datascriptor Study Design configuration\n", "# Crossover Study with two drug treatments on humans\n", "\n", "In this notebook I will show you how to use a study design configuration in JSON format, as produced by datascriptor (https://gitlab.com/datascriptor/datascriptor), to generate a single-study ISA investigation, and how to serialise it in JSON and tabular (i.e. ISA-Tab, tab-separated) format.\n", "\n", "Our study design configuration consists of:\n", "- a 4-arm study design. Each arm has 10 subjects\n", "- subjects are humans. There is an observational factor, named \"status\", with two values: \"healthy\" and \"diseased\"\n", "- a crossover of two drug treatments: an active treatment (\"hypertena\" 20 mg/day for 14 days) and a control treatment (\"placebo\" 20 mg/day for 14 days)\n", "- three non-treatment phases: screen (7 days), washout (14 days) and follow-up (180 days)\n", "- two sample types collected: blood and saliva\n", "- two assay types: \n", " - DNA methylation profiling using nucleic acid sequencing on saliva samples\n", " - clinical chemistry with a marker panel on blood samples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. 
Setup\n", "\n", "Let's import all the required libraries" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "from time import time\n", "import os\n", "import json" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "## ISA-API related imports\n", "from isatools.model import Investigation, Study" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "## ISA-API create mode related imports\n", "from isatools.create.model import StudyDesign\n", "from isatools.create.connectors import generate_study_design\n", "\n", "# serializer from ISA Investigation to JSON\n", "from isatools.isajson import ISAJSONEncoder\n", "\n", "# ISA-Tab serialisation\n", "from isatools import isatab" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "## Additional ISA-API modules (create-mode model and ISA-JSON utilities)\n", "from isatools.create import model\n", "from isatools import isajson" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Load the Study Design JSON configuration\n", "\n", "First of all, we load the study design configuration with all the specifications defined above." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "with open(os.path.abspath(os.path.join(\n", " \"isa-study-design-as-json\", \"datascriptor\", \"crossover-study-human.json\"\n", ")), \"r\") as config_file:\n", " study_design_config = json.load(config_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 
Generate the ISA Study Design from the JSON configuration\n", "\n", "To perform the conversion, we just need to call the `generate_study_design()` function from `isatools.create.connectors`, which returns an ISA-API `StudyDesign` object." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "study_design = generate_study_design(study_design_config)\n", "assert isinstance(study_design, StudyDesign)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Generate the ISA Study from the StudyDesign and embed it into an ISA Investigation\n", "\n", "The `StudyDesign.generate_isa_study()` method returns the complete ISA-API `Study` object." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The generation of the study design took 2.02 s.\n" ] } ], "source": [ "start = time()\n", "study = study_design.generate_isa_study()\n", "end = time()\n", "print('The generation of the study design took {:.2f} s.'.format(end - start))\n", "assert isinstance(study, Study)\n", "investigation = Investigation(identifier='inv01', studies=[study])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. 
Serialize and save the JSON representation of the generated ISA Investigation" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The JSON serialisation of the ISA investigation took 0.55 s.\n" ] } ], "source": [ "start = time()\n", "inv_json = json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': '))\n", "end = time()\n", "print('The JSON serialisation of the ISA investigation took {:.2f} s.'.format(end - start))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "directory = os.path.abspath(os.path.join('output', 'crossover-2-treatments-human'))\n", "os.makedirs(directory, exist_ok=True)\n", "with open(os.path.abspath(os.path.join(directory, 'isa-investigation-crossover-2-treatments-human.json')), 'w') as out_fp:\n", " json.dump(json.loads(inv_json), out_fp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 
Dump the ISA Investigation to ISA-Tab" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Tab serialisation of the ISA investigation took 21.58 s.\n" ] } ], "source": [ "start = time()\n", "isatab.dump(investigation, directory)\n", "end = time()\n", "print('The Tab serialisation of the ISA investigation took {:.2f} s.'.format(end - start))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use the tables in the notebook, we can also dump them to pandas DataFrames with the `dump_tables_to_dataframes` function instead of `dump`. The function returns a dictionary that maps each ISA-Tab file name to its DataFrame." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "dataframes = isatab.dump_tables_to_dataframes(investigation)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(dataframes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Check the correctness of the ISA-Tab DataFrames \n", "\n", "We have 1 study file and 2 assay files (one for DNA methylation profiling and one for the clinical chemistry marker panel). 
Let's check the names:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'s_study_01.txt'" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "'a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt'" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "'a_AT11_clinical-chemistry_marker-panel.txt'" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for key in dataframes.keys():\n", " display(key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7.1 Count of subjects and samples\n", "\n", "We have 10 subjects in each of the 4 arms, for a total of 40 subjects.\n", "\n", "We collect:\n", "- 5 blood samples per subject (50 samples per arm * 4 arms = 200 total samples)\n", "- 2 saliva samples per subject (20 samples per arm * 4 arms = 80 total samples)\n", "\n", "Across the 4 study arms, a total of 280 samples are collected (70 samples per arm)." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 70 samples in the GRP0 arm (i.e. group)\n", "There are 70 samples in the GRP1 arm (i.e. group)\n", "There are 70 samples in the GRP2 arm (i.e. group)\n", "There are 70 samples in the GRP3 arm (i.e. group)\n" ] } ], "source": [ "study_frame = dataframes['s_study_01.txt']\n", "count_arm0_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP0' in el)])\n", "count_arm1_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP1' in el)])\n", "count_arm2_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP2' in el)])\n", "count_arm3_samples = len(study_frame[study_frame['Source Name'].apply(lambda el: 'GRP3' in el)])\n", "print(\"There are {} samples in the GRP0 arm (i.e. group)\".format(count_arm0_samples))\n", "print(\"There are {} samples in the GRP1 arm (i.e. 
group)\".format(count_arm1_samples))\n", "print(\"There are {} samples in the GRP2 arm (i.e. group)\".format(count_arm2_samples))\n", "print(\"There are {} samples in the GRP3 arm (i.e. group)\".format(count_arm3_samples))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7.2 Study Table Overview\n", "\n", "The study table provides an overview of the subjects (sources) and samples" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Source Name Characteristics[Study Subject] \\\n", "0 GRP0_SBJ01 Homo sapiens \n", "1 GRP0_SBJ01 Homo sapiens \n", "2 GRP0_SBJ01 Homo sapiens \n", "3 GRP0_SBJ01 Homo sapiens \n", "4 GRP0_SBJ01 Homo sapiens \n", ".. ... ... \n", "275 GRP3_SBJ10 Homo sapiens \n", "276 GRP3_SBJ10 Homo sapiens \n", "277 GRP3_SBJ10 Homo sapiens \n", "278 GRP3_SBJ10 Homo sapiens \n", "279 GRP3_SBJ10 Homo sapiens \n", "\n", " Term Accession Number Characteristics[status] \\\n", "0 http://purl.obolibrary.org/obo/NCBITaxon_9606 healthy \n", "1 http://purl.obolibrary.org/obo/NCBITaxon_9606 healthy \n", "2 http://purl.obolibrary.org/obo/NCBITaxon_9606 healthy \n", "3 http://purl.obolibrary.org/obo/NCBITaxon_9606 healthy \n", "4 http://purl.obolibrary.org/obo/NCBITaxon_9606 healthy \n", ".. ... ... \n", "275 http://purl.obolibrary.org/obo/NCBITaxon_9606 diseased \n", "276 http://purl.obolibrary.org/obo/NCBITaxon_9606 diseased \n", "277 http://purl.obolibrary.org/obo/NCBITaxon_9606 diseased \n", "278 http://purl.obolibrary.org/obo/NCBITaxon_9606 diseased \n", "279 http://purl.obolibrary.org/obo/NCBITaxon_9606 diseased \n", "\n", " Protocol REF Parameter Value[Sampling order] \\\n", "0 sample collection 031 \n", "1 sample collection 001 \n", "2 sample collection 042 \n", "3 sample collection 011 \n", "4 sample collection 043 \n", ".. ... ... \n", "275 sample collection 242 \n", "276 sample collection 256 \n", "277 sample collection 254 \n", "278 sample collection 232 \n", "279 sample collection 212 \n", "\n", " Parameter Value[Study cell] Date Performer \\\n", "0 A0E3 2021-06-30 Unknown \n", "1 A0E1 2021-06-30 Unknown \n", "2 A0E4 2021-06-30 Unknown \n", "3 A0E1 2021-06-30 Unknown \n", "4 A0E4 2021-06-30 Unknown \n", ".. ... ... ... 
\n", "275 A3E3 2021-06-30 Unknown \n", "276 A3E4 2021-06-30 Unknown \n", "277 A3E4 2021-06-30 Unknown \n", "278 A3E3 2021-06-30 Unknown \n", "279 A3E1 2021-06-30 Unknown \n", "\n", " Sample Name Characteristics[organism part] \\\n", "0 GRP0_SBJ01_A0E3_SMP-Blood-Sample-1 Blood Sample \n", "1 GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 Saliva Sample \n", "2 GRP0_SBJ01_A0E4_SMP-Blood-Sample-2 Blood Sample \n", "3 GRP0_SBJ01_A0E1_SMP-Blood-Sample-1 Blood Sample \n", "4 GRP0_SBJ01_A0E4_SMP-Blood-Sample-3 Blood Sample \n", ".. ... ... \n", "275 GRP3_SBJ10_A3E3_SMP-Blood-Sample-1 Blood Sample \n", "276 GRP3_SBJ10_A3E4_SMP-Blood-Sample-3 Blood Sample \n", "277 GRP3_SBJ10_A3E4_SMP-Blood-Sample-1 Blood Sample \n", "278 GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 Saliva Sample \n", "279 GRP3_SBJ10_A3E1_SMP-Saliva-Sample-1 Saliva Sample \n", "\n", " Term Accession Number.1 \\\n", "0 http://purl.obolibrary.org/obo/NCIT_C17610 \n", "1 http://purl.obolibrary.org/obo/NCIT_C174119 \n", "2 http://purl.obolibrary.org/obo/NCIT_C17610 \n", "3 http://purl.obolibrary.org/obo/NCIT_C17610 \n", "4 http://purl.obolibrary.org/obo/NCIT_C17610 \n", ".. ... \n", "275 http://purl.obolibrary.org/obo/NCIT_C17610 \n", "276 http://purl.obolibrary.org/obo/NCIT_C17610 \n", "277 http://purl.obolibrary.org/obo/NCIT_C17610 \n", "278 http://purl.obolibrary.org/obo/NCIT_C174119 \n", "279 http://purl.obolibrary.org/obo/NCIT_C174119 \n", "\n", " Comment[study step with treatment] Factor Value[Sequence Order] \\\n", "0 YES 3 \n", "1 YES 1 \n", "2 NO 4 \n", "3 YES 1 \n", "4 NO 4 \n", ".. ... ... \n", "275 YES 3 \n", "276 NO 4 \n", "277 NO 4 \n", "278 YES 3 \n", "279 YES 1 \n", "\n", " Factor Value[DURATION] Unit Factor Value[AGENT] Factor Value[INTENSITY] \\\n", "0 14 days placebo 20.0 \n", "1 14 days hypertena 20.0 \n", "2 180 days \n", "3 14 days hypertena 20.0 \n", "4 180 days \n", ".. ... ... ... ... 
\n", "275 14 days hypertena 20.0 \n", "276 180 days \n", "277 180 days \n", "278 14 days hypertena 20.0 \n", "279 14 days placebo 20.0 \n", "\n", " Unit.1 \n", "0 mg/day \n", "1 mg/day \n", "2 \n", "3 mg/day \n", "4 \n", ".. ... \n", "275 mg/day \n", "276 \n", "277 \n", "278 mg/day \n", "279 mg/day \n", "\n", "[280 rows x 19 columns]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "study_frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7.3 First Assay: DNA Methylation Profiling using nucleic acid sequencing\n", "\n", "This assay takes saliva samples as input." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Sample Name Comment[study step with treatment] \\\n", "0 GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 YES \n", "1 GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 YES \n", "2 GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 YES \n", "3 GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 YES \n", "4 GRP0_SBJ01_A0E1_SMP-Saliva-Sample-1 YES \n", "... ... ... \n", "1275 GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 YES \n", "1276 GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 YES \n", "1277 GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 YES \n", "1278 GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 YES \n", "1279 GRP3_SBJ10_A3E3_SMP-Saliva-Sample-1 YES \n", "\n", " Protocol REF Parameter Value[cross linking] \\\n", "0 extraction di-tert-butyl peroxide \n", "1 extraction uv-light \n", "2 extraction di-tert-butyl peroxide \n", "3 extraction uv-light \n", "4 extraction di-tert-butyl peroxide \n", "... ... ... \n", "1275 extraction di-tert-butyl peroxide \n", "1276 extraction di-tert-butyl peroxide \n", "1277 extraction di-tert-butyl peroxide \n", "1278 extraction di-tert-butyl peroxide \n", "1279 extraction uv-light \n", "\n", " Parameter Value[DNA fragmentation] Parameter Value[DNA fragment size] \\\n", "0 nebulization a \n", "1 nebulization a \n", "2 nebulization a \n", "3 nebulization a \n", "4 nebulization a \n", "... ... ... \n", "1275 nebulization a \n", "1276 nebulization a \n", "1277 nebulization a \n", "1278 nebulization a \n", "1279 nebulization a \n", "\n", " Parameter Value[immunoprecipitation antibody] Performer \\\n", "0 d Unknown \n", "1 d Unknown \n", "2 d Unknown \n", "3 d Unknown \n", "4 d Unknown \n", "... ... ... 
\n", "1275 d Unknown \n", "1276 d Unknown \n", "1277 d Unknown \n", "1278 d Unknown \n", "1279 d Unknown \n", "\n", " Extract Name Characteristics[extract type] Protocol REF.1 \\\n", "0 AT5-S81-Extract-R1 gDNA library_preparation \n", "1 AT5-S1-Extract-R1 DNA library_preparation \n", "2 AT5-S81-Extract-R1 gDNA library_preparation \n", "3 AT5-S1-Extract-R1 DNA library_preparation \n", "4 AT5-S81-Extract-R1 gDNA library_preparation \n", "... ... ... ... \n", "1275 AT5-S152-Extract-R1 gDNA library_preparation \n", "1276 AT5-S152-Extract-R2 DNA library_preparation \n", "1277 AT5-S152-Extract-R2 DNA library_preparation \n", "1278 AT5-S152-Extract-R1 gDNA library_preparation \n", "1279 AT5-S72-Extract-R2 gDNA library_preparation \n", "\n", " Parameter Value[instrument] Parameter Value[library_orientation] \\\n", "0 GridION single \n", "1 GridION paired \n", "2 GridION paired \n", "3 GridION paired \n", "4 GridION paired \n", "... ... ... \n", "1275 GridION single \n", "1276 GridION single \n", "1277 GridION single \n", "1278 GridION paired \n", "1279 GridION single \n", "\n", " Parameter Value[library_strategy] Parameter Value[library_selection] \\\n", "0 MBD-Seq MF \n", "1 MBD-Seq MF \n", "2 MBD-Seq MF \n", "3 MBD-Seq MF \n", "4 MBD-Seq MF \n", "... ... ... \n", "1275 MBD-Seq MF \n", "1276 MBD-Seq MF \n", "1277 MBD-Seq MF \n", "1278 MBD-Seq MF \n", "1279 MBD-Seq MF \n", "\n", " Parameter Value[multiplex identifier] Performer.1 \\\n", "0 f Unknown \n", "1 f Unknown \n", "2 f Unknown \n", "3 f Unknown \n", "4 f Unknown \n", "... ... ... \n", "1275 f Unknown \n", "1276 f Unknown \n", "1277 f Unknown \n", "1278 f Unknown \n", "1279 f Unknown \n", "\n", " Raw Data File \n", "0 AT5-S81-raw_data_file-R2.raw \n", "1 AT5-S1-raw_data_file-R1.raw \n", "2 AT5-S81-raw_data_file-R3.raw \n", "3 AT5-S1-raw_data_file-R2.raw \n", "4 AT5-S81-raw_data_file-R4.raw \n", "... ... 
\n", "1275 AT5-S152-raw_data_file-R1.raw \n", "1276 AT5-S152-raw_data_file-R5.raw \n", "1277 AT5-S152-raw_data_file-R6.raw \n", "1278 AT5-S152-raw_data_file-R3.raw \n", "1279 AT5-S72-raw_data_file-R6.raw \n", "\n", "[1280 rows x 18 columns]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataframes['a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 7.3.1 Nucleic Acid Sequencing Stats\n", "\n", "For this assay we have 80 saliva samples (2 per subject across 40 subjects). From these, 320 DNA extracts are obtained (4 per sample on average, with extract type \"DNA\" or \"gDNA\"). Each extract then undergoes library preparation (with GridION listed as the instrument), for a total of 1280 raw data files (4 per extract on average)." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sample Name 80\n", "Comment[study step with treatment] 1\n", "Protocol REF 1\n", "Parameter Value[cross linking] 2\n", "Parameter Value[DNA fragmentation] 1\n", "Parameter Value[DNA fragment size] 1\n", "Parameter Value[immunoprecipitation antibody] 1\n", "Performer 1\n", "Extract Name 320\n", "Characteristics[extract type] 2\n", "Protocol REF.1 1\n", "Parameter Value[instrument] 1\n", "Parameter Value[library_orientation] 2\n", "Parameter Value[library_strategy] 1\n", "Parameter Value[library_selection] 1\n", "Parameter Value[multiplex identifier] 1\n", "Performer.1 1\n", "Raw Data File 1280\n", "dtype: int64" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataframes['a_AT5_DNA-methylation-profiling_nucleic-acid-sequencing.txt'].nunique(axis=0, dropna=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 7.4 Second Assay: Clinical Chemistry Marker Panel\n", "\n", "This assay takes blood samples as input." ] }, { 
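"cell_type": "markdown", "metadata": {}, "source": [ "Before displaying the table, we can run a quick consistency check: every sample listed in this assay should be one of the blood samples recorded in the study table. The cell below is a minimal sketch, not part of the ISA-API: it reuses the `dataframes` and `study_frame` variables defined above and assumes that each sample's organism part is stored in the `Characteristics[organism part]` column of the study table (as shown in section 7.2). The variable names `marker_frame`, `is_blood`, `blood_samples` and `assay_samples` are illustrative. If the design was generated consistently, we would expect the two counts to match and the final check to print `True`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Consistency check (sketch): compare the marker panel assay inputs with the blood samples in the study table\n", "marker_frame = dataframes['a_AT11_clinical-chemistry_marker-panel.txt']\n", "\n", "# Names of all samples whose organism part is 'Blood Sample' in the study table\n", "is_blood = study_frame['Characteristics[organism part]'] == 'Blood Sample'\n", "blood_samples = set(study_frame.loc[is_blood, 'Sample Name'])\n", "\n", "# Distinct sample names used as inputs of the marker panel assay\n", "assay_samples = set(marker_frame['Sample Name'])\n", "\n", "print('Blood samples in the study table: {}'.format(len(blood_samples)))\n", "print('Distinct input samples in the marker panel assay: {}'.format(len(assay_samples)))\n", "# For Python sets, a <= b is True when every element of a is also in b\n", "print('All assay inputs are blood samples: {}'.format(assay_samples <= blood_samples))" ] }, { 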
"cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Sample Name Comment[study step with treatment] \\\n", "0 GRP0_SBJ01_A0E1_SMP-Blood-Sample-1 YES \n", "1 GRP0_SBJ01_A0E3_SMP-Blood-Sample-1 YES \n", "2 GRP0_SBJ01_A0E4_SMP-Blood-Sample-1 NO \n", "3 GRP0_SBJ01_A0E4_SMP-Blood-Sample-2 NO \n", "4 GRP0_SBJ01_A0E4_SMP-Blood-Sample-3 NO \n", ".. ... ... \n", "195 GRP3_SBJ10_A3E1_SMP-Blood-Sample-1 YES \n", "196 GRP3_SBJ10_A3E3_SMP-Blood-Sample-1 YES \n", "197 GRP3_SBJ10_A3E4_SMP-Blood-Sample-1 NO \n", "198 GRP3_SBJ10_A3E4_SMP-Blood-Sample-2 NO \n", "199 GRP3_SBJ10_A3E4_SMP-Blood-Sample-3 NO \n", "\n", " Protocol REF Performer Raw Data File \n", "0 sample preparation Unknown AT11-S1-raw_data_file-R1 \n", "1 sample preparation Unknown AT11-S11-raw_data_file-R1 \n", "2 sample preparation Unknown AT11-S21-raw_data_file-R1 \n", "3 sample preparation Unknown AT11-S22-raw_data_file-R1 \n", "4 sample preparation Unknown AT11-S23-raw_data_file-R1 \n", ".. ... ... ... \n", "195 sample preparation Unknown AT11-S152-raw_data_file-R1 \n", "196 sample preparation Unknown AT11-S162-raw_data_file-R1 \n", "197 sample preparation Unknown AT11-S174-raw_data_file-R1 \n", "198 sample preparation Unknown AT11-S175-raw_data_file-R1 \n", "199 sample preparation Unknown AT11-S176-raw_data_file-R1 \n", "\n", "[200 rows x 5 columns]" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataframes['a_AT11_clinical-chemistry_marker-panel.txt']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 7.4.1 Marker Panel Stats\n", "\n", "For this assay we use 200 blood samples (5 per subject across 40 subjects). Each sample undergoes one sample preparation process, producing a total of 200 sample preparation processes and 200 raw data files (one per sample)." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sample Name 200\n", "Comment[study step with treatment] 2\n", "Protocol REF 1\n", "Performer 1\n", "Raw Data File 200\n", "dtype: int64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataframes['a_AT11_clinical-chemistry_marker-panel.txt'].nunique(axis=0, dropna=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 4 }