---
name: tooluniverse-multiomic-disease-characterization
description: Comprehensive multi-omics disease characterization integrating genomics, transcriptomics, proteomics, pathway, and therapeutic layers for systems-level understanding. Produces a detailed multi-omics report with quantitative confidence scoring (0-100), cross-layer gene concordance analysis, biomarker candidates, therapeutic opportunities, and mechanistic hypotheses. Uses 80+ ToolUniverse tools across 8 analysis layers. Use when users ask about disease mechanisms, multi-omics analysis, systems biology of disease, biomarker discovery, or therapeutic target identification from a disease perspective.
---

# Multi-Omics Disease Characterization Pipeline

Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.

**KEY PRINCIPLES**:

1. **Report-first approach** - Create report file FIRST, then populate progressively
2. **Disease disambiguation FIRST** - Resolve all identifiers before omics analysis
3. **Layer-by-layer analysis** - Systematically cover all omics layers
4. **Cross-layer integration** - Identify genes/targets appearing in multiple layers
5. **Evidence grading** - Grade all evidence from T1 (direct human/clinical) to T4 (annotation/prediction only)
6. **Tissue context** - Emphasize disease-relevant tissues/organs
7. **Quantitative scoring** - Multi-Omics Confidence Score (0-100)
8. **Druggable focus** - Prioritize targets with therapeutic potential
9. **Biomarker identification** - Highlight diagnostic/prognostic markers
10. **Mechanistic synthesis** - Generate testable hypotheses
11. **Source references** - Every statement must cite tool/database
12. **Completeness checklist** - Mandatory section showing analysis coverage
13. **English-first queries** - Always use English terms in tool calls.
Respond in user's language

---

## When to Use This Skill

Apply when users:
- Ask about disease mechanisms across omics layers
- Need multi-omics characterization of a disease
- Want to understand disease at the systems biology level
- Ask "What pathways/genes/proteins are involved in [disease]?"
- Need biomarker discovery for a disease
- Want to identify druggable targets from disease profiling
- Ask for integrated genomics + transcriptomics + proteomics analysis
- Need cross-layer concordance analysis
- Ask about disease network biology / hub genes

**NOT for** (use other skills instead):
- Single gene/target validation -> Use `tooluniverse-drug-target-validation`
- Drug safety profiling -> Use `tooluniverse-adverse-event-detection`
- General disease overview -> Use `tooluniverse-disease-research`
- Variant interpretation -> Use `tooluniverse-variant-interpretation`
- GWAS-specific analysis -> Use `tooluniverse-gwas-*` skills
- Pathway-only analysis -> Use `tooluniverse-systems-biology`

---

## Input Parameters

| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| **disease** | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | `Alzheimer disease`, `MONDO_0004975` |
| **tissue** | No | Tissue/organ of interest | `brain`, `liver`, `blood` |
| **focus_layers** | No | Specific omics layers to emphasize | `genomics`, `transcriptomics`, `pathways` |

---

## Multi-Omics Confidence Score (0-100)

### Score Components

**Data Availability (0-40 points)**:
- Genomics data available (GWAS or rare variants): 10 points
- Transcriptomics data available (DEGs or expression): 10 points
- Protein data available (PPI or expression): 5 points
- Pathway data available (enriched pathways): 10 points
- Clinical/drug data available (approved drugs or trials): 5 points

**Evidence Concordance (0-40 points)**:
- Multi-layer genes (appear in 3+ layers): up to 20 points (2 per gene, max 10 genes)
- Consistent direction (genetics + expression
concordant): 10 points
- Pathway-gene concordance (genes found in enriched pathways): 10 points

**Evidence Quality (0-20 points)**:
- Strong genetic evidence (GWAS p < 5e-8): 10 points
- Clinical validation (approved drugs): 10 points

### Score Interpretation

| Score | Tier | Interpretation |
|-------|------|----------------|
| **80-100** | Excellent | Comprehensive multi-omics coverage, high confidence, strong cross-layer concordance |
| **60-79** | Good | Good coverage across most layers, some gaps |
| **40-59** | Moderate | Moderate coverage, limited cross-layer integration |
| **0-39** | Limited | Limited data, single-layer analysis dominates |

### Evidence Grading System

| Tier | Symbol | Criteria | Examples |
|------|--------|----------|----------|
| **T1** | [T1] | Direct human evidence, clinical proof | FDA-approved drug, GWAS hit (p<5e-8), clinical trial result |
| **T2** | [T2] | Experimental evidence | Differential expression (validated), functional screen, mouse KO |
| **T3** | [T3] | Computational/database evidence | PPI network, pathway mapping, expression correlation |
| **T4** | [T4] | Annotation/prediction only | GO annotation, text-mined association, predicted interaction |

---

## Report Template

Create this file structure at the start: `{disease_name}_multiomic_report.md`

```markdown
# Multi-Omics Disease Characterization: {Disease Name}

**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
**Multi-Omics Confidence Score**: (to be calculated)

---

## Executive Summary

(2-3 sentence disease mechanism synthesis - fill after all layers complete)

---

## 1. Disease Definition & Context

### Disease Identifiers
| System | ID | Source |
|--------|-----|--------|

### Description

### Synonyms

### Disease Hierarchy (parents/children)

### Affected Tissues/Organs

### Therapeutic Areas

**Sources**: (tools used)

---

## 2. Genomics Layer

### 2.1 GWAS Associations
| SNP | P-value | Effect | Gene | Study | Source |
|-----|---------|--------|------|-------|--------|

### 2.2 GWAS Studies Summary
| Study ID | Trait | Sample Size | Year | Source |
|----------|-------|-------------|------|--------|

### 2.3 Associated Genes (Genetic Evidence)
| Gene | Ensembl ID | Association Score | Evidence Type | Source |
|------|------------|-------------------|---------------|--------|

### 2.4 Rare Variants (ClinVar)
| Variant | Gene | Clinical Significance | Source |
|---------|------|-----------------------|--------|

### Genomics Layer Summary
- Total GWAS hits:
- Top genes by genetic evidence:
- Genetic architecture:

**Sources**: (tools used)

---

## 3. Transcriptomics Layer

### 3.1 Differential Expression Studies
| Experiment | Condition | Up-regulated | Down-regulated | Source |
|------------|-----------|--------------|----------------|--------|

### 3.2 Expression Atlas Disease Evidence
| Gene | Score | Source |
|------|-------|--------|

### 3.3 Tissue Expression Patterns (GTEx/HPA)
| Gene | Tissue | Expression Level | Source |
|------|--------|-----------------|--------|

### 3.4 Biomarker Candidates (Expression-Based)
| Gene | Tissue Specificity | Fold Change | Evidence | Source |
|------|-------------------|-------------|----------|--------|

### Transcriptomics Layer Summary
- Differential expression datasets:
- Top DEGs:
- Tissue-specific patterns:

**Sources**: (tools used)

---

## 4. Proteomics & Interaction Layer

### 4.1 Protein-Protein Interactions (STRING)
| Protein A | Protein B | Score | Source |
|-----------|-----------|-------|--------|

### 4.2 Hub Genes (Network Centrality)
| Gene | Degree | Betweenness | Role | Source |
|------|--------|-------------|------|--------|

### 4.3 Protein Complexes (IntAct)
| Complex | Members | Function | Source |
|---------|---------|----------|--------|

### 4.4 Tissue-Specific PPI Network
| Gene | Interaction Score | Tissue | Source |
|------|-------------------|--------|--------|

### Proteomics Layer Summary
- Total PPIs:
- Hub genes:
- Network modules:

**Sources**: (tools used)

---

## 5. Pathway & Network Layer

### 5.1 Enriched Pathways (Enrichr/Reactome)
| Pathway | Database | P-value | Genes | Source |
|---------|----------|---------|-------|--------|

### 5.2 Reactome Pathway Details
| Pathway ID | Name | Genes Involved | Source |
|------------|------|----------------|--------|

### 5.3 KEGG Pathways
| Pathway ID | Name | Description | Source |
|------------|------|-------------|--------|

### 5.4 WikiPathways
| Pathway ID | Name | Organism | Source |
|------------|------|----------|--------|

### Pathway Layer Summary
- Top enriched pathways:
- Key pathway nodes:
- Cross-pathway connections:

**Sources**: (tools used)

---

## 6. Gene Ontology & Functional Annotation

### 6.1 Biological Processes
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|

### 6.2 Molecular Functions
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|

### 6.3 Cellular Components
| GO Term | Name | P-value | Genes | Source |
|---------|------|---------|-------|--------|

**Sources**: (tools used)

---

## 7. Therapeutic Landscape

### 7.1 Approved Drugs
| Drug | ChEMBL ID | Mechanism | Target | Phase | Source |
|------|-----------|-----------|--------|-------|--------|

### 7.2 Druggable Targets
| Gene | Tractability | Modality | Clinical Precedent | Source |
|------|-------------|----------|-------------------|--------|

### 7.3 Drug Repurposing Candidates
| Drug | Original Indication | Mechanism | Target | Source |
|------|---------------------|-----------|--------|--------|

### 7.4 Clinical Trials
| NCT ID | Title | Phase | Status | Intervention | Source |
|--------|-------|-------|--------|--------------|--------|

### Therapeutic Summary
- Approved drugs:
- Clinical pipeline:
- Novel targets:

**Sources**: (tools used)

---

## 8. Multi-Omics Integration

### 8.1 Cross-Layer Gene Concordance
| Gene | Genomics | Transcriptomics | Proteomics | Pathways | Layers | Evidence Tier |
|------|----------|-----------------|------------|----------|--------|---------------|

### 8.2 Multi-Omics Hub Genes (Top 20)
| Rank | Gene | Layers Found | Key Evidence | Druggable | Source |
|------|------|-------------|--------------|-----------|--------|

### 8.3 Biomarker Candidates
| Biomarker | Type | Evidence Layers | Confidence | Source |
|-----------|------|-----------------|------------|--------|

### 8.4 Mechanistic Hypotheses
1. (Hypothesis with supporting evidence from multiple layers)
2. ...

### 8.5 Systems-Level Insights
- Key disrupted processes:
- Critical pathway nodes:
- Therapeutic intervention points:
- Testable hypotheses:

---

## Multi-Omics Confidence Score

| Component | Points | Max | Details |
|-----------|--------|-----|---------|
| Genomics data | | 10 | |
| Transcriptomics data | | 10 | |
| Protein data | | 5 | |
| Pathway data | | 10 | |
| Clinical data | | 5 | |
| Multi-layer genes | | 20 | |
| Direction concordance | | 10 | |
| Pathway-gene concordance | | 10 | |
| Genetic evidence quality | | 10 | |
| Clinical validation | | 10 | |
| **TOTAL** | | **100** | |

**Score**: XX/100 - [Tier]

---

## Data Availability Checklist

| Omics Layer | Data Available | Tools Used | Findings |
|-------------|---------------|------------|----------|
| Genomics (GWAS) | Yes/No | | |
| Genomics (Rare Variants) | Yes/No | | |
| Transcriptomics (DEGs) | Yes/No | | |
| Transcriptomics (Expression) | Yes/No | | |
| Proteomics (PPI) | Yes/No | | |
| Proteomics (Expression) | Yes/No | | |
| Pathways (Enrichment) | Yes/No | | |
| Pathways (KEGG/Reactome) | Yes/No | | |
| Gene Ontology | Yes/No | | |
| Drugs/Therapeutics | Yes/No | | |
| Clinical Trials | Yes/No | | |
| Literature | Yes/No | | |

---

## Completeness Checklist

- [ ] Disease disambiguation complete (IDs resolved)
- [ ] Genomics layer analyzed (GWAS + variants)
- [ ] Transcriptomics layer analyzed (DEGs + expression)
- [ ] Proteomics layer analyzed (PPI + interactions)
- [ ] Pathway layer analyzed (enrichment + mapping)
- [ ] Gene Ontology analyzed (BP + MF + CC)
- [ ] Therapeutic landscape analyzed (drugs + targets + trials)
- [ ] Cross-layer integration complete (concordance analysis)
- [ ] Multi-Omics Confidence Score calculated
- [ ] Biomarker candidates identified
- [ ] Hub genes identified
- [ ] Mechanistic hypotheses generated
- [ ] Executive summary written
- [ ] All sections have source citations

---

## References

### Data Sources Used
| # | Tool | Parameters | Section | Items Retrieved |
|---|------|------------|---------|-----------------|

### Database Versions
- OpenTargets: (current)
- GWAS Catalog: (current)
- STRING: (current)
- Reactome: (current)
```

---

## Phase 0: Disease Disambiguation (ALWAYS FIRST)

**Objective**: Resolve disease to standard identifiers for all downstream queries.

### Tools Used

**OpenTargets_get_disease_id_description_by_name** (primary):
- **Input**: `diseaseName` (string) - Disease name
- **Output**: `{data: {search: {hits: [{id, name, description}]}}}`
- **Use**: Get MONDO/EFO IDs and description
- **CRITICAL**: Disease IDs from OpenTargets use underscore format (e.g., `MONDO_0004975`), NOT colon format

**OSL_get_efo_id_by_disease_name** (secondary):
- **Input**: `disease` (string) - Disease name
- **Output**: `{efo_id, name}`
- **Use**: Get EFO/MONDO ID

**OpenTargets_get_disease_description_by_efoId**:
- **Input**: `efoId` (string) - Disease ID (e.g., `MONDO_0004975`)
- **Output**: `{data: {disease: {id, name, description, dbXRefs}}}`
- **Use**: Get full description, cross-references (OMIM, UMLS, DOID, etc.)

**OpenTargets_get_disease_synonyms_by_efoId**:
- **Input**: `efoId` (string)
- **Output**: `{data: {disease: {id, name, synonyms: [{relation, terms}]}}}`

**OpenTargets_get_disease_therapeutic_areas_by_efoId**:
- **Input**: `efoId` (string)
- **Output**: `{data: {disease: {id, name, therapeuticAreas: [{id, name}]}}}`

**OpenTargets_get_disease_ancestors_parents_by_efoId**:
- **Input**: `efoId` (string)
- **Output**: `{data: {disease: {id, name, ancestors: [{id, name}]}}}`

**OpenTargets_get_disease_descendants_children_by_efoId**:
- **Input**: `efoId` (string)
- **Output**: `{data: {disease: {id, name, descendants: [{id, name}]}}}`

**OpenTargets_map_any_disease_id_to_all_other_ids**:
- **Input**: `inputId` (string) - Any known disease ID (e.g., `OMIM:104300`, `UMLS:C0002395`)
- **Output**: `{data: {disease: {id, name, dbXRefs: [str], ...}}}`
- **Use**: Cross-map between OMIM, UMLS, ICD10, DOID, etc.

### Workflow

1. Search by disease name to get primary ID (OpenTargets)
2. Get full description and cross-references
3. Get synonyms for search term expansion
4. Get therapeutic areas for context
5. Get disease hierarchy (parents/children)
6. If user provided OMIM/other ID, map to MONDO/EFO first

### Collision-Aware Search

When disease name returns multiple hits:
- Check if user's input matches any hit exactly
- If ambiguous, present top 3-5 options and ask user to select
- Always prefer the most specific disease (not parent categories)
- For cancer, prefer the specific tumor type over generic "cancer"

### Key Disease IDs to Track

After disambiguation, store these for all downstream queries:
- `efo_id` - Primary ID for OpenTargets queries (e.g., `MONDO_0004975`)
- `disease_name` - Canonical name (e.g., `Alzheimer disease`)
- `synonyms` - For literature search expansion
- `therapeutic_areas` - For context
- `dbXRefs` - Cross-references (OMIM, UMLS, DOID, etc.)

---

## Phase 1: Genomics Layer

**Objective**: Identify genetic variants, GWAS associations, and genetically implicated genes.

### Tools Used

**OpenTargets_get_associated_targets_by_disease_efoId** (primary):
- **Input**: `efoId` (string) - Disease EFO/MONDO ID
- **Output**: `{data: {disease: {id, name, associatedTargets: {count, rows: [{target: {id, approvedSymbol}, score}]}}}}`
- **Use**: Get ALL disease-associated genes ranked by overall evidence score
- **NOTE**: Returns top 25 by default. For comprehensive analysis, note the total `count`

**OpenTargets_get_evidence_by_datasource**:
- **Input**: `efoId` (string), `ensemblId` (string), optional `datasourceIds` (array), `size` (int, default 50)
- **Output**: `{data: {disease: {evidences: {count, rows: [{...evidence details}]}}}}`
- **Use**: Get specific evidence types.
  Key datasourceIds for genomics:
  - `['ot_genetics_portal']` - GWAS/genetics
  - `['gene2phenotype', 'genomics_england', 'orphanet']` - Rare variants
  - `['eva']` - ClinVar variants

**gwas_search_associations** (GWAS Catalog):
- **Input**: `disease_trait` (string), `size` (int, default 20)
- **Output**: `{data: [{association_id, p_value, or_per_copy_num, or_value, beta, risk_frequency, efo_traits: [{...}], ...}], metadata: {pagination: {totalElements}}}`
- **Use**: Get genome-wide significant associations
- **NOTE**: Use disease name (e.g., "Alzheimer"), not ID. Returns paginated results

**gwas_get_studies_for_trait**:
- **Input**: `disease_trait` (string), `size` (int)
- **Output**: `{data: [...studies], metadata: {pagination}}`
- **NOTE**: May return empty if trait name does not match exactly. Try synonyms

**gwas_get_variants_for_trait**:
- **Input**: `disease_trait` (string), `size` (int)
- **Output**: `{data: [...variants], metadata: {pagination}}`

**GWAS_search_associations_by_gene**:
- **Input**: `gene_name` (string)
- **Output**: Associations for a specific gene

**OpenTargets_search_gwas_studies_by_disease**:
- **Input**: `diseaseIds` (array of strings), `enableIndirect` (bool, default true), `size` (int, default 10)
- **Output**: `{data: {studies: {count, rows: [{id, studyType, traitFromSource, publicationFirstAuthor, publicationDate, pubmedId, nSamples, nCases, nControls, ...}]}}}`
- **Use**: Get GWAS studies from OpenTargets genetics portal

**clinvar_search_variants**:
- **Input**: `condition` (string) or `gene` (string), optional `max_results` (int)
- **Output**: List of ClinVar variants with clinical significance
- **Use**: Rare variant / monogenic disease evidence

### Workflow

1. Get associated genes from OpenTargets (overall scores)
2. For top 10-15 genes, get genetic evidence specifically via `OpenTargets_get_evidence_by_datasource`
3. Search GWAS Catalog for associations
4. Search OpenTargets GWAS studies
5. Search ClinVar for rare variants
6. For top GWAS genes, check `GWAS_search_associations_by_gene`

### Gene Tracking

Maintain a dictionary of genes found in genomics layer:
```python
genomics_genes = {
    'PSEN1': {'score': 0.87, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000080815', 'layer': 'genomics'},
    'APP': {'score': 0.82, 'evidence': 'genetic', 'ensembl_id': 'ENSG00000142192', 'layer': 'genomics'},
    # ...
}
```

---

## Phase 2: Transcriptomics Layer

**Objective**: Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.

### Tools Used

**ExpressionAtlas_search_differential**:
- **Input**: optional `gene` (string), `condition` (string), `species` (string, default 'homo sapiens')
- **Output**: Differential expression studies and results
- **Use**: Find studies where genes are differentially expressed in disease

**ExpressionAtlas_search_experiments**:
- **Input**: optional `gene` (string), `condition` (string), `species` (string)
- **Output**: Expression experiments relevant to condition
- **Use**: Find all Expression Atlas experiments for the disease

**expression_atlas_disease_target_score**:
- **Input**: `efoId` (string), `pageSize` (int, required)
- **Output**: Genes scored by expression evidence for the disease
- **Use**: Get expression-based disease-gene association scores

**europepmc_disease_target_score**:
- **Input**: `efoId` (string), `pageSize` (int, required)
- **Output**: Genes scored by literature evidence for the disease
- **Use**: Complement expression evidence with literature-mined associations

**HPA_get_rna_expression_by_source** (Human Protein Atlas):
- **Input**: `gene_name` (string), `source_type` (string: 'tissue', 'blood', 'brain'), `source_name` (string: e.g., 'brain', 'liver')
- **Output**: `{status, data: {gene_name, source_type, source_name, expression_value, expression_level, expression_unit}}`
- **NOTE**: ALL 3 params required.
  `source_type` options: 'tissue', 'blood', 'brain', 'cell_line', 'single_cell'

**HPA_get_rna_expression_in_specific_tissues**:
- **Input**: `gene_name` (string), `tissues` (array of strings)
- **Output**: Expression across specified tissues

**HPA_get_cancer_prognostics_by_gene**:
- **Input**: `gene_name` (string)
- **Output**: Cancer prognostic data (if cancer context)

**HPA_get_subcellular_location**:
- **Input**: `gene_name` (string)
- **Output**: Subcellular localization data

**HPA_search_genes_by_query**:
- **Input**: `query` (string)
- **Output**: Matching genes in HPA

### Workflow

1. Search Expression Atlas for differential expression studies
2. Get expression-based disease scores
3. Get literature-based disease scores (EuropePMC)
4. For top 10-15 genes from genomics layer, check tissue expression via HPA
5. Check disease-relevant tissue expression patterns
6. For cancer: check prognostic biomarkers

### Gene Tracking

Add transcriptomics genes to tracking:
```python
transcriptomics_genes = {
    'APOE': {'expression_score': 0.75, 'tissues': ['brain'], 'evidence': 'differential_expression', 'layer': 'transcriptomics'},
    # ...
}
```

---

## Phase 3: Proteomics & Interaction Layer

**Objective**: Map protein-protein interactions, identify hub genes, and characterize interaction networks.

### Tools Used

**STRING_get_interaction_partners** (primary PPI):
- **Input**: `protein_ids` (array of strings - gene names work), `species` (int, default 9606), `confidence_score` (float, default 0.4), `limit` (int, default 20)
- **Output**: `{status: 'success', data: [{stringId_A, stringId_B, preferredName_A, preferredName_B, ncbiTaxonId, score, nscore, fscore, pscore, ascore, escore, dscore, tscore}]}`
- **Use**: Get interaction partners for disease genes
- **NOTE**: `protein_ids` is an array, NOT string.
  Gene symbols like `['APOE']` work

**STRING_get_network**:
- **Input**: `protein_ids` (array), `species` (int), `confidence_score` (float)
- **Output**: Network of interactions between input proteins
- **Use**: Build disease-specific PPI network

**STRING_functional_enrichment**:
- **Input**: `protein_ids` (array), `species` (int)
- **Output**: Functional enrichment results (GO, KEGG, etc.)
- **Use**: Functional characterization of disease gene set

**STRING_ppi_enrichment**:
- **Input**: `protein_ids` (array), `species` (int)
- **Output**: Statistical test for PPI enrichment (more interactions than expected)
- **Use**: Test if disease genes form a connected module

**intact_get_interactions**:
- **Input**: `identifier` (string - UniProt ID or gene name)
- **Output**: Molecular interaction data from IntAct

**intact_search_interactions**:
- **Input**: `query` (string), `first` (int, default 0), `max` (int, default 25)
- **Output**: Search results for interactions

**HPA_get_protein_interactions_by_gene**:
- **Input**: `gene_name` (string)
- **Output**: `{gene, interactions, interactor_count, interactors: [...]}`

**humanbase_ppi_analysis**:
- **Input**: `gene_list` (array), `tissue` (string), `max_node` (int), `interaction` (string), `string_mode` (bool)
- **Output**: Tissue-specific PPI network
- **NOTE**: ALL params required. `interaction` options: 'coexpression', 'interaction', 'coexpression_and_interaction'. `string_mode`: true/false

### Workflow

1. Take top 15-20 genes from genomics + transcriptomics layers
2. Query STRING for interaction partners of each gene
3. Build composite PPI network using STRING_get_network
4. Test PPI enrichment (are genes more connected than random?)
5. Get functional enrichment from STRING
6. For disease-relevant tissue, get tissue-specific network (HumanBase)
7. Identify hub genes (highest degree centrality)
8. Check IntAct for experimentally validated interactions

### Hub Gene Analysis

Calculate network centrality metrics:
- **Degree**: Number of interaction partners
- **Betweenness**: Number of shortest paths through node
- **Hub score**: Genes with degree > mean + 1 SD are hubs

---

## Phase 4: Pathway & Network Layer

**Objective**: Identify enriched biological pathways and cross-pathway connections.

### Tools Used

**enrichr_gene_enrichment_analysis** (primary enrichment):
- **Input**: `gene_list` (array of gene symbols, min 2), `libs` (array of library names)
- **Output**: `{status: 'success', data: '{...JSON string with enrichment results...}'}`
- **Key libraries**: `['KEGG_2021_Human']`, `['Reactome_2022']`, `['WikiPathway_2023_Human']`, `['GO_Biological_Process_2023']`, `['GO_Molecular_Function_2023']`, `['GO_Cellular_Component_2023']`
- **NOTE**: `data` field is a JSON string, needs parsing. Contains `connected_paths` and per-library results
- **NOTE**: `libs` is REQUIRED as array

**ReactomeAnalysis_pathway_enrichment**:
- **Input**: `identifiers` (string - space-separated gene list), optional `page_size` (int, default 20), `include_disease` (bool), `projection` (bool)
- **Output**: `{data: {token, analysis_type, pathways_found, pathways: [{pathway_id, name, species, is_disease, is_lowest_level, entities_found, entities_total, entities_ratio, p_value, fdr, reactions_found, reactions_total}]}}`
- **Use**: Reactome-specific pathway enrichment with statistical testing

**Reactome_map_uniprot_to_pathways**:
- **Input**: `id` (string - UniProt accession)
- **Output**: List of Reactome pathways containing this protein
- **Use**: Map individual proteins to pathways

**Reactome_get_pathway**:
- **Input**: `stId` (string - Reactome stable ID, e.g., 'R-HSA-73817')
- **Output**: Pathway details

**Reactome_get_pathway_reactions**:
- **Input**: `stId` (string)
- **Output**: Reactions within pathway

**kegg_search_pathway**:
- **Input**: `keyword` (string)
- **Output**: Array of
  KEGG pathway matches

**kegg_get_pathway_info**:
- **Input**: `pathway_id` (string, e.g., 'hsa04930')
- **Output**: Detailed pathway information

**WikiPathways_search**:
- **Input**: `query` (string), optional `organism` (string, e.g., 'Homo sapiens')
- **Output**: Matching community-curated pathways

### Workflow

1. Collect all genes from genomics + transcriptomics layers (top 20-30)
2. Run Enrichr enrichment for KEGG, Reactome, WikiPathways
3. Run ReactomeAnalysis for more detailed Reactome enrichment with p-values
4. Search KEGG for disease-specific pathways
5. Search WikiPathways for disease pathways
6. For top Reactome pathways, get detailed reactions
7. Identify cross-pathway connections (genes in multiple pathways)

---

## Phase 5: Gene Ontology & Functional Annotation

**Objective**: Characterize biological processes, molecular functions, and cellular components.

### Tools Used

**enrichr_gene_enrichment_analysis** (GO enrichment):
- Use with `libs=['GO_Biological_Process_2023']` for BP
- Use with `libs=['GO_Molecular_Function_2023']` for MF
- Use with `libs=['GO_Cellular_Component_2023']` for CC

**GO_get_annotations_for_gene**:
- **Input**: `gene_id` (string - gene symbol or UniProt ID)
- **Output**: List of GO annotations with terms, aspects, evidence codes

**GO_search_terms**:
- **Input**: `query` (string)
- **Output**: Matching GO terms

**QuickGO_annotations_by_gene**:
- **Input**: `gene_product_id` (string - UniProt accession, e.g., 'UniProtKB:P02649'), optional `aspect` (string: 'biological_process', 'molecular_function', 'cellular_component'), `taxon_id` (int: 9606), `limit` (int: 25)
- **Output**: GO annotations with evidence codes

**OpenTargets_get_target_gene_ontology_by_ensemblID**:
- **Input**: `ensemblId` (string)
- **Output**: GO terms associated with target

### Workflow

1. Run Enrichr GO enrichment for all 3 aspects using combined gene list
2. For top 5 genes, get detailed GO annotations from QuickGO
3. For top genes, get OpenTargets GO terms
4. Summarize key biological processes, molecular functions, cellular components

---

## Phase 6: Therapeutic Landscape

**Objective**: Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.

### Tools Used

**OpenTargets_get_associated_drugs_by_disease_efoId** (primary):
- **Input**: `efoId` (string), `size` (int, REQUIRED - use 100)
- **Output**: `{data: {disease: {knownDrugs: {count, rows: [{drug: {id, name, tradeNames, maximumClinicalTrialPhase, isApproved, hasBeenWithdrawn}, phase, mechanismOfAction, target: {id, approvedSymbol}, disease: {id, name}, urls: [{url, name}]}]}}}}`
- **Use**: All drugs associated with disease (approved + investigational)

**OpenTargets_get_target_tractability_by_ensemblID**:
- **Input**: `ensemblId` (string)
- **Output**: Tractability assessment (small molecule, antibody, PROTAC, etc.)

**OpenTargets_get_associated_drugs_by_target_ensemblID**:
- **Input**: `ensemblId` (string), `size` (int, REQUIRED)
- **Output**: Drugs targeting this gene/protein

**search_clinical_trials**:
- **Input**: `query_term` (string, REQUIRED), optional `condition` (string), `intervention` (string), `pageSize` (int, default 10)
- **Output**: Clinical trial results
- **NOTE**: `query_term` is REQUIRED even if `condition` is provided

**OpenTargets_get_drug_mechanisms_of_action_by_chemblId**:
- **Input**: `chemblId` (string)
- **Output**: Mechanism of action details

### Workflow

1. Get all drugs for disease from OpenTargets
2. For top disease-associated genes, check tractability
3. For top genes with no approved drugs, identify repurposing candidates
4. Search clinical trials for disease
5. For top approved drugs, get mechanism of action

### Drug Tracking

```python
drug_targets = {
    'PSEN1': {'drugs': ['Semagacestat'], 'tractability': 'small_molecule', 'clinical_phase': 3},
    'ACHE': {'drugs': ['Donepezil', 'Galantamine'], 'tractability': 'small_molecule', 'clinical_phase': 4},
    # ...
}
```

---

## Phase 7: Multi-Omics Integration

**Objective**: Integrate findings across all layers to identify cross-layer genes, calculate concordance, and generate mechanistic hypotheses.

### Cross-Layer Gene Concordance Analysis

This is the core integrative step. For each gene found in the analysis:

1. **Count layers**: In how many omics layers does this gene appear?
   - Genomics (GWAS, rare variants, genetic association)
   - Transcriptomics (DEGs, expression score)
   - Proteomics (PPI hub, protein expression)
   - Pathways (enriched pathway member)
   - Therapeutics (drug target)
2. **Score genes**: Genes appearing in 3+ layers are "multi-omics hub genes"
3. **Direction concordance**: Do genetics and expression agree?
   - Risk allele + upregulated = concordant gain-of-function
   - Risk allele + downregulated = concordant loss-of-function
   - Discordant = needs investigation

### Biomarker Identification

For each multi-omics hub gene, assess biomarker potential:
- **Diagnostic**: Gene expression distinguishes disease vs healthy
- **Prognostic**: Expression/variant predicts outcome (cancer prognostics from HPA)
- **Predictive**: Variant/expression predicts treatment response (pharmacogenomics)
- **Evidence level**: Number of supporting omics layers

### Mechanistic Hypothesis Generation

From the integrated data:

1. Identify the most supported biological processes (GO + pathways)
2. Map causal chain: genetic variant -> gene expression -> protein function -> pathway disruption -> disease
3. Identify intervention points (druggable nodes in the causal chain)
4. Generate testable hypotheses

### Confidence Score Calculation

Calculate the Multi-Omics Confidence Score (0-100) based on:
- Data availability across layers
- Cross-layer concordance
- Evidence quality
- Clinical validation

---

## Phase 8: Report Finalization

### Executive Summary

Write a 2-3 sentence synthesis covering:
- Disease mechanism in systems terms
- Key genes/pathways identified
- Therapeutic opportunities

### Final Report Quality Checklist

Before presenting to user, verify:
- [ ] All 8 sections have content (or marked as "No data available")
- [ ] Every data point has a source citation
- [ ] Executive summary reflects key findings
- [ ] Multi-Omics Confidence Score calculated
- [ ] Top 20 genes ranked by multi-omics evidence
- [ ] Top 10 enriched pathways listed
- [ ] Biomarker candidates identified
- [ ] Cross-layer concordance table complete
- [ ] Therapeutic opportunities summarized
- [ ] Mechanistic hypotheses generated
- [ ] Data Availability Checklist complete
- [ ] Completeness Checklist complete
- [ ] References section lists all tools used

---

## Tool Parameter Quick Reference

| Tool | Key Parameters | Notes |
|------|---------------|-------|
| `OpenTargets_get_disease_id_description_by_name` | `diseaseName` | Primary disambiguation |
| `OSL_get_efo_id_by_disease_name` | `disease` | Secondary disambiguation |
| `OpenTargets_get_associated_targets_by_disease_efoId` | `efoId` | Returns top 25 genes |
| `OpenTargets_get_evidence_by_datasource` | `efoId`, `ensemblId`, `datasourceIds[]`, `size` | Per-gene evidence |
| `OpenTargets_search_gwas_studies_by_disease` | `diseaseIds[]`, `size` | GWAS studies |
| `gwas_search_associations` | `disease_trait`, `size` | GWAS Catalog |
| `clinvar_search_variants` | `condition` or `gene`, `max_results` | Rare variants |
| `ExpressionAtlas_search_differential` | `condition`, `species` | DEGs |
| `expression_atlas_disease_target_score` | `efoId`, `pageSize` (REQUIRED) | Expression scores |
| `europepmc_disease_target_score` | `efoId`, `pageSize` (REQUIRED) | Literature scores |
| `HPA_get_rna_expression_by_source` | `gene_name`, `source_type`, `source_name` (ALL REQUIRED) | Tissue expression |
| `STRING_get_interaction_partners` | `protein_ids[]`, `species` (9606), `limit` | PPI partners |
| `STRING_get_network` | `protein_ids[]`, `species` | PPI network |
| `STRING_functional_enrichment` | `protein_ids[]`, `species` | Functional enrichment |
| `STRING_ppi_enrichment` | `protein_ids[]`, `species` | Network significance |
| `intact_search_interactions` | `query`, `max` | Experimental PPIs |
| `humanbase_ppi_analysis` | `gene_list[]`, `tissue`, `max_node`, `interaction`, `string_mode` (ALL REQ) | Tissue PPI |
| `enrichr_gene_enrichment_analysis` | `gene_list[]`, `libs[]` (BOTH REQUIRED) | Pathway/GO enrichment |
| `ReactomeAnalysis_pathway_enrichment` | `identifiers` (space-sep string) | Reactome enrichment |
| `Reactome_map_uniprot_to_pathways` | `id` (UniProt accession) | Protein-pathway mapping |
| `kegg_search_pathway` | `keyword` | KEGG pathway search |
| `WikiPathways_search` | `query`, `organism` | WikiPathways search |
| `GO_get_annotations_for_gene` | `gene_id` | GO annotations |
| `QuickGO_annotations_by_gene` | `gene_product_id` (e.g., 'UniProtKB:P02649') | Detailed GO |
| `OpenTargets_get_associated_drugs_by_disease_efoId` | `efoId`, `size` (REQUIRED) | Disease drugs |
| `OpenTargets_get_target_tractability_by_ensemblID` | `ensemblId` | Druggability |
| `search_clinical_trials` | `query_term` (REQUIRED), `condition`, `pageSize` | Clinical trials |
| `PubMed_search_articles` | `query`, `limit` | Literature |
| `ensembl_lookup_gene` | `gene_id`, `species` ('homo_sapiens' REQUIRED) | Gene lookup |
| `MyGene_query_genes` | `query`, `species`, `fields`, `size` | Gene info |
| `OpenTargets_get_similar_entities_by_disease_efoId` | `efoId`, `threshold`, `size` (ALL REQUIRED) | Similar diseases |

---

## Response Format Notes (Verified)

### OpenTargets
Associated Targets ```json { "data": { "disease": { "id": "MONDO_0004975", "name": "Alzheimer disease", "associatedTargets": { "count": 2456, "rows": [ { "target": {"id": "ENSG00000080815", "approvedSymbol": "PSEN1"}, "score": 0.87 } ] } } } } ``` ### GWAS Catalog Associations ```json { "data": [ { "association_id": 216440893, "p_value": 2e-09, "or_per_copy_num": 0.94, "or_value": "0.94", "efo_traits": [{"..."}], "risk_frequency": "NR" } ], "metadata": {"pagination": {"totalElements": 1061816}} } ``` ### STRING Interactions ```json { "status": "success", "data": [ { "stringId_A": "9606.ENSP00000252486", "stringId_B": "9606.ENSP00000466775", "preferredName_A": "APOE", "preferredName_B": "APOC2", "score": 0.999 } ] } ``` ### Reactome Enrichment ```json { "data": { "token": "...", "pathways_found": 154, "pathways": [ { "pathway_id": "R-HSA-1251985", "name": "Nuclear signaling by ERBB4", "species": "Homo sapiens", "is_disease": false, "is_lowest_level": true, "entities_found": 3, "entities_total": 47, "entities_ratio": 0.00291, "p_value": 4.0e-06, "fdr": 0.00068, "reactions_found": 3, "reactions_total": 34 } ] } } ``` ### HPA RNA Expression ```json { "status": "success", "data": { "gene_name": "APOE", "source_type": "tissue", "source_name": "brain", "expression_value": "2714.9", "expression_level": "very high", "expression_unit": "nTPM" } } ``` ### Enrichr Results ```json { "status": "success", "data": "{\"connected_paths\": {\"Path: ...\": \"Total Weight: ...\"}}" } ``` **NOTE**: The `data` field is a JSON string that needs parsing. --- ## Common Use Patterns ### 1. Comprehensive Disease Profiling ``` User: "Characterize Alzheimer's disease across omics layers" -> Run all 8 phases -> Produce full multi-omics report ``` ### 2. Therapeutic Target Discovery ``` User: "What are druggable targets for rheumatoid arthritis?" -> Emphasize Phase 1 (genomics), Phase 6 (therapeutics), Phase 7 (integration) -> Focus on tractability and clinical precedent ``` ### 3. 
Biomarker Identification ``` User: "Find diagnostic biomarkers for pancreatic cancer" -> Emphasize Phase 2 (transcriptomics), Phase 3 (proteomics), Phase 7 (biomarkers) -> Focus on tissue-specific expression and diagnostic potential ``` ### 4. Mechanism Elucidation ``` User: "What pathways are dysregulated in Crohn's disease?" -> Emphasize Phase 4 (pathways), Phase 5 (GO), Phase 7 (mechanistic hypotheses) -> Focus on pathway enrichment and cross-pathway connections ``` ### 5. Drug Repurposing ``` User: "What existing drugs could be repurposed for ALS?" -> Emphasize Phase 1 (genetics), Phase 6 (therapeutic landscape), Phase 7 (repurposing) -> Focus on drugs targeting disease-associated genes ``` ### 6. Systems Biology ``` User: "What are the hub genes and key pathways in type 2 diabetes?" -> Emphasize Phase 3 (PPI network), Phase 4 (pathways), Phase 7 (network analysis) -> Focus on hub genes and network modules ``` --- ## Edge Case Handling ### Rare Diseases (limited data) - Genomics layer may dominate (single gene) - Limited GWAS data (monogenic) - Focus on ClinVar variants, pathway consequences - Confidence score will be lower (less cross-layer data) ### Common Diseases (overwhelming data) - Thousands of GWAS associations - Prioritize by effect size and significance - Focus on top 20-30 genes for downstream analysis - Use strict significance thresholds (p < 5e-8) ### Cancer - Include somatic mutations (if CIViC/cBioPortal available) - Check cancer prognostics via HPA - Include tumor-specific expression patterns - Clinical trial landscape may be extensive ### Monogenic Diseases - Single gene dominates - ClinVar/OMIM evidence is primary - Pathway analysis reveals downstream effects - Therapeutic landscape may be limited (gene therapy, enzyme replacement) ### Polygenic Diseases - Many weak genetic signals - GWAS provides the gene list - Pathway enrichment reveals convergent biology - Network analysis identifies hub genes ### Tissue Ambiguity - Diseases affecting 
multiple tissues - Query HPA for all relevant tissues - Compare tissue-specific expression patterns - Use tissue context from disease ontology --- ## Fallback Strategies ### If disease name not found 1. Try synonyms 2. Try broader disease category 3. Try OMIM/UMLS ID mapping 4. Report disambiguation failure and ask user ### If no GWAS data 1. Check ClinVar for rare variants 2. Use OpenTargets genetic evidence 3. Note in report as "Limited genetic data" 4. Adjust confidence score accordingly ### If no expression data 1. Try different disease name/synonym 2. Check HPA for individual gene expression 3. Use OpenTargets expression evidence 4. Note as "Limited transcriptomics data" ### If no pathway enrichment 1. Reduce gene list stringency 2. Try different pathway databases 3. Map individual genes to pathways via Reactome 4. Note as "No significant pathway enrichment" ### If no drugs found 1. Check if disease is rare/orphan 2. Look for drugs targeting individual genes 3. Check clinical trials for investigational therapies 4. Note as "No approved drugs - novel therapeutic opportunity"
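
### Fallback Chain Sketch

Each fallback list above follows the same pattern: try sources in order, stop at the first one that returns data, and record a report note if all of them come back empty. The sketch below is one possible way to express that pattern. It is not part of ToolUniverse: `run_fallbacks`, `FallbackResult`, and the stub fetchers are hypothetical names introduced for illustration, and real tool calls would replace the lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# A fallback step is a label plus a zero-argument callable returning a list
# of results; an empty list or None means "nothing found".
Step = Tuple[str, Callable[[], Optional[list]]]


@dataclass
class FallbackResult:
    results: list          # data from the first step that returned something
    source: Optional[str]  # label of the step that succeeded, or None
    tried: List[str]       # labels of every step attempted, in order
    note: Optional[str]    # report note, set only when every step was empty


def run_fallbacks(steps: Sequence[Step], empty_note: str) -> FallbackResult:
    """Try each step in order and stop at the first non-empty result."""
    tried: List[str] = []
    for label, fetch in steps:
        tried.append(label)
        try:
            results = fetch()
        except Exception:
            # Assumption: a failed call is treated like an empty result.
            results = None
        if results:
            return FallbackResult(list(results), label, tried, None)
    return FallbackResult([], None, tried, empty_note)


if __name__ == "__main__":
    # Stub fetchers standing in for real tool calls (hypothetical data).
    steps = [
        ("GWAS Catalog", lambda: []),
        ("ClinVar rare variants", lambda: ["variant-1", "variant-2"]),
        ("OpenTargets genetic evidence", lambda: ["not reached"]),
    ]
    outcome = run_fallbacks(steps, empty_note="Limited genetic data")
    print(outcome.source, outcome.results, outcome.tried)
```

Keeping the `tried` list means the report can cite every source that was checked, and the `note` field carries the exact wording (for example "Limited genetic data") into the report when nothing was found.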