---
name: tooluniverse-multiomic-disease-characterization
description: Comprehensive disease characterization across genomics, transcriptomics, proteomics, and pathways for systems-level understanding. Identifies therapeutic opportunities and biomarker candidates by integrating multi-layer molecular data. Use for full-omics disease deep-dive reports, mechanism mapping, and biomarker-and-target identification from multi-omics data.
disable-model-invocation: true
---

# Multi-Omics Disease Characterization Pipeline

Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.

**KEY PRINCIPLES**:
1. **Report-first approach** - Create report file FIRST, then populate progressively
2. **Disease disambiguation FIRST** - Resolve all identifiers before omics analysis
3. **Layer-by-layer analysis** - Systematically cover all omics layers
4. **Cross-layer integration** - Identify genes/targets appearing in multiple layers
5. **Evidence grading** - Grade all evidence as T1 (human/clinical) to T4 (computational)
6. **Tissue context** - Emphasize disease-relevant tissues/organs
7. **Quantitative scoring** - Multi-Omics Confidence Score (0-100)
8. **Druggable focus** - Prioritize targets with therapeutic potential
9. **Biomarker identification** - Highlight diagnostic/prognostic markers
10. **Mechanistic synthesis** - Generate testable hypotheses
11. **Source references** - Every statement must cite tool/database
12. **Completeness checklist** - Mandatory section showing analysis coverage
13. **English-first queries** - Always use English terms in tool calls. Respond in the user's language.

Multi-omics disease characterization asks which molecular layers are dysregulated and how changes propagate between them: genomic mutations → transcriptomic changes → proteomic effects → metabolomic consequences. Concordance across layers strengthens a finding; discordance points to regulatory complexity.

## LOOK UP, DON'T GUESS
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.

---

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
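
As an illustration of computing rather than describing, the sketch below calculates a one-sided over-representation p-value (hypergeometric tail) for a gene set using only the standard library. The function name and its parameters are hypothetical, and the choice of test is an assumption; `scipy.stats.hypergeom` provides the same tail probability when SciPy is available. No multiple-testing correction is applied here.

```python
from math import comb

def enrichment_p_value(overlap, query_size, set_size, background_size):
    """One-sided hypergeometric p-value: P(X >= overlap).

    overlap         -- query genes that fall in the gene set
    query_size      -- number of genes in the query list
    set_size        -- number of genes in the gene set (pathway, GO term, ...)
    background_size -- number of genes in the background universe
    """
    upper = min(query_size, set_size)
    total = comb(background_size, query_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

Run a calculation like this on the retrieved gene lists and report the resulting number, rather than stating that a pathway "appears enriched".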

## When to Use This Skill

Apply when users:
- Ask about disease mechanisms across omics layers
- Need multi-omics characterization of a disease
- Want to understand disease at the systems biology level
- Ask "What pathways/genes/proteins are involved in [disease]?"
- Need biomarker discovery for a disease
- Want to identify druggable targets from disease profiling
- Ask for integrated genomics + transcriptomics + proteomics analysis
- Need cross-layer concordance analysis
- Ask about disease network biology / hub genes

**NOT for** (use other skills instead):
- Single gene/target validation -> Use `tooluniverse-drug-target-validation`
- Drug safety profiling -> Use `tooluniverse-adverse-event-detection`
- General disease overview -> Use `tooluniverse-disease-research`
- Variant interpretation -> Use `tooluniverse-variant-interpretation`
- GWAS-specific analysis -> Use `tooluniverse-gwas-*` skills
- Pathway-only analysis -> Use `tooluniverse-systems-biology`

---

## Input Parameters

| Parameter | Required | Description | Example |
|-----------|----------|-------------|---------|
| **disease** | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | `Alzheimer disease`, `MONDO_0004975` |
| **tissue** | No | Tissue/organ of interest | `brain`, `liver`, `blood` |
| **focus_layers** | No | Specific omics layers to emphasize | `genomics`, `transcriptomics`, `pathways` |

---

## Pipeline Overview

The pipeline runs nine phases (Phase 0 through Phase 8) sequentially. Each phase uses specific tools documented in detail in `tool-reference.md`.

### Phase 0: Disease Disambiguation (ALWAYS FIRST)
Resolve disease to standard identifiers (MONDO/EFO) for all downstream queries.
- Primary tool: `OpenTargets_get_disease_id_description_by_name`
- Get description, synonyms, therapeutic areas, disease hierarchy, cross-references
- **CRITICAL**: Disease IDs use underscore format (e.g., `MONDO_0004975`), NOT colon
- If ambiguous, present top 3-5 options and ask user to select
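
The underscore rule above can be enforced before any downstream call. The following is a minimal sketch, assuming IDs arrive as `PREFIX:digits` or `PREFIX_digits`; `normalize_disease_id` is a hypothetical helper, not a ToolUniverse function, and it only recognizes the MONDO and EFO prefixes named in this phase.

```python
import re

# Assumption: only MONDO and EFO prefixes, followed by ':' or '_' and digits.
_ID_PATTERN = re.compile(r"^(MONDO|EFO)[:_](\d+)$", re.IGNORECASE)

def normalize_disease_id(raw: str) -> str:
    """Return the underscore form of a disease ID, e.g. 'MONDO_0004975'."""
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        # Free-text names must go through disambiguation instead.
        raise ValueError(f"Not a recognized disease ID: {raw!r}")
    prefix, number = match.groups()
    return f"{prefix.upper()}_{number}"
```

A free-text name such as `Alzheimer disease` raises `ValueError`, which signals that it still needs to be resolved through the disambiguation tool.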

### Phase 1: Genomics Layer
Identify genetic variants, GWAS associations, and genetically implicated genes.
- Tools: `gwas_search_associations` (use `efo_id` for precision, not free-text `disease_trait`), `gwas_get_snps_for_gene`, ClinVar, OpenTargets associated targets
- `gnomad_get_gene_constraints` — gene constraint metrics (pLI, oe_lof) to interpret whether LoF variants are tolerated vs. haploinsufficient
- Get top 10-15 genes with genetic evidence scores; track Ensembl IDs for downstream phases

### Phase 2: Transcriptomics Layer
Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.
- `GTEx_get_expression_summary` — baseline expression across 54 tissues (accepts `gene_symbol` directly)
- Tools: Expression Atlas, HPA (tissue expression), EuropePMC scores
- Check expression in disease-relevant tissues for top genes from Phase 1

### Phase 3: Proteomics & Interaction Layer
Map protein-protein interactions, identify hub genes, and characterize interaction networks.
- `UniProt_get_function_by_accession` — protein function narrative (essential for mechanistic context)
- Tools: `STRING_get_network` (param: `identifiers`, `species`=9606), `intact_get_interactions`, HumanBase
- Build PPI network from top 15-20 genes; identify hub genes by degree centrality
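
Hub identification by degree centrality can be sketched as below, assuming the interaction results have already been reduced to a list of `(gene_a, gene_b)` pairs. That edge-list shape and the `rank_hubs` helper are illustrative assumptions; the actual response structures are documented in `response-formats.md`.

```python
def rank_hubs(edges, top_n=5):
    """Rank genes by degree, i.e. the number of distinct interaction partners."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        # Sets deduplicate repeated or reversed edges (A-B and B-A count once).
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    degree = {gene: len(p) for gene, p in partners.items()}
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Raw degree favors well-studied proteins, so treat the ranking as a starting point and read it alongside the evidence from the other layers.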

### Phase 4: Pathway & Network Layer
Identify enriched biological pathways and cross-pathway connections.
- `ReactomeAnalysis_pathway_enrichment` — identifiers are **newline-separated** (`\n`), NOT space-separated
- `enrichr_gene_enrichment_analysis` — param: `gene_list` (array), `libs` (array). NOTE: `data` field is a JSON string that needs parsing
- `kegg_search_pathway` — pathway keyword search
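
The two formatting details above (newline-separated identifiers for Reactome, and the JSON-string `data` field from Enrichr) are easy to get wrong, so a small sketch may help. Both helper names are hypothetical, and the response is assumed to be a dict with a `data` key; the inner structure of that field is not assumed.

```python
import json

def reactome_identifiers(genes):
    """Join gene symbols with newlines, skipping blanks and stray whitespace."""
    return "\n".join(g.strip() for g in genes if g.strip())

def parse_enrichr_data(response):
    """Return the `data` field as a Python object.

    Decodes the field when it is a JSON string and passes it through
    unchanged when it has already been parsed.
    """
    data = response.get("data")
    if isinstance(data, str):
        return json.loads(data)
    return data
```

For example, `reactome_identifiers(["APOE", "GENE_B"])` yields the two symbols on separate lines (`GENE_B` is a placeholder).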

### Phase 5: Gene Ontology & Functional Annotation
Characterize biological processes, molecular functions, and cellular components.
- Tools: Enrichr (GO libraries), QuickGO, GO annotations, OpenTargets GO
- Run GO enrichment for all 3 aspects (BP, MF, CC)

### Phase 6: Therapeutic Landscape
Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.
- `DGIdb_get_drug_gene_interactions` — drug interactions by gene (param: `genes` as array). Often more comprehensive than OpenTargets for drug-gene data.
- OpenTargets drugs/tractability: use **EFO IDs** (e.g., `EFO_0000384` for Crohn's disease), not MONDO IDs, which may return null for drug queries
- `search_clinical_trials` — `query_term` is REQUIRED

### Phase 7: Multi-Omics Integration
Integrate findings across all layers. See `integration-scoring.md` for full details.
- Cross-layer gene concordance: count layers per gene, score multi-layer hub genes
- Direction concordance: check whether the genetic evidence and the expression changes agree for each gene
- Biomarker identification: diagnostic, prognostic, predictive
- Mechanistic hypothesis generation
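
The "count layers per gene" step can be sketched as follows, assuming each layer's findings have been reduced to a list of gene symbols. The function names and the `min_layers` cutoff are illustrative assumptions; the actual scoring formula lives in `integration-scoring.md` and is not reproduced here.

```python
def layer_concordance(layer_hits):
    """Map each gene to the sorted list of layers in which it appears."""
    support = {}
    for layer, hits in layer_hits.items():
        for gene in set(hits):  # count a gene once per layer
            support.setdefault(gene, []).append(layer)
    return {gene: sorted(layers) for gene, layers in support.items()}

def multi_layer_genes(layer_hits, min_layers=2):
    """Genes supported by at least `min_layers` layers, most-supported first."""
    support = layer_concordance(layer_hits)
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, layers) for gene, layers in ranked if len(layers) >= min_layers]
```

Keeping the list of supporting layers, not just the count, makes it straightforward to cite the source of each line of evidence in the report.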

### Phase 8: Report Finalization
Write executive summary, calculate confidence score, verify completeness.
- See `integration-scoring.md` for quality checklist and scoring formula

---

## Key Tool Parameter Notes

These are the most common parameter pitfalls:
- `OpenTargets` disease IDs: underscore format (`MONDO_0004975`), NOT colon
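- `STRING` `protein_ids`: must be an **array** (`['APOE']`), not a string. Phase 3 lists `identifiers` as the `STRING_get_network` parameter, so confirm the parameter name for each STRING tool in `tool-reference.md`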
- `enrichr` `libs`: must be **array** (`['KEGG_2021_Human']`)
- `HPA_get_rna_expression_by_source`: ALL 3 params required (`gene_name`, `source_type`, `source_name`)
- `humanbase_ppi_analysis`: ALL params required (`gene_list`, `tissue`, `max_node`, `interaction`, `string_mode`)
- `expression_atlas_disease_target_score`: `pageSize` is REQUIRED
- `search_clinical_trials`: `query_term` is REQUIRED even if `condition` is provided

For full tool parameters and per-phase workflows, see `tool-reference.md`.
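
Several of the pitfalls listed above can be caught before a call is made. The sketch below checks an argument dict against rules transcribed from this document; `check_arguments` and the two rule tables are hypothetical and are not part of ToolUniverse, and the STRING parameter is left out because its name differs between sections.

```python
# Rules transcribed from the notes in this document; extend as needed.
REQUIRED_PARAMS = {
    "HPA_get_rna_expression_by_source": {"gene_name", "source_type", "source_name"},
    "humanbase_ppi_analysis": {"gene_list", "tissue", "max_node", "interaction", "string_mode"},
    "expression_atlas_disease_target_score": {"pageSize"},
    "search_clinical_trials": {"query_term"},
}
ARRAY_PARAMS = {
    "enrichr_gene_enrichment_analysis": {"gene_list", "libs"},
    "DGIdb_get_drug_gene_interactions": {"genes"},
}

def check_arguments(tool_name, arguments):
    """Return a list of problems found in `arguments`; empty means none found."""
    problems = []
    required = REQUIRED_PARAMS.get(tool_name, set())
    for name in sorted(required - set(arguments)):
        problems.append(f"missing required parameter: {name}")
    for name in sorted(ARRAY_PARAMS.get(tool_name, set())):
        value = arguments.get(name)
        # Only flag parameters that are present but not list-like.
        if value is not None and not isinstance(value, (list, tuple)):
            problems.append(f"parameter must be an array: {name}")
    return problems
```

An empty result only means that none of the listed pitfalls were hit; it does not validate the full parameter schema.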

---

## Reference Files

All detailed content is in reference files in this directory:

| File | Contents |
|------|----------|
| `tool-reference.md` | Full tool parameters, inputs/outputs, per-phase workflows, quick reference table |
| `report-template.md` | Complete report markdown template with all sections and checklists |
| `integration-scoring.md` | Confidence score formula (0-100), evidence grading (T1-T4), integration procedures, quality checklist |
| `response-formats.md` | Verified JSON response structures for key tools |
| `use-patterns.md` | Common use patterns, edge case handling, fallback strategies |