---
name: tooluniverse-systems-biology
description: Systems biology and pathway analysis integrating Reactome, KEGG, WikiPathways, BioCarta, NCI-Nature Pathway Interaction Database. Multi-database pathway enrichment, protein-pathway relationships, network reasoning. Use for pathway analysis on a gene list, multi-source pathway concordance, and systems-level interpretation across databases.
disable-model-invocation: true
---

# Systems Biology & Pathway Analysis

Comprehensive pathway and systems biology analysis that integrates multiple curated databases to provide a multi-dimensional view of biological systems, pathway enrichment, and protein-pathway relationships.

## When to Use This Skill

**Triggers**:
- "Analyze pathways for this gene list"
- "What pathways is [protein] involved in?"
- "Find pathways related to [keyword/process]"
- "Perform pathway enrichment analysis"
- "Map proteins to biological pathways"
- "Find computational models for [process]"
- "Systems biology analysis of [genes/proteins]"

**Use Cases**:
1. **Gene Set Analysis**: Identify enriched pathways from RNA-seq, proteomics, or screen results
2. **Protein Function**: Discover pathways and processes a protein participates in
3. **Pathway Discovery**: Find pathways related to diseases, processes, or phenotypes
4. **Systems Integration**: Connect genes → pathways → processes → diseases
5. **Model Discovery**: Find computational systems biology models (SBML)
6. **Cross-Database Validation**: Compare pathway annotations across multiple sources

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

## Domain Reasoning: Enrichment vs Causation

Pathway analysis answers: which biological processes are enriched in my gene list? But enrichment is not causation. A pathway being enriched means your gene list overlaps it more than expected by chance. Ask: is the enrichment driven by a few hub genes, or by many genes distributed across the pathway? A pathway with 3 input genes but 200 annotated members is less informative than one where 15 of 40 members are in your list.
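
The coverage comparison above can be made concrete with a one-sided hypergeometric test. This is a minimal sketch using only the standard library; the background size (20,000 genes) and the input list size (100 genes) are illustrative assumptions, not values from any database.

```python
from math import comb

def hypergeom_sf(k, background, pathway_size, list_size):
    """P(overlap >= k) when list_size genes are drawn from the background."""
    upper = min(pathway_size, list_size)
    total = comb(background, list_size)
    hits = sum(
        comb(pathway_size, i) * comb(background - pathway_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total

BACKGROUND = 20000  # assumed number of annotated genes
LIST_SIZE = 100     # assumed size of the input gene list

# (label, overlapping genes, annotated pathway members) from the text above
CASES = [("large pathway", 3, 200), ("compact pathway", 15, 40)]

results = {}
for label, overlap, pathway_size in CASES:
    coverage = overlap / pathway_size
    p_value = hypergeom_sf(overlap, BACKGROUND, pathway_size, LIST_SIZE)
    results[label] = (coverage, p_value)
    print(f"{label}: coverage={coverage:.1%}, p={p_value:.3g}")
```

Under these assumptions the 3-of-200 overlap is close to what chance alone would produce, while the 15-of-40 overlap is not. Reporting coverage next to the p-value makes that difference visible.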

LOOK UP DON'T GUESS: pathway membership, gene-to-pathway assignments, and enrichment statistics. Do not assume a gene is in a pathway — use Reactome, KEGG, or Enrichr to verify. Pathway databases disagree on membership; cross-validate key findings across at least two sources.

## Core Databases Integrated

| Database | Strengths |
|----------|-----------|
| **Reactome** | Detailed mechanistic pathways with reactions; human-curated |
| **KEGG** | Metabolic maps, disease pathways, drug targets |
| **WikiPathways** | Emerging and community-curated pathways |
| **Pathway Commons** | Meta-database aggregating multiple sources |
| **BioModels** | Mathematical/computational SBML models |
| **Enrichr** | Statistical over-representation analysis |

## Workflow Overview

```
Input → Phase 1: Enrichment → Phase 2: Protein Mapping → Phase 3: Keyword Search → Phase 4: Top Pathways → Report
```

---

## Phase 1: Pathway Enrichment Analysis

**When**: Gene list provided (from experiments, screens, differentially expressed genes)

**Objective**: Identify biological pathways statistically over-represented in the gene list

### Tools & Workflow

| Tool | Input | Use |
|------|-------|-----|
| `ReactomeAnalysis_pathway_enrichment` | `identifiers` (newline-separated symbols), `page_size` | FDR-corrected Reactome enrichment (recommended) |
| `enrichr_gene_enrichment_analysis` | `gene_list` (array), `libs` (array) | Over-representation with KEGG/Reactome/WikiPathways |
| `STRING_functional_enrichment` | `protein_ids` (array), `species`, `category` | Functional enrichment from PPI networks |
| `intact_get_interactions` | `identifier` (UniProt accession) | Binary protein interactions with evidence |

1. Submit the gene list to Enrichr or Reactome.
2. Keep pathways with adjusted p-value < 0.05 and sort them by ascending adjusted p-value.
3. Report the top 10-20 pathways with IDs, adjusted p-values, and overlapping genes.
4. If nothing passes the threshold, state that explicitly.
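
The enrichment tools report adjusted p-values themselves. When only raw p-values are at hand, the filter-and-sort step can be sketched as below with a Benjamini-Hochberg correction. The record fields (`pathway_id`, `p_value`) and the placeholder IDs are hypothetical and do not reflect any tool's response schema.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def top_pathways(records, alpha=0.05, limit=20):
    """Keep records with adjusted p < alpha, most significant first."""
    adjusted = benjamini_hochberg([r["p_value"] for r in records])
    rows = [dict(r, adj_p=a) for r, a in zip(records, adjusted)]
    significant = [r for r in rows if r["adj_p"] < alpha]
    return sorted(significant, key=lambda r: r["adj_p"])[:limit]

# Hypothetical raw results; IDs are placeholders, not real pathway IDs.
raw = [
    {"pathway_id": "PATHWAY_A", "p_value": 0.03},
    {"pathway_id": "PATHWAY_B", "p_value": 0.001},
    {"pathway_id": "PATHWAY_C", "p_value": 0.5},
    {"pathway_id": "PATHWAY_D", "p_value": 0.02},
]
ranked = top_pathways(raw)
for row in ranked:
    print(row["pathway_id"], f"adj_p={row['adj_p']:.3g}")
if not ranked:
    print("No pathways passed the adjusted p-value threshold.")
```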

---

## Phase 2: Protein-Pathway Mapping

**When**: Protein UniProt ID provided

**Objective**: Map protein to all known pathways it participates in

### Tools Used

**Reactome_map_uniprot_to_pathways**:
- **Input**:
  - `uniprot_id`: UniProt accession (e.g., "P53350")
- **Output**: Array of Reactome pathways containing this protein

**Reactome_get_pathway_reactions**:
- **Input**:
  - `stId`: Reactome pathway stable ID (e.g., "R-HSA-73817")
- **Output**: Array of reactions and subpathways
- **Use**: Get mechanistic details of pathways

### Workflow

1. Map UniProt ID to Reactome pathways
2. Get all pathways this protein appears in
3. For top pathway (or user-specified):
   - Retrieve detailed reactions and subpathways
   - Extract event names, types (Reaction vs Pathway)
   - Note disease associations if present

### Decision Logic

- **Multiple pathways**: Report all pathways, prioritize by hierarchical level
- **Top pathway details**: Get detailed reactions for 1-3 most relevant
- **Versioned IDs**: Reactome lookups use unversioned stable IDs; strip any trailing version suffix before querying
- **Empty results**: Check if protein ID valid; suggest alternative databases if Reactome empty
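
Stripping a version suffix can be sketched as follows. The pattern assumes stable IDs of the form `R-<three-letter species code>-<digits>` with an optional `.<version>` suffix; that shape is inferred from the example ID above and is an assumption, not a specification.

```python
import re

# Assumed shape: R-HSA-73817 or R-HSA-73817.<version>
_REACTOME_ID = re.compile(r"^(R-[A-Z]{3}-\d+)(?:\.\d+)?$")

def strip_reactome_version(stable_id):
    """Return the unversioned stable ID, or raise if the input looks wrong."""
    match = _REACTOME_ID.match(stable_id.strip())
    if match is None:
        raise ValueError(f"Not a recognized Reactome stable ID: {stable_id!r}")
    return match.group(1)

print(strip_reactome_version("R-HSA-73817.2"))
print(strip_reactome_version("R-HSA-73817"))
```

Raising on unrecognized input, rather than passing it through, surfaces malformed IDs before they produce a confusing empty result.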

---

## Phase 3: Keyword-Based Pathway Search

**When**: User provides keyword or biological process name

**Objective**: Search multiple pathway databases to find relevant pathways

### Tools

| Tool | Key Params | Coverage |
|------|-----------|----------|
| `kegg_search_pathway` | `keyword` | Reference, metabolic, disease pathways |
| `kegg_get_pathway_info` | `pathway_id` (e.g., "hsa04930") | Detailed genes/compounds for a pathway |
| `WikiPathways_search` | `query`, `organism` | Community-curated, emerging pathways |
| `PathwayCommons_search` | `action`="search_pathways", `keyword` | Meta-database aggregating multiple sources |
| `biomodels_search` | `query`, `limit` | SBML computational models |

Search all databases in parallel. Group results by pathway concept. BioModels often returns empty — this is normal.

---

## Phase 4: Top-Level Pathway Catalog

**When**: Always included to provide context

**Objective**: Show major biological systems/pathways for organism

### Tools Used

**Reactome_list_top_pathways**:
- **Input**: `species` (e.g., "Homo sapiens")
- **Output**: Array of top-level pathway categories
- **Use**: Provides hierarchical pathway organization

### Workflow

1. Retrieve top-level pathways for specified organism
2. Display pathway categories (metabolism, signaling, disease, etc.)
3. Serve as reference for pathway hierarchy

### Decision Logic

- **Always show**: Provides context even if other phases empty
- **Organism-specific**: Filter by species of interest
- **Hierarchical view**: These are parent pathways with many subpathways

---

## Output Structure

Create a markdown report progressively: header → Phase 1 enrichment results → Phase 2 protein mapping → Phase 3 keyword search → Phase 4 top pathway catalog. Note empty results explicitly; never silently omit them. Include pathway IDs for follow-up.
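
One way to keep empty phases visible is to render every section regardless of whether it has rows. This is a small sketch; the function name, the section layout, and the example rows are illustrative assumptions rather than a required report format.

```python
def render_report(title, sections):
    """Render (heading, rows) pairs as markdown, flagging empty sections."""
    lines = [f"# {title}", ""]
    for heading, rows in sections:
        lines.append(f"## {heading}")
        lines.append("")
        if rows:
            lines.extend(f"- {row}" for row in rows)
        else:
            # Empty phases are reported, never dropped.
            lines.append("No results returned for this phase.")
        lines.append("")
    return "\n".join(lines)

report = render_report(
    "Pathway Analysis Report",
    [
        ("Phase 1: Enrichment", []),
        ("Phase 2: Protein Mapping", ["R-HSA-73817 (example ID from this skill)"]),
        ("Phase 3: Keyword Search", []),
        ("Phase 4: Top Pathways", []),
    ],
)
print(report)
```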

## Tool Parameter Reference

**Critical Parameter Notes** (from testing):

| Tool | Correct Parameter | Common Mistake |
|------|-------------------|----------------|
| `Reactome_map_uniprot_to_pathways` | `uniprot_id` | `id` |
| `PathwayCommons_search` | `action` + `keyword` (both required) | omitting `action` |
| `enrichr_gene_enrichment_analysis` | `gene_list` (array) | string |

**Response Format Notes**:
- **Reactome**: Returns list directly (not wrapped in `{status, data}`)
- **Pathway Commons**: Returns dict with `total_hits` and `pathways`
- **Others**: Standard `{status: "success", data: [...]}` format
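
The three response shapes listed above can be normalized into a plain list before analysis. This sketch relies only on those notes; the helper name is hypothetical, and real responses may carry fields not handled here.

```python
def extract_records(response):
    """Return the list of records from any of the three documented shapes."""
    if isinstance(response, list):
        # Reactome: list returned directly
        return response
    if isinstance(response, dict):
        if "pathways" in response:
            # Pathway Commons: dict with total_hits and pathways
            return response["pathways"]
        if response.get("status") == "success":
            # Standard wrapper: {status: "success", data: [...]}
            return response.get("data") or []
        raise ValueError(f"Unsuccessful or unrecognized response: {response!r}")
    raise TypeError(f"Unexpected response type: {type(response).__name__}")

# Stand-in payloads that mimic the documented shapes
print(extract_records([{"stId": "R-HSA-73817"}]))
print(extract_records({"total_hits": 0, "pathways": []}))
print(extract_records({"status": "success", "data": []}))
```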

---

## Domain Reasoning: Enzyme Kinetics & Metabolic Analysis

LOOK UP DON'T GUESS: Km values, kcat values, cofactor requirements, and optimal pH/temperature for specific enzymes. Use `BindingDB_search_by_target`, `ChEMBL_get_molecule`, `BRENDA_get_enzyme_info` (if available; requires the `BRENDA_EMAIL` and `BRENDA_PASSWORD` environment variables, with free academic registration at brenda-enzymes.org), or `EuropePMC_search_articles` to retrieve published kinetic parameters. Do not estimate Km from first principles.

### Michaelis-Menten Kinetics

The foundational model: v = Vmax * [S] / (Km + [S])
- **Km** = substrate concentration at half-maximal velocity. NOT binding affinity (Km = (koff + kcat) / kon).
- **Vmax** = maximum velocity = kcat * [E_total]. Proportional to enzyme concentration.
- **kcat** = turnover number = molecules of substrate converted per enzyme per second.
- **Catalytic efficiency** = kcat / Km. The "best" enzymes approach the diffusion limit (~10^8 to 10^9 M^-1 s^-1).
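
The definitions above translate directly into code. The numeric values below are made-up illustrations, not measured parameters for any enzyme.

```python
from math import isclose

def michaelis_menten(s, vmax, km):
    """Initial velocity v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

# Illustrative values only (not from any database)
kcat = 100.0     # s^-1
e_total = 1e-8   # M
km = 5e-5        # M

vmax = kcat * e_total    # M/s; scales with enzyme concentration
efficiency = kcat / km   # M^-1 s^-1

v_at_km = michaelis_menten(km, vmax, km)
print(f"Vmax = {vmax:.3g} M/s")
print(f"v at [S] = Km: {v_at_km:.3g} M/s (half of Vmax: {isclose(v_at_km, vmax / 2)})")
print(f"kcat/Km = {efficiency:.3g} M^-1 s^-1")
```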

To determine Km and Vmax from data: use Lineweaver-Burk (1/v vs 1/[S]), Eadie-Hofstee (v vs v/[S]), or nonlinear regression (preferred — avoids distortion from reciprocal transforms). See `enzyme_kinetics.py` in `skills/tooluniverse-computational-biophysics/scripts/`.
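
As a dependency-free illustration of the reciprocal transform, the sketch below fits a Lineweaver-Burk line to noise-free synthetic data. It is not the `enzyme_kinetics.py` script, and for real, noisy measurements nonlinear regression remains the better choice, as noted above.

```python
from math import isclose

def lineweaver_burk_fit(substrate, velocity):
    """Least-squares line through (1/[S], 1/v); returns (Km, Vmax)."""
    x = [1.0 / s for s in substrate]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # Km / Vmax
    intercept = mean_y - slope * mean_x  # 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax

# Synthetic, noise-free data generated from assumed Km = 2.0 and Vmax = 10.0
TRUE_KM, TRUE_VMAX = 2.0, 10.0
substrate = [0.5, 1.0, 2.0, 4.0, 8.0]
velocity = [TRUE_VMAX * s / (TRUE_KM + s) for s in substrate]

km_fit, vmax_fit = lineweaver_burk_fit(substrate, velocity)
print(f"Km = {km_fit:.4f}, Vmax = {vmax_fit:.4f}")
```

Because the transform weights low-substrate points most heavily, measurement noise at small [S] dominates the fit on real data.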

### Allosteric Regulation & Cooperative Binding

Not all enzymes follow Michaelis-Menten. Sigmoidal v-vs-[S] curves indicate cooperativity.
- **Hill equation**: v = Vmax * [S]^nH / (K0.5^nH + [S]^nH)
- **Hill coefficient (nH)**: nH = 1 (no cooperativity), nH > 1 (positive, e.g., hemoglobin O2 binding nH ~ 2.8), nH < 1 (negative cooperativity).
- **K0.5**: substrate concentration at half-maximal velocity (analogous to Km but not identical for cooperative systems).
- Allosteric activators shift the curve LEFT (lower K0.5). Allosteric inhibitors shift it RIGHT (higher K0.5) or reduce Vmax.
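
A short sketch of the Hill equation shows how nH changes the steepness of the curve while K0.5 still marks half-maximal velocity. Parameter values are illustrative only.

```python
from math import isclose

def hill_velocity(s, vmax, k_half, n_h):
    """v = Vmax * [S]^nH / (K0.5^nH + [S]^nH)."""
    return vmax * s ** n_h / (k_half ** n_h + s ** n_h)

def substrate_for_fraction(fraction, k_half, n_h):
    """[S] needed to reach a given fraction of Vmax (inverse of the Hill equation)."""
    return k_half * (fraction / (1.0 - fraction)) ** (1.0 / n_h)

VMAX, K_HALF = 1.0, 10.0  # illustrative values

for n_h in (0.5, 1.0, 2.8):
    v_half = hill_velocity(K_HALF, VMAX, K_HALF, n_h)
    # Fold change in [S] needed to go from 10% to 90% of Vmax
    fold = substrate_for_fraction(0.9, K_HALF, n_h) / substrate_for_fraction(0.1, K_HALF, n_h)
    print(f"nH={n_h}: v at K0.5 = {v_half:.2f}, S90/S10 = {fold:.1f}")
```

With nH = 1 the expression reduces to Michaelis-Menten and an 81-fold change in [S] is needed to move from 10% to 90% of Vmax; larger nH compresses that range, which is what makes cooperative enzymes switch-like.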

### Enzyme Inhibition Types

| Type | Effect on Km | Effect on Vmax | Lineweaver-Burk pattern |
|------|-------------|----------------|------------------------|
| Competitive | Increases (Km_app = Km * (1 + [I]/Ki)) | Unchanged | Lines intersect on y-axis |
| Uncompetitive | Decreases | Decreases | Parallel lines |
| Noncompetitive (pure) | Unchanged | Decreases (Vmax_app = Vmax / (1 + [I]/Ki)) | Lines intersect on x-axis |
| Mixed | Changes | Decreases | Lines intersect in quadrant II or III |

To determine Ki: measure v at multiple [I] and [S], fit to the appropriate model. The `enzyme_kinetics.py` script handles competitive, uncompetitive, and noncompetitive inhibition calculations.
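
The apparent parameters in the table can be computed with a small helper. This is an independent sketch, not the `enzyme_kinetics.py` script; mixed inhibition is left out because it needs two inhibition constants.

```python
from math import isclose

def apparent_parameters(mode, km, vmax, inhibitor, ki):
    """Return (Km_app, Vmax_app) for a single-Ki inhibition model."""
    alpha = 1.0 + inhibitor / ki
    if mode == "competitive":
        return km * alpha, vmax
    if mode == "uncompetitive":
        return km / alpha, vmax / alpha
    if mode == "noncompetitive":
        return km, vmax / alpha
    raise ValueError(f"Unsupported inhibition mode: {mode!r}")

KM, VMAX, KI = 2.0, 10.0, 5.0  # illustrative values, arbitrary units
INHIBITOR = 5.0                # [I] = Ki, so alpha = 2

for mode in ("competitive", "uncompetitive", "noncompetitive"):
    km_app, vmax_app = apparent_parameters(mode, KM, VMAX, INHIBITOR, KI)
    lb_slope = km_app / vmax_app  # slope of the Lineweaver-Burk line
    print(f"{mode}: Km_app={km_app}, Vmax_app={vmax_app}, LB slope={lb_slope}")
```

The unchanged Lineweaver-Burk slope for the uncompetitive case is what produces the parallel lines listed in the table.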

### Troubleshooting "No Activity" Results

When a purified enzyme shows no catalytic activity, systematically check:

1. **Oligomeric state**: Many enzymes are obligate dimers/tetramers. Dilute protein may dissociate. Check with SEC, native PAGE, or DLS. Concentrate sample or add stabilizing agents (glycerol, specific ions).
2. **Cofactors**: Metal ions (Zn2+, Mg2+, Mn2+), coenzymes (NAD+, FAD, PLP), or prosthetic groups may be lost during purification. LOOK UP the enzyme's cofactor requirements and supplement the assay buffer.
3. **pH**: Most enzymes have a sharp pH optimum. Even 1 pH unit off can reduce activity 10-fold. Buffer at the literature-reported optimal pH.
4. **Temperature**: Standard assays run at 25 °C or 37 °C. Thermophilic enzymes often need 50-80 °C. Psychrophilic enzymes can denature above about 30 °C.
5. **Reducing environment**: Many enzymes need DTT or beta-mercaptoethanol to maintain active-site cysteines in reduced form.
6. **Substrate**: Wrong isomer (D- vs L-), wrong oxidation state, or degraded substrate. Use fresh substrate and verify by a positive control enzyme.
7. **Inhibitors in buffer**: EDTA chelates essential metals. Phosphate competes at phospho-binding sites. Detergents can denature.
8. **Protein folding**: Inclusion body protein may be misfolded even after refolding. Check by CD spectroscopy or thermal shift assay.

### Metabolic Flux Analysis Reasoning

Metabolic flux analysis (MFA) quantifies the rates of metabolic reactions in vivo, not just enzyme activities in vitro.

Key concepts:
- **Steady-state assumption**: At metabolic steady state, the rate of production of each intermediate equals its rate of consumption. This gives a system of linear equations: S * v = 0, where S is the stoichiometric matrix and v is the flux vector.
- **Flux Balance Analysis (FBA)**: When the system is underdetermined (more reactions than metabolites), FBA uses linear programming to optimize an objective function (e.g., maximize biomass production). Use `biomodels_search` to find published SBML models for the organism.
- **13C-MFA**: Uses isotope labeling to experimentally constrain intracellular fluxes. The labeling pattern of metabolites reveals which pathways carried flux.
- **Control coefficients**: How much does a 1% change in enzyme activity change the pathway flux? The flux control coefficients of a pathway sum to 1, and most enzymes have small coefficients; control is usually shared among a few steps plus substrate supply rather than held by a single rate-limiting enzyme.
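
The steady-state condition S * v = 0 can be checked by hand on a toy network. The four-reaction network below is invented for illustration and does not come from any published model.

```python
# Toy network (hypothetical):
#   R1: -> A        (uptake)
#   R2: A -> B
#   R3: B ->        (export)
#   R4: A ->        (byproduct)
METABOLITES = ["A", "B"]
S = [
    [1, -1, 0, -1],  # A: made by R1, consumed by R2 and R4
    [0, 1, -1, 0],   # B: made by R2, consumed by R3
]

def mass_balance(stoich, flux):
    """Net production rate of each metabolite: one entry of S * v per row."""
    return [sum(coeff * v for coeff, v in zip(row, flux)) for row in stoich]

balanced = [10.0, 7.0, 7.0, 3.0]
unbalanced = [10.0, 7.0, 5.0, 3.0]

print(dict(zip(METABOLITES, mass_balance(S, balanced))))    # all zeros: steady state
print(dict(zip(METABOLITES, mass_balance(S, unbalanced))))  # B accumulates
```

With four reactions and two metabolites the system has two degrees of freedom, so many flux vectors satisfy the constraint. FBA picks among them by optimizing an objective; 13C-MFA narrows them with labeling data.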

LOOK UP DON'T GUESS: stoichiometric coefficients, pathway topology, and published flux distributions. Use KEGG (`kegg_get_pathway_info`), Reactome (`Reactome_get_pathway_reactions`), and BioModels (`biomodels_search`) for these data.

---

## Fallback Strategies

### Enrichment Analysis
- **Primary**: Enrichr with KEGG library
- **Fallback**: Try alternative libraries (Reactome, GO Biological Process)
- **If all fail**: Note "enrichment analysis unavailable" and continue

### Protein Mapping
- **Primary**: Reactome protein-pathway mapping
- **Fallback**: Use keyword search with protein name
- **If empty**: Check if protein ID valid; suggest checking gene symbol

### Keyword Search
- **Primary**: Search all databases (KEGG, WikiPathways, Pathway Commons, BioModels)
- **Fallback**: If all empty, broaden keyword (e.g., "diabetes" → "glucose")
- **If still empty**: Note "no pathways found for [keyword]"
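
The keyword fallback can be sketched as a loop over progressively broader terms. `search_fn` stands in for whichever search tool is called; the stub and its single hit are fabricated placeholders so the example runs without network access.

```python
def search_with_fallback(search_fn, keywords):
    """Try keywords in order; return (keyword_used, hits), or (None, []) if all are empty."""
    for keyword in keywords:
        hits = search_fn(keyword)
        if hits:
            return keyword, hits
    return None, []

def stub_search(keyword):
    """Placeholder for a real pathway search call (hypothetical data)."""
    index = {"glucose": ["placeholder pathway hit"]}
    return index.get(keyword, [])

tried = ["diabetes", "glucose"]
used, hits = search_with_fallback(stub_search, tried)
if hits:
    message = f"{len(hits)} pathway(s) found using '{used}'"
else:
    message = f"no pathways found for {tried}"
print(message)
```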

---

## Limitations & Known Issues

- **Reactome**: Strong human coverage; limited for non-model organisms
- **KEGG**: Requires keyword match; may miss synonyms
- **WikiPathways**: Variable curation quality; check pathway version dates
- **Pathway Commons**: Aggregation may have duplicates; check source attribution
- **BioModels**: Sparse for many processes; often returns no results
- **Enrichr**: Requires gene symbols (not IDs); case-sensitive

**Best for**: Gene set analysis, protein function investigation, pathway discovery, systems-level biology