---
name: tooluniverse-infectious-disease
description: Rapid pathogen characterization and drug repurposing for outbreaks. Combines pathogen genomics (NCBI, BVBRC), host immune response (IEDB), drug-target databases (ChEMBL, DGIdb), and literature surveillance (PubMed/EuropePMC). Use for emerging-pathogen profiling, antiviral candidate identification, and outbreak intelligence reporting.
disable-model-invocation: true
---

## COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

# Infectious Disease Outbreak Intelligence

Rapid response system for emerging pathogens using taxonomy analysis, target identification, structure prediction, and computational drug repurposing.

**KEY PRINCIPLES**:
1. **Speed is critical** - Optimize for rapid actionable intelligence
2. **Target essential proteins** - Focus on conserved, essential viral/bacterial proteins
3. **Leverage existing drugs** - Prioritize FDA-approved compounds for repurposing
4. **Structure-guided** - Use NvidiaNIM for rapid structure prediction and docking
5. **Evidence-graded** - Grade repurposing candidates by evidence strength
6. **Actionable output** - Prioritized drug candidates with rationale
7. **English-first queries** - Always use English terms in tool calls; respond in user's language

**REASONING STRATEGY — Start Here**:
Start with pathogen identification: what type of organism is it (virus, bacterium, fungus, parasite)? Then ask:
- What are the essential proteins? (required for replication or viability, so the pathogen cannot easily mutate them away)
- Which are surface-exposed? (accessible to drugs and antibodies)
- Which are conserved across strains? (targeting conserved regions lowers the risk of resistance escape)

These three questions define your drug targets and vaccine candidates. Organisms in the same genus often share targets, so look up drug precedent for related pathogens before predicting from scratch.

**LOOK UP, DON'T GUESS**: Never assume a pathogen's taxonomy, genome size, or protein function. Always call `BVBRC_search_taxonomy` or `UniProt_search` first. Even well-known pathogens have strains with different drug susceptibility profiles, so look up the specific strain when it is known.

---

## When to Use

Apply when the user asks:
- "New pathogen detected - what drugs might work?"
- "Emerging virus [X] - therapeutic options?"
- "Drug repurposing candidates for [pathogen]"
- "What do we know about [novel coronavirus/bacteria]?"
- "Essential targets in [pathogen] for drug development"
- "Can we repurpose [drug] against [pathogen]?"

---

## Critical Workflow Requirements

### 1. Report-First Approach (MANDATORY)

1. Create `[PATHOGEN]_outbreak_intelligence.md` FIRST with section headers
2. Progressively update as data is gathered
3. Output separate files: `[PATHOGEN]_drug_candidates.csv`, `[PATHOGEN]_target_proteins.csv`
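
A minimal sketch of the report-first step, assuming the section titles below (they mirror the workflow phases and are not taken from `report_template.md`; adjust them if the template differs). `init_report` and `SECTIONS` are hypothetical names used only for illustration.

```python
from pathlib import Path

# Assumed section titles, one per workflow phase.
SECTIONS = [
    "Pathogen Profile",
    "Target Proteins",
    "Target Structures",
    "Drug Repurposing Candidates",
    "Pathway Analysis",
    "Literature Intelligence",
    "Recommendations",
]


def init_report(pathogen: str, out_dir: str = ".") -> Path:
    """Create the report skeleton before any data is gathered."""
    slug = pathogen.strip().replace(" ", "_")
    path = Path(out_dir) / f"{slug}_outbreak_intelligence.md"
    lines = [f"# {pathogen} Outbreak Intelligence", ""]
    for title in SECTIONS:
        # Each section starts as a placeholder and is filled in progressively.
        lines += [f"## {title}", "", "_Pending_", ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Writing the skeleton first means a partially completed run still leaves a readable report with visible gaps.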

### 2. Citation Requirements (MANDATORY)

Every finding must have inline source attribution:
```markdown
### Target: RNA-dependent RNA polymerase (RdRp)
- **UniProt**: P0DTD1 (NSP12)
- **Essentiality**: Required for replication
*Source: UniProt via `UniProt_search`, literature review*
```

---

## Phase 0: Tool Verification

### Known Parameter Corrections

| Tool | WRONG Parameter | CORRECT Parameter |
|------|-----------------|-------------------|
| `NCBIDatasets_get_taxonomy` | `name` | `tax_id` (integer) or use `BVBRC_search_taxonomy` for keyword search |
| `UniProt_search` | `name` | `query` |
| `ChEMBL_search_targets` | `query`, `target` | `pref_name__contains` (substring match) |
| `get_diffdock_info` | `protein_file` | `protein` (content) |
| `drugbank_full_search` | _(may fail)_ | Use `drugbank_vocab_search` as primary DrugBank lookup |

> **PubMed tip**: Use `PubMed_search_articles` with `sort="relevance"` (the default), not `sort="pub_date"`; date-sorted queries can return empty results for narrow topics.
>
> **FDA labels**: Use `FDA_get_drug_label_info_by_field_value` with targeted `return_fields` to avoid oversized responses from `OpenFDA_search_drug_labels`.
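
The parameter corrections in the table above can be checked before a call is made. The sketch below is only a pre-flight lint on plain dictionaries; `PARAM_FIXES` and `check_params` are hypothetical helpers, not part of ToolUniverse. It warns rather than renames, because two of the fixes also change the value (`tax_id` must be an integer, and `protein` takes file content rather than a path).

```python
# Wrong -> correct parameter names, copied from the corrections table.
PARAM_FIXES = {
    "NCBIDatasets_get_taxonomy": {"name": "tax_id"},
    "UniProt_search": {"name": "query"},
    "ChEMBL_search_targets": {
        "query": "pref_name__contains",
        "target": "pref_name__contains",
    },
    "get_diffdock_info": {"protein_file": "protein"},
}


def check_params(tool: str, kwargs: dict) -> list:
    """Return one warning per known-wrong parameter name found in kwargs."""
    fixes = PARAM_FIXES.get(tool, {})
    return [
        f"{tool}: replace '{bad}' with '{good}'"
        for bad, good in fixes.items()
        if bad in kwargs
    ]


print(check_params("ChEMBL_search_targets", {"query": "protease"}))
```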

---

## Workflow Overview

```
Phase 1: Pathogen Identification
├── Taxonomic classification (NCBI Taxonomy)
├── Closest relatives (for knowledge transfer)
├── Genome/proteome availability
└── OUTPUT: Pathogen profile
    |
Phase 2: Target Identification
├── Essential genes/proteins (UniProt)
├── Conservation across strains
├── Druggability assessment (ChEMBL)
└── OUTPUT: Prioritized target list (scored by essentiality/conservation/druggability/precedent)
    |
Phase 3: Structure Prediction (NvidiaNIM)
├── AlphaFold2/ESMFold for targets
├── Binding site identification
├── Quality assessment (pLDDT)
└── OUTPUT: Target structures (docking-ready if pLDDT > 70)
    |
Phase 4: Drug Repurposing Screen
├── Approved drugs for related pathogens (ChEMBL)
├── Broad-spectrum antivirals/antibiotics
├── Docking screen (get_diffdock_info)
└── OUTPUT: Ranked candidate drugs
    |
Phase 4.5: Pathway Analysis
├── KEGG: Pathogen metabolism pathways
├── Essential metabolic targets
├── Host-pathogen interaction pathways
└── OUTPUT: Pathway-based drug targets
    |
Phase 5: Literature Intelligence
├── PubMed: Published outbreak reports
├── BioRxiv/MedRxiv: Recent preprints (CRITICAL for outbreaks)
├── ArXiv: Computational/ML preprints
├── OpenAlex: Citation tracking
├── ClinicalTrials.gov: Active trials
└── OUTPUT: Evidence synthesis
    |
Phase 6: Report Synthesis
├── Top drug candidates with evidence grades
├── Clinical trial opportunities
├── Recommended immediate actions
└── OUTPUT: Final report
```

---

## Phase Summaries

### Phase 1: Pathogen Identification
Classify via NCBI Taxonomy (`NCBIDatasets_get_taxonomy` takes an integer `tax_id`; use `BVBRC_search_taxonomy` for keyword lookups). Identify related pathogens with existing drugs for knowledge transfer. Determine genome/proteome availability.

**Knowledge transfer principle**: Drugs effective against related pathogens are the highest-priority repurposing candidates. A protease inhibitor for SARS-CoV-1 is immediately relevant to SARS-CoV-2. Look up the related pathogen's approved drugs in ChEMBL before generating candidates from first principles.

### Phase 2: Target Identification
Search UniProt for reviewed (Swiss-Prot) pathogen proteins. Check ChEMBL for drug precedent. Score each target as a weighted sum of Essentiality (30%), Conservation (25%), Druggability (25%), and Drug precedent (20%). Aim for 5+ targets.
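
The weighting can be computed directly. This sketch assumes each criterion has already been normalized to a 0 to 1 score (how that is done is not specified here); `target_score` and `rank_targets` are hypothetical names, and the target names in the usage lines are placeholders.

```python
# Weights from the Phase 2 scoring scheme (they sum to 1.0).
WEIGHTS = {
    "essentiality": 0.30,
    "conservation": 0.25,
    "druggability": 0.25,
    "precedent": 0.20,
}


def target_score(components: dict) -> float:
    """Weighted sum of component scores, each expected in [0, 1]."""
    for key in WEIGHTS:
        value = components[key]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key} must be in [0, 1], got {value}")
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)


def rank_targets(targets: dict) -> list:
    """Return (name, score) pairs sorted from highest to lowest score."""
    scored = [(name, round(target_score(c), 3)) for name, c in targets.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder inputs for illustration only, not real measurements.
example = {
    "target_A": {"essentiality": 1.0, "conservation": 0.5, "druggability": 0.5, "precedent": 0.0},
    "target_B": {"essentiality": 1.0, "conservation": 1.0, "druggability": 1.0, "precedent": 1.0},
}
print(rank_targets(example))
```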

### Phase 3: Structure Prediction
Use NvidiaNIM AlphaFold2 for the top 3 targets. Assess pLDDT confidence. Only dock structures with mean pLDDT > 70 (active-site pLDDT > 90 preferred). Fallback: `alphafold_get_prediction` or `ESMFold_predict_structure`.
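
The pLDDT gate can be applied to a predicted structure in PDB format. This sketch assumes a single-chain model with per-residue pLDDT stored in the B-factor column on a 0 to 100 scale, as AlphaFold DB files do; confirm the scale for other predictors before relying on it. `ca_plddt` and `docking_ready` are hypothetical helpers.

```python
def ca_plddt(pdb_text: str) -> dict:
    """Map residue number -> pLDDT, read from the B-factor of each CA atom."""
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name 13-16, resSeq 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def docking_ready(scores: dict, site_residues=(), min_mean=70.0, site_pref=90.0) -> dict:
    """Apply the mean pLDDT > 70 gate and the active-site > 90 preference."""
    if not scores:
        raise ValueError("no CA atoms found; cannot assess pLDDT")
    mean = sum(scores.values()) / len(scores)
    site = [scores[r] for r in site_residues if r in scores]
    site_mean = sum(site) / len(site) if site else None
    return {
        "mean_plddt": round(mean, 1),
        "site_plddt": None if site_mean is None else round(site_mean, 1),
        "dock": mean > min_mean,
        "site_preferred": site_mean is not None and site_mean > site_pref,
    }
```

Reporting the site-level value alongside the global mean matters because a structure can pass the global gate while the binding pocket itself is poorly resolved.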

### Phase 4: Drug Repurposing Screen
Source candidates from related-pathogen drugs, broad-spectrum antivirals, and target-class drugs (DGIdb). Dock the top 20+ candidates via `get_diffdock_info`. Rank by docking score and evidence tier.
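
One way to combine the two ranking criteria is sketched below. It assumes evidence tier takes precedence and that a higher docking score is better; neither ordering is fixed by this workflow, so flip the sign for energy-like scores where lower is better. The field names and both helper functions are hypothetical.

```python
import csv

TIER_ORDER = {"T1": 1, "T2": 2, "T3": 3, "T4": 4}


def rank_candidates(candidates: list) -> list:
    """Sort by evidence tier (T1 first), then by docking score (highest first)."""
    return sorted(
        candidates,
        key=lambda c: (TIER_ORDER[c["tier"]], -c["docking_score"]),
    )


def write_candidates_csv(candidates: list, path: str) -> None:
    """Write ranked candidates, e.g. to [PATHOGEN]_drug_candidates.csv."""
    fields = ["rank", "drug", "tier", "docking_score"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for rank, row in enumerate(rank_candidates(candidates), start=1):
            writer.writerow({"rank": rank, **{k: row[k] for k in fields[1:]}})
```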

### Phase 4.5: Pathway Analysis
Use KEGG to identify essential metabolic pathways. Map host-pathogen interaction points. Identify pathway-based drug targets beyond direct protein inhibition.

### Phase 5: Literature Intelligence
Search PubMed (peer-reviewed), BioRxiv/MedRxiv (preprints - critical for outbreaks), ArXiv (computational), ClinicalTrials.gov (active trials). Track citations via OpenAlex. Note: preprints are NOT peer-reviewed.
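
When results from several sources are combined, preprints should stay visibly flagged. The sketch below assumes each record is a dictionary with `title`, `doi`, and `source` keys, where `source` names the server or journal index that hosts the record; these field names are assumptions, not the actual tool output schema. Records from a citation index such as OpenAlex would need their own venue check.

```python
PREPRINT_SOURCES = {"biorxiv", "medrxiv", "arxiv"}


def merge_literature(records: list) -> list:
    """Deduplicate by DOI (falling back to title) and tag preprints."""
    merged = {}
    for rec in records:
        doi = (rec.get("doi") or "").lower()
        key = doi or rec["title"].strip().lower()
        peer_reviewed = rec["source"].lower() not in PREPRINT_SOURCES
        kept = merged.get(key)
        # Prefer the peer-reviewed copy when the same key appears twice.
        if kept is None or (peer_reviewed and not kept["peer_reviewed"]):
            merged[key] = {**rec, "peer_reviewed": peer_reviewed}
    return list(merged.values())
```

Carrying an explicit `peer_reviewed` flag through to the report keeps the evidence grade honest when a finding rests only on preprints.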

### Phase 6: Report Synthesis
Aggregate all findings into final report. Grade every candidate. Provide 3+ immediate actions, clinical trial opportunities, and research priorities.

---

## Evidence Grading

| Tier | Symbol | Criteria | Example |
|------|--------|----------|---------|
| **T1** | [T1] | FDA approved for this pathogen | Remdesivir for COVID-19 |
| **T2** | [T2] | Clinical trial evidence OR approved for related pathogen | Favipiravir |
| **T3** | [T3] | In vitro activity OR strong docking + mechanism | Sofosbuvir |
| **T4** | [T4] | Computational prediction only | Novel docking hits |
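
The tier criteria above reduce to a short decision cascade. This sketch assumes the evidence for each candidate has already been collected as boolean flags; the flag names and `evidence_tier` are hypothetical.

```python
def evidence_tier(
    approved_for_pathogen: bool = False,
    clinical_trial_evidence: bool = False,
    approved_for_related: bool = False,
    in_vitro_activity: bool = False,
    strong_docking_with_mechanism: bool = False,
) -> str:
    """Return the strongest tier whose criteria are met; T4 is the default."""
    if approved_for_pathogen:
        return "T1"
    if clinical_trial_evidence or approved_for_related:
        return "T2"
    if in_vitro_activity or strong_docking_with_mechanism:
        return "T3"
    return "T4"  # computational prediction only


print(evidence_tier(in_vitro_activity=True))
```

Checking tiers from strongest to weakest ensures a candidate with several kinds of evidence always receives its best grade.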

---

## Completeness Checklist

### Phase 1: Pathogen ID
- [ ] Taxonomic classification complete
- [ ] Related pathogens identified
- [ ] Genome/proteome availability noted

### Phase 2: Targets
- [ ] 5+ targets identified
- [ ] Essentiality documented
- [ ] Conservation assessed
- [ ] Drug precedent checked

### Phase 3: Structures
- [ ] Structures predicted for top 3 targets
- [ ] pLDDT confidence reported
- [ ] Binding sites identified

### Phase 4: Drug Screen
- [ ] 20+ candidates screened
- [ ] FDA-approved drugs prioritized
- [ ] Docking scores reported
- [ ] Top 5 candidates detailed

### Phase 5: Literature
- [ ] Recent papers summarized
- [ ] Active trials listed
- [ ] Resistance data noted

### Phase 6: Recommendations
- [ ] 3+ immediate actions
- [ ] Clinical trial opportunities
- [ ] Research priorities
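
The numeric minimums in the checklist can be verified mechanically before delivery. This is a sketch with hypothetical key names; the non-numeric items (for example, "Resistance data noted") still need a manual check.

```python
# Minimum counts stated in the checklist above.
MINIMUMS = {
    "targets_identified": 5,
    "structures_predicted": 3,
    "candidates_screened": 20,
    "top_candidates_detailed": 5,
    "immediate_actions": 3,
}


def checklist_gaps(counts: dict) -> list:
    """List checklist items whose count falls below the stated minimum."""
    return [
        f"{item}: {counts.get(item, 0)}/{minimum}"
        for item, minimum in MINIMUMS.items()
        if counts.get(item, 0) < minimum
    ]
```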

---

## Fallback Chains

| Primary Tool | Fallback 1 | Fallback 2 |
|--------------|------------|------------|
| `NvidiaNIM_alphafold2` *(requires NVIDIA_API_KEY env var; free key at build.nvidia.com)* | `alphafold_get_prediction` (AlphaFold DB by UniProt) | `ESMFold_predict_structure` |
| `get_diffdock_info` | `NvidiaNIM_boltz2` *(requires NVIDIA_API_KEY env var; free key at build.nvidia.com)* | Manual docking |
| `NCBIDatasets_suggest_taxonomy` | `UniProtTaxonomy_get_taxon` | Manual classification |
| `ChEMBL_search_drugs` | `drugbank_vocab_search` | PubChem bioassays |
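
A fallback chain from the table above can be expressed as an ordered list of attempts. The sketch is generic: each step is a zero-argument callable that you supply to wrap the real tool call, and `run_with_fallbacks` is a hypothetical helper. Treating an empty result as a miss is an assumption.

```python
def run_with_fallbacks(steps):
    """Try each (name, callable) in order; return (name, result) on first success."""
    errors = []
    for name, call in steps:
        try:
            result = call()
        except Exception as exc:  # tool wrappers may raise many error types
            errors.append(f"{name}: {exc}")
            continue
        if result:  # an empty or None result counts as a miss
            return name, result
        errors.append(f"{name}: empty result")
    raise RuntimeError("all tools failed -> " + "; ".join(errors))
```

Returning the name of the tool that succeeded lets the report cite the actual source, which the citation requirement depends on.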

---

## References

| File | Contents |
|------|----------|
| [TOOLS_REFERENCE.md](TOOLS_REFERENCE.md) | Complete tool documentation |
| [phase_details.md](phase_details.md) | Detailed code examples and procedures for each phase |
| [report_template.md](report_template.md) | Report template with section headers, checklist, and evidence grading |
| [CHECKLIST.md](CHECKLIST.md) | Pre-delivery verification checklist (quality, citations, docking) |
| [EXAMPLES.md](EXAMPLES.md) | Full worked examples (coronavirus, CRKP, limited-info scenarios) |