--- name: tooluniverse description: Use this skill when working with scientific research tools and workflows across bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. This skill provides access to 600+ scientific tools including machine learning models, datasets, APIs, and analysis packages. Use when searching for scientific tools, executing computational biology workflows, composing multi-step research pipelines, accessing databases like OpenTargets/PubChem/UniProt/PDB/ChEMBL, performing tool discovery for research tasks, or integrating scientific computational resources into LLM workflows. --- # ToolUniverse ## Overview ToolUniverse is a unified ecosystem that enables AI agents to function as research scientists by providing standardized access to 600+ scientific resources. Use this skill to discover, execute, and compose scientific tools across multiple research domains including bioinformatics, cheminformatics, genomics, structural biology, proteomics, and drug discovery. **Key Capabilities:** - Access 600+ scientific tools, models, datasets, and APIs - Discover tools using natural language, semantic search, or keywords - Execute tools through standardized AI-Tool Interaction Protocol - Compose multi-step workflows for complex research problems - Integration with Claude Desktop/Code via Model Context Protocol (MCP) ## When to Use This Skill Use this skill when: - Searching for scientific tools by function or domain (e.g., "find protein structure prediction tools") - Executing computational biology workflows (e.g., disease target identification, drug discovery, genomics analysis) - Accessing scientific databases (OpenTargets, PubChem, UniProt, PDB, ChEMBL, KEGG, etc.) - Composing multi-step research pipelines (e.g., target discovery → structure prediction → virtual screening) - Working with bioinformatics, cheminformatics, or structural biology tasks - Analyzing gene expression, protein sequences, molecular structures, or clinical data - Performing literature searches, pathway enrichment, or variant annotation - Building automated scientific research workflows ## Quick Start ### Basic Setup ```python from tooluniverse import ToolUniverse # Initialize and load tools tu = ToolUniverse() tu.load_tools() # Loads 600+ scientific tools # Discover tools tools = tu.run({ "name": "Tool_Finder_Keyword", "arguments": { "description": "disease target associations", "limit": 10 } }) # Execute a tool result = tu.run({ "name": "OpenTargets_get_associated_targets_by_disease_efoId", "arguments": {"efoId": "EFO_0000537"} # Hypertension }) ``` ### Model Context Protocol (MCP) For Claude Desktop/Code integration: ```bash tooluniverse-smcp ``` ## Core Workflows ### 1. Tool Discovery Find relevant tools for your research task: **Three discovery methods:** - `Tool_Finder` - Embedding-based semantic search (requires GPU) - `Tool_Finder_LLM` - LLM-based semantic search (no GPU required) - `Tool_Finder_Keyword` - Fast keyword search **Example:** ```python # Search by natural language description tools = tu.run({ "name": "Tool_Finder_LLM", "arguments": { "description": "Find tools for RNA sequencing differential expression analysis", "limit": 10 } }) # Review available tools for tool in tools: print(f"{tool['name']}: {tool['description']}") ``` **See `references/tool-discovery.md` for:** - Detailed discovery methods and search strategies - Domain-specific keyword suggestions - Best practices for finding tools ### 2. Tool Execution Execute individual tools through the standardized interface: **Example:** ```python # Execute disease-target lookup targets = tu.run({ "name": "OpenTargets_get_associated_targets_by_disease_efoId", "arguments": {"efoId": "EFO_0000616"} # Breast cancer }) # Get protein structure structure = tu.run({ "name": "AlphaFold_get_structure", "arguments": {"uniprot_id": "P12345"} }) # Calculate molecular properties properties = tu.run({ "name": "RDKit_calculate_descriptors", "arguments": {"smiles": "CCO"} # Ethanol }) ``` **See `references/tool-execution.md` for:** - Real-world execution examples across domains - Tool parameter handling and validation - Result processing and error handling - Best practices for production use ### 3. Tool Composition and Workflows Compose multiple tools for complex research workflows: **Drug Discovery Example:** ```python # 1. Find disease targets targets = tu.run({ "name": "OpenTargets_get_associated_targets_by_disease_efoId", "arguments": {"efoId": "EFO_0000616"} }) # 2. Get protein structures structures = [] for target in targets[:5]: structure = tu.run({ "name": "AlphaFold_get_structure", "arguments": {"uniprot_id": target['uniprot_id']} }) structures.append(structure) # 3. Screen compounds hits = [] for structure in structures: compounds = tu.run({ "name": "ZINC_virtual_screening", "arguments": { "structure": structure, "library": "lead-like", "top_n": 100 } }) hits.extend(compounds) # 4. Evaluate drug-likeness drug_candidates = [] for compound in hits: props = tu.run({ "name": "RDKit_calculate_drug_properties", "arguments": {"smiles": compound['smiles']} }) if props['lipinski_pass']: drug_candidates.append(compound) ``` **See `references/tool-composition.md` for:** - Complete workflow examples (drug discovery, genomics, clinical) - Sequential and parallel tool composition patterns - Output processing hooks - Workflow best practices ## Scientific Domains ToolUniverse supports 600+ tools across major scientific domains: **Bioinformatics:** - Sequence analysis, alignment, BLAST - Gene expression (RNA-seq, DESeq2) - Pathway enrichment (KEGG, Reactome, GO) - Variant annotation (VEP, ClinVar) **Cheminformatics:** - Molecular descriptors and fingerprints - Drug discovery and virtual screening - ADMET prediction and drug-likeness - Chemical databases (PubChem, ChEMBL, ZINC) **Structural Biology:** - Protein structure prediction (AlphaFold) - Structure retrieval (PDB) - Binding site detection - Protein-protein interactions **Proteomics:** - Mass spectrometry analysis - Protein databases (UniProt, STRING) - Post-translational modifications **Genomics:** - Genome assembly and annotation - Copy number variation - Clinical genomics workflows **Medical/Clinical:** - Disease databases (OpenTargets, OMIM) - Clinical trials and FDA data - Variant classification **See `references/domains.md` for:** - Complete domain categorization - Tool examples by discipline - Cross-domain applications - Search strategies by domain ## Reference Documentation This skill includes comprehensive reference files that provide detailed information for specific aspects: - **`references/installation.md`** - Installation, setup, MCP configuration, platform integration - **`references/tool-discovery.md`** - Discovery methods, search strategies, listing tools - **`references/tool-execution.md`** - Execution patterns, real-world examples, error handling - **`references/tool-composition.md`** - Workflow composition, complex pipelines, parallel execution - **`references/domains.md`** - Tool categorization by domain, use case examples - **`references/api_reference.md`** - Python API documentation, hooks, protocols **Workflow:** When helping with specific tasks, reference the appropriate file for detailed instructions. For example, if searching for tools, consult `references/tool-discovery.md` for search strategies. ## Example Scripts Two executable example scripts demonstrate common use cases: **`scripts/example_tool_search.py`** - Demonstrates all three discovery methods: - Keyword-based search - LLM-based search - Domain-specific searches - Getting detailed tool information **`scripts/example_workflow.py`** - Complete workflow examples: - Drug discovery pipeline (disease → targets → structures → screening → candidates) - Genomics analysis (expression data → differential analysis → pathways) Run examples to understand typical usage patterns and workflow composition. ## Best Practices 1. **Tool Discovery:** - Start with broad searches, then refine based on results - Use `Tool_Finder_Keyword` for fast searches with known terms - Use `Tool_Finder_LLM` for complex semantic queries - Set appropriate `limit` parameter (default: 10) 2. **Tool Execution:** - Always verify tool parameters before execution - Implement error handling for production workflows - Validate input data formats (SMILES, UniProt IDs, gene symbols) - Check result types and structures 3. **Workflow Composition:** - Test each step individually before composing full workflows - Implement checkpointing for long workflows - Consider rate limits for remote APIs - Use parallel execution when tools are independent 4. **Integration:** - Initialize ToolUniverse once and reuse the instance - Call `load_tools()` once at startup - Cache frequently used tool information - Enable logging for debugging ## Key Terminology - **Tool**: A scientific resource (model, dataset, API, package) accessible through ToolUniverse - **Tool Discovery**: Finding relevant tools using search methods (Finder, LLM, Keyword) - **Tool Execution**: Running a tool with specific arguments via `tu.run()` - **Tool Composition**: Chaining multiple tools for multi-step workflows - **MCP**: Model Context Protocol for integration with Claude Desktop/Code - **AI-Tool Interaction Protocol**: Standardized interface for LLM-tool communication ## Resources - **Official Website**: https://aiscientist.tools - **GitHub**: https://github.com/mims-harvard/ToolUniverse - **Documentation**: https://zitniklab.hms.harvard.edu/ToolUniverse/ - **Installation**: `uv pip install tooluniverse` - **MCP Server**: `tooluniverse-smcp`