--- name: biomni description: Autonomous biomedical AI agent framework for executing complex research tasks across genomics, drug discovery, molecular biology, and clinical analysis. Use this skill when conducting multi-step biomedical research including CRISPR screening design, single-cell RNA-seq analysis, ADMET prediction, GWAS interpretation, rare disease diagnosis, or lab protocol optimization. Leverages LLM reasoning with code execution and integrated biomedical databases. --- # Biomni ## Overview Biomni is an open-source biomedical AI agent framework from Stanford's SNAP lab that autonomously executes complex research tasks across biomedical domains. Use this skill when working on multi-step biological reasoning tasks, analyzing biomedical data, or conducting research spanning genomics, drug discovery, molecular biology, and clinical analysis. ## Core Capabilities Biomni excels at: 1. **Multi-step biological reasoning** - Autonomous task decomposition and planning for complex biomedical queries 2. **Code generation and execution** - Dynamic analysis pipeline creation for data processing 3. **Knowledge retrieval** - Access to ~11GB of integrated biomedical databases and literature 4. **Cross-domain problem solving** - Unified interface for genomics, proteomics, drug discovery, and clinical tasks ## When to Use This Skill Use biomni for: - **CRISPR screening** - Design screens, prioritize genes, analyze knockout effects - **Single-cell RNA-seq** - Cell type annotation, differential expression, trajectory analysis - **Drug discovery** - ADMET prediction, target identification, compound optimization - **GWAS analysis** - Variant interpretation, causal gene identification, pathway enrichment - **Clinical genomics** - Rare disease diagnosis, variant pathogenicity, phenotype-genotype mapping - **Lab protocols** - Protocol optimization, literature synthesis, experimental design ## Quick Start ### Installation and Setup Biomni requires conda environment setup and API keys for LLM providers: ```bash # Clone repository and set up environment git clone https://github.com/snap-stanford/biomni cd biomni bash setup.sh # Or install via pip conda activate biomni_e1 pip install biomni --upgrade ``` Configure API keys (store in `.env` file or environment variables): ```bash export ANTHROPIC_API_KEY="your-key-here" # Optional: OpenAI, Azure, Google, Groq, AWS Bedrock keys ``` Use `scripts/setup_environment.py` for interactive setup assistance. ### Basic Usage Pattern ```python from biomni.agent import A1 # Initialize agent with data path and LLM choice agent = A1(path='./data', llm='claude-sonnet-4-20250514') # Execute biomedical task autonomously agent.go("Your biomedical research question or task") # Save conversation history and results agent.save_conversation_history("report.pdf") ``` ## Working with Biomni ### 1. Agent Initialization The A1 class is the primary interface for biomni: ```python from biomni.agent import A1 from biomni.config import default_config # Basic initialization agent = A1( path='./data', # Path to data lake (~11GB downloaded on first use) llm='claude-sonnet-4-20250514' # LLM model selection ) # Advanced configuration default_config.llm = "gpt-4" default_config.timeout_seconds = 1200 default_config.max_iterations = 50 ``` **Supported LLM Providers:** - Anthropic Claude (recommended): `claude-sonnet-4-20250514`, `claude-opus-4-20250514` - OpenAI: `gpt-4`, `gpt-4-turbo` - Azure OpenAI: via Azure configuration - Google Gemini: `gemini-2.0-flash-exp` - Groq: `llama-3.3-70b-versatile` - AWS Bedrock: Various models via Bedrock API See `references/llm_providers.md` for detailed LLM configuration instructions. ### 2. Task Execution Workflow Biomni follows an autonomous agent workflow: ```python # Step 1: Initialize agent agent = A1(path='./data', llm='claude-sonnet-4-20250514') # Step 2: Execute task with natural language query result = agent.go(""" Design a CRISPR screen to identify genes regulating autophagy in HEK293 cells. Prioritize genes based on essentiality and pathway relevance. """) # Step 3: Review generated code and analysis # Agent autonomously: # - Decomposes task into sub-steps # - Retrieves relevant biological knowledge # - Generates and executes analysis code # - Interprets results and provides insights # Step 4: Save results agent.save_conversation_history("autophagy_screen_report.pdf") ``` ### 3. Common Task Patterns #### CRISPR Screening Design ```python agent.go(""" Design a genome-wide CRISPR knockout screen for identifying genes affecting [phenotype] in [cell type]. Include: 1. sgRNA library design 2. Gene prioritization criteria 3. Expected hit genes based on pathway analysis """) ``` #### Single-Cell RNA-seq Analysis ```python agent.go(""" Analyze this single-cell RNA-seq dataset: - Perform quality control and filtering - Identify cell populations via clustering - Annotate cell types using marker genes - Conduct differential expression between conditions File path: [path/to/data.h5ad] """) ``` #### Drug ADMET Prediction ```python agent.go(""" Predict ADMET properties for these drug candidates: [SMILES strings or compound IDs] Focus on: - Absorption (Caco-2 permeability, HIA) - Distribution (plasma protein binding, BBB penetration) - Metabolism (CYP450 interaction) - Excretion (clearance) - Toxicity (hERG liability, hepatotoxicity) """) ``` #### GWAS Variant Interpretation ```python agent.go(""" Interpret GWAS results for [trait/disease]: - Identify genome-wide significant variants - Map variants to causal genes - Perform pathway enrichment analysis - Predict functional consequences Summary statistics file: [path/to/gwas_summary.txt] """) ``` See `references/use_cases.md` for comprehensive task examples across all biomedical domains. ### 4. Data Integration Biomni integrates ~11GB of biomedical knowledge sources: - **Gene databases** - Ensembl, NCBI Gene, UniProt - **Protein structures** - PDB, AlphaFold - **Clinical datasets** - ClinVar, OMIM, HPO - **Literature indices** - PubMed abstracts, biomedical ontologies - **Pathway databases** - KEGG, Reactome, GO Data is automatically downloaded to the specified `path` on first use. ### 5. MCP Server Integration Extend biomni with external tools via Model Context Protocol: ```python # MCP servers can provide: # - FDA drug databases # - Web search for literature # - Custom biomedical APIs # - Laboratory equipment interfaces # Configure MCP servers in .biomni/mcp_config.json ``` ### 6. Evaluation Framework Benchmark agent performance on biomedical tasks: ```python from biomni.eval import BiomniEval1 evaluator = BiomniEval1() # Evaluate on specific task types score = evaluator.evaluate( task_type='crispr_design', instance_id='test_001', answer=agent_output ) # Access evaluation dataset dataset = evaluator.load_dataset() ``` ## Best Practices ### Task Formulation - **Be specific** - Include biological context, organism, cell type, conditions - **Specify outputs** - Clearly state desired analysis outputs and formats - **Provide data paths** - Include file paths for datasets to analyze - **Set constraints** - Mention time/computational limits if relevant ### Security Considerations ⚠️ **Important**: Biomni executes LLM-generated code with full system privileges. For production use: - Run in isolated environments (Docker, VMs) - Avoid exposing sensitive credentials - Review generated code before execution in sensitive contexts - Use sandboxed execution environments when possible ### Performance Optimization - **Choose appropriate LLMs** - Claude Sonnet 4 recommended for balance of speed/quality - **Set reasonable timeouts** - Adjust `default_config.timeout_seconds` for complex tasks - **Monitor iterations** - Track `max_iterations` to prevent runaway loops - **Cache data** - Reuse downloaded data lake across sessions ### Result Documentation ```python # Always save conversation history for reproducibility agent.save_conversation_history("results/project_name_YYYYMMDD.pdf") # Include in reports: # - Original task description # - Generated analysis code # - Results and interpretations # - Data sources used ``` ## Resources ### References Detailed documentation available in the `references/` directory: - **`api_reference.md`** - Complete API documentation for A1 class, configuration, and evaluation - **`llm_providers.md`** - LLM provider setup (Anthropic, OpenAI, Azure, Google, Groq, AWS) - **`use_cases.md`** - Comprehensive task examples for all biomedical domains ### Scripts Helper scripts in the `scripts/` directory: - **`setup_environment.py`** - Interactive environment and API key configuration - **`generate_report.py`** - Enhanced PDF report generation with custom formatting ### External Resources - **GitHub**: https://github.com/snap-stanford/biomni - **Web Platform**: https://biomni.stanford.edu - **Paper**: https://www.biorxiv.org/content/10.1101/2025.05.30.656746v1 - **Model**: https://huggingface.co/biomni/Biomni-R0-32B-Preview - **Evaluation Dataset**: https://huggingface.co/datasets/biomni/Eval1 ## Troubleshooting ### Common Issues **Data download fails** ```python # Manually trigger data lake download agent = A1(path='./data', llm='your-llm') # First .go() call will download data ``` **API key errors** ```bash # Verify environment variables echo $ANTHROPIC_API_KEY # Or check .env file in working directory ``` **Timeout on complex tasks** ```python from biomni.config import default_config default_config.timeout_seconds = 3600 # 1 hour ``` **Memory issues with large datasets** - Use streaming for large files - Process data in chunks - Increase system memory allocation ### Getting Help For issues or questions: - GitHub Issues: https://github.com/snap-stanford/biomni/issues - Documentation: Check `references/` files for detailed guidance - Community: Stanford SNAP lab and biomni contributors