--- name: boltzgen description: > All-atom protein design using BoltzGen diffusion model. Use this skill when: (1) Need side-chain aware design from the start, (2) Designing around small molecules or ligands, (3) Want all-atom diffusion (not just backbone), (4) Require precise binding geometries, (5) Using YAML-based configuration. For backbone-only generation, use rfdiffusion. For sequence-only design, use proteinmpnn. For structure validation, use boltz. license: MIT category: design-tools tags: [structure-design, sequence-design, diffusion, all-atom, binder] proteinbase_slug: boltzgen proteinbase_url: https://proteinbase.com/design-methods/boltzgen biomodals_script: modal_boltzgen.py --- # BoltzGen All-Atom Design ## Prerequisites | Requirement | Minimum | Recommended | |-------------|---------|-------------| | Python | 3.10+ | 3.11 | | CUDA | 12.0+ | 12.1+ | | GPU VRAM | 24GB | 48GB (L40S) | | RAM | 32GB | 64GB | ## How to run > **First time?** See [Installation Guide](../../docs/installation.md) to set up Modal and biomodals. ### Option 1: Modal (recommended) ```bash # Clone biomodals git clone https://github.com/hgbrian/biomodals && cd biomodals # Run BoltzGen (requires YAML config file) modal run modal_boltzgen.py \ --input-yaml binder_config.yaml \ --protocol protein-anything \ --num-designs 50 # With custom GPU GPU=L40S modal run modal_boltzgen.py \ --input-yaml binder_config.yaml \ --protocol protein-anything \ --num-designs 100 ``` **GPU**: L40S (48GB) recommended | **Timeout**: 120min default **Available protocols**: `protein-anything`, `peptide-anything`, `protein-small_molecule`, `nanobody-anything`, `antibody-anything` ### Option 2: Local installation ```bash git clone https://github.com/HannesStark/boltzgen.git cd boltzgen pip install -e . python sample.py config=config.yaml ``` ### Option 3: Python API ```python from boltzgen import BoltzGen model = BoltzGen.load_pretrained() designs = model.sample( target_pdb="target.pdb", num_samples=50, binder_length=80 ) ``` **GPU**: L40S (48GB) | **Time**: ~30-60s per design ## Key parameters (CLI) | Parameter | Default | Description | |-----------|---------|-------------| | `--input-yaml` | required | Path to YAML design specification | | `--protocol` | `protein-anything` | Design protocol | | `--num-designs` | 10 | Number of designs to generate | | `--steps` | all | Pipeline steps to run (e.g., `design inverse_folding`) | ## YAML configuration BoltzGen uses an **entity-based YAML format** where you specify designed proteins and target structures as entities. **Important notes:** - Residue indices use `label_seq_id` (1-indexed), not author residue numbers - File paths are relative to the YAML file location - Target files should be in CIF format (PDB also works but CIF preferred) - Run `boltzgen check config.yaml` to verify your specification before running ### Basic Binder Config ```yaml entities: # Designed protein (variable length 80-140 residues) - protein: id: B sequence: 80..140 # Target from structure file - file: path: target.cif include: - chain: id: A # Specify binding site residues (optional but recommended) binding_types: - chain: id: A binding: 45,67,89 ``` ### Binder with Specific Binding Site ```yaml entities: - protein: id: G sequence: 60..100 - file: path: 5cqg.cif include: - chain: id: A binding_types: - chain: id: A binding: 343,344,251 structure_groups: "all" ``` ### Peptide Design (Cyclic) ```yaml entities: - protein: id: S sequence: 10..14C6C3 # With cysteines for disulfide - file: path: target.cif include: - chain: id: A constraints: - bond: atom1: [S, 11, SG] atom2: [S, 18, SG] # Disulfide bond ``` ## Design protocols | Protocol | Use Case | |----------|----------| | `protein-anything` | Design proteins to bind proteins or peptides | | `peptide-anything` | Design cyclic peptides to bind proteins | | `protein-small_molecule` | Design proteins to bind small molecules | | `nanobody-anything` | Design nanobody CDRs | | `antibody-anything` | Design antibody CDRs | ## Output format ``` output/ ├── sample_0/ │ ├── design.cif # All-atom structure (CIF format) │ ├── metrics.json # Confidence scores │ └── sequence.fasta # Sequence ├── sample_1/ │ └── ... └── summary.csv ``` **Note**: BoltzGen outputs CIF format. Convert to PDB if needed: ```python from Bio.PDB import MMCIFParser, PDBIO parser = MMCIFParser() structure = parser.get_structure("design", "design.cif") io = PDBIO() io.set_structure(structure) io.save("design.pdb") ``` ## Sample output ### Successful run ``` $ modal run modal_boltzgen.py --input-yaml binder.yaml --protocol protein-anything --num-designs 10 Running: boltzgen run binder.yaml --output /tmp/out --protocol protein-anything --num_designs 10 [INFO] Loading BoltzGen model... [INFO] Generating designs... [INFO] Running inverse folding... [INFO] Running structure prediction... [INFO] Filtering and ranking... [INFO] Pipeline complete Results saved to: ./out/boltzgen/2501161234/ ``` **Output directory structure:** ``` out/boltzgen/2501161234/ ├── intermediate_designs/ # Raw diffusion outputs │ ├── design_0.cif │ └── design_0.npz ├── intermediate_designs_inverse_folded/ │ ├── refold_cif/ # Refolded complexes │ └── aggregate_metrics_analyze.csv └── final_ranked_designs/ ├── final_10_designs/ # Top designs └── results_overview.pdf # Summary plots ``` **What good output looks like:** - Refolding RMSD < 2.0A (design folds as predicted) - ipTM > 0.5 (confident interface) - All designs complete pipeline without errors ## Decision tree ``` Should I use BoltzGen? │ ├─ What type of design? │ ├─ All-atom precision needed → BoltzGen ✓ │ ├─ Ligand binding pocket → BoltzGen ✓ │ └─ Standard miniprotein → RFdiffusion (faster) │ ├─ What matters most? │ ├─ Side-chain packing → BoltzGen ✓ │ ├─ Speed / diversity → RFdiffusion │ ├─ Highest success rate → BindCraft │ └─ AF2 optimization → ColabDesign │ └─ Compute resources? ├─ Have L40S/A100 (48GB+) → BoltzGen ✓ └─ Only A10G (24GB) → Consider RFdiffusion ``` ## Typical performance | Campaign Size | Time (L40S) | Cost (Modal) | Notes | |---------------|-------------|--------------|-------| | 50 designs | 30-45 min | ~$8 | Quick exploration | | 100 designs | 1-1.5h | ~$15 | Standard campaign | | 500 designs | 5-8h | ~$70 | Large campaign | **Per-design**: ~30-60s for typical binder. --- ## Verify ```bash find output -name "*.cif" | wc -l # Should match num_samples ``` --- ## Troubleshooting **Verify config first**: Always run `boltzgen check config.yaml` before running the full pipeline **Slow generation**: Use fewer designs for initial testing, then scale up **OOM errors**: Use A100-80GB or reduce `--num-designs` **Wrong binding site**: Residue indices use `label_seq_id` (1-indexed), check in Molstar viewer ### Error interpretation | Error | Cause | Fix | |-------|-------|-----| | `RuntimeError: CUDA out of memory` | Large design or long protein | Use A100-80GB or reduce designs | | `FileNotFoundError: *.cif` | Target file not found | File paths are relative to YAML location | | `ValueError: invalid chain` | Chain not in target | Verify chain IDs with Molstar or PyMOL | | `modal: command not found` | Modal CLI not installed | Run `pip install modal && modal setup` | --- **Next**: Validate with `boltz` or `chai` → `protein-qc` for filtering.