--- name: protein-design-workflow description: > End-to-end guidance for protein design pipelines. Use this skill when: (1) Starting a new protein design project, (2) Need step-by-step workflow guidance, (3) Understanding the full design pipeline, (4) Planning compute resources and timelines, (5) Integrating multiple design tools. For tool selection, use binder-design. For QC thresholds, use protein-qc. license: MIT category: orchestration tags: [guidance, pipeline, workflow] --- # Protein Design Workflow Guide ## Standard binder design pipeline ### Overview ``` Target Preparation --> Backbone Generation --> Sequence Design | | | v v v (pdb skill) (rfdiffusion) (proteinmpnn) | | v v Structure Validation --> Filtering | | v v (alphafold/chai) (protein-qc) ``` ## Phase 1: Target preparation ### 1.1 Obtain target structure ```bash # Download from PDB curl -o target.pdb "https://files.rcsb.org/download/XXXX.pdb" ``` ### 1.2 Clean and prepare ```python # Extract target chain # Remove waters, ligands if needed # Trim to binding region + 10A buffer ``` ### 1.3 Select hotspots - Choose 3-6 exposed residues - Prefer charged/aromatic (K, R, E, D, W, Y, F) - Check surface accessibility - Verify residue numbering **Output**: `target_prepared.pdb`, hotspot list ## Phase 2: Backbone generation ### Option A: RFdiffusion (diverse exploration) ```bash modal run modal_rfdiffusion.py \ --pdb target_prepared.pdb \ --contigs "A1-150/0 70-100" \ --hotspot "A45,A67,A89" \ --num-designs 500 ``` ### Option B: BindCraft (end-to-end) ```bash modal run modal_bindcraft.py \ --target-pdb target_prepared.pdb \ --hotspots "A45,A67,A89" \ --num-designs 100 ``` **Output**: 100-500 backbone PDBs ## Phase 3: Sequence design ### For RFdiffusion backbones ```bash for backbone in backbones/*.pdb; do modal run modal_proteinmpnn.py \ --pdb-path "$backbone" \ --num-seq-per-target 8 \ --sampling-temp 0.1 done ``` **Output**: 8 sequences per backbone (800-4000 total) ## Phase 4: Structure validation ### Predict complexes ```bash # Prepare FASTA with binder + target # binder:target format for multimer modal run modal_colabfold.py \ --input-faa all_sequences.fasta \ --out-dir predictions/ ``` **Output**: AF2 predictions with pLDDT, ipTM, PAE ## Phase 5: Filtering and selection ### Apply standard thresholds ```python import pandas as pd # Load metrics designs = pd.read_csv('all_metrics.csv') # Filter filtered = designs[ (designs['pLDDT'] > 0.85) & (designs['ipTM'] > 0.50) & (designs['PAE_interface'] < 10) & (designs['scRMSD'] < 2.0) & (designs['esm2_pll'] > 0.0) ] # Rank by composite score filtered['score'] = ( 0.3 * filtered['pLDDT'] + 0.3 * filtered['ipTM'] + 0.2 * (1 - filtered['PAE_interface'] / 20) + 0.2 * filtered['esm2_pll'] ) top_designs = filtered.nlargest(50, 'score') ``` **Output**: 50-200 filtered candidates ## Resource planning ### Compute requirements | Stage | GPU | Time (100 designs) | |-------|-----|-------------------| | RFdiffusion | A10G | 30 min | | ProteinMPNN | T4 | 15 min | | ColabFold | A100 | 4-8 hours | | Filtering | CPU | 15 min | ### Total timeline - Small campaign (100 designs): 8-12 hours - Medium campaign (500 designs): 24-48 hours - Large campaign (1000+ designs): 2-5 days ## Quality checkpoints ### After backbone generation - [ ] Visual inspection of diverse backbones - [ ] Secondary structure present - [ ] No clashes with target ### After sequence design - [ ] ESM2 PLL > 0.0 for most sequences - [ ] No unwanted cysteines (unless intentional) - [ ] Reasonable sequence diversity ### After validation - [ ] pLDDT > 0.85 - [ ] ipTM > 0.50 - [ ] PAE_interface < 10 - [ ] Self-consistency RMSD < 2.0 A ### Final selection - [ ] Diverse sequences (cluster if needed) - [ ] Manufacturable (no problematic motifs) - [ ] Reasonable molecular weight ## Common issues | Problem | Solution | |---------|----------| | Low ipTM | Check hotspots, increase designs | | Poor diversity | Higher temperature, more backbones | | High scRMSD | Backbone may be unusual | | Low pLDDT | Check design quality | ## Advanced workflows ### Multi-tool combination 1. RFdiffusion for initial backbones 2. ColabDesign for refinement 3. ProteinMPNN diversification 4. AF2 final validation ### Iterative refinement 1. Run initial campaign 2. Analyze failures 3. Adjust hotspots/parameters 4. Repeat with insights