--- name: protein-qc description: > Quality control metrics and filtering thresholds for protein design. Use this skill when: (1) Evaluating design quality for binding, expression, or structure, (2) Setting filtering thresholds for pLDDT, ipTM, PAE, (3) Checking sequence liabilities (cysteines, deamidation, polybasic clusters), (4) Creating multi-stage filtering pipelines, (5) Computing PyRosetta interface metrics (dG, SC, dSASA), (6) Checking biophysical properties (instability, GRAVY, pI), (7) Ranking designs with composite scoring. This skill provides research-backed thresholds from binder design competitions and published benchmarks. license: MIT category: evaluation tags: [qc, filtering, metrics, thresholds] --- # Protein Design Quality Control ## Critical Limitation **Individual metrics have weak predictive power for binding**. Research shows: - Individual metric ROC AUC: 0.64-0.66 (slightly better than random) - Metrics are **pre-screening filters**, not affinity predictors - **Composite scoring is essential** for meaningful ranking These thresholds filter out poor designs but do NOT predict binding affinity. ## QC Organization QC is organized by **purpose** and **level**: | Purpose | What it assesses | Key metrics | |---------|------------------|-------------| | **Binding** | Interface quality, binding geometry | ipTM, PAE, SC, dG, dSASA | | **Expression** | Manufacturability, solubility | Instability, GRAVY, pI, cysteines | | **Structural** | Fold confidence, consistency | pLDDT, pTM, scRMSD | Each category has two levels: - **Metric-level**: Calculated values with thresholds (pLDDT > 0.85) - **Design-level**: Pattern/motif detection (odd cysteines, NG sites) --- ## Quick Reference: All Thresholds | Category | Metric | Standard | Stringent | Source | |----------|--------|----------|-----------|--------| | **Structural** | pLDDT | > 0.85 | > 0.90 | AF2/Chai/Boltz | | | pTM | > 0.70 | > 0.80 | AF2/Chai/Boltz | | | scRMSD | < 2.0 Å | < 1.5 Å | Design vs pred | | **Binding** | ipTM | > 0.50 | > 0.60 | AF2/Chai/Boltz | | | PAE_interaction | < 12 Å | < 10 Å | AF2/Chai/Boltz | | | Shape Comp (SC) | > 0.50 | > 0.60 | PyRosetta | | | interface_dG | < -10 | < -15 | PyRosetta | | **Expression** | Instability | < 40 | < 30 | BioPython | | | GRAVY | < 0.4 | < 0.2 | BioPython | | | ESM2 PLL | > 0.0 | > 0.2 | ESM2 | ### Design-Level Checks (Expression) | Pattern | Risk | Action | |---------|------|--------| | Odd cysteine count | Unpaired disulfides | Redesign | | NG/NS/NT motifs | Deamidation | Flag/avoid | | K/R >= 3 consecutive | Proteolysis | Flag | | >= 6 hydrophobic run | Aggregation | Redesign | See: references/binding-qc.md, references/expression-qc.md, references/structural-qc.md --- ## Sequential Filtering Pipeline ```python import pandas as pd designs = pd.read_csv('designs.csv') # Stage 1: Structural confidence designs = designs[designs['pLDDT'] > 0.85] # Stage 2: Self-consistency designs = designs[designs['scRMSD'] < 2.0] # Stage 3: Binding quality designs = designs[(designs['ipTM'] > 0.5) & (designs['PAE_interaction'] < 10)] # Stage 4: Sequence plausibility designs = designs[designs['esm2_pll_normalized'] > 0.0] # Stage 5: Expression checks (design-level) designs = designs[designs['cysteine_count'] % 2 == 0] # Even cysteines designs = designs[designs['instability_index'] < 40] ``` --- ## Composite Scoring (Required for Ranking) Individual metrics alone are too weak. Use composite scoring: ```python def composite_score(row): return ( 0.30 * row['pLDDT'] + 0.20 * row['ipTM'] + 0.20 * (1 - row['PAE_interaction'] / 20) + 0.15 * row['shape_complementarity'] + 0.15 * row['esm2_pll_normalized'] ) designs['score'] = designs.apply(composite_score, axis=1) top_designs = designs.nlargest(100, 'score') ``` For advanced composite scoring, see references/composite-scoring.md. --- ## Tool-Specific Filtering ### BindCraft Filter Levels | Level | Use Case | Stringency | |-------|----------|------------| | Default | Standard design | Most stringent | | Relaxed | Need more designs | Higher failure rate | | Peptide | Designs < 30 AA | ~5-10x lower success | ### BoltzGen Filtering ```bash boltzgen run ... \ --budget 60 \ --alpha 0.01 \ --filter_biased true \ --refolding_rmsd_threshold 2.0 \ --additional_filters 'ALA_fraction<0.3' ``` - `alpha=0.0`: Quality-only ranking - `alpha=0.01`: Default (slight diversity) - `alpha=1.0`: Diversity-only --- ## Design-Level Severity Scoring For pattern-based checks, use severity scoring: | Severity Level | Score | Action | |----------------|-------|--------| | LOW | 0-15 | Proceed | | MODERATE | 16-35 | Review flagged issues | | HIGH | 36-60 | Redesign recommended | | CRITICAL | 61+ | Redesign required | --- ## Experimental Correlation | Metric | AUC | Use | |--------|-----|-----| | ipTM | ~0.64 | Pre-screening | | PAE | ~0.65 | Pre-screening | | ESM2 PLL | ~0.72 | Best single metric | | Composite | ~0.75+ | **Always use** | **Key insight**: Metrics work as **filters** (eliminating failures) not **predictors** (ranking successes). --- ## Campaign Health Assessment Quick assessment of your design campaign: | Pass Rate | Status | Interpretation | |-----------|--------|----------------| | > 15% | Excellent | Above average, proceed | | 10-15% | Good | Normal, proceed | | 5-10% | Marginal | Below average, review issues | | < 5% | Poor | Significant problems, diagnose | --- ## Failure Recovery Trees ### Too Few Pass pLDDT Filter (< 5% with pLDDT > 0.85) ``` Low pLDDT across campaign ├── Check scRMSD distribution │ ├── High scRMSD (>2.5Å): Backbone issue │ │ └── Fix: Regenerate backbones with lower noise_scale (0.5-0.8) │ └── Low scRMSD but low pLDDT: Disordered regions │ └── Fix: Check design length, simplify topology ├── Try more sequences per backbone │ └── modal run modal_proteinmpnn.py --num-seq-per-target 32 --sampling-temp 0.1 ├── Use SolubleMPNN instead of ProteinMPNN │ └── Better for expression-optimized sequences └── Consider different design tool └── BindCraft (integrated design) may work better ``` ### Too Few Pass ipTM Filter (< 5% with ipTM > 0.5) ``` Low ipTM across campaign ├── Review hotspot selection │ ├── Are hotspots surface-exposed? (SASA > 20Ų) │ ├── Are hotspots conserved? (check MSA) │ └── Try 3-6 different hotspot combinations ├── Increase binder length (more contact area) │ └── Try 80-100 AA instead of 60-80 AA ├── Check interface geometry │ ├── Is target flat? → Try helical binders │ └── Is target concave? → Try smaller binders └── Try all-atom design tool └── BoltzGen (all-atom, better packing) ``` ### High scRMSD (> 50% with scRMSD > 2.0Å) ``` Sequences don't specify intended structure ├── ProteinMPNN issue │ ├── Lower temperature: --sampling-temp 0.1 │ ├── Increase sequences: --num-seq-per-target 32 │ └── Check fixed_positions aren't over-constraining ├── Backbone geometry issue │ ├── Backbones may be unusual/strained │ ├── Regenerate with lower noise_scale (0.5-0.8) │ └── Reduce diffuser.T to 30-40 └── Try different sequence design └── ColabDesign (AF2 gradient-based) may work better ``` ### Everything Passes But No Experimental Hits ``` In silico metrics don't predict affinity ├── Generate MORE designs (10x current) │ └── Computational metrics have high false positive rate ├── Increase diversity │ ├── Higher ProteinMPNN temperature (0.2-0.3) │ ├── Different backbone topologies │ └── Different hotspot combinations ├── Try different design approach │ ├── BindCraft (different algorithm) │ ├── ColabDesign (AF2 hallucination) │ └── BoltzGen (all-atom diffusion) └── Check if target is druggable └── Some targets are inherently difficult ``` ### Too Many Designs Pass (> 50%) ``` Suspiciously high pass rate ├── Check if thresholds are too lenient │ └── Use stringent thresholds: pLDDT > 0.90, ipTM > 0.60 ├── Verify prediction quality │ ├── Are predictions actually running? Check output files │ └── Are complexes being predicted, not just monomers? ├── Check for data issues │ ├── Same sequence being predicted multiple times? │ └── Wrong FASTA format (missing chain separator)? └── Apply diversity filter └── Cluster at 70% identity, take top per cluster ``` --- ## Diagnostic Commands ### Quick Campaign Assessment ```python import pandas as pd df = pd.read_csv('designs.csv') # Pass rates at each stage print(f"Total designs: {len(df)}") print(f"pLDDT > 0.85: {(df['pLDDT'] > 0.85).mean():.1%}") print(f"ipTM > 0.50: {(df['ipTM'] > 0.50).mean():.1%}") print(f"scRMSD < 2.0: {(df['scRMSD'] < 2.0).mean():.1%}") print(f"All filters: {((df['pLDDT'] > 0.85) & (df['ipTM'] > 0.5) & (df['scRMSD'] < 2.0)).mean():.1%}") # Identify top issue if (df['pLDDT'] > 0.85).mean() < 0.1: print("ISSUE: Low pLDDT - check backbone or sequence quality") elif (df['ipTM'] > 0.50).mean() < 0.1: print("ISSUE: Low ipTM - check hotspots or interface geometry") elif (df['scRMSD'] < 2.0).mean() < 0.5: print("ISSUE: High scRMSD - sequences don't specify backbone") ``` ---