---
name: chai
description: >
  Structure prediction using Chai-1, a foundation model for molecular structure.
  Use this skill when: (1) Predicting protein-protein complex structures,
  (2) Validating designed binders,
  (3) Predicting protein-ligand complexes,
  (4) Using the Chai API for high-throughput prediction,
  (5) Need an alternative to AlphaFold2.

  For QC thresholds, use protein-qc.
  For AlphaFold2 prediction, use alphafold.
  For ESM-based analysis, use esm.
license: MIT
category: design-tools
tags: [structure-prediction, validation, foundation-model]
biomodals_script: modal_chai1.py
---

# Chai-1 Structure Prediction

## Prerequisites

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| Python | 3.10+ | 3.11 |
| CUDA | 12.0+ | 12.1+ |
| GPU VRAM | 24GB | 40GB (A100) |
| RAM | 32GB | 64GB |

## How to run

> **First time?** See [Installation Guide](../../docs/installation.md) to set up Modal and biomodals.

### Option 1: Modal
```bash
cd biomodals
modal run modal_chai1.py \
  --input-faa complex.fasta \
  --out-dir predictions/
```

**GPU**: A100 (40GB) | **Timeout**: 30min default

### Option 2: Chai API (recommended)
```bash
pip install chai_lab

python -c "
import chai_lab
from chai_lab.chai1 import run_inference

# Run prediction
run_inference(
    fasta_file='complex.fasta',
    output_dir='predictions/',
    num_trunk_recycles=3
)
"
```

### Option 3: Local installation
```bash
git clone https://github.com/chaidiscovery/chai-lab.git
cd chai-lab
pip install -e .

chai-lab predict \
  --fasta complex.fasta \
  --output predictions/
```

## FASTA Format

### Protein complex
```
>binder
MKTAYIAKQRQISFVKSHFSRQLE...
>target
MVLSPADKTNVKAAWGKVGAHAGE...
```

### Protein + ligand
```
>protein
MKTAYIAKQRQISFVKSHFSRQLE...
>ligand|smiles
CCO
```

### Protein + DNA/RNA
```
>protein
MKTAYIAKQRQISFVKSHFSRQLE...
>dna
ATCGATCGATCG
```

## Key parameters

| Parameter | Default | Range | Description |
|-----------|---------|-------|-------------|
| `num_trunk_recycles` | 3 | 1-10 | Recycles (more = better) |
| `num_diffn_timesteps` | 200 | 50-500 | Diffusion steps |
| `seed` | 0 | int | Random seed |

## Output format

```
predictions/
├── pred.model_idx_0.cif    # Best model (CIF format)
├── pred.model_idx_1.cif    # Second model
├── scores.json             # Confidence scores
├── pae.npy                 # PAE matrix
└── plddt.npy               # pLDDT values
```

**Note**: Chai-1 outputs CIF format. Convert to PDB if needed:
```python
from Bio.PDB import MMCIFParser, PDBIO
parser = MMCIFParser()
structure = parser.get_structure("pred", "pred.model_idx_0.cif")
io = PDBIO()
io.set_structure(structure)
io.save("pred.model_idx_0.pdb")
```

### Extracting metrics
```python
import numpy as np
import json

# Load scores
with open('predictions/scores.json') as f:
    scores = json.load(f)

plddt = np.load('predictions/plddt.npy')
pae = np.load('predictions/pae.npy')

print(f"pLDDT: {plddt.mean():.3f}")
print(f"pTM: {scores['ptm']:.3f}")
print(f"ipTM: {scores.get('iptm', 'N/A')}")
```

## Use cases

### Binder validation
```python
# Predict complex with Chai
chai-lab predict --fasta binder_target.fasta --output val/

# Check ipTM > 0.5
scores = json.load(open('val/scores.json'))
if scores['iptm'] > 0.5:
    print("Design passes validation")
```

### Protein-ligand complex
```python
# FASTA with SMILES
fasta = """
>protein
MKTA...
>ligand|smiles
CCO
"""

# Chai handles both protein and small molecules
```

### Batch prediction
```bash
# Multiple sequences
for fasta in sequences/*.fasta; do
    chai-lab predict \
        --fasta "$fasta" \
        --output "predictions/$(basename $fasta .fasta)"
done
```

## Comparison with AF2

| Aspect | Chai-1 | AlphaFold2 |
|--------|--------|------------|
| MSA required | No | Yes |
| Small molecules | Yes | No |
| DNA/RNA | Yes | Limited |
| Speed | Faster | Slower |
| Accuracy | Comparable | Reference |

## Sample output

### Successful run
```
$ chai-lab predict --fasta complex.fasta --output predictions/
[INFO] Loading Chai-1 model...
[INFO] Running inference...
[INFO] Saved 5 models to predictions/

predictions/scores.json:
{
  "ptm": 0.82,
  "iptm": 0.71,
  "ranking_score": 0.76
}
```

**What good output looks like:**
- pTM: > 0.7 (confident global structure)
- ipTM: > 0.5 (confident interface, > 0.7 for high confidence)
- CIF files with reasonable atom positions

## Decision tree

```
Should I use Chai?
│
├─ What are you predicting?
│  ├─ Protein-protein complex → Chai ✓ or ColabFold
│  ├─ Protein + small molecule → Chai ✓
│  ├─ Protein + DNA/RNA → Chai ✓
│  └─ Single protein only → Use ESMFold (faster)
│
├─ Need MSA?
│  ├─ No / want speed → Chai ✓
│  └─ Yes / want accuracy → ColabFold
│
└─ Priority?
   ├─ Highest accuracy → ColabFold with MSA
   ├─ Speed / no MSA → Chai ✓
   └─ Ligand binding → Chai ✓
```

## Typical performance

| Campaign Size | Time (A100) | Cost (Modal) | Notes |
|---------------|-------------|--------------|-------|
| 100 complexes | 30-60 min | ~$10 | Standard validation |
| 500 complexes | 2-4h | ~$45 | Large campaign |
| 1000 complexes | 5-8h | ~$90 | Comprehensive |

**Per-complex**: ~20-40s for typical binder-target complex.

---

## Verify

```bash
find predictions -name "*.cif" | wc -l  # Should match input count
```

---

## Troubleshooting

**Low pLDDT**: Increase num_trunk_recycles
**Low ipTM**: Check chain order, interface region
**OOM errors**: Use A100-80GB or reduce batch
**Slow prediction**: Reduce num_diffn_timesteps

### Error interpretation

| Error | Cause | Fix |
|-------|-------|-----|
| `RuntimeError: CUDA out of memory` | Complex too large | Use A100-80GB or split prediction |
| `KeyError: 'iptm'` | Single chain predicted | Ensure FASTA has multiple chains |
| `ValueError: invalid SMILES` | Malformed ligand | Validate SMILES with RDKit |
| `torch.cuda.OutOfMemoryError` | GPU exhausted | Reduce num_diffn_timesteps to 100 |

---

**Next**: `protein-qc` for filtering and ranking.