[![Open In Colab](../../_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_Boltz.ipynb)
[![Get Notebook](../../_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-prediction/Using_Boltz.ipynb)
[![View In GitHub](../../_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_Boltz.ipynb)

# Using Boltz
This tutorial demonstrates how to use the Boltz-2 model to predict the
structure of a molecular complex, including proteins and ligands. We
will also show how to request and retrieve predicted binding affinities
and other quality metrics.

# What you need before getting started

First, ensure you have an active `OpenProtein` session. Then, import the
necessary classes for defining the components of your complex.

In [1]:
import openprotein
from openprotein.protein import Protein
from openprotein.chains import Ligand

# Log in to your session
session = openprotein.connect()

# Defining the Molecules

Boltz-2 can model various molecule types, including proteins, ligands,
DNA, and RNA. For this example, we'll predict the structure of a protein
dimer in complex with a ligand.

We will define a dimer and one ligand. When using Boltz models, we can
specify that a `Protein` is meant to be an oligomer by specifying
multiple ids in the `chain_id`. In this case, the protein is a dimer
since we have `["A", "B"]`.

Note that for affinity prediction, the ligand that is binding must have
a single, unique string for its `chain_id`.

In [2]:
# Define the proteins
proteins = [
    Protein(sequence="MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ"),
]
proteins[0].chain_id = ["A", "B"]

# You can also specify the proteins to be cyclic by setting the property
# proteins[0].cyclic = True

# Define the ligand
# We use the three-letter code for S-adenosyl-L-homocysteine (SAH)
# The chain_id 'C' is the "binder" we will reference later.
ligands = [
    Ligand(ccd="SAH", chain_id="C")
]

# Create MSA for the Protein using Homology Search

When using Boltz with protein sequences, each `Protein` needs its `msa`
attribute set before folding. Set it either to an MSA, which helps
inform the model, or to `Protein.single_sequence_mode` to explicitly run
without one.

Here, we will be building an MSA using our platform capabilities. Take
note of the syntax here: creating an MSA with a complex uses ColabFold's
syntax of joining sequences with `:`.

In [3]:
msa_query = []
for p in proteins:
    if p.chain_id is not None and isinstance(p.chain_id, list):
        for _ in p.chain_id:
            msa_query.append(p.sequence.decode())
    else:
        msa_query.append(p.sequence.decode())
msa = session.align.create_msa(seed=":".join(msa_query))

for p in proteins:
    p.msa = msa
    # If desired, use single sequence mode to specify no msa
    # p.msa = Protein.single_sequence_mode
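
The query-building loop above can be factored into a small helper that is easy to check without a session. This is an illustrative sketch only: `build_msa_query` is a hypothetical name, and it takes plain `(sequence, chain_id)` tuples with string sequences rather than `Protein` objects.

```python
def build_msa_query(chains):
    """Join sequences with ':' (ColabFold-style), one copy per chain id.

    `chains` is a list of (sequence, chain_id) tuples; chain_id may be a
    string, a list of strings, or None.
    """
    parts = []
    for sequence, chain_id in chains:
        # A list of chain ids marks an oligomer: repeat the sequence per chain.
        copies = len(chain_id) if isinstance(chain_id, list) else 1
        parts.extend([sequence] * copies)
    return ":".join(parts)


# A homodimer (chains A and B) contributes its sequence twice.
query = build_msa_query([("MVTPEG", ["A", "B"]), ("GSHM", "C")])
print(query)
```

The short sequences here are placeholders; in the tutorial the resulting string is what gets passed as `seed` to `session.align.create_msa`.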

# Predicting the Complex Structure and Affinity

Now, we can call the `fold` method on the Boltz-2 model.

The key steps are:

1.  Access the model via `session.fold.boltz2`.
2.  Pass the defined proteins and ligands.
3.  To request binding affinity prediction, include the `properties`
    argument. This argument takes a list of dictionaries. For affinity,
    you specify the `binder`, which must match the `chain_id` of a
    ligand you defined.
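
Because the `binder` must name a ligand chain, a quick pre-flight check can catch a mismatch before a job is submitted. The helper below is a hypothetical sketch and not part of the `openprotein` client; it only inspects the plain dictionaries and chain ids described in the steps above.

```python
def check_affinity_binders(properties, ligand_chain_ids):
    """Return the binder ids requested in `properties`, raising on a bad one."""
    binders = []
    for prop in properties:
        affinity = prop.get("affinity")
        if affinity is None:
            continue  # not an affinity request
        binder = affinity.get("binder")
        if not isinstance(binder, str):
            raise ValueError("binder must be a single chain id string")
        if binder not in ligand_chain_ids:
            raise ValueError(f"binder {binder!r} is not a ligand chain id")
        binders.append(binder)
    return binders


print(check_affinity_binders([{"affinity": {"binder": "C"}}], ["C"]))
```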

In [4]:
# Request the fold, including an affinity prediction for our ligand.
fold_job = session.fold.boltz2.fold(
    proteins=proteins,
    ligands=ligands,
    properties=[{"affinity": {"binder": "C"}}]
)
fold_job

FoldJob(num_records=1, job_id='ab5f4f6f-6acc-4dcb-a008-523fdea02c5b', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 21, 14, 15, 1, 358781, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

The call returns a `FoldComplexResultFuture` object immediately. This is
a reference to your job running on the OpenProtein platform. You can
monitor its status or wait for it to complete.

In [5]:
# Wait for the job to finish
fold_job.wait_until_done(verbose=True)

Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [09:26<00:00,  5.66s/it, status=SUCCESS]


True

# Retrieving the Results

Once the job is complete, you can retrieve the various outputs from the
future object.

## Getting the Structure File
The primary result is the predicted
structure, which you can retrieve as an mmCIF file. mmCIF is currently the only output format implemented for Boltz.

In [6]:
# Get the result as an mmCIF bytestring
result = fold_job.get()

print('\n'.join(result.decode().splitlines()[500:510])) # Print a few lines

ATOM 46 O O . ASN A 1 7 ? -9.37756 -6.15156 -7.90108 1 68.851 ? 7 A 1
ATOM 47 C CB . ASN A 1 7 ? -8.77594 -5.10971 -10.70351 1 68.851 ? 7 A 1
ATOM 48 C CG . ASN A 1 7 ? -8.48088 -3.63107 -10.55621 1 68.851 ? 7 A 1
ATOM 49 O OD1 . ASN A 1 7 ? -8.00682 -3.16609 -9.515 1 68.851 ? 7 A 1
ATOM 50 N ND2 . ASN A 1 7 ? -8.77729 -2.86805 -11.60447 1 68.851 ? 7 A 1
ATOM 51 N N . VAL A 1 8 ? -7.36867 -5.19251 -7.56931 1 78.911 ? 8 A 1
ATOM 52 C CA . VAL A 1 8 ? -7.60041 -4.94319 -6.14561 1 78.911 ? 8 A 1
ATOM 53 C C . VAL A 1 8 ? -7.8049 -3.45969 -5.84513 1 78.911 ? 8 A 1
ATOM 54 O O . VAL A 1 8 ? -7.64426 -3.02562 -4.70211 1 78.911 ? 8 A 1
ATOM 55 C CB . VAL A 1 8 ? -6.46385 -5.52776 -5.2703 1 78.911 ? 8 A 1
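
In the excerpt above, the value after the occupancy column (68.851 for ASN 7, 78.911 for VAL 8) lines up with the per-residue pLDDT values printed later in this tutorial (0.6885 and 0.7891), scaled by 100. The sketch below splits such rows on whitespace to pull out the residue, coordinates, and that column. The column positions are an assumption read off the printed rows rather than from the file's `_atom_site` header, and `parse_atom_row` is a hypothetical helper; for real work, a dedicated mmCIF parser is the safer choice.

```python
def parse_atom_row(row):
    """Split one whitespace-delimited ATOM row (column order assumed from the excerpt)."""
    f = row.split()
    return {
        "atom": f[3],
        "residue": f[5],
        "chain": f[6],
        "seq_id": int(f[8]),
        "xyz": (float(f[10]), float(f[11]), float(f[12])),
        "b_factor": float(f[14]),
    }


rows = [
    "ATOM 47 C CB . ASN A 1 7 ? -8.77594 -5.10971 -10.70351 1 68.851 ? 7 A 1",
    "ATOM 52 C CA . VAL A 1 8 ? -7.60041 -4.94319 -6.14561 1 78.911 ? 8 A 1",
]
for atom in map(parse_atom_row, rows):
    # Dividing by 100 puts the column on the same 0-1 scale as the pLDDT array.
    print(atom["residue"], atom["seq_id"], atom["atom"], round(atom["b_factor"] / 100, 3))
```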


Visualize the structure using [molviewspec](https://github.com/molstar/mol-view-spec):

In [7]:
from molviewspec import create_builder
builder = create_builder()
structure = builder.download(url="mystructure.cif")\
    .parse(format="mmcif")\
    .model_structure()\
    .component()\
    .representation()\
    .color(color="blue")
builder.molstar_notebook(data={'mystructure.cif': result}, width=500, height=400)


## Getting Confidence Metrics (pLDDT, PAE, PDE, and Confidence Score)

Boltz provides AlphaFold3-style confidence metrics: pLDDT, PAE, PDE, and an aggregate confidence score.

- **pLDDT** (predicted Local Distance Difference Test)  
  A per-residue confidence score indicating how reliably each residue's local structure is predicted. It is commonly reported on a 0–100 scale; the values returned here are on a 0.0–1.0 scale.

- **PAE** (Predicted Aligned Error)  
  An N × N matrix estimating the expected positional error of one residue when the prediction is aligned on another, useful for assessing relative positions (e.g., of domains or chains).

- **PDE** (Predicted Distance Error)  
  An N × N matrix estimating the expected error in the predicted distance between each pair of residues. Like PAE it is a pairwise metric, but it does not depend on an alignment frame.

- **Overall confidence score**  
  A single aggregate score between 0 and 1, intended for ranking predictions. In the output below it is consistent with `0.8 * complex_plddt + 0.2 * iptm`. The `confidence` result also reports interface scores (pTM, ipTM, `ligand_iptm`, `protein_iptm`), complex-level pLDDT and PDE averages, and per-chain and per-chain-pair scores.

Together, pLDDT, the pairwise PAE and PDE matrices, and the summary confidence score let you evaluate confidence at the residue level as well as in global or interface contexts.
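
As a quick sanity check on how the aggregate score relates to its components, the sketch below re-derives `confidence_score` from the `complex_plddt` and `iptm` values shown in the confidence output later in this section. The 0.8 and 0.2 weights are an assumption based on the upstream Boltz documentation, not something the OpenProtein client states.

```python
import math

# Values copied from the confidence output in this tutorial.
complex_plddt = 0.9382572770118713
iptm = 0.9220518469810486
reported = 0.9350161552429199

# Assumed weighting: 0.8 * complex_plddt + 0.2 * iptm.
recomputed = 0.8 * complex_plddt + 0.2 * iptm
print(f"recomputed={recomputed:.6f} reported={reported:.6f}")
print("match:", math.isclose(recomputed, reported, abs_tol=1e-6))
```

The small residual difference is at the level expected from single-precision arithmetic.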


In [8]:
# Retrieve the pLDDT scores
plddt_scores = fold_job.plddt
print("pLDDT scores shape:", plddt_scores.shape)
print("First 10 scores:", plddt_scores[0, :10])

# Retrieve the PAE matrix
pae_matrix = fold_job.pae
print("\nPAE matrix shape:", pae_matrix.shape)

# Retrieve the PDE matrix
pde_matrix = fold_job.pde
print("\nPDE matrix shape:", pde_matrix.shape)

# Retrieve the confidence scores
import json
confidence_scores = fold_job.confidence
print("\nConfidence scores:", json.dumps(confidence_scores[0].model_dump(), indent=2))

pLDDT scores shape: (1, 794)
First 10 scores: [0.53493166 0.57479316 0.6206771  0.6189912  0.65894026 0.6582905
 0.68851155 0.7891059  0.873299   0.95369035]

PAE matrix shape: (1, 794, 794)

PDE matrix shape: (1, 794, 794)

Confidence scores: {
  "confidence_score": 0.9350161552429199,
  "ptm": 0.9228687882423401,
  "iptm": 0.9220518469810486,
  "ligand_iptm": 0.9667198061943054,
  "protein_iptm": 0.921159029006958,
  "complex_plddt": 0.9382572770118713,
  "complex_iplddt": 0.9464380145072937,
  "complex_pde": 0.6022099852561951,
  "complex_ipde": 1.7212265729904175,
  "chains_ptm": {
    "0": 0.9488698840141296,
    "1": 0.9453496932983398,
    "2": 0.9895499348640442
  },
  "pair_chains_iptm": {
    "0": {
      "0": 0.9488698840141296,
      "1": 0.921159029006958,
      "2": 0.850456953048706
    },
    "1": {
      "0": 0.9169709086418152,
      "1": 0.9453496932983398,
      "2": 0.7493193745613098
    },
    "2": {
      "0": 0.9667198061943054,
      "1": 0.935237467288971,
  

## Getting Predicted Binding Affinity

Since we requested it, we can
now retrieve the predicted binding affinity. The result is a
`BoltzAffinity` object containing detailed predictions.

In [9]:
# Retrieve the affinity prediction
affinity_data = fold_job.affinity

print("Affinity for binder 'C':")
print(f"  predicted: {affinity_data.affinity_pred_value}")
print(f"  probability: {affinity_data.affinity_probability_binary}")
print(f"  per model: {affinity_data.per_model}")


Affinity for binder 'C':
  predicted: -1.828442931175232
  probability: 0.9927777051925659
  per model: {'affinity_pred_value1': -2.108689308166504, 'affinity_probability_binary1': 0.9957777261734009, 'affinity_pred_value2': -1.54819655418396, 'affinity_probability_binary2': 0.9897776246070862}
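
In the output above, the top-level values look like the mean of the two per-model values, which the sketch below checks. It also converts the predicted value to an IC50, under the assumption (taken from the upstream Boltz-2 documentation, not stated in this tutorial) that `affinity_pred_value` is log10(IC50) with IC50 in µM, so lower values mean stronger predicted binding.

```python
import math

# Per-model values copied from the output above.
per_model = {
    "affinity_pred_value1": -2.108689308166504,
    "affinity_probability_binary1": 0.9957777261734009,
    "affinity_pred_value2": -1.54819655418396,
    "affinity_probability_binary2": 0.9897776246070862,
}


def mean_of(prefix):
    """Average every per-model entry whose key starts with `prefix`."""
    values = [v for k, v in per_model.items() if k.startswith(prefix)]
    return sum(values) / len(values)


mean_value = mean_of("affinity_pred_value")
mean_prob = mean_of("affinity_probability_binary")
print("mean predicted value:", mean_value)
print("mean binder probability:", mean_prob)

# Assumed unit: log10(IC50 in µM); convert to nM for readability.
ic50_um = 10 ** mean_value
print(f"IC50 estimate: {ic50_um * 1000:.1f} nM")
```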


# Next Steps

You can now examine the predicted structure, or work on binder design with [RFdiffusion](../structure-generation/Using RFdiffusion) on our platform. You can save your predicted structure like so:

In [10]:
with open("mystructure.cif", "wb") as f:
    f.write(result)