{ "cells": [ { "cell_type": "markdown", "id": "3429c855-8e94-4b13-b76d-9514aa388269", "metadata": {}, "source": [ "[![Open In Colab](../../_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_Boltz.ipynb)\n", "[![Get Notebook](../../_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-prediction/Using_Boltz.ipynb)\n", "[![View In GitHub](../../_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_Boltz.ipynb)\n", "\n", "# Using Boltz\n", "This tutorial demonstrates how to use the Boltz-2 model to predict the\n", "structure of a molecular complex, including proteins and ligands. We\n", "will also show how to request and retrieve predicted binding affinities\n", "and other quality metrics.\n", "\n", "# What you need before getting started\n", "\n", "First, ensure you have an active `OpenProtein` session. Then, import the\n", "necessary classes for defining the components of your complex." ] }, { "cell_type": "code", "execution_count": 1, "id": "aa67079e-312c-4b64-922c-d80d0b8e072b", "metadata": {}, "outputs": [], "source": [ "import openprotein\n", "from openprotein.protein import Protein\n", "from openprotein.chains import Ligand\n", "\n", "# Log in to your session\n", "session = openprotein.connect()" ] }, { "cell_type": "markdown", "id": "ec9a8864-c90f-4489-a606-8f515416c972", "metadata": {}, "source": [ "# Defining the Molecules\n", "\n", "Boltz-2 can model various molecule types, including proteins, ligands,\n", "DNA, and RNA. For this example, we'll predict the structure of a protein\n", "dimer in complex with a ligand.\n", "\n", "We will define a dimer and one ligand. 
When using Boltz models, we can\n", "specify that a `Protein` is meant to be an oligomer by specifying\n", "multiple ids in the `chain_id`. In this case, the protein is a dimer\n", "since we have `[\"A\", \"B\"]`.\n", "\n", "Note that for affinity prediction, the ligand that is binding must have\n", "a single, unique string for its `chain_id`." ] }, { "cell_type": "code", "execution_count": 2, "id": "e698e738-d5d3-42dd-b78c-a6bb109c5f68", "metadata": {}, "outputs": [], "source": [ "# Define the proteins\n", "proteins = [\n", " Protein(sequence=\"MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ\"),\n", "]\n", "proteins[0].chain_id = [\"A\", \"B\"]\n", "\n", "# You can also specify the proteins to be cyclic by setting the property\n", "# proteins[0].cyclic = True\n", "\n", "# Define the ligand\n", "# We use the three-letter code for S-adenosyl-L-homocysteine (SAH)\n", "# The chain_id 'C' is the \"binder\" we will reference later.\n", "ligands = [\n", " Ligand(ccd=\"SAH\", chain_id=\"C\")\n", "]" ] }, { "cell_type": "markdown", "id": "435b1eb0-51c1-4241-805c-eda60b30e0c0", "metadata": {}, "source": [ "# Create MSA for the Protein using Homology Search\n", "\n", "When using Boltz with protein sequences, we need to supply an MSA to\n", "help inform the model. Alternatively, we can explicitly run the model in\n", "single sequence mode. Either way, `protein.msa` must be set: to an MSA,\n", "or to `Protein.single_sequence_mode`.\n", "\n", "Here, we will be building an MSA using our platform capabilities. Take\n", "note of the syntax here: creating an MSA with a complex uses ColabFold's\n", "syntax of joining sequences with `:`." 
] }, { "cell_type": "code", "execution_count": 3, "id": "f09c738e-b96e-43bf-905d-eb12c5290344", "metadata": {}, "outputs": [], "source": [ "msa_query = []\n", "for p in proteins:\n", " if p.chain_id is not None and isinstance(p.chain_id, list):\n", " for _ in p.chain_id:\n", " msa_query.append(p.sequence.decode())\n", " else:\n", " msa_query.append(p.sequence.decode())\n", "msa = session.align.create_msa(seed=\":\".join(msa_query))\n", "\n", "for p in proteins:\n", " p.msa = msa\n", " # If desired, use single sequence mode to specify no msa\n", " # p.msa = Protein.single_sequence_mode" ] }, { "cell_type": "markdown", "id": "b015df20-d3ca-4fcf-85be-85765979737b", "metadata": {}, "source": [ "# Predicting the Complex Structure and Affinity\n", "\n", "Now, we can call the `fold` method on the Boltz-2 model.\n", "\n", "The key steps are:\n", "\n", "1. Access the model via `session.fold.boltz2`.\n", "2. Pass the defined proteins and ligands.\n", "3. To request binding affinity prediction, include the `properties`\n", " argument. This argument takes a list of dictionaries. For affinity,\n", " you specify the `binder`, which must match the `chain_id` of a\n", " ligand you defined." 
] }, { "cell_type": "code", "execution_count": 4, "id": "a6c53df3-97c3-49d4-9219-5667e1ae91d8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "FoldJob(num_records=1, job_id='ab5f4f6f-6acc-4dcb-a008-523fdea02c5b', job_type=, status=, created_date=datetime.datetime(2025, 8, 21, 14, 15, 1, 358781, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Request the fold, including an affinity prediction for our ligand.\n", "fold_job = session.fold.boltz2.fold(\n", " proteins=proteins,\n", " ligands=ligands,\n", " properties=[{\"affinity\": {\"binder\": \"C\"}}]\n", ")\n", "fold_job" ] }, { "cell_type": "markdown", "id": "9444e3b7-53b8-4e85-baec-a68c5453483e", "metadata": {}, "source": [ "The call returns a `FoldComplexResultFuture` object immediately. This is\n", "a reference to your job running on the OpenProtein platform. You can\n", "monitor its status or wait for it to complete." 
] }, { "cell_type": "code", "execution_count": 5, "id": "e9ca9947-5e66-45c4-a278-a02361c5abfb", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [09:26<00:00, 5.66s/it, status=SUCCESS]\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Wait for the job to finish\n", "fold_job.wait_until_done(verbose=True)" ] }, { "cell_type": "markdown", "id": "9d5bc3d3-2124-4f41-b2a8-4fde0b24c2a9", "metadata": {}, "source": [ "# Retrieving the Results\n", "\n", "Once the job is complete, you can retrieve the various outputs from the\n", "future object.\n", "\n", "## Getting the Structure File\n", "The primary result is the predicted\n", "structure, which you can retrieve as a mmCIF file. Note that we only implemented mmCIF output format for Boltz." ] }, { "cell_type": "code", "execution_count": 6, "id": "f16952ef-a1a6-4319-a8ba-17a858e99288", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ATOM 46 O O . ASN A 1 7 ? -9.37756 -6.15156 -7.90108 1 68.851 ? 7 A 1\n", "ATOM 47 C CB . ASN A 1 7 ? -8.77594 -5.10971 -10.70351 1 68.851 ? 7 A 1\n", "ATOM 48 C CG . ASN A 1 7 ? -8.48088 -3.63107 -10.55621 1 68.851 ? 7 A 1\n", "ATOM 49 O OD1 . ASN A 1 7 ? -8.00682 -3.16609 -9.515 1 68.851 ? 7 A 1\n", "ATOM 50 N ND2 . ASN A 1 7 ? -8.77729 -2.86805 -11.60447 1 68.851 ? 7 A 1\n", "ATOM 51 N N . VAL A 1 8 ? -7.36867 -5.19251 -7.56931 1 78.911 ? 8 A 1\n", "ATOM 52 C CA . VAL A 1 8 ? -7.60041 -4.94319 -6.14561 1 78.911 ? 8 A 1\n", "ATOM 53 C C . VAL A 1 8 ? -7.8049 -3.45969 -5.84513 1 78.911 ? 8 A 1\n", "ATOM 54 O O . VAL A 1 8 ? -7.64426 -3.02562 -4.70211 1 78.911 ? 8 A 1\n", "ATOM 55 C CB . VAL A 1 8 ? -6.46385 -5.52776 -5.2703 1 78.911 ? 
8 A 1\n" ] } ], "source": [ "# Get the result as an mmCIF bytestring\n", "result = fold_job.get()\n", "\n", "print('\\n'.join(result.decode().splitlines()[500:510])) # Print a few lines" ] }, { "cell_type": "markdown", "id": "cdab1798-d544-4eca-89b8-262377f3b8da", "metadata": {}, "source": [ "Visualize the structure using [molviewspec](https://github.com/molstar/mol-view-spec)." ] }, { "cell_type": "code", "execution_count": 7, "id": "e12f68b4-70e4-4eea-835d-9d40dc45ec0b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", " setTimeout(function(){\n", " var wrapper = document.getElementById(\"molstar_c59f4a4b-1349-48b7-914e-d58c58c4e2d1\")\n", " if (wrapper === null) {\n", " throw new Error(\"Wrapper element #molstar_c59f4a4b-1349-48b7-914e-d58c58c4e2d1 not found anymore\")\n", " }\n", " var blob = new Blob([\"\\n\\n \\n