{ "cells": [ { "cell_type": "markdown", "id": "3429c855-8e94-4b13-b76d-9514aa388269", "metadata": {}, "source": [ "[![Open In Colab](/_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_Boltz_1_and_Boltz_2.ipynb)\n", "[![Get Notebook](/_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-prediction/Using_Boltz_1_and_Boltz_2.ipynb)\n", "[![View In GitHub](/_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_Boltz_1_and_Boltz_2.ipynb)\n", "\n", "# Using Boltz-1 and Boltz-2\n", "This tutorial demonstrates how to use the Boltz-2 model to predict the\n", "structure of a molecular complex, including proteins and ligands. We\n", "will also show how to request and retrieve predicted binding affinities\n", "and other quality metrics.\n", "\n", "## What you need before getting started\n", "\n", "First, ensure you have an active `OpenProtein` session. Then, import the\n", "necessary classes for defining the components of your complex." ] }, { "cell_type": "code", "execution_count": 1, "id": "aa67079e-312c-4b64-922c-d80d0b8e072b", "metadata": {}, "outputs": [], "source": [ "import openprotein\n", "from openprotein.molecules import Complex, Protein, Ligand\n", "\n", "# Log in to your session\n", "session = openprotein.connect()" ] }, { "cell_type": "markdown", "id": "ec9a8864-c90f-4489-a606-8f515416c972", "metadata": {}, "source": [ "## Defining the Molecules\n", "\n", "Boltz-2 can model various molecule types, including proteins, ligands,\n", "DNA, and RNA. For this example, we'll predict the structure of a protein\n", "dimer in complex with a ligand.\n", "\n", "We will define a dimer and one ligand. 
To do this, we will create a [Complex](../api-reference/molecules.rst#openprotein.molecules.Complex) from a dictionary that maps chain IDs to their chains.\n", "\n", "Note that for affinity prediction, the binding ligand must be\n", "a single, unique ligand in the complex." ] }, { "cell_type": "code", "execution_count": 2, "id": "e698e738-d5d3-42dd-b78c-a6bb109c5f68", "metadata": {}, "outputs": [], "source": [ "# Define the molecular complex to predict\n", "# Start with the protein in a homodimer\n", "protein = Protein(sequence=\"MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ\")\n", "\n", "# You can also specify the protein to be cyclic by setting the property\n", "# protein.cyclic = True\n", "\n", "# Define the ligand in our complex\n", "ligand = Ligand(ccd=\"SAH\")\n", "\n", "# Assemble the complex: chains A and B are the two copies of the protein\n", "complex = Complex({\n", "    \"A\": protein,\n", "    \"B\": protein,\n", "    \"C\": ligand,\n", "})" ] }, { "cell_type": "markdown", "id": "435b1eb0-51c1-4241-805c-eda60b30e0c0", "metadata": {}, "source": [ "## Create MSA for the Protein using Homology Search\n", "\n", "When using Boltz with protein sequences, each protein needs either an\n", "MSA to help inform the model or an explicit request to run in single\n", "sequence mode. Set `protein.msa` to an MSA, or to\n", "`Protein.single_sequence_mode`.\n", "\n", "Here, we will build an MSA using our platform capabilities. Note\n", "the syntax here: creating an MSA for a complex uses ColabFold's\n", "syntax of joining sequences with `:`." 
] }, { "cell_type": "code", "execution_count": 3, "id": "f09c738e-b96e-43bf-905d-eb12c5290344", "metadata": {}, "outputs": [], "source": [ "# Collect the protein sequences and join them with \":\" (ColabFold syntax)\n", "msa_query = []\n", "for p in complex.get_proteins().values():\n", "    msa_query.append(p.sequence)\n", "msa = session.align.create_msa(seed=b\":\".join(msa_query))\n", "\n", "# Attach the MSA to every protein chain in the complex\n", "for p in complex.get_proteins().values():\n", "    p.msa = msa\n", "    # If desired, use single sequence mode to specify no msa\n", "    # p.msa = Protein.single_sequence_mode" ] }, { "cell_type": "markdown", "id": "b015df20-d3ca-4fcf-85be-85765979737b", "metadata": {}, "source": [ "## Predicting the Complex Structure and Affinity\n", "\n", "Now, we can call the `fold` method on the Boltz-2 model.\n", "\n", "The key steps are:\n", "\n", "1. Access the model via `session.fold.boltz2` (or `session.fold.boltz1` or `session.fold.boltz1x`).\n", "2. Pass the assembled `Complex`, containing the proteins and the ligand, in a list via the `sequences` argument.\n", "3. To request binding affinity prediction, include the `properties`\n", "   argument. This argument takes a list of dictionaries. For affinity,\n", "   you specify the `binder` as the `chain_id` of the\n", "   ligand you defined. 
(Note that Boltz-1 doesn't support affinity.)" ] }, { "cell_type": "code", "execution_count": 4, "id": "a6c53df3-97c3-49d4-9219-5667e1ae91d8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "FoldJob(num_records=1, job_id='7bce7fe5-e946-4ae8-a9aa-bde6e2b7b0c0', job_type=, status=, created_date=datetime.datetime(2026, 1, 16, 12, 56, 7, 411147, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Request the fold, including an affinity prediction for our ligand.\n", "fold_job = session.fold.boltz2.fold(\n", " sequences=[complex], # list for batch requests\n", " properties=[{\"affinity\": {\"binder\": \"C\"}}]\n", ")\n", "fold_job" ] }, { "cell_type": "markdown", "id": "9444e3b7-53b8-4e85-baec-a68c5453483e", "metadata": {}, "source": [ "The call returns a `FoldResultFuture` object immediately. This is\n", "a reference to your job running on the OpenProtein platform. You can\n", "monitor its status or wait for it to complete." 
] }, { "cell_type": "code", "execution_count": 5, "id": "e9ca9947-5e66-45c4-a278-a02361c5abfb", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 938.65it/s, status=SUCCESS]\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Wait for the job to finish\n", "fold_job.wait_until_done(verbose=True)" ] }, { "cell_type": "markdown", "id": "9d5bc3d3-2124-4f41-b2a8-4fde0b24c2a9", "metadata": {}, "source": [ "## Retrieving the Results\n", "\n", "Once the job is complete, you can retrieve the various outputs from the future object.\n", "\n", "### Getting the Structure\n", "The primary result is the [Structure](../api-reference/molecules.rst#openprotein.molecules.Structure), which contains the parsed molecular structure from the Boltz inference. A `Structure` can hold multiple [Complex](../api-reference/molecules.rst#openprotein.molecules.Complex)es, each of which can hold multiple different chains, including [Protein](../api-reference/molecules.rst#openprotein.molecules.Protein)s, which in turn hold the predicted 3D coordinates of their atoms.\n", "\n", "The number of `Complex`es in the resulting `Structure` depends on the `diffusion_samples` parameter in the request.\n", "\n", "The result is a `list` because the API supports submitting multiple `Complex`es for prediction; each entry corresponds to a submitted `Complex`, in submission order." 
] }, { "cell_type": "code", "execution_count": 6, "id": "f16952ef-a1a6-4319-a8ba-17a858e99288", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicted structures: []\n", "Predicted molecular complex: \n", "Predicted protein A:\n", " 0 SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA\n", "\n", "60 SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL\n", "\n", "120 SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT\n", "\n", "180 SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL\n", "\n", "240 SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD\n", "\n", "300 SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI\n", "\n", "360 SEQUENCE LLARRATEPSAVPEGQASENLYFQ\n", "Predicted protein B:\n", " 0 SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA\n", "\n", "60 SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL\n", "\n", "120 SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT\n", "\n", "180 SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL\n", "\n", "240 SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD\n", "\n", "300 SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI\n", "\n", "360 SEQUENCE LLARRATEPSAVPEGQASENLYFQ\n", "Predicted ligand C:\n", " Ligand(ccd='SAH', smiles=None, _structure_block=)\n" ] } ], "source": [ "result = fold_job.get()\n", "structure = result[0]\n", "predicted_complex = structure[0]\n", "print(\"Predicted structures:\", result)\n", "print(\"Predicted molecular complex:\", result[0][0])\n", "print(\"Predicted protein A:\\n\", predicted_complex.get_protein(\"A\"))\n", "print(\"Predicted protein B:\\n\", predicted_complex.get_protein(\"B\"))\n", "print(\"Predicted ligand C:\\n\", predicted_complex.get_ligand(\"C\"))" ] }, { "cell_type": "markdown", "id": 
"cdab1798-d544-4eca-89b8-262377f3b8da", "metadata": {}, "source": [ "Visualize the structure using [molviewspec](https://github.com/molstar/mol-view-spec)." ] }, { "cell_type": "code", "execution_count": 7, "id": "e12f68b4-70e4-4eea-835d-9d40dc45ec0b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: molviewspec in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (1.7.0)\n", "Requirement already satisfied: pydantic<3,>=1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from molviewspec) (2.12.5)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.41.5 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (2.41.5)\n", "Requirement already satisfied: typing-extensions>=4.14.1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (4.15.0)\n", "Requirement already satisfied: typing-inspection>=0.4.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.4.2)\n", "Note: you may need to restart the kernel to use updated packages.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", " setTimeout(function(){\n", " var wrapper = document.getElementById(\"molstar_8d0b49cd-fdcb-4fa7-929e-2f746554db33\")\n", " if (wrapper === null) {\n", " throw new Error(\"Wrapper element #molstar_8d0b49cd-fdcb-4fa7-929e-2f746554db33 not found anymore\")\n", " }\n", " var blob = new Blob([\"\\n\\n \\n