{ "cells": [ { "cell_type": "markdown", "id": "4ff86973-c86d-49e3-9b1b-9f9f1f251b76", "metadata": {}, "source": [ "[![Open In Colab](/_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-generation/Using_RFdiffusion.ipynb)\n", "[![Get Notebook](/_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-generation/Using_RFdiffusion.ipynb)\n", "[![View In GitHub](/_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-generation/Using_RFdiffusion.ipynb)\n", "\n", "# Using RFdiffusion\n", "This tutorial shows you how to use the RFdiffusion model to design novel\n", "protein structures.\n", "\n", "The examples here are largely taken from the [original\n", "documentation](https://github.com/RosettaCommons/RFdiffusion), adapted to\n", "show how they can be run on the OpenProtein platform, where they can then\n", "be combined with our other workflows!\n", "\n", "Full credit for the examples and use cases goes to the authors of\n", "RFdiffusion!\n", "\n", "## Unconditional monomer design\n", "\n", "The most basic use of RFdiffusion is unconditional design of a protein\n", "structure of a given length. You need two things:\n", "\n", "1. Length of the protein\n", "2. 
Number of designs `N` desired" ] }, { "cell_type": "code", "execution_count": 1, "id": "189f1c09-7a74-4e7e-8717-9acac9f2767f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import openprotein\n", "session = openprotein.connect()\n", "session" ] }, { "cell_type": "code", "execution_count": 2, "id": "051f019d-da3f-4abe-b103-cf0e0970d146", "metadata": {}, "outputs": [], "source": [ "length = 150\n", "N = 3" ] }, { "cell_type": "markdown", "id": "6a5f098a-e782-451b-9518-98e56030ad3f", "metadata": {}, "source": [ "Now let's get the model handle and inspect the signature of its `generate` method:" ] }, { "cell_type": "code", "execution_count": 3, "id": "185570f8-1043-4b26-9692-3036fb357e54", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\u001b[31mSignature:\u001b[39m\n", "rfdiffusion.generate(\n", " query: str | bytes | openprotein.molecules.protein.Protein | openprotein.molecules.complex.Complex | openprotein.prompt.models.Query | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " contigs: int | str | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " structure_file: str | bytes | typing.BinaryIO | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " N: int = \u001b[32m1\u001b[39m,\n", " inpaint_seq: str | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " provide_seq: str | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " hotspot: str | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " T: int | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " partial_T: int | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " use_active_site_model: bool | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " use_beta_model: bool | 
\u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " symmetry: Optional[Literal[\u001b[33m'cyclic'\u001b[39m, \u001b[33m'dihedral'\u001b[39m, \u001b[33m'tetrahedral'\u001b[39m]] = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " order: int | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " add_potential: bool | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " scaffold_target_structure_file: str | bytes | typing.BinaryIO | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " scaffold_target_use_struct: bool = \u001b[38;5;28;01mFalse\u001b[39;00m,\n", " **kwargs,\n", ") -> openprotein.models.foundation.rfdiffusion.RFdiffusionFuture\n", "\u001b[31mDocstring:\u001b[39m\n", "Run a protein structure generate job using RFdiffusion.\n", "\n", "Parameters\n", "----------\n", "query : str or bytes or Protein or Complex or Query, optional\n", " A query representing the design specification. Use either `query` or `contigs`\n", " for default design. Or provide `scaffold_target_structure_file`\n", " for scaffold guided design.\n", " `query` provides a unified way to represent design specifications on the\n", " OpenProtein platform. In this case, the structure mask of the containing Complex\n", " proteins are specified to be designed. Other parameters like binding are passed\n", " as hotspots to RFdiffusion.\n", "contigs : int, str, optional\n", " Defines the lengths and connectivity of chain segments for the desired\n", " structure, specified in RFdiffusion's contig string format.\n", " Required for most design tasks. 
Example: 150, '10-20/A100-110/10-20' for a\n", " binder design.\n", "structure_file : BinaryIO, optional\n", " An input PDB file (as a file-like object) used for inpainting or other\n", " guided design tasks where parts of an existing structure are provided.\n", "n : int, optional\n", " The number of unique design trajectories to run (default is 1).\n", "inpaint_seq : str, optional\n", " A string specifying the regions in the input structure to mask for\n", " in-painting. Example: 'A1-A10/A30-40'.\n", "provide_seq : str, optional\n", " A string specifying which segments of the contig have a provided\n", " sequence. Example: 'A1-A10/A30-40'.\n", "hotspot : str, optional\n", " A string specifying hotspot residues to constrain during design,\n", " typically for functional sites. Example: 'A10,A12,A14'.\n", "T : int, optional\n", " The number of timesteps for the diffusion process.\n", "partial_T : int, optional\n", " The number of timesteps for partial diffusion.\n", "use_active_site_model : bool, optional\n", " If True, uses the active site model checkpoint, which has been finetuned to\n", " better keep very small motifs in place in the output for motif scaffolding\n", " (default is False).\n", "use_beta_model : bool, optional\n", " If True, uses the complex beta model checkpoint, which generates a\n", " greater diversity of topologies but has not been extensively\n", " experimentally validated (default is False).\n", "symmetry : {\"cyclic\", \"dihedral\", \"tetrahedral\"}, optional\n", " The type of symmetry to apply to the design.\n", "order : int, optional\n", " The order of the symmetry (e.g., 3 for C3 or D3 symmetry).\n", " Must be provided if `symmetry` is set.\n", "add_potential : bool, optional\n", " A flag to toggle an additional potential to guide the design.\n", " This defaults to true in the case of symmetric design.\n", "scaffold_target_structure_file : str, bytes, BinaryIO, optional\n", " A PDB file (which can be the text string or bytes or the 
file-like\n", " object) containing a scaffold structure to be used as a structural\n", " guide. It could also be used as a target when doing scaffold guided\n", " binder design with `scaffold_target_use_struct`.\n", "scaffold_target_use_struct : bool, optional\n", " Whether or not to use the provided scaffold structure as a target.\n", " Otherwise, it is used only as a topology guide.\n", "\n", "Other Parameters\n", "----------------\n", "**kwargs : dict\n", " Additional keyword args that are passed directly to the rfdiffusion\n", " inference script. Overwrites any preceding options.\n", "\n", "Returns\n", "-------\n", "RFdiffusionFuture\n", " A future object that can be used to retrieve the results of the design\n", " job upon completion.\n", "\u001b[31mFile:\u001b[39m ~/Projects/openprotein/openprotein-python-private/openprotein/models/foundation/rfdiffusion.py\n", "\u001b[31mType:\u001b[39m method" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "rfdiffusion = session.models.rfdiffusion\n", "rfdiffusion.generate?" ] }, { "cell_type": "markdown", "id": "3038ce42-fd1f-40c5-9a26-ee43d057d9b7", "metadata": {}, "source": [ "We can run designs in a unified manner using the `Query` object, which represents a `Protein` that has been masked in some way. Since RFdiffusion generates protein structures, we mask the structure of our `Protein` to tell the model which parts of the structure to generate.\n", "\n", "For unconditional monomer design, we can simply create a protein chain of `length` residues, all of them unknown. [Protein.from_expr](../api-reference/molecules.rst#openprotein.molecules.Protein.from_expr) provides a convenient shorthand for constructing such chains." 
] }, { "cell_type": "code", "execution_count": 4, "id": "0d05d987-484e-4bce-9c03-18076aec501c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'\n", "structure mask: [ True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True True True True True True True\n", " True True True True True True]\n" ] } ], "source": [ "from openprotein.molecules import Protein\n", "\n", "unconditional_monomer = Protein.from_expr(length)\n", "print(\"sequence:\", unconditional_monomer.sequence)\n", "print(\"structure mask:\", unconditional_monomer.get_structure_mask())" ] }, { "cell_type": "markdown", "id": "b5a3c36e-dbf4-4339-a3c1-0058a3deb13e", "metadata": {}, "source": [ "The output above shows that our query protein has both its sequence and its structure fully masked, with the expected length of 150 residues. Since RFdiffusion only generates structures, and the query's structure is fully masked, the outputs will be unconditionally generated monomers." 
] }, { "cell_type": "markdown", "id": "a5c650a0-c717-4dd9-aa82-13aeff60ce90", "metadata": {}, "source": [ "Run the design using RFdiffusion:" ] }, { "cell_type": "code", "execution_count": 5, "id": "b85f1b2e-8be2-44b3-9eba-e2a55cc9af05", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RFdiffusionJob(job_id='846bce61-dbc2-4fc1-9f48-105cbf67ff22', job_type='/models/rfdiffusion', status=, created_date=datetime.datetime(2026, 1, 17, 12, 30, 26, 607171, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "design = rfdiffusion.generate(N=N, query=unconditional_monomer)\n", "design" ] }, { "cell_type": "markdown", "id": "d14493a7-c253-4296-80ed-44d90642a907", "metadata": {}, "source": [ "Wait for the job to finish running with `wait_until_done`." ] }, { "cell_type": "code", "execution_count": 6, "id": "f7e8dddd-01d0-4cfe-ab51-01a33fda7ff0", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [05:31<00:00, 3.31s/it, status=SUCCESS]\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "design.wait_until_done(verbose=True, timeout=600)" ] }, { "cell_type": "markdown", "id": "b545f01b-80d0-43cb-9450-016213d53f90", "metadata": {}, "source": [ "Retrieve the designs as a list of `N` [Complex](../api-reference/molecules.rst#openprotein.molecules.Complex) objects. `Complex` objects represent multimers and can hold multiple protein (and other) chains. Since this is a monomer design, each returned `Complex` contains a single protein chain, `A`. Let's look at the first one." 
] }, { "cell_type": "code", "execution_count": 7, "id": "509a700a-4079-4db0-906d-3efbf6286f78", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'\n", "structure mask: [False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False False False False False False False\n", " False False False False False False]\n" ] } ], "source": [ "unconditional_monomer_designs = design.get()\n", "unconditional_monomer_design = unconditional_monomer_designs[0]\n", "print(\"sequence:\", unconditional_monomer_design.get_protein(\"A\").sequence)\n", "print(\"structure mask:\", unconditional_monomer_design.get_protein(\"A\").get_structure_mask())" ] }, { "cell_type": "markdown", "id": "d1c76bf1-afd4-4b99-a3e4-f1df81eefec7", "metadata": {}, "source": [ "From the above, we can see that the structure mask is now all `False` (no positions are masked), which means RFdiffusion has generated a structure for every residue. 
The sequence, however, is still fully masked (all `X`); a sequence for the designed backbone can then be inferred with [ProteinMPNN](../api-reference/models.rst#openprotein.models.ProteinMPNNModel). See our walkthrough on [binder design](../../walkthroughs/Protein_protein_binder_design_with_RFdiffusion.ipynb) for more information on the full workflow.\n", "\n", "Now let's visualize this:" ] }, { "cell_type": "code", "execution_count": 8, "id": "e4580109-1260-4978-849c-a41e6a57251c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: molviewspec in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (1.7.0)\n", "Requirement already satisfied: pydantic<3,>=1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from molviewspec) (2.12.5)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.41.5 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (2.41.5)\n", "Requirement already satisfied: typing-extensions>=4.14.1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (4.15.0)\n", "Requirement already satisfied: typing-inspection>=0.4.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.4.2)\n", "Note: you may need to restart the kernel to use updated packages.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", " setTimeout(function(){\n", " var wrapper = document.getElementById(\"molstar_b8697ded-2c70-4af2-bf3f-03613052f15c\")\n", " if (wrapper === null) {\n", " throw new Error(\"Wrapper element #molstar_b8697ded-2c70-4af2-bf3f-03613052f15c not found anymore\")\n", " }\n", " var blob = new Blob([\"\\n\\n \\n