{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "ODSNVxS33to7" }, "source": [ "[![Open In Colab](/_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)\n", "[![Get Notebook](/_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)\n", "[![View In GitHub](/_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)\n", "\n", "# Using AlphaFold2\n", "\n", "This tutorial shows you how to use the AlphaFold2 model to predict the 3D structure of your protein sequence or complex of interest. We recommend using AlphaFold2 with multi-chain sequences. If you have a single-chain sequence, please visit [Using ESMFold](./Using_ESMFold.ipynb). If you have ligands or DNA/RNA of interest, please try [Using Boltz](./Using_Boltz.ipynb) instead." ] }, { "cell_type": "markdown", "metadata": { "id": "AzVrAlxU4daB" }, "source": [ "## What you need before getting started\n", "\n", "Specify a sequence or complex of interest whose structure you want to predict. This example uses [1SPD](https://www.rcsb.org/structure/1SPD).\n", "\n", "We specify a [Complex](../api-reference/molecules.rst#openprotein.molecules.Complex) so that we can attach the [MSA](../api-reference/align.rst#openprotein.align.MSAFuture) that provides AlphaFold2 with evolutionary context." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "PyscZwF53tat" }, "outputs": [], "source": [ "import openprotein\n", "from openprotein.molecules import Complex, Protein\n", "\n", "# Login to your session\n", "session = openprotein.connect()\n", "\n", "# Specify your complex\n", "complex = Complex({\n", " \"A\": Protein(\"XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ\"),\n", " \"B\": Protein(\"XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ\")\n", "})\n", "\n", "# We can also directly use a ':'-delimited string as well if we run in single sequence mode, i.e. no MSA.\n", "# complex = \"XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ:XATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSAGPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVHEKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ\"" ] }, { "cell_type": "markdown", "metadata": { "id": "Iw_a4bMQ4-qO" }, "source": [ "## Getting the Model\n", "\n", "Start by getting the [AlphaFold2 model](../api-reference/fold.rst#openprotein.fold.AlphaFold2Model) object:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "4u743HGr5SHx" }, "outputs": [ { "data": { "text/plain": [ "\u001b[31mSignature:\u001b[39m\n", "afmodel.fold(\n", " sequences: Union[Sequence[openprotein.molecules.complex.Complex | openprotein.molecules.protein.Protein | str], openprotein.align.msa.MSAFuture, NoneType] = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " num_recycles: int | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " num_models: int = \u001b[32m1\u001b[39m,\n", " num_relax: int = \u001b[32m0\u001b[39m,\n", " **kwargs,\n", ") -> 
openprotein.fold.future.FoldResultFuture\n", "\u001b[31mDocstring:\u001b[39m\n", "Post sequences to alphafold model.\n", "\n", "Parameters\n", "----------\n", "sequences : List[Complex | Protein | str] | MSAFuture\n", " List of protein sequences to include in folded output. `Protein` objects must be tagged with an `msa`, which can be a `Protein.single_sequence_mode` for single sequence mode. Alternatively, supply an `MSAFuture` to use all query sequences as a multimer.\n", "num_recycles : int\n", " number of times to recycle models\n", "num_models : int\n", " number of models to train - best model will be used\n", "num_relax : int\n", " maximum number of iterations for relax\n", "\n", "Returns\n", "-------\n", "job : Job\n", "\u001b[31mFile:\u001b[39m ~/Projects/openprotein/openprotein-python-private/openprotein/fold/alphafold2.py\n", "\u001b[31mType:\u001b[39m method" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "afmodel = session.fold.alphafold2\n", "afmodel.fold?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can review some of the metadata about the AlphaFold2 model. 
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cIOg97ke5nZC", "outputId": "743a05b2-ddd5-4df0-8810-6043aa27301e" }, "outputs": [ { "data": { "text/plain": [ "ModelMetadata(id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='AlphaFold2 model.'), max_sequence_length=2400, dimension=-1, output_types=['fold'], input_tokens=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', 'X', 'O', 'U', 'B', 'Z', '-'], output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, 
description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "afmodel.metadata" ] }, { "cell_type": "markdown", "metadata": { "id": "pJc2qaOsgNbj" }, "source": [ "# Creating an MSA using Homology Search\n", "\n", "When using AlphaFold2 with protein sequences, we need to supply an MSA to\n", "help inform the model. Alternatively, we can explicitly run the model in\n", "single sequence mode. Either way, each protein's `msa` attribute must be set\n", "to an MSA or to `Protein.single_sequence_mode`. Here we create the MSA using the platform's homology search.\n", "\n", "Use our complex as the seed sequence to create an MSA:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zVnKV40u5okT", "outputId": "2f2f80ba-2fc7-4d91-958c-f027d8af0253" }, "outputs": [ { "data": { "text/plain": [ "MSAJob(job_id='7b5e5586-245d-4019-a30f-c8eea90882b4', job_type=, status=, created_date=datetime.datetime(2026, 1, 16, 17, 13, 7, 523305, tzinfo=TzInfo(0)), start_date=None, end_date=datetime.datetime(2026, 1, 16, 17, 13, 7, 523396, tzinfo=TzInfo(0)), prerequisite_job_id=None, progress_message=None, progress_counter=None, sequence_length=None)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Collect each chain's sequence and join them with ':' to form the seed\n", "msa_query = []\n", "for p in complex.get_proteins().values():\n", "    msa_query.append(p.sequence)\n", "msa = session.align.create_msa(seed=b\":\".join(msa_query))\n", "\n", "# Attach the MSA to every chain in the complex\n", "for p in complex.get_proteins().values():\n", "    p.msa = msa\n", "    # Alternatively, use single sequence mode (no MSA)\n", "    # p.msa = Protein.single_sequence_mode\n", "\n", "msa" ] }, 
{ "cell_type": "markdown", "metadata": { "id": "lRmwiyti5vPI" }, "source": [ "We can either wait for the MSA job to complete, or schedule the fold job right away; it will then run automatically once the MSA is done." ] }, { "cell_type": "markdown", "metadata": { "id": "HSNIK5fn55zx" }, "source": [ "# Predicting the Complex Structure\n", "\n", "Call the AlphaFold2 `fold` method with our complex; it returns a job to await. We also set `num_models` to 3." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7N6TsTlo6ASx", "outputId": "661e6de6-0f43-4a0b-b289-7dec7f65099d" }, "outputs": [ { "data": { "text/plain": [ "FoldJob(num_records=1, job_id='240ec08e-7c47-4ccd-a12f-7ea969b0cd9b', job_type=, status=, created_date=datetime.datetime(2026, 1, 16, 17, 14, 41, 8933, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "af2_fold = afmodel.fold(sequences=[complex], num_models=3)\n", "\n", "af2_fold" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XlTPun2F6M3r", "outputId": "ad264864-68bd-415f-b1e6-1f07c356f30b" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|█████████████████████████████████████████████████| 100/100 [05:39<00:00, 3.40s/it, status=SUCCESS]\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "af2_fold.wait_until_done(verbose=True, timeout=900)" ] }, { "cell_type": "markdown", "metadata": { "id": "GJN1ZoNV6oNK" }, "source": [ "You can also wait for the job to complete and fetch the results in a single call with `wait()`." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Retrieving the Results\n", "\n", "## Getting the Structure\n", "The primary result is the [Structure](../api-reference/molecules.rst#openprotein.molecules.Structure) 
which contains the parsed molecular structure from the AlphaFold2 inference. The `Structure` object itself can hold multiple [Complex](../api-reference/molecules.rst#openprotein.molecules.Complex)s, which in turn can hold multiple different chains, including [Protein](../api-reference/molecules.rst#openprotein.molecules.Protein)s, which themselves hold the predicted 3D coordinates of their atoms.\n", "\n", "The number of `Complex`es in the resulting `Structure` is determined by the `num_models` parameter in the request; since we set it to 3, we expect 3 predicted `Complex`es.\n", "\n", "The result is a `list` because the API supports submitting multiple `Complex`es for prediction, and each entry corresponds to a submitted `Complex`, in submission order." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "AH0VmP016h4_", "outputId": "cf8b003d-27b7-4abe-bfdf-c4b973146987" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicted structures: []\n", "Predicted molecular complex: \n", "Predicted protein A:\n", " 0 SEQUENCE ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSA\n", "\n", "60 SEQUENCE GPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVH\n", "\n", "120 SEQUENCE EKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ\n", "Predicted protein B:\n", " 0 SEQUENCE ATKAVCVLKGDGPVQGIINFEQKESNGPVKVWGSIKGLTEGLHGFHVHEFGDNTAGCTSA\n", "\n", "60 SEQUENCE GPHFNPLSRKHGGPKDEERHVGDLGNVTADKDGVADVSIEDSVISLSGDHCIIGRTLVVH\n", "\n", "120 SEQUENCE EKADDLGKGGNEESTKTGNAGSRLACGVIGIAQ\n" ] } ], "source": [ "result = af2_fold.get()\n", "structure = result[0]\n", "predicted_complex = structure[0]\n", "print(\"Predicted structures:\", result)\n", "print(\"Predicted molecular complex:\", result[0][0])\n", "print(\"Predicted protein A:\\n\", predicted_complex.get_protein(\"A\"))\n", "print(\"Predicted protein B:\\n\", predicted_complex.get_protein(\"B\"))" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "Visualize the structure using [molviewspec](https://github.com/molstar/mol-view-spec):" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: molviewspec in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (1.7.0)\n", "Requirement already satisfied: pydantic<3,>=1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from molviewspec) (2.12.5)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.41.5 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (2.41.5)\n", "Requirement already satisfied: typing-extensions>=4.14.1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (4.15.0)\n", "Requirement already satisfied: typing-inspection>=0.4.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.4.2)\n", "Note: you may need to restart the kernel to use updated packages.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", " setTimeout(function(){\n", " var wrapper = document.getElementById(\"molstar_58b605d8-fc54-40a5-bc74-85305e11fb54\")\n", " if (wrapper === null) {\n", " throw new Error(\"Wrapper element #molstar_58b605d8-fc54-40a5-bc74-85305e11fb54 not found anymore\")\n", " }\n", " var blob = new Blob([\"\\n\\n \\n