{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "ODSNVxS33to7" }, "source": [ "[![Open In Colab](../../_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)\n", "[![Get Notebook](../../_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)\n", "[![View In GitHub](../../_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_AlphaFold2.ipynb)\n", "\n", "# Using AlphaFold2\n", "\n", "This tutorial shows you how to use the AlphaFold2 model to create a PDB of your protein sequence of interest. We recommend using AlphaFold2 with multi-chain sequences. If you have a single-chain sequence, please visit [Using ESMFold](./Using_ESMFold.ipynb). If you have ligands or DNA/RNA of interest, please try [Using Boltz](./Using_Boltz.ipynb) instead." ] }, { "cell_type": "markdown", "metadata": { "id": "AzVrAlxU4daB" }, "source": [ "## What you need before getting started\n", "\n", "Specify a sequence of interest whose structure you want to predict. 
The example used here is interleukin 2:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "PyscZwF53tat" }, "outputs": [], "source": [ "import openprotein\n", "\n", "# Login to your session\n", "session = openprotein.connect()\n", "\n", "# Specify your sequence\n", "sequence = \"MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP\"" ] }, { "cell_type": "markdown", "metadata": { "id": "Iw_a4bMQ4-qO" }, "source": [ "## Creating an MSA\n", "\n", "AlphaFold2 requires evolutionary context from a multiple sequence alignment (MSA) to make structure predictions. This section demonstrates how to create an MSA based on the sequence you wish to fold.\n", "\n", "Start by getting the alphafold model object:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "4u743HGr5SHx" }, "outputs": [ { "data": { "text/plain": [ "\u001b[31mSignature:\u001b[39m\n", "afmodel.fold(\n", " proteins: list[openprotein.protein.Protein] | openprotein.align.msa.MSAFuture | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " num_recycles: int | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", " num_models: int = \u001b[32m1\u001b[39m,\n", " num_relax: int = \u001b[32m0\u001b[39m,\n", " **kwargs,\n", ") -> openprotein.fold.future.FoldComplexResultFuture\n", "\u001b[31mDocstring:\u001b[39m\n", "Post sequences to alphafold model.\n", "\n", "Parameters\n", "----------\n", "proteins : List[Protein] | MSAFuture\n", " List of protein sequences to fold. `Protein` objects must be tagged with an `msa`. 
Alternatively, supply an `MSAFuture` to use all query sequences as a multimer.\n", "num_recycles : int\n", " number of times to recycle models\n", "num_models : int\n", " number of models to train - best model will be used\n", "num_relax : int\n", " maximum number of iterations for relax\n", "\n", "Returns\n", "-------\n", "job : Job\n", "\u001b[31mFile:\u001b[39m ~/Projects/openprotein/openprotein-python-private/openprotein/fold/alphafold2.py\n", "\u001b[31mType:\u001b[39m method" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "afmodel = session.fold.get_model('alphafold2')\n", "afmodel.fold?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can review the AlphaFold2 model's metadata. Note that `input_tokens` is `None` because the model accepts an MSA rather than raw sequences." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cIOg97ke5nZC", "outputId": "743a05b2-ddd5-4df0-8810-6043aa27301e" }, "outputs": [ { "data": { "text/plain": [ "ModelMetadata(id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='alphafold2 model.'), max_sequence_length=2400, dimension=-1, output_types=['fold'], input_tokens=None, output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, 
token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "afmodel.metadata" ] }, { "cell_type": "markdown", "metadata": { "id": "pJc2qaOsgNbj" }, "source": [ "Use your seed sequence to create an MSA:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zVnKV40u5okT", "outputId": "2f2f80ba-2fc7-4d91-958c-f027d8af0253" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "job_id='df4da7b0-55ac-4db7-8cca-a7a52d5911bc' job_type= status= created_date=datetime.datetime(2025, 8, 21, 7, 36, 6, 317723) start_date=None end_date=datetime.datetime(2025, 8, 21, 7, 36, 6, 317880) prerequisite_job_id=None progress_message=None progress_counter=None sequence_length=None\n" ] } ], "source": [ "msa = session.align.create_msa(sequence.encode())\n", "print(msa)" ] }, { "cell_type": "markdown", "metadata": { "id": "lRmwiyti5vPI" }, "source": [ "Examine the outputs once the MSA is complete:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { 
"base_uri": "https://localhost:8080/" }, "id": "km8owclo5yPU", "outputId": "a239e432-8e41-4074-e534-8ae0a23a445b" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 381.79it/s, status=SUCCESS]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[('101', 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'), ('UniRef100_G1RE34\\t243\\t0.764\\t2.142E-68\\t0\\t138\\t239\\t0\\t152\\t153', 'MYRMQLLSCIALSLALVTNGAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVQELKGSETTFMCEyadetativeflnrWITFCQSIISTLT----------------------------------------------------------------------------------------------------'), ('UniRef100_A0A2K5MA48\\t234\\t0.753\\t1.582E-65\\t0\\t138\\t239\\t0\\t153\\t154', 'MYRMQLLSCIALSLALVANSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRdTKDLISNINVIVLELKGSETTLMCEyadetativeflnrWITFCQSIISTLT----------------------------------------------------------------------------------------------------')]\n" ] } ], "source": [ "msa.wait_until_done(verbose=True)\n", "\n", "print(list(msa.get())[0:3])" ] }, { "cell_type": "markdown", "metadata": { "id": "HSNIK5fn55zx" }, "source": [ "## Predicting your sequence\n", "\n", "Call the AlphaFold2 model by sending the MSA to the fold endpoint and return a `fold` job to await:\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7N6TsTlo6ASx", "outputId": "661e6de6-0f43-4a0b-b289-7dec7f65099d" }, "outputs": [ { "data": { "text/plain": [ "FoldJob(num_records=1, job_id='4e50f2d1-f921-46ac-8d23-cdcebba3ebbb', job_type=, 
status=, created_date=datetime.datetime(2025, 8, 21, 7, 36, 8, 793708, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fold = afmodel.fold(msa, num_models=1)\n", "\n", "fold" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XlTPun2F6M3r", "outputId": "ad264864-68bd-415f-b1e6-1f07c356f30b" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:52<00:00, 1.91it/s, status=SUCCESS]\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fold.wait_until_done(verbose=True, timeout=900)" ] }, { "cell_type": "markdown", "metadata": { "id": "GJN1ZoNV6oNK" }, "source": [ "Wait for the job to complete and fetch the results in a single call with `wait()`:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "AH0VmP016h4_", "outputId": "cf8b003d-27b7-4abe-bfdf-c4b973146987" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 517.18it/s, status=SUCCESS]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "ATOM 80 C CB . ILE A ? 10 ? -22.625 -1.933 1.770 1.0 58.75 10 A 1 \n", "ATOM 81 O O . ILE A ? 10 ? -21.391 -1.501 -1.284 1.0 58.75 10 A 1 \n", "ATOM 82 C CG1 . ILE A ? 10 ? -22.359 -2.760 3.031 1.0 58.75 10 A 1 \n", "ATOM 83 C CG2 . ILE A ? 10 ? -23.484 -2.715 0.774 1.0 58.75 10 A 1 \n", "ATOM 84 C CD1 . ILE A ? 10 ? 
-23.609 -3.113 3.818 1.0 58.75 10 A 1 \n", "ATOM 85 N N . ALA A ? 11 ? -21.844 0.433 -0.271 1.0 55.09 11 A 1 \n", "ATOM 86 C CA . ALA A ? 11 ? -22.062 1.217 -1.481 1.0 55.09 11 A 1 \n", "ATOM 87 C C . ALA A ? 11 ? -20.781 1.376 -2.287 1.0 55.09 11 A 1 \n", "ATOM 88 C CB . ALA A ? 11 ? -22.641 2.586 -1.127 1.0 55.09 11 A 1 \n", "ATOM 89 O O . ALA A ? 11 ? -20.797 1.257 -3.514 1.0 55.09 11 A 1 \n" ] } ], "source": [ "result = fold.wait(verbose=True)\n", "print(\"\\n\".join(result.decode().splitlines()[100:110]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Visualize the structure using [molviewspec](https://github.com/molstar/mol-view-spec):" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting molviewspec\n", " Downloading molviewspec-1.6.0-py3-none-any.whl.metadata (10 kB)\n", "Requirement already satisfied: pydantic<3,>=1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from molviewspec) (2.11.4)\n", "Requirement already satisfied: annotated-types>=0.6.0 in 
/home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.33.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (2.33.2)\n", "Requirement already satisfied: typing-extensions>=4.12.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (4.13.2)\n", "Requirement already satisfied: typing-inspection>=0.4.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.4.0)\n", "Downloading molviewspec-1.6.0-py3-none-any.whl (31 kB)\n", "Installing collected packages: molviewspec\n", "Successfully installed molviewspec-1.6.0\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install molviewspec" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", " setTimeout(function(){\n", " var wrapper = document.getElementById(\"molstar_b4e94b85-5687-417e-999e-c9349311360d\")\n", " if (wrapper === null) {\n", " throw new Error(\"Wrapper element #molstar_b4e94b85-5687-417e-999e-c9349311360d not found anymore\")\n", " }\n", " var blob = new Blob([\"\\n\\n \\n