{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "FeuwvvUs0yqk" }, "source": [ "[![Open In Colab](/_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_ESMFold.ipynb)\n", "[![Get Notebook](/_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-prediction/Using_ESMFold.ipynb)\n", "[![View In GitHub](/_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_ESMFold.ipynb)\n", "\n", "# Using ESMFold\n", "\n", "This tutorial shows you how to use the ESMFold model to create a predicted 3D structure of your protein sequence of interest. We recommend using ESMFold with single-chain sequences. If you have a multi-chain sequence, please try [Using AlphaFold2](./Using_AlphaFold2.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "id": "8HtB4VsvwU4H" }, "source": [ "## What you need before getting started\n", "\n", "Specify a sequence of interest whose structure you want to predict. 
The example used here is Interleukin 2:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "0QdAOCsW40pY" }, "outputs": [], "source": [ "import openprotein\n", "\n", "# Login to your session\n", "session = openprotein.connect()\n", "\n", "sequence = \"MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting the Model" ] }, { "cell_type": "markdown", "metadata": { "id": "xV2gH7gxhbBw" }, "source": [ "Create the model object for ESMFold:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "XM5ksfsDhaot" }, "outputs": [ { "data": { "text/plain": [ "\u001b[31mSignature:\u001b[39m\n", "esmfoldmodel.fold(\n", " sequences: Sequence[openprotein.molecules.complex.Complex | openprotein.molecules.protein.Protein | str | bytes],\n", " num_recycles: int | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", ") -> openprotein.fold.future.FoldResultFuture\n", "\u001b[31mDocstring:\u001b[39m\n", "Fold sequences using this model.\n", "\n", "Parameters\n", "----------\n", "sequences : Sequence[bytes | str]\n", " sequences to fold\n", "num_recycles : int | None\n", " number of times to recycle models\n", "Returns\n", "-------\n", " FoldResultFuture\n", "\u001b[31mFile:\u001b[39m ~/Projects/openprotein/openprotein-python-private/openprotein/fold/esmfold.py\n", "\u001b[31mType:\u001b[39m method" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "esmfoldmodel = session.fold.esmfold\n", "esmfoldmodel.fold?" ] }, { "cell_type": "markdown", "metadata": { "id": "wUS6G7001sV2" }, "source": [ "## Predicting your sequence\n", "\n", "Call ESMFold on your sequence. 
The `num_recycles` hyperparameter sets how many times the model refines the structure, with each cycle taking the previous cycle’s output as its input. This parameter accepts integers between 1 and 48.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "xcGPjlCmhqqN" }, "source": [ "Send the sequence of interest to ESMFold for folding.\n", "\n", "Note that we can submit a [Complex](../api-reference/molecules.rst#openprotein.molecules.Complex),\n", "a [Protein](../api-reference/molecules.rst#openprotein.molecules.Protein), \n", "or just the sequence itself, which represents a single `Protein`. We can also submit a \n", "list of sequences to batch the request." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "s8HB2XFt19oP", "outputId": "4b71fa9d-c77b-4b64-d228-429d4a7c3fad" }, "outputs": [ { "data": { "text/plain": [ "FoldJob(num_records=1, job_id='f3370817-5fdd-400d-bd26-a90805cd9f4a', job_type=, status=, created_date=datetime.datetime(2026, 1, 16, 16, 18, 19, 489442, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Submit one sequence (as bytes) for folding with a single recycle\n", "esm = esmfoldmodel.fold([sequence.encode()], num_recycles=1)\n", "\n", "esm" ] }, { "cell_type": "markdown", "metadata": { "id": "AZO2b3L3hu32" }, "source": [ "Wait for the job to complete with `wait_until_done()`:\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "lbhB5V9S2B8d", "outputId": "2721d4ae-d271-4474-90ab-642d649e266b" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|█████████████████████████████████████████████████| 100/100 [04:40<00:00, 2.80s/it, status=SUCCESS]\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 4, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "esm.wait_until_done(verbose=True, timeout=300)" ] }, { "cell_type": "markdown", "metadata": { "id": "FA4bnBXb272e" }, "source": [ "## Retrieving the Results\n", "\n", "### Getting the Predicted Structure\n", "\n", "Fetch the results with `get()`.\n", "\n", "`get()` returns a list of [Structure](../api-reference/molecules.rst#openprotein.molecules.Structure) objects, \n", "one for each input in the request.\n", "\n", "From each `Structure` we can access its [Complex](../api-reference/molecules.rst#openprotein.molecules.Complex),\n", "and from that the folded `Protein` of interest." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4JvewKs43S2j", "outputId": "a95c59cb-7bf1-4ca2-f8ac-a49f3f9360c1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predicted structure: \n", "Predicted protein sequence: b'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'\n" ] } ], "source": [ "result = esm.get()\n", "structure = result[0]\n", "complex = structure[0]\n", "protein = complex.get_protein(\"A\") # chains are auto-named in alphabetical order\n", "\n", "print(\"Predicted structure:\", structure)\n", "print(\"Predicted protein sequence:\", protein.sequence)" ] }, { "cell_type": "markdown", "metadata": { "id": "yzAO2KTYh5yy" }, "source": [ "Visualize the structure using [molviewspec](https://github.com/molstar/mol-view-spec):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: molviewspec in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (1.7.0)\n", "Requirement already satisfied: 
pydantic<3,>=1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from molviewspec) (2.12.5)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.41.5 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (2.41.5)\n", "Requirement already satisfied: typing-extensions>=4.14.1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (4.15.0)\n", "Requirement already satisfied: typing-inspection>=0.4.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.4.2)\n", "Note: you may need to restart the kernel to use updated packages.\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", " setTimeout(function(){\n", " var wrapper = document.getElementById(\"molstar_60eb9152-f2dc-4286-8eaf-c55645ecd0b8\")\n", " if (wrapper === null) {\n", " throw new Error(\"Wrapper element #molstar_60eb9152-f2dc-4286-8eaf-c55645ecd0b8 not found anymore\")\n", " }\n", " var blob = new Blob([\"\\n\\n \\n