{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "FeuwvvUs0yqk" }, "source": [ "[![Open In Colab](../../_static/colab-badge.svg)](https://colab.research.google.com/github/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_ESMFold.ipynb)\n", "[![Get Notebook](../../_static/get-notebook-badge.svg)](https://raw.githubusercontent.com/OpenProteinAI/openprotein-docs/refs/heads/main/source/python-api/structure-prediction/Using_ESMFold.ipynb)\n", "[![View In GitHub](../../_static/view-in-github-badge.svg)](https://github.com/OpenProteinAI/openprotein-docs/blob/main/source/python-api/structure-prediction/Using_ESMFold.ipynb)\n", "\n", "# Using ESMFold\n", "\n", "This tutorial shows you how to use the ESMFold model to predict the structure of a protein sequence of interest and retrieve it in PDB format. We recommend using ESMFold with single-chain sequences. For multi-chain sequences, see [Using AlphaFold2](./Using_AlphaFold2.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "id": "8HtB4VsvwU4H" }, "source": [ "## What you need before getting started\n", "\n", "Specify a sequence of interest whose structure you want to predict. The example used here is interleukin-2:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "0QdAOCsW40pY" }, "outputs": [], "source": [ "import openprotein\n", "\n", "# Log in to your session\n", "session = openprotein.connect()\n", "\n", "sequence = \"MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP\"" ] }, { "cell_type": "markdown", "metadata": { "id": "wUS6G7001sV2" }, "source": [ "## Predicting your sequence\n", "\n", "Call ESMFold on your sequence. The `num_recycles` hyperparameter allows the model to further refine structures using the previous cycle’s output as the new cycle’s input. 
This parameter accepts integers between 1 and 48.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "xV2gH7gxhbBw" }, "source": [ "Create the model object for ESMFold:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "XM5ksfsDhaot" }, "outputs": [ { "data": { "text/plain": [ "\u001b[31mSignature:\u001b[39m\n", "esmfoldmodel.fold(\n", " sequences: collections.abc.Sequence[bytes | str],\n", " num_recycles: int | \u001b[38;5;28;01mNone\u001b[39;00m = \u001b[38;5;28;01mNone\u001b[39;00m,\n", ") -> openprotein.fold.future.FoldResultFuture\n", "\u001b[31mDocstring:\u001b[39m\n", "Fold sequences using this model.\n", "\n", "Parameters\n", "----------\n", "sequences : Sequence[bytes | str]\n", " sequences to fold\n", "num_recycles : int | None\n", " number of times to recycle models\n", "Returns\n", "-------\n", " FoldResultFuture\n", "\u001b[31mFile:\u001b[39m ~/Projects/openprotein/openprotein-python-private/openprotein/fold/esmfold.py\n", "\u001b[31mType:\u001b[39m method" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "esmfoldmodel = session.fold.get_model('esmfold')\n", "esmfoldmodel.fold?" 
] }, { "cell_type": "markdown", "metadata": { "id": "xcGPjlCmhqqN" }, "source": [ "Send the sequence of interest to ESMFold for folding:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "s8HB2XFt19oP", "outputId": "4b71fa9d-c77b-4b64-d228-429d4a7c3fad" }, "outputs": [ { "data": { "text/plain": [ "FoldJob(num_records=1, job_id='184e52a3-7eb5-4105-890e-9dcf41525382', job_type=, status=, created_date=datetime.datetime(2025, 8, 21, 8, 59, 51, 384873, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "esm = esmfoldmodel.fold([sequence.encode()], num_recycles=1)\n", "\n", "esm" ] }, { "cell_type": "markdown", "metadata": { "id": "AZO2b3L3hu32" }, "source": [ "Wait for the job to complete with `wait_until_done()`:\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "lbhB5V9S2B8d", "outputId": "2721d4ae-d271-4474-90ab-642d649e266b" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [04:57<00:00, 2.98s/it, status=SUCCESS]\n" ] }, { "data": { "text/plain": [ "True" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "esm.wait_until_done(verbose=True, timeout=300)" ] }, { "cell_type": "markdown", "metadata": { "id": "FA4bnBXb272e" }, "source": [ "Fetch the results with `get()`." ] }, { "cell_type": "markdown", "metadata": { "id": "JAnnNiyDh8DJ" }, "source": [ "Each result is a tuple containing the query sequence and the contents of the predicted structure file.\n", "Note that ESMFold returns results in PDB format:" ] }, { "cell_type": "code", 
"execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4JvewKs43S2j", "outputId": "a95c59cb-7bf1-4ca2-f8ac-a49f3f9360c1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ATOM 10 CA TYR A 2 -0.479 -21.837 -10.591 1.00 69.89 C \n", "ATOM 11 C TYR A 2 -1.672 -21.177 -9.910 1.00 51.02 C \n", "ATOM 12 CB TYR A 2 0.209 -20.836 -11.525 1.00 53.86 C \n", "ATOM 13 O TYR A 2 -1.519 -20.513 -8.882 1.00 48.75 O \n", "ATOM 14 CG TYR A 2 1.660 -21.154 -11.794 1.00 49.22 C \n", "ATOM 15 CD1 TYR A 2 2.645 -20.852 -10.856 1.00 49.69 C \n", "ATOM 16 CD2 TYR A 2 2.048 -21.757 -12.985 1.00 51.99 C \n", "ATOM 17 CE1 TYR A 2 3.983 -21.141 -11.101 1.00 50.78 C \n", "ATOM 18 CE2 TYR A 2 3.384 -22.051 -13.240 1.00 47.59 C \n", "ATOM 19 OH TYR A 2 5.666 -22.029 -12.540 1.00 42.47 O \n" ] } ], "source": [ "result = esm.get()\n", "result = result[0][1]\n", "print(\"\\n\".join(result.decode().splitlines()[10:20]))" ] }, { "cell_type": "markdown", "metadata": { "id": "yzAO2KTYh5yy" }, "source": [ "Visualize the structure using [molviewspec](https://github.com/molstar/mol-view-spec)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement 
already satisfied: molviewspec in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (1.6.0)\n", "Requirement already satisfied: pydantic<3,>=1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from molviewspec) (2.11.4)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.33.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (2.33.2)\n", "Requirement already satisfied: typing-extensions>=4.12.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (4.13.2)\n", "Requirement already satisfied: typing-inspection>=0.4.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.4.0)\n", "Note: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "%pip install molviewspec" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KaIXHXOT3YLz", "outputId": "91db94e7-b544-42a7-83b5-485a7bcc2996" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/javascript": [ "\n", " setTimeout(function(){\n", " var wrapper = document.getElementById(\"molstar_3875edd2-452f-469f-bd9a-a1b7df184793\")\n", " if (wrapper === null) {\n", " throw new Error(\"Wrapper element #molstar_3875edd2-452f-469f-bd9a-a1b7df184793 not found anymore\")\n", " }\n", " var blob = new Blob([\"\\n\\n \\n