{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "c9QFJ338xsQP" }, "source": [ "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/combine-org/combine-notebooks/main?labpath=%2Fnotebooks%2Fsedml.ipynb)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "_4H3KfaPxsQT" }, "source": [ "# Simple SBOL example" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "mZjQHFnZxsQX" }, "source": [ "This notebook creates a simple model in [Synthetic Biology Open Language (SBOL) Version 3.0.1](https://sbolstandard.org/docs/SBOL3.0.1.pdf). SBOL is a free and open-source standard for the representation of biological designs. SBOL uses existing Semantic Web practices and resources, such as Uniform Resource Identifiers (URIs) and ontologies, to unambiguously identify and define genetic design elements. We create a simple sequence based on this [notebook](https://github.com/RudgeLab/LOICA/blob/master/notebooks/LOICA_DEMO_Operators.ipynb). We then create a diagram of the sequence (see below) using the SBOL Visual standard. SBOL Visual aims to organize and systematize conventions in order to produce a coherent language for expressing the structure and function of genetic designs.\n", "\n", "
" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "tF-cOdCmxsQZ" }, "source": [ "## 1) Including libraries and setup" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "gDrGE7GHcpfi" }, "source": [ "Note: Please change the `colab` flag to `True` if you are using Google Colab." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "FFJJyAbIxsQb" }, "outputs": [], "source": [ "colab = False\n", "if colab:\n", " %pip install git+https://github.com/combine-org/combine-notebooks\n", " %pip install pyflapjack --quiet\n", " %pip install loica --quiet\n", " %pip install sbol3 --quiet\n", " %pip install sbol_utilities --quiet" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QpeLhn1exsQf", "outputId": "2864754c-764a-42de-b8c7-0e8d31a0f448" }, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import sbol3\n", "from sbol_utilities import component\n", "import parasbolv as psv\n", "import matplotlib.pyplot as plt\n", "\n", "from combine_notebooks import RESULTS_DIR, GLYPH_DIR\n", "\n", "%matplotlib inline" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "6ykS0hotxsQj" }, "source": [ "## 2) Declaring the SBOL model" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "mGFfPu52xsQp" }, "source": [ "In this section we build a valid SBOL Document containing the sequences and parts used in the design. We create the J23101 Operator as an Engineered Region composed of the promoter J23101 and the BASIC RBS1 (derived from BBa_B0033). We also create the GFP GeneProduct as an Engineered Region, composed of the GFP CDS, the M0050 degradation tag, and the rrnB T1 terminator. First, we declare the SBOL document and set the namespace." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": { "id": "FC_Plc2nxsQr" }, "outputs": [], "source": [ "doc = sbol3.Document()\n", "sbol3.set_namespace('https://github.com/Gonza10V')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "YhxZ47w9xsQs" }, "source": [ "Next, we will create a promoter. A synthetic promoter is a sequence of DNA that does not exist in nature and which has been designed to control gene expression of a target gene." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "id": "JonkziJkxsQt" }, "outputs": [], "source": [ "j23101, j23101_seq = component.promoter('J23101','tttacagctagctcagtcctaggtattatgctagc ', description='https://synbiohub.org/public/igem/BBa_J23101/1')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "Mjr67Nr2xsQu" }, "source": [ "Now, we will create a ribosome binding site (RBS), which is a sequence of nucleotides upstream of the start codon of an mRNA transcript that is responsible for the recruitment of a ribosome during the initiation of translation." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "id": "-IXdm_WExsQu" }, "outputs": [], "source": [ "rbs1, rbs1_seq = component.rbs('RBS1', 'ttgaacaccgtcTCAGGTAAGTATCAGTTGTAAatcacacaggacta', description='BASIC Linker RBS1')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "pXjx7lV3xsQv" }, "source": [ "Define the promoter and RBS as our first engineered region." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "id": "ewZcVtd_xsQw" }, "outputs": [], "source": [ "op_j23101 = component.engineered_region('operator_ptet', [j23101,rbs1], description= 'LOICA Operator J23101')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "1B3lYwtFxsQx" }, "source": [ "The next step is to create a CoDing Sequence (CDS). This is a region of DNA or RNA whose sequence determines the sequence of amino acids in a protein." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "id": "jLLQIAHZxsQx" }, "outputs": [], "source": [ "gfpm3, gfpm3_seq = component.cds('GFP_mut3', 'atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcggttatggtgttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtccacacaatctgccctttcgaaagatcccaacgaaaagcgcgaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaa', description='GFP mut3 Coding Sequence, no BsaI site, no stop codon')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "HrAx6Y2vxsQy" }, "source": [ "Next, we add a degradation tag: a sequence that makes the resulting protein susceptible to very fast degradation." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "id": "IvYz701JxsQz" }, "outputs": [], "source": [ "m0050, m0050_seq = component.protein_stability_element('M0050', 'gctgctaacgacgaaaactacgctctggctgctTAAattgaacta', description='http://parts.igem.org/wiki/index.php?title=Part:BBa_M0050')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "sklG38m8xsQ0" }, "source": [ "Finally, we will create a terminator sequence. Terminators are genetic parts that usually occur at the end of a gene or operon and cause transcription to stop." 
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "id": "RaR9J5chxsQ1" }, "outputs": [], "source": [ "ter1, ter1_seq = component.terminator('TER1', 'GTCCatttgtcctactcaggagagcgttcaccgacaaacaacagataaaacgaaaggcccagtctttcgactgagcctttcgttttatttgTAAGGCTCG', description='rrnB T1 terminator from Potvin-Trottier pLPT119, extra stop codon')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "G5eESkDvxsQ1" }, "source": [ "Define the CDS, stability element, and terminator as our second engineered region." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "id": "8p9dXOq4xsQ2" }, "outputs": [], "source": [ "gp_gfp = component.engineered_region(f'geneproduct_{gfpm3.display_id}_{m0050.display_id}_{ter1.display_id}', [gfpm3, m0050, ter1], description='LOICA GeneProduct GFP')" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "xq_PVU4mxsQ3" }, "source": [ "Add everything we have just created to the SBOL document." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "hiPHV3cKxsQ4", "outputId": "1cdd6e80-1f7c-41b9-c6fc-7516c4cf4c38" }, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ,\n", " ]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "doc.add([j23101, j23101_seq, rbs1, rbs1_seq, op_j23101, gfpm3, gfpm3_seq, m0050, m0050_seq, ter1, ter1_seq, gp_gfp])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "dfrFOgv8xsQ4" }, "source": [ "## 3) Validate, save and display SBOL document" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "28ExKwqUxsQ4" }, "source": [ "Print out all object identities in the SBOL document. Then validate the document." 
] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "kcl1Ypo8xsQ4", "outputId": "adffbd1c-ae83-4154-96b4-8040f59d5ab0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://github.com/Gonza10V/J23101\n", "https://github.com/Gonza10V/J23101_seq\n", "https://github.com/Gonza10V/RBS1\n", "https://github.com/Gonza10V/RBS1_seq\n", "https://github.com/Gonza10V/operator_ptet\n", "https://github.com/Gonza10V/GFP_mut3\n", "https://github.com/Gonza10V/GFP_mut3_seq\n", "https://github.com/Gonza10V/M0050\n", "https://github.com/Gonza10V/M0050_seq\n", "https://github.com/Gonza10V/TER1\n", "https://github.com/Gonza10V/TER1_seq\n", "https://github.com/Gonza10V/geneproduct_GFP_mut3_M0050_TER1\n", "0\n" ] } ], "source": [ "for obj in doc.objects:\n", " print(obj.identity)\n", "report_sbol3 = doc.validate()\n", "print(len(report_sbol3))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "lHyPoHKOCUHs" }, "source": [ "We can save this SBOL as an XML file which is in RDF format. An RDF file is a document written in the Resource Description Framework (RDF) language, which is used to represent information about resources on the web. As you can see in the text, most tags are pointing to a specific URI on the internet." 
] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "nwF8XYxwCTXV", "outputId": "12744983-d1d6-4cd1-ce1b-78b0026439cf" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", " \n", " TER1\n", " rrnB T1 terminator from Potvin-Trottier pLPT119, extra stop codon\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Constraint1\n", " \n", " \n", " \n", " \n", " \n", " \n", " Constraint1\n", " \n", " \n", " \n", " \n", " \n", " \n", " GFP_mut3_seq\n", " \n", " \n", " atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttgaattagatggtgatgttaatgggcacaaattttctgtcagtggagagggtgaaggtgatgcaacatacggaaaacttacccttaaatttatttgcactactggaaaactacctgttccatggccaacacttgtcactactttcggttatggtgttcaatgctttgcgagatacccagatcatatgaaacagcatgactttttcaagagtgccatgcccgaaggttatgtacaggaaagaactatatttttcaaagatgacgggaactacaagacacgtgctgaagtcaagtttgaaggtgatacccttgttaatagaatcgagttaaaaggtattgattttaaagaagatggaaacattcttggacacaaattggaatacaactataactcacacaatgtatacatcatggcagacaaacaaaagaatggaatcaaagttaacttcaaaattagacacaacattgaagatggaagcgttcaactagcagaccattatcaacaaaatactccaattggcgatggccctgtccttttaccagacaaccattacctgtccacacaatctgccctttcgaaagatcccaacgaaaagcgcgaccacatggtccttcttgagtttgtaacagctgctgggattacacatggcatggatgaactatacaaa\n", " \n", " \n", " \n", " M0050_seq\n", " \n", " \n", " gctgctaacgacgaaaactacgctctggctgctTAAattgaacta\n", " \n", " \n", " \n", " GFP_mut3\n", " GFP mut3 Coding Sequence, no BsaI site, no stop codon\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " SubComponent1\n", " \n", " \n", " \n", " \n", " SubComponent1\n", " \n", " \n", " \n", " \n", " J23101\n", " https://synbiohub.org/public/igem/BBa_J23101/1\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " TER1_seq\n", " \n", " \n", " GTCCatttgtcctactcaggagagcgttcaccgacaaacaacagataaaacgaaaggcccagtctttcgactgagcctttcgttttatttgTAAGGCTCG\n", " \n", " \n", " \n", " RBS1_seq\n", " \n", " \n", " ttgaacaccgtcTCAGGTAAGTATCAGTTGTAAatcacacaggacta\n", " \n", " \n", 
" \n", " SubComponent3\n", " \n", " \n", " \n", " \n", " M0050\n", " http://parts.igem.org/wiki/index.php?title=Part:BBa_M0050\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " J23101_seq\n", " \n", " \n", " tttacagctagctcagtcctaggtattatgctagc \n", " \n", " \n", " \n", " geneproduct_GFP_mut3_M0050_TER1\n", " LOICA GeneProduct GFP\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " RBS1\n", " BASIC Linker RBS1\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " operator_ptet\n", " LOICA Operator J23101\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " SubComponent2\n", " \n", " \n", " \n", " \n", " SubComponent2\n", " \n", " \n", " \n", "\n", "\n" ] } ], "source": [ "RESULTS_DIR.mkdir(parents=True, exist_ok=True)\n", "sbol_file = RESULTS_DIR / 'hello_world_sbol.xml'\n", "doc.write(sbol_file)\n", "with open(sbol_file) as f:\n", " print(f.read())" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 4) SBOL Visual" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In this section we will be using the [SBOL Visual standard](https://sbolstandard.org/docs/SBOL-Visual-3.0.pdf). SBOL Visual aims to organize and systematize diagramming conventions in order to produce a coherent language for expressing the structure and function of genetic designs. At the same time, it aims to make this language simple and easy to use, allowing a high degree of flexibility and freedom in how such diagrams are organized, presented, and styled; in particular, it is easy to create diagrams either by hand or using a wide variety of software programs.\n", "\n", "We will be using the paraSBOLv library to draw a simple diagram. SBOL Visual diagrams are made up of glyphs, each of which corresponds to an element of an SBOL design.
Below we create four glyphs in isolation, then chain them together into a construct similar to the SBOL model above.\n", "\n", "First, we will create an object to render the glyphs." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "renderer = psv.GlyphRenderer()\n", "renderer = psv.GlyphRenderer(glyph_path = str(GLYPH_DIR))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This glyph is a promoter. It is created using the `renderer.draw_glyph` function." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0, 20.0)" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "bounds, end_point = renderer.draw_glyph(ax, 'Promoter', (2.5, 2.5))\n", "\n", "ax.set_xlim([bounds[0][0] - 2.5, bounds[1][0] + 2.5])\n", "ax.set_ylim([bounds[0][1] - 2.5, bounds[1][1] + 2.5])\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This glyph is a ribosome entry site." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0, 15.000000000000004)" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "bounds, end_point = renderer.draw_glyph(ax, 'RibosomeEntrySite', (2.5, 2.5))\n", "\n", "ax.set_xlim([bounds[0][0] - 2.5, bounds[1][0] + 2.5])\n", "ax.set_ylim([bounds[0][1] - 2.5, bounds[1][1] + 2.5])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This glyph is a CoDing Sequence (CDS)." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(-7.5, 12.5)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "bounds, end_point = renderer.draw_glyph(ax, 'CDS', (2.5, 2.5))\n", "\n", "ax.set_xlim([bounds[0][0] - 2.5, bounds[1][0] + 2.5])\n", "ax.set_ylim([bounds[0][1] - 2.5, bounds[1][1] + 2.5])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This glyph is a terminator." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0, 15.0)" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots()\n", "bounds, end_point = renderer.draw_glyph(ax, 'Terminator', (2.5, 2.5))\n", "\n", "ax.set_xlim([bounds[0][0] - 2.5, bounds[1][0] + 2.5])\n", "ax.set_ylim([bounds[0][1] - 2.5, bounds[1][1] + 2.5])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now we will combine these glyphs into a single construct that starts with a promoter, ends with a terminator, and contains a ribosome entry site and a CDS in between." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from collections import namedtuple\n", "from matplotlib.pyplot import figure\n", "\n", "part_list = []\n", "Part = namedtuple('part', ['glyph_type', 'orientation', 'user_parameters', 'style_parameters'])\n", "part_list.append(Part('Promoter', 'forward', None, None))\n", "part_list.append(Part('RibosomeEntrySite', 'forward', None, None))\n", "part_list.append(Part('CDS', 'forward', None, None))\n", "part_list.append(Part('Terminator', 'forward', None, None))\n", "\n", "construct = psv.Construct(part_list, renderer)\n", "\n", "fig, ax, baseline_start, baseline_end, bounds = construct.draw(False)\n", "fig.dpi = 400\n", "\n", "ax.plot([baseline_start[0], baseline_end[0]], [baseline_start[1], baseline_end[1]], color=(0,0,0), linewidth=1.5, zorder=0)\n", "\n", "plt.savefig(str(RESULTS_DIR / 'hello_world_sbol_visual.png'))" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" } }, "nbformat": 4, "nbformat_minor": 0 }