{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# pYPK0_PGI_ScXKS1_FBA1\n",
    "\n",
    "This notebook describes the assembly of the [_Saccaromyces cerevisiae_](www.yeastgenome.org)\n",
    "single gene expression vector pYPK0_PGI_ScXKS1_FBA1.\n",
    "\n",
    "It is made by _in-vivo_ homologous recombination between three PCR products and one linear vector fragment.\n",
    "The PCR products are a promoter generated from a pYPK_Z vector, a gene from a template and \n",
    "a terminator from a pYPKa_E vector. The three PCR products are joined to\n",
    "a linearized [pYPKpw](https://github.com/BjornFJohansson/ypk-xylose-pathways/blob/master/pYPKpw.gb) \n",
    "backbone vector that has the [URA3](http://www.yeastgenome.org/locus/S000000747/overview) \n",
    "marker and a _S. crevisiae_ [2 micron](http://blog.addgene.org/plasmids-101-yeast-vectors) origin of replication. \n",
    "\n",
    "The four linear DNA fragments are joined by homologous recombination in a \n",
    "[_Saccharomyces cerevisiae_](http://wiki.yeastgenome.org/index.php/Commonly_used_strains) ura3 mutant.\n",
    "\n",
    "![pYPK0_promoter_gene_terminator](tp_g_tp.png \"pYPK0_promoter_gene_terminator\")\n",
    "\n",
    "The [pydna](https://pypi.python.org/pypi/pydna/) package is imported in the code cell below. \n",
    "There is a [publication](http://www.biomedcentral.com/1471-2105/16/142) describing pydna as well as\n",
    "[documentation](http://pydna.readthedocs.org/en/latest/) available online. \n",
    "Pydna is developed on [Github](https://github.com/BjornFJohansson/pydna)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pydna.parsers import parse_primers\n",
    "from pydna.readers import read\n",
    "from pydna.design import primer_design\n",
    "from pydna.amplify import pcr\n",
    "from pydna.assembly import Assembly"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The YPK [standard primers](standard_primers.txt) are read into a dictionary in the code cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "p = { x.id: x for x in parse_primers(\"standard_primers.txt\") }"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The backbone vector [pYPKpw](pYPKpw.gb) is read from a local file in the code cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "pYPKpw = read(\"pYPKpw.gb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The backbone vector is linearized by digestion with [EcoRV](http://rebase.neb.com/rebase/enz/EcoRV.html).\n",
    "The restriction enzyme functionality is provided by [biopython](http://biopython.org)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from Bio.Restriction import EcoRV\n",
    "\n",
    "pYPK_EcoRV = pYPKpw.linearize(EcoRV)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pYPKa derived _E. coli_ plasmids containing the [promoter](pYPKa_Z_PGI.gb) and [terminator](pYPKa_E_FBA1.gb) \n",
    "as well as the [gene template](ScXKS1.gb) sequence are read into three variables in the code cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "promoter_template   = read(\"pYPKa_Z_PGI.gb\")\n",
    "gene_template       = read(\"ScXKS1.gb\")\n",
    "terminator_template = read(\"pYPKa_E_FBA1.gb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The construction of the two vector above are described in the [pYPKa_ZE_PGI](pYPKa_ZE_PGI.ipynb) notebooks.\n",
    "\n",
    "The promoter is amplified with from [pYPKa_Z_PGI](pYPKa_Z_PGI.gb). A suggested PCR program can be found at the end of this document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "prom = pcr( p['577'], p['567'], promoter_template)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Primers with tails are needed for the recombination between the gene and the promoter/terminator PCR products.\n",
    "The tails are designed to provide 33 bp of terminal homology between the DNA fragments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "fp_tail = \"tgcccactttctcactagtgacctgcagccgacAA\"\n",
    "rp_tail = \"AAatcctgatgcgtttgtctgcacagatggCAC\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Primers with the tails above are designed for the gene template in the code cell below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "ins = primer_design(gene_template)\n",
    "fp = fp_tail + ins.forward_primer\n",
    "rp = rp_tail + ins.reverse_primer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The primers are included in the [new_primers.txt](new_primers.txt) list and in the end of the [pathway notebook](pw.ipynb) file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">fw1803 ScXKS1\n",
      "tgcccactttctcactagtgacctgcagccgacAAATGTTGTGTTCAGTAATTCAG\n",
      "\n",
      ">rv1803 ScXKS1\n",
      "AAatcctgatgcgtttgtctgcacagatggCACTTAGATGAGAGTCTTTTCCAG\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(fp.format(\"fasta\"))\n",
    "print(rp.format(\"fasta\"))\n",
    "with open(\"new_primers.txt\", \"a+\") as f:\n",
    "    f.write(fp.format(\"fasta\"))\n",
    "    f.write(rp.format(\"fasta\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The gene is amplifed using the newly designed primers. A suggested PCR program can be found at the end of this document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "gene = pcr( fp, rp, gene_template)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The terminator is amplified from [pYPKa_E_FBA1](pYPKa_E_FBA1.gb). A suggested PCR program can be found at the end of this document."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "term = pcr( p['568'], p['578'], terminator_template)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The four linear DNA fragments are mixed and transformed\n",
    "to a _Saccharomyces cerevisiae_ ura3 mutant.\n",
    "\n",
    "The fragments will be assembled by _in-vivo_ [homologous recombination](http://www.ncbi.nlm.nih.gov/pubmed/2828185):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Assembly:\n",
       "Sequences........................: [5603] [1231] [1871] [969]\n",
       "Sequences with shared homologies.: [5603] [1231] [969] [1871]\n",
       "Homology limit (bp)..............: 31\n",
       "Number of overlaps...............: 4\n",
       "Nodes in graph(incl. 5' & 3')....: 6\n",
       "Only terminal overlaps...........: No\n",
       "Circular products................: [9223]\n",
       "Linear products..................: [9467] [9364] [9256] [9256] [8531] [8166] [7418] [6693] [6328] [4005] [3069] [2807] [244] [141] [33] [33]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "asm = Assembly( (pYPK_EcoRV, prom, gene, term), limit=31 )\n",
    "\n",
    "asm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The representation of the asm object above should normally indicate one circcular product only.  \n",
    "More than one circular products might indicate an incorrect assembly strategy or represent\n",
    "by-products that might arise in the assembly process.  \n",
    "The largest recombination product is chosen as candidate for the pYPK0_PGI_ScXKS1_FBA1 vector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       " -|pYPKpw|124\n",
       "|         \\/\n",
       "|         /\\\n",
       "|         124|1231bp_PCR_prod|33\n",
       "|                             \\/\n",
       "|                             /\\\n",
       "|                             33|1871bp_PCR_prod|33\n",
       "|                                                \\/\n",
       "|                                                /\\\n",
       "|                                                33|969bp_PCR_prod|242\n",
       "|                                                                  \\/\n",
       "|                                                                  /\\\n",
       "|                                                                  242-\n",
       "|                                                                     |\n",
       " ---------------------------------------------------------------------"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "candidate = asm.assemble_circular()[0]\n",
    "\n",
    "candidate.figure()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The candidate vector is synchronized to the backbone vector. This means that\n",
    "the plasmid origin is shifted so that it matches the pYPKpw backbone vector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "result = candidate.synced(pYPKpw)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Diagnostic PCR confirmation\n",
    "\n",
    "The structure of the final vector is confirmed by two\n",
    "separate PCR reactions, one for the promoter and gene and\n",
    "one for the gene and terminator.\n",
    "\n",
    "PCR using standard primers 577 and 467 to amplify promoter and gene."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "product = pcr( p['577'], p['467'], result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A correct clone should give this size in base pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3073\n"
     ]
    }
   ],
   "source": [
    "print(len(product))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the promoter is missing from the assembly, the PCR product will have this size in base pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1842\n"
     ]
    }
   ],
   "source": [
    "print(len(product) - len(prom))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the gene is missing from the assembly, the PCR product will have this size in base pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1202\n"
     ]
    }
   ],
   "source": [
    "print(len(product) - len(gene))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PCR using standard primers 468 and 578 to amplify gene and terminator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "product2 = pcr( p['468'], p['578'], result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A correct clone should give this size:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2824\n"
     ]
    }
   ],
   "source": [
    "print(len(product2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the gene is missing from the assembly, the PCR product will have this size in base pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "953\n"
     ]
    }
   ],
   "source": [
    "print(len(product2) - len(gene))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the terminator is missing from the assembly, the PCR product will have this size in base pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1855\n"
     ]
    }
   ],
   "source": [
    "print(len(product2) - len(term))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cseguid checksum for the resulting plasmid is calculated for future reference.\n",
    "The [cseguid checksum](http://pydna.readthedocs.org/en/latest/pydna.html#pydna.utils.cseguid) \n",
    "uniquely identifies a circular double stranded sequence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "R5OEqb-2ZLrJfiIM9r80CFLI5H4"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result.cseguid()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The file is named based on the names of promoter, gene and terminator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "result.locus = \"pYPK0_tp_g_tp\"\n",
    "result.definition = \"pYPK0_PGI_ScXKS1_FBA1\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sequence is stamped with cseguid checksum. This can be used to verify the \n",
    "integrity of the sequence file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "cSEGUID_R5OEqb-2ZLrJfiIM9r80CFLI5H4"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "result.stamp()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write sequence to a local file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<font face=monospace><a href='pYPK0_PGI_ScXKS1_FBA1.gb' target='_blank'>pYPK0_PGI_ScXKS1_FBA1.gb</a></font><br>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "result.write(\"pYPK0_PGI_ScXKS1_FBA1.gb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## PCR programs for the amplification of Promoter, Gene and Terminator\n",
    "\n",
    "Promoter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "Taq (rate 30 nt/s) 35 cycles             |1231bp\n",
       "95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN\n",
       "|_________|_____          72.0°C  |72.0°C|SaltC 50mM\n",
       "| 03min00s|30s  \\         ________|______|Primer1C 1.0µM\n",
       "|         |      \\ 56.3°C/ 0min37s| 5min |Primer2C 1.0µM\n",
       "|         |       \\_____/         |      |GC 40%\n",
       "|         |         30s           |      |4-12°C"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "prom.program()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Gene"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "Taq (rate 30 nt/s) 35 cycles             |1871bp\n",
       "95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN\n",
       "|_________|_____          72.0°C  |72.0°C|SaltC 50mM\n",
       "| 03min00s|30s  \\         ________|______|Primer1C 1.0µM\n",
       "|         |      \\ 54.0°C/ 0min56s| 5min |Primer2C 1.0µM\n",
       "|         |       \\_____/         |      |GC 40%\n",
       "|         |         30s           |      |4-12°C"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gene.program()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Terminator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "Taq (rate 30 nt/s) 35 cycles             |969bp\n",
       "95.0°C    |95.0°C                 |      |Tm formula: Biopython Tm_NN\n",
       "|_________|_____          72.0°C  |72.0°C|SaltC 50mM\n",
       "| 03min00s|30s  \\         ________|______|Primer1C 1.0µM\n",
       "|         |      \\ 55.9°C/ 0min29s| 5min |Primer2C 1.0µM\n",
       "|         |       \\_____/         |      |GC 38%\n",
       "|         |         30s           |      |4-12°C"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "term.program()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download [pYPK0_PGI_ScXKS1_FBA1](pYPK0_PGI_ScXKS1_FBA1.gb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "cSEGUID_R5OEqb-2ZLrJfiIM9r80CFLI5H4"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pydna\n",
    "reloaded = read(\"pYPK0_PGI_ScXKS1_FBA1.gb\")\n",
    "reloaded.verify_stamp()\n"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}