{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "from notebook.services.config import ConfigManager\n",
    "cm = ConfigManager()\n",
    "cm.update('livereveal', {\n",
    "              'theme': 'serif',\n",
    "              'transition': 'zoom',\n",
    "              'start_slideshow_at': 'selected',\n",
    "})\n",
    "from IPython.display import IFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# dammit!\n",
    "\n",
    "### a simple transcriptome annotator\n",
    "\n",
    "--------------------------------------\n",
    "\n",
    "#### Camille Scott\n",
    "\n",
    "#### November 18, 2015\n",
    "\n",
    "#### DIB Lab Meeting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# TL;DR\n",
    "\n",
    "<img align=\"left\" src=\"doc/_static/Character_Building.png\" width=\"300\" style=\"margin:15px\">\n",
    "\n",
    "dammit is a simple de novo transcriptome annotator. It was born out of the\n",
    "observations that annotation is mundane and annoying, all the individual pieces\n",
    "of the process exist already, and the existing solutions are overly complicated\n",
    "or rely on crappy non-free software.\n",
    "\n",
    "Science shouldn't suck for the sake of sucking, so dammit attempts\n",
    "to make this sucky part of the process suck a little less."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Motivations\n",
    "\n",
    "* Annotation is a major component of the sea lamprey project\n",
    "* Many of the pieces of dammit were already implemented\n",
    "* No easy to use solutions\n",
    "* No solutions with good licensing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## What should a good annotator even look like?\n",
    "\n",
    "* It should be easy to install and upgrade\n",
    "* It should only use Free software\n",
    "* It should make use of standard databases\n",
    "* It should output in reasonable formats\n",
    "* It should be relatively fast\n",
    "\n",
    "and of course,\n",
    "\n",
    "* it should try to be correct!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Without further ado..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "        <iframe\n",
       "            width=\"800\"\n",
       "            height=\"400\"\n",
       "            src=\"http://www.camillescott.org/dammit\"\n",
       "            frameborder=\"0\"\n",
       "            allowfullscreen\n",
       "        ></iframe>\n",
       "        "
      ],
      "text/plain": [
       "<IPython.lib.display.IFrame at 0x10a5512d0>"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "IFrame('http://www.camillescott.org/dammit', width=800, height=400)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### The Obligatory Flowchart\n",
    "\n",
    "![The Workflow](flow.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Software Used\n",
    "\n",
    "* TransDecoder\n",
    "* BUSCO\n",
    "* HMMER\n",
    "* Infernal\n",
    "* LAST\n",
    "* crb-blast (for now)\n",
    "* pydoit (under the hood)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### All of these are Free Software, as in freedom and beer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Just as important, they're all *accessible* \n",
    "\n",
    "* they're maintained, and \n",
    "* relatively easy to install\n",
    "\n",
    "An exception is BUSCO, which is on a dodgy lab website and distributed as a tarball. I intent to remember this ;)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "---\n",
    "\n",
    "## Databases\n",
    "\n",
    "<img align=\"right\" src=\"https://upload.wikimedia.org/wikipedia/en/thumb/0/03/Pfam_logo.gif/600px-Pfam_logo.gif\" width=\"200\">\n",
    "\n",
    "\n",
    "* Pfam-A\n",
    "* Rfam\n",
    "* OrthoDB\n",
    "* BUSCO databases\n",
    "* Uniref90\n",
    "* User-supplied protein databases\n",
    "\n",
    "The last one is important, and sometimes ignored.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "---\n",
    "\n",
    "# Conditional Reciprocal Best LAST\n",
    "\n",
    "Building off Richard and co's work on Conditional Reciprocal Best BLAST, I've implemented a new version with Python and LAST -- CRBL. The original lives here: https://github.com/cboursnell/crb-blast\n",
    "\n",
    "## Why??\n",
    "\n",
    "* BLAST is too slooooooow\n",
    "* Ruby is yet another dependency to have users install\n",
    "* With Python and scikit learn, I have freedom to toy with models (and learn stuff)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "---\n",
    "## And why does speed matter?\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![deluge](https://upload.wikimedia.org/wikipedia/commons/0/0d/Great_Wave_off_Kanagawa2.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "---\n",
    "\n",
    "And, of course, some of these databases are BIG. Doing `blastx` and `tblastn` between a reasonably sized transcriptome and Uniref90 is not an experience you want to have.\n",
    "\n",
    "#### ie, practical concerns.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "---\n",
    "\n",
    "### A brief intro to CRBB\n",
    "\n",
    "* Reciprocal Best Hits (RBH) is a standard method for ortholog detection\n",
    "* Transcriptomes have multiple multiple transcript isoforms, which confounds RBH\n",
    "* CRBB uses machine learning to get at this problem\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "---\n",
    "#### RBH in the presence of isoforms\n",
    "\n",
    "![RBH](RBH.svg)\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "---\n",
    "\n",
    "CRBB attempts to associate those isoforms with appropriate annotations by learning an appropriate e-value cutoff for different transcript lengths.\n",
    "\n",
    "![CRBB](CRBB.png)\n",
    "\n",
    "*from http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004365#s5*\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "---\n",
    "\n",
    "### CRBL\n",
    "\n",
    "For CRBL, instead of fitting a linear model, we train a model.\n",
    "\n",
    "* SVM\n",
    "* Naive bayes\n",
    "\n",
    "One limitation is that LAST has no equivalent to `tblastn`. So, we find the RBHs using the TransDecoder ORFs, and then use the model on the translated transcriptome versus database hits.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Challenges\n",
    "\n",
    "Very difficult to establish an appropriate boundary!\n",
    "\n",
    "http://edhar.genomecenter.ucdavis.edu/~camille/dammit/pom.500.fa.x.pep.fa.crbl.model.fitted.pdf\n",
    "\n",
    "![sigh](http://edhar.genomecenter.ucdavis.edu/~camille/dammit/sigh.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "---\n",
    "\n",
    "Need to play with data preparation to make the boundary stand out more. CRBB did this by first running a sliding window over the transcript lengths and finding the centroid e-value, and fitting the resulting points to their model.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## dammit as a library\n",
    "\n",
    "dammit is a standard Python package, which means it can used as a library as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>E</th>\n",
       "      <th>EG2</th>\n",
       "      <th>q_aln_len</th>\n",
       "      <th>q_len</th>\n",
       "      <th>q_name</th>\n",
       "      <th>q_start</th>\n",
       "      <th>q_strand</th>\n",
       "      <th>s_aln_len</th>\n",
       "      <th>s_len</th>\n",
       "      <th>s_name</th>\n",
       "      <th>s_start</th>\n",
       "      <th>s_strand</th>\n",
       "      <th>score</th>\n",
       "      <th>bitscore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1031</th>\n",
       "      <td>2.200000e-47</td>\n",
       "      <td>1.600000e-37</td>\n",
       "      <td>301</td>\n",
       "      <td>923</td>\n",
       "      <td>SPAC1002.03c_gls2_I_glucosidase</td>\n",
       "      <td>533</td>\n",
       "      <td>+</td>\n",
       "      <td>294</td>\n",
       "      <td>994</td>\n",
       "      <td>SPAC30D11.01c_1119880_1123773_1_gto2_I_protein...</td>\n",
       "      <td>612</td>\n",
       "      <td>+</td>\n",
       "      <td>405</td>\n",
       "      <td>182.042873</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1029</th>\n",
       "      <td>1.700000e-186</td>\n",
       "      <td>8.100000e-163</td>\n",
       "      <td>492</td>\n",
       "      <td>547</td>\n",
       "      <td>SPAC1002.12c_SPAC1002.12c_I_succinate-semialde...</td>\n",
       "      <td>55</td>\n",
       "      <td>+</td>\n",
       "      <td>493</td>\n",
       "      <td>494</td>\n",
       "      <td>SPAC139.05_1027811_1029993_1_SPAC139.05_I_prot...</td>\n",
       "      <td>0</td>\n",
       "      <td>+</td>\n",
       "      <td>1348</td>\n",
       "      <td>598.255639</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1035</th>\n",
       "      <td>8.100000e-09</td>\n",
       "      <td>9.900000e+01</td>\n",
       "      <td>122</td>\n",
       "      <td>417</td>\n",
       "      <td>SPAC1002.13c_psu1_I_cell</td>\n",
       "      <td>36</td>\n",
       "      <td>+</td>\n",
       "      <td>122</td>\n",
       "      <td>531</td>\n",
       "      <td>SPAC13G6.10c_191928_194053_-1_asl1_I_protein_c...</td>\n",
       "      <td>140</td>\n",
       "      <td>+</td>\n",
       "      <td>113</td>\n",
       "      <td>53.162569</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1041</th>\n",
       "      <td>3.900000e-06</td>\n",
       "      <td>4.500000e+04</td>\n",
       "      <td>101</td>\n",
       "      <td>417</td>\n",
       "      <td>SPAC1002.13c_psu1_I_cell</td>\n",
       "      <td>32</td>\n",
       "      <td>+</td>\n",
       "      <td>97</td>\n",
       "      <td>531</td>\n",
       "      <td>SPAC13G6.10c_191928_194053_-1_asl1_I_protein_c...</td>\n",
       "      <td>196</td>\n",
       "      <td>+</td>\n",
       "      <td>93</td>\n",
       "      <td>44.335150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1040</th>\n",
       "      <td>2.900000e-06</td>\n",
       "      <td>3.300000e+04</td>\n",
       "      <td>155</td>\n",
       "      <td>417</td>\n",
       "      <td>SPAC1002.13c_psu1_I_cell</td>\n",
       "      <td>32</td>\n",
       "      <td>+</td>\n",
       "      <td>155</td>\n",
       "      <td>386</td>\n",
       "      <td>SPAC1F8.06_99871_101431_1_fta5_I_protein_codin...</td>\n",
       "      <td>22</td>\n",
       "      <td>+</td>\n",
       "      <td>94</td>\n",
       "      <td>44.776521</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  E            EG2  q_aln_len  q_len  \\\n",
       "1031   2.200000e-47   1.600000e-37        301    923   \n",
       "1029  1.700000e-186  8.100000e-163        492    547   \n",
       "1035   8.100000e-09   9.900000e+01        122    417   \n",
       "1041   3.900000e-06   4.500000e+04        101    417   \n",
       "1040   2.900000e-06   3.300000e+04        155    417   \n",
       "\n",
       "                                                 q_name  q_start q_strand  \\\n",
       "1031                    SPAC1002.03c_gls2_I_glucosidase      533        +   \n",
       "1029  SPAC1002.12c_SPAC1002.12c_I_succinate-semialde...       55        +   \n",
       "1035                           SPAC1002.13c_psu1_I_cell       36        +   \n",
       "1041                           SPAC1002.13c_psu1_I_cell       32        +   \n",
       "1040                           SPAC1002.13c_psu1_I_cell       32        +   \n",
       "\n",
       "      s_aln_len  s_len                                             s_name  \\\n",
       "1031        294    994  SPAC30D11.01c_1119880_1123773_1_gto2_I_protein...   \n",
       "1029        493    494  SPAC139.05_1027811_1029993_1_SPAC139.05_I_prot...   \n",
       "1035        122    531  SPAC13G6.10c_191928_194053_-1_asl1_I_protein_c...   \n",
       "1041         97    531  SPAC13G6.10c_191928_194053_-1_asl1_I_protein_c...   \n",
       "1040        155    386  SPAC1F8.06_99871_101431_1_fta5_I_protein_codin...   \n",
       "\n",
       "      s_start s_strand  score    bitscore  \n",
       "1031      612        +    405  182.042873  \n",
       "1029        0        +   1348  598.255639  \n",
       "1035      140        +    113   53.162569  \n",
       "1041      196        +     93   44.335150  \n",
       "1040       22        +     94   44.776521  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dammit.model import CRBL\n",
    "from dammit.parsers import maf_to_df_iter\n",
    "import pandas as pd\n",
    "\n",
    "maf_df = pd.concat([group for group in maf_to_df_iter('pom.500.fa.dammit/pep.fa.x.pom.500.fa.transdecoder.pep.maf')])\n",
    "maf_df.sort_values(by='q_name').head(n=5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>E</th>\n",
       "      <th>EG2</th>\n",
       "      <th>q_aln_len</th>\n",
       "      <th>q_len</th>\n",
       "      <th>q_name</th>\n",
       "      <th>q_start</th>\n",
       "      <th>q_strand</th>\n",
       "      <th>s_aln_len</th>\n",
       "      <th>s_len</th>\n",
       "      <th>s_name</th>\n",
       "      <th>s_start</th>\n",
       "      <th>s_strand</th>\n",
       "      <th>score</th>\n",
       "      <th>bitscore</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1031</th>\n",
       "      <td>2.200000e-47</td>\n",
       "      <td>1.600000e-37</td>\n",
       "      <td>301</td>\n",
       "      <td>923</td>\n",
       "      <td>SPAC1002.03c_gls2_I_glucosidase</td>\n",
       "      <td>533</td>\n",
       "      <td>+</td>\n",
       "      <td>294</td>\n",
       "      <td>994</td>\n",
       "      <td>SPAC30D11.01c_1119880_1123773_1_gto2_I_protein...</td>\n",
       "      <td>612</td>\n",
       "      <td>+</td>\n",
       "      <td>405</td>\n",
       "      <td>182.042873</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1029</th>\n",
       "      <td>1.700000e-186</td>\n",
       "      <td>8.100000e-163</td>\n",
       "      <td>492</td>\n",
       "      <td>547</td>\n",
       "      <td>SPAC1002.12c_SPAC1002.12c_I_succinate-semialde...</td>\n",
       "      <td>55</td>\n",
       "      <td>+</td>\n",
       "      <td>493</td>\n",
       "      <td>494</td>\n",
       "      <td>SPAC139.05_1027811_1029993_1_SPAC139.05_I_prot...</td>\n",
       "      <td>0</td>\n",
       "      <td>+</td>\n",
       "      <td>1348</td>\n",
       "      <td>598.255639</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1032</th>\n",
       "      <td>1.900000e-12</td>\n",
       "      <td>2.600000e-02</td>\n",
       "      <td>124</td>\n",
       "      <td>417</td>\n",
       "      <td>SPAC1002.13c_psu1_I_cell</td>\n",
       "      <td>35</td>\n",
       "      <td>+</td>\n",
       "      <td>131</td>\n",
       "      <td>531</td>\n",
       "      <td>SPAC13G6.10c_191928_194053_-1_asl1_I_protein_c...</td>\n",
       "      <td>120</td>\n",
       "      <td>+</td>\n",
       "      <td>140</td>\n",
       "      <td>65.079583</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1030</th>\n",
       "      <td>4.800000e-137</td>\n",
       "      <td>1.000000e-118</td>\n",
       "      <td>468</td>\n",
       "      <td>499</td>\n",
       "      <td>SPAC1002.16c_SPAC1002.16c_I_plasma</td>\n",
       "      <td>8</td>\n",
       "      <td>+</td>\n",
       "      <td>469</td>\n",
       "      <td>499</td>\n",
       "      <td>SPAC11D3.18c_144819_146935_-1_SPAC11D3.18c_I_p...</td>\n",
       "      <td>9</td>\n",
       "      <td>+</td>\n",
       "      <td>1016</td>\n",
       "      <td>451.720498</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                  E            EG2  q_aln_len  q_len  \\\n",
       "1031   2.200000e-47   1.600000e-37        301    923   \n",
       "1029  1.700000e-186  8.100000e-163        492    547   \n",
       "1032   1.900000e-12   2.600000e-02        124    417   \n",
       "1030  4.800000e-137  1.000000e-118        468    499   \n",
       "\n",
       "                                                 q_name  q_start q_strand  \\\n",
       "1031                    SPAC1002.03c_gls2_I_glucosidase      533        +   \n",
       "1029  SPAC1002.12c_SPAC1002.12c_I_succinate-semialde...       55        +   \n",
       "1032                           SPAC1002.13c_psu1_I_cell       35        +   \n",
       "1030                 SPAC1002.16c_SPAC1002.16c_I_plasma        8        +   \n",
       "\n",
       "      s_aln_len  s_len                                             s_name  \\\n",
       "1031        294    994  SPAC30D11.01c_1119880_1123773_1_gto2_I_protein...   \n",
       "1029        493    494  SPAC139.05_1027811_1029993_1_SPAC139.05_I_prot...   \n",
       "1032        131    531  SPAC13G6.10c_191928_194053_-1_asl1_I_protein_c...   \n",
       "1030        469    499  SPAC11D3.18c_144819_146935_-1_SPAC11D3.18c_I_p...   \n",
       "\n",
       "      s_start s_strand  score    bitscore  \n",
       "1031      612        +    405  182.042873  \n",
       "1029        0        +   1348  598.255639  \n",
       "1032      120        +    140   65.079583  \n",
       "1030        9        +   1016  451.720498  "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "CRBL.best_hits(maf_df)\n",
    "maf_df.head(n=4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### The code is even relatively documented..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Iterator yielding DataFrames of length chunksize holding MAF alignments.\n",
      "\n",
      "    An extra column is added for bitscore, using the equation described here:\n",
      "        http://last.cbrc.jp/doc/last-evalues.html\n",
      "\n",
      "    Args:\n",
      "        fn (str): Path to the MAF alignment file.\n",
      "        chunksize (int): Alignments to parse per iteration.\n",
      "    Yields:\n",
      "        DataFrame: Pandas DataFrame with the alignments.\n",
      "    \n"
     ]
    }
   ],
   "source": [
    "print maf_to_df_iter.__doc__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The the documentation for more: http://www.camillescott.org/dammit/py-modindex.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Future Work\n",
    "\n",
    "* Shoring up the CRBL implementation\n",
    "* Finishing output formatting -- still need to do the FASTA output\n",
    "* Some slight rearrangment of the command line interface"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Acknowledgements\n",
    "\n",
    "Thanks to Titus for advice, input, and PRs, Chris Hamm and Matt MacManes for filing issues, Michael for advice, Richard for his nifty method, and the terrible state of the bioinformatics industry for inspiring me."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python [conda env:py3.dammit]",
   "language": "python",
   "name": "conda-env-py3.dammit-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}