{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Identification and assembly practical" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How do we know what our sample is?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a few easy ways:\n", "\n", " - BLASTing reads\n", " - OneCodex\n", " - Kraken (What's in my Pot?)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For E. coli, we will use the file:\n", "\n", "MAP006-1.pass.2D.poRe.fastq\n", "\n", "For the new genome, use:\n", "\n", "pc1_shigella.tar\n", "\n", "Start with E. coli, then move on to Shigella." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## miniasm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will start by assembling E. coli with miniasm." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Install minimap and miniasm (requiring gcc and zlib)\n", "git clone https://github.com/lh3/minimap && (cd minimap && make)\n", "git clone https://github.com/lh3/miniasm && (cd miniasm && make)\n", "# Overlap\n", "minimap/minimap -Sw5 -L100 -m0 -t8 MAP006-1.pass.2D.poRe.fastq MAP006-1.pass.2D.poRe.fastq | gzip -1 > reads.paf.gz\n", "# Layout\n", "miniasm/miniasm -f MAP006-1.pass.2D.poRe.fastq reads.paf.gz > contigs.gfa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "miniasm writes its unitigs in GFA format, so we need to convert the file to FASTA:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "awk '/^S/{print \">\"$2\"\\n\"$3}' contigs.gfa | fold > contigs.fa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use QUAST to explore the assembly quality." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true },
"outputs": [], "source": [ "wget \"http://downloads.sourceforge.net/project/quast/quast-3.2.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fquast%2Ffiles%2F&ts=1450193070&use_mirror=netcologne\" -O quast.tar.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "tar xvfz quast.tar.gz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "sudo apt-get install python-matplotlib" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "quast-3.2/quast.py -R NC_000913.3.fa contigs.fa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the PDF report from ``quast_results/latest``" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hybrid assembly" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert the FASTQ file to FASTA (of course, you could have output FASTA via poretools or poRe)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "seqtk seq -A MAP006-1.pass.2D.poRe.fastq > MAP006-1.pass.2D.poRe.fasta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run Spades using the nanopore data and Illumina data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "spades.py --only-assembler -k 21,51,71 -1 MiSeq/SRR2627019_1.fastq.gz -2 MiSeq/SRR2627019_2.fastq.gz --nanopore MAP006-1.pass.2D.poRe.fasta -o SPADES &" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }