{ "metadata": { "name": "", "signature": "sha256:293537c95a9c3321e2c514ae32b98cbb52a2aeb2ce03daa6020a30ffde13863d" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Pyura (Piura) chilensis Transcriptome " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Objectives" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Gain experience using IPython Notebook (or an alternative) to document research so that it is useful to future you.\n", "* Use the iPlant Collaborative Discovery Environment\n", "* Install software locally (BLAST)\n", "* Explore structured data tables with the command line\n", "* Use SQLShare to join and query tables." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "What is Piura?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import HTML\n", "HTML('')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "" ] } ], "prompt_number": 1 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![pic](../img/Coquimbo_-_Google_Maps_1A58733E.png)" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Scenario" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A core facility has just generated sequencing data on the 454 platform. You need to take the data, make sure it is of decent quality, assemble it, and determine the functional categories of the genes expressed." 
] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Location of Raw Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 1 Upload to the iPlant Discovery Environment" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 2 Convert SFF to FASTQ" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `_33__Discovery_Environment_1A33CEE2.png`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `_40__Discovery_Environment_1A33CF96.png`" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 3 Check Sequence Quality - FastQC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `_40__Discovery_Environment_1A33D0B2.png`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `_43__Discovery_Environment_1A33D446.png`" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 3b Download the zip file and view the graphs" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import HTML\n", "HTML('')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import HTML\n", "HTML('')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 4 Trim Sequences" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `_103__Discovery_Environment_1A34B118.png`" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 5 Recheck Sequence Quality" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"Image: `_109__Discovery_Environment_1A34B470.png`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import HTML\n", "HTML('')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import HTML\n", "HTML('')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "" ] } ], "prompt_number": 13 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 6 Assemble Reads into a Transcriptome (Trinity)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's assemble each sample separately." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `_56__Discovery_Environment_1A5EFB8C.png`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Results\n", "* http://de.iplantcollaborative.org/dl/d/352B2BF2-C1C3-4866-9B8B-DA3928D0D0A1/PiuraC_Val_Trinity.fasta\n", "* http://de.iplantcollaborative.org/dl/d/38C4F315-476D-48E7-B8F4-D41FE7BF613A/PiuraC_Coq_Trinity.fasta" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 6b Assemble Reads into a Transcriptome (CLC)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `CLC_Genomics_Workbench_7_5_1_1A44A53F.png`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `CLC_Genomics_Workbench_7_5_1_1A44A571.png`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Results\n", "* http://eagle.fish.washington.edu/cnidarian/Piura_v1_contigs.fa\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# The contigs file is in the repo; preview its first lines\n", "!head ../data/Piura_v1_contigs.fa" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">PiuraChilensis_v1_contig_1\r\n", 
"ATTTACAATACGAAGTAAAATAGATAACGTGAAAATAATCTTGGTGCTGGATGATCGATC\r\n", "AAGTTCACCAATATTTTATTGTAAAAAATCATTCTAAACAGCATGAAATCGTGTACAATG\r\n", "TATAAACAAGCAAATATATAACACTAAAGCAAGAGGGCGTAAGTGGGGGGGTGGGTGAGA\r\n", "GTAAAAAATTCAAACATGTCAAATACCCCGGCGTTAGCCTTAAAAGCACCATGGACTTCT\r\n", "GCCTTCAATAAGCATAAAATTAAAACACCTAATACACAATGAATATACAGATAAAACAGA\r\n", "TTTATGAATAGTTGGTGTTACATCTTTTACAGCCATAAGCCTTCATTTTGCTTCCAAACG\r\n", "TATAAAATCTGACTTGGAACAATATACAGCCATGAGATATGACACAGCGAGCACTACAAT\r\n", "ATATATTTATCTTGTACTATACAGCCTGTACAAGAAAATTCTGGAATTGTCTTCACAAGA\r\n", "GACAGAAAAATAGTTGCAATGTGAATGCTAGTCTACTATTTGATCACAATTGGATAGAAA\r\n" ] } ], "prompt_number": 5 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 7 Annotate Contigs (Blast)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "see [01-Local_BLAST How-to](http://nbviewer.ipython.org/github/sr320/austral/blob/master/modules/01-Piura-Annotation/01-Local_BLAST.ipynb)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!blastx \\\n", "-query ../data/Piura_v1_contigs.fa \\\n", "-db /Volumes/Bay3/Software/ncbi-blast-2.2.29\\+/db/uniprot_sprot_r2013_12 \\\n", "-out ../data/Piura_v1_uniprot_sprot.tab \\\n", "-evalue 1E-05 \\\n", "-max_target_seqs 1 \\\n", "-max_hsps 1 \\\n", "-outfmt 6 \\\n", "-num_threads 2" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "#in repo\n", "!tail -2 ../data/Piura_v1_uniprot_sprot.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "PiuraChilensis_v1_contig_15018\tsp|O18973|RABX5_BOVIN\t79.07\t43\t9\t0\t888\t1016\t321\t363\t3e-15\t80.1\r\n", "PiuraChilensis_v1_contig_15021\tsp|Q9Z1Z1|E2AK3_RAT\t51.61\t93\t45\t0\t100\t378\t971\t1063\t8e-22\t97.8\r\n" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "#still needs to be run\n", "\n", "!/Applications/bioinfo/ncbi-blast-2.2.30/bin/blastx \\\n", "-query 
../data/PiuraC_Val_Trinity_2ndhalf.fasta \\\n", "-db /Users/sr320/data-genomic/blast/db/uniprot_sprot_r2015_01 \\\n", "-out ../data/PiuraC_Val_Trinity_uniprot_sprot_2ndhalf.tab \\\n", "-evalue 1E-05 \\\n", "-max_target_seqs 1 \\\n", "-max_hsps 1 \\\n", "-outfmt 6 \\\n", "-num_threads 6" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "!tail ../data/PiuraC_Val_Trinity_uniprot_sprot_2ndhalf.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "comp31014_c0_seq1\tsp|Q63159|COQ3_RAT\t50.93\t108\t50\t1\t3\t317\t155\t262\t5e-29\t 111\r\n", "comp31018_c0_seq1\tsp|P43695|GAT5A_XENLA\t68.97\t58\t16\t1\t409\t242\t226\t283\t2e-18\t84.7\r\n", "comp31019_c0_seq1\tsp|P90747|YE56_CAEEL\t48.53\t68\t33\t1\t54\t251\t567\t634\t5e-14\t71.2\r\n", "comp31028_c0_seq1\tsp|Q5RAG7|XCT_PONAB\t44.00\t100\t56\t0\t302\t3\t298\t397\t4e-13\t69.7\r\n", "comp31030_c0_seq1\tsp|Q9CQJ2|PIHD1_MOUSE\t50.57\t87\t38\t2\t403\t143\t57\t138\t3e-23\t96.7\r\n", "comp31033_c0_seq1\tsp|Q6BEA2|PRS27_RAT\t71.43\t28\t8\t0\t132\t215\t53\t80\t6e-06\t46.6\r\n", "comp31037_c0_seq1\tsp|O95425|SVIL_HUMAN\t31.94\t72\t48\t1\t10\t225\t1793\t1863\t1e-06\t49.3\r\n", "comp31054_c0_seq1\tsp|A2AGA4|RHBL2_MOUSE\t43.93\t107\t57\t2\t10\t324\t195\t300\t5e-16\t76.6\r\n", "comp31056_c0_seq1\tsp|Q9NVH0|EXD2_HUMAN\t40.30\t134\t66\t3\t6\t389\t170\t295\t1e-20\t91.3\r\n", "comp31058_c0_seq1\tsp|Q9WUA2|SYFB_MOUSE\t69.70\t66\t20\t0\t53\t250\t353\t418\t3e-27\t 108\r\n" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 8 Explore files on the command line" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Count the BLAST hits (one hit per line)\n", "!wc -l ../data/Piura_v1_uniprot_sprot.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 9498 ../data/Piura_v1_uniprot_sprot.tab\r\n" ] } ], "prompt_number": 2 }, { "cell_type": 
"code", "collapsed": false, "input": [ "!head -2 ../data/Piura_v1_uniprot_sprot.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "PiuraChilensis_v1_contig_3\tsp|Q6P9A1|ZN530_HUMAN\t33.33\t105\t61\t3\t825\t1118\t414\t516\t1e-07\t57.4\r\n", "PiuraChilensis_v1_contig_4\tsp|Q8TGM6|TAR1_YEAST\t70.91\t55\t16\t0\t3829\t3665\t22\t76\t3e-15\t80.1\r\n" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "# Convert pipes to tabs so the UniProt accession gets its own column for SQLShare\n", "!tr '|' \"\\t\" < ../data/Piura_v1_uniprot_sprot.tab > ../data/Piura_v1_uniprot_sprot_sql.tab\n", "!echo SQLShare ready version has Pipes converted to Tabs ....\n", "!head -1 ../data/Piura_v1_uniprot_sprot_sql.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "SQLShare ready version has Pipes converted to Tabs ....\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "PiuraChilensis_v1_contig_3\tsp\tQ6P9A1\tZN530_HUMAN\t33.33\t105\t61\t3\t825\t1118\t414\t516\t1e-07\t57.4\r\n" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Step 9 Join BLAST results with other information (SQLShare)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See also https://github.com/sr320/escience-talk-sqlshare-2015" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://sqlshare.escience.washington.edu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `Upload_File_1A61D49E.png`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `SQLShare_-_View_Query_1A61D549.png`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```sql\n", "SELECT Column1, term, GOSlim_bin, aspect, ProteinName FROM [sr320@washington.edu].[Piura_v1_uniprot_sprot_sql.tab]p\n", "left join [samwhite@washington.edu].[UniprotProtNamesReviewed_yes20130610]sp \n", "on p.Column3=sp.SPID \n", "left join [sr320@washington.edu].[SPID and GO Numbers]go 
\n", "on p.Column3=go.SPID \n", "left join [sr320@washington.edu].[GO_to_GOslim]slim \n", "on go.GOID=slim.GO_id\n", "where aspect like 'P'\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Image: `Run_Query_1A61D5FA.png`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```sql\n", "SELECT DISTINCT Column1, GOSlim_bin FROM [sr320@washington.edu].[Piura_v1_uniprot_sprot_sql.tab]p\n", "left join [sr320@washington.edu].[SPID and GO Numbers]go \n", "on p.Column3=go.SPID \n", "left join [sr320@washington.edu].[GO_to_GOslim]slim \n", "on go.GOID=slim.GO_id\n", "where aspect like 'P'\n", "```" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Bonus Step! Automate - fasta2slim" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }