{ "metadata": { "name": "", "signature": "sha256:398aaae13507cc5dd4d62bb3355ce70d54fd319aee650e866888f5796b324d49" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Week 2: I can BLAST, now what?" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Hot Link: The automated workflow from [Fasta to Go Slim in a push of a button.](http://nbviewer.ipython.org/github/sr320/ipython_nb/blob/master/fish546/blast2pie-test4.ipynb)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#So if you completed Module 1, you might have some data that looks like...\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "![blastab-4](files/img/blastab-4.jpeg)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you blasted on UniProt-SwissProt you are have some ID numbers ie `P12234`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we get going too much farther I would like to clean up sequence title name to something more meaningful and would like to separate out SPID (ie `P12234`) into its own column. Spoiler alert: eventually we will be joining this table with another table so that we can glean associated information including the name of the gene and gene ontology." ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_3\tsp|O42248|GBLP_DANRE\t82.46\t171\t30\t0\t1\t513\t35\t205\t1e-101\t 301\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_5\tsp|Q08013|SSRG_RAT\t75.38\t65\t16\t0\t3\t197\t121\t185\t1e-27\t 104\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_6\tsp|P12234|MPCP_BOVIN\t76.62\t77\t18\t0\t2\t232\t286\t362\t2e-23\t98.6\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_9\tsp|Q41629|ADT1_WHEAT\t82.26\t62\t11\t0\t3\t188\t170\t231\t3e-27\t 104\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_13\tsp|Q32NG4|PDDC1_XENLA\t54.44\t90\t40\t1\t1\t270\t140\t228\t1e-27\t 106\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_23\tsp|Q9GNE2|RL23_AEDAE\t97.22\t72\t2\t0\t67\t282\t14\t85\t1e-42\t 142\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_31\tsp|Q3V1H3|HPHL1_MOUSE\t53.38\t133\t59\t1\t2\t391\t23\t155\t5e-42\t 153\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_32\tsp|Q641Y2|NDUS2_RAT\t88.03\t117\t14\t0\t2\t352\t334\t450\t1e-70\t 224\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_37\tsp|Q9D3D9|ATPD_MOUSE\t56.10\t123\t54\t0\t2\t370\t46\t168\t7e-42\t 144\r\n", "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_39\tsp|Q39613|CYPH_CATRO\t75.00\t120\t23\t1\t55\t393\t1\t120\t7e-49\t 160\r\n" ] } ], "prompt_number": 16 }, { "cell_type": "raw", "metadata": {}, "source": [ "#Lets change sequence title first\n", "!sed 's/solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed/Haliotis_cra_4/g' /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx2.tab\n", "#sed 's/abc/XYZ/g' outfile" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx2.tab\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Haliotis_cra_4_contig_3\tsp|O42248|GBLP_DANRE\t82.46\t171\t30\t0\t1\t513\t35\t205\t1e-101\t 301\r\n", "Haliotis_cra_4_contig_5\tsp|Q08013|SSRG_RAT\t75.38\t65\t16\t0\t3\t197\t121\t185\t1e-27\t 104\r\n", "Haliotis_cra_4_contig_6\tsp|P12234|MPCP_BOVIN\t76.62\t77\t18\t0\t2\t232\t286\t362\t2e-23\t98.6\r\n", "Haliotis_cra_4_contig_9\tsp|Q41629|ADT1_WHEAT\t82.26\t62\t11\t0\t3\t188\t170\t231\t3e-27\t 104\r\n", "Haliotis_cra_4_contig_13\tsp|Q32NG4|PDDC1_XENLA\t54.44\t90\t40\t1\t1\t270\t140\t228\t1e-27\t 106\r\n", "Haliotis_cra_4_contig_23\tsp|Q9GNE2|RL23_AEDAE\t97.22\t72\t2\t0\t67\t282\t14\t85\t1e-42\t 142\r\n", "Haliotis_cra_4_contig_31\tsp|Q3V1H3|HPHL1_MOUSE\t53.38\t133\t59\t1\t2\t391\t23\t155\t5e-42\t 153\r\n", "Haliotis_cra_4_contig_32\tsp|Q641Y2|NDUS2_RAT\t88.03\t117\t14\t0\t2\t352\t334\t450\t1e-70\t 224\r\n", "Haliotis_cra_4_contig_37\tsp|Q9D3D9|ATPD_MOUSE\t56.10\t123\t54\t0\t2\t370\t46\t168\t7e-42\t 144\r\n", "Haliotis_cra_4_contig_39\tsp|Q39613|CYPH_CATRO\t75.00\t120\t23\t1\t55\t393\t1\t120\t7e-49\t 160\r\n" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "#now lets get rid of the pipes\n", "!tr '|' \"\\t\" /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx3.tab\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx3.tab\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Haliotis_cra_4_contig_3\tsp\tO42248\tGBLP_DANRE\t82.46\t171\t30\t0\t1\t513\t35\t205\t1e-101\t 301\r\n", "Haliotis_cra_4_contig_5\tsp\tQ08013\tSSRG_RAT\t75.38\t65\t16\t0\t3\t197\t121\t185\t1e-27\t 104\r\n", "Haliotis_cra_4_contig_6\tsp\tP12234\tMPCP_BOVIN\t76.62\t77\t18\t0\t2\t232\t286\t362\t2e-23\t98.6\r\n", "Haliotis_cra_4_contig_9\tsp\tQ41629\tADT1_WHEAT\t82.26\t62\t11\t0\t3\t188\t170\t231\t3e-27\t 104\r\n", "Haliotis_cra_4_contig_13\tsp\tQ32NG4\tPDDC1_XENLA\t54.44\t90\t40\t1\t1\t270\t140\t228\t1e-27\t 106\r\n", "Haliotis_cra_4_contig_23\tsp\tQ9GNE2\tRL23_AEDAE\t97.22\t72\t2\t0\t67\t282\t14\t85\t1e-42\t 142\r\n", "Haliotis_cra_4_contig_31\tsp\tQ3V1H3\tHPHL1_MOUSE\t53.38\t133\t59\t1\t2\t391\t23\t155\t5e-42\t 153\r\n", "Haliotis_cra_4_contig_32\tsp\tQ641Y2\tNDUS2_RAT\t88.03\t117\t14\t0\t2\t352\t334\t450\t1e-70\t 224\r\n", "Haliotis_cra_4_contig_37\tsp\tQ9D3D9\tATPD_MOUSE\t56.10\t123\t54\t0\t2\t370\t46\t168\t7e-42\t 144\r\n", "Haliotis_cra_4_contig_39\tsp\tQ39613\tCYPH_CATRO\t75.00\t120\t23\t1\t55\t393\t1\t120\t7e-49\t 160\r\n" ] } ], "prompt_number": 20 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Using SQLShare (web) to join files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For intro to SQLShare see " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Upload_File_1885F6E9_png\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Here is a straight-forward example of joining two tables. \n", "A BLAST output table will be joined with gene descriptions.\n", "\n", "\n", "
\n", "\"SQLShare_-_View_Query_187B13C2.png\"\n", "
\n", "\"SQLShare_-_View_Query_and_SQLShare_187B14E1.png\"\n", "
\n", "\"SQLShare_-_View_Query_187B18DE.png\"\n", "
\n", "This can also be done in New Query UI\n", "\"Run_Query_187B1988.png\" \n", "And from here you can Save or simply download the joined table.\n", "
\n", "\n", "\n", "\n", "Explanation of code\n", "\n", "```\n", "SELECT * \n", "FROM \n", "[wearh@washington.edu].[Oly_Blast_uniprot_swissprot.txt]blast \n", "Left Join \n", "[samwhite@washington.edu].[UniprotProtNamesReviewed_yes20130610]unp \n", "on\n", "blast.Column3 = unp.SPID\u200b\n", "```\n", "\n", "\"Run_Query_187B1B37.png\"\n" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }