{ "metadata": { "name": "", "signature": "sha256:5dffe5ad51e51e9e32becf8ec418d4dfd311df88020628cef1b970bc81190dae" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Automating a Workflow: Beyond Blast - to GO Slim" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!date" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Mon Feb 17 08:11:38 PST 2014\r\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Updates - blast is now called by its full path \n", "the 'blast' variable was subsequently removed, since the full path is used instead\n", "\n", "--\n", "\n", "The SQLShare user ID has to be changed manually in the code (for now)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The concept is that you can take a FASTA file in a working directory and end up with GO Slim information, all within a single automated notebook. Currently this works by writing (and overwriting) a scratch file to SQLShare. The assumptions are that you are working in a directory with a FASTA file named `query.fa` and that you have the SQLShare Python client installed.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#allows plots to be shown inline\n", "%pylab inline" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "#Setting Working Directory\n", "wd=\"/Volumes/web/whale/lft_05\"\n", "#Setting directory of Blast Databases !!! 
make sure the path ends with '/'\n", "dbd=\"/Volumes/Bay3/Software/ncbi-blast-2.2.29\\+/db/\"\n", "#Database name\n", "dbn=\"uniprot_sprot_r2013_12\"\n", "#Blast algorithm complete path\n", "ba=\"/Volumes/Bay3/Software/ncbi-blast-2.2.29\\+/bin/blastx\"\n", "#Location of SQLShare python tools: you can leave this empty (\"\") if tools are in PATH !!! make sure the path ends with '/'\n", "#spd=\"/Users/Mackenzie/sqlshare-pythonclient/tools/\"\n", "spd=\"/Users/sr320/sqlshare-pythonclient/tools/\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "cd {wd}" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "/Volumes/web/whale/lft_05\n" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "#for some reason max hsp produced an error, so it was removed" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!{ba} -query query.fa -db {dbd}{dbn} -out {dbn}_blast_out.tab -evalue 1E-10 -num_threads 4 -max_target_seqs 1 -outfmt 6" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 613 replaced by X\r\n", "Selenocysteine (U) at position 613 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at 
position 64 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 613 replaced by X\r\n", "Selenocysteine (U) at position 613 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 85 replaced by X\r\n", "Selenocysteine (U) at position 74 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 613 replaced by X\r\n", "Selenocysteine (U) at position 613 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 18 replaced by X\r\n", "Selenocysteine (U) at position 38 replaced by X\r\n", "Selenocysteine (U) at position 16 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 17 replaced by X\r\n", "Selenocysteine (U) at position 17 replaced by X\r\n", "Selenocysteine (U) at position 15 replaced by X\r\n", "Selenocysteine (U) at position 17 replaced by X\r\n", "Selenocysteine (U) at position 15 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 15 replaced by X\r\n", "Selenocysteine (U) at position 15 replaced by X\r\n", "Selenocysteine (U) at position 25 replaced by X\r\n", "Selenocysteine (U) at position 16 replaced by X\r\n", "Selenocysteine (U) at position 17 replaced by X\r\n", "Selenocysteine (U) at position 13 replaced by X\r\n", "Selenocysteine (U) at position 16 replaced by X\r\n", "Selenocysteine (U) at position 16 replaced by X\r\n", "Selenocysteine (U) at position 19 replaced by X\r\n", "Selenocysteine (U) at position 16 replaced by X\r\n", "Selenocysteine (U) at position 7 replaced by X\r\n", "Selenocysteine (U) at position 16 replaced by X\r\n", "Selenocysteine (U) at position 21 replaced by X\r\n", "Selenocysteine (U) at position 24 replaced by X\r\n", "Selenocysteine (U) at position 60 replaced by X\r\n", 
"Selenocysteine (U) at position 63 replaced by X\r\n", "Selenocysteine (U) at position 63 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 52 replaced by X\r\n", "Selenocysteine (U) at position 49 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 52 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 46 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 28 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 52 replaced by X\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Selenocysteine (U) at position 47 replaced by 
X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 49 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 52 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 46 replaced by X\r\n", "Selenocysteine (U) at position 47 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 40 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 73 replaced by X\r\n", "Selenocysteine (U) at position 64 replaced by X\r\n", "Selenocysteine (U) at position 28 replaced by X\r\n" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "!head -1 {dbn}_blast_out.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Roberts_20100712_CC_F3_contig_1\tsp|Q962U1|RL13_SPOFR\t66.35\t104\t35\t0\t3\t314\t9\t112\t3e-46\t 154\r\n" ] } ], "prompt_number": 23 }, { "cell_type": "code", 
"collapsed": false, "input": [ "#Translate pipes to tabs so the SPID is in a separate column for joining\n", "!tr '|' \"\\t\" < {dbn}_blast_out.tab > {dbn}_blast_out2.tab" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "!head -1 {dbn}_blast_out2.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Roberts_20100712_CC_F3_contig_1\tsp\tQ962U1\tRL13_SPOFR\t66.35\t104\t35\t0\t3\t314\t9\t112\t3e-46\t 154\r\n" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "#Uploads the formatted blast table to SQLShare under a generic name; it is meant to be temporary. Warning: this will overwrite any existing dataset with that name.\n", "!python {spd}singleupload.py -d scratchblast_out {dbn}_blast_out2.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "processing chunk line 0 to 1201 (0.00117206573486 s elapsed)\r\n", "pushing uniprot_sprot_r2013_12_blast_out2.tab...\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "parsing D5B836EF...\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "finished scratchblast_out\r\n" ] } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "#Joins the blast hits to UniProt annotations, GO IDs, and GO Slim terms on SQLShare\n", "!python {spd}fetchdata.py -s \"SELECT * FROM [sr320@washington.edu].[scratchblast_out]blast Left Join [sr320@washington.edu].[uniprot-reviewed_wGO_010714]unp ON blast.Column3 = unp.Entry Left Join [sr320@washington.edu].[SPID and GO Numbers]go ON unp.Entry = go.SPID Left Join [sr320@washington.edu].[GO_to_GOslim]slim ON slim.GO_id = go.GOID\" -f tsv -o {dbn}_join2goslim.txt" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "!head -2 {dbn}_join2goslim.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ 
"Column1\tColumn2\tColumn3\tColumn4\tColumn5\tColumn6\tColumn7\tColumn8\tColumn9\tColumn10\tColumn11\tColumn12\tColumn13\tColumn14\tEntry\tEntry name\tGene ontology IDs\tInteracts with\tCross-reference (GO)\tGene ontology (GO)\tStatus\tInterPro\tPathway\tProtein names\tGene names\tOrganism\tLength\tSPID\tGOID\tGO_id\tterm\tGOSlim_bin\taspect\r", "\r\n", "Roberts_20100712_CC_F3_contig_3346\tsp\tA0JP85\tCNOT1_XENTR\t82.19\t73\t13\t0\t3\t221\t2306\t2378\t5E-34\t130\tA0JP85\tCNOT1_XENTR\tGO:0030014; GO:0000932; GO:0030331; GO:0031047; GO:0033147; GO:0048387; GO:0000122; GO:0005634; GO:0010606; GO:1900153; GO:0060213; GO:0006417; GO:0042974; GO:0006351\t\t\tCCR4-NOT complex; cytoplasmic mRNA processing body; estrogen receptor binding; gene silencing by RNA; negative regulation of intracellular estrogen receptor signaling pathway; negative regulation of retinoic acid receptor signaling pathway; negative regulation of transcription from RNA polymerase II promoter; nucleus; positive regulation of cytoplasmic mRNA processing body assembly; positive regulation of nuclear-transcribed mRNA catabolic process, deadenylation-dependent decay; positive regulation of nuclear-transcribed mRNA poly(A) tail shortening; regulation of translation; retinoic acid receptor binding; transcription, DNA-dependent\treviewed\tIPR007196; IPR024557;\t\tCCR4-NOT transcription complex subunit 1 (CCR4-associated factor 1)\tcnot1\tXenopus tropicalis (Western clawed frog) (Silurana tropicalis)\t2388\tA0JP85\tGO:0006351\tGO:0006351\t\"transcription, DNA-dependent\"\tRNA metabolism\tP\r", "\r\n" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "pausing" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!python {spd}singleupload.py -d scratchjoin_slim {dbn}_join2goslim.txt" ], "language": "python", 
"metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "processing chunk line 0 to 1978 (0.00637292861938 s elapsed)\r\n", "pushing spdb_join2goslim.txt...\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "parsing 94DDEBBA...\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "finished scratchjoin_slim\r\n" ] } ], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "#Sets GO aspect: 'P' keeps only biological process terms. The SQLShare user ID in the query has to be changed manually to match the account that uploaded scratchjoin_slim.\n", "!python {spd}fetchdata.py -s \"SELECT Distinct Column1 as query, Column3 as SPID, GOSlim_bin FROM [mgavery@washington.edu].[scratchjoin_slim] Where aspect = 'P'\" -f tsv -o justslim.txt" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 39 }, { "cell_type": "code", "collapsed": false, "input": [ "!head justslim.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "query\tSPID\tGOSlim_bin\r", "\r\n", "ConsensusfromContig10\tQ9PVZ4\tcell organization and biogenesis\r", "\r\n", "ConsensusfromContig10\tQ9PVZ4\tdevelopmental processes\r", "\r\n", "ConsensusfromContig10\tQ9PVZ4\tother metabolic processes\r", "\r\n", "ConsensusfromContig10\tQ9PVZ4\tprotein metabolism\r", "\r\n", "ConsensusfromContig10\tQ9PVZ4\tsignal transduction\r", "\r\n", "ConsensusfromContig107\tQ5R8W6\tdeath\r", "\r\n", "ConsensusfromContig107\tQ5R8W6\tRNA metabolism\r", "\r\n", "ConsensusfromContig107\tQ5R8W6\tstress response\r", "\r\n", "ConsensusfromContig117\tA6QR55\tother biological processes\r", "\r\n" ] } ], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "from pandas import *\n", "\n", "jslim = read_table(\"justslim.txt\", # name of the data file\n", " #sep=\",\", # what character separates each column?\n", " na_values=[\"\", \" \"]) # what values should be considered \"blank\" values?" 
], "language": "python", "metadata": {}, "outputs": [ { "ename": "IOError", "evalue": "File justslim.txt does not exist", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mIOError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 3\u001b[0m jslim = read_table(\"justslim.txt\", # name of the data file\n\u001b[1;32m 4\u001b[0m \u001b[0;31m#sep=\",\", # what character separates each column?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m na_values=[\"\", \" \"]) # what values should be considered \"blank\" values?\n\u001b[0m", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc\u001b[0m in \u001b[0;36mparser_f\u001b[0;34m(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)\u001b[0m\n\u001b[1;32m 399\u001b[0m buffer_lines=buffer_lines)\n\u001b[1;32m 400\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 401\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 402\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 403\u001b[0m \u001b[0mparser_f\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__name__\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mname\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc\u001b[0m in \u001b[0;36m_read\u001b[0;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[1;32m 207\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 208\u001b[0m \u001b[0;31m# Create the parser.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 209\u001b[0;31m \u001b[0mparser\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mTextFileReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 210\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 211\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mnrows\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, f, engine, **kwds)\u001b[0m\n\u001b[1;32m 507\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'has_index_names'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'has_index_names'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 508\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 509\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 510\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 511\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_options_with_defaults\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mengine\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc\u001b[0m in \u001b[0;36m_make_engine\u001b[0;34m(self, engine)\u001b[0m\n\u001b[1;32m 609\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_make_engine\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mengine\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'c'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 610\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mengine\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'c'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 611\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mCParserWrapper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moptions\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 612\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 613\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mengine\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'python'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, src, **kwds)\u001b[0m\n\u001b[1;32m 891\u001b[0m \u001b[0;31m# #2442\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 892\u001b[0m \u001b[0mkwds\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'allow_leading_cols'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex_col\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m 
\u001b[0mFalse\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 893\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_reader\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_parser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTextReader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msrc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 894\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 895\u001b[0m \u001b[0;31m# XXX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/_parser.so\u001b[0m in \u001b[0;36mpandas._parser.TextReader.__cinit__ (pandas/src/parser.c:2771)\u001b[0;34m()\u001b[0m\n", "\u001b[0;32m//anaconda/lib/python2.7/site-packages/pandas/_parser.so\u001b[0m in \u001b[0;36mpandas._parser.TextReader._setup_parser_source (pandas/src/parser.c:4803)\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mIOError\u001b[0m: File justslim.txt does not exist" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "jslim.groupby('GOSlim_bin').query.count().plot(kind='bar')\n" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!say \"hash tag winning\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 43 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Below is optional" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#could also upload again to get a simple table\n", "#could be done in pandas\n", "\n", "#!python {spd}singleupload.py -d scratchpie justslim.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "processing chunk line 0 to 2538 (0.00250601768494 s elapsed)\r\n", "pushing justslim.txt...\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "parsing 87B0B7A8...\r\n" ] }, { 
"output_type": "stream", "stream": "stdout", "text": [ "finished scratchpie\r\n" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "#fetching data grouped by GObin\n", "\n", "#!python {spd}fetchdata.py -s \"SELECT GOSlim_bin, COUNT(GOSlim_bin) as termcount from [sr320@washington.edu].[scratchpie] Group by GOSlim_bin\" -f tsv -o justpie.txt" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }