{
"metadata": {
"name": "",
"signature": "sha256:293537c95a9c3321e2c514ae32b98cbb52a2aeb2ce03daa6020a30ffde13863d"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Pyura (Piura) chilensis Transcriptome "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Objectives"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Gain experience using IPython Notebook (or an alternative) to document research so that it is useful to future you.\n",
"* Use the iPlant Collaborative Discovery Environment\n",
"* Install and run software locally (BLAST)\n",
"* Explore structured data tables on the command line\n",
"* Use SQLShare to join and query tables"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"What is Piura?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"HTML('')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
""
]
}
],
"prompt_number": 1
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Map"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![pic](../img/Coquimbo_-_Google_Maps_1A58733E.png)"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Scenario"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A core facility has just delivered sequencing data from the 454 platform. You need to take it, make sure it is of decent quality, assemble it, and determine the functional categories of the expressed genes."
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"Location of Raw Data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 1 Upload to iPlant Discovery Environment"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 2 Convert SFF to Fastq"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 3 Check Sequence Quality - FastQC"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 3b Download zip file and view graphs"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"HTML('')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
""
]
}
],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"HTML('')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
""
]
}
],
"prompt_number": 3
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 4 Trim Sequences"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 5 Recheck Sequence Quality"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"HTML('')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
""
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"HTML('')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
""
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
""
]
}
],
"prompt_number": 13
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 6 Assemble Reads into Transcriptome (Trinity)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's assemble each sample separately."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Results \n",
"* http://de.iplantcollaborative.org/dl/d/352B2BF2-C1C3-4866-9B8B-DA3928D0D0A1/PiuraC_Val_Trinity.fasta\n",
"* http://de.iplantcollaborative.org/dl/d/38C4F315-476D-48E7-B8F4-D41FE7BF613A/PiuraC_Coq_Trinity.fasta"
]
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 6b Assemble Reads into Transcriptome (CLC)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Results\n",
"* http://eagle.fish.washington.edu/cnidarian/Piura_v1_contigs.fa\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#in repo\n",
"!head ../data/Piura_v1_contigs.fa"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
">PiuraChilensis_v1_contig_1\r\n",
"ATTTACAATACGAAGTAAAATAGATAACGTGAAAATAATCTTGGTGCTGGATGATCGATC\r\n",
"AAGTTCACCAATATTTTATTGTAAAAAATCATTCTAAACAGCATGAAATCGTGTACAATG\r\n",
"TATAAACAAGCAAATATATAACACTAAAGCAAGAGGGCGTAAGTGGGGGGGTGGGTGAGA\r\n",
"GTAAAAAATTCAAACATGTCAAATACCCCGGCGTTAGCCTTAAAAGCACCATGGACTTCT\r\n",
"GCCTTCAATAAGCATAAAATTAAAACACCTAATACACAATGAATATACAGATAAAACAGA\r\n",
"TTTATGAATAGTTGGTGTTACATCTTTTACAGCCATAAGCCTTCATTTTGCTTCCAAACG\r\n",
"TATAAAATCTGACTTGGAACAATATACAGCCATGAGATATGACACAGCGAGCACTACAAT\r\n",
"ATATATTTATCTTGTACTATACAGCCTGTACAAGAAAATTCTGGAATTGTCTTCACAAGA\r\n",
"GACAGAAAAATAGTTGCAATGTGAATGCTAGTCTACTATTTGATCACAATTGGATAGAAA\r\n"
]
}
],
"prompt_number": 5
},
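{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check on an assembly is to count the contigs and look at their lengths. The block below is a minimal sketch in plain Python and is not part of the original workflow; the helper name `fasta_lengths` is hypothetical, and the commented usage assumes the same `../data/Piura_v1_contigs.fa` file shown above.\n",
"\n",
"```python\n",
"def fasta_lengths(lines):\n",
"    '''Return {sequence_id: length} for an iterable of FASTA-formatted lines.'''\n",
"    lengths = {}\n",
"    current = None\n",
"    for line in lines:\n",
"        line = line.strip()\n",
"        if not line:\n",
"            continue\n",
"        if line.startswith('>'):\n",
"            # Header line: the ID is the first word after '>'\n",
"            current = line[1:].split()[0]\n",
"            lengths[current] = 0\n",
"        elif current is not None:\n",
"            # Sequence line: add its length to the current record\n",
"            lengths[current] += len(line)\n",
"    return lengths\n",
"\n",
"# Usage (assumes the contig file from the cell above exists):\n",
"# with open('../data/Piura_v1_contigs.fa') as handle:\n",
"#     lengths = fasta_lengths(handle)\n",
"# print(len(lengths), 'contigs; longest:', max(lengths.values()))\n",
"```\n",
"\n",
"Passing an open file handle works because the function only iterates over lines, so the whole file is never held in memory at once."
]
},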
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 7 Annotate Contigs (BLAST)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See [01-Local_BLAST How-to](http://nbviewer.ipython.org/github/sr320/austral/blob/master/modules/01-Piura-Annotation/01-Local_BLAST.ipynb)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!blastx \\\n",
"-query ../data/Piura_v1_contigs.fa \\\n",
"-db /Volumes/Bay3/Software/ncbi-blast-2.2.29\\+/db/uniprot_sprot_r2013_12 \\\n",
"-out ../data/Piura_v1_uniprot_sprot.tab \\\n",
"-evalue 1E-05 \\\n",
"-max_target_seqs 1 \\\n",
"-max_hsps 1 \\\n",
"-outfmt 6 \\\n",
"-num_threads 2"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#in repo\n",
"!tail -2 ../data/Piura_v1_uniprot_sprot.tab"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"PiuraChilensis_v1_contig_15018\tsp|O18973|RABX5_BOVIN\t79.07\t43\t9\t0\t888\t1016\t321\t363\t3e-15\t80.1\r\n",
"PiuraChilensis_v1_contig_15021\tsp|Q9Z1Z1|E2AK3_RAT\t51.61\t93\t45\t0\t100\t378\t971\t1063\t8e-22\t97.8\r\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#still needs to be run\n",
"\n",
"!/Applications/bioinfo/ncbi-blast-2.2.30/bin/blastx \\\n",
"-query ../data/PiuraC_Val_Trinity_2ndhalf.fasta \\\n",
"-db /Users/sr320/data-genomic/blast/db/uniprot_sprot_r2015_01 \\\n",
"-out ../data/PiuraC_Val_Trinity_uniprot_sprot_2ndhalf.tab \\\n",
"-evalue 1E-05 \\\n",
"-max_target_seqs 1 \\\n",
"-max_hsps 1 \\\n",
"-outfmt 6 \\\n",
"-num_threads 6"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tail ../data/PiuraC_Val_Trinity_uniprot_sprot_2ndhalf.tab"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"comp31014_c0_seq1\tsp|Q63159|COQ3_RAT\t50.93\t108\t50\t1\t3\t317\t155\t262\t5e-29\t 111\r\n",
"comp31018_c0_seq1\tsp|P43695|GAT5A_XENLA\t68.97\t58\t16\t1\t409\t242\t226\t283\t2e-18\t84.7\r\n",
"comp31019_c0_seq1\tsp|P90747|YE56_CAEEL\t48.53\t68\t33\t1\t54\t251\t567\t634\t5e-14\t71.2\r\n",
"comp31028_c0_seq1\tsp|Q5RAG7|XCT_PONAB\t44.00\t100\t56\t0\t302\t3\t298\t397\t4e-13\t69.7\r\n",
"comp31030_c0_seq1\tsp|Q9CQJ2|PIHD1_MOUSE\t50.57\t87\t38\t2\t403\t143\t57\t138\t3e-23\t96.7\r\n",
"comp31033_c0_seq1\tsp|Q6BEA2|PRS27_RAT\t71.43\t28\t8\t0\t132\t215\t53\t80\t6e-06\t46.6\r\n",
"comp31037_c0_seq1\tsp|O95425|SVIL_HUMAN\t31.94\t72\t48\t1\t10\t225\t1793\t1863\t1e-06\t49.3\r\n",
"comp31054_c0_seq1\tsp|A2AGA4|RHBL2_MOUSE\t43.93\t107\t57\t2\t10\t324\t195\t300\t5e-16\t76.6\r\n",
"comp31056_c0_seq1\tsp|Q9NVH0|EXD2_HUMAN\t40.30\t134\t66\t3\t6\t389\t170\t295\t1e-20\t91.3\r\n",
"comp31058_c0_seq1\tsp|Q9WUA2|SYFB_MOUSE\t69.70\t66\t20\t0\t53\t250\t353\t418\t3e-27\t 108\r\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 8 Explore files on the command line"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!wc -l ../data/Piura_v1_uniprot_sprot.tab"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" 9498 ../data/Piura_v1_uniprot_sprot.tab\r\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head -2 ../data/Piura_v1_uniprot_sprot.tab"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"PiuraChilensis_v1_contig_3\tsp|Q6P9A1|ZN530_HUMAN\t33.33\t105\t61\t3\t825\t1118\t414\t516\t1e-07\t57.4\r\n",
"PiuraChilensis_v1_contig_4\tsp|Q8TGM6|TAR1_YEAST\t70.91\t55\t16\t0\t3829\t3665\t22\t76\t3e-15\t80.1\r\n"
]
}
],
"prompt_number": 3
},
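{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `-outfmt 6` table has no header row. By default BLAST+ writes twelve tab-separated columns in this order: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The block below is a minimal sketch for reading those columns by name in Python; `parse_blast_tab` is a hypothetical helper, not something BLAST provides.\n",
"\n",
"```python\n",
"import csv\n",
"\n",
"TAB = '\\t'\n",
"\n",
"# Default column order for BLAST+ -outfmt 6\n",
"BLAST6_COLUMNS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',\n",
"                  'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']\n",
"INT_FIELDS = ('length', 'mismatch', 'gapopen', 'qstart', 'qend', 'sstart', 'send')\n",
"FLOAT_FIELDS = ('pident', 'evalue', 'bitscore')\n",
"\n",
"def parse_blast_tab(lines):\n",
"    '''Yield one dict per hit, with numeric fields converted.'''\n",
"    for row in csv.reader(lines, delimiter=TAB):\n",
"        if not row:\n",
"            continue\n",
"        hit = dict(zip(BLAST6_COLUMNS, (field.strip() for field in row)))\n",
"        for key in INT_FIELDS:\n",
"            hit[key] = int(hit[key])\n",
"        for key in FLOAT_FIELDS:\n",
"            hit[key] = float(hit[key])\n",
"        yield hit\n",
"\n",
"# First row copied from the head output above\n",
"example = [TAB.join(['PiuraChilensis_v1_contig_3', 'sp|Q6P9A1|ZN530_HUMAN', '33.33', '105',\n",
"                     '61', '3', '825', '1118', '414', '516', '1e-07', '57.4'])]\n",
"for hit in parse_blast_tab(example):\n",
"    print(hit['qseqid'], hit['sseqid'], hit['evalue'])\n",
"```\n",
"\n",
"To read the real file, pass an open handle instead of `example`, for instance `parse_blast_tab(open('../data/Piura_v1_uniprot_sprot.tab'))`."
]
},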
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tr '|' \"\\t\" <../data/Piura_v1_uniprot_sprot.tab> ../data/Piura_v1_uniprot_sprot_sql.tab\n",
"!echo SQLShare ready version has Pipes converted to Tabs ....\n",
"!head -1 ../data/Piura_v1_uniprot_sprot_sql.tab "
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"SQLShare ready version has Pipes converted to Tabs ....\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"PiuraChilensis_v1_contig_3\tsp\tQ6P9A1\tZN530_HUMAN\t33.33\t105\t61\t3\t825\t1118\t414\t516\t1e-07\t57.4\r\n"
]
}
],
"prompt_number": 4
},
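{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `tr` command above replaces every pipe in the file, which is safe here because pipes only occur in the subject ID. An alternative sketch in Python splits only the second column, so the other fields are left untouched even if one of them contains a pipe. `split_subject_id` is a hypothetical helper name.\n",
"\n",
"```python\n",
"TAB = '\\t'\n",
"\n",
"def split_subject_id(line):\n",
"    '''Split the 'sp|ACCESSION|ENTRY' subject ID (column 2) into three columns.'''\n",
"    fields = line.rstrip().split(TAB)\n",
"    # Replace the single subject-ID field with its pipe-separated parts\n",
"    fields[1:2] = fields[1].split('|')\n",
"    return TAB.join(fields)\n",
"\n",
"# First four fields of the first row shown above\n",
"row = TAB.join(['PiuraChilensis_v1_contig_3', 'sp|Q6P9A1|ZN530_HUMAN', '33.33', '105'])\n",
"print(split_subject_id(row).split(TAB))\n",
"```\n",
"\n",
"After the split, the UniProt accession sits in the third column, which is the column the SQLShare joins key on."
]
},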
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Step 9 Joining BLAST results with other information (SQLShare)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"see also https://github.com/sr320/escience-talk-sqlshare-2015"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"https://sqlshare.escience.washington.edu"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```sql\n",
"SELECT Column1, term, GOSlim_bin, aspect, ProteinName FROM [sr320@washington.edu].[Piura_v1_uniprot_sprot_sql.tab]p\n",
"left join [samwhite@washington.edu].[UniprotProtNamesReviewed_yes20130610]sp \n",
"on p.Column3=sp.SPID \n",
"left join [sr320@washington.edu].[SPID and GO Numbers]go \n",
"on p.Column3=go.SPID \n",
"left join [sr320@washington.edu].[GO_to_GOslim]slim \n",
"on go.GOID=slim.GO_id\n",
"where aspect like 'P'\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```sql\n",
"SELECT DISTINCT Column1, GOSlim_bin FROM [sr320@washington.edu].[Piura_v1_uniprot_sprot_sql.tab]p\n",
"left join [sr320@washington.edu].[SPID and GO Numbers]go \n",
"on p.Column3=go.SPID \n",
"left join [sr320@washington.edu].[GO_to_GOslim]slim \n",
"on go.GOID=slim.GO_id\n",
"where aspect like 'P'\n",
"```"
]
},
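{
"cell_type": "markdown",
"metadata": {},
"source": [
"The join logic of the second query can be tried locally without SQLShare. The sketch below uses Python's built-in `sqlite3` module with tiny in-memory tables. The table names (`blast`, `spid_go`, `go_slim`), the IDs, and the GOSlim bins are placeholders invented for illustration and are not real annotations; only the shape of the query mirrors the SQL above.\n",
"\n",
"```python\n",
"import sqlite3\n",
"\n",
"con = sqlite3.connect(':memory:')\n",
"con.executescript('''\n",
"CREATE TABLE blast (Column1 TEXT, Column3 TEXT);\n",
"CREATE TABLE spid_go (SPID TEXT, GOID TEXT);\n",
"CREATE TABLE go_slim (GO_id TEXT, GOSlim_bin TEXT, aspect TEXT);\n",
"''')\n",
"\n",
"# Placeholder rows: every ID and bin here is made up for the example\n",
"con.executemany('INSERT INTO blast VALUES (?, ?)',\n",
"                [('contig_A', 'SPID_1'), ('contig_B', 'SPID_2')])\n",
"con.executemany('INSERT INTO spid_go VALUES (?, ?)',\n",
"                [('SPID_1', 'GO_x'), ('SPID_1', 'GO_y'), ('SPID_2', 'GO_z')])\n",
"con.executemany('INSERT INTO go_slim VALUES (?, ?, ?)',\n",
"                [('GO_x', 'bin_1', 'P'), ('GO_y', 'bin_1', 'P'), ('GO_z', 'bin_2', 'F')])\n",
"\n",
"query = '''\n",
"SELECT DISTINCT p.Column1, slim.GOSlim_bin\n",
"FROM blast p\n",
"LEFT JOIN spid_go g ON p.Column3 = g.SPID\n",
"LEFT JOIN go_slim slim ON g.GOID = slim.GO_id\n",
"WHERE slim.aspect = 'P'\n",
"ORDER BY p.Column1\n",
"'''\n",
"rows = con.execute(query).fetchall()\n",
"print(rows)\n",
"```\n",
"\n",
"`DISTINCT` collapses the two GO terms of `contig_A` that fall in the same bin into one row. Because the `WHERE` clause filters on a column from a joined table, contigs with no biological-process term drop out of the result, so the left joins behave like inner joins in this query."
]
},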
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Bonus Step! Automate - fasta2slim"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}