{ "metadata": { "name": "TJGR_OysterGenome_IGV", "signature": "sha256:6f4b4459fd9015d95c576ce7b1bbb83ae612af539d0ead3087d6f2c076b96565" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Developing Canonical Tracks for IGV Oyster Genome Browser" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is my attempt to derive fundamental genomic tracks for the oyster genome that can be easily visualized. \n", "\n", "[render in viewer](http://nbviewer.ipython.org/urls/raw.github.com/sr320/ipython_nb/master/TJGR_OysterGenome_IGV.ipynb)\n", "\n", "\n", "Contents \n", "\n", "- [Launching IGV](#IGV)\n", "- [Loading Core tracks](#cds) -- -- [URLs](#qURL)\n", "- [Gill Tissue Methylation](#gillmeth)\n", "- [Sperm Methylation](#spermmeth)\n", "- [Gill and Sperm gene level expression - RPKM](#rpkm)\n", "\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The full genome will be used as the scaffold (it should be in the cnidarian directory)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###Launching Integrative Genomics Viewer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_Launch IGV_\n", "\n", "Clicking this file opens the app.\n", "\n", "[Screenshot: 5/27/13 12:14 PM]\n", "\n", "###Load genome via URL\n", "\n", "[Screenshot: 5/27/13 12:16 PM]\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Loading / Defining Core Genome tracks\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "####Exons and Genes \n", "Derived from \n", "\n", "\n", "\n", "```\n", "gene:\n", "\tgene_v9/oyster.v9.glean.final.rename.gff.gz\tgene feature of pacific oyster in gff format\n", "\tgene_v9/oyster.v9.glean.final.rename.gff.cds.gz\tcoding sequence of pacific oyster in fasta format\n", "\tgene_v9/oyster.v9.glean.final.rename.gff.pep.gz\tprotein sequence of pacific oyster in fasta format\n", "```\n", "\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/Bay4\\ scratch/oyster.v9.glean.final.rename.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tmRNA\t35\t385\t0.555898\t-\t.\tID=CGI_10000001;\r\n", "C16582\tGLEAN\tCDS\t35\t385\t.\t-\t0\tParent=CGI_10000001;\r\n", "C17212\tGLEAN\tmRNA\t31\t363\t0.999572\t+\t.\tID=CGI_10000002;\r\n", "C17212\tGLEAN\tCDS\t31\t363\t.\t+\t0\tParent=CGI_10000002;\r\n", "C17316\tGLEAN\tmRNA\t30\t257\t0.555898\t+\t.\tID=CGI_10000003;\r\n", "C17316\tGLEAN\tCDS\t30\t257\t.\t+\t0\tParent=CGI_10000003;\r\n", "C17476\tGLEAN\tmRNA\t34\t257\t0.998947\t-\t.\tID=CGI_10000004;\r\n", "C17476\tGLEAN\tCDS\t104\t257\t.\t-\t0\tParent=CGI_10000004;\r\n", "C17476\tGLEAN\tCDS\t34\t74\t.\t-\t2\tParent=CGI_10000004;\r\n", "C17998\tGLEAN\tmRNA\t196\t387\t1\t-\t.\tID=CGI_10000005;\r\n" ] } ], "prompt_number": 3 }, 
{ "cell_type": "code", "collapsed": false, "input": [ "!wc /Volumes/Bay4\\ scratch/oyster.v9.glean.final.rename.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 224718 2022462 14179523 /Volumes/Bay4 scratch/oyster.v9.glean.final.rename.gff\r\n" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "#not quite a GFF!\n", "!head /Volumes/Bay4\\ scratch/oyster.v9.glean.final.rename.gff.pep" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">CGI_10000780\r\n", "MERYGARRLRMTIWETTRNGQLQTTHLGSILFILVMMYACVFCRVSLKNG\r\n", "EEITQLREKGCNTVNRTSQTRNNTIVTTPGQKVHQKCRRDYINANSIKNY\r\n", "MREKDVSITEPTRDLRSSTPDFEFQKNCLFCGYFAKFSECKRGIDVFPVR\r\n", "TTDFSNTLRNICKKRNDEWSEIVLRRLNIAPSDLHAADAIYHQTCSVNFR\r\n", "TGQQIPVSKQANKMVEKGIKTKHADADADVLIALTAIESAKTKPTVLLGE\r\n", "DTDLLVLLLHHADVTSNSLIFKSGNVSKVNTHIKIWDILKTKVLLGEELC\r\n", "TLLPLIHAISGCDTTSRMFGVSKAATLKKFAEHDFLKTRQLLCNANAKDD\r\n", "VISAGENIISSLYNGAPYEELNVLRYRKFAARVLTNKTCVQIHTLPPTSN\r\n", "AASFHSQRAYLQMKMWMNEDNLNPCEWGWKVANGNLVPVKCTVKLPLNC\r\n" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "#not quite a GFF!\n", "!head /Volumes/Bay4\\ scratch/oyster.v9.glean.final.rename.gff.CDS" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">CGI_10000780\r\n", "ATGGAAAGATATGGCGCCCGTAGATTAAGAATGACGATATGGGAGACAAC\r\n", "TCGTAATGGTCAACTGCAGACGACGCATCTAGGTTCCATCCTTTTCATTC\r\n", "TGGTAATGATGTATGCTTGTGTTTTTTGTCGGGTGTCTCTAAAAAATGGT\r\n", "GAAGAAATAACACAACTAAGAGAAAAAGGATGTAACACAGTTAATAGGAC\r\n", "CAGCCAAACCAGAAATAATACAATCGTCACAACTCCAGGACAAAAAGTTC\r\n", "ATCAGAAATGTCGACGTGATTACATTAATGCTAACTCAATCAAGAATTAC\r\n", "ATGCGAGAAAAGGATGTATCGATAACCGAGCCAACTCGTGACTTACGATC\r\n", "TTCTACTCCTGATTTTGAGTTCCAGAAGAACTGTTTATTTTGTGGATATT\r\n", "TTGCAAAATTTTCAGAATGCAAAAGGGGAATCGACGTGTTTCCTGTCAGG\r\n" ] } ], "prompt_number": 7 }, 
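{ "cell_type": "markdown", "metadata": {}, "source": [ "The .pep and .CDS files above are FASTA, so only the .gff itself carries coordinates. One way the split into a CDS file and an mRNA file could be done is to group rows by the third column. This is a sketch under that assumption, not the command actually used, and split_by_feature is a hypothetical helper. The sample rows are copied from the head of the GFF.

```python
TAB = chr(9)  # tab character


def split_by_feature(lines):
    '''Group GFF rows by the value in column 3 (hypothetical helper).'''
    groups = {}
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue  # skip blank lines and GFF comment or pragma lines
        feature = line.split(TAB)[2]
        groups.setdefault(feature, []).append(line)
    return groups


# Five rows copied from the head of oyster.v9.glean.final.rename.gff
sample = [
    TAB.join(['C16582', 'GLEAN', 'mRNA', '35', '385', '0.555898', '-', '.', 'ID=CGI_10000001;']),
    TAB.join(['C16582', 'GLEAN', 'CDS', '35', '385', '.', '-', '0', 'Parent=CGI_10000001;']),
    TAB.join(['C17476', 'GLEAN', 'mRNA', '34', '257', '0.998947', '-', '.', 'ID=CGI_10000004;']),
    TAB.join(['C17476', 'GLEAN', 'CDS', '104', '257', '.', '-', '0', 'Parent=CGI_10000004;']),
    TAB.join(['C17476', 'GLEAN', 'CDS', '34', '74', '.', '-', '2', 'Parent=CGI_10000004;']),
]

groups = split_by_feature(sample)
counts = {feature: len(rows) for feature, rows in groups.items()}
print(counts)

# The per-feature files together should account for every input row.
assert sum(counts.values()) == len(sample)
```

The final assertion is the same sanity check made in this notebook with wc: 196691 CDS rows plus 28027 mRNA rows should equal the 224718 rows of the source file." ] }, 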
{ "cell_type": "markdown", "metadata": {}, "source": [ "Specifically, `/Volumes/Bay4\\ scratch/oyster.v9.glean.final.rename.gff` was split into two files: exon (CDS) rows and full-gene (mRNA) rows. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tCDS\t35\t385\t.\t-\t0\tParent=CGI_10000001;\r\n", "C17212\tGLEAN\tCDS\t31\t363\t.\t+\t0\tParent=CGI_10000002;\r\n", "C17316\tGLEAN\tCDS\t30\t257\t.\t+\t0\tParent=CGI_10000003;\r\n", "C17476\tGLEAN\tCDS\t104\t257\t.\t-\t0\tParent=CGI_10000004;\r\n", "C17476\tGLEAN\tCDS\t34\t74\t.\t-\t2\tParent=CGI_10000004;\r\n", "C17998\tGLEAN\tCDS\t196\t387\t.\t-\t0\tParent=CGI_10000005;\r\n", "C18346\tGLEAN\tCDS\t174\t551\t.\t+\t0\tParent=CGI_10000009;\r\n", "C18428\tGLEAN\tCDS\t286\t546\t.\t-\t0\tParent=CGI_10000010;\r\n", "C18964\tGLEAN\tCDS\t203\t658\t.\t-\t0\tParent=CGI_10000011;\r\n", "C18980\tGLEAN\tCDS\t30\t674\t.\t+\t0\tParent=CGI_10000012;\r\n" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tmRNA\t35\t385\t0.555898\t-\t.\tID=CGI_10000001;\r\n", "C17212\tGLEAN\tmRNA\t31\t363\t0.999572\t+\t.\tID=CGI_10000002;\r\n", "C17316\tGLEAN\tmRNA\t30\t257\t0.555898\t+\t.\tID=CGI_10000003;\r\n", "C17476\tGLEAN\tmRNA\t34\t257\t0.998947\t-\t.\tID=CGI_10000004;\r\n", "C17998\tGLEAN\tmRNA\t196\t387\t1\t-\t.\tID=CGI_10000005;\r\n", "C18346\tGLEAN\tmRNA\t174\t551\t1\t+\t.\tID=CGI_10000009;\r\n", "C18428\tGLEAN\tmRNA\t286\t546\t0.555898\t-\t.\tID=CGI_10000010;\r\n", "C18964\tGLEAN\tmRNA\t203\t658\t0.999572\t-\t.\tID=CGI_10000011;\r\n", "C18980\tGLEAN\tmRNA\t30\t674\t0.555898\t+\t.\tID=CGI_10000012;\r\n", 
"C19100\tGLEAN\tmRNA\t160\t681\t0.999955\t-\t.\tID=CGI_10000013;\r\n" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "!wc /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 196691 1770219 12359791 /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff\r\n" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "!wc /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 28027 252243 1819732 /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff\r\n" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "#check that the CDS and mRNA line counts add up to the 224718 lines of the original file\n", "196691 + 28027" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "pyout", "prompt_number": 16, "text": [ "224718" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "cp /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tCDS\t35\t385\t.\t-\t0\tParent=CGI_10000001;\r\n", "C17212\tGLEAN\tCDS\t31\t363\t.\t+\t0\tParent=CGI_10000002;\r\n", "C17316\tGLEAN\tCDS\t30\t257\t.\t+\t0\tParent=CGI_10000003;\r\n", "C17476\tGLEAN\tCDS\t104\t257\t.\t-\t0\tParent=CGI_10000004;\r\n", "C17476\tGLEAN\tCDS\t34\t74\t.\t-\t2\tParent=CGI_10000004;\r\n", "C17998\tGLEAN\tCDS\t196\t387\t.\t-\t0\tParent=CGI_10000005;\r\n", 
"C18346\tGLEAN\tCDS\t174\t551\t.\t+\t0\tParent=CGI_10000009;\r\n", "C18428\tGLEAN\tCDS\t286\t546\t.\t-\t0\tParent=CGI_10000010;\r\n", "C18964\tGLEAN\tCDS\t203\t658\t.\t-\t0\tParent=CGI_10000011;\r\n", "C18980\tGLEAN\tCDS\t30\t674\t.\t+\t0\tParent=CGI_10000012;\r\n" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "cp /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tmRNA\t35\t385\t0.555898\t-\t.\tID=CGI_10000001;\r\n", "C17212\tGLEAN\tmRNA\t31\t363\t0.999572\t+\t.\tID=CGI_10000002;\r\n", "C17316\tGLEAN\tmRNA\t30\t257\t0.555898\t+\t.\tID=CGI_10000003;\r\n", "C17476\tGLEAN\tmRNA\t34\t257\t0.998947\t-\t.\tID=CGI_10000004;\r\n", "C17998\tGLEAN\tmRNA\t196\t387\t1\t-\t.\tID=CGI_10000005;\r\n", "C18346\tGLEAN\tmRNA\t174\t551\t1\t+\t.\tID=CGI_10000009;\r\n", "C18428\tGLEAN\tmRNA\t286\t546\t0.555898\t-\t.\tID=CGI_10000010;\r\n", "C18964\tGLEAN\tmRNA\t203\t658\t0.999572\t-\t.\tID=CGI_10000011;\r\n", "C18980\tGLEAN\tmRNA\t30\t674\t0.555898\t+\t.\tID=CGI_10000012;\r\n", "C19100\tGLEAN\tmRNA\t160\t681\t0.999955\t-\t.\tID=CGI_10000013;\r\n" ] } ], "prompt_number": 23 }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "\n", "#### All CGs\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!wc /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 10035701 99934100 977314599 /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff\r\n" ] } ], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": 
[ "!fgrep -c \"fuzznuc\tnucleotide_motif\" /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "9978551\r\n" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "##gff-version 3\r\n", "##sequence-region scaffold360 1 280\r\n", "#!Date 2013-04-23\r\n", "#!Type DNA\r\n", "#!Source-version EMBOSS 6.5.7.0\r\n", "scaffold360\tfuzznuc\tnucleotide_motif\t60\t61\t2\t+\t.\tID=scaffold360.1;note=*pat pattern:CG\r\n", "scaffold360\tfuzznuc\tnucleotide_motif\t96\t97\t2\t+\t.\tID=scaffold360.2;note=*pat pattern:CG\r\n", "scaffold360\tfuzznuc\tnucleotide_motif\t120\t121\t2\t+\t.\tID=scaffold360.3;note=*pat pattern:CG\r\n", "scaffold360\tfuzznuc\tnucleotide_motif\t187\t188\t2\t+\t.\tID=scaffold360.4;note=*pat pattern:CG\r\n", "##gff-version 3\r\n" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "cp /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!sortbed -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff > /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "!wc /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 9978551 99785510 976050492 /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff\r\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "####Promoter###" ] }, { 
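"cell_type": "markdown", "metadata": {}, "source": [ "The promoter track in this section is taken to be the region up to 1 kb upstream of each gene, clipped at the scaffold ends, which is what flankbed with -s -l 1000 -r 0 reports. A minimal sketch of that coordinate arithmetic in 1-based inclusive GFF coordinates follows; it is not the bedtools implementation. The function name upstream_flank is hypothetical, and the scaffold lengths 395 and 491 are assumptions inferred from where the promoter intervals reported in this section end.

```python
def upstream_flank(start, end, strand, scaffold_length, size=1000):
    '''Return (start, end) of the upstream flank in 1-based inclusive
    coordinates, or None when the gene abuts the scaffold edge.'''
    if strand == '+':
        # Upstream of a plus-strand gene lies to the left of its start.
        flank_start = max(1, start - size)
        flank_end = start - 1
    elif strand == '-':
        # Upstream of a minus-strand gene lies to the right of its end.
        flank_start = end + 1
        flank_end = min(scaffold_length, end + size)
    else:
        raise ValueError('strand must be + or -')
    if flank_start > flank_end:
        return None  # no room between the gene and the scaffold edge
    return (flank_start, flank_end)


# Gene coordinates copied from Cgigas_v9_gene.gff.
# C17212, plus strand: the scaffold length is a placeholder, unused here.
print(upstream_flank(31, 363, '+', scaffold_length=1000))  # (1, 30)
# C17476, minus strand: assumed scaffold length 491.
print(upstream_flank(34, 257, '-', scaffold_length=491))   # (258, 491)
# C16582, minus strand: assumed scaffold length 395.
print(upstream_flank(35, 385, '-', scaffold_length=395))   # (386, 395)
```

Because many scaffolds are shorter than 1 kb, most of the reported promoter intervals are clipped and are much shorter than 1000 bases." ] }, { 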
"cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/qDOD_scaffold_length.csv" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "sID,length\r", "\r\n", "C1,100\r", "\r\n", "C10003,156\r", "\r\n", "C10005,156\r", "\r\n", "C10007,156\r", "\r\n", "C10009,156\r", "\r\n", "C1001,103\r", "\r\n", "C10011,156\r", "\r\n", "C10013,157\r", "\r\n", "C10015,157\r", "\r\n" ] } ], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "!tr ',' \"\\t\" /Volumes/web/cnidarian/qDOD_scaffold_length.txt" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 36 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/qDOD_scaffold_length.txt" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "sID\tlength\r", "\r\n", "C1\t100\r", "\r\n", "C10003\t156\r", "\r\n", "C10005\t156\r", "\r\n", "C10007\t156\r", "\r\n", "C10009\t156\r", "\r\n", "C1001\t103\r", "\r\n", "C10011\t156\r", "\r\n", "C10013\t157\r", "\r\n", "C10015\t157\r", "\r\n" ] } ], "prompt_number": 37 }, { "cell_type": "code", "collapsed": false, "input": [ "!flankbed -s -l 1000 -r 0 -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_Promoter_1k5p.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_Promoter_1k5p.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tmRNA\t386\t395\t0.555898\t-\t.\tID=CGI_10000001;\r\n", "C17212\tGLEAN\tmRNA\t1\t30\t0.999572\t+\t.\tID=CGI_10000002;\r\n", "C17316\tGLEAN\tmRNA\t1\t29\t0.555898\t+\t.\tID=CGI_10000003;\r\n", 
"C17476\tGLEAN\tmRNA\t258\t491\t0.998947\t-\t.\tID=CGI_10000004;\r\n", "C17998\tGLEAN\tmRNA\t388\t559\t1\t-\t.\tID=CGI_10000005;\r\n", "C18346\tGLEAN\tmRNA\t1\t173\t1\t+\t.\tID=CGI_10000009;\r\n", "C18428\tGLEAN\tmRNA\t547\t611\t0.555898\t-\t.\tID=CGI_10000010;\r\n", "C18964\tGLEAN\tmRNA\t659\t714\t0.999572\t-\t.\tID=CGI_10000011;\r\n", "C18980\tGLEAN\tmRNA\t1\t29\t0.555898\t+\t.\tID=CGI_10000012;\r\n", "C19100\tGLEAN\tmRNA\t682\t743\t0.999955\t-\t.\tID=CGI_10000013;\r\n" ] } ], "prompt_number": 43 }, { "cell_type": "code", "collapsed": false, "input": [ "!sed 's/mRNA/promoter/g' /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 44 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tpromoter\t386\t395\t0.555898\t-\t.\tID=CGI_10000001;\r\n", "C17212\tGLEAN\tpromoter\t1\t30\t0.999572\t+\t.\tID=CGI_10000002;\r\n", "C17316\tGLEAN\tpromoter\t1\t29\t0.555898\t+\t.\tID=CGI_10000003;\r\n", "C17476\tGLEAN\tpromoter\t258\t491\t0.998947\t-\t.\tID=CGI_10000004;\r\n", "C17998\tGLEAN\tpromoter\t388\t559\t1\t-\t.\tID=CGI_10000005;\r\n", "C18346\tGLEAN\tpromoter\t1\t173\t1\t+\t.\tID=CGI_10000009;\r\n", "C18428\tGLEAN\tpromoter\t547\t611\t0.555898\t-\t.\tID=CGI_10000010;\r\n", "C18964\tGLEAN\tpromoter\t659\t714\t0.999572\t-\t.\tID=CGI_10000011;\r\n", "C18980\tGLEAN\tpromoter\t1\t29\t0.555898\t+\t.\tID=CGI_10000012;\r\n", "C19100\tGLEAN\tpromoter\t682\t743\t0.999955\t-\t.\tID=CGI_10000013;\r\n" ] } ], "prompt_number": 45 }, { "cell_type": "raw", "metadata": {}, "source": [ "http://eagle.fish.washington.edu/cnidarian/TJGR_Promoter_1k5p_b.gff" ] }, { "cell_type": "code", "collapsed": false, "input": [ "cp /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff 
/Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 46 }, { "cell_type": "code", "collapsed": false, "input": [ "#clean up in SQLShare\n", "!head /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9\r", "\r\n", "C16582,flankbed,promoter,386,395,.,-,.,ID=CGI_10000001;\r", "\r\n", "C17212,flankbed,promoter,1,30,.,+,.,ID=CGI_10000002;\r", "\r\n", "C17316,flankbed,promoter,1,29,.,+,.,ID=CGI_10000003;\r", "\r\n", "C17476,flankbed,promoter,258,491,.,-,.,ID=CGI_10000004;\r", "\r\n", "C17998,flankbed,promoter,388,559,.,-,.,ID=CGI_10000005;\r", "\r\n", "C18346,flankbed,promoter,1,173,.,+,.,ID=CGI_10000009;\r", "\r\n", "C18428,flankbed,promoter,547,611,.,-,.,ID=CGI_10000010;\r", "\r\n", "C18964,flankbed,promoter,659,714,.,-,.,ID=CGI_10000011;\r", "\r\n", "C18980,flankbed,promoter,1,29,.,+,.,ID=CGI_10000012;\r", "\r\n" ] } ], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "!tail -n +2 /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2.gff > /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2b.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "!tr ',' \"\\t\" < /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2b.gff > /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tflankbed\tpromoter\t386\t395\t.\t-\t.\tID=CGI_10000001;\r", "\r\n", 
"C17212\tflankbed\tpromoter\t1\t30\t.\t+\t.\tID=CGI_10000002;\r", "\r\n", "C17316\tflankbed\tpromoter\t1\t29\t.\t+\t.\tID=CGI_10000003;\r", "\r\n", "C17476\tflankbed\tpromoter\t258\t491\t.\t-\t.\tID=CGI_10000004;\r", "\r\n", "C17998\tflankbed\tpromoter\t388\t559\t.\t-\t.\tID=CGI_10000005;\r", "\r\n", "C18346\tflankbed\tpromoter\t1\t173\t.\t+\t.\tID=CGI_10000009;\r", "\r\n", "C18428\tflankbed\tpromoter\t547\t611\t.\t-\t.\tID=CGI_10000010;\r", "\r\n", "C18964\tflankbed\tpromoter\t659\t714\t.\t-\t.\tID=CGI_10000011;\r", "\r\n", "C18980\tflankbed\tpromoter\t1\t29\t.\t+\t.\tID=CGI_10000012;\r", "\r\n", "C19100\tflankbed\tpromoter\t682\t743\t.\t-\t.\tID=CGI_10000013;\r", "\r\n" ] } ], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": [ "!cp /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 34 }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "###Introns (Option 1)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!sed 's/Parent=/#Parent=/g' /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 70 }, { "cell_type": "code", "collapsed": false, "input": [ "!sed 's/ID=/#ID=/g' /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 75 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tCDS\t35\t385\t.\t-\t0\t#Parent=CGI_10000001;\r\n", "C17212\tGLEAN\tCDS\t31\t363\t.\t+\t0\t#Parent=CGI_10000002;\r\n", "C17316\tGLEAN\tCDS\t30\t257\t.\t+\t0\t#Parent=CGI_10000003;\r\n", "C17476\tGLEAN\tCDS\t104\t257\t.\t-\t0\t#Parent=CGI_10000004;\r\n", 
"C17476\tGLEAN\tCDS\t34\t74\t.\t-\t2\t#Parent=CGI_10000004;\r\n", "C17998\tGLEAN\tCDS\t196\t387\t.\t-\t0\t#Parent=CGI_10000005;\r\n", "C18346\tGLEAN\tCDS\t174\t551\t.\t+\t0\t#Parent=CGI_10000009;\r\n", "C18428\tGLEAN\tCDS\t286\t546\t.\t-\t0\t#Parent=CGI_10000010;\r\n", "C18964\tGLEAN\tCDS\t203\t658\t.\t-\t0\t#Parent=CGI_10000011;\r\n", "C18980\tGLEAN\tCDS\t30\t674\t.\t+\t0\t#Parent=CGI_10000012;\r\n" ] } ], "prompt_number": 71 }, { "cell_type": "code", "collapsed": false, "input": [ "!subtractBed -a /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff -b /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff > /Volumes/web/cnidarian/Cgigas_v9_intron.gff " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 76 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/Cgigas_v9_intron.gff " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C17476\tGLEAN\tmRNA\t75\t103\t0.998947\t-\t.\t#ID=CGI_10000004;\r\n", "C19392\tGLEAN\tmRNA\t184\t451\t1\t+\t.\t#ID=CGI_10000015;\r\n", "C20262\tGLEAN\tmRNA\t539\t641\t1\t-\t.\t#ID=CGI_10000025;\r\n", "C20262\tGLEAN\tmRNA\t650\t871\t1\t-\t.\t#ID=CGI_10000025;\r\n", "C20334\tGLEAN\tmRNA\t524\t867\t1\t-\t.\t#ID=CGI_10000028;\r\n", "C20412\tGLEAN\tmRNA\t215\t409\t1\t-\t.\t#ID=CGI_10000029;\r\n", "C20412\tGLEAN\tmRNA\t464\t705\t1\t-\t.\t#ID=CGI_10000029;\r\n", "C20462\tGLEAN\tmRNA\t50\t271\t1\t+\t.\t#ID=CGI_10000030;\r\n", "C20462\tGLEAN\tmRNA\t360\t481\t1\t+\t.\t#ID=CGI_10000030;\r\n", "C20462\tGLEAN\tmRNA\t577\t822\t1\t+\t.\t#ID=CGI_10000030;\r\n" ] } ], "prompt_number": 77 }, { "cell_type": "code", "collapsed": false, "input": [ "!sed 's/#ID=/Parent=/g' /Volumes/web/cnidarian/Cgigas_v9_intron_b.gff " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 99 }, { "cell_type": "code", "collapsed": false, "input": [ "!sed 's/GLEAN/subtractBed/g' /Volumes/web/cnidarian/Cgigas_v9_intron_c.gff " ], "language": "python", 
"metadata": {}, "outputs": [], "prompt_number": 105 }, { "cell_type": "code", "collapsed": false, "input": [ "!sed 's/mRNA/_intron/g' /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 111 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C17476\tsubtractBed\t_intron\t75\t103\t0.998947\t-\t.\tParent=CGI_10000004;\r\n", "C19392\tsubtractBed\t_intron\t184\t451\t1\t+\t.\tParent=CGI_10000015;\r\n", "C20262\tsubtractBed\t_intron\t539\t641\t1\t-\t.\tParent=CGI_10000025;\r\n", "C20262\tsubtractBed\t_intron\t650\t871\t1\t-\t.\tParent=CGI_10000025;\r\n", "C20334\tsubtractBed\t_intron\t524\t867\t1\t-\t.\tParent=CGI_10000028;\r\n", "C20412\tsubtractBed\t_intron\t215\t409\t1\t-\t.\tParent=CGI_10000029;\r\n", "C20412\tsubtractBed\t_intron\t464\t705\t1\t-\t.\tParent=CGI_10000029;\r\n", "C20462\tsubtractBed\t_intron\t50\t271\t1\t+\t.\tParent=CGI_10000030;\r\n", "C20462\tsubtractBed\t_intron\t360\t481\t1\t+\t.\tParent=CGI_10000030;\r\n", "C20462\tsubtractBed\t_intron\t577\t822\t1\t+\t.\tParent=CGI_10000030;\r\n" ] } ], "prompt_number": 112 }, { "cell_type": "code", "collapsed": false, "input": [ "http://eagle.fish.washington.edu/cnidarian/Cgigas_v9_intron_d.gff " ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "cp /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 113 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ 
"C17476\tsubtractBed\t_intron\t75\t103\t0.998947\t-\t.\tParent=CGI_10000004;\r\n", "C19392\tsubtractBed\t_intron\t184\t451\t1\t+\t.\tParent=CGI_10000015;\r\n", "C20262\tsubtractBed\t_intron\t539\t641\t1\t-\t.\tParent=CGI_10000025;\r\n", "C20262\tsubtractBed\t_intron\t650\t871\t1\t-\t.\tParent=CGI_10000025;\r\n", "C20334\tsubtractBed\t_intron\t524\t867\t1\t-\t.\tParent=CGI_10000028;\r\n", "C20412\tsubtractBed\t_intron\t215\t409\t1\t-\t.\tParent=CGI_10000029;\r\n", "C20412\tsubtractBed\t_intron\t464\t705\t1\t-\t.\tParent=CGI_10000029;\r\n", "C20462\tsubtractBed\t_intron\t50\t271\t1\t+\t.\tParent=CGI_10000030;\r\n", "C20462\tsubtractBed\t_intron\t360\t481\t1\t+\t.\tParent=CGI_10000030;\r\n", "C20462\tsubtractBed\t_intron\t577\t822\t1\t+\t.\tParent=CGI_10000030;\r\n" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "#will clean up in SQLSHARE\n", "!head /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff\n", "#!tail -n +2 /Volumes/web/cnidarian/Cgigas_v9_intron_v2b.gff > /Volumes/web/cnidarian/Cgigas_v9_intron_v2c.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C17476\tsubtractBed\tintrn\t75\t103\t.\t-\t.\tParent=CGI_10000004;\r", "\r\n", "C19392\tsubtractBed\tintrn\t184\t451\t.\t+\t.\tParent=CGI_10000015;\r", "\r\n", "C20262\tsubtractBed\tintrn\t539\t641\t.\t-\t.\tParent=CGI_10000025;\r", "\r\n", "C20262\tsubtractBed\tintrn\t650\t871\t.\t-\t.\tParent=CGI_10000025;\r", "\r\n", "C20334\tsubtractBed\tintrn\t524\t867\t.\t-\t.\tParent=CGI_10000028;\r", "\r\n", "C20412\tsubtractBed\tintrn\t215\t409\t.\t-\t.\tParent=CGI_10000029;\r", "\r\n", "C20412\tsubtractBed\tintrn\t464\t705\t.\t-\t.\tParent=CGI_10000029;\r", "\r\n", "C20462\tsubtractBed\tintrn\t50\t271\t.\t+\t.\tParent=CGI_10000030;\r", "\r\n", "C20462\tsubtractBed\tintrn\t360\t481\t.\t+\t.\tParent=CGI_10000030;\r", "\r\n", "C20462\tsubtractBed\tintrn\t577\t822\t.\t+\t.\tParent=CGI_10000030;\r", "\r\n" ] } ], 
"prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "http://eagle.fish.washington.edu/cnidarian/Cgigas_v9_intron_v2c.gff" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!sed 's/intron/intrn/g' /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff\n" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "cp /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "!wc /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 176049 1584441 12654996 /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff\r\n" ] } ], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "!wc /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 176049 1584441 13834641 /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff\r\n" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "###Intron (Option 2)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!complementBed -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_complement_exon.bed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 143 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_complement_exon.bed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C1\t0\t100\r\n", "C10003\t0\t156\r\n", 
"C10005\t0\t156\r\n", "C10007\t0\t156\r\n", "C10009\t0\t156\r\n", "C1001\t0\t103\r\n", "C10011\t0\t156\r\n", "C10013\t0\t157\r\n", "C10015\t0\t157\r\n", "C10021\t0\t157\r\n" ] } ], "prompt_number": 144 }, { "cell_type": "code", "collapsed": false, "input": [ "!intersectBed -a /Volumes/web/cnidarian/TJGR_complement_exon.bed -b /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff > /Volumes/web/cnidarian/TJGR_intron2.bed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 145 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_intron2.bed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C17476\t74\t103\r\n", "C19392\t183\t451\r\n", "C20262\t538\t641\r\n", "C20262\t649\t871\r\n", "C20334\t523\t867\r\n", "C20412\t214\t409\r\n", "C20412\t463\t705\r\n", "C20462\t49\t271\r\n", "C20462\t359\t481\r\n", "C20462\t576\t822\r\n" ] } ], "prompt_number": 146 }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "\n", "###Transposable Elements \n", "\n", "Generating TE canonical GFF from RepeatProteinMask oyster v9\n", "Updated Today\n", "The starting file for this is the output of RepeatProteinMask performed by SR \n", "(look towards the bottom of this entry): `https://www.evernote.com/shard/s10/sh/7dea995c-17ac-4bcf-bc38-963220e9e7c9/b28dacbbdbfe123960b88e42fa45a34a`\n", "\n", "The txt file (http://eagle.fish.washington.edu/cnidarian/qDOD_RepeatProteinMask_v9.txt) was uploaded into SQLshare\n", "\n", "Then a gff was derived using the following query:\n", "\n", "```\n", "SELECT \n", "SeqID as seqname, \n", "Method as source, \n", "Type as feature, \n", "[Begin] as [start],\n", "[End] as [end],\n", "Score as score, \n", "sym as strand, \n", "'.' as frame, \n", "'.' 
as attribute \n", "FROM [mgavery@washington.edu].[qDOD_RepeatProteinMask_v9.txt]\n", "``` \n", "\n", "\n", "The derived SQLShare dataset is shared publicly here: https://sqlshare.escience.washington.edu/sqlshare#s=query/mgavery%40washington.edu/qDOD_RepeatProteinMask_v9_asgff\n", "\n", "The file was downloaded and saved as a .gff here: http://eagle.fish.washington.edu/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff\n" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C21242\tTRF\tTandem_Repeat\t38\t100\t72\t+\t.\t.\r\n", "C21306\tTRF\tTandem_Repeat\t35\t143\t112\t+\t.\t.\r\n", "C21306\tTRF\tTandem_Repeat\t574\t947\t208\t+\t.\t.\r\n", "C21306\tTRF\tTandem_Repeat\t574\t901\t313\t+\t.\t.\r\n", "C21372\tTRF\tTandem_Repeat\t643\t671\t58\t+\t.\t.\r\n", "C22542\tTRF\tTandem_Repeat\t1727\t1774\t96\t+\t.\t.\r\n", "C22728\tTRF\tTandem_Repeat\t426\t491\t105\t+\t.\t.\r\n", "C23428\tTRF\tTandem_Repeat\t130\t415\t202\t+\t.\t.\r\n", "C23796\tTRF\tTandem_Repeat\t547\t608\t97\t+\t.\t.\r\n", "C24440\tTRF\tTandem_Repeat\t1059\t1089\t62\t+\t.\t.\r\n" ] } ], "prompt_number": 115 }, { "cell_type": "code", "collapsed": false, "input": [ "cp /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 116 }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "###Other\n", "\n", "Regions that are not gene, promoter, or TE (the complement of all three)" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "\n", "$ cat file1 file2 ... 
fileN > file1-N.nonunique.bed\n", "$ mergeBed -i file1-N.nonunique.bed > file1-N.merged.bed" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 126 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tpromoter\t386\t395\t0.555898\t-\t.\tID=CGI_10000001;\r\n", "C17212\tGLEAN\tpromoter\t1\t30\t0.999572\t+\t.\tID=CGI_10000002;\r\n", "C17316\tGLEAN\tpromoter\t1\t29\t0.555898\t+\t.\tID=CGI_10000003;\r\n", "C17476\tGLEAN\tpromoter\t258\t491\t0.998947\t-\t.\tID=CGI_10000004;\r\n", "C17998\tGLEAN\tpromoter\t388\t559\t1\t-\t.\tID=CGI_10000005;\r\n", "C18346\tGLEAN\tpromoter\t1\t173\t1\t+\t.\tID=CGI_10000009;\r\n", "C18428\tGLEAN\tpromoter\t547\t611\t0.555898\t-\t.\tID=CGI_10000010;\r\n", "C18964\tGLEAN\tpromoter\t659\t714\t0.999572\t-\t.\tID=CGI_10000011;\r\n", "C18980\tGLEAN\tpromoter\t1\t29\t0.555898\t+\t.\tID=CGI_10000012;\r\n", "C19100\tGLEAN\tpromoter\t682\t743\t0.999955\t-\t.\tID=CGI_10000013;\r\n" ] } ], "prompt_number": 129 }, { "cell_type": "code", "collapsed": false, "input": [ "!sortBed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 130 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff" ], "language": "python", 
"metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C10153\tWUBlastX\tLTR_Pao\t3\t158\t109\t-\t.\t.\r\n", "C10177\tWUBlastX\tLINE_L2\t2\t157\t97\t-\t.\t.\r\n", "C10191\tWUBlastX\tLTR_Copia\t2\t157\t174\t-\t.\t.\r\n", "C10245\tWUBlastX\tLINE_Penelope\t5\t154\t59\t-\t.\t.\r\n", "C10291\tWUBlastX\tLTR_Copia\t2\t160\t85\t-\t.\t.\r\n", "C10475\tWUBlastX\tLINE_L1-Tx1\t3\t149\t50\t-\t.\t.\r\n", "C10673\tWUBlastX\tLTR_DIRS\t37\t162\t59\t+\t.\t.\r\n", "C10675\tWUBlastX\tLINE_L2\t1\t165\t132\t+\t.\t.\r\n", "C10805\tWUBlastX\tLINE_I\t1\t168\t100\t-\t.\t.\r\n", "C10973\tWUBlastX\tLTR_Gypsy\t3\t167\t186\t+\t.\t.\r\n" ] } ], "prompt_number": 131 }, { "cell_type": "code", "collapsed": false, "input": [ "!mergebed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 134 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C10153\t2\t158\r\n", "C10177\t1\t157\r\n", "C10191\t1\t157\r\n", "C10245\t4\t154\r\n", "C10291\t1\t160\r\n", "C10475\t2\t149\r\n", "C10673\t36\t162\r\n", "C10675\t0\t165\r\n", "C10805\t0\t168\r\n", "C10973\t2\t167\r\n" ] } ], "prompt_number": 135 }, { "cell_type": "code", "collapsed": false, "input": [ "!complementBed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 136 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ 
"C1\t0\t100\r\n", "C10003\t0\t156\r\n", "C10005\t0\t156\r\n", "C10007\t0\t156\r\n", "C10009\t0\t156\r\n", "C1001\t0\t103\r\n", "C10011\t0\t156\r\n", "C10013\t0\t157\r\n", "C10015\t0\t157\r\n", "C10021\t0\t157\r\n" ] } ], "prompt_number": 137 }, { "cell_type": "code", "collapsed": false, "input": [ "http://eagle.fish.washington.edu/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "cp /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 139 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C1\t0\t100\r\n", "C10003\t0\t156\r\n", "C10005\t0\t156\r\n", "C10007\t0\t156\r\n", "C10009\t0\t156\r\n", "C1001\t0\t103\r\n", "C10011\t0\t156\r\n", "C10013\t0\t157\r\n", "C10015\t0\t157\r\n", "C10021\t0\t157\r\n" ] } ], "prompt_number": 140 }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "TEST - Verification everything is covered" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!cat /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff > /Volumes/web/cnidarian/TJGR_CanTest" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 149 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_CanTest" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", 
"stream": "stdout", "text": [ "C16582\tGLEAN\tCDS\t35\t385\t.\t-\t0\tParent=CGI_10000001;\r\n", "C17212\tGLEAN\tCDS\t31\t363\t.\t+\t0\tParent=CGI_10000002;\r\n", "C17316\tGLEAN\tCDS\t30\t257\t.\t+\t0\tParent=CGI_10000003;\r\n", "C17476\tGLEAN\tCDS\t104\t257\t.\t-\t0\tParent=CGI_10000004;\r\n", "C17476\tGLEAN\tCDS\t34\t74\t.\t-\t2\tParent=CGI_10000004;\r\n", "C17998\tGLEAN\tCDS\t196\t387\t.\t-\t0\tParent=CGI_10000005;\r\n", "C18346\tGLEAN\tCDS\t174\t551\t.\t+\t0\tParent=CGI_10000009;\r\n", "C18428\tGLEAN\tCDS\t286\t546\t.\t-\t0\tParent=CGI_10000010;\r\n", "C18964\tGLEAN\tCDS\t203\t658\t.\t-\t0\tParent=CGI_10000011;\r\n", "C18980\tGLEAN\tCDS\t30\t674\t.\t+\t0\tParent=CGI_10000012;\r\n" ] } ], "prompt_number": 150 }, { "cell_type": "code", "collapsed": false, "input": [ "!sortBed -i /Volumes/web/cnidarian/TJGR_CanTest > /Volumes/web/cnidarian/TJGR_CanTest_s" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 151 }, { "cell_type": "code", "collapsed": false, "input": [ "!mergebed -i /Volumes/web/cnidarian/TJGR_CanTest_s > /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 155 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C10153\t2\t158\r\n", "C10177\t1\t157\r\n", "C10191\t1\t157\r\n", "C10245\t4\t154\r\n", "C10291\t1\t160\r\n", "C10475\t2\t149\r\n", "C10673\t36\t162\r\n", "C10675\t0\t165\r\n", "C10805\t0\t168\r\n", "C10973\t2\t167\r\n" ] } ], "prompt_number": 156 }, { "cell_type": "code", "collapsed": false, "input": [ "http://eagle.fish.washington.edu/cnidarian/TJGR_CanTest_s_unique.bed" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!intersectBed -a /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed -b 
/Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed > /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 157 }, { "cell_type": "code", "collapsed": false, "input": [ "!wc /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " 0 0 0 /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed\r\n" ] } ], "prompt_number": 159 }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "##URLS for Canonical Genome Features##\n", "\n", "\n", "**Gene** \n", "`http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff` \n", "\n", "**Exons** \n", "`http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff` \n", "\n", "**Intron** \n", "`http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff` \n", "\n", "**Promoter (= 1kbp 5' of genes)** \n", "`http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff` \n", "\n", "**Transposable Elements** \n", "`http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff` \n", "\n", "**Complement to Gene, Promoter, and TE tracks** \n", "`http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed` \n", "\n", "\n", "**All CGs** \n", "`http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "Import all tracks \n", "`http://eagle.fish.washington.edu/cnidarian/igv_session_073013.xml`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_previews_" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff" 
], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tmRNA\t35\t385\t0.555898\t-\t.\tID=CGI_10000001;\r\n", "C17212\tGLEAN\tmRNA\t31\t363\t0.999572\t+\t.\tID=CGI_10000002;\r\n", "C17316\tGLEAN\tmRNA\t30\t257\t0.555898\t+\t.\tID=CGI_10000003;\r\n", "C17476\tGLEAN\tmRNA\t34\t257\t0.998947\t-\t.\tID=CGI_10000004;\r\n", "C17998\tGLEAN\tmRNA\t196\t387\t1\t-\t.\tID=CGI_10000005;\r\n", "C18346\tGLEAN\tmRNA\t174\t551\t1\t+\t.\tID=CGI_10000009;\r\n", "C18428\tGLEAN\tmRNA\t286\t546\t0.555898\t-\t.\tID=CGI_10000010;\r\n", "C18964\tGLEAN\tmRNA\t203\t658\t0.999572\t-\t.\tID=CGI_10000011;\r\n", "C18980\tGLEAN\tmRNA\t30\t674\t0.555898\t+\t.\tID=CGI_10000012;\r\n", "C19100\tGLEAN\tmRNA\t160\t681\t0.999955\t-\t.\tID=CGI_10000013;\r\n" ] } ], "prompt_number": 123 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tGLEAN\tCDS\t35\t385\t.\t-\t0\tParent=CGI_10000001;\r\n", "C17212\tGLEAN\tCDS\t31\t363\t.\t+\t0\tParent=CGI_10000002;\r\n", "C17316\tGLEAN\tCDS\t30\t257\t.\t+\t0\tParent=CGI_10000003;\r\n", "C17476\tGLEAN\tCDS\t104\t257\t.\t-\t0\tParent=CGI_10000004;\r\n", "C17476\tGLEAN\tCDS\t34\t74\t.\t-\t2\tParent=CGI_10000004;\r\n", "C17998\tGLEAN\tCDS\t196\t387\t.\t-\t0\tParent=CGI_10000005;\r\n", "C18346\tGLEAN\tCDS\t174\t551\t.\t+\t0\tParent=CGI_10000009;\r\n", "C18428\tGLEAN\tCDS\t286\t546\t.\t-\t0\tParent=CGI_10000010;\r\n", "C18964\tGLEAN\tCDS\t203\t658\t.\t-\t0\tParent=CGI_10000011;\r\n", "C18980\tGLEAN\tCDS\t30\t674\t.\t+\t0\tParent=CGI_10000012;\r\n" ] } ], "prompt_number": 122 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": 
"stdout", "text": [ "C17476\tsubtractBed\tintrn\t75\t103\t.\t-\t.\tParent=CGI_10000004;\r", "\r\n", "C19392\tsubtractBed\tintrn\t184\t451\t.\t+\t.\tParent=CGI_10000015;\r", "\r\n", "C20262\tsubtractBed\tintrn\t539\t641\t.\t-\t.\tParent=CGI_10000025;\r", "\r\n", "C20262\tsubtractBed\tintrn\t650\t871\t.\t-\t.\tParent=CGI_10000025;\r", "\r\n", "C20334\tsubtractBed\tintrn\t524\t867\t.\t-\t.\tParent=CGI_10000028;\r", "\r\n", "C20412\tsubtractBed\tintrn\t215\t409\t.\t-\t.\tParent=CGI_10000029;\r", "\r\n", "C20412\tsubtractBed\tintrn\t464\t705\t.\t-\t.\tParent=CGI_10000029;\r", "\r\n", "C20462\tsubtractBed\tintrn\t50\t271\t.\t+\t.\tParent=CGI_10000030;\r", "\r\n", "C20462\tsubtractBed\tintrn\t360\t481\t.\t+\t.\tParent=CGI_10000030;\r", "\r\n", "C20462\tsubtractBed\tintrn\t577\t822\t.\t+\t.\tParent=CGI_10000030;\r", "\r\n" ] } ], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C16582\tflankbed\tpromoter\t386\t395\t.\t-\t.\tID=CGI_10000001;\r", "\r\n", "C17212\tflankbed\tpromoter\t1\t30\t.\t+\t.\tID=CGI_10000002;\r", "\r\n", "C17316\tflankbed\tpromoter\t1\t29\t.\t+\t.\tID=CGI_10000003;\r", "\r\n", "C17476\tflankbed\tpromoter\t258\t491\t.\t-\t.\tID=CGI_10000004;\r", "\r\n", "C17998\tflankbed\tpromoter\t388\t559\t.\t-\t.\tID=CGI_10000005;\r", "\r\n", "C18346\tflankbed\tpromoter\t1\t173\t.\t+\t.\tID=CGI_10000009;\r", "\r\n", "C18428\tflankbed\tpromoter\t547\t611\t.\t-\t.\tID=CGI_10000010;\r", "\r\n", "C18964\tflankbed\tpromoter\t659\t714\t.\t-\t.\tID=CGI_10000011;\r", "\r\n", "C18980\tflankbed\tpromoter\t1\t29\t.\t+\t.\tID=CGI_10000012;\r", "\r\n", "C19100\tflankbed\tpromoter\t682\t743\t.\t-\t.\tID=CGI_10000013;\r", "\r\n" ] } ], "prompt_number": 35 }, { "cell_type": "code", "collapsed": false, "input": [ "!head 
/Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C21242\tTRF\tTandem_Repeat\t38\t100\t72\t+\t.\t.\r\n", "C21306\tTRF\tTandem_Repeat\t35\t143\t112\t+\t.\t.\r\n", "C21306\tTRF\tTandem_Repeat\t574\t947\t208\t+\t.\t.\r\n", "C21306\tTRF\tTandem_Repeat\t574\t901\t313\t+\t.\t.\r\n", "C21372\tTRF\tTandem_Repeat\t643\t671\t58\t+\t.\t.\r\n", "C22542\tTRF\tTandem_Repeat\t1727\t1774\t96\t+\t.\t.\r\n", "C22728\tTRF\tTandem_Repeat\t426\t491\t105\t+\t.\t.\r\n", "C23428\tTRF\tTandem_Repeat\t130\t415\t202\t+\t.\t.\r\n", "C23796\tTRF\tTandem_Repeat\t547\t608\t97\t+\t.\t.\r\n", "C24440\tTRF\tTandem_Repeat\t1059\t1089\t62\t+\t.\t.\r\n" ] } ], "prompt_number": 124 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "C1\t0\t100\r\n", "C10003\t0\t156\r\n", "C10005\t0\t156\r\n", "C10007\t0\t156\r\n", "C10009\t0\t156\r\n", "C1001\t0\t103\r\n", "C10011\t0\t156\r\n", "C10013\t0\t157\r\n", "C10015\t0\t157\r\n", "C10021\t0\t157\r\n" ] } ], "prompt_number": 142 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "##gff-version 3\r\n", "##sequence-region scaffold360 1 280\r\n", "#!Date 2013-04-23\r\n", "#!Type DNA\r\n", "#!Source-version EMBOSS 6.5.7.0\r\n", "scaffold360\tfuzznuc\tnucleotide_motif\t60\t61\t2\t+\t.\tID=scaffold360.1;note=*pat pattern:CG\r\n", "scaffold360\tfuzznuc\tnucleotide_motif\t96\t97\t2\t+\t.\tID=scaffold360.2;note=*pat pattern:CG\r\n", "scaffold360\tfuzznuc\tnucleotide_motif\t120\t121\t2\t+\t.\tID=scaffold360.3;note=*pat pattern:CG\r\n", 
"scaffold360\tfuzznuc\tnucleotide_motif\t187\t188\t2\t+\t.\tID=scaffold360.4;note=*pat pattern:CG\r\n", "##gff-version 3\r\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "###Gill Methylation - MBD BS-Seq data\n", "\n", "methratio produced output, and this was uploaded to SQLShare\n", "\n", "\"Screenshot%205/27/13%2012:27%20PM\"\n", "\n", "
\n", "https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/BiGill_methratio_v9_A.txt\n", "
\n", "\"Screenshot%205/27/13%2012:31%20PM\"\n", "
\n", "_converted to GFF format_ [see example](https://github.com/uwescience/sqlshare/wiki/Workflow:-Analysis-of-BSMAP-data#how-to-convert-methratio-file-to-gff-format)\n", "
\n", "_resulting file_\n", "\n", "\n" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "IGV session file available:\n", "http://eagle.fish.washington.edu/cnidarian/oyster_v9_igv_session.xml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Want to convert to IGV\n", "\n", "\"Screenshot%205/28/13%208:53%20PM\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " SELECT \n", " chr as seqname, \n", " pos - 1 as start, -- compensating for going to zero-based?\n", " pos + 1 as [end], \n", " 'CG' as feature, \n", " ratio as score \n", "\n", " FROM [sr320@washington.edu]. \n", " [BiGill_methratio_v9_A.txt] yel \n", " where \n", " context like '__CG_' --_=single character wildcard\n", " and\n", " CT_Count >= 5\u200b\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", " python fetchdata.py -d \"[sr320@washington.edu].[BiGill_methratio_v9_IGV]\u200b\u200b\u200b\" -f tsv -o /Volumes/web/cnidarian/BiGill_meth_v9_5x.igv" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/BiGill_meth_v9_5x.igv" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "seqname\tstart\tend\tfeature\tscore\r", "\r\n", "C10009\t69\t71\tCG\t0.000\r", "\r\n", "C10049\t42\t44\tCG\t0.200\r", "\r\n", "C10049\t51\t53\tCG\t0.071\r", "\r\n", "C10075\t86\t88\tCG\t0.200\r", "\r\n", "C1009\t88\t90\tCG\t0.833\r", "\r\n", "C10093\t106\t108\tCG\t0.875\r", "\r\n", "C10107\t27\t29\tCG\t0.000\r", "\r\n", "C10107\t92\t94\tCG\t0.667\r", "\r\n", "C10127\t71\t73\tCG\t0.000\r", "\r\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imported in IGV and looks like coordinates are ok" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Screenshot%205/28/13%209:11%20PM\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\"Screenshot%205/28/13%209:13%20PM\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
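A minimal Python sketch of what the gill methylation query above does, for anyone who wants to reproduce the filter outside SQLShare. It assumes the methratio table has already been read into dicts keyed by the column names used in the query (`chr`, `pos`, `context`, `ratio`, `CT_Count`). The function names and the three example rows are made up for illustration; only the position and ratio of the first row are chosen to match the `C10009 69 71 CG 0.000` record in the `head` output, and its context and count are invented. Treating the pattern match as case-insensitive is also an assumption.

```python
def is_cg_context(context):
    # Mirrors the SQL pattern '__CG_': exactly five characters, CG in the 3rd and 4th.
    # Case-insensitive matching is an assumption, not something the notebook states.
    return len(context) == 5 and context[2:4].upper() == 'CG'


def methratio_to_igv(rows, min_ct=5):
    # rows: iterable of dicts with keys chr, pos, context, ratio, CT_Count
    # (a hypothetical in-memory form of the methratio table).
    records = []
    for row in rows:
        if not is_cg_context(row['context']):
            continue
        if float(row['CT_Count']) < min_ct:
            continue
        pos = int(row['pos'])
        # Same arithmetic as the query: start = pos - 1, end = pos + 1.
        records.append((row['chr'], pos - 1, pos + 1, 'CG', row['ratio']))
    return records


# Made-up rows: one passes, one fails the context filter, one fails the coverage filter.
example = [
    {'chr': 'C10009', 'pos': '70', 'context': 'AACGT', 'ratio': '0.000', 'CT_Count': '12'},
    {'chr': 'C10009', 'pos': '80', 'context': 'AACAT', 'ratio': '0.500', 'CT_Count': '12'},
    {'chr': 'C10049', 'pos': '43', 'context': 'TTCGA', 'ratio': '0.200', 'CT_Count': '3'},
]

for record in methratio_to_igv(example):
    # chr(9) is the tab character, giving tab-separated output like the .igv file.
    print(*record, sep=chr(9))
```

Only the first example row passes both filters, so the loop prints a single tab-separated record.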
\n", "### Male Gonad Methylation data\n", "Developing IGV file format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", " SELECT \n", " chr as seqname, \n", " pos - 1 as start, -- compensating for going to zero-based?\n", " pos + 1 as [end], \n", " 'CG' as feature, \n", " ratio as score \n", "\n", " FROM [sr320@washington.edu]. \n", " [BiGO_betty_plain_methratio_v1.txt] yel \n", " where \n", " context like '__CG_' --_=single character wildcard\n", " and\n", " CT_Count >= 5\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", " python fetchdata.py -d \"[sr320@washington.edu].[BiGO_betty_methratio_v1_IGV]\" -f tsv -o /Volumes/web/cnidarian/BiGO_betty_methratio_v1.igv\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Screenshot%205/29/13%204:01%20PM\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "IGV Session resaved\n", "http://eagle.fish.washington.edu/cnidarian/oyster_v9_igv_session.xml\n", "\n", "*Details on sperm exon level expression available [here](http://nbviewer.ipython.org/url/eagle.fish.washington.edu/cnidarian/TJGR_Mgo_Expression.ipynb)*\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "### Adding gene level expression for sperm and gill" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gene level expression is in SQLShare, originally derived from CLC RNA-Seq" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Gill Expression data**\n", "\n", "\n", "\n", "\"Screenshot%206/2/13%208:14%20AM\"\n", " \n", "_SQLShare Query_\n", " \n", " \n", " SELECT \n", " Chromosome,\n", " \"Chromosome region start\" - 1 as start,\n", " \"Chromosome region end\" as [end],\n", " 'gene' as feature,\n", " RPKM \n", " \n", " FROM [sr320@washington.edu].[qDOD_Zhang_Gil_gene_RNA-seq]\n", " \n", "_Resulting file_\n", "\n", " \n", "_Downloading_ \n", " `python fetchdata.py -d \"[sr320@washington.edu].[Zhang_Gil_gene_RNA-seq_IGV]\" -f tsv -o /Volumes/web/cnidarian/Zhang_Gil_gene_RNA-seq.igv`\n", " \n", "_Needs to be sorted in IGV_ \n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Sperm Gene level expression**\n", "\n", "File in SQLShare \n", "\n", "_SQLShare Query_ \n", "```\n", "SELECT \n", "Chromosome,\n", "\"Chromosome region start\" - 1 as start,\n", "\"Chromosome region end\" as [end],\n", "'gene' as feature,\n", "RPKM as Mgo_RPKM\n", "FROM [sr320@washington.edu].[qDOD_Zhang_Mgo_gene_RNA-seq]\n", "```\n", "\n", "_New Dataset_\n", "\n", " \n", "_Downloading_ \n", "`python fetchdata.py -d \"[sr320@washington.edu].[Zhang_Mgo_gene_RNA-seq_IGV]\" -f tsv -o /Volumes/web/cnidarian/Zhang_Mgo_gene_RNA-seq.igv`\n", " \n", "_Sorted_ \n", "\n", "\n", "
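As a rough sketch of the two expression queries above plus the sort step, the following Python builds IGV-style records from a gene-level table and orders them by sequence name and start. It assumes the rows are dicts keyed by the CLC column names used in the queries. `expression_to_igv` and the RPKM values are hypothetical; the coordinates are borrowed from the first two genes in the `Cgigas_v9_gene.gff` preview. Sorting by name and then start is an assumption about what the sorting step needs to achieve, not a record of how the files were actually sorted.

```python
def expression_to_igv(rows, value_name='RPKM'):
    # rows: iterable of dicts keyed by the CLC column names used in the queries.
    records = []
    for row in rows:
        # Same shift as the query: the start column is reduced by one.
        start = int(row['Chromosome region start']) - 1
        end = int(row['Chromosome region end'])
        records.append((row['Chromosome'], start, end, 'gene', float(row['RPKM'])))
    # Order by sequence name, then by start position within each sequence.
    records.sort(key=lambda record: (record[0], record[1]))
    header = ('Chromosome', 'start', 'end', 'feature', value_name)
    return [header] + records


# Coordinates from the gene track preview; RPKM values are invented.
example = [
    {'Chromosome': 'C17212', 'Chromosome region start': '31',
     'Chromosome region end': '363', 'RPKM': '2.5'},
    {'Chromosome': 'C16582', 'Chromosome region start': '35',
     'Chromosome region end': '385', 'RPKM': '0.0'},
]

for record in expression_to_igv(example, value_name='Mgo_RPKM'):
    # chr(9) is the tab character, giving tab-separated output like the .igv file.
    print(*record, sep=chr(9))
```

The `value_name` argument only changes the header of the last column, which is how the sperm query differs from the gill query (`RPKM as Mgo_RPKM`).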
\n", "New IGV Browser ...\n", "\n", "\"Screenshot%206/2/13%209:48%20AM\" \n", "http://eagle.fish.washington.edu/cnidarian/oyster_v9_igv_session.xml\n", " \n" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }