{ "metadata": { "name": "", "signature": "sha256:61a48f8ea86d627c5c74f721db6ec0145da4d164308476d7f84bae04c7861b2b" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Olympia Oyster (Pat) -Initial- PacBio Data Analysis\n", "\n", "_Includes conversion, mapping RNA-seq data, and blast comparison of transcriptome_ \n", "_Focus is on scaffolds made from assembly of reads > 10k bp_\n", "\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#updated - trying out new software\n", "!date" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Sat Feb 22 10:43:06 PST 2014\r\n" ] } ], "prompt_number": 1 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "pbh5tools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "dependency- https://github.com/PacificBiosciences/pbcore" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!bash5tools.py m130619_081336_42134_c100525122550000001823081109281326_s1_p0.bas.h5.1 --outFilePrefix myreads --outType fasta --readType Raw" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "usage: bash5tools.py [-h] [--verbose] [--version] [--profile] [--debug] [--outFilePrefix OUTFILEPREFIX]\r\n", " [--readType {ccs,subreads,unrolled}] [--outType OUTTYPE] [--minLength MINLENGTH]\r\n", " [--minReadScore MINREADSCORE] [--minPasses MINPASSES]\r\n", " input.bas.h5\r\n", "bash5tools.py: error: argument --readType: invalid choice: 'Raw' (choose from 'ccs', 'subreads', 'unrolled')\r\n" ] } ], "prompt_number": 45 }, { "cell_type": "code", "collapsed": false, "input": [ "!bash5tools.py -h" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "usage: bash5tools.py [-h] [--verbose] [--version] [--profile] [--debug] [--outFilePrefix OUTFILEPREFIX]\r\n", " [--readType 
{ccs,subreads,unrolled}] [--outType OUTTYPE] [--minLength MINLENGTH]\r\n", " [--minReadScore MINREADSCORE] [--minPasses MINPASSES]\r\n", " input.bas.h5\r\n", "\r\n", "Tool for extracting data from .bas.h5 files\r\n", "\r\n", "positional arguments:\r\n", " input.bas.h5 input .bas.h5 filename\r\n", "\r\n", "optional arguments:\r\n", " -h, --help show this help message and exit\r\n", " --verbose, -v Set the verbosity level (default: None)\r\n", " --version show program's version number and exit\r\n", " --profile Print runtime profile at exit (default: False)\r\n", " --debug Run within a debugger session (default: False)\r\n", " --outFilePrefix OUTFILEPREFIX\r\n", " output filename prefix [None]\r\n", " --readType {ccs,subreads,unrolled}\r\n", " read type (ccs, subreads, or unrolled) []\r\n", " --outType OUTTYPE output file type (fasta, fastq) [fasta]\r\n", "\r\n", "Read filtering arguments:\r\n", " --minLength MINLENGTH\r\n", " min read length [0]\r\n", " --minReadScore MINREADSCORE\r\n", " min read score, valid only with --readType={unrolled,subreads} [0]\r\n", " --minPasses MINPASSES\r\n", " min number of CCS passes, valid only with --readType=ccs [0]\r\n" ] } ], "prompt_number": 43 }, { "cell_type": "raw", "metadata": {}, "source": [ "!wget http://eagle.fish.washington.edu/whale/Pat/Analysis_Results/m130619_081336_42134_c100525122550000001823081109281326_s1_p0.bas.h5" ] }, { "cell_type": "code", "collapsed": false, "input": [ "pwd" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 33, "text": [ "u'/Users/sr320/Desktop'" ] } ], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "!wc m130619_081336_42134_c100525122550000001823081109281326_s1_p0.bas.h5.1" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "^C\r\n" ] } ], "prompt_number": 37 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "_Initial_ 
analysis below" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "_Data were provided by the core facility with these comments:_\n", "\n", "Data are grouped into folders by SMRT cell. Folders are generically named by the position of the cell in the run (A01_1, B01_1, etc.). Reads are available in PacBio's HDF5 format as .bas.h5 files. Metadata for each SMRT cell is available in PacBio's XML format.\n", "\n", "Tools for analysis of .bas.h5 files are available on PacBio's DevNet site (http://pacbiodevnet.com/) including the SMRT Analysis toolkit which can perform de novo assembly and resequencing.\n", "\n", "_Data was downloaded locally._ \n", "\n", " \n", "Two files\n", "\n", "\"Screenshot%207/8/13%2012:20%20PM\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Conversion to fastq\n", "\n", "\n", "_via Giles._\n", "\n", "Basically I took the bas.h5 file and ran it through pbh5tools, specifically bash5tools.py.\n", "\n", "The exact command I ran was\n", "\n", "`bash5tools.py --readType unrolled --outType fastq inputfile.bas.h5`\n", "\n", "where inputfile.bas.h5 was the name of the file you got from them.\n", "\n", "**My only question was whether I set --readType correctly**. I found some info on that online, as well as some examples of running this command, and they all talked about using a \"Raw\" mode, which was removed from this version of bash5tools. I assumed the equivalent is the unrolled option, because the other two options (ccs and subreads) were the same in previous versions.\n", "\n", "To install:\n", "\n", "First you need numpy and h5py, both of which are Python libraries. I installed these via MacPorts. Then you need to install the pbcore library, which is a Python library provided by PacBio (same website as the pbh5tools).\n", "\n", "Basically download pbcore, unpack, then run\n", "`python setup.py build`\n", "`sudo python setup.py install`\n", "\n", "This will install it with all the other Python libraries, then do the same with pbh5tools. 
You might need to add things to your PYTHONPATH and PATH environment variables. If you want to, you can customize where it goes by adding a --prefix to the install command, but then you have to adjust PYTHONPATH and PATH to match (which is what I did).\n", "\n", "Links:\n", "\n", "Understanding PacBio reads\n", "https://github.com/PacificBiosciences/Bioinformatics-Training/wiki/Understanding-PacBio-transcriptome-data#wiki-readexplained\n", "\n", "Examples for bash5tools.py\n", "http://seqanswers.com/forums/archive/index.php/t-16895.html\n", "\n", "GitHub for tools\n", "https://github.com/PacificBiosciences\n", "\n", "---\n", "resulting in \n", "\n", "\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Importing into CLC\n", "\n", "_info via CLC_ \n", "\n", "CLC bio tools are currently not optimized to handle the specific error profile of PacBio reads, and we therefore do not yet provide support for using this data type.\n", "\n", "If you still wish to try using PacBio data in the Genomics Workbench, then if you have your data in standard fastq format, you should be able to succeed in importing it into the CLC bio Genomics Workbench using the Illumina import option. If you choose to do this, please choose the option \"NCBI/Sanger or Illumina Pipeline 1.8 or later\" under 'Quality scores' in the Import wizard.\n", "\n", "Our developers are currently evaluating methods for error correction of PacBio reads (using short read data), and for hybrid de novo assembly using PacBio and short read data combined as input. If the outcome of this work matches the performance of tools commonly used for PacBio error correction (e.g. Celera Assembler using the pacBioToCA utility; Koren et al. 2012. Nature Biotechnology), or hybrid de novo assembly (e.g. ALLPATHS-LG; Ribeiro et al. 2012. 
Genome Research), we will be considering providing such functionality in the future.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_thus_\n", "\n", "\"Screenshot%207/8/13%2012:27%20PM\"\n", "\n", "###Stats\n", "47,475 Sequences \n", "\n", "QC Report - \n", "\n", "_also viewable below_\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "More stats- \n", "**3058 sequences > 10000 bp** \n", "\n", " \n", " \n", " Running de novo assembly on\n", " Fast (simple contigs) \n", "\n", " \n", "https://docs.google.com/file/d/0B9V_gF766XZAOUg3ajNnbDZoMk0/preview\n", " " ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/OlyO_Pat_PacBio_10k_contigs.fa" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">OlyO_Pat_PacBio_10k_contig_1\r\n", "AAAAAAAGGGAGATGTTTTCCTCATGTTGAATTGAATTCTTCAACTCATTTAAATCAGGC\r\n", "AAGTGTTCAACACTAACGCTAGTTCCGGTTAGCCTGTAGGTCTAATAACTTTTGTCCAAA\r\n", "CTACCAGGATATACAATTATGCACTGTTTAGCAGGGAGACATCACAAAAGGTATTTAATT\r\n", "CCGATTACGCAAGAGCTTTTCTGCGTATTGATCAGGTTTTTTGAATCGAGGCAATGCTAC\r\n", "CTTACAGACAAAGTTTGTTGTTGTTCAGGGGTTTCAATAGTCACGTTTTAAAGGCAGCAT\r\n", "AATTTCGTTATAATTCTAATGGTCGTTATACAATTTAGTTTACCAATATTTGACCTATCA\r\n", "TTTGAGTCAAATACTTGGTCCTGATTGTGGTTTCATAGCCATTGTTAAGTGCCGTTTTAC\r\n", "ACACTGTATTTGACTACGGACTGTTCCGTTACCTGATCAAGACTATGGGGCCTCCCACGG\r\n", "CGGGTGTGACGCGGTCAACAGGGGATGCCTACTCCTCTCTAGGCAGCCTGATCCCCACTT\r\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "###Ran cd-hit-est via webserver\n", "\n", "\n", "Summary: Nothing combined at the 90% level (probably too big for cd-hit-est)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "###Importing into iPlant\n", "\n", "\"Screenshot%207/10/13%207:09%20AM\"\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TopHat2-PE \n", "\"Screenshot%207/10/13%207:10%20AM\"\n", 
"\n", "---\n", "\"Screenshot%207/10/13%207:11%20AM\"\n", "\n", "---\n", "Reference Genome - OlyO_Pat_PacBio_10k_contigs.fa\n", "\n", "Output - OlyO_10kcontigs_TopHatm106\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "#finished" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 32 }, { "cell_type": "markdown", "metadata": {}, "source": [ "output file \n", "`http://eagle.fish.washington.edu/cnidarian/OlyO10k_filtered_106A_Male_Mix_TAGCTT_L004_R1.bam`\n", "\n", "\"Screenshot%207/10/13%201:53%20PM\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "##Annotating 10kcontigs\n", "\n", " \n", "Have working copy of transcriptome and will blastn against 10k contigs " ] }, { "cell_type": "code", "collapsed": false, "input": [ "!makeblastdb -in /Volumes/web/cnidarian/OlyO_Pat_PacBio_10k_contigs.fa -dbtype nucl -out /Volumes/Bay3/Software/ncbi-blast-2.2.27\\+/db/OlyO_PacBio_10kcontigs" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\r\n", "\r\n", "Building a new DB, current time: 07/10/2013 07:40:30\r\n", "New DB name: /Volumes/Bay3/Software/ncbi-blast-2.2.27+/db/OlyO_PacBio_10kcontigs\r\n", "New DB title: /Volumes/web/cnidarian/OlyO_Pat_PacBio_10k_contigs.fa\r\n", "Sequence type: Nucleotide\r\n", "Keep Linkouts: T\r\n", "Keep MBits: T\r\n", "Maximum file size: 1000000000B\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Adding sequences from FASTA; added 553 sequences in 0.575964 seconds.\r\n" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "!blastn -query /Volumes/web/cnidarian/OlyOv3_400bp.fasta -db /Volumes/Bay3/Software/ncbi-blast-2.2.27\\+/db/OlyO_PacBio_10kcontigs -out /Volumes/web/cnidarian/OlyOv3_400_blastn_OlyO10kcontigs -outfmt 6 -evalue 1E-20 -num_threads 2 -task blastn" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, 
"input": [ "!head /Volumes/web/cnidarian/OlyOv3_400_blastn_OlyO10kcontigs" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "4486232\tOlyO_Pat_PacBio_10k_contig_319\t89.39\t132\t4\t5\t142\t264\t21178\t21048\t1e-39\t 161\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t90.00\t130\t4\t6\t268\t390\t13091\t12964\t1e-38\t 158\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t90.00\t130\t4\t6\t268\t390\t20765\t20638\t1e-38\t 158\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t87.88\t132\t6\t5\t142\t264\t13504\t13374\t5e-37\t 152\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t84.50\t129\t9\t9\t144\t267\t11561\t11683\t4e-25\t 113\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t84.50\t129\t9\t9\t144\t267\t19235\t19357\t4e-25\t 113\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t82.14\t140\t9\t11\t142\t265\t5867\t5728\t7e-23\t 105\r\n", "4486255\tOlyO_Pat_PacBio_10k_contig_420\t77.51\t418\t31\t33\t36\t397\t9827\t10237\t4e-63\t 239\r\n", "4486256\tOlyO_Pat_PacBio_10k_contig_420\t77.51\t418\t31\t33\t5\t366\t10237\t9827\t4e-63\t 239\r\n", "4486297\tOlyO_Pat_PacBio_10k_contig_420\t78.89\t289\t34\t17\t2\t276\t24459\t24184\t2e-49\t 194\r\n" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/OlyOv3_400bp.fasta" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ ">4485895 length 400 cvg_67.8_tip_0\r\n", "ACCCAGAAAGGTTTAAAGAATGTATTTGATGAAGCCATTCTGGCTGCTCTGGAACCTCCTGAACCACCCAAAAAGAAGAAGTGTGTGTTGTTGTAATCTT\r\n", "TGAACTCTCGTCAGTTTCATGTGTAATCATAGAATGATTTCAACTTGTCATCTGTGGGAAAATCTTGTGCAAAATTAAAAATAAAAACCACTTTTATACA\r\n", "TGTCTGGATAAGTATTTTCACAGATGGAAGAGTGCGGGTTGAAATAGAGATTATTCCAACTTTCTGAAGAAAAGGAATATTTGAAGTTCCTGAGACGGAA\r\n", "AAGGCAGGTGTTATTTTCAAGCGAACCACTAGCACAGTGCTGTGGTTTTATTATCCCATATGGGTCCAATGAACATATGATTTGTAAATATATATATAAT\r\n", "\r\n", ">4485897 length 400 cvg_5.5_tip_1\r\n", 
"ACACTGCACATCGCGGTCCTAGATTTTAACGACAACACGCCGTATTTCCTTAACAGTACATATAAATTTAGTGAAAATGAATCCACTTACAACAGAACAA\r\n", "GAATTGGAGCATTGTATGCCCATGATCTGGACTCGGGACAAAACGCCAATATAACGTTTTCTATCTCTGGAGGAAACAGTCAAGGACATTTTCAAATAGA\r\n", "TCCATACACGGGTGATTTATTCATAAATGGCCTTATCGACAGAGAAAATGTATCCTCATACAACCTGAGAGTTACTATAAGAGATAATCCGTCTAATCCG\r\n" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "!grep -c \"OlyO_Pat_\" /Volumes/web/cnidarian/OlyOv3_400_blastn_OlyO10kcontigs " ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1730\r\n" ] } ], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "Will try perl script I modified to take this \"reverse\" blast and make a gff file. \n", "\n", "Another option in SQLShare" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!2_Blast2gff.pl -i /Volumes/web/cnidarian/OlyOv3_400_blastn_OlyO10kcontigs -o /Users/sr320/Desktop/OlyO10kcontigs_exon_a.gff -d \"OlyOv3_400\" -p EXON -s \"something\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "mv /Users/sr320/Desktop/OlyO10kcontigs_exon_a.gff /Volumes/web/cnidarian/" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "mv: /Volumes/web/cnidarian/OlyO10kcontigs_exon_a.gff: set owner/group (was: 501/20): Operation not permitted\r\n" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "perl script would not write to eagle so had to move" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/OlyO10kcontigs_exon_a.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ 
"OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t21178\t21048\t1e-39\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t13091\t12964\t1e-38\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t20765\t20638\t1e-38\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t13504\t13374\t5e-37\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t11561\t11683\t4e-25\t+\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t19235\t19357\t4e-25\t+\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t5867\t5728\t7e-23\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_420\tblastn:OlyOv3_400\tblastn\t9827\t10237\t4e-63\t+\t.\t4486255\t\r\n", "OlyO_Pat_PacBio_10k_contig_420\tblastn:OlyOv3_400\tblastn\t10237\t9827\t4e-63\t-\t.\t4486256\t\r\n", "OlyO_Pat_PacBio_10k_contig_420\tblastn:OlyOv3_400\tblastn\t24459\t24184\t2e-49\t-\t.\t4486297\t\r\n" ] } ], "prompt_number": 31 }, { "cell_type": "markdown", "metadata": {}, "source": [ " \"Screenshot%207/10/13%208:18%20AM\"" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!wget https://usegalaxy.org/dataset/display?dataset_id=bbd44e69cb8906b586614e867a9985b1&to_ext=bed" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "pwd" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "u'/Users/Steven/Dropbox/Steven/ipython_nb'" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "ls" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "\u001b[31mBiGill_Gene_Methylation.ipynb\u001b[m\u001b[m* README.md\r\n", "BiGill_RNAseq.ipynb README_old.md\r\n", "\u001b[31mBiGill_Tran_Elements.ipynb\u001b[m\u001b[m* 
Roberts_cv.ipynb\r\n", "BiGill_array.ipynb \u001b[31mRuphi_OA_RNAseq.ipynb\u001b[m\u001b[m*\r\n", "\u001b[31mBiGo - methratio error.ipynb\u001b[m\u001b[m* TJGR_CpG_binding.ipynb\r\n", "\u001b[31mBiGo_GFF_dev.ipynb\u001b[m\u001b[m* TJGR_CpG_islands.ipynb\r\n", "\u001b[31mBiGo_RNAseq.ipynb\u001b[m\u001b[m* TJGR_Methylation_GenomeSnapshot.ipynb\r\n", "BiGo_larvae.ipynb \u001b[31mTJGR_Mgo_Expression.ipynb\u001b[m\u001b[m*\r\n", "BiGo_larvae_2.ipynb \u001b[31mTJGR_OysterGenome_IGV.ipynb\u001b[m\u001b[m*\r\n", "BiGo_larvae_methylkit.ipynb TJGR_ProteinAnnot.ipynb\r\n", "BiGo_methratio.ipynb \u001b[31mTJGR_pearl.ipynb\u001b[m\u001b[m*\r\n", "BiGo_methratio_mito.ipynb UW_SoftwareBootcamp.ipynb\r\n", "\u001b[31mBlackAb_Annot.ipynb\u001b[m\u001b[m* _BiGO_expPLOT.ipynb\r\n", "CC_ampk.ipynb _BiGo_RNAseq.pdf\r\n", "EY_methratio.ipynb _BiGo_larvae_bsmap274.ipynb\r\n", "\u001b[31mGene Specific Methylation.ipynb\u001b[m\u001b[m* _MGarray_blast.ipynb\r\n", "LT_C1q.ipynb _scratch.ipynb\r\n", "OA_MS_data_check.ipynb \u001b[30m\u001b[43mfish546\u001b[m\u001b[m/\r\n", "\u001b[31mOlyO_Chi_Exp.ipynb\u001b[m\u001b[m* gen_workflows.ipynb\r\n", "OlyO_GonadExp.ipynb \u001b[30m\u001b[43mimg\u001b[m\u001b[m/\r\n", "\u001b[31mOlyO_PacBio.ipynb\u001b[m\u001b[m* intersectbed.ipynb\r\n", "OlyO_transcriptome.ipynb snippets.md\r\n", "\u001b[31mOsHV_host.ipynb\u001b[m\u001b[m* \u001b[30m\u001b[43mtools\u001b[m\u001b[m/\r\n", "PAG_2014.ipynb\r\n" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Version 2 of genome" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!/Users/sr320/Dropbox/Steven/gt/fish546/2_Blast2Gff.pl \\\n", "-i /Volumes/web/dermochelys/Bioinformatics/IGV\\ stuff/otbnos.tab \\\n", "-o /Volumes/web/cnidarian/Olytranv2_OlyPacBio_v02.gff \\\n", "-d \"OlyOvtran_v2\" \\\n", "-p EXON \\\n", "-s \"something\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, 
"input": [ "!head /Volumes/web/cnidarian/OlyOv3_400_blastn_OlyO10kcontigs" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "4486232\tOlyO_Pat_PacBio_10k_contig_319\t89.39\t132\t4\t5\t142\t264\t21178\t21048\t1e-39\t 161\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t90.00\t130\t4\t6\t268\t390\t13091\t12964\t1e-38\t 158\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t90.00\t130\t4\t6\t268\t390\t20765\t20638\t1e-38\t 158\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t87.88\t132\t6\t5\t142\t264\t13504\t13374\t5e-37\t 152\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t84.50\t129\t9\t9\t144\t267\t11561\t11683\t4e-25\t 113\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t84.50\t129\t9\t9\t144\t267\t19235\t19357\t4e-25\t 113\r\n", "4486232\tOlyO_Pat_PacBio_10k_contig_319\t82.14\t140\t9\t11\t142\t265\t5867\t5728\t7e-23\t 105\r\n", "4486255\tOlyO_Pat_PacBio_10k_contig_420\t77.51\t418\t31\t33\t36\t397\t9827\t10237\t4e-63\t 239\r\n", "4486256\tOlyO_Pat_PacBio_10k_contig_420\t77.51\t418\t31\t33\t5\t366\t10237\t9827\t4e-63\t 239\r\n", "4486297\tOlyO_Pat_PacBio_10k_contig_420\t78.89\t289\t34\t17\t2\t276\t24459\t24184\t2e-49\t 194\r\n" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Users/sr320/Desktop/OlyO10kcontigs_exon_a.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t21178\t21048\t1e-39\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t13091\t12964\t1e-38\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t20765\t20638\t1e-38\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t13504\t13374\t5e-37\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t11561\t11683\t4e-25\t+\t.\t4486232\t\r\n", 
"OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t19235\t19357\t4e-25\t+\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_319\tblastn:OlyOv3_400\tblastn\t5867\t5728\t7e-23\t-\t.\t4486232\t\r\n", "OlyO_Pat_PacBio_10k_contig_420\tblastn:OlyOv3_400\tblastn\t9827\t10237\t4e-63\t+\t.\t4486255\t\r\n", "OlyO_Pat_PacBio_10k_contig_420\tblastn:OlyOv3_400\tblastn\t10237\t9827\t4e-63\t-\t.\t4486256\t\r\n", "OlyO_Pat_PacBio_10k_contig_420\tblastn:OlyOv3_400\tblastn\t24459\t24184\t2e-49\t-\t.\t4486297\t\r\n" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/dermochelys/Bioinformatics/IGV\\ stuff/otbnos.tab" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_549\t87.16\t7716\t122\t762\t6\t7148\t9554\t2135\t0.0\t7956\r\n", "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t86.10\t6915\t116\t736\t873\t7148\t7401\t693\t0.0\t6665\r\n", "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t87.76\t1699\t18\t167\t874\t2434\t7475\t9121\t0.0\t1810\r\n", "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t85.99\t828\t18\t90\t2747\t3508\t11421\t10626\t0.0\t 797\r\n", "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t80.39\t1188\t15\t153\t2441\t3428\t9248\t10417\t0.0\t 702\r\n", "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_550\t91.46\t82\t1\t5\t2559\t2637\t5083\t5005\t1e-21\t 108\r\n", "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_1964\t81.58\t7947\t208\t1059\t6\t7148\t9076\t1582\t0.0\t5406\r\n", "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_5046\t87.24\t4946\t52\t527\t873\t5348\t5837\t10673\t0.0\t5103\r\n", "Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_252\t82.64\t6913\t77\t873\t1241\t7150\t6128\t12920\t0.0\t5081\r\n", 
"Olurida_trim_nodups_v2reads_contig_9\tOlyO_Pat_PacBio_1_contig_2528\t84.63\t5140\t96\t596\t6\t4590\t4524\t9524\t0.0\t4475\r\n" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "# could just awk etc \"!awk '{ print $10 }'" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!head /Volumes/web/cnidarian/Olytranv2_OlyPacBio_v02.gff" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "OlyO_Pat_PacBio_1_contig_549\tblastn:OlyOvtran_v2\tblastn\t9554\t2135\t0.0\t-\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_550\tblastn:OlyOvtran_v2\tblastn\t7401\t693\t0.0\t-\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_550\tblastn:OlyOvtran_v2\tblastn\t7475\t9121\t0.0\t+\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_550\tblastn:OlyOvtran_v2\tblastn\t11421\t10626\t0.0\t-\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_550\tblastn:OlyOvtran_v2\tblastn\t9248\t10417\t0.0\t+\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_550\tblastn:OlyOvtran_v2\tblastn\t5083\t5005\t1e-21\t-\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_1964\tblastn:OlyOvtran_v2\tblastn\t9076\t1582\t0.0\t-\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_5046\tblastn:OlyOvtran_v2\tblastn\t5837\t10673\t0.0\t+\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_252\tblastn:OlyOvtran_v2\tblastn\t6128\t12920\t0.0\t+\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n", "OlyO_Pat_PacBio_1_contig_2528\tblastn:OlyOvtran_v2\tblastn\t4524\t9524\t0.0\t+\t.\tOlurida_trim_nodups_v2reads_contig_9\t\r\n" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "url: " ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": 
"python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }