{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Running in Docker container on Ostrich\n", "\n", "#### Started the Docker container with the following command:\n", "\n", "```docker run -p 8888:8888 -v /Users/sam/data/:/data -v /Users/sam/owl_home/:/owl_home -v /Users/sam/owl_web/:/owl_web -v /Users/sam/gitrepos:/gitrepos -it f99537d7e06a```\n", "\n", "The command publishes port 8888 for Jupyter Notebook and mounts my Jupyter Notebook GitHub repo and my data files on Owl/home and Owl/web into the Docker container.\n", "\n", "Once the container was running, I started Jupyter Notebook inside it with the following command:\n", "\n", "```jupyter notebook```\n", "\n", "The Docker container is configured so that this command launches Jupyter Notebook on port 8888 without opening a browser.\n", "\n", "The Docker container runs an image built from this [Dockerfile (Git commit 443bc42)](https://github.com/sr320/LabDocs/blob/443bc425cd36d23a07cf12625f38b7e3a397b9be/code/dockerfiles/Dockerfile.bio)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Tue Mar 14 21:22:24 UTC 2017\n" ] } ], "source": [ "%%bash\n", "date" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check computer specs" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0f2bca9c664b\n" ] } ], "source": [ "%%bash\n", "hostname" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Architecture: x86_64\n", "CPU op-mode(s): 32-bit, 64-bit\n", "Byte Order: Little Endian\n", "CPU(s): 8\n", "On-line CPU(s) list: 0-7\n", "Thread(s) per core: 1\n", "Core(s) per socket: 8\n", "Socket(s): 1\n", "Vendor ID: GenuineIntel\n", "CPU family: 6\n", "Model: 
26\n", "Model name: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz\n", "Stepping: 5\n", "CPU MHz: 2260.998\n", "BogoMIPS: 4521.99\n", "Hypervisor vendor: KVM\n", "Virtualization type: full\n", "L1d cache: 32K\n", "L1i cache: 32K\n", "L2 cache: 256K\n", "L3 cache: 8192K\n" ] } ], "source": [ "%%bash\n", "lscpu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download BGI Tools for Demultiplexing" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/gitrepos\n" ] } ], "source": [ "cd /gitrepos/" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Cloning into 'Reseqtools'...\n" ] } ], "source": [ "%%bash\n", "git clone https://github.com/BGI-shenzhen/Reseqtools.git" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2bRAD_GATK\n", "LabDocs\n", "OwlUploader\n", "Reseqtools\n", "paper_oly_gbs\n" ] } ], "source": [ "%%bash\n", "ls" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "cd Reseqtools/" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2bRAD_GATK\n", "LabDocs\n", "OwlUploader\n", "Reseqtools\n", "paper_oly_gbs\n" ] } ], "source": [ "%%bash\n", "ls" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/gitrepos/Reseqtools\n" ] } ], "source": [ "cd Reseqtools/" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LICENSE README.md iTools_Code20160530.tar.gz\r\n" ] } ], 
"source": [ "ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Follow the instructions emailed by Lisa Cheng from BGI:\n", "\n", ">Hi Sam,\n", "\n", ">We downloaded it and it seems fine when compiling. You can compile it with the below command under Linux system. \n", "> tar -zxvf ReSeqTools_XXX.tar.gz ; cd iTools_Code; chmod 775 iTools ; ./ iTools -h\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "tar (child): ReSeqTools_XXX.tar.gz: Cannot open: No such file or directory\n", "tar (child): Error is not recoverable: exiting now\n", "tar: Child returned status 2\n", "tar: Error is not recoverable: exiting now\n", "bash: line 1: cd: iTools_Code: No such file or directory\n", "chmod: cannot access 'iTools': No such file or directory\n", "bash: line 1: ./: Is a directory\n" ] } ], "source": [ "%%bash\n", "tar -zxvf ReSeqTools_XXX.tar.gz ; cd iTools_Code; chmod 775 iTools ; ./ iTools -h" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 1: cd: iTools_Code: No such file or directory\n", "chmod: cannot access 'iTools': No such file or directory\n", "bash: line 1: ./: Is a directory\n" ] } ], "source": [ "%%bash\n", "cd iTools_Code; chmod 775 iTools ; ./ iTools -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ooops, my bad. Here we go again..." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 2: cd: iTools_Code20160530: No such file or directory\n", "chmod: cannot access 'iTools': No such file or directory\n", "bash: line 4: ./iTools: No such file or directory\n" ] } ], "source": [ "%%bash\n", "tar -zxf iTools_Code20160530.tar.gz\n", "cd iTools_Code20160530\n", "chmod 775 iTools\n", "./iTools -h" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LICENSE README.md \u001b[0m\u001b[01;34miTools_Code\u001b[0m/ iTools_Code20160530.tar.gz\r\n" ] } ], "source": [ "ls" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Error opening terminal: xterm-color.\n" ] } ], "source": [ "%%bash\n", "cd iTools_Code/\n", "chmod 775 iTools\n", "./iTools -h" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LICENSE README.md \u001b[0m\u001b[01;34miTools_Code\u001b[0m/ iTools_Code20160530.tar.gz\r\n" ] } ], "source": [ "ls" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/gitrepos/Reseqtools/iTools_Code\n" ] } ], "source": [ "cd iTools_Code/" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0m\u001b[01;34mExample\u001b[0m/ \u001b[01;36mReSeqTools\u001b[0m@ config.h.in \u001b[01;36mdepcomp\u001b[0m@ \u001b[01;36minstall-sh\u001b[0m@\r\n", "Install.Readme \u001b[01;36mReSeqTools.Readme\u001b[0m@ config.log \u001b[01;34mdocument\u001b[0m/ \u001b[01;36mmissing\u001b[0m@\r\n", "Makefile 
aclocal.m4 \u001b[01;32mconfig.status\u001b[0m* \u001b[01;32miTools\u001b[0m* \u001b[01;34msrc\u001b[0m/\r\n", "Makefile.am \u001b[01;34mautom4te.cache\u001b[0m/ \u001b[01;32mconfigure\u001b[0m* \u001b[01;36miTools.Readme\u001b[0m@ stamp-h1\r\n", "Makefile.in \u001b[01;34mbin\u001b[0m/ configure.ac iTools.cpp\r\n", "NEW config.h configure.scan iTools.o\r\n" ] } ], "source": [ "ls" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m chmod 775 iTools\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "chmod 775 iTools" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "chmod 775 iTools" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m ./iTools -h\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "./iTools -h" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Error opening terminal: xterm-color.\n" ] } ], "source": [ "%%bash\n", "./iTools -h" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 1: ./: Is a directory\n" ] } ], "source": [ "%%bash\n", "./ iTools -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### It doesn't work...\n", 
"\n", "#### The Install.Readme file says to compile the program with the following command (rather than running it with the ```-h``` argument, as suggested in the email):" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "make all-am\n", "make[1]: Entering directory '/gitrepos/Reseqtools/iTools_Code'\n", "if g++ -DHAVE_CONFIG_H -I. -I. -I. -g -O2 -MT iTools.o -MD -MP -MF \".deps/iTools.Tpo\" -c -o iTools.o iTools.cpp; \\\n", "then mv -f \".deps/iTools.Tpo\" \".deps/iTools.Po\"; else rm -f \".deps/iTools.Tpo\"; exit 1; fi\n", "Makefile:264: recipe for target 'iTools.o' failed\n", "make[1]: Leaving directory '/gitrepos/Reseqtools/iTools_Code'\n", "Makefile:174: recipe for target 'all' failed\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "In file included from /usr/include/c++/4.9/ext/new_allocator.h:33:0,\n", " from /usr/include/x86_64-linux-gnu/c++/4.9/bits/c++allocator.h:33,\n", " from /usr/include/c++/4.9/bits/allocator.h:46,\n", " from /usr/include/c++/4.9/string:41,\n", " from /usr/include/c++/4.9/bits/locale_classes.h:40,\n", " from /usr/include/c++/4.9/bits/ios_base.h:41,\n", " from /usr/include/c++/4.9/ios:42,\n", " from /usr/include/c++/4.9/ostream:38,\n", " from /usr/include/c++/4.9/iostream:39,\n", " from iTools.cpp:1:\n", "./new:2:1: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:3: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:5: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:7: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:9: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:11: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:13: error: stray '##' in program\n", " 
################ 2016-05-20 #####\n", " ^\n", "./new:2:15: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:29: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:31: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:33: error: stray '#' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:6:1: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:3: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:5: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:7: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:9: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:11: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:13: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:15: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:22: error: invalid digit \"9\" in octal constant\n", " ################2014-09-19##############\n", " ^\n", "./new:6:27: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:29: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:31: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:33: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:35: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:37: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:39: error: stray '##' in program\n", " 
################2014-09-19##############\n", " ^\n", "./new:10:1: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:3: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:5: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:7: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:9: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:11: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:13: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:15: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:27: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:29: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:31: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:33: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:35: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:37: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:39: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:14:1: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:3: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:5: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:7: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:9: 
error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:11: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:13: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:15: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:27: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:29: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:31: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:33: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:35: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:37: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:39: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:17:1: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:3: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:5: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:7: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:9: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:11: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:13: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:15: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:27: error: stray '##' in program\n", " 
################2012-04-27##############\n", " ^\n", "./new:17:29: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:31: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:33: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:35: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:37: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:39: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:20:1: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:3: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:5: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:7: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:9: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:11: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:13: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:15: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:17: error: stray '#' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:28: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:30: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:32: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:34: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", 
"./new:20:36: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:38: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:40: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:24:1: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:3: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:5: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:7: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:9: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:11: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:13: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:15: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:17: error: stray '#' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:28: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:30: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:32: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:34: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:36: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:38: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:40: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "In file included from 
/usr/include/c++/4.9/bits/stl_construct.h:59:0,\n", " from /usr/include/c++/4.9/vector:62,\n", " from ./src/ALL/comm.h:8,\n", " from iTools.cpp:9:\n", "./new:2:1: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:3: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:5: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:7: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:9: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:11: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:13: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:15: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:29: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:31: error: stray '##' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:2:33: error: stray '#' in program\n", " ################ 2016-05-20 #####\n", " ^\n", "./new:6:1: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:3: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:5: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:7: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:9: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:11: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:13: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:15: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:22: 
error: invalid digit \"9\" in octal constant\n", " ################2014-09-19##############\n", " ^\n", "./new:6:27: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:29: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:31: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:33: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:35: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:37: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:6:39: error: stray '##' in program\n", " ################2014-09-19##############\n", " ^\n", "./new:10:1: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:3: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:5: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:7: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:9: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:11: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:13: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:15: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:27: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:29: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:31: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:33: error: stray '##' in program\n", " 
################2012-06-21##############\n", " ^\n", "./new:10:35: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:37: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:10:39: error: stray '##' in program\n", " ################2012-06-21##############\n", " ^\n", "./new:14:1: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:3: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:5: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:7: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:9: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:11: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:13: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:15: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:27: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:29: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:31: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:33: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:35: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:37: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:14:39: error: stray '##' in program\n", " ################2012-05-21##############\n", " ^\n", "./new:17:1: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:3: 
error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:5: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:7: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:9: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:11: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:13: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:15: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:27: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:29: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:31: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:33: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:35: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:37: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:17:39: error: stray '##' in program\n", " ################2012-04-27##############\n", " ^\n", "./new:20:1: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:3: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:5: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:7: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:9: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:11: error: stray '##' in program\n", " 
#################2012-03-12##############\n", " ^\n", "./new:20:13: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:15: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:17: error: stray '#' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:28: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:30: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:32: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:34: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:36: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:38: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:20:40: error: stray '##' in program\n", " #################2012-03-12##############\n", " ^\n", "./new:24:1: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:3: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:5: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:7: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:9: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:11: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:13: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:15: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:17: error: stray '#' in program\n", " #################2010-01-12##############\n", " 
^\n", "./new:24:28: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:30: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:32: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:34: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:36: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:38: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "./new:24:40: error: stray '##' in program\n", " #################2010-01-12##############\n", " ^\n", "In file included from /usr/include/c++/4.9/ext/hash_map:60:0,\n", " from ./src/Soap/Soap_Split.h:15,\n", " from ./src/Soap/SOAPTools.h:19,\n", " from iTools.cpp:10:\n", "/usr/include/c++/4.9/backward/backward_warning.h:32:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated. [-Wcpp]\n", " #warning \\\n", " ^\n", "In file included from ./src/CNSTool/CNSTools.h:11:0,\n", " from iTools.cpp:12:\n", "./src/CNSTool/Addcn_All_V2.2.h:12:35: fatal error: boost/thread/thread.hpp: No such file or directory\n", " #include \n", " ^\n", "compilation terminated.\n", "make[1]: *** [iTools.o] Error 1\n", "make: *** [all] Error 2\n" ] } ], "source": [ "%%bash\n", "make" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### What a mess! Grrrrrr...\n", "\n", "#### Let me try something." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Program: iTools (ReSeqtools)\n", "Version: 0.23\thewm2008@gmail.com\tMay 30 2016\n", "\n", "\tUsage:\n", "\n", "\t\tFatools Tools For Fasta\n", "\t\tFqtools Tools For Fastq\n", "\t\tSOAPtools Tools For SOAP \n", "\t\tVartools Tools For SOAP Variant\n", "\t\tCNStools Tools For CNS\n", "\t\tXamtools Tools For Sam/Bam\n", "\t\tGfftools Tools For Gff\n", "\t\tFormtools Tools For Form convert\n", "\t\tFiletools Tools For Specified File\n", "\t\tOthertools Tools For Other\n", "\t\tGametools Tools For Game\n", "\n", "\t\tHelp Show help in detail\n", "\n" ] } ], "source": [ "%%bash\n", "./iTools" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "\tFqTools Usage:\n", "\n", "\t\tstat quick stat fastq's info\n", "\t\tfqcheck fqchek Base Q Distribute\n", "\t\tfilterV1 filter fastq for clean datas with trim\n", "\t\tfilterV2 filter fastq for clean datas select trim\n", "\t\trmAdapterPE index remove adapter of PE\n", "\t\trmAdapterSE index remove adapter of SE\n", "\t\tsplitpool Split pooling Fq to sample for RAD (GBS)\n", "\t\tcutIndex cut the Read Length in the Fq\n", "\t\tpooling pooling index library data filter\n", "\t\tbubble filter the N bubble site Read\n", "\t\tchangQ chang Fq seq Quality (+/- 31)\n", "\n", "\t\tHelp Show this help\n", "\n" ] } ], "source": [ "%%bash\n", "./iTools Fqtools" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\tUsage: splitpool -InFq1 -InFq2 -Index -Flag \n", "\n", "\t\t-InFq1 Input #_1.Fq to split RAD(GBS)\n", "\t\t-InFq2 Input #_2.Fq to split RAD(GBS)\n", "\t\t-Index Input File with (sample seq)\n", "\t\t-Flag Input File with Flag(ferment) seq\n", "\n", "\t\t-OutDir 
Output Dir for Split Files[PWD]\n", "\t\t-MisMatch Allow one misMatch on the sample seq\n", "\t\t-NoCheckS No Check Sample double,Allow one sample with multi seq\n", "\t\t but Read1 may be different length\n", "\n", "\t\t-help show this help\n", "\n" ] } ], "source": [ "%%bash\n", "./iTools Fqtools splitpool" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Wait, is this working even though the ```make``` command failed? Well, let's try running the script..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Copy files from Owl to local machine." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t27m30.177s\n", "user\t0m0.020s\n", "sys\t5m34.770s\n" ] } ], "source": [ "%%bash\n", "mkdir /data/oly_gbs_raw\n", "time cp /owl_web/nightingales/O_lurida/20160223_gbs/160123*.gz /data/oly_gbs_raw/" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz\r\n", "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz\r\n" ] } ], "source": [ "ls /data/oly_gbs_raw" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "cp /owl_web/nightingales/O_lurida/20160223_gbs/*.[sl]*" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz\r\n", "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz\r\n" ] } ], "source": [ "ls /data/oly_gbs_raw" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "head: cannot open 'index.lst' for reading: No such 
file or directory\n" ] } ], "source": [ "%%bash\n", "head index.lst" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "cp /owl_web/nightingales/O_lurida/20160223_gbs/*.[sl]* /data/oly_gbs_raw/" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0m\u001b[01;34mExample\u001b[0m/ \u001b[01;36mReSeqTools\u001b[0m@ config.h.in \u001b[01;36mdepcomp\u001b[0m@ \u001b[01;36minstall-sh\u001b[0m@\r\n", "Install.Readme \u001b[01;36mReSeqTools.Readme\u001b[0m@ config.log \u001b[01;34mdocument\u001b[0m/ \u001b[01;36mmissing\u001b[0m@\r\n", "Makefile aclocal.m4 \u001b[01;32mconfig.status\u001b[0m* \u001b[01;32miTools\u001b[0m* \u001b[01;34msrc\u001b[0m/\r\n", "Makefile.am \u001b[01;34mautom4te.cache\u001b[0m/ \u001b[01;32mconfigure\u001b[0m* \u001b[01;36miTools.Readme\u001b[0m@ stamp-h1\r\n", "Makefile.in \u001b[01;34mbin\u001b[0m/ configure.ac iTools.cpp\r\n", "NEW config.h configure.scan iTools.o\r\n" ] } ], "source": [ "ls" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz index.lst\r\n", "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz split.sh\r\n" ] } ], "source": [ "ls /data/oly_gbs_raw" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OYSzenG1AAD96FAAPEI-109\tCTCC\t1NF_1A\n", "OYSzenG1AAD96FAAPEI-109\tTGCA\t1NF_2A\n", "OYSzenG1AAD96FAAPEI-109\tACTA\t1NF_4A\n", "OYSzenG1AAD96FAAPEI-109\tCAGA\t1NF_5A\n", "OYSzenG1AAD96FAAPEI-109\tAACT\t1NF_6A\n", "OYSzenG1AAD96FAAPEI-109\tGCGT\t1NF_7A\n", "OYSzenG1AAD96FAAPEI-109\tCGAT\t1NF_8A\n", "OYSzenG1AAD96FAAPEI-109\tGTAA\t1NF_9A\n", 
"OYSzenG1AAD96FAAPEI-109\tAGGC\t1NF_10A\n", "OYSzenG1AAD96FAAPEI-109\tGATC\t1NF_11A\n" ] } ], "source": [ "%%bash\n", "head /data/oly_gbs_raw/index.lst" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OYSzenG1AAD96FAAPEI-109\tCTCC\t1NF_1A\n", "OYSzenG1AAD96FAAPEI-109\tTGCA\t1NF_2A\n", "OYSzenG1AAD96FAAPEI-109\tACTA\t1NF_4A\n", "OYSzenG1AAD96FAAPEI-109\tCAGA\t1NF_5A\n", "OYSzenG1AAD96FAAPEI-109\tAACT\t1NF_6A\n", "OYSzenG1AAD96FAAPEI-109\tGCGT\t1NF_7A\n", "OYSzenG1AAD96FAAPEI-109\tCGAT\t1NF_8A\n", "OYSzenG1AAD96FAAPEI-109\tGTAA\t1NF_9A\n", "OYSzenG1AAD96FAAPEI-109\tAGGC\t1NF_10A\n", "OYSzenG1AAD96FAAPEI-109\tGATC\t1NF_11A\n" ] } ], "source": [ "%%bash\n", "head /data/oly_gbs_raw/split.sh" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OYSzenG1AAD96FAAPEI-109\tCTCC\t1NF_1A\n", "OYSzenG1AAD96FAAPEI-109\tTGCA\t1NF_2A\n", "OYSzenG1AAD96FAAPEI-109\tACTA\t1NF_4A\n", "OYSzenG1AAD96FAAPEI-109\tCAGA\t1NF_5A\n", "OYSzenG1AAD96FAAPEI-109\tAACT\t1NF_6A\n", "OYSzenG1AAD96FAAPEI-109\tGCGT\t1NF_7A\n", "OYSzenG1AAD96FAAPEI-109\tCGAT\t1NF_8A\n", "OYSzenG1AAD96FAAPEI-109\tGTAA\t1NF_9A\n", "OYSzenG1AAD96FAAPEI-109\tAGGC\t1NF_10A\n", "OYSzenG1AAD96FAAPEI-109\tGATC\t1NF_11A\n" ] } ], "source": [ "%%bash\n", "head /data/oly_gbs_raw/split.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### I think I actually overwrote the ```split.sh``` script when I screwed up the first copy command in line 30 above! Whoops! Good thing for backups!!" 
] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "cp /gitrepos/paper_oly_gbs/data/split.sh /owl_web/nightingales/O_lurida/20160223_gbs/" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "cp /gitrepos/paper_oly_gbs/data/split.sh /data/oly_gbs_raw/" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "iTools Fqtools splitpool -InFq1 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz -InFq2 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz -Index index.lst -Flag enzyme.txt -MisMatch -OutDir split\n" ] } ], "source": [ "%%bash\n", "head /owl_web/nightingales/O_lurida/20160223_gbs/split.sh" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "iTools Fqtools splitpool -InFq1 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz -InFq2 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz -Index index.lst -Flag enzyme.txt -MisMatch -OutDir split\n" ] } ], "source": [ "%%bash\n", "head /data/oly_gbs_raw/split.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Whew! Much better! Now, we need to create the ```enzyme.txt``` file mentioned in the ```split.sh``` script, since it wasn't supplied by BGI."
] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "printf %\"s\\n\" CAGC CTGC > /data/oly_gbs_raw/enzyme.txt" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CAGC\n", "CTGC\n" ] } ], "source": [ "%%bash\n", "cat /data/oly_gbs_raw/enzyme.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's try out that iTools command!" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "mkdir /data/oly_gbs_raw/split" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warming : sample double in this INDEX Files. Sample ID: OYSzenG1AAD96FAAPEI-109; please renamed it diff\n", "\n", "real\t0m0.181s\n", "user\t0m0.000s\n", "sys\t0m0.010s\n" ] } ], "source": [ "%%bash\n", "time ./iTools Fqtools splitpool \\\n", "-InFq1 /data/oly_gbs_raw/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz \\\n", "-InFq2 /data/oly_gbs_raw/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz \\\n", "-Index /data/oly_gbs_raw/index.lst \\\n", "-Flag /data/oly_gbs_raw/enzyme.txt \\\n", "-MisMatch \\\n", "-OutDir /data/oly_gbs_raw/split/" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### Well, I'm giving up for today. Contacting BGI about this error message." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Try out different versions of the Index file" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wed Mar 15 17:19:02 UTC 2017\n" ] } ], "source": [ "%%bash\n", "date" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "iTools Fqtools splitpool -InFq1 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz -InFq2 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz -Index index.lst -Flag enzyme.txt -MisMatch -OutDir split\n" ] } ], "source": [ "%%bash\n", "head /data/oly_gbs_raw/split.sh" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OYSzenG1AAD96FAAPEI-109\tCTCC\t1NF_1A\n", "OYSzenG1AAD96FAAPEI-109\tTGCA\t1NF_2A\n", "OYSzenG1AAD96FAAPEI-109\tACTA\t1NF_4A\n", "OYSzenG1AAD96FAAPEI-109\tCAGA\t1NF_5A\n", "OYSzenG1AAD96FAAPEI-109\tAACT\t1NF_6A\n", "OYSzenG1AAD96FAAPEI-109\tGCGT\t1NF_7A\n", "OYSzenG1AAD96FAAPEI-109\tCGAT\t1NF_8A\n", "OYSzenG1AAD96FAAPEI-109\tGTAA\t1NF_9A\n", "OYSzenG1AAD96FAAPEI-109\tAGGC\t1NF_10A\n", "OYSzenG1AAD96FAAPEI-109\tGATC\t1NF_11A\n" ] } ], "source": [ "%%bash\n", "head /data/oly_gbs_raw/index.lst" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's get rid of that first column. It holds the same value (```OYSzenG1AAD96FAAPEI-109```) in every row shown above, and that is the Sample ID named in yesterday's ```sample double``` warning, so it's probably what's triggering the error."
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "awk {'printf (\"%s\\t%s\\n\", $2, $3)'} /data/oly_gbs_raw/index.lst > /data/oly_gbs_raw/index.tmp" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CTCC\t1NF_1A\n", "TGCA\t1NF_2A\n", "ACTA\t1NF_4A\n", "CAGA\t1NF_5A\n", "AACT\t1NF_6A\n", "GCGT\t1NF_7A\n", "CGAT\t1NF_8A\n", "GTAA\t1NF_9A\n", "AGGC\t1NF_10A\n", "GATC\t1NF_11A\n" ] } ], "source": [ "%%bash\n", "head /data/oly_gbs_raw/index.tmp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! Let's try the script again..." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "bash: line 1: ./iTools: No such file or directory\n", "\n", "real\t0m0.002s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n" ] } ], "source": [ "%%bash\n", "time ./iTools Fqtools splitpool \\\n", "-InFq1 /data/oly_gbs_raw/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz \\\n", "-InFq2 /data/oly_gbs_raw/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz \\\n", "-Index /data/oly_gbs_raw/index.tmp \\\n", "-Flag /data/oly_gbs_raw/enzyme.txt \\\n", "-MisMatch \\\n", "-OutDir /data/oly_gbs_raw/split/" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "cd /gitrepos/Reseqtools/iTools_Code/" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/gitrepos/Reseqtools/iTools_Code\n" ] } ], "source": [ "cd /gitrepos/Reseqtools/iTools_Code/" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Process is interrupted.\n" ] } ], 
"source": [ "%%bash\n", "time ./iTools Fqtools splitpool \\\n", "-InFq1 /data/oly_gbs_raw/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz \\\n", "-InFq2 /data/oly_gbs_raw/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz \\\n", "-Index /data/oly_gbs_raw/index.tmp \\\n", "-Flag /data/oly_gbs_raw/enzyme.txt \\\n", "-MisMatch \\\n", "-OutDir /data/oly_gbs_raw/split/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I stopped the run because it was naming the output files incorrectly." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AAAAGTT_1.fq.gz CAGA_2.fq.gz GAATTCA_1.fq.gz TAATA_2.fq.gz\r\n", "AAAAGTT_2.fq.gz CATAAGT_1.fq.gz GAATTCA_2.fq.gz TACAT_1.fq.gz\r\n", "AACCGAGA_1.fq.gz CATAAGT_2.fq.gz GAGATA_1.fq.gz TACAT_2.fq.gz\r\n", "AACCGAGA_2.fq.gz CATCGT_1.fq.gz GAGATA_2.fq.gz TAGCATGC_1.fq.gz\r\n", "AACGCCT_1.fq.gz CATCGT_2.fq.gz GAGGA_1.fq.gz TAGCATGC_2.fq.gz\r\n", "AACGCCT_2.fq.gz CATCT_1.fq.gz GAGGA_2.fq.gz TAGCGGA_1.fq.gz\r\n", "AACT_1.fq.gz CATCT_2.fq.gz GATC_1.fq.gz TAGCGGA_2.fq.gz\r\n", "AACT_2.fq.gz CCACAA_1.fq.gz GATC_2.fq.gz TAGGAA_1.fq.gz\r\n", "AATATGC_1.fq.gz CCACAA_2.fq.gz GCCAGT_1.fq.gz TAGGAA_2.fq.gz\r\n", "AATATGC_2.fq.gz CCAGCT_1.fq.gz GCCAGT_2.fq.gz TAGGCCAT_1.fq.gz\r\n", "ACAAA_1.fq.gz CCAGCT_2.fq.gz GCGGAAT_1.fq.gz TAGGCCAT_2.fq.gz\r\n", "ACAAA_2.fq.gz CCATGGGT_1.fq.gz GCGGAAT_2.fq.gz TATCGGGA_1.fq.gz\r\n", "ACAGGGAA_1.fq.gz CCATGGGT_2.fq.gz GCGT_1.fq.gz TATCGGGA_2.fq.gz\r\n", "ACAGGGAA_2.fq.gz CCGGATAT_1.fq.gz GCGT_2.fq.gz TATTTTT_1.fq.gz\r\n", "ACCGT_1.fq.gz CCGGATAT_2.fq.gz GCTCTA_1.fq.gz TATTTTT_2.fq.gz\r\n", "ACCGT_2.fq.gz CCTAC_1.fq.gz GCTCTA_2.fq.gz TCACC_1.fq.gz\r\n", "ACCTAA_1.fq.gz CCTAC_2.fq.gz GCTGTGGA_1.fq.gz TCACC_2.fq.gz\r\n", "ACCTAA_2.fq.gz CGAT_1.fq.gz GCTGTGGA_2.fq.gz TCAC_1.fq.gz\r\n", "ACGACTAC_1.fq.gz CGAT_2.fq.gz GCTTA_1.fq.gz TCAC_2.fq.gz\r\n", "ACGACTAC_2.fq.gz CGCCTTAT_1.fq.gz 
GCTTA_2.fq.gz TCGAAGA_1.fq.gz\r\n", "ACGTGGTA_1.fq.gz CGCCTTAT_2.fq.gz GGAAC_1.fq.gz TCGAAGA_2.fq.gz\r\n", "ACGTGGTA_2.fq.gz CGCGGAGA_1.fq.gz GGAAC_2.fq.gz TCGTT_1.fq.gz\r\n", "ACGTGTT_1.fq.gz CGCGGAGA_2.fq.gz GGAAGA_1.fq.gz TCGTT_2.fq.gz\r\n", "ACGTGTT_2.fq.gz CGCGGT_1.fq.gz GGAAGA_2.fq.gz TCTCAGTC_1.fq.gz\r\n", "ACTA_1.fq.gz CGCGGT_2.fq.gz GGACCTA_1.fq.gz TCTCAGTC_2.fq.gz\r\n", "ACTA_2.fq.gz CGCTGAT_1.fq.gz GGACCTA_2.fq.gz TCTGTGA_1.fq.gz\r\n", "AGCCC_1.fq.gz CGCTGAT_2.fq.gz GGATTGGT_1.fq.gz TCTGTGA_2.fq.gz\r\n", "AGCCC_2.fq.gz CGCTT_1.fq.gz GGATTGGT_2.fq.gz TGCAAGGA_1.fq.gz\r\n", "AGGAT_1.fq.gz CGCTT_2.fq.gz GGTGT_1.fq.gz TGCAAGGA_2.fq.gz\r\n", "AGGAT_2.fq.gz CGGTAGA_1.fq.gz GGTGT_2.fq.gz TGCA_1.fq.gz\r\n", "AGGC_1.fq.gz CGGTAGA_2.fq.gz GGTTGT_1.fq.gz TGCA_2.fq.gz\r\n", "AGGC_2.fq.gz CGTGTGGT_1.fq.gz GGTTGT_2.fq.gz TGCGA_1.fq.gz\r\n", "AGTGGA_1.fq.gz CGTGTGGT_2.fq.gz GTAA_1.fq.gz TGCGA_2.fq.gz\r\n", "AGTGGA_2.fq.gz CTACGGA_1.fq.gz GTAA_2.fq.gz TGCTGGA_1.fq.gz\r\n", "ATATGT_1.fq.gz CTACGGA_2.fq.gz GTACTT_1.fq.gz TGCTGGA_2.fq.gz\r\n", "ATATGT_2.fq.gz CTAGC_1.fq.gz GTACTT_2.fq.gz TGGCTA_1.fq.gz\r\n", "ATCGTA_1.fq.gz CTAGC_2.fq.gz GTATT_1.fq.gz TGGCTA_2.fq.gz\r\n", "ATCGTA_2.fq.gz CTATTA_1.fq.gz GTATT_2.fq.gz TGGTACGT_1.fq.gz\r\n", "ATGAAAC_1.fq.gz CTATTA_2.fq.gz GTCAA_1.fq.gz TGGTACGT_2.fq.gz\r\n", "ATGAAAC_2.fq.gz CTCC_1.fq.gz GTCAA_2.fq.gz TTCAGA_1.fq.gz\r\n", "ATGCCT_1.fq.gz CTCC_2.fq.gz GTCGATT_1.fq.gz TTCAGA_2.fq.gz\r\n", "ATGCCT_2.fq.gz CTGTA_1.fq.gz GTCGATT_2.fq.gz TTCCTGGA_1.fq.gz\r\n", "ATTAATT_1.fq.gz CTGTA_2.fq.gz GTGAGGGT_1.fq.gz TTCCTGGA_2.fq.gz\r\n", "ATTAATT_2.fq.gz CTTCCA_1.fq.gz GTGAGGGT_2.fq.gz TTCTC_1.fq.gz\r\n", "ATTGA_1.fq.gz CTTCCA_2.fq.gz GTTGAA_1.fq.gz TTCTC_2.fq.gz\r\n", "ATTGA_2.fq.gz CTTGCTT_1.fq.gz GTTGAA_2.fq.gz UnKnow_1.fq.gz\r\n", "ATTGGAT_1.fq.gz CTTGCTT_2.fq.gz TAACGA_1.fq.gz UnKnow_2.fq.gz\r\n", "ATTGGAT_2.fq.gz GAACTTC_1.fq.gz TAACGA_2.fq.gz\r\n", "CAGA_1.fq.gz GAACTTC_2.fq.gz TAATA_1.fq.gz\r\n" ] } ], "source": [ "ls 
/data/oly_gbs_raw/split/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's reorder the index file (again)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "awk {'printf (\"%s\\t%s\\n\", $3, $2)'} /data/oly_gbs_raw/index.lst > /data/oly_gbs_raw/index.tmp" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1NF_1A\tCTCC\n", "1NF_2A\tTGCA\n", "1NF_4A\tACTA\n", "1NF_5A\tCAGA\n", "1NF_6A\tAACT\n", "1NF_7A\tGCGT\n", "1NF_8A\tCGAT\n", "1NF_9A\tGTAA\n", "1NF_10A\tAGGC\n", "1NF_11A\tGATC\n" ] } ], "source": [ "%%bash\n", "head /data/oly_gbs_raw/index.tmp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Clean out the output directory so it's not cluttered with useless files" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "ename": "SyntaxError", "evalue": "invalid syntax (, line 1)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m for i in /data/oly_gbs_raw/split/*.gz\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" ] } ], "source": [ "for i in /data/oly_gbs_raw/split/*.gz\n", " do\n", " rm \"$i\"\n", " done" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "for i in /data/oly_gbs_raw/split/*.gz\n", " do\n", " rm \"$i\"\n", " done" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ls /data/oly_gbs_raw/split/" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t980m9.467s\n", "user\t797m30.100s\n", "sys\t156m26.850s\n" 
] } ], "source": [ "%%bash\n", "time ./iTools Fqtools splitpool \\\n", "-InFq1 /data/oly_gbs_raw/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz \\\n", "-InFq2 /data/oly_gbs_raw/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz \\\n", "-Index /data/oly_gbs_raw/index.tmp \\\n", "-Flag /data/oly_gbs_raw/enzyme.txt \\\n", "-MisMatch \\\n", "-OutDir /data/oly_gbs_raw/split/" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 36G\r\n", "-rw-r--r-- 1 srlab staff 507M Mar 16 10:05 UnKnow_2.fq.gz\r\n", "-rw-r--r-- 1 srlab staff 445M Mar 16 10:05 UnKnow_1.fq.gz\r\n", "-rw-r--r-- 1 srlab staff 229M Mar 16 10:05 1SN_9A_2.fq.gz\r\n", "-rw-r--r-- 1 srlab staff 188M Mar 16 10:05 1SN_9A_1.fq.gz\r\n", "-rw-r--r-- 1 srlab staff 240M Mar 16 10:05 1SN_8A_2.fq.gz\r\n", "-rw-r--r-- 1 srlab staff 197M Mar 16 10:05 1SN_8A_1.fq.gz\r\n", "-rw-r--r-- 1 srlab staff 216M Mar 16 10:05 1SN_7A_2.fq.gz\r\n", "-rw-r--r-- 1 srlab staff 175M Mar 16 10:05 1SN_7A_1.fq.gz\r\n", "-rw-r--r-- 1 srlab staff 242M Mar 16 10:05 1SN_6A_2.fq.gz\r\n", "ls: write error\r\n" ] } ], "source": [ "ls -lhr /data/oly_gbs_raw/split/ | head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alright! Looks like those commands work! Will correct the ```index.lst``` file and will add the ```enzyme.txt``` file to the GBS paper repo." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0m\u001b[01;34m160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc\u001b[0m/ SNP.stat.xls\r\n", "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc.html index.lst\r\n", "\u001b[01;34m160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc\u001b[0m/ readme.md\r\n", "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc.html split.sh\r\n", "Data.stat.xls\r\n" ] } ], "source": [ "ls /gitrepos/paper_oly_gbs/data/" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "OYSzenG1AAD96FAAPEI-109\tCTCC\t1NF_1A\n", "OYSzenG1AAD96FAAPEI-109\tTGCA\t1NF_2A\n", "OYSzenG1AAD96FAAPEI-109\tACTA\t1NF_4A\n", "OYSzenG1AAD96FAAPEI-109\tCAGA\t1NF_5A\n", "OYSzenG1AAD96FAAPEI-109\tAACT\t1NF_6A\n", "OYSzenG1AAD96FAAPEI-109\tGCGT\t1NF_7A\n", "OYSzenG1AAD96FAAPEI-109\tCGAT\t1NF_8A\n", "OYSzenG1AAD96FAAPEI-109\tGTAA\t1NF_9A\n", "OYSzenG1AAD96FAAPEI-109\tAGGC\t1NF_10A\n", "OYSzenG1AAD96FAAPEI-109\tGATC\t1NF_11A\n" ] } ], "source": [ "%%bash\n", "head /gitrepos/paper_oly_gbs/data/index.lst" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "mv /data/oly_gbs_raw/index.tmp /gitrepos/paper_oly_gbs/data/index.lst" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1NF_1A\tCTCC\n", "1NF_2A\tTGCA\n", "1NF_4A\tACTA\n", "1NF_5A\tCAGA\n", "1NF_6A\tAACT\n", "1NF_7A\tGCGT\n", "1NF_8A\tCGAT\n", "1NF_9A\tGTAA\n", "1NF_10A\tAGGC\n", "1NF_11A\tGATC\n" ] } ], "source": [ "%%bash\n", "head /gitrepos/paper_oly_gbs/data/index.lst" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "fatal: /gitrepos/paper_oly_gbs/data/index.lst: '/gitrepos/paper_oly_gbs/data/index.lst' is outside repository\n" ] } ], "source": [ "%%bash\n", "git add /gitrepos/paper_oly_gbs/data/index.lst" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ```git add``` failed because ```%%bash``` cells run in the notebook's current working directory (```/gitrepos/Reseqtools/iTools_Code/```). That directory is inside the Reseqtools clone, so Git considers files in ```paper_oly_gbs``` to be outside the repository. Will add/commit from within the ```paper_oly_gbs``` repo instead (e.g., ```git -C /gitrepos/paper_oly_gbs add data/index.lst```)." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "mv /data/oly_gbs_raw/enzyme.txt /gitrepos/paper_oly_gbs/data/enzyme.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### For kicks, let's do a very brief comparison of one of the original demultiplexed files supplied by BGI with one of the demultiplexed files created above.\n", "\n", "We'll just look at line counts and see how they compare." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "741674 /owl_web/nightingales/O_lurida/20160223_gbs/1SN_9A_2.fq.gz\n" ] } ], "source": [ "%%bash\n", "wc -l /owl_web/nightingales/O_lurida/20160223_gbs/1SN_9A_2.fq.gz" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "wc: /data/oly_gbs_raw/1SN_9A_2.fq.gz: No such file or directory\n" ] } ], "source": [ "%%bash\n", "wc -l /data/oly_gbs_raw/1SN_9A_2.fq.gz" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1052299 /data/oly_gbs_raw/split/1SN_9A_2.fq.gz\n" ] } ], "source": [ "%%bash\n", "wc -l /data/oly_gbs_raw/split/1SN_9A_2.fq.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, this is curious... A couple of issues that I notice off the bat:\n", "\n", "1. Obviously, the line counts differ.\n", "\n", "2. The line count of the demultiplexed file I created with the BGI script is not evenly divisible by 4. This is important because each read in a FASTQ file is supposed to occupy exactly four lines.\n", "\n", "Let's look at another file." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "802505 /data/oly_gbs_raw/split/1SN_9A_1.fq.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "wc: /data/oly_gbs_raw/1SN_9A_1.fq.gz: No such file or directory\n" ] } ], "source": [ "%%bash\n", "wc -l /data/oly_gbs_raw/1SN_9A_1.fq.gz\n", "wc -l /data/oly_gbs_raw/split/1SN_9A_1.fq.gz" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "602771 /owl_web/nightingales/O_lurida/20160223_gbs/1SN_9A_1.fq.gz\n", "802505 /data/oly_gbs_raw/split/1SN_9A_1.fq.gz\n" ] } ], "source": [ "%%bash\n", "wc -l /owl_web/nightingales/O_lurida/20160223_gbs/1SN_9A_1.fq.gz\n", "wc -l /data/oly_gbs_raw/split/1SN_9A_1.fq.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, this is beginning to seem problematic. Let's glance at the first few reads in these FASTQ files and see if we can determine what's going on (doubtful, though)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DOH! User error! I just realized that I need to decompress the file before I can get accurate line counts! OMG!\n", "\n", "Let's try this again..."
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "602994\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t0m10.640s\n", "user\t0m5.110s\n", "sys\t0m3.300s\n" ] } ], "source": [ "%%bash\n", "time gzip -c /owl_web/nightingales/O_lurida/20160223_gbs/1SN_9A_1.fq.gz | wc -l" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "802289\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t0m12.173s\n", "user\t0m7.970s\n", "sys\t0m2.620s\n" ] } ], "source": [ "%%bash\n", "time gzip -c /data/oly_gbs_raw/split/1SN_9A_1.fq.gz | wc -l" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001f�\b\bLc�X\u0000\u00031SN_9A_1.fq.gz\u0000\u00003��\u001f�\b\u0000\u0000\u0000\u0000\u0000\u0000\u0003Խˮ�<�,8�����0�wg\u0016<8<\u0002��l�7������1\u0019AYvZV�Ԫv}{��Z�$�\f", "\u0006C���9\u001c", "��ӿ\u001e", "�����������?������_�����K�M����u��?�:�K$FI1�W�\u0018�\u0007I�\u000b", "�\u000b", ")I�~�i�G����Կc\n", "!LOC�(����=ՇQ_�>�_��=��q���|�z�o\u0015����\u001f;F_�˱n��������?�7OfH0k����U�5�����������F}w�GkF�r�w��\u001d", "\u000eת��;N�M����4N�\u0015�����t@u\u000e��:\u0019t\u0018��f��2\u001bu�;�G�\u000e8��_3گ}����j��/����M�d\u0003e�%X�\u0003\u0019�Oo\u001d", "o\u001d", "Y����p��'���Juz7��L���]���}�����G'��\u001c", "\u00068�O/ڨF3B?�}��O+�\u001e", "F�'p��S�>l��\u001a������_\u001e", "߲�q��j����dC\u001c", "t���W17'X�򏘥f��\u0005\u001b��f$�e�tv�~��ow�7�o��q���[�V�\u001e", "�S}��k2��N\\��f�.n]�b�9��/�$\u000e��M�͙,�矘'�.\u0002��`��գ�����ƧHi\u0000?����0�����������7�_ص�||���\u0019\u000e�_:�1\u0001\u000f�Z���|0�����`n���v%[\u0012b��\u001c", "��\u0013\u0018k�\u0015�����R\u001d", "�rx�����rS\u001b���W5�qܜ���T�0d�\u0001\u001d", "~�(�\u000b", 
"\t0JW����Jo��\u0010�։yCۼ�\u000fe�\u000b", "���+��:���\u000f�a�d��C���\\h\u0018};\u001d", "�F���l[��Ȇ\r", "ۖ:r]��6����3�]���\u0014��ˋ/\u001b���~�ϯS|6{���A��h\u0016׮����R7Z�\u001e", "5 ���P\u0017�َEo�6pCSg�\u001f�X\u001f�/����W�/���H[�b~��6�a�\u0011m��\r", "�������`�'��92\u0004����v��\b��z����ߵ.�ke�-�V��\u0012\u0011xd \u001d", "���9��٘\u000b", "�\u0000�>ٍ��v�����Q�:bi����\t\u0003\\1�.��\f", "A�E�\u0001�B,r��mqZ@p\u0012,\"�\u001d", "�c�\u001f�靺\u0003G�i\u0019E��D7ms�K\u0018��v.�}\u000b", "�\u0013f��\u001b�æ�Β�.���L��Mљ�2\u0016�\u0014�\\�<�G(�өn����m�6+l�\u001a����)\u0000�\u00160�bї.�\u0004W�\u0010\u0015�\u001bѩ:������Y��VO\u0011J_��\u0002pK!\u0010x�P{�b;R�.̌\"X���\u0015l䖩�ㅄ�Э\u001e", "WC�6I\u0016�9n|l\u001e", "�³MQ˾է��^]�\u0016t�\u0000�\u0010sZ�*J�eA*��L.IJ\u0012��`s$r*؜���u-ۀ¸�ȯ}�/��mg�G\u0003>\u0002\u0013\r", "&\u001e", "\u0011I�m�!�\u0016$�l�\u0013����ͯ��gV���\u001a�\u001a�Ұ�vٛ�\u0016��W2�f��D����stj\u0013�����f3v~�.\bQ�VOoe+$��}��\t�\f", "<���ؾ�����f��6���c��\u0017\u0010��i�<�x.X�J3/�Ô[�\u000e��Ѷ�\u0000#2�\u000b", "�K�/��#M���\u0012�`ܢ6{`�Q��\u0000�8}�\u0013<;��\b\u0004\u0017M��^ς\f", "G���?�o��r�\u001b�(\u0003^�\u0016-�܄�\u0013�8`D΁�i��?�HKއ�q(G�C���(�\u000b", "?1�g�n��;�>�8\u001f�F\u0007�!�q��L��;ϰ\u0019�3��\u0016�\u0002WL\u0000\u0014qS\u0010�#D\u0011�3Y6���\u000et�Gsḋ~���;��.'��V�q�F����p_\u0001e\u001b�4�A��a\u0012[�%[�A\u001d", "��\bec�ʳ��\u0017�\u0018�{�\u0000F�_m�t��;F\u0003\u0000G�m�!�Œ��.[\u0003�҂E�B\u0010��D\u0014�T\u0016��������\r", "\u0003��\"8\u001f\u00176�C�ia��=���\"����O2O���m5�\u0014�4q��\f", "H�qv�A\u0015\u0014�lY\u0007�\u001f���o�\b帨e���_⣺�9��V�{��= [����P\u001f�U��j��!��\u0010�N��HԂp�mA�]!�33�\"�t������n4�t�yJ?\u000e��\u000f�a\u001c", "���䠤��i��9����t�\u0006�\u001f�)���c}�O\u0007�\u000b", "��\u001b����V��Se�\u0013\\�mL�����o\u000f\u0003��ꍱ\"&J����D\u0004�u�!���\u0005_L�2|o���V[׈��\u0001\u0002V���+��8��v�$\u0004�ăWܓ�j\u0001�j�\u0016��y0�k3W.F\u0005��\u0011{�4\u000b", "\u0000w���N�\u0005==��}�`�\u001a�3\f", "-��$smu�um�\r", "�� 
\u0016f\t��]\\��٘i!\u0014\u0005T,�N����?+�L�t��ӄ��O���9��i�v�tZ�\"\u0018:�\u000b", "�\u0007G]\u0000:黾�p�z�wGϼ�\t��eܟ���~��p�i\b��pe\u0007����r�\u0013|��(��\u001f�Gu�@½�1-�iA[t�M+\u0016�-Ә���z?4��RI\u0019F���K�7���\u001c", "��e7��Űo��v�G�\u0011�\u0019\u0016!�Q\u0006�\u0001)�\\�V��f?b68=`h��b�-q��!A�\u001a�~��R�+�}����imu����\b7*]DEm�\u001aJ�\u0004��xv���MW0�G�p��{�ɟ��[9Pg�jQ�ֶ�;9�\u0017�i2z��s�\\��շ�^\t…", "��~\\R&\u001b`֒| s�\u001d", "�\u000b", "z�n�\u0004X��l���w��Zv��X��k=J�\u00181�Ď��S\b�4+�En�{V�\u0007���s\u0013ls�˲��\\�\t����۹��`rr���\n", "�i��i$�\u0014P�J\b�#�DT���������e\u0003<\u001a9Ms��ܟ_��9����\u0015.�\u0007\u0014\u000e�c%6�\f", "���F�\u0005X�\u0018*�-¤�ҙ`��\u0005�\u0011%\u0011\u0007ҁ���\u0019o_�[�1����)^߹șc�\u0002�/F=1�\"7\u0005ujLnĜ�s\u00050c`��\u0019?\u0017\u0000\u0016�W,�iw֍��!�������M\u001fAM0'7m�\u0016��[}���t�FG�{\u0013ف\u0001�r�Œ�\u0000�0��\u0005\u0012�7BWYaH94�/$��<��\u001b�\u001ft��^�f��]��j0ʆ�ح�[g\u000e\u0006J\u0000\u0006\u0004�q��p�G8m˥Q��f������\u0018�k��c��~r}�ds{�Y��d�\u0014\u0001ys�\u0016gK��AF�\u000b", "���=�\u0006g�b��@\t�\u001a�(��~�U���\u001a�\u001e", "`\u0005\u0012J�_��,��~\u0004+�n\u0001�Z��I�2wA;�ĖO��5�q��u��\u000f\u000e\u0002\u0007�IJ�\u0002�\u0004�:�k)����\u0003�pj��U\u0005(:f,���v�<�\f", "���\"\u0011���\u0002\u0019:E��\u0011�NQ�^q�)M\u001d", "�h��ӌl��d�<��Fy���:`�\u0002\u0013\u000b", "۽�\b\u0010�3��Pr�n�\t�b�;\u0018\u0010C�{�C\u001bc������\u0000)%ζ\n", "T����/\u0015��W]\u0003\u0001~�(��D�h�\u0007\u0002�\u0007\u0001t��cA�$z]�����wf�[�n�����'\u0005\u000fܞ��\u0000R��z�E\"\n", "Pp��Y퐔{\u0001\u0012�-���\u0012���w�1\u0012P�-n��?���.\u000f�k��\u0000RΗ�X�1�5�\u0003��~sB\"���\u001d", "�hW�sC�f���\f", "\u001f�]\u0007V�*�$�z\u0018AAy\u000e��(���~�A��)\r", "\u0016��X�4\u0019RC�\b�\u001f��Q�el��H@W@��O�C\u0004,�b\u0006~\t%�X�GW]�娕���lŇv엟�+\u00031�}=��ָn۝� \b&�)\u0000œT���\u0002��b�i0��$�W�V\n", "+�5g6��7�Y�L���F��`ܣm�A\u001c", "e���9���\u0006F¤�х�0�\b\u001b�\u0004U��dq\u0016���/�ê�s�Wq5�>��cM�%�)$\u0014*�����{�sO�72�a\n", 
"[binary output omitted: without -d, gzip -c writes compressed data rather than FASTQ text]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "gzip: stdout: Broken pipe\n" ] } ], "source": [ "%%bash\n", "gzip -c /data/oly_gbs_raw/split/1SN_9A_1.fq.gz | head -12" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "@K00132:90:H3YHMBBXX:4:1101:4706:1173 1:N:0:AAGGATGC\n", "CAGCATGTTGTTCTGCTTACCTTGGATTGATTGATTGATTGATTAGTTACTGTTTTACGTCCCACTCGAGAATATTTCACTCATATGGAGACGT\n", "+\n", "JJJJJJJJJJJJJJJJJJJJJFFJFFJJJJJJFJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJFFJJJJJJJJJA\n", "@K00132:90:H3YHMBBXX:4:1101:5741:1173 1:N:0:AAGGATGC\n", "CAGCACGTATCAATTTAATTCTCATTAATATCTGATTTAATTCTACTTAATATCTGTCGATTATTTTGTGTGAAGAAAATCTTTATGCGATGTA\n", "+\n", "JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJ\n", "@K00132:90:H3YHMBBXX:4:1101:6005:1173 1:N:0:AAGGATGC\n", "CAGCTGCGGCGTGACTCGATGGGGGCCGTTCGCGGTCCACGCTTGTCGTGCTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAAGGATG\n", "+\n", "JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "gzip: stdout: Broken pipe\n" ] } ], "source": [ "%%bash\n", "gzip -cd /data/oly_gbs_raw/split/1SN_9A_1.fq.gz | head -12" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "12708820\n", "16594884\n" ] } ], "source": [ "%%bash\n", "gzip -cd /owl_web/nightingales/O_lurida/20160223_gbs/1SN_9A_1.fq.gz | wc -l\n", "gzip -cd /data/oly_gbs_raw/split/1SN_9A_1.fq.gz | wc -l" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK! Now we've gotten somewhere! I've fixed both the line-counting and the ```head``` display issues: the gzip command needs the ```-d``` (decompress) argument. The ```-c``` argument is still needed, though, so that gzip writes to stdout. The line counts for both files are evenly divisible by 4, as expected for FASTQ files (four lines per read).\n", "\n", "However, the line counts differ between the two files, even though both should (theoretically) contain the same demultiplexed reads. The ```/owl_web/nightingales/O_lurida/20160223_gbs/``` file has 12,708,820 lines (3,177,205 reads). The ```/data/oly_gbs_raw/split/``` file has 16,594,884 lines (4,148,721 reads).\n", "\n", "Maybe I'll re-run the BGI script and compare its output files with the ones I created in this notebook entry?" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 2 }