{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Running in Docker container on Ostrich\n", "\n", "#### Started Docker container with the following command:\n", "\n", "```docker run -p 8888:8888 -v /Users/sam/data/:/data -v /Users/sam/owl_home/:/owl_home -v /Users/sam/owl_web/:/owl_web -v /Users/sam/gitrepos:/gitrepos -it f99537d7e06a```\n", "\n", "The command allows access to Jupyter Notebook over port 8888 and makes my Jupyter Notebook GitHub repo and my data files on Owl/home and Owl/web accessible to the Docker container.\n", "\n", "Once the container was running, Jupyter Notebook was started inside it with the following command:\n", "\n", "```jupyter notebook```\n", "\n", "This is configured in the Docker container to launch a Jupyter Notebook without a browser on port 8888.\n", "\n", "The Docker container is running on an image created from this [Dockerfile (Git commit 443bc42)](https://github.com/sr320/LabDocs/blob/443bc425cd36d23a07cf12625f38b7e3a397b9be/code/dockerfiles/Dockerfile.bio)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Wed Jan 4 15:29:29 UTC 2017\n" ] } ], "source": [ "%%bash\n", "date" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check computer specs" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0f2bca9c664b\n" ] } ], "source": [ "%%bash\n", "hostname" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Architecture: x86_64\n", "CPU op-mode(s): 32-bit, 64-bit\n", "Byte Order: Little Endian\n", "CPU(s): 8\n", "On-line CPU(s) list: 0-7\n", "Thread(s) per core: 1\n", "Core(s) per socket: 8\n", "Socket(s): 1\n", "Vendor ID: GenuineIntel\n", "CPU family: 6\n", "Model: 26\n", 
"Model name: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz\n", "Stepping: 5\n", "CPU MHz: 2260.998\n", "BogoMIPS: 4521.99\n", "Hypervisor vendor: KVM\n", "Virtualization type: full\n", "L1d cache: 32K\n", "L1i cache: 32K\n", "L2 cache: 256K\n", "L3 cache: 8192K\n" ] } ], "source": [ "%%bash\n", "lscpu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View/compare MD5 checksums of different \"versions\" of good files\n", "\n", "#### Recently discovered that two FASTQ files from BGI were corrupt. Communication with BGI has indicated that all \"versions\" of the FASTQ files we've received over the year (there are three different \"versions\" of each FASTQ) are actually all the same file - they've just been renamed. So, let's take a look at this..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Original FASTQ files from BGI" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total 136G\n", "-rw-rw-rw- 1 srlab staff 584M Jan 9 2016 11_GGCTAC_L001_R1_001.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 577M Jan 9 2016 12_CTTGTA_L001_R1_001.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 6.6G Jan 25 2016 151114_I191_FCH3Y35BCXX_L1_wHAIPI023992-37_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 6.9G Jan 25 2016 151114_I191_FCH3Y35BCXX_L1_wHAIPI023992-37_2.fq.gz\n", "-rw-rw-rw- 1 srlab staff 6.6G Jan 25 2016 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 6.2G Jan 27 2016 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz\n", "-rw-rw-rw- 1 srlab staff 2.4G Jan 27 2016 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 4.0G Jan 25 2016 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz\n", "-rw-rw-rw- 1 srlab staff 2.4G Jan 25 2016 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 2.8G Jan 25 2016 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_2.fq.gz\n", "-rw-rw-rw- 1 
srlab staff 1.3G Jan 25 2016 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Jan 25 2016 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_2.fq.gz\n", "-rw-rw-rw- 1 srlab staff 2.4G Jan 25 2016 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 2.8G Jan 25 2016 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_2.fq.gz\n", "-rw-rw-rw- 1 srlab staff 1.3G Jan 25 2016 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Jan 25 2016 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_2.fq.gz\n", "-rw-rw-rw- 1 srlab staff 2.7G Jan 25 2016 160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 3.1G Jan 25 2016 160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_2.fq.gz\n", "-rw-rw-rw- 1 srlab staff 2.6G Jan 25 2016 160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_1.fq.gz\n", "-rw-rw-rw- 1 srlab staff 3.0G Jan 25 2016 160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_2.fq.gz\n", "-rw-rw-rw- 1 srlab staff 503M Jan 9 2016 1_ATCACG_L001_R1_001.fastq.gz\n", "drwxrwxrwx 1 srlab staff 5.0K Nov 30 17:38 20160203_mbdseq\n", "drwxrwxrwx 1 srlab staff 6.8K Apr 29 2016 20160223_gbs\n", "-rw-rw-rw- 1 srlab staff 638M Jan 9 2016 2_CGATGT_L001_R1_001.fastq.gz\n", "drwxrwxrwx 1 srlab staff 11K Nov 12 19:47 2bRAD_Dec2015\n", "-rw-rw-rw- 1 srlab staff 634M Jan 9 2016 3_TTAGGC_L001_R1_001.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 620M Jan 9 2016 4_TGACCA_L001_R1_001.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 672M Jan 9 2016 5_ACAGTG_L001_R1_001.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 668M Jan 9 2016 6_GCCAAT_L001_R1_001.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 549M Jan 9 2016 7_CAGATC_L001_R1_001.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 765M Jan 9 2016 8_ACTTGA_L001_R1_001.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 4.2K Mar 24 2016 checksums.md5\n", "-rwxr-xr-x 1 srlab staff 1.3G Aug 14 2012 filtered_1000_GTGAAA_L004_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab 
staff 2.6G Dec 19 2012 filtered_106A_Female_Mix_GATCAG_L004_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 2.5G Dec 18 2012 filtered_106A_Female_Mix_GATCAG_L004_R2.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 1.3G Aug 14 2012 filtered_106A_Female_Mix_GATCAG_L007_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 3.9G Dec 18 2012 filtered_106A_Male_Mix_TAGCTT_L004_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 3.7G Dec 18 2012 filtered_106A_Male_Mix_TAGCTT_L004_R2.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 1.9G Aug 21 2012 filtered_106A_Male_Mix_TAGCTT_L007_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 3.0G Dec 19 2012 filtered_108A_Female_Mix_GGCTAC_L004_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 2.9G Dec 19 2012 filtered_108A_Female_Mix_GGCTAC_L004_R2.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 1.4G Aug 21 2012 filtered_108A_Female_Mix_GGCTAC_L007_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 3.7G Dec 19 2012 filtered_108A_Male_Mix_AGTCAA_L004_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 3.5G Dec 19 2012 filtered_108A_Male_Mix_AGTCAA_L004_R2.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 1.8G Aug 21 2012 filtered_108A_Male_Mix_AGTCAA_L007_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 1.1G Aug 14 2012 filtered_1600_GTGGCC_L004_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 5.6G Jan 25 2012 filtered_2000_NoIndex_L007_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 1.2G Aug 14 2012 filtered_2200_GTTTCG_L004_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 1.1G Aug 14 2012 filtered_400_GTCCGC_L004_R1.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 5.9G Jan 25 2012 filtered_400_NoIndex_L006_R1.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_001.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_002.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_003.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_004.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_005.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 
lane1_NoIndex_L001_R1_006.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_007.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_008.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_009.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_010.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_011.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_012.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.5G Dec 4 2014 lane1_NoIndex_L001_R1_013.fastq.gz\n", "-rw-rw-rw- 1 srlab staff 1.3G Dec 4 2014 lane1_NoIndex_L001_R1_014.fastq.gz\n", "-rwxr-xr-x 1 srlab staff 3.1G Jun 20 2013 m130619_081336_42134_c100525122550000001823081109281326_s1_p0.bas.h5.gz\n", "-rw-rw-rw- 1 srlab staff 3.4K Feb 9 2016 readme.md\n" ] } ], "source": [ "%%bash\n", "ls -lh /owl_web/nightingales/O_lurida/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### View original checksum for one of the \"non-corrupt\" files" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MD5 (151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz) = a503043167457337a65d51151ceb5dd0\n" ] } ], "source": [ "%%bash\n", "grep 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz /owl_web/nightingales/O_lurida/checksums.md5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### View checksum for subsequent \"versions\" of this file" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MD5 (/Volumes/web/O_lurida_genome_assemblies_BGI/20160512/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz.clean.dup.clean.gz) = a503043167457337a65d51151ceb5dd0\n", "a503043167457337a65d51151ceb5dd0 
/owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz.clean.dup.clean.gz\n" ] } ], "source": [ "%%bash\n", "grep 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz /owl_web/O_lurida_genome_assemblies_BGI/20160512/checksums.md5\n", "grep 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/checksums.md5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Albeit a bit difficult to read, you should be able to see that the checksum values for all three \"versions\" of this FASTQ file are the same (despite the changes in filename). This confirms BGI's information that all the data files provided at each stage are exactly the same file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### View checksums for one of the \"corrupt\" files" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MD5 (151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz) = 08cfdc6fdc5a6190cb05cdcb81fa5b9c\n", "MD5 (/Volumes/web/O_lurida_genome_assemblies_BGI/20160512/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz) = 08cfdc6fdc5a6190cb05cdcb81fa5b9c\n", "08cfdc6fdc5a6190cb05cdcb81fa5b9c /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz\n" ] } ], "source": [ "%%bash\n", "grep 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz /owl_web/nightingales/O_lurida/checksums.md5\n", "grep 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz /owl_web/O_lurida_genome_assemblies_BGI/20160512/checksums.md5\n", "grep 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz 
/owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/checksums.md5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### This looks confusing at first, because all three stored checksum values match even though this file is known to be corrupt. This [notebook](https://github.com/sr320/LabDocs/blob/2f6c1b43d4dba60c7a4f3e6dd34d9e2d2eb1f85a/jupyter_nbs/sam/20161117_docker_oly_genome_fastq_corruption.ipynb) reviews what happened. To make this easier to follow without leaving this notebook, let's generate MD5 checksums for all three \"versions\" of this file and compare them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Generate MD5 checksums for all three \"versions\" of this \"corrupt\" file." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "f10c1f143d153f364cf6be13452eea4a /owl_web/nightingales/O_lurida/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz\n", "08cfdc6fdc5a6190cb05cdcb81fa5b9c /owl_web/O_lurida_genome_assemblies_BGI/20160512/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz\n", "08cfdc6fdc5a6190cb05cdcb81fa5b9c /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t1m39.203s\n", "user\t0m0.350s\n", "sys\t0m16.000s\n", "\n", "real\t3m34.544s\n", "user\t0m0.360s\n", "sys\t0m25.510s\n", "\n", "real\t3m18.641s\n", "user\t0m0.370s\n", "sys\t0m25.530s\n" ] } ], "source": [ "%%bash\n", "time md5sum /owl_web/nightingales/O_lurida/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz\n", "time md5sum /owl_web/O_lurida_genome_assemblies_BGI/20160512/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz\n", "time md5sum 
/owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Notice that the checksum generated for the original file differs from those of the subsequent \"versions\", even though the checksums.md5 file lists the correct checksum value! Something happened to this file (as well as 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz) during copying to owl/nightingales/O_lurida." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For completeness, let's view the MD5 checksums for 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz. This is the other \"corrupt\" file identified in this [notebook entry](https://github.com/sr320/LabDocs/blob/2f6c1b43d4dba60c7a4f3e6dd34d9e2d2eb1f85a/jupyter_nbs/sam/20161117_docker_oly_genome_fastq_corruption.ipynb)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MD5 (151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz) = 8bc0d7c7a7af3954baca31a4a7fe9f2b\n", "MD5 (/Volumes/web/O_lurida_genome_assemblies_BGI/20160512/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz) = 8bc0d7c7a7af3954baca31a4a7fe9f2b\n", "8bc0d7c7a7af3954baca31a4a7fe9f2b /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz\n" ] } ], "source": [ "%%bash\n", "grep 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz /owl_web/nightingales/O_lurida/checksums.md5\n", "grep 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz /owl_web/O_lurida_genome_assemblies_BGI/20160512/checksums.md5\n", "grep 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/checksums.md5" ] }, { "cell_type": 
"code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "861c8161c1e0171532a058582db3ae6d /owl_web/nightingales/O_lurida/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz\n", "8bc0d7c7a7af3954baca31a4a7fe9f2b /owl_web/O_lurida_genome_assemblies_BGI/20160512/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz\n", "8bc0d7c7a7af3954baca31a4a7fe9f2b /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t5m45.360s\n", "user\t0m0.470s\n", "sys\t0m42.260s\n", "\n", "real\t4m50.124s\n", "user\t0m0.710s\n", "sys\t0m46.230s\n", "\n", "real\t5m39.771s\n", "user\t0m0.750s\n", "sys\t0m46.310s\n" ] } ], "source": [ "%%bash\n", "time md5sum /owl_web/nightingales/O_lurida/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz\n", "time md5sum /owl_web/O_lurida_genome_assemblies_BGI/20160512/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz\n", "time md5sum /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### We see the same thing: the original file has a different checksum value than the subsequent \"versions\"." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's replace the bad files with the good ones" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t30m31.618s\n", "user\t0m0.000s\n", "sys\t0m32.920s\n" ] } ], "source": [ "%%bash\n", "time cp /owl_web/O_lurida_genome_assemblies_BGI/20160512/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz \\\n", "/owl_web/nightingales/O_lurida/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Look at the checksums now" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "08cfdc6fdc5a6190cb05cdcb81fa5b9c /owl_web/nightingales/O_lurida/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz\n", "08cfdc6fdc5a6190cb05cdcb81fa5b9c /owl_web/O_lurida_genome_assemblies_BGI/20160512/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz\n", "08cfdc6fdc5a6190cb05cdcb81fa5b9c /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t4m55.495s\n", "user\t0m0.300s\n", "sys\t0m25.810s\n", "\n", "real\t2m12.678s\n", "user\t0m0.370s\n", "sys\t0m25.610s\n", "\n", "real\t2m37.531s\n", "user\t0m0.680s\n", "sys\t0m25.540s\n" ] } ], "source": [ "%%bash\n", "time md5sum /owl_web/nightingales/O_lurida/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz\n", "time md5sum /owl_web/O_lurida_genome_assemblies_BGI/20160512/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz\n", "time md5sum 
/owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz.clean.dup.clean.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Everything looks good. Time to copy the other file and check it out." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8bc0d7c7a7af3954baca31a4a7fe9f2b /owl_web/nightingales/O_lurida/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz\n", "8bc0d7c7a7af3954baca31a4a7fe9f2b /owl_web/O_lurida_genome_assemblies_BGI/20160512/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz\n", "8bc0d7c7a7af3954baca31a4a7fe9f2b /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n", "real\t33m16.268s\n", "user\t0m0.000s\n", "sys\t0m59.710s\n", "\n", "real\t11m57.266s\n", "user\t0m0.710s\n", "sys\t0m46.420s\n", "\n", "real\t4m11.480s\n", "user\t0m0.550s\n", "sys\t0m46.800s\n", "\n", "real\t5m36.195s\n", "user\t0m0.660s\n", "sys\t0m46.440s\n" ] } ], "source": [ "%%bash\n", "time cp /owl_web/O_lurida_genome_assemblies_BGI/20160512/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz \\\n", "/owl_web/nightingales/O_lurida/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz\n", "time md5sum /owl_web/nightingales/O_lurida/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz\n", "time md5sum /owl_web/O_lurida_genome_assemblies_BGI/20160512/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz\n", "time md5sum /owl_web/O_lurida_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Ostrea_lurida/clean_data/151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz.clean.dup.clean.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 
Everything looks good." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 2 }