### Running in Docker container on Ostrich

#### Started Docker container with the following command:

```docker run -p 8888:8888 -v /Users/sam/data/:/data -v /Users/sam/owl_home/:/owl_home -v /Users/sam/owl_web/:/owl_web -v /Users/sam/gitrepos:/gitrepos -it f99537d7e06a```

The command allows access to Jupyter Notebook over port 8888 and makes my Jupyter Notebook GitHub repo and my data files on Owl/home and Owl/web accessible to the Docker container.
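As a rough illustration of how the `-p` and `-v` flags map host resources into the container, the same command can be assembled from plain dictionaries. This is only a sketch: the helper name `build_docker_run` is hypothetical and not part of Docker or of this notebook; the ports, paths, and image ID are copied from the command above.

```python
import shlex


def build_docker_run(image, ports, volumes, interactive=True):
    """Assemble a `docker run` argument list (hypothetical helper)."""
    args = ["docker", "run"]
    for host_port, container_port in ports.items():
        # -p HOST_PORT:CONTAINER_PORT publishes a container port on the host
        args += ["-p", f"{host_port}:{container_port}"]
    for host_dir, container_dir in volumes.items():
        # -v HOST_DIR:CONTAINER_DIR bind-mounts a host directory
        args += ["-v", f"{host_dir}:{container_dir}"]
    if interactive:
        # -it keeps STDIN open and allocates a terminal
        args.append("-it")
    args.append(image)
    return args


cmd = build_docker_run(
    image="f99537d7e06a",
    ports={8888: 8888},
    volumes={
        "/Users/sam/data/": "/data",
        "/Users/sam/owl_home/": "/owl_home",
        "/Users/sam/owl_web/": "/owl_web",
        "/Users/sam/gitrepos": "/gitrepos",
    },
)
print(shlex.join(cmd))
```

Each host directory on the left of a `-v` pair appears inside the container at the path on the right, which is why later cells read from `/owl_web/` and write to `/data/`.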

Once the container was running, started Jupyter Notebook inside it with the following command:

```jupyter notebook```

This is configured in the Docker container to launch a Jupyter Notebook without a browser on port 8888.

The Docker container is running on an image created from this [Dockerfile (Git commit 443bc42)](https://github.com/sr320/LabDocs/blob/443bc425cd36d23a07cf12625f38b7e3a397b9be/code/dockerfiles/Dockerfile.bio)

In [1]:
%%bash
date

Wed Mar  1 20:13:26 UTC 2017


### Check computer specs

In [2]:
%%bash
hostname

0f2bca9c664b


In [3]:
%%bash
lscpu

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    8
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz
Stepping:              5
CPU MHz:               2260.998
BogoMIPS:              4521.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K


### Copy non-demultiplexed BGI FASTQ files to local computer (Ostrich) and verify checksums

In [4]:
ls /owl_web/nightingales/O_lurida/20160223_gbs/

160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
1HL_10A_1.fq.gz*
1HL_10A_2.fq.gz*
1HL_11A_1.fq.gz*
1HL_11A_2.fq.gz*
1HL_12A_1.fq.gz*
1HL_12A_2.fq.gz*
1HL_13A_1.fq.gz*
1HL_13A_2.fq.gz*
1HL_14A_1.fq.gz*
1HL_14A_2.fq.gz*
1HL_15A_1.fq.gz*
1HL_15A_2.fq.gz*
1HL_16A_1.fq.gz*
1HL_16A_2.fq.gz*
1HL_17A_1.fq.gz*
1HL_17A_2.fq.gz*
1HL_19A_1.fq.gz*
1HL_19A_2.fq.gz*
1HL_1A_1.fq.gz*
1HL_1A_2.fq.gz*
1HL_20A_1.fq.gz*
1HL_20A_2.fq.gz*
1HL_21A_1.fq.gz*
1HL_21A_2.fq.gz*
1HL_22A_1.fq.gz*
1HL_22A_2.fq.gz*
1HL_23A_1.fq.gz*
1HL_23A_2.fq.gz*
1HL_24A_1.fq.gz*
1HL_24A_2.fq.gz*
1H

In [5]:
%%bash
mkdir /data/20170301_fastqc_gbs

In [6]:
cd /data/20170301_fastqc_gbs/

/data/20170301_fastqc_gbs


In [7]:
%%bash
time for file in /owl_web/nightingales/O_lurida/20160223_gbs/160123*
    do
    cp "$file" .
    done


real	31m9.816s
user	0m0.020s
sys	5m31.590s


### Generate MD5 checksums & compare to original checksums

In [8]:
%%bash
time for file in *.gz
    do
    md5 "$file" >> temp_checksums.md5
    done

bash: line 3: md5: command not found
bash: line 3: md5: command not found

real	0m0.020s
user	0m0.010s
sys	0m0.000s


In [9]:
%%bash
cat temp_checksums.md5_checksums.md5

In [10]:
%%bash
ls -lh

total 37G
-rw-r--r-- 1 srlab staff 17G Mar  1 20:36 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
-rw-r--r-- 1 srlab staff 20G Mar  1 20:54 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
-rw-r--r-- 1 srlab staff   0 Mar  1 20:54 temp_checksums.md5


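Whoops! Errors in both of the above commands: `md5` isn't available in this Linux container (should be `md5sum`), and the filename passed to `cat` was mistyped. Will re-run...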
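The `md5: command not found` error above comes from the command name differing between platforms. One way to sidestep that is to hash the files with Python's standard `hashlib` module. The sketch below is an alternative to the shell loop, not what was actually run in this notebook; the function names are hypothetical. It reads in chunks so that multi-gigabyte FASTQ files are never loaded into memory at once.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5sum_line(path):
    """Format a digest the way `md5sum` prints it: digest, two spaces, path."""
    return f"{md5_of_file(path)}  {path}"


if __name__ == "__main__":
    # Hash every path given on the command line, e.g. *.gz
    for path in sys.argv[1:]:
        print(md5sum_line(path))
```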

In [13]:
%%bash
time for file in *.gz
    do
    md5sum "$file" >> temp_checksums.md5
    done


real	7m1.077s
user	0m3.950s
sys	4m11.280s


In [14]:
%%bash
cat temp_checksums.md5

afd18614b9f5694af1a59672821cd0db  160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
2894dd9b54c1388d3d74e8f8642aa267  160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz


In [15]:
%%bash
awk '/160123_I132/[print $0]' /owl_web/nightingales/O_lurida/20160223_gbs/checksums.md5

awk: line 1: syntax error at or near [


Bracket typo: the awk action needs curly braces, not square brackets. Redo...

In [16]:
%%bash
awk '/160123_I132/{print $0}' /owl_web/nightingales/O_lurida/20160223_gbs/checksums.md5

MD5 (/Volumes/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz) = afd18614b9f5694af1a59672821cd0db
MD5 (/Volumes/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz) = 2894dd9b54c1388d3d74e8f8642aa267


Visual comparison looks good: both new checksums match the originals. Will proceed with FastQC.
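The visual check could also be done programmatically. The new checksums are in `md5sum` format (digest, then filename), while the original `checksums.md5` uses the BSD/macOS format (`MD5 (path) = digest`), so a direct `diff` would not work. The sketch below parses both formats and compares by file basename; the function name and regular expressions are my own, and the input lines are pasted from the two outputs above.

```python
import re

# md5sum style:  <digest>  <path>
MD5SUM_LINE = re.compile(r"^([0-9a-f]{32})\s+\*?(.+)$")
# BSD/macOS md5 style:  MD5 (<path>) = <digest>
BSD_MD5_LINE = re.compile(r"^MD5 \((.+)\) = ([0-9a-f]{32})$")


def parse_checksums(text):
    """Return {basename: digest} from md5sum-style or BSD md5-style lines."""
    checksums = {}
    for line in text.splitlines():
        line = line.strip()
        gnu = MD5SUM_LINE.match(line)
        bsd = BSD_MD5_LINE.match(line)
        if gnu:
            digest, path = gnu.groups()
        elif bsd:
            path, digest = bsd.groups()
        else:
            continue  # skip blank or unrecognized lines
        # Compare on the basename because the directory prefixes differ
        checksums[path.rsplit("/", 1)[-1]] = digest
    return checksums


new = parse_checksums("""\
afd18614b9f5694af1a59672821cd0db  160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
2894dd9b54c1388d3d74e8f8642aa267  160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
""")
original = parse_checksums("""\
MD5 (/Volumes/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz) = afd18614b9f5694af1a59672821cd0db
MD5 (/Volumes/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz) = 2894dd9b54c1388d3d74e8f8642aa267
""")

# A file is a mismatch if it is missing from the original list or differs
mismatches = sorted(name for name in new if original.get(name) != new[name])
print("all match" if not mismatches else f"mismatch: {mismatches}")
```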

### FastQC Analysis

In [17]:
%%bash
which fastqc

/usr/local/bioinformatics/FastQC/fastqc


In [18]:
%%bash
time fastqc --extract 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz

Analysis complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Analysis complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz


Started analysis of 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 5% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 10% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 15% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 20% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 25% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 30% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 35% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 40% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 45% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 50% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 55% complete for 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
Approx 60

In [19]:
ls -lh

total 37G
-rw-r--r-- 1 srlab staff  17G Mar  1 20:36 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
drwxr-xr-x 1 srlab staff  272 Mar  2 00:35 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/
-rw-r--r-- 1 srlab staff 294K Mar  2 00:35 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc.html
-rw-r--r-- 1 srlab staff 403K Mar  2 00:35 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc.zip
-rw-r--r-- 1 srlab staff  20G Mar  1 20:54 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
drwxr-xr-x 1 srlab staff  272 Mar  2 03:12 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc/
-rw-r--r-- 1 srlab staff 347K Mar  2 03:12 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc.html
-rw-r--r-- 1 srlab staff 489K Mar  2 03:12 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc.zip
-rw-r--r-- 1 srlab staff  186 Mar  1 21:10 temp_checksums.md5


In [20]:
ls 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/

Icons/  Images/  fastqc.fo  fastqc_data.txt  fastqc_report.html  summary.txt


In [21]:
cat 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/summary.txt

PASS	Basic Statistics	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
PASS	Per base sequence quality	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
WARN	Per tile sequence quality	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
PASS	Per sequence quality scores	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
WARN	Per base sequence content	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
PASS	Per sequence GC content	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
PASS	Per base N content	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
PASS	Sequence Length Distribution	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
FAIL	Sequence Duplication Levels	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
PASS	Overrepresented sequences	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
FAIL	Adapter Content	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
FAIL	Kmer Content	16

In [22]:
ls 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc/

Icons/  Images/  fastqc.fo  fastqc_data.txt  fastqc_report.html  summary.txt


In [23]:
cat 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc/s

cat: 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc/: Is a directory


In [24]:
cat 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc/summary.txt

PASS	Basic Statistics	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
FAIL	Per base sequence quality	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
FAIL	Per tile sequence quality	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
PASS	Per sequence quality scores	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
FAIL	Per base sequence content	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
PASS	Per sequence GC content	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
WARN	Per base N content	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
PASS	Sequence Length Distribution	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
FAIL	Sequence Duplication Levels	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
WARN	Overrepresented sequences	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
FAIL	Adapter Content	160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
FAIL	Kmer Content	16

#### Move the FastQC Outputs to Nightingales

In [25]:
%%bash
mv -v 160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/ \
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc/ \
-t /owl_web/nightingales/O_lurida/20160223_gbs/

'160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/' -> '/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc'
'160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/fastqc.fo' -> '/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/fastqc.fo'
'160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/fastqc_data.txt' -> '/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/fastqc_data.txt'
'160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/fastqc_report.html' -> '/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/fastqc_report.html'
'160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/Icons' -> '/owl_web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/Icons'
'160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96F

### View FastQC Reports

For easier viewing, visit the following two links:

- [20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc](http://owl.fish.washington.edu/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/fastqc_report.html)

- [20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc](http://owl.fish.washington.edu/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc/fastqc_report.html)

In [27]:
%%HTML
<iframe width="100%" height="500" src="http://owl.fish.washington.edu/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc/fastqc_report.html"></iframe>

In [28]:
%%HTML
<iframe width="100%" height="500" src="http://owl.fish.washington.edu/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc/fastqc_report.html"></iframe>

These results aren't terribly surprising since these data haven't been quality trimmed to remove adapter/barcode sequences.
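To pull out just the problem modules from a FastQC `summary.txt` (tab-separated status, module name, and filename, as shown in the `cat` output earlier), a small parser is enough. This is a sketch with a hypothetical function name; the sample lines are a subset of the read 1 summary above.

```python
from collections import defaultdict


def flagged_modules(summary_text):
    """Group FastQC module names by WARN/FAIL status; PASS lines are ignored."""
    flagged = defaultdict(list)
    for line in summary_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        status, module, _filename = fields
        if status in ("WARN", "FAIL"):
            flagged[status].append(module)
    return dict(flagged)


fastq_name = "160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz"
sample = "\n".join([
    "PASS\tBasic Statistics\t" + fastq_name,
    "WARN\tPer tile sequence quality\t" + fastq_name,
    "FAIL\tSequence Duplication Levels\t" + fastq_name,
    "FAIL\tAdapter Content\t" + fastq_name,
])

print(flagged_modules(sample))
```

Running the same function on the full `summary.txt` of each read before and after trimming would give a quick view of which failures trimming resolves.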

In [1]:
cd /data/20170301_fastqc_gbs/

/data/20170301_fastqc_gbs


In [2]:
ls

160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc.html
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc.zip
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc.html
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc.zip
temp_checksums.md5


Forgot to move the .html files!

In [3]:
%%bash
mv *.html /owl_web/nightingales/O_lurida/20160223_gbs/

In [4]:
ls

160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1_fastqc.zip
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz
160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2_fastqc.zip
temp_checksums.md5


Clean up.

In [5]:
cd /data/

/data


In [6]:
ls

20161117_oly_gbs_vcf_analysis/  test_file10.txt.gz  test_file6.txt.gz
20170227_jay_data_tmp/          test_file2.txt.gz   test_file7.txt.gz
20170301_fastqc_gbs/            test_file3.txt.gz   test_file8.txt.gz
md5s.txt                        test_file4.txt.gz   test_file9.txt.gz
test_file1.txt.gz               test_file5.txt.gz   wgetrc_berk_seq


In [7]:
%%bash
rm -rf 20170301_fastqc_gbs/

In [8]:
ls

20161117_oly_gbs_vcf_analysis/  test_file2.txt.gz  test_file7.txt.gz
20170227_jay_data_tmp/          test_file3.txt.gz  test_file8.txt.gz
md5s.txt                        test_file4.txt.gz  test_file9.txt.gz
test_file1.txt.gz               test_file5.txt.gz  wgetrc_berk_seq
test_file10.txt.gz              test_file6.txt.gz
