{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Running in Docker container on Ostrich\n",
"\n",
"#### Started Docker container with the following command:\n",
"\n",
"```docker run -p 8888:8888 -v /Users/sam/data/:/data -v /Users/sam/owl_home/:/owl_home -v /Users/sam/owl_web/:/owl_web -v /Users/sam/gitrepos/LabDocs/jupyter_nbs/sam/:/jupyter_nbs -it f99537d7e06a```\n",
"\n",
"The ```-p 8888:8888``` flag publishes the container's port 8888 so Jupyter Notebook can be reached from the host. The ```-v``` flags mount my Jupyter Notebook GitHub repo and my data files (including those on Owl/home and Owl/web) inside the Docker container.\n",
"\n",
"Once the container was running, I started Jupyter Notebook inside it with the following command:\n",
"\n",
"```jupyter notebook```\n",
"\n",
"This is configured in the Docker container to launch a Jupyter Notebook without a browser on port 8888.\n",
"\n",
"The Docker container is running on an image created from this [Dockerfile (Git commit 443bc42)](https://github.com/sr320/LabDocs/blob/443bc425cd36d23a07cf12625f38b7e3a397b9be/code/dockerfiles/Dockerfile.bio)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Wed Dec 14 15:52:40 UTC 2016\n"
]
}
],
"source": [
"%%bash\n",
"date"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Check computer specs"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4bd1957ce190\n"
]
}
],
"source": [
"%%bash\n",
"hostname"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Architecture: x86_64\n",
"CPU op-mode(s): 32-bit, 64-bit\n",
"Byte Order: Little Endian\n",
"CPU(s): 8\n",
"On-line CPU(s) list: 0-7\n",
"Thread(s) per core: 1\n",
"Core(s) per socket: 8\n",
"Socket(s): 1\n",
"Vendor ID: GenuineIntel\n",
"CPU family: 6\n",
"Model: 26\n",
"Model name: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz\n",
"Stepping: 5\n",
"CPU MHz: 2260.998\n",
"BogoMIPS: 4521.99\n",
"Hypervisor vendor: KVM\n",
"Virtualization type: full\n",
"L1d cache: 32K\n",
"L1i cache: 32K\n",
"L2 cache: 256K\n",
"L3 cache: 8192K\n"
]
}
],
"source": [
"%%bash\n",
"lscpu"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bloated notebook analysis"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-r--r-- 1 srlab staff 104M Dec 8 12:09 /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n"
]
}
],
"source": [
"%%bash\n",
"ls -lh /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### That notebook is over 100MB in size (104M), which is too large to host on GitHub. Additionally, the notebook crashes the browser (and sometimes the computer) because of the ridiculous number of output lines generated by the ```wget``` command. Let's look at some more details."
]
},
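{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a minimal sketch of one way to keep the flood of ```wget``` output out of a notebook in the first place. ```--no-verbose``` (```-nv```) trims ```wget```'s progress output, and ```--output-file``` (```-o```) sends the log to a file instead of the cell. The function name ```quiet_mirror``` and the log file name are hypothetical. I haven't re-run the BGI downloads this way.\n",
"\n",
"```bash\n",
"# Hypothetical helper: mirror a remote directory without flooding the notebook.\n",
"# $1 = URL to mirror, $2 = log file that receives wget's messages.\n",
"quiet_mirror() {\n",
"  local url=$1\n",
"  local log=$2\n",
"  # --mirror (-m): same recursive, timestamped download as the original command\n",
"  # --no-verbose (-nv): one short line per file instead of progress bars\n",
"  # --output-file (-o): write those lines to $log instead of the cell output\n",
"  time wget --mirror --no-verbose --output-file=\"$log\" \"$url\"\n",
"  # Show only the last few lines of the log\n",
"  tail -n 5 \"$log\"\n",
"}\n",
"```\n",
"\n",
"Usage (placeholder credentials): ```quiet_mirror ftp://user:password@cdts-hk.genomics.cn/Ostrea_lurida/ wget_O_lurida.log```"
]
},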
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Line count"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1197134 /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n"
]
}
],
"source": [
"%%bash\n",
"wc -l /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### To preserve some of the information in the original notebook before I strip the output, I'll look at the file in a bit more depth..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### How long did the wget command for the Ostrea lurida files take?\n",
"\n",
"#### First, let's find the line that has the output of the ```time``` command that I ran. The ```grep``` command includes the ```-n``` flag to identify line number(s) of search results."
]
},
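{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, ```grep -n``` prefixes each match with its line number and a colon. The number alone can be captured with ```cut``` for use in later commands. Here is a minimal sketch on a small made-up file (```demo.txt``` is hypothetical):\n",
"\n",
"```bash\n",
"# Build a small stand-in file (hypothetical contents)\n",
"printf '%s\\n' 'user 0m11.190s' 'real 2529m32.643s' 'sys 40m9.630s' > demo.txt\n",
"# grep -n prints <line number>:<matching line>\n",
"grep -n real demo.txt\n",
"# Keep only the line number (the text before the first colon)\n",
"line=$(grep -n real demo.txt | cut -d: -f1)\n",
"echo \"$line\"\n",
"```"
]
},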
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1197005: \"real\\t2529m32.643s\\n\",\n"
]
}
],
"source": [
"%%bash\n",
"grep -n real /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Whoa! That's a LONG time! Let's try to pull the full time output.\n",
"\n",
"#### I'll use ```head``` and ```tail``` to pull out a specific range of lines around line 1197005, making a rough guess at how wide that range needs to be..."
]
},
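{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ```head -N | tail -M``` pipeline prints lines N-M+1 through N. ```sed -n 'START,ENDp'``` pulls the same range and is arguably easier to read when aiming at a known line number. Here is a minimal sketch on a made-up 100-line file (```lines.txt``` is hypothetical):\n",
"\n",
"```bash\n",
"seq 1 100 > lines.txt\n",
"# head/tail: keep the first 20 lines, then the last 5 of those (lines 16-20)\n",
"a=$(head -n 20 lines.txt | tail -n 5)\n",
"# sed: print only lines 16 through 20\n",
"b=$(sed -n '16,20p' lines.txt)\n",
"# Both approaches select the same lines\n",
"[ \"$a\" = \"$b\" ] && echo identical\n",
"printf '%s\\n' \"$b\"\n",
"```\n",
"\n",
"For the bloated notebook, ```sed -n '1197001,1197020p;1197020q'``` would print the same 20 lines as the ```head```/```tail``` command below. The ```q``` makes ```sed``` stop reading at line 1197020."
]
},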
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" \"FINISHED --2016-12-08 12:07:18--\\n\",\n",
" \"Total wall clock time: 1d 18h 9m 33s\\n\",\n",
" \"Downloaded: 25 files, 55G in 1d 17h 53m 46s (379 KB/s)\\n\",\n",
" \"\\n\",\n",
" \"real\\t2529m32.643s\\n\",\n",
" \"user\\t0m11.190s\\n\",\n",
" \"sys\\t40m9.630s\\n\"\n",
" ]\n",
" }\n",
" ],\n",
" \"source\": [\n",
" \"%%bash\\n\",\n",
" \"time wget -m ftp://F15FTSUSAT0327:OSTibkD@cdts-hk.genomics.cn/Ostrea_lurida/\"\n",
" ]\n",
" },\n",
" {\n",
" \"cell_type\": \"markdown\",\n",
" \"metadata\": {},\n",
" \"source\": [\n",
" \"#### View directory structure of downloaded files\\n\",\n"
]
}
],
"source": [
"%%bash\n",
"head -1197020 /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb | tail -20"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
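"#### So, downloading all of the Ostrea lurida files took a little over 42 hrs (2529m32.643s of ```real``` time, which matches ```wget```'s reported total wall clock time of 1d 18h 9m 33s)!"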
]
},
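{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a quick sanity check of that conversion using bash integer arithmetic. The minutes come from the ```real``` line above, with fractional seconds dropped:\n",
"\n",
"```bash\n",
"# real time reported by the time command: 2529m32.643s\n",
"minutes=2529\n",
"seconds=32\n",
"hours=$(( minutes / 60 ))\n",
"rem_minutes=$(( minutes % 60 ))\n",
"echo \"${hours}h ${rem_minutes}m ${seconds}s\"\n",
"```\n",
"\n",
"This prints ```42h 9m 32s```. That lines up with ```wget```'s own report of 1d 18h 9m 33s (24h + 18h = 42h)."
]
},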
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Let's see what the time frame was for the Panopea generosa files..."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"115: \"### Download all Ostrea lurida files from BGI using ```wget```\"\n",
"142: \"time wget -m ftp://F15FTSUSAT0327:OSTibkD@cdts-hk.genomics.cn/Ostrea_lurida/ \\\\\\n\",\n",
"197: \"#### Not going to waste time figuring out why the ```-P``` argument didn't work for ```wget```, so just changing to desired directory and running ```wget``` command again...\"\n",
"1197013: \"time wget -m ftp://F15FTSUSAT0327:OSTibkD@cdts-hk.genomics.cn/Ostrea_lurida/\"\n",
"1197049: \"### Download all Panopea gererosa files from BGI using ```wget```\"\n",
"1197080: \"time wget -m ftp://F15FTSUSAT0327:OSTibkD@cdts-hk.genomics.cn/Panopea_generosa\"\n"
]
}
],
"source": [
"%%bash\n",
"grep -n wget /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Since the file has 1197134 lines and the ```wget``` command for the Panopea generosa files is on line 1197080 (the 55th line from the end), I'll just use the ```tail``` command to look at the last 100 lines."
]
},
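{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of guessing a line count, ```tail -n +N``` starts printing at line N. The output can then begin exactly at the line ```grep -n``` reported. Here is a minimal sketch, again on a hypothetical ```lines.txt```:\n",
"\n",
"```bash\n",
"seq 1 100 > lines.txt\n",
"start=95\n",
"# +95 means start at line 95, not the last 95 lines\n",
"tail -n +\"$start\" lines.txt\n",
"# Number of lines from $start to the end of the file, inclusive\n",
"total=$(wc -l < lines.txt)\n",
"echo $(( total - start + 1 ))\n",
"```\n",
"\n",
"For the bloated notebook, ```tail -n +1197080``` would print the 55 lines from the Panopea generosa ```wget``` cell to the end of the file."
]
},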
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" \"text\": [\n",
" \"bash: line 1: tree: command not found\\n\"\n",
" ]\n",
" }\n",
" ],\n",
" \"source\": [\n",
" \"%%bash\\n\",\n",
" \"tree /owl_web/O_lurida_genome_assemblies_BGI/20161201/\"\n",
" ]\n",
" },\n",
" {\n",
" \"cell_type\": \"markdown\",\n",
" \"metadata\": {},\n",
" \"source\": [\n",
" \"### Download all Panopea gererosa files from BGI using ```wget```\"\n",
" ]\n",
" },\n",
" {\n",
" \"cell_type\": \"code\",\n",
" \"execution_count\": null,\n",
" \"metadata\": {\n",
" \"collapsed\": false\n",
" },\n",
" \"outputs\": [\n",
" {\n",
" \"name\": \"stdout\",\n",
" \"output_type\": \"stream\",\n",
" \"text\": [\n",
" \"/owl_web/P_generosa_genome_assemblies_BGI/20161201\\n\"\n",
" ]\n",
" }\n",
" ],\n",
" \"source\": [\n",
" \"cd /owl_web/P_generosa_genome_assemblies_BGI/20161201/\"\n",
" ]\n",
" },\n",
" {\n",
" \"cell_type\": \"code\",\n",
" \"execution_count\": null,\n",
" \"metadata\": {\n",
" \"collapsed\": true\n",
" },\n",
" \"outputs\": [],\n",
" \"source\": [\n",
" \"%%bash\\n\",\n",
" \"time wget -m ftp://F15FTSUSAT0327:OSTibkD@cdts-hk.genomics.cn/Panopea_generosa\"\n",
" ]\n",
" },\n",
" {\n",
" \"cell_type\": \"markdown\",\n",
" \"metadata\": {},\n",
" \"source\": [\n",
" \"#### View directory structure of downloaded files\"\n",
" ]\n",
" },\n",
" {\n",
" \"cell_type\": \"code\",\n",
" \"execution_count\": null,\n",
" \"metadata\": {\n",
" \"collapsed\": true\n",
" },\n",
" \"outputs\": [],\n",
" \"source\": [\n",
" \"%%bash\\n\",\n",
" \"tree /owl_web/P_generosa_genome_assemblies_BGI/20161201/\"\n",
" ]\n",
" },\n",
" {\n",
" \"cell_type\": \"code\",\n",
" \"execution_count\": null,\n",
" \"metadata\": {\n",
" \"collapsed\": true\n",
" },\n",
" \"outputs\": [],\n",
" \"source\": []\n",
" }\n",
" ],\n",
" \"metadata\": {\n",
" \"anaconda-cloud\": {},\n",
" \"kernelspec\": {\n",
" \"display_name\": \"Python [default]\",\n",
" \"language\": \"python\",\n",
" \"name\": \"python2\"\n",
" },\n",
" \"language_info\": {\n",
" \"codemirror_mode\": {\n",
" \"name\": \"ipython\",\n",
" \"version\": 2\n",
" },\n",
" \"file_extension\": \".py\",\n",
" \"mimetype\": \"text/x-python\",\n",
" \"name\": \"python\",\n",
" \"nbconvert_exporter\": \"python\",\n",
" \"pygments_lexer\": \"ipython2\",\n",
" \"version\": \"2.7.12\"\n",
" }\n",
" },\n",
" \"nbformat\": 4,\n",
" \"nbformat_minor\": 1\n",
"}\n"
]
}
],
"source": [
"%%bash\n",
"tail -100 /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Well, what we see (and should've realized when ```grep -n real``` in cell 7 returned only a single match, from the Ostrea lurida download) is that there is no output from that ```wget``` command.\n",
"\n",
"#### So, let's see if the files got downloaded or not..."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 1.3G\n",
"-rw-rw-rw- 1 srlab staff 2.3K Dec 1 05:39 md5.txt\n",
"-rw-rw-rw- 1 srlab staff 1.7K Dec 1 09:37 md5.check\n",
"drwxrwxrwx 1 srlab staff 704 Dec 10 08:35 clean_data\n",
"-rw-rw-rw- 1 srlab staff 1.3G Dec 1 04:12 Panopea_generosa.fa\n",
"-rw-rw-rw- 1 srlab staff 432 Dec 1 04:12 N50.xls\n",
"-rw-rw-rw- 1 srlab staff 3.6K Dec 1 04:11 17mer.log\n",
"-rw-rw-rw- 1 srlab staff 7.6K Dec 1 04:11 17mer.freq\n"
]
}
],
"source": [
"%%bash\n",
"ls -lhr /owl_web/P_generosa_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Panopea_generosa/"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 71G\n",
"-rw-rw-rw- 1 srlab staff 2.3K Dec 1 03:56 lane.lst.stat.xls\n",
"-rw-rw-rw- 1 srlab staff 1.3G Dec 1 05:07 160103_I137_FCH3V5YBBXX_L6_WHPANwalDDACDTAAPEI-102_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 1.2G Dec 1 05:03 160103_I137_FCH3V5YBBXX_L6_WHPANwalDDACDTAAPEI-102_1.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 2.2G Dec 1 04:55 160103_I137_FCH3V5YBBXX_L6_WHPANwalDDABDLAAPEI-100_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 2.0G Dec 1 04:51 160103_I137_FCH3V5YBBXX_L6_WHPANwalDDABDLAAPEI-100_1.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 1.3G Dec 1 05:01 160103_I137_FCH3V5YBBXX_L5_WHPANwalDDACDTAAPEI-102_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 1.2G Dec 1 04:58 160103_I137_FCH3V5YBBXX_L5_WHPANwalDDACDTAAPEI-102_1.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 2.3G Dec 1 04:47 160103_I137_FCH3V5YBBXX_L5_WHPANwalDDABDLAAPEI-100_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 2.1G Dec 1 04:42 160103_I137_FCH3V5YBBXX_L5_WHPANwalDDABDLAAPEI-100_1.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 2.0G Dec 1 04:38 160103_I137_FCH3V5YBBXX_L4_WHPANwalDDAADWAAPEI-101_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 1.8G Dec 1 04:34 160103_I137_FCH3V5YBBXX_L4_WHPANwalDDAADWAAPEI-101_1.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 1.9G Dec 1 04:30 160103_I137_FCH3V5YBBXX_L3_WHPANwalDDAADWAAPEI-101_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 1.8G Dec 1 04:27 160103_I137_FCH3V5YBBXX_L3_WHPANwalDDAADWAAPEI-101_1.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 7.2G Dec 1 04:24 151122_I136_FCH3L2FBBXX_L7_wHAXPI023990-97_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 6.4G Dec 1 04:19 151122_I136_FCH3L2FBBXX_L7_wHAXPI023990-97_1.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 8.1G Dec 1 04:14 151114_I191_FCH3Y35BCXX_L2_wHAMPI023988-81_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 7.8G Dec 1 04:09 151114_I191_FCH3Y35BCXX_L2_wHAMPI023988-81_1.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 11G Dec 1 04:05 151114_I191_FCH3Y35BCXX_L1_wHAIPI023989-79_2.fq.gz.clean.dup.clean.gz\n",
"-rw-rw-rw- 1 srlab staff 9.9G Dec 1 03:59 151114_I191_FCH3Y35BCXX_L1_wHAIPI023989-79_1.fq.gz.clean.dup.clean.gz\n"
]
}
],
"source": [
"%%bash\n",
"ls -lhr /owl_web/P_generosa_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Panopea_generosa/clean_data/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### OK, the files got downloaded. I'm guessing the enormous output from the Ostrea lurida ```wget``` command crashed the browser, but the notebook commands still ran to completion."
]
},
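{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Panopea generosa directory also contains ```md5.txt``` (and ```md5.check```), so the download could be checked against BGI's checksums. This is a minimal sketch. It assumes ```md5.txt``` uses the standard GNU ```md5sum``` format (```<hash>  <relative path>```), which I have not confirmed. The demo files below are made up:\n",
"\n",
"```bash\n",
"# Hypothetical demo: create a file and a checksum list in md5sum format\n",
"mkdir -p md5_demo && cd md5_demo\n",
"printf 'ACGT\\n' > demo.fa\n",
"md5sum demo.fa > md5.txt\n",
"# --check re-hashes each listed file and prints <path>: OK or FAILED\n",
"md5sum --check md5.txt\n",
"```\n",
"\n",
"On the real data, this would be ```md5sum --check md5.txt``` run from inside ```/owl_web/P_generosa_genome_assemblies_BGI/20161201/cdts-hk.genomics.cn/Panopea_generosa/```, provided the file format assumption above holds."
]
},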
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stripping cell output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use nbconvert to convert from \"notebook\" format to \"notebook\" format. A [Jupyter Google Group post provided the use of ```--ClearOutputPreprocessor.enabled=True```](https://groups.google.com/forum/#!topic/jupyter/z6ODiJ6VUzI) to strip output from cells."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"[NbConvertApp] Converting notebook /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb to notebook\n",
"[NbConvertApp] Writing 6510 bytes to /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n"
]
}
],
"source": [
"%%bash\n",
"jupyter nbconvert \\\n",
"--to notebook \\\n",
"/gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb \\\n",
"--ClearOutputPreprocessor.enabled=True \\\n",
"--output /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Let's see if it worked by doing another line count on the notebook file"
]
},
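{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another quick check, beyond the line count, is to count how many ```\"output_type\"``` keys remain in the file, since every saved cell output has one. Here is a minimal sketch on a tiny made-up notebook (```tiny.ipynb``` is hypothetical):\n",
"\n",
"```bash\n",
"# Tiny stand-in notebook with one code cell and no saved outputs (hypothetical)\n",
"printf '%s\\n' '{\"cells\": [{\"cell_type\": \"code\", \"outputs\": [], \"source\": [\"date\"]}], \"nbformat\": 4}' > tiny.ipynb\n",
"# grep -c counts matching lines; it exits non-zero when the count is 0\n",
"n=$(grep -c '\"output_type\"' tiny.ipynb || true)\n",
"echo \"$n\"\n",
"```\n",
"\n",
"On the stripped notebook, the same ```grep -c``` should print 0 if ```--ClearOutputPreprocessor.enabled=True``` removed every output."
]
},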
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"295 /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb\n"
]
}
],
"source": [
"%%bash\n",
"wc -l /gitrepos/LabDocs/jupyter_nbs/sam/20161206_docker_BGI_genome_downloads.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Indeed it did! The line count dropped from 1197134 to 295, and ```nbconvert``` reported writing just 6510 bytes. I'll get that notebook (and this notebook) pushed to GitHub!"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 1
}