{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Get data & packages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data was downloaded from Globus (globus.org). It arrived as two zipped files, one per lane, each containing separate files for each sample (demultiplexed). \n", "\n", "I then checked that the md5 values were the same using `md5 [filename].tar.gz >> [filename].md5` (this appended the new value to the .md5 file, so I could compare). \n", "\n", "### A note on data accessibility & my working environment\n", "\n", "I originally downloaded data to Ostrich, thinking that I could work on that computer remotely using Remote Desktop and Jupyter Notebook. However, my internet connection is too slow to work productively that way. I therefore also downloaded the data to my external hard drive, and worked locally. This also allows me to use all the packages that I had installed on my personal computer in 2018 (which is helpful). \n", "\n", "Canonical versions of the data are saved to Owl/Nightingales in the zipped format here: [nightingales/O_lurida/2020-04-21_QuantSeq-data/](http://owl.fish.washington.edu/nightingales/O_lurida/2020-04-21_QuantSeq-data/) and as individual fastq/sample here [nightingales/O_lurida/](http://owl.fish.washington.edu/nightingales/O_lurida/), and to Gannet as individual fastq/sample here [Atumefaciens/20200426_olur_fastqc_quantseq/](https://gannet.fish.washington.edu/Atumefaciens/20200426_olur_fastqc_quantseq/). \n", "\n", "Sam also ran MultiQC on my samples; check out his [notebook entry](https://robertslab.github.io/sams-notebook/2020/04/26/FastQC-MultiQC-Laura-Spencer's-QuantSeq-Data.html), and the [MultiQC report](https://gannet.fish.washington.edu/Atumefaciens/20200426_olur_fastqc_quantseq/multiqc_report.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create path variables, i.e. 
shortcuts to certain directories \n", "\n", "To start, create some variables for commonly accessed paths. NOTE: many of the steps in this workflow require me to be located within a specific directory to access files. So, while I try to use these path variables, I often have to hard-code my paths. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# create path variable to raw data, saved on my external hard drive \n", "workingdir = \"/Volumes/Bumblebee/O.lurida_QuantSeq-2020/\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Volumes/Bumblebee/O.lurida_QuantSeq-2020\n" ] } ], "source": [ "cd {workingdir}" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# create path variable to fastqc directory \n", "fastqc = \"/Applications/bioinformatics/FastQC/\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FastQC v0.11.8\r\n" ] } ], "source": [ "# test fastqc \n", "! {fastqc}fastqc --version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### I installed MultiQC using git clone via the following: \n", "\n", " git clone https://github.com/ewels/MultiQC.git\n", " cd MultiQC\n", " pip install ." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "multiqc, version 1.9.dev0\r\n" ] } ], "source": [ "# test multiqc \n", "! multiqc --version" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install Cutadapt \n", "\n", "The [Cutadapt program](https://cutadapt.readthedocs.io/en/stable/installation.html) is used in the tagseq processing pipeline. I installed `cutadapt` using `python3 -m pip install --user --upgrade cutadapt`. 
During install, I received this warning: \n", "\n", " WARNING: The script cutadapt is installed in '/Users/laura/.local/bin' which is not on PATH.\n", " Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.\n", " \n", "So, I added that path to my PATH using the following in Terminal: \n", "`PATH=$PATH:/Users/laura/.local/bin`\n", "\n", "Jupyter Notebook still doesn't recognize `cutadapt`, even though I can access it via the Terminal. This is most likely because the `PATH` assignment above only applies to that one Terminal session, and because each `!` command in a notebook runs in its own temporary subshell, so a `PATH` change made with `!` does not persist to later cells. " ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [], "source": [ "# Try to add the path here \n", "! PATH=$PATH:/Users/laura/.local/bin" ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/bin/sh: cutadapt: command not found\r\n" ] } ], "source": [ "# Still doesn't work \n", "! cutadapt --version" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.10\r\n" ] } ], "source": [ "# Hard coding the cutadapt path works, though \n", "! /Users/laura/.local/bin/cutadapt --version" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "usage: fastq_quality_filter [-h] [-v] [-q N] [-p N] [-z] [-i INFILE] [-o OUTFILE]\r\n", "Part of FASTX Toolkit 0.0.14 by A. Gordon (assafgordon@gmail.com)\r\n", "\r\n", " [-h] = This helpful help screen.\r\n", " [-q N] = Minimum quality score to keep.\r\n", " [-p N] = Minimum percent of bases that must have [-q] quality.\r\n", " [-z] = Compress output with GZIP.\r\n", " [-i INFILE] = FASTA/Q input file. default is STDIN.\r\n", " [-o OUTFILE] = FASTA/Q output file. 
default is STDOUT.\r\n", " [-v] = Verbose - report number of sequences.\r\n", " If [-o] is specified, report will be printed to STDOUT.\r\n", " If [-o] is not specified (and output goes to STDOUT),\r\n", " report will be printed to STDERR.\r\n", "\r\n" ] } ], "source": [ "# test fastq_quality_filter (see if it's correctly added to my PATH)\n", "! fastq_quality_filter -h" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unpack raw data \n", "\n", "Currently, the data is still zipped (that's how it arrived via Globus). I therefore need to untar/gunzip the lane files before I can gunzip the individual library files. " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Volumes/Bumblebee/O.lurida_QuantSeq-2020/raw-data\n" ] } ], "source": [ "cd raw-data/" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mBatch1_69plex_lane1.md5\u001b[m\u001b[m* \u001b[31mkey\u001b[m\u001b[m*\r\n", "\u001b[31mBatch1_69plex_lane1.tar.gz\u001b[m\u001b[m* \u001b[31mquantseq2020_key.csv\u001b[m\u001b[m*\r\n", "\u001b[31mBatch2_77plex_lane2_md5\u001b[m\u001b[m* \u001b[31mquantseq2020_key.xlsx\u001b[m\u001b[m*\r\n", "\u001b[31mBatch2_77plex_lane2_tar.gz\u001b[m\u001b[m* \u001b[30m\u001b[43mtest-batch\u001b[m\u001b[m/\r\n" ] } ], "source": [ "ls" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a856ffbe2bf102b07ee78607946a463b Batch1_69plex_lane1.tar.gz\n", "MD5 (Batch1_69plex_lane1.tar.gz) = a856ffbe2bf102b07ee78607946a463b\n" ] } ], "source": [ "# check md5's for batch1.tar.gz compared to md5 that seq. facility sent \n", "! cat Batch1_69plex_lane1.md5\n", "! 
md5 Batch1_69plex_lane1.tar.gz" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "56decbd72a38b6076de6a421cfddf9c5 Batch2_77plex_lane2_tar.gz\n", "MD5 (Batch2_77plex_lane2_tar.gz) = 56decbd72a38b6076de6a421cfddf9c5\n" ] } ], "source": [ "# check md5's for batch2.tar.gz compared to md5 that seq. facility sent\n", "! cat Batch2_77plex_lane2_md5\n", "! md5 Batch2_77plex_lane2_tar.gz" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# extract batch/lane 1 data \n", "! gunzip -c Batch1_69plex_lane1.tar.gz | tar xopf -" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# extract batch/lane 2 data \n", "! gunzip -c Batch2_77plex_lane2_tar.gz | tar xopf -" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31mBatch1_69plex_lane1.md5\u001b[m\u001b[m \u001b[31mBatch2_77plex_lane2_tar.gz\u001b[m\u001b[m\r\n", "\u001b[31mBatch1_69plex_lane1.tar.gz\u001b[m\u001b[m \u001b[31mkey\u001b[m\u001b[m\r\n", "\u001b[30m\u001b[43mBatch1_69plex_lane1_done\u001b[m\u001b[m \u001b[31mquantseq2020_key.csv\u001b[m\u001b[m\r\n", "\u001b[30m\u001b[43mBatch2_77plex_lane2_done\u001b[m\u001b[m \u001b[31mquantseq2020_key.xlsx\u001b[m\u001b[m\r\n", "\u001b[31mBatch2_77plex_lane2_md5\u001b[m\u001b[m \u001b[30m\u001b[43mtest-batch\u001b[m\u001b[m\r\n" ] } ], "source": [ "# check out resulting file structure \n", "! 
ls" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31m137_S63_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m314_S49_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m139_S54_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m315_S26_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m140_S64_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m316_S9_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m141_S61_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m317_S33_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m156_S66_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m318_S6_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m159_S68_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m319_S52_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m161_S57_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m321_S29_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m162_S62_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m322_S8_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m168_S67_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m323_S39_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m169_S65_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m324_S47_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m171_S58_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m325_S13_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m172_S59_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m326_S38_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m181_S69_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m327_S37_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m183_S56_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m328_S12_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m184_S55_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m329_S46_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m185_S60_L001_R1_001.fastq.gz\u001b[m\u001b[m 
\u001b[31m331_S53_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m291_S42_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m332_S48_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m292_S32_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m333_S30_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m293_S11_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m334_S50_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m294_S7_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m335_S31_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m295_S41_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m336_S51_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m296_S40_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m337_S35_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m298_S25_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m338_S4_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m299_S21_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m339_S10_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m301_S22_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m341_S19_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m302_S15_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m342_S23_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m303_S17_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m343_S14_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m304_S24_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m344_S27_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m305_S1_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m345_S16_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m306_S44_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m346_S18_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m307_S45_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m347_S43_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m308_S5_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31m348_S3_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m309_S20_L001_R1_001.fastq.gz\u001b[m\u001b[m 
\u001b[31m349_S34_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n", "\u001b[31m311_S2_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[30m\u001b[43mReports\u001b[m\u001b[m\r\n", "\u001b[31m312_S28_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[30m\u001b[43mStats\u001b[m\u001b[m\r\n", "\u001b[31m313_S36_L001_R1_001.fastq.gz\u001b[m\u001b[m \u001b[31mUndetermined_S0_L001_R1_001.fastq.gz\u001b[m\u001b[m\r\n" ] } ], "source": [ "! ls Batch1_69plex_lane1_done/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Move to each batch's directory containing demultiplexed library files, and gunzip all fastq files in that folder" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Volumes/Bumblebee/O.lurida_QuantSeq-2020/raw-data/Batch1_69plex_lane1_done\n" ] } ], "source": [ "cd Batch1_69plex_lane1_done/" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "! gunzip *.fastq.gz" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Volumes/Bumblebee/O.lurida_QuantSeq-2020/raw-data/Batch2_77plex_lane2_done\n" ] } ], "source": [ "cd ../Batch2_77plex_lane2_done/" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "! 
gunzip *.fastq.gz " ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31m34_S68_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m482_S25_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m35_S72_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m483_S7_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m37_S70_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m484_S43_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m39_S52_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m485_S21_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m401_S10_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m487_S6_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m402_S5_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m488_S26_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m403_S30_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m489_S35_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m404_S42_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m490_S19_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m411_S9_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m491_S50_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m412_S74_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m492_S40_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m413_S38_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m506_S47_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m414_S49_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m513_S56_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m41_S62_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m521_S65_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m421_S22_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m522_S1_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m431b_S8_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m523_S4_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m432_S75_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m524_S11_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m434_S55_L002_R1_001.fastq\u001b[m\u001b[m 
\u001b[31m525_S18_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m43_S46_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m526_S15_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m441_S73_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m527_S39_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m442b_S60_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m528_S27_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m443_S36_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m529_S53_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m444_S34_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m531_S44_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m445_S45_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m532_S33_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m44_S71_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m533_S61_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m451_S28_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m541_S41_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m452b_S2_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m542_S3_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m453_S12_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m543_S29_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m45_S63_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m551_S24_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m461b_S31_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m552b_S54_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m462b_S64_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m553_S23_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m46_S66_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m554_S13_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m471b_S51_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m561_S69_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m472b_S48_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m562_S77_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m473_S20_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m563_S59_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m474_S14_L002_R1_001.fastq\u001b[m\u001b[m 
\u001b[31m564_S32_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m475_S16_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m565_S67_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m476_S17_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31m571_S76_L002_R1_001.fastq\u001b[m\u001b[m\r\n", "\u001b[31m477_S37_L002_R1_001.fastq\u001b[m\u001b[m \u001b[30m\u001b[43mReports\u001b[m\u001b[m\r\n", "\u001b[31m47_S58_L002_R1_001.fastq\u001b[m\u001b[m \u001b[30m\u001b[43mStats\u001b[m\u001b[m\r\n", "\u001b[31m481_S57_L002_R1_001.fastq\u001b[m\u001b[m \u001b[31mUndetermined_S0_L002_R1_001.fastq\u001b[m\u001b[m\r\n" ] } ], "source": [ "# Check out contents after gunzip \n", "! ls " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Initial QC " ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "! mkdir {workingdir}qc-processing/fastqc/" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "! mkdir {workingdir}qc-processing/fastqc/untrimmed/" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Started analysis of 506_S47_L002_R1_001.fastq\n", "Approx 5% complete for 506_S47_L002_R1_001.fastq\n", "Approx 10% complete for 506_S47_L002_R1_001.fastq\n", "Approx 15% complete for 506_S47_L002_R1_001.fastq\n", "Approx 20% complete for 506_S47_L002_R1_001.fastq\n", "Approx 25% complete for 506_S47_L002_R1_001.fastq\n", "Approx 30% complete for 506_S47_L002_R1_001.fastq\n", "Approx 35% complete for 506_S47_L002_R1_001.fastq\n", "Approx 40% complete for 506_S47_L002_R1_001.fastq\n", "Approx 45% complete for 506_S47_L002_R1_001.fastq\n", "Approx 50% complete for 506_S47_L002_R1_001.fastq\n", "Approx 55% complete for 506_S47_L002_R1_001.fastq\n", "Approx 60% complete for 506_S47_L002_R1_001.fastq\n", "Approx 65% complete for 506_S47_L002_R1_001.fastq\n", "Approx 70% complete for 506_S47_L002_R1_001.fastq\n", "Approx 75% 
complete for 506_S47_L002_R1_001.fastq\n", "Approx 80% complete for 506_S47_L002_R1_001.fastq\n", "Approx 85% complete for 506_S47_L002_R1_001.fastq\n", "Approx 90% complete for 506_S47_L002_R1_001.fastq\n", "Approx 95% complete for 506_S47_L002_R1_001.fastq\n", "Analysis complete for 506_S47_L002_R1_001.fastq\n" ] } ], "source": [ "# test fastqc on one sample file \n", "! {fastqc}fastqc \\\n", "506_S47_L002_R1_001.fastq \\\n", "--outdir ../fastqc/untrimmed/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run fastqc on all .fastq files in Batch/Lane2, the larval data (current directory) " ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "! {fastqc}fastqc \\\n", "*.fastq \\\n", "--outdir {workingdir}qc-processing/fastqc/untrimmed/ \\\n", "--quiet" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[31m34_S68_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m481_S57_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m34_S68_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m481_S57_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m35_S72_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m482_S25_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m35_S72_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m482_S25_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m37_S70_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m483_S7_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m37_S70_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m483_S7_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m39_S52_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m484_S43_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m39_S52_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m484_S43_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m401_S10_L002_R1_001_fastqc.html\u001b[m\u001b[m 
\u001b[31m485_S21_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m401_S10_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m485_S21_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m402_S5_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m487_S6_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m402_S5_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m487_S6_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m403_S30_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m488_S26_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m403_S30_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m488_S26_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m404_S42_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m489_S35_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m404_S42_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m489_S35_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m411_S9_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m490_S19_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m411_S9_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m490_S19_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m412_S74_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m491_S50_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m412_S74_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m491_S50_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m413_S38_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m492_S40_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m413_S38_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m492_S40_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m414_S49_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m506_S47_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m414_S49_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m506_S47_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m41_S62_L002_R1_001_fastqc.html\u001b[m\u001b[m 
\u001b[31m513_S56_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m41_S62_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m513_S56_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m421_S22_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m521_S65_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m421_S22_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m521_S65_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m431b_S8_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m522_S1_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m431b_S8_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m522_S1_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m432_S75_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m523_S4_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m432_S75_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m523_S4_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m434_S55_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m524_S11_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m434_S55_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m524_S11_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m43_S46_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m525_S18_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m43_S46_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m525_S18_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m441_S73_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m526_S15_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m441_S73_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m526_S15_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m442b_S60_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m527_S39_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m442b_S60_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m527_S39_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m443_S36_L002_R1_001_fastqc.html\u001b[m\u001b[m 
\u001b[31m528_S27_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m443_S36_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m528_S27_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m444_S34_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m529_S53_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m444_S34_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m529_S53_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m445_S45_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m531_S44_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m445_S45_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m531_S44_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m44_S71_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m532_S33_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m44_S71_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m532_S33_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m451_S28_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m533_S61_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m451_S28_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m533_S61_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m452b_S2_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m541_S41_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m452b_S2_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m541_S41_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m453_S12_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m542_S3_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m453_S12_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m542_S3_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m45_S63_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m543_S29_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m45_S63_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m543_S29_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m461b_S31_L002_R1_001_fastqc.html\u001b[m\u001b[m 
\u001b[31m551_S24_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m461b_S31_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m551_S24_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m462b_S64_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m552b_S54_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m462b_S64_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m552b_S54_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m46_S66_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m553_S23_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m46_S66_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m553_S23_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m471b_S51_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m554_S13_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m471b_S51_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m554_S13_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m472b_S48_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m561_S69_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m472b_S48_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m561_S69_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m473_S20_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m562_S77_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m473_S20_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m562_S77_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m474_S14_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m563_S59_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m474_S14_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m563_S59_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m475_S16_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m564_S32_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m475_S16_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m564_S32_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m476_S17_L002_R1_001_fastqc.html\u001b[m\u001b[m 
\u001b[31m565_S67_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m476_S17_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m565_S67_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m477_S37_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31m571_S76_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m477_S37_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31m571_S76_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n", "\u001b[31m47_S58_L002_R1_001_fastqc.html\u001b[m\u001b[m \u001b[31mUndetermined_S0_L002_R1_001_fastqc.html\u001b[m\u001b[m\r\n", "\u001b[31m47_S58_L002_R1_001_fastqc.zip\u001b[m\u001b[m \u001b[31mUndetermined_S0_L002_R1_001_fastqc.zip\u001b[m\u001b[m\r\n" ] } ], "source": [ "# check out resulting fastqc files. \n", "! ls {workingdir}qc-processing/fastqc/untrimmed/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run fastqc on all fastq files in Batch/Lane 1, the adult ctenidia+juvenile samples " ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Volumes/Bumblebee/O.lurida_QuantSeq-2020/raw-data/Batch1_69plex_lane1_done\n" ] } ], "source": [ "cd {workingdir}raw-data/Batch1_69plex_lane1_done/" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "! 
{fastqc}fastqc \\\n", "*.fastq \\\n", "--outdir {workingdir}qc-processing/fastqc/untrimmed/ \\\n", "--quiet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate a MultiQC report on all untrimmed data files " ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;30m[INFO ]\u001b[0m multiqc : This is MultiQC v1.9.dev0\n", "\u001b[1;30m[INFO ]\u001b[0m multiqc : Template : default\n", "\u001b[1;30m[INFO ]\u001b[0m multiqc : Searching : /Volumes/Bumblebee/O.lurida_QuantSeq-2020/qc-processing/fastqc/untrimmed\n", "\u001b[1;30m[INFO ]\u001b[0m fastqc : Found 148 reports\n", "\u001b[1;30m[INFO ]\u001b[0m multiqc : Compressing plot data\n", "\u001b[1;30m[INFO ]\u001b[0m multiqc : Report : ../../qc-processing/fastqc/untrimmed/multiqc_report_untrimmed.html\n", "\u001b[1;30m[INFO ]\u001b[0m multiqc : Data : ../../qc-processing/fastqc/untrimmed/multiqc_report_untrimmed_data\n", "\u001b[1;30m[INFO ]\u001b[0m multiqc : MultiQC complete\n" ] } ], "source": [ "! multiqc {workingdir}qc-processing/fastqc/untrimmed/ \\\n", "--filename {workingdir}qc-processing/fastqc/untrimmed/multiqc_report_untrimmed.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect MultiQC report" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "MultiQC Report\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", " \n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "