{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Use taxonomic read classifications from MEGAN6 Phylum-specific extractions to convert from FastA to FastQs - Arthropoda and Unassigned Reads" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Input data are here:\n", "\n", "FastAs: https://gannet.fish.washington.edu/Atumefaciens/20230726-mmag-read_extraction/\n", "\n", "Trimmed-FastQs: https://gannet.fish.washington.edu/Atumefaciens/20230301-mmag-trimmed_rnaseq_from_noaa/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## List computer specs" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TODAY'S DATE:\n", "Sun Jul 30 16:32:07 PDT 2023\n", "------------\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "No LSB modules are available.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Distributor ID:\tUbuntu\n", "Description:\tUbuntu 18.04.6 LTS\n", "Release:\t18.04\n", "Codename:\tbionic\n", "\n", "------------\n", "HOSTNAME: \n", "raven\n", "\n", "------------\n", "Computer Specs:\n", "\n", "Architecture: x86_64\n", "CPU op-mode(s): 32-bit, 64-bit\n", "Byte Order: Little Endian\n", "CPU(s): 48\n", "On-line CPU(s) list: 0-47\n", "Thread(s) per core: 2\n", "Core(s) per socket: 24\n", "Socket(s): 1\n", "NUMA node(s): 1\n", "Vendor ID: GenuineIntel\n", "CPU family: 6\n", "Model: 85\n", "Model name: Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz\n", "Stepping: 7\n", "CPU MHz: 1000.055\n", "CPU max MHz: 4000.0000\n", "CPU min MHz: 1000.0000\n", "BogoMIPS: 4400.00\n", "Virtualization: VT-x\n", "L1d cache: 32K\n", "L1i cache: 32K\n", "L2 cache: 1024K\n", "L3 cache: 36608K\n", "NUMA node0 CPU(s): 0-47\n", "Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts 
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities\n", "\n", "------------\n", "\n", "Memory Specs\n", "\n", " total used free shared buff/cache available\n", "Mem: 247G 2.4G 86G 952K 158G 242G\n", "Swap: 99G 179M 99G\n" ] } ], "source": [ "%%bash\n", "echo \"TODAY'S DATE:\"\n", "date\n", "echo \"------------\"\n", "echo \"\"\n", "#Display operating system info\n", "lsb_release -a\n", "echo \"\"\n", "echo \"------------\"\n", "echo \"HOSTNAME: \"; hostname \n", "echo \"\"\n", "echo \"------------\"\n", "echo \"Computer Specs:\"\n", "echo \"\"\n", "lscpu\n", "echo \"\"\n", "echo \"------------\"\n", "echo \"\"\n", "echo \"Memory Specs\"\n", "echo \"\"\n", "free -mh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set variables" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "env: RNAseq_dir=/home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq\n", "env: fasta_dir=/home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction\n", "env: wd=/home/shared/8TB_HDD_01/sam/analyses\n", "env: species=mmag\n", "env: phyla=arthropoda-and-unassigned\n", "env: seqtk=/home/shared/seqtk-1.4/seqtk\n" ] } ], "source": [ "# Set data directories\n", "%env 
RNAseq_dir=/home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq\n", "%env fasta_dir=/home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction\n", "%env wd=/home/shared/8TB_HDD_01/sam/analyses\n", "\n", "%env species=mmag\n", "%env phyla=arthropoda-and-unassigned\n", "\n", "# Programs\n", "%env seqtk=/home/shared/seqtk-1.4/seqtk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract phylum-specific reads from trimmed FastQ files\n", "\n", "\n", "Use the read IDs in the MEGAN6 taxonomic read extraction FastAs to pull the corresponding reads (Arthropoda and unassigned) from the trimmed FastQs. This is necessary because MEGAN6 truncates each read ID at the first space, which strips the paired-read designation. As a result, the MEGAN6 read extractions end up containing two reads (R1 and R2) with identical headers. It is unclear whether this would cause downstream issues (e.g. with Trinity) where paired-end data are used, so to play it safe, the truncated IDs are used to pull reads with complete sequence headers from the trimmed FastQs for use in subsequent data wrangling." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create list of FastA IDs" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Finished with FastA ID extraction.\n" ] } ], "source": [ "%%bash\n", "\n", "timestamp=$(date +%Y%m%d)\n", "\n", " \n", "# Make new directory and change to that directory (\"$_\" means use previous command's argument)\n", "mkdir --parents \"${wd}\"/\"${timestamp}\".\"${species}\"-megan_reads \\\n", "&& cd \"$_\" || exit\n", "\n", "# Set seqtk list filename\n", "seqtk_list=${timestamp}.${species}.seqtk.read_id.list\n", "\n", "######################################################\n", "# Create FastA IDs list to use for sequence extraction\n", "######################################################\n", "for fasta in \"${fasta_dir}\"/*.fasta\n", "do\n", " # Send status messages to stderr so they are not piped into the ID list\n", " echo \"Pulling FastA IDs from ${fasta}\" >&2\n", " echo \"\" >&2\n", " grep \">\" \"${fasta}\" | awk 'sub(/^>/, \"\")'\n", "done | sort -u >> \"${seqtk_list}\"\n", " \n", " \n", "echo \"\"\n", "echo \"Finished with FastA ID extraction.\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect FastA IDs file" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:10676\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:13683\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:14215\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:1532\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:17597\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:18380\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:18850\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:2002\n", "GWNJ-1012:589:GW2108041346th:3:1101:10004:2096\n" ] } ], "source": [ "%%bash\n", "timestamp=$(date +%Y%m%d)\n", "cd \"${wd}\"/\"${timestamp}\".\"${species}\"-megan_reads\n", "\n", "head 
\"${timestamp}.${species}.seqtk.read_id.list\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate MD5 checksums for input FastAs" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Recording MD5 checksum for /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/*.fasta.\n", "cf5c9f9f4ca98e49ad92eb57f4a0de0b /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH01-06.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "66df16af1a44a7491d053cb1995c5880 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH01-06.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "fd81ac48f2dfbb1e9baee926265660e6 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH01-14.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "12facf9917b418952bc732e5772d7f3a /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH01-14.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "124e669eae1bbc652134ce50a2e4d2f7 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH01-22.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "87aeab58b98dc88143ca3fbfc9dd063a /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH01-22.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "8f5cd5b04c9e5261463a0a9c5b32974e /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH01-38.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "8ab6a162f5cbc114d05e560500e51232 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH01-38.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "27fd7754c2ccfe956cb43759bdf863be /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH03-04.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "e64bc93b265e2f6a8e0ce2ec0cecde9a 
/home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH03-04.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "c3a15b025b50c75364a61aa22ef92d9d /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH03-15.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "8863c6a6187b4f1f34ea0e71e336ecc2 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH03-15.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "54c123113c083d9dd180ad6222421597 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH03-33.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "1dba4ed96ab54702cea1e489fa3a7de5 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH03-33.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "3150af6e6c5cd1aeeb35ff2d6f405509 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-01.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "d42008f8d4f639e24a3544f9a97c4dac /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-01.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "eae6beacf4d3d22b87972fa21556078c /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-06.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "6909396b60061064a67145a80629a75d /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-06.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "f8a99d6cb069ad3ad46028cfe7fb2250 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-07.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "ac1adb6163fb13f3819c4b739b96a905 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-07.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "1ecd8619aecae2d58b78a907f6327d93 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-09.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", 
"53cf5394bf9a62c8f5166ad8bb69b2ff /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-09.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "09a54301f35be158319d573204593ad8 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-14.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "81513f5fbe5355214b048fdd0f164fa7 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-14.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "f45a8fbffd7b11da5d759be094ff5344 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-21.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "d97de57f53b5dd68458321241b151b26 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-21.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "b1253087cf5a1fafcad44f82ced38d4c /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-29.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "eec7ff7b2c92d917e7988f0942abf0c3 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH05-29.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "f8c8b987cc670d22a50ac35c170e8ca1 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-04.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "8a2eace9f9ba3fd667dbdca122a5bb61 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-04.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "166cf4c77153cce2f833c1a72b3708b9 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-06.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "64fb965ee601684d31853ff893ba6848 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-06.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "0445b88359d8c69b34e7f3b4646c5a9f 
/home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-08.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "0fe5862a38b3bda04c7f4970af02aa3c /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-08.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "7e09213400e7236bae3e60e411cac0bf /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-11.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "20af48e7b983eb2a17573081cea677da /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-11.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "407e067299e23609e4ff353e81705e63 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-24.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "016cda1c262b3970d383acc1d3083545 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH07-24.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "8eb16a939fae131088d15ffcf71f0158 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH09-02.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "c17336b0b30c07bf3bda6c7c4ca95aa2 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH09-02.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "ed2167d64e530500146b972c901d5bc7 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH09-13.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "71a617b64cedfc0b5c8e8a6440fe5da7 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH09-13.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "ad131a6ce4aaf4ffbb414c203a960e04 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH09-28.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "03133926a760ad8f05652865c84261dc /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH09-28.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", 
"f8cc75f2083fe580ba4ff66adce7da48 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH10-08.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "d25cf1241dc19f592e5f202ed9a8a1bb /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH10-08.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "873ec46bc3dcd1e5bd4a5c75d64da9d4 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH10-11.trimmed.R1-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "3ee44831768928ce43b7188a06072199 /home/shared/8TB_HDD_01/sam/analyses/20230726-mmag-read_extraction/CH10-11.trimmed.R2-MEGAN_summarized_reads-arthropoda_NA.fasta\n", "\n", "\n", "Moving on to read extractions...\n", "\n" ] } ], "source": [ "%%bash\n", "timestamp=$(date +%Y%m%d)\n", "cd \"${wd}\"/\"${timestamp}\".\"${species}\"-megan_reads\n", "\n", "# Get checksums for FastAs\n", "for fasta in ${RNAseq_dir}/*.fasta\n", "do\n", " echo \"\"\n", " echo \"Recording MD5 checksum for ${fasta}.\"\n", " md5sum ${fasta_dir}/*.fasta | tee --append input-fasta-checksums.md5\n", " echo \"\"\n", "done\n", "\n", "echo \"\"\n", "echo \"Moving on to read extractions...\" \n", "echo \"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extract FastQs Using `seqtk`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH01-06.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH01-06.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH01-06.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH01-14.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH01-14.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH01-14.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from 
/home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH01-22.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH01-22.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH01-22.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH01-38.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH01-38.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH01-38.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH03-04.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH03-04.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH03-04.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH03-15.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH03-15.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH03-15.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH03-33.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH03-33.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH03-33.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-01.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH05-01.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-01.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-06.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH05-06.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-06.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-07.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH05-07.trimmed.megan_R1.fq\n", "\n", "\n", 
"Compressing 20230730.mmag.CH05-07.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-09.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH05-09.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-09.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-14.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH05-14.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-14.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-21.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH05-21.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-21.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-29.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH05-29.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-29.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-04.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH07-04.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-04.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-06.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH07-06.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-06.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-08.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH07-08.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-08.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from 
/home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-11.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH07-11.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-11.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-24.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH07-24.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-24.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH09-02.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH09-02.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH09-02.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH09-13.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH09-13.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH09-13.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH09-28.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH09-28.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH09-28.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH10-08.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH10-08.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH10-08.trimmed.megan_R1.fq.\n", "\n", "Extracting R1 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH10-11.trimmed.R1.fastq.gz.\n", "\n", "Writing R1 reads to 20230730.mmag.CH10-11.trimmed.megan_R1.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH10-11.trimmed.megan_R1.fq.\n", "\n", "\n", "Done with R1 read extractions\n", "-------------------------------------\n", "\n", "Extracting R2 reads from 
/home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH01-06.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH01-06.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH01-06.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH01-14.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH01-14.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH01-14.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH01-22.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH01-22.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH01-22.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH01-38.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH01-38.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH01-38.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH03-04.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH03-04.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH03-04.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH03-15.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH03-15.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH03-15.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH03-33.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH03-33.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH03-33.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-01.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH05-01.trimmed.megan_R2.fq\n", "\n", "\n", 
"Compressing 20230730.mmag.CH05-01.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-06.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH05-06.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-06.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-07.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH05-07.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-07.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-09.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH05-09.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-09.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-14.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH05-14.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-14.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-21.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH05-21.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-21.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH05-29.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH05-29.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH05-29.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-04.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH07-04.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-04.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from 
/home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-06.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH07-06.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-06.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-08.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH07-08.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-08.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-11.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH07-11.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-11.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH07-24.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH07-24.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH07-24.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH09-02.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH09-02.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH09-02.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH09-13.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH09-13.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH09-13.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH09-28.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH09-28.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH09-28.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH10-08.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH10-08.trimmed.megan_R2.fq\n", "\n", "\n", 
"Compressing 20230730.mmag.CH10-08.trimmed.megan_R2.fq.\n", "\n", "Extracting R2 reads from /home/shared/8TB_HDD_01/sam/data/M_magister/RNAseq/CH10-11.trimmed.R2.fastq.gz.\n", "\n", "Writing R2 reads to 20230730.mmag.CH10-11.trimmed.megan_R2.fq\n", "\n", "\n", "Compressing 20230730.mmag.CH10-11.trimmed.megan_R2.fq.\n", "\n", "-------------------------------------\n", "\n", "/home/shared/8TB_HDD_01/sam/analyses/20230730.mmag-megan_reads\n", "total 21G\n", "-rw-rw-r-- 1 sam sam 6.5G Jul 30 17:19 20230730.mmag.seqtk.read_id.list\n", "-rw-rw-r-- 1 sam sam 7.7K Jul 30 17:21 input-fasta-checksums.md5\n", "-rw-rw-r-- 1 sam sam 218M Jul 30 17:25 20230730.mmag.CH01-06.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 405M Jul 30 17:30 20230730.mmag.CH01-14.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 238M Jul 30 17:36 20230730.mmag.CH01-22.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 405M Jul 30 17:42 20230730.mmag.CH01-38.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 214M Jul 30 17:48 20230730.mmag.CH03-04.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 260M Jul 30 17:53 20230730.mmag.CH03-15.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 235M Jul 30 17:59 20230730.mmag.CH03-33.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 248M Jul 30 18:04 20230730.mmag.CH05-01.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 219M Jul 30 18:09 20230730.mmag.CH05-06.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 241M Jul 30 18:15 20230730.mmag.CH05-07.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 335M Jul 30 18:20 20230730.mmag.CH05-09.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 336M Jul 30 18:26 20230730.mmag.CH05-14.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 319M Jul 30 18:32 20230730.mmag.CH05-21.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 350M Jul 30 18:38 20230730.mmag.CH05-29.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 193M Jul 30 18:44 20230730.mmag.CH07-04.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 262M Jul 30 18:49 
20230730.mmag.CH07-06.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 503M Jul 30 18:55 20230730.mmag.CH07-08.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 349M Jul 30 19:02 20230730.mmag.CH07-11.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 378M Jul 30 19:09 20230730.mmag.CH07-24.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 215M Jul 30 19:15 20230730.mmag.CH09-02.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 263M Jul 30 19:20 20230730.mmag.CH09-13.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 191M Jul 30 19:25 20230730.mmag.CH09-28.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 305M Jul 30 19:30 20230730.mmag.CH10-08.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 259M Jul 30 19:36 20230730.mmag.CH10-11.trimmed.megan_R1.fq.gz\n", "-rw-rw-r-- 1 sam sam 224M Jul 30 19:41 20230730.mmag.CH01-06.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 412M Jul 30 19:47 20230730.mmag.CH01-14.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 242M Jul 30 19:53 20230730.mmag.CH01-22.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 410M Jul 30 19:59 20230730.mmag.CH01-38.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 218M Jul 30 20:05 20230730.mmag.CH03-04.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 265M Jul 30 20:10 20230730.mmag.CH03-15.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 235M Jul 30 20:16 20230730.mmag.CH03-33.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 254M Jul 30 20:21 20230730.mmag.CH05-01.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 223M Jul 30 20:27 20230730.mmag.CH05-06.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 243M Jul 30 20:32 20230730.mmag.CH05-07.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 339M Jul 30 20:38 20230730.mmag.CH05-09.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 341M Jul 30 20:44 20230730.mmag.CH05-14.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 324M Jul 30 20:51 20230730.mmag.CH05-21.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 360M Jul 30 20:57 
20230730.mmag.CH05-29.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 196M Jul 30 21:03 20230730.mmag.CH07-04.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 265M Jul 30 21:08 20230730.mmag.CH07-06.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 513M Jul 30 21:15 20230730.mmag.CH07-08.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 350M Jul 30 21:22 20230730.mmag.CH07-11.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 384M Jul 30 21:28 20230730.mmag.CH07-24.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 219M Jul 30 21:34 20230730.mmag.CH09-02.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 265M Jul 30 21:40 20230730.mmag.CH09-13.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 190M Jul 30 21:45 20230730.mmag.CH09-28.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 312M Jul 30 21:50 20230730.mmag.CH10-08.trimmed.megan_R2.fq.gz\n", "-rw-rw-r-- 1 sam sam 260M Jul 30 21:56 20230730.mmag.CH10-11.trimmed.megan_R2.fq.gz\n" ] } ], "source": [ "%%bash\n", "\n", "timestamp=$(date +%Y%m%d)\n", "\n", "# Set seqtk list filename\n", "seqtk_list=${timestamp}.${species}.seqtk.read_id.list\n", "\n", "# Set output FastQ filenames\n", "prefix=${timestamp}.${species}\n", "R1_suffix=megan_R1.fq\n", "R2_suffix=megan_R2.fq\n", "\n", "cd \"${wd}\"/\"${timestamp}\".\"${species}\"-megan_reads\n", "\n", "######################################################\n", "# Extract corresponding R1 and R2 reads using seqtk FastA ID list\n", "######################################################\n", "for fastq in \"${RNAseq_dir}\"/*R1*.gz\n", "do\n", " # Strip path from filename\n", " fastq_nopath=${fastq##*/}\n", " \n", " # Get sample ID from FastQ filename\n", " sample=$(echo \"${fastq_nopath}\" | awk -F \".\" '{print $1\".\"$2}')\n", " \n", " R1_out=\"${prefix}.${sample}.${R1_suffix}\"\n", " \n", " echo \"Extracting R1 reads from ${fastq}.\"\n", " echo \"\"\n", " echo \"Writing R1 reads to ${R1_out}\"\n", " echo \"\"\n", " \n", " # Use seqtk to pull out desired FastQ reads\n", " \t${seqtk} 
subseq \"${fastq}\" \"${seqtk_list}\" > \"${R1_out}\"\n", " \n", " # Gzip output file\n", " echo \"\"\n", " echo \"Compressing ${R1_out}.\"\n", " gzip \"${R1_out}\"\n", " echo \"\"\n", "done\n", " \n", "echo \"\"\n", "echo \"Done with R1 read extractions\"\n", "echo \"-------------------------------------\"\n", "echo \"\"\n", "\n", "for fastq in \"${RNAseq_dir}\"/*R2*.gz\n", "do\n", " # Strip path from filename\n", " fastq_nopath=${fastq##*/}\n", " \n", " # Get sample ID from FastQ filename\n", " sample=$(echo \"${fastq_nopath}\" | awk -F \".\" '{print $1\".\"$2}')\n", " \n", " # Set output filename \n", " R2_out=\"${prefix}.${sample}.${R2_suffix}\"\n", " \n", " \n", " echo \"Extracting R2 reads from ${fastq}.\"\n", " echo \"\"\n", " echo \"Writing R2 reads to ${R2_out}\"\n", " echo \"\"\n", " \n", " # Use seqtk to pull out desired FastQ reads\n", " \t${seqtk} subseq \"${fastq}\" \"${seqtk_list}\" > \"${R2_out}\"\n", " \n", " # Gzip output file\n", " echo \"\"\n", " echo \"Compressing ${R2_out}.\"\n", " gzip \"${R2_out}\"\n", " echo \"\"\n", "\n", "done\n", " \n", "echo \"-------------------------------------\"\n", "echo \"\"\n", " \n", "# Print working directory and list files\n", "pwd\n", "ls -ltrh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate checksums" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "e6047623dc885392b3ae816f094552b4 20230730.mmag.CH01-06.trimmed.megan_R1.fq.gz\n", "40b096dc26066a0d715d05e4bde36806 20230730.mmag.CH01-06.trimmed.megan_R2.fq.gz\n", "401d6206dcd62c8b682536eef19ee4cc 20230730.mmag.CH01-14.trimmed.megan_R1.fq.gz\n", "dcfe7a42861fa7406e1cc16522584045 20230730.mmag.CH01-14.trimmed.megan_R2.fq.gz\n", "e9c1b9a7d3afb9bdd4beaff2c6ddb0fa 20230730.mmag.CH01-22.trimmed.megan_R1.fq.gz\n", "7491813d90b50e08e3cefca8e1a1d7fc 20230730.mmag.CH01-22.trimmed.megan_R2.fq.gz\n", "99bf9577b46bf2dfca8b8ac042e179ed 
20230730.mmag.CH01-38.trimmed.megan_R1.fq.gz\n", "4be920ddd78862fe25a46e66235faad4 20230730.mmag.CH01-38.trimmed.megan_R2.fq.gz\n", "2f85cd2b10061b89011123d3bfa58d4f 20230730.mmag.CH03-04.trimmed.megan_R1.fq.gz\n", "d220a8d18d8a67bfba19410879464f43 20230730.mmag.CH03-04.trimmed.megan_R2.fq.gz\n", "a20b5ac49bdf3e17f18d0dbc1b60d6d2 20230730.mmag.CH03-15.trimmed.megan_R1.fq.gz\n", "7cf21ab8f759a50183d31584aac1d1e3 20230730.mmag.CH03-15.trimmed.megan_R2.fq.gz\n", "be8e50bf2c6747671ab4c2f2c3cbc5fd 20230730.mmag.CH03-33.trimmed.megan_R1.fq.gz\n", "a5edfe0babb198adb29d1a19784f678b 20230730.mmag.CH03-33.trimmed.megan_R2.fq.gz\n", "53ef62108920c3957954ac443418ca92 20230730.mmag.CH05-01.trimmed.megan_R1.fq.gz\n", "0c22a7ca75b5c5adb646794115f112c7 20230730.mmag.CH05-01.trimmed.megan_R2.fq.gz\n", "f5abf6a6f57473c7a8aed7b37472a4d2 20230730.mmag.CH05-06.trimmed.megan_R1.fq.gz\n", "3f17e210323b66673b87682e18c65756 20230730.mmag.CH05-06.trimmed.megan_R2.fq.gz\n", "3e3823bbf19ec39bca8d5544640a4211 20230730.mmag.CH05-07.trimmed.megan_R1.fq.gz\n", "58189bc49bee52fa10aca82274657ce3 20230730.mmag.CH05-07.trimmed.megan_R2.fq.gz\n", "c6900bf83ffbf843216fb67fce1c7df2 20230730.mmag.CH05-09.trimmed.megan_R1.fq.gz\n", "02f9adfa7438fb42b4dca917aa4763fb 20230730.mmag.CH05-09.trimmed.megan_R2.fq.gz\n", "fa31a570dc375a9fd2764b4723fad235 20230730.mmag.CH05-14.trimmed.megan_R1.fq.gz\n", "99e636f1a2ebdd870d6f822901ceed61 20230730.mmag.CH05-14.trimmed.megan_R2.fq.gz\n", "753d18733a2d9c041660d3b8c14c8f6e 20230730.mmag.CH05-21.trimmed.megan_R1.fq.gz\n", "f5d38ef3d32b2e11a23ca6a46098b8e6 20230730.mmag.CH05-21.trimmed.megan_R2.fq.gz\n", "383b33f41b1412d9dbf0dcc9457a305c 20230730.mmag.CH05-29.trimmed.megan_R1.fq.gz\n", "9a910e7748d1aa92c8035df26c1cc42b 20230730.mmag.CH05-29.trimmed.megan_R2.fq.gz\n", "370c7b651ff1cc7c1406123d831e7d64 20230730.mmag.CH07-04.trimmed.megan_R1.fq.gz\n", "1574c16f21475c91aa21ae9f81426768 20230730.mmag.CH07-04.trimmed.megan_R2.fq.gz\n", "9d44fc4c9f7a7c7df0125b6174613627 
20230730.mmag.CH07-06.trimmed.megan_R1.fq.gz\n", "ac41ebdd06208c728e1146839153b29e 20230730.mmag.CH07-06.trimmed.megan_R2.fq.gz\n", "d8f891faeead968fa5b4fdb6a9c26d21 20230730.mmag.CH07-08.trimmed.megan_R1.fq.gz\n", "98340e3ca008cf97fe240469b5c0f6f5 20230730.mmag.CH07-08.trimmed.megan_R2.fq.gz\n", "8ffcad6e64dd7d1b12c1739767425ec1 20230730.mmag.CH07-11.trimmed.megan_R1.fq.gz\n", "4b2673c0f02e48db0acbb847a4fb6dab 20230730.mmag.CH07-11.trimmed.megan_R2.fq.gz\n", "cdd6d93b54fd40b135f4545474d9f837 20230730.mmag.CH07-24.trimmed.megan_R1.fq.gz\n", "3229af42be57da29f13cad72f1f99523 20230730.mmag.CH07-24.trimmed.megan_R2.fq.gz\n", "4328a6e6194fb6417dcc9cb61d3f0a72 20230730.mmag.CH09-02.trimmed.megan_R1.fq.gz\n", "74c58ba1072a61ea3e75b83342589830 20230730.mmag.CH09-02.trimmed.megan_R2.fq.gz\n", "a748a887a31a327d90bb80550c4e2e40 20230730.mmag.CH09-13.trimmed.megan_R1.fq.gz\n", "7d81df73ba6dc5bbe3457bf973ede371 20230730.mmag.CH09-13.trimmed.megan_R2.fq.gz\n", "67d2237f163289bc2b962e52c21562cc 20230730.mmag.CH09-28.trimmed.megan_R1.fq.gz\n", "088a69aaf3371a0a6e16ab63e67146e2 20230730.mmag.CH09-28.trimmed.megan_R2.fq.gz\n", "e61fb563e027256c0d45144bb3c89b78 20230730.mmag.CH10-08.trimmed.megan_R1.fq.gz\n", "426403c8c8cb8275c84bdca3635b3186 20230730.mmag.CH10-08.trimmed.megan_R2.fq.gz\n", "8e5dc9bc9d058b23cefec0e4d03f2cde 20230730.mmag.CH10-11.trimmed.megan_R1.fq.gz\n", "e894988fd6467f8263f40e5946e3fe88 20230730.mmag.CH10-11.trimmed.megan_R2.fq.gz\n", "19d34e71bf5baf059f1b945e74c4cac5 20230730.mmag.seqtk.read_id.list\n", "0fc19544ed8bb60bbb1ce02213dc64f8 input-fasta-checksums.md5\n" ] } ], "source": [ "%%bash\n", "timestamp=$(date +%Y%m%d)\n", "\n", "cd \"${wd}\"/\"${timestamp}\".\"${species}\"-megan_reads\n", "\n", "# Record an MD5 checksum for every file in the output directory\n", "for file in *\n", "do\n", " md5sum \"${file}\" | tee --append checksums.md5\n", "done" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": 
"python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" } }, "nbformat": 4, "nbformat_minor": 4 }