{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Reference sequence mapping horserace\n", "\n", "ipyrad is capable of incorporating a reference sequence to aid in the assembly. There are actually 4 different assembly methods, 3 of which use reference sequence in some way. Here we test reference assisted assembly for ipyrad and stacks, as ddocent and aftrrad do not allow for . Though aftrRAD performs nicely on empirical data and does allow for reference assisted assembly, we consider runtimes to be prohibitive and so exclude it from analysis here.\n", "\n", "Ideas for datasets (all have data in SRA):\n", "\n", " Selection and sex-biased dispersal in a coastal shark: the influence of philopatry on adaptive variation\n", " - 134 individuals (paper assembled w/ ddocent)\n", " \n", " Genome-wide data reveal cryptic diversity and genetic introgression in an Oriental cynopterine fruit bat radiation\n", " - < 45 samples, 2 reference genomes in the same family\n", " \n", " Beyond the Coral Triangle: high genetic diversity and near panmixia in Singapore's populations of the broadcast spawning sea star Protoreaster nodosus\n", " - 77 samples, it's a passerine, so there must be something reasonably close\n", " \n", " PSMC (pairwise sequentially Markovian coalescent) analysis of RAD (restriction site associated DNA) sequencing data\n", " - 17 sticklebacks, they used stacks\n", " \n", "For this analysis we chose the paired-end ddRAD dataset from Lah et al 2016 (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0162792#sec002):\n", " Spatially Explicit Analysis of Genome-Wide SNPs Detects Subtle Population Structure in a Mobile Marine Mammal, the Harbor Porpoise\n", " - 49 samples from 3 populations of European Harbor Porpoise\n" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import subprocess\n", "import ipyrad as ip\n", "import shutil\n", "import glob\n", "import sys\n", "import os\n", "\n", "## Set the default directories for exec and data. \n", "WORK_DIR=\"/home/iovercast/manuscript-analysis/\"\n", "REFMAP_EMPIRICAL_DIR=os.path.join(WORK_DIR, \"Phocoena_empirical/\")\n", "REFMAP_FASTQS=os.path.join(REFMAP_EMPIRICAL_DIR, \"Final_Files_forDryad/Bbif_ddRADseq/fastq/\")\n", "IPYRAD_DIR=os.path.join(WORK_DIR, \"ipyrad/\")\n", "STACKS_DIR=os.path.join(WORK_DIR, \"stacks/\")\n", "\n", "for dir in [WORK_DIR, REFMAP_EMPIRICAL_DIR, IPYRAD_DIR, STACKS_DIR]:\n", " if not os.path.exists(dir):\n", " os.makedirs(dir)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Simulated reference sequence mapping\n", "To get some idea of how ipyrad and stacks perform with reference sequence mapping we'll\n", "first assemble a simulated dataset.\n", "\n", "Right now i'm just grabbing the simulated data from the ipyrad ipsimdata directory cuz it's quick and dirty\n", "but if you want to get crazy you can simulate new seqs by ripping code from here: https://github.com/dereneaton/ipyrad/blob/master/tests/cookbook-making-sim-data.ipynb" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make directories and fetch the simulated data\n", "The toy simulated data that lives in the ipyrad git repo consists of 1000 simulated loci, 500 of which are present\n", "in the simulated reference sequence." ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2016-12-31 14:12:22-- https://github.com/dereneaton/ipyrad/raw/master/tests/ipsimdata.tar.gz\n", "Resolving github.com (github.com)... 192.30.253.113, 192.30.253.112\n", "Connecting to github.com (github.com)|192.30.253.113|:443... connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://raw.githubusercontent.com/dereneaton/ipyrad/master/tests/ipsimdata.tar.gz [following]\n", "--2016-12-31 14:12:23-- https://raw.githubusercontent.com/dereneaton/ipyrad/master/tests/ipsimdata.tar.gz\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.20.133\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.20.133|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 12433112 (12M) [application/octet-stream]\n", "Saving to: ‘ipsimdata.tar.gz.1’\n", "\n", "100%[======================================>] 12,433,112 17.8MB/s in 0.7s \n", "\n", "2016-12-31 14:12:24 (17.8 MB/s) - ‘ipsimdata.tar.gz.1’ saved [12433112/12433112]\n", "\n" ] } ], "source": [ "REFMAP_SIM_DIR = os.path.join(WORK_DIR, \"REFMAP_SIM/\")\n", "REFMAP_DAT_DIR = os.path.join(REFMAP_SIM_DIR, \"ipsimdata/\")\n", "IPYRAD_SIM_DIR = os.path.join(REFMAP_SIM_DIR, \"ipyrad/\")\n", "STACKS_SIM_DIR = os.path.join(REFMAP_SIM_DIR, \"stacks/\")\n", "DDOCENT_SIM_DIR = os.path.join(REFMAP_SIM_DIR, \"ddocent/\")\n", "## REFMAP_DAT_DIR will be created when we untar ipsimdata.tar.gz\n", "for d in [REFMAP_SIM_DIR, IPYRAD_SIM_DIR, STACKS_SIM_DIR, DDOCENT_SIM_DIR]:\n", " if not os.path.exists(d):\n", " os.makedirs(d)\n", "os.chdir(REFMAP_SIM_DIR)\n", "\n", "!wget https://github.com/dereneaton/ipyrad/raw/master/tests/ipsimdata.tar.gz\n", "!tar -xzf ipsimdata.tar.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do ipyrad simulated reference mapping\n", "For the first analysis we'll use the `reference` assembly method which will just toss out\n", "all reads that don't map the the reference sequence. We should expect to recover 500 reads per\n", "sample.\n", "\n", "The toy data runs in ~2 minutes." ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " New Assembly: refmap-sim\n", "ipyrad -p params-refmap-sim.txt -s 1234567 -c 40\n", "\n", " -------------------------------------------------------------\n", " ipyrad [v.0.5.15]\n", " Interactive assembly and analysis of RAD-seq data\n", " -------------------------------------------------------------\n", " loading Assembly: refmap-sim\n", " from saved path: ~/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim.json\n", " local compute node: [40 cores] on node001\n", "\n", " Step 1: Demultiplexing fastq data to Samples\n", " Skipping: 12 Samples already found in Assembly refmap-sim.\n", " (can overwrite with force argument) \n", "\n", " Step 2: Filtering reads \n", " Skipping: All 12 selected Samples already edited.\n", " (can overwrite with force argument) \n", "\n", " Step 3: Clustering/Mapping reads\n", " Skipping: All 12 selected Samples already clustered.\n", " (can overwrite with force argument) \n", "\n", " Step 4: Joint estimation of error rate and heterozygosity\n", " Skipping: All 12 selected Samples already joint estimated\n", " (can overwrite with force argument) \n", "\n", " Step 5: Consensus base calling \n", " Skipping: All 12 selected Samples already consensus called\n", " (can overwrite with force argument) \n", "\n", " Step 6: Clustering at 0.85 similarity across 12 samples\n", " Skipping: All 12 selected Samples already clustered.\n", " (can overwrite with force argument) \n", "\n", " Step 7: Filter and write output files for 12 Samples\n", "ERROR:ipyrad.core.assembly:IPyradWarningExit: Output files already created for this Assembly in:\n", " /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_outfiles\n", " To overwrite, rerun using the force argument.\n", " \n", "\n", " Encountered an error, see ./ipyrad_log.txt. \n", " Output files already created for this Assembly in:\n", " /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_outfiles\n", " To overwrite, rerun using the force argument.\n", " \n", "\n", "\n", "real\t0m18.623s\n", "user\t0m1.899s\n", "sys\t0m0.552s\n" ] } ], "source": [ "os.chdir(IPYRAD_SIM_DIR)\n", "\n", "## Make a new assembly and set some assembly parameters\n", "data = ip.Assembly(\"refmap-sim\")\n", "data.set_params(\"raw_fastq_path\", REFMAP_DAT_DIR + \"pairddrad_wmerge_example_R*_.fastq.gz\")\n", "data.set_params(\"barcodes_path\", REFMAP_DAT_DIR + \"pairddrad_wmerge_example_barcodes.txt\")\n", "data.set_params(\"project_dir\", \"reference-assembly\")\n", "data.set_params(\"assembly_method\", \"reference\")\n", "data.set_params(\"reference_sequence\", REFMAP_DAT_DIR + \"pairddrad_wmerge_example_genome.fa\")\n", "data.set_params(\"datatype\", \"pairddrad\")\n", "data.set_params(\"restriction_overhang\", (\"TGCAG\", \"CGG\"))\n", "\n", "data.write_params(force=True)\n", "\n", "cmd = \"ipyrad -p params-refmap-sim.txt -s 1234567 -c 40\".format(dir)\n", "print(cmd)\n", "!time $cmd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do ipyrad denovo+reference\n", "Create a new branch and set the assembly method to `denovo+reference`. Now we will expect to recover\n", "1000 loci per sample (500 from the reference mapping and 500 from de novo).\n", "\n", "Again, the toy data runs in slighly less than 3 minutes." ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ipyrad -p params-denovo_plus_reference-sim.txt -s 1234567 -c 40\n", "\n", " -------------------------------------------------------------\n", " ipyrad [v.0.5.15]\n", " Interactive assembly and analysis of RAD-seq data\n", " -------------------------------------------------------------\n", " loading Assembly: denovo_plus_reference-sim\n", " from saved path: ~/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/denovo_plus_reference-sim.json\n", " New Assembly: denovo_plus_reference-sim\n", " local compute node: [40 cores] on node001\n", "\n", " Step 1: Demultiplexing fastq data to Samples\n", " [####################] 100% chunking large files | 0:00:00 \n", " [####################] 100% sorting reads | 0:00:09 \n", " [####################] 100% writing/compressing | 0:00:01 \n", "\n", " Step 2: Filtering reads \n", " [####################] 100% processing reads | 0:00:02 \n", "\n", " Step 3: Clustering/Mapping reads\n", " [####################] 100% dereplicating | 0:00:00 \n", " [####################] 100% mapping | 0:00:00 \n", " [####################] 100% clustering | 0:00:01 \n", " [####################] 100% building clusters | 0:00:00 \n", " [####################] 100% finalize mapping | 0:01:07 \n", " [####################] 100% chunking | 0:00:00 \n", " [####################] 100% aligning | 0:00:14 \n", " [####################] 100% concatenating | 0:00:00 \n", "\n", " Step 4: Joint estimation of error rate and heterozygosity\n", " [####################] 100% inferring [H, E] | 0:00:15 \n", "\n", " Step 5: Consensus base calling \n", " Mean error [0.00074 sd=0.00001]\n", " Mean hetero [0.00195 sd=0.00015]\n", " [####################] 100% calculating depths | 0:00:00 \n", " [####################] 100% chunking clusters | 0:00:00 \n", " [####################] 100% consens calling | 0:00:12 \n", "\n", " Step 6: Clustering at 0.85 similarity across 12 samples\n", " [####################] 100% concat/shuffle input | 0:00:00 \n", " [####################] 100% clustering across | 0:00:01 \n", " [####################] 100% building clusters | 0:00:00 \n", " [####################] 100% aligning clusters | 0:00:02 \n", " [####################] 100% database indels | 0:00:00 \n", " [####################] 100% indexing clusters | 0:00:02 \n", " [####################] 100% building database | 0:00:00 \n", "\n", " Step 7: Filter and write output files for 12 Samples\n", " [####################] 100% filtering loci | 0:00:06 \n", " [####################] 100% building loci/stats | 0:00:00 \n", " [####################] 100% building vcf file | 0:00:01 \n", " [####################] 100% writing vcf file | 0:00:00 \n", " [####################] 100% building arrays | 0:00:02 \n", " [####################] 100% writing outfiles | 0:00:01 \n", " Outfiles written to: ~/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/denovo_plus_reference-sim_outfiles\n", "\n", "\n", "real\t2m50.283s\n", "user\t0m22.885s\n", "sys\t0m1.653s\n" ] } ], "source": [ "data2 = data.branch(\"denovo_plus_reference-sim\")\n", "data2.set_params(\"assembly_method\", \"denovo+reference\")\n", "\n", "data2.write_params(force=True)\n", "\n", "cmd = \"ipyrad -p params-denovo_plus_reference-sim.txt -s 1234567 -c 40\".format(dir)\n", "print(cmd)\n", "!time $cmd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do ipyrad denovo-reference\n", "Create a new branch and set the assembly method to `denovo+reference`. Now we will expect to recover\n", "1000 loci per sample (500 from the reference mapping and 500 from de novo).\n", "\n", "Again, the toy data runs in slighly less than 2 minutes." ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ipyrad -p params-denovo_minus_reference-sim.txt -s 1234567 -c 40\n", "\n", " -------------------------------------------------------------\n", " ipyrad [v.0.5.15]\n", " Interactive assembly and analysis of RAD-seq data\n", " -------------------------------------------------------------\n", " loading Assembly: denovo_minus_reference-sim\n", " from saved path: ~/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/denovo_minus_reference-sim.json\n", " New Assembly: denovo_minus_reference-sim\n", " local compute node: [40 cores] on node001\n", "\n", " Step 1: Demultiplexing fastq data to Samples\n", " [####################] 100% chunking large files | 0:00:00 \n", " [####################] 100% sorting reads | 0:00:07 \n", " [####################] 100% writing/compressing | 0:00:01 \n", "\n", " Step 2: Filtering reads \n", " [####################] 100% processing reads | 0:00:02 \n", "\n", " Step 3: Clustering/Mapping reads\n", " [####################] 100% dereplicating | 0:00:00 \n", " [####################] 100% mapping | 0:00:01 \n", " [####################] 100% clustering | 0:00:00 \n", " [####################] 100% building clusters | 0:00:01 \n", " [####################] 100% chunking | 0:00:00 \n", " [####################] 100% aligning | 0:00:07 \n", " [####################] 100% concatenating | 0:00:00 \n", "\n", " Step 4: Joint estimation of error rate and heterozygosity\n", " [####################] 100% inferring [H, E] | 0:00:12 \n", "\n", " Step 5: Consensus base calling \n", " Mean error [0.00074 sd=0.00002]\n", " Mean hetero [0.00192 sd=0.00023]\n", " [####################] 100% calculating depths | 0:00:00 \n", " [####################] 100% chunking clusters | 0:00:00 \n", " [####################] 100% consens calling | 0:00:05 \n", "\n", " Step 6: Clustering at 0.85 similarity across 12 samples\n", " [####################] 100% concat/shuffle input | 0:00:00 \n", " [####################] 100% clustering across | 0:00:01 \n", " [####################] 100% building clusters | 0:00:00 \n", " [####################] 100% aligning clusters | 0:00:01 \n", " [####################] 100% database indels | 0:00:00 \n", " [####################] 100% indexing clusters | 0:00:01 \n", " [####################] 100% building database | 0:00:00 \n", "\n", " Step 7: Filter and write output files for 12 Samples\n", " [####################] 100% filtering loci | 0:00:06 \n", " [####################] 100% building loci/stats | 0:00:00 \n", " [####################] 100% building vcf file | 0:00:01 \n", " [####################] 100% writing vcf file | 0:00:00 \n", " [####################] 100% building arrays | 0:00:02 \n", " [####################] 100% writing outfiles | 0:00:00 \n", " Outfiles written to: ~/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/denovo_minus_reference-sim_outfiles\n", "\n", "\n", "real\t1m17.382s\n", "user\t0m14.547s\n", "sys\t0m1.512s\n" ] } ], "source": [ "data2 = data.branch(\"denovo_minus_reference-sim\")\n", "data2.set_params(\"assembly_method\", \"denovo-reference\")\n", "\n", "data2.write_params(force=True)\n", "\n", "cmd = \"ipyrad -p params-denovo_minus_reference-sim.txt -s 1234567 -c 40\".format(dir)\n", "print(cmd)\n", "!time $cmd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stacks simulated reference sequence assembly\n", "Stacks needs reference mapped sequences in .bam or .sam format. ~~Since we already did the mapping\n", "in ipyrad we'll just pluck the `*-mapped-sorted.bam` files out of the ipyrad \\_refmapping directory.~~ Nope! That would certainly be nice, but we `dereplicate` reads before we map. This is obviously important information\n", "(which ipyrad records and uses downstream), but stacks doesn't know about this, so you get bad quality\n", "mapping. I learned this the hard way. We need to map each sample by hand from the original ipyrad \\_edits files.\n", "\n", "On the toy simulated data stacks runs in an impressive 5 seconds." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### bwa mapping \n", "We are going to poach bwa and samtools from the dDocent install, since we already installed them there. ugh.\n", "\n", "This proceeds in 3 steps. First bwa maps to the reference (-t is # of cores, -v shuts down verbosity). Then samtools pulls out mapped and properly paired reads (-b outputs .bam format, -F 0x804 filters on mapping/pairing). Then it sorts the bam file, and cleans up the dangling sam file. Mapping is quick for the simulated data (<2 minutes)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40080 sequences (3827640 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10025, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.20, 14.47)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40080 reads in 2.988 CPU sec, 0.112 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/1A_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/1A_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.434 sec; CPU: 3.219 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 39964 sequences (3816562 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10017, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.28, 14.44)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 39964 reads in 2.968 CPU sec, 0.108 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/1B_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/1B_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.262 sec; CPU: 3.031 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40210 sequences (3840055 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10125, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.14, 14.41)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40210 reads in 3.001 CPU sec, 0.112 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/1C_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/1C_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.271 sec; CPU: 3.068 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40344 sequences (3852852 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10084, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.31, 14.37)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40344 reads in 3.024 CPU sec, 0.113 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/1D_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/1D_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.273 sec; CPU: 3.092 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40164 sequences (3835662 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10021, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 476)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (404, 524)\n", "[M::mem_pestat] mean and std.dev: (464.12, 14.42)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (380, 548)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40164 reads in 2.835 CPU sec, 0.114 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/2E_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/2E_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.269 sec; CPU: 2.900 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40164 sequences (3835662 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10003, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.13, 14.39)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40164 reads in 2.821 CPU sec, 0.111 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/2F_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/2F_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.265 sec; CPU: 2.884 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40190 sequences (3838145 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10068, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.24, 14.37)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40190 reads in 2.563 CPU sec, 0.106 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/2G_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/2G_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.260 sec; CPU: 2.626 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40010 sequences (3820955 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10056, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.25, 14.46)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40010 reads in 2.893 CPU sec, 0.109 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/2H_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/2H_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.264 sec; CPU: 2.958 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 39648 sequences (3786384 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10000, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.29, 14.39)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 39648 reads in 3.018 CPU sec, 0.108 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/3I_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/3I_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.264 sec; CPU: 3.084 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40200 sequences (3839100 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10060, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.19, 14.41)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40200 reads in 3.027 CPU sec, 0.133 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/3J_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/3J_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.287 sec; CPU: 3.090 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 40152 sequences (3834516 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 10123, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.27, 14.41)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 40152 reads in 3.084 CPU sec, 0.120 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/3K_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/3K_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.282 sec; CPU: 3.155 sec\n", "[M::bwa_idx_load_from_disk] read 0 ALT contigs\n", "[M::process] read 39864 sequences (3807012 bp)...\n", "[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 9964, 0, 0)\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (452, 464, 477)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (402, 527)\n", "[M::mem_pestat] mean and std.dev: (464.23, 14.40)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (377, 552)\n", "[M::mem_pestat] skip orientation RF as there are not enough pairs\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_process_seqs] Processed 39864 reads in 3.026 CPU sec, 0.122 real sec\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 /home/iovercast/manuscript-analysis/REFMAP_SIM/ipsimdata/pairddrad_wmerge_example_genome.fa /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/3L_0.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/refmap-sim_edits/3L_0.trimmed_R2_.fastq.gz\n", "[main] Real time: 0.286 sec; CPU: 3.100 sec\n" ] } ], "source": [ "IPYRAD_SIMEDITS_DIR = IPYRAD_SIM_DIR + \"reference-assembly/refmap-sim_edits/\"\n", "REF_SEQ = REFMAP_DAT_DIR + \"pairddrad_wmerge_example_genome.fa\"\n", "\n", "## Sim sample names\n", "pop1 = [\"1A_0\", \"1B_0\", \"1C_0\", \"1D_0\"]\n", "pop2 = [\"2E_0\", \"2F_0\", \"2G_0\", \"2H_0\"]\n", "pop3 = [\"3I_0\", \"3J_0\", \"3K_0\", \"3L_0\"]\n", "sim_sample_names = pop1 + pop2 + pop3\n", "\n", "for samp in sim_sample_names:\n", " R1 = IPYRAD_SIMEDITS_DIR + samp + \".trimmed_R1_.fastq.gz\"\n", " R2 = IPYRAD_SIMEDITS_DIR + samp + \".trimmed_R2_.fastq.gz\"\n", " samout = STACKS_SIM_DIR + samp + \".sam\"\n", " bamout = STACKS_SIM_DIR + samp + \".bam\"\n", " export_cmd = \"export PATH=~/manuscript-analysis/dDocent:$PATH\"\n", " bwa_cmd = \"bwa mem -t 40 -v 1 \" + REF_SEQ\\\n", " + \" \" + R1\\\n", " + \" \" + R2\\\n", " + \" > \" + samout\n", " samtools_cmd = \"samtools view -b -F 0x804 \" + samout\\\n", " + \" | samtools sort -T /tmp/{}.sam -O bam -o {}\".format(samp, bamout)\n", " cleanup_cmd = \"rm {}\".format(samout)\n", " cmd = \";\".join([export_cmd, bwa_cmd, samtools_cmd, cleanup_cmd])\n", " !$cmd" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Command to run - ref_map.pl -T 40 -b 1 -S -d -X 'populations:--vcf --genepop --structure --phylip ' -X 'populations:-m 6' -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/ -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1A_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1B_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1C_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1D_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2E_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2F_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2G_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2H_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3I_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3J_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3K_0.bam -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3L_0.bam \n" ] } ], "source": [ "## This is how we'd do it since we weren't using a popmap file\n", "infiles = [\"-s \"+ff+\" \" for ff in glob.glob(STACKS_SIM_DIR + \"*.bam\")]\n", "\n", "## Toggle the dryrun flag for testing\n", "DRYRUN=\"\"\n", "DRYRUN=\"-d\"\n", "\n", "## Options\n", "## -T The number of threads to use\n", "## -O The popmap file specifying individuals and populations\n", "## -S Disable database business\n", "## -o Output directory. Just write to the empirical stacks directory\n", "## -X Tell populations to create the output formats specified\n", "## -X and use `-m 6` which sets min depth per locus\n", "OUTPUT_FORMATS = \"--vcf --genepop --structure --phylip \"\n", "cmd = \"ref_map.pl -T 40 -b 1 -S \" + DRYRUN\\\n", " + \" -X \\'populations:\" + OUTPUT_FORMATS + \"\\'\"\\\n", " + \" -X \\'populations:-m 6\\'\"\\\n", " + \" -o \" + STACKS_SIM_DIR + \" \"\\\n", " + \" \".join(infiles)\n", "print(\"\\nCommand to run - {}\".format(cmd))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Identifying unique stacks; file 1 of 12 [1A_0]\n", "Identifying unique stacks; file 2 of 12 [1B_0]\n", "Identifying unique stacks; file 3 of 12 [1C_0]\n", "Identifying unique stacks; file 4 of 12 [1D_0]\n", "Identifying unique stacks; file 5 of 12 [2E_0]\n", "Identifying unique stacks; file 6 of 12 [2F_0]\n", "Identifying unique stacks; file 7 of 12 [2G_0]\n", "Identifying unique stacks; file 8 of 12 [2H_0]\n", "Identifying unique stacks; file 9 of 12 [3I_0]\n", "Identifying unique stacks; file 10 of 12 [3J_0]\n", "Identifying unique stacks; file 11 of 12 [3K_0]\n", "Identifying unique stacks; file 12 of 12 [3L_0]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Found 12 sample file(s).\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1A_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 1 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1B_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 2 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1C_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 3 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1D_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 4 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2E_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 5 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2F_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 6 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2G_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 7 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2H_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 8 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3I_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 9 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3J_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 10 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3K_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 11 -p 40 -m 1 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3L_0.bam -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -i 12 -p 40 -m 1 2>&1\n", "\n", "Depths of Coverage for Processed Samples:\n", "\n", "Generating catalog...\n", " /home/iovercast/manuscript-analysis/miniconda/bin/cstacks -g -b 1 -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1A_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1B_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1C_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1D_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2E_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2F_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2G_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2H_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3I_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3J_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3K_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3L_0 -p 40 2>&1\n", "Matching samples to the catalog...\n", " /home/iovercast/manuscript-analysis/miniconda/bin/sstacks -g -b 1 -c /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/batch_1 -o /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1A_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1B_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1C_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/1D_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2E_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2F_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2G_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/2H_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3I_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3J_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3K_0 -s /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks/3L_0 -p 40 2>&1\n", "Calculating population-level summary statistics\n", "/home/iovercast/manuscript-analysis/miniconda/bin/populations -b 1 -P /home/iovercast/manuscript-analysis/REFMAP_SIM/stacks -s -t 40 --vcf --genepop --structure --phylip -m 6 2>&1\n", "\n", "real\t0m0.040s\n", "user\t0m0.030s\n", "sys\t0m0.007s\n" ] } ], "source": [ "%%bash -s \"$WORK_DIR\" \"$STACKS_SIM_DIR\" \"$cmd\"\n", "export PATH=\"$1/miniconda/bin:$PATH\"; export \"STACKS_SIM_DIR=$2\"; export \"cmd=$3\"\n", "\n", "## We have to play a little cat and mouse game here because of quoting in some of the args\n", "## and how weird bash is we have to write the cmd to a file and then exec it.\n", "## If you try to just run $cmd it truncates the command at the first single tic. Hassle.\n", "cd $STACKS_SIM_DIR\n", "echo $cmd > stacks.sh; chmod 777 stacks.sh\n", "time ./stacks.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## dDocent simulated reference assembly\n", "dDocent reference sequence mapping is not clearly documented. It turns out you can skip the initial assembly and go right to mapping/snp calling, but it requires you to copy the reference sequence to the dDocent workind directory as \"reference.fasta\". This uses the trimmed and QC'd fastq files from the ipyrad simulated run, so it assumes you have already done that and the fq files are still in place." ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH; time dDocent /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent//sim-config.txt\n" ] } ], "source": [ "IPYRAD_SIMEDITS_DIR = IPYRAD_SIM_DIR + \"reference-assembly/refmap-sim_edits/\"\n", "REF_SEQ = REFMAP_DAT_DIR + \"pairddrad_wmerge_example_genome.fa\"\n", "DDOCENT_DIR = \"/home/iovercast/manuscript-analysis/dDocent/\"\n", "os.chdir(DDOCENT_SIM_DIR)\n", "\n", "## Create a simlink to the reference sequence in the current directory\n", "cmd = \"ln -sf {} reference.fasta\".format(REF_SEQ)\n", "!$cmd\n", "\n", "## Sim sample names\n", "pop1 = [\"1A_0\", \"1B_0\", \"1C_0\", \"1D_0\"]\n", "pop2 = [\"2E_0\", \"2F_0\", \"2G_0\", \"2H_0\"]\n", "pop3 = [\"3I_0\", \"3J_0\", \"3K_0\", \"3L_0\"]\n", "sim_sample_names = pop1 + pop2 + pop3\n", "sim_mapping_dict = {}\n", "\n", "for pop_num, samps in enumerate([pop1, pop2, pop3]):\n", " for samp_num, samp_name in enumerate(samps):\n", " sim_mapping_dict[samp_name] = \"Pop{}_{:03d}\".format(pop_num+1, samp_num+1)\n", "\n", "## Now we have to rename all the files in the way dDocent expects them:\n", "## 1A_0_R1_.fastq.gz -> Pop1_001.F.fq.gz\n", "for k, v in sim_mapping_dict.items():\n", " ## Symlink R1 and R2\n", " for i in [\"1\", \"2\"]:\n", " source = os.path.join(IPYRAD_SIMEDITS_DIR, k + \".trimmed_R{}_.fastq.gz\".format(i))\n", " ## This is the way the current documentation says to name imported trimmed\n", " ## files, but it doesn't work.\n", " ## dest = os.path.join(DDOCENT_SIM_DIR, v + \".R{}.fq.gz\".format(i))\n", " if i == \"1\":\n", " dest = os.path.join(DDOCENT_SIM_DIR, v + \".F.fq.gz\".format(i))\n", " else:\n", " dest = os.path.join(DDOCENT_SIM_DIR, v + \".R.fq.gz\".format(i))\n", " cmd = \"ln -sf {} {}\".format(source, dest)\n", " !$cmd\n", "\n", "config_file = \"{}/sim-config.txt\".format(DDOCENT_SIM_DIR)\n", "with open(config_file, 'w') as outfile:\n", " outfile.write('Number of Processors\\n40\\nMaximum Memory\\n0\\nTrimming\\nno\\nAssembly?\\nno\\nType_of_Assembly\\nPE\\nClustering_Similarity%\\n0.85\\nMapping_Reads?\\nyes\\nMapping_Match_Value\\n1\\nMapping_MisMatch_Value\\n3\\nMapping_GapOpen_Penalty\\n5\\nCalling_SNPs?\\nyes\\nEmail\\nwatdo@mailinator.com\\n')\n", "\n", "cmd = \"export LD_LIBRARY_PATH={}/freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; \".format(DDOCENT_DIR)\n", "cmd += \"export PATH={}:$PATH; time dDocent {}\".format(DDOCENT_DIR, config_file)\n", "print(cmd)\n", "with open(\"ddocent.sh\", 'w') as outfile:\n", " outfile.write(\"#!/bin/bash\\n\")\n", " outfile.write(cmd)\n", "!chmod 777 ddocent.sh" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Finalizing - /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent/TotalRawSNPs.vcf\n", "export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH; vcfallelicprimitives /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent/TotalRawSNPs.vcf > ddoc-tmp.vcf\n", "export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH; vcftools --vcf ddoc-tmp.vcf --remove-indels --recode --recode-INFO-all --out /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent/TotalRawSNPs.snps.vcf\n", "\n", "VCFtools - v0.1.11\n", "(C) Adam Auton 2009\n", "\n", "Parameters as interpreted:\n", "\t--vcf ddoc-tmp.vcf\n", "\t--recode-INFO-all\n", "\t--out /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent/TotalRawSNPs.snps.vcf\n", "\t--recode\n", "\t--remove-indels\n", "\n", "Index file is older than variant file. Will regenerate.\n", "Building new index file.\n", "\tScanning Chromosome: MT\n", "Writing Index file.\n", "File contains 4842 entries and 12 individuals.\n", "Applying Required Filters.\n", "Filtering sites by allele type\n", "After filtering, kept 12 out of 12 Individuals\n", "After filtering, kept 4840 out of a possible 4842 Sites\n", "Outputting VCF file... Done\n", "Run Time = 0.00 seconds\n", "Finalizing - /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent/Final.recode.vcf\n", "export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH; vcfallelicprimitives /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent/Final.recode.vcf > ddoc-tmp.vcf\n", "export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH; vcftools --vcf ddoc-tmp.vcf --remove-indels --recode --recode-INFO-all --out /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent/Final.recode.snps.vcf\n", "\n", "VCFtools - v0.1.11\n", "(C) Adam Auton 2009\n", "\n", "Parameters as interpreted:\n", "\t--vcf ddoc-tmp.vcf\n", "\t--recode-INFO-all\n", "\t--out /home/iovercast/manuscript-analysis/REFMAP_SIM/ddocent/Final.recode.snps.vcf\n", "\t--recode\n", "\t--remove-indels\n", "\n", "Index file is older than variant file. Will regenerate.\n", "Building new index file.\n", "\tScanning Chromosome: MT\n", "Writing Index file.\n", "File contains 4723 entries and 12 individuals.\n", "Applying Required Filters.\n", "Filtering sites by allele type\n", "After filtering, kept 12 out of 12 Individuals\n", "After filtering, kept 4722 out of a possible 4723 Sites\n", "Outputting VCF file... Done\n", "Run Time = 1.00 seconds\n" ] } ], "source": [ "## You have to post-process the vcf files to decompose complex genotypes and remove indels\n", "os.chdir(DDOCENT_SIM_DIR)\n", "exports = \"export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH\"\n", "\n", "fullvcf = os.path.join(DDOCENT_SIM_DIR, \"TotalRawSNPs.vcf\")\n", "filtvcf = os.path.join(DDOCENT_SIM_DIR, \"Final.recode.vcf\")\n", "for f in [fullvcf, filtvcf]:\n", " print(\"Finalizing - {}\".format(f))\n", " \n", " ## Rename the samples to make them agree with the ipyrad/stacks names so\n", " ## the results analysis will work.\n", " vcffile = os.path.join(DDOCENT_SIM_DIR, f)\n", " infile = open(vcffile,'r')\n", " filedata = infile.readlines()\n", " infile.close()\n", " \n", " outfile = open(vcffile,'w')\n", " for line in filedata:\n", " if \"CHROM\" in line:\n", " for ipname, ddname in sim_mapping_dict.items():\n", " line = line.replace(ddname, ipname)\n", " outfile.write(line)\n", " outfile.close()\n", " \n", " ## Naming the new outfiles as .snps.vcf\n", " ## Decompose complex genotypes and remove indels\n", " outfile = os.path.join(DDOCENT_SIM_DIR, f.split(\"/\")[-1].split(\".vcf\")[0] + \".snps.vcf\")\n", " cmd = \"{}; vcfallelicprimitives {} > ddoc-tmp.vcf\".format(exports, f)\n", " print(cmd)\n", " !$cmd\n", " cmd = \"{}; vcftools --vcf ddoc-tmp.vcf --remove-indels --recode --recode-INFO-all --out {}\".format(exports, outfile)\n", " print(cmd)\n", " !$cmd\n", " !rm ddoc-tmp.vcf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Empirical reference sequence mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetch the Phocoena raw sequence data \n", "We will use the sra-toolkit command `fastq-dump` to pull the PE reads out of SRA. \n", "This maybe isn't the best way, or the quickest, but it'll get the job done. Takes \n", "~30 minutes and requires ~70GB of space. After I downloaded the fq I looked at a\n", "couple random samples in FastQC to get an idea where to trim in step2." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Doing 662\tDoing 663\tDoing 664\tDoing 665\tDoing 666\tDoing 667\tDoing 668\tDoing 669\tDoing 670\tDoing 671\tDoing 672\tDoing 673\tDoing 674\tDoing 675\tDoing 676\tDoing 677\tDoing 678\tDoing 679\tDoing 680\tDoing 681\tDoing 682\tDoing 683\tDoing 684\tDoing 685\tDoing 686\tDoing 687\tDoing 688\tDoing 689\tDoing 690\tDoing 691\tDoing 692\tDoing 693\tDoing 694\tDoing 695\tDoing 696\tDoing 697\tDoing 698\tDoing 699\tDoing 700\tDoing 701\tDoing 702\t" ] } ], "source": [ "os.chdir(REFMAP_EMPIRICAL_DIR)\n", "!mkdir raws\n", "!cd raws\n", "## Grab the sra-toolkit pre-built binaries to download from SRA\n", "## This works, but commented for now so it doesn't keep redownloading\n", "!wget http://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.8.0/sratoolkit.2.8.0-ubuntu64.tar.gz\n", "!tar -xvzf sratoolkit*\n", "FQ_DUMP = os.path.join(REFMAP_EMPIRICAL_DIR, \"sratoolkit.2.8.0-ubuntu64/bin/fastq-dump\")\n", "res = subprocess.check_output(FQ_DUMP + \" -version\", shell=True)\n", "\n", "## The SRR numbers for the samples from this bioproject range from SRR4291662 to SRR4291705\n", "## so go fetch them one by one\n", "for samp in range(662, 706):\n", " print(\"Doing {}\\t\".format(samp)),\n", " res = subprocess.check_output(FQ_DUMP + \" --split-files SRR4291\" + str(samp), shell=True)\n" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [], "source": [ "## The SRA download files have wonky names, like SRR1234_R1.fastq.gz, but ipyrad expects SRR1234_R1_.fastq.gz,\n", "## so we have to fix the filenames. Filename hax...\n", "import glob\n", "for f in glob.glob(REFMAP_EMPIRICAL_DIR + \"raws/*.fastq.gz\"):\n", " splits = f.split(\"/\")[-1].split(\"_\")\n", " newf = REFMAP_EMPIRICAL_DIR + \"raws/\" + splits[0] + \"_R\" + splits[1].split(\".\")[0] + \"_.fastq.gz\"\n", " os.rename(f, newf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetch the bottlenose dolphin genome\n", "Tursiops truncatus reference genome. Divergence time between dolphin and porpoise is approximately 15Mya, which is on the order of divergence between humans and orang. There is also a genome for the Minke whale, which is much more deeply diverged (~30Mya), could be interesting to try both to see how it works.\n", "\n", "Minke whale - http://www.nature.com/ng/journal/v46/n1/full/ng.2835.html#accessions\n", "\n", "SRA Data table for converting fq files to sample name as used in the paper: file:///home/chronos/u-b20882bda0f801c7265d4462f127d0cb4376d46d/Downloads/Tune/SraRunTable.txt" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2016-12-18 10:29:57-- ftp://ftp.ensembl.org/pub/release-87/fasta/tursiops_truncatus/dna/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa.gz\n", " => ‘Tursiops_truncatus.turTru1.dna_rm.toplevel.fa.gz’\n", "Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.203.85\n", "Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.203.85|:21... connected.\n", "Logging in as anonymous ... Logged in!\n", "==> SYST ... done. ==> PWD ... done.\n", "==> TYPE I ... done. ==> CWD (1) /pub/release-87/fasta/tursiops_truncatus/dna ... done.\n", "==> SIZE Tursiops_truncatus.turTru1.dna_rm.toplevel.fa.gz ... 453168562\n", "==> PASV ... done. ==> RETR Tursiops_truncatus.turTru1.dna_rm.toplevel.fa.gz ... done.\n", "Length: 453168562 (432M) (unauthoritative)\n", "\n", "100%[======================================>] 453,168,562 2.88MB/s in 3m 33s \n", "\n", "2016-12-18 10:33:32 (2.03 MB/s) - ‘Tursiops_truncatus.turTru1.dna_rm.toplevel.fa.gz’ saved [453168562]\n", "\n" ] } ], "source": [ "os.chdir(REFMAP_EMPIRICAL_DIR)\n", "!mkdir TurtrunRef\n", "!cd TurtrunRef\n", "!wget ftp://ftp.ensembl.org/pub/release-87/fasta/tursiops_truncatus/dna/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa.gz\n", "\n", "## Ensembl distributes gzip'd reference sequence files, but samtools really wants it to be bgzipped or uncompressed\n", "!gunzip Tursiops_truncatus.turTru1.dna_rm.toplevel.fa.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Trim reads w/ cutadapt\n", "To reduce any potential bias introduced by differences in trimming and filtering methods we will trim reads w/ cutadapt, and QC w/ ipyrad and use this dataset as the starting point for assembly. If you are wondering, you __can__ run fastqc from the command line, like this\n", "\n", " fastqc -o fastqc_out/ SRR4291662_1.fastq SRR4291662_2.fastq\n", " \n", "We'll trim R1 and R2 to 85bp following Lah et al 2016. The `-l` flag for cutadapt specifies the length to which each read will be trimmed." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SRR4291662_R1_.fastq.gz\n", "SRR4291662_R2_.fastq.gz\n", "SRR4291663_R1_.fastq.gz\n", "SRR4291663_R2_.fastq.gz\n", "SRR4291664_R1_.fastq.gz\n", "SRR4291664_R2_.fastq.gz\n", "SRR4291665_R1_.fastq.gz\n", "SRR4291665_R2_.fastq.gz\n", "SRR4291666_R1_.fastq.gz\n", "SRR4291666_R2_.fastq.gz\n", "SRR4291667_R1_.fastq.gz\n", "SRR4291667_R2_.fastq.gz\n", "SRR4291668_R1_.fastq.gz\n", "SRR4291668_R2_.fastq.gz\n", "SRR4291669_R1_.fastq.gz\n", "SRR4291669_R2_.fastq.gz\n", "SRR4291670_R1_.fastq.gz\n", "SRR4291670_R2_.fastq.gz\n", "SRR4291671_R1_.fastq.gz\n", "SRR4291671_R2_.fastq.gz\n", "SRR4291672_R1_.fastq.gz\n", "SRR4291672_R2_.fastq.gz\n", "SRR4291673_R1_.fastq.gz\n", "SRR4291673_R2_.fastq.gz\n", "SRR4291674_R1_.fastq.gz\n", "SRR4291674_R2_.fastq.gz\n", "SRR4291675_R1_.fastq.gz\n", "SRR4291675_R2_.fastq.gz\n", "SRR4291676_R1_.fastq.gz\n", "SRR4291676_R2_.fastq.gz\n", "SRR4291677_R1_.fastq.gz\n", "SRR4291677_R2_.fastq.gz\n", "SRR4291678_R1_.fastq.gz\n", "SRR4291678_R2_.fastq.gz\n", "SRR4291679_R1_.fastq.gz\n", "SRR4291679_R2_.fastq.gz\n", "SRR4291680_R1_.fastq.gz\n", "SRR4291680_R2_.fastq.gz\n", "SRR4291681_R1_.fastq.gz\n", "SRR4291681_R2_.fastq.gz\n", "SRR4291682_R1_.fastq.gz\n", "SRR4291682_R2_.fastq.gz\n", "SRR4291683_R1_.fastq.gz\n", "SRR4291683_R2_.fastq.gz\n", "SRR4291684_R1_.fastq.gz\n", "SRR4291684_R2_.fastq.gz\n", "SRR4291685_R1_.fastq.gz\n", "SRR4291685_R2_.fastq.gz\n", "SRR4291686_R1_.fastq.gz\n", "SRR4291686_R2_.fastq.gz\n", "SRR4291687_R1_.fastq.gz\n", "SRR4291687_R2_.fastq.gz\n", "SRR4291688_R1_.fastq.gz\n", "SRR4291688_R2_.fastq.gz\n", "SRR4291689_R1_.fastq.gz\n", "SRR4291689_R2_.fastq.gz\n", "SRR4291690_R1_.fastq.gz\n", "SRR4291690_R2_.fastq.gz\n", "SRR4291691_R1_.fastq.gz\n", "SRR4291691_R2_.fastq.gz\n", "SRR4291692_R1_.fastq.gz\n", "SRR4291692_R2_.fastq.gz\n", "SRR4291693_R1_.fastq.gz\n", "SRR4291693_R2_.fastq.gz\n", "SRR4291694_R1_.fastq.gz\n", "SRR4291694_R2_.fastq.gz\n", "SRR4291695_R1_.fastq.gz\n", "SRR4291695_R2_.fastq.gz\n", "SRR4291696_R1_.fastq.gz\n", "SRR4291696_R2_.fastq.gz\n", "SRR4291697_R1_.fastq.gz\n", "SRR4291697_R2_.fastq.gz\n", "SRR4291698_R1_.fastq.gz\n", "SRR4291698_R2_.fastq.gz\n", "SRR4291699_R1_.fastq.gz\n", "SRR4291699_R2_.fastq.gz\n", "SRR4291700_R1_.fastq.gz\n", "SRR4291700_R2_.fastq.gz\n", "SRR4291701_R1_.fastq.gz\n", "SRR4291701_R2_.fastq.gz\n", "SRR4291702_R1_.fastq.gz\n", "SRR4291702_R2_.fastq.gz\n", "SRR4291703_R1_.fastq.gz\n", "SRR4291703_R2_.fastq.gz\n", "SRR4291704_R1_.fastq.gz\n", "SRR4291704_R2_.fastq.gz\n", "SRR4291705_R1_.fastq.gz\n", "SRR4291705_R2_.fastq.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "mkdir: cannot create directory ‘trimmed’: File exists\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291662_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.66 s (6 us/read; 10.37 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,225,657\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,225,657 (100.0%)\n", "\n", "Total basepairs processed: 302,535,710 bp\n", "Total written (filtered): 273,671,316 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291662_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.43 s (6 us/read; 10.50 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,225,657\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,225,657 (100.0%)\n", "\n", "Total basepairs processed: 321,680,483 bp\n", "Total written (filtered): 273,633,566 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291663_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 14.78 s (6 us/read; 10.56 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,602,278\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,602,278 (100.0%)\n", "\n", "Total basepairs processed: 241,637,058 bp\n", "Total written (filtered): 220,902,242 bp (91.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291663_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 14.96 s (6 us/read; 10.44 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,602,278\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,602,278 (100.0%)\n", "\n", "Total basepairs processed: 259,728,172 bp\n", "Total written (filtered): 220,886,902 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291664_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 7.47 s (6 us/read; 10.51 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,308,140\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,308,140 (100.0%)\n", "\n", "Total basepairs processed: 124,071,961 bp\n", "Total written (filtered): 111,051,112 bp (89.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291664_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 7.39 s (6 us/read; 10.62 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,308,140\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,308,140 (100.0%)\n", "\n", "Total basepairs processed: 130,566,117 bp\n", "Total written (filtered): 111,041,864 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291665_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 8.12 s (6 us/read; 10.51 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,422,513\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,422,513 (100.0%)\n", "\n", "Total basepairs processed: 132,119,445 bp\n", "Total written (filtered): 120,780,161 bp (91.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291665_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 8.14 s (6 us/read; 10.49 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,422,513\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,422,513 (100.0%)\n", "\n", "Total basepairs processed: 142,019,450 bp\n", "Total written (filtered): 120,774,460 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291666_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.56 s (6 us/read; 10.47 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,237,438\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,237,438 (100.0%)\n", "\n", "Total basepairs processed: 300,710,625 bp\n", "Total written (filtered): 274,900,243 bp (91.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291666_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.72 s (6 us/read; 10.38 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,237,438\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,237,438 (100.0%)\n", "\n", "Total basepairs processed: 323,231,561 bp\n", "Total written (filtered): 274,880,218 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291667_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 23.73 s (6 us/read; 10.54 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 4,170,153\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 4,170,153 (100.0%)\n", "\n", "Total basepairs processed: 390,976,823 bp\n", "Total written (filtered): 353,700,081 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291667_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 23.96 s (6 us/read; 10.44 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 4,170,153\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 4,170,153 (100.0%)\n", "\n", "Total basepairs processed: 415,710,634 bp\n", "Total written (filtered): 353,654,174 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291668_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 20.04 s (6 us/read; 10.33 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,450,882\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,450,882 (100.0%)\n", "\n", "Total basepairs processed: 310,304,790 bp\n", "Total written (filtered): 293,093,219 bp (94.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291668_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.70 s (6 us/read; 10.51 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,450,882\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,450,882 (100.0%)\n", "\n", "Total basepairs processed: 344,647,193 bp\n", "Total written (filtered): 293,069,094 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291669_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 21.57 s (6 us/read; 10.18 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,658,457\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,658,457 (100.0%)\n", "\n", "Total basepairs processed: 332,569,307 bp\n", "Total written (filtered): 310,678,470 bp (93.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291669_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 21.07 s (6 us/read; 10.42 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,658,457\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,658,457 (100.0%)\n", "\n", "Total basepairs processed: 365,337,470 bp\n", "Total written (filtered): 310,656,388 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291670_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 8.09 s (6 us/read; 10.59 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,428,040\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,428,040 (100.0%)\n", "\n", "Total basepairs processed: 135,277,065 bp\n", "Total written (filtered): 121,110,383 bp (89.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291670_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 8.11 s (6 us/read; 10.57 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,428,040\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,428,040 (100.0%)\n", "\n", "Total basepairs processed: 142,339,159 bp\n", "Total written (filtered): 121,090,656 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291671_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 21.11 s (6 us/read; 10.59 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,726,102\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,726,102 (100.0%)\n", "\n", "Total basepairs processed: 346,070,011 bp\n", "Total written (filtered): 316,369,233 bp (91.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291671_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 20.70 s (6 us/read; 10.80 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,726,102\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,726,102 (100.0%)\n", "\n", "Total basepairs processed: 371,995,572 bp\n", "Total written (filtered): 316,342,060 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291672_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 26.01 s (8 us/read; 7.51 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,256,943\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,256,943 (100.0%)\n", "\n", "Total basepairs processed: 305,656,861 bp\n", "Total written (filtered): 276,475,866 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291672_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 20.99 s (6 us/read; 9.31 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,256,943\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,256,943 (100.0%)\n", "\n", "Total basepairs processed: 325,037,311 bp\n", "Total written (filtered): 276,448,443 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291673_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 12.23 s (6 us/read; 10.44 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,128,431\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,128,431 (100.0%)\n", "\n", "Total basepairs processed: 199,673,694 bp\n", "Total written (filtered): 180,616,031 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291673_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 11.72 s (6 us/read; 10.90 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,128,431\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,128,431 (100.0%)\n", "\n", "Total basepairs processed: 212,332,918 bp\n", "Total written (filtered): 180,599,472 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291674_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 14.54 s (6 us/read; 10.76 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,606,704\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,606,704 (100.0%)\n", "\n", "Total basepairs processed: 234,287,729 bp\n", "Total written (filtered): 221,303,963 bp (94.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291674_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 14.52 s (6 us/read; 10.77 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,606,704\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,606,704 (100.0%)\n", "\n", "Total basepairs processed: 260,154,896 bp\n", "Total written (filtered): 221,272,346 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291675_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.16 s (6 us/read; 10.71 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,242,962\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,242,962 (100.0%)\n", "\n", "Total basepairs processed: 297,982,996 bp\n", "Total written (filtered): 275,355,483 bp (92.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291675_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.60 s (6 us/read; 10.46 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,242,962\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,242,962 (100.0%)\n", "\n", "Total basepairs processed: 323,769,609 bp\n", "Total written (filtered): 275,327,109 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291676_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 14.46 s (6 us/read; 10.23 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,466,111\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,466,111 (100.0%)\n", "\n", "Total basepairs processed: 221,799,201 bp\n", "Total written (filtered): 209,492,582 bp (94.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291676_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 14.33 s (6 us/read; 10.33 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,466,111\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,466,111 (100.0%)\n", "\n", "Total basepairs processed: 246,375,382 bp\n", "Total written (filtered): 209,484,079 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291677_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.64 s (6 us/read; 10.38 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,397,721\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,397,721 (100.0%)\n", "\n", "Total basepairs processed: 308,581,562 bp\n", "Total written (filtered): 288,301,569 bp (93.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291677_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.30 s (6 us/read; 10.56 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,397,721\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,397,721 (100.0%)\n", "\n", "Total basepairs processed: 338,887,177 bp\n", "Total written (filtered): 288,267,890 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291678_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.95 s (6 us/read; 9.65 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,048,039\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,048,039 (100.0%)\n", "\n", "Total basepairs processed: 276,868,465 bp\n", "Total written (filtered): 258,666,672 bp (93.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291678_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 17.05 s (6 us/read; 10.73 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,048,039\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,048,039 (100.0%)\n", "\n", "Total basepairs processed: 304,094,709 bp\n", "Total written (filtered): 258,647,431 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291679_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 23.03 s (6 us/read; 10.75 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 4,127,787\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 4,127,787 (100.0%)\n", "\n", "Total basepairs processed: 378,825,733 bp\n", "Total written (filtered): 350,115,785 bp (92.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291679_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 23.27 s (6 us/read; 10.64 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 4,127,787\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 4,127,787 (100.0%)\n", "\n", "Total basepairs processed: 411,474,061 bp\n", "Total written (filtered): 350,061,395 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291680_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 16.20 s (6 us/read; 10.81 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,917,555\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,917,555 (100.0%)\n", "\n", "Total basepairs processed: 265,159,848 bp\n", "Total written (filtered): 247,711,645 bp (93.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291680_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 21.15 s (7 us/read; 8.28 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,917,555\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,917,555 (100.0%)\n", "\n", "Total basepairs processed: 291,272,774 bp\n", "Total written (filtered): 247,692,864 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291681_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.07 s (6 us/read; 10.91 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,466,670\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,466,670 (100.0%)\n", "\n", "Total basepairs processed: 325,229,207 bp\n", "Total written (filtered): 294,186,403 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291681_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.76 s (6 us/read; 10.53 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,466,670\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,466,670 (100.0%)\n", "\n", "Total basepairs processed: 345,853,871 bp\n", "Total written (filtered): 294,161,220 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291682_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 20.14 s (6 us/read; 10.49 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,519,888\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,519,888 (100.0%)\n", "\n", "Total basepairs processed: 326,570,566 bp\n", "Total written (filtered): 298,588,290 bp (91.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291682_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.88 s (6 us/read; 10.62 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,519,888\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,519,888 (100.0%)\n", "\n", "Total basepairs processed: 350,929,362 bp\n", "Total written (filtered): 298,543,164 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291683_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 11.83 s (6 us/read; 10.47 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,064,559\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,064,559 (100.0%)\n", "\n", "Total basepairs processed: 187,046,314 bp\n", "Total written (filtered): 174,797,570 bp (93.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291683_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 11.70 s (6 us/read; 10.59 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,064,559\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,064,559 (100.0%)\n", "\n", "Total basepairs processed: 205,296,143 bp\n", "Total written (filtered): 174,759,321 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291684_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 17.51 s (6 us/read; 10.58 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,086,468\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,086,468 (100.0%)\n", "\n", "Total basepairs processed: 289,461,086 bp\n", "Total written (filtered): 261,848,374 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291684_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 17.64 s (6 us/read; 10.50 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,086,468\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,086,468 (100.0%)\n", "\n", "Total basepairs processed: 307,783,378 bp\n", "Total written (filtered): 261,819,338 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291685_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 13.52 s (6 us/read; 10.86 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,446,083\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,446,083 (100.0%)\n", "\n", "Total basepairs processed: 229,320,135 bp\n", "Total written (filtered): 207,457,048 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291685_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 13.74 s (6 us/read; 10.68 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,446,083\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,446,083 (100.0%)\n", "\n", "Total basepairs processed: 243,821,853 bp\n", "Total written (filtered): 207,429,219 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291686_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.07 s (6 us/read; 10.63 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,201,430\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,201,430 (100.0%)\n", "\n", "Total basepairs processed: 287,489,343 bp\n", "Total written (filtered): 271,572,152 bp (94.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291686_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 24.01 s (7 us/read; 8.00 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,201,430\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,201,430 (100.0%)\n", "\n", "Total basepairs processed: 319,198,511 bp\n", "Total written (filtered): 271,529,648 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291687_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.34 s (6 us/read; 10.62 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,245,019\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,245,019 (100.0%)\n", "\n", "Total basepairs processed: 291,561,672 bp\n", "Total written (filtered): 275,408,840 bp (94.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291687_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 17.94 s (6 us/read; 10.85 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,245,019\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,245,019 (100.0%)\n", "\n", "Total basepairs processed: 323,754,288 bp\n", "Total written (filtered): 275,370,853 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291688_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 8.72 s (6 us/read; 10.76 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,564,117\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,564,117 (100.0%)\n", "\n", "Total basepairs processed: 146,438,492 bp\n", "Total written (filtered): 132,507,950 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291688_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 8.69 s (6 us/read; 10.80 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,564,117\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,564,117 (100.0%)\n", "\n", "Total basepairs processed: 155,657,876 bp\n", "Total written (filtered): 132,487,616 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291689_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.13 s (6 us/read; 10.53 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,181,938\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,181,938 (100.0%)\n", "\n", "Total basepairs processed: 298,646,806 bp\n", "Total written (filtered): 270,124,604 bp (90.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291689_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 17.88 s (6 us/read; 10.68 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,181,938\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,181,938 (100.0%)\n", "\n", "Total basepairs processed: 317,599,989 bp\n", "Total written (filtered): 270,100,955 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291690_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 15.60 s (6 us/read; 10.29 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,674,772\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,674,772 (100.0%)\n", "\n", "Total basepairs processed: 253,734,600 bp\n", "Total written (filtered): 227,093,686 bp (89.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291690_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 15.38 s (6 us/read; 10.43 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,674,772\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,674,772 (100.0%)\n", "\n", "Total basepairs processed: 267,026,097 bp\n", "Total written (filtered): 227,081,425 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291691_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 35.33 s (6 us/read; 10.02 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 5,903,027\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 5,903,027 (100.0%)\n", "\n", "Total basepairs processed: 548,419,385 bp\n", "Total written (filtered): 501,329,616 bp (91.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291691_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 34.24 s (6 us/read; 10.34 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 5,903,027\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 5,903,027 (100.0%)\n", "\n", "Total basepairs processed: 589,507,289 bp\n", "Total written (filtered): 501,290,530 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291692_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 6.55 s (6 us/read; 10.74 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,172,868\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,172,868 (100.0%)\n", "\n", "Total basepairs processed: 108,479,568 bp\n", "Total written (filtered): 99,222,769 bp (91.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291692_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 6.43 s (5 us/read; 10.94 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,172,868\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,172,868 (100.0%)\n", "\n", "Total basepairs processed: 116,504,462 bp\n", "Total written (filtered): 99,193,009 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291693_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 12.27 s (6 us/read; 10.63 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,174,717\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,174,717 (100.0%)\n", "\n", "Total basepairs processed: 199,675,649 bp\n", "Total written (filtered): 184,528,700 bp (92.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291693_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 12.04 s (6 us/read; 10.84 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,174,717\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,174,717 (100.0%)\n", "\n", "Total basepairs processed: 216,926,920 bp\n", "Total written (filtered): 184,508,611 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291694_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 20.30 s (6 us/read; 10.74 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,632,242\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,632,242 (100.0%)\n", "\n", "Total basepairs processed: 340,683,039 bp\n", "Total written (filtered): 308,182,877 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291694_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 20.48 s (6 us/read; 10.64 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,632,242\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,632,242 (100.0%)\n", "\n", "Total basepairs processed: 362,256,218 bp\n", "Total written (filtered): 308,150,487 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291695_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 10.68 s (6 us/read; 10.43 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,856,718\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,856,718 (100.0%)\n", "\n", "Total basepairs processed: 173,941,660 bp\n", "Total written (filtered): 157,377,022 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291695_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 10.78 s (6 us/read; 10.33 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,856,718\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,856,718 (100.0%)\n", "\n", "Total basepairs processed: 184,905,679 bp\n", "Total written (filtered): 157,341,499 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291696_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.31 s (6 us/read; 10.39 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,345,439\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,345,439 (100.0%)\n", "\n", "Total basepairs processed: 307,309,204 bp\n", "Total written (filtered): 283,984,631 bp (92.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291696_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.43 s (6 us/read; 10.89 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,345,439\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,345,439 (100.0%)\n", "\n", "Total basepairs processed: 333,884,503 bp\n", "Total written (filtered): 283,956,832 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291697_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.52 s (6 us/read; 10.40 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,210,823\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,210,823 (100.0%)\n", "\n", "Total basepairs processed: 294,875,650 bp\n", "Total written (filtered): 272,500,412 bp (92.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291697_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.37 s (6 us/read; 10.49 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,210,823\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,210,823 (100.0%)\n", "\n", "Total basepairs processed: 320,361,719 bp\n", "Total written (filtered): 272,471,463 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291698_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 10.47 s (6 us/read; 10.26 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,790,978\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,790,978 (100.0%)\n", "\n", "Total basepairs processed: 167,594,570 bp\n", "Total written (filtered): 151,658,671 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291698_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 10.25 s (6 us/read; 10.48 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 1,790,978\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 1,790,978 (100.0%)\n", "\n", "Total basepairs processed: 178,125,550 bp\n", "Total written (filtered): 151,625,435 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291699_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 15.90 s (6 us/read; 10.63 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,816,420\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,816,420 (100.0%)\n", "\n", "Total basepairs processed: 261,543,984 bp\n", "Total written (filtered): 239,104,344 bp (91.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291699_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 16.04 s (6 us/read; 10.54 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,816,420\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,816,420 (100.0%)\n", "\n", "Total basepairs processed: 281,129,007 bp\n", "Total written (filtered): 239,083,585 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291700_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 15.41 s (6 us/read; 10.39 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,668,256\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,668,256 (100.0%)\n", "\n", "Total basepairs processed: 249,952,844 bp\n", "Total written (filtered): 226,146,252 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291700_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 15.27 s (6 us/read; 10.48 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 2,668,256\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 2,668,256 (100.0%)\n", "\n", "Total basepairs processed: 265,721,099 bp\n", "Total written (filtered): 226,097,178 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291701_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 20.10 s (6 us/read; 10.37 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,474,962\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,474,962 (100.0%)\n", "\n", "Total basepairs processed: 315,839,017 bp\n", "Total written (filtered): 295,059,339 bp (93.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291701_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.76 s (6 us/read; 10.55 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,474,962\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,474,962 (100.0%)\n", "\n", "Total basepairs processed: 346,913,613 bp\n", "Total written (filtered): 295,029,825 bp (85.0%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291702_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 28.11 s (6 us/read; 10.50 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 4,918,786\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 4,918,786 (100.0%)\n", "\n", "Total basepairs processed: 460,455,562 bp\n", "Total written (filtered): 416,655,206 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291702_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 28.02 s (6 us/read; 10.53 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 4,918,786\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 4,918,786 (100.0%)\n", "\n", "Total basepairs processed: 489,428,555 bp\n", "Total written (filtered): 416,577,729 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291703_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.27 s (6 us/read; 10.46 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,186,044\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,186,044 (100.0%)\n", "\n", "Total basepairs processed: 295,683,146 bp\n", "Total written (filtered): 270,343,584 bp (91.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291703_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.17 s (6 us/read; 10.52 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,186,044\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,186,044 (100.0%)\n", "\n", "Total basepairs processed: 317,754,200 bp\n", "Total written (filtered): 270,312,770 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291704_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.37 s (6 us/read; 10.44 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,197,026\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,197,026 (100.0%)\n", "\n", "Total basepairs processed: 299,931,145 bp\n", "Total written (filtered): 271,300,448 bp (90.5%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291704_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 18.23 s (6 us/read; 10.52 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,197,026\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,197,026 (100.0%)\n", "\n", "Total basepairs processed: 318,941,014 bp\n", "Total written (filtered): 271,271,223 bp (85.1%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291705_R1_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.39 s (6 us/read; 10.60 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,424,403\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,424,403 (100.0%)\n", "\n", "Total basepairs processed: 314,439,868 bp\n", "Total written (filtered): 290,585,202 bp (92.4%)\n", "\n", "This is cutadapt 1.12 with Python 2.7.12\n", "Command line parameters: -l 85 raws/SRR4291705_R2_.fastq.gz\n", "Trimming 0 adapters with at most 10.0% errors in single-end mode ...\n", "Finished in 19.99 s (6 us/read; 10.28 M reads/minute).\n", "\n", "=== Summary ===\n", "\n", "Total reads processed: 3,424,403\n", "Reads with adapters: 0 (0.0%)\n", "Reads written (passing filters): 3,424,403 (100.0%)\n", "\n", "Total basepairs processed: 341,609,530 bp\n", "Total written (filtered): 290,553,612 bp (85.1%)\n", "\n" ] } ], "source": [ "%%bash -s \"$REFMAP_EMPIRICAL_DIR\"\n", "cd $1\n", "mkdir trimmed\n", "for i in `ls raws`; do echo $i; cutadapt -l 85 raws/$i | gzip > trimmed/$i; done\n", "## Housekeeping\n", "rm -rf raws\n", "mv trimmed raws" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import reads into ipyrad and do QC\n", "Now take the trimmed reads and filter for adapters, minimum sequence length, and max low quality bases." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "IPYRAD_REFMAP_DIR = os.path.join(REFMAP_EMPIRICAL_DIR, \"ipyrad/\")\n", "if not os.path.exists(IPYRAD_REFMAP_DIR):\n", " os.makedirs(IPYRAD_REFMAP_DIR)\n", "os.chdir(IPYRAD_REFMAP_DIR)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " New Assembly: refmap-empirical\n", "ipyrad -p params-refmap-empirical.txt -s 1 --force\n", "\n", " -------------------------------------------------------------\n", " ipyrad [v.0.5.15]\n", " Interactive assembly and analysis of RAD-seq data\n", " -------------------------------------------------------------\n", " New Assembly: refmap-empirical\n", " local compute node: [40 cores] on node001\n", "\n", " Step 1: Loading sorted fastq data to Samples\n", " [####################] 100% loading reads | 0:00:21 \n", " 88 fastq files loaded to 44 Samples.\n", "\n", "\n", "real\t0m42.604s\n", "user\t0m2.510s\n", "sys\t0m0.791s\n", "ipyrad -p params-refmap-empirical.txt -s 2 --force\n", "\n", " -------------------------------------------------------------\n", " ipyrad [v.0.5.15]\n", " Interactive assembly and analysis of RAD-seq data\n", " -------------------------------------------------------------\n", " loading Assembly: refmap-empirical\n", " from saved path: ~/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical.json\n", " local compute node: [40 cores] on node001\n", "\n", " Step 2: Filtering reads \n", " [####################] 100% processing reads | 0:13:44 \n", "\n", "\n", "real\t13m57.667s\n", "user\t0m10.699s\n", "sys\t0m0.653s\n" ] } ], "source": [ "## Make a new assembly and set some assembly parameters\n", "data = ip.Assembly(\"refmap-empirical\")\n", "data.set_params(\"sorted_fastq_path\", REFMAP_EMPIRICAL_DIR + \"raws/*.fastq.gz\")\n", "data.set_params(\"project_dir\", \"reference-assembly\")\n", "data.set_params(\"assembly_method\", \"reference\")\n", "data.set_params(\"reference_sequence\", REFMAP_EMPIRICAL_DIR + \"TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa\")\n", "data.set_params(\"datatype\", \"pairddrad\")\n", "data.set_params(\"restriction_overhang\", (\"TGCAG\", \"CGG\"))\n", "data.set_params('max_low_qual_bases', 5)\n", "data.set_params('filter_adapters', 2)\n", "\n", "data.write_params(force=True)\n", "\n", "cmd = \"ipyrad -p params-refmap-empirical.txt -s 1 --force\".format(dir)\n", "print(cmd)\n", "!time $cmd\n", "cmd = \"ipyrad -p params-refmap-empirical.txt -s 2 --force\".format(dir)\n", "print(cmd)\n", "!time $cmd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do ipyrad refmap empirical\n", "Here we will use the ipyrad 'reference' assembly method which will only use reads that successfully map to the reference sequence. This is similar and should be comparable to the way stacks incorporates reference sequences.\n", "\n", "Steps 1 & 2 run in ~15 minutes." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ipyrad -p params-refmap-empirical.txt -s 3\n", "\n", " -------------------------------------------------------------\n", " ipyrad [v.0.5.15]\n", " Interactive assembly and analysis of RAD-seq data\n", " -------------------------------------------------------------\n", " loading Assembly: refmap-empirical\n", " from saved path: ~/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical.json\n", " local compute node: [40 cores] on node001\n", "\n", " Step 3: Clustering/Mapping reads\n", " [####################] 100% dereplicating | 0:10:16 \n", " [######### ] 45% mapping | 0:24:11 " ] } ], "source": [ "## Oops. If you run some other cell while this is running it steals stdout, so you lose track of progress.\n", "cmd = \"ipyrad -p params-refmap-empirical.txt -s 34567\".format(dir)\n", "print(cmd)\n", "!time $cmd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do ipyrad denovo+reference\n", "Create a new branch and set the assembly method to `denovo+reference`. " ] }, { "cell_type": "code", "execution_count": 174, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ipyrad -p params-denovo_ref-sim.txt -s 1234567 -c 40\n", "\n", " -------------------------------------------------------------\n", " ipyrad [v.0.5.15]\n", " Interactive assembly and analysis of RAD-seq data\n", " -------------------------------------------------------------\n", " loading Assembly: denovo_ref-sim\n", " from saved path: ~/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/denovo_ref-sim.json\n", " New Assembly: denovo_ref-sim\n", " local compute node: [40 cores] on node001\n", "\n", " Step 1: Demultiplexing fastq data to Samples\n", " [####################] 100% chunking large files | 0:00:00 \n", " [####################] 100% sorting reads | 0:00:07 \n", " [####################] 100% writing/compressing | 0:00:01 \n", "\n", " Step 2: Filtering reads \n", " [####################] 100% processing reads | 0:00:02 \n", "\n", " Step 3: Clustering/Mapping reads\n", " [####################] 100% dereplicating | 0:00:00 \n", " [####################] 100% mapping | 0:00:00 \n", " [####################] 100% clustering | 0:00:00 \n", " [####################] 100% building clusters | 0:00:01 \n", " [####################] 100% finalize mapping | 0:00:29 \n", " [####################] 100% chunking | 0:00:00 \n", " [####################] 100% aligning | 0:00:15 \n", " [####################] 100% concatenating | 0:00:00 \n", "\n", " Step 4: Joint estimation of error rate and heterozygosity\n", " [####################] 100% inferring [H, E] | 0:00:15 \n", "\n", " Step 5: Consensus base calling \n", " Mean error [0.00074 sd=0.00001]\n", " Mean hetero [0.00195 sd=0.00015]\n", " [####################] 100% calculating depths | 0:00:00 \n", " [####################] 100% chunking clusters | 0:00:00 \n", " [####################] 100% consens calling | 0:00:11 \n", "\n", " Step 6: Clustering at 0.85 similarity across 12 samples\n", " [####################] 100% concat/shuffle input | 0:00:00 \n", " [####################] 100% clustering across | 0:00:01 \n", " [####################] 100% building clusters | 0:00:00 \n", " [####################] 100% aligning clusters | 0:00:03 \n", " [####################] 100% database indels | 0:00:00 \n", " [####################] 100% indexing clusters | 0:00:01 \n", " [####################] 100% building database | 0:00:00 \n", "\n", " Step 7: Filter and write output files for 12 Samples\n", " [####################] 100% filtering loci | 0:00:06 \n", " [####################] 100% building loci/stats | 0:00:00 \n", " [####################] 100% building vcf file | 0:00:01 \n", " [####################] 100% writing vcf file | 0:00:00 \n", " [####################] 100% building arrays | 0:00:02 \n", " [####################] 100% writing outfiles | 0:00:01 \n", " Outfiles written to: ~/manuscript-analysis/REFMAP_SIM/ipyrad/reference-assembly/denovo_ref-sim_outfiles\n", "\n", "\n", "real\t2m12.816s\n", "user\t0m23.420s\n", "sys\t0m1.675s\n" ] } ], "source": [ "data2 = data.branch(\"denovo_ref-empirical\")\n", "data2.set_params(\"assembly_method\", \"denovo+reference\")\n", "\n", "data2.write_params(force=True)\n", "\n", "cmd = \"ipyrad -p params-denovo_ref-empirical.txt -s 34567 -c 40\".format(dir)\n", "print(cmd)\n", "!time $cmd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do Stacks refmap empirical\n", "Lah et al trim R1 and R2 to 85bp and then concatenate them for stacks analysis. I thought this was weird, but the stacks manual says \"For double-digest RAD data that has been paired-end sequenced, Stacks supports this type of data by treating the loci built from the single-end and paired-end as two independent loci\", which I guess is effectively the same thing as concatenating.\n", "\n", "For reference sequence assembly stacks doesn't actually handle the mapping, you need to Do It Yourself and then pull in .sam or .bam files. Since we already do the bwa mapping in ipyrad lets just use those files. The question is do we use the full .sam file or the \\*-mapped.bam (which excludes unmapped and unpaired reads). We'll use the \\*-mapped.bam files because the stacks manual says the mapped reads should be clean and tidy, so no junk.\n", "\n", "__NB__: Stacks makes the __strong__ assumption that the \"left\" edge of all reads within a stack line up. It looks like it just uses the StartPos to define a stack (assuming the cut site is the same). We will clean up soft-clipped sequences with ngsutils (https://github.com/ngsutils/ngsutils.git).\n", "\n", "__NB__: Stacks treats R1 and R2 as independent.\n", "\n", "__NB__: The other consequence of being forced to remove soft clipped sequence is that you are bound to be losing some real variation.\n", "\n", "For the empirical data stacks runs in ~2 hours (~8 if you include the time it takes to map the sequences)." ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing popmap file to /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/popmap.txt\n" ] } ], "source": [ "## Set directories and make the popmap file\n", "STACKS_REFMAP_DIR = os.path.join(REFMAP_EMPIRICAL_DIR, \"stacks/\")\n", "if not os.path.exists(STACKS_REFMAP_DIR):\n", " os.makedirs(STACKS_REFMAP_DIR)\n", "os.chdir(STACKS_REFMAP_DIR)\n", "\n", "make_stacks_popmap(STACKS_REFMAP_DIR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Map ipyrad trimmed reads to the reference sequence\n", "Based on the same logic as above with the simulated data, we need to remap the raw reads from the empirical ipyrad \\_edits directory to make stacks happy. This step consumes a __ton__ of ram and takes a non-negligible amount of time. See docs for sim mapping above for the meaning of all the bwa/samtools flags.\n", "\n", "Mapping the real data will take several hours (5-6 hours if you're lucky)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (604, 1784, 2185)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5347)\n", "[M::mem_pestat] mean and std.dev: (1788.75, 1421.66)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7475)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (193, 229, 272)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (35, 430)\n", "[M::mem_pestat] mean and std.dev: (234.03, 57.92)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 509)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1516, 2019, 6187)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15529)\n", "[M::mem_pestat] mean and std.dev: (3734.25, 2720.10)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20200)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (685, 2931, 5098)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 13924)\n", "[M::mem_pestat] mean and std.dev: (2765.14, 2259.57)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 18337)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (740, 1784, 2182)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5066)\n", "[M::mem_pestat] mean and std.dev: (1764.10, 1151.54)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 6508)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (194, 230, 273)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (36, 431)\n", "[M::mem_pestat] mean and std.dev: (234.42, 57.91)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 510)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1536, 4455, 6187)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15489)\n", "[M::mem_pestat] mean and std.dev: (4025.73, 2707.42)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20140)\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291701.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291701.trimmed_R2_.fastq.gz\n", "[main] Real time: 303.408 sec; CPU: 4020.255 sec\n", "[bam_sort_core] merging from 2 files...\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (604, 1850, 2472)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 6208)\n", "[M::mem_pestat] mean and std.dev: (1776.17, 1180.59)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 8076)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (203, 242, 287)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (35, 455)\n", "[M::mem_pestat] mean and std.dev: (245.55, 60.92)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 539)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1531, 2530, 6183)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15487)\n", "[M::mem_pestat] mean and std.dev: (3645.37, 2595.32)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20139)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (685, 2931, 5098)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 13924)\n", "[M::mem_pestat] mean and std.dev: (3050.90, 1983.19)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 18337)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (740, 2182, 2472)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5936)\n", "[M::mem_pestat] mean and std.dev: (2053.10, 1324.06)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7668)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (203, 242, 287)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (35, 455)\n", "[M::mem_pestat] mean and std.dev: (245.70, 60.87)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 539)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1556, 4515, 6183)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15437)\n", "[M::mem_pestat] mean and std.dev: (4229.70, 2491.04)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20064)\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291681.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291681.trimmed_R2_.fastq.gz\n", "[main] Real time: 280.003 sec; CPU: 3767.582 sec\n", "[bam_sort_core] merging from 2 files...\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (604, 1850, 2418)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 6046)\n", "[M::mem_pestat] mean and std.dev: (1658.53, 1230.99)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7860)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (190, 224, 265)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (40, 415)\n", "[M::mem_pestat] mean and std.dev: (228.78, 56.99)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 490)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1531, 2530, 6183)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15487)\n", "[M::mem_pestat] mean and std.dev: (3686.63, 2574.13)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20139)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (685, 1465, 4752)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 12886)\n", "[M::mem_pestat] mean and std.dev: (2641.26, 2199.88)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 16953)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (740, 1850, 2472)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5936)\n", "[M::mem_pestat] mean and std.dev: (1866.46, 1192.82)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7668)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (190, 224, 266)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (38, 418)\n", "[M::mem_pestat] mean and std.dev: (229.05, 56.99)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 494)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1516, 2581, 6176)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15496)\n", "[M::mem_pestat] mean and std.dev: (3641.20, 2518.20)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20156)\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291682.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291682.trimmed_R2_.fastq.gz\n", "[main] Real time: 278.177 sec; CPU: 3841.945 sec\n", "[bam_sort_core] merging from 2 files...\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (391, 1784, 2182)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5764)\n", "[M::mem_pestat] mean and std.dev: (1495.00, 1179.37)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7555)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (192, 228, 270)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (36, 426)\n", "[M::mem_pestat] mean and std.dev: (232.69, 58.10)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 504)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1516, 2581, 6183)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15517)\n", "[M::mem_pestat] mean and std.dev: (3892.85, 2693.16)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20184)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (685, 4153, 4219)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 11287)\n", "[M::mem_pestat] mean and std.dev: (2734.53, 1833.27)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 14821)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[M::mem_pestat] skip orientation FF as there are not enough pairs\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (191, 226, 269)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (35, 425)\n", "[M::mem_pestat] mean and std.dev: (231.42, 57.61)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 503)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1708, 4468, 6247)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15325)\n", "[M::mem_pestat] mean and std.dev: (4684.27, 2680.30)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 19864)\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_pestat] skip orientation RF\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291678.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291678.trimmed_R2_.fastq.gz\n", "[main] Real time: 278.955 sec; CPU: 3769.486 sec\n", "[bam_sort_core] merging from 2 files...\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (740, 1850, 2472)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5936)\n", "[M::mem_pestat] mean and std.dev: (1827.28, 1175.99)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7668)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (195, 232, 276)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (33, 438)\n", "[M::mem_pestat] mean and std.dev: (236.03, 59.48)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 519)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1514, 2678, 6183)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15521)\n", "[M::mem_pestat] mean and std.dev: (3759.94, 2525.84)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20190)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (966, 2318, 4219)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 10725)\n", "[M::mem_pestat] mean and std.dev: (2703.32, 1997.38)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 13978)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (969, 2182, 2182)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4608)\n", "[M::mem_pestat] mean and std.dev: (2038.79, 1330.03)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7359)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (193, 231, 274)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (31, 436)\n", "[M::mem_pestat] mean and std.dev: (234.46, 59.58)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 517)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1516, 4468, 6187)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15529)\n", "[M::mem_pestat] mean and std.dev: (4027.00, 2526.71)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20200)\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291686.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291686.trimmed_R2_.fastq.gz\n", "[main] Real time: 263.595 sec; CPU: 3605.401 sec\n", "[bam_sort_core] merging from 2 files...\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (670, 1850, 2418)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5914)\n", "[M::mem_pestat] mean and std.dev: (1741.52, 1201.23)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7662)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (207, 250, 299)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (23, 483)\n", "[M::mem_pestat] mean and std.dev: (253.12, 63.72)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 575)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1513, 1777, 6173)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15493)\n", "[M::mem_pestat] mean and std.dev: (3434.44, 2553.81)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20153)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (685, 2931, 4219)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 11287)\n", "[M::mem_pestat] mean and std.dev: (3135.72, 2399.62)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 14821)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (785, 1850, 2418)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5684)\n", "[M::mem_pestat] mean and std.dev: (1647.73, 822.81)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7317)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (205, 247, 296)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (23, 478)\n", "[M::mem_pestat] mean and std.dev: (250.71, 63.42)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 569)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1536, 1777, 6219)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15585)\n", "[M::mem_pestat] mean and std.dev: (3547.52, 2557.66)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20268)\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291689.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291689.trimmed_R2_.fastq.gz\n", "[main] Real time: 270.172 sec; CPU: 3508.493 sec\n", "[bam_sort_core] merging from 2 files...\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (604, 1850, 2418)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 6046)\n", "[M::mem_pestat] mean and std.dev: (1720.91, 1142.69)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7860)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (190, 228, 272)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (26, 436)\n", "[M::mem_pestat] mean and std.dev: (231.34, 61.11)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 518)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1539, 2581, 6176)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15450)\n", "[M::mem_pestat] mean and std.dev: (3768.03, 2522.60)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20087)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (685, 2931, 4436)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 11938)\n", "[M::mem_pestat] mean and std.dev: (3005.27, 2207.49)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 15689)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291683.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291683.trimmed_R2_.fastq.gz\n", "[main] Real time: 199.643 sec; CPU: 2349.067 sec\n", "[bam_sort_core] merging from 2 files...\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (604, 1784, 2297)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5683)\n", "[M::mem_pestat] mean and std.dev: (1676.03, 1147.20)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7376)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (192, 227, 270)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (36, 426)\n", "[M::mem_pestat] mean and std.dev: (232.32, 57.67)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 504)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1531, 2530, 6183)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15487)\n", "[M::mem_pestat] mean and std.dev: (3638.20, 2588.46)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20139)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (685, 2931, 5098)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 13924)\n", "[M::mem_pestat] mean and std.dev: (2756.81, 2182.51)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 18337)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (357, 969, 2182)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5832)\n", "[M::mem_pestat] mean and std.dev: (1471.56, 1306.32)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7657)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (192, 227, 270)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (36, 426)\n", "[M::mem_pestat] mean and std.dev: (232.40, 57.61)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 504)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1495, 2581, 6173)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15529)\n", "[M::mem_pestat] mean and std.dev: (3508.49, 2429.33)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20207)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (4219, 5098, 5098)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (2461, 6856)\n", "[M::mem_pestat] mean and std.dev: (4801.57, 460.97)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1582, 7735)\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[M::mem_pestat] skip orientation RR\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291705.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291705.trimmed_R2_.fastq.gz\n", "[main] Real time: 268.112 sec; CPU: 3609.621 sec\n", "[bam_sort_core] merging from 2 files...\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n", "[M::mem_pestat] analyzing insert size distribution for orientation FF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (604, 1784, 2418)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 6046)\n", "[M::mem_pestat] mean and std.dev: (1640.13, 1243.11)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 7860)\n", "[M::mem_pestat] analyzing insert size distribution for orientation FR...\n", "[M::mem_pestat] (25, 50, 75) percentile: (181, 211, 247)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (49, 379)\n", "[M::mem_pestat] mean and std.dev: (214.95, 53.03)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 445)\n", "[M::mem_pestat] analyzing insert size distribution for orientation RF...\n", "[M::mem_pestat] (25, 50, 75) percentile: (1556, 4410, 6183)\n", "[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15437)\n", "[M::mem_pestat] mean and std.dev: (3951.14, 2572.36)\n", "[M::mem_pestat] low and high boundaries for proper pairs: (1, 20064)\n", "[M::mem_pestat] skip orientation RR as there are not enough pairs\n", "[M::mem_pestat] skip orientation FF\n", "[M::mem_pestat] skip orientation RF\n", "[main] Version: 0.7.15-r1142-dirty\n", "[main] CMD: bwa mem -t 40 -v 1 /home/iovercast/manuscript-analysis/Phocoena_empirical/TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291688.trimmed_R1_.fastq.gz /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/reference-assembly/refmap-empirical_edits/SRR4291688.trimmed_R2_.fastq.gz\n", "[main] Real time: 142.756 sec; CPU: 1975.479 sec\n", "\n", "real\t0m0.000s\n", "user\t0m0.000s\n", "sys\t0m0.000s\n" ] } ], "source": [ "IPYRAD_EDITS_DIR = os.path.join(IPYRAD_REFMAP_DIR, \"reference-assembly/refmap-empirical_edits/\")\n", "REF_SEQ = REFMAP_EMPIRICAL_DIR + \"TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa\"\n", "\n", "## Just get the \n", "sample_names = glob.glob(IPYRAD_EDITS_DIR + \"*.trimmed_R1_.fastq.gz\")\n", "sample_names = [x.split(\".\")[0].split(\"/\")[-1] for x in sample_names]\n", "\n", "for samp in sample_names:\n", " R1 = IPYRAD_EDITS_DIR + samp + \".trimmed_R1_.fastq.gz\"\n", " R2 = IPYRAD_EDITS_DIR + samp + \".trimmed_R2_.fastq.gz\"\n", " samout = STACKS_REFMAP_DIR + samp + \".sam\"\n", " bamout = STACKS_REFMAP_DIR + samp + \".bam\"\n", " export_cmd = \"export PATH=~/manuscript-analysis/dDocent:$PATH\"\n", " bwa_cmd = \"bwa mem -t 40 -v 0 \" + REF_SEQ\\\n", " + \" \" + R1\\\n", " + \" \" + R2\\\n", " + \" > \" + samout\n", " samtools_cmd = \"samtools view -b -F 0x804 \" + samout\\\n", " + \" | samtools sort -T /tmp/{}.sam -O bam -o {}\".format(samp, bamout)\n", " cleanup_cmd = \"rm {}\".format(samout)\n", " cmd = \"; \".join([export_cmd, bwa_cmd, samtools_cmd, cleanup_cmd])\n", " !time $cmd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Stacks prereqs - install ngsutils\n", "Install ngsutils and use it to remove soft-clipped sequences from each read. We could do this\n", "by rerunning bwa with the `-H` flag, but that would be slightly more annoying. This is not\n", "guaranteed to 'work' because i had a hell of a time getting ngsutils to install on my system." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash -s \"$REFMAP_EMPIRICAL_DIR\"\n", "cd $1/stacks\n", "git clone https://github.com/ngsutils/ngsutils.git\n", "cd ngsutils\n", "make" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now do the filtering\n", "This will apply the `removeclipping` method of bamutils to each mapped bam file in the stacks refmap directory. Then it deletes the old bam file and moves the new bam file to have the name of the old one. On a nice system this takes ~2 minutes per sample (so ~2 hours total)." ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291679.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291679.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291702.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291702.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291701.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291701.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291681.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291681.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291682.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291682.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291678.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291678.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291686.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291686.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291705.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291705.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291688.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291688.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291674.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291674.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291673.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291673.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291684.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291684.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291677.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291677.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291675.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291675.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291703.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291703.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291672.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291672.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291687.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291687.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291696.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291696.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291699.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291699.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291668.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291668.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291698.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291698.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291690.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291690.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291692.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291692.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291666.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291666.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291664.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291664.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291689.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291689.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291683.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291683.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291680.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291680.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291685.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291685.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291700.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291700.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291671.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291671.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291693.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291693.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291704.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291704.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291676.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291676.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291695.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291695.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291697.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291697.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291691.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291691.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291662.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291662.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291663.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291663.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291669.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291669.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291670.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291670.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291694.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291694.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291667.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291667.bam.tmp')\n", "('/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291665.bam', '/home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291665.bam.tmp')\n" ] } ], "source": [ "infiles = glob.glob(STACKS_REFMAP_DIR + \"SRR*.bam\")\n", "\n", "for f in infiles:\n", " outfile = f + \".tmp\"\n", " print(f, outfile)\n", " subprocess.call(\"bamutils removeclipping {} {}\".format(f, outfile), shell=True)\n", " subprocess.call(\"rm {}\".format(f), shell=True)\n", " subprocess.call(\"mv {} {}\".format(outfile, f), shell=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Now run stacks ref_map pipeline\n", "We build the stacks command w/ python because then we call the command inside a bash cell because denovo_map expects all the submodules to be in the PATH. The default command line created will include the `-d` flag to do the dry run, so the bash cell won't actually do anything real. But it does create a `stacks.sh` file in the stacks directory that includes the command to run." ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Command to run - ref_map.pl -T 40 -b 1 -S -d -O /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks//popmap.txt --samples /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/ -X 'populations:--vcf --genepop --structure --phylip ' -X 'populations:-m 6' -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/\n" ] } ], "source": [ "## This is how we'd do it if we weren't using a popmap file\n", "#infiles = [\"-s \"+ff+\" \" for ff in glob.glob(IPYRAD_REFMAP_DIR+\"*-mapped-sorted.bam\")]\n", "\n", "## Toggle the dryrun flag for testing\n", "DRYRUN=\"\"\n", "DRYRUN=\"-d\"\n", "\n", "## Options\n", "## -T The number of threads to use\n", "## -O The popmap file specifying individuals and populations\n", "## -S Disable database business\n", "## -o Output directory. Just write to the empirical stacks directory\n", "## -X The first -X tells populations to create the output formats sepcified\n", "## -X The second one passes `-m 6` which sets min depth per locus\n", "OUTPUT_FORMATS = \"--vcf --genepop --structure --phylip \"\n", "cmd = \"ref_map.pl -T 40 -b 1 -S \" + DRYRUN\\\n", " + \" -O {}/popmap.txt\".format(STACKS_REFMAP_DIR)\\\n", " + \" --samples {}\".format(STACKS_REFMAP_DIR)\\\n", " + \" -X \\'populations:\" + OUTPUT_FORMATS + \"\\'\"\\\n", " + \" -X \\'populations:-m 6\\'\"\\\n", " + \" -o \" + STACKS_REFMAP_DIR\n", "print(\"\\nCommand to run - {}\".format(cmd))" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Identifying unique stacks; file 1 of 44 [SRR4291704]\n", "Identifying unique stacks; file 2 of 44 [SRR4291705]\n", "Identifying unique stacks; file 3 of 44 [SRR4291700]\n", "Identifying unique stacks; file 4 of 44 [SRR4291701]\n", "Identifying unique stacks; file 5 of 44 [SRR4291702]\n", "Identifying unique stacks; file 6 of 44 [SRR4291703]\n", "Identifying unique stacks; file 7 of 44 [SRR4291685]\n", "Identifying unique stacks; file 8 of 44 [SRR4291684]\n", "Identifying unique stacks; file 9 of 44 [SRR4291687]\n", "Identifying unique stacks; file 10 of 44 [SRR4291686]\n", "Identifying unique stacks; file 11 of 44 [SRR4291681]\n", "Identifying unique stacks; file 12 of 44 [SRR4291680]\n", "Identifying unique stacks; file 13 of 44 [SRR4291683]\n", "Identifying unique stacks; file 14 of 44 [SRR4291682]\n", "Identifying unique stacks; file 15 of 44 [SRR4291689]\n", "Identifying unique stacks; file 16 of 44 [SRR4291688]\n", "Identifying unique stacks; file 17 of 44 [SRR4291674]\n", "Identifying unique stacks; file 18 of 44 [SRR4291675]\n", "Identifying unique stacks; file 19 of 44 [SRR4291676]\n", "Identifying unique stacks; file 20 of 44 [SRR4291677]\n", "Identifying unique stacks; file 21 of 44 [SRR4291670]\n", "Identifying unique stacks; file 22 of 44 [SRR4291671]\n", "Identifying unique stacks; file 23 of 44 [SRR4291672]\n", "Identifying unique stacks; file 24 of 44 [SRR4291673]\n", "Identifying unique stacks; file 25 of 44 [SRR4291678]\n", "Identifying unique stacks; file 26 of 44 [SRR4291679]\n", "Identifying unique stacks; file 27 of 44 [SRR4291696]\n", "Identifying unique stacks; file 28 of 44 [SRR4291697]\n", "Identifying unique stacks; file 29 of 44 [SRR4291694]\n", "Identifying unique stacks; file 30 of 44 [SRR4291695]\n", "Identifying unique stacks; file 31 of 44 [SRR4291692]\n", "Identifying unique stacks; file 32 of 44 [SRR4291693]\n", "Identifying unique stacks; file 33 of 44 [SRR4291690]\n", "Identifying unique stacks; file 34 of 44 [SRR4291691]\n", "Identifying unique stacks; file 35 of 44 [SRR4291698]\n", "Identifying unique stacks; file 36 of 44 [SRR4291699]\n", "Identifying unique stacks; file 37 of 44 [SRR4291669]\n", "Identifying unique stacks; file 38 of 44 [SRR4291668]\n", "Identifying unique stacks; file 39 of 44 [SRR4291663]\n", "Identifying unique stacks; file 40 of 44 [SRR4291662]\n", "Identifying unique stacks; file 41 of 44 [SRR4291667]\n", "Identifying unique stacks; file 42 of 44 [SRR4291666]\n", "Identifying unique stacks; file 43 of 44 [SRR4291665]\n", "Identifying unique stacks; file 44 of 44 [SRR4291664]\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Parsed population map: 44 files in 7 populations and 1 group.\n", "Found 44 sample file(s).\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291704.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 1 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291705.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 2 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291700.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 3 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291701.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 4 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291702.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 5 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291703.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 6 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291685.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 7 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291684.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 8 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291687.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 9 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291686.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 10 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291681.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 11 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291680.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 12 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291683.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 13 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291682.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 14 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291689.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 15 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291688.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 16 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291674.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 17 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291675.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 18 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291676.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 19 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291677.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 20 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291670.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 21 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291671.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 22 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291672.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 23 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291673.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 24 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291678.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 25 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291679.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 26 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291696.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 27 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291697.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 28 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291694.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 29 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291695.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 30 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291692.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 31 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291693.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 32 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291690.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 33 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291691.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 34 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291698.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 35 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291699.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 36 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291669.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 37 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291668.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 38 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291663.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 39 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291662.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 40 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291667.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 41 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291666.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 42 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291665.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 43 -p 40 2>&1\n", " /home/iovercast/manuscript-analysis/miniconda/bin/pstacks -t bam -f /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291664.bam -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -i 44 -p 40 2>&1\n", "\n", "Depths of Coverage for Processed Samples:\n", "\n", "Generating catalog...\n", " /home/iovercast/manuscript-analysis/miniconda/bin/cstacks -g -b 1 -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291704 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291705 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291700 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291701 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291702 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291703 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291685 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291684 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291687 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291686 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291681 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291680 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291683 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291682 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291689 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291688 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291674 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291675 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291676 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291677 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291670 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291671 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291672 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291673 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291678 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291679 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291696 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291697 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291694 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291695 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291692 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291693 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291690 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291691 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291698 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291699 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291669 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291668 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291663 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291662 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291667 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291666 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291665 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291664 -p 40 2>&1\n", "Matching samples to the catalog...\n", " /home/iovercast/manuscript-analysis/miniconda/bin/sstacks -g -b 1 -c /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/batch_1 -o /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291704 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291705 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291700 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291701 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291702 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291703 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291685 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291684 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291687 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291686 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291681 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291680 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291683 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291682 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291689 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291688 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291674 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291675 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291676 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291677 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291670 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291671 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291672 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291673 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291678 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291679 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291696 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291697 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291694 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291695 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291692 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291693 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291690 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291691 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291698 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291699 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291669 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291668 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291663 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291662 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291667 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291666 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291665 -s /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks/SRR4291664 -p 40 2>&1\n", "Calculating population-level summary statistics\n", "/home/iovercast/manuscript-analysis/miniconda/bin/populations -b 1 -P /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks -s -t 40 -M /home/iovercast/manuscript-analysis/Phocoena_empirical/stacks//popmap.txt --vcf --genepop --structure --phylip -m 6 2>&1\n", "\n", "real\t0m0.385s\n", "user\t0m0.031s\n", "sys\t0m0.012s\n" ] } ], "source": [ "%%bash -s \"$WORK_DIR\" \"$STACKS_REFMAP_DIR\" \"$cmd\"\n", "export PATH=\"$1/miniconda/bin:$PATH\"; export \"STACKS_REFMAP_DIR=$2\"; export \"cmd=$3\"\n", "\n", "## We have to play a little cat and mouse game here because of quoting in some of the args\n", "## and how weird bash is we have to write the cmd to a file and then exec it.\n", "## If you try to just run $cmd it truncates the command at the first single tic. Hassle.\n", "cd $STACKS_REFMAP_DIR\n", "echo $cmd > stacks.sh; chmod 777 stacks.sh\n", "time ./stacks.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Do dDocent refmap empirical\n", "dDocent requires a __lot__ of pre-flight housekeeping. You can't use an arbitrary directory for raw files, \n", "you have to puth the raw reads in a __specific__ directory. You also can't use arbitrary file names, you have\n", "to use __specific__ filenames that dDocent requires. zomg. I wonder if we can trick it with symlinks.\n", "\n", "This next cell will do all the housekeeping and then print the nicely formatted command to run. You have to copy \n", "this command and run it by hand in a terminal on the machine you want to run it on bcz it doesn't like running\n", "inside the notebook.\n", "\n", "~1 hour to munge reads and index the reference sequence. ~3hrs to map the reads and build the vcf. Several hours getting the .fq.gz files to have a naming format that dDocent agreed with." ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'SRR4291704': 'Pop7_001', 'SRR4291705': 'Pop7_002', 'SRR4291700': 'Pop6_001', 'SRR4291701': 'Pop4_001', 'SRR4291702': 'Pop4_002', 'SRR4291703': 'Pop7_003', 'SRR4291685': 'Pop3_001', 'SRR4291684': 'Pop5_001', 'SRR4291687': 'Pop5_002', 'SRR4291686': 'Pop5_003', 'SRR4291681': 'Pop5_004', 'SRR4291680': 'Pop5_005', 'SRR4291683': 'Pop5_006', 'SRR4291682': 'Pop5_007', 'SRR4291689': 'Pop5_008', 'SRR4291688': 'Pop5_009', 'SRR4291674': 'Pop3_002', 'SRR4291675': 'Pop1_001', 'SRR4291676': 'Pop1_002', 'SRR4291677': 'Pop1_003', 'SRR4291670': 'Pop2_001', 'SRR4291671': 'Pop2_002', 'SRR4291672': 'Pop1_004', 'SRR4291673': 'Pop1_005', 'SRR4291678': 'Pop1_006', 'SRR4291679': 'Pop5_010', 'SRR4291696': 'Pop4_003', 'SRR4291697': 'Pop6_002', 'SRR4291694': 'Pop6_003', 'SRR4291695': 'Pop6_004', 'SRR4291692': 'Pop6_005', 'SRR4291693': 'Pop6_006', 'SRR4291690': 'Pop6_007', 'SRR4291691': 'Pop6_008', 'SRR4291698': 'Pop6_009', 'SRR4291699': 'Pop6_010', 'SRR4291669': 'Pop2_003', 'SRR4291668': 'Pop2_004', 'SRR4291663': 'Pop3_003', 'SRR4291662': 'Pop3_004', 'SRR4291667': 'Pop2_005', 'SRR4291666': 'Pop7_004', 'SRR4291665': 'Pop7_005', 'SRR4291664': 'Pop7_006'}\n" ] } ], "source": [ "## A housekeeping function for getting a dictionary to map SRR* filenames in the ipyrad edits directory\n", "## to ddocent style.\n", "##\n", "## Gotcha: Nice 1-based indexing for the dDocent format.\n", "##\n", "## For raw reads the format (for R1) is pop1_sample1.F.fq.gz format a la:\n", "## 1A_0_R1_.fastq.gz -> Pop1_Sample1.F.fq.gz\n", "##\n", "## For trimmed reads the format is pop1_001.R1.fq.gz a la:\n", "## 1A_0_R1_.fastq.gz -> Pop1_001.R1.fq.gz\n", "## So annoying because we have to translate across a bunch of different mappings. ugh.\n", "def get_ddocent_filename_mapping():\n", " mapping_dict = {}\n", " \n", " ## Maps sample name to population\n", " pop_dict = get_popdict()\n", " pops = set(pop_dict.values())\n", "\n", " ## For each population go through and add items to the dict per sample\n", " ## So we have to map the sample name to the SRR and then make an entry\n", " ## mapping SRR file name to ddocent format\n", " for i, pop in enumerate(pops):\n", " ## Get a list of all the samples in this population. This is probably a stupid way but it works.\n", " samps = [item[0] for item in pop_dict.items() if item[1] == pop]\n", " for j, samp in enumerate(samps):\n", " mapping_dict[samp] = \"Pop{}_{:03d}\".format(i+1, j+1)\n", " ## For the untrimmed format, if you want dDocent to do the trimming\n", " ## mapping_dict[samp] = \"Pop{}_Sample{}\".format(i, j)\n", "\n", " return mapping_dict\n", "print(get_ddocent_filename_mapping())" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ln: failed to create symbolic link ‘reference.fasta’: File exists\n", "export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH; time dDocent /home/iovercast/manuscript-analysis/Phocoena_empirical/ddocent//empirical-config.txt\n" ] } ], "source": [ "## Set up directory structures. change the force flag if you want to\n", "## blow everything away and restart\n", "# force = True\n", "force = \"\"\n", "\n", "DDOCENT_DIR = \"/home/iovercast/manuscript-analysis/dDocent/\"\n", "DDOCENT_REFMAP_DIR = os.path.join(REFMAP_EMPIRICAL_DIR, \"ddocent/\")\n", "if force and os.path.exists(DDOCENT_REFMAP_DIR):\n", " shutil.rmtree(DDOCENT_REFMAP_DIR)\n", "if not os.path.exists(DDOCENT_REFMAP_DIR):\n", " os.makedirs(DDOCENT_REFMAP_DIR)\n", "os.chdir(DDOCENT_REFMAP_DIR)\n", "\n", "## Create a simlink to the reference sequence in the current directory\n", "REF_SEQ = REFMAP_EMPIRICAL_DIR + \"TurtrunRef/Tursiops_truncatus.turTru1.dna_rm.toplevel.fa\"\n", "cmd = \"ln -s {} reference.fasta\".format(REF_SEQ)\n", "!$cmd\n", "\n", "## Now we have to rename all the files in the way dDocent expects them:\n", "## 1A_0_R1_.fastq.gz -> Pop1_Sample1.F.fq.gz\n", "## Make symlinks to the trimmed data files in the ipyrad directory. It _should_ work.\n", "## Trimmed reads in the ipyrad directory are of the format: SRR4291681.trimmed_R1_.fastq.gz\n", "IPYRAD_EDITS_DIR = os.path.join(IPYRAD_REFMAP_DIR, \"reference-assembly/refmap-empirical_edits/\")\n", "\n", "name_mapping = get_ddocent_filename_mapping()\n", "\n", "for k,v in name_mapping.items():\n", " ## Symlink R1 and R2\n", " for i in [\"1\", \"2\"]:\n", " source = os.path.join(IPYRAD_EDITS_DIR, k + \".trimmed_R{}_.fastq.gz\".format(i))\n", " ##dest = os.path.join(DDOCENT_REFMAP_DIR, v + \".R{}.fq.gz\".format(i))\n", " if i == \"1\":\n", " dest = os.path.join(DDOCENT_REFMAP_DIR, v + \".R1.fq.gz\".format(i))\n", " else:\n", " dest = os.path.join(DDOCENT_REFMAP_DIR, v + \".R2.fq.gz\".format(i))\n", " cmd = \"ln -sf {} {}\".format(source, dest)\n", " !$cmd\n", "\n", "## Write out the config file for this run.\n", "## Compacted the config file into one long line here to make it not take up so much room\n", "## Trimming = no because we trimmed in ipyrad\n", "## Assembly = no because we are providing a reverence sequence\n", "## Type of Assembly = PE for paired-end\n", "config_file = \"{}/empirical-config.txt\".format(DDOCENT_REFMAP_DIR)\n", "with open(config_file, 'w') as outfile:\n", " outfile.write('Number of Processors\\n40\\nMaximum Memory\\n0\\nTrimming\\nno\\nAssembly?\\nno\\nType_of_Assembly\\nPE\\nClustering_Similarity%\\n0.85\\nMapping_Reads?\\nyes\\nMapping_Match_Value\\n1\\nMapping_MisMatch_Value\\n3\\nMapping_GapOpen_Penalty\\n5\\nCalling_SNPs?\\nyes\\nEmail\\nwatdo@mailinator.com\\n')\n", "\n", "cmd = \"export LD_LIBRARY_PATH={}/freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; \".format(DDOCENT_DIR)\n", "cmd += \"export PATH={}:$PATH; time dDocent {}\".format(DDOCENT_DIR, config_file)\n", "print(cmd)\n", "with open(\"ddocent.sh\", 'w') as outfile:\n", " outfile.write(\"#!/bin/bash\\n\")\n", " outfile.write(cmd)\n", "!chmod 777 ddocent.sh\n", "## Have to run the printed command by hand from the ddocent REALDATA dir bcz it doesn't like running in the notebook\n", "#!$cmd\n", "\n", "## NB: Must rename all the samples in the output vcf and then use vcf-shuffle-cols\n", "## perl script in the vcf/perl directory to reorder the vcf file to match\n", "## the output of stacks and ipyrad for pca/heatmaps to work." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Finalizing - /home/iovercast/manuscript-analysis/Phocoena_empirical/ddocent/TotalRawSNPs.vcf\n", "perl vcf-shuffle-cols -t /home/iovercast/manuscript-analysis/Phocoena_empirical/ipyrad/refmap-empirical_outfiles/refmap-empirical.vcf /home/iovercast/manuscript-analysis/Phocoena_empirical/ddocent/TotalRawSNPs.vcf > /home/iovercast/manuscript-analysis/Phocoena_empirical/ddocent/ddocent-tmp.vcf\n", "export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH; vcfallelicprimitives /home/iovercast/manuscript-analysis/Phocoena_empirical/ddocent/TotalRawSNPs.vcf > ddoc-tmp.vcf\n" ] } ], "source": [ "## You have to post-process the vcf files to decompose complex genotypes and remove indels\n", "os.chdir(DDOCENT_REFMAP_DIR)\n", "exports = \"export LD_LIBRARY_PATH=/home/iovercast/manuscript-analysis/dDocent//freebayes-src/vcflib/tabixpp/htslib/:$LD_LIBRARY_PATH; export PATH=/home/iovercast/manuscript-analysis/dDocent/:$PATH\"\n", "\n", "fullvcf = os.path.join(DDOCENT_REFMAP_DIR, \"TotalRawSNPs.vcf\")\n", "filtvcf = os.path.join(DDOCENT_REFMAP_DIR, \"Final.recode.vcf\")\n", "for f in [fullvcf, filtvcf]:\n", " print(\"Finalizing - {}\".format(f))\n", " \n", " ## Rename the samples to make them agree with the ipyrad/stacks names so\n", " ## the results analysis will work.\n", " vcffile = f\n", " infile = open(vcffile,'r')\n", " filedata = infile.readlines()\n", " infile.close()\n", " \n", " outfile = open(vcffile,'w')\n", " for line in filedata:\n", " if \"CHROM\" in line:\n", " for ipname, ddname in name_mapping.items():\n", " line = line.replace(ddname, ipname)\n", " outfile.write(line)\n", " outfile.close()\n", " \n", " ## Rename columns to match ipyrad and then resort columns to be in same order\n", " IPYRAD_VCF = os.path.join(IPYRAD_REFMAP_DIR, \"refmap-empirical_outfiles/refmap-empirical.vcf\")\n", " os.chdir(os.path.join(DDOCENT_DIR, \"vcftools_0.1.11/perl\"))\n", " tmpvcf = os.path.join(DDOCENT_REFMAP_DIR, \"ddocent-tmp.vcf\")\n", " cmd = \"perl vcf-shuffle-cols -t {} {} > {}\".format(IPYRAD_VCF, vcffile, tmpvcf)\n", " print(cmd)\n", " #!$cmd\n", "\n", " os.chdir(DDOCENT_REFMAP_DIR)\n", " ## Naming the new outfiles as .snps.vcf\n", " ## Decompose complex genotypes and remove indels\n", " outfile = os.path.join(DDOCENT_REFMAP_DIR, f.split(\"/\")[-1].split(\".vcf\")[0] + \".snps.vcf\")\n", " cmd = \"{}; vcfallelicprimitives {} > ddoc-tmp.vcf\".format(exports, f)\n", " print(cmd)\n", " !$cmd\n", " cmd = \"{}; vcftools --vcf ddoc-tmp.vcf --remove-indels --recode --recode-INFO-all --out {}\".format(exports, outfile)\n", " print(cmd)\n", " !$cmd\n", " !rm ddoc-tmp.vcf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Housekeeping\n", "This should be at the top, but i'm leaving it here so it's out of the way. Just some housekeeping for translating between SRA sequence file name and sample name from the paper. This is only for the benefit of stacks populations script." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetch info from SRA for mapping between SRR accession numbers and sample names\n", "The files that come down from SRA are not helpfully named. We have to create a mapping between sample\n", "names from the paper and SRR numbers.\n", "\n", "Get the RunInfo table from here: https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP090334\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def get_sampsdict():\n", " info_header = \"BioSample_s\tExperiment_s\tLibrary_Name_s\tMBases_l\tMBytes_l\tRun_s\tSRA_Sample_s\tSample_Name_s\tdev_stage_s\tecotype_s\tlat_lon_s\tsex_s\ttissue_s\tAssay_Type_s\tAssemblyName_s\tBioProject_s\tBioSampleModel_s\tCenter_Name_s\tConsent_s\tInsertSize_l\tLibraryLayout_s\tLibrarySelection_s\tLibrarySource_s\tLoadDate_s\tOrganism_s\tPlatform_s\tReleaseDate_s\tSRA_Study_s\tg1k_analysis_group_s\tg1k_pop_code_s\tsource_s\"\n", " info = \"\"\"SAMN05806468\tSRX2187156\tPp01\t595\t395\tSRR4291662\tSRS1709994\tPp01\t\trelicta\t44.09 N 29.81 E\tfemale\tmuscle\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806469\tSRX2187157\tPp02\t478\t318\tSRR4291663\tSRS1709995\tPp02\t\trelicta\t41.42 N 28.92 E\tfemale\tmuscle\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806478\tSRX2187158\tPp11\t242\t162\tSRR4291664\tSRS1709996\tPp11\tadult\tphocoena\t54.96 N 8.32 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806479\tSRX2187159\tPp12\t261\t174\tSRR4291665\tSRS1709997\tPp12\tadult\tphocoena\t54.95 N 8.32 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806480\tSRX2187160\tPp13\t595\t397\tSRR4291666\tSRS1709998\tPp13\tjuvenile\tphocoena\t54.16 N 8.82 E\tmale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806481\tSRX2187161\tPp14\t769\t511\tSRR4291667\tSRS1709999\tPp14\t\tphocoena\t57.00 N 11.00 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806482\tSRX2187162\tPp15\t624\t414\tSRR4291668\tSRS1710000\tPp15\t\tphocoena\t56.89 N 12.50 E\tmale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806483\tSRX2187163\tPp16\t665\t446\tSRR4291669\tSRS1710001\tPp16\t\tphocoena\t57.37 N 9.68 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806484\tSRX2187164\tPp17\t264\t177\tSRR4291670\tSRS1710002\tPp17\t\tphocoena\t57.59 N 10.10 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806485\tSRX2187165\tPp18\t684\t453\tSRR4291671\tSRS1710003\tPp18\t\tphocoena\t58.93 N 11.15 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806486\tSRX2187166\tPp19\t601\t398\tSRR4291672\tSRS1710004\tPp19\t\tphocoena\t55.43 N 107.0 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806487\tSRX2187167\tPp20\t392\t261\tSRR4291673\tSRS1710005\tPp20\t\tphocoena\t55.97 N 11.18 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806470\tSRX2187168\tPp03\t471\t316\tSRR4291674\tSRS1710006\tPp03\t\trelicta\t41.48 N 28.31 E\tfemale\tmuscle\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806488\tSRX2187169\tPp21\t592\t397\tSRR4291675\tSRS1710007\tPp21\t\tphocoena\t55.43 N 10.70 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806489\tSRX2187170\tPp22\t446\t300\tSRR4291676\tSRS1710008\tPp22\t\tphocoena\t56.25 N 12.82 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806490\tSRX2187171\tPp23\t617\t409\tSRR4291677\tSRS1710009\tPp23\t\tphocoena\t56.65 N 12.85 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806491\tSRX2187172\tPp24\t554\t367\tSRR4291678\tSRS1710010\tPp24\t\tphocoena\t56.00 N 12.00 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806492\tSRX2187173\tPp25\t753\t500\tSRR4291679\tSRS1710011\tPp25\tjuvenile\tphocoena\t55.00 N 10.23 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806493\tSRX2187174\tPp26\t530\t353\tSRR4291680\tSRS1710012\tPp26\t\tphocoena\t54.38 N 10.99 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806494\tSRX2187175\tPp27\t639\t426\tSRR4291681\tSRS1710013\tPp27\tjuvenile\tphocoena\t54.83 N 9.62 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806495\tSRX2187176\tPp28\t646\t430\tSRR4291682\tSRS1710014\tPp28\tjuvenile\tphocoena\t54.59 N 10.03 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806496\tSRX2187177\tPp29\t374\t247\tSRR4291683\tSRS1710015\tPp29\tjuvenile\tphocoena\t54.42 N 11.55 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806497\tSRX2187178\tPp30\t569\t376\tSRR4291684\tSRS1710016\tPp30\tjuvenile\tphocoena\t54.53 N 11.12 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806471\tSRX2187179\tPp04\t451\t303\tSRR4291685\tSRS1710017\tPp04\t\trelicta\t41.65 N 28.27 E\tfemale\tmuscle\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806498\tSRX2187180\tPp31\t578\t384\tSRR4291686\tSRS1710018\tPp31\tadult\tphocoena\t54.53 N 11.11 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806499\tSRX2187181\tPp32\t586\t392\tSRR4291687\tSRS1710019\tPp32\tjuvenile\tphocoena\t54.32 N 13.09 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806500\tSRX2187182\tPp33\t288\t189\tSRR4291688\tSRS1710020\tPp33\tjuvenile\tphocoena\t54.46 N 12.54 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806501\tSRX2187183\tPp34\t587\t389\tSRR4291689\tSRS1710021\tPp34\t\tphocoena\t54.32 N 13.09 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806502\tSRX2187184\tPp35\t496\t330\tSRR4291690\tSRS1710022\tPp35\t\tphocoena\t55.00 N 14.00 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806503\tSRX2187185\tPp36\t1085\t720\tSRR4291691\tSRS1710023\tPp36\tjuvenile\tphocoena\t56.00 N 15.00 E\tmale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806504\tSRX2187186\tPp37\t214\t141\tSRR4291692\tSRS1710024\tPp37\t\tphocoena\t55.56 N 17.63 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806505\tSRX2187187\tPp38\t397\t263\tSRR4291693\tSRS1710025\tPp38\t\tphocoena\t55.50 N 17.00 E\tfemale\tmuscle\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806506\tSRX2187188\tPp39\t670\t447\tSRR4291694\tSRS1710026\tPp39\tjuvenile\tphocoena\t56.00 N 16.00 E\tmale\tmuscle\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806507\tSRX2187189\tPp40\t342\t226\tSRR4291695\tSRS1710027\tPp40\t\tphocoena\t54.73 N 18.58 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806472\tSRX2187190\tPp05\t611\t406\tSRR4291696\tSRS1710028\tPp05\t\tphocoena\t64.78 N 13.22 E\tmale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806508\tSRX2187191\tPp41\t586\t389\tSRR4291697\tSRS1710029\tPp41\t\tphocoena\t54.80 N 18.44 E\tfemale\tmuscle\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806509\tSRX2187192\tPp42\t329\t219\tSRR4291698\tSRS1710030\tPp42\t\tphocoena\t54.67 N 18.59 E\tmale\tmuscle\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806510\tSRX2187193\tPp43\t517\t343\tSRR4291699\tSRS1710031\tPp43\tjuvenile\tphocoena\t57.00 N 20.00 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806511\tSRX2187194\tPp44\t491\t326\tSRR4291700\tSRS1710032\tPp44\tadult\tphocoena\t57.01 N 20.00 E\tmale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806473\tSRX2187195\tPp06\t632\t423\tSRR4291701\tSRS1710033\tPp06\t\tphocoena\t64.58 N 13.58 E\tmale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806474\tSRX2187196\tPp07\t905\t602\tSRR4291702\tSRS1710034\tPp07\t\tphocoena\t64.31 N 14.00 E\tmale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806475\tSRX2187197\tPp08\t585\t390\tSRR4291703\tSRS1710035\tPp08\tadult\tphocoena\t54.70 N 8.33 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806476\tSRX2187198\tPp09\t590\t392\tSRR4291704\tSRS1710036\tPp09\t\tphocoena\t54.30 N 8.93 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\n", " SAMN05806477\tSRX2187199\tPp10\t625\t414\tSRR4291705\tSRS1710037\tPp10\tadult\tphocoena\t55.47 N 8.38 E\tfemale\tskin\tOTHER\t\tPRJNA343959\tModel organism or animal\t\tpublic\t0\tPAIRED\tRestriction Digest\tGENOMIC\t2016-09-22\tPhocoena phocoena\tILLUMINA\t2016-09-27\tSRP090334\t\t\t\"\"\".split(\"\\n\")\n", " samps_dict = {}\n", " for i in info:\n", " line = i.split(\"\\t\")\n", " samps_dict[line[2]] = line[5]\n", " return(samps_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a population map\n", "Supplementary table S1 (http://journals.plos.org/plosone/article/file?type=supplementary&id=info:doi/10.1371/journal.pone.0162792.s006) contains detailed sample information. We can just extract the sample names and populations they belong to." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def get_popdict():\n", " samps_dict = get_sampsdict()\n", " popmap = \\\n", "\"\"\"01\tWBS\n", "02\tWBS\n", "03\tWBS\n", "04\tWBS\n", "05\tIS\n", "06\tIS\n", "07\tIS\n", "08\tNOS\n", "09\tNOS\n", "10\tNOS\n", "11\tNOS\n", "12\tNOS\n", "13\tNOS\n", "14\tSK1\n", "15\tSK1\n", "16\tSK1\n", "17\tSK1\n", "18\tSK1\n", "19\tKB1\n", "20\tKB1\n", "21\tKB1\n", "22\tKB1\n", "23\tKB1\n", "24\tKB1\n", "25\tBES2\n", "26\tBES2\n", "27\tBES2\n", "28\tBES2\n", "29\tBES2\n", "30\tBES2\n", "31\tBES2\n", "32\tBES2\n", "33\tBES2\n", "34\tBES2\n", "35\tIBS\n", "36\tIBS\n", "37\tIBS\n", "38\tIBS\n", "39\tIBS\n", "40\tIBS\n", "41\tIBS\n", "42\tIBS\n", "43\tIBS\n", "44\tIBS\"\"\".split(\"\\n\")\n", " pop_dict = {}\n", " for i in popmap:\n", " line = i.split(\"\\t\")\n", " pop_dict[samps_dict[\"Pp\"+line[0]]] = line[1]\n", " return(pop_dict)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Adding \"-mapped-sorted\" to each individual name to avoid having to rename the .bam files created by ipyrad\n", "def make_stacks_popmap(OUTDIR):\n", " pop_dict = get_popdict()\n", " out = os.path.join(OUTDIR, \"popmap.txt\")\n", " print(\"Writing popmap file to {}\".format(out))\n", " with open(out, 'w') as outfile:\n", " for k,v in pop_dict.items():\n", " outfile.write(k + \"\\t\" + v + \"\\n\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Ignore all below here\n", "~~Oh man. I thought dDocent could use a reference sequence for assembly, but it totally can't. I did all this work\n", "to get the housekeeping in order, but then finally figured out at the last minute while setting the config\n", "that it just can't use an external reference sequence. zomg. Keeping this here cuz it might be useful some day\n", "in the event that dDocent makes this possible.~~ Nvm, it _can_ do reference sequence mapping, it's just not well documented." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }