{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cookbook: Merging samples or Assemblies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ipyrad is designed to let you work with your samples in a modular way: you can split your Assembly objects to group samples into separate runs that apply different parameter settings, and then merge them back together later. Similarly, you can merge multiple samples into a single sample, which may be useful when you include technical replicates in your sequencing lanes or sequence an individual on multiple lanes. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import ipyrad as ip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Merging assemblies\n", "The example below combines all samples from two assemblies into a single assembly. Samples that have the same name in both assemblies are merged into one sample in the merged assembly." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New Assembly: data1\n", "establishing parallel connection:\n", "Assembly: data1\n", "[####################] 100% 0:00:06 | sorting reads | s1 |\n", "[####################] 100% 0:00:01 | writing/compressing | s1 |\n", "New Assembly: data2\n", "establishing parallel connection:\n", "Assembly: data2\n", "[####################] 100% 0:00:05 | sorting reads | s1 |\n", "[####################] 100% 0:00:01 | writing/compressing | s1 |\n", "New Assembly: alldata\n", "establishing parallel connection:\n", "Assembly: alldata\n", "[####################] 100% 0:00:01 | concatenating inputs | s2 |\n", "[####################] 100% 0:00:08 | processing reads | s2 |\n", "[####################] 100% 0:00:00 | indexing reference | s3 |\n", "[####################] 100% 0:00:00 | concatenating | s3 |\n", "[####################] 100% 0:00:03 | join unmerged pairs | s3 |\n", "[####################] 100% 0:00:03 | dereplicating | s3 |\n", "[####################] 100% 0:00:00 | splitting dereps | s3 |\n", "[####################] 100% 0:00:04 | mapping reads | s3 |\n", "[####################] 100% 0:00:06 | building clusters | s3 |\n", "[####################] 100% 0:00:00 | calc cluster stats | s3 |\n", "[####################] 100% 0:00:03 | inferring [H, E] | s4 |\n", "[####################] 100% 0:00:00 | calculating depths | s5 |\n", "[####################] 100% 0:00:00 | chunking clusters | s5 |\n", "[####################] 100% 0:00:21 | consens calling | s5 |\n", "[####################] 100% 0:00:02 | indexing alleles | s5 |\n", "[####################] 100% 0:00:00 | concatenating bams | s6 |\n", "[####################] 100% 0:00:00 | fetching regions | s6 |\n", "[####################] 100% 0:00:00 | building loci | s6 |\n", "[####################] 100% 0:00:03 | applying filters | s7 |\n", "[####################] 100% 0:00:02 | building arrays | s7 
|\n", "[####################] 100% 0:00:00 | writing conversions | s7 |\n", "[####################] 100% 0:00:01 | indexing vcf depths | s7 |\n", "[####################] 100% 0:00:01 | writing vcf output | s7 |\n" ] } ], "source": [ "# demultiplex the first lane of data (step 1)\n", "data1 = ip.Assembly(\"data1\")\n", "data1.params.raw_fastq_path = \"./ipsimdata/pairddrad_example_R*.gz\"\n", "data1.params.barcodes_path = \"./ipsimdata/pairddrad_example_barcodes.txt\"\n", "data1.params.assembly_method = \"reference\"\n", "data1.params.reference_sequence = \"./ipsimdata/pairddrad_example_genome.fa\"\n", "data1.params.datatype = \"pairddrad\"\n", "data1.run('1', force=True, launch_client=True)\n", "\n", "# demultiplex the second lane of data (step 1)\n", "data2 = ip.Assembly(\"data2\")\n", "data2.params.raw_fastq_path = \"./ipsimdata/pairddrad_example_R*.gz\"\n", "data2.params.barcodes_path = \"./ipsimdata/pairddrad_example_barcodes.txt\"\n", "data2.params.assembly_method = \"reference\"\n", "data2.params.reference_sequence = \"./ipsimdata/pairddrad_example_genome.fa\"\n", "data2.params.datatype = \"pairddrad\"\n", "data2.run('1', force=True, launch_client=True)\n", "\n", "# merge the two assemblies, then run steps 2-7 on the merged Assembly\n", "mdata = ip.merge(\"alldata\", [data1, data2])\n", "mdata.run(\"234567\", force=True, launch_client=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Merging samples\n", "Sometimes you may wish to merge samples within an Assembly, as shown below. 
" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New Assembly: data1\n", "establishing parallel connection:\n", "Assembly: data1\n", "[####################] 100% 0:00:04 | sorting reads | s1 |\n", "[####################] 100% 0:00:01 | writing/compressing | s1 |\n" ] } ], "source": [ "# demux first lane of data\n", "data1 = ip.Assembly(\"data1\")\n", "data1.params.raw_fastq_path = \"./ipsimdata/rad_example_R1_.fastq.gz\"\n", "data1.params.barcodes_path = \"./ipsimdata/rad_example_barcodes.txt\"\n", "data1.run('1', force=True, launch_client=True)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "New Assembly: merged\n", "establishing parallel connection:\n", "Assembly: merged\n", "[####################] 100% 0:00:00 | concatenating inputs | s2 |\n", "[####################] 100% 0:00:03 | processing reads | s2 |\n", "[####################] 100% 0:00:00 | concatenating | s3 |\n", "[####################] 100% 0:00:02 | dereplicating | s3 |\n", "[####################] 100% 0:00:01 | clustering/mapping | s3 |\n", "[####################] 100% 0:00:00 | building clusters | s3 |\n", "[####################] 100% 0:00:00 | chunking clusters | s3 |\n", "[####################] 100% 0:00:07 | aligning clusters | s3 |\n", "[####################] 100% 0:00:00 | concat clusters | s3 |\n", "[####################] 100% 0:00:00 | calc cluster stats | s3 |\n", "[####################] 100% 0:00:03 | inferring [H, E] | s4 |\n", "[####################] 100% 0:00:00 | calculating depths | s5 |\n", "[####################] 100% 0:00:00 | chunking clusters | s5 |\n", "[####################] 100% 0:00:10 | consens calling | s5 |\n", "[####################] 100% 0:00:00 | indexing alleles | s5 |\n", "[####################] 100% 0:00:00 | concatenating inputs | s6 |\n", "[####################] 100% 0:00:00 | clustering 
across | s6 |\n", "[####################] 100% 0:00:00 | building clusters | s6 |\n", "[####################] 100% 0:00:08 | aligning clusters | s6 |\n", "[####################] 100% 0:00:02 | applying filters | s7 |\n", "[####################] 100% 0:00:02 | building arrays | s7 |\n", "[####################] 100% 0:00:00 | writing conversions | s7 |\n", "[####################] 100% 0:00:00 | indexing vcf depths | s7 |\n", "[####################] 100% 0:00:01 | writing vcf output | s7 |\n" ] }, { "data": { "text/html": [ "
| | state | reads_raw | reads_passed_filter | clusters_total | clusters_hidepth | hetero_est | error_est | reads_consens |
|---|---|---|---|---|---|---|---|---|
| 2E_0 | 6 | 20017 | 20017 | 1000 | 1000 | 0.001830 | 0.000766 | 1000 |
| 2F_0 | 6 | 19933 | 19933 | 1000 | 1000 | 0.001996 | 0.000755 | 1000 |
| 2G_0 | 6 | 20030 | 20030 | 1000 | 1000 | 0.001940 | 0.000763 | 1000 |
| 2H_0 | 6 | 20199 | 20199 | 1000 | 1000 | 0.001747 | 0.000756 | 1000 |
| 3I_0 | 6 | 19885 | 19885 | 1000 | 1000 | 0.001807 | 0.000758 | 1000 |
| 3J_0 | 6 | 19822 | 19822 | 1000 | 1000 | 0.001931 | 0.000776 | 1000 |
| 3K_0 | 6 | 19965 | 19965 | 1000 | 1000 | 0.002092 | 0.000766 | 1000 |
| 3L_0 | 6 | 20008 | 20008 | 1000 | 1000 | 0.002042 | 0.000748 | 1000 |
| A | 6 | 60041 | 60041 | 1000 | 1000 | 0.009826 | 0.000833 | 1000 |
| D | 6 | 19966 | 19966 | 1000 | 1000 | 0.001803 | 0.000761 | 1000 |