{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

ipyrad-analysis toolkit: construct

\n", "\n", "The program construct is a STRUCTURE-like tool that incorporates expectations of isolation by distance. It is available as an R package. This notebook demonstrates how to convert data to the proper format for use in construct using simulated data as an example. For details on running construct see [their documentation](https://github.com/gbradburd/conStruct). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Required software" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# conda install ipyrad ipcoal -c conda-forge -c bioconda" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import ipyrad.analysis as ipa\n", "import toytree\n", "import ipcoal" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ipyrad 0.9.54\n", "toytree 2.0.1\n", "ipcoal 0.1.4\n" ] } ], "source": [ "print('ipyrad', ipa.__version__)\n", "print('toytree', toytree.__version__)\n", "print('ipcoal', ipcoal.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulate example data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
r0r1r2r3r4r5r6
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# network model\n", "tree = toytree.rtree.unittree(7, treeheight=3e6, seed=123)\n", "tree.draw(ts='o', admixture_edges=(3, 2));" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wrote 250 SNPs to /tmp/test-construct.snps.hdf5\n" ] } ], "source": [ "# simulation model with admixture and missing data\n", "model = ipcoal.Model(tree, Ne=1e4, nsamples=4, admixture_edges=(3, 2, 0.5, 0.1))\n", "model.sim_snps(250)\n", "model.write_snps_to_hdf5(name=\"test-construct\", outdir=\"/tmp\", diploid=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Input data file" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# the path to your HDF5 formatted snps file\n", "SNPS = \"/tmp/test-construct.snps.hdf5\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Population assignments" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "IMAP = {\n", " \"r0\": [\"r0-0\", \"r0-1\"],\n", " \"r1\": [\"r1-0\", \"r1-1\"],\n", " \"r2\": [\"r2-0\", \"r2-1\"],\n", " \"r3\": [\"r3-0\", \"r3-1\"],\n", " \"r4\": [\"r4-0\", \"r4-1\"],\n", " \"r5\": [\"r5-0\", \"r5-1\"],\n", " \"r6\": [\"r6-0\", \"r6-1\"],\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Filter missing data and convert to genotype frequencies" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Samples: 14\n", "Sites before filtering: 250\n", "Filtered (indels): 0\n", "Filtered (bi-allel): 9\n", "Filtered (mincov): 0\n", "Filtered (minmap): 0\n", "Filtered (subsample invariant): 0\n", "Filtered (combined): 9\n", "Sites after filtering: 241\n", "Sites containing missing values: 0 (0.00%)\n", "Missing values in SNP matrix: 0 (0.00%)\n" ] } ], "source": [ "# apply filtering to the SNPs file\n", "tool = ipa.snps_extracter(data=SNPS, imap=IMAP, minmap={i:2 for i in IMAP})\n", "tool.parse_genos_from_hdf5();" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123456789...231232233234235236237238239240
r00.00.01.00.01.00.00.00.00.250.0...0.00.00.00.00.01.00.00.00.00.00
r10.00.01.00.01.00.01.00.00.000.0...0.00.00.00.00.01.00.00.00.00.00
r20.01.01.00.01.00.00.00.00.000.0...0.00.00.00.00.00.00.00.00.00.00
r30.00.00.00.01.00.00.01.00.000.0...0.01.00.00.00.00.00.00.00.00.00
r41.00.00.00.00.01.00.00.00.000.0...1.00.01.01.00.00.00.00.00.00.25
\n", "

5 rows × 241 columns

\n", "
" ], "text/plain": [ " 0 1 2 3 4 5 6 7 8 9 ... 231 232 233 \\\n", "r0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.25 0.0 ... 0.0 0.0 0.0 \n", "r1 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.00 0.0 ... 0.0 0.0 0.0 \n", "r2 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.00 0.0 ... 0.0 0.0 0.0 \n", "r3 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.00 0.0 ... 0.0 1.0 0.0 \n", "r4 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.00 0.0 ... 1.0 0.0 1.0 \n", "\n", " 234 235 236 237 238 239 240 \n", "r0 0.0 0.0 1.0 0.0 0.0 0.0 0.00 \n", "r1 0.0 0.0 1.0 0.0 0.0 0.0 0.00 \n", "r2 0.0 0.0 0.0 0.0 0.0 0.0 0.00 \n", "r3 0.0 0.0 0.0 0.0 0.0 0.0 0.00 \n", "r4 1.0 0.0 0.0 0.0 0.0 0.0 0.25 \n", "\n", "[5 rows x 241 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert SNP data to genotype frequencies\n", "df = tool.get_population_geno_frequency()\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Write data to file" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "# write to a file\n", "df.to_csv(\"/tmp/freqs.csv\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }