{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# genetic distances\n", "\n", "The distances computed here are not true genetic distances between samples, because the input contains only sites that are already variable. They are instead useful as relative genetic distance matrices among samples. Calculating true genetic distances would require dividing by the total number of sites examined (again taking missing data into account). A substitution model correction for homoplasy could also be applied. Neither is implemented yet." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### required software" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# conda install ipyrad -c bioconda\n", "# conda install toyplot -c eaton-lab (optional)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import ipyrad.analysis as ipa\n", "import toyplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Short tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set up input files and params" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# the path to your VCF or HDF5 formatted snps file\n", "data = \"/home/deren/Downloads/cranos_pop4.snps.hdf5\"\n", "names = ipa.snps_extracter(data).names" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# build imap (population -> sample names) and minmap (min samples per population) from names\n", "imap = {\n", " \"DE_1\": [i for i in names if \"DE_1.\" in i],\n", " \"DE_4\": [i for i in names if \"DE_4.\" in i],\n", " \"DE_6\": [i for i in names if \"DE_6.\" in i],\n", " \"DE_8\": [i for i in names if \"DE_8.\" in i],\n", " \"DE_9\": [i for i in names if \"DE_9.\" in i],\n", " \"DE_10\": [i for i in names if \"DE_10.\" in i],\n", " \"DE_11\": [i for i in names if \"DE_11.\" in i],\n", " \"DE_12\": [i for i in names if \"DE_12.\"
in i],\n", " \"DE_15\": [i for i in names if \"DE_15.\" in i],\n", " \"DE_18\": [i for i in names if \"DE_18.\" in i],\n", " \"DE_19\": [i for i in names if \"DE_19.\" in i],\n", " \"DE_22\": [i for i in names if \"DE_22.\" in i],\n", " \"DE_23\": [i for i in names if \"DE_23.\" in i],\n", " \"DE_24\": [i for i in names if \"DE_24.\" in i],\n", " \"DE_26\": [i for i in names if \"DE_26.\" in i],\n", "}\n", "\n", "minmap={\n", " \"DE_1\": 4,\n", " \"DE_4\": 4,\n", " \"DE_6\": 4,\n", " \"DE_8\": 4,\n", " \"DE_9\": 4,\n", " \"DE_10\": 4,\n", " \"DE_11\": 4,\n", " \"DE_12\": 4,\n", " \"DE_15\": 4,\n", " \"DE_18\": 4,\n", " \"DE_19\": 4,\n", " \"DE_22\": 4,\n", " \"DE_23\": 4,\n", " \"DE_24\": 4,\n", " \"DE_26\": 4,\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### calculate distances" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Samples: 102\n", "Sites before filtering: 92109\n", "Filtered (indels): 15037\n", "Filtered (bi-allel): 6345\n", "Filtered (mincov): 657\n", "Filtered (minmap): 7580\n", "Filtered (combined): 20625\n", "Sites after filtering: 71484\n", "Sites containing missing values: 62367 (87.25%)\n", "Missing values in SNP matrix: 314975 (4.32%)\n", "Imputation (sampled by freq. 
within pops): 94.2%, 2.9%, 2.9%\n" ] } ], "source": [ "# load the snp data into distance tool with arguments\n", "from ipyrad.analysis.distance import Distance\n", "dist = Distance(\n", " data=data, \n", " imap=imap,\n", " minmap=minmap,\n", " mincov=0.5,\n", " impute_method=\"sample\",\n", " subsample_snps=False,\n", ")\n", "dist.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### save results" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# save to a CSV file\n", "dist.dists.to_csv(\"cranolopha_distances.csv\")" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# save to a CSV file with no labels (eems style)\n", "dist.dists.to_csv(\n", " \"cranolopha_distances_eems.csv\",\n", " header=None,\n", " index=False,\n", " sep=\" \",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Draw the matrix\n", "Hover over cells to see values in a pop-up." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "