{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1><span style=\"color:gray\">ipyrad-analysis toolkit:</span> construct </h1>\n",
    "\n",
    "The program construct is a STRUCTURE-like tool that incorporates expectations of isolation by distance. It is available as an R package. This notebook demonstrates how to convert data to the proper format for use in construct using simulated data as an example. For details on running construct see [their documentation](https://github.com/gbradburd/conStruct). "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Required software"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# conda install ipyrad ipcoal -c conda-forge -c bioconda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ipyrad.analysis as ipa\n",
    "import toytree\n",
    "import ipcoal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ipyrad 0.9.54\n",
      "toytree 2.0.1\n",
      "ipcoal 0.1.4\n"
     ]
    }
   ],
   "source": [
    "print('ipyrad', ipa.__version__)\n",
    "print('toytree', toytree.__version__)\n",
    "print('ipcoal', ipcoal.__version__)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simulate example data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div class=\"toyplot\" id=\"t328f9fefed68410391b713b8c5b3630a\" style=\"text-align:center\"><svg class=\"toyplot-canvas-Canvas\" height=\"275.0px\" id=\"t8ffe99f416da454d9cdd7d42956fe43c\" preserveAspectRatio=\"xMidYMid meet\" style=\"background-color:transparent;border-color:#292724;border-style:none;border-width:1.0;fill:rgb(16.1%,15.3%,14.1%);fill-opacity:1.0;font-family:Helvetica;font-size:12px;opacity:1.0;stroke:rgb(16.1%,15.3%,14.1%);stroke-opacity:1.0;stroke-width:1.0\" viewBox=\"0 0 260.0 275.0\" width=\"260.0px\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:toyplot=\"http://www.sandia.gov/toyplot\" xmlns:xlink=\"http://www.w3.org/1999/xlink\"><g class=\"toyplot-coordinates-Cartesian\" id=\"t8b3f0da8bd694611b46f0835b72afc53\"><clipPath id=\"t49822f30887946dda64bd3a71d1d38d3\"><rect height=\"215.0\" width=\"200.0\" x=\"30.0\" y=\"30.0\"></rect></clipPath><g clip-path=\"url(#t49822f30887946dda64bd3a71d1d38d3)\"><g class=\"toytree-mark-Toytree\" id=\"t7a8a6a38eb494d78974aa19881c80edb\"><g class=\"toytree-Edges\" style=\"fill:none;stroke:rgb(14.9%,14.9%,14.9%);stroke-linecap:round;stroke-opacity:1;stroke-width:2\"><path d=\"M 56.9 120.0 L 91.1 80.1\" id=\"12,11\"></path><path d=\"M 56.9 120.0 L 91.1 159.8\" id=\"12,10\"></path><path d=\"M 91.1 80.1 L 125.4 99.3\" id=\"11,9\"></path><path d=\"M 91.1 159.8 L 125.4 182.1\" id=\"10,8\"></path><path d=\"M 125.4 182.1 L 159.7 201.2\" id=\"8,7\"></path><path d=\"M 91.1 80.1 L 194.0 61.0\" id=\"11,6\"></path><path d=\"M 125.4 99.3 L 194.0 86.5\" id=\"9,5\"></path><path d=\"M 125.4 99.3 L 194.0 112.0\" id=\"9,4\"></path><path d=\"M 91.1 159.8 L 194.0 137.5\" id=\"10,3\"></path><path d=\"M 125.4 182.1 L 194.0 163.0\" id=\"8,2\"></path><path d=\"M 159.7 201.2 L 194.0 188.5\" id=\"7,1\"></path><path d=\"M 159.7 201.2 L 194.0 214.0\" id=\"7,0\"></path></g><g class=\"toytree-AlignEdges\" style=\"stroke:rgb(66.3%,66.3%,66.3%);stroke-dasharray:2, 4;stroke-linecap:round;stroke-opacity:1.0;stroke-width:2\"><path d=\"M 194.0 214.0 L 194.0 214.0\"></path><path d=\"M 194.0 188.5 L 194.0 188.5\"></path><path d=\"M 194.0 163.0 L 194.0 163.0\"></path><path d=\"M 194.0 137.5 L 194.0 137.5\"></path><path d=\"M 194.0 112.0 L 194.0 112.0\"></path><path d=\"M 194.0 86.5 L 194.0 86.5\"></path><path d=\"M 194.0 61.0 L 194.0 61.0\"></path></g><g class=\"toytree-AdmixEdges\" style=\"fill:none;stroke:rgba(90.6%,54.1%,76.5%,1.000);stroke-linecap:round;stroke-opacity:0.6;stroke-width:5\"><path d=\"M 194.0 137.5 L 159.7 144.9 L 159.7 172.6 L 125.4 182.1\" style=\"stroke:rgb(98.8%,55.3%,38.4%);stroke-opacity:0.7\"></path></g><g class=\"toytree-Nodes\" style=\"fill:rgb(10.6%,62%,46.7%);fill-opacity:1.0;stroke:rgb(100%,100%,100%);stroke-opacity:1.0;stroke-width:1.5\"><g id=\"node-0\" transform=\"translate(193.998,213.986)\"><circle r=\"4.0\"></circle></g><g id=\"node-1\" transform=\"translate(193.998,188.491)\"><circle r=\"4.0\"></circle></g><g id=\"node-2\" transform=\"translate(193.998,162.995)\"><circle r=\"4.0\"></circle></g><g id=\"node-3\" transform=\"translate(193.998,137.500)\"><circle r=\"4.0\"></circle></g><g id=\"node-4\" transform=\"translate(193.998,112.005)\"><circle r=\"4.0\"></circle></g><g id=\"node-5\" transform=\"translate(193.998,86.509)\"><circle r=\"4.0\"></circle></g><g id=\"node-6\" transform=\"translate(193.998,61.014)\"><circle r=\"4.0\"></circle></g><g id=\"node-7\" transform=\"translate(159.713,201.238)\"><circle r=\"4.0\"></circle></g><g id=\"node-8\" transform=\"translate(125.428,182.117)\"><circle r=\"4.0\"></circle></g><g id=\"node-9\" transform=\"translate(125.428,99.257)\"><circle r=\"4.0\"></circle></g><g id=\"node-10\" transform=\"translate(91.142,159.808)\"><circle r=\"4.0\"></circle></g><g id=\"node-11\" transform=\"translate(91.142,80.135)\"><circle r=\"4.0\"></circle></g><g id=\"node-12\" transform=\"translate(56.857,119.972)\"><circle r=\"4.0\"></circle></g></g><g class=\"toytree-TipLabels\" style=\"fill:rgb(14.9%,14.9%,14.9%);fill-opacity:1.0;font-family:helvetica;font-size:11px;font-weight:normal;stroke:none;white-space:pre\"><g transform=\"translate(194.00,213.99)rotate(0)\"><text style=\"\" x=\"15.00\" y=\"2.81\">r0</text></g><g transform=\"translate(194.00,188.49)rotate(0)\"><text style=\"\" x=\"15.00\" y=\"2.81\">r1</text></g><g transform=\"translate(194.00,163.00)rotate(0)\"><text style=\"\" x=\"15.00\" y=\"2.81\">r2</text></g><g transform=\"translate(194.00,137.50)rotate(0)\"><text style=\"\" x=\"15.00\" y=\"2.81\">r3</text></g><g transform=\"translate(194.00,112.00)rotate(0)\"><text style=\"\" x=\"15.00\" y=\"2.81\">r4</text></g><g transform=\"translate(194.00,86.51)rotate(0)\"><text style=\"\" x=\"15.00\" y=\"2.81\">r5</text></g><g transform=\"translate(194.00,61.01)rotate(0)\"><text style=\"\" x=\"15.00\" y=\"2.81\">r6</text></g></g></g></g></g></svg><div class=\"toyplot-behavior\"><script>(function()\n",
       "{\n",
       "var modules={};\n",
       "})();</script></div></div>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# network model\n",
    "tree = toytree.rtree.unittree(7, treeheight=3e6, seed=123)\n",
    "tree.draw(ts='o', admixture_edges=(3, 2));"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "wrote 250 SNPs to /tmp/test-construct.snps.hdf5\n"
     ]
    }
   ],
   "source": [
    "# simulation model with admixture and missing data\n",
    "model = ipcoal.Model(tree, Ne=1e4, nsamples=4, admixture_edges=(3, 2, 0.5, 0.1))\n",
    "model.sim_snps(250)\n",
    "model.write_snps_to_hdf5(name=\"test-construct\", outdir=\"/tmp\", diploid=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Input data file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the path to your HDF5 formatted snps file\n",
    "SNPS = \"/tmp/test-construct.snps.hdf5\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Population assignments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "IMAP = {\n",
    "    \"r0\": [\"r0-0\", \"r0-1\"],\n",
    "    \"r1\": [\"r1-0\", \"r1-1\"],\n",
    "    \"r2\": [\"r2-0\", \"r2-1\"],\n",
    "    \"r3\": [\"r3-0\", \"r3-1\"],\n",
    "    \"r4\": [\"r4-0\", \"r4-1\"],\n",
    "    \"r5\": [\"r5-0\", \"r5-1\"],\n",
    "    \"r6\": [\"r6-0\", \"r6-1\"],\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Filter missing data and convert to genotype frequencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Samples: 14\n",
      "Sites before filtering: 250\n",
      "Filtered (indels): 0\n",
      "Filtered (bi-allel): 9\n",
      "Filtered (mincov): 0\n",
      "Filtered (minmap): 0\n",
      "Filtered (subsample invariant): 0\n",
      "Filtered (combined): 9\n",
      "Sites after filtering: 241\n",
      "Sites containing missing values: 0 (0.00%)\n",
      "Missing values in SNP matrix: 0 (0.00%)\n"
     ]
    }
   ],
   "source": [
    "# apply filtering to the SNPs file\n",
    "tool = ipa.snps_extracter(data=SNPS, imap=IMAP, minmap={i:2 for i in IMAP})\n",
    "tool.parse_genos_from_hdf5();"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>...</th>\n",
       "      <th>231</th>\n",
       "      <th>232</th>\n",
       "      <th>233</th>\n",
       "      <th>234</th>\n",
       "      <th>235</th>\n",
       "      <th>236</th>\n",
       "      <th>237</th>\n",
       "      <th>238</th>\n",
       "      <th>239</th>\n",
       "      <th>240</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>r0</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.25</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>r1</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>r2</th>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>r3</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>r4</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.0</td>\n",
       "      <td>...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.25</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 241 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    0    1    2    3    4    5    6    7     8    9    ...  231  232  233  \\\n",
       "r0  0.0  0.0  1.0  0.0  1.0  0.0  0.0  0.0  0.25  0.0  ...  0.0  0.0  0.0   \n",
       "r1  0.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  0.00  0.0  ...  0.0  0.0  0.0   \n",
       "r2  0.0  1.0  1.0  0.0  1.0  0.0  0.0  0.0  0.00  0.0  ...  0.0  0.0  0.0   \n",
       "r3  0.0  0.0  0.0  0.0  1.0  0.0  0.0  1.0  0.00  0.0  ...  0.0  1.0  0.0   \n",
       "r4  1.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.00  0.0  ...  1.0  0.0  1.0   \n",
       "\n",
       "    234  235  236  237  238  239   240  \n",
       "r0  0.0  0.0  1.0  0.0  0.0  0.0  0.00  \n",
       "r1  0.0  0.0  1.0  0.0  0.0  0.0  0.00  \n",
       "r2  0.0  0.0  0.0  0.0  0.0  0.0  0.00  \n",
       "r3  0.0  0.0  0.0  0.0  0.0  0.0  0.00  \n",
       "r4  1.0  0.0  0.0  0.0  0.0  0.0  0.25  \n",
       "\n",
       "[5 rows x 241 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# convert SNP data to genotype frequencies\n",
    "df = tool.get_population_geno_frequency()\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Write data to file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write to a file\n",
    "df.to_csv(\"/tmp/freqs.csv\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}