{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# stdpopsim example notebook\n", "\n", "#### This notebook serves as a place to play with ``stdpopsim`` in a ready to go environment.\n", "\n", "If you did not open this notebook via Binder, try clicking [](https://mybinder.org/v2/gh/popsim-consortium/stdpopsim/main?filepath=stdpopsim_example.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Instructions on using a Jupyter Notebook__ - Simply click the cell and press shift-enter, or click the \"Run\" button in the top panel. \n", "*Note: If the notebook seems slow to run, try restarting the kernel.*\n", "*If you want to save your work while using Binder, making sure to save and download your notebook*\n", "\n", "To run ``stdpopsim`` locally, make sure to follow the [installation instructions](https://stdpopsim.readthedocs.io/en/latest/installation.html)\n", "\n", "--------------------------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## API example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### To get started, let's run a simple simulation and compute the site frequency spectrum." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the necessary packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import stdpopsim\n", "import matplotlib.pyplot as plt\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simulate 10% of human chromosome 22 with the [three population out-of-Africa model](https://stdpopsim.readthedocs.io/en/latest/catalog.html#sec_catalog_homsap_models_outofafrica_3g09), with 10 samples from each of the three populations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "species = stdpopsim.get_species(\"HomSap\")\n", "contig = species.get_contig(\"chr22\", length_multiplier=0.1)\n", "model = species.get_demographic_model(\"OutOfAfrica_3G09\")\n", "samples = model.get_samples(10, 10, 10)\n", "engine = stdpopsim.get_default_engine()\n", "ts = engine.simulate(model, contig, samples)\n", "print(\"simulated {} trees and {} sites, from {} samples.\".format(ts.num_trees, ts.num_sites, ts.num_samples))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we get subsets of the data corresponding to each set of samples and then we simplfy to get the tree sequences corresponding to these subsets of samples. This removes monomorphic sites within each population." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "YRI_samples = ts.samples(0)\n", "CEU_samples = ts.samples(1)\n", "CHB_samples = ts.samples(2)\n", "ts_YRI = ts.simplify(samples=YRI_samples)\n", "ts_CEU = ts.simplify(samples=CEU_samples)\n", "ts_CHB = ts.simplify(samples=CHB_samples)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we calculate the site frequency spectrum for each population." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sfs_YRI = ts_YRI.allele_frequency_spectrum(polarised=True,span_normalise=False)\n", "sfs_CEU = ts_CEU.allele_frequency_spectrum(polarised=True,span_normalise=False)\n", "sfs_CHB = ts_CHB.allele_frequency_spectrum(polarised=True,span_normalise=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we plot the site frequency spectrum for each population." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bar_width = 0.2\n", "r1 = np.arange(0,11) - 0.2\n", "r2 = [x + bar_width for x in r1]\n", "r3 = [x + bar_width for x in r2]\n", "ax = plt.subplot()\n", "plt.bar(x = r1, height = sfs_YRI/ts_YRI.num_sites, width=bar_width, color='grey')\n", "plt.bar(x = r2, height = sfs_CEU/ts_CEU.num_sites, width=bar_width, color='lightblue')\n", "plt.bar(x = r3, height = sfs_CHB/ts_CHB.num_sites, width=bar_width, color='green')\n", "plt.xlabel(\"Allele count\", fontweight=\"bold\")\n", "plt.ylabel(\"Proportion of mutated sites in sample\", fontweight=\"bold\")\n", "ax.set_xticks(np.arange(0,11))\n", "ax.legend(['YRI', 'CEU', 'CHB'])\n", "plt.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CLI Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's run the same simulation model as above, but with the CLI." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simulate 10% of human chromosome 22 with the [three population out-of-Africa model](https://stdpopsim.readthedocs.io/en/latest/catalog.html#sec_catalog_homsap_models_outofafrica_3g09), with 10 samples from each of the three populations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "stdpopsim HomSap --chromosome chr22 --length-multiplier 0.1 --demographic-model OutOfAfrica_3G09 --output OutOfAfrica_3G09.trees 10 10 10" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we can use tskit to read the simulated file and calculate the site frequency spectrum." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tskit\n", "ts = tskit.load(\"OutOfAfrica_3G09.trees\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once we have loaded the simulated tree sequence file, we can use the same code as above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "YRI_samples = ts.samples(0)\n", "CEU_samples = ts.samples(1)\n", "CHB_samples = ts.samples(2)\n", "ts_YRI = ts.simplify(samples=YRI_samples)\n", "ts_CEU = ts.simplify(samples=CEU_samples)\n", "ts_CHB = ts.simplify(samples=CHB_samples)\n", "\n", "sfs_YRI = ts_YRI.allele_frequency_spectrum(polarised=True,span_normalise=False)\n", "sfs_CEU = ts_CEU.allele_frequency_spectrum(polarised=True,span_normalise=False)\n", "sfs_CHB = ts_CHB.allele_frequency_spectrum(polarised=True,span_normalise=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bar_width = 0.2\n", "r1 = np.arange(0,11) - 0.2\n", "r2 = [x + bar_width for x in r1]\n", "r3 = [x + bar_width for x in r2]\n", "ax = plt.subplot()\n", "plt.bar(x = r1, height = sfs_YRI/ts_YRI.num_sites, width=bar_width, color='grey')\n", "plt.bar(x = r2, height = sfs_CEU/ts_CEU.num_sites, width=bar_width, color='lightblue')\n", "plt.bar(x = r3, height = sfs_CHB/ts_CHB.num_sites, width=bar_width, color='green')\n", "plt.xlabel(\"Allele count\", fontweight=\"bold\")\n", "plt.ylabel(\"Proportion of mutated sites in sample\", fontweight=\"bold\")\n", "ax.set_xticks(np.arange(0,11))\n", "ax.legend(['YRI', 'CEU', 'CHB'])\n", "plt.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-------------------------------\n", "## Try out the [tutorials](https://stdpopsim.readthedocs.io/en/latest/tutorial.html#) yourself\n", "\n", "*Remember, if you want to play with the CLI, use the Jupyter Notebook bash magic at the beginning of the cell `%%bash`*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }