{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "This notebook replicates part of the [E-vident](https://github.com/biocore/evident) analysis platform, allowing you to explore a series of different distance metrics, and rarefaction levels by leveraging the Jupyter interface available in **Emperor**.\n", "\n", "Before you execute this example, you need to make sure you install a few additional dependencies:\n", "\n", "```\n", "pip install scikit-learn ipywidgets h5py biom-format qiime_default_reference\n", "```\n", "\n", "Once you've done this, you will need to enable the `ipywidgets` interface, to do so, you will need to run:\n", "\n", "```\n", "jupyter nbextension enable --py widgetsnbextension\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "%matplotlib inline\n", "from emperor import Emperor, nbinstall\n", "nbinstall()\n", "\n", "from skbio.stats.ordination import pcoa\n", "from skbio.diversity import beta_diversity\n", "from skbio import TreeNode\n", "\n", "from biom import load_table\n", "from biom.util import biom_open\n", "\n", "import qiime_default_reference\n", "\n", "# pydata/scipy\n", "import pandas as pd\n", "import numpy as np\n", "\n", "from scipy.spatial.distance import braycurtis, canberra\n", "from ipywidgets import interact\n", "from sklearn.metrics import pairwise_distances\n", "from functools import partial\n", "\n", "import warnings\n", "\n", "warnings.filterwarnings(action='ignore', category=Warning)\n", "\n", "# -1 means all the processors available\n", "pw_dists = partial(pairwise_distances, n_jobs=-1)\n", "\n", "def load_mf(fn, index='#SampleID'):\n", " _df = pd.read_csv(fn, sep='\\t', dtype=str, keep_default_na=False, na_values=[])\n", " _df.set_index(index, inplace=True)\n", " return _df" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We are going to load data from [Fierer et al. 2010](http://www.pnas.org/content/107/14/6477.full) (the data was retrieved from study [232](https://qiita.ucsd.edu/study/description/232) in [Qiita](https://qiita.ucsd.edu), remember you need to be logged in to access the study).\n", "\n", "We will load this as a [QIIME](http://qiime.org) mapping file and as a [BIOM](http://biom-format.org) OTU table." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "mf = load_mf('keyboard/mapping-file.txt')\n", "bt = load_table('keyboard/otu-table.biom')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Now we will load a reference database using [scikit-bio](http://scikit-bio.org)'s TreeNode object. The reference itself is as provided by [Greengenes](http://greengenes.secondgenome.com/downloads)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "tree = TreeNode.read(qiime_default_reference.get_reference_tree())\n", "\n", "for n in tree.traverse():\n", " if n.length is None:\n", " n.length = 0" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The function `evident` uses the OTU table (`bt`), the mapping file (`mf`), and the phylogenetic tree (`tree`), to construct a distance matrix and ordinate it using principal coordinates analysis.\n", "\n", "To exercise this function, we build a small ipywidgets function that will let us experiment with a variety of rarefaction levels and distance metrics." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "def evident(n, metric):\n", " rarefied = bt.subsample(n)\n", " data = np.array([rarefied.data(i) for i in rarefied.ids()], dtype='int64')\n", " \n", " if metric in ['unweighted_unifrac', 'weighted_unifrac']:\n", " res = pcoa(beta_diversity(metric, data, rarefied.ids(),\n", " otu_ids=rarefied.ids('observation'),\n", " tree=tree, pairwise_func=pw_dists))\n", " else:\n", " res = pcoa(beta_diversity(metric, data, rarefied.ids(),\n", " pairwise_func=pw_dists))\n", " # If you want to share your notebook via GitHub use `remote=True` and\n", " # make sure you share your notebook using nbviewer.\n", " return Emperor(res, mf, remote=False)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "**Note** that the ipywidgets themselves, will not be visible unless you are executing this notebook i.e. by running your own Jupyter server." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "interact(evident, n=(200, 2000, 50),\n", " metric=['unweighted_unifrac', 'weighted_unifrac', 'braycurtis', 'euclidean'],\n", " __manual=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 0 }