{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "source": [ "# Emperor's Python API\n", "\n", "**This notebook demonstrate Emperor's new Python API, which can and will change as we continue to exercise this interface, for more information, have a look at the [pull request here](https://github.com/biocore/emperor/pull/405).**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true, "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import pandas as pd, numpy as np\n", "\n", "from emperor import Emperor, nbinstall\n", "\n", "from skbio import OrdinationResults\n", "from skbio.io.util import open_file\n", "from skbio.stats.composition import clr, centralize, closure\n", "\n", "from scipy.spatial.distance import euclidean\n", "\n", "from biom import load_table\n", "\n", "nbinstall()\n", "\n", "def load_mf(fn, index='#SampleID'):\n", " _mf = pd.read_csv(fn, sep='\\t', dtype=str, keep_default_na=False, na_values=[])\n", " _mf.set_index(index, inplace=True)\n", " return _mf" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "In this notebook we are going to showcase how to visualize a biplot using Emperor. To exemplify this, we are going to load data from [Reber et al. 2016](https://www.ncbi.nlm.nih.gov/pubmed/27185913) (the data was retrieved from study [1634](https://qiita.ucsd.edu/study/description/1634) in [Qiita](https://qiita.ucsd.edu), remember you need to be logged in to access the study). Specifically, here we will reproduce *Figure S4*." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We start by loading the sample metadata and a BIOM table that has already been rarefied to an even depth of 20,000 sequences per sample (this table was generated using a closed reference protocol)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "bt = load_table('ptsd-mice/table.biom')\n", "mf = load_mf('ptsd-mice/mapping-file.tsv')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Next we are going to create a table of metadata for the bacteria represented in this table. In this example we are only going to use the taxonomic information, but you could add any additional information that you have access to. Note that we only use the genus level (`'taxonomy_5'`) as our category to collapse the OTUs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "feature_mf = bt.metadata_to_dataframe('observation')\n", "feature_mf = feature_mf.reset_index(drop=True).drop_duplicates(subset=['taxonomy_5']).copy()\n", "\n", "feature_mf.set_index('taxonomy_5', inplace=True, )\n", "feature_mf.index.name = 'FeatureID'" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "In the original figure, the authors created the ordination based on a table collapsed at the genus level." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "collapse_genus = lambda id_, x: x['taxonomy'][5]\n", "\n", "bt = bt.collapse(collapse_genus, norm=False, min_group_size=1,\n", " axis='observation')" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Lastly, we compute a compositional Principal Components Analysis ordination and select only the 10 most important features (meaning that in the plot we will only see 10 arrows)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "table = bt.to_dataframe()\n", "\n", "mat = clr(centralize(closure(table.T + 1)))\n", "u, k, v = np.linalg.svd(mat)\n", "N = len(u)\n", "\n", "DIMENSIONS = 5\n", "_k = k[:DIMENSIONS]\n", "\n", "# scale U matrix wrt to sqrt of eigenvalues\n", "u = u[:,:DIMENSIONS] * np.sqrt(N-1)\n", "# scale V matrix wrt to sqrt of eigenvalues\n", "v = np.multiply(v[:DIMENSIONS,:],(_k.reshape(DIMENSIONS,1) / np.sqrt(N-1)))\n", "\n", "axes = ['CPCA %d' % i for i in range(1, DIMENSIONS + 1)]\n", "\n", "samples = pd.DataFrame(u, index=table.columns, columns=axes)\n", "features = pd.DataFrame(v.T, index=table.index, columns=axes)\n", "\n", "features['importance'] = features.apply(lambda x: euclidean(np.zeros_like(x), x), axis=1)\n", "features.sort_values('importance', inplace=True, ascending=False)\n", "features.drop(['importance'], inplace=True, axis=1)\n", "\n", "# only keep the 10 most important features, change this number to see more arrows\n", "features = features[:10]\n", "\n", "res = OrdinationResults(\n", " short_method_name='CPCA',\n", " long_method_name='Compositional Principal Component Analysis',\n", " eigvals=pd.Series(_k, index=axes),\n", " samples=samples,\n", " features=features,\n", " proportion_explained=_k /_k.sum()\n", ")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# With feature metadata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The figure below will display the feature and sample data. You can go to Color, select `taxonomy_1` (this will color the arrows at the phylum level) and then select `collection_day_fixed` to color the samples by collection day (we recommend that you use a continuou color mapping, for example Viridis)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "Emperor(res, mf, feature_mapping_file=feature_mf, remote=False)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Without feature metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "Emperor(res, mf, remote=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.1" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 0 }