{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Doublet Detection Tutorial\n", "\n", "Author: [Hui Ma](https://github.com/huimalinda), [Yiming Yang](https://github.com/yihming), [Rimte Rocher](https://github.com/rocherr)
\n", "Date: 2022-03-09
\n", "Notebook Source: [doublet_detection.ipynb](https://raw.githubusercontent.com/lilab-bcb/pegasus-tutorials/main/notebooks/doublet_detection.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pegasus as pg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset\n", "In this tutorial, we'll use the output of the [Pegasus Tutorial](https://pegasus-tutorials.readthedocs.io/en/latest/_static/tutorials/pegasus_analysis.html) to demonstrate how to detect and remove doublet cells in Pegasus. The dataset consists of human bone marrow single cells from 8 donors.\n", "\n", "The dataset is stored at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip. You can also use `gsutil` to download it via its Google bucket URL (gs://terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip).\n", "\n", "Now load the count matrix:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = pg.read_input(\"MantonBM_result.zarr.zip\")\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sections\n", "- [Detect Doublets](#mark)\n", "- [Remove Doublets and Recluster](#recluster)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Detect Doublets\n", "\n", "In this step, we infer doublets per channel. Setting `clust_attr='anno'` also reports the doublet density in each cluster, which helps identify doublet clusters.\n", "\n", "The doublet detection method is described [here](https://github.com/klarman-cell-observatory/pegasus/raw/master/doublet_detection.pdf)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pg.infer_doublets(data, channel_attr='Channel', clust_attr='anno')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, plot the annotation and the Scrublet-like doublet score on the UMAP:" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pg.scatter(data, attrs=['anno', 'doublet_score'], basis='umap', wspace=1.2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also want to see the doublet percentage of each cluster to decide whether any cluster should be treated as a doublet cluster." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.uns['pred_dbl_cluster']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All clusters have a doublet percentage under 5%, so there is no need to mark any doublet clusters here. If a cluster has a doublet percentage above $50\\%$, we can consider marking it as a doublet cluster.\n", "\n", "For example, if we want to mark 'CD14+ Monocyte' and 'CD14+ Monocyte-2' as doublet clusters, use the following code:\n", "\n", "`pg.mark_doublets(data, dbl_clusts = 'anno:CD14+ Monocyte,CD14+ Monocyte-2')`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `mark_doublets` function marks doublet clusters (if any) and writes the singlet/doublet assignment to the \"demux_type\" column in `data.obs`. The \"demux_type\" attribute is also used for the singlet/doublet assignment of cell hashing, nucleus hashing, and genetic pooling data (see [documentation](https://cumulus.readthedocs.io/en/latest/demultiplexing.html)).\n", "\n", "For this demonstration dataset, among $35,465$ cells, $724$ doublets are detected. 
The doublet rate is $2.04\\%$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pg.mark_doublets(data)\n", "data.obs['demux_type'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The distribution of doublets is easier to see in a UMAP plot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pg.scatter(data, attrs=['anno', 'demux_type'], legend_loc=['on data', 'right margin'],\n", "           wspace=0.1, alpha=[1.0, 0.8], palettes='demux_type:gainsboro,red')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Remove Doublets and Recluster\n", "\n", "Keep only the cells marked as singlets by passing `select_singlets=True` to `qc_metrics`, then filter the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pg.qc_metrics(data, select_singlets=True)\n", "pg.filter_data(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start the reclustering process by re-selecting highly variable genes. A batch effect is observed, so we also use the Harmony algorithm to correct it before reclustering." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pg.highly_variable_features(data, batch='Channel')\n", "pg.pca(data)\n", "harmony_key = pg.run_harmony(data)\n", "pg.neighbors(data, rep=harmony_key)\n", "pg.louvain(data, rep=harmony_key)\n", "pg.umap(data, rep=harmony_key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Re-annotate the new clusters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pg.de_analysis(data, cluster='louvain_labels')\n", "celltype_dict = pg.infer_cell_types(data, markers='human_immune', de_test='mwu', output_file='BM_celltype_re_dict.txt')\n", "cluster_names = pg.infer_cluster_names(celltype_dict)\n", "pg.annotate(data, name='anno', based_on='louvain_labels', anno_dict=cluster_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "UMAP of the annotation after reclustering:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pg.scatter(data, attrs='anno', legend_loc='on data', basis='umap')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }