{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Doublet Detection Tutorial\n",
"\n",
"Author: [Hui Ma](https://github.com/huimalinda), [Yiming Yang](https://github.com/yihming), [Rimte Rocher](https://github.com/rocherr)
\n",
"Date: 2022-03-09
\n",
"Notebook Source: [doublet_detection.ipynb](https://raw.githubusercontent.com/lilab-bcb/pegasus-tutorials/main/notebooks/doublet_detection.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pegasus as pg"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataset\n",
"In this tutorial, we'll use the output result of [Pegasus Tutorial](https://pegasus-tutorials.readthedocs.io/en/latest/_static/tutorials/pegasus_analysis.html) to demonstrate how to detect and remove doublet cells in Pegasus. The dataset consists of human bone marrow single cells from 8 donors.\n",
"\n",
"The dataset is stored at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip. You can also use `gsutil` to download it via its Google bucket URL (gs://terra-featured-workspaces/Cumulus/MantonBM_result.zarr.zip).\n",
"\n",
"Now load the count matrix:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = pg.read_input(\"MantonBM_result.zarr.zip\")\n",
"data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sections\n",
"- [Detect Doublets](#mark)\n",
"- [Remove Doublets and Recluster](#recluster)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Detect Doublets\n",
"\n",
"In this step, infer doublets per channel. Set clust_attr = 'anno' to see the doublet density in each cluster and infer doublet cluster.\n",
"\n",
"The method used for detecting doublets can be found [here](https://github.com/klarman-cell-observatory/pegasus/raw/master/doublet_detection.pdf)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pg.infer_doublets(data, channel_attr = 'Channel', clust_attr = 'anno') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, plot annotation and Scrublet-like doublet score."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pg.scatter(data,attrs=['anno','doublet_score'], basis='umap', wspace=1.2) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also want to see the doublet percentage of each cluster to decide if there is a doublet cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data.uns['pred_dbl_cluster']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All clusters have doublet percentage under 5%, so no need to mark any doublet clusters here. If any cluster has doublet percentage more than $50\\%$, we can consider to mark it as doublet cluster. \n",
"\n",
"For example, If we want to mark 'CD14+ Monocyte' and 'CD14+ Monocyte-2' as doublet clusters, use the following code:\n",
"\n",
"`pg.mark_doublets(data, dbl_clusts = 'anno:CD14+ Monocyte,CD14+ Monocyte-2')`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `mark_doublets` function will mark doublet cluster (if any), and write singlet/doublet assignment to the \"demux_type\" column attribute in `data.obs`. The \"demux_type\" attribute is also used for singlet/doublet assignment of cell hashing, nucleus hashing and genetics pooling data (see [documentation](https://cumulus.readthedocs.io/en/latest/demultiplexing.html)).\n",
"\n",
"For this demonstration dataset, among $35,465$ cells, $724$ doublets detected. Doublet rate is $2.04\\%$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pg.mark_doublets(data)\n",
"data.obs['demux_type'].value_counts()"
]
},
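{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the rate quoted above, the cell below recomputes it from the counts reported in the text ($724$ doublets among $35,465$ cells). This is a minimal sketch rather than part of the original workflow: the variable names are illustrative, and the commented-out lines assume that `demux_type` uses the labels `singlet` and `doublet`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Counts quoted in the text above; replace them with the values from your own run.\n",
"n_cells = 35465\n",
"n_doublets = 724\n",
"\n",
"doublet_rate = n_doublets / n_cells\n",
"print(f\"Doublet rate: {doublet_rate:.2%}\")\n",
"\n",
"# With the data loaded, the same rate could be derived from the assignment itself\n",
"# (assumption: the labels in data.obs['demux_type'] are 'singlet' and 'doublet'):\n",
"# counts = data.obs['demux_type'].value_counts()\n",
"# doublet_rate = counts.get('doublet', 0) / counts.sum()"
]
},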
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Doublets distribution can be better observed in UMAP plot:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pg.scatter(data, attrs = ['anno', 'demux_type'], legend_loc = ['on data', 'right margin'], \n",
" wspace = 0.1,alpha = [1.0, 0.8], palettes = 'demux_type:gainsboro,red')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Remove Doublets and Recluster\n",
""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pg.qc_metrics(data, select_singlets=True)\n",
"pg.filter_data(data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start the reclustering process from re-selecting highly variable genes. Batch effect is observed, so we also want to use harmony algorithm to correct bach effect for reclustering."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pg.highly_variable_features(data, batch='Channel')\n",
"pg.pca(data)\n",
"pca_key = pg.run_harmony(data)\n",
"pg.neighbors(data,rep=pca_key)\n",
"pg.louvain(data,rep=pca_key)\n",
"pg.umap(data,rep=pca_key)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-annotate:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pg.de_analysis(data, cluster='louvain_labels')\n",
"celltype_dict = pg.infer_cell_types(data, markers = 'human_immune',de_test='mwu',output_file='BM_celltype_re_dict.txt')\n",
"cluster_names = pg.infer_cluster_names(celltype_dict)\n",
"pg.annotate(data, name='anno', based_on='louvain_labels', anno_dict=cluster_names)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Umap of annotation after re-clustering:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pg.scatter(data,attrs='anno',legend_loc='on data',basis='umap')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}