{ "cells": [ { "cell_type": "markdown", "id": "a6bde5eb", "metadata": {}, "source": [ "# Pegasus Pseudobulk Analysis\n", "\n", "Author: [Bo Li](https://github.com/bli25), [Yiming Yang](https://github.com/yihming)
\n", "Date: 2022-04-10
\n", "Notebook Source: [pseudobulk.ipynb](https://raw.githubusercontent.com/lilab-bcb/pegasus-tutorials/main/notebooks/pseudobulk.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "id": "73510eeb", "metadata": {}, "outputs": [], "source": [ "import pegasus as pg\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "2b4b6ddd", "metadata": {}, "source": [ "## Dataset\n", "\n", "In this tutorial, we'll use a gene-count matrix dataset on human bone marrow from 8 donors, create pseudobulk matrix regarding donors, and perform pseudobulk analysis on the data.\n", "\n", "The dataset is stored at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_nonmix_subset.zarr.zip. You can also use [gsutil](https://cloud.google.com/storage/docs/gsutil) to download it via its Google bucket URL (gs://terra-featured-workspaces/Cumulus/MantonBM_nonmix_subset.zarr.zip).\n", "\n", "Now load the count matrix:" ] }, { "cell_type": "code", "execution_count": null, "id": "fbe49dcb", "metadata": {}, "outputs": [], "source": [ "data = pg.read_input('MantonBM_nonmix_subset.zarr.zip')" ] }, { "cell_type": "markdown", "id": "1ef8e9c0", "metadata": {}, "source": [ "## Generate pseudobulk matrix\n", "\n", "Pegasus provides `pseudobulk` function to generate pseudobulk matrix. The code below generate a pseudobulk count matrix regarding donors (`Channel`), and transfer the gender attribute to the resulting pseudobulks:" ] }, { "cell_type": "code", "execution_count": null, "id": "ba9210f2", "metadata": {}, "outputs": [], "source": [ "pseudo = pg.pseudobulk(data, 'Channel', 'gender')\n", "pseudo" ] }, { "cell_type": "markdown", "id": "ed8a8b16", "metadata": {}, "source": [ "For details on `pseudobulk` function, please see its [documentation](https://pegasus.readthedocs.io/en/stable/api/pegasus.pseudobulk.html)." ] }, { "cell_type": "markdown", "id": "93a287ad", "metadata": {}, "source": [ "## Differential Expression (DE) Analysis on Pseudobulk Matrix\n", "\n", "Pegasus has `deseq2` function to perform DE analysis on Pseudobulk data, which is a Python wrapper of [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) package in R (You need to first install the original R package). The code below analyzes based on a regression model considering the gender attribute, and estimates the contrast between `female` and `male`:" ] }, { "cell_type": "code", "execution_count": null, "id": "86cdfc84", "metadata": {}, "outputs": [], "source": [ "pg.deseq2(pseudo, '~gender', ('gender', 'female', 'male'))" ] }, { "cell_type": "markdown", "id": "c02f8d23", "metadata": {}, "source": [ "The DE result is a Pandas DataFrame object stored in `pseudo.varm[\"deseq2\"]`, which could be viewed as the following (ranked by log2 fold-change):" ] }, { "cell_type": "code", "execution_count": null, "id": "8dbd8cbf", "metadata": {}, "outputs": [], "source": [ "pd.DataFrame(pseudo.varm[\"deseq2\"], index=pseudo.var_names)" ] }, { "cell_type": "markdown", "id": "747e65dc", "metadata": {}, "source": [ "Notice that the fold-change calculated has `female` be the numerator, while `male` be the denominator.\n", "\n", "For details on `deseq2` function, please see its [documentation](https://pegasus.readthedocs.io/en/stable/api/pegasus.deseq2.html)." ] }, { "cell_type": "markdown", "id": "01e10459", "metadata": {}, "source": [ "### Get significant DE genes in human-readable format" ] }, { "cell_type": "code", "execution_count": null, "id": "a074fb23", "metadata": {}, "outputs": [], "source": [ "markers = pg.pseudo.markers(pseudo)\n", "print(markers)" ] }, { "cell_type": "markdown", "id": "bb6b7c6d", "metadata": {}, "source": [ "### Write DE results to spreadsheet" ] }, { "cell_type": "code", "execution_count": null, "id": "c10148b7", "metadata": {}, "outputs": [], "source": [ "pg.pseudo.write_results_to_excel(markers, 'test.de.xlsx')" ] }, { "cell_type": "markdown", "id": "fd875cde", "metadata": {}, "source": [ "### Generate volcano plot" ] }, { "cell_type": "code", "execution_count": null, "id": "7641ac37", "metadata": {}, "outputs": [], "source": [ "pg.pseudo.volcano(pseudo)" ] }, { "cell_type": "markdown", "id": "23e6be94", "metadata": {}, "source": [ "## Gene Set Enrichment Analysis (GSEA)\n", "\n", "Pegasus has `fgsea` function to perform GSEA analysis, which is a Python wrapper of [fgsea](http://bioconductor.org/packages/release/bioc/html/fgsea.html) package in R (You need to first install the original R package):" ] }, { "cell_type": "code", "execution_count": null, "id": "e62d4e47", "metadata": {}, "outputs": [], "source": [ "pg.fgsea(pseudo, 'log2FoldChange', 'canonical_pathways', 'deseq2', fgsea_key = 'fgsea_deseq2')" ] }, { "cell_type": "markdown", "id": "d72f99f8", "metadata": {}, "source": [ "The code above runs GSEA analysis based on log2 fold-change values, and use one of the preset gene sets that Pegasus provides:\n", "* `canonical_pathways`: The [MsigDB](http://www.gsea-msigdb.org/gsea/msigdb/collections.jsp) C2/CP gene set.\n", "* `hallmark`: The [MsigDB](http://www.gsea-msigdb.org/gsea/msigdb/collections.jsp) H gene set.\n", "Notice that this function is applicable to DE results from both single-cell and pseudobulk data.\n", "\n", "For details on `fgsea` function, please see its [documentation](https://pegasus.readthedocs.io/en/stable/api/pegasus.fgsea.html)." ] }, { "cell_type": "markdown", "id": "29168c1b", "metadata": {}, "source": [ "### Generate GSEA plots\n", "\n", "Finally, generate the GSEA histograms by `plot_gsea` function. In general, there will be 2 panels:\n", "* Top panel shows the up-regulated pathways in red color.\n", "* Bottom panel shows the down-regulated pathways in green color." ] }, { "cell_type": "code", "execution_count": null, "id": "90b62fa5", "metadata": {}, "outputs": [], "source": [ "pg.plot_gsea(pseudo, 'fgsea_deseq2')" ] }, { "cell_type": "markdown", "id": "c211c078", "metadata": {}, "source": [ "As shown in the plot above, for this pseudobulk data, there is no up-regulated (i.e. female significant) pathways, while some down-regulated (i.e. male significant) pathways are observed." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }