{
"cells": [
{
"cell_type": "markdown",
"id": "a6bde5eb",
"metadata": {},
"source": [
"# Pegasus Pseudobulk Analysis\n",
"\n",
"Author: [Bo Li](https://github.com/bli25), [Yiming Yang](https://github.com/yihming)
\n",
"Date: 2022-04-10
\n",
"Notebook Source: [pseudobulk.ipynb](https://raw.githubusercontent.com/lilab-bcb/pegasus-tutorials/main/notebooks/pseudobulk.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73510eeb",
"metadata": {},
"outputs": [],
"source": [
"import pegasus as pg\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "2b4b6ddd",
"metadata": {},
"source": [
"## Dataset\n",
"\n",
"In this tutorial, we'll use a gene-count matrix dataset on human bone marrow from 8 donors, create pseudobulk matrix regarding donors, and perform pseudobulk analysis on the data.\n",
"\n",
"The dataset is stored at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/MantonBM_nonmix_subset.zarr.zip. You can also use [gsutil](https://cloud.google.com/storage/docs/gsutil) to download it via its Google bucket URL (gs://terra-featured-workspaces/Cumulus/MantonBM_nonmix_subset.zarr.zip).\n",
"\n",
"Now load the count matrix:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fbe49dcb",
"metadata": {},
"outputs": [],
"source": [
"data = pg.read_input('MantonBM_nonmix_subset.zarr.zip')"
]
},
{
"cell_type": "markdown",
"id": "1ef8e9c0",
"metadata": {},
"source": [
"## Generate pseudobulk matrix\n",
"\n",
"Pegasus provides `pseudobulk` function to generate pseudobulk matrix. The code below generate a pseudobulk count matrix regarding donors (`Channel`), and transfer the gender attribute to the resulting pseudobulks:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ba9210f2",
"metadata": {},
"outputs": [],
"source": [
"pseudo = pg.pseudobulk(data, 'Channel', 'gender')\n",
"pseudo"
]
},
{
"cell_type": "markdown",
"id": "ed8a8b16",
"metadata": {},
"source": [
"For details on `pseudobulk` function, please see its [documentation](https://pegasus.readthedocs.io/en/stable/api/pegasus.pseudobulk.html)."
]
},
{
"cell_type": "markdown",
"id": "93a287ad",
"metadata": {},
"source": [
"## Differential Expression (DE) Analysis on Pseudobulk Matrix\n",
"\n",
"Pegasus has `deseq2` function to perform DE analysis on Pseudobulk data, which is a Python wrapper of [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) package in R (You need to first install the original R package). The code below analyzes based on a regression model considering the gender attribute, and estimates the contrast between `female` and `male`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86cdfc84",
"metadata": {},
"outputs": [],
"source": [
"pg.deseq2(pseudo, '~gender', ('gender', 'female', 'male'))"
]
},
{
"cell_type": "markdown",
"id": "c02f8d23",
"metadata": {},
"source": [
"The DE result is a Pandas DataFrame object stored in `pseudo.varm[\"deseq2\"]`, which could be viewed as the following (ranked by log2 fold-change):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8dbd8cbf",
"metadata": {},
"outputs": [],
"source": [
"pd.DataFrame(pseudo.varm[\"deseq2\"], index=pseudo.var_names)"
]
},
{
"cell_type": "markdown",
"id": "747e65dc",
"metadata": {},
"source": [
"Notice that the fold-change calculated has `female` be the numerator, while `male` be the denominator.\n",
"\n",
"For details on `deseq2` function, please see its [documentation](https://pegasus.readthedocs.io/en/stable/api/pegasus.deseq2.html)."
]
},
{
"cell_type": "markdown",
"id": "01e10459",
"metadata": {},
"source": [
"### Get significant DE genes in human-readable format"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a074fb23",
"metadata": {},
"outputs": [],
"source": [
"markers = pg.pseudo.markers(pseudo)\n",
"print(markers)"
]
},
{
"cell_type": "markdown",
"id": "bb6b7c6d",
"metadata": {},
"source": [
"### Write DE results to spreadsheet"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c10148b7",
"metadata": {},
"outputs": [],
"source": [
"pg.pseudo.write_results_to_excel(markers, 'test.de.xlsx')"
]
},
{
"cell_type": "markdown",
"id": "fd875cde",
"metadata": {},
"source": [
"### Generate volcano plot"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7641ac37",
"metadata": {},
"outputs": [],
"source": [
"pg.pseudo.volcano(pseudo)"
]
},
{
"cell_type": "markdown",
"id": "23e6be94",
"metadata": {},
"source": [
"## Gene Set Enrichment Analysis (GSEA)\n",
"\n",
"Pegasus has `fgsea` function to perform GSEA analysis, which is a Python wrapper of [fgsea](http://bioconductor.org/packages/release/bioc/html/fgsea.html) package in R (You need to first install the original R package):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e62d4e47",
"metadata": {},
"outputs": [],
"source": [
"pg.fgsea(pseudo, 'log2FoldChange', 'canonical_pathways', 'deseq2', fgsea_key = 'fgsea_deseq2')"
]
},
{
"cell_type": "markdown",
"id": "d72f99f8",
"metadata": {},
"source": [
"The code above runs GSEA analysis based on log2 fold-change values, and use one of the preset gene sets that Pegasus provides:\n",
"* `canonical_pathways`: The [MsigDB](http://www.gsea-msigdb.org/gsea/msigdb/collections.jsp) C2/CP gene set.\n",
"* `hallmark`: The [MsigDB](http://www.gsea-msigdb.org/gsea/msigdb/collections.jsp) H gene set.\n",
"Notice that this function is applicable to DE results from both single-cell and pseudobulk data.\n",
"\n",
"For details on `fgsea` function, please see its [documentation](https://pegasus.readthedocs.io/en/stable/api/pegasus.fgsea.html)."
]
},
{
"cell_type": "markdown",
"id": "29168c1b",
"metadata": {},
"source": [
"### Generate GSEA plots\n",
"\n",
"Finally, generate the GSEA histograms by `plot_gsea` function. In general, there will be 2 panels:\n",
"* Top panel shows the up-regulated pathways in red color.\n",
"* Bottom panel shows the down-regulated pathways in green color."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90b62fa5",
"metadata": {},
"outputs": [],
"source": [
"pg.plot_gsea(pseudo, 'fgsea_deseq2')"
]
},
{
"cell_type": "markdown",
"id": "c211c078",
"metadata": {},
"source": [
"As shown in the plot above, for this pseudobulk data, there is no up-regulated (i.e. female significant) pathways, while some down-regulated (i.e. male significant) pathways are observed."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}