{
"cells": [
{
"cell_type": "markdown",
"id": "e0d18445",
"metadata": {},
"source": [
"# Spatial Analysis Tutorial\n",
"\n",
"Author: [Bo Li](https://github.com/bli25)
\n",
"Date: 2022-03-09
\n",
"Notebook Source: [spatial_analysis.ipynb](https://raw.githubusercontent.com/lilab-bcb/pegasus-tutorials/main/notebooks/spatial_analysis.ipynb)\n",
"\n",
"\n",
"This tutorial runs analysis on a 10x Visium [mouse brain section](https://www.10xgenomics.com/resources/datasets/mouse-brain-section-coronal-1-standard-1-1-0) dataset with Pegasus."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2729f1d",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import pegasusio as io\n",
"import pegasus as pg"
]
},
{
"cell_type": "markdown",
"id": "399c14d6",
"metadata": {},
"source": [
"## Load data\n",
"\n",
"\n",
"You can download the data at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/mouse_brain_10x.tar.gz.\n",
"\n",
"After downloading, unzip the tar ball file and load the data into memory:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "090fb39a",
"metadata": {},
"outputs": [],
"source": [
"data = io.read_input('mouse_brain_10x', file_type='visium')\n",
"data"
]
},
{
"cell_type": "markdown",
"id": "65325e49",
"metadata": {},
"source": [
"## Quality Control (QC)\n",
"\n",
"### Calculate statistics for QC\n",
"\n",
"First calculate QC metrics before filtration. Notice that mouse mito gene names start with `mt-`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b9783eed",
"metadata": {},
"outputs": [],
"source": [
"pg.qc_metrics(data, mito_prefix='mt-')"
]
},
{
"cell_type": "markdown",
"id": "c6c05f85",
"metadata": {},
"source": [
"Then we can view the metrics. For example, the code below shows the 2.5% and 97.5% quantiles for number of genes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ebd72b6f",
"metadata": {},
"outputs": [],
"source": [
"np.percentile(data.obs['n_genes'], [2.5, 97.5])"
]
},
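{
"cell_type": "markdown",
"id": "7a1c9e02",
"metadata": {},
"source": [
"As a minimal sketch, these quantiles can also be stored as integer cutoffs in code, so they do not have to be copied by hand when used as QC thresholds. The variable names `min_genes_cutoff` and `max_genes_cutoff` are illustrative and not part of Pegasus; rounding to whole gene counts is an assumption:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f5d8b16",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: compute the 2.5% and 97.5% quantiles of the number of genes\n",
"low_q, high_q = np.percentile(data.obs['n_genes'], [2.5, 97.5])\n",
"# Round to whole gene counts (an assumption; adjust the rounding as needed)\n",
"min_genes_cutoff = int(round(low_q))\n",
"max_genes_cutoff = int(round(high_q))\n",
"min_genes_cutoff, max_genes_cutoff"
]
},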
{
"cell_type": "markdown",
"id": "4679ce6a",
"metadata": {},
"source": [
"### QC filtration\n",
"\n",
"Based on the quantiles above, we filter the data by number of genes between 2.5% and 97.5%, and percent of mito gene expression below 20%:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43544ac7",
"metadata": {},
"outputs": [],
"source": [
"pg.qc_metrics(data, min_genes = 2917, max_genes = 8664, mito_prefix='mt-', percent_mito=20)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "87cc272c",
"metadata": {},
"outputs": [],
"source": [
"pg.qcviolin(data, plot_type='gene', dpi=100)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53d588d9",
"metadata": {},
"outputs": [],
"source": [
"pg.qcviolin(data, plot_type='count', dpi=100)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7d70c83",
"metadata": {},
"outputs": [],
"source": [
"pg.qcviolin(data, plot_type='mito', dpi=100)"
]
},
{
"cell_type": "markdown",
"id": "bd3ff22d",
"metadata": {},
"source": [
"Now do the actual filteration below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5de76ec4",
"metadata": {},
"outputs": [],
"source": [
"pg.filter_data(data)"
]
},
{
"cell_type": "markdown",
"id": "cc5a54d1",
"metadata": {},
"source": [
"And identify robust genes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11d9e8a7",
"metadata": {},
"outputs": [],
"source": [
"pg.identify_robust_genes(data)"
]
},
{
"cell_type": "markdown",
"id": "77443735",
"metadata": {},
"source": [
"## Downstream analysis to get clusters"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90371173",
"metadata": {},
"outputs": [],
"source": [
"pg.log_norm(data)\n",
"pg.highly_variable_features(data)\n",
"pg.pca(data)\n",
"pg.neighbors(data)\n",
"pg.leiden(data)\n",
"pg.umap(data)"
]
},
{
"cell_type": "markdown",
"id": "2955e9f0",
"metadata": {},
"source": [
"Run the code below to show the UMAP plot of cells colored by cluster labels generated by Leiden algorithm on their PCA embedding:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb82b84c",
"metadata": {},
"outputs": [],
"source": [
"pg.scatter(data, attrs='leiden_labels')"
]
},
{
"cell_type": "markdown",
"id": "66cd620d",
"metadata": {},
"source": [
"## DE analysis"
]
},
{
"cell_type": "markdown",
"id": "09748368",
"metadata": {},
"source": [
"Run the function below to perform Differential Expression analysis:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cdb08973",
"metadata": {},
"outputs": [],
"source": [
"pg.de_analysis(data, cluster='leiden_labels')"
]
},
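{
"cell_type": "markdown",
"id": "c40e7d5a",
"metadata": {},
"source": [
"A minimal sketch of how the DE results might be inspected, assuming `pg.markers` returns a dictionary keyed by cluster label that holds `'up'` and `'down'` tables, as described in the Pegasus documentation. The names `marker_dict` and `first_cluster` are illustrative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e92b6a1f",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: collect the DE results into per-cluster marker tables\n",
"marker_dict = pg.markers(data)\n",
"# Take whichever cluster label comes first instead of assuming a specific key\n",
"first_cluster = next(iter(marker_dict))\n",
"# Show the first few up-regulated genes for that cluster\n",
"marker_dict[first_cluster]['up'].head()"
]
},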
{
"cell_type": "markdown",
"id": "2967e347",
"metadata": {},
"source": [
"## Cell type annotation\n",
"\n",
"This is to annotate cluster-specific cell types with preset mouse brain markers:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72f8a70d",
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"celltype_dict = pg.infer_cell_types(data, markers = 'mouse_brain')\n",
"celltype_dict"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "68b71b60",
"metadata": {},
"outputs": [],
"source": [
"cluster_names = pg.infer_cluster_names(celltype_dict)\n",
"pg.annotate(data, name='anno', based_on='leiden_labels', anno_dict=cluster_names)"
]
},
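{
"cell_type": "markdown",
"id": "5b0d93c8",
"metadata": {},
"source": [
"As a quick check, the sketch below counts how many cells received each annotation. It assumes `pg.annotate` stored the labels in `data.obs['anno']`, following the `name` argument given above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d17f4e63",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: number of cells per annotated cell type\n",
"data.obs['anno'].value_counts()"
]
},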
{
"cell_type": "markdown",
"id": "4b8445c6",
"metadata": {},
"source": [
"## Plotting"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d936be1d",
"metadata": {},
"outputs": [],
"source": [
"pg.scatter(data, 'anno', legend_loc='on data')"
]
},
{
"cell_type": "markdown",
"id": "a1cbad9d",
"metadata": {},
"source": [
"Besides the UMAP plot, Pegasus also provides [spatial](https://pegasus.readthedocs.io/en/stable/api/pegasus.spatial.html) function to generate spatial plots. Below is the spatial plot of cells colored by their cell types:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "effa4f46",
"metadata": {},
"outputs": [],
"source": [
"pg.spatial(data, 'anno')"
]
},
{
"cell_type": "markdown",
"id": "50fb6769",
"metadata": {},
"source": [
"## Save results"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4fc87b20",
"metadata": {},
"outputs": [],
"source": [
"pg.write_output(data, \"spatial_results.zarr.zip\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}