{ "cells": [ { "cell_type": "markdown", "id": "e0d18445", "metadata": {}, "source": [ "# Spatial Analysis Tutorial\n", "\n", "Author: [Bo Li](https://github.com/bli25)<br>\n", "Date: 2022-03-09<br>\n", "Notebook Source: [spatial_analysis.ipynb](https://raw.githubusercontent.com/lilab-bcb/pegasus-tutorials/main/notebooks/spatial_analysis.ipynb)\n", "\n", "\n", "This tutorial analyzes a 10x Visium [mouse brain section](https://www.10xgenomics.com/resources/datasets/mouse-brain-section-coronal-1-standard-1-1-0) dataset with Pegasus." ] }, { "cell_type": "code", "execution_count": null, "id": "d2729f1d", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import pegasusio as io\n", "import pegasus as pg" ] }, { "cell_type": "markdown", "id": "399c14d6", "metadata": {}, "source": [ "## Load data\n", "\n", "\n", "You can download the data at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/mouse_brain_10x.tar.gz.\n", "\n", "After downloading, extract the tarball (e.g. `tar -xzf mouse_brain_10x.tar.gz`) and load the data into memory:" ] }, { "cell_type": "code", "execution_count": null, "id": "090fb39a", "metadata": {}, "outputs": [], "source": [ "data = io.read_input('mouse_brain_10x', file_type='visium')\n", "data" ] }, { "cell_type": "markdown", "id": "65325e49", "metadata": {}, "source": [ "## Quality Control (QC)\n", "\n", "### Calculate statistics for QC\n", "\n", "First, calculate QC metrics before filtering. Note that mouse mitochondrial gene names start with `mt-`, which is why `mito_prefix='mt-'` is set below." ] }, { "cell_type": "code", "execution_count": null, "id": "b9783eed", "metadata": {}, "outputs": [], "source": [ "pg.qc_metrics(data, mito_prefix='mt-')" ] }, { "cell_type": "markdown", "id": "c6c05f85", "metadata": {}, "source": [ "Then we can inspect the metrics. For example, the code below shows the 2.5% and 97.5% quantiles of the number of genes detected per spot:" ] }, { "cell_type": "code", "execution_count": null, "id": "ebd72b6f", "metadata": {}, "outputs": [], "source": [ "np.percentile(data.obs['n_genes'], [2.5, 97.5])" ] }, { "cell_type": "markdown", "id": "4679ce6a", "metadata": {}, "source": [ "### QC filtration\n", "\n", "Based on the quantiles above, keep spots whose number of genes lies between the 2.5% and 97.5% quantiles (the code below uses 2917 and 8664) and whose percentage of UMIs from mitochondrial genes is below 20%. Calling `pg.qc_metrics` with these thresholds only marks which spots pass QC; the spots are removed later by `pg.filter_data`. The violin plots below visualize the QC metrics:" ] }, { "cell_type": "code", "execution_count": null, "id": "43544ac7", "metadata": {}, "outputs": [], "source": [ "pg.qc_metrics(data, min_genes=2917, max_genes=8664, mito_prefix='mt-', percent_mito=20)" ] }, { "cell_type": "code", "execution_count": null, "id": "87cc272c", "metadata": {}, "outputs": [], "source": [ "pg.qcviolin(data, plot_type='gene', dpi=100)" ] }, { "cell_type": "code", "execution_count": null, "id": "53d588d9", "metadata": {}, "outputs": [], "source": [ "pg.qcviolin(data, plot_type='count', dpi=100)" ] }, { "cell_type": "code", "execution_count": null, "id": "d7d70c83", "metadata": {}, "outputs": [], "source": [ "pg.qcviolin(data, plot_type='mito', dpi=100)" ] }, { "cell_type": "markdown", "id": "bd3ff22d", "metadata": {}, "source": [ "Now perform the actual filtration, removing spots that failed QC:" ] }, { "cell_type": "code", "execution_count": null, "id": "5de76ec4", "metadata": {}, "outputs": [], "source": [ "pg.filter_data(data)" ] }, { "cell_type": "markdown", "id": "cc5a54d1", "metadata": {}, "source": [ "Then identify robust genes, i.e. genes expressed in a sufficient fraction of spots, which serve as candidates for highly variable feature selection:" ] }, { "cell_type": "code", "execution_count": null, "id": "11d9e8a7", "metadata": {}, "outputs": [], "source": [ "pg.identify_robust_genes(data)" ] }, { "cell_type": "markdown", "id": "77443735", "metadata": {}, "source": [ "## Downstream analysis to get clusters\n", "\n", "Log-normalize the counts, select highly variable features, run PCA, build a k-nearest-neighbor graph, cluster the spots with the Leiden algorithm, and compute a UMAP embedding for visualization:" ] }, { "cell_type": "code", "execution_count": null, "id": "90371173", "metadata": {}, "outputs": [], "source": [ "pg.log_norm(data)\n", "pg.highly_variable_features(data)\n", 
"pg.pca(data)\n", "pg.neighbors(data)\n", "pg.leiden(data)\n", "pg.umap(data)" ] }, { "cell_type": "markdown", "id": "2955e9f0", "metadata": {}, "source": [ "Show the UMAP plot of spots colored by their Leiden cluster labels, which were computed on the neighborhood graph built from the PCA embedding:" ] }, { "cell_type": "code", "execution_count": null, "id": "fb82b84c", "metadata": {}, "outputs": [], "source": [ "pg.scatter(data, attrs='leiden_labels')" ] }, { "cell_type": "markdown", "id": "66cd620d", "metadata": {}, "source": [ "## DE analysis" ] }, { "cell_type": "markdown", "id": "09748368", "metadata": {}, "source": [ "Perform differential expression (DE) analysis, comparing each cluster against all remaining spots:" ] }, { "cell_type": "code", "execution_count": null, "id": "cdb08973", "metadata": {}, "outputs": [], "source": [ "pg.de_analysis(data, cluster='leiden_labels')" ] }, { "cell_type": "markdown", "id": "2967e347", "metadata": {}, "source": [ "## Cell type annotation\n", "\n", "Infer putative cell types for each cluster from the DE results, using the preset mouse brain markers shipped with Pegasus (`markers='mouse_brain'`):" ] }, { "cell_type": "code", "execution_count": null, "id": "72f8a70d", "metadata": { "scrolled": false }, "outputs": [], "source": [ "celltype_dict = pg.infer_cell_types(data, markers='mouse_brain')\n", "celltype_dict" ] }, { "cell_type": "code", "execution_count": null, "id": "68b71b60", "metadata": {}, "outputs": [], "source": [ "cluster_names = pg.infer_cluster_names(celltype_dict)\n", "pg.annotate(data, name='anno', based_on='leiden_labels', anno_dict=cluster_names)" ] }, { "cell_type": "markdown", "id": "4b8445c6", "metadata": {}, "source": [ "## Plotting" ] }, { "cell_type": "code", "execution_count": null, "id": "d936be1d", "metadata": {}, "outputs": [], "source": [ "pg.scatter(data, 'anno', legend_loc='on data')" ] }, { "cell_type": "markdown", "id": "a1cbad9d", "metadata": {}, "source": [ "Besides the UMAP plot, Pegasus also provides the [spatial](https://pegasus.readthedocs.io/en/stable/api/pegasus.spatial.html) function to generate spatial plots. Below is the spatial plot of spots colored by their annotated cell types:" ] }, { "cell_type": "code", "execution_count": null, "id": "effa4f46", "metadata": {}, "outputs": [], "source": [ "pg.spatial(data, 'anno')" ] }, { "cell_type": "markdown", "id": "50fb6769", "metadata": {}, "source": [ "## Save results" ] }, { "cell_type": "code", "execution_count": null, "id": "4fc87b20", "metadata": {}, "outputs": [], "source": [ "pg.write_output(data, \"spatial_results.zarr.zip\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 5 }