# Spatial Analysis Tutorial

Author: [Bo Li](https://github.com/bli25)<br />
Date: 2022-03-09<br />
Notebook Source: [spatial_analysis.ipynb](https://raw.githubusercontent.com/lilab-bcb/pegasus-tutorials/main/notebooks/spatial_analysis.ipynb)


This tutorial runs analysis on a 10x Visium [mouse brain section](https://www.10xgenomics.com/resources/datasets/mouse-brain-section-coronal-1-standard-1-1-0) dataset with Pegasus.

In [None]:
import numpy as np
import pandas as pd
import pegasusio as io
import pegasus as pg

## Load data


You can download the data at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/mouse_brain_10x.tar.gz.

After downloading, extract the tarball and load the data into memory:

In [None]:
data = io.read_input('mouse_brain_10x', file_type='visium')
data
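If you would rather extract the archive from inside the notebook before calling `read_input`, Python's standard `tarfile` module is enough. The helper below is a minimal sketch; `extract_tarball` is a hypothetical name, not part of Pegasus or PegasusIO, and the archive file name in the usage comment is assumed from the download URL.

```python
import tarfile
from pathlib import Path


def extract_tarball(archive_path, dest_dir="."):
    """Extract a gzip-compressed tar archive and return its top-level entries."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    top_level = set()
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # Collect the first path component of every member for a quick summary.
        for member in archive.getmembers():
            parts = Path(member.name).parts
            if parts:
                top_level.add(parts[0])
        archive.extractall(dest)
    return sorted(top_level)


# Hypothetical usage, assuming the archive was saved in the working directory:
# extract_tarball("mouse_brain_10x.tar.gz")
```

Only extract archives from sources you trust, since tar members can carry arbitrary paths.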

## Quality Control (QC)

### Calculate statistics for QC

First, calculate QC metrics before filtering. Note that mouse mitochondrial gene names start with `mt-`.

In [None]:
pg.qc_metrics(data, mito_prefix='mt-')

Then we can view the metrics. For example, the code below shows the 2.5% and 97.5% quantiles for number of genes:

In [None]:
np.percentile(data.obs['n_genes'], [2.5, 97.5])
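To see how such quantiles translate into gene-count thresholds, here is a small sketch on synthetic data. The simulated `n_genes` values and the rounding and boundary conventions are assumptions made for illustration; they are not taken from the mouse brain dataset or from Pegasus.

```python
import numpy as np

# Synthetic per-cell gene counts standing in for data.obs['n_genes'].
rng = np.random.default_rng(0)
n_genes = rng.integers(1000, 10000, size=2000)

# 2.5% and 97.5% quantiles, as in the cell above.
lower, upper = np.percentile(n_genes, [2.5, 97.5])

# Round outward to integer thresholds (an illustrative choice).
min_genes = int(np.floor(lower))
max_genes = int(np.ceil(upper))

# Assumed convention: keep cells with min_genes <= n_genes < max_genes.
keep = (n_genes >= min_genes) & (n_genes < max_genes)
print(min_genes, max_genes, f"{keep.mean():.1%} of cells kept")
```

By construction, roughly 95% of the cells fall inside the two thresholds.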

### QC filtration

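Based on the quantiles above, we keep cells whose number of genes lies between the 2.5% and 97.5% quantiles and whose percent of mitochondrial gene expression is below 20%. The call below only records these thresholds in the QC metrics; the filtering itself is performed in a later step: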

In [None]:
pg.qc_metrics(data, min_genes = 2917, max_genes = 8664, mito_prefix='mt-', percent_mito=20)

In [None]:
pg.qcviolin(data, plot_type='gene', dpi=100)

In [None]:
pg.qcviolin(data, plot_type='count', dpi=100)

In [None]:
pg.qcviolin(data, plot_type='mito', dpi=100)

Now perform the actual filtering:

In [None]:
pg.filter_data(data)

And identify robust genes:

In [None]:
pg.identify_robust_genes(data)
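The mark-then-filter pattern used above (`qc_metrics` followed by `filter_data`) can be illustrated with plain pandas. This is only a sketch on a toy table: the column name `passed_qc`, the cell names, and the boundary conventions are assumptions for illustration, not a description of Pegasus internals.

```python
import pandas as pd

# Toy per-cell QC table; the values are made up.
obs = pd.DataFrame(
    {
        "n_genes": [2500, 3000, 5000, 9000, 6000],
        "percent_mito": [5.0, 25.0, 10.0, 8.0, 19.9],
    },
    index=[f"cell_{i}" for i in range(5)],
)

# Thresholds matching the tutorial's QC call.
min_genes, max_genes, max_percent_mito = 2917, 8664, 20.0

# Step 1: mark cells that satisfy every criterion, without dropping anything.
obs["passed_qc"] = (
    (obs["n_genes"] >= min_genes)
    & (obs["n_genes"] < max_genes)
    & (obs["percent_mito"] < max_percent_mito)
)

# Step 2: apply the filter as a separate, explicit operation.
filtered = obs[obs["passed_qc"]]
print(filtered)
```

Separating the two steps lets you inspect the QC plots with the thresholds in place before any data is discarded.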

## Downstream analysis to get clusters

In [None]:
pg.log_norm(data)
pg.highly_variable_features(data)
pg.pca(data)
pg.neighbors(data)
pg.leiden(data)
pg.umap(data)

Run the code below to show the UMAP plot of cells, colored by the cluster labels that the Leiden algorithm generated from their PCA embedding:

In [None]:
pg.scatter(data, attrs='leiden_labels')
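To make the first steps of the pipeline above more concrete, the sketch below reproduces the general idea of log-normalization, variable-gene selection, and PCA with NumPy on a synthetic count matrix. It is not Pegasus's implementation: the target sum of 1e5, the variance-based gene ranking, and the matrix sizes are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic count matrix: 50 cells x 200 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(50, 200)).astype(float)

# Log-normalization: scale each cell to the same total, then apply log(1 + x).
target_sum = 1e5  # assumed scaling constant
totals = counts.sum(axis=1, keepdims=True)
log_norm = np.log1p(counts / totals * target_sum)

# Simplified variable-gene selection: keep the genes with the largest variance.
n_top = 50
top_genes = np.argsort(log_norm.var(axis=0))[::-1][:n_top]
hvg = log_norm[:, top_genes]

# PCA via SVD of the centered matrix; rows of `pcs` are cell embeddings.
centered = hvg - hvg.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
n_pcs = 10
pcs = u[:, :n_pcs] * s[:n_pcs]
print(pcs.shape)
```

Neighbor search, Leiden clustering, and UMAP then operate on an embedding like `pcs` rather than on the full expression matrix.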

## DE analysis

Run the function below to perform differential expression (DE) analysis, comparing the cells within each cluster against the rest:

In [None]:
pg.de_analysis(data, cluster='leiden_labels')
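As a rough intuition for what a cluster-versus-rest comparison measures, the sketch below computes, for each cluster, the difference in mean expression between cells inside and outside the cluster. This is a deliberately simplified stand-in and does not reproduce the statistics that `pg.de_analysis` reports; the function name `one_vs_rest_mean_diff`, the gene names, and the data are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy log-expression matrix: 6 cells x 3 genes, two clusters of 3 cells each.
rng = np.random.default_rng(1)
expr = pd.DataFrame(
    rng.normal(1.0, 0.1, size=(6, 3)), columns=["geneA", "geneB", "geneC"]
)
labels = pd.Series(["1", "1", "1", "2", "2", "2"], name="cluster")

# Make geneA clearly higher in cluster "1".
expr.loc[labels == "1", "geneA"] += 2.0


def one_vs_rest_mean_diff(expr, labels):
    """Mean expression inside each cluster minus mean expression outside it."""
    rows = {}
    for cluster in sorted(labels.unique()):
        in_cluster = (labels == cluster).to_numpy()
        rows[cluster] = expr[in_cluster].mean() - expr[~in_cluster].mean()
    # Transpose so that rows are clusters and columns are genes.
    return pd.DataFrame(rows).T


diff = one_vs_rest_mean_diff(expr, labels)
print(diff.round(2))
```

A real DE analysis adds statistical tests and multiple-testing correction on top of effect sizes like these.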

## Cell type annotation

Next, annotate each cluster with a putative cell type using the preset mouse brain markers:

In [None]:
celltype_dict = pg.infer_cell_types(data, markers = 'mouse_brain')
celltype_dict

In [None]:
cluster_names = pg.infer_cluster_names(celltype_dict)
pg.annotate(data, name='anno', based_on='leiden_labels', anno_dict=cluster_names)
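Conceptually, the annotation step replaces each cluster label with a human-readable name. The pandas sketch below shows that mapping on toy data; the cluster labels and cell type names are hypothetical and are not results from this dataset.

```python
import pandas as pd

# Toy cluster assignments standing in for data.obs['leiden_labels'].
leiden_labels = pd.Series(["1", "2", "1", "3", "2"], name="leiden_labels")

# Hypothetical cluster-to-name mapping.
anno_dict = {"1": "Neuron", "2": "Astrocyte", "3": "Oligodendrocyte"}

# Map every cell's cluster label to its name and store it as a categorical.
anno = leiden_labels.map(anno_dict).astype("category").rename("anno")
print(anno.value_counts())
```

Keeping the annotation as a separate column preserves the original cluster labels for later re-annotation.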

## Plotting

In [None]:
pg.scatter(data, 'anno', legend_loc='on data')

Besides the UMAP plot, Pegasus also provides the [spatial](https://pegasus.readthedocs.io/en/stable/api/pegasus.spatial.html) function to generate spatial plots. Below is the spatial plot of cells colored by their cell types:

In [None]:
pg.spatial(data, 'anno')

## Save results

In [None]:
pg.write_output(data, "spatial_results.zarr.zip")