{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e0d18445",
   "metadata": {},
   "source": [
    "# Spatial Analysis Tutorial\n",
    "\n",
    "Author: [Bo Li](https://github.com/bli25)<br />\n",
    "Date: 2022-03-09<br />\n",
    "Notebook Source: [spatial_analysis.ipynb](https://raw.githubusercontent.com/lilab-bcb/pegasus-tutorials/main/notebooks/spatial_analysis.ipynb)\n",
    "\n",
    "\n",
    "This tutorial analyzes a 10x Visium [mouse brain section](https://www.10xgenomics.com/resources/datasets/mouse-brain-section-coronal-1-standard-1-1-0) dataset with Pegasus."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2729f1d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import pegasusio as io\n",
    "import pegasus as pg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "399c14d6",
   "metadata": {},
   "source": [
    "## Load data\n",
    "\n",
    "\n",
    "You can download the data at https://storage.googleapis.com/terra-featured-workspaces/Cumulus/mouse_brain_10x.tar.gz.\n",
    "\n",
    "After downloading, extract the tarball and load the data into memory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "090fb39a",
   "metadata": {},
   "outputs": [],
   "source": [
    "data = io.read_input('mouse_brain_10x', file_type='visium')\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65325e49",
   "metadata": {},
   "source": [
    "## Quality Control (QC)\n",
    "\n",
    "### Calculate statistics for QC\n",
    "\n",
    "First, calculate the QC metrics before filtering. Note that mouse mitochondrial gene names start with `mt-`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9783eed",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.qc_metrics(data, mito_prefix='mt-')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6c05f85",
   "metadata": {},
   "source": [
    "Then we can view the metrics. For example, the code below shows the 2.5th and 97.5th percentiles of the number of genes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ebd72b6f",
   "metadata": {},
   "outputs": [],
   "source": [
    "np.percentile(data.obs['n_genes'], [2.5, 97.5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4679ce6a",
   "metadata": {},
   "source": [
    "### QC filtration\n",
    "\n",
    "Based on the percentiles above, we keep cells whose number of genes falls between the 2.5th and 97.5th percentiles (2917 and 8664 in the call below) and whose percentage of mitochondrial gene expression is below 20%:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43544ac7",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.qc_metrics(data, min_genes=2917, max_genes=8664, mito_prefix='mt-', percent_mito=20)"
   ]
  },
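  {
   "cell_type": "markdown",
   "id": "a7c1e9d2",
   "metadata": {},
   "source": [
    "The call above only marks which cells pass QC; nothing has been removed yet. As a quick sanity check, the sketch below recomputes the gene-count bounds from the percentiles and counts how many cells satisfy each criterion. It assumes `pg.qc_metrics` stored the `n_genes` and `percent_mito` columns in `data.obs`. The variable names are illustrative only, and the counts may differ slightly from Pegasus's own filter depending on how it treats boundary values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3f08c64",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Recompute the gene-count bounds instead of reading them off by eye\n",
    "lower, upper = np.percentile(data.obs['n_genes'], [2.5, 97.5])\n",
    "\n",
    "# Boolean masks for each criterion (assumed column names: n_genes, percent_mito)\n",
    "gene_ok = data.obs['n_genes'].between(lower, upper)\n",
    "mito_ok = data.obs['percent_mito'] < 20\n",
    "\n",
    "print(f'n_genes bounds: {lower:.1f} to {upper:.1f}')\n",
    "print(f'cells within gene bounds: {gene_ok.sum()} of {len(gene_ok)}')\n",
    "print(f'cells meeting both criteria: {(gene_ok & mito_ok).sum()}')"
   ]
  },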
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87cc272c",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.qcviolin(data, plot_type='gene', dpi=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53d588d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.qcviolin(data, plot_type='count', dpi=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7d70c83",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.qcviolin(data, plot_type='mito', dpi=100)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd3ff22d",
   "metadata": {},
   "source": [
    "Now perform the actual filtering:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5de76ec4",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.filter_data(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc5a54d1",
   "metadata": {},
   "source": [
    "Then identify robust genes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11d9e8a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.identify_robust_genes(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77443735",
   "metadata": {},
   "source": [
    "## Downstream analysis to get clusters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90371173",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.log_norm(data)\n",
    "pg.highly_variable_features(data)\n",
    "pg.pca(data)\n",
    "pg.neighbors(data)\n",
    "pg.leiden(data)\n",
    "pg.umap(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2955e9f0",
   "metadata": {},
   "source": [
    "Run the code below to show the UMAP plot of cells, colored by the cluster labels that the Leiden algorithm generated from their PCA embedding:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb82b84c",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.scatter(data, attrs='leiden_labels')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66cd620d",
   "metadata": {},
   "source": [
    "## DE analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09748368",
   "metadata": {},
   "source": [
    "Run the function below to perform differential expression (DE) analysis on the Leiden clusters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cdb08973",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.de_analysis(data, cluster='leiden_labels')"
   ]
  },
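  {
   "cell_type": "markdown",
   "id": "c5d2f7a1",
   "metadata": {},
   "source": [
    "The DE call does not display anything by itself. A minimal sketch for inspecting its results follows, assuming `pg.markers` returns a dictionary keyed by cluster label, each entry holding `'up'` and `'down'` tables of significant genes, as described in the Pegasus documentation. The variable names are illustrative only:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8e4a0b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Collect significant DE genes per cluster (assumed structure: {cluster: {'up': df, 'down': df}})\n",
    "marker_dict = pg.markers(data)\n",
    "\n",
    "# Pick whichever cluster label sorts first, rather than assuming a specific label exists\n",
    "first_cluster = sorted(marker_dict)[0]\n",
    "\n",
    "# Show the top up-regulated genes for that cluster\n",
    "marker_dict[first_cluster]['up'].head(10)"
   ]
  },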
  {
   "cell_type": "markdown",
   "id": "2967e347",
   "metadata": {},
   "source": [
    "## Cell type annotation\n",
    "\n",
    "Next, infer a cell type for each cluster using the preset mouse brain markers:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72f8a70d",
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "celltype_dict = pg.infer_cell_types(data, markers = 'mouse_brain')\n",
    "celltype_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68b71b60",
   "metadata": {},
   "outputs": [],
   "source": [
    "cluster_names = pg.infer_cluster_names(celltype_dict)\n",
    "pg.annotate(data, name='anno', based_on='leiden_labels', anno_dict=cluster_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b8445c6",
   "metadata": {},
   "source": [
    "## Plotting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d936be1d",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.scatter(data, 'anno', legend_loc='on data')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1cbad9d",
   "metadata": {},
   "source": [
    "Besides the UMAP plot, Pegasus also provides the [spatial](https://pegasus.readthedocs.io/en/stable/api/pegasus.spatial.html) function to generate spatial plots. Below is the spatial plot of cells colored by their cell types:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "effa4f46",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.spatial(data, 'anno')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50fb6769",
   "metadata": {},
   "source": [
    "## Save results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4fc87b20",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg.write_output(data, \"spatial_results.zarr.zip\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}