{ "cells": [ { "cell_type": "markdown", "id": "68a5c2f5-9391-4170-b5ea-9df9ad5eafb4", "metadata": {}, "source": [ "# Access and Analyze `scanpy anndata` Objects from a Manuscript\n", "\n", "This guide provides steps to access and analyze the `scanpy anndata` objects associated with a recent manuscript. These objects are essential for computational biologists and data scientists working in genomics and related fields. There are three replicates available for download:\n", "\n", "- [Replicate 1 (Rep1)](https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad)\n", "- [Replicate 2 (Rep2)](https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad)\n", "- [Replicate 3 (Rep3)](https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad)\n", "\n", "Each `anndata` object contains several elements crucial for comprehensive data analysis:\n", "\n", "1. `.X`: Filtered, normalized, and log-transformed count matrix.\n", "2. `.raw`: Original, filtered raw count matrix.\n", "3. `.obsm['MAGIC_imputed_data']`: Imputed count matrix using MAGIC algorithm.\n", "4. `.obsm['tsne']`: t-SNE maps (as presented in the manuscript), generated using scaled diffusion components.\n", "5. `.obs['clusters']`: Cell clustering information.\n", "6. `.obs['palantir_pseudotime']`: Cell pseudo-time ordering, as determined by Palantir.\n", "7. `.obs['palantir_diff_potential']`: Palantir-determined differentiation potential of cells.\n", "8. `.obsm['palantir_branch_probs']`: Probabilities of cells branching into different lineages, according to Palantir.\n", "9. `.uns['palantir_branch_probs_cell_types']`: Labels for Palantir branch probabilities.\n", "10. `.uns['ct_colors']`: Color codes for cell types, as used in the manuscript.\n", "11. `.uns['cluster_colors']`: Color codes for cell clusters, as used in the manuscript.\n", "\n", "## Python Code for Data Access:" ] }, { "cell_type": "code", "execution_count": 1, "id": "63f356a7-3856-4596-a7b3-9fc05cc3029a", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T21:20:46.755293Z", "iopub.status.busy": "2023-11-28T21:20:46.755059Z", "iopub.status.idle": "2023-11-28T21:20:59.646740Z", "shell.execute_reply": "2023-11-28T21:20:59.645355Z", "shell.execute_reply.started": "2023-11-28T21:20:46.755266Z" } }, "outputs": [], "source": [ "import scanpy as sc\n", "\n", "# Read in the data, with backup URLs provided\n", "adata_Rep1 = sc.read(\n", " \"../data/human_cd34_bm_rep1.h5ad\",\n", " backup_url=\"https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad\",\n", ")\n", "adata_Rep2 = sc.read(\n", " \"../data/human_cd34_bm_rep2.h5ad\",\n", " backup_url=\"https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad\",\n", ")\n", "adata_Rep3 = sc.read(\n", " \"../data/human_cd34_bm_rep3.h5ad\",\n", " backup_url=\"https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad\",\n", ")" ] }, { "cell_type": "code", "execution_count": 2, "id": "bee4a735-7c47-415a-b1e3-ee776998dbd5", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T21:20:59.650053Z", "iopub.status.busy": "2023-11-28T21:20:59.649313Z", "iopub.status.idle": "2023-11-28T21:20:59.659463Z", "shell.execute_reply": "2023-11-28T21:20:59.658910Z", "shell.execute_reply.started": "2023-11-28T21:20:59.650021Z" } }, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 5780 × 14651\n", " obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'\n", " uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'\n", " obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_Rep1" ] }, { "cell_type": "code", "execution_count": 3, "id": "515e6760-8f95-42d6-87ba-1a2375797ccf", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T21:20:59.660313Z", "iopub.status.busy": "2023-11-28T21:20:59.660133Z", "iopub.status.idle": "2023-11-28T21:20:59.676952Z", "shell.execute_reply": "2023-11-28T21:20:59.676283Z", "shell.execute_reply.started": "2023-11-28T21:20:59.660295Z" } }, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 6501 × 14913\n", " obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'\n", " uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'\n", " obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_Rep2" ] }, { "cell_type": "code", "execution_count": 4, "id": "61d7a8e0-0916-4099-8982-5599d7166104", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T21:20:59.678250Z", "iopub.status.busy": "2023-11-28T21:20:59.677863Z", "iopub.status.idle": "2023-11-28T21:20:59.691822Z", "shell.execute_reply": "2023-11-28T21:20:59.691131Z", "shell.execute_reply.started": "2023-11-28T21:20:59.678220Z" } }, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 12046 × 14044\n", " obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'\n", " uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'\n", " obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adata_Rep3" ] }, { "cell_type": "markdown", "id": "b057a720-f0f4-40b0-8bcf-02efc9b2124d", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T19:21:40.634650Z", "iopub.status.busy": "2023-11-28T19:21:40.634039Z", "iopub.status.idle": "2023-11-28T19:21:40.647637Z", "shell.execute_reply": "2023-11-28T19:21:40.646498Z", "shell.execute_reply.started": "2023-11-28T19:21:40.634595Z" } }, "source": [ "# Converting `anndata` Objects to `Seurat` Objects Using R\n", "\n", "For researchers working with R and Seurat, the process to convert `anndata` objects to Seurat objects involves the following steps:\n", "\n", "1. **Set Up R Environment and Libraries**:\n", " - Load the necessary libraries: `Seurat` and `anndata`.\n", "\n", "2. **Download and Read the Data**:\n", " - Use `curl::curl_download` to download the `anndata` from the provided URLs.\n", " - Read the data using the `read_h5ad` method from the `anndata` library.\n", "\n", "3. **Create Seurat Objects**:\n", " - Use the `CreateSeuratObject` function to convert the data into Seurat objects, incorporating counts and metadata from the `anndata` object.\n", " - Transfer additional data like tSNE embeddings, imputed gene expressions, and cell fate probabilities into the appropriate slots in the Seurat object.\n", "\n", "### R Code Snippet:" ] }, { "cell_type": "code", "execution_count": null, "id": "562d56fb-80dc-4f44-8266-3ca559e79106", "metadata": { "jupyter": { "source_hidden": true } }, "outputs": [], "source": [ "# this cell only exists to allow running R code inside this python notebook using a conda kernel\n", "import sys\n", "import os\n", "\n", "# Get the path to the python executable\n", "python_executable_path = sys.executable\n", "\n", "# Extract the path to the environment from the path to the python executable\n", "env_path = os.path.dirname(os.path.dirname(python_executable_path))\n", "\n", "print(\n", " f\"Conda env path: {env_path}\\n\"\n", " \"Please make sure you have R installed in the conda environment.\"\n", ")\n", "\n", "os.environ['R_HOME'] = os.path.join(env_path, 'lib', 'R')\n", "\n", "%load_ext rpy2.ipython" ] }, { "cell_type": "code", "execution_count": 6, "id": "ed46f119-e8be-45ba-b447-b46e8b947cf8", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T21:21:01.081154Z", "iopub.status.busy": "2023-11-28T21:21:01.080675Z", "iopub.status.idle": "2023-11-28T21:23:08.313753Z", "shell.execute_reply": "2023-11-28T21:23:08.313058Z", "shell.execute_reply.started": "2023-11-28T21:21:01.081128Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "R[write to console]: Loading required package: SeuratObject\n", "\n", "R[write to console]: Loading required package: sp\n", "\n", "R[write to console]: \n", "Attaching package: ‘SeuratObject’\n", "\n", "\n", "R[write to console]: The following object is masked from ‘package:base’:\n", "\n", " intersect\n", "\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", " WARNING: The R package \"reticulate\" only fixed recently\n", " an issue that caused a segfault when used with rpy2:\n", " https://github.com/rstudio/reticulate/pull/1188\n", " Make sure that you use a version of that package that includes\n", " the fix.\n", " " ] }, { "name": "stderr", "output_type": "stream", "text": [ "R[write to console]: \n", "Attaching package: ‘anndata’\n", "\n", "\n", "R[write to console]: The following object is masked from ‘package:SeuratObject’:\n", "\n", " Layers\n", "\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Data is of class matrix. Coercing to dgCMatrix.\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Data is of class matrix. Coercing to dgCMatrix.\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Data is of class matrix. Coercing to dgCMatrix.\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n", "R[write to console]: Warning:\n", "R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')\n", "\n" ] } ], "source": [ "%%R\n", "library(Seurat)\n", "library(anndata)\n", "\n", "create_seurat <- function(url) {\n", " file_path <- sub(\"https://s3.amazonaws.com/dp-lab-data-public/palantir/\", \"../data/\", url)\n", " if (!file.exists(file_path)) {\n", " curl::curl_download(url, file_path)\n", " }\n", " data <- read_h5ad(file_path)\n", " \n", " seurat_obj <- CreateSeuratObject(\n", " counts = t(data$X), \n", " meta.data = data$obs,\n", " project = \"CD34+ Bone Marrow Cells\"\n", " )\n", " tsne_data <- data$obsm[[\"tsne\"]]\n", " rownames(tsne_data) <- rownames(data$obs)\n", " colnames(tsne_data) <- c(\"tSNE_1\", \"tSNE_2\")\n", " seurat_obj[[\"tsne\"]] <- CreateDimReducObject(\n", " embeddings = tsne_data,\n", " key = \"tSNE_\"\n", " )\n", " imputed_data <- t(data$obsm[[\"MAGIC_imputed_data\"]])\n", " colnames(imputed_data) <- rownames(data$obs)\n", " rownames(imputed_data) <- rownames(data$var)\n", " seurat_obj[[\"MAGIC_imputed\"]] <- CreateAssayObject(counts = imputed_data)\n", " fate_probs <- as.data.frame(data$obsm[[\"palantir_branch_probs\"]])\n", " colnames(fate_probs) <- data$uns[[\"palantir_branch_probs_cell_types\"]]\n", " rownames(fate_probs) <- rownames(data$obs)\n", " seurat_obj <- AddMetaData(seurat_obj, metadata = fate_probs)\n", "\n", " return(seurat_obj)\n", "}\n", "\n", "human_cd34_bm_Rep1 <- create_seurat(\"https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad\")\n", "human_cd34_bm_Rep2 <- create_seurat(\"https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad\")\n", "human_cd34_bm_Rep3 <- create_seurat(\"https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad\")" ] }, { "cell_type": "code", "execution_count": 7, "id": "a7c8b823-4d18-4252-acc1-4a9f51f929b9", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T21:23:08.315660Z", "iopub.status.busy": "2023-11-28T21:23:08.315364Z", "iopub.status.idle": "2023-11-28T21:23:08.361153Z", "shell.execute_reply": "2023-11-28T21:23:08.360630Z", "shell.execute_reply.started": "2023-11-28T21:23:08.315642Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "An object of class Seurat \n", "29302 features across 5780 samples within 2 assays \n", "Active assay: RNA (14651 features, 0 variable features)\n", " 1 layer present: counts\n", " 1 other assay present: MAGIC_imputed\n", " 1 dimensional reduction calculated: tsne\n" ] } ], "source": [ "%%R\n", "\n", "human_cd34_bm_Rep1" ] }, { "cell_type": "code", "execution_count": 8, "id": "094067ac-b251-4e37-8d67-eedc2641b8fa", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T21:23:08.362383Z", "iopub.status.busy": "2023-11-28T21:23:08.361964Z", "iopub.status.idle": "2023-11-28T21:23:08.400063Z", "shell.execute_reply": "2023-11-28T21:23:08.399518Z", "shell.execute_reply.started": "2023-11-28T21:23:08.362356Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "An object of class Seurat \n", "29826 features across 6501 samples within 2 assays \n", "Active assay: RNA (14913 features, 0 variable features)\n", " 1 layer present: counts\n", " 1 other assay present: MAGIC_imputed\n", " 1 dimensional reduction calculated: tsne\n" ] } ], "source": [ "%%R\n", "\n", "human_cd34_bm_Rep2" ] }, { "cell_type": "code", "execution_count": 9, "id": "6fb000c4-41ee-4147-aba8-08c0e6f7deb5", "metadata": { "execution": { "iopub.execute_input": "2023-11-28T21:23:08.401196Z", "iopub.status.busy": "2023-11-28T21:23:08.400878Z", "iopub.status.idle": "2023-11-28T21:23:08.441148Z", "shell.execute_reply": "2023-11-28T21:23:08.440627Z", "shell.execute_reply.started": "2023-11-28T21:23:08.401171Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "An object of class Seurat \n", "28088 features across 12046 samples within 2 assays \n", "Active assay: RNA (14044 features, 0 variable features)\n", " 1 layer present: counts\n", " 1 other assay present: MAGIC_imputed\n", " 1 dimensional reduction calculated: tsne\n" ] } ], "source": [ "%%R\n", "\n", "human_cd34_bm_Rep3" ] }, { "cell_type": "code", "execution_count": null, "id": "e208ff84-85d0-40f7-b08d-9153537b088a", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "da1", "language": "python", "name": "da1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }