{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "45940777-2ceb-4970-96a7-88cdc1a303ee",
   "metadata": {},
   "source": [
    "# Label cell types using Seurat Label Transfer\n",
    "\n",
    "To build our reference, we would like to start with labels that originate from published cell type references. \n",
    "\n",
    "One of the approaches for this cell type labeling is Seurat, which labels cells by integration with a reference dataset.  \n",
    "\n",
    "Label transfer using Seurat is described [on their website](https://satijalab.org/seurat/articles/integration_mapping), and was introduced in this publication:  \n",
    "\n",
    "Stuart, T. et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902.e21 (2019)\n",
    "\n",
    "Here, we'll load in our cells in batches, and assign cell types based on the PBMC reference dataset provided by the Satija lab as part of their 2021 publication in Cell (described below)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "578488b5-52e9-4696-aa42-9c2944b77b1b",
   "metadata": {},
   "source": [
    "## Load libraries\n",
    "\n",
    "`dplyr`: Data frame manipulation tools  \n",
    "`H5weaver`: A package for reading .h5 files generated by AIFI  \n",
    "`hise`: The HISE SDK  \n",
    "`parallel`: Parallelization of processes in R  \n",
    "`purrr`: Functional programing tools  \n",
    "`Seurat`: Single-cell data analysis tools  \n",
    "`SeuratObject`: Data structures for Seurat\n",
    "\n",
    "We also set the `timeout` option to be high so that R waits for us to download the large reference dataset from Zenodo, below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "84c8862c-7ddd-4ef1-b5fb-27bdd8c948f6",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "quiet_library <- function(...) { suppressPackageStartupMessages(library(...)) }\n",
    "\n",
    "quiet_library(dplyr)\n",
    "quiet_library(H5weaver)\n",
    "quiet_library(hise)\n",
    "quiet_library(parallel)\n",
    "quiet_library(purrr)\n",
    "quiet_library(Seurat)\n",
    "quiet_library(SeuratObject)\n",
    "\n",
    "options(timeout = 10000)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31328a06-0d11-4f06-b369-97cb326b610e",
   "metadata": {},
   "source": [
    "## Prepare Seurat PBMC reference"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05c93485-4e86-4137-98f3-21b46999a767",
   "metadata": {},
   "source": [
    "To use Seurat to label our cells, we'll utilize the PBMC reference provided by the Satija Lab for PBMCs generated using Seurat V5. This reference is derived from data in this publication from the Satija lab:\n",
    "\n",
    "Hao, Y. and Hao, S. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021)\n",
    "\n",
    "The version of record for this reference dataset is provided in a Zenodo repository at this accession:  \n",
    "https://zenodo.org/records/7779017\n",
    "\n",
    "Additional information about the cell type labels in this reference is available [on the Azimuth website](https://azimuth.hubmapconsortium.org/references/#Human%20-%20PBMC).\n",
    "\n",
    "We'll download the reference from Zenodo for label transfer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a2237f65-92a1-4167-b6d3-1d2c2feea463",
   "metadata": {},
   "outputs": [],
   "source": [
    "if(!dir.exists(\"reference\")) {\n",
    "    dir.create(\"reference\")\n",
    "}\n",
    "download.file(\n",
    "    \"https://zenodo.org/records/7779017/files/pbmc_multimodal_2023.rds?download=1\",\n",
    "    \"reference/pbmc_multimodal_2023.rds\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "310a3425-2e9c-4d09-92d7-d37aa3133f1c",
   "metadata": {},
   "outputs": [],
   "source": [
    "reference <- readRDS(\"reference/pbmc_multimodal_2023.rds\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "352f9dd8-7653-4b8d-82cc-ec7fc32e1b8f",
   "metadata": {},
   "source": [
    "Between Level 2 (L2) and Level 3 (L3) labels provided by the Satija lab, we like to add an additional level that we call L2.5, which separates the Treg Naive and Memory cells based on L3, and assigns a CD8 TEMRA cell label to cells with the L3 labels CD8 TEM_4 and CD8 TEM_5. All other cell types use their L2 assignments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "414d81e9-23a1-4293-a12d-4557273eeb84",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "l3 <- as.character(reference@meta.data$celltype.l3)\n",
    "l2 <- as.character(reference@meta.data$celltype.l2)\n",
    "l2.5 <- l2\n",
    "l2.5[l3 == \"Treg Naive\"] <- \"Treg Naive\"\n",
    "l2.5[l3 == \"Treg Memory\"] <- \"Treg Memory\"\n",
    "l2.5[l3 %in% c(\"CD8 TEM_4\", \"CD8 TEM_5\")] <- \"CD8 TEMRA\"\n",
    "\n",
    "reference <- AddMetaData(reference, metadata = l2.5, col.name = \"celltype.l2.5\")"
   ]
  },
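  {
   "cell_type": "markdown",
   "id": "sketch-l2-5-relabel",
   "metadata": {},
   "source": [
    "The relabeling rules above can be checked on a toy example before applying them to the full reference. This is a minimal sketch, assuming short, made-up `l2` and `l3` vectors (not values read from the reference object):\n",
    "\n",
    "```r\n",
    "# Toy L2 and L3 labels, invented for illustration only\n",
    "l2 <- c(\"Treg\", \"Treg\", \"CD8 TEM\", \"CD8 TEM\", \"B naive\")\n",
    "l3 <- c(\"Treg Naive\", \"Treg Memory\", \"CD8 TEM_4\", \"CD8 TEM_1\", \"B naive kappa\")\n",
    "\n",
    "# Start from L2, then overwrite only the subsets that L2.5 splits out\n",
    "l2.5 <- l2\n",
    "l2.5[l3 == \"Treg Naive\"] <- \"Treg Naive\"\n",
    "l2.5[l3 == \"Treg Memory\"] <- \"Treg Memory\"\n",
    "l2.5[l3 %in% c(\"CD8 TEM_4\", \"CD8 TEM_5\")] <- \"CD8 TEMRA\"\n",
    "\n",
    "# Compare the three label levels side by side\n",
    "data.frame(l2, l3, l2.5)\n",
    "```\n",
    "\n",
    "Only the Treg and CD8 TEM_4/CD8 TEM_5 entries should change; every other entry keeps its L2 label."
   ]
  },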
  {
   "cell_type": "markdown",
   "id": "5d96208b-30c8-4516-9093-ff0fd724f0c3",
   "metadata": {},
   "source": [
    "## Retreive sample metadata\n",
    "\n",
    "In an earlier step, we assembled and stored sample metadata in HISE. We'll pull this file, and use it to retrieve file for our labeling process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "41889435-89c3-419d-845b-848514fe977f",
   "metadata": {},
   "outputs": [],
   "source": [
    "sample_meta_uuid <- \"2da66a1a-17cc-498b-9129-6858cf639caf\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "086448e5-5aad-443e-af11-8d7e86a90b8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "res <- cacheFiles(list(sample_meta_uuid))\n",
    "sample_meta_file <- list.files(\n",
    "    paste0(\"cache/\", sample_meta_uuid), \n",
    "    pattern = \".csv\",\n",
    "    full.names = TRUE\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ec100df3-b875-46b0-95f4-0798c488cd6f",
   "metadata": {},
   "outputs": [],
   "source": [
    "hise_meta <- read.csv(sample_meta_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c163a8af-555c-4728-b00f-c6620abf7619",
   "metadata": {},
   "source": [
    "## Cache input files\n",
    "\n",
    "Next, we'll use the hise package to cache all of the input files. With this many files, there are occasional problems with transfer, so we'll also add a check for existing files in our function, and run a second pass to make sure we have everything we want to label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed9e672c-fe7d-4494-81d8-56a68184a835",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function with cache path check\n",
    "cache_file <- function(h5_uuid) {\n",
    "    cache_dir <- paste0(\"cache/\",h5_uuid)\n",
    "     if(!dir.exists(cache_dir)) {\n",
    "         res <- cacheFiles(list(h5_uuid))\n",
    "     }\n",
    "}\n",
    "\n",
    "# Walk file UUIDs to cache\n",
    "file_ids <- hise_meta$file.id\n",
    "walk(file_ids, cache_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6ae6965-7ab7-4a4f-ad0f-1d0016c800a3",
   "metadata": {},
   "source": [
    "Run a second pass to make sure we have everything"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd55c1db-da54-43a1-9889-efefd793f42e",
   "metadata": {},
   "outputs": [],
   "source": [
    "walk(file_ids, cache_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7abde79c-a9aa-429f-ab51-e210c885e1e2",
   "metadata": {},
   "source": [
    "## Divide data into chunks for parallel processing\n",
    "\n",
    "For labeling, we'll take files in batches of up to 10 files. We'll label those files, then output the results for each sample in the batch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a9bf4fd0-6c9e-4944-8925-36684ace10ae",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "hise_meta <- hise_meta %>% \n",
    "  arrange(file.batchID)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "2421afd7-2be3-4e9c-8985-1a6f1d6e2554",
   "metadata": {},
   "outputs": [],
   "source": [
    "b <- rep(1:11, each = 10)[1:nrow(hise_meta)]\n",
    "df_chunk_list <- split(hise_meta, b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "6f607a41-e4da-478a-994e-09eba3989e1b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       ".dl-inline {width: auto; margin:0; padding: 0}\n",
       ".dl-inline>dt, .dl-inline>dd {float: none; width: auto; display: inline-block}\n",
       ".dl-inline>dt::after {content: \":\\0020\"; padding-right: .5ex}\n",
       ".dl-inline>dt:not(:first-of-type) {padding-left: .5ex}\n",
       "</style><dl class=dl-inline><dt>1</dt><dd>10</dd><dt>2</dt><dd>10</dd><dt>3</dt><dd>10</dd><dt>4</dt><dd>10</dd><dt>5</dt><dd>10</dd><dt>6</dt><dd>10</dd><dt>7</dt><dd>10</dd><dt>8</dt><dd>10</dd><dt>9</dt><dd>10</dd><dt>10</dt><dd>10</dd><dt>11</dt><dd>8</dd></dl>\n"
      ],
      "text/latex": [
       "\\begin{description*}\n",
       "\\item[1] 10\n",
       "\\item[2] 10\n",
       "\\item[3] 10\n",
       "\\item[4] 10\n",
       "\\item[5] 10\n",
       "\\item[6] 10\n",
       "\\item[7] 10\n",
       "\\item[8] 10\n",
       "\\item[9] 10\n",
       "\\item[10] 10\n",
       "\\item[11] 8\n",
       "\\end{description*}\n"
      ],
      "text/markdown": [
       "1\n",
       ":   102\n",
       ":   103\n",
       ":   104\n",
       ":   105\n",
       ":   106\n",
       ":   107\n",
       ":   108\n",
       ":   109\n",
       ":   1010\n",
       ":   1011\n",
       ":   8\n",
       "\n"
      ],
      "text/plain": [
       " 1  2  3  4  5  6  7  8  9 10 11 \n",
       "10 10 10 10 10 10 10 10 10 10  8 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "map_int(df_chunk_list, nrow)"
   ]
  },
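  {
   "cell_type": "markdown",
   "id": "sketch-chunk-assignment",
   "metadata": {},
   "source": [
    "The chunk assignment can be checked on a toy table. This sketch uses a hypothetical 108-row data frame (`toy_meta`, standing in for `hise_meta`) and computes the chunk index with `ceiling()`, which gives at most `chunk_size` rows per chunk for any number of rows:\n",
    "\n",
    "```r\n",
    "# Hypothetical stand-in for hise_meta with 108 files\n",
    "toy_meta <- data.frame(file.id = sprintf(\"file-%03d\", 1:108))\n",
    "\n",
    "# Assign each row to a chunk of at most chunk_size files\n",
    "chunk_size <- 10\n",
    "b <- ceiling(seq_len(nrow(toy_meta)) / chunk_size)\n",
    "toy_chunks <- split(toy_meta, b)\n",
    "\n",
    "# Number of rows in each chunk\n",
    "chunk_sizes <- vapply(toy_chunks, nrow, integer(1))\n",
    "chunk_sizes\n",
    "```\n",
    "\n",
    "With 108 rows this gives ten chunks of 10 and a final chunk of 8, the same split shown in the output above."
   ]
  },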
  {
   "cell_type": "markdown",
   "id": "5408115d-1f36-431d-9556-764a48a41749",
   "metadata": {},
   "source": [
    "## Prepare output directory\n",
    "\n",
    "We'll store results for each sample in `output/Hao_PBMC/`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "64f66d04-d1c1-428a-8369-62735f79a897",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_dir <- \"output/Hao_PBMC/\"\n",
    "if(!dir.exists(out_dir)) {\n",
    "    dir.create(out_dir, recursive = TRUE)\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e892ae10-b5a9-4c87-b5c3-b30236e1133a",
   "metadata": {},
   "source": [
    "## Functions for parallel label transfer of chunks\n",
    "\n",
    "For each chunk, we'll retrieve the data from HISE, perform label transfer and ADT imputation using Seurat, and then store the labels and imputed ADT matrices to allow us to assess cell type identity.\n",
    "\n",
    "The main function to perform these steps is `label_chunk()`, provided below. There are also 4 helper functions that assist in performing these steps for each sample:  \n",
    "`read_so()` Reads the .h5 files stored in HISE as Seurat Objects for analysis  \n",
    "`write_labels()` Writes the labeling results to .csv files for each sample  \n",
    "`get_sample_adt()` subsets the ADT matrix for each sample and returns ADT values per sample  \n",
    "`write_adt()` Writes the imputed ADT matrix values to a .h5 file for later use"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "56d31af1-64d3-432b-b129-110c539b69ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "read_so <- function(h5_uuid) {\n",
    "    cache_dir <- paste0(\"cache/\",h5_uuid)\n",
    "    h5_file <- list.files(\n",
    "        cache_dir, \n",
    "        pattern = \".h5\", \n",
    "        full.names = TRUE\n",
    "    )\n",
    "    \n",
    "    counts <- read_h5_dgCMatrix(h5_file)\n",
    "    rownames(counts) <- make.unique(rownames(counts))\n",
    "    meta <- read_h5_cell_meta(h5_file)\n",
    "    rownames(meta) <- meta$barcodes\n",
    "\n",
    "    so <- CreateSeuratObject(\n",
    "      counts,\n",
    "      meta.data = meta,\n",
    "      assay = \"RNA\")\n",
    "\n",
    "    so\n",
    "}\n",
    "\n",
    "write_labels <- function(sample_labels, sample_id) {\n",
    "    out_file <- paste0(\"output/Hao_PBMC/\", sample_id, \"_Hao_PBMC.csv\")\n",
    "    write.csv(\n",
    "        sample_labels,\n",
    "        out_file,\n",
    "        row.names = FALSE,\n",
    "        quote = FALSE\n",
    "    )\n",
    "}\n",
    "\n",
    "get_sample_adt <- function(sample_meta, adt_data) {\n",
    "    adt_data[,sample_meta$barcodes]\n",
    "}\n",
    "\n",
    "write_adt <- function(sample_adt, sample_id) {\n",
    "    out_file <- paste0(\"output/Hao_PBMC/\", sample_id, \"_ADT.h5\")\n",
    "    list_mat <- list(\n",
    "        i = sample_adt@i,\n",
    "        p = sample_adt@p,\n",
    "        x = sample_adt@x,\n",
    "        Dim = dim(sample_adt),\n",
    "        rownames = rownames(sample_adt),\n",
    "        colnames = colnames(sample_adt)\n",
    "    )\n",
    "\n",
    "    h5createFile(out_file)\n",
    "    h5write(list_mat, out_file, \"mat\")\n",
    "}\n",
    "\n",
    "check_output <- function(sample_id) {\n",
    "    out_file <- paste0(\"output/Hao_PBMC/\", sample_id, \"_Hao_PBMC.csv\")\n",
    "    file.exists(out_file)\n",
    "}\n",
    "\n",
    "label_chunk <- function(meta_data) {\n",
    "    # check for existing labels\n",
    "    sample_ids <- unique(meta_data$pbmc_sample_id)\n",
    "    out_check <- map_lgl(sample_ids, check_output)\n",
    "    if(sum(out_check) == length(sample_ids)) {\n",
    "        return(invisible(NULL))\n",
    "    }\n",
    "    \n",
    "    # Read and combine individual samples\n",
    "    so_list <- map(meta_data$file.id, read_so)\n",
    "    combined <- Reduce(merge, so_list)\n",
    "    rm(so_list)\n",
    "\n",
    "    # Transform data to match the reference\n",
    "    combined <- SCTransform(\n",
    "        combined,\n",
    "        method = \"glmGamPoi\", \n",
    "        verbose = FALSE\n",
    "    )\n",
    "    \n",
    "    # find anchors\n",
    "    anchors <- FindTransferAnchors(\n",
    "      reference = reference,\n",
    "      query = combined,\n",
    "      normalization.method = \"SCT\",\n",
    "      reference.reduction = \"spca\",\n",
    "      dims = 1:50\n",
    "    )  \n",
    "        \n",
    "    #perform projection to get labels\n",
    "    combined <- MapQuery(\n",
    "      anchorset = anchors,\n",
    "      query = combined,\n",
    "      reference = reference,\n",
    "      refdata = list(\n",
    "        celltype.l1 = \"celltype.l1\",\n",
    "        celltype.l2 = \"celltype.l2\",\n",
    "        celltype.l3 = \"celltype.l3\",\n",
    "        celltype.l2.5 = \"celltype.l2.5\",\n",
    "        predicted_ADT = \"ADT\"\n",
    "      ),\n",
    "      reference.reduction = \"spca\", \n",
    "      reduction.model = \"wnn.umap\"\n",
    "    )\n",
    "\n",
    "    # Split the metadata by sample for output\n",
    "    sample_meta_list <- split(combined@meta.data, combined@meta.data$pbmc_sample_id)\n",
    "\n",
    "    # Write labels\n",
    "    walk2(\n",
    "        sample_meta_list, names(sample_meta_list),\n",
    "        write_labels\n",
    "    )\n",
    "\n",
    "    # Subset and write projected ADT data\n",
    "    sample_adt_list <- map(\n",
    "        sample_meta_list, \n",
    "        get_sample_adt, \n",
    "        adt_data = combined@assays$predicted_ADT@data)\n",
    "\n",
    "    walk2(\n",
    "        sample_adt_list, names(sample_meta_list),\n",
    "        write_adt\n",
    "    )\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c3bf721-40f2-42db-9b6d-82ed4dcefbaf",
   "metadata": {},
   "source": [
    "## Apply label transfer to all chunks\n",
    "\n",
    "We'll use `mclapply()` to perform our label transfer steps in parallel for multiple chunks. \n",
    "\n",
    "Because of the extremely memory-intensive use of `SCTransform()`, we're limited in the number of chunks that we can process simultaneously."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "98dd7302-49e8-4efc-b79c-480df9304770",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "res <- mclapply(\n",
    "    df_chunk_list,\n",
    "    label_chunk,\n",
    "    mc.cores = 3\n",
    ")"
   ]
  },
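  {
   "cell_type": "markdown",
   "id": "sketch-mclapply-errors",
   "metadata": {},
   "source": [
    "`mclapply()` runs each job inside `try()`, so a failed chunk does not stop the run; it comes back as an object of class `try-error` in the result list, along with a warning. A minimal sketch of checking for failures, assuming a hypothetical `toy_worker()` function and a Unix-like system where forking is available. With the default `mc.preschedule = TRUE`, several inputs share a job, so one failure can mark every input in that job as `try-error`; the sketch sets `mc.preschedule = FALSE` to keep one input per job:\n",
    "\n",
    "```r\n",
    "library(parallel)\n",
    "\n",
    "# Hypothetical worker that fails for one input\n",
    "toy_worker <- function(x) {\n",
    "    if (x == 2) stop(\"simulated failure\")\n",
    "    x * 10\n",
    "}\n",
    "\n",
    "# One job per input, so an error only affects the input that caused it\n",
    "toy_res <- mclapply(1:3, toy_worker, mc.cores = 2, mc.preschedule = FALSE)\n",
    "\n",
    "# Failed elements are returned as objects of class \"try-error\"\n",
    "failed <- vapply(toy_res, inherits, logical(1), what = \"try-error\")\n",
    "which(failed)\n",
    "```\n",
    "\n",
    "The same check could be applied to `res` from the cell above. Because `label_chunk()` skips chunks whose label files already exist, rerunning after a failure only repeats the unfinished chunks."
   ]
  },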
  {
   "cell_type": "markdown",
   "id": "4d0ffcac-1e78-43f8-b80f-37470a3b4736",
   "metadata": {},
   "source": [
    "## Assemble results\n",
    "\n",
    "Next, we'll assemble all of the labeling results in a single file for storage in HISE and downstream utilization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "a998eb80-39a3-4b8f-beb9-14331e0bf8ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "label_files <- list.files(\n",
    "    out_dir,\n",
    "    pattern = \".csv\",\n",
    "    full.names = TRUE\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "b5f3c1e1-b2d8-4e8a-81cb-65456176d1d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "all_labels <- map_dfr(label_files, read.csv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "14bb59ef-30a7-4a7f-9d2f-044ea8d363cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "all_labels <- all_labels %>%\n",
    "  select(pbmc_sample_id, barcodes, starts_with(\"predicted\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "0caad4e8-c5a3-47de-a8d9-f197c342ddb4",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_csv <- paste0(\n",
    "    \"output/ref_seurat_labels_PBMC_\",\n",
    "    Sys.Date(),\n",
    "    \".csv\"\n",
    ")\n",
    "write.csv(\n",
    "    all_labels,\n",
    "    out_csv,\n",
    "    row.names = FALSE,\n",
    "    quote = FALSE\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "a066e794-11f9-4381-b81b-4be1e47c28c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "adt_files <- list.files(\n",
    "    out_dir,\n",
    "    pattern = \".h5\",\n",
    "    full.names = TRUE\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "93c94d41-f21d-4eff-aedd-3d543e43a991",
   "metadata": {},
   "outputs": [],
   "source": [
    "read_adt_mat <- function(adt_file) {\n",
    "    mat_list <- rhdf5::h5read(adt_file, \"/mat\")\n",
    "    mat_list <- lapply(mat_list, as.vector)\n",
    "    mat <- Matrix::sparseMatrix(\n",
    "        i = mat_list$i,\n",
    "        p = mat_list$p,\n",
    "        x = mat_list$x,\n",
    "        dims = mat_list$Dim,\n",
    "        dimnames = list(\n",
    "            mat_list$rownames,\n",
    "            mat_list$colnames\n",
    "        ),\n",
    "        index1 = FALSE\n",
    "    )\n",
    "    mat\n",
    "}"
   ]
  },
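  {
   "cell_type": "markdown",
   "id": "sketch-sparse-round-trip",
   "metadata": {},
   "source": [
    "`write_adt()` stores the `i`, `p`, and `x` slots of each column-compressed sparse matrix, and `read_adt_mat()` rebuilds the matrix from them. Those slots are zero-based, which is why `index1 = FALSE` is needed when rebuilding. A minimal sketch of the round trip, assuming a small made-up matrix and skipping the .h5 file step:\n",
    "\n",
    "```r\n",
    "library(Matrix)\n",
    "\n",
    "# Small toy matrix in column-compressed (dgCMatrix) form\n",
    "m <- Matrix::sparseMatrix(\n",
    "    i = c(1, 3, 2),\n",
    "    j = c(1, 1, 2),\n",
    "    x = c(0.5, 1.5, 2.0),\n",
    "    dims = c(3, 2),\n",
    "    dimnames = list(c(\"a\", \"b\", \"c\"), c(\"cell1\", \"cell2\"))\n",
    ")\n",
    "\n",
    "# Pull out the same slots that write_adt() stores\n",
    "parts <- list(i = m@i, p = m@p, x = m@x, Dim = dim(m))\n",
    "\n",
    "# Rebuild from the zero-based i and p vectors\n",
    "m2 <- Matrix::sparseMatrix(\n",
    "    i = parts$i,\n",
    "    p = parts$p,\n",
    "    x = parts$x,\n",
    "    dims = parts$Dim,\n",
    "    dimnames = dimnames(m),\n",
    "    index1 = FALSE\n",
    ")\n",
    "\n",
    "# Compare the original and rebuilt matrices\n",
    "identical(as.matrix(m), as.matrix(m2))\n",
    "```\n",
    "\n",
    "Leaving out `index1 = FALSE` would treat the row indices as one-based and shift or reject the entries, so the flag is worth keeping whenever raw slots are used."
   ]
  },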
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "b934bf75-cd44-4813-b525-1db9a5ba01a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "adt_list <- map(adt_files, read_adt_mat)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "e961f32c-ed96-4876-83c1-b3dcc8240124",
   "metadata": {},
   "outputs": [],
   "source": [
    "all_adt <- do.call(cbind, adt_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "7574bf70-af89-4144-adb8-25ec960fd8e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "h5_list <- list(\n",
    "    matrix_dgCMatrix = all_adt\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "513f3512-1b5e-41ff-9304-82d0390180d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "h5_list <- h5_list_convert_from_dgCMatrix(h5_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "d3f485d1-e965-4857-b8ac-52a34df99c67",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "List of 1\n",
      " $ matrix:List of 6\n",
      "  ..$ data    : num [1:475230846] 0.406 0.7 0.973 2.481 1.206 ...\n",
      "  ..$ indices : int [1:475230846] 0 1 2 3 4 5 6 7 8 9 ...\n",
      "  ..$ indptr  : int [1:2093079] 0 227 454 681 908 1135 1362 1589 1817 2044 ...\n",
      "  ..$ shape   : int [1:2] 228 2093078\n",
      "  ..$ barcodes: chr [1:2093078] \"cf71f47048b611ea8957bafe6d70929e\" \"cf71f54248b611ea8957bafe6d70929e\" \"cf71fa1048b611ea8957bafe6d70929e\" \"cf71fb7848b611ea8957bafe6d70929e\" ...\n",
      "  ..$ features:List of 1\n",
      "  .. ..$ id: chr [1:228] \"CD39\" \"Rat-IgG1-1\" \"CD107a\" \"CD62P\" ...\n"
     ]
    }
   ],
   "source": [
    "str(h5_list)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "bd1ef0b1-69db-479a-95b2-81b4d31c38b8",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_h5 <- paste0(\n",
    "    \"output/ref_seurat_projected-ADT_PBMC_\",\n",
    "    Sys.Date(),\n",
    "    \".h5\"\n",
    ")\n",
    "write_h5_list(\n",
    "    h5_list,\n",
    "    out_h5\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c55ef962-3fb3-483f-a194-3f817209b900",
   "metadata": {},
   "source": [
    "## Store results in HISE\n",
    "\n",
    "Finally, we store the output file in our Collaboration Space for later retrieval and use. We need to provide the UUID for our Collaboration Space (aka `studySpaceId`), as well as a title for this step in our analysis process.\n",
    "\n",
    "The hise function `uploadFiles()` also requires the FileIDs from the original fileset for reference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "27cec870-f4fb-438a-afaf-1468edc3275a",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "study_space_uuid <- \"64097865-486d-43b3-8f94-74994e0a72e0\"\n",
    "title <- paste(\"Ref. Seurat Label Predictions\", Sys.Date())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "acf155b2-1d89-4840-9c76-204bf7f6b116",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "in_list <- as.list(c(sample_meta_uuid, hise_meta$file.id))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "aad47bda-506f-459c-a563-08ee1ac8d7ea",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "out_list <- as.list(c(out_csv, out_h5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "5d41db31-92ce-46ae-b957-41d86977e0e9",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] \"Authorization token invalid or expired.\"\n",
      "[1] \"Retrying...\"\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<dl>\n",
       "\t<dt>$files</dt>\n",
       "\t\t<dd><ol>\n",
       "\t<li>'output/ref_seurat_labels_PBMC_2024-02-18.csv'</li>\n",
       "\t<li>'output/ref_seurat_projected-ADT_PBMC_2024-02-18.h5'</li>\n",
       "</ol>\n",
       "</dd>\n",
       "\t<dt>$traceId</dt>\n",
       "\t\t<dd>'e9ecdbe0-8f06-4c88-ac02-457f26c7abe1'</dd>\n",
       "</dl>\n"
      ],
      "text/latex": [
       "\\begin{description}\n",
       "\\item[\\$files] \\begin{enumerate}\n",
       "\\item 'output/ref\\_seurat\\_labels\\_PBMC\\_2024-02-18.csv'\n",
       "\\item 'output/ref\\_seurat\\_projected-ADT\\_PBMC\\_2024-02-18.h5'\n",
       "\\end{enumerate}\n",
       "\n",
       "\\item[\\$traceId] 'e9ecdbe0-8f06-4c88-ac02-457f26c7abe1'\n",
       "\\end{description}\n"
      ],
      "text/markdown": [
       "$files\n",
       ":   1. 'output/ref_seurat_labels_PBMC_2024-02-18.csv'\n",
       "2. 'output/ref_seurat_projected-ADT_PBMC_2024-02-18.h5'\n",
       "\n",
       "\n",
       "\n",
       "$traceId\n",
       ":   'e9ecdbe0-8f06-4c88-ac02-457f26c7abe1'\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "$files\n",
       "$files[[1]]\n",
       "[1] \"output/ref_seurat_labels_PBMC_2024-02-18.csv\"\n",
       "\n",
       "$files[[2]]\n",
       "[1] \"output/ref_seurat_projected-ADT_PBMC_2024-02-18.h5\"\n",
       "\n",
       "\n",
       "$traceId\n",
       "[1] \"e9ecdbe0-8f06-4c88-ac02-457f26c7abe1\"\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "uploadFiles(\n",
    "    files = out_list,\n",
    "    studySpaceId = study_space_uuid,\n",
    "    title = title,\n",
    "    inputFileIds = in_list,\n",
    "    store = \"project\",\n",
    "    doPrompt = FALSE\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "5a4f2040-2c12-4051-a30d-11f551ba847f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "R version 4.3.2 (2023-10-31)\n",
       "Platform: x86_64-conda-linux-gnu (64-bit)\n",
       "Running under: Ubuntu 20.04.6 LTS\n",
       "\n",
       "Matrix products: default\n",
       "BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.25.so;  LAPACK version 3.11.0\n",
       "\n",
       "locale:\n",
       " [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       \n",
       " [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   \n",
       " [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          \n",
       "[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   \n",
       "\n",
       "time zone: Etc/UTC\n",
       "tzcode source: system (glibc)\n",
       "\n",
       "attached base packages:\n",
       "[1] parallel  stats     graphics  grDevices utils     datasets  methods  \n",
       "[8] base     \n",
       "\n",
       "other attached packages:\n",
       " [1] Seurat_5.0.1       SeuratObject_5.0.1 sp_2.1-2           purrr_1.0.2       \n",
       " [5] hise_2.16.0        H5weaver_1.2.0     rhdf5_2.46.1       Matrix_1.6-4      \n",
       " [9] data.table_1.14.10 dplyr_1.1.4       \n",
       "\n",
       "loaded via a namespace (and not attached):\n",
       "  [1] RColorBrewer_1.1-3     jsonlite_1.8.8         magrittr_2.0.3        \n",
       "  [4] spatstat.utils_3.0-4   vctrs_0.6.5            ROCR_1.0-11           \n",
       "  [7] spatstat.explore_3.2-5 RCurl_1.98-1.14        base64enc_0.1-3       \n",
       " [10] htmltools_0.5.7        curl_5.1.0             Rhdf5lib_1.24.1       \n",
       " [13] sctransform_0.4.1      parallelly_1.36.0      KernSmooth_2.23-22    \n",
       " [16] htmlwidgets_1.6.4      ica_1.0-3              plyr_1.8.9            \n",
       " [19] plotly_4.10.3          zoo_1.8-12             uuid_1.2-0            \n",
       " [22] igraph_1.6.0           mime_0.12              lifecycle_1.0.4       \n",
       " [25] pkgconfig_2.0.3        R6_2.5.1               fastmap_1.1.1         \n",
       " [28] fitdistrplus_1.1-11    future_1.33.1          shiny_1.8.0           \n",
       " [31] digest_0.6.34          colorspace_2.1-0       patchwork_1.1.3       \n",
       " [34] tensor_1.5             RSpectra_0.16-1        irlba_2.3.5.1         \n",
       " [37] progressr_0.14.0       fansi_1.0.6            spatstat.sparse_3.0-3 \n",
       " [40] httr_1.4.7             polyclip_1.10-6        abind_1.4-5           \n",
       " [43] compiler_4.3.2         withr_3.0.0            fastDummies_1.7.3     \n",
       " [46] MASS_7.3-60            tools_4.3.2            lmtest_0.9-40         \n",
       " [49] httpuv_1.6.13          future.apply_1.11.1    goftest_1.2-3         \n",
       " [52] glue_1.7.0             nlme_3.1-164           rhdf5filters_1.14.1   \n",
       " [55] promises_1.2.1         grid_4.3.2             pbdZMQ_0.3-10         \n",
       " [58] Rtsne_0.17             cluster_2.1.6          reshape2_1.4.4        \n",
       " [61] generics_0.1.3         gtable_0.3.4           spatstat.data_3.0-3   \n",
       " [64] tidyr_1.3.0            utf8_1.2.4             spatstat.geom_3.2-7   \n",
       " [67] RcppAnnoy_0.0.21       ggrepel_0.9.4          RANN_2.6.1            \n",
       " [70] pillar_1.9.0           stringr_1.5.1          spam_2.10-0           \n",
       " [73] IRdisplay_1.1          RcppHNSW_0.5.0         later_1.3.2           \n",
       " [76] splines_4.3.2          lattice_0.22-5         survival_3.5-7        \n",
       " [79] deldir_2.0-2           tidyselect_1.2.0       miniUI_0.1.1.1        \n",
       " [82] pbapply_1.7-2          gridExtra_2.3          scattermore_1.2       \n",
       " [85] matrixStats_1.2.0      stringi_1.8.3          lazyeval_0.2.2        \n",
       " [88] evaluate_0.23          codetools_0.2-19       tibble_3.2.1          \n",
       " [91] cli_3.6.2              uwot_0.1.16            IRkernel_1.3.2        \n",
       " [94] xtable_1.8-4           reticulate_1.34.0      repr_1.1.6.9000       \n",
       " [97] munsell_0.5.0          Rcpp_1.0.12            globals_0.16.2        \n",
       "[100] spatstat.random_3.2-2  png_0.1-8              ellipsis_0.3.2        \n",
       "[103] ggplot2_3.4.4          assertthat_0.2.1       dotCall64_1.1-1       \n",
       "[106] bitops_1.0-7           listenv_0.9.0          viridisLite_0.4.2     \n",
       "[109] scales_1.3.0           ggridges_0.5.5         leiden_0.4.3.1        \n",
       "[112] crayon_1.5.2           rlang_1.1.3            cowplot_1.1.2         "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sessionInfo()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R 4",
   "language": "R",
   "name": "ir4"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.3.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}