{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7b02e85c-bd28-4877-accd-a762bc2782a3",
   "metadata": {},
   "source": [
    "# Differential gene expression with DESeq2\n",
    "\n",
    "In this notebook, we'll perform differential gene expression tests using the DESeq2 package on our aggregated pseudobulk counts for each subject and cell type."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2035210-db59-4072-8be7-6863bcd0a6f4",
   "metadata": {},
   "source": [
    "## Load packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "3a9e9a35-04db-401c-82c0-d9303324d465",
   "metadata": {},
   "outputs": [],
   "source": [
    "quiet_library <- function(...) {suppressPackageStartupMessages(library(...))}\n",
    "quiet_library(DESeq2)\n",
    "quiet_library(dplyr)\n",
    "quiet_library(hise)\n",
    "quiet_library(purrr)\n",
    "quiet_library(furrr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "1575c033-7482-4265-ad90-d25f6739f1ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "plan(multicore, workers = 12)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "67f757e3-cd47-4c17-9ecd-dfcee6032282",
   "metadata": {},
   "outputs": [],
   "source": [
    "if(!dir.exists(\"output\")) {\n",
    "    dir.create(\"output\")\n",
    "}\n",
    "in_files <- c()\n",
    "out_files <- c()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87bbfdc6-5429-45ef-8f4e-4e46f6e8245c",
   "metadata": {},
   "source": [
    "## Helper functions\n",
    "\n",
    "This function prepares data for analysis by setting factors, converting the matrix of data to a non-sparse, integer matrix, and assembling a DESeqDataSet with a given factor design."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "e13fbf3a-0855-4ad4-bb89-de2592ad7504",
   "metadata": {},
   "outputs": [],
   "source": [
    "prepare_deseq_data <- function(mat, meta, design) {\n",
    "    meta$cohort.cohortGuid <- factor(meta$cohort.cohortGuid, levels = c(\"BR1\", \"BR2\"))\n",
    "    meta$subject.biologicalSex <- factor(meta$subject.biologicalSex, levels = c(\"Female\", \"Male\"))\n",
    "    meta$subject.cmv <- factor(meta$subject.cmv, levels = c(\"Negative\", \"Positive\"))\n",
    "    \n",
    "    mat <- mat[,meta$barcodes]\n",
    "    \n",
    "    mat <- as.matrix(mat)\n",
    "    mode(mat) <- \"integer\"\n",
    "    \n",
    "    DESeqDataSetFromMatrix(\n",
    "        mat, colData = meta,\n",
    "        design = design\n",
    "    )\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "be6112e2-a08e-489d-baf0-263caa9b12bc",
   "metadata": {},
   "source": [
    "This function retrieves results from a DESeqDataSet after modeling and builds a data.frame with these results, cell types, and data grouping metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "80704c08-4eb2-4743-979e-d597c20e8ae8",
   "metadata": {},
   "outputs": [],
   "source": [
    "format_deseq_results <- function(dds, cell_type, group_by, fg, bg) {\n",
    "        meta <- as.data.frame(colData(dds))\n",
    "        \n",
    "        res <- results(dds, contrast = c(group_by, fg, bg))\n",
    "        res <- data.frame(res)\n",
    "\n",
    "        \n",
    "        res <- res %>%\n",
    "          arrange(padj) %>%\n",
    "          mutate(direction = ifelse(log2FoldChange > 0,\n",
    "                                    paste0(\"HigherIn\", fg), \n",
    "                                    paste0(\"HigherIn\", bg))) %>%\n",
    "          mutate(gene = rownames(.),\n",
    "                 cell_type = cell_type,\n",
    "                 fg = fg, n_fg = sum(res[[group_by]] == fg),\n",
    "                 bg = bg, n_bg = sum(res[[group_by]] == bg)) %>%\n",
    "          select(cell_type, fg, n_fg, bg, n_bg, gene, log2FoldChange, padj, direction,\n",
    "                 baseMean, lfcSE, stat, pvalue)\n",
    "\n",
    "        res\n",
    "        \n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45ba9da8-44dd-4c88-8e51-496f7fbb0838",
   "metadata": {},
   "source": [
    "## Single-factor comparisons\n",
    "\n",
    "Here, we'll identify differences in comparisons between cohorts, sexes, and CMV status across all subjects while controlling for the other two factors."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f97031f1-a279-4dc5-988e-1c1edd567ee8",
   "metadata": {},
   "source": [
    "### Retrieve pseudobulk data from HISE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "047a7ec6-559f-4dc1-a972-d62ebf1ebdc8",
   "metadata": {},
   "outputs": [],
   "source": [
    "all_uuid <- \"edf436fb-7063-4655-9055-6ad431fe7159\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "66d08b3c-368d-416a-9be2-b3951042e135",
   "metadata": {},
   "outputs": [],
   "source": [
    "res <- cacheFiles(list(all_uuid))\n",
    "pb_file <- list.files(paste0(\"cache/\", all_uuid), pattern = \".rds\", full.names = TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "4aae0400-eeba-4d3a-a66b-c15bd618bdc1",
   "metadata": {},
   "outputs": [],
   "source": [
    "pb_data <- readRDS(pb_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "f287942d-c643-4a8f-a96f-b25766f31302",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_files = c(in_files, all_uuid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "012de32f-652a-454f-ac02-3cf7aa95abda",
   "metadata": {},
   "source": [
    "### YA vs OA; control for sex and CMV status\n",
    "\n",
    "To test for cohort differences, we'll specify the design of the differential expression tests using a function:  \n",
    "`design = ~ cohort.cohortGuid + subject.biologicalSex + subject.cmv`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "f7b1b52d-44af-4acc-aa9a-19d019771159",
   "metadata": {},
   "outputs": [],
   "source": [
    "dds_data <- map2(\n",
    "    pb_data$mats, pb_data$meta,\n",
    "    prepare_deseq_data,\n",
    "    design = ~ cohort.cohortGuid + subject.biologicalSex + subject.cmv\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ac0a774-3c44-40d2-8d7d-6e4ecef4b2ae",
   "metadata": {},
   "source": [
    "#### Run DESeq2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "135ecccc-e79b-4e60-aee2-f0415fa0bdf2",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "deseq_res <- future_map(\n",
    "    dds_data,\n",
    "    DESeq,\n",
    "    parallel = FALSE, # parallelization handled by future_map\n",
    "    quiet = TRUE,\n",
    "    .options = furrr_options(seed = 3030L)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "847d12a9-b643-4275-b2ed-cf289f1ee266",
   "metadata": {},
   "source": [
    "#### Structure results per cell type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "56fab2c6-58f0-49c2-a321-91b4effe6cea",
   "metadata": {},
   "outputs": [],
   "source": [
    "cohort_res <- map2_dfr(\n",
    "    deseq_res, names(deseq_res),\n",
    "    format_deseq_results,\n",
    "    group_by = \"cohort.cohortGuid\",\n",
    "    fg = \"BR1\", bg = \"BR2\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f25e7b6-00ee-48b8-aa4c-6b590b7e0086",
   "metadata": {},
   "source": [
    "#### Save results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "f534a415-47e9-4880-a27d-14ebedd401c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_group <- \"cohort\"\n",
    "out_csv <- paste0(\n",
    "    \"output/diha_pbmc_\", out_group, \"_pseudobulk_deseq2_\", Sys.Date(), \".csv\"\n",
    ")\n",
    "write.csv(\n",
    "    cohort_res, out_csv,\n",
    "    quote = FALSE, row.names = FALSE\n",
    ")\n",
    "out_files <- c(out_files, out_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc102c63-f8e8-4f21-9338-79b5c068e4eb",
   "metadata": {},
   "source": [
    "### CMV-Negative vs CMV-Positive; control for Cohort and sex\n",
    "\n",
    "To test for CMV differences, we'll specify the design of the differential expression tests using a function:  \n",
    "`design = ~ subject.cmv + cohort.cohortGuid + subject.biologicalSex`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "1adacaa5-389d-4dc2-adba-ffe38267f428",
   "metadata": {},
   "outputs": [],
   "source": [
    "dds_data <- map2(\n",
    "    pb_data$mats, pb_data$meta,\n",
    "    prepare_deseq_data,\n",
    "    design = ~ subject.cmv + cohort.cohortGuid + subject.biologicalSex\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61f8da60-abd9-4b89-8e81-da7e76ea06af",
   "metadata": {},
   "source": [
    "#### Run DESeq2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "cf767797-335e-4944-ad4d-27adc12f4410",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "deseq_res <- future_map(\n",
    "    dds_data,\n",
    "    DESeq,\n",
    "    parallel = FALSE, # parallelization handled by future_map\n",
    "    quiet = TRUE,\n",
    "    .options = furrr_options(seed = 3030L)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc50799b-8a8b-46c0-9a54-72bd05cb3477",
   "metadata": {},
   "source": [
    "#### Structure results per cell type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "c752d6b4-e599-4c23-8793-fcc1eee9c4e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "cmv_res <- map2_dfr(\n",
    "    deseq_res, names(deseq_res),\n",
    "    format_deseq_results,\n",
    "    group_by = \"subject.cmv\",\n",
    "    fg = \"Positive\", bg = \"Negative\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27353483-fffd-4259-8779-17769ee25099",
   "metadata": {},
   "source": [
    "#### Save results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "628eaf3c-66a9-49fc-827e-084801ca648d",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_group <- \"cmv\"\n",
    "out_csv <- paste0(\n",
    "    \"output/diha_pbmc_\", out_group, \"_pseudobulk_deseq2_\", Sys.Date(), \".csv\"\n",
    ")\n",
    "write.csv(\n",
    "    cmv_res, out_csv,\n",
    "    quote = FALSE, row.names = FALSE\n",
    ")\n",
    "out_files <- c(out_files, out_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "599a95ef-7df3-4e77-95db-ddd5aa9f2cf2",
   "metadata": {},
   "source": [
    "### Female vs Male; control for Cohort and CMV status\n",
    "\n",
    "To test for CMV differences, we'll specify the design of the differential expression tests using a function:  \n",
    "`design = ~ subject.biologicalSex + cohort.cohortGuid + subject.cmv`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "8dbc65f3-a5d2-489e-a589-03b25d311221",
   "metadata": {},
   "outputs": [],
   "source": [
    "dds_data <- map2(\n",
    "    pb_data$mats, pb_data$meta,\n",
    "    prepare_deseq_data,\n",
    "    design = ~ subject.biologicalSex + cohort.cohortGuid + subject.cmv\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d24ce06e-a9b6-43a4-8c8d-8ba78cd2e4ff",
   "metadata": {},
   "source": [
    "#### Run DESeq2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "cf79a532-5ebb-43b5-9c97-de542ca85a66",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "deseq_res <- future_map(\n",
    "    dds_data,\n",
    "    DESeq,\n",
    "    parallel = FALSE, # parallelization handled by future_map\n",
    "    quiet = TRUE,\n",
    "    .options = furrr_options(seed = 3030L)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64728c6a-ee05-423e-a22f-a64eda608b6a",
   "metadata": {},
   "source": [
    "#### Structure results per cell type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "ed07407f-816d-4091-b8e8-452f3b2f753c",
   "metadata": {},
   "outputs": [],
   "source": [
    "sex_res <- map2_dfr(\n",
    "    deseq_res, names(deseq_res),\n",
    "    format_deseq_results,\n",
    "    group_by = \"subject.biologicalSex\",\n",
    "    fg = \"Female\", bg = \"Male\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fa1eb59-59ac-413b-8587-70f799357d23",
   "metadata": {},
   "source": [
    "#### Save results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "fbc3d084-a65f-4afc-a12a-00117d2466ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_group <- \"sex\"\n",
    "out_csv <- paste0(\n",
    "    \"output/diha_pbmc_\", out_group, \"_pseudobulk_deseq2_\", Sys.Date(), \".csv\"\n",
    ")\n",
    "write.csv(\n",
    "    sex_res, out_csv,\n",
    "    quote = FALSE, row.names = FALSE\n",
    ")\n",
    "out_files <- c(out_files, out_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91591bf5-0c95-44f5-bf5e-ab80b3958eb7",
   "metadata": {},
   "source": [
    "## Within-factor comparisons\n",
    "\n",
    "Here, we'll identify differences between the cohorts within sex and CMV status.\n",
    "\n",
    "To ensure gene selection was based on the data available within these groups, we previously generated pseudobulk datasets for the BR1 and BR2 cohorts, for CMV+ and CMV- subjects, and for Female and Male subjects separately.\n",
    "\n",
    "Here, we're most interested in the effect of age - that is, comparisons between BR1 and BR2 - within the other two factors, CMV+ and CMV- or Male and Female.\n",
    "\n",
    "We could perform every possible subsetting, but for now we'll stick with these 4 additional comparisons - BR1 vs BR2 within CMV+, within CMV-, within Female, and within Male subjects."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1fa6353-ecd6-48c3-b72d-b58ec7ecfcb8",
   "metadata": {},
   "source": [
    "### Cohort within CMV positive"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d592c47d-2112-4682-9c04-72c98880f1c8",
   "metadata": {},
   "source": [
    "#### Retrieve pseudobulk data from HISE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "d4c2f64b-b02a-47da-b15c-e7ebe4083104",
   "metadata": {},
   "outputs": [],
   "source": [
    "cmv_pos_uuid <- \"c8e65e37-d11d-4baf-b450-6eda124ef102\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "2fe8efde-dea6-4f9f-9461-ce3d0004ff7a",
   "metadata": {},
   "outputs": [],
   "source": [
    "res <- cacheFiles(list(cmv_pos_uuid))\n",
    "pb_file <- list.files(paste0(\"cache/\", cmv_pos_uuid), pattern = \".rds\", full.names = TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "b67016b2-c84a-4112-a0dc-44e918f87285",
   "metadata": {},
   "outputs": [],
   "source": [
    "pb_data <- readRDS(pb_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "c530ecdc-c4c8-4a59-9479-1031c4d17f7e",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_files = c(in_files, cmv_pos_uuid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1102961e-f3a1-430a-9bce-99b78f0906b2",
   "metadata": {},
   "source": [
    "#### Prep data for analysis\n",
    "\n",
    "Here, we're comparing cohort and controlling for sex within our CMV-positive subjects, so our design formula is:\n",
    "\n",
    "`~ cohort.cohortGuid + subject.biologicalSex`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "876d0c45-616b-4370-93f3-4a0d78887cf6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\n",
       "Positive \n",
       "      37 "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "table(pb_data$meta[[2]]$subject.cmv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "b3a06cb3-aad4-4cf1-ad0f-8ac71926321b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "37"
      ],
      "text/latex": [
       "37"
      ],
      "text/markdown": [
       "37"
      ],
      "text/plain": [
       "[1] 37"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "ncol(pb_data$mats[[2]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "bbfaa086-0ff0-4f2a-a4ed-9ea84d1e092d",
   "metadata": {},
   "outputs": [],
   "source": [
    "dds_data <- map2(\n",
    "    pb_data$mats, pb_data$meta,\n",
    "    prepare_deseq_data,\n",
    "    design = ~ cohort.cohortGuid + subject.biologicalSex\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb9c98f2-59ca-4513-a794-3f9c8c4dff4a",
   "metadata": {},
   "source": [
    "#### Run DESeq2 and extract results from tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "dfa8d6a2-494a-4a98-8047-fa561379509f",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "deseq_res <- future_map(\n",
    "    dds_data,\n",
    "    DESeq,\n",
    "    parallel = FALSE, # parallelization handled by future_map\n",
    "    quiet = TRUE,\n",
    "    .options = furrr_options(seed = 3030L)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd58de98-688a-44c8-af5c-4bd9c0be0780",
   "metadata": {},
   "source": [
    "#### Structure results per cell type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "174e8a9f-121e-4096-b6b6-36ddadec61ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "cmvpos_cohort_res <- map2_dfr(\n",
    "    deseq_res, names(deseq_res),\n",
    "    format_deseq_results,\n",
    "    group_by = \"cohort.cohortGuid\",\n",
    "    fg = \"BR1\", bg = \"BR2\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5300a1d-a9f8-4255-93dd-229dd863e7e1",
   "metadata": {},
   "source": [
    "#### Save results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "8483461f-8346-4ad7-a738-54689f76139a",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_group <- \"cmvpos_cohort\"\n",
    "out_csv <- paste0(\n",
    "    \"output/diha_pbmc_\", out_group, \"_pseudobulk_deseq2_\", Sys.Date(), \".csv\"\n",
    ")\n",
    "write.csv(\n",
    "    cmvpos_cohort_res, out_csv,\n",
    "    quote = FALSE, row.names = FALSE\n",
    ")\n",
    "out_files <- c(out_files, out_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a4926f8-3bdf-470f-af6e-3358a2fc0ab5",
   "metadata": {},
   "source": [
    "### Cohort within CMV negative"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64800b03-5def-44c0-8b75-dee4d8e58d73",
   "metadata": {},
   "source": [
    "#### Retrieve pseudobulk data from HISE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "f51fd6b9-e65c-4d17-a8a7-1e13a6da4bbc",
   "metadata": {},
   "outputs": [],
   "source": [
    "cmv_neg_uuid <- \"42de2197-4ce6-400e-a0e7-212f2b25e896\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "c8a6344c-c0aa-4a88-98b0-729cf34ae87f",
   "metadata": {},
   "outputs": [],
   "source": [
    "res <- cacheFiles(list(cmv_neg_uuid))\n",
    "pb_file <- list.files(paste0(\"cache/\", cmv_neg_uuid), pattern = \".rds\", full.names = TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "2eb3387f-1b5b-4713-957b-e20868a14188",
   "metadata": {},
   "outputs": [],
   "source": [
    "pb_data <- readRDS(pb_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "c4b88f9c-45c4-4007-8a2d-10f76b408789",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_files <- c(in_files, cmv_neg_uuid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "debda7df-d834-4d2e-9fa1-5c5a4ec1b7af",
   "metadata": {},
   "source": [
    "#### Prep data for analysis\n",
    "\n",
    "Here, we're comparing cohort and controlling for sex within our CMV-negative subjects, so our design formula is:\n",
    "\n",
    "`~ cohort.cohortGuid + subject.biologicalSex`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "adc79bda-f4c9-4eed-a2fd-f70c3a999998",
   "metadata": {},
   "outputs": [],
   "source": [
    "dds_data <- map2(\n",
    "    pb_data$mats, pb_data$meta,\n",
    "    prepare_deseq_data,\n",
    "    design = ~ cohort.cohortGuid + subject.biologicalSex\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9042f6cf-4f1d-4445-9d58-655408ae35d1",
   "metadata": {},
   "source": [
    "#### Run DESeq2 and extract results from tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "ec51e0c3-135a-4e4d-9083-f945c0296a39",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "deseq_res <- future_map(\n",
    "    dds_data,\n",
    "    DESeq,\n",
    "    parallel = FALSE, # parallelization handled by future_map\n",
    "    quiet = TRUE,\n",
    "    .options = furrr_options(seed = 3030L)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce833e6b-bbfc-4a01-87dd-3bd70695cef7",
   "metadata": {},
   "source": [
    "#### Structure results per cell type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "a725240f-4923-4386-ae30-45805e216217",
   "metadata": {},
   "outputs": [],
   "source": [
    "cmvneg_cohort_res <- map2_dfr(\n",
    "    deseq_res, names(deseq_res),\n",
    "    format_deseq_results,\n",
    "    group_by = \"cohort.cohortGuid\",\n",
    "    fg = \"BR1\", bg = \"BR2\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc45d5e3-f772-4e95-9501-35c55c8b8055",
   "metadata": {},
   "source": [
    "#### Save results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "3651c5af-7754-4f2d-9e1a-b7862a71ec57",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_group <- \"cmvneg_cohort\"\n",
    "out_csv <- paste0(\n",
    "    \"output/diha_pbmc_\", out_group, \"_pseudobulk_deseq2_\", Sys.Date(), \".csv\"\n",
    ")\n",
    "write.csv(\n",
    "    cmvneg_cohort_res, out_csv,\n",
    "    quote = FALSE, row.names = FALSE\n",
    ")\n",
    "out_files <- c(out_files, out_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5903d984-bf8b-410b-949b-08093a10331b",
   "metadata": {},
   "source": [
    "### Cohort within Female subjects"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc12fe13-ef5d-4a9e-bbc1-ca2d0fb42712",
   "metadata": {},
   "source": [
    "#### Retrieve pseudobulk data from HISE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "1a0505cc-c716-461b-9b03-f756abb81cd7",
   "metadata": {},
   "outputs": [],
   "source": [
    "female_uuid <- \"56f09312-eb3f-40b7-b840-368d1ddbc22b\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "d204ebe2-be27-4977-9aa3-fc86a2817ac6",
   "metadata": {},
   "outputs": [],
   "source": [
    "res <- cacheFiles(list(female_uuid))\n",
    "pb_file <- list.files(paste0(\"cache/\", female_uuid), pattern = \".rds\", full.names = TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "b288f57e-6771-4394-acfc-ba7f12ff1bf8",
   "metadata": {},
   "outputs": [],
   "source": [
    "pb_data <- readRDS(pb_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "aeb275de-1fe4-418b-a2f5-d9cb9a2564d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_files = c(in_files, female_uuid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dc797c5-2f61-4736-be3f-89ef493bd729",
   "metadata": {},
   "source": [
    "#### Prep data for analysis\n",
    "\n",
    "Here, we're comparing cohort and controlling for CMV status within our Female subjects, so our design formula is:\n",
    "\n",
    "`~ cohort.cohortGuid + subject.cmv`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "502b1f90-375c-4b04-a38e-be70b9dc6f88",
   "metadata": {},
   "outputs": [],
   "source": [
    "dds_data <- map2(\n",
    "    pb_data$mats, pb_data$meta,\n",
    "    prepare_deseq_data,\n",
    "    design = ~ cohort.cohortGuid + subject.cmv\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7de13df-67ca-4f4f-bcfb-4db930aafb7a",
   "metadata": {},
   "source": [
    "#### Run DESeq2 and extract results from tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "69454de9-d776-479e-a6fb-f52bddec9980",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "deseq_res <- future_map(\n",
    "    dds_data,\n",
    "    DESeq,\n",
    "    parallel = FALSE, # parallelization handled by future_map\n",
    "    quiet = TRUE,\n",
    "    .options = furrr_options(seed = 3030L)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69ad0958-f7a5-4674-882e-4d68253a75a4",
   "metadata": {},
   "source": [
    "#### Structure results per cell type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "b68a3235-b9ed-474e-8e10-a14ec52e3185",
   "metadata": {},
   "outputs": [],
   "source": [
    "female_cohort_res <- map2_dfr(\n",
    "    deseq_res, names(deseq_res),\n",
    "    format_deseq_results,\n",
    "    group_by = \"cohort.cohortGuid\",\n",
    "    fg = \"BR1\", bg = \"BR2\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4cd08c0-835f-4f56-9682-7db53cf6a12a",
   "metadata": {},
   "source": [
    "#### Save results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "id": "599f7ff6-1c05-4acb-8994-b7ba0eb38d64",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_group <- \"female_cohort\"\n",
    "out_csv <- paste0(\n",
    "    \"output/diha_pbmc_\", out_group, \"_pseudobulk_deseq2_\", Sys.Date(), \".csv\"\n",
    ")\n",
    "write.csv(\n",
    "    female_cohort_res, out_csv,\n",
    "    quote = FALSE, row.names = FALSE\n",
    ")\n",
    "out_files <- c(out_files, out_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "351d5384-7048-4bdd-ae84-e4ba2523407c",
   "metadata": {},
   "source": [
    "### Cohort within Male subjects"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6ccdb3d-7684-4c34-aa99-324951c22824",
   "metadata": {},
   "source": [
    "#### Retrieve pseudobulk data from HISE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "id": "ae4236fb-aff0-48e8-81de-77d287e35710",
   "metadata": {},
   "outputs": [],
   "source": [
    "male_uuid <- \"4bf2bab1-b3c8-456b-aa90-566cd7b03577\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "47aca01c-56cd-4d08-b760-c9b993a027fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "res <- cacheFiles(list(male_uuid))\n",
    "pb_file <- list.files(paste0(\"cache/\", male_uuid), pattern = \".rds\", full.names = TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "1999e2db-70af-481a-a26a-83d7038a172d",
   "metadata": {},
   "outputs": [],
   "source": [
    "pb_data <- readRDS(pb_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "541d6ab4-ebe7-4e6f-9da7-5182b2a0bd83",
   "metadata": {},
   "outputs": [],
   "source": [
    "in_files <- c(in_files, male_uuid)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b026d6d-2923-4fae-868c-a4917898cea0",
   "metadata": {},
   "source": [
    "#### Prep data for analysis\n",
    "\n",
    "Here, we're comparing cohort and controlling for CMV status within our Female subjects, so our design formula is:\n",
    "\n",
    "`~ cohort.cohortGuid + subject.cmv`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "3f0514b9-b55b-41c5-97c7-608cda2c561c",
   "metadata": {},
   "outputs": [],
   "source": [
    "dds_data <- map2(\n",
    "    pb_data$mats, pb_data$meta,\n",
    "    prepare_deseq_data,\n",
    "    design = ~ cohort.cohortGuid + subject.cmv\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c5439a0-0d67-4eb3-99dc-008c11bdd5fd",
   "metadata": {},
   "source": [
    "#### Run DESeq2 and extract results from tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "a79be214-de23-4c04-9713-c55a1a746567",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n",
      "-- note: fitType='parametric', but the dispersion trend was not well captured by the\n",
      "   function: y = a/x + b, and a local regression fit was automatically substituted.\n",
      "   specify fitType='local' or 'mean' to avoid this message next time.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "deseq_res <- future_map(\n",
    "    dds_data,\n",
    "    DESeq,\n",
    "    parallel = FALSE, # parallelization handled by future_map\n",
    "    quiet = TRUE,\n",
    "    .options = furrr_options(seed = 3030L)\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d787f62e-c101-4545-a5d3-66b50d39470e",
   "metadata": {},
   "source": [
    "#### Structure results per cell type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "ff76fb04-0556-4642-a687-e1c60f1f0834",
   "metadata": {},
   "outputs": [],
   "source": [
    "male_cohort_res <- map2_dfr(\n",
    "    deseq_res, names(deseq_res),\n",
    "    format_deseq_results,\n",
    "    group_by = \"cohort.cohortGuid\",\n",
    "    fg = \"BR1\", bg = \"BR2\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1251ce0-4be7-4fd7-9375-04b70917c16f",
   "metadata": {},
   "source": [
    "#### Save results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "47ac0e5b-07c8-4a7d-baad-b55b6925c43c",
   "metadata": {},
   "outputs": [],
   "source": [
    "out_group <- \"male_cohort\"\n",
    "out_csv <- paste0(\n",
    "    \"output/diha_pbmc_\", out_group, \"_pseudobulk_deseq2_\", Sys.Date(), \".csv\"\n",
    ")\n",
    "write.csv(\n",
    "    male_cohort_res, out_csv,\n",
    "    quote = FALSE, row.names = FALSE\n",
    ")\n",
    "out_files <- c(out_files, out_csv)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a6040ab-0688-4384-99d9-c2937551d33b",
   "metadata": {},
   "source": [
    "## Upload results to HISE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "c3315b33-36c1-4adf-9afc-0f159e5faf62",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "study_space_uuid <- \"64097865-486d-43b3-8f94-74994e0a72e0\"\n",
    "title <- paste(\"PBMC Ref. Seurat Pseudobulk DESeq2 Res.\", Sys.Date())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "d00f886c-12fe-4b20-a2ea-c38166d09f62",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<ol>\n",
       "\t<li>'edf436fb-7063-4655-9055-6ad431fe7159'</li>\n",
       "\t<li>'c8e65e37-d11d-4baf-b450-6eda124ef102'</li>\n",
       "\t<li>'42de2197-4ce6-400e-a0e7-212f2b25e896'</li>\n",
       "\t<li>'56f09312-eb3f-40b7-b840-368d1ddbc22b'</li>\n",
       "\t<li>'4bf2bab1-b3c8-456b-aa90-566cd7b03577'</li>\n",
       "</ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate}\n",
       "\\item 'edf436fb-7063-4655-9055-6ad431fe7159'\n",
       "\\item 'c8e65e37-d11d-4baf-b450-6eda124ef102'\n",
       "\\item '42de2197-4ce6-400e-a0e7-212f2b25e896'\n",
       "\\item '56f09312-eb3f-40b7-b840-368d1ddbc22b'\n",
       "\\item '4bf2bab1-b3c8-456b-aa90-566cd7b03577'\n",
       "\\end{enumerate}\n"
      ],
      "text/markdown": [
       "1. 'edf436fb-7063-4655-9055-6ad431fe7159'\n",
       "2. 'c8e65e37-d11d-4baf-b450-6eda124ef102'\n",
       "3. '42de2197-4ce6-400e-a0e7-212f2b25e896'\n",
       "4. '56f09312-eb3f-40b7-b840-368d1ddbc22b'\n",
       "5. '4bf2bab1-b3c8-456b-aa90-566cd7b03577'\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[[1]]\n",
       "[1] \"edf436fb-7063-4655-9055-6ad431fe7159\"\n",
       "\n",
       "[[2]]\n",
       "[1] \"c8e65e37-d11d-4baf-b450-6eda124ef102\"\n",
       "\n",
       "[[3]]\n",
       "[1] \"42de2197-4ce6-400e-a0e7-212f2b25e896\"\n",
       "\n",
       "[[4]]\n",
       "[1] \"56f09312-eb3f-40b7-b840-368d1ddbc22b\"\n",
       "\n",
       "[[5]]\n",
       "[1] \"4bf2bab1-b3c8-456b-aa90-566cd7b03577\"\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "in_list <- as.list(in_files)\n",
    "in_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "77a9530e-6cc1-4d9e-98fa-5ff9cc73b7e1",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<ol>\n",
       "\t<li>'output/diha_pbmc_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_cmv_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_sex_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_cmvpos_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_cmvneg_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_female_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_male_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "</ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate}\n",
       "\\item 'output/diha\\_pbmc\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_cmv\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_sex\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_cmvpos\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_cmvneg\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_female\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_male\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\end{enumerate}\n"
      ],
      "text/markdown": [
       "1. 'output/diha_pbmc_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "2. 'output/diha_pbmc_cmv_pseudobulk_deseq2_2024-04-05.csv'\n",
       "3. 'output/diha_pbmc_sex_pseudobulk_deseq2_2024-04-05.csv'\n",
       "4. 'output/diha_pbmc_cmvpos_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "5. 'output/diha_pbmc_cmvneg_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "6. 'output/diha_pbmc_female_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "7. 'output/diha_pbmc_male_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[[1]]\n",
       "[1] \"output/diha_pbmc_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "[[2]]\n",
       "[1] \"output/diha_pbmc_cmv_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "[[3]]\n",
       "[1] \"output/diha_pbmc_sex_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "[[4]]\n",
       "[1] \"output/diha_pbmc_cmvpos_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "[[5]]\n",
       "[1] \"output/diha_pbmc_cmvneg_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "[[6]]\n",
       "[1] \"output/diha_pbmc_female_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "[[7]]\n",
       "[1] \"output/diha_pbmc_male_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "out_list <- as.list(out_files)\n",
    "out_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "8e3cd744-9ed0-424f-86ff-14d2434d36d2",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1] \"Cannot determine the current notebook.\"\n",
      "[1] \"1) /home/jupyter/scRNA-Reference-IH-A/08-Differential_expression/36-R_pseudobulk_deseq2.ipynb\"\n",
      "[1] \"2) /home/jupyter/scRNA-Reference-IH-A/08-Differential_expression/36a-Python_pseudobulk_deseq2_vis_format.ipynb\"\n",
      "[1] \"3) /home/jupyter/scRNA-Reference-IH-A/07-Pseudobulk/35-R_pseudobulk_seurat.ipynb\"\n"
     ]
    },
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Please select (1-3)  1\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<dl>\n",
       "\t<dt>$files</dt>\n",
       "\t\t<dd><ol>\n",
       "\t<li>'output/diha_pbmc_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_cmv_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_sex_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_cmvpos_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_cmvneg_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_female_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "\t<li>'output/diha_pbmc_male_cohort_pseudobulk_deseq2_2024-04-05.csv'</li>\n",
       "</ol>\n",
       "</dd>\n",
       "\t<dt>$traceId</dt>\n",
       "\t\t<dd>'4f461fd6-3cff-468a-88e5-c1621734eaba'</dd>\n",
       "</dl>\n"
      ],
      "text/latex": [
       "\\begin{description}\n",
       "\\item[\\$files] \\begin{enumerate}\n",
       "\\item 'output/diha\\_pbmc\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_cmv\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_sex\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_cmvpos\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_cmvneg\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_female\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\item 'output/diha\\_pbmc\\_male\\_cohort\\_pseudobulk\\_deseq2\\_2024-04-05.csv'\n",
       "\\end{enumerate}\n",
       "\n",
       "\\item[\\$traceId] '4f461fd6-3cff-468a-88e5-c1621734eaba'\n",
       "\\end{description}\n"
      ],
      "text/markdown": [
       "$files\n",
       ":   1. 'output/diha_pbmc_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "2. 'output/diha_pbmc_cmv_pseudobulk_deseq2_2024-04-05.csv'\n",
       "3. 'output/diha_pbmc_sex_pseudobulk_deseq2_2024-04-05.csv'\n",
       "4. 'output/diha_pbmc_cmvpos_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "5. 'output/diha_pbmc_cmvneg_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "6. 'output/diha_pbmc_female_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "7. 'output/diha_pbmc_male_cohort_pseudobulk_deseq2_2024-04-05.csv'\n",
       "\n",
       "\n",
       "\n",
       "$traceId\n",
       ":   '4f461fd6-3cff-468a-88e5-c1621734eaba'\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "$files\n",
       "$files[[1]]\n",
       "[1] \"output/diha_pbmc_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "$files[[2]]\n",
       "[1] \"output/diha_pbmc_cmv_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "$files[[3]]\n",
       "[1] \"output/diha_pbmc_sex_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "$files[[4]]\n",
       "[1] \"output/diha_pbmc_cmvpos_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "$files[[5]]\n",
       "[1] \"output/diha_pbmc_cmvneg_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "$files[[6]]\n",
       "[1] \"output/diha_pbmc_female_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "$files[[7]]\n",
       "[1] \"output/diha_pbmc_male_cohort_pseudobulk_deseq2_2024-04-05.csv\"\n",
       "\n",
       "\n",
       "$traceId\n",
       "[1] \"4f461fd6-3cff-468a-88e5-c1621734eaba\"\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "uploadFiles(\n",
    "    files = out_list,\n",
    "    studySpaceId = study_space_uuid,\n",
    "    title = title,\n",
    "    inputFileIds = in_list,\n",
    "    store = \"project\",\n",
    "    destination = \"pseudobulk_deseq2\",\n",
    "    doPrompt = FALSE\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "id": "038ec4b8-585f-4f5b-92ee-69ab4fa6b5b9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "R version 4.3.2 (2023-10-31)\n",
       "Platform: x86_64-conda-linux-gnu (64-bit)\n",
       "Running under: Ubuntu 20.04.6 LTS\n",
       "\n",
       "Matrix products: default\n",
       "BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.25.so;  LAPACK version 3.11.0\n",
       "\n",
       "Random number generation:\n",
       " RNG:     L'Ecuyer-CMRG \n",
       " Normal:  Inversion \n",
       " Sample:  Rejection \n",
       " \n",
       "locale:\n",
       " [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       \n",
       " [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   \n",
       " [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          \n",
       "[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   \n",
       "\n",
       "time zone: Etc/UTC\n",
       "tzcode source: system (glibc)\n",
       "\n",
       "attached base packages:\n",
       "[1] stats4    stats     graphics  grDevices utils     datasets  methods  \n",
       "[8] base     \n",
       "\n",
       "other attached packages:\n",
       " [1] furrr_0.3.1                 future_1.33.1              \n",
       " [3] purrr_1.0.2                 hise_2.16.0                \n",
       " [5] dplyr_1.1.4                 DESeq2_1.42.0              \n",
       " [7] SummarizedExperiment_1.32.0 Biobase_2.62.0             \n",
       " [9] MatrixGenerics_1.14.0       matrixStats_1.2.0          \n",
       "[11] GenomicRanges_1.54.1        GenomeInfoDb_1.38.5        \n",
       "[13] IRanges_2.36.0              S4Vectors_0.40.2           \n",
       "[15] BiocGenerics_0.48.1        \n",
       "\n",
       "loaded via a namespace (and not attached):\n",
       " [1] gtable_0.3.4            ggplot2_3.4.4           lattice_0.22-5         \n",
       " [4] vctrs_0.6.5             tools_4.3.2             bitops_1.0-7           \n",
       " [7] generics_0.1.3          curl_5.1.0              parallel_4.3.2         \n",
       "[10] tibble_3.2.1            fansi_1.0.6             pkgconfig_2.0.3        \n",
       "[13] Matrix_1.6-4            assertthat_0.2.1        uuid_1.2-0             \n",
       "[16] lifecycle_1.0.4         GenomeInfoDbData_1.2.11 stringr_1.5.1          \n",
       "[19] compiler_4.3.2          munsell_0.5.0           repr_1.1.6.9000        \n",
       "[22] codetools_0.2-19        htmltools_0.5.7         RCurl_1.98-1.14        \n",
       "[25] pillar_1.9.0            crayon_1.5.2            BiocParallel_1.36.0    \n",
       "[28] DelayedArray_0.28.0     abind_1.4-5             parallelly_1.36.0      \n",
       "[31] tidyselect_1.2.0        locfit_1.5-9.8          digest_0.6.34          \n",
       "[34] stringi_1.8.3           listenv_0.9.0           fastmap_1.1.1          \n",
       "[37] grid_4.3.2              colorspace_2.1-0        cli_3.6.2              \n",
       "[40] SparseArray_1.2.3       magrittr_2.0.3          S4Arrays_1.2.0         \n",
       "[43] base64enc_0.1-3         utf8_1.2.4              IRdisplay_1.1          \n",
       "[46] withr_3.0.0             scales_1.3.0            IRkernel_1.3.2         \n",
       "[49] httr_1.4.7              XVector_0.42.0          globals_0.16.2         \n",
       "[52] pbdZMQ_0.3-10           evaluate_0.23           rlang_1.1.3            \n",
       "[55] Rcpp_1.0.12             glue_1.7.0              jsonlite_1.8.8         \n",
       "[58] R6_2.5.1                zlibbioc_1.48.0        "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sessionInfo()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "300550b2-1ee0-4852-92be-336fb6e3b9c9",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R 4",
   "language": "R",
   "name": "ir4"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.3.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}