{ "cells": [ { "cell_type": "markdown", "id": "f4143c1d-4f24-4aae-9d6d-98552383c4a6", "metadata": {}, "source": [ "# Select samples for use in reference building" ] }, { "cell_type": "markdown", "id": "18f31246-4851-4a0d-a915-68ec4894d1ad", "metadata": {}, "source": [ "## Load packages" ] }, { "cell_type": "code", "execution_count": 1, "id": "5adb72ac-565d-4960-931c-3803a71e145c", "metadata": { "tags": [] }, "outputs": [], "source": [ "quiet_library <- function(...) { suppressPackageStartupMessages(library(...)) }\n", "quiet_library(hise)\n", "quiet_library(dplyr)\n", "quiet_library(purrr)" ] }, { "cell_type": "markdown", "id": "559d4162-e70f-43b8-b171-0254152a8d36", "metadata": {}, "source": [ "## Retrieve file and sample metadata from HISE" ] }, { "cell_type": "code", "execution_count": 2, "id": "886829cd-06ef-4300-956b-75fbabca4378", "metadata": { "tags": [] }, "outputs": [], "source": [ "BR1_rna_desc <- getFileDescriptors(\n", " fileType = \"scRNA-seq-labeled\", \n", " filter = list(cohort.cohortGuid = \"BR1\"))\n", "BR2_rna_desc <- getFileDescriptors(\n", " fileType = \"scRNA-seq-labeled\", \n", " filter = list(cohort.cohortGuid = \"BR2\"))\n", "UP1_rna_desc <- getFileDescriptors(\n", " fileType = \"scRNA-seq-labeled\", \n", " filter = list(cohort.cohortGuid = \"UP1\"))" ] }, { "cell_type": "code", "execution_count": 3, "id": "7713d294-470e-4bce-b823-50db7bb28ea7", "metadata": { "tags": [] }, "outputs": [], "source": [ "BR1_rna_desc <- fileDescToDataframe(BR1_rna_desc)\n", "BR2_rna_desc <- fileDescToDataframe(BR2_rna_desc)\n", "UP1_rna_desc <- fileDescToDataframe(UP1_rna_desc)" ] }, { "cell_type": "markdown", "id": "b7e3b449-b423-4987-8a7b-b9689ad1130d", "metadata": {}, "source": [ "## Remove irrelevant batches\n", "\n", "Batches starting with \"EXP\" are experimental, non-pipeline batches. \n", "B004 is an early batch that has some batch effects. We'll exclude this batch, as samples have been re-run in later batches. 
\n", "Batches later than B145 are not used for this reference." ] }, { "cell_type": "code", "execution_count": 4, "id": "fea5288b-59be-47ad-8e23-1ec57b1080a4", "metadata": { "tags": [] }, "outputs": [], "source": [ "meta_data <- plyr::rbind.fill(BR1_rna_desc , BR2_rna_desc )" ] }, { "cell_type": "code", "execution_count": 5, "id": "59f3e5f6-408a-4e77-ac97-a8d350ebebc1", "metadata": {}, "outputs": [], "source": [ "meta_data <- meta_data %>%\n", " filter(!grepl(\"EXP\",file.batchID)) %>%\n", " filter(!file.batchID == \"B004\") %>%\n", " mutate(file.batch_num = as.numeric(sub(\"B\",\"\",file.batchID))) %>%\n", " filter(file.batch_num <= 145) %>%\n", " select(-file.batch_num)" ] }, { "cell_type": "markdown", "id": "3a0a76db-0e8f-4743-b929-5bbb6d765934", "metadata": {}, "source": [ "## Remove non-healthy and abnormal subjects\n", "\n", "We want to use only healthy subjects without abnormal presentation for this reference. A few subjects have non-healthy or abnormal states recorded at some visits. We'll identify and remove these subjects." ] }, { "cell_type": "code", "execution_count": 6, "id": "1e37252c-19bc-41df-8dca-f97275a73407", "metadata": {}, "outputs": [], "source": [ "non_healthy <- meta_data %>%\n", " filter(sample.diseaseStatesRecordedAtVisit != \"\") %>%\n", " select(subject.subjectGuid, sample.diseaseStatesRecordedAtVisit) %>%\n", " unique()" ] }, { "cell_type": "code", "execution_count": 7, "id": "e67ba6bc-ad53-4f2d-bb79-54c944aeed2e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | subject.subjectGuid | sample.diseaseStatesRecordedAtVisit |
|---|---|---|
| | <chr> | <chr> |
| 1 | BR1034 | Psoriasis |
| 7 | BR2007 | Healthy - Abnormal |
| 17 | BR2049 | Healthy - Abnormal |