Runs sceptre in memory. This function is appropriate for data of intermediate size (i.e., the data are small enough to fit into memory, and the analysis does not need to be run across nodes on a computer cluster).
run_sceptre_in_memory(storage_dir, expression_matrix, perturbation_matrix, covariate_matrix, gene_gRNA_pairs, side, pod_sizes, regularization_amount = 0.1, seed = 4, B = 500)
| Argument | Description |
|---|---|
| storage_dir | name of a directory in which to store logs, intermediate computations, and final results. |
| expression_matrix | a matrix of gene expressions (in UMI counts); rows correspond to genes and columns to cells; the rows should be named. |
| perturbation_matrix | a binary matrix of gRNA perturbations; rows correspond to gRNAs and columns to cells; the rows likewise should be named. |
| covariate_matrix | a data frame of covariates; rows correspond to cells. |
| gene_gRNA_pairs | a data frame with named columns "gene_id" and "gRNA_id" giving the names of the genes and gRNAs to analyze. |
| side | sidedness of the test; one of "left", "right", or "both". "left" is most appropriate for experiments in which cis-regulatory relationships are tested by perturbing putative enhancers with CRISPRi. |
| pod_sizes | a named integer vector with elements "gene", "gRNA", and "pair" giving the sizes of the gene, gRNA, and pair pods, e.g. c(gene = 10, gRNA = 10, pair = 15). |
| regularization_amount | (optional; default 0.1) the amount of regularization to apply to the estimated negative binomial size parameters, where 0 corresponds to no regularization at all. |
| seed | (optional; default 4) seed to pass to the random number generator. |
| B | (optional; default 500) number of random samples to draw in the conditional randomization test. |
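The constraints listed in the argument table can be checked before the function is called. The sketch below is a minimal illustration in base R (named `stopifnot()` conditions need R 4.0.0 or later). `check_sceptre_inputs()` is a hypothetical helper, not part of sceptre, and the requirement that both matrices and the covariate data frame cover the same number of cells is inferred from the descriptions rather than stated outright.

```r
# Hypothetical helper (not part of sceptre): checks the constraints described
# in the argument table and stops with a message at the first violation.
check_sceptre_inputs <- function(expression_matrix, perturbation_matrix,
                                 covariate_matrix, gene_gRNA_pairs, side) {
  stopifnot(
    "expression_matrix rows must be named" =
      !is.null(rownames(expression_matrix)),
    "perturbation_matrix rows must be named" =
      !is.null(rownames(perturbation_matrix)),
    "perturbation_matrix must be binary" =
      all(perturbation_matrix %in% c(0, 1)),
    # Assumption: both matrices describe the same set of cells.
    "matrices must have the same number of columns (cells)" =
      ncol(expression_matrix) == ncol(perturbation_matrix),
    "covariate_matrix must be a data frame with one row per cell" =
      is.data.frame(covariate_matrix) &&
        nrow(covariate_matrix) == ncol(expression_matrix),
    "gene_gRNA_pairs needs columns gene_id and gRNA_id" =
      all(c("gene_id", "gRNA_id") %in% colnames(gene_gRNA_pairs)),
    "gene ids must match expression_matrix row names" =
      all(gene_gRNA_pairs$gene_id %in% rownames(expression_matrix)),
    "gRNA ids must match perturbation_matrix row names" =
      all(gene_gRNA_pairs$gRNA_id %in% rownames(perturbation_matrix)),
    "side must be \"left\", \"right\", or \"both\"" =
      side %in% c("left", "right", "both")
  )
  invisible(TRUE)
}

# Toy inputs, invented for illustration: 2 genes, 3 gRNAs, 10 cells.
expr <- matrix(rpois(20, 1), nrow = 2,
               dimnames = list(c("gene1", "gene2"), NULL))
pert <- matrix(rbinom(30, 1, 0.5), nrow = 3,
               dimnames = list(paste0("gRNA", 1:3), NULL))
covs <- data.frame(lg_umi_count = log(rep(500, 10)))
toy_pairs <- data.frame(gene_id = c("gene1", "gene2"),
                        gRNA_id = c("gRNA1", "gRNA3"))

check_sceptre_inputs(expr, pert, covs, toy_pairs, side = "left")
```

Failing early with a specific message is usually cheaper than discovering a mismatched row name partway through a long run.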
A data frame containing the results, with columns gene_id, gRNA_id, p_value, skew_t_fit_success (indicating whether the skew-t fit succeeded), xi, omega, alpha, nu (parameters of the skew-t distribution), z_value (the ground truth negative binomial test statistic), and n_successful_resamples (the number of the B resamples that were successful).
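When many gene-gRNA pairs are tested, the p_value column is commonly adjusted for multiple testing before discoveries are called. The following is a minimal sketch using base R's `p.adjust()` on a mock results data frame. The values are invented, only a subset of the documented columns is shown, and the Benjamini-Hochberg method and the 0.1 threshold are illustrative choices that this page does not prescribe.

```r
# Mock results with a subset of the documented columns (values are invented).
result <- data.frame(
  gene_id = c("gene1", "gene2", "gene3", "gene4"),
  gRNA_id = c("gRNA1", "gRNA1", "gRNA2", "gRNA2"),
  p_value = c(0.001, 0.20, 0.03, 0.75),
  skew_t_fit_success = c(TRUE, TRUE, FALSE, TRUE),
  n_successful_resamples = c(500, 500, 498, 500),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment across all tested pairs.
result$p_adj <- p.adjust(result$p_value, method = "BH")

# Pairs passing an assumed FDR threshold of 0.1, most significant first.
hits <- result[result$p_adj < 0.1, ]
hits <- hits[order(hits$p_value), ]

# Pairs whose skew-t fit did not succeed may deserve a closer look.
flagged <- result[!result$skew_t_fit_success, ]
```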
if (FALSE) {
  # generate random perturbation and expression data in memory
  library(dplyr)
  set.seed(4)
  n_gRNAs <- 50
  n_genes <- 40
  n_cells <- 5000

  # create perturbation, expression, and covariate matrices
  perturbation_matrix <- replicate(n = n_gRNAs, rbinom(n_cells, 1, 0.05)) %>% t()
  expression_matrix <- replicate(n = n_genes, rpois(n_cells, 1)) %>% t()
  covariate_matrix <- data.frame(
    p_mito = runif(n = n_cells, min = 0, max = 10),
    lg_umi_count = log(rpois(n = n_cells, lambda = 500))
  )

  # assign row names to the perturbation and expression matrices
  row.names(perturbation_matrix) <- paste0("gRNA", seq(1, n_gRNAs))
  row.names(expression_matrix) <- paste0("gene", seq(1, n_genes))

  # select 90 random gene-gRNA pairs to analyze; create the gene_gRNA_pairs data frame
  gene_gRNA_pairs <- expand.grid(
    gene_id = row.names(expression_matrix),
    gRNA_id = row.names(perturbation_matrix)
  ) %>% slice_sample(n = 90)

  # let the temporary directory be the storage dir
  storage_dir <- tempdir()

  # set the remaining parameters: test sidedness, pod_sizes, regularization_amount, seed, and B
  side <- "left"
  pod_sizes <- c(gene = 10, gRNA = 10, pair = 15)
  regularization_amount <- 0.1
  seed <- 4
  B <- 500

  # run the analysis
  result <- run_sceptre_in_memory(storage_dir, expression_matrix, perturbation_matrix,
                                  covariate_matrix, gene_gRNA_pairs, side, pod_sizes,
                                  regularization_amount, seed, B)
}
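To get a feel for the pod_sizes used in the example above, the sketch below estimates how many pods each setting would produce. It assumes that a pod is a consecutive chunk of at most pod_size items; this is an assumption about sceptre's internals that this page does not confirm, and `count_pods()` is a hypothetical helper.

```r
# Hypothetical helper: number of pods if n_items are split into chunks of at
# most pod_size items each (assumed behavior, not confirmed by this page).
count_pods <- function(n_items, pod_size) {
  ceiling(n_items / pod_size)
}

# Sizes from the example: 40 genes, 50 gRNAs, 90 gene-gRNA pairs.
n_items <- c(gene = 40, gRNA = 50, pair = 90)
pod_sizes <- c(gene = 10, gRNA = 10, pair = 15)

# Match pod sizes to item counts by name, then count pods per category.
n_pods <- count_pods(n_items, pod_sizes[names(n_items)])
n_pods
```

Under this assumption, smaller pods mean more, smaller units of work, while larger pods mean fewer, larger ones.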