Version 0.1.0 is a major update to sceptre. Usability
and speed are improved considerably.
The function run_sceptre_gRNA_gene_pair, which was
redundant, is now deprecated.
run_sceptre_high_moi(previously called
run_sceptre_in_memory) is simpler to use: the function now
has only four required arguments: gene_matrix (previously
called expression_matrix), gRNA_matrix
(previously called expression_matrix),
covariate_matrix, and gene_gRNA_pairs. The
formerly required arguments storage_dir and
side are now set to tempdir() and “both” by
default. Additionally, the argument pod_sizes is removed
entirely (and handled internally).
run_sceptre_high_moi has the additional optional
arguments full_output and parallel.
full_output controls the complexity of the data frame
outputted by the method. When full_output is set to FALSE
(the default), run_sceptre_high_moi outputs a data frame
with four columns only, all of which are easy to interpret:
gene_id, gRNA_id, p_value, and
z_value. parallel controls whether the
function is parallelized (TRUE; default) or not (FALSE).
run_sceptre_high_moi now accepts a raw (i.e.,
unthresholded) gRNA matrix or a user-thresholded gRNA matrix.
A new auxiliary function combine_gRNAs combines
gRNAs that target the same chromosomal site.
Numerous checks have been added to
run_sceptre_high_moi ensure that the input is valid. For
example, run_sceptre_high_moi checks that gene IDs and gRNA
IDs in gene_gRNA_pairs are in fact subsets of the row names
of gene_matrix and gRNA_matrix,
respectively.
Two accelerations have been implemented to improve speed. These accelerations do not affect the API of the package.
First, the test statistic used in the conditional randomization test has changed. Previously, the test statistic was a z-score derived from a Wald test of a fitted negative binomial GLM. Now, the test statistic is a z-score derived from a score test of the same negative binomial GLM, which is asymptotically equivalent to the former but more robust in finite samples. Additionally, this score test-based z-score is computed via an explicit formula, sidestepping the need to fit a GLM, as was done previously. Overall, the new test statistic is faster to compute and more robust than the previous test statistic.
The synthetic perturbation indicators are now generated as part of the gRNA precomputation, factoring out this somewhat time-intensive step from the pairwise tests of association.