--- title: "Stability Measures" output: rmarkdown::html_vignette: toc: true description: > Evaluating robustness of cluster solutions through resampling methods. vignette: > %\VignetteIndexEntry{Stability Measures} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` Download a copy of the vignette to follow along here: [stability_measures.Rmd](https://raw.githubusercontent.com/BRANCHlab/metasnf/main/vignettes/stability_measures.Rmd) In this vignette, we will highlight the main stability measure options in the metasnf package. ## Data set-up ```{r eval = FALSE} library(metasnf) data_list <- generate_data_list( list(cort_t, "cortical_thickness", "neuroimaging", "continuous"), list(cort_sa, "cortical_area", "neuroimaging", "continuous"), list(subc_v, "subcortical_volume", "neuroimaging", "continuous"), list(income, "household_income", "demographics", "continuous"), list(pubertal, "pubertal_status", "demographics", "continuous"), uid = "unique_id" ) set.seed(42) settings_matrix <- generate_settings_matrix( data_list, nrow = 4, max_k = 40 ) # Cluster solutions made using the full data solutions_matrix <- batch_snf(data_list, settings_matrix) ``` To begin start calculating resampling-based stability measures, we'll build subsamples of the data list using the `subsample_data_list` function. ```{r eval = FALSE} data_list_subsamples <- subsample_data_list( data_list, n_subsamples = 100, subsample_fraction = 0.85 ) ``` `data_list_subsamples` contains a list of 100 subsamples of the full data list. Each variation only has a random 85% of the original observations. ```{r eval = FALSE} batch_subsample_results <- batch_snf_subsamples( data_list_subsamples, settings_matrix ) ``` By default, the function returns a one-element list: `cluster_solutions`, which is itself a list of cluster solution data frames corresponding to each of the provided data list subsamples. Setting the parameters `return_similarity_matrices` and `return_solutions_matrices` to `TRUE` will turn the result of the function to a three-element list containing the corresponding solutions matrices and final fused similarity matrices of those cluster solutions, should you require these objects for your own stability calculations. The function `subsample_pairwise_aris` can then be used to calculate the ARIs between cluster solutions across the subsamples. ```{r eval = FALSE} subsample_cluster_solutions <- batch_subsample_results[["cluster_solutions"]] pairwise_aris <- subsample_pairwise_aris( subsample_cluster_solutions, return_raw_aris = TRUE ) ``` `pairwise_aris` is a list containing a summary data frame of the ARIs between subsamples for each row of the original settings matrix, as well as another list of all the generated inter-subsample ARIs as a result of setting `return_raw_aris` to `TRUE`. The raw inter-subsample ARIs corresponding to a particualr settings matrix row can be visualized with a heatmap: ```{r eval = FALSE} ComplexHeatmap::Heatmap( raw_aris[[1]], heatmap_legend_param = list( color_bar = "continuous", title = "Inter-Subsample\nARI", at = c(0, 0.5, 1) ), show_column_names = FALSE, show_row_names = FALSE ) ```