--- title: "Manhattan Plots" output: rmarkdown::html_vignette: toc: true description: > Visualize a summary of the association between cluster-feature and feature-feature relationships. vignette: > %\VignetteIndexEntry{Manhattan Plots} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` Download a copy of the vignette to follow along here: [manhattan_plots.Rmd](https://raw.githubusercontent.com/BRANCHlab/metasnf/main/vignettes/manhattan_plots.Rmd) Manhattan plots can be quickly visualize the relationships between variables and cluster solutions. There are three main Manhattan plot variations provided in metaSNF. 1. `esm_manhattan_plot` Visualize how a set of cluster solutions separate over input/out-of-model variables 2. `mc_manhattan_plot` Visualize how representative solutions from defined meta clusters separate over input/out-of-model variables 3. `var_manhattan_plot` Visualize how one raw feature associates with other raw features (similar to `assoc_pval_heatmap`) ## Data set-up The example below is taken from the ["complete example" vignette](https://branchlab.github.io/metasnf/articles/a_complete_example.html). ```{r eval = FALSE} library(metasnf) # Start by making a data list containing all our dataframes to more easily # identify subjects without missing data full_data_list <- generate_data_list( list(abcd_subc_v, "subcortical_volume", "neuroimaging", "continuous"), list(abcd_h_income, "household_income", "demographics", "continuous"), list(abcd_pubertal, "pubertal_status", "demographics", "continuous"), list(abcd_anxiety, "anxiety", "behaviour", "ordinal"), list(abcd_depress, "depressed", "behaviour", "ordinal"), uid = "patient" ) # Partition into a data and target list (optional) data_list <- full_data_list[1:3] target_list <- full_data_list[4:5] # Build space of settings to cluster over settings_matrix <- generate_settings_matrix( data_list, nrow = 20, min_k = 20, max_k = 50, seed = 42 ) # Clustering solutions_matrix <- batch_snf(data_list, settings_matrix) # Calculate p-values between cluster solutions and features extended_solutions_matrix <- extend_solutions( solutions_matrix, data_list = data_list, target_list = target_list, min_pval = 1e-10 # p-values below 1e-10 will be thresholded to 1e-10 ) ``` ## Associations with Multiple Cluster Solutions (`esm_manhattan_plot`) ```{r eval = FALSE} esm_manhattan <- esm_manhattan_plot( extended_solutions_matrix[1:5, ], neg_log_pval_thresh = 5, threshold = 0.05, point_size = 3, jitter_width = 0.1, jitter_height = 0.1, plot_title = "Variable-Solution Associations", text_size = 14, bonferroni_line = TRUE ) ggplot2::ggsave( "esm_manhattan.png", esm_manhattan, height = 5, width = 8, dpi = 100 ) ```
![](https://raw.githubusercontent.com/BRANCHlab/metasnf/main/vignettes/esm_manhattan.png)
A bit of an unwieldy plot if you try looking at too many solutions at a time, but it can be handy if you intend on just examining a few cluster solutions. ## Associations with Meta Clusters (`mc_manhattan_plot`) The `mc_manhattan_plot` function can be used after meta clustering to more efficiently examine the entire space of generated cluster solutions. ```{r eval = FALSE} # Calculate pairwise similarities between cluster solutions solutions_matrix_aris <- calc_aris(solutions_matrix) # Extract hierarchical clustering order of the cluster solutions meta_cluster_order <- get_matrix_order(solutions_matrix_aris) # Create a base heatmap for visual meta clustering ari_hm <- adjusted_rand_index_heatmap( solutions_matrix_aris, order = meta_cluster_order ) # Identify meta cluster boundaries # This can also be by trial & error if you do not wish to use the shiny app. shiny_annotator(ari_hm) # Result of meta cluster examination split_vec <- c(2, 5, 12, 17) # Create a base heatmap for visual meta clustering ari_hm <- adjusted_rand_index_heatmap( solutions_matrix_aris, order = meta_cluster_order, split_vector = split_vec ) # Extracting representative solutions from each defined meta cluster rep_solutions <- get_representative_solutions( solutions_matrix_aris, split_vector = split_vec, order = meta_cluster_order, extended_solutions_matrix ) mc_manhattan <- mc_manhattan_plot( rep_solutions, data_list = data_list, target_list = target_list, point_size = 3, text_size = 12, plot_title = "Variable-Meta Cluster Associations", threshold = 0.05, neg_log_pval_thresh = 5 ) ggplot2::ggsave( "mc_manhattan_clean.png", mc_manhattan, height = 10, width = 10, dpi = 100 ) ```
![](https://raw.githubusercontent.com/BRANCHlab/metasnf/main/vignettes/mc_manhattan_clean.png)
## Associations with a Key Variable You can also visualize associations with a specific variable of interest rather than cluster solutions. The only thing needed for this plot is a data_list - no clustering necessary. ```{r eval = FALSE} var_manhattan <- var_manhattan_plot( data_list, key_var = "household_income", plot_title = "Correlation of Features with Household Income", text_size = 16, neg_log_pval_thresh = 3, threshold = 0.05 ) ggplot2::ggsave( "var_manhattan.png", var_manhattan, height = 7, width = 12, dpi = 100 ) ```
![](https://raw.githubusercontent.com/BRANCHlab/metasnf/main/vignettes/var_manhattan.png)