---
name: global-methylation-profile
description: This skill performs genome-wide DNA methylation profiling. It supports single-sample and multi-sample workflows to compute methylation density distributions, the distribution of methylation across genomic features, and sample-level clustering/PCA. Use it when you want to systematically characterize global methylation patterns from WGBS or similar per-CpG methylation call files.
---

# Global DNA Methylation Profiling

## Overview

Main steps include:

- Refer to the **Inputs & Outputs** section to check the available inputs and design the output structure.
- **Always ask the user** which genome assembly was used.
- **Always ask the user** which columns hold the methylation fraction/percentage, the coverage, and the strand.
- Analyze how methylation is distributed across genomic features for each sample.
- Compute and visualize genome-wide methylation density distributions.
- For multi-sample datasets, build a CpG × sample methylation matrix.
- Perform PCA and hierarchical clustering to assess sample similarity based on global methylation.
- **Never use MCP tools in this skill**; use R scripts instead.

---

## When to use this skill

Use the **global-methylation-profile** skill when you want to:

- Characterize the **global DNA methylation status** of one or more samples (e.g. normal vs. tumor, different cell types).
- Compare broad methylation patterns across samples:
  - Are some samples globally hypo- or hypermethylated?
  - Are certain chromosomes or genomic regions more strongly affected?
- Explore how your methylation data are distributed across genomic features (e.g. promoter hypomethylation, gene body hypermethylation).
- Perform **unsupervised clustering/PCA** to see whether samples separate by condition based on genome-wide methylation patterns.
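The preparation steps in the overview can be sketched in base R. They are: applying the user-supplied column mapping, computing per-sample density distributions, and building the multi-sample matrix. This is a minimal illustration, not the skill's prescribed implementation. All of the following are hypothetical: the helper names, `col.map`, `meth.is.percent`, the coverage cutoff `min.cov = 10`, and the simulated input tables. The real column names and scale come from the user's answers.

```r
# Sketch only: base R, no Bioconductor packages. `col.map`, `meth.is.percent`,
# `min.cov` and all helper names are hypothetical; fill them in from the
# user's answers about their per-CpG call files.

# Standardise one per-CpG call table to chr/pos/strand/meth/cov columns.
standardise_calls <- function(df, col.map, meth.is.percent, min.cov = 10) {
  out <- data.frame(
    chr    = as.character(df[[col.map[["chr"]]]]),
    pos    = as.integer(df[[col.map[["pos"]]]]),
    strand = as.character(df[[col.map[["strand"]]]]),
    meth   = as.numeric(df[[col.map[["meth"]]]]),
    cov    = as.numeric(df[[col.map[["cov"]]]]),
    stringsAsFactors = FALSE
  )
  if (meth.is.percent) out$meth <- out$meth / 100  # rescale 0-100 to 0-1
  out[!is.na(out$cov) & out$cov >= min.cov, ]      # coverage filter
}

# Per-sample kernel density of methylation levels on [0, 1].
meth_density <- function(calls.list) {
  lapply(calls.list, function(d) density(d$meth, from = 0, to = 1, na.rm = TRUE))
}

# Overlay all sample densities on one base-graphics plot.
plot_meth_density <- function(dens.list) {
  ymax <- max(vapply(dens.list, function(d) max(d$y), numeric(1)))
  plot(NA, xlim = c(0, 1), ylim = c(0, ymax),
       xlab = "Methylation level", ylab = "Density",
       main = "Genome-wide CpG methylation density")
  cols <- seq_along(dens.list)
  for (i in cols) lines(dens.list[[i]], col = cols[i])
  legend("top", legend = names(dens.list), col = cols, lty = 1, bty = "n")
}

# CpG x sample matrix; rows keyed by chr:pos:strand (no destranding here).
build_meth_matrix <- function(calls.list, complete.only = TRUE) {
  keys <- lapply(calls.list, function(d) paste(d$chr, d$pos, d$strand, sep = ":"))
  all.keys <- Reduce(union, keys)
  mat <- matrix(NA_real_, nrow = length(all.keys), ncol = length(calls.list),
                dimnames = list(all.keys, names(calls.list)))
  for (i in seq_along(calls.list)) {
    mat[, i] <- calls.list[[i]]$meth[match(all.keys, keys[[i]])]
  }
  # Keep only CpGs observed in every sample (PCA cannot handle NAs).
  if (complete.only) mat <- mat[stats::complete.cases(mat), , drop = FALSE]
  mat
}

# Usage with simulated call tables (hypothetical column names).
set.seed(1)
sim <- function(n) data.frame(
  chrom = "chr1", start = 1:n, strand = "+",
  pct = round(100 * rbeta(n, 0.5, 0.5)), cov = rpois(n, 20)
)
col.map <- c(chr = "chrom", pos = "start", strand = "strand", meth = "pct", cov = "cov")
calls <- list(S1 = sim(500), S2 = sim(500))
calls <- lapply(calls, standardise_calls, col.map = col.map, meth.is.percent = TRUE)
dens <- meth_density(calls)
meth.mat <- build_meth_matrix(calls)
pdf(tempfile(fileext = ".pdf")); plot_meth_density(dens); invisible(dev.off())
```

Rows are keyed by chromosome, position, and strand, so CpGs on opposite strands stay separate. Merge them first if the user's files are not already destranded. With `complete.only = TRUE`, only CpGs covered in every sample are kept, which is what the PCA step expects.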
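---

## Inputs & Outputs

### Inputs

```r
library(ggplot2)
library(pheatmap)

# `meth.mat` (CpG x sample methylation matrix) and `treatment` (0/1 vector,
# one entry per sample) are assumed to come from the earlier steps.
# The lines up to `keep.var` are an assumed reconstruction of the truncated
# start: per-CpG SD across samples, keeping CpGs with non-zero variance
# (zero-variance rows would give NaN after scale()).
meth.mat <- meth.mat[complete.cases(meth.mat), , drop = FALSE]  # prcomp() rejects NAs
cpg.sd <- apply(meth.mat, 1, sd)
keep.var <- cpg.sd > 0
meth.var <- meth.mat[keep.var, ]

# Keep at most the 10,000 most variable CpGs
if (sum(keep.var) > 10000) {
  keep.idx <- order(cpg.sd[keep.var], decreasing = TRUE)[1:10000]
  meth.var <- meth.var[keep.idx, ]
}

# Z-score transformation per CpG (row); helps clustering
meth.scaled <- t(scale(t(meth.var)))

# Each CpG is already centred and scaled across samples, so prcomp() skips it
pca <- prcomp(t(meth.scaled), center = FALSE, scale. = FALSE)
pca.var <- summary(pca)$importance[2, ]  # proportion of variance per PC

pca.df <- data.frame(
  Sample = colnames(meth.scaled),
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2],
  Treatment = factor(treatment, labels = c("Control", "Treatment"))
)

p.pca <- ggplot(pca.df, aes(x = PC1, y = PC2, color = Treatment, label = Sample)) +
  geom_point(size = 3) +
  geom_text(vjust = -1) +
  theme_bw() +
  ggtitle("PCA of global CpG methylation") +
  xlab(paste0("PC1 (", round(pca.var[1] * 100, 1), "%)")) +
  ylab(paste0("PC2 (", round(pca.var[2] * 100, 1), "%)"))
print(p.pca)  # explicit print so the plot also renders when the script is source()d

# Hierarchical clustering on Euclidean distances between samples
dist.samples <- dist(t(meth.scaled), method = "euclidean")
hc <- hclust(dist.samples, method = "complete")
plot(hc, main = "Hierarchical clustering of samples (methylation)", xlab = "", sub = "")

# Sample-sample Pearson correlation on the unscaled methylation values
cor.samples <- cor(meth.var, use = "pairwise.complete.obs")
pheatmap(cor.samples, clustering_method = "complete",
         main = "Sample correlation based on CpG methylation")
```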
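A minimal base-R sketch of the per-sample genomic-feature step named in the overview follows. It relies on several assumptions, none of which come from the original skill:

- a hypothetical `features` table (chr, start, end, type), built by the user from an annotation that matches their genome assembly;
- 1-based closed coordinates in both the features and the calls;
- a promoter > exon > intron priority for CpGs that fall in overlapping features.

```r
# Sketch: label each CpG with one genomic feature and summarise methylation.
# `features` is a hypothetical data.frame (chr, start, end, type) with 1-based,
# closed coordinates; `calls` has chr, pos, meth (fraction 0-1) columns.

annotate_cpgs <- function(calls, features,
                          priority = c("promoter", "exon", "intron")) {
  label <- rep("intergenic", nrow(calls))
  # Walk from lowest to highest priority so higher-priority types overwrite.
  for (type in rev(priority)) {
    f <- features[features$type == type, ]
    # Brute-force overlap test: O(CpGs x features), fine for a sketch only.
    hit <- vapply(seq_len(nrow(calls)), function(i) {
      any(f$chr == calls$chr[i] & f$start <= calls$pos[i] & f$end >= calls$pos[i])
    }, logical(1))
    label[hit] <- type
  }
  factor(label, levels = c(priority, "intergenic"))
}

# Number of CpGs and mean methylation per feature class for one sample.
feature_summary <- function(calls, features, ...) {
  feat <- annotate_cpgs(calls, features, ...)
  data.frame(
    feature   = levels(feat),
    n.cpg     = as.vector(table(feat)),
    mean.meth = as.vector(tapply(calls$meth, feat, mean, na.rm = TRUE)),
    stringsAsFactors = FALSE
  )
}

# Toy usage with made-up coordinates.
features <- data.frame(
  chr   = "chr1",
  start = c(1, 101, 301),
  end   = c(50, 200, 400),
  type  = c("promoter", "exon", "intron"),
  stringsAsFactors = FALSE
)
calls <- data.frame(chr = "chr1", pos = c(10, 20, 150, 350, 500),
                    meth = c(0.1, 0.2, 0.8, 0.9, 0.6),
                    stringsAsFactors = FALSE)
summ <- feature_summary(calls, features)
print(summ)
```

The overlap test is brute force for readability. At genome scale, `GenomicRanges::findOverlaps()` on `GRanges` objects is the usual replacement.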