--- name: bio-metagenomics-visualization description: Visualize metagenomic profiles using R (phyloseq, microbiome) and Python (matplotlib, seaborn). Create stacked bar plots, heatmaps, PCA plots, and diversity analyses. Use when creating publication-quality figures from MetaPhlAn, Bracken, or other taxonomic profiling output. tool_type: mixed primary_tool: phyloseq --- # Metagenome Visualization ## Python - Stacked Bar Plot ```python import pandas as pd import matplotlib.pyplot as plt abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0) abundance = abundance[abundance.index.str.contains('s__')] abundance.index = abundance.index.str.split('|').str[-1].str.replace('s__', '') top_n = 10 top_species = abundance.sum(axis=1).nlargest(top_n).index abundance_top = abundance.loc[top_species] abundance_top.loc['Other'] = abundance.drop(top_species).sum() abundance_top.T.plot(kind='bar', stacked=True, figsize=(12, 6), colormap='tab20') plt.xlabel('Sample') plt.ylabel('Relative Abundance (%)') plt.title('Species Composition') plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left') plt.tight_layout() plt.savefig('stacked_bar.png', dpi=300) ``` ## Python - Heatmap ```python import pandas as pd import seaborn as sns import matplotlib.pyplot as plt abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0) abundance = abundance[abundance.index.str.contains('s__')] abundance.index = abundance.index.str.split('|').str[-1].str.replace('s__', '') top_species = abundance.sum(axis=1).nlargest(20).index abundance_top = abundance.loc[top_species] plt.figure(figsize=(12, 10)) sns.heatmap(abundance_top, cmap='YlOrRd', annot=False, cbar_kws={'label': 'Abundance (%)'}) plt.xlabel('Sample') plt.ylabel('Species') plt.title('Species Abundance Heatmap') plt.tight_layout() plt.savefig('heatmap.png', dpi=300) ``` ## Python - PCA ```python import pandas as pd import matplotlib.pyplot as plt from sklearn.decomposition import PCA from sklearn.preprocessing import StandardScaler abundance = pd.read_csv('merged_abundance.txt', sep='\t', index_col=0).T scaler = StandardScaler() abundance_scaled = scaler.fit_transform(abundance) pca = PCA(n_components=2) pca_result = pca.fit_transform(abundance_scaled) plt.figure(figsize=(8, 6)) plt.scatter(pca_result[:, 0], pca_result[:, 1]) for i, sample in enumerate(abundance.index): plt.annotate(sample, (pca_result[i, 0], pca_result[i, 1])) plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]*100:.1f}%)') plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]*100:.1f}%)') plt.title('PCA of Sample Composition') plt.savefig('pca.png', dpi=300) ``` ## R - phyloseq Setup ```r library(phyloseq) library(ggplot2) library(vegan) # From MetaPhlAn merged table abundance <- read.table('merged_abundance.txt', sep = '\t', header = TRUE, row.names = 1) # Filter to species level species <- abundance[grepl('s__', rownames(abundance)), ] rownames(species) <- sapply(strsplit(rownames(species), '\\|'), tail, 1) rownames(species) <- gsub('s__', '', rownames(species)) # Create phyloseq object otu <- otu_table(as.matrix(species), taxa_are_rows = TRUE) # Sample metadata (create or load) sample_data <- data.frame( Sample = colnames(species), Group = c('Control', 'Control', 'Treatment', 'Treatment'), row.names = colnames(species) ) samp <- sample_data(sample_data) ps <- phyloseq(otu, samp) ``` ## R - Stacked Bar Plot ```r library(phyloseq) library(ggplot2) # Top taxa top_taxa <- names(sort(taxa_sums(ps), decreasing = TRUE))[1:10] ps_top <- prune_taxa(top_taxa, ps) # Stacked bar plot_bar(ps_top, fill = 'Species') + geom_bar(stat = 'identity', position = 'stack') + theme_minimal() + labs(x = 'Sample', y = 'Relative Abundance (%)') + theme(axis.text.x = element_text(angle = 45, hjust = 1)) ``` ## R - Ordination (PCoA) ```r library(phyloseq) library(ggplot2) # Bray-Curtis distance ord <- ordinate(ps, method = 'PCoA', distance = 'bray') # Plot ordination plot_ordination(ps, ord, color = 'Group') + geom_point(size = 4) + stat_ellipse() + theme_minimal() + labs(title = 'PCoA of Sample Composition') ``` ## R - Alpha Diversity ```r library(phyloseq) library(ggplot2) # Calculate diversity metrics alpha_div <- estimate_richness(ps, measures = c('Shannon', 'Simpson', 'Observed')) # Add metadata alpha_div$Group <- sample_data(ps)$Group # Plot ggplot(alpha_div, aes(x = Group, y = Shannon, fill = Group)) + geom_boxplot() + geom_jitter(width = 0.1) + theme_minimal() + labs(title = 'Alpha Diversity by Group', y = 'Shannon Index') ``` ## R - Beta Diversity (PERMANOVA) ```r library(vegan) # Get abundance matrix abundance_matrix <- as(otu_table(ps), 'matrix') if (taxa_are_rows(ps)) abundance_matrix <- t(abundance_matrix) # Calculate Bray-Curtis distance dist_bc <- vegdist(abundance_matrix, method = 'bray') # PERMANOVA groups <- sample_data(ps)$Group permanova <- adonis2(dist_bc ~ groups, permutations = 999) permanova ``` ## Krona Chart ```bash # From Kraken2 report ktImportTaxonomy -q 1 -t 5 kraken_report.txt -o krona_chart.html # From MetaPhlAn metaphlan2krona.py -p profile.txt -k krona_profile.txt ktImportText krona_profile.txt -o krona_metaphlan.html ``` ## Key Packages ### Python | Package | Purpose | |---------|---------| | matplotlib | General plotting | | seaborn | Statistical visualizations | | scikit-learn | PCA, clustering | | scipy | Statistical tests | ### R | Package | Purpose | |---------|---------| | phyloseq | Microbiome data handling | | vegan | Community ecology | | ggplot2 | Visualization | | microbiome | Additional analyses | ## Related Skills - kraken-classification - Generate input data - metaphlan-profiling - Generate input data - abundance-estimation - Process Kraken output