phyloseq: XXX pipeline
TwoWen,XX, XX, XX, XX,Yong Xin Liu*, and Jun Yuan*
*correspondence: YongXin Liu <YongXin@smu.edu.cn>and Jun Yuan <JunYuan@njau.edu.cn>
phyloseq:使用R语言分析微生物群落(microbiome census data) 目前对微生物群落的分析有许多挑战:使用生态学,遗传学,系统发育学,网络分析等方法对不同类型的微生物群落数据进行整合,可视化分析(visualization and testing)。微生物群落数据本身可能来源广泛,例如:人体微生物,土壤,海面及其海水,污水处理厂,工业设施等; 因此,不同来源微生物群落就其实验设计和科学问题可能有非常大的差异。phyloseq软件包是一个对在聚类OTU后的微生物群落(数据包括:OTU表格,系统发育树,OTU注释信息)进行下游系统发育等分析的综合性工具,集成了微生物群落数据导入,存储,分析和出图等功能。该软件包利用R中的许多工具进行生态学和系统发育分析(vegan, ade4, ape, picante),同时还使用先进/灵活的图形系统(ggplot2)轻松的对系统发育数据绘制发表级别的图表。phyloseq使用S4类的系统将所有相关的系统发育测序数据存储为单个实验级对象,从而更容易共享数据并重现分析。通常,phyloseq寻求促进使用R进行OTU聚类的高通量系统发育测序数据的有效探索和可重复分析。
基于phyloseq分析微生物组数据和传统方式不同,S4对象的使用给phyloseq处理微生物组数据带来了方便,但是也有门槛了,比如构建phyloseq对象。
phyloseq包含五个内容,分别是OTU表格,物种注释表格,样本分组文件,进化树文件和代表序列文件。其中OTU表格和物种注释文件要求格式为矩阵,样本分组文件要求为数据框,进化树格式为RWK,代表序列为fa文件。
下满个构造phyloseq对象。
library(ggplot2)
library(phyloseq)
library(ggClusterNet)# github:taowenmicro/ggCLusterNet
本次数据来源于刘永鑫老师的研究,发表于XXX。这里存储在github,直接运行下方代码即可直接下载和再导入
# metadata = read.delim("https://raw.githubusercontent.com/taowenmicro/R-_function/main/metadata.tsv",row.names = 1)
# otutab = read.delim("https://raw.githubusercontent.com/taowenmicro/R-_function/main/otutab.txt", row.names=1)
# taxonomy = read.table("https://raw.githubusercontent.com/taowenmicro/R-_function/main/taxonomy.txt", row.names=1,header = T)
# tree = read_tree("https://raw.githubusercontent.com/taowenmicro/R-_function/main/otus.tree")
# library(Biostrings)
# rep = readDNAStringSet("https://raw.githubusercontent.com/taowenmicro/R-_function/main/otus.fa")
#
# ps = phyloseq(sample_data(metadata),
# otu_table(as.matrix(otutab), taxa_are_rows=TRUE),
# tax_table(as.matrix(taxonomy)), phy_tree(tree),refseq(rep)
# )
data(ps)
#统计OTU数量
ntaxa(ps)
## [1] 2861
# 统计样品数量
nsamples(ps)
## [1] 18
# 查看样品名称
sample_names(ps)[1:5]
## [1] "KO1" "KO2" "KO3" "KO4" "KO5"
# 查看物种分类等级
rank_names(ps)
## [1] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus" "Species"
# 查看分组文件表头信息
sample_variables(ps)
## [1] "Group" "Date" "Site"
## [4] "Sequencing" "Platform" "Species"
## [7] "Batch" "BarcodeSequence" "LinkerPrimerSequence"
## [10] "ReversePrimer"
# 部分可视化OTU表格
otu_table(ps)[1:5, 1:5]
## OTU Table: [5 taxa and 5 samples]
## taxa are rows
## KO1 KO2 KO3 KO4 KO5
## ASV_1 1113 1968 816 1372 1062
## ASV_1591 12 10 1 14 3
## ASV_657 5 7 6 9 8
## ASV_28 253 303 54 439 132
## ASV_1717 3 6 0 1 0
# 部分可视化tax注释表格
tax_table(ps)[1:5, 1:4]
## Taxonomy Table: [5 taxa by 4 taxonomic ranks]:
## Kingdom Phylum Class Order
## ASV_1 "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
## ASV_1591 "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
## ASV_657 "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
## ASV_28 "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
## ASV_1717 "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
# 查看进化树文件
phy_tree(ps)
##
## Phylogenetic tree with 2861 tips and 2860 internal nodes.
##
## Tip labels:
## ASV_1, ASV_1591, ASV_657, ASV_28, ASV_1717, ASV_2407, ...
##
## Rooted; includes branch lengths.
#-基于样本-----去除序列量比较少的样本
ps <- prune_samples(sample_sums(ps) >=2000,ps);ps
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 2861 taxa and 18 samples ]
## sample_data() Sample Data: [ 18 samples by 10 sample variables ]
## tax_table() Taxonomy Table: [ 2861 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 2861 tips and 2860 internal nodes ]
#--基于样本----去除低丰度OTU,这里去除全部read为0的OTU
ps = filter_taxa(ps, function(x) sum(x ) > 0 , TRUE)
p0 <- plot_richness(ps, x="Group", color="Group", measures=c("Chao1", "Observed"))
p0$layers[[2]] = NULL # 去除误差线
# 添加箱型图
p0 <- p0 + geom_boxplot() + theme_bw()
ggsave("./Fig1_alpha.png",p0,width = 6,height = 4)
ps_rela = transform_sample_counts(ps, function(x) 1E6 * x/sum(x))
ord <- ordinate(ps_rela, "NMDS", "bray")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.07711872
## Run 1 stress 0.07711868
## ... New best solution
## ... Procrustes: rmse 7.169087e-05 max resid 0.0002338007
## ... Similar to previous best
## Run 2 stress 0.09655982
## Run 3 stress 0.0941294
## Run 4 stress 0.07711869
## ... Procrustes: rmse 5.314733e-05 max resid 0.0001746393
## ... Similar to previous best
## Run 5 stress 0.08757124
## Run 6 stress 0.07711873
## ... Procrustes: rmse 9.101156e-05 max resid 0.0002970059
## ... Similar to previous best
## Run 7 stress 0.07711868
## ... New best solution
## ... Procrustes: rmse 3.320922e-05 max resid 0.0001085458
## ... Similar to previous best
## Run 8 stress 0.09412915
## Run 9 stress 0.07711871
## ... Procrustes: rmse 7.731383e-05 max resid 0.0002522644
## ... Similar to previous best
## Run 10 stress 0.09541457
## Run 11 stress 0.09412917
## Run 12 stress 0.09541427
## Run 13 stress 0.09414427
## Run 14 stress 0.09414429
## Run 15 stress 0.08880238
## Run 16 stress 0.3719181
## Run 17 stress 0.07711878
## ... Procrustes: rmse 0.0001730924 max resid 0.0005655198
## ... Similar to previous best
## Run 18 stress 0.07711876
## ... Procrustes: rmse 0.000151849 max resid 0.0004953314
## ... Similar to previous best
## Run 19 stress 0.08780134
## Run 20 stress 0.09612165
## *** Solution reached
ord
##
## Call:
## metaMDS(comm = veganifyOTU(physeq), distance = distance)
##
## global Multidimensional Scaling using monoMDS
##
## Data: wisconsin(sqrt(veganifyOTU(physeq)))
## Distance: bray
##
## Dimensions: 2
## Stress: 0.07711868
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation, halfchange scaling
## Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
p1 = plot_ordination(ps, ord, type="samples", color="Group", title="Ordination") + theme_bw()
p1
ggsave("./Fig2_ordination.png",p1,width = 5,height = 5)
堆叠柱状图在不同分类水平按照不同分组展示微生物的丰度信息,遗憾的是这个函数如果可以展示高丰度的几十个微生物,并且将其他微生物合并在一起作为others就好了。
ps_rela = transform_sample_counts(ps, function(x) x/sum(x))
#--相对丰都标准化后展示全部OTU-使用门水平着色。
p2 <- plot_bar(ps_rela, fill="Phylum") + theme_bw()
#--展示部分感兴趣的微生物
gp.ch = subset_taxa(ps_rela, Phylum == "Chlamydiae")
plot_bar(gp.ch, fill="Genus")
plot_bar(gp.ch, x="Group", fill="Genus")
plot_bar(gp.ch, "Group", fill="Genus", facet_grid=~Family)
ggsave("./Fig3_Phylum_abundance_bar.png",p2,width = 8,height = 8)
ps_rela = transform_sample_counts(ps, function(x) x/sum(x))
gpt <- subset_taxa(ps_rela, Kingdom=="Bacteria")
gpt <- prune_taxa(names(sort(taxa_sums(ps),TRUE)[1:30]), gpt)
#默认从OTU水平展示丰度热图
plot_heatmap(gpt, sample.label="Group")
#--该函数可以使用排序方法对行和列进行排序分析,并且按照结构进行顺序安排。
p3 <- plot_heatmap(gpt, "NMDS", "bray", "Group", "Family") +
scale_fill_gradientn(colours =colorRampPalette(RColorBrewer::brewer.pal(4,"Set3"))(60))
ggsave("./Fig4_Family_high_abundance_heatmap.png",p3,width = 6,height = 6)
p4 <- plot_tree(gpt, color="Group", label.tips="taxa_names", ladderize="left",size="abundance", base.spacing=0.8) +
scale_size_continuous()
plot_tree(gpt, color="Group", label.tips="taxa_names", ladderize="left",size="abundance", base.spacing=0.5) + coord_polar(theta="y") +
scale_size_continuous()
ggsave("./Fig5_phytree_high_abundance_point.png",p4,width = 8,height = 8)
myTaxa = names(sort(taxa_sums(ps), decreasing = TRUE)[1:50])
ps1 = prune_taxa(myTaxa, ps)
ps1
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 50 taxa and 18 samples ]
## sample_data() Sample Data: [ 18 samples by 10 sample variables ]
## tax_table() Taxonomy Table: [ 50 taxa by 7 taxonomic ranks ]
## phy_tree() Phylogenetic Tree: [ 50 tips and 49 internal nodes ]
p5 <- plot_net(ps1,type = "taxa",maxdist = 0.6,color = "Phylum")
ggsave("./Fig6_Micro_network.png",p5,width = 8,height = 8)
# p0/p4|p1/p4|p2/p3
library(DESeq2)
# gm_mean = function(x, na.rm=TRUE){
# exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
# }
diagdds = phyloseq_to_deseq2(ps, ~ Group)
## converting counts to integer mode
# calculate geometric means prior to estimate size factors
gm_mean = function(x, na.rm=TRUE){
exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
}
geoMeans = apply(counts(diagdds), 1, gm_mean)
diagdds = estimateSizeFactors(diagdds, geoMeans = geoMeans)
diagdds = DESeq(diagdds, fitType="local")
res = results(diagdds, cooksCutoff = FALSE)
alpha = 0.01
sigtab = res[which(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(ps)[rownames(sigtab), ], "matrix"))
head(sigtab)
## baseMean log2FoldChange lfcSE stat pvalue padj
## ASV_174 33.03969 3.4870081 0.8432392 4.135254 3.545625e-05 0.003381463
## ASV_79 82.21388 -0.9250167 0.2133781 -4.335105 1.456906e-05 0.002086737
## ASV_6 625.40881 -1.0358844 0.2333951 -4.438329 9.066008e-06 0.001688091
## ASV_41 154.25212 -1.2313562 0.2991679 -4.115937 3.856089e-05 0.003419066
## ASV_73 85.50976 -1.3770650 0.3129025 -4.400940 1.077831e-05 0.001803533
## ASV_141 42.06084 -1.4928966 0.3334011 -4.477779 7.542366e-06 0.001560432
## Kingdom Phylum Class Order
## ASV_174 Bacteria Actinobacteria Actinobacteria Actinomycetales
## ASV_79 Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## ASV_6 Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## ASV_41 Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## ASV_73 Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## ASV_141 Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## Family Genus Species
## ASV_174 Streptomycetaceae Streptomyces Unassigned
## ASV_79 Unassigned Unassigned Unassigned
## ASV_6 Unassigned Unassigned Unassigned
## ASV_41 Oxalobacteraceae Unassigned Unassigned
## ASV_73 Oxalobacteraceae Unassigned Unassigned
## ASV_141 Oxalobacteraceae Massilia Unassigned
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
scale_fill_brewer(palette = palname, ...)
}
# Phylum order
x = tapply(sigtab$log2FoldChange, sigtab$Phylum, function(x) max(x))
x = sort(x, TRUE)
sigtab$Phylum = factor(as.character(sigtab$Phylum), levels=names(x))
# Genus order
x = tapply(sigtab$log2FoldChange, sigtab$Genus, function(x) max(x))
x = sort(x, TRUE)
sigtab$Genus = factor(as.character(sigtab$Genus), levels=names(x))
ggplot(sigtab, aes(x=Genus, y=log2FoldChange, color=Phylum)) + geom_point(size=6) +
theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))