phyloseq: XXX pipeline

TwoWen,XX, XX, XX, XX,Yong Xin Liu*, and Jun Yuan*

*correspondence: YongXin Liu <>and Jun Yuan <>

introduction

phyloseq:使用R语言分析微生物群落(microbiome census data) 目前对微生物群落的分析有许多挑战:使用生态学,遗传学,系统发育学,网络分析等方法对不同类型的微生物群落数据进行整合,可视化分析(visualization and testing)。微生物群落数据本身可能来源广泛,例如:人体微生物,土壤,海面及其海水,污水处理厂,工业设施等; 因此,不同来源微生物群落就其实验设计和科学问题可能有非常大的差异。phyloseq软件包是一个对在聚类OTU后的微生物群落(数据包括:OTU表格,系统发育树,OTU注释信息)进行下游系统发育等分析的综合性工具,集成了微生物群落数据导入,存储,分析和出图等功能。该软件包利用R中的许多工具进行生态学和系统发育分析(vegan, ade4, ape, picante),同时还使用先进/灵活的图形系统(ggplot2)轻松的对系统发育数据绘制发表级别的图表。phyloseq使用S4类的系统将所有相关的系统发育测序数据存储为单个实验级对象,从而更容易共享数据并重现分析。通常,phyloseq寻求促进使用R进行OTU聚类的高通量系统发育测序数据的有效探索和可重复分析。

pipeline-phyloseq

基于phyloseq分析微生物组数据和传统方式不同,S4对象的使用给phyloseq处理微生物组数据带来了方便,但是也有门槛了,比如构建phyloseq对象。

构造phyloseq对象

phyloseq包含五个内容,分别是OTU表格,物种注释表格,样本分组文件,进化树文件和代表序列文件。其中OTU表格和物种注释文件要求格式为矩阵,样本分组文件要求为数据框,进化树格式为RWK,代表序列为fa文件。

下满个构造phyloseq对象。

library(ggplot2)
library(phyloseq)
library(ggClusterNet)# github:taowenmicro/ggCLusterNet

准备数据-构建phyloseq对象

本次数据来源于刘永鑫老师的研究,发表于XXX。这里存储在github,直接运行下方代码即可直接下载和再导入

# metadata = read.delim("https://raw.githubusercontent.com/taowenmicro/R-_function/main/metadata.tsv",row.names = 1)
# otutab = read.delim("https://raw.githubusercontent.com/taowenmicro/R-_function/main/otutab.txt", row.names=1)
# taxonomy = read.table("https://raw.githubusercontent.com/taowenmicro/R-_function/main/taxonomy.txt", row.names=1,header = T)
# tree  = read_tree("https://raw.githubusercontent.com/taowenmicro/R-_function/main/otus.tree")
# library(Biostrings)
# rep = readDNAStringSet("https://raw.githubusercontent.com/taowenmicro/R-_function/main/otus.fa")
# 
# ps = phyloseq(sample_data(metadata),
#               otu_table(as.matrix(otutab), taxa_are_rows=TRUE),
#               tax_table(as.matrix(taxonomy)), phy_tree(tree),refseq(rep)
#               )

data(ps)

对群落数据进行基本描述和分析

#统计OTU数量
ntaxa(ps)
## [1] 2861
# 统计样品数量
nsamples(ps)
## [1] 18
# 查看样品名称
sample_names(ps)[1:5]
## [1] "KO1" "KO2" "KO3" "KO4" "KO5"
# 查看物种分类等级
rank_names(ps)
## [1] "Kingdom" "Phylum"  "Class"   "Order"   "Family"  "Genus"   "Species"
# 查看分组文件表头信息
sample_variables(ps)
##  [1] "Group"                "Date"                 "Site"                
##  [4] "Sequencing"           "Platform"             "Species"             
##  [7] "Batch"                "BarcodeSequence"      "LinkerPrimerSequence"
## [10] "ReversePrimer"
# 部分可视化OTU表格
otu_table(ps)[1:5, 1:5]
## OTU Table:          [5 taxa and 5 samples]
##                      taxa are rows
##           KO1  KO2 KO3  KO4  KO5
## ASV_1    1113 1968 816 1372 1062
## ASV_1591   12   10   1   14    3
## ASV_657     5    7   6    9    8
## ASV_28    253  303  54  439  132
## ASV_1717    3    6   0    1    0
# 部分可视化tax注释表格
tax_table(ps)[1:5, 1:4]
## Taxonomy Table:     [5 taxa by 4 taxonomic ranks]:
##          Kingdom    Phylum           Class            Order            
## ASV_1    "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
## ASV_1591 "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
## ASV_657  "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
## ASV_28   "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
## ASV_1717 "Bacteria" "Actinobacteria" "Actinobacteria" "Actinomycetales"
# 查看进化树文件
phy_tree(ps)
## 
## Phylogenetic tree with 2861 tips and 2860 internal nodes.
## 
## Tip labels:
##   ASV_1, ASV_1591, ASV_657, ASV_28, ASV_1717, ASV_2407, ...
## 
## Rooted; includes branch lengths.

数据清洗和标准化

#-基于样本-----去除序列量比较少的样本
ps <- prune_samples(sample_sums(ps) >=2000,ps);ps
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 2861 taxa and 18 samples ]
## sample_data() Sample Data:       [ 18 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 2861 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 2861 tips and 2860 internal nodes ]
#--基于样本----去除低丰度OTU,这里去除全部read为0的OTU
ps = filter_taxa(ps, function(x) sum(x ) > 0 , TRUE)

alpha多样性分析

p0 <- plot_richness(ps, x="Group", color="Group", measures=c("Chao1", "Observed")) 

p0$layers[[2]] = NULL # 去除误差线
# 添加箱型图
p0 <- p0  + geom_boxplot() + theme_bw()

ggsave("./Fig1_alpha.png",p0,width = 6,height = 4)

beta多样性-排序分析可视化

ps_rela = transform_sample_counts(ps, function(x) 1E6 * x/sum(x))
ord <- ordinate(ps_rela, "NMDS", "bray")
## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.07711872 
## Run 1 stress 0.07711868 
## ... New best solution
## ... Procrustes: rmse 7.169087e-05  max resid 0.0002338007 
## ... Similar to previous best
## Run 2 stress 0.09655982 
## Run 3 stress 0.0941294 
## Run 4 stress 0.07711869 
## ... Procrustes: rmse 5.314733e-05  max resid 0.0001746393 
## ... Similar to previous best
## Run 5 stress 0.08757124 
## Run 6 stress 0.07711873 
## ... Procrustes: rmse 9.101156e-05  max resid 0.0002970059 
## ... Similar to previous best
## Run 7 stress 0.07711868 
## ... New best solution
## ... Procrustes: rmse 3.320922e-05  max resid 0.0001085458 
## ... Similar to previous best
## Run 8 stress 0.09412915 
## Run 9 stress 0.07711871 
## ... Procrustes: rmse 7.731383e-05  max resid 0.0002522644 
## ... Similar to previous best
## Run 10 stress 0.09541457 
## Run 11 stress 0.09412917 
## Run 12 stress 0.09541427 
## Run 13 stress 0.09414427 
## Run 14 stress 0.09414429 
## Run 15 stress 0.08880238 
## Run 16 stress 0.3719181 
## Run 17 stress 0.07711878 
## ... Procrustes: rmse 0.0001730924  max resid 0.0005655198 
## ... Similar to previous best
## Run 18 stress 0.07711876 
## ... Procrustes: rmse 0.000151849  max resid 0.0004953314 
## ... Similar to previous best
## Run 19 stress 0.08780134 
## Run 20 stress 0.09612165 
## *** Solution reached
ord
## 
## Call:
## metaMDS(comm = veganifyOTU(physeq), distance = distance) 
## 
## global Multidimensional Scaling using monoMDS
## 
## Data:     wisconsin(sqrt(veganifyOTU(physeq))) 
## Distance: bray 
## 
## Dimensions: 2 
## Stress:     0.07711868 
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation, halfchange scaling 
## Species: expanded scores based on 'wisconsin(sqrt(veganifyOTU(physeq)))'
p1 = plot_ordination(ps, ord, type="samples", color="Group", title="Ordination") + theme_bw()
p1

ggsave("./Fig2_ordination.png",p1,width = 5,height = 5)

微生物物种组成分析

堆叠柱状图在不同分类水平按照不同分组展示微生物的丰度信息,遗憾的是这个函数如果可以展示高丰度的几十个微生物,并且将其他微生物合并在一起作为others就好了。

ps_rela = transform_sample_counts(ps, function(x) x/sum(x))
#--相对丰都标准化后展示全部OTU-使用门水平着色。
p2 <- plot_bar(ps_rela, fill="Phylum") + theme_bw()
#--展示部分感兴趣的微生物
gp.ch = subset_taxa(ps_rela, Phylum == "Chlamydiae")
plot_bar(gp.ch, fill="Genus") 

plot_bar(gp.ch, x="Group", fill="Genus")

plot_bar(gp.ch, "Group", fill="Genus", facet_grid=~Family)

ggsave("./Fig3_Phylum_abundance_bar.png",p2,width = 8,height = 8)

热图展示微生物的丰度信息

ps_rela = transform_sample_counts(ps, function(x) x/sum(x))
gpt <- subset_taxa(ps_rela, Kingdom=="Bacteria")
gpt <- prune_taxa(names(sort(taxa_sums(ps),TRUE)[1:30]), gpt)
#默认从OTU水平展示丰度热图
plot_heatmap(gpt, sample.label="Group") 

#--该函数可以使用排序方法对行和列进行排序分析,并且按照结构进行顺序安排。
p3 <- plot_heatmap(gpt, "NMDS", "bray", "Group", "Family") +
  scale_fill_gradientn(colours =colorRampPalette(RColorBrewer::brewer.pal(4,"Set3"))(60))

ggsave("./Fig4_Family_high_abundance_heatmap.png",p3,width = 6,height = 6)

进化树可视化

p4 <- plot_tree(gpt,  color="Group", label.tips="taxa_names", ladderize="left",size="abundance", base.spacing=0.8) + 
  scale_size_continuous()

plot_tree(gpt,  color="Group", label.tips="taxa_names", ladderize="left",size="abundance", base.spacing=0.5) + coord_polar(theta="y") +
  scale_size_continuous()

ggsave("./Fig5_phytree_high_abundance_point.png",p4,width = 8,height = 8)

微生物网络分析

myTaxa = names(sort(taxa_sums(ps), decreasing = TRUE)[1:50])
ps1 = prune_taxa(myTaxa, ps)
ps1
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 50 taxa and 18 samples ]
## sample_data() Sample Data:       [ 18 samples by 10 sample variables ]
## tax_table()   Taxonomy Table:    [ 50 taxa by 7 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 50 tips and 49 internal nodes ]
p5 <- plot_net(ps1,type = "taxa",maxdist = 0.6,color = "Phylum")
ggsave("./Fig6_Micro_network.png",p5,width = 8,height = 8)
# p0/p4|p1/p4|p2/p3

phyloseq&DEsep2:差异分析

library(DESeq2)
# gm_mean = function(x, na.rm=TRUE){
#   exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
# }

diagdds = phyloseq_to_deseq2(ps, ~ Group)
## converting counts to integer mode
# calculate geometric means prior to estimate size factors
gm_mean = function(x, na.rm=TRUE){
  exp(sum(log(x[x > 0]), na.rm=na.rm) / length(x))
}
geoMeans = apply(counts(diagdds), 1, gm_mean)
diagdds = estimateSizeFactors(diagdds, geoMeans = geoMeans)
diagdds = DESeq(diagdds, fitType="local")

res = results(diagdds, cooksCutoff = FALSE)
alpha = 0.01
sigtab = res[which(res$padj < alpha), ]
sigtab = cbind(as(sigtab, "data.frame"), as(tax_table(ps)[rownames(sigtab), ], "matrix"))
head(sigtab)
##          baseMean log2FoldChange     lfcSE      stat       pvalue        padj
## ASV_174  33.03969      3.4870081 0.8432392  4.135254 3.545625e-05 0.003381463
## ASV_79   82.21388     -0.9250167 0.2133781 -4.335105 1.456906e-05 0.002086737
## ASV_6   625.40881     -1.0358844 0.2333951 -4.438329 9.066008e-06 0.001688091
## ASV_41  154.25212     -1.2313562 0.2991679 -4.115937 3.856089e-05 0.003419066
## ASV_73   85.50976     -1.3770650 0.3129025 -4.400940 1.077831e-05 0.001803533
## ASV_141  42.06084     -1.4928966 0.3334011 -4.477779 7.542366e-06 0.001560432
##          Kingdom         Phylum              Class           Order
## ASV_174 Bacteria Actinobacteria     Actinobacteria Actinomycetales
## ASV_79  Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## ASV_6   Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## ASV_41  Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## ASV_73  Bacteria Proteobacteria Betaproteobacteria Burkholderiales
## ASV_141 Bacteria Proteobacteria Betaproteobacteria Burkholderiales
##                    Family        Genus    Species
## ASV_174 Streptomycetaceae Streptomyces Unassigned
## ASV_79         Unassigned   Unassigned Unassigned
## ASV_6          Unassigned   Unassigned Unassigned
## ASV_41   Oxalobacteraceae   Unassigned Unassigned
## ASV_73   Oxalobacteraceae   Unassigned Unassigned
## ASV_141  Oxalobacteraceae     Massilia Unassigned
theme_set(theme_bw())
scale_fill_discrete <- function(palname = "Set1", ...) {
    scale_fill_brewer(palette = palname, ...)
}
# Phylum order
x = tapply(sigtab$log2FoldChange, sigtab$Phylum, function(x) max(x))
x = sort(x, TRUE)
sigtab$Phylum = factor(as.character(sigtab$Phylum), levels=names(x))
# Genus order
x = tapply(sigtab$log2FoldChange, sigtab$Genus, function(x) max(x))
x = sort(x, TRUE)
sigtab$Genus = factor(as.character(sigtab$Genus), levels=names(x))
ggplot(sigtab, aes(x=Genus, y=log2FoldChange, color=Phylum)) + geom_point(size=6) + 
  theme(axis.text.x = element_text(angle = -90, hjust = 0, vjust=0.5))