本文使用某宏基因組測序數(shù)據(jù)中的種水平類群豐度表,并直接根據(jù)所有種相對豐度總和叛拷,提取出top10豐度的種類群,作圖展示這些種的豐度組成猜憎,top10外的類群合并為“Others”搁料。
-
作圖數(shù)據(jù)說明
-
文件“species_RPM_table-NC.txt”為某宏基因組測序所得的種水平類群豐度(RPM)表格(即界門綱目科屬種的“門”這一水平)奈嘿,第一列(Taxonomy)為每類物種的名稱量承,第一行為樣本ID(名稱及順序要與以下兩個文件一致)好唯;每一行為一類種馍资,交叉區(qū)域為每類種在各樣本中的豐度(RPM)筒主,內(nèi)容如下:
-
文件“sample_frequency.txt”為各樣本的總序列數(shù)(total mapped reads),第1列為樣本ID,第2列為物種數(shù)量乌妙,第3列為總序列數(shù)使兔,內(nèi)容如下:
-
文件“sample_group.txt”為各樣本的分組信息,第一列為各樣本名稱冠胯;第二列(group)為各樣本的分組信息火诸,其內(nèi)容如下:
-
繪圖腳本說明
操作環(huán)境: Rstudio
腳本保存位置:D:\ ...\new_database\S_table\RPM\cutoff-0.5\draw_taxa_barplot.r
####讀取并挑選 top10 豐度種(RPM),并繪制物種豐度堆疊柱狀圖
#讀取數(shù)據(jù):特征表
species <- read.delim('species_RPM_table-NC.txt', row.names = 1, sep = '\t', stringsAsFactors = FALSE, check.names = FALSE)
#求各類群的豐度總和荠察,并排序
species$sum <- rowSums(species)
species <- species[order(species$sum, decreasing = TRUE), ]
#挑選 top10 種類群置蜀,并將 top10 外的類群合并為“Others”
species_top10 <- species[1:10, -ncol(species)]
## 讀取數(shù)據(jù):樣本的總特征序列數(shù)
sample_frequency <- read.delim('sample_frequency.txt', row.names = 1, sep = '\t', stringsAsFactors = FALSE, check.names = FALSE)
## others:用每個樣本對應(yīng)的總特征序列數(shù)減去top10
species_top10['Others', ] <- sample_frequency$feature_count - colSums(species_top10)
## others:當使用相對豐度表時,用1減去top10物種的相對豐度
# species_top10['Others', ] <- 1 - colSums(species_top10)
#可選輸出(如 csv 格式)
write.csv(species_top10, 'species_top10.csv', quote = FALSE)
####ggplot2 作圖
library(reshape2) #用于排列數(shù)據(jù)
library(ggplot2) #ggplot2 作圖
#調(diào)整數(shù)據(jù)布局
species_top10$Taxonomy <- factor(rownames(species_top10), levels = rev(rownames(species_top10)))
species_top10 <- melt(species_top10, id = 'Taxonomy')
#添加分組:variable列為各樣本名稱悉盆,Taxonomy列為各類群名稱盯荤,value列為各樣本中各類群的相對豐度,group列為各樣本的分組信息
group <- read.delim('sample_group.txt', sep = '\t', stringsAsFactors = FALSE)
names(group)[1] <- 'variable'
species_top10 <- merge(species_top10, group, by = 'variable')
## 指定橫坐標(樣本ID)順序
species_top10$variable=factor(species_top10$variable,levels = c("A0065","A4313","C0065","C4313","C4445","V0065","V4313","V4445"))
#繪制帶分面的柱狀圖
# p <- ggplot(species_top10, aes(variable, 100 * value, fill = Taxonomy)) + ##相對豐度乘以100焕盟,求百分比
p <- ggplot(species_top10, aes(variable, value, fill = Taxonomy)) +
geom_col(position = 'stack', width = 0.6) +
facet_wrap(~group, scales = 'free_x', ncol = 3) + ### ncol 分組數(shù)量
scale_fill_manual(values = rev(c('blue', 'orange', 'green', 'yellow', 'red', 'hotpink', 'cyan','purple', 'burlywood1', 'skyblue', 'gray'))) +
# labs(x = '', y = 'Relative Abundance(%)') +
labs(x = '', y = 'RPM') +
theme(panel.grid = element_blank(), panel.background = element_rect(color = 'black', fill = 'transparent'), strip.text = element_text(size = 12)) + ##設(shè)定分面標題字體大小
theme(axis.text = element_text(size = 10), axis.title = element_text(size = 13), legend.title = element_blank(), legend.text = element_text(size = 11))
## 使用geom_col()繪制柱狀圖秋秤,position = 'stack',堆疊柱狀圖脚翘,width = 0.6灼卢,柱子寬度;labs()設(shè)置坐標軸標簽来农;theme()中鞋真,axis.text和axis.title分別指定坐標軸刻度字體大小及標簽字體大小,legend.text指定圖例字體大小沃于。
## 保存圖片:結(jié)果如下圖
ggsave('species_taxa_barplot.png', p, width = 8, height = 6)
#### 其他方法:barplot 作圖(一個簡單示例)
#定義作圖整體分布
png('barplot_plot.png', width = 1000, height = 700)
par(xpd = TRUE, mar = par()$mar + c(1, 3, 1, 16))
#barplot作圖涩咖,包含了顏色、字體大小繁莹、圖例位置等信息
barplot(as.matrix(100 * phylum_top10), col = c('blue', 'orange', 'green', 'yellow', 'red', 'hotpink', 'cyan','purple', 'burlywood1', 'skyblue', 'gray'),
legend = rownames(species_top10),
cex.axis = 2, cex.names = 2, ylim = c(0, 100), las = 1, width = 0.5, space = 0.5, beside = FALSE,
args.legend = list(x = 'right', bty = 'n', inset = -0.18, cex = 2, y.intersp = 1.2, x.intersp = 0.7, text.width = 1))
mtext('Relative Abundance(%)', cex = 2, side = 2, line = 4)
dev.off()