Before downstream analysis, sample reads are usually randomly rarefied to a common sequencing depth (typically the smallest per-sample read count).
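After amplicon sequencing, once you have the OTU table you are usually asked to rarefy it before any comparative analysis, because comparisons between samples are only meaningful when their sequencing depths are equal. The drawback is that when depths differ a lot between samples, a great deal of data is wasted. Suppose sample A has 30,000 reads and sample B has 100,000: after rarefying, 70,000 of sample B's reads are thrown away. Rarefaction is not the only option, of course. Some papers use methods such as DESeq2 for downstream analysis instead; DESeq2 has its own normalization method, which most people who have worked with transcriptome data will know. Here I will start with the former: how to rarefy.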
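The cost described above is easy to quantify. The following is a minimal sketch in base R using the two hypothetical depths from the example (30,000 and 100,000 reads); the variable names are mine, not from any package.

```r
# Hypothetical sequencing depths taken from the example in the text
depths <- c(A = 30000, B = 100000)

# Rarefying brings every sample down to the smallest depth
target <- min(depths)

# Reads discarded per sample, in absolute terms and as a fraction
discarded <- depths - target
fraction_discarded <- discarded / depths

print(data.frame(depth = depths, kept = target, discarded, fraction_discarded))
```

Running the same calculation on `colSums(otu)` for a real table gives a quick idea of how much data rarefying would cost before you commit to it.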
Option 1: the vegan package
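library(vegan)
library(dplyr)  # provides %>% and select()
# Read the OTU table (rows = OTUs, columns = samples) and drop column 13
otu = read.table('16s_OTU_Table.txt', header=T, sep="\t", quote = "", row.names=1, comment.char="", stringsAsFactors = FALSE) %>% select(-13)
colSums(otu)  # sequencing depth of each sample before rarefying
set.seed(123)  # rrarefy() subsamples at random, so fix the seed for reproducibility
# rrarefy() expects samples in rows, hence the two transposes
otu_rare = as.data.frame(t(rrarefy(t(otu), min(colSums(otu)))))
colSums(otu_rare)  # every sample now has the same depth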
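To make clear what rarefying does to each sample, here is a rough base-R sketch of the idea: treat every read as a labelled ball, draw a fixed number without replacement, and recount. The function `rarefy_one` and the toy table are invented for illustration; this is not vegan's implementation, only the same idea under simple assumptions.

```r
# Rarefy one sample: counts is a named vector of OTU counts
rarefy_one <- function(counts, depth) {
  # One label per read, e.g. "OTU1" repeated 5 times if its count is 5
  reads <- rep(names(counts), times = counts)
  # Draw `depth` reads without replacement
  kept <- sample(reads, size = depth, replace = FALSE)
  # Count the draws per OTU, keeping OTUs that ended up with zero reads
  out <- table(factor(kept, levels = names(counts)))
  setNames(as.integer(out), names(counts))
}

# Toy OTU table (rows = OTUs, columns = samples), invented for illustration
toy <- data.frame(
  S1 = c(50, 30, 15, 5),
  S2 = c(400, 250, 340, 10),
  row.names = paste0("OTU", 1:4)
)

set.seed(123)
depth <- min(colSums(toy))  # rarefy to the smallest sample
toy_rare <- as.data.frame(lapply(toy, function(col) {
  rarefy_one(setNames(col, rownames(toy)), depth)
}))
rownames(toy_rare) <- rownames(toy)

colSums(toy_rare)  # both samples now sum to `depth`
```

The smallest sample (S1) is returned unchanged because all of its reads are drawn, while the larger sample is thinned down to the same total.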
Option 2: the phyloseq package
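library(phyloseq)
set.seed(123)  # rarefying is random, so set a seed to make the result reproducible
otu1 = otu_table(otu, taxa_are_rows = T)
ps = phyloseq(otu1)  # named ps to avoid masking the phyloseq() function
# By default rarefy_even_depth() rarefies to the smallest sample depth and
# drops OTUs that are left with zero reads in every sample.
# replace = TRUE samples reads with replacement (vegan's rrarefy() samples without replacement)
rare.data = rarefy_even_depth(ps, replace = TRUE)
# Message printed: 8OTUs were removed because they are no longer present in any sample after random subsampling
# Compare sample depths before and after rarefying
sample_sums(ps)
sample_sums(rare.data)
# Extract the rarefied OTU table as a data frame
rare.otu = as.data.frame(as(otu_table(rare.data), "matrix"))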
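As you can see, the phyloseq method filters out some low-abundance OTUs: any OTU left with no reads in any sample after subsampling is removed. Which OTUs are lost depends on the random draw, so if you rarefy this way it is best to call set.seed first so the result can be reproduced.
Let's look at the counts of the 8 filtered OTUs in each sample: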
otu[setdiff(rownames(otu),rownames(rare.otu)),]
Hmm, their counts really are quite low, so dropping them does no harm. Neither method is better or worse than the other; pick whichever you prefer!
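For readers who want to run the same check on a table rarefied with vegan, which keeps all-zero rows, the filtering step can be sketched in base R. The `before` and `after` tables below are made-up numbers, not output from either package.

```r
# Made-up counts before and after rarefying (rows = OTUs, columns = samples)
before <- data.frame(S1 = c(60, 39, 1, 0), S2 = c(500, 480, 0, 2),
                     row.names = paste0("OTU", 1:4))
after  <- data.frame(S1 = c(60, 39, 1, 0), S2 = c(52, 48, 0, 0),
                     row.names = paste0("OTU", 1:4))

# Drop OTUs with no reads left in any sample
after_trimmed <- after[rowSums(after) > 0, , drop = FALSE]

# Which OTUs disappeared, and what did they look like before rarefying?
lost <- setdiff(rownames(before), rownames(after_trimmed))
before[lost, , drop = FALSE]
```

In this toy case only OTU4 is lost: it had just 2 reads in the deeper sample, and none of them survived the subsampling.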