Paper
Microbiome differential abundance methods produce different results across 38 datasets
Data link
https://figshare.com/articles/dataset/16S_rRNA_Microbiome_Datasets/14531724
Code link
https://github.com/nearinj/Comparison_of_DA_microbiome_methods
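The author's GitHub page also hosts data and code for other papers.
https://github.com/jnmacdonald/differential-abundance-analysis This repository contains a lot of code labelled as differential abundance analysis.
Over the past couple of days I have been looking at how to run differential abundance analysis on OTU abundance data from metagenomics, and I came across this paper. Judging from the abstract, it compares how far the results of different differential abundance methods agree or differ. Here I reproduce its code for differential abundance analysis with DESeq2.
The dataset I use here is:
- Abundance data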
ArcticFireSoils_genus_table.tsv
- Grouping data
ArcticFireSoils_meta.tsv
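One open question: the paper provides two abundance tables for this dataset, and the second one carries a rare suffix. For now I do not know what the difference between the two is.
First, read in the datasets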
ASV_table <- read.table("metagenomics/dat01/ArcticFireSoils_genus_table.tsv",
sep="\t",
skip=1,
header=T,
row.names = 1,
comment.char = "",
quote="", check.names = F)
groupings <- read.table("metagenomics/dat01/ArcticFireSoils_meta.tsv",
sep="\t",
row.names = 1,
header=T,
comment.char = "",
quote="",
check.names = F)
dim(ASV_table)
dim(groupings)
groupings$Fire<-factor(groupings$Fire)
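The grouping variable has to be converted to a factor here; otherwise the DESeq2 step later on emits a warning message.
Check whether the sample names in the abundance data and in the grouping data are in the same order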
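As a minimal sketch of what the conversion does, here is a toy example (the data frame `toy_meta` and its sample names are made up for illustration, not taken from ArcticFireSoils_meta.tsv). A character column has no levels, while a factor has explicit levels, and R modelling functions generally treat the first level as the reference.

```r
# Toy metadata; sample names and group labels are hypothetical
toy_meta <- data.frame(Fire = c("Fire", "Control", "Fire", "Control"),
                       row.names = c("S1", "S2", "S3", "S4"),
                       stringsAsFactors = FALSE)

is.factor(toy_meta$Fire)    # the column starts out as character

toy_meta$Fire <- factor(toy_meta$Fire)
is.factor(toy_meta$Fire)
levels(toy_meta$Fire)       # levels are sorted alphabetically by default

# relevel() puts a chosen level first, making it the reference level
toy_meta$Fire <- relevel(toy_meta$Fire, ref = "Control")
```

Setting the reference level explicitly is optional here, since the contrast is spelled out later in the call to results().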
identical(colnames(ASV_table), rownames(groupings))
This returns FALSE, so the two do not match.
Take the intersection of the two sets of sample names
rows_to_keep <- intersect(colnames(ASV_table), rownames(groupings))
Subset and reorder both tables using the intersection
groupings <- groupings[rows_to_keep,,drop=F]
ASV_table <- ASV_table[,rows_to_keep]
Question: what does the drop argument inside the square brackets do?
Check the sample-name order of the two datasets again
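A small sketch that may help with this question, using a made-up one-column data frame (I have not checked how many columns ArcticFireSoils_meta.tsv actually has). When only one column is left after subsetting, `[` simplifies the result to a plain vector by default; `drop = FALSE` suppresses that and keeps a data frame with its row names.

```r
# Toy one-column metadata; names are hypothetical
toy_meta <- data.frame(Fire = c("Fire", "Control", "Fire"),
                       row.names = c("S1", "S2", "S3"),
                       stringsAsFactors = FALSE)
keep <- c("S3", "S1")

dropped <- toy_meta[keep, ]                # default: result is simplified
kept    <- toy_meta[keep, , drop = FALSE]  # data frame structure is kept

class(dropped)   # a character vector, row names are gone
class(kept)      # still a data.frame
rownames(kept)   # rows follow the order given in `keep`
```

With metadata that has several columns the default would already return a data frame, so `drop = FALSE` mainly acts as a safeguard for the single-column case.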
identical(colnames(ASV_table), rownames(groupings))
This time it returns TRUE.
Rename the first column of the grouping table
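The alignment steps above can be sketched on toy data (the objects `toy_counts` and `toy_meta` and all sample names are invented for illustration). intersect() keeps only the shared names, in the order of its first argument, and indexing both tables with that vector puts them in the same order.

```r
# Toy count matrix: 2 taxa x 3 samples (hypothetical values)
toy_counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c("g1", "g2"), c("S1", "S2", "S3")))
# Toy metadata in a different order, with one sample missing from the counts
toy_meta <- data.frame(Fire = c("Fire", "Control", "Fire"),
                       row.names = c("S3", "S4", "S1"),
                       stringsAsFactors = FALSE)

identical(colnames(toy_counts), rownames(toy_meta))   # not aligned yet

shared <- intersect(colnames(toy_counts), rownames(toy_meta))
toy_counts <- toy_counts[, shared, drop = FALSE]
toy_meta   <- toy_meta[shared, , drop = FALSE]

identical(colnames(toy_counts), rownames(toy_meta))   # aligned after subsetting
```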
colnames(groupings)[1] <- "Groupings"
Differential abundance analysis
library(DESeq2)
dds <- DESeq2::DESeqDataSetFromMatrix(countData = ASV_table,
colData=groupings,
design = ~ Groupings)
dds_res <- DESeq2::DESeq(dds, sfType = "poscounts")
res <- results(dds_res,
tidy=T,
format="DataFrame",
contrast = c("Groupings","Fire","Control"))
head(res)
Volcano plot code
DEG<-res
# Drop rows containing NA first, so that NA rows are not counted in the title below
DEG<-na.omit(DEG)
logFC_cutoff<-2
DEG$change<-as.factor(ifelse(DEG$pvalue<0.05&abs(DEG$log2FoldChange)>logFC_cutoff,
ifelse(DEG$log2FoldChange>logFC_cutoff,"UP","DOWN"),
"NOT"))
this_title <- paste0('Cutoff for logFC is ',round(logFC_cutoff,3),
'\nThe number of up gene is ',nrow(DEG[DEG$change =='UP',]) ,
'\nThe number of down gene is ',nrow(DEG[DEG$change =='DOWN',]))
library(ggplot2)
ggplot(data=DEG,aes(x=log2FoldChange,
y=-log10(pvalue),
color=change))+
geom_point(alpha=0.8,size=3)+
labs(x="log2 fold change")+ ylab("-log10 pvalue")+
ggtitle(this_title)+theme_bw(base_size = 20)+
theme(plot.title = element_text(size=15,hjust=0.5))+
scale_color_manual(values=c('#a121f0','#bebebe','#ffad21')) -> p1
p1+xlim(NA,10)+ylim(NA,30) -> p2
library(patchwork)
p1+p2
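The volcano plot code above thresholds on the raw pvalue column. DESeq2 results also contain a padj column with adjusted p-values, and a stricter variant would threshold on that instead. Here is a minimal sketch of the difference, using base R's p.adjust() on made-up numbers (the data frame `toy` and its values are hypothetical, not output from this dataset):

```r
# Hypothetical fold changes and raw p-values for five taxa
toy <- data.frame(log2FoldChange = c(3.1, -2.6, 0.4, 2.2, -0.1),
                  pvalue         = c(0.001, 0.004, 0.5, 0.04, 0.9))

# Benjamini-Hochberg adjustment of the raw p-values
toy$padj_bh <- p.adjust(toy$pvalue, method = "BH")

cutoff <- 2
# Same UP / DOWN / NOT labelling as above, but on adjusted p-values
toy$change <- ifelse(toy$padj_bh < 0.05 & abs(toy$log2FoldChange) > cutoff,
                     ifelse(toy$log2FoldChange > cutoff, "UP", "DOWN"),
                     "NOT")
table(toy$change)
```

In this toy table the fourth taxon has a raw p-value below 0.05 but an adjusted one above it, so it is labelled NOT rather than UP.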
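To find out how to get the example data and code for today's post, see my WeChat official account.
You are welcome to follow my official account:
小明的数据分析笔记本 (Xiaoming's Data Analysis Notebook)
The account mainly shares: 1. simple examples of data analysis and data visualization in R and Python; 2. reading notes on transcriptomics, genomics and population genetics papers related to horticultural plants; 3. introductory bioinformatics learning materials and my own study notes.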