前言
轉(zhuǎn)錄組入門(7): 差異基因分析這個(gè)步驟推薦在R里面做困曙,載入表達(dá)矩陣概而,然后設(shè)置好分組信息诗宣,統(tǒng)一用DEseq2進(jìn)行差異分析,當(dāng)然也可以走走edgeR或者limma的voom流程奥此』“ィ基本任務(wù)是得到差異分析結(jié)果,進(jìn)階任務(wù)是比較多個(gè)差異分析結(jié)果的異同點(diǎn)稚虎。生信技能樹
實(shí)驗(yàn)操作
rm(list=ls())
library(DESeq2)
setwd("E:/0ngs/sam_out/")
expr <- read.table("Akap95_expr.txt",header = T,stringsAsFactors = F)
rownames(expr) <- expr$Gene
expr <- expr[,-1]
表達(dá)矩陣expr如下:
# library(edgeR)
# keepGene <- rowSums(cpm(expr)>0) >=2
# table(keepGene); dim(expr)
group=c('con','con','treat','treat')
colData <- data.frame(row.names=colnames(expr), group=group)
dds <- DESeqDataSetFromMatrix(countData = expr,
colData = colData,
design = ~ group)
dds <- DESeq(dds)
png("qc_dispersions.png", 1000, 1000, pointsize=20)
plotDispEsts(dds, main="Dispersion plot") # 散度估計(jì)
dev.off()
# 提取差異基因
res <- results(dds, contrast = c('group','treat','con'))
resOrdered <- res[order(res$padj),]
DEG <- as.data.frame(resOrdered)
write.csv(DEG,"Akap95-DEG.csv")
差異表達(dá)顯著的基因:
# normalizedCounts1與normalizedCounts2相同
normalizedCounts1 <- t(t(counts(dds)) / sizeFactors(dds))
exprMatrix_rpm=as.data.frame(normalizedCounts1)
normalizedCounts2 <- counts(dds, normalized=T)
# exprMatrix_rpm[grep("Akap8",rownames(exprMatrix_rpm)),]
# 單獨(dú)用DESeq2的normlization
rld <- rlogTransformation(dds)
exprMatrix_rlog=assay(rld)
# 原始表達(dá)值與DESeq2的normlization后的表達(dá)值比較
png("DEseq_RAWvsNORM.png",height = 800,width = 800)
par(cex = 0.7)
n.sample=ncol(expr)
if(n.sample>40) par(cex = 0.5)
cols <- rainbow(n.sample*1.2)
par(mfrow=c(2,2))
boxplot(expr, col = cols,main="expression value",las=2)
boxplot(exprMatrix_rlog, col = cols,main="expression value",las=2)
hist(as.matrix(expr))
hist(exprMatrix_rlog)
dev.off()
PS: 本篇都是用的建明群主的代碼里DESeq2的部分完成的撤嫩。