今天的隨筆主要涉及運(yùn)用DESeq2對(duì)基因進(jìn)行差異分析,先上代碼:
# 差異表達(dá) -----------------------------------------------------------------------------------
BiocManager::install("DESeq2")
library(DESeq2)
##這里導(dǎo)入了mRNA或miRNA的矩陣--------------------------------------------------------------------
mRNA_exp=readRDS("TCGA_THCA_mRNA.rds")
miRNA_exp <- data.frame(mirCounts_tcgabiolinks)
##1.區(qū)分癌和正常組織-----------------------------------------------------------------------------
metadata_RNA <- data.frame(sample=ifelse(substring(names(mRNA_exp),14,15)<10,"cancer","normal"))
##2.添加樣本因子---------------------------------------------------------------------------------
metadata_RNA$sample <- as.factor(metadata_RNA$sample)
##3.用已知列名命名行名---------------------------------------------------------------------------
rownames(metadata_RNA)=names(mRNA_exp)
##4.列出癌與正常組織數(shù)---------------------------------------------------------------------------
table(metadata_RNA$sample)
##5.表達(dá)矩陣的標(biāo)準(zhǔn)化-----------------------------------------------------------------------------
mycounts <- mRNA_exp
dds <-DESeqDataSetFromMatrix(countData=mycounts,
colData=metadata_RNA,
design=~sample)
##6.表達(dá)差異分析(DESeq)------------------------------------------------------------------------
DESeq_mRNA_dds <- DESeq(dds)
##7.保存差異分析文件-----------------------------------------------------------------------------
save(DESeq_mRNA_dds,file = "TCGA_mRNA_DESeq.Rdata")
load("TCGA_mRNA_DESeq.Rdata")
##8.得到差異分析結(jié)果-----------------------------------------------------------------------------
res_RNA=results(DESeq_mRNA_dds, contrast=c("sample","cancer","normal"),tidy=TRUE)
DEseq_diff=subset(res_RNA,padj<0.05&abs(log2FoldChange)>1)
##前者為分子晨另,后者為分母
那么潭千,當(dāng)我們?cè)谕瓿闪薽RNA的差異分析后,ID的命名我們看到是如下情況:
mRNA差異分析.png
rowname的代碼是以ENSG開(kāi)頭的借尿,這里我們就要提一下EMBL數(shù)據(jù)庫(kù)命名方式(http://asia.ensembl.org/index.html)刨晴,在網(wǎng)站內(nèi)能夠查到一對(duì)一的對(duì)應(yīng)關(guān)系,但是路翻,當(dāng)遇到很多ID的轉(zhuǎn)換時(shí)狈癞,我們就要考慮一下如何借助R語(yǔ)言完成了。
# KEGG ---------------------------------------------------------------------------------------------------------
##01 安裝 clusterProfiler和org.Hs.eg.db---------------------------------------------------------------------
BiocManager::install("clusterProfiler")
BiocManager::install("org.Hs.eg.db")
##02 加載clusterProfiler和org.Hs.eg.db----------------------------------------------------------------------
library(clusterProfiler)
library(org.Hs.eg.db)
##03 將ENSEMBL轉(zhuǎn)換為SYMBOL
eg = bitr(res_RNA$row, fromType="ENSEMBL", toType=c("ENTREZID","SYMBOL"), OrgDb="org.Hs.eg.db")
write.csv(eg,"res_RNA.csv")
##04 運(yùn)用KEGG富集相關(guān)功能基因-------------------------------------------------------------------------------
kk <- enrichKEGG(gene = eg$ENTREZID,
organism = 'hsa',
pvalueCutoff = 0.05)
head(kk)
##05 富集功能作圖-------------------------------------------------------------------------------------------
library(enrichplot)
barplot(kk, showCategory=20)
dotplot(kk, showCategory=20)
以上是通過(guò)clusterProfiler和org.Hs.eg.db包完成的ID轉(zhuǎn)換茂契。