這個(gè)推文已經(jīng)發(fā)布在生信技能樹公眾號:
RNAseq數(shù)據(jù),下載GEO中的FPKM文件后該怎么下游分析
- 文獻(xiàn)標(biāo)題是:Oncogenic lncRNA downregulates cancer cell antigen presentation and intrinsic tumor suppression不過不需要看文章
- 6個(gè)樣本警检,分成2組孙援,是RPKM值表達(dá)矩陣,做差異分析扇雕,看GO通路拓售,跟文章比較
- 新的作業(yè):(f) Enrichment of GO biological process (BP) terms for up-regulated genes (red) and down-regulated genes in tumor versus normal samples (n?=?3, 3 animals). (g-i) Log2 of fold changes of indicated metabolites in MMTV-Tg(LINK-A) breast tumor compared to that of Tg(LINK-A) mammary gland (n?=?3 animals respectively).
1.下載數(shù)據(jù)GSE113143并加載數(shù)據(jù)
a=read.table('GSE113143_Normal_Tumor_Expression.tab.gz',sep='\t',quote = "",fill = T,
comment.char = "!",header = T) # 提取表達(dá)矩陣
rownames(a)=a[,1]
a <- a[,-1]
- TPM值就是RPKM的百分比:關(guān)于TPM的解釋可以看看這個(gè)
- What the FPKM? A review of RNA-Seq expression units
- Question: Differential expression analysis starting from TPM data
2.將FPKM轉(zhuǎn)換為TPM
expMatrix <- a
fpkmToTpm <- function(fpkm)
{
exp(log(fpkm) - log(sum(fpkm)) + log(1e6))
}
tpms <- apply(expMatrix,2,fpkmToTpm)
tpms[1:3,]
colSums(tpms)
#輸出結(jié)果:
> tpms[1:3,]
N1 N2 N3 T1 T2 T3
0610005C13Rik 0.232 0.1715 0.00 0.00 0.00 0.00
0610007P14Rik 48.391 39.2632 46.04 50.04 59.05 67.29
0610009B22Rik 47.491 58.5954 54.27 49.79 53.13 58.00
> colSums(tpms)
N1 N2 N3 T1 T2 T3
1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
3.差異分析:
group_list=c(rep('Normal',3),rep('Tumor',3))
## 強(qiáng)制限定順序
group_list <- factor(group_list,levels = c("Normal","Tumor"),ordered = F)
#表達(dá)矩陣數(shù)據(jù)校正
exprSet <- tpms
boxplot(exprSet,outline=FALSE, notch=T,col=group_list, las=2)
library(limma)
exprSet=normalizeBetweenArrays(exprSet)
boxplot(exprSet,outline=FALSE, notch=T,col=group_list, las=2)
#判斷數(shù)據(jù)是否需要轉(zhuǎn)換
exprSet <- log2(exprSet+1)
#差異分析:
dat <- exprSet
design=model.matrix(~factor( group_list ))
fit=lmFit(dat,design)
fit=eBayes(fit)
options(digits = 4)
topTable(fit,coef=2,adjust='BH')
bp=function(g){
library(ggpubr)
df=data.frame(gene=g,stage=group_list)
p <- ggboxplot(df, x = "stage", y = "gene",
color = "stage", palette = "jco",
add = "jitter")
# Add p-value
p + stat_compare_means()
}
deg=topTable(fit,coef=2,adjust='BH',number = Inf)
head(deg)
#save(deg,file = 'deg.Rdata')
劃重點(diǎn):以下代碼、方法全來自生信技能樹的最新推文:為R包寫一本書(向Y叔致敬)
4.做完差異分析
- GEO數(shù)據(jù)挖掘代碼镶奉,很容易得到上下調(diào)基因础淤,而且轉(zhuǎn)為ENTREZID,后續(xù)分析都以這個(gè)為主線哨苛。
- 根據(jù)原文文獻(xiàn)中:
Differential gene expression was defined if the fold change >1.5 and P?<?0.05 between tumor and normal samples
找差異基因
## 不同的閾值鸽凶,篩選到的差異基因數(shù)量就不一樣,后面的超幾何分布檢驗(yàn)結(jié)果就大相徑庭建峭。
if(T){
logFC_t=1.5
deg$g=ifelse(deg$P.Value>0.05,'stable',
ifelse( deg$logFC > logFC_t,'UP',
ifelse( deg$logFC < -logFC_t,'DOWN','stable') )
)
table(deg$g)
head(deg)
deg$symbol=rownames(deg)
library(ggplot2)
library(clusterProfiler)
library(org.Mm.eg.db)
df <- bitr(unique(deg$symbol), fromType = "SYMBOL",
toType = c( "ENTREZID"),
OrgDb = org.Mm.eg.db)
head(df)
DEG=deg
head(DEG)
DEG=merge(DEG,df,by.y='SYMBOL',by.x='symbol')
head(DEG)
save(DEG,file = 'anno_DEG.Rdata')
gene_up= DEG[DEG$g == 'UP','ENTREZID']
gene_down=DEG[DEG$g == 'DOWN','ENTREZID']
}
5.最簡單的超幾何分布檢驗(yàn):
# 最簡單的超幾何分布檢驗(yàn)
###這里就拿KEGG數(shù)據(jù)庫舉例吧玻侥,拿自己判定好的上調(diào)基因集進(jìn)行超幾何分布檢驗(yàn),如下
if(T){
gene_down
gene_up
enrichKK <- enrichKEGG(gene = gene_up,
organism = 'mmu',
#universe = gene_all,
pvalueCutoff = 0.05,
qvalueCutoff =0.05)
head(enrichKK)[,1:6]
browseKEGG(enrichKK, 'hsa04512')
dotplot(enrichKK)
ggsave("enrichKK.png")
enrichKK=DOSE::setReadable(enrichKK, OrgDb='org.Mm.eg.db',keyType='ENTREZID')
enrichKK
}
##最基礎(chǔ)的條形圖和點(diǎn)圖
#條帶圖
barplot(enrichKK,showCategory=20)
#氣泡圖
dotplot(enrichKK)
- 通路與基因之間的關(guān)系可視化
#通路與上調(diào)基因之間的關(guān)系可視化
###制作genlist三部曲:
## 1.獲取基因logFC
DEG_up <- DEG[DEG$g == 'UP',]
geneList <- DEG_up$logFC
## 2.命名
names(geneList) = DEG_up$ENTREZID
## 3.排序很重要
geneList = sort(geneList, decreasing = TRUE)
head(geneList)
cnetplot(enrichKK, categorySize="pvalue", foldChange=geneList,colorEdge = TRUE)
cnetplot(enrichKK, foldChange=geneList, circular = TRUE, colorEdge = TRUE)
ggsave("enrichKK_cnetplot.png")
- 通路與通路之間的連接展示
#通路與通路之間的連接展示
emapplot(enrichKK)
ggsave("enrichKK_emapplot.png")
- 熱圖展現(xiàn)通路與基因之間的關(guān)系
#熱圖展現(xiàn)通路與基因之間的關(guān)系
heatplot(enrichKK)
ggsave("enrichKK_heatplot.png")
- 如果你是做GO數(shù)據(jù)庫呢亿蒸,其實(shí)還有一個(gè)goplot可以試試看凑兰,當(dāng)然是以Y叔的書為主啦。
#如果你是做GO數(shù)據(jù)庫呢边锁,其實(shí)還有一個(gè)goplot可以試試看
ego_bp_up<-enrichGO(gene = DEG_up$ENTREZID,
OrgDb = org.Mm.eg.db,
keyType = 'ENTREZID',
ont = "BP",
pAdjustMethod = "BH",
pvalueCutoff = 0.01,#0.01
qvalueCutoff = 0.05)
goplot(ego_up)
ggsave("ego_bp_up_goplot.png")
head(ego)
library(stringr)
barplot(ego_bp_up,showCategory = 16,title="The GO_BP enrichment analysis of all DEGs ")+
scale_size(range=c(2, 12))+
scale_x_discrete(labels=function(ego_bp) str_wrap(ego_bp,width = 25))
ggsave("ego_bp_up_barplot.png")
-
同樣的方式看看下調(diào)基因的GO_BP:
-
和文獻(xiàn)中的GO_BP比較一下