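A single-gene pan-cancer expression boxplot is an essential figure in pan-cancer analysis papers, and paired boxplots also appear in many of them. This post shows how to draw both a pan-cancer expression boxplot and a paired boxplot for a single gene.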
Here is the finished result first:
[Figure: preview of the final plots]
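1. Single-gene pan-cancer expression boxplot
First, download the pan-cancer expression matrix from Xena. Download page: https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443 The TPM matrix is usually sufficient.
[Figure: screenshot of the expression matrix download]
Then download the clinical information from the same page.
[Figure: screenshot of the clinical information download]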
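Use the tidyverse R package to merge and transpose the data, producing the data frame below, in which each row is a sample and each column is a gene or a clinical variable. Surprisingly, the TPM values here include negative numbers; this is most likely because Xena distributes them on a log2 scale, so check the unit listed on the dataset page. Alternatively, follow the post "2022 updated guide to downloading and organizing TCGA data" to download the files by hand and merge them manually (that post downloads the count matrix, but you can take the tpm column instead).
[Figure: preview of the merged data frame]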
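The merge and transpose step can be sketched on toy data as follows. This is only a minimal base R sketch, not the author's actual script: the barcodes, values, and the column names `gene`, `ID`, and `Cancer` are assumptions made for illustration. The `Type` column is derived from the TCGA barcode, where characters 14-15 hold the sample type code (01-09 tumor, 10-19 normal).

```r
# Toy expression matrix as downloaded: genes in rows, samples in columns.
# All barcodes and values are made up for illustration.
expr <- data.frame(
  gene = c("CBX3", "TP53"),
  `TCGA-A1-0001-01` = c(6.2, 4.1),
  `TCGA-A1-0001-11` = c(5.0, 4.3),
  `TCGA-B2-0002-01` = c(7.1, 3.9),
  check.names = FALSE
)

# Toy clinical table: one row per sample with its cancer type.
clin <- data.frame(
  ID = c("TCGA-A1-0001-01", "TCGA-A1-0001-11", "TCGA-B2-0002-01"),
  Cancer = c("BRCA", "BRCA", "LUAD")
)

# Transpose so that each row is a sample and each column is a gene.
mat <- t(as.matrix(expr[, -1]))
colnames(mat) <- expr$gene
expr_t <- data.frame(ID = rownames(mat), mat, check.names = FALSE)

# Merge expression values with the clinical information by sample ID.
pandata <- merge(clin, expr_t, by = "ID")

# Characters 14-15 of a TCGA barcode are the sample type code:
# 01-09 are tumor samples, 10-19 are normal samples.
code <- as.integer(substr(pandata$ID, 14, 15))
pandata$Type <- ifelse(code < 10, "Tumor", "Normal")

pandata
```

With real data the same steps apply after reading the two downloaded files; only the toy inputs at the top would change.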
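We use the CBX3 gene as the example.
library(ggpubr)
# Type is the sample group (Control vs. Tumor in the original note; the paired-plot
# code filters on "Normal", so use whichever label your data actually contains).
# Cancer holds the names of the 33 cancer types.
p <- ggboxplot(pandata, x = "Cancer", y = "CBX3",
               color = "Type", palette = "jco") +
  rotate_x_text(angle = 90)  # rotate the cancer names on the x axis by 90°
p + stat_compare_means(aes(group = Type), label = "p.signif", label.y = 11)
# label = "p.signif" shows significance stars; label = "p.format" shows the p-value.
# label.y sets the y coordinate at which the stars are drawn.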
The plot looks like this:
[Figure: pan-cancer boxplot of CBX3, tumor vs. control]
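To make the stars less of a black box: for two groups, `stat_compare_means()` defaults to a Wilcoxon test, computed separately at each x position. The sketch below reproduces that idea in base R on simulated data (the values are random, not TCGA data) and maps p-values to stars using the cutoffs given in the ggpubr documentation. The helper name `p_to_stars` is hypothetical.

```r
set.seed(1)

# Simulated expression values for two cancer types, 10 samples per group.
toy <- data.frame(
  Cancer = rep(c("BRCA", "LUAD"), each = 20),
  Type   = rep(rep(c("Normal", "Tumor"), each = 10), times = 2),
  CBX3   = c(rnorm(10, 5), rnorm(10, 7), rnorm(10, 5), rnorm(10, 5.2))
)

# One unpaired Wilcoxon rank-sum test per cancer type.
pvals <- sapply(split(toy, toy$Cancer), function(d) {
  wilcox.test(CBX3 ~ Type, data = d)$p.value
})

# Hypothetical helper: map p-values to significance symbols
# (<= 1e-4 "****", <= 1e-3 "***", <= 0.01 "**", <= 0.05 "*", otherwise "ns").
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 1e-4, 1e-3, 1e-2, 0.05, 1),
                   labels = c("****", "***", "**", "*", "ns"),
                   include.lowest = TRUE))
}

data.frame(Cancer = names(pvals), p = pvals, signif = p_to_stars(pvals))
```

No multiple-testing correction is applied here; with 33 cancer types it may be worth adjusting the p-values, for example with `p.adjust()`.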
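You can also overlay the individual data points:
library(ggpubr)
p <- ggboxplot(pandata, x = "Cancer", y = "CBX3",
               color = "Type", palette = "jco",
               add = "jitter") +  # draw each sample as a jittered point
  rotate_x_text(angle = 90)
p + stat_compare_means(aes(group = Type), label = "p.signif", label.y = 11)
[Figure: pan-cancer boxplot of CBX3 with jittered points]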
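2. Single-gene paired boxplot
Start with BRCA as an example.
library(tidyverse)
# The original indexed an undefined `drawdata`; pandata is assumed to be what was meant.
BRCA <- pandata[pandata$Cancer == "BRCA", ]
BRCA$ID <- stringr::str_sub(BRCA$ID, 1, 12)  # first 12 characters = patient barcode
Normal <- filter(BRCA, Type == "Normal")
Tumor <- filter(BRCA, Type == "Tumor")
Tumor <- Tumor[!duplicated(Tumor$ID), ]      # drop duplicate tumor samples per patient
Normal <- Normal[!duplicated(Normal$ID), ]   # same for normals, so each patient forms exactly one pair
index <- intersect(Normal$ID, Tumor$ID)      # patients with both a normal and a tumor sample
T1 <- filter(Tumor, ID %in% index)
N1 <- filter(Normal, ID %in% index)
data <- arrange(rbind(T1, N1), ID)           # same patient order in both groups for the paired test
library(ggpubr)
p <- ggpaired(data, x = "Type", y = "CBX3",
              id = "ID",                     # connect the two samples of each patient
              color = "black",
              fill = c("#E11E24", "#FBB96F"),
              line.color = "gray", line.size = 0.4,
              ylab = "expression of CBX3",
              palette = "npg")
p + stat_compare_means(paired = TRUE, label = "p.signif", label.x.npc = 0.4,
                       comparisons = list(c("Tumor", "Normal")))
[Figure: paired boxplot of CBX3 in BRCA with significance stars]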
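A paired test compares values position by position, so the tumor and normal rows must be in the same patient order. The following base R sketch illustrates this on three made-up patients whose rows are deliberately stored in a different order in each group; the IDs and values are invented for illustration.

```r
# Toy data: the same three patients, but in a different row order per group.
tumor  <- data.frame(ID = c("P3", "P1", "P2"), CBX3 = c(7.9, 6.8, 7.2))
normal <- data.frame(ID = c("P1", "P2", "P3"), CBX3 = c(5.1, 5.6, 6.0))

# Align both groups on the shared patient IDs before testing.
ids <- sort(intersect(tumor$ID, normal$ID))
t_aligned <- tumor$CBX3[match(ids, tumor$ID)]
n_aligned <- normal$CBX3[match(ids, normal$ID)]

# Per-patient differences are only meaningful after alignment.
diffs <- t_aligned - n_aligned

# Paired Wilcoxon signed-rank test on the aligned vectors.
res <- wilcox.test(t_aligned, n_aligned, paired = TRUE)
res$p.value
```

Without the `match()` step, P3's tumor value would be compared against P1's normal value, which changes the differences and can change the p-value.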
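To display the p-value instead of stars:
library(tidyverse)
# The original indexed an undefined `drawdata`; pandata is assumed to be what was meant.
BRCA <- pandata[pandata$Cancer == "BRCA", ]
BRCA$ID <- stringr::str_sub(BRCA$ID, 1, 12)  # first 12 characters = patient barcode
Normal <- filter(BRCA, Type == "Normal")
Tumor <- filter(BRCA, Type == "Tumor")
Tumor <- Tumor[!duplicated(Tumor$ID), ]      # drop duplicate tumor samples per patient
Normal <- Normal[!duplicated(Normal$ID), ]   # same for normals, so each patient forms exactly one pair
index <- intersect(Normal$ID, Tumor$ID)      # patients with both a normal and a tumor sample
T1 <- filter(Tumor, ID %in% index)
N1 <- filter(Normal, ID %in% index)
data <- arrange(rbind(T1, N1), ID)           # same patient order in both groups for the paired test
library(ggpubr)
p <- ggpaired(data, x = "Type", y = "CBX3",
              id = "ID",                     # connect the two samples of each patient
              color = "black",
              fill = c("#E11E24", "#FBB96F"),
              line.color = "gray", line.size = 0.4,
              ylab = "expression of CBX3",
              palette = "npg")
# Only the label changes: "p.format" prints the p-value itself.
p + stat_compare_means(paired = TRUE, label = "p.format", label.x.npc = 0.4,
                       comparisons = list(c("Tumor", "Normal")))
[Figure: paired boxplot of CBX3 in BRCA with the p-value shown]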
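As for showing the single-gene paired boxplot across all cancers, I have not yet worked out a good layout; for now the only solution I have is faceting.
library(tidyverse)  # provides filter() and arrange()
library(ggpubr)
data <- pandata
data$ID <- stringr::str_sub(data$ID, 1, 12)  # first 12 characters = patient barcode
Tumor <- subset(data, Type == "Tumor")
Tumor <- Tumor[!duplicated(Tumor$ID), ]      # one tumor sample per patient
Normal <- subset(data, Type == "Normal")
Normal <- Normal[!duplicated(Normal$ID), ]   # one normal sample per patient
index <- intersect(Normal$ID, Tumor$ID)      # patients with both sample types
T1 <- filter(Tumor, ID %in% index)
N1 <- filter(Normal, ID %in% index)
paireddata <- arrange(rbind(T1, N1), Cancer, ID)  # consistent patient order within each cancer
p <- ggpaired(paireddata, x = "Type", y = "CBX3",
              id = "ID",                     # connect the two samples of each patient
              color = "Type", palette = "jco",
              line.color = "gray", line.size = 0.4,
              facet.by = "Cancer", short.panel.labs = FALSE)  # one panel per cancer type
p + stat_compare_means(label = "p.signif", paired = TRUE, label.x.npc = 0.4, label.y = 9)
[Figure: paired boxplots of CBX3, faceted by cancer type]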
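When faceting, some cancer types may contribute only a handful of matched pairs, which makes their panels and tests hard to interpret. One possible refinement, sketched below in base R on made-up data, is to count the pairs per cancer type and keep only the types above a cutoff before plotting. The `min_pairs` threshold and the toy table are assumptions for illustration, not part of the original workflow.

```r
# Toy paired table: each patient has one Tumor and one Normal row.
paired_toy <- data.frame(
  ID     = c("P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"),
  Type   = rep(c("Tumor", "Normal"), times = 4),
  Cancer = c(rep("BRCA", 6), rep("ACC", 2))
)

# Count matched patients per cancer type (one Tumor row per patient).
n_pairs <- table(paired_toy$Cancer[paired_toy$Type == "Tumor"])

# Hypothetical cutoff: keep cancer types with at least this many pairs.
min_pairs <- 3
keep <- names(n_pairs)[n_pairs >= min_pairs]

# Restrict the table to the retained cancer types before plotting.
paired_keep <- paired_toy[paired_toy$Cancer %in% keep, ]
paired_keep
```

The filtered table can then be passed to `ggpaired()` in place of the full one.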