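This article uses gene expression data to draw boxplots and then overlays violin plots and dot plots on them (geom_boxplot draws the boxplots, geom_violin the violin plots, and geom_dotplot and geom_jitter the dot plots).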
First, a quick look at the terminology of a boxplot in R and what each part means:
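To make these terms concrete, the sketch below recomputes the parts of a single box by hand. As far as I know, it mirrors ggplot2's default stat_boxplot: the box edges are the 25th and 75th percentiles from quantile() (type 7), the whiskers end at the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier. The numbers are made up for illustration. Base R's boxplot() uses Tukey's hinges from fivenum() instead, so its box edges can differ slightly.

```r
# Made-up values for illustration; 30 is an obvious outlier
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 8, 30)

# Box: 25th percentile, median, 75th percentile (quantile type 7)
qs <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
lower  <- qs[1]
middle <- qs[2]
upper  <- qs[3]
iqr <- upper - lower

# Points more than 1.5 * IQR beyond the box edges are outliers
is_out <- x < lower - 1.5 * iqr | x > upper + 1.5 * iqr
outliers <- x[is_out]

# Whiskers (ymin / ymax) reach the most extreme non-outlying values
whiskers <- range(c(qs, x[!is_out]))
ymin <- whiskers[1]
ymax <- whiskers[2]

c(ymin = ymin, lower = lower, middle = middle, upper = upper, ymax = ymax)
# ymin = 2, lower = 3.25, middle = 5, upper = 6.75, ymax = 8
outliers
# 30
```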
Import the data (copied to the clipboard as a tab-separated table):
genefpkm <- read.csv(file = "clipboard", header = TRUE, sep = "\t")
head(genefpkm)
x_d <- genefpkm  # work on a copy, so a later mistake does not force re-importing the data
x_d <- as.matrix(x_d)  # convert to a matrix for the element-wise steps below (every column must be numeric)
x_d <- matrix(log10(as.numeric(x_d)), dimnames = list(row.names(x_d), colnames(x_d)), nrow = dim(x_d)[1])  # take log10 of every value to compress the range; genes with an expression of 0 become -Inf, and those values are dropped automatically when plotting
group <- c(rep("LPE", 4*dim(genefpkm)[1]), rep("LPF", 4*dim(genefpkm)[1]))  # groups: assumes 8 samples, the first 4 columns LPE and the last 4 LPF
data <- data.frame(expression = c(x_d), sample = rep(colnames(x_d), each = nrow(x_d)), group = group)  # long format: one row per gene and sample, with its group
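The reshaping above relies on two details that are easy to miss: log10() works element-wise on a numeric matrix and keeps its dimnames, and c() flattens a matrix column by column, which is why the sample names are repeated with each = nrow(x_d). The sketch below checks both on a tiny made-up matrix. The gene and sample names (geneA, LPE_1 and so on) are hypothetical, and the group vector assumes, as the original code does, that all LPE columns come before the LPF columns.

```r
# Hypothetical toy FPKM matrix: 3 genes x 4 samples (2 per group)
fpkm <- matrix(c(10, 0, 100,
                 1, 1000, 10,
                 0.1, 10, 1,
                 100, 1, 0),
               nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               c("LPE_1", "LPE_2", "LPF_1", "LPF_2")))

# Element-wise log10; a numeric matrix keeps its dimensions and dimnames
x_d <- log10(fpkm)
n_genes <- nrow(x_d)

# c() flattens column by column, so each sample name is repeated once per gene
data <- data.frame(expression = c(x_d),
                   sample = rep(colnames(x_d), each = n_genes),
                   group = rep(c("LPE", "LPF"), each = 2 * n_genes))

head(data)
sum(is.infinite(data$expression))  # the two zeros became -Inf
```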
Now draw the plot:
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  stat_boxplot(geom = "errorbar", size = 1, width = 0.3, na.rm = TRUE) +  # whisker caps, drawn with the errorbar geom
  geom_boxplot(linetype = 2, na.rm = TRUE, outlier.alpha = 0.3, outlier.size = 3, notch = TRUE) +  # notch = TRUE cuts a notch around the median line; if the notches of two boxes do not overlap, the medians likely differ. linetype accepts many values, each a different line style (run vignette("ggplot2-specs") in R for details)
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))  # set the fill colours manually
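Result: the box outline is dashed while the whiskers look solid. Everything geom_boxplot draws is in fact dashed (linetype = 2), but the solid errorbar layer underneath runs from one whisker end to the other, so it shows through the gaps of the dashed whiskers; inside the box it is hidden by the fill.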
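The notch comparison mentioned in the code comment can also be checked numerically. To my knowledge, ggplot2 documents the notch edges as median ± 1.58 × IQR / √n (see the computed variables in ?geom_boxplot); the sketch below applies that rule to made-up vectors. Non-overlapping notches are only a rough visual guide to a difference in medians, not a formal test. notch_interval() and notches_overlap() are hypothetical helper names.

```r
# Notch edges as documented for ggplot2's stat_boxplot: median +/- 1.58 * IQR / sqrt(n)
notch_interval <- function(y) {
  y <- y[is.finite(y)]  # drop NA and the -Inf values produced by log10(0)
  qs <- quantile(y, c(0.25, 0.5, 0.75), names = FALSE)
  half <- 1.58 * (qs[3] - qs[1]) / sqrt(length(y))
  c(lower = qs[2] - half, upper = qs[2] + half)
}

# TRUE when the two notch intervals overlap
notches_overlap <- function(a, b) {
  ia <- notch_interval(a)
  ib <- notch_interval(b)
  ia[["lower"]] <= ib[["upper"]] && ib[["lower"]] <= ia[["upper"]]
}

a <- seq(0, 10, length.out = 101)  # made-up values with median 5
notch_interval(a)
notches_overlap(a, a + 20)   # medians far apart -> FALSE
notches_overlap(a, a + 0.5)  # medians close together -> TRUE
```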
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  stat_boxplot(geom = "errorbar", size = 1, width = 0.3, na.rm = TRUE, linetype = 2) +
  geom_boxplot(linetype = 2, na.rm = TRUE, outlier.alpha = 0.3, outlier.size = 3, notch = TRUE) +
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), size = 1, alpha = 1, notch = TRUE, outlier.shape = NA, na.rm = TRUE) +  # draws only the box, with a solid outline: mapping ymin to ..lower.. and ymax to ..upper.. collapses the whiskers onto the box edges, so no whisker lines appear (see the first figure in this article for what ymin and ymax mean)
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))
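Result: the box is solid and the whiskers are dashed. Again, everything geom_boxplot draws is dashed; the extra solid box drawn on top simply covers the dashed box outline.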
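To see what ..lower.. and ..upper.. refer to, one can inspect the statistics that stat_boxplot computes with ggplot2's layer_data(). The sketch below uses made-up data and writes the mapping as after_stat(lower), which to my knowledge is the newer spelling of ..lower.. (ggplot2 3.3.0 and later; the dot-dot form still works but is deprecated in recent versions). It shows that the overlay layer's whisker ends are simply moved onto the box edges.

```r
library(ggplot2)

# Made-up data: two samples, 50 values each
toy <- data.frame(sample = rep(c("S1", "S2"), each = 50),
                  expression = c(seq(0, 4.9, by = 0.1), seq(1, 5.9, by = 0.1)))

# Same trick as above: whisker ends mapped onto the box edges
p <- ggplot(toy, aes(x = sample, y = expression)) +
  stat_boxplot(aes(ymin = after_stat(lower), ymax = after_stat(upper)),
               outlier.shape = NA)

# Statistics computed for the layer, one row per box
stats <- layer_data(p, 1)
stats[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
```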
Boxplot overlaid on a violin plot:
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  geom_violin(linetype = "dashed", na.rm = TRUE) +
  stat_boxplot(geom = "errorbar", size = 1, width = 0.3, na.rm = TRUE, linetype = 2) +
  geom_boxplot(linetype = 2, na.rm = TRUE, outlier.alpha = 0.3, outlier.size = 3, notch = TRUE, width = 0.3) +  # a narrower box sits inside the violin instead of covering it
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), size = 1, width = 0.3, notch = TRUE, outlier.shape = NA, na.rm = TRUE) +  # outlier.shape = NA keeps this overlay from redrawing the outliers as opaque points
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))
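A violin is essentially a kernel density estimate of the values, mirrored on both sides of the x position. As I understand it, ggplot2's stat_ydensity calls stats::density() with bw = "nrd0" by default, so base R is enough to show what the violin adds to the box: in the made-up bimodal sample below, the density dips between the two clusters, while the box spans the gap without revealing it.

```r
# Made-up bimodal sample: two clusters of 50 values each
y <- c(seq(0, 1, length.out = 50), seq(3, 4, length.out = 50))

# Kernel density estimate with the default bandwidth rule ("nrd0")
d <- density(y, bw = "nrd0")

# Estimated density at the centre of each cluster and in the gap between them
dens <- approx(d$x, d$y, xout = c(0.5, 2, 3.5))$y
round(dens, 3)

# The box of a boxplot (25th to 75th percentile) covers the gap
quantile(y, c(0.25, 0.75))
```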
Remove the error bars (whisker caps) and the outlier points:
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  geom_violin(na.rm = TRUE) +
  geom_boxplot(linetype = 2, na.rm = TRUE, notch = TRUE, width = 0.3, outlier.shape = NA) +
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), size = 1, width = 0.3, notch = TRUE, outlier.shape = NA, na.rm = TRUE) +
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))
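Setting outlier.shape = NA only stops the outlier points from being drawn; the whiskers are still computed with the 1.5 × IQR rule, so the hidden points simply vanish from the figure. Before hiding them, it may be worth counting how many points each sample loses. The helper count_outliers() and the toy data frame below are hypothetical, built in the same long format as data above.

```r
# Count the points geom_boxplot would draw as outliers (1.5 * IQR rule)
count_outliers <- function(y) {
  y <- y[is.finite(y)]  # drop NA and -Inf, which ggplot2 removes before computing the box
  qs <- quantile(y, c(0.25, 0.75), names = FALSE)
  iqr <- qs[2] - qs[1]
  sum(y < qs[1] - 1.5 * iqr | y > qs[2] + 1.5 * iqr)
}

# Hypothetical long-format data: one row per gene and sample
toy <- data.frame(sample = rep(c("LPE_1", "LPF_1"), each = 6),
                  expression = c(1, 2, 2, 3, 3, 20, 0, 1, 1, 2, 2, -Inf))

tapply(toy$expression, toy$sample, count_outliers)
# LPE_1: 1, LPF_1: 0
```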
A dot plot can also show what a violin plot conveys, namely the shape of each distribution:
ggplot(data = data, aes(x = sample, y = expression, fill = group)) +
  geom_boxplot(linetype = 2, na.rm = TRUE, notch = TRUE, width = 0.3, outlier.shape = NA) +
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), size = 1, width = 0.3, notch = TRUE, outlier.shape = NA, na.rm = TRUE) +
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.11, method = "histodot", stackratio = 0.01, na.rm = TRUE) +  # there are many points, so shrink the dot size and the stacking ratio until all of them fit
  xlab("Samples") + ylab("log10(FPKM)") +
  theme(axis.text = element_text(size = rel(1.2)),
        axis.line = element_line(size = rel(1.5)),
        axis.title = element_text(size = rel(1.5)),
        panel.background = element_blank()) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))
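The introduction also lists geom_jitter, which none of the plots above uses. When there are too many points for geom_dotplot to stack readably, jittered points are a common alternative. Here is a sketch on simulated data (toy and its values are made up; the column layout follows data above). height = 0 keeps every y value exactly where it was and only spreads the points sideways, within the width of the box.

```r
library(ggplot2)

# Simulated data for illustration only
set.seed(42)
toy <- data.frame(sample = rep(c("LPE_1", "LPF_1"), each = 100),
                  group = rep(c("LPE", "LPF"), each = 100),
                  expression = c(rnorm(100, mean = 1), rnorm(100, mean = 1.5)))

p <- ggplot(toy, aes(x = sample, y = expression, fill = group)) +
  geom_boxplot(width = 0.3, outlier.shape = NA, na.rm = TRUE) +
  # Random horizontal offset of at most 0.15; height = 0 leaves y untouched
  geom_jitter(width = 0.15, height = 0, alpha = 0.3, na.rm = TRUE) +
  scale_fill_manual(values = c("darkolivegreen1", "deeppink"))

p  # draw the plot
```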