單/多個(gè)基因在組間同時(shí)展示的多種選擇：分組小提琴圖、分組/分面柱狀圖夫壁、單基因蜂群點(diǎn)圖拼圖繪制

這是ggplot2可視化專題的第二個(gè)實(shí)例操作

【ggplot2的基本思路見前文總論】：基于ggplot2的RNA-seq轉(zhuǎn)錄組可視化：總述和分文目錄

【ggplot2繪圖具體策略第一篇】：測(cè)序結(jié)果概覽：基因表達(dá)量rank瀑布圖拾枣，高密度表達(dá)相關(guān)性散點(diǎn)圖，注釋特定基因及errorbar的表達(dá)相關(guān)性散點(diǎn)圖繪制

【ggplot2繪圖具體策略第三篇】：配對(duì)樣本基因表達(dá)趨勢(shì)：優(yōu)化前后的散點(diǎn)連線圖+拼圖繪制

在我們獲得轉(zhuǎn)錄組的測(cè)序結(jié)果并進(jìn)行數(shù)據(jù)處理-差異分析一整套流程后，我們不僅關(guān)注整個(gè)轉(zhuǎn)錄組的表達(dá)趨勢(shì)梅肤，還包括了特定基因在不同處理組/特征間的表達(dá)情況司蔬。這時(shí)依舊是各類ggplot2統(tǒng)計(jì)圖的主場(chǎng)。

本文將要介紹的圖表類型展示如下：包括分組小提琴圖（B）姨蝴、分組（A）/分面柱狀圖（C）俊啼、單基因蜂群點(diǎn)圖拼圖（D）。想必大家都知道這些圖表在生物醫(yī)學(xué)文獻(xiàn)中半壁江山的地位左医。用R-ggplot2來繪制它們授帕，簡單明了美觀，也不需要對(duì)數(shù)據(jù)進(jìn)行跨平臺(tái)保存讀取浮梢，可謂方便豪墅。

the panels that we are going to draw in this figure

我們依舊使用通過第一、二篇介紹的整合好并差異分析過的TCGA白種人LUSC肺鱗癌mRNA-seq轉(zhuǎn)錄組表達(dá)數(shù)據(jù)黔寇。

數(shù)據(jù)獲取

基于第一篇文章從TCGA數(shù)據(jù)庫下載并整合清洗高通量腫瘤表達(dá)譜-臨床性狀數(shù)據(jù)偶器，我們下載并清洗TCGA數(shù)據(jù)庫中white人種的LUSC肺鱗癌mRNA-seq轉(zhuǎn)錄組counts數(shù)據(jù)和FPKM數(shù)據(jù)。

隨后根據(jù)第二篇文章TCGA數(shù)據(jù)整合后進(jìn)行DESeq2差異表達(dá)分析和基于R的多種可視化對(duì)counts數(shù)據(jù)進(jìn)行了基于DESeq2的差異分析缝裤。

現(xiàn)在假設(shè)我們已經(jīng)獲得：
（樣本1到344為cancer屏轰，345到386為normal）
（1）resSigAll: the subset object generated by DESeq2 containing info about all differentially expressed genes.
（2）clinical_trait: the data frame containing submitter_id and tumor_stage of selected TCGA LUSC samples. It will be used for grouping.
（3）condition_table: the data frame defining the sample_type and recording the TCGA_IDs and submitter_id of each TCGA sample. It will be used for grouping.
（4）expr_vst: vst transformed normalized counts matrix of genes and samples generated by DESeq2. It is the raw material of downstream visualization.

需要R包

ggplot2 (作圖)，reshape2包 (對(duì)數(shù)據(jù)框格式進(jìn)行轉(zhuǎn)制)憋飞，ggsignif (統(tǒng)計(jì)學(xué)注釋包)霎苗，ggbeeswarm (蜂群圖作圖包)，customLayout榛做，和gridExtra(用于ggplot2對(duì)象拼圖)唁盏。

install.packages('ggplot2')
install.packages('reshape2')
install.packages('ggsignif')
install.packages('ggbeeswarm')
install.packages('customLayout')
install.packages('gridExtra')
library('ggplot2')
library('reshape2')
library('ggsignif')
library('ggbeeswarm')
library('customLayout')
library('gridExtra')

1. 分組柱狀圖

假設(shè)現(xiàn)在我們關(guān)注一個(gè)目的基因在normal和cancer組間的表達(dá)分布，且希望看到每組內(nèi)不同LUSC tumor stages間該基因的表達(dá)變化情況检眯。（因?yàn)閚ormal組也是來自于患者的癌旁組織厘擂，所以也可以分組）

則繪圖需要：
（1）normal/cancer分組以及組內(nèi)tumor stage亞組（x軸分組依據(jù)）
（2）亞組的vst counts均值（y軸，因?yàn)樽髦鶢顖D锰瘸，不需要每個(gè)散點(diǎn)的信息）
（3）亞組的vst counts標(biāo)準(zhǔn)差（errorbar）

我們隨機(jī)從差異基因中選取一個(gè)基因作為可視化對(duì)象刽严。自定義函數(shù)，輸入stage i/stageii/>= stage iii可通過clinical_trait和condition_table dataframe分別匹配相應(yīng)的normal和cancer樣本避凝，并分別計(jì)算均值和標(biāo)準(zhǔn)差舞萄。最后匯總得到

set.seed(10)
gene_chosen <- sample(resSigAll@rownames,1)
gene_counts_all <- expr_vst[gene_chosen,]
mean_sd_by_clinical <- function(input_stage){
  if(input_stage == '>= stage iii'){
    samples_indices <- clinical_trait$submitter_id[grep(clinical_trait$tumor_stage, pattern = 'stage iii[ab]?|stage iv')]
  } else {
    samples_indices <- clinical_trait$submitter_id[grep(clinical_trait$tumor_stage, pattern = paste0(input_stage, '[ab]?$'))]
  }
  normal_samples <- condition_table[condition_table$submitter_id %in% samples_indices & condition_table$sample_type == 'normal','TCGA_IDs']
  cancer_samples <- condition_table[condition_table$submitter_id %in% samples_indices & condition_table$sample_type == 'cancer','TCGA_IDs']
  data.frame('sample_type'=c('normal','cancer'), 'stages'=c(rep(input_stage,2)),
             'means'=c(mean(gene_counts_all[normal_samples]), mean(gene_counts_all[cancer_samples])),
             'sds'=c(sd(gene_counts_all[normal_samples]), sd(gene_counts_all[cancer_samples])))
}
mean_sd_table_by_clinical <- do.call(rbind, lapply(c('stage i', 'stage ii', '>= stage iii'), mean_sd_by_clinical))
mean_sd_table_by_clinical$sample_type <- factor(mean_sd_table_by_clinical$sample_type, 
                                                levels = c('normal','cancer'), ordered = T)

整理得到的數(shù)據(jù)框如下：

the integrated data for visualization

然后進(jìn)行統(tǒng)計(jì)顯著性框的標(biāo)注。這里打算用geom_segment進(jìn)行線條繪制管削，所以每個(gè)統(tǒng)一方向的線條都需要注明x/y軸的起點(diǎn)和終點(diǎn)坐標(biāo)倒脓。

注：這里s指start，e指end含思，l指left崎弃，r指right，h指horizontal，u指upper吊履，uh指upper horizontal。為了畫出顯著性注釋框调鬓，這里將左豎線艇炎、右豎線、封頂橫線腾窝、組間豎線分別用同一函數(shù)繪制缀踪，最后組間封頂橫線用一函數(shù)繪制。

the guide for how we draw the annotation

annotation_table <- data.frame('xsl'=c(0.7, 1.7), 'xel'=c(0.7, 1.7), 'ysl'=c(7.7, 8.7), 'yel'=c(8.2, 9.2),
                               'xsr'=c(1.3, 2.3), 'xer'=c(1.3, 2.3), 'ysr'=c(7.7, 8.7), 'yer'=c(8.2, 9.2),
                               'xsh'=c(0.7, 1.7), 'xeh'=c(1.3, 2.3), 'ysh'=c(8.2, 9.2), 'yeh'=c(8.2, 9.2),
                               'xsu'=c(1, 2), 'xeu'=c(1, 2), 'ysu'=c(8.2, 9.2), 'yeu'=c(9.7, 9.7))
annotation_table2 <- data.frame('xsuh'=1 ,'xeuh'=2, 'ysuh'=9.7, 'yeuh'=9.7)

接下來就是ggplot2繪圖：

? 將整理的mean_sd_table_by_clinical傳入ggplot函數(shù)虹脯，以主要分組依據(jù)normal/cancer為x軸驴娃，y為counts均數(shù)，【fill填色依據(jù)為組內(nèi)亞組tumor_stage】循集。

? geom_col函數(shù)繪制柱狀圖唇敞，組內(nèi)亞組的柱間間距為0.7 (position)，柱寬為0.5 (width)咒彤，柱的邊框?yàn)楹谏?(color) 【fill是填充顏色疆柔，color是點(diǎn)/線顏色】，邊框?yàn)閷?shí)線 (linetype = 1)镶柱。

? 手動(dòng)設(shè)置亞組的填充顏色旷档。

? 設(shè)置誤差棒的位置 (x繼承g(shù)gplot分組位置，需補(bǔ)充指定ymax, ymin)歇拆，頂端橫線寬度 (width)鞋屈，和亞組的位置 (position: 注意與主繪圖函數(shù)的間隔保持一致即可)。

? 不繼承主參數(shù)故觅，進(jìn)行注釋框的繪制 (geom_segment())厂庇，以及進(jìn)行單獨(dú)一行FDR值的注釋 (annotate())。

? 設(shè)定y軸顯示范圍输吏、坐標(biāo)軸小標(biāo)題和主題宋列。

stage_expr <- ggplot(mean_sd_table_by_clinical, aes(x=sample_type, y=means, fill=stages))+
  geom_col(position = position_dodge(width = 0.7), width = 0.5, color='black', linetype=1)+
  scale_fill_manual(values = c('stage i'='gray20', 'stage ii'='gray50', '>= stage iii'='gray80'))+
  geom_errorbar(aes(ymin=means-sds, ymax=means+sds), width=0.1, position = position_dodge(width = 0.7), size=0.5)+
  geom_segment(inherit.aes = F, data = annotation_table, aes(x=xsl,xend=xel,y=ysl,yend=yel))+
  geom_segment(inherit.aes = F, data = annotation_table, aes(x=xsr,xend=xer,y=ysr,yend=yer))+
  geom_segment(inherit.aes = F, data = annotation_table, aes(x=xsh,xend=xeh,y=ysh,yend=yeh))+
  geom_segment(inherit.aes = F, data = annotation_table, aes(x=xsu,xend=xeu,y=ysu,yend=yeu))+
  geom_segment(inherit.aes = F, data = annotation_table2, aes(x=xsuh,xend=xeuh,y=ysuh,yend=yeuh))+
  annotate('text', label='FDR<0.001', color='black', x=1.5, y=10.1, fontface='bold', size=3)+
  ylim(0,10.3)+xlab(gene_chosen)+ylab('vst transformed normalized counts')+
  theme_bw()+
  theme(panel.grid = element_line(color = 'white'))
stage_expr

即可得到圖片。注明：這里只是教學(xué)顯著性注釋框畫法评也，并未分亞組分別進(jìn)行假設(shè)檢驗(yàn)炼杖。

the barplot containing sub-groups

2. 分組小提琴圖

我們可能同時(shí)關(guān)注多個(gè)基因，想要同時(shí)展示它們?cè)诮M間的表達(dá)情況盗迟。此時(shí)也可以類似地坤邪，將多個(gè)基因作為主要分組，而將sample_type作為fill填色依據(jù)的組內(nèi)亞組罚缕，合并在一個(gè)panel進(jìn)行展示艇纺。

則繪圖需要：
（1）基因分組和每個(gè)基因組內(nèi)的normal/cancer亞組（x軸分組和fill填色依據(jù)）
（2）每個(gè)樣本相應(yīng)基因的表達(dá)值（這里是vst counts）（y軸）

注：需注意不同繪圖類型要求的數(shù)據(jù)形式。如作柱狀圖只需要每個(gè)亞組的均值，而散點(diǎn)圖黔衡、小提琴圖等需要亞組內(nèi)所有樣本的表達(dá)值（要進(jìn)行密度展示）蚓聘。又如手動(dòng)繪制errorbar需提前準(zhǔn)備每組的sd值，而繪制箱線圖時(shí)不需提供分位數(shù)盟劫，ggplot2會(huì)使用輸入表達(dá)數(shù)據(jù)自動(dòng)計(jì)算并繪圖夜牡。

這里在所有差異基因中隨機(jī)選取4個(gè)基因進(jìn)行展示。首先依附于expr_vst原本的數(shù)據(jù)形式整理成寬形式表格侣签，即每個(gè)基因的表達(dá)值占據(jù)一列塘装，并另有sample_type列進(jìn)行normal/cancer分組：

format demonstration of data frame for_multigroup_vis

使用reshape2包的melt函數(shù)將數(shù)據(jù)整理成長形式，此時(shí)主分組和亞組信息就可以直接分別從genes和sample_type列傳遞給ggplot了影所。

format demonstration of data frame melted_var_table

然后將主分組genes進(jìn)行factor化排序蹦肴。

set.seed(29)
for_multigroup_vis <- t(expr_vst[sample(resSigAll@rownames, 4),])
for_multigroup_vis <- data.frame(for_multigroup_vis, 'sample_type'=condition_table$sample_type, stringsAsFactors = F)
melted_var_table <- reshape2::melt(for_multigroup_vis, id.vars='sample_type', 
                                   variable.name='genes', value.name='vst_counts')
#tell ggplot2 the order of graphing within each gene group.
melted_var_table$sample_type <- factor(melted_var_table$sample_type, levels = c('normal','cancer'), ordered = T)

和前面的示例相同，建立坐標(biāo)表以使用geom_segment函數(shù)繪制各組的統(tǒng)計(jì)顯著性框猴娩，并建立坐標(biāo)和標(biāo)簽表以使用geom_text函數(shù)添加各組FDR文字注釋阴幌。

FDR_text_table <- data.frame('gene'=as.character(unique(melted_var_table$genes)),'start'=c(0.8,1.8,2.8,3.8), 'end'=c(1.19, 2.19,3.19,4.19),
                                'height'=c(22,15,16.5,16), annotations=c('FDR=2.15e-91','FDR=3.40e-59','FDR=1.74e-89','FDR=1.07e-48'))
segment_table <- data.frame('xsl'=c(0.8,1.8,2.8,3.8), 'xel'=c(0.8,1.8,2.8,3.8), 'ysl'=c(17,10,15.3,14.7), 'yel'=c(21,14,15.5,15),
                            'xsr'=c(1.2,2.2,3.2,4.2), 'xer'=c(1.2,2.2,3.2,4.2), 'ysr'=c(20.5,13.8,14,14.5), 'yer'=c(21,14,15.5,15),
                            'xsh'=c(0.8,1.8,2.8,3.8), 'xeh'=c(1.2,2.2,3.2,4.2), 'ysh'=c(21,14,15.5,15), 'yeh'=c(21,14,15.5,15))

ggplot2繪圖的思路基本相同，不再詳細(xì)描述卷中。值得注意的點(diǎn)是：

? 此次映射數(shù)據(jù)時(shí)以genes為x軸主分組裂七，sample_type為fill填色亞分組，每個(gè)樣本的vst_counts映射至y軸仓坞。

? 使用geom_violin()繪制小提琴圖捶枢，position參數(shù)設(shè)置主組內(nèi)亞組間距離嗤形，使用scale設(shè)置組間面積恒定 'area'恩伺。(或設(shè)置為width/count喊积，分別表示每組有相同的最大小提琴寬度和寬度與觀察值數(shù)成比例)

#draw the plot.
multiple_vio <- ggplot(melted_var_table, aes(x=genes, y=vst_counts, fill=sample_type))+
  geom_violin(position = position_dodge(width=0.8), scale = 'area')+
  geom_boxplot(position = position_dodge(width = 0.8), 
               outlier.size = 0, width = 0.1, show.legend = FALSE)+
  scale_fill_manual(values = c('cancer'='orange','normal'='blue'))+
  geom_text(inherit.aes = F,data = FDR_text_table, aes(x=gene, y=height, label=annotations), size=2.5, fontface='bold')+
  geom_segment(inherit.aes = F,data=segment_table,aes(x=xsl,xend=xel,y=ysl,yend=yel))+
  geom_segment(inherit.aes = F,data=segment_table,aes(x=xsr,xend=xer,y=ysr,yend=yer))+
  geom_segment(inherit.aes = F,data=segment_table,aes(x=xsh,xend=xeh,y=ysh,yend=yeh))+
  theme_bw()+theme(panel.grid = element_line(colour = 'white'), plot.title = element_text(hjust = 0.5))+
  ylab('normalized_counts')+xlab('genes')+ylim(3,24)+
  ggtitle('violin plot of multiple genes')
multiple_vio

獲得圖形為：

the violin plot containing multiple groups and sub-groups

3. 分面柱狀圖

上述兩個(gè)圖片的策略均是二次分組，x映射主分組嫉称，fill映射亞組侦镇。另一種策略是使用ggplot2的facet.grid()函數(shù)進(jìn)行分面展示，即每個(gè)主分組獲得一個(gè)panel分別可視化织阅。

則繪圖需要：
（1）同時(shí)含有基因和normal/cancer信息的獨(dú)立分組（x軸分組依據(jù)）
（2）每個(gè)組隸屬的基因信息（x軸分面）
（3）每個(gè)組隸屬的樣本類型（fill填充依據(jù)）
（4）每個(gè)組的表達(dá)值均數(shù)（y軸）
（5）每個(gè)組的表達(dá)值標(biāo)準(zhǔn)差（errorbar）

整理數(shù)據(jù)壳繁，沿用之前隨機(jī)選取的4個(gè)基因并整理成長格式的表達(dá)data frame。自建函數(shù)荔棉，向量化批量計(jì)算每個(gè)基因-每種樣本類型獨(dú)立組的表達(dá)平均值和標(biāo)準(zhǔn)差闹炉。以”gene_sample_type“的格式標(biāo)記為8個(gè)獨(dú)立組，并添加annotation列便于geom_text的注釋润樱。最后將獨(dú)立分組因子化用于手動(dòng)排序渣触，人工將同一個(gè)基因的normal/cancer組相鄰放置。

#draw the graphs by facet.
melted_var_table2 <- melted_var_table
#calculate the mean and sd of each group.
mean_sd <- lapply(as.character(unique(melted_var_table2$genes)), function(x){
  mean_normal <- mean(melted_var_table2$vst_counts[melted_var_table2$genes == x & melted_var_table2$sample_type == 'normal'])
  mean_cancer <- mean(melted_var_table2$vst_counts[melted_var_table2$genes == x & melted_var_table2$sample_type == 'cancer'])
  sd_normal <- sd(melted_var_table2$vst_counts[melted_var_table2$genes == x & melted_var_table2$sample_type == 'normal'])
  sd_cancer <- sd(melted_var_table2$vst_counts[melted_var_table2$genes == x & melted_var_table2$sample_type == 'cancer'])
  data.frame('gene'= c(x,x), 'sample_type'=c('normal','cancer'),
             'mean'=c(mean_normal, mean_cancer), 'sd'=c(sd_normal, sd_cancer))})
mean_sd <- do.call(rbind, mean_sd)
mean_sd_annotation <- data.frame(mean_sd, 'groups'=paste(mean_sd$gene, mean_sd$sample_type, sep = '_'),
                                 'annotation'= rep(c('','****')), stringsAsFactors = F)
mean_sd_annotation$groups <- factor(mean_sd_annotation$groups, levels = unique(mean_sd_annotation$groups),
                                   ordered = T)

整理好的數(shù)據(jù)框?yàn)槿缦乱既簦琯ene分類嗅钻，sample_type分類皂冰，將兩列字符串粘貼在一起而形成的獨(dú)立組信息，每組的mean/sd值养篓，以及每個(gè)組對(duì)應(yīng)的annotation信息：

format demonstration of the data frame mean_sd_annotation

ggplot2繪圖：值得注意的點(diǎn)是：

? ggplot函數(shù)x軸映射groups秃流，即各個(gè)獨(dú)立組。fill以sample_type分別配色柳弄。

? geom_text補(bǔ)充文字的y軸位置舶胀，即y=mean+sd+0.5。

? 使用facet_grid()函數(shù)语御，以輸入數(shù)據(jù)框的gene列按行分面 (~gene)，每個(gè)分面只保留有數(shù)據(jù)的x軸部分 (scale='free')席怪。

p <- ggplot(mean_sd_annotation, aes(x=groups, y=mean, fill=sample_type))+
  geom_col(width=0.4, linetype=1, color='black')+
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width=0.1)+
  geom_text(aes(y=mean+sd+0.5,label=annotation), size = 3, fontface='bold')+
  scale_fill_manual(values=c('normal'='blue', 'cancer'='orange'))+
  facet_grid(~gene, scales='free')+
  scale_x_discrete(label=c('normal','cancer'))+
  ylab('vst-normalized counts')+xlab('groups')+
  theme_bw()+theme(panel.grid = element_line(colour = 'white'), legend.position = 'none')
p

獲得圖形如下：

每個(gè)框都屬于一個(gè)facet的分面柱狀圖

注：使用facet.grid函數(shù)時(shí)需要注意各個(gè)函數(shù)間需要有統(tǒng)一的映射數(shù)據(jù)应闯。比如：本例如果使用geom_violin()，則需輸入所有樣本的表達(dá)值而不是均值挂捻，這會(huì)使得文字注釋的變量長度與輸入數(shù)據(jù)框長度不一而無法合并到一個(gè)數(shù)據(jù)框中共同傳入ggplot()碉纺。若geom_text不繼承全局參數(shù)，會(huì)導(dǎo)致facet.grid認(rèn)為每個(gè)分面在x軸的每個(gè)分組上均有數(shù)據(jù)刻撒，導(dǎo)致布局混亂骨田。

總之，分facet這個(gè)方法是比較局限的声怔。

4. 拼接多個(gè)單基因蜂群圖

如果我想使用更多樣性的可視化方法呢态贤？比如在樣本例數(shù)較多時(shí)使用蜂群圖而不是較為死板的柱狀圖？

an example of beeswarm plot downloaded from an article of Plos One

R中有beeswarm包醋火，加載后使用geom_beeswarm()函數(shù)可實(shí)現(xiàn)本功能悠汽。但由于未知的原因，它和geom_jitter()都不能像geom_violin()一樣進(jìn)行二次分組芥驳，如果采取第二種作圖策略直接設(shè)置ggplot(data, aes(x=gene, y=vst_counts, color=sample_type))柿冲，他們僅表現(xiàn)為geom_point即可完成的單純散點(diǎn)圖。這不是我們想要的效果兆旬。如下圖：

an erroneous example

所以這里我使用產(chǎn)生多個(gè)單基因一次分組的蜂群散點(diǎn)圖假抄，并用拼圖包將所有單圖拼接。

繪圖需要：
（1）各基因的sample_type (x軸)丽猬；
（2）各組內(nèi)樣本的表達(dá)值 (這里是vst-counts) (y軸)宿饱；
（3）各組的平均值和標(biāo)準(zhǔn)差 (自制mean±sd errorbar)。

首先整理數(shù)據(jù)框脚祟，獲得各樣本的gene刑棵，sample_type，vst_counts數(shù)據(jù)框愚铡，各組的均值蛉签、標(biāo)準(zhǔn)差數(shù)據(jù)框胡陪，和各基因組間顯著性標(biāo)記數(shù)據(jù)框。

#draw the bee plot of the same data by another manner (by joining graphs together).
melted_var_table3 <- melted_var_table
#we should manually tell the ggplot2 how we oder the x axis by transforming groups into factors.
melted_var_table3$sample_type <- factor(melted_var_table3$sample_type, 
                                        levels = c('normal','cancer'), ordered = T)
#calculate the mean and sd of each group.
mean_sd <- lapply(as.character(unique(melted_var_table3$genes)), function(x){
  mean_normal <- mean(melted_var_table3$vst_counts[melted_var_table3$genes == x & melted_var_table3$sample_type == 'normal'])
  mean_cancer <- mean(melted_var_table3$vst_counts[melted_var_table3$genes == x & melted_var_table3$sample_type == 'cancer'])
  sd_normal <- sd(melted_var_table3$vst_counts[melted_var_table3$genes == x & melted_var_table3$sample_type == 'normal'])
  sd_cancer <- sd(melted_var_table3$vst_counts[melted_var_table3$genes == x & melted_var_table3$sample_type == 'cancer'])
  data.frame('gene'= c(x,x), 'sample_type'=c('normal','cancer'),
             'mean'=c(mean_normal, mean_cancer), 'sd'=c(sd_normal, sd_cancer))})
mean_sd <- do.call(rbind, mean_sd)
#we have gotten the FDRs from the object of DE analysis before.
annotation_FDR <- data.frame('gene'=as.character(unique(melted_var_table$genes)),
                             'annotations'=c('FDR=2.15e-91','FDR=3.40e-59','FDR=1.74e-89','FDR=1.07e-48'))

ggplot2作圖碍舍。由于拼圖R包c(diǎn)ustomLayout拼圖的輸入對(duì)象是多個(gè)ggplot2繪圖對(duì)象的list柠座，因此這里創(chuàng)建一個(gè)自定義ggplot2繪圖function，之后配合lapply函數(shù)建立多個(gè)單基因蜂群圖的list片橡。

幾個(gè)注意點(diǎn)：
? geom_beeswarm()繪制蜂群點(diǎn)圖妈经，cex設(shè)置點(diǎn)之間的排列距離，priority設(shè)置一個(gè)組中點(diǎn)的排列方式捧书。

? 使用geom_point()吹泡，以均值標(biāo)準(zhǔn)差數(shù)據(jù)框作為輸入，標(biāo)記每個(gè)組的平均值经瓷，設(shè)置shape為3即加號(hào) (+)爆哑，方便與errorbar融合在一起。

? 使用geom_errorbar()舆吮，以均值標(biāo)準(zhǔn)差數(shù)據(jù)框作為輸入揭朝，繪制sd值errorbar，在圖中與均值的＋號(hào)融合形成三分errorbar形狀色冀。

? 使用一個(gè)新的R包ggsignif進(jìn)行統(tǒng)計(jì)學(xué)注釋潭袱。geom_signif()仍繼承全局參數(shù)，comparisons輸入所有進(jìn)行比較的組間名稱的list (list(c('group1', 'group2'), c('group3', 'group4'), ...)) 锋恬，annotations參數(shù)則是按comparisons參數(shù)中輸入比較組的順序添加文字注釋屯换。本包只能進(jìn)行一次分組的比較，而不能進(jìn)行由fill/color劃分的亞組間的比較注釋与学。

? 為了防止不同基因間y軸區(qū)間差異過大趟径，使用ylim()設(shè)置每個(gè)圖的y軸作圖區(qū)間僅分布在本基因樣本最小表達(dá)值-1.5，到最大表達(dá)值+1.5之間癣防。

multi_bee_plot <- function(gene_for_vis){
  library(ggplot2)
  library(ggbeeswarm)
  library(ggsignif)
  ggplot(melted_var_table3[melted_var_table3$genes == gene_for_vis,], aes(x=sample_type, y=vst_counts, color=sample_type))+
  geom_beeswarm(cex = 1.5, size=0.5, priority = 'ascending')+ #priority=none/random/ascending/descending
  geom_point(inherit.aes = F, data=mean_sd[mean_sd$gene == gene_for_vis,],
             aes(x=sample_type, y=mean), shape=3, color='black', size=5)+
  geom_errorbar(inherit.aes = F, data=mean_sd[mean_sd$gene == gene_for_vis,],
                aes(x=sample_type, ymin=mean-sd, ymax=mean+sd), width=0.5, size=0.5)+
  scale_x_discrete(labels=c('normal','cancer'))+
  scale_color_manual(values=c('normal'='blue','cancer'='orange'))+
  geom_signif(comparisons = list(c('normal','cancer')), annotations = annotation_FDR$annotations[annotation_FDR$gene == gene_for_vis],
              color='black', size = 0.5, textsize=3, fontface='bold')+
  xlab(gene_for_vis)+ylab('vst normalized counts')+
  ylim(min(melted_var_table3[melted_var_table3$genes == gene_for_vis,'vst_counts'])-1.5,
       max(melted_var_table3[melted_var_table3$genes == gene_for_vis,'vst_counts'])+1.5)+
  theme_bw()+
  theme(panel.grid = element_line(color = 'white'), legend.position = 'none')
}

使用拼圖R包c(diǎn)ustomLayout進(jìn)行拼圖蜗巧。

思路是用lay_new函數(shù)創(chuàng)建新畫布，建立matrix蕾盯，用其中的行幕屹、列、和數(shù)字分別代表畫布中容納圖片的行级遭、列數(shù)和圖片排列的順序望拖。

lay <- lay_new(mat=matrix(1:4, ncol = 4), widths=c(2,2,2,2),heights = 1)
lay_show(lay)

lay_show函數(shù)用于展示創(chuàng)建的畫布：

the canvas we generated

lapply函數(shù)創(chuàng)建繪圖對(duì)象list，并用lay_grid函數(shù)進(jìn)行拼圖輸出挫鸽。

multiple_beeswarm <- lapply(as.character(unique(melted_var_table2$genes)), multi_bee_plot)
lay_grid(grobs = multiple_beeswarm, lay = lay)

獲得如下組圖：

beeswarm plots collage

一共四種解決方案说敏，希望本文能對(duì)大家有所幫助！

喜歡的朋友就學(xué)起來丢郊，點(diǎn)個(gè)贊吧盔沫！

最后編輯于：2020.04.28 23:11:26

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請(qǐng)聯(lián)系作者

禁止轉(zhuǎn)載医咨，如需轉(zhuǎn)載請(qǐng)通過簡信或評(píng)論聯(lián)系作者。

人面猴
序言：七十年代末架诞，一起剝皮案震驚了整個(gè)濱河市拟淮，隨后出現(xiàn)的幾起案子，更是在濱河造成了極大的恐慌谴忧，老刑警劉巖很泊，帶你破解...
沈念sama閱讀 206,013評(píng)論 6贊 481
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件，死亡現(xiàn)場(chǎng)離奇詭異沾谓，居然都是意外死亡委造，警方通過查閱死者的電腦和手機(jī)，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,205評(píng)論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門均驶，熙熙樓的掌柜王于貴愁眉苦臉地迎上來昏兆，“玉大人，你說我怎么就攤上這事辣恋×恋妫” “怎么了模软？”我有些...
開封第一講書人閱讀 152,370評(píng)論 0贊 342
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵伟骨，是天一觀的道長。經(jīng)常有香客問我燃异，道長携狭，這世上最難降的妖魔是什么？我笑而不...
開封第一講書人閱讀 55,168評(píng)論 1贊 278
?港島之戀（遺憾婚禮）
正文為了忘掉前任回俐，我火速辦了婚禮逛腿，結(jié)果婚禮上，老公的妹妹穿的比我還像新娘仅颇。我一直安慰自己单默，他們只是感情好，可當(dāng)我...
茶點(diǎn)故事閱讀 64,153評(píng)論 5贊 371
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布忘瓦。她就那樣靜靜地躺著搁廓，像睡著了一般。火紅的嫁衣襯著肌膚如雪耕皮。梳的紋絲不亂的頭發(fā)上境蜕，一...
開封第一講書人閱讀 48,954評(píng)論 1贊 283
城市分裂傳說
那天，我揣著相機(jī)與錄音凌停，去河邊找鬼粱年。笑死，一個(gè)胖子當(dāng)著我的面吹牛罚拟，可吹牛的內(nèi)容都是我干的台诗。我是一名探鬼主播完箩，決...
沈念sama閱讀 38,271評(píng)論 3贊 399
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼，長吁一口氣：“原來是場(chǎng)噩夢(mèng)啊……” “哼拉庶！你這毒婦竟也來了嗜憔？” 一聲冷哼從身側(cè)響起，我...
開封第一講書人閱讀 36,916評(píng)論 0贊 259
萬榮殺人案實(shí)錄
序言：老撾萬榮一對(duì)情侶失蹤氏仗，失蹤者是張志新（化名）和其女友劉穎皆尔，沒想到半個(gè)月后慷蠕，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體澎现，經(jīng)...
沈念sama閱讀 43,382評(píng)論 1贊 300
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡渠欺，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 35,877評(píng)論 2贊 323
?白月光啟示錄
正文我和宋清朗相戀三年舔稀，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了产园。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片拂酣。...
茶點(diǎn)故事閱讀 37,989評(píng)論 1贊 333
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡埃撵，死狀恐怖暂刘，靈堂內(nèi)的尸體忽然破棺而出，到底是詐尸還是另有隱情森缠，我是刑警寧澤贵涵，帶...
沈念sama閱讀 33,624評(píng)論 4贊 322
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布，位于F島的核電站跨晴，受9級(jí)特大地震影響坟奥，放射性物質(zhì)發(fā)生泄漏树瞭。R本人自食惡果不足惜拇厢，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 39,209評(píng)論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一凉敲、第九天我趴在偏房一處隱蔽的房頂上張望衣盾。院中可真熱鬧，春花似錦爷抓、人聲如沸势决。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,199評(píng)論 0贊 19
一樁弒父案蓝撇，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽果复。三九已至，卻和暖如春渤昌，著一層夾襖步出監(jiān)牢的瞬間虽抄，已是汗流浹背走搁。一陣腳步聲響...
開封第一講書人閱讀 31,418評(píng)論 1贊 260
情欲美人皮
我被黑心中介騙來泰國打工，沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留迈窟，地道東北人私植。一個(gè)月前我還...
沈念sama閱讀 45,401評(píng)論 2贊 352
代替公主和親
正文我出身青樓，卻偏偏與公主長得像车酣，于是被迫代替她去往敵國和親曲稼。傳聞我的和親對(duì)象是個(gè)殘疾皇子，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 42,700評(píng)論 2贊 345

單/多個(gè)基因在組間同時(shí)展示的多種選擇：分組小提琴圖篮幢、分組/分面柱狀圖慷嗜、單基因蜂群點(diǎn)圖拼圖繪制