測序結(jié)果概覽：基因表達(dá)量rank瀑布圖失驶，高密度表達(dá)相關(guān)性散點(diǎn)圖，注釋特定基因及errorbar的表達(dá)相關(guān)性散點(diǎn)圖繪制

這是ggplot2可視化專題的第一個(gè)實(shí)例操作

ggplot2的基本思路見前文總論：基于ggplot2的RNA-seq轉(zhuǎn)錄組可視化：總述和分文目錄

【ggplot2繪圖具體策略第二篇】：單/多個(gè)基因在組間同時(shí)展示的多種選擇：分組小提琴圖歹垫、分組/分面柱狀圖、單基因蜂群點(diǎn)圖拼圖繪制

【ggplot2繪圖具體策略第三篇】：配對樣本基因表達(dá)趨勢：優(yōu)化前后的散點(diǎn)連線圖+拼圖繪制

在完成差異表達(dá)分析后颠放，我們通常想使用圖片對分組間的所有測到的基因表達(dá)情況進(jìn)行總體的展示排惨。此時(shí)，組間標(biāo)準(zhǔn)化counts碰凶、或FPKM或TPM的相關(guān)性散點(diǎn)圖即可以完成這一目的暮芭，在展示所有基因在兩組間的表達(dá)水平的同時(shí)，可以選擇性突出我們重點(diǎn)關(guān)注的hits痒留。

此外在展示差異基因的層面谴麦，除了熱圖能夠通過色階反映基因在樣本間的表達(dá)離散程度外，瀑布圖更能直觀地反映差異基因的foldchange以及整體的變化趨勢伸头。

此前在文獻(xiàn)閱讀過程中發(fā)現(xiàn)幾張figures還挺喜歡的匾效，分別是測序后FoldChange和p值的總體展示瀑布圖、測序結(jié)果間相關(guān)性高密度散點(diǎn)圖恤磷，和帶errorbar注釋的測序結(jié)果間相關(guān)性散點(diǎn)圖面哼。

the figures that we're going to imitate

致謝以下研究人員，引用如有不妥請聯(lián)系我刪除扫步。

A圖來源：Ye, L.; Park, J.J.; Dong, M.B.; Yang, Q.; Chow, R.D.; Peng, L.; Du, Y.; Guo, J.; Dai, X.; Wang, G., et al. In vivo CRISPR screening in CD8 T cells with AAV-Sleeping Beauty hybrid vectors identifies membrane targets for improving immunotherapy for glioblastoma. Nat Biotechnol 2019, 37, 1302-1313, doi:10.1038/s41587-019-0246-4.

B圖來源：Zhao, Y.; Tyrishkin, K.; Sjaarda, C.; Khanal, P.; Stafford, J.; Rauh, M.; Liu, X.; Babak, T.; Yang, X. A one-step tRNA-CRISPR system for genome-wide genetic interaction mapping in mammalian cells. Sci Rep 2019, 9, 14499, doi:10.1038/s41598-019-51090-3.

C圖來源：Chow, R.D.; Wang, G.; Ye, L.; Codina, A.; Kim, H.R.; Shen, L.; Dong, M.B.; Errami, Y.; Chen, S. In vivo profiling of metastatic double knockouts through CRISPR-Cpf1 screens. Nat Methods 2019, 16, 405-408, doi:10.1038/s41592-019-0371-5.

下面就嘗試來用前文介紹的TCGA白人LUSC肺鱗癌mRNA-seq公共數(shù)據(jù)復(fù)制一下圖片大致的形式魔策。

數(shù)據(jù)獲取

基于第一篇文章從TCGA數(shù)據(jù)庫下載并整合清洗高通量腫瘤表達(dá)譜-臨床性狀數(shù)據(jù)，我們下載并清洗TCGA數(shù)據(jù)庫中white人種的LUSC肺鱗癌mRNA-seq轉(zhuǎn)錄組counts數(shù)據(jù)和FPKM數(shù)據(jù)河胎。

隨后根據(jù)第二篇文章TCGA數(shù)據(jù)整合后進(jìn)行DESeq2差異表達(dá)分析和基于R的多種可視化對counts數(shù)據(jù)進(jìn)行了基于DESeq2的差異分析闯袒。

現(xiàn)在假設(shè)我們已經(jīng)獲得：
（樣本是1到344為cancer，345到386為normal）

（1）dds_DE: the object generated by DESeq2 containing all info about the differential expression analysis.
（2）resSigAll: the object containing all up-regulated and down-regulated significant genes extracted by results and subset function from dds_DE.
（3）resSigUp: the subset object of resSigAll containing all up-regulated genes.
（4）resSigDown: the subset object of resSigAll containing all down-regulated genes.
（5）expr_vst: vst-transformed normalized counts matrix generated by DESeq2.

需要R包：

ggplot2 (作圖)游岳、DESeq2 (從差異分析對象中提取一些數(shù)據(jù))政敢、ggrepel (不會(huì)發(fā)生重疊的基于ggplot2的文字注釋包)、IDPmisc（繪制像素化高密度散點(diǎn)圖）胚迫，和dplyr (進(jìn)行文字處理)喷户。

install.packages('ggplot2')
install.packages('DESeq2')
install.packages('ggrepel')
install.packages('dplyr')
install.packages('IDPmisc')
library('ggplot2')
library('DESeq2')
library('ggrepel')
library('dplyr')
library('IDPmisc')

基因表達(dá)量rank瀑布圖（A）

我們首先需要想好ggplot2需要的數(shù)據(jù)是什么。觀察圖A访锻，我們可知繪圖需要：
（1）所有在DESeq2中分析的qualified genes的log2FC（x軸）褪尝；
（2）基因名和log2FC ranks（y軸）；
（3）基因FC的校正后p值（點(diǎn)的size）期犬。

因此使用DESeq2包的results函數(shù)獲得（幾乎）所有基因的基因名河哑、log2FC、-log10(p.adj)龟虎、表達(dá)rank璃谨，并按rank倒序排列。由于需要用顏色突出top5和bottom5差異基因遣总，因此dataframe還需加一列'hits'將基因分為top5/medium/bottom5睬罗。準(zhǔn)備將該數(shù)據(jù)框傳遞給畫圖主體函數(shù)。

由于只對top/bottom5差異基因進(jìn)行文字注釋旭斥，因此還需要準(zhǔn)備一個(gè)只包含它們的數(shù)據(jù)框傳遞給注釋函數(shù)容达。

pre_ranked_all_genes <- as.data.frame(results(dds_DE, alpha=0.9999,contrast=c("sample_type","cancer","normal")))
pre_ranked_all_genes <- pre_ranked_all_genes[order(pre_ranked_all_genes$log2FoldChange, decreasing = F),]
trend_all <- sapply(pre_ranked_all_genes$log2FoldChange, function(x){if(x>0) 'up' else 'down'})
pre_ranked_all_genes <- data.frame('gene_names'=rownames(pre_ranked_all_genes), pre_ranked_all_genes, 
                                   'trend'=trend_all, 'rank'=1:nrow(pre_ranked_all_genes), 
                                   'hits'='medium', stringsAsFactors = F)
pre_ranked_all_genes[1:5, 'hits'] <- 'bottom5'
pre_ranked_all_genes[(nrow(pre_ranked_all_genes)-4):nrow(pre_ranked_all_genes), 'hits'] <- 'top5'
pre_ranked_all_genes$padj <- -log10(pre_ranked_all_genes$padj)
set.seed(12)
to_be_pointed_out_all <- pre_ranked_all_genes[c(1:5, (nrow(pre_ranked_all_genes)-4):nrow(pre_ranked_all_genes)),]

隨后可進(jìn)行繪圖，思路和使用參數(shù)解釋如下（可對比代碼理解）：

? 含所有基因的數(shù)據(jù)框傳遞給ggplot函數(shù) (data=)垂券，x軸為rank花盐，y軸為log2FoldChange，color以hits列分類 (aes())菇爪，作為繪圖對象中的全局變量算芯。

? 調(diào)用散點(diǎn)圖函數(shù) (geom_point()) 并在繼承變量的基礎(chǔ)上補(bǔ)充參數(shù)，即本函數(shù)控制的散點(diǎn)圖中點(diǎn)的size由padj列控制 (aes())凳宙。

? 添加y軸截距為2熙揍，0和-2的水平虛線 (geom_hline(xintercept=, size=, linetype=))，添加x軸位于上/下調(diào)基因間的豎直虛線 (geom_vline(yintercept=, ...))氏涩。

? 選擇不繼承g(shù)gplot函數(shù)的輸入變量 (inherit.aes=F)届囚，將僅含top/bottom5基因信息的小dataframe傳入ggrepel包的文字注釋函數(shù) (geom_text.repel(data=,mapping=))，指定與全局相同的x/y軸是尖、注釋文字和顏色信息 (aes(x=, y=, label=, color=))意系，并限定注釋大小 (size)，排列方式 (direction='x'/'y')饺汹，和注釋位置的x軸范圍限制 (xlim=c(m,n))蛔添。

? 手動(dòng)設(shè)置ggplot函數(shù)傳入color分類變量的對應(yīng)顏色 (scale_color_manual(values=c('type1'='color1', ...)))；設(shè)置size對應(yīng)圖例的名稱 (scale_size_continuous(name=))兜辞。

? 最后設(shè)置x軸小標(biāo)題(xlab())迎瞧，在默認(rèn)白色標(biāo)題(theme_bw())的基礎(chǔ)上修改分割線為白色，并居中圖例標(biāo)題(theme())弦疮。

fall_plot <- ggplot(pre_ranked_all_genes, aes(x=rank, y=log2FoldChange, color=hits))+
  geom_point(aes(size=padj))+
  geom_hline(yintercept = c(2,-2), linetype=2, size=0.25)+
  geom_hline(yintercept = c(0), linetype=1, size=0.5)+
  geom_vline(xintercept = sum(pre_ranked_all_genes$trend == 'down')+0.5, linetype=2, size=0.25)+
  ggrepel::geom_text_repel(inherit.aes = F, data = to_be_pointed_out_all, aes(x=rank, y=log2FoldChange, label=gene_names, color=hits), 
                           size=3, direction = 'y', xlim = c(1000, 15000))+
  scale_color_manual(values=c('bottom5'='blue', 'top5'='red', 'medium'='black'))+
  scale_size_continuous(name='-log10(FDR)')+
  scale_y_continuous(breaks=c(-5, -2, 0, 2, 5, 10))+
  xlab('rank of differentially expressed genes')+
  theme_bw()+
  theme(panel.grid = element_line(color='white'), legend.title.align = 0.5)
fall_plot

感覺還原度還挺不錯(cuò)的夹攒！

the plot we get

另一個(gè)版本的只含差異基因的瀑布圖：

這個(gè)版本我們只選取差異基因作圖，隨機(jī)選取5個(gè)基因進(jìn)行注釋胁塞，顏色代表foldchange咏尝，而不再將size與校正p值關(guān)聯(lián)。

整體流程區(qū)別不大啸罢。作圖的代碼可以嘗試自行理解编检。差別主要體現(xiàn)在：
ggplot函數(shù)中aes()映射的color變量由上面的三分類變量（top5/medium/bottom5）變成了連續(xù)變量（log2FoldChange），因此顏色的設(shè)置發(fā)生變化扰才。

這里使用了scale_color_gradient2()設(shè)置顏色允懂，其特點(diǎn)為可以設(shè)置3種顏色 (low=, high=, mid=, midpoint=) 根據(jù)指定值由低到高的漸變。

#draw the foldchange plot with specific annotations.
pre_ranked_sig_genes <- as.data.frame(resSigAll)[,c(2,6)] %>% data.frame('gene_names'=resSigAll@rownames, stringsAsFactors = F)
pre_ranked_sig_genes <- pre_ranked_sig_genes[order(pre_ranked_sig_genes$log2FoldChange, decreasing = T),]
trend <- sapply(pre_ranked_sig_genes$log2FoldChange, function(x){if(x>0) 'up' else 'down'})
pre_ranked_sig_genes <- data.frame(pre_ranked_sig_genes, 'trend'=trend,
                                   'rank'=1:nrow(pre_ranked_sig_genes), stringsAsFactors = F)
#randomly select 5 genes to be annotated in the plot.
set.seed(12)
to_be_pointed_out <- pre_ranked_sig_genes[sample(1:nrow(pre_ranked_sig_genes), 5),]

fall_plot <- ggplot(pre_ranked_sig_genes, aes(x=rank, y=log2FoldChange, color=log2FoldChange))+
  geom_point(size=1)+
  geom_hline(yintercept = c(2,-2), linetype=2, size=0.25)+
  geom_hline(yintercept = c(0), linetype=1, size=0.5)+
  geom_vline(xintercept = 1636.5, linetype=2, size=0.25)+
  scale_color_gradient2(low="navy", high="firebrick3", mid="white", midpoint=0)+
  geom_point(inherit.aes = F, data=to_be_pointed_out, aes(x=rank, y=log2FoldChange), size = 3, color='black')+
  geom_point(inherit.aes = F, data=to_be_pointed_out, aes(x=rank, y=log2FoldChange), size = 2, color='yellow')+
  ggrepel::geom_text_repel(inherit.aes = F, data = to_be_pointed_out, aes(x=rank, y=log2FoldChange, label=gene_names), size=3)+
  xlab('rank of differentially expressed genes')+
  theme_bw()+
  theme(panel.grid = element_line(color='white'), legend.title.align = 0.5)
fall_plot

得到的圖形為:

another version of FoldChange plot

高密度表達(dá)相關(guān)性散點(diǎn)圖（B）

這個(gè)figure的意義是為了說明測序結(jié)果的可重復(fù)性和可比較性衩匣，即主要的散點(diǎn)集中在散點(diǎn)圖的對角線上蕾总，而色域能很好地反映這一聚集特點(diǎn)粥航。

繪圖需要：
（1）在cancer樣本中的所有基因表達(dá)值（這里用vst counts）均數(shù)（y軸）；
（2）在normal樣本中的所有基因表達(dá)值（這里用vst counts）均數(shù)（x軸）生百；
（3）散點(diǎn)的聚集密度信息（色域）递雀。

其實(shí)已經(jīng)有其他的R包或ggplot2中的函數(shù)可以完成類似的功能。但感覺沒有達(dá)到article中的效果蚀浆。article中的figure是直接對散點(diǎn)進(jìn)行上色缀程，而不是像素化的效果：

（1）IDPmisc包：iplot函數(shù)將畫圖區(qū)域按pixs命令的分辨率劃分為若干方格，按落在每個(gè)方格中的點(diǎn)密度進(jìn)行繪圖市俊。

pixs參數(shù)設(shè)置像素邊長杨凑，colramp參數(shù)設(shè)置漸變色彩。

expr_for_visualization <- data.frame('cancer'=rowMeans(expr_vst[,1:344]), 'normal'=rowMeans(expr_vst[,345:386]), stringsAsFactors = F)
library(IDPmisc)
IDPmisc::iplot(x = expr_for_visualization$normal, y=expr_for_visualization$cancer, pixs = 0.4, 
               colramp=colorRampPalette(colors = c('black','blue','yellow','red')),
               xlab='normal', ylab='cancer', main='correlation scattorplot with density')

（2）ggplot2包stat_bin2d()函數(shù)：將畫圖區(qū)域分為30*30的方格進(jìn)行密度計(jì)算和可視化摆昧。

expr_for_visualization <- data.frame('cancer'=rowMeans(expr_vst[,1:344]), 'normal'=rowMeans(expr_vst[,345:386]), stringsAsFactors = F)
ggplot(expr_for_visualization, aes(x=normal, y=cancer))+stat_bin2d()+theme_bw()+
  theme(panel.grid = element_line(colour = 'white'))

得到的結(jié)果如下撩满，都是像素化效果。左邊還好据忘，右邊丑哭：

density plot drawn by IDPmisc (left)-density plot drawn by ggplot2-stat_bin2d() function (right)

所以我自己用ggplot2進(jìn)行了自定義作圖：

圖片的繪制思想：繪制表達(dá)均值的散點(diǎn)圖鹦牛，以散點(diǎn)密度為連續(xù)變量定義散點(diǎn)的color。

具體做法1：直接將繪圖區(qū)域劃分網(wǎng)格并計(jì)算網(wǎng)格內(nèi)密度

設(shè)定將繪圖區(qū)域分為70*70的網(wǎng)格勇吊，使用cut函數(shù)進(jìn)行x/y軸的分組曼追，使用table函數(shù)進(jìn)行網(wǎng)格密度頻數(shù)統(tǒng)計(jì)得到freq，最終merge函數(shù)左合并至坐標(biāo)dataframe汉规，讓每個(gè)散點(diǎn)都獲得自己所在網(wǎng)格的密度數(shù)礼殊。

#Draw the high-density scatterplot.
intended_pixel_num = 70
expr_for_visualization <- data.frame('cancer'=rowMeans(expr_vst[,1:344]), 'normal'=rowMeans(expr_vst[,345:386]), stringsAsFactors = F)
cancer_notches <- seq(from = min(expr_for_visualization$cancer-0.1),to = max(expr_for_visualization$cancer+0.1), length.out = (intended_pixel_num + 1))
normal_notches <- seq(from = min(expr_for_visualization$normal-0.1),to = max(expr_for_visualization$normal+0.1), length.out = (intended_pixel_num + 1))
cancer_groups <- as.character(cut(expr_for_visualization$cancer, breaks = cancer_notches))
normal_groups <- as.character(cut(expr_for_visualization$normal, breaks = normal_notches))
expr_for_visualization <- data.frame('genes'=rownames(expr_for_visualization), expr_for_visualization,
                                     'groups'=paste(cancer_groups, normal_groups, sep = '_'), stringsAsFactors = F)
frequency <- as.data.frame(table(expr_for_visualization$groups))
frequency$Var1 <- as.character(frequency$Var1)
colnames(frequency) <- c('groups','freq')
expr_for_visualization <- merge(expr_for_visualization, frequency, by='groups', all.x = T)

然后進(jìn)行g(shù)gplot2的繪圖：

? 含所有基因在cancer和normal樣本間vst counts均值的數(shù)據(jù)框傳遞給ggplot函數(shù)，x軸為normal针史，y軸為cancer晶伦，color由freq列連續(xù)變量控制，作為繪圖對象中的全局變量啄枕。

? 用geom_point繼承輸入變量作散點(diǎn)圖婚陪，散點(diǎn)size固定。

? scale_color_gradientn()函數(shù)設(shè)置連續(xù)變量的多種顏色漸變频祝。

? ggtitle()設(shè)置figure標(biāo)題泌参，統(tǒng)一使用自己習(xí)慣的作圖主題，并將標(biāo)題居中常空。

high_density_scatterplot <- ggplot(expr_for_visualization, aes(x=normal, y=cancer, colour=freq))+
  geom_point(size=0.1)+
  scale_color_gradientn(colours=c('black','blue','yellow','red'))+
  ggtitle('correlation scattorplot with density')+
  theme_bw()+theme(panel.grid = element_line(colour = 'white'), 
                   plot.title = element_text(hjust = 0.5))
high_density_scatterplot

獲得的圖片和第二種方法同時(shí)展示在下面沽一。方便比較。

具體做法2：將繪圖區(qū)域劃分網(wǎng)格后漓糙，計(jì)算網(wǎng)格內(nèi)和網(wǎng)格周圍部分區(qū)域密度

可以看到铣缠，雖然是對散點(diǎn)著色，但在高密度區(qū)由于散點(diǎn)密集，分割像素化的痕跡還是很明顯蝗蛙。所以這次擴(kuò)大每個(gè)網(wǎng)格計(jì)算密度的范圍蝇庭，使得每個(gè)網(wǎng)格取密度時(shí)相互重疊，或許可以獲得更好的過渡效果捡硅。

設(shè)定將繪圖區(qū)域分為100*100的網(wǎng)格遗契，使用兩輪for循環(huán)（用for循環(huán)真是要了R的老命...），逐個(gè)網(wǎng)格進(jìn)行分組標(biāo)記和網(wǎng)格內(nèi)散點(diǎn)密度數(shù)標(biāo)記病曾。

隨后的ggplot2繪圖命令幾乎是完全相同的。不再過細(xì)解釋漾根。

intended_pixel_num <- 100
#data integration and formating.
expr_for_visualization <- data.frame('cancer'=rowMeans(expr_vst[,1:344]), 'normal'=rowMeans(expr_vst[,345:386]),
                                     'series'=rep('0', nrow(expr_vst)), 'counts'= rep(0, nrow(expr_vst)), stringsAsFactors = F)
cancer_notches <- seq(from = min(expr_for_visualization$cancer-0.1),to = max(expr_for_visualization$cancer+0.1), length.out = (intended_pixel_num + 1))
normal_notches <- seq(from = min(expr_for_visualization$normal-0.1),to = max(expr_for_visualization$normal+0.1), length.out = (intended_pixel_num + 1))
n <- 0
for(rows in 1:(length(cancer_notches)-1)) {
  for (columns in 1:(length(normal_notches)-1)) {
    n = n+1
    series_num <- as.character(n)
    cancer_interval <- cancer_notches[3]-cancer_notches[2]
    normal_interval <- normal_notches[3]-normal_notches[2]
    indices <- dplyr::intersect(which(expr_for_visualization$cancer>=(cancer_notches[rows]) & expr_for_visualization$cancer<(cancer_notches[rows+1])),
                     which(expr_for_visualization$normal>=(normal_notches[columns]) & expr_for_visualization$normal<(normal_notches[columns+1])))
    counts <- dplyr::intersect(which(expr_for_visualization$cancer>=(cancer_notches[rows]-cancer_interval*0.5) & expr_for_visualization$cancer<(cancer_notches[rows+1]+cancer_interval*0.5)),
                                which(expr_for_visualization$normal>=(normal_notches[columns]-normal_interval*0.5) & expr_for_visualization$normal<(normal_notches[columns+1]+normal_interval*0.5)))
    if(length(indices) >0){
      expr_for_visualization[indices,'series'] <- series_num
      expr_for_visualization[indices,'counts'] <- length(counts)
    }
  }
}

#draw the plot.
high_density_scatterplot <- ggplot(expr_for_visualization, aes(x=normal, y=cancer, colour=counts))+
  theme_bw()+theme(panel.grid = element_line(colour = 'white'), plot.title = element_text(hjust = 0.5))+
  geom_point(size=0.02)+ggtitle('correlation scattorplot with density')+
  scale_color_gradientn(colours=c('black','blue','yellow','red'))
high_density_scatterplot

得到的結(jié)果說明泰涂，第二種采樣取密度的方法得到的結(jié)果很好地減輕了像素化的程度。
【不過我還是不知道article的原做法...它的色域漸變非常流暢辐怕！如果有朋友有相關(guān)經(jīng)驗(yàn)逼蒙，歡迎交流！】

the high-density scatterplot drawn by method1 (left)-the high-density scatterplot drawn by method2 (right)

注釋特定基因及errorbar的表達(dá)相關(guān)性散點(diǎn)圖（C）

圖片的特點(diǎn)就在于對特定的outliers做了表達(dá)量的errorbar注釋寄疏，增加了擬合直線和置信區(qū)間是牢，以及左上角進(jìn)行了少許圖文注釋。

繪圖需要：
（1）在cancer樣本中的所有基因表達(dá)值（這里用vst counts）均數(shù)（y軸）陕截；
（2）在normal樣本中的所有基因表達(dá)值（這里用vst counts）均數(shù)（x軸）驳棱；
（3）上下調(diào)差異基因列表（進(jìn)行color標(biāo)識(shí)）；
（4）top/bottom3基因在cancer和normal組中vst counts的均數(shù)农曲、標(biāo)準(zhǔn)差sd（errorbar注釋）社搅。

數(shù)據(jù)整理：獲得表達(dá)均值即在圖中的坐標(biāo)后，根據(jù)差異分析的結(jié)果將散點(diǎn)分為up/no/down三類乳规。然后提取top/bottom3基因名形葬，計(jì)算sd值，并與mean值整合成新的小dataframe用于特定注釋暮的。

#draw the scatterplot with specific annotation.
vst_coordinate <- data.frame('cancer'=rowMeans(expr_vst[,1:344]), 'normal'=rowMeans(expr_vst[,345:386]), 'trend'='no', stringsAsFactors = F)
vst_coordinate[resSigUp@rownames,'trend'] <- 'up'
vst_coordinate[resSigDown@rownames,'trend'] <- 'down'
#get the top 3 up/down-regulated genes and calculate the sd values.
ressigall_df <- as.data.frame(resSigAll)
ressigall_df <- ressigall_df[order(ressigall_df$log2FoldChange, decreasing = T),]
top_genes <- rownames(ressigall_df)[c(1:3, (nrow(ressigall_df)-2):nrow(ressigall_df))]
#get the small dataframe containing sd and mean vst values of top genes.
sd_2d_calculation <- function(gene_name){
  sd_cancer <- sd(expr_vst[gene_name,1:344])
  sd_normal <- sd(expr_vst[gene_name,345:386])
  data.frame('sd_cancer'=sd_cancer,'sd_normal'=sd_normal)
}
sd_df <- do.call(rbind,lapply(top_genes, sd_2d_calculation))
sd_df <- cbind(vst_coordinate[top_genes,], sd_df)

進(jìn)行g(shù)gplot2作圖：
? 含所有基因在cancer和normal樣本間vst counts均值的數(shù)據(jù)框傳遞給ggplot函數(shù)笙以，x軸為normal，y軸為cancer冻辩，作為繪圖對象中的全局變量猖腕。

? 用stat_smooth()繼承輸入變量進(jìn)行線性回歸擬合 (method='lm')，添加擬合虛線 (linetype=2) 并可視化95%置信區(qū)間 (level=0.95)微猖。

? geom_point()函數(shù)繪制主體散點(diǎn)谈息，并補(bǔ)充參數(shù)，以trend列作為散點(diǎn)顏色分類依據(jù)凛剥。

? scale_color_manual()函數(shù)手動(dòng)設(shè)置散點(diǎn)類別對應(yīng)顏色侠仇，上調(diào)基因用firebrick3，下調(diào)基因用navy，不顯著基因用black逻炊。

? geom_errorbar()函數(shù)和geom_errorbarh()函數(shù)均不繼承全局參數(shù)互亮，而用小dataframe作為輸入數(shù)據(jù)，分別添加top genes豎直和水平方向的errorbar (誤差棒的方向需設(shè)置最大值和最小值余素，如errorbar函數(shù)為ymax, ymin豹休；errorbarh函數(shù)為xmax，xmin)桨吊，并設(shè)置bar頂端橫線的長度 (width/height) 和bar的粗細(xì) (size)威根。

? 另外調(diào)用一個(gè)geom_point()函數(shù)，輸入top genes dataframe進(jìn)行top genes的黃色點(diǎn)標(biāo)注视乐。

? 使用ggrepel包的geom_label_repel()函數(shù)洛搀，輸入top genes dataframe進(jìn)行top hits的文字注釋。（由于散點(diǎn)較多佑淀，所以使用了帶文字框的label函數(shù)留美，而不是僅進(jìn)行文字注釋的text函數(shù)）

? 使用annotate()函數(shù)進(jìn)行文字的小范圍自由注釋，需輸入好注釋的類型（如'text'）伸刃、color谎砾、size、label (注釋內(nèi)容)捧颅，以及在x/y軸分別的位置景图。

? 設(shè)置figure標(biāo)題，統(tǒng)一使用自己習(xí)慣的作圖主題碉哑，并將標(biāo)題居中症歇。

p <- ggplot(vst_coordinate, aes(x=normal, y=cancer))
#add the fitting line. level is the level of confidence interval.
p <- p+stat_smooth(method='lm', level = 0.95, linetype=2, color='black')
#add the points.
p <- p+geom_point(aes(color=trend), size=0.5)+
  scale_color_manual(values = c('no'='black','up'='firebrick3','down'='navy'))
#add error bars to top hits. widths and heights are parameters of the tips at the top of bars.
p <- p+geom_errorbar(inherit.aes = F, data = sd_df, aes(x=normal, ymin=cancer-sd_cancer, ymax=cancer+sd_cancer), width=0.3)+
  geom_errorbarh(inherit.aes = F, data = sd_df, aes(y=cancer, xmin=normal-sd_normal, xmax=normal+sd_normal), size=0.5, height=0.4)
#emphasize the top hits.
p <- p+geom_point(inherit.aes = F, data = sd_df, aes(x=normal, y=cancer), size=2.5, color='black')+
  geom_point(inherit.aes = F, data = sd_df, aes(x=normal, y=cancer), size=1.5, color='yellow')
#annotate the hits with gene names.
p <- p+geom_label_repel(data = sd_df, aes(label = rownames(sd_df)), 
                   color='black', fontface='bold', size = 3, 
                   segment.color = 'black', xlim = c(7, 20))
#add some text annotations on the upper left of the panel.
p <- p+annotate('text', label=paste0('Upregulated genes (n=', length(resSigUp@rownames),')'), color='firebrick3', x=9.6, y=19.5)+
  annotate('text', label=paste0('Downregulated genes (n=', length(resSigDown@rownames),')'), color='navy', x=10, y=18.7)+
  annotate('point', color='black', size=3, x=5.8, y=17.9)+
  annotate('point', color='yellow', size=2, x=5.8, y=17.9)+
  annotate('text', label='Top3 up/downregulated DE hits', color='black', x=10.5, y=17.9)
#Set the theme of the plot.
p <- p+theme_bw()+
  theme(panel.grid = element_line(color = 'white'), legend.position = 'none')+
  xlab('vst counts of normal samples')+ylab('vst counts of cancer samples')
p

獲得的圖片結(jié)果為：

scattorplot with specific annotations.jpg

因?yàn)槭寝D(zhuǎn)錄組mRNA-seq數(shù)據(jù)，不是原文中的較小范圍的CRISPR-screening谭梗，所以圖片沒有那么美觀忘晤。但是樣式模仿到了就好！

不知道大家是否對ggplot2的繪圖范式有了一定的認(rèn)識(shí)激捏，希望本文能有所幫助~喜歡就點(diǎn)個(gè)贊吧设塔！

最后編輯于：2020.04.28 23:08:38

?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者

禁止轉(zhuǎn)載，如需轉(zhuǎn)載請通過簡信或評(píng)論聯(lián)系作者远舅。

人面猴
序言：七十年代末闰蛔，一起剝皮案震驚了整個(gè)濱河市，隨后出現(xiàn)的幾起案子图柏，更是在濱河造成了極大的恐慌序六，老刑警劉巖，帶你破解...
沈念sama閱讀 206,839評(píng)論 6贊 482
死咒
序言：濱河連續(xù)發(fā)生了三起死亡事件蚤吹，死亡現(xiàn)場離奇詭異例诀，居然都是意外死亡随抠，警方通過查閱死者的電腦和手機(jī)，發(fā)現(xiàn)死者居然都...
沈念sama閱讀 88,543評(píng)論 2贊 382
救了他兩次的神仙讓他今天三更去死
文/潘曉璐我一進(jìn)店門繁涂，熙熙樓的掌柜王于貴愁眉苦臉地迎上來拱她，“玉大人，你說我怎么就攤上這事扔罪”樱” “怎么了？”我有些...
開封第一講書人閱讀 153,116評(píng)論 0贊 344
道士緝兇錄：失蹤的賣姜人
文/不壞的土叔我叫張陵矿酵，是天一觀的道長唬复。經(jīng)常有香客問我，道長全肮，這世上最難降的妖魔是什么盅抚？我笑而不...
開封第一講書人閱讀 55,371評(píng)論 1贊 279
?港島之戀（遺憾婚禮）
正文為了忘掉前任，我火速辦了婚禮倔矾，結(jié)果婚禮上，老公的妹妹穿的比我還像新娘柱锹。我一直安慰自己哪自，他們只是感情好，可當(dāng)我...
茶點(diǎn)故事閱讀 64,384評(píng)論 5贊 374
惡毒庶女頂嫁案：這布局不是一般人想出來的
文/花漫我一把揭開白布禁熏。她就那樣靜靜地躺著壤巷，像睡著了一般。火紅的嫁衣襯著肌膚如雪瞧毙。梳的紋絲不亂的頭發(fā)上蝶桶，一...
開封第一講書人閱讀 49,111評(píng)論 1贊 285
城市分裂傳說
那天卸留，我揣著相機(jī)與錄音，去河邊找鬼。笑死脂信，一個(gè)胖子當(dāng)著我的面吹牛，可吹牛的內(nèi)容都是我干的握玛。我是一名探鬼主播望门，決...
沈念sama閱讀 38,416評(píng)論 3贊 400
雙鴛鴦連環(huán)套：你想象不到人心有多黑
文/蒼蘭香墨我猛地睜開眼，長吁一口氣：“原來是場噩夢啊……” “哼男图！你這毒婦竟也來了示姿？” 一聲冷哼從身側(cè)響起，我...
開封第一講書人閱讀 37,053評(píng)論 0贊 259
萬榮殺人案實(shí)錄
序言：老撾萬榮一對情侶失蹤逊笆，失蹤者是張志新（化名）和其女友劉穎栈戳，沒想到半個(gè)月后，有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體难裆，經(jīng)...
沈念sama閱讀 43,558評(píng)論 1贊 300
?護(hù)林員之死
正文獨(dú)居荒郊野嶺守林人離奇死亡子檀，尸身上長有42處帶血的膿包…… 初始之章·張勛以下內(nèi)容為張勛視角年9月15日...
茶點(diǎn)故事閱讀 36,007評(píng)論 2贊 325
?白月光啟示錄
正文我和宋清朗相戀三年，在試婚紗的時(shí)候發(fā)現(xiàn)自己被綠了。大學(xué)時(shí)的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片命锄。...
茶點(diǎn)故事閱讀 38,117評(píng)論 1贊 334
活死人
序言：一個(gè)原本活蹦亂跳的男人離奇死亡堰乔，死狀恐怖，靈堂內(nèi)的尸體忽然破棺而出脐恩，到底是詐尸還是另有隱情镐侯，我是刑警寧澤，帶...
沈念sama閱讀 33,756評(píng)論 4贊 324
?日本核電站爆炸內(nèi)幕
正文年R本政府宣布驶冒，位于F島的核電站苟翻，受9級(jí)特大地震影響，放射性物質(zhì)發(fā)生泄漏骗污。R本人自食惡果不足惜崇猫，卻給世界環(huán)境...
茶點(diǎn)故事閱讀 39,324評(píng)論 3贊 307
男人毒藥：我在死后第九天來索命
文/蒙蒙一、第九天我趴在偏房一處隱蔽的房頂上張望需忿。院中可真熱鬧诅炉，春花似錦、人聲如沸屋厘。這莊子的主人今日做“春日...
開封第一講書人閱讀 30,315評(píng)論 0贊 19
一樁弒父案，背后竟有這般陰謀
文/蒼蘭香墨我抬頭看了看天上的太陽汗洒。三九已至议纯，卻和暖如春，著一層夾襖步出監(jiān)牢的瞬間溢谤，已是汗流浹背瞻凤。一陣腳步聲響...
開封第一講書人閱讀 31,539評(píng)論 1贊 262
情欲美人皮
我被黑心中介騙來泰國打工，沒想到剛下飛機(jī)就差點(diǎn)兒被人妖公主榨干…… 1. 我叫王不留世杀，地道東北人阀参。一個(gè)月前我還...
沈念sama閱讀 45,578評(píng)論 2贊 355
代替公主和親
正文我出身青樓，卻偏偏與公主長得像瞻坝，于是被迫代替她去往敵國和親结笨。傳聞我的和親對象是個(gè)殘疾皇子，可洞房花燭夜當(dāng)晚...
茶點(diǎn)故事閱讀 42,877評(píng)論 2贊 345

測序結(jié)果概覽：基因表達(dá)量rank瀑布圖塔插，高密度表達(dá)相關(guān)性散點(diǎn)圖，注釋特定基因及errorbar的表達(dá)相關(guān)性散點(diǎn)圖繪制