最近發(fā)現(xiàn)一個新包EnhancedVolcano
乱灵,畫火山圖令人發(fā)指的簡單,而且用戶自主設置顏色七冲、形狀痛倚、大小和陰影等參數(shù)定義不同的繪圖屬性,還可以通過添加連線的方式有效避免數(shù)據(jù)點之間的重疊現(xiàn)象澜躺,強烈分享一下:
介紹來源:
https://www.bioconductor.org/packages/release/bioc/vignettes/EnhancedVolcano/inst/doc/EnhancedVolcano.html
1 安裝
1.1 1. 來源 Bioconductor
if (!requireNamespace('BiocManager', quietly = TRUE))
install.packages('BiocManager')
BiocManager::install('EnhancedVolcano')
注:安裝開發(fā)版
devtools::install_github('kevinblighe/EnhancedVolcano')
1.2 2. 加載R包
library(EnhancedVolcano)
2 快速啟動
通過示例演示 RNA-seq workflow: gene-level exploratory analysis and differential expression. 加載 ‘a(chǎn)irway’ 數(shù)據(jù)
library(airway)
library(magrittr)
data('airway')
airway$dex %<>% relevel('untrt')
注釋Ensembl gene IDs為gene symbols:
ens <- rownames(airway)
library(org.Hs.eg.db)
symbols <- mapIds(org.Hs.eg.db, keys = ens,
column = c('SYMBOL'), keytype = 'ENSEMBL')
symbols <- symbols[!is.na(symbols)]
symbols <- symbols[match(rownames(airway), names(symbols))]
rownames(airway) <- symbols
keep <- !is.na(rownames(airway))
airway <- airway[keep,]
運行DESeq2進行兩組的差異分析
library('DESeq2')
dds <- DESeqDataSet(airway, design = ~ cell + dex)
dds <- DESeq(dds, betaPrior=FALSE)
res <- results(dds,
contrast = c('dex','trt','untrt'))
res <- lfcShrink(dds,
contrast = c('dex','trt','untrt'), res=res, type = 'normal')
2.1 最基礎的火山圖繪制
對于多數(shù)基礎火山圖而言蝉稳,僅有單個數(shù)據(jù)框、數(shù)據(jù)矩陣或三列數(shù)據(jù)結果即可苗踪,包括標簽颠区、log2FC和校正或未校正的P值,其中默認log2FC的閾值為|2|; 默認 P值閾值為10e-6.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue')
3 高級參數(shù)
實際上通铲,EnhancedVolcano圖的所有方面都可以配置為適應所有類型的統(tǒng)計分布和標簽偏好。默認情況下器贩,EnhancedVolcano只會嘗試標記那些通過統(tǒng)計顯著性閾值的基因颅夺,即“pCutoff”和“FCcutoff”朋截。此外,它只會標注盡可能多的這些可以合理地適應情節(jié)空間吧黄。用戶可以選擇提供他/她希望在繪圖中標記的標簽向量(如“selectLab”)部服。
3.1 修改log2FC和P值的截止值;指定標題拗慨;調整點和標簽大小
對于大多數(shù)研究而言廓八,10e-6的默認P值截止值可能過于寬松,因此可能需要將該閾值提高幾個數(shù)量級赵抢。同樣剧蹂,log2FC截止值可能過于嚴格,因為現(xiàn)在可以計算差異表達分析中l(wèi)og2FC差異的適度“收縮”估計烦却。
在本例中宠叼,我們還修改了點和標簽大小,這有助于提高差異表達式分析中許多變量的清晰度其爵。
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'N061011 versus N61311',
pCutoff = 10e-32, ##自己定義p閾值
FCcutoff = 0.5, ##自己定義FC閾值
pointSize = 3.0,
labSize = 6.0)
4.2 調整點和字母的顏色
默認的配色方案可能不符合每個人的口味摩渺。在這里简烤,我們使得只有通過log2FC和P值閾值的變量是紅色的,其他的都是黑色的摇幻。我們還調整“alpha”的值乐埠,它控制打印點的透明度:1=100%不透明;0=100%透明囚企。
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'N061011 versus N61311',
pCutoff = 10e-16,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 6.0,
col=c('black', 'black', 'black', 'red3'),
colAlpha = 1)
3.3 調整點的形狀
也可以繪制不同的點作為不同的形狀丈咐。默認形狀是圓。用戶可以通過“shape”參數(shù)指定自己的形狀編碼龙宏,該參數(shù)接受單個或四個可能的值:如果是四個值棵逊,則這些值映射到同樣由顏色指定的標準名稱;如果只有一個值银酗,則所有點都將使用該值進行變化辆影。
更多形狀信息見 ggplot2 Quick Reference: shape
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'N061011 versus N61311',
pCutoff = 10e-16,
FCcutoff = 1.5,
pointSize = 4.0,
labSize = 6.0,
shape = 8,
colAlpha = 1)
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
title = 'N061011 versus N61311',
pCutoff = 10e-16,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 6.0,
shape = c(1, 4, 23, 25),
colAlpha = 1)
3.4 Adjust cut-off lines and add extra threshold lines
The lines that are drawn to indicate cut-off points are also modifiable. The parameter ‘cutoffLineType’ accepts the following values: “blank”, “solid”, “dashed”, “dotted”, “dotdash”, “l(fā)ongdash”, and “twodash”. The colour and thickness of these can also be modified with ‘cutoffLineCol’ and ‘cutoffLineWidth’. To disable the lines, set either cutoffLineType=“blank” or cutoffLineWidth=0.
Extra lines can also be added via ‘hline’ and ‘vline’ to display other cut-offs.
To make these more visible, we will also remove the default gridlines.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
xlim = c(-6, 6),
title = 'N061011 versus N61311',
pCutoff = 10e-12,
FCcutoff = 1.5,
pointSize = 3.0,
labSize = 6.0,
colAlpha = 1,
cutoffLineType = 'blank',
cutoffLineCol = 'black',
cutoffLineWidth = 0.8,
hline = c(10e-20,
10e-20 * 10e-30,
10e-20 * 10e-60,
10e-20 * 10e-90),
hlineCol = c('pink', 'red', 'red2', 'red4'),
hlineType = c('solid', 'longdash', 'dotdash', 'dotted'),
hlineWidth = c(4, 3, 2, 1),
gridlines.major = FALSE,
gridlines.minor = FALSE)
Adjust cut-off lines and add extra threshold lines.
4.5 Adjust legend position, size, and text
The position of the legend can also be changed to “l(fā)eft” or “right” (and stacked vertically), or ‘top’ or “bottom” (stacked horizontally). The legend text, label size, and icon size can also be modified.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
pCutoff = 10e-12,
FCcutoff = 1.5,
cutoffLineType = 'twodash',
cutoffLineWidth = 0.8,
pointSize = 4.0,
labSize = 6.0,
colAlpha = 1,
legendLabels=c('Not sig.','Log (base 2) FC','p-value',
'p-value & Log (base 2) FC'),
legendPosition = 'right',
legendLabSize = 16,
legendIconSize = 5.0)
Adjust legend position, size, and text.
Note: to make the legend completely invisible, specify:
legendPosition = 'none'
4.6 Fit more labels by adding connectors
In order to maximise free space in the plot window, one can fit more labels by adding connectors from labels to points, where appropriate. The width and colour of these connectors can also be modified with ‘widthConnectors’ and ‘colConnectors’, respectively. Further configuration is achievable via ‘typeConnectors’ (“open”, “closed”), ‘endsConnectors’ (“l(fā)ast”, “first”, “both”), and lengthConnectors (default = unit(0.01, ‘npc’)).
The result may not always be desirable as it can make the plot look overcrowded.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
xlab = bquote(~Log[2]~ 'fold change'),
pCutoff = 10e-32,
FCcutoff = 2.0,
pointSize = 4.0,
labSize = 6.0,
colAlpha = 1,
legendPosition = 'right',
legendLabSize = 12,
legendIconSize = 4.0,
drawConnectors = TRUE,
widthConnectors = 0.75)
Fit more labels by adding connectors.
4.7 Only label key variables
In many situations, people may only wish to label their key variables / variables of interest. One can therefore supply a vector of these variables via the ‘selectLab’ parameter, the contents of which have to also be present in the vector passed to ‘lab’. In addition, only those variables that pass both the cutoff for log2FC and P value will be labelled.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = c('TMEM176B','ADH1A'),
xlab = bquote(~Log[2]~ 'fold change'),
pCutoff = 10e-14,
FCcutoff = 2.0,
pointSize = 4.0,
labSize = 6.0,
shape = c(4, 35, 17, 18),
colAlpha = 1,
legendPosition = 'right',
legendLabSize = 14,
legendIconSize = 5.0)
4.8 Draw labels in boxes
To improve label clarity, we can draw simple boxes around the plots labels. This works much better when drawConnectors is also TRUE.
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = c('VCAM1','KCTD12','ADAM12',
'CXCL12','CACNB2','SPARCL1','DUSP1','SAMHD1','MAOA'),
xlab = bquote(~Log[2]~ 'fold change'),
pCutoff = 10e-14,
FCcutoff = 2.0,
pointSize = 4.0,
labSize = 6.0,
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
colAlpha = 4/5,
legendPosition = 'right',
legendLabSize = 14,
legendIconSize = 4.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black')
Draw labels in boxes.
4.9 Over-ride colouring scheme with custom key-value pairs
In certain situations, one may wish to over-ride the default colour scheme with their own colour-scheme, such as colouring variables by pathway, cell-type or group. This can be achieved by supplying a named vector as ‘colCustom’.
In this example, we just wish to colour all variables with log2FC > 2.5 as ‘high’ and those with log2FC < -2.5 as ‘low’.
# create custom key-value pairs for 'high', 'low', 'mid' expression by fold-change
# this can be achieved with nested ifelse statements
keyvals <- ifelse(
res$log2FoldChange < -2.5, 'royalblue',
ifelse(res$log2FoldChange > 2.5, 'gold',
'black'))
keyvals[is.na(keyvals)] <- 'black'
names(keyvals)[keyvals == 'gold'] <- 'high'
names(keyvals)[keyvals == 'black'] <- 'mid'
names(keyvals)[keyvals == 'royalblue'] <- 'low'
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = rownames(res)[which(names(keyvals) %in% c('high', 'low'))],
xlab = bquote(~Log[2]~ 'fold change'),
title = 'Custom colour over-ride',
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 3.5,
labSize = 4.5,
shape = c(6, 4, 2, 11),
colCustom = keyvals,
colAlpha = 1,
legendPosition = 'left',
legendLabSize = 15,
legendIconSize = 5.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black',
arrowheads = FALSE,
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'partial',
borderWidth = 1.5,
borderColour = 'black')
Over-ride colouring scheme with custom key-value pairs.
4.10 Over-ride colour and/or shape scheme with custom key-value pairs
In this example, we first over-ride the existing shape scheme and then both the colour and shape scheme at the same time.
# define different cell-types that will be shaded
celltype1 <- c('VCAM1','KCTD12','ADAM12','CXCL12')
celltype2 <- c('CACNB2','SPARCL1','DUSP1','SAMHD1','MAOA')
# create custom key-value pairs for different cell-types
# this can be achieved with nested ifelse statements
keyvals.shape <- ifelse(
rownames(res) %in% celltype1, 17,
ifelse(rownames(res) %in% celltype2, 64,
3))
keyvals.shape[is.na(keyvals.shape)] <- 3
names(keyvals.shape)[keyvals.shape == 3] <- 'PBMC'
names(keyvals.shape)[keyvals.shape == 17] <- 'Cell-type 1'
names(keyvals.shape)[keyvals.shape == 64] <- 'Cell-type 2'
p1 <- EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = rownames(res)[which(names(keyvals) %in% c('high', 'low'))],
xlab = bquote(~Log[2]~ 'fold change'),
title = 'Custom shape over-ride',
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 4.5,
labSize = 4.5,
shapeCustom = keyvals.shape,
colCustom = NULL,
colAlpha = 1,
legendLabSize = 15,
legendPosition = 'left',
legendIconSize = 5.0,
drawConnectors = TRUE,
widthConnectors = 0.5,
colConnectors = 'grey50',
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'partial',
borderWidth = 1.5,
borderColour = 'black')
# create custom key-value pairs for 'high', 'low', 'mid' expression by fold-change
# this can be achieved with nested ifelse statements
keyvals.colour <- ifelse(
res$log2FoldChange < -2.5, 'royalblue',
ifelse(res$log2FoldChange > 2.5, 'gold',
'black'))
keyvals.colour[is.na(keyvals.colour)] <- 'black'
names(keyvals.colour)[keyvals.colour == 'gold'] <- 'high'
names(keyvals.colour)[keyvals.colour == 'black'] <- 'mid'
names(keyvals.colour)[keyvals.colour == 'royalblue'] <- 'low'
p2 <- EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = rownames(res)[which(names(keyvals) %in% c('High', 'Low'))],
xlab = bquote(~Log[2]~ 'fold change'),
title = 'Custom shape & colour over-ride',
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 5.5,
labSize = 0.0,
shapeCustom = keyvals.shape,
colCustom = keyvals.colour,
colAlpha = 1,
legendPosition = 'right',
legendLabSize = 15,
legendIconSize = 5.0,
drawConnectors = TRUE,
widthConnectors = 0.5,
colConnectors = 'grey50',
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'full',
borderWidth = 1.0,
borderColour = 'black')
library(gridExtra)
library(grid)
grid.arrange(p1, p2,
ncol=2,
top = textGrob('EnhancedVolcano',
just = c('center'),
gp = gpar(fontsize = 32)))
Over-ride colour and/or shape scheme with custom key-value pairs.
4.11 Encircle / highlight certain variables
In this example we add an extra level of identifying key variables by encircling them.
This feature works best for shading just 1 or 2 key variables. It is expected that the user can use the ‘shapeCustom’ parameter for more in depth identification of different types of variables.
# define different cell-types that will be shaded
celltype1 <- c('VCAM1','CXCL12')
celltype2 <- c('SORT1', 'KLF15')
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
selectLab = c(celltype1, celltype2),
xlab = bquote(~Log[2]~ 'fold change'),
title = 'Shading cell-type 1|2',
pCutoff = 10e-14,
FCcutoff = 1.0,
pointSize = 8.0,
labSize = 6.0,
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
shape = 42,
colCustom = keyvals,
colAlpha = 1,
legendPosition = 'right',
legendLabSize = 20,
legendIconSize = 20.0,
# encircle
encircle = celltype1,
encircleCol = 'black',
encircleSize = 2.5,
encircleFill = 'pink',
encircleAlpha = 1/2,
# shade
shade = celltype2,
shadeAlpha = 1/2,
shadeFill = 'skyblue',
shadeSize = 1,
shadeBins = 5,
drawConnectors = TRUE,
widthConnectors = 2.0,
gridlines.major = TRUE,
gridlines.minor = FALSE,
border = 'full',
borderWidth = 5,
borderColour = 'black')
Shade certain variables.
4.12 Highlighting key variables via custom point sizes
One can also supply a vector of sizes to pointSize for the purpose of having a different size for each poin. For example, if we want to change the size of just those variables with log2FC>2:
library("pasilla")
pasCts <- system.file("extdata", "pasilla_gene_counts.tsv",
package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata", "pasilla_sample_annotation.csv",
package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
rownames(coldata) <- sub("fb", "", rownames(coldata))
cts <- cts[, rownames(coldata)]
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~ condition)
featureData <- data.frame(gene=rownames(cts))
mcols(dds) <- DataFrame(mcols(dds), featureData)
dds <- DESeq(dds)
res <- results(dds)
p1 <- EnhancedVolcano(res,
lab = rownames(res),
x = "log2FoldChange",
y = "pvalue",
pCutoff = 10e-4,
FCcutoff = 2,
ylim = c(0, -log10(10e-12)),
pointSize = c(ifelse(res$log2FoldChange>2, 8, 1)),
labSize = 6.0,
shape = c(6, 6, 19, 16),
title = "DESeq2 results",
subtitle = "Differential expression",
caption = bquote(~Log[2]~ "fold change cutoff, 2; p-value cutoff, 10e-4"),
legendPosition = "right",
legendLabSize = 14,
col = c("grey30", "forestgreen", "royalblue", "red2"),
colAlpha = 0.9,
drawConnectors = TRUE,
hline = c(10e-8),
widthConnectors = 0.5)
p1
Highlighting key variabvles via custom point sizes.
4.13 Change to continuous colour scheme
We can over-ride the default ‘discrete’ colour scheme with a continuous one that shades between 2 colours based on nominal or adjusted p-value, whichever is selected by y, via colGradient:
p1 <- EnhancedVolcano(res,
lab = rownames(res),
x = "log2FoldChange",
y = "pvalue",
pCutoff = 10e-4,
FCcutoff = 2,
ylim = c(0, -log10(10e-12)),
pointSize = c(ifelse(res$log2FoldChange>2, 8, 1)),
labSize = 6.0,
shape = c(6, 6, 19, 16),
title = "DESeq2 results",
subtitle = "Differential expression",
caption = bquote(~Log[2]~ "fold change cutoff, 2; p-value cutoff, 10e-4"),
legendPosition = "right",
legendLabSize = 14,
colAlpha = 0.9,
colGradient = c('red3', 'royalblue'),
drawConnectors = TRUE,
hline = c(10e-8),
widthConnectors = 0.5)
p1
Highlighting key variabvles via custom point sizes.
4.14 Custom axis tick marks
Custom axis ticks can be added in a ‘plug and play’ fashion via ggplot2 functionality, as follows:
p1 +
ggplot2::coord_cartesian(xlim=c(-6, 6)) +
ggplot2::scale_x_continuous(
breaks=seq(-6,6, 1))
Custom axis tick marks
More information on this can be found here: http://www.sthda.com/english/wiki/ggplot2-axis-ticks-a-guide-to-customize-tick-marks-and-labels