最近想要可視化樣本間的相關(guān)性,但又不滿足于常規(guī)的相關(guān)性熱圖状蜗。因此嗓奢,就注意到GGally
包中的ggpairs
函數(shù),可以方便地實現(xiàn)多方面的相關(guān)性可視化燎窘。
本文僅介紹
ggpairs
在連續(xù)型變量方面的應(yīng)用摹闽。它也可以用到離散型變量的可視化上。
下面以airway
數(shù)據(jù)集進行演示:
這里我們在前4個樣本中隨機選取1000個基因進行展示
library(GGally)
# airway example
library(airway)
data(airway)
df <- as.data.frame(assays(airway)$counts[,1:4]) #first 4 columns
df <- df[rowSums(df)>4,] #keep genes with some counts
set.seed(123)
df <- df[sample.int(nrow(df),1e3),] #random 1K gene
# ggpairs default
ggpairs(log2(df+1))
ggpairs
將輸出的圖劃分為三個區(qū)域褐健,分別是左下角的lower
, 對角線的diag
, 以及右上角的upper
. 對于連續(xù)性數(shù)值變量付鹿,默認(rèn)在lower
區(qū)畫pairwise scatter plot,diag
區(qū)畫density plot铝量,upper
區(qū)展示相應(yīng)的pairwise Pearson's correaltion coefficient.
進一步倘屹,我還希望在左下角的散點圖中加入y=x
的擬合線,并在對角線的加上直方圖慢叨。我們可以通過自定義畫圖的函數(shù)實現(xiàn)這些操作纽匙。
ggscatter <- function(data, mapping, ...) {
x <- GGally::eval_data_col(data, mapping$x)
y <- GGally::eval_data_col(data, mapping$y)
df <- data.frame(x = x, y = y)
sp1 <- ggplot(df, aes(x=x, y=y)) +
geom_point() +
geom_abline(intercept = 0, slope = 1, col = 'darkred')
return(sp1)
}
ggdehist <- function(data, mapping, ...) {
x <- GGally::eval_data_col(data, mapping$x)
df <- data.frame(x = x)
dh1 <- ggplot(df, aes(x=x)) +
geom_histogram(aes(y=..density..), bins = 50, fill = 'steelblue', color='black', alpha=.4) +
geom_density(aes(y=..density..)) +
theme_minimal()
return(dh1)
}
ggpairs(log2(df+1),
lower = list(continuous = wrap(ggscatter)),
diag = list(continuous = wrap(ggdehist))) +
theme_minimal() +
theme(panel.grid = element_blank(),
panel.border = element_rect(fill=NA),
axis.text = element_text(color='black'))
再放一個高度修改的版本
# https://pascal-martin.netlify.app/post/nicer-scatterplot-in-gggally/
GGscatterPlot <- function(data, mapping, ...,
method = "pearson") {
#Get correlation coefficient
x <- GGally::eval_data_col(data, mapping$x)
y <- GGally::eval_data_col(data, mapping$y)
cor <- cor(x, y, method = method, use="pairwise.complete.obs")
#Assemble data frame
df <- data.frame(x = x, y = y)
df <- na.omit(df)
# PCA
nonNull <- x!=0 & y!=0
dfpc <- prcomp(~x+y, df[nonNull,])
df$cols <- predict(dfpc, df)[,1]
# Define the direction of color range based on PC1 orientation:
dfsum <- x+y
colDirection <- ifelse(dfsum[which.max(df$cols)] <
dfsum[which.min(df$cols)],
1,
-1)
#Get 2D density for alpha
dens2D <- MASS::kde2d(df$x, df$y)
df$density <- fields::interp.surface(dens2D ,df[,c("x", "y")])
if (any(df$density==0)) {
mini2D = min(df$density[df$density!=0]) #smallest non zero value
df$density[df$density==0] <- mini2D
}
#Prepare plot
pp <- ggplot(df, aes(x=x, y=y, alpha = 1/density, color = cols)) +
ggplot2::geom_point(shape=16, show.legend = FALSE) +
ggplot2::scale_color_viridis_c(direction = colDirection) +
ggplot2::scale_alpha(range = c(.05, .6)) +
ggplot2::geom_abline(intercept = 0, slope = 1, col="darkred") +
ggplot2::geom_label(
data = data.frame(
xlabel = min(x, na.rm = TRUE),
ylabel = max(y, na.rm = TRUE),
lab = round(cor, digits = 3)),
mapping = ggplot2::aes(x = xlabel,
y = ylabel,
label = lab),
hjust = 0, vjust = 1,
size = 3, fontface = "bold",
inherit.aes = FALSE # do not inherit anything from the ...
) +
theme_bw()
return(pp)
}
exonNumber <- elementNROWS(rowRanges(airway[rownames(df),]))
df$MoreThan15Exons <- ifelse(exonNumber>15,
">15ex", "<15ex")
df[,1:4] <- log2(df[,1:4]+1)
GGally::ggpairs(df,
1:4,
lower = list(continuous = wrap(GGscatterPlot, method="pearson")),
upper = list(continuous = wrap(ggally_cor, align_percent = 0.8),
mapping = ggplot2::aes(color = MoreThan15Exons))) +
theme_minimal() +
theme(panel.grid = element_blank(),
panel.border = element_rect(fill=NA),
axis.text = element_text(color='black'))
- 對散點圖的透明度進行調(diào)整,點越密的區(qū)域透明度越高拍谐,反之亦然烛缔。
- 散點圖的顏色代表了基因表達量。
- 在散點圖左上角添加Pearson's 相關(guān)系數(shù)
- 右上角展示了對小于15個外顯子的基因和大于15個外顯子基因的相關(guān)性的統(tǒng)計轩拨。
在我看來ggpairs
相當(dāng)于是一個ggplot2
的集成可視化方法践瓷,可以很方便的一次性展示多個方面的相關(guān)性信息。同時亡蓉,它的可定制性也很高晕翠,可以滿足許多額外的可視化需求。唯一的缺陷可能是需要耗費一定功夫?qū)懗霭b的函數(shù)砍濒。
ref
ggpairs doc: https://ggobi.github.io/ggally/articles/ggpairs.html
Nicer scatter plots in GGgally ggpairs-ggduo: https://pascal-martin.netlify.app/post/nicer-scatterplot-in-gggally
Creating a density histogram in ggplot2: https://stackoverflow.com/questions/21061653/creating-a-density-histogram-in-ggplot2