ComplexHeatmap complex heatmap study notes: 8. UpSet plot

upset-plot

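Compared with traditional methods (i.e. Venn diagrams), the UpSet plot provides an efficient way to visualize the intersections of multiple sets. It is implemented in the UpSetR package in R. Here we re-implement UpSet plots with the ComplexHeatmap package, with some improvements.

8.1 Input data

To represent multiple sets, the variable can be:

A list of sets, where each set is a vector, e.g.: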

list(set1 = c("a", "b", "c"),
     set2 = c("b", "c", "d", "e"),
     ...)

A binary matrix/data frame, where rows are elements and columns are sets, e.g.:

  set1 set2 set3
h    1    1    1
t    1    0    1
j    1    0    0
u    1    0    1
w    1    0    0
...

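For example, row t in the matrix means: t is in set1, not in set2, and in set3. Note that the matrix is also valid if it is a logical matrix.

If the variable is a data frame, only the binary columns (containing only 0 and 1) and the logical columns are used.

Both formats can be used to make UpSet plots; users can still use list_to_matrix() to convert a list to a binary matrix.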

lt = list(set1 = c("a", "b", "c"),
          set2 = c("b", "c", "d", "e"))
list_to_matrix(lt)

##   set1 set2
## a    1    0
## b    1    1
## c    1    1
## d    0    1
## e    0    1

You can also set the universal set in list_to_matrix():

list_to_matrix(lt, universal_set = letters[1:10])

##   set1 set2
## a    1    0
## b    1    1
## c    1    1
## d    0    1
## e    0    1
## f    0    0
## g    0    0
## h    0    0
## i    0    0
## j    0    0

If the universal set does not completely cover the input sets, the elements that are not in the universal set are removed:

list_to_matrix(lt, universal_set = letters[1:4])

##   set1 set2
## a    1    0
## b    1    1
## c    1    1
## d    0    1
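To make the conversion concrete, here is a minimal base R sketch of what such a list-to-matrix conversion does. `to_binary_matrix()` is a hypothetical helper written for illustration only; it is not part of ComplexHeatmap and is not how list_to_matrix() is actually implemented.

```r
# Hypothetical helper for illustration; not part of ComplexHeatmap.
to_binary_matrix = function(lt, universal_set = NULL) {
    # Rows are all elements, or the universal set when one is given
    elements = if (is.null(universal_set)) unique(unlist(lt)) else universal_set
    # One 0/1 column per set: 1 if the element belongs to that set
    mat = sapply(lt, function(s) as.integer(elements %in% s))
    # Force a matrix shape even when there is only one element
    matrix(mat, nrow = length(elements),
        dimnames = list(elements, names(lt)))
}

lt = list(set1 = c("a", "b", "c"),
          set2 = c("b", "c", "d", "e"))
to_binary_matrix(lt)
# Element "e" is dropped because it is not in the universal set
to_binary_matrix(lt, universal_set = letters[1:4])
```

Elements outside the universal set disappear simply because they never become rows.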

If the sets are genomic regions, they can only be represented as a list of GRanges/IRanges objects.

list(set1 = GRanges(...),
     set2 = GRanges(...),
     ...)

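8.2 UpSet modes

For example, for three sets (A, B, C), all combinations of selecting elements in or not in the sets are encoded as follows:

A B C
1 1 1
1 1 0
1 0 1
0 1 1
1 0 0
0 1 0
0 0 1

A 1 means the set is selected and a 0 means it is not. For example, 1 1 0 means sets A and B are selected while set C is not. Note that there is no 0 0 0, because the background set is not of interest here. In the rest of this section, we refer to A, B and C as sets and to each combination as a combination set. The whole binary matrix is called the combination matrix.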

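The UpSet plot visualizes the size of each combination set. With the binary code of each combination set, we next need to define how to calculate its size. There are three modes:

distinct mode: 1 means in that set and 0 means not in that set, so 1 1 0 means the elements that are in both A and B but not in C (setdiff(intersect(A, B), C)). Under this mode, the seven combination sets are the seven partitions of a Venn diagram, and they are mutually exclusive.
intersect mode: 1 means in that set and 0 is not taken into account, so 1 1 0 means the elements that are in both A and B, whether or not they are also in C (intersect(A, B)). Under this mode, the seven combination sets can overlap.
union mode: 1 means in that set and 0 is not taken into account. When there are multiple 1s, the relationship is OR. So 1 1 0 means the elements that are in A or B, whether or not they are also in C (union(A, B)). Under this mode, the seven combination sets can overlap.

The three modes are illustrated in the following figure: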
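As a small worked example of the three modes, the following base R sketch computes the combination set 1 1 0 by hand. The sets A, B and C here are made-up values chosen for illustration.

```r
# Made-up example sets
A = c("a", "b", "c", "d")
B = c("b", "c", "d", "e")
C = c("c", "f")

# Combination set "110": A and B selected, C not selected
distinct_110  = setdiff(intersect(A, B), C)  # in A and B, and not in C
intersect_110 = intersect(A, B)              # in A and B, C is ignored
union_110     = union(A, B)                  # in A or B, C is ignored

distinct_110   # "b" "d"
intersect_110  # "b" "c" "d"
union_110      # "a" "b" "c" "d" "e"
```

The element "c" shows the difference: it is excluded in distinct mode because it is also in C, but it is counted in the other two modes.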

image

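8.3 Make the combination matrix

The make_comb_mat() function generates the combination matrix and calculates the sizes of the sets and the combination sets. The input can be a single variable or name-value pairs: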

set.seed(123)
lt = list(a = sample(letters, 5),
          b = sample(letters, 10),
          c = sample(letters, 15))
m1 = make_comb_mat(lt)
m1

## A combination matrix with 3 sets and 7 combinations.
##   ranges of combination set size: c(1, 8).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Combination sets are:
##   a b c code size
##   x x x  111    2
##   x x    110    1
##   x   x  101    1
##     x x  011    4
##   x      100    1
##     x    010    3
##       x  001    8
## 
## Sets are:
##   set size
##     a    5
##     b   10
##     c   15

m2 = make_comb_mat(a = lt$a, b = lt$b, c = lt$c)
m3 = make_comb_mat(list_to_matrix(lt))

m1, m2 and m3 are identical.

The mode is controlled by the mode argument:

m1 = make_comb_mat(lt) # the default mode is `distinct`
m2 = make_comb_mat(lt, mode = "intersect")
m3 = make_comb_mat(lt, mode = "union")

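The UpSet plots under different modes are demonstrated later.

When there are too many sets, the sets can be pre-filtered by their sizes (min_set_size and top_n_sets). min_set_size controls the minimal size of the sets, and top_n_sets controls the number of the largest sets to keep.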

m1 = make_comb_mat(lt, min_set_size = 6)
m2 = make_comb_mat(lt, top_n_sets = 2)
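To see which sets the two filters would keep, the selection can be sketched in base R on the set sizes of `lt` (5, 10 and 15, as printed earlier). This is only an illustration of the idea; it assumes that a set whose size equals min_set_size is kept, which the text above does not state explicitly.

```r
# Set sizes of `lt` as reported by make_comb_mat() above
set_sizes = c(a = 5, b = 10, c = 15)

# min_set_size = 6: keep the sets with at least 6 elements (assumed inclusive)
kept_by_min_size = names(set_sizes)[set_sizes >= 6]
kept_by_min_size  # "b" "c"

# top_n_sets = 2: keep the two largest sets
kept_by_top_n = names(sort(set_sizes, decreasing = TRUE))[1:2]
kept_by_top_n  # "c" "b"
```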

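Subsetting the sets affects the calculation of the combination set sizes, which is why it needs to be controlled at the combination matrix generation step. Subsetting the combination sets can be done directly by subsetting the matrix: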

m = make_comb_mat(lt)
m[1:4]

## A combination matrix with 3 sets and 4 combinations.
##   ranges of combination set size: c(1, 4).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Combination sets are:
##   a b c code size
##   x x x  111    2
##   x x    110    1
##   x   x  101    1
##     x x  011    4
## 
## Sets are:
##   set size
##     a    5
##     b   10
##     c   15

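make_comb_mat() also allows specifying a universal set, so that the complement set, which contains the elements that do not belong to any set, is also considered.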

m = make_comb_mat(lt, universal_set = letters)
m

## A combination matrix with 3 sets and 8 combinations.
##   ranges of combination set size: c(1, 8).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Combination sets are:
##   a b c code size
##   x x x  111    2
##   x x    110    1
##   x   x  101    1
##     x x  011    4
##   x      100    1
##     x    010    3
##       x  001    8
##          000    6
## 
## Sets are:
##          set size
##            a    5
##            b   10
##            c   15
##   complement    6

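The universal set can be smaller than the union of all sets. In that case, for each set, only its intersection with the universal set is considered.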

m = make_comb_mat(lt, universal_set = letters[1:10])
m

## A combination matrix with 3 sets and 5 combinations.
##   ranges of combination set size: c(1, 3).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Combination sets are:
##   a b c code size
##   x x    110    1
##   x   x  101    1
##     x x  011    2
##       x  001    3
##          000    3
## 
## Sets are:
##          set size
##            a    2
##            b    3
##            c    6
##   complement    3
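The effect of a small universal set can be sketched in base R: each set is first intersected with the universal set, and the complement is whatever remains in the universal set. The sets below are made-up values for illustration, not the `lt` used above.

```r
# Made-up sets; "x" and "y" lie outside the universal set
lt_small = list(a = c("a", "b", "x"), b = c("b", "c", "y"))
universal_set = c("a", "b", "c", "d", "e")

# Only the part of each set inside the universal set is considered
restricted = lapply(lt_small, intersect, universal_set)
# The complement: universal-set elements that belong to no set
complement = setdiff(universal_set, unlist(restricted))

lengths(restricted)  # a: 2, b: 2
complement           # "d" "e"
```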

If you already know the size of the complement set, you can set the complement_size argument directly.

m = make_comb_mat(lt, complement_size = 5)
m

## A combination matrix with 3 sets and 8 combinations.
##   ranges of combination set size: c(1, 8).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Combination sets are:
##   a b c code size
##   x x x  111    2
##   x x    110    1
##   x   x  101    1
##     x x  011    4
##   x      100    1
##     x    010    3
##       x  001    8
##          000    5
## 
## Sets are:
##          set size
##            a    5
##            b   10
##            c   15
##   complement    5

When the input matrix contains elements that do not belong to any set, these elements are treated as the complement set.

x = list_to_matrix(lt, universal_set = letters)
m = make_comb_mat(x)
m

## A combination matrix with 3 sets and 8 combinations.
##   ranges of combination set size: c(1, 8).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Combination sets are:
##   a b c code size
##   x x x  111    2
##   x x    110    1
##   x   x  101    1
##     x x  011    4
##   x      100    1
##     x    010    3
##       x  001    8
##          000    6
## 
## Sets are:
##          set size
##            a    5
##            b   10
##            c   15
##   complement    6

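Next we demonstrate a second example, where the sets are genomic regions. When the sets are genomic regions, the size is calculated as the sum of the widths of the regions in each set (i.e. the total number of base pairs).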

library(circlize)
library(GenomicRanges)
lt2 = lapply(1:4, function(i) generateRandomBed())
lt2 = lapply(lt2, function(df) GRanges(seqnames = df[, 1], 
    ranges = IRanges(df[, 2], df[, 3])))
names(lt2) = letters[1:4]
m2 = make_comb_mat(lt2)
m2

## A combination matrix with 4 sets and 15 combinations.
##   ranges of combination set size: c(184941701, 199900416).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Top 8 combination sets are:
##   a b c d code      size
##       x x 0011 199900416
##   x       1000 199756519
##   x   x x 1011 198735008
##   x x x x 1111 197341532
##   x x x   1110 197137160
##   x x   x 1101 194569926
##   x     x 1001 194462988
##   x   x   1010 192670258
## 
## Sets are:
##   set       size
##     a 1566783009
##     b 1535968265
##     c 1560549760
##     d 1552480645
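The "sum of widths" size can be illustrated without any Bioconductor package. The sketch below uses a made-up data frame of regions and assumes closed, 1-based intervals (so width = end - start + 1, as in IRanges) that do not overlap each other within the set.

```r
# Made-up, non-overlapping regions for one set
regions = data.frame(
    chr   = c("chr1", "chr1", "chr2"),
    start = c(101, 501, 1),
    end   = c(200, 600, 50)
)

# Width of a closed interval is end - start + 1
widths = regions$end - regions$start + 1
set_size_bp = sum(widths)
set_size_bp  # 250
```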

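We do not recommend using the number of regions to measure the intersection of two sets of genomic regions. There are two reasons:
1. The value is not symmetric, i.e. the number of intersected regions measured in set1 is not always identical to the number measured in set2, so it is difficult to assign a value to the intersection between set1 and set2.
2. If a long region in set1 overlaps a long region in set2 by only a few base pairs, does it make sense to say that these two regions are common to both sets?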
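The asymmetry in the first reason is easy to reproduce with a toy example. In the base R sketch below, `overlaps()` is a hypothetical helper and all regions are assumed to be on the same chromosome; one long region in set1 overlaps two short regions in set2.

```r
# Made-up regions on a single chromosome
set1 = data.frame(start = 1, end = 1000)
set2 = data.frame(start = c(100, 600), end = c(200, 700))

# Hypothetical helper: for each region in x, does it overlap any region in y?
overlaps = function(x, y) {
    sapply(seq_len(nrow(x)), function(i) {
        any(x$start[i] <= y$end & x$end[i] >= y$start)
    })
}

n_from_set1 = sum(overlaps(set1, set2))  # regions of set1 that intersect set2
n_from_set2 = sum(overlaps(set2, set1))  # regions of set2 that intersect set1
c(n_from_set1, n_from_set2)  # 1 2
```

The same intersection is counted as one region from one side and two from the other, whereas the number of shared base pairs is the same whichever set you start from.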

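The universal set also works for sets that are genomic regions.

8.4 Utility functions

make_comb_mat() returns a matrix, which is also of class comb_mat. There are several utility functions that can be applied to this comb_mat object:

set_name(): the set names.
comb_name(): the combination set names. The names of the combination sets are formatted as a string of binary bits. E.g. for three sets A, B, C, the combination set with name "101" corresponds to selecting set A, not selecting set B and selecting set C.
set_size(): the set sizes.
comb_size(): the combination set sizes.
comb_degree(): the degree of a combination set is the number of sets that are selected.
t(): transposes the combination matrix. By default make_comb_mat() generates a matrix where sets are on rows and combination sets are on columns, and so are they on the UpSet plot. By transposing the combination matrix, the positions of sets and combination sets can be swapped on the UpSet plot.
extract_comb(): extracts the elements in a specified combination set. The usage is explained later.
Functions for subsetting the matrix.

Quick examples are: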

m = make_comb_mat(lt)
set_name(m)

## [1] "a" "b" "c"

comb_name(m)

## [1] "111" "110" "101" "011" "100" "010" "001"

set_size(m)

##  a  b  c 
##  5 10 15

comb_size(m)

## 111 110 101 011 100 010 001 
##   2   1   1   4   1   3   8

comb_degree(m)

## 111 110 101 011 100 010 001 
##   3   2   2   2   1   1   1
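Since a combination set name is just a string of binary bits, the degree can be recomputed from the name by counting the 1s. A minimal base R sketch, using the names printed by comb_name(m) above:

```r
# Combination set names as printed by comb_name(m)
codes = c("111", "110", "101", "011", "100", "010", "001")

# Degree = number of "1" characters in each code
degree = sapply(strsplit(codes, ""), function(bits) sum(bits == "1"))
names(degree) = codes
degree
```

This gives the same values as the comb_degree(m) output shown above.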

t(m)

## A combination matrix with 3 sets and 7 combinations.
##   ranges of combination set size: c(1, 8).
##   mode for the combination size: distinct.
##   sets are on columns
## 
## Combination sets are:
##   a b c code size
##   x x x  111    2
##   x x    110    1
##   x   x  101    1
##     x x  011    4
##   x      100    1
##     x    010    3
##       x  001    8
## 
## Sets are:
##   set size
##     a    5
##     b   10
##     c   15

For extract_comb(), the valid combination set names are those returned by comb_name(). Note that the elements in a combination set depend on the mode set in make_comb_mat().

extract_comb(m, "101")

## [1] "j"

And an example for sets that are genomic regions:

# `lt2` was generated in the previous section 
m2 = make_comb_mat(lt2)
set_size(m2)

##          a          b          c          d 
## 1566783009 1535968265 1560549760 1552480645

comb_size(m2)

##      1111      1110      1101      1011      0111      1100      1010      1001 
## 197341532 197137160 194569926 198735008 191312455 192109618 192670258 194462988 
##      0110      0101      0011      1000      0100      0010      0001 
## 191359036 184941701 199900416 199756519 187196837 192093895 191216619

Now extract_comb() returns the genomic regions in the corresponding combination set.

extract_comb(m2, "1010")

## GRanges object with 5063 ranges and 0 metadata columns:
##          seqnames            ranges strand
##             <Rle>         <IRanges>  <Rle>
##      [1]     chr1     255644-258083      *
##      [2]     chr1     306114-308971      *
##      [3]     chr1   1267493-1360170      *
##      [4]     chr1   2661311-2665736      *
##      [5]     chr1   3020553-3030645      *
##      ...      ...               ...    ...
##   [5059]     chrY 56286079-56286864      *
##   [5060]     chrY 57049541-57078332      *
##   [5061]     chrY 58691055-58699756      *
##   [5062]     chrY 58705675-58716954      *
##   [5063]     chrY 58765097-58776696      *
##   -------
##   seqinfo: 24 sequences from an unspecified genome; no seqlengths

With comb_size() and comb_degree(), we can filter the combination matrix as:

m = make_comb_mat(lt)
# combination set size >= 4
m[comb_size(m) >= 4]

## A combination matrix with 3 sets and 2 combinations.
##   ranges of combination set size: c(4, 8).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Combination sets are:
##   a b c code size
##     x x  011    4
##       x  001    8
## 
## Sets are:
##   set size
##     a    5
##     b   10
##     c   15

# combination set degree == 2
m[comb_degree(m) == 2]

## A combination matrix with 3 sets and 3 combinations.
##   ranges of combination set size: c(1, 4).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Combination sets are:
##   a b c code size
##   x x    110    1
##   x   x  101    1
##     x x  011    4
## 
## Sets are:
##   set size
##     a    5
##     b   10
##     c   15
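Both filters are ordinary logical indexing. The base R sketch below rebuilds the two logical masks from the sizes and degrees printed earlier, to show which combination sets each condition selects; it works on plain named vectors rather than on a comb_mat object.

```r
# Sizes and degrees as printed by comb_size(m) and comb_degree(m) above
cs  = c("111" = 2, "110" = 1, "101" = 1, "011" = 4, "100" = 1, "010" = 3, "001" = 8)
deg = c("111" = 3, "110" = 2, "101" = 2, "011" = 2, "100" = 1, "010" = 1, "001" = 1)

large_sets  = names(cs)[cs >= 4]    # combination sets with size >= 4
degree_two  = names(deg)[deg == 2]  # combination sets with degree 2

large_sets  # "011" "001"
degree_two  # "110" "101" "011"
```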

For the complement set, the name of this special combination set consists only of zeros.

m2 = make_comb_mat(lt, universal_set = letters)
comb_name(m2) # see the last element

## [1] "111" "110" "101" "011" "100" "010" "001" "000"

comb_degree(m2)

## 111 110 101 011 100 010 001 000 
##   3   2   2   2   1   1   1   0

If universal_set is set in make_comb_mat(), extract_comb() can be applied to the complement set.

m2 = make_comb_mat(lt, universal_set = letters)
extract_comb(m2, "000")

## [1] "a" "b" "f" "p" "u" "z"

m2 = make_comb_mat(lt, universal_set = letters[1:10])
extract_comb(m2, "000")

## [1] "a" "b" "f"

When universal_set is set, extract_comb() also works for sets of genomic regions.

In the previous examples, we demonstrated the "one-dimensional index", such as:

m[comb_degree(m) == 2]

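Since the combination matrix is essentially a matrix, indices can also be applied to both dimensions. In the default settings, sets are on rows and combination sets are on columns, so indices on the first dimension of the matrix correspond to sets and indices on the second dimension correspond to combination sets: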

# by set names
m[c("a", "b", "c"), ]
# by numeric indices
m[3:1, ]

New empty sets can be added to the combination matrix by:

# `d` is the new empty set
m[c("a", "b", "c", "d"), ]

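Note that when the specified set indices do not cover all the non-empty sets in the original combination matrix, the combination matrix is recalculated, because this affects the values in the combination sets: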

# if `c` is a non-empty set
m[c("a", "b"),]

Subsetting on the second dimension, which corresponds to the combination sets, is similar:

# reorder
m[, 5:1]
# take a subset
m[, 1:3]
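# by character indices
m[, c("110", "101", "011")]

New empty combination sets can also be added by setting character indices:

m[, c(comb_name(m), "100")]

Indices can be set on both dimensions simultaneously only when the set indices cover all the non-empty sets: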

m[3:1, 5:1]
# this will throw an error because `c` is a non-empty set
m[c("a", "b"), 5:1]

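If the combination matrix was transposed, the margins of the matrix for the set indices and the combination set indices are swapped.

tm = t(m)
tm[rev(comb_name(tm)), rev(set_name(tm))]

If only the indices for the combination sets are set as a one-dimensional index, it automatically works for both transposed and non-transposed matrices: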

m[1:5]
tm[1:5]

8.5 Make the plot

Making the UpSet plot is straightforward: users just send the combination matrix to the UpSet() function:

m = make_comb_mat(lt)
UpSet(m)

image

By default, the sets are ordered by size and the combination sets are ordered by degree (the number of sets selected).

The order is controlled by set_order and comb_order:

UpSet(m, set_order = c("a", "b", "c"), comb_order = order(comb_size(m)))

image

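The color of the dots, the size of the dots and the line width of the segments are controlled by comb_col, pt_size and lwd respectively. comb_col is a vector corresponding to the combination sets. In the following code, since comb_degree(m) returns a vector of integers, we simply use it as an index into the color vector.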

UpSet(m, pt_size = unit(5, "mm"), lwd = 3,
    comb_col = c("red", "blue", "black")[comb_degree(m)])
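The indexing trick is plain R vector subsetting. The following base R sketch shows which color each combination set ends up with, using the degrees printed by comb_degree(m) earlier; `palette_by_degree` is just an illustrative name.

```r
# Degrees as printed by comb_degree(m)
degree = c("111" = 3, "110" = 2, "101" = 2, "011" = 2,
           "100" = 1, "010" = 1, "001" = 1)

# Position 1, 2, 3 of the palette is used for degree 1, 2, 3
palette_by_degree = c("red", "blue", "black")
comb_col = palette_by_degree[degree]

# Attach the combination set names to make the mapping readable
setNames(comb_col, names(degree))
```

So degree-1 combination sets are drawn in red, degree-2 in blue and the degree-3 set in black.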

image

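The colors of the background (the rectangles representing the sets and the dots that are not selected) are controlled by bg_col and bg_pt_col. The length of bg_col can be 1 or 2.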

UpSet(m, comb_col = "#0000FF", bg_col = "#F0F0FF", bg_pt_col = "#CCCCFF")

image

UpSet(m, comb_col = "#0000FF", bg_col = c("#F0F0FF", "#FFF0F0"), bg_pt_col = "#CCCCFF")

image

Transposing the combination matrix switches the sets to columns and the combination sets to rows.

UpSet(t(m))

image

As we have introduced, if the combination sets are subsetted, the subset of the matrix can also be visualized:

UpSet(m[comb_size(m) >= 4])
UpSet(m[comb_degree(m) == 2])

image

The following compares the different modes in make_comb_mat():

m1 = make_comb_mat(lt) # the default mode is `distinct`
m2 = make_comb_mat(lt, mode = "intersect")
m3 = make_comb_mat(lt, mode = "union")
UpSet(m1)
UpSet(m2)
UpSet(m3)

image

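For a plot containing the complement set, there is one additional column showing that this complement set does not overlap any of the sets (all dots are grey).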

m2 = make_comb_mat(lt, universal_set = letters)
UpSet(m2)

image

Remember that if you already know the size of the complement set, you can assign it directly with the complement_size argument in make_comb_mat().

m2 = make_comb_mat(lt, complement_size = 10)
UpSet(m2)

image

For the case where the universal set is smaller than the union of all sets:

m2 = make_comb_mat(lt, universal_set = letters[1:10])
UpSet(m2)

image

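In some cases you may have a complement set but do not want to show it, especially when the input to make_comb_mat() is a matrix that already contains the complement set. You can then filter by the combination degrees.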

x = list_to_matrix(lt, universal_set = letters)
m2 = make_comb_mat(x)
m2 = m2[comb_degree(m2) > 0]
UpSet(m2)

image

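8.6 UpSet plots as heatmaps

In the UpSet plot, the main component is the combination matrix, and on the two sides are the barplots representing the sizes of the sets and of the combination sets. It is therefore straightforward to implement it as a "heatmap", where the heatmap is self-defined with dots and segments and the two barplots are two barplot annotations constructed by anno_barplot().

The default top annotation is: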

HeatmapAnnotation("Intersection\nsize" = anno_barplot(comb_size(m), 
        border = FALSE, gp = gpar(fill = "black"), height = unit(3, "cm")), 
    annotation_name_side = "left", annotation_name_rot = 0)

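This top annotation is wrapped in upset_top_annotation(), which only contains the UpSet top barplot annotation. Most of the arguments of upset_top_annotation() go directly to anno_barplot(), e.g. to set the colors of the bars: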

UpSet(m, top_annotation = upset_top_annotation(m, 
    gp = gpar(col = comb_degree(m))))

image

To control the data range and the axis:

UpSet(m, top_annotation = upset_top_annotation(m, 
    ylim = c(0, 15),
    bar_width = 1,
    axis_param = list(side = "right", at = c(0, 5, 10, 15),
        labels = c("zero", "five", "ten", "fifteen"))))

image

To control the annotation name:

UpSet(m, top_annotation = upset_top_annotation(m, 
    annotation_name_rot = 90,
    annotation_name_side = "right",
    axis_param = list(side = "right")))

image

The settings are very similar for the right annotation:

UpSet(m, right_annotation = upset_right_annotation(m, 
    ylim = c(0, 30),
    gp = gpar(fill = "green"),
    annotation_name_side = "top",
    axis_param = list(side = "top")))

image

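upset_top_annotation() and upset_right_annotation() can automatically recognize whether the sets are on rows or columns.

upset_top_annotation() and upset_right_annotation() only contain one barplot annotation. If users want to add more annotations, they need to manually construct a HeatmapAnnotation object with multiple annotations.

To add more annotations on top: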

UpSet(m, top_annotation = HeatmapAnnotation(
    degree = as.character(comb_degree(m)),
    "Intersection\nsize" = anno_barplot(comb_size(m), 
        border = FALSE, 
        gp = gpar(fill = "black"), 
        height = unit(2, "cm")
    ), 
    annotation_name_side = "left", 
    annotation_name_rot = 0))

image

To add more annotations on the right:

UpSet(m, right_annotation = rowAnnotation(
    "Set size" = anno_barplot(set_size(m), 
        border = FALSE, 
        gp = gpar(fill = "black"), 
        width = unit(2, "cm")
    ),
    group = c("group1", "group1", "group2")))

image

To move the right annotation to the left of the combination matrix, use upset_left_annotation():

UpSet(m, left_annotation = upset_left_annotation(m))

image

To add numbers on top of the bars:

UpSet(m, top_annotation = upset_top_annotation(m, add_numbers = TRUE),
    right_annotation = upset_right_annotation(m, add_numbers = TRUE))

image

The object returned by UpSet() is actually a Heatmap class object, so you can add it to other heatmaps and annotations with + or %v%.

ht = UpSet(m)
class(ht)

## [1] "Heatmap"
## attr(,"package")
## [1] "ComplexHeatmap"

ht + Heatmap(1:3, name = "foo", width = unit(5, "mm")) + 
    rowAnnotation(bar = anno_points(1:3))

image

ht %v% Heatmap(rbind(1:7), name = "foo", row_names_side = "left", 
        height = unit(5, "mm")) %v% 
    HeatmapAnnotation(bar = anno_points(1:7),
        annotation_name_side = "left")

image

To add multiple UpSet plots:

m1 = make_comb_mat(lt, mode = "distinct")
m2 = make_comb_mat(lt, mode = "intersect")
m3 = make_comb_mat(lt, mode = "union")
UpSet(m1, row_title = "distinct mode") %v%
    UpSet(m2, row_title = "intersect mode") %v%
    UpSet(m3, row_title = "union mode")

image

Or first transpose all the combination matrices and add them horizontally:

m1 = make_comb_mat(lt, mode = "distinct")
m2 = make_comb_mat(lt, mode = "intersect")
m3 = make_comb_mat(lt, mode = "union")
UpSet(t(m1), column_title = "distinct mode") +
    UpSet(t(m2), column_title = "intersect mode") +
    UpSet(t(m3), column_title = "union mode")

image

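The three combination matrices are actually the same, and plotting them three times is redundant. With the functionality in the ComplexHeatmap package, we can directly add three barplot annotations.

top_ha = HeatmapAnnotation(
    "distinct" = anno_barplot(comb_size(m1), 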
        gp = gpar(fill = "black"), height = unit(2, "cm")), 
    "intersect" = anno_barplot(comb_size(m2), 
        gp = gpar(fill = "black"), height = unit(2, "cm")), 
    "union" = anno_barplot(comb_size(m3), 
        gp = gpar(fill = "black"), height = unit(2, "cm")), 
    gap = unit(2, "mm"), annotation_name_side = "left", annotation_name_rot = 0)
# the same for using m2 or m3
UpSet(m1, top_annotation = top_ha)

image

Similarly when the combination matrix is transposed:

right_ha = rowAnnotation(
    "distinct" = anno_barplot(comb_size(m1), 
        gp = gpar(fill = "black"), width = unit(2, "cm")), 
    "intersect" = anno_barplot(comb_size(m2), 
        gp = gpar(fill = "black"), width = unit(2, "cm")), 
    "union" = anno_barplot(comb_size(m3), 
        gp = gpar(fill = "black"), width = unit(2, "cm")), 
    gap = unit(2, "mm"), annotation_name_side = "bottom")
# the same for using m2 or m3
UpSet(t(m1), right_annotation = right_ha)

image

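In the initial UpSet implementation, the combination set sizes are also drawn on top of the barplots. Here we do not directly support it, but the sizes can be added manually with the decorate_annotation() function. See the following example: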

ht = draw(UpSet(m))
od = column_order(ht)
cs = comb_size(m)
decorate_annotation("intersection_size", {
    grid.text(cs[od], x = seq_along(cs), y = unit(cs[od], "native") + unit(2, "pt"), 
        default.units = "native", just = "bottom", gp = gpar(fontsize = 8))
})

image

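There are several reasons why we do not directly support adding the combination set sizes to the plot:
1. Adding new text means adding several new arguments to the function, e.g. arguments for the graphic parameters, the rotation, the positions and the margins to the bars, which would make the function heavy.
2. The ylim of the barplot annotation needs to be calculated properly so that the text does not exceed the annotation region.
3. Using decorate_annotation() is more flexible: not only the sizes but also customized text can be added.

8.7 Example with the movies dataset

The UpSetR package also provides a movies dataset, which contains 17 genres for 3883 movies. First load the dataset.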

movies = read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), 
    header = TRUE, sep = ";")
head(movies) # `make_comb_mat()` automatically ignores the first two columns

##                                 Name ReleaseDate Action Adventure Children
## 1                   Toy Story (1995)        1995      0         0        1
## 2                     Jumanji (1995)        1995      0         1        1
## 3            Grumpier Old Men (1995)        1995      0         0        0
## 4           Waiting to Exhale (1995)        1995      0         0        0
## 5 Father of the Bride Part II (1995)        1995      0         0        0
## 6                        Heat (1995)        1995      1         0        0
##   Comedy Crime Documentary Drama Fantasy Noir Horror Musical Mystery Romance
## 1      1     0           0     0       0    0      0       0       0       0
## 2      0     0           0     0       1    0      0       0       0       0
## 3      1     0           0     0       0    0      0       0       0       1
## 4      1     0           0     1       0    0      0       0       0       0
## 5      1     0           0     0       0    0      0       0       0       0
## 6      0     1           0     0       0    0      0       0       0       0
##   SciFi Thriller War Western AvgRating Watches
## 1     0        0   0       0      4.15    2077
## 2     0        0   0       0      3.20     701
## 3     0        0   0       0      3.02     478
## 4     0        0   0       0      2.73     170
## 5     0        0   0       0      3.01     296
## 6     0        1   0       0      3.88     940

To make the same UpSet plot as in the original UpSetR example:

m = make_comb_mat(movies, top_n_sets = 6)
m

## A combination matrix with 6 sets and 39 combinations.
##   ranges of combination set size: c(1, 1028).
##   mode for the combination size: distinct.
##   sets are on rows.
## 
## Top 8 combination sets are:
##   Action Comedy Drama Horror Romance Thriller   code size
##                     x                         001000 1028
##               x                               010000  698
##                            x                  000100  216
##        x                                      100000  206
##                                             x 000001  183
##               x     x                         011000  180
##               x                    x          010010  160
##                     x              x          001010  158
## 
## Sets are:
##          set size
##       Action  503
##       Comedy 1200
##        Drama 1603
##       Horror  343
##      Romance  471
##     Thriller  492
##   complement    2

m = m[comb_degree(m) > 0]
UpSet(m)

image

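The following code makes it look more similar to the original plot. The code is a little long, but most of it customizes the annotations and the row/column orders.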

ss = set_size(m)
cs = comb_size(m)
ht = UpSet(m, 
    set_order = order(ss),
    comb_order = order(comb_degree(m), -cs),
    top_annotation = HeatmapAnnotation(
        "Genre Intersections" = anno_barplot(cs, 
            ylim = c(0, max(cs)*1.1),
            border = FALSE, 
            gp = gpar(fill = "black"), 
            height = unit(4, "cm")
        ), 
        annotation_name_side = "left", 
        annotation_name_rot = 90),
    left_annotation = rowAnnotation(
        "Movies Per Genre" = anno_barplot(-ss, 
            baseline = 0,
            axis_param = list(
                at = c(0, -500, -1000, -1500),
                labels = c(0, 500, 1000, 1500),
                labels_rot = 0),
            border = FALSE, 
            gp = gpar(fill = "black"), 
            width = unit(4, "cm")
        ),
        set_name = anno_text(set_name(m), 
            location = 0.5, 
            just = "center",
            width = max_text_width(set_name(m)) + unit(4, "mm"))
    ), 
    right_annotation = NULL,
    show_row_names = FALSE)
ht = draw(ht)
od = column_order(ht)
decorate_annotation("Genre Intersections", {
    grid.text(cs[od], x = seq_along(cs), y = unit(cs[od], "native") + unit(2, "pt"), 
        default.units = "native", just = c("left", "bottom"), 
        gp = gpar(fontsize = 6, col = "#404040"), rot = 45)
})

image

In the movies dataset, there is also a column AvgRating which gives the rating of each movie. Next we split all the movies into five groups based on the ratings.

genre = c("Action", "Romance", "Horror", "Children", "SciFi", "Documentary")
rating = cut(movies$AvgRating, c(0, 1, 2, 3, 4, 5))
m_list = tapply(seq_len(nrow(movies)), rating, function(ind) {
    m = make_comb_mat(movies[ind, genre, drop = FALSE])
    m[comb_degree(m) > 0]
})
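To see how cut() assigns movies to the five rating groups, here is a small base R sketch on the six AvgRating values shown by head(movies) above. The intervals are open on the left and closed on the right, so a rating of exactly 3 would fall into (2,3].

```r
# AvgRating of the first six movies, copied from head(movies)
avg_rating = c(4.15, 3.20, 3.02, 2.73, 3.01, 3.88)

# Same breaks as in the code above
rating = cut(avg_rating, c(0, 1, 2, 3, 4, 5))
rating_labels = as.character(rating)
rating_labels
# "(4,5]" "(3,4]" "(3,4]" "(2,3]" "(3,4]" "(3,4]"

# Number of movies per group; empty groups are kept as levels
table(rating)
```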

The combination matrices in m_list may have different combination sets:

sapply(m_list, comb_size)

## $`(0,1]`
## 010000 001000 000100 000001 
##      1      2      1      1 
## 
## $`(1,2]`
## 101010 100110 110000 101000 100100 100010 001010 100000 010000 001000 000100 
##      1      1      1      4      5      5      8     14      7     38     14 
## 000010 000001 
##      3      2 
## 
## $`(2,3]`
## 101010 110000 101000 100100 100010 010100 010010 001010 000110 100000 010000 
##      4      8      2      6     35      3      1     27      7    126     99 
## 001000 000100 000010 000001 
##    142     77     27      9 
## 
## $`(3,4]`
## 110010 101010 100110 110000 101000 100010 011000 010100 010010 001100 001010 
##      1      6      1     20      6     45      3      4      4      1     11 
## 000110 100000 010000 001000 000100 000010 000001 
##      5    176    276     82    122     66     87 
## 
## $`(4,5]`
## 110010 101010 110000 101000 100010 100000 010000 001000 000100 000010 000001 
##      1      1      4      1      6     23     38      4      4     10     28

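To compare between multiple groups with UpSet plots, we need to normalize all the matrices so that they have the same sets and the same combination sets. normalize_comb_mat() basically adds zeros to the new combination sets that were not there before.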

m_list = normalize_comb_mat(m_list)
sapply(m_list, comb_size)

##        (0,1] (1,2] (2,3] (3,4] (4,5]
## 110001     0     1     0     1     0
## 100101     0     1     4     6     1
## 100011     0     0     0     1     1
## 110000     0     5     6     0     0
## 100100     0     4     2     6     1
## 100010     0     1     8    20     4
## 100001     0     5    35    45     6
## 010100     0     0     0     1     0
## 010010     0     0     3     4     0
## 010001     0     0     7     5     0
## 000110     0     0     0     3     0
## 000101     0     8    27    11     0
## 000011     0     0     1     4     0
## 100000     0    14   126   176    23
## 010000     1    14    77   122     4
## 001000     1     2     9    87    28
## 000100     2    38   142    82     4
## 000010     1     7    99   276    38
## 000001     0     3    27    66    10
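The zero-padding idea can be sketched in base R on plain named vectors. This is only an illustration with made-up sizes, not how normalize_comb_mat() is implemented, and it ignores the part of the normalization that aligns the sets themselves.

```r
# Made-up combination set sizes for two groups with different codes
sizes_by_group = list(
    g1 = c("10" = 3, "01" = 2),
    g2 = c("11" = 1, "10" = 4)
)

# Union of all combination set names across the groups
all_codes = unique(unlist(lapply(sizes_by_group, names)))

# Pad each group with zeros for the codes it does not have
padded = sapply(sizes_by_group, function(s) {
    out = setNames(numeric(length(all_codes)), all_codes)
    out[names(s)] = s
    out
})
padded
```

After padding, every group has a value for every combination set, so the bars line up across plots.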

We calculate the range for the two barplots:

max_set_size = max(sapply(m_list, set_size))
max_comb_size = max(sapply(m_list, comb_size))

Finally, we add the five UpSet plots vertically:

ht_list = NULL
for(i in seq_along(m_list)) {
    ht_list = ht_list %v%
        UpSet(m_list[[i]], row_title = paste0("rating in ", names(m_list)[i]),
            set_order = NULL, comb_order = NULL,
            top_annotation = upset_top_annotation(m_list[[i]], ylim = c(0, max_comb_size)),
            right_annotation = upset_right_annotation(m_list[[i]], ylim = c(0, max_set_size)))
}
ht_list

image.png

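After comparing the five UpSet plots, we can see that most of the movies are rated between 2 and 4. Horror movies tend to have lower ratings, while romance movies tend to have higher ratings.

Instead of directly comparing the sizes of the combination sets, we can also compare the relative fraction to the full sets. In the following code, we remove the group (0, 1] because the number of movies there is too small.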

m_list = m_list[-1]
max_set_size = max(sapply(m_list, set_size))
rel_comb_size = sapply(m_list, function(m) {
    s = comb_size(m)
    # because the combination matrix is generated under "distinct" mode
    # the sum of `s` is the size of the full set
    s/sum(s)
})
ht_list = NULL
for(i in seq_along(m_list)) {
    ht_list = ht_list %v%
        UpSet(m_list[[i]], row_title = paste0("rating in ", names(m_list)[i]),
            set_order = NULL, comb_order = NULL,
            top_annotation = HeatmapAnnotation(
                "Relative\nfraction" = anno_barplot(
                    rel_comb_size[, i],
                    ylim = c(0, 0.5),
                    gp = gpar(fill = "black"),
                    border = FALSE,
                    height = unit(2, "cm"),
                ), 
                annotation_name_side = "left",
                annotation_name_rot = 0),
            right_annotation = upset_right_annotation(m_list[[i]], 
                ylim = c(0, max_set_size))
        )
}
ht_list
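The relative fraction is a simple normalization. In distinct mode the combination sets are mutually exclusive, so their sizes add up to the size of the full set and the fractions sum to 1. A base R sketch with made-up sizes:

```r
# Made-up combination set sizes under distinct mode
s = c("100" = 14, "010" = 7, "001" = 38, "110" = 1)

# Fraction of the full set that falls into each combination set
rel = s / sum(s)
round(rel, 3)

# Mutually exclusive sets, so the fractions add up to 1
total = sum(rel)
total
```

Under intersect or union mode the combination sets can overlap, so sum(s) would count some elements more than once and could not serve as the size of the full set.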

image

Now the trend is clearer: horror movies are rated low and documentaries are rated high.

Next we split the movies by year:

year = floor(movies$ReleaseDate/10)*10
m_list = tapply(seq_len(nrow(movies)), year, function(ind) {
    m = make_comb_mat(movies[ind, genre, drop = FALSE])
    m[comb_degree(m) > 0]
})
m_list = normalize_comb_mat(m_list)
max_set_size = max(sapply(m_list, set_size))
max_comb_size = max(sapply(m_list, comb_size))
ht_list1 = NULL
for(i in 1:5) {
    ht_list1 = ht_list1 %v%
        UpSet(m_list[[i]], row_title = paste0(names(m_list)[i], "s"),
            set_order = NULL, comb_order = NULL,
            top_annotation = upset_top_annotation(m_list[[i]], ylim = c(0, max_comb_size),
                height = unit(2, "cm")),
            right_annotation = upset_right_annotation(m_list[[i]], ylim = c(0, max_set_size)))
}

ht_list2 = NULL
for(i in 6:10) {
    ht_list2 = ht_list2 %v%
        UpSet(m_list[[i]], row_title = paste0(names(m_list)[i], "s"),
            set_order = NULL, comb_order = NULL,
            top_annotation = upset_top_annotation(m_list[[i]], ylim = c(0, max_comb_size),
                height = unit(2, "cm")),
            right_annotation = upset_right_annotation(m_list[[i]], ylim = c(0, max_set_size)))
}
grid.newpage()
pushViewport(viewport(x = 0, width = 0.5, just = "left"))
draw(ht_list1, newpage = FALSE)
popViewport()
pushViewport(viewport(x = 0.5, width = 0.5, just = "left"))
draw(ht_list2, newpage = FALSE)
popViewport()
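The grouping variable in the code above rounds each release year down to its decade. A short base R sketch with made-up years shows what floor(x/10)*10 produces:

```r
# Made-up release years
release_year = c(1995, 1989, 2000, 1931)

# Round down to the start of the decade
decade = floor(release_year / 10) * 10
decade  # 1990 1980 2000 1930
```

These decade values become the names of m_list, which is why the row titles above are built with paste0(names(m_list)[i], "s").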

image

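Now we can see that most of the movies were produced in the 1990s, and the two major genres are action and romance.

Similarly, if we change the top annotation to the relative fraction to the full sets (code not shown):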

image

Finally, we can add the statistics of the year, rating and number of watches for each combination set as boxplot annotations to the right of the UpSet plot.

m = make_comb_mat(movies[, genre])
m = m[comb_degree(m) > 0]
comb_elements = lapply(comb_name(m), function(nm) extract_comb(m, nm))
years = lapply(comb_elements, function(ind) movies$ReleaseDate[ind])
rating = lapply(comb_elements, function(ind) movies$AvgRating[ind])
watches = lapply(comb_elements, function(ind) movies$Watches[ind])

UpSet(t(m)) + rowAnnotation(years = anno_boxplot(years),
    rating = anno_boxplot(rating),
    watches = anno_boxplot(watches),
    gap = unit(2, "mm"))

image

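We can see that the movies of the genre "SciFi + Children" were produced quite a long time ago, but their ratings are not bad. The movies of the genre "Action + Children" have the lowest ratings.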

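8.8 Example with genomic regions

The H3K4me3 ChIP-seq peaks from six Roadmap samples are visualized with an UpSet plot. The six samples are:

ESC, E016
ES-derived, E004
ES-derived, E006
Brain, E071
Muscle, E100
Heart, E104

First read the files and convert them to GRanges objects.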

file_list = c(
    "ESC" = "data/E016-H3K4me3.narrowPeak.gz",
    "ES-deriv1" = "data/E004-H3K4me3.narrowPeak.gz",
    "ES-deriv2" = "data/E006-H3K4me3.narrowPeak.gz",
    "Brain" = "data/E071-H3K4me3.narrowPeak.gz",
    "Muscle" = "data/E100-H3K4me3.narrowPeak.gz",
    "Heart" = "data/E104-H3K4me3.narrowPeak.gz"
)
library(GenomicRanges)
peak_list = lapply(file_list, function(f) {
    df = read.table(f)
    GRanges(seqnames = df[, 1], ranges = IRanges(df[, 2], df [, 3]))
})

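Make the combination matrix. Note that the sizes of the sets and the combination sets are now total base pairs, i.e. the sum of the widths of the regions. We only keep the combination sets with more than 500kb.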

m = make_comb_mat(peak_list)
m = m[comb_size(m) > 500000]
UpSet(m)

image

We can nicely format the axis labels by setting axis_param:

UpSet(m, 
    top_annotation = upset_top_annotation(
        m,
        axis_param = list(at = c(0, 1e7, 2e7),
            labels = c("0Mb", "10Mb", "20Mb")),
        height = unit(4, "cm")
    ),
    right_annotation = upset_right_annotation(
        m,
        axis_param = list(at = c(0, 2e7, 4e7, 6e7),
            labels = c("0Mb", "20Mb", "40Mb", "60Mb"),
            labels_rot = 0),
        width = unit(4, "cm")
    ))
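The axis labels above are typed by hand. As a small convenience, they can also be derived from the break positions; the base R sketch below reproduces the same labels, with `at_top` and `at_right` as illustrative variable names.

```r
# Break positions in base pairs, as used in axis_param above
at_top   = c(0, 1e7, 2e7)
at_right = c(0, 2e7, 4e7, 6e7)

# Convert base pairs to megabases and append the unit
labels_top   = paste0(at_top / 1e6, "Mb")
labels_right = paste0(at_right / 1e6, "Mb")

labels_top    # "0Mb" "10Mb" "20Mb"
labels_right  # "0Mb" "20Mb" "40Mb" "60Mb"
```

The resulting vectors can be passed as the labels element of axis_param, which keeps labels and break positions in sync if the breaks change.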

image

For each set of genomic regions, we can associate more information with it, such as the mean methylation or the distance to the nearest TSS.

subgroup = c("ESC" = "group1",
    "ES-deriv1" = "group1",
    "ES-deriv2" = "group1",
    "Brain" = "group2",
    "Muscle" = "group2",
    "Heart" = "group2"
)
comb_sets = lapply(comb_name(m), function(nm) extract_comb(m, nm))
comb_sets = lapply(comb_sets, function(gr) {
    # we just randomly generate dist_to_tss and mean_meth
    gr$dist_to_tss = abs(rnorm(length(gr), mean = runif(1, min = 500, max = 2000), sd = 1000))
    gr$mean_meth = abs(rnorm(length(gr), mean = 0.1, sd = 0.1))
    gr
})
UpSet(m, 
    top_annotation = upset_top_annotation(
        m,
        axis_param = list(at = c(0, 1e7, 2e7),
            labels = c("0Mb", "10Mb", "20Mb")),
        height = unit(4, "cm")
    ),
    right_annotation = upset_right_annotation(
        m,
        axis_param = list(at = c(0, 2e7, 4e7, 6e7),
            labels = c("0Mb", "20Mb", "40Mb", "60Mb"),
            labels_rot = 0),
        width = unit(4, "cm")
    ),
    left_annotation = rowAnnotation(group = subgroup[set_name(m)], show_annotation_name = FALSE),
    bottom_annotation = HeatmapAnnotation(
        dist_to_tss = anno_boxplot(lapply(comb_sets, function(gr) gr$dist_to_tss), outline = FALSE),
        mean_meth = sapply(comb_sets, function(gr) mean(gr$mean_meth)),
        annotation_name_side = "left"
    )
)

image
