R語言分析4:一致性聚類分析(ConsensusClusterPlus)

一致性聚類是一種提供定量證據(jù)的方法伞广,用以確定數(shù)據(jù)集中可能的聚類的成員及數(shù)量轩拨,例如微陣列基因表達(dá)。這種方法在癌癥基因組學(xué)中得到了廣泛應(yīng)用驶赏,以發(fā)現(xiàn)了新的疾病分子亞類。

一致性聚類方法包括從一組項目(例如微陣列)中進(jìn)行二次抽樣既鞠,并確定聚類計數(shù)(k)的簇煤傍。然后,計算成對共識值嘱蛋,即兩個項目在同一子樣本中出現(xiàn)的次數(shù)中占據(jù)同一簇的比例蚯姆,并將其存儲在每個 k 的對稱共識矩陣中。

1. 準(zhǔn)備輸入數(shù)據(jù)

# 安裝并加載所需的R包
# BiocManager::install("ConsensusClusterPlus")
library(ConsensusClusterPlus)

# 輸入數(shù)據(jù)格式是一個矩陣浑槽,其中列是樣本(項目)蒋失,行是特征,單元格是數(shù)值(此處展示TCGA-STAD的FPKM數(shù)據(jù))
STAD_tumor[1:5,1:5]
##           TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13
## 5_8S_rRNA                       0.0866                       0.4991                       1.0460                       0.1388                       0.0965
## 5S_rRNA                         0.0000                       0.2676                       2.6896                       0.3327                       0.1393
## 7SK                             0.3533                       0.2546                       1.2531                       0.2813                       0.0000
## A1BG                            0.0197                       0.0358                       0.1067                       0.1153                       0.0406
## A1BG-AS1                        0.0546                       0.2324                       0.3320                       0.1367                       0.0525

# 為了選擇信息最豐富的基因進(jìn)行類檢測桐玻,將數(shù)據(jù)集減少到前 5,000 個可變性最大的基因(通過中值絕對偏差 - MAD來衡量)
mads = apply(STAD_tumor, 1, mad) # MAD測度
STAD_tumor = STAD_tumor[rev(order(mads))[1:5000], ] # 提取前5000個基因

# 歸一化操作篙挽,sweep是一個循環(huán)函數(shù),首先用apply計算每行的中值镊靴,然后用每個基因在樣本中的表達(dá)值減中值
STAD_tumor = sweep(STAD_tumor, 1, apply(STAD_tumor, 1, median, na.rm = T))

STAD_tumor[1:5,1:5]
##           TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13
## 5_8S_rRNA                     -0.04830                      0.36420                      0.91110                      0.00390                     -0.03840
## 5S_rRNA                       -0.36695                     -0.09935                      2.32265                     -0.03425                     -0.22765
## 7SK                            0.24415                      0.14545                      1.14395                      0.17215                     -0.10915
## A1BG                          -0.00930                      0.00680                      0.07770                      0.08630                      0.01160
## A1BG-AS1                      -0.05605                      0.12175                      0.22135                      0.02605                     -0.05815

2. 運(yùn)行ConsensusClusterPlus

主要包含了以下幾個參數(shù):
1)pItem, 樣品的抽樣比例铣卡,如 pItem=0.8 表示選擇80%的樣本進(jìn)行重復(fù)抽樣;
2)pfeature, Feature的抽樣比例偏竟;
3)maxK, 聚類結(jié)果中分類的最大的K值煮落,必須是整數(shù);
4)reps, 重復(fù)抽樣的數(shù)目踊谋;
5)clusterAlg, 使用的聚類算法蝉仇,“hc”用于層次聚類,“pam”用于PAM(Partioning Around Medoids)算法殖蚕,“km”用于K-Means算法轿衔,也可以自定義函數(shù);
6)distance, 計算距離的方法睦疫,有pearson害驹、spearman、euclidean蛤育、binary宛官、maximum、canberra瓦糕、minkowski;
7)title, 輸出結(jié)果的文件夾名字底洗,包含了輸出的圖片;
8)seed, 隨機(jī)種子咕娄,用于重復(fù)結(jié)果
9)tmyPal枷恕,指定一致性矩陣使用的顏色,默認(rèn)使用白-藍(lán)色
10)plot谭胚,不設(shè)置時圖片結(jié)果僅輸出到屏幕徐块,也可以設(shè)置輸出為'pdf', 'png', 'pngBMP'
11)writeTable,若為TRUE灾而,則將一致性矩陣胡控、ICL、log輸出到CSV文件

title=tempdir()
results = ConsensusClusterPlus(STAD_tumor, maxK = 20, reps = 1000, pItem = 0.8, pFeature = 1, title = "untitled_consensus_cluster", clusterAlg = "hc", 
                               distance = "pearson", seed = 1262118388.71279, tmyPal=NULL, writeTable=FALSE, plot = "png")
## end fraction
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered

# ConsensusClusterPlus 的輸出是一個列表旁趟,其中列表的元素對應(yīng)于第 k 個集群的結(jié)果昼激,例如 results[[8]] 就是 k =8 的結(jié)果 result。
str(results[[8]])
## List of 5
##  $ consensusMatrix: num [1:412, 1:412] 1 0.00156 0.1442 0 0.00158 ...
##  $ consensusTree  :List of 7
##   ..$ merge      : int [1:411, 1:2] -1 -2 -30 -42 -43 -49 -50 -51 -59 -63 ...
##   ..$ height     : num [1:411] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ order      : int [1:412] 410 258 251 330 22 175 191 241 252 382 ...
##   ..$ labels     : NULL
##   ..$ method     : chr "average"
##   ..$ call       : language hclust(d = as.dist(1 - fm), method = finalLinkage)
##   ..$ dist.method: NULL
##   ..- attr(*, "class")= chr "hclust"
##  $ consensusClass : Named int [1:412] 1 2 3 4 2 3 3 2 3 3 ...
##   ..- attr(*, "names")= chr [1:412] "TCGA-BR-A4J4-01A-12R-A251-31" "TCGA-BR-A4IZ-01A-32R-A251-31" "TCGA-RD-A7C1-01A-11R-A32D-31" ## "TCGA-BR-6852-01A-11R-1884-13" ...
##  $ ml             : num [1:412, 1:412] 1 0.00156 0.1442 0 0.00158 ...
##  $ clrs           :List of 3
##   ..$ : chr [1:412] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" ...
##   ..$ : num 8
##   ..$ : chr [1:8] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" ...

results[[8]][["consensusMatrix"]][1:5,1:5]
##             [,1]        [,2]        [,3]        [,4]        [,5]
## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000

# 樣本的聚類樹
results[[8]][["consensusTree"]]
## 
## Call:
## hclust(d = as.dist(1 - fm), method = finalLinkage)
## 
## Cluster method   : average 
## Number of objects: 412 

# consensusClass锡搜, 樣本的聚類結(jié)果
length(results[[8]][["consensusClass"]])
## [1] 412
results[[8]][["consensusClass"]][1:5]
## TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13 
##                            1                            2                            3                            4                            2 

# ml, 就是consensusMatrix
results[[8]][["ml"]][1:5,1:5]
##             [,1]        [,2]        [,3]        [,4]        [,5]
## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000
results[[8]][["consensusMatrix"]][1:5,1:5]
##             [,1]        [,2]        [,3]        [,4]        [,5]
## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000

# clrs, 顏色
results[[8]][["clrs"]]
## [[1]]
##   [1] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#FF7F00" "#FF7F00" "#FF7F00" "#1F78B4"
##  [17] "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#E31A1C" "#A6CEE3" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00"
##  [33] "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#33A02C" "#1F78B4" "#1F78B4"
##  [49] "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FF7F00" "#FB9A99" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#33A02C"
##  [65] "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FB9A99"
##  [81] "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4"
##  [97] "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#A6CEE3"
## [113] "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#FB9A99"
## [129] "#1F78B4" "#FF7F00" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3"
## [145] "#FF7F00" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4"
## [161] "#A6CEE3" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#B2DF8A" "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#E31A1C" "#1F78B4"
## [177] "#FF7F00" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FDBF6F" "#1F78B4"
## [193] "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FB9A99" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3"
## [209] "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FF7F00" "#1F78B4" "#B2DF8A"
## [225] "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4"
## [241] "#FDBF6F" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#E31A1C" "#FDBF6F" "#1F78B4" "#1F78B4" "#B2DF8A" "#FF7F00"
## [257] "#FF7F00" "#E31A1C" "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#33A02C" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3"
## [273] "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FB9A99" "#1F78B4"
## [289] "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#33A02C" "#1F78B4" "#B2DF8A" "#FF7F00" "#FF7F00" "#FF7F00"
## [305] "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4"
## [321] "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#E31A1C" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4"
## [337] "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#B2DF8A" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#33A02C"
## [353] "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#B2DF8A" "#1F78B4" "#A6CEE3" "#B2DF8A" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00"
## [369] "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#33A02C" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FDBF6F" "#1F78B4" "#1F78B4"
## [385] "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3"
## [401] "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#B2DF8A" "#1F78B4" "#1F78B4" "#FF7F00" "#E31A1C" "#1F78B4" "#FF7F00"
## 
## [[2]]
## [1] 8
## 
## [[3]]
## [1] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" "#E31A1C" "#33A02C" "#FB9A99" "#FDBF6F"

★ 官網(wǎng)推薦橙困,在實際運(yùn)行中,推薦reps設(shè)置的更大耕餐,比如1000凡傅, maxK設(shè)置的更大,比如20

3. 計算聚類一致性 (cluster-consensus) 和樣品一致性 (item-consensus)

主要包含了以下幾個參數(shù):
1)title肠缔,設(shè)置生成的文件的路徑
2)plot夏跷,不設(shè)置時圖片結(jié)果僅輸出到屏幕,也可以設(shè)置輸出為'pdf', 'png', 'pngBMP' 明未。
3)writeTable槽华,若為TRUE,則將一致性矩陣趟妥、ICL猫态、log輸出到CSV文件

icl = calcICL(results, title = "consensus_cluster", plot = "png")

dim(icl[["clusterConsensus"]])
icl[["clusterConsensus"]][1:5,]
##       k cluster clusterConsensus
##  [1,] 2       1        0.8236517
##  [2,] 2       2        0.9204836
##  [3,] 3       1        0.6659016
##  [4,] 3       2        0.7503082
##  [5,] 3       3        0.8561029

dim(icl[["itemConsensus"]])
icl[["itemConsensus"]][1:5,]
##   k cluster                         item itemConsensus
##  1 2       1 TCGA-ZQ-A9CR-01A-11R-A39E-31     0.2776996
##  2 2       1 TCGA-BR-6563-01A-13R-2055-13     0.2786404
##  3 2       1 TCGA-IN-7808-01A-11R-2203-13     0.2941890
##  4 2       1 TCGA-VQ-A8P8-01A-11R-A39E-31     0.2783686
##  5 2       1 TCGA-HU-A4HB-01A-12R-A251-31     0.2714451

4. 聚類結(jié)果展示

4.1 矩陣熱圖

ConsensusClusterPlus-1

? 一致性矩陣的行和列表示的都是樣本,一致性矩陣的值按從0(不可能聚類在一起)到1(總是聚類在一起)用白色到深藍(lán)色表示
? 一致性矩陣按照一致性聚類(熱圖上方的樹狀圖)來排列披摄。樹狀圖和熱圖之間的彩色矩形即分出來的類別

4.2 一致性累積分布函數(shù)(CDF)圖

ConsensusClusterPlus-2

? 此圖展示了k取不同數(shù)值時(用顏色表示)的一致性矩陣的累積分布函數(shù)亲雪,用于判斷當(dāng)k取何值時,CDF達(dá)到一個近似最大值行疏,此時一致性和簇置信度在達(dá)到最大匆光,聚類分析結(jié)果最可靠

4.3 Delta Area Plot

ConsensusClusterPlus-3

? 此圖展示了 k 和 k-1 相比CDF曲線下面積的相對變化。當(dāng)k = 2時酿联,因為沒有k=1终息,所以第一個點表示的是k=2時CDF曲線下總面積,而當(dāng)k = 10 時贞让,曲線下面積趨近于平穩(wěn)周崭,所以10為最合適的k值。

4.4 Tracking Plot

ConsensusClusterPlus-4

? 此圖下方的黑色條紋表示樣品喳张,展示的是樣品在k取不同的值時续镇,歸屬的分類情況,不同顏色的色塊代表不同的分類销部。取不同k值前后經(jīng)常改變顏色分類的樣品代表其分類不穩(wěn)定摸航。若分類不穩(wěn)定的樣本太多制跟,則說明該k值下的分類不穩(wěn)定。

4.5 item-Consensus Plot(樣本一致性圖)

ConsensusClusterPlus-5

? 縱坐標(biāo)代表Item-consensus values酱虎。k值不同時雨膨,每個樣本都會有一個對應(yīng)不同簇的item-consensus values。豎條代表每一個樣本读串,豎條按從下到上遞增的值進(jìn)行排序聊记,高度代表該樣本的總item-consensus values。每個樣本的頂部都有一個小矩形恢暖,小矩形的顏色代表該樣本被分到了哪一簇排监。
? 該圖提供了給定k下所有其他集群的樣本一致性圖,可以查看樣本是否是集群中非辰芪妫“純粹”的成員舆床,或者它是否與多個集群共享高度一致性(多種顏色的列中的大矩形),表明它是不穩(wěn)定或“不純粹”的成員琼娘。 這些值可用于選擇“核心”樣本峭弟,它們高度代表集群。 此外脱拼,也可以幫助決策簇數(shù)量

4.6 Cluster-Consensus Plot(聚類一致性圖 )

ConsensusClusterPlus-6

? 此圖展示的是不同k值下瞒瘸,每個分類的cluster-consensus value(該簇中成員pairwise consensus values的均值)。該值越高(低)代表穩(wěn)定性越高(低)熄浓∏槌簦可用于判斷同一k值下以及不同k值之間cluster-consensus value的高低

5. 最佳的K值的選擇

通過上面的結(jié)果解讀我們可以選取出初步分類的亞組的個數(shù),但是這種方法有時候帶有主觀性赌蔑,也可以用PAC法確定最佳聚類數(shù)目

#  面積最小值對應(yīng)K為最佳K
Kvec = 2:20
x1 = 0.1; x2 = 0.9        # threshold defining the intermediate sub-interval
PAC = rep(NA,length(Kvec)) 
names(PAC) = paste("K=",Kvec,sep="")  # from 2 to maxK
for(i in Kvec){
  M = results[[i]]$consensusMatrix
  Fn = ecdf(M[lower.tri(M)])          # M 為計算出共識矩陣
  PAC[i-1] = Fn(x2) - Fn(x1)
} 

optK = Kvec[which.min(PAC)]  # 理想的K值


# 重新繪制熱圖(以k=7為例)
## 選取自己感興趣的基因集
intersection_geneExp[1:3,1:3]
##         TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31
## ALOX12B                       0.5224                       0.0031                       0.6993
## GOT1                         20.0073                       9.3511                      39.1501
## ALOX15B                       0.2185                       0.1652                       1.3996

annCol <- data.frame(results = paste0("Cluster",
                                      results[[7]][['consensusClass']]),
                     row.names = colnames(intersection_geneExp))
                                  
head(annCol)
##                               results
##  TCGA-BR-A4J4-01A-12R-A251-31 Cluster1
##  TCGA-BR-A4IZ-01A-32R-A251-31 Cluster1
##  TCGA-RD-A7C1-01A-11R-A32D-31 Cluster2
##  TCGA-BR-6852-01A-11R-1884-13 Cluster2
##  TCGA-BR-7851-01A-11R-2203-13 Cluster2
##  TCGA-BR-A4J1-01A-11R-A251-31 Cluster3

library(RColorBrewer)
library(pheatmap)

mycol <- brewer.pal(7, "Set3")

annColors <- list(results = c("Cluster1" = mycol[1],
                              "Cluster2" = mycol[2],
                              "Cluster3" = mycol[3], 
                              "Cluster4" = mycol[4], 
                              "Cluster5" = mycol[5], 
                              "Cluster6" = mycol[6], 
                              "Cluster7"= mycol[7]))
                    
heatdata <- results[[7]][["consensusMatrix"]]
dimnames(heatdata) <- list(colnames(intersection_geneExp), colnames(intersection_geneExp))
heatdata[1:3,1:3]
##                              TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31
##  TCGA-BR-A4J4-01A-12R-A251-31                    1.0000000                    0.6542056                            0
##  TCGA-BR-A4IZ-01A-32R-A251-31                    0.6542056                    1.0000000                            0
##  TCGA-RD-A7C1-01A-11R-A32D-31                    0.0000000                    0.0000000                            1

# 繪制熱圖
result <- pheatmap(mat = heatdata,
         color = colorRampPalette((c("white","steelblue")))(100),
         border_color = NA,
         annotation_col = annCol,
         annotation_colors = annColors,
         show_colnames = F,
         show_rownames = F)
ConsensusClusterPlus-7

6. 提取分型結(jié)果

library(tidyverse)

Cluster_res <- annCol %>% as.data.frame %>% rownames_to_column("patient_ID") 
table(Cluster_res$results) 
## 
## Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7 
##      132       94       37       57       38       48        6 

#合并預(yù)后數(shù)據(jù)進(jìn)行生存分析 
STAD_clinical <- read_tsv("STAD_clinical.txt") 
STAD_clinical$time <- ifelse(STAD_clinical$vital_status=='Alive', STAD_clinical$days_to_last_follow_up, STAD_clinical$days_to_death) # 如果患者已經(jīng)死亡則選擇days_to_death俯在,如果患者處于活著狀態(tài),則選擇days_to_last_follow_up
STAD_clinical$status <- ifelse(STAD_clinical$vital_status=='Alive', 0, 1)

Cluster_res <- Cluster_res %>% inner_join(STAD_clinical, "patient_ID") 
head(Cluster_res) 
##                     patient_ID  results age gender tissue_or_organ_of_origin               primary_diagnosis vital_status ajcc_pathologic_stage year_of_birth days_to_last_follow_up year_of_death days_to_death time status
##  1 TCGA-BR-A4J4-01A-12R-A251-31 Cluster1  39   male            Gastric antrum             Adenocarcinoma, NOS        Alive            Stage IIIB          1973                     16            NA            NA   16      0
##  2 TCGA-BR-A4IZ-01A-32R-A251-31 Cluster1  45 female            Gastric antrum         Carcinoma, diffuse type         Dead            Stage IIIB          1967                     24            NA           273  273      1
##  3 TCGA-RD-A7C1-01A-11R-A32D-31 Cluster2  82   male         Fundus of stomach         Carcinoma, diffuse type         Dead              Stage IB          1923                     NA          2006           507  507      1
##  4 TCGA-BR-6852-01A-11R-1884-13 Cluster2  64 female               Cardia, NOS             Adenocarcinoma, NOS        Alive             Stage IIA          1947                   1367            NA            NA 1367      0
##  5 TCGA-BR-7851-01A-11R-2203-13 Cluster2  74   male           Body of stomach Adenocarcinoma, intestinal type         Dead             Stage IIB          1937                    574            NA            NA   NA      1
##  6 TCGA-BR-A4J1-01A-11R-A251-31 Cluster2  63   male            Gastric antrum             Adenocarcinoma, NOS         Dead            Stage IIIA          1949                     NA          2012            22   22      1

#進(jìn)行生存分析 
library(survival)
library(survminer)

fit <- survfit(Surv(time, status) ~ results, data = Cluster_res) 

ggsurvplot(fit, data = Cluster_res,  
           main = "Survival curve", # 添加標(biāo)題
           surv.median.line = "hv", # 添加中位數(shù)生存時間線
           font.main = c(16, "bold", "darkblue"), # 設(shè)置標(biāo)題字體大小娃惯、格式和顏色
           font.x = c(14, "bold.italic", "red"), # 設(shè)置x軸字體大小跷乐、格式和顏色
           font.y = c(14, "bold.italic", "darkred"), # 設(shè)置y軸字體大小、格式和顏色
           font.tickslab = c(12, "plain", "darkgreen"), # 設(shè)置坐標(biāo)軸刻度字體大小趾浅、格式和顏色
           legend = c(0.9, 0.15), # 通過坐標(biāo)指定圖例位置
           legend.title = "Cluster", legend.labs = c("Cluster1","Cluster2","Cluster3","Cluster4","Cluster5","Cluster6","Cluster7"), # 更改圖例標(biāo)題和標(biāo)簽
           size = 1,  # 改變線條大小
           linetype = "strata", # 按組更改線型
           break.time.by = 250, # 將時間軸以250作為打斷
           # palette = "Dark2" # 使用 brewer 調(diào)色板“Dark2”
           palette = c('#E5D2DD', '#53A85F', '#F1BB72', '#F3B1A0', '#D6E7A3', '#57C3F3', '#476D87'), # 自定義調(diào)色板
           ggtheme = theme_bw(), # 修改主題
           # conf.int = TRUE, # 添加置信區(qū)間
           pval = TRUE, # 添加p值
           risk.table = TRUE, risk.table.y.text.col = TRUE, # 添加風(fēng)險表,并按層更改風(fēng)險表y 文本顏色  
           tables.height = 0.2, # 設(shè)置風(fēng)險表的高度
           tables.theme = theme_cleantable(), # 設(shè)置風(fēng)險表的主題
           xlim = c(0, 1050) # 更改 x 軸范圍
)  
ConsensusClusterPlus-8

7. tSNE分析

# 將數(shù)據(jù)框中字符型數(shù)據(jù)轉(zhuǎn)化為數(shù)值型
intersection_tumor <- as.data.frame(lapply(intersection_geneExp, as.numeric))
row.names(intersection_tumor) <- row.names(intersection_geneExp)
colnames(intersection_tumor) <- colnames(tumor_matrix)
intersection_tumor = na.omit(intersection_tumor)
intersection_tumor <- as.matrix(intersection_tumor)

library(Rtsne)

tSNE_res<- Rtsne(t(intersection_tumor), dims= 2, perplexity= 10, verbose= F, max_iter= 500, check_duplicates= F) 

tsne<- data.frame(tSNE1 = tSNE_res[["Y"]][,1], tSNE2= tSNE_res[["Y"]][,2], cluster = annCol$results) 

ggplot(tsne, aes(x= tSNE1, y= tSNE2, color= cluster)) + 
  geom_point(size= 4.5, alpha= 0.5) + 
  stat_ellipse(level= 0.85, show.legend = F) + 
  theme(legend.position= "top")
ConsensusClusterPlus-9
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
  • 序言:七十年代末愕提,一起剝皮案震驚了整個濱河市,隨后出現(xiàn)的幾起案子皿哨,更是在濱河造成了極大的恐慌浅侨,老刑警劉巖,帶你破解...
    沈念sama閱讀 217,277評論 6 503
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件证膨,死亡現(xiàn)場離奇詭異如输,居然都是意外死亡,警方通過查閱死者的電腦和手機(jī),發(fā)現(xiàn)死者居然都...
    沈念sama閱讀 92,689評論 3 393
  • 文/潘曉璐 我一進(jìn)店門不见,熙熙樓的掌柜王于貴愁眉苦臉地迎上來澳化,“玉大人,你說我怎么就攤上這事脖祈∷敛叮” “怎么了?”我有些...
    開封第一講書人閱讀 163,624評論 0 353
  • 文/不壞的土叔 我叫張陵盖高,是天一觀的道長。 經(jīng)常有香客問我眼虱,道長喻奥,這世上最難降的妖魔是什么? 我笑而不...
    開封第一講書人閱讀 58,356評論 1 293
  • 正文 為了忘掉前任捏悬,我火速辦了婚禮撞蚕,結(jié)果婚禮上,老公的妹妹穿的比我還像新娘过牙。我一直安慰自己甥厦,他們只是感情好,可當(dāng)我...
    茶點故事閱讀 67,402評論 6 392
  • 文/花漫 我一把揭開白布寇钉。 她就那樣靜靜地躺著刀疙,像睡著了一般。 火紅的嫁衣襯著肌膚如雪扫倡。 梳的紋絲不亂的頭發(fā)上谦秧,一...
    開封第一講書人閱讀 51,292評論 1 301
  • 那天,我揣著相機(jī)與錄音撵溃,去河邊找鬼疚鲤。 笑死,一個胖子當(dāng)著我的面吹牛缘挑,可吹牛的內(nèi)容都是我干的集歇。 我是一名探鬼主播,決...
    沈念sama閱讀 40,135評論 3 418
  • 文/蒼蘭香墨 我猛地睜開眼语淘,長吁一口氣:“原來是場噩夢啊……” “哼诲宇!你這毒婦竟也來了?” 一聲冷哼從身側(cè)響起亏娜,我...
    開封第一講書人閱讀 38,992評論 0 275
  • 序言:老撾萬榮一對情侶失蹤焕窝,失蹤者是張志新(化名)和其女友劉穎,沒想到半個月后维贺,有當(dāng)?shù)厝嗽跇淞掷锇l(fā)現(xiàn)了一具尸體它掂,經(jīng)...
    沈念sama閱讀 45,429評論 1 314
  • 正文 獨居荒郊野嶺守林人離奇死亡,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內(nèi)容為張勛視角 年9月15日...
    茶點故事閱讀 37,636評論 3 334
  • 正文 我和宋清朗相戀三年,在試婚紗的時候發(fā)現(xiàn)自己被綠了虐秋。 大學(xué)時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片榕茧。...
    茶點故事閱讀 39,785評論 1 348
  • 序言:一個原本活蹦亂跳的男人離奇死亡,死狀恐怖客给,靈堂內(nèi)的尸體忽然破棺而出用押,到底是詐尸還是另有隱情,我是刑警寧澤靶剑,帶...
    沈念sama閱讀 35,492評論 5 345
  • 正文 年R本政府宣布蜻拨,位于F島的核電站,受9級特大地震影響桩引,放射性物質(zhì)發(fā)生泄漏缎讼。R本人自食惡果不足惜,卻給世界環(huán)境...
    茶點故事閱讀 41,092評論 3 328
  • 文/蒙蒙 一坑匠、第九天 我趴在偏房一處隱蔽的房頂上張望血崭。 院中可真熱鬧,春花似錦厘灼、人聲如沸夹纫。這莊子的主人今日做“春日...
    開封第一講書人閱讀 31,723評論 0 22
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽舰讹。三九已至,卻和暖如春围来,著一層夾襖步出監(jiān)牢的瞬間跺涤,已是汗流浹背。 一陣腳步聲響...
    開封第一講書人閱讀 32,858評論 1 269
  • 我被黑心中介騙來泰國打工监透, 沒想到剛下飛機(jī)就差點兒被人妖公主榨干…… 1. 我叫王不留桶错,地道東北人。 一個月前我還...
    沈念sama閱讀 47,891評論 2 370
  • 正文 我出身青樓胀蛮,卻偏偏與公主長得像院刁,于是被迫代替她去往敵國和親。 傳聞我的和親對象是個殘疾皇子粪狼,可洞房花燭夜當(dāng)晚...
    茶點故事閱讀 44,713評論 2 354

推薦閱讀更多精彩內(nèi)容