一致性聚類是一種提供定量證據(jù)的方法伞广,用以確定數(shù)據(jù)集中可能的聚類的成員及數(shù)量轩拨,例如微陣列基因表達(dá)。這種方法在癌癥基因組學(xué)中得到了廣泛應(yīng)用驶赏,以發(fā)現(xiàn)了新的疾病分子亞類。
一致性聚類方法包括從一組項目(例如微陣列)中進(jìn)行二次抽樣既鞠,并確定聚類計數(shù)(k)的簇煤傍。然后,計算成對共識值嘱蛋,即兩個項目在同一子樣本中出現(xiàn)的次數(shù)中占據(jù)同一簇的比例蚯姆,并將其存儲在每個 k 的對稱共識矩陣中。
1. 準(zhǔn)備輸入數(shù)據(jù)
# 安裝并加載所需的R包
# BiocManager::install("ConsensusClusterPlus")
library(ConsensusClusterPlus)
# 輸入數(shù)據(jù)格式是一個矩陣浑槽,其中列是樣本(項目)蒋失,行是特征,單元格是數(shù)值(此處展示TCGA-STAD的FPKM數(shù)據(jù))
STAD_tumor[1:5,1:5]
## TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13
## 5_8S_rRNA 0.0866 0.4991 1.0460 0.1388 0.0965
## 5S_rRNA 0.0000 0.2676 2.6896 0.3327 0.1393
## 7SK 0.3533 0.2546 1.2531 0.2813 0.0000
## A1BG 0.0197 0.0358 0.1067 0.1153 0.0406
## A1BG-AS1 0.0546 0.2324 0.3320 0.1367 0.0525
# 為了選擇信息最豐富的基因進(jìn)行類檢測桐玻,將數(shù)據(jù)集減少到前 5,000 個可變性最大的基因(通過中值絕對偏差 - MAD來衡量)
mads = apply(STAD_tumor, 1, mad) # MAD測度
STAD_tumor = STAD_tumor[rev(order(mads))[1:5000], ] # 提取前5000個基因
# 歸一化操作篙挽,sweep是一個循環(huán)函數(shù),首先用apply計算每行的中值镊靴,然后用每個基因在樣本中的表達(dá)值減中值
STAD_tumor = sweep(STAD_tumor, 1, apply(STAD_tumor, 1, median, na.rm = T))
STAD_tumor[1:5,1:5]
## TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13
## 5_8S_rRNA -0.04830 0.36420 0.91110 0.00390 -0.03840
## 5S_rRNA -0.36695 -0.09935 2.32265 -0.03425 -0.22765
## 7SK 0.24415 0.14545 1.14395 0.17215 -0.10915
## A1BG -0.00930 0.00680 0.07770 0.08630 0.01160
## A1BG-AS1 -0.05605 0.12175 0.22135 0.02605 -0.05815
2. 運(yùn)行ConsensusClusterPlus
主要包含了以下幾個參數(shù):
1)pItem, 樣品的抽樣比例铣卡,如 pItem=0.8 表示選擇80%的樣本進(jìn)行重復(fù)抽樣;
2)pfeature, Feature的抽樣比例偏竟;
3)maxK, 聚類結(jié)果中分類的最大的K值煮落,必須是整數(shù);
4)reps, 重復(fù)抽樣的數(shù)目踊谋;
5)clusterAlg, 使用的聚類算法蝉仇,“hc”用于層次聚類,“pam”用于PAM(Partioning Around Medoids)算法殖蚕,“km”用于K-Means算法轿衔,也可以自定義函數(shù);
6)distance, 計算距離的方法睦疫,有pearson害驹、spearman、euclidean蛤育、binary宛官、maximum、canberra瓦糕、minkowski;
7)title, 輸出結(jié)果的文件夾名字底洗,包含了輸出的圖片;
8)seed, 隨機(jī)種子咕娄,用于重復(fù)結(jié)果
9)tmyPal枷恕,指定一致性矩陣使用的顏色,默認(rèn)使用白-藍(lán)色
10)plot谭胚,不設(shè)置時圖片結(jié)果僅輸出到屏幕徐块,也可以設(shè)置輸出為'pdf', 'png', 'pngBMP'
11)writeTable,若為TRUE灾而,則將一致性矩陣胡控、ICL、log輸出到CSV文件
title=tempdir()
results = ConsensusClusterPlus(STAD_tumor, maxK = 20, reps = 1000, pItem = 0.8, pFeature = 1, title = "untitled_consensus_cluster", clusterAlg = "hc",
distance = "pearson", seed = 1262118388.71279, tmyPal=NULL, writeTable=FALSE, plot = "png")
## end fraction
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
## clustered
# ConsensusClusterPlus 的輸出是一個列表旁趟,其中列表的元素對應(yīng)于第 k 個集群的結(jié)果昼激,例如 results[[8]] 就是 k =8 的結(jié)果 result。
str(results[[8]])
## List of 5
## $ consensusMatrix: num [1:412, 1:412] 1 0.00156 0.1442 0 0.00158 ...
## $ consensusTree :List of 7
## ..$ merge : int [1:411, 1:2] -1 -2 -30 -42 -43 -49 -50 -51 -59 -63 ...
## ..$ height : num [1:411] 0 0 0 0 0 0 0 0 0 0 ...
## ..$ order : int [1:412] 410 258 251 330 22 175 191 241 252 382 ...
## ..$ labels : NULL
## ..$ method : chr "average"
## ..$ call : language hclust(d = as.dist(1 - fm), method = finalLinkage)
## ..$ dist.method: NULL
## ..- attr(*, "class")= chr "hclust"
## $ consensusClass : Named int [1:412] 1 2 3 4 2 3 3 2 3 3 ...
## ..- attr(*, "names")= chr [1:412] "TCGA-BR-A4J4-01A-12R-A251-31" "TCGA-BR-A4IZ-01A-32R-A251-31" "TCGA-RD-A7C1-01A-11R-A32D-31" ## "TCGA-BR-6852-01A-11R-1884-13" ...
## $ ml : num [1:412, 1:412] 1 0.00156 0.1442 0 0.00158 ...
## $ clrs :List of 3
## ..$ : chr [1:412] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" ...
## ..$ : num 8
## ..$ : chr [1:8] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" ...
results[[8]][["consensusMatrix"]][1:5,1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000
# 樣本的聚類樹
results[[8]][["consensusTree"]]
##
## Call:
## hclust(d = as.dist(1 - fm), method = finalLinkage)
##
## Cluster method : average
## Number of objects: 412
# consensusClass锡搜, 樣本的聚類結(jié)果
length(results[[8]][["consensusClass"]])
## [1] 412
results[[8]][["consensusClass"]][1:5]
## TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13
## 1 2 3 4 2
# ml, 就是consensusMatrix
results[[8]][["ml"]][1:5,1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000
results[[8]][["consensusMatrix"]][1:5,1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000
# clrs, 顏色
results[[8]][["clrs"]]
## [[1]]
## [1] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#FF7F00" "#FF7F00" "#FF7F00" "#1F78B4"
## [17] "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#E31A1C" "#A6CEE3" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00"
## [33] "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#33A02C" "#1F78B4" "#1F78B4"
## [49] "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FF7F00" "#FB9A99" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#33A02C"
## [65] "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FB9A99"
## [81] "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4"
## [97] "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#A6CEE3"
## [113] "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#FB9A99"
## [129] "#1F78B4" "#FF7F00" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3"
## [145] "#FF7F00" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4"
## [161] "#A6CEE3" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#B2DF8A" "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#E31A1C" "#1F78B4"
## [177] "#FF7F00" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FDBF6F" "#1F78B4"
## [193] "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FB9A99" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3"
## [209] "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FF7F00" "#1F78B4" "#B2DF8A"
## [225] "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4"
## [241] "#FDBF6F" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#E31A1C" "#FDBF6F" "#1F78B4" "#1F78B4" "#B2DF8A" "#FF7F00"
## [257] "#FF7F00" "#E31A1C" "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#33A02C" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3"
## [273] "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FB9A99" "#1F78B4"
## [289] "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#33A02C" "#1F78B4" "#B2DF8A" "#FF7F00" "#FF7F00" "#FF7F00"
## [305] "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4"
## [321] "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#E31A1C" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4"
## [337] "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#B2DF8A" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#33A02C"
## [353] "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#B2DF8A" "#1F78B4" "#A6CEE3" "#B2DF8A" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00"
## [369] "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#33A02C" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FDBF6F" "#1F78B4" "#1F78B4"
## [385] "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3"
## [401] "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#B2DF8A" "#1F78B4" "#1F78B4" "#FF7F00" "#E31A1C" "#1F78B4" "#FF7F00"
##
## [[2]]
## [1] 8
##
## [[3]]
## [1] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" "#E31A1C" "#33A02C" "#FB9A99" "#FDBF6F"
★ 官網(wǎng)推薦橙困,在實際運(yùn)行中,推薦reps設(shè)置的更大耕餐,比如1000凡傅, maxK設(shè)置的更大,比如20
3. 計算聚類一致性 (cluster-consensus) 和樣品一致性 (item-consensus)
主要包含了以下幾個參數(shù):
1)title肠缔,設(shè)置生成的文件的路徑
2)plot夏跷,不設(shè)置時圖片結(jié)果僅輸出到屏幕,也可以設(shè)置輸出為'pdf', 'png', 'pngBMP' 明未。
3)writeTable槽华,若為TRUE,則將一致性矩陣趟妥、ICL猫态、log輸出到CSV文件
icl = calcICL(results, title = "consensus_cluster", plot = "png")
dim(icl[["clusterConsensus"]])
icl[["clusterConsensus"]][1:5,]
## k cluster clusterConsensus
## [1,] 2 1 0.8236517
## [2,] 2 2 0.9204836
## [3,] 3 1 0.6659016
## [4,] 3 2 0.7503082
## [5,] 3 3 0.8561029
dim(icl[["itemConsensus"]])
icl[["itemConsensus"]][1:5,]
## k cluster item itemConsensus
## 1 2 1 TCGA-ZQ-A9CR-01A-11R-A39E-31 0.2776996
## 2 2 1 TCGA-BR-6563-01A-13R-2055-13 0.2786404
## 3 2 1 TCGA-IN-7808-01A-11R-2203-13 0.2941890
## 4 2 1 TCGA-VQ-A8P8-01A-11R-A39E-31 0.2783686
## 5 2 1 TCGA-HU-A4HB-01A-12R-A251-31 0.2714451
4. 聚類結(jié)果展示
4.1 矩陣熱圖
? 一致性矩陣的行和列表示的都是樣本,一致性矩陣的值按從0(不可能聚類在一起)到1(總是聚類在一起)用白色到深藍(lán)色表示
? 一致性矩陣按照一致性聚類(熱圖上方的樹狀圖)來排列披摄。樹狀圖和熱圖之間的彩色矩形即分出來的類別
4.2 一致性累積分布函數(shù)(CDF)圖
? 此圖展示了k取不同數(shù)值時(用顏色表示)的一致性矩陣的累積分布函數(shù)亲雪,用于判斷當(dāng)k取何值時,CDF達(dá)到一個近似最大值行疏,此時一致性和簇置信度在達(dá)到最大匆光,聚類分析結(jié)果最可靠
4.3 Delta Area Plot
? 此圖展示了 k 和 k-1 相比CDF曲線下面積的相對變化。當(dāng)k = 2時酿联,因為沒有k=1终息,所以第一個點表示的是k=2時CDF曲線下總面積,而當(dāng)k = 10 時贞让,曲線下面積趨近于平穩(wěn)周崭,所以10為最合適的k值。
4.4 Tracking Plot
? 此圖下方的黑色條紋表示樣品喳张,展示的是樣品在k取不同的值時续镇,歸屬的分類情況,不同顏色的色塊代表不同的分類销部。取不同k值前后經(jīng)常改變顏色分類的樣品代表其分類不穩(wěn)定摸航。若分類不穩(wěn)定的樣本太多制跟,則說明該k值下的分類不穩(wěn)定。
4.5 item-Consensus Plot(樣本一致性圖)
? 縱坐標(biāo)代表Item-consensus values酱虎。k值不同時雨膨,每個樣本都會有一個對應(yīng)不同簇的item-consensus values。豎條代表每一個樣本读串,豎條按從下到上遞增的值進(jìn)行排序聊记,高度代表該樣本的總item-consensus values。每個樣本的頂部都有一個小矩形恢暖,小矩形的顏色代表該樣本被分到了哪一簇排监。
? 該圖提供了給定k下所有其他集群的樣本一致性圖,可以查看樣本是否是集群中非辰芪妫“純粹”的成員舆床,或者它是否與多個集群共享高度一致性(多種顏色的列中的大矩形),表明它是不穩(wěn)定或“不純粹”的成員琼娘。 這些值可用于選擇“核心”樣本峭弟,它們高度代表集群。 此外脱拼,也可以幫助決策簇數(shù)量
4.6 Cluster-Consensus Plot(聚類一致性圖 )
? 此圖展示的是不同k值下瞒瘸,每個分類的cluster-consensus value(該簇中成員pairwise consensus values的均值)。該值越高(低)代表穩(wěn)定性越高(低)熄浓∏槌簦可用于判斷同一k值下以及不同k值之間cluster-consensus value的高低
5. 最佳的K值的選擇
通過上面的結(jié)果解讀我們可以選取出初步分類的亞組的個數(shù),但是這種方法有時候帶有主觀性赌蔑,也可以用PAC法確定最佳聚類數(shù)目
# 面積最小值對應(yīng)K為最佳K
Kvec = 2:20
x1 = 0.1; x2 = 0.9 # threshold defining the intermediate sub-interval
PAC = rep(NA,length(Kvec))
names(PAC) = paste("K=",Kvec,sep="") # from 2 to maxK
for(i in Kvec){
M = results[[i]]$consensusMatrix
Fn = ecdf(M[lower.tri(M)]) # M 為計算出共識矩陣
PAC[i-1] = Fn(x2) - Fn(x1)
}
optK = Kvec[which.min(PAC)] # 理想的K值
# 重新繪制熱圖(以k=7為例)
## 選取自己感興趣的基因集
intersection_geneExp[1:3,1:3]
## TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31
## ALOX12B 0.5224 0.0031 0.6993
## GOT1 20.0073 9.3511 39.1501
## ALOX15B 0.2185 0.1652 1.3996
annCol <- data.frame(results = paste0("Cluster",
results[[7]][['consensusClass']]),
row.names = colnames(intersection_geneExp))
head(annCol)
## results
## TCGA-BR-A4J4-01A-12R-A251-31 Cluster1
## TCGA-BR-A4IZ-01A-32R-A251-31 Cluster1
## TCGA-RD-A7C1-01A-11R-A32D-31 Cluster2
## TCGA-BR-6852-01A-11R-1884-13 Cluster2
## TCGA-BR-7851-01A-11R-2203-13 Cluster2
## TCGA-BR-A4J1-01A-11R-A251-31 Cluster3
library(RColorBrewer)
library(pheatmap)
mycol <- brewer.pal(7, "Set3")
annColors <- list(results = c("Cluster1" = mycol[1],
"Cluster2" = mycol[2],
"Cluster3" = mycol[3],
"Cluster4" = mycol[4],
"Cluster5" = mycol[5],
"Cluster6" = mycol[6],
"Cluster7"= mycol[7]))
heatdata <- results[[7]][["consensusMatrix"]]
dimnames(heatdata) <- list(colnames(intersection_geneExp), colnames(intersection_geneExp))
heatdata[1:3,1:3]
## TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31
## TCGA-BR-A4J4-01A-12R-A251-31 1.0000000 0.6542056 0
## TCGA-BR-A4IZ-01A-32R-A251-31 0.6542056 1.0000000 0
## TCGA-RD-A7C1-01A-11R-A32D-31 0.0000000 0.0000000 1
# 繪制熱圖
result <- pheatmap(mat = heatdata,
color = colorRampPalette((c("white","steelblue")))(100),
border_color = NA,
annotation_col = annCol,
annotation_colors = annColors,
show_colnames = F,
show_rownames = F)
6. 提取分型結(jié)果
library(tidyverse)
Cluster_res <- annCol %>% as.data.frame %>% rownames_to_column("patient_ID")
table(Cluster_res$results)
##
## Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7
## 132 94 37 57 38 48 6
#合并預(yù)后數(shù)據(jù)進(jìn)行生存分析
STAD_clinical <- read_tsv("STAD_clinical.txt")
STAD_clinical$time <- ifelse(STAD_clinical$vital_status=='Alive', STAD_clinical$days_to_last_follow_up, STAD_clinical$days_to_death) # 如果患者已經(jīng)死亡則選擇days_to_death俯在,如果患者處于活著狀態(tài),則選擇days_to_last_follow_up
STAD_clinical$status <- ifelse(STAD_clinical$vital_status=='Alive', 0, 1)
Cluster_res <- Cluster_res %>% inner_join(STAD_clinical, "patient_ID")
head(Cluster_res)
## patient_ID results age gender tissue_or_organ_of_origin primary_diagnosis vital_status ajcc_pathologic_stage year_of_birth days_to_last_follow_up year_of_death days_to_death time status
## 1 TCGA-BR-A4J4-01A-12R-A251-31 Cluster1 39 male Gastric antrum Adenocarcinoma, NOS Alive Stage IIIB 1973 16 NA NA 16 0
## 2 TCGA-BR-A4IZ-01A-32R-A251-31 Cluster1 45 female Gastric antrum Carcinoma, diffuse type Dead Stage IIIB 1967 24 NA 273 273 1
## 3 TCGA-RD-A7C1-01A-11R-A32D-31 Cluster2 82 male Fundus of stomach Carcinoma, diffuse type Dead Stage IB 1923 NA 2006 507 507 1
## 4 TCGA-BR-6852-01A-11R-1884-13 Cluster2 64 female Cardia, NOS Adenocarcinoma, NOS Alive Stage IIA 1947 1367 NA NA 1367 0
## 5 TCGA-BR-7851-01A-11R-2203-13 Cluster2 74 male Body of stomach Adenocarcinoma, intestinal type Dead Stage IIB 1937 574 NA NA NA 1
## 6 TCGA-BR-A4J1-01A-11R-A251-31 Cluster2 63 male Gastric antrum Adenocarcinoma, NOS Dead Stage IIIA 1949 NA 2012 22 22 1
#進(jìn)行生存分析
library(survival)
library(survminer)
fit <- survfit(Surv(time, status) ~ results, data = Cluster_res)
ggsurvplot(fit, data = Cluster_res,
main = "Survival curve", # 添加標(biāo)題
surv.median.line = "hv", # 添加中位數(shù)生存時間線
font.main = c(16, "bold", "darkblue"), # 設(shè)置標(biāo)題字體大小娃惯、格式和顏色
font.x = c(14, "bold.italic", "red"), # 設(shè)置x軸字體大小跷乐、格式和顏色
font.y = c(14, "bold.italic", "darkred"), # 設(shè)置y軸字體大小、格式和顏色
font.tickslab = c(12, "plain", "darkgreen"), # 設(shè)置坐標(biāo)軸刻度字體大小趾浅、格式和顏色
legend = c(0.9, 0.15), # 通過坐標(biāo)指定圖例位置
legend.title = "Cluster", legend.labs = c("Cluster1","Cluster2","Cluster3","Cluster4","Cluster5","Cluster6","Cluster7"), # 更改圖例標(biāo)題和標(biāo)簽
size = 1, # 改變線條大小
linetype = "strata", # 按組更改線型
break.time.by = 250, # 將時間軸以250作為打斷
# palette = "Dark2" # 使用 brewer 調(diào)色板“Dark2”
palette = c('#E5D2DD', '#53A85F', '#F1BB72', '#F3B1A0', '#D6E7A3', '#57C3F3', '#476D87'), # 自定義調(diào)色板
ggtheme = theme_bw(), # 修改主題
# conf.int = TRUE, # 添加置信區(qū)間
pval = TRUE, # 添加p值
risk.table = TRUE, risk.table.y.text.col = TRUE, # 添加風(fēng)險表,并按層更改風(fēng)險表y 文本顏色
tables.height = 0.2, # 設(shè)置風(fēng)險表的高度
tables.theme = theme_cleantable(), # 設(shè)置風(fēng)險表的主題
xlim = c(0, 1050) # 更改 x 軸范圍
)
7. tSNE分析
# 將數(shù)據(jù)框中字符型數(shù)據(jù)轉(zhuǎn)化為數(shù)值型
intersection_tumor <- as.data.frame(lapply(intersection_geneExp, as.numeric))
row.names(intersection_tumor) <- row.names(intersection_geneExp)
colnames(intersection_tumor) <- colnames(tumor_matrix)
intersection_tumor = na.omit(intersection_tumor)
intersection_tumor <- as.matrix(intersection_tumor)
library(Rtsne)
tSNE_res<- Rtsne(t(intersection_tumor), dims= 2, perplexity= 10, verbose= F, max_iter= 500, check_duplicates= F)
tsne<- data.frame(tSNE1 = tSNE_res[["Y"]][,1], tSNE2= tSNE_res[["Y"]][,2], cluster = annCol$results)
ggplot(tsne, aes(x= tSNE1, y= tSNE2, color= cluster)) +
geom_point(size= 4.5, alpha= 0.5) +
stat_ellipse(level= 0.85, show.legend = F) +
theme(legend.position= "top")