腫瘤突變負(fù)荷(Tumor Mutational Burden, TMB)育谬,指腫瘤細(xì)胞基因組中,基因的外顯子編碼區(qū)每兆堿基中發(fā)生置換和插入或缺失突變的總數(shù)。從理論上來(lái)說(shuō)囱桨,高TMB(High Tumor Mutational Burden, TMB-H)的腫瘤患者,具有獲得更多新生抗原的潛力嗅绰,并且與腫瘤內(nèi)異質(zhì)性有關(guān)舍肠,能增強(qiáng)腫瘤免疫原性以及與ICI的反應(yīng)。
要計(jì)算樣本的TMB值窘面,必須先獲取樣本的突變數(shù)據(jù)翠语,TCGA 的突變流程有 4 種,分別是:muse, varscan2, somaticsniper, mutect2财边。其中 muse 和 somaticsniper 只能計(jì)算點(diǎn)突變肌括,無(wú)法識(shí)別 indel。而目前新版的TCGA不能直接下載4種制作好的maf文件酣难,可通過(guò) "TCGAbiolinks" 整理
1. 使用包整理突變數(shù)據(jù)
1.1 MAF文件的下載
# 安裝并加載所需的R包
library(TCGAbiolinks)
query <- GDCquery(
project = "TCGA-STAD",
data.category = "Simple Nucleotide Variation",
data.type = "Masked Somatic Mutation",
access = "open"
)
GDCdownload(query)
GDCprepare(query, save = T,save.filename = "TCGA-STAD_SNP.Rdata") # 這里的Rdata文件是一個(gè)數(shù)據(jù)框们童,可直接用maftools讀取使用
1.2 maftools讀取處理MAF文件
library(maftools)
load(file = "TCGA-COAD_SNP.Rdata")
maf.stad <- data
## 查看數(shù)據(jù)
class(maf.stad)
# [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
dim(maf.stad)
# [1] 183107 140
maf.stad[1:5, 1:10]
# # A tibble: 5 × 10
# Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type
# <chr> <int> <chr> <chr> <chr> <int> <int> <chr> <chr> <chr>
# 1 NEXN 91624 BI GRCh38 chr1 77929416 77929416 + Missense_Mutation SNP
# 2 COL24A1 255631 BI GRCh38 chr1 86125858 86125858 + Missense_Mutation SNP
# 3 LRRC8B 23507 BI GRCh38 chr1 89593023 89593023 + Frame_Shift_Del DEL
# 4 BPNT1 10380 BI GRCh38 chr1 220069429 220069429 + Missense_Mutation SNP
# 5 KCNF1 3754 BI GRCh38 chr2 10913244 10913244 + Missense_Mutation SNP
maf <- read.maf(maf.stad)
# -Validating
# -Silent variants: 45460
# -Summarizing
# --Possible FLAGS among top ten genes:
# TTN
# MUC16
# SYNE1
# FLG
# -Processing clinical data
# --Missing clinical data
# -Finished in 8.084s elapsed (23.5s cpu)
1.3 可視化
# 繪制MAF文件的整體結(jié)果圖
plotmafSummary(maf = maf, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE)
★ 錯(cuò)義突變和移碼插入突變較多;C>T 的點(diǎn)突變類型是最多的鲸鹦;TTN 和 MUC16 兩個(gè)基因的突變次數(shù)最多
# 繪制oncoplot圖(只展示前 15 個(gè)基因)
vc_cols <- RColorBrewer::brewer.pal(n = 8, name = 'Paired')
names(vc_cols) <- c(
'Frame_Shift_Del',
'Missense_Mutation',
'Nonsense_Mutation',
'Multi_Hit',
'Frame_Shift_Ins',
'In_Frame_Ins',
'Splice_Site',
'In_Frame_Del'
)
oncoplot(maf = maf, colors = vc_cols, top = 15)
# 使用 somaticInteractions 函數(shù)檢測(cè)互斥突變或同時(shí)突變的基因
somaticInteractions(maf = maf, top = 25, pvalue = c(0.05, 0.1))
★ 上方的條形圖展示的是樣本的突變負(fù)荷(若有樣本的突變負(fù)荷特別高慧库,在后續(xù)分析時(shí)可考慮是否將其作為離群點(diǎn)去除),右側(cè)的直方圖展示了基因的每種突變類型的情況
★ 突變率靠前的基因中馋嗜,同時(shí)出現(xiàn)突變的基因?qū)ζ?/p>
# 繪制Oncostrip(展示特定基因在樣本中的突變情況)
oncostrip(maf = maf, genes = c("PTGS2","NNMT","CA9"))
# 繪制Transitions Transversions匯總圖
laml.titv = titv(maf = maf, plot = FALSE, useSyn = TRUE)
plotTiTv(res = laml.titv)
★ 轉(zhuǎn)換和顛換圖 (Transition and transversion plot, Ti/Tv) 顯示了STAD中SNV的分布齐板,分為 6 個(gè)轉(zhuǎn)換和顛換事件。堆積條形圖(底部)顯示了MAF文件中每個(gè)樣本的突變譜分布
2. 計(jì)算 TMB
stad.tmb <- tmb(maf, captureSize = 38, logScale = T)
dim(stad.tmb)
# [1] 431 6
head(stad.tmb)
# Tumor_Sample_Barcode total total_perMB total_perMB_log group Tumor_Sample_ID
# 1: TCGA-RD-A8N4-01A-21D-A364-08 1 0.02631579 -1.5797836 TMB_low TCGA-RD-A8N4
# 2: TCGA-HU-A4GJ-01A-11D-A25D-08 2 0.05263158 -1.2787536 TMB_low TCGA-HU-A4GJ
# 3: TCGA-RD-A8N0-01A-12D-A364-08 2 0.05263158 -1.2787536 TMB_low TCGA-RD-A8N0
# 4: TCGA-FP-8210-01A-11D-2340-08 3 0.07894737 -1.1026623 TMB_low TCGA-FP-8210
# 5: TCGA-VQ-A8PQ-01A-11D-A410-08 5 0.13157895 -0.8808136 TMB_low TCGA-VQ-A8PQ
# 6: TCGA-BR-A44T-01A-32D-A24D-08 7 0.18421053 -0.7346856 TMB_low TCGA-BR-A44T
# 根據(jù)TMB平均值進(jìn)行分組
library(dplyr)
stad.tmb <- stad.tmb %>% mutate(group = if_else(total_perMB_log > mean(total_perMB_log), "TMB_high","TMB_low"))
3. 結(jié)合臨床數(shù)據(jù)進(jìn)行生存分析
# 只取前12位患者編號(hào)
stad.tmb$patient <- substr(stad.tmb$Tumor_Sample_Barcode, 1, 12)
# 加載自身臨床數(shù)據(jù)
clin_info[1:4,1:4]
# patient entity_submitter_id status time
# 1 TCGA-BR-A4J4 TCGA-BR-A4J4-01A-12R-A251-31 0 16
# 2 TCGA-BR-A4IZ TCGA-BR-A4IZ-01A-32R-A251-31 1 273
# 3 TCGA-RD-A7C1 TCGA-RD-A7C1-01A-11R-A32D-31 1 507
# 4 TCGA-BR-6852 TCGA-BR-6852-01A-11R-1884-13 0 1367
TMB <- merge(clin, stad.tmb, by = "patient")
fit <- survfit(Surv(time, status) ~ group, data = TMB)
ggsurvplot(fit, data = TMB,
pval = T,
risk.table = T,
surv.median.line = "hv", #添加中位生存曲線
palette = c("red", "blue"), #更改線的顏色
legend.labs = c("High risk","Low risk"), #標(biāo)簽
legend.title = "TMB Score",
title = "Overall survival", #標(biāo)題
ylab = "Cumulative survival (percentage)", xlab = " Time (Days)", #更改橫縱坐標(biāo)
censor.shape = 124, censor.size = 2, conf.int = FALSE, #刪失點(diǎn)的形狀和大小
break.x.by = 500 #橫坐標(biāo)間隔
)