這里以食管癌(Esophageal carcinoma,ESCA)為例
首先使用R包TCGAbiolinks下載ESCA的數(shù)據(jù)
library(TCGAbiolinks)
query.esca <- GDCquery(project = "TCGA-ESCA",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "STAR - Counts")
GDCdownload(query.esca)
上述代碼運(yùn)行完畢后,會在你的當(dāng)前路徑下創(chuàng)建一個(gè)GDCdata文件夾,然后并會自動連接TCGA網(wǎng)站進(jìn)行數(shù)據(jù)的下載
# 合并所有樣本
esca <- GDCprepare(query.esca)
# define ESCC and ESAD
table(esca$primary_diagnosis)
esca$tumor_type <- factor(esca$primary_diagnosis,
levels = c('Adenocarcinoma, NOS','Basaloid squamous cell carcinoma','Mucinous adenocarcinoma',
'Squamous cell carcinoma, keratinizing, NOS','Squamous cell carcinoma, NOS','Tubular adenocarcinoma'),
labels = c('ESAD','ESCC','ESAD','ESCC','ESCC','ESAD')) %>% as.character()
table(esca$tumor_type)
提取ESCC數(shù)據(jù)的TPM表達(dá)矩陣怒医,并用TCGAbiolinks包自帶的TCGAanalyze_survival()
函數(shù)進(jìn)行生存分析
# 提取ESCC數(shù)據(jù)
escc <- esca[,which(esca$tumor_type == 'ESCC')]
# 提取TPM表達(dá)矩陣
tpm <- escc@assays@data$tpm_unstrand
dimnames(tpm) <- list(escc@rowRanges$gene_name,escc@colData@rownames)
# 按gender進(jìn)行生存分析
TCGAanalyze_survival(esca@colData,clusterCol = 'gender')
# 按CREBBP基因表達(dá)高低進(jìn)行生存分析
escc$CREBBP_exp <- 'CREBBP_high'
escc$CREBBP_exp[which(tpm['CREBBP',] < median(tpm['CREBBP',]))] <- 'CREBBP_low'
TCGAanalyze_survival(escc@colData,clusterCol = 'CREBBP_exp')
# SIRT7
escc$SIRT7_exp <- 'SIRT7_high'
escc$SIRT7_exp[which(tpm['SIRT7',] < median(tpm['SIRT7',]))] <- 'SIRT7_low'
TCGAanalyze_survival(escc@colData,clusterCol = 'SIRT7_exp')
參考
拓展閱讀:http://www.reibang.com/p/fd5e06ec260b
數(shù)據(jù)手動下載方法:https://zhuanlan.zhihu.com/p/563936447