Preface
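I used to download data with the RTCGA package, but its data never gets updated, which always bothered me. So I decided to properly learn a better package: TCGAbiolinks. It queries the GDC (Genomic Data Commons) API directly, so it should return the current GDC data release.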
Main reference: TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data
Downloading the data
Straight to the code:
# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("TCGAbiolinks")
library(TCGAbiolinks)
library(dplyr)
library(DT)
library(SummarizedExperiment)
# TCGA project codes of the cancer types to download
request_cancer <- c("PRAD", "BLCA", "KICH", "KIRC", "KIRP")
for (i in request_cancer) {
  cancer_type <- paste("TCGA", i, sep = "-")
  print(cancer_type)

  # Download clinical data
  clinical <- GDCquery_clinic(project = cancer_type, type = "clinical")
  write.csv(clinical, file = paste(cancer_type, "clinical.csv", sep = "-"))

  # Download RNA-seq counts (GDCprepare returns a SummarizedExperiment)
  # Note: "HTSeq - Counts" was retired in GDC data release 32;
  # current releases provide workflow.type = "STAR - Counts" instead
  query <- GDCquery(project = cancer_type,
                    data.category = "Transcriptome Profiling",
                    data.type = "Gene Expression Quantification",
                    workflow.type = "HTSeq - Counts")
  GDCdownload(query, method = "api", files.per.chunk = 100)
  expdat <- GDCprepare(query = query)
  count_matrix <- assay(expdat)
  write.csv(count_matrix, file = paste(cancer_type, "Counts.csv", sep = "-"))

  # Download miRNA data (GDCprepare returns a plain data frame here,
  # not a SummarizedExperiment, so it is written out directly)
  query <- GDCquery(project = cancer_type,
                    data.category = "Transcriptome Profiling",
                    data.type = "miRNA Expression Quantification",
                    workflow.type = "BCGSC miRNA Profiling")
  GDCdownload(query, method = "api", files.per.chunk = 50)
  mirna <- GDCprepare(query = query)
  write.csv(mirna, file = paste(cancer_type, "miRNA.csv", sep = "-"))

  # Download copy number variation segments (also returned as a data frame)
  query <- GDCquery(project = cancer_type,
                    data.category = "Copy Number Variation",
                    data.type = "Copy Number Segment")
  GDCdownload(query, method = "api", files.per.chunk = 50)
  cnv <- GDCprepare(query = query)
  write.csv(cnv, file = paste(cancer_type, "Copy-Number-Variation.csv", sep = "-"))

  # Download DNA methylation (GDC Legacy Archive, 450k array only so that
  # platforms are not mixed; newer TCGAbiolinks versions dropped `legacy`)
  query.met <- GDCquery(project = cancer_type,
                        legacy = TRUE,
                        data.category = "DNA methylation",
                        platform = "Illumina Human Methylation 450")
  GDCdownload(query.met, method = "api", files.per.chunk = 300)
  met <- GDCprepare(query = query.met)  # was `query`: prepare the methylation query
  write.csv(assay(met), file = paste(cancer_type, "methylation.csv", sep = "-"))
}
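One fragile spot in the loop above is that GDCprepare() does not return the same kind of object for every data type: gene expression and methylation queries give a SummarizedExperiment, while miRNA and copy number segment queries give a plain data frame, so calling assay() on every result fails. In addition, a single failed query (for example, a data type with no files for one project) stops the whole loop. The sketch below handles both with two small helpers. The names save_prepared() and safe_step() are hypothetical, not part of TCGAbiolinks; the sketch assumes only base R, plus the SummarizedExperiment package when an SE object is actually passed in.

```r
# Hypothetical helpers, not part of TCGAbiolinks.

# Write a GDCprepare() result to CSV whatever its class:
# SummarizedExperiment -> its first assay matrix;
# anything else (data frame / tibble) -> written as-is.
save_prepared <- function(x, file) {
  if (methods::is(x, "SummarizedExperiment")) {
    x <- SummarizedExperiment::assay(x)
  }
  utils::write.csv(x, file = file)
  invisible(file)
}

# Run one download step; on error, report it and return NULL
# so the loop can continue with the next data type or project.
safe_step <- function(label, step) {
  tryCatch(
    step(),
    error = function(e) {
      message("Skipping ", label, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Usage inside the loop (needs network access and TCGAbiolinks):
# mirna <- safe_step(paste(cancer_type, "miRNA"), function() {
#   q <- GDCquery(project = cancer_type,
#                 data.category = "Transcriptome Profiling",
#                 data.type = "miRNA Expression Quantification",
#                 workflow.type = "BCGSC miRNA Profiling")
#   GDCdownload(q, method = "api", files.per.chunk = 50)
#   GDCprepare(q)
# })
# if (!is.null(mirna)) save_prepared(mirna, paste(cancer_type, "miRNA.csv", sep = "-"))
```

Passing each step as a function (rather than an expression) keeps the download code unevaluated until safe_step() is ready to catch its errors.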
That covers most of the commonly used data types; all the CSV files are saved in the current working directory.
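To load a saved file back into R later, note two things: write.csv() stored the row names (gene, miRNA or probe IDs) as an unnamed first column, and the column names are TCGA barcodes containing hyphens. By default read.csv() sets check.names = TRUE, which turns those hyphens into dots and makes the barcodes harder to match against the clinical table. A minimal sketch, assuming the file names produced by the loop above; load_tcga_csv() is a hypothetical helper, and the submitter_id column used in the commented usage is an assumption about the GDCquery_clinic() output.

```r
# Hypothetical helper: read a CSV written by write.csv() in the loop,
# using the first column as row names and keeping TCGA barcodes intact.
load_tcga_csv <- function(file, as_matrix = FALSE) {
  x <- utils::read.csv(file, row.names = 1, check.names = FALSE)
  if (as_matrix) x <- as.matrix(x)
  x
}

# Usage, assuming the loop has already written the TCGA-PRAD files:
# counts   <- load_tcga_csv("TCGA-PRAD-Counts.csv", as_matrix = TRUE)
# clinical <- load_tcga_csv("TCGA-PRAD-clinical.csv")
# # The patient ID is the first 12 characters of a sample barcode
# patient <- substr(colnames(counts), 1, 12)
# # Assumes the clinical table has a `submitter_id` column of patient IDs
# table(patient %in% clinical$submitter_id)
```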