一父丰、重要相關(guān)網(wǎng)站
1.http://www.oncolnc.org/
2.https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/
3.https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables
二农渊、重要代碼
1丧诺、生存曲線
rm(list=ls())
options(stringsAsFactors = F)基因|癌種生存分析http://www.oncolnc.org/
a<-read.csv('./BRCA_7157_50_50.csv')
看一下做生存分析需要的數(shù)據(jù)內(nèi)容
head(a)
Patient Days Status Expression Group
1 TCGA-AR-A24H 4894 Alive 193.89 Low
2 TCGA-A2-A0YE 554 Alive 198.99 Low
3 TCGA-AN-A0FL 231 Alive 205.44 Low
4 TCGA-A8-A08B 1156 Alive 219.08 Low
5 TCGA-B6-A0X1 7455 Dead 220.14 Low
6 TCGA-E9-A1RG 647 Alive 230.59 Low
#######理解一下high和low的來(lái)源
tmp<- ifelse(aExpression),'Low','High')
table(tmp==aStatus=='Alive','Status']<-1
a[a$Status=='Dead','Status']<-2a[a$Status=='Alive','Status']<-0
a[a$Status=='Dead','Status']<-1
#######生存分析的步驟
library(survival)通過(guò)Status對(duì)生存時(shí)間進(jìn)行標(biāo)記,+代表的是alive自点,無(wú)+代表的是dead
fit.surv <-Surv(a
Status))
相應(yīng)處理之后得到的結(jié)果為當(dāng)前時(shí)間段的survival rate
km_2<-survfit(fit.surv~Group,data=a)
#########以High為例暇仲;
summary(km_2)
image.png
(469-1)/469
[1] 0.9978678
(469-1)/469*(463-1)/463
[1] 0.9957126
#######對(duì)兩個(gè)分組的survival rate進(jìn)行相應(yīng)的統(tǒng)計(jì)檢驗(yàn)
library(ggplot2)
Warning message:
In dontCheck(fnname) : reached elapsed time limit
library(ggpubr)
library(magrittr)
library(survminer)
surv_pvalue(km_2)
variable pval method pval.txt
1 Group 0.1394986 Log-rank p = 0.14
ggsurvplot (km_2,pval=TRUE,risk.table = T,title='TP53_BRCA_50_50')
ggsurvplot(km_2,palette = c("#E7B800", "#2E9FDF"),
- risk.table =TRUE,pval =TRUE,
- conf.int =TRUE,xlab ="Days",
- ggtheme =theme_light(),
-
ncensor.plot = TRUE)
image.png
2该面、生存曲線代碼理解
library(ggplot2)
rm(list=ls())
options(stringsAsFactors = F)
a<-read.csv('./BRCA_7157_50_50.csv')
a[aStatus=='Dead','Status']<-1
clin_group<- a[,c('Status','Days')]
group <- a$Group
table(group)
group
High Low
503 503
取出對(duì)應(yīng)分組的臨床信息
group_1 <- clin_group[group=='High',]
group_2 <- clin_group[group=='Low',]取出對(duì)應(yīng)分組的對(duì)應(yīng)Status(Alive|Death)信息
group_1_A<- group_1[group_1
Status==1,]
group_2_A<- group_2[group_2Status==1,]
對(duì)2組下的Status信息按照Days進(jìn)行排序
A_2<- group_2_A[order(group_2_A
Days),]
將相同Days的event的計(jì)數(shù)進(jìn)行加和
D_2<- data.frame(Days = as.numeric(D_2
Days)]),
event= as.numeric(by(D_2,D_2$Days, function(x){length(x[,1])})))
將相同Days的censor(alive)的計(jì)數(shù)進(jìn)行加和
A_2<- data.frame(Days = as.numeric(A_2
Days)]),
censor= as.numeric(by(A_2,A_2$Days, function(x){length(x[,1])})))
這一組的總?cè)藬?shù)-event(death)-censor(alive|依據(jù)時(shí)間排序后,
此death對(duì)應(yīng)時(shí)間
之前的alive)y<- function(x){nrow(group_2)-sum(D_2
Days<x[1]])-sum(A_2
Days<x[1]])}
D_2$n.risk<- apply(D_2,1,y)每一個(gè)時(shí)間點(diǎn)的survival=(risk(number of alive)-event)/risk(number of alive)
D_2$step.survival <- apply(D_2,1,function(x){(x[3]-x[2])/x[3]})
截止到此時(shí)間點(diǎn)的survival=此時(shí)間點(diǎn)之前的survival*此時(shí)間點(diǎn)的survival
D_2
step.survival[D_2$Days<=x[1]])})
3.下載brca_data
下載數(shù)據(jù)
########IHC相關(guān)數(shù)據(jù)的讀取
rm(list=ls())
a <- read.csv('nationwidechildrens.org_clinical_patient_brca.txt',sep='\t')
a<- a[3:nrow(a),]
clin<- a[,c('bcr_patient_barcode','her2_status_by_ihc','pr_status_by_ihc','er_status_by_ihc')]
#######三陰和其他進(jìn)行區(qū)分(可能與作者的分組不一致)
TNBC_barcode<-clin[,1][grep('NegativeNegativeNegative',paste0(clin[,2],clin[,3],clin[,4]))]
non_TNBC<-clin[,1][!clin[,1]%in%TNBC_barcode]
表達(dá)矩陣下載翼悴,有多種方法姆钉,見(jiàn)https://www.bilibili.com/video/av49363776?from=search&seid=860607939803005976
GDCquery一般包括3步,GDCquery抄瓦、GDCdownload和GDCprepare
library(TCGAbiolinks)
library(SummarizedExperiment)
確定需要下載的data
GDCquery下載的整理:http://www.reibang.com/p/33bdb5ef7689
query <- GDCquery(project = "TCGA-BRCA",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "HTSeq - Counts",
barcode = TNBC_barcode)
到這的時(shí)候有可能出現(xiàn)報(bào)錯(cuò)
安裝最新版本的TCGAbiolinks 不知道是否可行:
先安裝devtools包
install.packages("devtools")
devtools::install_github("BioinformaticsFMRP/TCGAbiolinks")
依然不行
后面的程序全部無(wú)法運(yùn)行
GDCdownload(query1)
BRC_DATA2<-GDCprepare(query1,save=FALSE)
BRC_NONTNBC <- assay(BRC_DATA2)
表型信息下載
clinic <- GDCquery_clinic(project = "TCGA-BRCA",
type = 'Clinical')
save(BRC_TNBC,BRC_NONTNBC,clinic,file='result01_BRCA.Rdata')
明天繼續(xù)尋找解決辦法潮瓶。