歡迎關(guān)注博客:http://blog.genesino.com/2017/08/Survival-analysis-plots/
準(zhǔn)備工作
#安裝R包
#install.packages(c("survival", "survminer"))
#加載R包
library(survival)
library(survminer)
#survival包里包含的數(shù)據(jù)集
data(package="survival")
#以肺癌數(shù)據(jù)為例考传,顯示數(shù)據(jù)前六行
head(lung)
## inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
## 1 3 306 2 74 1 1 90 100 1175 NA
## 2 3 455 2 68 1 0 90 90 1225 15
## 3 3 1010 1 56 1 0 90 90 NA 15
## 4 5 210 2 57 1 1 90 60 1150 11
## 5 1 883 2 60 1 0 100 90 NA 0
## 6 12 1022 1 74 1 1 50 80 513 0
#查看肺癌數(shù)據(jù)詳細(xì)說明
?lung
#inst: Institution code
#time: Survival time in days
#status: censoring status 1=censored, 2=dead
#age: Age in years
#sex: Male=1 Female=2
#ph.ecog: ECOG performance score (0=good 5=dead)
#ph.karno: Karnofsky performance score (bad=0-good=100) rated by physician
#pat.karno: Karnofsky performance score as rated by patient
#meal.cal: Calories consumed at meals
#wt.loss: Weight loss in last six months
建立生存分析
#創(chuàng)建生存數(shù)據(jù)對象
surv_obj <- with(lung, Surv(time, status))
#用survfit()估計KM生存曲線, 用數(shù)據(jù)集中所有患者作為分析對象
fit1 <- survfit(surv_obj ~ 1)
#顯示壽命表
summary1 <- surv_summary(fit1)
head(summary1)
## time n.risk n.event n.censor surv std.err upper lower
## 1 5 228 1 0 0.9956140 0.004395615 1.0000000 0.9870734
## 2 11 227 3 0 0.9824561 0.008849904 0.9996460 0.9655619
## 3 12 224 1 0 0.9780702 0.009916654 0.9972662 0.9592437
## 4 13 223 2 0 0.9692982 0.011786516 0.9919508 0.9471630
## 5 15 221 1 0 0.9649123 0.012628921 0.9890941 0.9413217
## 6 26 220 1 0 0.9605263 0.013425540 0.9861367 0.9355811
#中位生存期及95%置信區(qū)間
#中位生存期:當(dāng)累積生存率為0.5時所對應(yīng)的生存時間吃型,表示有且只有50%的個體可以活過這個時間.
fit1
## Call: survfit(formula = surv_obj ~ 1)
##
## n events median 0.95LCL 0.95UCL
## 228 165 310 285 363
##25%, 50%, 75%生存期
quantile(fit1, probs=c(0.25, 0.5, 0.75), conf.int=FALSE)
## 25 50 75
## 170 310 550
#50天和100天生存狀況
summary(fit1, times=c(200, 400))
## Call: survfit(formula = surv_obj ~ 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 200 144 72 0.680 0.0311 0.622 0.744
## 400 57 54 0.377 0.0358 0.313 0.454
不同因子生存曲線的比較
例如我們想比較男性和女性肺癌患者生存曲線的差別
#用survfit()估計KM生存曲線, 方程右邊以性別為區(qū)分,分別估計男性和女性的生存曲線
fit2 <- survfit(surv_obj ~ sex, data = lung)
#分別顯示男性和女性的壽命表
summary2 <- surv_summary(fit2)
head(summary2)
## time n.risk n.event n.censor surv std.err upper lower
## 1 11 138 3 0 0.9782609 0.01268978 1.0000000 0.9542301
## 2 12 135 1 0 0.9710145 0.01470747 0.9994124 0.9434235
## 3 13 134 2 0 0.9565217 0.01814885 0.9911586 0.9230952
## 4 15 132 1 0 0.9492754 0.01967768 0.9866017 0.9133612
## 5 26 131 1 0 0.9420290 0.02111708 0.9818365 0.9038355
## 6 30 130 1 0 0.9347826 0.02248469 0.9768989 0.8944820
## strata sex
## 1 sex=1 1
## 2 sex=1 1
## 3 sex=1 1
## 4 sex=1 1
## 5 sex=1 1
## 6 sex=1 1
#分別顯示男性和女性的中位生存期及95%置信區(qū)間
fit2
## Call: survfit(formula = surv_obj ~ sex, data = lung)
##
## n events median 0.95LCL 0.95UCL
## sex=1 138 112 270 212 310
## sex=2 90 53 426 348 550
#用lag-rank test檢驗(yàn)不同組生存曲線的差異
#survdiff()函數(shù)最后一個參數(shù)為rho僚楞,用于指定檢驗(yàn)的類型
#rho=0(默認(rèn))勤晚,進(jìn)行l(wèi)og-rank檢驗(yàn)或Mantel?Haenszelχ2檢驗(yàn),比較各組期望頻數(shù)和實(shí)際觀察數(shù)泉褐。如果兩組間的差異水平太大赐写,χ2會較大而P值較小,表示生存曲線有統(tǒng)計學(xué)差異.
#當(dāng)rho=1時膜赃,進(jìn)行Gehan-Wilcoxon的Peto校正檢驗(yàn)挺邀,該檢驗(yàn)賦予早期終點(diǎn)事件較大的權(quán)重
surv_diff <- survdiff(surv_obj ~ sex, data = lung)
surv_diff
## Call:
## survdiff(formula = surv_obj ~ sex, data = lung)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## sex=1 138 112 91.6 4.55 10.3
## sex=2 90 53 73.4 5.68 10.3
##
## Chisq= 10.3 on 1 degrees of freedom, p= 0.00131
生存曲線可視化
繪制兩條生存曲線
ggsurv <- ggsurvplot(fit2, #survfit object with calculated statistics.
data = lung, #data used to fit survival curves.
main = "Survival curve", #add title
xlab = "Time (Days)", #change the x axis label.
font.main = c(16, "bold", "darkblue"), #change title font size, style and color
font.x = c(14, "bold.italic", "red"), #change x axis label font size, style and color
font.y = c(14, "bold.italic", "darkred"), #change y axis font size, style and color
font.tickslab = c(12, "plain", "darkgreen"), #change ticks label font size, style and color
legend = "bottom", #change legend position
#legend = c(0.2, 0.2), #Specify legend position by its coordinates
legend.title = "Sex", #change legend title
legend.labs = c("Male", "Female"), #change legend labels
size = 1, #change line size
linetype = "strata", #change line type by groups (i.e. "strata")
break.time.by = 250, #break X axis in time intervals by 250
#xlim = c(0, 600)), #shorten the survival curves
palette = c("#E7B800", "#2E9FDF"), #custom color palette
#palette = "Dark2", #Use brewer color palette "Dark2"
#palette = "grey", #Use grey palette
conf.int = TRUE, #Add confidence interval
conf.int.style = "step", #customize style of confidence intervals
pval = TRUE, #Add p-value of the Log-Rank test comparing the groups
surv.median.line = "hv", #add horizontal/vertical line at median survival.
risk.table = TRUE, #Add risk table
#risk.table = "absolute/percentage/abs_pct",
#to show absolute number, percentage of subjects, and both at risk by time, respectively.
risk.table.col = "strata", #Change risk table color by groups
risk.table.y.text.col = TRUE, #colour risk table text annotations by strata
#risk.table.y.text = FALSE, #show bars instead of names in text annotations in legend of risk table
risk.table.height = 0.25, #the height of the risk table. Useful when you have multiple groups
ncensor.plot = TRUE, # plot the number of censored subjects at time t
ncensor.plot.height = 0.25,
ggtheme = theme_bw(), #customize plot and risk table with a theme.
xlim = c(0, 600) #Change x axis limits (xlim)
)
ggsurv
[圖片上傳失敗...(image-961dd3-1510129034755)]
繪制多條生存曲線
fit3 <- survfit(Surv(time, status) ~ sex + rx + adhere, data = colon )
ggsurvplot(fit3,
pval = TRUE, #Add p-value
break.time.by = 800, #break time axis by 800
risk.table = TRUE, #add risk table
risk.table.col = "strata", #Change risk table color by groups
risk.table.height = 0.5, #the height of the risk table. Useful when you have multiple groups
ggtheme = theme_bw() #change plot theme
#legend.labs = c("A", "B", "C", "D", "E", "F") #Change legend labels
)
[圖片上傳失敗...(image-88b919-1510129034755)]
Cox回歸
單因素COX回歸
covariates <- c("age", "sex", "ph.karno", "ph.ecog", "meal.cal", "wt.loss")
univ_formulas <- sapply(covariates,
function(x) as.formula(paste('Surv(time, status)~', x)))
univ_models <- lapply( univ_formulas, function(x){coxph(x, data = lung)})
# Extract results
univ_results <- lapply(univ_models,
function(x){
x <- summary(x)
p.value<-signif(x$wald["pvalue"], digits=2)
wald.test<-signif(x$wald["test"], digits=2)
beta<-signif(x$coef[1], digits=2);#coeficient beta
HR <-signif(x$coef[2], digits=2);#exp(beta)
HR.confint.lower <- signif(x$conf.int[,"lower .95"], 2)
HR.confint.upper <- signif(x$conf.int[,"upper .95"],2)
HR <- paste0(HR, " (",
HR.confint.lower, "-", HR.confint.upper, ")")
res<-c(beta, HR, wald.test, p.value)
names(res)<-c("beta", "HR (95% CI for HR)", "wald.test",
"p.value")
return(res)
#return(exp(cbind(coef(x),confint(x))))
})
results <- t(as.data.frame(univ_results, check.names = FALSE))
as.data.frame(results)
## beta HR (95% CI for HR) wald.test p.value
## age 0.019 1 (1-1) 4.1 0.042
## sex -0.53 0.59 (0.42-0.82) 10 0.0015
## ph.karno -0.016 0.98 (0.97-1) 7.9 0.005
## ph.ecog 0.48 1.6 (1.3-2) 18 2.7e-05
## meal.cal -0.00012 1 (1-1) 0.29 0.59
## wt.loss 0.0013 1 (0.99-1) 0.05 0.83
多因素COX回歸
多個自變量是如何共同影響因變量
cox_model <- coxph(Surv(time, status) ~ age + sex + ph.ecog + ph.karno, data = lung)
summary(cox_model)
## Call:
## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + ph.karno,
## data = lung)
##
## n= 226, number of events= 163
## (2 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.012868 1.012951 0.009404 1.368 0.171226
## sex -0.572802 0.563943 0.169222 -3.385 0.000712 ***
## ph.ecog 0.633077 1.883397 0.176034 3.596 0.000323 ***
## ph.karno 0.012558 1.012637 0.009514 1.320 0.186842
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.0130 0.9872 0.9945 1.0318
## sex 0.5639 1.7732 0.4048 0.7857
## ph.ecog 1.8834 0.5310 1.3338 2.6594
## ph.karno 1.0126 0.9875 0.9939 1.0317
##
## Concordance= 0.632 (se = 0.026 )
## Rsquare= 0.129 (max possible= 0.999 )
## Likelihood ratio test= 31.27 on 4 df, p=2.695e-06
## Wald test = 30.61 on 4 df, p=3.676e-06
## Score (logrank) test = 31.06 on 4 df, p=2.974e-06
COX回歸結(jié)果解釋
1. z結(jié)果是wald檢驗(yàn)的結(jié)果,z = coef/se(coef),用于檢驗(yàn)回歸系數(shù)是否顯著區(qū)別于0
2. 回歸系數(shù)(coef)
3. 風(fēng)險???(hazard ratio, HR): HR = exp(coef), 衡量協(xié)變量影響的程度
4. 風(fēng)險比的95%置信區(qū)間
5. 對模型的整體分析端铛,即評價模型是否有意義的三種檢驗(yàn)(Likelihood ratio test, Wald test, Score (logrank) test)
COX模型等比性檢驗(yàn)
方法一:通過圖形展示泣矛,對縱軸和橫軸分別取對數(shù),繪制不同自變量水平下的生存曲線禾蚕。如果兩條曲線平行乳蓄,則符合比例風(fēng)險假定.
方法二:利用cox.zph函數(shù)進(jìn)行檢驗(yàn)
cox.zph(cox_model)
## rho chisq p
## age 0.0168 0.0502 0.8227
## sex 0.0881 1.2539 0.2628
## ph.ecog 0.0600 0.5122 0.4742
## ph.karno 0.1843 4.4660 0.0346
## GLOBAL NA 8.0414 0.0901
#檢驗(yàn)結(jié)果中ph.karno具有顯著性(p<0.05),即這一自變量不符合“等比性”要求夕膀,這一回歸模型不準(zhǔn)確
根據(jù)檢驗(yàn)結(jié)果調(diào)整模型虚倒,將不可比的自變量進(jìn)行分層, 也就是說排除混雜因素的影響,分析研究因素的影響
cox_model1 <- coxph(Surv(time, status) ~ age + sex + ph.ecog + strata(ph.karno), data = lung)
summary(cox_model1)
## Call:
## coxph(formula = Surv(time, status) ~ age + sex + ph.ecog + strata(ph.karno),
## data = lung)
##
## n= 226, number of events= 163
## (2 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.014807 1.014918 0.009651 1.534 0.124972
## sex -0.591683 0.553395 0.173049 -3.419 0.000628 ***
## ph.ecog 0.483879 1.622356 0.196558 2.462 0.013825 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.0149 0.9853 0.9959 1.0343
## sex 0.5534 1.8070 0.3942 0.7768
## ph.ecog 1.6224 0.6164 1.1037 2.3848
##
## Concordance= 0.601 (se = 0.054 )
## Rsquare= 0.085 (max possible= 0.986 )
## Likelihood ratio test= 19.99 on 3 df, p=0.0001708
## Wald test = 19.14 on 3 df, p=0.0002557
## Score (logrank) test = 19.57 on 3 df, p=0.0002083
再次檢驗(yàn)?zāi)P?/p>
cox.zph(cox_model1)
## rho chisq p
## age 0.0454 0.363 0.547
## sex 0.0919 1.289 0.256
## ph.ecog 0.0358 0.184 0.668
## GLOBAL NA 1.958 0.581
#哈哈哈产舞,bingo~