1. 導(dǎo)入數(shù)據(jù)
library(tidyverse)
library("survival")
library("survminer")
test_data=read.table("survival.txt",header = T,sep = "\t",stringsAsFactors = F)
test_data$over55=ifelse(test_data$age >= 55,1,0)
test_data$over55=as.factor(test_data$over55)
test_data$drug=as.factor(test_data$drug)
head(test_data)
# studytime died drug age over55
# 1 1 1 0 61 1
# 2 1 1 0 65 1
# 3 2 1 0 59 1
# 4 3 1 0 52 0
# 5 4 1 0 56 1
# 6 4 1 0 67 1
summary(test_data)
# studytime died drug age over55
# Min. : 1.00 Min. :0.0000 0:20 Min. :47.00 0:19
# 1st Qu.: 7.75 1st Qu.:0.0000 1:28 1st Qu.:50.75 1:29
# Median :12.50 Median :1.0000 Median :56.00
# Mean :15.50 Mean :0.6458 Mean :55.88
# 3rd Qu.:23.00 3rd Qu.:1.0000 3rd Qu.:60.00
# Max. :39.00 Max. :1.0000 Max. :67.00
attach(test_data)
注意,因?yàn)槲覟榱俗詣?dòng)補(bǔ)全變量名澜建,使用了attach()。如果沒(méi)有用這個(gè)蝌以,像survfit() coxph()這些函數(shù)還需要加上data參數(shù)
2. KM生存曲線
drug_KM=survfit(Surv(studytime,died) ~ drug,data=test_data)
ggsurvplot(drug_KM,data=test_data,legend.title = "drug",pval = TRUE,pval.method=T,palette=c("red","blue"))
ggsave("drug.png",width = 5,height = 5)
over55_KM=survfit(Surv(studytime,died) ~ over55,data=test_data)
ggsurvplot(over55_KM,data=test_data,legend.title = "over55",pval = TRUE,pval.method=T,palette=c("red","blue"))
ggsave("over55.png",width = 5,height = 5)
3. COX比例風(fēng)險(xiǎn)模型
用兩個(gè)變量進(jìn)行Cox-PH model炕舵,都是分類變量,當(dāng)然數(shù)值變量也可以做
cox.mod=coxph(Surv(studytime,died) ~ drug + over55)
#描述一下
summary(cox.mod)
# Call:
# coxph(formula = Surv(studytime, died) ~ drug + over55)
#
# n= 48, number of events= 31
#
# coef exp(coef) se(coef) z Pr(>|z|)
# drug1 -2.2632 0.1040 0.4582 -4.940 7.82e-07 ***
# over551 0.9274 2.5278 0.3935 2.356 0.0184 *
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# exp(coef) exp(-coef) lower .95 upper .95
# drug1 0.104 9.6135 0.04238 0.2553
# over551 2.528 0.3956 1.16888 5.4668
#
# Concordance= 0.784 (se = 0.039 )
# Likelihood ratio test= 31.03 on 2 df, p=2e-07
# Wald test = 26.92 on 2 df, p=1e-06
# Score (logrank) test = 34.49 on 2 df, p=3e-08
- exp(coef)就是HR跟畅,0.104表示:在控制年齡的情況下咽筋,用藥的人死亡可能性是沒(méi)有用藥的人的0.104倍
- exp(-coef)為9.6135的解釋剛好反過(guò)來(lái),表示:在控制年齡的情況下徊件,沒(méi)有用藥的人死亡可能性是用藥的人的9.6135倍
- Concordance表示預(yù)測(cè)一致性
- 后面有三個(gè)假設(shè):零假設(shè)是b1=b2=b3=...=b_k=0奸攻,表征的是模型整體的顯著性
4. 森林圖
一般文獻(xiàn)里面比較關(guān)心HR,我們可以用森林圖來(lái)表示
ggforest(cox.mod, main = '',data = test_data)
ggsave(filename = "cox.mod.png",device = "png",width = 20,height = 10,units = c("cm"))
有時(shí)候也能看到KM曲線里面加HR的圖庇忌,這需要結(jié)合KM的結(jié)果和COX的結(jié)果舞箍。
到這里,對(duì)于paper的畫圖已經(jīng)足夠了皆疹,如果想深入了解Cox比例風(fēng)險(xiǎn)模型疏橄,可以看看下面的內(nèi)容
5. 采用某個(gè)變量前后的模型比較
用LRT (likelihood ratio tests) 比較嵌套模型(一個(gè)模型含有另一個(gè)模型的全部變量),零假設(shè)是兩個(gè)模型沒(méi)有差異
cox.mod=coxph(Surv(studytime,died) ~ drug + over55)
cox.mod2=coxph(Surv(studytime,died) ~ over55)
anova(cox.mod2,cox.mod,test = "LRT")
# Analysis of Deviance Table
# Cox model: response is Surv(studytime, died)
# Model 1: ~ over55
# Model 2: ~ drug + over55
# loglik Chisq Df P(>|Chi|)
# 1 -98.062
# 2 -83.994 28.136 1 1.131e-07 ***
p值很小略就,說(shuō)明有差異捎迫,所以我們不能去掉drug變量
6. cox模型也能用數(shù)值型變量
cox.mod3=coxph(Surv(studytime,died) ~ age)
summary(cox.mod3)
exp(coef)等于1.08表示:每增加1歲,風(fēng)險(xiǎn)增加0.08
7. 檢查COX PH model的假設(shè)
- check linearity
模型中有數(shù)值型變量
線性回歸里面也有這一步表牢,方法類似 - check the proportional hazard's assumptions
可以簡(jiǎn)單理解為某個(gè)變量對(duì)結(jié)局事件的影響(回歸得到的系數(shù))不隨時(shí)間而改變
cox.mod4=coxph(Surv(studytime,died) ~ age)
png("Martingale_residuals.png")
plot(predict(cox.mod4),residuals(cox.mod4,type = "martingale"),
xlab = "fitted values",ylab = "Martingale residuals",
main = "Residual Plot",las=1)
#加水平線
abline(h=0)
#畫殘差的擬合線
lines(smooth.spline(predict(cox.mod4),residuals(cox.mod4,type = "martingale")),col="red")
dev.off()
換一種殘差檢查線性窄绒,deviance residuals
png("deviance_residuals.png")
plot(predict(cox.mod4),residuals(cox.mod4,type = "deviance"),
xlab = "fitted values",ylab = "Deviance residuals",
main = "Residual Plot",las=1)
abline(h=0)
lines(smooth.spline(predict(cox.mod4),residuals(cox.mod4,type = "deviance")),col="red")
dev.off()
check proportional hazards assumption
using Schoenfeld test for PH, Ho: HAZARDS are prop, Ha: HAZARDS are NOT prop
結(jié)果會(huì)返回每個(gè)變量,以及整體的p值
cox.zph(cox.mod4)
# chisq df p
# age 0.662 1 0.42
# GLOBAL 0.662 1 0.42
p值較大可以接受原假設(shè)崔兴。
也可以看某個(gè)變量的系數(shù)(β)是不是隨著時(shí)間而改變彰导,如果是(也就是說(shuō)HR隨時(shí)間而改變)蛔翅,則說(shuō)明為non-prop hazard
par(mfrow=c(1,1))
png("cox.zph.png")
plot(cox.zph(cox.mod4)[1])
dev.off()
detach(test_data)