Background
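- Propensity score matching (PSM) uses a statistical procedure to select subjects from the treatment and control groups so that the selected subjects are comparable on important clinical characteristics (potential confounders).
- Typically, a statistical model combines each observation's covariates into a single propensity score, and observations are then matched according to how close their scores are.
- The most common choice is a logistic regression model with the group variable as the dependent variable and the confounders that may affect the outcome as covariates.
- The propensity score is computed for every observation, and matching is done on the score.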
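The steps in the list above can be sketched in base R without any matching package. This is only an illustration on simulated data: the variable names (`age`, `educ`, `dat`, `matches`) and the simulation settings are assumptions, and the greedy loop processes treated units in data order, which is a simplification and may differ in detail from what MatchIt does.

```r
# Minimal sketch on simulated data (all names and settings are hypothetical)
set.seed(42)
n <- 300
age  <- rnorm(n, mean = 50, sd = 10)
educ <- rnorm(n, mean = 12, sd = 2)
# Treatment assignment depends on the covariates, so the groups are confounded
treat <- rbinom(n, size = 1, prob = plogis(-2.5 + 0.03 * age + 0.05 * educ))
dat <- data.frame(treat, age, educ)

# Step 1: logistic regression with the group variable as the response
ps_model <- glm(treat ~ age + educ, data = dat, family = binomial())

# Step 2: the propensity score is the fitted probability of being treated
dat$ps <- unname(fitted(ps_model))

# Step 3: greedy 1:1 nearest-neighbor matching without replacement
control_pool <- which(dat$treat == 0)
matches <- data.frame(treated = integer(0), control = integer(0))
for (i in which(dat$treat == 1)) {
  if (length(control_pool) == 0) break          # no controls left to match
  gap <- abs(dat$ps[control_pool] - dat$ps[i])  # distance on the score
  j <- control_pool[which.min(gap)]             # closest remaining control
  matches <- rbind(matches, data.frame(treated = i, control = j))
  control_pool <- setdiff(control_pool, j)      # each control is used once
}

# Matched data set: every matched treated unit plus its control
matched <- dat[c(matches$treated, matches$control), ]
table(matched$treat)
```

Each row of `matches` holds the row indices of one treated unit and the control closest to it on the propensity score.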
Code implementation (using the MatchIt package)
library(MatchIt)
library(tableone)
data(lalonde)
head(lalonde,4)
# treat age educ race married nodegree re74 re75 re78
# NSW1 1 37 11 black 1 1 0 0 9930.046
# NSW2 1 22 9 hispan 0 1 0 0 3595.894
# NSW3 1 30 12 black 0 0 0 0 24909.450
# NSW4 1 27 11 black 0 1 0 0 7506.146
str(lalonde)
# 'data.frame': 614 obs. of 9 variables:
# $ treat : int 1 1 1 1 1 1 1 1 1 1 ...
# $ age : int 37 22 30 27 33 22 23 32 22 33 ...
# $ educ : int 11 9 12 11 8 9 12 11 16 12 ...
# $ race : Factor w/ 3 levels "black","hispan",..: 1 2 1 1 1 1 1 1 1 3 ...
# $ married : int 1 0 0 0 0 0 0 0 0 1 ...
# $ nodegree: int 1 1 0 1 1 1 0 1 0 0 ...
# $ re74 : num 0 0 0 0 0 0 0 0 0 0 ...
# $ re75 : num 0 0 0 0 0 0 0 0 0 0 ...
# $ re78 : num 9930 3596 24909 7506 290 ...
#dput(names(lalonde))
preBL <- CreateTableOne(vars=c("treat","age","educ","race","married","nodegree","re74","re75","re78"),
strata="treat",data=lalonde,
factorVars=c("treat","race","married","nodegree"))
# treat is the variable of interest (treatment indicator); re78 is the outcome variable
print(preBL,showAllLevels = TRUE)
f=matchit(treat~re74+re75+educ+race+age+married+nodegree,data=lalonde,method="nearest",ratio = 1)
# treat is the variable of interest; re78 is the outcome variable, so it is left out of the matching formula
summary(f)
# ...
# Sample Sizes:
# Control Treated
# All 429 185
# Matched 185 185
# Unmatched 244 0
# Discarded 0 0
matchdata=match.data(f)
mBL <- CreateTableOne(vars=c("treat","age","educ","race","married","nodegree","re74","re75","re78"),
strata="treat",data=matchdata,
factorVars=c("treat","race","married","nodegree"))
print(mBL,showAllLevels = TRUE)
plot(f, type = 'jitter', interactive = FALSE)
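Balance before and after matching is often summarized with the standardized mean difference (SMD) in addition to the p-values in the tables. The sketch below shows one common definition for a numeric covariate on simulated data; the function name `smd` and the data are hypothetical. A cutoff of about 0.1 is a widely used rule of thumb rather than a fixed standard. With tableone, `print(mBL, smd = TRUE)` adds an SMD column to the printed table.

```r
# Standardized mean difference for a numeric covariate between two groups
# (hypothetical helper; group is coded 0/1)
smd <- function(x, group) {
  x1 <- x[group == 1]
  x0 <- x[group == 0]
  pooled_sd <- sqrt((var(x1) + var(x0)) / 2)  # average the two group variances
  abs(mean(x1) - mean(x0)) / pooled_sd
}

set.seed(7)
group <- rep(c(0, 1), each = 200)
balanced   <- rnorm(400, mean = 10, sd = 2)              # same mean in both groups
imbalanced <- rnorm(400, mean = 10 + 2 * group, sd = 2)  # shifted by one SD in group 1

c(balanced = smd(balanced, group), imbalanced = smd(imbalanced, group))
```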
The race variable is still imbalanced after matching, so a caliper is used to tighten the matches.
f1=matchit(treat~re74+re75+educ+race+age+married+nodegree,data=lalonde,method="nearest",caliper=0.05)
summary(f1)
# ...
# Sample Sizes:
# Control Treated
# All 429 185
# Matched 109 109
# Unmatched 320 76
# Discarded 0 0
matchdata1=match.data(f1)
mBL1 <- CreateTableOne(vars=c("treat","age","educ","race","married","nodegree","re74","re75","re78"),
strata="treat",data=matchdata1,
factorVars=c("treat","race","married","nodegree"))
print(mBL1,showAllLevels = TRUE)
plot(f1, type = 'jitter', interactive = FALSE)
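To make the effect of the caliper concrete, here is a base R sketch of greedy matching that accepts a pair only when the two propensity scores differ by no more than the caliper width. It assumes the caliper is expressed in standard deviations of the propensity score, which is how recent MatchIt versions interpret `caliper` by default. The function `caliper_match` and the simulated data are hypothetical.

```r
# Simulated data with one confounder (hypothetical)
set.seed(1)
n <- 400
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1 + 1.5 * x))
ps <- unname(fitted(glm(treat ~ x, family = binomial())))

# Greedy 1:1 matching without replacement, restricted by a caliper
caliper_match <- function(ps, treat, caliper = 0.05) {
  width <- caliper * sd(ps)            # caliper in SD units of the score
  controls <- which(treat == 0)
  treated_out <- integer(0)
  control_out <- integer(0)
  for (i in which(treat == 1)) {
    if (length(controls) == 0) break
    d <- abs(ps[controls] - ps[i])
    k <- which.min(d)
    if (d[k] <= width) {               # accept only matches inside the caliper
      treated_out <- c(treated_out, i)
      control_out <- c(control_out, controls[k])
      controls <- controls[-k]         # a matched control cannot be reused
    }                                  # otherwise the treated unit stays unmatched
  }
  data.frame(treated = treated_out, control = control_out)
}

loose <- caliper_match(ps, treat, caliper = 1)
tight <- caliper_match(ps, treat, caliper = 0.05)
c(loose = nrow(loose), tight = nrow(tight))
```

A narrower caliper improves the closeness of each pair at the cost of leaving more treated units unmatched, which is the trade-off visible in the sample sizes reported by `summary(f1)`.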
Export the result data
library(foreign)
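matchdata1$id <- seq_len(nrow(matchdata1))  # add a row id to the caliper-matched data
write.csv(matchdata1, "matchdata.csv")
# write.dta(matchdata1, "matchdata.dta")  # optional Stata export via the foreign package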
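- Conditions for using PSM: the control group should be large enough, with a control-to-treatment sample size ratio of at least 5:1, so that the vast majority of treated subjects can be matched to suitable controls; ideally every treated subject gets a good match.
- PSM versus regression: any problem that can be handled with PSM can also be handled with regression, but not every problem suited to regression can be handled with PSM. It is advisable to analyze the data with both PSM and regression; when the two agree, the result is more credible.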
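The recommendation to run both analyses can be illustrated on simulated data where the treatment effect is known. The sketch below is not taken from the original post: the data-generating model, the effect size of 2, and all variable names are assumptions chosen for illustration, and the matching step is a simple greedy nearest-neighbor loop rather than MatchIt.

```r
# Simulated data; the effect of treat on y is 2 by construction (hypothetical)
set.seed(2024)
n <- 2000
x <- rnorm(n)                                # a single confounder
treat <- rbinom(n, 1, plogis(-2.2 + x))      # controls outnumber treated units
y <- 1 + 2 * treat + 1.5 * x + rnorm(n)
dat <- data.frame(y, treat, x)

# Control-to-treated ratio, to compare against the 5:1 guideline
ratio <- sum(dat$treat == 0) / sum(dat$treat == 1)

# Approach 1: regression adjustment for the confounder
reg_est <- unname(coef(lm(y ~ treat + x, data = dat))["treat"])

# Approach 2: 1:1 nearest-neighbor matching on the propensity score
dat$ps <- unname(fitted(glm(treat ~ x, data = dat, family = binomial())))
controls <- which(dat$treat == 0)
treated  <- which(dat$treat == 1)
matched_control <- integer(length(treated))
for (k in seq_along(treated)) {
  d <- abs(dat$ps[controls] - dat$ps[treated[k]])
  pick <- which.min(d)
  matched_control[k] <- controls[pick]
  controls <- controls[-pick]                # match without replacement
}
psm_est <- mean(dat$y[treated]) - mean(dat$y[matched_control])

round(c(ratio = ratio, regression = reg_est, psm = psm_est), 2)
```

When the two estimates are close to each other, as the list above suggests, that agreement gives some reassurance that the result does not hinge on one modeling choice.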
References
DXY (丁香园) course, full version: Advanced R, Machine Learning
How to use R for matching samples (propensity score)