英文教程地址
https://data.library.virginia.edu/getting-started-with-generalized-estimating-equations/
廣義估計(jì)方程和混合效應(yīng)模型及多水平模型的區(qū)別如下
The main difference is that it’s a marginal model. It seeks to model a population average. Mixed-effect/Multilevel models are subject-specific, or conditional, models. They allow us to estimate different parameters for each subject or cluster. In other words, the parameter estimates are conditional on the subject/cluster. This in turn provides insight into the variability between subjects or clusters. We can also obtain a population-level model from a mixed-effect model, but it’s basically an average of the subject-specific models.
GEE is intended for simple clustering or repeated measures. It cannot easily accommodate more complex designs such as nested or crossed groups; for example, nested repeated measures within a subject or group. This is something better suited for a mixed-effect model.
GEE computations are usually easier than mixed-effect model computations. GEE does not use the likelihood methods that mixed-effect models employ, which means GEE can sometimes estimate more complex models.Because GEE doesn’t use likelihood methods, the estimated “model” is incomplete and not suitable for simulation.
GEE allows us to specify a correlation structure for different responses within a subject or group. For example, we can specify that the correlation of measurements taken closer together is higher than those taken farther apart. This is not something that’s currently possible in the popular lme4 package.
#建立模擬數(shù)據(jù)集
URL <- "http://static.lib.virginia.edu/statlab/materials/data/depression.csv"
dat <- read.csv(URL, stringsAsFactors = TRUE)
dat$id <- factor(dat$id)
dat$drug <- relevel(dat$drug, ref = "standard")
head(dat, n = 3)
#查看病人個(gè)數(shù)(每個(gè)病人可以有多個(gè)觀測(cè))
dat%>%
distinct(id)%>%
count()
#查看數(shù)據(jù)分布情況
with(dat, tapply(depression, list(diagnose, drug, time), mean)) %>%
ftable() %>%
round(2)
#構(gòu)建廣義估計(jì)方程并查看最終結(jié)果
dep_gee <- gee(depression ~ diagnose + drug*time,#方程,注意交互作用
data = dat, #數(shù)據(jù)集
id = id, #患者識(shí)別編號(hào)
family = binomial,#連接函數(shù)
corstr = "independence")#數(shù)據(jù)相關(guān)矩陣淋昭,這里設(shè)定為獨(dú)立
summary(dep_gee)
exp(estimate)后可以得到OR值,可以看到太颤,independence的作業(yè)相關(guān)矩陣中假設(shè)組內(nèi)相關(guān)性是0戳寸,因?yàn)橐粋€(gè)id是3個(gè)觀察,所以是3乘以3的矩陣了
# Now let’s try a model with an exchangeable correlation structure.
# This says all pairs of responses within a subject are equally correlated.
# To do this we set corstr = "exchangeable".
#設(shè)定相關(guān)性矩陣為exchangeable,意思是組內(nèi)配對(duì)之間的相關(guān)性系數(shù)相等
dep_gee2 <- gee(depression ~ diagnose + drug*time,
data = dat,
id = id,
family = binomial,
corstr = "exchangeable")
summary(dep_gee2)
# Another possibility for correlation is an autoregressive structure.
# This allows correlations of measurements taken closer together to be higher than those taken farther apart.
#設(shè)定自回歸相關(guān)性矩陣并查看結(jié)果
dep_gee3 <- gee(depression ~ diagnose + drug*time,
data = dat,
id = id,
family = binomial,
corstr = "AR-M", Mv = 1)
dep_gee3$working.correlation
作業(yè)相關(guān)矩陣的選擇
How to choose which correlation structure to use? The good news is GEE estimates are valid even if you misspecify the correlation structure (Agresti, 2002). Of course this assumes the model is correct, but then again no model is exactly correct. Agresti suggests using the exchangeable structure as a start and then checking how the coefficient estimates and standard errors change with other correlation structures. If the changes are minimal, go with the simpler correlation structure.