“emmeans” package

The emmeans package is one of several alternatives to facilitate post hoc methods application and contrast analysis. It is a relatively recent replacement for the lsmeans package that some R users may be familiar with. It is intended for use with a wide variety of ANOVA models, including repeated measures and nested designs where the initial modeling would employ aov, lm, ez or lme4 (mixed models).

emmeans包是一些R用戶可能熟悉的lsmeans包的相對較新的替代品钻哩。它適用于多種方差分析模型佑女,包括重復測量和嵌套設計捌蚊,其中初始建模將使用‘aov’、‘lm’损离、‘ez’或‘lme4’(混合模型)排监。

1. Using emmeans for pairwise post hoc multiple comparisons.
1. 用emmeans來進行兩兩事后多重比較

Initially, a minimal illustration is presented. First is a “pairwise” approach to followup comparisons, with a p-value adjustment equivalent to the Tukey test. The emmeans function requires a model object to be passed as the first argument. We could use either fit1 (the aov object) or fit2 (the lm object) originally created in the base Anova section of this document.

首先,給出了一個簡短的說明旗吁。首先是后續(xù)比較的“成對”方法痊臭,p值的矯正用Tukey檢驗哮肚。emmeans函數要求將模型對象作為第一個參數傳遞。我們可以使用fit1(AOV對象)或fit2(lm對象)广匙,這些對象最初是在本文的基礎Anova部分中創(chuàng)建的允趟。

# 用emmeans()對主效應進行事后檢驗,并且做多重比較矯正鸦致。
#library(emmeans)
# reminder: fit.2 <- lm(dv~factora, data=hays)
fit2.emm.a <- emmeans(fit.2, "factora", data=hays)
pairs(fit2.emm.a, adjust="tukey")
##  contrast       estimate   SE df t.ratio p.value
##  control - fast      6.2 1.99 27  3.113  0.0117 
##  control - slow      5.6 1.99 27  2.811  0.0239 
##  fast - slow        -0.6 1.99 27 -0.301  0.9513 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates
plot(fit2.emm.a, comparisons = TRUE)
#pairs(fit2.emm.a, adjust="none")

The blue bars on the plot are the confidence intervals. The red arrowed lines represent a scheme to determine homogeneous groups. If the red lines overlap for two groups, then they are not significantly different using the method chosen.

圖上的藍色條是置信區(qū)間潮剪。紅色箭頭線表示確定同質組的方案。如果兩組的紅線重疊分唾,則表示它們沒有顯著差異抗碰。

The adjust argument can take one of several useful methods. ‘tukey’ is default, but others including ‘sidak,’ ‘bonferroni,’ etc can be specified. Specifying ‘none’ produces unadjusted p-values. See help with ‘?emmeans::summary.emmGrid’ for details. Here is an example using the ‘holm’ method of adjustment.

adjust參數可以采用幾種有用的方法之一。默認為tukey绽乔,但可以指定其他名稱弧蝇,包括 sidakbonferroni等折砸。指定None會產生未矯正的p值看疗。有關詳細信息,請參閱睦授?emMeans::Summary y.emmGrid的幫助两芳。下面是一個使用Holm矯正方法的示例。

library(emmeans)
fit2.emm.b <- emmeans(fit.2, "factora", data=hays)
pairs(fit2.emm.b, adjust="holm")
##  contrast       estimate   SE df t.ratio p.value
##  control - fast      6.2 1.99 27  3.113  0.0131 
##  control - slow      5.6 1.99 27  2.811  0.0181 
##  fast - slow        -0.6 1.99 27 -0.301  0.7655 
## 
## P value adjustment: holm method for 3 tests
plot(fit2.emm.b, comparisons = TRUE)

Interaction analysis in emmeans

https://cran.r-project.org/web/packages/emmeans/vignettes/interactions.html#contrasts

emmeans package, Version 1.7.0

Models in which predictors interact seem to create a lot of confusion concerning what kinds of post hoc methods should be used. It is hoped that this vignette will be helpful in shedding some light on how to use the emmeans package effectively in such situations.
交互作用的模型似乎造成了很多困惑去枷,不知道應該使用什么樣的“事后”方法怖辆。希望這一簡介 將有助于揭示如何在這種情況下有效地使用emmeans包。

Contents

  1. Interacting factors
    1. Simple contrasts
  2. Interaction contrasts
  3. Multivariate contrasts
  4. Interactions with covariates
  5. Summary

Index of all vignette topics

Interacting factors

As an example for this topic, consider the auto.noise dataset included with the package. This is a balanced 3x2x2 experiment with three replications. The response – noise level – is evaluated with different sizes of cars, types of anti-pollution filters, on each side of the car being measured.1
作為本主題的示例沉填,請考慮軟件包附帶的auto.noise數據集疗隶。這是一個平衡的3x2x2實驗,有三個重復翼闹。因變量-噪音水平-是通過測量汽車兩側不同大小的汽車、不同類型的防污染過濾器來評估的蒋纬。
讓我們擬合一個模型并獲得ANOVA表(由于數據的規(guī)模猎荠,我們認為響應是以十分之一分貝記錄的;因此我們通過對響應進行縮放來補償這一點):
Let’s fit a model and obtain the ANOVA table (because of the scale of the data, we believe that the response is recorded in tenths of decibels; so we compensate for this by scaling the response):

noise.lm <- lm(noise/10 ~ size * type * side, data = auto.noise)
anova(noise.lm)
## Analysis of Variance Table
## 
## Response: noise/10
##                Df  Sum Sq Mean Sq  F value    Pr(>F)
## size            2 260.514 130.257 893.1905 < 2.2e-16
## type            1  10.563  10.563  72.4286 1.038e-08
## side            1   0.007   0.007   0.0476 0.8291042
## size:type       2   8.042   4.021  27.5714 6.048e-07
## size:side       2  12.931   6.465  44.3333 8.730e-09
## type:side       1   0.174   0.174   1.1905 0.2860667
## size:type:side  2   3.014   1.507  10.3333 0.0005791
## Residuals      24   3.500   0.146

There are statistically strong 2- and 3-way interactions.
有顯著的的二重和三重交互作用

One mistake that a lot of people seem to make is to proceed too hastily to estimating marginal means (even in the face of all these interactions!). They would go straight to analyses like this:
許多人似乎犯的一個錯誤是過于匆忙地估算邊際均值(即使面對所有這些交互!)蜀备。他們會直接進行這樣的分析:

emmeans(noise.lm, pairwise ~ size)
## NOTE: Results may be misleading due to involvement in interactions
## $emmeans
##  size emmean     SE df lower.CL upper.CL
##  S     82.42 0.1102 24    82.19    82.64
##  M     83.38 0.1102 24    83.15    83.60
##  L     77.25 0.1102 24    77.02    77.48
## 
## Results are averaged over the levels of: type, side 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate    SE df t.ratio p.value
##  S - M      -0.958 0.156 24  -6.147  <.0001
##  S - L       5.167 0.156 24  33.140  <.0001
##  M - L       6.125 0.156 24  39.287  <.0001
## 
## Results are averaged over the levels of: type, side 
## P value adjustment: tukey method for comparing a family of 3 estimates

The analyst-in-a-hurry would thus conclude that the noise level is higher for medium-sized cars than for small or large ones.
匆忙的分析師會因此得出結論关摇,中型汽車的噪音水平要高于小型或大型汽車。
But as is seen in the message before the output, emmeans() valiantly tries to warn you that it may not be a good idea to average over factors that interact with the factor of interest. It isn’t always a bad idea to do this, but sometimes it definitely is.
但正如輸出之前的消息所示碾阁,' emmeans() '勇敢地試圖警告您输虱,對與感興趣的因素交互的因素進行平均可能不是一個好主意。這樣做并不總是一個壞主意脂凶,但有時確實是宪睹。

What about this time? I think a good first step is always to try to visualize the nature of the interactions before doing any statistical comparisons. The following plot helps.
應該怎么做呢? 我認為愁茁,在做任何統(tǒng)計比較之前,最好的第一步總是嘗試將相互作用的本質形象化亭病。下面的情節(jié)有幫助鹅很。

emmip(noise.lm, type ~ size | side)
image.png

Examining this plot, we see that the “medium” mean is not always higher; so the marginal means, and the way they compare, does not represent what is always the case. Moreover, what is evident in the plot is that the peak for medium-size cars occurs for only one of the two filter types. So it seems more useful to do the comparisons of size separately for each filter type. This is easily done, simply by conditioning on type:
通過這張圖,我們可以看到“中等”平均值并不總是更高;所以邊際均值罪帖,和它們比較的方式促煮,并不代表通常的情況。此外整袁,從圖中可以明顯看出菠齿,中型汽車的峰值只出現在兩種過濾類型中的一種。因此坐昙,對每種過濾類型分別進行大小比較似乎更有用泞当。這很容易做到,只需對類型進行條件設置:

emm_s.t <- emmeans(noise.lm, pairwise ~ size | type)
## NOTE: Results may be misleading due to involvement in interactions
emm_s.t
## $emmeans
## type = Std:
##  size emmean     SE df lower.CL upper.CL
##  S     82.58 0.1559 24    82.26    82.91
##  M     84.58 0.1559 24    84.26    84.91
##  L     77.50 0.1559 24    77.18    77.82
## 
## type = Octel:
##  size emmean     SE df lower.CL upper.CL
##  S     82.25 0.1559 24    81.93    82.57
##  M     82.17 0.1559 24    81.84    82.49
##  L     77.00 0.1559 24    76.68    77.32
## 
## Results are averaged over the levels of: side 
## Confidence level used: 0.95 
## 
## $contrasts
## type = Std:
##  contrast estimate   SE df t.ratio p.value
##  S - M     -2.0000 0.22 24  -9.071  <.0001
##  S - L      5.0833 0.22 24  23.056  <.0001
##  M - L      7.0833 0.22 24  32.127  <.0001
## 
## type = Octel:
##  contrast estimate   SE df t.ratio p.value
##  S - M      0.0833 0.22 24   0.378  0.9245
##  S - L      5.2500 0.22 24  23.812  <.0001
##  M - L      5.1667 0.22 24  23.434  <.0001
## 
## Results are averaged over the levels of: side 
## P value adjustment: tukey method for comparing a family of 3 estimates

Not too surprisingly, the statistical comparisons are all different for standard filters, but with Octel filters, there isn’t much of a difference between small and medium size.

For comparing the levels of other factors, similar judgments must be made. It may help to construct other interaction plots with the factors in different roles. In my opinion, almost all meaningful statistical analysis should be grounded in evaluating the practical impact of the estimated effects first, and seeing if the statistical evidence backs it up. Those who put all their attention on how many asterisks (I call these people “* gazers”) are ignoring the fact that these don’t measure the sizes of the effects on a practical scale.2 An effect can be practically negligible and still have a very small P value – or practically important but have a large P value – depending on sample size and error variance. Failure to describe what is actually going on in the data is a failure to do an adequate analysis. Use lots of plots, and think about the results. For more on this, see the discussion of P values in the “basics” vignette.

為了比較其他因素的水平民珍,必須做出類似的判斷襟士。這可能有助于構建與不同角色的因素之間的其他相互作用圖。在我看來嚷量,幾乎所有有意義的統(tǒng)計分析都應該首先評估估計效果的實際影響陋桂,看看統(tǒng)計證據是否支持這一點。那些把所有注意力都放在有多少個星號(我把這些人稱為“*觀察者”)上的人忽略了這樣一個事實蝶溶,即這些星號并不能在實際尺度上衡量效應的大小嗜历。根據樣本大小和誤差方差,效應實際上可以忽略不計抖所,仍然有一個非常小的P值--或者實際上很重要梨州,但有一個很大的P值。未能描述數據中的實際情況就是沒有進行充分的分析田轧。使用大量的情節(jié)暴匠,并考慮結果。有關這方面的更多信息傻粘,請參閱 “basics” vignette中關于P值的討論每窖。

Simple contrasts

An alternative way to specify conditional contrasts or comparisons is through the use of the simple argument to contrast() or pairs(), which amounts to specifying which factors are not used as by variables. For example, consider:

指定條件對比或比較的另一種方法是使用參數 simple作為contrast()pairs(),這相當于指定哪些因素不是不用作 by變量弦悉。例如窒典,請考慮:

noise.emm <- emmeans(noise.lm, ~ size * side * type)

Then pairs(noise.emm, simple = "size") is the same as pairs(noise.emm, by = c("side", "type")).

simple的含義是,用作比較的變量稽莉。
by的含義是瀑志,不用做比較的變量。

One may specify a list for simple, in which case separate runs are made with each element of the list. Thus, pairs(noise.emm, simple = list("size", c("side", "type"))) returns two sets of contrasts: comparisons of size for each combination of the other two factors; and comparisons of side*type combinations for each size.

可以為 simple指定一個list,在這種情況下劈猪,對列表的每個元素進行單獨的運行昧甘。因此,pairs(noise.emm, simple = list("size", c("side", "type")))返回兩組對比:其他兩個因素的每個組合的大小比較岸霹;以及每個大小的side*type組合的比較疾层。

A shortcut that generates all simple main-effect comparisons is to use simple = "each". In this example, the result is the same as obtained using simple = list("size", "side", "type").

simple = "each"表示,對所有主效應進行比較贡避,也就是固定另外兩個水平痛黎,看某個水平兩兩比較的結果,注意是所有的刮吧。所以其實是三種湖饱,一種是固定size和side,比較type杀捻。一種是固定size和type井厌,比較side。一種是固定side和type致讥,比較size仅仆。等價于 simple = list("size", "side", "type").

Ordinarily, when simple is a list (or equal to "each"), a list of contrast sets is returned. However, if the additional argument combine is set to TRUE, they are all combined into one family:

通常,當simple為列表(或等于each)時垢袱,會返回對比集列表墓拜。但是,如果額外的參數combine設置為TRUE请契,結果會全部合并為一個咳榜,如下:

contrast(noise.emm, "consec", simple = "each", combine = TRUE, adjust = "mvt")
##  side type  size contrast    estimate    SE df t.ratio p.value
##  L    Std   .    M - S          1.500 0.312 24   4.811  0.0013
##  L    Std   .    L - M         -8.667 0.312 24 -27.795  <.0001
##  R    Std   .    M - S          2.500 0.312 24   8.018  <.0001
##  R    Std   .    L - M         -5.500 0.312 24 -17.639  <.0001
##  L    Octel .    M - S         -0.333 0.312 24  -1.069  0.9768
##  L    Octel .    L - M         -5.667 0.312 24 -18.174  <.0001
##  R    Octel .    M - S          0.167 0.312 24   0.535  0.9999
##  R    Octel .    L - M         -4.667 0.312 24 -14.967  <.0001
##  .    Std   S    R - L         -1.833 0.312 24  -5.880  0.0001
##  .    Std   M    R - L         -0.833 0.312 24  -2.673  0.1713
##  .    Std   L    R - L          2.333 0.312 24   7.483  <.0001
##  .    Octel S    R - L         -0.500 0.312 24  -1.604  0.7748
##  .    Octel M    R - L          0.000 0.312 24   0.000  1.0000
##  .    Octel L    R - L          1.000 0.312 24   3.207  0.0555
##  L    .     S    Octel - Std   -1.000 0.312 24  -3.207  0.0562
##  L    .     M    Octel - Std   -2.833 0.312 24  -9.087  <.0001
##  L    .     L    Octel - Std    0.167 0.312 24   0.535  0.9999
##  R    .     S    Octel - Std    0.333 0.312 24   1.069  0.9768
##  R    .     M    Octel - Std   -2.000 0.312 24  -6.414  <.0001
##  R    .     L    Octel - Std   -1.167 0.312 24  -3.742  0.0168
## 
## P value adjustment: mvt method for 20 tests

The dots (.) in this result correspond to which simple effect is being displayed. If we re-run this same call with combine = FALSE or omitted, these twenty comparisons would be displayed in three broad sets of contrasts, each broken down further by combinations of by variables, each separately multiplicity-adjusted (a total of 16 different tables).

點(.)。在該結果中對應于正在顯示哪種簡單效果爽锥。如果我們使用Combine=False或省略重新運行相同的調用涌韩,這20個比較將顯示為三組對比,每個對比由by變量的組合進一步細分氯夷,每個變量都經過單獨的多重性調整(總共16個不同的表)臣樱。

Back to Contents

Interaction contrasts

An interaction contrast is a contrast of contrasts. For instance, in the auto-noise example, we may want to obtain the linear and quadratic contrasts of size separately for each type, and compare them. Here are estimates of those contrasts:
交互對比就是對比的對比。
例如肠槽,在自動噪聲示例中擎淤,我們可能希望分別獲得每種類型的尺寸的線性和二次對比,并對它們進行比較秸仙。以下是對這些對比的估計:

contrast(emm_s.t[[1]], "poly")   ## 'by = "type"' already in previous result 
## type = Std:
##  contrast  estimate    SE df t.ratio p.value
##  linear       -5.08 0.220 24 -23.056  <.0001
##  quadratic    -9.08 0.382 24 -23.786  <.0001
## 
## type = Octel:
##  contrast  estimate    SE df t.ratio p.value
##  linear       -5.25 0.220 24 -23.812  <.0001
##  quadratic    -5.08 0.382 24 -13.311  <.0001
## 
## Results are averaged over the levels of: side

The comparison of these contrasts may be done using the interaction argument in contrast() as follows:

IC_st <- contrast(emm_s.t[[1]], interaction = c("poly", "consec"), by = NULL)
IC_st
##  size_poly type_consec estimate    SE df t.ratio p.value
##  linear    Octel - Std   -0.167 0.312 24  -0.535  0.5979
##  quadratic Octel - Std    4.000 0.540 24   7.407  <.0001
## 
## Results are averaged over the levels of: side

(Using by = NULL restores type to a primary factor in these contrasts.) The practical meaning of this is that there isn’t a statistical difference in the linear trends, but the quadratic trend for Octel is greater than for standard filter types. (Both quadratic trends are negative, so in fact it is the standard filters that have more pronounced downward curvature, as is seen in the plot.) In case you need to understand more clearly what contrasts are being estimated, the coef() method helps:

(使用BY=NULL可將類型恢復為這些對比中的主要因素。)桩盲。其實際意義是線性趨勢沒有統(tǒng)計學差異寂纪,但OCTEL的二次趨勢大于標準濾波器類型。(這兩個二次趨勢都是負的,因此實際上是標準濾鏡具有更明顯的向下曲率捞蛋,如圖所示孝冒。)。如果您需要更清楚地了解所估計的對比度拟杉,coef()方法會有所幫助:

coef(IC_st)
##   size  type c.1 c.2
## 1    S   Std   1  -1
## 2    M   Std   0   2
## 3    L   Std  -1  -1
## 4    S Octel  -1   1
## 5    M Octel   0  -2
## 6    L Octel   1   1

Note that the 4th through 6th contrast coefficients are the negatives of the 1st through 3rd – thus a comparison of two contrasts.

By the way, “type III” tests of interaction effects can be obtained via interaction contrasts:

test(IC_st, joint = TRUE)
##  df1 df2 F.ratio p.value
##    2  24  27.571  <.0001

This result is exactly the same as the F test of size:type in the anova output.

The three-way interaction may be explored via interaction contrasts too:

contrast(emmeans(noise.lm, ~ size*type*side),
         interaction = c("poly", "consec", "consec"))
##  size_poly type_consec side_consec estimate    SE df t.ratio p.value
##  linear    Octel - Std R - L          -2.67 0.624 24  -4.276  0.0003
##  quadratic Octel - Std R - L          -1.67 1.080 24  -1.543  0.1359

One interpretation of this is that the comparison by type of the linear contrasts for size is different on the left side than on the right side; but the comparison of that comparison of the quadratic contrasts, not so much. Refer again to the plot, and this can be discerned as a comparison of the interaction in the left panel versus the interaction in the right panel.
對此的一種解釋是,大小的線性對比的類型在左側與右側是不同的;但是對二次對比的比較馍资,就不是那么多了膀捷。再次參考該圖,可以將左面板中的交互與右面板中的交互進行比較來辨別拿穴。

最后泣洞,emmeans提供了一個joint_tests()函數,該函數獲取并測試模型中所有效果的交互對比默色,并將它們編譯到一個類似Type-III-ANOVA的表中:
Finally, emmeans provides a joint_tests() function that obtains and tests the interaction contrasts for all effects in the model and compiles them in one Type-III-ANOVA-like table:

joint_tests(noise.lm)
##  model term     df1 df2 F.ratio p.value
##  size             2  24 893.190  <.0001
##  type             1  24  72.429  <.0001
##  side             1  24   0.048  0.8291
##  size:type        2  24  27.571  <.0001
##  size:side        2  24  44.333  <.0001
##  type:side        1  24   1.190  0.2861
##  size:type:side   2  24  10.333  0.0006

You may even add by variable(s) to obtain separate ANOVA tables for the remaining factors:

joint_tests(noise.lm, by = "side")
## side = L:
##  model term df1 df2 F.ratio p.value
##  size         2  24 651.714  <.0001
##  type         1  24  46.095  <.0001
##  size:type    2  24  23.524  <.0001
## 
## side = R:
##  model term df1 df2 F.ratio p.value
##  size         2  24 285.810  <.0001
##  type         1  24  27.524  <.0001
##  size:type    2  24  14.381  0.0001

Back to Contents

Multivariate contrasts

多變量對比

在前面的章節(jié)中球凰,我們處理相互作用的因素的方法是在其他因素的水平上分別對某些因素()進行比較或對比。這導致了大量的估計和相關的測試腿宰。
In the preceding sections, the way we addressed interacting factors was to do comparisons or contrasts of some factors()) separately at levels of other factor(s). This leads to a lot of estimates and associated tests.

Another approach is to compare things in a multivariate way. In the auto-noise example, for example, we have four means (corresponding to the four combinations of type and size) with each size of car, and we could consider comparing these sets of means. Such multivariate comparisons can be done via the Mahalanobis distance (a kind of standardized distance measure) between one set of four means and another. This is facilitated by the mvcontrast() function:
另一種方法是用多變量的方式進行比較呕诉。例如,在汽車噪聲的例子中吃度,我們有四個平均值(對應于類型和大小的四種組合)來處理每種大小的汽車甩挫,我們可以考慮比較這幾組平均值。這種多變量比較可以通過一組四個均值與另一個均值之間的馬氏距離(一種標準化距離度量)來完成规肴。這可以通過mvcontrast()函數來實現:

mvcontrast(noise.emm, "pairwise", mult.name = c("type", "side"))
##  contrast T.square df1 df2 F.ratio p.value
##  S - M      88.857   4  21  19.438  <.0001
##  S - L    1199.429   4  21 262.375  <.0001
##  M - L    1638.000   4  21 358.312  <.0001
## 
## P value adjustment: sidak

In this output, the T.square values are Hotelling’s <nobr aria-hidden="true" style="transition: none 0s ease 0s; border: 0px; padding: 0px; margin: 0px; max-width: none; max-height: none; min-width: 0px; min-height: 0px; vertical-align: 0px; line-height: normal; text-decoration: none; white-space: nowrap !important;">T2</nobr><math xmlns="http://www.w3.org/1998/Math/MathML"><msup><mi>T</mi><mn>2</mn></msup></math> statistics, which are the squared Mahalanobis distances among the sets of four means. These results thus accomplish a similar objective as the initial comparisons presented in this vignette, but are not complicated by the issue that the factors interact. (Instead, we lose the directionality of the comparisons.) While all comparisons are “significant,” the T.square values indicate that large cars are statistically most different from the other sizes.

We may still break things down using by variables. Suppose, for example, we wish to compare the two filter types for each size of car, without regard to which side:

update(mvcontrast(noise.emm, "consec", mult.name = "side", by = "size"), 
       by = NULL)
##  contrast    size T.square df1 df2 F.ratio p.value
##  Octel - Std S      11.429   2  23   5.476  0.0113
##  Octel - Std M     123.714   2  23  59.280  <.0001
##  Octel - Std L      14.286   2  23   6.845  0.0047
## 
## P value adjustment: sidak

One detail to note about multivariate comparisons: in order to make complete sense, all the factors involved must interact. Suppose we were to repeat the initial multivariate comparison after removing all interactions:

mvcontrast(update(noise.emm, submodel = ~ side + size + type), 
           "pairwise", mult.name = c("type", "side"))
##  contrast T.square df1 df2  F.ratio p.value
##  S - M      37.786   1  24   37.786  <.0001
##  S - L    1098.286   1  24 1098.286  <.0001
##  M - L    1543.500   1  24 1543.500  <.0001
## 
## P value adjustment: sidak 
## NOTE: Some or all d.f. are reduced due to singularities

Note that each F ratio now has 1 d.f. Also, note that T.square = F.ratio, and you can verify that these values are equal to the squares of the t.ratios in the initial example in this vignette ((?6.147)2=37.786, etc.). That is, if we ignore all interactions, the multivariate tests are exactly equivalent to the univariate tests of the marginal means.

Back to Contents

Interactions with covariates

與協(xié)變量的交互作用

當一個協(xié)變量和一個因素相互作用時捶闸,我們通常不需要EMMs本身,而是對每個因素水平的協(xié)變量趨勢的斜率進行估計拖刃。舉個簡單的例子删壮,考慮纖維數據集,并擬合一個包含直徑(協(xié)變量)和機器(因子)之間交互作用的模型:
When a covariate and a factor interact, we typically don’t want EMMs themselves, but rather estimates of slopes of the covariate trend for each level of the factor. As a simple example, consider the fiber dataset, and fit a model including the interaction between diameter (a covariate) and machine (a factor):

fiber.lm <- lm(strength ~ diameter*machine, data = fiber)

This model comprises fitting, for each machine, a separate linear trend for strength versus diameter. Accordingly, we can estimate and compare the slopes of those lines via the emtrends() function:

emtrends(fiber.lm, pairwise ~ machine, var = "diameter")
## $emtrends
##  machine diameter.trend    SE df lower.CL upper.CL
##  A                1.104 0.194  9    0.666     1.54
##  B                0.857 0.224  9    0.351     1.36
##  C                0.864 0.208  9    0.394     1.33
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate    SE df t.ratio p.value
##  A - B     0.24714 0.296  9   0.835  0.6919
##  A - C     0.24008 0.284  9   0.845  0.6863
##  B - C    -0.00705 0.306  9  -0.023  0.9997
## 
## P value adjustment: tukey method for comparing a family of 3 estimates

We see the three slopes, but no two of them test as being statistically different.

To visualize the lines themselves, you may use

emmip(fiber.lm, machine ~ diameter, cov.reduce = range)
image.png

The cov.reduce = range argument is passed to ref_grid(); it is needed because by default, each covariate is reduced to only one value (see the “basics” vignette). Instead, we call the range() function to obtain the minimum and maximum diameter.
將cov.duce=range參數傳遞給ref_grid()兑牡;它是必需的央碟,因為在默認情況下,每個協(xié)變量只減少到一個值(請參閱“基本”插頁)均函。相反亿虽,我們調用range()函數來獲取最小和最大直徑。

對于更復雜的示例苞也,請考慮包中包含的橙子數據集洛勉。這些數據涉及兩種橙子的銷售情況。價格(price1和price2)在不同的商店和不同的時間進行了實驗變化如迟,并觀察了Sales1和Sales2的反應收毫。讓我們考慮這些數據的三個多變量模型攻走,對天數和商店有相加效應,并對價格進行不同程度的擬合:
For a more sophisticated example, consider the oranges dataset included with the package. These data concern the sales of two varieties of oranges. The prices (price1 and price2) were experimentally varied in different stores and different days, and the responses sales1 and sales2 were observed. Let’s consider three multivariate models for these data, with additive effects for days and stores, and different levels of fitting on the prices:

org.quad <- lm(cbind(sales1, sales2) ~ poly(price1, price2, degree = 2)
                                       + day + store, data = oranges)
org.int <- lm(cbind(sales1, sales2) ~ price1 * price2 + day + store, data = oranges)
org.add <- lm(cbind(sales1, sales2) ~ price1 + price2 + day + store, data = oranges)

Being a multivariate model, emmeans methods will distinguish the responses as if they were levels of a factor, which we will name “variety”. Moreover, separate effects are estimated for each multivariate response, so there is an implied interaction between variety and each of the predictors involving price1 and price2. (In org.int, there is an implied three-way interaction.) An interesting way to view these models is to look at how they predict sales of each variety at each observed values of the prices:
作為一種多變量模型此再,EMMeans方法將區(qū)分不同的反應昔搂,就好像它們是某一因素的水平,我們將其命名為“多樣性”输拇。此外摘符,對每個多變量響應的單獨影響進行了估計,因此在品種和涉及價格1和價格2的每個預測因子之間存在隱含的交互作用策吠。(在org.int中逛裤,存在隱含的三向交互。)奴曙”鸢迹看待這些模型的一個有趣方法是,看看它們如何預測每個品種在每個觀察到的價格值下的銷售額:

emmip(org.quad, price2 ~ price1 | variety, mult.name = "variety", cov.reduce = FALSE)
image.png

The trends portrayed here are quite sensible: In the left panel, as we increase the price of variety 1, sales of that variety will tend to decrease – and the decrease will be faster when the other variety of oranges is low-priced. In the right panel, as price of variety 1 increases, sales of variety 2 will increase when it is low-priced, but could decrease also at high prices because oranges in general are just too expensive. A plot like this for org.int will be similar but all the curves will be straight lines; and the one for plot.add will have all lines parallel. In all models, though, there are implied price1:variety and price2:variety interactions, because we have different regression coefficients for the two responses.

Which model should we use? They are nested models, so they can be compared by anova():

anova(org.quad, org.int, org.add)
## Analysis of Variance Table
## 
## Model 1: cbind(sales1, sales2) ~ poly(price1, price2, degree = 2) + day + 
##     store
## Model 2: cbind(sales1, sales2) ~ price1 * price2 + day + store
## Model 3: cbind(sales1, sales2) ~ price1 + price2 + day + store
##   Res.Df Df Gen.var.   Pillai approx F num Df den Df Pr(>F)
## 1     20      22.798                                       
## 2     22  2   21.543 0.074438  0.38658      4     40 0.8169
## 3     23  1   23.133 0.218004  2.64840      2     19 0.0967

It seems like the full-quadratic model has little advantage over the interaction model. There truly is nothing magical about a P value of 0.05, and we have enough data that over-fitting is not a hazard; so I like org.int. However, what follows could be done with any of these models.

To summarize and test the results compactly, it makes sense to obtain estimates of a representative trend in each of the left and right panels, and perhaps to compare them. In turn, that can be done by obtaining the slope of the curve (or line) at the average value of price2. The emtrends() function is designed for exactly this kind of purpose. It uses a difference quotient to estimate the slope of a line fitted to a given variable. It works just like emmeans() except for requiring the variable to use in the difference quotient. Using the org.int model:

emtrends(org.int, pairwise ~ variety, var = "price1", mult.name = "variety")
## $emtrends
##  variety price1.trend    SE df lower.CL upper.CL
##  sales1        -0.749 0.171 22   -1.104   -0.394
##  sales2         0.138 0.214 22   -0.306    0.582
## 
## Results are averaged over the levels of: day, store 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast        estimate   SE df t.ratio p.value
##  sales1 - sales2   -0.887 0.24 22  -3.690  0.0013
## 
## Results are averaged over the levels of: day, store

From this, we can say that, starting with price1 and price2 both at their average values, we expect sales1 to decrease by about .75 per unit increase in price1; meanwhile, there is a suggestion of a slight increase of sales2, but without much statistical evidence. Marginally, the first variety has a 0.89 disadvantage relative to sales of the second variety.

Other analyses (not shown) with price2 set at a higher value will reduce these effects, while setting price2 lower will exaggerate all these effects. If the same analysis is done with the quadratic model, the the trends are curved, and so the results will depend somewhat on the setting for price1. The graph above gives an indication of the nature of those changes.

Similar results hold when we analyze the trends for price2:

emtrends(org.int, pairwise ~ variety, var = "price2", mult.name = "variety")
## $emtrends
##  variety price2.trend    SE df lower.CL upper.CL
##  sales1         0.172 0.102 22  -0.0404    0.384
##  sales2        -0.745 0.128 22  -1.0099   -0.480
## 
## Results are averaged over the levels of: day, store 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast        estimate    SE df t.ratio p.value
##  sales1 - sales2    0.917 0.143 22   6.387  <.0001
## 
## Results are averaged over the levels of: day, store

At the averages, increasing the price of variety 2 has the effect of decreasing sales of variety 2 while slightly increasing sales of variety 1 – a marginal difference of about .92.

Back to Contents

Summary

Interactions, by nature, make things more complicated. One must resist pressures and inclinations to try to produce simple bottom-line conclusions. Interactions require more work and more patience; they require presenting more cases – more than are presented in the examples in this vignette – in order to provide a complete picture.
從本質上講洽糟,交互會讓事情變得更加復雜炉菲。人們必須頂住壓力和傾向,試圖得出簡單的底線結論坤溃。交互需要更多的工作和更多的耐心拍霜;它們需要展示更多的案例--比這個簡介中的例子更多--以便提供一個完整的圖景。

所有簡介的索引
Index of all vignette topics


  1. I sure wish I could ask some questions about how how these data were collected; for example, are these independent experimental runs, or are some cars measured more than once? The model is based on the independence assumption, but I have my doubts.??

我當然希望我能問一些問題薪介,關于這些數據是如何收集的祠饺;例如,這些是獨立的實驗運行汁政,還是有些汽車測量了不止一次道偷?這個模型是建立在獨立性假設的基礎上的,但我對此持懷疑態(tài)度记劈。??勺鸦。

  1. You may have noticed that there are no asterisks in the ANOVA table in this vignette. I habitually opt out of star-gazing by including options(show.signif.stars = FALSE) in my .Rprofile file.??
    您可能已經注意到,在這個小插曲中目木,ANOVA表中沒有星號换途。我習慣性地通過在我的.Rprofile文件中加入選項(show.signif.star=false)來退出觀星。??
emmip(Ratio_raw_fit, Intention ~ Outcome | Interventiontype)
# 三重交互檢驗的方法有兩種:
# 1. 簡單簡單效應檢驗刽射【猓看一個因素在另外兩個因素水平的結合上的處理效應。
# 2. 簡單交互作用檢驗誓禁⌒赶ⅲ看在一個因素的各個水平上,另外兩個因素的交互作用摹恰。

# 1. 簡單簡單效應檢驗漓拾。
# Intention * Outcome * Interventiontype每一種組合下的阁最,邊際均值戒祠。
Ratio.emm <- emmeans(Ratio_raw_fit, ~ Intention * Outcome * Interventiontype)
# 先看  Intention 和 Outcome骇两,每一種組合下,Interventiontype 主效應的差異姜盈。簡單簡單效應分析
joint_tests(Ratio_raw_fit , by=c("Intention","Outcome"))
# 固定 Intention 和 Outcome低千,看每一種組合下,Interventiontype兩兩比較的差異馏颂。simple的含義是示血,用作比較的變量。
pairs(Ratio.emm, simple = "Interventiontype",adjust="bonferroni") 
# # 上一句等價于下面一句救拉,by的含義是难审,不用做比較的變量。也就是水平固定的變量亿絮。
# pairs(Ratio.emm, by = c("Intention","Outcome"),adjust="bonferroni")

# 2. 簡單交互作用檢驗告喊。
# simple參數可以指定一個list格式的參數,含義是派昧,對 list 里的每個元素進行單獨的運行黔姜。
# 比如在此例子中,運行 Interventiontype蒂萎,也就是秆吵,(固定Intention和Outcome)比較 Interventiontype 不同水平的差異。
# 運行 c("Intention", "Outcome")五慈,也就是(固定Interventiontype)對 Intention和Outcome 每一種纳寂,進行兩兩比較。
pairs(Ratio.emm, simple = list("Interventiontype", c("Intention", "Outcome")))

pairs(Ratio.emm, simple = "each")
contrast(Ratio.emm, "consec", simple = "each", adjust = "mvt")
contrast(Ratio.emm, "consec", simple = "each", combine = TRUE, adjust = "mvt")

a<-emmeans(Ratio_raw_fit,pairwise~Intention*Outcome|Interventiontype,interaction=TRUE)
a$contrasts
pairs(a$contrasts, by = NULL,adjust="bonferroni")

emmeans(Ratio_raw_fit,pairwise~Intention*Outcome|Interventiontype,interaction=FALSE,adjust="bonferroni")

joint_tests(Ratio_raw_fit , by = "Interventiontype")
               
# emms1 <- emmeans(Ratio_raw_fit, ~ Intention*Outcome|Interventiontype)
# con1 <- contrast(emms1, interaction = "pairwise")
# pairs(con1, by = NULL)
最后編輯于
?著作權歸作者所有,轉載或內容合作請聯系作者
  • 序言:七十年代末泻拦,一起剝皮案震驚了整個濱河市毙芜,隨后出現的幾起案子,更是在濱河造成了極大的恐慌聪轿,老刑警劉巖爷肝,帶你破解...
    沈念sama閱讀 206,723評論 6 481
  • 序言:濱河連續(xù)發(fā)生了三起死亡事件,死亡現場離奇詭異陆错,居然都是意外死亡灯抛,警方通過查閱死者的電腦和手機,發(fā)現死者居然都...
    沈念sama閱讀 88,485評論 2 382
  • 文/潘曉璐 我一進店門音瓷,熙熙樓的掌柜王于貴愁眉苦臉地迎上來对嚼,“玉大人,你說我怎么就攤上這事绳慎∽菔” “怎么了漠烧?”我有些...
    開封第一講書人閱讀 152,998評論 0 344
  • 文/不壞的土叔 我叫張陵,是天一觀的道長靡砌。 經常有香客問我已脓,道長,這世上最難降的妖魔是什么通殃? 我笑而不...
    開封第一講書人閱讀 55,323評論 1 279
  • 正文 為了忘掉前任度液,我火速辦了婚禮,結果婚禮上画舌,老公的妹妹穿的比我還像新娘堕担。我一直安慰自己,他們只是感情好曲聂,可當我...
    茶點故事閱讀 64,355評論 5 374
  • 文/花漫 我一把揭開白布霹购。 她就那樣靜靜地躺著,像睡著了一般朋腋。 火紅的嫁衣襯著肌膚如雪齐疙。 梳的紋絲不亂的頭發(fā)上,一...
    開封第一講書人閱讀 49,079評論 1 285
  • 那天乍丈,我揣著相機與錄音剂碴,去河邊找鬼。 笑死轻专,一個胖子當著我的面吹牛忆矛,可吹牛的內容都是我干的。 我是一名探鬼主播请垛,決...
    沈念sama閱讀 38,389評論 3 400
  • 文/蒼蘭香墨 我猛地睜開眼催训,長吁一口氣:“原來是場噩夢啊……” “哼!你這毒婦竟也來了宗收?” 一聲冷哼從身側響起漫拭,我...
    開封第一講書人閱讀 37,019評論 0 259
  • 序言:老撾萬榮一對情侶失蹤,失蹤者是張志新(化名)和其女友劉穎混稽,沒想到半個月后采驻,有當地人在樹林里發(fā)現了一具尸體,經...
    沈念sama閱讀 43,519評論 1 300
  • 正文 獨居荒郊野嶺守林人離奇死亡匈勋,尸身上長有42處帶血的膿包…… 初始之章·張勛 以下內容為張勛視角 年9月15日...
    茶點故事閱讀 35,971評論 2 325
  • 正文 我和宋清朗相戀三年礼旅,在試婚紗的時候發(fā)現自己被綠了。 大學時的朋友給我發(fā)了我未婚夫和他白月光在一起吃飯的照片洽洁。...
    茶點故事閱讀 38,100評論 1 333
  • 序言:一個原本活蹦亂跳的男人離奇死亡痘系,死狀恐怖,靈堂內的尸體忽然破棺而出饿自,到底是詐尸還是另有隱情汰翠,我是刑警寧澤龄坪,帶...
    沈念sama閱讀 33,738評論 4 324
  • 正文 年R本政府宣布,位于F島的核電站复唤,受9級特大地震影響健田,放射性物質發(fā)生泄漏。R本人自食惡果不足惜苟穆,卻給世界環(huán)境...
    茶點故事閱讀 39,293評論 3 307
  • 文/蒙蒙 一抄课、第九天 我趴在偏房一處隱蔽的房頂上張望。 院中可真熱鬧雳旅,春花似錦、人聲如沸间聊。這莊子的主人今日做“春日...
    開封第一講書人閱讀 30,289評論 0 19
  • 文/蒼蘭香墨 我抬頭看了看天上的太陽哎榴。三九已至型豁,卻和暖如春,著一層夾襖步出監(jiān)牢的瞬間尚蝌,已是汗流浹背迎变。 一陣腳步聲響...
    開封第一講書人閱讀 31,517評論 1 262
  • 我被黑心中介騙來泰國打工, 沒想到剛下飛機就差點兒被人妖公主榨干…… 1. 我叫王不留飘言,地道東北人衣形。 一個月前我還...
    沈念sama閱讀 45,547評論 2 354
  • 正文 我出身青樓,卻偏偏與公主長得像姿鸿,于是被迫代替她去往敵國和親谆吴。 傳聞我的和親對象是個殘疾皇子,可洞房花燭夜當晚...
    茶點故事閱讀 42,834評論 2 345

推薦閱讀更多精彩內容

  • 我們將微生物群落組成研究分為兩個主要部分:(1)分類多樣性苛预、OTUS和類群的假設檢驗句狼;(2)類群間差異分析。第一個...
    ZMQ要加油呀閱讀 1,836評論 0 2
  • 微生物群落研究的主要目標是比較不同群落的組成(β多樣性)热某。在第6章介紹了β多樣性腻菇,并舉例說明了如何計算β多樣性指數...
    ZMQ要加油呀閱讀 3,712評論 0 5
  • 方差分析的基本思想及應用條件 方差分析的基本思想 在進行科學研究時,有時要按實驗設計將所研究的對象分為多個處理組進...
    backup備份閱讀 7,464評論 0 10
  • 原英文文檔地址:https://raw.githubusercontent.com/kassambara/rsta...
    王詩翔閱讀 6,696評論 0 15
  • 20180502(從有道遷移) 方差分析 當包含的因子是解釋變量時昔馋,我們關注的重點通常會從預測轉向組別差異的分析筹吐,...
    KrisKC閱讀 552評論 0 0