Replace compIntercepts() with emmeans()

Derek H. Ogle

Note

The following packages are loaded for use below.

library(dplyr)    ## for filter(), mutate()
library(emmeans)  ## for emmeans()

Warning

Some functions illustrated below were in the FSA package but have now been removed and put into the non-released FSAmisc package that I maintain. These functions are used below only to show what could be done in older versions of FSA but should now be done as described in this post. DO NOT USE any of the functions below that begin with FSAmisc::.

Introduction

compIntercepts() in FSA prior to v0.9.0 was used to statistically compare intercepts for all pairs of groups in an indicator/dummy variable regression (IVR). However, the excellent emmeans() in emmmeans is a more general and strongly principled function for this purpose. As such, compIntercepts() was removed from FSA in early 2022. The purpose of this post is to demonstrate how to use emmeans() for the same purpose for which compIntercepts() was used.

Important

The results from compIntercepts() and emmeans() will not be identical because they use different methods to correct for multiple comparisons when comparing pairs of slopes.

Example Data

Examples below use the Mirex data set from FSA, which contains the concentration of mirex in the tissue and the body weight of two species of salmon (chinook and coho) captured in six years. The year variable is converted to a factor for modeling purposes. To keep the presentation simple, data from only three years will be used here.

data(Mirex,package="FSA")
Mirex <- Mirex |>
  filter(year<1990) |>
  mutate(year=factor(year))
head(Mirex)

#R|    year weight mirex species
#R|  1 1977   0.41  0.16 chinook
#R|  2 1977   0.45  0.19 chinook
#R|  3 1977   1.04  0.19 chinook
#R|  4 1977   1.09  0.10    coho
#R|  5 1977   1.24  0.13 chinook
#R|  6 1977   1.25  0.19 chinook

The lm() below fits the IVR to determine if the relationship between mirex concentration and weight of the salmon differs by year.¹

¹ The three terms on the left side of the formula are the covariate (i.e., weight), main factor (i.e., year), and the interaction between the two (defined with the :).

lm1 <- lm(mirex~weight+year+weight:year,data=Mirex)

The weight:year interaction term p-value suggests that the slopes (i.e., relationship between mirex concentration and salmon weight) do not differ among the three years. However, the year term p-value suggests that the intercepts of at least one pair of these parallel lines DO differ.²

² The weight term p-value suggests that there is a significant relationship between mirex concentration and salmon weight, regardless of which year is considered.

anova(lm1)

#R|  Analysis of Variance Table
#R|  
#R|  Response: mirex
#R|              Df  Sum Sq Mean Sq F value    Pr(>F)    
#R|  weight       1 0.32844 0.32844 89.9408 6.064e-14 ***
#R|  year         2 0.05719 0.02859  7.8306 0.0008881 ***
#R|  weight:year  2 0.00089 0.00044  0.1218 0.8855178    
#R|  Residuals   66 0.24101 0.00365                      
#R|  ---
#R|  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The next step is to determine which pair(s) of intercepts differ significantly. A model that does not include the insignificant interaction term is needed by both compIntercepts() and emmeans() to properly answer this question. That model is fit below.

lm1_noint <- lm(mirex~weight+year,data=Mirex)

What `compIntercepts()` Did

compIntercepts() was simple in that it only required the saved lm() object as an argument. Its returned results should be assigned to an object for further examination.³

³ compIntercepts() had a print() function for nicely printing the results. However, here we will look at each component separately to ease comparison with the emmeans() results.

cifsa <- FSAmisc::compIntercepts(lm1_noint)

The $comparisons component in this saved object contains the results from comparing all pairs of intercepts. Each paired comparison is a row in these results with the groups being compared under comparison, the differences in sample intercepts under diff, 95% confidence intervals for the difference in intercepts under 95% LCI and 95% UCI, and adjusted (for multiple comparisons) p-values for the hypothesis test comparing the intercepts under p.adj.

cifsa$comparisons

#R|    comparison        diff     95% LCI     95% UCI        p.adj
#R|  1  1982-1977 -0.05418782 -0.09512956 -0.01324608 0.0063507871
#R|  2  1986-1977 -0.06550789 -0.10644963 -0.02456615 0.0007997388
#R|  3  1986-1982 -0.01132007 -0.05226181  0.02962167 0.7860342734

For example, these results suggest that the intercepts for 1982 and 1977 ARE statistically different (first row), but the intercepts for 1986 and 1982 are NOT statistically different (last row).

The $smeans component in this saved object contains the mean value of the response variable predicted at the mean value of the covariate. For example, the results below show the predicted mean mirex concentration at the overall mean salmon weight (i.e., 3.782083 kg).

cifsa$means

#R|       1977      1982      1986 
#R|  0.2379541 0.1837663 0.1724462

Because the lines are known to be parallel, differences in intercepts also represent differences in predicted means of the response at all other values of the covariate. compIntercepts() defaulted to show these means at the mean (i.e., center) of the covariate. This could be adjusted with common.cov= in compIntercepts(). For example, the actual intercepts are shown below.

cifsa2 <- FSAmisc::compIntercepts(lm1_noint,common.cov=0)
cifsa2$means

#R|        1977       1982       1986 
#R|  0.13688994 0.08270212 0.07138205

What `emmeans()` Does

Similar results can be obtained with emmeans() from emmeans using the fitted lm() object (without the interaction term) as the first argument and a specs= argument with pairwise~ followed by the name of the factor variable from the lm() model (year in this case).

ci <- emmeans(lm1_noint,specs=pairwise~year)

The object saved from emmeans() is then given as the first argument to summary(), which also requires infer=TRUE if you would like p-values to be calculated.[^pvalues]

cis <- summary(ci,infer=TRUE)

The $contrasts component in this saved object contains the results for comparing all pairs of predicted means at the overall mean of the covariate. Each paired comparison is a row with the groups compared under contrast, the difference in predicted means under estimate, the standard error of the difference in predicted means under SE, the degrees-of-freedom under df, a 95% confidence interval for the difference in predicted means under lower.CL and upper.CL, and the t test statistic and p-value adjusted for multiple comparisons for testing a difference in predicted means under t.ratio and p.value, respectively.

cis$contrasts

#R|   contrast            estimate     SE df lower.CL upper.CL t.ratio p.value
#R|   year1977 - year1982   0.0542 0.0173 68   0.0128   0.0956   3.139  0.0070
#R|   year1977 - year1986   0.0655 0.0175 68   0.0235   0.1075   3.736  0.0011
#R|   year1982 - year1986   0.0113 0.0173 68  -0.0302   0.0529   0.653  0.7913
#R|  
#R|  Confidence level used: 0.95 
#R|  Conf-level adjustment: tukey method for comparing a family of 3 estimates 
#R|  P value adjustment: tukey method for comparing a family of 3 estimates

Comparing these results to the $comparison component from compIntercepts() shows that the difference in sample intercepts or predicted means are the same, though the signs differ because the subtraction was reversed. The confidence interval values and p-values are slightly different. Again, this is due to emmeans() and compIntercepts() using different methods of adjusting for multiple comparisons. These differences did not result in different conclusions in this case, but they could, especially if the p-values are near the rejection criterion.

The $emmeans component contains results for predicted means for each group with the groups under the name of the factor variable (year in this example), the predicted means under emmean, standard errors of the predicted means under SE, degrees-of-freedom under df, 95% confidence intervals for the predicted mean under lower.CL and upper.CL, and t test statistics and p-values adjusted for multiple comparisons for testing that the predicted mean is not equal to zero under t.ratio and p.adj, respectively. While it is not obvious here, these predict means of the response variable are at the mean of the covariate, as they were for compIntercepts().

cis$emmeans

#R|   year emmean     SE df lower.CL upper.CL t.ratio p.value
#R|   1977  0.238 0.0123 68    0.213    0.262  19.392  <.0001
#R|   1982  0.184 0.0122 68    0.159    0.208  15.091  <.0001
#R|   1986  0.172 0.0123 68    0.148    0.197  14.015  <.0001
#R|  
#R|  Confidence level used: 0.95

Here the predicted means match exactly (within rounding) with those in the $means component of compIntercepts().

The means can be predicted at any other “summary” value of the covariate using cov.reduce= in emmeans(). For example, the predicted values at the minimum value of the covariate are obtained below.

ci2 <- emmeans(lm1_noint,specs=pairwise~year,cov.reduce=min)
cis2 <- summary(ci2,infer=TRUE)
cis2$emmeans

#R|   year emmean     SE df lower.CL upper.CL t.ratio p.value
#R|   1977 0.1460 0.0143 68   0.1174    0.175  10.181  <.0001
#R|   1982 0.0918 0.0151 68   0.0617    0.122   6.097  <.0001
#R|   1986 0.0805 0.0163 68   0.0479    0.113   4.928  <.0001
#R|  
#R|  Confidence level used: 0.95

The following will compute predicted means that represent the actual intercepts.

ci3 <- emmeans(lm1_noint,specs=pairwise~year,cov.reduce=function(x) 0)
cis3 <- summary(ci3,infer=TRUE)
cis3$emmeans

#R|   year emmean     SE df lower.CL upper.CL t.ratio p.value
#R|   1977 0.1369 0.0148 68   0.1073    0.166   9.229  <.0001
#R|   1982 0.0827 0.0156 68   0.0516    0.114   5.301  <.0001
#R|   1986 0.0714 0.0169 68   0.0376    0.105   4.213  0.0001
#R|  
#R|  Confidence level used: 0.95

Conclusion

emmeans() in emmeans provides a more general solution to comparing multiple slopes than what was used in compIntercepts() in FSA prior to v0.9.0. As compIntercepts() was removed from FSA in 2022, you should now use emmeans() for this purpose.

emmeans has extensive vignettes that further explain its use. Their “Basics” vignette is also useful.

In a previous post I demonstrated how to use emtrends() from emmeans to replace compSlopes(), which was also removed from FSA.

Note

This change to FSA does not affect anything in Ogle (2016).

References

Ogle, D. H. 2016. Introductory Fisheries Analyses with R. CRC Press, Boca Raton, FL.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{h._ogle2021,
  author = {H. Ogle, Derek},
  title = {Replace {compIntercepts()} with Emmeans()},
  date = {2021-05-12},
  url = {https://fishr-core-team.github.io/fishR/blog/posts/2021-5-12_compIntercepts-replacement/},
  langid = {en}
}

For attribution, please cite this work as:

H. Ogle, D. 2021, May 12. Replace compIntercepts() with emmeans(). https://fishr-core-team.github.io/fishR/blog/posts/2021-5-12_compIntercepts-replacement/.