3
$\begingroup$

I am using R.

I have the following data set:

df <- data.frame(
  Plot = 1:16,
  Voles = rep(c("ja", "nein"), each = 8),
  Mulch = rep(rep(c("ja", "nein"), each = 4), 2),
  SoilTemperature = c(18.1, 17.5, 18.4, 18.2, 17.8, 17.3, 18.0, 17.9,
                      18.3, 17.6, 18.5, 18.1, 17.9, 17.7, 18.2, 17.8),
  SoilMoisture = c(25.2, 24.8, 26.1, 25.5, 22.9, 23.2, 24.1, 23.8,
                   26.0, 25.1, 25.8, 25.4, 23.1, 22.7, 24.0, 23.6),
  Infiltration = c(52, 48, 58, 55, 42, 39, 45, 41,
                   50, 49, 51, 50, 38, 36, 40, 39),
  Ksat = c(0.15, 0.14, 0.13, 0.15, 0.14, 0.13, 0.13, 0.12,
                   0.14, 0.13, 0.15, 0.14, 0.11, 0.10, 0.12, 0.11)
)

and want to find out whether voles and/or mulch have an effect on Ksat. I started by fitting two models and used AIC to determine which one fits better:

full <- lm(Ksat ~ Mulch * Voles, data = df)
reduced <- lm(Ksat ~ Voles + Mulch, data = df)

It turns out that the full model has a lower AIC so I proceeded with using the summary() function in R on it. However, the interaction coefficient is not significant (p > 0.05).

My question now is: In order to do a post-hoc test using the function emmeans() in R, should I still use the full model or should I switch to the reduced model since the interaction term is not significant? I am a bit confused right now because the AIC favors the full model.

$\endgroup$

1 Answer 1

5
$\begingroup$

Model uncertainty is in play. When you try different models, pick one, and do the analysis as if that was the only model and it was completely pre-specified, uncertainties (standard errors, etc.) will be underestimated and p-values will be too low. Despite that I do recommend using a single AIC assessment in model selection as not being an awful idea. You are wanting to take this one more step, which creates a more serious form of model uncertainty.

If you thought pre-analysis that you were not certain that interactions are unimportant, then pre-specifying a model with interactions and using that model as the basis for getting contrasts of interest, along with their confidence intervals or Bayesian uncertainty intervals, is recommended.

It is not too hard to come up with meaningful contrasts that apply whether or not an interaction exists. For example in the R rms package you might fit a model that is nonlinear in age with an age by sex interaction. To contrast males and females at age 50 you would do contrast(fit, list(sex=‘female’, age=50), list(sex=‘male’, age=50)). Use e.g. age=mean(age) to use the mean age.

You can also form contrasts that estimate average sex effect over a specific interval of sex. The emmeans package will probably facilitate what you want.

For more about model uncertainty and contrasts see RMS.

$\endgroup$
2
  • $\begingroup$ First of all thank you for your reply. So does that mean, despite the non-significant interaction, I should stick with the full model because it has the lower AIC? $\endgroup$
    – Faith
    Commented 7 hours ago
  • $\begingroup$ Yes, and because you are more likely not to bias standard errors downward. The most important step is analysis planning, which you had already done. This is where you decide on how many degrees of freedom you want to allow, among other things. Sticking to the plan will preserve the meaning of statistical quantities, for example, in linear models will give you an unbiased estimate of $\sigma^2$ by dividing by $N$ minus the correct number of opportunities for $\hat{\sigma}^2$ to be small. $\endgroup$ Commented 2 hours ago

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.