- A simple example: fitting the yield of oats
- Beyond the simple case
- 10 practical considerations
- General conclusion
2021-04-14
library(spaMM)
print(citation("spaMM"), style = "text")
## Rousset F, Ferdy J (2014). "Testing environmental and genetic effects ## in the presence of spatial autocorrelation." _Ecography_, *37*(8), ## 781-790. <URL: http://dx.doi.org/10.1111/ecog.00566>.
I will use the R package spaMM for fitting LMs and LMMs.
The R package spaMM can fit LMs, LMMs, GLMs, GLMMs and more, using the same function and producing consistent outputs irrespective of the model.
Well-known alternatives for fitting LMMs include the packages lme4 and nlme (both mentioned later in this document).
data("Oats", package = "nlme")  ## load a dataset from the pkg nlme
Oats$Block <- factor(Oats$Block,   ## original data from nlme have ordered factors,
                     ordered = FALSE,  ## so we remove the ordering not to trigger unusual contrasts
                     levels = c("I", "II", "III", "IV", "V", "VI"))
head(Oats, n = 14)  ## display first 14 rows
## Grouped Data: yield ~ nitro | Block ## Block Variety nitro yield ## 1 I Victory 0.0 111 ## 2 I Victory 0.2 130 ## 3 I Victory 0.4 157 ## 4 I Victory 0.6 174 ## 5 I Golden Rain 0.0 117 ## 6 I Golden Rain 0.2 114 ## 7 I Golden Rain 0.4 161 ## 8 I Golden Rain 0.6 141 ## 9 I Marvellous 0.0 105 ## 10 I Marvellous 0.2 140 ## 11 I Marvellous 0.4 118 ## 12 I Marvellous 0.6 156 ## 13 II Victory 0.0 61 ## 14 II Victory 0.2 91
Oats$yield
## [1] 111 130 157 174 117 114 161 141 105 140 118 156 61 91 97 100 70 108 126 ## [20] 149 96 124 121 144 68 64 112 86 60 102 89 96 89 129 132 124 74 89 ## [39] 81 122 64 103 132 133 70 89 104 117 62 90 100 116 80 82 94 126 63 ## [58] 70 109 99 53 74 118 113 89 82 86 104 97 99 119 121
Oats$nitro
## [1] 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 ## [20] 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 ## [39] 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 ## [58] 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6
table(Oats$Variety, Oats$Block)
## ## I II III IV V VI ## Golden Rain 4 4 4 4 4 4 ## Marvellous 4 4 4 4 4 4 ## Victory 4 4 4 4 4 4
coplot(yield ~ nitro | Variety + Block, data = Oats) ## simple native plot showing it all
(fit_oats_fixed <- fitme(yield ~ nitro + Variety + Block, data = Oats))
## formula: yield ~ nitro + Variety + Block ## ML: Estimation of phi by ML. ## Estimation of fixed effects by ML. ## family: gaussian( link = identity ) ## ------------ Fixed effects (beta) ------------ ## Estimate Cond. SE t-value ## (Intercept) 113.761 5.287 21.516 ## nitro 73.667 7.553 9.753 ## VarietyMarvellous 5.292 4.137 1.279 ## VarietyVictory -6.875 4.137 -1.662 ## BlockII -28.083 5.851 -4.800 ## BlockIII -39.417 5.851 -6.737 ## BlockIV -37.167 5.851 -6.352 ## BlockV -44.417 5.851 -7.592 ## BlockVI -39.083 5.851 -6.680 ## -------------- Residual variance ------------- ## Coefficients for log(phi) ~ 1 : ## Estimate Cond. SE ## (Intercept) 5.325 0.1667 ## Estimate of phi=residual var: 205.4 ## ------------- Likelihood values ------------- ## logLik ## p(h) (Likelihood): -293.8599
(fit_oats_mixed <- fitme(yield ~ nitro + Variety + (1|Block), data = Oats))
## formula: yield ~ nitro + Variety + (1 | Block) ## Estimation of fixed effects by ML. ## Estimation of lambda and phi by 'outer' ML, maximizing p_v. ## family: gaussian( link = identity ) ## ------------ Fixed effects (beta) ------------ ## Estimate Cond. SE t-value ## (Intercept) 82.400 6.969 11.823 ## nitro 73.667 7.889 9.338 ## VarietyMarvellous 5.292 4.321 1.225 ## VarietyVictory -6.875 4.321 -1.591 ## --------------- Random effects --------------- ## Family: gaussian( link = identity ) ## --- Variance parameters ('lambda'): ## lambda = var(u) for u ~ Gaussian; ## Block : 201.8 ## # of obs: 72; # of groups: Block, 6 ## -------------- Residual variance ------------- ## phi estimate was 224.059 ## ------------- Likelihood values ------------- ## logLik ## p_v(h) (marginal L): -304.3989
The effect of Block is modelled by:

- in fit_oats_fixed, one fixed-effect estimate corresponding to each block level (but the one from the intercept)
- in fit_oats_mixed, a single variance corresponding to the variable Block
The variance is that of the distribution from which the block effects (the deviations among block means) are drawn.

Note: the distribution is centered around 0 (by construction) and considered Gaussian.

So, whereas in fit_oats_fixed each block modifies the average yield by a fixed estimated amount compared to the average yield of block I, in fit_oats_mixed each block modifies the average yield by a specific (unknown) value drawn from \(\mathcal{N}(\mu = 0, \sigma = 14.21)\).

Note: \(14.21 = \sqrt{201.82}\)
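A quick sanity check of this arithmetic (a sketch using spaMM's VarCorr() extractor on the fit above; the object name vc is mine):

vc <- VarCorr(fit_oats_mixed)  ## one row per variance component
sqrt(vc[vc$Group == "Block", "Variance"])  ## ~ 14.21, the sd of the block effects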
To infer the effect of blocks, you can extract the values predicted for each sampled level:

ranef(fit_oats_mixed)
## $`( 1 | Block )` ## I II III IV V VI ## 28.705380 3.000208 -7.373393 -5.313928 -11.949981 -7.068287 ## ## attr(,"class") ## [1] "ranef" "list"
Note: this is also possible in LM (although the effects are then fitted, not predicted)
To infer the effect of blocks, you can also compare conditional predictions with the prediction based on the fixed effects alone. For a sampled level:

fixpred_V0N <- predict(fit_oats_mixed,
                       newdata = data.frame(Variety = "Victory", nitro = 0),
                       re.form = NA)[[1]]

## sampled level (-> same as ranef())
predict(fit_oats_mixed,
        newdata = data.frame(Block = "II", Variety = "Victory", nitro = 0))[[1]] - fixpred_V0N
## [1] 3.000208
get_intervals(fit_oats_mixed, newdata = data.frame(Block = "II", Variety = "Victory", nitro = 0)) - fixpred_V0N
## predVar_0.025 predVar_0.975 ## 1 -7.592184 13.5926
The same comparison can be made for a non-sampled level (reusing fixpred_V0N from above):

## non-sampled level
predict(fit_oats_mixed,
        newdata = data.frame(Block = "VII", Variety = "Victory", nitro = 0))[[1]] - fixpred_V0N
## [1] 0
get_intervals(fit_oats_mixed, newdata = data.frame(Block = "VII", Variety = "Victory", nitro = 0)) - fixpred_V0N
## predVar_0.025 predVar_0.975 ## 1 -31.01404 31.01404
Note: random effects are shrunk towards the mean (i.e. 0). The shrinkage increases with the uncertainty associated with the random-effect predictions, so more observations per block would reduce it! Here all blocks are shrunk by the same factor because the data are balanced across blocks.
See also: https://m-clark.github.io/posts/2019-05-14-shrinkage-in-mixed-models/
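To see the shrinkage at play, here is a minimal sketch (object names are mine) comparing the raw block deviations (block means centred on the grand mean, which is meaningful here because the design is balanced) with the predicted random effects:

raw_dev <- tapply(Oats$yield, Oats$Block, mean) - mean(Oats$yield)  ## raw deviations
round(cbind(raw = raw_dev,
            predicted = ranef(fit_oats_mixed)[[1]],
            ratio = ranef(fit_oats_mixed)[[1]] / raw_dev), 2)  ## constant ratio < 1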
This simple example already illustrates key points about mixed models:
they allow for the estimation of one variance (or several, in more complex models)
fewer parameters need to be estimated
\(\rightarrow\) having a large number of blocks would not be an issue in LMM, but it would be in LM
one additional assumption: the random effects are drawn from a Gaussian distribution
the more levels you have, the more accurate the estimate of the variance
the more observations per level you have, the more accurate the prediction of each random effect
predictions & simulations for unobserved levels are possible (see the sketch after this list)
predictions for an average block are directly given by the fixed effects (in LMM only; not in GLMM)
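For instance, here is a sketch of the last two points, assuming that simulate() handles a new level the way predict() does (drawing a fresh random effect for it):

## two simulated yields for a new block "VII" (absent from the original data)
simulate(fit_oats_mixed,
         newdata = data.frame(Block = "VII", Variety = "Victory", nitro = 0),
         nsim = 2)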
fitme(yield ~ nitro + (1|Variety) + (1|Block), data = Oats) ## I would not do that here (only 3 varieties!)
## formula: yield ~ nitro + (1 | Variety) + (1 | Block) ## Estimation of fixed effects by ML. ## Estimation of lambda and phi by 'outer' ML, maximizing p_v. ## family: gaussian( link = identity ) ## ------------ Fixed effects (beta) ------------ ## Estimate Cond. SE t-value ## (Intercept) 81.87 7.174 11.41 ## nitro 73.67 8.016 9.19 ## --------------- Random effects --------------- ## Family: gaussian( link = identity ) ## --- Variance parameters ('lambda'): ## lambda = var(u) for u ~ Gaussian; ## Variety : 23.56 ## Block : 207.7 ## # of obs: 72; # of groups: Variety, 3; Block, 6 ## -------------- Residual variance ------------- ## phi estimate was 231.315 ## ------------- Likelihood values ------------- ## logLik ## p_v(h) (marginal L): -306.8688
fitme(yield ~ nitro + Variety + (nitro|Block), data = Oats)
## formula: yield ~ nitro + Variety + (nitro | Block) ## ML: Estimation of ranCoefs and phi by ML. ## Estimation of fixed effects by ML. ## Estimation of phi by 'outer' ML, maximizing p_v. ## family: gaussian( link = identity ) ## ------------ Fixed effects (beta) ------------ ## Estimate Cond. SE t-value ## (Intercept) 82.400 6.620 12.447 ## nitro 73.667 8.002 9.206 ## VarietyMarvellous 5.292 4.314 1.226 ## VarietyVictory -6.875 4.314 -1.593 ## --------------- Random effects --------------- ## Family: gaussian( link = identity ) ## --- Random-coefficients Cov matrices: ## Group Term Var. Corr. ## Block (Intercept) 173.6 ## Block nitro 11.88 1 ## # of obs: 72; # of groups: Block, 6 ## -------------- Residual variance ------------- ## phi estimate was 223.379 ## ------------- Likelihood values ------------- ## logLik ## p_v(h) (marginal L): -304.307
In our example fit_oats_mixed, the random effects were modelled as:
\[ b \sim \mathcal{N}\left(0, \begin{bmatrix} \lambda & 0 & 0 & \dots & 0 \\ 0 & \lambda & 0 & \dots & 0 \\ 0 & 0 & \lambda & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \lambda \\ \end{bmatrix}\right) \]
This representation highlights a square symmetric matrix called the covariance matrix, for which the number of rows (and of columns, since the matrix is square) is equal to the number of levels of the grouping variable.
In this simple case,
the diagonal elements of the covariance matrix are all the same and modelled as a single variance (\(\lambda\))
the off-diagonal elements of the covariance matrix are 0, which implies that the random effects associated with different levels are not correlated
Conclusion: this is a very simple random-effects structure for a mixed model
In general terms, the random effects can be modelled as:
\[ b \sim \mathcal{N}\left(0, \begin{bmatrix} c_{1,1} & c_{1,2} & c_{1,3} & \dots & c_{1,q} \\ c_{2,1} & c_{2,2} & c_{2,3} & \dots & c_{2,q} \\ c_{3,1} & c_{3,2} & c_{3,3} & \dots & c_{3,q} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_{q,1} & c_{q,2} & c_{q,3} & \dots & c_{q,q} \\ \end{bmatrix}\right) \]
Departures from the simple case: the diagonal elements need no longer be equal, and the off-diagonal elements need no longer be 0.
Note: in general, the parameters \(c_{i,j}\) are not distinctly estimated but generated by a covariance (or correlation) function (\(c_{i, j} = g(i, j, \theta)\)) which depends on a smaller number of parameters to be estimated
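As a toy illustration of this idea (plain R, not a spaMM correlation model; the function name g_ar1 is mine): an AR(1)-like function generates every \(c_{i,j}\) from a single parameter.

## hypothetical correlation function g(i, j, theta): here c_ij = phi^|i - j|
g_ar1 <- function(i, j, phi) phi^abs(i - j)
outer(1:4, 1:4, g_ar1, phi = 0.5)  ## a full 4 x 4 correlation matrix from 1 parameter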
data("Loaloa")  ## dataset shipped with spaMM
ndvi <- Loaloa[, c("maxNDVI", "latitude", "longitude")]
head(ndvi, n = 14)  ## display first 14 rows
## maxNDVI latitude longitude ## X1 0.69 5.736750 8.041860 ## X2 0.74 5.680280 8.004330 ## X3 0.79 5.347222 8.905556 ## X4 0.67 5.917420 8.100720 ## X5 0.85 5.104540 8.182510 ## X6 0.80 5.355556 8.929167 ## X7 0.78 4.885000 11.360000 ## X8 0.69 5.897800 8.067490 ## X9 0.80 5.593056 9.018056 ## X10 0.84 6.004167 9.312500 ## X11 0.82 3.350000 11.508330 ## X12 0.80 3.391667 11.670830 ## X13 0.81 4.084000 11.283000 ## X14 0.90 5.472222 8.956944
spaMMplot2D(x = ndvi$longitude, y = ndvi$latitude, z = ndvi$maxNDVI,
            add.map = TRUE, xlab = "Longitude", ylab = "Latitude",
            plot.title = title(main = "max NDVI"))
(fit_loaloa <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi))
## formula: maxNDVI ~ 1 + Matern(1 | longitude + latitude) ## ML: Estimation of corrPars, lambda and phi by ML. ## Estimation of fixed effects by ML. ## Estimation of lambda and phi by 'outer' ML, maximizing p_v. ## family: gaussian( link = identity ) ## ------------ Fixed effects (beta) ------------ ## Estimate Cond. SE t-value ## (Intercept) 0.8028 0.01625 49.4 ## --------------- Random effects --------------- ## Family: gaussian( link = identity ) ## --- Correlation parameters: ## 1.nu 1.rho ## 0.5198754 1.7720476 ## --- Variance parameters ('lambda'): ## lambda = var(u) for u ~ Gaussian; ## longitude. : 0.00341 ## # of obs: 197; # of groups: longitude., 197 ## -------------- Residual variance ------------- ## phi estimate was 0.000333393 ## ------------- Likelihood values ------------- ## logLik ## p_v(h) (marginal L): 378.3962
## Euclidean distance (in degrees) is the default distance method, as used by fit_loaloa
par(mar = c(4, 4, 0.1, 0.1))
plot(seq(0, 3, by = 0.1),
     MaternCorr(d = seq(0, 3, by = 0.1),
                rho = get_ranPars(fit_loaloa, which = "corrPars")[[1]]$rho,
                nu = get_ranPars(fit_loaloa, which = "corrPars")[[1]]$nu),
     type = "l", xlab = "Distance (degree)", ylab = "Correlation",
     ylim = c(0, 1), las = 1)
fit_loaloa_km <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi,
                       control.dist = list(dist.method = "Earth"))  ## distances in km
par(mar = c(4, 4, 0.1, 0.1))
plot(seq(0, 350, by = 0.1),
     MaternCorr(d = seq(0, 350, by = 0.1),
                rho = get_ranPars(fit_loaloa_km, which = "corrPars")[[1]]$rho,
                nu = get_ranPars(fit_loaloa_km, which = "corrPars")[[1]]$nu),
     type = "l", xlab = "Distance (km)", ylab = "Correlation",
     ylim = c(0, 1), las = 1)
filled.mapMM(fit_loaloa, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))
This example, fit_loaloa, illustrates additional benefits of mixed models:
the effect of continuous processes can also be modelled by random effects
particular structures of the covariance matrix of the random effects can be used to relax the assumption of independence in LM
mixed models can be used to estimate parameters of correlation (or covariance) functions, which has many applications:
estimating spatial and temporal processes
estimating various components of phenotypic variation (animal model)
estimating evolutionary models in phylogenetic analyses
(Note: other methods such as GLS can do that too, but mixed models are statistically more efficient)
If you want to use a correlation function, or predict levels that you did not sample \(\rightarrow\) RANDOM

Otherwise, consider the number of levels of the grouping variable: a large number \(\rightarrow\) RANDOM; a small number \(\rightarrow\) FIXED.
Note: whether a number is large or small must be assessed in comparison to the number of observations, but \(\leq 4\) will almost certainly qualify as small
When it is not clear, consider that:
In mixed models, we have more than one variance term.
For example, in fit_oats_mixed we have two variances:
VarCorr(fit_oats_mixed)
## Group Term Variance Std.Dev. ## 1 Block (Intercept) 201.8184 14.20628 ## 2 Residual (Intercept) 224.0592 14.96861
Pay attention not to mix up (in your mind) the variance(s) of the random effects and the residual variance:
Let's break down all the components behind the first observation (= fixed effect + random effect + residual):

## Grouped Data: yield ~ nitro | Block ## Block Variety nitro yield ## 1 I Victory 0 111
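A minimal sketch of this decomposition, reusing the functions introduced above (the object names are mine):

fixed_part <- predict(fit_oats_mixed,  ## fixed-effect part only
                      newdata = data.frame(Block = "I", Variety = "Victory", nitro = 0),
                      re.form = NA)[[1]]
rand_part <- ranef(fit_oats_mixed)[[1]][["I"]]  ## predicted random effect for block I
resid_part <- Oats$yield[1] - (fixed_part + rand_part)  ## what is left over
c(fixed = fixed_part, random = rand_part, residual = resid_part,
  total = fixed_part + rand_part + resid_part)  ## total = 111, the observed yield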
Checking the assumptions of mixed models can often be difficult.
For the simple case (as in fit_oats_mixed), you may check the usual assumptions from LM, as well as the QQ plot of the random effects predicted for the observed levels of the corresponding grouping variable.
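For instance, a sketch of such a QQ plot (of limited use here, with only 6 levels):

u <- ranef(fit_oats_mixed)[[1]]  ## predicted random effects, one per block
qqnorm(u, las = 1)
qqline(u)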
You may use methods based on the simulation of residuals (e.g. using the package DHARMa); although such methods seem appealing, they remain to be statistically evaluated.
An alternative or complementary approach is to compare the goodness of fit of models with different assumptions.
There are 2 main fitting methods available to fit mixed models, which differ in how the likelihood is computed: ML (maximum likelihood) and REML (restricted maximum likelihood).
ML has been introduced in a context of fixed effects and remains ideal in LMM to fit such effects.
REML has been introduced to reduce the bias in the ML estimation of variance components (by accounting for the degrees of freedom used up by the estimation of fixed effects).
Yet, the REML estimator can result in more uncertainty (depending on the number of parameters, the number of observations, and how balanced the observations are among the levels of the grouping variable).
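In spaMM, fitme() uses ML by default; a sketch of the same model fitted by REML, via the method argument:

fit_oats_REML <- fitme(yield ~ nitro + Variety + (1|Block), data = Oats,
                       method = "REML")
VarCorr(fit_oats_REML)  ## variance estimates typically a bit larger than under ML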
Because computing ML and REML is computationally costly in GLMMs, different approximations can be used, e.g. PQL (the penalized quasi-likelihood).
Different settings also exist for a given approximation method :-(
Guessing what is best in a given situation is often difficult (even for those who know all the details there is to know about this) and would ideally require a custom simulation analysis.
Conclusion: spaMM & other packages offer a lot of choice, but what to do is often not obvious…
Notes:

- models may be refitted automatically when anova() is called on fits adjusted with the "wrong" method

Choosing between ML, REML, PQL… means choosing a mathematical function whose value depends on the parameters the fit must estimate. The optimizer refers to the method used to optimise such a function, i.e. how to find the parameter values that maximise (or minimise) the likelihood function.
One can thus choose between different optimizers, such as:

- nloptr
- bobyqa
- L-BFGS-B
Note: those methods have not been developed with mixed models in mind (they are general tools)
Packages such as spaMM and lme4 let you choose among various optimizers, but also among various implementations of the same optimizers (i.e. which R package must do the job), and they let you tinker with many inputs adjusting how a given optimizer behaves.
The reason why this choice is offered is precisely that no optimizer is best in all situations (an optimizer may be very slow, it may not succeed in converging, or it may converge toward wrong values).
Tips: trust the default settings, which have been carefully selected by the developers; when you have problems, collect more data (it often helps) or ask the developers (not me) for help
Comparing nested models is the usual way to identify which model fits the data best and/or to test the effect of regressor(s).
There are 3 options when comparing models differing in their fixed-effect regressors:

- the asymptotic likelihood ratio test (e.g. anova())
- the likelihood ratio test by parametric bootstrap (e.g. anova() in spaMM, or pbkrtest::PBmodcomp() in lme4) (the best approach for GLMMs)
- the F-test with the Kenward-Roger approximation (e.g. pbkrtest::KRmodcomp() in lme4)
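For illustration, a sketch of the first two options in spaMM, testing the Variety fixed effect in the oats example (the object name fit_oats_noVariety is mine):

fit_oats_noVariety <- fitme(yield ~ nitro + (1|Block), data = Oats)  ## nested model
anova(fit_oats_mixed, fit_oats_noVariety)  ## asymptotic likelihood ratio test
## anova(fit_oats_mixed, fit_oats_noVariety, boot.repl = 999)  ## parametric bootstrap (slow)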
There are 2 options when comparing models differing in their random-effect structure:
- the likelihood ratio test by parametric bootstrap
- the simulation-based restricted likelihood ratio test (e.g. RLRsim::RLRTSim() for the test of a single random-effects variance)

Notes:

- summary() outputs don't display p-values (by default) since the usual Wald tests are not appropriate

You can compute the Akaike Information Criterion (AIC), but there are now 2 cases to distinguish: the marginal AIC (mAIC) and the conditional AIC (cAIC).
E.g. the cAIC is the metric of choice to identify the spatial model that best predicts the distribution of the response (e.g. max NDVI) at non-sampled locations.
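In spaMM, a sketch for extracting both from a fit:

AIC(fit_oats_mixed)  ## displays the marginal AIC and the conditional AIC (among others)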
You may compute:
conditional predictions obtained by summing the fixed-effect prediction and the values predicted for the sampled realisations of the random effects \(\rightarrow\) Best Linear Unbiased Predictions (BLUPs)
unconditional predictions obtained by considering only the fixed effects (i.e. assuming a random effect of 0)
marginal predictions obtained by integrating the sum of the fixed-effect predictor and a random effect over all possible values for such a random effect
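A sketch of the first two kinds of predictions for block II, using re.form = NA to drop the random effects as earlier (the object name newd is mine):

newd <- data.frame(Block = "II", Variety = "Victory", nitro = 0)
predict(fit_oats_mixed, newdata = newd)[[1]]                ## conditional (uses the BLUP)
predict(fit_oats_mixed, newdata = newd, re.form = NA)[[1]]  ## unconditional (fixed effects only)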
When computing intervals, you have the possibility to include different sources of (co)variance/uncertainty:
- the uncertainty associated with the predictions of the random effects (e.g. uncertainty stemming from the variance estimate, but also from the estimates of the parameters of the correlation function)
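For instance, a sketch with get_intervals(), assuming its intervals argument accepts the variance components documented in ?predict.HLfit:

newd <- data.frame(Block = "II", Variety = "Victory", nitro = 0)
get_intervals(fit_oats_mixed, newdata = newd, intervals = "predVar")  ## uncertainty of the linear predictor
get_intervals(fit_oats_mixed, newdata = newd, intervals = "respVar")  ## adds the residual variance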
Notes:

- for more details, see the documentation (?predict.HLfit)

A generalised linear mixed-effects model (GLMM) = LMM + GLM
There is (from the user point of view) no additional complexity emerging from the combination of both types of model.
Yet, it does combine the inherent complexity of LMMs (which we just detailed) with the (arguably greater) complexity of GLMs \(\rightarrow\) how to best test effects, how to check assumptions, and how to compute predictions become extremely tricky questions.
While many bad practices are known, best practices in the context of GLMMs remain little investigated (it is a very general class of models).
Another form of generalisation offered by spaMM is to consider random effects that follow a non-Gaussian distribution (many choices are possible, as in GLM).
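For instance, a sketch (not a sensible analysis of these data, just to show the syntax) using spaMM's rand.family argument:

## a Poisson GLMM where the block effects follow a Gamma distribution instead of a Gaussian one
fitme(yield ~ nitro + Variety + (1|Block), data = Oats,
      family = poisson(), rand.family = Gamma(log))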
Tip: learn how to do simulation studies and perform one for your particular situation (see the sketch below)
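A minimal sketch of such a study for the oats fit: simulate many response vectors from the fitted model, refit the model on each, and look at the distribution of the variance estimates (slow; the helper name refit_lambda is mine):

refit_lambda <- function(y) {  ## refit the model on one simulated response vector
  refit <- fitme(yield ~ nitro + Variety + (1|Block),
                 data = transform(Oats, yield = y))
  VarCorr(refit)[1, "Variance"]  ## estimated block variance
}
sims <- simulate(fit_oats_mixed, nsim = 100)  ## 100 simulated response vectors
summary(apply(sims, 2, refit_lambda))  ## compare with the original estimate (~ 202)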
The core idea behind the LMM is relatively simple (although abstract when one tries to be both general and correct):
an LMM is a linear model where some of the model parameters are random variables
It is, and will probably long remain, the right tool for many engineering and scientific applications.
The devil lies in the details of how to work with such a beast (and should not lie in the interpretation of the numbers it gives you, be it parameter estimates, p-values, likelihoods… those you should learn to master).
In my presentation, based on the following model,
formula(fit_oats_mixed)
## yield ~ nitro + Variety + (1 | Block)
I chose to call random effects the values that can be drawn from the Gaussian variable used to model the differences between blocks, and I referred to the variable Block in the data as a grouping variable.

Others would instead have written/said that this model contains 1 random effect (or 1 random-effect term, or 1 random factor: Block) and used different wording to refer to what I have called random effects (e.g. realizations of the random effect).
A quick look at the literature did not lead to particular insights, since authors tend to be precise when using maths and inconsistent when using English.
I was tempted to use random factor to refer to Block, but that would not work for the model with Matern(1|lat + long), which contains quantitative variables. (Random variable is a little too vague, since many random variables are involved in LMMs.)
This is a general issue in statistics.
## R version 4.0.4 (2021-02-15) ## Platform: x86_64-apple-darwin17.0 (64-bit) ## Running under: macOS Big Sur 10.16 ## ## Matrix products: default ## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib ## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib ## ## locale: ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] ggplot2_3.3.3 spaMM_3.7.25 ## ## loaded via a namespace (and not attached): ## [1] tidyselect_1.1.0 xfun_0.22 bslib_0.2.4 ## [4] slam_0.1-48 purrr_0.3.4 pbapply_1.4-3 ## [7] lattice_0.20-41 colorspace_2.0-0 vctrs_0.3.7 ## [10] generics_0.1.0 htmltools_0.5.1.1 yaml_2.2.1 ## [13] utf8_1.2.1 rlang_0.4.10 jquerylib_0.1.3 ## [16] nloptr_1.2.2.2 pillar_1.5.1 withr_2.4.1 ## [19] DBI_1.1.1 glue_1.4.2 registry_0.5-1 ## [22] lifecycle_1.0.0 stringr_1.4.0 munsell_0.5.0 ## [25] gtable_0.3.0 evaluate_0.14 labeling_0.4.2 ## [28] knitr_1.31 parallel_4.0.4 fansi_0.4.2 ## [31] highr_0.8 Rcpp_1.0.6 scales_1.1.1 ## [34] jsonlite_1.7.2 farver_2.1.0 digest_0.6.27 ## [37] stringi_1.5.3 dplyr_1.0.5 ROI_1.0-0 ## [40] numDeriv_2016.8-1.1 grid_4.0.4 tools_4.0.4 ## [43] magrittr_2.0.1 maps_3.3.0 sass_0.3.1 ## [46] proxy_0.4-25 tibble_3.1.0 crayon_1.4.1 ## [49] pkgconfig_2.0.3 MASS_7.3-53.1 ellipsis_0.3.1 ## [52] Matrix_1.3-2 assertthat_0.2.1 minqa_1.2.4 ## [55] rmarkdown_2.7 R6_2.5.0 boot_1.3-27 ## [58] nlme_3.1-152 compiler_4.0.4