2021-04-14

Before we start

Loading spaMM

library(spaMM)
print(citation("spaMM"), style = "text")
## Rousset F, Ferdy J (2014). "Testing environmental and genetic effects
## in the presence of spatial autocorrelation." _Ecography_, *37*(8),
## 781-790. <URL: http://dx.doi.org/10.1111/ecog.00566>.

I will use the R package spaMM for fitting LMs and LMMs.

The R package spaMM can fit LMs, LMMs, GLMs, GLMMs and more, using the same function and producing consistent outputs irrespective of the model.

Well known alternatives for fitting LMMs are:

  • lme4 (for simple LMMs and GLMMs)
  • glmmTMB (for LMs, LMMs, GLMs, GLMMs)

Content

  • A simple example: fitting the yield of oats
  • Beyond the simple case
  • 10 practical considerations
  • General conclusion

A simple example: fitting the yield of oats

The Oats dataset

data("Oats", package = "nlme") ## load a dataset from the pkg nlme
Oats$Block <- factor(Oats$Block,       ## original data from nlme have ordered factors, so we
                     ordered = FALSE,  ## remove the ordering not to trigger unusual contrasts
                     levels = c("I", "II", "III", "IV", "V", "VI"))
head(Oats, n = 14) ## display first 14 rows
## Grouped Data: yield ~ nitro | Block
##    Block     Variety nitro yield
## 1      I     Victory   0.0   111
## 2      I     Victory   0.2   130
## 3      I     Victory   0.4   157
## 4      I     Victory   0.6   174
## 5      I Golden Rain   0.0   117
## 6      I Golden Rain   0.2   114
## 7      I Golden Rain   0.4   161
## 8      I Golden Rain   0.6   141
## 9      I  Marvellous   0.0   105
## 10     I  Marvellous   0.2   140
## 11     I  Marvellous   0.4   118
## 12     I  Marvellous   0.6   156
## 13    II     Victory   0.0    61
## 14    II     Victory   0.2    91

The Oats dataset

Oats$yield
##  [1] 111 130 157 174 117 114 161 141 105 140 118 156  61  91  97 100  70 108 126
## [20] 149  96 124 121 144  68  64 112  86  60 102  89  96  89 129 132 124  74  89
## [39]  81 122  64 103 132 133  70  89 104 117  62  90 100 116  80  82  94 126  63
## [58]  70 109  99  53  74 118 113  89  82  86 104  97  99 119 121
Oats$nitro
##  [1] 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4
## [20] 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2
## [39] 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0
## [58] 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6
table(Oats$Variety, Oats$Block)
##              
##               I II III IV V VI
##   Golden Rain 4  4   4  4 4  4
##   Marvellous  4  4   4  4 4  4
##   Victory     4  4   4  4 4  4

The Oats dataset

coplot(yield ~ nitro | Variety + Block, data = Oats)  ## simple native plot showing it all

Fitting the yield of oats using a LM

(fit_oats_fixed <- fitme(yield ~ nitro + Variety + Block, data = Oats))
## formula: yield ~ nitro + Variety + Block
## ML: Estimation of phi by ML.
##     Estimation of fixed effects by ML.
## family: gaussian( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##                   Estimate Cond. SE t-value
## (Intercept)        113.761    5.287  21.516
## nitro               73.667    7.553   9.753
## VarietyMarvellous    5.292    4.137   1.279
## VarietyVictory      -6.875    4.137  -1.662
## BlockII            -28.083    5.851  -4.800
## BlockIII           -39.417    5.851  -6.737
## BlockIV            -37.167    5.851  -6.352
## BlockV             -44.417    5.851  -7.592
## BlockVI            -39.083    5.851  -6.680
##  -------------- Residual variance  -------------
## Coefficients for log(phi)  ~ 1  :
##             Estimate Cond. SE
## (Intercept)    5.325   0.1667
## Estimate of phi=residual var:  205.4 
##  ------------- Likelihood values  -------------
##                         logLik
## p(h)   (Likelihood): -293.8599

Fitting the yield of oats using a LMM

(fit_oats_mixed <- fitme(yield ~ nitro + Variety + (1|Block), data = Oats))
## formula: yield ~ nitro + Variety + (1 | Block)
## Estimation of fixed effects by ML.
## Estimation of lambda and phi by 'outer' ML, maximizing p_v.
## family: gaussian( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##                   Estimate Cond. SE t-value
## (Intercept)         82.400    6.969  11.823
## nitro               73.667    7.889   9.338
## VarietyMarvellous    5.292    4.321   1.225
## VarietyVictory      -6.875    4.321  -1.591
##  --------------- Random effects ---------------
## Family: gaussian( link = identity ) 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    Block  :  201.8  
## # of obs: 72; # of groups: Block, 6 
##  -------------- Residual variance  -------------
## phi estimate was 224.059 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -304.3989

Parameters in LM & LMM

The effect of Block is modelled by:

  • 5 means for fit_oats_fixed corresponding to each block level (but the one from the intercept)
  • 1 variance for fit_oats_mixed corresponding to the variable Block

The variance is that of the distribution from which the block effects are drawn:

Note: this distribution is centered around 0 (by construction) and assumed to be Gaussian.

Interpretation in LM & LMM

LM:

compared to the average yield of block I,

  • block II changes the average yield by -28.08
  • block III changes the average yield by -39.42
  • block IV changes the average yield by -37.17
  • block V changes the average yield by -44.42
  • block VI changes the average yield by -39.08

LMM:

each block modifies the average yield by a specific (unknown) value drawn from \(\mathcal{N}(\mu = 0, \sigma = 14.21)\)

Note: \(14.21 = \sqrt{201.82}\)
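
A minimal sketch to recover this standard deviation from the fitted variance (using spaMM's VarCorr(), shown again later):

sqrt(VarCorr(fit_oats_mixed)[1, "Variance"]) ## standard deviation of the Block random effect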

The effect of blocks in LMM

To infer the effect of blocks,

  • you can predict the effect of the observed blocks by computing the mean of the random effects conditional on the observed data:


ranef(fit_oats_mixed)
## $`( 1 | Block )`
##          I         II        III         IV          V         VI 
##  28.705380   3.000208  -7.373393  -5.313928 -11.949981  -7.068287 
## 
## attr(,"class")
## [1] "ranef" "list"


Note: also possible in LM (although there the block effects are fitted, not predicted)

The effect of blocks in LMM

To infer the effect of blocks,

  • you can predict the effect of the observed blocks by computing the mean of the random effects conditional on the observed data
  • you can predict the effect of an average new block: its effect is 0 and associated with a larger uncertainty:


fixpred_V0N <- predict(fit_oats_mixed, newdata = data.frame(Variety = "Victory", nitro = 0), re.form = NA)[[1]]
## sampled level (-> same as ranef())
predict(fit_oats_mixed, newdata = data.frame(Block = "II", Variety = "Victory", nitro = 0))[[1]] - fixpred_V0N
## [1] 3.000208
get_intervals(fit_oats_mixed, newdata = data.frame(Block = "II", Variety = "Victory", nitro = 0)) - fixpred_V0N
##   predVar_0.025 predVar_0.975
## 1     -7.592184       13.5926

The effect of blocks in LMM

To infer the effect of blocks,

  • you can predict the effect of the observed blocks by computing the mean of the random effects conditional on the observed data
  • you can predict the effect of an average new block: its effect is 0 and associated with a larger uncertainty:


fixpred_V0N <- predict(fit_oats_mixed, newdata = data.frame(Variety = "Victory", nitro = 0), re.form = NA)[[1]]
## non-sampled level
predict(fit_oats_mixed, newdata = data.frame(Block = "VII", Variety = "Victory", nitro = 0))[[1]] - fixpred_V0N
## [1] 0
get_intervals(fit_oats_mixed, newdata = data.frame(Block = "VII", Variety = "Victory", nitro = 0)) - fixpred_V0N
##   predVar_0.025 predVar_0.975
## 1     -31.01404      31.01404

Fixed vs Random effects for observed blocks

Note: random effects are shrunk towards the mean (i.e. 0). The shrinkage increases with the uncertainty associated with the random-effect predictions, so more observations per block would reduce it! Here the random effects are aligned because the data are balanced across blocks.

See also: https://m-clark.github.io/posts/2019-05-14-shrinkage-in-mixed-models/
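
A minimal sketch of such a comparison (assuming the two fits from above; the block effects of the LM are centered so that they are comparable to the random effects of the LMM):

beta <- fixef(fit_oats_fixed)
block_fixed <- c(0, beta[grepl("^Block", names(beta))]) ## block I is the reference level
names(block_fixed) <- levels(Oats$Block)
round(cbind(fixed_centered = block_fixed - mean(block_fixed),
            random = ranef(fit_oats_mixed)[[1]]), 2)   ## random effects are shrunk towards 0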

Conclusion

This simple example already illustrates key points about mixed models:

  • they allow for the estimation of one variance (or several ones in more complex models)

  • fewer parameters need to be estimated

    \(\rightarrow\) having a large number of blocks would not be an issue in LMM, but it would be in LM

  • one additional assumption: the random effects are drawn from a gaussian distribution

  • the more levels you have, the more accurate the estimate of the variance

  • the more observations per level you have, the more accurate the prediction of each random effect

  • predictions & simulations for unobserved levels are possible

  • predictions for an average block are directly given by the fixed effects (in LMM only; not in GLMM)

Beyond the simple case

Simple generalisation

  • you can model several variances (useful to compare them!):
fitme(yield ~ nitro + (1|Variety) + (1|Block), data = Oats) ## I would not do that here (only 3 varieties!)
## formula: yield ~ nitro + (1 | Variety) + (1 | Block)
## Estimation of fixed effects by ML.
## Estimation of lambda and phi by 'outer' ML, maximizing p_v.
## family: gaussian( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept)    81.87    7.174   11.41
## nitro          73.67    8.016    9.19
##  --------------- Random effects ---------------
## Family: gaussian( link = identity ) 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    Variety  :  23.56 
##    Block  :  207.7  
## # of obs: 72; # of groups: Variety, 3; Block, 6 
##  -------------- Residual variance  -------------
## phi estimate was 231.315 
##  ------------- Likelihood values  -------------
##                         logLik
## p_v(h) (marginal L): -306.8688

Simple generalisation

  • you can model several variances
  • the random effects can be added to fixed effect(s) other than the intercept \(\rightarrow\) random slope model:
fitme(yield ~ nitro + Variety + (nitro|Block), data = Oats)
## formula: yield ~ nitro + Variety + (nitro | Block)
## ML: Estimation of ranCoefs and phi by ML.
##     Estimation of fixed effects by ML.
## Estimation of phi by 'outer' ML, maximizing p_v.
## family: gaussian( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##                   Estimate Cond. SE t-value
## (Intercept)         82.400    6.620  12.447
## nitro               73.667    8.002   9.206
## VarietyMarvellous    5.292    4.314   1.226
## VarietyVictory      -6.875    4.314  -1.593
##  --------------- Random effects ---------------
## Family: gaussian( link = identity ) 
##          --- Random-coefficients Cov matrices:
##  Group        Term  Var. Corr.
##  Block (Intercept) 173.6      
##  Block       nitro 11.88     1
## # of obs: 72; # of groups: Block, 6 
##  -------------- Residual variance  -------------
## phi estimate was 223.379 
##  ------------- Likelihood values  -------------
##                        logLik
## p_v(h) (marginal L): -304.307

Advanced generalisation

In our example fit_oats_mixed, the random effects were modelled as:

\[ b \sim \mathcal{N}\left(0, \begin{bmatrix} \lambda & 0 & 0 & \dots & 0 \\ 0 & \lambda & 0 & \dots & 0 \\ 0 & 0 & \lambda & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \lambda \\ \end{bmatrix}\right) \]

This representation highlights a square symmetric matrix called the covariance matrix, whose number of rows (and columns, since it is square) is equal to the number of levels of the grouping variable.

In this simple case,

  • the diagonal elements of the covariance matrix are all the same and modelled as a single variance (\(\lambda\))

  • the off-diagonal elements of the covariance matrix are 0, which implies that the random effects associated with different levels are not correlated

Conclusion: this is a very simple random-effects structure for a mixed model

Advanced generalisation

In general terms, the random effects can be modelled as:

\[ b \sim \mathcal{N}\left(0, \begin{bmatrix} c_{1,1} & c_{1,2} & c_{1,3} & \dots & c_{1,q} \\ c_{2,1} & c_{2,2} & c_{2,3} & \dots & c_{2,q} \\ c_{3,1} & c_{3,2} & c_{3,3} & \dots & c_{3,q} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_{q,1} & c_{q,2} & c_{q,3} & \dots & c_{q,q} \\ \end{bmatrix}\right) \]

Departure from the simple model:

  • the variances used to draw each level can differ
  • the covariances between levels don’t have to be zero


Note: in general, the parameters \(c_{i,j}\) are not distinctly estimated but generated by a covariance (or correlation) function (\(c_{i, j} = g(i, j, \theta)\)) which depends on a smaller number of parameters to be estimated
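
For instance, the Matérn correlation function used in the next example maps the distance between two locations and two parameters (\(\rho\), \(\nu\)) to a correlation; a minimal sketch with arbitrary parameter values:

MaternCorr(d = c(0.1, 0.5, 1, 2), rho = 2, nu = 0.5) ## correlation decays with distance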

Example: modeling spatial autocorrelation

data("Loaloa")
ndvi <- Loaloa[, c("maxNDVI", "latitude", "longitude")]
head(ndvi, n = 14)
##     maxNDVI latitude longitude
## X1     0.69 5.736750  8.041860
## X2     0.74 5.680280  8.004330
## X3     0.79 5.347222  8.905556
## X4     0.67 5.917420  8.100720
## X5     0.85 5.104540  8.182510
## X6     0.80 5.355556  8.929167
## X7     0.78 4.885000 11.360000
## X8     0.69 5.897800  8.067490
## X9     0.80 5.593056  9.018056
## X10    0.84 6.004167  9.312500
## X11    0.82 3.350000 11.508330
## X12    0.80 3.391667 11.670830
## X13    0.81 4.084000 11.283000
## X14    0.90 5.472222  8.956944

Example: modeling spatial autocorrelation

spaMMplot2D(x = ndvi$longitude, y = ndvi$latitude, z = ndvi$maxNDVI, add.map = TRUE,
            xlab = "Longitude", ylab = "Latitude", plot.title = title(main = "max NDVI"))

Example: modeling spatial autocorrelation

(fit_loaloa <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi))
## formula: maxNDVI ~ 1 + Matern(1 | longitude + latitude)
## ML: Estimation of corrPars, lambda and phi by ML.
##     Estimation of fixed effects by ML.
## Estimation of lambda and phi by 'outer' ML, maximizing p_v.
## family: gaussian( link = identity ) 
##  ------------ Fixed effects (beta) ------------
##             Estimate Cond. SE t-value
## (Intercept)   0.8028  0.01625    49.4
##  --------------- Random effects ---------------
## Family: gaussian( link = identity ) 
##                    --- Correlation parameters:
##      1.nu     1.rho 
## 0.5198754 1.7720476 
##            --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian; 
##    longitude.  :  0.00341  
## # of obs: 197; # of groups: longitude., 197 
##  -------------- Residual variance  -------------
## phi estimate was 0.000333393 
##  ------------- Likelihood values  -------------
##                        logLik
## p_v(h) (marginal L): 378.3962

Example: modeling spatial autocorrelation

## fit_loaloa was fitted with the default distance method ("Euclidean"), i.e. distances in degrees
par(mar = c(4, 4, 0.1, 0.1))
plot(seq(0, 3, by = 0.1),
     MaternCorr(d = seq(0, 3, by = 0.1),
                rho = get_ranPars(fit_loaloa, which = "corrPars")[[1]]$rho,
                nu = get_ranPars(fit_loaloa, which = "corrPars")[[1]]$nu),
     type = "l", xlab = "Distance (degree)", ylab = "Correlation", ylim = c(0, 1), las = 1)

Example: modeling spatial autocorrelation

fit_loaloa_km <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi,
                       control.dist = list(dist.method = "Earth"))
par(mar = c(4, 4, 0.1, 0.1))
plot(seq(0, 350, by = 0.1),
     MaternCorr(d = seq(0, 350, by = 0.1),
                rho = get_ranPars(fit_loaloa_km, which = "corrPars")[[1]]$rho,
                nu = get_ranPars(fit_loaloa_km, which = "corrPars")[[1]]$nu),
     type = "l", xlab = "Distance (km)", ylab = "Correlation", ylim = c(0, 1), las = 1)

Example: modeling spatial autocorrelation

filled.mapMM(fit_loaloa, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Conclusion

This example of fit_loaloa illustrates additional benefits from mixed models:

  • the effect of continuous processes can also be modelled by random effects

  • particular structures of covariance matrices for the random effects can be used to relax the assumption of independence made in LM

  • mixed models can be used to estimate parameters of correlation (or covariance) functions, which has many applications:

    • estimating spatial and temporal processes

    • estimating various components of phenotypic variation (animal model)

    • estimating evolutionary models in phylogenetic analyses

      (Note: other methods such as GLS can do that too, but mixed models are statistically best)

10 practical considerations

1. Fix vs random

If you want to use a correlation function, or predict levels that you did not sample \(\rightarrow\) RANDOM


Otherwise:

  • if you have a very large number of levels in a factor \(\rightarrow\) RANDOM
  • if you have a very small number of levels in a factor \(\rightarrow\) FIX

Note: whether a number is large or small must be assessed in comparison to the number of observations, but \(\leq 4\) will almost certainly qualify as small


When it is not clear, consider that:

  • mixed models make additional assumptions on your data
  • mixed models are more difficult to handle (see next slides)
  • you could fit both alternatives and compare the goodness of fit of the models (see Model comparison)

2. Random vs residual variance

In mixed models, we have more than one variance term.

For example, in fit_oats_mixed we have two variances:

VarCorr(fit_oats_mixed)
##      Group        Term Variance Std.Dev.
## 1    Block (Intercept) 201.8184 14.20628
## 2 Residual (Intercept) 224.0592 14.96861


Pay attention not to mix up (in your mind) the variance(s) of the random effects and the residual variance:

  • if you want to model changes in variances, use a GLM or a heteroscedastic model
  • random effects are used (except in DHGLM) to model changes in means (albeit using a variance)
  • in LMM you cannot have one different level of your grouping variable for each observation, unless you constrain the residual variance (in GLMM it is possible since the residual variance is related to the mean); see the sketch below
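
A hedged sketch of that last point (the obs factor and the value of phi are purely illustrative):

Oats$obs <- factor(seq_len(nrow(Oats)))  ## one level per observation
fitme(yield ~ nitro + Variety + (1|Block) + (1|obs), data = Oats,
      fixed = list(phi = 1))             ## with phi fixed, the obs variance becomes identifiable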

2. Random vs residual variance

Let’s break down all components behind the first observation (= fixed effect + random effect + residual):

## Grouped Data: yield ~ nitro | Block
##   Block Variety nitro yield
## 1     I Victory     0   111
  • Fixed effect:
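
A minimal sketch of this component (the fixed-effect part of the prediction for observation 1, i.e. intercept + VarietyVictory + 73.667 × 0):

predict(fit_oats_mixed, newdata = Oats[1, ], re.form = NA)[[1]]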

2. Random vs residual variance

Let’s break down all components behind the first observation (= fixed effect + random effect + residual):

## Grouped Data: yield ~ nitro | Block
##   Block Variety nitro yield
## 1     I Victory     0   111
  • Random effect:
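
A minimal sketch of this component (the random effect predicted for block I, the block of observation 1):

ranef(fit_oats_mixed)[[1]]["I"]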

2. Random vs residual variance

Let’s break down all components behind the first observation (= fixed effect + random effect + residual):

## Grouped Data: yield ~ nitro | Block
##   Block Variety nitro yield
## 1     I Victory     0   111
  • Residual:
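
A minimal sketch of this component (residual = observed yield - (fixed-effect part + predicted random effect)):

Oats$yield[1] - predict(fit_oats_mixed, newdata = Oats[1, ])[[1]] ## should match residuals(fit_oats_mixed)[1]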

3. Checking assumptions

Checking the assumptions of mixed models can often be difficult.

For the simple case (as in fit_oats_mixed), you may check the usual assumptions from LM as well as the QQ plot of the random effects predicted for the observed levels of the corresponding grouping variable.
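
A minimal sketch of these checks (assuming fit_oats_mixed; plot() on a spaMM fit displays its default diagnostic plots):

plot(fit_oats_mixed)                     ## default diagnostic plots of the fit
re_blocks <- ranef(fit_oats_mixed)[[1]]  ## predicted random effects (one per block)
qqnorm(re_blocks); qqline(re_blocks)     ## QQ plot of the predicted random effects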

You may use methods based on the simulation of residuals (e.g. using the package DHARMa); although such methods seem appealing, they remain to be statistically evaluated.

An alternative or complementary approach is to compare the goodness of fit of models with different assumptions.

4. Likelihood computation

There are 2 main fitting methods available to fit mixed models, which differ in how the likelihood is computed:

  • Maximum Likelihood (ML)
  • Restricted (or Residual) Maximum Likelihood (REML)

ML has been introduced in a context of fixed effects and remains ideal in LMM to fit such effects.

REML has been introduced to reduce the bias in the ML estimation of variance components (by accounting for the degrees of freedom used up by the estimation of fixed effects).

Yet, the REML estimator can result in more uncertainty (depending on the number of parameters, the number of observations and how balanced the observations are between the levels of the grouping variable).

Because computing ML and REML is computationally costly in GLMM, different approximations can be used, e.g. PQL (the penalized quasi-likelihood).

Different settings also exist for a given approximation method :-(

Guessing what is best in a given situation is often difficult (even for those who know all the details there are to know about this) and would ideally require a custom simulation analysis.

Conclusion: spaMM & other packages offer a lot of choice, but what to do is often not obvious…

4. Likelihood computation

Tips:

  • interested in fixed effects only \(\rightarrow\) ML
  • interested in variances in a model with many fixed-effect parameters \(\rightarrow\) REML
  • the larger the sample size \(\rightarrow\) the smaller the difference between estimates produced by ML and REML
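
A minimal sketch of how to choose between the two in spaMM (fitme() fits by ML by default; the method argument switches to REML):

fitme(yield ~ nitro + Variety + (1|Block), data = Oats)                  ## ML (default)
fitme(yield ~ nitro + Variety + (1|Block), data = Oats, method = "REML") ## REML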


Notes:

  • packages such as spaMM or lme4 are trying to prevent you from doing silly things; e.g. they will complain or refit the models when anova() is called on fits adjusted with the “wrong” method
  • beware when comparing likelihoods across packages that not all packages use the same underlying formulas, even when you use the same method name

5. Optimizer for the likelihood function

Choosing between ML, REML, PQL… means choosing a mathematical function whose value depends on the parameters the fit must estimate. The optimizer refers to the method used to optimise such a function, i.e. how to find the parameter values that maximise (or minimise) the likelihood function.

One can thus choose between different optimizers such as:

  • nloptr
  • bobyqa
  • L-BFGS-B

Note: those methods have not been developed with mixed models in mind (they are general tools)

Packages such as spaMM and lme4 let you choose among various optimizers, but also among various implementations of the same optimizers (i.e. which R package must do the job), and they let you tinker with many inputs adjusting how a given optimizer behaves.

The reason why this choice is offered is precisely because no optimizer is best in all situations (an optimizer may be very slow, it may not succeed in converging, or it may converge toward wrong values).

Tips: trust the default settings, which have been carefully selected by the developers, and when you have problems, collect more data (it often helps) or ask them (not me) for help

6. Model comparison and significance testing

Comparing nested models is the usual way to identify which model fits the data the best and/or to test the effect of regressor(s).

There are 3 options when comparing models differing in their fixed-effect regressors:

  • use an asymptotic likelihood ratio test (LRT) based on the \(\chi^2\) distribution (via anova())
  • use a LRT for which the distribution of the LR statistic under H0 is simulated by parametric bootstrap (also via anova() in spaMM, or pbkrtest::PBmodcomp() in lme4) (the best approach for GLMMs)
  • use an F-test with a particular computation of the degrees of freedom (e.g. Kenward-Roger approximation) (pbkrtest::KRmodcomp() in lme4)
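
A hedged sketch of the first two options for fitme() fits (the number of bootstrap replicates is purely illustrative):

fit_with_variety    <- fitme(yield ~ nitro + Variety + (1|Block), data = Oats)
fit_without_variety <- fitme(yield ~ nitro + (1|Block), data = Oats)
anova(fit_with_variety, fit_without_variety)                  ## asymptotic LRT
anova(fit_with_variety, fit_without_variety, boot.repl = 199) ## LRT by parametric bootstrap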

There are 2 options when comparing models differing in their random-effect structure:

  • use a LRT by parametric bootstrap
  • use a restricted LRT by parametric bootstrap (e.g. using RLRsim::RLRTSim() for the test of a single random-effect variance)

Notes:

  • even methods based on parametric bootstraps do not always behave well
  • summary() outputs don’t display p-values (by default) since the usual Wald tests are not appropriate

7. Information-theoretic metrics

You can compute the Akaike Information Criterion (AIC) but there are now 2 cases to distinguish:

  • the computation of the marginal AIC = usual AIC = quantifies the predictive power of a model considering all possible realisations of the random effects (i.e. associated with sampled levels or not)
  • the computation of the conditional AIC = quantifies the predictive power of a model considering only the particular (i.e. predicted) realisations of the random effects conditional on the sampled data

E.g. the cAIC is the metric of choice to identify the spatial model that best predicts the distribution of the response (e.g. max NDVI) at non-sampled locations


Notes:

  • cAIC is not the same as AICc (AIC for small samples) which you may have encountered for LMs
  • not many packages compute the cAIC but spaMM does
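
A minimal sketch (assuming the fit_loaloa fit from above; spaMM reports both criteria):

AIC(fit_loaloa) ## returns the marginal AIC and the conditional AIC (among other quantities)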

8. Computing predictions

You may compute:

  • conditional predictions obtained by summing the fixed-effect prediction and the values predicted for the sampled realisations of the random effects \(\rightarrow\) Best Linear Unbiased Predictions (BLUPs)

  • unconditional predictions obtained by considering only the fixed effects (i.e. assuming a random effect of 0)

  • marginal predictions obtained by integrating the sum of the fixed-effect predictor and a random effect over all possible values for such a random effect
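
A minimal sketch of the first two options with fit_oats_mixed (for the marginal case, see the Notes below):

newd <- data.frame(Block = "II", Variety = "Victory", nitro = 0.2)
predict(fit_oats_mixed, newdata = newd)                ## conditional prediction (uses the BLUP for block II)
predict(fit_oats_mixed, newdata = newd, re.form = NA)  ## prediction from the fixed effects only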


Notes:

  • in LMM the last 2 computations lead to the same result, so in practice we just set the random effect to zero
  • in GLMM the 2 options do differ and only the integration makes sense to obtain marginal predictions (even though it is rarely done)
  • what is best depends on what you are using your predictions for

9. Computing intervals

When computing intervals, you have the possibility to include different sources of (co)variance/uncertainty:

  • as in LM:
    • the (co)variances around fixed effect parameters
    • the residual variance
    • the uncertainty around the estimate(s) for the residual variance
  • different from LM:
    • the uncertainty associated with the predictions of the random effects

      (e.g. uncertainty stemming from the variance estimate, but also from the estimates of the parameters of correlation function)
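
A hedged sketch with fit_oats_mixed (the values of the intervals argument are those I believe predict.HLfit accepts; check ?predict.HLfit):

newd <- data.frame(Block = "II", Variety = "Victory", nitro = 0.2)
get_intervals(fit_oats_mixed, newdata = newd, intervals = "fixefVar") ## fixed-effect uncertainty only
get_intervals(fit_oats_mixed, newdata = newd, intervals = "predVar")  ## + uncertainty of the random-effect predictions
get_intervals(fit_oats_mixed, newdata = newd, intervals = "respVar")  ## + residual variance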


Notes:

  • what to consider depends on what you are using your interval for
  • spaMM allows you to cherry pick exactly what you need (see ?predict.HLfit)

10. GLMM

A generalised linear mixed-effects model (GLMM) = LMM + GLM

There is (from the user’s point of view) no additional complexity emerging from the combination of both types of model.

Yet, it does combine the inherent complexity of LMMs (which we just detailed) with the (arguably greater) complexity of GLMs \(\rightarrow\) how to best test effects? how to check assumptions? how to compute predictions? become extremely tricky questions.

While many bad practices are known, best practices in the context of GLMM remain little investigated (it is a very general class of models).

Another form of generalisation offered by spaMM is to consider random effects that follow a non-gaussian distribution (many choices are possible, as in GLM)


Tip: learn how to do simulation studies and perform a simulation study for your particular situation

General conclusion

LMM in a nutshell

The core idea behind the LMM is relatively simple (although abstract when one tries to be both general and correct):

a LMM is a linear model in which some model parameters are random variables

It is, and will probably long remain, the right tool for many engineering and scientific applications.

The devil lies in the details of how to work with such a beast (and should not be in the interpretation of the numbers it gives you, be they parameter estimates, p-values, likelihoods… those you should learn to master)

Tips:

  • explore LMs (and GLMs) in depth if you want to tackle LMMs (and GLMMs) properly
  • if you have no particular interest in statistics and all you are doing is analysing the outcome of an experiment with a factor with 3 (or 4, or 5) levels (e.g.), stay away from mixed models!
  • by the way: mixing models and mixed models are different beasts
  • keep simulating & reading (package documentation, articles, books) even if you don’t understand everything

A note on semantics

In my presentation based on the following model,

formula(fit_oats_mixed)
## yield ~ nitro + Variety + (1 | Block)

I chose to call random effects the values drawn from the Gaussian variable used to model the differences between blocks, and I referred to the variable Block in the data as a grouping variable.

Others would instead have written/said that this model contains 1 random effect (or 1 random-effect term, or 1 random factor: Block) and would use different wording to refer to what I have called random effects (e.g. realizations of the random effect).

A quick look at the literature did not lead to particular insights since authors tend to be precise when using maths and more inconsistent when using English.

I was tempted to use random factor to refer to Block, but that would not work for the model with Matern(1|longitude + latitude), which contains quantitative variables. (Random variable is a little too vague since many random variables are involved in LMM.)

This is a general issue in statistics.

Session info

## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_3.3.3 spaMM_3.7.25 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.1.0    xfun_0.22           bslib_0.2.4        
##  [4] slam_0.1-48         purrr_0.3.4         pbapply_1.4-3      
##  [7] lattice_0.20-41     colorspace_2.0-0    vctrs_0.3.7        
## [10] generics_0.1.0      htmltools_0.5.1.1   yaml_2.2.1         
## [13] utf8_1.2.1          rlang_0.4.10        jquerylib_0.1.3    
## [16] nloptr_1.2.2.2      pillar_1.5.1        withr_2.4.1        
## [19] DBI_1.1.1           glue_1.4.2          registry_0.5-1     
## [22] lifecycle_1.0.0     stringr_1.4.0       munsell_0.5.0      
## [25] gtable_0.3.0        evaluate_0.14       labeling_0.4.2     
## [28] knitr_1.31          parallel_4.0.4      fansi_0.4.2        
## [31] highr_0.8           Rcpp_1.0.6          scales_1.1.1       
## [34] jsonlite_1.7.2      farver_2.1.0        digest_0.6.27      
## [37] stringi_1.5.3       dplyr_1.0.5         ROI_1.0-0          
## [40] numDeriv_2016.8-1.1 grid_4.0.4          tools_4.0.4        
## [43] magrittr_2.0.1      maps_3.3.0          sass_0.3.1         
## [46] proxy_0.4-25        tibble_3.1.0        crayon_1.4.1       
## [49] pkgconfig_2.0.3     MASS_7.3-53.1       ellipsis_0.3.1     
## [52] Matrix_1.3-2        assertthat_0.2.1    minqa_1.2.4        
## [55] rmarkdown_2.7       R6_2.5.0            boot_1.3-27        
## [58] nlme_3.1-152        compiler_4.0.4