This tutorial aims to walk you through the mixed effects regression in R. The tutorial has two parts: 1) the linear mixed effects model and 2) the generalized linear mixed effects model.
First, we talk about an experiment. By experiment, we broadly refer to a study where the objective is to assess the effect of one or more factors on a measurable attribute, called outcome. The units for which the outcome is measured is called experimental unit. The factors that affect the outcome are called explanatory variables. If the study aims to compare the effect of specific treatments, then the treatments' effects are treated as fixed effect terms. Suppose that \(\beta\) is the regression parameter that measures the effect of a specific treatment with respect to a reference treatment (or control). Then the effect of the treatment can be verified by testing \(H_0: \beta=0\) versus \(H_a: \beta\ne 0\). On the other hand, if the levels of a factor/explanatory variable are randomly chosen for the experiment, then the treatment's effect is measured via a random effect term. And the presence of the effect is verified by testing \(H_0: \sigma^2_r=0\) versus \(H_a: \sigma^2_r>0\), where \(\sigma_r\) represents the variability of the random effect due the levels of the factor. The mixed effects model contains both fixed effects and random effects. There are several ways this relationship can manifest. We shall examine different mixed effects models in our sleep data example.
The sleep study dataset (Belenkyn et al., 2003) is available in the lme4 package. The response variable is the subjects' reaction time to a stimuli recorded by researchers Reaction
. The subjects (recorded as Subject
) on which the experiment was conducted are assumed to be random. The number of days (Days
) the subjects undergoing sleep deprivation is believed to exert a fixed effect. Below is the structure of the Sleepstudy data
str(sleepstudy)
## 'data.frame': 180 obs. of 3 variables:
## $ Reaction: num 250 259 251 321 357 ...
## $ Days : num 0 1 2 3 4 5 6 7 8 9 ...
## $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
Here we show the first few rows of the data frame. For example, the following lines of code indicate that there are 180 rows and three columns in the data frame, and the number of subjects (or the experimental units) is 18.
head(sleepstudy)
## Reaction Days Subject
## 1 249.5600 0 308
## 2 258.7047 1 308
## 3 250.8006 2 308
## 4 321.4398 3 308
## 5 356.8519 4 308
## 6 414.6901 5 308
dim(sleepstudy)
## [1] 180 3
length(unique(sleepstudy$Subject))
## [1] 18
In the following plot of the sleep study data, we use different colors for different patients.
ggplot(sleepstudy,aes(x=Days, y=Reaction, group=Subject, color=Subject)) +
geom_point(size=1) + theme_classic()
The variables in our model are: \(Y_{i, j}\), the reaction time of the \(i\)th subject at the \(j\)th day, \(\beta\) is a measure of the effect of days, and \(c_i\) is the subject-specific random effect. We assume that the number of days exerts a fixed effect while the subjects' effect is random. In this model (model-1), \(e_{i, j}\) is the residual term assumed to be independent of \(c_i\) and the number of days.
In model-1, only the intercept is random, \(Y_{i, j} = \alpha + c_{i} + \beta*(j) + e_{i, j}\). More specifically, we assume that for this model, 1) \(e_{i, j}\)'s are independent with mean equal to 0, finite variance, and follow \({\rm N}(0,\sigma^2_e )\), 2) \(c_1, c_2, \dots, c_n\) are all independent and follow \({\rm N}(0,\sigma^2_c)\), 3) \(e_{i, j}\) and \(c_i\) are independent and 4) \(e_{i, j}\), \(c_i\), and the fixed-effects covariates are all independent.
In model-2, \(Y_{i, j} = \alpha + c_{i} + (\beta+d_{i})*(j) + e_{i, j}\), where both intercept and slope are assumed to vary by subjects. That means both are random, and the effect of the number of days changes with the subjects. The random component of the slope is denoted by \(d_i\), and in this model \(c_i\) and \(d_i\) are allowed to be correlated (i.e., \(\rho_{c, d}\ne 0\)). [Fig 2] Model-2 and model-3 are the same, except that in the later model, the random intercept and slope are assumed to be uncorrelated (i.e., \(\rho_{c, d} = 0\)).
model2 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy,REML=F)
summary(model2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
##
## AIC BIC logLik deviance df.resid
## 1763.9 1783.1 -876.0 1751.9 174
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9416 -0.4656 0.0289 0.4636 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 565.48 23.780
## Days 32.68 5.717 0.08
## Residual 654.95 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 251.405 6.632 37.907
## Days 10.467 1.502 6.968
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
In this output, we have some extra parameters: the random slope variance and the correlation between the two random effect terms. The estimated variance of \(d_i\) is 32.68, and the estimated correlation between \(c_i\) and \(d_i\) is .08.
The following plot shows the raw data and the prediction line for each subject \(\widehat{Y}_{i,j} = \widehat{\alpha} + \widehat{c}_{i} + (\widehat{\beta}_{j}+\widehat{d}_{i})j\).
(ggplot(NULL) +
geom_point(data=sleepstudy,aes(x=Days, y=Reaction, group=Subject, color=Subject))+
geom_line(data=sleepstudy%>%mutate(pred_dist=fitted(model2)),aes(x=Days, y=pred_dist, group=Subject, color=Subject))+ theme_classic())
model3 <- lmer(Reaction ~ Days + (Days||Subject), sleepstudy,REML=F)
summary(model3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Reaction ~ Days + ((1 | Subject) + (0 + Days | Subject))
## Data: sleepstudy
##
## AIC BIC logLik deviance df.resid
## 1762 1778 -876 1752 175
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9535 -0.4673 0.0239 0.4625 5.1883
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 584.27 24.172
## Subject.1 Days 33.63 5.799
## Residual 653.12 25.556
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 251.405 6.708 37.48
## Days 10.467 1.519 6.89
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.194
The output for model 3 is in the same format as model 2 but without the correlation term.
The following plot shows the raw data and the prediction line for each subject, and the predicted value is \(\widehat{Y}_{i, j} = \widehat{\alpha} + \widehat{c}_{i} + (\widehat{\beta}_{j}+\widehat{d}_{i})j\).
ggplot(NULL) +
geom_point(data=sleepstudy,aes(x=Days, y=Reaction, group=Subject, color=Subject))+geom_line(data=sleepstudy%>%mutate(pred_dist=fitted(model3)),aes(x=Days, y=pred_dist, group=Subject, color=Subject))+ theme_classic()
Here we compare the AIC, BIC, and log-likelihood scores for each of the models.
print(c(AIC(model1), AIC(model2), AIC(model3)))
## [1] 1802.079 1763.939 1762.003
print(c(BIC(model1), BIC(model2), BIC(model3)))
## [1] 1814.850 1783.097 1777.968
print(c(logLik(model1), logLik(model2), logLik(model3)))
## [1] -897.0393 -875.9697 -876.0016
anova(model1,model2,model3)
## Data: sleepstudy
## Models:
## model1: Reaction ~ Days + (1 | Subject)
## model3: Reaction ~ Days + ((1 | Subject) + (0 + Days | Subject))
## model2: Reaction ~ Days + (Days | Subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model1 4 1802.1 1814.8 -897.04 1794.1
## model3 5 1762.0 1778.0 -876.00 1752.0 42.0754 1 8.782e-11 ***
## model2 6 1763.9 1783.1 -875.97 1751.9 0.0639 1 0.8004
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using the minimum AIC and minimum BIC criteria the uncorrelated model (model 3) while model 2 has the largest log-likelihood value. We will go with model 3 as the preferred model.
First, as mentioned previously, we will refit the model-3 with the REML=T
option.
model3 <- lmer(Reaction ~ Days + (Days||Subject), sleepstudy,REML=T)
The lme4 package contains a confint.mermod()
function that computes confidence intervals (CIs) for fixed and random effects. The function is shown below with all of its parameters.
# S3 method for merMod credit: https://www.rdocumentation.org/packages/lme4/versions/1.1-29/topics/confint.merMod
confint(object, parm, level = 0.95,
method = c("profile", "Wald", "boot"), zeta,
nsim = 500,
boot.type = c("perc","basic","norm"),
FUN = NULL, quiet = FALSE,
oldNames = TRUE, ...)
By default, the confidence level is set to 0.95 and CIs are calculated by the profile likelihood method. Below we obtain the confidence intervals for our model-3. In the output, sig01 refers to the standard deviation of \(c_i\), sig02 refers to the standard deviation of \(d_i\) term, sigma refers to the standard deviation of \(e_{i, j}\), and the fixed effect terms are listed by the names.
confint.merMod(model3)
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 15.258649 37.786473
## .sig02 3.964074 8.769159
## .sigma 22.880555 28.787598
## (Intercept) 237.572148 265.238062
## Days 7.334067 13.600505
Next, we use the same function to calculate the bootstrap confidence intervals for all parameters. The default number of bootstrap samples is 500.
confint.merMod(model3, method = 'boot')
## Computing bootstrap confidence intervals ...
## 2.5 % 97.5 %
## .sig01 12.022686 35.910672
## .sig02 3.518198 8.580755
## .sigma 22.589035 28.501600
## (Intercept) 238.855363 263.857089
## Days 7.429879 13.109670
In matrix notation, model-3 is \(Y= X \beta+Zu+e\). Each component of the model can be extracted as follows:
Xi=getME(model3,"X")
Zi=getME(model3,"Z")
Ui=getME(model3,"u")
beta=getME(model3,"beta")
The linear mixed effects model can be analyzed using the Bayesian approach. Bayesian methods can be applied for both small and large sample sizes. We will use the brms package, which uses a syntax similar to the lme4 package. In the Bayesian method, all parameters are treated as random. Therefore, we use prior distribution for each of the parameters. The prior information is stirred into the observed data likelihood resulting in the final product, called the posterior distribution of the parameters. The posterior distribution idea originates from Bayes' theorem, where we use information in the observed data to update knowledge about the model parameters. The background information, known as the prior distribution, is combined with the likelihood function to determine the posterior distribution. To illustrate the Bayesian method, we will use the same sleep study data.
Our first model assumes \(Y_{i, j} = \beta_0 + \beta_{1}j + c_{i} + e_{i, j},\) where \(Y_{i, j}\) is the reaction time of the \(i^{th}\) subject at \(j\) days, \(\beta_0\) is the population intercept, \(\beta_1\) is the population slope, \(c_i\) is the random intercept for subject \(i\), and \(e_{i, j}\) is the error term.
The priors we are using: \(\beta_0 \sim N(251, 10)\), \(\beta_1 \sim N(10, 10)\), \(\sigma_e \sim N(31, 10)I(\sigma_e>0)\), \(\tau_0 \sim N(37, 10)I(\tau_0>0)\). We use the estimates from the classical method to choose the parameters of the prior distributions.
First, load in the rstan and brms packages, then build our base model.
require(rstan)
require(brms)
# Fit a base model using lmer
lmm <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)
# Pull base model estimates from lmer model
b0 <- summary(lmm)$coefficients[1,1]
b1 <- summary(lmm)$coefficients[1,2]
se <- summary(lmm)$sigma
te <- attr(summary(lmm)$varcor$Subject, "stddev")
As mentioned before, this package does not fit the models itself but uses Stan
on the back end. Stan is a probabilistic programming language with a C++ backend. By default, this backend uses the static Hamiltonian Monte Carlo (HMC) Sampler and its extension, the No-U-Turn sampler (NUTS) by Hoffman and Gelman (2014).
Alternatives to these algorithms, not available in the brms
package, include random-walk Metropolis, Gibbs-sampling, and slice-sampling. The documentation for this package concludes that its choice of samplers converges much more quickly, especially for high-dimensional models, regardless of whether the priors are conjugate. Read the brms
documentation overview for more information.
This function in the brms
package fits generalized linear/non-linear multivariate multilevel models (MMM). To set priors, we use the prior()
function, which accepts a distribution with parameters as well as a class
argument which designates which parameter the prior is associated with. Unfortunately, the brm
function does not handle parameters set as variables inside of prior()
well, so one must manually enter the prior values into the prior()
function.
Chains: The number of Markov chains to use. The advantage of using more chains is that we can be more confident in understanding the posterior distribution, as each chain has a different starting point. If running in parallel using the "cluster" option, this package limits the number of chains to the number of cores the system has. The default choice of the number of chains is 4.
Iter: The total number of iterations per chain. Some datasets may require more iterations to converge - meaning it takes longer to reach the equilibrium where further iterations do not improve the model fitting. Therefore, a higher number will cause the model to take longer to fit; see a comparison below. The default number of iterations is 2,000.
Warmup: Also known as burn-in. This refers to the number of iterations at the beginning of an MCMC run to be thrown away. The warmup period is a hyper-parameter that is meant to give the Markov Chain time to reach its equilibrium distribution. The idea here is that a "bad" randomized starting point may over-sample regions that aren't representative of the equilibrium distribution. Defaults to floor(iter/2)
.
Thin: Thinning consists of picking values at each k-th step and removing them from the Markov chain. This helps speed up model convergence if number of iterations, iter
, is large. Defaults to 1, meaning all values are kept. Choose a value > 1 to save memory and computation time if iter
is large.
Cores: Number of CPU cores to utilize for building chans. As chains can be created independent of one another, parallelization can be utilized to speed up the process. You can see how many cores are available on your machine by using the detectCores()
function from the package parallel
.
Family: Defaults to Gaussian. Since we are building a linear mixed model, we will use the default choice. More information on specifying the family of distributions can be found in this manual's generalized mixed model section.
priors <- c(prior(normal(251, 10), class = Intercept), # intercept prior
prior(normal(10, 10), class = b), # slope prior
prior(normal(31, 10), class = sigma), # population sd
prior(normal(37, 10), class = sd) # tau0, group sd
)
model1 <- brm(Reaction ~ Days + (1 | Subject),
data = sleepstudy,
prior = priors,
chains = 4,
iter = 10000,
# warmup = ,
thin = 1,
cores = 4,
family = gaussian()
)
summary(model1)
Notes: * Within prior()
calls, sigma refers to the population-level and sd refers to the group-level * The brms
package adjustes prior distributions to be greater than 0 in sd
and sigma
priors. * To compare these results with the output from a model built with the lme4
package, please see the frequentist section of this manual above. * More iterations take longer to fit. If a model fails to converge, you will see a warning output and may need to adjust iterations or other fitting options in order to be successful. Experiment with iterations until you reach an acceptable output. * If you want to compare fitting times for different iterations, refer to the "iteration_comparison" section of accompanying code this markdown file.
With our model built, we can now get estimates of the parameters. R's built-in summary()
function explains each portion of the output, one item at a time.
summary(model1)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Reaction ~ Days + (1 | Subject)
## Data: sleepstudy (Number of observations: 180)
## Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
## total post-warmup draws = 20000
##
## Group-Level Effects:
## ~Subject (Number of levels: 18)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 42.55 6.20 31.54 55.86 1.00 5591 8862
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 227.13 8.51 210.01 243.31 1.00 4199 7736
## Days 10.46 0.81 8.87 12.08 1.00 25629 13763
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 31.24 1.73 28.11 34.84 1.00 19942 14829
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The first five rows of the summary output describe our model and model inputs defined in the brm
call.
If there is an interest in accessing a specific estimate from the model object, it can be obtained by using the $
operator on the summary object of the model. For example: summary(model1)$random$Subject
returns Subject-specific (or group level) random effects:
summary(model1)$random$Subject
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 42.55292 6.200867 31.53863 55.85676 1.000412 5591.303 8861.941
Moreover, the$
notation can be used in addition to the last command to access individual parts of the output.
Our estimate for the subject-specific random effect is 42.55, which differs from the estimate \(\widehat{\tau}_0\) = 37 under the classical method. However, the base model estimate is still within the Bayesian credible interval (31.5386345, 55.856759).
Rhat (\(\hat R\)) for a given parameter estimate measures whether the samples obtained are representative of the actual distribution. Mathematically, it is the square root of the estimated total variance over within-chain variance. Its formula is
\[\widehat R=\sqrt{\frac{\widehat {\rm var}^+ (\theta|{\rm Data})} {W}},\]
where
The STAN backend code uses split \(\hat R\) that takes \(M\) chains, splits them in half, then performs the \(\hat R\) calculation on the resulting \(2M\) chains. So if there is only one chain, \(\hat R\) can still be calculated (\(M=2\)).
If this value is larger than 1.1, it is an indication that the model has not converged adequately. This can be accessed for each parameter estimate within the model summary. For example, \(\hat R\) for subject specific random intercept, \(\hat R_{\hat\sigma_\tau}\) is accessed with summary(model1)$random$Subject$Rhat
= 1.0004121.
Bulk_ESS or bulk effective sample size is a diagnostic for the sampling efficiency in the bulk of the posterior distribution. It measures the information content of a sample chain. It can be thought of as the minimum sample size from the posterior which has the same efficiency in estimating the posterior as a chain of MCMC samples. A higher number here is better - ESS > 400 is recommended (Vehtari et al., 2021). \(ESS = NM /\widehat\pi\), where \(N\) is the length of the chain, \(M\) is the total number of chains, and \(\widehat\pi\) represents estimated correlation between lagged samples in the chain.
Tail_ESS represents the minimum sample size needed to create 5% and 95% quantiles of the posterior.
\(\widehat R\) of 1 and \(ESS\) greater than 400 are required to ensure an MCMC sample is useful in practice. More information on \(\widehat R\), \(ESS\), and \(Tail ESS\) calculations can be found in Vehtari et al (2021).
pop_effects <- summary(model1)$fixed
This section shows the estimate (posterior mean) of the intercept, slope, and the random effect terms.
fam_effects <- summary(model1)$spec_pars
This shows our Bayes estimate for \(\widehat\sigma_e\) = 31.24, while estimate from the classical method using the lme4
package is 30.99.
While the summary()
function has a tidy appearance, the posterior_summary()
function produces much more information about the estimates.
posterior_summary(model1)
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 227.132047 8.5110609 210.011897 243.3147620
## b_Days 10.459994 0.8110058 8.868201 12.0808333
## sd_Subject__Intercept 42.552923 6.2008675 31.538635 55.8567590
## sigma 31.243158 1.7305825 28.110510 34.8360205
## r_Subject[308,Intercept] 64.395375 12.4022506 40.149882 88.7474326
## r_Subject[309,Intercept] -55.634287 11.8970336 -78.431389 -31.7925528
## r_Subject[310,Intercept] -40.858222 11.8695490 -63.893391 -17.4527266
## r_Subject[330,Intercept] 27.464542 12.1917459 3.893000 51.5879079
## r_Subject[331,Intercept] 33.489801 12.1656300 10.053813 57.5066848
## r_Subject[332,Intercept] 31.422443 12.1080476 7.842747 55.6753750
## r_Subject[333,Intercept] 39.769985 12.2542521 15.863203 63.7315155
## r_Subject[334,Intercept] 20.021023 12.1596605 -3.523049 44.3472413
## r_Subject[335,Intercept] -22.717244 11.9533152 -46.177643 0.7032051
## r_Subject[337,Intercept] 95.966007 12.4709017 71.690211 120.4976203
## r_Subject[349,Intercept] 1.615151 12.1256699 -21.799628 25.4673204
## r_Subject[350,Intercept] 37.414615 12.1055378 13.702174 61.0783374
## r_Subject[351,Intercept] 15.151710 12.0579344 -8.176647 39.2082310
## r_Subject[352,Intercept] 59.731059 12.3570132 35.987047 84.5744143
## r_Subject[369,Intercept] 30.183657 12.2144481 6.526597 54.6015443
## r_Subject[370,Intercept] 16.620757 12.0964969 -6.754476 40.6666686
## r_Subject[371,Intercept] 19.777051 12.0504798 -3.887480 43.2967901
## r_Subject[372,Intercept] 41.417989 12.2161905 17.894376 65.6737382
## lprior -16.240267 1.7290961 -20.273374 -13.6635974
## lp__ -909.585952 4.2226882 -918.860830 -902.4809172
The first 4 rows are repeated from the summary, along with estimate of the quantiles of the posterior distribution. The next section of our model output displays \(c_{i}\), our intercept estimates for each subject. For example, subject 308 has the subject specific intercept of 64.35, so this could be interpreted as the subject having a mean reaction time of (64.35 + 227.04) = 291.39ms for day 0.
lprior
and lp__
are used for automatic prior/likelihood sensitivity analysis using an additional package, priorsens
and are not included in the brms
documentation.
To obtain the trace and density plots for all model parameters, one may use plot(model)
:
plot(model1)
The left side of the plot shows the posterior density of the parameters. The right side of the plot shows trace plots of the MCMC samples. To obtain a plot of an individual parameter, pars=
argument can be added to the plot function, i.e., plot(model1, pars=b_Intercept)
returns only the density and trace plots of the intercept.
As an alternative to the summary(model)$
function, the coef()
function can be used to pull model coefficients. We can access intercept estimates for each subject, \(\widehat c_i\), to plot them along with the dataset. For more model plotting options in the brms
package, see the mcmc_plot.brmsfit
section of the package documentation.
# population level slop estimate
pop_slope <- coef(model1)[["Subject"]][,,'Days'][1,1]
m1_coefs <- coef(model1)[["Subject"]][,,'Intercept'][,1]
estimate_df <- data.frame(subject = factor(names(m1_coefs)),
slope = pop_slope,
intercept = m1_coefs)
# Plotting first 4 subjects
# Function to emulate first 4 default colors from ggplot package
gg_color_hue <- function(n) {
hues = seq(15, 375, length = n + 1)
hcl(h = hues, l = 65, c = 100)[1:n]
}
colors = gg_color_hue(4)
sleepstudy %>%
filter(Subject %in% c(308,309,310,330)) %>% # first 4 subjects
ggplot(aes(x = Days, y = Reaction, color = Subject)) +
geom_point() +
# subject 308
geom_abline(slope = estimate_df[1,]$slope,
intercept = estimate_df[1,]$intercept,
color = colors[1]) +
# subject 309
geom_abline(slope = estimate_df[2,]$slope,
intercept = estimate_df[2,]$intercept,
color = colors[2]) +
# subject 310
geom_abline(slope = estimate_df[3,]$slope,
intercept = estimate_df[3,]$intercept,
color = colors[3]) +
# subject 330
geom_abline(slope = estimate_df[4,]$slope,
intercept = estimate_df[4,]$intercept,
color = colors[4])
The posterior distribution of each parameter can be assessed by using the as_draws()
function on the brms model object. By default, this function will return all variables from all chains, excluding estimates from warmup/burn-in draws. However, one may pull a single variable or list of variables by adding the variables=c()
, pull values for a specific chain using the $
operator, and include inc_warmup = TRUE
to the function call to include warmup draws.
Ex: as_draws(model1, variables=c('b_Intercept', 'sigma'), inc_warmup=TRUE)$3
will return b_Intercept
and sigma
estimates for all 10,000 draws on chain #3.
We will utilize the brms
package to approach a nonlinear problem with a Bayesian mixed model.
The lme4
package offers another dataset, "Contagious Bovine Pleuropneumonia" (CBPP) that is accessible using data(cbpp)
. This dataset describes the serological incidence of CBPP in zebu cattle during a survey implemented in 15 commercial herds in Ethiopia. The data contains 4 columns: 1. herd - a factor identifying the herd; 2. incidence - the number of new serological cases for a given herd and time period; 3. size - the number describing the size of the herd at the beginning of a given time period; 4. period - a factors with levels 1 to 4. The goal is modelling the incidence (health incidence) probability in terms of herd and period.
head(cbpp)
## herd incidence size period
## 1 1 2 14 1
## 2 1 3 12 2
## 3 1 4 9 3
## 4 1 0 5 4
## 5 2 3 22 1
## 6 2 1 18 2
First we'll plot the data points, colored by herd:
It is a little difficult to see which points belong to which herd. We can add a line to make it a little more clearer:
cbpp %>%
ggplot(aes(x = period, y = (size-incidence)/size, group=herd)) +
geom_point(aes(col=herd)) +
geom_line(aes(col=herd)) +
ylab("Cattle Without Disease (%)") +
xlab("Period")
We will fit a generalized linear mixed model with family set to "binomial". This model will have a random herd specific intercept and random period specific intercept. As before, we will build a model using the lmer
package first to obtain priors.
\[\mbox{ln}\left\{\frac{Pr(\mbox{Healthy}=1)}{Pr(\mbox{Healthy}=0)}\right\}=\beta_0 + \beta_h + \beta_p,\] where \(\beta_0\) is the population intercept, \(\beta_h\) is the random herd-level intercept for herd \(h\), \(\beta_p\) is the random period-level intercept for period \(p\). The herd level random effects are assumed to come from an iid normal distribution with mean zero and some shared, herd-level variance, \(\beta_h \sim N(0,\sigma^2_h)\) , \(h \in {1,...,15}\). Period-level random intercepts are assumed to be iid normal with mean zero and shared period-level variance, \(\beta_p \sim N(0,\sigma^2_p)\), \(p\in {1,2,3,4}\).
There is a useful function in the brms
package that gives information on all parameters that may have priors assigned, including the default priors that the package will use if unspecified.
Calling the get_prior()
function with the formula, family, and data specified shows this information:
# Get priors for our model
get_prior(cbind(incidence, size-incidence) ~ (1 | period) + (1 | herd),
family = binomial(link="logit"),
data = cbpp)
## prior class coef group resp dpar nlpar lb ub
## student_t(3, 0, 2.5) Intercept
## student_t(3, 0, 2.5) sd 0
## student_t(3, 0, 2.5) sd herd 0
## student_t(3, 0, 2.5) sd Intercept herd 0
## student_t(3, 0, 2.5) sd period 0
## student_t(3, 0, 2.5) sd Intercept period 0
## source
## default
## default
## (vectorized)
## (vectorized)
## (vectorized)
## (vectorized)
As before, we use the estimates from the classical analysis using the lme4
package to figure out the priors for a brm
model.
nlin_base_model <- glmer(cbind(incidence, size-incidence) ~ (1 | period) + (1 | herd),
family = binomial(link="logit"),
data = cbpp)
s <- summary(nlin_base_model)
coef(nlin_base_model)
TODO: Explain LMER model and how to obtain priors...
There is a slight change in notation in the brm
call here since we are using a generalized linear mixed effects model with the binomial family. The left side of the model reads size-incidence | trials(size)
, which notes the number of successes on the left (herd size - incidence) and the size of each herd inside of trials()
on the right.
priors <- c(prior(normal(-2.3, 10), class = Intercept), # pop slope
prior(normal(0, 1), class = sd), # residual error
prior(normal(0, 0.71), class = sd, group = herd, coef = Intercept), # herd-level intercept sd
prior(normal(0, 0.54), class = sd, group = period, coef = Intercept)) # period-level intercept sd
nonlin_bayes <- brm(size-incidence | trials(size) ~ (1 | period) + (1 | herd),
family = binomial(link="logit"),
data = cbpp,
prior = priors,
chains = 4,
iter = 10000,
thin = 1,
cores = 4
)
## Compiling Stan program...
## Start sampling
## Family: binomial
## Links: mu = logit
## Formula: size - incidence | trials(size) ~ (1 | period) + (1 | herd)
## Data: cbpp (Number of observations: 56)
## Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
## total post-warmup draws = 20000
##
## Group-Level Effects:
## ~herd (Number of levels: 15)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.72 0.19 0.39 1.14 1.00 7363 11147
##
## ~period (Number of levels: 4)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.63 0.23 0.28 1.18 1.00 9694 12761
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 2.28 0.42 1.49 3.14 1.00 6653 8564
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Posterior summary:
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 2.27957223 0.4178845 1.49258837 3.1424604
## sd_herd__Intercept 0.71512718 0.1929422 0.38703292 1.1382284
## sd_period__Intercept 0.63297275 0.2329689 0.27945501 1.1762494
## r_herd[1,Intercept] -0.58242152 0.4063733 -1.39344655 0.2037430
## r_herd[2,Intercept] 0.32725075 0.4101696 -0.44584835 1.1676858
## r_herd[3,Intercept] -0.37510695 0.3619084 -1.08191298 0.3396342
## r_herd[4,Intercept] 0.00480476 0.4573402 -0.87600084 0.9286435
## r_herd[5,Intercept] 0.22884232 0.3988658 -0.53622945 1.0476199
## r_herd[6,Intercept] 0.47217614 0.4295647 -0.31412619 1.3731280
## r_herd[7,Intercept] -0.89245793 0.3925574 -1.68916376 -0.1442033
## r_herd[8,Intercept] -0.65430451 0.4027218 -1.46575679 0.1086774
## r_herd[9,Intercept] 0.30614757 0.5099247 -0.63495739 1.3786286
## r_herd[10,Intercept] 0.61043971 0.4266187 -0.16704633 1.5059330
## r_herd[11,Intercept] 0.12613279 0.3676035 -0.59338019 0.8628040
## r_herd[12,Intercept] 0.10868315 0.4794709 -0.79375938 1.1027455
## r_herd[13,Intercept] 0.77849663 0.4569669 -0.03487779 1.7503533
## r_herd[14,Intercept] -1.01532988 0.4407256 -1.92053672 -0.1877496
## r_herd[15,Intercept] 0.60810221 0.4587012 -0.22315351 1.5684108
## r_period[1,Intercept] -0.80003273 0.3935593 -1.62821679 -0.0839223
## r_period[2,Intercept] 0.10064658 0.3988218 -0.73069515 0.8853567
## r_period[3,Intercept] 0.20919100 0.4016436 -0.60672948 1.0119745
## r_period[4,Intercept] 0.50426822 0.4350174 -0.28905147 1.4317063
## lprior -4.14438682 0.6673631 -5.83582125 -3.2934867
## lp__ -117.55461792 4.2924870 -127.08076189 -110.3135065
There are two ways to make predictions from a brm
model. The first way is to use the generic predict()
function that is built in. This function returns an estimate using the fitted parameters from the model including the estimated error and quantiles.
If the interest is in predicting the number of healthy cattle from herd #2 in period 4 with a total size of 15, we can use the following code:
predict(nonlin_bayes, newdata = data.frame(herd = 2,
period =4,
size =15))
## Estimate Est.Error Q2.5 Q97.5
## [1,] 14.2849 0.8902087 12 15
The second way is to use the posterior_predict()
function that is built in to the brms
package. This will return an S x N array of values where S is the total number of posterior draws from our model (total number of chains x number of post burn-in draws = 4 x 5,000 in our example) and N is the number of observations being predicted.
The following output represents the predicted number of healthy (non-diseased) cattle.
preds <- posterior_predict(nonlin_bayes, newdata = data.frame(herd = 2,
period =4,
size =15))
head(preds)
## [,1]
## [1,] 15
## [2,] 13
## [3,] 13
## [4,] 13
## [5,] 13
## [6,] 15
One may calculate the predictive probability as follows:
\[{\rm Pr}(\mbox{Healthy}=1)=\frac{\exp(\beta_{0, s}+\beta_{h,s}+\beta_{p, s})}
{1+\exp(\beta_{0, s}+\beta_{h, s}+\beta_{p, s})},\] where \((\beta_{0, s}+\beta_{h, s}+\beta_{p, s})\) denotes the samples drawn from the posterior distribution of the parameters. A simple way to show a histogram of the predictive probabilities is to use the hist()
function:
(preds / 15) %>% head()
## [,1]
## [1,] 1.0000000
## [2,] 0.8666667
## [3,] 0.8666667
## [4,] 0.8666667
## [5,] 0.8666667
## [6,] 1.0000000
hist(preds/15, main="Predictive density", xlab="probability of healthy for herd 2 and period 4", prob=T)
In the Bayesian context, we obtain the predictive distribution of a future observation, which sheds light on the uncertainty of the prediction. To view model estimates for each parameter, one may use the as_draws()
function as we did earlier. The code below provides \(\widehat\beta_0\) from all posterior samples after the burn-in period of chain 1:
as_draws(nonlin_bayes)$`1`$b_Intercept
Using this function, we can create an dataframe of estimates for population slope \(\widehat\beta_0\), herd specific random slope \(\widehat\beta_h\), and period specific random slope \(\widehat\beta_p\) for our earlier example of cattle in herd number 2 and period 4:
chain1 <- as_draws(nonlin_bayes)$`1`
chain2 <- as_draws(nonlin_bayes)$`2`
chain3 <- as_draws(nonlin_bayes)$`3`
chain4 <- as_draws(nonlin_bayes)$`4`
beta_hat0 <- c(chain1$b_Intercept,
chain2$b_Intercept,
chain3$b_Intercept,
chain4$b_Intercept)
beta_hath2 <- c(chain1$`r_herd[2,Intercept]`,
chain2$`r_herd[2,Intercept]`,
chain3$`r_herd[2,Intercept]`,
chain4$`r_herd[2,Intercept]`)
beta_hatp4 <- c(chain1$`r_period[4,Intercept]`,
chain2$`r_period[4,Intercept]`,
chain3$`r_period[4,Intercept]`,
chain4$`r_period[4,Intercept]`)
beta_hat_df<- data.frame('beta_hat_0' = beta_hat0,
'beta_hat_h2' = beta_hath2,
'beta_hat_p4' = beta_hatp4)
kable(head(beta_hat_df))
beta_hat_0 | beta_hat_h2 | beta_hat_p4 |
---|---|---|
1.926996 | 0.5357044 | 0.6697856 |
2.260085 | 0.1031165 | 0.4283443 |
2.001483 | 0.3337946 | 0.5822312 |
1.683224 | 0.4592128 | 1.4044626 |
1.936809 | 0.2113180 | 0.4486134 |
2.222582 | 0.2694498 | 1.1179977 |
https://cran.microsoft.com/snapshot/2021-10-10/web/packages/lme4/vignettes/lmer.pdf
https://www.lcampanelli.org/mixed-effects-modeling-lme4/
https://www.rensvandeschoot.com/tutorials/lme4/
https://vitalflux.com/fixed-vs-random-vs-mixed-effects-models-examples/
https://www.rdocumentation.org/packages/lme4/versions/1.1-29/topics/lmer
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5970551/
Belenky et al. (2003). [[Sleepstudy data]]
Bürkner, (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Magnusson et al. (2019). Bayesian leave-one-out cross-validation for large data (2019) (http://proceedings.mlr.press/v97/magnusson19a/magnusson19a.pdf)
Vehtari et al (2013). Understanding predictive information criteria for Bayesian models. http://www.stat.columbia.edu/~gelman/research/published/waic_understand3.pdf
Vehtari et al. (2018). R-squared for Bayesian regression models (http://www.stat.columbia.edu/~gelman/research/unpublished/bayes_R2.pdf)
Vehtari et al. (2019). Bayesian R2 and LOO-R2 (https://avehtari.github.io/bayes_R2/bayes_R2.html)
Vehtari et al. (2021). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC (with discussion). Bayesian Data Analysis. (https://projecteuclid.org/journals/bayesian-analysis/advance-publication/Rank-Normalization-Folding-and-Localization--An-Improved-R%CB%86-for/10.1214/20-BA1221.full)
Comments on the variance component estimation
To analyze the data using the mixed effects model, we will utilize the R package
lme4
. Generally, the restricted maximum likelihood (REML) method is recommended for model fitting. However, REML does not work with model selection techniques, such as AIC and BIC. Therefore, if it is necessary to compare different models, REML should not be used. However, once the best model is selected, REML can be used with that model for more accurate confidence intervals. To avoid the REML method, one should setREML=FALSE
in thelmer
function. The default method oflmer
is REML.Here is the code for fitting model-1.
Output
The summary of the model gives us AIC, BIC, and log-likelihood scores, deviance, degrees of freedom, scaled residuals, and parameter estimates. First, the estimated standard deviation of the random intercept term is 36.01, and that for the residuals is 30.9. Next, the fixed effect of days can be measured via the regression coefficient, and the estimated regression coefficient is 10.47.
Prediction under model 1
Next we plot the predicted response \(\widehat{Y}_{i, j} = \widehat{\alpha}+\widehat{c}_{i} + j\widehat{\beta}\). Each line corresponds to each subject.