Learn multilevel models: Multilevel formula syntax

Andrew Ellis

library(tidyverse)
library(brms)
library(tidybayes)

theme_set(theme_grey(base_size = 14) +
            theme(panel.grid = element_blank()))

Types of clusters

So far we have looked at simple hierachical structures. For example, we have considered repeated measures on subjects at two time points, or over the course of several days.

Multilevel models allow us take complex hierarchical structures into account when modelling data—for instance. we might have data from pupils nested within classes and schools, we might have two separate grouping variable that are crossed, such as subjects and word stimuli in an experiment.

Crossed

Imagine that we are testing 3 subjects in an experiment, and each subject is shown each of 10 stimuli.

subject_id <- str_c("subj_", 1:3)
item_id <- 1:10

crossing(subject_id, item_id)

# A tibble: 30 x 2
   subject_id item_id
   <chr>        <int>
 1 subj_1           1
 2 subj_1           2
 3 subj_1           3
 4 subj_1           4
 5 subj_1           5
 6 subj_1           6
 7 subj_1           7
 8 subj_1           8
 9 subj_1           9
10 subj_1          10
# … with 20 more rows

In this case we might want to take into account that both subjects and items can be considered as sources of variance, and that the observations can be clustered by both subjects and item. Every subject and every item appears multiple times in the dataset, and each subject is presented with all the items. In this case, subjects and items are crossed.

In a traditional analysis, it is commonplace to aggregate over one of the clusters while computing estimates for the other. For instance, when computing subject-specific means, we might simply ignore the items by aggregating over them.

However, this can lead to aggregation artefacts, and systematically ignores aources of variation (Baayen, Davidson, and Bates 2008).

In a multilevel model, we can instead simultaneously model both sources of variability as separate cluster.

If we are predicting a response variable in different conditions, we can include these crossed varying effects using the following notation.

Varying intercepts

response ~ 1 + condition + (1 | subject) + (1 | item)

or, equivalently

response ~ 1 + condition + (1 | subject + item)

Varying intercepts and slopes

Note that here the intercept is automatically added to the formula.

response ~ condition + (condition | subject) + (condition | item)

Example

For example, Wagenmakers and Brown (2007) had subjects perform a lexical decision task in two conditions, speeded and accuracy, in which the instructions were to decide either as quickly or as accurately as posssible whether a string of letters was a word or a non-word. The response variable were response times and choices.

library(rtdists)
data(speed_acc) 

speed_acc <- speed_acc %>%
  as_tibble() |> 
  select(-censor, -block)


df_speed_acc <- speed_acc %>% 
   # remove rt under 180 ms and above 3000 ms
  filter(rt > 0.18, rt < 3) %>% 
   # convert to character
  mutate(across(c(stim_cat, response), as.character),
         correct = as.numeric(stim_cat == response),
         across(c(stim_cat, response), as_factor))

df_speed_acc

# A tibble: 31,355 x 8
   id    condition stim  stim_cat frequency   response    rt correct
   <fct> <fct>     <fct> <fct>    <fct>       <fct>    <dbl>   <dbl>
 1 1     speed     5015  nonword  nw_low      nonword  0.7         1
 2 1     speed     3623  word     very_low    nonword  0.392       0
 3 1     speed     6481  nonword  nw_very_low nonword  0.46        1
 4 1     speed     3305  word     very_low    word     0.455       1
 5 1     speed     2244  word     low         nonword  0.505       0
 6 1     speed     4468  nonword  nw_high     nonword  0.773       1
 7 1     speed     1047  word     high        word     0.39        1
 8 1     speed     5711  nonword  nw_low      word     0.587       0
 9 1     speed     5036  nonword  nw_low      nonword  0.603       1
10 1     speed     1111  word     high        word     0.435       1
# … with 31,345 more rows

We are interested in the subject and stimulus IDs as sources of variability, and the differences in accuracy between instruction conditions.

Varying intercept model

brm(correct ~ condition + (1 | id) + (1 | stim),
    family = bernoulli(),
    data = df_speed_acc)

Varying intercept and slope model

brm(correct ~ condition + (condition | id) + (condition | stim),
    family = bernoulli(),
    data = df_speed_acc)

Nested

Scholastic aptitude tests are given multiple times to students (repeated observations nested within students), and students are nested within schools. Schools in turn can be nested within districts. Here, students are said to be nested within schools, meaning that each school contains students unique to that school.

A further example could be therapists working with several different patients.

For example, to model varying effects for therapists nested within patients, we can can use the following syntax.

response ~ condition + (condition | therapist/patient)

This is expanded to

response ~ condition + (condition | therapist) + (condition | therapist:patient)

The : operator creates a new grouping factor that consists of the combined levels of therapist and patient.

nested <- tibble(
  patient = c(rep("A",2), rep("B",2), rep("C",2), rep("D",2)),
  therapist  = c(rep("GE", 4), rep("DF", 4))
)

Extended multilevel formula syntax

This is explained in Bürkner (2018) and in the vignettes for the brms package.

You can access this by entering vignette("brms_multilevel") in the R console.

The syntax in brms is very close to that introduced by lme4 (Bates et al. 2015). However, brms extend the syntax so that nore complex models cab be expressed.

In general, the syntax has the form

response ~ pterms + (gterms | group)

The pterms part contains the population-level effects. The gterms part contains so called group-level effects that are assumed to vary within levels of a grouping variable

For example, (1 | group), is the simplest possible mixed-model formula, where each level of the grouping factor, group, has its own varying intercept, which can be thought of in two ways:

as a deviation \(\alpha_j\) from the average intercept \(\alpha\). The \(\alpha_j\) terms have the distribution \(\alpha_j \sim \mathcal{N}(0, \sigma_{\alpha})\).
as a random variate drawn from the distribution \(\alpha_j \sim \mathcal{N}(\alpha, \sigma_{\alpha})\)

The two are equivalent (you can always remove the mean from a normal distribution and add it to a zero mean random variate), but 1) emphasizes the decomposition into two components, whereas 2) emphasizes the hierarchical structure.

In addition to the lme4 inspired syntax, brms offers several extensions to the formula syntax.

Grouping terms

Varying effects can be grouped using the gr() function, with the argument by specifying sub-populations of the groups. For each level of the by variable, a separate variance-covariance matrix will be fitted.

For example, the following code fits varying intercepts for patients, but separately for each treatment.

response ~ 1 + (1 | gr(patient, by = "treatment"))

Multi-membership grouping terms

Varying effects can also be members of several groups—this is taken care of with the mm() function. As an example, pupils might change schools during the term, meaning that they are member of both school1 and school2. We can also take into account the amount of time spent at each school using the weights argument.

response ~ x1 + (1 | mm(school1, school2, weights = cbind(w1, w2)

Correlation between varying effects

If you are predicting family-specific (distributional) parameters, and you are including varying effects for a grouping variable, you might want to model a correlation between these effects. You can do by including the term |s|, where s can be any arbitrary identifier.

bf(response ~ 1 + (1 |s| subject),
   sigma ~ 1 + (1 |s| subject))

If on the other hand you don’t want to estimate the correlation between varying intercepts and varying slopes, you can omit this by leaving out the identifier between the | |.

response ~ 1 + condition + (1 || subject)

Baayen, R. H., D. J. Davidson, and D. M. Bates. 2008. “Mixed-Effects Modeling with Crossed Random Effects for Subjects and Items.” Journal of Memory and Language, Special issue: Emerging Data Analysis, 59 (4): 390–412. https://doi.org/10.1016/j.jml.2007.12.005.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” Journal of Statistical Software 67 (1). https://doi.org/10.18637/jss.v067.i01.

Bürkner, Paul-Christian. 2018. “Advanced Bayesian Multilevel Modeling with the R Package Brms.” The R Journal 10 (1): 395. https://doi.org/10.32614/RJ-2018-017.

Wagenmakers, Eric-Jan, and Scott Brown. 2007. “On the Linear Relation Between the Mean and the Standard Deviation of a Response Time Distribution.” Psychological Review 114 (3): 830–41. https://doi.org/10.1037/0033-295X.114.3.830.

Multilevel formula syntax