Learn multilevel models: Estimating a Bayes factor using the Savage-Dickey density ratio

Andrew Ellis

Bayes factor for difference in means

We are going to look at data from a study in which children with ADHD were compared to a control group in terms of their working memory capacity

library(tidyverse)
nback <- read_csv("https://raw.githubusercontent.com/kogpsy/neuroscicomplab/main/data/adhd-nback.csv")

Die grouping variable group should be converted to a factor.

nback <- nback %>% 
  mutate(group = as_factor(group),
         group = fct_relevel(group, "control"))

A directed Welch test result in a p-value of approx. 0.07. We cannot reject the null hypothesis.

t.test(rt ~ group,
       data = nback,
       alternative = "less")


    Welch Two Sample t-test

data:  rt by group
t = -1.5282, df = 28.57, p-value = 0.06873
alternative hypothesis: true difference in means between group control and group adhd is less than 0
95 percent confidence interval:
       -Inf 0.01757687
sample estimates:
mean in group control    mean in group adhd 
             1.153130              1.309528

We now want to perform a Bayesian model comparison using a Bayes factor, in order to determine whether there is evidence that the ADHD group has a larger mean than the control group. However, we will start by estimating a Bayes factor for the undirected hypotheses that the group means differ.

Exercise 1 Estimate an intercept and a group difference. Use a normal(0, 1) for the group difference.

summary(m1)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: rt ~ group 
   Data: nback1 (Number of observations: 51) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     1.15      0.06     1.03     1.28 1.00     3543     2601
groupadhd     0.16      0.10    -0.04     0.35 1.00     3581     2679

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.34      0.04     0.27     0.41 1.00     3673     2572

Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

The population level effects Intercept and groudadhd represent the expected mean of the control group, and the expected difference between groups. The parameter has a mean of $0.16$ a $95%$ credible interval of $[-0.04, 0.35]$.

You can use the hypothesis() function to get the number of samples that are positive divided by the number of samples that are negative.

Exercise 2

Use the hypothesis() function to obtain the probability that the parameter is $>0$.
Save the output of hypothesis() and plot it using plot().

h1 <- m1 %>% 
  hypothesis("groupadhd > 0")

h1

Hypothesis Tests for class b:
       Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (groupadhd) > 0     0.16       0.1        0     0.31      17.18
  Post.Prob Star
1      0.94     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

This returns the ratio of positive versus negative samples from the posterior distribution. THe evidence ratio Evid.Ratio represents the area under the posterior greater than zero divided by the area under the posterior less than zero. In this case the evidence ratio is $17.18$, which means that positive values are $17.18$ more probable than negative values, or the positive area is $17.18$ times greater than the negative area.

We can also compute the posterior probability that the value is positive. This means that we have to find the probability $p_+$ for which $\frac{p_+}{1-p_+} = 17.18$. This results in $p_+ = 17.18/(1 + 17.18) = 0.945$, which is given in the output under Post.Prob.

p_pos <- 17.18 / (1 + 17.18)
p_pos

[1] 0.9449945

plot(h1)

Exercise 3

Refit the model from exercise 1, but this time adding the argument sample_prior = TRUE, and saving asm2.

Two-sided

m2 <- brm(rt ~ group,
          prior = priors,
          sample_prior = TRUE,
          data = nback1,
          file = "models/exc4-m2")

By including this argument we are telling brms to sample from both the prior and posterior distributions, which we need to compute the Savage-Dickey density ratio.

We can plot both prior and posterior, i.e. what we believed about our parameter before seeing the data, and our updated belief after taking the data into account.

m2 %>% 
  mcmc_plot(c("b_groupadhd", "prior_b"))

Exercise 4

Try to wuantify evidence for the null hypothesis that the group difference is zero, using the hypothesis() function to compute the Savage-Dickey density ratio.

save the ouput as h2.
inspect the ouput using print(h2) or simply h2.
the evidence ratio Evid.Ratio is the Bayes factor $BF_{01}$. THe ouput of hypothesis() is a list;Evid.Ratio can be accessed under h2$hypothesis$Evid.Ratio. Save this as BF01.
Show the Bayes factor $BF_{10}$ (this is easy if you already have $BF_{01}$). What does $BF_{10}$ tell us?

h2 <- m2 %>% 
  hypothesis("groupadhd = 0")
h2

Hypothesis Tests for class b:
       Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (groupadhd) = 0     0.16       0.1    -0.03     0.34       2.63
  Post.Prob Star
1      0.72     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

BF01 <- h2$hypothesis$Evid.Ratio
BF01

[1] 2.62944

The Bayes factor $BF_{01}$ tells us how much more likely the data are under the null hypothesis than under the alternative hypothesis.. The alternative hypothesis corresponds to our prior on the group difference, which stated that we were 95% certain that the difference lies between -2 and 2. This prior is not very specific, and the resulting Bayes factor reflects this—under the null hypothesis, the data are approximately 2.6 times more likely than under the alternative hypothesis. Despite the fact that when estimating the parameter we are almost 95% certain that the parameter is positive, the Bayes factor provides evidence for the null hypothesis. The Bayes factor for the alternative hypothesis is $0.38$.

BF10 <- 1/BF01
BF10

[1] 0.3803091

We have to extremely careful when chooing prior distributions when estimating Bayes factors. Uninformative priors are useful when estimating parameters, but are a bad idea when estimating Bayes factors.

Exercise 5

Plot the output hypothesis() using the plot() method.

The density evaluated at 0 is higher under the posterior than under prior, leading to the BF for the null.

One-sided

Now we will attempt to estimate the Bayes factor $BF_{10}$ for the directed hypothesis that the mean of the ADHD group is greater than that of the control group.

This hypothesis states that the group difference is constraine to be positive. Therefore, we will use a lower bound on our prior distribution. Furthermore, we expect the difference to be small (based on previous research), so we use a small standar deviation of $0.1$. Combining both of these beliefs, we can express our prior as prior(normal(0, 0.1), lb = 0). This corresponds to a folded normal distribution, or a half-normal.

tibble(x = seq(-0.5, 1, by = 0.01),
       y = if_else(x >= 0,  dnorm(x, 0, 0.1), 0)) |> 
  ggplot(aes(x, abs(y))) +
  geom_line(size = 1.5) +
  xlab("") + ylab("") +
  ggtitle("Half-normal distribution")

m3 <- brm(rt ~ group,
          prior = prior(normal(0, 0.1), lb = 0),
          sample_prior = TRUE,
          data = nback1,
          file = "models/exc4-m3")

We plot both prior and posterior, as before. THe prior now has zero density for values $<0$.

m3 %>% 
  mcmc_plot(c("b_groupadhd", "prior_b"))

Now we can use the same hypothesis as before—we are now comparing the null to an alternative that states that the group difference is small, but positive.

h3 <- m3 %>% 
  hypothesis("groupadhd = 0")
h3

Hypothesis Tests for class b:
       Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (groupadhd) = 0      0.1      0.06     0.01     0.22       0.42
  Post.Prob Star
1       0.3    *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

The Bayes factor is now $0.42$ in favour of the null hypothesis. This means that the data are $0.42$ times as likely under the null.

BF01alt <- h3$hypothesis$Evid.Ratio
BF01alt

[1] 0.423604

This is better expresssed as a Bayes factor in favour of the alternative, $BF_{10}$.

BF10alt <- 1/BF01alt
BF10alt

[1] 2.360695

Now the data are $2.36$ more likely to have occurred under the alternative than under the null hypothesis.

The Bayes factor tells us how well a model’s prior distributions predicted the data. The Bayes factor is this related to the prior predictive distribution. Our hypotheses correspond to prior distributions on the parameters, and thus it is extremely important that our priors reflect our hypotheses.

Estimating a Bayes factor using the Savage-Dickey density ratio

Bayes factor for difference in means

Two-sided

One-sided

References

Corrections

Reuse

Citation