For a simple mean comparison
We are going to look at data from a study in which children with ADHD were compared to a control group in terms of their working memory capacity
Die grouping variable group
should be converted to a factor.
nback <- nback %>%
mutate(group = as_factor(group),
group = fct_relevel(group, "control"))
A directed Welch test result in a p-value of approx. 0.07. We cannot reject the null hypothesis.
t.test(rt ~ group,
data = nback,
alternative = "less")
Welch Two Sample t-test
data: rt by group
t = -1.5282, df = 28.57, p-value = 0.06873
alternative hypothesis: true difference in means between group control and group adhd is less than 0
95 percent confidence interval:
-Inf 0.01757687
sample estimates:
mean in group control mean in group adhd
1.153130 1.309528
We now want to perform a Bayesian model comparison using a Bayes factor, in order to determine whether there is evidence that the ADHD group has a larger mean than the control group. However, we will start by estimating a Bayes factor for the undirected hypotheses that the group means differ.
Exercise 1 Estimate an intercept and a group difference. Use a normal(0, 1)
for the group difference.
summary(m1)
Family: gaussian
Links: mu = identity; sigma = identity
Formula: rt ~ group
Data: nback1 (Number of observations: 51)
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 4000
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.15 0.06 1.03 1.28 1.00 3543 2601
groupadhd 0.16 0.10 -0.04 0.35 1.00 3581 2679
Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.34 0.04 0.27 0.41 1.00 3673 2572
Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The population level effects Intercept
and groudadhd
represent the expected mean of the control group, and the expected difference between groups. The parameter has a mean of \(0.16\) a \(95%\) credible interval of \([-0.04, 0.35]\).
You can use the hypothesis()
function to get the number of samples that are positive divided by the number of samples that are negative.
Exercise 2
Use the hypothesis()
function to obtain the probability that the parameter is \(>0\).
Save the output of hypothesis()
and plot it using plot()
.
h1 <- m1 %>%
hypothesis("groupadhd > 0")
h1
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (groupadhd) > 0 0.16 0.1 0 0.31 17.18
Post.Prob Star
1 0.94
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
This returns the ratio of positive versus negative samples from the posterior distribution. THe evidence ratio Evid.Ratio
represents the area under the posterior greater than zero divided by the area under the posterior less than zero. In this case the evidence ratio is \(17.18\), which means that positive values are \(17.18\) more probable than negative values, or the positive area is \(17.18\) times greater than the negative area.
We can also compute the posterior probability that the value is positive. This means that we have to find the probability \(p_+\) for which \(\frac{p_+}{1-p_+} = 17.18\). This results in \(p_+ = 17.18/(1 + 17.18) = 0.945\), which is given in the output under Post.Prob
.
p_pos <- 17.18 / (1 + 17.18)
p_pos
[1] 0.9449945
plot(h1)
Exercise 3
Refit the model from exercise 1, but this time adding the argument sample_prior = TRUE
, and saving asm2
.
m2 <- brm(rt ~ group,
prior = priors,
sample_prior = TRUE,
data = nback1,
file = "models/exc4-m2")
By including this argument we are telling brms
to sample from both the prior and posterior distributions, which we need to compute the Savage-Dickey density ratio.
We can plot both prior and posterior, i.e. what we believed about our parameter before seeing the data, and our updated belief after taking the data into account.
Exercise 4
Try to wuantify evidence for the null hypothesis that the group difference is zero, using the hypothesis()
function to compute the Savage-Dickey density ratio.
h2
.print(h2)
or simply h2
.Evid.Ratio
is the Bayes factor \(BF_{01}\). THe ouput of hypothesis()
is a list;Evid.Ratio
can be accessed under h2$hypothesis$Evid.Ratio
. Save this as BF01
.h2 <- m2 %>%
hypothesis("groupadhd = 0")
h2
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (groupadhd) = 0 0.16 0.1 -0.03 0.34 2.63
Post.Prob Star
1 0.72
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
BF01 <- h2$hypothesis$Evid.Ratio
BF01
[1] 2.62944
The Bayes factor \(BF_{01}\) tells us how much more likely the data are under the null hypothesis than under the alternative hypothesis.. The alternative hypothesis corresponds to our prior on the group difference, which stated that we were 95% certain that the difference lies between -2 and 2. This prior is not very specific, and the resulting Bayes factor reflects this—under the null hypothesis, the data are approximately 2.6 times more likely than under the alternative hypothesis. Despite the fact that when estimating the parameter we are almost 95% certain that the parameter is positive, the Bayes factor provides evidence for the null hypothesis. The Bayes factor for the alternative hypothesis is \(0.38\).
BF10 <- 1/BF01
BF10
[1] 0.3803091
We have to extremely careful when chooing prior distributions when estimating Bayes factors. Uninformative priors are useful when estimating parameters, but are a bad idea when estimating Bayes factors.
Exercise 5
Plot the output hypothesis()
using the plot()
method.
The density evaluated at 0 is higher under the posterior than under prior, leading to the BF for the null.
Now we will attempt to estimate the Bayes factor \(BF_{10}\) for the directed hypothesis that the mean of the ADHD group is greater than that of the control group.
This hypothesis states that the group difference is constraine to be positive. Therefore, we will use a lower bound on our prior distribution. Furthermore, we expect the difference to be small (based on previous research), so we use a small standar deviation of \(0.1\). Combining both of these beliefs, we can express our prior as prior(normal(0, 0.1), lb = 0)
. This corresponds to a folded normal distribution, or a half-normal.
We plot both prior and posterior, as before. THe prior now has zero density for values \(<0\).
Now we can use the same hypothesis as before—we are now comparing the null to an alternative that states that the group difference is small, but positive.
h3 <- m3 %>%
hypothesis("groupadhd = 0")
h3
Hypothesis Tests for class b:
Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (groupadhd) = 0 0.1 0.06 0.01 0.22 0.42
Post.Prob Star
1 0.3 *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
The Bayes factor is now \(0.42\) in favour of the null hypothesis. This means that the data are \(0.42\) times as likely under the null.
BF01alt <- h3$hypothesis$Evid.Ratio
BF01alt
[1] 0.423604
This is better expresssed as a Bayes factor in favour of the alternative, \(BF_{10}\).
BF10alt <- 1/BF01alt
BF10alt
[1] 2.360695
Now the data are \(2.36\) more likely to have occurred under the alternative than under the null hypothesis.
The Bayes factor tells us how well a model’s prior distributions predicted the data. The Bayes factor is this related to the prior predictive distribution. Our hypotheses correspond to prior distributions on the parameters, and thus it is extremely important that our priors reflect our hypotheses.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/awellis/learnmultilevelmodels, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ellis (2021, May 29). Learn multilevel models: Estimating a Bayes factor using the Savage-Dickey density ratio. Retrieved from https://awellis.github.io/learnmultilevelmodels/bayes-factor-group-difference.html
BibTeX citation
@misc{ellis2021estimating, author = {Ellis, Andrew}, title = {Learn multilevel models: Estimating a Bayes factor using the Savage-Dickey density ratio}, url = {https://awellis.github.io/learnmultilevelmodels/bayes-factor-group-difference.html}, year = {2021} }