Processing math: 100%
+ - 0:00:00
Notes for current slide
Notes for next slide

Intro to Bayesian Statistics

Part 2
Linear models

Andrew Ellis

Bayesian multilevel modelling workshop 2021

05-21-2021

Estimating parameters of a normal distribution

We have 3 data points, y1=85, y2=100, und y3=115. We can assume that they are normally distributed.

p(y|μ,σ)=1Zexp(12(yμ)2σ2)

Goal: We want to estimate the mean and standard deviation of a normal distribution

Z=σ2π is a normalising constant.

We can compute the probability of one data point, and if the data points are i.i.d, then P(data|,μ,σ) is the product of the indidividual probabilities.

Estimating parameters of a normal distribution

We have 3 data points, y1=85, y2=100, und y3=115. We can assume that they are normally distributed.

p(y|μ,σ)=1Zexp(12(yμ)2σ2)

Goal: We want to estimate the mean and standard deviation of a normal distribution

Z=σ2π is a normalising constant.

We can compute the probability of one data point, and if the data points are i.i.d, then P(data|,μ,σ) is the product of the indidividual probabilities.

But how do we find μ und σ?

Graphical model


Graphical Model für normalverteilte Daten.

Graphical Model für normalverteilte Daten.

Graphical model


Graphical Model für normalverteilte Daten.

Graphical Model für normalverteilte Daten.

μ??? σ??? yN(μ,σ)

library(tidyverse)
sequence_length <- 100
d1 <- crossing(y = seq(from = 50, to = 150, length.out = sequence_length),
mu = c(87.8, 100, 112),
sigma = c(7.35, 12.2, 18.4)) %>%
mutate(density = dnorm(y, mean = mu, sd = sigma),
mu = factor(mu, labels = str_c("mu==", c(87.8, 100, 112))),
sigma = factor(sigma,
labels = str_c("sigma==", c(7.35, 12.2, 18.4))))
theme_set(
theme_bw() +
theme(text = element_text(color = "white"),
axis.text = element_text(color = "black"),
axis.ticks = element_line(color = "white"),
legend.background = element_blank(),
legend.box.background = element_rect(fill = "white",
color = "transparent"),
legend.key = element_rect(fill = "white",
color = "transparent"),
legend.text = element_text(color = "black"),
legend.title = element_text(color = "black"),
panel.background = element_rect(fill = "white",
color = "white"),
panel.grid = element_blank()))
d1 %>%
ggplot(aes(x = y)) +
geom_ribbon(aes(ymin = 0, ymax = density),
fill = "steelblue") +
geom_vline(xintercept = c(85, 100, 115),
linetype = 3, color = "white") +
geom_point(data = tibble(y = c(85, 100, 115)),
aes(y = 0.002),
size = 2, color = "red") +
scale_y_continuous(expression(italic(p)(italic(y)*"|"*mu*", "*sigma)),
expand = expansion(mult = c(0, 0.05)), breaks = NULL) +
ggtitle("Which distribution?") +
coord_cartesian(xlim = c(60, 140)) +
facet_grid(sigma ~ mu, labeller = label_parsed) +
theme_bw() +
theme(panel.grid = element_blank())

Example: t-Test as a linaer model

We will use some simulated data:

we have two groups. One group were given a smart drug, the other is the control (placebo) group. IQ values have a mean of 100, and an SD of 15

We would like to know if the smart group is smarter than the control group.

Kruschke, John. Doing Bayesian Data Analysis (2nd Ed.). Boston: Academic Press, 2015.

smart = tibble(IQ = c(101,100,102,104,102,97,105,105,98,101,100,123,105,103,
100,95,102,106,109,102,82,102,100,102,102,101,102,102,
103,103,97,97,103,101,97,104,96,103,124,101,101,100,
101,101,104,100,101),
Group = "SmartDrug")
placebo = tibble(IQ = c(99,101,100,101,102,100,97,101,104,101,102,102,100,105,
88,101,100,104,100,100,100,101,102,103,97,101,101,100,101,
99,101,100,100,101,100,99,101,100,102,99,100,99),
Group = "Placebo")
TwoGroupIQ <- bind_rows(smart, placebo) %>%
mutate(Group = fct_relevel(as.factor(Group), "Placebo"))
library(kableExtra)
TwoGroupIQ %>%
group_by(Group) %>%
summarise(mean = mean(IQ),
sd = sd(IQ)) %>%
mutate(across(where(is.numeric), round, 2)) %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover"))
Group mean sd
Placebo 100.36 2.52
SmartDrug 101.91 6.02

Smart Drug Group

d <- TwoGroupIQ %>%
filter(Group == "SmartDrug") %>%
mutate(Group = fct_drop(Group))
d %>%
ggplot(aes(x = IQ)) +
geom_histogram(fill = "skyblue3", binwidth = 1) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)))

Parameter estimation using brms

Priors

library(brms)
priors <- get_prior(IQ ~ 1,
data = d)
priors
## prior class coef group resp dpar nlpar bound source
## student_t(3, 102, 3) Intercept default
## student_t(3, 0, 3) sigma default

Priors

σ

tibble(x = seq(from = 0, to = 10, by = .025)) %>%
mutate(d = dt(x, df = 3)) %>%
ggplot(aes(x = x, ymin = 0, ymax = d)) +
geom_ribbon(fill = "skyblue3") +
scale_x_continuous(expand = expansion(mult = c(0, 0.05))) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)), breaks = NULL) +
coord_cartesian(xlim = c(0, 8),
ylim = c(0, 0.35)) +
xlab(expression(sigma)) +
labs(subtitle = "Half-student-t prior on standard deviation") +
theme_bw(base_size = 14)

Priors

μ

tibble(x = seq(from = 0, to = 200, by = .025)) %>%
mutate(d = dnorm(x, mean = 102, sd = 3)) %>%
ggplot(aes(x = x, ymin = 0, ymax = d)) +
geom_ribbon(fill = "skyblue3") +
scale_x_continuous(expand = expansion(mult = c(0, 0.05))) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05)), breaks = NULL) +
coord_cartesian(xlim = c(50, 150),
ylim = c(0, 0.15)) +
xlab(expression(sigma)) +
labs(subtitle = "Normal prior on mean") +
theme_bw(base_size = 14)

Prior Predictive Distribution

m1_prior <- brm(IQ ~ 1,
prior = priors,
data = d,
sample_prior = "only",
file = "models/twogroupiq-prior-1")

Prior Predictive Distribution

summary(m1_prior)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: IQ ~ 1
## Data: d (Number of observations: 47)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 101.95 6.20 92.34 111.71 1.00 2048 1510
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 3.75 17.71 0.10 12.94 1.00 1757 1199
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Prior Predictive Distribution

plot(m1_prior)

Prior Predictive Distribution

library(tidybayes)
prior_pred_1 <- d %>%
modelr::data_grid(Group) %>%
add_predicted_draws(m1_prior) %>%
ggplot(aes(y = Group, x = .prediction)) +
stat_interval(.width = c(.50, .80, .95, .99)) +
geom_point(aes(x = IQ), alpha = 0.4, data = d) +
scale_color_brewer() +
theme_tidybayes()

Sampling from the Posterior

m1 <- brm(IQ ~ 1,
prior = priors,
data = d,
file = "models/twogroupiq-1")
plot(m1)

summary(m1)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: IQ ~ 1
## Data: d (Number of observations: 47)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 101.92 0.83 100.31 103.53 1.00 3134 2075
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 6.04 0.64 4.96 7.47 1.00 3060 2458
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Posterior Quantile Intervals

library(patchwork)
mcmc_plot(m1, pars = "b") / mcmc_plot(m1, pars = "sigma")

Posterior Samples

samples <- posterior_samples(m1) %>%
transmute(mu = b_Intercept, sigma = sigma)
library(tidybayes)
samples %>%
select(mu) %>%
median_qi(.width = c(.50, .80, .95, .99))
## mu .lower .upper .width .point .interval
## 1 101.9132 101.36505 102.4841 0.50 median qi
## 2 101.9132 100.86039 102.9730 0.80 median qi
## 3 101.9132 100.30502 103.5255 0.95 median qi
## 4 101.9132 99.71696 104.1686 0.99 median qi
samples %>%
select(mu) %>%
median_qi(.width = c(.50, .80, .95, .99)) %>%
ggplot(aes(x = mu, xmin = .lower, xmax = .upper)) +
geom_pointinterval() +
ylab("")

Posterior Samples

samples %>%
select(mu) %>%
ggplot(aes(x = mu)) +
stat_halfeye()

samples %>%
select(sigma) %>%
ggplot(aes(x = sigma)) +
stat_halfeye(point_interval = mode_hdi)

Posterior Predictive Distribution

post_pred_1 <- d %>%
modelr::data_grid(Group) %>%
add_predicted_draws(m1) %>%
ggplot(aes(y = Group, x = .prediction)) +
stat_interval(.width = c(.50, .80, .95, .99)) +
geom_point(aes(x = IQ), alpha = 0.4, data = d) +
scale_color_brewer()
post_pred_1

Prior vs Posterior Predictive Distribution

cowplot::plot_grid(prior_pred_1,
post_pred_1,
labels = c('Prior predictive', 'Posterior predictive'),
label_size = 12,
align = "h",
nrow = 2)

Two Groups

t-Test as General Linear Model

  • For simplicity, we assume equal variances

Tradional notation

yij=α+βxij+ϵij ϵN(0,σ2)

t-Test as General Linear Model

  • For simplicity, we assume equal variances

Tradional notation

yij=α+βxij+ϵij ϵN(0,σ2)

Probabilistic notation

yijN(μ,σ2) μij=α+βxij

Xij is an indicator variable.

Ordinary Least Squares

levels(TwoGroupIQ$Group)
## [1] "Placebo" "SmartDrug"

Using R's formula notation:

fit_ols <- lm(IQ ~ Group,
data = TwoGroupIQ)
summary(fit_ols)
##
## Call:
## lm(formula = IQ ~ Group, data = TwoGroupIQ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.9149 -0.9149 0.0851 1.0851 22.0851
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 100.3571 0.7263 138.184 <2e-16 ***
## GroupSmartDrug 1.5578 0.9994 1.559 0.123
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.707 on 87 degrees of freedom
## Multiple R-squared: 0.02717, Adjusted R-squared: 0.01599
## F-statistic: 2.43 on 1 and 87 DF, p-value: 0.1227

Dummy Coding in R

contrasts(TwoGroupIQ$Group)
## SmartDrug
## Placebo 0
## SmartDrug 1
mm1 <- model.matrix(~ Group, data = TwoGroupIQ)
head(mm1)
## (Intercept) GroupSmartDrug
## 1 1 1
## 2 1 1
## 3 1 1
## 4 1 1
## 5 1 1
## 6 1 1
as_tibble(mm1) %>%
group_by(GroupSmartDrug) %>%
slice_sample(n= 3)
## # A tibble: 6 x 2
## # Groups: GroupSmartDrug [2]
## `(Intercept)` GroupSmartDrug
## <dbl> <dbl>
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 1
## 5 1 1
## 6 1 1

Dummy Coding in R

as_tibble(mm1) %>%
group_by(GroupSmartDrug) %>%
slice_sample(n= 3)
## # A tibble: 6 x 2
## # Groups: GroupSmartDrug [2]
## `(Intercept)` GroupSmartDrug
## <dbl> <dbl>
## 1 1 0
## 2 1 0
## 3 1 0
## 4 1 1
## 5 1 1
## 6 1 1
mm2 <- model.matrix(~ 0 + Group, data = TwoGroupIQ)
as_tibble(mm2) %>%
group_by(GroupSmartDrug) %>%
slice_sample(n= 3)
## # A tibble: 6 x 2
## # Groups: GroupSmartDrug [2]
## GroupPlacebo GroupSmartDrug
## <dbl> <dbl>
## 1 1 0
## 2 1 0
## 3 1 0
## 4 0 1
## 5 0 1
## 6 0 1

Graphical Model



knitr::include_graphics("images/two-group-iq-graphical-model.png")

Get Priors

priors2 <- get_prior(IQ ~ 1 + Group,
data = TwoGroupIQ)
priors2
## prior class coef group resp dpar nlpar bound
## (flat) b
## (flat) b GroupSmartDrug
## student_t(3, 101, 2.5) Intercept
## student_t(3, 0, 2.5) sigma
## source
## default
## (vectorized)
## default
## default

Get Priors

priors3 <- get_prior(IQ ~ 0 + Group,
data = TwoGroupIQ)
priors3
## prior class coef group resp dpar nlpar bound
## (flat) b
## (flat) b GroupPlacebo
## (flat) b GroupSmartDrug
## student_t(3, 0, 2.5) sigma
## source
## default
## (vectorized)
## (vectorized)
## default

Define Priors on Group Means

priors2_b <- prior(normal(0, 2), class = b)

We can sample from the prior.

m2_prior <- brm(IQ ~ 1 + Group,
prior = priors2_b,
data = TwoGroupIQ,
sample_prior = "only",
file = "models/twogroupiq-prior-2")
prior_pred_2 <- TwoGroupIQ %>%
modelr::data_grid(Group) %>%
add_predicted_draws(m2_prior) %>%
ggplot(aes(y = Group, x = .prediction)) +
stat_interval(.width = c(.50, .80, .95, .99)) +
geom_point(aes(x = IQ), alpha = 0.4, data = TwoGroupIQ) +
scale_color_brewer() +
theme_tidybayes()

Priors definieren und vom Prior Sampeln

priors3_b <- prior(normal(100, 10), class = b)
m3_prior <- brm(IQ ~ 0 + Group,
prior = priors3_b,
data = TwoGroupIQ,
sample_prior = "only",
file = "models/twogroupiq-prior-3")
prior_pred_3 <- TwoGroupIQ %>%
modelr::data_grid(Group) %>%
add_predicted_draws(m3_prior) %>%
ggplot(aes(y = Group, x = .prediction)) +
stat_interval(.width = c(.50, .80, .95, .99)) +
geom_point(aes(x = IQ), alpha = 0.4, data = TwoGroupIQ) +
scale_color_brewer() +
theme_tidybayes()

Posterior Sampling

m2 <- brm(IQ ~ 1 + Group,
prior = priors2_b,
data = TwoGroupIQ,
file = "models/twogroupiq-2")
m3 <- brm(IQ ~ 0 + Group,
prior = priors3_b,
data = TwoGroupIQ,
file = "models/twogroupiq-3")

Summary

summary(m2)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: IQ ~ 1 + Group
## Data: TwoGroupIQ (Number of observations: 89)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 100.51 0.69 99.18 101.90 1.00 3735 2921
## GroupSmartDrug 1.23 0.88 -0.49 2.95 1.00 3633 3160
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 4.72 0.36 4.09 5.52 1.00 3002 2088
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Summary

summary(m3)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: IQ ~ 0 + Group
## Data: TwoGroupIQ (Number of observations: 89)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## GroupPlacebo 100.35 0.73 98.95 101.75 1.00 3914 3163
## GroupSmartDrug 101.92 0.69 100.62 103.28 1.00 4057 2954
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 4.71 0.36 4.07 5.45 1.00 3342 2697
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
mcmc_plot(m2, "b_GroupSmartDrug")

mcmc_plot(m3, "b")

## Get Posterior Samples

samples_m3 <- posterior_samples(m3) %>%
transmute(Placebo = b_GroupPlacebo,
SmartDrug = b_GroupSmartDrug,
sigma = sigma)

--

samples_m3 <- samples_m3 %>%
mutate(diff = SmartDrug - Placebo,
effect_size = diff/sigma)

Get Posterior Samples

samples_m3 %>%
select(diff) %>%
median_qi()
## diff .lower .upper .width .point .interval
## 1 1.57189 -0.3887631 3.521777 0.95 median qi
samples_m3 %>%
select(diff) %>%
ggplot(aes(x = diff)) +
stat_halfeye(point_interval = median_qi)

samples_m3 %>%
select(effect_size) %>%
ggplot(aes(x = effect_size)) +
stat_halfeye(point_interval = median_qi)

Using default priors

fit_eqvar <- brm(IQ ~ Group,
data = TwoGroupIQ,
file = "models/fit_eqvar")
fit_eqvar %>%
gather_draws(b_GroupSmartDrug) %>%
ggplot(aes(y = .variable, x = .value)) +
stat_halfeye(fill = "Steelblue4") +
geom_vline(xintercept = 0, color = "white", linetype = 1, size = 1) +
ylab("") +
xlab("Estimated difference") +
theme_tidybayes()

grid <- TwoGroupIQ %>%
modelr::data_grid(Group)
fits_IQ <- grid %>%
add_fitted_draws(fit_eqvar)
preds_IQ <- grid %>%
add_predicted_draws(fit_eqvar)
pp_eqvar <- TwoGroupIQ %>%
ggplot(aes(x = IQ, y = Group)) +
stat_halfeye(aes(x = .value),
scale = 0.7,
position = position_nudge(y = 0.1),
data = fits_IQ,
.width = c(.66, .95, 0.99)) +
stat_interval(aes(x = .prediction),
data = preds_IQ,
.width = c(.66, .95, 0.99)) +
scale_x_continuous(limits = c(75, 125)) +
geom_point(data = TwoGroupIQ) +
scale_color_brewer() +
labs(title = "Equal variance model predictions") +
theme_tidybayes()

Estimating parameters of a normal distribution

We have 3 data points, y1=85, y2=100, und y3=115. We can assume that they are normally distributed.

p(y|μ,σ)=1Zexp(12(yμ)2σ2)

Goal: We want to estimate the mean and standard deviation of a normal distribution

Z=σ2π is a normalising constant.

We can compute the probability of one data point, and if the data points are i.i.d, then P(data|,μ,σ) is the product of the indidividual probabilities.

Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
oTile View: Overview of Slides
Esc Back to slideshow