
Intro to Bayesian Statistics

Part 1
Gentle introduction

Andrew Ellis

Bayesian multilevel modelling workshop 2021

05-21-2021

Frequentist statistics

Relies on:

  • point estimation
  • summary statistics
  • often uses null hypothesis significance testing

Problems:

  • p-values can be hard to understand
  • confidence intervals are hard to understand
  • it is unclear whether p-values and confidence intervals really allow us to address the questions we care about.
    • what is the probability that my hypothesis might be true?
    • how can I quantify evidence against a null hypothesis?

Example: t-test

We want to compare two groups: one group wears fancy hats, the other is the control group. We are interested in their creativity scores.

library(tidyverse)
library(kableExtra)
set.seed(12)
# Number of people per group
N <- 50
# Population mean of creativity for people wearing fancy hats
mu_fancyhats <- 103
# Population mean of creativity for people wearing no fancy hats
mu_nofancyhats <- 98
# Average population standard deviation of both groups
sigma <- 15
# Generate data
fancyhats <- tibble(Creativity = rnorm(N, mu_fancyhats, sigma),
                    Group = "Fancy Hat")
nofancyhats <- tibble(Creativity = rnorm(N, mu_nofancyhats, sigma),
                      Group = "No Fancy Hat")
FancyHat <- bind_rows(fancyhats, nofancyhats) %>%
  mutate(Group = fct_relevel(as.factor(Group), "No Fancy Hat"))
FancyHat
## # A tibble: 100 x 2
##    Creativity Group
##         <dbl> <fct>
##  1       80.8 Fancy Hat
##  2      127.  Fancy Hat
##  3       88.6 Fancy Hat
##  4       89.2 Fancy Hat
##  5       73.0 Fancy Hat
##  6       98.9 Fancy Hat
##  7       98.3 Fancy Hat
##  8       93.6 Fancy Hat
##  9      101.  Fancy Hat
## 10      109.  Fancy Hat
## # … with 90 more rows

fancyhat_ttest <- t.test(Creativity ~ Group,
                         var.equal = FALSE,
                         data = FancyHat)

fancyhat_ttest_tab <- broom::tidy(fancyhat_ttest)

fancyhat_ttest_tab %>%
  select(estimate, estimate1, estimate2, statistic, p.value, conf.low, conf.high) %>%
  round(3) %>%
  kbl() %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
 estimate  estimate1  estimate2  statistic  p.value  conf.low  conf.high
   -1.647     99.209    100.856     -0.637    0.526     -6.78      3.486

1) We estimated two means (and two standard deviations). More specifically, we obtained point estimates.

2) We estimated the difference in means (again, a point estimate).

3) We computed a test statistic.

4) We computed the probability of obtaining a value for the test statistic that is at least as extreme as the one obtained. This is called a p-value.
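
As a concrete illustration (not on the original slide), the reported p-value can be recovered from the t statistic and the Welch degrees of freedom stored in the htest object:

# Two-sided p-value: probability of a t statistic at least as extreme as
# the observed one, assuming the null hypothesis of equal means is true
t_obs <- fancyhat_ttest$statistic   # observed test statistic
df    <- fancyhat_ttest$parameter   # Welch-Satterthwaite degrees of freedom
2 * pt(-abs(t_obs), df = df)        # matches fancyhat_ttest$p.value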

  • Can you explain what the p-value and confidence interval mean?
  • What can you conclude from this analysis?
  • Can you think of any problems that might be associated with this type of approach?

Interpretations of Probability

  • In the classical, frequentist approach, parameters, e.g. the means estimated above, do not have probability distributions.
  • Only events that can be repeated infinitely many times have a probability, and probability is simply long-run relative frequency.
  • In the Bayesian worldview, probability quantifies degree of belief. More specifically, our uncertainty is expressed as a probability distribution. Probability quantifies knowledge, and is not a fundamental property of things.
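
To make the frequentist reading concrete, here is a minimal simulation sketch (the coin-flip setup is an illustrative assumption, not taken from the slides): the relative frequency of heads settles towards the underlying probability as the number of flips grows.

library(tidyverse)
set.seed(1)
# Simulate repeated coin flips and track the running relative frequency of heads
n_flips <- 10000
flips <- tibble(
  flip     = 1:n_flips,
  heads    = rbinom(n_flips, size = 1, prob = 0.5),
  rel_freq = cumsum(heads) / flip
)
ggplot(flips, aes(flip, rel_freq)) +
  geom_line() +
  geom_hline(yintercept = 0.5, linetype = "dashed") +
  scale_x_log10() +
  labs(x = "Number of flips", y = "Relative frequency of heads")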

Some Gamma distributions
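
(The original slide shows a figure of Gamma density curves. A minimal sketch that produces comparable curves; the shape values below are illustrative assumptions, not the ones used for the original figure.)

library(tidyverse)
# Gamma densities for a few illustrative shape parameters (rate fixed at 1)
gamma_curves <- expand_grid(
  shape = c(1, 2, 5, 9),
  x     = seq(0.01, 20, length.out = 200)
) %>%
  mutate(density = dgamma(x, shape = shape, rate = 1),
         shape   = factor(shape))
ggplot(gamma_curves, aes(x, density, colour = shape)) +
  geom_line() +
  labs(x = "x", y = "Density", colour = "Shape")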

Bayesian inference

  • Parameters have probability distributions.
  • Parameters have prior distributions. These quantify our belief before we see the data.
  • We obtain posterior distributions instead of point estimates. Posterior distributions reflect our belief after having observed data.
  • We go from prior to posterior by applying Bayes' theorem (see the formula after this list).
  • Most important point: Bayesian inference uses probability to quantify uncertainty.
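
In formula form (standard notation, not shown on this slide), Bayes' theorem updates the prior p(θ) into the posterior p(θ | y) via the likelihood p(y | θ):

$$
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)},
\qquad \text{posterior} \propto \text{likelihood} \times \text{prior}
$$

A minimal grid-approximation sketch of this update for a hypothetical example (estimating a proportion theta after observing 7 successes in 10 trials, with a flat prior; the numbers are purely illustrative):

library(tidyverse)
grid <- tibble(theta = seq(0, 1, length.out = 1000)) %>%
  mutate(prior      = dbeta(theta, 1, 1),                 # flat prior
         likelihood = dbinom(7, size = 10, prob = theta), # data: 7 successes in 10 trials
         posterior  = prior * likelihood /
                      sum(prior * likelihood))            # normalise over the grid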

Why should you use Bayesian inference?

  • More fun
  • Much easier to understand
  • Corresponds to our intuitive understanding
  • More principled approach
  • Generally much more flexible
  • Better for fitting complex models (including multilevel models)
  • Allows us to quantify evidence

Why shouldn't you use Bayesian inference?

  • Everyone else is using frequentist methods
  • Frequentist methods have fancy names for everything (i.e. established off-the-shelf methods)
  • Bayesian inference is computationally expensive (as you will soon discover)
  • Hypothesis testing is difficult (but the same applies to NHST)
