class: center, middle, inverse, title-slide

# Intro to Bayesian Statistics
## Part 1
Gentle introduction

### Andrew Ellis
### Bayesian multilevel modelling workshop 2021
### 05-21-2021

---
layout: true

<!-- Home icon -->
<div class="my-footer">
<span>
<a href="https://awellis.github.io/learnmultilevelmodels/" target="_blank">
</a>
Graduate School workshop 2021
</span>
</div>

---

## Frequentist statistics

Relies on:

- point estimation
- summary statistics
- often uses null hypothesis significance testing

Problems:

- p-values can be hard to understand
- confidence intervals are hard to understand
- it is unclear whether p-values and confidence intervals really allow us to address the questions we care about:
    + What is the probability that my hypothesis is true?
    + How can I quantify evidence against a null hypothesis?

---

## Example: t-test

We want to compare two groups: one group is wearing fancy hats, the other is the control group. We are interested in their creativity scores.

.panelset[
.panel[.panel-name[Parameters]


```r
library(tidyverse)
library(kableExtra)

set.seed(12)

# Number of people per group
N <- 50

# Population mean of creativity for people wearing fancy hats
mu_fancyhats <- 103

# Population mean of creativity for people wearing no fancy hats
mu_nofancyhats <- 98

# Average population standard deviation of both groups
sigma <- 15
```
]

.panel[.panel-name[Make dataframe]


```r
# Generate data
fancyhats = tibble(Creativity = rnorm(N, mu_fancyhats, sigma),
                   Group = "Fancy Hat")

nofancyhats = tibble(Creativity = rnorm(N, mu_nofancyhats, sigma),
                     Group = "No Fancy Hat")

FancyHat <- bind_rows(fancyhats, nofancyhats) %>%
    mutate(Group = fct_relevel(as.factor(Group), "No Fancy Hat"))
```
]

.panel[.panel-name[Data]


```r
FancyHat
```

```
## # A tibble: 100 x 2
##    Creativity Group    
##         <dbl> <fct>    
##  1       80.8 Fancy Hat
##  2      127.  Fancy Hat
##  3       88.6 Fancy Hat
##  4       89.2 Fancy Hat
##  5       73.0 Fancy Hat
##  6       98.9 Fancy Hat
##  7       98.3 Fancy Hat
##  8       93.6 Fancy Hat
##  9      101.  Fancy Hat
## 10      109.  Fancy Hat
## # … with 90 more rows
```
]

.panel[.panel-name[Plot data]

<img src="01-basic-intro_files/figure-html/unnamed-chunk-5-1.png" width="100%" />
]
]

---

.panelset[
.panel[.panel-name[Welch test]


```r
fancyhat_ttest <- t.test(Creativity ~ Group,
                         var.equal = FALSE,
                         data = FancyHat)
```
]

.panel[.panel-name[Results]


```r
fancyhat_ttest_tab <- broom::tidy(fancyhat_ttest)
```


```r
fancyhat_ttest_tab %>%
    select(estimate, estimate1, estimate2, statistic, p.value, conf.low, conf.high) %>%
    round(3) %>%
    kbl() %>%
    kable_classic(full_width = FALSE, html_font = "Cambria")
```

<table class=" lightable-classic" style="font-family: Cambria; width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> estimate1 </th>
   <th style="text-align:right;"> estimate2 </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
   <th style="text-align:right;"> conf.low </th>
   <th style="text-align:right;"> conf.high </th>
  </tr>
 </thead>
 <tbody>
  <tr>
   <td style="text-align:right;"> -1.647 </td>
   <td style="text-align:right;"> 99.209 </td>
   <td style="text-align:right;"> 100.856 </td>
   <td style="text-align:right;"> -0.637 </td>
   <td style="text-align:right;"> 0.526 </td>
   <td style="text-align:right;"> -6.78 </td>
   <td style="text-align:right;"> 3.486 </td>
  </tr>
 </tbody>
</table>
]

.panel[.panel-name[What have we done here?]

1) We estimated two means (and two standard deviations). More specifically, we obtained **point estimates**.
2) We estimated the difference in means (again, a point estimate).

3) We computed a test statistic.

4) We computed the probability of obtaining a value of the test statistic at least as extreme as the one observed. This is called a p-value (a sketch of this computation follows after the exercise).
]
]

---

.your-turn[

- Can you explain what the p-value and confidence interval mean?
- What can you conclude from this analysis?
- Can you think of any problems that might be associated with this type of approach?
]
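---

## Aside: where the p-value comes from

A minimal sketch of step 4, reusing the `fancyhat_ttest` object from above. The `statistic` and `parameter` components (the observed t value and the Welch degrees of freedom) are part of the object returned by `t.test()`; the two-sided p-value is the tail probability of the corresponding t distribution.


```r
# Observed t value and Welch-Satterthwaite degrees of freedom,
# extracted from the htest object returned by t.test()
t_obs <- unname(fancyhat_ttest$statistic)
df    <- unname(fancyhat_ttest$parameter)

# Two-sided p-value: probability, under the null distribution, of a
# t value at least as extreme as the one we observed
p_value <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)
p_value
```

This should reproduce the `p.value` column in the results table.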
---

## Interpretations of Probability

- In the classical, frequentist approach, parameters, e.g. the means estimated above, do not have probability distributions.
- Only events that can be repeated infinitely many times have a probability, and probability is simply long-run relative frequency.
- In the Bayesian worldview, probability quantifies *degree of belief*. More specifically, our uncertainty is expressed as a probability distribution. Probability quantifies knowledge, and is not a fundamental property of things.

---

## Some Gamma distributions

<img src="01-basic-intro_files/figure-html/gamma-dist-1.png" width="100%" />

---

## Bayesian inference

- Parameters have probability distributions.
- Parameters have prior distributions. These quantify our belief before we see the data.
- We obtain posterior distributions instead of point estimates. Posterior distributions reflect our belief after having observed the data.
- We go from prior to posterior by applying Bayes' theorem (a toy sketch follows on the final slide).
- **Most important point**: Bayesian inference uses probability to quantify uncertainty.

---

## Why should you use Bayesian inference?

- More fun
- Much easier to understand
- Corresponds to our intuitive understanding
- More principled approach
- Generally much more flexible
- Better for fitting complex models (including multilevel models)
- Allows us to quantify evidence

---

## Why shouldn't you use Bayesian inference?

- Everyone else is using frequentist methods
- Frequentist methods have fancy names for everything (i.e. established off-the-shelf methods)
- Bayesian inference is computationally expensive (as you will soon discover)
- Hypothesis testing is difficult (but the same applies to NHST)
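---

## From prior to posterior

Bayes' theorem tells us how to update the prior into the posterior:

$$p(\theta \mid y) = \frac{p(y \mid \theta) \, p(\theta)}{p(y)}$$

Below is a toy sketch of this update on a grid, for a single parameter `theta` (think of a probability of success). The numbers (a Beta(2, 2) prior and 7 successes in 10 trials) are made up for illustration; they are not part of the fancy-hats example.


```r
# Illustrative only: prior-to-posterior updating on a grid
theta <- seq(0, 1, length.out = 101)  # grid of candidate parameter values

prior      <- dbeta(theta, 2, 2)                  # prior belief about theta
likelihood <- dbinom(7, size = 10, prob = theta)  # 7 successes in 10 trials

# Bayes' theorem: posterior is proportional to likelihood times prior
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)  # normalise so the grid sums to 1
```

Plotting `prior` and `posterior` against `theta` shows the belief shifting towards the observed proportion. In practice this update is done by MCMC sampling rather than on a grid.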