Hierarchical Model

Chapter 08.
Group Comparisons and Hierarchical Modeling

๋ณธ ํฌ์ŠคํŒ…์€ First Course in Bayesian Statistical Methods๋ฅผ ์ฐธ๊ณ ํ•˜์˜€๋‹ค.

Hierarchical Model์€ ๊ทธ๋ฃน ๊ฐ„ ๊ทธ๋ฆฌ๊ณ  ๊ทธ๋ฃน ๋‚ด variability๋ฅผ ์„ค์ •ํ•˜๋Š” ๋ฐ์— ์œ ์šฉํ•˜๋‹ค.

Hierarchical Model describes both with-in group and between-group variability.

Hierarchical Model์€ ๋ฒ ์ด์ฆˆ ํ†ต๊ณ„๊ฐ€ ๋‹ค๋ฅธ ์‘์šฉ๋ถ„์•ผ์— ๋„๋ฆฌ ํผ์ ธ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฐ์ •์  ๊ณ„๊ธฐ๊ฐ€ ๋˜์—ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ์œ„์˜ ๊ทธ๋ฆผ์„ ํ†ตํ•ด์„œ ์•Œ ์ˆ˜ ์žˆ๋“ฏ์ด, ์ธต์ด ์—ฌ๋Ÿฌ ๊ฐœ ์ด๊ธฐ ๋•Œ๋ฌธ์— ๋ณต์žกํ•œ ์ƒํ™ฉ์—์„œ๋„ Estimation์„ ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์žฅ์ ์ด ์žˆ๋‹ค.

๊ฐ€์žฅ ํฐ ํŠน์ง•์„ ๊ฐ„๋‹จํ•˜๊ฒŒ ์š”์•ฝํ•ด๋ณด์ž๋ฉด, ์—ฌ๋Ÿฌ ๊ทธ๋ฃน๋ผ๋ฆฌ ์„œ๋กœ ์ •๋ณด๋ฅผ ์ฃผ๊ณ ๋ฐ›๋Š”๋‹ค๋Š” ์ ์ด๋‹ค.

1. Exchangeability & De Finetti’s Theorem

์ด๋ถ€๋ถ„์— ๋Œ€ํ•ด์„œ๋Š” Chapter2์—์„œ ์ด๋ฏธ ๋‹ค๋ฃฌ ์ ์€ ์žˆ๋‹ค. ๊ทธ๋ž˜๋„ ์ด๋ฒˆ ์ฑ•ํ„ฐ์˜ ๋…ผ๋ฆฌ์ „๊ฐœ์— ๋Œ€ํ•ด์„œ ์ •๋‹น์„ฑ์„ ๋ถ€์—ฌํ•ด์ฃผ๊ธฐ ๋•Œ๋ฌธ์— ํ•œ ๋ฒˆ ๋” ๋ณต์Šตํ•ด๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค. ์šฐ์„  Exchangeability๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์“ธ ์ˆ˜ ์žˆ๋‹ค.
$$p(y_1, ..., y_n) = p(y_{\pi_{1}}, ..., y_{\pi_{n}})$$
๊ทธ๋ฆฌ๊ณ  De Finetti’s Theorem์€ exchangeability๊ฐ€ ๋งŒ์กฑ๋˜๋ฉด, Conditional Independence๋ฅผ ๋”ฐ๋ฅธ๋‹ค๋Š” ๊ฒƒ์ด์—ˆ๋‹ค. (์ฐธ๊ณ ๋กœ, ๊ทธ ์—ญ์€ ์ž๋ช…ํ•˜๋‹ค.) ์ด๋•Œ ์กฐ๊ฑด์œผ๋กœ๋Š” small sample from large population์ด๋ผ๊ณ  ํ•œ๋‹ค. (์ž์„ธํ•œ ๊ฒƒ์€ FCB๋ฅผ ์ฝ์–ด๋ณด์ž.)

2. Hierarchical Beta-Binomial

2-1. Summary of the Posterior Distributions

Joint Posterior

$$\begin{align} p(\theta,\psi |y) &\propto p(y|\theta, \psi) \ p(\theta, \psi) \\ &= p(y|\theta) \ p(\theta, \psi) \\ &= p(y|\theta) \ p(\theta| \psi) \ p(\psi) \end{align}$$

Conditional Posterior

$$\begin{align} p(\theta | \psi, y) &\propto p(y, \psi|\theta) \ p(\theta) \\ &= p(y|\theta,\psi) \ p(\psi|\theta) \ p(\theta) \\ &= p(y|\theta) \ p(\theta, \psi) \\ &= p(y|\theta) \ p(\theta| \psi) \ p(\psi) \end{align}$$

Joint Posterior์™€ Conditional Posterior ๊ฐ๊ฐ์˜ ๋งˆ์ง€๋ง‰๋ผ๋ฆฌ ๊ฐ™๋‹ค๋Š” ์‚ฌ์‹ค์— ์ฃผ๋ชฉํ•ด๋ณด์ž.

Marginal Posterior

$$p(\psi|y) = \int_\Theta p(\theta, \psi | y)d\theta \\ p(\psi|y) = \frac{p(\theta,\psi|y)}{p(\theta|\psi, y)}$$
๋‘˜ ์ค‘ ํŽธํ•œ ๊ฒƒ์œผ๋กœ ๊ณ„์‚ฐํ•œ๋‹ค.

2-2. ๊ฐ„๋‹จ ์ •๋ฆฌ

$$p(\theta, \alpha, \beta|y) \propto p(\alpha,\beta) \ \prod_{j=1}^{m}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta_j^{\alpha-1}(1-\theta_j)^{\beta-1} \ \prod_{j=1}^{m}\theta_j^{y_j}(1-\theta_j)^{n_j-y_j}$$

3์ธต: hyperprior $\psi$
2์ธต: $\theta_j|\psi \sim Beta(\alpha,\beta)$
1์ธต: $y_j|\theta_j \sim Binom(n_j, \theta_j)$

2-3. hyperprior๋Š” ์–ด๋–ป๊ฒŒ ์ฃผ์–ด์•ผ ํ• ๊นŒ?

๊ฒฐ๋ก ๋ถ€ํ„ฐ ๋งํ•˜์ž๋ฉด,
$$p(\alpha, \beta) \propto (\alpha+\beta)^{-\frac{5}{2}}$$
์ด๋ ‡๊ฒŒ ์ค€๋‹ค๊ณ  ํ•œ๋‹ค. ํ•ด๋‹น ์ˆ˜์‹์— ๋Œ€ํ•œ ๊ทธ๋ž˜ํ”„๋Š” ์•„๋ž˜์™€ ๊ฐ™์€๋ฐ, ์‹œ๊ฐ์ ์œผ๋กœ uninformative์— ๊ฐ€๊น๋‹ค๋Š” ๊ฒƒ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

(Figure: density of the hyperprior $p(\alpha, \beta) \propto (\alpha+\beta)^{-5/2}$)

์ด๋Š” ์›๋ž˜๋Š” ์•„๋ž˜์™€ ๊ฐ™์€ ํ˜•ํƒœ์˜€๋‹ค.
$$p\Big(\frac{\alpha}{\alpha+\beta}, \ (\alpha+\beta)^{-\frac{1}{2}}\Big) \propto 1$$
๋ณ€์ˆ˜๋ณ€ํ™˜์„ ํ†ตํ•ด ์œ ๋„ํ•˜๋Š” ๊ณผ์ •์€ ๋ณ€์ˆ˜๋ณ€ํ™˜ ํฌ์ŠคํŒ…์— ์ž์„ธํ•˜๊ฒŒ ์„ค๋ช…๋˜์–ด ์žˆ๋‹ค. ๊ทธ๋ž˜์„œ ๊ฒฐ๋ก ์ ์œผ๋กœ ์•„๋ž˜์™€ ๊ฐ™์ด ์“ธ ์ˆ˜ ์žˆ๋‹ค.

$$\rightarrow p(\theta, \alpha, \beta|y) \propto (\alpha+\beta)^{-\frac{5}{2}} \ \prod_{j=1}^{m}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\theta_j^{\alpha-1}(1-\theta_j)^{\beta-1} \ \prod_{j=1}^{m}\theta_j^{y_j}(1-\theta_j)^{n_j-y_j}$$
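Integrating each $\theta_j$ out of this joint posterior analytically yields the marginal posterior of $(\alpha, \beta)$, which is exactly what the grid code below evaluates on the log scale (each factor is a ratio of Beta functions):

$$p(\alpha, \beta|y) \propto (\alpha+\beta)^{-\frac{5}{2}} \ \prod_{j=1}^{m}\frac{B(\alpha+y_j, \ \beta+n_j-y_j)}{B(\alpha, \beta)}, \qquad B(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$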

2-4. ์˜ˆ์‹œ

FCB ์ฑ…์— ๋‚˜์˜จ ์ข…์–‘ ์ฅ(Rat Tumor) ์˜ˆ์‹œ๋ฅผ ์‚ดํŽด๋ณด์ž.

1) 95% Posterior interval

### Required packages
library(ggplot2)

### Raw data
num_diseased <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,
                  2,1,5,2,5,3,2,7,7,3,3,2,9,10,4,4,4,4,4,4,4,10,4,4,4,5,11,12,
                  5,5,6,5,6,6,6,6,16,15,15,9,4)
num_of_obs <- c(20,20,20,20,20,20,20,19,19,19,19,18,18,17,20,20,20,20,19,19,18,18,25,24,
                23,20,20,20,20,20,20,10,49,19,46,27,17,49,47,20,20,13,48,50,20,20,20,20,
                20,20,20,48,19,19,19,22,46,49,20,20,23,19,22,20,20,20,52,46,47,24,14)

### Data frame
DF <- data.frame(cbind(num_diseased,num_of_obs))
colnames(DF) <- c("y","n")
DF$ybar = DF$y/DF$n

### theta's uniform prior beta(1,1) -> posterior beta(y+1, n-y+1)
DF$postmean = (DF$y + 1) / (DF$n + 2)  # posterior mean of beta(y+1, n-y+1)
DF$lb = mapply(function(y, n) qbeta(0.025, y + 1, n - y + 1), DF$y, DF$n)
DF$ub = mapply(function(y, n) qbeta(1 - 0.025, y + 1, n - y + 1), DF$y, DF$n)
title1 = expression(paste("95% Posterior interval, ", beta(1, 1), " prior"))
p1 = ggplot(DF, aes(x = ybar, ymin = lb, ymax = ub)) +
    geom_linerange() + geom_point(aes(y = postmean)) +
    geom_abline(slope = 1, intercept = 0) +
    labs(title = title1, x = "observed mean", y = "posterior mean")
p1

2) Marginal Distribution of alpha & beta

### Hierarchical model using theta's prior: beta(alpha,beta)
library(latex2exp)  # for TeX() labels
griddens = 1e2
A = seq(0.5, 6, length.out = griddens)
B = seq(3, 33, length.out = griddens)
cA = rep(A, each = length(B))
cB = rep(B, length(A))
lpfun = function(a, b, y, n){ # marginal posterior
    (-5/2)* log(a+b) + sum(lgamma(a+b) - lgamma(a) - lgamma(b) + lgamma(a+y) + lgamma(b+n-y) - lgamma(a+b+n))}
lp = mapply(lpfun, cA, cB, MoreArgs = list(DF$y, DF$n))
lp = lp - max(lp)  # stabilize before exponentiating to avoid underflow
df_marg = data.frame(alpha= cA, beta= cB, posterior = exp(lp)/sum(exp(lp))) # normalize so the posterior probabilities sum to 1
title2 = TeX('The marginal of $\\alpha$ and $\\beta$')
p2 = ggplot(data = df_marg, aes(x=alpha, y=beta))+
    geom_raster(aes(fill = posterior, alpha= posterior), interpolate= T)+
    geom_contour(aes(z= posterior), color = 'black', size= 0.2)+
    coord_cartesian(xlim= c(1,5), ylim= c(4,26))+
    labs(x = TeX('$\\alpha$'), y= TeX('$\\beta$'), title= title2)+
    scale_fill_gradient(low= "cornflowerblue", high= "navy", guide= F)+
    scale_alpha(range= c(0,1), guide=F)
p2

library(dplyr)  # for the group_by / summarise pipelines
title_alpha <- TeX('Marginal Posterior of $\\alpha$|y')
title_beta <- TeX('Marginal Posterior of $\\beta$|y')

marg_alpha <- df_marg %>%
    group_by(alpha) %>% 
    summarise(marg = sum(posterior)) %>% 
    ggplot(aes(x=alpha,y=marg)) + geom_line(size=2, color='cornflowerblue') +
    labs(x=TeX('$\\alpha$'), y= TeX('p($\\alpha$|y)'), title=title_alpha)
marg_beta <- df_marg %>%
    group_by(beta) %>% 
    summarise(marg = sum(posterior)) %>% 
    ggplot(aes(x=beta,y=marg)) + geom_line(size=2, color='darkgreen') +
    labs(x=TeX('$\\beta$'), y= TeX('p($\\beta$|y)'), title=title_beta)
library(gridExtra)  # for grid.arrange
grid.arrange(marg_alpha, marg_beta, ncol=2)

3) 95% posterior interval, uninformative prior($\alpha, \beta$)

# Grid sampling to generate posterior samples of alpha, beta
# (re)settings
DF <- data.frame(cbind(num_diseased,num_of_obs))
colnames(DF) <- c("y","n")
DF$ybar = DF$y/DF$n
DF$postmean = DF$ub = DF$lb = rep(NA, nrow(DF))
Nsamp = 1e3
set.seed(101)

# sample alpha, beta
samp_idx = sample(length(df_marg$posterior),
                  size = Nsamp, replace=T, prob = df_marg$posterior)
samp_A = cA[samp_idx]
samp_B = cB[samp_idx]

# sample theta_j for each j
for(i in 1:nrow(DF)){
    n = DF$n[i]; y = DF$y[i]
    theta_j = mapply(function(a, b, n, y) rbeta(1, y+a, n-y+b),
                     samp_A, samp_B, MoreArgs = list(n=n, y=y))
    DF$lb[i] = quantile(theta_j, 0.025)
    DF$ub[i] = quantile(theta_j, 1-0.025)
    DF$postmean[i] = mean(theta_j)
}

# 95% posterior interval
title3 = TeX('95% posterior interval, uninformative $\\alpha$, $\\beta$ prior')
p3 = ggplot(DF, aes(x=ybar, ymin=lb, ymax=ub))+
    geom_linerange()+geom_point(aes(y=postmean))+geom_abline(slope = 1, intercept = 0)+
    labs(title=title3, x="observed mean", y="posterior mean")
p3

4) 1๊ณผ 3 ๋น„๊ตํ•ด๋ณด๊ธฐ

3. Hierarchical Normal

Conclusion

Prior์˜ Prior, HyperPrior๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋ณด๋‹ค ์„ค๋“๋ ฅ ์žˆ๋Š” ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด๋ณด์ž!


Reference

[1] Peter D. Hoff, *A First Course in Bayesian Statistical Methods* (FCB)
[2] ESC 2021-1 Spring session



ํ˜น์‹œ ๊ถ๊ธˆํ•œ ์ ์ด๋‚˜ ์ž˜๋ชป๋œ ๋‚ด์šฉ์ด ์žˆ๋‹ค๋ฉด, ๋Œ“๊ธ€๋กœ ์•Œ๋ ค์ฃผ์‹œ๋ฉด ์ ๊ทน ๋ฐ˜์˜ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.