Conjugacy

Chapter 03.
One-parameter Models

๋ณธ ํฌ์ŠคํŒ…์€ First Course in Bayesian Statistical Methods๋ฅผ ์ฐธ๊ณ ํ•˜์˜€๋‹ค.

Binomial Model

Prior: $\theta \sim Beta(a,b)$
Likelihood: $Y|\theta \sim Binomial(n, \theta)$
Posterior: $\theta|y \sim Beta(a+y, b+n-y)$

a: prior ์„ฑ๊ณตํšŸ์ˆ˜, b: prior ์‹คํŒจํšŸ์ˆ˜, $\omega$=a+b: concentration

$E[\theta|y] = \frac{a+y}{a+b+n} = \frac{n}{a+b+n}\times\frac{y}{n} + \frac{a+b}{a+b+n}\times\frac{a}{a+b}$ where $\frac{y}{n}$ = sample mean, $\frac{a}{a+b}$ = prior expectation
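This weighted-average identity can be checked exactly with rational arithmetic; the numbers below (a Beta(2, 8) prior and 13 successes in 20 trials) are hypothetical, chosen only for illustration:

```python
from fractions import Fraction

# Hypothetical setup: prior Beta(a, b), data y successes out of n trials.
a, b = Fraction(2), Fraction(8)
y, n = Fraction(13), Fraction(20)

posterior_mean = (a + y) / (a + b + n)

# Weighted average of the sample mean y/n and the prior mean a/(a+b),
# with weights n/(a+b+n) and (a+b)/(a+b+n).
weighted = n / (a + b + n) * (y / n) + (a + b) / (a + b + n) * (a / (a + b))

assert posterior_mean == weighted  # exact equality with rational arithmetic
print(posterior_mean)  # 1/2
```

As the sample size $n$ grows, the weight on the sample mean approaches 1 and the prior's influence vanishes.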

Posterior Predictive
$n^* = 1$์ผ ๋•Œ : $\tilde{Y}|y \text{ ~ } Ber(\frac{a+y}{a+b+n})$
$n^* \geq 2$์ผ ๋•Œ : $p(\tilde{Y}=y^*|y) = \binom{n^*}{y^*}\frac{B(a+y+y^*, b+n+n^*-y-y^*)}{B(a+y, b+n-y)}$ where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} $

Poisson Model

Prior: $\theta \sim Gamma(a,b)$
Likelihood: $Y_1, ..., Y_n \sim \text{iid } Poisson(\theta)$
Posterior: $\theta|y_1, ..., y_n \sim Gamma(a+\sum_{i=1}^{n}{y_i}, b+n)$

a: sum of counts from b prior observations, b: number of prior observations

$E[\theta|y_1, ..., y_n] = \frac{a+\sum y_i}{b+n} = \frac{b}{b+n}\frac{a}{b} + \frac{n}{b+n}\frac{\sum y_i}{n}$

Posterior Predictive: $\tilde{Y}|y_1, ..., y_n \sim NB(a+\sum y_i, \frac{b+n}{b+n+1})$
Note that this Negative Binomial counts the number of failures rather than successes. See the post on probability distributions for details.

Exponential Family

exponential family(์ง€์ˆ˜์กฑ)์˜ pdf ๋˜๋Š” pmf๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ˜•์‹์œผ๋กœ ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์–ด์•ผ ํ•œ๋‹ค.
$ p(y_i|\phi) = h(y)c(\phi)exp\big[\phi K(y)\big]$
exponential family ์ž์ฒด์— ๋Œ€ํ•ด์„œ ๋ณด๋‹ค ์ž์„ธํ•œ ๊ฒƒ์€ ํ•ด๋‹น ํฌ์ŠคํŒ…์„ ์ฐธ๊ณ ํ•˜์ž.

Prior
$$\begin{align} p(\phi) &= k(n_0, t_0)c(\phi)^{n_0}e^{n_0t_0\phi} \\ &\propto c(\phi)^{n_0}e^{n_0t_0\phi} \end{align}$$

Likelihood
$$L(\phi|y_1,...,y_n) \propto c(\phi)^n \exp\big(\phi \sum_{i=1}^{n}K(y_i)\big)$$

Posterior
$$\begin{align} p(\phi|y) &\propto p(\phi)f(y|\phi) \\ &\propto c(\phi)^{n_0}e^{n_0t_0\phi} \cdot c(\phi)^n \exp\big(\phi \sum_{i=1}^{n}K(y_i)\big) \\ &\propto c(\phi)^{n_0+n}\exp\big[\phi\big(n_0t_0 + \sum_{i=1}^{n}K(y_i)\big)\big] \\ &\propto c(\phi)^{n_0+n}\exp\big[\phi\big(n_0t_0 + n\frac{\sum_{i=1}^{n}K(y_i)}{n}\big)\big] \end{align}$$
์—ฌ๊ธฐ์„œ $n_0$์™€ $t_0$์€ ๊ฐ๊ฐ prior sample size์™€ prior guess of $K(Y)$๋ฅผ ๋œปํ•œ๋‹ค.

Prior: $p(\theta) \propto g(\theta)^\eta \exp(\phi(\theta)^T \nu)$
Likelihood: $p(y|\theta) = \prod_{i=1}^{N} f(y_i) \ g(\theta)^N \exp\big(\phi(\theta)^T \sum_{i=1}^{N}s(y_i)\big)$ where $\sum_{i=1}^{N}s(y_i)$ is the sufficient statistic $t(y)$
Posterior: $p(\theta|y) \propto g(\theta)^{\eta+N} \exp\big(\phi(\theta)^T (\nu + t(y))\big)$

Conjugate Prior

prior์™€ posterior์˜ ํ™•๋ฅ ๋ถ„ํฌํ˜•ํƒœ๊ฐ€ ๊ฐ™์„ ์ˆ˜ ์žˆ๋„๋ก prior์„ ์„ค์ •ํ•˜๋ฉด ์ด๋ฅผ conjugate prior๋ผ๊ณ  ํ•œ๋‹ค.
์œ„์˜ ์˜ˆ์‹œ ์™ธ์—๋„ Normal model ๋“ฑ์ด ์žˆ๋Š”๋ฐ, ์ด๋“ค์— ๋Œ€ํ•ด์„œ๋Š” ๋‹ค์Œ์— ์ด์–ด์„œ ์‚ดํŽด๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค.
๋‹ค์–‘ํ•œ ์˜ˆ์‹œ๋“ค์€ ์œ„ํ‚ค๋ฐฑ๊ณผ์— ์ž์„ธํžˆ ๋‚˜์™€์žˆ์œผ๋‹ˆ ๊ถ๊ธˆํ•œ ์‚ฌ๋žŒ๋“ค์€ ์ถ”๊ฐ€์ ์œผ๋กœ ์‚ดํŽด๋ณด์•„๋„ ์ข‹๊ฒ ๋‹ค.

์ฃผ์˜์‚ฌํ•ญ

์‚ฌํ›„ํ™•๋ฅ ๋ถ„ํฌ๊ฐ€ ์ฐจ์ด๊ฐ€ ๋งŽ์ด ๋‚˜๋Š” ๊ฒƒ๊ณผ ์‚ฌํ›„์˜ˆ์ธก์น˜๊ฐ€ ์ฐจ์ด๊ฐ€ ๋งŽ์ด ๋‚˜๋Š” ๊ฒƒ์˜ ์ฐจ์ด๋ฅผ ์•Œ์•„๋‘์–ด์•ผ ํ•œ๋‹ค. ์ฆ‰, {${\theta_1 > \theta_2}$}์™€ {$\tilde{Y_1} > \tilde{Y_2}$}๋Š” ๋‹ค๋ฅด๋‹ค.

Strong evidence of a difference between two populations does not mean that the difference itself is large.
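A Monte Carlo sketch of this point, with hypothetical data (35/50 vs 25/50 successes under uniform Beta(1, 1) priors): the posterior probability that $\theta_1 > \theta_2$ can be near 1 while a single future trial per group is far less decisive.

```python
import random

# Hypothetical two-group comparison: group 1 has 35/50 successes, group 2 has
# 25/50, both with Beta(1, 1) priors, giving Beta(36, 16) and Beta(26, 26)
# posteriors. Compare Pr(theta1 > theta2 | data) with Pr(Y1~ > Y2~ | data)
# for one future Bernoulli trial per group.
random.seed(0)
a1, b1 = 1 + 35, 1 + 50 - 35
a2, b2 = 1 + 25, 1 + 50 - 25

S = 100_000
theta_wins = pred_wins = 0
for _ in range(S):
    t1 = random.betavariate(a1, b1)
    t2 = random.betavariate(a2, b2)
    theta_wins += t1 > t2
    # One future trial per group: Y~ ~ Bernoulli(theta).
    y1 = random.random() < t1
    y2 = random.random() < t2
    pred_wins += y1 > y2

print(theta_wins / S)  # close to 1: strong evidence that theta1 > theta2
print(pred_wins / S)   # much smaller: one new observation is far less decisive
```

The first probability concerns the parameters; the second concerns new data, whose extra Bernoulli noise washes out much of the evidence.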

Conclusion

Conjugacy๋ฅผ ์ž˜ ์•Œ์•„๋‘์ž.



ํ˜น์‹œ ๊ถ๊ธˆํ•œ ์ ์ด๋‚˜ ์ž˜๋ชป๋œ ๋‚ด์šฉ์ด ์žˆ๋‹ค๋ฉด, ๋Œ“๊ธ€๋กœ ์•Œ๋ ค์ฃผ์‹œ๋ฉด ์ ๊ทน ๋ฐ˜์˜ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.