Normal Model

Chapter 05.

This post is based on First Course in Bayesian Statistical Methods and Bayesian Data Analysis.

Warm up!

  • Gamma Distribution
  • Inverse Gamma Distribution
  • Scaled Inverse Chi-squared Distribution
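The scaled inverse chi-squared distribution used throughout this post can be sampled through its link to the chi-squared distribution: if $X \text{ ~ } \chi^2_\nu$, then $\frac{\nu s^2}{X} \text{ ~ } \chi^{-2}(\nu, s^2)$, which is the same distribution as Inv-Gamma$(\frac{\nu}{2}, \frac{\nu s^2}{2})$. Below is a minimal standard-library Python sketch; the function name `sample_scaled_inv_chi2` is a hypothetical helper for illustration.

```python
import random


def sample_scaled_inv_chi2(nu, s2, rng):
    """Draw one value from the scaled inverse chi-squared distribution chi^{-2}(nu, s2).

    Uses the identity: if X ~ chi^2_nu, then nu * s2 / X ~ chi^{-2}(nu, s2).
    A chi^2_nu variable is Gamma(shape=nu/2, scale=2).
    """
    x = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared draw with nu degrees of freedom
    return nu * s2 / x


rng = random.Random(0)  # seeded for reproducibility
nu, s2 = 10, 2.0
draws = [sample_scaled_inv_chi2(nu, s2, rng) for _ in range(100_000)]

# For nu > 2 the mean is nu * s2 / (nu - 2); here 10 * 2 / 8 = 2.5
print(sum(draws) / len(draws))
```

Equivalently, the precision $\frac{1}{\sigma^2}$ follows a Gamma distribution with shape $\frac{\nu}{2}$ and rate $\frac{\nu s^2}{2}$.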

1. Single Parameter Conjugacy

The case where only one of the mean or the variance is unknown.

1-1. Unknown mean (known variance)

Prior: $\mu \text{ ~ } N(\mu_0, \tau_0^{2})$
Likelihood: $y|\mu \text{ ~ } N(\mu, \sigma^2)$
Posterior: $\mu|y \text{ ~ } N(\mu_n, \tau_n^{2})$

where $\frac{1}{\tau_n^{2}} = \frac{1}{\tau_0^{2}} + \frac{n}{\sigma^2}$ and $\mu_n = \frac{\frac{1}{\tau_0^{2}}}{\frac{1}{\tau_0^{2}} + \frac{n}{\sigma^2}}\mu_0 + \frac{\frac{n}{\sigma^2}}{\frac{1}{\tau_0^{2}} + \frac{n}{\sigma^2}}\bar{y} $

Posterior Predictive: $\tilde{y}|y \text{ ~ } N(\mu_n, \sigma^2+\tau_n^{2})$
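The update above is just a precision-weighted average, which a few lines of code make concrete. A minimal Python sketch, assuming the hypothetical function name `normal_known_variance_posterior` and toy data:

```python
def normal_known_variance_posterior(y, sigma2, mu0, tau0_2):
    """Posterior N(mu_n, tau_n^2) for the mean mu when sigma^2 is known.

    Prior: mu ~ N(mu0, tau0_2); likelihood: y_i | mu ~ N(mu, sigma2).
    Returns (mu_n, tau_n2, predictive_var), where predictive_var = sigma2 + tau_n2.
    """
    n = len(y)
    ybar = sum(y) / n
    prior_prec = 1.0 / tau0_2           # 1 / tau_0^2
    data_prec = n / sigma2              # n / sigma^2
    post_prec = prior_prec + data_prec  # 1 / tau_n^2
    tau_n2 = 1.0 / post_prec
    # Precision-weighted average of the prior mean and the sample mean
    mu_n = (prior_prec * mu0 + data_prec * ybar) / post_prec
    return mu_n, tau_n2, sigma2 + tau_n2


mu_n, tau_n2, pred_var = normal_known_variance_posterior([1.0, 2.0, 3.0], sigma2=1.0, mu0=0.0, tau0_2=1.0)
print(mu_n, tau_n2, pred_var)  # 1.5 0.25 1.25
```

Making `tau0_2` very large (prior precision near 0) drives `mu_n` toward $\bar{y}$ and `tau_n2` toward $\frac{\sigma^2}{n}$, which is the limit used in section 2-1.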

1-2. Unknown variance (known mean)

Prior: $\sigma^2 \text{ ~ } \chi^{-2}(\nu_0, \sigma_0^2)$
Likelihood: $y|\sigma^2 \text{ ~ } N(\mu, \sigma^2)$
Posterior: $\sigma^2|y \text{ ~ } \chi^{-2}(\nu_n, \sigma_n^2)$

where $\nu_n = \nu_0 + n$ and $\sigma_n^2 = \frac{\nu_0\sigma_0^2 + ns(y)}{\nu_0 + n}$
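cf. $s(y) = \frac{1}{n}\sum_{i=1}^{n}(y_i-\mu)^2$ is the MLE of $\sigma^2$ when $\mu$ is known. Because it uses the known $\mu$ rather than $\bar{y}$, it is in fact unbiased; the biased $\frac{1}{n}$ estimator is the one with $\bar{y}$ in place of $\mu$. Note also that Bayesians do not regard unbiasedness, a frequentist criterion, as especially important.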
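A minimal Python sketch of this update (function and argument names are illustrative assumptions): the degrees of freedom grow by $n$, and $\sigma_n^2$ is a weighted average of the prior guess $\sigma_0^2$ and $s(y)$.

```python
def normal_known_mean_posterior(y, mu, nu0, sigma0_2):
    """Posterior chi^{-2}(nu_n, sigma_n^2) for sigma^2 when the mean mu is known.

    Prior: sigma^2 ~ chi^{-2}(nu0, sigma0_2); likelihood: y_i | sigma^2 ~ N(mu, sigma^2).
    """
    n = len(y)
    s_y = sum((yi - mu) ** 2 for yi in y) / n  # s(y): MLE of sigma^2 with mu known
    nu_n = nu0 + n
    sigma_n2 = (nu0 * sigma0_2 + n * s_y) / nu_n
    return nu_n, sigma_n2


print(normal_known_mean_posterior([0.0, 4.0], mu=2.0, nu0=2, sigma0_2=1.0))  # (4, 2.5)
```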

2. Two Parameters (mean and variance both unknown)

Two ways to obtain a marginal distribution:
  1. Integration: obtain the joint posterior distribution, then integrate out the parameter we are not interested in (the nuisance parameter).
  2. Simulation: draw samples from the joint posterior distribution, then keep only the draws of the parameter of interest (ignore the rest).
How, then, do we sample from the joint posterior distribution?
  1. Through marginal and conditional simulation:
    $\theta_2 \text{ ~ } p(\theta_2|y)$, then $\theta_1 \text{ ~ } p(\theta_1 | \theta_2, y)$
    $\rightarrow (\theta_1, \theta_2) \text{ ~ } p(\theta_1, \theta_2|y)$
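The composition idea and the "keep only what you care about" route to a marginal can both be seen in a toy example. This is a sketch under an assumed toy model, not the normal model itself: $\theta_2 \text{ ~ } N(0, 1)$ and $\theta_1|\theta_2 \text{ ~ } N(\theta_2, 1)$, so analytically $\theta_1 \text{ ~ } N(0, 2)$. The helper `composition_sample` is hypothetical.

```python
import random
import statistics


def composition_sample(draw_theta2, draw_theta1_given, n_draws, rng):
    """Sample (theta1, theta2) from a joint distribution by composition:
    first theta2 from its marginal, then theta1 from its conditional given theta2."""
    pairs = []
    for _ in range(n_draws):
        t2 = draw_theta2(rng)
        t1 = draw_theta1_given(t2, rng)
        pairs.append((t1, t2))
    return pairs


# Toy model (an assumption for illustration): theta2 ~ N(0, 1), theta1 | theta2 ~ N(theta2, 1)
rng = random.Random(1)
pairs = composition_sample(
    draw_theta2=lambda r: r.gauss(0.0, 1.0),
    draw_theta1_given=lambda t2, r: r.gauss(t2, 1.0),
    n_draws=100_000,
    rng=rng,
)

# Simulation-based marginalization: keep theta1 and ignore theta2
theta1 = [t1 for t1, _ in pairs]
print(statistics.mean(theta1), statistics.variance(theta1))  # close to 0 and 2
```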

2-1. noninformative prior

Prior: $p(\mu, \sigma^2) = p(\mu)p(\sigma^2) \propto (\sigma^2)^{-1} $ (assuming independence; an improper prior)
Likelihood: $p(y|\mu, \sigma^2) \propto \sigma^{-n}exp\bigg(\frac{-1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\bigg) $
Posterior: $\mu, \sigma^2 |y \text{ ~ } N(\bar{y}, \frac{\sigma^2}{n}) \times \chi^{-2}(n-1, s^2)$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$
Posterior Predictive: $\tilde{y}|y \text{ ~ } t_{n-1}\big(\bar{y}, (1+\frac{1}{n})s^2\big)$
Compared with the marginal posterior of $\mu$ (scale $\frac{s^2}{n}$), the predictive scale contains an extra $s^2$, which can be read as the added uncertainty of the data.

Deriving the Posterior Distribution (Noninformative)

Deriving this posterior distribution is somewhat involved, so I will go through it in detail.
Before starting, the whole process can be summarized in one phrase: Conditional Posterior × Marginal.

STEP1. Identify the forms of $p(\mu|\sigma^2,y)$ and $p(\sigma^2|y)$.

  1. $\mu|\sigma^2,y \text{ ~ } N(\bar{y}, \frac{\sigma^2}{n})$
    This follows from the unknown-mean, known-variance case above by setting the prior precision $\frac{1}{\tau_0^2}=0$. The prior precision is set to 0 because we are assuming a noninformative prior.

  2. $\sigma^2|y \text{ ~ } \chi^{-2}(n-1, s^2)$
    This is obtained by computing the expression below.

\begin{align} p(\mu, \sigma^2|y) &\propto p(\mu, \sigma^2) \times p(y|\mu, \sigma^2) \\ &\propto \sigma^{-n-2}exp\bigg(\frac{-1}{2\sigma^2}\big[(n-1)s^2 + n(\bar{y}-\mu)^2\big]\bigg) \\ \rightarrow p(\sigma^2|y) &= \int p(\mu,\sigma^2|y)d\mu \end{align}
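The integral in the last line can be filled in as a short worked step. Only the term $n(\bar{y}-\mu)^2$ depends on $\mu$, and $\int exp\big(\frac{-n(\bar{y}-\mu)^2}{2\sigma^2}\big)d\mu = \sqrt{2\pi\sigma^2/n} \propto \sigma$, so

\begin{align} p(\sigma^2|y) &\propto \sigma^{-n-2}exp\bigg(\frac{-(n-1)s^2}{2\sigma^2}\bigg)\int exp\bigg(\frac{-n(\bar{y}-\mu)^2}{2\sigma^2}\bigg)d\mu \\ &\propto (\sigma^2)^{-\frac{n+1}{2}}exp\bigg(\frac{-(n-1)s^2}{2\sigma^2}\bigg) \end{align}

This is the kernel of $\chi^{-2}(n-1, s^2)$: a $\chi^{-2}(\nu, s^2)$ density is proportional to $(\sigma^2)^{-(\frac{\nu}{2}+1)}exp\big(\frac{-\nu s^2}{2\sigma^2}\big)$, and $\nu = n-1$ gives exactly the exponent $-\frac{n+1}{2}$.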

STEP2. Combine the two pieces with the product rule, $p(\mu,\sigma^2|y) = p(\mu|\sigma^2,y)\,p(\sigma^2|y)$, to obtain the joint posterior distribution.

Following the steps above, the result can be summarized as follows.

\begin{align} \mu|\sigma^2,y &\text{ ~ } N(\bar{y}, \frac{\sigma^2}{n}) \\ \sigma^2|y &\text{ ~ } \chi^{-2}(n-1, s^2) \\ \mu, \sigma^2 |y &\text{ ~ } N(\bar{y}, \frac{\sigma^2}{n}) \times \chi^{-2}(n-1, s^2) \end{align}
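The factorization above translates directly into a sampler: draw $\sigma^2$ from its marginal, then $\mu$ given that $\sigma^2$. A minimal standard-library Python sketch; the function name and the data are illustrative assumptions, not from the source.

```python
import random
import statistics


def sample_noninformative_posterior(y, n_draws, rng):
    """Draw (mu, sigma^2) from p(mu, sigma^2 | y) under the prior p(mu, sigma^2) proportional to 1/sigma^2.

    Composition: sigma^2 | y ~ chi^{-2}(n-1, s^2), then mu | sigma^2, y ~ N(ybar, sigma^2/n).
    """
    n = len(y)
    ybar = statistics.mean(y)
    s2 = statistics.variance(y)  # sample variance with divisor n - 1
    draws = []
    for _ in range(n_draws):
        x = rng.gammavariate((n - 1) / 2.0, 2.0)   # chi^2_{n-1} draw
        sigma2 = (n - 1) * s2 / x                   # scaled inverse chi-squared draw
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)   # gauss takes a standard deviation
        draws.append((mu, sigma2))
    return draws


y = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 3.6, 2.2]  # illustrative data
draws = sample_noninformative_posterior(y, n_draws=50_000, rng=random.Random(2))
mu_draws = [m for m, _ in draws]
sigma2_draws = [s for _, s in draws]
print(statistics.mean(mu_draws), statistics.mean(sigma2_draws))
```

The values to compare against are $E[\mu|y] = \bar{y}$ and, for $n > 3$, $E[\sigma^2|y] = \frac{(n-1)s^2}{n-3}$.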


Deriving the Marginal Posterior Distribution of the Mean ($\mu$)

As an aside, the marginal posterior distribution of $\mu$, $p(\mu|y)$, can be obtained from $\int p(\mu,\sigma^2|y)d\sigma^2$, and it has the following form:
$$\mu|y \text{ ~ } t_{n-1}(\bar{y}, \frac{s^2}{n})$$
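Integrating out $\sigma^2$ corresponds to simply discarding the $\sigma^2$ draws, so the $t$ form can be checked by simulation. A sketch with made-up data (an assumption), comparing the variance of the $\mu$ draws with that of $t_{\nu}(\bar{y}, \frac{s^2}{n})$, which is $\frac{\nu}{\nu-2}\frac{s^2}{n}$ for $\nu > 2$:

```python
import random
import statistics

# Illustrative data (an assumption, not from the source)
y = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 3.6, 2.2]
n = len(y)
ybar = statistics.mean(y)
s2 = statistics.variance(y)  # divisor n - 1
rng = random.Random(3)

mu_draws = []
for _ in range(100_000):
    sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2.0, 2.0)  # sigma^2 | y
    mu_draws.append(rng.gauss(ybar, (sigma2 / n) ** 0.5))          # mu | sigma^2, y; sigma^2 is discarded

# Variance of t_nu(ybar, s^2/n) is nu/(nu-2) * s^2/n for nu > 2
nu = n - 1
print(statistics.variance(mu_draws), nu / (nu - 2) * s2 / n)
```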

Deriving the Posterior Predictive Distribution

\begin{align} p(\tilde{y}|y) &= \int\int p(\tilde{y}|\mu,\sigma^2)\, p(\mu, \sigma^2|y)\ d\mu \ d\sigma^2 \\ &= \int\bigg[\int p(\tilde{y}|\mu,\sigma^2)\, p(\mu|\sigma^2,y)\ d\mu\bigg] p(\sigma^2|y) \ d\sigma^2 \\ &= \int p(\tilde{y}|\sigma^2,y) \ p(\sigma^2|y) \ d\sigma^2, \quad \tilde{y}|\sigma^2,y \text{ ~ } N\Big(\bar{y}, \big(1+\frac{1}{n}\big)\sigma^2\Big) \end{align}

Posterior Predictive: $\tilde{y}|y \text{ ~ } t_{n-1}\big(\bar{y}, (1+\frac{1}{n})s^2\big)$
It is important to compare this result with the marginal posterior distribution of $\mu$ just above,
because the comparison shows that prediction adds an extra $s^2$ to the scale, i.e., additional uncertainty coming from the new observation itself.
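The comparison can also be made numerically: simulate $\mu$ and $\tilde{y}$ from the same posterior draws and compare their spreads. A sketch with made-up data (an assumption); under this model the ratio of the two theoretical variances is $\frac{1+1/n}{1/n} = n+1$.

```python
import random
import statistics

y = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5, 3.6, 2.2]  # illustrative data (assumption)
n = len(y)
ybar = statistics.mean(y)
s2 = statistics.variance(y)  # divisor n - 1
rng = random.Random(4)

mu_draws, y_tilde = [], []
for _ in range(100_000):
    sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2.0, 2.0)  # sigma^2 | y
    mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)                     # mu | sigma^2, y
    mu_draws.append(mu)
    y_tilde.append(rng.gauss(mu, sigma2 ** 0.5))                  # y~ | mu, sigma^2

nu = n - 1
print(statistics.variance(mu_draws))  # about nu/(nu-2) * s2 / n
print(statistics.variance(y_tilde))   # about nu/(nu-2) * (1 + 1/n) * s2
```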

Why the two-parameter normal model matters becomes clear in section 3, The Difference between Frequentist and Bayesian, below. Once you understand the basic premises and positions of the two schools, you can see what conclusion a noninformative prior ultimately leads to.

2-2. conjugate prior

Prior: $p(\mu, \sigma^2) = p(\mu|\sigma^2) \times p(\sigma^2) \text{ ~ N-Inv-} \chi^2(\mu_0, \frac{\sigma_0^2}{k_0}; \nu_0, \sigma_0^2)$
\begin{align} \mu|\sigma^2 &\text{ ~ } N(\mu_0, \frac{\sigma^2}{k_0}) \\ \sigma^2 &\text{ ~ } \chi^{-2}(\nu_0, \sigma^2_0) \\ \rightarrow p(\mu, \sigma^2) &\propto \sigma^{-1}(\sigma^2)^{-(\frac{\nu_0}{2}+1)}exp\bigg(\frac{-1}{2\sigma^2}\big[\nu_0\sigma_0^2 + k_0(\mu_0 - \mu)^2\big]\bigg) \end{align}
Likelihood: $p(y|\mu, \sigma^2) \propto \sigma^{-n}exp\bigg(\frac{-1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\bigg)$
Posterior: $p(\mu, \sigma^2|y) \text{ ~ N-Inv-}\chi^2(\mu_n, \frac{\sigma_n^2}{k_n}; \nu_n, \sigma_n^2) $

\begin{align} \mu_n &= \frac{k_0}{k_0+n}\mu_0 + \frac{n}{k_0+n}\bar{y} \\ k_n &= k_0 +n \\ \nu_n &= \nu_0 + n \\ \nu_n\sigma_n^2 &= \nu_0\sigma_0^2 + (n-1)s^2 + \frac{k_0n}{k_0+n}(\bar{y}-\mu_0)^2 \\ \rightarrow \text{posterior ss} &= \text{prior ss} + \text{sample ss} + \text{additional uncertainty from }(\bar{y}-\mu_0)^2 \end{align}
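The four update equations fit in a small function. A minimal Python sketch; the name `nix_posterior` is a hypothetical helper, and the toy data are chosen so the arithmetic is easy to follow by hand.

```python
def nix_posterior(y, mu0, k0, nu0, sigma0_2):
    """Update the N-Inv-chi^2(mu0, sigma0_2/k0; nu0, sigma0_2) prior with normal data y.

    Returns (mu_n, k_n, nu_n, sigma_n2).
    """
    n = len(y)
    ybar = sum(y) / n
    ss = sum((yi - ybar) ** 2 for yi in y)  # (n - 1) s^2, the sample sum of squares
    k_n = k0 + n
    nu_n = nu0 + n
    mu_n = (k0 * mu0 + n * ybar) / k_n      # weighted average of mu0 and ybar
    # posterior ss = prior ss + sample ss + extra term for the gap between ybar and mu0
    sigma_n2 = (nu0 * sigma0_2 + ss + k0 * n / k_n * (ybar - mu0) ** 2) / nu_n
    return mu_n, k_n, nu_n, sigma_n2


print(nix_posterior([1.0, 2.0, 3.0], mu0=0.0, k0=1, nu0=1, sigma0_2=1.0))  # (1.5, 4, 4, 1.5)
```

Here the gap term contributes $\frac{1 \cdot 3}{4}(2-0)^2 = 3$, so $\nu_n\sigma_n^2 = 1 + 2 + 3 = 6$ and $\sigma_n^2 = 1.5$.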

3. The Difference between Frequentist and Bayesian

Frequentist: given the parameters, talks about the distribution of statistics.
let $y_i \text{ ~ } N(\mu, \sigma^2)$, $i = 1, \dots, n$, independently

  1. $\bar{y} \text{ ~ } N(\mu, \frac{\sigma^2}{n}) $
  2. $\frac{(n-1)s^2}{\sigma^2} \text{ ~ } \chi^2(n-1)$
  3. $\frac{\bar{y}-\mu}{s/\sqrt{n}}|\mu,\sigma^2 \text{ ~ } t_{n-1}$

Bayesian: given the data, talks about the distribution of the parameters.

  1. $\mu|\sigma^2,y \text{ ~ } N(\bar{y}, \frac{\sigma^2}{n})$
  2. $\sigma^2|y \text{ ~ } \chi^{-2}(n-1, s^2)$
  3. $\frac{\mu-\bar{y}}{s/\sqrt{n}}|y \text{ ~ } t_{n-1} $

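If the Bayesian assumes a noninformative prior, i.e., essentially no prior information, it is natural that the results come out similar to the frequentist's. In fact, item 3 has the same $t_{n-1}$ distribution in both views, so the 95% central posterior interval for $\mu$ coincides numerically with the classical t confidence interval, even though the two are interpreted differently.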
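A small numerical check of this claim, assuming made-up data with $n = 10$ and a hardcoded t quantile ($t_{0.975, 9} \approx 2.262157$, taken from standard tables, since the Python standard library has no t quantile function). The Bayesian interval is computed from simulated posterior draws, so it matches the frequentist one only up to Monte Carlo error.

```python
import math
import random
import statistics

# Illustrative data (assumption); n = 10, so the t quantile below has 9 degrees of freedom
y = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.7, 5.0, 3.9, 4.6]
n = len(y)
ybar = statistics.mean(y)
s = statistics.stdev(y)   # divisor n - 1
T_975_DF9 = 2.262157      # 97.5% quantile of t with 9 df, from standard tables

# Frequentist 95% t confidence interval
half = T_975_DF9 * s / math.sqrt(n)
freq_ci = (ybar - half, ybar + half)

# Bayesian 95% central posterior interval for mu under p(mu, sigma^2) proportional to 1/sigma^2
rng = random.Random(5)
mu_draws = []
for _ in range(200_000):
    sigma2 = (n - 1) * s**2 / rng.gammavariate((n - 1) / 2.0, 2.0)  # sigma^2 | y
    mu_draws.append(rng.gauss(ybar, math.sqrt(sigma2 / n)))          # mu | sigma^2, y
mu_draws.sort()
bayes_ci = (mu_draws[int(0.025 * len(mu_draws))], mu_draws[int(0.975 * len(mu_draws))])

print(freq_ci)
print(bayes_ci)  # numerically close to freq_ci, up to Monte Carlo error
```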

4. Multinomial Model

Likelihood: $y|\theta \text{ ~ Multinomial}(\theta) \propto \prod_{j=1}^{k}\theta_j^{y_j}$
Prior: $\theta \text{ ~ } Dir(\alpha) \propto \prod_{j=1}^{k}\theta_j^{\alpha_j-1}$
Posterior: $\theta|y \text{ ~ } Dir(\alpha +y) \propto \prod_{j=1}^{k}\theta_j^{\alpha_j+y_j-1}$

Note that the multinomial distribution can be thought of as an extension of the binomial, and the Dirichlet distribution as an extension of the beta distribution. This makes the model easy to follow, since the Beta-Binomial model was already covered thoroughly in Chapter 3.
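The conjugate update is just "add the counts to the prior parameters", and a Dirichlet draw can be built from independent Gamma variables. A minimal Python sketch; the helper names, the prior, and the counts are all illustrative assumptions.

```python
import random


def dirichlet_posterior(alpha, y):
    """Posterior Dirichlet parameters alpha + y for multinomial counts y."""
    return [a + c for a, c in zip(alpha, y)]


def sample_dirichlet(params, rng):
    """One Dirichlet draw via normalized independent Gamma(params_j, 1) variables."""
    g = [rng.gammavariate(p, 1.0) for p in params]
    total = sum(g)
    return [gj / total for gj in g]


alpha = [1.0, 1.0, 1.0]  # uniform prior over the simplex (illustrative choice)
y = [12, 5, 3]           # hypothetical category counts
post = dirichlet_posterior(alpha, y)
print(post)  # [13.0, 6.0, 4.0]

# Posterior mean of theta_j is post_j / sum(post)
print([p / sum(post) for p in post])

rng = random.Random(6)
draws = [sample_dirichlet(post, rng) for _ in range(50_000)]
print([sum(d[j] for d in draws) / len(draws) for j in range(3)])  # close to the posterior mean
```

With $k = 2$ this reduces to the Beta-Binomial update from Chapter 3.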



If you have any questions or spot anything wrong, please let me know in the comments and I will gladly incorporate it.