What is Bayesian

Chapter 01.
Introduction and Examples

This post draws on A First Course in Bayesian Statistical Methods (FCB).
In this chapter, we look at the likelihood and the prior, and focus on what a full probability model means.

The Goal of Bayesian Inference

We collect data in order to reduce our uncertainty about the characteristics of a population. Quantifying how much that uncertainty changes once the data are observed is the goal of Bayesian inference.

Key Concepts

  1. prior distribution $p(\theta)$
    • Prior probability
    • The degree of belief about the parameter held before seeing the data
  2. sampling model $p(y|\theta)$
    • Viewed as a function of $\theta$, this is the likelihood
    • The probability of observing particular data, assuming the parameter value $\theta$ is true
  3. posterior distribution $p(\theta|y)$
    • The degree of belief about the parameter, revised in light of the observed data

Bayes' Rule

$$p(\theta|y) = \frac{p(y|\theta)p(\theta)}{\int_{\Theta}p(y|\tilde{\theta})p(\tilde{\theta})d\tilde{\theta}}$$

This formula shows how the prior distribution is updated by the likelihood to give the posterior distribution.
It is fair to say that this is essentially all there is to Bayesian statistics.
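
As a minimal sketch of the formula above, the integral in the denominator can be approximated numerically on a grid. The binomial sampling model, the uniform prior, the data (3 successes in 10 trials), and the helper name `grid_posterior` are all assumptions made for illustration; they are not taken from FCB.

```python
from math import comb

def grid_posterior(y, n, prior, grid, width):
    """Approximate the posterior density p(theta | y) on a grid,
    assuming a binomial sampling model with y successes in n trials."""
    # Numerator of Bayes' rule: likelihood times prior at each grid point.
    unnorm = [comb(n, y) * t**y * (1 - t)**(n - y) * prior(t) for t in grid]
    # Denominator: midpoint-rule approximation of the integral over theta.
    evidence = sum(unnorm) * width
    return [u / evidence for u in unnorm]

# Midpoint grid on (0, 1).
m = 1000
width = 1.0 / m
grid = [(i + 0.5) * width for i in range(m)]

def uniform_prior(theta):
    # Assumed prior for this sketch: flat on (0, 1).
    return 1.0

post = grid_posterior(y=3, n=10, prior=uniform_prior, grid=grid, width=width)
post_mean = sum(t * p for t, p in zip(grid, post)) * width
print(f"approximate posterior mean: {post_mean:.4f}")
```

With a uniform prior the exact posterior here is Beta(4, 8), so the grid-based posterior mean should land close to 4/12.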

Application Examples

  1. Estimating the probability of a rare event (Estimation)
    • Infection probability
    • Frequentist methods can struggle to give reasonable probability estimates when the sample is small. For example, if only 20 people are checked for infection and none of them turn out to be infected, the usual calculation gives an estimated infection probability of 0%. That estimate is statistically defensible, but it may be far from reality.
    • By contrast, a Bayesian analysis reports the infection probability as a distribution and encodes existing beliefs in the prior, so it can be less vulnerable to this problem.
  2. Building a predictive model (Prediction)
    • Diabetes progression
    • Using a prior that puts 50% probability on a variable's coefficient being exactly 0 yields a variable-selection effect.
    • The details will be covered together with Bayesian linear regression in FCB chapter 09.
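
The infection example in item 1 can be sketched with a conjugate Beta prior, for which the posterior has a closed form. The Beta(2, 20) prior and the data (0 infected out of 20) are illustrative assumptions, as are the helper names; a different prior would give different numbers.

```python
def beta_binomial_update(a, b, y, n):
    """Posterior Beta parameters after y successes in n binomial trials,
    starting from a Beta(a, b) prior (the standard conjugate update)."""
    return a + y, b + (n - y)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Assumed for illustration: prior Beta(2, 20), 0 infected among 20 sampled.
prior_a, prior_b = 2, 20
y, n = 0, 20

sample_proportion = y / n  # the usual point estimate, 0 when nobody is infected
post_a, post_b = beta_binomial_update(prior_a, prior_b, y, n)
post_mean = beta_mean(post_a, post_b)

print(f"sample proportion: {sample_proportion:.3f}")
print(f"posterior: Beta({post_a}, {post_b}), mean = {post_mean:.3f}")
```

Under these assumptions the posterior is Beta(2, 40) with mean 2/42, roughly 0.048: small, but not the hard 0% that the sample proportion alone suggests.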

ETC

  • ‘Adjusted’ Wald interval
    A Bayesian-motivated modification of the familiar confidence interval.

$$\hat{\theta} \pm 1.96\sqrt{\hat{\theta}(1-\hat{\theta})/n}$$ , where

$$\hat{\theta} = \frac{n}{n+4}\bar{y} + \frac{4}{n+4}\frac{1}{2}$$
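
The interval above is straightforward to compute. The sketch below is a direct transcription of the two formulas; the function name `adjusted_wald_interval` and the example data (0 successes in 20 trials) are assumptions for illustration. Note that $\hat{\theta}$ simplifies to $(\sum y_i + 2)/(n+4)$, which is the posterior mean under a Beta(2, 2) prior, and this is where the Bayesian flavor comes from.

```python
from math import sqrt

def adjusted_wald_interval(successes, n, z=1.96):
    """Adjusted Wald interval for a binomial proportion.

    Returns (theta_hat, lower, upper). The limits are not clipped to [0, 1].
    """
    y_bar = successes / n
    # Shrink the sample mean toward 1/2 with weights n/(n+4) and 4/(n+4).
    theta_hat = (n / (n + 4)) * y_bar + (4 / (n + 4)) * 0.5
    half_width = z * sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat, theta_hat - half_width, theta_hat + half_width

# Assumed example: 0 successes out of 20 trials.
theta_hat, lower, upper = adjusted_wald_interval(0, 20)
print(f"theta_hat = {theta_hat:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

Even with zero observed successes the center of the interval is pulled away from 0. For counts this small the lower limit can fall below 0, so in practice one might truncate it at 0.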

  • Lasso
    A method for variable selection. It aims to minimize the penalized SSR given below, where $p$ is the number of regression coefficients.
    It was not originally developed in a Bayesian context, but under a particular prior it coincides with the Bayesian viewpoint.
    That particular prior is one in which each $\beta_j$ follows a Laplace distribution (also called the double-exponential distribution), which has a sharp peak at 0.
    The lasso estimate then equals the posterior mode of $\beta$.
    $$SSR(\beta:\lambda) = \sum_{i=1}^{n}(y_i-\boldsymbol{x}_i^T\boldsymbol{\beta})^2 + \lambda\sum_{j=1}^{p}|\beta_j|$$
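
To make the posterior-mode connection concrete: assuming Gaussian noise with variance $\sigma^2$ and independent Laplace priors with scale $b$ on each $\beta_j$ (both are assumptions of this sketch), the negative log posterior is, up to a constant, $\frac{1}{2\sigma^2}\sum_i (y_i-\boldsymbol{x}_i^T\boldsymbol{\beta})^2 + \frac{1}{b}\sum_j|\beta_j|$. Multiplying through by $2\sigma^2$ gives the SSR above with $\lambda = 2\sigma^2/b$, so minimizing the SSR is the same as maximizing the posterior. The code below checks the minimizer numerically for a single predictor with no intercept, where it has a closed form (soft-thresholding). The toy data and function names are made up for illustration.

```python
def penalized_ssr(beta, x, y, lam):
    """SSR(beta : lambda) for one predictor and no intercept."""
    rss = sum((yi - xi * beta) ** 2 for xi, yi in zip(x, y))
    return rss + lam * abs(beta)

def lasso_1d(x, y, lam):
    """Closed-form minimizer of penalized_ssr via soft-thresholding."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Shrink |sxy| by lam / 2 and set it to zero if the penalty dominates.
    shrunk = max(abs(sxy) - lam / 2, 0.0)
    sign = 1.0 if sxy >= 0 else -1.0
    return sign * shrunk / sxx

# Toy data, assumed for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]

beta_hat = lasso_1d(x, y, lam=6.0)

# Brute-force check: minimize the same objective over a fine grid.
grid = [i / 1000 for i in range(-2000, 2001)]
beta_grid = min(grid, key=lambda b: penalized_ssr(b, x, y, 6.0))

# A large enough penalty sets the coefficient exactly to zero.
beta_big_penalty = lasso_1d(x, y, lam=100.0)

print(beta_hat, beta_grid, beta_big_penalty)
```

The last line of the computation illustrates the variable-selection effect: once $\lambda$ is large enough, the posterior mode sits exactly at the peak of the Laplace prior, that is, at 0.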

Conclusion

"All models are wrong, but some are useful"

- Box and Draper, 1987




If you have any questions or spot any errors, please let me know in the comments and I will gladly incorporate your feedback.