
Chapter 07.
The Multivariate Normal Model

๋ณธ ํฌ์ŠคํŒ…์€ First Course in Bayesian Statistical Methods๋ฅผ ์ฐธ๊ณ ํ•˜์˜€๋‹ค.

1. Multivariate Normal Density

univariate model์— ๋Œ€ํ•ด์„œ ์•ž์„  ์ฑ•ํ„ฐ์—์„œ ์ด์•ผ๊ธฐ๋ฅผ ๋งŽ์ด ํ–ˆ์ง€๋งŒ, ์‚ฌ์‹ค ํ˜„์‹ค์„ธ๊ณ„์—์„œ๋Š” multivariate์ธ ๊ฒฝ์šฐ๊ฐ€ ํ›จ์”ฌ ๋งŽ๋‹ค.

1-1. Bivariate Normal

library(tidyverse)
library(gridExtra)
library(MASS)
library(reshape2)
library(ash)
#### 4-1. Draw yourself Figure 7.1

# ์ดˆ๊ธฐ ์„ค์ •
inv <- solve
MU = matrix(c(50,50), ncol=1)
SIGMA = matrix(c(64,0,0,144), ncol=2)

# MVN pdf
calc.dmvn = Vectorize(function(a,b, mu=MU, sigma=SIGMA){
  y <- c(a,b)
  log.p <- (-nrow(mu)/2)*log(2*pi) - 0.5*log(det(sigma)) - 0.5*(t(y-mu) %*% inv(sigma) %*% (y-mu))
  exp(log.p)
})
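
As a quick sanity check (my own addition, assuming the mvtnorm package is installed), the hand-rolled density can be compared with mvtnorm::dmvnorm at a single point:

# Sanity check against mvtnorm::dmvnorm (assumes mvtnorm is installed)
library(mvtnorm)
calc.dmvn(55, 60)                                    # hand-rolled density
dmvnorm(c(55, 60), mean = c(50, 50), sigma = SIGMA)  # reference implementation

Both lines should print the same value.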
# do it at once
allInOne <- function(corr){
  SIGMA = matrix(c(64,0,0,144), ncol=2)
  s1 <- sqrt(SIGMA[1,1]); s2 <- sqrt(SIGMA[2,2])
  SIGMA[1,2] <-  s1*s2*corr; SIGMA[2,1] <-  s1*s2*corr
  
  # MVN density function
  calc.dmvn = Vectorize(function(a,b, mu=MU, sigma=SIGMA){
    y <- c(a,b)
    log.p <- (-nrow(mu)/2)*log(2*pi) - 0.5*log(det(sigma)) - 0.5*(t(y-mu) %*% inv(sigma) %*% (y-mu))
    exp(log.p)
  })
  
  # sample
  sample = mvrnorm(n=30, mu=MU, Sigma=SIGMA)
  sample = data.frame(sample)
  colnames(sample) = c('y1','y2')
  
  # calculate density
  xLim = seq(20, 80, length=101)
  yLim = seq(20, 80, length=101)
  density.mvn <- outer(xLim, yLim, FUN=calc.dmvn)
  rownames(density.mvn) <- xLim
  colnames(density.mvn) <- yLim
  density.mvn <- melt(density.mvn)
  
  # graph
  density.mvn %>% 
    ggplot(aes(x=Var1, y=Var2)) +
    geom_tile(aes(fill=value, alpha=value)) +
    geom_contour(aes(z=value), color='white', size=0.1) +
    geom_point(data=sample, mapping=aes(x=y1, y=y2), color='red') +
    scale_fill_gradient(low='grey', high='steelblue', guide='none') +
    scale_alpha(guide='none') +
    theme_bw() + theme(legend.position='none') +
    ggtitle(paste0('corr=',corr)) + xlab('y1') + ylab('y2')
}
p1 <- allInOne(corr=-0.5)
p2 <- allInOne(corr=0)
p3 <- allInOne(corr=0.5)
grid.arrange(p1,p2,p3, nrow=1)

1-2. Multivariate Normal Model

$$p(\boldsymbol{y}|\boldsymbol{\mu}, \Sigma) = (2\pi)^{-p/2}|\Sigma|^{-1/2}exp\Big(-\frac{1}{2}(\boldsymbol{y}-\boldsymbol{\mu})^T\Sigma^{-1}(\boldsymbol{y}-\boldsymbol{\mu}) \Big) $$
where

$$\boldsymbol{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}$$

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}$$

$$\Sigma = \begin{pmatrix} \sigma^2_{1} & \sigma_{1,2} & \cdots & \sigma_{1,p} \\ \sigma_{2,1} & \sigma^2_{2} & \cdots & \sigma_{2,p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p,1} & \cdots & \cdots & \sigma^2_{p} \end{pmatrix}$$

2. Semiconjugate prior distribution for the mean (known covariance matrix)

Semiconjugate๋ผ๊ณ  ํ•˜๋Š” ๊ฒƒ์€, ๋‘ ๋ชจ์ˆ˜ ์ค‘ ํ•˜๋‚˜๊ฐ€ ์ฃผ์–ด์กŒ์„ ๊ฒฝ์šฐ์— conjugateํ•œ ๊ฒฝ์šฐ๋ฅผ ๋œปํ•œ๋‹ค.
์—ฌ๊ธฐ์„œ๋Š” ๊ณต๋ถ„์‚ฐ ํ–‰๋ ฌ์ด ์ฃผ์–ด์กŒ์„ ๋•Œ, ํ‰๊ท  ๋ฒกํ„ฐ์˜ semiiconjugate prior๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ(์กฐ๊ธˆ ๋” ์‰ฌ์›€)์„ ๋จผ์ € ๋ณด๊ณ  ์ด์–ด์„œ ๊ณต๋ถ„์‚ฐ ํ–‰๋ ฌ์˜ semiconjugate prior๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ์„ ์‚ดํŽด๋ณผ ๊ฒƒ์ด๋‹ค.

Prior: $\boldsymbol{\mu} \text{ ~ } MVN(\boldsymbol{\mu_0}, \Lambda_0)$

\begin{align} p(\boldsymbol{\mu}) &= (2\pi)^{-p/2}|\Lambda_0|^{-1/2}exp\Big(-\frac{1}{2}(\boldsymbol{\mu}-\boldsymbol{\mu_0})^T\Lambda_0^{-1}(\boldsymbol{\mu}-\boldsymbol{\mu_0})\Big) \\ &\propto exp(-\frac{1}{2}\boldsymbol{\mu}^T\Lambda^{-1}_0\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\Lambda^{-1}_0\boldsymbol{\mu_0}) \\ &= exp(-\frac{1}{2}\boldsymbol{\mu}^TA_0\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_0}) \end{align}

where $A_0 = \Lambda^{-1}_0, \boldsymbol{b_0} = \Lambda^{-1}_0\boldsymbol{\mu_0}$

Likelihood: $Y_1, ..., Y_n|\boldsymbol{\mu},\Sigma \text{ ~ iid } MVN(\boldsymbol{\mu}, \Sigma)$

\begin{align} p(\boldsymbol{y_1}, ...,\boldsymbol{y_n}|\boldsymbol{\mu},\Sigma) &= \prod^{n}_{i=1} (2\pi)^{-p/2}|\Sigma|^{-1/2}exp\Big(-\frac{1}{2}(\boldsymbol{y_i}-\boldsymbol{\mu})^T\Sigma^{-1}(\boldsymbol{y_i}-\boldsymbol{\mu}) \Big) \\ &= (2\pi)^{-np/2}|\Sigma|^{-n/2}exp\Big(-\frac{1}{2}\sum_{i=1}^{n}(\boldsymbol{y_i}-\boldsymbol{\mu})^T\Sigma^{-1}(\boldsymbol{y_i}-\boldsymbol{\mu}) \Big) \\ &\propto exp(-\frac{1}{2}\boldsymbol{\mu}^TA_1\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_1}) \end{align}

where $A_1 = n\Sigma^{-1}, \boldsymbol{b_1} = n\Sigma^{-1}\boldsymbol{\bar{y}}$

Posterior: $\boldsymbol{\mu}|\boldsymbol{y_1}, ..., \boldsymbol{y_n}, \Sigma \text{ ~ } MVN(\boldsymbol{\mu_n}, \Lambda_n)$

$$
\begin{align} p(\boldsymbol{\mu}|\boldsymbol{y_1}, ..., \boldsymbol{y_n}, \Sigma) &\propto exp(-\frac{1}{2}\boldsymbol{\mu}^TA_0\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_0}) \times exp(-\frac{1}{2}\boldsymbol{\mu}^TA_1\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_1}) \\ &= exp(-\frac{1}{2}\boldsymbol{\mu}^TA_n\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_n}) \\ \\ \text{where } A_n &= A_0 + A_1 = \Lambda_0^{-1}+n\Sigma^{-1} \\ \boldsymbol{b_n} &= \boldsymbol{b_0} + \boldsymbol{b_1} = \Lambda_0^{-1}\boldsymbol{\mu_0}+n\Sigma^{-1}\boldsymbol{\bar{y}} \\ \\ \rightarrow \Lambda_n^{-1} &= \Lambda_0^{-1}+n\Sigma^{-1} \\ \boldsymbol{\mu_n} &= (\Lambda_0^{-1}+n\Sigma^{-1})^{-1}(\Lambda_0^{-1}\boldsymbol{\mu_0}+n\Sigma^{-1}\boldsymbol{\bar{y}}) \end{align}
$$
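
To make the update concrete, here is a minimal R sketch of one draw from this full conditional (my own illustration, not the book's code; Y is an assumed n x p data matrix, mu0 and Lambda0 are the prior parameters, and inv <- solve as defined earlier):

# One draw from mu | y1,...,yn, Sigma (sketch; Y, mu0, Lambda0, Sigma assumed given)
update.mu <- function(Y, mu0, Lambda0, Sigma){
  n    <- nrow(Y); ybar <- colMeans(Y)
  Ln   <- inv(inv(Lambda0) + n * inv(Sigma))                       # Lambda_n
  mun  <- Ln %*% (inv(Lambda0) %*% mu0 + n * inv(Sigma) %*% ybar)  # mu_n
  mvrnorm(1, mu = mun, Sigma = Ln)                                 # posterior draw
}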

3. Semiconjugate prior distribution for the covariance matrix (known mean)

์ด๋Š” ์ƒ๋Œ€์ ์œผ๋กœ ๋ณต์žกํ•œ๋ฐ, ๊ทธ ์ด์œ ๋Š” ์ด์ „๊ณผ ๋‹ฌ๋ฆฌ matrix ํ˜•ํƒœ์—๋‹ค๊ฐ€ prior์„ ์ฃผ์–ด์•ผํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.
๊ทธ๋ž˜์„œ ๊ตฌ์ฒด์ ์ธ prior์™€ likelihood๋ฅผ ์ด์•ผ๊ธฐํ•˜๊ธฐ ์•ž์„œ์„œ ํ•„์š”ํ•œ ๋‘ ๊ฐ€์ง€ ๊ฐœ๋…์— ๋Œ€ํ•ด์„œ ์งš๊ณ  ๋„˜์–ด๊ฐ€๋„๋ก ํ•˜์ž.
์ฒซ ๋ฒˆ์งธ๋Š” inverse-Wishart ๋ถ„ํฌ์ด๋ฉฐ, ๋‹ค์Œ์€ Positive Definite์ด๋ผ๋Š” ์„ ํ˜•๋Œ€์ˆ˜ ๊ฐœ๋…์ด๋‹ค.

3-1. inverse-Wishart Distribution

inverse-Wishart Distribution์€ ๊ณต๋ถ„์‚ฐ ํ–‰๋ ฌ์˜ semiconjugate prior์„ ์ฃผ๊ธฐ ์œ„ํ•ด์„œ ์•Œ์•„์•ผ ํ•˜๋Š” ํ•จ์ˆ˜์ด๋‹ค. ๋‚ฏ์„  ํ™•๋ฅ ๋ถ„ํฌ์ฒ˜๋Ÿผ ๋ณด์ด๊ธฐ๋„ ํ•˜์ง€๋งŒ, ์ž์„ธํžˆ ์‚ดํŽด๋ณด๋ฉด ์ด๋Š” inverse-Gamma distribution์˜ ๋‹ค์ฐจ์› ํ™•์žฅ ๋ฒ„์ „์— ๋ถˆ๊ณผํ•˜๊ธด ํ•˜๋‹ค.

3-2. Positive Definite

Covariance Matrix๋Š” Positive Definite Matrix์ด์–ด์•ผ ํ•œ๋‹ค. Positive Definite์˜ ์ •์˜๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

$$\boldsymbol{x}^T\Sigma\boldsymbol{x} > 0 \ \text{ for all nonzero vectors} \ \boldsymbol{x}$$

univariate case์—์„œ ๋ถ„์‚ฐ์ด ์–ธ์ œ๋‚˜ 0 ์ด์ƒ์ด์–ด์•ผ ํ•˜๋Š” ๊ฒƒ์ฒ˜๋Ÿผ, ์ด์™€ ๊ฐ™์€ ๋งฅ๋ฝ์˜ ์กฐ๊ฑด์„ ๋‹ค์ฐจ์›์—์„œ ๋งŒ์กฑํ•˜๋ ค๋ฉด PD(Positive Definite)์ด์–ด์•ผ ํ•œ๋‹ค.
๋งŒ์•ฝ ๊ณต๋ถ„์‚ฐ ํ–‰๋ ฌ์ด Positive Definiteํ•˜๋‹ค๋ฉด, ์ด๋Š” ๋ชจ๋“  ๋ถ„์‚ฐ์ด 0๋ณด๋‹ค ํฌ๋ฉฐ ๊ณต๋ถ„์‚ฐ์ด -1๊ณผ 1 ์‚ฌ์ด์— ์žˆ๋„๋ก ํ•œ๋‹ค.
๋˜ํ•œ, PD๋Š” ๋Œ€์นญํ–‰๋ ฌ(symmetric)์—์„œ ์ •์˜๋˜๋Š” ๊ฐœ๋…์ด๊ธฐ ๋•Œ๋ฌธ์—, $\sigma_{i,j} = \sigma_{j,i}$๋ผ๋Š” ์กฐ๊ฑด๋„ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ์„ฑ๋ฆฝํ•œ๋‹ค.

3-2-1. When is a matrix Positive Definite?

๋‹ค์‹œ ํ•œ๋ฒˆ ๋งํ•˜์ž๋ฉด, Positive Definite์€ ๋Œ€์นญํ–‰๋ ฌ์˜ ํŠน์ˆ˜ํ•œ ํ˜•ํƒœ์ด๋ฉฐ, ๋ชจ๋“  eigenvalue๋“ค์ด 0๋ณด๋‹ค ํฌ๋‹ค๋Š” ๋ง๊ณผ ๊ฐ™๋‹ค.
์—ฌ๊ธฐ์„œ eigenvalue๊ฐ€ 0๋ณด๋‹ค ํฌ๋‹ค๋Š” ๊ฒƒ์€ ์ •ํ™•ํžˆ ๋ฌด์Šจ ์˜๋ฏธ์ผ๊นŒ?

์ •๋ฐฉํ–‰๋ ฌ์„ Spectral Decomposition์„ ํ–ˆ์„ ๋•Œ, $A = VDV^{-1}$

$$A_{p\text{ x }p} = \begin{bmatrix} | & & |\\ a_1 & \cdots & a_p\\ | & & |\\ \end{bmatrix} = \begin{bmatrix} | & & |\\ v_1 & \cdots & v_p\\ | & & | \\ \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_p \end{bmatrix} \begin{bmatrix} | & & |\\ v_1 & \cdots & v_p\\ | & & |\\ \end{bmatrix}^{-1}$$

eigenvalue๊ฐ€ 0๋ณด๋‹ค ํฌ๋‹ค๋Š” ๊ฒƒ์€, ์„ ํ˜•๋ณ€ํ™˜์„ ํ–ˆ์„ ๋•Œ ๊ทธ ๊ธฐ์ €์˜ ๋ฐฉํ–ฅ์ด ๋ฐ˜๋Œ€๋กœ ๋ฐ”๋€Œ์ง€๋Š” ์•Š๋Š”๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•œ๋‹ค.

3-3. Building a Random Covariance Matrix

์ด๋Š” covariance matrix์— ๋Œ€ํ•ด uninformative prior๋ฅผ ์ฃผ๊ธฐ ์œ„ํ•จ์ด๋‹ค.

$$\frac{1}{n}\sum_{i=1}^n\boldsymbol{z_iz_i^T} = \frac{1}{n}Z^TZ$$

$$\boldsymbol{z_iz_i^T} = \begin{pmatrix} z_{i,1}^2 & z_{i,1}z_{i,2} & \cdots & z_{i,1}z_{i,p} \\ z_{i,2}z_{i,1} & z_{i,2}^2 & \cdots & z_{i,2}z_{i,p} \\ \vdots & & & \vdots \\ z_{i,p}z_{i,1} & z_{i,p}z_{i,2} & \cdots & z_{i,p}^2 \end{pmatrix}$$

\begin{align} \frac{1}{n}\big[Z^TZ\big]_{j,j} &= \frac{1}{n}\sum_{i=1}^{n}z_{i,j}^2 = s_{j,j} = s_j^2\\ \frac{1}{n}\big[Z^TZ\big]_{j,k} &= \frac{1}{n}\sum_{i=1}^{n}z_{i,j}z_{i,k} = s_{j,k} \end{align}

์—ฌ๊ธฐ์„œ n > p ์ด๊ณ , ๋ชจ๋“  $\boldsymbol{z_i}$๋“ค์ด ์„œ๋กœ ์„ ํ˜•๋…๋ฆฝ์ด๋ผ๋ฉด, $Z^TZ$๋Š” ํ•ญ์ƒ positive definite์ผ ๊ฒƒ์ด๋‹ค.
$$\text{Proof) } \boldsymbol{x}^{T}Z^{T}Z\boldsymbol{x} = (Z\boldsymbol{x})^{T}(Z\boldsymbol{x}) = ||Z\boldsymbol{x}||^2 \ge 0$$

STEP 1. Set $\nu_0$ (prior sample size) and $\Phi_0$ (prior covariance matrix).
STEP 2. Sample $\boldsymbol{z_i} \ \stackrel{iid}{\sim} \ MVN(\boldsymbol{0}, \Phi_0)$, $i = 1, ..., \nu_0$.
STEP 3. Calculate $Z^TZ = \sum_{i=1}^{\nu_0}\boldsymbol{z_iz_i^T}$.
STEP 4. Repeat the procedure S times, generating $Z_i^TZ_i$ each time (an R sketch follows below).
${Z_1^TZ_1, Z_2^TZ_2, ..., Z_S^TZ_S} \text{ ~ } Wis(\nu_0, \Phi_0)$
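
These steps translate directly into R (a sketch of my own; nu0 and Phi0 are illustrative values); averaging many draws should recover $E[Z^TZ] = \nu_0\Phi_0$:

# Wishart draws built by hand as sums of outer products (sketch)
rwish.byhand <- function(nu0, Phi0){
  Z <- mvrnorm(n = nu0, mu = rep(0, nrow(Phi0)), Sigma = Phi0)  # nu0 x p
  t(Z) %*% Z                                                    # sum_i z_i z_i^T
}
nu0 <- 10; Phi0 <- matrix(c(1, 0.5, 0.5, 1), ncol = 2)          # illustrative values
draws <- replicate(5000, rwish.byhand(nu0, Phi0))
apply(draws, c(1, 2), mean)                                     # approx nu0 * Phi0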

Wishart๋ถ„ํฌ์™€ inv-Wishart๋ถ„ํฌ ํŠน์ง•

$$\Sigma^{-1} \sim Wis(\nu_0, S_0^{-1}) \rightarrow E[\Sigma^{-1}]=\nu_0S_0^{-1} \\ \Sigma \sim Wis^{-1}(\nu_0, S_0^{-1}) \rightarrow E[\Sigma]=\frac{1}{\nu_0-p-1}S_0$$

How to set the parameters of the semiconjugate prior for the covariance matrix:

1-1. If the belief that $\Sigma = \Sigma_0$ is strong, increase $\nu_0$.
1-2. If the belief that $\Sigma = \Sigma_0$ is weak, set $\nu_0 = p+2$.
2. Set $S_0=(\nu_0-p-1)\Sigma_0$.
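
For instance (using the values from the analysis below), with $p=2$ and the weak choice $\nu_0 = p+2 = 4$, the scale matrix reduces to $\Sigma_0$ itself:

# Weak prior centered at Sigma0: nu0 = p + 2 makes S0 = Sigma0
p <- 2; nu0 <- p + 2
Sigma0 <- matrix(c(625, 312.5, 312.5, 625), ncol = 2)
S0 <- (nu0 - p - 1) * Sigma0   # equals Sigma0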

3-4. Full conditional distribution of Covariance Matrix

Prior: $\Sigma \sim \text{inv-}Wis(\nu_0, S_0^{-1})$

$$p(\Sigma) = \bigg[2^{\nu_0p/2}\pi^{p(p-1)/4}|S_0|^{-\nu_0/2}\prod_{j=1}^{p}\Gamma\Big(\frac{\nu_0+1-j}{2}\Big)\bigg]^{-1} \times |\Sigma|^{-(\nu_0+p+1)/2} \times exp\Big(-\frac{1}{2}tr(S_0\Sigma^{-1})\Big) $$

Likelihood: $Y_1, ..., Y_n|\boldsymbol{\mu}, \Sigma \stackrel{iid}\sim MVN(\boldsymbol{\mu}, \Sigma)$

\begin{align} p(\boldsymbol{y_1, ..., y_n}|\boldsymbol{\mu}, \Sigma) &= (2\pi)^{-np/2}|\Sigma|^{-n/2} exp\bigg(-\frac{1}{2}\sum_{i=1}^{n} \boldsymbol{(y_i-\mu)}^T\Sigma^{-1}\boldsymbol{(y_i-\mu)} \bigg) \\ &\propto |\Sigma|^{-n/2}exp\bigg(-\frac{1}{2}tr(S_\mu\Sigma^{-1})\bigg) \end{align}

where $S_\mu = \sum_{i=1}^{n}\boldsymbol{(y_i-\mu)}\boldsymbol{(y_i-\mu)}^T$

Posterior: $\Sigma|\boldsymbol{y_1, ..., y_n}, \boldsymbol{\mu} \sim \text{inv-}Wis(\nu_0+n, [S_0+S_\mu]^{-1})$

\begin{align} p(\Sigma|\boldsymbol{y_1, ..., y_n, \mu}) &\propto p(\Sigma)p(\boldsymbol{y_1, ..., y_n}|\boldsymbol{\mu},\Sigma) \\ &\propto|\Sigma|^{-(\nu_0+p+1)/2} \times exp\Big(-\frac{1}{2}tr(S_0\Sigma^{-1})\Big) \times |\Sigma|^{-n/2}exp\bigg(-\frac{1}{2}tr(S_\mu\Sigma^{-1})\bigg) \\ &\propto |\Sigma|^{-(\nu_0+p+n+1)/2}exp\bigg(-\frac{1}{2}tr([S_0+S_\mu]\Sigma^{-1})\bigg) \end{align}

\begin{align} E[\Sigma|\boldsymbol{y_1, ..., y_n, \mu}] &= \frac{1}{\nu_0+n-p-1}(S_0+S_\mu) \\ &= \frac{\nu_0-p-1}{\nu_0+n-p-1}\cdot\frac{1}{\nu_0-p-1}S_0 + \frac{n}{\nu_0+n-p-1}\cdot\frac{1}{n}S_\mu \\ &= \frac{\nu_0-p-1}{\nu_0+n-p-1}\cdot\Sigma_0 + \frac{n}{\nu_0+n-p-1}\cdot\frac{1}{n}S_\mu \end{align}

Summary

  1. Semiconjugate prior for $\mu$
    Prior: $\boldsymbol{\mu} \text{ ~ } MVN(\boldsymbol{\mu_0}, \Lambda_0)$
    Likelihood: $Y_1, ..., Y_n|\boldsymbol{\mu},\Sigma \text{ ~ iid } MVN(\boldsymbol{\mu}, \Sigma)$
    Posterior: $\boldsymbol{\mu}|\boldsymbol{y_1}, ..., \boldsymbol{y_n}, \Sigma \text{ ~ } MVN(\boldsymbol{\mu_n}, \Lambda_n)$

\begin{align} \Lambda_n^{-1} &= \Lambda_0^{-1}+n\Sigma^{-1} \\ \boldsymbol{\mu_n} &= (\Lambda_0^{-1}+n\Sigma^{-1})^{-1}(\Lambda_0^{-1}\boldsymbol{\mu_0}+n\Sigma^{-1}\boldsymbol{\bar{y}}) \end{align}

  2. Semiconjugate prior for $\Sigma$
    Prior: $\Sigma \sim \text{inv-}Wis(\nu_0, S_0^{-1}) \text{ where } S_0 = (\nu_0-p-1)\Sigma_0$
    Likelihood: $Y_1, ..., Y_n|\boldsymbol{\mu},\Sigma \stackrel{iid}\sim MVN(\boldsymbol{\mu}, \Sigma)$
    Posterior: $\Sigma|\boldsymbol{y_1, ..., y_n}, \boldsymbol{\mu} \sim \text{inv-}Wis(\nu_0+n, [S_0+S_\mu]^{-1})$

$$E[\Sigma|\boldsymbol{y_1, ..., y_n, \mu}] = \frac{\nu_0-p-1}{\nu_0+n-p-1}\cdot\Sigma_0 + \frac{n}{\nu_0+n-p-1}\cdot\frac{1}{n}S_\mu$$

4-2. Draw yourself Figure 7.2

# Load Data
test <- matrix(c(59, 43, 34, 32, 42, 38, 55, 67, 64, 45, 49, 72, 34, 
                 70, 34, 50, 41, 52, 60, 34, 28, 35, 77, 39, 46, 26, 38, 43, 68, 
                 86, 77, 60, 50, 59, 38, 48, 55, 58, 54, 60, 75, 47, 48, 33), ncol=2, byrow=FALSE)
colnames(test) <- c('pretest','posttest')

# Preparing
n <- nrow(test)
ybar <- colMeans(test)
Sigma <- cov(test)
THETA <- NULL
SIGMA <- NULL
inv <- solve
sample.size = 5000
sample.new = NULL

# prior
mu0 <- c(50,50); nu0 <- 4 #(nu0 = p+2 = 4) 
S0 <- L0 <- matrix(c(625,312.5,312.5,625), nrow=2, ncol=2)

set.seed(2021)
for(i in 1:sample.size){
  # update theta
  Ln = inv(inv(L0) + n*inv(Sigma))
  mun = Ln %*% (inv(L0)%*%mu0 + n*inv(Sigma)%*%ybar)
  theta = mvrnorm(1, mun, Ln)
  
  # update sigma
  Sn = S0 + (t(test)-theta)%*%t(t(test)-theta)
  Sigma = inv(rWishart(1, nu0+n, inv(Sn))[,,1])
  
  # Save results
  THETA <- rbind(THETA, theta)
  SIGMA <- rbind(SIGMA, c(Sigma))
  
  # sample new
  sample.new = rbind(sample.new, mvrnorm(n=1, mu=theta, Sigma=Sigma))
}
rownames(THETA) <- 1:sample.size
rownames(SIGMA) <- 1:sample.size
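
A few quick posterior summaries of the chains (my own addition, not part of the book's figure):

# Posterior summaries (added check)
colMeans(THETA)                                    # posterior means of mu
apply(THETA, 2, quantile, probs = c(.025, .975))   # 95% credible intervals
mean(THETA[, 2] > THETA[, 1])                      # Pr(mu_posttest > mu_pretest | data)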
# graph(์ฝ”๋“œ ๋”ฐ๋ผํ•˜๊ธฐ)
par(mfrow=c(1,2),mgp=c(1.75,.75,0),mar=c(3,3,1,1))

plot.hdr2d(THETA,xlab=expression(theta[1]),ylab=expression(theta[2]) )
abline(0,1)

plot.hdr2d(sample.new,xlab=expression(italic(y[1])),ylab=expression(italic(y[2])), 
           xlim=c(0,100),ylim=c(0,100) )
points(test[,1],test[,2],pch=16,cex=.7)
abline(0,1)

# graph (using ggplot)
p1 <- data.frame(THETA) %>% 
  ggplot(aes(x=pretest, y=posttest)) +
  geom_point(size=1, color='orange') +
  geom_abline(slope=1, intercept=0) +
  xlab(expression(theta[1])) + ylab(expression(theta[2])) +
  ggtitle('Posterior draws of Mu')

p2 <- data.frame(sample.new) %>% 
  ggplot(aes(x=pretest, y=posttest)) +
  geom_point(size=1, color='orange') +
  geom_abline(slope=1, intercept=0) +
  xlab(expression(y[1])) + ylab(expression(y[2])) +
  ggtitle('Posterior Predictive')

grid.arrange(p1, p2, nrow=1)

4. Gibbs Sampling of the mean and covariance

Gibbs sampling์€ full conditional distribution์„ ํ†ตํ•ด ์ฐจ๋ก€๋Œ€๋กœ ๋ชจ์ˆ˜๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋ฉด์„œ joint posterior distribution์„ ๊ตฌํ•˜๋Š” ๊ฒƒ์ด ๋ชฉ์ ์ด๋‹ค.

STEP 1. Obtain the full conditional distributions.

  • $\boldsymbol{\mu}|\boldsymbol{y_1}, ..., \boldsymbol{y_n}, \Sigma \sim MVN(\boldsymbol{\mu_n}, \Lambda_n)$
  • $\Sigma|\boldsymbol{y_1, ..., y_n}, \boldsymbol{\mu} \sim \text{inv-}Wis(\nu_0+n, [S_0+S_\mu]^{-1})$

STEP2. ์ฐจ๋ก€๋Œ€๋กœ ์—…๋ฐ์ดํŠธํ•˜๋ฉด์„œ joint posterior distribution $\boldsymbol{\mu},\Sigma|\boldsymbol{y_1}, ..., \boldsymbol{y_n}$์„ ๊ตฌํ•œ๋‹ค.

4-1. NA imputation

When data are MAR (missing at random), the missing values can be treated as additional unknowns and imputed within the Gibbs sampler, as sketched below.
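
A minimal sketch of one such imputation step (a hypothetical helper of my own, not the book's full code): given the current $(\boldsymbol{\mu}, \Sigma)$, the missing entries of a row are drawn from their conditional MVN given the observed entries.

# Impute the missing entries of one observation y (sketch)
impute.row <- function(y, mu, Sigma){
  miss <- is.na(y); obs <- !miss
  if(!any(miss)) return(y)
  S.oo.inv <- solve(Sigma[obs, obs, drop = FALSE])
  cond.mu <- mu[miss] + Sigma[miss, obs, drop = FALSE] %*% S.oo.inv %*% (y[obs] - mu[obs])
  cond.S  <- Sigma[miss, miss, drop = FALSE] -
    Sigma[miss, obs, drop = FALSE] %*% S.oo.inv %*% Sigma[obs, miss, drop = FALSE]
  y[miss] <- mvrnorm(1, mu = cond.mu, Sigma = cond.S)
  y
}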

Conclusion

MVN๋„ ์ž˜ ์•Œ์•„๋‘์ž. inv-Wishart ๋ถ„ํฌ๋„!




ํ˜น์‹œ ๊ถ๊ธˆํ•œ ์ ์ด๋‚˜ ์ž˜๋ชป๋œ ๋‚ด์šฉ์ด ์žˆ๋‹ค๋ฉด, ๋Œ“๊ธ€๋กœ ์•Œ๋ ค์ฃผ์‹œ๋ฉด ์ ๊ทน ๋ฐ˜์˜ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.