Chapter 07.
The Multivariate Normal Model
This post is based on A First Course in Bayesian Statistical Methods.
1. Multivariate Normal Density
The previous chapters discussed univariate models at length, but in the real world multivariate data are far more common.
1-1. Bivariate Normal
1-2. Multivariate Normal Model
$$p(\boldsymbol{y}|\boldsymbol{\mu}, \Sigma) = (2\pi)^{-p/2}|\Sigma|^{-1/2}exp\Big(-\frac{1}{2}(\boldsymbol{y}-\boldsymbol{\mu})^T\Sigma^{-1}(\boldsymbol{y}-\boldsymbol{\mu}) \Big) $$
where
$$\boldsymbol{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}$$
$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}$$
$$\Sigma = \begin{pmatrix} \sigma^2_{1} & \sigma_{1,2} & \cdots & \sigma_{1,p} \\ \sigma_{2,1} & \sigma^2_{2} & \cdots & \sigma_{2,p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p,1} & \cdots & \cdots & \sigma^2_{p} \end{pmatrix}$$
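As a quick numerical check of the density formula, here is a minimal sketch that evaluates it directly and compares the result against scipy's built-in implementation (numpy/scipy assumed; the numbers are arbitrary illustrative values):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_density(y, mu, Sigma):
    """Evaluate the multivariate normal density formula above directly."""
    p = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff           # (y-mu)^T Sigma^{-1} (y-mu)
    const = (2 * np.pi) ** (-p / 2) * np.linalg.det(Sigma) ** (-1 / 2)
    return const * np.exp(-0.5 * quad)

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
y = np.array([0.3, 0.8])
print(mvn_density(y, mu, Sigma))
print(multivariate_normal.pdf(y, mean=mu, cov=Sigma))   # should agree
```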
2. Semiconjugate prior distribution for the mean (known covariance matrix)
Semiconjugate refers to a prior that is conjugate when the other of the two parameters is given.
Here we first derive the semiconjugate prior for the mean vector when the covariance matrix is given (the easier case), and then the semiconjugate prior for the covariance matrix.
Prior: $\boldsymbol{\mu} \text{ ~ } MVN(\boldsymbol{\mu_0}, \Lambda_0)$
\begin{align} p(\boldsymbol{\mu}) &= (2\pi)^{-p/2}|\Lambda_0|^{-1/2}exp\Big(-\frac{1}{2}(\boldsymbol{\mu}-\boldsymbol{\mu_0})^T\Lambda_0^{-1}(\boldsymbol{\mu}-\boldsymbol{\mu_0})\Big) \\ &\propto exp(-\frac{1}{2}\boldsymbol{\mu}^T\Lambda^{-1}_0\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\Lambda^{-1}_0\boldsymbol{\mu_0}) \\ &= exp(-\frac{1}{2}\boldsymbol{\mu}^TA_0\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_0}) \end{align}
where $A_0 = \Lambda^{-1}_0, \boldsymbol{b_0} = \Lambda^{-1}_0\boldsymbol{\mu_0}$
Likelihood: $Y_1, ..., Y_n|\boldsymbol{\mu},\Sigma \text{ ~ iid } MVN(\boldsymbol{\mu}, \Sigma)$
\begin{align} p(\boldsymbol{y_1}, ...,\boldsymbol{y_n}|\boldsymbol{\mu},\Sigma) &= \prod^{n}_{i=1} (2\pi)^{-p/2}|\Sigma|^{-1/2}exp\Big(-\frac{1}{2}(\boldsymbol{y_i}-\boldsymbol{\mu})^T\Sigma^{-1}(\boldsymbol{y_i}-\boldsymbol{\mu}) \Big) \\ &= (2\pi)^{-np/2}|\Sigma|^{-n/2}exp\Big(-\frac{1}{2}\sum_{i=1}^{n}(\boldsymbol{y_i}-\boldsymbol{\mu})^T\Sigma^{-1}(\boldsymbol{y_i}-\boldsymbol{\mu}) \Big) \\ &\propto exp(-\frac{1}{2}\boldsymbol{\mu}^TA_1\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_1}) \end{align}
where $A_1 = n\Sigma^{-1}, \boldsymbol{b_1} = n\Sigma^{-1}\boldsymbol{\bar{y}}$
Posterior: $\boldsymbol{\mu}|\boldsymbol{y_1}, ..., \boldsymbol{y_n}, \Sigma \text{ ~ } MVN(\boldsymbol{\mu_n}, \Lambda_n)$
\begin{align} p(\boldsymbol{\mu}|\boldsymbol{y_1}, ..., \boldsymbol{y_n}, \Sigma) &\propto exp(-\frac{1}{2}\boldsymbol{\mu}^TA_0\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_0}) \times exp(-\frac{1}{2}\boldsymbol{\mu}^TA_1\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_1}) \\ &= exp(-\frac{1}{2}\boldsymbol{\mu}^TA_n\boldsymbol{\mu} \ + \ \boldsymbol{\mu}^T\boldsymbol{b_n}) \\ \\ \text{where } A_n &= A_0 + A_1 = \Lambda_0^{-1}+n\Sigma^{-1} \\ \boldsymbol{b_n} &= \boldsymbol{b_0} + \boldsymbol{b_1} = \Lambda_0^{-1}\boldsymbol{\mu_0}+n\Sigma^{-1}\boldsymbol{\bar{y}} \\ \\ \rightarrow \Lambda_n^{-1} &= \Lambda_0^{-1}+n\Sigma^{-1} \\ \boldsymbol{\mu_n} &= (\Lambda_0^{-1}+n\Sigma^{-1})^{-1}(\Lambda_0^{-1}\boldsymbol{\mu_0}+n\Sigma^{-1}\boldsymbol{\bar{y}}) \end{align}
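The update above is just a handful of matrix operations. A minimal sketch, assuming numpy (the function and variable names are my own):

```python
import numpy as np

def mu_full_conditional(Y, mu0, Lambda0, Sigma):
    """Return (mu_n, Lambda_n) for the full conditional of mu given Sigma."""
    n = Y.shape[0]
    ybar = Y.mean(axis=0)
    L0inv = np.linalg.inv(Lambda0)
    Sinv = np.linalg.inv(Sigma)
    # Lambda_n^{-1} = Lambda_0^{-1} + n * Sigma^{-1}
    Lambda_n = np.linalg.inv(L0inv + n * Sinv)
    # mu_n is a precision-weighted average of the prior mean and the sample mean
    mu_n = Lambda_n @ (L0inv @ mu0 + n * Sinv @ ybar)
    return mu_n, Lambda_n
```

As $n$ grows the data precision $n\Sigma^{-1}$ dominates, so $\boldsymbol{\mu_n}$ shrinks toward $\boldsymbol{\bar{y}}$, exactly as in the univariate case.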
3. Semiconjugate prior distribution for the covariance matrix (known mean)
This case is comparatively more involved because, unlike before, the prior must now be placed on a matrix. So before writing down the concrete prior and likelihood, let's go over two concepts we will need. The first is the inverse-Wishart distribution, and the second is the linear-algebra notion of positive definiteness.
3-1. inverse-Wishart Distribution
The inverse-Wishart distribution is the tool we need in order to place a semiconjugate prior on the covariance matrix. It may look like an unfamiliar probability distribution, but on closer inspection it is simply the multivariate generalization of the inverse-gamma distribution.
3-2. Positive Definite
A covariance matrix must be a positive definite matrix. Positive definiteness is defined as follows:
$$\boldsymbol{x}^T\Sigma\boldsymbol{x} > 0 \ \text{ for all vectors } \boldsymbol{x} \neq \boldsymbol{0}$$
Just as a variance must always be nonnegative in the univariate case, positive definiteness (PD) is the condition that captures the same requirement in multiple dimensions.
If the covariance matrix is positive definite, then every variance is strictly positive and every correlation lies strictly between -1 and 1.
Moreover, since positive definiteness is defined for symmetric matrices, the condition $\sigma_{i,j} = \sigma_{j,i}$ holds automatically.
3-2-1. What does it take for a matrix to be positive definite?
To restate: a positive definite matrix is a special kind of symmetric matrix, one whose eigenvalues are all strictly greater than zero.
So what exactly does it mean for every eigenvalue to be greater than zero?
Taking the spectral decomposition of a symmetric square matrix gives $A = VDV^{-1}$ (with $V$ orthogonal, so $V^{-1} = V^T$):
$$A_{p\text{ x }p} = \begin{bmatrix} | & & |\\ a_1 & \cdots & a_p\\ | & & |\\ \end{bmatrix} = \begin{bmatrix} | & & |\\ v_1 & \cdots & v_p\\ | & & | \\ \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_p \end{bmatrix} \begin{bmatrix} | & & |\\ v_1 & \cdots & v_p\\ | & & |\\ \end{bmatrix}^{-1}$$
An eigenvalue being greater than zero means that the linear transformation does not flip the direction of the corresponding eigenvector: every basis direction is stretched, never reversed.
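Numerically, positive definiteness is easy to check via the eigenvalues of a symmetric matrix; a small sketch, assuming numpy:

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """True if A is symmetric with all eigenvalues strictly positive."""
    if not np.allclose(A, A.T):
        return False                          # PD is defined for symmetric matrices
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

print(is_positive_definite(np.array([[2.0, 0.5], [0.5, 1.0]])))  # True
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False: eigenvalues are 3 and -1
```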
3-3. Constructing a Random Covariance Matrix
The point of this construction is to build a prior distribution over covariance matrices. Let $\boldsymbol{z_1}, ..., \boldsymbol{z_n}$ be iid mean-zero $p$-dimensional vectors, stacked as the rows of an $n \times p$ matrix $Z$; their empirical covariance is
$$\frac{1}{n}\sum_{i=1}^n\boldsymbol{z_iz_i^T} = \frac{1}{n}Z^TZ$$
$$\boldsymbol{z_iz_i^T} = \begin{pmatrix} z_{i,1}^2 & z_{i,1}z_{i,2} & \cdots & z_{i,1}z_{i,p} \\ z_{i,2}z_{i,1} & z_{i,2}^2 & \cdots & z_{i,2}z_{i,p} \\ \vdots & & & \vdots \\ z_{i,p}z_{i,1} & z_{i,p}z_{i,2} & \cdots & z_{i,p}^2 \end{pmatrix}$$
\begin{align} \frac{1}{n}\big[Z^TZ\big]_{j,j} &= \frac{1}{n}\sum_{i=1}^{n}z_{i,j}^2 = s_{j,j} = s_j^2\\ \frac{1}{n}\big[Z^TZ\big]_{j,k} &= \frac{1}{n}\sum_{i=1}^{n}z_{i,j}z_{i,k} = s_{j,k} \end{align}
If $n > p$ and the $\boldsymbol{z_i}$ span $\mathbb{R}^p$ (equivalently, $Z$ has full column rank), then $Z^TZ$ is positive definite.
$$\text{Proof) } \boldsymbol{x}^{T}Z^{T}Z\boldsymbol{x} = (Z\boldsymbol{x})^{T}(Z\boldsymbol{x}) = ||Z\boldsymbol{x}||^2 \ge 0,$$
with equality only when $Z\boldsymbol{x} = \boldsymbol{0}$, which cannot happen for $\boldsymbol{x} \neq \boldsymbol{0}$ when $Z$ has full column rank.
The sampling procedure is as follows (a simulation sketch follows the steps):
STEP1. Set $\nu_0$ (prior sample size) and $\Phi_0$ (prior covariance matrix).
STEP2. Sample $\boldsymbol{z_i} \ \stackrel{iid}{\sim} \ MVN(\boldsymbol{0}, \Phi_0)$ for $i = 1, ..., \nu_0$.
STEP3. Calculate $Z^TZ = \sum_{i=1}^{\nu_0}\boldsymbol{z_iz_i^T}$.
STEP4. Repeat the procedure $S$ times, generating $Z_1^TZ_1, ..., Z_S^TZ_S$.
$$\{Z_1^TZ_1, Z_2^TZ_2, ..., Z_S^TZ_S\} \text{ ~ } Wis(\nu_0, \Phi_0)$$
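A minimal simulation of this procedure, assuming numpy ($\Phi_0$ and the sizes are arbitrary illustrative choices); since $E[Z^TZ] = \nu_0\Phi_0$, the Monte Carlo average of the draws should be close to $\nu_0\Phi_0$:

```python
import numpy as np

def sample_wishart(nu0, Phi0, rng):
    """One draw from Wishart(nu0, Phi0) via the sum-of-outer-products construction."""
    p = Phi0.shape[0]
    # rows of Z are z_i ~ iid MVN(0, Phi0), i = 1, ..., nu0
    Z = rng.multivariate_normal(np.zeros(p), Phi0, size=nu0)
    return Z.T @ Z                            # equals sum_i z_i z_i^T

rng = np.random.default_rng(0)
Phi0 = np.array([[1.0, 0.5],
                 [0.5, 2.0]])
draws = [sample_wishart(10, Phi0, rng) for _ in range(2000)]
print(np.mean(draws, axis=0))                 # should be close to 10 * Phi0
```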
Properties of the Wishart and inverse-Wishart distributions:
$$\Sigma^{-1} \sim Wis(\nu_0, S_0^{-1}) \rightarrow E[\Sigma^{-1}]=\nu_0S_0^{-1} \\ \Sigma \sim Wis^{-1}(\nu_0, S_0^{-1}) \rightarrow E[\Sigma]=\frac{1}{\nu_0-p-1}S_0$$
How to set the parameters of the semiconjugate prior for the covariance matrix:
1-1. If the belief that $\Sigma = \Sigma_0$ is strong, increase $\nu_0$.
1-2. If the belief that $\Sigma = \Sigma_0$ is weak, set $\nu_0 = p+2$.
2. Set $S_0=(\nu_0-p-1)\Sigma_0$, so that $E[\Sigma]=\frac{1}{\nu_0-p-1}S_0=\Sigma_0$ and the prior is centered at $\Sigma_0$.
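In code, this recipe is a one-liner; a tiny sketch where $\Sigma_0$ is a hypothetical prior guess:

```python
import numpy as np

p = 3
Sigma0 = np.eye(p)                 # hypothetical prior guess for Sigma
nu0 = p + 2                        # weak belief: the smallest nu0 with a finite prior mean
S0 = (nu0 - p - 1) * Sigma0        # centers the prior: E[Sigma] = S0 / (nu0 - p - 1) = Sigma0
```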
3-4. Full conditional distribution of Covariance Matrix
Prior: $\Sigma \sim \text{inv-}Wis(\nu_0, S_0^{-1})$
$$p(\Sigma) = \bigg[2^{\nu_0p/2}\pi^{p(p-1)/4}|S_0|^{-\nu_0/2}\prod_{j=1}^{p}\Gamma\Big(\frac{\nu_0+1-j}{2}\Big)\bigg]^{-1} \times |\Sigma|^{-(\nu_0+p+1)/2} \times exp\Big(-\frac{1}{2}tr(S_0\Sigma^{-1})\Big) $$
Likelihood: $Y_1, ..., Y_n|\boldsymbol{\mu},\Sigma \text{ ~ iid } MVN(\boldsymbol{\mu}, \Sigma)$
\begin{align} p(\boldsymbol{y_1, ..., y_n}|\boldsymbol{\mu}, \Sigma) &= (2\pi)^{-np/2}|\Sigma|^{-n/2} exp\bigg(-\frac{1}{2}\sum_{i=1}^{n} \boldsymbol{(y_i-\mu)}^T\Sigma^{-1}\boldsymbol{(y_i-\mu)} \bigg) \\ &\propto |\Sigma|^{-n/2}exp\bigg(-\frac{1}{2}tr(S_\mu\Sigma^{-1})\bigg) \end{align}
where $S_\mu = \sum_{i=1}^{n}\boldsymbol{(y_i-\mu)}\boldsymbol{(y_i-\mu)}^T$; the second line uses the identity $\boldsymbol{x}^TA\boldsymbol{x} = tr(A\boldsymbol{x}\boldsymbol{x}^T)$, which follows from the cyclic property of the trace.
Posterior: $\Sigma|\boldsymbol{y_1, ..., y_n}, \boldsymbol{\mu} \sim \text{inv-}Wis(\nu_0+n, [S_0+S_\mu]^{-1})$
\begin{align} p(\Sigma|\boldsymbol{y_1, ..., y_n, \mu}) &\propto p(\Sigma)p(\boldsymbol{y_1, ..., y_n}|\boldsymbol{\mu},\Sigma) \\ &\propto|\Sigma|^{-(\nu_0+p+1)/2} \times exp\Big(-\frac{1}{2}tr(S_0\Sigma^{-1})\Big) \times |\Sigma|^{-n/2}exp\bigg(-\frac{1}{2}tr(S_\mu\Sigma^{-1})\bigg) \\ &\propto |\Sigma|^{-(\nu_0+p+n+1)/2}exp\bigg(-\frac{1}{2}tr([S_0+S_\mu]\Sigma^{-1})\bigg) \end{align}
\begin{align} E[\Sigma|\boldsymbol{y_1, ..., y_n, \mu}] &= \frac{1}{\nu_0+n-p-1}(S_0+S_\mu) \\ &= \frac{\nu_0-p-1}{\nu_0+n-p-1}\cdot\frac{1}{\nu_0-p-1}S_0 + \frac{n}{\nu_0+n-p-1}\cdot\frac{1}{n}S_\mu \\ &= \frac{\nu_0-p-1}{\nu_0+n-p-1}\cdot\Sigma_0 + \frac{n}{\nu_0+n-p-1}\cdot\frac{1}{n}S_\mu \end{align}
Summary
- Semiconjugate prior for $\mu$
Prior: $\boldsymbol{\mu} \text{ ~ } MVN(\boldsymbol{\mu_0}, \Lambda_0)$
Likelihood: $Y_1, ..., Y_n|\boldsymbol{\mu},\Sigma \text{ ~ iid } MVN(\boldsymbol{\mu}, \Sigma)$
Posterior: $\boldsymbol{\mu}|\boldsymbol{y_1}, ..., \boldsymbol{y_n}, \Sigma \text{ ~ } MVN(\boldsymbol{\mu_n}, \Lambda_n)$
\begin{align} \Lambda_n^{-1} &= \Lambda_0^{-1}+n\Sigma^{-1} \\ \boldsymbol{\mu_n} &= (\Lambda_0^{-1}+n\Sigma^{-1})^{-1}(\Lambda_0^{-1}\boldsymbol{\mu_0}+n\Sigma^{-1}\boldsymbol{\bar{y}}) \end{align}
- Semiconjugate prior for $\Sigma$
Prior: $\Sigma \sim \text{inv-}Wis(\nu_0, S_0^{-1}) \text{ where } S_0 = (\nu_0-p-1)\Sigma_0$
Likelihood: $Y_1, ..., Y_n|\boldsymbol{\mu},\Sigma \text{ ~ iid } MVN(\boldsymbol{\mu}, \Sigma)$
Posterior: $\Sigma|\boldsymbol{y_1, ..., y_n}, \boldsymbol{\mu} \sim \text{inv-}Wis(\nu_0+n, [S_0+S_\mu]^{-1})$
$$E[\Sigma|\boldsymbol{y_1, ..., y_n, \mu}] = \frac{\nu_0-p-1}{\nu_0+n-p-1}\cdot\Sigma_0 + \frac{n}{\nu_0+n-p-1}\cdot\frac{1}{n}S_\mu$$
4. Gibbs Sampling of the mean and covariance
The goal of Gibbs sampling is to approximate the joint posterior distribution by updating each parameter in turn from its full conditional distribution.
STEP1. Obtain the full conditional distributions.
$$\boldsymbol{\mu}|\boldsymbol{y_1}, ..., \boldsymbol{y_n}, \Sigma \sim MVN(\boldsymbol{\mu_n}, \Lambda_n)$$
$$\Sigma|\boldsymbol{y_1, ..., y_n}, \boldsymbol{\mu} \sim \text{inv-}Wis(\nu_0+n, [S_0+S_\mu]^{-1})$$
STEP2. Update each parameter in turn from its full conditional, yielding draws from the joint posterior distribution $\boldsymbol{\mu},\Sigma|\boldsymbol{y_1}, ..., \boldsymbol{y_n}$; a sketch of the sampler follows below.
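Putting the two full conditionals together gives the sampler. A minimal sketch, assuming numpy and scipy (the function names are my own; scipy's `wishart` is used for the $\Sigma$ step, inverting a Wishart draw to get an inverse-Wishart one):

```python
import numpy as np
from scipy.stats import wishart

def gibbs_mvn(Y, mu0, Lambda0, nu0, S0, n_iter=5000, seed=0):
    """Gibbs sampler for (mu, Sigma) under the semiconjugate priors above."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    L0inv = np.linalg.inv(Lambda0)
    Sigma = np.cov(Y.T)                                   # starting value
    mus, Sigmas = [], []
    for _ in range(n_iter):
        # 1) mu | Sigma, y ~ MVN(mu_n, Lambda_n)
        Sinv = np.linalg.inv(Sigma)
        Lambda_n = np.linalg.inv(L0inv + n * Sinv)
        mu_n = Lambda_n @ (L0inv @ mu0 + n * Sinv @ ybar)
        mu = rng.multivariate_normal(mu_n, Lambda_n)
        # 2) Sigma | mu, y ~ inv-Wishart(nu0 + n, [S0 + S_mu]^{-1})
        R = Y - mu
        S_mu = R.T @ R                                    # sum_i (y_i - mu)(y_i - mu)^T
        W = wishart.rvs(df=nu0 + n, scale=np.linalg.inv(S0 + S_mu), random_state=rng)
        Sigma = np.linalg.inv(W)                          # invert a Wishart draw -> inv-Wishart
        mus.append(mu)
        Sigmas.append(Sigma)
    return np.array(mus), np.array(Sigmas)
```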
4-1. NA imputation
When data are MAR (Missing at Random), the missing entries can be treated as additional unknown quantities and imputed within the Gibbs sampler, adding one more update step per iteration.
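Concretely, partition each incomplete $\boldsymbol{y_i}$ into an observed block $o$ and a missing block $m$; the extra Gibbs step draws the missing block from the standard MVN conditional distribution (stated here for reference):
$$\boldsymbol{y}_{m}|\boldsymbol{y}_{o}, \boldsymbol{\mu}, \Sigma \sim MVN\Big(\boldsymbol{\mu}_{m} + \Sigma_{mo}\Sigma_{oo}^{-1}(\boldsymbol{y}_{o} - \boldsymbol{\mu}_{o}), \ \Sigma_{mm} - \Sigma_{mo}\Sigma_{oo}^{-1}\Sigma_{om}\Big)$$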
Conclusion
Keep the MVN model in mind, and the inverse-Wishart distribution too!
If anything is unclear or incorrect, please let me know in the comments and I will gladly update the post.


