Monte Carlo Method

Chapter 04.
Monte Carlo Approximation

๋ณธ ํฌ์ŠคํŒ…์€ First Course in Bayesian Statistical Methods๋ฅผ ์ฐธ๊ณ ํ•˜์˜€๋‹ค.

Monte Carlo Method

Monte Carlo Method๋Š” ์ด๋ฆ„์€ ๊ฑฐ์ฐฝํ•ด๋ณด์ด์ง€๋งŒ ์‚ฌ์‹ค ๊ทธ ๋ฐฉ๋ฒ•์€ ๋งค์šฐ ๊ฐ„๋‹จํ•˜๋‹ค.
์šฐ์„ , ์‚ฌํ›„๋ถ„ํฌ($p(\theta|y_1,...,y_n)$)๋กœ๋ถ€ํ„ฐ S๊ฐœ์˜ random sample์„ ๋ฝ‘๋Š”๋‹ค.
$$\theta^{1}, …, \theta^{S} \ \stackrel{iid}{\sim} \ p(\theta|y_1, …, y_n) $$
๊ทธ๋Ÿฌ๋ฉด S๊ฐ€ ์ปค์งˆ์ˆ˜๋ก {$\theta^{1}, ..., \theta^{S}$}๋Š” ๊ทผ์‚ฌ์ ์œผ๋กœ ์‚ฌํ›„๋ถ„ํฌ($p(\theta|y_1,...,y_n)$)๋ฅผ ๋”ฐ๋ฅธ๋‹ค.
์ด๋ฅผ ํ†ตํ•ด $E[\theta|y_1, ..., y_n]$, $Var[\theta|y_1, ..., y_n]$๋ถ€ํ„ฐ ์ค‘์•™๊ฐ’, $\alpha$ percentile ๋“ฑ์˜ ํ†ต๊ณ„๋Ÿ‰๊ฐ’๋“ค์„ ๊ทผ์‚ฌ์ ์œผ๋กœ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๋‹ค.

์ด๋•Œ approximate Monte Carlo Standard error์€ $\sqrt{\hat{\sigma}^2/S}$์ด๋‹ค. ๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์—, ๊ฐ€๋ น $E[\theta|y_1, ..., y_n]$์™€ Monte Carlo ์ถ”์ •์น˜์˜ ์ฐจ์ด๊ฐ€ 0.01์ดํ•˜๋กœ ํ•˜๊ณ  ์‹ถ๋‹ค๊ณ  ํ•œ๋‹ค๋ฉด, Monte Carlo Sample Size๋ฅผ ์กฐ์ •ํ•ด์ฃผ๋ฉด ๋œ๋‹ค. ์ด๋•Œ ์˜ˆ๋ฅผ ๋“ค์–ด์„œ $\hat{\sigma}^2$๊ฐ€ 0.024๋ผ๊ณ  ํ•œ๋‹ค๋ฉด, Sample Size๋Š” $2\sqrt{0.024/S} < 0.01$๋กœ ๊ณ„์‚ฐํ•ด์„œ sample์„ 960๊ฐœ๋ณด๋‹ค๋Š” ๋งŽ์ด ๋ฝ‘์•„์•ผ ํ•จ์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค.

Monte Carlo Method๋ฅผ ํ™œ์šฉํ•˜๋ฉด ๋‹ค์–‘ํ•œ ๊ฒƒ๋“ค์„ ํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ, ๊ทธ ๋Œ€ํ‘œ์ ์ธ ์˜ˆ์‹œ๋กœ ์•„๋ž˜ ์„ธ ๊ฐœ๋ฅผ ์ดํ•ดํ•ด๋ณด์ž.

1. Posterior Inference for Arbitrary Functions

$\theta$ ๊ทธ ์ž์ฒด๊ฐ€ ์•„๋‹ˆ๋ผ ์ž„์˜์˜ $f(\theta)$์˜ posterior distribution์ด ๊ถ๊ธˆํ•  ์ˆ˜ ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด์„œ, log odds์™€ ๊ฐ™์€ ๊ฒƒ ๋ง์ด๋‹ค. ํ•˜์ง€๋งŒ ๊ฒฐ๊ตญ $\gamma = f(\theta)$๋„ $\theta$์ฒ˜๋Ÿผ Monte Carlo Method๋กœ ์‚ฌํ›„๋ถ„ํฌ์„ ์ถ”์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

ํ•˜๋‚˜์˜ parameter๋งŒ ์žˆ๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, ์‹ฌ์ง€์–ด $Pr(\theta_1 > \theta_2 | Y_{1,1} = y_{1,1}, ..., Y_{n_2,2}=y_{n_2,2})$๋‚˜ $Pr(\theta_1/\theta_2 | Y_{1,1} = y_{1,1}, ..., Y_{n_2,2}=y_{n_2,2})$์ฒ˜๋Ÿผ parameter๊ฐ€ ๋‘ ๊ฐœ์ธ ๊ฒฝ์šฐ๋„ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.

2. Sampling from Predictive Distributions

Step1. sample $\theta^{(1)},...,\theta^{(S)} \text{ ~ i.i.d} \ p(\theta|y_1,...,y_n)$
Step2. approximate $p(\tilde{y}|y_1,...,.y_n)$ with $\sum_{s=1}^{S}p(\tilde{y}|\theta^{(s)})/S$

์œ„ ๋ฐฉ๋ฒ•์„ ํ†ตํ•ด $Pr(\tilde{Y_1}>\tilde{Y_2}|\sum Y_{i,1}=217,\sum Y_{i,2}=66)$๋ฅผ ๊ทผ์‚ฌํ•˜๋Š” ๊ฒƒ๋„ ๊ฐ€๋Šฅํ•˜๋‹ค. ์™œ๋ƒํ•˜๋ฉด $Pr(\tilde{Y_1})$์™€ $Pr(\tilde{Y_2})$์€ posterior independentํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
set.seed(1)
a<-2 ; b<-1
sy1<-217 ;  n1<-111
sy2<-66  ;  n2<-44

theta1.mc<-rgamma(10000,a+sy1, b+n1)
theta2.mc<-rgamma(10000,a+sy2, b+n2)

y1.mc<-rpois(10000,theta1.mc)
y2.mc<-rpois(10000,theta2.mc)

mean(theta1.mc>theta2.mc)
## [1] 0.9708
1
mean(y1.mc>y2.mc)
## [1] 0.4846

์œ„ ๊ฒฐ๊ณผ๋ฅผ ํ†ตํ•ด์„œ ์ฃผ์˜๊นŠ๊ฒŒ ์‚ดํŽด๋ณด์•„์•ผ ํ•  ๊ฒƒ์€, ์˜ˆ์ธก์น˜์˜ ์ฐจ์ด์™€ ๋ชจ์ˆ˜์˜ ์ฐจ์ด๊ฐ€ ๊ฐ™์ง€ ์•Š๋‹ค๋Š” ์ ์ด๋‹ค.

์•„๋ž˜ ์„ธ ๊ฐœ ๊ตฌ๋ถ„ํ•˜๊ธฐ
  • (1) sampling model: $Pr(\tilde{Y}=\tilde{y}|\theta)$
  • (2) prior predictive model: $Pr(\tilde{Y}=\tilde{y})$
    • $\theta$์— ๋Œ€ํ•œ ์‚ฌ์ „ํ™•๋ฅ ๋ถ„ํฌ๊ฐ€ ๊ด€์ธก๊ฐ€๋Šฅํ•œ ๋ฐ์ดํ„ฐ $\tilde{Y}$์— ๋Œ€ํ•ด ํ•ฉ๋ฆฌ์ ์ธ ๋ฏฟ์Œ์„ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•ด๋ณด๋Š” ์šฉ๋„๋กœ ํ™œ์šฉ๊ฐ€๋Šฅํ•˜๋‹ค.
  • (3) posterior predictive model: $Pr(\tilde{Y}=\tilde{y}|Y_1=y_1, ..., Y_n=y_n)$

3. Posterior Predictive Model Checking

We should at least make sure that our model generates predictive datasets $\tilde{Y}$ that resemble the observed dataset in terms of features that are of interest

1
2
3
load("gss.RData")

table(gss$DEG[gss$YEAR==1998])
## 
##    0    1    2    3    4 
##  430 1500  209  478  205
1
2
3
4
5
6
y1<-gss$PRAYER[gss$YEAR==1998 & gss$RELIG==1 ]
y1<-1*(y1==1)
y1<-y1[!is.na(y1) ]
sy1<-sum(y1)
n1<-length(y1)
sy1/n1
## [1] 0.3616803
1
2
3
4
5
6
y2<-gss$PRAYER[gss$YEAR==1998 & gss$RELIG!=1 ]
y2<-1*(y2==1)
y2<-y2[!is.na(y2) ]
sy2<-sum(y2)
n2<-length(y2)
sy2/n2
## [1] 0.5471464
1
table(gss$FEMALE[gss$YEAR==1998])
## 
##    0    1 
## 1232 1600
1
2
3
4
5
6
y<-gss$FEMALE[gss$YEAR==1998]
y<-1*(y==1)
y<-y[!is.na(y) ]
sy<-sum(y)
n<-length(y)
sy/n
## [1] 0.5649718
1
2
3
sy<-sy2
n<-n2
sy/n
## [1] 0.5471464
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
#### MC approximations
set.seed(1)

a<-1 ; b<-1 
theta.prior.sim<-rbeta(10000,a,b)
gamma.prior.sim<- log( theta.prior.sim/(1-theta.prior.sim) )

n0<-860-441 ; n1<-441
theta.post.sim<-rbeta(10000,a+n1,b+n0)
gamma.post.sim<- log( theta.post.sim/(1-theta.post.sim) )

par(mar=c(3,3,1,1),mgp=c(1.75,.75,0))
par(mfrow=c(2,3))
par(cex=.8)

par(mfrow=c(1,2),mar=c(3,3,1,1), mgp=c(1.75,.75,.0))
plot(density(gamma.prior.sim,adj=2),xlim=c(-5,5),main="", 
     xlab=expression(gamma),
     ylab=expression(italic(p(gamma))),col="gray")
plot(density(gamma.post.sim,adj=2),xlim=c(-5,5),main="",xlab=expression(gamma),
     ylab=expression(paste(italic("p("),gamma,"|",y[1],"...",y[n],")",
                           sep="")) )
lines(density(gamma.prior.sim,adj=2),col="gray")

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
set.seed(1)
a<-2 ; b<-1
sy1<-217 ;  n1<-111
sy2<-66  ;  n2<-44

theta1.mc<-rgamma(10000,a+sy1, b+n1)
theta2.mc<-rgamma(10000,a+sy2, b+n2)

y1.mc<-rpois(10000,theta1.mc)
y2.mc<-rpois(10000,theta2.mc)

#### Posterior predictive check 
y1<-gss$CHILDS[gss$FEMALE==1 &  gss$YEAR>=1990  & gss$AGE==40 & gss$DEG<3 ]
y1<-y1[!is.na(y1)]

set.seed(1)

a<-2 ; b<-1
t.mc<-NULL
for(s in 1:10000) {
  theta1<-rgamma(1,a+sum(y1), b+length(y1))
  y1.mc<-rpois(length(y1),theta1)
  t.mc<-c(t.mc,sum(y1.mc==2)/sum(y1.mc==1))
}

t.obs<-sum(y1==2)/sum(y1==1)
mean(t.mc>=t.obs)
## [1] 0.0049
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
par(mar=c(3,3,1,1),mgp=c(1.75,.75,0))
par(mfrow=c(1,2))

ecdf<-(table(c(y1,0:9))-1 )/sum(table(y1))
#ecdf.mc<-(table(c(y1.mc,0:9))-1 )/sum(table(y1.mc))
ecdf.mc<- dnbinom(0:9,size=a+sum(y1),mu=(a+sum(y1))/(b+length(y1)))
plot(0:9+.1,ecdf.mc,type="h",lwd=5,xlab="number of children",
     ylab=expression(paste("Pr(",italic(Y[i]==y[i]),")",sep="")),col="gray",
     ylim=c(0,.35))
points(0:9-.1, ecdf,lwd=5,col="black",type="h")

legend(1.8,.35,
       legend=c("empirical distribution","predictive distribution"),
       lwd=c(2,2),col=
         c("black","gray"),bty="n",cex=.8)


hist(t.mc,prob=T,main="",ylab="",xlab=expression(t(tilde(Y))) )

segments(t.obs,0,t.obs,.25,col="black",lwd=3)

์‹ค์ œ(empirical) ๋ถ„ํฌ์™€ ์˜ˆ์ธก ๋ถ„ํฌ์˜ ๋ชจ์Šต์ด ๋‹ค์†Œ ์ฐจ์ด๊ฐ€ ๋‚จ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

์ฐธ๊ณ 

[1] FCB code



ํ˜น์‹œ ๊ถ๊ธˆํ•œ ์ ์ด๋‚˜ ์ž˜๋ชป๋œ ๋‚ด์šฉ์ด ์žˆ๋‹ค๋ฉด, ๋Œ“๊ธ€๋กœ ์•Œ๋ ค์ฃผ์‹œ๋ฉด ์ ๊ทน ๋ฐ˜์˜ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.