9.4.3 First Order Variance Reduction

This section describes a technique for reducing the variance of first-order gradient estimates. It exploits a property of the magicbox operator.

$$
\mathcal{L}_\square = \sum_{c\in\mathcal{C}} \square(\mathcal{W}_c)\,c + \sum_{w\in\mathcal{S}} \bigl(1-\square(\{w\})\bigr)\,b_w
$$

๋’ค์˜ term์˜ bwb_w๋Š” ์–ด๋Š stochastic nodes์—๋„ ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š๋Š” ํ‰๊ฐ€ ์ง€ํ‘œ๊ฐ€ ๋˜๋Š” ํ•จ์ˆ˜๋กœ, ์ฃผ๋กœ Lโ–ก\mathcal{L}_\square์˜ ํ‰๊ท ์ •๋„๋กœ ์ •ํ•ด์ง‘๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋ฉด 1โˆ’โ–ก(โ‹…)โ†’01-\square(\cdot) \rightarrow 0 ์ด ๋˜๋ฏ€๋กœ bias์— ์˜ํ–ฅ์„ ๋ผ์น˜์ง€ ์•Š๊ฒŒ๋˜๋ฉด์„œ variance๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋Š” ์ข‹์€ ํ…Œํฌ๋‹‰์ด ๋ฉ๋‹ˆ๋‹ค.
