9.2.2 Surrogate Losses

To obtain gradient estimates for the cost nodes, Schulman et al. proposed the following surrogate loss (SL):

$$
\mathrm{SL}(\Theta,\mathcal{S}) = \sum_{w\in\mathcal{S}}\log p(w\mid\mathrm{DEPS}_w)\,\hat{Q}_w + \sum_{c\in\mathcal{C}}c(\mathrm{DEPS}_c)
$$

$\mathrm{DEPS}_w$ is the set of stochastic nodes or input nodes that deterministically influence $w$.

$\hat{Q}_w$ is the sum of the sampled costs $\hat{c}$ that are influenced by $w$.

Differentiating SL once gives the following gradient estimator:

$$
\nabla_{\theta}\mathcal{L} = \mathbb{E}[\nabla_{\theta}\mathrm{SL}(\Theta,\mathcal{S})]
$$

Here $\hat{Q}_w$ is treated as a fixed sample inside SL (it does not depend on $\theta$), which severs the dependency it has in the original SCG. As a result, the first-order gradient of SL is the score function estimator and contains no term of the form $\log(p)\nabla_{\theta}Q$.
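The role of the fixed sample $\hat{Q}_w$ can be checked numerically on a toy graph. The sketch below is only an illustration under assumptions that are not in the original text: a single stochastic node $w \sim \mathrm{Bernoulli}(\sigma(\theta))$, a hypothetical cost `cost(w)`, and finite differences standing in for automatic differentiation. Because the cost has no direct dependence on $\theta$ here, the second sum in SL contributes no gradient and is left out.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cost(w):
    # Hypothetical cost node c(w); chosen only for illustration.
    return (w - 0.2) ** 2

def log_prob(w, theta):
    # log p(w; theta) for w ~ Bernoulli(sigmoid(theta))
    p = sigmoid(theta)
    return math.log(p) if w == 1 else math.log(1.0 - p)

def true_loss(theta):
    # L(theta) = E_w[c(w)], computed exactly by enumerating w in {0, 1}
    p = sigmoid(theta)
    return p * cost(1) + (1.0 - p) * cost(0)

def surrogate_loss(theta, w, q_hat):
    # SL = log p(w; theta) * Q_hat. q_hat arrives as a plain number,
    # so it is a fixed sample with no dependence on theta.
    return log_prob(w, theta) * q_hat

def central_diff(f, x, eps=1e-5):
    # Central finite-difference approximation of df/dx
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def expected_surrogate_grad(theta):
    # E_w[ d/dtheta SL ], with w distributed according to the current theta
    p = sigmoid(theta)
    total = 0.0
    for w, prob in ((1, p), (0, 1.0 - p)):
        q_hat = cost(w)  # sampled cost downstream of w, held fixed
        grad = central_diff(lambda t: surrogate_loss(t, w, q_hat), theta)
        total += prob * grad
    return total

theta = 0.3
grad_true = central_diff(true_loss, theta)
grad_sl = expected_surrogate_grad(theta)
print(grad_true, grad_sl)
```

The two printed values should agree up to finite-difference error, which is what $\nabla_{\theta}\mathcal{L} = \mathbb{E}[\nabla_{\theta}\mathrm{SL}]$ states for this toy case. In an autodiff framework, holding `q_hat` fixed would correspond to a stop-gradient or detach operation on the sampled cost.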

The definition of SL given at the top can be proved as follows (using the same technique as in 9.2).
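As a sketch only, and not necessarily the derivation the original page showed, the argument for the simplest case runs as below. The assumptions are a single discrete stochastic node $w \sim p(w;\theta)$ and a cost $c(w)$ with no direct dependence on $\theta$, so that $\hat{Q}_w = c(w)$ is held fixed under differentiation:

$$
\begin{aligned}
\nabla_{\theta}\mathcal{L}
&= \nabla_{\theta}\sum_{w} p(w;\theta)\,c(w)
 = \sum_{w} c(w)\,\nabla_{\theta}p(w;\theta) \\
&= \sum_{w} p(w;\theta)\,c(w)\,\nabla_{\theta}\log p(w;\theta)
 = \mathbb{E}_{w}\big[\hat{Q}_w\,\nabla_{\theta}\log p(w;\theta)\big] \\
&= \mathbb{E}_{w}\big[\nabla_{\theta}\big(\log p(w;\theta)\,\hat{Q}_w\big)\big]
 = \mathbb{E}\big[\nabla_{\theta}\mathrm{SL}(\Theta,\mathcal{S})\big]
\end{aligned}
$$

The last line relies on $\hat{Q}_w$ being a constant with respect to $\theta$. Otherwise the product rule would add a term of the form $\log(p)\nabla_{\theta}Q$.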
