9.5.1 Empirical Verification

The first experiment tested whether DiCE correctly recovers the gradient and the Hessian of a stochastic computation graph (SCG). This was done in the iterated prisoner's dilemma (IPD) with fixed policies.
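Comparing against a "true" gradient and Hessian requires an exact value for the IPD under fixed policies. The following is a minimal JAX sketch of one way to compute it analytically. It rests on several assumptions that this section does not state: memory-one policies with five logits per agent (the first move plus one per previous joint outcome), the payoffs -1/-3/0/-2, and a discount factor of 0.96. Differentiating the closed-form value then gives reference derivatives. For illustration, only agent 1's value with respect to its own parameters is used.

```python
import jax
import jax.numpy as jnp

GAMMA = 0.96  # assumed discount factor
# Assumed per-step rewards for joint outcomes ordered (CC, CD, DC, DD), from agent 1's view
R1 = jnp.array([-1.0, -3.0, 0.0, -2.0])
R2 = jnp.array([-1.0, 0.0, -3.0, -2.0])

def joint_probs(p1, p2):
    # probability of each joint outcome (CC, CD, DC, DD) given both cooperation probabilities
    return jnp.stack([p1 * p2, p1 * (1 - p2), (1 - p1) * p2, (1 - p1) * (1 - p2)], axis=-1)

def exact_values(theta1, theta2):
    # theta_i[0]: logit of cooperating at t = 0
    # theta_i[1:5]: logits of cooperating given the previous joint outcome
    p1, p2 = jax.nn.sigmoid(theta1), jax.nn.sigmoid(theta2)
    p0 = joint_probs(p1[0], p2[0])   # shape (4,): distribution of the first outcome
    P = joint_probs(p1[1:], p2[1:])  # shape (4, 4): row = previous outcome, column = next
    # discounted visitation: sum_t gamma^t p0 P^t = p0 (I - gamma P)^{-1}, solved as a linear system
    visits = jnp.linalg.solve((jnp.eye(4) - GAMMA * P).T, p0)
    return visits @ R1, visits @ R2

def value1(theta1, theta2):
    return exact_values(theta1, theta2)[0]

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
theta1 = jax.random.normal(key1, (5,))  # hypothetical fixed policy parameters
theta2 = jax.random.normal(key2, (5,))

true_grad = jax.grad(value1)(theta1, theta2)     # shape (5,)
true_hess = jax.hessian(value1)(theta1, theta2)  # shape (5, 5)
print(true_grad)
print(true_hess)
```

Because the value is available in closed form, automatic differentiation of it serves as the ground truth that a sampled DiCE estimate can be checked against.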

์œ„์˜ ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๋ฉด, (a)๋Š” gradient, (b)๋Š” hessian ์ž…๋‹ˆ๋‹ค. red๊ฐ€ true value, green์ด estimated value์ธ๋ฐ, ๊ฑฐ์˜ ์ผ์น˜ํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋˜ํ•œ ์œ„์˜ (a)๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๋ฉด, value function์ด ์ •๋ฐ€ํ•ด์งˆ์ˆ˜๋ก gradient ์ถ”์ •๊ฐ’์ด ์–ด๋–ป๊ฒŒ ์ •ํ™•ํ•ด์ง€๋Š”์ง€ ๋ณด์ž…๋‹ˆ๋‹ค. value function์˜ variance์ด ์ž‘์•„์งˆ์ˆ˜๋ก, baseline ์ถ”์ • ์˜ค์ฐจ์— ๋Œ€ํ•œ ํ•จ์ˆ˜๋กœ์จ gradient estimator์™€ correlation์ด ๋†’์•„์ง€๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (b)์—์„  sample size์™€ baseline์— ๋Œ€ํ•ด ์‹คํ—˜ํ–ˆ๋Š”๋ฐ, baseline์„ ์ ์šฉํ•œ ๊ฒƒ์ด sample size๊ฐ€ ์ ์–ด๋„ ๊ต‰์žฅํžˆ ํšจ์œจ์ ์ž„์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋‘ ์‹คํ—˜ ๋ชจ๋‘ baseline์ด DiCE์˜ ์•„์ฃผ ์ค‘์š”ํ•œ ์š”์†Œ์ž„์„ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
