8.3 Methods

์ด๋ฒˆ Section์—์„œ๋Š” naive learner๋กœ ์‹œ์ž‘ํ•ด์„œ LOLA๊นŒ์ง€ ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. agent๊ฐ€ Expected discounted reward sum์˜ ์ •ํ™•ํ•œ hessian์„ ์–ป์„ ์ˆ˜ ์žˆ๋Š” ์ƒํ™ฉ์— ๋Œ€ํ•ด 8.3.2์—์„œ ์ •์˜ํ•˜๊ณ , ๊ทธ๋ ‡์ง€ ๋ชปํ•  ๋•Œ์— ๋Œ€ํ•ด 8.3.3์—์„œ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. 8.3.4์—์„  opponent์˜ modeling์— ๋Œ€ํ•ด ๋ชจ๋ฅผ ๋•Œ๋ฅผ ๊ฐ€์ •ํ•œ ์ƒํ™ฉ์— ๋Œ€ํ•ด ์ •์˜ํ•˜๊ณ , 8.3.5์—์„œ๋Š” ์ƒ๋Œ€์˜ Expected discounted reward sum์„ 1์ฐจ ๊ทผ์‚ฌ๊ฐ€ ์•„๋‹Œ ๋” ์ •ํ™•ํ•˜๊ฒŒ ๊ทผ์‚ฌํ–ˆ์„ ๋•Œ์— ๋Œ€ํ•ด ์–˜๊ธฐํ•ฉ๋‹ˆ๋‹ค.

Last updated