7.5.1 Learning to Communicate

Even before this work, many studies addressed settings in which agents must communicate in order to cooperate, but they were limited to toy problems. Most of them relied on a cheap-talk communication channel, and this includes RIAL and DIAL from Chapter 6. This work instead focuses on learning by observing actions that actually interact with the environment, rather than on using a cheap-talk channel. The setting resembles the 'hat game' problem tackled previously, which was approached by applying DRQN rather than by using Bayesian beliefs. Nayyar's work also used the idea of common information, but it did not offer a method for applying it in high-dimensional settings.
