7.6 Conclusion & Future Work

This chapter presented the Bayesian Action Decoder (BAD), a method for cooperative, partially observable MARL settings. BAD uses a factorized, approximate belief state, which lets agents learn informative actions and the conventions needed to interpret them. Experiments on a matrix game and on Hanabi showed strong performance. This is the first attempt to use DMARL on a problem that was originally designed for humans and that requires discovering a communication protocol. As future work, the plan is to apply BAD to settings with more than two players and to generalize it further. Although the belief update involves a sampling step, most of the other components can probably be learned end-to-end. The plan is also to extend BAD to value-based methods and to investigate its connection to counterfactual gradients.
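As a reminder of why a belief state makes actions informative, here is a minimal sketch of a Bayesian belief update over another agent's private card after observing its action. It is only an illustration, not the chapter's implementation: the function name `update_public_belief`, the two-card setup, and the policy table are all hypothetical, and the sketch handles a single private feature exactly instead of the factorized, sampled approximation that BAD uses.

```python
def update_public_belief(prior, policy, action):
    """Apply Bayes' rule: P(card | action) is proportional to P(action | card) * P(card).

    prior  -- list of probabilities over the acting agent's private card
    policy -- policy[card][action] = probability of `action` given `card`
              (assumed to be common knowledge among the agents)
    action -- index of the action that was publicly observed
    """
    unnormalized = [p * policy[card][action] for card, p in enumerate(prior)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed action has zero probability under the policy")
    return [u / total for u in unnormalized]


# Hypothetical toy setup: the acting agent privately holds card 0 or card 1.
prior = [0.5, 0.5]

# A hypothetical convention: card 0 mostly plays action 0, card 1 mostly action 1.
policy = [
    [0.9, 0.1],  # P(action | card 0)
    [0.2, 0.8],  # P(action | card 1)
]

# Observing action 0 shifts the belief toward card 0.
posterior = update_public_belief(prior, policy, action=0)
```

The more sharply the policy separates the cards, the more an observed action moves the belief, which is the sense in which a learned convention carries information.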
