7.5.2 Research on Hanabi

์ˆ˜๋งŽ์€ Hanabi์— ๋Œ€ํ•œ ์—ฐ๊ตฌ๊ฐ€ ์ง„ํ–‰๋˜์—ˆ์Šต๋‹ˆ๋‹ค. Baffier๋Š” ์ด Hanabi์—์„œ์˜ ์ตœ์ ์˜ ์ „๋žต์€ ์ž๊ธฐ๊ฐ€ ์ž์‹ ์˜ ํŒจ๋ฅผ ๋ณผ ์ˆ˜ ์žˆ๋‹ค๊ณ  ํ•ด๋„ NP-hard์ž„์„ ๋ณด์˜€์Šต๋‹ˆ๋‹ค. hat game๊ณผ ๋น„์Šทํ•˜๊ฒŒ encodingํ•˜๋Š” ๋ฐฉ์‹์€ ๊ธฐ๋ณธ์ ์œผ๋กœ 5์ธ๋„ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์ง€๋งŒ, 2์ธ์šฉ์—์„œ๋Š” 17.8์ ๋ฐ–์— ์–ป์ง€ ๋ชปํ–ˆ์Šต๋‹ˆ๋‹ค. Walton์˜ ์—ฐ๊ตฌ๋Š” Monte-Carlo tree search์™€ rule based method๋ฅผ ์‚ฌ์šฉํ–ˆ์ง€๋งŒ, BAD๋ณด๋‹ค 50% ๋‚ฎ์€ ์„ฑ๋Šฅ์„ ๋ณด์˜€๊ณ , Osawa๋„ heuristicํ•œ rule์„ ๋งŒ๋“ค์–ด ํ•ด๊ฒฐํ•˜๋ คํ–ˆ์œผ๋‚˜ ์ด๋Š” ๊ฒฐ๊ตญ heuristicํ•œ ๋ฐฉ๋ฒ•์ด๊ณ , ๊ฒฐ๊ตญ BAD๋ณด๋‹ค ์ข‹์ง€์•Š์€ ๊ฒฐ๊ณผ๋ฅผ ์–ป์—ˆ์Šต๋‹ˆ๋‹ค.

Apart from BAD, the best-performing agent for two-player Hanabi is named SmartBot, which is available at the link. It achieved a state-of-the-art average score of 23.09 points.
