2.3 Centralized vs Decentralized Control

Centralized Control

In a fully observable setting, instead of building multiple agents, we can consider building a single agent that oversees everything: a centralized controller $\pi^C(\mathbf{u}|s_t)$. It can be written as

$$\pi^C(\mathbf{u}|s_t): \mathbf{U} \times S \rightarrow [0,1]$$

However, this approach has two fundamental problems.

  • The joint action space $\mathbf{U}$ is the combinatorial composition of the individual agents' actions.

    • $P(u^1|s^1)\cdot P(u^2|s^2)\cdot \ldots \cdot P(u^n|s^n)$

    • This means the action space grows exponentially with the number of agents (for example, $n$ agents with $k$ actions each give $k^n$ joint actions), which severely limits scalability.

  • It cannot be applied under local observation. In many settings each agent's observation is limited, and a centralized controller, which conditions on the full state $s_t$, cannot be used there.
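To make the first problem concrete, the sketch below enumerates the joint action space for a growing number of agents. It is only an illustration: the helper `joint_action_space` and the choice of 5 actions per agent are assumptions, not part of the original text.

```python
from itertools import product


def joint_action_space(per_agent_actions):
    """Enumerate every joint action as a tuple with one entry per agent."""
    return list(product(*per_agent_actions))


# Hypothetical setup: every agent chooses from the same 5 actions.
actions = range(5)

# Size of the joint action space U for n = 1, 2, 4, 8 agents.
sizes = {n: len(joint_action_space([actions] * n)) for n in (1, 2, 4, 8)}

for n, size in sizes.items():
    print(f"{n} agents -> {size} joint actions")
```

Under these assumptions the count is $5^n$, so a centralized controller would need to assign a probability to every one of those joint actions in each state.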

Decentralized Control

If each agent uses its own local policy, the action-space drawback of centralized control can be overcome. The policy of a single agent $a$ can then be written as follows.

$$\pi^a(u^a|s_t)$$

The probability of the full joint action can then be expressed as follows.

$$P(\mathbf{u}|s_t) = \prod_a \pi^a(u^a|s_t)$$
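As a minimal sketch of this factorization, the snippet below multiplies per-agent probabilities to obtain the joint-action probability for one fixed state. The three local policies, their action names, and the helper `joint_probability` are hypothetical values chosen for illustration only.

```python
import math
from itertools import product

# Hypothetical local policies pi^a(u^a | s_t) for one fixed state s_t.
# Each dict maps an agent's action to its probability.
local_policies = [
    {"left": 0.7, "right": 0.3},               # agent 1
    {"left": 0.5, "right": 0.5},               # agent 2
    {"stay": 0.2, "left": 0.1, "right": 0.7},  # agent 3
]


def joint_probability(joint_action, policies):
    """P(u | s_t) as the product over agents of pi^a(u^a | s_t)."""
    return math.prod(pi[u] for pi, u in zip(policies, joint_action))


# Probability of one particular joint action: 0.7 * 0.5 * 0.7.
p = joint_probability(("left", "right", "right"), local_policies)

# Summing the product over every joint action recovers a total mass of 1,
# so the factored form is a valid distribution over the joint action space.
total = sum(
    joint_probability(u, local_policies) for u in product(*local_policies)
)

# Entries stored by the local policies vs. entries in a joint table.
n_local = sum(len(pi) for pi in local_policies)
n_joint = math.prod(len(pi) for pi in local_policies)
```

In this example the local policies store 7 probabilities in total, while a table over joint actions would need 12. That gap widens quickly as agents are added, which is the scalability benefit described above.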
