4.4 Common Knowledge


An important property of the binary mask $\mu^a$ is that it depends only on whether agent $a$ can see entity $e$. Therefore, if all of agent $a$'s masks $\mu^a$ are common knowledge, then another agent $b$ that can see $a$ knows, for $a$ and for the entities $e$ it can see, that $a, e \in \mathcal{M}^b_s$.

Writing the mutual knowledge of a group $\mathcal{G} \subseteq \mathcal{A}$ as $\mathcal{M}^\mathcal{G}_s$, it can be expressed as follows.

$$\mathcal{M}^\mathcal{G}_s = \bigcap_{a \in \mathcal{G}} \mathcal{M}^a_s$$
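As a small illustration (not from the paper), here is a minimal Python sketch of this intersection, assuming each agent's knowledge $\mathcal{M}^a_s$ is represented as a plain set of hypothetical entity identifiers:

```python
# Mutual knowledge of a group as the intersection of per-agent knowledge sets.
# Entity identifiers and the knowledge sets below are hypothetical.

def mutual_knowledge(knowledge_by_agent, group):
    """M^G_s = intersection over a in G of M^a_s."""
    sets = [set(knowledge_by_agent[a]) for a in group]
    return set.intersection(*sets) if sets else set()

knowledge = {
    "a": {"e1", "e2", "e3"},
    "b": {"e2", "e3", "e4"},
    "c": {"e2", "e3"},
}
print(mutual_knowledge(knowledge, ["a", "b", "c"]))  # {'e2', 'e3'} (order may vary)
```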

However, this mutual knowledge does not imply common knowledge; it is merely the set of entities that all agents happen to know. The common knowledge of a group is instead written $\mathcal{L}^\mathcal{G}_s$, which by definition is information that the group $\mathcal{G}$ knows, that everyone in $\mathcal{G}$ knows that everyone in $\mathcal{G}$ knows, and so on.

For agent $a$ to know that another agent $b$ also sees $e \in \xi$, agent $a$ must itself see $b$ and must know that $b$ sees $e$, which can be written as:

$$\mu^a(s^a,s^b) \wedge \mu^b(s^b,s^e) = \top$$

$\mathcal{L}^\mathcal{G}_s$ can also be expressed recursively. For any agent $a$ belonging to the group $\mathcal{G}$:

$$\mathcal{L}^{\mathcal{G}}_s = \lim_{m\rightarrow \infty}\mathcal{L}^{a,m}_s, \qquad \mathcal{L}^{a,0}_s = \mathcal{M}^a_s$$

$$\mathcal{L}^{a,m}_s = \bigcap_{b\in\mathcal{G}}\{e\in \mathcal{L}^{b,m-1}_s \mid \mu^a(s^a,s^b)\} \qquad (4.4.2)$$
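To make the recursion concrete, the following is a minimal sketch of (4.4.2), assuming set-valued knowledge and a hypothetical visibility predicate `mu(a, b)` standing in for $\mu^a(s^a, s^b)$; it iterates until a fixed point instead of literally taking $m \rightarrow \infty$:

```python
# Sketch of the recursion (4.4.2):
#   L^{a,0}_s = M^a_s
#   L^{a,m}_s = intersection over b in G of { e in L^{b,m-1}_s | mu^a(s^a, s^b) }
# If agent a cannot see some b in the group, that conditioned set is empty,
# so the whole intersection collapses to the empty set.

def ck_step(prev_levels, group, mu):
    """One iteration of (4.4.2); prev_levels maps agent -> L^{b,m-1}_s."""
    new_levels = {}
    for a in group:
        per_b = [set(prev_levels[b]) if mu(a, b) else set() for b in group]
        new_levels[a] = set.intersection(*per_b)
    return new_levels

def common_knowledge(knowledge_by_agent, group, mu, max_iters=10):
    """Iterate (4.4.2) from L^{a,0}_s = M^a_s until a fixed point is reached."""
    levels = {a: set(knowledge_by_agent[a]) for a in group}
    for _ in range(max_iters):
        nxt = ck_step(levels, group, mu)
        if nxt == levels:
            break
        levels = nxt
    return levels

# Hypothetical usage: two agents that see each other (and themselves).
knowledge = {"a": {"e1", "e2"}, "b": {"e2", "e3"}}
visible = {("a", "b"), ("b", "a")}
mu = lambda x, y: x == y or (x, y) in visible
print(common_knowledge(knowledge, ["a", "b"], mu))  # {'a': {'e2'}, 'b': {'e2'}}
```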

At iteration $m = 0$, $\mathcal{L}^{a,0}_s$ is simply agent $a$'s own knowledge $\mathcal{M}^a_s$: what an agent knows by itself is trivially common knowledge with itself.

Let us carry out the iteration for $m = 1$.

$$\mathcal{L}^{a,1}_s = \bigcap_{b\in\mathcal{G}}\{e\in \mathcal{L}^{b,0}_s \mid \mu^a(s^a,s^b)\} = \bigcap_{b\in\mathcal{G}}\{e\in \mathcal{M}^{b}_s \mid \mu^a(s^a,s^b)\}$$

Interpreting this: for every agent $b$ that $a$ can see, $a$ collects the entities contained in $b$'s knowledge $\mathcal{M}^b_s$; taking the intersection over the whole group, $\mathcal{L}^{a,1}_s$ is the set of entities that, from $a$'s point of view, everyone in the group knows.

Let us take the iteration one step further, to $m = 2$.

$$\mathcal{L}^{a,2}_s = \bigcap_{b\in\mathcal{G}}\{e\in \mathcal{L}^{b,1}_s \mid \mu^a(s^a,s^b)\}$$

In other words, from $a$'s point of view, every agent it can see knows $\mathcal{L}^{b,1}_s$, so each agent now knows that the others know. Repeating this process and taking the limit $m \rightarrow \infty$, the mutual knowledge turns into the common knowledge $\mathcal{L}^\mathcal{G}_s$.

Lemma 4.4.1

์œ„์˜ ์žฌ๊ท€์ ์ธ ํ‘œํ˜„์—์„œ๋Š” ๊ณต์ง‘ํ•ฉ์—๋Œ€ํ•ด roughํ•˜๊ฒŒ ๋ณด์—ฌ์คฌ์ง€๋งŒ ์—ฌ๊ธฐ์„œ๋Š” ์ข€ ๋” ์—„๊ฒฉํ•˜๊ฒŒ ๋‚˜ํƒ€๋‚ด์—ˆ๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Thus, no matter which agent's knowledge within the group the recursion starts from, the result is the same, as long as all agents can see one another.

Common knowledge can only be obtained by computation from mutual knowledge that is visible to every agent in the group. An action selected by the policy is itself common knowledge, because it depends only on the common knowledge and on a commonly known rule and random seed used to select it.
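Since such an action depends only on commonly known inputs, every group member can reproduce the same draw locally. A minimal sketch of that idea, assuming a shared integer seed and a dummy uniform policy (all names below are hypothetical):

```python
import random

def sample_group_action(common_obs, joint_actions, shared_seed):
    # Seed a local RNG with quantities that are common knowledge only, so every
    # agent in the group reproduces exactly the same draw.
    rng = random.Random(f"{shared_seed}:{sorted(common_obs)}")
    return rng.choice(joint_actions)

ck = {"e2", "e3"}                        # common knowledge of the group
actions = ["attack", "retreat", "hold"]  # hypothetical joint actions
# Two agents running the same computation locally obtain the identical action.
assert sample_group_action(ck, actions, 42) == sample_group_action(ck, actions, 42)
```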

If all masks $\mu$ are known to every agent, the common knowledge $\mathcal{L}^\mathcal{G}_s$ can be defined as follows.

$$\mathcal{L}^\mathcal{G}_s = \begin{cases} \mathcal{M}^\mathcal{G}_s, & \text{if } \bigwedge_{a,b\in \mathcal{G}}\mu^a(s^a,s^b) \\ \emptyset, & \text{otherwise} \end{cases}$$
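A minimal sketch of this closed form, under the same hypothetical set-based representation as above:

```python
# Closed form: if every ordered pair of agents in G can see each other, the
# common knowledge equals the group's mutual knowledge M^G_s; otherwise it is
# empty. The visibility predicates below are hypothetical.

def common_knowledge_closed_form(knowledge_by_agent, group, mu):
    if not all(mu(a, b) for a in group for b in group if a != b):
        return set()
    return set.intersection(*(set(knowledge_by_agent[a]) for a in group))

knowledge = {"a": {"e1", "e2"}, "b": {"e2", "e3"}}
see_all = lambda a, b: True
see_none = lambda a, b: a == b
print(common_knowledge_closed_form(knowledge, ["a", "b"], see_all))   # {'e2'}
print(common_knowledge_closed_form(knowledge, ["a", "b"], see_none))  # set()
```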

Interpreting this: when every agent observes every other agent, the mutual knowledge $\mathcal{M}^\mathcal{G}_s$ that they share is exactly the common knowledge $\mathcal{L}^\mathcal{G}_s$.

To derive this, consider (4.4.2). Starting the recursion from $\mathcal{L}^{a,0}_s = \mathcal{M}^a_s$, we obtain:

$$\mathcal{L}^{a,1}_s = \begin{cases} \mathcal{M}^\mathcal{G}_s, & \text{if } \bigwedge_{b\in \mathcal{G}}\mu^a(s^a,s^b) \\ \emptyset, & \text{otherwise} \end{cases}$$

Now suppose inductively that after $m$ iterations $\mathcal{L}^{c,m}_s = \mathcal{M}^{\mathcal{G}}_s$ for every agent $c$ in the group; as we saw above, it takes two iterations for mutual knowledge to become common knowledge. This can be formalized as follows.

$$\begin{aligned}\mathcal{L}^{a,m+2}_s &= \{e\in\xi \mid \bigwedge_{b\in\mathcal{G}}(\mu^a(s^a,s^b)\wedge \bigwedge_{c\in\mathcal{G}}(\mu^b(s^b,s^c)\wedge e\in \mathcal{L}^{c,m}_s))\} \\ &= \{ e \in \mathcal{M}^\mathcal{G}_s \mid \bigwedge_{b,c\in\mathcal{G}}\mu^b(s^b,s^c)\} = \mathcal{L}^\mathcal{G}_s\end{aligned}$$
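As a quick sanity check (not part of the paper), one can also verify on small random instances that iterating (4.4.2) reaches the same result as the closed form above; a self-contained sketch with hypothetical knowledge sets and visibility relations:

```python
import itertools, random

def closed_form(M, group, mu):
    # L^G_s = M^G_s if every ordered pair sees each other, else the empty set.
    if not all(mu(a, b) for a, b in itertools.permutations(group, 2)):
        return set()
    return set.intersection(*(set(M[a]) for a in group))

def recursion(M, group, mu, iters=4):
    # Iterate (4.4.2) starting from L^{a,0}_s = M^a_s.
    L = {a: set(M[a]) for a in group}
    for _ in range(iters):
        L = {a: set.intersection(*[set(L[b]) if mu(a, b) else set()
                                   for b in group]) for a in group}
    return L

rng = random.Random(0)
group = ["a", "b", "c"]
for _ in range(100):
    # Random per-agent knowledge sets and a random visibility relation.
    M = {a: {e for e in "e1 e2 e3 e4".split() if rng.random() < 0.7} for a in group}
    vis = {p for p in itertools.permutations(group, 2) if rng.random() < 0.5}
    mu = lambda x, y, vis=vis: x == y or (x, y) in vis
    target = closed_form(M, group, mu)
    assert all(recursion(M, group, mu)[a] == target for a in group)
print("recursion matches the closed form on all sampled instances")
```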

Over time, the common knowledge of a group consists of the initial trajectory $\tau_0$ together with the subsequently observed group trajectory $\tau^\mathcal{G}_t = (\tau_0, o^\mathcal{G}_1, \mathbf{u}^\mathcal{G}_1, \ldots, o^\mathcal{G}_t, \mathbf{u}^\mathcal{G}_t)$, where $o^\mathcal{G}_k = \{s^e_k \mid e \in \mathcal{L}^{\mathcal{G}}_{s_k}\}$. Knowing all masks $\mu^a$ means that each agent can derive $\tau^{\mathcal{G}}_t = \mathcal{L}^\mathcal{G}(\tau^a_t)$ from its own trajectory $\tau^a_t = (\tau_0, o^a_1, \ldots, o^a_t)$. Consequently, any function conditioned on $\tau^{\mathcal{G}}$ can be computed independently by every agent in the group.
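A minimal sketch of this last point, assuming each observation is a hypothetical dict from entity id to features and that the commonly known entity set $\mathcal{L}^\mathcal{G}_{s_k}$ at each step is already available (joint actions are omitted for brevity):

```python
def common_observation(own_observation, commonly_known_entities):
    """o^G_k = { s^e_k : e in L^G_{s_k} }, filtered from the agent's own observation."""
    return {e: feats for e, feats in own_observation.items()
            if e in commonly_known_entities}

def group_trajectory(own_trajectory, commonly_known_per_step):
    """A simplified tau^G_t derived locally from tau^a_t = (tau_0, o^a_1, ..., o^a_t)."""
    tau_0, own_obs = own_trajectory[0], own_trajectory[1:]
    return [tau_0] + [common_observation(o, ck)
                      for o, ck in zip(own_obs, commonly_known_per_step)]

# Hypothetical example: two timesteps, entity features are just position tuples.
tau_a = ["tau_0", {"e1": (0, 0), "e2": (1, 1)}, {"e2": (1, 2), "e3": (3, 3)}]
ck_per_step = [{"e2"}, {"e2", "e3"}]
print(group_trajectory(tau_a, ck_per_step))
# ['tau_0', {'e2': (1, 1)}, {'e2': (1, 2), 'e3': (3, 3)}]
```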