7.3.4 Factorized Belief Updates

Representing the exact belief is intractable: in a card game, for example, enumerating every possible hand that all agents could be holding requires an exponential number of entries. To avoid this, just as we previously introduced the approximate policy $\hat{\pi}$, we approximate the belief state with a factorized form:

$$P(f^{\mathrm{pri}}_t \mid f^{\mathrm{pub}}_{\leq t}) \approx \prod_i P(f^{\mathrm{pri}}_t[i] \mid f^{\mathrm{pub}}_{\leq t}) = \mathcal{B}^{\mathrm{fact}}_{t}$$

์ด์ œ ์ด factorized belief์— ๋Œ€ํ•ด superscription์„ ์—†์•ค ์ฑ„๋กœ ์‚ฌ์šฉํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ๋‹ค์‹œ ์นด๋“œ ๊ฒŒ์ž„์˜ ์˜ˆ๋ฅผ ๋“ค๋ฉด, ๊ฐ ์š”์†Œ๋Š” ์นด๋“œ๋‹น ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š”๋ฐ, ์ด๋•Œ ์†์— ๋“  ์นด๋“œ์™€ ํ”Œ๋ ˆ์ด์–ด๋“ค ์‚ฌ์ด์˜ ์นด๋“œ๊ฐ€ ๋…๋ฆฝ์ ์ด๋ผ๊ณ  ๊ฐ€์ •ํ•œ ๊ฒƒ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ฐ€์ •์€ ๋‹ค๋ฅธ ๋‹ค๋ฃจ๊ธฐ ์–ด๋ ค์šด ์ƒํ™ฉ์—์„œ๋„ ๋ฌธ์ œ๋ฅผ ๋‹ค๋ฃจ๊ธฐ ์‰ฝ๊ฒŒ ๊ทผ์‚ฌํ•˜๋Š” ๋ฐฉ๋ฒ•์ค‘์— ํ•˜๋‚˜์ž…๋‹ˆ๋‹ค.

To carry out the public belief update with this factorized representation, we recursively maintain a factorized likelihood term $\mathcal{L}_t[f[i]]$ for each private feature:

$$
\begin{aligned}
\mathcal{L}_t[f[i]] &= P\left(u^a_{\leq t} \mid f[i], \mathcal{B}_{\leq t}, f^{\mathrm{pub}}_{\leq t}, \hat{\pi}_{\leq t}\right) \\
&\approx \mathcal{L}_{t-1}[f[i]] \cdot P\left(u^a_t \mid f[i], \mathcal{B}_t, f^{\mathrm{pub}}_t, \hat{\pi}_t\right) \\
&= \mathcal{L}_{t-1}[f[i]] \cdot \frac{\mathbb{E}_{f_t \sim \mathcal{B}_t}\left[\bm{1}(f_t[i], f[i])\,\bm{1}(\hat{\pi}(f^a_t), u^a_t)\right]}{\mathbb{E}_{f_t \sim \mathcal{B}_t}\left[\bm{1}(f_t[i], f[i])\right]}
\end{aligned}
$$

๋งˆ์ง€๋ง‰ ํ…€์—์„œ ๋ณด๋‹ค์‹œํ”ผ expectation์œผ๋กœ ๋‚˜ํƒ€๋‚˜๋Š”๋ฐ ์ด๋Š” sampling์„ ํ†ตํ•ด์„œ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๊ณ , sample ๊ฐœ์ˆ˜๊ฐ€ ๋Š˜์–ด๋‚ ์ˆ˜๋ก ์ •ํ™•ํ•ด์ง‘๋‹ˆ๋‹ค.
