7.5.3 Belief State Methods

DeepStack에 의해 쓰인 continual re-solving algorithm과 Libratus도 belief state space를 사용했습니다. continual re-solving BAD와 같이 player state에서 결정을 내릴 때, 연관된 현재 플레이어에 연관된 belief state를 고려하고, belief와 일치하는 모든 player state에 대해 joint policy를 생성했습니다. 그리고 player가 실제 사용하는 policy는 이 joint policy에 의해 선택됩니다. Continual re-solving도 belief update를 위해 Bayesian을 사용했는데 여기서 중요한 차이점이 있습니다. Continual re-solving은 joint policy space가 굉장히 작을 때에 belief state에 대한 업데이트를 진행합니다. 또한 belief state가 상대편의 value에 의해서도 만들어 집니다. Continual re-solving은 value-based method로, training process에 optimal play상황에서 belief의 value를 배우는 것이 포함되어 있습니다. 마지막으로 이 알고리즘은 2 player의 zero-sum game에서 적용되는 algorithm으로, joint action policy에 대한 최적의 선택을 보장하면서 독립적으로 player state value를 고려합니다.

Previous7.5.2 Research on Hanabi Next7.6 Conclusion & Future Work

Last updated 5 years ago