3. Counterfactual Multi-Agent Policy Gradients | Deep Multi-Agent Reinforcement Learning