MADDPG-PyTorch
PyTorch Implementation of MADDPG from Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (Lowe et. al. 2017)
Requirements
- OpenAI baselines, commit hash: 98257ef8c9bd23a24a330731ae54ed086d9ce4a7
- My fork of Multi-agent Particle Environments
- PyTorch, version: 0.3.0.post4
- OpenAI Gym, version: 0.9.4
- Tensorboard, version: 0.4.0rc3 and Tensorboard-Pytorch, version: 1.0 (for logging)
The versions are just what I used and not necessarily strict requirements.
How to Run
All training code is contained within main.py
. To view options simply run:
python main.py --help
Results
Physical Deception
In this task, the two blue agents are rewarded by minimizing the closest of their distances to the green landmark (only one needs to be close to get optimal reward), while maximizing the distance of the red adversary from the green landmark. The red adversary is rewarded by minimizing it's distance to the green landmark; however, on any given trial, it does not know which landmark is green, so it must follow the blue agents. As such, the blue agents should learn to deceive the red agent by covering both landmarks.
Cooperative Communication
This task involves two agents, one that is stationary and one that can move. The stationary agent sees the color of the other agent as its observation, and outputs a one-hot communication vector as its action. The moving agent receives the communication vector, as well as its relative distance to all landmarks on the screen; however, it does not know its own color. The goal of both agents is for the moving agent to reach the landmark that matches its own color. Thus, the agents must learn to communicate such that the moving agent knows where to go on each randomized trial.
Predator-Prey
This task involves a single prey agent (in green) and a team of three predators (in red). The prey agent is 30% faster than the predators, so the predators must learn how to team up in order to catch the prey.
In the trials below, the prey agent uses DDPG as its learning algorithm.
Not Implemented
There are a few items from the paper that have not been implemented in this repo
- Ensemble Training
- Inferring other agents' policies
- Mixed continuous/discrete action spaces
Acknowledgements
The OpenAI baselines Tensorflow implementation and Ilya Kostrikov's Pytorch implementation of DDPG were used as references. After the majority of this codebase was complete, OpenAI released their code for MADDPG, and I made some tweaks to this repo to reflect some of the details in their implementation (e.g. gradient norm clipping and policy regularization).