This project applies the multi-agent deep deterministic policy gradient (MADDPG) algorithm to a novel use case: training the ghosts in the game of Pac-Man to capture Pac-Man.
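For background, MADDPG extends DDPG to multi-agent settings by training each agent with a centralized critic that conditions on every agent's observations and actions, while each actor needs only its own observation at execution time. Below is a minimal PyTorch sketch of that idea; it is illustrative only, and the sizes, networks, and wiring are assumptions rather than this repository's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical sizes -- the real environment's observation/action spaces differ.
N_AGENTS, OBS_DIM, ACT_DIM, BATCH = 2, 16, 5, 32

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

# Each agent's actor sees only its own observation (decentralized execution) ...
actors = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
# ... while each agent's critic scores the joint observations and actions of
# ALL agents (centralized training), which is the key idea of MADDPG.
critics = [mlp(N_AGENTS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_AGENTS)]

obs = [torch.randn(BATCH, OBS_DIM) for _ in range(N_AGENTS)]   # fake batch
acts = [torch.tanh(actor(o)) for actor, o in zip(actors, obs)]

# Policy-gradient step for agent 0: ascend its own centralized Q-value.
joint = torch.cat(obs + acts, dim=-1)
actor_loss = -critics[0](joint).mean()
actor_loss.backward()
```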
- Python 3 or higher is required.
- To install the dependencies, run `pip install -r requirements.txt`.
- To begin training with the GUI, run `python train.py --display`.
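An interrupted run can be resumed from the most recent checkpoint with the `--restore` flag (documented in the table below). Like `--display`, it is assumed here to act as a simple on/off switch:

```
python train.py --restore
```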
Command-line option | Purpose |
---|---|
`--max-episode-len` | Maximum length of each episode (default: `100`) |
`--num-episodes` | Total number of training episodes (default: `200000`) |
`--num-adversaries` | Number of ghost agents in the environment (default: `2`) |
`--good-policy` | Algorithm used for the Pac-Man agent (default: `ddpg`; options: `ddpg` or `maddpg`) |
`--adv-policy` | Algorithm used for the ghost agents (default: `maddpg`; options: `ddpg` or `maddpg`) |
`--lr` | Learning rate for the Adam optimizer (default: `1e-2`) |
`--gamma` | Discount factor (default: `0.95`) |
`--batch-size` | Batch size (default: `1024`) |
`--save-dir` | Directory where the training state and model are saved (default: `"./save_files/"`) |
`--save-rate` | Save the model every N episodes (default: `1000`) |
`--restore` | Resume training from the last saved checkpoint (default: `False`) |
`--display` | Display the GUI (default: `False`) |
`--load-dir` | Directory from which the training state and model are loaded (default: `""`) |
`--load` | Load a saved model only if set to `True` (default: `False`) |
`--load-episode` | Load the model saved at a particular episode (default: `0`) |
`--layout` | Game map to use (default: `smallClassic`) |
`--pacman_obs_type` | Observation space for the Pac-Man agent (default: `partial_obs`; options: `partial_obs` or `full_obs`) |
`--ghost_obs_type` | Observation space for the ghost agents (default: `full_obs`; options: `partial_obs` or `full_obs`) |
`--partial_obs_range` | Side length of the square partial-observation window, if partial observations are chosen (default: `3`, i.e. 3x3; other values give 5x5, 7x7, ...) |
`--shared_obs` | Include the same features in the observation spaces of both the Pac-Man and ghost agents (default: `False`) |
`--astarSearch` | Factor the step distance between Pac-Man and the ghosts into the agents' rewards and observations (default: `False`) |
`--astartAlpha` | Multiplier for penalizing/rewarding agents according to the increase/decrease in step distance (default: `1`) |
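For example, a longer headless training run combining several of the options above might look like this (all values here are illustrative, not recommendations):

```
python train.py --num-episodes 50000 --layout smallClassic --ghost_obs_type full_obs --save-rate 500
```

And to view a previously trained model, assuming it was saved under `./save_files/` at episode 10000 and that the boolean flags behave as switches like `--display`:

```
python train.py --load --load-dir "./save_files/" --load-episode 10000 --display
```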
This project is licensed under the MIT License; see the `LICENSE` file for details.