
Reproduce our Results

Joery edited this page Jan 6, 2021 · 6 revisions

This page describes how to reproduce the MountainCar visualization figures along with the numerical results shown for both MountainCar and CartPole.

Generating the data

Generating the data for our results should be quite straightforward: we provide all necessary JSON configuration files within publish/configs.

CartPole: For the CartPole ablation analysis, you can run:
python Main.py experiment -c CartpoleAblations.json --gpu -1
Before you do this, however, you may want to adjust the n_jobs value in this file to match the hyperthreading capacity of your machine.
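
If you prefer to change n_jobs programmatically rather than by hand, a minimal sketch follows. It assumes only that the setting is stored under a key literally named n_jobs somewhere in the JSON; the example dictionary is a hypothetical stand-in, not the real layout of CartpoleAblations.json.

```python
import copy


def set_key_everywhere(config, key, value):
    """Return a deep copy of `config` with every occurrence of `key` set to `value`."""
    updated = copy.deepcopy(config)

    def _walk(node):
        if isinstance(node, dict):
            for k in node:
                if k == key:
                    node[k] = value
                else:
                    _walk(node[k])
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(updated)
    return updated


# Hypothetical stand-in for the parsed contents of the configuration file.
example = {"name": "ablation", "experiment": {"n_jobs": 1, "repeats": 3}}
patched = set_key_everywhere(example, "n_jobs", 4)
print(patched["experiment"]["n_jobs"])
```

To apply this to the real file, load it with json.load, pass the result through set_key_everywhere, and write it back with json.dump.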

AlphaZero: To get the results for AlphaZero, we manually ran experiments n times using:
python Main.py train -c AlphazeroCartpole.json --game gym_CartPole-v1 --gpu -1 --debug
or
python Main.py train -c AlphazeroCartpole.json --game gym_MountainCar-v0 --gpu -1 --debug
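
We launched these runs by hand, but the repetition could also be scripted. The sketch below only builds and prints the command lines shown above; the helper name build_train_commands and the choice of three runs are illustrative assumptions, not part of the repository.

```python
def build_train_commands(config, game, n_runs, gpu=-1, debug=True):
    """Build `n_runs` identical argument lists for `python Main.py train`."""
    command = ["python", "Main.py", "train", "-c", config,
               "--game", game, "--gpu", str(gpu)]
    if debug:
        command.append("--debug")
    # Return independent copies so callers can modify one run without affecting the others.
    return [list(command) for _ in range(n_runs)]


commands = build_train_commands("AlphazeroCartpole.json", "gym_CartPole-v1", n_runs=3)
for command in commands:
    print(" ".join(command))
    # To actually launch a run: subprocess.run(command, check=True)
```

Each run is kept as a list of arguments so it can be passed directly to subprocess.run without shell quoting.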

MountainCar: We also ran the MountainCar agents ad hoc, which can be done with:
python Main.py train -c MuzeroMC.json --game gym_MountainCar-v0 --gpu -1 --debug
using either MuzeroMC.json, MuzeroMC_d.json, or MuzeroMC_kl.json to switch between the transition function regularization terms. We ran each of these three configurations twice, specifying latent_depth: 4 in the first run and latent_depth: 8 in the second run (found in net_args).
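
The three configurations and two latent depths give six runs in total. The following sketch enumerates them and shows one way to override latent_depth on a parsed configuration; it assumes net_args is a top-level object in the JSON, and the helper with_latent_depth is hypothetical.

```python
import copy
import itertools

CONFIG_FILES = ["MuzeroMC.json", "MuzeroMC_d.json", "MuzeroMC_kl.json"]
LATENT_DEPTHS = [4, 8]


def with_latent_depth(config, depth):
    """Return a copy of a parsed config with net_args.latent_depth set to `depth`."""
    updated = copy.deepcopy(config)
    # Assumption: net_args sits at the top level of the configuration.
    updated.setdefault("net_args", {})["latent_depth"] = depth
    return updated


# Every (configuration file, latent depth) pair that was trained.
runs = list(itertools.product(CONFIG_FILES, LATENT_DEPTHS))
for config_file, depth in runs:
    print(config_file, depth)
```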

Test (MountainCar Blindfolded): After training all MuZero agents on the MountainCar environment, we tested the final agents in a single-player tourney on this environment. In this tourney the agents were evaluated at various observation sparsities to see how performance degrades as they rely more strongly on the learned model. To generate the data, you can run:
python Main.py experiment -c Tourney_MC_MuZeroBlindfold.json
It is important to specify the correct paths within this file for each of the runs: for each trained agent, update the name, output_dir, and all file entries within the players list.

Each run can produce a substantial amount of data in the form of TensorBoard event files, Keras model checkpoint files, and the most recent replay buffer.

Handling the data

For the results shown on our main README page, we parsed all TensorBoard event files to extract the average trial rewards. This should be straightforward but can take some time, as some event files can be quite large (it may help to omit the --debug flag when running the experiments). The visualizations for the numerical results were then made in the Ablation Analysis Notebook, which should illustrate the visualization process.
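
As a rough illustration of the parsing step (not the exact code we used), the sketch below reads one scalar tag from an event file with TensorBoard's EventAccumulator and averages values per step across runs. It assumes the rewards were logged as scalar summaries under a tag you look up yourself; summaries written through the TensorFlow 2 API may instead appear under the tensor tags and would need accumulator.Tensors. The helper names are hypothetical.

```python
from collections import defaultdict


def load_scalar(event_path, tag):
    """Read (step, value) pairs for one scalar tag from a TensorBoard event file."""
    # Imported lazily so the aggregation helper below works without TensorBoard installed.
    from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

    # A size guidance of 0 keeps every event instead of downsampling.
    accumulator = EventAccumulator(event_path, size_guidance={"scalars": 0})
    accumulator.Reload()
    return [(event.step, event.value) for event in accumulator.Scalars(tag)]


def mean_per_step(runs):
    """Average values step by step across several runs of (step, value) pairs."""
    values_by_step = defaultdict(list)
    for run in runs:
        for step, value in run:
            values_by_step[step].append(value)
    return {step: sum(values) / len(values)
            for step, values in sorted(values_by_step.items())}


# Toy data standing in for two parsed runs.
toy_runs = [[(0, 10.0), (1, 20.0)], [(0, 30.0), (1, 40.0)]]
print(mean_per_step(toy_runs))
```

Calling accumulator.Tags() on a loaded file lists the tags that are actually available, which is the safest way to find the reward tag.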

To use our visualization tool for MuZero MountainCar at https://kaesve.nl/projects/cartpole-inspector/#/, you can either view existing models or generate new data from freshly trained agents. The process of generating the plot data is outlined in our MDP Abstraction Notebook.

We hope that this helps in reproducing our results. If not, please let us know by opening a GitHub Issue.
