Main Flags and JSON Configurations

To run the code, you must specify the hyperparameters for MuZero along with some functional parameters. This page describes the structure of the JSON files and how to run the different parts of our code.

Main Flags

Starting from Main.py in the repository root, our code is split into three pipelines: train, experiment, and play. Aside from selecting the pipeline, the general options that can be passed to Main.py are the following flags:

--gpu [INT] Select tf.device (-1 for CPU).
--debug Track considerably more data in TensorBoard during training (loss of each prediction head, distributions; see utils/debugging.py). The generated TensorBoard files may become an order of magnitude larger.
--lograte [INT] When --debug is specified, sets the snapshot interval: data is recorded every n-th training step.
--render Call the Game's .render() function during training, if it is implemented. This is useful for checking the behaviour of agents.
--run_name [STR] Specify the name of a training session; this overrides the name specified in the JSON.
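
As a rough illustration of how the general flags above could be declared, the following sketch uses Python's argparse. It is not the parser found in Main.py: the helper name build_parser and all default values are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the general flags described above."""
    parser = argparse.ArgumentParser(prog="Main.py")
    # All defaults are assumptions for this sketch, not values from the repository.
    parser.add_argument("--gpu", type=int, default=0,
                        help="Select tf.device (-1 for CPU).")
    parser.add_argument("--debug", action="store_true",
                        help="Track extra data in TensorBoard during training.")
    parser.add_argument("--lograte", type=int, default=1,
                        help="With --debug: record a snapshot every n-th training step.")
    parser.add_argument("--render", action="store_true",
                        help="Call the Game's .render() during training if implemented.")
    parser.add_argument("--run_name", type=str, default=None,
                        help="Name of the training session; overrides the JSON value.")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--gpu", "-1", "--debug", "--lograte", "50"])
print(args.gpu, args.debug, args.lograte, args.render, args.run_name)
```

Boolean switches such as --debug and --render are modelled with store_true, so they are off unless given explicitly.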

Train Pipeline:

The train flag can be used to train a single agent in a straightforward manner on a specified environment. Mandatory arguments for this pipeline include:

-c [FILE] Path to the JSON configuration file that specifies a ModelConfig. This includes hyperparameters for the agent.
--game [STR] Game specification by string; the available environments are listed in game_from_name in Main.py.

A functioning example to call this pipeline, in strict order, is:
python Main.py train -c Configurations/ModelConfigs/MuzeroCartpole.json --game gym_CartPole-v1 [optional flags]
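
If training runs are launched from a script, the argument order shown above can be kept explicit by assembling the command as a list. The sketch below is only an illustration: build_train_command is a hypothetical helper that does not exist in the repository, and the optional flags and run name passed to it are example values.

```python
import shlex
import sys
from typing import List, Sequence


def build_train_command(config_path: str, game: str,
                        optional_flags: Sequence[str] = ()) -> List[str]:
    """Assemble the argument list: pipeline first, then -c, then --game, then optional flags."""
    return [sys.executable, "Main.py", "train",
            "-c", config_path,
            "--game", game,
            *optional_flags]


command = build_train_command(
    "Configurations/ModelConfigs/MuzeroCartpole.json",
    "gym_CartPole-v1",
    ["--gpu", "-1", "--run_name", "cartpole_cpu"],  # example values only
)
# Show the command without the interpreter path.
print(shlex.join(command[1:]))
# To start the run, pass the list to subprocess.run(command, check=True)
# from the repository root.
```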

Experiment Pipeline:

The experiment flag takes an overarching JSON configuration that specifies multiple training runs (e.g., an ablation analysis). Mandatory arguments include:

-c [FILE] Path to the JSON configuration file that specifies a JobConfig.

A functioning example to call this pipeline, in strict order, is:
python Main.py experiment -c Configurations/JobConfigs/CartPoleAblations.json [optional flags]
Note that [optional flags] may not take effect when performing a parameter-grid analysis, because the experiment pipeline functions as a manager for scheduling training runs. Flags for each training instance can instead be provided within the JobConfig.

Play Pipeline:

The play flag is used to apply a learned agent on a particular environment. TODO

JSON Configuration files

There are two types of JSON files within our codebase: the ModelConfig and the JobConfig. The overall format of both is already provided in the repository; this section describes the structure of the JSON files in more detail, though not every value explicitly.

ModelConfig

Example JSON files for specifying a single agent are already provided in Configurations/ModelConfigs; each parameter that we used is specified within the .format.json file. The structure of this file is as follows:

General arguments: name, algorithm, and architecture. For the latter two arguments, see Agents/__init__.py for the currently implemented options.
Algorithm arguments: args. These values provide settings such as the number of MCTS simulations or the discounting of n-step returns. Whether the values here are actually used depends on the implementation specified by algorithm.
Neural network arguments: net_args. These values are used to create neural networks in a modular fashion. Whether the values here are actually used depends on the implementation specified by architecture.
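
The layout described above can be sketched as a small loader that checks for the five top-level keys. Only the key names name, algorithm, architecture, args, and net_args come from this page; every value and every nested key below (such as num_simulations or num_layers) is a placeholder invented for the example, and load_model_config is a hypothetical helper. The .format.json file remains the authoritative reference.

```python
import json

# Top-level keys named in the description above.
REQUIRED_KEYS = ("name", "algorithm", "architecture", "args", "net_args")

# Every value and nested key here is a placeholder, not taken from the repository.
example_config_text = """
{
  "name": "example_run",
  "algorithm": "MUZERO",
  "architecture": "Example",
  "args": {"num_simulations": 25, "n_steps": 10},
  "net_args": {"num_layers": 2, "learning_rate": 0.001}
}
"""


def load_model_config(text: str) -> dict:
    """Parse a ModelConfig-like JSON string and check its top-level layout."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"ModelConfig is missing keys: {missing}")
    # args and net_args are expected to be JSON objects (Python dicts).
    for section in ("args", "net_args"):
        if not isinstance(config[section], dict):
            raise TypeError(f"'{section}' must be a JSON object")
    return config


config = load_model_config(example_config_text)
print(config["name"], config["algorithm"], sorted(config["args"]))
```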

JobConfig

Example JSON files for experiments are again provided in Configurations/JobConfigs, with the corresponding format specified in .format.json. The required structure of these files differs slightly depending on the specified experiment. An overview of the implemented experiments is given in Experimenter/__init__.py; this includes the option to perform tourneys across a pool of Agents (including checkpointed models during training) and the option to train a ModelConfig over a grid of varied hyperparameters.

Important parameters for pipeline specification and bookkeeping are the name, experiment, and output_dir keys.
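
To make the idea of training over a hyperparameter grid concrete, the sketch below expands a JobConfig-like dictionary into one entry per parameter combination. It is not the Experimenter's implementation: only the keys name, experiment, and output_dir come from this page, while the grid key, the "GRID" identifier, the hyperparameter names, and the expand_grid helper are assumptions made for the example.

```python
import itertools

job_config = {
    "name": "example_ablation",
    "experiment": "GRID",                  # hypothetical experiment identifier
    "output_dir": "out/example_ablation",
    # Hypothetical key: hyperparameter name -> list of values to try.
    "grid": {"num_simulations": [10, 25], "n_steps": [5, 10]},
}


def expand_grid(job: dict) -> list:
    """Return one run description per combination of grid values."""
    names = sorted(job["grid"])  # fixed order keeps run names reproducible
    runs = []
    for values in itertools.product(*(job["grid"][n] for n in names)):
        overrides = dict(zip(names, values))
        tag = "_".join(f"{key}-{value}" for key, value in overrides.items())
        runs.append({
            "run_name": f"{job['name']}_{tag}",
            "output_dir": f"{job['output_dir']}/{tag}",
            "overrides": overrides,
        })
    return runs


runs = expand_grid(job_config)
for run in runs:
    print(run["run_name"], run["overrides"])
```

Each entry carries its own run name and output directory, which keeps the bookkeeping of individual training runs separate from that of the experiment as a whole.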
