Main Flags and JSON Configurations
To run the code, you must specify the hyperparameters for MuZero along with some functional parameters. This page describes the structure of the JSON files and how to run the different parts of our code.
From Main.py in the repository root, our code is split into three pipelines: train, experiment, and play. Aside from selecting the pipeline, general options that can be given to Main include the following flags:
- --gpu [INT]: Select tf.device (-1 for CPU).
- --debug: When this flag is included, much more data is tracked in TensorBoard during training (loss of each prediction head, distributions; see utils/debugging.py). Generated TensorBoard files may become an order of magnitude larger.
- --lograte [INT]: When --debug is specified, this sets the snapshot rate: data is recorded every n-th training step.
- --render: Calls a Game's .render() function during training if implemented. This is useful for checking the behaviour of agents.
- --run_name [STR]: Specify the name of a training session; this overrides the name specified in the JSON.
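As a rough illustration of how these flags combine with the pipeline selection, the sketch below builds a command-line parser with Python's argparse. It is not the parser used in Main.py: the function name build_parser, the destination name config, and all default values are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real Main.py)."""
    # Options shared by every pipeline; the defaults here are placeholders.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--gpu", type=int, default=-1, help="tf.device index, -1 for CPU")
    common.add_argument("--debug", action="store_true", help="track extra data in TensorBoard")
    common.add_argument("--lograte", type=int, default=1, help="snapshot every n-th step with --debug")
    common.add_argument("--render", action="store_true", help="call the Game's .render() during training")
    common.add_argument("--run_name", type=str, default=None, help="overrides the name in the JSON")

    parser = argparse.ArgumentParser(prog="Main.py")
    pipelines = parser.add_subparsers(dest="pipeline", required=True)

    # train: needs a ModelConfig and a game string.
    train = pipelines.add_parser("train", parents=[common])
    train.add_argument("-c", dest="config", required=True)
    train.add_argument("--game", required=True)

    # experiment: needs a JobConfig only.
    experiment = pipelines.add_parser("experiment", parents=[common])
    experiment.add_argument("-c", dest="config", required=True)

    # play: arguments are not documented yet, so only the shared flags are attached.
    pipelines.add_parser("play", parents=[common])
    return parser


if __name__ == "__main__":
    example = ["train", "-c", "Configurations/ModelConfigs/MuzeroCartpole.json",
               "--game", "gym_CartPole-v1", "--debug", "--lograte", "10"]
    print(build_parser().parse_args(example))
```

Attaching the shared flags to each sub-command lets them follow the pipeline name, which matches the argument order shown in the example calls on this page.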
The train flag can be used to train a single agent in a straightforward manner on a specified environment. Mandatory arguments for this pipeline include:
- -c [FILE]: Path to the JSON configuration file that specifies a ModelConfig. This includes hyperparameters for the agent.
- --game [STR]: Game specification by string; the options for environments are specified in Main.py (game_from_name).
A functioning example to call this pipeline, in strict order, is:
python Main.py train -c Configurations/ModelConfigs/MuzeroCartpole.json --game gym_CartPole-v1 [optional flags]
The experiment flag is used to specify an overarching JSON configuration that defines multiple training runs (e.g., an ablation analysis). Mandatory arguments include:
- -c [FILE]: Path to the JSON configuration file that specifies a JobConfig.
A functioning example to call this pipeline, in strict order, is:
python Main.py experiment -c Configurations/JobConfigs/CartPoleAblations.json [optional flags]
Note that [optional flags] may not be used when performing a parameter-grid analysis, because the experiment pipeline then functions as a manager for scheduling training runs. Flags for each training instance can instead be provided within the JobConfig.
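To make the idea of a parameter-grid analysis concrete, the following sketch expands a grid of hyperparameter values into one set of overrides per training run. It only illustrates the general technique (a Cartesian product over the listed values); the function name expand_grid and the keys simulations and n_steps are hypothetical and do not reflect the actual JobConfig schema or the scheduler in this repository.

```python
import itertools


def expand_grid(grid: dict) -> list:
    """Return one override dict per point in the Cartesian product of the grid."""
    keys = sorted(grid)  # fixed key order so the run order is reproducible
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


if __name__ == "__main__":
    # Hypothetical hyperparameter names and values, for illustration only.
    grid = {"simulations": [11, 25], "n_steps": [5, 10, 20]}
    for run_id, overrides in enumerate(expand_grid(grid)):
        print(run_id, overrides)
```

Each resulting dictionary would then be merged into a base ModelConfig before the corresponding training run is scheduled.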
The play flag is used to apply a learned agent to a particular environment. TODO
There are two types of JSON files within our codebase: the ModelConfig and the JobConfig. The overall format of both is already provided in the repository; this section describes the structure of the JSON files in more detail, though not every value explicitly.
Example JSON files for specifying a single agent are already provided in Configurations/ModelConfigs; each parameter that we used is specified within the .format.json file. The structure of this file is as follows:
- General arguments: name, algorithm, and architecture. For the latter two arguments, see Agents/__init__.py for the currently implemented options.
- Algorithm arguments: args. These values provide arguments such as the number of MCTS simulations or the n-step return discounting. Whether the values here are actually used depends on the implementation specified by algorithm.
- Neural network arguments: net_args. These values are used to create neural networks in a modular fashion. Whether the values here are actually used depends on the implementation specified by architecture.
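The top-level layout described above can be sketched as follows. The five top-level keys come from this page; everything nested inside args and net_args, and the placeholder strings for algorithm and architecture, are assumptions for illustration. The .format.json file remains the authoritative reference, and load_model_config is a hypothetical helper, not a function from this codebase.

```python
import json

# Placeholder ModelConfig: nested keys and all values are illustrative only.
EXAMPLE_MODEL_CONFIG = """
{
  "name": "example_agent",
  "algorithm": "ALGORITHM_NAME",
  "architecture": "ARCHITECTURE_NAME",
  "args": {"simulations": 25, "n_steps": 10},
  "net_args": {"learning_rate": 0.001}
}
"""

REQUIRED_KEYS = ("name", "algorithm", "architecture", "args", "net_args")


def load_model_config(text: str) -> dict:
    """Parse a ModelConfig string and check that the top-level keys exist."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"ModelConfig is missing keys: {missing}")
    return config


if __name__ == "__main__":
    config = load_model_config(EXAMPLE_MODEL_CONFIG)
    print(config["name"], sorted(config["args"]))
```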
Example JSON files for experiments are again provided in Configurations/JobConfigs, with the corresponding format specified in .format.json. The required structure of these files differs slightly depending on the specified experiment. An overview of the implemented experiments is given in Experimenter/__init__.py; this includes the option to run tourneys across a pool of Agents (including models checkpointed during training) and the option to train a ModelConfig over a grid of varied hyperparameters.
Important parameters for pipeline specification and bookkeeping are the name, experiment, and output_dir keys.
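A small pre-flight check on these three keys might look like the sketch below. The helper name job_output_path is hypothetical, and combining output_dir with name into a single results directory is purely an assumption made for this example; the actual directory layout is determined by the code in the repository.

```python
from pathlib import Path

BOOKKEEPING_KEYS = ("name", "experiment", "output_dir")


def job_output_path(job: dict) -> Path:
    """Check the bookkeeping keys and return an assumed results directory."""
    for key in BOOKKEEPING_KEYS:
        if key not in job:
            raise KeyError(f"JobConfig is missing the '{key}' key")
    # Assumption for illustration: results are grouped under output_dir/name.
    return Path(job["output_dir"]) / job["name"]


if __name__ == "__main__":
    # Placeholder values; "EXPERIMENT_NAME" stands in for an implemented experiment.
    job = {"name": "cartpole_ablation", "experiment": "EXPERIMENT_NAME", "output_dir": "out"}
    print(job_output_path(job))
```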