TF2RL is a deep reinforcement learning library that implements various deep reinforcement learning algorithms using TensorFlow 2.x.
The following algorithms are supported:
Algorithm | Discrete action | Continuous action | Support | Category
---|---|---|---|---
VPG, PPO | ✓ | ✓ | GAE | Model-free On-policy RL
DQN (including DDQN, Prior. DQN, Duel. DQN, Distrib. DQN, Noisy DQN) | ✓ | - | ApeX | Model-free Off-policy RL
DDPG (including TD3, BiResDDPG) | - | ✓ | ApeX | Model-free Off-policy RL
SAC | ✓ | ✓ | ApeX | Model-free Off-policy RL
CURL, SAC-AE | - | ✓ | - | Model-free Off-policy RL
MPC, ME-TRPO | ✓ | ✓ | - | Model-based RL
GAIL, GAIfO, VAIL (including Spectral Normalization) | ✓ | ✓ | - | Imitation Learning
The following papers have been implemented in tf2rl:
- Model-free On-policy RL
- Model-free Off-policy RL
  - Playing Atari with Deep Reinforcement Learning, code
  - Human-level control through Deep Reinforcement Learning, code
  - Deep Reinforcement Learning with Double Q-learning, code
  - Prioritized Experience Replay, code
  - Dueling Network Architectures for Deep Reinforcement Learning, code
  - A Distributional Perspective on Reinforcement Learning, code
  - Noisy Networks for Exploration, code
  - Distributed Prioritized Experience Replay, code
  - Continuous control with deep reinforcement learning, code
  - Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Soft Actor-Critic Algorithms and Applications, code
  - Addressing Function Approximation Error in Actor-Critic Methods, code
  - Deep Residual Reinforcement Learning, code
  - Soft Actor-Critic for Discrete Action Settings, code
  - Improving Sample Efficiency in Model-Free Reinforcement Learning from Images, code
  - CURL: Contrastive Unsupervised Representations for Reinforcement Learning, code
- Model-based RL
- Imitation Learning
Some useful techniques, such as GAE, ApeX, and Spectral Normalization (see the table above), are also implemented.
There are several ways to install tf2rl; the recommended way is to install from PyPI. If TensorFlow is already installed, the installer tries to identify the best matching version of TensorFlow Probability.

You can install tf2rl from PyPI:
$ pip install tf2rl
You can also install from source:
$ git clone https://github.com/keiohta/tf2rl.git tf2rl
$ cd tf2rl
$ pip install .
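After installing with either method, you can quickly check that the package is importable. This is just a minimal sanity check and assumes nothing about tf2rl beyond the package name:

```python
# Minimal sanity check: the import should succeed without errors.
import tf2rl

print(tf2rl.__name__)  # prints "tf2rl" if the installation is visible to Python
```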
Instead of installing tf2rl on your (virtual) system, you can use the preinstalled Docker containers.
Only the first execution takes extra time to download the container image.
In the following commands, replace <version> with the version tag you want to use.
The following command starts the preinstalled CPU container:
$ docker run -it ghcr.io/keiohta/tf2rl/cpu:<version> bash
If you also want to mount your local directory /local/dir/path at /mount/point inside the container:
$ docker run -it -v /local/dir/path:/mount/point ghcr.io/keiohta/tf2rl/cpu:<version> bash
WARNING: We have encountered unresolved errors when running ApeX multiprocess learning.
Requirements for the GPU (NVIDIA) container:
- Linux
- NVIDIA GPU
  - TF2.2-compatible driver
- Docker 19.03 or later
The following command starts the preinstalled GPU container:
$ docker run --gpus all -it ghcr.io/keiohta/tf2rl/nvidia:<version> bash
If you also want to mount your local directory /local/dir/path at /mount/point inside the container:
$ docker run --gpus all -it -v /local/dir/path:/mount/point ghcr.io/keiohta/tf2rl/nvidia:<version> bash
To check whether the container can see the GPU correctly, run the following command inside the container:
$ nvidia-smi
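You can also check from Python whether TensorFlow inside the container sees the GPU. This sketch uses only the standard TensorFlow 2 API and makes no assumptions about tf2rl itself:

```python
import tensorflow as tf

# Lists the GPUs visible to TensorFlow; an empty list means no GPU is available.
print(tf.config.list_physical_devices("GPU"))
```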
Here is a quick example of how to train a DDPG agent on the Pendulum environment:
import gym
from tf2rl.algos.ddpg import DDPG
from tf2rl.experiments.trainer import Trainer
parser = Trainer.get_argument()
parser = DDPG.get_argument(parser)
args = parser.parse_args()
env = gym.make("Pendulum-v1")
test_env = gym.make("Pendulum-v1")
policy = DDPG(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.high.size,
    gpu=-1,  # Run on CPU. If you want to run on GPU, specify the GPU number.
    memory_capacity=10000,
    max_action=env.action_space.high[0],
    batch_size=32,
    n_warmup=500)
trainer = Trainer(policy, env, args, test_env=test_env)
trainer()
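After training, you can roll out the learned policy yourself. The sketch below is only an illustration under assumptions: it assumes the classic gym API (where `reset` returns an observation and `step` returns four values) and that the policy exposes a `get_action` method; check the tf2rl examples for the exact interface.

```python
# Illustrative rollout sketch (assumes classic gym API and policy.get_action()).
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    action = policy.get_action(obs, test=True)  # assumed: deterministic action at test time
    obs, reward, done, _ = env.step(action)
    episode_reward += reward
print("episode reward:", episode_reward)
```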
You can find the implemented algorithms in the examples directory. For example, to train a DDPG agent:
# You must change directory to avoid importing local files
$ cd examples
# For available options, specify --help or read the code
$ python run_ddpg.py [options]
You can see the training progress/results from TensorBoard as follows:
# When executing `run_**.py`, its logs are automatically generated under `./results`
$ tensorboard --logdir results
In basic usage, all you need to do is initialize one of the policy classes and the `Trainer` class. As an option, tf2rl supports a command-line program style, so you can also pass configuration parameters from command-line arguments.

The `Trainer` class and the policy classes have a class method `get_argument`, which creates or updates an `ArgumentParser` object. You can parse the command-line arguments with the `ArgumentParser.parse_args` method, which returns a `Namespace` object. The policy's constructor options can be extracted from the `Namespace` object explicitly, and the `Trainer` constructor accepts the `Namespace` object directly:
from tf2rl.algos.dqn import DQN
from tf2rl.experiments.trainer import Trainer
env = ...  # Create a gym.Env-like environment.
parser = DQN.get_argument(Trainer.get_argument())
args = parser.parse_args()
policy = DQN(enable_double_dqn=args.enable_double_dqn,
             enable_dueling_dqn=args.enable_dueling_dqn,
             enable_noisy_dqn=args.enable_noisy_dqn)
trainer = Trainer(policy, env, args)
trainer()
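If you want to see which options `get_argument` registers and what their default values are, you can parse an empty argument list. This small sketch relies only on standard `argparse` behavior and assumes no required arguments are registered:

```python
from tf2rl.algos.dqn import DQN
from tf2rl.experiments.trainer import Trainer

# Parse an empty argument list to obtain the default configuration,
# then print it as a plain dict for inspection.
parser = DQN.get_argument(Trainer.get_argument())
default_args = parser.parse_args([])
print(vars(default_args))
```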
`ArgumentParser` does not fit well with Jupyter-Notebook-like environments. Instead of a `Namespace` object, the `Trainer` constructor can also accept a `dict` as its `args` argument:
from tf2rl.algos.dqn import DQN
from tf2rl.experiments.trainer import Trainer
env = ...  # Create a gym.Env-like environment.
policy = DQN( ... )
trainer = Trainer(policy, env, {"max_steps": int(1e+6), ... })
trainer()
The `Trainer` class saves logs and models under `<logdir>/%Y%m%dT%H%M%S.%f`. The default `logdir` is `"results"`, and it can be changed with the `--logdir` command-line argument or the `"logdir"` key in the constructor `args`.
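For example, the `dict` form of `args` can be combined with the `"logdir"` key (a minimal sketch; the policy and environment are assumed to be created as in the examples above):

```python
from tf2rl.experiments.trainer import Trainer

policy = ...  # any tf2rl policy, created as in the examples above
env = ...     # a gym.Env-like environment
# Redirect logs and models to ./my_results instead of the default ./results.
trainer = Trainer(policy, env, {"logdir": "my_results", "max_steps": int(1e+5)})
trainer()
```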
@misc{ota2020tf2rl,
  author = {Kei Ota},
  title = {TF2RL},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keiohta/tf2rl/}}
}