A personal Reinforcement Learning (RL) repo made to back up code I implemented
SimpleRL is a repository that contains a variety of Deep Reinforcement Learning (Deep RL) algorithms implemented with TensorFlow 2. This repo was mainly made to back up code that I implemented while studying RL, but also to let others learn Deep RL easily. To keep the RL code easy to understand, each algorithm is written as simply as possible. This repository will be constantly updated with new Deep RL algorithms.
Deep Q Network (DQN) and algorithms derived from it (a minimal Q-target sketch follows this list).
Playing Atari with Deep Reinforcement Learning, Mnih et al, 2013
Deep Reinforcement Learning with Double Q-learning, van Hasselt et al, 2015.
Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015.
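As a rough illustration of what these papers change, here is a minimal sketch of the vanilla DQN target next to the Double DQN target. This is not the repo's actual code; `q_net` and `target_net` are assumed to be Keras models mapping a batch of states to per-action Q-values.

```python
import tensorflow as tf

def dqn_targets(target_net, rewards, next_states, dones, gamma=0.99):
    # Vanilla DQN (Mnih et al, 2013): bootstrap from the max Q-value
    # of a slowly-updated target network.
    next_q = tf.reduce_max(target_net(next_states), axis=1)
    return rewards + gamma * (1.0 - dones) * next_q

def double_dqn_targets(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    # Double DQN (van Hasselt et al, 2015): the online network selects the
    # action, the target network evaluates it, reducing overestimation bias.
    best_actions = tf.argmax(q_net(next_states), axis=1)
    next_q = tf.gather(target_net(next_states), best_actions, batch_dims=1)
    return rewards + gamma * (1.0 - dones) * next_q
```

The dueling architecture (Wang et al, 2015) instead changes the network itself, splitting it into separate value and advantage streams, so the target computation above stays the same.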
Deep Deterministic Policy Gradient (DDPG) and off-policy actor-critic algorithms derived from it (a target-computation sketch follows this list).
Continuous Control With Deep Reinforcement Learning, Lillicrap et al, 2015.
Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al, 2018.
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al, 2018.
Soft Actor-Critic Algorithms and Applications, Haarnoja et al, 2018.
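A core trick shared by TD3 and SAC is the clipped double-Q target. Below is a minimal sketch of the TD3 version (Fujimoto et al, 2018), assuming hypothetical `actor_target` and twin `critic1_target`/`critic2_target` Keras models where the critics take concatenated (state, action) inputs; this is not the repo's actual code.

```python
import tensorflow as tf

def td3_targets(actor_target, critic1_target, critic2_target,
                rewards, next_states, dones,
                gamma=0.99, noise_std=0.2, noise_clip=0.5, action_limit=1.0):
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    next_actions = actor_target(next_states)
    noise = tf.clip_by_value(
        tf.random.normal(tf.shape(next_actions), stddev=noise_std),
        -noise_clip, noise_clip)
    next_actions = tf.clip_by_value(next_actions + noise,
                                    -action_limit, action_limit)
    # Clipped double-Q: take the minimum of the two target critics.
    target_input = tf.concat([next_states, next_actions], axis=1)
    min_q = tf.minimum(critic1_target(target_input),
                       critic2_target(target_input))
    return rewards + gamma * (1.0 - dones) * tf.squeeze(min_q, axis=1)
```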
Policy gradient algorithms (a loss-function sketch follows this list).
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Ronald J. Williams, 1992.
Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton et al, 2000.
Trust Region Policy Optimization, Schulman et al, 2015.
Proximal Policy Optimization Algorithms, Schulman et al, 2017.
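The main difference between these methods is the policy loss. As one concrete example, here is a minimal sketch of the PPO clipped surrogate objective (Schulman et al, 2017), not the repo's actual implementation:

```python
import tensorflow as tf

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_ratio=0.2):
    # Probability ratio between the new and old policies.
    ratio = tf.exp(log_probs - old_log_probs)
    clipped = tf.clip_by_value(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio)
    # Take the pessimistic (minimum) of the clipped and unclipped objectives,
    # then negate because optimizers minimize.
    return -tf.reduce_mean(tf.minimum(ratio * advantages,
                                      clipped * advantages))
```

REINFORCE and VPG drop the clipping and use a plain `-log_probs * returns` (or advantages) loss, while TRPO enforces the trust region with a KL-divergence constraint instead of a clip.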
RL algorithms that learn a policy from pixels (a data-augmentation sketch follows this list).
CURL: Contrastive Unsupervised Representation Learning for Sample-Efficient Reinforcement Learning, Srinivas et al, 2020.
RAD: Reinforcement Learning with Augmented Data, Laskin et al, 2020.
Improving Sample Efficiency in Model-Free Reinforcement Learning from Images, Yarats et al, 2020.
Learning Invariant Representations for Reinforcement Learning without Reconstruction, A. Zhang et al, 2020.
D2RL: Deep Dense Architectures in Reinforcement Learning, Sinha et al, 2020.
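A common ingredient in several of these methods (CURL, RAD) is image augmentation of replay-buffer samples. Here is a minimal sketch of per-image random cropping in TF2; the 84x84 output size is the conventional choice in these papers and an assumption here, not taken from the repo.

```python
import tensorflow as tf

def random_crop(imgs, out=84):
    # imgs: [B, H, W, C] image batch with H, W >= out (e.g. 100x100 renders).
    # tf.image.random_crop draws one offset per call, so map over the batch
    # to give every image its own independent crop, as RAD does.
    channels = imgs.shape[-1]
    return tf.map_fn(
        lambda img: tf.image.random_crop(img, [out, out, channels]), imgs)
```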
This code was built on Windows using Anaconda. The full environment is exported as a yaml file (tf2.yaml), so you can recreate it with `conda env create -f tf2.yaml`.
You can run the algorithms using the example scripts in the SimpleRL/Example folder. Each run_XXX.py script defines the hyperparameters for its experiment. To run an experiment, an RL environment, an algorithm, and its trainer are required; a hypothetical sketch of this pattern is shown below.
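A self-contained, hypothetical illustration of what a run script does (define hyperparameters, build an environment, an algorithm, and a trainer, then run). The `RandomAgent` and `Trainer` classes here are illustrative stand-ins using the classic gym reset/step API, not SimpleRL's actual classes; see SimpleRL/Example for the real scripts.

```python
import gym

# Hyperparameters live at the top of each run script.
ENV_NAME = 'Pendulum-v1'
MAX_EPISODES = 10

class RandomAgent:
    """Stand-in for an algorithm such as SAC; samples random actions."""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, obs):
        return self.action_space.sample()

class Trainer:
    """Stand-in trainer: runs episodes and reports returns."""
    def __init__(self, env, agent, max_episodes):
        self.env, self.agent, self.max_episodes = env, agent, max_episodes

    def run(self):
        for ep in range(self.max_episodes):
            obs, done, ret = self.env.reset(), False, 0.0
            while not done:
                obs, reward, done, info = self.env.step(self.agent.act(obs))
                ret += reward
            print(f'episode {ep}: return {ret:.1f}')

if __name__ == '__main__':
    env = gym.make(ENV_NAME)
    Trainer(env, RandomAgent(env.action_space), MAX_EPISODES).run()
```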
There are some unresolved errors and issues you should be aware of:
- Official benchmark scores are not guaranteed to be reproduced; results can vary with the random seed, hyperparameters, etc.
- In particular, the on-policy algorithms (REINFORCE, VPG, TRPO, PPO) show poor performance in continuous-action environments for unknown reasons.
- DBC (Deep Bisimulation for Control) also seems to perform worse than reported in the official paper.
Any advice on the code is always welcome! The following resources and implementations were referenced:
- https://spinningup.openai.com/en/latest/index.html
- https://github.com/keiohta/tf2rl
- https://github.com/reinforcement-learning-kr/pg_travel
- https://github.com/MishaLaskin/rad
- https://github.com/MishaLaskin/curl
- https://github.com/denisyarats/pytorch_sac_ae
- https://github.com/facebookresearch/deep_bisim4control