Simple Reinforcement Learning algorithms implemented in TensorFlow 2

Requires Python, TensorFlow 2, and OpenAI Gym.

SimpleRL

A personal Reinforcement Learning (RL) repository made to back up code I implemented.

What is this?

SimpleRL is a repository containing a variety of Deep Reinforcement Learning (Deep RL) algorithms implemented with TensorFlow 2. It exists mainly to back up code I wrote while studying RL, but also to help others learn Deep RL easily. To keep the code easy to understand, each algorithm is written as simply as possible. The repository will be updated with new Deep RL algorithms over time.

Algorithms

DQNs

Deep Q Network (DQN) and algorithms derived from it.

DQN (Deep Q-Networks)

Playing Atari with Deep Reinforcement Learning, Mnih et al, 2013

DDQN (Double Deep Q-Networks)

Deep Reinforcement Learning with Double Q-learning, van Hasselt et al, 2015.

Dueling DQN (Dueling Deep Q-Networks)

Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015.
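The key difference between DQN and Double DQN is how the TD target is built. Below is a minimal, framework-free sketch of the two targets (plain Python, purely illustrative; SimpleRL's actual implementations use TensorFlow 2, and the helper names here are invented):

```python
GAMMA = 0.99  # discount factor (illustrative value)

def dqn_target(reward, q_target_next, done):
    # Vanilla DQN: the target network both selects and evaluates
    # the next action via max over Q(s', .).
    return reward if done else reward + GAMMA * max(q_target_next)

def ddqn_target(reward, q_online_next, q_target_next, done):
    # Double DQN: the online network selects the action, the target
    # network evaluates it, reducing overestimation bias.
    if done:
        return reward
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + GAMMA * q_target_next[a_star]

q_online = [1.0, 2.0, 0.5]  # online net happens to overestimate action 1
q_target = [1.5, 0.8, 0.9]

print(dqn_target(1.0, q_target, False))             # 1 + 0.99 * max(q_target)
print(ddqn_target(1.0, q_online, q_target, False))  # 1 + 0.99 * q_target[argmax q_online]
```

With the same transition, the two targets differ because the online network's favorite action (index 1) has a lower value under the target network than the target network's own maximum.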


DDPG (Deep Deterministic Policy Gradient)

Continuous Control With Deep Reinforcement Learning, Lillicrap et al, 2015.

TD3 (Twin Delayed Deep Deterministic Policy Gradient)

Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al, 2018.

SAC_v1 (Soft Actor-Critic)

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al, 2018.

SAC_v2 (Soft Actor-Critic)

Soft Actor-Critic Algorithms and Applications, Haarnoja et al, 2018.
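TD3 and the two SAC versions share two ingredients: a clipped double-Q target (the minimum of two critics), and, for SAC, an entropy bonus weighted by a temperature alpha. A toy sketch of the soft TD target (plain Python, illustrative only; the fixed temperature here stands in for the automatically tuned one in SAC_v2):

```python
GAMMA = 0.99
ALPHA = 0.2  # entropy temperature; SAC_v1 fixes it, SAC_v2 tunes it automatically

def soft_td_target(reward, q1_next, q2_next, log_prob_next, done):
    # Clipped double-Q (as in TD3): take the min of the two critics,
    # then subtract the entropy term alpha * log pi(a'|s') (SAC).
    if done:
        return reward
    soft_q = min(q1_next, q2_next) - ALPHA * log_prob_next
    return reward + GAMMA * soft_q

# One transition: the next action was sampled with log-probability -1.2
print(soft_td_target(0.5, 2.0, 1.8, -1.2, False))
```

A low-probability next action (more negative log-probability) raises the target, which is exactly the entropy-maximizing pressure that distinguishes SAC from DDPG/TD3.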


REINFORCE

Simple statistical gradient-following algorithms for connectionist reinforcement learning, Ronald J. Williams, 1992.

VPG (Vanilla Policy Gradient)

Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton et al, 2000.

TRPO (Trust Region Policy Optimization)

Trust Region Policy Optimization, Schulman et al, 2015.

PPO (Proximal Policy Optimization)

Proximal Policy Optimization Algorithms, Schulman et al, 2017.
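All four on-policy methods above start from Monte-Carlo returns. A minimal sketch of the discounted return-to-go computation they share (plain Python; VPG, TRPO, and PPO typically go on to form advantages from these returns, which this sketch does not cover):

```python
GAMMA = 0.99

def discounted_returns(rewards, gamma=GAMMA):
    # Returns-to-go G_t = r_t + gamma * G_{t+1}, computed backwards
    # over one episode's reward sequence.
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # → [1.5, 1.0, 2.0]
```

REINFORCE weights each log-probability gradient by G_t directly, which is why it only updates after full episodes.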

ImageRL

RL algorithms that learn a policy from pixels.

CURL (Contrastive Unsupervised Representations for Reinforcement Learning)

CURL: Contrastive Unsupervised Representations for Reinforcement Learning, Srinivas et al, 2020.

RAD (Reinforcement learning with Augmented Data)

RAD: Reinforcement Learning with Augmented Data, Laskin et al, 2020.

SAC_AE (Soft Actor-Critic with AutoEncoder)

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images, Yarats et al, 2020.

DBC (Deep Bisimulation for Control)

Learning Invariant Representations for Reinforcement Learning without Reconstruction, A. Zhang et al, 2020.
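CURL and RAD both lean on image augmentation, with random crop as the workhorse. A toy, framework-free sketch of the idea (here an "image" is a 2D list of values; the real implementations crop batched HxWxC tensors):

```python
import random

def random_crop(image, out_h, out_w, rng=random):
    # Random crop, the core augmentation in RAD/CURL-style pipelines:
    # pick a random top-left corner, then slice out an out_h x out_w window.
    h, w = len(image), len(image[0])
    top = rng.randrange(h - out_h + 1)
    left = rng.randrange(w - out_w + 1)
    return [row[left:left + out_w] for row in image[top:top + out_h]]

img = [[r * 10 + c for c in range(5)] for r in range(5)]
crop = random_crop(img, 3, 3)
print(len(crop), len(crop[0]))  # → 3 3
```

Because each crop sees a shifted view of the same scene, the encoder is pushed toward translation-robust features without any change to the RL objective itself.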


D2RL (Deep Dense Architectures in Reinforcement Learning)

D2RL: Deep Dense Architectures in Reinforcement Learning, Sinha et al, 2020.
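The core D2RL idea is a densely connected MLP in which the raw input is concatenated to the input of every hidden layer after the first. A toy sketch with plain-Python "layers" (real networks use TF 2 Dense layers, and the final output layer, which D2RL leaves unchanged, is omitted here):

```python
def d2rl_forward(state, hidden_layers):
    # D2RL-style dense MLP: the raw input is re-concatenated to the
    # input of every hidden layer after the first. Features are plain
    # lists here, so list addition plays the role of concatenation.
    h = hidden_layers[0](state)
    for layer in hidden_layers[1:]:
        h = layer(h + state)
    return h

# Toy "layers" that just sum their inputs into a single feature.
layers = [lambda x: [sum(x)], lambda x: [sum(x)]]
print(d2rl_forward([1.0, 2.0], layers))  # → [6.0]
```

The skip connections keep the raw state visible to deep layers, which is what lets D2RL stack more layers without the performance collapse plain deep MLPs show in RL.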


Installation

This code was developed on Windows using Anaconda. The full environment is exported as a YAML file (tf2.yaml); with Anaconda installed, it can be recreated with `conda env create -f tf2.yaml`.

How to use

You can run the algorithms using the example scripts in the SimpleRL/Example folder. Each run_XXX.py script defines the hyperparameters for an experiment and wires together the three pieces an experiment needs: the RL environment, the algorithm, and its trainer.
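The division of labor reads roughly like the sketch below. All class names here are invented for illustration, not SimpleRL's actual API; see the run_XXX.py scripts for the real wiring:

```python
import random

# Illustrative only: SimpleRL's real classes and arguments live in the
# run_XXX.py examples; these stand-ins mimic the env/algorithm/trainer split.

class DummyEnv:
    # Stands in for a Gym environment: reset() -> state, step(a) -> (s', r, done).
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 5

class RandomAgent:
    # Stands in for an algorithm: maps states to actions.
    def get_action(self, state):
        return random.choice([0, 1])

class Trainer:
    # The trainer owns the interaction loop between env and agent.
    def __init__(self, env, agent):
        self.env, self.agent = env, agent
    def run_episode(self):
        state, total, done = self.env.reset(), 0.0, False
        while not done:
            state, reward, done = self.env.step(self.agent.get_action(state))
            total += reward
        return total

print(Trainer(DummyEnv(), RandomAgent()).run_episode())  # → 5.0
```

In a real run script, the dummy pieces are replaced by a Gym environment, one of the algorithms listed above, and that algorithm's trainer, with the hyperparameters set at the top of the script.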

Warning

There are some unresolved errors and issues you should know about:

  1. Scores from the official benchmarks may not be reproduced; results can vary with the random seed, hyperparameters, etc.
  2. In particular, the on-policy algorithms (REINFORCE, VPG, TRPO, PPO) show poor performance in continuous-action environments for unknown reasons.
  3. DBC (Deep Bisimulation for Control) also seems to perform worse than reported in the official paper.

Any advice on the code is always welcome!
