Simple Reinforcement Learning algorithms implemented in TensorFlow 2

Requires Python, TensorFlow 2, and OpenAI Gym.

SimpleRL

A personal Reinforcement Learning (RL) repository made to back up code I implemented.

What is this?

SimpleRL is a repository containing a variety of Deep Reinforcement Learning (Deep RL) algorithms implemented with TensorFlow 2. It exists mainly to back up code I wrote while studying RL, but also to help others learn Deep RL easily. To keep the code easy to understand, each algorithm is written as simply as possible. The repository will be updated with new Deep RL algorithms over time.

Algorithms

DQNs

Deep Q Network (DQN) and algorithms derived from it.

DQN (Deep Q-Networks)

Playing Atari with Deep Reinforcement Learning, Mnih et al, 2013

DDQN (Double Deep Q-Networks)

Deep Reinforcement Learning with Double Q-learning, van Hasselt et al, 2015.

Dueling DQN (Dueling Deep Q-Networks)

Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015.
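The key difference between DQN and Double DQN is how the TD target is built. Below is a minimal, framework-free sketch of the two targets (plain Python, purely illustrative; SimpleRL's actual implementations use TensorFlow 2, and the helper names here are invented):

```python
GAMMA = 0.99  # discount factor (illustrative value)

def dqn_target(reward, q_target_next, done):
    # Vanilla DQN: the target network both selects and evaluates
    # the next action via max over Q(s', .).
    return reward if done else reward + GAMMA * max(q_target_next)

def ddqn_target(reward, q_online_next, q_target_next, done):
    # Double DQN: the online network selects the action, the target
    # network evaluates it, reducing overestimation bias.
    if done:
        return reward
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + GAMMA * q_target_next[a_star]

q_online = [1.0, 2.0, 0.5]  # online net happens to overestimate action 1
q_target = [1.5, 0.8, 0.9]

print(dqn_target(1.0, q_target, False))             # 1 + 0.99 * max(q_target)
print(ddqn_target(1.0, q_online, q_target, False))  # 1 + 0.99 * q_target[argmax q_online]
```

With the same transition, the two targets differ because the online network's favorite action (index 1) has a lower value under the target network than the target network's own maximum.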


DDPG (Deep Deterministic Policy Gradient)

Continuous Control With Deep Reinforcement Learning, Lillicrap et al, 2015.

TD3 (Twin Delayed Deep Deterministic Policy Gradient)

Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al, 2018.

SAC_v1 (Soft Actor-Critic)

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al, 2018.

SAC_v2 (Soft Actor-Critic)

Soft Actor-Critic Algorithms and Applications, Haarnoja et al, 2018.
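TD3 and the two SAC versions share two ingredients: a clipped double-Q target (the minimum of two critics), and, for SAC, an entropy bonus weighted by a temperature alpha. A toy sketch of the soft TD target (plain Python, illustrative only; the fixed temperature here stands in for the automatically tuned one in SAC_v2):

```python
GAMMA = 0.99
ALPHA = 0.2  # entropy temperature; SAC_v1 fixes it, SAC_v2 tunes it automatically

def soft_td_target(reward, q1_next, q2_next, log_prob_next, done):
    # Clipped double-Q (as in TD3): take the min of the two critics,
    # then subtract the entropy term alpha * log pi(a'|s') (SAC).
    if done:
        return reward
    soft_q = min(q1_next, q2_next) - ALPHA * log_prob_next
    return reward + GAMMA * soft_q

# One transition: the next action was sampled with log-probability -1.2
print(soft_td_target(0.5, 2.0, 1.8, -1.2, False))
```

A low-probability next action (more negative log-probability) raises the target, which is exactly the entropy-maximizing pressure that distinguishes SAC from DDPG/TD3.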


REINFORCE

Simple statistical gradient-following algorithms for connectionist reinforcement learning, Ronald J. Williams, 1992.

VPG (Vanilla Policy Gradient)

Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton et al, 2000.

TRPO (Trust Region Policy Optimization)

Trust Region Policy Optimization, Schulman et al, 2015.

PPO (Proximal Policy Optimization)

Proximal Policy Optimization Algorithms, Schulman et al, 2017.
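All four on-policy methods above start from Monte-Carlo returns. A minimal sketch of the discounted return-to-go computation they share (plain Python; VPG, TRPO, and PPO typically go on to form advantages from these returns, which this sketch does not cover):

```python
GAMMA = 0.99

def discounted_returns(rewards, gamma=GAMMA):
    # Returns-to-go G_t = r_t + gamma * G_{t+1}, computed backwards
    # over one episode's reward sequence.
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

print(discounted_returns([1.0, 0.0, 2.0], gamma=0.5))  # → [1.5, 1.0, 2.0]
```

REINFORCE weights each log-probability gradient by G_t directly, which is why it only updates after full episodes.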

ImageRL

RL algorithms that learn a policy from pixels.

CURL (Contrastive Unsupervised Representations for Reinforcement Learning)

CURL: Contrastive Unsupervised Representations for Reinforcement Learning, Srinivas et al, 2020.

RAD (Reinforcement learning with Augmented Data)

RAD: Reinforcement Learning with Augmented Data, Laskin et al, 2020.

SAC_AE (Soft Actor-Critic with AutoEncoder)

Improving Sample Efficiency in Model-Free Reinforcement Learning from Images, Yarats et al, 2020.

DBC (Deep Bisimulation for Control)

Learning Invariant Representations for Reinforcement Learning without Reconstruction, A. Zhang et al, 2020.
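CURL and RAD both lean on image augmentation, with random crop as the workhorse. A toy, framework-free sketch of the idea (here an "image" is a 2D list of values; the real implementations crop batched HxWxC tensors):

```python
import random

def random_crop(image, out_h, out_w, rng=random):
    # Random crop, the core augmentation in RAD/CURL-style pipelines:
    # pick a random top-left corner, then slice out an out_h x out_w window.
    h, w = len(image), len(image[0])
    top = rng.randrange(h - out_h + 1)
    left = rng.randrange(w - out_w + 1)
    return [row[left:left + out_w] for row in image[top:top + out_h]]

img = [[r * 10 + c for c in range(5)] for r in range(5)]
crop = random_crop(img, 3, 3)
print(len(crop), len(crop[0]))  # → 3 3
```

Because each crop sees a shifted view of the same scene, the encoder is pushed toward translation-robust features without any change to the RL objective itself.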


D2RL (Deep Dense Architectures in Reinforcement Learning)

D2RL: Deep Dense Architectures in Reinforcement Learning, Sinha et al, 2020.
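The core D2RL idea is a densely connected MLP in which the raw input is concatenated to the input of every hidden layer after the first. A toy sketch with plain-Python "layers" (real networks use TF 2 Dense layers, and the final output layer, which D2RL leaves unchanged, is omitted here):

```python
def d2rl_forward(state, hidden_layers):
    # D2RL-style dense MLP: the raw input is re-concatenated to the
    # input of every hidden layer after the first. Features are plain
    # lists here, so list addition plays the role of concatenation.
    h = hidden_layers[0](state)
    for layer in hidden_layers[1:]:
        h = layer(h + state)
    return h

# Toy "layers" that just sum their inputs into a single feature.
layers = [lambda x: [sum(x)], lambda x: [sum(x)]]
print(d2rl_forward([1.0, 2.0], layers))  # → [6.0]
```

The skip connections keep the raw state visible to deep layers, which is what lets D2RL stack more layers without the performance collapse plain deep MLPs show in RL.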


Installation

This code was developed on Windows using Anaconda. The full environment is exported as a YAML file (tf2.yaml); with Anaconda installed, it can be recreated with `conda env create -f tf2.yaml`.

How to use

You can run the algorithms using the example scripts in the SimpleRL/Example folder. Each run_XXX.py script defines the hyperparameters for an experiment and wires together the three pieces an experiment needs: the RL environment, the algorithm, and its trainer.
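The division of labor reads roughly like the sketch below. All class names here are invented for illustration, not SimpleRL's actual API; see the run_XXX.py scripts for the real wiring:

```python
import random

# Illustrative only: SimpleRL's real classes and arguments live in the
# run_XXX.py examples; these stand-ins mimic the env/algorithm/trainer split.

class DummyEnv:
    # Stands in for a Gym environment: reset() -> state, step(a) -> (s', r, done).
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        return float(self.t), 1.0, self.t >= 5

class RandomAgent:
    # Stands in for an algorithm: maps states to actions.
    def get_action(self, state):
        return random.choice([0, 1])

class Trainer:
    # The trainer owns the interaction loop between env and agent.
    def __init__(self, env, agent):
        self.env, self.agent = env, agent
    def run_episode(self):
        state, total, done = self.env.reset(), 0.0, False
        while not done:
            state, reward, done = self.env.step(self.agent.get_action(state))
            total += reward
        return total

print(Trainer(DummyEnv(), RandomAgent()).run_episode())  # → 5.0
```

In a real run script, the dummy pieces are replaced by a Gym environment, one of the algorithms listed above, and that algorithm's trainer, with the hyperparameters set at the top of the script.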

Warning

There are some unresolved errors and issues you should know about:

  1. Scores from the official benchmarks may not be reproduced; results can vary with the random seed, hyperparameters, etc.
  2. In particular, the on-policy algorithms (REINFORCE, VPG, TRPO, PPO) show poor performance in continuous-action environments for unknown reasons.
  3. DBC (Deep Bisimulation for Control) also seems to perform worse than reported in the official paper.

Any advice on the code is always welcome!
