• Stars: 232
  • Rank: 172,358 (Top 4%)
  • Language: Python
  • License: MIT License
  • Created over 4 years ago
  • Updated over 3 years ago


Repository Details

PyTorch implementation of Soft-Actor-Critic and Prioritized Experience Replay (PER) + Emphasizing Recent Experience (ERE) + Munchausen RL + D2RL and parallel Environments.

Soft-Actor-Critic-and-Extensions

PyTorch implementation of Soft-Actor-Critic with the extensions PER + ERE + Munchausen RL and the option to use multiple environments for parallel data collection and faster training.


This repository includes the newest Soft-Actor-Critic version (Paper, 2019) as well as the following extensions for SAC:

  • Prioritized Experience Replay (PER)
  • Emphasizing Recent Experience without Forgetting the Past (ERE)
  • Munchausen Reinforcement Learning (Paper); a target sketch follows below
  • D2RL: Deep Dense Architectures in Reinforcement Learning (Paper)
  • N-step bootstrapping
  • Parallel environments
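
For orientation, Munchausen RL augments the critic target with a scaled, clipped log-policy bonus for the action actually taken. Below is a minimal PyTorch sketch of such a target; the function name, argument names, and default values are illustrative assumptions and not taken from this repository's code.

import torch

# Hedged sketch of a Munchausen-augmented soft Q target (illustrative only).
# q_next:      min over the target critics at (s', a' ~ pi)
# log_pi_next: log pi(a'|s') for the sampled next actions
# log_pi_curr: log pi(a|s) for the actions stored in the batch
def munchausen_target(rewards, dones, q_next, log_pi_next, log_pi_curr,
                      gamma=0.99, tau=0.2, m_alpha=0.9, lo_clip=-1.0):
    # Munchausen bonus: scaled, clipped log-policy of the taken action
    bonus = m_alpha * tau * torch.clamp(log_pi_curr, min=lo_clip, max=0.0)
    # standard entropy-regularized (soft) bootstrap value
    soft_value = q_next - tau * log_pi_next
    return rewards + bonus + gamma * (1.0 - dones) * soft_value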

In the paper implementation of ERE the authors used an older version of SAC, whereas this repository contains the newest version of SAC as well as a proportional-prioritization implementation of PER.
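
With proportional prioritization, the probability of sampling a transition is proportional to its absolute TD error raised to a power alpha. A small NumPy sketch of that sampling rule follows; the function and parameter names are illustrative and not this repository's API.

import numpy as np

# Proportional prioritization (sketch): P(i) = p_i^alpha / sum_k p_k^alpha,
# with p_i = |TD error of transition i| + eps so every transition stays sampleable.
def sample_prioritized(td_errors, batch_size, alpha=0.6, eps=1e-5):
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    indices = np.random.choice(len(td_errors), size=batch_size, p=probs)
    return indices, probs[indices]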

TODO:

  • Add IQN critic [X] (with the IQN critic training is roughly 10x slower; this needs to be fixed)
  • Add D2RL IQN critic [ ]
  • Create a distributed SAC version with ray [ ]
  • Add N-step bootstrapping [X]
  • Check performance with all add-ons [ ]
  • Add pybulletgym support [X]

Dependencies

Trained and tested on:

Python 3.6
PyTorch 1.7.0  
Numpy 1.15.2 
gym 0.10.11 
pybulletgym
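
A minimal setup along these lines might look as follows (the version pins come from the list above; installing pybulletgym from its GitHub repository, github.com/benelot/pybullet-gym, is an assumption about the intended source):

pip install torch==1.7.0 numpy==1.15.2 gym==0.10.11
git clone https://github.com/benelot/pybullet-gym.git
cd pybullet-gym
pip install -e .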

How to use:

The new script combines all extensions; the add-ons can be enabled simply by setting the corresponding flags.

python run.py -info sac

Parameters: to list all options, run python run.py -h

-env, Environment name, default = Pendulum-v0
-per, Adds Prioritized Experience Replay to the agent if set to 1, default = 0
-munchausen, Adds Munchausen RL to the agent if set to 1, default = 0
-dist, --distributional, Uses a distributional IQN critic network if set to 1, default = 0
-d2rl, Uses deep actor and deep critic networks (D2RL) if set to 1, default = 0
-n_step, Number of steps for n-step bootstrapping, default = 1
-ere, Adds Emphasizing Recent Experience to the agent if set to 1, default = 0
-info, Information or name of the run
-frames, Number of training interactions with the environment, default = 100000
-eval_every, Number of interactions after which evaluation runs are performed, default = 5000
-eval_runs, Number of evaluation runs performed, default = 1
-seed, Seed for the environment and torch network weights, default = 0
-lr_a, Learning rate for the actor network, default = 3e-4
-lr_c, Learning rate for the critic network, default = 3e-4
-a, --alpha, Entropy alpha value; if not set, the value is learned by the agent
-layer_size, Number of nodes per neural network layer, default = 256
-repm, --replay_memory, Size of the replay memory, default = 1e6
-bs, --batch_size, Batch size, default = 256
-t, --tau, Soft-update factor tau, default = 0.005
-g, --gamma, Discount factor gamma, default = 0.99
--saved_model, Load a saved model to perform a test run!
-w, --worker, Number of parallel workers (note: the batch size increases proportionally with the number of workers!), default = 1
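
For example, several add-ons can be combined in a single run; the flag values below are only an illustration, and the environment name assumes pybulletgym is installed:

python run.py -env HalfCheetahPyBulletEnv-v0 -per 1 -munchausen 1 -n_step 2 -w 4 -info sac_full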

Old scripts

With the old scripts you can still run three different SAC versions:

Run regular SAC: python SAC.py -env Pendulum-v0 -ep 200 -info sac

Run SAC + PER: python SAC_PER.py -env Pendulum-v0 -ep 200 -info sac_per

Run SAC + ERE + PER: python SAC_ERE_PER.py -env Pendulum-v0 -frames 20000 -info sac_per_ere

For further input arguments and hyperparameters, check the code.

Observe training results

tensorboard --logdir=runs

Results

As can be seen, the extensions do not always improve the algorithm. The effect depends on the environment and differs from environment to environment, as the authors of the ERE paper also note.

[Training curve plots: Pendulum, LLC]

  • All runs were performed without hyperparameter tuning

PyBullet Environments

[Training curve plots: HalfCheetah, Hopper]

Comparison SAC and D2RL-SAC

[Training curve plot: D2RL-Pendulum]

Comparison SAC and M-SAC

[Training curve plots: munchausenRL, munchausenRL2]

Help and issues:

I'm open to feedback, bug reports, improvements, or anything else. Just leave me a message or contact me.

Author

  • Sebastian Dittert

Feel free to use this code for your own projects or research.

@misc{SAC,
  author = {Dittert, Sebastian},
  title = {PyTorch Implementation of Soft-Actor-Critic-and-Extensions},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/BY571/Soft-Actor-Critic-and-Extensions}},
}

More Repositories

1

DQN-Atari-Agents

DQN-Atari-Agents: Modularized & Parallel PyTorch implementation of several DQN Agents, including DDQN, Dueling DQN, Noisy DQN, C51, Rainbow, and DRQN
Jupyter Notebook
102
star
2

CQL

PyTorch implementation of the Offline Reinforcement Learning algorithm CQL. Includes the versions DQN-CQL and SAC-CQL for discrete and continuous action spaces.
Python
85
star
3

Upside-Down-Reinforcement-Learning

Upside-Down Reinforcement Learning implementation in PyTorch. Based on the paper published by Jürgen Schmidhuber.
Jupyter Notebook
71
star
4

Deep-Reinforcement-Learning-Algorithm-Collection

Collection of Deep Reinforcement Learning Algorithms implemented in PyTorch.
Jupyter Notebook
65
star
5

IQN-and-Extensions

PyTorch Implementation of Implicit Quantile Networks (IQN) for Distributional Reinforcement Learning with additional extensions like PER, Noisy layer, N-step bootstrapping, Dueling architecture and parallel env support.
Jupyter Notebook
65
star
6

Munchausen-RL

PyTorch implementation of the Munchausen Reinforcement Learning Algorithms M-DQN and M-IQN
Jupyter Notebook
36
star
7

SAC_discrete

PyTorch implementation of the discrete Soft-Actor-Critic algorithm.
Python
31
star
8

Implicit-Q-Learning

PyTorch implementation of the implicit Q-learning algorithm (IQL)
Python
30
star
9

FQF-and-Extensions

PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF) and Extensions: N-step Bootstrapping, PER, Noisy Layer, Dueling Networks, and parallelization.
Jupyter Notebook
24
star
10

QR-DQN

PyTorch implementation of QR-DQN: Distributional Reinforcement Learning with Quantile Regression
Jupyter Notebook
22
star
11

Normalized-Advantage-Function-NAF-

PyTorch implementation of the Q-Learning Algorithm Normalized Advantage Function for continuous control problems + PER and N-step Method
Jupyter Notebook
20
star
12

Randomized-Ensembled-Double-Q-learning-REDQ-

Pytorch implementation of Randomized Ensembled Double Q-learning (REDQ)
Jupyter Notebook
18
star
13

D4PG

PyTorch implementation of D4PG with the SOTA IQN critic instead of C51. The implementation also includes the extensions Munchausen RL and D2RL, which can be added to D4PG to improve its performance.
Python
12
star
14

GANs

ClusterGAN PyTorch implementation
Jupyter Notebook
11
star
15

Medium_Code_Examples

Implementation of fundamental concepts and algorithms for reinforcement learning
Jupyter Notebook
11
star
16

OFENet

Jupyter Notebook
10
star
17

Genetic-Algorithms-Neural-Network-Optimization

Genetic algorithm for neural network architecture and hyperparameter optimization, as well as neural network weight optimization with a genetic algorithm
Jupyter Notebook
10
star
18

GARNE-Genetic-Algorithm-with-Recurrent-Network-and-Novelty-Exploration

GARNE: Genetic-Algorithm-with-Recurrent-Network-and-Novelty-Exploration
Python
7
star
19

MBPO

Python
6
star
20

Hindsight-Experience-Replay

Jupyter Notebook
4
star
21

D4PG-ray

Distributed PyTorch implementation of D4PG with ray, using a SOTA IQN critic instead of C51. The implementation also includes the extensions Munchausen RL and D2RL, which can be added to D4PG to improve its performance.
Python
4
star
22

pytorch-vmpo

PyTorch implementation of V-MPO
Python
3
star
23

PETS-MPC

Python
3
star
24

RA-PPO

PyTorch implementation of Risk-Averse Policy Learning
Python
3
star
25

Udacity-DRL-Nanodegree-P3-Multiagent-RL-

Multi-Agent RL competition on Unity's Tennis environment
ASP
2
star
26

CEN-Network

Jupyter Notebook
2
star
27

TD3-and-Extensions

PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradient (TD3), including additional extensions to improve the algorithm's performance.
Python
1
star
28

DRQN

Jupyter Notebook
1
star