spinning-up-basic
Basic versions of agents from Spinning Up in Deep RL written in PyTorch. Designed to run quickly on CPU on Pendulum-v0 from OpenAI Gym.
To see differences between algorithms, try running diff -y <file1> <file2>, e.g., diff -y ddpg.py td3.py.
For MPI versions of on-policy algorithms, see the mpi branch.
Algorithms
- Vanilla Policy Gradient/Advantage Actor-Critic (vpg.py)
- Trust Region Policy Optimization (trpo.py)
- Proximal Policy Optimization (ppo.py)
- Deep Deterministic Policy Gradient (ddpg.py)
- Twin Delayed DDPG (td3.py)
- Soft Actor-Critic (sac.py)
- Deep Q-Network (dqn.py)
Implementation Details
Note that implementation details can have a significant effect on performance, as discussed in What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study. This codebase attempts to be as simple as possible, but note the following implementation choices:
- The on-policy algorithms use separate actor and critic networks, a state-independent policy standard deviation, per-minibatch advantage normalisation, and several critic updates per minibatch.
- The deterministic off-policy algorithms use layer normalisation.
- Soft actor-critic uses a transformed Normal distribution by default; this can also help the on-policy algorithms.
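As a rough illustration of some of these details, here is a minimal PyTorch sketch (not code from this repo; GaussianPolicy, Critic and normalise_advantages are hypothetical names) of a state-independent policy standard deviation, an optional tanh-transformed Normal distribution, per-minibatch advantage normalisation, and a layer-normalised critic:

```python
# Minimal sketch of some of the implementation details above; module and
# function names are illustrative, not those used in this codebase.
import torch
from torch import nn
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import TanhTransform


class GaussianPolicy(nn.Module):
  # Actor whose standard deviation is a learned parameter, independent of the
  # state, rather than an output of the network
  def __init__(self, state_size, action_size, hidden_size=64):
    super().__init__()
    self.mean = nn.Sequential(nn.Linear(state_size, hidden_size), nn.Tanh(),
                              nn.Linear(hidden_size, action_size))
    self.log_std = nn.Parameter(torch.zeros(action_size))

  def forward(self, state, squash=False):
    policy = Normal(self.mean(state), self.log_std.exp())
    if squash:  # tanh-transformed Normal, as used by soft actor-critic
      policy = TransformedDistribution(policy, TanhTransform())
    return policy


class Critic(nn.Module):
  # Action-value network with layer normalisation, as used by the
  # deterministic off-policy algorithms
  def __init__(self, state_size, action_size, hidden_size=64):
    super().__init__()
    self.value = nn.Sequential(
        nn.Linear(state_size + action_size, hidden_size),
        nn.LayerNorm(hidden_size), nn.Tanh(), nn.Linear(hidden_size, 1))

  def forward(self, state, action):
    return self.value(torch.cat([state, action], dim=1)).squeeze(dim=1)


def normalise_advantages(advantages, eps=1e-8):
  # Per-minibatch advantage normalisation to zero mean and unit variance
  return (advantages - advantages.mean()) / (advantages.std() + eps)
```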
Results
Vanilla Policy Gradient/Advantage Actor-Critic
Trust Region Policy Optimization
Proximal Policy Optimization
Deep Deterministic Policy Gradient
Twin Delayed DDPG
Soft Actor-Critic
Deep Q-Network
Code Links
- Spinning Up in Deep RL (TensorFlow)
- Fired Up in Deep RL (PyTorch)