zhangchuheng123/Reinforcement-Implementation

Stars
429
Rank 101,271 (Top 2 %)
Language
Python
Created about 6 years ago
Updated over 2 years ago

Repository Details

Implementation of benchmark RL algorithms

Reinforcement-Implementation

This project aims to reproduce the results of several model-free RL algorithms in continuous action domains (MuJoCo environments).

This project:

uses the PyTorch package
implements different algorithms independently in separate, minimal files
is written in the simplest style
tries to follow the original papers and reproduce their results

My first stage of work is to reproduce this figure in the PPO paper.

A2C
ACER (A2C + Trust Region): this implementation appears to have some problems ... (bug reports are welcome)
CEM
TRPO (TRPO single path)
PPO (PPO clip)
Vanilla PG
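
As an illustration of the "PPO clip" entry above, here is a minimal sketch of the clipped surrogate objective from the PPO paper. It is not the code from this repository: it uses plain Python lists instead of PyTorch tensors, and the function name `ppo_clip_objective` and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    Hypothetical helper for illustration only; inputs are per-sample lists.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # The minimum gives a pessimistic estimate of the policy improvement.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above `1 + clip_eps` earns no extra objective, so the update has no incentive to move the policy further in that direction. In a PyTorch version the same expression would typically be written with tensor operations and negated to form a loss.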

In the next stage, I want to implement:

DDPG
Random Search (see Simple random search provides a competitive approach to reinforcement learning)
SAC (soft actor-critic) with continuous action space
SAC (soft actor-critic) with discrete action space
DQN

The stage after that covers discrete action space problems and raw video input (Atari) problems:

Rainbow: DQN and relevant techniques (target network / double Q-learning / prioritized experience replay / dueling network structure / distributional RL)
PPO with random network distillation (RND)
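
Of the Rainbow components listed above, double Q-learning is the simplest to show in isolation. The following is a minimal sketch, not taken from this repository: the function name `double_q_target` and its list-based inputs are assumptions for the example, standing in for the outputs of an online network and a target network on the next state.

```python
def double_q_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """TD target for double Q-learning (hypothetical helper for illustration).

    next_q_online: Q-values of the next state from the online network.
    next_q_target: Q-values of the next state from the target network.
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network evaluates it.
    return reward + gamma * next_q_target[best_action]
```

Plain DQN would instead bootstrap from `max(next_q_target)`, using the same network for selection and evaluation; separating the two is intended to reduce overestimation of Q-values.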

Rainbow on Atari with only 3M: It works but may need further tuning.

And then model-based algorithms (not planned):

PILCO
PE-TS

TODOs:

Change how the reward is counted: the current approach may underestimate the reward (evaluate a deterministic model rather than a stochastic/exploratory model)
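
To make the TODO above concrete, here is a toy sketch of why scoring an exploratory policy can underestimate the reward. Everything in it is an assumption for illustration: a one-step task with a hypothetical quadratic reward, a Gaussian policy, and the names `toy_reward` and `evaluate`. None of it comes from this repository.

```python
import random

def toy_reward(action, target=1.0):
    # Hypothetical reward: highest (zero) when the action hits the target.
    return -(action - target) ** 2

def evaluate(mean, std, deterministic, episodes=1000, seed=0):
    """Average reward of a Gaussian policy over one-step episodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        # Deterministic evaluation uses the policy mean;
        # stochastic evaluation samples an action, as done during exploration.
        action = mean if deterministic else rng.gauss(mean, std)
        total += toy_reward(action)
    return total / episodes

deterministic_score = evaluate(mean=1.0, std=0.5, deterministic=True)
stochastic_score = evaluate(mean=1.0, std=0.5, deterministic=False)
```

In this toy setting the policy mean is already optimal, so the deterministic score is zero, while the sampled actions scatter around the target and pull the stochastic score below it.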

PPO Implementation

The PPO implementation is of high quality: it matches the performance of openai.baselines.

Update

Recently, I added Rainbow and DQN. The Rainbow implementation is of high quality on Atari games, good enough for you to modify for your own research paper. The DQN implementation is a minimal working version and reaches good performance on MountainCar (a simple task, but many implementations on GitHub do not achieve good performance on it or need additional reward/environment engineering). This is enough for you to quickly test your research ideas.
