  • Stars: 158
  • Rank: 235,814 (Top 5%)
  • Language: Python
  • License: MIT License
  • Created: over 4 years ago
  • Updated: about 2 months ago

Repository Details

PyTorch implementation of FQF, IQN and QR-DQN.

FQF, IQN and QR-DQN in PyTorch

This is a PyTorch implementation of Fully Parameterized Quantile Function (FQF) [1], Implicit Quantile Networks (IQN) [2], and Quantile Regression DQN (QR-DQN) [3]. I tried to make it easy for readers to understand the algorithms. Please let me know if you have any questions, and pull requests are always welcome.
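
All three algorithms are trained with the quantile (Huber) regression loss from these papers. As a rough, self-contained PyTorch sketch of that loss (an illustration, not the repository's exact code), it can be written as follows:

import torch

def quantile_huber_loss(pred_quantiles, target_quantiles, taus, kappa=1.0):
    # pred_quantiles:   (batch, N)  quantile values of the taken actions.
    # target_quantiles: (batch, N') target quantile values (already detached).
    # taus:             (batch, N)  quantile fractions of the predictions.
    # Pairwise TD errors delta[b, i, j] = target[b, j] - pred[b, i], shape (batch, N, N').
    td_errors = target_quantiles.unsqueeze(1) - pred_quantiles.unsqueeze(2)
    # Element-wise Huber loss with threshold kappa.
    huber = torch.where(
        td_errors.abs() <= kappa,
        0.5 * td_errors.pow(2),
        kappa * (td_errors.abs() - 0.5 * kappa),
    )
    # Asymmetric weight |tau_i - 1{delta < 0}| turns the Huber loss into a quantile loss.
    weight = (taus.unsqueeze(2) - (td_errors.detach() < 0).float()).abs()
    # Mean over target quantiles, sum over predicted quantiles, mean over the batch.
    return (weight * huber / kappa).mean(dim=2).sum(dim=1).mean()

QR-DQN uses fixed, evenly spaced fractions for taus, while IQN samples them uniformly and FQF learns to propose them; the loss itself is shared.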

UPDATE

  • 2020.6.9
    • Bump torch up to 1.5.0.
  • 2020.5.10
    • Refactor code.
    • Fix Prioritized Experience Replay and Noisy Networks.
    • Test IQN with Rainbow's components.

Setup

If you are using Anaconda, first create the virtual environment.

conda create -n fqf python=3.8 -y
conda activate fqf

You can install the required Python libraries using pip.

pip install --upgrade pip
pip install -r requirements.txt

If you're using a CUDA version other than 10.2, you may need to install a matching build of PyTorch separately. See the PyTorch installation instructions for more details.

Examples

You can train the FQF agent using the hyperparameters in config/fqf.yaml.

python train_fqf.py --cuda --env_id PongNoFrameskip-v4 --seed 0 --config config/fqf.yaml

You can also train an IQN or QR-DQN agent in the same way. Note that results are logged by the number of frames, which equals the number of agent steps multiplied by 4 (e.g. 100M frames corresponds to 25M agent steps).
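
For reference, the frame/step bookkeeping above is just a factor of 4 (the Atari frame skip). This tiny helper is illustrative only and not part of the repository:

FRAME_SKIP = 4  # each agent step advances the emulator by 4 frames

def frames_to_steps(num_frames: int) -> int:
    return num_frames // FRAME_SKIP

def steps_to_frames(num_steps: int) -> int:
    return num_steps * FRAME_SKIP

assert frames_to_steps(100_000_000) == 25_000_000  # 100M frames -> 25M agent steps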

Results

Results of the examples (without n-step rewards, double Q-learning, dueling networks, or noisy nets) are shown below and are comparable with (if not better than) the results in the paper. Scores are evaluated after every 1M frames (250k agent steps). Results are averaged over 2 seeds and visualized with min/max.

Note that I report the "mean" score, not the "best" score as in the paper. Also, I trained for only a limited number of frames due to limited resources (e.g. 100M frames instead of 200M).
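
To illustrate how the curves are aggregated, here is a minimal NumPy/Matplotlib sketch of the mean plus min/max-over-seeds visualization. The numbers are placeholders, not actual results, and this is not the repository's plotting code:

import numpy as np
import matplotlib.pyplot as plt

# Placeholder evaluation returns, one row per seed, one column per evaluation
# (every 1M frames = 250k agent steps). Not real results.
returns = np.array([
    [10.0, 55.0, 120.0, 230.0, 310.0],  # seed 0
    [12.0, 48.0, 140.0, 210.0, 330.0],  # seed 1
])
frames = np.arange(1, returns.shape[1] + 1) * 1_000_000

plt.plot(frames, returns.mean(axis=0), label="mean over 2 seeds")
plt.fill_between(frames, returns.min(axis=0), returns.max(axis=0), alpha=0.3, label="min/max")
plt.xlabel("frames")
plt.ylabel("evaluation return")
plt.legend()
plt.show()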

BreakoutNoFrameskip-v4

I tested FQF, IQN and QR-DQN on BreakoutNoFrameskip-v4 for 30M frames to confirm that the algorithms work.

BerzerkNoFrameskip-v4

I also tested FQF and IQN on BerzerkNoFrameskip-v4 for 100M frames to see the difference between FQF's and IQN's performance, which is quite pronounced on this task.

IQN-Rainbow

I also tested IQN with Rainbow's components on PongNoFrameskip-v4 (just 1 seed). Note that I decreased num_steps to 7500000 (30M frames), but kept start_steps the same.

TODO

  • Implement risk-averse policies for IQN (see the sketch after this list).
  • Test FQF-Rainbow agent.
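
On the first TODO item: in the IQN paper, risk-averse policies are obtained by distorting the sampled quantile fractions before they are fed to the quantile network (e.g. CVaR simply rescales them toward the lower tail). The sketch below only illustrates that idea; quantile_net is a hypothetical stand-in for an IQN model and is assumed to map (state, taus) to quantile values of shape (batch, num_taus, num_actions).

import torch

def cvar_taus(batch_size, num_taus, eta=0.25):
    # CVaR distortion: restrict the quantile fractions to the lower eta-tail.
    taus = torch.rand(batch_size, num_taus)  # tau ~ U(0, 1)
    return eta * taus

def risk_averse_q_values(quantile_net, state, num_taus=32, eta=0.25):
    # Average the quantile values over the distorted fractions to get
    # risk-averse Q-values of shape (batch, num_actions).
    taus = cvar_taus(state.size(0), num_taus, eta)
    quantiles = quantile_net(state, taus)  # hypothetical interface
    return quantiles.mean(dim=1)

# A greedy, risk-averse action would then be:
# action = risk_averse_q_values(net, state, eta=0.25).argmax(dim=1)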

References

[1] Yang, Derek, et al. "Fully Parameterized Quantile Function for Distributional Reinforcement Learning." Advances in Neural Information Processing Systems, 2019.

[2] Dabney, Will, et al. "Implicit Quantile Networks for Distributional Reinforcement Learning." arXiv preprint, 2018.

[3] Dabney, Will, et al. "Distributional Reinforcement Learning with Quantile Regression." Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

More Repositories

1. sac-discrete.pytorch: PyTorch implementation of SAC-Discrete. (Python, 273 stars)
2. gail-airl-ppo.pytorch: PyTorch implementation of GAIL and AIRL based on PPO. (Python, 182 stars)
3. soft-actor-critic.pytorch: PyTorch implementation of Soft Actor-Critic (SAC). (Python, 94 stars)
4. rljax: A collection of RL algorithms written in JAX. (Python, 92 stars)
5. slac.pytorch: PyTorch implementation of Stochastic Latent Actor-Critic (SLAC). (Python, 87 stars)
6. discor.pytorch: PyTorch implementation of Distribution Correction (DisCor) based on Soft Actor-Critic. (Python, 38 stars)
7. alfred-aws-icons: Alfred Workflow for quickly pasting AWS architecture icons onto PowerPoint. (Go, 27 stars)
8. rltorch: A simple framework for distributed reinforcement learning in PyTorch. (Python, 16 stars)
9. vae.pytorch: PyTorch implementation of Deep Feature Consistent Variational Autoencoder. (Python, 12 stars)
10. simple-rl.pytorch: Simple implementation of model-free RL algorithms written in PyTorch. (Python, 9 stars)
11. wappo.pytorch: PyTorch implementation of Wasserstein Adversarial Proximal Policy Optimization (WAPPO). (Python, 6 stars)
12. slac-discrete.pytorch: PyTorch implementation of Stochastic Latent Actor-Critic (SLAC) extended for discrete action settings. (Python, 2 stars)
13. gec-app: Frontend/backend application code and infrastructure for grammatical error correction. (Python, 2 stars)
14. dmm-schedule-checker: Continuously monitors the schedules of your favorite teachers and notifies you via LINE whenever new slots are available. (Go, 1 star)
15. sagemaker-tutorial: Amazon SageMaker tutorial. (Jupyter Notebook, 1 star)
16. ssm-enforcement-tool: A set of infrastructure implemented in Terraform to monitor your "not-managed-by-SSM" instances across all regions. (Go, 1 star)