ikostrikov/pytorch-trpo

Stars: 433
Rank: 100,464 (Top 2 %)
Language: Python
License: MIT License
Created over 7 years ago
Updated about 6 years ago

ikostrikov


PyTorch implementation of Trust Region Policy Optimization

PyTorch implementation of TRPO

Try my implementation of PPO (a newer and better variant of TRPO) unless you need to use TRPO for some specific reason.

This is a PyTorch implementation of "Trust Region Policy Optimization (TRPO)".

This code is mostly ported from the original implementation by John Schulman. In contrast to another implementation of TRPO in PyTorch, this implementation uses an exact Hessian-vector product instead of a finite-difference approximation.
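The difference between the two approaches can be illustrated with a minimal sketch. It is not taken from this repository: the function names and the toy objective are assumptions made for illustration. In TRPO the product involves the Hessian of the KL divergence between the old and new policies; here a simple cubic stands in for it so the result can be checked by hand.

```python
import torch


def exact_hvp(f, x, v):
    """Return H(x) @ v for a scalar function f, using double backpropagation."""
    x = x.detach().clone().requires_grad_(True)
    # First pass: keep the graph so the gradient itself can be differentiated.
    (grad,) = torch.autograd.grad(f(x), x, create_graph=True)
    # d/dx (grad . v) equals H v because v is treated as a constant.
    (hvp,) = torch.autograd.grad(torch.dot(grad, v), x)
    return hvp


def finite_difference_hvp(f, x, v, eps=1e-3):
    """Approximate H(x) @ v as (grad f(x + eps * v) - grad f(x)) / eps."""
    def grad_at(p):
        p = p.detach().clone().requires_grad_(True)
        (g,) = torch.autograd.grad(f(p), p)
        return g
    return (grad_at(x + eps * v) - grad_at(x)) / eps


def toy_objective(x):
    # Hypothetical stand-in for the KL term; its Hessian is diag(6 * x).
    return (x ** 3).sum()


x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
v = torch.tensor([1.0, 0.0, -1.0], dtype=torch.float64)

exact = exact_hvp(toy_objective, x, v)                # analytically 6 * x * v
approx = finite_difference_hvp(toy_objective, x, v)   # carries an O(eps) error
```

The exact version needs no step size and costs two backward passes. The finite-difference version depends on the choice of `eps` and leaves a small truncation error in the result.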

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Usage

python main.py --env-name "Reacher-v1"

Recommended hyperparameters

InvertedPendulum-v1: 5000

Reacher-v1, InvertedDoublePendulum-v1: 15000

HalfCheetah-v1, Hopper-v1, Swimmer-v1, Walker2d-v1: 25000

Ant-v1, Humanoid-v1: 50000
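For scripting runs over several environments, the values above could be kept in a small lookup table. This is only a sketch: the README does not name the hyperparameter these numbers set, so the helper below returns the number as is and builds only the command shown under Usage. The names `RECOMMENDED`, `recommended_value` and `usage_command` are hypothetical.

```python
# Values copied from the list above. Which hyperparameter they control is
# not stated in the README, so no extra command-line flag is assumed here.
RECOMMENDED = {
    "InvertedPendulum-v1": 5000,
    "Reacher-v1": 15000,
    "InvertedDoublePendulum-v1": 15000,
    "HalfCheetah-v1": 25000,
    "Hopper-v1": 25000,
    "Swimmer-v1": 25000,
    "Walker2d-v1": 25000,
    "Ant-v1": 50000,
    "Humanoid-v1": 50000,
}


def recommended_value(env_name):
    """Look up the recommended value for an environment (KeyError if unlisted)."""
    return RECOMMENDED[env_name]


def usage_command(env_name):
    """Build the documented invocation for a given environment name."""
    return f'python main.py --env-name "{env_name}"'
```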

Results

Results are more or less similar to those of the original code. Details coming soon.

Todo

Plots.
Collect data in multiple threads.

