# Reinforcement Learning Exploration Baselines (RLeXplore)
RLeXplore is a set of PyTorch implementations of intrinsic-reward-driven exploration approaches for reinforcement learning, which can be plugged into arbitrary RL algorithms in a plug-and-play manner. In particular, RLeXplore is designed to be compatible with Stable-Baselines3, providing more stable exploration benchmarks.
## Notice
This repo has been merged into a new project: https://github.com/RLE-Foundation/Hsuanwu, which provides improved implementations!
Invoke an intrinsic reward module by:

```python
from hsuanwu.xplore.reward import ICM, RIDE, ...
```
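As a plug-and-play illustration, the snippet below sketches how the intrinsic rewards produced by any of these modules could be mixed into the extrinsic rewards of a collected rollout. It relies only on the `compute_irs` interface described in the Example section below; the helper name `blend_rewards` and the coefficient `beta` are illustrative, not part of the library.

```python
import torch as th

def blend_rewards(reward_module, samples, beta=0.05):
    """Illustrative helper: add weighted intrinsic rewards to extrinsic ones.

    `reward_module` is any module from hsuanwu.xplore.reward (RE3, ICM, ...),
    `samples` is the rollout dict described in the Example section below,
    and `beta` is an illustrative weighting coefficient (not a library argument).
    """
    intrinsic = reward_module.compute_irs(samples=samples)  # (n_steps, n_envs)
    return samples['rewards'] + beta * intrinsic
```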
## Module List
| Module | Remark | Repr. | Visual | Reference |
|---|---|---|---|---|
| PseudoCounts | Count-based exploration | | | Never Give Up: Learning Directed Exploration Strategies |
| ICM | Curiosity-driven exploration | | | Curiosity-Driven Exploration by Self-Supervised Prediction |
| RND | Count-based exploration | | | Exploration by Random Network Distillation |
| GIRM | Curiosity-driven exploration | | | Intrinsic Reward Driven Imitation Learning via Generative Model |
| NGU | Memory-based exploration | | | Never Give Up: Learning Directed Exploration Strategies |
| RIDE | Procedurally-generated environments | | | RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments |
| RE3 | Entropy maximization | | | State Entropy Maximization with Random Encoders for Efficient Exploration |
| RISE | Entropy maximization | | | Rényi State Entropy Maximization for Exploration Acceleration in Reinforcement Learning |
| REVD | Divergence maximization | | | Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning |
- 🐌: Developing.
- Repr.: The method involves representation learning.
- Visual: The method works well in visual RL.
## Example
Because different intrinsic reward methods are computed in very different ways, Hsuanwu follows these conventions:

- The environments are assumed to be vectorized;
- The `compute_irs` function of each intrinsic reward module has a mandatory argument `samples`, which is a dict like the one sketched right after this list:
  - `obs` (n_steps, n_envs, *obs_shape) `<class 'torch.Tensor'>`
  - `actions` (n_steps, n_envs, action_shape) `<class 'torch.Tensor'>`
  - `rewards` (n_steps, n_envs) `<class 'torch.Tensor'>`
  - `next_obs` (n_steps, n_envs, *obs_shape) `<class 'torch.Tensor'>`
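For concreteness, here is a minimal sketch of a `samples` dict with the shapes listed above (all names and sizes are placeholders, not part of the library API):

```python
import torch as th

# Illustrative shapes only; num_steps, num_envs, obs_shape, and action_dim
# are placeholders chosen for this example.
num_steps, num_envs = 128, 7
obs_shape, action_dim = (9, 84, 84), 1

samples = {
    'obs':      th.rand(num_steps, num_envs, *obs_shape),
    'actions':  th.rand(num_steps, num_envs, action_dim),
    'rewards':  th.rand(num_steps, num_envs),
    'next_obs': th.rand(num_steps, num_envs, *obs_shape),
}
```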
Take RE3 for instance: it computes the intrinsic reward for each state based on the Euclidean distance between the state's embedding and its k-nearest neighbors within the mini-batch, so only the `obs` entry is required:
```python
from hsuanwu.xplore.reward import RE3
from hsuanwu.env import make_dmc_env
import torch as th

if __name__ == '__main__':
    num_envs = 7
    num_steps = 128
    # create the vectorized environments
    env = make_dmc_env(env_id="cartpole_balance", num_envs=num_envs)
    print(env.observation_space, env.action_space)
    # create the RE3 instance
    re3 = RE3(
        observation_space=env.observation_space,
        action_space=env.action_space
    )
    # compute intrinsic rewards for a batch of random observations
    obs = th.rand(size=(num_steps, num_envs, *env.observation_space.shape))
    intrinsic_rewards = re3.compute_irs(samples={'obs': obs})
    print(intrinsic_rewards.shape, type(intrinsic_rewards))
    print(intrinsic_rewards)

# Output:
# {'shape': [9, 84, 84]} {'shape': [1], 'type': 'Box', 'range': [-1.0, 1.0]}
# torch.Size([128, 7]) <class 'torch.Tensor'>
```
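For intuition only, below is a self-contained sketch of an RE3-style computation under simplifying assumptions (flat observations and a single random linear projection instead of a random CNN encoder); it is not the hsuanwu implementation. Each state's reward grows with the distance to its k-th nearest neighbor in the random embedding space, so rarely visited regions of the state space receive larger rewards.

```python
import torch as th

def re3_style_reward(obs, k=3, embed_dim=64):
    """Illustrative RE3-style intrinsic reward (not the hsuanwu implementation).

    obs: (n_steps, n_envs, obs_dim) tensor of flat observations.
    Returns a (n_steps, n_envs) tensor of intrinsic rewards.
    """
    n_steps, n_envs, obs_dim = obs.shape
    # fixed random encoder: a random projection that is never trained
    # (re-created here for brevity; RE3 keeps it fixed across training)
    encoder = th.nn.Linear(obs_dim, embed_dim)
    for p in encoder.parameters():
        p.requires_grad_(False)
    emb = encoder(obs.reshape(-1, obs_dim))             # (n_steps * n_envs, embed_dim)
    dists = th.cdist(emb, emb)                          # pairwise Euclidean distances
    knn_dist = th.kthvalue(dists, k + 1, dim=1).values  # k-th neighbor, skipping self (distance 0)
    return th.log(knn_dist + 1.0).reshape(n_steps, n_envs)

# e.g. re3_style_reward(th.rand(128, 7, 24)).shape  ->  torch.Size([128, 7])
```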