• Stars
    star
    1,178
  • Rank 39,705 (Top 0.8 %)
  • Language
    Python
  • License
    MIT License
  • Created about 5 years ago
  • Updated 4 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Collections of robotics environments geared towards benchmarking multi-task and meta reinforcement learning

Meta-World

License Build Status

The current version of Meta-World is a work in progress. If you find any bugs/errors please open an issue.

Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. We aim to provide task distributions that are sufficiently broad to evaluate meta-RL algorithms' generalization ability to new behaviors.

For more background information, please refer to our website and the accompanying conference publication, which provides baseline results for 8 state-of-the-art meta- and multi-task RL algorithms.

Table of Contents

Join the Community

Metaworld is now maintained by the Farama Foundation! You can interact with our community and the new developers in our Discord server

Maintenance Status

The current roadmap for Meta-World can be found here

Installation

To install everything, run:

pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld

Alternatively, you can clone the repository and install an editable version locally:

git clone https://github.com/Farama-Foundation/Metaworld.git
cd Metaworld
pip install -e .

For users attempting to reproduce results found in the Meta-World paper please use this command:

pip install git+https://github.com/Farama-Foundation/Metaworld.git@04be337a12305e393c0caf0cbf5ec7755c7c8feb

Using the benchmark

Here is a list of benchmark environments for meta-RL (ML*) and multi-task-RL (MT*):

  • ML1 is a meta-RL benchmark environment which tests few-shot adaptation to goal variation within single task. You can choose to test variation within any of 50 tasks for this benchmark.
  • ML10 is a meta-RL benchmark which tests few-shot adaptation to new tasks. It comprises 10 meta-train tasks, and 3 test tasks.
  • ML45 is a meta-RL benchmark which tests few-shot adaptation to new tasks. It comprises 45 meta-train tasks and 5 test tasks.
  • MT10, MT1, and MT50 are multi-task-RL benchmark environments for learning a multi-task policy that perform 10, 1, and 50 training tasks respectively. MT1 is similar to ML1 because you can choose to test variation within any of 50 tasks for this benchmark. In the original Meta-World experiments, we augment MT10 and MT50 environment observations with a one-hot vector which identifies the task. We don't enforce how users utilize task one-hot vectors, however one solution would be to use a Gym wrapper such as this one

Basics

We provide a Benchmark API, that allows constructing environments following the gymnasium.Env interface.

To use a Benchmark, first construct it (this samples the tasks allowed for one run of an algorithm on the benchmark). Then, construct at least one instance of each environment listed in benchmark.train_classes and benchmark.test_classes. For each of those environments, a task must be assigned to it using env.set_task(task) from benchmark.train_tasks and benchmark.test_tasks, respectively. Tasks can only be assigned to environments which have a key in benchmark.train_classes or benchmark.test_classes matching task.env_name. Please see the sections Running ML1, MT1 and Running ML10, ML45, MT10, MT50 for more details.

You may wish to only access individual environments used in the Metaworld benchmark for your research. See the Accessing Single Goal Environments for more details.

Seeding a Benchmark Instance

For the purposes of reproducibility, it may be important to you to seed your benchmark instance. For example, for the ML1 benchmark environment with the 'pick-place-v2' environment, you can do so in the following way:

import metaworld

SEED = 0  # some seed number here
benchmark = metaworld.ML1('pick-place-v2', seed=SEED)

Running ML1 or MT1

import metaworld
import random

print(metaworld.ML1.ENV_NAMES)  # Check out the available environments

ml1 = metaworld.ML1('pick-place-v2') # Construct the benchmark, sampling tasks

env = ml1.train_classes['pick-place-v2']()  # Create an environment with task `pick_place`
task = random.choice(ml1.train_tasks)
env.set_task(task)  # Set task

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

MT1 can be run the same way except that it does not contain any test_tasks

Running a benchmark

Create an environment with train tasks (ML10, MT10, ML45, or MT50):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

training_envs = []
for name, env_cls in ml10.train_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.train_tasks
                        if task.env_name == name])
  env.set_task(task)
  training_envs.append(env)

for env in training_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Create an environment with test tasks (this only works for ML10 and ML45, since MT10 and MT50 don't have a separate set of test tasks):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

testing_envs = []
for name, env_cls in ml10.test_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.test_tasks
                        if task.env_name == name])
  env.set_task(task)
  testing_envs.append(env)

for env in testing_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Accessing Single Goal Environments

You may wish to only access individual environments used in the Meta-World benchmark for your research. We provide constructors for creating environments where the goal has been hidden (by zeroing out the goal in the observation) and environments where the goal is observable. They are called GoalHidden and GoalObservable environments respectively.

You can access them in the following way:

from metaworld.envs import (ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE,
                            ALL_V2_ENVIRONMENTS_GOAL_HIDDEN)
                            # these are ordered dicts where the key : value
                            # is env_name : env_constructor

import numpy as np

door_open_goal_observable_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE["door-open-v2-goal-observable"]
door_open_goal_hidden_cls = ALL_V2_ENVIRONMENTS_GOAL_HIDDEN["door-open-v2-goal-hidden"]

env = door_open_goal_hidden_cls()
env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action
assert (obs[-3:] == np.zeros(3)).all() # goal will be zeroed out because env is HiddenGoal

# You can choose to initialize the random seed of the environment.
# The state of your rng will remain unaffected after the environment is constructed.
env1 = door_open_goal_observable_cls(seed=5)
env2 = door_open_goal_observable_cls(seed=5)

env1.reset()  # Reset environment
env2.reset()
a1 = env1.action_space.sample()  # Sample an action
a2 = env2.action_space.sample()
next_obs1, _, _, _ = env1.step(a1)  # Step the environment with the sampled random action

next_obs2, _, _, _ = env2.step(a2)
assert (next_obs1[-3:] == next_obs2[-3:]).all() # 2 envs initialized with the same seed will have the same goal
assert not (next_obs2[-3:] == np.zeros(3)).all()   # The env's are goal observable, meaning the goal is not zero'd out

env3 = door_open_goal_observable_cls(seed=10)  # Construct an environment with a different seed
env1.reset()  # Reset environment
env3.reset()
a1 = env1.action_space.sample()  # Sample an action
a3 = env3.action_space.sample()
next_obs1, _, _, _ = env1.step(a1)  # Step the environment with the sampled random action
next_obs3, _, _, _ = env3.step(a3)

assert not (next_obs1[-3:] == next_obs3[-3:]).all() # 2 envs initialized with different seeds will have different goals
assert not (next_obs1[-3:] == np.zeros(3)).all()   # The env's are goal observable, meaning the goal is not zero'd out

Citing Meta-World

If you use Meta-World for academic research, please kindly cite our CoRL 2019 paper the using following BibTeX entry.

@inproceedings{yu2019meta,
  title={Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning},
  author={Tianhe Yu and Deirdre Quillen and Zhanpeng He and Ryan Julian and Karol Hausman and Chelsea Finn and Sergey Levine},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2019}
  eprint={1910.10897},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
  url={https://arxiv.org/abs/1910.10897}
}

Accompanying Baselines

If you're looking for implementations of the baselines algorithms used in the Meta-World conference publication, please look at our sister directory, Garage.

Note that these aren't the exact same baselines that were used in the original conference publication, however they are true to the original baselines.

Become a Contributor

We welcome all contributions to Meta-World. Please refer to the contributor's guide for how to prepare your contributions.

Acknowledgements

Meta-World is a work by Tianhe Yu (Stanford University), Deirdre Quillen (UC Berkeley), Zhanpeng He (Columbia University), Ryan Julian (University of Southern California), Karol Hausman (Google AI), Chelsea Finn (Stanford University) and Sergey Levine (UC Berkeley).

The code for Meta-World was originally based on multiworld, which is developed by Vitchyr H. Pong, Murtaza Dalal, Ashvin Nair, Shikhar Bahl, Steven Lin, Soroush Nasiriany, Kristian Hartikainen and Coline Devin. The Meta-World authors are grateful for their efforts on providing such a great framework as a foundation of our work. We also would like to thank Russell Mendonca for his work on reward functions for some of the environments.

More Repositories

1

Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
Python
6,383
star
2

PettingZoo

An API standard for multi-agent reinforcement learning environments, with popular reference environments and related utilities
Python
2,553
star
3

HighwayEnv

A minimalist environment for decision-making in autonomous driving
Python
2,506
star
4

Arcade-Learning-Environment

The Arcade Learning Environment (ALE) -- a platform for AI research.
C++
2,106
star
5

Minigrid

Simple and easily configurable grid world environments for reinforcement learning
Python
2,051
star
6

ViZDoom

Reinforcement Learning environments based on the 1993 game Doom :godmode:
C++
1,723
star
7

chatarena

ChatArena (or Chat Arena) is a Multi-Agent Language Game Environments for LLMs. The goal is to develop communication and collaboration capabilities of AIs.
Python
1,344
star
8

D4RL

A collection of reference environments for offline reinforcement learning
Python
1,256
star
9

Miniworld

Simple and easily configurable 3D FPS-game-like environments for reinforcement learning
Python
683
star
10

Gymnasium-Robotics

A collection of robotics simulation environments for reinforcement learning
Python
489
star
11

SuperSuit

A collection of wrappers for Gymnasium and PettingZoo environments (being merged into gymnasium.wrappers and pettingzoo.wrappers
Python
449
star
12

MO-Gymnasium

Multi-objective Gymnasium environments for reinforcement learning
Python
282
star
13

miniwob-plusplus

MiniWoB++: a web interaction benchmark for reinforcement learning
HTML
276
star
14

MicroRTS

A simple and highly efficient RTS-game-inspired environment for reinforcement learning
Java
271
star
15

Minari

A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities
Python
268
star
16

MicroRTS-Py

A simple and highly efficient RTS-game-inspired environment for reinforcement learning (formerly Gym-MicroRTS)
Python
219
star
17

MAgent2

An engine for high performance multi-agent environments with very large numbers of agents, along with a set of reference environments
C++
202
star
18

D4RL-Evaluations

Python
187
star
19

stable-retro

Retro games for Reinforcement Learning
C
146
star
20

Shimmy

An API conversion tool for popular external reinforcement learning environments
Python
129
star
21

AutoROM

A tool to automate installing Atari ROMs for the Arcade Learning Environment
Python
75
star
22

gym-examples

Example code for the Gym documentation
Python
68
star
23

momaland

Benchmarks for Multi-Objective Multi-Agent Decision Making
Python
58
star
24

Jumpy

On-the-fly conversions between Jax and NumPy tensors
Python
45
star
25

gym-docs

Code for Gym documentation website
41
star
26

Procgen2

Fast and procedurally generated side-scroller-game-like graphical environments (formerly Procgen)
C++
27
star
27

CrowdPlay

A web based platform for collecting human actions in reinforcement learning environments
Jupyter Notebook
26
star
28

TinyScaler

A small and fast image rescaling library with SIMD support
C
19
star
29

minari-dataset-generation-scripts

Scripts to recreate the D4RL datasets with Minari
Python
15
star
30

rlay

A relay between Gymnasium and any software
Rust
8
star
31

gymnasium-env-template

A template gymnasium environment for users to build upon
Jinja
7
star
32

A2Perf

A2Perf is a benchmark for evaluating agents on sequential decision problems that are relevant to the real world. This repository contains code for running and evaluating participant's submissions on the benchmark platform.
Python
4
star
33

farama.org

HTML
2
star
34

gym-notices

Python
1
star
35

Celshast

Sass
1
star
36

MPE2

A set of communication oriented environments
Python
1
star
37

Farama-Notifications

Allows for providing notifications on import to all Farama Packages
Python
1
star
38

a2perf-circuit-training

Python
1
star
39

a2perf-benchmark-submission

Python
1
star
40

a2perf-web-nav

HTML
1
star
41

a2perf-quadruped-locomotion

Python
1
star
42

a2perf-reliability-metrics

Python
1
star
43

a2perf-code-carbon

Jupyter Notebook
1
star