  • Language
    Python
  • License
    Apache License 2.0

🕹️ A diverse suite of scalable reinforcement learning environments in JAX

Welcome to the Jungle! 🌴

Jumanji is a diverse suite of scalable reinforcement learning environments written in JAX.

Jumanji is helping pioneer a new wave of hardware-accelerated research and development in the field of RL. Jumanji's high-speed environments enable faster iteration and large-scale experimentation while simultaneously reducing complexity. Originating in the Research Team at InstaDeep, Jumanji is now developed jointly with the open-source community. To join us in these efforts, reach out, raise issues and read our contribution guidelines or just star 🌟 to stay up to date with the latest developments!

Goals 🚀

  1. Provide a simple, well-tested API for JAX-based environments.
  2. Make research in RL more accessible.
  3. Facilitate research on RL for industrial problems and help close the gap between research and industrial applications.
  4. Provide environments whose difficulty can be scaled to be arbitrarily hard.

Overview 🦜

  • 🥑 Environment API: core abstractions for JAX-based environments.
  • 🕹️ Environment Suite: a collection of RL environments ranging from simple games to NP-hard combinatorial problems.
  • 🍬 Wrappers: easily connect to your favourite RL frameworks and libraries such as Acme, Stable Baselines3, RLlib, OpenAI Gym and DeepMind-Env through our dm_env and gym wrappers.
  • 🎓 Examples: guides to facilitate Jumanji's adoption and highlight the added value of JAX-based environments.
  • 🏎️ Training: example agents that can be used as inspiration for the agents one may implement in their research.

Environments 🌍

Jumanji provides a diverse set of environments, from simple games to NP-hard combinatorial problems.

| Environment | Category | Registered Version(s) | Source | Description |
|---|---|---|---|---|
| 🔢 Game2048 | Logic | Game2048-v1 | code | doc |
| 🎨 GraphColoring | Logic | GraphColoring-v0 | code | doc |
| 💣 Minesweeper | Logic | Minesweeper-v0 | code | doc |
| 🎲 RubiksCube | Logic | RubiksCube-v0, RubiksCube-partly-scrambled-v0 | code | doc |
| ✏️ Sudoku | Logic | Sudoku-v0, Sudoku-very-easy-v0 | code | doc |
| 📦 BinPack (3D BinPacking Problem) | Packing | BinPack-v2 | code | doc |
| 🏭 JobShop (Job Shop Scheduling Problem) | Packing | JobShop-v0 | code | doc |
| 🎒 Knapsack | Packing | Knapsack-v1 | code | doc |
| ▒ Tetris | Packing | Tetris-v0 | code | doc |
| 🧹 Cleaner | Routing | Cleaner-v0 | code | doc |
| 🔗 Connector | Routing | Connector-v2 | code | doc |
| 🚚 CVRP (Capacitated Vehicle Routing Problem) | Routing | CVRP-v1 | code | doc |
| 🚚 MultiCVRP (Multi-Agent Capacitated Vehicle Routing Problem) | Routing | MultiCVRP-v0 | code | doc |
| Maze | Routing | Maze-v0 | code | doc |
| 🤖 RobotWarehouse | Routing | RobotWarehouse-v0 | code | doc |
| 🐍 Snake | Routing | Snake-v1 | code | doc |
| 📬 TSP (Travelling Salesman Problem) | Routing | TSP-v1 | code | doc |
| Multi Minimum Spanning Tree Problem | Routing | MMST-v0 | code | doc |

Installation 🎬

You can install the latest release of Jumanji from PyPI:

pip install jumanji

Alternatively, you can install the latest development version directly from GitHub:

pip install git+https://github.com/instadeepai/jumanji.git

Jumanji has been tested on Python 3.8 and 3.9. Note that because the installation of JAX differs depending on your hardware accelerator, we advise users to explicitly install the correct JAX version (see the official installation guide).
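
Once JAX is installed, it can be useful to confirm which backend it actually picked up before running any environment. The snippet below is a minimal check that assumes only that `jax` is importable; the output depends on your machine.

```python
import jax

# Name of the default backend JAX selected, e.g. "cpu", "gpu" or "tpu".
backend = jax.default_backend()

# All devices visible to JAX for that backend.
devices = jax.devices()

print(f"backend={backend}, device count={len(devices)}")
for device in devices:
    print(device)
```

If the backend is reported as "cpu" on a machine with an accelerator, the JAX build installed is probably a CPU-only one.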

Rendering: Matplotlib is used for rendering all the environments. To visualize the environments you will need a GUI backend. For example, on Linux, you can install Tk via: apt-get install python3-tk, or using conda: conda install tk. Check out Matplotlib backends for a list of backends you can use.

Quickstart ⚡

RL practitioners will find Jumanji's interface familiar as it combines the widely adopted OpenAI Gym and DeepMind Environment interfaces. From OpenAI Gym, we adopted the idea of a registry and the render method, while our TimeStep structure is inspired by DeepMind Environment.

Basic Usage 🧑‍💻

import jax
import jumanji

# Instantiate a Jumanji environment using the registry
env = jumanji.make('Snake-v1')

# Reset your (jit-able) environment
key = jax.random.PRNGKey(0)
state, timestep = jax.jit(env.reset)(key)

# (Optional) Render the env state
env.render(state)

# Interact with the (jit-able) environment
action = env.action_spec().generate_value()          # Action selection (dummy value here)
state, timestep = jax.jit(env.step)(state, action)   # Take a step and observe the next state and time step

  • state represents the internal state of the environment: it contains all the information required to take a step when executing an action. This should not be confused with the observation contained in the timestep, which is the information perceived by the agent.
  • timestep is a dataclass containing step_type, reward, discount, observation and extras. This structure is similar to dm_env.TimeStep except for the extras field that was added to allow users to log environment metrics that are neither part of the agent's observation nor part of the environment's internal state.
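
To make the split between state and timestep concrete, here is a toy sketch of a purely functional environment. It does not use Jumanji's classes: `State`, `TimeStep`, `reset`, `step`, `GOAL` and `MAX_STEPS` are hypothetical names made up for illustration, and only the field names of the time step follow the list above.

```python
from typing import Dict, NamedTuple

import jax
import jax.numpy as jnp

GOAL = 3        # hypothetical target position
MAX_STEPS = 10  # hypothetical episode length limit


class State(NamedTuple):
    """Internal state: everything needed to compute the next step."""
    key: jnp.ndarray         # PRNG key, never shown to the agent
    position: jnp.ndarray    # true position on a line
    step_count: jnp.ndarray


class TimeStep(NamedTuple):
    """What the agent receives after reset or step."""
    step_type: jnp.ndarray   # 0 = first, 1 = mid, 2 = last (dm_env convention)
    reward: jnp.ndarray
    discount: jnp.ndarray
    observation: jnp.ndarray
    extras: Dict[str, jnp.ndarray]


def reset(key):
    state = State(key=key, position=jnp.int32(0), step_count=jnp.int32(0))
    timestep = TimeStep(
        step_type=jnp.int32(0),
        reward=jnp.float32(0.0),
        discount=jnp.float32(1.0),
        observation=jnp.abs(GOAL - state.position),  # agent only sees the distance
        extras={"position": state.position},         # logged, not observed
    )
    return state, timestep


def step(state, action):
    # action 1 moves right, any other action moves left
    position = state.position + jnp.where(action == 1, 1, -1)
    step_count = state.step_count + 1
    reached = position == GOAL
    done = reached | (step_count >= MAX_STEPS)
    next_state = State(state.key, position, step_count)
    timestep = TimeStep(
        step_type=jnp.where(done, 2, 1).astype(jnp.int32),
        reward=jnp.where(reached, 1.0, 0.0).astype(jnp.float32),
        discount=jnp.where(reached, 0.0, 1.0).astype(jnp.float32),
        observation=jnp.abs(GOAL - position),
        extras={"position": position},
    )
    return next_state, timestep


step_fn = jax.jit(step)
state, timestep = jax.jit(reset)(jax.random.PRNGKey(0))
for _ in range(3):
    state, timestep = step_fn(state, jnp.int32(1))
print(timestep.step_type, timestep.reward, timestep.observation)
```

Because both `reset` and `step` are pure functions of their inputs, the caller carries the state explicitly, which is what makes them straightforward to `jax.jit`.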

Advanced Usage 🧑‍🔬

Being written in JAX, Jumanji's environments benefit from many of its features including automatic vectorization/parallelization (jax.vmap, jax.pmap) and JIT-compilation (jax.jit), which can be composed arbitrarily. We provide an example of a more advanced usage in the advanced usage guide.
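
As a rough illustration of how these transformations compose, the sketch below runs a batch of rollouts with `jax.vmap`, `jax.lax.scan` and `jax.jit`. It uses a hypothetical toy `reset`/`step` pair rather than a Jumanji environment, so the dynamics and the reward are assumptions made up for the example.

```python
import jax
import jax.numpy as jnp


def reset(key):
    """Toy reset: the state is a (key, position) pair."""
    return key, jnp.float32(0.0)


def step(state, action):
    """Toy step: move by the action plus noise; the reward is -|position|."""
    key, position = state
    key, noise_key = jax.random.split(key)
    position = position + action + 0.1 * jax.random.normal(noise_key)
    reward = -jnp.abs(position)
    return (key, position), reward


def rollout(key, actions):
    """Run one episode over a fixed action sequence and return its total reward."""
    state = reset(key)
    _, rewards = jax.lax.scan(step, state, actions)
    return rewards.sum()


# Vectorize over a batch of (key, action sequence) pairs, then compile the result.
batched_rollout = jax.jit(jax.vmap(rollout))

num_envs, horizon = 8, 5
keys = jax.random.split(jax.random.PRNGKey(0), num_envs)
actions = jnp.zeros((num_envs, horizon))
returns = batched_rollout(keys, actions)
print(returns.shape)  # (8,)
```

The same pattern should carry over to any environment whose reset and step functions are pure and jit-able: one PRNG key per environment instance, and a leading batch axis on the actions.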

Registry and Versioning 📖

Like OpenAI Gym, Jumanji keeps a strict versioning of its environments for reproducibility reasons. We maintain a registry of standard environments with their configuration. For each environment, a version suffix is appended, e.g. Snake-v1. When changes are made to environments that might impact learning results, the version number is incremented by one to prevent potential confusion. For a full list of registered versions of each environment, check out the documentation.
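
The idea of a versioned registry can be sketched in a few lines of plain Python. This is not Jumanji's implementation: `register`, `make`, `parse_env_id`, `next_version_id` and the `Toy` environment are hypothetical names used only to show how a `Name-vN` identifier maps to a fixed configuration.

```python
import re
from typing import Any, Callable, Dict, Tuple

# Maps an ID such as "Toy-v0" to a factory that builds the environment.
_REGISTRY: Dict[str, Callable[..., Any]] = {}
_ID_PATTERN = re.compile(r"^(?P<name>[\w-]+)-v(?P<version>\d+)$")


def parse_env_id(env_id: str) -> Tuple[str, int]:
    """Split 'Name-vN' into ('Name', N), rejecting malformed IDs."""
    match = _ID_PATTERN.match(env_id)
    if match is None:
        raise ValueError(f"Environment ID {env_id!r} must look like 'Name-vN'.")
    return match.group("name"), int(match.group("version"))


def register(env_id: str, factory: Callable[..., Any]) -> None:
    """Register a factory under a versioned ID; existing IDs are never overwritten."""
    parse_env_id(env_id)
    if env_id in _REGISTRY:
        raise ValueError(f"{env_id!r} is already registered.")
    _REGISTRY[env_id] = factory


def make(env_id: str, **kwargs: Any) -> Any:
    """Build the environment registered under env_id."""
    if env_id not in _REGISTRY:
        raise KeyError(f"Unknown environment ID {env_id!r}.")
    return _REGISTRY[env_id](**kwargs)


def next_version_id(env_id: str) -> str:
    """Return the ID to use after a change that may affect learning results."""
    name, version = parse_env_id(env_id)
    return f"{name}-v{version + 1}"


# A change to the default size is registered as a new version,
# so results obtained on "Toy-v0" remain reproducible.
register("Toy-v0", lambda size=4: {"size": size})
register(next_version_id("Toy-v0"), lambda size=8: {"size": size})
print(make("Toy-v0"), make("Toy-v1"))
```

Keeping old IDs immutable and bumping the suffix on behavioural changes is what lets a reported result name the exact configuration it was obtained on.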

Training 🏎️

To showcase how to train RL agents on Jumanji environments, we provide a random agent and a vanilla actor-critic (A2C) agent. These agents can be found in jumanji/training/.

Because the environment framework in Jumanji is so flexible, it allows pretty much any problem to be implemented as a Jumanji environment, giving rise to very diverse observations. For this reason, environment-specific networks are required to capture the symmetries of each environment. Alongside the A2C agent implementation, we provide examples of such environment-specific actor-critic networks in jumanji/training/networks.

⚠️ The example agents in jumanji/training are only meant to serve as inspiration for how one can implement an agent. Jumanji is first and foremost a library of environments - as such, the agents and networks will not be maintained to a production standard.

For more information on how to use the example agents, see the training guide.

Contributing 🤝

Contributions are welcome! See our issue tracker for good first issues. Please read our contributing guidelines for details on how to submit pull requests, our Contributor License Agreement, and community guidelines.

Citing Jumanji ✏️

If you use Jumanji in your work, please cite the library using:

@misc{bonnet2023jumanji,
    title={Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX},
    author={
        Clément Bonnet and Daniel Luo and Donal Byrne and Shikha Surana and Vincent Coyette and
        Paul Duckworth and Laurence I. Midgley and Tristan Kalloniatis and Sasha Abramowitz and
        Cemlyn N. Waters and Andries P. Smit and Nathan Grinsztajn and Ulrich A. Mbou Sob and
        Omayma Mahjoub and Elshadai Tegegn and Mohamed A. Mimouni and Raphael Boige and
        Ruan de Kock and Daniel Furelos-Blanco and Victor Le and Arnu Pretorius and
        Alexandre Laterre
    },
    year={2023},
    eprint={2306.09884},
    url={https://arxiv.org/abs/2306.09884},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

See Also 🔎

Other works have embraced the approach of writing RL environments in JAX. In particular, we suggest users check out the following sister repositories:

  • 🤖 Qdax is a library to accelerate Quality-Diversity and neuro-evolution algorithms through hardware accelerators and parallelization.
  • 🌳 Evojax provides tools to enable neuroevolution algorithms to work with neural networks running across multiple TPU/GPUs.
  • 🦾 Brax is a differentiable physics engine that simulates environments made up of rigid bodies, joints, and actuators.
  • 🏋️ Gymnax implements classic environments including classic control, bsuite, MinAtar and a collection of meta RL tasks.
  • 🎲 Pgx provides classic board game environments like Backgammon, Shogi, and Go.

Acknowledgements 🙏

The development of this library was supported with Cloud TPUs from Google's TPU Research Cloud (TRC) 🌤.
