  • Stars: 685
  • Rank 65,548 (Top 2 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created about 5 years ago
  • Updated 26 days ago


Repository Details

A benchmark environment for fully cooperative human-AI performance.


Overcooked-AI 🧑‍🍳🤖

5 of the available layouts. New layouts are easy to hardcode or generate programmatically.

Introduction 🥘

Overcooked-AI is a benchmark environment for fully cooperative human-AI task performance, based on the wildly popular video game Overcooked.

The goal of the game is to deliver soups as fast as possible. Each soup requires placing up to 3 ingredients in a pot, waiting for the soup to cook, and then having an agent pick up the soup and deliver it. The agents should split up tasks on the fly and coordinate effectively in order to achieve high reward.
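As a rough illustration of the soup cycle described above, here is a toy model of a single pot driven by one agent. It is not the library's game logic (that lives in overcooked_mdp.py): the class name ToyPot, the function run_episode, the cook time of 20 ticks, and the delivery reward of 20 are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List

MAX_INGREDIENTS = 3   # from the description: up to 3 ingredients per soup
COOK_TIME = 20        # assumption: ticks needed to cook a soup
DELIVERY_REWARD = 20  # assumption: reward for each delivered soup


@dataclass
class ToyPot:
    """Hypothetical single-pot model: fill, cook, then pick up and deliver."""
    ingredients: List[str] = field(default_factory=list)
    cooking: bool = False
    ticks_cooked: int = 0

    def add_ingredient(self, name: str) -> bool:
        # Ingredients can only be added before cooking starts and while there is room.
        if self.cooking or len(self.ingredients) >= MAX_INGREDIENTS:
            return False
        self.ingredients.append(name)
        return True

    def start_cooking(self) -> bool:
        # A soup needs at least one ingredient.
        if self.cooking or not self.ingredients:
            return False
        self.cooking = True
        return True

    def tick(self) -> None:
        # Advance the cooking timer by one step.
        if self.cooking and self.ticks_cooked < COOK_TIME:
            self.ticks_cooked += 1

    @property
    def ready(self) -> bool:
        return self.cooking and self.ticks_cooked >= COOK_TIME

    def deliver(self) -> int:
        # Delivering a ready soup empties the pot and yields the reward.
        if not self.ready:
            return 0
        self.ingredients.clear()
        self.cooking = False
        self.ticks_cooked = 0
        return DELIVERY_REWARD


def run_episode(horizon: int) -> int:
    """Greedy single-agent loop taking one action per tick; returns total reward."""
    pot, total = ToyPot(), 0
    for _ in range(horizon):
        if pot.ready:
            total += pot.deliver()
        elif pot.cooking:
            pot.tick()
        elif len(pot.ingredients) < MAX_INGREDIENTS:
            pot.add_ingredient("onion")
        else:
            pot.start_cooking()
    return total
```

With a single agent, every soup in this toy costs 25 ticks (3 to fill, 1 to start, 20 to cook, 1 to deliver). The point of the real benchmark is that two agents can overlap these steps, which is where the coordination problem comes from.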

You can try out the game here (playing with some previously trained DRL agents). To play with your own trained agents using this interface, you can use this repo. To run human-AI experiments, check out this repo. You can find some human-human and human-AI gameplay data already collected here.

DRL implementations compatible with the environment are included in the repo as a submodule under src/human_aware_rl.

The old human_aware_rl is being deprecated and should only be used to reproduce the results in the 2019 paper: On the Utility of Learning about Humans for Human-AI Coordination (also see our blog post).

For simple usage of the environment, consider using this environment wrapper.

Research Papers using Overcooked-AI 📑

Installation ☑️

Installing from PyPI 🗜

You can install the pre-compiled wheel file using pip.

pip install overcooked-ai

Note that PyPI releases are stable but infrequent. For the most up-to-date development features, build from source using pip install -e . as described below.

Building from source 🔧

It is useful to set up a conda environment with Python 3.7 (virtualenv works too):

conda create -n overcooked_ai python=3.7
conda activate overcooked_ai

Clone the repo

git clone https://github.com/HumanCompatibleAI/overcooked_ai.git

Finally, use Python setuptools to install locally.

If you just want to use the environment:

pip install -e overcooked_ai/

If you also need the DRL implementations:

pip install -e "overcooked_ai[harl]"

Verifying Installation 📈

When building from source, you can verify the installation by running the Overcooked unit test suite. The following commands should all be run from the overcooked_ai project root directory:

python testing/overcooked_test.py

To check whether human_aware_rl is installed correctly, you can run the following script from the src/human_aware_rl directory:

./run_tests.sh

⚠️ Be sure to change your CWD to the human_aware_rl directory before running the script, as the test script uses the CWD to dynamically generate a path to save temporary training runs/checkpoints. The testing script will fail if it is not run from the correct directory.
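Why the working directory matters can be sketched as follows. This is an illustration only, not the contents of run_tests.sh: the helper names temp_results_dir and check_cwd and the tmp_results folder name are hypothetical.

```python
from pathlib import Path

EXPECTED_DIR = "human_aware_rl"  # directory the instructions say to run from


def temp_results_dir(cwd: Path) -> Path:
    # A path derived from the CWD points somewhere different for every launch directory.
    return cwd / "tmp_results"  # hypothetical folder name


def check_cwd(cwd: Path) -> None:
    # Fail early with a clear message instead of saving checkpoints in the wrong place.
    if cwd.name != EXPECTED_DIR:
        raise RuntimeError(f"Run from the {EXPECTED_DIR} directory, not {cwd}")


if __name__ == "__main__":
    here = Path.cwd()
    try:
        check_cwd(here)
        print("Temporary runs would be saved under", temp_results_dir(here))
    except RuntimeError as err:
        print("Wrong directory:", err)
```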

This will run all tests belonging to the human_aware_rl module. You can check out the README in the submodule for instructions on running target-specific tests. This can be initiated from any directory.

If you're thinking of using the planning code extensively, you should run the full testing suite that verifies all of the Overcooked accessory tools (this can take 5-10 mins):

python -m unittest discover -s testing/ -p "*_test.py"
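The same discovery can be driven from Python with the standard library's unittest loader. The sketch below builds a throwaway testing/ directory containing one made-up test file (sample_test.py is hypothetical) so that it stays self-contained; in the repository you would instead pass the real testing/ folder to run_discovery, which is itself just an illustrative helper name.

```python
import sys
import tempfile
import unittest
from pathlib import Path


def run_discovery(start_dir: str, pattern: str = "*_test.py") -> unittest.TestResult:
    # Equivalent of: python -m unittest discover -s <start_dir> -p "<pattern>"
    suite = unittest.TestLoader().discover(start_dir=start_dir, pattern=pattern)
    return unittest.TextTestRunner(verbosity=1).run(suite)


def demo() -> unittest.TestResult:
    with tempfile.TemporaryDirectory() as tmp:
        testing = Path(tmp) / "testing"
        testing.mkdir()
        # Hypothetical test file; collected because its name ends in _test.py.
        (testing / "sample_test.py").write_text(
            "import unittest\n\n"
            "class SampleTest(unittest.TestCase):\n"
            "    def test_addition(self):\n"
            "        self.assertEqual(1 + 1, 2)\n"
        )
        # Does not match the pattern, so it is never collected.
        (testing / "helpers.py").write_text("VALUE = 1\n")
        try:
            return run_discovery(str(testing))
        finally:
            # Forget the throwaway module so repeated demo runs do not clash.
            sys.modules.pop("sample_test", None)


if __name__ == "__main__":
    result = demo()
    print("tests run:", result.testsRun, "ok:", result.wasSuccessful())
```

The -p pattern is what restricts collection to files ending in _test.py; anything else in testing/ is ignored by the loader.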

Code Structure Overview 🗺

overcooked_ai_py contains:

mdp/:

  • overcooked_mdp.py: main Overcooked game logic
  • overcooked_env.py: environment classes built on top of the Overcooked mdp
  • layout_generator.py: functions to generate random layouts programmatically

agents/:

  • agent.py: location of agent classes
  • benchmarking.py: sample trajectories of agents (both trained and planners) and load various models

planning/:

  • planners.py: near-optimal agent planning logic
  • search.py: A* search and shortest path logic

human_aware_rl contains:

ppo/:

  • ppo_rllib.py: Primary module where code for training a PPO agent resides. This includes an rllib compatible wrapper on OvercookedEnv, utilities for converting rllib Policy classes to Overcooked Agents, as well as utility functions and callbacks
  • ppo_rllib_client.py: Driver code for configuring and launching the training of an agent. More details about usage below
  • ppo_rllib_from_params_client.py: train one agent with PPO in Overcooked with variable-MDPs
  • ppo_rllib_test.py: Reproducibility tests for local sanity checks
  • run_experiments.sh: Script for training agents on 5 classical layouts
  • trained_example/: Pretrained model for testing purposes

rllib/:

  • rllib.py: rllib agent and training utils that utilize Overcooked APIs
  • utils.py: utils for the above
  • tests.py: preliminary tests for the above

imitation/:

  • behavior_cloning_tf2.py: Module for training, saving, and loading a BC model
  • behavior_cloning_tf2_test.py: Contains basic reproducibility tests as well as unit tests for the various components of the bc module.

human/:

  • process_data.py: script to process human data in specific formats to be used by DRL algorithms
  • data_processing_utils.py: utils for the above

utils.py: utils for the repo

overcooked_demo contains:

server/:

  • app.py: The Flask app
  • game.py: The main logic of the game. State transitions are handled by the overcooked.Gridworld object embedded in the game environment
  • move_agents.py: A script that simplifies copying checkpoints to the agents directory. Usage instructions can be found inside the file or by running python move_agents.py -h

up.sh: Shell script to spin up the Docker server that hosts the game

Python Visualizations 🌠

See this Google Colab for some sample code for visualizing trajectories in python.

We have incorporated a notebook that guides users on the process of training, loading, and evaluating agents. Ideally, we would like to enable users to execute the notebook in Google Colab; however, due to Colab's default kernel being Python 3.10 and our repository being optimized for Python 3.7, some functions are presently incompatible with Colab. To provide a seamless experience, we have pre-executed all the cells in the notebook, allowing you to view the expected output when running it locally following the appropriate setup.
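A quick way to see whether a given interpreter matches the version the repository targets is to compare sys.version_info against it. This is a hypothetical helper, not part of the repository; the (3, 7) target is taken from the setup instructions, and matches_target is a name invented for this sketch.

```python
import sys
from typing import Tuple

TARGET = (3, 7)  # version used in the conda setup instructions


def matches_target(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    # Compare only major and minor; patch releases are not distinguished.
    return tuple(version) == TARGET


if __name__ == "__main__":
    if matches_target():
        print("Python", ".".join(map(str, TARGET)), "detected")
    else:
        found = ".".join(map(str, sys.version_info[:2]))
        print(f"Python {found} detected; some notebook functions may not run")
```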

Overcooked_demo can also start an interactive game in the browser for visualizations. Details can be found in its README.

Raw Data 📒

The raw data used in training is >100 MB, which makes it inconvenient to distribute via git. The code uses pickled dataframes for training and testing, but in case one needs the original data, it can be found here.

Further Issues and questions

If you have issues or questions, don't hesitate to contact Micah Carroll at [email protected].
