  • Stars: 107
  • Rank: 321,736 (Top 7%)
  • Language: Python
  • Created: about 5 years ago
  • Updated: over 1 year ago


Repository Details

Code for "On the Utility of Learning about Humans for Human-AI Coordination"

Human-Aware Reinforcement Learning

⚠️ DEPRECATION WARNING

This repo is being deprecated and should no longer be used independently. It is now a module under the overcooked_ai project, as we are consolidating several repos into one for convenience and better maintainability.

This repo should now only be used to reproduce the results in the 2019 paper On the Utility of Learning about Humans for Human-AI Coordination.

Note that this repository uses a specific older commit of the overcooked_ai repository, and should not be expected to work with the current version of that repository.

To play the game with trained agents, you can use Overcooked-Demo.

For more information about the Overcooked-AI environment, check out this repo.

Installation

When cloning the repository, make sure you also clone the submodules (this implementation is pinned to specific commits of the submodules and will generally not work with more recent ones):

$ git clone --recursive https://github.com/HumanCompatibleAI/human_aware_rl.git

If you want to clone a specific branch with its submodules, use:

$ git clone --single-branch --branch BRANCH_NAME --recursive https://github.com/HumanCompatibleAI/human_aware_rl.git

It is useful to set up a conda environment with Python 3.7:

$ conda create -n harl python=3.7
$ conda activate harl

To complete the installation, run:

               $ cd human_aware_rl
human_aware_rl $ ./install.sh

Then install tensorflow and mpi4py, choosing the GPU or non-GPU version of tensorflow depending on your setup.

Non-GPU:

$ pip install tensorflow==1.13.1
$ conda install mpi4py

GPU:

$ pip install tensorflow-gpu==1.13.1
$ conda install mpi4py

Note that using tensorflow-gpu will prevent the DRL tests from passing, due to intrinsic randomness introduced by GPU computations. We recommend first installing tensorflow (non-GPU), running the tests, and then installing tensorflow-gpu.

Verify Installation

To verify your installation, you can try running the following command from the inner human_aware_rl folder:

python run_tests.py

Note that most of the DRL tests rely on having the exact randomness settings that were used to generate the tests (and thus will not pass on a GPU-enabled device).
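For context on why, deterministic runs in TF 1.x require seeding every source of randomness up front. A minimal sketch of the kind of seeding involved (the seed value here is illustrative, not the repo's actual test configuration):

import random
import numpy as np
import tensorflow as tf

SEED = 0  # illustrative seed, not the one used by the tests

random.seed(SEED)         # Python's built-in RNG
np.random.seed(SEED)      # NumPy RNG
tf.set_random_seed(SEED)  # TF 1.x graph-level seed

# Even with all seeds fixed, GPU kernels (e.g. cuDNN reductions) can be
# non-deterministic, which is why these tests only pass on CPU.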

On OSX, you may run into an error saying that Python must be installed as a framework. You can fix it by telling Matplotlib to use a different backend.
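One common workaround (a general Matplotlib fix, not specific to this repo) is to select a non-framework backend before pyplot is first imported:

import matplotlib
matplotlib.use('TkAgg')  # or 'Agg' for headless use; must run before importing pyplot
import matplotlib.pyplot as plt

Alternatively, set backend: TkAgg in your ~/.matplotlib/matplotlibrc file.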

Repo Structure Overview

ppo/ (using baselines):

  • ppo.py: train one agent with PPO in Overcooked, with the other agent fixed

pbt/ (using baselines):

  • pbt.py: train agents with population-based training in Overcooked

imitation/:

  • behaviour_cloning.py: simple script to perform behavioural cloning (BC) on trajectory data using baselines

human/:

  • process_data.py: script to process human data into the specific formats used by the DRL algorithms
  • data_processing_utils.py: utilities for the above

experiments/: folder with experiment scripts used to generate experimental results in the paper

baselines_utils.py: utility functions used by pbt.py

overcooked_interactive.py: script to play Overcooked in the terminal against trained agents

run_tests.py: script to run all tests
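To make the behavioural cloning step concrete, here is a minimal sketch of BC on (observation, action) pairs. It uses plain TF 1.x rather than the baselines helpers the repo actually relies on, and all names, sizes, and data are illustrative:

import numpy as np
import tensorflow as tf

# Illustrative trajectory data: observations paired with discrete human actions.
obs_dim, n_actions = 62, 6  # made-up sizes, not Overcooked's real ones
observations = np.random.randn(1000, obs_dim).astype(np.float32)
actions = np.random.randint(n_actions, size=1000)

obs_ph = tf.placeholder(tf.float32, [None, obs_dim])
act_ph = tf.placeholder(tf.int32, [None])

# A small MLP policy: BC is just supervised classification of actions.
hidden = tf.layers.dense(obs_ph, 64, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, n_actions)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=act_ph, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(10):
        _, l = sess.run([train_op, loss], {obs_ph: observations, act_ph: actions})
        print("epoch", epoch, "loss", l)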

Playing with trained agents

In terminal-graphics

To play with trained agents in the terminal, use overcooked_interactive.py. A sample command is:

python overcooked_interactive.py -t bc -r simple_bc_test_seed4

Note that keyboard input is only captured while the terminal window is in focus, so do not click away from it while playing.

With JavaScript graphics

This requires converting the trained models to Tensorflow JS format, and visualizing with the overcooked-demo code. First install overcooked-demo and ensure it works properly.

Converting models to JS format

Unfortunately, converting models requires creating a new conda environment to avoid module conflicts.

Create and activate a new conda environment:

$ conda create -n model_conversion python=3.7
$ conda activate model_conversion

Run the base setup.py (from the inner human_aware_rl directory) and then install tensorflowjs:

human_aware_rl $ cd human_aware_rl
human_aware_rl $ python setup.py develop
human_aware_rl $ pip install tensorflowjs==0.8.5

To convert models into the right format, use the convert_model_to_web.sh script. Example usage:

human_aware_rl $ ./convert_model_to_web.sh ppo_runs ppo_sp_simple 193

where 193 is the seed number of the DRL run.

Transferring agents to Overcooked-Demo

The converted models can be found in human_aware_rl/data/web_models/ and should be transferred to Overcooked-Demo's static/assets folder with the same naming as the standard models.

Playing with newly trained agents

To play with newly trained agents, just follow the instructions in the Overcooked-Demo README.

Reproducing results

All DRL results can be reproduced by running the .sh scripts under human_aware_rl/experiments/.

All non-DRL results can be reproduced by running cells in NeurIPS Experiments and Visualizations.ipynb.

More Repositories

1. imitation: Clean PyTorch implementations of imitation and reward learning algorithms (Python, 1,264 stars)
2. overcooked_ai: A benchmark environment for fully cooperative human-AI performance (Jupyter Notebook, 685 stars)
3. adversarial-policies: Find best-response to a fixed policy in multi-agent RL (Python, 272 stars)
4. evaluating-rewards: Library to compare and evaluate reward functions (Python, 61 stars)
5. overcooked-demo: Web application where humans can play Overcooked with AI agents (JavaScript, 55 stars)
6. seals: Benchmark environments for reward modelling and imitation learning algorithms (Python, 44 stars)
7. rlsp: Reward Learning by Simulating the Past (Python, 43 stars)
8. tensor-trust: A prompt injection game to collect data for robust ML research (Python, 39 stars)
9. eirli: An Empirical Investigation of Representation Learning for Imitation (EIRLI), NeurIPS'21 (Python, 36 stars)
10. go_attack (Python, 31 stars)
11. tensor-trust-data: Dataset for the Tensor Trust project (Jupyter Notebook, 29 stars)
12. atari-irl (Python, 26 stars)
13. deep-rlsp: Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021 (Python, 26 stars)
14. ranking-challenge: Testing ranking algorithms to improve social cohesion (Python, 25 stars)
15. population-irl: (Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards (Python, 25 stars)
16. learning_biases: Infer how suboptimal agents are suboptimal while planning, for example if they are hyperbolic time discounters (Jupyter Notebook, 22 stars)
17. human_ai_robustness (Python, 21 stars)
18. learning-from-human-preferences: Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences" (Python, 21 stars)
19. overcooked-hAI-exp: Overcooked-AI Experiment Psiturk Demo (for MTurk experiments) (JavaScript, 12 stars)
20. leela-interp: Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network" (Jupyter Notebook, 11 stars)
21. better-adversarial-defenses: Training in bursts for defending against adversarial policies (Python, 11 stars)
22. interpreting-rewards: Experiments in applying interpretability techniques to learned reward functions (Jupyter Notebook, 9 stars)
23. nn-clustering-pytorch: Checking the divisibility of neural networks, and investigating the nature of the pieces networks can be divided into (Python, 6 stars)
24. reward-preprocessing: Preprocessing reward functions to make them more interpretable (Python, 5 stars)
25. recon-email: Script for automatically creating the reconnaissance email (HTML, 5 stars)
26. assistance-games: Supporting code for Assistance Games as a Framework paper (Python, 3 stars)
27. KataGo-custom: Child repository of https://github.com/HumanCompatibleAI/go_attack (C++, 3 stars)
28. reducing-exploitability (Python, 3 stars)
29. KataGoVisualizer (Jupyter Notebook, 2 stars)
30. multi-agent (Python, 2 stars)
31. derail: Supporting code for diagnostic seals paper (Python, 2 stars)
32. epic: Implements the Equivalent-Policy Invariant Comparison (EPIC) distance for reward functions (Python, 1 star)
33. cs294-149-fa18-notes: LaTeX Notes from the Fall 2018 version of CS294-149: AGI Safety and Control (TeX, 1 star)
34. simulation-awareness: (experimental) RL agents should be more aligned if they do not know whether they are in simulation or in the real world (Python, 1 star)
35. logical-active-classification: Use active learning to classify data represented as boundaries of regions in parameter space where a parametrised logical formula holds (Python, 1 star)
36. reward-function-interpretability (Jupyter Notebook, 1 star)