Center for Human-Compatible AI (@HumanCompatibleAI)

Top repositories

1. imitation (Python, 1,294 stars): Clean PyTorch implementations of imitation and reward learning algorithms.
2. overcooked_ai (Jupyter Notebook, 706 stars): A benchmark environment for fully cooperative human-AI performance.
3. adversarial-policies (Python, 272 stars): Find the best response to a fixed policy in multi-agent RL.
4. human_aware_rl (Python, 107 stars): Code for "On the Utility of Learning about Humans for Human-AI Coordination".
5. evaluating-rewards (Python, 61 stars): Library to compare and evaluate reward functions.
6. overcooked-demo (JavaScript, 55 stars): Web application where humans can play Overcooked with AI agents.
7. seals (Python, 44 stars): Benchmark environments for reward modelling and imitation learning algorithms.
8. rlsp (Python, 43 stars): Reward Learning by Simulating the Past.
9. tensor-trust (Python, 40 stars): A prompt injection game to collect data for robust ML research.
10. eirli (Python, 36 stars): An Empirical Investigation of Representation Learning for Imitation (EIRLI), NeurIPS 2021.
11. tensor-trust-data (Jupyter Notebook, 31 stars): Dataset for the Tensor Trust project.
12. go_attack (Python, 31 stars)
13. ranking-challenge (Python, 27 stars): Testing ranking algorithms to improve social cohesion.
14. atari-irl (Python, 26 stars)
15. deep-rlsp (Python, 26 stars): Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021.
16. population-irl (Python, 25 stars): (Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards.
17. learning_biases (Jupyter Notebook, 22 stars): Infer how suboptimal agents are suboptimal while planning, e.g. whether they are hyperbolic time discounters.
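The hyperbolic-discounting bias that learning_biases mentions is easy to illustrate. The sketch below is generic, not code from the repository: a hyperbolic discounter weights a reward t steps away by 1/(1 + k·t), which falls faster than the standard exponential γ^t in the near term but more slowly in the far future, producing the time-inconsistent preferences the repo tries to infer.

```python
# Illustrative comparison of exponential vs. hyperbolic discount weights
# (hypothetical example; not taken from the learning_biases repository).

def exponential_weight(t, gamma=0.9):
    """Standard RL discounting: a reward t steps away is worth gamma**t."""
    return gamma ** t

def hyperbolic_weight(t, k=0.5):
    """Hyperbolic discounting: a reward t steps away is worth 1 / (1 + k*t)."""
    return 1.0 / (1.0 + k * t)

# Near-term rewards: the hyperbolic agent discounts more steeply.
# Far-future rewards: the hyperbolic weight eventually dominates.
for t in [0, 1, 10, 50]:
    print(t, round(exponential_weight(t), 4), round(hyperbolic_weight(t), 4))
```

With these (arbitrary) parameters, the hyperbolic weight is below the exponential one at t = 1 but above it by t = 50; that crossover is what makes hyperbolic discounters reverse their preferences as a delayed reward draws closer.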
18. human_ai_robustness (Python, 21 stars)
19. learning-from-human-preferences (Python, 21 stars): Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences".
20. overcooked-hAI-exp (JavaScript, 12 stars): Overcooked-AI experiment PsiTurk demo (for MTurk experiments).
21. leela-interp (Jupyter Notebook, 11 stars): Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network".
22. better-adversarial-defenses (Python, 11 stars): Training in bursts for defending against adversarial policies.
23. interpreting-rewards (Jupyter Notebook, 9 stars): Experiments in applying interpretability techniques to learned reward functions.
24. nn-clustering-pytorch (Python, 6 stars): Checking the divisibility of neural networks, and investigating the nature of the pieces networks can be divided into.
25. reward-preprocessing (Python, 5 stars): Preprocessing reward functions to make them more interpretable.
26. recon-email (HTML, 5 stars): Script for automatically creating the reconnaissance email.
27. assistance-games (Python, 3 stars): Supporting code for the "Assistance Games as a Framework" paper.
28. KataGo-custom (C++, 3 stars): Child repository of https://github.com/HumanCompatibleAI/go_attack.
29. reducing-exploitability (Python, 3 stars)
30. KataGoVisualizer (Jupyter Notebook, 2 stars)
31. multi-agent (Python, 2 stars)
32. derail (Python, 2 stars): Supporting code for the diagnostic seals paper.
33. epic (Python, 1 star): Implements the Equivalent-Policy Invariant Comparison (EPIC) distance for reward functions.
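The metric at the core of the epic repository can be sketched briefly. EPIC compares two reward functions by computing sqrt((1 - rho) / 2), where rho is the Pearson correlation between their values on a common batch of transitions; the full algorithm first canonicalizes each reward to strip out potential-based shaping, a step omitted in this minimal sketch.

```python
import numpy as np

def pearson_distance(x, y):
    """Pearson distance used by EPIC: sqrt((1 - rho) / 2), where rho is the
    Pearson correlation between two reward vectors evaluated on the same
    transitions. 0 means equivalent up to positive affine rescaling;
    1 means perfectly anti-correlated."""
    rho = np.clip(np.corrcoef(x, y)[0, 1], -1.0, 1.0)  # clip guards against float error
    return np.sqrt((1.0 - rho) / 2.0)

rng = np.random.default_rng(0)
r1 = rng.normal(size=1000)   # reward samples from one function
r2 = 3.0 * r1 + 2.0          # positive affine transform: same ordering of behaviors
r3 = -r1                     # sign-flipped reward

print(pearson_distance(r1, r2))  # ~0: affine rescaling does not change the distance
print(pearson_distance(r1, r3))  # 1.0: maximally different
```

The invariance to positive affine transforms is the point of using correlation rather than, say, mean squared error: two rewards that induce the same optimal policy should be close even if their scales differ.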
34. cs294-149-fa18-notes (TeX, 1 star): LaTeX notes from the Fall 2018 version of CS294-149: AGI Safety and Control.
35. simulation-awareness (Python, 1 star): (Experimental) RL agents should be more aligned if they do not know whether they are in simulation or in the real world.
36. logical-active-classification (Python, 1 star): Use active learning to classify data represented as boundaries of regions in parameter space where a parametrised logical formula holds.
37. reward-function-interpretability (Jupyter Notebook, 1 star)