  • Stars
    3
  • Rank 3,944,358 (Top 79 %)
  • Language
    C++
  • License
    Other
  • Created over 3 years ago
  • Updated over 1 year ago

Repository Details

Child repository of https://github.com/HumanCompatibleAI/go_attack.

More Repositories

1. imitation
   Clean PyTorch implementations of imitation and reward learning algorithms
   Language: Python
   Stars: 1,264

2. overcooked_ai
   A benchmark environment for fully cooperative human-AI performance.
   Language: Jupyter Notebook
   Stars: 685

3. adversarial-policies
   Find best-response to a fixed policy in multi-agent RL
   Language: Python
   Stars: 272

4. human_aware_rl
   Code for "On the Utility of Learning about Humans for Human-AI Coordination"
   Language: Python
   Stars: 107

5. evaluating-rewards
   Library to compare and evaluate reward functions
   Language: Python
   Stars: 61

6. overcooked-demo
   Web application where humans can play Overcooked with AI agents.
   Language: JavaScript
   Stars: 55

7. seals
   Benchmark environments for reward modelling and imitation learning algorithms.
   Language: Python
   Stars: 44

8. rlsp
   Reward Learning by Simulating the Past
   Language: Python
   Stars: 43

9. tensor-trust
   A prompt injection game to collect data for robust ML research
   Language: Python
   Stars: 39

10. eirli
    An Empirical Investigation of Representation Learning for Imitation (EIRLI), NeurIPS'21
    Language: Python
    Stars: 36

11. go_attack
    Language: Python
    Stars: 31

12. tensor-trust-data
    Dataset for the Tensor Trust project
    Language: Jupyter Notebook
    Stars: 29

13. atari-irl
    Language: Python
    Stars: 26

14. deep-rlsp
    Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021.
    Language: Python
    Stars: 26

15. ranking-challenge
    Testing ranking algorithms to improve social cohesion
    Language: Python
    Stars: 25

16. population-irl
    (Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards
    Language: Python
    Stars: 25

17. learning_biases
    Infer how suboptimal agents are suboptimal while planning, for example if they are hyperbolic time discounters.
    Language: Jupyter Notebook
    Stars: 22

18. human_ai_robustness
    Language: Python
    Stars: 21

19. learning-from-human-preferences
    Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
    Language: Python
    Stars: 21

20. overcooked-hAI-exp
    Overcooked-AI Experiment Psiturk Demo (for MTurk experiments)
    Language: JavaScript
    Stars: 12

21. leela-interp
    Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"
    Language: Jupyter Notebook
    Stars: 11

22. better-adversarial-defenses
    Training in bursts for defending against adversarial policies
    Language: Python
    Stars: 11

23. interpreting-rewards
    Experiments in applying interpretability techniques to learned reward functions.
    Language: Jupyter Notebook
    Stars: 9

24. nn-clustering-pytorch
    Checking the divisibility of neural networks, and investigating the nature of the pieces networks can be divided into.
    Language: Python
    Stars: 6

25. reward-preprocessing
    Preprocessing reward functions to make them more interpretable
    Language: Python
    Stars: 5

26. recon-email
    Script for automatically creating the reconnaissance email.
    Language: HTML
    Stars: 5

27. assistance-games
    Supporting code for Assistance Games as a Framework paper
    Language: Python
    Stars: 3

28. reducing-exploitability
    Language: Python
    Stars: 3

29. KataGoVisualizer
    Language: Jupyter Notebook
    Stars: 2

30. multi-agent
    Language: Python
    Stars: 2

31. derail
    Supporting code for diagnostic seals paper
    Language: Python
    Stars: 2

32. epic
    Implements the Equivalent-Policy Invariant Comparison (EPIC) distance for reward functions.
    Language: Python
    Stars: 1

33. cs294-149-fa18-notes
    LaTeX Notes from the Fall 2018 version of CS294-149: AGI Safety and Control
    Language: TeX
    Stars: 1

34. simulation-awareness
    (experimental) RL agents should be more aligned if they do not know whether they are in simulation or in the real world
    Language: Python
    Stars: 1

35. logical-active-classification
    Use active learning to classify data represented as boundaries of regions in parameter space where a parametrised logical formula holds.
    Language: Python
    Stars: 1

36. reward-function-interpretability
    Language: Jupyter Notebook
    Stars: 1