imitation: Clean PyTorch implementations of imitation and reward learning algorithms
overcooked_ai: A benchmark environment for fully cooperative human-AI performance.
adversarial-policies: Find best-response to a fixed policy in multi-agent RL
human_aware_rl: Code for "On the Utility of Learning about Humans for Human-AI Coordination"
evaluating-rewards: Library to compare and evaluate reward functions
overcooked-demo: Web application where humans can play Overcooked with AI agents.
seals: Benchmark environments for reward modelling and imitation learning algorithms.
rlsp: Reward Learning by Simulating the Past
tensor-trust: A prompt injection game to collect data for robust ML research
eirli: An Empirical Investigation of Representation Learning for Imitation (EIRLI), NeurIPS '21
tensor-trust-data: Dataset for the Tensor Trust project
go_attack
ranking-challenge: Testing ranking algorithms to improve social cohesion
atari-irl
population-irl: (Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards
learning_biases: Infer how suboptimal agents are suboptimal while planning, for example if they are hyperbolic time discounters.
human_ai_robustness
learning-from-human-preferences: Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
overcooked-hAI-exp: Overcooked-AI Experiment Psiturk Demo (for MTurk experiments)
leela-interp: Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"
better-adversarial-defenses: Training in bursts for defending against adversarial policies
interpreting-rewards: Experiments in applying interpretability techniques to learned reward functions.
nn-clustering-pytorch: Checking the divisibility of neural networks, and investigating the nature of the pieces networks can be divided into.
reward-preprocessing: Preprocessing reward functions to make them more interpretable
recon-email: Script for automatically creating the reconnaissance email.
assistance-games: Supporting code for the "Assistance Games as a Framework" paper
KataGo-custom: Child repository of https://github.com/HumanCompatibleAI/go_attack.
reducing-exploitability
KataGoVisualizer
multi-agent
derail: Supporting code for the diagnostic seals paper
epic: Implements the Equivalent-Policy Invariant Comparison (EPIC) distance for reward functions.
cs294-149-fa18-notes: LaTeX notes from the Fall 2018 version of CS294-149: AGI Safety and Control
simulation-awareness: (Experimental) RL agents should be more aligned if they do not know whether they are in simulation or in the real world
logical-active-classification: Use active learning to classify data represented as boundaries of regions in parameter space where a parametrised logical formula holds.
reward-function-interpretability