imitation: Clean PyTorch implementations of imitation and reward learning algorithms
overcooked_ai: A benchmark environment for fully cooperative human-AI performance.
adversarial-policies: Find best-response to a fixed policy in multi-agent RL
human_aware_rl: Code for "On the Utility of Learning about Humans for Human-AI Coordination"
evaluating-rewards: Library to compare and evaluate reward functions
overcooked-demo: Web application where humans can play Overcooked with AI agents.
seals: Benchmark environments for reward modelling and imitation learning algorithms.
rlsp: Reward Learning by Simulating the Past
tensor-trust: A prompt injection game to collect data for robust ML research
eirli: An Empirical Investigation of Representation Learning for Imitation (EIRLI), NeurIPS '21
tensor-trust-data: Dataset for the Tensor Trust project
go_attack
ranking-challenge: Testing ranking algorithms to improve social cohesion
atari-irl
population-irl: (Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards
learning_biases: Infer how suboptimal agents are suboptimal while planning, for example if they are hyperbolic time discounters.
human_ai_robustness
learning-from-human-preferences: Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
overcooked-hAI-exp: Overcooked-AI Experiment Psiturk Demo (for MTurk experiments)
leela-interp: Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"
better-adversarial-defenses: Training in bursts for defending against adversarial policies
interpreting-rewards: Experiments in applying interpretability techniques to learned reward functions.
nn-clustering-pytorch: Checking the divisibility of neural networks, and investigating the nature of the pieces networks can be divided into.
reward-preprocessing: Preprocessing reward functions to make them more interpretable
recon-email: Script for automatically creating the reconnaissance email.
assistance-games: Supporting code for the "Assistance Games as a Framework" paper
KataGo-custom: Child repository of https://github.com/HumanCompatibleAI/go_attack.
reducing-exploitability
KataGoVisualizer
multi-agent
derail: Supporting code for the diagnostic seals paper
epic: Implements the Equivalent-Policy Invariant Comparison (EPIC) distance for reward functions.
cs294-149-fa18-notes: LaTeX notes from the Fall 2018 version of CS294-149: AGI Safety and Control
simulation-awareness: (Experimental) RL agents should be more aligned if they do not know whether they are in simulation or in the real world
logical-active-classification: Use active learning to classify data represented as boundaries of regions in parameter space where a parametrised logical formula holds.
reward-function-interpretability