imitation: Clean PyTorch implementations of imitation and reward learning algorithms
overcooked_ai: A benchmark environment for fully cooperative human-AI performance.
adversarial-policies: Find best responses to a fixed policy in multi-agent RL
human_aware_rl: Code for "On the Utility of Learning about Humans for Human-AI Coordination"
evaluating-rewards: Library to compare and evaluate reward functions
overcooked-demo: Web application where humans can play Overcooked with AI agents.
seals: Benchmark environments for reward modelling and imitation learning algorithms.
rlsp: Reward Learning by Simulating the Past
eirli: An Empirical Investigation of Representation Learning for Imitation (EIRLI), NeurIPS '21
tensor-trust-data: Dataset for the Tensor Trust project
go_attack
ranking-challenge: Testing ranking algorithms to improve social cohesion
atari-irl
deep-rlsp: Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021
population-irl: (Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards
learning_biases: Infer how suboptimal agents are suboptimal while planning, e.g. whether they are hyperbolic time discounters.
human_ai_robustness
learning-from-human-preferences: Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"
overcooked-hAI-exp: Overcooked-AI experiment psiTurk demo (for MTurk experiments)
leela-interp: Code for "Evidence of Learned Look-Ahead in a Chess-Playing Neural Network"
better-adversarial-defenses: Training in bursts for defending against adversarial policies
interpreting-rewards: Experiments in applying interpretability techniques to learned reward functions.
nn-clustering-pytorch: Checking the divisibility of neural networks, and investigating the nature of the pieces networks can be divided into.
reward-preprocessing: Preprocessing reward functions to make them more interpretable
recon-email: Script for automatically creating the reconnaissance email.
assistance-games: Supporting code for the "Assistance Games as a Framework" paper
KataGo-custom: Child repository of https://github.com/HumanCompatibleAI/go_attack.
reducing-exploitability
KataGoVisualizer
multi-agent
derail: Supporting code for the diagnostic seals paper
epic: Implements the Equivalent-Policy Invariant Comparison (EPIC) distance for reward functions.
cs294-149-fa18-notes: LaTeX notes from the Fall 2018 version of CS294-149: AGI Safety and Control
simulation-awareness: (Experimental) RL agents should be more aligned if they do not know whether they are in simulation or in the real world
logical-active-classification: Use active learning to classify data represented as boundaries of regions in parameter space where a parametrised logical formula holds.
reward-function-interpretability
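The epic entry above computes a distance between reward functions. As a rough illustration of the idea (not the library's actual API), EPIC canonically shapes each reward to remove potential-based shaping, then takes the Pearson distance between the results. A minimal tabular sketch, assuming uniform state and action distributions:

```python
import numpy as np

def canonicalize(R: np.ndarray, gamma: float) -> np.ndarray:
    """Canonically shape a tabular reward R[s, a, s'] under uniform
    state/action distributions, removing potential-based shaping terms."""
    m = R.mean(axis=(1, 2))   # m[x] = E_{A,S'}[R(x, A, S')]
    mbar = m.mean()           # E_S[m(S)]
    # C(R)(s,a,s') = R(s,a,s') + gamma*m(s') - m(s) - gamma*E[m(S)]
    return R + gamma * m[None, None, :] - m[:, None, None] - gamma * mbar

def epic_distance(R1: np.ndarray, R2: np.ndarray, gamma: float = 0.99) -> float:
    """Pearson distance sqrt((1 - rho)/2) between canonically shaped rewards."""
    c1 = canonicalize(R1, gamma).ravel()
    c2 = canonicalize(R2, gamma).ravel()
    rho = np.corrcoef(c1, c2)[0, 1]
    return float(np.sqrt(max(0.0, (1.0 - rho) / 2.0)))
```

With this canonicalization, a reward that differs from another only by positive scaling and potential shaping (R2 = a*R1 + gamma*phi(s') - phi(s)) gets distance near zero, which is the invariance EPIC is designed to provide.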