
Video-Pre-Training

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

📄 Read Paper
📣 Blog Post
👾 MineRL Environment (note version 1.0+ required)
🏁 MineRL BASALT Competition

Running agent models

Install the prerequisites for MineRL. Then install the requirements with:

pip install git+https://github.com/minerllabs/minerl
pip install -r requirements.txt

To run the code, call

python run_agent.py --model [path to .model file] --weights [path to .weight file]

After loading up, you should see a window of the agent playing Minecraft.
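
For reference, the core of run_agent.py amounts to a standard gym rollout loop. The sketch below is a simplified, non-authoritative reconstruction: the MineRLAgent class, its load_weights and get_action methods, and the key layout of the pickled .model file are assumptions based on this repository's agent.py and run_agent.py, so check those scripts if anything differs.

# Minimal sketch of the agent rollout loop (assumed API; see run_agent.py).
import pickle
import gym
import minerl  # noqa: F401 -- importing registers the MineRL environments

from agent import MineRLAgent  # assumption: class provided by this repository

MODEL_FILE = "foundation-model-1x.model"      # pickled policy hyperparameters
WEIGHTS_FILE = "foundation-model-1x.weights"  # matching weights

with open(MODEL_FILE, "rb") as f:
    agent_parameters = pickle.load(f)
policy_kwargs = agent_parameters["model"]["args"]["net"]["args"]    # assumed key layout
pi_head_kwargs = agent_parameters["model"]["args"]["pi_head_opts"]  # assumed key layout

env = gym.make("MineRLBasaltFindCave-v0")  # any MineRL 1.0+ environment
agent = MineRLAgent(env, policy_kwargs=policy_kwargs, pi_head_kwargs=pi_head_kwargs)
agent.load_weights(WEIGHTS_FILE)

obs = env.reset()
done = False
while not done:
    action = agent.get_action(obs)              # pixels in, MineRL action dict out
    obs, reward, done, info = env.step(action)
    env.render()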

Agent Model Zoo

Below are the model files and weight files for various pre-trained Minecraft models. The 1x, 2x and 3x model files correspond to the widths of their respective model weights.

Demonstration Only - Behavioral Cloning

These models are trained on video demonstrations of humans playing Minecraft using behavioral cloning (BC) and are more general than the later models, which use reinforcement learning (RL) to further optimize the policy. Foundational models are trained across all videos in a single training run, while the house and early-game models further refine the foundational model of their respective size using either the house-building contractor data or the early-game video subset. See the paper linked above for more details.

Foundational Model 📈

Fine-Tuned from House 📈

Fine-Tuned from Early Game 📈

Models With Environment Interactions

These models further refine the above demonstration-based models with a reward function targeted at obtaining diamond pickaxes. While less general than the behavioral cloning models, these models have the benefit of interacting with the environment using a reward function and excel at progressing through the tech tree quickly. See the paper for more information on how they were trained and the exact reward schedule.

RL from Foundation 📈

RL from House 📈

RL from Early Game 📈

Running Inverse Dynamics Model (IDM)

The IDM aims to predict what actions the player is taking in a video recording.

Setup:

  • Install the requirements as described above.
  • Download the IDM model and weights files (4x_idm.model and 4x_idm.weights).
  • Download a contractor recording to analyze: an .mp4 video and its matching .jsonl action file (see Contractor Demonstrations below).

To run the model with the above files placed in the root directory of this code:

python run_inverse_dynamics_model.py --weights 4x_idm.weights --model 4x_idm.model --video-path cheeky-cornflower-setter-02e496ce4abb-20220421-092639.mp4 --jsonl-path cheeky-cornflower-setter-02e496ce4abb-20220421-092639.jsonl

A window should pop up showing the video frame by frame, with the predicted and true (recorded) actions displayed side by side on the left.

Note that run_inverse_dynamics_model.py is designed to be a demo of the IDM, not code to put it into practice.
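
If you only want to inspect the ground-truth actions the IDM is evaluated against, the paired .mp4 and .jsonl files can be stepped through in lockstep with standard libraries. The sketch below assumes one .jsonl line per video frame (recordings are captured at 20Hz) and uses only OpenCV and the json module; the IDM call itself is omitted, since its API is specific to run_inverse_dynamics_model.py.

# Step through a contractor recording together with its recorded actions.
import json
import cv2  # opencv-python

VIDEO = "cheeky-cornflower-setter-02e496ce4abb-20220421-092639.mp4"
ACTIONS = "cheeky-cornflower-setter-02e496ce4abb-20220421-092639.jsonl"

with open(ACTIONS) as f:
    recorded_actions = [json.loads(line) for line in f]  # one dict per frame (assumed)

cap = cv2.VideoCapture(VIDEO)
for action in recorded_actions:
    ok, frame = cap.read()
    if not ok:
        break
    # Show which keys were held and the mouse deltas for this frame.
    print(action["keyboard"]["keys"], action["mouse"]["dx"], action["mouse"]["dy"])
    cv2.imshow("recording", frame)
    if cv2.waitKey(50) & 0xFF == ord("q"):  # ~20 fps playback; press q to quit
        break
cap.release()
cv2.destroyAllWindows()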

Using behavioural cloning to fine-tune the models

Disclaimer: This code is a rough demonstration only and not an exact recreation of what the original VPT paper did (but it contains some preprocessing steps you want to be aware of)! As such, do not expect to replicate the original experiments with this code. This code has been designed to be runnable on consumer hardware (e.g., 8 GB of VRAM).

Setup:

  • Install requirements: pip install -r requirements.txt
  • Download the .weights and .model files for the model you want to fine-tune.
  • Download contractor data (below) and place the .mp4 and .jsonl files into the same directory (e.g., data). With default settings, you need at least 12 recordings.

If you downloaded the "1x Width" models and placed some data under the data directory, you can perform fine-tuning with

python behavioural_cloning.py --data-dir data --in-model foundation-model-1x.model --in-weights foundation-model-1x.weights --out-weights finetuned-1x.weights

You can then use finetuned-1x.weights when running the agent. You can change the training settings at the top of behavioural_cloning.py.

Major limitations:

  • Only trains a single step at a time, i.e., errors are not propagated through timesteps.
  • Computes gradients one sample at a time to keep memory use low, which also slows down training.
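
To make the second limitation concrete: the training loop accumulates gradients one sample at a time and only then takes an optimizer step. The sketch below is a generic PyTorch illustration of that pattern with placeholder policy, loss, and data names, not the exact code in behavioural_cloning.py.

# Generic per-sample gradient accumulation (placeholder names; see
# behavioural_cloning.py for the actual training loop and loss).
import torch

def train_batch(policy, optimizer, samples):
    """samples: list of (frame_tensor, target_action_index) pairs."""
    optimizer.zero_grad()
    for frame, target in samples:
        # Forward one sample at a time so only one sample's activations
        # are held in memory.
        logits = policy(frame.unsqueeze(0))
        loss = torch.nn.functional.cross_entropy(logits, torch.tensor([target]))
        (loss / len(samples)).backward()  # accumulate scaled gradients
    optimizer.step()                      # one update per batch of samples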

Contractor Demonstrations

Versions

Over the course of the project we requested various demonstrations from contractors, which we release as index files below. In general, a major recorder version change corresponds to a new prompt or recording feature, while bug fixes were represented as minor version changes. However, for some recorder versions we asked contractors to change their username when recording particular modalities. Also, since contractors ask questions internally, a clarification given to one contractor may result in a behavioral change in another contractor. It is intractable to share every contractor's view for each version, but we've shared the prompts and major clarifications for each recorder version where the task changed significantly.

Initial Prompt

We are collecting data for training AI models in Minecraft. You'll need to install Java, download the modified version of Minecraft (that collects and uploads your play data), and play Minecraft survival mode! Paid per hour of gameplay. Prior experience in Minecraft is not necessary. We do not collect any data that is unrelated to Minecraft from your computer.

The following is a list of the available versions:

  • 6.x Core recorder features subject to change ⬇️ index file

    • 6.9 First feature complete recorder version

    • 6.10 Fixes mouse scaling on Mac when gui is open

    • 6.11 Tracks the hotbar slot

    • 6.13 Sprinting, swap-hands, ... (see commits below)

      Commits
      • improve replays that are cut in the middle of gui; working on riding boats / replays cut in the middle of a run
      • improve replays by adding dwheel action etc, also, loosen up replay tolerances
      • opencv version bump
      • add swap hands, and recording of the step timestamp
      • implement replaying from running and sprinting and tests
      • do not record sprinting (can use stats for that)
      • check for mouse button number, ignore >2
      • handle the errors when mouse / keyboard are recorded as null
  • 7.x Prompt changes ⬇️ index file

    • 7.6 Bump version for internal tracking

      Additional ask to contractors

      Right now, early game data is especially valuable to us. As such, we request that at least half of the data you upload is from the first 30 minutes of the game. This means that, for every hour of gameplay you spend in an older world, we ask you to play two sessions in which you create a new world and play for 30 minutes. You can play for longer in these worlds, but only the first 30 minutes counts as early game data.

  • 8.x 📋 House Building from Scratch Task ⬇️ index

    Changes and Prompt

    Hi all! Thank you for your hard work so far.

    This week we would like to have you all collect data on a specific task.

    This comes with a new recorder version 8.0 which you will need to update your recording script to download.

    This week we would like you to use a new world each time you play, so loading existing worlds is disabled.

    The new task is as follows:

    Starting in a new world, build a simple house in 10-15 minutes. This corresponds to one day and a bit of the night. Please use primarily wood, dirt, and sand, as well as crafted wood items such as doors, fences, etc. in constructing your house. Avoid using difficult items such as stone. Aside from those constraints, you may decorate the structure you build as you wish. It does not need to have any specific furniture. For example, it is OK if there is no bed in your house. If you have not finished the house by the sunrise (20 minutes) please exit and continue to another demonstration. Please continue to narrate what you are doing while completing this task.

    Since you will be unable to resume building after exiting Minecraft or going back to the main menu, you must finish these demonstrations in one session. Pausing via the menu is still supported. If you want to view your creations later, they will be saved locally so you can look at them in your own time. We may use these save files in a future task so if you have space, please leave the save files titled “build-house-15-min-“.

    For this week try to avoid all cobblestone / stone / granite

    For this week we just want simple houses without sleeping. If 10 minutes is too short, let us know and we can think of how to adjust!

    Stone tools are ok but I think you may run-out of time

    Changes:

    • Timer ends episode after 10 realtime minutes
    • Worlds are named: "build-house-15-min-" + Math.abs(random.nextInt());
    • Note: this version introduces a 10-minute timer that ends the episode; it occasionally cut experiments short and was fixed in 9.1
    • 8.0 Simple House
    • 8.2 Update upload script
  • 9.x 📋 House Building from Random Starting Materials Task ⬇️ index

    Changes and Prompt

    You now will have 10 minutes to use the provided resources to build your house / home / or structure. In this version, the experiment will time out after 10 minutes if you are not complete so don't be alarmed if that happens, it is intentional.

    No need to use up all the resources! It's ok to collect a few things but spend the majority of the time placing blocks (the act of placing seems to be harder to learn)

    Changes:

    • Worlds are named: "design-house-10-min-" + Math.abs(random.nextInt());
    • Starting inventory given by code below
    Random Starting Inventory Code
          Random random = new Random();
          List<ItemStack> hotbar = new ArrayList<>();
          List<ItemStack> inventory = new ArrayList<>();
    
          // Ensure we give the player the basic tools in their hot bar
          hotbar.add(new ItemStack(Items.STONE_AXE));
          hotbar.add(new ItemStack(Items.STONE_PICKAXE));
          hotbar.add(new ItemStack(Items.STONE_SHOVEL));
          hotbar.add(new ItemStack(Items.CRAFTING_TABLE));
    
          // Add some random items to the player hotbar as well
          addToList(hotbar, inventory, Items.TORCH, random.nextInt(16) * 2 + 2);
    
          // Next add main building blocks
          if (random.nextFloat() < 0.7) {
             addToList(hotbar, inventory, Items.OAK_FENCE_GATE, random.nextInt(5));
             addToList(hotbar, inventory, Items.OAK_FENCE, random.nextInt(5) * 64);
             addToList(hotbar, inventory, Items.OAK_DOOR, random.nextInt(5));
             addToList(hotbar, inventory, Items.OAK_TRAPDOOR, random.nextInt(2) * 2);
             addToList(hotbar, inventory, Items.OAK_PLANKS, random.nextInt(3) * 64 + 128);
             addToList(hotbar, inventory, Items.OAK_SLAB, random.nextInt(3) * 64);
             addToList(hotbar, inventory, Items.OAK_STAIRS, random.nextInt(3) * 64);
             addToList(hotbar, inventory, Items.OAK_LOG, random.nextInt(2) * 32);
             addToList(hotbar, inventory, Items.OAK_PRESSURE_PLATE, random.nextInt(5));
          } else {
             addToList(hotbar, inventory, Items.BIRCH_FENCE_GATE, random.nextInt(5));
             addToList(hotbar, inventory, Items.BIRCH_FENCE, random.nextInt(5) * 64);
             addToList(hotbar, inventory, Items.BIRCH_DOOR, random.nextInt(5));
             addToList(hotbar, inventory, Items.BIRCH_TRAPDOOR, random.nextInt(2) * 2);
             addToList(hotbar, inventory, Items.BIRCH_PLANKS, random.nextInt(3) * 64 + 128);
             addToList(hotbar, inventory, Items.BIRCH_SLAB, random.nextInt(3) * 64);
             addToList(hotbar, inventory, Items.BIRCH_STAIRS, random.nextInt(3) * 64);
             addToList(hotbar, inventory, Items.BIRCH_LOG, random.nextInt(2) * 32);
             addToList(hotbar, inventory, Items.BIRCH_PRESSURE_PLATE, random.nextInt(5));
          }
    
          // Now add some random decoration items to the player inventory
          addToList(hotbar, inventory, Items.CHEST, random.nextInt(3));
          addToList(hotbar, inventory, Items.FURNACE, random.nextInt(2) + 1);
          addToList(hotbar, inventory, Items.GLASS_PANE,  random.nextInt(5) * 4);
          addToList(hotbar, inventory, Items.WHITE_BED, (int) (random.nextFloat() + 0.2)); // Bed 20% of the time
          addToList(hotbar, inventory, Items.PAINTING, (int) (random.nextFloat() + 0.1)); // Painting 10% of the time
          addToList(hotbar, inventory, Items.FLOWER_POT, (int) (random.nextFloat() + 0.1) * 4); // 4 Flower pots 10% of the time
          addToList(hotbar, inventory, Items.OXEYE_DAISY, (int) (random.nextFloat() + 0.1) * 4); // 4 Oxeye daisies 10% of the time
          addToList(hotbar, inventory, Items.POPPY, (int) (random.nextFloat() + 0.1) * 4); // 4 Poppies 10% of the time
          addToList(hotbar, inventory, Items.SUNFLOWER, (int) (random.nextFloat() + 0.1) * 4); // 4 Sunflowers 10% of the time
    
          // Shuffle the hotbar slots and inventory slots
          Collections.shuffle(hotbar);
          Collections.shuffle(inventory);
    
          // Give the player the items
          this.mc.getIntegratedServer().getPlayerList().getPlayers().forEach(p -> {
             if (p.getUniqueID().equals(this.getUniqueID())) {
                 hotbar.forEach(p.inventory::addItemStackToInventory);
                 inventory.forEach(p.inventory::addItemStackToInventory);
             }
          });
    • 9.0 First version
    • 9.1 Fixed timer bug
  • 10.0 📋 Obtain Diamond Pickaxe Task ⬇️ index

    Changes and Prompt

    Prompt:

    For this new task we have given you 20 minutes to craft a diamond pickaxe. We ask that you do not try to search for villages or other ways of getting diamonds, but if you are spawned in view of one, or happen to fall into a cave structure feel free to explore it for diamonds. If 20 min is not enough that is OK. It will happen on some seeds because of bad luck. Please do not use glitches to find the diamonds.

    Changes:

    • change to 20 minute time limit
    • don't count gui time as part of the time limit
    • Worlds are named "collect-diamond-pickaxe-15min-" + Math.abs(random.nextInt());

Sometimes we asked the contractors to signify particular tasks by means other than the recorder version. This primarily occurred in versions 6 and 7, as versions 8, 9 and 10 are all task specific.

Prompt to contractors

Another request about additional time: please use some of it to chop trees. Specifically, please start the recorder by adding the --username treechop argument to the script (i.e. use play --username treechop on Windows, ./play.sh --username treechop on OSX/Linux), and spend some time chopping trees! Getting wooden or stone tools is OK, but please spend the majority of the time with the username treechop specifically chopping. I did it myself for about 15 minutes, and it does get boring pretty quickly, so I don't expect you to do it all the time, but please do at least a little bit of chopping. Feel free to play normally the rest of the time (but please restart without the --username treechop argument when you are not chopping). However, it is preferable that you start a new world, and use only the tools that are easily obtainable in that world. I'll see what I can do about getting the player an iron axe; that sounds reasonable and should not be hard, but will require a code update.

Environment

We restrict the contractors to playing Minecraft in windowed mode at 720p, which we downsample to 360p at 20Hz to minimize space. We also disabled the options screen to prevent contractors from changing settings such as brightness or rendering options. We ask contractors not to press keys such as F3, which shows a debug overlay; however, some contractors may still do this.

Data format

Demonstrations are broken up into segments of up to 5 minutes, each consisting of a series of compressed screen observations, actions, environment statistics, and a checkpoint save file from the start of the segment. Each relative path in the index has all the files for that segment; however, if a file was dropped while uploading, the corresponding relative path is not included in the index, so there may be missing chunks in otherwise continuous demonstrations.

Index files are provided for each version as a json file:

{
  "basedir": "https://openaipublic.blob.core.windows.net/data/",
  "relpaths": [
    "8.0/cheeky-cornflower-setter-74ae6c2eae2e-20220315-122354",
    ...
  ]
}

Relative paths have the following format:

  • <recorder-version>/<contractor-alias>-<session-id>-<date>-<time>

Note that due to network errors, some segments may be missing from otherwise continuous demonstrations.

Your data loader can then find the following files:

  • Video observation: <basedir>/<relpath>.mp4
  • Action file: <basedir>/<relpath>.jsonl
  • Options file: <basedir>/<relpath>-options.json
  • Checkpoint save file: <basedir>/<relpath>.zip
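
For example, here is a minimal sketch of turning an index file into download URLs; the index layout is as shown above, while the local index filename and the use of the requests package are assumptions.

# Resolve relative paths from an index file into full URLs and fetch a few
# video/action pairs. Assumes `pip install requests` and a locally saved
# index file (the filename below is hypothetical).
import json
import requests

with open("index.json") as f:
    index = json.load(f)

basedir = index["basedir"]
for relpath in index["relpaths"][:5]:          # just the first few segments
    for suffix in (".mp4", ".jsonl"):
        url = basedir + relpath + suffix
        local_name = relpath.replace("/", "_") + suffix
        response = requests.get(url)
        if response.ok:                         # a missing chunk yields an error status
            with open(local_name, "wb") as out:
                out.write(response.content)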

The action file is not a single valid JSON object: each line in the action file is an individual JSON dictionary describing one action.

For v7.x, the actions are of the form

{
  "mouse": {
    "x": 274.0,
    "y": 338.0,
    "dx": 0.0,
    "dy": 0.0,
    "scaledX": -366.0,
    "scaledY": -22.0,
    "dwheel": 0.0,
    "buttons": [],
    "newButtons": []
  },
  "keyboard": {
    "keys": [
      "key.keyboard.a",
      "key.keyboard.s"
    ],
    "newKeys": [],
    "chars": ""
  },
  "isGuiOpen": false,
  "isGuiInventory": false,
  "hotbar": 4,
  "yaw": -112.35006,
  "pitch": 8.099996,
  "xpos": 841.364694513396,
  "ypos": 63.0,
  "zpos": 24.956354839537802,
  "tick": 0,
  "milli": 1649575088006,
  "inventory": [
    {
      "type": "oak_door",
      "quantity": 3
    },
    {
      "type": "oak_planks",
      "quantity": 59
    },
    {
      "type": "stone_pickaxe",
      "quantity": 1
    },
    {
      "type": "oak_planks",
      "quantity": 64
    }
  ],
  "serverTick": 6001,
  "serverTickDurationMs": 36.3466,
  "stats": {
    "minecraft.custom:minecraft.jump": 4,
    "minecraft.custom:minecraft.time_since_rest": 5999,
    "minecraft.custom:minecraft.play_one_minute": 5999,
    "minecraft.custom:minecraft.time_since_death": 5999,
    "minecraft.custom:minecraft.walk_one_cm": 7554,
    "minecraft.use_item:minecraft.oak_planks": 5,
    "minecraft.custom:minecraft.fall_one_cm": 269,
    "minecraft.use_item:minecraft.glass_pane": 3
  }
}
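
Since each line is an independent dictionary, the action files can be filtered directly. As a small example using only the field names shown above:

# Count frames with an open GUI and print the final recorded inventory.
# Field names follow the v7.x action example above.
import json

JSONL = "cheeky-cornflower-setter-02e496ce4abb-20220421-092639.jsonl"

gui_frames = 0
total_frames = 0
last_inventory = []
with open(JSONL) as f:
    for line in f:
        action = json.loads(line)
        total_frames += 1
        gui_frames += int(action["isGuiOpen"])
        last_inventory = action.get("inventory", last_inventory)

print(f"{gui_frames}/{total_frames} frames had a GUI open")
print("final inventory:", [(i["type"], i["quantity"]) for i in last_inventory])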

BASALT 2022 dataset

We also collected a dataset of demonstrations for the MineRL BASALT 2022 competition, with around 150GB of data per task.

Note: To avoid confusion with the competition rules, the action files (.jsonl) have been stripped of information that is not allowed in the competition. We will upload the unmodified dataset after the competition ends.

  • FindCave ⬇️ index file
    • Prompt to contractors
      Look around for a cave. When you are inside one, quit the game by opening main menu and pressing "Save and Quit To Title".
      You are not allowed to dig down from the surface to find a cave.
      
      Timelimit: 3 minutes.
      Example recordings: https://www.youtube.com/watch?v=TclP_ozH-eg
      
  • MakeWaterfall ⬇️ index file
    • Prompt to contractors
      After spawning in a mountainous area with a water bucket and various tools, build a beautiful waterfall and then reposition yourself to “take a scenic picture” of the same waterfall, and then quit the game by opening the menu and selecting "Save and Quit to Title"
      
      Timelimit: 5 minutes.
      Example recordings: https://youtu.be/NONcbS85NLA
      
  • MakeVillageAnimalPen ⬇️ index file
    • Prompt to contractors
      After spawning in a village, build an animal pen next to one of the houses in a village. Use your fence posts to build one animal pen that contains at least two of the same animal. (You are only allowed to pen chickens, cows, pigs, sheep or rabbits.) There should be at least one gate that allows players to enter and exit easily. The animal pen should not contain more than one type of animal. (You may kill any extra types of animals that accidentally got into the pen.) Don’t harm the village.
      After you are done, quit the game by opening the menu and pressing "Save and Quit to Title".
      
      You may need to terraform the area around a house to build a pen. When we say not to harm the village, examples include taking animals from existing pens, damaging existing houses or farms, and attacking villagers. Animal pens must have a single type of animal: pigs, cows, sheep, chicken or rabbits.
      
      The food items can be used to lure in the animals: if you hold seeds in your hand, this attracts nearby chickens to you, for example.
      
      Timelimit: 5 minutes.
      Example recordings: https://youtu.be/SLO7sep7BO8
      
  • BuildVillageHouse ⬇️ index file
    • Prompt to contractors
      Taking advantage of the items in your inventory, build a new house in the style of the village (random biome), in an appropriate location (e.g. next to the path through the village), without harming the village in the process.
      Then give a brief tour of the house (i.e. spin around slowly such that all of the walls and the roof are visible).
      
      * You start with a stone pickaxe and a stone axe, and various building blocks. It’s okay to break items that you misplaced (e.g. use the stone pickaxe to break cobblestone blocks).
      * You are allowed to craft new blocks.
      
      Please spend less than ten minutes constructing your house.
      
      You don’t need to copy another house in the village exactly (in fact, we’re more interested in having slight deviations, while keeping the same "style"). You may need to terraform the area to make space for a new house.
      When we say not to harm the village, examples include taking animals from existing pens, damaging existing houses or farms, and attacking villagers.
      
      After you are done, quit the game by opening the menu and pressing "Save and Quit to Title".
      
      Timelimit: 12 minutes.
      Example recordings: https://youtu.be/WeVqQN96V_g
      

Contribution

This was a large effort by a dedicated team at OpenAI: Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune. The code here represents a minimal version of our model code, prepared by Anssi Kanervisto and others so that these models could be used as part of the MineRL BASALT competition.
