This repository has been deprecated in favor of the Retro (https://github.com/openai/retro) library. See our Retro Contest (https://blog.openai.com/retro-contest) blog post for details.

universe

Universe is a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications. This is the universe open-source library, which provides a simple Gym interface to each Universe environment.

Universe allows anyone to train and evaluate AI agents on an extremely wide range of real-time, complex environments.

Universe makes it possible for any existing program to become an OpenAI Gym environment, without needing special access to the program's internals, source code, or APIs. It does this by packaging the program into a Docker container, and presenting the AI with the same interface a human uses: sending keyboard and mouse events, and receiving screen pixels. Our initial release contains over 1,000 environments in which an AI agent can take actions and gather observations.

Additionally, some environments include a reward signal sent to the agent, to guide reinforcement learning. We've included a few hundred environments with reward signals. These environments also include automated start menu clickthroughs, allowing your agent to skip to the interesting part of the environment.

We'd like the community's help to grow the number of available environments, including integrating increasingly large and complex games.

The following classes of tasks are packaged inside of publicly-available Docker containers, and can be run today with no work on your part:

  • Atari and CartPole environments over VNC: gym-core.Pong-v3, gym-core.CartPole-v0, etc.
  • Flash games over VNC: flashgames.DuskDrive-v0, etc.
  • Browser tasks ("World of Bits") over VNC: wob.mini.TicTacToe-v0, etc.
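For example, here is a minimal sketch of creating one environment from each class (the IDs are the ones listed above; importing universe registers them all):

import gym
import universe  # register the universe environments

# Any of the IDs listed above can be passed straight to gym.make():
atari_env = gym.make('gym-core.Pong-v3')         # Atari over VNC
flash_env = gym.make('flashgames.DuskDrive-v0')  # Flash game over VNC
wob_env = gym.make('wob.mini.TicTacToe-v0')      # World of Bits browser task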

We've scoped out integrations for many other games, including completing a high-quality GTA V integration (thanks to Craig Quiter and NVIDIA), but these aren't included in today's release.

Getting started

Installation

Supported systems

We currently support Linux and OSX running Python 2.7 or 3.5.

We recommend setting up a conda environment before getting started, to keep all your Universe-related packages in the same place.

Install Universe

To get started, first install universe:

git clone https://github.com/openai/universe.git
cd universe
pip install -e .

If this errors out, you may be missing some required packages. Here's the list of required packages we know about so far (please let us know if you had to install any others).

On Ubuntu 16.04:

pip install numpy
sudo apt-get install golang libjpeg-turbo8-dev make

On Ubuntu 14.04:

sudo add-apt-repository ppa:ubuntu-lxc/lxd-stable  # for newer golang
sudo apt-get update
sudo apt-get install golang libjpeg-turbo8-dev make

On OSX:

You might need to install Command Line Tools by running:

xcode-select --install

You may also need the numpy, incremental, and libjpeg-turbo packages:

pip install numpy incremental
brew install golang libjpeg-turbo

Install Docker

The majority of the environments in Universe run inside Docker containers, so you will need to install Docker (on OSX, we recommend Docker for Mac). You should be able to run docker ps and get something like this:

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Alternate configuration - running the agent in docker

The above instructions result in an agent that runs as a regular python process in your OS, and launches docker containers as needed for the remotes. Alternatively, you can build a docker image for the agent and run it as a container as well. You can do this on any operating system that has a recent version of docker and a git client installed.

To get started, clone the universe repo:

git clone https://github.com/openai/universe.git
cd universe

Build a docker image, tag it as 'universe':

docker build -t universe .

This may take a while the first time, as the docker image layers are pulled from docker hub.

Once the image is built, you can do a quick run of the test cases to make sure everything is working:

docker run --privileged --rm -e DOCKER_NET_HOST=172.17.0.1 -v /var/run/docker.sock:/var/run/docker.sock universe pytest

Here's a breakdown of that command:

  • docker run - launch a docker container
  • --rm - delete the container once the launched process finishes
  • -e DOCKER_NET_HOST=172.17.0.1 - tells the universe remote (when launched) to make its VNC connection back to this docker-allocated IP
  • -v /var/run/docker.sock:/var/run/docker.sock - makes the docker unix socket from the host available to the container. This is a common technique for allowing a container to launch other containers alongside itself.
  • universe - use the image named 'universe' built above
  • pytest - run 'pytest' in the container, which runs all the tests

At this point, you'll see a bunch of tests run and hopefully all pass.

To do some actual development work, you probably want to do another volume map from the universe repo on your host into the container, then shell in interactively:

docker run --privileged --rm -it -e DOCKER_NET_HOST=172.17.0.1 -v /var/run/docker.sock:/var/run/docker.sock -v (full path to cloned repo above):/usr/local/universe universe python

As you edit the files in your cloned git repo, they will be changed in your docker container and you'll be able to run them in python.

Note: if you are using Docker for Windows, you'll need to enable the relevant shared drive for this to work.

Notes on installation
  • When installing universe, you may see warning messages. These lines occur when installing numpy and are normal.
  • You'll need Go version 1.5 or later. Ubuntu 14.04 has an older Go, so you'll need to upgrade your Go installation.
  • We run Python 3.5 internally, so the Python 3.5 variants will be much more thoroughly performance tested. Please let us know if you see any issues on 2.7.
  • While we don't officially support Windows, we expect our code to be very close to working there. We'd be happy to take pull requests that take our Windows compatibility to 100%. In the meantime, the easiest way for Windows users to run universe is to use the alternate configuration described above.

System overview

A Universe environment is similar to any other Gym environment: the agent submits actions and receives observations using the step() method.

Internally, a Universe environment consists of two pieces: a client and a remote:

  • The client is a VNCEnv instance which lives in the same process as the agent. It performs functions like receiving the agent's actions, proxying them to the remote, queuing up rewards for the agent, and maintaining a local view of the current episode state.
  • The remote is the running environment dynamics, usually a program running inside of a Docker container. It can run anywhere -- locally, on a remote server, or in the cloud. (We have a separate page describing how to manage remotes.)
  • The client and the remote communicate with one another using the VNC remote desktop system, as well as over an auxiliary WebSocket channel for reward, diagnostic, and control messages. (For more information on client-remote communication, see the separate page on the Universe internal communication protocols.)
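To make the client/remote split concrete, here is a sketch of pointing the client at a remote you have already started yourself, rather than letting Universe launch one. It assumes the remote is listening on the default ports (5900 for VNC, 15900 for the rewarder WebSocket); the exact remotes string format is described on the remotes documentation page:

import gym
import universe  # register the universe environments

env = gym.make('flashgames.DuskDrive-v0')
# Instead of remotes=1 (which launches a local Docker container),
# connect to an already-running remote by address:
env.configure(remotes='vnc://localhost:5900+15900')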

The code in this repository corresponds to the client side of the Universe environments. Additionally, you can freely access the Docker images for the remotes. We'll release the source repositories for the remotes in the future, along with tools to enable users to integrate new environments. Please sign up for our beta if you'd like early access.

Run your first agent

Now that you've installed the universe library, you should make sure it actually works. You can paste the example below into your Python REPL. (You may need to press enter an extra time to make sure the while loop is executing.)

import gym
import universe  # register the universe environments

env = gym.make('flashgames.DuskDrive-v0')
env.configure(remotes=1)  # automatically creates a local docker container
observation_n = env.reset()

while True:
  action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]  # your agent here
  observation_n, reward_n, done_n, info = env.step(action_n)
  env.render()

The example will instantiate a client in your Python process, automatically pull the quay.io/openai/universe.flashgames image, and will start that image as the remote. (In our remotes documentation page, we explain other ways you can run remotes.)

It will take a few minutes for the image to pull the first time. After that, if all goes well, a window like the one below will soon pop up. Your agent, which is just pressing the up arrow repeatedly, is now playing a Flash racing game called Dusk Drive. Your agent is programmatically controlling a VNC client, connected to a VNC server running inside of a Docker container in the cloud, rendering a headless Chrome with Flash enabled:

https://github.com/openai/universe/blob/master/doc/dusk-drive.png?raw=true

You can even connect your own VNC client to the environment, either just to observe or to interfere with your agent. Our flashgames and gym-core images conveniently bundle a browser-based VNC client, which can be accessed at http://localhost:15900/viewer/?password=openai. If you're on Mac, connecting to a VNC server is as easy as running: open vnc://localhost:5900.

(If using docker-machine, you'll need to replace "localhost" with the IP address of your Docker daemon, and use openai as the password.)

Breaking down the example

So we managed to run an agent, but what did all that code actually do? We'll go line-by-line through the example.

  • First, we import the gym library, which is the base on which Universe is built. We also import universe, which registers all the Universe environments.
import gym
import universe # register the universe environments
  • Next, we create the environment instance. Behind the scenes, gym looks up the registration for flashgames.DuskDrive-v0, and instantiates a VNCEnv object which has been wrapped to add a few useful diagnostics and utilities. The VNCEnv object is the client part of the environment, and it is not yet connected to a remote.
env = gym.make('flashgames.DuskDrive-v0')
  • The call to configure() connects the client to a remote environment server. When called with configure(remotes=1), Universe will automatically start a Docker container running locally on your computer. The local client connects to the remote using VNC. (More information on client-remote communication can be found in the page on universe internal communication protocols. More on configuring remotes is at remotes.)
env.configure(remotes=1)
  • When starting a new environment, you call env.reset(). Universe environments run in real-time, rather than stepping synchronously with the agent's actions, so reset is asynchronous and returns immediately. Since the environment will not have waited to finish connecting to the VNC server before returning, the initial observations from reset will be None to indicate that there is not yet a valid observation.

    Similarly, the environment keeps running in the background even if the agent does not call env.step(). This means that an agent that successfully learns from a Universe environment cannot take "thinking breaks": it must keep sending actions to the environment at all times.

    Additionally, Universe introduces the vectorized Gym API. Rather than controlling a single environment at a time, the agent can control a fixed-size vector of n environments, each with its own remote. The return value from reset is therefore a vector of observations. (For more information, see the separate page on environment semantics.)

observation_n = env.reset()
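For instance, a minimal sketch of an agent that waits out those initial None observations (using the empty-action trick described in the next bullet) might look like this:

observation_n = env.reset()  # returns immediately; entries may be None

while True:
  # Send no events to an environment until it delivers a real observation.
  action_n = [[] if ob is None else [('KeyEvent', 'ArrowUp', True)]
              for ob in observation_n]
  observation_n, reward_n, done_n, info = env.step(action_n)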
  • At each step() call, the agent submits a vector of actions, one for each environment instance it is controlling. Each VNC action is a list of events; above, each action is the single event "press the ArrowUp key". The agent could press and release the key in one action by instead submitting [('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowUp', False)] for each observation.

    In fact, the agent could largely have the same effect by just submitting ('KeyEvent', 'ArrowUp', True) once and then calling env.step([[] for ob in observation_n]) thereafter, without ever releasing the key using ('KeyEvent', 'ArrowUp', False). The browser running inside the remote would continue to statefully represent the arrow key as being pressed. Sending other unrelated keypresses would not disrupt the up arrow keypress; only explicitly releasing the key would cancel it. There's one slight subtlety: when the episode resets, the browser will reset, and will forget about the keypress; you'd need to submit a new ArrowUp at the start of each episode.

action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
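As a sketch, the press-and-release variant described above would be:

action_n = [[('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowUp', False)]
            for ob in observation_n]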
  • After we submit the action to the environment and render one frame, step() returns a list of observations, a list of rewards, a list of "done" booleans indicating whether the episode has ended, and then finally an info dictionary of the form {'n': [{}, ...]}, in which you can access the info for environment i as info['n'][i].

    Each environment's info message contains useful diagnostic information, including latency data, client and remote timings, VNC update counts, and reward message counts.

observation_n, reward_n, done_n, info = env.step(action_n)
env.render()
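For example, here is a sketch of pulling the per-environment diagnostics out of that info structure. (The exact diagnostic key names vary by environment and version, so this just lists whatever keys are present rather than assuming specific ones.)

observation_n, reward_n, done_n, info = env.step(action_n)

# info has the form {'n': [{...}, {...}, ...]}: one dict per environment.
for i, env_info in enumerate(info['n']):
  print('environment %d diagnostics: %s' % (i, sorted(env_info.keys())))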
  • We call step in what looks like a busy loop. In reality, there is a Throttle wrapper on the client which defaults to a target frame rate of 60fps, or one frame every 16.7ms. If you call it more frequently than that, step will sleep for any leftover time.
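If you want a different target rate, a sketch like the following should work. (We are assuming here that configure() accepts an fps keyword which the Throttle wrapper reads; check the documentation for your installed version.)

env.configure(remotes=1, fps=5)  # aim for one frame every 200ms instead of 60fps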

Testing

We are using pytest for tests. You can run them via:

pytest

Run pytest --help for useful options, such as pytest -s (disables output capture) or pytest -k <expression> (runs only specific tests).

Additional documentation

More documentation not covered in this README can be found in the doc folder of this repository.

Getting help

If you encounter a problem that is not addressed in this README page or in the extra docs, then try our wiki page of solutions to common problems - and add to it if your solution isn't there!

You can also search through the issues on this repository and our discussion board to see if another user has posted about the same problem or to ask for help from the community.

If you still can't solve your problem after trying all of the above steps, please post an issue on this repository.

What's next?

  • Get started training RL algorithms! You can try out the Universe Starter Agent, an implementation of the A3C algorithm that can solve several VNC environments.
  • For more information on how to manage remotes, see the separate documentation page on remotes.
  • Sign up for a beta to get early access to upcoming Universe releases, such as tools to integrate new Universe environments or a dataset of recorded human demonstrations.

Changelog

  • 2017-02-08: The old wrappers.SafeActionSpace location has been removed; use wrappers.experimental.SafeActionSpace instead. SoftmaxClickMouse has also been moved to wrappers.experimental.SoftmaxClickMouse.
  • 2017-01-08: The wrappers.SafeActionSpace has been moved to wrappers.experimental.SafeActionSpace. The old location will remain with a deprecation warning until 2017-02-08.
  • 2016-12-27: BACKWARDS INCOMPATIBILITY: The gym monitor is now a wrapper. Rather than starting monitoring as env.monitor.start(directory), envs are now wrapped as follows: env = wrappers.Monitor(env, directory). This change is on master and will be released with 0.21.0.
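In code, the monitor change in that last entry looks like this (a sketch; we assume wrappers here refers to gym.wrappers, and the directory path is just an illustrative placeholder):

from gym import wrappers

# Old API (before 0.21.0):
#   env.monitor.start('/tmp/universe-monitor')
# New API: wrap the env instead.
env = wrappers.Monitor(env, '/tmp/universe-monitor')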
