obs-tower2

This is my solution to the Unity Obstacle Tower Challenge. Almost all of the code was freshly written for this contest, including a simple implementation of Proximal Policy Optimization.

Overview

The final agent, as included in my contest submissions, has the following components:

  • A classifier to tell what objects (e.g. box, door, key) are in an image
  • A state-augmented environment, providing a history of previous actions, rewards, and classifier outputs to the agent
  • Two feedforward policies: one for floors 0 through 9, and one for floors 10 and upwards
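
As a rough illustration of the two-policy split, here is a minimal sketch of choosing a policy by floor number. The names `FLOOR_SPLIT` and `select_policy` are hypothetical, and the sketch assumes the current floor index is already known; how the real agent tracks its floor is not described here.

```python
FLOOR_SPLIT = 10  # floors 0-9 use the first policy, floors 10 and up use the second


def select_policy(floor, low_policy, high_policy):
    """Pick which of the two policies should act on the given floor (hypothetical helper)."""
    if floor < 0:
        raise ValueError("floor must be non-negative")
    return low_policy if floor < FLOOR_SPLIT else high_policy
```

For example, `select_policy(9, a, b)` returns `a`, while `select_policy(10, a, b)` returns `b`.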

The agents are pre-trained with behavior cloning on roughly 2 million frames (~2.3 days) of human demonstration data. These pre-trained agents themselves do not perform well (they achieve an average floor of ~6, and solve floors >9 with a fairly low probability). The pre-trained agents are then fine-tuned using prierarchy, where the prior is the original pre-trained agent. This can be seen as fine-tuning the agent while keeping its behavior close to "human" behavior.
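
To make "keeping its behavior close to human behavior" more concrete, here is a minimal sketch of an RL loss with a KL penalty that pulls the fine-tuned policy toward a frozen prior over a discrete action set. This is not the code from this repository: the function names, the `kl_coef` coefficient and its default value, and the direction of the KL term are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def regularized_loss(policy_loss, policy_logits, prior_logits, kl_coef=0.01):
    """Hypothetical objective: RL loss plus a penalty for drifting from the prior."""
    policy = softmax(policy_logits)
    prior = softmax(prior_logits)
    return policy_loss + kl_coef * kl_divergence(policy, prior)
```

When the policy and the prior agree, the penalty is zero and the loss is just the RL loss. The further the policy moves from the human-derived prior, the larger the penalty becomes.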

Two different agents are fine-tuned: one for floors 0-9, and one for 10 and onwards. This is because, while the 0-9 agent makes lots of progress on the lower floors very quickly, it seems to ignore the higher floors. Only after the 0-9 agent reaches an average of ~9.8 floors does it slowly start to conquer higher floors. My hypothesis is that the 0-9 agent has much more signal coming from the lower floors than from the higher floors, since lower floors are a much larger source of rewards. Thus, the lower floors drown out any learning that might take place on the higher floors.

The "10 and onwards" agent starts out solving floors with a fairly low probability (between 1% and 5%). Since this agent never sees easier (i.e. lower) floors, it has no choice but to focus on the difficult Sokoban puzzle and the other difficulties of the higher floors. Because of the human-based prior, the agent succeeds at these challenges with a non-negligible probability, giving it enough signal to learn from.

The agent itself is a feedforward model; it contains no recurrent connections. To help the agent remember the past, I feed it a stack of state vectors for the past 50 frames. Each state vector contains:

  • The action taken at that timestep
  • The reward received
  • Whether or not the agent has a key
  • The probability outputs from a hand-crafted classifier
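
The state stack described above could be sketched roughly as follows. This is an illustration rather than the repository's implementation: the one-hot action encoding, the field order, the zero-padding at the start of an episode, and the names `make_state_vector` and `StateHistory` are all assumptions.

```python
from collections import deque

HISTORY_LEN = 50  # number of past frames fed to the agent, as described above


def make_state_vector(action, reward, has_key, class_probs, num_actions):
    """Build one per-timestep vector: one-hot action, reward, key flag, classifier outputs."""
    one_hot = [0.0] * num_actions
    one_hot[action] = 1.0
    return one_hot + [float(reward), 1.0 if has_key else 0.0] + list(class_probs)


class StateHistory:
    """Fixed-length stack of state vectors, zero-padded until enough frames exist."""

    def __init__(self, vector_size, length=HISTORY_LEN):
        self.vector_size = vector_size
        self.frames = deque(
            [[0.0] * vector_size for _ in range(length)], maxlen=length
        )

    def push(self, vector):
        """Append the newest state vector, discarding the oldest one."""
        if len(vector) != self.vector_size:
            raise ValueError("unexpected state vector size")
        self.frames.append(list(vector))

    def stacked(self):
        """Return the history as a list of vectors, oldest first."""
        return [list(frame) for frame in self.frames]
```

A feedforward policy can then take the output of `stacked()` as an extra input next to the current image, which gives it a fixed window of memory without recurrent connections.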

During behavior cloning and fine-tuning, the agent has little control over what features it can remember from the past. All it has access to is what I thought would be important (e.g. whether or not a box was on the screen). This has obvious drawbacks, but it also has the advantage that the agent will definitely have access to important information. In practice, I found that using an RNN model was not nearly as effective as hand-crafting the agent's memory.

Codebase overview

This codebase has several components:

  • obs_tower2 - a library of learning algorithms and ML models
  • scripts - a set of scripts for training classifiers and agents
  • recorder - an application for recording human demonstrations
  • labeler - a web application for labeling images with various classes

Running the code

First, install the obs_tower2 package using pip:

pip install -e .

Next, configure your environment variables. The scripts depend on a few environment variables to locate training data and the obstacle tower binary. Here are the environment variables you should set:

  • OBS_TOWER_PATH - the path to the obstacle tower binary.
  • OBS_TOWER_RECORDINGS - the path to a directory where demonstrations are stored.
  • OBS_TOWER_IMAGE_LABELS - the path to the directory of labeled images.
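
Before launching a long training run, it can help to confirm that all three variables are set. The helper below is a small sketch for that purpose; `missing_env_vars` is a hypothetical name and is not part of the obs_tower2 package.

```python
import os

REQUIRED_VARS = (
    "OBS_TOWER_PATH",
    "OBS_TOWER_RECORDINGS",
    "OBS_TOWER_IMAGE_LABELS",
)


def missing_env_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments checks the current process environment. An empty list means all three variables are set.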

Getting data

If you don't have a directory of labeled images or recordings, you can create an empty directory. However, the training scripts require that you have some data, and the agent will not learn well unless you give it a lot of hand-labeled and human-recorded data. You can either generate the data yourself or download the data I created.

To record data yourself, see the scripts recorder/record.py and labeler/main.py, which help you record demonstrations and label images, respectively. The recorder uses a pyglet UI to record demonstrations. The labeler is a web application that loads images from the recordings and lets you check off which classes they contain. The labeler also supports keyboard inputs, making it possible for an expert labeler to hit rates of anywhere from 20 to 40 labels per minute.

Training the models

Once you have labeled data and recordings, you are ready to train the classifier used for the agent's memory:

cd obs_tower2/scripts
python run_classifier.py

This script saves its result to save_classifier.pkl periodically. You will want to run the script until the model starts to overfit. With my dataset, this takes a couple of hours on a single GPU.
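
One simple way to decide when the model has started to overfit is to track a held-out loss and stop once it has not improved for a while. The sketch below assumes you collect those held-out losses yourself. It does not rely on any particular output format from run_classifier.py, and `has_started_overfitting` and `patience` are hypothetical names.

```python
def has_started_overfitting(val_losses, patience=3):
    """Return True if the best held-out loss is at least `patience` evaluations old."""
    if len(val_losses) <= patience:
        return False
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_index >= patience
```

With `patience=3`, the losses `[1.0, 0.8, 0.7, 0.75, 0.8, 0.9]` would count as overfitting, because the best value (0.7) was seen three evaluations ago.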

Next, you can use behavior cloning to train a prior agent:

python run_clone.py

This script saves its result to save_clone.pkl periodically. This may take up to a day to run, and with my dataset it will not overfit very much no matter how long you run it for. Once this is done, you can copy the cloned agent to be used as a prior:

cp save_clone.pkl save_prior.pkl

Next, you can train an agent that solves the first 10 floors:

cp save_prior.pkl save.pkl
python run_tail.py --min 0 --max 1 --path save.pkl

This script saves its result to whatever is passed as the --path, in this case save.pkl. Notice how we start out by copying save_prior.pkl as save.pkl. This means that the agent is initialized as the human-based prior. You will likely want to run this script for a couple of weeks. You can run this at the same time as training an agent that solves the 10th floor and greater.

To train an agent that solves floors 10 and above, you can use the same run_tail.py script with different arguments:

cp save_prior.pkl save_tail.pkl
python run_tail.py --min 10 --max 15 --path save_tail.pkl

If you want to run two run_tail.py instances simultaneously, you should pass --worker-idx 8 to one of them. This ensures that one script uses worker IDs 0-7, while the other uses IDs 8-15.
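
The reason for offsetting worker IDs is easier to see with a small sketch. It assumes each instance uses 8 consecutive worker IDs (inferred from the "0-7" range) and that --worker-idx gives the first ID of an instance. Both are assumptions about run_tail.py, and the helper names are hypothetical.

```python
WORKERS_PER_INSTANCE = 8  # assumption, inferred from the "0-7" range in the text


def worker_ids(worker_idx, num_workers=WORKERS_PER_INSTANCE):
    """Return the worker IDs used by an instance whose first ID is `worker_idx`."""
    return list(range(worker_idx, worker_idx + num_workers))


def ranges_overlap(idx_a, idx_b, num_workers=WORKERS_PER_INSTANCE):
    """Return True if two instances would share at least one worker ID."""
    ids_a = set(worker_ids(idx_a, num_workers))
    ids_b = set(worker_ids(idx_b, num_workers))
    return bool(ids_a & ids_b)
```

Under these assumptions, two instances started with offsets 0 and 8 use disjoint ID sets (0-7 and 8-15), while two instances that both start at 0 would collide.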
