EvoGrad

EvoGrad is a lightweight tool for differentiating through expectation, built on top of PyTorch.

Tools that enable fast and flexible experimentation democratize and accelerate machine learning research. However, one field that so far has not been greatly impacted by automatic differentiation tools is evolutionary computation. The reason is that most evolutionary algorithms are gradient-free: they do not follow any explicit mathematical gradient (i.e., the mathematically optimal local direction of improvement), and instead proceed through a generate-and-test heuristic. In other words, they create new variants, test them out, and keep the best.

Recent and exciting research in evolutionary algorithms for deep reinforcement learning, however, has highlighted how a specific class of evolutionary algorithms can benefit from auto-differentiation. Work from OpenAI demonstrated that a form of Natural Evolution Strategies (NES) is massively scalable, and competitive with modern deep reinforcement learning algorithms.

EvoGrad enables fast prototyping of NES-like algorithms. We believe there are many interesting algorithms yet to be discovered in this vein, and we hope this library will help to catalyze progress in the machine learning community.

Examples

Natural Evolution Strategies

As a first example, we'll implement the simplified NES algorithm of Salimans et al. (2017) in EvoGrad. EvoGrad provides several probability distributions which may be used in the expectation function. We will use a normal distribution because it is the most common choice in practice.

Let's consider the problem of finding a fitness peak in a simple 1-D search space.

We can define our population distribution over this search space to be initially centered at 1.0, with a fixed variance of 0.05, with the following Python code:

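# Import paths are assumed from the package layout; check the demos directory if they differ.
import torch
from evograd import expectation
from evograd.distributions import Normal
mu = torch.tensor([1.0], requires_grad=True)
p = Normal(mu, 0.05)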

Next, let's define a simple fitness function that rewards individuals for approaching the location 5.0 in the search space:

def fitness(xs):
	# xs holds the sampled individuals; fitness peaks (at 0) when an individual equals 5.0
	return -(xs - 5.0) ** 2

Each generation of evolution in NES takes samples from the population distribution and evaluates the fitness of each of those individual samples. Here we sample and evaluate 100 individuals from the distribution:

sample = p.sample(n=100)
fitnesses = fitness(sample)

Optionally, we can apply a whitening transformation to the fitnesses (a form of pre-processing that often increases NES performance), like this:

fitnesses = (fitnesses - fitnesses.mean()) / fitnesses.std()
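One reason whitening tends to help is that it makes the update insensitive to the scale and offset of the raw fitness values. The following is a minimal sketch in plain Python (no PyTorch or EvoGrad) of that property; the helper name `whiten` and the sample values are made up for illustration.

```python
import statistics

def whiten(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1), the same convention as torch's default .std()
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical raw fitnesses, and the same fitnesses on a different scale and offset
raw = [-16.0, -15.2, -17.1, -14.9]
rescaled = [1000.0 * v + 7.0 for v in raw]

whitened_raw = whiten(raw)
whitened_rescaled = whiten(rescaled)
```

Both lists come out the same up to floating-point error, so a fixed learning rate behaves similarly regardless of how the fitness function happens to be scaled.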

Now we can use these calculated fitness values to estimate the mean fitness over our population distribution:

mean = expectation(fitnesses, sample, p=p)

Although we could have estimated the mean value directly with the snippet: mean = fitnesses.mean(), what we gain by instead using the EvoGrad expectation function is the ability to backpropagate through mean. We can then use the resulting auto-differentiated gradients to optimize the center of the 1D Gaussian population distribution (mu). Here we step along the gradient, i.e. gradient ascent, to increase the expected fitness value of the population, with alpha as a learning rate of your choosing:

mean.backward()
with torch.no_grad():
	mu += alpha * mu.grad
	mu.grad.zero_()
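To make the arithmetic behind the steps above visible, here is a minimal sketch of the same loop in plain Python, without PyTorch or EvoGrad. It uses the standard score-function (log-derivative) estimator for a Gaussian with fixed standard deviation; whether EvoGrad computes its gradient in exactly this form is an assumption. The function name `nes_step`, the learning rate 0.005, and the 200 generations are illustrative choices, and 0.05 is treated as the standard deviation here.

```python
import random
import statistics

def fitness(x):
    # Same fitness as above: peaks at x = 5.0
    return -(x - 5.0) ** 2

def nes_step(mu, sigma, alpha, n, rng):
    """One generation: sample, evaluate, whiten, estimate the gradient, update mu."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    fs = [fitness(x) for x in xs]
    mean_f = statistics.mean(fs)
    std_f = statistics.stdev(fs)
    zs = [(f - mean_f) / std_f for f in fs]
    # Score of N(mu, sigma^2) with respect to mu is (x - mu) / sigma^2,
    # so the gradient of E[z] is estimated by the sample mean of z * score.
    grad = sum(z * (x - mu) / sigma ** 2 for z, x in zip(zs, xs)) / n
    # Gradient ascent: move mu in the direction that increases expected fitness
    return mu + alpha * grad

rng = random.Random(0)  # fixed seed so the run is repeatable
mu = 1.0
for _ in range(200):
    mu = nes_step(mu, sigma=0.05, alpha=0.005, n=100, rng=rng)
```

After the loop, mu should sit close to the fitness peak at 5.0. Because the fitnesses are whitened, each step is at most roughly alpha / sigma in size, which is why the learning rate is kept small relative to sigma.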

Maximizing Variance

As a more sophisticated example, rather than maximizing the mean fitness, we can maximize the variance of behaviors in the population. While fitness is a measure of quality for a fixed task, in some situations we want to prepare for the unknown, and instead might want our population to contain a diversity of behaviors that can easily be adapted to solve a wide range of possible future tasks.

To do so, we need a quantification of behavior, which we can call a behavior characterization. Similarly to how you can evaluate an individual parameter vector drawn from the population distribution to establish its fitness (e.g. how far does this controller cause a robot to walk?), you could evaluate such a draw and return some quantification of its behavior (e.g., what position does a robot controlled by this neural network locomote to?).

For this example, let's choose a simple but illustrative, 1D behavior characterization, namely, the product of two sine waves (one with much faster frequency than the other):

def behavior(x):
	return 5 * torch.sin(0.2 * x) * torch.sin(20 * x)

Now, instead of estimating the mean fitness, we can calculate a statistic that reflects the diversity of sampled behaviors. The variance of a distribution is one metric of diversity, and one variant of evolvability ES measures and optimizes such variance of behaviors sampled from the population distribution:

sample = p.sample(n=100)
behaviors = behavior(sample)
zscore = (behaviors - behaviors.mean()) / behaviors.std()
variance = expectation(zscore ** 2, sample, p=p)
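For intuition about what is being differentiated, the sketch below estimates the behavior variance and its gradient with respect to mu by hand, in plain Python. It assumes the same score-function estimator as a Gaussian NES with fixed standard deviation; the function name `behavior_variance_and_gradient` and the test points 1.0 and 7.85 are illustrative and not part of EvoGrad.

```python
import math
import random

def behavior(x):
    # Same behavior characterization as above, using math instead of torch
    return 5 * math.sin(0.2 * x) * math.sin(20 * x)

def behavior_variance_and_gradient(mu, sigma, n, rng):
    """Sample behaviors around mu; return their variance and a gradient estimate."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    bs = [behavior(x) for x in xs]
    mean_b = sum(bs) / n
    # Population variance (divide by n), a slight simplification of the z-score above
    var_b = sum((b - mean_b) ** 2 for b in bs) / n
    # Squared z-score times the Gaussian score (x - mu) / sigma^2, averaged over
    # the sample. The sample mean and variance are treated as constants.
    grad = sum(
        ((b - mean_b) ** 2 / var_b) * (x - mu) / sigma ** 2
        for b, x in zip(bs, xs)
    ) / n
    return var_b, grad

var_low, grad_low = behavior_variance_and_gradient(1.0, 0.05, 100, random.Random(0))
var_high, _ = behavior_variance_and_gradient(7.85, 0.05, 100, random.Random(0))
```

The slow sine wave acts as an envelope on the fast one, so behaviors sampled near x = 7.85 (where sin(0.2 * x) is close to 1) vary much more than those sampled near x = 1.0. Following the gradient estimate moves mu toward such high-variance regions.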

Maximizing Entropy

In the previous example, the gradient would be relatively straightforward to compute by hand. However, sometimes we may need to maximize objectives whose derivatives would be much more challenging to derive. In particular, this final example will seek to maximize the entropy of the distribution of behaviors (a variant of evolvability ES).

Note that for this example you'll also have to install scipy from pip.

To create a differentiable estimate of entropy, we first compute the pairwise distances between the different behaviors. Next, we create a smooth probability distribution by fitting a kernel density estimate:

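import numpy as np
import scipy.spatial.distance
dists = scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(behaviors, "sqeuclidean"))
kernel = torch.tensor(np.exp(-dists / k_sigma ** 2), dtype=torch.float32)  # np.exp replaces the deprecated scipy.exp alias; k_sigma is the kernel bandwidth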
p_x = expectation(kernel, sample, p=p, dim=1)

Then, we can use these probabilities to estimate the entropy of the distribution, and run gradient descent on it as before:

entropy = expectation(-torch.log(p_x), sample, p=p)
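As a rough illustration of the quantity being estimated, the sketch below computes the same kernel-based entropy estimate for a list of scalar behaviors in plain Python. It weights all samples equally, so it shows only the value of the estimate and not how EvoGrad makes it differentiable. The function name `kde_entropy`, the bandwidth of 1.0, and the example inputs are assumptions for illustration, and the kernel is left unnormalized as in the snippet above.

```python
import math

def kde_entropy(behaviors, k_sigma):
    """Entropy estimate from a Gaussian kernel over pairwise squared distances."""
    n = len(behaviors)
    entropy = 0.0
    for b_i in behaviors:
        # Smoothed (unnormalized) density at b_i: average kernel value to all samples
        p_i = sum(
            math.exp(-((b_i - b_j) ** 2) / k_sigma ** 2) for b_j in behaviors
        ) / n
        # Average of -log p over the sample
        entropy -= math.log(p_i) / n
    return entropy

tight = kde_entropy([0.0, 0.01, 0.02, 0.03], k_sigma=1.0)
spread = kde_entropy([0.0, 1.0, 2.0, 3.0], k_sigma=1.0)
```

Behaviors that are spread out give a larger estimate than behaviors clustered together, which is the property the entropy objective rewards.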

Full code for these examples can be found in the demos directory of this repository.

Installation

Either install EvoGrad from pip:

pip install evograd

Or from the source code in this repository:

git clone https://github.com/uber-research/EvoGrad
cd EvoGrad
pip install -r requirements.txt
pip install -e .

About

Development of EvoGrad was led by Alex Gajewski as a summer intern at Uber AI Labs.
