  • Stars: 901
  • Rank: 50,477 (Top 1.0 %)
  • Language: Jupyter Notebook
  • License: Other
  • Created almost 7 years ago
  • Updated over 1 year ago

Repository Details

Evolution Strategies Tool

ESTool

Evolved Biped Walker.

Implementation of various Evolution Strategies, such as GA, Population-based REINFORCE (Section 6 of Williams 1992), CMA-ES and OpenAI's ES, using a common interface.

The CMA-ES implementation is a wrapper around pycma.

Notes

The tool was last tested using the following configuration:

  • NumPy 1.13.3 (1.14 has some annoying warning).

  • OpenAI Gym 0.9.4 (breaks for 0.10.0+ since they changed the API).

  • cma 2.2.0, basically 2+ should work.

  • PyBullet 1.6.3 (possible that newer versions might work, but have not tested).

  • Python 3, although 2 might work.

  • mpi4py 2

Background Reading:

A Visual Guide to Evolution Strategies

Evolving Stable Strategies

Using Evolution Strategies Library

To use es.py, please check out the simple_es_example.ipynb notebook.

The basic concept is:

import numpy as np

# EvolutionStrategy is a placeholder for a solver from es.py
solver = EvolutionStrategy()
while True:

  # ask the ES to give us a set of candidate solutions
  solutions = solver.ask()

  # create an array to hold the rewards.
  # solver.popsize = population size
  rewards = np.zeros(solver.popsize)

  # calculate the reward for each given solution
  # using your own evaluate() method
  for i in range(solver.popsize):
    rewards[i] = evaluate(solutions[i])

  # give rewards back to ES
  solver.tell(rewards)

  # get best parameter, reward from ES
  reward_vector = solver.result()

  if reward_vector[1] > MY_REQUIRED_REWARD:
    break
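To make the ask/tell loop above concrete, here is a minimal, self-contained sketch. ToyGaussianES is a hypothetical toy solver written only for this illustration; it is not one of the solvers in es.py, and evaluate() is a made-up reward function. It only mirrors the interface used above (ask(), tell(), result(), popsize), assuming result() returns the best parameters first and the best reward second.

```python
import random


class ToyGaussianES:
    """Hypothetical toy solver with an ask/tell/result interface (not es.py)."""

    def __init__(self, num_params, popsize=32, sigma=0.5, num_elite=8, seed=0):
        self.popsize = popsize
        self.sigma = sigma
        self.num_elite = num_elite
        self.rng = random.Random(seed)
        self.mean = [0.0] * num_params
        self.solutions = []
        self.best_param = list(self.mean)
        self.best_reward = float("-inf")

    def ask(self):
        # Sample popsize candidates around the current mean.
        self.solutions = [
            [m + self.sigma * self.rng.gauss(0.0, 1.0) for m in self.mean]
            for _ in range(self.popsize)
        ]
        return self.solutions

    def tell(self, rewards):
        # Rank candidates by reward (higher is better) and move the mean
        # to the average of the top num_elite candidates.
        ranked = sorted(range(self.popsize), key=lambda i: rewards[i], reverse=True)
        elite = [self.solutions[i] for i in ranked[: self.num_elite]]
        self.mean = [sum(column) / len(elite) for column in zip(*elite)]
        top = ranked[0]
        if rewards[top] > self.best_reward:
            self.best_reward = rewards[top]
            self.best_param = list(self.solutions[top])

    def result(self):
        # (best parameters, best reward), so result()[1] is the reward.
        return self.best_param, self.best_reward


def evaluate(params):
    # Made-up reward: zero at the target point, negative everywhere else.
    target = [1.0, -2.0, 3.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


MY_REQUIRED_REWARD = -0.01
solver = ToyGaussianES(num_params=3)
for generation in range(200):
    solutions = solver.ask()
    rewards = [evaluate(s) for s in solutions]
    solver.tell(rewards)
    if solver.result()[1] > MY_REQUIRED_REWARD:
        break

best_param, best_reward = solver.result()
print(generation, best_reward)
```

The toy keeps sigma fixed for simplicity; the solvers mentioned above adapt their search distribution in more sophisticated ways.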

Parallel Processing Training with MPI

Please read the Evolving Stable Strategies article for more demos and use cases.

To use the training tool (relies on MPI):

python train.py bullet_racecar -n 8 -t 4

This will launch training jobs with 32 workers (8 MPI processes x 4). The best model will be saved as a .json file in log/. This model should train in a few minutes on a 2014 MacBook Pro.

If you have more compute and have access to a 64-core CPU machine, I recommend:

python train.py name_of_environment -e 16 -n 64 -t 4

This will calculate fitness values based on an average of 16 random runs, on 256 workers (64 MPI processes x 4). In my experience this works reasonably well for most tasks inside config.py.
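The worker counts and the averaging over random runs can be sketched as below. This is an illustration under assumptions, not code from train.py: it reads -n as the number of MPI processes and -t as the per-process multiplier (which is how the two commands above line up with 32 and 256 workers), and -e as the number of runs averaged per fitness value. The function names and the noisy_episode stand-in are hypothetical.

```python
import random


def total_workers(num_mpi_processes, trials_per_process):
    # Assumed reading of "-n 64 -t 4": 64 MPI processes x 4 = 256 workers.
    return num_mpi_processes * trials_per_process


def averaged_fitness(run_episode, params, num_episodes, base_seed=0):
    # Average the reward over several independently seeded runs, so that
    # a single lucky or unlucky run does not dominate the fitness value.
    rewards = [run_episode(params, base_seed + k) for k in range(num_episodes)]
    return sum(rewards) / num_episodes


def noisy_episode(params, seed):
    # Stand-in for one environment rollout: a fixed score plus seeded noise.
    rng = random.Random(seed)
    return sum(params) + rng.gauss(0.0, 1.0)


small_run = total_workers(8, 4)    # first command above
large_run = total_workers(64, 4)   # 64-core command above
fitness = averaged_fitness(noisy_episode, [1.0, 2.0], num_episodes=16)
print(small_run, large_run, fitness)
```

Averaging over more runs reduces the noise in each fitness estimate at the cost of proportionally more simulation time per candidate.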

After training, to run pre-trained models:

python model.py bullet_ant log/name_of_your_json_file.json

Self-Contained Cartpole Swingup Task


If you don't want to install a physics engine, try it on the cartpole_swingup task that doesn't have any dependencies:

Training command:

python train.py cartpole_swingup -n 8 -e 1 -t 4 --sigma_init 1.0

After 400 generations, the final average score (over 32 trials) should be over 900. You can run the trained model with this command:

python model.py cartpole_swingup log/cartpole_swingup.cma.1.32.best.json

If you haven't bothered to run the previous training command, you can load the pre-trained version:

python model.py cartpole_swingup zoo/cartpole_swingup.cma.json

Self-Contained Slime Volleyball Gym Environment


Here is an example of training on the Slime Volleyball Gym environment:

Training command:

python train.py slimevolley -n 8 -e 8 -t 4 --sigma_init 0.5

Pre-trained model:

python model.py slimevolley zoo/slimevolley.cma.64.96.best.json

PyBullet Envs


bullet_ant pybullet environment. Population-based REINFORCE.

Another example: to run a minitaur duck model, run this locally:

python model.py bullet_minitaur_duck zoo/bullet_minitaur_duck.cma.256.json


Custom Minitaur Env.

In the .hist.json file, and on the screen output, we track the progress of training. The fields are ordered as follows:

  • generation count
  • time (seconds) taken so far
  • average fitness
  • worst fitness
  • best fitness
  • average standard deviation of params
  • average timesteps taken
  • max timesteps taken
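The field ordering above can be turned into named records for easier analysis. The sketch below assumes the .hist.json file holds a JSON list of rows, each with the eight values in the order listed; that layout, the HistRow field names, and the sample numbers are assumptions made for illustration, not taken from the tool's source.

```python
import json
from collections import namedtuple

# Field names are hypothetical labels for the eight fields listed above.
HistRow = namedtuple(
    "HistRow",
    [
        "generation",
        "elapsed_seconds",
        "avg_fitness",
        "worst_fitness",
        "best_fitness",
        "avg_param_std",
        "avg_timesteps",
        "max_timesteps",
    ],
)


def parse_history(text):
    # Assumes a JSON list of eight-element rows.
    return [HistRow(*row) for row in json.loads(text)]


# Made-up sample data in the assumed layout.
sample = (
    "[[1, 12, -35.2, -80.1, -5.0, 0.0998, 410.5, 1000],"
    " [2, 25, -20.4, -61.7, 3.2, 0.0995, 520.0, 1000]]"
)
history = parse_history(sample)
best_so_far = max(row.best_fitness for row in history)
print(history[-1].generation, best_so_far)
```

For a real log, the same function could be applied to the contents of a file read with open(path).read(), provided the file matches the assumed layout.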

Using plot_training_progress.ipynb in an IPython notebook, you can plot the training logs from the .hist.json files. For example, in the bullet_ant task:


Bullet Ant training progress.

You need to install mpi4py, pybullet, gym, etc. to use the various environments. Some of the OpenAI Gym envs also require roboschool or Box2D.

On Windows, it is easiest to install mpi4py as follows:

  • Download and install mpi_x64.Msi from the HPC Pack 2012 MS-MPI Redistributable Package
  • Install a recent Visual Studio version with C++ compiler
  • Open a command prompt and run:
git clone https://github.com/mpi4py/mpi4py
cd mpi4py
python setup.py install

Then modify the train.py script, replacing mpirun with mpiexec and -np with -n.
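One way to avoid editing the script by hand on each platform is to pick the launcher at run time. The sketch below is hypothetical: build_mpi_command is not a function in train.py, and the exact way train.py invokes MPI is not shown here, so the argument layout is an assumption. It only encodes the substitution described above (mpirun/-np elsewhere, mpiexec/-n on Windows).

```python
import sys


def build_mpi_command(num_processes, script_args, platform=sys.platform):
    # Choose the launcher name and process-count flag per platform.
    if platform.startswith("win"):
        launcher, count_flag = "mpiexec", "-n"
    else:
        launcher, count_flag = "mpirun", "-np"
    # Assumed layout: launcher, process count, then the Python interpreter
    # followed by the script and its arguments.
    return [launcher, count_flag, str(num_processes), sys.executable] + list(script_args)


windows_cmd = build_mpi_command(8, ["train.py", "cartpole_swingup"], platform="win32")
linux_cmd = build_mpi_command(8, ["train.py", "cartpole_swingup"], platform="linux")
print(windows_cmd[:3], linux_cmd[:3])
```

The resulting list could be passed to subprocess.check_call; whether that matches how train.py launches its workers should be checked against the script itself.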

Citation

If you find this work useful, please cite it as:

@article{ha2017evolving,
  title   = "Evolving Stable Strategies",
  author  = "Ha, David",
  journal = "blog.otoro.net",
  year    = "2017",
  url     = "http://blog.otoro.net/2017/11/12/evolving-stable-strategies/"
}
