Stars: 102
Rank: 335,584 (Top 7%)
Language: Python
License: MIT License
Created: over 7 years ago
Updated: over 7 years ago


Repository Details

Tensorflow Implementation of PathNet: Evolution Channels Gradient Descent in Super Neural Networks

pathnet

Tensorflow Implementation of PathNet from Google DeepMind.

The implementation is based on Tensorflow r1.2.

https://arxiv.org/pdf/1701.08734.pdf

"Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropogation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning; fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B, allows task B to be learned faster than it could be learned from scratch or after fine-tuning." Form Paper


Failure Story

A memory leak occurred when geopath was not fed through a placeholder. Without a placeholder, changing the value of a tensor variable allocates new memory, so assigning a new path for each generation caused a memory leak and slow learning.
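The fix, roughly, is to feed the active path into a single placeholder instead of repeatedly rewriting a graph variable. A minimal Tensorflow r1.x sketch, with illustrative shapes and names rather than this repository's actual code:

import numpy as np
import tensorflow as tf  # Tensorflow r1.x

L, M = 3, 10  # layers and modules per layer (illustrative values)

# One placeholder is reused for every generation, so no new graph nodes
# (e.g. assign ops) are created when the path changes.
geopath_ph = tf.placeholder(tf.float32, shape=[L, M], name="geopath")
# ... the real network gates each module's output with geopath_ph[l, m] ...
gated = tf.reduce_sum(geopath_ph)  # stand-in for the gated forward pass

with tf.Session() as sess:
    for generation in range(3):
        new_path = (np.random.rand(L, M) < 0.3).astype(np.float32)
        sess.run(gated, feed_dict={geopath_ph: new_path})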

Binary MNIST classification tasks

python binary_mnist_pathnet.py

If you want to run it repeatedly, do as follows.

./auto_binary_mnist_pathnet.sh

Settings

L, M, N, B, and the population size are 3, 10, 3, 2, and 20, respectively (in the paper, the population size is 64). Gradient descent is used with learning rate 0.05 (in the paper, 0.0001). The aggregation function between layers is averaging (in the paper, summation). Skip-connection, ResNet, and linear modules are used in every layer except the input layer. The fixed path from the first task is always activated when feed-forwarding the network on the second task (in the paper, it is not). Learning is considered converged when training accuracy exceeds 99%.

Chrisantha Fernando (the first author of the paper) and I confirmed that the paper's results were generated with this value set to 20, so I set the population size to 20. I use a larger learning rate than the paper's to get results faster; since a higher learning rate can speed up network learning more than positive transfer does, the averaging function is used to slow convergence down (see the sketch below). The author and I also confirmed that the paper's results were generated with averaging, not summation, as the last aggregation function (the others are summation). Always activating the fixed path is meant to produce more dramatic results. For faster convergence, a lower convergence threshold (99% instead of the earlier 99.8%) is used.

The B candidate pathways use the same data batches. The geopath set and all parameters except those on the optimal path of the first task are reset after the first task finishes.
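To make the averaging-versus-summation choice concrete, here is a small NumPy sketch of aggregating the active modules' outputs between layers; the shapes and values are made up for illustration.

import numpy as np

M, batch, width = 10, 16, 20
rng = np.random.RandomState(0)

# Outputs of the M modules in one layer, and a 0/1 mask of the N active modules.
module_outputs = rng.randn(M, batch, width)
active = np.zeros(M)
active[[1, 4, 7]] = 1.0   # N = 3 modules selected by the current pathway

summed = np.tensordot(active, module_outputs, axes=1)   # summation (paper)
averaged = summed / max(active.sum(), 1.0)              # averaging (used here)

# Averaging keeps the scale of the next layer's input independent of how many
# modules are active, which slows convergence relative to plain summation.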

Results


The experiments are 1vs3 <-> 1vs2 and 4vs5 <-> 6vs7. These class pairs were chosen to check for positive transfer both when the two tasks share a class and when they do not.

In the 1vs3 experiments, the mean number of generations to convergence is about 168.25 when 1vs3 is learned as the first task and about 82.64 when it is learned as the second task after 1vs2. PathNet converged about 2 times faster than learning from scratch.

In the 1vs2 experiments, the means are about 196.60 for the first task and about 118.32 for the second task after 1vs3, so PathNet converged about 1.7 times faster than learning from scratch.

In the 4vs5 experiments, the means are about 270.68 for the first task and about 149.31 for the second task after 6vs7, so PathNet converged about 1.8 times faster than learning from scratch.

In the 6vs7 experiments, the means are about 93.69 for the first task and about 55.91 for the second task after 4vs5, so PathNet converged about 1.7 times faster than learning from scratch.

Overall, PathNet converged about 1.7 to 2 times faster than learning from scratch on binary MNIST classification, whether or not the two tasks shared a class.

CIFAR10 and SVHN classification tasks

python cifar_svhn_pathnet.py

If you want to run it repeatedly, do as follows.

./auto_cifar_svhn_pathnet.sh

Settings

L, M, N, B, and the population size are 3, 20, 5, 2, and 20, respectively. Gradient descent is used with learning rate 0.2 (with learning rate 0.05 this task cannot be learned, so a higher learning rate than before is used). Accuracy is measured after 500 epochs.

Except for M, N, and the learning rate, the parameters are the same as in the binary MNIST classification task.

Results


The experiments are CIFAR10 <-> SVHN.

In the CIFAR10 experiments, the mean accuracy is about 38.56% when CIFAR10 is learned as the first task and about 41.75% when it is learned as the second task after SVHN, so PathNet reached about 1.1 times higher accuracy than learning from scratch.

In the SVHN experiments, the mean accuracy is about 19.68% for the first task and about 56.25% for the second task after CIFAR10, so PathNet reached about 2.86 times higher accuracy than learning from scratch.

PathNet showed positive transfer on both datasets. The transfer effect is considerably larger for SVHN than for CIFAR10, presumably because CIFAR10 contains a much greater variety of patterns than SVHN.

Atari Game (Pong)

./auto_atari_pathnet.sh

This module is implemented with Distributed Tensorflow. You can set the number of parameter servers and workers in the shell script; before running it, check that the required ports are idle (the ports used range from 2222 to 2222 + #ps + #workers).

The basic A3C code is based on https://github.com/miyosuda/async_deep_reinforce
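For reference, here is a minimal Distributed Tensorflow sketch of defining such a cluster on consecutive ports starting at 2222; the host names, job counts, and task index are illustrative, not taken from the shell script.

import tensorflow as tf  # Tensorflow r1.x

N_PS, N_WORKERS = 1, 3   # set these to match the values in the shell script
BASE_PORT = 2222

cluster = tf.train.ClusterSpec({
    "ps":     ["localhost:%d" % (BASE_PORT + i) for i in range(N_PS)],
    "worker": ["localhost:%d" % (BASE_PORT + N_PS + i) for i in range(N_WORKERS)],
})

# Each process starts one server with its own job name and task index.
server = tf.train.Server(cluster, job_name="ps", task_index=0)
server.join()  # a parameter server only serves variables and blocks here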

Settings

L, M, and N are 4, 10, and 4, respectively (the same as in the paper). The number of features for each conv layer is 8 (the same as the author's original setting, which I verified). B and the population size are 3 and 10, respectively, which differ from the paper because my server cannot run 64 workers in parallel, so I reduced the population size and B. The aggregation function between layers is summation, for faster learning than averaging (the paper also uses summation).

I implemented PathNet with Distributed Tensorflow by adding one extra worker that processes the genetic algorithm. This worker checks the score set (including each worker's own score) and runs the genetic algorithm (here, a tournament algorithm). This operation is performed every 5 seconds. As in the paper, the winner's score is not initialized to -1000.
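A rough sketch of that dedicated genetic-algorithm worker's loop; the score source and the shared-state update are placeholders for whatever mechanism the actual code uses.

import time
import numpy as np

POP = 10                     # population size used for the Atari experiment
rng = np.random.RandomState(0)

def read_scores():
    # Stand-in for reading each worker's latest episode score from shared state.
    return rng.randn(POP) * 21.0

for _ in range(3):           # the real loop runs for the whole training session
    scores = read_scores()
    a, b = rng.choice(POP, size=2, replace=False)
    winner, loser = (a, b) if scores[a] >= scores[b] else (b, a)
    # ... copy pathway `winner` over pathway `loser` in shared state and mutate it ...
    time.sleep(5)            # one tournament step every 5 seconds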

I apply an LSTM layer after the last layer of PathNet to learn the model more efficiently than the original. (The LSTM layer is also re-initialized after the first task.) The LSTM makes learning much more efficient: the model without the LSTM saturates at about 100M steps, while the model with the LSTM needs only about 20M steps.
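A rough Tensorflow r1.x sketch of putting an LSTM on top of the PathNet output before the A3C heads; the layer sizes and tensor names are illustrative assumptions, not the repository's actual graph.

import tensorflow as tf  # Tensorflow r1.x

feature_dim, lstm_units, n_actions = 8, 256, 6

# [batch, time, feature_dim] output of the last PathNet layer.
pathnet_features = tf.placeholder(tf.float32, [None, None, feature_dim])

cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_units)
lstm_out, _ = tf.nn.dynamic_rnn(cell, pathnet_features, dtype=tf.float32)

# A3C policy and value heads on top of the LSTM output.
policy_logits = tf.layers.dense(lstm_out, n_actions)
value = tf.layers.dense(lstm_out, 1)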

I used only the Pong game to check positive transfer (the parameters except the fixed path are re-initialized after the first task), assuming the second Pong run would saturate more quickly than the first. Each task learns Pong for 15M steps, and I checked the score graph in TensorBoard.

Results


As described above, the experiment runs Pong twice to check positive transfer (the parameters except the fixed path are re-initialized after the first task), each time for 15M steps.

The score graph shows that the second Pong run saturated noticeably more quickly than the first.

More Repositories

1. adv_attack_capsnet (Python, 174 stars): Tensorflow Implementation of Adversarial Attack to Capsule Networks
2. Interaction-networks_tensorflow (Python, 158 stars): Tensorflow Implementation of Interaction Networks for Learning about Objects, Relations and Physics
3. visual-interaction-networks_tensorflow (Python, 132 stars): Tensorflow Implementation of Visual Interaction Networks
4. dreamer-torch (Python, 111 stars): Pytorch version of Dreamer, which follows the original TF v2 codes.
5. SequentialData-GAN (Python, 69 stars): Tensorflow Implementation of GAN modeling for sequential data
6. bmaml (Python, 56 stars): This repository contains implementations of the paper, Bayesian Model-Agnostic Meta-Learning.
7. programmable-agents_tensorflow (Python, 36 stars): Tensorflow Implementation of Programmable Agents
8. a3c-distributed_tensorflow (Python, 31 stars): Distributed Tensorflow Implementation of Asynchronous Methods for Deep Reinforcement Learning
9. bmaml_rl (Python, 19 stars): This repository contains implementations of the paper, Bayesian Model-Agnostic Meta-Learning.
10. V-MPO_torch (Python, 12 stars): V-MPO torch version with DMLab30 and GTrXL
11. Asynchronous-DDPG_distributed_tensorflow (Python, 12 stars): Distributed Tensorflow Implementation of Asynchronous DDPG
12. tree-structured-group-lasso (Python, 10 stars): Python Implementation of Proximal Methods for Hierarchical Sparse Coding
13. nips17_adv_attack (Python, 9 stars): Tensorflow Implementation of adversarial learning based adversarial example generator
14. OCRL (Python, 9 stars): Object-Centric-Representation Library (OCRL): This repo is to explore OCR on various downstream tasks from supervised learning tasks to RL tasks.
15. ddocker (Python, 8 stars)
16. svpg_tensorflow (Python, 8 stars): Tensorflow Implementation of Stein Variational Policy Gradients
17. ASNP-RMR (Python, 7 stars): This is an official Tensorflow implementation of the ASNP-RMR.