# Meta-Learning with Latent Embedding Optimization

## Overview
This repository contains an implementation of the meta-learning model described in the paper "Meta-Learning with Latent Embedding Optimization" by Rusu et al., which was posted on arXiv in July 2018 and will be presented at ICLR 2019.
The model learns a data-dependent latent representation of model parameters and performs gradient-based meta-learning in this low-dimensional latent space.
The code here doesn't include the (standard) method for pre-training the data embeddings; instead, pre-trained embeddings are provided for download.
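To make that concrete, below is a minimal NumPy sketch of the core idea: taking gradient steps on a latent code that decodes into classifier weights, under heavy simplifying assumptions (a fixed linear decoder, no encoder or relation network, deterministic decoding). All names such as `W_dec`, `decode`, and `inner_lr` are illustrative, not identifiers from this repository.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, embed_dim, latent_dim = 5, 640, 64

# Hypothetical fixed linear decoder: latent code z -> classifier weights w.
W_dec = rng.normal(scale=0.01, size=(latent_dim, embed_dim))

def decode(z):
    """Map each class's latent code to that class's weight vector."""
    return z @ W_dec                              # (C, D)

def loss_and_grad_z(z, x, y):
    """Softmax cross-entropy of the decoded linear classifier, with the
    gradient taken w.r.t. the latent codes z rather than the weights."""
    w = decode(z)                                 # (C, D)
    logits = x @ w.T                              # (N, C)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    n = x.shape[0]
    loss = -np.log(p[np.arange(n), y]).mean()
    dlogits = (p - np.eye(num_classes)[y]) / n    # d loss / d logits
    dw = dlogits.T @ x                            # back through the classifier
    dz = dw @ W_dec.T                             # back through the decoder
    return loss, dz

# A toy 5-way 1-shot episode: one support embedding per class.
x_support = rng.normal(size=(num_classes, embed_dim))
y_support = np.arange(num_classes)

z = rng.normal(scale=0.1, size=(num_classes, latent_dim))
inner_lr = 1.0                                    # illustrative step size
for _ in range(5):                                # adaptation in latent space
    loss, dz = loss_and_grad_z(z, x_support, y_support)
    z -= inner_lr * dz
print("support loss after adaptation:", loss)
```

In LEO itself, the outer loop additionally differentiates through these inner steps to train the encoder and decoder end-to-end; the sketch shows the inner loop only.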
Disclaimer: This is not an official Google product.
## Running the code

### Setup
To run the code, you first need to install:
- TensorFlow and TensorFlow Probability (we used version 1.12),
- Sonnet (we used v1.29), and
- Abseil (we use only the FLAGS module).
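Once installed, a quick import check along these lines (an assumption about your environment, not a script from this repository) can confirm the dependencies are available:

```python
# Sanity-check that the dependencies import and report their versions.
import tensorflow as tf
import tensorflow_probability as tfp
import sonnet as snt
from absl import flags  # only the flags module is needed by this repo

print("tensorflow:", tf.__version__)
print("tensorflow_probability:", tfp.__version__)
print("imports OK")
```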
### Getting the data
You need to download the embeddings and extract them on disk:

    $ wget http://storage.googleapis.com/leo-embeddings/embeddings.zip
    $ unzip embeddings.zip
    $ EMBEDDINGS=`pwd`/embeddings
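After extraction, you can sanity-check the download with a short Python walk of the directory (the file layout inside `embeddings/` is not documented here, so this only lists whatever was extracted):

```python
import os

embeddings_dir = os.path.join(os.getcwd(), "embeddings")
assert os.path.isdir(embeddings_dir), "run the wget/unzip commands first"

# Print the extracted files so you can confirm the download is complete.
for root, _, files in os.walk(embeddings_dir):
    for name in sorted(files):
        print(os.path.join(root, name))
```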
### Running the code
Then, clone this repository:

    $ git clone https://github.com/deepmind/leo

and run the code with:

    $ python runner.py --data_path=$EMBEDDINGS

This will train the model on 5-way 1-shot miniImageNet classification.
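For reference, "N-way K-shot" means each training episode samples N classes with K labelled support examples per class, plus held-out query examples used to evaluate the adapted classifier. A toy sketch of that episode structure (illustrative NumPy only; the pool sizes and query count are assumptions, and the real pipeline reads the pre-trained embeddings from `$EMBEDDINGS`):

```python
import numpy as np

rng = np.random.default_rng(0)
num_ways, num_shots, num_queries, embed_dim = 5, 1, 15, 640

# Stand-in pool: 64 classes with 600 embeddings each (miniImageNet-like
# sizes; real data comes from the downloaded embeddings directory).
pool = rng.normal(size=(64, 600, embed_dim))

# Sample 5 classes, then split each class's examples into support and query.
episode_classes = rng.choice(pool.shape[0], size=num_ways, replace=False)
support, query = [], []
for label, cls in enumerate(episode_classes):
    idx = rng.choice(pool.shape[1], size=num_shots + num_queries, replace=False)
    support += [(pool[cls, i], label) for i in idx[:num_shots]]
    query += [(pool[cls, i], label) for i in idx[num_shots:]]

print(len(support), "support examples;", len(query), "query examples")
```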
### Hyperparameters
To train the model on the tieredImageNet dataset or with a different number of training examples per class (K-shot), you can pass these parameters on the command line or set them in `config.py`, e.g.:

    $ python runner.py --data_path=$EMBEDDINGS --dataset_name=tieredImageNet --num_tr_examples_per_class=5 --outer_lr=1e-4

See `config.py` for the full list of options.
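The command-line overrides above are handled with Abseil. A minimal sketch of that pattern (flag names taken from the commands above; the defaults and help strings are assumptions, and `runner.py`'s actual definitions may differ):

```python
from absl import app, flags

FLAGS = flags.FLAGS
flags.DEFINE_string("data_path", None, "Path to the downloaded embeddings.")
flags.DEFINE_string("dataset_name", "miniImageNet", "Dataset to train on.")
flags.DEFINE_integer("num_tr_examples_per_class", 1, "K in K-shot training.")
flags.DEFINE_float("outer_lr", 1e-4, "Outer-loop learning rate.")

def main(argv):
    del argv  # unused
    print("Training on", FLAGS.dataset_name, "from", FLAGS.data_path)

if __name__ == "__main__":
    app.run(main)
```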
A comparison of the paper's implementation and this open-source implementation in terms of test-set accuracy:
Implementation | miniImageNet 1-shot | miniImageNet 5-shot | tieredImageNet 1-shot | tieredImageNet 5-shot
--- | --- | --- | --- | ---
LEO Paper | 61.76 ± 0.08% | 77.59 ± 0.12% | 66.33 ± 0.05% | 81.44 ± 0.09%
This code | 61.89 ± 0.16% | 77.65 ± 0.09% | 66.25 ± 0.14% | 81.77 ± 0.09%
The hyperparameters we found to work best for the different setups are as follows:
Hyperparameter | miniImageNet 1-shot | miniImageNet 5-shot | tieredImageNet 1-shot | tieredImageNet 5-shot
--- | --- | --- | --- | ---
`outer_lr` | 2.739071e-4 | 4.102361e-4 | 8.659053e-4 | 6.110314e-4
`l2_penalty_weight` | 3.623413e-10 | 8.540338e-9 | 4.148858e-10 | 1.690399e-10
`orthogonality_penalty_weight` | 0.188103 | 1.523998e-3 | 5.451078e-3 | 2.481216e-2
`dropout_rate` | 0.307651 | 0.300299 | 0.475126 | 0.415158
`kl_weight` | 0.756143 | 0.466387 | 2.034189e-3 | 1.622811
`encoder_penalty_weight` | 5.756821e-6 | 2.661608e-7 | 8.302962e-5 | 2.672450e-5
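For convenience, here is one column of the table (miniImageNet 1-shot) transcribed as a Python dict, together with a hypothetical way of turning it into command-line overrides; this assumes each of these names is exposed as a flag the way `--outer_lr` is above, which may not hold for all of them:

```python
# miniImageNet 1-shot settings from the table above (transcription only).
MINI_IMAGENET_1_SHOT = {
    "outer_lr": 2.739071e-4,
    "l2_penalty_weight": 3.623413e-10,
    "orthogonality_penalty_weight": 0.188103,
    "dropout_rate": 0.307651,
    "kl_weight": 0.756143,
    "encoder_penalty_weight": 5.756821e-6,
}

# Example: format the dict as command-line overrides for runner.py.
overrides = " ".join(f"--{k}={v}" for k, v in MINI_IMAGENET_1_SHOT.items())
print("python runner.py --data_path=$EMBEDDINGS " + overrides)
```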