  • Stars: 257
  • Rank: 158,728 (top 4%)
  • Language: Python
  • License: MIT License
  • Created: about 3 years ago
  • Updated: about 1 year ago

Repository Details

Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions

torch-imle

Concise and self-contained PyTorch library implementing the I-MLE gradient estimator proposed in our NeurIPS 2021 paper Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions.

This repository contains a library for transforming any combinatorial black-box solver into a differentiable layer. All code for reproducing the experiments in the NeurIPS paper is available in the official NEC Laboratories Europe repository.

Overview

Implicit MLE (I-MLE) makes it possible to include discrete combinatorial optimization algorithms, such as Dijkstra's algorithm or integer linear program (ILP) solvers, in standard deep learning architectures. The core idea of I-MLE is that it defines an implicit maximum likelihood objective whose gradients are used to update upstream parameters of the model. Every instance of I-MLE requires two ingredients:

  1. A method to approximately sample from a complex and intractable distribution induced by the combinatorial solver over the space of solutions, where optimal solutions have the highest probability mass. For this, we use Perturb-and-MAP (also known as the Gumbel-max trick) and propose a novel family of noise perturbations tailored to the problem at hand (a minimal sketch of this sampling step follows the list).
  2. A method to compute a surrogate empirical distribution: vanilla MLE reduces the KL divergence between the current distribution and the empirical distribution. Since, in our setting, we do not have access to an empirical distribution, we have to design surrogate empirical distributions. Here we propose two families of surrogate distributions that are widely applicable and work well in practice.
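
To make the first ingredient concrete, here is a minimal sketch of Perturb-and-MAP sampling; `map_solver` and `noise_sampler` are hypothetical stand-ins for the combinatorial solver and the noise distribution, not part of this library's API:

import numpy as np

def perturb_and_map(theta: np.ndarray, map_solver, noise_sampler, temperature: float = 1.0) -> np.ndarray:
    # One approximate sample from the distribution induced by the solver:
    # perturb the weights with noise, then return the optimal (MAP) solution
    # of the perturbed problem.
    epsilon = noise_sampler(theta.shape)
    return map_solver(theta + temperature * epsilon)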

Example

For example, let's consider a map from a simple game where the task is to find the shortest path from the top-left to the bottom-right corner. Darker areas have a higher cost, and brighter areas have a lower cost. The middle panel shows what happens when we use the proposed sum-of-gamma noise distribution to sample paths, and the right panel shows the resulting marginal probabilities for every tile (the probability of each tile being part of a sampled path).
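
Such marginal probabilities can be estimated by averaging many perturbed solutions; the sketch below is illustrative only and assumes a hypothetical `map_solver` returning a 0/1 path matrix and a hypothetical `noise_sampler`:

import numpy as np

def estimate_marginals(weights: np.ndarray, map_solver, noise_sampler,
                       nb_samples: int = 100, temperature: float = 1.0) -> np.ndarray:
    # Average many Perturb-and-MAP samples: the result contains, for every tile,
    # the empirical probability of lying on a sampled shortest path.
    samples = [map_solver(weights + temperature * noise_sampler(weights.shape))
               for _ in range(nb_samples)]
    return np.mean(np.stack(samples), axis=0)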

Gradients and Learning

Let us assume that the optimal shortest path is the one on the left. Starting from random weights, the model can learn to produce the weights that will result in the optimal shortest path via gradient descent, by minimising the Hamming loss between the produced path and the gold path. Here we show the paths being produced during training (middle), and the corresponding map weights (right).

Animations are shown for the following noise temperature settings:

  • Input noise temperature 0.0, target noise temperature 0.0
  • Input noise temperature 1.0, target noise temperature 1.0
  • Input noise temperature 2.0, target noise temperature 2.0
  • Input noise temperature 5.0, target noise temperature 5.0
  • Input noise temperature 5.0, target noise temperature 0.0

All animations were generated by this script.
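
In code, the training procedure described above might look like the following minimal sketch; the map size, gold path, and optimiser settings are placeholders, and `imle_solver` is the I-MLE-wrapped solver constructed in the next section:

import torch

map_size = (12, 12)                                    # placeholder map dimensions
gold_path = torch.zeros(map_size)                      # placeholder 0/1 mask of the optimal path
weights = torch.nn.Parameter(torch.randn(map_size))    # map weights, initialised randomly
optimizer = torch.optim.Adam([weights], lr=0.01)

for step in range(500):
    optimizer.zero_grad()
    # Sample paths from the current weights; differentiable thanks to I-MLE
    paths = imle_solver(weights.unsqueeze(0))
    # Hamming loss between the produced paths and the gold path
    loss = (paths * (1.0 - gold_path) + (1.0 - paths) * gold_path).mean()
    loss.backward()
    optimizer.step()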

Code

Using this library is extremely easy -- see this example as a reference. Assuming we have a method that implements a black-box combinatorial solver such as Dijkstra's algorithm:

import numpy as np

import torch
from torch import Tensor

def torch_solver(weights_batch: Tensor) -> Tensor:
    # Move the weights to NumPy, run the black-box solver on each instance
    # in the batch, and wrap the solutions back into a tensor.
    weights_batch = weights_batch.detach().cpu().numpy()
    y_batch = np.asarray([solver(w) for w in list(weights_batch)])
    return torch.tensor(y_batch, requires_grad=False)
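
Here, `solver` stands for the user's black-box combinatorial algorithm operating on NumPy arrays. Purely as an illustration, a hypothetical Dijkstra-style solver on a grid of per-tile costs (not part of this library) could look as follows:

import heapq
import numpy as np

def solver(weights: np.ndarray) -> np.ndarray:
    # Hypothetical black-box solver: Dijkstra's algorithm on a grid of per-tile
    # costs, returning a 0/1 matrix marking the shortest path from the top-left
    # to the bottom-right corner (4-neighbour moves).
    height, width = weights.shape
    dist = np.full((height, width), np.inf)
    prev = {}
    dist[0, 0] = weights[0, 0]
    queue = [(dist[0, 0], (0, 0))]
    while queue:
        d, (i, j) = heapq.heappop(queue)
        if d > dist[i, j]:
            continue
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < height and 0 <= nj < width:
                nd = d + weights[ni, nj]
                if nd < dist[ni, nj]:
                    dist[ni, nj] = nd
                    prev[(ni, nj)] = (i, j)
                    heapq.heappush(queue, (nd, (ni, nj)))
    # Trace the path back from the bottom-right corner
    path = np.zeros_like(weights)
    node = (height - 1, width - 1)
    path[node] = 1.0
    while node != (0, 0):
        node = prev[node]
        path[node] = 1.0
    return path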

We can obtain the corresponding distribution and gradients in this way:

from imle.wrapper import imle
from imle.target import TargetDistribution
from imle.noise import SumOfGammaNoiseDistribution

target_distribution = TargetDistribution(alpha=0.0, beta=10.0)
noise_distribution = SumOfGammaNoiseDistribution(k=k, nb_iterations=100)

# Here, k, input_noise_temperature, and target_noise_temperature are
# hyperparameters set by the user (the animations above use noise
# temperatures between 0.0 and 5.0).
imle_solver = imle(torch_solver,
                   target_distribution=target_distribution,
                   noise_distribution=noise_distribution,
                   nb_samples=10,
                   input_noise_temperature=input_noise_temperature,
                   target_noise_temperature=target_noise_temperature)
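
The wrapped `imle_solver` then behaves like an ordinary differentiable PyTorch function; a minimal usage sketch, with hypothetical shapes and a placeholder loss:

weights = torch.randn(16, 12, 12, requires_grad=True)  # a batch of map weights
paths = imle_solver(weights)   # approximate samples from the solver-induced distribution
loss = paths.sum()             # placeholder downstream loss
loss.backward()                # I-MLE provides the gradients with respect to `weights`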

Or, alternatively, using a simple function decorator:

@imle(target_distribution=target_distribution,
      noise_distribution=noise_distribution,
      nb_samples=10,
      input_noise_temperature=input_noise_temperature,
      target_noise_temperature=target_noise_temperature)
def imle_solver(weights_batch: Tensor) -> Tensor:
    return torch_solver(weights_batch)

Papers using I-MLE

Reference

@inproceedings{niepert21imle,
  author    = {Mathias Niepert and
               Pasquale Minervini and
               Luca Franceschi},
  title     = {Implicit {MLE:} Backpropagating Through Discrete Exponential Family
               Distributions},
  booktitle = {NeurIPS},
  year      = {2021}
}

More Repositories

  1. stat-nlp-book - Interactive Lecture Notes, Slides and Exercises for Statistical NLP (Jupyter Notebook, 269 stars)
  2. egal - easy drawing in jupyter (JavaScript, 257 stars)
  3. jack - Jack the Reader (Python, 257 stars)
  4. emoji2vec - emoji2vec: Learning Emoji Representations from their Description (Jupyter Notebook, 257 stars)
  5. fakenewschallenge - UCL Machine Reading - FNC-1 Submission (Python, 166 stars)
  6. pycodesuggest - Learning to Auto-Complete using RNN Language Models (Python, 156 stars)
  7. cqd - Continuous Query Decomposition for Complex Query Answering in Incomplete Knowledge Graphs (Python, 95 stars)
  8. ntp - End-to-End Differentiable Proving (NewLisp, 88 stars)
  9. d4 - Differentiable Forth Interpreter (Python, 66 stars)
  10. low-rank-logic - Code for Injecting Logical Background Knowledge into Embeddings for Relation Extraction (Scala, 65 stars)
  11. inferbeddings - Injecting Background Knowledge in Neural Models via Adversarial Set Regularisation (Python, 59 stars)
  12. gntp (Python, 57 stars)
  13. ctp - Conditional Theorem Proving (Python, 51 stars)
  14. EMAT - Efficient Memory-Augmented Transformers (Python, 34 stars)
  15. stat-nlp-book-scala - Interactive book on Statistical NLP (Scala, 32 stars)
  16. simpleNumericalFactChecker - Fact checker for simple claims about statistical properties (Python, 26 stars)
  17. adversarial-nli - Code and data for the CoNLL 2018 paper "Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge" (Python, 25 stars)
  18. acl2015tutorial - Moro files for the ACL 2015 Tutorial on Matrix and Tensor Factorization Methods for Natural Language Processing (Scala, 20 stars)
  19. numerate-language-models (Python, 19 stars)
  20. fever - FEVER Workshop Shared-Task (Python, 16 stars)
  21. APE - Adaptive Passage Encoder for Open-domain Question Answering (Python, 15 stars)
  22. stat-nlp-course - Code for the UCL Statistical NLP course (Scala, 11 stars)
  23. newshack - BBC Newshack code (Scala, 1 star)
  24. eqa-tools - Tools for Exam Question Answering (Python, 1 star)
  25. softconf-start-sync - Softconf START sync, tool for Google Sheets (JavaScript, 1 star)
  26. bibtex - BibTeX files (TeX, 1 star)