deep-spin/entmax

Stars
406
Rank 106,421 (Top 3 %)
Language
Python
License
MIT License
Created over 5 years ago
Updated 5 months ago

deep-spin/entmax

deep-spin

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

The entmax mapping and its loss, a family of sparse softmax alternatives.

entmax

This package provides a pytorch implementation of entmax and entmax losses: a sparse family of probability mappings and corresponding loss functions, generalizing softmax / cross-entropy.

Features:

Exact partial-sort algorithms for 1.5-entmax and 2-entmax (sparsemax).
A bisection-based algorithm for generic alpha-entmax.
Gradients w.r.t. alpha for adaptive, learned sparsity!

Requirements: python 3, pytorch >= 1.0 (and pytest for unit tests)

Example

In [1]: import torch

In [2]: from torch.nn.functional import softmax

In [2]: from entmax import sparsemax, entmax15, entmax_bisect

In [4]: x = torch.tensor([-2, 0, 0.5])

In [5]: softmax(x, dim=0)
Out[5]: tensor([0.0486, 0.3592, 0.5922])

In [6]: sparsemax(x, dim=0)
Out[6]: tensor([0.0000, 0.2500, 0.7500])

In [7]: entmax15(x, dim=0)
Out[7]: tensor([0.0000, 0.3260, 0.6740])

Gradients w.r.t. alpha (continued):

In [1]: from torch.autograd import grad

In [2]: x = torch.tensor([[-1, 0, 0.5], [1, 2, 3.5]])

In [3]: alpha = torch.tensor(1.33, requires_grad=True)

In [4]: p = entmax_bisect(x, alpha)

In [5]: p
Out[5]:
tensor([[0.0460, 0.3276, 0.6264],
        [0.0026, 0.1012, 0.8963]], grad_fn=<EntmaxBisectFunctionBackward>)

In [6]: grad(p[0, 0], alpha)
Out[6]: (tensor(-0.2562),)

Installation

pip install entmax

Citations

Sparse Sequence-to-Sequence Models

@inproceedings{entmax,
  author    = {Peters, Ben and Niculae, Vlad and Martins, Andr{\'e} FT},
  title     = {Sparse Sequence-to-Sequence Models},
  booktitle = {Proc. ACL},
  year      = {2019},
  url       = {https://www.aclweb.org/anthology/P19-1146}
}

Adaptively Sparse Transformers

@inproceedings{correia19adaptively,
  author    = {Correia, Gon\c{c}alo M and Niculae, Vlad and Martins, Andr{\'e} FT},
  title     = {Adaptively Sparse Transformers},
  booktitle = {Proc. EMNLP-IJCNLP (to appear)},
  year      = {2019},
}

Further reading:

Blondel, Martins, and Niculae, 2019. Learning with Fenchel-Young Losses.
Martins and Astudillo, 2016. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification.
Peters and Martins, 2019 IT-IST at the SIGMORPHON 2019 Shared Task: Sparse Two-headed Models for Inflection.

infinite-former

tutorial

Web page for our tutorial on latent structure for NLP

lp-sparsemap

LP-SparseMAP: Differentiable sparse structured prediction in coarse factor graphs

UA_COMET

Repository for "Uncertainty-Aware Machine Translation Evaluation", accepted to Findings of EMNLP 2021.

OpenNMT-APE

sparse-marginalization-lvm

Official PyTorch (Lightning) implementation of the NeurIPS 2020 paper "Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity".

scheduled-sampling-transformers

Code for the paper "Scheduled Sampling for Transformers"

tower-eval

mcan-vqa-continuous-attention

uncertainties_MT_eval

Code and data for the paper "Disentangling Uncertainty in Machine Translation Evaluation", accepted at EMNLP 2022.

sparse_text_generation

hallucinations-in-nmt

sparse_continuous_distributions

This repository provides open-source code for sparse continuous distributions and corresponding Fenchel-Young losses.

robust_MT_evaluation

Repository for "BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation", accepted at EAMT 2023.

Jupyter Notebook

lmt_hallucinations

qaware-decode

A repository for experiments in quality-aware decoding

OpenNMT-entmax

qe-evaluation

Evaluation scripts for the 2019 machine translation quality estimation shared task

sparse-communication

Jupyter Notebook

understanding-spigot

Code for the paper "Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning"

entmax-jax

The entmax mapping in JAX

spectra-rationalization

Repository for SPECTRA: Sparse Structured Text Rationalization, accepted at EMNLP 2021 main conference.

explainable-qe-shared-task

IST-Unbabel 2021 Submission for the Quality Estimation Shared Task

Jupyter Notebook

translation_llm

Jupyter Notebook

non-exchangeable-crc

Jupyter Notebook

crest

Code for CREST: A Joint Framework for Rationalization and Counterfactual Text Generation, accepted at ACL 2023.

pyturbo

Neural dependency parser with higher-order features

S7

Smoothing and Shrinking the Sparse Seq2Seq Search Space

ot-hallucination-detection

efficient_kNN_MT

unn

Code for the paper "Modeling Structure with Undirected Neural Networks"

chunk-based_knn-mt

sigmorphon-seq2seq

DeepSPIN's submission to SIGMORPHON 2020

spec

The Explanation Game: Towards Prediction Explainability through Sparse Communication

SIGMORPHON2019

IT-IST's submission to SIGMORPHON 2019 Task 1

speech-continuous-attention

Speech Classification using Continuous Attention Mechanisms

tutorial-latent-struct-src

Sources for our slides for the latent structure in NLP tutorial

quest-decoding

A package for sampling from Gibbs distributions during inference with LLMs.

translation-hypothesis-ensembling

vqa-multimodal-continuous-attention

quati

Simple and modular library for document classification and sequence tagging.

TVmax

SSHN

Sparse and Structured Hopfield Networks

doce

This is the a repo of DOCE

Jupyter Notebook

deep-spin.github.io

Website of the DeepSPIN ERC project.