Collection of NLP model explanations and accompanying analysis tools


Thermostat is a large collection of NLP model explanations and accompanying analysis tools.

  • Combines explainability methods from the captum library with Hugging Face's datasets and transformers.
  • Mitigates repetitive execution of common experiments in Explainable NLP and thus reduces the environmental impact and financial roadblocks.
  • Increases comparability and replicability of research.
  • Reduces the implementational burden.

This work is described in our paper accepted at EMNLP 2021 (System Demonstrations):
Nils Feldhus, Robert Schwarzenberg, and Sebastian Möller.
Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools. 2021.

arXiv pre-print available here: https://arxiv.org/abs/2108.13961

Installation

With pip


pip install thermostat-datasets

Explore on Hugging Face Spaces

The Spaces edition of Thermostat launched on October 26, 2021.

Usage

Downloading a dataset requires just two lines of code:

import thermostat
data = thermostat.load("imdb-bert-lig")

Thermostat datasets can be addressed and loaded with an identifier string that contains three basic coordinates: Dataset, Model, and Explainer. In this example, the dataset is IMDb (sentiment analysis of movie reviews), the model is a BERT model fine-tuned on the IMDb data, and the explanations are generated with a (Layer) Integrated Gradients explainer.
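
The same identifier pattern covers every configuration listed under "Datasets + Models" below; for instance, ag_news-albert-svs (the example configuration named there) loads AG News explanations:

import thermostat

# AG News topic classification, ALBERT model, Shapley Value Sampling explainer
data = thermostat.load("ag_news-albert-svs")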

data then contains the following columns/features:

  • attributions (the attributions for each token for each data point; type: List of floats)
  • idx (the index of the instance in the dataset)
  • input_ids (the token IDs of the original dataset; type: List of ints)
  • label (the label of the original dataset; type: int)
  • predictions (the class logits of the classifier/downstream model; type: List of floats)

This is the raw content stored in each instance of data:

[Screenshot: raw instance contents]
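
A hedged sketch of reading these fields from a single instance (attribute-style access shown, mirroring the instance.explanation and instance.heatmap attributes used below; if fields are exposed dict-style instead, use instance["predictions"]):

import thermostat

data = thermostat.load("imdb-bert-lig")
instance = data[0]

# Pick the predicted class from the logits and compare it to the gold label.
# Attribute-style field access is an assumption mirroring .explanation/.heatmap.
logits = instance.predictions
pred_class = max(range(len(logits)), key=lambda i: logits[i])
print(pred_class, instance.label)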

If we print data, we get more info such as the actual names of the dataset, the explainer and the model:

print(data)
> IMDb dataset, BERT model, Layer Integrated Gradients explanations
> Explainer: LayerIntegratedGradients
> Model: textattack/bert-base-uncased-imdb
> Dataset: imdb

Indexing an instance

We can simply index the loaded dataset like a list:

import thermostat
instance = thermostat.load("imdb-bert-lig")[429]

Visualizing attributions as a heatmap

We can call .render() on any instance to display a heatmap visualization generated with the displaCy library.

instance.render()  # instance refers to the variable assigned in the indexing example above

[Screenshot: heatmap rendered as HTML]

Getting a simple tuple-based heatmap

The explanation attribute stores a tuple-based heatmap with the token, the attribution, and the token index as elements.

print(instance.explanation)  # instance refers to the variable assigned in the indexing example above

> [('[CLS]', 0.0, 0),
 ('amazing', 2.3141794204711914, 1),
 ('movie', 0.06655970215797424, 2),
 ('.', -0.47832658886909485, 3),
 ('some', 0.15708176791667938, 4),
 ('of', -0.02931656688451767, 5),
 ('the', -0.08834744244813919, 6),
 ('script', -0.2660972774028778, 7),
 ('writing', -0.4021594822406769, 8),
 ('could', -0.19280624389648438, 9),
 ('have', -0.015477157197892666, 10),
 ('been', -0.21898044645786285, 11),
 ('better', -0.4095713794231415, 12),
 ...]  # abbreviated
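
Since explanation is a list of (token, attribution, index) tuples, ranking tokens by importance is a one-liner; a small sketch using the instance loaded above:

import thermostat

instance = thermostat.load("imdb-bert-lig")[429]
# Rank tokens by absolute attribution; each element is (token, attribution, index)
top = sorted(instance.explanation, key=lambda t: abs(t[1]), reverse=True)[:5]
for token, score, idx in top:
    print(f"{idx:>3}  {token:<12} {score:+.4f}")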

The heatmap attribute displays it as a pandas table:

print(instance.heatmap)

> token_index    0         1          2         3          4         5    \
token        [CLS]         i       went       and        saw      this   
attribution      0 -0.117371  0.0849944  0.165192  0.0362542 -0.029687   
text_field    text      text       text      text       text      text   

token_index       6         7         8          9          10         11   \
token           movie      last     night      after      being     coaxed   
attribution  0.533126  0.240222  0.171116 -0.0450005 -0.0103401  0.0166524   
text_field       text      text      text       text       text       text   

token_index        13         14          15         16         17   \
token               to         by           a        few    friends   
attribution  0.0269605 -0.0213463  0.00761083  0.0216749  0.0579834   
text_field        text       text        text       text       text   

# abbreviated
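
The printed output suggests that heatmap is a pandas DataFrame, so standard pandas methods should apply (a hedged sketch; to_csv is plain pandas, not a Thermostat-specific API):

# Export the heatmap table for further analysis (assumes a pandas DataFrame)
instance.heatmap.to_csv("imdb-bert-lig_429_heatmap.csv")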

Modifying the load function

thermostat.load() is a wrapper around datasets.load_dataset(), so any keyword argument of load_dataset() can also be passed to load(), except path, name, and split, which are reserved. If you want to use another cache directory, for example, you can pass the cache_dir argument to thermostat.load().
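
In code:

import thermostat

# Download into a custom cache directory; cache_dir is forwarded to load_dataset()
data = thermostat.load("imdb-bert-lig", cache_dir="./thermostat_cache")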


Explainers

Name                                  captum implementation            Parameters
Layer Gradient x Activation (lgxa)    .attr.LayerGradientXActivation
Layer Integrated Gradients (lig)      .attr.LayerIntegratedGradients   # samples = 25
LIME (lime)                           .attr.LimeBase                   # samples = 25, mask prob = 0.3
Occlusion (occ)                       .attr.Occlusion                  sliding window = 3
Shapley Value Sampling (svs)          .attr.ShapleyValueSampling       # samples = 25
Layer DeepLiftShap (lds)              .attr.LayerDeepLiftShap
Layer GradientShap (lgs)              .attr.LayerGradientShap          # samples = 5

Datasets + Models

Overview

✅ = Dataset is downloadable
⏏️ = Dataset is finished, but not uploaded yet
🔄 = Currently running on cluster (x n = number of jobs/screens)
⚠️ = Issue

IMDb

imdb is a sentiment analysis dataset with 2 classes (pos and neg). The available split is the test subset containing 25k examples.
Example configuration: imdb-xlnet-lig

Name 🤗 lgxa lig lime occ svs lds lgs
ALBERT (albert) textattack/albert-base-v2-imdb
BERT (bert) textattack/bert-base-uncased-imdb
ELECTRA (electra) monologg/electra-small-finetuned-imdb
RoBERTa (roberta) textattack/roberta-base-imdb
XLNet (xlnet) textattack/xlnet-base-cased-imdb ⚠️ ⚠️

MultiNLI

multi_nli is a textual entailment dataset. The available split is the validation_matched subset containing 9815 examples.
Example configuration: multi_nli-roberta-lime

Name 🤗 lgxa lig lime occ svs lds lgs
ALBERT (albert) prajjwal1/albert-base-v2-mnli
BERT (bert) textattack/bert-base-uncased-MNLI
ELECTRA (electra) howey/electra-base-mnli
RoBERTa (roberta) textattack/roberta-base-MNLI
XLNet (xlnet) textattack/xlnet-base-cased-MNLI ⚠️ ⚠️

XNLI

xnli is a textual entailment dataset. It provides the test set of MultiNLI through the "en" configuration. The fine-tuned models used here are the same as the MultiNLI ones. The available split is the test subset containing 5010 examples.
Example configuration: xnli-roberta-lime

Name 🤗 lgxa lig lime occ svs lds lgs
ALBERT (albert) prajjwal1/albert-base-v2-mnli
BERT (bert) textattack/bert-base-uncased-MNLI
ELECTRA (electra) howey/electra-base-mnli
RoBERTa (roberta) textattack/roberta-base-MNLI
XLNet (xlnet) textattack/xlnet-base-cased-MNLI ⚠️ ⚠️

AG News

ag_news is a news topic classification dataset. The available split is the test subset containing 7600 examples.
Example configuration: ag_news-albert-svs

Name 🤗 lgxa lig lime occ svs lds lgs
ALBERT (albert) textattack/albert-base-v2-ag-news
BERT (bert) textattack/bert-base-uncased-ag-news
RoBERTa (roberta) textattack/roberta-base-ag-news

Contribute a dataset

New explanation datasets must follow the JSONL format and include the five fields attributions, idx, input_ids, label and predictions as described above in "Usage".
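
A single JSONL record would then look roughly like this (a hedged sketch: all values are illustrative, the types follow the field list in "Usage"):

import json

record = {
    "attributions": [0.0, 2.3142, 0.0666],  # one float per token
    "idx": 429,                             # index of the instance in the dataset
    "input_ids": [101, 6429, 3185],         # token IDs (illustrative values)
    "label": 1,                             # gold label from the original dataset
    "predictions": [-2.51, 3.04],           # class logits of the downstream model
}
print(json.dumps(record))  # one such line per instance in the JSONL file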

Please follow the instructions for writing a dataset loading script in the official docs of datasets.

Provide the additional Thermostat metadata via the list of builder configs (see the Thermostat implementation of builder configs for reference).

Necessary fields include...

  • name: The unique identifier string containing the three coordinates, e.g. <DATASET>-<MODEL>-<EXPLAINER>
  • dataset: The full name of the dataset, usually following the naming convention in datasets, e.g. "imdb"
  • explainer: The full name of the explainer, usually following the naming convention in captum, e.g. "LayerIntegratedGradients"
  • model: The full name of the model, usually following the naming convention in transformers, e.g. "textattack/bert-base-uncased-imdb"
  • label_column: The name of the column in the JSONL file that contains the label, usually "label"
  • label_classes: The list of label names or classes, e.g. ["entailment", "neutral", "contradiction"] for NLI datasets
  • text_column: Either a string (if there is only one text column) or a list of strings identifying the column(s) in the JSONL file that contain the text(s), e.g. "text" (IMDb) or ["premise", "hypothesis"] (NLI)
  • description: Should at least state the full names of the three coordinates; can optionally include more info such as hyperparameter choices
  • data_url: The URL to the data storage, e.g. a Google Drive link

plus features which you can copy from the codebox below:

features={"attributions": "attributions",
          "predictions": "predictions",
          "input_ids": "input_ids"}

While debugging, you can wrap your data in the Thermopack class to check that it parses correctly:

import thermostat
from datasets import load_dataset

# Load your own explanation dataset (replace 'your_dataset' accordingly)
data = load_dataset('your_dataset')
# Wrapping it in Thermopack checks that the expected fields can be parsed
thermostat.Thermopack(data)

If you're successful, follow the official instructions for sharing a community-provided dataset on the Hugging Face Hub.

Initially, all Thermostat contributions will have to be loaded via the code example above. Please notify us of existing explanation datasets by creating an Issue with the tag Contribution, and a maintainer of this repository will add your dataset to the Thermostat configs so that it can be accessed by everyone via thermostat.load().


Cite Thermostat

@inproceedings{feldhus2021thermostat,
    title = {Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools},
    author = {Nils Feldhus and Robert Schwarzenberg and Sebastian Möller},
    year = {2021},
    editor = {Heike Adel and Shuming Shi},
    booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
}

Disclaimer

We give no warranties for the correctness of the heatmaps or any other part of the data. This is evolving work and will be hot-patched continuously.

The Thermostat project follows the ACL and ACM Code of Ethics.

Acknowledgements

The majority of the codebase, especially regarding the combination of transformers and captum, stems from our other recent project Empirical Explainers.
