ferret is a Python library that streamlines the use and benchmarking of interpretability techniques on Transformers models.
- Documentation: https://ferret.readthedocs.io
- Paper: https://arxiv.org/abs/2208.01575
- Demo: https://huggingface.co/spaces/g8a9/ferret
ferret is designed to integrate seamlessly with 🤗 transformers models; it currently supports text models only. We provide:
- 🔍 Four established interpretability techniques based on token-level feature attribution. Use them to quickly find the words most relevant to your model's output.
- ⚖️ Six Faithfulness and Plausibility evaluation protocols. Benchmark any token-level explanation against these tests to guide your choice toward the most reliable explainer.
📝 Examples
- All-around tutorial (testing all explainers, evaluation metrics, and the interface with XAI datasets): Colab
Text Classification
- Intent Detection with Multilingual XLM RoBERTa: Colab
Getting Started
Installation
pip install -U ferret-xai
Our main dependencies are transformers and datasets.
Important: some of our dependencies might require the deprecated sklearn package name instead of scikit-learn, which breaks the ferret installation.
If your pip install command fails, try:
SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True pip install -U ferret-xai
This is hopefully a temporary situation!
Explain & Benchmark
The code below provides a minimal example to run all the feature-attribution explainers supported by ferret and benchmark them on faithfulness metrics.
We start from a common text classification pipeline:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ferret import Benchmark
name = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)
Using ferret is as simple as:
bench = Benchmark(model, tokenizer)
explanations = bench.explain("You look stunning!", target=1)
evaluations = bench.evaluate_explanations(explanations, target=1)
bench.show_evaluation_table(evaluations)
Be sure to run the code in a Jupyter Notebook/Colab: the cell above will produce a nicely-formatted table to analyze the saliency maps.
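If you are not working in a notebook, you can also inspect the raw attribution scores directly. A minimal sketch, assuming each explanation object exposes the explainer name, the input tokens, and their per-token scores (the attribute names below are assumptions; check the API reference for the exact fields):
# Illustrative only: attribute names are assumptions, see the ferret docs.
for explanation in explanations:
    print(explanation.explainer)                               # which explainer produced it
    print(list(zip(explanation.tokens, explanation.scores)))   # token-score pairs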
Features
ferret offers painless integration with Hugging Face models and naming conventions. If you already use the transformers library, you immediately get access to our Explanation and Evaluation API.
Post-Hoc Explainers
- Gradient (plain gradients or multiplied by input token embeddings) (Simonyan et al., 2014)
- Integrated Gradient (plain gradients or multiplied by input token embeddings) (Sundararajan et al., 2017)
- SHAP (via Partition SHAP approximation of Shapley values) (Lundberg and Lee, 2017)
- LIME (Ribeiro et al., 2016)
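If you only need one of the techniques above rather than the full benchmark, ferret also exposes standalone explainer classes. A minimal sketch, assuming a class name and call signature that mirror the list above (both are assumptions here; see the documentation for the exact imports):
# Illustrative sketch: the class name and call signature are assumptions,
# not a verified API; check the ferret documentation for exact imports.
from ferret import SHAPExplainer  # hypothetical import

explainer = SHAPExplainer(model, tokenizer)
explanation = explainer("You look stunning!", target=1)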
Evaluation Metrics
Faithfulness measures:
- AOPC Comprehensiveness (DeYoung et al., 2020)
- AOPC Sufficiency (DeYoung et al., 2020)
- Kendall's Tau correlation with Leave-One-Out token removal (Jain and Wallace, 2019)
Plausibility measures:
- Area-Under-Precision-Recall-Curve (soft score) (DeYoung et al., 2020)
- Token F1 (hard score) (DeYoung et al., 2020)
- Token Intersection Over Union (hard score) (DeYoung et al., 2020)
See our paper for details.
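To give a concrete sense of the faithfulness side: AOPC Comprehensiveness measures how much the predicted probability of the target class drops once the top-k attributed tokens are removed, averaged over several values of k (AOPC Sufficiency is the mirror image, keeping only the top-k tokens). The sketch below is a conceptual illustration under stated assumptions (a predict_proba(text) helper returning class probabilities, and whitespace re-joining of tokens); it is not ferret's internal implementation:
import numpy as np

def aopc_comprehensiveness(tokens, scores, target, predict_proba, ks=(1, 3, 5)):
    """Conceptual AOPC Comprehensiveness (DeYoung et al., 2020): average drop in
    the target-class probability after removing the top-k attributed tokens."""
    full_prob = predict_proba(" ".join(tokens))[target]
    ranking = np.argsort(scores)[::-1]      # indices of the most important tokens first
    drops = []
    for k in ks:
        removed = set(ranking[:k].tolist())
        kept = [t for i, t in enumerate(tokens) if i not in removed]
        drops.append(full_prob - predict_proba(" ".join(kept))[target])
    return float(np.mean(drops))            # higher drop = more faithful explanation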
Visualization
The Benchmark class exposes easy-to-use table visualization methods (e.g., within Jupyter Notebooks).
bench = Benchmark(model, tokenizer)
# Pretty-print feature attribution scores by all supported explainers
explanations = bench.explain("You look stunning!")
bench.show_table(explanations)
# Pretty-print all the supported evaluation metrics
evaluations = bench.evaluate_explanations(explanations)
bench.show_evaluation_table(evaluations)
Dataset Evaluations
The Benchmark class has a handy method to compute and average our evaluation metrics across multiple samples from a dataset.
import numpy as np
bench = Benchmark(model, tokenizer)
# Compute and average evaluation scores on one of the supported datasets
samples = np.arange(20)
hatexdata = bench.load_dataset("hatexplain")
sample_evaluations = bench.evaluate_samples(hatexdata, samples)
# Pretty-print the results
bench.show_samples_evaluation_table(sample_evaluations)
Planned Development
See the changelog file for further details.
- ✅ GPU acceleration support for inference (v0.4.0)
- ✅ Batched inference for the approximation steps of internal methods (e.g., LIME or SHAP) (v0.4.0)
- ⚙️ Simplified Task API to support NLI, Zero-Shot Text Classification, and Language Modeling (branch)
- ⚙️ Multi-sample explanation generation and evaluation
- ⚙️ Support for explainers of seq2seq and autoregressive generation through inseq
- ⚙️ New evaluation measures: Sensitivity and Stability (Yin et al.)
- ⚙️ New evaluation measure: Area Under the Threshold-Performance Curve (AUC-TP) (Atanasova et al.)
- ⚙️ New explainer: Sampling and Occlusion (SOC) (Jin et al., 2020)
- ⚙️ New explainer: Discretized Integrated Gradient (DIG) (Sanyal and Ren, 2021)
- ⚙️ New explainer: Value Zeroing (Mohebbi et al., 2023)
- ⚙️ Support for additional forms of aggregation over the embeddings' hidden dimension
Authors
- Giuseppe Attanasio
- Eliana Pastor
- Debora Nozza
- Chiara Di Bonaventura
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
Logo and graphical assets made by Luca Attanasio.
If you are using ferret for your work, please consider citing us!
@inproceedings{attanasio-etal-2023-ferret,
title = "ferret: a Framework for Benchmarking Explainers on Transformers",
author = "Attanasio, Giuseppe and Pastor, Eliana and Di Bonaventura, Chiara and Nozza, Debora",
booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
month = may,
year = "2023",
publisher = "Association for Computational Linguistics",
}