Interpretable NLP
This repo collects recent publications on interpretability research in NLP from top NLP and AI venues, including ACL, EMNLP, NAACL, ICLR, ICML, NIPS, and others.
Contributions are welcome!
To add a paper, please follow the template Paper Title (Venue Year) and open a pull request.
Table of Contents
General Study
- Learning to Deceive with Attention-Based Explanations (ACL 2020)
- Towards Transparent and Explainable Attention Models (ACL 2020)
- Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior? (ACL 2020)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions (ACL 2020)
- What Is One Grain of Sand in the Desert? Analyzing Individual Neurons in Deep NLP Models (AAAI 2019)
Pretraining
- ExpBERT: Representation Engineering with Natural Language Explanations (ACL 2020)
- How does BERT’s attention change when you fine-tune? An analysis methodology and a case study in negation scope (ACL 2020)
- Understanding Advertisements with BERT (ACL 2020)
- Interpreting Pretrained Contextualized Representations via Reductions to Static Embeddings (ACL 2020)
- Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT (ACL 2020)
- Quantifying Attention Flow in Transformers (ACL 2020)
- Visualizing and Understanding the Effectiveness of BERT (arXiv 2019)
Sequence to Sequence
- Evaluating Explanation Methods for Neural Machine Translation (ACL 2020)
- Understanding Points of Correspondence between Sentences for Abstractive Summarization (ACL 2020) [github]
- Identifying and Controlling Important Neurons in Neural Machine Translation (ICLR 2019)
- Towards Understanding Neural Machine Translation with Word Importance (EMNLP 2019)
- SEQ2SEQ-VIS: A Visual Debugging Tool for Sequence-to-Sequence Models (IEEE VIS 2018)
- Did the Model Understand the Question? (ACL 2018)
- Pathologies of Neural Models Make Interpretations Difficult (EMNLP 2018)
- Latent Alignment and Variational Attention (NIPS 2018)
- LSTMVis: A Tool for Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks (IEEE TVCG 2018)
- Visualizing and Understanding Neural Machine Translation (ACL 2017)
- A Causal Framework for Explaining the Predictions of Black-box Sequence-to-Sequence Models (EMNLP 2017)
- Axiomatic Attribution for Deep Networks (ICML 2017)
- Visualizing and Understanding Neural Models in NLP (NAACL 2016)
Classification
- Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection (ACL 2020)
- Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words? (ACL 2020)
- Understanding Attention for Text Classification (ACL 2020)
- Attention is not not Explanation (EMNLP 2019)
- EDUCE: Explaining model Decisions through Unsupervised Concepts Extraction (arXiv 2019)
- Is Attention Interpretable? (NAACL 2019)
- Attention is not Explanation (NAACL 2019)
- Towards Explainable NLP: A Generative Explanation Framework for Text Classification (ACL 2019)
- Interpretable Neural Predictions with Differentiable Binary Variables (ACL 2019)
- How Important Is a Neuron? (ICLR 2019)
- Understanding Convolutional Neural Networks for Text Classification (EMNLP 2018 Workshop)
- Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs (ICLR 2018)
- Comparing Automatic and Human Evaluation of Local Explanations for Text Classification (NAACL 2018)
- Automatic Rule Extraction From LSTM Networks (ICLR 2017)
- Understanding Neural Networks through Representation Erasure (arXiv 2016)
- Explaining Predictions of Non-Linear Classifiers in NLP (ACL 2016 Workshop)
- Rationalizing Neural Predictions (EMNLP 2016)
Sequence Labeling
- Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules? (EMNLP 2018)
Others
- Generating Fact Checking Explanations (ACL 2020)
- Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations (ACL 2020)
- NILE: Natural Language Inference with Faithful Natural Language Explanations (ACL 2020)
- Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness? (ACL 2020)
- Interpreting Twitter User Geolocation (ACL 2020)
- Obtaining Faithful Interpretations from Compositional Neural Networks (ACL 2020)
- Towards Understanding Gender Bias in Relation Extraction (ACL 2020)
- Understanding the Language of Political Agreement and Disagreement in Legislative Texts (ACL 2020)
- Learning to Understand Child-directed and Adult-directed Speech (ACL 2020)
- Learning Corresponded Rationales for Text Matching (OpenReview 2019)
- Interpretable Neural Architectures for Attributing an Ad’s Performance to its Writing Style (EMNLP 2018 Workshop)
- Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference (EMNLP 2018)
- SPINE: SParse Interpretable Neural Embeddings (AAAI 2018)