awesome papers in LLM interpretability

awesome papers for understanding LLM mechanisms

Focus: understanding the internal mechanisms of large language models (LLMs).

(kept updated as I read good papers ...)

survey

A Comprehensive Overview of Large Language Models. [pdf] [2023.12] [LLM]

A Survey of Large Language Models. [pdf] [2023.11] [LLM]

Explainability for Large Language Models: A Survey. [pdf] [2023.11] [interpretability]

A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future. [pdf] [2023.10] [chain of thought]

Instruction Tuning for Large Language Models: A Survey. [pdf] [2023.10] [instruction tuning]

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models. [pdf] [2023.9] [hallucination]

Reasoning with Language Model Prompting: A Survey. [pdf] [2023.9] [reasoning]

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. [pdf] [2023.8] [interpretability]

A Survey on In-context Learning. [pdf] [2023.6] [in-context learning]

Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning. [pdf] [2023.3] [parameter-efficient fine-tuning]

papers

Successor Heads: Recurring, Interpretable Attention Heads In The Wild. [pdf] [ICLR 2024 poster] [2023.12]

Impact of Co-occurrence on Factual Knowledge of Large Language Models. [pdf] [EMNLP 2023 findings] [2023.10]

Can Large Language Models Explain Themselves? [pdf] [2023.10]

Neurons in Large Language Models: Dead, N-gram, Positional. [pdf] [2023.9]

Do Machine Learning Models Memorize or Generalize? [blog] [2023.8]

Overthinking the Truth: Understanding how Language Models Process False Demonstrations. [pdf] [2023.7]

Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning. [pdf] [EMNLP 2023 best paper] [2023.5]

Let's Verify Step by Step. [pdf] [ICLR 2024 poster] [2023.5]

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning. [pdf] [ACL 2023 findings] [2023.5]

Language models can explain neurons in language models. [blog] [2023.5]

Dissecting Recall of Factual Associations in Auto-Regressive Language Models. [pdf] [EMNLP 2023 main] [2023.4]

Are Emergent Abilities of Large Language Models a Mirage? [pdf] [NeurIPS 2023 best paper] [2023.4]

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression. [pdf] [2023.4]

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model. [pdf] [NeurIPS 2023 poster] [2023.4]

A Theory of Emergent In-Context Learning as Implicit Structure Induction. [pdf] [2023.3]

Larger language models do in-context learning differently. [pdf] [2023.3]

Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models. [pdf] [NeurIPS 2023 spotlight] [2023.1]

Transformers as Algorithms: Generalization and Stability in In-context Learning. [pdf] [ICML 2023 poster] [2023.1]

Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers. [pdf] [ACL 2023 findings] [2022.12]

How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources. [blog] [2022.12]

Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters. [pdf] [ACL 2023 long] [2022.12]

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. [pdf] [ICLR 2023 poster] [2022.11]

Inverse scaling can become U-shaped. [pdf] [EMNLP 2023 main] [2022.11]

What learning algorithm is in-context learning? Investigations with linear models. [pdf] [ICLR 2023 notable] [2022.11]

Mass-Editing Memory in a Transformer. [pdf] [ICLR 2023 notable] [2022.10]

Polysemanticity and Capacity in Neural Networks. [pdf] [2022.10]

Analyzing Transformers in Embedding Space. [pdf] [ACL 2023 long] [2022.9]

Toy Models of Superposition. [blog] [2022.9]

Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango. [pdf] [2022.9]

Emergent Abilities of Large Language Models. [pdf] [2022.6]

Mechanistic Interpretability, Variables, and the Importance of Interpretable Bases. [blog] [2022.6]

Towards Tracing Factual Knowledge in Language Models Back to the Training Data. [pdf] [EMNLP 2022 findings] [2022.5]

Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations. [pdf] [EMNLP 2022 main] [2022.5]

Large Language Models are Zero-Shot Reasoners. [pdf] [NeurIPS 2022] [2022.5]

Scaling Laws and Interpretability of Learning from Repeated Data. [pdf] [2022.5]

Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space. [pdf] [EMNLP 2022 main] [2022.3]

In-context Learning and Induction Heads. [blog] [2022.3]

Locating and Editing Factual Associations in GPT. [pdf] [NeurIPS 2022] [2022.2]

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? [pdf] [EMNLP 2022 main] [2022.2]

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. [pdf] [2022.1]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. [pdf] [2022.1]

A Mathematical Framework for Transformer Circuits. [blog] [2021.12]

An Explanation of In-context Learning as Implicit Bayesian Inference. [pdf] [ICLR 2022 poster] [2021.11]

Towards a Unified View of Parameter-Efficient Transfer Learning. [pdf] [ICLR 2022 spotlight] [2021.10]

Do Prompt-Based Models Really Understand the Meaning of their Prompts? [pdf] [NAACL 2022] [2021.9]

Deduplicating Training Data Makes Language Models Better. [pdf] [ACL 2022 long] [2021.7]

LoRA: Low-Rank Adaptation of Large Language Models. [pdf] [ICLR 2022 poster] [2021.6]

Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity. [pdf] [ACL 2022 long] [2021.4]

The Power of Scale for Parameter-Efficient Prompt Tuning. [pdf] [EMNLP 2021 main] [2021.4]

Calibrate Before Use: Improving Few-Shot Performance of Language Models. [pdf] [ICML 2021] [2021.2]

Prefix-Tuning: Optimizing Continuous Prompts for Generation. [pdf] [ACL 2021 long] [2021.1]

Transformer Feed-Forward Layers Are Key-Value Memories. [pdf] [EMNLP 2021 main] [2020.12]

Scaling Laws for Neural Language Models. [pdf] [2020.1]