
llm-hallucination-survey

Reading list on hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models".

Hallucination refers to generated content that is nonsensical or unfaithful to the provided source content, or even to world knowledge.

This issue can hinder the real-world adoption of LLMs in various applications and scenarios.
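As a rough illustration of the faithfulness side of this definition, the sketch below (hypothetical, not taken from any paper in this list) flags generated sentences that mention entities or numbers absent from the provided source text; the detectors surveyed below rely on much stronger signals, such as NLI models, retrieval, or sampling consistency.

    import re

    def naive_faithfulness_check(source: str, generation: str) -> list[tuple[str, list[str]]]:
        """Flag generated sentences whose entities or numbers never appear in the source.

        A crude lexical proxy for unfaithfulness, for illustration only.
        """
        source_lower = source.lower()
        flagged = []
        # Split the generation into rough sentences on terminal punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", generation.strip()):
            # Collect capitalized words and numbers as candidate "facts".
            tokens = re.findall(r"\b[A-Z][a-zA-Z]+\b|\b\d[\d,.]*\b", sentence)
            # Ignore the sentence-initial word, which is capitalized regardless of being an entity.
            if tokens and sentence.startswith(tokens[0]):
                tokens = tokens[1:]
            unsupported = [t for t in tokens if t.lower() not in source_lower]
            if unsupported:
                flagged.append((sentence, unsupported))
        return flagged

    if __name__ == "__main__":
        source = "The report was written in 2021 by the Geneva office."
        generation = "The report was written in 2019 by the Paris office. It covers Geneva."
        for sentence, terms in naive_faithfulness_check(source, generation):
            print(f"Possibly unfaithful: {sentence!r} -> unsupported terms {terms}")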

Evaluation of Hallucination for LLMs

  1. TruthfulQA: Measuring How Models Mimic Human Falsehoods

    Stephanie Lin, Jacob Hilton, Owain Evans [paper] 2022.5

  2. A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation

    Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, Bill Dolan [paper] 2022.5

  3. Towards Tracing Factual Knowledge in Language Models Back to the Training Data

    Ekin Akyürek, Tolga Bolukbasi, Frederick Liu, Binbin Xiong, Ian Tenney, Jacob Andreas, Kelvin Guu [paper] 2022.5

  4. A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

    Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung [paper] 2023.2

  5. Why Does ChatGPT Fall Short in Providing Truthful Answers?

    Shen Zheng, Jie Huang, Kevin Chen-Chuan Chang [paper] 2023.4

  6. HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

    Junyi Li, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen [paper] 2023.5

  7. Automatic Evaluation of Attribution by Large Language Models

    Xiang Yue, Boshi Wang, Kai Zhang, Ziru Chen, Yu Su, Huan Sun [paper] 2023.5

  8. Adaptive Chameleon or Stubborn Sloth: Unraveling the Behavior of Large Language Models in Knowledge Clashes

    Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, Yu Su [paper] 2023.5

  9. LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

    Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu [paper] 2023.5

  10. Evaluating the Factual Consistency of Large Language Models Through News Summarization

    Derek Tam, Anisha Mascarenhas, Shiyue Zhang, Sarah Kwan, Mohit Bansal, Colin Raffel [paper] 2023.5

  11. Methods for Measuring, Updating, and Visualizing Factual Beliefs in Language Models

    Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer [paper] 2023.5

  12. How Language Model Hallucinations Can Snowball

    Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith [paper] 2023.5

  13. Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

    Niels Mündler, Jingxuan He, Slobodan Jenko, Martin Vechev [paper] 2023.5

  14. Evaluating Factual Consistency of Texts with Semantic Role Labeling

    Jing Fan, Dennis Aumiller, Michael Gertz [paper] 2023.5

  15. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

    Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi [paper] 2023.5

  16. Sources of Hallucination by Large Language Models on Inference Tasks

    Nick McKenna, Tianyi Li, Liang Cheng, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman [paper] 2023.5

  17. KoLA: Carefully Benchmarking World Knowledge of Large Language Models

    Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, Yunjia Qi, Weikai Li, Yong Guan, Kaisheng Zeng, Ji Qi, Hailong Jin, Jinxin Liu, Yu Gu, Yuan Yao, Ning Ding, Lei Hou, Zhiyuan Liu, Bin Xu, Jie Tang, Juanzi Li [paper] 2023.6

  18. Generating Benchmarks for Factuality Evaluation of Language Models

    Dor Muhlgay, Ori Ram, Inbal Magar, Yoav Levine, Nir Ratner, Yonatan Belinkov, Omri Abend, Kevin Leyton-Brown, Amnon Shashua, Yoav Shoham [paper] 2023.7

  19. Overthinking the Truth: Understanding how Language Models Process False Demonstrations

    Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt [paper] 2023.7

  20. Fact-Checking of AI-Generated Reports

    Razi Mahmood, Ge Wang, Mannudeep Kalra, Pingkun Yan [paper] 2023.7

  21. Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering

    Vaibhav Adlakha, Parishad BehnamGhader, Xing Han Lu, Nicholas Meade, Siva Reddy [paper] 2023.7

  22. Med-HALT: Medical Domain Hallucination Test for Large Language Models

    Logesh Kumar Umapathi, Ankit Pal, Malaikannan Sankarasubbu [paper] 2023.7

  23. Head-to-Tail: How Knowledgeable are Large Language Models (LLM)? A.K.A. Will LLMs Replace Knowledge Graphs?

    Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, Xin Luna Dong [paper] 2023.8

  24. Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts

    Fan Gao, Hang Jiang, Moritz Blum, Jinghui Lu, Yuang Jiang, Irene Li [paper] 2023.8

Mitigation of Hallucination for LLMs

  1. Factuality Enhanced Language Models for Open-Ended Text Generation

    Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale Fung, Mohammad Shoeybi, Bryan Catanzaro [paper] 2022.6

  2. Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

    Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao [paper] 2023.2

  3. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

    Potsawee Manakul, Adian Liusie, Mark J. F. Gales [paper] 2023.3

  4. Zero-shot Faithful Factual Error Correction

    Kung-Hsiang Huang, Hou Pong Chan, Heng Ji [paper] 2023.5

  5. CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

    Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen [paper] 2023.5

  6. PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions

    Anthony Chen, Panupong Pasupat, Sameer Singh, Hongrae Lee, Kelvin Guu [paper] 2023.5

  7. Mitigating Language Model Hallucination with Interactive Question-Knowledge Alignment

    Shuo Zhang, Liangming Pan, Junzhou Zhao, William Yang Wang [paper] 2023.5

  8. Improving Factuality and Reasoning in Language Models through Multiagent Debate

    Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch [paper] 2023.5

  9. Enabling Large Language Models to Generate Text with Citations

    Tianyu Gao, Howard Yen, Jiatong Yu, Danqi Chen [paper] 2023.5

  10. Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework

    Ruochen Zhao, Xingxuan Li, Shafiq Joty, Chengwei Qin, Lidong Bing [paper] 2023.5

  11. Trusting Your Evidence: Hallucinate Less with Context-aware Decoding

    Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih [paper] 2023.5

  12. Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models

    Miaoran Li, Baolin Peng, Zhu Zhang [paper] 2023.5

  13. Augmented Large Language Models with Parametric Knowledge Guiding

    Ziyang Luo, Can Xu, Pu Zhao, Xiubo Geng, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang [paper] 2023.5

  14. LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

    Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu [paper] 2023.5

  15. LM vs LM: Detecting Factual Errors via Cross Examination

    Roi Cohen, May Hamri, Mor Geva, Amir Globerson [paper] 2023.5

  16. Measuring and Modifying Factual Knowledge in Large Language Models

    Pouya Pezeshkpour [paper] 2023.6

  17. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

    Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg [paper] 2023.6

  18. LLM Calibration and Automatic Hallucination Detection via Pareto Optimal Self-supervision

    Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon [paper] 2023.6

  19. A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation

    Neeraj Varshney, Wenlin Yao, Hongming Zhang, Jianshu Chen, Dong Yu [paper] 2023.7

  20. FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios

    I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu [paper] 2023.7

  21. Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models

    Mohamed Elaraby, Mengyin Lu, Jacob Dunn, Xueying Zhang, Yu Wang, Shizhu Liu [paper] 2023.8

  22. PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

    Chenrui Zhang, Lin Liu, Jinpeng Wang, Chuyuan Wang, Xiao Sun, Hongyu Wang, Mingchen Cai [paper] 2023.8