Dense Retrieval Papers
A collection of papers related to dense retrieval.
The papers are organized following the taxonomy of our survey "Dense Text Retrieval based on Pretrained Language Models: A Survey".
If you find our survey useful for your research, please cite the following paper:
@article{DRSurvey,
title={Dense Text Retrieval based on Pretrained Language Models: A Survey},
author={Wayne Xin Zhao and Jing Liu and Ruiyang Ren and Ji-Rong Wen},
year={2022},
journal={arXiv preprint arXiv:2211.14876}
}
Table of Contents
- Survey Paper
- Architecture
- Training
- Indexing
- Interaction with Re-ranking
- Advanced Topics
- Applications
- Datasets
- Libraries
Survey Paper
Paper | Author | Venue | Code |
---|---|---|---|
Pretrained Transformers for Text Ranking: BERT and Beyond. | Jimmy Lin et al. | Synthesis HLT 2021 | NA |
Semantic Models for the First-stage Retrieval: A Comprehensive Review. | Yinqiong Cai et al. | arXiv 2021 | NA |
Pre-training Methods in Information Retrieval. | Yixing Fan et al. | arXiv 2021 | NA |
A Deep Look into Neural Ranking Models for Information Retrieval. | Jiafeng Guo et al. | Inf. Process. Manag. 2020 | NA |
Lecture Notes on Neural Information Retrieval. | Nicola Tonellotto | arXiv 2022 | NA |
Low-Resource Dense Retrieval for Open-Domain Question Answering: A Comprehensive Survey. | Xiaoyu Shen et al. | arXiv 2022 | NA |
Architecture
Training
Formulation
Paper | Author | Venue | Code |
---|---|---|---|
More Robust Dense Retrieval with Contrastive Dual Learning. | Yizhi Li et al. | ICTIR 2021 | Python |
PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval. | Ruiyang Ren et al. | ACL 2021 | Python |
xMoCo: Cross Momentum Contrastive Learning for Open-Domain Question Answering. | Nan Yang et al. | ACL 2021 | NA |
A Modern Perspective on Query Likelihood with Deep Generative Retrieval Models. | Oleg Lesota et al. | ICTIR 2021 | Python |
Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval. | Zehan Li et al. | arXiv 2022 | Python |
Shallow Pooling for Sparse Labels. | Negar Arabzadeh et al. | arXiv 2021 | NA |
Hard Negatives or False Negatives: Correcting Pooling Bias in Training Neural Ranking Models. | Yinqiong Cai et al. | arXiv 2022 | NA |
Debiased Contrastive Learning of Unsupervised Sentence Representations. | Kun Zhou et al. | ACL 2022 | NA |
Negative Selection
Data Augmentation
Pre-training
Indexing
Interaction with Re-ranking
Advanced Topics
Zero-shot Dense Retrieval
Improving the Robustness to Query Variations
Paper | Author | Venue | Code |
---|---|---|---|
Towards Robust Dense Retrieval via Local Ranking Alignment. | Xuanang Chen et al. | IJCAI 2022 | Python |
CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos. | Shengyao Zhuang et al. | SIGIR 2022 | Python |
Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators. | Gustavo Penha et al. | ECIR 2022 | Python |
Retrieval Consistency in the Presence of Query Variations. | Peter Bailey et al. | SIGIR 2017 | NA |
Analysing the Robustness of Dual Encoders for Dense Retrieval Against Misspellings. | Georgios Sidiropoulos et al. | arXiv 2022 | Shell |
A Survey of Automatic Query Expansion in Information Retrieval. | Claudio Carpineto et al. | CSUR 2012 | NA |
BERT Rankers are Brittle: a Study using Adversarial Document Perturbations. | Yumeng Wang et al. | SIGIR 2022 | Python |
Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models. | Jiawei Liu et al. | CoRR 2022 | NA |
Generative Text Retrieval
Retrieval-Augmented Language Model
Paper | Author | Venue | Code |
---|---|---|---|
Generalization through Memorization: Nearest Neighbor Language Models. | Urvashi Khandelwal et al. | ICLR 2020 | Python |
Adaptive Semiparametric Language Models. | Dani Yogatama et al. | TACL 2021 | NA |
Improving Language Models by Retrieving from Trillions of Tokens. | Sebastian Borgeaud et al. | arXiv 2021 | NA |
REALM: Retrieval-Augmented Language Model Pre-Training. | Kelvin Guu et al. | ICML 2020 | Python |
Simple and Efficient Ways to Improve REALM. | Vidhisha Balachandran et al. | arXiv 2021 | NA |
Efficient Nearest Neighbor Language Models. | Junxian He et al. | EMNLP 2021 | Python |
Applications
Information Retrieval Applications
Paper | Author | Venue | Code |
---|---|---|---|
Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models. | Bogdan Kostic et al. | arXiv 2021 | NA |
Open Domain Question Answering over Tables via Dense Retrieval. | Jonathan Herzig et al. | NAACL 2021 | Python |
SituatedQA: Incorporating Extra-Linguistic Contexts into QA. | Michael J.Q. Zhang et al. | EMNLP 2021 | DATA |
XOR QA: Cross-lingual Open-Retrieval Question Answering. | Akari Asai et al. | NAACL 2021 | Python |
One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval. | Akari Asai et al. | NeurIPS 2021 | Python |
Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval. | Wei Zhong et al. | arXiv 2022 | Python |
ReACC: A Retrieval-Augmented Code Completion Framework. | Shuai Lu et al. | ACL 2022 | Python |
Improving Biomedical Information Retrieval with Neural Retrievers. | Man Luo et al. | AAAI 2022 | NA |