Neural-Cherche

Neural Search

Neural-Cherche is a library designed to fine-tune neural search models such as Splade, ColBERT, and SparseEmbed on a specific dataset. Neural-Cherche also provides classes to run efficient inference with a fine-tuned retriever or ranker. It aims to offer a straightforward and effective way to fine-tune and use neural search models in both offline and online settings, and it lets users save all computed embeddings to avoid redundant computation.

Installation

We can install neural-cherche with pip:

pip install neural-cherche

If we plan to evaluate our model during training, install the evaluation extras:

pip install "neural-cherche[eval]"

Documentation

The complete documentation is available online.

Quick Start

Your training dataset must consist of triples (anchor, positive, negative), where the anchor is a query, the positive is a document relevant to the anchor, and the negative is a document that is not relevant to the anchor.

X = [
    ("anchor 1", "positive 1", "negative 1"),
    ("anchor 2", "positive 2", "negative 2"),
    ("anchor 3", "positive 3", "negative 3"),
]
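
If the raw data is a set of (query, relevant document) pairs rather than triples, one common approach is to sample negatives from the remaining documents. Below is a minimal, illustrative sketch in plain Python (the pairs and the random sampling are assumptions for illustration, not part of the neural-cherche API):

import random

# Illustrative (query, relevant document) pairs.
pairs = [
    ("what is neural search", "Neural search ranks documents with neural networks."),
    ("how to fine-tune colbert", "ColBERT can be fine-tuned on domain-specific triples."),
    ("what is a sparse model", "Sparse models such as Splade encode text into sparse vectors."),
]

documents = [document for _, document in pairs]

random.seed(42)

X = []
for anchor, positive in pairs:
    # Sample any other document as a (weak) random negative.
    negative = random.choice(
        [document for document in documents if document != positive]
    )
    X.append((anchor, positive, negative))

In practice, harder negatives (for example, documents retrieved by BM25 that are not labeled relevant to the query) tend to produce better models than random negatives.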

Here is how to fine-tune ColBERT from a pre-trained Sentence Transformers checkpoint using neural-cherche:

import torch

from neural_cherche import models, utils, train

# Load ColBERT from a pre-trained Sentence Transformers checkpoint.
model = models.ColBERT(
    model_name_or_path="sentence-transformers/all-mpnet-base-v2",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# Training triples: (anchor, positive, negative).
X = [
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
    ("query", "positive document", "negative document"),
]

# utils.iter shuffles the triples and yields them in batches for the
# requested number of epochs.
for anchor, positive, negative in utils.iter(
    X,
    epochs=1,
    batch_size=32,
    shuffle=True,
):
    # One optimization step on the batch; returns the training loss.
    loss = train.train_colbert(
        model=model,
        optimizer=optimizer,
        anchor=anchor,
        positive=positive,
        negative=negative,
    )

# Save the fine-tuned weights to the "checkpoint" directory.
model.save_pretrained("checkpoint")
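
Once training is complete, the fine-tuned weights can be reloaded from the saved directory in the same way as the pre-trained checkpoint (a minimal sketch, assuming model_name_or_path accepts the local directory written by save_pretrained, following the usual Hugging Face convention):

import torch

from neural_cherche import models

# Reload the fine-tuned ColBERT weights written by save_pretrained.
model = models.ColBERT(
    model_name_or_path="checkpoint",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

The reloaded model can then be passed to the retriever and ranker classes described in the documentation to run inference and to save computed embeddings for reuse.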
