Build and explore efficient retrieval-augmented generative models and applications
Key Features • Components • Installation • Getting Started • Examples
fastRAG is a research framework designed to facilitate the building of retrieval augmented generative pipelines. Its main goal is to make retrieval augmented generation as efficient as possible through the use of state-of-the-art and efficient retrieval and generative models. The framework includes a variety of sparse and dense retrieval models, as well as different extractive and generative information processing models. fastRAG aims to provide researchers and developers with a comprehensive tool-set for exploring and advancing the field of retrieval augmented generation.
Updates
- June 2023: ColBERT index modification: adding/removing documents; see IndexUpdater.
- May 2023: RAG with LLM and dynamic prompt synthesis example.
- April 2023: Qdrant `DocumentStore` support.
🚩 Key Features
- Retrieval Augmented X: A framework for developing efficient and fast retrieval augmented generative applications using the latest transformer-based NLP models (though not exclusively).
- Optimized Models: Includes optimized models of supported pipelines with greater compute efficiency.
- Intel Optimizations (TBA): Leverage the latest optimizations developed by Intel for running pipelines with maximum hardware utilization, reduced latency, and increased throughput, using frameworks such as Intel Extension for PyTorch (IPEX) and Intel Extension for Transformers.
- Customizable: Built using Haystack and HuggingFace. All of fastRAG's components are 100% Haystack compatible.
| Section | Description |
|---|---|
| Components | fastRAG components |
| Models | Models overview |
| Configs | Example and predefined configurations |
| Example notebooks | Example Jupyter notebooks |
| Demos | Example UIs for demos |
| Benchmarks | Misc. benchmarks of fastRAG components |
| Scripts | Scripts for creating indexes and fine-tuning models |
📍 Components
For a brief overview of the various models, please refer to the Models Overview section.
Unique components in fastRAG:
- PLAID: An incredibly efficient engine designed for retrieving information through late interaction.
- ColBERT: A retriever (used in conjunction with PLAID) and re-ranker (used with dense embeddings) that computes relevancy scores via late interaction.
- Fusion-in-Decoder (FiD): A generative reader tailored for multi-document retrieval augmentation tasks.
- Stable Diffusion Generator: A text-to-image generator that can be seamlessly integrated into any pipeline output.
- Retrieval-Oriented Knowledge Graph Construction: A pipeline component responsible for extracting named entities and creating a graph encompassing all entities specified in the retrieved documents, including the relationships between related pairs of entities.
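As a quick illustration, a PLAID store and ColBERT retriever can be wired into a standard Haystack pipeline roughly as follows. The import paths and constructor arguments shown here are assumptions for illustration only; check the components documentation above for the exact API:

```python
# Illustrative sketch only: the fastRAG import paths and constructor
# arguments below are assumptions, not the documented API.
from haystack import Pipeline

from fastrag.retrievers import ColBERTRetriever  # assumed import path
from fastrag.stores import PLAIDDocumentStore    # assumed import path

# Point the store at a prebuilt PLAID index and its ColBERT checkpoint.
store = PLAIDDocumentStore(
    index_path="path/to/plaid_index",
    checkpoint_path="path/to/colbert_checkpoint",
    collection_path="path/to/collection.tsv",
)
retriever = ColBERTRetriever(document_store=store, top_k=10)

pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
print(pipeline.run(query="What is late interaction?"))
```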
🛠 Installation
Preliminary requirements:
- Python version 3.8 or higher
- PyTorch library
To set up the software, run the following in a fresh virtual environment:

```bash
pip install .
```
There are several dependencies to consider, depending on your specific usage:
```bash
# Additional engines/components
pip install .[elastic]           # Support for Elasticsearch store
pip install .[qdrant]            # Support for Qdrant store
pip install libs/colbert         # Indexing engine for ColBERT/PLAID
pip install .[faiss-cpu]         # CPU-based Faiss library
pip install .[faiss-gpu]         # GPU-based Faiss library
pip install .[image-generation]  # Stable diffusion library for image generation
pip install .[knowledge_graph]   # Libraries for working with spaCy and knowledge graphs

# User interface (for demos)
pip install .[ui]

# Benchmarking
pip install .[benchmark]

# Development tools
pip install .[dev]
```
🚀 Getting Started
fastRAG leverages Haystack's pipelining abstraction. We recommend constructing a flow by incorporating components provided by fastRAG and Haystack, tailored to the specific task you aim to tackle. There are various approaches to achieving this using fastRAG.
Defining Pipelines in Your Code
To define a pipeline in your Python code, you can initialize all the components with the desired configuration directly in your code. This allows you to have full control over the pipeline structure and parameters. For concrete examples and detailed implementation guidance, please refer to the example notebooks provided by our team.
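As a minimal sketch, here is an in-code pipeline using Haystack v1's `Pipeline` API with a plain BM25 retriever; fastRAG components can be added as extra nodes in the same way:

```python
# Minimal sketch of an in-code pipeline, assuming Haystack v1's API.
from haystack import Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever

# Create a toy store with a single document.
store = InMemoryDocumentStore(use_bm25=True)
store.write_documents([{"content": "fastRAG is a research framework for efficient RAG."}])

# Wire a retriever into a query pipeline.
retriever = BM25Retriever(document_store=store, top_k=3)
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])

result = pipeline.run(query="What is fastRAG?")
print(result["documents"][0].content)
```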
Defining Pipelines Using YAML
Another approach to defining pipelines is by writing a YAML file following Haystack's format. This method allows for a more declarative and modular pipeline configuration. You can find detailed information on how to define pipelines using a YAML file in the Haystack documentation. The documentation provides guidance on the structure of the YAML file, available components, their parameters, and how to combine them to create a custom pipeline.
We have provided miscellaneous pipeline configurations in the config directory.
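For instance, a minimal query pipeline in Haystack's YAML format might look like the following. The `version` string and component parameters are illustrative; match them to your installed Haystack release and the components you actually use:

```yaml
version: 1.17.0
components:
  - name: DocumentStore
    type: InMemoryDocumentStore
    params:
      use_bm25: true
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 5
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
```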
Serving a Pipeline via REST API
To serve a fastRAG pipeline through a REST API, you can follow these steps:
- Execute the following command in your terminal:

  ```bash
  python -m fastrag.rest_api.application --config=pipeline.yaml
  ```

  If needed, you can explore additional options using the `-h` flag.

- The REST API service includes support for Swagger. You can access a user-friendly UI to observe and interact with the API endpoints by visiting `http://localhost:8000/docs` in your web browser.
The available endpoints for the REST API service are as follows:
- `status`: This endpoint can be used to perform a sanity check.
- `version`: This endpoint provides the project version, as defined in `__init__.py`.
- `query`: Use this endpoint to run a query through the pipeline and retrieve the results.
By leveraging the REST API service, you can integrate fastRAG pipelines into your applications and easily interact with them using HTTP requests.
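For example, a query can be issued with a plain HTTP request. The payload below assumes Haystack's standard REST query schema; the keys under `params` depend on the node names defined in your `pipeline.yaml`:

```bash
curl -X POST http://localhost:8000/query \
     -H "Content-Type: application/json" \
     -d '{"query": "Who wrote The Hobbit?", "params": {"Retriever": {"top_k": 5}}}'
```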
Generating Pipeline Configurations
The pipeline in fastRAG is constructed using the Haystack pipeline API and is dynamically generated based on the user's selection of components. To generate a Haystack pipeline that can be executed as a standalone REST server service (refer to REST API), you can utilize the Pipeline Generation script.
Below is an example that demonstrates how to use the script to generate a pipeline with a ColBERT retriever, an SBERT reranker, and an FiD reader:
```bash
python generate_pipeline.py --path "retriever,reranker,reader" \
    --store config/store/plaid-wiki.yaml \
    --retriever config/retriever/colbert-v2.yaml \
    --reranker config/reranker/sbert.yaml \
    --reader config/reader/FiD.yaml \
    --file pipeline.yaml
```
In the above command, you specify the desired components using the `--path` option, followed by the corresponding configuration YAML file for each component (e.g., `--store`, `--retriever`, `--reranker`, `--reader`). Finally, you specify the output file for the generated pipeline configuration using the `--file` option (in this example, it is set to `pipeline.yaml`).
Index Creation
For detailed instructions on creating various types of indexes, please refer to the Indexing Scripts directory.
Customizing Models
To cater to different use cases, we provide a variety of training scripts for fine-tuning models of your choice. For detailed examples and model descriptions, please refer to the Models Overview page.
🎯 Example Use Cases
Efficient Open Domain Question-Answering
Generate answers to questions using a corpus of knowledge.

- Retrieval: fast lexical retrieval with BM25, or late-interaction dense retrieval with PLAID
- Ranking: Sentence Transformers or ColBERT
- Generation: Fusion-in-Decoder
```mermaid
flowchart LR
    id1[(Elastic<br>/PLAID)] <--> id2(BM25<br>/ColBERT) --> id3(ST<br>/ColBERT) --> id4(FiD)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
```
Retrieval Augmented Generation with an LLM
To enhance generations using a Large Language Model (LLM) with retrieval augmentation, you can follow these steps:
1. Define a retrieval flow: create a store that holds the relevant information, plus one or more retrievers/rankers to retrieve the most relevant documents or passages.
2. Define a prompt template: design a template that includes a suitable context or instruction, along with placeholders for the query and the information retrieved by the pipeline. These placeholders are filled in dynamically during generation; see the sketch below.
3. Request token generation from the LLM: pass the filled-in template to the LLM, allowing it to generate tokens based on the provided context, query, and retrieved information.
Most Hugging Face decoder LLMs are supported.
See a complete example in our RAG with LLMs example notebook.
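As a framework-agnostic sketch of step 2, the template below fills its placeholders from the retrieval results at query time (the template text and helper function are illustrative, not a fastRAG API):

```python
# Illustrative prompt synthesis for RAG: fill a template with the query
# and the retrieved passages before sending it to the LLM.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {query}\n"
    "Answer:"
)

def build_prompt(query: str, documents: list) -> str:
    # Concatenate the retrieved passages into the context slot.
    context = "\n".join(f"- {doc}" for doc in documents)
    return PROMPT_TEMPLATE.format(context=context, query=query)

prompt = build_prompt(
    "Who wrote The Hobbit?",
    ["The Hobbit was written by J.R.R. Tolkien and published in 1937."],
)
print(prompt)  # pass this string to the LLM for token generation
```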
```mermaid
flowchart LR
    id1[(Index)] <--> id2(.. Retrieval pipeline ..) --> id3(Prompt Template) --> id4(LLM)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id3 fill:#F3CECC,stroke:#B25450
    style id4 fill:#D5E8D4,stroke:#82B366
```
ChatGPT Open Domain Reranking and QA
Use the ChatGPT API both to rerank the documents for any query and to provide an answer to the query using the chosen documents.
```mermaid
flowchart LR
    id1[(Index)] <--> id2(.. Retrieval pipeline ..) --> id4(ChatGPT)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
```
Open Domain Summarization
Summarize topics given free-text input and a corpus of knowledge.
- Retrieval with BM25 or other retrievers
- Ranking with Sentence Transformers or other rankers
- Generation with a FLAN-T5 model, using a "summarize: " prompt followed by the concatenated retrieved documents
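The generation step can be sketched with the Hugging Face transformers API; the model choice and decoding parameters here are illustrative:

```python
# Sketch of the generation step: prepend "summarize: " to the
# concatenated retrieved documents and let FLAN-T5 generate a summary.
from transformers import pipeline

summarizer = pipeline("text2text-generation", model="google/flan-t5-base")

retrieved_docs = [
    "First retrieved passage about the topic ...",
    "Second retrieved passage with more details ...",
]
prompt = "summarize: " + " ".join(retrieved_docs)
print(summarizer(prompt, max_new_tokens=100)[0]["generated_text"])
```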
```mermaid
flowchart LR
    id1[(Elastic)] <--> id2(BM25) --> id3(SentenceTransformer) -- summarize --> id4(FLAN-T5)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
```
Retrieval-Oriented Knowledge Graph Construction
Use with any retrieval pipeline to extract named entities (NER) and generate relation maps using a relation classification (RC) model.
```mermaid
flowchart LR
    id2(.. Retrieval pipeline ..) --> id4(NER) --> id5(RC)
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
    style id5 fill:#F3CECC,stroke:#B25450
```
Retrieval-Oriented Answer Image Generation
Use with any retrieval pipeline to generate a dynamic image from the answer to the query, using a diffusion model.
```mermaid
flowchart LR
    id2(.. Retrieval pipeline ..) --> id4(FiD) --> id5(Diffusion)
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
    style id5 fill:#F3CECC,stroke:#B25450
```
License
The code is licensed under the Apache 2.0 License.
Disclaimer
This is not an official Intel product.