  • Stars
    1,194
  • Rank 39,182 (Top 0.8 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 2 years ago
  • Updated 4 months ago


Repository Details

Efficient Retrieval Augmentation and Generation Framework

Build and explore efficient retrieval-augmented generative models and applications


Key Features • Components • Installation • Getting Started • Examples

fastRAG is a research framework designed to facilitate the building of retrieval augmented generative pipelines. Its main goal is to make retrieval augmented generation as efficient as possible through the use of state-of-the-art and efficient retrieval and generative models. The framework includes a variety of sparse and dense retrieval models, as well as different extractive and generative information processing models. fastRAG aims to provide researchers and developers with a comprehensive tool-set for exploring and advancing the field of retrieval augmented generation.

🎩 Key Features

  • Retrieval Augmented X: A framework for developing efficient and fast retrieval augmented generative applications using the latest transformer-based NLP models (but not only).
  • Optimized Models: Includes optimized models of supported pipelines with greater compute efficiency.
  • Intel Optimizations (TBA): Leverage the latest optimizations developed by Intel for running pipelines with maximum hardware utilization, reduced latency, and increased throughput, using frameworks such as Intel extensions for PyTorch (IPEX) and Intel extension for Transformers.
  • Customizable: Built using Haystack and HuggingFace. All of fastRAG's components are 100% Haystack compatible.

  • Components: fastRAG components
  • Models: Models overview
  • Configs: Example and predefined configurations
  • Example notebooks: Example Jupyter notebooks
  • Demos: Example UIs for demos
  • Benchmarks: Misc. benchmarks of fastRAG components
  • Scripts: Scripts for creating indexes and fine-tuning models

📚 Components

For a brief overview of the various models, please refer to the Models Overview section.

Unique components in fastRAG:

  • PLAID: An incredibly efficient engine designed for retrieving information through late interaction.
  • ColBERT: A Retriever (used in conjunction with PLAID) and re-ranker (employed with dense embeddings) that employs late interaction to determine relevancy scores.
  • Fusion-in-Decoder (FiD): A generative reader tailored for multi-document retrieval augmentation tasks.
  • Stable Diffusion Generator: A text-to-image generator that can be seamlessly integrated into any pipeline output.
  • Retrieval-Oriented Knowledge Graph Construction: A pipeline component responsible for extracting named entities and creating a graph encompassing all entities specified in the retrieved documents, including the relationships between related pairs of entities.
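Late interaction, as used by ColBERT, is commonly described as a "MaxSim" score: each query token embedding is matched against its most similar document token embedding, and the per-token maxima are summed. The sketch below illustrates that scoring rule in plain Python on toy two-dimensional vectors. It is not fastRAG's or ColBERT's implementation; the function names and the toy data are hypothetical, and real systems use learned, normalized embeddings and an index such as PLAID rather than exhaustive comparison.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def late_interaction_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """MaxSim: for each query token, take its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: List[Vector], docs: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Score every document against the query and sort by descending score."""
    scored = [(doc_id, late_interaction_score(query_vecs, vecs)) for doc_id, vecs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy token embeddings (hypothetical): two query tokens, two documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # has a good match for both query tokens
    "doc_b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
}
ranking = rank(query, docs)
print(ranking)
```

Because the score is computed token by token, documents can be encoded ahead of time and only the query needs encoding at search time, which is what makes an indexed engine practical for this scoring rule.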

📍 Installation

Preliminary requirements:

  • Python version 3.8 or higher
  • PyTorch library

To set up the software, perform the following steps in a fresh virtual environment:

pip install .

There are several dependencies to consider, depending on your specific usage:

# Additional engines/components
pip install .[elastic]             # Support for ElasticSearch store
pip install .[qdrant]              # Support for Qdrant store
pip install libs/colbert           # Indexing engine for ColBERT/PLAID
pip install .[faiss-cpu]           # CPU-based Faiss library
pip install .[faiss-gpu]           # GPU-based Faiss library
pip install .[image-generation]    # Stable diffusion library for image generation
pip install .[knowledge_graph]     # Libraries for working with spacy and KG

# User interface (for demos)
pip install .[ui]

# Benchmarking
pip install .[benchmark]

# Development tools
pip install .[dev]

🚀 Getting Started

fastRAG leverages Haystack's pipelining abstraction. We recommend constructing a flow by incorporating components provided by fastRAG and Haystack, tailored to the specific task you aim to tackle. There are various approaches to achieving this using fastRAG.

Defining Pipelines in Your Code

To define a pipeline in your Python code, you can initialize all the components with the desired configuration directly in your code. This allows you to have full control over the pipeline structure and parameters. For concrete examples and detailed implementation guidance, please refer to the example notebooks provided by our team.

Defining Pipelines Using YAML

Another approach to defining pipelines is by writing a YAML file following Haystack's format. This method allows for a more declarative and modular pipeline configuration. You can find detailed information on how to define pipelines using a YAML file in the Haystack documentation. The documentation provides guidance on the structure of the YAML file, available components, their parameters, and how to combine them to create a custom pipeline.

We have provided miscellaneous pipeline configurations in the config directory.

Serving a Pipeline via REST API

To serve a fastRAG pipeline through a REST API, you can follow these steps:

  1. Execute the following command in your terminal:
python -m fastrag.rest_api.application --config=pipeline.yaml
  2. If needed, you can explore additional options using the -h flag.

  3. The REST API service includes support for Swagger. You can access a user-friendly UI to observe and interact with the API endpoints by visiting http://localhost:8000/docs in your web browser.

The available endpoints for the REST API service are as follows:

  • status: This endpoint can be used to perform a sanity check.
  • version: This endpoint provides the project version, as defined in __init__.py.
  • query: Use this endpoint to run a query through the pipeline and retrieve the results.

By leveraging the REST API service, you can integrate fastRAG pipelines into your applications and easily interact with them using HTTP requests.
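As a rough illustration of calling the service from Python, the sketch below builds a request for the query endpoint using only the standard library. Several details are assumptions rather than documented behavior: that the endpoints are served at the root path (for example /query), that query accepts a POST with a JSON body of the form {"query": ..., "params": ...}, and the helper names themselves. The Swagger UI at http://localhost:8000/docs shows the actual request schema.

```python
import json
from typing import Any, Dict, Optional
from urllib import request

BASE_URL = "http://localhost:8000"  # host and port taken from the Swagger URL above


def endpoint_url(name: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint name (ASSUMPTION: endpoints live at the root path)."""
    return f"{base_url.rstrip('/')}/{name}"


def build_query_request(
    query: str,
    params: Optional[Dict[str, Any]] = None,
    base_url: str = BASE_URL,
) -> request.Request:
    """Build, but do not send, a request for the query endpoint."""
    # ASSUMPTION: the payload shape and the POST method are illustrative only.
    payload = {"query": query, "params": params or {}}
    return request.Request(
        endpoint_url("query", base_url),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: request.Request, timeout: float = 30.0) -> Any:
    """Send a prepared request and decode the JSON response (requires a running server)."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_query_request("Who wrote Hamlet?")
print(req.get_method(), req.full_url)
```

Calling send(req) against a running server returns the decoded response. Building the request separately from sending it makes it easy to inspect the payload first.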

Generating Pipeline Configurations

Generating a configuration with a script

A fastRAG pipeline is built with the Haystack pipeline API and is generated dynamically from the components you select. To generate a Haystack pipeline that can run as a standalone REST service (see Serving a Pipeline via REST API), use the Pipeline Generation script.

Below is an example that demonstrates how to use the script to generate a pipeline with a ColBERT retriever, an SBERT reranker, and an FiD reader:

python generate_pipeline.py --path "retriever,reranker,reader" \
    --store config/store/plaid-wiki.yaml \
    --retriever config/retriever/colbert-v2.yaml \
    --reranker config/reranker/sbert.yaml \
    --reader config/reader/FiD.yaml \
    --file pipeline.yaml

In the above command, you specify the desired components using the --path option, followed by providing the corresponding configuration YAML files for each component (e.g., --store, --retriever, --reranker, --reader). Finally, you can specify the output file for the generated pipeline configuration using the --file option (in this example, it is set to pipeline.yaml).
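When generating many configurations, it can help to assemble the command programmatically. The sketch below rebuilds the example command from a store config and an ordered list of components. The helper build_generate_command is hypothetical, and it assumes every component named in --path has a flag of the same name (as --retriever, --reranker, and --reader do in the example).

```python
import shlex
from typing import List, Sequence, Tuple


def build_generate_command(
    store_config: str,
    components: Sequence[Tuple[str, str]],
    output_file: str,
    script: str = "generate_pipeline.py",
) -> List[str]:
    """Return the argument list for the pipeline generation script.

    `components` is an ordered sequence of (component name, config path) pairs;
    the order determines the value passed to --path.
    """
    path = ",".join(name for name, _ in components)
    cmd = ["python", script, "--path", path, "--store", store_config]
    for name, config in components:
        # ASSUMPTION: each component has a flag named after it, e.g. --retriever.
        cmd += [f"--{name}", config]
    cmd += ["--file", output_file]
    return cmd


cmd = build_generate_command(
    "config/store/plaid-wiki.yaml",
    [
        ("retriever", "config/retriever/colbert-v2.yaml"),
        ("reranker", "config/reranker/sbert.yaml"),
        ("reader", "config/reader/FiD.yaml"),
    ],
    "pipeline.yaml",
)
print(shlex.join(cmd))  # shell-safe rendering of the command
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the directory that contains the script.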

Index Creation

For detailed instructions on creating the various types of indexes, refer to the Indexing Scripts directory.

Customizing Models

To cater to different use cases, we provide a variety of training scripts for fine-tuning models of your choice. For detailed examples and model descriptions, refer to the Models Overview page.

🎯 Example Use Cases

Efficient Open Domain Question-Answering

Generate answers to questions answerable by using a corpus of knowledge.

  • Retrieval: fast lexical retrieval with BM25, or late-interaction dense retrieval with PLAID
  • Ranking: Sentence Transformers or ColBERT
  • Generation: Fusion-in-Decoder

flowchart LR
    id1[(Elastic<br>/PLAID)] <--> id2(BM25<br>/ColBERT) --> id3(ST<br>/ColBERT) --> id4(FiD)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366

📓 Simple generative open-domain QA with BM25 and ST
📓 Efficient and fast ODQA with PLAID, ColBERT and FiD

Retrieval Augmented Generation with an LLM

To enhance generations using a Large Language Model (LLM) with retrieval augmentation, you can follow these steps:

  1. Define a retrieval flow: This involves creating a store that holds the relevant information and one or more retrievers/rankers to retrieve the most relevant documents or passages.

  2. Define a prompt template: Design a template that includes a suitable context or instruction, along with placeholders for the query and information retrieved by the pipeline. These placeholders will be filled in dynamically during generation.

  3. Request token generation from the LLM: Utilize the prompt template and pass it to the LLM, allowing it to generate tokens based on the provided context, query, and retrieved information.
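The three steps above can be sketched in a few lines of plain Python. Everything here is a stand-in chosen for illustration: the word-overlap retriever, the template wording, and the echo "LLM" are hypothetical and are not fastRAG or Haystack components. In a real pipeline, the store, retriever, and model would come from those libraries.

```python
from typing import Callable, List

# Step 2: a prompt template with placeholders for the retrieved context and the query.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {query}\n"
    "Answer:"
)


def retrieve(query: str, store: List[str], top_k: int = 2) -> List[str]:
    """Step 1 (toy retriever): rank passages by how many lowercase words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        store,
        key=lambda passage: len(query_words & set(passage.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Fill the template placeholders with numbered passages and the query."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return PROMPT_TEMPLATE.format(context=context, query=query)


def generate(query: str, store: List[str], llm: Callable[[str], str]) -> str:
    """Step 3: pass the filled prompt to any callable that maps a prompt to text."""
    return llm(build_prompt(query, retrieve(query, store)))


store = [
    "Paris is the capital of France.",
    "The Nile is a river in Africa.",
    "France is a country in Europe.",
]
echo_llm = lambda prompt: prompt  # stand-in for a real model: returns the prompt unchanged
output = generate("What is the capital of France?", store, echo_llm)
print(output)
```

Keeping the LLM behind a plain callable means the retrieval flow and the template can be tested without loading a model.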

Most Hugging Face decoder LLMs are supported.

See a complete example in our RAG with LLMs 📓 notebook.

flowchart LR
    id1[(Index)] <-->id2(.. Retrieval pipeline ..) --> id3(Prompt Template) --> id4(LLM)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id3 fill:#F3CECC,stroke:#B25450
    style id4 fill:#D5E8D4,stroke:#82B366

ChatGPT Open Domain Reranking and QA

Use the ChatGPT API both to rerank the documents for a query and to answer the query using the chosen documents.

📓 GPT as both Reranker and Reader

flowchart LR
    id1[(Index)] <--> id2(.. Retrieval pipeline ..) --> id4(ChatGPT)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366

Open Domain Summarization

Summarize topics given free-text input and a corpus of knowledge.

  • Retrieval: BM25 or other retrievers
  • Ranking: Sentence Transformers or other rankers
  • Generation: a "summarize: " prompt followed by all documents concatenated, passed to a FLAN-T5 generative model
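The generation input described above, a "summarize: " prefix followed by the concatenated documents, can be built with a small helper. This is a minimal sketch: the function name is hypothetical, and joining documents with a single space is an assumption, since the text only says the documents are concatenated.

```python
from typing import List


def build_summarization_input(documents: List[str], prefix: str = "summarize: ") -> str:
    """Prefix the concatenated documents with the summarization instruction.

    ASSUMPTION: documents are joined with one space; empty documents are skipped.
    """
    cleaned = [doc.strip() for doc in documents if doc.strip()]
    return prefix + " ".join(cleaned)


docs = ["  First passage. ", "", "Second passage."]
text = build_summarization_input(docs)
print(text)
```

The resulting string is what would be tokenized and passed to the generative model. Long document sets may exceed the model's input limit, so ranking before concatenation matters.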

📓 Open Domain Summarization

flowchart LR
    id1[(Elastic)] <--> id2(BM25) --> id3(SentenceTransformer) -- summarize--> id4(FLAN-T5)
    style id1 fill:#E1D5E7,stroke:#9673A6
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366

Retrieval-Oriented Knowledge Graph Construction

Use with any retrieval pipeline to extract named entities (NER) and generate relation maps using a Relation Classification (RC) model.
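To make the output of this stage concrete, the sketch below assembles a small graph from entity names and relation triples. The NER and RC models are not run here: their outputs are replaced by hand-written stand-ins, and the build_graph helper and the relation labels are hypothetical rather than part of fastRAG.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation label, tail entity)


def build_graph(
    entities: Iterable[str],
    relations: Iterable[Triple],
) -> Dict[str, List[Tuple[str, str]]]:
    """Build an adjacency map: every entity is a node, every kept triple is a labeled edge.

    Triples that mention an entity outside the extracted set are dropped.
    """
    graph: Dict[str, List[Tuple[str, str]]] = {entity: [] for entity in entities}
    for head, label, tail in relations:
        if head in graph and tail in graph:
            graph[head].append((label, tail))
    return graph


# Stand-ins for NER output (entities) and RC output (labeled entity pairs).
entities = ["Marie Curie", "Warsaw", "Nobel Prize"]
relations = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "awarded", "Nobel Prize"),
    ("Marie Curie", "worked_with", "Pierre Curie"),  # dropped: tail was not extracted
]
graph = build_graph(entities, relations)
print(graph)
```

Entities without any relation still appear as isolated nodes, which matches the description of a graph that covers all entities in the retrieved documents.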

📓 Knowledge Graph Construction

flowchart LR
    id2(.. Retrieval pipeline ..) --> id4(NER) --> id5(RC)
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
    style id5 fill:#F3CECC,stroke:#B25450

Retrieval-Oriented Answer Image Generation

Use with any retrieval pipeline to generate a dynamic image from the answer to the query, using a diffusion model.

📓 Answer Image Generation

flowchart LR
    id2(.. Retrieval pipeline ..) --> id4(FiD) --> id5(Diffusion)
    style id2 fill:#DAE8FC,stroke:#6C8EBF
    style id4 fill:#D5E8D4,stroke:#82B366
    style id5 fill:#F3CECC,stroke:#B25450

License

The code is licensed under the Apache 2.0 License.

Disclaimer

This is not an official Intel product.
