Verba

🐕 The Golden RAGtriever

Retrieval Augmented Generation (RAG) chatbot powered by Weaviate

Welcome to Verba: The Golden RAGtriever, an open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few easy steps, explore your datasets and extract insights with ease, either locally or through LLM providers such as OpenAI, Cohere, and HuggingFace.

pip install goldenverba


Demo of Verba

🎯 What Is Verba?

Verba is more than just a tool: it's a personal assistant for querying and interacting with your data, either locally or deployed via cloud. Have questions about your documents? Need to cross-reference multiple data points? Want to gain insights from your existing knowledge base? Verba empowers you with the combined capabilities of Weaviate's context-aware database and the analytical power of Large Language Models (LLMs). Interact with your data through an intuitive chat interface that refines search results by using the ongoing conversation context to deliver even more accurate and relevant information.

Demo of Verba

โš™๏ธ Under the Hood

Verba is engineered with Weaviate's cutting-edge Generative Search technology at its core, extracting relevant context from your pool of documents to resolve queries with precision. By utilizing the power of Large Language Models, Verba doesn't just search for answers; it understands and provides responses that are contextually rich and informed by the content of your documents, all through an intuitive user interface designed for simplicity and efficiency.
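As a rough illustration of what generative search looks like against Weaviate (a sketch in Weaviate Python client v3 syntax; the Document class name and prompt are assumptions for illustration, not Verba's internal code):

import weaviate

client = weaviate.Client("http://localhost:8080")  # assumes a running Weaviate with a generative module enabled

# Retrieve context relevant to the question, then let the LLM answer from it.
response = (
    client.query
    .get("Document", ["text"])  # "Document" is a hypothetical class name
    .with_near_text({"concepts": ["How does Verba chunk documents?"]})
    .with_generate(single_prompt="Answer the question using only this context: {text}")
    .with_limit(3)
    .do()
)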

💡 Effortless Data Import with Weaviate

Verba offers seamless data import through its frontend, supporting a diverse range of file types including .txt, .md, .pdf, and more. Before feeding your data into Weaviate, Verba handles chunking and vectorization to optimize it for search and retrieval. Together with collaborative partners, we support popular libraries such as HuggingFace, Haystack, Unstructured, and many more!

Demo of Verba

💥 Advanced Query Resolution with Hybrid Search

Experience the hybrid search capabilities of Weaviate within Verba, which merges vector and lexical search methodologies for even greater precision. This dual approach not only navigates through your documents to pinpoint exact matches but also understands the nuance of context, enabling the Large Language Models to craft responses that are both comprehensive and contextually aware. It's an advanced technique that redefines document retrieval, providing you with precisely what you need, when you need it.
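For instance, with the Weaviate Python client (v3 syntax), a hybrid query blends both signals through a single alpha parameter (again a sketch with an assumed class name, not Verba's internal code):

import weaviate

client = weaviate.Client("http://localhost:8080")  # local Weaviate instance

# alpha=1.0 is pure vector search, alpha=0.0 is pure keyword (BM25) search.
response = (
    client.query
    .get("Document", ["text"])
    .with_hybrid(query="installation steps", alpha=0.5)
    .with_limit(5)
    .do()
)

for item in response["data"]["Get"]["Document"]:
    print(item["text"])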

🔥 Accelerate Queries with Semantic Cache

Verba enhances search efficiency with Weaviate's Semantic Cache, a sophisticated system that retains the essence of your queries, results, and dialogues. This proactive feature means that Verba anticipates your needs, using cached data to expedite future inquiries. With semantic matching, it quickly determines if your question has been asked before, delivering instant results, and even suggests auto-completions based on historical interactions, streamlining your search experience to be faster and more intuitive.
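Conceptually, a semantic cache works like this (a minimal, hypothetical sketch using cosine similarity over query embeddings; Verba's actual implementation lives in the source):

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class SemanticCache:
    """Return a cached answer when a new query is semantically close to an old one."""

    def __init__(self, embed_fn, threshold=0.95):
        self.embed_fn = embed_fn      # any function mapping text -> vector
        self.threshold = threshold    # similarity needed to count as "the same question"
        self.entries = []             # list of (embedding, answer) pairs

    def get(self, query):
        q = self.embed_fn(query)
        for emb, answer in self.entries:
            if cosine(q, emb) >= self.threshold:
                return answer         # cache hit: skip retrieval and generation
        return None                   # cache miss

    def put(self, query, answer):
        self.entries.append((self.embed_fn(query), answer))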


✨ Getting Started with Verba

Starting your Verba journey is super easy, with multiple deployment options tailored to your preferences. Follow these simple steps to get Verba up and running:

  • Deploy with pip (Quickstart)
pip install goldenverba
  • Build from Source (Quickstart)
git clone https://github.com/weaviate/Verba

pip install -e .
  • Use Docker for Deployment (Quickstart)

Prerequisites: If you're not using Docker, ensure that you have Python >=3.9.0 installed on your system.

๐Ÿ Installing Python and Setting Up a Virtual Environment

Before you can use Verba, you'll need to ensure that Python >=3.9.0 is installed on your system and that you can create a virtual environment for a safer and cleaner project setup.

Installing Python

Python is required to run Verba. If you don't have Python installed, follow these steps:

For Windows:

Download the latest Python installer from the official Python website. Run the installer and make sure to check the box that says Add Python to PATH during installation.

For macOS:

You can install Python using Homebrew, a package manager for macOS. First, install Homebrew with the following command in the terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install Python:

brew install python

For Linux:

Python usually comes pre-installed on most Linux distributions. If it's not, you can install it using your distribution's package manager. You can read more about it here.

Setting Up a Virtual Environment

It's recommended to use a virtual environment to avoid conflicts with other projects or system-wide Python packages.

Install the virtualenv package:

First, ensure you have pip installed (it comes with Python if you're using version 3.4 and above). Install virtualenv by running:

pip install virtualenv

Create a Virtual Environment:

Navigate to your project's directory in the terminal. Run the following command to create a virtual environment named venv (you can name it anything you like):

python3 -m virtualenv venv

Activate the Virtual Environment:

  • On Windows, activate the virtual environment by running:
venv\Scripts\activate.bat
  • On macOS and Linux, activate it with:
source venv/bin/activate

Once your virtual environment is activated, you'll see its name in the terminal prompt. Now you're ready to install Verba using the steps provided in the Quickstart sections.

Remember to deactivate the virtual environment when you're done working with Verba by simply running deactivate in the terminal.

Linting

We use ruff for automatic code formatting and linting. The process is automated with a pre-commit hook. To install the hook, run:

pre-commit install

or use the shorthand:

make pre-commit

After that, all your commits will be automatically linted and formatted. Linting runs only on the files you changed.

make pre-commit formats all files in the repository and installs the hooks if needed.

📦 Choosing the Right Verba Installation Package

Verba comes in several installation packages, each tailored for specific use cases and environments. Choose the package that aligns with your requirements:

Default Package

The default package is perfect for getting started quickly and includes support for popular models and services like OpenAI, Cohere, and spaCy. This package is suitable for general use and can be installed easily via pip:

pip install goldenverba

This will set you up with all you need to integrate Verba with these services without additional configuration.

HuggingFace Version

For those looking to leverage models from the HuggingFace ecosystem, including SentenceTransformers and Llama2, the HuggingFace version is the ideal choice. This package is optimized for GPU usage to accommodate the high performance demands of these models:

pip install goldenverba[huggingface]

Note: It's recommended to run this version on a system with a GPU to fully utilize the capabilities of the advanced models.
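To verify that a GPU is actually visible before ingesting data, a quick check looks like this (a sketch; assumes the HuggingFace extras pull in PyTorch):

import torch

if torch.cuda.is_available():
    print(f"GPU available: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA GPU detected; models will run on CPU")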

Development Version

If you're a developer looking to contribute to Verba or need the latest features still in development, the dev version is what you're looking for. This version may be less stable but offers the cutting edge of Verba's capabilities:

pip install goldenverba[dev]

Keep in mind that this version is intended for development purposes and may contain experimental features.

🚀 Quickstart: Deploy with pip

  1. Initialize a new Python environment
python3 -m virtualenv venv
  2. Install Verba
pip install goldenverba
  3. Launch Verba
verba start
  4. Access Verba
Visit localhost:8000
  5. Create a .env file and add environment variables
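For example, a minimal .env for an OpenAI-only setup might look like this (see the 🔑 API Keys section below for the full list of variables):

OPENAI_API_KEY=YOUR-OPENAI-KEY
# Optional: point Verba at a remote Weaviate cluster instead of Weaviate Embedded
# WEAVIATE_URL_VERBA=URL-TO-YOUR-WEAVIATE-CLUSTER
# WEAVIATE_API_KEY_VERBA=API-KEY-OF-YOUR-WEAVIATE-CLUSTER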

๐Ÿ› ๏ธ Quickstart: Build from Source

  1. Clone the Verba repos
git clone https://github.com/weaviate/Verba.git
  1. Initialize a new Python Environment
python3 -m virtualenv venv
  1. Install Verba
pip install -e .
  1. Launch Verba
verba start
  1. Access Verba
Visit localhost:8000
  1. Create .env file and add environment variables

🔑 API Keys

Before diving into Verba's capabilities, you'll need to configure access to various components depending on your chosen technologies, such as OpenAI, Cohere, and HuggingFace. Start by obtaining the necessary API keys and setting them up through a .env file based on our provided example, or by declaring them as environment variables on your system. If you're building from source or using Docker, make sure your .env file is within the goldenverba directory.
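If you go the .env route, a loader such as python-dotenv is one common way to make the variables available to the process (named here as an illustration; how Verba loads them internally is not shown):

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from a .env file into the environment
print("OPENAI_API_KEY set:", os.environ.get("OPENAI_API_KEY") is not None)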

Below is a comprehensive list of the API keys and variables you may require:

Weaviate

Verba provides flexibility in connecting to Weaviate instances based on your needs. By default, Verba opts for Weaviate Embedded if it doesn't detect the WEAVIATE_URL_VERBA and WEAVIATE_API_KEY_VERBA environment variables. This local deployment is the most straightforward way to launch your Weaviate database for prototyping and testing.
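The detection logic amounts to something like this sketch (Weaviate Python client v3 syntax; it mirrors the described behavior, not Verba's exact code):

import os
import weaviate
from weaviate.embedded import EmbeddedOptions

url = os.environ.get("WEAVIATE_URL_VERBA")
api_key = os.environ.get("WEAVIATE_API_KEY_VERBA")

if url and api_key:
    # Both variables present: connect to the remote cluster (e.g. WCS).
    client = weaviate.Client(url=url, auth_client_secret=weaviate.AuthApiKey(api_key=api_key))
else:
    # Variables missing: fall back to a local Weaviate Embedded instance.
    client = weaviate.Client(embedded_options=EmbeddedOptions())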

However, you have other compelling options to consider:

๐ŸŒฉ๏ธ Weaviate Cloud Service (WCS)

If you prefer a cloud-based solution, Weaviate Cloud Service (WCS) offers a scalable, managed environment. Learn how to set up a cloud cluster and get the API keys by following the Weaviate Cluster Setup Guide.

๐Ÿณ Docker Deployment Another robust local alternative is deploying Weaviate using Docker. For more details, consult the Weaviate Docker Guide.

WEAVIATE_URL_VERBA=URL-TO-YOUR-WEAVIATE-CLUSTER

WEAVIATE_API_KEY_VERBA=API-KEY-OF-YOUR-WEAVIATE-CLUSTER

OpenAI

Verba supports OpenAI models such as Ada, GPT-3, and GPT-4. To use them, you need to specify the OPENAI_API_KEY environment variable. You can get it from OpenAI.

OPENAI_API_KEY=YOUR-OPENAI-KEY

You can also add an OPENAI_BASE_URL to use proxies such as LiteLLM (https://github.com/BerriAI/litellm):

OPENAI_BASE_URL=YOUR-OPENAI_BASE_URL

Azure OpenAI

To use Azure OpenAI, you need to set

  • The API type:
OPENAI_API_TYPE="azure"
  • The key and the endpoint:
OPENAI_API_KEY=<YOUR_KEY>
OPENAI_BASE_URL=http://XXX.openai.azure.com
  • The Azure OpenAI resource name, which is XXX if your endpoint is XXX.openai.azure.com:
AZURE_OPENAI_RESOURCE_NAME=<YOUR_AZURE_RESOURCE_NAME>
  • The models to use for embeddings and for queries:
AZURE_OPENAI_EMBEDDING_MODEL="text-embedding-ada-002"
OPENAI_MODEL="gpt-4" 
  • Finally, as Azure uses per-minute quotas, you might need to add a waiting time between chunk uploads. For example, with a limit of 240k tokens per minute and chunks of at most 400 tokens, you can ingest about 600 chunks per minute, i.e. one every 100 ms, so 100 ms between queries should be fine. If you get error 429 from Weaviate, increase this value.
WAIT_TIME_BETWEEN_INGESTION_QUERIES_MS="100"

Cohere

Verba supports Cohere models. To use them, you need to specify the COHERE_API_KEY environment variable. You can get it from Cohere.

COHERE_API_KEY=YOUR-COHERE-KEY

HuggingFace

Verba supports HuggingFace models, such as SentenceTransformers and Llama2. To use them, you need the HF_TOKEN environment variable. You can get it from HuggingFace.

HF_TOKEN=YOUR-HUGGINGFACE-TOKEN

Llama2

To use the Llama2 model from Meta, you first need to request access to it. Read more about accessing the Llama model here. To enable the Llama2 model for Verba, use:

LLAMA2-7B-CHAT-HF=True

Unstructured

Verba supports importing documents through Unstructured (e.g. .pdf). To use it, you need the UNSTRUCTURED_API_KEY environment variable. You can get it from Unstructured.

UNSTRUCTURED_API_KEY=YOUR-UNSTRUCTURED-KEY
UNSTRUCTURED_API_URL=YOUR-SELF-HOSTED-INSTANCE # If you are self hosting, in the form of `http://localhost:8000/general/v0/general`

GitHub

If you want to use the GitHub Reader, you need the GITHUB_TOKEN environment variable. You can get it from GitHub.

GITHUB_TOKEN=YOUR-GITHUB-TOKEN

Status Page

Once configured, you can monitor your Verba installation's health and status via the 'Status Verba' page. This dashboard provides insights into your deployment type, libraries, environment settings, Weaviate schema counts, and more. It's also your go-to for maintenance tasks like resetting Verba, clearing the cache, or managing auto-complete suggestions.

Demo of Verba

๐Ÿณ Quickstart: Deploy with Docker

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries, and configuration files; they can communicate with each other through well-defined channels. All containers are run by a single operating system kernel and are thus more lightweight than virtual machines. Docker provides an additional layer of abstraction and automation of operating-system-level virtualization on Windows and Linux.

Docker's use of containers to package software means that the application and its dependencies, libraries, and other binaries are packaged together and can be moved between environments easily. This makes it incredibly useful for developers looking to create predictable environments that are isolated from other applications.

To get started with deploying Verba using Docker, follow the steps below. If you're unfamiliar with Docker or want more detailed instructions, check out the Docker Curriculum.

  1. Clone the Verba repo. Ensure you have Git installed on your system. Then, open a terminal or command prompt and run the following command to clone the Verba repository:
git clone https://github.com/weaviate/Verba.git
  2. Deploy using Docker. With Docker installed and the Verba repository cloned, navigate to the directory containing the Docker Compose file in your terminal or command prompt. Run the following command to start the Verba application in detached mode, which allows it to run in the background:
docker compose up -d

This command will download the necessary Docker images, create containers, and start Verba. Remember, Docker must be installed on your system to use this method. For installation instructions and more details about Docker, visit the official Docker documentation.

💾 Importing Your Data into Verba

With Verba configured, you're ready to import your data and start exploring. Follow these simple steps to get your data into Verba:

Demo of Verba

  1. Initiate the Import Process

    • Click on "Add Documents" to begin.
  2. Select Your Data Processing Tools

    • At the top, you'll find three tabs labeled Reader, Chunker, and Embedder, each offering different options for handling your data.
  3. Choose a Reader

    • The Reader is responsible for importing your data. Select from the available options:
      • SimpleReader: For importing .txt and .md files.
      • GitHubReader: For loading data directly from a GitHub repository by specifying the path (owner/repo/folder_path).
      • PDFReader: For importing .pdf files.
  4. Select a Chunker

    • Chunkers break down your data into manageable pieces. Choose a suitable chunker:
      • WordChunker: Chunks the text by words.
      • SentenceChunker: Chunks the text by sentences.
  5. Pick an Embedder

    • Embedders are crucial for integrating your data into Weaviate. Select one based on your preference:
      • AdaEmbedder: Utilizes OpenAI's ADA model for embedding.
      • MiniLMEmbedder: Employs Sentence Transformers for embedding.
      • CohereEmbedder: Uses Cohere for embedding.
  6. Commence Data Ingestion

    • After setting up your preferences, click on "Import" to ingest your data into Verba.

Now your data is ready to be used within Verba, enabling you to leverage its powerful search and retrieval capabilities.
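To make the Chunker step concrete, here is a hypothetical word-based chunker in the spirit of the WordChunker option above (an illustrative sketch with assumed parameters, not Verba's actual implementation):

def chunk_by_words(text, chunk_size=100, overlap=20):
    """Split text into overlapping word chunks so context survives chunk boundaries."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks

# A 250-word text with chunk_size=100 and overlap=20 yields three chunks,
# covering words 0-99, 80-179, and 160-249.
print(len(chunk_by_words("word " * 250)))  # -> 3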

💰 Large Language Model (LLM) Costs

Verba utilizes LLM models through APIs. Be advised that the usage costs for these models will be billed to the API access key you provide. Primarily, costs are incurred during data embedding and answer generation processes.
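As a rough back-of-the-envelope example (all prices here are assumptions; check your provider's current rate card):

# Hypothetical embedding-cost estimate; the price is an assumption, not a quote.
pages = 1_000
tokens_per_page = 500                  # rough average
price_per_million_tokens = 0.10        # assumed $ per 1M embedding tokens
total_tokens = pages * tokens_per_page
print(f"~${total_tokens / 1_000_000 * price_per_million_tokens:.2f} to embed")  # ~$0.05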

💖 Open Source Contribution

Your contributions are always welcome! Feel free to contribute ideas, feedback, or create issues and bug reports if you find any! Before contributing, please read the Contribution Guide. Visit our Weaviate Community Forum if you need any help!

๐Ÿ› ๏ธ Project Architecture

You can learn more about Verba's architecture and implementation in its technical documentation and frontend documentation. It's recommended to read them before making any contributions.
