Retrieval Augmented Generation (RAG) chatbot powered by Weaviate

Verba

🐕 The Golden RAGtriever

Welcome to Verba: The Golden RAGtriever, an open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few easy steps, explore your datasets and extract insights with ease, either locally or through LLM providers such as OpenAI, Cohere, and HuggingFace.

pip install goldenverba

🎯 What Is Verba?

Verba is more than just a tool: it's a personal assistant for querying and interacting with your data, either locally or deployed via cloud. Have questions about your documents? Need to cross-reference multiple data points? Want to gain insights from your existing knowledge base? Verba combines Weaviate's context-aware database with the analytical power of Large Language Models (LLMs). Interact with your data through an intuitive chat interface that refines search results by using the ongoing conversation context to deliver more accurate and relevant information.

⚙️ Under the Hood

Verba is engineered with Weaviate's cutting-edge Generative Search technology at its core, extracting relevant context from your pool of documents to resolve queries with precision. By utilizing the power of Large Language Models, Verba doesn't just search for answers; it understands and provides responses that are contextually rich and informed by the content of your documents, all through an intuitive user interface designed for simplicity and efficiency.

💡 Effortless Data Import with Weaviate

Verba offers seamless data import functionality through its frontend, supporting a diverse range of file types including .txt, .md, .pdf and more. Before feeding your data into Weaviate, Verba handles chunking and vectorization to optimize it for search and retrieval. Together with collaborative partners we support popular libraries such as HuggingFace, Haystack, Unstructured and many more!

💥 Advanced Query Resolution with Hybrid Search

Experience the hybrid search capabilities of Weaviate within Verba, which merges vector and lexical search methodologies for even greater precision. This dual approach not only navigates through your documents to pinpoint exact matches but also understands the nuance of context, enabling the Large Language Models to craft responses that are both comprehensive and contextually aware. It's an advanced technique that redefines document retrieval, providing you with precisely what you need, when you need it.
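To make the idea of merging vector and lexical results more concrete, here is a minimal, illustrative sketch of one common fusion approach: scale each result list's scores to [0, 1], then blend them with a weight. The function names, the `alpha` parameter, and the sample scores are assumptions made for this example; this is not Weaviate's or Verba's actual implementation.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; a list with identical scores maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    vector: Dict[str, float], lexical: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """Blend two score sets: alpha=1.0 is pure vector, alpha=0.0 is pure lexical."""
    v, k = normalize(vector), normalize(lexical)
    docs = set(v) | set(k)
    # A document missing from one result list contributes 0 for that side.
    return {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}


# Hypothetical scores for three document chunks.
vector_hits = {"chunk-a": 0.91, "chunk-b": 0.80, "chunk-c": 0.62}
lexical_hits = {"chunk-b": 7.4, "chunk-c": 2.1}

ranked = sorted(
    hybrid_scores(vector_hits, lexical_hits).items(),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked[0][0])
```

In this toy data, `chunk-b` ranks first because it scores well in both result lists, while `chunk-a` only appears in the vector results.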

🔥 Accelerate Queries with Semantic Cache

Verba enhances search efficiency with Weaviate's Semantic Cache, a sophisticated system that retains the essence of your queries, results, and dialogues. This proactive feature means that Verba anticipates your needs, using cached data to expedite future inquiries. With semantic matching, it quickly determines if your question has been asked before, delivering instant results, and even suggests auto-completions based on historical interactions, streamlining your search experience to be faster and more intuitive.
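The semantic matching described above can be pictured as a lookup by vector similarity rather than by exact string. The following is a small sketch under stated assumptions: the `SemanticCache` class, the 0.95 threshold, and the two-dimensional vectors are hypothetical, and real query vectors would come from an embedding model. It is not how Verba or Weaviate implement the cache.

```python
import math
from typing import List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy in-memory cache keyed by query embedding instead of query text."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # assumed cut-off for "same question"
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, query_vector: List[float], answer: str) -> None:
        self._entries.append((query_vector, answer))

    def get(self, query_vector: List[float]) -> Optional[str]:
        """Return the cached answer of the most similar query, if similar enough."""
        best = max(
            self._entries,
            key=lambda entry: cosine(entry[0], query_vector),
            default=None,
        )
        if best is not None and cosine(best[0], query_vector) >= self.threshold:
            return best[1]
        return None


cache = SemanticCache()
cache.put([1.0, 0.0], "cached answer")
print(cache.get([0.99, 0.05]))  # near-duplicate query: served from the cache
print(cache.get([0.0, 1.0]))    # unrelated query: cache miss
```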


✨ Getting Started with Verba

Starting your Verba journey is super easy, with multiple deployment options tailored to your preferences. Follow these simple steps to get Verba up and running:

  • Deploy with pip (Quickstart)
pip install goldenverba
  • Build from Source (Quickstart)
git clone https://github.com/weaviate/Verba

pip install -e .
  • Use Docker for Deployment (Quickstart)

Prerequisites: If you're not using Docker, ensure that you have Python >=3.9.0 installed on your system.

🐍 Installing Python and Setting Up a Virtual Environment

Before you can use Verba, you'll need to ensure that Python >=3.9.0 is installed on your system and that you can create a virtual environment for a safer and cleaner project setup.
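If you are unsure which interpreter is active, a short check like the following can confirm that it satisfies the >=3.9.0 requirement. The helper name `meets_minimum` is made up for this sketch.

```python
import sys
from typing import Sequence

MINIMUM_PYTHON = (3, 9, 0)  # requirement stated in this README


def meets_minimum(version: Sequence[int], minimum: Sequence[int] = MINIMUM_PYTHON) -> bool:
    """Compare (major, minor, micro) tuples element by element."""
    return tuple(version[:3]) >= tuple(minimum)


current = ".".join(str(part) for part in sys.version_info[:3])
if meets_minimum(sys.version_info):
    print(f"Python {current} is new enough for Verba.")
else:
    print(f"Python {current} is too old; install 3.9.0 or newer.")
```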

Installing Python

Python is required to run Verba. If you don't have Python installed, follow these steps:

For Windows:

Download the latest Python installer from the official Python website. Run the installer and make sure to check the box that says Add Python to PATH during installation.

For macOS:

You can install Python using Homebrew, a package manager for macOS, with the following command in the terminal:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then install Python:

brew install python

For Linux:

Python usually comes pre-installed on most Linux distributions. If it's not, you can install it using your distribution's package manager. You can read more about it here

Setting Up a Virtual Environment

It's recommended to use a virtual environment to avoid conflicts with other projects or system-wide Python packages.

Install the virtualenv package:

First, ensure you have pip installed (it comes with Python if you're using version 3.4 and above). Install virtualenv by running:

pip install virtualenv

Create a Virtual Environment:

Navigate to your project's directory in the terminal. Run the following command to create a virtual environment named venv (you can name it anything you like):

python3 -m virtualenv venv

Activate the Virtual Environment:

  • On Windows, activate the virtual environment by running:
venv\Scripts\activate.bat
  • On macOS and Linux, activate it with:
source venv/bin/activate

Once your virtual environment is activated, you'll see its name in the terminal prompt. Now you're ready to install Verba using the steps provided in the Quickstart sections.
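Besides looking at the prompt, you can confirm from Python itself that the interpreter is running inside a virtual environment. This is a small sketch; the function name is arbitrary.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and current virtualenv
    # make sys.prefix differ from sys.base_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(
        sys, "base_prefix", sys.prefix
    )


print("virtual environment active:", in_virtual_environment())
```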

Remember to deactivate the virtual environment when you're done working with Verba by simply running deactivate in the terminal.

Linting

We use ruff for automatic code formatting and linting. The process is automated with a pre-commit hook. To install the hook, run:

pre-commit install

or for shorthand:

make pre-commit

After that all your commits will be automatically linted and formatted. The linting will happen only on the files you changed.

make pre-commit formats all files in the repository and installs the hooks if needed.

📦 Choosing the Right Verba Installation Package

Verba comes in several installation packages, each tailored for specific use cases and environments. Choose the package that aligns with your requirements:

Default Package

The default package is perfect for getting started quickly and includes support for popular models and services like OpenAI, Cohere, and spaCy. This package is suitable for general use and can be installed easily via pip:

pip install goldenverba

This will set you up with all you need to integrate Verba with these services without additional configuration.

HuggingFace Version

For those looking to leverage models from the HuggingFace ecosystem, including SentenceTransformer and Llama2, the HuggingFace version is the ideal choice. This package is optimized for GPU usage to accommodate the high performance demands of these models:

pip install goldenverba[huggingface]

Note: It's recommended to run this version on a system with a GPU to fully utilize the capabilities of the advanced models.

Development Version

If you're a developer looking to contribute to Verba or need the latest features still in development, the dev version is what you're looking for. This version may be less stable but offers the cutting edge of Verba's capabilities:

pip install goldenverba[dev]

Keep in mind that this version is intended for development purposes and may contain experimental features.

🚀 Quickstart: Deploy with pip

  1. Initialize a new Python Environment
python3 -m virtualenv venv
  2. Install Verba
pip install goldenverba
  3. Launch Verba
verba start
  4. Access Verba
Visit localhost:8000
  5. Create .env file and add environment variables
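After `verba start`, one way to confirm that something is answering on localhost:8000 is a quick request from Python's standard library. This is only a sketch: the function name and the 3-second timeout are assumptions, and it checks that an HTTP server responds, not that the responding server is Verba.

```python
import urllib.error
import urllib.request


def verba_is_reachable(url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except OSError:
        # Connection refused, timeout, or DNS failure (URLError subclasses OSError).
        return False


if __name__ == "__main__":
    print("Verba reachable:", verba_is_reachable())
```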

🛠️ Quickstart: Build from Source

  1. Clone the Verba repository
git clone https://github.com/weaviate/Verba.git
  2. Initialize a new Python Environment
python3 -m virtualenv venv
  3. Install Verba
pip install -e .
  4. Launch Verba
verba start
  5. Access Verba
Visit localhost:8000
  6. Create .env file and add environment variables

🔑 API Keys

Before diving into Verba's capabilities, you'll need to configure access to various components depending on your chosen technologies, such as OpenAI, Cohere, and HuggingFace. Start by obtaining the necessary API keys and setting them up through a .env file based on our provided example, or by declaring them as environment variables on your system. If you're building from source or using Docker, make sure your .env file is within the goldenverba directory.
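For orientation, a .env file is a list of `KEY=VALUE` lines. The sketch below shows roughly how such a file can be read into a dictionary; it is a simplified illustration with a made-up function name, not the loader Verba uses, and dedicated libraries such as python-dotenv cover the full syntax (inline comments, escapes, multi-line values).

```python
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and full-line comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes; trailing inline comments are NOT handled here.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = """
# Placeholder values in the style used throughout this README
OPENAI_API_KEY=YOUR-OPENAI-KEY
OPENAI_API_TYPE="azure"
"""

print(parse_env_text(sample))
```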

Below is a comprehensive list of the API keys and variables you may require:

Weaviate

Verba provides flexibility in connecting to Weaviate instances based on your needs. By default, Verba opts for Weaviate Embedded if it doesn't detect the WEAVIATE_URL_VERBA and WEAVIATE_API_KEY_VERBA environment variables. This local deployment is the most straightforward way to launch your Weaviate database for prototyping and testing.

However, you have other compelling options to consider:

🌩️ Weaviate Cloud Service (WCS)

If you prefer a cloud-based solution, Weaviate Cloud Service (WCS) offers a scalable, managed environment. Learn how to set up a cloud cluster and get the API keys by following the Weaviate Cluster Setup Guide.

🐳 Docker Deployment

Another robust local alternative is deploying Weaviate using Docker. For more details, consult the Weaviate Docker Guide.

WEAVIATE_URL_VERBA=URL-TO-YOUR-WEAVIATE-CLUSTER

WEAVIATE_API_KEY_VERBA=API-KEY-OF-YOUR-WEAVIATE-CLUSTER
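The fallback to Weaviate Embedded can be pictured as a simple check of the environment. The sketch below is an assumption about the decision logic, not Verba's actual code: it treats a non-empty `WEAVIATE_URL_VERBA` as the signal for a remote instance, and the function name is hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def weaviate_target(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return ("remote", url) when a cluster URL is configured, else ("embedded", None)."""
    url = env.get("WEAVIATE_URL_VERBA", "").strip()
    if url:
        return "remote", url
    # Assumption: without a URL, fall back to the local embedded instance.
    return "embedded", None


mode, url = weaviate_target(os.environ)
print(f"Weaviate deployment: {mode}" + (f" at {url}" if url else ""))
```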

OpenAI

Verba supports OpenAI Models such as Ada, GPT3, and GPT4. To use them, you need to specify the OPENAI_API_KEY environment variable. You can get it from OpenAI

OPENAI_API_KEY=YOUR-OPENAI-KEY

You can also add an OPENAI_BASE_URL to use proxies such as LiteLLM (https://github.com/BerriAI/litellm)

OPENAI_BASE_URL=YOUR-OPENAI_BASE_URL

Azure OpenAI

To use Azure OpenAI, you need to set:

  • The API type:
OPENAI_API_TYPE="azure"
  • The key and the endpoint:
OPENAI_API_KEY=<YOUR_KEY>
OPENAI_BASE_URL=http://XXX.openai.azure.com
  • The Azure OpenAI resource name, which is XXX if your endpoint is XXX.openai.azure.com:
AZURE_OPENAI_RESOURCE_NAME=<YOUR_AZURE_RESOURCE_NAME>
  • The models for the embeddings and for the query:
AZURE_OPENAI_EMBEDDING_MODEL="text-embedding-ada-002"
OPENAI_MODEL="gpt-4"
  • Finally, because Azure enforces a per-minute token quota, you might need to add a waiting time between chunk uploads. For example, with a limit of 240k tokens per minute and chunks of at most 400 tokens, you can send at most 600 chunks per minute, so 100 ms between queries should be fine. If you get error 429 from Weaviate, increase this value.
WAIT_TIME_BETWEEN_INGESTION_QUERIES_MS="100"
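The waiting time in the example follows from simple arithmetic, which the sketch below makes explicit. It assumes one chunk per request and that every chunk is at its maximum size; the function name is made up for illustration.

```python
import math


def min_wait_ms(tokens_per_minute: int, max_chunk_tokens: int) -> int:
    """Smallest delay between chunk uploads (in ms) that stays within the quota."""
    # chunks per minute = tokens_per_minute / max_chunk_tokens
    # delay per chunk   = 60_000 ms / chunks per minute
    return math.ceil(60_000 * max_chunk_tokens / tokens_per_minute)


print(min_wait_ms(240_000, 400))  # the README's example: 240k tokens/min, 400-token chunks
```

Halving the quota or doubling the chunk size doubles the required delay, which is a useful rule of thumb when tuning the value after a 429 error.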

Cohere

Verba supports Cohere models. To use them, you need to specify the COHERE_API_KEY environment variable. You can get it from Cohere.

COHERE_API_KEY=YOUR-COHERE-KEY

HuggingFace

Verba supports HuggingFace models, such as SentenceTransformers and Llama2. To use them you need the HF_TOKEN environment variable. You can get it from HuggingFace

HF_TOKEN=YOUR-HUGGINGFACE-TOKEN

Llama2

To use the Llama2 model from Meta, you first need to request access to it. Read more about accessing the Llama model here. To enable the Llama2 model for Verba, use:

LLAMA2-7B-CHAT-HF=True

Unstructured

Verba supports importing documents through Unstructured (e.g. .pdf). To use it, you need the UNSTRUCTURED_API_KEY environment variable. You can get it from Unstructured.

UNSTRUCTURED_API_KEY=YOUR-UNSTRUCTURED-KEY
UNSTRUCTURED_API_URL=YOUR-SELF-HOSTED-INSTANCE # If you are self hosting, in the form of `http://localhost:8000/general/v0/general`

GitHub

If you want to use the GitHub Reader, you need the GITHUB_TOKEN environment variable. You can get it from GitHub.

GITHUB_TOKEN=YOUR-GITHUB-TOKEN
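Before launching Verba, it can help to see at a glance which of the variables listed above are actually set. The following sketch does that with the variable names from this section; the helper and its output format are illustrative assumptions and are unrelated to Verba's own status page.

```python
import os
from typing import Dict, Mapping

# Variable names as documented in this section.
PROVIDER_VARIABLES = {
    "OpenAI": "OPENAI_API_KEY",
    "Cohere": "COHERE_API_KEY",
    "HuggingFace": "HF_TOKEN",
    "Unstructured": "UNSTRUCTURED_API_KEY",
    "GitHub": "GITHUB_TOKEN",
}


def configured_providers(env: Mapping[str, str]) -> Dict[str, bool]:
    """Map each provider to True when its variable is present and non-empty."""
    return {
        name: bool(env.get(variable, "").strip())
        for name, variable in PROVIDER_VARIABLES.items()
    }


for name, is_set in configured_providers(os.environ).items():
    print(f"{name:<13}{'set' if is_set else 'missing'}")
```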

Status Page

Once configured, you can monitor your Verba installation's health and status via the 'Status Verba' page. This dashboard provides insights into your deployment type, libraries, environment settings, Weaviate schema counts, and more. It's also your go-to for maintenance tasks like resetting Verba, clearing the cache, or managing auto-complete suggestions.

🐳 Quickstart: Deploy with Docker

Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. Containers are isolated from one another and bundle their own software, libraries, and configuration files; they can communicate with each other through well-defined channels. All containers are run by a single operating system kernel and are thus more lightweight than virtual machines. Docker provides an additional layer of abstraction and automation of operating-system-level virtualization on Windows and Linux.

Docker's use of containers to package software means that the application and its dependencies, libraries, and other binaries are packaged together and can be moved between environments easily. This makes it incredibly useful for developers looking to create predictable environments that are isolated from other applications.

To get started with deploying Verba using Docker, follow the steps below. If you're unfamiliar with Docker or need more detailed instructions on its usage, check out the Docker Curriculum.

  1. Clone the Verba repository. Ensure you have Git installed on your system. Then, open a terminal or command prompt and run the following command to clone the Verba repository:
git clone https://github.com/weaviate/Verba.git
  2. Deploy using Docker. With Docker installed and the Verba repository cloned, navigate to the directory containing the Docker Compose file in your terminal or command prompt. Run the following command to start the Verba application in detached mode, which allows it to run in the background:
docker compose up -d

This command will download the necessary Docker images, create containers, and start Verba. Remember, Docker must be installed on your system to use this method. For installation instructions and more details about Docker, visit the official Docker documentation.

💾 Importing Your Data into Verba

With Verba configured, you're ready to import your data and start exploring. Follow these simple steps to get your data into Verba:

  1. Initiate the Import Process

    • Click on "Add Documents" to begin.
  2. Select Your Data Processing Tools

    • At the top, you'll find three tabs labeled Reader, Chunker, and Embedder, each offering different options for handling your data.
  3. Choose a Reader

    • The Reader is responsible for importing your data. Select from the available options:
      • SimpleReader: For importing .txt and .md files.
      • GitHubReader: For loading data directly from a GitHub repository by specifying the path (owner/repo/folder_path).
      • PDFReader: For importing .pdf files.
  4. Select a Chunker

    • Chunkers break down your data into manageable pieces. Choose a suitable chunker:
      • WordChunker: Chunks the text by words.
      • SentenceChunker: Chunks the text by sentences.
  5. Pick an Embedder

    • Embedders are crucial for integrating your data into Weaviate. Select one based on your preference:
      • AdaEmbedder: Utilizes OpenAI's ADA model for embedding.
      • MiniLMEmbedder: Employs Sentence Transformers for embedding.
      • CohereEmbedder: Uses Cohere for embedding.
  6. Commence Data Ingestion

    • After setting up your preferences, click on "Import" to ingest your data into Verba.

Now your data is ready to be used within Verba, enabling you to leverage its powerful search and retrieval capabilities.
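To give a feel for what the chunking step in the list above does, here is a minimal sketch of word-based and sentence-based chunking. The function names, the `size` parameter, and the regex-based sentence splitter are stand-ins chosen for this example; Verba's WordChunker and SentenceChunker may differ in how they split text and in the options they offer.

```python
import re
from typing import List


def chunk_by_words(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    if size <= 0:
        raise ValueError("size must be positive")
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def chunk_by_sentences(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` sentences."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


document = "Verba imports files. It chunks them! Are they embedded? Yes, then stored."
print(chunk_by_words(document, 5))
print(chunk_by_sentences(document, 2))
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit, so the chunk size is usually worth experimenting with for your own documents.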

💰 Large Language Model (LLM) Costs

Verba uses LLMs through APIs. Usage costs for these models are billed to the API access key you provide. Costs are incurred mainly during data embedding and answer generation.

💖 Open Source Contribution

Your contributions are always welcome! Feel free to contribute ideas, feedback, or create issues and bug reports if you find any! Before contributing, please read the Contribution Guide. Visit our Weaviate Community Forum if you need any help!

🛠️ Project Architecture

You can learn more about Verba's architecture and implementation in its technical documentation and frontend documentation. It's recommended to read them before making any contributions.
