  • Stars: 162
  • Rank: 232,284 (Top 5%)
  • Language: TypeScript
  • License: MIT License
  • Created over 1 year ago
  • Updated over 1 year ago

Repository Details

Discover Healthsearch: Unlocking Health with Semantic Search ✨

Healthsearch

Welcome to the Healthsearch Demo, an open-source project aimed at showcasing the potential of leveraging user-written reviews and queries to retrieve supplement products based on specific health effects.

🎯 Overview

The search functionality in this demo accepts natural language queries that are translated into GraphQL queries using LLMs. These GraphQL queries are then utilized to retrieve supplements from a Weaviate database. The demo also exhibits an example of generative search by providing product summaries generated based on the retrieved objects.

⚠️ Disclaimer: Healthsearch is a technical demonstration, and the results shown should not be treated as health advice. The results and generated summaries are purely based on user-written reviews.

💡 Natural Language Translation to GraphQL

We use large language models (LLMs), such as GPT-4, to translate natural language queries into a structured GraphQL query. The demo extracts filters, sorting, and limits directly from the context of the query. Whether the query is "the top 10 products for glowing skin", "products for sleep from a specific brand", or "best-rated products for toothache", the demo can interpret it and generate an appropriate GraphQL query in return.
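
To make the target of this translation concrete, here is a minimal sketch of the last step: rendering a Weaviate-style GraphQL query once the filters and limit have been extracted. The `ParsedQuery` structure, the `Product` class name, and the `name`, `brand`, and `rating` properties are assumptions for illustration, not the demo's actual schema, and the LLM call itself is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedQuery:
    """Fields an LLM might extract from a user query (hypothetical structure)."""
    concept: str                  # what to search for semantically
    limit: int = 5                # how many results to return
    brand: Optional[str] = None   # optional brand filter


def build_graphql(parsed: ParsedQuery) -> str:
    """Render a Weaviate-style GraphQL Get query from the extracted fields."""
    # json.dumps produces quoted, escaped string literals.
    args = [
        f"nearText: {{concepts: [{json.dumps(parsed.concept)}]}}",
        f"limit: {parsed.limit}",
    ]
    if parsed.brand is not None:
        args.append(
            "where: {path: [\"brand\"], operator: Equal, "
            f"valueText: {json.dumps(parsed.brand)}}}"
        )
    # "Product" and its properties are assumed names, not the demo's schema.
    return "{ Get { Product(" + ", ".join(args) + ") { name brand rating } } }"
```

For example, `build_graphql(ParsedQuery(concept="glowing skin", limit=10))` returns `{ Get { Product(nearText: {concepts: ["glowing skin"]}, limit: 10) { name brand rating } } }`.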

🔎 Semantic Search

Healthsearch relies on the power of semantic search in user reviews. When seeking products that are good for joint pain, for instance, Healthsearch scans user reviews for discussions on products that have alleviated joint pain or similar conditions. The results are then aggregated and grouped according to their respective products.
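
The aggregation step can be sketched roughly as follows, assuming each review hit carries a product name and a vector distance. The `product` and `distance` keys and the ranking rule are illustrative assumptions, not the demo's actual logic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_product(hits: List[dict]) -> List[Tuple[str, float, int]]:
    """Aggregate review-level hits into product-level results.

    Each hit is a dict with the hypothetical keys "product" and "distance"
    (a smaller distance means a closer match). Returns tuples of
    (product, best_distance, matching_review_count), best match first.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for hit in hits:
        grouped[hit["product"]].append(hit["distance"])
    summary = [(name, min(d), len(d)) for name, d in grouped.items()]
    # Rank by the closest review, breaking ties by number of matching reviews.
    return sorted(summary, key=lambda row: (row[1], -row[2]))
```

Ranking by the single closest review is only one option; averaging distances or weighting by review count would be equally reasonable choices.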

💥 Generative Search

After translating the query to GraphQL and retrieving the most semantically relevant products, we enhance the demo with a feature called Generative Search. We take the top five results and use an LLM to generate a product summary. This concise summary gives a brief overview of the products and highlights their pros and cons. Each summary is written around the query, so it reflects what the user actually asked for.
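
As a rough illustration of the input to that summarization step, the sketch below assembles a prompt from the top five results. The prompt wording and the `name` and `review` keys are assumptions; the demo's real prompt may look quite different.

```python
from typing import List

TOP_K = 5  # the demo summarizes the top five results


def build_summary_prompt(query: str, products: List[dict]) -> str:
    """Assemble an LLM prompt asking for a query-focused product summary."""
    lines = [
        f"User query: {query}",
        "Summarize the products below, with pros and cons, based only on these reviews.",
        "",
    ]
    # Only the first TOP_K products are included in the prompt.
    for i, product in enumerate(products[:TOP_K], start=1):
        lines.append(f"{i}. {product['name']}: {product['review']}")
    return "\n".join(lines)
```

Including the user query in the prompt is what ties each summary to the search that produced it.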

🔥 Semantic Cache

We embed the generated results and queries in Weaviate and use it as a Semantic Cache. This lets the demo return cached results for queries that are semantically similar to a new query. For example, "good for joint pain" and "helpful for joint pain" are semantically very similar and should return the same results, whereas "bad for joint pain" should have its own generated result. This approach lets us reuse generated results far more often than traditional string matching would permit. It's a simple yet effective way to make the search process more efficient.
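
The idea can be sketched with an in-memory stand-in for the vector database. This is not how the demo stores its cache (that lives in Weaviate); the `SemanticCache` class, the cosine-similarity lookup, and the 0.95 threshold are all assumptions chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """In-memory stand-in for a vector-database-backed cache."""

    def __init__(self, threshold: float = 0.95) -> None:
        self.threshold = threshold  # assumed cut-off; tune for your embeddings
        self._entries: List[Tuple[List[float], str]] = []

    def put(self, query_vector: Sequence[float], result: str) -> None:
        """Store a generated result under the embedding of its query."""
        self._entries.append((list(query_vector), result))

    def get(self, query_vector: Sequence[float]) -> Optional[str]:
        """Return the cached result of the most similar query, if close enough."""
        best_score, best_result = -1.0, None
        for vector, result in self._entries:
            score = cosine_similarity(query_vector, vector)
            if score > best_score:
                best_score, best_result = score, result
        return best_result if best_score >= self.threshold else None
```

The threshold decides how close two queries must be to share a result. Queries that differ by a single word can still embed close together, so the value needs to be checked against real query pairs.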

🔧 Template

This repository is designed to serve as a template - a starting point for your own projects with Weaviate. Take inspiration from how we've implemented certain features and feel free to enhance it in your own project. We welcome comments, ideas, and feedback. Embrace the open-source spirit!

💰 Large Language Model (LLM) Costs

This demonstration primarily uses OpenAI models for embedding supplement products, processing user queries, and generating summaries. By default, any costs associated with using these services will be billed to the access key that you provide.

If you prefer, you can replace the OpenAI models with those of any other large language model (LLM) provider. However, please be aware that completely changing the API will require further adjustments to the code.

Below, we provide a rough estimate of the costs involved in importing data to Weaviate. For a comprehensive understanding, please visit OpenAI's pricing page at https://openai.com/pricing.

Data Embedding Costs

We use the Ada v2 model for embedding data into the Weaviate cluster. At the time of writing this README, the model costs $0.0001 for every 1k tokens (note that approximately 4 characters equal 1 token). As a rough approximation, importing the dataset to Weaviate might cost around $0.002. However, we also provide the same dataset with pre-generated vectors, so you don't need to generate and pay for the product embeddings. The file is called dataset_100_supplements_with_vectors.json. The import script automatically detects whether the dataset contains the vector key or not.
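
The arithmetic behind such an estimate can be sketched as below, using the price and the characters-per-token rule quoted above. The function names and the record shape are hypothetical, and the character count in the usage note is an illustrative figure, not the measured size of the dataset.

```python
PRICE_PER_1K_TOKENS = 0.0001  # Ada v2 price quoted in this README (USD)
CHARS_PER_TOKEN = 4           # rough rule of thumb quoted in this README


def estimate_embedding_cost(total_chars: int) -> float:
    """Approximate embedding cost in USD for a given amount of text."""
    tokens = total_chars / CHARS_PER_TOKEN
    return tokens / 1000 * PRICE_PER_1K_TOKENS


def needs_embedding(record: dict) -> bool:
    """True if a dataset record lacks a pre-generated "vector" key."""
    return "vector" not in record
```

Under these assumptions, roughly 80,000 characters of text correspond to about 20,000 tokens, which comes to about $0.002.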

Query Construction and Summary Generation Costs

We use the GPT-4 model for building GraphQL queries and generating summaries. At the time of writing this README, this model costs $0.03/1k tokens for input and $0.06/1k tokens for output. The exact costs depend on the user query and the results returned by the GraphQL query. Please take these factors into account when calculating your expected costs. You can also change the model_name variable to gpt-3.5-turbo inside the api.py script in the backend folder. The GPT-3.5 model costs $0.0015/1k tokens for input and $0.002/1k tokens for output.
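
A per-request estimate follows directly from these prices. The sketch below uses the figures quoted above; the function name and the token counts in the usage note are hypothetical.

```python
# (input price, output price) in USD per 1k tokens, as quoted in this README.
PRICES = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0015, 0.002),
}


def estimate_request_cost(model_name: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate cost in USD of one LLM request."""
    input_price, output_price = PRICES[model_name]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price
```

For instance, a request with 1,000 input tokens and 500 output tokens would cost about $0.06 with gpt-4 and about $0.0025 with gpt-3.5-turbo.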

🛠️ Project Structure

The Healthsearch Demo is structured in three main components:

  1. A Weaviate database (either cluster hosted on WCS or local).
  2. A FastAPI endpoint facilitating communication between the LLM provider and database.
  3. An interactive React frontend for displaying the information.

Make sure you have Python (>=3.8.0) and Node (>=18.16.0) installed. We also support Docker and provide Dockerfiles for the setup.

🐳 Quickstart with Docker

You can use Docker to set up the demo with a single command! If you're not familiar with Docker, you can read more about it here (https://docker-curriculum.com/).

  1. Set environment variables:
  • The following environment variable needs to be set:
  • OPENAI_API_KEY=your-openai-api-key

Use the .env file inside the backend folder to set the variable (https://github.com/theskumar/python-dotenv). Note that if you're using the GPT-4 model (the default), your OpenAI key must have access to it. You can change the model_name variable to gpt-3.5-turbo inside the api.py script.

  2. Use docker compose:
  • docker-compose up
  3. Access the frontend on:
  • localhost:3000
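
If the key is missing, the failure may only show up on the first LLM call. A small guard like the one below surfaces the problem at startup instead. This is a sketch rather than code from the demo: `require_env` is a hypothetical helper, and it only reads variables that are already in the process environment.

```python
import os
from typing import Mapping


def require_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file in the backend folder")
    return value
```

Calling `require_env("OPENAI_API_KEY")` once when the backend starts would be one way to use it.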

📚 Getting Started

To get started with the Healthsearch Demo, please refer to the READMEs in the Frontend and Backend folders.

💡 Usage

Follow these steps to use the Healthsearch Demo:

  1. Set up the Weaviate database, FastAPI backend, and the React frontend by following the instructions in their respective READMEs.
  2. Launch the database, backend server, and the frontend application.
  3. Use the interactive frontend to input your natural language query related to a health condition or benefit.
  4. The frontend sends the query to the backend, which transforms the natural language query into a GraphQL query using the LLM.
  5. The backend sends the GraphQL query to the Weaviate database to fetch relevant reviews based on the user query.
  6. The frontend displays the results, allowing you to explore the supplements most semantically related to your health query.
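
Steps 4 to 6 can be sketched as a single backend function with the LLM and the database passed in as plain callables. The function and parameter names are hypothetical and the stubs stand in for the real services; the demo's FastAPI endpoint is not reproduced here.

```python
from typing import Callable, List


def handle_query(
    user_query: str,
    translate: Callable[[str], str],
    run_graphql: Callable[[str], List[dict]],
) -> dict:
    """Translate the query, fetch matching results, and return them for display."""
    graphql = translate(user_query)   # step 4: the LLM turns text into GraphQL
    results = run_graphql(graphql)    # step 5: the database executes the query
    return {"query": user_query, "graphql": graphql, "results": results}  # step 6


# Stubs standing in for the LLM provider and the Weaviate database.
def fake_translate(text: str) -> str:
    return "{ Get { Product(limit: 1) { name } } }"


def fake_run_graphql(graphql: str) -> List[dict]:
    return [{"name": "Example supplement"}]


response = handle_query("good for joint pain", fake_translate, fake_run_graphql)
```

Passing the two services in as arguments keeps the flow testable without an OpenAI key or a running database.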

💖 Open Source Contribution

Your contributions are always welcome! Feel free to share ideas and feedback, or to open issues and bug reports if you find any problems. Please adhere to the code guidelines, which cover formatting, linting, and testing.
