  • Stars: 141
  • Rank: 250,703 (Top 6%)
  • Language: TypeScript
  • License: MIT License
  • Created 10 months ago
  • Updated 9 months ago

Repository Details

Discover Healthsearch: Unlocking Health with Semantic Search ✨

Healthsearch

Welcome to the Healthsearch Demo, an open-source project aimed at showcasing the potential of leveraging user-written reviews and queries to retrieve supplement products based on specific health effects.

🎯 Overview

The search functionality in this demo accepts natural language queries that are translated into GraphQL queries using LLMs. These GraphQL queries are then utilized to retrieve supplements from a Weaviate database. The demo also exhibits an example of generative search by providing product summaries generated based on the retrieved objects.

⚠️ Disclaimer: Healthsearch is a technical demonstration, and the results shown should not be treated as health advice. The results and generated summaries are purely based on user-written reviews.

💡 Natural Language Translation to GraphQL

We use Large Language Models (LLMs), such as GPT-4, to translate natural language queries into a structured query format: a GraphQL query. The demo extracts information about filters, sorting, and limits directly from the context of the query. Whether the query is "the top 10 products for glowing skin", "products for sleep from a specific brand", or "best-rated products for toothache", the demo can interpret it and generate an appropriate GraphQL query in return.
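As a rough illustration of the target format, the sketch below renders a Weaviate-style Get query from fields an LLM might extract. The ParsedQuery shape, the Product class, and the name, brand, and rating properties are assumptions made for this example, not the demo's actual schema, and sorting is left out for brevity.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedQuery:
    """Fields an LLM might extract from a natural language query (hypothetical shape)."""
    concept: str                 # text to search for semantically
    limit: int = 5               # maximum number of products to return
    brand: Optional[str] = None  # optional exact-match filter


def build_graphql(parsed: ParsedQuery, class_name: str = "Product") -> str:
    """Render a Weaviate-style Get query; class and property names are assumptions."""
    args = [
        # json.dumps quotes and escapes the user-provided text
        f"nearText: {{concepts: [{json.dumps(parsed.concept)}]}}",
        f"limit: {parsed.limit}",
    ]
    if parsed.brand is not None:
        args.append(
            f'where: {{path: ["brand"], operator: Equal, valueText: {json.dumps(parsed.brand)}}}'
        )
    joined = "\n      ".join(args)
    return (
        "{\n"
        "  Get {\n"
        f"    {class_name}(\n      {joined}\n    ) {{\n"
        "      name\n"
        "      brand\n"
        "      rating\n"
        "    }\n"
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    print(build_graphql(ParsedQuery(concept="glowing skin", limit=10)))
```

In the demo itself the LLM produces the query; a builder like this is only one way to picture how "top 10" becomes a limit and "from a specific brand" becomes a where filter.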

🔎 Semantic Search

Healthsearch relies on the power of semantic search in user reviews. When seeking products that are good for joint pain, for instance, Healthsearch scans user reviews for discussions on products that have alleviated joint pain or similar conditions. The results are then aggregated and grouped according to their respective products.
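The grouping step can be pictured with a minimal sketch. It assumes each review hit arrives as a (product name, distance) pair where a lower distance means a closer semantic match; that shape, the function name, and the sample products are invented for illustration and are not taken from the demo's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_product(hits: List[Tuple[str, float]]) -> List[Tuple[str, float, int]]:
    """Group review hits by product.

    Returns (product, best_distance, matching_review_count) rows,
    sorted so the product with the closest matching review comes first.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for product, distance in hits:
        grouped[product].append(distance)
    rows = [(product, min(ds), len(ds)) for product, ds in grouped.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Invented sample hits for a query such as "good for joint pain"
    sample = [("Magnesium", 0.2), ("Turmeric", 0.1), ("Magnesium", 0.15)]
    for row in group_by_product(sample):
        print(row)
```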

💥 Generative Search

After translating the query to GraphQL and retrieving the most semantically relevant products, the demo adds a feature called Generative Search. We take the top five results and use an LLM to generate a product summary. This concise summary gives a brief overview of the products and highlights their pros and cons. Each summary is written around the query, so it is tailored to that specific search.
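A minimal sketch of how such a summary request could be assembled is shown here. The prompt wording, the name and review_summary keys, and the function name are hypothetical; only the idea of restricting the input to the top five results comes from the description above. No LLM call is made.

```python
from typing import Dict, List


def build_summary_prompt(query: str, products: List[Dict[str, str]], top_k: int = 5) -> str:
    """Build a prompt from the first top_k products (assumed to be sorted by relevance)."""
    lines = [f"- {p['name']}: {p['review_summary']}" for p in products[:top_k]]
    return (
        f"User query: {query}\n"
        "Summarize the pros and cons of these products based only on the reviews:\n"
        + "\n".join(lines)
    )


if __name__ == "__main__":
    # Invented placeholder products
    products = [{"name": f"Product {i}", "review_summary": "sample review text"} for i in range(7)]
    print(build_summary_prompt("good for joint pain", products))
```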

🔥 Semantic Cache

We embed the generated results and queries, store them in Weaviate, and use it as a Semantic Cache. This lets the demo return cached results for queries that are semantically similar to a new query. For example, "good for joint pain" and "helpful for joint pain" are semantically very similar and should return the same results, whereas "bad for joint pain" should have its own generated result. This approach gets much more out of generated results than traditional string matching would permit. It's a simple yet potent solution that makes the search process more efficient.
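The idea can be sketched in memory, without Weaviate. The example assumes query embeddings are plain lists of floats and that a cosine-similarity threshold of 0.9 decides what counts as "semantically similar"; both the class and the threshold are illustrative choices, not values taken from the demo.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy in-memory cache keyed by query embedding (threshold is an assumption)."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Sequence[float], str]] = []

    def add(self, query_vector: Sequence[float], result: str) -> None:
        self.entries.append((query_vector, result))

    def lookup(self, query_vector: Sequence[float]) -> Optional[str]:
        """Return the cached result of the most similar query, or None on a miss."""
        best_score, best_result = 0.0, None
        for vector, result in self.entries:
            score = cosine_similarity(query_vector, vector)
            if score > best_score:
                best_score, best_result = score, result
        return best_result if best_score >= self.threshold else None


if __name__ == "__main__":
    cache = SemanticCache()
    cache.add([1.0, 0.0], "cached summary")
    print(cache.lookup([0.99, 0.1]))  # near-identical direction: cache hit
    print(cache.lookup([0.0, 1.0]))   # orthogonal direction: cache miss
```

Whether two real queries land above or below the threshold depends entirely on the embedding model, so the threshold would need tuning against actual query pairs.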

🔧 Template

This repository is designed to serve as a template - a starting point for your own projects with Weaviate. Take inspiration from how we've implemented certain features and feel free to enhance it in your own project. We welcome comments, ideas, and feedback. Embrace the open-source spirit!

💰 Large Language Model (LLM) Costs

This demonstration primarily uses OpenAI models for embedding supplement products, processing user queries, and generating summaries. By default, any costs associated with using these services will be billed to the access key that you provide.

If you prefer, you can replace the OpenAI models with any other Large Language Model (LLM) provider. However, please be aware that completely changing the API will require further adjustments to the code.

Below, we provide a rough estimate of the costs involved in importing data to Weaviate. For a comprehensive understanding, please visit OpenAI's pricing page at https://openai.com/pricing.

Data Embedding Costs

We use the Ada v2 model to embed data into the Weaviate cluster. At the time of writing this README, the model costs $0.0001 per 1k tokens (approximately 4 characters equal 1 token). As a rough approximation, importing the dataset to Weaviate might cost around $0.002. However, we also provide the same dataset with pre-generated vectors, so you do not need to generate and pay for the product embeddings. The file is called dataset_100_supplements_with_vectors.json. The import script automatically detects whether the dataset contains the vector key.
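The arithmetic behind this estimate can be written out as a small helper. The price and the 4-characters-per-token ratio are the figures quoted above; the 80,000-character input is an illustrative number chosen because it reproduces the rough $0.002 estimate, not a measured size of the dataset.

```python
ADA_V2_PRICE_PER_1K_TOKENS = 0.0001  # USD, as quoted in this README
CHARS_PER_TOKEN = 4                  # rough approximation quoted in this README


def estimate_embedding_cost(num_chars: int) -> float:
    """Estimate the embedding cost in USD for a text of num_chars characters."""
    tokens = num_chars / CHARS_PER_TOKEN
    return tokens / 1000 * ADA_V2_PRICE_PER_1K_TOKENS


if __name__ == "__main__":
    # 80,000 characters is about 20,000 tokens
    print(f"${estimate_embedding_cost(80_000):.4f}")
```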

Query Construction and Summary Generation Costs

We use the GPT-4 model to build GraphQL queries and generate summaries. At the time of writing this README, this model costs $0.03/1k tokens for input and $0.06/1k tokens for output. The exact costs depend on the user query and the results returned by the GraphQL query, so take these factors into account when estimating your costs. You can also change the model_name variable to gpt-3.5-turbo inside the api.py script in the backend folder. The GPT-3.5 Turbo model costs $0.0015/1k tokens for input and $0.002/1k tokens for output.
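To compare the two models for a single request, a sketch like the following can help. The prices are the ones quoted above; the token counts in the usage example are made up, since real counts depend on the query and on the retrieved results.

```python
# (input price, output price) in USD per 1k tokens, as quoted in this README
PRICES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0015, 0.002),
}


def estimate_query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request for the given token counts."""
    input_price, output_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price


if __name__ == "__main__":
    # Hypothetical request: 1,000 input tokens and 500 output tokens
    for model in PRICES_PER_1K:
        print(model, f"${estimate_query_cost(model, 1000, 500):.4f}")
```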

🛠️ Project Structure

The Healthsearch Demo is structured in three main components:

  1. A Weaviate database (either a cluster hosted on WCS or a local instance).
  2. A FastAPI endpoint facilitating communication between the LLM provider and database.
  3. An interactive React frontend for displaying the information.

Make sure you have Python (>=3.8.0) and Node (>=18.16.0) installed. We also support Docker and provide Dockerfiles for the setup.

🐳 Quickstart with Docker

You can use Docker to set up the demo with a single command. If you're not familiar with Docker, you can read more about it at https://docker-curriculum.com/.

  1. Set the environment variables:
  • The following environment variable needs to be set:
  • OPENAI_API_KEY=your-openai-api-key

Use the .env file inside the backend folder to set the variable (see https://github.com/theskumar/python-dotenv). If you're using the GPT-4 model (the default), ensure your OpenAI key has access to it. Otherwise, you can change the model_name variable to gpt-3.5-turbo inside the api.py script.

  2. Run Docker Compose:
  • docker-compose up
  3. Access the frontend at:
  • localhost:3000
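If the backend fails to start, a missing key is a likely cause. The following standard-library sketch shows one way to fail early with a clear message; the function name is hypothetical, and it checks an environment mapping directly instead of loading the .env file the way python-dotenv would.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return OPENAI_API_KEY from env, or raise if it is missing or still the placeholder."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == "your-openai-api-key":
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file in the backend folder")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as error:
        print(error)
```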

📚 Getting Started

To get started with the Healthsearch Demo, please refer to the READMEs in the Frontend and Backend folders.

💡 Usage

Follow these steps to use the Healthsearch Demo:

  1. Set up the Weaviate database, FastAPI backend, and the React frontend by following the instructions in their respective READMEs.
  2. Launch the database, backend server, and the frontend application.
  3. Use the interactive frontend to input your natural language query related to a health condition or benefit.
  4. The frontend sends the query to the backend, which transforms the natural language query into a GraphQL query using the LLM.
  5. The backend sends the GraphQL query to the Weaviate database to fetch relevant reviews based on the user query.
  6. The frontend displays the results, allowing you to explore the supplements most semantically related to your health-related query.
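Steps 4 to 6 can be sketched as one backend function. The translate and run_graphql callables stand in for the LLM call and the Weaviate request; they are hypothetical placeholders injected as arguments so the flow can be read (and tested) without either service, and the returned dictionary shape is an assumption.

```python
from typing import Callable, Dict, List


def handle_query(
    query: str,
    translate: Callable[[str], str],
    run_graphql: Callable[[str], List[Dict]],
) -> Dict:
    """Translate a natural language query, run it, and package the results."""
    graphql = translate(query)        # step 4: LLM turns the query into GraphQL
    products = run_graphql(graphql)   # step 5: database returns matching products
    return {"query": query, "graphql": graphql, "results": products}  # step 6: payload for the frontend


if __name__ == "__main__":
    # Stub implementations for illustration only
    response = handle_query(
        "good for joint pain",
        translate=lambda q: "{ Get { Product { name } } }",
        run_graphql=lambda g: [{"name": "Example supplement"}],
    )
    print(response["results"])
```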

💖 Open Source Contribution

Your contributions are always welcome! Feel free to share ideas and feedback, or to open issues and bug reports. Please adhere to the code guidelines, which cover formatting, linting, and testing.
