Biblos - Bible Exploration with Vector Search and Summarization
Biblos allows semantic search and summarization of Bible passages using state-of-the-art NLP techniques:
- Vector search over the entire Bible text using Chroma and instructor-large embeddings
- Summarization of search results using Anthropic's Claude large language model
This enables semantic search over biblical texts to find related passages, along with high-quality summaries of the relationships between verses on a given topic.
Features
- Semantic search over the entire Bible text
- Summarization of search results using the Claude LLM
- Web UI built with Streamlit for easy exploration
- Leverages Chroma for vector search over instructor-large embeddings
- Modular design that allows components such as the vector database, embedding model, and LLM to be swapped
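One way to picture the modular design is as a set of small interfaces that each component satisfies. The sketch below is illustrative only: the names `Retriever`, `Summarizer`, `KeywordRetriever`, `EchoSummarizer`, and `answer` are hypothetical and are not taken from the Biblos codebase, and the toy classes stand in for Chroma and Claude.

```python
from typing import Protocol


class Retriever(Protocol):
    """Hypothetical interface for anything that can fetch passages."""

    def search(self, query: str, k: int) -> list[str]: ...


class Summarizer(Protocol):
    """Hypothetical interface for anything that can summarize passages."""

    def summarize(self, topic: str, passages: list[str]) -> str: ...


class KeywordRetriever:
    """Toy stand-in for a vector database: matches on shared words."""

    def __init__(self, verses: list[str]) -> None:
        self.verses = verses

    def search(self, query: str, k: int) -> list[str]:
        words = set(query.lower().split())
        hits = [v for v in self.verses if words & set(v.lower().split())]
        return hits[:k]


class EchoSummarizer:
    """Toy stand-in for an LLM: reports how many passages it received."""

    def summarize(self, topic: str, passages: list[str]) -> str:
        return f"{len(passages)} passage(s) found on '{topic}'"


def answer(topic: str, retriever: Retriever, summarizer: Summarizer, k: int = 3) -> str:
    # Depends only on the two interfaces, so either component can be swapped.
    return summarizer.summarize(topic, retriever.search(topic, k))
```

Because `answer` only relies on the `search` and `summarize` methods, replacing the toy classes with a real vector store or LLM client would not require changing the calling code.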
Architecture
Biblos follows a RAG (Retrieval Augmented Generation) architecture:
- Bible text is indexed in a Chroma vector database using sentence embeddings
- User searches for a topic, and relevant passages are retrieved by semantic similarity
- Top results are collated and passed to Claude to generate a summary
This combines the strengths of dense vector search for retrieval with a powerful LLM for summarization.
The UI is built using Streamlit for easy exploration, with Python code modularized for maintainability.
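The retrieve-then-summarize flow described above can be sketched in plain Python. This is a toy under stated assumptions, not the Biblos implementation: word-count vectors stand in for instructor-large embeddings, an in-memory dictionary stands in for the Chroma database, and the sketch stops at building the prompt instead of calling Claude. The names `embed`, `cosine`, `retrieve`, `build_prompt`, and `VERSES` are hypothetical.

```python
import math
import re
from collections import Counter

# Three King James Version verses used as a tiny illustrative corpus.
VERSES = {
    "Genesis 1:1": "In the beginning God created the heaven and the earth.",
    "Psalm 23:1": "The LORD is my shepherd; I shall not want.",
    "John 11:35": "Jesus wept.",
}


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a stand-in for real sentence vectors."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, verses: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (reference, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(verses.items(), key=lambda item: cosine(q, embed(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(topic: str, passages: list[tuple[str, str]]) -> str:
    """Collate retrieved passages into a single summarization prompt."""
    body = "\n".join(f"{ref}: {text}" for ref, text in passages)
    return f"Summarize what these passages say about '{topic}':\n{body}"
```

In a real pipeline, the string returned by `build_prompt` would be sent to the LLM, and `retrieve` would be replaced by a similarity query against the vector database.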
Running Biblos
To run Biblos locally:
- Install the requirements:
pip install -r requirements.txt
- (Optional) Download the embedding model and preprocess the Bible text into a Chroma database. If you skip this step, the app uses the default embedding database that comes with the application:
cd data
python create_db.py
cd ..
Note: This can take a long time (approximately 18 minutes on an M1 MacBook Pro).
- Obtain an Anthropic API key and set it as the ANTHROPIC_API_KEY environment variable:
export ANTHROPIC_API_KEY=your_api_key
- Launch the Streamlit app:
streamlit run app.py
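If the app fails at startup, a missing API key is a likely cause. The following is a minimal sketch of a check that could run before any call to Claude; the helper name `get_api_key` is hypothetical and is not part of Biblos, and only the `ANTHROPIC_API_KEY` variable name comes from the steps above.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Anthropic API key, or raise a readable error if it is unset."""
    # Default to the process environment; a mapping can be passed in for testing.
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it before launching the app."
        )
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised later, in the middle of a search.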
Credits
Biblos builds on the following projects and services:
- LangChain - Building applications with LLMs through composability
- Chroma - Vector similarity search
- Anthropic - Claude summarization model
- instructor-large Embeddings - Text embeddings
- Streamlit - Web UI