  • Stars
    314
  • Rank 132,608 (Top 3 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 1 year ago
  • Updated 8 months ago

Repository Details

Ask questions about any data source by leveraging langchains

DataChad V2 🤖

This is an app that lets you ask questions about any data source by leveraging embeddings, vector databases, large language models and, last but not least, langchains.

How does it work?

  1. Upload any file(s) or enter any path or URL
  2. The data source is detected and loaded into text documents
  3. The text documents are embedded using OpenAI embeddings
  4. The embeddings are stored as a vector dataset in Activeloop's database hub
  5. A langchain is created consisting of an LLM (gpt-3.5-turbo by default) and the vector store as retriever
  6. When you ask the app a question, the chain embeds the input prompt, runs a similarity search in the vector store and uses the best results as context for the LLM to generate an appropriate response
  7. Finally, the chat history is cached locally to enable a ChatGPT-like Q&A conversation
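
Steps 3 to 6 can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for OpenAI embeddings, `ToyVectorStore` replaces the Activeloop dataset with an in-memory list, and `build_prompt` only assembles the text an LLM would receive instead of calling one. None of these names come from the DataChad code base.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, documents: list[str]) -> None:
        # Embed each document once and keep the vector next to the text.
        for doc in documents:
            self.items.append((doc, embed(doc)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Embed the query and return the k most similar documents.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str, context: list[str], history: list[tuple[str, str]]) -> str:
    """Assemble retrieved context, prior Q&A turns and the new question into one prompt."""
    lines = ["Answer the question using only the context below.", "", "Context:"]
    lines += [f"- {chunk}" for chunk in context]
    if history:
        lines += ["", "Chat history:"]
        lines += [f"Q: {q}\nA: {a}" for q, a in history]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


docs = [
    "The app only runs on Python 3.10 or newer.",
    "Embeddings are stored as a vector dataset.",
    "Chat history is cached locally.",
]
store = ToyVectorStore()
store.add(docs)

question = "Which Python version does the app need?"
context = store.search(question, k=1)
prompt = build_prompt(question, context, history=[])
print(prompt)
```

In a real deployment the prompt would be sent to the LLM, and the resulting question and answer pair would be appended to `history` so that follow-up questions keep their conversational context (step 7).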

Good to know

  • The app only runs on Python >= 3.10!
  • By default, this git repository is used as context, so you can start asking questions about its functionality right away without choosing your own data source.
  • To run locally or deploy somewhere, execute cp .env.template .env and set credentials in the newly created .env file. Alternatively, set system environment variables manually, or store the credentials in .streamlit/secrets.toml when hosting via Streamlit.
  • If you have set credentials as explained above, you can just hit submit in the authentication step without re-entering your credentials in the app.
  • Your data won't load? Feel free to open an Issue or PR and contribute!
  • Yes, Chad in DataChad refers to the well-known meme
  • DataChad V2 does not support local mode, but many features will come soon. Stay tuned!
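
The credential lookup described above could be sketched roughly as follows. This is not the app's actual loading code: the helper names `parse_env_file` and `resolve_credentials` are hypothetical, the variable names `OPENAI_API_KEY` and `ACTIVELOOP_TOKEN` are assumed rather than taken from .env.template, and the precedence (system environment first, then .env) is a design choice of this sketch.

```python
import os
from pathlib import Path


def parse_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    file = Path(path)
    if not file.is_file():
        return values
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_credentials(names, env_path=".env", environ=None):
    """Look up each name in the system environment first, then in the .env file."""
    environ = os.environ if environ is None else environ
    from_file = parse_env_file(env_path)
    return {name: environ.get(name) or from_file.get(name) for name in names}


# Assumed variable names, for illustration only.
credentials = resolve_credentials(["OPENAI_API_KEY", "ACTIVELOOP_TOKEN"])
missing = [name for name, value in credentials.items() if value is None]
print("Missing credentials:", missing or "none")
```

Printing only the names of missing variables, never their values, keeps secrets out of logs.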

What does it look like?

TODO LIST

If you would like to contribute, feel free to grab any task

  • Refactor utils, especially the loaders
  • Add option to choose model and embeddings
  • Enable fully local / private mode
  • Add option to upload multiple files to a single dataset
  • Decouple datachad modules from streamlit
  • Remove all local mode and other V1 stuff
  • Load existing knowledge bases
  • Delete existing knowledge bases
  • Enable streaming responses
  • Show retrieved context
  • Refactor UI
  • Introduce smart FAQs
  • Replace downloaded file storage with tempfile
  • Add user creation and login
  • Add chat history per user
  • Make all I/O asynchronous
  • Implement FastAPI routes and backend app
  • Implement a proper frontend (React or whatever)
  • Containerize the app

More Repositories

  1. realtime_object_detection (Python, 280 stars): Plug and Play Real-Time Object Detection App with Tensorflow and OpenCV
  2. realtime_segmenation (Python, 84 stars): Realtime Semantic Segmentation for Mobile Platforms based on Tensorflow's DeepLab Model
  3. whatsbot (Python, 64 stars): Python Flask app serving as a webhook for WhatsApp Business accounts that sends prompts to the OpenAI API
  4. AttractiveNet (Jupyter Notebook, 35 stars): AttractiveNet - Regressing on Facial Attractiveness with Neural Networks - An End-to-End Deep Learning Tutorial in Python.
  5. deeptraining_hands (Python, 31 stars): Dataset, necessary Scripts and trained SSD Model for detecting Hands in Realtime.
  6. yolo_for_tf_od_api (Python, 24 stars): Files Added or Updated to be able to use yolo-darknet in Tensorflow's Object Detection API
  7. objectdetection_ros (Python, 14 stars): ROS Object Detection Package [based on github/gustavz/realtime_object_detection]
  8. jaivus (Python, 13 stars): JAIvus is a ChatGPT-powered personal assistant implemented in Python and deployed as a Streamlit app
  9. computer_vision (MATLAB, 8 stars): TUM Master Course "Computer Vision" Assignment Solutions
  10. setup_jetsontx2 (C++, 4 stars): Scripts and Information to set up my Jetson TX2 developer environment
  11. tf_training (Python, 4 stars): Training Repository for Tensorflow Models (Currently mask_rcnn_mobilenet_v1_coco)
  12. music_generator (Java, 3 stars): Android App for Music Generation. TUM Master Course Project
  13. roboSoccer_championship (C++, 3 stars): Winner Team C++ Code of TUM Robosoccer Championship 2016
  14. audio-to-text (Python, 3 stars): Streamlit app to transcribe audio to text using OpenAI's whisper library
  15. text-to-speech (Python, 1 star): A simple tool to create speech from text to use as overlay for demo videos
  16. eetfm_automation (Python, 1 star): eetfm_automation: Export and Evaluate TensorFlow Model Automation based on TensorFlow Object Detection API
  17. test_models (PureBasic, 1 star): Benchmark Testing Purpose: test model frozen graphs for my realtime_detection_api
  18. codewars_challanges (Python, 1 star): Completed Codewars Challenges
  19. streamlit_terminal (Python, 1 star): Security Warning: This app allows execution of arbitrary commands which can be used to compromise your system