  • Stars: 575
  • Rank: 77,622 (Top 2 %)
  • Language: Python
  • License: MIT License
  • Created almost 2 years ago
  • Updated about 1 year ago

Repository Details

Question answering system for PDF files

Ask my PDF

Thank you for your interest in my application. Please be aware that this is only a Proof of Concept system and may contain bugs or unfinished features. If you like this app you can ❤️ follow me on Twitter for news and updates.

Ask my PDF - Question answering system built on top of GPT3

🎲 The primary use case for this app is answering questions about board game rules based on the instruction manual. The app can be used for other tasks, but helping with board game rules is particularly meaningful to me since I'm an avid board game fan myself. This use case is also relatively harmless, even when the model hallucinates.

🌐 The app can be accessed on the Streamlit Community Cloud at https://ask-my-pdf.streamlit.app/. 🔑 To use it, you will need your own OpenAI API key.

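📄 The app implements the following academic papers:
  • In-Context Retrieval-Augmented Language Models, aka RALM
  • Precise Zero-Shot Dense Retrieval without Relevance Labels, aka HyDE (Hypothetical Document Embeddings)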

Installation

  1. Clone the repo:

    git clone https://github.com/mobarski/ask-my-pdf

  2. Install dependencies:

    pip install -r ask-my-pdf/requirements.txt

  3. Run the app:

    cd ask-my-pdf/src

    run.sh (Linux/macOS) or run.bat (Windows)

High-level documentation

RALM + HyDE

RALM + HyDE + context
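
The RALM + HyDE combination named above can be illustrated with a small sketch. This is not the app's actual code: `embed` is a toy bag-of-words stand-in for a real embedding model, `hypothetical_answer` is a stub for the language model call, and every function name here is hypothetical. The idea shown is that the question is first turned into a plausible draft answer (HyDE), the draft is used to find the closest chunks of the PDF, and those chunks are placed in the prompt ahead of the question (RALM).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hypothetical_answer(question):
    """Stub for the language model call that drafts a plausible answer (HyDE)."""
    canned = {
        "How many cards does each player draw?":
            "Each player draws five cards at the start of the game.",
    }
    return canned.get(question, question)  # fall back to the raw question


def retrieve(question, chunks, top_k=2):
    """Rank chunks by similarity to the drafted answer, not to the question."""
    query_vec = embed(hypothetical_answer(question))
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, chunks, top_k=2):
    """Put the retrieved chunks in front of the question (RALM)."""
    context = "\n".join(retrieve(question, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


chunks = [
    "Setup: shuffle the deck and deal five cards to each player.",
    "Scoring: the player with the most points after three rounds wins.",
    "Movement: a pawn moves one space per action point spent.",
]
print(build_prompt("How many cards does each player draw?", chunks, top_k=1))
```

In a real setup the prompt returned by `build_prompt` would be sent to the language model, and both stubs would be replaced by API calls.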

Environment variables used for configuration

General configuration:
  • STORAGE_SALT - cryptographic salt used when deriving the user/folder name and the encryption key from the API key; hexadecimal notation, 2-16 characters

  • STORAGE_MODE - index storage mode: S3, LOCAL, DICT (default)

  • STATS_MODE - usage stats storage mode: REDIS, DICT (default)

  • FEEDBACK_MODE - user feedback storage mode: REDIS, NONE (default)

  • CACHE_MODE - embeddings cache mode: S3, DISK, NONE (default)
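
A minimal sketch of how the general settings above could be read and checked. The allowed values and defaults come from the list; the function names are hypothetical, and the derivation step is only an assumption: PBKDF2-HMAC-SHA256 is used here as a stand-in, since the algorithm the app actually uses is not described in this document.

```python
import hashlib
import os
import string

# Allowed values and defaults, copied from the list above.
MODES = {
    "STORAGE_MODE": ({"S3", "LOCAL", "DICT"}, "DICT"),
    "STATS_MODE": ({"REDIS", "DICT"}, "DICT"),
    "FEEDBACK_MODE": ({"REDIS", "NONE"}, "NONE"),
    "CACHE_MODE": ({"S3", "DISK", "NONE"}, "NONE"),
}


def read_modes(env=os.environ):
    """Return the four mode settings, falling back to the documented defaults."""
    modes = {}
    for name, (allowed, default) in MODES.items():
        value = env.get(name, default)
        if value not in allowed:
            raise ValueError(f"{name}={value!r}, expected one of {sorted(allowed)}")
        modes[name] = value
    return modes


def check_salt(salt):
    """Check the documented STORAGE_SALT shape: 2-16 hexadecimal characters."""
    if not 2 <= len(salt) <= 16 or any(c not in string.hexdigits for c in salt):
        raise ValueError("STORAGE_SALT must be 2-16 hexadecimal characters")
    return salt


def derive_folder_and_key(api_key, salt):
    """ASSUMPTION: PBKDF2-HMAC-SHA256 stands in for the app's real derivation."""
    material = hashlib.pbkdf2_hmac(
        "sha256", api_key.encode(), check_salt(salt).encode(), 100_000, dklen=48
    )
    return material[:16].hex(), material[16:]  # (folder name, 32-byte key)
```

For example, `read_modes({"STORAGE_MODE": "LOCAL"})` returns `LOCAL` for storage and the defaults for the other three settings, while an unknown value raises `ValueError`.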

Local filesystem configuration (storage / cache):
  • STORAGE_PATH - directory path for index storage

  • CACHE_PATH - directory path for embeddings cache

S3 configuration (storage / cache):
  • S3_REGION - region code

  • S3_BUCKET - bucket name (storage)

  • S3_SECRET - secret key

  • S3_KEY - access key

  • S3_URL - URL

  • S3_PREFIX - object name prefix

  • S3_CACHE_BUCKET - bucket name (cache)

  • S3_CACHE_PREFIX - object name prefix (cache)
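
As an illustration of how the S3 variables above fit together, the sketch below collects them into keyword arguments named after the parameters of `boto3.client()`. The helper names are hypothetical, and joining the prefix and the object name with `/` is an assumption; the app may build object names differently.

```python
import os


def s3_settings(env=os.environ, cache=False):
    """Return (client kwargs, bucket, prefix) for index storage or the cache."""
    client_kwargs = {
        "region_name": env.get("S3_REGION"),
        "endpoint_url": env.get("S3_URL"),
        "aws_access_key_id": env.get("S3_KEY"),
        "aws_secret_access_key": env.get("S3_SECRET"),
    }
    # Drop unset values so the client library can fall back to its own defaults.
    client_kwargs = {k: v for k, v in client_kwargs.items() if v}
    bucket = env.get("S3_CACHE_BUCKET" if cache else "S3_BUCKET")
    prefix = env.get("S3_CACHE_PREFIX" if cache else "S3_PREFIX", "")
    return client_kwargs, bucket, prefix


def object_key(prefix, name):
    """ASSUMPTION: prefix and object name are joined with a single slash."""
    return f"{prefix.rstrip('/')}/{name}" if prefix else name
```

With boto3 installed, the first element of the returned tuple could be passed on as `boto3.client("s3", **client_kwargs)`.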

Redis configuration (for persistent usage statistics / user feedback):
  • REDIS_URL - Redis DB URL (redis[s]://:password@host:port/[db])

Community version related options:
  • OPENAI_KEY - API key used for the default user

  • COMMUNITY_DAILY_USD - default user's daily budget

  • COMMUNITY_USER - default user's code
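
The REDIS_URL format listed under the Redis configuration can be unpacked with the standard library, which is a handy way to check a value before starting the app. This is a sketch with a hypothetical helper name; the fallbacks to port 6379 and database 0 follow the usual Redis conventions rather than anything stated in this document.

```python
from urllib.parse import urlsplit


def parse_redis_url(url):
    """Split redis[s]://:password@host:port/[db] into its parts."""
    parts = urlsplit(url)
    if parts.scheme not in ("redis", "rediss"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    db = parts.path.lstrip("/")
    return {
        "tls": parts.scheme == "rediss",  # rediss:// means a TLS connection
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 6379,       # conventional default Redis port
        "db": int(db) if db else 0,       # conventional default database
    }


print(parse_redis_url("rediss://:secret@example.com:6380/2"))
```

Client libraries usually accept the URL directly, for example `redis.Redis.from_url(url)` in redis-py, so parsing by hand is mainly useful for validation and logging (with the password masked).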

More Repositories

  1. alpaca-libre (Python, 118 stars) - Reimplementation of the task generation part from the Alpaca paper
  2. aidapter (Python, 18 stars) - Adapter / facade for language models (OpenAI, Anthropic, Cohere, local transformers, etc)
  3. ai-bricks (Python, 9 stars) - AI adapters / facade
  4. tkv (Python, 3 stars) - Table-Key-Value adapter for various db-engines: SQLite, Redis, MongoDB, Snowflake, DuckDB, ...
  5. thorvald (Go, 3 stars) - Similarity calculation engine for unary data.
  6. bench (V, 2 stars) - Lean micro-benchmarking framework for the V language
  7. morty (C, 2 stars) - Morty programming language, Morty virtual machine and MortyVM assembler
  8. vimes (C, 2 stars) - Virtual Machines Experimentation Sandbox
  9. vimes2 (Nim, 2 stars) - Virtual Machines Experimentation Sandbox 2
  10. st_repl_connection (Python, 1 star) - Connect Streamlit to local REPL applications
  11. fabris (C, 1 star) - Fabris Programming Language
  12. clean-room (Python, 1 star) - Data Clean Room utilities for probabilistic information exchange.
  13. faraway (Python, 1 star) - Remote Hadoop operations via SSH
  14. itsy (JavaScript, 1 star) - Minimalistic fantasy console API for JS
  15. hike (Python, 1 star) - Hike is a library for automatically generating command line interfaces (CLIs) from Python scripts allowing selection and reordering of steps to run.
  16. smol (JavaScript, 1 star) - Smol is a minimal register-based virtual machine and assembly language designed for building simple games and applications.
  17. kraken (Python, 1 star) - Contextual Bandit Engine
  18. st_redis_connection (Python, 1 star) - Connect to Redis and other compatible databases (KeyDB, DragonflyDB, LedisDB, SSDB, ARDB) from your Streamlit app.
  19. inverness (Python, 1 star) - Natural Language Processing framework built on top of gensim and nmslib.