


local_llama

Interested in chatting with your PDFs entirely offline and free from OpenAI dependencies? Then you're in the right place. I made my other project, gpt_chatwithPDF, with the ultimate goal of local_llama in mind. This repo provides the same functionality as that project, but runs locally and can even be used in airplane mode. Drop a star if you like it!

Video demo here: https://www.reddit.com/user/Jl_btdipsbro/comments/13n6hbz/local_llama/?utm_source=share&utm_medium=ios_app&utm_name=ioscss&utm_content=2&utm_term=1

DISCLAIMER: This is an experimental repo, not an end-all solution. It is meant as a step toward the many use cases for local, offline use of LLMs.

Installation

On Windows you need Visual Studio with a C compiler installed. You also need a model; I used GPT4All-13B-snoozy.ggml.q4_0.bin, but any ggml model should work. I downloaded the model from https://huggingface.co/TheBloke/GPT4All-13B-snoozy-GGML/tree/main. Then install the Python dependencies with pip install -r requirements.txt
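
Once the dependencies and model are in place, a quick way to confirm everything loads is a minimal smoke test like the sketch below. This assumes the project relies on the llama-cpp-python bindings (the llama_print_timings output further down suggests llama.cpp under the hood) and that the .bin file sits in the project root; the actual loading code in local_llama.py may differ.

from llama_cpp import Llama

# Load the downloaded ggml model; adjust the path to wherever you saved it.
# Note: ggml-format files need an older llama-cpp-python release; newer
# releases expect gguf models instead.
llm = Llama(model_path="./GPT4All-13B-snoozy.ggml.q4_0.bin", n_ctx=512)

# Single-question smoke test to confirm the model loads and generates text.
output = llm("Q: What does this repo do? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])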

Usage

Use the command python -m streamlit run "path/to/project/local_llama.py". This will start the app in your browser. Once you have uploaded your PDFs, refresh the browser, select a manual, and ask away!
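
For orientation, here is a stripped-down illustration of the upload-and-ask flow, not the actual local_llama.py (which indexes the uploaded manuals and retrieves relevant passages rather than stuffing raw text into the prompt). It assumes llama-cpp-python and pypdf are installed and the model file is next to the script; all names here are illustrative.

import streamlit as st
from pypdf import PdfReader
from llama_cpp import Llama

MODEL_PATH = "./GPT4All-13B-snoozy.ggml.q4_0.bin"  # adjust to your model location

@st.cache_resource
def load_model():
    # Load the ggml model once and reuse it across Streamlit reruns.
    return Llama(model_path=MODEL_PATH, n_ctx=2048)

st.title("Chat with your PDF (offline)")

uploaded = st.file_uploader("Upload a PDF", type="pdf")
question = st.text_input("Ask a question about the document")

if uploaded and question:
    # Naive approach: put the start of the PDF text directly into the prompt.
    reader = PdfReader(uploaded)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    prompt = f"Context:\n{text[:2000]}\n\nQuestion: {question}\nAnswer:"
    llm = load_model()
    result = llm(prompt, max_tokens=128, stop=["Question:"])
    st.write(result["choices"][0]["text"])

A sketch like this is launched the same way, e.g. python -m streamlit run sketch.py.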

Example CLI output showing inference times, running on my Alienware x14 with a 3060:

TIMES
/GPT4All-13B-snoozy.ggml.q4_0.bin
llama_model_load_internal: format = ggjt v2 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
llama_print_timings: load time = 21283.78 ms
llama_print_timings: sample time = 3.08 ms / 13 runs ( 0.24 ms per token)
llama_print_timings: prompt eval time = 21283.70 ms / 177 tokens ( 120.25 ms per token)
llama_print_timings: eval time = 2047.03 ms / 12 runs ( 170.59 ms per token)
llama_print_timings: total time = 24057.21 ms

History

Credits

TheBloke, for the GPT4All-13B-snoozy.ggml.q4_0.bin model used in this project.

License

Apache 2.0