RWKV Runner
This project aims to eliminate the barriers of using large language models by automating everything for you. All you need is a lightweight executable program of just a few megabytes. Additionally, this project provides an interface compatible with the OpenAI API, which means that every ChatGPT client is an RWKV client.
Install
Use Custom CUDA kernel to Accelerate
.
Default configs has enabled custom CUDA kernel acceleration, which is much faster and consumes much less VRAM. If you encounter possible compatibility issues, go to the Configs page and turn off v1.3.7_win.zip and letting it update automatically to the latest version, or add it to the trusted list (Windows Security
-> Virus & threat protection
-> Manage settings
-> Exclusions
-> Add or remove exclusions
-> Add an exclusion
-> Folder
-> RWKV-Runner
).
If Windows Defender claims this is a virus, you can try downloading For different tasks, adjusting API parameters can achieve better results. For example, for translation tasks, you can try setting Temperature to 1 and Top_P to 0.3.
Features
- RWKV model management and one-click startup
- Fully compatible with the OpenAI API, making every ChatGPT client an RWKV client. After starting the model, open http://127.0.0.1:8000/docs to view more details.
- Automatic dependency installation, requiring only a lightweight executable program
- Configs with 2G to 32G VRAM are included, works well on almost all computers
- User-friendly chat and completion interaction interface included
- Easy-to-understand and operate parameter configuration
- Built-in model conversion tool
- Built-in download management and remote model inspection
- Built-in one-click LoRA Finetune
- Can also be used as an OpenAI ChatGPT and GPT-Playground client
- Multilingual localization
- Theme switching
- Automatic updates
API Concurrency Stress Testing
ab -p body.json -T application/json -c 20 -n 100 -l http://127.0.0.1:8000/chat/completions
body.json:
{
"messages": [
{
"role": "user",
"content": "Hello"
}
]
}
Embeddings API Example
Note: v1.4.0 has improved the quality of embeddings API. The generated results are not compatible with previous versions. If you are using embeddings API to generate knowledge bases or similar, please regenerate.
If you are using langchain, just use OpenAIEmbeddings(openai_api_base="http://127.0.0.1:8000", openai_api_key="sk-")
import numpy as np
import requests
def cosine_similarity(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
values = [
"I am a girl",
"我是个女孩",
"私は女の子です",
"广东人爱吃福建人",
"我是个人类",
"I am a human",
"that dog is so cute",
"私はねこむすめです、にゃん♪",
"宇宙级特大事件!号外号外!"
]
embeddings = []
for v in values:
r = requests.post("http://127.0.0.1:8000/embeddings", json={"input": v})
embedding = r.json()["data"][0]["embedding"]
embeddings.append(embedding)
compared_embedding = embeddings[0]
embeddings_cos_sim = [cosine_similarity(compared_embedding, e) for e in embeddings]
for i in np.argsort(embeddings_cos_sim)[::-1]:
print(f"{embeddings_cos_sim[i]:.10f} - {values[i]}")
Related Repositories:
- RWKV-4-World: https://huggingface.co/BlinkDL/rwkv-4-world/tree/main
- RWKV-4-Raven: https://huggingface.co/BlinkDL/rwkv-4-raven/tree/main
- ChatRWKV: https://github.com/BlinkDL/ChatRWKV
- RWKV-LM: https://github.com/BlinkDL/RWKV-LM
- RWKV-LM-LoRA: https://github.com/Blealtan/RWKV-LM-LoRA
- MIDI-LLM-tokenizer: https://github.com/briansemrau/MIDI-LLM-tokenizer