BlinkDL/RWKV-CUDA

Stars: 211
Rank: 186,867 (Top 4%)
Language: CUDA
Created: over 2 years ago
Updated: 6 months ago

The CUDA version of the RWKV language model (https://github.com/BlinkDL/RWKV-LM)

Towards RWKV-4 (see the wkv folder)

I have a basic RWKV-4 kernel in the wkv folder. Let's optimize it.

Experiment 1 - depthwise_conv1d - 20x faster than PyTorch

The formula:

w.shape = (C, T)
k.shape = (B, C, T)
out.shape = (B, C, T)
out[b][c][t] = sum_u{ w[c][(T-1)-(t-u)] * k[b][c][u] }
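
The formula above can be checked against a slow reference implementation. The following is a minimal pure-Python sketch, not the repository's kernel. It assumes the sum runs over u = 0..t, since that is the only range for which the index (T-1)-(t-u) stays inside [0, T-1]; the function name depthwise_conv1d_ref is hypothetical.

```python
def depthwise_conv1d_ref(w, k):
    """Naive reference for out[b][c][t] = sum_u w[c][(T-1)-(t-u)] * k[b][c][u].

    w: nested list of shape (C, T)
    k: nested list of shape (B, C, T)
    Returns a nested list of shape (B, C, T).
    Assumption: u ranges over 0..t (causal), which keeps the w index in bounds.
    """
    B, C, T = len(k), len(k[0]), len(k[0][0])
    out = [[[0.0] * T for _ in range(C)] for _ in range(B)]
    for b in range(B):
        for c in range(C):
            for t in range(T):
                s = 0.0
                for u in range(t + 1):
                    s += w[c][(T - 1) - (t - u)] * k[b][c][u]
                out[b][c][t] = s
    return out


# Tiny example with B = 1, C = 1, T = 3.
w = [[1.0, 2.0, 3.0]]
k = [[[1.0, 1.0, 1.0]]]
print(depthwise_conv1d_ref(w, k))  # [[[3.0, 5.0, 6.0]]]
```

Each channel c is convolved only with its own weight row w[c], which is what makes the operation depthwise. A reference like this is far too slow for real sizes, but it is handy for validating a faster kernel on small inputs.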

PyTorch = fwd 94ms bwd 529ms

CUDA kernel v0 = fwd 45ms bwd 84ms (simple)

CUDA kernel v1 = fwd 17ms bwd 43ms (shared memory)

CUDA kernel v2 = fwd 13ms bwd 31ms (float4)

CUDA kernel v3 = fwd 3.4ms bwd 23ms (B-group)

More tests on an RTX 3090:

PyTorch = fwd 14ms bwd 65ms

CUDA kernel v3 = fwd 0.8ms bwd 5.5ms
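
To relate these timings to the "20x faster" headline, the speedups can be computed directly from the numbers listed above. This is a small arithmetic sketch only: the dictionary keys and the helper name speedups are hypothetical, and the GPU used for the first set of timings is not stated in the source.

```python
# Timings in milliseconds as (forward, backward), copied from the figures above.
timings = {
    "first set (GPU not stated)": {"pytorch": (94.0, 529.0), "v3": (3.4, 23.0)},
    "RTX 3090": {"pytorch": (14.0, 65.0), "v3": (0.8, 5.5)},
}


def speedups(baseline, kernel):
    """Return (forward, backward, forward+backward) speedup of kernel over baseline."""
    fwd = baseline[0] / kernel[0]
    bwd = baseline[1] / kernel[1]
    total = sum(baseline) / sum(kernel)
    return fwd, bwd, total


for name, t in timings.items():
    fwd, bwd, total = speedups(t["pytorch"], t["v3"])
    print(f"{name}: fwd {fwd:.1f}x, bwd {bwd:.1f}x, fwd+bwd {total:.1f}x")
```

On the first set this gives roughly 23.6x for forward plus backward combined, and about 12.5x on the RTX 3090 figures.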

How to use: run python run.py, which compiles everything for you (pip install Ninja if you don't have it).

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

ChatRWKV

ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model, and it is open source.

AI-Writer

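AI novel writing: generates xuanhuan (fantasy) and romance web fiction, among other genres. A Chinese pretrained generative model built on my RWKV model, similar to GPT-2. RWKV for Chinese novel generation.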

Hua

Hua is an AI image editor with Stable Diffusion (and more).

BlinkDL.github.io

A collection of State of the Art results in AI / ML / DL / RL / CV / NLP.

BlinkDL

A minimalist deep learning library in JavaScript using WebGL + asm.js. Run convolutional neural networks in your browser.

YYDZ

The Ding Zhen universe: a collection of "Yi Yan Ding Zhen" images, with more than two thousand pictures so far. The YYDZ (Yi Yan Ding Zhen / One Eye Ding Zhen) dataset.

RWKV-v2-RNN-Pile

RWKV-v2-RNN trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.

LinearAttentionArena

Here we will test various linear attention designs.

BookCNN

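The book "Deep Convolutional Networks: Principles and Practice" (《深度卷积网络：原理与实践》) is now on sale at Taobao, Tmall, JD.com, and Dangdang. This repository hosts the code from the book.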

SmallInitEmb

LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence

WorldModel

Let us make Psychohistory (as in Asimov) a reality, and accessible to everyone. Useful for LLM grounding and games / fiction / business / finance / governance, and can align agents with humans too.

LM-Trick-Questions

Here we collect trick questions and failed tasks for open source LLMs to improve them.

Basis

The Basis Programming Language

BlinkToDo

A minimalist ToDo.txt page. If your to-do list has more than a hundred items, try this minimalist txt-based task manager.

AntiAging

List of Anti-aging Research

RWKV.com

Nala

The Nala markup, to turn a "Natural Language" sentence into a code-like statement.

PathTracingJS

Path tracing demo with JS in your web browser.

BlinkColorTheme

A colorful theme for HTML+JS+CSS.

Model_Leaderboard

Leaderboard of AI models.

MathBook

A fairly systematic set of math notes (graduate level).

BasisLang.com