  • Language
    C++
  • License
    Apache License 2.0
  • Created about 1 year ago
  • Updated 7 months ago

Repository Details

A lightweight LLM inference framework

InferLLM

Chinese README

InferLLM is a lightweight LLM inference framework that mainly references and borrows from the llama.cpp project. llama.cpp puts almost all of its core code and kernels in a single file and uses a large number of macros, which makes it difficult for developers to read and modify. InferLLM has the following features:

  • Simple structure that is easy to learn and get started with; the framework part is decoupled from the kernel part.
  • High efficiency: most of the kernels from llama.cpp have been ported.
  • Defines a dedicated KVstorage type for easy caching and management.
  • Compatible with multiple model formats (currently only the Chinese and English alpaca int4 models are supported).
  • Currently supports CPU only, mainly Arm and x86 platforms, and can be deployed on mobile phones with acceptable speed.

In short, InferLLM is a simple and efficient CPU inference framework for LLMs that can run quantized models locally with good inference speed.

How to use

Download model

Currently, InferLLM uses the same models as llama.cpp, so models can be downloaded from the llama.cpp project. Models can also be downloaded directly from Hugging Face: kewin4933/InferLLM-Model. Two alpaca models are currently uploaded to that project: a Chinese int4 model and an English int4 model.

Compile InferLLM

Local compilation

mkdir build
cd build
cmake ..
make

Android cross compilation

For cross compilation, you can use the prepared tools/android_build.sh script. Install the NDK in advance and set the NDK_ROOT environment variable to the NDK path.

export NDK_ROOT=/path/to/ndk
./tools/android_build.sh
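
Because the build script depends on NDK_ROOT, it can help to verify the variable before starting a build. The following is a minimal sketch; the helper name check_ndk_root is hypothetical and is not part of InferLLM.

```shell
#!/bin/sh
# Hypothetical helper (not part of InferLLM): fail early when NDK_ROOT
# is unset or does not point to an existing directory.
check_ndk_root() {
    if [ -z "${NDK_ROOT:-}" ]; then
        echo "NDK_ROOT is not set" >&2
        return 1
    fi
    if [ ! -d "$NDK_ROOT" ]; then
        echo "NDK_ROOT is not a directory: $NDK_ROOT" >&2
        return 1
    fi
    return 0
}

# Example use, left commented out so the sketch has no side effects:
# check_ndk_root && ./tools/android_build.sh
```

With this in place, a wrong or missing path is reported before the cross compilation starts rather than partway through it.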

Run InferLLM

To run the ChatGLM model, please refer to the ChatGLM model documentation.

To run locally, execute ./chatglm -m chatglm-q4.bin -t 4 directly. To run on a mobile phone, use the adb command to copy the chatglm executable and the model file to the phone, and then execute adb shell ./chatglm -m chatglm-q4.bin -t 4.
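
The phone workflow described above can be sketched as a small shell function. This is an illustration under stated assumptions, not a script shipped with InferLLM: the local paths in BIN and MODEL and the on-device directory /data/local/tmp are assumptions, and the function name deploy_and_run is hypothetical. Only adb push and adb shell are used.

```shell
#!/bin/sh
# Sketch: copy the executable and model to an Android device, then run it.
# Assumptions (not from the README): local file locations and the
# on-device working directory.
ADB="${ADB:-adb}"                        # set ADB=echo for a dry run
BIN="${BIN:-./chatglm}"                  # assumed path to the Android build
MODEL="${MODEL:-./chatglm-q4.bin}"       # assumed path to the model file
DEVICE_DIR="${DEVICE_DIR:-/data/local/tmp}"

deploy_and_run() {
    bin_name=$(basename "$BIN")
    model_name=$(basename "$MODEL")
    # Copy both files to the device, stopping at the first failure.
    "$ADB" push "$BIN" "$DEVICE_DIR/" &&
    "$ADB" push "$MODEL" "$DEVICE_DIR/" &&
    # Make the binary executable and run it with 4 threads.
    "$ADB" shell "cd $DEVICE_DIR && chmod +x $bin_name && ./$bin_name -m $model_name -t 4"
}
```

After sourcing the file, calling deploy_and_run performs the copy and run. Setting ADB=echo beforehand prints the adb arguments instead of executing them, which is a convenient way to check the paths.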

  • x86: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (screenshot: x86 running)
  • Android: Xiaomi 9, Qualcomm SM8150 Snapdragon 855 (screenshot: Android running)

Based on the x86 profiling results, we strongly advise using 4 threads.

Supported models

InferLLM currently supports the ChatGLM-6B, llama, and alpaca models.

License

InferLLM is licensed under the Apache License, Version 2.0.
