  • Stars: 877
  • Rank: 51,695 (Top 2%)
  • Language: C
  • License: Apache License 2.0
  • Created: about 2 years ago
  • Updated: 4 months ago


Repository Details

TinyMaix is a tiny inference library for microcontrollers (TinyML).

TinyMaix

中文 | English

TinyMaix is a tiny inference Neural Network library specifically for microcontrollers (TinyML).
We designed it following the rule: Easy-to-Use > Portable > Speed > Space

Introduction to TinyML: TinyML
See the 48 tested chips and benchmark results: benchmark
Good news: Rewarded Porting TinyMaix

Highlights

  • Core code is less than 400 lines (tm_layers.c + tm_model.c + arch_cpu.h); the code's .text section is less than 3KB
  • Low RAM consumption: even an Arduino ATmega328 (32KB Flash, 2KB RAM) can run MNIST with TinyMaix~
  • Supports INT8/FP32/FP16 models (FP8 experimentally), converted from Keras h5 or tflite
  • Supports multi-architecture acceleration: ARM SIMD/NEON/MVEI, RV32P, RV64V, CSKYV2, X86 SSE2~
  • User-friendly interface: just load/run models~
  • Supports fully static memory configuration
  • MaixHub Online Model Training support

Run mnist demo on Arduino ATmega328

mnist demo
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
000000000077AFF9500000000000
000000000AFFFFFFD10000000000
00000000AFFFD8BFF70000000000
00000003FFD2000CF80000000000
00000004FD10007FF40000000000
00000000110000DFF40000000000
00000000000007FFC00000000000
0000000000004FFE300000000000
0000000000008FF9000000000000
00000000000BFF90000000000000
00000000001EFE20000000000000
0000000000CFF800000000000000
0000000004FFB000000000000000
000000001CFF8000000000000000
000000008FFA0000000000000000
00000000FFF10000000000000000
00000000FFF21111000112999900
00000000FFFFFFFFA8AFFFFFFF70
00000000AFFFFFFFFFFFFFFA7730
0000000007777AFFF97720000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
===use 49912us
0: 0
1: 0
2: 89
3: 0
4: 1
5: 6
6: 1
7: 0
8: 0
9: 0
### Predict output is: Number 2, prob=89

TODO List

  1. ~~Optimize tm_layers.c into tm_layers_O1.c, aiming for a 1.4~2.0X speedup~~ Done!
  2. ~~Add "ADD" OP to support resnet/mbnet v2~~ Done!
  3. Train good backbones for 64KB/128KB/256KB/512KB RAM limits
  4. Add examples: Detector, KWS, HAR, Gesture, OCR, ...
  5. ...

Do you want to participate in the development of TinyMaix, or discuss with other TinyML hobbyists?
Join our Telegram group: https://t.me/tinymaix

TinyMaix Design

TinyMaix is designed to run AI neural network models on resource-limited MCUs, a field usually called TinyML.

There are many TinyML inference libraries now, like TFLite Micro, microTVM, and NNoM, so why do we need TinyMaix?

TinyMaix is a weekend-hackathon project, so it is simple enough to read through in 30 minutes, and it helps TinyML newbies understand how inference actually runs.

TinyMaix aims to be a simple TinyML inference library; it forgoes many newer features and doesn't use libraries like CMSIS-NN.

Following this design goal, TinyMaix is as simple as 5 files to compile~

We hope TinyMaix can help any MCU run AI neural network models, and that everyone can port it to their own hardware platform~

Note: although TinyMaix supports multi-architecture acceleration, it still needs more effort to balance size and speed.

Features in design

  • Supports up to mobilenet v1 and RepVGG backbones
    • they are the most commonly used, efficient structures for MCUs
    • Basic Conv2d, dwConv2d, FC, Relu/Relu6/Softmax, GAP, Reshape
    • ADD (for resnet/mbnet v2 structures, experimental)
    • MaxPool, AvgPool (currently use stride instead)
  • FP32 model, INT8 quant model, FP16 model (NEW)
  • Converts tmdl from Keras h5 or tflite
    • models are simple enough to train with keras/tf
    • tflite already has quantization functions
  • Model statistics functions in C
    • optional, to reduce code size

Features maybe added

  • INT16 quant model
    • Advantages:
      • more accuracy
      • friendly for SIMD/RV32P acceleration
    • Disadvantages:
      • doubles FLASH/RAM consumption
  • Concat OP
    • Advantages:
      • supports mobilenet v2, more accuracy
    • Disadvantages:
      • doubles RAM consumption
      • concatenating matrices takes time, slowing inference
      • needs more work to convert the model into a flat structure (in the script)
  • Winograd convolution optimization
    • Advantages:
      • may speed up Conv computation
    • Disadvantages:
      • increases RAM consumption and memory bandwidth
      • increases code (.text) size
      • needs many transforms; weak MCUs may spend much time here

Features won't be added

  • BF16 model
    • most MCUs lack BF16 computing ability
    • accuracy is not much better than INT16
    • doubles FLASH/RAM consumption
  • AVX/Vulkan acceleration
    • TinyMaix is for MCUs, not for powerful PCs/mobile phones
  • other misc OPs
    • TinyMaix lets MCUs run basic models with minimum resource consumption; if you need more OPs, switch to TFLite Micro/TVM/NCNN...

Try Demos

mnist

MNIST is a handwritten digit recognition task, simple enough for even an 8-bit MCU like the ATmega328.
Try it on a PC:

cd examples/mnist
mkdir build
cd build 
cmake ..
make
./mnist

mbnet

mbnet (mobilenet v1) is a simple classification model for mobile devices, but it is still a little heavy for MCUs.
The model in the demo is mobilenet v1 0.25; it takes a 128x128x3 RGB image as input and outputs predictions for 1000 classes.
It needs at least 128KB SRAM and 512KB Flash; the STM32F411 is a typical minimum configuration for this model.

Try running mobilenet:

cd examples/mbnet
mkdir build
cd build 
cmake ..
make
./mbnet

How to use (API)

Load Model

tm_err_t tm_load(tm_mdl_t* mdl, const uint8_t* bin, uint8_t* buf, tm_cb_t cb, tm_mat_t* in);

mdl: model handle
bin: model binary buffer
buf: main buffer for intermediate outputs; if NULL, a main buffer is malloc'd automatically; otherwise your static buffer is used
cb: layer callback
in: returns the input mat, including its buffer address // you can ignore it if you use a static buffer

Unload Model

void tm_unload(tm_mdl_t* mdl);

Preprocess Input Data

tm_err_t tm_preprocess(tm_mdl_t* mdl, tm_pp_t pp_type, tm_mat_t* in, tm_mat_t* out);
TMPP_FP2INT    // user's own fp buffer -> int input buffer
TMPP_UINT2INT  // int8: converts in place; int16: can't convert in place
TMPP_UINT2FP01 // u8/255.0
TMPP_UINT2FPN11// (u8-128)/128

Run Model

tm_err_t tm_run (tm_mdl_t* mdl, tm_mat_t* in, tm_mat_t* out);
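Putting the calls above together, a minimal usage sketch (not compilable on its own: the model symbol mnist_q_tmdl, the buffer size, TM_OK, and the NULL callback are assumptions; check tinymaix.h for the exact types):

```c
#include "tinymaix.h"

static uint8_t main_buf[1464];   /* static main buffer; size reported by the converter */

/* hypothetical model symbol from a generated header such as mnist_q.h */
extern const uint8_t mnist_q_tmdl[];

void infer_once(void)
{
    tm_mdl_t mdl;
    tm_mat_t in, out;
    if (tm_load(&mdl, mnist_q_tmdl, main_buf, NULL, &in) != TM_OK) return;
    /* fill in's buffer with input data, calling tm_preprocess if needed */
    tm_run(&mdl, &in, &out);
    /* read predictions from out */
    tm_unload(&mdl);
}
```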

How to port

The core consists of these 5 files: tm_model.c, tm_layers.c, tinymaix.h, tm_port.h, arch_xxx.h

If you are using a normal MCU without any acceleration instructions, choose arch_cpu.h; otherwise choose the corresponding architecture header.

Then edit tm_port.h to fill in your desired configuration; every config macro has an annotation following it.

Note that TM_MAX_CSIZE, TM_MAX_KSIZE, and TM_MAX_KCSIZE occupy static buffers.
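For illustration, a tm_port.h-style fragment showing how these macros look. The values here are placeholders, not recommendations; only the three macro names come from this document, and the comments describe what each static buffer is sized for:

```c
/* tm_port.h excerpt (illustrative values only) */
#define TM_MAX_CSIZE   (1000)    /* max channel count; sizes a static buffer */
#define TM_MAX_KSIZE   (5*5)     /* max kernel size, e.g. 5x5 */
#define TM_MAX_KCSIZE  (3*3*256) /* max kernel_size * channels */
```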

Now just put these files into your project and compile~

How to train/convert models

There are training scripts in examples/mnist showing how to train simple MNIST models.

Note: you need to install TensorFlow (>=2.7) first.

After training and saving h5 models, you can use the scripts in tools to convert them to tmdl or C header files.

  1. h5_to_tflite.py
    converts an h5 model to a float or int8-quantized tflite file
    python3 h5_to_tflite.py h5/mnist.h5 tflite/mnist_f.tflite 0
    python3 h5_to_tflite.py h5/mnist.h5 tflite/mnist_q.tflite 1 quant_img_mnist/ 0to1
  2. tflite2tmdl.py
    converts a tflite file to tmdl or C header files
    python3 tflite2tmdl.py tflite/mnist_q.tflite tmdl/mnist_q.tmdl int8 1 28,28,1 10
================ pack model head ================
mdl_type   =0
out_deq    =1
input_cnt  =1
output_cnt =1
layer_cnt  =6
buf_size   =1464
sub_size   =0
in_dims    = [3, 28, 28, 1]
out_dims   = [1, 1, 1, 10]
================   pack layers   ================
CONV_2D
    [3, 28, 28, 1] [3, 13, 13, 4]
    in_oft:0, size:784;  out_oft:784, size:680
    padding valid
    layer_size=152
CONV_2D
    [3, 13, 13, 4] [3, 6, 6, 8]
    in_oft:784, size:680;  out_oft:0, size:288
    padding valid
    layer_size=432
CONV_2D
    [3, 6, 6, 8] [3, 2, 2, 16]
    in_oft:0, size:288;  out_oft:1400, size:64
    padding valid
    layer_size=1360
MEAN
    [3, 2, 2, 16] [1, 1, 1, 16]
    in_oft:1400, size:64;  out_oft:0, size:16
    layer_size=48
FULLY_CONNECTED
    [1, 1, 1, 16] [1, 1, 1, 10]
    in_oft:0, size:16;  out_oft:1448, size:16
    layer_size=304
SOFTMAX
    [1, 1, 1, 10] [1, 1, 1, 10]
    OUTPUT!
    in_oft:1448, size:16;  out_oft:0, size:56
    layer_size=48
================    pack done!   ================
    model  size 2.4KB (2408 B) FLASH
    buffer size 1.4KB (1464 B) RAM
    single layer mode subbuff size 1.4KB (64+1360=1424 B) RAM
Saved to tmdl/mnist_q.tmdl, tmdl/mnist_q.h

Now you have the tmdl or C header file; put it into your project to use it~

How to train models online with MaixHub

You can download models from MaixHub, or train your own AI models online easily with MaixHub: no AI knowledge needed, just a few mouse clicks.

  • Register a MaixHub account and log in.
  • You can download TinyMaix models from the model zoo, or upload your models to the model zoo to share them.
  • Create a training project, collect a dataset, and train models online; finally you will get these files:
    • a .tmdl file and a .h file; use one of them in your code.
    • report.json: report info in JSON format, where you can find the labels and anchors to use in your code. Attention: these params change with every training run, so copy them into your code whenever you change the model, or the result will be wrong.
  • There are two project types, classification and detection; for first-time usage, classification is recommended.
  • There are many backbones; select one according to your MCU's RAM size: the smaller the RAM, the smaller the backbone.
  • To understand how MaixHub works more easily, you can at first choose the tfjs platform instead of tinymaix, to run the model on your mobile phone.
  • Find demos in the examples folder; use the maixhub_image_classification or maixhub_image_detection demo to run your model.

How to add new platform acceleration code

For new platforms, you just need to add an arch_xxx.h to the src dir and implement the functions inside.
Here are the main functions to implement (sorted by importance):

a. TM_INLINE void tm_dot_prod(mtype_t* sptr, mtype_t* kptr, uint32_t size, sumtype_t* result)
	implements the platform's dot product, usually with MAC-related instructions.

b. TM_INLINE void tm_dot_prod_pack2(mtype_t* sptr, mtype_t* kptr, uint32_t size, sumtype_t* result)
	implements the platform's dual-channel dot product
  (not 4 or more channels, because some platforms don't have enough registers to support more)

c. TM_INLINE void tm_postprocess_sum(int n, sumtype_t* sums, btype_t* bs, int act, mtype_t* outp, sctype_t* scales, sctype_t out_s, zptype_t out_zp)
	implements the platform's batch postprocess; note n is a power of 2.

d. TM_INLINE void tm_dot_prod_3x3x1(mtype_t* sptr, mtype_t* kptr, sumtype_t* result)
	implements the platform's 3x3 dot product; mostly handwritten CPU code.

e. TM_INLINE void tm_dot_prod_gap_3x3x1(mtype_t* sptr, mtype_t* kptr, uint32_t* k_oft, sumtype_t* result)
	implements the platform's 3x3 GAP dot product.

...

Contribution & Contacts

If you want to contribute functionality to TinyMaix, please read the "TinyMaix Design" section; we only want features from "Features in design" and "Features maybe added".

If you want to submit your port test results, please commit them to benchmark.md. You are welcome to port TinyMaix to your chips/boards; it shows how easy it is to use TinyMaix to run deep learning models on MCUs~

If you have questions about TinyMaix usage or porting, please open an Issue in this repo.

If you have business project consulting or private questions, you can send mail to [email protected] or [email protected] (Caesar Wu).
