Benchmarking Neural Network Inference on Mobile Devices
Mobile AI Bench

On-device deep learning applications have become increasingly popular on mobile phones and IoT devices in recent years, but deploying deep learning models in mobile applications or on IoT devices remains a challenging task for developers.

They may first need to choose a cost-effective hardware solution (i.e. chips and boards), then a suitable inference framework, optionally apply quantization or compression techniques depending on the precision-performance trade-off, and finally run the model on one or more heterogeneous computing devices. Making an appropriate decision among these choices is tedious and time-consuming.

Mobile AI Benchmark (i.e. MobileAIBench) is an end-to-end benchmark tool that covers different chips and inference frameworks. Its results include both speed and model accuracy, giving developers insight into these choices.

Daily Benchmark Results

Please check the benchmark step on the daily CI pipeline page. Due to the lack of test devices, the CI results may not cover all hardware and frameworks.

FAQ

Q: Why are benchmark results not stable on my device?

A: To save power, some SoCs use aggressive power control scheduling, which makes performance quite unstable (especially on the CPU). Benchmark results depend heavily on the state of the device, e.g., running processes, temperature, and power control policy. It is recommended to disable the power control policy (as shown in tools/power.sh) if possible (e.g., on a rooted phone). Otherwise, keep the device idle and cool, and benchmark one model on one framework at a time.
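One way to judge whether a device is in a stable state is to look at the spread of repeated latency measurements. The sketch below is not part of MobileAIBench: `TimingStats` and `Summarize` are hypothetical names, and using the coefficient of variation as a stability signal is an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper type (not from MobileAIBench): summary of repeated
// latency samples, all in milliseconds.
struct TimingStats {
  double mean_ms;
  double stddev_ms;
  double cv;  // coefficient of variation = stddev / mean
};

// Computes the mean, population standard deviation and coefficient of
// variation of the given samples. The sample list must not be empty.
TimingStats Summarize(const std::vector<double> &samples_ms) {
  assert(!samples_ms.empty());
  double sum = 0.0;
  for (double s : samples_ms) sum += s;
  const double mean = sum / samples_ms.size();

  double squared_diff = 0.0;
  for (double s : samples_ms) squared_diff += (s - mean) * (s - mean);
  const double stddev = std::sqrt(squared_diff / samples_ms.size());

  return {mean, stddev, mean > 0.0 ? stddev / mean : 0.0};
}
```

A large `cv` across otherwise identical runs suggests that power control or background load is affecting the device, in which case the advice above applies.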

Q: Why do some devices run faster (or slower) than expected in the CI benchmark result?

A: Some devices are rooted and have specialized performance tuning applied, while others are not rooted and cannot apply such tuning (see the code for more details).

Q: Why is ncnn initialization time much less than others?

A: ncnn benchmark uses fake model parameters and skips loading weights from filesystem.

Q: Does benchmark use all available cores of devices?

A: Most modern Android phones use the ARM big.LITTLE architecture, which can lead to significant variance between runs of the benchmark. To reduce this variance, the MACE/NCNN/TFLITE benchmarks use only the available big cores, selected with the taskset command. SNPE has no well-defined API for binding to big cores or setting the thread count. The thread count can be set by adding --num_threads to the tools/benchmark.sh command.
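To make the core-pinning idea concrete, the sketch below builds the hexadecimal affinity mask that `taskset` accepts from a list of core ids. It is only an illustration: `TasksetMask` is a hypothetical helper, and which core ids are the big cores depends on the SoC (ids 4 to 7 in the usage note are an assumption, not something the project specifies).

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from MobileAIBench): returns a hexadecimal CPU
// affinity mask, without a "0x" prefix, with one bit set per listed core id.
std::string TasksetMask(const std::vector<int> &core_ids) {
  std::uint64_t mask = 0;
  for (int id : core_ids) {
    assert(id >= 0 && id < 64);  // this sketch handles at most 64 cores
    mask |= (std::uint64_t{1} << id);
  }
  std::ostringstream out;
  out << std::hex << mask;
  return out.str();
}
```

For example, if cores 4 to 7 were the big cores, `TasksetMask({4, 5, 6, 7})` would yield `f0`, which could then be passed as the mask argument of `taskset`.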

Environment requirement

MobileAIBench currently supports several deep learning frameworks (called executors in this project, i.e., MACE, SNPE, ncnn, TensorFlow Lite and HIAI), which may require the following dependencies:

| Software        | Installation command             | Tested version                        |
|-----------------|----------------------------------|---------------------------------------|
| Python          |                                  | 2.7                                   |
| ADB             | apt-get install android-tools-adb | Required by Android run, >= 1.0.32    |
| Android NDK     | NDK installation guide           | Required by Android build, r15c       |
| Bazel           | bazel installation guide         | 0.13.0                                |
| CMake           | apt-get install cmake            | >= 3.11.3                             |
| FileLock        | pip install -I filelock==3.0.0   | Required by Android run               |
| PyYaml          | pip install -I pyyaml==3.12      | 3.12.0                                |
| sh              | pip install -I sh==1.12.14       | 1.12.14                               |
| SNPE (optional) | download and uncompress          | 1.18.0                                |

Note 1: SNPE has a strict license that disallows redistribution, so the default link in the Bazel WORKSPACE file is only accessible by the CI server. To benchmark SNPE on your local system (i.e. set --executors to all or to SNPE explicitly), you need to download the SDK here, uncompress it, copy libgnustl_shared.so and modify WORKSPACE as follows:

#new_http_archive(
#    name = "snpe",
#    build_file = "third_party/snpe/snpe.BUILD",
#    sha256 = "8f2b92b236aa7492e4acd217a96259b0ddc1a656cbc3201c7d1c843e1f957e77",
#    strip_prefix = "snpe-1.22.2.233",
#    urls = [
#        "https://cnbj1-fds.api.xiaomi.net/aibench/third_party/snpe-1.22.2_with_libgnustl_shared.so.zip",
#    ],
#)

new_local_repository(
    name = "snpe",
    build_file = "third_party/snpe/snpe.BUILD",
    path = "/path/to/snpe",
)

Note 2: HIAI has a strict license that disallows redistribution, so the default link in the Bazel WORKSPACE file is only accessible by the CI server. To benchmark HIAI on your local system (i.e. set --executors to all or to HIAI explicitly), you need to log in and download the SDK here, uncompress it to get the HiAI_DDK_100.200.010.011.zip file, uncompress that file and modify WORKSPACE as follows:

#new_http_archive(
#    name = "hiai",
#    build_file = "third_party/hiai/hiai.BUILD",
#    sha256 = "8da8305617573bc495df8f4509fcb1655ffb073d790d9c0b6ca32ba4a4e41055",
#    strip_prefix = "HiAI_DDK_100.200.010.011",
#    type = "zip",
#    urls = [
#        "http://cnbj1.fds.api.xiaomi.com/aibench/third_party/HiAI_DDK_100.200.010.011_LITE.zip",
#    ],
#)

new_local_repository(
    name = "hiai",
    build_file = "third_party/hiai/hiai.BUILD",
    path = "/path/to/hiai",
)

Architecture

+-----------------+         +------------------+      +---------------+
|   Benchmark     |         |   BaseExecutor   | <--- | MaceExecutor  |
+-----------------+         +------------------+      +---------------+
| - executor      |-------> | - executor       |
| - model_name    |         | - device_type    |      +---------------+
| - quantize      |         |                  | <--- | SnpeExecutor  |
| - input_names   |         +------------------+      +---------------+
| - input_shapes  |         | + Init()         |
| - output_names  |         | + Prepare()      |      +---------------+
| - output_shapes |         | + Run()          | <--- | NcnnExecutor  |
| - run_interval  |         | + Finish()       |      +---------------+
| - num_threads   |         |                  |
+-----------------+         |                  |      +---------------+
| - Run()         |         |                  | <--- | TfLiteExecutor|
+-----------------+         |                  |      +---------------+
        ^     ^             |                  |
        |     |             |                  |      +---------------+
        |     |             |                  | <--- | HiaiExecutor  |
        |     |             +------------------+      +---------------+
        |     |
        |     |             +--------------------+
        |     |             |PerformanceBenchmark|
        |     --------------+--------------------+
        |                   | - Run()            |
        |                   +--------------------+
        |
        |                   +---------------+      +---------------------+                           
+--------------------+ ---> |PreProcessor   | <--- |ImageNetPreProcessor |
| PrecisionBenchmark |      +---------------+      +---------------------+
+--------------------+
| - pre_processor    |      +---------------+      +---------------------+
| - post_processor   | ---> |PostProcessor  | <--- |ImageNetPostProcessor|
| - metric_evaluator |      +---------------+      +---------------------+
+--------------------+
| - Run()            |      +---------------+
+--------------------+ ---> |MetricEvaluator|
                            +---------------+
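The relationship between a benchmark and an executor in the diagram can be sketched as follows. This is a simplified illustration rather than the project's code: `Status`, `BaseTensor` and `BaseExecutor` are reduced stand-ins for the real types, while `CountingExecutor` and `RunPerformanceBenchmark` are hypothetical names introduced here to show the Init, Prepare, Run, Finish call order.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for the project's types; the real definitions differ.
enum class Status { SUCCESS, RUNTIME_ERROR };
struct BaseTensor { std::vector<float> data; };
using TensorMap = std::map<std::string, BaseTensor>;

class BaseExecutor {
 public:
  virtual ~BaseExecutor() = default;
  virtual Status Init(int num_threads) = 0;
  virtual Status Prepare() = 0;
  virtual Status Run(const TensorMap &inputs, TensorMap *outputs) = 0;
  virtual void Finish() = 0;
};

// Hypothetical executor that only records how often each hook is called.
class CountingExecutor : public BaseExecutor {
 public:
  Status Init(int) override { ++init_calls; return Status::SUCCESS; }
  Status Prepare() override { ++prepare_calls; return Status::SUCCESS; }
  Status Run(const TensorMap &inputs, TensorMap *outputs) override {
    ++run_calls;
    *outputs = inputs;  // echo the inputs back as outputs
    return Status::SUCCESS;
  }
  void Finish() override { ++finish_calls; }

  int init_calls = 0, prepare_calls = 0, run_calls = 0, finish_calls = 0;
};

// Drives one executor through its lifecycle and returns the latency of each
// successful Run() call in milliseconds. Returns an empty vector if Init()
// or Prepare() fails.
std::vector<double> RunPerformanceBenchmark(BaseExecutor *executor,
                                            int num_threads, int rounds) {
  std::vector<double> latencies_ms;
  if (executor->Init(num_threads) != Status::SUCCESS) return latencies_ms;
  if (executor->Prepare() != Status::SUCCESS) return latencies_ms;

  TensorMap inputs{{"input", BaseTensor{{0.f, 1.f, 2.f}}}};
  TensorMap outputs;
  for (int i = 0; i < rounds; ++i) {
    const auto start = std::chrono::steady_clock::now();
    const Status status = executor->Run(inputs, &outputs);
    const auto end = std::chrono::steady_clock::now();
    if (status != Status::SUCCESS) break;
    latencies_ms.push_back(
        std::chrono::duration<double, std::milli>(end - start).count());
  }
  executor->Finish();
  return latencies_ms;
}
```

Keeping the timing loop outside the executor means every framework is measured the same way, which is the point of the common BaseExecutor interface.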

How To Use

Benchmark Performance of all models on all executors

bash tools/benchmark.sh --benchmark_option=Performance \
                        --target_abis=armeabi-v7a,arm64-v8a,aarch64,armhf

The whole benchmark may take some time, and continuous benchmarking may heat the device very quickly, so you may want to set the following arguments according to your interests. Only MACE supports the precision benchmark right now.

| option             | type | default     | explanation                                                              |
|--------------------|------|-------------|--------------------------------------------------------------------------|
| --benchmark_option | str  | Performance | Benchmark options, Performance/Precision.                                |
| --output_dir       | str  | output      | Benchmark output directory.                                              |
| --executors        | str  | all         | Executors (MACE/SNPE/NCNN/TFLITE/HIAI), comma-separated list or all.     |
| --device_types     | str  | all         | Device types (CPU/GPU/DSP/NPU), comma-separated list or all.             |
| --target_abis      | str  | armeabi-v7a | Target ABIs (armeabi-v7a,arm64-v8a,aarch64,armhf), comma-separated list. |
| --model_names      | str  | all         | Model names (InceptionV3,MobileNetV1...), comma-separated list or all.   |
| --run_interval     | int  | 10          | Run interval between benchmarks, in seconds.                             |
| --num_threads      | int  | 4           | The number of threads.                                                   |
| --input_dir        | str  | ""          | Input data directory for precision benchmark.                            |
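Several of these options take either `all` or a comma-separated list. The sketch below shows one way such a value could be expanded; it is an assumption for illustration only, since the source does not show how tools/benchmark.sh parses its arguments, and `ExpandOption` is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from MobileAIBench): returns every known value
// when the option is "all", otherwise splits the option on commas and
// skips empty items.
std::vector<std::string> ExpandOption(
    const std::string &value, const std::vector<std::string> &all_values) {
  if (value == "all") return all_values;
  std::vector<std::string> items;
  std::istringstream in(value);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (!item.empty()) items.push_back(item);
  }
  return items;
}
```

With the executor names from the table, `ExpandOption("MACE,NCNN", executors)` would select two executors, while `ExpandOption("all", executors)` would select all of them.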

Configure ssh devices

SSH connections are supported for embedded ARM-Linux devices whose ABI is aarch64 or armhf. Configure SSH devices in generic-mobile-devices/devices_for_ai_bench.yml, for example:

devices:
  nanopi:
    target_abis: [aarch64, armhf]
    target_socs: RK3333
    models: Nanopi M4
    address: 10.231.46.118
    username: pi

Adding a model to run on existing executor

  • Add the new model name in aibench/proto/base.proto if it is not already there.

  • Configure the model info in aibench/proto/model.meta.

  • Configure the benchmark info in aibench/proto/benchmark.meta.

  • Run benchmark

    Performance benchmark.

    bash tools/benchmark.sh --benchmark_option=Performance \
                            --executors=MACE --device_types=CPU --model_names=MobileNetV1 \
                            --target_abis=armeabi-v7a,arm64-v8a,aarch64,armhf

    Precision benchmark. Only ImageNet images are supported as inputs for benchmarking MACE precision.

    bash tools/benchmark.sh --benchmark_option=Precision --input_dir=/path/to/inputs \
                            --executors=MACE --device_types=CPU --model_names=MobileNetV1 \
                            --target_abis=armeabi-v7a,arm64-v8a,aarch64,armhf
  • Check benchmark result

    python report/csv_to_html.py

    Open the corresponding link in a browser to see the report.

Adding a new AI executor

  • Define executor and implement the interfaces:

    class YourExecutor : public BaseExecutor {
     public:
      YourExecutor() :
          BaseExecutor(executor_type, device_type, model_file, weight_file) {}
      
      // Init method should invoke the initializing process for your executor 
      // (e.g.  Mace needs to compile OpenCL kernel once per target). It will be
      // called only once when creating executor engine.
      virtual Status Init(int num_threads);
    
      // Load model and prepare to run. It will be called only once before 
      // benchmarking the model.
      virtual Status Prepare();
      
      // Run the model. It will be called more than once.
      virtual Status Run(const std::map<std::string, BaseTensor> &inputs,
                         std::map<std::string, BaseTensor> *outputs);
      
      // Unload model and free the memory after benchmarking. It will be called
      // only once.
      virtual void Finish();
    };
  • Include your executor header in aibench/benchmark/benchmark_main.cc:

    #ifdef AIBENCH_ENABLE_YOUR_EXECUTOR
    #include "aibench/executors/your_executor/your_executor.h"
    #endif
  • Add dependencies to third_party/your_executor, aibench/benchmark/BUILD and WORKSPACE. Add the macro AIBENCH_ENABLE_YOUR_EXECUTOR to the model_benchmark target in aibench/benchmark/BUILD.

  • Benchmark a model on existing executor

    Refer to the section "Adding a model to run on existing executor" above.

License

Apache License 2.0.

Notice

For third party dependencies, please refer to their licenses.
