  • Language
    C++
  • License
    Apache License 2.0
  • Created over 1 year ago
  • Updated about 1 year ago

Repository Details

mperf is an operator performance tuning toolbox for mobile/embedded platforms

mperf

mperf is a modular micro-benchmark/toolkit for kernel performance analysis.

Features

  • Investigate the basic micro-architectural (uarch) parameters of the target CPU/GPU.
  • Draw a hierarchical roofline model graph for evaluating performance.
  • Collect CPU/GPU PMU event data.
  • Analyze CPU/GPU PMU event data (TMA methodology and customized metrics) to identify performance bottlenecks.
  • OpenCL Linter to guide manual OpenCL kernel optimization [TBD].
  • C++ project
  • Supported platforms: ARM CPUs, Mali GPUs, Adreno 6xx GPUs
  • Lightweight and embeddable library
  • The iOS platform is not yet fully functional.

Installation

mperf uses the CMake build system and requires CMake 3.15.2 or newer. To compile mperf, follow these steps:

  • clone or download the project
    git clone https://github.com/MegEngine/mperf.git
    cd mperf
    git submodule update --init --recursive
  • choose a test platform
    • if you are testing an ARM processor on Android
      • an NDK is required
        • download the NDK and extract it on the host machine
        • set the NDK_ROOT environment variable to the path of the extracted NDK directory
    • if you are testing an x86 processor on Linux
      • a gcc or clang compiler must be discoverable by CMake through the PATH environment variable
  • if your target OS is Android, run android_build.sh to build
    • print the usage of android_build.sh
      ./android_build.sh -h
    • build for armv7 cpu
      ./android_build.sh -m armeabi-v7a
    • build for arm64 cpu
      ./android_build.sh [-m arm64-v8a] // default march is arm64-v8a
    • build with mali mobile gpu
      ./android_build.sh -g mali [arm64-v8a, armeabi-v7a]
    • build with adreno mobile gpu
      ./android_build.sh -g adreno [arm64-v8a, armeabi-v7a]
    • build with pfm
      ./android_build.sh -p [arm64-v8a, armeabi-v7a]
    • build in debug mode
      ./android_build.sh -d [arm64-v8a, armeabi-v7a]
    • build with a custom install directory
      ./android_build.sh -i /your/custom/cmake/install/prefix [arm64-v8a, armeabi-v7a]
      e.g.: ./android_build.sh -i ~/mperf_install [-m arm64-v8a] // default march is arm64-v8a
  • if your target OS is Linux, build with CMake directly; to enable pfm, add -DMPERF_ENABLE_PFM=ON to the cmake command
    cmake -S . -B "build-x86" -DMPERF_ENABLE_PFM=ON
    cmake --build "build-x86" --config Release 
  • after the build, the executables are stored in the <mperf_build_dir>/apps directory. You can then install mperf to your system path or to a custom install directory with
    cmake --build <mperf_build_dir> --target install 
    e.g.: cmake --build ./build-arm64-v8a/ --target install
  • you can now import the installed mperf with the find_package command, for example
    set(mperf_DIR /path/to/your/installed/mperfConfig.cmake) # Note: this must be the directory that contains mperfConfig.cmake, e.g. set(mperf_DIR ~/mperf_install/lib/cmake/mperf/)
    find_package(mperf REQUIRED)
    target_link_libraries(your_target mperf::mperf)
  • alternatively, add_subdirectory(mperf) incorporates the library directly into your CMake project.

Usage

  • basic usage of the mperf xpmu module:
    mperf::CpuCounterSet cpuset = "CYCLES,INSTRUCTIONS,...";
    mperf::XPMU xpmu(cpuset);
    xpmu.run();
    
    ... // add your function to be measured
    
    xpmu.sample();
    xpmu.stop();
    See cpu_pmu / mali_pmu / adreno_pmu for more details.
  • basic usage of the mperf tma module:
    mperf::tma::MPFTMA mpf_tma(mperf::MPFXPUType::A55);
    mpf_tma.init(
            {"Frontend_Bound", "Bad_Speculation", "Backend_Bound", "Retiring", ...});
    size_t gn = mpf_tma.group_num();
    for (size_t i = 0; i < gn; ++i) {
        mpf_tma.start(i);
        for (size_t j = 0; j < iter_num; ++j) {
            ... // add your function to be measured
        }
        mpf_tma.sample_and_stop(iter_num);
    }
    mpf_tma.deinit();
    See arm_cpu_tma for more details.

Source Directory Structure

  • apps: various user examples; see the apps doc for more details.
  • eca: a module for collecting and analyzing PMU event data (including TMA analysis).
  • uarch: a set of low-level micro-benchmarks for investigating the basic micro-architectural (uarch) parameters of the target CPU/GPU.
  • doc: documents about roofline and TMA usage; see the index for the list.
  • cmake: CMake-related files.
  • common: common helper functions.
  • third_party: third-party dependencies.
  • linter: OpenCL Linter [TBD].

Tutorial

  • A tutorial on optimizing matmul to reach peak performance on an ARM A55 core illustrates the basic workflow of using mperf to guide optimization work. See "optimize the matmul with the help of mperf".

License

mperf is licensed under the Apache-2.0 license.
