
CMSIS-DSP embedded compute library for Cortex-M and Cortex-A

CMSIS-DSP


About

CMSIS-DSP is an optimized compute library for embedded systems (DSP is in the name for legacy reasons).

It provides optimized compute kernels for Cortex-M and for Cortex-A.

Different variants are available depending on the core, and most functions use a vectorized implementation when the Helium or Neon extension is available.

This repository contains the CMSIS-DSP library and several other projects:

  • Test framework for bare metal Cortex-M or Cortex-A
  • Examples for bare metal Cortex-M
  • ComputeGraph
  • PythonWrapper

You don't need any of the other projects to build and use the CMSIS-DSP library. Building the other projects may require other libraries (CMSIS), other tools (Arm Virtual Hardware) or the CMSIS build tools.

License Terms

CMSIS-DSP is licensed under Apache License 2.0.

CMSIS-DSP Kernels

Kernels provided by CMSIS-DSP (list not exhaustive):

  • Basic mathematics (real, complex, quaternion, linear algebra, fast math functions)
  • DSP (filtering)
  • Transforms (FFT, MFCC, DCT)
  • Statistics
  • Classical ML (Support Vector Machine, Distance functions for clustering ...)

Kernels are provided for several datatypes: f64, f32, f16, q31, q15, q7.
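As a rough illustration of the fixed-point datatypes, the sketch below converts between float and q15, assuming the usual Q15 convention (a signed 16-bit integer with 15 fractional bits, covering the range [-1, 1)). The helper names are hypothetical, and this is not the library's own conversion routine, which may round differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n floats to Q15, saturating out-of-range
 * input. NaN input is not handled in this sketch. */
static void floats_to_q15(const float *src, int16_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        float scaled = src[i] * 32768.0f;   /* 2^15 steps per unit */
        if (scaled > 32767.0f) {
            scaled = 32767.0f;              /* +1.0 is not representable */
        }
        if (scaled < -32768.0f) {
            scaled = -32768.0f;
        }
        dst[i] = (int16_t)scaled;           /* truncates toward zero */
    }
}

/* Hypothetical helper: convert one Q15 value back to float. */
static float q15_to_float(int16_t x)
{
    return (float)x / 32768.0f;
}
```

With these helpers, 0.5 maps to 16384 and 1.0 saturates to 32767, because +1.0 lies just outside the Q15 range.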

Python wrapper

A PythonWrapper is also available and can be installed with:

pip install cmsisdsp

With this wrapper you can design your algorithm in Python using an API as close as possible to the C API. The wrapper is compatible with NumPy, supports fixed-point arithmetic, and works in Google Colab.

The goal is to make it easier to move from a design to a final implementation in C.

Compute Graph

CMSIS-DSP also provides an experimental static scheduler for compute graphs, used to describe streaming solutions:

  • You define your compute graph in Python
  • A static and deterministic schedule (computed by the Python script) is generated
  • The static schedule can be run on the device with low overhead

The Python scripts for the static scheduler generator are part of the CMSIS-DSP Python wrapper.

The header files are part of the CMSIS-DSP pack (version 1.10.2 and above).

The Compute Graph makes it easier to implement a streaming solution by connecting different compute kernels, each consuming and producing a different amount of data.

Support / Contact

For any questions or to reach the CMSIS-DSP team, please create a new issue in https://github.com/ARM-software/CMSIS-DSP/issues


Building for speed

CMSIS-DSP is used when you need performance. As a consequence, CMSIS-DSP should be compiled with the options that give the best performance:

Options to use

  • -Ofast must be used for best performance.
  • When using Helium, it is strongly advised to use -Ofast.
  • GCC currently does not give good performance when targeting Helium. You should use the Arm compiler.

When floating-point types are used, select the FPU so that the compiler does not fall back to software floating-point emulation.

Helium support is detected automatically by CMSIS-DSP when you build for it. Neon is not: you must pass -DARM_MATH_NEON to the C compiler. With cmake, this option is controlled with -DNEON=ON.

  • -DLOOPUNROLL=ON can also be used when compiling with cmake
  • It corresponds to the C option -DARM_MATH_LOOPUNROLL

Compilers already unroll loops, so this option may not be needed, but this is highly compiler dependent. With some compilers, the option is needed to get better performance.

Memory speed matters. If possible, map the data and the constant tables used by CMSIS-DSP into DTCM memory. If you have a cache, enable it.

Options to avoid

  • -fno-builtin
  • -ffreestanding because it enables the previous option

The library does some type punning to process a 32-bit word from memory as a pair of q15 or a quadruple of q7. Those type manipulations are done through memcpy calls. Most compilers should be able to optimize out those calls when the length to copy is small (4 bytes).

This optimization will not occur when -fno-builtin is used, and that has a very bad impact on performance.

Some compilers may also require the option -munaligned-access to specify that unaligned accesses are used.
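The memcpy-based type punning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: the helper names load_q15_pair and store_q15_pair are hypothetical, and int16_t stands in for the q15 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read two adjacent 16-bit values as one 32-bit word.
 * memcpy avoids strict-aliasing and alignment problems; with builtins
 * enabled, compilers typically reduce this 4-byte copy to a single load. */
static inline uint32_t load_q15_pair(const int16_t *src)
{
    uint32_t word;
    assert(src != NULL);
    memcpy(&word, src, sizeof word);
    return word;
}

/* Hypothetical helper: write one 32-bit word back as two 16-bit values. */
static inline void store_q15_pair(int16_t *dst, uint32_t word)
{
    assert(dst != NULL);
    memcpy(dst, &word, sizeof word);
}
```

When -fno-builtin is in effect, the compiler may no longer treat memcpy as a known function, so each of these small copies can remain a real function call inside the inner loops.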

Half float support

The f16 data type (half float) has been added to the library. It is useful only if your Cortex core has some half-float hardware acceleration (for instance with the Helium extension). If you don't need f16, you should disable it since it may cause compilation problems: define -DDISABLEFLOAT16 when building.

How to build

You can build CMSIS-DSP with the Open CMSIS-Pack tools, cmake or a Makefile, and it is also easy to build with any other build tool.

How to build with MDK or Open CMSIS-Pack

The standard way to build is by using the CMSIS pack technology. CMSIS-DSP is available as a pack.

This pack technology is supported by some IDEs such as Keil MDK or Keil Studio.

You can also use those packs from the command line on any platform with the Open CMSIS-Pack technology.

You should first install the tools from https://github.com/Open-CMSIS-Pack/devtools/tree/main/tools

You can get the CMSIS-Toolbox, which contains the package installer, cmsis build and the cmsis project manager.

Once you have installed the tools, you'll need to download the pack index using the cpackget tool.

Then, you'll need to convert a solution file into .cprj. For instance, for the CMSIS-DSP Examples, you can go to:

Examples/cmsis_build

and then type

csolution convert -s examples.csolution_ac6.yml

This command processes the examples.csolution_ac6.yml describing how to build the examples for several platforms. It will generate lots of .cprj files that can be built with cbuild.

If you want to build the FFT example for the Corstone-300 virtual hardware platform, you could just do:

cbuild "fftbin.Release+VHT-Corstone-300.cprj"

How to build with Make

There is an example Makefile in Source.

In each source folder (like BasicMathFunctions), you'll see files with no _datatype suffix (like BasicMathFunctions.c and BasicMathFunctionsF16.c).

Those files are all you need in your makefile. They are including all other C files from the source folders.

Then, for the includes, you'll need to add the paths Include and PrivateInclude and, since there is a dependency on CMSIS Core, Core/Include from CMSIS_5/CMSIS.

If you are building for Cortex-A and want to use Neon, you'll also need to include ComputeLibrary/Include and the source file in ComputeLibrary/Source.

How to build with cmake

Create a CMakeLists.txt and inside add a project.

Add CMSIS-DSP as a subdirectory. In the example below, the variable CMSISDSP is the path to the CMSIS-DSP repository.

cmake_minimum_required (VERSION 3.14)

# Define the project
project (testcmsisdsp VERSION 0.1)

add_subdirectory(${CMSISDSP}/Source bin_dsp)

CMSIS-DSP depends on the CMSIS Core includes, so you should define CMSISCORE on the cmake command line. The path used by CMSIS-DSP will be ${CMSISCORE}/Include.

You should also set the compilation options used to build the library.

If you build for Helium, use one of the options MVEF, MVEI or HELIUM.

If you build for Neon, use NEON and/or NEONEXPERIMENTAL.

Launching the build

Once cmake has generated the makefiles, you can use GNU Make to build.

make VERBOSE=1

How to build with any other build system

You need the following folders:

  • Source
  • Include
  • PrivateInclude
  • ComputeLibrary (only if you target Neon)

In the Source subfolders, you may either build all of the source files with a datatype suffix (like _f32.c), or just compile the files without a datatype suffix. For instance, for BasicMathFunctions, you can build all the C files except BasicMathFunctions.c and BasicMathFunctionsF16.c, or you can build just those two files (they include all of the other C files of the folder).

The f16 files are not mandatory. You can build with -DDISABLEFLOAT16.

How to build for aarch64

The intrinsics defined in Core_A/Include are not available on recent Cortex-A processors.

But you can still build for those Cortex-A cores and benefit from the Neon intrinsics.

You need to build with -D__GNUC_PYTHON__ on the compiler command line. This flag was introduced for building the Python wrapper and disables the use of the CMSIS Core includes.

When this flag is enabled, CMSIS-DSP defines a few macros used in the library for compiler portability:

#define  __ALIGNED(x) __attribute__((aligned(x)))
#define __STATIC_FORCEINLINE static inline __attribute__((always_inline)) 
#define __STATIC_INLINE static inline

If the compiler you are using requires different definitions, you can add them to arm_math_types.h in the Include folder of the library. MSVC and XCode are already supported, and in those cases you don't need to define -D__GNUC_PYTHON__.
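To show what these portability macros do in practice, here is a small sketch assuming a GCC-compatible compiler. The definitions mirror the ones quoted above, guarded so they do not clash with existing ones; the scratch buffer and the helper functions are hypothetical and not part of the library.

```c
#include <assert.h>
#include <stdint.h>

/* Same definitions as quoted above, guarded in case they already exist. */
#ifndef __ALIGNED
#define __ALIGNED(x) __attribute__((aligned(x)))
#endif
#ifndef __STATIC_FORCEINLINE
#define __STATIC_FORCEINLINE static inline __attribute__((always_inline))
#endif

/* Hypothetical buffer aligned on a 16-byte boundary, as a 128-bit
 * vector load might prefer. */
static float scratch[8] __ALIGNED(16);

/* Hypothetical small kernel that is always inlined at its call sites. */
__STATIC_FORCEINLINE float sum8(const float *p)
{
    float acc = 0.0f;
    assert(p != 0);
    for (int i = 0; i < 8; i++) {
        acc += p[i];
    }
    return acc;
}

/* Hypothetical check that a pointer sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0u;
}
```

A compiler with a different attribute syntax would need its own definitions of the same macro names, which is the kind of addition the paragraph above suggests placing in arm_math_types.h.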

Then, you need to define -DARM_MATH_NEON

For cmake the equivalent options are:

  • -DHOST=ON
  • -DNEON=ON

cmake is automatically including the ComputeLibrary folder. If you are using a different build, you need to include this folder too to build with Neon support.

Code size

Previous versions of the library used compilation directives to control the code size. This was too complex, and it was not available when CMSIS-DSP was delivered only as a static library.

Now the library relies on the linker again to do the code size optimization. This implies some constraints on the code you write, and new functions had to be introduced.

If you know the size of your FFT in advance, use initialization functions like arm_cfft_init_64_f32 instead of the generic initialization function arm_cfft_init_f32. Using the generic function prevents the linker from deducing which functions and tables must be kept for the FFT, so everything will be included.

There are similar functions for RFFT, MFCC ...

If the flag ARM_DSP_CONFIG_TABLES is still set, you'll now get a compilation error to remind you that this flag no longer has any effect on code size and that you may have to rework the initializations.

Folders and files

The only folders required to build and use CMSIS-DSP Library are:

  • Source
  • Include
  • PrivateInclude
  • ComputeLibrary (only when using Neon)

Other folders are part of different projects, tests or examples.

Folders

  • cmsisdsp

    • Required to build the CMSIS-DSP PythonWrapper for the Python repository
    • It contains all Python packages
  • ComputeLibrary:

    • Some kernels required when building CMSIS-DSP with Neon acceleration
  • Examples:

    • Examples of use of CMSIS-DSP on bare metal Cortex-M
    • Require the use of CMSIS Build tools
  • Include:

    • Include files for CMSIS-DSP
  • PrivateInclude:

    • Some includes needed to build CMSIS-DSP
  • PythonWrapper:

    • C code for the CMSIS-DSP PythonWrapper
    • Examples for the PythonWrapper
  • Scripts:

    • Debugging scripts
    • Script to generate some coefficient tables used by CMSIS-DSP
  • Compute Graph:

    • Not needed to build CMSIS-DSP. This project relies on the CMSIS-DSP library
    • Examples for the Compute Graph
    • C++ templates for the Compute Graph
    • Default (and optional) nodes
  • Source:

    • CMSIS-DSP source
  • Testing:

    • CMSIS-DSP Test framework for bare metal Cortex-M and Cortex-A
    • Require the use of CMSIS build tools

Files

Some files are needed to generate the PythonWrapper:

  • PythonWrapper_README.md
  • LICENSE
  • MANIFEST.in
  • pyproject.toml
  • setup.py

And we have a script to make it easier to customize the build:

  • cmsisdspconfig.py:
    • Web browser UI to generate build configurations (temporary until the CMSIS-DSP configuration is reworked to be simpler and more maintainable)
