Sources for Arm Streamline's gator daemon

Gator daemon, barman embedded agent, and related tools

The source code for barman, gatord, and related tools.

The barman subdirectory contains the sources for the barman embedded agent which can be used to collect performance data within an embedded environment as per the Arm Streamline Target Setup Guide for Bare-metal Applications.

The rest of this document refers to gatord and related tools.

License

This project contains code from other projects listed below. The original license text is included in those source files.

  • libsensors source code in daemon/libsensors licensed under LGPL-2.1-or-later
  • perf_event.h from Linux userspace kernel headers in daemon/k licensed under GPL-2.0-only WITH Linux-syscall-note

The pre-built gatord shipped with Streamline uses musl. For musl license information see the COPYRIGHT file shipped with Streamline, or https://git.musl-libc.org/cgit/musl/tree/COPYRIGHT

Contributing

Contributions are accepted under the same license as the associated subproject with developer sign-off as described in Contributing.

Purpose

Instructions on setting up Arm Streamline on the target.

A target agent (gator) is required to run on the Arm Linux target in order for Arm Streamline to operate. Gator requires Linux kernel version 3.4 or later.

Introduction

A Linux development environment with cross-compiling tools is usually required, depending on what has already been built and provided for your target.

Please see release notes for information about changes in this release.

Kernel configuration

Gator uses the Linux Perf API (perf_event_open) for most of its data collection. Additionally it will use ftrace tracepoints and some other common features such as debugfs/sysfs.

Most users will not need to make any changes to their kernel configuration (and in many cases they cannot) as most recent Android devices and Linux distributions correctly configure their kernel with the required options.

If you are a system integrator, or are compiling your own kernel, refer to the section "Kernel configuration options".

Use the pre-built gator daemon

Streamline provides pre-built binaries for aarch64 and armv7a-hardfloat Linux and Android. This gator daemon should work in most cases, so building it yourself is only necessary for non-standard configurations.

To improve portability, gatord is statically compiled against musl libc from http://www.musl-libc.org/download.html instead of glibc. The gator daemon will work correctly with either glibc or musl.

Building the gator daemon

Building gatord has the following requirements:

  • C++17 supporting compiler.
  • CMake (3.16 or later).
  • GCC or Clang compiler able to target the appropriate target architecture.
  • GCC or Clang compiler able to target the host architecture if cross compiling.
  • For Android, the Android NDK (LTS r21e, r23b are tested and known to work).
  • A Linux build environment (other CMake compatible environments may work but are not tested).
  • Additionally, vcpkg depends on various unix tools being installed. A minimal build environment for Linux can be achieved on Ubuntu 20.04 with:

sudo apt-get install ninja-build cmake gcc g++ g++-aarch64-linux-gnu curl zip unzip tar pkg-config git

For Android targets

The most convenient option is to use the provided build-android.sh script.

./build-android.sh -h

Prints a summary of the available configuration options.

In most cases it should be possible to run:

./build-android.sh

which will compile gatord for Android targeting aarch64 devices.

Using the configuration options, it is possible to change the minimum SDK level, architecture, CMake binary to use, CMake generator, build directory, and NDK path.

For Linux targets

For simple configurations, the most convenient option is to use the provided build-linux.sh script. This allows selection of one of a few predefined configurations:

  • Building using the host native Clang toolchain.
  • Building using the host native GCC toolchain.
  • Building using GCC targeting aarch64 or armv7a against the glibc or musl based libc.

./build-linux.sh -h

Prints a summary of the available configuration options.

When natively compiling it should be possible to run:

./build-linux.sh

Otherwise the typical use is to pass a profile option using -p.

Running CMake manually

Since the build is CMake based, it is possible to invoke cmake directly. This option requires some understanding of vcpkg and cmake tools.

Please note that vcpkg collects telemetry by default; see the section on telemetry in the vcpkg documentation. The provided build-android.sh and build-linux.sh scripts disable this by default.

It is possible to build without using vcpkg at all by passing -DENABLE_VCPKG=OFF to the cmake build, but it will then be necessary to provide precompiled versions of any dependencies.

Running gator

As a root user

  • Copy gatord into the target's filesystem.
  • Ensure gatord has execute permissions: chmod +x gatord
  • The daemon must be run with root privileges: sudo ./gatord &

This configuration requires Linux 3.4 or later with a correctly configured kernel.

As a non-root user

  • Copy gatord into the target's filesystem.
  • Ensure gatord has execute permissions: chmod +x gatord
  • Run the daemon: ./gatord &

This configuration provides a reduced set of software only CPU counters such as CPU utilization and process statistics, as well as Mali hardware counters on supported Mali platforms.

Perf PMU support

To check which perf PMUs are supported by your kernel, run

ls /sys/bus/event_source/devices/

If you see something like ARMv7_Cortex_A##, this indicates A## support. If you see CCI_400, this indicates CCI-400 support. If you see ccn, this indicates CCN support.
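As a rough illustration of the check above, the following Python sketch groups the names found under /sys/bus/event_source/devices/ by the patterns mentioned in this section. The function names classify_pmu_devices and list_pmu_devices are hypothetical and are not part of gator, and the matching rules are assumptions based only on the example names given here; kernels may expose PMUs under other names.

```python
import os
import re

# Pattern derived from the example name in the text (ARMv7_Cortex_A##).
# This is a heuristic for illustration, not an exhaustive list.
_CORTEX_A = re.compile(r"^ARMv7_Cortex_(A\d+)$")


def classify_pmu_devices(names):
    """Group perf event source names into the categories described above."""
    result = {"cpu": [], "cci": [], "ccn": [], "other": []}
    for name in sorted(names):
        match = _CORTEX_A.match(name)
        if match:
            result["cpu"].append(match.group(1))
        elif name.startswith("CCI_400"):
            result["cci"].append(name)
        elif name.startswith("ccn"):
            result["ccn"].append(name)
        else:
            result["other"].append(name)
    return result


def list_pmu_devices(path="/sys/bus/event_source/devices"):
    """Return the entries the kernel exposes, or an empty list if unreadable."""
    try:
        return os.listdir(path)
    except OSError:
        return []
```

On a target, classify_pmu_devices(list_pmu_devices()) gives a quick summary; an empty "cpu" list suggests that no Cortex-A PMU matching the pattern above is exposed to perf.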

CCN

CCN requires a perf driver to work. The necessary perf driver was merged into Linux 3.17 but can be backported to earlier versions (see https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/?id=a33b0daab73a0e08cc04459dd44b0121a8e8f81b and later bugfixes).

Compiling an application or shared library

Recommended compiler settings:

  • -g: Debug information, such as line numbers, needed for best analysis results.
  • -fno-inline: Speed improvement when processing the image files and most accurate analysis results.
  • -fno-omit-frame-pointer: Arm EABI frame pointers allow recording of the call stack with each sample taken when in Arm state (i.e. not -mthumb).
  • -marm: This option is required for ARMv7 and earlier if your compiler is configured with --with-mode=thumb, otherwise call stack unwinding will not work.

For Android ART, passing --no-strip-symbols to dex2oat will result in function names, but not line numbers, being included in the dex files. This can be done by running setprop dalvik.vm.dex2oat-flags --no-strip-symbols on the device and then regenerating the dex files.

Polling /dev, /sys and /proc files

Gator supports reading arbitrary /dev, /sys and /proc files 10 times a second. It will either interpret the file contents as a number or use a POSIX extended regex to extract the number, see events-Filesystem.xml for examples.
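The two interpretation modes described above can be sketched in Python as follows. This is only an illustration: extract_value is a hypothetical helper and not gator's implementation, using the first capture group is an assumption, and Python's re syntax is not identical to POSIX extended regular expressions (for example, prefer [0-9] over \d when writing patterns meant for gator).

```python
import re


def extract_value(contents, regex=None):
    """Return the number held in `contents`, or None if none is found.

    Without `regex`, the whole (stripped) text is parsed as a number.
    With `regex`, the first capture group of the first match is parsed,
    so the pattern must contain at least one group.
    """
    text = contents.strip()
    if regex is not None:
        match = re.search(regex, text)
        if match is None:
            return None
        text = match.group(1)
    try:
        return float(text)
    except ValueError:
        return None
```

For example, extract_value("42\n") parses the whole file as a number, while extract_value("MemFree:  1024 kB", r"MemFree:\s+([0-9]+)") pulls the number out of a labelled line.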

Kernel configuration options

The following options are required for correct functioning of Gator.

menuconfig options (depending on the kernel version, the location of these configuration settings within menuconfig may differ)

  • General Setup
    • Timers subsystem
      • [*] High Resolution Timer Support (enables CONFIG_HIGH_RES_TIMERS)
    • Kernel Performance Events And Counters
      • [*] Kernel performance events and counters (enables CONFIG_PERF_EVENTS)
    • [*] Profiling Support (enables CONFIG_PROFILING)
  • Kernel Features
    • [*] Use local timer interrupts (only required for SMP and for versions before Linux 3.12, enables CONFIG_LOCAL_TIMERS)
    • [*] Enable hardware performance counter support for perf events (enables CONFIG_HW_PERF_EVENTS)
  • CPU Power Management
    • CPU Frequency scaling
      • [*] CPU Frequency scaling (enables CONFIG_CPU_FREQ)
  • Kernel hacking
    • [*] Compile the kernel with debug info (optional, enables CONFIG_DEBUG_INFO)
    • [*] Tracers
      • [*] Trace process context switches and events (#)

(#) "Trace process context switches and events" is not the only option that enables tracing (CONFIG_GENERIC_TRACER or CONFIG_TRACING, together with CONFIG_CONTEXT_SWITCH_TRACER), and it may not be visible in menuconfig if other trace configurations are enabled. Having another trace configuration enabled is sufficient to turn on tracing.

The configuration options:

  • CONFIG_MODULES and CONFIG_MODULE_UNLOAD (not needed if the gator driver is built into the kernel)
  • CONFIG_GENERIC_TRACER or CONFIG_TRACING
  • CONFIG_CONTEXT_SWITCH_TRACER
  • CONFIG_PROFILING
  • CONFIG_HIGH_RES_TIMERS
  • CONFIG_LOCAL_TIMERS (for SMP systems and kernel versions before 3.12)
  • CONFIG_PERF_EVENTS and CONFIG_HW_PERF_EVENTS (kernel versions 3.0 and greater)
  • CONFIG_DEBUG_INFO (optional, used for analyzing the kernel)
  • CONFIG_CPU_FREQ (optional, provides frequency setting of the CPU)

These may be verified on a running system using /proc/config.gz (if this file exists) by running zcat /proc/config.gz | grep <option>. For example, to confirm that CONFIG_PROFILING is enabled:

> zcat /proc/config.gz | grep CONFIG_PROFILING
CONFIG_PROFILING=y
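To check several options at once, the same idea can be sketched in Python. The helper names and the REQUIRED tuple below are illustrative assumptions, not part of gator; the tuple covers only a subset of the options listed above and leaves out the "CONFIG_GENERIC_TRACER or CONFIG_TRACING" alternative.

```python
import gzip

# Illustrative subset of the options listed above.
REQUIRED = (
    "CONFIG_PROFILING",
    "CONFIG_HIGH_RES_TIMERS",
    "CONFIG_PERF_EVENTS",
    "CONFIG_HW_PERF_EVENTS",
    "CONFIG_CONTEXT_SWITCH_TRACER",
)


def parse_kernel_config(text):
    """Map option names to values from kernel config text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_X is not set" lines count as disabled
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def missing_options(text, required=REQUIRED):
    """Return the required options that are not set to 'y'."""
    options = parse_kernel_config(text)
    return [name for name in required if options.get(name) != "y"]


def read_proc_config(path="/proc/config.gz"):
    """Read the compressed config of the running kernel, if it exists."""
    with gzip.open(path, "rt") as handle:
        return handle.read()
```

On a target that provides /proc/config.gz, missing_options(read_proc_config()) returns an empty list when every option in REQUIRED is enabled.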

If a device tree is used it must include the pmu bindings, see Documentation/devicetree/bindings/arm/pmu.txt for details.

Bugs

Kernels with CONFIG_CPU_PM enabled may produce invalid results on kernel versions prior to 4.6. The problem manifests as counters not showing any data, large spikes, and nonsensical counter values (e.g. Cycle Counter reading as very high). The cause is that the kernel PMU driver does not save and restore its state when the CPU is powered down and up. The issue is fixed in 4.6, so either upgrade to a later kernel or apply the fix to an older kernel. The patch for 4.6 that resolves the issue is found at https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da4e4f18afe0f3729d68f3785c5802f786d36e34. It has been tested as applying cleanly to the 4.4 kernel, and it may be possible to backport it to other versions as well. Users of this patch may also need to apply https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cbcc72e037b8a3eb1fad3c1ae22021df21c97a51.

There is a bug in some Linux kernels where an Oops may occur when a core is offlined (user space gator only). The fix was merged into mainline in 3.14-rc5, see http://git.kernel.org/tip/e3703f8cdfcf39c25c4338c3ad8e68891cca3731, and has been backported to older kernels (3.4.83, 3.10.33, 3.12.14 and 3.13.6).
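The version rule in the paragraph above can be expressed as a small Python check. This is a sketch under stated assumptions: has_core_offline_fix and parse_version are hypothetical helpers, any 3.14 release (including release candidates before rc5) is treated as fixed, and vendor kernels may carry or lack the patch regardless of their version string.

```python
import re

# Stable releases named in the text that carry the backported fix,
# keyed by their (major, minor) series.
BACKPORTS = {(3, 4): 83, (3, 10): 33, (3, 12): 14, (3, 13): 6}


def parse_version(release):
    """Turn a release string such as '3.10.33-foo' into (3, 10, 33)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_core_offline_fix(release):
    """Return True if the release is expected to include the fix."""
    major, minor, patch = parse_version(release)
    if (major, minor) >= (3, 14):
        return True
    floor = BACKPORTS.get((major, minor))
    return floor is not None and patch >= floor
```

Passing the output of uname -r (or platform.release() in Python) gives a first indication only; a series that is not listed, such as 3.11, is reported as not fixed.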

The message "CPU PMU: CPUx reading wrong counter -1" may appear in dmesg. To work around, update to the latest Linux kernel.

Scheduler switch resolutions are on exact millisecond boundaries. To work around, update to the latest Linux kernel.

There is a bug in some Linux kernels where perf misidentifies the CPU type. To see if you are affected by this, run ls /sys/bus/event_source/devices/ and verify the listed processor type matches what is expected. For example, an A9 should show the following.

# ls /sys/bus/event_source/devices/
ARMv7_Cortex_A9  breakpoint  software  tracepoint

To work around the issue try upgrading to a later kernel.

On some versions of Android, annotations may not work unless SELinux is disabled by running # setenforce 0

Some targets do not correctly emit uevents when cores go on/offline. This will cause CPU Activity with user space gator to be either 0% or 100% on a given core, and the Heat Map may show a large number of unresolved processes. There is no user accessible workaround. To test for this, run

# ./gatord -d | grep uevent

When cores go on/offline with user space gator, something similar to the following should be emitted:

INFO: read(UEvent.cpp:61): uevent: offline@/devices/system/cpu/cpu1
INFO: read(UEvent.cpp:61): uevent: online@/devices/system/cpu/cpu1

The cores that are on/offline can be checked by running

# cat /sys/devices/system/cpu/cpu*/online

This issue affects a given target if the on/offline cores shown by the cat command change but no cpu uevent is emitted.
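The comparison described above can be sketched in Python. The helpers parse_cpu_uevents and cores_missing_uevents are hypothetical, and the log pattern is an assumption taken from the two sample lines shown earlier; other gatord versions may format the message differently.

```python
import re

# Matches lines such as:
# INFO: read(UEvent.cpp:61): uevent: offline@/devices/system/cpu/cpu1
_UEVENT = re.compile(
    r"uevent: (online|offline)@/devices/system/cpu/cpu(\d+)\s*$"
)


def parse_cpu_uevents(lines):
    """Return (cpu, action) pairs for every CPU uevent found in `lines`."""
    events = []
    for line in lines:
        match = _UEVENT.search(line)
        if match:
            events.append((int(match.group(2)), match.group(1)))
    return events


def cores_missing_uevents(before, after, events):
    """List cores whose online flag changed without any uevent being seen.

    `before` and `after` map a core number to its online flag (0 or 1),
    as read from /sys/devices/system/cpu/cpu*/online at two points in time.
    """
    seen = {cpu for cpu, _ in events}
    return sorted(
        cpu
        for cpu in before
        if before[cpu] != after.get(cpu, before[cpu]) and cpu not in seen
    )
```

A non-empty result from cores_missing_uevents suggests that the target is affected by this issue.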

On some older versions of Android, the following issue may occur when starting a gatord that was built using ndk-build:

# ./gatord
[1] + Stopped (signal)        ./gatord
#
[1]   Segmentation fault      ./gatord
#

Starting with Android-L only position independent executables (pie) are supported, but some older versions of Android do not support them. To avoid this issue, modify Android.mk and remove the references to pie.

Profiling the kernel (optional)

CONFIG_DEBUG_INFO must be enabled, see "Kernel configuration" section above.

Use vmlinux as the image for debug symbols in Streamline.

Drivers may be profiled using this method by statically linking the driver into the kernel image or adding the driver as an image to Streamline.
