  • Language: C++
  • License: MIT License


Repository Details

GPU Performance API for AMD GPUs

GPU Performance API


The GPU Performance API (GPUPerfAPI, or GPA) is a powerful library which provides access to GPU Performance Counters. It can help analyze the performance and execution characteristics of applications using a Radeon™ GPU. This library is used by Radeon GPU Profiler as well as several third-party tools.

Downloads

Prebuilt binaries can be downloaded from the Releases page: https://github.com/GPUOpen-Tools/gpu_performance_api/releases.

Major Features

  • Provides a standard API for accessing GPU Performance counters for both graphics and compute workloads across multiple GPU APIs.
  • Supports Vulkan™, DirectX™ 12, DirectX 11, OpenGL, and OpenCL™.
  • Supports all current Radeon graphics cards and APUs based on Graphics IP version 8 and newer.
  • Supports both Windows and Linux.
  • Provides derived "public" counters based on raw hardware counters.
  • Provides access to some raw hardware counters. See Raw Hardware Counters for more information.

What's New

Version 3.13.1 (06/22/2023)

  • Add support for additional AMD Radeon RX 7000 Series hardware.
  • Add support for AMD Radeon 700M Series APUs.
    • Vulkan and OpenGL are supported on existing drivers; DX12, DX11, and OpenCL will be enabled by an upcoming driver.
  • Bug Fixes:
    • Fixed performance regression in GPUPerfAPIDX12[-x64].dll

System Requirements

  • An AMD Radeon GPU or APU based on Graphics IP version 8 or newer.
  • Windows: Radeon Software Adrenalin 2020 Edition 20.11.2 or later (Driver Packaging Version 20.45 or later).
  • Linux: Radeon Software for Linux Revision 20.45 or later.
  • Radeon GPUs or APUs based on Graphics IP version 6 and 7 are no longer supported by GPUPerfAPI. Please use an older version (3.3) with older hardware.
  • Windows 7, 8.1, 10, or 11.
  • Ubuntu (16.04 and later) and CentOS/RHEL (7 and later) distributions.

Cloning the Repository

To clone the GPA repository, execute the following git command:

  • git clone https://github.com/GPUOpen-Tools/gpu_performance_api

After cloning the repository, run the following Python script to retrieve the required dependencies and generate the build files (see BUILD.md for more information):

  • python pre_build.py

Documentation

The documentation for GPUPerfAPI can be found in each GitHub release. In the release .zip file or .tgz file, there will be a "docs" directory. Simply open the index.html file in a web browser to view the documentation.

The documentation is hosted publicly at: http://gpuperfapi.readthedocs.io/en/latest/

Raw Hardware Counters

This release exposes both "Derived" counters and "Raw Hardware" counters. Derived counters are counters that are computed using a set of raw hardware counters. This version allows you to access the raw hardware counters by simply specifying a flag when calling GpaOpenContext.

New Pipeline-Based Counters

The improvements introduced in the Vega, RDNA, and RDNA2 architectures were not properly accounted for in GPUPerfAPI v3.9, which caused many known issues to be called out in that release. In certain cases, the driver and hardware are able to make optimizations by combining two shader stages together, which prevents GPUPerfAPI from identifying which instructions were executed for which shader type. As a result, GPUPerfAPI is no longer able to expose instruction counters for each API-level shader, specifically Vertex Shaders, Hull Shaders, Domain Shaders, and Geometry Shaders. Pixel Shaders and Compute Shaders remain unchanged.

These instruction counters are now exposed based on the type of shader pipeline being used:

  • In pipelines that do not use tessellation, the instruction counts for both the Vertex and Geometry Shaders (if used) are combined in the VertexGeometry group (i.e., counters with the "VsGs" prefix).
  • In pipelines that use tessellation, the instruction counts for both the Vertex and Hull Shaders are combined in the PreTessellation group (i.e., counters with the "PreTessellation" or "PreTess" prefix), and the instruction counts for the Domain and Geometry Shaders (if used) are combined in the PostTessellation group (i.e., counters with the "PostTessellation" or "PostTess" prefix).

The table below shows the new mapping between the API-level shaders (across the top) and the prefixes to look for in the GPUPerfAPI counters.

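| Pipeline       | Vertex  | Hull    | Domain   | Geometry | Pixel | Compute |
|----------------|---------|---------|----------|----------|-------|---------|
| VS-PS          | VsGs    |         |          |          | PS    |         |
| VS-GS-PS       | VsGs    |         |          | VsGs     | PS    |         |
| VS-HS-DS-PS    | PreTess | PreTess | PostTess | PostTess | PS    |         |
| VS-HS-DS-GS-PS | PreTess | PreTess | PostTess | PostTess | PS    |         |
| CS             |         |         |          |          |       | CS      |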
Known Issues

Counter Validity on Specific Hardware

Some counters return unexpected results on specific hardware with certain APIs.

  • AMD Radeon RX 6700M, DX11: CSLDSBankConflict and CSLDSBankConflictCycles may consistently report as much as 30x higher than expected.
  • AMD Radeon RX 480, DX12: CulledPrims and PSPixelsOut may inconsistently report higher than expected.

Counter Validation Errors in D3D12ColorCube Sample App

Due to the extensive counter validation now being done in the D3D12ColorCube sample application, and some expected variation in nondeterministic counters across a wide range of systems, the sample app may report errors on some systems. Likewise, some counters are marked as known issues and we are investigating the underlying causes of the inconsistent results.

OpenCL Performance Counter Accuracy For Radeon 6000 Series GPUs

The following performance counter values may not be accurate for OpenCL applications running on Radeon 6000 Series GPUs:

  • Wavefronts, VALUInsts, SALUInsts, SALUBusy, VALUUtilization: These values should be representative of performance, but may not be 100% accurate.

OpenGL FetchSize Counter on Radeon RX 6000 Series GPUs

The FetchSize counter will show an error when enabled on Radeon RX 6000 Series GPUs using OpenGL.

Ubuntu 20.04 LTS Vulkan ICD Issue

On Ubuntu 20.04 LTS, the Vulkan ICD may not be set to the AMD Vulkan ICD. In that case, it must be set explicitly before using GPA, by setting the VK_ICD_FILENAMES environment variable to /etc/vulkan/icd.d/amd_icd64.json.
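
One way to apply this setting only to the application being profiled, rather than to the whole shell session, is to launch it from a small wrapper. The Python sketch below is an illustration under assumptions: the helper names env_with_amd_icd and run_with_amd_icd are hypothetical, and the ICD path is simply the one quoted above, which may differ on your installation.

```python
import os
import subprocess

# Path quoted in the text above; adjust it if your driver installs elsewhere.
AMD_ICD_PATH = "/etc/vulkan/icd.d/amd_icd64.json"


def env_with_amd_icd(base_env=None, icd_path=AMD_ICD_PATH):
    """Return a copy of the environment with VK_ICD_FILENAMES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["VK_ICD_FILENAMES"] = icd_path
    return env


def run_with_amd_icd(command):
    """Run `command` (a list of arguments) with the AMD Vulkan ICD selected."""
    return subprocess.run(command, env=env_with_amd_icd(), check=False)
```

A call such as run_with_amd_icd(["./my_vulkan_app"]) (the executable name is a placeholder) starts the application with the variable set, while the parent environment stays untouched.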

Adjusting Linux Clock Mode

Adjusting the GPU clock mode on Linux is accomplished by writing to: /sys/class/drm/card<N>/device/power_dpm_force_performance_level, where <N> is the index of the card in question.

By default this file is only modifiable by root, so the application being profiled would have to be run as root in order for it to modify the clock mode. It is possible to modify the permissions for the file instead so that it can be written by unprivileged users. The following command will achieve this: sudo chmod ugo+w /sys/class/drm/card0/device/power_dpm_force_performance_level

  • Note that changing the permissions on a system file like this could circumvent security.
  • On multi-GPU systems you may have to replace "card0" with the appropriate card number.
  • You may have to reboot the system for the change to take effect.
  • Setting the GPU clock mode is not working correctly for Radeon 5700 Series GPUs, potentially leading to some inconsistencies in counter values from one run to the next.
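
To inspect or change the clock mode by hand, the file described above can be read and written like any other text file. The Python sketch below is a minimal illustration and is not how GPUPerfAPI itself is implemented: the helper names and the sysfs_root parameter are assumptions made for this example, and the list of accepted values reflects the levels documented for the amdgpu kernel driver and may not be exhaustive for your kernel version.

```python
from pathlib import Path

# Levels documented for the amdgpu driver (assumed list, may not be exhaustive).
VALID_LEVELS = {
    "auto", "low", "high", "manual",
    "profile_standard", "profile_min_sclk", "profile_min_mclk",
    "profile_peak", "profile_exit",
}


def performance_level_path(card_index, sysfs_root="/sys/class/drm"):
    """Build the path of the clock-mode file for card<card_index>."""
    return (Path(sysfs_root) / f"card{card_index}" / "device"
            / "power_dpm_force_performance_level")


def get_performance_level(card_index, sysfs_root="/sys/class/drm"):
    """Read the current clock mode, without the trailing newline."""
    return performance_level_path(card_index, sysfs_root).read_text().strip()


def set_performance_level(card_index, level, sysfs_root="/sys/class/drm"):
    """Write a new clock mode; needs write permission on the file."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")
    path = performance_level_path(card_index, sysfs_root)
    path.write_text(level)
    return path
```

On a real system, set_performance_level(0, "profile_peak") raises PermissionError unless the process runs as root or the file permissions were changed as described above. The sysfs_root parameter exists only so the helpers can be exercised against a scratch directory.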

Profiling Bundles

Profiling bundles in DirectX 12 and Vulkan does not work properly. It is recommended to remove the GPA Samples that are taken inside bundles from your application, or to move the calls out of the bundle for profiling.

Style and Format Change

The source code of this product is being reformatted to follow the Google C++ Style Guide (https://google.github.io/styleguide/cppguide.html). In the interim you may encounter a mix of the older C++ coding style and the newer Google C++ Style. Please refer to the .clang-format file in the root directory of the product for additional style information.
