  • Stars: 1,576
  • Rank: 29,696 (Top 0.6 %)
  • Language: C++
  • License: MIT License
  • Created over 8 years ago
  • Updated 3 months ago

Repository Details

Capture and analyze the high-level performance characteristics of graphics applications on Windows.

PresentMon

PresentMon is a tool to capture and analyze ETW events related to swap chain presentation on Windows. It can be used to trace key performance metrics for graphics applications (e.g., CPU and Display frame durations and latencies) and works across different graphics APIs, different hardware configurations, and for both desktop and UWP applications.

While PresentMon itself is focused on lightweight collection and analysis, there are several other programs that build on its functionality and/or help visualize the resulting data. For example, see

License

Copyright (C) 2017-2022 Intel Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Releases

Binaries for main release versions of PresentMon are provided on GitHub:

See CONTRIBUTING for information on how to request features, report issues, or contribute code changes.

Command line options

Capture Target Options
    -captureall               Record all processes (default).
    -process_name name        Record only processes with the provided exe name. This argument can be repeated to capture multiple processes.
    -exclude name             Don't record processes with the provided exe name. This argument can be repeated to exclude multiple processes.
    -process_id id            Record only the process specified by ID.
    -etl_file path            Consume events from an ETW log file instead of running processes.

Output Options
    -output_file path         Write CSV output to the provided path.
    -output_stdout            Write CSV output to STDOUT.
    -multi_csv                Create a separate CSV file for each captured process.
    -no_csv                   Do not create any output file.
    -no_top                   Don't display active swap chains in the console.
    -qpc_time                 Output present time as a performance counter value.
    -qpc_time_s               Output present time as a performance counter value converted to seconds.

Recording Options
    -hotkey key               Use provided key to start and stop recording, writing to a unique CSV file each time. 'key' is of the form MODIFIER+KEY, e.g., "alt+shift+f11".
    -delay seconds            Wait for provided time before starting to record. If using -hotkey, the delay occurs each time recording is started.
    -timed seconds            Stop recording after the provided amount of time.
    -exclude_dropped          Exclude dropped presents from the CSV output.
    -scroll_indicator         Enable scroll lock while recording.
    -no_track_display         Disable tracking through GPU and display.
    -track_debug              Adds additional data to output not relevant to normal usage.

Execution Options
    -session_name name        Use the provided name to start a new realtime ETW session, instead of the default "PresentMon". This can be used to start multiple realtime captures at the same time (using distinct, case-insensitive names). A realtime PresentMon capture cannot start if there are any existing sessions with the same name.
    -stop_existing_session    If a trace session with the same name is already running, stop the existing session (to allow this one to proceed).
    -terminate_existing       Terminate any existing PresentMon realtime trace sessions, then exit. Use with -session_name to target particular sessions.
    -restart_as_admin         If not running with elevated privilege, restart and request to be run as administrator. (See "User access denied" below.)
    -terminate_on_proc_exit   Terminate PresentMon when all the target processes have exited.
    -terminate_after_timed    When using -timed, terminate PresentMon after the timed capture completes.

Beta Options
    -track_mixed_reality      Capture Windows Mixed Reality data to a CSV file with "_WMR" suffix.
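
As an illustration of how these options combine, the following Python sketch assembles an argument list for a timed capture of one or more named processes. The helper name `build_presentmon_args`, the executable name "PresentMon.exe", and the process and file names are assumptions made for this example; only the option names come from the list above.

```python
import subprocess  # only needed if the commented-out run() call below is enabled


def build_presentmon_args(exe, process_names=(), output_file=None,
                          timed=None, terminate_after_timed=False):
    """Assemble a PresentMon argument list (hypothetical helper, not part of PresentMon)."""
    args = [exe]
    for name in process_names:
        # -process_name may be repeated to capture multiple processes.
        args += ["-process_name", name]
    if output_file is not None:
        args += ["-output_file", output_file]
    if timed is not None:
        args += ["-timed", str(timed)]
        if terminate_after_timed:
            # Only meaningful together with -timed.
            args.append("-terminate_after_timed")
    return args


if __name__ == "__main__":
    # "PresentMon.exe", "game.exe" and "capture.csv" are placeholder names.
    args = build_presentmon_args("PresentMon.exe", ["game.exe"],
                                 "capture.csv", timed=30,
                                 terminate_after_timed=True)
    print(" ".join(args))
    # On Windows, with sufficient privileges, the capture could be started with:
    # subprocess.run(args, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems when paths contain spaces.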

Comma-separated value (CSV) file output

CSV file names

By default, PresentMon creates a CSV file named "PresentMon-<Time>.csv", where "<Time>" is the creation time in ISO 8601 format. To specify your own output location, use the -output_file PATH command line argument.

If -multi_csv is used, then one CSV is created for each process captured and "-<ProcessName>-<ProcessId>" is appended to the file name.

If -hotkey is used, then one CSV is created for each time recording is started and "-<Index>" is appended to the file name.

CSV columns

Column Header           Data Description
Application             The name of the process that called Present().
ProcessID               The process ID of the process that called Present().
SwapChainAddress        The address of the swap chain that was presented into.
Runtime                 The runtime used to present (e.g., D3D9 or DXGI).
SyncInterval            The sync interval provided by the application in the Present() call. This value may be modified later by the driver, e.g., based on control panel overrides.
PresentFlags            Flags used in the Present() call.
PresentMode             The presentation mode used by the system for this Present(). See the table below for more details.
                        This column is not available when -no_track_display is used.
AllowsTearing           Whether tearing is possible (1) or not (0).
                        This column is not available when -no_track_display is used.
TimeInSeconds           The time of the Present() call, in seconds, relative to when PresentMon started recording.
QPCTime                 The time of the Present() call, as a performance counter value.
                        This column is only available when -qpc_time or -qpc_time_s is used. When -qpc_time_s is used, the value is converted to seconds by dividing by the counter frequency.
msInPresentAPI          The time spent inside the Present() call, in milliseconds.
msUntilRenderComplete   The time between the Present() call and when the GPU work completed, in milliseconds.
                        This column is not available when -no_track_display is used.
msUntilDisplayed        The time between the Present() call and when the frame was displayed, in milliseconds.
                        This column is not available when -no_track_display is used.
Dropped                 Whether the frame was dropped (1) or displayed (0). Note, if dropped, msUntilDisplayed will be 0.
msBetweenPresents       The time between this Present() call and the previous one, in milliseconds.
msBetweenDisplayChange  How long the previous frame was displayed before this Present() was displayed, in milliseconds.
                        This column is not available when -no_track_display is used.
WasBatched              Whether the frame was submitted by the driver on a different thread than the app (1) or not (0).
                        This column is only available when -track_debug is used.
DwmNotified             Whether the desktop compositor was notified about the frame (1) or not (0).
                        This column is only available when -track_debug is used.
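
As a rough illustration of how these columns can be consumed, the Python sketch below reads CSV text and derives an average frame time, an average presentation rate, and a dropped-frame count from msBetweenPresents and Dropped. The function name `summarize` and the sample rows are invented for this example, and the sketch assumes the CSV holds a single process and swap chain (for instance, a capture made with -process_name).

```python
import csv
import io


def summarize(csv_text):
    """Summarize msBetweenPresents and Dropped from PresentMon-style CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    frame_times = [float(r["msBetweenPresents"]) for r in rows]
    avg_ms = sum(frame_times) / len(frame_times)
    return {
        "presents": len(rows),
        "avg_ms": avg_ms,
        # Presents per second implied by the average interval between presents.
        "avg_fps": 1000.0 / avg_ms,
        "dropped": sum(1 for r in rows if r.get("Dropped") == "1"),
    }


if __name__ == "__main__":
    # Made-up sample data containing only the columns used here.
    sample = (
        "Application,Dropped,msBetweenPresents\n"
        "game.exe,0,10.0\n"
        "game.exe,1,20.0\n"
        "game.exe,0,30.0\n"
    )
    print(summarize(sample))
```

Averaging the intervals first and then inverting gives the mean presentation rate over the capture; averaging per-frame rates instead would overweight short frames.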

The following values are used in the PresentMode column:

PresentMode                             Description
Hardware: Legacy Flip                   Indicates the app took ownership of the screen, and is swapping the displayed surface every frame.
Hardware: Legacy Copy to front buffer   Indicates the app took ownership of the screen, and is copying new contents to an already-on-screen surface every frame.
Hardware: Independent Flip              Indicates the app does not have ownership of the screen, but is still swapping the displayed surface every frame.
Composed: Flip                          Indicates the app is windowed, is using "flip model" swapchains, and is sharing its surfaces with DWM to be composed.
Hardware Composed: Independent Flip     Indicates the app is using "flip model" swapchains, and has been granted a hardware overlay plane.
Composed: Copy with GPU GDI             Indicates the app is windowed, and is copying contents into a surface that's shared with GDI.
Composed: Copy with CPU GDI             Indicates the app is windowed, and is copying contents into a dedicated DirectX window surface. GDI contents are stored separately, and are composed together with DX contents by the DWM.
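
One way to see which of these modes a capture actually used is to tally the PresentMode column. The Python sketch below does this for CSV text; the function name `present_mode_counts` and the sample rows are assumptions for illustration, and the column is only present when -no_track_display is not used.

```python
import csv
import io
from collections import Counter


def present_mode_counts(csv_text):
    """Count how many presents were recorded for each PresentMode value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["PresentMode"] for row in reader)


if __name__ == "__main__":
    # Made-up sample rows; the mode strings are taken from the table above.
    sample = (
        "Application,PresentMode\n"
        "game.exe,Hardware: Independent Flip\n"
        "game.exe,Hardware: Independent Flip\n"
        "game.exe,Composed: Flip\n"
    )
    for mode, count in present_mode_counts(sample).most_common():
        print(f"{mode}: {count}")
```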

For more information on the performance implications of these, see:

Windows Mixed Reality

Note: Windows Mixed Reality support is in beta, with limited OS support and maintenance.

If -track_mixed_reality is used, a second CSV file will be generated with "_WMR" appended to the filename with the following columns:

Column Header                                 Data Description
Application                                   Process name (if known)
ProcessID                                     Process ID
DwmProcessID                                  Compositor Process ID
TimeInSeconds                                 Time since PresentMon recording started
msBetweenLsrs                                 Time between this Lsr CPU start and the previous one
AppMissed                                     Whether Lsr is reprojecting a new (0) or old (1) App frame (App GPU work must complete before Lsr CPU start)
LsrMissed                                     Whether Lsr displayed a new frame (0) or not (1+) at the intended V-Sync (Count V-Syncs with no display change)
msAppPoseLatency                              Time between App's pose sample and the intended mid-photon frame display
msLsrPoseLatency                              Time between Lsr's pose sample and the intended mid-photon frame display
msActualLsrPoseLatency                        Time between Lsr's pose sample and mid-photon frame display
msTimeUntilVsync                              Time between Lsr CPU start and the intended V-Sync
msLsrThreadWakeupToGpuEnd                     Time between Lsr CPU start and GPU work completion
msLsrThreadWakeupError                        Time between intended Lsr CPU start and Lsr CPU start
msLsrPreemption                               Time spent preempting the GPU with Lsr GPU work
msLsrExecution                                Time spent executing the Lsr GPU work
msCopyPreemption                              Time spent preempting the GPU with Lsr GPU cross-adapter copy work (if required)
msCopyExecution                               Time spent executing the Lsr GPU cross-adapter copy work (if required)
msGpuEndToVsync                               Time between Lsr GPU work completion and V-Sync
msBetweenAppPresents                          Time between App's present and the previous one.
msAppPresentToLsr                             Time between App's present and Lsr CPU start.
                                              This column is not available when -no_track_display is used.
HolographicFrameID                            App's Holographic Frame ID.
                                              This column is only available when -track_debug is used.
msSourceReleaseFromRenderingToLsrAcquire      Time between composition end and Lsr acquire.
                                              This column is only available when -track_debug is used.
msAppCpuRenderFrame                           Time between App's CreateNextFrame() API call and PresentWithCurrentPrediction() API call.
                                              This column is only available when -track_debug is used.
msAppMisprediction                            Time between App's intended pose time and the intended mid-photon frame display.
                                              This column is only available when -track_debug is used.
msLsrCpuRenderFrame                           Time between Lsr CPU render start and GPU work submit.
                                              This column is only available when -track_debug is used.
msLsrThreadWakeupToCpuRenderFrameStart        Time between Lsr CPU start and CPU render start.
                                              This column is only available when -track_debug is used.
msCpuRenderFrameStartToHeadPoseCallbackStart  Time between Lsr CPU render start and pose sample.
                                              This column is only available when -track_debug is used.
msGetHeadPose                                 Time between Lsr pose sample start and pose sample end.
                                              This column is only available when -track_debug is used.
msHeadPoseCallbackStopToInputLatch            Time between Lsr pose sample end and input latch.
                                              This column is only available when -track_debug is used.
msInputLatchToGpuSubmission                   Time between Lsr input latch and GPU work submit.
                                              This column is only available when -track_debug is used.

Known issues

See GitHub Issues for a current list of reported issues.

User access denied

PresentMon needs to be run by a user who is a member of the "Performance Log Users" user group, or to be run with administrator privilege. If neither of these is true, you will get an error "failed to start trace session (access denied)".

To add a user to the "Performance Log Users" user group:

  1. Run compmgmt.msc as administrator.
  2. In the "Computer Management" window, expand "System Tools", expand "Local Users and Groups", and then click "Groups".
  3. Double-click "Performance Log Users", and then click "Add".
  4. In the "Enter the object names to select" text box, type the name of the user account or group account that you want to add, and then click "OK".
  5. Sign out and sign back in for the changes to take effect.

If PresentMon is not run with administrator privilege, it will not have complete process information for processes running on different user accounts. Such processes will be listed in the console and CSV as "<error>", and they cannot be targeted by name (-process_name).

Analyzing OpenGL and Vulkan applications

Applications that do not use D3D9 or DXGI APIs for presenting frames (e.g., as is typical with OpenGL or Vulkan applications) will report the following:

  • Runtime = Other
  • SwapChainAddress = 0
  • msInPresentAPI = 0

In this case, TimeInSeconds will represent the first time the present is observed in the kernel, as opposed to the runtime, and therefore will be some time after the application presented the frame (typically ~0.5ms). Since msUntilRenderComplete and msUntilDisplayed are deltas from TimeInSeconds, they will be correspondingly smaller than they would have been if measured from application present. msBetweenDisplayChange will still be correct, and msBetweenPresents should be correct on average.
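
When post-processing a capture, it can help to separate such rows so that their shifted timings are not mixed with runtime-observed presents. The Python sketch below splits already-parsed CSV rows on the Runtime column. It checks only for Runtime = Other, because the exact text written to the CSV for a zero SwapChainAddress is not specified here; the function names and sample rows are assumptions for illustration.

```python
def observed_in_kernel_only(row):
    """True when the present was not seen through D3D9 or DXGI (Runtime = Other)."""
    return row.get("Runtime") == "Other"


def split_by_runtime(rows):
    """Split rows into (runtime-observed rows, kernel-only rows)."""
    runtime_rows, kernel_rows = [], []
    for row in rows:
        if observed_in_kernel_only(row):
            kernel_rows.append(row)
        else:
            runtime_rows.append(row)
    return runtime_rows, kernel_rows


if __name__ == "__main__":
    # Made-up rows, shaped like the dictionaries csv.DictReader would yield.
    rows = [
        {"Application": "dx_game.exe", "Runtime": "DXGI"},
        {"Application": "vk_game.exe", "Runtime": "Other"},
    ]
    runtime_rows, kernel_rows = split_by_runtime(rows)
    print(len(runtime_rows), "runtime-observed,", len(kernel_rows), "kernel-only")
```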

Measuring application latency

PresentMon doesn't collect metrics for user input, so there is no direct measure for input-to-display latency in the CSV. However, if you assume the application collects user input immediately after presenting the previous frame, then a subset of the latency can be computed by finding the previous CSV row that uses the same swap chain and then computing:

msInputLatency = msBetweenPresents + msUntilDisplayed - previous(msInPresentAPI)

This is a subset of the true input-to-display latency and doesn't include:

  • time spent processing input in the keyboard/controller hardware or drivers (typically a fixed additional overhead),
  • any time that input events are queued before being used by the target application (which varies, potentially up to one frame longer),
  • time spent processing the output in the display hardware or drivers (typically a fixed additional overhead), and
  • a combination of display blanking interval and scan time (which varies, depending on timing and tearing).
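
The msInputLatency calculation above can be sketched in Python as follows. The sketch tracks the previous row per swap chain, keyed on ProcessID and SwapChainAddress, and applies the formula to each later row on the same swap chain. The function name `input_latency_estimates`, the sample values, and the choice to skip dropped frames (whose msUntilDisplayed is 0) are assumptions of this example rather than part of PresentMon.

```python
def input_latency_estimates(rows):
    """Apply msBetweenPresents + msUntilDisplayed - previous(msInPresentAPI) per swap chain."""
    previous = {}  # (ProcessID, SwapChainAddress) -> previous row on that swap chain
    estimates = []
    for row in rows:
        key = (row["ProcessID"], row["SwapChainAddress"])
        prev = previous.get(key)
        # Assumption: dropped frames are skipped because they were never displayed.
        if prev is not None and row.get("Dropped") != "1":
            estimates.append(
                float(row["msBetweenPresents"])
                + float(row["msUntilDisplayed"])
                - float(prev["msInPresentAPI"])
            )
        previous[key] = row
    return estimates


if __name__ == "__main__":
    # Made-up rows, shaped like the dictionaries csv.DictReader would yield.
    rows = [
        {"ProcessID": "100", "SwapChainAddress": "0xA", "Dropped": "0",
         "msBetweenPresents": "16.0", "msUntilDisplayed": "20.0", "msInPresentAPI": "1.0"},
        {"ProcessID": "100", "SwapChainAddress": "0xA", "Dropped": "0",
         "msBetweenPresents": "16.0", "msUntilDisplayed": "20.0", "msInPresentAPI": "1.0"},
    ]
    print(input_latency_estimates(rows))
```

The first row of each swap chain produces no estimate because it has no previous row to take msInPresentAPI from.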

Shutting down PresentMon on Windows 7

Some users have observed system stability issues when forcibly shutting down PresentMon on Windows 7. If you are having similar issues, they can be avoided by using Ctrl+C in the PresentMon window to shut it down.

More Repositories

1

IntroductionToVulkan

Source code examples for "API without Secrets: Introduction to Vulkan" tutorial
C++
1,273
star
2

MaskedOcclusionCulling

Example code for the research paper "Masked Software Occlusion Culling"; implements an efficient alternative to the hierarchical depth buffer algorithm.
C++
592
star
3

XeGTAO

An implementation of [Jimenez et al., 2016] Ground Truth Ambient Occlusion, MIT license
C++
589
star
4

GTS-GamesTaskScheduler

A task scheduling framework designed for the needs of game developers.
C++
437
star
5

ISPCTextureCompressor

ISPC Texture Compressor
C++
426
star
6

OcclusionCulling

Demonstrates a software (CPU) based approach to occlusion culling using multi-threading and SIMD instructions to improve performance.
C++
383
star
7

Intel-Texture-Works-Plugin

Intel has extended Photoshop* to take advantage of the latest image compression methods (BCn/DXT) via plugin. The purpose of this plugin is to provide a tool for artists to access superior compression results at optimized compression speeds within Photoshop*.
C++
247
star
8

ASSAO

Adaptive Screen Space Ambient Occlusion
C++
245
star
9

MetricsGui

Library of ImGui controls for displaying performance metrics.
C++
237
star
10

OutdoorLightScattering

Outdoor Light Scattering Sample
C++
236
star
11

DynamicCheckerboardRendering

Checkerboard Rendering and Dynamic Resolution Rendering in the DX12 MiniEngine
C++
174
star
12

OpenGL-ES-3.0-Deferred-Rendering

OpenGL ES 3.0 Deferred Renderer
C
142
star
13

CMAA2

Conservative Morphological Anti-Aliasing 2.0
C++
129
star
14

asteroids_d3d12

Intel Asteroids DirectX 12 Sample
C++
128
star
15

PracticalVulkan

Repository with code samples for "API without Secrets: The Practical Approach to Vulkan" series of articles.
C++
119
star
16

IntelShaderAnalyzer

Command line tool for offline shader ISA inspection.
C++
118
star
17

stardust_vulkan

The Stardust sample application uses the Vulkan graphics API to efficiently render a cloud of animated particles.
C
116
star
18

gpudetect

An example application that demonstrates how to detect which Intel GPU is present, as well as architecture-specific information such as how much memory is available.
C++
115
star
19

SamplerFeedbackStreaming

This sample uses D3D12 Sampler Feedback and DirectStorage as part of an asynchronous texture streaming solution.
C++
105
star
20

TAA

Intel® Graphics Optimized Temporal Anti-Aliasing (TAA)
C++
100
star
21

LightScattering

Source code for the light scattering sample
C++
96
star
22

VRS-DoF

Variable Rate Shading and Depth of Field
C++
95
star
23

CloudsGPUPro6

C++
91
star
24

DX12-Multi-Adapter

DirectX 12 Explicit Heterogeneous Multi-adapter Sample implementing Split Frame Rendering
C++
82
star
25

CloudySky

Cloud Rendering Sample
C++
79
star
26

XeSSUnrealPlugin

Intel® XeSS Plugin for Unreal* Engine
78
star
27

FlipModelD3D12

Interactive visualization for understanding swap chains in D3D12
C++
76
star
28

FaceMapping2

This is an improvement on the original FaceMapping code sample. This sample uses Intel RealSense to scan the user's face, and map it onto 3d head mesh.
C++
73
star
29

ClusteredShadingAndroid

Clustered shading on Android sample
C
68
star
30

HybridDetect

Heterogeneous & Homogeneous CPU Detect for Intel Processors
C++
65
star
31

DeferredCoarsePixelShading

Deferred Coarse Pixel Shading Source Code (For the article in GPU Pro 7)
C++
62
star
32

UnrealCapabilityDetect

A plugin for system capability detect in Unreal Engine 4
C++
55
star
33

FaceMapping

The face mapping sample uses the 3D Scan module to scan the user's face and then map it onto an existing 3D head model. This technique does a "stone face" mapping that is not rigged or currently capable of animating.
C++
44
star
34

FaceTracking

Intel® RealSense™ SDK-Based Real-Time Face Tracking and Animation
Logos
42
star
35

AOIT-Update

Adaptive Order Independent Transparency Sample
C++
39
star
36

UE4_GPA_Plugin

Intel® Graphics Performance Analyzer plugin for Unreal Engine* 4
C++
39
star
37

DynamicResolutionRendering

DynamicResolutionRendering_source_V3_update
C++
32
star
38

UE4RealSensePlugin

This UE4 plugin provides support for the Intel RealSense SDK to Unreal Engine 4 developers by exposing features of the SDK to the Blueprints Visual Scripting System.
C++
28
star
39

Multi-Adapter-Particles

Demonstration of Integrated + Discrete Multi-Adapter modified from Microsoft's D3D12nBodyGravity
C++
26
star
40

D3D12VariableRateShading

A simple D3D12 example demonstrating how to use Intel Tier 1 Variable Rate Shading
C++
21
star
41

MetricsDiscoveryHelper

A wrapper for Intel(R) MetricsDiscovery API that simplifies some common tasks and provides a more unified interface across different graphics APIs.
C++
20
star
42

UnityPerformanceSandbox

Project that can be used to learn Graphics Performance Analyzer toolkit by following along Unity* Optimization Guide for Intel x86 Platforms article. https://software.intel.com/en-us/android/articles/unity-optimization-guide-for-x86-android-part-1
C#
17
star
43

ChatHeads

Chat Heads is a native sample that uses RealSense to overlay background segmented (BGS) player images on a 3D scene or video playback in a multiplayer scenario.
C++
16
star
44

VALAR

Velocity And Luminance Adaptive Rasterization
C++
13
star
45

GrassInstancing

Grass rendering using geometry instancing in Direct3D 10.
C++
13
star
46

OpenGLESTessellation

Sample demonstrating the use of tessellation shaders with OpenGLES
C
13
star
47

Windows-Desktop-Sensors

Sample demonstrating how to use sensors for Windows Desktop
C++
8
star
48

EZSIMD

C++
8
star
49

64-bit-Typed-Atomics-Extension

C
8
star
50

XeSS-VALAR-Demo

Mini-Engine Demonstration of Combining XeSS with VRS Tier 2.
C++
8
star
51

CPU_Capability_Tester

C#
7
star
52

RCRaceland

Unreal Engine 4 sample showing how to take advantage of the CPU for more realistic scenes
C++
5
star
53

CmdThrottlePolicy

sample showing how to use the DX12 CmdThrottlePolicy Extension
C
4
star
54

VALAR-API

C
3
star
55

InstantAccess_Tiling

C++
3
star
56

AdaptiveSync

Demo and Library for Adaptive Sync
C++
3
star
57

gametechdev.github.io

HTML
1
star