• This repository has been archived on 04/Jan/2023
  • Language
    C++
  • License
    Other
  • Created almost 4 years ago
  • Updated almost 2 years ago

Intel® Graphics Optimized Temporal Anti-Aliasing (TAA)

DISCONTINUATION OF PROJECT

This project will no longer be maintained by Intel. Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. Intel no longer accepts patches to this project.

Intel® Graphics Optimized TAA

Implementation:

This DirectX® 12 compute shader Temporal Anti-Aliasing (TAA) implementation has been optimized to perform well on Intel® Graphics Gen11 integrated GPUs, as well as on Intel® Iris® Xe Graphics integrated and discrete GPUs, yet the same code can still yield strong TAA quality and performance on other discrete GPUs. We demonstrate this optimized TAA inside the well-known Microsoft MiniEngine codebase. TAA is a rendering technique that reuses colour samples from previous frames to achieve temporal super-sampling. It is a post-processing technique that takes as input the screen-space velocity map, the previous frame’s colour buffer (aka the accumulation or history buffer), the previous frame’s depth buffer, and the current frame’s colour and depth buffers. Samples from the previous frame are re-projected to the current frame’s pixel location using the velocity map and are then used to achieve temporal super-sampling. In addition, a subpixel jitter is applied during rendering either to the main projection matrix or to the viewport (Microsoft MiniEngine uses the latter), so that the sampled pixels come from different subpixel locations even when the camera (view) is not moving between frames. Please refer to TAAResolve.hlsl and TemporalEffects.cpp for a more detailed description of the implementation.

With quality settings, TAA takes as little as:

  • 1.9ms on Intel® Iris® Xe Graphics (i7-1186G7) or
  • 3.6ms on Intel® Iris® Plus Graphics (i7-1065G7) to anti-alias a 1080p 32bpp frame buffer.

With performance settings, TAA takes only 0.9ms and 1.7ms, respectively.

To achieve this performance, the following optimizations have been made:

  • fp16 optimization: the shader is optimized for 16-bit floating-point calculations, except for two calculations where the lack of precision was too visible. To toggle fp16 support, set #define USE_FP16 to 1 and enable SM6.2 16-bit precision in the shader compiler by adding the "-enable-16bit-types" option. Enabling fp16 improves performance by x1.04,
  • togglable thread group shared memory (#define USE_TGSM), which is a win in 64bpp (x1.05) but performs slightly worse in 32bpp when VarianceClipping uses the rotating grid, which alternates between an x and a + pattern every other frame,
  • instead of the rotating grid, VarianceClipping may use a 3x3 sampling pattern and a ray-AABB intersection to ensure maximum quality. By default, clipping to the AABB boundaries is used. VarianceClipping should not be switched off, as it is the main anti-ghosting solution. To control it, see #define USE_VARIANCE_CLIPPING,
  • the colour space used for variance clipping can be either RGB or YCoCg. YCoCg gives the best quality, whereas RGB may in certain scenarios make the image slightly more reddish (in exchange for a x1.03 speedup). Refer to #define USE_YCOCG_SPACE,
  • optionally using the longest velocity vector in the closest neighbourhood, which greatly improves the anti-aliasing quality of edges (#define USE_LONGEST_VELOCITY_VECTOR in the shader code),
  • the history buffer can be sampled using either Bicubic or Bilinear sampling; the former gives very good quality (no sharpening is needed), while the latter gives a x1.2 performance speedup, and many more.

Switches to control quality/performance ratio:

  • #USE_DEPTH_THRESHOLD - checks the depth buffers (current and previous frame); if the depth difference is larger than the expected value (which is stored with the Motion Vectors), the pixel is marked as no-history. This option helps remove ghosting artifacts.
  • #USE_VARIANCE_CLIPPING - the main algorithm for removing ghosting artifacts. It calculates the mean and standard deviation of the given pixel neighbourhood to build an AABB of expected colour values, and then clamps (cheaper) or intersects (better quality) the history colour against it to remove ghosting and overly bright pixels. The AABB min/max is calculated as mean -/+ Gamma * standard_deviation. A larger Gamma gives more temporally stable results but may increase ghosting, hence a confidence factor is used to lerp between stability and anti-ghosting. The confidence factor is built from the velocity of the current pixel and its depth difference between frames. Variance clipping should always be enabled (in either mode), because it is the main solution for ghosting artifacts. Gamma is controlled by two values: MIN_VARIANCE_GAMMA and MAX_VARIANCE_GAMMA. The former is used during movement and the latter on a still image; the ratio between them is calculated from the motion vector.
  • #USE_YCOCG_SPACE - the YCoCg colour space is used for Variance Clipping, which improves the precision of the AABB intersection. In RGB space, the image may become more reddish in certain scenarios.
  • #ALLOW_NEIGHBOURHOOD_SAMPLING - if there is no history, the neighbourhood is sampled in a cross pattern.
  • #USE_BICUBIC_FILTER - the Bicubic (5-tap) filter is the preferred sampling option for the history buffer (the temporal accumulation of previous samples). If disabled, Bilinear filtering is used, which is faster but introduces more blur to the final image.
  • #USE_LONGEST_VELOCITY_VECTOR - when reading the Velocity vector (Motion vector's Z component) for the given pixel, its neighbourhood is sampled and the longest vector is chosen. This greatly improves edge quality under motion. Unfortunately, it is costly, hence 9-sample and 5-sample options are provided. Disabling this option while leaving Bicubic filtering on enables a mixed solution: Bilinear on edges and Bicubic everywhere else. This is mostly for testing; Bilinear introduces more blur, which softens edges. Edge detection is done by Depth Gathering and checking whether the min/max difference is larger than a threshold.
  • #FRAME_VELOCITY_IN_PIXELS_DIFF - should be tweaked based on resolution; 128 was set empirically for 1080p. If the motion vector length is larger than this number, the pixel is marked as no-history.
  • #VARIANCE_BBOX_NUMBER_OF_SAMPLES - whether to use 9 or 5 samples to build the AABB for Variance Clipping.
  • #VARIANCE_INTERSECTION_MAX_T - the maximum "distance" between the source colour and the target colour when USE_VARIANCE_CLIPPING is set to intersection mode. Setting this to a larger value allows more bright pixels from the history buffer to be left unchanged.
  • #KEEP_HISTORY_TONE_MAPPED - whether to keep all colour calculations on tone-mapped colours. This is the default option. The output buffer, which becomes the history buffer in the next frame, is tone-mapped, so the next post-process should take this into account.
  • #USE_TONE_MAPPED_COLOUR_ONLY_IN_FINAL - currently set to !KEEP_HISTORY_TONE_MAPPED; however, it may be controlled manually if needed.
  • #USE_TGSM - allows using thread group shared memory to store the current frame colour.
  • #USE_FP16 - enables 16-bit precision for floating-point calculations. This requires SM6.2 and the "-enable-16bit-types" compiler option.
  • #ENABLE_DEBUG - whether to pass parameters from the UI to the shader.

Getting started:

  • Open ModelViewer/ModelViewer_VS16.sln
  • Select configuration: Debug (full validation), Profile (instrumented), Release
  • Select platform
  • Build and run
  • TAA implementation is at TAAResolve.hlsl

Controls:

  • forward/backward/strafe: left thumbstick or WASD (FPS controls)
  • up/down: triggers or E/Q
  • yaw/pitch: right thumbstick or mouse
  • toggle slow movement: click left thumbstick or lshift
  • open debug menu: back button or backspace
  • navigate debug menu: dpad or arrow keys
  • toggle debug menu item: A button or return
  • adjust debug menu value: dpad left/right or left/right arrow keys
