• This repository has been archived on 04/Jan/2023

Conservative Morphological Anti-Aliasing 2.0

DISCONTINUATION OF PROJECT

This project will no longer be maintained by Intel. Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. Intel no longer accepts patches to this project.

Conservative Morphological Anti-Aliasing version 2 (CMAA2)

This repository contains an implementation of CMAA2, a post-process anti-aliasing solution focused on providing good anti-aliasing while minimizing the change (i.e. blurring) of the source image at minimal execution cost.

Details of the implementation as well as quality and performance analysis are provided in the accompanying article at https://software.intel.com/en-us/articles/conservative-morphological-anti-aliasing-20.

Sample code in this repository is a DirectX 11 and DirectX 12 Compute Shader 5.0 HLSL reference implementation optimized for modern PC GPU hardware.

Sample overview

The sample application requires Windows 10 and Visual Studio 2017. Once built and started, the application will load the Amazon Lumberyard Bistro dataset (https://developer.nvidia.com/orca/amazon-lumberyard-bistro).

Testing with your own images

The UI on the left side provides the 'Scene' selection, used to change the 3D scene or to switch to a static image as the input ('StaticImage' option). The StaticImage option is useful for quickly testing CMAA2 or one of the other provided post-process AA effects on a screenshot captured from any external workload. To add your images to the StaticImage list, copy the image file in .png format to the \CMAA2\Projects\CMAA2\Media\TestScreenshots\ path and restart the application.

Switching between DirectX 11 and DirectX 12 implementations or hardware adapters

On the right UI panel tab named "System & Performance", under the settings for resolution, fullscreen and vsync, there is a new button for selecting the hardware adapter and the DirectX API version to use, out of what is available on the system. Changing the API/adapter will restart the application and the new settings will be remembered in the 'APIAdapter' file located in the folder where the executable is located.

AA options

Below is the list of selectable anti-aliasing options ('AA option'). To quickly change between them for comparison, '1' - '9' keyboard keys can also be used.

Various testing tools

For more detailed comparison, we provide 'ZoomTool' which, when enabled, allows zooming on a specific region of pixels for closer inspection.

In addition we also provide 'CompareTool' which can be used to capture a reference image ('Save ref' button) and compare it to the currently rendered image either numerically ('Compare with ref' button provides PSNR between the two) or visually (using 'Visualization' combo box to show the difference).
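
For readers unfamiliar with the metric, the following is a minimal sketch of the standard PSNR formula for 8-bit images, 10 * log10(255^2 / MSE). The function name `ComputePSNR` and the flat-array image layout are assumptions made for illustration; this is not the sample's own CompareTool code, which may compute the value differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Peak signal-to-noise ratio, in dB, between two 8-bit images stored as flat
// arrays of channel values. Hypothetical helper, not taken from the sample.
double ComputePSNR(const std::vector<std::uint8_t>& reference,
                   const std::vector<std::uint8_t>& current)
{
    assert(reference.size() == current.size() && !reference.empty());

    // Accumulate the squared per-channel error.
    double sumSquaredError = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i)
    {
        const double diff = double(reference[i]) - double(current[i]);
        sumSquaredError += diff * diff;
    }

    const double mse = sumSquaredError / double(reference.size());
    if (mse == 0.0)
        return std::numeric_limits<double>::infinity(); // identical images

    const double peak = 255.0; // maximum value of an 8-bit channel
    return 10.0 * std::log10((peak * peak) / mse);
}
```

Higher values mean the current image is closer to the reference; identical images give infinity.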

Finally, tools for automated testing of quality and performance of the various AA options are available in the 'Benchmarking' section.

Code integration guide

This integration guide covers basic steps needed for integration into an engine and relies on the sample code in this repository. Relevant source code files are CMAA2/CMAA2.hlsl (used by both DX11 and DX12 versions) and CMAA2/vaCMAA2DX11.cpp (for DX11) and/or CMAA2/vaCMAA2DX12.cpp (for DX12).

Shaders

The effect uses three main compute shader kernels: the first (EdgesColor2x2CS) is launched with a standard Dispatch, and the other two (ProcessCandidatesCS and DeferredColorApply2x2CS) are launched with DispatchIndirect. There is also one helper kernel (ComputeDispatchArgsCS) used to set up the indirect dispatch argument buffer.

All the shader code is contained in CMAA2/CMAA2.hlsl, and various features and integration options are controlled using macros. The only macros without defaults are those related to the input/output color format and UAV typed store hardware capabilities; these must be set up as explained in the next section.

Inputs/Outputs

CMAA2 takes a single color image and applies anti-aliasing in-place. There is no fullscreen copy; only the required texels are changed. It can also optionally take precomputed luma as input, since this is usually available as a byproduct of tonemapping, which allows for a faster execution path (significantly reducing memory bandwidth during edge detection). The input color Shader Resource View should be in the standard color format used for linear sampling (for example, R8G8B8A8_UNORM_SRGB) and the UAV in the best matching format supported for typed UAV stores (for example, R8G8B8A8_UNORM, explained below).

Due to lack of hardware UAV Typed Store format support for certain formats, we use CMAA2_UAV_STORE_TYPED, CMAA2_UAV_STORE_TYPED_UNORM_FLOAT and CMAA2_UAV_STORE_CONVERT_TO_SRGB to define the optimal path with regards to used color format and hardware support. For example, if the application is using R8G8B8A8_UNORM_SRGB color format, and the hardware does not support UAV typed stores for _SRGB but does support typed stores for R8G8B8A8_UNORM (most common scenario), we will use the following settings:

#define CMAA2_UAV_STORE_TYPED               1   // use typed UAV store
#define CMAA2_UAV_STORE_CONVERT_TO_SRGB     1   // manually convert to SRGB so we can use non-SRGB typed store because typed stores for SRGB are not supported
#define CMAA2_UAV_STORE_TYPED_UNORM_FLOAT   1   // required for non-float semantics correctness (RWTexture2D<unorm float4>)

In another example, if the application is using R11G11B10_FLOAT, all modern hardware will support direct typed UAV stores in which case the settings are simply:

#define CMAA2_UAV_STORE_TYPED               1   // use typed UAV store
#define CMAA2_UAV_STORE_CONVERT_TO_SRGB     0   // no need to convert to SRGB - R11G11B10_FLOAT does not use SRGB encoding
#define CMAA2_UAV_STORE_TYPED_UNORM_FLOAT   0   // not required for non-float semantics correctness (RWTexture2D<float4>)
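
The two examples above can be folded into a small host-side selection routine. The following is a sketch only: `ColorTargetCaps`, `CMAA2StoreMacros` and `SelectStoreMacros` are hypothetical names, the capability flags are assumed to come from the application's own format-support queries, and setting CMAA2_UAV_STORE_TYPED_UNORM_FLOAT to 1 for every UNORM format is an assumption extrapolated from the macro comments. Cases not described in this section are deliberately reported as unhandled.

```cpp
// Hypothetical description of the color target and of what the device reports.
struct ColorTargetCaps
{
    bool isSRGB;               // e.g. R8G8B8A8_UNORM_SRGB
    bool isUnorm;              // UNORM family; false for R11G11B10_FLOAT
    bool typedStoreForFormat;  // typed UAV store supported for the format itself
    bool typedStoreForNonSRGB; // typed UAV store supported for the non-sRGB sibling
};

// Values to pass to the shader compiler for the three macros.
struct CMAA2StoreMacros
{
    int uavStoreTyped;           // CMAA2_UAV_STORE_TYPED
    int uavStoreConvertToSRGB;   // CMAA2_UAV_STORE_CONVERT_TO_SRGB
    int uavStoreTypedUnormFloat; // CMAA2_UAV_STORE_TYPED_UNORM_FLOAT
};

// Returns true and fills 'out' for the two scenarios described in the text;
// returns false for anything this sketch does not cover.
bool SelectStoreMacros(const ColorTargetCaps& caps, CMAA2StoreMacros& out)
{
    if (caps.isSRGB)
    {
        // Most common scenario: no typed store for _SRGB, so store through
        // the UNORM sibling and convert to sRGB manually in the shader.
        if (!caps.typedStoreForNonSRGB)
            return false;
        out = CMAA2StoreMacros{1, 1, 1};
        return true;
    }

    if (!caps.typedStoreForFormat)
        return false;

    // Direct typed store; no sRGB conversion needed.
    // Assumption: any UNORM format needs the 'unorm float4' UAV declaration.
    out = CMAA2StoreMacros{1, 0, caps.isUnorm ? 1 : 0};
    return true;
}
```

For the R8G8B8A8_UNORM_SRGB scenario this yields 1/1/1, and for R11G11B10_FLOAT it yields 1/0/0, matching the two listings above.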

With regards to edge detection, CMAA2 can work in two modes. The first is based on weighted average of per-channel color differences (marginally better quality) and the second is based on the luma difference (faster, default). These modes are controlled using CMAA2_EDGE_DETECTION_LUMA_PATH macro using the following settings:

  • 0 enables color-based edge detection path
  • 1 enables luma-based edge detection path where luma is computed from input colors inplace (simplest)
  • 2 enables luma-based edge detection path where precomputed luma is loaded from a separate R8_UNORM input texture (best performance, ideal)
  • 3 enables luma-based edge detection path where precomputed luma is loaded from alpha channel of the original input color texture (faster than 1 but slower than 2 and not an option for color formats with no alpha channel like R11G11B10_FLOAT)
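
One way to choose among these four values on the host side is sketched below. The struct `EdgeDetectionInputs` and the function `SelectLumaPath` are hypothetical names; the preference order (separate luma texture, then luma in alpha, then in-place luma) simply follows the relative performance notes in the list above.

```cpp
// Hypothetical summary of what the engine can provide to CMAA2.
struct EdgeDetectionInputs
{
    bool preferColorEdges;       // trade some speed for marginally better quality
    bool hasSeparateLumaTexture; // precomputed luma in a separate R8_UNORM texture
    bool hasLumaInAlpha;         // precomputed luma in the alpha channel of the color input
};

// Returns the value to use for CMAA2_EDGE_DETECTION_LUMA_PATH.
int SelectLumaPath(const EdgeDetectionInputs& in)
{
    if (in.preferColorEdges)
        return 0; // color-based edge detection
    if (in.hasSeparateLumaTexture)
        return 2; // best performance
    if (in.hasLumaInAlpha)
        return 3; // faster than 1, slower than 2; needs an alpha channel
    return 1;     // luma computed from the input colors (simplest)
}
```

An engine using R11G11B10_FLOAT would leave `hasLumaInAlpha` false, since that format has no alpha channel.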

A detailed example of how to handle most formats and detect hardware support is provided in vaCMAA2DX11.cpp, starting at line 384.

Temporary working buffers

CMAA2 requires a couple of working buffers for storing intermediate data. This data is only required during CMAA2 execution (there is no required persistence between CMAA2 invocations, so the memory can be used for other purposes if needed).

The amount of temporary storage memory required by default is roughly width * height * 5 bytes. This can be reduced by processing the image in smaller tiles, at the expense of added complexity.
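
As a quick worked example of this estimate, the sketch below evaluates width * height * 5 for a given resolution. The function name `EstimateCMAA2WorkingMemoryBytes` is hypothetical, and the result is only the rough figure quoted above, not the exact sum of the buffer sizes allocated by the sample.

```cpp
#include <cstdint>

// Rough default working-memory estimate: about 5 bytes per pixel.
// Hypothetical helper; 64-bit math avoids overflow at large resolutions.
std::uint64_t EstimateCMAA2WorkingMemoryBytes(std::uint32_t width, std::uint32_t height)
{
    const std::uint64_t bytesPerPixel = 5;
    return std::uint64_t(width) * std::uint64_t(height) * bytesPerPixel;
}
```

For 1920x1080 this gives 10,368,000 bytes (roughly 10 MB), and for 3840x2160 it gives 41,472,000 bytes (roughly 41 MB).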

These storage buffers are:

  • working edges texture : stores the edges output by EdgesColor2x2CS
  • working shape candidates buffer : stores potential shapes for further processing, filled in EdgesColor2x2CS and read/used by ProcessCandidatesCS
  • working deferred blend buffers : these 3 buffers store a linked list of per-pixel anti-aliased color values that are output by ProcessCandidatesCS and consumed and applied to the final texture by DeferredColorApply2x2CS

For the details on buffer creation, sizes and formats please refer to vaCMAA2DX11.cpp from line 489.

Compute calls

The execution steps are:

  1. EdgesColor2x2CS: takes color (or luma) texture SRV as the input and computes edges and starting points for further processing as well as clearing some of the temporary buffers
  2. (ComputeDispatchArgsCS): computes the arguments for the next kernel's DispatchIndirect call
  3. ProcessCandidatesCS: takes the output from EdgesColor2x2CS and the source color texture SRV and does all color processing at the required locations, storing final color results into temporary storage
  4. (ComputeDispatchArgsCS): computes the arguments for the next kernel's DispatchIndirect call
  5. DeferredColorApply2x2CS: takes all output values from ProcessCandidatesCS and deterministically blends them back into the source (now output) texture using UAV
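
The ordering of the steps above can be written down as data, which is sometimes handy when wiring the passes into an engine's own command recording. This is a sketch under assumptions: `CMAA2Pass`, `DispatchKind` and `BuildCMAA2PassList` are hypothetical names, and the helper kernel is assumed to be launched with a standard Dispatch, which the text does not state explicitly.

```cpp
#include <string>
#include <vector>

enum class DispatchKind { Direct, Indirect };

// One entry per compute call, in execution order.
struct CMAA2Pass
{
    std::string  kernel; // entry point name in CMAA2.hlsl
    DispatchKind kind;   // how the kernel is launched
    bool         helper; // true for the argument-setup helper kernel
};

std::vector<CMAA2Pass> BuildCMAA2PassList()
{
    std::vector<CMAA2Pass> passes;
    // 1. Edge detection and candidate generation.
    passes.push_back({"EdgesColor2x2CS", DispatchKind::Direct, false});
    // 2. Helper: writes the indirect arguments for step 3 (Direct is assumed).
    passes.push_back({"ComputeDispatchArgsCS", DispatchKind::Direct, true});
    // 3. Color processing at the candidate locations.
    passes.push_back({"ProcessCandidatesCS", DispatchKind::Indirect, false});
    // 4. Helper: writes the indirect arguments for step 5 (Direct is assumed).
    passes.push_back({"ComputeDispatchArgsCS", DispatchKind::Direct, true});
    // 5. Blend the stored results back into the color texture through the UAV.
    passes.push_back({"DeferredColorApply2x2CS", DispatchKind::Indirect, false});
    return passes;
}
```

An integration would walk this list and issue the matching Dispatch or DispatchIndirect call for each entry, with the appropriate resource barriers between passes.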

For details on the compute call setup please refer to 'vaCMAA2DX11::Execute' in vaCMAA2DX11.cpp from line 609.

Settings

The most relevant quality and performance settings are:

  • CMAA2_STATIC_QUALITY_PRESET: sets quality level; acceptable values are 0, 1, 2, 3 (defaults to 2), with 0 being the lowest quality, highest performance option and 3 being the highest quality, lowest performance option.
  • CMAA2_EXTRA_SHARPNESS: enable/disable (0 - off is default) 'extra sharp' path that further minimizes image changes at the cost of avoiding some anti-aliasing.

There are no dynamic quality settings in this implementation for simplicity & performance reasons; however the edge detection threshold can be changed at runtime if needed for more granular and dynamic quality/performance setup.
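
Because these settings are compile-time macros, an integration typically builds a list of defines before compiling CMAA2.hlsl. The sketch below shows one way to do that; `ShaderDefine` and `BuildCMAA2QualityDefines` are hypothetical names, and clamping out-of-range presets to 0..3 is a design choice of this sketch rather than behavior of the sample.

```cpp
#include <string>
#include <utility>
#include <vector>

// Name/value pair to hand to the shader compiler (hypothetical type).
using ShaderDefine = std::pair<std::string, std::string>;

// Builds the two quality-related defines, using the documented defaults
// (preset 2, extra sharpness off) when no arguments are given.
std::vector<ShaderDefine> BuildCMAA2QualityDefines(int qualityPreset = 2,
                                                   bool extraSharpness = false)
{
    // Clamp to the documented range of acceptable values.
    if (qualityPreset < 0) qualityPreset = 0;
    if (qualityPreset > 3) qualityPreset = 3;

    std::vector<ShaderDefine> defines;
    defines.push_back({"CMAA2_STATIC_QUALITY_PRESET", std::to_string(qualityPreset)});
    defines.push_back({"CMAA2_EXTRA_SHARPNESS", extraSharpness ? "1" : "0"});
    return defines;
}
```

Changing either value requires recompiling the shaders, which is why the text above describes them as static settings.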

Credits

This sample uses following code and libraries:

License

CMAA2 is licensed under Apache-2 License, see license.txt for more information.
