  • Stars: 343
  • Rank: 123,371 (top 3%)
  • Language: C++
  • License: Apache License 2.0
  • Created over 9 years ago
  • Updated 10 months ago


Repository Details

optimized screen-space ambient occlusion, cache-aware hbao

gl ssao

This sample implements screen-space ambient occlusion (SSAO) using horizon-based ambient occlusion (HBAO). You can find some details about HBAO here. It provides two alternative implementations: the original HBAO, and an enhanced version that uses de-interleaved texturing to make better use of the hardware's texture sampling cache, which makes it more efficient.


Note: This sample provides an improved HBAO algorithm; however, it is not the same as HBAO+, which is part of NVIDIA ShadowWorks and further improves the quality and performance of the algorithm.

HBAO - Classic:

  • To achieve the effect, a 4x4 texture containing random directions is tiled across the screen and used to sample the depth values in each pixel's neighborhood.
  • The sampling distance depends on a customizable world-space radius; to support this, the depth-buffer values are typically linearized first.
  • Because the screen-space sampling radius depends on the pixel's depth, the texture lookups can vary widely from one pixel to the next.
  • To reduce cost, the effect can be computed at a lower resolution and upscaled for final display. Because AO is typically a low-frequency effect, this is often sufficient.
  • The 4x4 texture tiling can cause dithering artifacts. To further improve quality, the image is blurred with a cross-bilateral filter that takes the depth values into account.
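The classic steps above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the sample's shader code: the Jitter layout (a cos/sin rotation plus a start offset), the helper names makeJitterTable, jitterForPixel and radiusInPixels, and the projection-scale formula for a symmetric perspective projection are all choices made for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <random>

// One texel of a hypothetical 4x4 jitter texture: a random rotation of the
// sampling directions (stored as cos/sin) plus a random start offset along each ray.
struct Jitter {
  float cosAngle;
  float sinAngle;
  float startOffset;
};

// Fills the 16 texels. numDirections is the number of directions marched per
// pixel; rotating by at most one direction step keeps the directions evenly spread.
std::array<Jitter, 16> makeJitterTable(int numDirections, unsigned seed) {
  assert(numDirections > 0);
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> uni(0.0f, 1.0f);
  const float kPi = 3.14159265358979f;
  std::array<Jitter, 16> table{};
  for (Jitter& j : table) {
    float angle = 2.0f * kPi * uni(rng) / static_cast<float>(numDirections);
    float offset = uni(rng);
    j = Jitter{std::cos(angle), std::sin(angle), offset};
  }
  return table;
}

// Tiling the texture across the screen: pixel (x, y) uses texel (x mod 4, y mod 4).
const Jitter& jitterForPixel(const std::array<Jitter, 16>& table, int x, int y) {
  assert(x >= 0 && y >= 0);
  return table[(y % 4) * 4 + (x % 4)];
}

// Converts a world-space AO radius to a radius in pixels at a given linear view
// depth: farther pixels get a smaller footprint, so lookups vary per pixel.
float radiusInPixels(float worldRadius, float linearDepth, float viewportHeight, float fovY) {
  assert(linearDepth > 0.0f);
  float projScale = viewportHeight / (2.0f * std::tan(fovY * 0.5f));
  return worldRadius * projScale / linearDepth;
}
```

Pixels that are a multiple of 4 apart in both x and y share one jitter texel, which is exactly the property the cache-aware variant exploits.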

HBAO - Cache-Aware:

  • Performance is vastly improved by grouping all pixels that share the same direction values. To do this, the screen-space linear depth buffer is stored in 16 layers, each representing one direction of the 4x4 texture. Each layer has a quarter of the original resolution in each dimension. The total number of pixels is not reduced, but the sampling is performed in the same directions for the entire layer, yielding better texture cache utilization.
  • Linearizing the depth buffer now stores its results into 16 texture layers.
  • The HBAO effect itself is computed in each layer individually; since all layers are independent of each other, they can be processed in parallel.
  • Finally, the results are scattered back to their original locations in screen space.
  • Compared to the classic HBAO approach, the efficiency gains allow the effect to run at full resolution, improving image quality.
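The de-interleave and re-interleave steps above are pure index remapping. The sketch below shows that mapping on plain CPU arrays; the LayeredImage type and the helper names are hypothetical, and it assumes the width and height are multiples of 4 (a real implementation would pad), whereas the sample itself works on GPU texture layers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 16 layers of (W/4) x (H/4) texels, stored one after another. Layer
// (y % 4) * 4 + (x % 4) holds every pixel that shares one jitter texel.
struct LayeredImage {
  int layerWidth;
  int layerHeight;
  std::vector<float> texels;

  std::size_t index(int layer, int lx, int ly) const {
    return (static_cast<std::size_t>(layer) * layerHeight + ly) * layerWidth + lx;
  }
  float at(int layer, int lx, int ly) const { return texels[index(layer, lx, ly)]; }
  float& at(int layer, int lx, int ly) { return texels[index(layer, lx, ly)]; }
};

// Gathers a full-resolution image into 16 quarter-resolution layers.
LayeredImage deinterleave(const std::vector<float>& full, int width, int height) {
  assert(width % 4 == 0 && height % 4 == 0);
  assert(full.size() == static_cast<std::size_t>(width) * height);
  LayeredImage out{width / 4, height / 4, std::vector<float>(full.size())};
  for (int y = 0; y < height; ++y)
    for (int x = 0; x < width; ++x)
      out.at((y % 4) * 4 + (x % 4), x / 4, y / 4) =
          full[static_cast<std::size_t>(y) * width + x];
  return out;
}

// Inverse step: scatters each layer texel back to its screen-space pixel.
std::vector<float> reinterleave(const LayeredImage& layers) {
  int width = layers.layerWidth * 4;
  int height = layers.layerHeight * 4;
  std::vector<float> full(static_cast<std::size_t>(width) * height);
  for (int layer = 0; layer < 16; ++layer)
    for (int ly = 0; ly < layers.layerHeight; ++ly)
      for (int lx = 0; lx < layers.layerWidth; ++lx) {
        int x = lx * 4 + layer % 4;
        int y = ly * 4 + layer / 4;
        full[static_cast<std::size_t>(y) * width + x] = layers.at(layer, lx, ly);
      }
  return full;
}
```

Because every texel of a layer uses the same jitter entry, neighboring threads in a layer sample in the same directions, which is where the cache benefit comes from.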

MSAA support:

  • The effect is run per sample, N times, where N is the MSAA sample count.
  • For each pass, glSampleMaski(0, 1 << sample); is used to update only the corresponding sample in the target framebuffer.
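The per-sample loop can be sketched as below. To keep it runnable without a GL context, the GL calls are injected as callbacks: in an OpenGL build, setSampleMask would wrap glSampleMaski(0, mask) with GL_SAMPLE_MASK enabled, and drawPass would issue the SSAO draw for that sample. The function name drawAoPerSample and the reset to an all-ones mask afterwards are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Runs one AO pass per MSAA sample, restricting each pass to a single sample.
void drawAoPerSample(int msaaSamples,
                     const std::function<void(std::uint32_t)>& setSampleMask,
                     const std::function<void(int)>& drawPass) {
  assert(msaaSamples >= 1 && msaaSamples <= 32);
  for (int sample = 0; sample < msaaSamples; ++sample) {
    // Only this sample's bit is set, so the pass writes to that sample alone.
    setSampleMask(1u << sample);
    drawPass(sample);
  }
  // Re-enable all samples so later draws are unaffected.
  setSampleMask(~0u);
}
```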

Blur:

  • A cross-bilateral blur is used to eliminate the typical dithering artifacts. It uses the depth buffer to avoid smoothing across geometric discontinuities.
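A one-dimensional CPU sketch of a cross-bilateral blur shows why depth stops the smoothing at edges: each tap is weighted by a spatial Gaussian term and a depth-difference term, so taps across a depth discontinuity get almost no weight. The function name, the sigma = radius / 2 choice and the sharpness parameter are assumptions of this example; the sample applies its blur on the GPU in two passes.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Blurs one row of AO values, guided by the matching row of linear depths.
std::vector<float> crossBilateralBlur1D(const std::vector<float>& ao,
                                        const std::vector<float>& depth,
                                        int radius, float sharpness) {
  assert(ao.size() == depth.size());
  assert(radius >= 1);
  const float sigma = 0.5f * static_cast<float>(radius);
  const float falloff = 1.0f / (2.0f * sigma * sigma);
  const int n = static_cast<int>(ao.size());
  std::vector<float> out(ao.size());
  for (int i = 0; i < n; ++i) {
    float sum = 0.0f;
    float weightSum = 0.0f;
    for (int r = -radius; r <= radius; ++r) {
      int j = i + r;
      if (j < 0 || j >= n) continue;  // skip taps outside the row
      float dz = depth[j] - depth[i];
      // Spatial Gaussian times a depth term that vanishes across edges.
      float w = std::exp(-static_cast<float>(r * r) * falloff - dz * dz * sharpness);
      sum += w * ao[j];
      weightSum += w;
    }
    out[i] = sum / weightSum;  // the center tap has weight 1, so weightSum > 0
  }
  return out;
}
```

With constant depth this reduces to a plain Gaussian blur; the depth term is what keeps occlusion from one surface leaking onto another.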


Performance

The cache-aware technique pays off with larger AO radii or at higher resolutions (such as full HD).

Timings are in microseconds, measured with GL timer queries on a Quadro M6000 at 1080p without MSAA (the sample defaults to 720p, where the difference between the two techniques may be smaller).

Classic

Timer ssao;            GL    2434;
 Timer linearize;      GL      54;
 Timer ssaocalc;       GL    2177;
 Timer ssaoblur;       GL     198;

Cache-Aware

Timer ssao;            GL    1264;        
 Timer linearize;      GL      55;
 Timer viewnormal;     GL      76;
 Timer deinterleave;   GL      93;
 Timer ssaocalc;       GL     762;
 Timer reinterleave;   GL     100;
 Timer ssaoblur;       GL     167;
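Working through the timings above (plain arithmetic on these single measurements, nothing more) shows where the cache-aware gain comes from: the overall and ssaocalc speedups, and the cost of the three extra passes compared with the time they save in ssaocalc:

```latex
\frac{2434\,\mu\text{s}}{1264\,\mu\text{s}} \approx 1.93
\qquad
\frac{2177\,\mu\text{s}}{762\,\mu\text{s}} \approx 2.86
\qquad
\underbrace{76 + 93 + 100}_{\text{viewnormal, deinterleave, reinterleave}} = 269\,\mu\text{s}
\;<\; 2177 - 762 = 1415\,\mu\text{s}
```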

Sample Highlights

The user can change MSAA settings, blur settings and other parameters.

Key functionality is found in

  • Sample::drawHbaoClassic()
  • Sample::drawHbaoCacheAware()

As well as in helper functions

  • Sample::drawLinearDepth()
  • Sample::drawHbaoBlur()

The sample contains alternate codepaths for two additional optimizations, which are enabled by default.

  • USE_AO_SPECIALBLUR: Depth is stored alongside the SSAO result, so the blur needs a single texture fetch instead of two, which improves performance.
  • USE_AO_LAYERED_SINGLEPASS: In the cache-aware technique, all layers of the SSAO calculation are updated at once, using either image stores with an attachment-less FBO or a layered geometry shader, instead of rendering to each layer individually.

Building

Ideally, clone this and other interesting nvpro-samples repositories into a common subdirectory. You will always need nvpro_core, which is searched for either as a subdirectory of the sample or one directory up.

If you are interested in multiple samples, you can use the build_all CMake project as the entry point; it also gives you options to enable or disable individual samples when creating the solutions.

Providing Pull Requests

NVIDIA is happy to review and consider pull requests for merging into the main tree of the nvpro-samples for bug fixes and features. Before providing a pull request to NVIDIA, please note the following:

  • A pull request provided to this repo by a developer constitutes permission from the developer for NVIDIA to merge the provided changes or any NVIDIA modified version of these changes to the repo. NVIDIA may remove or change the code at any time and in any way deemed appropriate.
  • Not all pull requests can be or will be accepted. NVIDIA will close pull requests that it does not intend to merge. The modified files and any new files must include the unmodified NVIDIA copyright header seen at the top of all shipping files.

More Repositories

  • vk_raytracing_tutorial_KHR: Ray tracing examples and tutorials using VK_KHR_ray_tracing (C++, 1,314 stars)
  • vk_mini_path_tracer: A beginner-friendly Vulkan path tracing tutorial in under 300 lines of C++. (C++, 1,098 stars)
  • vk_raytrace: Ray tracing glTF scene with Vulkan (C++, 533 stars)
  • gl_occlusion_culling: OpenGL sample for shader-based occlusion culling (C++, 517 stars)
  • nvpro_core: shared source code and resources needed for the samples to run (C++, 457 stars)
  • optix_advanced_samples (C, 411 stars)
  • gl_vk_meshlet_cadscene: This OpenGL/Vulkan sample illustrates the use of "mesh shaders" for rendering CAD models. (C++, 345 stars)
  • build_all: GO HERE FIRST: nvpro-samples overview (Batchfile, 312 stars)
  • vk_order_independent_transparency: Demonstrates seven different techniques for order-independent transparency in Vulkan. (C++, 264 stars)
  • vk_video_samples: Vulkan video samples (C++, 239 stars)
  • gl_vk_chopper: Simple vulkan rendering example. (C++, 202 stars)
  • vk_mini_samples: Collection of Vulkan samples (HLSL, 184 stars)
  • vk_raytracing_tutorial_NV: Vulkan ray tracing examples and tutorials using VK_NV_ray_tracing (C++, 158 stars)
  • gl_vk_threaded_cadscene: OpenGL and Vulkan comparison on rendering a CAD scene using various techniques (C++, 157 stars)
  • gl_cadscene_rendertechniques: OpenGL sample on various rendering approaches for typical CAD scenes (C++, 151 stars)
  • gl_commandlist_basic: OpenGL sample for NV_command_list (C++, 112 stars)
  • vk_gltf_renderer: Rendering glTF scenes with ray tracer and raster (Vulkan) (C++, 102 stars)
  • vk_displacement_micromaps: This sample showcases rasterizing and ray tracing displaced NVIDIA Micro-Mesh assets in Vulkan with and without the VK_NV_displacement_micromap extension. (C++, 92 stars)
  • vk_denoise: Denoising a Vulkan ray traced image using OptiX denoiser (C++, 88 stars)
  • gl_vk_bk3dthreaded: Vulkan sample rendering 3D with 'worker-threads' (C++, 84 stars)
  • gl_vk_simple_interop: Display an image created by Vulkan compute shader, with OpenGL (C++, 76 stars)
  • vk_toon_shader: Silhouette and toon shading post-processing with Vulkan (C++, 74 stars)
  • gl_dynamic_lod: GPU classifies how to render millions of particles (C++, 71 stars)
  • nvtt_samples: NVIDIA Texture Tools samples for compression, image processing, and decompression. (C++, 64 stars)
  • gl_vk_supersampled: Vulkan sample showing a high quality super-sampled rendering (C++, 63 stars)
  • optix_prime_baking: Shows how to bake ambient occlusion at mesh vertices using OptiX Prime (45 stars)
  • vk_compute_mipmaps: Customizable compute shader for fast cache-aware mipmap generation (GLSL, 41 stars)
  • vk_async_resources: Sample showcasing lifetime management and resource transfers in Vulkan (C++, 32 stars)
  • gl_vk_raytrace_interop: Adding ray traced ambient occlusion using Vulkan and OpenGL (C++, 30 stars)
  • vk_timeline_semaphore: Vulkan timeline semaphore + async compute performance sample (GLSL, 26 stars)
  • gl_render_vk_ddisplay: OpenGL sample that renders into a Vulkan direct display (C++, 26 stars)
  • gl_multicast: OpenGL sample for the new GL_NVX_linked_gpu_multicast extension (C++, 25 stars)
  • vk_device_generated_cmds: Vulkan sample on VK_NV_device_generated_commands (C++, 25 stars)
  • shared_external: external libraries, needed for the samples (AntTweakBar; ZLib...) (HTML, 17 stars)
  • vk_offline: Rendering offline using Vulkan without opening a window (C++, 13 stars)
  • glsl_indexed_types_generator: GLSL code generator to aid use of Vulkan's descriptor set indexing (Lua, 12 stars)
  • gl_cuda_simple_interop: Sample showing OpenGL and CUDA interop (C++, 11 stars)
  • vk_memory_decompression: Vulkan Memory Decompression (VK_NV_memory_decompression) sample (C++, 10 stars)
  • vk_streamline: DLSS Super Resolution and DLSS Frame Generation via Streamline (C++, 10 stars)
  • vk_idbuffer_rasterization: Vulkan sample to render efficient per-part IDs in CAD models (C++, 8 stars)
  • gl_path_rendering_CMYK: Example of how to use path rendering; and how to use it with CMYK (using multi-render target) (C++, 8 stars)
  • dx12_present_barrier: This sample demonstrates the usage of the new NvAPI interface to synchronize present calls between windows on the same system as well as on distributed systems. (C++, 7 stars)
  • nvml_enterprise_gpu_check: Shows how to check if a GPU is an Enterprise/Quadro GPU using NVML. (C++, 4 stars)
  • vk_raytrace_displacement (C++, 3 stars)
  • gl_vrs: Variable Rate Shading in OpenGL (C++, 3 stars)
  • third_party_binaries: pre-built libraries for the nvpro-samples framework (C, 2 stars)
  • vk_inherited_viewport: VK_NV_inherited_viewport_scissor and secondary subpass command buffer re-use (C++, 2 stars)
  • vk_ddisplay: Sample to demonstrate multi-GPU rendering and presenting to ddisplays, meaning displays that are not part of the Windows desktop and of which an application takes complete control. (C++, 2 stars)