Sampler Feedback Streaming With DirectStorage
Introduction
This repository contains an MIT licensed demo of DirectX12 Sampler Feedback Streaming, a technique using DirectX12 Sampler Feedback to guide continuous loading and eviction of small regions (tiles) of textures - in other words, virtual texture streaming. Sampler Feedback Streaming can dramatically improve visual quality by enabling scenes consisting of 100s of gigabytes of resources to be drawn on GPUs containing much less physical memory. The scene below uses just ~200MB of a 1GB heap, despite over 350GB of total texture resources. It also uses DirectStorage for Windows for maximum file upload performance.
New: incorporated DirectStorage for Windows v1.1.0 with GPU decompression. Be sure to update your GPU drivers to access your vendor's optimized GPU decompression capabilities.
See also:
- GDC 2021 video (alternate link) which provides an overview of Sampler Feedback and discusses this sample starting at about 15:30.
- GDC 2021 presentation in PDF form
- Microsoft DirectStorage Landing Page
Textures derived from Hubble Images, see the Hubble Copyright
Notes:
- while multiple objects can share the same DX texture and source file, this sample aims to demonstrate the possibility of every object having a unique resource. Hence, every texture is treated as though unique, though the same source file may be used multiple times.
- the repo does not include all textures shown above (they total over 13GB). A few 16k x 16k textures are available as release 1 and release 2.
- the file format has changed since large textures were provided as "releases." See the log below.
- this repository depends on DirectStorage for Windows® version 1.1.0 from https://www.nuget.org/packages/Microsoft.Direct3D.DirectStorage/
- at build time, BCx textures (BC7 and BC1 tested) in the dds/ directory are converted into the custom .XET format and placed in the $(TargetDir)/media directory (e.g. x64/Release/media). A few dds files are included.
Requirements:
- minimum:
- Windows 10 20H1 (aka May 2020 Update, build 19041)
- GPU with D3D12 Sampler Feedback Support such as Intel Iris Xe Graphics as found in 11th Generation Intel® Core™ processors and discrete GPUs (driver version 30.0.100.9667 or later)
- recommended:
- Windows 11
- NVMe SSD with PCIe Gen4 or later
- Intel Arc A770 discrete GPU or later
Build Instructions
Download the source. Build the appropriate solution file:
- Visual Studio 2022: SamplerFeedbackStreaming_vs2022.sln
- Visual Studio 2019: SamplerFeedbackStreaming.sln
All executables, scripts, configurations, and media files will be found in the x64/Release or x64/Debug directories. You can run from within the Visual Studio IDE or from the command line, e.g.:
c:\SamplerFeedbackStreaming\x64\Release> expanse.exe
By default (no command line options) there will be a single object, "terrain", which allows for exploring sampler feedback streaming. To explore sampler feedback streaming, expand "Terrain Object Feedback Viewer." In the top right find 2 windows: the raw GPU sampler feedback (min mip map of desired tiles) and to its right the "residency map" generated by the application (min mip map of tiles that have been loaded). Across the bottom are the mips of the texture, with mip 0 in the bottom left. Left-click drag the terrain to see sampler feedback streaming in action. Note that navigation in this mode has the up direction locked, which can be disabled in the UI.
Press the DEMO MODE button or run the batch file demo.bat to see streaming in action. Press "page up" or click Tile Min Mip Overlay to toggle a visualization of the tiles loading. Toggle Roller Coaster mode (page down) to fly through the scene. Note that keyboard controls are inactive while the Camera slider is non-zero.
c:\SamplerFeedbackStreaming\x64\Release> demo.bat
Benchmark mode generates massive disk traffic by cranking up the animation rate, dialing up the sampler bias, and rapidly switching between two camera paths to force eviction of all the current texture tiles. This mode is designed to stress the whole platform, from storage to PCIe interface to CPU and GPU.
Two sets of high resolution textures are available for use with "demo-hubble.bat": hubble-16k.zip and hubble-16k-bc1.zip. However, they are in an older file format. Simply drop them into the "dds" directory and rebuild DdsToXet, or convert them to the new file format with convert.bat (see below). Make sure the mediadir in the batch file is set properly, or override it on the command line as follows:
c:\SamplerFeedbackStreaming\x64\Release> demo-hubble.bat -mediadir c:\hubble-16k
Keyboard controls
- qwe / asd : strafe left, forward, strafe right / rotate left, back, rotate right
- z c : levitate up and down
- v b : rotate around the look direction (z axis)
- arrow keys : rotate left/right, pitch down/up
- shift : move faster
- mouse left-click drag : rotate view
- page up : toggle the min mip map overlay onto every object (visualize tiles loading)
- page down : while camera animation is non-zero, toggles fly-through "rollercoaster" vs. fly-around "orbit"
- space : toggles camera animation on/off.
- home : toggles UI. Hold "shift" while UI is enabled to toggle mini UI mode.
- insert : toggles frustum visualization
- esc : while windowed, exit. while full-screen, return to windowed mode
Configuration files and command lines
For a full list of command line options, pass the command line "?", e.g.
c:> expanse.exe ?
Most of the detailed settings for the system can be found in the default configuration file config.json. You can replace this configuration with a custom configuration file with the '-config' command line:
-config myconfig.json
The options in the json have corresponding command lines, e.g.:
json:
"mediaDir" : "media"
equivalent command line:
-mediadir media
Creating Your Own Textures
The executable DdsToXet.exe converts BCn DDS textures to the custom XET format. Only BC1 and BC7 textures have been tested. Usage:
c:> ddstoxet.exe -in myfile.dds -out myfile.xet
The batch file convert.bat will read all the DDS files in one directory and write XET files to a second directory. The output directory must exist.
c:> convert c:\myDdsFiles c:\myXetFiles
A new DirectStorage trace capture and playback utility has been added so DirectStorage performance can be analyzed without the overhead of rendering. For example, to capture and play back the DirectStorage requests and submits for 500 "stressful" frames with a staging buffer size of 128MB, cd to the build directory and:
stress.bat -timingstart 200 -timingstop 700 -capturetrace
traceplayer.exe -file uploadTraceFile_1.json -mediadir media -staging 128
TileUpdateManager: a library for streaming textures
The sample includes a library TileUpdateManager with a minimal set of APIs defined in SamplerFeedbackStreaming.h. The central object, TileUpdateManager, allows for the creation of streaming textures and heaps to contain them. These objects handle all the feedback resource creation, readback, processing, and file/IO.
The application creates a TileUpdateManager and 1 or more heaps in Scene.cpp:
m_pTileUpdateManager = std::make_unique<TileUpdateManager>(m_device.Get(), m_commandQueue.Get(), tumDesc);
// create 1 or more heaps to contain our StreamingResources
for (UINT i = 0; i < m_args.m_numHeaps; i++)
{
    m_sharedHeaps.push_back(m_pTileUpdateManager->CreateStreamingHeap(m_args.m_streamingHeapSize));
}
Each SceneObject creates its own StreamingResource. Note a StreamingResource can be used by multiple objects, but this sample was designed to emphasize the ability to manage many resources and so objects are 1:1 with StreamingResources.
m_pStreamingResource = std::unique_ptr<StreamingResource>(in_pTileUpdateManager->CreateStreamingResource(in_filename, in_pStreamingHeap));
Known issues
Performance Degradation
Performance appears to degrade over time with some non-Intel devices/drivers as exposed by the bandwidth graph in benchmark mode after a few minutes. Compare the following healthy graph to the graph containing stalls below:
As a workaround, try the command line -config fragmentationWA.json, e.g.:
c:\SamplerFeedbackStreaming\x64\Release> demo.bat -config fragmentationWA.json
c:\SamplerFeedbackStreaming\x64\Release> stress.bat -mediadir c:\hubble-16k -config fragmentationWA.json
The issue (which does not affect Intel GPUs) is the tile allocations in the heap becoming fragmented relative to resources. Specifically, the CPU time for UpdateTileMappings gradually increases causing the streaming system to stall waiting for pending operations to complete. The workaround reduces fragmentation by distributing streaming resources across multiple small heaps (vs. a single large heap), which can result in visual artifacts if the small heaps fill. To mitigate the small heaps filling, more total heap memory is allocated. There may be other (unexplored) solutions, e.g. perhaps by implementing a hash in the tiled heap allocator. This workaround adjusts two properties:
"heapSizeTiles": 512, // size for each heap. 64KB per tile * 512 tiles -> 32MB heap
"numHeaps": 127, // number of heaps. streaming resources will be distributed among heaps
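To make the memory cost of these two properties concrete, the arithmetic can be sketched as below. The helper names are hypothetical and not part of the sample; only the 64KB tile size and the two values come from the workaround configuration.

```cpp
#include <cassert>
#include <cstdint>

// Size of one reserved-resource tile (64KB), as stated in the config comment.
constexpr std::uint64_t kTileSizeBytes = 64 * 1024;

// Hypothetical helper: bytes in one heap that holds heapSizeTiles tiles.
inline std::uint64_t HeapSizeBytes(std::uint64_t heapSizeTiles)
{
    assert(heapSizeTiles > 0);
    return heapSizeTiles * kTileSizeBytes;
}

// Hypothetical helper: total bytes across numHeaps equally sized heaps.
inline std::uint64_t TotalHeapBytes(std::uint64_t heapSizeTiles, std::uint64_t numHeaps)
{
    assert(numHeaps > 0);
    return HeapSizeBytes(heapSizeTiles) * numHeaps;
}
```

With "heapSizeTiles": 512 and "numHeaps": 127, each heap is 32MB and the total is 127 * 32MB = 4064MB, which is the extra heap memory the workaround trades for reduced fragmentation.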
Cracks between tiles
The demo exhibits texture cracks due to the way feedback is used. Feedback is always read after drawing, resulting in loads and evictions corresponding to that frame only becoming available for a future frame. That means we never have exactly the texture data we need when we draw (unless no new data is needed). Most of the time this isn't perceptible, but sometimes a fast-moving object enters the view resulting in visible artifacts.
The following image shows an exaggerated version of the problem, created by disabling streaming completely then moving the camera:
In this case, the hardware sampler is reaching across tile boundaries to perform anisotropic sampling, but encounters tiles that are not physically mapped. D3D12 Reserved Resource tiles that are not physically mapped return black to the sampler. This could be mitigated by dilating or eroding the min mip map such that there is no more than 1 mip level difference between neighboring tiles. That visual optimization is TBD.
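Since that optimization is not implemented, the following is only a sketch of one way the constraint could be enforced on the CPU. It assumes the "eroding" direction: values are only raised (clamped to coarser mips), so the map never claims data that is not loaded. The function name and the 4-connected neighborhood are assumptions, and whether this removes the artifact in practice is untested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the sample): raises entries of a
// width x height min mip map until 4-connected neighbors differ by at most 1.
// Entries are only ever increased, i.e. regions are clamped to coarser mips.
void LimitMinMipGradient(std::vector<std::uint8_t>& minMip, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(minMip.size() == static_cast<std::size_t>(width) * height);

    const int dx[] = { -1, 1, 0, 0 };
    const int dy[] = { 0, 0, -1, 1 };

    // Repeat until a full pass makes no change. Values are bounded by the
    // largest entry in the map, so the loop terminates.
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                std::uint8_t& v = minMip[static_cast<std::size_t>(y) * width + x];
                for (int i = 0; i < 4; i++)
                {
                    const int nx = x + dx[i];
                    const int ny = y + dy[i];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) { continue; }
                    const int n = minMip[static_cast<std::size_t>(ny) * width + nx];
                    if (n > v + 1)
                    {
                        v = static_cast<std::uint8_t>(n - 1);
                        changed = true;
                    }
                }
            }
        }
    }
}
```

For example, a single row holding {0, 0, 5} becomes {3, 4, 5}. The opposite ("dilating") direction would instead lower the requested mip of neighbors in the feedback, which costs more memory rather than detail.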
There are also a few known bugs:
- entering full screen in a multi-gpu system moves the window to a monitor attached to the GPU by design. However, if the window starts on a different monitor, it "disappears" on the first maximization. Hit escape then maximize again, and it should work fine.
- full-screen while remote desktop is not borderless.
How It Works
This implementation of Sampler Feedback Streaming uses DX12 Sampler Feedback in combination with DX12 Reserved Resources, aka Tiled Resources. A multi-threaded CPU library processes feedback from the GPU, makes decisions about which tiles to load and evict, loads data from disk storage, and submits mapping and uploading requests via GPU copy queues. There is no explicit GPU-side synchronization between the queues, so rendering frame rate is not dependent on completion of copy commands (on GPUs that support concurrent multi-queue operation) - in this sample, GPU time is mostly a function of the Sampler Feedback Resolve() operations described below. The CPU threads run continuously and asynchronously from the GPU (pausing when there's no work to do), polling fence completion states to determine when feedback is ready to process or when copies and memory mapping have completed.
All the magic can be found in the TileUpdateManager library (see the internal file TileUpdateManager.h - applications should include SamplerFeedbackStreaming.h), which abstracts the creation of StreamingResources and heaps while internally managing feedback resources, file I/O, and GPU memory mapping.
The technique works as follows:
1. Create a Texture to be Streamed
The streaming textures are allocated as DX12 Reserved Resources, which behave like VirtualAlloc in C. Each resource takes no physical GPU memory until 64KB regions of the resource are committed in 1 or more GPU heaps. The x/y dimensions of a reserved resource tile are a function of the texture format, such that the tile fills a 64KB GPU memory page. For example, BC7 textures have 256x256 tiles, while BC1 textures have 512x256 tiles.
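The tile dimensions quoted above follow from the block size of each format. The sketch below derives them, assuming 4x4-texel blocks of 8 bytes (BC1) or 16 bytes (BC7) and a tile that is either square or twice as wide as it is tall. The type and function names are hypothetical; in a real application, ID3D12Device::GetResourceTiling is the authoritative source for tile shape.

```cpp
#include <cassert>
#include <cstdint>

struct TileShape
{
    std::uint32_t widthTexels;
    std::uint32_t heightTexels;
};

// Hypothetical helper: texel dimensions of a 64KB tile for a block-compressed
// format, given the number of bytes per 4x4 block (8 for BC1, 16 for BC7).
TileShape TileShapeForBlockFormat(std::uint32_t bytesPerBlock)
{
    assert(bytesPerBlock == 8 || bytesPerBlock == 16);
    const std::uint32_t blocksPerTile = 65536 / bytesPerBlock;

    // Split the power-of-two block count into width x height, doubling the
    // width first so that any odd factor of 2 ends up in the width.
    std::uint32_t widthBlocks = 1;
    std::uint32_t heightBlocks = 1;
    while (widthBlocks * heightBlocks < blocksPerTile)
    {
        if (widthBlocks == heightBlocks) { widthBlocks *= 2; }
        else { heightBlocks *= 2; }
    }
    return { widthBlocks * 4, heightBlocks * 4 };
}
```

BC7 yields 4096 blocks per tile (64x64 blocks, 256x256 texels), while BC1 yields 8192 blocks per tile (128x64 blocks, 512x256 texels).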
In Expanse, each tiled resource corresponds to a single .XeT file on a hard drive (though multiple resources can point to the same file). The file contains dimensions and format, but also information about how to access the tiles within the file.
2. Create and Pair a Min-Mip Feedback Map
To use sampler feedback, we create a feedback resource corresponding to each streaming resource, with identical dimensions to record information about which texels were sampled.
For this streaming usage, we use the min mip feedback feature by creating the resource with the format DXGI_FORMAT_SAMPLER_FEEDBACK_MIN_MIP_OPAQUE. We set the region size of the feedback to match the tile dimensions of the tiled resource (streaming resource) through the SamplerFeedbackRegion member of D3D12_RESOURCE_DESC1.
For the feedback to be written by GPU shaders (in this case, pixel shaders) the texture and feedback resources must be paired through a view created with CreateSamplerFeedbackUnorderedAccessView.
3. Draw Objects While Recording Feedback
For Expanse, there is a "normal" non-feedback shader named terrainPS.hlsl and a "feedback-enabled" version of the same shader, terrainPS-FB.hlsl. The latter simply writes feedback using the WriteSamplerFeedback HLSL intrinsic, using the same sampler and texture coordinates, then calls the prior shader. Compare the WriteSamplerFeedback() call below to the Sample() call in terrainPS.hlsl.
To add feedback to an existing shader:
- include the original shader hlsl
- add binding for the paired feedback resource
- call the WriteSamplerFeedback intrinsic with the resource and sampler defined in the original shader
- call the original shader
#include "terrainPS.hlsl"
FeedbackTexture2D<SAMPLER_FEEDBACK_MIN_MIP> g_feedback : register(u0);
float4 psFB(VS_OUT input) : SV_TARGET0
{
    g_feedback.WriteSamplerFeedback(g_streamingTexture, g_sampler, input.tex.xy);
    return ps(input);
}
4. Process Feedback
Sampler Feedback resources are opaque, and must be resolved before they can be interpreted on the CPU.
Resolving feedback for one resource is inexpensive, but adds up when there are 1000 objects. Expanse has a configurable time limit for the amount of feedback resolved each frame. The "FB" shaders are only used for a subset of resources such that the amount of feedback produced can be resolved within the time limit. The time limit is managed by the application, not by the TileUpdateManager library, by keeping a running average of resolve time as reported by GPU timers.
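The idea of budgeting resolves against a time limit can be sketched as follows. This is not the sample's DetermineMaxNumFeedbackResolves; the class name, the exponential moving average (used here as one possible form of running average), and the smoothing factor are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical budget tracker: estimates how many feedback resolves fit in a
// per-frame time limit, based on a running average of measured GPU time.
class ResolveBudget
{
public:
    explicit ResolveBudget(double timeLimitMs) : m_timeLimitMs(timeLimitMs)
    {
        assert(timeLimitMs > 0.0);
    }

    // Record the GPU time spent resolving numResolved resources last frame.
    void AddSample(double resolveTimeMs, std::uint32_t numResolved)
    {
        if (numResolved == 0) { return; }
        const double msPerResolve = resolveTimeMs / numResolved;
        const double alpha = 0.25; // assumed smoothing factor
        m_avgMsPerResolve = m_hasSample
            ? (1.0 - alpha) * m_avgMsPerResolve + alpha * msPerResolve
            : msPerResolve;
        m_hasSample = true;
    }

    // Number of resources to gather feedback for this frame. Always at least 1
    // (when there are objects) so that a new timing sample is produced.
    std::uint32_t MaxResolves(std::uint32_t numObjects) const
    {
        if (numObjects == 0) { return 0; }
        if (!m_hasSample || m_avgMsPerResolve <= 0.0) { return 1; }
        const double affordable = m_timeLimitMs / m_avgMsPerResolve;
        const std::uint32_t count = (affordable >= numObjects)
            ? numObjects
            : static_cast<std::uint32_t>(affordable);
        return std::max<std::uint32_t>(count, 1);
    }

private:
    double m_timeLimitMs;
    double m_avgMsPerResolve = 0.0;
    bool m_hasSample = false;
};
```

With a 2ms limit and a measured 1ms for 8 resolves, the budget is 16 resources per frame; the application would then rotate which objects use the "FB" shader so that every object is eventually sampled.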
As an optimization, Expanse tells streaming resources to evict all tiles if they are behind the camera. This could potentially be improved to include any object not in the view frustum.
You can find the time limit estimation, the eviction optimization, and the request to gather sampler feedback by searching Scene.cpp for the following:
- DetermineMaxNumFeedbackResolves determines how many resources to gather feedback for
- QueueEviction tells the runtime to evict tiles for this resource (as soon as possible)
- SetFeedbackEnabled results in 2 actions:
- tell the runtime to collect feedback for this object via TileUpdateManager::QueueFeedback(), which results in clearing and resolving the feedback resource for this resource for this frame
- use the feedback-enabled pixel shader for this object
5. Determine Which Tiles to Load & Evict
The resolved Min mip feedback tells us the minimum mip tile that should be loaded. The min mip feedback is traversed, updating an internal reference count for each tile. If a tile previously was unused (ref count = 0), it is queued for loading from the bottom (highest mip) up. If a tile is not needed for a particular region, its ref count is decreased (from the top down). When its ref count reaches 0, it might be ready to evict.
Data structures for tracking reference count, residency state, and heap usage can be found in StreamingResource.cpp and StreamingResource.h; look for TileMappingState. This class also has methods for interpreting the feedback buffer (ProcessFeedback) and updating the residency map (UpdateMinMipMap), which execute concurrently in separate CPU threads.
class TileMappingState
{
public:
    // see file for method declarations
private:
    TileLayer<BYTE> m_resident;
    TileLayer<UINT32> m_refcounts;
    TileLayer<UINT32> m_heapIndices;
};
TileMappingState m_tileMappingState;
Tiles can only be evicted if there are no lower-mip-level tiles that depend on them, e.g. a mip 1 tile may have four mip 0 tiles "above" it in the mip hierarchy, and may only be evicted if all 4 of those tiles have also been evicted. The ref count helps us determine this dependency.
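The reference counting described in this step can be modeled with a small standalone class. This is a simplified sketch rather than the sample's TileMappingState: the names are hypothetical, each region is a mip 0 tile, and a min mip equal to the number of standard mips means "packed mips only".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct TileCoord { std::uint32_t x, y, mip; };

// Simplified, hypothetical model of per-tile reference counting.
class TileRefCounts
{
public:
    // width/height: regions (mip 0 tiles); numMips: number of standard mips.
    TileRefCounts(std::uint32_t width, std::uint32_t height, std::uint32_t numMips)
        : m_width(width)
        , m_minMip(static_cast<std::size_t>(width) * height, static_cast<std::uint8_t>(numMips))
    {
        assert(width > 0 && height > 0 && numMips > 0 && numMips < 16);
        for (std::uint32_t mip = 0; mip < numMips; mip++)
        {
            const std::uint32_t w = (width + (1u << mip) - 1) >> mip;
            const std::uint32_t h = (height + (1u << mip) - 1) >> mip;
            m_layerWidth.push_back(w);
            m_refs.emplace_back(static_cast<std::size_t>(w) * h, 0u);
        }
    }

    // Apply the resolved feedback value for region (x, y). Tiles whose count
    // rises from 0 are appended to loads; tiles that fall to 0, to evictions.
    void SetMinMip(std::uint32_t x, std::uint32_t y, std::uint8_t desired,
                   std::vector<TileCoord>& loads, std::vector<TileCoord>& evictions)
    {
        assert(desired <= m_refs.size());
        std::uint8_t& current = m_minMip[static_cast<std::size_t>(y) * m_width + x];
        // More detail needed: add references from the coarsest new mip down.
        while (current > desired)
        {
            current--;
            if (Ref(x, y, current)++ == 0) { loads.push_back({ x >> current, y >> current, current }); }
        }
        // Less detail needed: release references from the finest mip up.
        while (current < desired)
        {
            assert(Ref(x, y, current) > 0);
            if (--Ref(x, y, current) == 0) { evictions.push_back({ x >> current, y >> current, current }); }
            current++;
        }
    }

private:
    std::uint32_t& Ref(std::uint32_t x, std::uint32_t y, std::uint32_t mip)
    {
        return m_refs[mip][static_cast<std::size_t>(y >> mip) * m_layerWidth[mip] + (x >> mip)];
    }

    std::uint32_t m_width;
    std::vector<std::uint8_t> m_minMip;              // current min mip per region
    std::vector<std::uint32_t> m_layerWidth;         // tiles per row, per mip
    std::vector<std::vector<std::uint32_t>> m_refs;  // reference counts, per mip
};
```

Because every region that needs mip 0 also holds a reference on the mip 1 tile beneath it, the mip 1 tile only reaches a count of 0 after all of the mip 0 regions it covers have been released, which gives the dependency ordering described above.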
A tile also cannot be evicted if it is being used by an outstanding draw command. We prevent this by delaying evictions a frame or two depending on swap chain buffer count (i.e. double or triple buffering). If a tile is needed before the eviction delay completes, the tile is simply rescued from the pending eviction data structure instead of being re-loaded.
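A delayed-eviction queue with rescue might look like the sketch below. The class and method names are hypothetical and the sample's actual pending-eviction data structure may differ; the delay would be chosen from the swap chain buffer count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct PendingTile
{
    std::uint32_t x, y, mip;
    bool operator==(const PendingTile& o) const { return x == o.x && y == o.y && mip == o.mip; }
};

// Hypothetical sketch: tiles wait numFramesDelay frames before eviction.
class EvictionDelay
{
public:
    explicit EvictionDelay(std::size_t numFramesDelay) : m_buckets(numFramesDelay)
    {
        assert(numFramesDelay >= 1);
    }

    // Queue a tile whose reference count reached 0 this frame.
    void Append(const PendingTile& tile) { m_buckets.front().push_back(tile); }

    // If the tile is still pending, remove it so it is never unmapped.
    // Returns true when the tile was rescued (no reload is required).
    bool Rescue(const PendingTile& tile)
    {
        for (auto& bucket : m_buckets)
        {
            auto it = std::find(bucket.begin(), bucket.end(), tile);
            if (it != bucket.end()) { bucket.erase(it); return true; }
        }
        return false;
    }

    // Call once per frame. Returns the tiles that have waited long enough and
    // ages the remaining tiles by one frame.
    std::vector<PendingTile> NextFrame()
    {
        std::vector<PendingTile> ready = std::move(m_buckets.back());
        for (std::size_t i = m_buckets.size() - 1; i > 0; i--)
        {
            m_buckets[i] = std::move(m_buckets[i - 1]);
        }
        m_buckets.front().clear();
        return ready;
    }

private:
    std::vector<std::vector<PendingTile>> m_buckets; // index = frames waited
};
```

With a delay of 2, a tile appended in frame N is returned by the second subsequent NextFrame() call unless Rescue() removes it first.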
The mechanics of loading, mapping, and unmapping tiles are all contained within the DataUploader class, which depends on a FileStreamer class to do the actual tile loads. The latter implementation (FileStreamerReference) can easily be exchanged with DirectStorage for Windows.
6. Update Residency Map
Because textures are only partially resident, we only want the pixel shader to sample resident portions. Sampling texels that are not physically mapped returns 0s, resulting in undesirable visual artifacts. To prevent this, we clamp all sampling operations based on a residency map. The residency map is relatively tiny: for a 16k x 16k BC7 texture, which would take 350MB of GPU memory, we only need a 4KB residency map. Note that the lowest-resolution "packed" mips are loaded for all objects, so there is always something available to sample. See also GetResourceTiling.
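The sizes quoted above can be checked with a little arithmetic. The sketch below assumes BC7 at 16 bytes per 4x4 block, a full mip chain, 256x256-texel tiles, and one byte per region in the residency map (which is what the 4KB figure implies); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes for the full mip chain of a square BC7 texture.
std::uint64_t Bc7MipChainBytes(std::uint32_t dim)
{
    assert(dim > 0);
    std::uint64_t total = 0;
    for (std::uint32_t d = dim; ; d /= 2)
    {
        const std::uint64_t blocksPerRow = (d + 3) / 4; // at least one block
        total += blocksPerRow * blocksPerRow * 16;
        if (d == 1) { break; }
    }
    return total;
}

// Hypothetical helper: residency map size, one byte per mip 0 tile region.
std::uint64_t ResidencyMapBytes(std::uint32_t dim, std::uint32_t tileDim)
{
    assert(dim > 0 && tileDim > 0);
    const std::uint64_t tilesPerRow = (dim + tileDim - 1) / tileDim;
    return tilesPerRow * tilesPerRow;
}
```

For a 16384 x 16384 texture this gives 357,913,968 bytes (about 341MiB, in line with the roughly 350MB quoted above) against a 64 x 64 = 4096 byte residency map.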
When a texture tile has been loaded or evicted by TileUpdateManager, it updates the corresponding residency map. The residency map is an application-generated representation of the minimum mip available for each region in the texture, and is described in the Sampler Feedback spec as follows:
The MinMip map represents per-region mip level clamping values for the tiled texture; it represents what is actually loaded.
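One way to derive such a map from per-tile residency flags is sketched below. It is an illustration, not the sample's UpdateMinMipMap: the function name and data layout are assumptions, and it treats a mip as usable for a region only if every coarser mip covering that region is also resident.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// resident[mip][y * tilesPerRowAtMip + x] != 0 means the tile is loaded.
using ResidentLayers = std::vector<std::vector<std::uint8_t>>;

// Hypothetical helper: builds a width x height residency map (one entry per
// mip 0 tile region). A value equal to resident.size() means that only the
// always-resident packed mips are available for that region.
std::vector<std::uint8_t> BuildResidencyMap(const ResidentLayers& resident,
                                            std::uint32_t width, std::uint32_t height)
{
    const std::uint32_t numMips = static_cast<std::uint32_t>(resident.size());
    assert(width > 0 && height > 0 && numMips < 16);
    std::vector<std::uint8_t> minMip(static_cast<std::size_t>(width) * height);
    for (std::uint32_t y = 0; y < height; y++)
    {
        for (std::uint32_t x = 0; x < width; x++)
        {
            // Walk from the coarsest standard mip toward mip 0 and stop at the
            // first tile that is not resident.
            std::uint32_t mip = numMips;
            while (mip > 0)
            {
                const std::uint32_t s = mip - 1;
                const std::uint32_t tilesPerRow = (width + (1u << s) - 1) >> s;
                const std::size_t index = static_cast<std::size_t>(y >> s) * tilesPerRow + (x >> s);
                if (!resident[s][index]) { break; }
                mip = s;
            }
            minMip[static_cast<std::size_t>(y) * width + x] = static_cast<std::uint8_t>(mip);
        }
    }
    return minMip;
}
```

Stopping at the first non-resident tile means a resident mip 0 tile is ignored while the mip 1 tile beneath it is missing, so the shader never clamps to a level whose coarser neighbors in the mip chain are unavailable.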
Below, the Visualization mode was set to "Color = Mip" and labels were added. TileUpdateManager processes the Min Mip Feedback (left window in top right), uploads and evicts tiles to form a Residency map, which is a proper min-mip-map (right window in top right). The contents of memory can be seen in the partially resident mips along the bottom (black is not resident). The last 3 mip levels are never evicted because they are packed mips (all fit within a 64KB tile). In this visualization mode, the colors of the texture on the bottom correspond to the colors of the visualization windows in the top right. Notice how the resident tiles do not exactly match what feedback says is required.
To reduce GPU memory, a single combined buffer contains all the residency maps for all the resources. The pixel shader samples the corresponding residency map to clamp the sampling function to the minimum mip that is resident, thereby avoiding sampling tiles that have not been mapped.
We can see the lookup into the residency map in the pixel shader terrainPS.hlsl. Resources are defined at the top of the shader, including the reserved (tiled) resource g_streamingTexture, the residency map g_minmipmap, and the sampler:
Texture2D g_streamingTexture : register(t0);
Buffer<uint> g_minmipmap: register(t1);
SamplerState g_sampler : register(s0);
The shader offsets into its region of the residency map (g_minmipmapOffset) and loads the minimum mip value for the region to be sampled.
int2 uv = input.tex * g_minmipmapDim;
uint index = g_minmipmapOffset + uv.x + (uv.y * g_minmipmapDim.x);
uint mipLevel = g_minmipmap.Load(index);
The sampling operation is clamped to the minimum mip resident (mipLevel).
float3 color = g_streamingTexture.Sample(g_sampler, input.tex, 0, mipLevel).rgb;
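For readers who want to check the indexing on the CPU, the shader logic above can be mirrored in C++. This is an illustrative sketch with hypothetical function names; it assumes texture coordinates in [0, 1), which the shader snippet itself does not enforce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Mirrors the shader's index computation into the combined residency buffer.
// dimX/dimY correspond to g_minmipmapDim, offset to g_minmipmapOffset.
std::uint32_t ResidencyIndex(float u, float v,
                             std::uint32_t dimX, std::uint32_t dimY, std::uint32_t offset)
{
    assert(u >= 0.0f && u < 1.0f && v >= 0.0f && v < 1.0f);
    // Truncation toward zero, like the float-to-int2 conversion in the shader.
    const std::uint32_t x = static_cast<std::uint32_t>(u * static_cast<float>(dimX));
    const std::uint32_t y = static_cast<std::uint32_t>(v * static_cast<float>(dimY));
    return offset + x + y * dimX;
}

// Effect of the clamp argument to Sample(): the level of detail that is used
// is never finer (numerically lower) than the resident min mip.
float ClampedLod(float requestedLod, std::uint32_t residentMinMip)
{
    return std::max(requestedLod, static_cast<float>(residentMinMip));
}
```

For a 64 x 64 residency map stored at offset 100, texture coordinate (0.5, 0.25) maps to region (32, 16) and index 100 + 32 + 16 * 64 = 1156.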
7. Putting it all Together
There is some work that needs to be done before drawing objects that use feedback (clearing feedback resources), and some work that needs to be done after (resolving feedback resources). TileUpdateManager creates these commands, but does not execute them. Each frame, these command lists must be built and submitted with application draw commands, which you can find just before the call to Present() in Scene.cpp as follows:
auto commandLists = m_pTileUpdateManager->EndFrame();
ID3D12CommandList* pCommandLists[] = { commandLists.m_beforeDrawCommands, m_commandList.Get(), commandLists.m_afterDrawCommands };
m_commandQueue->ExecuteCommandLists(_countof(pCommandLists), pCommandLists);
Log
- 2022-10-24: Added DirectStorage trace playback utility to measure performance of file upload independent of rendering. For example, to capture and playback the DirectStorage requests and submits for 500 "stressful" frames with a staging buffer size of 128MB, cd to the build directory and:
stress.bat -timingstart 200 -timingstop 700 -capturetrace
traceplayer.exe -file uploadTraceFile_1.json -mediadir media -staging 128
- 2022-06-10: File format (.xet) change. DdsToXet can upgrade old Xet files to the new format. Assets in the DDS directory are exported at build time into media directory. Upgrade to DirectStorage v1.0.2. Many misc. improvements.
- 2022-05-05: Workaround for rare race condition. Many tweaks and improvements.
- 2022-03-14: DirectStorage 1.0.0 integrated into mainline
- 2021-12-15: "-addAliasingBarriers" command line option to add an aliasing barrier to assist PIX analysis. Can also be enabled in config.json.
- 2021-12-03: added BC1 asset collection as "release 2." All texture assets (.xet files) can reside in the same directory despite format differences, and can co-exist in the same GPU heap. Also minor source tweaks, including fix to not cull base "terrain" object.
- 2021-10-21: code refactor to improve sampler feedback streaming library API
- 2021-08-10: Added some 16k x 16k textures (BC7 format) posted as "release 1".
License
Sample and its code provided under MIT license, please see LICENSE. All third-party source code provided under their own respective and MIT-compatible Open Source licenses.
Copyright (C) 2021, Intel Corporation