• Stars
    star
    207
  • Rank 189,769 (Top 4 %)
  • Language
    C
  • License
    MIT License
  • Created over 9 years ago
  • Updated over 4 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A C++11/OpenGL library for the Fast Fourier Transform

GLFFT

GLFFT is a C++11/OpenGL library for doing the Fast Fourier Transform (FFT) on a GPU in one or two dimensions. The target APIs are OpenGL 4.3 core profile and OpenGL ES 3.1. GLFFT is implemented entirely with compute shaders.

The FFT has several uses in graphics. The two main ones are Tessendorf's FFT water simulation technique as well as applying massive convolution filters to images more efficiently.

Features

GLFFT is a well featured single-precision and half-precision (FP16) FFT library designed for graphics use cases.

  • Power-of-two transforms
  • 1D/2D complex-to-complex transform
  • 1D/2D real-to-complex transform
  • 1D/2D complex-to-real transform
  • 1D/2D dual complex-to-complex transforms which pair two complex numbers into a vec4 (useful for working for RGBA data).
  • 1D/2D convolution support
  • Normalized and unnormalized FFT
  • Support for choosing if input and output to GLFFT is treated as packed FP16 or FP32.
  • Support for using Shader Storage Buffer Objects or Textures as inputs and outputs to GLFFT.
  • Applicable for many different GPUs with many performance oriented knobs to tweak.
  • Support for FFT "wisdom", a method where GLFFT will find optimal parameters for a particular GPU.
  • A serialization interface for storing GLFFT wisdom for later use.
  • A standalone CLI for verification and benchmarking.

Platform support

Desktop Linux and Windows have been tested and are supported. Supports both OpenGL 4.3 core and OpenGL ES 3.1.

The GLSL code has been tested and verified on:

  • ARM Mali T-760 (Linux, Android 5.x)
  • nVidia GTX 760 (Linux)
  • Intel HD 4400 (Windows 7 x64)

Compilers tested:

  • GCC 4.8
  • GCC 4.9
  • GCC 5.2
  • Clang 3.6

Performance

GLFFT is a performance oriented FFT library, and aims to reach optimal efficiency on both desktop and mobile GPUs.

  • Supports different vector sizes to match both scalar and vector-based GPU hardware.
  • Supports techniques to reduce bank conflicts which greatly improves performance of GPUs with banked shared memory such as nVidia and AMD.
  • Supports almost any workgroup size decomposition to better match ideal memory access patterns on various hardware.
  • Supports mediump precision to improve performance in FP16 FFTs on mobile GPUs, e.g. ARM Mali.
  • Supports packed FP16 input and FP16 output to greatly reduce bandwidth when FP16 is accurate enough for the particular application.

Integrating GLFFT into a code base

GLFFT is an OpenGL oriented implementation, but all API specifics of GLFFT have been abstracted away to make it suitable for integration into engines which might have abstracted interfaces to the underlying graphics APIs. It is possible to implement GLFFT without OpenGL, as long as GLSL is supported as a shading language, which is assumed to be feasible once SPIR-V becomes mainstream. The abstracted interface is designed to match the spirit of next-generation APIs like Vulkan and D3D12.

As using GLFFT in a GL context is by far the most common use case, a default, ready-to-go OpenGL interface is supplied in the API. Due to the nature of OpenGL, it is not always feasible to build GLFFT as a standalone library, as applications might have their own ways of getting to OpenGL symbols and headers which is not easy for GLFFT to work with. Instead, the GLFFT implementation will include a user-supplied header, glfft_api_headers.hpp which is responsible for including the appropriate OpenGL or OpenGL ES 3.1 headers (or special headers like GLEW) the calling application uses, as well as defining various constants, such as the GLSL language strings to use. This header can also override various functions such as logging functions and time functions. For a reference, see test/glfw/glfft_api_headers.hpp as to how the internal test/bench suite does it.

Pre-baking GLSL shader source

To compile GLSL shader sources into header files, run:

make -C glsl

This only needs to be run when the internal GLSL shaders are modified.

Source files

To build source files, a $(wildcard *.cpp) in top level directory of GLFFT is sufficient to find the necessary files. When compiling, C++11 must be enabled, and glfft_api_headers.hpp must be found in an include path.

Snippets

Do a 1024x256 Complex-To-Complex FFT.

#include "glfft.hpp"
#include "glfft_gl_interface.hpp"
#include <memory>

using namespace GLFFT;
using namespace std;

FFTOptions options; // Default initializes to something conservative in terms of performance.
options.type.fp16 = true; // Use mediump float (if GLES) in shaders.
options.type.input_fp16 = false; // Use FP32 input.
options.type.output_fp16 = true; // Use FP16 output.
options.type.normalize = true; // Normalized FFT.

GLContext context;

FFT fft(&context, 1024, 256, ComplexToComplex, Inverse, SSBO, SSBO, make_shared<ProgramCache>(), options);

GLuint output_ssbo, input_ssbo;
// Create GL_SHADER_STORAGE_BUFFERs and put some data in them.

// Adapt raw GL types to types which GLContext uses internally.
GLBuffer adaptor_output(output_ssbo);
GLBuffer adaptor_input(input_ssbo);

// Do the FFT
CommandBuffer *cmd = context.request_command_buffer();
fft.process(cmd, &adaptor_output, &adaptor_input);
context.submit_command_buffer(cmd);

glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

Do a 1024x256 Complex-To-Complex FFT more optimally using wisdom.

#include "glfft.hpp"
#include "glfft_wisdom.hpp"
#include "glfft_gl_interface.hpp"
#include <memory>

using namespace GLFFT;
using namespace std;

FFTOptions options; // Default initializes to something conservative in terms of performance.
options.type.fp16 = true; // Use mediump float (if GLES) in shaders.
options.type.input_fp16 = false; // Use FP32 input.
options.type.output_fp16 = true; // Use FP16 output.
options.type.normalize = true; // Normalized FFT.

GLContext context;

FFTWisdom wisdom;
// Use some static wisdom to make the learning step faster.
// Avoids searching for options which are known to be bogus for a particular vendor.
wisdom.set_static_wisdom(FFTWisdom::get_static_wisdom_from_renderer(reinterpret_cast<const char*>(glGetString(GL_RENDERER))));
// Learn how to do 1024x256 much faster!
wisdom.learn_optimal_options_exhaustive(&context, 1024, 256, ComplexToComplex, SSBO, SSBO, options.type);

GLContext context;

FFT fft(&context, 1024, 256, ComplexToComplex, Inverse, SSBO, SSBO, make_shared<ProgramCache>(), options);

GLuint output_ssbo, input_ssbo;
// Create GL_SHADER_STORAGE_BUFFERs and put some data in them.

// Adapt raw GL types to types which GLContext uses internally.
GLBuffer adaptor_output(output_ssbo);
GLBuffer adaptor_input(input_ssbo);

// Do the FFT
CommandBuffer *cmd = context.request_command_buffer();
fft.process(cmd, &adaptor_output, &adaptor_input);
context.submit_command_buffer(cmd);

glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);

Serializing wisdom to a string

GLContext context;

FFTWisdom wisdom;
// Use some static wisdom to make the learning step faster.
// Avoids searching for options which are known to be bogus for a particular vendor.
wisdom.set_static_wisdom(FFTWisdom::get_static_wisdom_from_renderer(reinterpret_cast<const char*>(glGetString(GL_RENDERER))));
// Learn how to do 1024x256 much faster!
wisdom.learn_optimal_options_exhaustive(&context, 1024, 256, ComplexToComplex, SSBO, SSBO, options.type);

// Serialize to string.
string wisdom_json = wisdom.archive();

// Unserialize wisdom.
wisdom.extract(wisdom_json.c_str());

Documentation

Proper documentation is still TODO. However, test/glfft_test.cpp and test/glfft_cli.cpp should give a good idea for how to use the API.

Verification and Benchmarking

For verification and benchmarking, GLFFT has a standalone executable which uses the GLFW3 library to create a context.

Building

Building requires GLFW3 to be installed, visible from pkg-config, although for Windows 64-bit a precompiled library with headers is included for convenience.

make
./glfft_cli help

Cross compilation for e.g. Windows from Linux

make PLATFORM=win TOOLCHAIN_PREFIX=x86_64-w64-mingw32-
wine64 ./glfft_cli help

Dependencies

Other than GLFW, muFFT and rapidjson are needed, which are included as git submodules.

git submodule init
git submodule update
# Or git clone --recursive when checking out GLFFT.

Tests

To verify GLFFT correctness, GLFFT has a large amount of tests which exhaustively tests the GLFFT API with almost any thinkable input and output parameters.

./glfft_cli test --test-all # Exhaustively tests everything (over 3000 tests currently), so will take some time.

Verification is based on SNR compared to muFFT as a reference and maximum allowed delta from reference value. The default precision requirements are fairly stringent, so particular GPUs might not support high enough precision. Precision requirements can be overridden in such scenarios on the command line (see help).

Benchmarking

To evaluate GLFFT performance, GLFFT can be benchmarked with various parameters. The benchmarking interface will run a full wisdom search first to find optimal parameters before running the actual benchmark.

./glfft_cli bench --width 1024 --height 1024 --type ComplexToComplex --input-texture # Benchmark a 1024x1024 C2C FFT with texture as input and SSBO as output. See ./glfft_cli bench help for more.

FFT method

GLFFT implements radix-4, radix-8, radix-16 (radix-4 two times in single pass) and radix-64 (radix-8 two times in single pass) FFT kernels. GLFFT will automatically find the optimal subdivision of a larger FFT problem based on either wisdom knowledge or estimations. Radix-16 and Radix-64 kernels are implemented by using shared memory to perform multiple passes without going to global memory between the two passes.

In order to support a vast number of options, GLFFT will compile shaders on-demand during initialization and store them in a user-provided cache which can be shared between GLFFT instantiations.

GLFFT is out-of-place using the Stockham auto-sort algorithm to avoid explicit reorder passes which is common for in-place algorithms.

GLFFT's implementation is overall very similar to muFFT, and more information about the algorithms can be found here.

License

The GLFFT library is licensed under the permissive MIT license, see COPYING and preambles in source files for more detail. The verification and benchmarking library links against GLFW, whose license is here.

More Repositories

1

Granite

My personal Vulkan renderer
C++
1,473
star
2

libfmsynth

A C library which implements an FM synthesizer
C
327
star
3

muFFT

A library for computing the Fast Fourier Transform (FFT) in one or two dimensions.
C
154
star
4

parallel-rdp

A low-level Vulkan compute emulation of the N64 RDP
C++
141
star
5

Emulator-Shader-Pack

Various pixel shaders in Cg for oldschool emulators
87
star
6

RetroWarp

A Vulkan compute shader implementation of a fictional retro GPU
C++
52
star
7

MIPS-LLVM-Jitter

A simple MIPS to LLVM IR recompiler with ahead-of-time recompilation support
C++
42
star
8

slang-shaders

C
41
star
9

Dinothawr

Dinothawr is a block pushing puzzle game on slippery surfaces. Our hero is a dinosaur whose friends are trapped in ice. Through puzzles it is your task to free the dinos from their ice prison.
C
40
star
10

libmaru

USB audio class driver in userspace using libusb.
C
37
star
11

Fossilize

This repository is discontinued, see https://github.com/ValveSoftware/Fossilize
C
34
star
12

simple-irc-bot

A simple IRC bot. Hackable.
C
32
star
13

RSound

A multi-platform, simple PCM audio server and client.
C
30
star
14

RetroArch-Phoenix

Simple GUI frontend for RetroArch using Phoenix
C++
28
star
15

boxes

OpenGL 4.3 experiment with compute shaders and tons of instanced geometry! Targets libretro GL.
C
25
star
16

Granite-MicroSamples

Micro-samples for Granite to read along with my blog post: http://themaister.net/blog/2019/04/14/a-tour-of-granites-vulkan-backend-part-1/
C++
19
star
17

parallel-rdp-standalone

A standalone, distilled codebase of parallel-rdp. Autogenerated.
C++
17
star
18

alsa-plugins-rsound

RSound plugin for ALSA
C
16
star
19

SimpleGL

NIH library for OpenGL window/context creation for X11/Win32/OSX.
C
11
star
20

ToneFilterVST

Basic recreation of the Nier Automata tone filter in VST form
C++
10
star
21

SLIMPlayer

Simple and Lightweight Media Player (using FFmpeg)
C++
9
star
22

pyrofling

PyroFling is a simple solution for capturing Vulkan applications and broadcast video and audio to streaming platforms using FFmpeg.
C++
8
star
23

SFC-Pong

A very newbish Pong game for Super Nintendo.
Assembly
5
star
24

airsynth

Multi-instrument polyphonic synthesizer for LV2 with a standalone JACK implementation.
C++
5
star
25

bsnes-Qt

Fork of BSNES/QT (v073)
C++
4
star
26

MIPS-Toolchain-PKGBUILD

A set of PKGBUILDs to build a MIPS (big-endian) toolchain
Shell
4
star
27

SC-MIPS

Simple, single cycle MIPS implementation in VHDL. Extended to support most instructions.
VHDL
4
star
28

libretro-ffmpeg

Video player implemented in libretro.
C++
2
star
29

XMLShaderView

View XML shaders in your browser!
JavaScript
2
star
30

parallel-psx-dump-replayer

Standalone tool to replay dumps from Parallel PSX
C++
2
star
31

alsa-reroute

Small script/preloaded lib that intercepts calls to ALSA and force them to open different devices.
C
2
star
32

Quickbuild

Simplistic build system written in sh to help with configuring building from source.
Shell
2
star
33

blipper

Band-limited decimator for emulators.
C
2
star
34

umusd

Microscopic music player controlled over TCP.
C++
2
star
35

Scale

Small and simple image scaler.
C
2
star
36

DirectRSound

Reimplement a subset of DirectSound with RSound.
C++
2
star
37

RetroArch-DSP-plugins

Audio DSP plugins for RetroArch.
C++
2
star
38

sussybard

Simple app which converts MIDI key inputs to X key presses, designed for playing FF XIV Bard
C++
2
star
39

SSNES-core

Futile attempt to create a SNES emulator :)
C
2
star
40

SC-MIPS-Chain

Simple toolchain for SC-MIPS
C
2
star
41

GLModelViewer

Simple modelviewer in OpenGL 3.0+
C
2
star
42

stile

Simple picture converter to SNES-format tilemap/tileset palette and stuff.
C++
1
star
43

midiviz

Crappy ad-hoc MIDI visualizer
C
1
star
44

iceypuzzle

Some simple engine for doing sliding puzzles, or something... :')
C++
1
star
45

RetroArch-D3D9

D3D9 external driver plugin for RetroArch.
C++
1
star
46

libretro-remotejoy

RemoteJoy client side code rewritten for libretro.
C
1
star
47

rtorrent-nogui

Attempts to run rtorrent as a pure daemon without GUI.
C++
1
star