• Stars
    star
    150
  • Rank 247,323 (Top 5 %)
  • Language
    C++
  • License
    Other
  • Created over 6 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Fast single source file BC7/BPTC texture encoder with perceptual metric support
bc7enc16 - Fast, single source file BC7/BPTC GPU texture encoder with perceptual colorspace metric support

Note: Since this repo was created, we've released two new codecs with better BC7 encoders:
https://github.com/richgel999/bc7enc_rdo
https://github.com/BinomialLLC/bc7e

bc7enc16 purposely only supports modes 1 and 6. This is a strong opaque texture encoder, with basic
support for alpha channels (using mode 6). The intended use case is opaque textures, or opaque textures 
with relatively simple alpha channels. It also acts as a relatively simple to understand example.

If alpha is highly correlated compared to RGB, or alpha is relatively simple
(think simple masks where lots of blocks are either all-transparent or
all-opaque), it should work great. For complex alpha channels more modes (such
as 4, 5 or maybe 7) are necessary.

This codec supports a perceptual mode, where it computes colorspace error in
weighted YCbCr space (like etc2comp), and it also supports weighted RGBA
metrics. It's particular strong in perceptual mode, beating the current state of
the art CPU encoder (Intel's ispc_texcomp) by a wide margin when measured by
Luma PSNR, even though it only supports 2 modes and isn't vectorized.

Why only modes 1 and 6?
Because with these two modes you have a complete encoder that supports both
opaque and transparent textures in a small amount (~1400 lines) of
understandable plain C code. Mode 6 excels on smooth blocks, and mode 1 is
strong with complex blocks, and a strong encoder that combines both modes can be
quite high quality. Fast mode 6-only encoders will have noticeable block
artifacts which this codec avoids by fully supporting mode 1.

Modes 1 and 6 are typically the most used modes on many textures using other
encoders. Mode 1 has two subsets, 64 possible partitions, and 3-bit indices,
while mode 6 has large 4-bit indices and high precision 7777.1 endpoints. This
codec produces output that is far higher quality than any BC1 encoder, and
approaches (or in perceptual mode exceeds!) the quality of other full BC7
encoders.

Why is bc7enc16 so fast in perceptual mode?
Computing error in YCbCr space is more expensive than in RGB space, yet bc7enc16
in perceptual mode is stronger than ispc_texcomp (see the benchmark below) -
even without SSE/AVX vectorization and with only 2 modes to work with!

Most BC7 encoders only support linear RGB colorspace metrics, which is a
fundamental weakness. Some support weighted RGB metrics, which is better. With
linear RGB metrics, encoding error is roughly balanced between each channel, and
encoders have to work *very* hard (examining large amounts of RGB search space)
to get overall quality up. With perceptual colorspace metrics, RGB error tends
to become a bit unbalanced, with green quality favored more highly than red and
blue, and blue quality favored the least. A perceptual encoder is tuned to
prefer exploring solutions along the luma axis, where it's much less work to find
solutions with less luma error. bc7enc16 is, as far as I know, the first BC7
codec to support computing error in weighted YCbCr colorspace.

Note: Most of the timings here (except for the ispc_texcomp "fast" mode timings at the very bottom)
are for the *original* release, before I added several more optimizations. The latest version of 
bc7enc16.c is around 8-27% faster than the initial release at same quality (when mode 1 is enabled - 
there's no change with just mode 6).

Some benchmarks across 31 images (kodim corpus+others):

Perceptual (average REC709 Luma PSNR - higher is better quality):

iscp_texcomp slow vs. bc7enc16 uber4/max_partitions 64
iscp_texcomp:   355.4 secs 48.6 dB
bc7enc16:       122.6 secs 50.0 dB

iscp_texcomp slow vs. bc7enc16 uber0/max_partitions 64
iscp_texcomp:   355.4 secs 48.6 dB
bc7enc16:       38.3 secs 49.6 dB

iscp_texcomp basic vs. bc7enc16 uber0/max_partitions 16
ispc_texcomp:   100.2 secs 48.3 dB
bc7enc16:       20.8 secs 49.3 dB 

iscp_texcomp fast vs. bc7enc16 uber0/max_partitions 16
iscp_texcomp:   41.5 secs 48.0 dB 
bc7enc16:       20.8 secs 49.3 dB

iscp_texcomp ultrafast vs. bc7enc16 uber0/max_partitions 0
iscp_texcomp:   1.9 secs 46.2 dB
bc7enc16:       8.9 secs 48.4 dB 

Non-perceptual (average RGB PSNR):

iscp_texcomp slow vs. bc7enc16 uber4/max_partitions 64
iscp_texcomp:   355.4 secs 46.8 dB 
bc7enc16:       51 secs 46.1 dB

iscp_texcomp slow vs. bc7enc16 uber0/max_partitions 64
iscp_texcomp:   355.4 secs 46.8 dB
bc7enc16:       29.3 secs 45.8 dB

iscp_texcomp basic vs. bc7enc16 uber4/max_partitions 64
iscp_texcomp:   99.9 secs 46.5 dB
bc7enc16:       51 secs 46.1 dB

iscp_texcomp fast vs. bc7enc16 uber1/max_partitions 16
ispc_texcomp:   41.5 secs 46.1 dB
bc7enc16:       19.8 secs 45.5 dB

iscp_texcomp fast vs. bc7enc16 uber0/max_partitions 8
ispc_texcomp:   41.5 secs 46.1 dB
bc7enc16:       10.46 secs 44.4 dB

iscp_texcomp ultrafast vs. bc7enc16 uber0/max_partitions 0
ispc_texcomp:   1.9 secs 42.7 dB 
bc7enc16:       3.8 secs 42.7 dB

DirectXTex CPU in "mode 6 only" mode vs. bc7enc16 uber1/max_partions 0 (mode 6 only), non-perceptual:

DirectXTex:     466.4 secs 41.9 dB 
bc7enc16:       6.7 secs 42.8 dB

DirectXTex CPU in (default - no 3 subset modes) vs. bc7enc16 uber1/max_partions 64, non-perceptual:

DirectXTex:     9485.1 secs 45.6 dB 
bc7enc16:       36 secs 46.0 dB

(Note this version of DirectXTex has a key pbit bugfix which I've submitted but
is still waiting to be accepted. Non-bugfixed versions will be slightly lower
quality.)

UPDATE: To illustrate how strong the mode 1+6 implementation is in bc7enc16, let's compare ispc_texcomp 
fast vs. the latest version of bc7enc16 uber4/max_partitions 64:

Without filterbank optimizations:

                Time       RGB PSNR   Y PSNR
ispc_texcomp:   41.45 secs 46.09 dB   48.0 dB
bc7enc16:       41.42 secs 46.03 dB   48.2 dB

With filterbank optimizations enabled:
bc7enc16:       38.78 secs 45.94 dB   48.12 dB

They both have virtually the same average RGB PSNR with these settings (.06 dB is basically noise), but 
bc7enc16 is just as fast as ispc_texcomp fast, even though it's not vectorized. Interestingly, our Y PSNR is better, 
although bc7enc16 wasn't using perceptual metrics in these benchmarks. 

This was a multithreaded benchmark (using OpenMP) on a dual Xeon workstation.
ispc_texcomp was called with 64-blocks at a time and used AVX instructions.
Timings are for encoding only.

More Repositories

1

miniz

miniz: Single C source file zlib-replacement library, originally from code.google.com/p/miniz
C++
2,104
star
2

fpng

Super fast C++ .PNG writer/reader
C
877
star
3

lzham_codec

Lossless data compression codec with LZMA-like ratios but 1.5x-8x faster decompression speed, C/C++
C++
690
star
4

jpeg-compressor

C++ JPEG compression/fuzzed low-RAM JPEG decompression codec with Public Domain or Apache 2.0 license
C
210
star
5

bc7enc

Single source file BC1-5 and BC7 encoders and BC1-5/7 decoders with MIT or Public Domain licenses
C++
207
star
6

bc7enc_rdo

State of the art RDO BC1-7 GPU texture encoders
C++
180
star
7

uap_resources

Key OSINT UAP Related Materials
155
star
8

crunch

Advanced DXTc texture compression library
141
star
9

CppSPMD_Fast

Optimized CppSPMD test project: macro control flow, SSE4.1/AVX1/AVX2/AVX2 FMA support
C++
113
star
10

ufo_data

The Dataset of the Damned, and UFO/UAP event chronology creation tool
HTML
86
star
11

rdopng

Rate-Distortion Optimized Lossy PNG/QOI Encoding Tool
C++
80
star
12

picojpeg

picojpeg: Tiny JPEG decoder for 8/16-bit microcontrollers
C
75
star
13

lzham_codec_devel

LZHAM codec - unstable/experimental repo. Much faster compression and higher ratios in extreme mode. This is well tested but isn't the official version just yet, see lzham_codec instead. Still 100% backwards compatible with lzham v1.0.
C++
68
star
14

astc_dec

Single source file LDR ASTC texture decompression in C++ (derived from Google's open source Android project)
C++
42
star
15

triglib

Public Domain double precision trigonometry functions in C
C
39
star
16

simple_opencl

Simple C++ sample showing how to use OpenCL v1.2 on Windows/Linux/OSX with no 3rd party SDK installs
C
33
star
17

rg-etc1

Automatically exported from code.google.com/p/rg-etc1
C++
32
star
18

FastAC

FastAC - Amir Said's Arithmetic and Huffman coding library, example code, and documentation
C++
28
star
19

fasthf

Amir Said's C++ fast Huffman coding example (FastHF)
C++
25
star
20

sserangecoding

Fast vectorized (SSE 4.1) range coder for 8-bit alphabets
C++
23
star
21

imageresampler

Automatically exported from code.google.com/p/imageresampler
C
22
star
22

FloatMath

C++
17
star
23

fxdis-d3d1x

Automatically exported from code.google.com/p/fxdis-d3d1x
C++
15
star
24

dr_shirley_wright

Public record evidence showing the accomplishments of Dr. Shirley Jean Wright PhD
8
star
25

ufo_glasnost

English Translation of Col. Dr. Marina Popowitsch's book "UFO Glasnost"
8
star
26

shufrand

SSE 4.1 vectorized pseudorandom number generator (PCG variant)
C++
7
star
27

cpng

Compatible Network Graphics (CPNG) SDR/HDR Image Format Specification
7
star
28

random_pngs

A collection of random PNG test files
7
star
29

bc7enc_rdo_devel

C++
7
star
30

QBMOD

An Amiga MOD player written in QuickBASIC 4.5/PDS 7.1
VBA
7
star
31

ufo_data_search

ufo-search's client side search engine source code
C
5
star
32

png16

test repo
C++
4
star
33

lzham_alpha

This is LZHAM Alpha8, now supplanted by LZHAM 1.x here: https://github.com/richgel999/lzham_codec
C++
3
star
34

tga_test_files

A collection of .tga image format files in various formats (some obscure), for testing new decoders/readers
3
star
35

frank_scully

Scanned documents from the Frank Scully papers at the American Heritage Center
3
star
36

june_crain_files

Wright-Patterson AFB worker June Crain's official scanned personnel documents
2
star
37

junkdrawer

nothing special
1
star