• Stars
    star
    140
  • Rank 261,473 (Top 6 %)
  • Language
    C++
  • License
    BSD 2-Clause "Sim...
  • Created about 8 years ago
  • Updated almost 6 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Base64 coding and decoding with SIMD instructions (SSE/AVX2/AVX512F/AVX512BW/AVX512VBMI/ARM Neon)

base64 using SIMD instructions

Overview

Repository contains code for encoding and decoding base64 using SIMD instructions. Depending on CPU's architecture, vectorized encoding is faster than scalar versions by factor from 2 to 4; decoding is faster 2 .. 2.7 times.

There are several versions of procedures utilizing following instructions sets:

  • SSE,
  • AVX2,
  • AVX512F,
  • AVX512BW,
  • AVX512VBMI,
  • AVX512VL,
  • BMI2, and
  • ARM Neon.

Vectorization approaches were described in a series of articles:

Daniel Lemire and I wrote also paper Faster Base64 Encoding and Decoding Using AVX2 Instructions which was published by ACM Transactiona on the Web.

Performance results from various machines are located in subdirectories results.

Project organization

There are separate subdirectories for both algorithms, however both have the same structure. Each project contains four programs:

  • verify --- does simple validation of particular parts of algorithms,
  • check --- validates whole procedures,
  • speed --- compares speed of different variants of procedures,
  • benchmark --- similarly to speed but works on small buffers and calculates CPU cycle rate (available only for Intel architectures).

Building

Change to either directory encode or decode and then use following make commands.

command tools instruction sets
make verify, check, speed, benchmark scalar, SSE, BMI2
make avx2 verify_avx2, check_avx2, speed_avx2, benchmark_avx2 scalar, SSE, BMI2, AVX2
make avx512 verify_avx512, check_avx512, speed_avx512, benchmark_avx512 scalar, SSE, BMI2, AVX2, AVX512F
make avx512bw verify_avx512bw, check_avx512bw, speed_avx512bw, benchmark_avx512bw scalar, SSE, BMI2, AVX2, AVX512F, AVX512BW
make avx512vbmi verify_avx512vbmi, check_avx512vbmi, benchmark_avx512vbmi scalar, SSE, BMI2, AVX2, AVX512F, AVX512BW, AVX512VBMI
make xop verify_xop, check_xop, speed_xop, benchmark_xop scalar, SSE and AMD XOP
make arm verify_arm, check_arm, speed_arm scalar, ARM Neon

Type make run (for SSE) or make run_ARCH to run all programs for given instruction sets; ARCH can be "sse", "avx2", "avx512", "avx512bw", "avx512vbmi", "avx512vl".

BMI2 presence is determined based on /proc/cpuinfo or a counterpart. When an AVX2 or AVX512 targets are used then BMI2 is enabled by default.

AVX512

To compile AVX512 versions of the programs at least GCC 5.3 is required. GCC 4.9.2 doesn't have AVX512 support.

Please download Intel Software Development Emulator in order to run AVX512 variants via make run_avx512, run_avx512bw or run_avx512vbmi. The emulator path should be added to the PATH.

Known problems

Both encoding and decoding don't match the base64 specification, there is no processing of data tail, i.e. encoder never produces '=' chars at the end, and decoder doesn't handle them at all.

All these shortcoming are not present in a brilliant library by Alfred Klomp: https://github.com/aklomp/base64.

See also

Who uses our algorithms?

More Repositories

1

pyahocorasick

Python module (C extension and plain python) implementing Aho-Corasick algorithm
C
830
star
2

sse-popcount

SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
C++
296
star
3

toys

Storage for my snippets, toy programs, etc.
C++
288
star
4

sse4-strstr

SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) of Karp-Rabin algorithm's modification
C++
213
star
5

base64-avx512

Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"
C
189
star
6

simd-sort

AVX512F and AVX2 versions of quick sort
C++
93
star
7

simd-string

SIMD (SSE) string functions
C++
88
star
8

aspell-python

Python wrapper for aspell (C extension and python version)
C
78
star
9

simd-search

Sample program for article "SIMD-ized searching in unique constant dictionary" (http://0x80.pl/articles/simd-search.html)
C++
48
star
10

man-intrinsics

Create man pages from information used by Intel Intrinsics Guide and optionally uops.info
Python
31
star
11

cleanup-headers

Remove unnecessary includes from C/C++ source files
Python
25
star
12

simd-byte-lookup

SIMDized check which bytes are in a set
Python
23
star
13

interview-questions

A list of question to a prospective employer
23
star
14

parsing-int-series

Parse multiple decimal integers separated by arbitrary number of delimiters
C++
22
star
15

pyDAWG

Python module (C extension and plain python) implementing DAWG
C
19
star
16

canvas2svg

Save Tk canvas in SVG
Python
18
star
17

sse2string

implementation of some function from string.h using SSE2 instructions
Assembly
14
star
18

locatedb

Python lib/prog to manipulate locatedb files
Python
13
star
19

integer-sequence-encoding

Experiments with encoding/compression integer sequences using different algorithms
Python
12
star
20

utf16utf8

Conversions between UTF16 and UTF8
C++
8
star
21

ternarylogiccli

CLI utilty to work out proper constants for vpternlogic instruction
Python
8
star
22

avx512popcnt-superoptimizer

C
7
star
23

simd-heap

C++
7
star
24

trigraph-search

Trigram search experiments
C++
7
star
25

smart-fork

Fork of SMART framework (http://www.dmi.unict.it/~faro/smart/)
C
6
star
26

base64-pictures

Python
5
star
27

random-binary-trees

Experiments with random binary trees
Python
5
star
28

compression

experiments with lossless compression
C++
4
star
29

cpp-threads

C++ Threads
C++
4
star
30

wikibooks-plpgsql

Sample programs & images from polish wikibook "Procedury składowane w PostgreSQL" (Stored procedures in PostgreSQL)
3
star
31

loadppm

Loader for PPM raster files
C
3
star
32

ram-machine-emulator

RAM machine emulator (Javascript from ancient times & console versions in SML and Python)
JavaScript
3
star
33

gtkcrop

Select portion of image (GTK app)
Python
3
star
34

fslistview

PythonFUSE driver that exposes list of files in a virtual directory
Python
3
star
35

graphics

Various graphics algorithms (Tkinter, PIL, Javascript demos)
Python
3
star
36

recovercr3

A CLI tool for recovering Canon CR3 photos from memory dumps
Python
2
star
37

pydvi2svg

pydvi2svg converts DVI files (output from TeX) to SVG vector graphics
Python
2
star
38

tex

TeX and LaTeX scripts
TeX
2
star
39

simple-sat-solver

Simple SAT solver
C++
2
star
40

fbi16

image viewer for linux vga16 framebuffer
C
2
star
41

convex-polygon-intersection

Detecting intersection of convex polygons in 2D
JavaScript
2
star
42

gimp-plugins

Python
1
star
43

blender-gui

Object-orientated GUI for Blender (not maintained)
HTML
1
star
44

presentations

TeX
1
star
45

publish

Software to manage a collection of photos
Python
1
star
46

tikz2png

Convert TikZ pictures into raster graphics
Python
1
star
47

wikibooks-posix-threads

Sample programs from polish wikibook "POSIX Threads"
C
1
star
48

pliterki

Complete plain-ASCII text with Polish diacritic characters
Python
1
star
49

timetracker

Simple time tracking utility (a command line application)
Python
1
star
50

Xscr

simple X wrapper for framebuffer-related toy-applications
C
1
star
51

image-similarity

Attempt to boost some parts of image similarity check which can be found in geeqie
C
1
star