  • Stars: 336
  • Language: C++
  • License: MIT License
  • Created about 7 years ago
  • Updated over 3 years ago

tsimd - Fundamental C++ SIMD types for Intel CPUs (sse to avx512)

This library is header-only. Each type's implementation is selected according to which Intel ISA flags are enabled in the translation unit in which it is used (e.g. -mavx with gcc or clang).
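A minimal sketch of how a header can react to the ISA flags of the translation unit, assuming the predefined macros that GCC and Clang set for those flags (`__SSE__`, `__AVX__`, `__AVX512F__`). This is not tsimd's source; `kRegisterBits`, `float_lanes`, and `kDefaultFloatWidth` are hypothetical names. MSVC does not define `__SSE__`, so without `/arch:AVX` or higher this sketch falls back to one lane there.

```cpp
#include <cassert>

// Hypothetical sketch (not tsimd's code): pick a register size from the ISA
// macros the compiler predefines for flags such as -msse, -mavx, -mavx512f.
// -mavx2 also defines __AVX__, so AVX2 lands in the 256-bit branch.
#if defined(__AVX512F__)
constexpr int kRegisterBits = 512;  // AVX-512 zmm registers
#elif defined(__AVX__)
constexpr int kRegisterBits = 256;  // AVX/AVX2 ymm registers
#elif defined(__SSE__)
constexpr int kRegisterBits = 128;  // SSE xmm registers
#else
constexpr int kRegisterBits = 32;   // no SIMD ISA detected: one float at a time
#endif

// How many 32-bit floats fit in a register of the given size in bits.
inline int float_lanes(int register_bits)
{
  assert(register_bits > 0 && register_bits % 32 == 0);
  return register_bits / 32;
}

// The width a "vfloat"-style default alias would use in this translation unit.
const int kDefaultFloatWidth = float_lanes(kRegisterBits);
```

Compiling the same file with -msse, -mavx, or -mavx512f therefore yields a default width of 4, 8, or 16 floats, matching the widths this README lists for each ISA.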

TODOs (contributions welcome!)

  • unsigned integer pack<> types
  • support for other CPU ISAs

Build Requirements

Using tsimd

  • C++11 compiler

(unofficial list of compilers, not all are tested)

  • GCC >= 4.8.1
  • clang >= 3.4
  • ICC >= 16
  • Visual Studio 2015 (64-bit target)

Building tsimd's examples, benchmarks, and tests, and installing from source

  • cmake >= 3.2

Library layout and usage

The library is logically composed of 3 different components:

  1. The pack<T, W> class, which is a logical SIMD register
  2. Functions which can load and store packs in and out of larger arrays.
  3. Operators and functions to manipulate packs.
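To make the three components concrete, here is a deliberately simplified, portable stand-in. The names `pack`, `load`, `store`, `static_size`, and `vfloat4` echo the ones used in this README, but the array-backed implementation is a hypothetical illustration, not tsimd's code (which targets the enabled ISA directly).

```cpp
#include <array>
#include <cassert>

// (1) Hypothetical stand-in for pack<T, W>, a logical SIMD register of W
// elements. A plain array keeps the sketch portable; real SIMD wrappers back
// this with hardware registers.
template <typename T, int W>
struct pack
{
  using value_type = T;
  static constexpr int static_size = W;
  std::array<T, W> v{};
};

// (2) Load W contiguous elements from a larger array into a pack.
template <typename P>
P load(const typename P::value_type *src)
{
  assert(src != nullptr);
  P p;
  for (int i = 0; i < P::static_size; ++i)
    p.v[i] = src[i];
  return p;
}

// (2) Store a pack's W elements back out to a larger array.
template <typename T, int W>
void store(const pack<T, W> &p, T *dst)
{
  for (int i = 0; i < W; ++i)
    dst[i] = p.v[i];
}

// (3) Element-wise addition of two packs.
template <typename T, int W>
pack<T, W> operator+(const pack<T, W> &a, const pack<T, W> &b)
{
  pack<T, W> r;
  for (int i = 0; i < W; ++i)
    r.v[i] = a.v[i] + b.v[i];
  return r;
}

// (3) Scalar times pack: the scalar is applied to every lane.
template <typename T, int W>
pack<T, W> operator*(T s, const pack<T, W> &b)
{
  pack<T, W> r;
  for (int i = 0; i < W; ++i)
    r.v[i] = s * b.v[i];
  return r;
}

using vfloat4 = pack<float, 4>;
```

With this sketch, `store(2.0f * load<vfloat4>(x) + load<vfloat4>(y), out)` computes four a * x + y results in one expression, which is the pattern the SAXPY example below relies on.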

While no formal documentation exists yet, users are encouraged to look at the type aliases defined in tsimd/detail/pack.h, as well as the operators and functions available in tsimd/detail/operators/ and tsimd/detail/functions/, respectively. Generally speaking, each header in detail/ encapsulates exactly one type, operator, or function to aid discovery.

Example

SAXPY

Consider the following function (kernel), which reads values from two input arrays and stores the results in an output array.

// NOTE: n is the length of all 3 arrays
void saxpy(float a, int n, float x[], float y[], float out[])
{
  for (int i = 0; i < n; ++i) {
    const float xi = x[i];
    const float yi = y[i];
    const float result = a * xi + yi;
    out[i] = result;
  }
}

This kernel applies the exact same formula to every element of the data. SIMD instructions let us reduce the total number of loop iterations by a factor equal to the number of elements that fit in one SIMD register (for example, 8 floats in a 256-bit AVX register). We do this by using tsimd types instead of builtin types.

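// NOTE: n is the length of all 3 arrays
void saxpy_tsimd(float a, int n, float x[], float y[], float out[])
{
  using namespace tsimd;
  const int W = vfloat::static_size;
  int i = 0;
  // Vector loop: only full packs, so loads and stores never run past n
  for (; i + W <= n; i += W) {
    const vfloat xi = load<vfloat>(&x[i]);
    const vfloat yi = load<vfloat>(&y[i]);
    const vfloat result = a * xi + yi; // same formula!
    store(result, &out[i]);
  }
  // Scalar tail: the remaining n % W elements
  for (; i < n; ++i)
    out[i] = a * x[i] + y[i];
}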

The advantage of this version over one written for a specific SIMD width (say, vfloat4 or vfloat8) is that the kernel is "widened" to the best available width based on how it is compiled: 4-wide for SSE, 8-wide for AVX/AVX2, and 16-wide for AVX512.
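The widening idea does not depend on tsimd itself. Below is a library-independent sketch, assuming a hypothetical `saxpy_width<W>` template whose width is a compile-time parameter (standing in for the role the ISA flags play for `vfloat`). The inner lane loop stands in for one W-wide SIMD operation, and a scalar loop handles the elements that do not fill a whole register.

```cpp
#include <cassert>

// Hypothetical sketch: a SAXPY kernel written once against a compile-time
// width W. The inner lane loop stands in for one W-wide SIMD operation.
template <int W>
void saxpy_width(float a, int n, const float *x, const float *y, float *out)
{
  static_assert(W > 0, "width must be positive");
  assert(n >= 0);
  int i = 0;
  // Main loop: each iteration processes one full "register" of W lanes.
  for (; i + W <= n; i += W) {
    for (int lane = 0; lane < W; ++lane)
      out[i + lane] = a * x[i + lane] + y[i + lane];
  }
  // Tail: the remaining n % W elements that do not fill a whole register.
  for (; i < n; ++i)
    out[i] = a * x[i] + y[i];
}
```

For n = 37 and W = 8, the main loop runs 37 / 8 = 4 times (covering 32 elements) and the tail handles the remaining 5, compared with 37 iterations for the scalar kernel. Every width applies the same formula to each element, so the output of `saxpy_width<4>`, `<8>`, or `<16>` can be checked directly against `saxpy_width<1>`.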
