  • Stars: 2
  • Language: CUDA
  • License: MIT License
  • Created about 9 years ago
  • Updated over 2 years ago

Repository Details

Numerical simulation of flocking behavior using CUDA and OpenGL

More Repositories

1

CRC

Fastest CRC32 for x86, Intel and AMD, + comprehensive derivation and discussion of various approaches
C++
219 stars
2

RGB2Y

Fastest CPU (AVX/SSE) RGB to grayscale: 2-4x faster than OpenCV. For image processing/computer vision.
C++
89 stars
3

KFAST

Implementation of FAST feature detector for computer vision (Rosten 2006) using AVX2 to outperform canonical implementation by up to 600%.
C
74 stars
4

SortingNetworks

Fastest CPU SIMD (SSE4) sorting networks for small integer arrays (2-6 elements), also optimal amd64 assembly and notes on getting compilers to generate optimal sorting networks.
Assembly
42 stars
5

KORAL

Novel extreme-performance CPU-GPU cooperative feature detector-descriptor for computer vision.
C++
38 stars
6

FastArrayOps

Extremely fast x86 / AVX2 assembly implementations of common operations for linear arrays: checking whether array contains element, finding index of element, finding min/max element, finding index of min/max element.
Assembly
36 stars
7

LATCH

Fastest CPU implementation of the LATCH 512-bit binary feature descriptor; fully scale- and rotation-invariant
C++
34 stars
8

CLATCH

Insanely fast CUDA LATCH: fully scale- and rotation-invariant 512-bit binary descriptor for computer vision
C++
32 stars
9

CUDAKfNN

Fastest CUDA SIFT or other 128-float vector matcher for computer vision
C++
25 stars
10

KPS

Infrastructure for simultaneous orbital and attitude propagation, with attitude-based real-time analytical aerodynamics simulation
C++
23 stars
11

FastDivide

Divide 64-bit integers faster than hardware. Or precompute for a given denom and quickly divide repeatedly.
C++
22 stars
12

KLERP

Fastest CPU (AVX2) Bilinear and Nearest-Neighbor Interpolation: 25-100% faster than OpenCV. For computer vision / image processing.
C++
19 stars
13

CUDAK2NN

Insanely fast CUDA 2NN 512-bit binary descriptor matcher for computer vision
C++
14 stars
14

CUDARGB2Y

Fastest CUDA RGB to grayscale: 5-30x faster than OpenCV. For image processing/computer vision.
C++
14 stars
15

KNES

Complete, lightweight NES emulator in C++, speedcoded in 3 days.
C++
14 stars
16

KfNN

Fastest CPU (AVX/SSE) SIFT or other 128-float vector matcher for computer vision
C++
13 stars
17

CUDALERP

Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - uint8_t data
C++
12 stars
18

BoxBlur

Fastest CPU (AVX/SSE) Horizontal Box Blur for image processing and computer vision
C++
10 stars
19

K2NN

Fast bruteforce and Multi-Index Hash (MIH) accelerated 2NN matchers for 512-bit binary descriptors for computer vision
C++
10 stars
20

CUDAHammingMean

Fastest GPU implementation of a brute-force Hamming-weight matrix sum/mean for 512-bit binary descriptors.
C++
9 stars
21

ULATCH

Fastest CPU implementation of the LATCH 512-bit binary feature descriptor for computer vision (upright)
C++
9 stars
22

CUDAFLERP

Fast CUDA (GPU) Bilinear and Nearest-Neighbor Interpolation at high accuracy - float32 data
C++
9 stars
23

FastThreadPool

Fast lock-free thread pool
C++
8 stars
24

UCLATCH

Insanely fast CUDA LATCH 512-bit binary descriptor for computer vision (upright)
C++
8 stars
25

FastIntegerSqrt

Fastest implementations of 32-bit and 64-bit integer square roots for x86-64
C++
7 stars
26

FeatureAngle

Extremely fast SSE gradient (angle of rotation) computation of grayscale features in an image, for image processing and computer vision.
C++
7 stars
27

popcount

Fastest possible x86 implementation of popcount/population count/Hamming weight/counting set bits
C++
6 stars
28

BitOps

Basic, efficient, header-only bit ops and bit array primitives for modern x86. Tests provided.
C++
6 stars
29

MATLAB-KDrag

Orbital and attitude propagator with B-dot and *dynamic* aerodynamic drag simulation, including torque computation for aero-stabilized bodies.
MATLAB
6 stars
30

CUDAKfNN_packed

Fastest CUDA SIFT or other 128-float *packed as uint8_t* vector matcher for computer vision
C++
5 stars
31

EllipticCurveFactorization

Fast, single-file, MIT-licensed large integer factorization using ECM combined with other techniques.
C++
5 stars
32

PyCruiseControl

Modified divorced PID controller applied to car cruise control and accompanying physics simulation and visualizations
Python
5 stars
33

ArduinoPhysics

Realtime 2D physics and collision detection on an Arduino with 60 fps output to a Sharp memory LCD.
C++
5 stars
34

MemoryOrder

Demos of 3 ways even the strong memory model of x86 can exhibit architectural memory reordering, leading to bugs
C++
5 stars
35

PrimeSieve

Super fast, dynamically expanding prime sieve for primality queries, forward or backward iteration
C++
4 stars
36

ModularSqrt

Fast modular square root of primes and prime powers, including 2. Interface uses GMP bigints.
C++
4 stars
37

KFAST_OpenMVG

Custom version of KFAST for integration into OpenMVG
C++
4 stars
38

smart_tm

A smart, leap-second- and leap-day-aware, fast, 64-bit-capable replacement for the ctime 'tm' struct
C++
3 stars
39

KHALF

Optimized special-case bilinear interpolation, halving the width and not changing the height, for computer vision dual-frame display.
C++
3 stars
40

MATLABCruiseControl

Modified divorced PID controller applied to car cruise control and accompanying physics simulation and visualizations - MATLAB port
MATLAB
3 stars
41

Factorization-Primality

Extremely fast, single-file factorization and primality testing for 32-bit and 64-bit integers on x86.
C++
3 stars
42

SMC-Demo

Minimal demo of self-modifying code on Windows. Still doable, still useful.
Assembly
3 stars
43

UnsignedIntegralToFloatingPoint

Notes on fast standards-compliant conversion of U32/U64 to and from float/double, which compilers do not get right.
3 stars
44

SingleLinePythonSudoku

Single-line Python Sudoku solver
2 stars
45

Boids_SDL

Numerical simulation of flocking behavior using pure CPU and SDL.
C++
2 stars
46

Sudoku

Fast sudoku solver with detection of no solution/single solution/multiple solutions/invalid initial board
C++
2 stars
47

SolveModularQuadratic

Generate all solutions to a modular quadratic equation. Supports any modulus. Interface uses GMP bigints.
C++
2 stars
48

Schematic

Basic toy Lisp interpreter in a few hundred lines of C++.
C++
2 stars
49

Leftpack

Fast AVX2 leftpack/compress implementations (keep and contiguously pack a subset of elements)
C++
1 star
50

U128

Fast unsigned 128-bit integer class for MSVC since it doesn't natively support __uint128_t yet
C++
1 star
51

FastDivide128

Getting __udivti3 or __umodti3 errors? Just want faster division/modulo for 128-bit ints on Clang? Look no further.
C++
1 star