  • Created over 8 years ago
  • Updated about 7 years ago

Awesome Cuda

This is a list of useful libraries and resources for CUDA development.

Presentations

  • Optimizing Parallel Reduction in CUDA - Shows how to implement a fast, yet relatively simple, parallel reduction algorithm.

  • CUDA C/C++ BASICS - This presentation explains the concepts of CUDA kernels, memory management, threads, thread blocks, shared memory, and thread synchronization. A simple addition kernel is shown, followed by an optimized 1D stencil kernel.

  • Advanced CUDA - Optimizing to Get 20x Performance - This presentation covers: Tesla 10-Series Architecture, Particle Simulation Example, Host to Device Memory Transfer, Asynchronous Data Transfers, OpenGL Interoperability, Shared Memory, Coalesced Memory Access, Bank Conflicts, SIMT, Page-locked Memory, Registers, Arithmetic Intensity, Finite Differences Example, Texture Memory.

  • Advanced CUDA Webinar - Memory Optimizations - This presentation covers: Asynchronous Data Transfers, Context Based Synchronization, Stream Based Synchronization, Events, Zero Copy, Memory Bandwidth, Coalescing, Shared Memory, Bank Conflicts, Matrix Transpose Example, Textures.

  • Better Performance at Lower Occupancy - Excellent presentation showing that better performance can be achieved by assigning more parallel work to each thread and by exploiting instruction-level parallelism. Covered topics are: Arithmetic Latency, Arithmetic Throughput, Little's Law, Thread-level parallelism (TLP), Instruction-level parallelism (ILP), Matrix Multiplication Example.

  • Fun With Parallel Algorithms. Segmented Scan. Neutral territory method - These slides show how a segmented scan can be implemented as a simple variation of an ordinary scan.

  • GPU/CPU Programming for Engineers - Lecture 13 - This lecture provides a good walkthrough of all the different memory types: Global Memory, Texture Memory, Constant Memory, Shared Memory, Registers and Local Memory.

Libraries

  • Thrust - A parallel algorithms library whose main goal is programmer productivity and rapid development. If your main goal is the best possible performance, a lower-level library such as CUDPP or chag::pp is recommended instead.

  • Hemi - A nice little utility library that allows you to write code that can be run either on the CPU or GPU, and allows you to launch C++ lambda functions as CUDA kernels. Its main goal is to make it easier to write portable CUDA programs.

  • CUDPP - A library that provides 15 parallel primitives. Unlike Thrust, CUDPP is performance-oriented and much lower-level. Recommended if performance is more important than programmer productivity.

  • Parallel Primitives Library: chag::pp - This library provides the parallel primitives Reduction, Prefix Sum, Stream Compaction, Split, and Radix Sort. The authors have demonstrated that their implementations of Stream Compaction and Prefix Sum are the fastest ones available!

Papers

Articles

Videos

Contributing

This list is still under construction and is far from done. Anyone who wants to add links to the list is very welcome to do so by a pull request!
