  • Stars: 203
  • Rank 192,890 (Top 4 %)
  • Language: C++
  • Created over 9 years ago
  • Updated over 2 years ago

Repository Details

This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010.

Course Objectives

Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number of applications that traditionally used Application Specific Integrated Circuits (ASICs) are now implemented with concurrent processors in order to improve functionality and reduce engineering cost. The real challenge is to develop applications software that effectively uses these concurrent processors to achieve efficiency and performance goals.

The aim of this course is to provide students with knowledge and hands-on experience in developing applications software for processors with massively parallel computing resources. In general, we refer to a processor as massively parallel if it has the ability to complete more than 64 arithmetic operations per clock cycle. Many commercial offerings from NVIDIA, AMD, and Intel already offer such levels of concurrency. Effectively programming these processors will require in-depth knowledge of parallel programming principles, as well as the parallelism models, communication models, and resource limitations of these processors. The course is aimed at students who want to develop exciting applications for these processors, as well as those who want to develop programming tools and future implementations for these processors.
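As a rough illustration of the 64-operations-per-cycle threshold mentioned above, the following C++ sketch classifies a processor from two numbers. The `ProcessorSpec` type, its field names, and the example values are hypothetical and are not taken from the course materials or any vendor datasheet. It also assumes peak throughput is simply the number of execution units multiplied by the operations each unit completes per cycle, which ignores many real-world details.

```cpp
#include <cstdint>

// Hypothetical processor description; the fields are illustrative only.
struct ProcessorSpec {
    std::uint32_t units;          // number of independent execution units
    std::uint32_t ops_per_unit;   // arithmetic operations each unit completes per clock cycle
};

// Threshold from the course description: more than 64 operations per cycle.
constexpr std::uint64_t kMassivelyParallelThreshold = 64;

// Peak arithmetic operations per clock cycle under the simplifying
// assumption that every unit is busy on every cycle.
constexpr std::uint64_t ops_per_cycle(const ProcessorSpec& p) {
    return static_cast<std::uint64_t>(p.units) * p.ops_per_unit;
}

// True when the processor exceeds the threshold (strictly greater than 64).
constexpr bool is_massively_parallel(const ProcessorSpec& p) {
    return ops_per_cycle(p) > kMassivelyParallelThreshold;
}

// Compile-time checks with made-up configurations.
static_assert(is_massively_parallel(ProcessorSpec{16, 8}), "128 ops/cycle exceeds 64");
static_assert(!is_massively_parallel(ProcessorSpec{4, 4}), "16 ops/cycle does not exceed 64");
```

Note that a configuration delivering exactly 64 operations per cycle does not qualify under this reading, since the definition says "more than 64".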

We will be using NVIDIA processors and the CUDA programming tools in the lab section of the course. Many have reported success in performing non-graphics parallel computation as well as traditional graphics rendering computation on these processors. You will go through structured programming assignments before being turned loose on the final project. Each programming assignment will involve successively more sophisticated programming skills. The final project will be of your own design, with the requirement that the project must involve a demanding application such as mathematics- or physics-intensive simulation or other data-intensive computation, followed by some form of visualization and display of results.

Acknowledgements

This course is based on Wen-mei Hwu & David Kirk's UIUC Applied Parallel Programming class. We appreciate their generosity in providing their course materials to others.

More Repositories

1

tuple_utility

The missing C++ tuple functionality
C++
84
star
2

hindley_milner

A C++11 implementation of Hindley-Milner type inference.
C++
77
star
3

thrust-workshop

Introductory Thrust workshop materials
C++
43
star
4

bulk

Launching collective tasks in bulk
C++
36
star
5

future

Implementation of std::experimental::future from the C++ Concurrency TS
C++
30
star
6

is_call_possible

C++03 functionality for checking for the existence of a member function with a given name and signature.
C++
26
star
7

managed_allocator

A C++ allocator based on cudaMallocManaged
Cuda
23
star
8

cudex

CUDA executors
C++
14
star
9

cuda_launch_config

Utilities for automatically selecting a CUDA kernel launch configuration
C++
14
star
10

variant

Standalone C++11 implementation of variant
C++
11
star
11

tuple

Freestanding std::tuple implementation
C++
9
star
12

gotham

Photorealistic Renderer based on Unbiased Rendering Algorithms
C++
8
star
13

newton

Thrust-accelerated numerics
C++
8
star
14

cuda_graphs_executor_prototypes

Executor prototypes for interacting with traditional CUDA kernel launch and CUDA Graphs.
C++
8
star
15

then

Experimental implementation of future::then()
C++
7
star
16

cumem

CUDA C++ memory utilities
C++
6
star
17

process

A std::thread-alike for processes
C++
6
star
18

compiler

toy compiler
C++
6
star
19

shmalloc

Dynamic __shared__ memory allocation for CUDA
C++
6
star
20

thread_pool

A simple thread pool implementation
C++
5
star
21

personal

odds & ends that don't belong anywhere else
C++
5
star
22

igloo

Physically-based Renderer Igloo
C++
4
star
23

interprocess_future

A std::-like future & promise pair for inter-process communication
C++
4
star
24

strange

Ranges for Thrust.
C++
4
star
25

static_process_pool

A std::static_thread_pool-alike for processes.
C++
4
star
26

static_algorithm

A Statically-Unrolled C++ Algorithms Library
C++
4
star
27

overload

Standalone CUDA-compatible C++11 implementation of overload
C++
3
star
28

gpu_algorithms

Implementing GPU algorithms various ways
Cuda
3
star
29

simple_cuda_executor

A demonstration of the implementation of a very simple GPU executor in CUDA
Cuda
3
star
30

coord

Utilities for navigating multi-dimensional iteration spaces
C++
3
star
31

cuda_cpp_template

Jared's repository template
C++
3
star
32

coordinate

A mathematical vector in an N-dimensional space
C++
3
star
33

thrust-simple-benchmarks

Simple Thrust benchmarks
Cuda
2
star
34

omp_parallel_invoke

Porting parallel_invoke to OpenMP
C++
2
star
35

bounding_box_hierarchy

A generic data structure for finding intersections between rays and geometric objects
C++
2
star
36

thrust-benchmarks

C
2
star
37

any

Generic container for objects of a discriminated type
2
star
38

nvcc-scons

A SCons build tool for the NVIDIA compiler
Python
2
star
39

lazy_cuda_executor

Toy implementation of a lazy CUDA executor
C++
2
star
40

optional

Standalone C++11 implementation of optional
C++
2
star
41

set_intersection

Building a better set_intersection based on balanced_path
C
1
star
42

thrust-agency

An experimental Agency backend for Thrust.
C++
1
star
43

croquet

Prototype implementation of C++ Executors, Senders, & Receivers
C++
1
star
44

kaleidoscope

Working through the LLVM tutorial
C++
1
star
45

ndarray

A multidimensional array container.
C++
1
star
46

immutable_ptr

Stronger than const, faster than a cached load
1
star
47

active_message

Demo of active messages using OpenSHMEM
C++
1
star
48

thrust_bind

Implementation of thrust::bind similar to std::bind
C++
1
star
49

always_ready_future

An as-lightweight-as-possible future-like type holding a value that is always ready
C++
1
star
50

bitmap_allocator

Simple bitmap memory allocator using standard C++ library components
C++
1
star
51

string_view

An immutable view of a string
1
star
52

hello_sockets

Simple POSIX sockets hello world program
C++
1
star
53

operator_traits

C++11 type traits to check that a type has arithmetic operators
C++
1
star
54

any_small

A type-erasing container for small objects
C++
1
star
55

async_reduce

An asynchronous reduction algorithm implemented using Agency
C++
1
star
56

shmem_executor

An executor which creates execution on OpenSHMEM processing elements
C++
1
star
57

fancy_customization_point

Idea for a fancy type of Niebler-style customization point
C++
1
star
58

simple_cuda_then_executor

A very simple ThenExecutor implementation using CUDA
C++
1
star
59

recursive_variant

A std::variant-alike permitting alternatives that are incomplete types
C++
1
star
60

integer_sequence

Standalone C++11 implementation of integer_sequence and friends
C++
1
star
61

shared_array

A safe CUDA __shared__ array container
1
star
62

time_invocation

C++ utility for measuring the mean time of a function call
C++
1
star
63

is_strict_weak_ordering

Utilities for certifying whether a binary relation on a set is a strict weak ordering.
1
star
64

new_process_executor

An executor which creates execution by spawning new processes
C++
1
star
65

migrate_hg_to_git

Migrates an existing Mercurial repository to Git
Shell
1
star
66

nuke

The only way to be sure.
C++
1
star
67

pointer_adaptor

Adapts a handle to a value into a pointer-like type
C++
1
star
68

nes_emulator

This is an emulator for the Nintendo Entertainment System (NES) written in modern C++.
C++
1
star
69

constant

C++20 class template for a compile-time constant value
C++
1
star
70

variable

C++20 expression template for a variable whose value is unknown
C++
1
star
71

morton

Simple C++ code for encoding and decoding Morton (Z-Curve) codes
C++
1
star