  • Stars: 171
  • Rank: 215,029 (Top 5 %)
  • Language: C++
  • License: MIT License
  • Created almost 7 years ago
  • Updated over 6 years ago

Repository Details

The study & production material for https://www.youtube.com/watch?v=Pc8DfEyAxzg

cpp_parallelization_examples

The study & production material for the video series:

Note that commit 0680aa19f50f4198f3d36d30fe778a50331f9bf5 is the state the programs were in when episode 3 was released. Since then, the example programs have been updated and bugs have been fixed.

Current timings chart on Bisqwit’s computer: (image: Timings chart)

(image: Timings animation)

Program list:

  • mandelbrot-vanilla: Vanilla algorithm without any parallelisation attempts. Simple Mandelbrot fractal rendering with some very basic optimizations that will be identical across all the other programs in this list.
  • mandelbrot-implicit-simd: Same as mandelbrot-vanilla, but rewritten as if it were SIMD, to encourage the compiler to apply SIMD optimizations.
  • mandelbrot-openmp-simd: Same as mandelbrot-implicit-simd, but with OpenMP SIMD pragmas to help the compiler add SIMD optimizations where they would most probably help.
  • mandelbrot-cilkplus-simd: Same as mandelbrot-openmp-simd, but with CilkPlus pragmas rather than OpenMP pragmas
  • mandelbrot-explicit-simd: Same as mandelbrot-implicit-simd, but completely rewritten with Intel Intrinsics.
  • mandelbrot-openmp-loop: Same as mandelbrot-vanilla, but with an OpenMP pragma added for simple per-scanline threading.
  • mandelbrot-cilkplus-loop: Same as mandelbrot-openmp-loop, but with CilkPlus equivalents.
  • mandelbrot-thread-loop: Same as mandelbrot-openmp-loop, but using C++11 standard threads rather than pragmas. Algorithm is identical.
  • mandelbrot-openmp-offload: Same as mandelbrot-vanilla, but with a minimal correct implementation of OpenMP offloading.
  • mandelbrot-openacc-offload: Same as mandelbrot-vanilla, but with a minimal correct implementation of OpenACC offloading.
  • mandelbrot-cuda-offload: Same as mandelbrot-vanilla, but with a minimal correct implementation of CUDA offloading.
  • mandelbrot-cuda-offload2: Same as mandelbrot-cuda-offload, but with small optimizations to get better performance.
  • mandelbrot-cuda-offload3: A mixture between mandelbrot-cuda-offload2, mandelbrot-thread-loop, and mandelbrot-explicit-simd.
  • mandelbrot-cuda-offload3b: Same as mandelbrot-cuda-offload3, but with small changes to the threading logic in an attempt to get better performance. (It failed.)
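
To illustrate the difference between the "vanilla" and "implicit SIMD" styles in the list above, here is a minimal sketch. It is not the repository's code. The names iterate_scalar, iterate_lanes, kMaxIter and kLanes are hypothetical and chosen only for this example. The second function processes a fixed batch of points with a branch-free inner loop, which is the shape that auto-vectorizers tend to handle well.

```cpp
#include <array>
#include <cstddef>

constexpr int kMaxIter = 256;      // hypothetical iteration cap
constexpr std::size_t kLanes = 8;  // hypothetical number of points per batch

// Scalar escape-time loop for a single point c = cr + i*ci.
int iterate_scalar(double cr, double ci) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < kMaxIter) {
        const double r2 = zr * zr, i2 = zi * zi;
        if (r2 + i2 > 4.0) break;  // the point has escaped
        zi = 2.0 * zr * zi + ci;   // uses the old zr and zi
        zr = r2 - i2 + cr;
        ++n;
    }
    return n;
}

// The same computation for kLanes points at once, written "as if it were SIMD":
// every lane executes the same arithmetic, and a per-lane flag selects whether
// the result is kept. Escaped lanes are frozen rather than branched around.
std::array<int, kLanes> iterate_lanes(const std::array<double, kLanes>& cr,
                                      const std::array<double, kLanes>& ci) {
    std::array<double, kLanes> zr{}, zi{};
    std::array<int, kLanes> count{};
    for (int n = 0; n < kMaxIter; ++n) {
        bool any_active = false;
        for (std::size_t k = 0; k < kLanes; ++k) {
            const double r2 = zr[k] * zr[k], i2 = zi[k] * zi[k];
            const bool active = (r2 + i2 <= 4.0);
            const double next_zi = 2.0 * zr[k] * zi[k] + ci[k];
            const double next_zr = r2 - i2 + cr[k];
            zr[k] = active ? next_zr : zr[k];
            zi[k] = active ? next_zi : zi[k];
            count[k] += active ? 1 : 0;
            any_active = any_active || active;
        }
        if (!any_active) break;  // all lanes have escaped
    }
    return count;
}
```

Both functions return the same iteration counts. Whether the compiler actually vectorizes the inner loop depends on the compiler, its version and the optimization flags.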

Misc. instructions

Build OpenMP programs

Add -fopenmp to both the compiler and linker command lines.

When offloading, you may get linker errors from math functions if you do an optimized build. To resolve them, add -foffload=-lm -fno-fast-math -fno-associative-math
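
As a rough illustration of what -fopenmp enables, the following sketch parallelizes a Mandelbrot render per scanline with a single pragma. It is not taken from the repository. The function name render_openmp, the view rectangle and the schedule(dynamic) clause are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns width*height iteration counts, row by row.
std::vector<int> render_openmp(int width, int height, int max_iter) {
    std::vector<int> image(static_cast<std::size_t>(width) * height);

    // Each scanline is an independent unit of work. Without -fopenmp the
    // pragma is ignored and the loop simply runs serially.
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Assumed view: real axis [-2, 1), imaginary axis [-1, 1).
            const double cr = -2.0 + 3.0 * x / width;
            const double ci = -1.0 + 2.0 * y / height;
            double zr = 0.0, zi = 0.0;
            int n = 0;
            while (n < max_iter && zr * zr + zi * zi <= 4.0) {
                const double t = zr * zr - zi * zi + cr;
                zi = 2.0 * zr * zi + ci;
                zr = t;
                ++n;
            }
            image[static_cast<std::size_t>(y) * width + x] = n;
        }
    }
    return image;
}
```

With GCC, a build along the lines of g++ -O2 -fopenmp file.cpp should activate the pragma. The result is the same with or without it, only the timing differs.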

Build CilkPlus programs

Add -fcilkplus to both the compiler and linker command lines, and -lcilkrts to the linker command line.

Build OpenACC programs

Add -fopenacc to both the compiler and linker command lines.

When offloading, you may get linker errors from math functions if you do an optimized build. To resolve them, add -foffload=-lm -fno-fast-math -fno-associative-math

Note that you may get significantly better performance with PGI Community Edition, which has a much more mature OpenACC implementation than GCC does.
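
For orientation, a minimal OpenACC offloading loop could look like the sketch below. It is an illustration under assumptions, not the repository's mandelbrot-openacc-offload program. The function name render_openacc and the view rectangle are hypothetical. The copyout clause tells the compiler that the output buffer only needs to travel from the device back to the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns width*height iteration counts, row by row.
std::vector<int> render_openacc(int width, int height, int max_iter) {
    const int total = width * height;
    std::vector<int> image(static_cast<std::size_t>(total));
    int* out = image.data();  // OpenACC data clauses work on raw array sections

    // One flat loop over all pixels. Without -fopenacc the pragma is ignored
    // and the loop runs serially on the host.
    #pragma acc parallel loop copyout(out[0:total])
    for (int i = 0; i < total; ++i) {
        const int x = i % width;
        const int y = i / width;
        // Assumed view: real axis [-2, 1), imaginary axis [-1, 1).
        const double cr = -2.0 + 3.0 * x / width;
        const double ci = -1.0 + 2.0 * y / height;
        double zr = 0.0, zi = 0.0;
        int n = 0;
        while (n < max_iter && zr * zr + zi * zi <= 4.0) {
            const double t = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = t;
            ++n;
        }
        out[i] = n;
    }
    return image;
}
```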

Build thread programs

Add -pthread to the linker command line.
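
The sketch below shows one way per-scanline threading can be written with C++11 standard threads. It is an assumption-laden illustration, not the repository's mandelbrot-thread-loop program. The name render_threads, the shared atomic row counter and the view rectangle are all hypothetical choices for this example.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: renders with `workers` threads that pull scanlines
// from a shared atomic counter until none are left.
std::vector<int> render_threads(int width, int height, int max_iter,
                                unsigned workers) {
    std::vector<int> image(static_cast<std::size_t>(width) * height);
    std::atomic<int> next_row{0};
    if (workers == 0) workers = 1;  // always make progress

    auto work = [&]() {
        while (true) {
            const int y = next_row.fetch_add(1);  // claim the next scanline
            if (y >= height) break;
            for (int x = 0; x < width; ++x) {
                // Assumed view: real axis [-2, 1), imaginary axis [-1, 1).
                const double cr = -2.0 + 3.0 * x / width;
                const double ci = -1.0 + 2.0 * y / height;
                double zr = 0.0, zi = 0.0;
                int n = 0;
                while (n < max_iter && zr * zr + zi * zi <= 4.0) {
                    const double t = zr * zr - zi * zi + cr;
                    zi = 2.0 * zr * zi + ci;
                    zr = t;
                    ++n;
                }
                // Rows are disjoint, so no two threads write the same element.
                image[static_cast<std::size_t>(y) * width + x] = n;
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(work);
    for (auto& t : pool) t.join();
    return image;
}
```

A typical call would pass std::thread::hardware_concurrency() as the worker count.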

Build CUDA programs

  • Use nvcc
  • Add -x cu if your filename extension is something other than .cu

Run OpenACC offloading programs

Add one or more environment variables before running the program, to control offloading:

  • Optional: LD_LIBRARY_PATH=/usr/local/lib64
  • Optional: ACC_DEVICE_TYPE=<type>
  • Optional: GOMP_DEBUG=1

Possible device types:

  • hsa for HSA
  • nvidia or nvptx for NVidia PTX
  • mic or intelmic for Intel MIC (possibly emulator)
  • host for running on host

More information: https://gcc.gnu.org/wiki/Offloading

Run OpenMP offloading programs

Add one or more environment variables before running the program, to control offloading:

  • OMP_DEFAULT_DEVICE=<number>
  • LD_LIBRARY_PATH=/usr/local/lib64
  • Optional: OMP_DISPLAY_ENV=true
  • Optional: GOMP_DEBUG=1

The mapping between device numbers and offloading targets is unknown; I found the numbers experimentally. In any case, they start from 0 and count upwards.
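
One way to run such an experiment is to ask the OpenMP runtime how many devices it sees and then test, for each device number, whether a target region really leaves the host. The sketch below is an illustration only. The function name probe_offload_devices is hypothetical, and the code does not tell you which hardware a device number corresponds to, only whether offloading to it happened.

```cpp
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: element d is true if a target region sent to device
// number d did not execute on the host. Returns an empty vector when built
// without -fopenmp or when the runtime reports no offloading devices.
std::vector<bool> probe_offload_devices() {
    std::vector<bool> offloaded;
#ifdef _OPENMP
    const int count = omp_get_num_devices();
    for (int d = 0; d < count; ++d) {
        int on_host = 1;
        // If offloading to device d fails, the region may fall back to the
        // host, in which case on_host stays 1.
        #pragma omp target device(d) map(from: on_host)
        {
            on_host = omp_is_initial_device();
        }
        offloaded.push_back(on_host == 0);
    }
#endif
    return offloaded;
}
```

Running the probe with GOMP_DEBUG=1 set, as listed above, should show which plugin the runtime loads for each device number.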

Installing CUDA compiler:

apt-get install nvidia-cuda-toolkit

Building offloading-ready GCC:

More information: https://gcc.gnu.org/wiki/Offloading

ATTENTION: Do not just dump all this into your shell as a copypaste. You need to understand what you are doing.

# Preparation step for NVidia PTX offloading

# Download and install NVidia PTX target tools
cd /usr/local/src
git clone https://github.com/MentorEmbedded/nvptx-tools
cd nvptx-tools
./configure
make -j8
make install

# Preparation step for HSA offloading

# Download and install HSA library and drivers
#
# Note: This has changed. See https://github.com/RadeonOpenCompute for new instructions.

cd /usr/local/src
git clone https://github.com/HSAFoundation/HSA-Runtime-AMD.git
dpkg -i HSA-Runtime-AMD/ubuntu/hsa-runtime*_amd64.deb
git clone -b kfd-v1.6.x https://github.com/HSAFoundation/HSA-Drivers-Linux-AMD.git
echo 'KERNEL=="kfd", MODE="0666"' | sudo tee /etc/udev/rules.d/kfd.rules
dpkg -i HSA-Drivers-Linux-AMD/kfd*/ubuntu/libhsakmt*.deb

# Let’s build GCC. Go to a suitable build directory with enough free space
cd /dev/shm

# Download and extract GCC source code
wget ftp://ftp.gwdg.de/pub/misc/gcc/releases/gcc-7.1.0/gcc-7.1.0.tar.bz2
tar xvfj gcc-7.1.0.tar.bz2
cd gcc-7.1.0

# GCC build, part 1: NVidia PTX offloading compiler

# Download Newlib port for NVPTX
git clone https://github.com/MentorEmbedded/nvptx-newlib newlib # Still in the gcc-7.1.0 directory

# Build NVidia PTX offloading compiler
rm -rf build # Still in the gcc-7.1.0 directory
mkdir build
cd build
../configure \
        --target=nvptx-none --enable-as-accelerator-for=x86_64-linux-gnu \
        --disable-sjlj-exceptions --enable-newlib-io-long-long \
        --enable-checking=yes,df,fold,rtl \
        --enable-languages=c,c++,lto --with-build-time-tools=/usr/local/nvptx-none/bin
make -j8
make install

# GCC build, part 2: MIC offloading compiler & emulator

# Build MIC offloading compiler & emulator
# Add --disable-bootstrap to configure to make the build time shorter
rm -rf * # Still in the "build" directory
../configure  \
        --build=x86_64-intelmicemul-linux-gnu --host=x86_64-intelmicemul-linux-gnu \
        --target=x86_64-intelmicemul-linux-gnu --enable-as-accelerator-for=x86_64-linux-gnu \
        --enable-liboffloadmic=target \
        --enable-languages=c,c++,lto
make -j8
make install

# GCC build, part 3: Build actual host compiler

# Build the actual compiler, with support for both targets
# Add --disable-bootstrap to configure to make the build time shorter
# If you want to support HSA, add --with-hsa-runtime=/opt/hsa to configure
# If you want to support HSA, add also ,hsa to the --enable-offload-targets parameter.
# Add any languages you need into --enable-languages
rm -rf * # Still in the "build" directory
../configure  \
        --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu \
        --enable-offload-targets=x86_64-intelmicemul-linux-gnu,nvptx-none=/usr/local/nvptx-none \
        --enable-languages=c,c++,lto,jit --enable-lto --enable-host-shared \
        --enable-liboffloadmic=host \
        --with-cuda-driver=/usr
make -j8
make install
