EasyCL

Easy to run kernels using OpenCL. (renamed from OpenCLHelper)

  • makes it easy to pass input and output arguments
  • handles much of the boilerplate
  • uses clew to load opencl dynamically

Example Usage

Imagine we have a kernel with the following signature, in the file /tmp/foo.cl:

kernel void my_kernel( int N, global float *one, global float *two, local float *one_local, global float *result ) {
    // kernel code here...
}

... then we can call it like:

#include "EasyCL.h"

if( !EasyCL::isOpenCLAvailable() ) {
    cout << "opencl library not found" << endl;
    exit(-1);
}
EasyCL *cl = EasyCL::createForFirstGpu();
CLKernel *kernel = cl->buildKernel("somekernelfile.cl", "test_function");
int in[5];
int out[5];
for( int i = 0; i < 5; i++ ) {
    in[i] = i * 3;
}
kernel->in( 5, in );
kernel->out( 5, out );
kernel->run_1d( 5, 5 ); // global workgroup size = 5, local workgroup size = 5
delete kernel;
// use the results in 'out' array here

More generally, you can call on 2d and 3d workgroups by using the kernel->run method:

size_t local_ws[1]; local_ws[0] = 512;
size_t global_ws[1]; global_ws[0] = EasyCL::roundUp(local_ws[0], size);
kernel->run( 1, global_ws, local_ws ); // 1 is the number of dimensions; could be 2, or 3

'Fluent' style is also possible, eg:

kernel->in(10)->in(5)->out( 5, outarray )->run_1d( 5, 5 );

If you use EasyCL::createForFirstGpu(), EasyCL will bind to the first OpenCL-enabled GPU (or accelerator) that it finds. If you want to use a different device, or an OpenCL-enabled CPU, you can use one of the following methods:

EasyCL::createForIndexedGpu( int gpuindex ); // looks for opencl-enabled gpus, and binds to the (gpuindex+1)th one
EasyCL::createForFirstGpuOtherwiseCpu();
EasyCL::createForPlatformDeviceIndexes( int platformIndex, int deviceIndex );
EasyCL::createForPlatformDeviceIds( int platformId, int deviceId ); // you can get these ids by running `gpuinfo` first

You can run gpuinfo to get a list of platforms and devices on your system.


Environment Vars

You can use the environment variable CL_GPUOFFSET to choose a GPU. It shifts the gpu numbering down by the given offset, ie gpu index 1 becomes index 0, index 2 becomes index 1, and so on. For example, if a program uses gpu index 0 by default, setting CL_GPUOFFSET to 1 will make it use the second gpu, and setting it to 2 will make it use the third gpu.
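
For example (assuming a bash-style shell; ./mytrainer is just a placeholder for your own program):

# run on the second opencl-enabled gpu, rather than the first:
CL_GPUOFFSET=1 ./mytrainer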

Examples

There are some examples in the test subdirectory.

  • create a couple of Wrapper objects, pass them into a kernel, and look at the results: see testfloatwrapper.cpp
  • (New!) to use with clBLAS, see testclblas.cpp

API

// constructor:
EasyCL::EasyCL();
// choose different gpu index
void EasyCL::gpu( int gpuindex );

// compile kernel
CLKernel *EasyCL::buildKernel( string kernelfilepath, string kernelname, string options = "" );

// Note that you pass `#define`s in through the `options` parameters, like `-D TANH`, or `-D TANH -D BIASED`

// passing arguments to kernel:

CLKernel::in( int integerinput );

CLKernel::in( int arraysize, const float *inputarray ); // size in number of floats
CLKernel::in( int arraysize, const int *inputarray ); // size in number of ints
CLKernel::out( int arraysize, float *outputarray ); // size in number of floats
CLKernel::out( int arraysize, int *outputarray ); // size in number of ints
CLKernel::inout( int arraysize, float *inoutarray ); // size in number of floats
CLKernel::inout( int arraysize, int *inoutarray ); // size in number of ints

// to allocate local arrays, as passed-in kernel parameters:
CLKernel::localFloats( int localarraysize ); // size in number of floats
CLKernel::localInts( int localarraysize ); // size in number of ints

// running kernel, getting result back, and cleaning up:
CLKernel::run_1d( int global_ws, int local_ws );
CLKernel::run( int number_dimensions, size_t *global_ws, size_t *local_ws );

// helper function:
EasyCL::roundUp( int quantizationSize, int desiredTotalSize );
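
Based on its use in the workgroup example above, roundUp rounds desiredTotalSize up to the next multiple of quantizationSize, which is handy for computing a global workgroup size from a local workgroup size. For example (values are illustrative):

size_t global_ws = EasyCL::roundUp( 512, 1000 ); // rounds 1000 up to 1024, the next multiple of 512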

CLArray and CLWrapper objects

To make it possible to reuse data between kernels, without copying it back to PC main memory and then back onto the GPU each time, you can use CLWrapper objects.

These can be created on the GPU, or on the host, and moved backwards and forwards between the two, as required. They can be passed as an 'input' and 'output' to a CLKernel object, and they can be reused between kernels.

There are two 'flavors':

  • CLArray: more automated, but involves more memory copying, since it creates a new array on the host
  • CLWrapper: wraps an existing host array; you'll need to call copyToDevice() and copyToHost() yourself

CLArray objects were the first implementation, and CLWrapper objects the second. You can use either, but note that CLWrapper objects are the ones I use myself.

CLWrapper objects

Compared to CLArray objects, CLWrapper objects need less memory copying, since they wrap an existing native array, but you will need to call copyToDevice() and copyToHost() yourself.

if( !EasyCL::isOpenCLAvailable() ) {
    cout << "opencl library not found" << endl;
    exit(-1);
}
cout << "found opencl library" << endl;

EasyCL cl;
CLKernel *kernel = cl.buildKernel("../test/testeasycl.cl", "test_int");
int in[5];
for( int i = 0; i < 5; i++ ) {
    in[i] = i * 3;
}
int out[5];
CLWrapper *inwrapper = cl.wrap(5, in);
CLWrapper *outwrapper = cl.wrap(5, out);
inwrapper->copyToDevice();
kernel->in( inwrapper );
kernel->out( outwrapper );
kernel->run_1d( 5, 5 );
outwrapper->copyToHost();
assertEquals( out[0] , 7 );
assertEquals( out[1] , 10 );
assertEquals( out[2] , 13 );
assertEquals( out[3] , 16 );
assertEquals( out[4] , 19 );
cout << "tests completed ok" << endl;

Can copy between buffers (New!):

wrapper1->copyTo( wrapper2 );

CLWrapper objects are currently available as CLIntWrapper and CLFloatWrapper.

CLArray objects

Compared to CLWrapper objects, CLArray objects are more automated, but involve more memory copying.

EasyCL cl;

CLArrayFloat *one = cl.arrayFloat(10000); // create CLArray object for 10,000 floats
(*one)[0] = 5; // give some data...
(*one)[1] = 7;

CLArrayFloat *two = cl.arrayFloat(10000);

// pass to kernel:
kernel->in(one)->out(two);

You can then take the 'two' CLArray object, and pass it as the 'input' to a different kernel, or you can use operator[] to read values from it.
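
For example (a minimal sketch: otherkernel, three and the workgroup sizes are illustrative, and this assumes CLArray handles the device/host copies for you, per the 'more automated' description above):

// read a result back on the host, via operator[]:
float firstResult = (*two)[0];
// ... or pass 'two' straight into another kernel as an input:
otherkernel->in(two)->out(three)->run_1d( global_size, workgroup_size );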

Currently, CLArray is available as 'CLArrayFloat' and 'CLArrayInt'.

Kernel store

You can store kernels in the store, each under a unique name, to facilitate kernel caching:

// store:
cl->storeKernel( "mykernelname", somekernel ); // the name must not already be in use

// check exists:
cl->kernelExists( "mykernelname" );

// retrieve:
CLKernel *kernel = cl->getKernel( "mykernelname" );

New: you can transfer ownership of a kernel to the EasyCL object, by passing a third parameter deleteWithCl = true. Then, when the EasyCL object is deleted, the kernel will be deleted too.

// store:
cl->storeKernel( "mykernelname", somekernel, true ); // this kernel will be deleted when
                                                     // `cl` object is deleted
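
Putting these together, a typical per-connection caching pattern might look like this (the kernel name and filename are illustrative):

CLKernel *kernel = 0;
if( cl->kernelExists( "mykernelname" ) ) {
    kernel = cl->getKernel( "mykernelname" );
} else {
    kernel = cl->buildKernel( "mykernelfile.cl", "my_kernel" );
    cl->storeKernel( "mykernelname", kernel, true ); // EasyCL will delete it for us
}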

device dirty flag

For CLWrapper objects, if the wrapper is passed to a kernel via out or inout, and that kernel is then run, isDeviceDirty() will return true until ->copyToHost() is called. So, you can use this to determine whether you need to call ->copyToHost() prior to reading the host-side array.

The following methods will reset the flag to false:

  • copyToDevice()
  • copyToHost()
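
For example, a minimal sketch of the intended pattern, reusing outwrapper from the CLWrapper example above:

kernel->out( outwrapper );
kernel->run_1d( 5, 5 );
// outwrapper was passed via `out`, so the device-side data is now newer than the host array:
if( outwrapper->isDeviceDirty() ) {
    outwrapper->copyToHost(); // also resets the flag to false
}
// safe to read the host-side 'out' array now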

This is a new feature, as of May 15 2015, and might have some bugs prior to May 31 2015 (ie, about 2 weeks, long enough for me to find any bugs).

templated kernels

passing structs

  • Simply #include the new "CLKernel_structs.h" header, in order to be able to pass structs
  • See test/testStructs.cpp for an example

Profiling (New!)

  • Simply call cl->setProfiling(true), then run your kernels as normal, then call cl->dumpProfiling() to print the results
  • Timings are cumulative over multiple calls to the same kernel
  • Timings are grouped by kernel filename and kernelname
  • See test/testprofiling.cpp for an example
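
For example, a minimal sketch (the kernel, somedata, results and the sizes are all illustrative):

cl->setProfiling(true);
for( int it = 0; it < 10; it++ ) { // timings accumulate over repeated calls
    kernel->in( 5, somedata )->out( 5, results )->run_1d( 5, 5 );
}
cl->dumpProfiling(); // prints cumulative timings, grouped by kernel filename and kernel name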

Using with clBLAS

  • You can call ->getBuffer() on a CLWrapper object, in order to pass it to clBLAS. You can see an example eg at THClBlas.cpp#L425
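
For example, a minimal sketch (this assumes getBuffer() returns the underlying cl_mem, and elides the actual clBLAS call and its many parameters; awrapper and bwrapper are hypothetical CLWrapper objects):

awrapper->copyToDevice();
bwrapper->copyToDevice();
cl_mem aBuf = awrapper->getBuffer(); // assumption: exposes the underlying cl_mem
cl_mem bBuf = bwrapper->getBuffer();
// ... pass aBuf and bBuf, plus a cl_command_queue, into eg clblasSgemm(...) ...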

How to build

Build options

Option               Description
PROVIDE_LUA_ENGINE   If you want to call EasyCL from within Lua, then choose PROVIDE_LUA_ENGINE=OFF, otherwise leave it as ON
DEV_RUN_COG          Only for EasyCL maintainers, leave as OFF otherwise
BUILD_TESTS          Whether to build the unit tests
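
These can be passed to cmake at configure time, for example (the values shown are just illustrative):

cmake .. -DPROVIDE_LUA_ENGINE=ON -DDEV_RUN_COG=OFF -DBUILD_TESTS=ON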

Building on Mac OS X

(tested on Travis https://travis-ci.org/hughperkins/EasyCL )

Pre-requisites

  • git
  • cmake
  • g++
  • (maybe) OpenCL (not sure if installed by default? Travis worked ok without explicitly installing it)

Procedure

git clone --recursive https://github.com/hughperkins/EasyCL.git
cd EasyCL
mkdir build
cd build
cmake ..
make install
  • the executables will be in the ../dist/bin folder, and the .dylib files in ../dist/lib
  • Don't forget the --recursive, otherwise you will see odd errors about clew/src/clew.c missing
    • If this happens, you can try git submodule init and then git submodule update.

Building on linux

Pre-requisites

  • OpenCL needs to be installed, which means things like:
    • on linux, you'll need a libOpenCL.so installed, and
    • an OpenCL implementation, ie some kind of .so file, and
    • an appropriate text file at /etc/OpenCL/vendors/somename.icd, containing the full path to the OpenCL implementation .so file (see the example after this list)
  • git (only needed to obtain the source-code)
  • cmake
  • g++
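
For example, a vendor .icd file is just a short text file naming the OpenCL implementation library (the filename and path below are purely illustrative; yours will differ):

# contents of /etc/OpenCL/vendors/nvidia.icd (illustrative):
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1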

Procedure

git clone --recursive https://github.com/hughperkins/EasyCL.git
cd EasyCL
mkdir build
cd build
cmake ..
make install
  • the executables will be in the ../dist/bin folder, and the .so files in ../dist/lib
  • Don't forget the --recursive, otherwise you will see odd errors about clew/src/clew.c missing
    • If this happens, you can try git submodule init and then git submodule update.

Building on Windows

Pre-requisites

  • OpenCL-enabled GPU and driver
  • git (only needed to obtain the source-code)
  • cmake
  • Visual Studio (tested with Visual Studio 2013 Community Edition)

Procedure

  • Open git bash, and run git clone --recursive https://github.com/hughperkins/EasyCL.git
  • Open cmake:
    • set source directory to the git-cloned directory from previous step
    • Set build directory to a subdirectory build-win32, or build-win64, according to which platform you are building for
    • click configure, and choose the appropriate build platform, eg Visual Studio 2013, or Visual Studio 2013 Win64
    • click generate
  • Open Visual Studio
    • open any of the projects in the build-win32 or build-win64 build directory
    • change build type from Debug to Release
    • from build menu, choose build solution
    • right-click 'INSTALL' project, and select 'Build'
  • after building, you will need to copy the *.cl files from the test directory into the directory from which you will run the tests (if you can figure out a way to automate this, please send a pull request :-) )

How to run self-tests

To check the clew library is working ok (ie, finding and loading the OpenCL library, etc):

linux:

    LD_LIBRARY_PATH=../dist/lib ../dist/bin/gpuinfo

Windows:

    ..\dist\bin\gpuinfo

... should print some information about your graphics card

Unit-tests:

Linux:

    LD_LIBRARY_PATH=../dist/lib ../dist/bin/easycl_unittests

Windows:

    ..\dist\bin\easycl_unittests

How to check my OpenCL installation/configuration?

  • In Ubuntu, you can use clinfo (install via sudo apt-get install clinfo), to check the OpenCL installation itself is ok. If this says 'no installations found', then it's an OpenCL configuration issue.
    • note that clinfo is broken on CUDA, I think? But OpenCL will still work ok: try gpuinfo instead
  • Run gpuinfo to list available platforms and devices
  • If no gpu-capable devices are found, you probably want to check things like:
    • do you have an OpenCL-capable GPU installed?
    • are the drivers installed?
    • is the ICD set up?

What if I've found a bug?

  • Ideally, create a simple test case, just 10-30 lines if possible, and either paste it directly into an issue, or else fork the repository and, ideally, add it to the test directory as an additional gtest-compliant test.
  • (and then, obviously post an issue to alert me)

What if I want a new feature?

  • Post a request as an issue
  • Or, fork the repository, add the feature, and send me a pull request

What if I just have a question?

Recent changes

  • 2017 dec 28th:
    • @iame6162013 fixed race conditions when reading output buffers
    • @iame6162013 added kernel fast read option
      • Should make kernel->run a bit faster
  • 2017 Apr 29th:
    • added var CL_GPUOFFSET, which lets you choose a GPU, by setting this var to 1,2,3, ...
  • 2016 Oct 16th:
    • added EasyCL::default_queue, a CLQueue object containing the EasyCL::queue cl_command_queue
  • 2016 Oct 15th:
    • master, and versions 4.0.0 and above, are wrapped in a namespace easycl now. Since it's a breaking change, in terms of compatibility, I've bumped the major version number
  • 2016 Jan 3rd:
  • 2015 Sep 10th:
    • fix mac build
    • merge to master
  • 2015 Aug 28th:
    • add USE_CLEW option, default 'ON', but can disable, to link directly with OpenCL libraries, rather than via clew
  • 2015 Aug 26th:
    • int64 and uint64 are now typedef'd to int64_t and uint64_t, instead of long long and unsigned long long. This is configurable in cmake options, though the default is that the typedef changes. I'm not 100% sure if changing the default is a good idea, but it seems better than having int64 and int64_t be two different types...
  • 2015 Aug 15th:
    • builds again on Windows (as well as on Ubuntu 14.04)
  • 2015 Aug 8th:
    • merged development branches into master. changes include:
      • clew is now a git submodule again. Make sure to do git submodule init and git submodule update to download it
        • when you checkout / update, you might need to use -f option to git, or delete the thirdparty/clew directory first
      • added build option to not link with internal Lua library
      • added profiling, using the OpenCL profiling functions
      • copy of data between host and device is done using explicit enqueueCopyBuffer functions now
  • 2015 July 15th:
    • added profiling
  • 2015 June 27th:
    • merged bundle-lua to master
    • Added StatefulTimer.h (was in DeepCL )
  • 2015 June 18-25th:
    • on branch bundle-lua:
      • builds on Windows again
      • started bundling the Lua source code, so there's no need for Lua libraries etc
      • Added a new version of CLWrapper->copyTo, which has additional parameters: srcOffset, dstOffset, count
      • Added StatefulTimer.h (was in DeepCL)
  • 2015 June 17th:
    • Merged changes to master:
      • DevicesInfo::getNumDevices() now returns 0, if no platforms available, rather than throwing exception
      • templates are expanded recursively now, so eg you can include other templates inside your template, and those will be expanded correctly (if you dislike this, please raise an issue to make it an option; easy to add)
      • added EasyCL::createForIndexedDevice, which creates an instance for the indexed device, over all opencl-enabled devices, gpu or not
      • more diagnostic output if a kernel fails to build, including line-numbers :-)
      • added getRenderedKernel method to KernelTemplater class
  • 2015 June:
    • added kernel templates, using Lua
    • added CLWrapper->copyTo() method
    • made it possible to pass arrays of 1 or more structs into CLKernels
    • added install targets to the build
    • added options to the build to turn unit-tests on/off
  • 2015 May 11: just noticed there is an over-aggressive assert in gpuinfo, that exits if not exactly one platform => fixed
  • 2015 May 10: Added CLWrapper.devicedirty flag, which is set whenever the wrapper is passed to a kernel via out or inout, and that kernel is run
  • 2015 May 3: Added kernel store methods: storeKernel( string name, CLKernel *kernel ), getKernel( string name ), kernelExists( string name ), to facilitate per-connection kernel caching
  • 2015 May 3: Added getCl() to CLWrapper types
  • 2015 May 1: Renamed from OpenCLHelper to EasyCL (easier to type, and remember)
  • Added getBuffer to CLWrapper, to give access to the underlying buffer, eg can use this for using with clBLAS
  • Added CLWrapper instantiation for unsigned char

License

EasyCL is available under the MPL v2 license, http://mozilla.org/MPL/2.0/
