oneAPI Collective Communications Library (oneCCL)

Installation   |   Usage   |   Release Notes   |   Documentation   |   How to Contribute   |   License

oneAPI Collective Communications Library (oneCCL) provides an efficient implementation of communication patterns used in deep learning.

oneCCL is integrated into:

oneCCL is part of oneAPI.

Prerequisites

  • Ubuntu* 18
  • GNU*: C, C++ 4.8.5 or higher.

Refer to System Requirements for more details.

SYCL support

SYCL support requires the Intel(R) oneAPI DPC++/C++ Compiler with Level Zero v1.0 support.

To install Level Zero, refer to the instructions in the Intel(R) Graphics Compute Runtime repository or to the installation guide for oneAPI users.

BF16 support

  • AVX512F-based implementation requires GCC 4.9 or higher.
  • AVX512_BF16-based implementation requires GCC 10.0 or higher and GNU binutils 2.33 or higher.
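
The version thresholds above can be checked mechanically before a build. The following is a minimal sketch, not part of oneCCL: `version_ge` and `bf16_support` are hypothetical helper names, and the comparison assumes `sort -V` (version sort, as provided by GNU coreutils) is available. Versions are passed in as full dotted strings such as 10.2.0.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (GNU coreutils version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper (not a oneCCL tool): given a GCC version ($1) and a
# GNU binutils version ($2), print which BF16 implementation the toolchain
# meets the stated requirements for.
bf16_support() {
    if version_ge "$1" 10.0 && version_ge "$2" 2.33; then
        echo "AVX512_BF16"
    elif version_ge "$1" 4.9; then
        echo "AVX512F"
    else
        echo "none"
    fi
}

# Example with literal versions: GCC 10.2.0 and binutils 2.34.
bf16_support 10.2.0 2.34
```

The sketch only encodes the two version rules listed above; it does not check whether the CPU itself supports the corresponding instructions.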

Installation

General installation scenario:

cd oneccl
mkdir build
cd build
cmake ..
make -j install

If you need a clean build, create a new build directory and invoke cmake within it.

You can also do the following during installation:

Usage

Launching Example Application

Use the command:

$ source <install_dir>/env/setvars.sh
$ mpirun -n 2 <install_dir>/examples/benchmark/benchmark

Setting worker affinity

There are two ways to set worker threads (workers) affinity: automatically and explicitly.

Automatic setup

  1. Set the CCL_WORKER_COUNT environment variable to the desired number of workers per process.
  2. Set the CCL_WORKER_AFFINITY environment variable to auto.

Example:

export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=auto

With the variables above, oneCCL creates four workers per process, and the pinning depends on the process launcher.

If the application is launched with the mpirun provided in the oneCCL distribution package, the workers are automatically pinned to the last four cores available to the launched process. The exact CPU core IDs can be controlled through mpirun parameters.

Otherwise, workers will be automatically pinned to the last four cores available on the node.
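
To make "the last four cores" concrete, here is a small sketch that lists which core IDs that phrase would refer to. It is an illustration only: `last_cores` is a hypothetical helper, not a oneCCL utility, and it assumes the available cores are numbered contiguously from 0. The set of cores actually available to a process depends on the launcher, as described above.

```shell
# Hypothetical helper: print the IDs of the last $2 cores out of $1 cores
# as a comma-separated list.
# Assumption: core IDs are contiguous and start at 0.
last_cores() {
    total=$1
    count=$2
    # Refuse to answer when more workers are requested than cores exist.
    if [ "$count" -gt "$total" ]; then
        return 1
    fi
    id=$((total - count))
    list=""
    while [ "$id" -lt "$total" ]; do
        list="${list:+$list,}$id"
        id=$((id + 1))
    done
    echo "$list"
}

# Example: 16 cores, 4 workers.
last_cores 16 4   # prints 12,13,14,15
```

The output uses the same comma-separated form that CCL_WORKER_AFFINITY accepts for explicit pinning, so it can serve as a starting point when moving from automatic to explicit setup.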


Explicit setup

  1. Set the CCL_WORKER_COUNT environment variable to the desired number of workers per process.
  2. Set the CCL_WORKER_AFFINITY environment variable to the IDs of the cores to pin local workers to.

Example:

export CCL_WORKER_COUNT=4
export CCL_WORKER_AFFINITY=3,4,5,6

With the variables above, oneCCL will create four workers per process and pin them to the cores with the IDs of 3, 4, 5, and 6 respectively.
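
Because each worker is paired with one core ID, it can be useful to confirm that the two variables agree before launching. The following is a minimal sketch under that assumption: `affinity_matches_count` is a hypothetical helper, and how oneCCL itself reacts to a mismatch is not described here.

```shell
# Hypothetical check: succeed when the comma-separated core list in $2
# contains exactly $1 entries (one core ID per worker).
affinity_matches_count() {
    # Count the commas; a list with N commas has N + 1 entries.
    commas=$(printf '%s' "$2" | tr -cd ',' | wc -c)
    [ $((commas + 1)) -eq "$1" ]
}

# Example with the values used above.
CCL_WORKER_COUNT=4
CCL_WORKER_AFFINITY=3,4,5,6
if affinity_matches_count "$CCL_WORKER_COUNT" "$CCL_WORKER_AFFINITY"; then
    echo "worker count and affinity list agree"
else
    echo "worker count and affinity list differ" >&2
fi
```

The check applies only to explicit core lists; it is not meant for the auto value.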

Using oneCCL package from CMake

oneCCLConfig.cmake and oneCCLConfigVersion.cmake are included in the oneCCL distribution.

With these files, you can integrate oneCCL into a user project with the find_package command. A successful invocation of find_package(oneCCL <options>) creates the imported target oneCCL, which can be passed to the target_link_libraries command.

For example:

project(Foo)
add_executable(foo foo.cpp)

# Search for oneCCL
find_package(oneCCL REQUIRED)

# Connect oneCCL to foo
target_link_libraries(foo oneCCL)

oneCCLConfig files generation

To generate the oneCCLConfig files for the oneCCL package, use the provided cmake/scripts/config_generation.cmake file:

cmake [-DOUTPUT_DIR=<output_dir>] -P cmake/scripts/config_generation.cmake

Additional Resources

Blog Posts

Workshop Materials

  • oneAPI, oneCCL and OFI: Path to Heterogeneous Architecture Programming with Scalable Collective Communications: recording and slides

Contribute

See CONTRIBUTING for more information.

License

Distributed under the Apache License 2.0. See LICENSE for more information.

Security Policy

See SECURITY for more information.
