AMD's Machine Intelligence Library

MIOpen

AMD's library for high-performance machine learning primitives. Sources and binaries can be found at MIOpen's GitHub site. The latest released documentation can be read online here.

MIOpen supports two programming models:

  1. HIP (Primary Support).
  2. OpenCL.

Documentation

For a detailed description of the MIOpen library see the Documentation.

How to build documentation

Run the steps below to build documentation locally.

cd docs

pip3 install -r .sphinx/requirements.txt

python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html

Prerequisites

  • More information about the ROCm stack is available via the ROCm Information Portal.
  • A ROCm-enabled platform; more info here.
  • Base software stack, which includes:
    • HIP -
      • HIP and HCC libraries and header files.
    • OpenCL - OpenCL libraries and header files.
  • MIOpenGEMM - enables various functionalities, including transposed and dilated convolutions.
    • This is optional on the HIP backend, and required on the OpenCL backend.
    • Users can enable this library using the cmake configuration flag -DMIOPEN_USE_MIOPENGEMM=On, which is enabled by default when the OpenCL backend is chosen.
  • ROCm cmake - provides CMake modules for common build tasks needed for the ROCm software stack.
  • Half - IEEE 754-based half-precision floating-point library.
  • Boost
    • MIOpen uses the boost-system and boost-filesystem packages to enable the persistent kernel cache.
    • Version 1.79 is recommended; older versions may need patches to work on newer systems, e.g. boost1{69,70,72} with glibc-2.34.
  • SQLite3 - for reading and writing the performance database.
  • lbzip2 - a multi-threaded compression/decompression utility.
  • MIOpenTENSILE - users can enable this library using the cmake configuration flag -DMIOPEN_USE_MIOPENTENSILE=On. (Deprecated after ROCm 5.1.1.)
  • rocBLAS - AMD's library for Basic Linear Algebra Subprograms (BLAS) on the ROCm platform.
  • MLIR - (Multi-Level Intermediate Representation) with its MIOpen dialect, to support and complement kernel development.
  • Composable Kernel - a C++ templated device library for GEMM-like and reduction-like operators.

Installing MIOpen with pre-built packages

MIOpen can be installed on Ubuntu using apt-get.

For OpenCL backend: apt-get install miopen-opencl

For HIP backend: apt-get install miopen-hip

Currently, both backends cannot be installed on the same system simultaneously. To switch to a backend other than the one currently installed, uninstall the existing backend completely and then install the new backend.
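
As a quick sanity check on a Debian or Ubuntu system, the installed backend package can be listed with dpkg; this is only a convenience check, not part of the official install steps:

# Shows whichever MIOpen backend package (miopen-hip or miopen-opencl) is currently installed, if any.
dpkg -l | grep miopen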

Installing MIOpen kernels package

MIOpen provides an optional pre-compiled kernels package to reduce startup latency. These precompiled kernels cover a select set of popular input configurations and will expand in future releases to provide additional coverage.

Note that all compiled kernels are locally cached in the folder $HOME/.cache/miopen/, so precompiled kernels reduce the startup latency only for the first execution of a neural network. Precompiled kernels do not reduce startup time on subsequent runs.

To install the kernels package for your GPU architecture, use the following command:

apt-get install miopenkernels-<arch>-<num cu>

Where <arch> is the GPU architecture (for example, gfx900, gfx906, or gfx1030) and <num cu> is the number of CUs available in the GPU (for example, 56 or 64).

Not installing these packages does not impact the functioning of MIOpen, since MIOpen will compile these kernels on the target machine once a kernel is run. However, the compilation step may significantly increase the startup time for different operations.
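
As an illustrative end-to-end sketch (the rocminfo query and the package name below are examples and must be matched to your actual GPU):

# Query the GPU architecture (requires the rocminfo package).
/opt/rocm/bin/rocminfo | grep -o -m 1 'gfx[0-9a-f]*'
# Install the matching precompiled kernels package, e.g. for a gfx900 GPU with 64 CUs.
apt-get install miopenkernels-gfx900-64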

The script utils/install_precompiled_kernels.sh provided as part of MIOpen automates the above process: it queries the user's machine for the GPU architecture and then installs the appropriate package. It may be invoked as:

./utils/install_precompiled_kernels.sh

The above script depends on the rocminfo package to query the GPU architecture.

More info can be found here.

Installing the dependencies

The dependencies can be installed with the install_deps.cmake script: cmake -P install_deps.cmake

By default this installs to /usr/local, but the dependencies can be installed to another location with the --prefix argument:

cmake -P install_deps.cmake --prefix <miopen-dependency-path>

An example cmake step can be:

cmake -P install_deps.cmake --minimum --prefix /root/MIOpen/install_dir

This prefix can be used to specify the dependency path during the configuration phase using CMAKE_PREFIX_PATH.
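
For instance, a configure step that picks up dependencies installed under the example prefix above might look like the following sketch; it mirrors the full examples later in this document and assumes the HIP backend and the /root/MIOpen/install_dir prefix used earlier:

# Point cmake at the dependency prefix produced by install_deps.cmake.
cmake -DMIOPEN_BACKEND=HIP -DCMAKE_PREFIX_PATH=/root/MIOpen/install_dir ..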

  • MIOpen's HIP backend uses rocBLAS by default. Users can install the minimum rocBLAS release by using apt-get install rocblas. To disable the use of rocBLAS, set the configuration flag -DMIOPEN_USE_ROCBLAS=Off (see the sketch after this list). rocBLAS is not available for the OpenCL backend.

  • MIOpen's OpenCL backend uses MIOpenGEMM by default. Users can install the minimum MIOpenGEMM release by using apt-get install miopengemm.
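
As a sketch, these flags can be combined with the backend selection described in the next section; the invocations below are illustrative only:

# HIP backend with rocBLAS disabled.
cmake -DMIOPEN_BACKEND=HIP -DMIOPEN_USE_ROCBLAS=Off ..
# OpenCL backend with MIOpenGEMM explicitly enabled (the default when OpenCL is chosen).
cmake -DMIOPEN_BACKEND=OpenCL -DMIOPEN_USE_MIOPENGEMM=On ..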

Building MIOpen from source

Configuring with cmake

First create a build directory:

mkdir build; cd build;

Next configure cmake. The preferred backend for MIOpen can be set using the -DMIOPEN_BACKEND cmake variable.

For the HIP backend (ROCm 3.5 and later), set the C++ compiler to clang++ and run:

export CXX=<location-of-clang++-compiler>
cmake -DMIOPEN_BACKEND=HIP -DCMAKE_PREFIX_PATH="<hip-installed-path>;<rocm-installed-path>;<miopen-dependency-path>" ..

An example cmake step can be:

export CXX=/opt/rocm/llvm/bin/clang++ && \
cmake -DMIOPEN_BACKEND=HIP -DCMAKE_PREFIX_PATH="/opt/rocm/;/opt/rocm/hip;/root/MIOpen/install_dir" ..

Note: When specifying the path for the CMAKE_PREFIX_PATH variable, do not use the ~ shorthand for the user home directory.

For OpenCL, run:

cmake -DMIOPEN_BACKEND=OpenCL ..

The above assumes that OpenCL is installed in one of the standard locations. If not, then manually set these cmake variables:

cmake -DMIOPEN_BACKEND=OpenCL -DMIOPEN_HIP_COMPILER=<hip-compiler-path> -DOPENCL_LIBRARIES=<opencl-library-path> -DOPENCL_INCLUDE_DIRS=<opencl-headers-path> ..

And an example setting the dependency path for an environment with ROCm 3.5 and later:

cmake -DMIOPEN_BACKEND=OpenCL -DMIOPEN_HIP_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_PREFIX_PATH="/opt/rocm/;/opt/rocm/hip;/root/MIOpen/install_dir" ..

Setting Up Locations

By default the install location is set to '/opt/rocm'; this can be changed by using CMAKE_INSTALL_PREFIX:

cmake -DMIOPEN_BACKEND=OpenCL -DCMAKE_INSTALL_PREFIX=<miopen-installed-path> ..

System Performance Database and User Database

The default path to the System PerfDb is miopen/share/miopen/db/ within the install location. The default path to the User PerfDb is ~/.config/miopen/. For development purposes, setting BUILD_DEV will change the default path of both database files to the source directory:

cmake -DMIOPEN_BACKEND=OpenCL -DBUILD_DEV=On ..

Database paths can be explicitly customized by means of the MIOPEN_SYSTEM_DB_PATH (System PerfDb) and MIOPEN_USER_DB_PATH (User PerfDb) cmake variables.
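
For example, a configuration that places both databases under custom, purely illustrative paths could look like:

# Override the System PerfDb and User PerfDb locations; the paths are examples only.
cmake -DMIOPEN_BACKEND=HIP -DMIOPEN_SYSTEM_DB_PATH=/opt/miopen-db/system -DMIOPEN_USER_DB_PATH=$HOME/.local/miopen-userdb ..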

More information about the performance database can be found here.

Persistent Program Cache

MIOpen by default caches the device programs in the location ~/.cache/miopen/. In the cache directory there exists a directory for each version of MIOpen. Users can change the location of the cache directory during configuration using the flag -DMIOPEN_CACHE_DIR=<cache-directory-path>.

Users can also disable the cache at runtime by setting the environment variable MIOPEN_DISABLE_CACHE=1.
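
For example (the cache path and application name below are illustrative placeholders):

# Configure with a custom kernel cache directory.
cmake -DMIOPEN_BACKEND=HIP -DMIOPEN_CACHE_DIR=/tmp/miopen-cache ..
# Disable the cache for a single run of an application that uses MIOpen.
MIOPEN_DISABLE_CACHE=1 ./my_miopen_app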

For MIOpen version 2.3 and earlier

If the compiler changes, or the user modifies the kernels, then the cache must be deleted for the MIOpen version in use; e.g., rm -rf ~/.cache/miopen/<miopen-version-number>. More information about the cache can be found here.

For MIOpen version 2.4 and later

MIOpen's kernel cache directory is versioned so that users' cached kernels will not collide when upgrading from an earlier version.

Changing the cmake configuration

The configuration can be changed after running cmake by using ccmake or cmake-gui:

ccmake .. OR cmake-gui ..

The ccmake program can be downloaded as the Linux package cmake-curses-gui, but is not available on Windows.

Building the library

The library can be built from the build directory using the 'Release' configuration:

cmake --build . --config Release OR make

And can be installed by using the 'install' target:

cmake --build . --config Release --target install OR make install

This will install the library to the CMAKE_INSTALL_PREFIX path that was set.

Building the driver

MIOpen provides an application driver which can be used to execute any particular layer in isolation and to measure the performance and verify the correctness of the library.

The driver can be built using the MIOpenDriver target:

cmake --build . --config Release --target MIOpenDriver OR make MIOpenDriver

Documentation on how to run the driver is here.
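
As a rough sketch only, a forward convolution layer can be exercised with the driver; the flag names below are recalled from the driver's usage text and should be checked against the linked documentation before use:

# Run a 7x7, stride-2 convolution over a 32x3x224x224 input with 64 output channels (illustrative parameters).
./bin/MIOpenDriver conv -n 32 -c 3 -H 224 -W 224 -k 64 -y 7 -x 7 -p 3 -q 3 -u 2 -v 2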

Running the tests

The tests can be run by using the 'check' target:

cmake --build . --config Release --target check OR make check

A single test can be built and run by doing:

cmake --build . --config Release --target test_tensor
./bin/test_tensor

Formatting the code

All the code is formatted using clang-format. To format a file, use:

clang-format-10 -style=file -i <path-to-source-file>

Also, githooks can be installed to format the code per-commit:

./.githooks/install
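
To format the entire tree in one pass, a sketch using GNU find (assuming clang-format-10 is on the PATH) is:

# Reformat all C++ sources and headers in place using the repository's .clang-format style.
find . \( -name '*.cpp' -o -name '*.hpp' \) -print0 | xargs -0 clang-format-10 -style=file -i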

Storing large file using Git LFS

Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server. In MIOpen, we use Git LFS to store large files, such as the kernel database files (*.kdb), which are normally > 0.5 GB. Steps:

Git LFS can be installed and set up by:

sudo apt install git-lfs
git lfs install

In the Git repository where you want to use Git LFS, track the file type that you would like (if the file type is already tracked, this step can be skipped):

git lfs track "*.file_type"
git add .gitattributes

Pull all large files, or a single large file that you would like to update, by:

git lfs pull --exclude=
or
git lfs pull --exclude= --include "filename"

Update the large files and push to GitHub by:

git add my_large_files
git commit -m "the message"
git push

Installing the dependencies manually

If Ubuntu v16 is used then the Boost packages can also be installed by:

sudo apt-get install libboost-dev
sudo apt-get install libboost-system-dev
sudo apt-get install libboost-filesystem-dev

Note: MIOpen by default will attempt to build with Boost statically linked libraries. If needed, the user can build with dynamically linked Boost libraries by using this flag during the configuration stage:

-DBoost_USE_STATIC_LIBS=Off

However, this is not recommended.

The half header needs to be installed from here.

Using docker

The easiest way is to use Docker. You can build the top-level Dockerfile:

docker build -t miopen-image .

Then to enter the development environment use docker run, for example:

docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device /dev/dri:/dev/dri:rw  --volume /dev/dri:/dev/dri:rw -v /var/lib/docker/:/var/lib/docker --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined miopen-image

Prebuilt Docker images can be found on ROCm's public Docker Hub here.
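
Once inside the container, one way to confirm that the GPU devices were passed through correctly (assuming rocminfo is present in the image) is:

# Should list the GPU agents of the host, e.g. gfx900, gfx906, or gfx1030.
/opt/rocm/bin/rocminfo | grep gfx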

Citing MIOpen

MIOpen's paper is freely available and can be accessed on arXiv:
MIOpen: An Open Source Library For Deep Learning Primitives

Citation BibTeX

@misc{jeh2019miopen,
    title={MIOpen: An Open Source Library For Deep Learning Primitives},
    author={Jehandad Khan and Paul Fultz and Artem Tamazov and Daniel Lowell and Chao Liu and Michael Melesse and Murali Nandhimandalam and Kamil Nasyrov and Ilya Perminov and Tejash Shah and Vasilii Filippov and Jing Zhang and Jing Zhou and Bragadeesh Natarajan and Mayank Daga},
    year={2019},
    eprint={1910.00078},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Porting from cuDNN to MIOpen

The porting guide highlights the key differences between the current cuDNN and MIOpen APIs.
