  • Stars: 3,398
  • Rank 13,052 (Top 0.3 %)
  • Language: C++
  • License: MIT License
  • Created over 8 years ago
  • Updated 6 months ago


Repository Details

HIP: C++ Heterogeneous-Compute Interface for Portability

What is this repository for?

HIP is a C++ Runtime API and Kernel Language that allows developers to create portable applications for AMD and NVIDIA GPUs from single source code.

Key features include:

  • HIP is very thin and has little or no performance impact over coding directly in CUDA mode.
  • HIP allows coding in a single-source C++ programming language including features such as templates, C++11 lambdas, classes, namespaces, and more.
  • HIP allows developers to use the "best" development environment and tools on each target platform.
  • The HIPIFY tools automatically convert source from CUDA to HIP.
  • Developers can specialize for the platform (CUDA or AMD) to tune for performance or handle tricky cases.

New projects can be developed directly in the portable HIP C++ language and can run on either NVIDIA or AMD platforms. Additionally, HIP provides porting tools which make it easy to port existing CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.

DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2021 Advanced Micro Devices, Inc. All Rights Reserved.

Repository branches:

The HIP repository maintains several branches. The branches that are of importance are:

  • develop branch: This is the default branch, where new features under development are visible. While this may be of interest to many, note that this branch and the features under development might not be stable.
  • main branch: This is the stable branch. It is kept up to date with the latest release branch; for example, if the latest HIP release is rocm-4.3, the main branch is based on that release.
  • Release branches: These branches correspond to each ROCm release and are listed with release tags, such as rocm-4.2, rocm-4.3, etc.

Release tagging:

HIP releases typically follow a naming convention tied to each ROCm release to help differentiate them.

  • rocm x.yy: These are the stable releases based on the ROCm release. This type of release is typically made once a month.

More Info:

How do I get set up?

See the Installation notes.

Simple Example

The HIP API includes functions such as hipMalloc, hipMemcpy, and hipFree. Programmers familiar with CUDA will be able to quickly learn and start coding with the HIP API. Compute kernels are launched with the "hipLaunchKernelGGL" macro call. Here is a simple example showing a snippet of HIP API code:

hipMalloc(&A_d, Nbytes);
hipMalloc(&C_d, Nbytes);

hipMemcpy(A_d, A_h, Nbytes, hipMemcpyHostToDevice);

const unsigned blocks = 512;
const unsigned threadsPerBlock = 256;
hipLaunchKernelGGL(vector_square,   /* compute kernel*/
                dim3(blocks), dim3(threadsPerBlock), 0/*dynamic shared*/, 0/*stream*/,     /* launch config*/
                C_d, A_d, N);  /* arguments to the compute kernel */

hipMemcpy(C_h, C_d, Nbytes, hipMemcpyDeviceToHost);

The HIP kernel language defines builtins for determining grid and block coordinates, math functions, short vectors, atomics, and timer functions. It also specifies additional defines and keywords for function types, address spaces, and optimization controls (See the HIP Kernel Language for a full description). Here's an example of defining a simple 'vector_square' kernel.

template <typename T>
__global__ void
vector_square(T *C_d, const T *A_d, size_t N)
{
    size_t offset = (blockIdx.x * blockDim.x + threadIdx.x);
    size_t stride = blockDim.x * gridDim.x;

    for (size_t i=offset; i<N; i+=stride) {
        C_d[i] = A_d[i] * A_d[i];
    }
}

The HIP Runtime API code and compute kernel definition can exist in the same source file - HIP takes care of generating host and device code appropriately.
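
The indexing in vector_square is a grid-stride loop: each thread starts at its global index and advances by the total number of launched threads, so any launch configuration covers all N elements. The sketch below traces that pattern on the host in plain C++, with no GPU or HIP headers involved. Dim1, vector_square_thread, and emulate_vector_square are hypothetical names introduced only for this illustration; they stand in for the blockIdx, blockDim, threadIdx, and gridDim builtins and are not part of the HIP API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the one-dimensional part of HIP's index builtins.
struct Dim1 { std::size_t x; };

// Same loop body as the vector_square kernel, with the builtins passed in
// explicitly so the code can run on the host.
template <typename T>
void vector_square_thread(T* C, const T* A, std::size_t N,
                          Dim1 blockIdx, Dim1 blockDim, Dim1 threadIdx, Dim1 gridDim)
{
    std::size_t offset = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    std::size_t stride = blockDim.x * gridDim.x;                 // total threads launched
    for (std::size_t i = offset; i < N; i += stride) {
        C[i] = A[i] * A[i];
    }
}

// Visits every (block, thread) pair one after another; a GPU would run them
// in parallel. Returns the squared copy of A.
template <typename T>
std::vector<T> emulate_vector_square(const std::vector<T>& A,
                                     std::size_t blocks, std::size_t threadsPerBlock)
{
    assert(blocks > 0 && threadsPerBlock > 0);
    std::vector<T> C(A.size(), T{});
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < threadsPerBlock; ++t) {
            vector_square_thread(C.data(), A.data(), A.size(),
                                 Dim1{b}, Dim1{threadsPerBlock}, Dim1{t}, Dim1{blocks});
        }
    }
    return C;
}
```

With 2 blocks of 3 threads and N = 10, the stride is 6: the thread with offset 0 handles elements 0 and 6, the thread with offset 1 handles 1 and 7, and so on, which is why the launch does not need one thread per element.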

HIP Portability and Compiler Technology

HIP C++ code can be compiled on either of the following platforms:

  • On the NVIDIA CUDA platform, HIP provides a header file that translates from the HIP runtime APIs to CUDA runtime APIs. The header file contains mostly inlined functions and thus has very low overhead - developers coding in HIP should expect the same performance as coding in native CUDA. The code is then compiled with nvcc, the standard C++ compiler provided with the CUDA SDK. Developers can use any tools supported by the CUDA SDK including the CUDA profiler and debugger.
  • On the AMD ROCm platform, HIP provides a header and runtime library built on top of the HIP-Clang compiler. The HIP runtime implements HIP streams, events, and memory APIs, and is an object library that is linked with the application. The source code for all headers and the library implementation is available on GitHub. HIP developers on ROCm can use AMD's ROCgdb (https://github.com/ROCm-Developer-Tools/ROCgdb) for debugging and profiling.

Thus HIP source code can be compiled to run on either platform, which is what gives HIP its source portability. Platform-specific features can be isolated to a specific platform using conditional compilation. HIP provides the hipcc compiler driver, which calls the appropriate toolchain depending on the desired platform.
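
As a rough sketch of the conditional compilation mentioned above, the function below selects a code path with the preprocessor. It assumes the platform macros __HIP_PLATFORM_AMD__ and __HIP_PLATFORM_NVIDIA__ that recent HIP releases document; older releases used different macro names, so check the names against the HIP version in use. hip_target_platform is a hypothetical helper written for this illustration, not part of the HIP API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which platform this translation unit was
// compiled for. Only one branch survives preprocessing.
std::string hip_target_platform()
{
#if defined(__HIP_PLATFORM_AMD__)
    // AMD-specific tuning or workarounds would go here.
    return "amd";
#elif defined(__HIP_PLATFORM_NVIDIA__)
    // CUDA-specific tuning or workarounds would go here.
    return "nvidia";
#else
    // Neither macro is defined, e.g. when built with a plain host compiler.
    return "host-only";
#endif
}
```

Keeping such branches small and confined to a few helpers leaves the rest of the source identical on both platforms.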

Examples and Getting Started:

  • A sample and blog that use any of the HIPIFY tools to convert a simple app from CUDA to HIP:
cd samples/01_Intro/square
# follow README / blog steps to hipify the application.

More Examples

The GitHub repository HIP-Examples contains a hipified version of the popular Rodinia benchmark suite. The README with the procedures and tips the team used during this porting effort is here: Rodinia Porting Guide

Tour of the HIP Directories

  • include:

    • hip_runtime_api.h : Defines HIP runtime APIs and can be compiled with many standard Linux compilers (GCC, ICC, CLANG, etc), in either C or C++ mode.
    • hip_runtime.h : Includes everything in hip_runtime_api.h PLUS hipLaunchKernelGGL and syntax for writing device kernels and device functions. hip_runtime.h can be compiled using a standard C++ compiler but will expose a subset of the available functions.
    • amd_detail/** , nvidia_detail/** : Implementation details for specific platforms. HIP applications should not include these files directly.
  • bin: Tools and scripts to help with HIP porting

    • hipcc : Compiler driver that can be used to replace nvcc in existing CUDA code. hipcc will call nvcc or HIP-Clang depending on platform and include appropriate platform-specific headers and libraries.
    • hipconfig : Print HIP configuration (HIP_PATH, HIP_PLATFORM, HIP_COMPILER, HIP_RUNTIME, CXX config flags, etc.)
  • doc: Documentation - markdown and doxygen info.

Reporting an issue

Use the GitHub issue tracker. If reporting a bug, include the output of "hipconfig --full" and samples/1_hipInfo/hipInfo (if possible).

More Repositories

1. ROCm (Shell, 4,470 stars): AMD ROCm™ Software - GitHub Home
2. MIOpen (Assembly, 1,046 stars): AMD's Machine Intelligence Library
3. HIPIFY (C++, 440 stars): HIPIFY: Convert CUDA to Portable C++ Code
4. hcc (C++, 425 stars): HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute currently for the ROCm GPU Computing Platform
5. rocBLAS (C++, 308 stars): Next generation BLAS implementation for ROCm platform
6. composable_kernel (C++, 285 stars): Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
7. omnitrace (C++, 283 stars): Omnitrace: Application Profiling, Tracing, and Analysis
8. rccl (C++, 231 stars): ROCm Communication Collectives Library (RCCL)
9. Tensile (Python, 211 stars): Stretching GPU performance for GEMMs and tensor contractions.
10. ROCR-Runtime (C++, 205 stars): ROCm Platform Runtime: ROCr a HPC market enhanced HSA based runtime
11. aomp (Fortran, 203 stars): AOMP is an open source Clang/LLVM based compiler with added support for the OpenMP® API on Radeon™ GPUs. Use this repository for releases, issues, documentation, packaging, and examples.
12. AMDMIGraphX (C++, 181 stars): AMD's graph optimization engine.
13. rocFFT (C++, 174 stars): Next generation FFT implementation for ROCm
14. MIVisionX (C++, 168 stars): MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
15. gpufort (Fortran, 159 stars): GPUFORT: S2S translation tool for CUDA Fortran and Fortran+X in the spirit of hipify
16. rocPRIM (C++, 154 stars): ROCm Parallel Primitives
17. omniperf (Python, 128 stars): Advanced Profiling and Analytics for AMD Hardware
18. rocprofiler (C, 122 stars): ROC profiler library. Profiling with perf-counters and derived metrics.
19. rocm-examples (C++, 121 stars): A collection of examples for the ROCm software stack
20. rocMLIR (C++, 120 stars)
21. rocSPARSE (C++, 117 stars): Next generation SPARSE implementation for ROCm platform
22. rocm_smi_lib (C++, 114 stars): ROCm SMI LIB
23. rocRAND (C++, 111 stars): RAND library for HIP programming language
24. HIP-CPU (C++, 107 stars): An implementation of HIP that works on CPUs, across OSes.
25. rocThrust (C++, 100 stars): ROCm Thrust - run Thrust dependent software on AMD GPUs
26. ROCm-Device-Libs (C, 99 stars): ROCm Device Libraries
27. rocSOLVER (C++, 91 stars): Next generation LAPACK implementation for ROCm platform
28. hipCUB (C++, 79 stars): Reusable software components for ROCm developers
29. rocALUTION (C++, 74 stars): Next generation library for iterative sparse solvers for ROCm platform
30. rocWMMA (C++, 71 stars): rocWMMA
31. roctracer (C++, 67 stars): ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs
32. hipSPARSE (C++, 67 stars): ROCm SPARSE marshalling library
33. hipfort (Fortran, 66 stars): Fortran interfaces for ROCm libraries
34. atmi (C++, 65 stars): Asynchronous Task and Memory Interface, or ATMI, is a runtime framework and programming model for heterogeneous CPU-GPU systems. It provides a consistent, declarative API to create task graphs on CPUs and GPUs (integrated and discrete).
35. ROCmValidationSuite (C++, 61 stars): The ROCm Validation Suite is a system administrator’s and cluster manager's tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.
36. rocm-cmake (CMake, 59 stars): CMake modules used within the ROCm libraries
37. hipFFT (C++, 52 stars): hipFFT is a FFT marshalling library.
38. amd_matrix_instruction_calculator (Python, 48 stars): A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
39. ROCm-CompilerSupport (C++, 46 stars): The compiler support repository provides various Lightning Compiler related services.
40. rpp (C++, 46 stars): AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/OpenCL/CPU back-ends.
41. ROCclr (44 stars)
42. ROCgdb (C, 44 stars): This is ROCgdb, the ROCm source-level debugger for Linux, based on GDB, the GNU source-level debugger.
43. rocm_bandwidth_test (C++, 41 stars): Bandwidth test for ROCm
44. HIPCC (C++, 39 stars): HIPCC: HIP compiler driver
45. Experimental_ROC (Shell, 35 stars): Experimental and Intriguing Tools for ROCm
46. rocHPCG (C++, 35 stars): HPCG benchmark based on ROCm platform
47. ROC_SHMEM (C++, 34 stars): ROC_SHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
48. MISA (Python, 33 stars): Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)
49. amdsmi (C++, 32 stars): AMD SMI
50. ROCm.github.io (32 stars): ROCm Website
51. TransferBench (C++, 27 stars): TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)
52. clang-ocl (CMake, 26 stars): OpenCL compilation with clang compiler.
53. hipSOLVER (C++, 25 stars): ROCm SOLVER marshalling library
54. aotriton (Python, 24 stars): Ahead of Time (AOT) Triton Math Library
55. ROCm-OpenCL-Driver (C++, 24 stars): ROCm OpenCL Compiler Tool Driver
56. rocm-blogs (Jupyter Notebook, 22 stars)
57. rccl-tests (Cuda, 21 stars): RCCL Performance Benchmark Tests
58. hipRAND (C++, 21 stars): Random number library that generate pseudo-random and quasi-random numbers.
59. rdc (C++, 19 stars): RDC
60. ROCdbgapi (C++, 19 stars): The AMD Debugger API is a library that provides all the support necessary for a debugger and other tools to perform low level control of the execution and inspection of execution state of AMD's commercially available GPU architectures.
61. pyrsmi (Python, 17 stars): python package of rocm-smi-lib
62. hip-python (Shell, 16 stars): HIP Python Low-level Bindings
63. hip-tests (C++, 15 stars)
64. roc-stdpar (C++, 14 stars)
65. pytorch-micro-benchmarking (Python, 14 stars)
66. hipify_torch (Python, 13 stars)
67. rocmProfileData (C++, 13 stars)
68. rocm-docs-core (CSS, 12 stars): ROCm Documentation Python package for ReadTheDocs build standardization
69. rocAL (C++, 10 stars): The AMD rocAL is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a processing graph programmable by the user.
70. half (C++, 9 stars)
71. rocprofiler-sdk (C++, 9 stars)
72. rocBLAS-Examples (C++, 9 stars): Examples illustrating usage of the rocBLAS library
73. OSU_Microbenchmarks (C, 8 stars): ROCm - UCX enabled OSU_Benchmarks
74. MITuna (Python, 7 stars)
75. rtg_tracer (C++, 7 stars)
76. rocm-spack-pkgs (Python, 6 stars): Repository to host spack recipes for ROCm
77. rbuild (Python, 6 stars): Rocm build tool
78. Gromacs (C++, 5 stars): ROCm's implementation of Gromacs
79. rocm-core (CMake, 4 stars)
80. hip-testsuite (Python, 4 stars)
81. MIFin (C++, 4 stars): Tuna centric MIOpen client
82. rocm-llvm-python (Shell, 3 stars): Low-level Cython and Python bindings to the (ROCm) LLVM C API.
83. flang (C++, 3 stars): Mirror of flang repo: The source repo is https://github.com/flang-compiler/flang . Once a day the master branch is updated from the upstream source repo and then locked. AOMP or ROCm developers may commit or create PRs on branch aomp-dev.
84. hipSPARSELt (C++, 2 stars)
85. aomp-extras (Shell, 2 stars): hostcall services library, math library, and utilities
86. MIOpenExamples (C++, 2 stars): MIOpen examples
87. hipOMB (C, 1 star): OSU MPI benchmarks with ROCm support
88. numba-hip (Python, 1 star): HIP backend patch for Numba, the NumPy aware dynamic Python compiler using LLVM.
89. migraphx-benchmark (1 star)
90. tensorcast (Python, 1 star)
91. rocm-recipes (CMake, 1 star): Recipes for rocm
92. rocprofiler-register (CMake, 1 star)
93. rocm-install-on-windows (1 star)