Metal.jl

Metal programming in Julia

With Metal.jl it's possible to program GPUs on macOS using the Metal programming framework.

The package is a work-in-progress. There are bugs, functionality is missing, and performance hasn't been optimized. Expect to have to make changes to this package if you want to use it. PRs are very welcome!

Requirements

  • Mac device with M-series chip
  • Julia 1.8
  • macOS 13 (Ventura)

These requirements are fairly strict, due to our limited development resources (manpower, hardware). Technically, they could be relaxed; if you are interested in contributing to this, see this issue for more details. In practice, Metal.jl will probably work on macOS 10.15 or later, and other GPUs supported by Metal might also function (if only partially), but such combinations are currently unsupported.

Quick start

Metal.jl can be installed with the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run:

pkg> add Metal

Or, equivalently, via the Pkg API:

julia> import Pkg; Pkg.add("Metal")

For an overview of the toolchain in use, you can run the following command after importing the package:

julia> using Metal

julia> Metal.versioninfo()
macOS 13.5.0, Darwin 22.6.0

Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6

Julia packages:
- Metal.jl: 0.5.0
- Metal_LLVM_Tools_jll: 0.5.1+0

1 device:
- Apple M2 Max (64.000 KiB allocated)

Array abstraction

The easiest way to work with Metal.jl is through its array abstraction. The MtlArray type is meant both as a convenient container for device memory and as a data-parallel abstraction for using the GPU without writing your own kernels:

julia> a = MtlArray([1])
1-element MtlArray{Int64, 1}:
 1

julia> a .+ 1
1-element MtlArray{Int64, 1}:
 2
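Broadcasting is not the only operation that runs on the device: thanks to the shared GPUArrays.jl infrastructure, many common array operations work out of the box. The following is a minimal sketch, assuming a supported M-series Mac with Metal.jl loaded; note that Float32 is preferred, as Apple GPUs do not support Float64:

```julia
using Metal

# upload a Float32 matrix to the GPU
a = MtlArray(rand(Float32, 1024, 1024))

b = a * a          # matrix multiplication executes on the GPU
c = sum(abs2, a)   # mapreduce is compiled to a GPU kernel

Array(b)           # copy the result back to a CPU Array
```

Converting back with `Array` forces a device-to-host copy, so it is best done once at the end of a computation rather than inside a loop.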

Kernel programming

The above array abstractions are all implemented using Metal kernels written in Julia. These kernels follow a programming style similar to Julia's other GPU back-ends and, with that, deviate from how kernels are implemented in Metal C (e.g., indexing intrinsics are functions, not arguments; arbitrary aggregate arguments are supported; etc.):

julia> function vadd(a, b, c)
           i = thread_position_in_grid_1d()
           c[i] = a[i] + b[i]
           return
       end
vadd (generic function with 1 method)

julia> a = MtlArray([1,1,1,1]); b = MtlArray([2,2,2,2]); c = similar(a);

julia> @metal threads=2 groups=2 vadd(a, b, c)

julia> Array(c)
4-element Vector{Int64}:
 3
 3
 3
 3
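The example above hard-codes 2 threads in 2 groups for a 4-element array. For arrays of arbitrary length, a common pattern is to pick a thread count per group and derive the number of groups, guarding against out-of-bounds accesses in the kernel. A hedged sketch, again assuming supported hardware (the 256-thread choice is an illustrative default, not a Metal.jl requirement):

```julia
using Metal

function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    if i <= length(c)   # the grid may be larger than the array
        c[i] = a[i] + b[i]
    end
    return
end

a = MtlArray(ones(Float32, 1000))
b = MtlArray(fill(2f0, 1000))
c = similar(a)

n = length(c)
threads = min(n, 256)      # threads per threadgroup
groups = cld(n, threads)   # enough groups to cover all n elements
@metal threads=threads groups=groups vadd(a, b, c)
```

Since `groups * threads` may exceed `n`, the bounds check inside the kernel keeps the extra threads from writing past the end of the array.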

Profiling

This package also supports profiling GPU execution for later visualization with Apple's Xcode tools. The easiest way to generate a GPU report is to use the Metal.@profile macro as seen below. To profile GPU code from a Julia process, you must set the METAL_CAPTURE_ENABLED environment variable before importing Metal.jl. On the first Metal command detected, you should get a message stating "Metal GPU Frame Capture Enabled" if the variable was set correctly:

julia> ENV["METAL_CAPTURE_ENABLED"] = 1
julia> using Metal

julia> function vadd(a, b, c)
           i = thread_position_in_grid_1d()
           c[i] = a[i] + b[i]
           return
       end

julia> a = MtlArray([1]); b = MtlArray([2]); c = similar(a);
... Metal GPU Frame Capture Enabled

julia> Metal.@profile @metal threads=length(c) vadd(a, b, c);
[ Info: GPU frame capture saved to julia_capture_1.gputrace

This will generate a .gputrace folder in the current directory. To view the profile, open the folder with Xcode.

Note: Xcode is a large install, and there are some peculiarities with viewing Julia-created GPU traces. It's recommended to only have one trace open at a time, and the shader profiler may fail to start.

Metal API wrapper

Finally, all of the above functionality is made possible by interfacing with the Metal libraries through ObjectiveC.jl. These low-level API wrappers, along with some slightly higher-level Julia wrappers, are available in the MTL submodule exported by Metal.jl:

julia> dev = MTLDevice(1)
<AGXG13XDevice: 0x14c17f200>
    name = Apple M1 Pro

julia> dev.name
NSString("Apple M1 Pro")

Hacking

Metal.jl relies on a custom LLVM with an AIR back-end, provided as a JLL. Normally, this JLL is built on Yggdrasil. If you need to make changes to the LLVM back-end, have a look at build_llvm.jl in the deps/ folder. This script builds a local version of the LLVM back-end and configures a local preference such that any environment depending on the corresponding JLLs will pick up the modified version (i.e., run julia --project in a clone of Metal.jl).

Acknowledgements

This package builds upon the experience of several Julia contributors to CUDA.jl, AMDGPU.jl and oneAPI.jl.
