• Stars
    star
    1,184
  • Rank 39,488 (Top 0.8 %)
  • Language
    Java
  • License
    Apache License 2.0
  • Created about 6 years ago
  • Updated about 1 month ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

TornadoVM: A practical and efficient heterogeneous programming framework for managed languages

TornadoVM

TornadoVM is a plug-in to OpenJDK and GraalVM that allows programmers to automatically run Java programs on heterogeneous hardware. TornadoVM targets OpenCL, PTX and SPIR-V compatible devices which include multi-core CPUs, dedicated GPUs (Intel, NVIDIA, AMD), integrated GPUs (Intel HD Graphics and ARM Mali), and FPGAs (Intel and Xilinx).

TornadoVM has three backends that generate OpenCL C, NVIDIA CUDA PTX assembly, and SPIR-V binary. Developers can choose which backends to install and run.


Website: tornadovm.org

For a quick introduction please read the following FAQ.

Latest Release: TornadoVM 0.15.2 - 26/07/2023 : See CHANGELOG.


1. Installation

In Linux and Mac OSx, TornadoVM can be installed automatically with the installation script. For example:

$ ./scripts/tornadovm-installer 
usage: tornadovm-installer [-h] [--version] [--jdk JDK] [--backend BACKEND] [--listJDKs] [--javaHome JAVAHOME]

TornadoVM Installer Tool. It will install all software dependencies except the GPU/FPGA drivers

optional arguments:
  -h, --help           show this help message and exit
  --version            Print version of TornadoVM
  --jdk JDK            Select one of the supported JDKs. Use --listJDKs option to see all supported ones.
  --backend BACKEND    Select the backend to install: { opencl, ptx, spirv }
  --listJDKs           List all JDK supported versions
  --javaHome JAVAHOME  Use a JDK from a user directory

NOTE Select the desired backend:

  • opencl: Enables the OpenCL backend (requires OpenCL drivers)
  • ptx: Enables the PTX backend (requires NVIDIA CUDA drivers)
  • spirv: Enables the SPIRV backend (requires Intel Level Zero drivers)

Example of installation:

# Install the OpenCL backend with OpenJDK 17
$ ./scripts/tornadovm-installer --jdk jdk17 --backend opencl 

# It is also possible to combine different backends:
$ ./scripts/tornadovm-installer -- jdk jdk17 --backend opencl,spirv,ptx

Alternatively, TornadoVM can be installed either manually from source or by using Docker.

If you are planning to use Docker with TornadoVM on GPUs, you can also follow these guidelines.

You can also run TornadoVM on Amazon AWS CPUs, GPUs, and FPGAs following the instructions here.

2. Usage Instructions

TornadoVM is currently being used to accelerate machine learning and deep learning applications, computer vision, physics simulations, financial applications, computational photography, and signal processing.

Featured use-cases:

  • kfusion-tornadovm: Java application for accelerating a computer-vision application using the Tornado-APIs to run on discrete and integrated GPUs.
  • Java Ray-Tracer: Java application accelerated with TornadoVM for real-time ray-tracing.

We also have a set of examples that includes NBody, DFT, KMeans computation and matrix computations.

Additional Information

3. Programming Model

TornadoVM exposes to the programmer task-level, data-level and pipeline-level parallelism via a light Application Programming Interface (API). In addition, TornadoVM uses single-source property, in which the code to be accelerated and the host code live in the same Java program.

Compute-kernels in TornadoVM can be programmed using two different approaches (APIs):

a) Loop Parallel API

Compute kernels are written in a sequential form (tasks programmed for a single thread execution). To express parallelism, TornadoVM exposes two annotations that can be used in loops and parameters: a) @Parallel for annotating parallel loops; and b) @Reduce for annotating parameters used in reductions.

The following code snippet shows a full example to accelerate Matrix-Multiplication using TornadoVM and the loop-parallel API:

public class Compute {
    private static void mxmLoop(Matrix2DFloat A, Matrix2DFloat B, Matrix2DFloat C, final int size) {
        for (@Parallel int i = 0; i < size; i++) {
            for (@Parallel int j = 0; j < size; j++) {
                float sum = 0.0f;
                for (int k = 0; k < size; k++) {
                    sum += A.get(i, k) * B.get(k, j);
                }
                C.set(i, j, sum);
            }
        }
    }

    public void run(Matrix2DFloat A, Matrix2DFloat B, Matrix2DFloat C, final int size) {
        TaskGraph taskGraph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, A, B) // Transfer data from host to device only in the first execution
                .task("t0", Compute::mxmLoop, A, B, C, size)              // Each task points to an existing Java method
                .transferToHost(DataTransferMode.EVERY_EXECUTION, C);     // Transfer data from device to host
        // Create an immutable task-graph
        ImmutableTaskGraph immutableTaskGraph = taskGraph.snaphot();

        // Create an execution plan from an immutable task-graph
        TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph);

        // Execute the execution plan
        TorandoExecutionResult executionResult = executionPlan.execute();
    }
}

b) Kernel API

Another way to express compute-kernels in TornadoVM is via the kernel API. To do so, TornadoVM exposes a KernelContext with which the application can directly access the thread-id, allocate memory in local memory (shared memory on NVIDIA devices), and insert barriers. This model is similar to programming compute-kernels in OpenCL and CUDA. Therefore, this API is more suitable for GPU/FPGA expert programmers that want more control or want to port existing CUDA/OpenCL compute kernels into TornadoVM.

The following code-snippet shows the Matrix Multiplication example using the kernel-parallel API:

public class Compute {
    private static void mxmKernel(KernelContext context, Matrix2DFloat A, Matrix2DFloat B, Matrix2DFloat C, final int size) {
        int idx = context.threadIdx;
        int jdx = context.threadIdy;
        float sum = 0;
        for (int k = 0; k < size; k++) {
            sum += A.get(idx, k) * B.get(k, jdx);
        }
        C.set(idx, jdx, sum);
    }

    public void run(Matrix2DFloat A, Matrix2DFloat B, Matrix2DFloat C, final int size) {
        // When using the kernel-parallel API, we need to create a Grid and a Worker

        WorkerGrid workerGrid = new WorkerGrid2D(size, size);    // Create a 2D Worker
        GridScheduler gridScheduler = new GridScheduler("s0.t0", workerGrid);  // Attach the worker to the Grid
        KernelContext context = new KernelContext();             // Create a context
        workerGrid.setLocalWork(32, 32, 1);                      // Set the local-group size

        TaskGraph taskGraph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, A, B) // Transfer data from host to device only in the first execution
                .task("t0", Compute::mxmKernel, context, A, B, C, size)   // Each task points to an existing Java method
                .transferToHost(DataTransferMode.EVERY_EXECUTION, C);     // Transfer data from device to host

        // Create an immutable task-graph
        ImmutableTaskGraph immutableTaskGraph = taskGraph.snaphot();

        // Create an execution plan from an immutable task-graph
        TornadoExecutionPlan executionPlan = new TornadoExecutionPlan(immutableTaskGraph);

        // Execute the execution plan
        executionPlan.withGridScheduler(gridScheduler)
                     .execute();
    }
}

Additionally, the two modes of expressing parallelism (kernel and loop parallelization) can be combined in the same task graph object.

4. Dynamic Reconfiguration

Dynamic reconfiguration is the ability of TornadoVM to perform live task migration between devices, which means that TornadoVM decides where to execute the code to increase performance (if possible). In other words, TornadoVM switches devices if it can detect that a specific device can yield better performance (compared to another).

With the task-migration, the TornadoVM's approach is to only switch device if it detects an application can be executed faster than the CPU execution using the code compiled by C2 or Graal-JIT, otherwise it will stay on the CPU. So TornadoVM can be seen as a complement to C2 and Graal JIT compilers. This is because there is no single hardware to best execute all workloads efficiently. GPUs are very good at exploiting SIMD applications, and FPGAs are very good at exploiting pipeline applications. If your applications follow those models, TornadoVM will likely select heterogeneous hardware. Otherwise, it will stay on the CPU using the default compilers (C2 or Graal).

To use the dynamic reconfiguration, you can execute using TornadoVM policies. For example:

// TornadoVM will execute the code in the best accelerator.
executionPlan.withDynamicReconfiguration(Policy.PERFORMANCE, DRMode.PARALLEL)
             .execute();

Further details and instructions on how to enable this feature can be found here.

5. How to Use TornadoVM in your Projects?

To use TornadoVM, you need two components:

a) The TornadoVM jar file with the API. The API is licensed as GPLV2 with Classpath Exception. b) The core libraries of TornadoVM along with the dynamic library for the driver code (.so files for OpenCL, PTX and/or SPIRV/Level Zero).

You can import the TornadoVM API by setting this the following dependency in the Maven pom.xml file:

<repositories>
    <repository>
        <id>universityOfManchester-graal</id>
        <url>https://raw.githubusercontent.com/beehive-lab/tornado/maven-tornadovm</url>
    </repository>
</repositories>

<dependencies>
<dependency>
    <groupId>tornado</groupId>
    <artifactId>tornado-api</artifactId>
    <version>0.15.2</version>
</dependency>
<dependency>
    <groupId>tornado</groupId>
    <artifactId>tornado-matrices</artifactId>
    <version>0.15.2</version>
</dependency>
</dependencies>

To run TornadoVM, you need to either install the TornadoVM extension for GraalVM/OpenJDK, or run with our Docker images.

6. Additional Resources

Here you can find videos, presentations, tech-articles and artefacts describing TornadoVM, and how to use it.

7. Academic Publications

If you are using TornadoVM >= 0.2 (which includes the Dynamic Reconfiguration, the initial FPGA support and CPU/GPU reductions), please use the following citation:

@inproceedings{Fumero:DARHH:VEE:2019,
 author = {Fumero, Juan and Papadimitriou, Michail and Zakkak, Foivos S. and Xekalaki, Maria and Clarkson, James and Kotselidis, Christos},
 title = {{Dynamic Application Reconfiguration on Heterogeneous Hardware.}},
 booktitle = {Proceedings of the 15th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments},
 series = {VEE '19},
 year = {2019},
 doi = {10.1145/3313808.3313819},
 publisher = {Association for Computing Machinery}
}

If you are using Tornado 0.1 (Initial release), please use the following citation in your work.

@inproceedings{Clarkson:2018:EHH:3237009.3237016,
 author = {Clarkson, James and Fumero, Juan and Papadimitriou, Michail and Zakkak, Foivos S. and Xekalaki, Maria and Kotselidis, Christos and Luj\'{a}n, Mikel},
 title = {{Exploiting High-performance Heterogeneous Hardware for Java Programs Using Graal}},
 booktitle = {Proceedings of the 15th International Conference on Managed Languages \& Runtimes},
 series = {ManLang '18},
 year = {2018},
 isbn = {978-1-4503-6424-9},
 location = {Linz, Austria},
 pages = {4:1--4:13},
 articleno = {4},
 numpages = {13},
 url = {http://doi.acm.org/10.1145/3237009.3237016},
 doi = {10.1145/3237009.3237016},
 acmid = {3237016},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {Java, graal, heterogeneous hardware, openCL, virtual machine},
}

Selected publications can be found here.

8. Acknowledgments

This work is partially funded by Intel corporation. In addition, it has been supported by the following EU & UKRI grants (most recent first):

Furthermore, TornadoVM has been supported by the following EPSRC grants:

9. Contributions and Collaborations

We welcome collaborations! Please see how to contribute to the project in the CONTRIBUTING page.

Write your questions and proposals:

Additionally, you can open new proposals on the GitHub discussions page:https://github.com/beehive-lab/TornadoVM/discussions.

Alternatively, you can share a Google document with us.

Collaborations:

For Academic & Industry collaborations, please contact here.

10. TornadoVM Team

Visit our website to meet the team.

11. Licenses

To use TornadoVM, you can link the TornadoVM API to your application which is under the CLASSPATH Exception of GPLv2.0.

Each TornadoVM module is licensed as follows:

Module License
Tornado-API License: GPL v2 + CLASSPATH Exception
Tornado-Runtime License: GPL v2
Tornado-Assembly License: GPL v2
Tornado-Drivers License: GPL v2
Tornado-Drivers-OpenCL-Headers License
Tornado-scripts License: GPL v2
Tornado-Annotation License
Tornado-Unittests License
Tornado-Benchmarks License
Tornado-Examples License
Tornado-Matrices License
JNI Libraries (OpenCL, PTX and LevelZero) License

More Repositories

1

Maxine-VM

Maxine VM: A meta-circular research VM
Java
323
star
2

mambo

A low-overhead dynamic binary instrumentation and modification tool for ARM (both AArch32 and AArch64 support) and RISC-V (RV64GC).
C
320
star
3

FastPath_MP

FastPath_MP: An FPGA-based multi-path architecture for direct access from FPGA to NVMe SSD
C
31
star
4

docker-tornadovm

Docker build scripts for TornadoVM on GPUs: https://github.com/beehive-lab/TornadoVM
Shell
28
star
5

kfusion-tornadovm

🎥 A Java implementation of Kinect Fusion running on Tornado VM.
Java
22
star
6

TornadoQSim

High-performance and modular quantum simulator in Java
Java
16
star
7

beehive-spirv-toolkit

Prototype for a SPIR-V assembler and dissasembler. It provides a composable Java interface for generating SPIR-V code at runtime.
Java
14
star
8

levelzero-jni

Intel LevelZero JNI library for TornadoVM
Java
12
star
9

tornado-insight

TornadoInsight: Unleashing the Power of TornadoVM in IntelliJ IDEA
Java
11
star
10

ProtonVM

Parallel Bytecode Interpreter For Heterogeneous Hardware
C++
9
star
11

llvm-bcj

A Java library that decodes LLVM bitcode files
Java
7
star
12

Maxine-Dockerfile

A Dockerfile to build a docker image that is capable of compiling and running the Maxine VM
Dockerfile
4
star
13

tornadovm-installer

Easy Installer for TornadoVM
Shell
3
star
14

map-reduce

MR4J is a MapReduce framework class building on the ForkJoinPool to manage the scheduling of tasks in the different phases of execution. An optimiser is available to reduce the overhead associated with the intermediate data without extending the API.
Java
3
star
15

mambo-vm

Scripts to build an AArch64 QEMU virtual machine image for evaluating MAMBO
Shell
2
star
16

DFLOW

A framework for rapidly prototyping data DFLOW proof-of-concepts
C++
2
star
17

pie

Instruction encoder / decoder generator
Ruby
2
star
18

riscv-callsites-benchmark

A microbenchmark generating inline callsites for estimating the overhead of different implementations on RISCV
C
1
star
19

tornadovm-dashboard

TornadoVM Dashboard for Performance Tracking
Python
1
star
20

Maxine-Graal

C++
1
star
21

Maxine-Jenkins

Python
1
star