MLIR-HLO: A Standalone "HLO" MLIR-based Compiler

The code here exists in two places:

This implements a self-contained compiler for a set of linear algebra operations inspired by XLA HLO IR, using MLIR components. It is designed to provide an end-to-end flow independent of TensorFlow and XLA, but usable inside of these projects.

Coding practices and conventions in this repository follow the MLIR Developer Guide, as part of the intent for this repository to act as an incubator for technology to be upstreamed.

QuickStart: building and testing

These instructions work on Linux; you may have to adjust them for your platform.

To build the code in this repository, you need a clone of the LLVM/MLIR git repository:

$ git clone https://github.com/llvm/llvm-project.git

You need to make sure you have the right commit checked out in the LLVM repository (you need to do this every time you pull from this repo):

$ (cd llvm-project && git checkout $(cat ../build_tools/llvm_version.txt))

We provide a script to configure and build LLVM/MLIR:

$ build_tools/build_mlir.sh ${PWD}/llvm-project/ ${PWD}/llvm-build

Again this is something to do every time you pull from this repository and the LLVM revision changes.
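Because the required LLVM revision changes over time, it can help to verify the checkout before rebuilding. The following Python sketch assumes the layout implied by the commands above (llvm-project cloned inside the mlir-hlo checkout, next to build_tools/) and compares the pinned hash with the output of `git rev-parse HEAD`. The function names are hypothetical and not part of this repository.

```python
import subprocess
from pathlib import Path


def is_pinned_commit(pinned: str, head: str) -> bool:
    """Return True if `head` (e.g. `git rev-parse HEAD` output) starts with `pinned`.

    Surrounding whitespace and trailing newlines are ignored; the prefix match
    also accepts an abbreviated pinned hash.
    """
    pinned, head = pinned.strip(), head.strip()
    return bool(pinned) and head.startswith(pinned)


def llvm_checkout_is_pinned(repo_root: Path = Path(".")) -> bool:
    """Compare build_tools/llvm_version.txt against HEAD of <repo_root>/llvm-project."""
    pinned = (repo_root / "build_tools" / "llvm_version.txt").read_text()
    head = subprocess.run(
        ["git", "-C", str(repo_root / "llvm-project"), "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return is_pinned_commit(pinned, head)
```

Run from the repository root, `llvm_checkout_is_pinned()` returning False signals that the checkout and `build_tools/build_mlir.sh` steps need to be repeated.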

Finally you can build and test this repository:

$ mkdir build && cd build
$ cmake .. -GNinja \
   -DLLVM_ENABLE_LLD=ON \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_ASSERTIONS=On \
   -DMLIR_DIR=${PWD}/../llvm-build/lib/cmake/mlir
$ ninja check-mlir-hlo

Overview

MLIR-HLO aims to provide an end-to-end compiler for CPU and GPU, as well as reusable building blocks for other accelerators. This is heavily inspired by the success of XLA.

XLA (Accelerated Linear Algebra) is a domain-specific compiler framework and execution environment for linear algebra, which powers code-generation for ML frameworks like TensorFlow, JAX, and others.

A cornerstone of XLA is the HLO (High Level Optimizer) IR, which offers a carefully selected, fixed list of operations that are mostly orthogonal to each other. XLA provides an efficient optimizer for computations expressed with this set of operations and generates code for hardware platforms like CPUs, GPUs, and TPUs. Its goal is to provide a uniform interface to compile and execute these optimized HLO programs independently of the targeted device. It is not a front-end ML system like TensorFlow or JAX; rather, it is a backend framework that optimizes HLO and lowers it to machine code.

The HLO operation set is closed and has well-defined semantics. HLO operations operate on immutable tensors with static shapes (bounded shapes, to be exact) and explicit broadcasts.
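To make "explicit broadcasts" concrete: adding a 2x3 tensor to a length-3 vector is not a valid HLO add on its own; the vector must first be broadcast to 2x3. The pure-Python sketch below models only the shape rules, loosely following XLA's BroadcastInDim (each operand dimension is mapped to one result dimension and must either match it or be 1). It is an illustration under those assumptions, not the verifier used by this repository, and it omits any further constraints a real implementation may impose.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def broadcast_in_dim_is_valid(operand: Shape, result: Shape,
                              broadcast_dimensions: Sequence[int]) -> bool:
    """Shape rule for an explicit broadcast: operand dim i maps to result
    dim broadcast_dimensions[i] and must either equal it or be 1."""
    if len(broadcast_dimensions) != len(operand):
        return False
    if len(set(broadcast_dimensions)) != len(broadcast_dimensions):
        return False  # each result dimension may be targeted at most once
    for operand_dim, result_dim in enumerate(broadcast_dimensions):
        if not 0 <= result_dim < len(result):
            return False
        if operand[operand_dim] not in (1, result[result_dim]):
            return False
    return True


def elementwise_add_shape(lhs: Shape, rhs: Shape) -> Shape:
    """HLO-style add: both operands must already have the same static shape."""
    if lhs != rhs:
        raise ValueError(f"shape mismatch {lhs} vs {rhs}; broadcast explicitly first")
    return lhs
```

For the example above, the vector of shape (3,) is broadcast to (2, 3) with `broadcast_dimensions=[1]`, after which the add sees two (2, 3) operands.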

MLIR is a compiler infrastructure that intends to come with "batteries included"; as such, it aims to provide all the building blocks required to assemble graph optimization and codegen pipelines. The longer-term roadmap for MLIR is to provide a Tensor Compute Primitive (TCP) dialect, which should hopefully be general enough to model what HLO represents today (see slides and recording for a technical discussion on this topic).

The work on MLIR-HLO can be seen as a stepping stone towards building TCP, while integrating intermediate components into XLA itself by relying on the well-proven HLO IR and introducing more pieces from upstream MLIR (Linalg, Vector, GPU dialect, ...). This document provides more information on the current migration of the XLA GPU codegen.

MLIR Dialects for XLA-style compilation

This repository defines three dialects to support an HLO-like compilation pipeline using MLIR:

  • chlo: the "client" HLO dialect, intended to be closer to the frontend (including implicit broadcast semantics).
  • mhlo: "meta"-HLO dialect; similar to xla_hlo, but with extensions for dynamic shape support.
  • lmhlo: "late"-"meta"-HLO, the IR after buffer allocation is performed. In XLA, buffer allocation is a side data structure that keeps track of this information, while this separate dialect materializes it in the IR.

We describe these in more detail below.

HLO Client Dialect: chlo.

  • It was originally designed to map the XLA client APIs (e.g., ops support implicit broadcast and are roughly modeled on the XlaBuilder API), modulo support for dynamic shapes and additional ops required to support dynamic client-side HLOs.
  • Ops can come from either the XlaBuilder or XLA helper functions, which can be converted into ops (given the ambiguity in what constitutes these ops, there is some freedom to decide). The goal of this dialect is to stay close to the client level and enable a thin layer between client use and op construction, making ops cheap to construct and keeping optimizations on the dialect close to optimizations on the client ops.
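Implicit broadcasting is the main semantic gap between chlo and mhlo. The sketch below models, in plain Python and for static shapes only, how an implicitly broadcasting add (in the spirit of `chlo.broadcast_add`) can be rewritten into explicit broadcasts followed by a plain `mhlo.add`, using NumPy-style trailing alignment. The output is a list of tuples chosen for illustration; the actual legalization in this repository operates on MLIR IR and may differ, for example when shapes are dynamic.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def implicit_broadcast_shape(lhs: Shape, rhs: Shape) -> Shape:
    """NumPy-style result shape: align trailing dims; each pair must match or contain a 1."""
    rank = max(len(lhs), len(rhs))
    lhs_padded = (1,) * (rank - len(lhs)) + tuple(lhs)
    rhs_padded = (1,) * (rank - len(rhs)) + tuple(rhs)
    result = []
    for a, b in zip(lhs_padded, rhs_padded):
        if a != b and 1 not in (a, b):
            raise ValueError(f"incompatible dimensions {a} and {b}")
        result.append(b if a == 1 else a)
    return tuple(result)


def trailing_broadcast_dimensions(operand_rank: int, result_rank: int) -> List[int]:
    """Map an operand's dims onto the trailing dims of the result."""
    return list(range(result_rank - operand_rank, result_rank))


def legalize_broadcast_add(lhs: Shape, rhs: Shape) -> List[tuple]:
    """Rewrite an implicitly broadcasting add into explicit broadcasts plus a plain add."""
    result = implicit_broadcast_shape(lhs, rhs)
    steps: List[tuple] = []
    names = []
    for name, shape in (("lhs", tuple(lhs)), ("rhs", tuple(rhs))):
        if shape == result:
            names.append(name)  # already the result shape, no broadcast needed
            continue
        dims = trailing_broadcast_dimensions(len(shape), len(result))
        steps.append(("mhlo.broadcast_in_dim", name, result, dims))
        names.append(f"{name}_bcast")
    steps.append(("mhlo.add", names[0], names[1], result))
    return steps
```

For example, `legalize_broadcast_add((2, 3), (3,))` broadcasts only the right-hand operand, with broadcast dimensions `[1]`, before the add.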

Entry:

  • The vast majority of old "client" interactions are via the XlaBuilder APIs. These APIs are used by TF2XLA kernels, JAX, the PyTorch bridge, and directly. The legalization path (described below) can also reuse the XlaBuilder's APIs to construct XLA Client HLO ops directly (this uses MlirXlaBuilder, which is a subclass of XlaBuilder).
  • The other entry point is during legalization from TensorFlow ops in the TF Graph Compiler and other tools (e.g., SavedModel lowering and TFCompile).

Exit:

  • MHLO
  • May be exported to xla::HloInstructionProto by invoking the XlaBuilder APIs (with regular XlaBuilder)

The chlo dialect originally started as a mapping to the XLA client Builder APIs. It can both be constructed and converted back to existing XLA interfaces using the XlaBuilder API. Due to the way that translation into and out of the dialect works, there is no expectation that this dialect roundtrips to XLA (i.e., it is only intended to be translated to MLIR and then legalized to another dialect or translated to HloInstructionProto).

The export approach of reusing the XlaBuilders enables reusing a lot of logic that was already implemented in terms of computing shapes, inserting broadcasts etc.

An important point here is that XLA Client HLO ops are not a well-defined set: what some would consider helper functions, others would consider ops. It should be easy to move between the two, either by defining a new op along with its helper function or by autogenerating the helper functions from the op descriptions. For the former, a simple approach is to consider the context in which the op is being constructed and, if it is an MLIR context, construct an op in the client dialect instead of making further calls into XlaBuilder. The latter could be implemented by adding the op together with a legalization of the op to other known ops, from which a helper function can be generated and used as usual.
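The two directions described above can be modeled with a toy builder. In this Python sketch, registering an op together with its legalization to known ops produces a helper function automatically; the helper keeps the op whole when the builder is in an MLIR context and otherwise expands it, as a hand-written helper would. `Builder`, `register_op`, `LEGALIZATIONS`, and the `client.square` op are hypothetical names for illustration, not the XlaBuilder or MlirXlaBuilder API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Builder:
    """Hypothetical builder that records emitted ops as (name, operands, result)."""
    is_mlir_context: bool
    ops: List[tuple] = field(default_factory=list)

    def emit(self, name: str, *operands: str) -> str:
        result = f"%{len(self.ops)}"
        self.ops.append((name, operands, result))
        return result


# Hypothetical registry: op name -> legalization into already-known ops.
LEGALIZATIONS: Dict[str, Callable[..., str]] = {}


def register_op(name: str):
    """Register an op's legalization and return an autogenerated helper function."""
    def decorator(legalize: Callable[..., str]) -> Callable[..., str]:
        LEGALIZATIONS[name] = legalize

        def helper(builder: Builder, *operands: str) -> str:
            if builder.is_mlir_context:
                # MLIR context: construct the op itself in the client dialect.
                return builder.emit(name, *operands)
            # Otherwise behave like a regular helper and expand into known ops.
            return legalize(builder, *operands)
        return helper
    return decorator


@register_op("client.square")
def square(builder: Builder, x: str) -> str:
    # Legalization of the hypothetical square op in terms of a known mul op.
    return builder.emit("mul", x, x)
```

Calling `square` on a builder with `is_mlir_context=True` records a single `client.square` op; on any other builder it records the expanded `mul`.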

Status: Exists but needs to be cleaned up.

Meta HLO Dialect mhlo

  • Dialect is closer to current HLO server ops (e.g., no implicit broadcast)
  • In the MHLO dialect we can deviate from the requirements of the client or server dialect, in particular:
    • Control flow ops with implicit capture to enable simpler optimizations (e.g., generic LICM, unroll & jam, etc.)
    • Ops with multiple results (e.g., no tuples)
    • More ops (for example, unique op or assert op), including ops that don't need to be added to either the client or server dialect.
    • An op set not constrained by implementation (e.g., hlo.add operating on, say, i79 or !mydialect.weird_type is allowed even though no XLA backend supports it). Type verification happens at the boundaries.
    • No need to preserve some deprecated XLA constructs (e.g., stateful RNG HLO).
    • More dynamic shape support ops without the need to update all users/backends.
  • This dialect enables evolving HLO independently from XLA in order to experiment with features we'd like to upstream in MLIR TCP. In particular, it intends to be user-extensible through interfaces.
  • It should have no TensorFlow, proto, or other Google-internal dependencies.
  • It need not be a complete superset of the ops in the XLA HLO dialect.
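The point that type verification happens at the boundaries can be sketched as follows: MHLO itself accepts any element type, and only the step that hands a program to a concrete backend rejects what that backend cannot handle. The supported-type set and the flat (op name, element type) representation below are hypothetical simplifications; a real check would walk MLIR IR.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical element types accepted by one particular backend.
BACKEND_SUPPORTED_TYPES: FrozenSet[str] = frozenset(
    {"i1", "i8", "i32", "i64", "f16", "bf16", "f32", "f64"}
)


def unsupported_ops(
    ops: Iterable[Tuple[str, str]],
    supported: FrozenSet[str] = BACKEND_SUPPORTED_TYPES,
) -> List[Tuple[str, str]]:
    """Return the (op name, element type) pairs that the backend cannot lower.

    Intended to run only when a program leaves MHLO for a specific backend;
    within MHLO, any element type is accepted.
    """
    return [(name, elem_type) for name, elem_type in ops if elem_type not in supported]
```

With this split, an op such as `mhlo.add` on `i79` stays legal during MHLO-level optimization and is only reported when the program is handed to a backend.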

Entry:

  • Legalization from the chlo dialect or conversion from XLA HLO.
  • Direct emission from the TF Graph Compiler.
  • Builder calls (e.g., EDSL).

Exit:

  • LMHLO, Linalg, IREE, or direct use in codegen.
  • XLA HLO.

The MHLO dialect has no direct export format; it is only meant as an intermediate optimization dialect/format. It is also where we can cheaply experiment with new ops. This format is where the representation may differ from existing endpoints.

Status: Exists but needs to be cleaned up and evolved, in particular with respect to supporting dynamic shapes.

MHLO differs from the XLA HLO op set in multiple ways, including:

  1. MHLO While accepts multiple operands and may produce multiple results, instead of a single (typically tuple-typed) operand and result.
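The difference in calling convention can be illustrated with two plain-Python loop functions. The first mirrors the MHLO form, where each loop-carried value is a separate operand and result; the second mirrors the XLA HLO form, where several values must be packed into, and unpacked from, one tuple. These are semantic sketches with hypothetical names, not bindings to either op.

```python
from typing import Callable, Tuple


def mhlo_style_while(cond: Callable[..., bool],
                     body: Callable[..., Tuple],
                     *operands) -> Tuple:
    """Each loop-carried value is its own operand and its own result."""
    values = tuple(operands)
    while cond(*values):
        values = tuple(body(*values))
    return values


def xla_style_while(cond: Callable[[Tuple], bool],
                    body: Callable[[Tuple], Tuple],
                    init: Tuple) -> Tuple:
    """A single operand and result: several loop-carried values travel as one tuple."""
    state = init
    while cond(state):
        state = body(state)
    return state


# Factorial of 5 with two loop-carried values: the counter i and the accumulator acc.
mhlo_result = mhlo_style_while(lambda i, acc: i <= 5,
                               lambda i, acc: (i + 1, acc * i),
                               1, 1)
xla_result = xla_style_while(lambda s: s[0] <= 5,
                             lambda s: (s[0] + 1, s[1] * s[0]),
                             (1, 1))
```

Both loops compute the same values; the MHLO form simply avoids the tuple packing and indexing that the single-operand form requires in `cond` and `body`.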

LMHLO

LMHLO corresponds to late mhlo and operates on the buffer domain (e.g., memref) with side-effecting operations. The lowering from the mhlo dialect proceeds by way of scheduling, memory allocation, and buffer allocation. The current mapping is directly on XLA Client HLOs, but without implicit broadcast and operating on memrefs. This dialect will instead be rebased on the mhlo dialect while still operating on buffers.
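The shift from mhlo to lmhlo is a change from value semantics to buffer semantics. In this Python sketch, lists stand in for both tensors and memrefs; the function names are hypothetical and only mirror the shape of the two forms.

```python
from typing import List


def mhlo_add(lhs: List[float], rhs: List[float]) -> List[float]:
    """Tensor (value) semantics: inputs are never modified; a new result is returned."""
    if len(lhs) != len(rhs):
        raise ValueError("operand sizes must match")
    return [a + b for a, b in zip(lhs, rhs)]


def lmhlo_add(lhs: List[float], rhs: List[float], out: List[float]) -> None:
    """Buffer semantics: the result buffer is allocated beforehand and written in place."""
    if not len(lhs) == len(rhs) == len(out):
        raise ValueError("buffer sizes must match")
    for i in range(len(out)):
        out[i] = lhs[i] + rhs[i]
```

Because allocation is explicit in the buffer form, a buffer assignment step can decide that the output reuses an input that is dead afterwards, as in `lmhlo_add(a, b, a)`; in the value form, such reuse is not visible in the IR.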

Entry:

  • Post buffer assignment on mhlo dialect, or from XLA after buffer assignment.

Exit:

  • Codegen (LLVM IR in the common cases at the moment)

End-to-End pipeline

TODO

Alternative build setups

Building Python API

Building the MHLO Python API requires building as an LLVM external project. The below instructions presume that you have this mlir-hlo repo and an llvm-project repo checked out side by side.

Note that the Python package produced by this procedure includes the mlir package and is not suitable for deployment as-is (but it can be included in a larger aggregate).

# LLVM_SRC_DIR and MLIR_HLO_SRC_DIR point to the llvm-project and mlir-hlo checkouts.
mkdir build && cd build
cmake -GNinja -B. ${LLVM_SRC_DIR}/llvm \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_ENABLE_PROJECTS=mlir \
    -DLLVM_EXTERNAL_PROJECTS=mlir_hlo \
    -DLLVM_EXTERNAL_MLIR_HLO_SOURCE_DIR=${MLIR_HLO_SRC_DIR} \
    -DLLVM_TARGETS_TO_BUILD=host \
    -DPython3_EXECUTABLE=$(which python) \
    -DMLIR_ENABLE_BINDINGS_PYTHON=ON \
    -DMHLO_ENABLE_BINDINGS_PYTHON=ON

ninja MLIRHLOPythonModules
export PYTHONPATH=$PWD/tools/mlir_hlo/python_packages/mlir_hlo
python -c "import mlir.dialects.mhlo"

External projects that depend on mlir-hlo

External projects that need to depend on mlir-hlo (for example, via a git submodule) can use the following setting in their CMake configuration so that find_package(MHLO) imports all mlir-hlo CMake targets into their build setup and gives access to the required include and lib variables (see the generated MHLOConfig.cmake).

...
   -DMHLO_DIR=<path to mlir-hlo build dir>/lib/cmake/mlir-hlo
   ...
