TFRT: A New TensorFlow Runtime
TFRT is a new TensorFlow runtime. It aims to provide a unified, extensible infrastructure layer with best-in-class performance across a wide variety of domain-specific hardware. It makes efficient use of multithreaded host CPUs, supports fully asynchronous programming models, and focuses on low-level efficiency.
TFRT will benefit a broad range of users, but it will be of particular interest to you if you are a:
- Researcher looking to experiment with complex new models and add custom operations to TensorFlow
- Application developer looking for improved performance when serving models in production
- Hardware maker looking to plug hardware into TensorFlow, including edge and datacenter devices
...or you are simply curious about cool ML infrastructure and low-level runtime technology!
To learn more about TFRT’s early progress and wins, check out our TensorFlow Dev Summit 2020 presentation, where we provided a performance benchmark for small-batch GPU inference on ResNet-50, and our MLIR Open Design Deep Dive presentation, where we provided a detailed overview of TFRT’s core components, low-level abstractions, and general design principles.
Note: TFRT is an early stage project and is not yet ready for general use.
Getting started
TLDR: This section describes how to set up a development environment for TFRT and how to build and test TFRT components.
TFRT currently supports Ubuntu 16.04. Support for other platforms, including macOS and Windows, is planned. Bazel and clang are required to build and test TFRT. NVIDIA's CUDA Toolkit and cuDNN libraries are required for the GPU backend.
To describe the TFRT build and test workflows, we will build and run the following binaries for graph execution.
Recall from our Dev Summit presentation that for graph execution, a TensorFlow user passes into TFRT a TensorFlow graph created via high-level TensorFlow APIs, and TFRT then calls the MLIR-based graph compiler to optimize and lower the graph into BEF, a Binary Executable Format for TFRT graph execution (MLIR is the compiler infrastructure that we use to represent TFRT host programs). The blue arrows in the simplified TensorFlow training stack diagram below show this flow.
The two binaries introduced next focus on the backend of the graph execution workflow. After the graph compiler has optimized the TensorFlow graph and produced a low-level TFRT Host Program represented in MLIR, tfrt_translate generates a BEF file from that host program and bef_executor runs the BEF file. The progression from TFRT Host Program to bef_executor via tfrt_translate is depicted in the expanded TensorFlow training stack diagram below. Note that the blue arrow between TFRT Host Program and BEF file represents tfrt_translate. Both programs are built in the tools directory.
tfrt_translate
The tfrt_translate program does round-trip translation between MLIR and BEF, similar to an assembler and disassembler.
bef_executor
The bef_executor program is the execution driver of BEF files. It reads in a BEF file, sets up the runtime, and asynchronously executes the function(s) in that file.
Prerequisites
Install Bazel
To build TFRT, you need to install Bazel. TFRT is built and verified with Bazel 4.0. Follow the Bazel installation instructions to install Bazel. Verify the installation with
$ bazel --version
bazel 4.0.0
Install clang
Follow the clang installation instructions to install clang. The automatic installation script, which installs clang, lldb, and lld, is recommended. TFRT is built and verified with clang 11.1.
If you have multiple versions of clang installed, ensure that the right version of clang is the default. On Ubuntu-based systems, you can use update-alternatives to select the default version. The following example commands assume you installed clang-11:
$ sudo update-alternatives --install /usr/bin/clang clang /usr/bin/clang-11 11
$ sudo update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-11 11
Verify the installation with
$ clang --version
clang version 11.1.0
Install libstdc++
TFRT requires libstdc++8 or greater. Check clang's selected version with
$ clang++ -v |& grep "Selected GCC"
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/10
In the example above, the 10 at the end of the path indicates that clang will use libstdc++10, which is compatible with TFRT.
If you need to upgrade, the easiest way is to install gcc-8. Run the following commands to install it:
$ sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
$ sudo apt-get update
$ sudo apt-get install -y gcc-8 g++-8
To verify the installation, re-run the clang++ -v check above.
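The libstdc++ check above can also be scripted. The following is a minimal sketch, assuming the "Selected GCC installation" line ends with the GCC version as in the example output; selected_gcc_major and libstdcxx_ok are hypothetical helper names, not part of TFRT.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of TFRT.

# Reads `clang++ -v` output on stdin and prints the major version found in the
# last path component of the "Selected GCC installation" line.
selected_gcc_major() {
  sed -n 's|^Selected GCC installation: .*/\([0-9][0-9]*\)[^/]*$|\1|p' | head -n 1
}

# Succeeds if the given major version meets TFRT's libstdc++ 8 requirement.
libstdcxx_ok() {
  [ -n "$1" ] && [ "$1" -ge 8 ]
}

# Only run the live check when clang++ is actually installed.
if command -v clang++ >/dev/null 2>&1; then
  major="$(clang++ -v 2>&1 | selected_gcc_major)"
  if libstdcxx_ok "$major"; then
    echo "libstdc++ $major is new enough for TFRT"
  else
    echo "libstdc++ '${major:-unknown}' is too old or was not detected; install gcc-8 or newer"
  fi
fi
```

The parsing is kept separate from the live clang++ call so it can be checked against saved output from another machine.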
GPU prerequisites
Note: You can skip this section if you don't want to build the GPU backend. Remember to exclude //backends/gpu/... from your Bazel target patterns though.
Building and running the GPU backend requires installing additional components.
Install clang Python bindings using pip with
$ pip install libclang
Install NVIDIA's CUDA Toolkit v11.2 (see installation guide for details) in a single directory from NVIDIA’s .run package with
$ wget http://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
$ sudo sh cuda_11.2.2_460.32.03_linux.run --toolkit --installpath=<path>
Register the path to CUDA shared objects with
$ echo '<path>/lib64' | sudo tee /etc/ld.so.conf.d/cuda.conf
$ sudo ldconfig
Install NVIDIA's cuDNN libraries (see installation guide for details) with
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/libcudnn8_8.0.4.30-1+cuda11.1_amd64.deb
$ sudo apt install ./libcudnn8_8.0.4.30-1+cuda11.1_amd64.deb
Note: The above package is intended for CUDA 11.1, but is compatible with CUDA 11.2. TFRT is built and verified with cuDNN 8.1 for CUDA 11.2. Access to that package requires a (free) NVIDIA developer account.
Building and running TFRT
To build TFRT, cd to the root directory of the TFRT workspace (where the WORKSPACE file is located). A set of build configurations is in the .bazelrc file. You can create a user.bazelrc in the repository root with extra Bazel configs that may be useful. Build tfrt_translate and bef_executor with the following commands:
$ bazel build //tools:bef_executor
$ bazel build //tools:tfrt_translate
The above commands build the binaries with opt compilation mode. Check Bazel's documentation for more build options. Bazel reports the output location at the end of a successful build (default is bazel-bin).
After tfrt_translate and bef_executor are built, run an .mlir program with the following command:
$ bazel-bin/tools/tfrt_translate -mlir-to-bef path/to/program.mlir | bazel-bin/tools/bef_executor
TFRT provides a series of .mlir test programs. For example:
$ bazel-bin/tools/tfrt_translate -mlir-to-bef mlir_tests/bef_executor/async.mlir | bazel-bin/tools/bef_executor
Any output will be printed out to the terminal.
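The commands above chain tfrt_translate and bef_executor by hand. A small wrapper can run several .mlir files the same way. This is a minimal sketch: run_mlir and the TFRT_TRANSLATE and BEF_EXECUTOR variables are hypothetical names, not part of TFRT, and the default paths assume the bazel-bin output location.

```shell
#!/usr/bin/env bash
# Hypothetical convenience wrapper, not part of TFRT.
# Binary locations are assumptions; override them if Bazel reports another path.
TFRT_TRANSLATE="${TFRT_TRANSLATE:-bazel-bin/tools/tfrt_translate}"
BEF_EXECUTOR="${BEF_EXECUTOR:-bazel-bin/tools/bef_executor}"

# Translates each given .mlir file to BEF and executes it.
# Returns non-zero if any file fails in either stage.
run_mlir() {
  local file status=0
  for file in "$@"; do
    echo "=== $file"
    # pipefail makes a tfrt_translate failure visible, not only a bef_executor one.
    if ! ( set -o pipefail
           "$TFRT_TRANSLATE" -mlir-to-bef "$file" | "$BEF_EXECUTOR" ); then
      echo "FAILED: $file" >&2
      status=1
    fi
  done
  return "$status"
}
```

After sourcing the script from the repository root, run_mlir mlir_tests/bef_executor/async.mlir would execute the same example as above.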
Adding GPU support
Add --config=cuda to the Bazel command to link the GPU backend to the above targets.
Custom CUDA Toolkit locations can be specified with --repo_env=CUDA_PATH=<path>. The default is /usr/local/cuda.
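If you always build with the GPU backend, these flags can be kept in the user.bazelrc mentioned earlier so that you don't have to repeat them. The following is a minimal sketch, assuming the repository's .bazelrc picks up user.bazelrc from the repository root; write_user_bazelrc is a hypothetical helper name and /opt/cuda-11.2 in the usage note is a placeholder path.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of TFRT: writes a user.bazelrc that enables the
# GPU backend and points Bazel at a custom CUDA Toolkit location.
# Usage: write_user_bazelrc <repo_root> <cuda_path>
write_user_bazelrc() {
  local repo_root="$1" cuda_path="$2"
  cat > "$repo_root/user.bazelrc" <<EOF
# Link the GPU backend (bazel test inherits build options).
build --config=cuda
# CUDA Toolkit location; without this flag the default is /usr/local/cuda.
build --repo_env=CUDA_PATH=$cuda_path
EOF
}
```

For example, write_user_bazelrc . /opt/cuda-11.2 run from the repository root would create the file. Note that with this file in place the CPU-only builds also link the GPU backend, so remove it when you want a CPU-only build.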
Testing
TFRT uses LLVM’s LIT infrastructure and FileCheck utility to construct MLIR-based check tests. These tests verify that a given set of string tags appears in the test’s output. More introduction and guidelines on testing can be found here. An example test is shown below:
// RUN: tfrt_translate -mlir-to-bef %s | bef_executor | FileCheck %s
// RUN: tfrt_opt %s | tfrt_opt
// CHECK-LABEL: --- Running 'basic_tensor'
func @basic_tensor() {
  %c0 = tfrt.new.chain
  %a = dht.create_uninitialized_tensor.i32.2 [3 : i64, 2 : i64]
  %c1 = dht.fill_tensor_with_constant.i32 %a, %c0 0 : i32
  // CHECK: shape = [3, 2], values = [0, 0, 0, 0, 0, 0]
  %c2 = dht.print_tensor %a, %c1
  tfrt.return
}
To run a test, simply invoke bazel test:
$ bazel test //mlir_tests/bef_executor:basics.mlir.test
Most tests under //backends/gpu/... need to be built with --config=cuda so that the GPU backend is linked to the bef_executor:
$ bazel test --config=cuda //backends/gpu/mlir_tests/core_runtime:get_device.mlir.test
Use Bazel target patterns to run multiple tests:
$ bazel test -- //... -//third_party/... -//backends/gpu/... # All CPU tests.
$ bazel test --config=cuda //backends/gpu/... # All GPU tests.
Next Steps
Try our tutorial for some hands-on experience with TFRT.
See host runtime design for more details on TFRT's design.
Repository Overview
The three key directories under the TFRT root directory are
- lib/: Contains core TFRT infrastructure code
- backends/: Contains device specific infrastructure and op/kernel implementations
- include/: Contains public header files for core TFRT infrastructure
| Top level directory | Sub-directory | Description |
| --- | --- | --- |
| include/ | | TFRT infrastructure public headers |
| lib/ | | TFRT infrastructure common for host runtime and all device runtime |
| | basic_kernels/ | Common infrastructure kernels, e.g. control flow kernels |
| | bef_executor/ | BEFFile and BEFExecutor implementation |
| | bef_executor_driver/ | Driver code for running BEFExecutor for an input MLIR file |
| | bef_converter/ | Converter between MLIR and BEF (bef_to_mlir and mlir_to_bef) |
| | core_runtime/ | TFRT Core Runtime infrastructure |
| | distributed_runtime/ | TFRT Distributed Runtime infrastructure |
| | data/ | TFRT infrastructure for TF input pipelines |
| | host_context/ | Host TFRT data structure, e.g. HostContext, AsyncValue, ConcurrentWorkQueue |
| | metrics/ | ML metric integration |
| | support/ | Basic utilities, e.g. hash_util, string_util |
| | tensor/ | Base Tensor class and host tensor implementations |
| | test_kernels/ | Testing kernel implementations |
| | tracing/ | Tracing/profiling support |
| cpp_tests/ | | C++ unit tests |
| mlir_tests/ | | MLIR-based unit tests |
| utils/ | | Miscellaneous utilities, such as scripts for generating test ML models. |
| tools/ | | Binaries including bef_executor, tfrt_translate etc. |
| backends/common/ | | Library shared for different backends, e.g. eigen, dnn_op_utils.h |
| | ops/ | Shared library for op implementations across devices, e.g. metadata functions |
| | compat/eigen/ | Adapter library for eigen, used by multiple backends |
| | utils/ | Miscellaneous utilities, such as scripts for generating MLIR test code. |
| backends/cpu/ | | CPU device infra and CPU ops and kernels |
| | include/ | CPU related public headers |
| | lib/core_runtime/ | CPU core_runtime infra, e.g. cpu_device |
| | lib/ops | CPU ops |
| | lib/kernels | CPU kernels |
| | cpp_tests/ | CPU infra unit tests |
| | mlir_tests/ | CPU mlir based tests |
| backends/gpu/ | | GPU infra and op/kernel implementations. We might split this directory into a separate repository at some point after the interface with the rest of TFRT infra becomes stable. |
| | include/ | GPU related public headers |
| | lib/core_runtime/ | GPU Core runtime infra |
| | lib/memory | GPU memory abstraction |
| | lib/stream | GPU stream abstraction and wrappers |
| | lib/tensor | GPU tensor |
| | lib/ops | GPU ops |
| | lib/kernels | GPU kernels |
| | lib/data | GPU kernels for input pipeline infrastructure |
| | cpp_tests/ | GPU infra unit tests |
| | mlir_tests/ | GPU mlir based tests |
| | tools/ | Miscellaneous utilities |
Contribution guidelines
If you want to contribute to TFRT, be sure to review the contribution guidelines. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code of conduct.
Note: TFRT is currently not open to contributions. TFRT developers are currently developing workflows and continuous integration for accepting contributions. Once we are ready, we will update this page.
Contact
Subscribe to the TFRT mailing list for general discussions about the runtime.
We use GitHub issues to track bugs and feature requests.