Language: Python
License: Apache License 2.0

Data compression in TensorFlow

TensorFlow Compression

TensorFlow Compression (TFC) contains data compression tools for TensorFlow.

You can use this library to build your own ML models with end-to-end optimized data compression built in. It's useful to find storage-efficient representations of your data (images, features, examples, etc.) while only sacrificing a small fraction of model performance. Take a look at the lossy data compression tutorial or the model compression tutorial to get started.

For a more in-depth introduction from a classical data compression perspective, consider our paper on nonlinear transform coding, or watch @jonycgn's talk on learned image compression. For an introduction to lossy data compression from a machine learning perspective, take a look at @yiboyang's review paper.

The library contains (see the API docs for details):

  • Range coding (a.k.a. arithmetic coding) implementations in the form of flexible TF ops written in C++. These include an optional "overflow" functionality that embeds an Elias gamma code into the range encoded bit sequence, making it possible to encode alphabets containing the entire set of signed integers rather than just a finite range.

  • Entropy model classes that simplify the process of designing rate-distortion optimized codes. During training, they act like likelihood models. Once training is completed, they encode floating point tensors into optimized bit sequences by automating the design of range coding tables and calling the range coder implementation behind the scenes.

  • Additional TensorFlow functions and Keras layers that are useful in the context of learned data compression, such as methods to numerically find quantiles of density functions, take expectations with respect to dithering noise, convolution layers with more flexible padding options and support for reparameterizing kernels and biases in the Fourier domain, and an implementation of generalized divisive normalization (GDN).
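Two of the ideas in the list above can be illustrated without TensorFlow. The following is a minimal, stdlib-only sketch, not TFC's implementation: it shows an Elias gamma code for positive integers, extended to all signed integers with a zigzag-style mapping (that mapping is an assumption chosen for illustration; TFC's "overflow" mechanism may map values differently), and the commonly used simplified GDN formula y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2) applied to a single vector of channel values. All function names are hypothetical.

```python
import math


def elias_gamma_encode(n: int) -> str:
    """Elias gamma code of a positive integer, as a string of '0'/'1' bits."""
    if n < 1:
        raise ValueError("Elias gamma codes only positive integers")
    binary = bin(n)[2:]                      # e.g. 5 -> "101"
    return "0" * (len(binary) - 1) + binary  # unary length prefix, then the bits


def elias_gamma_decode(bits: str, pos: int = 0) -> tuple:
    """Decode one codeword starting at `pos`; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros
    end = start + zeros + 1
    return int(bits[start:end], 2), end


def to_positive(k: int) -> int:
    """Map signed to positive integers: 0, -1, 1, -2, 2, ... -> 1, 2, 3, 4, 5, ..."""
    return 2 * k + 1 if k >= 0 else -2 * k


def from_positive(m: int) -> int:
    """Inverse of to_positive."""
    return (m - 1) // 2 if m % 2 == 1 else -(m // 2)


def gdn(x, beta, gamma):
    """Simplified GDN on one vector: y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j**2)."""
    n = len(x)
    return [
        x[i] / math.sqrt(beta[i] + sum(gamma[i][j] * x[j] ** 2 for j in range(n)))
        for i in range(n)
    ]


# Usage: encode a few signed integers into one bit string, then decode them again.
values = [0, -3, 7, 1000]
bits = "".join(elias_gamma_encode(to_positive(v)) for v in values)
decoded, pos = [], 0
while pos < len(bits):
    m, pos = elias_gamma_decode(bits, pos)
    decoded.append(from_positive(m))
print(decoded)  # [0, -3, 7, 1000]
```

The code length grows only logarithmically with the magnitude of the value, which is why such a code is a reasonable fallback for the rare values that fall outside a finite coding range.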

Documentation & getting help

Refer to the API documentation for a complete description of the classes and functions this package implements.

Please post all questions or comments on Discussions. Only file Issues for actual bugs or feature requests. On Discussions, you may get a faster answer, and you help other people find the question or answer more easily later.

Installation

Note: Precompiled packages are currently only provided for Linux and Darwin/macOS. To use these packages on Windows, consider installing TensorFlow using the instructions for WSL2 or using a TensorFlow Docker image, and then installing the Linux package.

Set up an environment in which you can install precompiled binary Python packages using the pip command. Refer to the TensorFlow installation instructions for more information on how to set up such a Python environment.

The current version of TensorFlow Compression requires TensorFlow 2. For versions compatible with TensorFlow 1, see our previous releases.

pip

To install TFC via pip, run the following command:

pip install tensorflow-compression

To test that the installation works correctly, you can run the unit tests with:

python -m tensorflow_compression.all_tests

Once the command finishes, you should see a message OK (skipped=29) or similar in the last line.

Colab

You can try out TFC live in a Colab. The following command installs the latest version of TFC that is compatible with the installed TensorFlow version. Run it in a cell before executing your Python code:

!pip install tensorflow-compression~=$(pip show tensorflow | perl -p -0777 -e 's/.*Version: (\d+\.\d+).*/\1.0/sg')

Note: The binary packages of TFC are tied to TF with the same minor version (e.g., TFC 2.9.1 requires TF 2.9.x), and Colab sometimes lags a few days behind in deploying the latest version of TensorFlow. As a result, naively running pip install tensorflow-compression might attempt to upgrade TF, which can cause problems.
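The perl one-liner in the Colab command turns the installed TF version (e.g. 2.9.2) into a compatible-release pin (~=2.9.0). A minimal Python sketch of the same idea, assuming Python 3.8+ for importlib.metadata; the helper name tfc_requirement is hypothetical:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def tfc_requirement(tf_version: str) -> str:
    """Build a pip requirement pinning TFC to the given TF major.minor release."""
    match = re.match(r"(\d+)\.(\d+)", tf_version)
    if match is None:
        raise ValueError(f"Unrecognized TensorFlow version: {tf_version!r}")
    major, minor = match.groups()
    # "~=X.Y.0" accepts any X.Y.z patch release, but not X.(Y+1).
    return f"tensorflow-compression~={major}.{minor}.0"


if __name__ == "__main__":
    try:
        print(tfc_requirement(version("tensorflow")))
    except PackageNotFoundError:
        print("TensorFlow is not installed in this environment.")
```

The printed requirement string can then be passed to pip install, so that pip does not try to replace the TF version that is already installed.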

Docker

To use a Docker container (e.g. on Windows), be sure to install Docker (e.g., Docker Desktop), use a TensorFlow Docker image, and then run the pip install command inside the Docker container, not on the host. For instance, you can use a command line like this:

docker run tensorflow/tensorflow:latest bash -c \
    "pip install tensorflow-compression &&
     python -m tensorflow_compression.all_tests"

This will fetch the TensorFlow Docker image if it's not already cached, install the pip package and then run the unit tests to confirm that it works.

Anaconda

It seems that Anaconda ships its own binary version of TensorFlow which is incompatible with our pip package. To solve this, always install TensorFlow via pip rather than conda. For example, this creates an Anaconda environment with CUDA libraries, and then installs TensorFlow and TensorFlow Compression:

conda create --name ENV_NAME python cudatoolkit cudnn
conda activate ENV_NAME
pip install tensorflow-compression

Depending on the requirements of the tensorflow pip package, you may need to pin the CUDA libraries to specific versions. If you aren't using a GPU, CUDA is of course not necessary.

Usage

We recommend importing the library from your Python code as follows:

import tensorflow as tf
import tensorflow_compression as tfc

Using a pre-trained model to compress an image

In the models directory, you'll find a Python script, tfci.py. Download the file and run:

python tfci.py -h

This will give you a list of options. Briefly, the command

python tfci.py compress <model> <PNG file>

will compress an image using a pre-trained model and write a file ending in .tfci. Run python tfci.py models to get a list of supported pre-trained models. The command

python tfci.py decompress <TFCI file>

will decompress a TFCI file and write a PNG file. By default, an output file will be named like the input file, only with the appropriate file extension appended (any existing extensions will not be removed).
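The default naming rule can be summarized in a few lines. This is a hypothetical helper that only restates the documented behavior (the new extension is appended and existing extensions are kept); it is not code taken from tfci.py:

```python
# Hypothetical helper restating the documented default naming of tfci.py outputs:
# the new extension is appended, and the input's extension is kept.
OUTPUT_EXTENSIONS = {"compress": ".tfci", "decompress": ".png"}


def default_output_path(input_path: str, command: str) -> str:
    """Return the default output file name for a tfci.py command."""
    try:
        return input_path + OUTPUT_EXTENSIONS[command]
    except KeyError:
        raise ValueError(f"Unknown command: {command!r}") from None


print(default_output_path("photo.png", "compress"))         # photo.png.tfci
print(default_output_path("photo.png.tfci", "decompress"))  # photo.png.tfci.png
```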

Training your own model

The models directory contains several implementations of published image compression models to enable easy experimentation. Note that in order to reproduce published results, more tuning of the code and training dataset may be necessary. Use the tfci.py script above to access published models.

The following instructions talk about a re-implementation of the model published in:

"End-to-end optimized image compression"
J. Ballé, V. Laparra, E. P. Simoncelli
https://arxiv.org/abs/1611.01704

Note that the models directory is not contained in the pip package. The models are meant to be downloaded individually. Download the file bls2017.py and run:

python bls2017.py -h

This will list the available command line options for the implementation. Training can be as simple as the following command:

python bls2017.py -V train

This will use the default settings. Note that unless a custom training dataset is provided via --train_glob, the CLIC dataset will be downloaded using TensorFlow Datasets.

The most important training parameter is --lambda, which controls the trade-off between bitrate and distortion that the model will be optimized for. The number of channels per layer is important, too: models tuned for higher bitrates (or, equivalently, lower distortion) tend to require transforms with a greater approximation capacity (i.e., more channels), so to get the best performance, make sure that the number of channels is large enough. This is described in more detail in:

"Efficient nonlinear transforms for lossy image compression"
J. Ballé
https://arxiv.org/abs/1802.00847
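To make the role of --lambda concrete: the training objective trades rate against distortion, roughly loss = rate + lambda * distortion. The sketch below is a stdlib-only toy, not the code in bls2017.py (where TFC entropy models estimate the rate on tensors). It assumes a zero-mean logistic prior for already-quantized integer latents, uses the probability mass of each integer under that prior as its likelihood, and uses MSE as distortion. All names are hypothetical.

```python
import math


def logistic_cdf(x: float, scale: float = 1.0) -> float:
    """CDF of a zero-mean logistic distribution, computed in a numerically stable way."""
    z = x / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def symbol_bits(k: int, scale: float = 1.0) -> float:
    """Bits for integer k: -log2 of the prior's probability mass on [k - 0.5, k + 0.5]."""
    # The prior is symmetric, so evaluate in the left tail, where both CDF values are
    # small and their difference does not cancel to zero (fails only for huge |k|).
    k = -abs(k)
    p = logistic_cdf(k + 0.5, scale) - logistic_cdf(k - 0.5, scale)
    return -math.log2(p)


def rd_loss(symbols, original, reconstruction, lmbda, scale=1.0):
    """Return (loss, rate_bits, mse) for the Lagrangian rate + lmbda * distortion."""
    rate = sum(symbol_bits(k, scale) for k in symbols)
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    return rate + lmbda * mse, rate, mse


# Usage: the same rate and distortion, weighted differently by lambda.
original = [0.2, 0.8, 0.5]
reconstruction = [0.25, 0.7, 0.5]
symbols = [0, 1, -2]
for lmbda in (0.01, 1.0, 100.0):
    loss, rate, mse = rd_loss(symbols, original, reconstruction, lmbda)
    print(f"lambda={lmbda}: loss={loss:.3f} (rate={rate:.2f} bits, mse={mse:.4f})")
```

A small lambda makes the rate term dominate, so optimization favors fewer bits at the cost of quality; a large lambda does the opposite, which is why high-lambda models also tend to need more channels.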

If you wish, you can monitor progress with TensorBoard. To do this, start a TensorBoard instance in the background before starting the training, then point your web browser to port 6006 on your machine:

tensorboard --logdir=/tmp/train_bls2017 &

When training has finished, the Python script saves the trained model to the directory specified with --model_path (by default, bls2017 in the current directory) in TensorFlow's SavedModel format. The script can then be used to compress and decompress images as follows. The same saved model must be accessible to both commands.

python bls2017.py [options] compress original.png compressed.tfci
python bls2017.py [options] decompress compressed.tfci reconstruction.png
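To see where a trained model lands on the rate-distortion curve, one simple check is the bitrate of compressed.tfci in bits per pixel. A minimal sketch, assuming you already know the width and height of original.png (e.g. from an image library, not shown); both helper names are hypothetical:

```python
import os


def bits_per_pixel_from_size(num_bytes: int, width: int, height: int) -> float:
    """Bitrate in bits per pixel for a payload of `num_bytes` bytes."""
    if width <= 0 or height <= 0:
        raise ValueError("Image dimensions must be positive")
    return 8 * num_bytes / (width * height)


def bits_per_pixel(compressed_path: str, width: int, height: int) -> float:
    """Bits per pixel of a compressed file on disk (any container overhead is counted)."""
    return bits_per_pixel_from_size(os.path.getsize(compressed_path), width, height)
```

For example, bits_per_pixel("compressed.tfci", width, height) measures the whole file, so it includes any header information stored alongside the entropy-coded bits.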

Building pip packages

This section describes the necessary steps to build your own pip packages of TensorFlow Compression. This may be necessary to install it on platforms for which we don't provide precompiled binaries (currently, we only provide them for Linux and Darwin).

To be compatible with the official TensorFlow pip package, the TFC pip package must be linked against a matching version of the C libraries. For this reason, to build the official Linux pip packages, we use the tensorflow/build Docker images and the same toolchain that TensorFlow uses.

Inside the Docker container, the following steps need to be taken:

  1. Clone the tensorflow/compression repo from GitHub.
  2. Install Python dependencies.
  3. Run the :build_pip_pkg Bazel target inside the cloned repo.

For example:

sudo docker run -i --rm -v /tmp/tensorflow_compression:/tmp/tensorflow_compression \
    tensorflow/build:latest-python3.10 bash -c \
    "git clone https://github.com/tensorflow/compression.git /tensorflow_compression &&
     cd /tensorflow_compression &&
     python -m pip install -U pip setuptools wheel &&
     python -m pip install -r requirements.txt &&
     bazel run -c opt --copt=-mavx --crosstool_top=@ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda//crosstool:toolchain :build_pip_pkg -- . /tmp/tensorflow_compression <custom-version>"

For Darwin, the Docker image and specifying the toolchain is not necessary. We just build the package like this (note that you may want to create a clean Python virtual environment to do this):

git clone https://github.com/tensorflow/compression.git /tensorflow_compression
cd /tensorflow_compression
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt
bazel run -c opt --copt=-mavx --macos_minimum_os=10.14 :build_pip_pkg -- . /tmp/tensorflow_compression <custom-version>

In both cases, the wheel file is created inside /tmp/tensorflow_compression.

To test the created package, first install the resulting wheel file:

pip install /tmp/tensorflow_compression/tensorflow_compression-*.whl

Then run the unit tests. Do not run them in the workspace directory where the WORKSPACE file lives; otherwise, the Python interpreter would attempt to import the tensorflow_compression package from the source tree rather than from the installed package in the system directory. Changing to /tmp first avoids this:

pushd /tmp
python -m tensorflow_compression.all_tests
popd
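To confirm which copy of the package Python would actually import, you can ask the import system directly. A minimal sketch using importlib.util.find_spec; the helper names and the source-tree path /tensorflow_compression (taken from the clone commands above) are assumptions for illustration:

```python
import importlib.util
import os


def imported_from(module_name: str):
    """Directory that `import module_name` would load from, or None if not importable."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(os.path.abspath(spec.origin))


def is_inside(path: str, directory: str) -> bool:
    """True if `path` lies within `directory` (purely lexical check)."""
    path, directory = os.path.abspath(path), os.path.abspath(directory)
    return os.path.commonpath([path, directory]) == directory


if __name__ == "__main__":
    # Hypothetical source checkout location, matching the clone path used above.
    source_tree = "/tensorflow_compression"
    location = imported_from("tensorflow_compression")
    if location is None:
        print("tensorflow_compression is not importable from here.")
    elif is_inside(location, source_tree):
        print(f"Warning: importing from the source tree ({location}).")
    else:
        print(f"Importing the installed package from {location}.")
```

Running this script from the workspace directory and then from /tmp should show the difference the note above describes.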

When done, you can uninstall the pip package again:

pip uninstall tensorflow-compression

Evaluation

We provide evaluation results for several image compression methods in terms of different metrics in different colorspaces. Please see the results subdirectory for more information.
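PSNR is one widely used metric of this kind. The following is a minimal stdlib sketch for 8-bit pixel values flattened into lists (the flattening and max_value=255 are assumptions); it does not reproduce the colorspace conversions or the exact protocol behind the published results:

```python
import math


def psnr(original, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences of pixel values."""
    if len(original) != len(reconstruction) or not original:
        raise ValueError("Inputs must be non-empty and of equal length")
    mse = sum((float(a) - float(b)) ** 2 for a, b in zip(original, reconstruction)) / len(original)
    if mse == 0.0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


print(psnr([52, 55, 61], [50, 55, 64]))
```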

Citation

If you use this library for research purposes, please cite:

@software{tfc_github,
  author = "Ballé, Johannes and Hwang, Sung Jin and Agustsson, Eirikur",
  title = "{T}ensor{F}low {C}ompression: Learned Data Compression",
  url = "http://github.com/tensorflow/compression",
  version = "2.12.0",
  year = "2022",
}

In the above BibTeX entry, names are top contributors sorted by number of commits. Please adjust the version number and year according to the version that was actually used.

Note that this is not an officially supported Google product.
