
Simple TensorFlow Serving

Introduction

Simple TensorFlow Serving is a generic and easy-to-use serving service for machine learning models. Read more at https://stfs.readthedocs.io.

  • Support distributed TensorFlow models
  • Support general RESTful/HTTP APIs
  • Support GPU-accelerated inference
  • Support curl and other command-line tools
  • Support clients in any programming language
  • Support generating clients from models without coding
  • Support inference with raw image files for image models
  • Support statistical metrics for verbose requests
  • Support serving multiple models at the same time
  • Support dynamic online and offline switching of model versions
  • Support loading new custom ops for TensorFlow models
  • Support secure authentication with configurable basic auth
  • Support multiple model formats: TensorFlow/MXNet/PyTorch/Caffe2/CNTK/ONNX/H2o/Scikit-learn/XGBoost/PMML/Spark MLlib

Installation

Install the server with pip.

pip install simple_tensorflow_serving

Or install from source code.

python ./setup.py install

python ./setup.py develop

bazel build simple_tensorflow_serving:server

Or use the Docker image.

docker run -d -p 8500:8500 tobegit3hub/simple_tensorflow_serving

docker run -d -p 8500:8500 tobegit3hub/simple_tensorflow_serving:latest-gpu

docker run -d -p 8500:8500 tobegit3hub/simple_tensorflow_serving:latest-hdfs

docker run -d -p 8500:8500 tobegit3hub/simple_tensorflow_serving:latest-py34

docker-compose up -d

Or deploy in Kubernetes.

kubectl create -f ./simple_tensorflow_serving.yaml

Quick Start

Start the server with the TensorFlow SavedModel.

simple_tensorflow_serving --model_base_path="./models/tensorflow_template_application_model"

Check out the dashboard at http://127.0.0.1:8500 in your web browser.

(dashboard screenshot)

Generate the Python client and access the model with test data, without writing any code.

curl http://localhost:8500/v1/models/default/gen_client?language=python > client.py
python ./client.py

Advanced Usage

Multiple Models

It supports serving multiple models and multiple versions of each model. You can run the server with a configuration like this.

{
  "model_config_list": [
    {
      "name": "tensorflow_template_application_model",
      "base_path": "./models/tensorflow_template_application_model/",
      "platform": "tensorflow"
    }, {
      "name": "deep_image_model",
      "base_path": "./models/deep_image_model/",
      "platform": "tensorflow"
    }, {
       "name": "mxnet_mlp_model",
       "base_path": "./models/mxnet_mlp/mx_mlp",
       "platform": "mxnet"
    }
  ]
}

simple_tensorflow_serving --model_config_file="./examples/model_config_file.json"

Added or removed model versions are detected automatically, and the latest files are reloaded into memory. You can easily choose a specific model and version for inference.

import requests

endpoint = "http://127.0.0.1:8500"
input_data = {
  "model_name": "default",
  "model_version": 1,
  "data": {
      "keys": [[11.0], [2.0]],
      "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1],
                   [1, 1, 1, 1, 1, 1, 1, 1, 1]]
  }
}
result = requests.post(endpoint, json=input_data)

GPU Acceleration

If you want to use a GPU, try the Docker image with the GPU tag and put the CUDA files in /usr/cuda_files/.

export CUDA_SO="-v /usr/cuda_files/:/usr/cuda_files/"
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
export LIBRARY_ENV="-e LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/cuda_files"

docker run -it -p 8500:8500 $CUDA_SO $DEVICES $LIBRARY_ENV tobegit3hub/simple_tensorflow_serving:latest-gpu

You can set the session config and GPU options with a command-line parameter or in the model config file.

simple_tensorflow_serving --model_base_path="./models/tensorflow_template_application_model" --session_config='{"log_device_placement": true, "allow_soft_placement": true, "allow_growth": true, "per_process_gpu_memory_fraction": 0.5}'

{
  "model_config_list": [
    {
      "name": "default",
      "base_path": "./models/tensorflow_template_application_model/",
      "platform": "tensorflow",
      "session_config": {
        "log_device_placement": true,
        "allow_soft_placement": true,
        "allow_growth": true,
        "per_process_gpu_memory_fraction": 0.5
      }
    }
  ]
}

Generated Client

You can generate test JSON data for the online models.

curl http://localhost:8500/v1/models/default/gen_json
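
For example, here is a minimal Python sketch for a quick smoke test. It assumes the gen_json endpoint returns a ready-to-post request body for the default model, which is an assumption for illustration:

# Fetch generated test data and post it back to the model (illustrative sketch).
import requests

server = "http://localhost:8500"

# Ask the server to generate example JSON for the default model.
json_data = requests.get(server + "/v1/models/default/gen_json").json()

# Post the generated data back to the inference endpoint.
result = requests.post(server, json=json_data)
print(result.text)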

Or generate clients in different languages (Bash, Python, Golang, JavaScript, etc.) for your model without writing any code.

curl http://localhost:8500/v1/models/default/gen_client?language=python > client.py
curl http://localhost:8500/v1/models/default/gen_client?language=bash > client.sh
curl http://localhost:8500/v1/models/default/gen_client?language=golang > client.go
curl http://localhost:8500/v1/models/default/gen_client?language=javascript > client.js

The generated code should look like the following and can be tested immediately.

#!/usr/bin/env python

import requests

def main():
  endpoint = "http://127.0.0.1:8500"
  json_data = {"model_name": "default", "data": {"keys": [[1], [1]], "features": [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]} }
  result = requests.post(endpoint, json=json_data)
  print(result.text)

if __name__ == "__main__":
  main()

#!/usr/bin/env python

import requests

def main():
  endpoint = "http://127.0.0.1:8500"

  input_data = {"keys": [[1.0], [1.0]], "features": [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]}
  result = requests.post(endpoint, json=input_data)
  print(result.text)

if __name__ == "__main__":
  main()

Image Model

For image models, we can send the raw image files in the request instead of constructing array data.

Now start serving an image model, such as deep_image_model.

simple_tensorflow_serving --model_base_path="./models/deep_image_model/"

Then post the raw image file, which must match the input shape of your model.

curl -X POST -F 'image=@./images/mew.jpg' -F "model_version=1" 127.0.0.1:8500
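
The same request can also be made from Python. Here is a minimal sketch, assuming the server accepts the same multipart form fields as the curl example above:

# Post a raw image file as multipart form data, mirroring the curl example above.
import requests

endpoint = "http://127.0.0.1:8500"
with open("./images/mew.jpg", "rb") as image_file:
    files = {"image": image_file}
    data = {"model_version": "1"}
    result = requests.post(endpoint, files=files, data=data)
print(result.text)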

TensorFlow Estimator Model

If we use the TensorFlow Estimator API to export the model, the model signature should look like this.

inputs {
  key: "inputs"
  value {
    name: "input_example_tensor:0"
    dtype: DT_STRING
    tensor_shape {
      dim {
        size: -1
      }
    }
  }
}
outputs {
  key: "classes"
  value {
    name: "linear/binary_logistic_head/_classification_output_alternatives/classes_tensor:0"
    dtype: DT_STRING
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: -1
      }
    }
  }
}
outputs {
  key: "scores"
  value {
    name: "linear/binary_logistic_head/predictions/probabilities:0"
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: -1
      }
      dim {
        size: 2
      }
    }
  }
}
method_name: "tensorflow/serving/classify"

We need to construct a string tensor for inference and base64-encode the string for HTTP. Here is the example Python code.

import base64

import requests
import tensorflow as tf

def _float_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def main():
  # Raw input data
  feature_dict = {"a": _bytes_feature("10"), "b": _float_feature(10)}

  # Create Example as base64 string
  example_proto = tf.train.Example(features=tf.train.Features(feature=feature_dict))
  tensor_proto = tf.contrib.util.make_tensor_proto(example_proto.SerializeToString(), dtype=tf.string)
  tensor_string = tensor_proto.string_val.pop()
  base64_tensor_string = base64.urlsafe_b64encode(tensor_string)

  # Request server
  endpoint = "http://127.0.0.1:8500"
  json_data = {"model_name": "default", "base64_decode": True, "data": {"inputs": [base64_tensor_string]}}
  result = requests.post(endpoint, json=json_data)
  print(result.json())

Custom Op

If your models rely on new TensorFlow custom ops, you can run the server and load the .so files at startup.

simple_tensorflow_serving --model_base_path="./model/" --custom_op_paths="./foo_op/"

Please check out the complete example in ./examples/custom_op/.
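
Conceptually, loading a custom op directory is similar to calling tf.load_op_library on each shared object before the model graph is restored. Here is a rough sketch of that idea; the server's actual loading code may differ, and the path is hypothetical:

# Register every custom op .so file in a directory with the TensorFlow runtime.
import os
import tensorflow as tf

custom_op_dir = "./foo_op/"
for filename in os.listdir(custom_op_dir):
    if filename.endswith(".so"):
        # Makes the ops and kernels in the shared object available to the graph.
        tf.load_op_library(os.path.join(custom_op_dir, filename))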

Authentication

For enterprise use, you can enable basic authentication for all the APIs; any anonymous request is then denied.

Now start the server with the configured username and password.

./server.py --model_base_path="./models/tensorflow_template_application_model/" --enable_auth=True --auth_username="admin" --auth_password="admin"

If you are using the web dashboard, just type your credentials when prompted. If you are using clients, pass the username and password with the request.

curl -u admin:admin -H "Content-Type: application/json" -X POST -d '{"data": {"keys": [[11.0], [2.0]], "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}' http://127.0.0.1:8500

endpoint = "http://127.0.0.1:8500"
input_data = {
  "data": {
      "keys": [[11.0], [2.0]],
      "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]
  }
}
auth = requests.auth.HTTPBasicAuth("admin", "admin")
result = requests.post(endpoint, json=input_data, auth=auth)

TLS/SSL

It supports TLS/SSL, and you can generate self-signed certificate files for testing.

openssl req -x509 -newkey rsa:4096 -nodes -out /tmp/secret.pem -keyout /tmp/secret.key -days 365

Then run the server with the certificate files.

simple_tensorflow_serving --enable_ssl=True --secret_pem=/tmp/secret.pem --secret_key=/tmp/secret.key --model_base_path="./models/tensorflow_template_application_model"
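
Clients then need to request the HTTPS endpoint. Here is a minimal Python sketch for testing against the self-signed certificate; pointing verify at the generated pem file (or disabling verification) is an assumption for local testing only:

# Call the HTTPS endpoint secured with a self-signed certificate (testing only).
import requests

endpoint = "https://127.0.0.1:8500"
input_data = {"data": {"keys": [[11.0], [2.0]],
                       "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1],
                                    [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}

# verify can point at the self-signed pem, or be set to False to skip verification in tests.
result = requests.post(endpoint, json=input_data, verify="/tmp/secret.pem")
print(result.text)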

Supported Models

For MXNet models, you can load them with commands and client code like these.

simple_tensorflow_serving --model_base_path="./models/mxnet_mlp/mx_mlp" --model_platform="mxnet"

endpoint = "http://127.0.0.1:8500"
input_data = {
  "model_name": "default",
  "model_version": 1,
  "data": {
      "data": [[12.0, 2.0]]
  }
}
result = requests.post(endpoint, json=input_data)
print(result.text)

For ONNX models, you can load them with commands and client code like these.

simple_tensorflow_serving --model_base_path="./models/onnx_mnist_model/onnx_model.proto" --model_platform="onnx"

endpoint = "http://127.0.0.1:8500"
input_data = {
  "model_name": "default",
  "model_version": 1,
  "data": {
      "data": [[...]]
  }
}
result = requests.post(endpoint, json=input_data)
print(result.text)

For H2o models, you can load them with commands and client code like these.

# Start H2o server with "java -jar h2o.jar"

simple_tensorflow_serving --model_base_path="./models/h2o_prostate_model/GLM_model_python_1525255083960_17" --model_platform="h2o"

endpoint = "http://127.0.0.1:8500"
input_data = {
  "model_name": "default",
  "model_version": 1,
  "data": {
      "data": [[...]]
  }
}
result = requests.post(endpoint, json=input_data)
print(result.text)

For Scikit-learn models, you can load them with commands and client code like these.

simple_tensorflow_serving --model_base_path="./models/scikitlearn_iris/model.joblib" --model_platform="scikitlearn"

simple_tensorflow_serving --model_base_path="./models/scikitlearn_iris/model.pkl" --model_platform="scikitlearn"

endpoint = "http://127.0.0.1:8500"
input_data = {
  "model_name": "default",
  "model_version": 1,
  "data": {
      "data": [[...]]
  }
}
result = requests.post(endpoint, json=input_data)
print(result.text)

For XGBoost models, you can load them with commands and client code like these.

simple_tensorflow_serving --model_base_path="./models/xgboost_iris/model.bst" --model_platform="xgboost"

simple_tensorflow_serving --model_base_path="./models/xgboost_iris/model.joblib" --model_platform="xgboost"

simple_tensorflow_serving --model_base_path="./models/xgboost_iris/model.pkl" --model_platform="xgboost"

endpoint = "http://127.0.0.1:8500"
input_data = {
  "model_name": "default",
  "model_version": 1,
  "data": {
      "data": [[...]]
  }
}
result = requests.post(endpoint, json=input_data)
print(result.text)

For PMML models, you can load them with commands and client code like these. This relies on Openscoring and Openscoring-Python to load the models.

java -jar ./third_party/openscoring/openscoring-server-executable-1.4-SNAPSHOT.jar

simple_tensorflow_serving --model_base_path="./models/pmml_iris/DecisionTreeIris.pmml" --model_platform="pmml"

endpoint = "http://127.0.0.1:8500"
input_data = {
  "model_name": "default",
  "model_version": 1,
  "data": {
      "data": [[...]]
  }
}
result = requests.post(endpoint, json=input_data)
print(result.text)

Supported Clients

Here is the example client in Bash.

curl -H "Content-Type: application/json" -X POST -d '{"data": {"keys": [[1.0], [2.0]], "features": [[10, 10, 10, 8, 6, 1, 8, 9, 1], [6, 2, 1, 1, 1, 1, 7, 1, 1]]}}' http://127.0.0.1:8500

Here is the example client in Python.

endpoint = "http://127.0.0.1:8500"
payload = {"data": {"keys": [[11.0], [2.0]], "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}

result = requests.post(endpoint, json=payload)

Here is the example client in C++.

Here is the example client in Java.

Here is the example client in Scala.

Here is the example client in Go.

endpoint := "http://127.0.0.1:8500"
dataByte := []byte(`{"data": {"keys": [[11.0], [2.0]], "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}`)
var dataInterface map[string]interface{}
json.Unmarshal(dataByte, &dataInterface)
dataJson, _ := json.Marshal(dataInterface)

resp, err := http.Post(endpoint, "application/json", bytes.NewBuffer(dataJson))

Here is the example client in Ruby.

endpoint = "http://127.0.0.1:8500"
uri = URI.parse(endpoint)
header = {"Content-Type" => "application/json"}
input_data = {"data" => {"keys"=> [[11.0], [2.0]], "features"=> [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Post.new(uri.request_uri, header)
request.body = input_data.to_json

response = http.request(request)

Here is the example client in JavaScript.

var options = {
    uri: "http://127.0.0.1:8500",
    method: "POST",
    json: {"data": {"keys": [[11.0], [2.0]], "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}
};

request(options, function (error, response, body) {});

Here is the example client in PHP.

$endpoint = "127.0.0.1:8500";
$inputData = array(
    "keys" => [[11.0], [2.0]],
    "features" => [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]],
);
$jsonData = array(
    "data" => $inputData,
);
$ch = curl_init($endpoint);
curl_setopt_array($ch, array(
    CURLOPT_POST => TRUE,
    CURLOPT_RETURNTRANSFER => TRUE,
    CURLOPT_HTTPHEADER => array(
        "Content-Type: application/json"
    ),
    CURLOPT_POSTFIELDS => json_encode($jsonData)
));

$response = curl_exec($ch);

Here is the example client in Erlang.

ssl:start(),
application:start(inets),
httpc:request(post,
  {"http://127.0.0.1:8500", [],
  "application/json",
  "{\"data\": {\"keys\": [[11.0], [2.0]], \"features\": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}"
  }, [], []).

Here is the example client in Lua.

local endpoint = "http://127.0.0.1:8500"
keys_array = {}
keys_array[1] = {1.0}
keys_array[2] = {2.0}
features_array = {}
features_array[1] = {1, 1, 1, 1, 1, 1, 1, 1, 1}
features_array[2] = {1, 1, 1, 1, 1, 1, 1, 1, 1}
local input_data = {
    ["keys"] = keys_array,
    ["features"] = features_array,
}
local json_data = {
    ["data"] = input_data
}
request_body = json:encode (json_data)
local response_body = {}

local res, code, response_headers = http.request{
    url = endpoint,
    method = "POST", 
    headers = 
      {
          ["Content-Type"] = "application/json";
          ["Content-Length"] = #request_body;
      },
      source = ltn12.source.string(request_body),
      sink = ltn12.sink.table(response_body),
}

Here is the example client in Rust.

Here is the example client in Swift.

Here is the example client in Perl.

my $endpoint = "http://127.0.0.1:8500";
my $json = '{"data": {"keys": [[11.0], [2.0]], "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1]]}}';
my $req = HTTP::Request->new( 'POST', $endpoint );
$req->header( 'Content-Type' => 'application/json' );
$req->content( $json );
$ua = LWP::UserAgent->new;

$response = $ua->request($req);

Here is the example client in Lisp.

Here is the example client in Haskell.

Here is the example client in Clojure.

Here is the example client in R.

endpoint <- "http://127.0.0.1:8500"
body <- list(data = list(a = 1), keys = 1)
json_data <- list(
  data = list(
    keys = list(list(1.0), list(2.0)), features = list(list(1, 1, 1, 1, 1, 1, 1, 1, 1), list(1, 1, 1, 1, 1, 1, 1, 1, 1))
  )
)

r <- POST(endpoint, body = json_data, encode = "json")
stop_for_status(r)
content(r, "parsed", "text/html")

Here is the example with Postman.

Performance

You can run Simple TensorFlow Serving with any WSGI server for better performance. We have benchmarked it and compared it with TensorFlow Serving. Find more details in the benchmark.

STFS (Simple TensorFlow Serving) and TFS (TensorFlow Serving) show similar performance across different models. The vertical axis is inference latency (in microseconds); lower is better.

Then we test with ab using concurrent clients on CPU and GPU. TensorFlow Serving performs better, especially with GPUs.

For the simplest model, each request costs only ~1.9 microseconds, and one instance of Simple TensorFlow Serving can achieve 5000+ QPS. With a larger batch size, it can run inference on more than one million instances per second.
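
As a rough illustration (this is not the benchmark referenced above; the endpoint and payload are assumptions), a naive single-client measurement of latency and QPS might look like this:

# Naive single-client latency/QPS measurement (illustrative only).
import time

import requests

endpoint = "http://127.0.0.1:8500"
payload = {"data": {"keys": [[1.0]], "features": [[1, 1, 1, 1, 1, 1, 1, 1, 1]]}}

request_count = 1000
start = time.time()
for _ in range(request_count):
    requests.post(endpoint, json=payload)
elapsed = time.time() - start

print("mean latency: %.3f ms" % (1000.0 * elapsed / request_count))
print("QPS: %.1f" % (request_count / elapsed))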

How It Works

  1. simple_tensorflow_serving starts the HTTP server with a Flask application.
  2. It loads the TensorFlow models with the tf.saved_model.loader Python API.
  3. It constructs the feed_dict from the JSON body of the request (see the sketch after this list).
    // Method: POST, Content-Type: application/json
    {
      "model_version": 1, // Optional
      "data": {
        "keys": [[1], [2]],
        "features": [[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]]
      }
    }

  4. It uses the TensorFlow Python API to run sess.run() with the feed_dict.
  5. For multiple model versions, it starts an independent thread to load each version.
  6. For generated clients, it reads the user's model signature and renders code with Jinja templates.
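
Here is a hedged sketch of steps 3 and 4 above; the real server code is more involved, and inputs_map and outputs_map are illustrative names for the signature-to-tensor mappings:

# Build feed_dict from the request JSON and run the session (illustrative sketch).
def inference(sess, inputs_map, outputs_map, request_json):
    # inputs_map maps signature input names (e.g. "keys", "features") to graph tensors.
    feed_dict = {}
    for input_name, input_tensor in inputs_map.items():
        feed_dict[input_tensor] = request_json["data"][input_name]

    # outputs_map maps signature output names to graph tensors.
    output_names = list(outputs_map.keys())
    output_tensors = [outputs_map[name] for name in output_names]
    results = sess.run(output_tensors, feed_dict=feed_dict)
    return dict(zip(output_names, results))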

Contribution

Feel free to open an issue or send a pull request for this project. Contributions of more clients in your favorite languages for accessing TensorFlow models are warmly welcomed.
