Deploy and Scale Machine Learning Models with Keras, FastAPI, Redis and Docker Swarm
Serve a production-ready, scalable Keras-based deep learning model for image classification using FastAPI, Redis and Docker Swarm. Based on this series of blog posts.
How to Use
Prerequisites
Make sure you have a modern version of docker (>1.13.0) and docker-compose installed.
Run with Docker Compose
Simply run docker-compose up to spin up all the services on your local machine.
Test Service
- Test the /predict endpoint by passing in the included doge.jpg as parameter img_file:
curl -X POST -F img_file=@doge.jpg http://localhost/predict
You should see the predictions returned as a JSON response.
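The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the service accepts a multipart/form-data upload in the img_file field exactly as the curl command does. The helper names build_multipart and predict are hypothetical and not part of this repository.

```python
import json
import os
import uuid
import urllib.request


def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body, similar to `curl -F`."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path, url="http://localhost/predict"):
    """POST the image as form field `img_file` and return the parsed JSON reply."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            "img_file", os.path.basename(image_path), f.read()
        )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # Assumption: the endpoint replies with a JSON document, as described above.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the services running, calling predict("doge.jpg") should return the same predictions as the curl command. The exact shape of the JSON depends on the model server and is not assumed here.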
Deploy on Docker Swarm
Deploying this on Docker Swarm allows us to scale the model server to multiple hosts.
This assumes that you have a Swarm instance set up (e.g. on the cloud). Otherwise, to test this in a local environment, put your Docker engine in swarm mode with docker swarm init.
- Deploy the stack on the swarm:
docker stack deploy -c docker-compose.yml mldeploy
- Check that it's running with docker stack services mldeploy. Note that the model server is unreplicated at this time. You may scale up the model worker by running:
docker service scale mldeploy_modelserver=X
Where X is the number of workers you want.
Load Testing
We can use locust and the included locustfile.py to load test our service. Run the following command to spin up 20 concurrent users immediately:
locust --host=http://localhost --no-web -c 20 -r 20
The --no-web flag runs locust in CLI mode. You may also want to use locust's web interface with all its pretty graphs; if so, just run locust --host=http://localhost.
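For a quick check without locust, the idea of N concurrent users can be approximated with a thread pool. This is only a rough standard-library stand-in, not a replacement for locust or the included locustfile.py. The names run_load and summarize are hypothetical, and send is any zero-argument callable you supply that issues one request against the service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send, users=20, requests_per_user=5):
    """Call `send()` from `users` threads; return per-request latencies in seconds."""

    def one_user(_user_index):
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            send()
            latencies.append(time.perf_counter() - start)
        return latencies

    # One thread per simulated user, all started together.
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [latency for user in per_user for latency in user]


def summarize(latencies):
    """Return the request count, mean latency and an approximate 95th percentile."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "mean": sum(ordered) / len(ordered),
        "p95": ordered[int(0.95 * (len(ordered) - 1))],
    }
```

Passing a function that posts doge.jpg to /predict as send, then feeding the result to summarize, gives a coarse latency picture. Comparing the numbers before and after scaling the model server shows whether extra workers help.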