TensorFlow Lite Object Detection on Android and Raspberry Pi
Train your own TensorFlow Lite object detection models and run them on the Raspberry Pi, Android phones, and other edge devices!
Get started with training on Google Colab by clicking the icon below, or click here to go straight to the YouTube video that provides step-by-step instructions.
Introduction
TensorFlow Lite is an optimized framework for deploying lightweight deep learning models on resource-constrained edge devices. TensorFlow Lite models have faster inference time and require less processing power than regular TensorFlow models, so they can be used to obtain faster performance in realtime applications.
This guide provides step-by-step instructions for how to train a custom TensorFlow Object Detection model, convert it into an optimized format that can be used by TensorFlow Lite, and run it on edge devices like the Raspberry Pi. It also provides Python code for running TensorFlow Lite models to perform detection on images, videos, web streams, or webcam feeds.
Step 1. Train TensorFlow Lite Models
Using Google Colab (recommended)
The easiest way to train, convert, and export a TensorFlow Lite model is using Google Colab. Colab provides you with a free GPU-enabled virtual machine on Google's servers that comes pre-installed with the libraries and packages needed for training.
I wrote a Google Colab notebook that can be used to train custom TensorFlow Lite models. It goes through the process of preparing data, configuring a model for training, training the model, running it on test images, and exporting it to a downloadable TFLite format so you can deploy it to your own device. It makes training a custom TFLite model as easy as uploading an image dataset and clicking Play on a few blocks of code!
Open the Colab notebook in your browser by clicking the icon above. Work through the instructions in the notebook to start training your own model. Once it's trained and exported, visit the Setup TFLite Runtime Environment section to learn how to deploy it on your PC, Raspberry Pi, Android phone, or other edge devices.
Using a Local PC
The old version of this guide shows how to set up a TensorFlow training environment locally on your PC. Be warned: it's a lot of work, and the guide is outdated. Here's a link to the local training guide.
Step 2. Setup TFLite Runtime Environment on Your Device
Once you have a trained `.tflite` model, the next step is to deploy it on a device like a computer, Raspberry Pi, or Android phone. To run the model, you'll need to install TensorFlow or the TensorFlow Lite Runtime on your device and set up the Python environment and directory structure your application will run in. The `deploy_guides` folder in this repository has step-by-step guides showing how to set up a TensorFlow environment on several different devices. Links to the guides are given below.
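A common pattern in TFLite example scripts (and an assumption about how these scripts are structured) is to prefer the small `tflite-runtime` package when it's installed and fall back to the interpreter bundled with full TensorFlow otherwise. A minimal sketch of that check:

```python
import importlib.util

def pick_interpreter_module():
    """Return the module path that provides the TFLite Interpreter class,
    preferring the lightweight tflite-runtime package when it is installed."""
    if importlib.util.find_spec('tflite_runtime') is not None:
        return 'tflite_runtime.interpreter'
    # Fall back to the Interpreter class bundled with full TensorFlow.
    return 'tensorflow.lite.python.interpreter'
```

This is why the deploy guides let you install either package: the same detection code can run on a desktop with full TensorFlow or on a Raspberry Pi with only the slim runtime.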
Raspberry Pi
Follow the Raspberry Pi setup guide to install TFLite Runtime on a Raspberry Pi 3 or 4 and run a TensorFlow Lite model. This guide also shows how to use the Google Coral USB Accelerator to greatly increase the speed of quantized models on the Raspberry Pi.
Windows
Follow the instructions in the Windows TFLite guide to set up TFLite Runtime on your Windows PC using Anaconda!
macOS
Still to come!
Linux
Still to come!
Android
Still to come!
Embedded Devices
Still to come!
Step 3. Run TensorFlow Lite Models!
There are four Python scripts to run the TensorFlow Lite object detection model on an image, video, web stream, or webcam feed. The scripts are based on the label_image.py example given in the TensorFlow Lite examples GitHub repository.
- TFLite_detection_image.py
- TFLite_detection_video.py
- TFLite_detection_stream.py
- TFLite_detection_webcam.py
The following instructions show how to run the scripts. These instructions assume your `.tflite` model file and `labelmap.txt` file are in the `TFLite_model` folder in your `tflite1` directory, as per the instructions given in the Setup TFLite Runtime Environment guide.
If you’d like to try the sample TFLite object detection model provided by Google, simply download it here and unzip it into the `tflite1` folder. Then, either rename the unzipped folder to `TFLite_model`, or use `--modeldir=coco_ssd_mobilenet_v1_1.0_quant_2018_06_29` rather than `--modeldir=TFLite_model` when running the scripts.
Webcam
Make sure you have a USB webcam plugged into your computer. If you’re on a laptop with a built-in camera, you don’t need to plug in a USB webcam. From the `tflite1` directory, issue:
python TFLite_detection_webcam.py --modeldir=TFLite_model
After a few moments of initializing, a window will appear showing the webcam feed. Detected objects will have bounding boxes and labels displayed on them in real time.
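Before each frame can be passed to the model, it has to be resized to the model's input dimensions and, for floating-point models, normalized. The real scripts use OpenCV's `cv2.resize` for this; the sketch below uses a NumPy-only nearest-neighbor resize so it stays self-contained, and the function name is mine, not the scripts':

```python
import numpy as np

def preprocess_frame(frame, in_height, in_width, floating_model):
    """Resize a frame to the model's input size and add a batch dimension.
    Nearest-neighbor resize via index arithmetic (the real scripts use cv2.resize)."""
    h, w = frame.shape[:2]
    rows = np.arange(in_height) * h // in_height
    cols = np.arange(in_width) * w // in_width
    resized = frame[rows][:, cols]
    input_data = np.expand_dims(resized, axis=0)
    if floating_model:
        # Floating-point models typically expect inputs normalized to roughly [-1, 1].
        input_data = (np.float32(input_data) - 127.5) / 127.5
    return input_data
```

Quantized models skip the normalization step and take the raw uint8 pixels, which is part of why they run faster on edge devices.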
Video
To run the video detection script, issue:
python TFLite_detection_video.py --modeldir=TFLite_model
A window will appear showing consecutive frames from the video, with each object in the frame labeled. Press 'q' to close the window and end the script. By default, the video detection script will open a video named 'test.mp4'. To open a specific video file, use the `--video` option:
python TFLite_detection_video.py --modeldir=TFLite_model --video='birdy.mp4'
Note: Video detection will run at a slower FPS than realtime webcam detection. This is mainly because loading a frame from a video file requires more processor I/O than receiving a frame from a webcam.
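If you want to compare FPS between the video and webcam scripts on your own hardware, a simple timing harness works. This is an illustrative helper, not code from the detection scripts:

```python
import time

def measure_fps(process_frame, num_frames=30):
    """Call process_frame() num_frames times and return the average frames per second."""
    start = time.perf_counter()
    for _ in range(num_frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed
```

Pass in a function that grabs one frame and runs inference on it; the difference between the video-file and webcam numbers shows the extra I/O cost described above.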
Web stream
To run the script to detect objects in a video stream (e.g. from a remote security camera), issue:
python TFLite_detection_stream.py --modeldir=TFLite_model --streamurl="http://ipaddress:port/stream/video.mjpeg"
After a few moments of initializing, a window will appear showing the video stream. Detected objects will have bounding boxes and labels displayed on them in real time.
Make sure to update the URL parameter to the one used by your security camera. If the stream is secured, the URL must include the authentication information.
If the bounding boxes don't line up with the detected objects, the stream resolution probably wasn't detected automatically. In that case, you can set it explicitly with the `--resolution` parameter:
python TFLite_detection_stream.py --modeldir=TFLite_model --streamurl="http://ipaddress:port/stream/video.mjpeg" --resolution=1920x1080
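The reason a wrong resolution misplaces the boxes is that SSD-style detection models typically output box coordinates normalized to 0.0-1.0, which then get scaled by the assumed frame size. A sketch of that scaling step (function name is illustrative):

```python
def scale_boxes(boxes, frame_width, frame_height):
    """Convert normalized [ymin, xmin, ymax, xmax] detections (0.0-1.0, the usual
    SSD output layout) into integer pixel coordinates, clamped to the frame."""
    pixel_boxes = []
    for ymin, xmin, ymax, xmax in boxes:
        pixel_boxes.append((
            max(0, int(xmin * frame_width)),
            max(0, int(ymin * frame_height)),
            min(frame_width, int(xmax * frame_width)),
            min(frame_height, int(ymax * frame_height)),
        ))
    return pixel_boxes
```

If the script scales by 640x480 while the stream is actually 1920x1080, every box lands in the top-left corner, which is the symptom `--resolution` fixes.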
Image
To run the image detection script, issue:
python TFLite_detection_image.py --modeldir=TFLite_model
The image will appear with all objects labeled. Press 'q' to close the image and end the script. By default, the image detection script will open an image named 'test1.jpg'. To open a specific image file, use the `--image` option:
python TFLite_detection_image.py --modeldir=TFLite_model --image=squirrel.jpg
The script can also open an entire folder of images and perform detection on each one. The folder must contain only image files, or errors will occur. To specify which folder to run detection on, use the `--imagedir` option:
python TFLite_detection_image.py --modeldir=TFLite_model --imagedir=squirrels
Press any key (other than 'q') to advance to the next image. Do not use both the --image option and the --imagedir option when running the script, or it will throw an error.
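One way to avoid the "only image files in the folder" errors is to filter the folder listing by extension before running detection. This helper is a sketch, and the extension list is an assumption about what the scripts can read:

```python
import glob
import os

# Assumed set of supported image formats; adjust to match your environment.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')

def list_images(imagedir):
    """Return sorted paths of image files in imagedir, skipping everything else."""
    paths = sorted(glob.glob(os.path.join(imagedir, '*')))
    return [p for p in paths if p.lower().endswith(IMAGE_EXTENSIONS)]
```

Looping over `list_images('squirrels')` mirrors what the `--imagedir` option does, but silently skips stray text files or subfolders instead of crashing.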
To save labeled images and a text file with detection results for each image, use the `--save_results` option. The results will be saved to a folder named `<imagedir>_results`. This works well if you want to check your model's performance on a folder of images and use the results to calculate mAP with the calculate_map_catchuro.py script. For example:
python TFLite_detection_image.py --modeldir=TFLite_model --imagedir=squirrels --save_results
The `--noshow_results` option will stop the program from displaying images.
See all command options
For more information on the options available when running the scripts, use the `-h` option when calling them. For example:
python TFLite_detection_image.py -h
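Under the hood, options like these are typically defined with Python's `argparse` module. The sketch below reconstructs the flags shown in this guide; it is an illustration, not the scripts' exact argument definitions:

```python
import argparse

def build_parser():
    """A sketch of the command-line options used in this guide's examples."""
    parser = argparse.ArgumentParser(description='TFLite object detection')
    parser.add_argument('--modeldir', required=True,
                        help='Folder containing the .tflite file and labelmap.txt')
    parser.add_argument('--image', default=None, help='Single image to run detection on')
    parser.add_argument('--imagedir', default=None, help='Folder of images to run detection on')
    parser.add_argument('--save_results', action='store_true',
                        help='Save labeled images and detection text files')
    parser.add_argument('--noshow_results', action='store_true',
                        help='Do not display images on screen')
    return parser
```

Running a script with `-h` prints exactly this kind of generated help text, which is why it's the quickest way to see every available option.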
If you encounter errors, please check the FAQ section of this guide. It has a list of common errors and their solutions. If you can successfully run the script, but your object isn’t detected, it is most likely because your model isn’t accurate enough. The FAQ has further discussion on how to resolve this.
Examples
(Still to come!) Please see the examples folder for examples of how to use your TFLite model in basic vision applications.
FAQs
What's the difference between the TensorFlow Object Detection API and TFLite Model Maker?
Google provides a set of Colab notebooks for training TFLite models called [TFLite Model Maker](https://www.tensorflow.org/lite/models/modify/model_maker). While their object detection notebook is straightforward and easy to follow, using the [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) for creating models provides several benefits:
- TFLite Model Maker only supports EfficientDet models, which aren't as fast as SSD-MobileNet models.
- Training models with the Object Detection API generally results in better model accuracy.
- The Object Detection API provides significantly more flexibility in model and training configuration (training steps, learning rate, model depth and resolution, etc).
- Google still recommends using the Object Detection API as the formal method for training models with large datasets.
What's the difference between training, transfer learning, and fine-tuning?
Using correct terminology is important in a complicated field like machine learning. In this notebook, I use the word "training" to describe the process of teaching a model to recognize custom objects, but what we're actually doing is "fine-tuning". The Keras documentation gives a [good example notebook](https://keras.io/guides/transfer_learning/) explaining the difference between each term.
Here's my attempt at defining the terms:
- Training: The process of taking a full neural network with randomly initialized weights, passing in image data, calculating the resulting loss from its predictions on those images, and using backpropagation to adjust the weights in every node of the network and reduce its loss. In this process, the network learns how to extract features of interest from images and correlate those features to classes. Training a model from scratch typically takes millions of training steps and a large dataset of 100,000+ images (such as ImageNet or COCO). Let's leave actual training to companies like Google and Microsoft!
- Transfer learning: Taking a model that has already been trained, unfreezing the last layer of the model (i.e. making it so only the last layer's weights can be modified), and retraining the last layer with a new dataset so it can learn to identify new classes. Transfer learning takes advantage of the feature extraction capabilities that have already been learned in the deep layers of the trained model. It takes the extracted features and recategorizes them to predict new classes.
- Fine-tuning: Fine-tuning is similar to transfer learning, except more layers are unfrozen and retrained. Instead of just unfreezing the last layer, a significant number of layers (such as the last 20% to 50% of layers) are unfrozen. This allows the model to modify some of its feature extraction layers so it can extract features that are more relevant to the classes it's trying to identify. This notebook (and the TensorFlow Object Detection API) uses fine-tuning.
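The three strategies boil down to a choice of how many layers stay frozen. A toy sketch of that choice (the 30% unfreeze fraction is illustrative, not the Object Detection API's actual setting):

```python
def choose_trainable_layers(num_layers, mode):
    """Return a per-layer trainable flag for each training strategy.
    Illustrative only; the fractions are examples, not framework defaults."""
    if mode == 'training':        # from scratch: every layer learns
        frozen = 0
    elif mode == 'transfer':      # transfer learning: only the last layer learns
        frozen = num_layers - 1
    elif mode == 'fine-tuning':   # unfreeze roughly the last 30% of layers
        frozen = int(num_layers * 0.7)
    else:
        raise ValueError(f'unknown mode: {mode}')
    return [i >= frozen for i in range(num_layers)]
```

In a Keras-style framework, these flags would map onto each layer's `trainable` attribute before compiling the model.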
In general, I like to use the word "training" instead of "fine-tuning", because it's more intuitive and understandable to new users.
Should I get a Google Colab Pro subscription?
If you plan to use Colab frequently for training models, I recommend getting a Colab Pro subscription. It provides several benefits:
- Idle Colab sessions remain connected for longer before timing out and disconnecting
- Allows for running multiple Colab sessions at once
- Priority access to TPU and GPU-enabled virtual machines
- Virtual machines have more RAM
Colab keeps track of how much GPU time you use, and cuts you off from using GPU-enabled instances once you reach a certain use time. If you get the message telling you you're cut off from GPU instances, then that's a good indicator that you use Colab enough to justify paying for a Pro subscription.