# FaceBoxes-tensorflow
This is an implementation of FaceBoxes: A CPU Real-time Face Detector with High Accuracy.
I provide full training code, data preparation scripts, and a pretrained model.
The detector runs at ~7 ms/image (image size 1024x1024, on an NVIDIA GeForce GTX 1080).
## How to use the pretrained model
To use the pretrained face detector you will need to download `face_detector.py`
and a frozen inference graph (a `.pb` file, it is here).
You can see an example of usage in `try_detector.ipynb`.
## Requirements
- tensorflow 1.10 (inference was tested using tensorflow 1.12)
- opencv-python, Pillow, tqdm
## Notes
- **Warning:** this detector doesn't work well on small faces.
  But you can improve its performance by upscaling images before feeding them to the network.
  For example, resize an image keeping its aspect ratio so that its smaller dimension is 768.
- You can see how anchor densification works in `visualize_densified_anchor_boxes.ipynb`.
- You can see how my data augmentation works in `test_input_pipeline.ipynb`.
- The speed on a CPU is ~25 ms/image (image size is 1024x768, CPU is an i7-7700K @ 4.20GHz).
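The upscaling trick from the warning above can be sketched as a small helper. This is a minimal illustration (the function name and the pure-Python rounding are my own; the repo itself may resize differently):

```python
def upscaled_size(width, height, target_smaller_dim=768):
    """Compute a new (width, height) that keeps the aspect ratio and
    makes the smaller dimension equal to target_smaller_dim.
    Images whose smaller side is already large enough are left unchanged."""
    smaller = min(width, height)
    if smaller >= target_smaller_dim:
        return width, height
    scale = target_smaller_dim / smaller
    return round(width * scale), round(height * scale)

# e.g. a 640x480 image is upscaled to 1024x768 before detection
print(upscaled_size(640, 480))  # (1024, 768)
```

The resulting size can be passed to, for example, Pillow's `Image.resize` before running the detector.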
## How to train
For training I use the train and val parts of the WIDER dataset:
16106 images in total (12880 + 3226).
For evaluation during training I use the FDDB dataset (2845 images) and the AP@IoU=0.5
metric (this is not the original FDDB evaluation protocol, but the one used in the PASCAL VOC Challenge).
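To make the metric concrete, here is a minimal sketch of the IoU computation that underlies AP@IoU=0.5. The `(xmin, ymin, xmax, ymax)` box format is an assumption for illustration and may differ from the repo's internal representation:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # intersection rectangle (empty if the boxes don't overlap)
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# a detection counts as a true positive when IoU >= 0.5
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ~0.333, below the threshold
```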
- Run `prepare_data/explore_and_prepare_WIDER.ipynb` to prepare the WIDER dataset
  (also, you will need to combine the two created dataset parts using `cp train_part2/* train/ -a`).
- Run `prepare_data/explore_and_prepare_FDDB.ipynb` to prepare the FDDB dataset.
- Create tfrecords:
  ```shell
  python create_tfrecords.py \
      --image_dir=/home/gpu2/hdd/dan/WIDER/train/images/ \
      --annotations_dir=/home/gpu2/hdd/dan/WIDER/train/annotations/ \
      --output=data/train_shards/ \
      --num_shards=150

  python create_tfrecords.py \
      --image_dir=/home/gpu2/hdd/dan/FDDB/val/images/ \
      --annotations_dir=/home/gpu2/hdd/dan/FDDB/val/annotations/ \
      --output=data/val_shards/ \
      --num_shards=20
  ```
- Run `python train.py` to train a face detector. Evaluation on FDDB will happen periodically.
- Run `tensorboard --logdir=models/run00` to observe training and evaluation.
- Run `python save.py` and `create_pb.py` to convert the trained model into a `.pb` file.
- Use the class in `face_detector.py` and the `.pb` file to do inference.
- Also, you can get my final training checkpoint here.
- The training speed was ~2.6 batches/second on one NVIDIA GeForce GTX 1080,
  so the total training time is ~26 hours
  (and I believe that you can make it much shorter if you optimize the input pipeline).
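As a quick sanity check, the two numbers above imply roughly how many batches the run processed:

```python
batches_per_second = 2.6  # training speed from the note above
training_hours = 26       # total training time from the note above

# total batches processed over the whole run
total_batches = batches_per_second * training_hours * 3600
print(round(total_batches))  # 243360
```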
Training loss curve looks like this:
## How to evaluate on FDDB
- Download the evaluation code from here.
- Run `tar -zxvf evaluation.tgz; cd evaluation`. Then compile it using `make`
  (it can be very tricky to make it work).
- Run `predict_for_FDDB.ipynb` to make predictions on the evaluation dataset.
  You will get `ellipseList.txt`, `faceList.txt`, `detections.txt`, and `images/`.
- Run `./evaluate -a result/ellipseList.txt -d result/detections.txt -i result/images/ -l result/faceList.txt -z .jpg -f 0`.
- You will get something like `eval_results/discrete-ROC.txt`.
- Run `eval_results/plot_roc.ipynb` to plot the curve.
Also see this repository and the official FAQ if you have questions about the evaluation.
## Results on FDDB
True positive rate at 1000 false positives is `0.902`.
Note that this is lower than in the original paper.
Maybe some hyperparameters are wrong, or it's because I didn't upscale images
when evaluating (I used the original image size).
You can see the whole curve in `eval_results/discrete-ROC.txt`
(it's the output of the official evaluation script).
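If you want to read this operating point off the curve programmatically, here is a sketch. Note that the two-column layout (true positive rate, then cumulative false-positive count) is my assumption about the `discrete-ROC.txt` format, so check it against the official script's actual output before relying on it:

```python
def tpr_at_false_positives(roc_lines, max_false_positives=1000):
    """Return the highest true positive rate reached while the number of
    false positives stays at or below max_false_positives.

    Each line is assumed to hold two whitespace-separated columns:
    true positive rate and cumulative false-positive count
    (a guess at the discrete-ROC.txt layout).
    """
    best = 0.0
    for line in roc_lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        tpr, fp = float(parts[0]), float(parts[1])
        if fp <= max_false_positives:
            best = max(best, tpr)
    return best

# hypothetical ROC points for illustration, not real evaluation output
sample = ["0.850 500", "0.902 1000", "0.915 1500"]
print(tpr_at_false_positives(sample))  # 0.902
```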