Leveraging Unlabeled Data for Crowd Counting by Learning to Rank
The paper will appear in CVPR 2018. An arXiv pre-print version is available.
The updated version has been accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence. An arXiv pre-print version is also available.
Citation
Please cite our paper if you are inspired by the idea.
@inproceedings{xialei2018crowd,
title={Leveraging Unlabeled Data for Crowd Counting by Learning to Rank},
author={Liu, Xialei and van de Weijer, Joost and Bagdanov, Andrew D},
booktitle={Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2018},
url = {https://github.com/xialeiliu/CrowdCountingCVPR18}
}
and
@ARTICLE{8642842,
author={X. {Liu} and J. {Van De Weijer} and A. D. {Bagdanov}},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
title={Exploiting Unlabeled Data in CNNs by Self-supervised Learning to Rank},
year={2019},
pages={1-1},
doi={10.1109/TPAMI.2019.2899857},
ISSN={0162-8828}
}
Authors
Xialei Liu, Joost van de Weijer and Andrew D. Bagdanov
Institutions
Computer Vision Center, Barcelona, Spain
Media Integration and Communication Center, University of Florence, Florence, Italy
Abstract
We propose a novel crowd counting approach that leverages abundantly available unlabeled crowd imagery in a learning-to-rank framework. To induce a ranking of cropped images, we use the observation that any sub-image of a crowded scene image is guaranteed to contain the same number or fewer persons than the super-image. This allows us to address the problem of limited size of existing datasets for crowd counting. We collect two crowd scene datasets from Google using keyword searches and query-by-example image retrieval, respectively. We demonstrate how to efficiently learn from these unlabeled datasets by incorporating learning-to-rank in a multi-task network which simultaneously ranks images and estimates crowd density maps. Experiments on two of the most challenging crowd counting datasets show that our approach obtains state-of-the-art results.
Framework
The main idea of our approach is to leverage abundantly available unlabeled crowd imagery in a learning-to-rank framework, which addresses the problem of limited crowd counting dataset size.
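The ranking constraint can be illustrated with a minimal sketch. It assumes that each crop is placed at a random position inside the previous one and that a pairwise hinge loss penalizes a smaller crop whose predicted count exceeds that of the crop containing it. The function names `nested_crops` and `pairwise_ranking_loss`, the `scale` factor, and the `margin` are hypothetical and chosen for illustration; this is not the repository's sampling code or its Caffe loss layer.

```python
import random


def nested_crops(width, height, k=5, scale=0.75, rng=None):
    """Return k boxes (left, top, right, bottom), each contained in the previous one.

    Because every box lies inside its predecessor, the true person count
    can only stay the same or decrease along the returned list.
    """
    rng = rng or random.Random()
    boxes = [(0, 0, width, height)]
    for _ in range(k - 1):
        left, top, right, bottom = boxes[-1]
        w, h = right - left, bottom - top
        new_w, new_h = max(1, int(w * scale)), max(1, int(h * scale))
        # Place the smaller box at a random offset inside the previous box.
        new_left = left + rng.randint(0, w - new_w)
        new_top = top + rng.randint(0, h - new_h)
        boxes.append((new_left, new_top, new_left + new_w, new_top + new_h))
    return boxes


def pairwise_ranking_loss(count_large, count_small, margin=0.0):
    """Hinge loss on one ranked pair of predicted counts.

    The loss is zero when the prediction for the larger crop is at least
    the prediction for the crop it contains, plus the margin.
    """
    return max(0.0, count_small - count_large + margin)
```

In a multi-task setup, a loss of this kind would be summed over ranked pairs from unlabeled images and combined with the density-map regression loss computed on labeled images.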
Requirements
All training and testing are done in the Caffe framework.
- Requirements for caffe and pycaffe (see: Caffe installation instructions). Caffe must be built with support for Python layers!
# In your Makefile.config, make sure to have this line uncommented
WITH_PYTHON_LAYER := 1
- Download the pre-trained VGG-16 ImageNet model for finetuning.
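One way to confirm that a Caffe build includes Python layer support is to check whether the `Python` layer type is registered. The sketch below assumes pycaffe is on the Python path and that `caffe.layer_type_list()` is exposed by the installed version; the helper name `caffe_has_python_layer` is hypothetical.

```python
def caffe_has_python_layer():
    """Return True or False if pycaffe is importable, or None if it is not."""
    try:
        import caffe
    except ImportError:
        # pycaffe is not installed or not on the Python path.
        return None
    # layer_type_list() enumerates the layer types registered in this build;
    # "Python" is only present when Caffe was compiled with WITH_PYTHON_LAYER := 1.
    return "Python" in caffe.layer_type_list()


if __name__ == "__main__":
    print(caffe_has_python_layer())
```

A result of `False` suggests that Caffe needs to be rebuilt with the `WITH_PYTHON_LAYER := 1` line uncommented in `Makefile.config`.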
Pre-trained models
The pre-trained models are available to download.
Useful tools
We use the code from here to download and prepare the datasets, generate the density maps, and evaluate the models.
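As a rough illustration of the evaluation step, the sketch below assumes that the count of an image is obtained by summing its density map and that models are compared using mean absolute error and root mean squared error, the metrics commonly reported for crowd counting. The function names are hypothetical and do not reflect the interface of the linked tools.

```python
import math


def count_from_density(density_map):
    """Estimate the number of persons as the sum over a 2-D density map."""
    return sum(sum(row) for row in density_map)


def mae_rmse(predicted_counts, true_counts):
    """Return (mean absolute error, root mean squared error) over a test set."""
    errors = [p - t for p, t in zip(predicted_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

For example, `mae_rmse([10, 20], [12, 16])` compares two predicted counts against their ground-truth counts and returns both error measures as a tuple.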