Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data
Official PyTorch implementation of the method SLidR. More details can be found in the paper:
Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data, CVPR 2022 [arXiv] by Corentin Sautier, Gilles Puy, Spyros Gidaris, Alexandre Boulch, Andrei Bursuc, and Renaud Marlet
If you use SLidR in your research, please consider citing:
@InProceedings{SLidR,
author = {Sautier, Corentin and Puy, Gilles and Gidaris, Spyros and Boulch, Alexandre and Bursuc, Andrei and Marlet, Renaud},
title = {Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {9891-9901}
}
Dependencies
Please install the required required packages. Some libraries used in this project, including MinkowskiEngine and Pytorch-lightning are known to have a different behavior when using a different version; please use the exact versions specified in requirements.txt
.
Datasets
The code provided is compatible with nuScenes and semantic KITTI. Put the datasets you intend to use in the datasets folder (a symbolic link is accepted).
Pre-trained models
Minkowski SR-UNet
SR-UNet pre-trained on nuScenes
SPconv VoxelNet
VoxelNet pre-trained on nuScenes
Reproducing the results
Pre-computing the superpixels (required)
Before launching the pre-training, you first need to compute all superpixels on nuScenes, this can take several hours. You can either compute superpixels for the Minkowski SR-UNet (minkunet) or the voxelnet backbones. The first is adapted for semantic segmentation and the second for object detection.
python superpixel_segmenter.py --model minkunet
Pre-training a 3D backbone
To launch a pre-training of the Minkowski SR-UNet (minkunet) on nuScenes:
python pretrain.py --cfg config/slidr_minkunet.yaml
You can alternatively replace minkunet with voxelnet to pre-train a PV-RCNN backbone.
Weights of the pre-training can be found in the output folder, and can be re-used during a downstream task.
If you wish to use multiple GPUs, please scale the learning rate and batch size accordingly.
Semantic segmentation
To launch a semantic segmentation, use the following command:
python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="output/pretrain/[...]/model.pt"
with the previously obtained weights, and any config file. The default config will perform a finetuning on 1% of nuScenes' training set, with the learning rates optimized for the provided pre-training.
To re-evaluate the score of any downstream network, run:
python evaluate.py --resume_path="output/downstream/[...]/model.pt" --dataset="nuscenes"
If you wish to reevaluate the linear probing, the experiments in the paper were obtained with lr=0.05
, lr_head=null
and freeze_layers=True
.
Object detection
All experiments for object detection have been done using OpenPCDet.
Published results
All results are obtained with a pre-training on nuScenes.
Few-shot semantic segmentation
Results on the validation set using Minkowski SR-Unet:
Method | nuScenes lin. probing |
nuScenes Finetuning with 1% data |
KITTI Finetuning with 1% data |
---|---|---|---|
Random init. | 8.1 | 30.3 | 39.5 |
PointContrast | 21.9 | 32.5 | 41.1 |
DepthContrast | 22.1 | 31.7 | 41.5 |
PPKT | 36.4 | 37.8 | 43.9 |
SLidR | 38.8 | 38.3 | 44.6 |
Semantic Segmentation on nuScenes
Results on the validation set using Minkowski SR-Unet with a fraction of the training labels:
Method | 1% | 5% | 10% | 25% | 100% |
---|---|---|---|---|---|
Random init. | 30.3 | 47.7 | 56.6 | 64.8 | 74.2 |
SLidR | 39.0 | 52.2 | 58.8 | 66.2 | 74.6 |
Object detection on KITTI
Results on the validation set using Minkowski SR-Unet with a fraction of the training labels:
Method | 5% | 10% | 20% |
---|---|---|---|
Random init. | 56.1 | 59.1 | 61.6 |
PPKT | 57.8 | 60.1 | 61.2 |
SLidR | 57.8 | 61.4 | 62.4 |
Unpublished preliminary results
All results are obtained with a pre-training on nuScenes.
PV-RCNN:
Results on the validation set usingMethod | Car | Pedestrian | Cyclist | mAP@40 |
---|---|---|---|---|
Random init. | 84.5 | 57.9 | 71.3 | 71.3 |
STRL* | 84.7 | 57.8 | 71.9 | 71.5 |
PPKT | 83.2 | 55.5 | 73.8 | 70.8 |
SLidR | 84.4 | 57.3 | 74.2 | 71.9 |
*STRL has been pre-trained on KITTI, while SLidR and PPKT were pre-trained on nuScenes
SECOND:
Results on the validation set usingMethod | Car | Pedestrian | Cyclist | mAP@40 |
---|---|---|---|---|
Random init. | 81.5 | 50.9 | 66.5 | 66.3 |
DeepCluster* | 66.1 | |||
SLidR | 81.9 | 51.6 | 68.5 | 67.3 |
*As reimplemented in ONCE
Visualizations
For visualization you need a pre-training containing both 2D & 3D models. We provide the raw SR-UNet & ResNet50 pre-trained on nuScenes. The image part of the pre-trained weights are identical for almost all layers to those of MoCov2 (He et al.)
The visualization code allows to assess the similarities between points and pixels, as shown in the article.
Acknowledgment
Part of the codebase has been adapted from PointContrast. Computation of the lovasz loss used in semantic segmentation follows the code of PolarNet.
License
SLidR is released under the Apache 2.0 license.