PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
CVPR 2023
TL;DR: PLA leverages powerful VL foundation models to construct hierarchical 3D-text pairs for 3D open-world learning.
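To make the TL;DR concrete, here is a minimal, assumed sketch of what hierarchical 3D-text pairing can look like: a scene-level caption is associated with all points, while a view-level caption is associated only with the points falling inside that camera's frustum. The function names, the simple angular frustum test, and the data are illustrative placeholders, not the paper's actual caption pipeline.

```python
import numpy as np

def view_mask(points, cam_pos, cam_dir, fov_cos=0.5):
    """Mask of points whose direction from the camera lies within the field of view.

    This is a toy visibility test (angle threshold against the viewing
    direction), standing in for a real projection into the image plane.
    """
    d = points - cam_pos
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return d @ cam_dir > fov_cos

def build_pairs(points, scene_caption, views):
    """Build hierarchical (point-mask, caption) pairs.

    views: list of (cam_pos, cam_dir, caption) tuples.
    Scene level: one caption covering every point.
    View level: one caption per camera, covering only visible points.
    """
    pairs = [(np.ones(len(points), dtype=bool), scene_caption)]  # scene level
    for pos, direction, cap in views:                            # view level
        pairs.append((view_mask(points, pos, direction), cap))
    return pairs

# Toy data: three points, one camera looking along +x.
pts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
pairs = build_pairs(
    pts,
    "a room with a piano",
    [(np.zeros(3), np.array([1.0, 0.0, 0.0]), "a piano near a desk")],
)
```

In this sketch the scene-level pair covers all three points, while the view-level pair covers only the point in front of the camera; the real method builds such pairs from captions produced by VL foundation models on posed images.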
Open-vocabulary demo categories: working space, piano, vending machine.
TODO
- Release caption processing code
Getting Started
Installation
Please refer to INSTALL.md for the installation.
Dataset Preparation
Please refer to DATASET.md for dataset preparation.
Training & Inference
Please refer to MODEL.md for training and inference scripts and pretrained models.
Citation
If you find this project useful in your research, please consider citing:
@inproceedings{ding2022language,
title={PLA: Language-Driven Open-Vocabulary 3D Scene Understanding},
author={Ding, Runyu and Yang, Jihan and Xue, Chuhui and Zhang, Wenqing and Bai, Song and Qi, Xiaojuan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2023}
}
Acknowledgement
The code is partly borrowed from OpenPCDet, PointGroup, and SoftGroup.