CLIP-goes-3D
Official code for the paper "CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition"
This repository includes the pre-trained models and the evaluation and training code for the pre-training, zero-shot, and fine-tuning experiments of CG3D. It is built on the Point-BERT codebase. Please see the end of this document for a full list of code references.
To-Do:
- Setup
- Model weights from pre-training
- Model weights from fine-tuning
- Pre-training CG3D
- Zero-shot inference
- Fine-tune inference
- Fine-tuning CG3D
- Retrieval
Environment set-up
The known working environment configuration is:
- Python 3.9
- PyTorch 1.12
- CUDA 11.6
- Install the conda virtual environment using the provided `.yml` file.

```
conda env create -f environment.yml
```

(OR)
- Install dependencies manually.

```
conda create -n cg3d
conda activate cg3d
pip install -r requirements.txt
conda install -c anaconda scikit-image scikit-learn scipy
pip install git+https://github.com/openai/CLIP.git
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
```

- Build the Chamfer distance extension.

```
cd ./extensions/chamfer_dist
python setup.py develop
```
- Build modified timm from scratch.

```
cd ./models/SLIP/pytorch-image-models
pip install -e .
```
- Install PointNet ops.

```
cd third_party/Pointnet2_PyTorch
pip install -e .
pip install pointnet2_ops_lib/.
```
- Install PyGeM.

```
cd third_party/PyGeM
python setup.py install
```
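After installation, a quick sanity check like the one below can confirm that the main dependencies import and that CUDA is visible to PyTorch. This is only a minimal sketch; the module names for the KNN and PointNet++ ops (`knn_cuda`, `pointnet2_ops`) are assumed from the packages installed above.

```python
# Minimal import/CUDA sanity check for the environment set up above.
import torch
import clip            # openai/CLIP
import timm            # modified build from models/SLIP/pytorch-image-models
import knn_cuda        # KNN_CUDA wheel (module name assumed)
import pointnet2_ops   # pointnet2_ops_lib (module name assumed)

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("CLIP models:", clip.available_models())
```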
Dataset set-up
- Download the point cloud datasets for pre-training and fine-tuning:
  - Download ShapeNetCore v2.
  - Download ModelNet.
  - Download ScanObjectNN.
- Save and unzip the above datasets.
- Render views of the textured CAD models of ShapeNet using this repository. We use a scale of 0.7 and 5 total views.
- The data should be organized as follows:

```
├── data (this may be wherever you choose)
│   ├── modelnet40_normal_resampled
│   │   ├── modelnet10/40_shape_names.txt
│   │   ├── modelnet10/40_train/test.txt
│   │   ├── airplane
│   │   ├── ...
│   │   ├── laptop
│   ├── ShapeNet55
│   │   ├── train.txt
│   │   ├── test.txt
│   │   ├── shapenet_pc
│   │   │   ├── 03211117-62ac1e4559205e24f9702e673573a443.npy
│   │   │   ├── ...
│   ├── shapenet_render
│   │   ├── train_img.txt
│   │   ├── val_img.txt
│   │   ├── shape_names.txt
│   │   ├── taxonomy.json
│   │   ├── camera
│   │   ├── img
│   │   │   ├── 02691156
│   │   │   ├── ...
│   ├── ScanObjectNN
│   │   ├── main_split
│   │   ├── ...
```
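For reference, each entry under `shapenet_pc` is a plain NumPy file, expected to hold an (N, 3) array of xyz coordinates, so a single sample can be inspected directly. The snippet below is a minimal sketch rather than the project's data loader: the path is the example file from the tree above, and the random subsampling and unit-sphere normalization are shown only as common point-cloud conventions.

```python
import numpy as np

# Example file from the tree above; adjust the root to wherever your data lives.
pc = np.load("data/ShapeNet55/shapenet_pc/03211117-62ac1e4559205e24f9702e673573a443.npy")
print(pc.shape)  # expected (N, 3) xyz coordinates

# Illustrative random subsampling to the value passed via --npoints.
npoints = 1024
idx = np.random.choice(pc.shape[0], npoints, replace=False)
pc = pc[idx]

# Center and scale to the unit sphere (a common point-cloud normalization).
pc = pc - pc.mean(axis=0)
pc = pc / np.max(np.linalg.norm(pc, axis=1))
```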
1) Model weights
a) Pre-trained CG3D weights
Download SLIP model weights from here.
PointTransformer
No. of points | Model file | Task | Configuration file |
---|---|---|---|
1024 | download | Pre-training | link |
8192 | download | Pre-training | link |
PointMLP
No. of points | Model file | Task | Configuration file |
---|---|---|---|
1024 | download | Pre-training | link |
8192 | download | Pre-training | link |
Test Zero-Shot performance
```
python eval.py --config cfgs/ShapeNet55_models/{CONFIG} --exp_name {NAME FOR EXPT} --ckpts {CKPT PATH} --slip_model {PATH TO SLIP MODEL} --zshot --npoints {1024,8192}
```
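Conceptually, zero-shot recognition follows the standard CLIP recipe: class names are embedded with the text encoder, point clouds are embedded with the pre-trained 3D encoder, and each point cloud is assigned the class whose text embedding it is most similar to. The sketch below illustrates only that matching step; the `ViT-B/32` text encoder, the prompt template, and the random stand-in point features are assumptions for illustration, not what `eval.py` actually uses.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)  # stand-in for the SLIP/CLIP text encoder

class_names = ["airplane", "chair", "laptop"]          # e.g. ModelNet shape names
prompts = clip.tokenize([f"a point cloud of a {c}" for c in class_names]).to(device)

with torch.no_grad():
    text_feat = clip_model.encode_text(prompts).float()
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

# Stand-in for (B, D) features from the pre-trained 3D encoder on a batch of point clouds.
point_feat = torch.randn(4, text_feat.shape[-1], device=device)
point_feat = point_feat / point_feat.norm(dim=-1, keepdim=True)

logits = point_feat @ text_feat.t()   # cosine similarities, shape (B, num_classes)
pred = logits.argmax(dim=-1)          # predicted class index per point cloud
print([class_names[i] for i in pred.tolist()])
```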
b) Fine-tuning model weights
PointTransformer
Dataset | Model Weights | TFBoard |
---|---|---|
ScanObjectNN | download | link |
ModelNet | download | link |
PointMLP
Dataset | Model Weights | TFBoard |
---|---|---|
ScanObjectNN | download | link |
ModelNet | download | link |
2) Training CG3D
a) Pre-training
- Change the data paths to the relevant locations in `cfgs/dataset_configs/`.

- Pre-train PointTransformer on ShapeNet under the CG3D framework:

```
python main.py --exp_name {NAME FOR EXPT} --config cfgs/ShapeNet55_models/PointTransformerVPT.yaml --pretrain --out_dir {OUTPUT DIR PATH} --text --image --clip --VL SLIP --visual_prompting --npoints 1024 --slip_model {PATH TO SLIP MODEL}
```

- Pre-train PointMLP on ShapeNet under the CG3D framework:

```
python main.py --exp_name {NAME FOR EXPT} --config cfgs/ShapeNet55_models/PointMLP_VPT.yaml --pretrain --out_dir {OUTPUT DIR PATH} --text --image --clip --VL SLIP --visual_prompting --npoints 1024 --slip_model {PATH TO SLIP MODEL}
```
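Pre-training aligns the trainable 3D encoder's features with the frozen vision-language model's text and image embeddings through contrastive learning (branches which the `--text` and `--image` flags above presumably toggle). The snippet below is only a rough sketch of one symmetric, InfoNCE-style contrastive term with random stand-in features; the actual losses are defined in the training code.

```python
import torch
import torch.nn.functional as F

def info_nce(feat_a, feat_b, temperature=0.07):
    """Symmetric contrastive loss between two batches of aligned features.

    feat_a: (B, D) point-cloud features; feat_b: (B, D) text or image features
    from the frozen vision-language encoders. Row i of each batch is a positive pair.
    """
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    logits = feat_a @ feat_b.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(feat_a.size(0), device=feat_a.device)  # diagonal entries are positives
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Illustrative usage with random stand-in features.
point_feat, text_feat = torch.randn(8, 512), torch.randn(8, 512)
print(info_nce(point_feat, text_feat))
```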
Zero-Shot Inference
```
python eval.py --config cfgs/ShapeNet55_models/{CONFIG} --exp_name {NAME FOR EXPT} --ckpts {CKPT PATH} --slip_model {PATH TO SLIP MODEL} --zshot --npoints {1024,8192}
```
Fine-tuning Inference
```
python eval.py --config cfgs/{ModelNet_models,ScanObjectNN_models}/{CONFIG} --exp_name {NAME FOR EXPT} --ckpts {CKPT PATH} --npoints {1024,8192}
```
b) Fine-tuning
Fine-tune PointTransformer:

```
CUBLAS_WORKSPACE_CONFIG=:4096:8 CUDA_VISIBLE_DEVICES=0 python finetune_cg3d.py --config cfgs/ModelNet_models/PointTransformer.yaml --exp_name {NAME OF EXPT} --finetune_model --ckpts {PATH OF PRETRAINED MODEL WEIGHTS} --dataset_root {PATH OF DATA STORAGE}
```

Fine-tune PointMLP:

```
CUBLAS_WORKSPACE_CONFIG=:4096:8 CUDA_VISIBLE_DEVICES=0 python finetune_cg3d.py --config cfgs/ModelNet_models/PointMLP.yaml --exp_name {NAME OF EXPT} --finetune_model --ckpts {PATH OF PRETRAINED MODEL WEIGHTS} --dataset_root {PATH OF DATA STORAGE}
```

To fine-tune on ScanObjectNN (or another dataset) instead of ModelNet, use the corresponding configuration file (e.g., under `cfgs/ScanObjectNN_models/`) and adjust the dataset paths in the `.yaml` files accordingly.
References
Citation
```
@article{hegde2023clip,
  title={CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition},
  author={Hegde, Deepti and Valanarasu, Jeya Maria Jose and Patel, Vishal M},
  journal={arXiv preprint arXiv:2303.11313},
  year={2023}
}
```