
Official Pytorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)

Probabilistic Cross-Modal Embedding (PCME) CVPR 2021

Official Pytorch implementation of PCME | Paper

Sanghyuk Chun (1), Seong Joon Oh (1), Rafael Sampaio de Rezende (2), Yannis Kalantidis (2), Diane Larlus (2)

(1) NAVER AI LAB
(2) NAVER LABS Europe

Updates

  • 16 Jul, 2022: Added PCME CutMix-pretrained weights (used for the ECCV Caption paper).
  • 23 Jun, 2021: Initial upload.

Installation

Install the dependencies using the following commands.

pip install cython && pip install -r requirements.txt
python -c 'import nltk; nltk.download("punkt", download_dir="/opt/conda/nltk_data")'
git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dockerfile

You can also use my Docker image:

docker pull sanghyukchun/pcme:torch1.2-apex-dali

Please add --model__cache_dir /vector_cache when you run the code with this image.

Configuration

All experiments are based on configuration files (see config/coco and config/cub). If you want to change only a few options, you can override them on the command line instead of writing a new configuration file:

python <train | eval>.py --dataloader__batch_size 32 --dataloader__eval_batch_size 8 --model__eval_method matching_prob

See config/parser.py for details
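
The double-underscore flags address nested configuration keys: for example, --dataloader__batch_size 32 overrides batch_size under dataloader. The actual logic lives in config/parser.py; the snippet below is only a minimal sketch of the idea. The helper name apply_overrides, the literal parsing, and the default values in the example config are assumptions made for illustration, not the repository's code.

```python
import ast


def apply_overrides(config, argv):
    """Apply `--section__key value` overrides to a nested dict (hypothetical helper)."""
    args = list(argv)
    for flag, raw in zip(args[0::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        # "dataloader__batch_size" -> parents ["dataloader"], leaf "batch_size"
        *parents, leaf = flag[2:].split("__")
        node = config
        for name in parents:
            node = node.setdefault(name, {})
        try:
            value = ast.literal_eval(raw)  # "32" -> 32, "0.1" -> 0.1
        except (ValueError, SyntaxError):
            value = raw  # keep plain strings such as "matching_prob"
        node[leaf] = value
    return config


# Illustrative defaults only; the real values come from the YAML files.
config = {
    "dataloader": {"batch_size": 128, "eval_batch_size": 32},
    "model": {"eval_method": "elk"},
}
apply_overrides(config, [
    "--dataloader__batch_size", "32",
    "--dataloader__eval_batch_size", "8",
    "--model__eval_method", "matching_prob",
])
```

After the call, config["dataloader"]["batch_size"] is the integer 32 and config["model"]["eval_method"] is the string "matching_prob", while keys that were not overridden keep their file values.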

Dataset preparation

COCO Caption

We followed the same split provided by VSE++. Dataset splits can be found in datasets/annotations.

Note that instances_<train | val>2014.json is also needed to compute the PMRP score.
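
The instance annotations list the object categories present in each image, which is presumably the information PMRP uses to decide which items count as plausible matches. As a rough sketch, assuming the standard COCO layout where each entry of "annotations" carries an image_id and a category_id, the per-image category sets could be collected as follows. The function name image_category_sets and the toy ids are made up for illustration; in practice the dictionary would come from json.load on the instances file.

```python
from collections import defaultdict


def image_category_sets(instances):
    """Map each image_id to the set of category_ids annotated on it."""
    labels = defaultdict(set)
    for ann in instances["annotations"]:
        labels[ann["image_id"]].add(ann["category_id"])
    return dict(labels)


# Tiny in-memory stand-in for a parsed instances file (made-up ids).
toy_instances = {
    "annotations": [
        {"image_id": 1, "category_id": 18},
        {"image_id": 1, "category_id": 1},
        {"image_id": 2, "category_id": 18},
        {"image_id": 1, "category_id": 18},  # repeated category on the same image
    ]
}
labels = image_category_sets(toy_instances)
```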

CUB Caption

Download the images (CUB-200-2011) from this link, and download the captions from reedscot/cvpr2016. The image path and the caption path can be passed to the code separately.

Evaluate pretrained models

NOTE: the current implementation of plausible match R-Precision (PMRP) is not efficient:
it first dumps all ranked items for each query to a local file, and then computes R-Precision from that file.
We are planning to re-implement an efficient PMRP as soon as possible.
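
For orientation, R-Precision for a single query is the fraction of relevant items among the top R ranked items, where R is the number of relevant items for that query. The snippet below is a minimal sketch of that definition only; it is not the repository's evaluation code, and the function name r_precision and the toy ids are hypothetical. For PMRP, the relevant set would be the plausible matches of the query.

```python
def r_precision(ranked_ids, relevant_ids):
    """Fraction of the top-R ranked items that are relevant, with R = number of relevant items."""
    relevant = set(relevant_ids)
    r = len(relevant)
    if r == 0:
        return 0.0  # convention chosen here for queries without any relevant item
    hits = sum(1 for item in ranked_ids[:r] if item in relevant)
    return hits / r


# Toy ranking for one query (made-up ids), best match first.
ranking = ["c7", "c2", "c9", "c4", "c1"]
plausible = {"c2", "c4", "c7"}
score = r_precision(ranking, plausible)  # two of the top three are plausible
```

Averaging this score over all queries gives a dataset-level number.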

COCO Caption

# Compute recall metrics
python evaluate_recall_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root <your_dataset_path> \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image

# Compute plausible match R-Precision (PMRP) metric
python extract_rankings_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root <your_dataset_path> \
    --model_path model_last.pth \
    --dump_to <dumped_ranking_file> \
    # --model__cache_dir /vector_cache # if you use my docker image

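python evaluate_pmrp_coco.py --ranking_file <dumped_ranking_file>

| Method                   | I2T 1K PMRP | I2T 1K R@1 | I2T ECCV mAP@R | T2I 1K PMRP | T2I 1K R@1 | T2I ECCV mAP@R | Model file |
|--------------------------|-------------|------------|----------------|-------------|------------|----------------|------------|
| PCME                     | 45.0        | 68.8       | 26.2           | 46.0        | 54.6       | 48.0           | link       |
| PCME (CutMix-pretrained) | 46.2        | 68.3       | 28.6           | 47.1        | 56.7       | 54.9           | link       |
| PVSE K=1                 | 40.3        | 66.7       | 23.4           | 41.8        | 53.5       | 44.6           | -          |
| PVSE K=2                 | 42.8        | 69.2       | 26.7           | 43.6        | 55.2       | 53.8           | -          |
| VSRN                     | 41.2        | 76.2       | 30.8           | 42.4        | 62.8       | 53.8           | -          |
| VSRN + AOQ               | 44.7        | 77.5       | 30.7           | 45.6        | 63.5       | 51.2           | -          |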

See the ECCV Caption dataset for more details on "ECCV mAP@R".

CUB Caption

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root <your_dataset_path> \
    --caption_root <your_caption_path> \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image

NOTE: If you download the captions directly from reedscot/cvpr2016, caption_root will be cvpr2016_cub/text_c10.

If you want to test other probabilistic distances, such as Wasserstein distance or KL-divergence, try the following command:

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root <your_dataset_path> \
    --caption_root <your_caption_path> \
    --model_path model_last.pth \
    --model__eval_method <distance_method> \
    # --model__cache_dir /vector_cache # if you use my docker image

You can choose distance_method in ['elk', 'l2', 'min', 'max', 'wasserstein', 'kl', 'reverse_kl', 'js', 'bhattacharyya', 'matmul', 'matching_prob']
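
To give a feel for what two of these options measure, here is a small sketch of the closed-form 2-Wasserstein distance and KL divergence between Gaussians with diagonal covariance. It assumes each embedding is described by a mean vector and a per-dimension standard deviation. The function names are hypothetical, and whether the repository's 'wasserstein' and 'kl' options use exactly these forms (for example with or without the square root) is an assumption.

```python
import math


def wasserstein2(mu1, sigma1, mu2, sigma2):
    """2-Wasserstein distance between diagonal Gaussians N(mu, diag(sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    sq += sum((a - b) ** 2 for a, b in zip(sigma1, sigma2))
    return math.sqrt(sq)


def kl_divergence(mu1, sigma1, mu2, sigma2):
    """KL(N1 || N2) for diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for m1, s1, m2, s2 in zip(mu1, sigma1, mu2, sigma2):
        total += math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5
    return total


# Two toy 2-D embeddings with unit standard deviation (made-up numbers).
mu_a, sigma_a = [0.0, 0.0], [1.0, 1.0]
mu_b, sigma_b = [3.0, 4.0], [1.0, 1.0]
w2 = wasserstein2(mu_a, sigma_a, mu_b, sigma_b)
kl = kl_divergence(mu_a, sigma_a, mu_b, sigma_b)
reverse_kl = kl_divergence(mu_b, sigma_b, mu_a, sigma_a)
```

KL divergence is asymmetric in general, which is why a reverse direction is a separate choice; in this toy example both directions coincide only because the two standard deviations are equal.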

How to train

NOTE: we train each model with mixed-precision training (O2) on a single V100.
The current code does not support multi-GPU training, so the batch size must be reduced on hardware with less memory.
Please note that, as a consequence, the results may not be reproducible on hardware smaller than a V100.

COCO Caption

python train_coco.py ./config/coco/pcme_coco.yaml --dataset_root <your_dataset_path> \
    # --model__cache_dir /vector_cache # if you use my docker image

It takes about 46 hours on a single V100 with mixed-precision training.

CUB Caption

We use the CUB Caption dataset (Reed, et al. 2016) as a new cross-modal retrieval benchmark. Here, instead of matching only the sparse annotated image-caption pairs, we treat all image-caption pairs in the same class as positive. Since our split is based on the zero-shot learning benchmark (Xian, et al. 2017), we hold out 50 of the 200 bird classes for evaluation.

  • Reed, Scott, et al. "Learning deep representations of fine-grained visual descriptions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • Xian, Yongqin, Bernt Schiele, and Zeynep Akata. "Zero-shot learning-the good, the bad and the ugly." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
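
The class-level matching rule described above can be sketched as follows: an image and a caption count as a positive pair whenever their class labels agree, regardless of whether the caption was written for that image. This is an illustrative snippet with made-up labels, and the helper name match_matrix is hypothetical rather than taken from the repository.

```python
def match_matrix(image_classes, caption_classes):
    """matrix[i][j] is True when image i and caption j share a class label."""
    return [[ic == cc for cc in caption_classes] for ic in image_classes]


# Toy labels: three images and four captions from bird classes 3 and 7.
image_classes = [3, 3, 7]
caption_classes = [3, 7, 7, 3]
matches = match_matrix(image_classes, caption_classes)
```

Each row then marks every caption of the same class as a positive for that image, so one image can have many positives.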

Hyperparameter search

We additionally use the cross-validation splits by (Xian, et al. 2017), namely 100 classes for training and 50 classes for validation.

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root <your_dataset_path> \
    --caption_root <your_caption_path> \
    --dataset_name cub_trainval1 \
    # --model__cache_dir /vector_cache # if you use my docker image

Similarly, you can use cub_trainval2 and cub_trainval3 as well.

Training with full training classes

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root <your_dataset_path> \
    --caption_root <your_caption_path> \
    # --model__cache_dir /vector_cache # if you use my docker image

It takes about 4 hours on a single V100 with mixed-precision training.

How to cite

@inproceedings{chun2021pcme,
    title={Probabilistic Embeddings for Cross-Modal Retrieval},
    author={Chun, Sanghyuk and Oh, Seong Joon and De Rezende, Rafael Sampaio and Kalantidis, Yannis and Larlus, Diane},
    year={2021},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
}

License

MIT License

Copyright (c) 2021-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
