Is synthetic data from generative models ready for image recognition? (ICLR 2023, Spotlight)
By Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi.

Abstract

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images. Though the results are astonishing to human eyes, how applicable these generated images are for recognition tasks remains under-explored. In this work, we extensively study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks, focusing on two perspectives: synthetic data for improving classification models in data-scarce settings (i.e., zero-shot and few-shot), and synthetic data for large-scale model pre-training for transfer learning. We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.


Getting started

  1. Clone our repo: git clone https://github.com/CVMI-Lab/SyntheticData.git

  2. Install dependencies:

    conda create -n SyntheticData python=3.7
    conda activate SyntheticData
    pip install -r requirements.txt

Zero-shot settings

Synthetic data generation

Language Enhancement

We generate sentences from label names of a specific dataset and save the generated sentences offline.

Set the target label space in the labels variable in src/LE.py, then run:

python3.7 src/LE.py 200 /path/to/save/dataset.pkl

where 200 is the number of sentences per label and the second argument is the save path for the generated sentences.
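
To illustrate what this step produces, here is a minimal sketch (not the repo's actual src/LE.py) that generates a few sentences per label with the keytotext library credited in the acknowledgements and pickles them as a dict mapping each label to its sentences; the label list, model name, and output format are assumptions and may differ from the real script.

# Minimal language-enhancement sketch: keyword-to-text sentences per label.
# Assumptions: keytotext's pipeline API, a toy label list, and a dict-of-lists pickle format.
import pickle
import sys

from keytotext import pipeline  # keyword-to-text generation


def main(num_sentences: int, save_path: str) -> None:
    labels = ["forest", "river", "highway"]  # hypothetical label space; use your dataset's labels
    nlp = pipeline("k2t-base")  # pretrained keyword-to-text model

    sentences = {}
    for label in labels:
        # Sample several diverse sentences that mention the label.
        sentences[label] = [nlp([label], do_sample=True) for _ in range(num_sentences)]

    with open(save_path, "wb") as f:
        pickle.dump(sentences, f)


if __name__ == "__main__":
    main(int(sys.argv[1]), sys.argv[2])  # e.g. 200 sentences per label, saved to dataset.pkl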

Text-to-Image generation

We use GLIDE for text-to-image generation, and follow the official instructions for the generation process.

We use text generated from language enhancement as prompts for the text-to-image generation.

We provide a multi-GPU generation example in src/glide/glide_zsl.py; run it like:

sh glide/gen_zsl.sh /path/to/save/dataset.pkl /path/to/save/dataset
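
For reference, here is a minimal single-GPU sketch of text-conditioned sampling with the official glide-text2im package. It omits the classifier-free guidance and 256x256 upsampler stages used in practice, and it assumes the language-enhancement pickle is a dict mapping each label to a list of sentences, so treat it as an illustration rather than a substitute for src/glide/glide_zsl.py.

# Minimal GLIDE base-model sampling sketch (no classifier-free guidance, no upsampler).
import pickle

import torch as th
from PIL import Image
from glide_text2im.download import load_checkpoint
from glide_text2im.model_creation import (
    create_model_and_diffusion,
    model_and_diffusion_defaults,
)

device = th.device("cuda" if th.cuda.is_available() else "cpu")

# Build the 64x64 base model with default options and a shortened sampling schedule.
options = model_and_diffusion_defaults()
options["use_fp16"] = th.cuda.is_available()
options["timestep_respacing"] = "100"
model, diffusion = create_model_and_diffusion(**options)
model.eval()
if options["use_fp16"]:
    model.convert_to_fp16()
model.to(device)
model.load_state_dict(load_checkpoint("base", device))

# Load prompts from language enhancement (assumed: dict of label -> list of sentences).
with open("/path/to/save/dataset.pkl", "rb") as f:
    prompts = pickle.load(f)
prompt = next(iter(prompts.values()))[0]
batch_size = 4

# Tokenize the prompt and pad to the model's text context length.
tokens = model.tokenizer.encode(prompt)
tokens, mask = model.tokenizer.padded_tokens_and_mask(tokens, options["text_ctx"])
model_kwargs = dict(
    tokens=th.tensor([tokens] * batch_size, device=device),
    mask=th.tensor([mask] * batch_size, dtype=th.bool, device=device),
)

# Sample 64x64 images conditioned on the prompt.
samples = diffusion.p_sample_loop(
    model,
    (batch_size, 3, options["image_size"], options["image_size"]),
    device=device,
    clip_denoised=True,
    progress=True,
    model_kwargs=model_kwargs,
)

# Map samples from [-1, 1] to uint8 and save.
arr = ((samples + 1) * 127.5).clamp(0, 255).to(th.uint8).permute(0, 2, 3, 1).cpu().numpy()
for i, im in enumerate(arr):
    Image.fromarray(im).save(f"sample_{i}.png")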

CLIP Filter

We use CLIP to help filter out unreliable images:

# under dir: classifier-tuning
python3.7 src/select_glide_ims_by_clip.py /path/to/synthetic/dataset 10 # 10 is the number of classes in the given task
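The script above is the repo's filtering entry point; as an illustration of the underlying idea, the sketch below scores each synthetic image against its class-name prompt with CLIP and keeps only images above a similarity threshold. The directory layout (one sub-folder per class), the prompt template, and the threshold are assumptions; the actual script may instead rank images and keep a fixed number per class.

# Sketch of CLIP-based filtering of synthetic images (not the repo's select_glide_ims_by_clip.py).
import os

import clip  # https://github.com/openai/CLIP
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)


def filter_class_dir(class_dir: str, class_name: str, threshold: float = 0.25):
    """Return paths of images whose CLIP image-text similarity exceeds `threshold`."""
    text = clip.tokenize([f"a photo of a {class_name}"]).to(device)
    kept = []
    with torch.no_grad():
        text_feat = model.encode_text(text)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        for name in os.listdir(class_dir):
            path = os.path.join(class_dir, name)
            image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
            image_feat = model.encode_image(image)
            image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
            similarity = (image_feat @ text_feat.T).item()
            if similarity > threshold:
                kept.append(path)
    return kept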

Synthetic data for ZSL: Classifier-Tuning with CLIP

We build on the Wise-ft codebase. Below is an example for the Eurosat dataset.

The --model argument can be set to RN50 or ViT-B/16.

Note that you need to download the validation/test data for each dataset and update the paths in src/classifier-tuning/src/dataset/transfer_datasets.py.

python3.7 src/ct_zsl.py   \
      --freeze-encoder \
      --sl=0.5 \
      --sl_T=2 \
      --train-dataset=Eurosat  \
      --save=/path/to/save/results \
      --epochs=30  \
      --lr=2e-3  \
      --wd=0.1 \
      --batch-size=512  \
      --warmup_length=0 \
      --cache-dir=cache  \
      --model=RN50  \
      --eval-datasets=Eurosat \
      --template=eurosat_template  \
      --results-db=results.jsonl  \
      --data-location=/path/to/synthetic/data | tee results/${exp_name}/train-$now.log

Few-shot settings

Synthetic data generation: Real Guidance

We provide the code for our proposed Real Guidance strategy. First, obtain a set of few-shot images for the given task. You may need to revise the function get_few_shot_images_path_prompt_pairs(), which returns a list of (im_path, prompt) pairs, in src/glide/glide_fsl.py. A conceptual sketch of the real-guidance initialization is given after the command below.

Also, set the variable refer_img_iters to 15, 20, 35, 40, and 50 for 16, 8, 4, 2, and 1 shots, respectively, and make sure that batch_size * batch_size_time * shot = 800 (e.g., for shot = 16, batch_size * batch_size_time should be 50).

We provide a multi-GPU generation example in src/glide/glide_fsl.py; run it like:

sh glide/gen_fsl.sh /path/to/few-shot/images /path/to/save/dataset
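
Conceptually, real guidance is an SDEdit-style initialization: rather than starting the reverse diffusion from pure noise, a real few-shot image is noised up to an intermediate step and only the remaining reverse steps are run, so generations stay close to the real image. The sketch below illustrates this on top of the GLIDE diffusion API used in the zero-shot sketch above; the function name and arguments are hypothetical, and src/glide/glide_fsl.py may implement the details differently.

# Conceptual sketch of real-guidance initialization (SDEdit-style) with a GLIDE diffusion object.
import torch as th


def real_guided_sample(model, diffusion, real_images_64, model_kwargs, start_step):
    """real_images_64: (B, 3, 64, 64) tensor in [-1, 1]; start_step: how many of the final
    reverse diffusion steps to run (cf. refer_img_iters above)."""
    device = real_images_64.device
    batch = real_images_64.shape[0]

    # Noise the real images up to timestep `start_step - 1` instead of sampling pure noise.
    t = th.tensor([start_step - 1] * batch, device=device)
    x = diffusion.q_sample(real_images_64, t)

    # Run only the last `start_step` reverse steps, starting from the noised real images.
    for i in reversed(range(start_step)):
        t = th.tensor([i] * batch, device=device)
        with th.no_grad():
            out = diffusion.p_sample(model, x, t, clip_denoised=True, model_kwargs=model_kwargs)
        x = out["sample"]
    return x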

Synthetic data for FSL: Classifier-Tuning with CLIP

Again, we build on the Wise-ft codebase. The following is an example:

python3.7 src/ct_fsl.py   \
      --freeze-encoder \
      --sl=0.5 \
      --sl_T=2 \
      --train-dataset=Eurosat  \
      --save=/path/to/save/results \
      --epochs=30  \
      --lr=1e-3  \
      --wd=0.1 \
      --batch-size-real=32  \
      --batch-size-syn=512  \
      --loss-weight=1.0 \
      --loss-weight-real=1.0 \
      --warmup_length=0 \
      --cache-dir=cache  \
      --model=RN50  \
      --eval-datasets=Eurosat \
      --template=eurosat_template  \
      --results-db=results.jsonl  \
      --data-location=/path/to/synthetic/data \
      --data-location-real=/path/to/few-shot/data | tee results/${exp_name}/train-$now.log

Pre-training settings

Synthetic data generation

We adopt only the language enhancement strategy for the pre-training setting. Please modify the zero-shot files (src/LE.py, src/glide/glide_zsl.py) to generate synthetic pre-training data.

Pre-training with synthetic data

We recommend the timm codebase for its excellent pre-training implementation. For concrete hyper-parameters, please refer to Sec. C.5.3 in our appendix.
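
As a starting point, here is a minimal sketch of wiring a timm backbone to a synthetic ImageFolder dataset for supervised pre-training; the architecture, augmentations, and optimizer values are placeholder assumptions, and the concrete hyper-parameters should follow Sec. C.5.3.

# Minimal sketch: supervised pre-training of a timm backbone on synthetic images.
# The architecture, augmentation, and optimizer values below are placeholders.
import timm
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
dataset = datasets.ImageFolder("/path/to/synthetic/data", transform=transform)  # one sub-folder per label
loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=8, pin_memory=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = timm.create_model("resnet50", pretrained=False, num_classes=len(dataset.classes)).to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, targets in loader:  # one epoch; wrap in an outer loop and add a schedule for real runs
    images, targets = images.to(device), targets.to(device)
    loss = criterion(model(images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()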

Citing this work

If you find this repo useful for your research, please consider citing our paper:

@article{he2022synthetic,
  title={Is synthetic data from generative models ready for image recognition?},
  author={He, Ruifei and Sun, Shuyang and Yu, Xin and Xue, Chuhui and Zhang, Wenqing and Torr, Philip and Bai, Song and Qi, Xiaojuan},
  journal={arXiv preprint arXiv:2210.07574},
  year={2022}
}

Acknowledgement

We thank the authors of the open-source code of GLIDE, CLIP, keytotext, Wise-ft, timm, Detectron2, DeiT, and MoCo.
