  • Language: Python
  • License: Apache License 2.0

Is synthetic data from generative models ready for image recognition?

Is synthetic data from generative models ready for image recognition? (ICLR 2023, Spotlight)
By Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi.

Abstract

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images. Though the results are astonishing to human eyes, how applicable these generated images are for recognition tasks remains under-explored. In this work, we extensively study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks, and focus on two perspectives: synthetic data for improving classification models in data-scarce settings (i.e., zero-shot and few-shot), and synthetic data for large-scale model pre-training for transfer learning. We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.

Getting started

  1. Clone our repo: git clone https://github.com/CVMI-Lab/SyntheticData.git

  2. Install dependencies:

    conda create -n SyntheticData python=3.7
    conda activate SyntheticData
    pip install -r requirements.txt

Zero-shot settings

Synthetic data generation

Language Enhancement

We generate sentences from label names of a specific dataset and save the generated sentences offline.

Set the target label space in the variable labels in src/LE.py, then run:

python3.7 src/LE.py 200 /path/to/save/dataset.pkl

where 200 is the number of sentences to generate for each label, and the second argument is the path where the generated sentences are saved.
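
As a rough illustration of this offline step, the sketch below builds a fixed number of sentences per label and pickles them. It is not the code in src/LE.py: the template-based make_sentences helper, the dict layout (label mapped to a list of sentences), and the example labels are all assumptions made for illustration. The Acknowledgement lists keytotext, so the actual script presumably uses a learned word-to-sentence model rather than templates.

```python
import pickle
import random
import tempfile
from pathlib import Path

# Hypothetical templates; a stand-in for a learned word-to-sentence model.
TEMPLATES = [
    "a photo of a {}",
    "a {} seen from above",
    "a close-up picture of a {}",
    "a {} on a sunny day",
]


def make_sentences(labels, num_per_label, seed=0):
    """Return {label: [sentence, ...]} with num_per_label sentences per label."""
    rng = random.Random(seed)
    return {
        label: [rng.choice(TEMPLATES).format(label) for _ in range(num_per_label)]
        for label in labels
    }


def save_sentences(sentences, save_path):
    """Pickle the sentence dict so a later generation step can read it."""
    save_path = Path(save_path)
    save_path.parent.mkdir(parents=True, exist_ok=True)
    with save_path.open("wb") as f:
        pickle.dump(sentences, f)


def load_sentences(save_path):
    """Load a sentence dict written by save_sentences."""
    with Path(save_path).open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    labels = ["forest", "river", "highway"]  # assumed example label space
    out = Path(tempfile.mkdtemp()) / "dataset.pkl"
    save_sentences(make_sentences(labels, 200), out)
    print({label: len(s) for label, s in load_sentences(out).items()})
```

Saving the sentences once keeps text generation separate from image generation, so the same prompts can be reused across generation runs.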

Text-to-Image generation

We use GLIDE for text-to-image generation, and follow the official instructions for the generation process.

We use text generated from language enhancement as prompts for the text-to-image generation.

We provide a multi-GPU generation example in src/glide/glide_zsl.py. Run it with:

sh glide/gen_zsl.sh /path/to/save/dataset.pkl /path/to/save/dataset

CLIP Filter

We use CLIP to help filter out unreliable images:

# under dir: classifier-tuning
python3.7 src/select_glide_ims_by_clip.py /path/to/synthetic/dataset 10 # 10 is the number of classes for the given task
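
The filtering idea can be sketched without CLIP itself. The snippet below assumes each synthetic image already has a list of image-text similarity logits, one per class (which CLIP would supply), and keeps an image only when the softmax probability of its intended class reaches a threshold. The threshold rule, the filter_by_confidence name, and the toy numbers are assumptions; the selection rule in src/select_glide_ims_by_clip.py may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_by_confidence(samples, threshold=0.5):
    """Keep (path, label) pairs whose intended class has probability >= threshold.

    samples: iterable of (path, label_index, logits), where logits holds
    one similarity score per class.
    """
    kept = []
    for path, label, logits in samples:
        probs = softmax(logits)
        if probs[label] >= threshold:
            kept.append((path, label))
    return kept


if __name__ == "__main__":
    toy = [
        ("img_0.png", 0, [5.0, 1.0, 0.5]),  # intended class dominates
        ("img_1.png", 1, [4.0, 1.0, 0.5]),  # intended class scores low
        ("img_2.png", 2, [1.0, 1.0, 1.2]),  # scores are nearly uniform
    ]
    for path, label in filter_by_confidence(toy):
        print(path, label)
```

With the default threshold only the first toy image survives; lowering the threshold trades label reliability for more data.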

Synthetic data for ZSL: Classifier-Tuning with CLIP

Our code is adapted from the Wise-ft codebase. Here we provide an example for the EuroSAT dataset.

The --model option can be set to "RN50" or "ViT-B/16".

Note that you need to download the validation/test data for each dataset and update the paths in src/classifier-tuning/src/dataset/transfer_datasets.py.

python3.7 src/ct_zsl.py   \
      --freeze-encoder \
      --sl=0.5 \
      --sl_T=2 \
      --train-dataset=Eurosat  \
      --save=/path/to/save/results \
      --epochs=30  \
      --lr=2e-3  \
      --wd=0.1 \
      --batch-size=512  \
      --warmup_length=0 \
      --cache-dir=cache  \
      --model=RN50  \
      --eval-datasets=Eurosat \
      --template=eurosat_template  \
      --results-db=results.jsonl  \
      --data-location=/path/to/synthetic/data | tee results/${exp_name}/train-$now.log

Here exp_name and now are shell variables that this README does not define; set them before running (for example, to an experiment name and a timestamp), and make sure the results/${exp_name} directory exists, since tee does not create it. The same applies to the few-shot command below.

Few-shot settings

Synthetic data generation: Real Guidance (RG)

We provide the code for our proposed Real Guidance strategy. First, obtain a set of few-shot images for the given task. You may need to revise the function get_few_shot_images_path_prompt_pairs() in src/glide/glide_fsl.py, which returns a list of (im_path, prompt) pairs.

Also, set the variable refer_img_iters to 15, 20, 35, 40, and 50 for 16, 8, 4, 2, and 1 shots, respectively, and choose batch_size and batch_size_time so that batch_size * batch_size_time * shot = 800.
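
The settings above reduce to a small lookup plus one divisibility check. The sketch below encodes the quoted refer_img_iters values and lists the (batch_size, batch_size_time) pairs that satisfy the product constraint. The helper names and the max_batch_size cap are hypothetical and are not part of src/glide/glide_fsl.py.

```python
# Values quoted in the text above: shot -> refer_img_iters.
REFER_IMG_ITERS = {16: 15, 8: 20, 4: 35, 2: 40, 1: 50}
TARGET_PRODUCT = 800  # batch_size * batch_size_time * shot


def generation_settings(shot):
    """Return refer_img_iters and the required batch_size * batch_size_time."""
    if shot not in REFER_IMG_ITERS:
        raise ValueError(f"unsupported shot: {shot}")
    return {
        "refer_img_iters": REFER_IMG_ITERS[shot],
        "batch_product": TARGET_PRODUCT // shot,
    }


def batch_choices(shot, max_batch_size):
    """List (batch_size, batch_size_time) pairs that satisfy the constraint."""
    product = generation_settings(shot)["batch_product"]
    return [
        (b, product // b)
        for b in range(1, max_batch_size + 1)
        if product % b == 0
    ]


if __name__ == "__main__":
    for shot in sorted(REFER_IMG_ITERS, reverse=True):
        print(shot, generation_settings(shot), batch_choices(shot, 16))
```

For example, with 16 shots the product batch_size * batch_size_time must equal 50, so batch_size=10 with batch_size_time=5 is one valid choice.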

We provide a multi-GPU generation example in src/glide/glide_fsl.py. Run it with:

sh glide/gen_fsl.sh /path/to/few-shot/images /path/to/save/dataset

Synthetic data for FSL: Classifier-Tuning with CLIP

Again, our code is adapted from the Wise-ft codebase. The following is an example:

python3.7 src/ct_fsl.py   \
      --freeze-encoder \
      --sl=0.5 \
      --sl_T=2 \
      --train-dataset=Eurosat  \
      --save=/path/to/save/results \
      --epochs=30  \
      --lr=1e-3  \
      --wd=0.1 \
      --batch-size-real=32  \
      --batch-size-syn=512  \
      --loss-weight=1.0 \
      --loss-weight-real=1.0 \
      --warmup_length=0 \
      --cache-dir=cache  \
      --model=RN50  \
      --eval-datasets=Eurosat \
      --template=eurosat_template  \
      --results-db=results.jsonl  \
      --data-location=/path/to/synthetic/data \
      --data-location-real=/path/to/few-shot/data | tee results/${exp_name}/train-$now.log
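
Judging only from the flag names, each training step appears to combine a loss on a synthetic batch with a loss on a real few-shot batch. The sketch below shows one way such a weighted sum could be computed. The formula loss_weight * L_syn + loss_weight_real * L_real and all function names are assumptions, not taken from src/ct_fsl.py.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def batch_loss(batch):
    """Mean cross-entropy over a batch of (logits, target) pairs."""
    return sum(cross_entropy(logits, target) for logits, target in batch) / len(batch)


def combined_loss(syn_batch, real_batch, loss_weight=1.0, loss_weight_real=1.0):
    """Weighted sum of synthetic-batch and real-batch losses (assumed form)."""
    return (
        loss_weight * batch_loss(syn_batch)
        + loss_weight_real * batch_loss(real_batch)
    )


if __name__ == "__main__":
    syn = [([2.0, 0.5], 0), ([0.1, 1.5], 1)]  # toy synthetic batch
    real = [([1.0, 1.0], 0)]                  # toy real few-shot batch
    print(combined_loss(syn, real))
```

Under this reading, setting both weights to 1.0 as in the command above would weight the two batch-averaged losses equally, even though the synthetic batch (512) is much larger than the real one (32).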

Pre-training settings

Synthetic data generation

For the pre-training setting, we adopt only the language enhancement strategy. To generate synthetic pre-training data, modify the files used in the zero-shot setting (src/LE.py and src/glide/glide_zsl.py).

Pre-training with synthetic data

We recommend the timm codebase for pre-training. For the concrete hyperparameters, please refer to Sec. C.5.3 in the Appendix of our paper.

Citing this work

If you find this repo useful for your research, please consider citing our paper:

@article{he2022synthetic,
  title={Is synthetic data from generative models ready for image recognition?},
  author={He, Ruifei and Sun, Shuyang and Yu, Xin and Xue, Chuhui and Zhang, Wenqing and Torr, Philip and Bai, Song and Qi, Xiaojuan},
  journal={arXiv preprint arXiv:2210.07574},
  year={2022}
}

Acknowledgement

We thank the authors of the following open-source projects for their code: GLIDE, CLIP, keytotext, Wise-ft, timm, Detectron2, DeiT, and MoCo.
