Backgrounds Challenge

This repository hosts a public dataset challenge for building more background-robust models. It contains test datasets of ImageNet-9 (IN-9) with different amounts of background and foreground signal, which you can use to measure the extent to which your models rely on image backgrounds. The datasets are described further in the paper "Noise or Signal: The Role of Image Backgrounds in Object Recognition" (preprint, blog).

Deep computer vision models rely on both foreground objects and image backgrounds. Even when the correct foreground object is present, such models often make incorrect predictions when the image background is changed, and they are especially vulnerable to adversarially chosen backgrounds. For example, the official pre-trained PyTorch ResNet-50 achieves only 22% accuracy when evaluated against adversarial backgrounds on ImageNet-9 (for reference, a model that always predicts "dog" achieves 11%).

Thus, the goal of this challenge is to understand just how background-robust models can be. Specifically, we assess models by their accuracy on images containing foregrounds superimposed on backgrounds that are adversarially chosen from the test set. We encourage researchers to use this challenge to benchmark progress on background robustness, which can be important for models' out-of-distribution performance. We will maintain a leaderboard of top submissions.

Figure: Examples, from the insect class, of the most adversarial backgrounds for a model. The number above each image is the fraction of non-insect foregrounds that can be fooled by that background.
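
To make the evaluation criterion concrete, here is a minimal sketch of how adversarial-background accuracy can be computed: a foreground counts as correctly classified only if the model predicts its label under every candidate background. This is an illustration only, not the repository's actual implementation (which lives in challenge_eval.py); the composite function is a hypothetical helper.

import torch

def adversarial_background_accuracy(model, foregrounds, labels, backgrounds, composite):
    """Fraction of foregrounds classified correctly under *every* background.

    `composite(fg, bg)` is assumed to paste a foreground onto a background
    and return a model-ready image tensor; all names here are illustrative.
    """
    model.eval()
    correct = 0
    with torch.no_grad():
        for fg, label in zip(foregrounds, labels):
            fooled = False
            for bg in backgrounds:
                img = composite(fg, bg).unsqueeze(0)  # add batch dimension
                pred = model(img).argmax(dim=1).item()
                if pred != label:  # one adversarial background suffices
                    fooled = True
                    break
            correct += not fooled
    return correct / len(foregrounds)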

Backgrounds Challenge Leaderboard

Model               Reference        Challenge Accuracy   Clean Accuracy (on IN-9)   Download Link
ResNet-50           (initial entry)  22.3%                95.6%                      Official PyTorch Model
ResNet-50 (IN-9L)   (initial entry)  12.0%                96.4%                      Download

Running the Backgrounds Challenge Evaluation

To evaluate your model against adversarial backgrounds, you will need to do the following:

  1. Download and unzip the datasets included in the release.
  2. Run python challenge_eval.py --checkpoint '/PATH/TO/CHECKPOINT' --data-path '/PATH/TO/DATA'.

The model checkpoint that the script takes as input must be one of the following:

  1. A 1000-class ImageNet classifier.
  2. A 9-class IN-9 classifier.

See python challenge_eval.py -h for how to toggle between the two.
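
For a 1000-class model, predictions must be reduced to the nine IN-9 superclasses before scoring. Below is a minimal sketch of one common way to do this, by max-pooling logits over each superclass's ImageNet classes. The index groupings shown are hypothetical placeholders, not the mapping the evaluation script actually uses.

import torch

# Hypothetical grouping of ImageNet class indices into IN-9 superclasses;
# these index lists are illustrative placeholders, NOT the repository's
# actual mapping (challenge_eval.py handles the real one).
IN9_TO_IMAGENET = {
    "dog":  [151, 152, 153],   # placeholder dog-breed indices
    "bird": [7, 8, 9],         # placeholder bird indices
    # ... the remaining seven IN-9 superclasses ...
}

def to_in9_scores(imagenet_logits: torch.Tensor) -> torch.Tensor:
    """Collapse [N, 1000] ImageNet logits into one score per superclass
    by max-pooling over that superclass's ImageNet classes."""
    return torch.stack(
        [imagenet_logits[:, idxs].max(dim=1).values
         for idxs in IN9_TO_IMAGENET.values()],
        dim=1,
    )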

Note: evaluation requires PyTorch to be installed with CUDA support.
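
To confirm your environment meets this requirement before launching a long evaluation, a quick check:

import torch

# Fails fast if PyTorch cannot see a CUDA device.
assert torch.cuda.is_available(), "PyTorch must be installed with CUDA support"
print(f"CUDA device: {torch.cuda.get_device_name(0)}")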

Submitting a Model

We invite interested researchers to submit models and results by opening a pull request that includes your model checkpoint. The most successful models will be listed in the leaderboard above. We have already included baseline pre-trained models for reference.

Testing your model on ImageNet-9 and its variations

Figure: All variations of IN-9; each variation contains different amounts of foreground and background signal.

ImageNet-9 and its variations can be useful for measuring the impact of backgrounds on model decision making; see the paper for more details. You can test your own models on IN-9 and its variations as follows.

  1. Download and unzip the datasets included in the release.
  2. Run, for example, python in9_eval.py --eval-dataset 'mixed_same' --checkpoint '/PATH/TO/CHECKPOINT' --data-path '/PATH/TO/DATA'. You can replace mixed_same with whichever variation of IN-9 you are interested in.

Just like in the challenge, the input can either be a 1000-class ImageNet model or a 9-class IN-9 model.
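
To collect results on all variations at once, you can wrap step 2 in a small driver script. A minimal sketch, assuming the variation names below match the dataset names in the release:

import subprocess

# Variation names assumed to match the datasets in the release.
VARIATIONS = ["original", "mixed_same", "mixed_rand", "mixed_next",
              "no_fg", "only_bg_b", "only_bg_t", "only_fg"]

for variation in VARIATIONS:
    # Same invocation as step 2, once per IN-9 variation.
    subprocess.run(
        ["python", "in9_eval.py",
         "--eval-dataset", variation,
         "--checkpoint", "/PATH/TO/CHECKPOINT",
         "--data-path", "/PATH/TO/DATA"],
        check=True,
    )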

There is no leaderboard or challenge for these datasets, but we encourage researchers to use them to measure the role of image backgrounds in their models' decision making. For reference, we include a table of results for common pre-trained models and for various models discussed in the paper.

Test Accuracy Results on ImageNet-9

Model                         Original   Mixed-Same   Mixed-Rand   BG-Gap
VGG16-BN                      94.3%      83.6%        73.4%        10.2%
ResNet-50                     95.6%      86.2%        78.9%        7.3%
ResNet-152                    96.7%      89.3%        83.5%        5.8%
ResNet-50 (IN-9L)             96.4%      89.8%        75.6%        14.2%
ResNet-50 (IN-9/Mixed-Rand)   73.3%      71.5%        71.3%        0.2%

The BG-Gap, the difference between Mixed-Same and Mixed-Rand accuracy, measures the impact of background correlations in the presence of correctly labeled foregrounds. For example, ResNet-50's BG-Gap is 86.2% - 78.9% = 7.3%.

Training Data

Updated June 24, 2020: We are releasing all training data that we used to train models described in the paper. The download links are as follows: IN-9L, Mixed-Next, Mixed-Rand, Mixed-Same, No-FG, Only-BG-B, Only-BG-T, Only-FG, Original.

Each downloadable dataset contains both training data and validation data generated in the same way as the training data (that is, with no manual cleaning); this validation data can be safely ignored. The test data in the release should be used instead.

Citation

If you find these datasets useful in your research, please consider citing:

@article{xiao2020noise,
  title={Noise or Signal: The Role of Image Backgrounds in Object Recognition},
  author={Kai Xiao and Logan Engstrom and Andrew Ilyas and Aleksander Madry},
  journal={ArXiv preprint arXiv:2006.09994},
  year={2020}
}
