

Robust Detection Benchmark

This repository contains code, data and a benchmark leaderboard from the paper "Benchmarking Robustness in Object Detection: Autonomous Driving when Winter is Coming" by Claudio Michaelis*, Benjamin Mitzkus*, Robert Geirhos*, Evgenia Rusak*, Oliver Bringmann, Alexander S. Ecker, Matthias Bethge & Wieland Brendel.

The core idea is illustrated below: real-world applications need to cope with adverse outdoor hazards such as fog, frost and snow (and the occasional dragonfire). The paper benchmarks how well object detection models hold up across a broad range of such image corruptions.

[Figure: traffic hazards]

Structure & Overview

This repository serves two purposes:

  1. Enabling reproducibility. All result figures in figures/ can be regenerated by executing the analysis notebook in data-analysis/, which uses the data from raw-data/ (see the sketch after this list).

  2. Hosting the Robust Detection Benchmark (more information below).
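A minimal sketch of step 1, assuming you have cloned the repository and installed jupyter plus the notebook's dependencies. Since this README does not spell out the notebook's file name, the sketch simply executes every notebook found in data-analysis/:

```python
# Re-run the analysis notebook(s) in data-analysis/ to regenerate figures/.
# Assumptions: repo cloned, working directory is the repo root,
# jupyter and the notebook's dependencies are installed.
import glob
import subprocess

for nb in glob.glob("data-analysis/*.ipynb"):
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb],
        check=True,
    )
```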

Additionally, we provide three separate modules with functionality that we use in the paper and that we hope may be useful for your own research or applications:

Stylize arbitrary datasets: https://github.com/bethgelab/stylize-datasets

Corrupt arbitrary datasets: https://github.com/bethgelab/imagecorruptions

Object detection: https://github.com/bethgelab/mmdetection
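For instance, corrupting an image with the imagecorruptions package takes a single call. A minimal sketch (pip install imagecorruptions; the random image below is just a stand-in for a real HxWx3 uint8 array):

```python
# Apply a benchmark corruption to an image with the imagecorruptions package.
import numpy as np
from imagecorruptions import corrupt, get_corruption_names

image = np.random.randint(0, 255, size=(224, 224, 3), dtype=np.uint8)  # stand-in image

print(get_corruption_names())  # names of the benchmark corruption types
snowy = corrupt(image, corruption_name="snow", severity=3)  # severity ranges from 1 to 5
```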

Robust Detection Benchmark

This section shows the most important results on our three benchmark datasets: Pascal-C, COCO-C and Cityscapes-C. All models use a fixed ResNet-50 backbone to keep the focus on improvements in detection robustness. For further results, including other backbones and instance segmentation, please have a look at the comprehensive results table.

Results are ranked by their mean performance under corruption (mPC in the paper). If you achieve state-of-the-art robustness on any of the three datasets with your approach, please open a pull request adding your results to the tables below. We strongly encourage using the backbone listed there; otherwise, robustness gains cannot be disentangled from improved overall performance. In your pull request, you will need to report the three metrics P, mPC and rPC (as defined in the paper); mPC is then used to rank your results.
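For reference, the three metrics relate as follows: P is performance on clean data, mPC averages performance over all corruption/severity combinations, and rPC = mPC / P. A hypothetical sketch (the function name and the data layout are made up for illustration):

```python
# P: clean performance; mPC: mean performance under corruption; rPC = mPC / P.
def benchmark_metrics(clean_ap: float, corrupted_ap: dict) -> dict:
    """corrupted_ap maps corruption name -> list of AP scores, one per severity level."""
    per_corruption = [sum(scores) / len(scores) for scores in corrupted_ap.values()]
    mpc = sum(per_corruption) / len(per_corruption)
    return {"P": clean_ap, "mPC": mpc, "rPC [%]": 100.0 * mpc / clean_ap}

# Toy example with two corruptions and three severity levels each:
print(benchmark_metrics(80.5, {"snow": [60.1, 48.3, 35.0], "fog": [70.2, 62.5, 55.4]}))
```

As a quick sanity check against the leaderboard below: for the top Pascal-C entry, 56.2 / 80.4 ≈ 69.9 %, matching the rPC column.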

Evaluation details

Pascal VOC: Results are evaluated on Pascal VOC 2007 test using the AP50 metric.
COCO: Results are evaluated on COCO 2017 val using the mAP metric.
Cityscapes: Results are evaluated on Cityscapes val using the mAP metric.
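The COCO-style numbers can be computed with the standard pycocotools evaluation loop. A minimal sketch (pip install pycocotools; the file paths are placeholders, and the detections JSON must follow the COCO results format):

```python
# Standard COCO bbox evaluation with pycocotools; paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("my_detections.json")       # detections in COCO results format
evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # first printed line is AP @ IoU=0.50:0.95, i.e. the mAP used here
```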

Leaderboard

Pascal-C

| Rank | Method | Reference | Model | Backbone | clean P [AP50] | corrupted mPC [AP50] | relative rPC [%] |
|------|--------|-----------|-------|----------|----------------|----------------------|------------------|
| 1 | stylizing training data | Michaelis et al. 2019 | Faster R-CNN | R-50-FPN | 80.4 | 56.2 | 69.9 |
| - | baseline | Michaelis et al. 2019 | Faster R-CNN | R-50-FPN | 80.5 | 48.6 | 60.4 |

COCO-C

| Rank | Method | Reference | Model | Backbone | clean P [AP] | corrupted mPC [AP] | relative rPC [%] |
|------|--------|-----------|-------|----------|--------------|--------------------|------------------|
| 1 | stylizing training data | Michaelis et al. 2019 | Faster R-CNN | R-50-FPN | 34.6 | 20.4 | 58.9 |
| - | baseline | Michaelis et al. 2019 | Faster R-CNN | R-50-FPN | 36.3 | 18.2 | 50.2 |

Cityscapes-C

| Rank | Method | Reference | Model | Backbone | clean P [AP] | corrupted mPC [AP] | relative rPC [%] |
|------|--------|-----------|-------|----------|--------------|--------------------|------------------|
| 1 | stylizing training data | Michaelis et al. 2019 | Faster R-CNN | R-50-FPN | 36.3 | 17.2 | 47.4 |
| - | baseline | Michaelis et al. 2019 | Faster R-CNN | R-50-FPN | 36.4 | 12.2 | 33.4 |

Citation

If you use our code or the benchmark, please consider citing:

@article{michaelis2019dragon,
  title={Benchmarking Robustness in Object Detection: 
    Autonomous Driving when Winter is Coming},
  author={Michaelis, Claudio and Mitzkus, Benjamin and 
    Geirhos, Robert and Rusak, Evgenia and 
    Bringmann, Oliver and Ecker, Alexander S. and 
    Bethge, Matthias and Brendel, Wieland},
  journal={arXiv preprint arXiv:1907.07484},
  year={2019}
}
