Provable adversarial robustness at ImageNet scale

Certified Adversarial Robustness via Randomized Smoothing

This repository contains code and trained models for the paper Certified Adversarial Robustness via Randomized Smoothing by Jeremy Cohen, Elan Rosenfeld, and Zico Kolter.

Randomized smoothing is a provable adversarial defense in L2 norm which scales to ImageNet. It's also SOTA on the smaller datasets like CIFAR-10 and SVHN where other provable L2-robust classifiers are viable.

How does it work?

First, you train a neural network f with Gaussian data augmentation at variance σ². Then you leverage f to create a new, "smoothed" classifier g, defined as follows: g(x) returns the class which f is most likely to return when x is corrupted by isotropic Gaussian noise with variance σ².
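
For concreteness, here is a minimal sketch of what training with Gaussian data augmentation looks like in PyTorch. This is not the repository's train.py (which adds learning rate scheduling, logging, and checkpointing); the model, loader, and sigma below are placeholders.

import torch
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, sigma, device="cuda"):
    """One epoch of training f under Gaussian data augmentation."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Corrupt each input with isotropic Gaussian noise N(0, sigma^2 I),
        # so f learns to classify the same noisy images that g will feed it.
        x_noisy = x + torch.randn_like(x) * sigma
        loss = F.cross_entropy(model(x_noisy), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()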

For example, let x be the image above on the left. Suppose that when f classifies x corrupted by Gaussian noise (the GIF on the right), f returns "panda" 98% of the time and "gibbon" 2% of the time. Then the prediction of g at x is defined to be "panda."

Interestingly, g is provably robust within an L2 norm ball around x, in the sense that for any perturbation δ with sufficiently small L2 norm, g(x+δ) is guaranteed to be "panda." In this particular example, g will be robust around x within an L2 radius of σ Φ⁻¹(0.98) ≈ 2.05σ, where Φ⁻¹ is the inverse CDF of the standard normal distribution.

In general, suppose that when f classifies noisy corruptions of x, the class "panda" is returned with probability p (with p > 0.5). Then g is guaranteed to classify "panda" within an L2 ball around x of radius σ Φ⁻¹(p).
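
Given p, the certified radius is a one-liner. A sketch using scipy's inverse normal CDF, with the numbers from the panda example:

from scipy.stats import norm

sigma, p = 0.50, 0.98
radius = sigma * norm.ppf(p)  # norm.ppf is the inverse CDF Φ⁻¹; Φ⁻¹(0.98) ≈ 2.054
print(radius)                 # ≈ 1.03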

What's the intuition behind this bound?

We know that f classifies noisy corruptions of x as "panda" with probability 0.98. An equivalent way of phrasing this is that the Gaussian distribution N(x, σ²I) puts measure 0.98 on the decision region of class "panda," defined as the set {x': f(x') = "panda"}. You can prove that no matter how the decision regions of f are "shaped", for any δ with ||δ||₂ < σ Φ⁻¹(0.98), the translated Gaussian N(x+δ, σ²I) is guaranteed to put measure > 0.5 on the decision region of class "panda," implying that g(x+δ) = "panda."
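
You can check the worst case numerically: when the "panda" decision region is a half-space whose boundary sits at distance σ Φ⁻¹(0.98) from x, shifting the Gaussian by δ along the boundary's normal leaves the half-space with measure Φ(Φ⁻¹(0.98) − ||δ||₂/σ), which stays above 0.5 exactly as long as ||δ||₂ < σ Φ⁻¹(0.98). A quick sketch:

from scipy.stats import norm

sigma, p = 1.0, 0.98
for shift in [0.0, 1.0, 2.0, 2.05, 2.06]:  # candidate values of ||δ||₂
    # measure of the "panda" half-space under the shifted Gaussian N(x+δ, σ²I)
    print(shift, norm.cdf(norm.ppf(p) - shift / sigma))
# the measure falls below 0.5 only once the shift exceeds σ·Φ⁻¹(0.98) ≈ 2.054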

Wait a minute...

There's one catch: it's not possible to actually evaluate the smoothed classifier g. This is because it's not possible to exactly compute the probability distribution over the classes when f's input is corrupted by Gaussian noise. For the same reason, it's not possible to exactly compute the radius in which g is provably robust.

Instead, we give Monte Carlo algorithms for both

  1. prediction: evaluating g(x)
  2. certification: computing the L2 radius in which g is robust around x

which are guaranteed to return a correct answer with arbitrarily high probability.

The prediction algorithm does this by abstaining from making any prediction when it's a "close call," e.g. if 510 noisy corruptions of x were classified as "panda" and 490 were classified as "gibbon." Prediction is pretty cheap, since you don't need to use very many samples. For example, with our ImageNet classifier, making a prediction using 1000 samples took 1.5 seconds, and our classifier abstained 3% of the time.
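
As a sketch of how that abstention rule can be made rigorous (the counts below are hypothetical, and this mirrors rather than reproduces the repository's predict()): run a two-sided binomial test on the top two class counts, and abstain unless the winner beats a fair coin at significance level alpha.

from scipy.stats import binomtest

counts = {"panda": 510, "gibbon": 490}  # hypothetical counts from 1000 noisy samples
alpha = 0.001
top, runner_up = sorted(counts.values(), reverse=True)[:2]
# Two-sided test: is the top class genuinely more likely than a 50/50 split?
p_value = binomtest(top, top + runner_up, 0.5).pvalue
prediction = max(counts, key=counts.get) if p_value <= alpha else "abstain"
print(prediction)  # "abstain" — 510 vs. 490 is too close a call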

On the other hand, certification is pretty slow, since you need a lot of samples to say with high probability that the measure under N(x, σ²I) of the "panda" decision region is close to 1. In our experiments we used 100,000 samples, so certifying each example took 150 seconds.
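
A sketch of the corresponding estimate (the counts are hypothetical; the one-sided Clopper-Pearson lower confidence bound comes from statsmodels, which the install instructions below include): lower-bound the "panda" probability p from the samples, abstain if the bound doesn't clear 0.5, and otherwise certify the radius σ Φ⁻¹(p).

from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint

n, count_top = 100_000, 99_000  # hypothetical: 99% of noisy samples said "panda"
alpha, sigma = 0.001, 0.50
# method="beta" gives the exact Clopper-Pearson interval; passing alpha=2*alpha
# makes its lower endpoint a one-sided 1-alpha lower confidence bound on p
p_lower, _ = proportion_confint(count_top, n, alpha=2 * alpha, method="beta")
if p_lower > 0.5:
    print("certified radius:", sigma * norm.ppf(p_lower))
else:
    print("abstain")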

Related work

Randomized smoothing was first proposed in Certified Robustness to Adversarial Examples with Differential Privacy and later improved upon in Second-Order Adversarial Attack and Certified Robustness. We simply tightened the analysis and showed that it outperforms the other provably L2-robust classifiers that have been proposed in the literature.

ImageNet results

We constructed three randomized smoothing classifiers for ImageNet, with the hyperparameter σ set to 0.25, 0.50, and 1.00. Here's what the panda image looks like under these three noise levels:

The plot below shows the certified top-1 accuracy at various radii of these three classifiers. The "certified accuracy" of a classifier g at radius r is defined as the test set accuracy that g will provably attain under any possible adversarial attack with L2 norm less than r. As you can see, the hyperparameter σ controls a robustness/accuracy tradeoff: when σ is high, the standard accuracy is lower, but the classifier's correct predictions are robust within larger radii.

To put these numbers in context: on ImageNet, random guessing would achieve a top-1 accuracy of 0.001. A perturbation with L2 norm of 1.0 could change one pixel by 255, ten pixels by 80, 100 pixels by 25, or 1000 pixels by 8.
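
You can verify those equivalences directly: changing k pixels each by c (on the 0-255 scale, with images scaled to [0, 1]) gives an L2 norm of √k · c/255.

import math

for k, c in [(1, 255), (10, 80), (100, 25), (1000, 8)]:
    print(k, c, round(math.sqrt(k) * c / 255, 3))  # each is ≈ 1.0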

Here's the same data in tabular form. The best σ for each radius is denoted with an asterisk.

         r = 0.0  r = 0.5  r = 1.0  r = 1.5  r = 2.0  r = 2.5  r = 3.0
σ = 0.25   0.67*    0.49*    0.00     0.00     0.00     0.00     0.00
σ = 0.50   0.57     0.46     0.38*    0.28*    0.00     0.00     0.00
σ = 1.00   0.44     0.38     0.33     0.26     0.19*    0.15*    0.12*

This repository

Outline

The contents of this repository are as follows:

  • code/ contains the code for our experiments.
  • data/ contains the raw data from our experiments.
  • analysis/ contains the plots and tables, based on the contents of data/, that are shown in our paper.

If you'd like to run our code, you need to download our models from here and then move the directory models into the root directory of this repo.

Smoothed classifiers

Randomized smoothing is implemented in the Smooth class in core.py.

  • To instantiate a smoothed classifier g, use the constructor:

def __init__(self, base_classifier: torch.nn.Module, num_classes: int, sigma: float):

where base_classifier is a PyTorch module that implements f, num_classes is the number of classes in the output space, and sigma is the noise hyperparameter σ.

  • To make a prediction at an input x, call:

def predict(self, x: torch.tensor, n: int, alpha: float, batch_size: int) -> int:

where n is the number of Monte Carlo samples and alpha is the confidence level. This function will either (1) return -1 to abstain or (2) return a class which equals g(x) with probability at least 1 - alpha.

  • To compute a radius in which g is robust around an input x, call:

def certify(self, x: torch.tensor, n0: int, n: int, alpha: float, batch_size: int) -> (int, float):

where n0 is the number of Monte Carlo samples to use for selection (see the paper), n is the number of Monte Carlo samples to use for estimation, and alpha is the confidence level. This function will either return the pair (-1, 0.0) to abstain, or return a pair (prediction, radius). The probability that certify() will return a class not equal to g(x) is no greater than alpha. Another way to say this is that with probability at least 1 - alpha, certify() will either abstain or return g(x).
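
Putting the three calls together, here's a hedged end-to-end sketch of the API described above. The untrained ResNet-50 and random input are placeholders (real usage loads one of the released checkpoints), and it assumes code/ is on your PYTHONPATH.

import torch
from torchvision.models import resnet50
from core import Smooth  # core.py lives in code/

base_classifier = resnet50(num_classes=1000)  # placeholder; load a checkpoint in practice
g = Smooth(base_classifier, num_classes=1000, sigma=0.50)
x = torch.rand(3, 224, 224)  # placeholder input image
prediction = g.predict(x, n=1000, alpha=0.001, batch_size=400)
label, radius = g.certify(x, n0=100, n=100_000, alpha=0.001, batch_size=400)
# prediction (or label) == -1 means the smoothed classifier abstained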

Scripts

  • The program train.py trains a base classifier with Gaussian data augmentation:

python code/train.py imagenet resnet50 model_output_dir --batch 400 --noise 0.50

will train a ResNet-50 on ImageNet under Gaussian data augmentation with σ=0.50.

  • The program predict.py makes predictions using g on a bunch of inputs. For example,

python code/predict.py imagenet model_output_dir/checkpoint.pth.tar 0.50 prediction_output --alpha 0.001 --N 1000 --skip 100 --batch 400

will load the base classifier saved at model_output_dir/checkpoint.pth.tar, smooth it using noise level σ=0.50, and classify every 100-th image from the ImageNet test set with parameters N=1000 and alpha=0.001.

  • The program certify.py certifies the robustness of g on a bunch of inputs. For example,

python code/certify.py imagenet model_output_dir/checkpoint.pth.tar 0.50 certification_output --alpha 0.001 --N0 100 --N 100000 --skip 100 --batch 400

will load the base classifier saved at model_output_dir/checkpoint.pth.tar, smooth it using noise level σ=0.50, and certify every 100-th image from the ImageNet test set with parameters N0=100, N=100000, and alpha=0.001.

  • The program visualize.py outputs pictures of noisy examples. For example,

python code/visualize.py imagenet visualize_output 100 0.0 0.25 0.5 1.0

will visualize noisy corruptions of the 100-th image from the ImageNet test set with noise levels σ=0.0, σ=0.25, σ=0.50, and σ=1.00.

  • The program analyze.py generates all of the certified accuracy plots and tables that appear in the paper.

Finally, we note that this file describes exactly how to reproduce our experiments from the paper.

We're not officially releasing code for the experiments where we compared randomized smoothing against the baselines, since that code involved a number of hacks, but feel free to get in touch if you'd like to see that code.

Getting started

  1. Clone this repository: git clone git@github.com:locuslab/smoothing.git

  2. Install the dependencies:

conda create -n smoothing
conda activate smoothing
# below is for linux, with CUDA 10; see https://pytorch.org/ for the correct command for your system
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch 
conda install scipy pandas statsmodels matplotlib seaborn
pip install setGPU
  3. Download our trained models from here.

  4. If you want to run ImageNet experiments, obtain a copy of ImageNet and preprocess the val directory to look like the train directory by running this script. Finally, set the environment variable IMAGENET_DIR to the directory where ImageNet is located.

  5. To get the hang of things, try running this command, which will certify the robustness of one of our pretrained CIFAR-10 models on the CIFAR test set.

model="models/cifar10/resnet110/noise_0.25/checkpoint.pth.tar"
output="???"
python code/certify.py cifar10 $model 0.25 $output --skip 20 --batch 400

where ??? is your desired output file.
