• Stars
    star
    121
  • Rank 293,924 (Top 6 %)
  • Language
    Python
  • License
    MIT License
  • Created over 7 years ago
  • Updated over 7 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Ensemble Adversarial Training on MNIST

Ensemble Adversarial Training

This repository contains code to reproduce results from the paper:

Ensemble Adversarial Training: Attacks and Defenses
Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh and Patrick McDaniel
ArXiv report: https://arxiv.org/abs/1705.07204


REQUIREMENTS

The code was tested with Python 2.7.12, Tensorflow 1.0.1 and Keras 1.2.2.

EXPERIMENTS

We start by training a few simple MNIST models. These are described in mnist.py.

python -m train models/modelA --type=0
python -m train models/modelB --type=1
python -m train models/modelC --type=2
python -m train models/modelD --type=3

Then, we can use (standard) Adversarial Training or Ensemble Adversarial Training (we train for either 6 or 12 epochs in the paper). With Ensemble Adversarial Training, we additionally augment the training data with adversarial examples crafted from external pre-trained models (models A, C and D here):

python -m train_adv models/modelA_adv --type=0 --epochs=12
python -m train_adv models/modelA_ens models/modelA models/modelC models/modelD --type=0 --epochs=12

The accuracy of the models on the MNIST test set can be computed using

python -m simple_eval test [model(s)]

To evaluate robustness to various attacks, we use

python -m simple_eval [attack] [source_model] [target_model(s)] [--parameters (opt)]

The attack can be:

Attack Description Parameters
fgs Standard FGSM eps (the norm of the perturbation)
rand_fgs Our FGSM variant that prepends the gradient computation by a random step eps (the norm of the total perturbation); alpha (the norm of the random perturbation)
ifgs The iterative FGSM eps (the norm of the perturbation); steps (the number of iterative FGSM steps)
CW The Carlini and Wagner attack eps (the norm of the perturbation); kappa (attack confidence)

Note that due to GPU non-determinism, the obtained results may vary by a few percent compared to those reported in the paper. Nevertheless, we consistently observe the following:

  • Standard Adversarial Training performs worse on transferred FGSM examples than on a "direct" FGSM attack on the model due to a gradient masking effect.
  • Our RAND+FGSM attack outperforms the FGSM when applied to any model. The gap is particularly pronounced for the adversarially trained model.
  • Ensemble Adversarial Training is more robust than (standard) adversarial training to transferred examples computed using any of the attacks above.
CONTACT

Questions and suggestions can be sent to [email protected]