NeuraL-Coverage

Research Artifact of ICSE 2023 Paper: Revisiting Neuron Coverage for DNN Testing: A Layer-Wise and Distribution-Aware Criterion

Preprint: https://arxiv.org/pdf/2112.01955.pdf

Implementations

This repo implements NLC, the criterion proposed in our paper, together with previous neuron coverage criteria (optimized where possible), including

  • Neuron Coverage (NC) [1]
  • K-Multisection Neuron Coverage (KMNC) [2]
  • Neuron Boundary Coverage (NBC) [2]
  • Strong Neuron Activation Coverage (SNAC) [2]
  • Top-K Neuron Coverage (TKNC) [2]
  • Top-K Neuron Patterns (TKNP) [2]
  • Cluster-based Coverage (CC) [3]
  • Likelihood Surprise Coverage (LSC) [4]
  • Distance-ratio Surprise Coverage (DSC) [5]
  • Mahalanobis Distance Surprise Coverage (MDSC) [5]

Each criterion is implemented as one Python class in coverage.py; all classes expose the same interface, which is sketched after the references below.

[1] DeepXplore: Automated whitebox testing of deep learning systems, SOSP 2017.
[2] DeepGauge: Comprehensive and multi-granularity testing criteria for gauging the robustness of deep learning systems, ASE 2018.
[3] TensorFuzz: Debugging neural networks with coverage-guided fuzzing, ICML 2019.
[4] Guiding deep learning system testing using surprise adequacy, ICSE 2019.
[5] Reducing DNN labelling cost using surprise adequacy: An industrial case study for autonomous driving, FSE Industry Track 2020.
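
Concretely, every criterion class is constructed from (model, layer_size_dict, hyper) and provides build, step, assess, and a current coverage value (see Getting Started below). The following is a minimal sketch of that shared shape; MyCoverage and its internals are hypothetical, not a class shipped in coverage.py.

import torch

# Minimal sketch of the interface shared by the criterion classes in
# coverage.py. `MyCoverage` is hypothetical, not part of this repo.
class MyCoverage:
    def __init__(self, model, layer_size_dict, hyper=None):
        self.model = model
        self.layer_size_dict = layer_size_dict
        self.hyper = hyper
        self.current = 0.0  # running coverage value

    def build(self, train_loader):
        # Criteria that need training-data statistics (e.g., KMNC, NBC)
        # compute them here; for the others this is a no-op.
        pass

    def step(self, data):
        # Update the internal coverage state with one batch of inputs.
        with torch.no_grad():
            _ = self.model(data)
        # ... update `self.current` from the recorded layer outputs ...

    def assess(self, test_loader):
        # Convenience wrapper: consume an entire test suite batch by batch.
        for data in test_loader:
            self.step(data)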

Installation

  • Build from source code

    git clone https://github.com/Yuanyuan-Yuan/NeuraL-Coverage
    cd NeuraL-Coverage
    pip install -r requirements.txt
    

Model & Dataset

  • Pretrained models: please see MODEL.
  • Datasets: please see DATASET.

Download the pretrained_models, datasets, and adversarial_examples folders here.

Getting Started

import torch
# All criteria are implemented in PyTorch.

import tool
import coverage

# 0. Get the output size of each layer in the model.
# `image_channel`, `image_size`, `device`, and `model` are placeholders
# for the tested model and its input shape.
input_size = (1, image_channel, image_size, image_size)
random_input = torch.randn(input_size).to(device)
layer_size_dict = tool.get_layer_output_sizes(model, random_input)

# 1. Initialization
# `hyper` denotes the hyper-parameter of a criterion;
# set `hyper` to None if a criterion is hyper-parameter free (e.g., NLC).
criterion = coverage.NLC(model, layer_size_dict, hyper=None)
# KMNC/NBC/SNAC/LSC/DSC/MDSC require training-data statistics of the
# tested model, which are computed in `build`. `train_loader` can be a
# PyTorch DataLoader or a list of data samples.
# For the other criteria, `build` is a no-op.
criterion.build(train_loader)

# 2. Calculation
# `test_loader` stores all test inputs; it can be a PyTorch DataLoader or a list of data samples.
criterion.assess(test_loader)
# If test inputs arrive incrementally from a data stream (e.g., during fuzzing), accumulate coverage as follows.
for data in data_stream:
    criterion.step(data)

# 3. Result
# The following instruction assigns the current coverage value to `cov`.
cov = criterion.current
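
Step 0 above uses tool.get_layer_output_sizes to map each layer to its output size via one forward pass. For intuition, here is a self-contained sketch of how such a helper can be built with PyTorch forward hooks; it is an illustration under stated assumptions, not the repo's tool.py implementation.

import torch
import torch.nn as nn

def get_layer_output_sizes_sketch(model, random_input):
    # Illustrative stand-in for tool.get_layer_output_sizes:
    # records the output shape of every leaf module via forward hooks.
    sizes = {}
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                sizes[name] = list(output.size()[1:])  # drop the batch dim
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            hooks.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        model(random_input)
    for h in hooks:
        h.remove()  # clean up hooks after the probing pass
    return sizes

# Example: layer output sizes of a tiny CNN for a 1x3x32x32 input.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.LazyLinear(10))
print(get_layer_output_sizes_sketch(model, torch.randn(1, 3, 32, 32)))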

Experiments

After preparing all data and pretrained models, first set the corresponding paths in constants.py.

Diversity of Test Suites

Discriminative (Image) Model

python eval_diversity_image.py --model resnet50 --dataset CIFAR10 --criterion NC --hyper 0.75
  • --model - The tested DNN.
    choices = [resnet50, vgg16_bn, mobilenet_v2]

  • --dataset - The training dataset of the tested DNN. Test suites are generated from the test split of this dataset.
    choices = [CIFAR10, ImageNet]

  • --criterion - The used coverage criterion.
    choices = [NC, KMNC, NBC, SNAC, TKNC, TKNP, CC, LSC, DSC, MDSC, NLC]

  • --hyper - The hyper-parameter of the criterion. None if the criterion has no hyper-parameter (i.e., NLC, SNAC, NBC).

Discriminative (Text) Model

python eval_diversity_text.py --criterion NC --hyper 0.75
  • --criterion - The used coverage criterion.
    choices = [NC, KMNC, NBC, SNAC, TKNC, TKNP, CC, LSC, DSC, MDSC, NLC]

  • --hyper - The hyper-parameter of the criterion. None if the criterion has no hyper-parameter (i.e., NLC, SNAC, NBC).

Generative Model

Our tested generative model is BigGAN. We reuse the codebase of the official implementation and hardcode some parameters; see BigGAN-projects/CIFAR10 and BigGAN-projects/ImageNet.

Since we insert the BigGAN project path directly into the system path, passing arguments to eval_diversity_gen.py from the shell conflicts with the BigGAN projects' own argument parsing. We therefore recommend setting the following arguments inside eval_diversity_gen.py (see the sketch after this list) and then running python eval_diversity_gen.py.

Of course, this should be implemented in a more elegant way...🫠 I will do it later.

  • --criterion - The used coverage criterion.
    choices = [NC, KMNC, NBC, SNAC, TKNC, TKNP, CC, LSC, DSC, MDSC, NLC]

  • --hyper - The hyper-parameter of the criterion. None if the criterion has no hyper-parameter (i.e., NLC, SNAC, NBC).
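
For illustration only, hardcoding these arguments could look like the following, assuming an argparse-style args object near the top of eval_diversity_gen.py; the exact variable names in the script may differ.

import argparse

# Hypothetical illustration: bypass command-line parsing by hardcoding
# the two options listed above. The field names are assumptions.
args = argparse.Namespace(
    criterion='NLC',  # one of NC, KMNC, NBC, SNAC, TKNC, TKNP, CC, LSC, DSC, MDSC, NLC
    hyper=None,       # None for hyper-parameter-free criteria
)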

Fault-Revealing Capability of Test Suites

python eval_fault_revealing.py --dataset CIFAR10 --model resnet50 --criterion NC --hyper 0.75 --AE PGD --split test
  • --AE - The adversarial example (AE) generation algorithm; a generic PGD sketch follows this list.
    choices = [PGD, CW]

  • --split - The dataset split from which AEs are generated.
    choices = [train, test]
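
PGD and CW are standard adversarial attack algorithms. For reference, below is a minimal untargeted L-infinity PGD sketch in PyTorch; it is a generic illustration, not the AE-generation code used in this repo.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Minimal untargeted L-infinity PGD sketch (not the repo's code).
    # Start from a random point inside the eps-ball around x.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv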

Guiding Input Mutation in DNN Testing

python fuzz.py --dataset CIFAR10 --model resnet50 --criterion NC

For random mutation (i.e., without any criterion as objective), run

python fuzz_rand.py --dataset CIFAR10 --model resnet50
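
Conceptually, the coverage-guided loop keeps a mutated input when it increases the criterion's coverage. The following simplified sketch shows this feedback loop using the criterion interface from Getting Started; fuzz_sketch and mutate are illustrative names, not functions from fuzz.py.

import random

def fuzz_sketch(criterion, seed_corpus, mutate, iterations=1000):
    # Simplified coverage-guided mutation loop (illustrative only).
    # `criterion` follows the interface above; `mutate` is a
    # user-supplied input mutation function.
    corpus = list(seed_corpus)
    for _ in range(iterations):
        x = mutate(random.choice(corpus))  # mutate a random seed
        before = criterion.current
        criterion.step(x)                  # update coverage with x
        if criterion.current > before:     # coverage gain: keep the input
            corpus.append(x)
    return corpus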