Guiding Deep Learning System Testing using Surprise Adequacy

Code release for the paper "Guiding Deep Learning System Testing using Surprise Adequacy".

If you find this paper helpful, please consider citing it:

@inproceedings{Kim2019aa,
	Author = {Jinhan Kim and Robert Feldt and Shin Yoo},
	Booktitle = {Proceedings of the 41st International Conference on Software Engineering},
	Pages = {1039-1049},
	Publisher = {IEEE Press},
	Series = {ICSE 2019},
	Title = {Guiding Deep Learning System Testing using Surprise Adequacy},
	Year = {2019}
}

Introduction

This archive includes code for computing Surprise Adequacy (SA) and Surprise Coverage (SC), the basic components of the main experiments in the paper. Currently, the "run.py" script contains a simple example that calculates the SA and SC of a test set and of an adversarial set generated with the FGSM method for the MNIST dataset, considering only the last hidden layer (activation_3). To change the layer selection, modify layer_names in run.py.

Files and Directories

  • run.py - Script that computes SA for a benign dataset and adversarial examples (MNIST and CIFAR-10).
  • sa.py - Tools that fetch activation traces and compute LSA, DSA, and coverage.
  • train_model.py - Model training script for MNIST and CIFAR-10. It keeps the trained models in the "model" directory (code from Ma et al.).
  • model directory - Used for saving models.
  • tmp directory - Used for saving activation traces and prediction arrays.
  • adv directory - Used for saving adversarial examples.
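As a rough illustration of what the DSA computation involves, the sketch below follows the definition in the paper: take the closest training activation trace of the same class as the input, then divide the distance to it by that reference trace's distance to the closest trace of any other class. This is not the code in sa.py; the function name dsa, its arguments, and the toy traces are hypothetical, and pure Python is used for readability where a real implementation would vectorize with NumPy.

```python
import math


def dsa(x_at, x_class, train_ats, train_classes):
    """Distance-based Surprise Adequacy of a single input (illustrative only).

    x_at: activation trace of the new input (sequence of floats).
    x_class: class predicted for the new input.
    train_ats, train_classes: traces and classes of the training inputs.
    """
    same = [at for at, c in zip(train_ats, train_classes) if c == x_class]
    other = [at for at, c in zip(train_ats, train_classes) if c != x_class]
    # Reference point x_a: the closest training trace of the same class.
    x_a = min(same, key=lambda at: math.dist(x_at, at))
    dist_a = math.dist(x_at, x_a)
    # dist_b: distance from x_a to the closest trace of any other class.
    dist_b = min(math.dist(x_a, at) for at in other)
    return dist_a / dist_b


# Toy 2-D traces (hypothetical data): two of class 0, one of class 1.
train_ats = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0)]
train_classes = [0, 0, 1]
near = dsa((0.0, 0.5), 0, train_ats, train_classes)  # close to its own class
far = dsa((2.0, 0.0), 0, train_ats, train_classes)   # drifting toward class 1
```

In this toy setup the second input, which sits halfway toward the other class, receives the larger DSA value.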

Command-line Options of run.py

  • -d - The subject dataset (either mnist or cifar). Default is mnist.
  • -lsa - If set, computes LSA.
  • -dsa - If set, computes DSA.
  • -target - The name of the target input set. Default is fgsm.
  • -save_path - The temporary save path of activation trace (AT) files. Default is the tmp directory.
  • -batch_size - Batch size. Default is 128.
  • -var_threshold - Variance threshold. Default is 1e-5.
  • -upper_bound - Upper bound of SA. Default is 2000.
  • -n_bucket - The number of buckets for coverage. Default is 1000.
  • -num_classes - The number of classes in the dataset. Default is 10.
  • -is_classification - Set if the task is a classification problem. Default is True.
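To show how -upper_bound and -n_bucket interact, here is a minimal sketch of Surprise Coverage under stated assumptions: the range from 0 to the upper bound is split into n_bucket equal-width buckets, and coverage is the fraction of buckets hit by at least one SA value. The function name surprise_coverage, the lower bound of 0, and the handling of out-of-range values are assumptions made for illustration; the implementation in sa.py may differ in these details.

```python
def surprise_coverage(sa_values, upper_bound=2000.0, n_bucket=1000):
    """Fraction of equal-width SA buckets hit by at least one input."""
    width = upper_bound / n_bucket
    covered = set()
    for sa in sa_values:
        # Assumption: values outside [0, upper_bound] cover nothing.
        if 0 <= sa <= upper_bound:
            # Bucket i covers [i * width, (i + 1) * width); the upper
            # bound itself is clamped into the last bucket.
            covered.add(min(int(sa / width), n_bucket - 1))
    return len(covered) / n_bucket


sa_values = [0.5, 0.7, 3.2, 9.9, 25.0]  # hypothetical SA values
tight = surprise_coverage(sa_values, upper_bound=10.0, n_bucket=10)
loose = surprise_coverage(sa_values, upper_bound=100.0, n_bucket=10)
```

The same SA values yield different coverage for the two bounds, which is why the chosen upper bound should be reported alongside any coverage figure.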

Generating Adversarial Examples

We used the framework by Ma et al. to generate various adversarial examples (FGSM, BIM-A, BIM-B, JSMA, and C&W). To generate them yourself, please refer to craft_adv_samples.py in the repository of Ma et al., and put the resulting files in the adv directory. As a basic usage example, an adversarial set generated by the FGSM method for MNIST is included (see ./adv/adv_mnist_fgsm.npy).
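For readers unfamiliar with FGSM, the idea can be sketched on a toy model: perturb each input feature by a fixed step eps in the direction of the sign of the loss gradient, then clip to the valid input range. The example below is not craft_adv_samples.py; it assumes a hypothetical logistic-regression model (weights w, bias b) so that the gradient can be written by hand, and it assumes the [-0.5, 0.5] pixel range mentioned in the notes of this README.

```python
import math


def fgsm(x, y, w, b, eps, lo=-0.5, hi=0.5):
    """One FGSM step against a toy model p = sigmoid(w . x + b)."""
    logit = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-logit))
    # For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    # Step in the direction that increases the loss, then clip to [lo, hi].
    return [min(hi, max(lo, xi + eps * s)) for xi, s in zip(x, sign)]


w, b = [2.0, -1.0], 0.0          # hypothetical model parameters
x, y = [0.2, -0.1], 1            # input with true label 1
x_adv = fgsm(x, y, w, b, eps=0.1)
x_clipped = fgsm(x, y, w, b, eps=1.0)  # a large step saturates at the bounds
```

For a deep network the gradient would come from the framework's automatic differentiation rather than a closed-form expression.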

Udacity Self-driving Car Challenge

To reproduce the results for the Udacity self-driving car challenge, please refer to the DeepXplore and DeepTest repositories, which contain information about the dataset, models (Dave-2, Chauffeur), and synthetic data generation processes. Downloading the dataset and the models might take a few hours because of their size.

How to Use

Our implementation is based on Python 3.5.2, TensorFlow 1.9.0, Keras 2.2, and NumPy 1.14.5. Details are listed in requirements.txt.

The following is a simple example of installing the dependencies and computing the LSA or DSA of the MNIST test set and its FGSM adversarial set.

# install Python dependencies
pip install -r requirements.txt

# train a model
python train_model.py -d mnist

# calculate LSA, coverage, and ROC-AUC score
python run.py -lsa

# calculate DSA, coverage, and ROC-AUC score
python run.py -dsa
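The ROC-AUC score mentioned above measures how well SA separates the adversarial inputs from the benign test inputs. As a minimal sketch, assuming SA is used directly as the classification score, the AUC equals the probability that a randomly chosen adversarial input has a higher SA than a randomly chosen benign one. The function name roc_auc and the SA values below are hypothetical; run.py may compute the score differently, for example through a library routine.

```python
def roc_auc(benign_sa, adversarial_sa):
    """AUC of SA as a score for telling adversarial inputs from benign ones.

    Computed as the fraction of (adversarial, benign) pairs in which the
    adversarial input has the higher SA; ties count as half a win.
    """
    wins = 0.0
    for a in adversarial_sa:
        for b in benign_sa:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(adversarial_sa) * len(benign_sa))


benign_sa = [0.1, 0.4, 0.35]       # hypothetical SA of test inputs
adversarial_sa = [0.8, 0.3, 0.9]   # hypothetical SA of FGSM inputs
auc = roc_auc(benign_sa, adversarial_sa)
```

A score near 1.0 means adversarial inputs are consistently more surprising than benign ones, while 0.5 means SA carries no signal for this distinction.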

Notes

  • If you encounter the error "ValueError: Input contains NaN, infinity or a value too large for dtype('float64').", increase the variance threshold (-var_threshold). Please refer to the configuration details in the paper (Section IV-C).
  • Images were preprocessed by clipping their pixel values to the range [-0.5, 0.5].
  • To select specific layers, modify layer_names in run.py.
  • Coverage may vary depending on the upper bound (-upper_bound).
  • For a speed-up, use the GPU build of TensorFlow.
  • All experimental results
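The role of the variance threshold in the first note can be sketched as follows. LSA relies on kernel density estimation over training activation traces, and neurons whose activations barely vary across the training set are a likely source of the numerical problems behind the NaN error. The sketch assumes such neurons are dropped before density estimation; the function names and the toy traces are hypothetical and not taken from sa.py.

```python
from statistics import pvariance


def low_variance_neurons(train_ats, var_threshold=1e-5):
    """Indices of neurons whose variance over the training set is too low."""
    columns = zip(*train_ats)  # one tuple of activations per neuron
    return [i for i, col in enumerate(columns) if pvariance(col) < var_threshold]


def drop_neurons(ats, removed):
    """Remove the given neuron indices from every activation trace."""
    removed = set(removed)
    return [[v for i, v in enumerate(at) if i not in removed] for at in ats]


# Hypothetical traces: neuron 0 is constant, neuron 2 is almost constant.
train_ats = [
    [0.1, 1.0, 0.0],
    [0.1, 2.0, 0.0],
    [0.1, 3.0, 1e-4],
]
removed = low_variance_neurons(train_ats)
filtered = drop_neurons(train_ats, removed)
```

Raising the threshold removes more near-constant neurons, which is consistent with the advice to increase it when the error appears.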
