The paper [1] presents an algorithm for crafting adversarial attacks on black-box classification networks, where the attacker has no knowledge of the victim model's internals or training data.
The attack treats the black box as an oracle: it queries the oracle on a set of inputs, trains a substitute model on the resulting input/output pairs, and then crafts adversarial samples with a white-box attack on the substitute. These adversarial samples transfer well to the black box.
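The core idea of the oracle step can be sketched as follows; the function name and the stand-in "black box" below are illustrative, not code from this repository.

```python
import torch

def label_with_oracle(oracle, inputs):
    """Query the black-box oracle and keep only its output labels.

    `oracle` is any callable returning class scores; the attacker never
    sees its weights or gradients, only its outputs.
    """
    with torch.no_grad():
        scores = oracle(inputs)      # (N, num_classes) scores from the black box
    return scores.argmax(dim=1)      # hard labels used to train the substitute

# Stand-in "black box": a fixed random linear classifier
oracle = torch.nn.Linear(784, 10)
x = torch.randn(16, 784)
labels = label_with_oracle(oracle, x)   # labels for substitute training
```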
By the midterm review of this project, we had implemented this algorithm for classification on the MNIST dataset. Since then, we have extended it to object detection on the COCO dataset.
We crop the bounding boxes of the detected objects out as separate images, create adversarial examples from these crops, and stitch the adversarial examples back into the original image.
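The crop-perturb-stitch step above can be sketched as below. This is a minimal illustration, not the repository's actual stitching code: the function names and the toy `perturb` callable are assumptions, and in the project the perturbation would come from the white-box attack on the substitute model.

```python
import numpy as np

def stitch_adversarial(image, boxes, perturb):
    """Crop each detected box, perturb the crop, paste it back.

    image   : H x W x C array
    boxes   : list of (x1, y1, x2, y2) pixel coordinates
    perturb : function mapping a crop to its adversarial version
    """
    out = image.copy()
    for (x1, y1, x2, y2) in boxes:
        crop = out[y1:y2, x1:x2]
        out[y1:y2, x1:x2] = perturb(crop)   # replace the box with its adversarial crop
    return out

# Toy example: a "perturbation" that just adds a small constant
img = np.zeros((8, 8, 3), dtype=np.float32)
adv = stitch_adversarial(img, [(1, 1, 4, 4)], lambda c: c + 0.5)
```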
python == 3.10
numpy
torch
torchvision
matplotlib
pillow
tqdm
tifffile
- Install the requirements
- Clone this repository
git clone https://github.com/darth-c0d3r/black_box_attacks
- Get COCO dataset
- Create substitute model
python3 main_script.py --yolo
It asks to save the model. Give it a proper name. The model is saved in the folder saved_models/
- Create adversarial samples
python3 main_script.py --adv
It asks which substitute model to use and how many samples (num_samples) to generate. The adversarial samples are stored in the directory adv_samples/
- Stitch the adversarial examples generated to original image
python3 main_script.py --stitch
You can see the images in the stitched_images/ directory
- Test them with black box model
python3 main_script.py --yolotest
To run the classification attack (MNIST), switch to the classification directory:

cd black_box_attack_classification
- Create black box model
python3 main_script.py --bb
It asks to save the model. Give it a proper name. The model is saved in the folder saved_models/
- Create substitute model
python3 main_script.py --sub
It asks which black box model to use. Give it the model name from saved_models/.
It asks to save the model. Give it a proper name. The model is saved in the folder saved_models/
- Create adversarial samples
python3 main_script.py --adv
It asks which substitute model to use and how many samples (num_samples) to generate. The adversarial samples are stored in the directory adv_samples/
- Test them with black box model
python3 main_script.py --test
This file creates the different datasets used in the project
This file contains the various options for running the different steps of the project
This file gets the oracle's output for a given input
This file gets predictions from the black box for the given inputs
* model.py
This file creates the model
This file contains the code to stitch the generated adversarial samples back into the original image
This file implements the Substitute DNN training algorithm given in paper[1].
The inputs are an oracle Õ, a maximum number max_ρ of substitute training epochs, a substitute architecture F, an initial training set S_0, and a step size λ.
Input: Õ, max_ρ, S_0, λ
1: Define architecture F
2: for ρ ∈ 0 .. max_ρ − 1 do
3:     D ← {(x, Õ(x)) : x ∈ S_ρ}                            // Label the substitute training set
4:     θ_F ← train(F, D)                                    // Train F on D to learn parameters θ_F
5:     S_{ρ+1} ← {x + λ · sgn(J_F[Õ(x)]) : x ∈ S_ρ} ∪ S_ρ   // Jacobian-based dataset augmentation
6: end for
7: return θ_F
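The Jacobian-based augmentation step can be sketched in PyTorch as follows. This is a simplified illustration under assumed names (`jacobian_augment`, tiny linear stand-ins for the oracle and substitute), not the repository's implementation: for each point in S, it moves λ along the sign of the gradient of the substitute's score for the oracle's label, and adds the new points to the training set.

```python
import torch

def jacobian_augment(substitute, oracle, S, lam=0.1):
    """One round of Jacobian-based dataset augmentation."""
    new_points = []
    for x in S:
        x = x.clone().detach().requires_grad_(True)
        label = oracle(x.unsqueeze(0)).argmax(dim=1)   # Õ(x): oracle's hard label
        score = substitute(x.unsqueeze(0))[0, label]   # F(x)[Õ(x)]
        score.backward()                               # gradient w.r.t. the input x
        new_points.append((x + lam * x.grad.sign()).detach())
    return S + new_points                              # S_{rho+1} = new points ∪ S_rho

# Stand-ins: tiny linear models on flat 4-d inputs
oracle = torch.nn.Linear(4, 3)
substitute = torch.nn.Linear(4, 3)
S = [torch.randn(4) for _ in range(5)]
S_next = jacobian_augment(substitute, oracle, S)   # training set doubles in size
```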
The function create_dataset() builds a dataset out of the generated samples, and augment_dataset() augments the current dataset with them.
* train.py
This file trains the model
This file contains helper functions
This file creates adversarial samples by attacking the white-box (substitute) model
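A common choice for this white-box step is the Fast Gradient Sign Method (FGSM), sketched below. This is an illustrative stand-alone version, not the repository's code; the model and inputs are toy stand-ins.

```python
import torch

def fgsm(model, x, y, eps=0.1):
    """Fast Gradient Sign Method on the (white-box) substitute model.

    Perturbs x by eps in the direction that increases the loss for
    label y, so the substitute (and, by transferability, the black
    box) is pushed toward a misclassification.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Stand-in substitute model and a batch of flat inputs
model = torch.nn.Linear(4, 3)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
x_adv = fgsm(model, x, y)   # each pixel moves by at most eps
```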