The paper [1] presents an algorithm for crafting adversarial attacks on black-box classification networks, where the attacker has no knowledge of the victim model's internals or training data.
The attack treats the black box as an oracle: it queries the oracle on a set of inputs, trains a substitute model on the resulting input/output pairs, and then crafts adversarial samples with a white-box attack on the substitute. These adversarial samples transfer well to the black box.
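The core idea of the oracle step can be sketched as follows; the function name and the stand-in "black box" below are illustrative, not code from this repository.

```python
import torch

def label_with_oracle(oracle, inputs):
    """Query the black-box oracle and keep only its output labels.

    `oracle` is any callable returning class scores; the attacker never
    sees its weights or gradients, only its outputs.
    """
    with torch.no_grad():
        scores = oracle(inputs)      # (N, num_classes) scores from the black box
    return scores.argmax(dim=1)      # hard labels used to train the substitute

# Stand-in "black box": a fixed random linear classifier
oracle = torch.nn.Linear(784, 10)
x = torch.randn(16, 784)
labels = label_with_oracle(oracle, x)   # labels for substitute training
```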
By the midterm review of this project, we had implemented this algorithm for classification on the MNIST dataset. Since then, we have extended it to object detection on the COCO dataset.
We crop the bounding boxes of the detected objects out as separate images, create adversarial examples from these crops, and stitch the adversarial examples back into the original image.
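The crop-perturb-stitch step above can be sketched as below. This is a minimal illustration, not the repository's actual stitching code: the function names and the toy `perturb` callable are assumptions, and in the project the perturbation would come from the white-box attack on the substitute model.

```python
import numpy as np

def stitch_adversarial(image, boxes, perturb):
    """Crop each detected box, perturb the crop, paste it back.

    image   : H x W x C array
    boxes   : list of (x1, y1, x2, y2) pixel coordinates
    perturb : function mapping a crop to its adversarial version
    """
    out = image.copy()
    for (x1, y1, x2, y2) in boxes:
        crop = out[y1:y2, x1:x2]
        out[y1:y2, x1:x2] = perturb(crop)   # replace the box with its adversarial crop
    return out

# Toy example: a "perturbation" that just adds a small constant
img = np.zeros((8, 8, 3), dtype=np.float32)
adv = stitch_adversarial(img, [(1, 1, 4, 4)], lambda c: c + 0.5)
```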
python == 3.10
numpy
torch
torchvision
matplotlib
pillow
tqdm
tifffile
- Install the requirements
- Clone this repository
git clone https://github.com/darth-c0d3r/black_box_attacks
- Get COCO dataset
- Create substitute model
python3 main_script.py --yolo
It asks to save the model. Give it a proper name. The model is saved in the folder saved_models/
- Create adversarial samples
python3 main_script.py --adv
It asks which substitute model to use and how many samples (num_samples) to generate. The adversarial samples are stored in the directory adv_samples/
- Stitch the adversarial examples generated to original image
python3 main_script.py --stitch
You can see the images in the stitched_images/ directory
- Test them with black box model
python3 main_script.py --yolotest
To run the classification attack (MNIST), switch to the classification directory:

cd black_box_attack_classification
- Create black box model
python3 main_script.py --bb
It asks to save the model. Give it a proper name. The model is saved in the folder saved_models/
- Create substitute model
python3 main_script.py --sub
It asks which black box model to use. Give it the model name from saved_models/.
It asks to save the model. Give it a proper name. The model is saved in the folder saved_models/
- Create adversarial samples
python3 main_script.py --adv
It asks which substitute model to use and how many samples (num_samples) to generate. The adversarial samples are stored in the directory adv_samples/
- Test them with black box model
python3 main_script.py --test
This file creates the different datasets used in the project
This file contains the various options for running the different steps of the project
This file gets the oracle's output for a given input
This file gets predictions from the black box for the given inputs
* model.py
This file creates the model
This file contains the code to stitch the generated adversarial samples back into the original image
This file implements the Substitute DNN training algorithm given in paper[1].
The inputs are an oracle Õ, a maximum number max_ρ of substitute training epochs, a substitute architecture F, an initial training set S_0, and a step size λ.
Input: Õ, max_ρ, S_0, λ
1: Define architecture F
2: for ρ ∈ 0 .. max_ρ − 1 do
3:     D ← {(x, Õ(x)) : x ∈ S_ρ}                            // Label the substitute training set
4:     θ_F ← train(F, D)                                    // Train F on D to learn parameters θ_F
5:     S_{ρ+1} ← {x + λ · sgn(J_F[Õ(x)]) : x ∈ S_ρ} ∪ S_ρ   // Jacobian-based dataset augmentation
6: end for
7: return θ_F
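The Jacobian-based augmentation step can be sketched in PyTorch as follows. This is a simplified illustration under assumed names (`jacobian_augment`, tiny linear stand-ins for the oracle and substitute), not the repository's implementation: for each point in S, it moves λ along the sign of the gradient of the substitute's score for the oracle's label, and adds the new points to the training set.

```python
import torch

def jacobian_augment(substitute, oracle, S, lam=0.1):
    """One round of Jacobian-based dataset augmentation."""
    new_points = []
    for x in S:
        x = x.clone().detach().requires_grad_(True)
        label = oracle(x.unsqueeze(0)).argmax(dim=1)   # Õ(x): oracle's hard label
        score = substitute(x.unsqueeze(0))[0, label]   # F(x)[Õ(x)]
        score.backward()                               # gradient w.r.t. the input x
        new_points.append((x + lam * x.grad.sign()).detach())
    return S + new_points                              # S_{rho+1} = new points ∪ S_rho

# Stand-ins: tiny linear models on flat 4-d inputs
oracle = torch.nn.Linear(4, 3)
substitute = torch.nn.Linear(4, 3)
S = [torch.randn(4) for _ in range(5)]
S_next = jacobian_augment(substitute, oracle, S)   # training set doubles in size
```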
The function create_dataset() builds a dataset out of the generated samples, and augment_dataset() augments the current dataset with them.
* train.py
This file trains the model
This file contains helper functions
This file creates adversarial samples by attacking the white-box (substitute) model
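A common choice for this white-box step is the Fast Gradient Sign Method (FGSM), sketched below. This is an illustrative stand-alone version, not the repository's code; the model and inputs are toy stand-ins.

```python
import torch

def fgsm(model, x, y, eps=0.1):
    """Fast Gradient Sign Method on the (white-box) substitute model.

    Perturbs x by eps in the direction that increases the loss for
    label y, so the substitute (and, by transferability, the black
    box) is pushed toward a misclassification.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()

# Stand-in substitute model and a batch of flat inputs
model = torch.nn.Linear(4, 3)
x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
x_adv = fgsm(model, x, y)   # each pixel moves by at most eps
```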