eps-ad's Introduction

Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score

Official PyTorch implementation of the ICML 2023 paper:

Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score

Shuhai Zhang, Feng Liu, Jiahao Yang, Yifan Yang, Changsheng Li, Bo Han, Mingkui Tan

Abstract: Adversarial detection aims to determine whether a given sample is an adversarial one based on the discrepancy between natural and adversarial distributions. Unfortunately, estimating or comparing two data distributions is extremely difficult, especially in high-dimension spaces. Recently, the gradient of log probability density (a.k.a., score) w.r.t. the sample is used as an alternative statistic to compute. However, we find that the score is sensitive in identifying adversarial samples due to insufficient information with one sample only. In this paper, we propose a new statistic called expected perturbation score (EPS), which is essentially the expected score of a sample after various perturbations. Specifically, to obtain adequate information regarding one sample, we perturb it by adding various noises to capture its multi-view observations. We theoretically prove that EPS is a proper statistic to compute the discrepancy between two samples under mild conditions. In practice, we can use a pre-trained diffusion model to estimate EPS for each sample. Last, we propose an EPS-based adversarial detection (EPS-AD) method, in which we develop EPS-based maximum mean discrepancy (MMD) as a metric to measure the discrepancy between the test sample and natural samples. We also prove that the EPS-based MMD between natural and adversarial samples is larger than that among natural samples. Extensive experiments show the superior adversarial detection performance of our EPS-AD.
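The two quantities named in the abstract can be written compactly. This is a hedged reading, not a quotation from the paper: $p_t$ (the distribution of a sample after perturbation at noise level $t$), the uniform average over $t \in (0, T)$, and the kernel symbol $k$ are notation assumed here for illustration.

```latex
% EPS: the score of the perturbed distribution, averaged over noise levels t.
S(x) = \mathbb{E}_{t \sim \mathcal{U}(0, T)} \left[ \nabla_{x} \log p_{t}(x) \right]

% Squared MMD between two distributions P and Q of EPS values under a kernel k.
\mathrm{MMD}^{2}(P, Q; k)
  = \mathbb{E}_{s, s' \sim P}\left[ k(s, s') \right]
  + \mathbb{E}_{r, r' \sim Q}\left[ k(r, r') \right]
  - 2\, \mathbb{E}_{s \sim P,\, r \sim Q}\left[ k(s, r) \right]
```

The second expression is the standard population form of squared MMD; the abstract's claim is that it is larger between natural and adversarial EPSs than among natural EPSs.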

Requirements

  • An RTX 3090 with 24 GB of memory.
  • Python 3.7
  • PyTorch 1.7.1

Data and pre-trained models

Note that you have to put the datasets in the ./dataset directory.

For the pre-trained diffusion models, you need to first download them from the following links and put them in the ./pretrained directory:

For the checkpoints of the trained kernels on CIFAR-10 and ImageNet:

Environment of EPS-AD

You have to create a virtual environment and set up libraries needed for training and evaluation.

conda env create -f epsad.yaml
pip install git+https://github.com/RobustBench/robustbench.git

Run experiments on CIFAR-10

1. Train a deep kernel for MMD.

  • To obtain the EPSs of natural samples and of adversarial samples under the FGSM and FGSM_L2 attacks with $\epsilon=1/255$:
CUDA_VISIBLE_DEVICES=0 python eval_epsad.py --num_sub 10000 \
    --adv_batch_size 200 \
    --detection_datapath './score_diffusion_t_cifar_1w'  \
    --epsilon 0.00392 \
    --diffuse_t 20  \
    --perb_image \
    --attack_methods FGSM FGSM_L2 \
    --single_vector_norm_flag \
    --generate_1w_flag \
    --clean_score_flag
CUDA_VISIBLE_DEVICES=0 python eval_epsad.py --num_sub 10000 \
    --adv_batch_size 200 \
    --detection_datapath './score_diffusion_t_cifar_1w'  \
    --epsilon 0.00392 \
    --diffuse_t 20  \
    --perb_image \
    --attack_methods FGSM FGSM_L2 \
    --single_vector_norm_flag \
    --generate_1w_flag
  • To train a deep-kernel MMD with the EPSs of FGSM and FGSM_L2 adversarial samples:
CUDA_VISIBLE_DEVICES=0 python train_D.py --epochs 200 --lr 0.002 --id 8 --sigma0 15 --sigma 100 --epsilon 2 --feature_dim 300 --dataset cifar

Note that throughout our experiments we use only FGSM and FGSM-$\ell_{2}$ adversarial samples ($\epsilon=1/255$), $10,000$ each, along with $10,000$ natural samples, and compute their EPSs to train the deep kernel; the kernel can also be trained on a general public dataset. Moreover, our method is suitable for detecting both $\ell_2$ and $\ell_\infty$ adversarial samples.
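The `--epsilon` values passed to `eval_epsad.py` in this README (0.00392 and 0.01569) are the decimal forms of $1/255$ and $4/255$. A minimal sketch of that conversion, assuming the budget is expressed on a [0, 1] image scale; the helper name `pixel_eps_to_unit` is hypothetical and not part of this repository:

```python
def pixel_eps_to_unit(eps_pixels, levels=255):
    """Convert a budget given in 8-bit pixel levels to the [0, 1] image scale.

    Hypothetical helper for illustration; not part of the eps-ad code.
    """
    return eps_pixels / levels


# Print the flag value for the two budgets used in this README.
for eps_pixels in (1, 4):
    print(f"{eps_pixels}/255 -> --epsilon {pixel_eps_to_unit(eps_pixels):.5f}")
```

Other budgets (for example $2/255$ or $8/255$) can be converted the same way before being passed on the command line.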

In the following, we use the EPSs of a set of 500 natural samples as the reference and then perform adversarial detection with the trained deep-kernel MMD.

2. Detecting adversarial data with EPS-AD

  • To obtain the EPSs of adversarial samples with other attack intensities (e.g., $\epsilon=4/255$):
CUDA_VISIBLE_DEVICES=0 python eval_epsad.py --detection_datapath './score_diffusion_t_cifar_stand' \
    --num_sub 500 \
    --adv_batch_size 500 \
    --epsilon 0.01569 \
    --diffuse_t 20 \
    --single_vector_norm_flag \
    --perb_image 
  • To obtain the EPSs of natural samples:
CUDA_VISIBLE_DEVICES=0 python eval_epsad.py --detection_datapath './score_diffusion_t_cifar_stand' \
    --num_sub 500 \
    --adv_batch_size 500 \
    --epsilon 0.01569 \
    --diffuse_t 20 \
    --single_vector_norm_flag \
    --perb_image \
    --clean_score_flag
  • To calculate the MMD between the EPS of each test sample and the EPSs of natural samples and obtain the AUROC:
CUDA_VISIBLE_DEVICES=0 python train_D.py --epochs 200 --lr 0.002 --id 8 --sigma0 15 --sigma 100 --epsilon 2 --feature_dim 300 --dataset cifar --test_flag True
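The detection step described above (score each test sample by its MMD to a natural reference set, then summarize with an AUROC) can be illustrated with a small NumPy sketch. It rests on several assumptions: a fixed-bandwidth Gaussian kernel stands in for the trained deep kernel, random arrays stand in for saved EPS files, and all function names are hypothetical. It is not the logic of `train_D.py`.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2_to_reference(test_eps, ref_eps, bandwidth):
    """Biased squared-MMD estimate between one test EPS and a reference set."""
    x = test_eps.reshape(1, -1)
    y = ref_eps.reshape(len(ref_eps), -1)
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)


def auroc(scores_natural, scores_adversarial):
    """Probability that an adversarial score exceeds a natural one (ties count 1/2)."""
    nat = np.asarray(scores_natural)[:, None]
    adv = np.asarray(scores_adversarial)[None, :]
    wins = (adv > nat).sum() + 0.5 * (adv == nat).sum()
    return float(wins / (nat.size * adv.size))


# Synthetic stand-ins for EPS arrays (assumption: real EPSs come from eval_epsad.py).
rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(50, 3, 4, 4))       # natural reference set
nat_test = rng.normal(0.0, 1.0, size=(20, 3, 4, 4))  # natural test samples
adv_test = rng.normal(1.0, 1.0, size=(20, 3, 4, 4))  # shifted "adversarial" samples

bandwidth = 10.0  # arbitrary choice for this toy example
nat_scores = [mmd2_to_reference(s, ref, bandwidth) for s in nat_test]
adv_scores = [mmd2_to_reference(s, ref, bandwidth) for s in adv_test]
print("AUROC:", auroc(nat_scores, adv_scores))
```

A larger score means the test sample lies farther from the natural reference set, so the AUROC measures how well that score ranks adversarial samples above natural ones.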

Run experiments on ImageNet

1. Train a deep kernel for MMD.

  • To obtain the EPSs of natural samples and of adversarial samples under the FGSM and FGSM_L2 attacks with $\epsilon=1/255$:
CUDA_VISIBLE_DEVICES=0 python eval_epsad.py --datapath './dataset/imagenet' \
    --num_sub 10000 \
    --adv_batch_size 32 \
    --detection_datapath './score_diffusion_t_imagenet_1w'  \
    --single_vector_norm_flag \
    --config imagenet.yml \
    -i imagenet \
    --domain imagenet \
    --classifier_name imagenet-resnet50 \
    --diffuse_t 50  \
    --perb_image \
    --attack_methods FGSM FGSM_L2 \
    --epsilon 0.00392 \
    --generate_1w_flag \
    --clean_score_flag
CUDA_VISIBLE_DEVICES=0 python eval_epsad.py --datapath './dataset/imagenet' \
    --num_sub 10000 \
    --adv_batch_size 32 \
    --detection_datapath './score_diffusion_t_imagenet_1w'  \
    --single_vector_norm_flag \
    --config imagenet.yml \
    -i imagenet \
    --domain imagenet \
    --classifier_name imagenet-resnet50 \
    --diffuse_t 50  \
    --perb_image \
    --attack_methods FGSM FGSM_L2 \
    --epsilon 0.00392 \
    --generate_1w_flag
  • To train a deep-kernel MMD with the EPSs of FGSM and FGSM_L2 adversarial samples:
CUDA_VISIBLE_DEVICES=0 python train_D.py --epochs 200 --lr 0.002 --id 6 --sigma0 0.5 --sigma 100 --epsilon 10 --feature_dim 300 --dataset imagenet

2. Detecting adversarial data with EPS-AD

  • To obtain the EPSs of adversarial samples with other attack intensities (e.g., $\epsilon=4/255$):
CUDA_VISIBLE_DEVICES=0 python eval_epsad.py --datapath './dataset/imagenet' \
    --num_sub 500 \
    --adv_batch_size 32 \
    --detection_datapath './score_diffusion_t_imagenet_stand'  \
    --config imagenet.yml \
    -i imagenet \
    --domain imagenet \
    --classifier_name imagenet-resnet50 \
    --diffuse_t 50  \
    --epsilon 0.01569 \
    --single_vector_norm_flag \
    --perb_image
  • To obtain the EPSs of natural samples:
CUDA_VISIBLE_DEVICES=0 python eval_epsad.py --datapath './dataset/imagenet' \
    --num_sub 500 \
    --adv_batch_size 32 \
    --detection_datapath './score_diffusion_t_imagenet_stand'  \
    --config imagenet.yml \
    -i imagenet \
    --domain imagenet \
    --classifier_name imagenet-resnet50 \
    --diffuse_t 50  \
    --epsilon 0.01569 \
    --single_vector_norm_flag \
    --perb_image \
    --clean_score_flag
  • To calculate the MMD between the EPS of each test sample and the EPSs of natural samples and obtain the AUROC:
CUDA_VISIBLE_DEVICES=0 python train_D.py --epochs 200 --lr 0.002 --id 6 --sigma0 0.5 --sigma 100 --epsilon 10 --feature_dim 300 --dataset imagenet --test_flag True

Citation

@inproceedings{zhangs2023EPSAD,
  title={Detecting Adversarial Data by Probing Multiple Perturbations Using Expected Perturbation Score},
  author={Zhang, Shuhai and Liu, Feng and Yang, Jiahao and Yang, Yifan and Li, Changsheng and Han, Bo and Tan, Mingkui},
  booktitle = {International Conference on Machine Learning (ICML)},
  year={2023}
}

eps-ad's Issues

How to detect an unknown dataset

I have an unknown dataset as input, and I don't know which samples are clean and which are adversarial. How can I calculate the EPS of an unknown sample and compare it with the EPSs of clean samples through MMD?

The AUROC with the 128x128 diffusion model is very poor, not reaching the level of the 256x256 diffusion model (not class-conditional)

I use a loader to read images downsampled to size 128 and then use a 128x128 diffusion model (weights downloaded from https://github.com/openai/guided-diffusion). Why are the results not as good as with the 256x256 diffusion model (not class-conditional)? The AUROC is only about 0.5, which is equivalent to random guessing.
The parameters I used were basically the same, and I did not use the class features of the diffusion model. I saw in the paper that there is a 128*128 diffusion model and that it gives good results. Is it because I used the class-conditional model instead of an unconditional one that the quality of the generated diffusion images is poor, so that the scores cannot be computed well?

Could you tell us whether we have reproduced the proposed method?

Thanks for open-sourcing your wonderful work.

We have trained the MMD kernel and used the following code to evaluate the performance:

CUDA_VISIBLE_DEVICES=0
python eval_epsad.py --detection_datapath './score_diffusion_t_cifar_stand' \
    --num_sub 500 \
    --adv_batch_size 500 \
    --epsilon 0.01569 \
    --diffuse_t 20 \
    --single_vector_norm_flag \
    --perb_image

The experiment output is:

attack_method: FGSM, robust accuracy: top1:0.35200000762939454--top5:0.8220000457763672
attack and diffuison time: 24.67634677886963
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: PGD, robust accuracy: top1:0.058000001907348636--top5:0.5020000076293946
attack and diffuison time: 19.34290313720703
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: BIM, robust accuracy: top1:0.045999999046325686--top5:0.40799999237060547
attack and diffuison time: 19.499910831451416
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: MIM, robust accuracy: top1:0.08199999809265136--top5:0.45799999237060546
attack and diffuison time: 19.615848064422607
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: TIM, robust accuracy: top1:0.25--top5:0.76
attack and diffuison time: 19.774481534957886
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: CW, robust accuracy: top1:0.04400000095367432--top5:0.8440000152587891
attack and diffuison time: 22.0277259349823
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: DI_MIM, robust accuracy: top1:0.8780000305175781--top5:0.9820000457763672
attack and diffuison time: 19.90054965019226
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: FGSM_L2, robust accuracy: top1:0.3820000076293945--top5:0.85
attack and diffuison time: 18.244937896728516
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: PGD_L2, robust accuracy: top1:0.026000001430511475--top5:0.26399999618530273
attack and diffuison time: 19.957834243774414
score_tensor.shape:torch.Size([500, 3, 32, 32])
attack_method: BIM_L2, robust accuracy: top1:0.03799999952316284--top5:0.3220000076293945
attack and diffuison time: 19.931138038635254
score_tensor.shape:torch.Size([500, 3, 32, 32])

We are unsure whether we have reproduced the proposed method correctly.

We look forward to your response.

Thanks a lot.

Pre-trained checkpoints (ImageNet)

Thank you for your code. Your work seems very interesting. Could you provide some checkpoints of the deep kernel, so that I can easily use it to detect my own adversarial images and cite your paper?
Specifically, I need pre-trained checkpoints under eps=2 and 4 on ImageNet. I would be very grateful for your help.

About the results

Thank you for your code! The idea of the paper is interesting and the results are competitive. I find that your method performs very well at detecting the adversarial examples generated by your code, but its detection performance on examples we generated ourselves is poor (70 AUROC). I would appreciate it if you could help with our problem.

An error occurred while training my own deep kernel MMD

I used your code to obtain three sets of EPSs: those of natural samples and those of adversarial samples under the FGSM and FGSM_L2 attacks. I then wanted to use them to train the deep kernel:
python train_D.py --epochs 200 --lr 0.002 --id 8 --sigma0 15 --sigma 100 --epsilon 2 --feature_dim 300 --dataset cifar
I ran the above command, but it reported:
No such file or directory: './score_diffusion_t_cifar_stand/scores_cleansingle_vector_norm20perb_image.npy'
I think the EPSs of 500 natural samples are needed as a reference, so I need to run
python eval_epsad.py --detection_datapath './score_diffusion_t_cifar_stand' \
    --num_sub 500 \
    --adv_batch_size 500 \
    --epsilon 0.01569 \
    --diffuse_t 20 \
    --single_vector_norm_flag \
    --perb_image \
    --clean_score_flag
to obtain the natural-sample EPSs, with dimensions (500,3,32,32), stored in a single NPY file.
I am not sure whether this is the right idea. I have run the above command on a 3090 for 24 hours and still have no results.
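Before training, one quick check is whether the reference file named in the error message exists and has the shape mentioned above. This is a minimal sketch with a hypothetical helper, `check_reference_eps`, which is not part of the repository. The path and the shape (500, 3, 32, 32) are taken from this issue; treating that shape as the required one is an assumption.

```python
import os

import numpy as np


def check_reference_eps(path, expected_shape=(500, 3, 32, 32)):
    """Return (ok, message) describing whether a saved EPS array looks usable.

    Hypothetical helper for illustration; not part of the eps-ad code.
    """
    if not os.path.isfile(path):
        return False, f"missing file: {path}"
    scores = np.load(path)
    if scores.shape != expected_shape:
        return False, f"unexpected shape {scores.shape}, expected {expected_shape}"
    if not np.isfinite(scores).all():
        return False, "array contains NaN or Inf"
    return True, "ok"


# Path copied from the error message in this issue.
ok, message = check_reference_eps(
    "./score_diffusion_t_cifar_stand/scores_cleansingle_vector_norm20perb_image.npy"
)
print(ok, message)
```

If the file is reported missing, the natural-sample EPS run has not yet written its output to that directory.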
