
auto-attack's Introduction

AutoAttack

"Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"
Francesco Croce, Matthias Hein
ICML 2020
https://arxiv.org/abs/2003.01690

We propose to use an ensemble of four diverse attacks to reliably evaluate robustness:

  • APGD-CE, our new step size-free version of PGD on the cross-entropy,
  • APGD-DLR, our new step size-free version of PGD on the new DLR loss,
  • FAB, which minimizes the norm of the adversarial perturbations (Croce & Hein, 2019),
  • Square Attack, a query-efficient black-box attack (Andriushchenko et al, 2019).

Note: we fix all the hyperparameters of the attacks, so no tuning is required to test every new classifier.

News

  • [Sep 2021]
    • We add automatic checks for potential cases where the standard version of AA might not be suitable or sufficient for robustness evaluation.
    • The evaluations of models on CIFAR-10 and CIFAR-100 are no longer maintained. Up-to-date leaderboards are available in RobustBench.
  • [Mar 2021] A version of AutoAttack wrt L1, which includes the extensions of APGD and Square Attack (Croce & Hein, 2021), is available!
  • [Oct 2020] AutoAttack is used as standard evaluation in the new benchmark RobustBench, which includes a Model Zoo of the most robust classifiers! Note that this page and RobustBench's leaderboards are maintained simultaneously.
  • [Aug 2020]
    • Updated version: in order to i) scale AutoAttack (AA) to datasets with many classes and ii) have a faster and more accurate evaluation, we use APGD-DLR and FAB with their targeted versions.
    • We add the evaluation of models on CIFAR-100 wrt Linf and CIFAR-10 wrt L2.
  • [Jul 2020] A short version of the paper is accepted at ICML'20 UDL workshop for a spotlight presentation!
  • [Jun 2020] The paper is accepted at ICML 2020!

Adversarial Defenses Evaluation

Here we list recently proposed adversarial defenses, for several threat models, evaluated with the standard version of AutoAttack (AA), which includes

  • untargeted APGD-CE (no restarts),
  • targeted APGD-DLR (9 target classes),
  • targeted FAB (9 target classes),
  • Square Attack (5000 queries).

See below for the more expensive AutoAttack+ (AA+) and more options.

We report the source of the model (i.e. whether it is publicly available, we received it from the authors, or we retrained it), the architecture, the clean accuracy and the reported robust accuracy (note that the latter might be computed on a subset of the test set or on different models trained with the same defense). The robust accuracy for AA is on the full test set.

We plan to add new models as they appear and are made available. Feel free to suggest new defenses to test!

To have a model added: please check here.

Checkpoints: many of the evaluated models are available and easily accessible at this Model Zoo.

CIFAR-10 - Linf

The robust accuracy is evaluated at eps = 8/255, except for those marked with * for which eps = 0.031, where eps is the maximal Linf-norm allowed for the adversarial perturbations. The eps used is the same as in the original papers.
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).

Update: this is no longer maintained, but an up-to-date leaderboard is available in RobustBench.

# paper source architecture clean reported AA
1 (Gowal et al., 2020) available WRN-70-16 91.10 65.87 65.88
2 (Gowal et al., 2020) available WRN-28-10 89.48 62.76 62.80
3 (Wu et al., 2020a) available WRN-34-15 87.67 60.65 60.65
4 (Wu et al., 2020b) available WRN-28-10 88.25 60.04 60.04
5 (Carmon et al., 2019) available WRN-28-10 89.69 62.5 59.53
6 (Gowal et al., 2020) available WRN-70-16 85.29 57.14 57.20
7 (Sehwag et al., 2020) available WRN-28-10 88.98 - 57.14
8 (Gowal et al., 2020) available WRN-34-20 85.64 56.82 56.86
9 (Wang et al., 2020) available WRN-28-10 87.50 65.04 56.29
10 (Wu et al., 2020b) available WRN-34-10 85.36 56.17 56.17
11 (Alayrac et al., 2019) available WRN-106-8 86.46 56.30 56.03
12 (Hendrycks et al., 2019) available WRN-28-10 87.11 57.4 54.92
13 (Pang et al., 2020c) available WRN-34-20 86.43 54.39 54.39
14 (Pang et al., 2020b) available WRN-34-20 85.14 - 53.74
15 (Cui et al., 2020)* available WRN-34-20 88.70 53.57 53.57
16 (Zhang et al., 2020b) available WRN-34-10 84.52 54.36 53.51
17 (Rice et al., 2020) available WRN-34-20 85.34 58 53.42
18 (Huang et al., 2020)* available WRN-34-10 83.48 58.03 53.34
19 (Zhang et al., 2019b)* available WRN-34-10 84.92 56.43 53.08
20 (Cui et al., 2020)* available WRN-34-10 88.22 52.86 52.86
21 (Qin et al., 2019) available WRN-40-8 86.28 52.81 52.84
22 (Chen et al., 2020a) available RN-50 (x3) 86.04 54.64 51.56
23 (Chen et al., 2020b) available WRN-34-10 85.32 51.13 51.12
24 (Sitawarin et al., 2020) available WRN-34-10 86.84 50.72 50.72
25 (Engstrom et al., 2019) available RN-50 87.03 53.29 49.25
26 (Kumari et al., 2019) available WRN-34-10 87.80 53.04 49.12
27 (Mao et al., 2019) available WRN-34-10 86.21 50.03 47.41
28 (Zhang et al., 2019a) retrained WRN-34-10 87.20 47.98 44.83
29 (Madry et al., 2018) available WRN-34-10 87.14 47.04 44.04
30 (Pang et al., 2020a) available RN-32 80.89 55.0 43.48
31 (Wong et al., 2020) available RN-18 83.34 46.06 43.21
32 (Shafahi et al., 2019) available WRN-34-10 86.11 46.19 41.47
33 (Ding et al., 2020) available WRN-28-4 84.36 47.18 41.44
34 (Atzmon et al., 2019)* available RN-18 81.30 43.17 40.22
35 (Moosavi-Dezfooli et al., 2019) authors WRN-28-10 83.11 41.4 38.50
36 (Zhang & Wang, 2019) available WRN-28-10 89.98 60.6 36.64
37 (Zhang & Xu, 2020) available WRN-28-10 90.25 68.7 36.45
38 (Jang et al., 2019) available RN-20 78.91 37.40 34.95
39 (Kim & Wang, 2020) available WRN-34-10 91.51 57.23 34.22
40 (Wang & Zhang, 2019) available WRN-28-10 92.80 58.6 29.35
41 (Xiao et al., 2020)* available DenseNet-121 79.28 52.4 18.50
42 (Jin & Rinard, 2020) available RN-18 90.84 71.22 1.35
43 (Mustafa et al., 2019) available RN-110 89.16 32.32 0.28
44 (Chan et al., 2020) retrained WRN-34-10 93.79 15.5 0.26

CIFAR-100 - Linf

The robust accuracy is computed at eps = 8/255 in the Linf-norm, except for the models marked with * for which eps = 0.031 is used.
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).

Update: this is no longer maintained, but an up-to-date leaderboard is available in RobustBench.

# paper source architecture clean reported AA
1 (Gowal et al. 2020) available WRN-70-16 69.15 37.70 36.88
2 (Cui et al., 2020)* available WRN-34-20 62.55 30.20 30.20
3 (Gowal et al. 2020) available WRN-70-16 60.86 30.67 30.03
4 (Cui et al., 2020)* available WRN-34-10 60.64 29.33 29.33
5 (Wu et al., 2020b) available WRN-34-10 60.38 28.86 28.86
6 (Hendrycks et al., 2019) available WRN-28-10 59.23 33.5 28.42
7 (Cui et al., 2020)* available WRN-34-10 70.25 27.16 27.16
8 (Chen et al., 2020b) available WRN-34-10 62.15 - 26.94
9 (Sitawarin et al., 2020) available WRN-34-10 62.82 24.57 24.57
10 (Rice et al., 2020) available RN-18 53.83 28.1 18.95

MNIST - Linf

The robust accuracy is computed at eps = 0.3 in the Linf-norm.

# paper source clean reported AA
1 (Gowal et al., 2020) available 99.26 96.38 96.34
2 (Zhang et al., 2020a) available 98.38 96.38 93.96
3 (Gowal et al., 2019) available 98.34 93.78 92.83
4 (Zhang et al., 2019b) available 99.48 95.60 92.81
5 (Ding et al., 2020) available 98.95 92.59 91.40
6 (Atzmon et al., 2019) available 99.35 97.35 90.85
7 (Madry et al., 2018) available 98.53 89.62 88.50
8 (Jang et al., 2019) available 98.47 94.61 87.99
9 (Wong et al., 2020) available 98.50 88.77 82.93
10 (Taghanaki et al., 2019) retrained 98.86 64.25 0.00

CIFAR-10 - L2

The robust accuracy is computed at eps = 0.5 in the L2-norm.
Note: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).

Update: this is no longer maintained, but an up-to-date leaderboard is available in RobustBench.

# paper source architecture clean reported AA
1 (Gowal et al., 2020) available WRN-70-16 94.74 - 80.53
2 (Gowal et al., 2020) available WRN-70-16 90.90 - 74.50
3 (Wu et al., 2020b) available WRN-34-10 88.51 73.66 73.66
4 (Augustin et al., 2020) authors RN-50 91.08 73.27 72.91
5 (Engstrom et al., 2019) available RN-50 90.83 70.11 69.24
6 (Rice et al., 2020) available RN-18 88.67 71.6 67.68
7 (Rony et al., 2019) available WRN-28-10 89.05 67.6 66.44
8 (Ding et al., 2020) available WRN-28-4 88.02 66.18 66.09

How to use AutoAttack

Installation

pip install git+https://github.com/fra31/auto-attack

PyTorch models

Import and initialize AutoAttack with

from autoattack import AutoAttack
adversary = AutoAttack(forward_pass, norm='Linf', eps=epsilon, version='standard')

where:

  • forward_pass returns the logits and takes input with components in [0, 1] (NCHW format expected),
  • norm = ['Linf' | 'L2' | 'L1'] is the norm of the threat model,
  • eps is the bound on the norm of the adversarial perturbations,
  • version = 'standard' uses the standard version of AA.

To apply the standard evaluation, where the attacks are run sequentially on batches of size bs of images, use

x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)

To run the attacks individually, use

dict_adv = adversary.run_standard_evaluation_individual(images, labels, bs=batch_size)

which returns a dictionary with the adversarial examples found by each attack.

To specify a subset of attacks, set e.g. adversary.attacks_to_run = ['apgd-ce'].
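For instance, a minimal end-to-end sketch (assuming a pretrained PyTorch classifier model that outputs logits for inputs in [0, 1] and a CIFAR-10 test loader test_loader; both names are placeholders) could look like:

import torch
from autoattack import AutoAttack

model.eval()  # the classifier must be in evaluation mode

# collect the test set as tensors with components in [0, 1], NCHW format
x_test = torch.cat([x for x, _ in test_loader], 0)
y_test = torch.cat([y for _, y in test_loader], 0)

adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')
# optionally restrict the ensemble, e.g. adversary.attacks_to_run = ['apgd-ce']
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)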

TensorFlow models

To evaluate models implemented in TensorFlow 1.X, use

from autoattack import utils_tf
model_adapted = utils_tf.ModelAdapter(logits, x_input, y_input, sess)

from autoattack import AutoAttack
adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)

where:

  • logits is the tensor with the logits given by the model,
  • x_input is a placeholder for the input for the classifier (NHWC format expected),
  • y_input is a placeholder for the correct labels,
  • sess is a TF session.

If TensorFlow's version is 2.X, use

from autoattack import utils_tf2
model_adapted = utils_tf2.ModelAdapter(tf_model)

from autoattack import AutoAttack
adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)

where:

  • tf_model is a tf.keras model without a final 'softmax' activation.

The evaluation can then be run in the same way as for PyTorch models.
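As a rough sketch (assuming a tf.keras classifier tf_model that outputs logits, and test data x_test, y_test already available as PyTorch tensors in [0, 1], NCHW format, as in the PyTorch case):

from autoattack import utils_tf2
from autoattack import AutoAttack

model_adapted = utils_tf2.ModelAdapter(tf_model)  # wraps the Keras model
adversary = AutoAttack(model_adapted, norm='Linf', eps=8/255,
                       version='standard', is_tf_model=True)

# x_test, y_test are PyTorch tensors, exactly as in the PyTorch example
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)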

Examples

Examples of how to use AutoAttack can be found in examples/. To run the standard evaluation on a pretrained PyTorch model on CIFAR-10 use

python eval.py [--individual] --version=['standard' | 'plus']

where the optional --individual flag activates the individual evaluations (each attack is run on the full test set) and --version selects the version of AA to use (see below).

Other versions

AutoAttack+

A more expensive evaluation can be run by specifying version='plus' when initializing AutoAttack. This includes

  • untargeted APGD-CE (5 restarts),
  • untargeted APGD-DLR (5 restarts),
  • untargeted FAB (5 restarts),
  • Square Attack (5000 queries),
  • targeted APGD-DLR (9 target classes),
  • targeted FAB (9 target classes).
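Apart from the version argument the interface is unchanged; a minimal sketch (reusing forward_pass, epsilon, images and labels from the PyTorch section above):

# via the example script: python eval.py --version=plus
adversary = AutoAttack(forward_pass, norm='Linf', eps=epsilon, version='plus')
x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)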

Randomized defenses

In the case of classifiers with stochastic components, one can combine AA with Expectation over Transformation (EoT), as in (Athalye et al., 2018), by specifying version='rand' when initializing AutoAttack. This runs

  • untargeted APGD-CE (no restarts, 20 iterations for EoT),
  • untargeted APGD-DLR (no restarts, 20 iterations for EoT).
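A minimal sketch of the initialization (same arguments as in the standard case):

adversary = AutoAttack(forward_pass, norm='Linf', eps=epsilon, version='rand')
# 'rand' runs APGD-CE and APGD-DLR with 20 EoT iterations each
x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)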

Custom version

It is possible to customize which attacks to run by specifying version='custom' when initializing the attack and then setting, for example,

if args.version == 'custom':
    adversary.attacks_to_run = ['apgd-ce', 'fab']
    adversary.apgd.n_restarts = 2
    adversary.fab.n_restarts = 2

Other options

Random seed

It is possible to fix the random seed used for the attacks with, e.g., adversary.seed = 0. In this case the same seed is used for all the attacks; otherwise a different random seed is picked for each attack.

Log results

To log the intermediate results of the evaluation specify log_path=/path/to/logfile.txt when initializing the attack.

Citation

@inproceedings{croce2020reliable,
    title = {Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks},
    author = {Francesco Croce and Matthias Hein},
    booktitle = {ICML},
    year = {2020}
}
@inproceedings{croce2021mind,
    title={Mind the box: $l_1$-APGD for sparse adversarial attacks on image classifiers}, 
    author={Francesco Croce and Matthias Hein},
    booktitle={ICML},
    year={2021}
}

auto-attack's People

Contributors

cassidylaidlaw, cnocycle, dedeswim, divyam3897, fra31, gwding, hengyuel, jeromerony, mariamsu, niklasnolte, s-kumano, saehyung-lee, tabrisweapon, topology1225, vtjeng


auto-attack's Issues

What should be reported, the average of results or the minimum?

Firstly, thank you very much for this repo, it is very helpful.

I performed the AA attack on my model and got the following results:

robust accuracy by APGD-CE 28.00% 	 (time attack: 6.2 s)
robust accuracy by APGD-T 	 36.00% 	 (time attack: 15.6 s)
robust accuracy by FAB-T 	 92.00% 	 (time attack: 76.9 s)
robust accuracy by SQUARE 	 65.00% 	 (time attack: 67.5 s)

The average yields 55.25 but the minimum is 28. So what should be reported as the accuracy of the model on AutoAttack?

Also, out of curiosity: most of the models shown in the leaderboard are from the ResNet family of architectures, with one DenseNet network. AutoAttack can be used for other architectures like VGG as well, right?

Add Entropic Retraining

Paper: Optimizing Information Loss Towards Robust Neural Networks

Venue: DYNAMICS 2020 – DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security

Dataset and threat model: CIFAR-10, Linf, eps = 8/255

Code: Not published

Pre-trained model: Pre-trained model available here

Log file: Log file of the evaluation available here

Additional data: No

Clean and robust accuracy: Clean accuracy: 81.75%. Robust accuracy: 41.88%

Architecture: Simple convolutional network: four convolutional layers, the first two with filters of size 32, the second pair with filters of size 64. Each pair of convolutional layers is followed by a max-pooling layer. Finally, one flatten layer and six dense layers with 512, 256, 128, 128, 84, and 10 neurons each. Script with the definition of the model available here.

Description of the model/defense: New loss: during training, which is based on normal data only, an adapted loss function is used. The additional loss term is based on information-theoretically inspired metrics. The method does not require the generation of adversarial examples during training.

Add Adversarial Training with Early Stopping (ATES), CIFAR-100

Paper: Improving Adversarial Robustness Through Progressive Hardening https://arxiv.org/abs/2003.09347

Venue: under review

Dataset and threat model: CIFAR-100, L-inf, 8/255

Code: https://github.com/chawins/ates-minimal

Pre-trained model: weight

Log file: log

Additional data: no

Clean and robust accuracy: 62.82/24.57

Architecture: WRN-34-10

Description of the model/defense: We use the curriculum learning framework to schedule the "difficulty" of adversarial examples generated during adversarial training. This improves both clean and robust accuracy.

Missing file other_utils.py

autoattack.py fails because of a missing module

import other_utils as utils

which is presumably because of a missing other_utils.py file.
Could you please upload it?

Normalization of CIFAR-10 images

In your script, you don't use the standard normalization for CIFAR-10. Would you say that a fair comparison is still possible when normalization is applied, or should an approach tested with AutoAttack be trained without this normalization to stay comparable?
Best regards, and thanks for offering this great repo for testing robustness!

TF2 implementation

Hi authors,

I sincerely thank all authors for their time and effort. Auto-attack is a powerful tool which helps me check a defense's robustness.

After reading the source code, I only found APIs for TF1. The latest version of TF1 (tf-1.15.0) was published 6 months ago. I think the APIs should be upgraded to support TF2.

I am willing to implement APIs for TF2 if necessary.

Thanks

Adversarial examples are equal to the originals

Hi. I ran AutoAttack using your example autoattack/examples/eval.py:

data_dir = './data_CIFAR10'
save_dir = './results_data_CIFAR10'
norm = 'Linf'
epsilon = 0.5
log_path = './log_file.txt'
version = 'standard'
individual = 'store_true'
n_ex = 100
batch_size = 500

model = models.resnet18(pretrained=True)
model.cuda()
model.eval()

# load data
transform_list = [transforms.ToTensor()]
transform_chain = transforms.Compose(transform_list)
item = datasets.CIFAR10(root=data_dir, train=False, transform=transform_chain, download=True)
test_loader = data.DataLoader(item, batch_size=1000, shuffle=False, num_workers=0)

# create save dir
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

# load attack    
from autoattack import AutoAttack
adversary = AutoAttack(model, norm=norm, eps=epsilon, log_path=log_path, version=version)

l = [x for (x, y) in test_loader]
x_test = torch.cat(l, 0)
l = [y for (x, y) in test_loader]
y_test = torch.cat(l, 0)

# example of custom version
if version == 'custom':
    adversary.attacks_to_run = ['apgd-ce', 'fab']
    adversary.apgd.n_restarts = 2
    adversary.fab.n_restarts = 2

# run attack and save images
with torch.no_grad():
    if not individual:
        adv_complete = adversary.run_standard_evaluation(x_test[:n_ex], y_test[:n_ex], bs=batch_size)
        torch.save({'adv_complete': adv_complete}, '{}.pth'.format(save_dir))

    else:
        # individual version, each attack is run on all test points
        adv_complete = adversary.run_standard_evaluation_individual(x_test[:n_ex],
            y_test[:n_ex], bs=batch_size)
        torch.save(adv_complete, '{}.pth'.format(save_dir))

But the resulting adversarial examples are equal to the original inputs:

for key in adv_complete.keys():
    print(f"{key}: {np.all(adv_complete[key][0].numpy() == x_test[0].numpy())}")

>> apgd-ce: True
>> apgd-t: True
>> fab-t: True
>> square: True

I tried epsilon = 8./255. and epsilon = 0.5, but the result did not change.
Could you please explain where I am going wrong?

Width-Adjusted-Regularization Update

Paper: { http://arxiv.org/abs/2010.01279 }

Venue: {unpublished}

Dataset and threat model: {CIFAR-10, l-inf, eps=8/255, AutoAttack}

Code: {Same with the last report}

Pre-trained model: {https://www.dropbox.com/s/89i5zoxa2ugglaq/wrn-34-15-cad59.pt?dl=0 }

Log file: {None}

Additional data: {yes}

Clean and robust accuracy: {clean:87.67%, AutoAttack: 60.65%}

Architecture: {WideResNet-34-15}

Description of the model/defense:
Dear authors of AutoAttack:
This is an update for our last submission in #21. Here we report our new best results and hope they can replace the current entry in the table (the 4th one). We also changed our title from "Does Network Width Really Help Adversarial Robustness?" to "Do Wider Neural Networks Really Help Adversarial Robustness?". Please update this information for us on RobustBench too.

Thanks!
Boxi Wu

Default configuration of CIFAR100

Hi authors,

Is the default attack configuration for CIFAR-100 identical to CIFAR-10? Specifically, the number of target classes is set to 9 for CIFAR-10, but CIFAR-100 has 100 classes. In my experience, the targeted attacks cannot produce successful adversarial examples when the target class is set to the 6th largest class, for both the CIFAR-10 and CIFAR-100 datasets. Should we search all directions (99 classes), or can we safely search only the 9 largest classes using the default configuration for CIFAR-100?

Python Package

Are you considering releasing this project as a python package?

eps = 8./255. works fine, eps = 4./255. does not

I want to run the standard attack on different epsilons for the perturbations.
It also works on different datasets except one.

my normalization:

mean:  [0.36015135049819946, 0.21252931654453278, 0.1168241947889328]
std :  [0.24773411452770233, 0.20017878711223602, 0.17963241040706
using standard version including apgd-ce, apgd-t, fab-t, square
initial accuracy: 91.60%
apgd-ce - 1/1 - 431 out of 458 successfully perturbed
robust accuracy after APGD-CE: 5.40% (total time 110.8 s)
Traceback (most recent call last):
  File "/home/user/adversialml/src/src/attacks.py", line 104, in <module>
    adv_complete, max_nr = adversary.run_standard_evaluation(x_test, y_test, bs=args.batch_size)
  File "/home/user/adversialml/src/src/submodules/autoattack/autoattack/autoattack.py", line 172, in run_standard_evaluation
    adv_curr = self.apgd_targeted.perturb(x, y) #cheap=True
  File "/home/user/adversialml/src/src/submodules/autoattack/autoattack/autopgd_base.py", line 682, in perturb
    res_curr = self.attack_single_run(x_to_fool, y_to_fool)
  File "/home/user/adversialml/src/src/submodules/autoattack/autoattack/autopgd_base.py", line 279, in attack_single_run
    loss_indiv = criterion_indiv(logits, y)
  File "/home/user/adversialml/src/src/submodules/autoattack/autoattack/autopgd_base.py", line 611, in dlr_loss_targeted
    x_sorted[:, -3] + x_sorted[:, -4]) + 1e-12)
IndexError: index -3 is out of bounds for dimension 1 with size 2

Any suggestions on what to do?

Interested in adding models to the table!

Is it still possible to have new models added to the table? I remember seeing some template earlier, but it seems to have been removed. What is the current protocol? Or is it a work in progress at the moment?

Random seed

Hi,
I am a big fan of autoattack and robustbench. The centralization/standardization of adversarial robustness is so helpful. :)

I'm working on a new approach to adversarial robustness and am evaluating on autoattack.

Unfortunately, I can't exactly replicate the values on robustbench with a random seed of 0 or 1. Could you please share the random seed you use for the numbers on the leaderboard?

Thanks

Normalized input data for AutoAttack

Hi authors,

If our models for CIFAR-10 are trained with normalized data (mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010)), can we evaluate the robust accuracy with AutoAttack? I could not find a normalization step in either eval.py or autoattack.py.

Thanks,
Liang
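A common workaround (a sketch only, not an official part of this repo; the same pattern appears in the "Rounding to nearest pixel value" issue below) is to fold the normalization into the model's forward pass, so that AutoAttack keeps operating on inputs in [0, 1]. Here model stands for the classifier trained on normalized data:

import torch
from autoattack import AutoAttack

class NormalizedModel(torch.nn.Module):
    # hypothetical wrapper: applies the dataset normalization inside forward,
    # so the attack itself still sees unnormalized inputs in [0, 1]
    def __init__(self, model, mean, std):
        super().__init__()
        self.model = model
        self.register_buffer('mean', torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x):
        return self.model((x - self.mean) / self.std)

wrapped = NormalizedModel(model, mean=(0.4914, 0.4822, 0.4465),
                          std=(0.2023, 0.1994, 0.2010))
adversary = AutoAttack(wrapped, norm='Linf', eps=8/255, version='standard')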

Unable to customize attacks' parameters

Hi, authors

I noticed some issues when customizing the attacks.

First, attacks_to_run and some basic parameters are set at the beginning, and set_version is called at the end of the constructor.

class AutoAttack():
    def __init__(self, model, norm='Linf', eps=.3, seed=None, verbose=True,
                 attacks_to_run=[], version='standard', is_tf_model=False,
                 device='cuda', log_path=None):
        self.model = model
        self.norm = norm
        assert norm in ['Linf', 'L2']
        self.epsilon = eps
        self.seed = seed
        self.verbose = verbose
        self.attacks_to_run = attacks_to_run
        self.version = version
        self.is_tf_model = is_tf_model
        self.device = device
        self.logger = Logger(log_path)

        if version in ['standard', 'plus', 'rand']:
            self.set_version(version)

Therefore, those parameters are overwritten by set_version.

Second, the details of the configuration are set in set_version.

def set_version(self, version='standard'):
    if version == 'standard':
        self.attacks_to_run = ['apgd-ce', 'apgd-t', 'fab-t', 'square']
        self.apgd.n_restarts = 1
        self.fab.n_restarts = 1
        self.apgd_targeted.n_restarts = 1
        self.fab.n_target_classes = 9
        self.apgd_targeted.n_target_classes = 9
        self.square.n_queries = 5000
    elif version == 'plus':
        self.attacks_to_run = ['apgd-ce', 'apgd-dlr', 'fab', 'square', 'apgd-t', 'fab-t']
        self.apgd.n_restarts = 5
        self.fab.n_restarts = 5
        self.apgd_targeted.n_restarts = 1
        self.fab.n_target_classes = 9
        self.apgd_targeted.n_target_classes = 9
        self.square.n_queries = 5000
    elif version == 'rand':
        self.attacks_to_run = ['apgd-ce', 'apgd-dlr']
        self.apgd.n_restarts = 1
        self.apgd.eot_iter = 20

I would suggest adding public APIs to set these details.
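In the meantime, one workaround consistent with the README's "Custom version" section (a sketch; model is a placeholder) is to initialize with version='custom', which skips set_version, and then set the sub-attack parameters directly:

from autoattack import AutoAttack

adversary = AutoAttack(model, norm='Linf', eps=8/255, version='custom',
                       attacks_to_run=['apgd-ce', 'fab'])
# with version='custom' the constructor does not call set_version,
# so these settings are not overwritten
adversary.apgd.n_restarts = 2
adversary.fab.n_restarts = 2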

Compactness+Robustness

Paper: https://arxiv.org/pdf/2002.10509.pdf

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: CIFAR-10, l-inf, eps=8/255

Code: https://github.com/inspire-group/compactness-robustness. A minimal script to evaluate a WRN-28-10 model is available at https://gist.github.com/VSehwag/688632e523df5d2a4c8008f5ee567b1c (only need to download the checkpoint)

Pre-trained model: https://www.dropbox.com/sh/56yyfy16elwbnr8/AADmr7bXgFkrNdoHjKWwIFKqa?dl=0 Use the model_best_dense.pth.tar

Log file: {link to log file of the evaluation}

Additional data: yes

Clean and robust accuracy: Benign test accuracy = 88.97% , PGD-50 test accuracy (1-restart) = 62.24%, Auto-attack (cheap): 57.88%

Architecture: WRN-28-10 (90% connections pruned)

Description of the model/defense: Compressed neural networks while simultaneously achieving high robustness.

Modified Square Attack to Randomized Defense

Hi,

I want to apply the Square Attack to a randomized defense. You mentioned in the paper (https://arxiv.org/pdf/2003.01690.pdf) as follows:

We modify Square Attack to accept an update if it reduces the target loss on average over 20 forward passes and, as this costs more time we use only 1000 iterations,

but I don't know how to use the modified version of Square Attack. Is it not implemented yet? Can you elaborate on how to modify the code?

Thanks in advance!

Support FP16 in pytorch

Hi contributors,

Will auto-attack support FP16 (or mixed precision)[1] in pytorch?

In TF2, FP16 is configured at the beginning of the main function with a single call: tf.keras.mixed_precision.set_global_policy('mixed_float16').

The benefit of FP16 is a significant decrease in elapsed time without degrading the attack algorithm's performance.

The following is the output logging of my experimental implementation on V100:

# FP32 version
apgd-ce - 1/17 - 159 out of 500 successfully perturbed
apgd-ce - 2/17 - 146 out of 500 successfully perturbed
apgd-ce - 3/17 - 154 out of 500 successfully perturbed
apgd-ce - 4/17 - 142 out of 500 successfully perturbed
apgd-ce - 5/17 - 155 out of 500 successfully perturbed
apgd-ce - 6/17 - 156 out of 500 successfully perturbed
apgd-ce - 7/17 - 157 out of 500 successfully perturbed
apgd-ce - 8/17 - 148 out of 500 successfully perturbed
apgd-ce - 9/17 - 153 out of 500 successfully perturbed
apgd-ce - 10/17 - 161 out of 500 successfully perturbed
apgd-ce - 11/17 - 166 out of 500 successfully perturbed
apgd-ce - 12/17 - 141 out of 500 successfully perturbed
apgd-ce - 13/17 - 158 out of 500 successfully perturbed
apgd-ce - 14/17 - 152 out of 500 successfully perturbed
apgd-ce - 15/17 - 155 out of 500 successfully perturbed
apgd-ce - 16/17 - 151 out of 500 successfully perturbed
apgd-ce - 17/17 - 125 out of 412 successfully perturbed
robust accuracy after APGD-CE: 58.33% (total time 1835.3 s)
apgd-t - 1/12 - 27 out of 500 successfully perturbed
apgd-t - 2/12 - 24 out of 500 successfully perturbed
apgd-t - 3/12 - 23 out of 500 successfully perturbed
apgd-t - 4/12 - 18 out of 500 successfully perturbed
apgd-t - 5/12 - 23 out of 500 successfully perturbed
apgd-t - 6/12 - 16 out of 500 successfully perturbed
apgd-t - 7/12 - 24 out of 500 successfully perturbed
apgd-t - 8/12 - 23 out of 500 successfully perturbed
apgd-t - 9/12 - 28 out of 500 successfully perturbed
apgd-t - 10/12 - 22 out of 500 successfully perturbed
apgd-t - 11/12 - 27 out of 500 successfully perturbed
apgd-t - 12/12 - 22 out of 333 successfully perturbed
robust accuracy after APGD-T: 55.56% (total time 12733.1 s)
# FP16 version
apgd-ce - 1/17 - 159 out of 500 successfully perturbed
apgd-ce - 2/17 - 147 out of 500 successfully perturbed
apgd-ce - 3/17 - 154 out of 500 successfully perturbed
apgd-ce - 4/17 - 141 out of 500 successfully perturbed
apgd-ce - 5/17 - 155 out of 500 successfully perturbed
apgd-ce - 6/17 - 156 out of 500 successfully perturbed
apgd-ce - 7/17 - 158 out of 500 successfully perturbed
apgd-ce - 8/17 - 147 out of 500 successfully perturbed
apgd-ce - 9/17 - 156 out of 500 successfully perturbed
apgd-ce - 10/17 - 160 out of 500 successfully perturbed
apgd-ce - 11/17 - 164 out of 500 successfully perturbed
apgd-ce - 12/17 - 139 out of 500 successfully perturbed
apgd-ce - 13/17 - 158 out of 500 successfully perturbed
apgd-ce - 14/17 - 152 out of 500 successfully perturbed
apgd-ce - 15/17 - 155 out of 500 successfully perturbed
apgd-ce - 16/17 - 151 out of 500 successfully perturbed
apgd-ce - 17/17 - 126 out of 412 successfully perturbed
robust accuracy after APGD-CE: 58.34% (total time 751.5 s)
apgd-t - 1/12 - 28 out of 500 successfully perturbed
apgd-t - 2/12 - 22 out of 500 successfully perturbed
apgd-t - 3/12 - 24 out of 500 successfully perturbed
apgd-t - 4/12 - 16 out of 500 successfully perturbed
apgd-t - 5/12 - 20 out of 500 successfully perturbed
apgd-t - 6/12 - 15 out of 500 successfully perturbed
apgd-t - 7/12 - 23 out of 500 successfully perturbed
apgd-t - 8/12 - 25 out of 500 successfully perturbed
apgd-t - 9/12 - 29 out of 500 successfully perturbed
apgd-t - 10/12 - 21 out of 500 successfully perturbed
apgd-t - 11/12 - 26 out of 500 successfully perturbed
apgd-t - 12/12 - 21 out of 334 successfully perturbed
robust accuracy after APGD-T: 55.64% (total time 5264.7 s)

As shown in the logs, the running time improves substantially (1835.3 s -> 751.5 s).

However, FP16 requires a newer PyTorch version and suitable CUDA hardware. Additionally, the source code would need to be modified accordingly.

I'm not sure whether FP16 will be supported on the master branch in the future.

[1] https://pytorch.org/docs/stable/notes/amp_examples.html

Criterion for adding new models to the existing list of defenses?

Thanks for releasing such a rigorous evaluation of existing works on adversarial defenses. It is immensely helpful to get more clarity on this topic. I wonder what is the criterion to add new models to the existing list of defenses?

In particular, I am wondering whether papers (such as https://arxiv.org/pdf/2002.10509.pdf) which study adversarial training (in particular the SOTA approach from Carmon et al., 2019) in a new setting qualify for it. The aforementioned paper revisits the question of "whether high capacity is necessary for adversarial robustness" from Madry et al., 2018 and shows that high robustness can be achieved even after removing up to 99% of the parameters.

In general, it would be a nice addition to the repo to have an evaluation of works that do not directly aim to improve robustness but try to preserve it in the presence of other challenges (such as label noise, pruning, etc.).

Thanks.

How to use auto-attack with tensorflow?

When I run auto-attack with tensorflow I get an error:

import tensorflow as tf
tf_model = tf.keras.applications.VGG16(input_shape=(224, 224, 3))

file_name = "/home/m.cherepnina/cock.jpg"
image = tf.io.read_file(file_name)
image = tf.image.decode_image(image)
image = tf.image.convert_image_dtype(image, tf.float32)
image = tf.image.resize_with_pad(image, target_height=224, target_width=224)

labels = [7]
batch_size= 1
images = tf.keras.applications.vgg16.preprocess_input(tf.convert_to_tensor([image])*255)
images = tf.transpose(images, perm=[0,3,2,1])

import utils_tf2
model_adapted = utils_tf2.ModelAdapter(tf_model)

from autoattack import AutoAttack
adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)

x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)

output:

[INFO] set data_format = 'channels_last'
setting parameters for standard version
using standard version including apgd-ce, apgd-t, fab-t, square

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-97-0e96c60d65fb> in <module>
     21 adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)
     22 
---> 23 x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)

~/auto-attack/autoattack/autoattack.py in run_standard_evaluation(self, x_orig, y_orig, bs)
     81             # calculate accuracy
     82             n_batches = int(np.ceil(x_orig.shape[0] / bs))
---> 83             robust_flags = torch.zeros(x_orig.shape[0], dtype=torch.bool, device=x_orig.device)
     84             for batch_idx in range(n_batches):
     85                 start_idx = batch_idx * bs

RuntimeError: Invalid device string: '/job:localhost/replica:0/task:0/device:CPU:0'

Add LBGAT on CIFAR-10

Paper: {Learnable Boundary Guided Adversarial Training; https://arxiv.org/pdf/2011.11164.pdf}

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: {CIFAR-10, L-inf and epsilon 0.031}

Code: { code}

Pre-trained model: LBGAT0-wideresnet-34-10, LBGAT0-wideresnet-34-20

Log file: {LBGAT0-wideresnet-34-10, LBGAT0-wideresnet-34-20}

Additional data: {no}

Clean and robust accuracy: {88.22/52.86, 88.70/53.57}

Architecture: {wideresnet-34-10; wideresnet-34-20}

Description of the model/defense: {Our method aims to enhance model robustness while preserving high natural accuracy. }

Information related to existing defenses

Dear authors,

Thank you for releasing the exhaustive list of state-of-the-art defenses along with their evaluations on a standard set of benchmark attacks. This is very helpful to the research community working in this area.
Could you please share more information related to the models listed on the leaderboard? Specifically, it would be helpful if you can share information related to the fields that we need to fill up while submitting a new defense such as architecture, link to code, link to model weights and a brief description of the defense.

Thanks,
Sravanti

Slow evaluation on a dataset with a large number of classes in TF1.x setting

Thank you for releasing the evaluation code.
AA evaluation is very helpful in studying adversarial robustness, especially on the CIFAR-10 benchmark.
However, when I tried to evaluate my model (trained on CIFAR-100, TF 1.13), it took too much time to run the code.
For long stretches of the running time, the GPU utilization stays at 0%.
How can I solve this problem?

DLR loss implementation in TF2

Hi authors,

The DLR loss in TF2 may be incomplete. The misclassified case is not implemented:

def dlr_loss(x, y, num_classes=10):
    x_sort = tf.sort(x, axis=1)
    y_onehot = tf.one_hot(y, num_classes)
    ### TODO: adapt to the case when the point is already misclassified
    loss = -(x_sort[:, -1] - x_sort[:, -2]) / (x_sort[:, -1] - x_sort[:, -3] + 1e-12)
    return loss

I tried to implement DLR loss from the original paper's description in Section 4.1. Please verify the following code's correctness.

def dir_loss(x, y, num_classes=10):

    # logit
    logit = x
    logit_max = tf.reduce_max(logit, axis=1)
    logit_sort = tf.sort(logit, axis=1)

    # onthot_y
    #argmax_y = tf.argmax(y, axis = 1)
    argmax_y = y
    y_onehot = tf.one_hot(argmax_y , num_classes, dtype=tf.float32)
    logit_y = tf.reduce_sum(y_onehot * logit, axis=1)

    # z_i
    cond = (logit_max == logit_y)
    z_i = tf.where(cond, logit_sort[:, -2], logit_sort[:, -1])

    # loss
    z_y = logit_y
    z_p1 =  logit_sort[:, -1]
    z_p3 = logit_sort[:, -3]

    loss = - (z_y - z_i) / (z_p1 - z_p3 + 1e-12)

    return loss

If the code is acceptable, I will open a new PR.

Add FSGM_APR_SP

Paper: Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain

Venue: ICCV 2021

Dataset and threat model: CIFAR-10, L-inf, 8/255

Code: code

Pre-trained model: model

Log file: log

Additional data: no

Clean and robust accuracy: 84.30% / 45.70%

Architecture: RN-18

Description of the model/defense: Motivated by the powerful generalizability of the human, we argue that reducing the dependence on the amplitude spectrum and enhancing the ability to capture phase spectrum can improve the robustness of CNN.

The DLR loss is undefined for classification problems with fewer than 4 classes

The DLR loss is one of the major innovations of your work and is central to one of the four attacks used in the AutoAttack benchmark, APGD-DLR. However, when I was running tests with your framework on a couple of data sets, I noticed that AutoAttack had a tendency to crash when running the APGD-DLR attack. This is caused by the fact that the DLR loss function as defined in equation (6) of your paper implicitly assumes that the classification problem is composed of at least 3 classes; the targeted version presented in equation (7) assumes at least 4 classes.

This limitation raises a number of concerns which I think should be addressed:

  1. The AutoAttack framework itself currently issues no warning and raises no reasonable exceptions when running experiments on data sets with fewer than four classes. Instead, we get an unintuitive index out of bounds exception which makes no sense to someone unfamiliar with this drawback of the DLR loss.
  2. This problem raises the question of how to run the AutoAttack benchmark on, say, binary classification problems without compromising the results. One obvious "solution" is to exclude the APGD-DLR attack from the suite for such data sets, leaving only the APGD-CE, FAB and Square attacks. However, this obviously makes the evaluation of the models weaker, and may call into question the meaningfulness of the results. Ideally, the DLR loss should be generalized to a form that still makes sense even when there are only two classes.
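For reference, the exclusion mentioned in point 2 can already be expressed with the current interface; a minimal sketch (model, x_test, y_test and eps are placeholders, and the target-class counts of the remaining attacks may also need adjusting for problems with very few classes):

from autoattack import AutoAttack

# run AA without the DLR-based attacks, e.g. for a binary classification task
adversary = AutoAttack(model, norm='Linf', eps=eps, version='custom',
                       attacks_to_run=['apgd-ce', 'fab', 'square'])
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=128)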

Autoattack and APGDT

Hi, I am a beginner and still feel a bit confused about AutoAttack and the targeted APGD (APGD-T). (1) For standard AutoAttack, is every image attacked by the 4 attacks respectively, with the average robust accuracy over the 4 attacks computed at the end? And in AutoAttack, are the adversarial images for each attack saved separately? (2) For targeted APGD-T, how is the target label chosen? I see that the number of target labels equals the total number of classes minus one for CIFAR-10. If I want to use it for ImageNet, should I set n_target_classes = 999 or some number between 1 and 999? What is the principle for choosing the targets?

Looking forward to your help! Thanks!

Guidelines for evaluating defense using Auto Attack

Hello, what is the protocol for using Auto Attack to evaluate a Defense?
For example, it has been found here that Shattered Gradients and Stochastic Gradients could result in a false sense of security.
The README mentions one should use version='rand' to combat Stochastic Gradients, but what is the protocol for Shattered Gradients? I know that Square Attack will not be affected by Shattered Gradients since it uses Black Box access only.

For example, what if I train a ResNet50 to solve CIFAR-10, and then attach a GradientKiller layer at the bottom? How will Auto Attack compute adversarial examples then?

Code in Pytorch:

class GradientKiller(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return input

    @staticmethod
    def backward(ctx, grad_output):
        return torch.zeros(grad_output.size())

class SuperGeniusDefense(torch.nn.Module):
    def __init__(self, trained_model):
        super(SuperGeniusDefense, self).__init__()
        self.model = trained_model

    def forward(self, X):
        X = GradientKiller.apply(X)
        X = self.model(X)
        return X

Now providing the AutoAttack adversary with the forward pass of SuperGeniusDefense will provide a false sense of security when it is actually not safer than SuperGeniusDefense.model.

Loading data batch wise?

Hi, and thanks for open-sourcing your code!

My machine runs into a memory error when running the evaluation scripts on several models; it looks like the load_cifar10 function (and others) loads the full dataset into memory and then iterates over the tensor. Am I correct, or is there a way of loading the data sequentially to the GPU?

Thanks!

Regarding the checks recently added to AutoAttack

Good evening!

I was browsing the repository, and I found this page with potential checks that can be applied to better evaluate attacks (https://github.com/fra31/auto-attack/blob/master/flags_doc.md)
We worked on a similar topic, and we published a preprint in June (https://arxiv.org/abs/2106.09947) where we develop indicators that trigger when an evaluation is faulty (and among these indicators, there is also the zero-gradient check). In the process, we also evaluated AutoPGD, showing that our systematic checks can patch failures that the automatic algorithm is unable to find.
We also released the code of our research on GitHub (https://github.com/pralab/IndicatorsOfAttackFailure).

It would be great if you could add a reference to our paper/code to that page, as the underlying idea is essentially very similar. Hence, interested users can benefit from both sources as well.

Thank you in advance!

Add Adversarial Training with Early Stopping (ATES), CIFAR-10

Paper: Improving Adversarial Robustness Through Progressive Hardening https://arxiv.org/abs/2003.09347

Venue: under review

Dataset and threat model: CIFAR-10, L-inf, 8/255

Code: https://github.com/chawins/ates-minimal

Pre-trained model: weight

Log file: log

Additional data: no

Clean and robust accuracy: 86.84/50.72

Architecture: WRN-34-10

Description of the model/defense: We use the curriculum learning framework to schedule the "difficulty" of adversarial examples generated during adversarial training. This improves both clean and robust accuracy.

ImportError: cannot import name 'zero_gradients' from 'torch.autograd.gradcheck'

I updated my pytorch to 1.9.0 via

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

After that, I got the next error message.

File "/home/satoharu/AdEx/SDPML/src/attacker/Attacker.py", line 65, in attack
    adv_example = self._attack(log=log, *args, **kwargs)
  File "/home/satoharu/AdEx/SDPML/src/attacker/AUTOPGD.py", line 96, in _attack
    adversary = AutoAttack(
  File "/home/satoharu/.pyenv/versions/3.9.4_sdp/lib/python3.9/site-packages/autoattack/autoattack.py", line 31, in __init__
    from .fab_pt import FABAttack
  File "/home/satoharu/.pyenv/versions/3.9.4_sdp/lib/python3.9/site-packages/autoattack/fab_pt.py", line 16, in <module>
    from torch.autograd.gradcheck import zero_gradients
ImportError: cannot import name 'zero_gradients' from 'torch.autograd.gradcheck' (/home/satoharu/.pyenv/versions/3.9.4_sdp/lib/python3.9/site-packages/torch/autograd/gradcheck.py)

I think the zero_gradients method was removed.

Reference

https://pytorch.org/get-started/locally/

Question about variation in reported (clean & robust accuracy) metrics

Hello, I'm running into something confusing and wondering if I am using AA correctly.
When I evaluate my network using AA with default settings, there is a small variation in the reported clean and robust accuracy compared to when I compute the metrics myself using the output of run_standard_evaluation:

For example, here I evaluate the clean accuracy

> (model(x_test).argmax(1) == y_test.argmax(1)).sum()/len(x_test)
0.87

Here, I want to evaluate robust accuracy using AA.

>x_adv = adversary.run_standard_evaluation(x_test, y_test.argmax(1), bs=32)
using standard version including apgd-ce, apgd-t, fab-t, square
initial accuracy: 85.00%
.
.
.
robust accuracy: 49.00%
> (model(x_adv).argmax(1) == y_test.argmax(1)).sum()/len(x_adv)
0.54

So first, there is a 2-percent difference in clean accuracy, but more seriously, there is a large difference in the robust accuracies. I have verified the intermediate output of AutoAttack, e.g. when I add up the numbers by hand I get 42.00% accuracy.
The weird thing is the big difference when I use the x_adv output of AA. Is there something I'm missing about the output x_adv? My defense is basically adversarial training on a wide-resnet, so no randomness in the forward pass.

Add AWP w/ additional data

Paper: Dongxian Wu, Shu-Tao Xia, Yisen Wang. Adversarial Weight Perturbation Helps Robust Generalization

Venue: NeurIPS 2020

Dataset and threat model: CIFAR-10, L-inf, 8/255

Code: https://github.com/csdongxian/AWP/tree/main/auto_attacks

Pre-trained model: weight

Log file: log

Additional data: yes

Clean and robust accuracy: 88.25% / 60.04%

Architecture: WRN-28-10

Description of the model/defense: We introduce adversarial weight perturbations to adversarial training and its variants (TRADES, MART, RST, etc.), which can help the robust generalization.

Add Entropic Retraining

Paper: Optimizing Information Loss Towards Robust Neural Networks

Venue: DYNAMICS 2020 – DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security

Dataset and threat model: CIFAR-10, L2, eps = 0.5

Code: Not published

Pre-trained model: Pre-trained model available here

Log file: Log file of the evaluation available here

Additional data: No

Clean and robust accuracy: Clean accuracy: 81.75%. Robust accuracy: 56.67%

Architecture: Simple convolutional network: four convolutional layers, the first two with filters of size 32, the second pair with filters of size 64. Each pair of convolutional layers is followed by a max-pooling layer. Finally, one flatten layer and six dense layers with 512, 256, 128, 128, 84, and 10 neurons each.

Description of the model/defense: New loss: during training, which is based on normal data only, an adapted loss function is used. The additional loss term is based on information-theoretically inspired metrics. The method does not require the generation of adversarial examples during training.

Add Backward Smoothing

Paper: Efficient Robust Training via Backward Smoothing https://arxiv.org/abs/2010.01278

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: CIFAR-10/CIFAR100, Linf, 8/255

Code: https://github.com/jinghuichen/AutoAttackEval

Pre-trained model: https://drive.google.com/file/d/1lvMa2rbMrIVkAqsyrs_YXLBhewZBfdkP/view?usp=sharing (CIFAR10)
https://drive.google.com/file/d/1xNhK4w5ZuUSfbD_WR4xFKTprojaVux1A/view?usp=sharing (CIFAR100)

Log file: {link to log file of the evaluation}

Additional data: no

Clean and robust accuracy: CIFAR10 clean 85.32 robust 54.94 CIFAR100 clean 62.15 robust 31.92

Architecture: {wideresnet-34-10}

Description of the model/defense: Efficient robust training via backward smoothing

Thanks

Rounding to nearest pixel value breaks almost all attacks

Usually images are stored in uint8 format, in the range [0, 255].
Hence when I round the values of an image to the nearest integer values, all attacks fail to achieve the desired accuracy.

class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model.eval()
        self.mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(1, 3, 1, 1).to(device)
        self.std = torch.tensor([0.2470, 0.2435, 0.2616]).view(1, 3, 1, 1).to(device)
    
    def forward(self, x):
        x = x.clamp(0, 1)
        x = x * 255
        x = torch.round(x)
        x = x / 255
        x = (x - self.mean) / self.std
        x = self.model(x)
        return x

I know that torch.round() doesn't give useful gradients to the adversary, hence the drop in attack accuracy.
So how can I make sure the inputs to the model correspond to valid integer values in [0, 255], but still achieve high attack accuracy?

Add LBGAT

Paper: {Learnable Boundary Guided Adversarial Training; https://arxiv.org/pdf/2011.11164.pdf}

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: {CIFAR-100, L-inf and epsilon 0.031}

Code: { code}

Pre-trained model: LBGAT0-wideresnet-34-10, LBGAT6-wideresnet-34-10, LBGAT6-wideresnet-34-20

Log file: {link to log file of the evaluation}

Additional data: {no}

Clean and robust accuracy: {70.25/27.16, 60.64/29.33, 62.55/30.20}

Architecture: {wideresnet-34-10; wideresnet-34-20}

Description of the model/defense: {Our method aims to enhance model robustness while preserving high natural accuracy. }

Share the evaluation code for "overfitting in adversarial robust deep learning" as in your paper

Hi @fra31, thanks for releasing the code for evaluating various defense methods.
However, I am curious about one defense in your released table (Rice et al., 2020, "Overfitting in adversarially robust deep learning"): they actually train the model adversarially using data normalization, so directly using the current AutoAttack code cannot reproduce the result in your table, since you assume there is no such normalization. This causes problems when AA generates adversarial examples.

Could you please share the code for evaluating this defense? Or did you retrain their model without normalization?

Thanks,

Add [Stochastic LWTA]

Paper: Local Competition and Stochasticity for Adversarial Robustness in Deep Learning (http://proceedings.mlr.press/v130/panousis21a)

Venue: International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

Dataset and threat model: CIFAR-10, L-inf, 8/255

Code: https://github.com/konpanousis/Adversarial-LWTA-AutoAttack

Pre-trained model: https://drive.google.com/file/d/15gTO0_HJzRi6toYEtlwA96Hwe49flmWA/view?usp=sharing

Log file: https://github.com/konpanousis/Adversarial-LWTA-AutoAttack/blob/main/log.txt

Additional data: No

Clean and robust accuracy: 90.89 and 87.5

Architecture: {WideResNet-34-5 with Stochastic LWTA Activations}

Description of the model/defense: {This work addresses adversarial robustness in deep learning by considering deep networks with stochastic local winner-takes-all (LWTA) activations. This type of network unit results in sparse representations from each model layer, as the units are organized in blocks where only one unit generates a non-zero output. The main operating principle of the introduced units lies on stochastic arguments, as the network performs posterior sampling over competing units to select the winner. We combine these LWTA arguments with tools from the field of Bayesian non-parametrics, specifically the stick-breaking construction of the Indian Buffet Process, to allow for inferring the sub-part of each layer that is essential for modeling the data at hand. Then, inference is performed by means of stochastic variational Bayes. We perform a thorough experimental evaluation of our model using benchmark datasets. As we show, our method achieves high robustness to adversarial perturbations, with state-of-the-art performance in powerful adversarial attack schemes.}

Unable to load adversarial image

Hi, thank you for providing your open source code for the community. I have tried replicating the basic CIFAR-10 example by running eval.py from the examples folder but I was unable to load a clear picture of the adversarial images. Could I trouble you to take a look at my code and advise me on how I can load it properly? Thank you for your help!


PyTorch API `view` may not work if tensors are non-contiguous

Hi authors,

I'm not familiar with PyTorch, and I occasionally get some errors.

It complains that a tensor is non-contiguous and suggests that reshape is better than view.

auto-attack/fab_tf.py, lines 394 to 407 (commit 0185c79):

if self.norm == 'Linf':
    dist1 = df.abs() / (1e-12 +
                        dg.abs()
                        .view(dg.shape[0], dg.shape[1], -1)
                        .sum(dim=-1))
elif self.norm == 'L2':
    dist1 = df.abs() / (1e-12 + (dg ** 2)
                        .view(dg.shape[0], dg.shape[1], -1)
                        .sum(dim=-1).sqrt())
elif self.norm == 'L1':
    dist1 = df.abs() / (1e-12 + dg.abs().reshape(
        [df.shape[0], df.shape[1], -1]).max(dim=2)[0])
else:
    raise ValueError('norm not supported')

I also found that the code is not unified: as you can see, lines 397 and 401 use view but line 404 uses reshape.

An alternative solution is calling .contiguous() before .view(...), or view could be replaced by reshape.

I'm not sure which solution is suitable for this project.

Any suggestion?
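For context, a small self-contained sketch of the two options (not taken from the repo):

import torch

x = torch.randn(4, 3, 8, 8).permute(0, 2, 3, 1)  # permute yields a non-contiguous tensor
# x.view(4, -1)                  # would raise a RuntimeError on the non-contiguous tensor
a = x.contiguous().view(4, -1)   # option 1: make a contiguous copy, then view
b = x.reshape(4, -1)             # option 2: reshape copies only when necessary
assert torch.equal(a, b)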

Add AWP w/o additional data

Paper: Dongxian Wu, Shu-Tao Xia, Yisen Wang. Adversarial Weight Perturbation Helps Robust Generalization

Venue: NeurIPS 2020

Dataset and threat model: CIFAR-10/CIFAR100 under L-inf (8/255) and CIFAR-10 under L-2 (0.5)

Code: https://github.com/csdongxian/AWP/tree/main/auto_attacks

Pre-trained model:
CIFAR-10 under L_inf
CIFAR-100 under L_inf
CIFAR-10 under L_2

Log file:
CIFAR-10 under L_inf
CIFAR-100 under L_inf
CIFAR-10 under L_2

Additional data: no

Clean and robust accuracy:
CIFAR-10 under L_inf: 85.36% / 56.17%
CIFAR-100 under L_inf: 60.38% / 28.86%
CIFAR-10 under L_2: 88.51% / 73.66%

Architecture: WRN-34-10

Description of the model/defense: We introduce adversarial weight perturbations to adversarial training and its variants (TRADES, MART, RST, etc.), which can help the robust generalization.

Width-Adjusted-Regularization

Paper: will be uploaded soon

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: CIFAR-10, l-inf, eps=8/255

Code: Training with WAR based on implementation of RST, for testing, please refer to https://github.com/tabrisweapon/A-temp-project

'''
python auto_cifar10.py --width=15 --model-dir=highest.pt
'''

Pre-trained model: https://www.dropbox.com/s/89uuo4w2iaitw04/highest.pt?dl=0

Log file: {link to log file of the evaluation}

Additional data: yes

Clean and robust accuracy: clean: 85.60%, PGD 20 * 0.003: 64.86%

Architecture: WRN-34-15

Description of the model/defense: A new training principle: stronger regularization for wider models

Evaluation on randomized defenses

I see that the attack is modified slightly for the non-deterministic defenses. I could not find part of the code that handles this setting. Could you point me to the code or please make it available? Thank you!
