
auto-attack's Issues

Normalization of CIFAR-10 images

In your script, you don't use the standard normalization for CIFAR-10. Would you say that a fair comparison is still possible when normalization is applied, or should an approach tested with AutoAttack be trained without this normalization to remain comparable?
Best regards, and thanks for offering this great repo for testing robustness!

TF2 implementation

Hi authors,

I sincerely thank all the authors for their time and effort. AutoAttack is a powerful tool that helps me check a defense's robustness.

After reading the source code, I only found APIs for TF1. The latest TF1 release (tf-1.15.0) was published about six months ago, so I think the APIs should be upgraded to support TF2.

I am willing to implement APIs for TF2 if necessary.

Thanks

The DLR loss is undefined for classification problems with fewer than 4 classes

The DLR loss is one of the major innovations of your work and is central to one of the four attacks used in the AutoAttack benchmark, APGD-DLR. However, when I was running tests with your framework on a couple of data sets, I noticed that AutoAttack had a tendency to crash when running the APGD-DLR attack. This is caused by the fact that the DLR loss function as defined in equation (6) of your paper implicitly assumes that the classification problem is composed of at least 3 classes; the targeted version presented in equation (7) assumes at least 4 classes.

This limitation raises a number of concerns which I think should be addressed:

  1. The AutoAttack framework itself currently issues no warning and raises no reasonable exceptions when running experiments on data sets with fewer than four classes. Instead, we get an unintuitive index out of bounds exception which makes no sense to someone unfamiliar with this drawback of the DLR loss.
  2. This problem raises the question of how to run the AutoAttack benchmark on, say, binary classification problems without compromising the results. One obvious "solution" is to exclude the APGD-DLR attack from the suite for such data sets, leaving only the APGD-CE, FAB and Square attacks. However, this obviously makes the evaluation of the models weaker, and may call into question the meaningfulness of the results. Ideally, the DLR loss should be generalized to a form that still makes sense even when there are only two classes.
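For reference, here is how I read equations (6) and (7) in PyTorch (my own sketch, not the reference implementation); the sorted-logit indices -3 and -4 are exactly what go out of bounds with fewer than 3 or, respectively, 4 classes:

import torch

def dlr_loss(logits, y):
    # eq. (6): -(z_y - max_{i != y} z_i) / (z_pi1 - z_pi3), needs at least 3 classes
    z_sorted, _ = logits.sort(dim=1)
    z_y = logits[torch.arange(logits.shape[0]), y]
    correct = (z_sorted[:, -1] == z_y).float()
    z_other = z_sorted[:, -2] * correct + z_sorted[:, -1] * (1. - correct)
    return -(z_y - z_other) / (z_sorted[:, -1] - z_sorted[:, -3] + 1e-12)

def dlr_loss_targeted(logits, y, y_target):
    # eq. (7): -(z_y - z_t) / (z_pi1 - (z_pi3 + z_pi4) / 2), needs at least 4 classes
    z_sorted, _ = logits.sort(dim=1)
    u = torch.arange(logits.shape[0])
    return -(logits[u, y] - logits[u, y_target]) / (
        z_sorted[:, -1] - 0.5 * (z_sorted[:, -3] + z_sorted[:, -4]) + 1e-12)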

Default configuration of CIFAR100

Hi authors,

Is the default attack configuration for CIFAR-100 identical to CIFAR-10? Specifically, the number of target classes is set to 9 for CIFAR-10, but CIFAR-100 has 100 classes. In my experience, the targeted attacks cannot produce successful adversarial examples when the target class is set to the 6th-largest class, for both CIFAR-10 and CIFAR-100. Should we search all directions (99 classes), or can we safely search only the 9 largest classes using the default configuration for CIFAR-100?
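For what it's worth, the number of target classes can be overridden after constructing the attack (attribute names taken from set_version further down this page); whether 9 targets is enough for CIFAR-100 is exactly the open question here, so the value below is only a placeholder. model and eps are assumptions:

from autoattack import AutoAttack

# model is assumed to be a CIFAR-100 classifier; eps is the usual 8/255 placeholder
adversary = AutoAttack(model, norm='Linf', eps=8/255, version='standard')

# the standard version uses 9 target classes; on CIFAR-100 the search could be
# widened (up to 99) at a roughly proportional increase in runtime
adversary.apgd_targeted.n_target_classes = 20
adversary.fab.n_target_classes = 20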

How to use auto-attack with tensorflow?

When I run auto-attack with tensorflow I get an error:

import tensorflow as tf
tf_model = tf.keras.applications.VGG16(input_shape=(224, 224, 3))

file_name = "/home/m.cherepnina/cock.jpg"
image = tf.io.read_file(file_name)
image = tf.image.decode_image(image)
image = tf.image.convert_image_dtype(image, tf.float32)
image = tf.image.resize_with_pad(image, target_height=224, target_width=224)

labels = [7]
batch_size= 1
images = tf.keras.applications.vgg16.preprocess_input(tf.convert_to_tensor([image])*255)
images = tf.transpose(images, perm=[0,3,2,1])

import utils_tf2
model_adapted = utils_tf2.ModelAdapter(tf_model)

from autoattack import AutoAttack
adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)

x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)

output:

[INFO] set data_format = 'channels_last'
setting parameters for standard version
using standard version including apgd-ce, apgd-t, fab-t, square

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-97-0e96c60d65fb> in <module>
     21 adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True)
     22 
---> 23 x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)

~/auto-attack/autoattack/autoattack.py in run_standard_evaluation(self, x_orig, y_orig, bs)
     81             # calculate accuracy
     82             n_batches = int(np.ceil(x_orig.shape[0] / bs))
---> 83             robust_flags = torch.zeros(x_orig.shape[0], dtype=torch.bool, device=x_orig.device)
     84             for batch_idx in range(n_batches):
     85                 start_idx = batch_idx * bs

RuntimeError: Invalid device string: '/job:localhost/replica:0/task:0/device:CPU:0'
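For context, the traceback suggests that TensorFlow tensors (which carry a TF device string) are being passed where the PyTorch-based attack loop expects torch tensors; the ModelAdapter converts to TensorFlow internally. A hedged sketch of the conversion I would try before calling run_standard_evaluation (shapes, dtypes and the NCHW layout are assumed from the snippet above):

import torch

# convert the preprocessed TF tensors to torch tensors before handing them to AutoAttack
x_test = torch.from_numpy(images.numpy()).float()
y_test = torch.tensor(labels, dtype=torch.long)

x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=batch_size)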

Regarding the checks recently added to AutoAttack

Good evening!

I was browsing the repository, and I found this page with potential checks that can be applied to better evaluate attacks (https://github.com/fra31/auto-attack/blob/master/flags_doc.md)
We worked on a similar topic, and we published a preprint in June (https://arxiv.org/abs/2106.09947) where we develop indicators that trigger when an evaluation is faulty (and among these indicators, there is also the zero-gradient check). In the process, we also evaluated AutoPGD, showing that our systematic checks can patch failures that the automatic algorithm is unable to find.
We also released the code of our research on GitHub (https://github.com/pralab/IndicatorsOfAttackFailure).

It would be great if you could add a reference to our paper/code on that page, as the underlying idea is essentially the same; interested users could then benefit from both sources.

Thank you in advance!

Support FP16 in pytorch

Hi contributors,

Will auto-attack support FP16 (or mixed precision)[1] in pytorch?

In TF2, FP16 is configured at the beginning of the main function with a single call: tf.keras.mixed_precision.set_global_policy('mixed_float16').

The benefit of FP16 is a significant reduction in elapsed time without degrading the attack's performance.

The following is the output logging of my experimental implementation on V100:

# FP32 version
apgd-ce - 1/17 - 159 out of 500 successfully perturbed
apgd-ce - 2/17 - 146 out of 500 successfully perturbed
apgd-ce - 3/17 - 154 out of 500 successfully perturbed
apgd-ce - 4/17 - 142 out of 500 successfully perturbed
apgd-ce - 5/17 - 155 out of 500 successfully perturbed
apgd-ce - 6/17 - 156 out of 500 successfully perturbed
apgd-ce - 7/17 - 157 out of 500 successfully perturbed
apgd-ce - 8/17 - 148 out of 500 successfully perturbed
apgd-ce - 9/17 - 153 out of 500 successfully perturbed
apgd-ce - 10/17 - 161 out of 500 successfully perturbed
apgd-ce - 11/17 - 166 out of 500 successfully perturbed
apgd-ce - 12/17 - 141 out of 500 successfully perturbed
apgd-ce - 13/17 - 158 out of 500 successfully perturbed
apgd-ce - 14/17 - 152 out of 500 successfully perturbed
apgd-ce - 15/17 - 155 out of 500 successfully perturbed
apgd-ce - 16/17 - 151 out of 500 successfully perturbed
apgd-ce - 17/17 - 125 out of 412 successfully perturbed
robust accuracy after APGD-CE: 58.33% (total time 1835.3 s)
apgd-t - 1/12 - 27 out of 500 successfully perturbed
apgd-t - 2/12 - 24 out of 500 successfully perturbed
apgd-t - 3/12 - 23 out of 500 successfully perturbed
apgd-t - 4/12 - 18 out of 500 successfully perturbed
apgd-t - 5/12 - 23 out of 500 successfully perturbed
apgd-t - 6/12 - 16 out of 500 successfully perturbed
apgd-t - 7/12 - 24 out of 500 successfully perturbed
apgd-t - 8/12 - 23 out of 500 successfully perturbed
apgd-t - 9/12 - 28 out of 500 successfully perturbed
apgd-t - 10/12 - 22 out of 500 successfully perturbed
apgd-t - 11/12 - 27 out of 500 successfully perturbed
apgd-t - 12/12 - 22 out of 333 successfully perturbed
robust accuracy after APGD-T: 55.56% (total time 12733.1 s)
# FP16 version
apgd-ce - 1/17 - 159 out of 500 successfully perturbed
apgd-ce - 2/17 - 147 out of 500 successfully perturbed
apgd-ce - 3/17 - 154 out of 500 successfully perturbed
apgd-ce - 4/17 - 141 out of 500 successfully perturbed
apgd-ce - 5/17 - 155 out of 500 successfully perturbed
apgd-ce - 6/17 - 156 out of 500 successfully perturbed
apgd-ce - 7/17 - 158 out of 500 successfully perturbed
apgd-ce - 8/17 - 147 out of 500 successfully perturbed
apgd-ce - 9/17 - 156 out of 500 successfully perturbed
apgd-ce - 10/17 - 160 out of 500 successfully perturbed
apgd-ce - 11/17 - 164 out of 500 successfully perturbed
apgd-ce - 12/17 - 139 out of 500 successfully perturbed
apgd-ce - 13/17 - 158 out of 500 successfully perturbed
apgd-ce - 14/17 - 152 out of 500 successfully perturbed
apgd-ce - 15/17 - 155 out of 500 successfully perturbed
apgd-ce - 16/17 - 151 out of 500 successfully perturbed
apgd-ce - 17/17 - 126 out of 412 successfully perturbed
robust accuracy after APGD-CE: 58.34% (total time 751.5 s)
apgd-t - 1/12 - 28 out of 500 successfully perturbed
apgd-t - 2/12 - 22 out of 500 successfully perturbed
apgd-t - 3/12 - 24 out of 500 successfully perturbed
apgd-t - 4/12 - 16 out of 500 successfully perturbed
apgd-t - 5/12 - 20 out of 500 successfully perturbed
apgd-t - 6/12 - 15 out of 500 successfully perturbed
apgd-t - 7/12 - 23 out of 500 successfully perturbed
apgd-t - 8/12 - 25 out of 500 successfully perturbed
apgd-t - 9/12 - 29 out of 500 successfully perturbed
apgd-t - 10/12 - 21 out of 500 successfully perturbed
apgd-t - 11/12 - 26 out of 500 successfully perturbed
apgd-t - 12/12 - 21 out of 334 successfully perturbed
robust accuracy after APGD-T: 55.64% (total time 5264.7 s)

As the logs show, the speedup is substantial: 1835.3 s -> 751.5 s for APGD-CE and 12733.1 s -> 5264.7 s for APGD-T, with robust accuracy essentially unchanged.

However, FP16 requires a newer PyTorch version and suitable CUDA hardware, and the source code would need to be modified accordingly.

Will FP16 be supported on the master branch in the future?

[1] https://pytorch.org/docs/stable/notes/amp_examples.html
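On the PyTorch side, the change would presumably revolve around torch.cuda.amp. A rough, untested sketch (not part of the auto-attack code base) of computing input gradients under autocast while keeping the perturbation updates in FP32:

import torch

# hypothetical sketch: run the forward/backward pass in FP16 via autocast,
# scale the loss to avoid underflow, then unscale the resulting input gradient
scaler = torch.cuda.amp.GradScaler()

def grad_wrt_input(model, x, y, loss_fn):
    x = x.clone().detach().requires_grad_(True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(x), y).sum()
    grad = torch.autograd.grad(scaler.scale(loss), [x])[0]
    return grad / scaler.get_scale()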

Python Package

Are you considering releasing this project as a python package?

What should be reported, the average of results or the minimum?

Firstly, thank you very much for this repo, it is very helpful.

I performed the AA attack on my model and got the following results:

robust accuracy by APGD-CE:  28.00%  (time attack: 6.2 s)
robust accuracy by APGD-T:   36.00%  (time attack: 15.6 s)
robust accuracy by FAB-T:    92.00%  (time attack: 76.9 s)
robust accuracy by SQUARE:   65.00%  (time attack: 67.5 s)

The average yields 55.25 but the minimum is 28. So what should be reported as the accuracy of the model on AutoAttack?

Also, out of curiosity: most of the models on the leaderboard are from the ResNet family, with one DenseNet. AutoAttack can also be used for other architectures like VGG, right?

Add Entropic Retraining

Paper: Optimizing Information Loss Towards Robust Neural Networks

Venue: DYNAMICS 2020 – DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security

Dataset and threat model: CIFAR-10, L2, eps = 0.5

Code: Not published

Pre-trained model: Pre-trained model available here

Log file: Log file of the evaluation available here

Additional data: No

Clean and robust accuracy: Clean accuracy: 81.75%. Robust accuracy: 56.67%

Architecture: Simple convolutional network: four convolutional layers, the first two with filters of size 32, the second pair with filters of size 64. Each pair of convolutional layers is followed by a max-pooling layer. Finally, one flatten layer and six dense layers with 512, 256, 128, 128, 84, and 10 neurons each.

Description of the model/defense: New loss: during training, which is based on normal data only, an adapted loss function is used. The additional loss term is based on information-theoretically inspired metrics. The method does not require the generation of adversarial examples during training.

Random seed

Hi,
I am a big fan of autoattack and robustbench. The centralization/standardization of adversarial robustness is so helpful. :)

I'm working on a new approach to adversarial robustness and am evaluating on autoattack.

Unfortunately, I can't exactly replicate the values on robustbench with a random seed of 0 or 1. Could you please share the random seed you use for the numbers on the leaderboard?

Thanks

Rounding to nearest pixel value breaks almost all attacks

Usually images are stored in uint8 format, in the range [0, 255].
Hence, when I round the values of an image to the nearest integer values, all attacks fail to achieve the desired accuracy.

class ModelWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model.eval()
        self.mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(1, 3, 1, 1).to(device)
        self.std = torch.tensor([0.2470, 0.2435, 0.2616]).view(1, 3, 1, 1).to(device)
    
    def forward(self, x):
        x = x.clamp(0, 1)
        x = x * 255
        x = torch.round(x)
        x = x / 255
        x = (x - self.mean) / self.std
        x = self.model(x)
        return x

I know that torch.round() doesn't give useful gradients to the adversary, hence the drop in attack accuracy.
So how can I make sure the inputs to the model correspond to valid integer values in [0, 255], but still achieve high attack accuracy?
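One common workaround for exactly this (quantize in the forward pass, but let gradients flow as if the rounding were not there) is a straight-through estimator. A sketch, not something AutoAttack provides:

import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # real quantization to the 255-level grid happens in the forward pass
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through: treat rounding as the identity in the backward pass
        return grad_output

# in ModelWrapper.forward above, `x = torch.round(x)` would become:
# x = RoundSTE.apply(x)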

Width-Adjusted-Regularization

Paper: will be uploaded soon

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: CIFAR-10, l-inf, eps=8/255

Code: Training with WAR based on implementation of RST, for testing, please refer to https://github.com/tabrisweapon/A-temp-project

'''
python auto_cifar10.py --width=15 --model-dir=highest.pt
'''

Pre-trained model: https://www.dropbox.com/s/89uuo4w2iaitw04/highest.pt?dl=0

Log file: {link to log file of the evaluation}

Additional data: yes

Clean and robust accuracy: clean: 85.60%, PGD 20 * 0.003: 64.86%

Architecture: WRN-34-15

Description of the model/defense: A new training principle: stronger regularization for wider models

eps 8./255. works fine, 4./255. does not

I want to run the standard attack on different epsilons for the perturbations.
It also works on different datasets except one.

my normalization:

mean:  [0.36015135049819946, 0.21252931654453278, 0.1168241947889328]
std :  [0.24773411452770233, 0.20017878711223602, 0.17963241040706
using standard version including apgd-ce, apgd-t, fab-t, square
initial accuracy: 91.60%
apgd-ce - 1/1 - 431 out of 458 successfully perturbed
robust accuracy after APGD-CE: 5.40% (total time 110.8 s)
Traceback (most recent call last):
  File "/home/user/adversialml/src/src/attacks.py", line 104, in <module>
    adv_complete, max_nr = adversary.run_standard_evaluation(x_test, y_test, bs=args.batch_size)
  File "/home/user/adversialml/src/src/submodules/autoattack/autoattack/autoattack.py", line 172, in run_standard_evaluation
    adv_curr = self.apgd_targeted.perturb(x, y) #cheap=True
  File "/home/user/adversialml/src/src/submodules/autoattack/autoattack/autopgd_base.py", line 682, in perturb
    res_curr = self.attack_single_run(x_to_fool, y_to_fool)
  File "/home/user/adversialml/src/src/submodules/autoattack/autoattack/autopgd_base.py", line 279, in attack_single_run
    loss_indiv = criterion_indiv(logits, y)
  File "/home/user/adversialml/src/src/submodules/autoattack/autoattack/autopgd_base.py", line 611, in dlr_loss_targeted
    x_sorted[:, -3] + x_sorted[:, -4]) + 1e-12)
IndexError: index -3 is out of bounds for dimension 1 with size 2

Any suggestions what to do?

Adversarials are equal to originals

Hi. I ran AutoAttack using your example autoattack/examples/eval.py:

data_dir = './data_CIFAR10'
save_dir = './results_data_CIFAR10'
norm = 'Linf'
epsilon = 0.5
log_path = './log_file.txt'
version = 'standard'
individual = 'store_true'
n_ex = 100
batch_size = 500

model = models.resnet18(pretrained=True)
model.cuda()
model.eval()

# load data
transform_list = [transforms.ToTensor()]
transform_chain = transforms.Compose(transform_list)
item = datasets.CIFAR10(root=data_dir, train=False, transform=transform_chain, download=True)
test_loader = data.DataLoader(item, batch_size=1000, shuffle=False, num_workers=0)

# create save dir
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

# load attack    
from autoattack import AutoAttack
adversary = AutoAttack(model, norm=norm, eps=epsilon, log_path=log_path, version=version)

l = [x for (x, y) in test_loader]
x_test = torch.cat(l, 0)
l = [y for (x, y) in test_loader]
y_test = torch.cat(l, 0)

# example of custom version
if version == 'custom':
    adversary.attacks_to_run = ['apgd-ce', 'fab']
    adversary.apgd.n_restarts = 2
    adversary.fab.n_restarts = 2

# run attack and save images
with torch.no_grad():
    if not individual:
        adv_complete = adversary.run_standard_evaluation(x_test[:n_ex], y_test[:n_ex], bs=batch_size)
        torch.save({'adv_complete': adv_complete}, '{}.pth'.format(save_dir))

    else:
        # individual version, each attack is run on all test points
        adv_complete = adversary.run_standard_evaluation_individual(x_test[:n_ex],
            y_test[:n_ex], bs=batch_size)
        torch.save(adv_complete, '{}.pth'.format(save_dir))

But the resulting adversarials are equal to the original inputs:

for key in adv_complete.keys():
    print(f"{key}: {np.all(adv_complete[key][0].numpy() == x_test[0].numpy())}")

>> apgd-ce: True
>> apgd-t: True
>> fab-t: True
>> square: True

I tried both epsilon = 8./255. and epsilon = 0.5. The result did not change.
Could you please explain where I am wrong?

Add LBGAT

Paper: {Learnable Boundary Guided Adversarial Training; https://arxiv.org/pdf/2011.11164.pdf}

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: {CIFAR-100, L-inf and epsilon 0.031}

Code: { code}

Pre-trained model: LBGAT0-wideresnet-34-10, LBGAT6-wideresnet-34-10, LBGAT6-wideresnet-34-20

Log file: {link to log file of the evaluation}

Additional data: {no}

Clean and robust accuracy: {70.25/27.16, 60.64/29.33, 62.55/30.20}

Architecture: {wideresnet-34-10; wideresnet-34-20}

Description of the model/defense: {Our method aims to enhance model robustness while preserving high natural accuracy. }

Unable to customize attacks' parameters

Hi, authors

I noticed some issues when customizing attacks.

First, attacks_to_run and some basic parameters are set at the beginning of the constructor, and set_version is called at the end of it.

class AutoAttack():
    def __init__(self, model, norm='Linf', eps=.3, seed=None, verbose=True,
                 attacks_to_run=[], version='standard', is_tf_model=False,
                 device='cuda', log_path=None):
        self.model = model
        self.norm = norm
        assert norm in ['Linf', 'L2']
        self.epsilon = eps
        self.seed = seed
        self.verbose = verbose
        self.attacks_to_run = attacks_to_run
        self.version = version
        self.is_tf_model = is_tf_model
        self.device = device
        self.logger = Logger(log_path)

        if version in ['standard', 'plus', 'rand']:
            self.set_version(version)

Therefore, those parameters are overwritten by set_version.

Second, the details of the configuration are set in set_version.

def set_version(self, version='standard'):
    if version == 'standard':
        self.attacks_to_run = ['apgd-ce', 'apgd-t', 'fab-t', 'square']
        self.apgd.n_restarts = 1
        self.fab.n_restarts = 1
        self.apgd_targeted.n_restarts = 1
        self.fab.n_target_classes = 9
        self.apgd_targeted.n_target_classes = 9
        self.square.n_queries = 5000
    elif version == 'plus':
        self.attacks_to_run = ['apgd-ce', 'apgd-dlr', 'fab', 'square', 'apgd-t', 'fab-t']
        self.apgd.n_restarts = 5
        self.fab.n_restarts = 5
        self.apgd_targeted.n_restarts = 1
        self.fab.n_target_classes = 9
        self.apgd_targeted.n_target_classes = 9
        self.square.n_queries = 5000
    elif version == 'rand':
        self.attacks_to_run = ['apgd-ce', 'apgd-dlr']
        self.apgd.n_restarts = 1
        self.apgd.eot_iter = 20

I would suggest adding public APIs for setting these details.
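As a workaround with the current code, constructing the object with a version outside ['standard', 'plus', 'rand'] skips set_version, so manually set parameters survive; this mirrors the 'custom' branch of the eval.py example above. A sketch (model and eps are assumptions):

from autoattack import AutoAttack

# 'custom' is not in ['standard', 'plus', 'rand'], so set_version() is skipped
# and none of the values set below are overwritten
adversary = AutoAttack(model, norm='Linf', eps=8/255, version='custom')
adversary.attacks_to_run = ['apgd-ce', 'fab']
adversary.apgd.n_restarts = 2
adversary.fab.n_restarts = 2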

Interested in adding models to the table!

Is it still possible to have new models added to the table? I remember seeing some template earlier, but it seems to have been removed. What is the current protocol? Or is it a work in progress at the moment?

Normalized input data for AutoAttack

Hi authors,

If our models for CIFAR-10 are trained with normalized data (mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010)), can we still evaluate robust accuracy with your AutoAttack? I could not find a normalization step in either eval.py or autoattack.py.

Thanks,
Liang
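One pattern that is commonly used for this (a sketch, not an official API of the repo) is to fold the normalization into the model itself, so that AutoAttack sees inputs in [0, 1] and eps keeps its meaning in the original pixel scale; mean and std below are the values quoted above:

import torch

class NormalizedModel(torch.nn.Module):
    def __init__(self, model, mean=(0.4914, 0.4822, 0.4465), std=(0.2023, 0.1994, 0.2010)):
        super().__init__()
        self.model = model
        self.register_buffer('mean', torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer('std', torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x):
        # x is expected in [0, 1]; normalization happens inside the forward pass
        return self.model((x - self.mean) / self.std)

# adversary = AutoAttack(NormalizedModel(model).cuda().eval(), norm='Linf', eps=8/255)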

DLR loss implementation in TF2

Hi authors,

The DLR loss in TF2 may be incomplete. The misclassified case is not implemented:

def dlr_loss(x, y, num_classes=10):
    x_sort = tf.sort(x, axis=1)
    y_onehot = tf.one_hot(y, num_classes)
    ### TODO: adapt to the case when the point is already misclassified
    loss = -(x_sort[:, -1] - x_sort[:, -2]) / (x_sort[:, -1] - x_sort[:, -3] + 1e-12)
    return loss

I tried to implement DLR loss from the original paper's description in Section 4.1. Please verify the following code's correctness.

def dir_loss(x, y, num_classes=10):

    # logit
    logit = x
    logit_max = tf.reduce_max(logit, axis=1)
    logit_sort = tf.sort(logit, axis=1)

    # onthot_y
    #argmax_y = tf.argmax(y, axis = 1)
    argmax_y = y
    y_onehot = tf.one_hot(argmax_y , num_classes, dtype=tf.float32)
    logit_y = tf.reduce_sum(y_onehot * logit, axis=1)

    # z_i
    cond = (logit_max == logit_y)
    z_i = tf.where(cond, logit_sort[:, -2], logit_sort[:, -1])

    # loss
    z_y = logit_y
    z_p1 =  logit_sort[:, -1]
    z_p3 = logit_sort[:, -3]

    loss = - (z_y - z_i) / (z_p1 - z_p3 + 1e-12)

    return loss

If the code is acceptable, I will open a new PR.

Add [Stochastic LWTA]

Paper: Local Competition and Stochasticity for Adversarial Robustness in Deep Learning (http://proceedings.mlr.press/v130/panousis21a)

Venue: International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

Dataset and threat model: CIFAR-10, L-inf, 8/255

Code: https://github.com/konpanousis/Adversarial-LWTA-AutoAttack

Pre-trained model: https://drive.google.com/file/d/15gTO0_HJzRi6toYEtlwA96Hwe49flmWA/view?usp=sharing

Log file: https://github.com/konpanousis/Adversarial-LWTA-AutoAttack/blob/main/log.txt

Additional data: No

Clean and robust accuracy: 90.89 and 87.5

Architecture: {WideResNet-34-5 with Stochastic LWTA Activations}

Description of the model/defense: {This work addresses adversarial robustness in deep learning by considering deep networks with stochastic local winner-takes-all (LWTA) activations. These units yield sparse representations from each model layer, as the units are organized in blocks where only one unit generates a non-zero output. The main operating principle of the introduced units lies in stochastic arguments, as the network performs posterior sampling over competing units to select the winner. We combine these LWTA arguments with tools from the field of Bayesian nonparametrics, specifically the stick-breaking construction of the Indian Buffet Process, to infer the sub-part of each layer that is essential for modeling the data at hand. Inference is then performed by means of stochastic variational Bayes. We perform a thorough experimental evaluation of our model using benchmark datasets. As we show, our method achieves high robustness to adversarial perturbations, with state-of-the-art performance in powerful adversarial attack schemes.}

Width-Adjusted-Regularization Update

Paper: { http://arxiv.org/abs/2010.01279 }

Venue: {unpublished}

Dataset and threat model: {CIFAR-10, l-inf, eps=8/255, AutoAttack}

Code: {Same with the last report}

Pre-trained model: {https://www.dropbox.com/s/89i5zoxa2ugglaq/wrn-34-15-cad59.pt?dl=0 }

Log file: {None}

Additional data: {yes}

Clean and robust accuracy: {clean:87.67%, AutoAttack: 60.65%}

Architecture: {WideResNet-34-15}

Description of the model/defense:
Dear authors of AutoAttack:
This is an update to our previous submission in #21. Here we report our new best results and hope to replace the current entry in the table (the 4th one) with this one. We also changed our title from "Does Network Width Really Help Adversarial Robustness?" to "Do Wider Neural Networks Really Help Adversarial Robustness?". Please update this information for us on RobustBench too.

Thanks!
Boxi Wu

Add Adversarial Training with Early Stopping (ATES), CIFAR-10

Paper: Improving Adversarial Robustness Through Progressive Hardening https://arxiv.org/abs/2003.09347

Venue: under review

Dataset and threat model: CIFAR-10, L-inf, 8/255

Code: https://github.com/chawins/ates-minimal

Pre-trained model: weight

Log file: log

Additional data: no

Clean and robust accuracy: 86.84/50.72

Architecture: WRN-34-10

Description of the model/defense: We use the curriculum learning framework to schedule the "difficulty" of adversarial examples generated during adversarial training. This improves both clean and robust accuracy.

Compactness+Robustness

Paper: https://arxiv.org/pdf/2002.10509.pdf

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: CIFAR-10, l-inf, eps=8/255

Code: https://github.com/inspire-group/compactness-robustness. A minimal script to evaluate a WRN-28-10 model is available at https://gist.github.com/VSehwag/688632e523df5d2a4c8008f5ee567b1c (only need to download the checkpoint)

Pre-trained model: https://www.dropbox.com/sh/56yyfy16elwbnr8/AADmr7bXgFkrNdoHjKWwIFKqa?dl=0 Use the model_best_dense.pth.tar

Log file: {link to log file of the evaluation}

Additional data: yes

Clean and robust accuracy: Benign test accuracy = 88.97% , PGD-50 test accuracy (1-restart) = 62.24%, Auto-attack (cheap): 57.88%

Architecture: WRN-28-10 (90% connections pruned)

Description of the model/defense: Compressed neural networks while simultaneously achieving high robustness.

Loading data batch wise?

Hi, and thanks for open-sourcing your code!

My machine runs into a memory error when running the evaluation scripts on several models; it looks like the load_cifar10 function (and others) loads the full dataset into memory and then iterates over the tensor. Am I correct, or is there a way of loading the data sequentially to the GPU?

Thanks!
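A possible sketch (my own, not part of the repo) that feeds AutoAttack one batch at a time from a DataLoader and aggregates robust accuracy manually; test_set, model, device and adversary are assumed to be defined as usual:

import torch
from torch.utils.data import DataLoader

loader = DataLoader(test_set, batch_size=256, shuffle=False)

n_robust, n_total = 0, 0
for x, y in loader:
    # attack one chunk of the test set at a time
    x_adv = adversary.run_standard_evaluation(x, y, bs=128)
    with torch.no_grad():
        pred = model(x_adv.to(device)).argmax(1)
    n_robust += (pred.cpu() == y).sum().item()
    n_total += y.shape[0]

print(f'robust accuracy: {100.0 * n_robust / n_total:.2f}%')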

Unable to load adversarial image

Hi, thank you for providing your open source code for the community. I have tried replicating the basic CIFAR-10 example by running eval.py from the examples folder but I was unable to load a clear picture of the adversarial images. Could I trouble you to take a look at my code and advise me on how I can load it properly? Thank you for your help!

[image attachment]

Add AWP w/o additional data

Paper: Dongxian Wu, Shu-Tao Xia, Yisen Wang. Adversarial Weight Perturbation Helps Robust Generalization

Venue: NeurIPS 2020

Dataset and threat model: CIFAR-10/CIFAR100 under L-inf (8/255) and CIFAR-10 under L-2 (0.5)

Code: https://github.com/csdongxian/AWP/tree/main/auto_attacks

Pre-trained model:
CIFAR-10 under L_inf
CIFAR-100 under L_inf
CIFAR-10 under L_2

Log file:
CIFAR-10 under L_inf
CIFAR-100 under L_inf
CIFAR-10 under L_2

Additional data: no

Clean and robust accuracy:
CIFAR-10 under L_inf: 85.36% / 56.17%
CIFAR-100 under L_inf: 60.38% / 28.86%
CIFAR-10 under L_2: 88.51% / 73.66%

Architecture: WRN-34-10

Description of the model/defense: We introduce adversarial weight perturbations to adversarial training and its variants (TRADES, MART, RST, etc.), which can help the robust generalization.

Add LBGAT on CIFAR-10

Paper: {Learnable Boundary Guided Adversarial Training; https://arxiv.org/pdf/2011.11164.pdf}

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: {CIFAR-10, L-inf and epsilon 0.031}

Code: { code}

Pre-trained model: LBGAT0-wideresnet-34-10, LBGAT0-wideresnet-34-20

Log file: {LBGAT0-wideresnet-34-10, LBGAT0-wideresnet-34-20}

Additional data: {no}

Clean and robust accuracy: {88.22/52.86, 88.70/53.57}

Architecture: {wideresnet-34-10; wideresnet-34-20}

Description of the model/defense: {Our method aims to enhance model robustness while preserving high natural accuracy. }

Question about variation in reported (clean & robust accuracy) metrics

Hello, running into something confusing & wondering if if I am using AA correctly.
When I evaluate my network using AA with default settings,
there is a small variation in reported the clean and robust accuracy compared to when I compute metrics myself using
the output of run_standard_evaluation:

For example, here I evaluate the clean accuracy

> (model(x_test).argmax(1) == y_test.argmax(1)).sum()/len(x_test)
0.87

Here, I want to evaluate robust accuracy using AA.

>x_adv = adversary.run_standard_evaluation(x_test, y_test.argmax(1), bs=32)
using standard version including apgd-ce, apgd-t, fab-t, square
initial accuracy: 85.00%
.
.
.
robust accuracy: 49.00%
> (model(x_adv).argmax(1) == y_test.argmax(1)).sum()/len(x_adv)
0.54

So first, there is a 2-percent difference in clean accuracy, but more seriously, there is a large difference in the robust accuracies. I have verified the intermediate output of AutoAttack; e.g., when I add up the numbers by hand I get 42.00% accuracy.
The weird thing is the big difference when I use the x_adv output of AA. Is there something I'm missing about the output x_adv? My defense is basically adversarial training on a wide-resnet, so no randomness in the forward pass.

Add Backward Smoothing

Paper: Efficient Robust Training via Backward Smoothing https://arxiv.org/abs/2010.01278

Venue: {if applicable, the venue where the paper appeared}

Dataset and threat model: CIFAR-10/CIFAR100, Linf, 8/255

Code: https://github.com/jinghuichen/AutoAttackEval

Pre-trained model: https://drive.google.com/file/d/1lvMa2rbMrIVkAqsyrs_YXLBhewZBfdkP/view?usp=sharing (CIFAR10)
https://drive.google.com/file/d/1xNhK4w5ZuUSfbD_WR4xFKTprojaVux1A/view?usp=sharing (CIFAR100)

Log file: {link to log file of the evaluation}

Additional data: no

Clean and robust accuracy: CIFAR10 clean 85.32 robust 54.94 CIFAR100 clean 62.15 robust 31.92

Architecture: {wideresnet-34-10}

Description of the model/defense: Efficient robust training via backward smoothing

Thanks

Share the evaluation code for "overfitting in adversarial robust deep learning" as in your paper

Hi, @fra31 , thanks for releasing the code for evaluating various defense method.
However, I am curious about one defense in your released table (Rice et al., 2020, "Overfitting in adversarially robust deep learning"): they actually train the model adversarially with data normalization, yet directly using the current AutoAttack code cannot reproduce the result in your table, since you assume there is no such normalization. This causes problems when AA generates adversarial examples.

Could you please share the code for evaluating this defense? Or did you retrain their model without normalization?

Thanks,

PyTorch API `view` may not work if tensors are non-contiguous

Hi authors,

I'm not very familiar with PyTorch, and I occasionally get errors.

PyTorch complains that a tensor is non-contiguous and suggests using reshape instead of view.

auto-attack/fab_tf.py

Lines 394 to 407 in 0185c79

if self.norm == 'Linf':
    dist1 = df.abs() / (1e-12 +
                        dg.abs()
                        .view(dg.shape[0], dg.shape[1], -1)
                        .sum(dim=-1))
elif self.norm == 'L2':
    dist1 = df.abs() / (1e-12 + (dg ** 2)
                        .view(dg.shape[0], dg.shape[1], -1)
                        .sum(dim=-1).sqrt())
elif self.norm == 'L1':
    dist1 = df.abs() / (1e-12 + dg.abs().reshape(
        [df.shape[0], df.shape[1], -1]).max(dim=2)[0])
else:
    raise ValueError('norm not supported')

I also noticed that the code is not consistent: as you can see, lines 397 and 401 use view, but line 404 uses reshape.

One solution is to call .contiguous() before .view(...); alternatively, view could be replaced by reshape.

I'm not sure which solution is more suitable for this project.

Any suggestion?
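A minimal repro of the difference, for reference:

import torch

x = torch.randn(4, 3, 8, 8).permute(0, 2, 3, 1)  # permute() makes the tensor non-contiguous
# x.view(4, -1)                   # raises a RuntimeError on the non-contiguous tensor
y1 = x.reshape(4, -1)             # works; copies only when needed
y2 = x.contiguous().view(4, -1)   # equivalent workaround: force a contiguous copy first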

Add FSGM_APR_SP

Paper: Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain

Venue: ICCV 2021

Dataset and threat model: CIFAR-10, L-inf, 8/255

Code: code

Pre-trained model: model

Log file: log

Additional data: no

Clean and robust accuracy: 84.30% / 45.70%

Architecture: RN-18

Description of the model/defense: Motivated by the powerful generalization ability of humans, we argue that reducing the dependence on the amplitude spectrum and enhancing the ability to capture the phase spectrum can improve the robustness of CNNs.

Add Entropic Retraining

Paper: Optimizing Information Loss Towards Robust Neural Networks

Venue: DYNAMICS 2020 – DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security

Dataset and threat model: CIFAR-10, Linf, eps = 8/255

Code: Not published

Pre-trained model: Pre-trained model available here

Log file: Log file of the evaluation available here

Additional data: No

Clean and robust accuracy: Clean accuracy: 81.75%. Robust accuracy: 41.88%

Architecture: Simple convolutional network: four convolutional layers, the first two with filters of size 32, the second pair with filters of size 64. Each pair of convolutional layers is followed by a max-pooling layer. Finally, one flatten layer and six dense layers with 512, 256, 128, 128, 84, and 10 neurons each. Script with the definition of the model available here.

Description of the model/defense: New loss: during training, which is based on normal data only, an adapted loss function is used. The additional loss term is based on information-theoretically inspired metrics. The method does not require the generation of adversarial examples during training.

Evaluation on randomized defenses

I see that the attack is modified slightly for the non-deterministic defenses. I could not find part of the code that handles this setting. Could you point me to the code or please make it available? Thank you!

Add AWP w/ additional data

Paper: Dongxian Wu, Shu-Tao Xia, Yisen Wang. Adversarial Weight Perturbation Helps Robust Generalization

Venue: NeurIPS 2020

Dataset and threat model: CIFAR-10, L-inf, 8/255

Code: https://github.com/csdongxian/AWP/tree/main/auto_attacks

Pre-trained model: weight

Log file: log

Additional data: yes

Clean and robust accuracy: 88.25% / 60.04%

Architecture: WRN-28-10

Description of the model/defense: We introduce adversarial weight perturbations to adversarial training and its variants (TRADES, MART, RST, etc.), which can help the robust generalization.

ImportError: cannot import name 'zero_gradients' from 'torch.autograd.gradcheck'

I updated my pytorch to 1.9.0 via

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

After that, I got the next error message.

File "/home/satoharu/AdEx/SDPML/src/attacker/Attacker.py", line 65, in attack
    adv_example = self._attack(log=log, *args, **kwargs)
  File "/home/satoharu/AdEx/SDPML/src/attacker/AUTOPGD.py", line 96, in _attack
    adversary = AutoAttack(
  File "/home/satoharu/.pyenv/versions/3.9.4_sdp/lib/python3.9/site-packages/autoattack/autoattack.py", line 31, in __init__
    from .fab_pt import FABAttack
  File "/home/satoharu/.pyenv/versions/3.9.4_sdp/lib/python3.9/site-packages/autoattack/fab_pt.py", line 16, in <module>
    from torch.autograd.gradcheck import zero_gradients
ImportError: cannot import name 'zero_gradients' from 'torch.autograd.gradcheck' (/home/satoharu/.pyenv/versions/3.9.4_sdp/lib/python3.9/site-packages/torch/autograd/gradcheck.py)

I think the zero_gradients method was removed.
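For reference, the removed helper was essentially the small utility below, so a local stand-in can unblock the import until fab_pt.py drops the dependency (this mirrors the old behaviour; it is not an official replacement):

import torch

def zero_gradients(x):
    # minimal stand-in for torch.autograd.gradcheck.zero_gradients (removed in PyTorch 1.9)
    if isinstance(x, torch.Tensor):
        if x.grad is not None:
            x.grad.detach_()
            x.grad.zero_()
    elif isinstance(x, (list, tuple)):
        for elem in x:
            zero_gradients(elem)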

Reference

https://pytorch.org/get-started/locally/

Missing file other_utils.py

autoattack.py fails because of a missing module

import other_utils as utils

which is presumably because of a missing other_utils.py file.
Could you please upload it?

Criterion for adding a new models to existing list of defenses?

Thanks for releasing such a rigorous evaluation of existing works on adversarial defenses. It is immensely helpful to get more clarity on this topic. I wonder what is the criterion to add new models to the existing list of defenses?

In particular, I am wondering whether papers (such as https://arxiv.org/pdf/2002.10509.pdf) that study adversarial training (in particular the SOTA approach from Carmon et al., 2019) in a new setting qualify for it. The aforementioned paper revisits the question of "whether high capacity is necessary for adversarial robustness" from Madry et al., 2018 and shows that high robustness can be achieved even after removing up to 99% of the parameters.

In general, it could be a nice addition to the repo to have an evaluation of works that do not directly aim to improve robustness, but try to preserve it in the presence of other challenges (such as label noise, pruning, etc.).

Thanks.

Modified Square Attack to Randomized Defense

Hi,

I want to apply the Square Attack to a randomized defense. You mentioned in the paper (https://arxiv.org/pdf/2003.01690.pdf) as follows:

We modify Square Attack to accept an update if it reduces the target loss on average over 20 forward passes and, as this costs more time we use only 1000 iterations,

but I don't know how to use the modified version of the Square Attack. Is it not implemented yet? Could you elaborate on how to modify the code?

Thanks in advance!
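To my understanding the modified version is not exposed as a flag. Conceptually, the quoted change amounts to averaging the loss over several forward passes before deciding whether to accept a candidate update; a rough illustration of that criterion only (names and the loss function are placeholders, not the authors' code):

import torch

def averaged_loss(model, x, y, loss_fn, n_passes=20):
    # average the loss over several stochastic forward passes of the randomized defense
    vals = [loss_fn(model(x), y) for _ in range(n_passes)]
    return torch.stack(vals, dim=0).mean(dim=0)

# in the Square Attack loop, a candidate update x_new would then be accepted only if
# averaged_loss(model, x_new, y, loss_fn) improves on averaged_loss(model, x_best, y, loss_fn)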

Autoattack and APGDT

Hi, I am a beginner and still feel a little confused about AutoAttack and targeted APGD (APGD-T). (1) For the standard AutoAttack, are all images attacked by the 4 attacks respectively, and is the average robust accuracy of the 4 attacks computed at the end? And in AutoAttack, are the adversarial images for each attack saved separately? (2) For APGD-T, how is the target label found? I see that the number of target labels is equal to the total number of classes minus one for CIFAR-10. If I want to use it for ImageNet, should I set n_target_classes = 999 or some number between 1 and 999? What is the principle for setting the target?

Looking forwards to your help! Thanks!

Add Adversarial Training with Early Stopping (ATES), CIFAR-100

Paper: Improving Adversarial Robustness Through Progressive Hardening https://arxiv.org/abs/2003.09347

Venue: under review

Dataset and threat model: CIFAR-100, L-inf, 8/255

Code: https://github.com/chawins/ates-minimal

Pre-trained model: weight

Log file: log

Additional data: no

Clean and robust accuracy: 62.82/24.57

Architecture: WRN-34-10

Description of the model/defense: We use the curriculum learning framework to schedule the "difficulty" of adversarial examples generated during adversarial training. This improves both clean and robust accuracy.

Information related to existing defenses

Dear authors,

Thank you for releasing the exhaustive list of state-of-the-art defenses along with their evaluations on a standard set of benchmark attacks. This is very helpful to the research community working in this area.
Could you please share more information related to the models listed on the leaderboard? Specifically, it would be helpful if you can share information related to the fields that we need to fill up while submitting a new defense such as architecture, link to code, link to model weights and a brief description of the defense.

Thanks,
Sravanti

Slow evaluation on a dataset with a large number of classes in TF1.x setting

Thank you for releasing the evaluation code.
AA attack evaluation is very helpful in studying the adversarial robustness, especially on Cifar10 benchmark.
However, when I tried to evaluate my model (trained on CIFAR-100, TF 1.13), the code took too much time to run.
For most of the running time, GPU utilization stays at 0%.
How can I solve this problem?

Guidelines for evaluating defense using Auto Attack

Hello, what is the protocol for using Auto Attack to evaluate a Defense?
For example, it has been found here that Shattered Gradients and Stochastic Gradients could result in a false sense of security.
The README mentions one should use version='rand' to combat Stochastic Gradients, but what is the protocol for Shattered Gradients? I know that Square Attack will not be affected by Shattered Gradients since it uses Black Box access only.

For example, what if I train a ResNet50 to solve CIFAR-10, and then attach a GradientKiller layer at the bottom? How will Auto Attack compute adversarial examples then?

Code in Pytorch:

class GradientKiller(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return input

    @staticmethod
    def backward(ctx, grad_output):
        # kill the gradient: pass back zeros of the same shape and device
        return torch.zeros_like(grad_output)

class SuperGeniusDefense(torch.nn.Module):
    def __init__(self, trained_model):
        super(SuperGeniusDefense, self).__init__()
        self.model = trained_model

    def forward(self, X):
        X = GradientKiller.apply(X)
        X = self.model(X)
        return X

Now providing the AutoAttack adversary with the forward pass of SuperGeniusDefense will give a false sense of security, even though it is actually no more robust than SuperGeniusDefense.model.
