
fast_adversarial's Introduction

Fast adversarial training using FGSM

A repository that implements fast adversarial training using an FGSM adversary, capable of training a robust CIFAR10 classifier in 6 minutes and a robust ImageNet classifier in 12 hours. Created by Eric Wong, Leslie Rice, and Zico Kolter. See our paper on arXiv here, which was inspired by the free adversarial training paper here by Shafahi et al. (2019).

News

  • 12/19/2019 - Accepted to ICLR 2020
  • 1/14/2020 - arXiv posted and repository release

What is in this repository?

  • An implementation of the FGSM adversarial training method with randomized initialization for MNIST, CIFAR10, and ImageNet
  • Cyclic learning rates and mixed precision training using the apex library to achieve DAWNBench-like speedups
  • Pre-trained models using this code base
  • The ImageNet code is mostly forked from the free adversarial training repository, with the corresponding modifications for fast FGSM adversarial training

Installation and usage

  • All examples can be run without mixed-precision with PyTorch v1.0 or higher
  • To use mixed-precision training, follow the apex installation instructions here

But wait, I thought FGSM training didn't work!

As one of the earliest methods for generating adversarial examples, the Fast Gradient Sign Method (FGSM) is also known to be one of the weakest. It has largely been replaced by the PGD-based attack, and its use as an attack has become highly discouraged when evaluating adversarial robustness. After all, early attempts at using FGSM adversarial training (including variants of randomized FGSM) were unsuccessful, and this was largely attributed to the weakness of the attack.

However, we discovered that a fairly minor modification to the random initialization for FGSM adversarial training allows it to perform as well as the much more expensive PGD adversarial training. This was quite surprising to us, and suggests that one does not need very strong adversaries to learn robust models! As a result, we pushed FGSM adversarial training to the limit, and found that by incorporating various techniques for fast training used in the DAWNBench competition, we could learn robust architectures an order of magnitude faster than before, while achieving the same degree of robustness. A couple of results from the paper are highlighted in the tables below.

Method   CIFAR10 Acc   CIFAR10 Adv Acc (eps=8/255)   Time (minutes)
FGSM     86.06%        46.06%                        12
Free     85.96%        46.33%                        785
PGD      87.30%        45.80%                        4966

Method   ImageNet Acc   ImageNet Adv Acc (eps=2/255)   Time (hours)
FGSM     60.90%         43.46%                         12
Free     64.37%         43.31%                         52
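For readers who want the gist of the method in code, here is a minimal sketch of one FGSM training step with a random start, following the description above. The model, optimizer, and data are placeholders, inputs are assumed to lie in [0, 1], and epsilon/alpha are assumed to be scalars; the repository's CIFAR code additionally rescales epsilon per channel for normalized inputs.

    import torch
    import torch.nn.functional as F

    def fgsm_train_step(model, opt, X, y, epsilon, alpha):
        # Random initialization inside the l-inf ball of radius epsilon
        delta = torch.zeros_like(X).uniform_(-epsilon, epsilon)
        delta.requires_grad = True

        # One FGSM step starting from the random point
        loss = F.cross_entropy(model(X + delta), y)
        loss.backward()
        delta = (delta + alpha * delta.grad.sign()).clamp(-epsilon, epsilon)
        delta = ((X + delta).clamp(0, 1) - X).detach()  # keep X + delta a valid image

        # Update the model on the perturbed examples
        opt.zero_grad()
        loss = F.cross_entropy(model(X + delta), y)
        loss.backward()
        opt.step()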

But I've tried FGSM adversarial training before, and it didn't work!

In our experiments, we discovered several failure modes that cause FGSM adversarial training to "catastrophically fail", as in the plot below.

[Figure: catastrophic overfitting]

If FGSM adversarial training hasn't worked for you in the past, then it may be because of one of the following reasons (which we present as a non-exhaustive list of ways to fail):

  • FGSM step size is too large, forcing the adversarial examples to cluster near the boundary
  • Random initialization that covers only a small subset of the threat model
  • Long training with many epochs and fine-tuning with very small learning rates

All of these pitfalls can be avoided simply by using early stopping: evaluate the PGD robust accuracy on a small subset of the training data, since the failure mode of FGSM adversarial training occurs quite rapidly (robust accuracy drops to 0% within the span of a couple of epochs).
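A rough sketch of such an early-stopping check follows. The pgd_attack helper, the loader names, and the 0.2 threshold are hypothetical placeholders, not the repository's API:

    import torch

    def robust_acc_on_subset(model, subset_loader, epsilon, alpha, iters):
        # PGD robust accuracy on a small, fixed subset of the training data
        correct, total = 0, 0
        for X, y in subset_loader:
            delta = pgd_attack(model, X, y, epsilon, alpha, iters)  # hypothetical helper
            with torch.no_grad():
                correct += (model(X + delta).max(1)[1] == y).sum().item()
            total += y.size(0)
        return correct / total

    # At the end of each epoch: if the robust accuracy collapses, catastrophic
    # overfitting has started, so stop (or roll back to the last checkpoint).
    robust_acc = robust_acc_on_subset(model, subset_loader, epsilon, pgd_alpha, iters=5)
    if robust_acc < prev_robust_acc - 0.2:   # illustrative threshold
        print("catastrophic overfitting detected, stopping early")
    prev_robust_acc = robust_acc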

Why does this matter if I still want to use PGD adversarial training in my experiments?

The speedups gained from using mixed-precision arithmetic and cyclic learning rates can still be reaped regardless of what training regimen you end up using! For example, these techniques can speed up CIFAR10 PGD adversarial training by almost two orders of magnitude, reducing training time from about 3.5 days to just over 1 hour. The engineering costs of installing the apex library and changing the learning rate schedule are minuscule compared to the time saved by these two techniques, so even if you don't use FGSM adversarial training, you can still benefit from faster experimentation with the DAWNBench improvements.
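For reference, here is a hedged sketch of how these two DAWNBench-style pieces typically fit into a PyTorch training loop; the model, data loader, epoch count, lr_max value, and opt_level are placeholders or illustrative choices, not the repository's exact settings:

    import torch
    import torch.nn.functional as F
    from apex import amp  # mixed precision; see the apex installation instructions

    opt = torch.optim.SGD(model.parameters(), lr=0.0, momentum=0.9, weight_decay=5e-4)

    # Mixed-precision training
    model, opt = amp.initialize(model, opt, opt_level="O1")

    # Cyclic learning rate: ramp up to lr_max for half of training, then back down
    lr_max = 0.2  # illustrative; tune as large as possible without divergence
    steps = epochs * len(train_loader)
    scheduler = torch.optim.lr_scheduler.CyclicLR(
        opt, base_lr=0.0, max_lr=lr_max,
        step_size_up=steps // 2, step_size_down=steps // 2)

    for epoch in range(epochs):
        for X, y in train_loader:
            X, y = X.cuda(), y.cuda()
            loss = F.cross_entropy(model(X), y)  # or the adversarial loss
            opt.zero_grad()
            with amp.scale_loss(loss, opt) as scaled_loss:
                scaled_loss.backward()
            opt.step()
            scheduler.step()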

fast_adversarial's People

Contributors

leslierice1, riceric22


fast_adversarial's Issues

indices

Can anyone please let me know whether it is necessary to update only the \deltas of those images that are not misclassified? Can't we just update all \deltas, which would also ensure the maximization?

I = output.max(1)[1] == y
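For context, here is a hedged sketch of the two variants being contrasted, with scalar epsilon/alpha and hypothetical tensor names (not a claim about which variant the repository uses in which file):

    # Boolean mask of examples the model still classifies correctly
    I = output.max(1)[1] == y
    grad = delta.grad.detach()

    # Variant 1: update only the perturbations of still-correct examples
    delta.data[I] = torch.clamp(delta[I] + alpha * torch.sign(grad[I]),
                                -epsilon, epsilon)

    # Variant 2 (what the question proposes): update every perturbation, which
    # keeps pushing the loss up even for already-misclassified examples
    delta.data = torch.clamp(delta + alpha * torch.sign(grad), -epsilon, epsilon)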

About low and high value of uniform distribution in PGD attack (CIFAR-10)

Hi Eric,

Thank you for the code; it's awesome with all the efficient training tricks.
I would like to ask you to confirm the low and high values used in CIFAR10/utils.py line 61, where delta is initialized from a uniform distribution on each normalized channel.

delta[:, i, :, :].uniform_(-epsilon[i][0][0].item(), epsilon[0][0][0].item())

The high value is epsilon[0][0][0]; shouldn't it be epsilon[i][0][0]?
I am new to this, so can you please confirm? If you fixed the high value deliberately, could you please explain why? Thank you again for your valuable work.
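For reference, the per-channel initialization written the way the question suggests, with the upper bound also indexed by channel (a sketch assuming epsilon has shape (3, 1, 1); this is the questioner's proposed reading, not a statement about the repository's intent):

    # Initialize each normalized channel uniformly in [-epsilon_i, +epsilon_i]
    for i in range(len(epsilon)):
        delta[:, i, :, :].uniform_(-epsilon[i][0][0].item(), epsilon[i][0][0].item())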

invalid key "/xff" when loading model.

Thank you for opening your technology to the open source. Bug when loading the imagenet model, error occurs.
"_pickle.UnpicklingError: invalid load key, '\xff'."
The loading method in your code cannot load the model correctly.
How should I load your model correctly?
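For reference, a minimal sketch of the usual way to load a saved state dict; the checkpoint path and model constructor below are placeholders, not the repository's files or API. An invalid load key often means the downloaded file itself is not a valid PyTorch checkpoint (for example an incomplete or wrong download), though that is only a guess here.

    import torch

    model = build_model()  # hypothetical constructor for the matching architecture
    checkpoint = torch.load("path/to/downloaded_checkpoint.pth", map_location="cpu")
    # Some checkpoints store the weights under a key such as "state_dict"
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        checkpoint = checkpoint["state_dict"]
    model.load_state_dict(checkpoint)
    model.eval()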

About PGD evaluation

Hi, thank you for the great work and opening the code.

However I have a question about PGD evaluation.

In the code, when attack_pgd is called, it seems that for some images in a batch the adversarial perturbation is obtained with fewer steps than attack_iters.

During the iteration, updates to the perturbation delta are applied only to the images that are still classified correctly (index is the variable that indicates which images are classified correctly, and only delta[index[0]] is updated in the loop for _ in range(attack_iters):).

I understand that an image which is still classified correctly is not yet an adversarial example, so more search in the l-inf ball should be performed to find an adversarial perturbation.

However, I don't understand why the search should stop for the images that become misclassified early in the PGD iteration.

I would expect that stronger adversarial perturbations can be found by taking more gradient steps even if the images are already adversarial. In other words, I suspect that the PGD evaluation is performed with relatively weak adversarial examples.

These may be adversarial examples that are (approximately, if not exactly) closer to the original image, but they are not strong adversarial examples. And I think the strength of the adversarial examples is crucial, because the main claim of the paper is that training with FGSM can build models that are robust to strong attacks such as PGD.

I think that something like max_delta[all_loss >= max_loss] = delta.detach()[all_loss >= max_loss], which currently appears in the loop for zz in range(restarts):, should also be performed inside the loop for _ in range(attack_iters): to find the strongest adversarial example achievable within attack_iters steps.

But of course, I may be missing something. So can you tell me the underlying idea behind stopping the iteration for an image once it is classified wrongly while building the PGD perturbation?
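For concreteness, here is a hedged sketch of the variant proposed above, where the best-loss perturbation is tracked inside the iteration loop rather than only once per restart. Scalar epsilon/alpha and inputs in [0, 1] are assumed; this is not the repository's attack_pgd.

    import torch
    import torch.nn.functional as F

    def attack_pgd_track_best(model, X, y, epsilon, alpha, attack_iters, restarts):
        max_loss = torch.zeros(y.shape[0], device=y.device)
        max_delta = torch.zeros_like(X)
        for _ in range(restarts):
            delta = torch.zeros_like(X).uniform_(-epsilon, epsilon)
            delta = ((X + delta).clamp(0, 1) - X).requires_grad_(True)
            for _ in range(attack_iters):
                loss = F.cross_entropy(model(X + delta), y)
                loss.backward()
                g = delta.grad.detach()
                d = (delta.detach() + alpha * g.sign()).clamp(-epsilon, epsilon)
                delta.data = (X + d).clamp(0, 1) - X
                delta.grad.zero_()
                # Track the strongest perturbation per example at every step,
                # regardless of whether the example is already misclassified
                with torch.no_grad():
                    all_loss = F.cross_entropy(model(X + delta), y, reduction='none')
                    better = all_loss >= max_loss
                    max_delta[better] = delta.detach()[better]
                    max_loss[better] = all_loss[better]
        return max_delta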

Reproduce the results of Free adversarial training.

Hi. I find that free adversarial training in the original paper uses a multistep learning rate.
I trained for 96/8 epochs of free adversarial training with a multistep learning rate on 1 GPU and only got 40.8% accuracy against PGD-20 (eps=8). Then I trained for 205/8 -> 26 epochs of free adversarial training and only got 42.01% accuracy against PGD-20 (eps=8). My initial learning rate is 0.1 and it decays at [1/2 * lr_steps, 3/4 * lr_steps]. The model is WRN34.
Could you please help me figure out what's wrong? I also noticed that cifar10_std = [0.2471, 0.2435, 0.2616] in your settings. Why not cifar10_std = [0.2023, 0.1994, 0.2010]?

torch.where API in MNIST and CIFAR10, ImageNet configuration files

Hi,

When we tried to run the code for MNIST and CIFAR10, it throws an error like this:

index = torch.where(output.max(1)[1] == y)[0]
TypeError: where() missing 2 required positional argument: "input", "other"

We have checked the API docs for PyTorch 1.3, PyTorch 1.0, and PyTorch 0.4.1, and it seems this usage is not standard. We also tried to run the experiment in the ImageNet folder, but the configuration files used in the code are not in the GitHub repository. Do you know how to fix this? Thank you very much.
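One possible workaround, assuming the intent is to get the indices of correctly classified examples, is to avoid single-argument torch.where, which is not available in older PyTorch releases:

    # Equivalent to torch.where(output.max(1)[1] == y)[0] on newer PyTorch:
    # the indices of the examples the model currently classifies correctly.
    index = (output.max(1)[1] == y).nonzero().view(-1)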

Include python/pytorch version for MNIST reproducibility

Hi! I am having a hard time reproducing the results (on MNIST, for example) and I have found that they differ when I change the pytorch version. I observe the following:

pytorch 1.12: when training with MNIST, training accuracy of 0.98 is achieved, but robust test accuracy is zero
pytorch 1.4: when training with MNIST, training accuracy of 0.95 is achieved, robust test accuracy is 0.88

I think the code was originally run with PyTorch 1.0; I am trying to find out what breaks the code in PyTorch 1.12. It would be great to make it clearer which versions should be used to reproduce the results.

Reproduce the result of CIFAR-10 from the default setting

Hi,
I'm running the repo with the default configuration for CIFAR-10; however, here is the accuracy I got from the trained model after 15 epochs:

Total train time: 6.7291 minutes
Test Loss   Test Acc   PGD Loss   PGD Acc
0.9252      0.7003     1.2217     0.3784

so the Accuracy is 70% and PGD Accuracy is only 37.84%?
Am I missing any detailed configurations?

Reproduce results

Hello,
thanks for the great work and open-sourcing the repository.
I reran the CIFAR10 experiments with the unmodified code (without arguments) provided and I got the following results:
python train_fgsm.py:

        Test Loss   Test Acc   PGD Loss   PGD Acc
My      0.6739      0.7930     1.0310     0.4531
Paper   -           0.8381     -          0.4606

python train_free.py:

        Test Loss   Test Acc   PGD Loss   PGD Acc
My      0.7544      0.7695     1.0670     0.4598
Paper   -           0.7838     -          0.4618

python train_pgd.py:

        Test Loss   Test Acc   PGD Loss   PGD Acc
My      0.7657      0.7664     1.0657     0.4725
Paper   -           0.8246     -          0.5069

Any hint on how to close the performance gap between the reported results and the ones obtained with the code (especially for train_fgsm.py)?

I also have an additional question about Table 3 in the paper. Why is the seconds/epoch of PGD-7 (1456.22) so much greater than that of DAWNBench + PGD-7 (104.94)? From what I read online, the speed improvements from mixed precision are usually in the range of 20% to 30%, but here it seems to increase the speed much more drastically.

Thanks for your help

Some questions about the robustness under other attacks

Hi, thanks for your code and idea. The results are very surprising and appealing.

I adopted your techniques (cyclic LR and FGSM with random initialization) in my method (not AT but very similar to AT), and it worked very well when the attack is 'FGSM-type', including FGSM, PGD, and MI-FGSM. However, the adversarial robustness degrades sharply compared with the corresponding model trained with PGD when I evaluate under other types of attacks (e.g., CW and JSMA) on the MNIST dataset. Have you tried those attacks in your evaluation? Have you encountered the same problem?

Thanks again for your work, and I'm looking forward to your reply.

Yiming Li

l2 norm PGD attack

Hi, does FGSM perform as well as PGD for adversarial training with an l2 perturbation instead of l-infinity?

Parameters of training

Hello,

Thanks for your valuable work.

I would like to understand the reasoning behind dividing the epsilon and alpha values by the standard deviation.

    epsilon = (args.epsilon / 255.) / std
    alpha = (args.alpha / 255.) / std
    pgd_alpha = (2 / 255.) / std
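The usual reasoning (offered here as a hedged explanation, not an authoritative answer from the authors) is that the network operates on inputs normalized as (x - mean) / std, so an epsilon-sized perturbation defined in the original [0, 1] pixel space corresponds to epsilon / std in the normalized space where the perturbation is actually added. A small self-contained check, using the commonly quoted CIFAR10 statistics:

    import torch

    mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
    std = torch.tensor([0.2471, 0.2435, 0.2616]).view(3, 1, 1)

    x = torch.rand(3, 32, 32)   # an image in [0, 1] pixel space
    eps_pixel = 8 / 255.        # threat model defined in pixel space

    # Normalizing (x + eps) is the same as adding eps / std to the normalized
    # input, which is why epsilon and alpha are divided by std in the code.
    lhs = ((x + eps_pixel) - mean) / std
    rhs = (x - mean) / std + eps_pixel / std
    assert torch.allclose(lhs, rhs)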

Can't reproduce MNIST results using current codes

I just cloned this repo and tried to run the code following the provided instructions (the code is not modified).
Environment: cuda 11.3, python 3.9.6, pytorch 1.9.0, torchvision 0.10.0, installed via miniconda.

I ran python train_mnist.py --fname ./new_result.pth to get a model,
then ran python evaluate_mnist.py --fname ./new_result.pth to evaluate the robustness,
and ran python evaluate_mnist.py --fname ./new_result.pth --attack none to evaluate the clean accuracy.
The result shows robustness = 0.00% and accuracy = 97.71%, meaning the trained model is not robust at all.

However, using your pretrained model in models/fgsm.pth gives a robust model (robustness = 88.38% and accuracy = 98.50%).

Could you provide any comment on how to reproduce your pretrained results?

Inconsistent clamping behaviour between the CIFAR and MNIST FGSM implementations

In the implementation of FGSM for MNIST, you do not clamp the initial perturbation, meaning you calculate the gradient based on out-of-bounds data points:

delta = torch.zeros_like(X).uniform_(-args.epsilon, args.epsilon).cuda()
delta.requires_grad = True
output = model(X + delta)
loss = F.cross_entropy(output, y)

This contrasts with the CIFAR implementation, where this clamping is done:

for j in range(len(epsilon)):
    delta[:, j, :, :].uniform_(-epsilon[j][0][0].item(), epsilon[j][0][0].item())
delta.data = clamp(delta, lower_limit - X, upper_limit - X)

Is this intended? Why was this choice made?
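For comparison, adding the same clamping to the MNIST initialization would look roughly like this (a hedged sketch assuming MNIST pixels lie in [0, 1]; this is the change the question implies, not the repository's code):

    delta = torch.zeros_like(X).uniform_(-args.epsilon, args.epsilon).cuda()
    # Clamp the random start so that X + delta stays inside [0, 1],
    # mirroring what the CIFAR code does with its clamp helper
    delta.data = torch.max(torch.min(delta, 1 - X), 0 - X)
    delta.requires_grad = True
    output = model(X + delta)
    loss = F.cross_entropy(output, y)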

Model overfits with low test accuracy for higher epsilon values

I'm using the FGSM approach to train a ResNet18 model on CIFAR10.

Using the values in the paper for epsilon=8/255 and alpha=10/255 works fine. But when I try to extend to an epsilon of 12/255 (and an alpha of 1.25*epsilon as outlined in the paper, so 15/255) to compare to other robust models, the model catastrophically overfits relatively early, with very low clean accuracy (50 to 60%). Has anyone had success using this approach with an epsilon higher than 8/255? Does alpha=1.25*epsilon not apply to other values of epsilon?

Thanks in advance for any help you can provide.

Parameter settings on CIFAR-100

Hi,

I tried to use this method on CIFAR-100 with the same parameter settings as CIFAR-10, but the results are terrible: the test adversarial accuracies are less than 2%. Do you have any suggestions on how to set the parameters (epochs, learning rate, and batch size) for CIFAR-100? Also, auxiliary losses are widely used in natural training; do you think they would be helpful in fast adversarial training?

Best wishes,
Jia

facing "nan" values during training the model

Hi, during training with my custom objective loss, I noticed that the model sometimes goes wrong, produces "nan" values, and becomes invalid, which I did not encounter with other training methods. Is this because the maximum learning rate of the cyclic schedule is too large, causing the loss to diverge, as mentioned in the paper ("For each method, we individually tune λ to be as large as possible without causing the training loss to diverge"), or is it a bug?

I ran the original code again with epochs=30 and faced the same issue.

When computing the perturbation, do we need to set model.eval()?

Hello Leslie Rice and Eric Wong,

Congratulations on your significant work!!

I found that the model is always kept in training mode during adversarial training. However, I think that when we compute the adversarial perturbation, we should set model.eval() to prevent randomness, such as dropout, from affecting the estimated gradients. So a correct approach would be to add model.eval() before this line.
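For concreteness, a hedged sketch of the pattern described above (assuming scalar epsilon and alpha and a standard training loop; this is not the repository's code): switch to eval mode while computing the perturbation, then back to train mode for the weight update.

    # Compute the perturbation with deterministic layers
    # (dropout disabled, batch norm using running statistics)
    model.eval()
    delta = torch.zeros_like(X).uniform_(-epsilon, epsilon)
    delta.requires_grad = True
    loss = F.cross_entropy(model(X + delta), y)
    loss.backward()
    delta = (delta + alpha * delta.grad.sign()).clamp(-epsilon, epsilon).detach()

    # Switch back to training mode for the parameter update
    model.train()
    opt.zero_grad()
    loss = F.cross_entropy(model(X + delta), y)
    loss.backward()
    opt.step()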

I'm curious why you did not set model.eval() in your code. Does amp cause the gradients to overflow in eval mode? And what is the performance gap between these two approaches?

Looking forward to your reply. Thank you!
