vdigpku / dada


[ECCV 2020] DADA: Differentiable Automatic Data Augmentation

License: MIT License

Python 97.05% Shell 2.95%

dada's Issues

ImageNet

Hi. I am unable to reproduce the results with search_relax: I got only 65.23% top-1 accuracy on ImageNet.

No genotypes module

Dear Authors,

I followed the instructions to install the required modules. But when I run train.py in search_relax, I get a "no module genotypes" error. How can I install this module?

Best

Extract the found policy

Hi,
After checking and testing the code, I am looking for a way to extract the found policy.

Also, can you explain how to reuse the found policy?

Does it support DataParallel and multiple GPUs?

As the title says, I tried using DataParallel and commenting out set_device for multi-GPU training, but it does not seem to work.

The script keeps showing
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
RuntimeError: CUDA error: initialization error

Thanks!

Why are there two "elif dataset == 'reduced'" branches?

Hello, thank you for your excellent work. I noticed that there are two elif dataset == 'reduced_imagenet': branches in your dataset.py. This means the second elif can never be reached when the code runs. Is this a mistake?
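
The behavior described in this question can be checked with a small stand-alone sketch. The function pick_branch and the returned labels are hypothetical and not taken from dataset.py; only the repeated condition mirrors the issue.

```python
def pick_branch(dataset):
    """Return a label for the branch that handles `dataset` (hypothetical labels)."""
    if dataset == 'cifar10':
        return 'cifar10 branch'
    elif dataset == 'reduced_imagenet':
        return 'first reduced_imagenet branch'
    elif dataset == 'reduced_imagenet':
        # Unreachable: the identical condition above always matches first.
        return 'second reduced_imagenet branch'
    raise ValueError('unknown dataset: %s' % dataset)


print(pick_branch('reduced_imagenet'))
```

Python evaluates an if/elif chain top to bottom and stops at the first true condition, so the body of a repeated condition is dead code.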

IDADA

Could you give me the IDADA source code?

How to choose the final policy in the search phase?

Hi!
I have set the learning rate to 0.001, but during the search the validation metric curve is not stable.
For now I chose the top-25 sub-policies at epoch 36 (1 epoch = 500 iterations) because the validation metric curve looked good there, and the second training stage then converged. I don't know why.
In contrast, when I chose sub-policies from a point where the curve was oscillating, the second training stage was unstable and the model did not converge.
The validation metric oscillates during the entire search phase. Is this normal?

I don't know how to choose the final policy; it feels like I am selecting sub-policies at random. Thanks!

[Figure: search stage curve. X axis: epoch; Y axis: validation metric.]

How to add (weight, probability and magnitude) to the forward calculation to calculate the gradient?

I have the following two questions. Is the code in snippet 1 used to compute the gradients of magnitude and probability in the backward pass? And is the code in snippet 2 used to compute the gradient of ops_weights?

1、
def forward(self, origin_images, trans_images, probability, probability_index, magnitude):
    index = sum( p_i.item()<<i for i, p_i in enumerate(probability_index))
    com_image = 0
    images = origin_images
    adds = 0

    for selection in range(2**len(self.sub_policy)):
        trans_probability = 1
        for i in range(len(self.sub_policy)):
            if selection & (1<<i):
                trans_probability = trans_probability * probability[i]
               
                if selection == index:
                    images = images - magnitude[i]
                    adds = adds + magnitude[i]
            else:
                trans_probability = trans_probability * ( 1 - probability[i] )
        if selection == index:
            images = images.detach() + adds
            com_image = com_image + trans_probability * images
        else:
            com_image = com_image + trans_probability
    return com_image

2、
def forward(self, origin_images, trans_images_list, probabilities, probabilities_index, magnitudes, weights, weights_index):

    for i, (p, p_i, m, w, op) in enumerate(zip(probabilities, probabilities_index, magnitudes, weights, self._ops)):
        if weights_index.item() == i:
            return sum(w * op(origin_images, trans_images, p, p_i, m))
        else:
            return w
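
Snippet 1 walks over every apply/skip combination of the operations in a sub-policy, encoded as a bitmask, and multiplies the matching per-operation probabilities. A minimal plain-Python sketch of just that enumeration is given here, assuming ordinary floats instead of tensors; the helper names encode_index and selection_probabilities are illustrative and do not appear in the repository.

```python
def encode_index(applied_flags):
    """Pack per-operation 0/1 samples into one integer, bit i for operation i."""
    return sum(int(flag) << i for i, flag in enumerate(applied_flags))


def selection_probabilities(probability):
    """Probability of each of the 2**n apply/skip combinations."""
    n = len(probability)
    result = []
    for selection in range(2 ** n):
        p = 1.0
        for i in range(n):
            if selection & (1 << i):
                p *= probability[i]        # operation i is applied
            else:
                p *= 1.0 - probability[i]  # operation i is skipped
        result.append(p)
    return result


probs = selection_probabilities([0.5, 0.2])
print(probs)
print(encode_index([1, 0]))
```

The combination probabilities always sum to 1, and the integer returned by encode_index is what snippet 1 compares against selection to find the combination that was actually sampled.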

Using the one-step unrolled validation loss

When using the one-step unrolled validation loss, that is, when setting unrolled = True, a new_model is needed to obtain the virtual gradient, and this takes too much memory. So I set unrolled=False directly. But in that case only magnitude is updated; weight and probabilities are not. Is there a problem with the code here?
def _backward_step(self, input_valid, target_valid):
    loss = self.model._loss(input_valid, target_valid)
    loss.backward()

Why do you produce 5 splits in "sss = StratifiedShuffleSplit(n_splits=5, test_size=split, random_state=0)"?

Hello, I have a question about dataset.py.
sss = StratifiedShuffleSplit(n_splits=5, test_size=split, random_state=0)
This line produces 5 different splits of the dataset. Why produce 5 instead of 1?
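
For context, this is how scikit-learn's StratifiedShuffleSplit behaves on toy data: n_splits=5 yields five independent random train/validation index pairs, and a caller is free to use only one of them. The labels, the test_size value and the split_idx selector below are assumptions for illustration; the quoted line alone does not show how dataset.py consumes the splits.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

labels = np.array([0] * 50 + [1] * 50)  # toy labels, two balanced classes
indices = np.arange(len(labels))

sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
splits = list(sss.split(indices, labels))  # five (train_idx, valid_idx) pairs

split_idx = 0  # hypothetical selector: which of the five splits to use
train_idx, valid_idx = splits[split_idx]

print(len(splits), len(train_idx), len(valid_idx))
print(np.bincount(labels[valid_idx]))  # class counts in the validation part
```

Each pair keeps the class proportions of the full label set, so using a single split is valid; the remaining splits only matter if the code selects among them or averages over them.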

Could you provide ILSVRC/ImageSets/CLS-LOC/train_cls.txt?

Hi, I don't have ILSVRC/ImageSets/CLS-LOC/train_cls.txt. It is very difficult to download ILSVRC2017_CLS-LOC.tar.gz in China. Could you please provide train_cls.txt? Thank you very much!

dataloader num_workers=0

Hi, it seems that in DADA the DataLoader num_workers has to be 0 in order to avoid a mismatch between the gradients and the augmentation parameters actually used.
But with num_workers == 0, the speed advantage of DADA over PBA cannot be shown. Do you have a trick to deal with this issue?

'torch.distributions.RelaxedOneHotCategorical' raises 'inplace' error

Hi,
We found that with versions of PyTorch higher than 1.2, 'torch.distributions.RelaxedOneHotCategorical' raises an 'inplace' error, which seriously limits the application of DADA. Have you considered addressing this problem? Thanks.

Final DA policy: use all or 25?

Hi,

After you find the optimal DA policies, do you use all of them (e.g., 105 sub-policies for CIFAR-10) or choose the top 25, as shown in the README?

Thanks!
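
For readers unsure what "choose the top 25" would mean mechanically, here is a minimal sketch of ranking sub-policies by a learned weight and keeping the k best. The function top_k_sub_policies, the tuple layout and all numbers are made up for illustration; the sketch does not say which option the authors actually use.

```python
def top_k_sub_policies(sub_policies, weights, k=25):
    """Return the k sub-policies with the largest weights, best first."""
    ranked = sorted(zip(weights, sub_policies), key=lambda pair: pair[0], reverse=True)
    return [policy for _, policy in ranked[:k]]


# Toy data: each sub-policy is a pair of (operation, probability, magnitude).
sub_policies = [
    (('ShearX', 0.5, 0.3), ('Invert', 0.4, 0.1)),
    (('Rotate', 0.6, 0.7), ('Color', 0.5, 0.5)),
    (('Solarize', 0.3, 0.2), ('Equalize', 0.5, 0.0)),
]
weights = [0.2, 0.5, 0.3]

selected = top_k_sub_policies(sub_policies, weights, k=2)
print(selected)
```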

Error in search_gumbel

Hi,

I tried running the Gumbel-Softmax model with the following parameters:

GPU=0
DATASET=cifar10
MODEL=wresnet40_2
EPOCH=200
BATCH=128
LR=0.1
WD=0.0002
AWD=0.0
ALR=0.001
CUTOUT=16
SAVE=CIFAR10

python train_search_paper.py --unrolled --report_freq 1 --num_workers 0 --epoch ${EPOCH} --batch_size ${BATCH} --learning_rate ${LR} --dataset ${DATASET} --model_name ${MODEL} --save ${SAVE} --gpu ${GPU} --arch_weight_decay ${AWD} --arch_learning_rate ${ALR} --cutout --cutout_length ${CUTOUT}

and I am getting the following error:

Traceback (most recent call last):
  File "train_search_paper.py", line 284, in <module>
    main()
  File "train_search_paper.py", line 175, in main
    train_acc, train_obj = train(train_queue, valid_queue, model, architect, criterion, optimizer, lr)
  File "train_search_paper.py", line 223, in train
    loss.backward()
  File "/scratch/clear/jmarrie/miniconda3/envs/env/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/scratch/clear/jmarrie/miniconda3/envs/env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [105]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

Do you have any idea where it comes from?

Thanks,

Juliette
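
As general background on this class of error, the following sketch triggers the same kind of RuntimeError in a few lines and shows the out-of-place variant that avoids it. It is a generic PyTorch illustration, not a diagnosis of train_search_paper.py; which tensor of size 105 is modified in the run above is not identified here.

```python
import torch

w = torch.zeros(3, requires_grad=True)

# exp() keeps its output for the backward pass; editing that output in place
# bumps its version counter, and autograd refuses to use the stale value.
y = w.exp()
y.add_(1.0)
try:
    y.sum().backward()
    failed = False
except RuntimeError as err:
    failed = True
    print(type(err).__name__)

# Out-of-place version: the saved output is left untouched, so backward works.
y = w.exp()
z = y + 1.0
z.sum().backward()
print(w.grad)
```

Wrapping the failing run in torch.autograd.set_detect_anomaly(True) is a common way to locate the forward operation whose saved tensor was modified.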

About method DifferentiableAugment

Hi, thanks for the good work. I have one question and need your help. I am confused by the DifferentiableAugment class in https://github.com/VDIGPKU/DADA/blob/master/search_relax/model_search.py#L43.
class DifferentiableAugment(nn.Module):
    def __init__(self, sub_policy):
        super(DifferentiableAugment, self).__init__()
        self.sub_policy = sub_policy

    def forward(self, origin_images, probability_b, magnitude):
        images = origin_images
        adds = 0
        for i in range(len(self.sub_policy)):
            if probability_b[i].item() != 0.0:
                images = images - magnitude[i]
                adds = adds + magnitude[i]
        images = images.detach() + adds
        return images

It seems that the image processing operations are not actually called to preprocess the training images during the search stage. For each sub_policy, is the magnitude simply subtracted? If I have misunderstood something, please tell me. Thanks.
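
One way to read the subtract, detach, add pattern in the quoted forward method is as a straight-through style trick: the forward value stays the same while magnitude receives a gradient of 1 per pixel. The sketch below checks that numerically on random data. The tensor shapes and the scalar magnitude are assumptions, and this is an interpretation of the quoted code, not a statement from the authors.

```python
import torch

images = torch.rand(2, 3, 4, 4)  # stand-in batch of images
magnitude = torch.tensor(0.3, requires_grad=True)

out = images - magnitude         # shift by -magnitude
out = out.detach() + magnitude   # cut the graph, then add magnitude back

# The forward value is unchanged (up to float rounding) ...
print(torch.allclose(out, images, atol=1e-6))

# ... but each output pixel has gradient 1 with respect to magnitude.
out.sum().backward()
print(magnitude.grad)
```

Under this reading, the method itself performs no image processing: it only attaches a gradient path for magnitude to images that were produced elsewhere.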

How to get NetworkCIFAR?

Hi!
I want to run search_relax/train.py, but there is no model/NetworkCIFAR.

I don't know how to get it. Can you upload the NetworkCIFAR.py file?
Thanks!

Ops weights and probabilities not getting updated

As part of my personal research, I am studying various automated data augmentation techniques.
While trying to reproduce your results, I ran into a problem with updating the probabilities and ops_weights. Only the magnitude values are updated over the epochs; the probabilities and ops_weights stay constant throughout the run at 0.5 and 0.0095 respectively.
Could you please help me fix this issue?

Thanking you.

Why is dalpha [] in def _backward_step_unrolled() in search-gumbel architect.py?

Hi, thank you for your research. I read your paper and code. In formula 17 of the paper you take the gradient with respect to alpha, but in your code dalpha = []. Can you tell me the reason? Thanks.

Update install instruction b to use == to specify the version of cudatoolkit instead of =

Raised a PR to fix this: #11

Reduced ImageNet split

Hi,

Thank you for sharing this great work! Could you share the ImageNet split you used in the experiment? Thank you!

Explanation for Equation 10

Hi,

Thank you for your great work. I have a question about equation 10 in your paper, where you approximate the gradient of the transformed image with respect to the magnitude of the transformation operation as (d x_ij / dm) = 1. I don't understand the reasoning behind this, since pixel values do not always increase (+1 gradient) when the magnitude increases (e.g., for ShearX, ShearY).

Thanks.
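
The approximation described in this question can be written out with the chain rule. In the sketch below, x_ij and m follow the question's notation, while the loss symbol L is an assumed name; the equation restates what the question describes and is not copied from the paper.

```latex
\frac{\partial \mathcal{L}}{\partial m}
  = \sum_{i,j} \frac{\partial \mathcal{L}}{\partial x_{ij}} \,
               \frac{\partial x_{ij}}{\partial m}
  \approx \sum_{i,j} \frac{\partial \mathcal{L}}{\partial x_{ij}}
  \qquad \text{when } \frac{\partial x_{ij}}{\partial m} \text{ is set to } 1 .
```

Written this way, the question amounts to asking why the true per-pixel derivative, which for geometric operations can be negative or zero, may be replaced by the constant 1.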

Two Questions about some details of the paper

Hello! I have some minor questions about certain details in the training of the network itself.

How are the results of the paper acquired?

The paper says:
Following [3, 10, 15], we search the DA policies on the reduced datasets and evaluate on the full datasets. Furthermore, we split half of the reduced datasets as training set, and the remaining half as validation set for the data augmentation search.
So what is the workflow for the training of a neural network with DADA? Do we search on the dataset using train_search_paper, then transfer the policies and use it for training? If yes, then where is the method used to transfer the searched policies to the training? If no, then how is the validation data used? It seems like you are only using half of the data to train the neural network (train_portion = 0.5).

Is any other sub policy depth/sub policy count considered for search?
Why is ColorJitter used in conjunction with the DADA policy? The sub-policies seem to be able to include it anyway.

Low accuracy while searching

set -x

# cifar100

GPU=1
DATASET=cifar100
MODEL=resnet50
EPOCH=20
BATCH=128
LR=0.1
WD=0.0002
AWD=0.0
ALR=0.005
CUTOUT=16
TEMPERATE=0.5

which python
python train_search_paper.py --unrolled --report_freq 1 --num_workers 0 --epoch ${EPOCH} --batch_size ${BATCH} --learning_rate ${LR} --dataset ${DATASET} --model_name ${MODEL} --gpu ${GPU} --arch_weight_decay ${AWD} --arch_learning_rate ${ALR} --weight_decay ${WD} --cutout --cutout_length ${CUTOUT} --temperature ${TEMPERATE}

Hello, I used the resnet50 network to search for the augmentation policy. During the search, I noticed that the training and validation accuracy are very low.

04/27 03:59:04 PM valid 187 2.398671e+00 38.285406 70.545213
04/27 03:59:04 PM valid 188 2.397762e+00 38.289517 70.568783
04/27 03:59:04 PM valid 189 2.397031e+00 38.297697 70.575658
04/27 03:59:04 PM valid 190 2.395328e+00 38.326243 70.590641
04/27 03:59:04 PM valid 191 2.396694e+00 38.309733 70.576986
04/27 03:59:04 PM valid 192 2.396227e+00 38.337921 70.575615
04/27 03:59:05 PM valid 193 2.396536e+00 38.321521 70.574259
04/27 03:59:05 PM valid 194 2.396318e+00 38.325321 70.584936
04/27 03:59:05 PM valid 195 2.395298e+00 38.336000 70.588000
04/27 03:59:05 PM valid_acc 38.336000

Is this Ok?

Does the search-phase training loss converge?

I tried to search for a reduced ImageNet policy with the original ImageNet search script, but found that it runs for only 20 epochs.
It seems that the training loss does not converge in only 20 epochs.
If the training loss does not converge, how can the performance of the data augmentation policy be validated?
By the way, could you share the final training loss, validation accuracy and training accuracy of the search phase?

A question about the gradients of sampling

Hello, thank you for your great work. I have a question about how you update the sampling parameters (arch parameters). You update the sampling parameters with the validation set, but you do not augment the validation set with the sampled augmentation. That means the gradient of the loss on the validation data with respect to the sampling parameters is None. How, then, do you update the sampling parameters using the validation set?

About DifferentiableAugment

Hi,

I don't understand the DifferentiableAugment class in the implementation. What does it do? Does it just subtract the magnitude from the images and add it back? Why did you adopt this approach? Is there a specific reason for it?

class DifferentiableAugment(nn.Module):
    def __init__(self, sub_policy):
        super(DifferentiableAugment, self).__init__()
        self.sub_policy = sub_policy

    def forward(self, origin_images, probability_b, magnitude):
        images = origin_images
        adds = 0
        for i in range(len(self.sub_policy)):
            if probability_b[i].item() != 0.0:
                images = images - magnitude[i]
                adds = adds + magnitude[i]
        images = images.detach() + adds
        return images

Can't get the same augmentation policy (genotype) with the search code

Thanks for the great work and code!
I want to reproduce the same augmentation policy (called genotype in your code) with the provided search code. I followed the description in README.md and searched for an augmentation policy on reduced ImageNet with ResNet-50. However, the policy I get is different from the one you give in genotype.py, so I want to know whether I did something wrong in reproducing the result. Here are some factors I guess may affect the search results:
1. In the search code, the default random seed in train_search_paper.py is 2. Is this the same random seed you used to get the final result?
2. In the search, I found the augmentations are inserted after ColorJitter, but in the training code the augmentation policy is inserted after RandomHorizontalFlip and before ColorJitter (line 95 in fast-autoaugment/FastAutoAugment/data.py). This is not consistent between training and searching.
Do these two factors affect the search? Or are there other details of the search process that I missed? I look forward to your reply, thanks.
