vdigpku / dada
[ECCV 2020] DADA: Differentiable Automatic Data Augmentation
License: MIT License
Hi. I am unable to reproduce the results (search_relax). I got only 65.23% top-1 accuracy on ImageNet.
Dear Authors,
I followed the instructions to install the required modules, but when I run the search_relax train.py I get an error saying there is no module named genotypes. How can I install this module?
Best
Hi,
After checking and testing the code, I am looking for a way to extract the found policy.
Could you also explain how the found policy can be reused?
As stated in the issue, I tried using DataParallel and commenting out set_device to run on multiple GPUs, but it does not seem to work.
The script keeps showing:
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
RuntimeError: CUDA error: initialization error
Thanks!
Hello, thank you for your excellent work. I noticed that there are two elif dataset == 'reduced_imagenet': branches in your dataset.py. This means the second elif can never be reached when the code runs. Is this a mistake?
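The point about the repeated branch can be checked with a small stand-alone sketch. The function name and return strings below are hypothetical and are not taken from dataset.py; only the shape of the if/elif chain mirrors what the question describes.

```python
def pick_branch(dataset):
    """Return which branch of an if/elif chain handles `dataset`."""
    if dataset == 'cifar10':
        return 'cifar10 branch'
    elif dataset == 'reduced_imagenet':
        return 'first reduced_imagenet branch'
    elif dataset == 'reduced_imagenet':
        # Unreachable: the identical condition above always matches first.
        return 'second reduced_imagenet branch'
    raise ValueError('invalid dataset name=%s' % dataset)


print(pick_branch('reduced_imagenet'))
```

If the two branches were meant for different configurations, they would need distinct conditions for the second one to take effect.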
Could you give me the IDADA source code?
Hi!
I have set the learning rate to 0.001, but during the search the validation metric curve is unstable.
I chose the top-25 sub_policy from epoch 36 (1 epoch = 500 iterations) because the validation metric curve looked good there, and the second training stage then converged. I don't know why.
Additionally, when I choose the sub_policy from a point where the curve oscillates, the second training stage is unstable and the model does not converge.
During the entire search phase, the validation metric oscillates. Is this normal?
I don't know how to choose the final policy; it feels like I am selecting the sub_policy at random. Thanks!
I have the following two questions. Is the first snippet used to compute the gradients of magnitude and probability in the backward pass? And is the second snippet used to compute the gradient of ops_weights?
1.
def forward(self, origin_images, trans_images, probability, probability_index, magnitude):
    index = sum(p_i.item() << i for i, p_i in enumerate(probability_index))
    com_image = 0
    images = origin_images
    adds = 0
    for selection in range(2 ** len(self.sub_policy)):
        trans_probability = 1
        for i in range(len(self.sub_policy)):
            if selection & (1 << i):
                trans_probability = trans_probability * probability[i]
                if selection == index:
                    images = images - magnitude[i]
                    adds = adds + magnitude[i]
            else:
                trans_probability = trans_probability * (1 - probability[i])
        if selection == index:
            images = images.detach() + adds
            com_image = com_image + trans_probability * images
        else:
            com_image = com_image + trans_probability
    return com_image
2.
def forward(self, origin_images, trans_images_list, probabilities, probabilities_index, magnitudes, weights, weights_index):
    for i, (p, p_i, m, w, op) in enumerate(zip(probabilities, probabilities_index, magnitudes, weights, self._ops)):
        if weights_index.item() == i:
            return sum(w * op(origin_images, trans_images, p, p_i, m))
        else:
            return w
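To make the bitmask loop in the first snippet easier to follow, here is a minimal pure-Python sketch of what `selection` and `index` encode. It is only one reading of the quoted code, not a statement from the authors; the helper names `selection_probabilities` and `to_index` are hypothetical.

```python
def selection_probabilities(probability):
    """Map each bitmask `selection` to the probability of that on/off pattern.

    Bit i of `selection` is 1 when operation i of the sub-policy is applied.
    """
    n = len(probability)
    result = {}
    for selection in range(2 ** n):
        p = 1.0
        for i in range(n):
            if selection & (1 << i):
                p *= probability[i]        # operation i applied
            else:
                p *= 1.0 - probability[i]  # operation i skipped
        result[selection] = p
    return result


def to_index(applied):
    """Pack per-operation 0/1 samples into a bitmask, like `index` in the snippet."""
    return sum(bit << i for i, bit in enumerate(applied))


probs = selection_probabilities([0.5, 0.25])
print(probs)
print(sum(probs.values()))
print(to_index([1, 0]), to_index([0, 1]), to_index([1, 1]))
```

Under this reading, the loop visits every on/off combination of the operations in a sub-policy, and only the combination that was actually sampled (`selection == index`) contributes the image term.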
Using the one-step unrolled validation loss (that is, setting unrolled=True) requires a new_model to obtain the virtual gradients, and this takes too much memory. So I set unrolled=False directly, but then only magnitude is updated, while weight and probabilities are not. Is there a problem with the code here?
def _backward_step(self, input_valid, target_valid):
    loss = self.model._loss(input_valid, target_valid)
    loss.backward()
Hello, I have a question about dataset.py.
sss = StratifiedShuffleSplit(n_splits=5, test_size=split, random_state=0)
This line produces 5 different splits of the dataset. Why do you need 5 instead of 1?
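For context on what `n_splits=5` does in scikit-learn, here is a small sketch on toy labels. The label array and `test_size=0.5` are made up for illustration; how dataset.py consumes the splits is not shown here.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

labels = np.array([0] * 10 + [1] * 10)  # toy labels: two balanced classes
indices = np.arange(len(labels))

sss = StratifiedShuffleSplit(n_splits=5, test_size=0.5, random_state=0)
splits = list(sss.split(indices, labels))  # five (train_idx, valid_idx) pairs

# Each pair is a complete, class-stratified partition of the same data.
train_idx, valid_idx = splits[0]
print(len(splits), len(train_idx), len(valid_idx))
print(np.bincount(labels[valid_idx]))
```

Because `split()` is a generator, code that only needs one partition can take a single pair with `next(...)`; the remaining splits are then never computed.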
Hi, I don't have ILSVRC/ImageSets/CLS-LOC/train_cls.txt. It is too difficult to download ILSVRC2017_CLS-LOC.tar.gz in China. Could you please provide train_cls.txt? Thank you very much!
Hi, it seems that in DADA the dataloader num_workers has to be 0 in order to avoid a mismatch between the gradients and the augmentation parameters actually applied.
But with num_workers == 0, the speed advantage of DADA over PBA cannot be shown. Do you have any trick to deal with this issue?
Hi,
We found that with newer versions of PyTorch (instead of PyTorch 1.2), 'torch.distributions.RelaxedOneHotCategorical' raises an 'inplace' error, which seriously limits the application of DADA. Have you considered tackling this problem? Thanks.
Hi,
After you find the optimal DA policies, do you use all of them (e.g., 105 sub-policies for CIFAR-10) or choose the top 25, as shown in the readme?
Thanks!
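If the top-25 selection mentioned in the readme means keeping the sub-policies with the largest learned weights, the ranking step could look like the sketch below. This is an assumption about the selection rule, not the repository's code; the function name, operation pairs, and weights are hypothetical.

```python
def top_k_sub_policies(sub_policies, weights, k=25):
    """Return the k sub-policies with the largest weights, best first."""
    if len(sub_policies) != len(weights):
        raise ValueError("sub_policies and weights must have the same length")
    ranked = sorted(zip(weights, sub_policies), key=lambda pair: pair[0], reverse=True)
    return [policy for _, policy in ranked[:k]]


# Toy example with hypothetical operation pairs and weights.
sub_policies = [
    ("ShearX", "Invert"),
    ("Rotate", "Color"),
    ("Cutout", "Solarize"),
    ("Equalize", "Posterize"),
]
weights = [0.10, 0.40, 0.20, 0.30]
print(top_k_sub_policies(sub_policies, weights, k=2))
```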
Hi,
I tried running the Gumbel-Softmax model with the following parameters:
GPU=0
DATASET=cifar10
MODEL=wresnet40_2
EPOCH=200
BATCH=128
LR=0.1
WD=0.0002
AWD=0.0
ALR=0.001
CUTOUT=16
SAVE=CIFAR10
python train_search_paper.py --unrolled --report_freq 1 --num_workers 0 --epoch ${EPOCH} --batch_size ${BATCH} --learning_rate ${LR} --dataset ${DATASET} --model_name ${MODEL} --save ${SAVE} --gpu ${GPU} --arch_weight_decay ${AWD} --arch_learning_rate ${ALR} --cutout --cutout_length ${CUTOUT}
and I am getting the following error:
Traceback (most recent call last):
  File "train_search_paper.py", line 284, in <module>
    main()
  File "train_search_paper.py", line 175, in main
    train_acc, train_obj = train(train_queue, valid_queue, model, architect, criterion, optimizer, lr)
  File "train_search_paper.py", line 223, in train
    loss.backward()
  File "/scratch/clear/jmarrie/miniconda3/envs/env/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/scratch/clear/jmarrie/miniconda3/envs/env/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [105]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Do you have any idea where it comes from?
Thanks,
Juliette
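For readers unfamiliar with this class of error, the following minimal PyTorch sketch triggers the same kind of RuntimeError on a toy tensor. It is not a diagnosis of the failure above, whose cause inside the search code may be different; it only shows the general mechanism of modifying a tensor in place after autograd has saved it for the backward pass.

```python
import torch

w = torch.ones(3, requires_grad=True)
y = w.exp()   # exp() saves its output to compute the gradient later
y.add_(1.0)   # in-place edit bumps the version counter of that saved tensor

try:
    y.sum().backward()
    failed = False
except RuntimeError as err:
    failed = True
    print("backward failed:", type(err).__name__)

# Out-of-place variant: the saved output of exp() is left untouched.
w2 = torch.ones(3, requires_grad=True)
y2 = w2.exp() + 1.0
y2.sum().backward()
print(w2.grad)
```

Calling `torch.autograd.set_detect_anomaly(True)` before the forward pass makes PyTorch also report where the offending tensor was created, which can help locate the in-place update.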
Hi, thanks for the good work. I have one question I need your help with. I am confused by the DifferentiableAugment class in https://github.com/VDIGPKU/DADA/blob/master/search_relax/model_search.py#L43.
class DifferentiableAugment(nn.Module):
    def __init__(self, sub_policy):
        super(DifferentiableAugment, self).__init__()
        self.sub_policy = sub_policy

    def forward(self, origin_images, probability_b, magnitude):
        images = origin_images
        adds = 0
        for i in range(len(self.sub_policy)):
            if probability_b[i].item() != 0.0:
                images = images - magnitude[i]
                adds = adds + magnitude[i]
        images = images.detach() + adds
        return images
It seems that the image processing operations are not actually called to preprocess the training images during the search stage. For each sub_policy, is the magnitude simply subtracted? If I have misunderstood something, please tell me. Thanks.
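One possible reading of the subtract, detach, add pattern, offered as an assumption rather than the authors' explanation, is that it leaves the pixel values unchanged while attaching a gradient of 1 with respect to the magnitude. This would match the approximation d x_ij / dm = 1 raised in another question in this list, and it presumes the actual image operations are applied elsewhere. The toy tensors below are made up for illustration.

```python
import torch

images = torch.tensor([0.2, 0.5, 0.8])  # stand-in for an already augmented batch
magnitude = torch.tensor(0.3, requires_grad=True)

# Same steps as in forward(): subtract, detach, then add the magnitude back.
out = (images - magnitude).detach() + magnitude

print(torch.allclose(out, images))  # forward values match the input images
out.sum().backward()
print(magnitude.grad)               # each pixel contributes a gradient of 1
```

In this sketch the forward output carries no information from `magnitude`, so any dependence of the loss on the magnitude comes only through the surrogate gradient.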
As part of my personal research, I am studying various automated data augmentation techniques.
While trying to reproduce your results, I am facing issues with the updating of probabilities and ops_weights. Only the magnitude values are updated over the epochs; the probabilities and ops_weights stay constant throughout the run at 0.5 and 0.0095 respectively.
Could you please help me fix this issue?
Thank you.
Hi, thank you for your research. I read your paper and code. In formula 17 of the paper you compute the gradient with respect to alpha, but in your code dalpha=[]. Can you tell me the reason? Thanks.
Raised a PR to fix this: #11
Hi,
Thank you for sharing this great work! Could you share the ImageNet split you used in the experiment? Thank you!
Hi,
Thank you for your great work. I have a question regarding equation 10 in your paper, where you approximate the gradient of the transformed image with respect to the magnitude of the transformation operation as d x_ij / dm = 1. I don't understand the reasoning behind this, since pixel values do not always increase (gradient +1) when the magnitude increases (e.g., shearX, shearY).
Thanks.
Hello! I have some minor questions about certain details of how the network itself is trained.
The paper says:
Following [3, 10, 15], we search the DA policies on the reduced datasets and evaluate on the full datasets. Furthermore, we split half of the reduced datasets as training set, and the remaining half as validation set for the data augmentation search.
So what is the workflow for training a neural network with DADA? Do we search on the dataset using train_search_paper, then transfer the policies and use them for training? If yes, where is the method that transfers the searched policies to training? If no, how is the validation data used? It seems that only half of the data is used to train the neural network (train_portion = 0.5).
set -x
GPU=1
DATASET=cifar100
MODEL=resnet50
EPOCH=20
BATCH=128
LR=0.1
WD=0.0002
AWD=0.0
ALR=0.005
CUTOUT=16
TEMPERATE=0.5
which python
python train_search_paper.py --unrolled --report_freq 1 --num_workers 0 --epoch ${EPOCH} --batch_size ${BATCH} --learning_rate ${LR} --dataset ${DATASET} --model_name ${MODEL} --gpu ${GPU} --arch_weight_decay ${AWD} --arch_learning_rate ${ALR} --weight_decay ${WD} --cutout --cutout_length ${CUTOUT} --temperature ${TEMPERATE}
Hello, I used the resnet50 network to search for the augmentation policy. During the search, I noticed that the training and validation accuracy are very low.
04/27 03:59:04 PM valid 187 2.398671e+00 38.285406 70.545213
04/27 03:59:04 PM valid 188 2.397762e+00 38.289517 70.568783
04/27 03:59:04 PM valid 189 2.397031e+00 38.297697 70.575658
04/27 03:59:04 PM valid 190 2.395328e+00 38.326243 70.590641
04/27 03:59:04 PM valid 191 2.396694e+00 38.309733 70.576986
04/27 03:59:04 PM valid 192 2.396227e+00 38.337921 70.575615
04/27 03:59:05 PM valid 193 2.396536e+00 38.321521 70.574259
04/27 03:59:05 PM valid 194 2.396318e+00 38.325321 70.584936
04/27 03:59:05 PM valid 195 2.395298e+00 38.336000 70.588000
04/27 03:59:05 PM valid_acc 38.336000
Is this OK?
I tried to search for a reduced ImageNet policy with the original ImageNet search script, but found that it runs for only 20 epochs.
It seems that the training loss has not converged after only 20 epochs.
If the training loss has not converged, how can the performance of the data augmentation policy be validated?
By the way, could you share the final train loss, val acc, and train acc of the search phase?
Hello, thank you for your great work. I have a question about how you update the sampling parameters (arch parameters). You update them using the validation set, but you do not augment the validation set with the sampled augmentation. That means the gradient of the validation loss with respect to the sampling parameters is None. How, then, do you update the sampling parameters with the validation set?
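Assuming the search follows a DARTS-style one-step unrolled scheme, as the --unrolled flag used elsewhere in this list suggests, the validation loss can depend on the augmentation parameters through the virtual weight update even when the validation data itself is not augmented. The scalar toy problem below is hypothetical (the losses, `xi`, and all values are made up) and only illustrates that chain of dependence.

```python
def train_loss(w, alpha):
    # Toy training loss: the augmentation parameter alpha shifts the training target.
    return (w - alpha) ** 2


def val_loss(w):
    # Toy validation loss: depends on the weight only, not on alpha.
    return (w - 1.0) ** 2


def unrolled_val_loss(w, alpha, xi):
    # One virtual SGD step on the training loss, then evaluate on validation data.
    grad_w = 2.0 * (w - alpha)  # d train_loss / d w
    w_virtual = w - xi * grad_w
    return val_loss(w_virtual)


w, alpha, xi, eps = 0.0, 0.5, 0.1, 1e-6

# Finite-difference estimate of d val_loss(w') / d alpha.
numeric = (unrolled_val_loss(w, alpha + eps, xi)
           - unrolled_val_loss(w, alpha - eps, xi)) / (2 * eps)

# Chain rule: (d val / d w') * (d w' / d alpha) = 2 (w' - 1) * 2 xi.
w_virtual = w - xi * 2.0 * (w - alpha)
analytic = 2.0 * (w_virtual - 1.0) * 2.0 * xi

print(numeric, analytic)
```

Without the virtual step (evaluating `val_loss(w)` directly), alpha does not appear in the expression at all, which is consistent with the observation in this list that some parameters stop updating when unrolled=False.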
Hi,
I don't understand the DifferentiableAugment class in the implementation. What does it do? Does it just subtract the magnitude from the images and add it back? Why did you implement it this way? Is there a specific reason for it?
class DifferentiableAugment(nn.Module):
    def __init__(self, sub_policy):
        super(DifferentiableAugment, self).__init__()
        self.sub_policy = sub_policy

    def forward(self, origin_images, probability_b, magnitude):
        images = origin_images
        adds = 0
        for i in range(len(self.sub_policy)):
            if probability_b[i].item() != 0.0:
                images = images - magnitude[i]
                adds = adds + magnitude[i]
        images = images.detach() + adds
        return images
Thanks for the great work and code!
I want to reproduce the same augmentation policy (called genotype in your code) with the provided search code. I followed the description in README.md and searched for an augmentation policy on reduced ImageNet with Res50. However, the policy I get is different from the one you give in genotype.py, so I want to know whether I did something wrong in reproducing the result. Here are two things that I guess may affect the search results:
1. In the search code, the default random seed in train_search_paper.py is 2. Is this the same random seed you used to get the final result?
2. During the search, the augmentations are inserted after ColorJitter, but in the training code the augmentation policy is inserted after RandomHorizontalFlip and before ColorJitter (line 95 in fast-autoaugment/FastAutoAugment/data.py). This is not consistent between training and search.
Do these two points affect the search? Or are there other details of the search process that I missed? I look forward to your reply. Thanks.