mixup-cifar10's Issues

Unable to Reproduce Results for WRN-28-10

Hi,

I am unable to reproduce the results for the CIFAR-10 dataset with WRN-28-10. The accuracy reported in the paper is 97.3%, whereas mine stays below 97%, at around 96.9%.

Any suggestions?

Regards,

DenseNets do not match paper implementation

Hi,

Similar issue that happened with ResNets. #3

The implementation in densenet.py follows structures that are not mentioned in the paper.
There are no DenseNet-121, -169, -201, or -161 configurations for CIFAR, and the one called densenet_cifar does not match the layer configuration in the original paper.

Furthermore, densenet3.py only works for DenseNet-BC. The models don't match the number of parameters the authors report for plain DenseNets: a plain DenseNet should output 16 feature maps after the first convolution instead of 2k.

In #4 (commit 5db867f) I have committed an implementation of DenseNet that covers every combination reported in the original paper. The architectures match those in the paper:

'''
DenseNets implemented following the paper https://arxiv.org/pdf/1608.06993.pdf

+-------------+-------------+-------+--------------+
|    Model    | Growth Rate | Depth |  Params (M)  |
+-------------+-------------+-------+--------------+
|  DenseNet   |     12      |  40   |     1.02     |
+-------------+-------------+-------+--------------+
|  DenseNet   |     12      |  100  |     6.98     |
+-------------+-------------+-------+--------------+
|  DenseNet   |     24      |  100  |    27.249    |
+-------------+-------------+-------+--------------+
| DenseNet-BC |     12      |  100  |    0.769     |
+-------------+-------------+-------+--------------+
| DenseNet-BC |     24      |  250  |    15.324    |
+-------------+-------------+-------+--------------+
| DenseNet-BC |     40      |  190  |    25.624    |
+-------------+-------------+-------+--------------+

'''
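For the point about the first convolution, here is a minimal sketch of how the initial feature-map count is usually chosen, following the DenseNet paper (my own illustration, not code from this repository's densenet.py or densenet3.py):

```
# Number of feature maps produced by the first convolution, per the
# DenseNet paper: plain DenseNets use 16, DenseNet-BC uses 2 * growth_rate.
# Sketch for illustration only.
def initial_feature_maps(growth_rate, bottleneck_compression):
    return 2 * growth_rate if bottleneck_compression else 16
```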

Why is the mixup method called a data augmentation technique?

I am confused about the mixup data augmentation technique; let me explain the problem briefly.
With classic augmentation techniques (e.g., jittering, scaling, magnitude warping) we double or quadruple the data. For instance, if the original dataset contained 4,000 samples, there will be 8,000 samples after augmentation.

On the other hand, as I understand it, mixup does not add data but rather mixes the samples and their labels and uses these new mixed samples for training to produce a more regularized model. Am I correct? If so, why is mixup referred to as data augmentation, given that we only mix samples and do not artificially increase the dataset size?
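For reference, a minimal sketch of the usual mixup training step (my paraphrase of the approach described in the paper, not necessarily identical to train.py). Each step draws a fresh lambda and a fresh random pairing, so the model effectively sees new virtual examples in the vicinity of the training data on every pass, even though the stored dataset never grows:

```
import numpy as np
import torch

def mixup_data(x, y, alpha=1.0):
    # Sample the mixing coefficient from Beta(alpha, alpha).
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    # Pair each example with a randomly chosen partner from the same batch.
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam
```

The mixed batch is generated on the fly and discarded after the step, which is why it counts as augmentation rather than as enlarging the dataset.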

No implementation for wide resnet

In the original paper, WideResNet-28-10 is reported at 3.8% error on CIFAR-10 with ERM and 2.7% error with mixup. However, in both repositories I can't find an implementation of the wide resnet. In my own implementation, I only reach 4.02% error on CIFAR-10 with ERM and 3.83% error with mixup.
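In case it helps others, below is a minimal WRN-28-10 sketch I put together following Zagoruyko & Komodakis (https://arxiv.org/abs/1605.07146). It is my own reconstruction, without dropout, and may differ in details (dropout rate, shortcut placement, initialization) from whatever configuration the mixup authors actually used:

```
import torch.nn as nn
import torch.nn.functional as F

class WideBasic(nn.Module):
    # Pre-activation basic block; the widening factor is baked into `planes`.
    def __init__(self, in_planes, planes, stride=1):
        super(WideBasic, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False))

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)

class WideResNet(nn.Module):
    # depth = 6n + 4; WRN-28-10 -> n = 4 blocks per stage, widening factor 10.
    def __init__(self, depth=28, widen=10, num_classes=10):
        super(WideResNet, self).__init__()
        n = (depth - 4) // 6
        widths = [16, 16 * widen, 32 * widen, 64 * widen]
        self.conv1 = nn.Conv2d(3, widths[0], 3, padding=1, bias=False)
        self.layer1 = self._make_stage(widths[0], widths[1], n, stride=1)
        self.layer2 = self._make_stage(widths[1], widths[2], n, stride=2)
        self.layer3 = self._make_stage(widths[2], widths[3], n, stride=2)
        self.bn = nn.BatchNorm2d(widths[3])
        self.linear = nn.Linear(widths[3], num_classes)

    def _make_stage(self, in_planes, planes, num_blocks, stride):
        layers = [WideBasic(in_planes, planes, stride)]
        layers += [WideBasic(planes, planes, 1) for _ in range(num_blocks - 1)]
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer3(self.layer2(self.layer1(out)))
        out = F.relu(self.bn(out))
        out = F.avg_pool2d(out, 8)
        return self.linear(out.view(out.size(0), -1))
```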

ResNet architectures on Cifar10 do not follow Kaiming He

I think the ResNet models (ResNet-18, ResNet-110, etc.) do not match https://arxiv.org/abs/1603.05027. The reason is that Kaiming He used a different architecture (under the same name) for CIFAR-10 than for ImageNet. In particular, ResNet-18 should have ~0.3M parameters, while the one used in the repository seems to have ~10M. See Tab. 6 and the bottom of page 7 of https://arxiv.org/abs/1512.03385:

[screenshot: Table 6 from the ResNet paper, listing the CIFAR-10 architectures and their parameter counts]

This is not a bug of course, but it might be a bit misleading.
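A quick way to see the gap (a sketch assuming torchvision is installed; the repository's resnet.py appears to use the same ImageNet-style widths, so its count is similar):

```
import torchvision.models as models

# ImageNet-style ResNet-18: roughly 11M parameters.
resnet18 = models.resnet18(num_classes=10)
print(sum(p.numel() for p in resnet18.parameters()))
# The CIFAR-style ResNets in He et al. (16/32/64 channels) range from
# ~0.27M parameters (ResNet-20) to ~1.7M (ResNet-110).
```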

test accuracy truncated to integer

The test accuracy is computed with acc = 100.*correct/total in train.py. Since both "correct" and "total" are integer-valued, the expression evaluates to an integer, so the test accuracy is rounded down to the nearest integer.
This can be solved by using this instead:
acc = 100.*correct.float()/total
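A one-off conversion at the end works as well and leaves the rest of the loop untouched (a sketch assuming correct is a 0-dim tensor or Python int):

```
# Either form yields a float accuracy; converting `correct` avoids the
# integer division seen on older PyTorch versions where it is a LongTensor.
acc = 100. * float(correct) / total
```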

Inference using InstaHide

I would like to know how to evaluate inference accuracy with InstaHide, and how InstaHide is defined at inference time
(Table 2 in the paper https://arxiv.org/pdf/2010.02772.pdf).
In my experiment using this GitHub code, the classification loss during training is very high, although the classification loss during inference is very low if we use the real images.

Therefore, I'm wondering how inference with InstaHide should be evaluated.

Some problems while using batch normalization

[plot: resnet18_10_25_1]
[plot: resnet_10_29_1]

Thank you for your contribution. I have a problem when reimplementing the project in TensorFlow and using the BN layer via tf.layers.batch_normalization(input, training). Just as in PyTorch we call model.train() and model.eval() to distinguish the two stages when using BN layers, I set "training" to True during training and to False during validation.

Look at the two images above. The second one is the result when I distinguish the training and validation stages in the BN layers; the accuracy is very unstable and not very good. So I set the training flag of the BN layers to True everywhere, i.e., using the batch mean and variance at test time rather than the moving averages, and the result is shown in the first image. The curve is much smoother, although the accuracy (about 92%) is lower than in the paper. I have already reduced the learning rate, and the problem also remains when I remove the mixup module. I am not sure whether I am applying the BN layers incorrectly, but I followed the documentation. Could it be that the data augmentation has altered the distribution between the training data and the test data?
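For reference, the usual TF 1.x pattern for tf.layers.batch_normalization requires running the UPDATE_OPS collection alongside the train op, otherwise the moving statistics used when training=False never get updated, which produces exactly this kind of unstable, low validation accuracy. A minimal TF 1.x-style sketch (my own, not code from this repository):

```
import tensorflow as tf  # TF 1.x style API

def conv_bn_relu(x, filters, is_training):
    h = tf.layers.conv2d(x, filters, 3, padding='same', use_bias=False)
    h = tf.layers.batch_normalization(h, training=is_training)
    return tf.nn.relu(h)

images = tf.placeholder(tf.float32, [None, 32, 32, 3])
labels = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool, [])

h = conv_bn_relu(images, 64, is_training)
h = tf.reduce_mean(h, axis=[1, 2])           # global average pooling
logits = tf.layers.dense(h, 10)
loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

# The moving mean/variance that BN uses when training=False are only
# updated if the UPDATE_OPS collection runs together with the train op.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.MomentumOptimizer(0.1, 0.9).minimize(loss)
```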

Hope for your answer! Thanks!

Should I convert the labels to one-hot?

Hey,
I am trying to use your code to train my model, and I noticed that y_a and y_b look like they should be one-hot. When I use your code in my experiment, should I encode the integer labels as one-hot?
I converted the labels to one-hot and started training, but I found that the number of correct predictions is always zero.
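For anyone hitting the same question: as far as I can tell, the mixup loss here is a lambda-weighted sum of two standard cross-entropy terms, so the labels stay as integer class indices; nn.CrossEntropyLoss expects class indices rather than one-hot targets (at least on older PyTorch), which would explain the zero accuracy. A sketch of the pattern (my reading, not necessarily verbatim repo code):

```
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    # y_a and y_b are integer class indices of shape [batch], not one-hot.
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```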

train.py giving IndexError

Hi,
I was trying to run your code,
Exact command being:
$ CUDA_VISIBLE_DEVICES=0 python train.py --lr=0.1 --seed=20170922 --decay=1e-4
Before running the command I modified train.py to make download=True in both trainset and testset.
I got the following output:
==> Preparing data..
Files already downloaded and verified
Files already downloaded and verified
==> Building model..
1
Using CUDA..

Epoch: 0
Traceback (most recent call last):
File "train.py", line 245, in
train_loss, reg_loss, train_acc = train(epoch)
File "train.py", line 165, in train
train_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Could you help me figure out what I'm doing wrong? When I ran the debugger, it errored out here:
File "train.py", line 165, in train
train_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
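As the error message suggests, on PyTorch 0.4+ a scalar loss is a 0-dim tensor, so indexing it fails; replacing the accumulation with .item() should fix it (sketch of the change around the line reported in the traceback):

```
# Old (PyTorch <= 0.3):
# train_loss += loss.data[0]
# PyTorch >= 0.4:
train_loss += loss.item()
```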

PreResNet - wrong implementation of shortcut

Hello everyone,

I have two questions.

  • Why is the repository named cifar10? Is it because mixup is designed solely for CIFAR-10?
  • In the repo, regardless of whether there is a shortcut or not, BN-ReLU is applied to the input. This differs from many other repositories, and I just want to know which one is correct (see the sketch after the snippet below).

def forward(self, x):
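For comparison, here is a sketch of the two shortcut placements being asked about in a pre-activation block (my own illustration, following the pre-activation ResNet paper, not necessarily the exact code in this repository):

```
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1):
        super(PreActBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.shortcut = None
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False)

    def forward(self, x):
        out = F.relu(self.bn1(x))            # BN-ReLU applied unconditionally
        # Variant A (as in this repo, if I read it correctly): the projection
        # shortcut consumes the pre-activated tensor `out`.
        # Variant B (seen in some other repos): it consumes the raw input `x`.
        shortcut = self.shortcut(out) if self.shortcut is not None else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        return out + shortcut
```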

No Validation Set

Hi, for supervised methods it is generally advised to use a separate validation set. From the code, it looks like you have reported the best test-set accuracy.

Is there any specific reason that you have not used a separate validation set?
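For anyone who wants one, a minimal sketch of carving a validation split out of the CIFAR-10 training set with torch.utils.data.random_split (my own addition, not something the repository does):

```
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

full_train = torchvision.datasets.CIFAR10(root='./data', train=True,
                                          download=True, transform=transform)
# Hold out 5,000 of the 50,000 training images for validation.
train_set, val_set = random_split(full_train, [45000, 5000],
                                  generator=torch.Generator().manual_seed(0))

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=100, shuffle=False, num_workers=2)
```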
