mixup-cifar10's Issues

Unable to Reproduce Results for WRN-28-10

Hi,

I am unable to reproduce the results for the CIFAR-10 dataset with WRN-28-10. The accuracy reported in the paper is 97.3%, whereas mine stays below 97%, at around 96.9%.

Any suggestions?

Regards,

DenseNets do not match paper implementation

Hi,

Similar issue that happened with ResNets. #3

The implementation in densenet.py follows structures that are not mentioned in the paper.
There are no DenseNet-121, -169, -201, or -161 configurations for CIFAR, and the one called densenet_cifar does not match the layer configuration in the original paper.

Furthermore, densenet3.py only works for DenseNet-BC. The models don't match the number of parameters the authors report for plain DenseNets: a plain DenseNet should output 16 feature maps after the first convolution instead of 2k.

In #4 (commit 5db867f) I have committed an implementation of DenseNet that covers every combination reported in the original paper. The architectures match those in the paper:

'''
DenseNets implemented following the paper https://arxiv.org/pdf/1608.06993.pdf

+-------------+-------------+-------+--------------+
|    Model    | Growth Rate | Depth |  Params (M)  |
+-------------+-------------+-------+--------------+
|  DenseNet   |     12      |  40   |     1.02     |
+-------------+-------------+-------+--------------+
|  DenseNet   |     12      |  100  |     6.98     |
+-------------+-------------+-------+--------------+
|  DenseNet   |     24      |  100  |    27.249    |
+-------------+-------------+-------+--------------+
| DenseNet-BC |     12      |  100  |    0.769     |
+-------------+-------------+-------+--------------+
| DenseNet-BC |     24      |  250  |    15.324    |
+-------------+-------------+-------+--------------+
| DenseNet-BC |     40      |  190  |    25.624    |
+-------------+-------------+-------+--------------+

'''
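For the point about the first convolution, here is a minimal sketch of how the initial feature-map count is usually chosen, following the DenseNet paper (my own illustration, not code from this repository's densenet.py or densenet3.py):

```
# Number of feature maps produced by the first convolution, per the
# DenseNet paper: plain DenseNets use 16, DenseNet-BC uses 2 * growth_rate.
# Sketch for illustration only.
def initial_feature_maps(growth_rate, bottleneck_compression):
    return 2 * growth_rate if bottleneck_compression else 16
```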

Why is the mixup method called a data augmentation technique?

I am confused about the mixup data augmentation technique; let me explain the problem briefly.
With classic augmentation techniques (e.g., jittering, scaling, magnitude warping) we double or quadruple the data. For instance, if the original dataset contained 4,000 samples, there will be 8,000 samples after augmentation.

On the other hand, as I understand it, mixup does not add data but rather mixes the samples and their labels and uses these new mixed samples for training to produce a more regularized model. Am I correct? If so, why is mixup referred to as data augmentation, given that we only mix samples and do not artificially increase the dataset size?
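For reference, a minimal sketch of the usual mixup training step (my paraphrase of the approach described in the paper, not necessarily identical to train.py). Each step draws a fresh lambda and a fresh random pairing, so the model effectively sees new virtual examples in the vicinity of the training data on every pass, even though the stored dataset never grows:

```
import numpy as np
import torch

def mixup_data(x, y, alpha=1.0):
    # Sample the mixing coefficient from Beta(alpha, alpha).
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    # Pair each example with a randomly chosen partner from the same batch.
    index = torch.randperm(x.size(0), device=x.device)
    mixed_x = lam * x + (1 - lam) * x[index]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam
```

The mixed batch is generated on the fly and discarded after the step, which is why it counts as augmentation rather than as enlarging the dataset.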

No implementation for wide resnet

In the original paper, WideResNet-28-10 is reported at 3.8% error on CIFAR-10 with ERM and 2.7% error with mixup. However, in both repositories I can't find an implementation of the wide resnet. In my own implementation, I only reach 4.02% error on CIFAR-10 with ERM and 3.83% error with mixup.
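In case it helps others, below is a minimal WRN-28-10 sketch I put together following Zagoruyko & Komodakis (https://arxiv.org/abs/1605.07146). It is my own reconstruction, without dropout, and may differ in details (dropout rate, shortcut placement, initialization) from whatever configuration the mixup authors actually used:

```
import torch.nn as nn
import torch.nn.functional as F

class WideBasic(nn.Module):
    # Pre-activation basic block; the widening factor is baked into `planes`.
    def __init__(self, in_planes, planes, stride=1):
        super(WideBasic, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False))

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)

class WideResNet(nn.Module):
    # depth = 6n + 4; WRN-28-10 -> n = 4 blocks per stage, widening factor 10.
    def __init__(self, depth=28, widen=10, num_classes=10):
        super(WideResNet, self).__init__()
        n = (depth - 4) // 6
        widths = [16, 16 * widen, 32 * widen, 64 * widen]
        self.conv1 = nn.Conv2d(3, widths[0], 3, padding=1, bias=False)
        self.layer1 = self._make_stage(widths[0], widths[1], n, stride=1)
        self.layer2 = self._make_stage(widths[1], widths[2], n, stride=2)
        self.layer3 = self._make_stage(widths[2], widths[3], n, stride=2)
        self.bn = nn.BatchNorm2d(widths[3])
        self.linear = nn.Linear(widths[3], num_classes)

    def _make_stage(self, in_planes, planes, num_blocks, stride):
        layers = [WideBasic(in_planes, planes, stride)]
        layers += [WideBasic(planes, planes, 1) for _ in range(num_blocks - 1)]
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv1(x)
        out = self.layer3(self.layer2(self.layer1(out)))
        out = F.relu(self.bn(out))
        out = F.avg_pool2d(out, 8)
        return self.linear(out.view(out.size(0), -1))
```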

ResNet architectures on Cifar10 do not follow Kaiming He

I think the ResNet models (ResNet-18, ResNet-110, etc.) do not match https://arxiv.org/abs/1603.05027. The reason is that Kaiming He used a different architecture (under the same name) for CIFAR-10 than for ImageNet. In particular, ResNet-18 should have ~0.3M parameters, while the one used in the repository seems to have ~10M. See Tab. 6 and the bottom of page 7 of https://arxiv.org/abs/1512.03385:

[screenshot: Table 6 from the ResNet paper, listing the CIFAR-10 architectures and their parameter counts]

This is not a bug of course, but it might be a bit misleading.
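A quick way to see the gap (a sketch assuming torchvision is installed; the repository's resnet.py appears to use the same ImageNet-style widths, so its count is similar):

```
import torchvision.models as models

# ImageNet-style ResNet-18: roughly 11M parameters.
resnet18 = models.resnet18(num_classes=10)
print(sum(p.numel() for p in resnet18.parameters()))
# The CIFAR-style ResNets in He et al. (16/32/64 channels) range from
# ~0.27M parameters (ResNet-20) to ~1.7M (ResNet-110).
```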

test accuracy truncated to integer

The test accuracy is computed with acc = 100.*correct/total in train.py. Since both "correct" and "total" are integer-valued, the expression evaluates to an integer, so the test accuracy is rounded down to the nearest integer.
This can be solved by using this instead:
acc = 100.*correct.float()/total
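A one-off conversion at the end works as well and leaves the rest of the loop untouched (a sketch assuming correct is a 0-dim tensor or Python int):

```
# Either form yields a float accuracy; converting `correct` avoids the
# integer division seen on older PyTorch versions where it is a LongTensor.
acc = 100. * float(correct) / total
```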

Inference using InstaHide

I would like to know how to evaluate inference accuracy with InstaHide, and how InstaHide is defined at inference time
(Table 2 in the paper https://arxiv.org/pdf/2010.02772.pdf).
In my experiment using this GitHub code, the classification loss during training is very high, although the classification loss during inference is very low if we use the real images.

Therefore, I'm wondering how inference with InstaHide should be evaluated.

Some problems while using batch normalization

[plot: resnet18_10_25_1]
[plot: resnet_10_29_1]

Thank you for your contribution. I have a problem when reimplementing the project in TensorFlow and using the BN layer via tf.layers.batch_normalization(input, training). Just as in PyTorch we call model.train() and model.eval() to distinguish the two stages when using BN layers, I set "training" to True during training and to False during validation.

Look at the two images above. The second one is the result when I distinguish the training and validation stages in the BN layers; the accuracy is very unstable and not very good. So I set the training flag of the BN layers to True everywhere, i.e., using the batch mean and variance at test time rather than the moving averages, and the result is shown in the first image. The curve is much smoother, although the accuracy (about 92%) is lower than in the paper. I have already reduced the learning rate, and the problem also remains when I remove the mixup module. I am not sure whether I am applying the BN layers incorrectly, but I followed the documentation. Could it be that the data augmentation has altered the distribution between the training data and the test data?
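For reference, the usual TF 1.x pattern for tf.layers.batch_normalization requires running the UPDATE_OPS collection alongside the train op, otherwise the moving statistics used when training=False never get updated, which produces exactly this kind of unstable, low validation accuracy. A minimal TF 1.x-style sketch (my own, not code from this repository):

```
import tensorflow as tf  # TF 1.x style API

def conv_bn_relu(x, filters, is_training):
    h = tf.layers.conv2d(x, filters, 3, padding='same', use_bias=False)
    h = tf.layers.batch_normalization(h, training=is_training)
    return tf.nn.relu(h)

images = tf.placeholder(tf.float32, [None, 32, 32, 3])
labels = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool, [])

h = conv_bn_relu(images, 64, is_training)
h = tf.reduce_mean(h, axis=[1, 2])           # global average pooling
logits = tf.layers.dense(h, 10)
loss = tf.losses.sparse_softmax_cross_entropy(labels, logits)

# The moving mean/variance that BN uses when training=False are only
# updated if the UPDATE_OPS collection runs together with the train op.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.MomentumOptimizer(0.1, 0.9).minimize(loss)
```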

Hope for your answer! Thanks!

Should I convert the labels to one-hot?

Hey,
I am trying to use your code to train my model, and I noticed that y_a and y_b look like they should be one-hot. When I use your code in my experiment, should I encode the integer labels as one-hot?
I converted the labels to one-hot and started training, but I found that the number of correct predictions is always zero.
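For anyone hitting the same question: as far as I can tell, the mixup loss here is a lambda-weighted sum of two standard cross-entropy terms, so the labels stay as integer class indices; nn.CrossEntropyLoss expects class indices rather than one-hot targets (at least on older PyTorch), which would explain the zero accuracy. A sketch of the pattern (my reading, not necessarily verbatim repo code):

```
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    # y_a and y_b are integer class indices of shape [batch], not one-hot.
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```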

train.py giving IndexError

Hi,
I was trying to run your code,
Exact command being:
$ CUDA_VISIBLE_DEVICES=0 python train.py --lr=0.1 --seed=20170922 --decay=1e-4
Before running the command I modified train.py to make download=True in both trainset and testset.
I got the following output:
==> Preparing data..
Files already downloaded and verified
Files already downloaded and verified
==> Building model..
1
Using CUDA..

Epoch: 0
Traceback (most recent call last):
File "train.py", line 245, in
train_loss, reg_loss, train_acc = train(epoch)
File "train.py", line 165, in train
train_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Could you help me figure out what I'm doing wrong? When I ran the debugger, it errored out here:
File "train.py", line 165, in train
train_loss += loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
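As the error message suggests, on PyTorch 0.4+ a scalar loss is a 0-dim tensor, so indexing it fails; replacing the accumulation with .item() should fix it (sketch of the change around the line reported in the traceback):

```
# Old (PyTorch <= 0.3):
# train_loss += loss.data[0]
# PyTorch >= 0.4:
train_loss += loss.item()
```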

PreResNet - wrong implementation of shortcut

Hello everyone,

I have two questions.

  • Why is the repository named cifar10? Is it because mixup is designed solely for CIFAR-10?
  • In the repo, regardless of whether there is a shortcut or not, BN-ReLU is applied to the input. This differs from many other repositories, and I just want to know which one is correct (see the sketch after the snippet below).

def forward(self, x):
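For comparison, here is a sketch of the two shortcut placements being asked about in a pre-activation block (my own illustration, following the pre-activation ResNet paper, not necessarily the exact code in this repository):

```
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1):
        super(PreActBlock, self).__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, planes, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, 3, stride=1, padding=1, bias=False)
        self.shortcut = None
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Conv2d(in_planes, planes, 1, stride=stride, bias=False)

    def forward(self, x):
        out = F.relu(self.bn1(x))            # BN-ReLU applied unconditionally
        # Variant A (as in this repo, if I read it correctly): the projection
        # shortcut consumes the pre-activated tensor `out`.
        # Variant B (seen in some other repos): it consumes the raw input `x`.
        shortcut = self.shortcut(out) if self.shortcut is not None else x
        out = self.conv1(out)
        out = self.conv2(F.relu(self.bn2(out)))
        return out + shortcut
```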

No Validation Set

Hi, for supervised methods it is generally advised to use a separate validation set. From the code, it looks like you have reported the best test-set accuracy.

Is there any specific reason that you have not used a separate validation set?
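For anyone who wants one, a minimal sketch of carving a validation split out of the CIFAR-10 training set with torch.utils.data.random_split (my own addition, not something the repository does):

```
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

full_train = torchvision.datasets.CIFAR10(root='./data', train=True,
                                          download=True, transform=transform)
# Hold out 5,000 of the 50,000 training images for validation.
train_set, val_set = random_split(full_train, [45000, 5000],
                                  generator=torch.Generator().manual_seed(0))

train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=100, shuffle=False, num_workers=2)
```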
