
meal-v2's People

Contributors

paulgavrikov, szq0214


meal-v2's Issues

Question

Hello, thank you for your excellent work. In the paper, you distill a big model with a big model and a lite model with a lite model. Why didn't you run an experiment distilling a lite model from a big model? Or is there some other issue with that setup? Thanks for your reply!

loss Value

Hello, thank you very much for your extraordinary work. However, when I switch to my own task, the loss value does not decrease, as shown below. Looking forward to your reply!

Epoch: [46][1/44] Time 2.27 (2.27) Data 1.66 (1.66) G_Loss 6.603 {6.603, 6.603} D_Loss 0.347 {0.347, 0.347} T_Loss 0.301 {0.301, 0.301} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.00, 0.00} LR 0.00030
Epoch: [46][11/44] Time 0.66 (0.81) Data 0.06 (0.20) G_Loss 6.586 {6.603, 6.603} D_Loss 0.348 {0.347, 0.347} T_Loss 0.301 {0.303, 0.303} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.39, 0.39} LR 0.00030
Epoch: [46][21/44] Time 0.65 (0.74) Data 0.05 (0.13) G_Loss 6.564 {6.596, 6.596} D_Loss 0.347 {0.347, 0.347} T_Loss 0.301 {0.302, 0.302} Top-1 0.39 {0.02, 0.02} Top-5 3.12 {0.50, 0.50} LR 0.00030
Epoch: [46][31/44] Time 0.66 (0.71) Data 0.05 (0.11) G_Loss 6.529 {6.582, 6.582} D_Loss 0.347 {0.347, 0.347} T_Loss 0.301 {0.302, 0.302} Top-1 1.56 {0.10, 0.10} Top-5 3.12 {0.73, 0.73} LR 0.00030
Epoch: [46][41/44] Time 0.66 (0.70) Data 0.06 (0.09) G_Loss 6.504 {6.568, 6.568} D_Loss 0.347 {0.347, 0.347} T_Loss 0.301 {0.301, 0.301} Top-1 1.56 {0.25, 0.25} Top-5 3.12 {1.14, 1.14} LR 0.00030
Epoch: [46] -- TRAINING SUMMARY Time 30.68 Data 3.98 G_Loss 6.560 D_Loss 0.347 T_Loss 0.301 Top-1 0.30 Top-5 1.28
Epoch: [47][1/44] Time 2.29 (2.29) Data 1.68 (1.68) G_Loss 6.598 {6.598, 6.598} D_Loss 0.347 {0.347, 0.347} T_Loss 0.314 {0.314, 0.314} Top-1 0.00 {0.00, 0.00} Top-5 0.39 {0.39, 0.39} LR 0.00030
Epoch: [47][11/44] Time 0.65 (0.82) Data 0.05 (0.21) G_Loss 6.587 {6.603, 6.603} D_Loss 0.347 {0.347, 0.347} T_Loss 0.301 {0.302, 0.302} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.28, 0.28} LR 0.00030
Epoch: [47][21/44] Time 0.66 (0.74) Data 0.05 (0.14) G_Loss 6.606 {6.590, 6.590} D_Loss 0.347 {0.347, 0.347} T_Loss 0.301 {0.302, 0.302} Top-1 0.00 {0.13, 0.13} Top-5 0.00 {0.67, 0.67} LR 0.00030
Epoch: [47][31/44] Time 0.66 (0.72) Data 0.05 (0.11) G_Loss 6.563 {6.582, 6.582} D_Loss 0.347 {0.347, 0.347} T_Loss 0.301 {0.301, 0.301} Top-1 0.00 {0.09, 0.09} Top-5 0.00 {0.71, 0.71} LR 0.00030
Epoch: [47][41/44] Time 0.69 (0.70) Data 0.08 (0.10) G_Loss 6.486 {6.567, 6.567} D_Loss 0.347 {0.347, 0.347} T_Loss 0.301 {0.301, 0.301} Top-1 0.00 {0.09, 0.09} Top-5 1.56 {0.99, 0.99} LR 0.00030
Epoch: [47] -- TRAINING SUMMARY Time 30.88 Data 4.16 G_Loss 6.560 D_Loss 0.347 T_Loss 0.306 Top-1 0.08 Top-5 1.14
Epoch: [48][1/44] Time 2.23 (2.23) Data 1.62 (1.62) G_Loss 6.601 {6.601, 6.601} D_Loss 0.347 {0.347, 0.347} T_Loss 0.560 {0.560, 0.560} Top-1 0.00 {0.00, 0.00} Top-5 0.39 {0.39, 0.39} LR 0.00030
Epoch: [48][11/44] Time 0.67 (0.81) Data 0.06 (0.20) G_Loss 6.850 {6.653, 6.653} D_Loss 0.348 {0.347, 0.347} T_Loss 0.680 {0.821, 0.821} Top-1 0.00 {0.07, 0.07} Top-5 0.00 {0.89, 0.89} LR 0.00030
Epoch: [48][21/44] Time 0.68 (0.75) Data 0.07 (0.14) G_Loss 6.637 {6.657, 6.657} D_Loss 0.347 {0.347, 0.347} T_Loss 0.388 {0.653, 0.653} Top-1 0.00 {0.04, 0.04} Top-5 0.00 {0.65, 0.65} LR 0.00030
Epoch: [48][31/44] Time 0.68 (0.72) Data 0.07 (0.11) G_Loss 6.589 {6.640, 6.640} D_Loss 0.347 {0.348, 0.348} T_Loss 0.328 {0.563, 0.563} Top-1 0.00 {0.08, 0.08} Top-5 0.00 {0.68, 0.68} LR 0.00030
Epoch: [48][41/44] Time 0.65 (0.71) Data 0.05 (0.10) G_Loss 6.541 {6.622, 6.622} D_Loss 0.347 {0.348, 0.348} T_Loss 0.310 {0.502, 0.502} Top-1 0.00 {0.10, 0.10} Top-5 1.56 {0.74, 0.74} LR 0.00030

ResNet50 pretrained model has top-1 Acc = 79.02%?

Hi, I'm extremely interested in your work.
But I'm confused that your pretrained ResNet50 model already has top-1 Acc = 79.02%, which is a big gap from your paper's baseline of 76.5%. (The test also uses test.py from your project.)
Have you tried the pretrained model? Or did I go wrong somewhere?
Thank you.

(ResNet50 pretrained weights downloaded from the timm link: https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-weights/resnet50_ram-a26f946b.pth)
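
For what it's worth, here is a minimal evaluation sketch (not the repo's test.py) that loads that timm checkpoint into torchvision's resnet50, assuming the state-dict key names line up and an ImageNet-style val/ folder exists. Note the RAM checkpoint was trained with a stronger recipe than the original 76.5% baseline, which may explain part of the gap.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from torch.utils.data import DataLoader
    from torchvision.datasets import ImageFolder

    model = models.resnet50()
    state = torch.load("resnet50_ram-a26f946b.pth", map_location="cpu")
    model.load_state_dict(state)   # assumes timm keys match torchvision's
    model.eval().cuda()

    tf = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])
    loader = DataLoader(ImageFolder("val/", tf), batch_size=256, num_workers=8)

    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x.cuda()).argmax(1).cpu() == y).sum().item()
            total += y.numel()
    print(f"top-1: {100.0 * correct / total:.2f}%")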

Is it necessary to use a multi-teacher ensemble?

Hi, I have a question,

In Table 5 of the paper, you apply three teacher models to distill the student model: for example, senet154, resnet152, and their ensembled result serve as teachers for a ResNet50 student.


Have you tried distilling a student using only the best-performing teacher (e.g. senet154, Acc@1 81.378) instead of the ensemble? Would a student distilled from a single model perform worse? A rough sketch of the ensembling idea being asked about follows.
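
(For concreteness, a hedged sketch of multi-teacher ensembling: average the teachers' softmax probabilities into one soft target. The plain averaging and KL loss here are assumptions for illustration, not necessarily the paper's exact scheme.)

    import torch
    import torch.nn.functional as F

    def ensemble_soft_labels(teachers, images):
        # Average each teacher's softmax probabilities into one soft target.
        with torch.no_grad():
            probs = [F.softmax(t(images), dim=1) for t in teachers]
        return torch.stack(probs).mean(dim=0)

    def distill_loss(student_logits, soft_target):
        # KL divergence between the student's prediction and the ensembled target.
        return F.kl_div(F.log_softmax(student_logits, dim=1),
                        soft_target, reduction="batchmean")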

Why are top-1 and top-5 both 0.0?

I prepared my own data in the ImageNet format (the train/ and val/ folders contain per-class image subfolders) and trained the model.
But after 60 epochs, top-1 and top-5 are both still 0.0.
What could be the problem? Looking forward to your reply. Thanks!

INFO 2021-01-28 22:14:55,943: Epoch: [59][141/181] Time 1.25 (6.42) Data 0.00 (0.14) G_Loss 3.085 {3.283, 3.279} D_Loss 0.347 {0.347, 0.347} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.00, 0.00} LR 0.01000
INFO 2021-01-28 22:15:20,853: Epoch: [59][161/181] Time 1.24 (5.78) Data 0.00 (0.12) G_Loss 3.101 {3.267, 3.255} D_Loss 0.347 {0.347, 0.347} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.00, 0.00} LR 0.01000
INFO 2021-01-28 22:15:45,187: Epoch: [59][181/181] Time 0.65 (5.28) Data 0.00 (0.11) G_Loss 3.335 {3.266, 3.253} D_Loss 0.347 {0.347, 0.347} Top-1 0.00 {0.00, 0.00} Top-5 0.00 {0.00, 0.00} LR 0.01000
INFO 2021-01-28 22:15:45,965: Epoch: [59] -- TRAINING SUMMARY Time 955.00 Data 19.59 G_Loss 3.266 D_Loss 0.347 Top-1 0.00 Top-5 0.00

torch.nn.DataParallel error

I want to train MEAL-V2 on a machine with 4 GPUs, with the following training script:
python train.py --gpus 0 1 2 3 --save MEAL_V2_resnet50_224 ...

but I get an error:

... 
RuntimeError: Caught RuntimeError in replica 0 on device 0.
...
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/codes/MEAL2-drink/models/discriminator.py", line 17, in forward
    out = F.relu(self.conv1(x))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)
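
(A hedged guess at the cause: the discriminator's weights stay on cuda:0 while DataParallel scatters its inputs across all four GPUs. Wrapping the discriminator in DataParallel as well, mirroring the student, is a plausible fix; this is a sketch, and the class name is assumed from the models/discriminator.py path in the traceback.)

    import torch.nn as nn
    from models.discriminator import Discriminator  # class name assumed

    # Replicate the discriminator across the same GPUs as the student so
    # each replica's weights live on the device its inputs are scattered to.
    discriminator = nn.DataParallel(Discriminator(), device_ids=[0, 1, 2, 3]).cuda()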

torch.nn.DataParallel error

Training error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
How can I solve this problem? I used to(device), but it doesn't work. @szq0214

Paper Inconsistency with Code

The "Experimental Settings" section of your arXiv paper says you use an initial LR of 0.01.


However, analyzing your source code, your ResNet50 model uses an initial LR of 0.1.


I believe the paper is mistaken, as running your source code seems to work fine. In fact, the paper's whole experimental setup is inconsistent with this LR_REGIME.

Some questions about the experimental settings and the discriminator

Hi @szq0214,

I'm highly interested in your work!
Here are a couple of questions; I hope you can share your thoughts on them.

  1. In the experimental settings, why is weight_decay set to 0? In general, weight decay is an important factor in final performance, often making about a 1% difference in validation accuracy on the ILSVRC2012 ImageNet set.

  2. About the discriminator: it contains three convolution operations, and its inputs are the student's logits and the teachers' combined logits. But the target for the discriminator looks wrong; in the code it is as follows:

target = torch.FloatTensor([[1, 0] for _ in range(batch_size//2)] + [[0, 1] for _ in range(batch_size//2)])

I think the target should be [1, 0] for the whole batch, so this is weird. Are there any considerations behind it? If so, is the effect of the discriminator loss to push the student's logits away from the teachers', something like a regularizer?
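
(For context, a sketch of how the quoted target could line up with the discriminator input, assuming the batch fed to it is the student's logits concatenated with the teacher's, each of size batch_size // 2. The linear stand-in for the real three-conv discriminator is mine, for illustration only.)

    import torch
    import torch.nn.functional as F

    batch_size, num_classes = 8, 1000
    student_logits = torch.randn(batch_size // 2, num_classes)
    teacher_logits = torch.randn(batch_size // 2, num_classes)

    # First half of the batch: student outputs; second half: teacher outputs.
    d_input = torch.cat([student_logits, teacher_logits], dim=0)
    target = torch.FloatTensor([[1, 0] for _ in range(batch_size // 2)] +
                               [[0, 1] for _ in range(batch_size // 2)])

    discriminator = torch.nn.Linear(num_classes, 2)  # stand-in for the 3-conv model
    d_loss = F.binary_cross_entropy_with_logits(discriminator(d_input), target)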

What is the performance of the teacher models?

According to Table 2 in your paper, training ResNet50 from scratch obtains 76.51% accuracy. With a 224 x 224 input, the ResNet50 student reaches 80.67% accuracy using senet154 and resnet152 v1 as teacher models under MEAL-V2.
So I am wondering what the performance of the pre-trained teacher models is, since they have larger and more effective architectures.

What's the training result on ImageNet when training from scratch?

Hi @MingSun-Tse,
I noticed you said you may train your distillation from scratch (random initialization) on ImageNet.
I am wondering what your training result was, because I want to use your method on my own dataset, and all I have is a large model trained on that dataset. Should I first train a ResNet50 on the dataset and then use your code to fine-tune it, or can I directly use your code to distill the existing model?

Pretrained model for the student?

Hi, thanks for your work.
I looked into your code and found that the function "_create_model" always loads pretrained weights, even for the student model.
I'm curious: does the student always start from a pretrained model?
Thank you.

Discriminator LR Decay

Thanks for your work and the code release!

I have a small question about the LR decay schedule for the discriminator: the initial LR for the discriminator is set to 1e-4, but it looks like it gets clobbered by the student LR in _set_learning_rate:

MEAL-V2/train.py

Lines 94 to 96 in 3558f37

def _set_learning_rate(optimizer, lr):
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

Is this intentional? The discriminator is a simple model, so I don't think this would make a big difference either way.

Thanks
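
(For reference, a hedged sketch of one way to keep a separate discriminator LR: remember each param group's base LR once and scale every group by the same schedule factor, instead of overwriting all groups with the student LR. This is not from the repo.)

    def _set_learning_rate(optimizer, scale):
        # 'scale' is the decay factor from the schedule (e.g. 1.0, 0.1, ...).
        for param_group in optimizer.param_groups:
            base = param_group.setdefault('initial_lr', param_group['lr'])
            param_group['lr'] = base * scale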

Could not find the generator loss.

Hi,

thanks for your great work.

When I read the code, I found there is only a discriminator loss and no generator loss. In other words, there is no adversarial training in MEALv2, which differs from my intuition. I want to know what the advantage of using only the discriminator is.

Image size: why 380?

380 is not divisible by 32. Will performance suffer when running ResNet, since the input goes through 5 downsampling steps (1 max-pooling and 4 stride-2 convolutions)? I mean this model:
MEAL-V2 w/ ResNet50 | 380 | 25.6M | 81.72/95.81
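
(A quick check of my own, not from the repo: each stride-2 stage simply floors the odd sizes, so a 380 input runs through ResNet-50 without error.)

    import torch
    import torchvision.models as models

    m = models.resnet50()
    x = torch.randn(1, 3, 380, 380)
    x = m.conv1(x);                  print('conv1  ', x.shape[-1])  # 190
    x = m.maxpool(m.relu(m.bn1(x))); print('maxpool', x.shape[-1])  # 95
    x = m.layer1(x);                 print('layer1 ', x.shape[-1])  # 95
    x = m.layer2(x);                 print('layer2 ', x.shape[-1])  # 48
    x = m.layer3(x);                 print('layer3 ', x.shape[-1])  # 24
    x = m.layer4(x);                 print('layer4 ', x.shape[-1])  # 12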

datasets

Can I apply it to my own datasets?
