GuYuc / WS-DAN.PyTorch
A PyTorch implementation of WS-DAN (Weakly Supervised Data Augmentation Network) for FGVC (Fine-Grained Visual Classification)
License: MIT License
Excuse me, I would like to test inception.py: with a 3x229x229 input it outputs 3x17x17, which is the shape I need. But in wsdan.py, the same 3x229x229 input gives a 3x12x12 output? Thanks.
feature_center_batch = F.normalize(feature_center[y], dim=-1)
IndexError: tensors used as indices must be long, byte or bool tensors
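A minimal sketch of the usual fix for this error, assuming `y` is a label tensor that was loaded as float (the shapes here are placeholders): cast the labels to `long` before indexing.

```python
import torch
import torch.nn.functional as F

# Placeholder shapes: feature_center is (num_classes, feature_dim), y is a label batch.
feature_center = torch.randn(10, 32)
y = torch.tensor([1.0, 4.0, 7.0])   # labels loaded as float -> triggers the IndexError

# Casting the labels to long before indexing fixes it:
feature_center_batch = F.normalize(feature_center[y.long()], dim=-1)
print(feature_center_batch.shape)   # torch.Size([3, 32])
```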
How to test a single image
In the forward pass of the model (here) we have this line, which calculates the class logits:
# Classification
p = self.fc(feature_matrix * 100.)
I'm not sure where this multiply by 100 magic number is coming from. Can you tell me why this is here, and why it's necessary? When I remove it, learning seems to stall. The only thing I can think is that it's supposed to boost the gradient, I'm just not sure why this is necessary.
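One plausible reading (an assumption, not confirmed by the author): `feature_matrix` is L2-normalized, so its entries are on the order of 1/sqrt(dim); multiplying by 100 acts like an inverse temperature that keeps the logits, and therefore the gradients, at a usable scale. A small sketch of the effect:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
feat = F.normalize(torch.randn(1, 2048), dim=-1)   # entries ~ 0.02 in magnitude
fc = torch.nn.Linear(2048, 200, bias=False)        # stand-in for self.fc

logits_plain  = fc(feat)
logits_scaled = fc(feat * 100.)                    # same direction, 100x magnitude

print(logits_plain.abs().max())    # tiny -> softmax nearly uniform, weak gradients
print(logits_scaled.abs().max())   # 100x larger -> useful gradient signal
```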
What do 1, 3, and 5 stand for?
I don't understand what 1, 3, and 5 stand for. Why define epoch_acc = np.array([[0,0,0],[0,0,0],[0,0,0]])?
Hello, author.
I encountered the following problems while looking at your code:
Training on fgvc-aircraft is basically stable after 40 epochs; raw accuracy is now 82%, crop accuracy 74%, and drop accuracy 66%. The raw+crop accuracy is around 78%. Does this mean the attention has not been learned?
When using resnet50 as the backbone, I only reached 84.7% acc on the CUB dataset.
While training the model on a custom dataset I notice that I run into cases where nonzero_indices is empty and https://github.com/GuYuc/WS-DAN.PyTorch/blob/master/utils.py#L157 then throws an exception.
Is this behavior expected to happen? I'm guessing one could set:
height_min = 0
height_max = imgH
width_min = 0
width_max = imgW
when nonzero_indices is empty.
But I wanted to first confirm that I was understanding this code correctly and that nonzero_indices being empty wasn't symptomatic of a deeper issue.
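A hedged sketch of that guard, with `crop_bounds` as a hypothetical stand-in for the logic in utils.py: when no attention value exceeds the threshold, fall back to the full image instead of indexing an empty tensor.

```python
import torch

def crop_bounds(atten_map, theta, imgH, imgW):
    """Return (height_min, height_max, width_min, width_max) for cropping.

    Hypothetical helper: falls back to the whole image when the thresholded
    attention map has no nonzero entries.
    """
    nonzero_indices = torch.nonzero(atten_map > theta, as_tuple=False)
    if nonzero_indices.numel() == 0:
        return 0, imgH, 0, imgW          # fallback: keep the full image
    height_min = nonzero_indices[:, 0].min().item()
    height_max = nonzero_indices[:, 0].max().item()
    width_min  = nonzero_indices[:, 1].min().item()
    width_max  = nonzero_indices[:, 1].max().item()
    return height_min, height_max, width_min, width_max
```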
When I did model validation, I got three pictures (raw.jpg,raw_atten.jpg,heat_atten.jpg) I want to get the class label corresponding to each picture in the validation set. What should I do?
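A hedged sketch of mapping a prediction back to a class name, assuming the validation dataset exposes a list of class names (`class_names` is a placeholder; the repo's dataset classes differ) and the model returns logits first:

```python
import torch

# Placeholder mapping from label index to class name; substitute your dataset's list.
class_names = ["class_a", "class_b", "class_c"]

def label_for(logits):
    """logits: (1, num_classes) output for one validation image."""
    idx = logits.argmax(dim=-1).item()
    return class_names[idx]

print(label_for(torch.tensor([[0.1, 2.0, -1.0]])))   # class_b
```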
Hi mate,
I'm trying to reproduce experiment results using WS-DAN/Xception and I'm impressed by the implementation of the WS-DAN network.
However, in train-wsdan.py, when I iterate the dataloader with for i, (X, y) in enumerate(data_loader): and call batch_loss.backward(), it shows the following error:
** RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8]] is at version 4; expected version 3 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
So I print out parameters in "net":
for name,parameters in net.named_parameters():
if parameters.size()[0]==8:
print(name,':',parameters.size())
which shows the so-called "[torch.cuda.FloatTensor [8]]" variables.
module.attentions.conv.weight : torch.Size([8, 2048, 1, 1])
module.attentions.bn.weight : torch.Size([8])
module.attentions.bn.bias : torch.Size([8])
So I find how the attention weights are built at the very beginning:
# Generate Attention Map
if self.training:
# Randomly choose one of attention maps Ak
attention_map = []
for i in range(batch_size):
# attention_weights = torch.sqrt(attention_maps[i].sum(dim=(1, 2)).detach() + EPSILON)
attention_weights = torch.sqrt(attention_maps[i].sum(dim=(1, 2)) + EPSILON)
attention_weights = F.normalize(attention_weights, p=1, dim=0)
# Does this block the gradient flow?
k_index = np.random.choice(self.M, 2, p=attention_weights.cpu().detach().numpy())
pdb.set_trace()
attention_map.append(attention_maps[i, k_index, ...])
attention_map = torch.stack(attention_map) # (B, 2, H, W) - one for cropping, the other for dropping
So my question is: since these parts use NumPy for the calculation, does that mean we are actually building two separate computation graphs?
k_index = np.random.choice(self.M, 2, p=attention_weights.cpu().detach().numpy())
pdb.set_trace()
attention_map.append(attention_maps[i, k_index, ...])
attention_map = torch.stack(attention_map)
Or should we just implement this part in PyTorch? The gradient computation error seems to be caused by it.
Thx for answering in advance!
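A pure-PyTorch sketch of the sampling step, using `torch.multinomial` instead of `np.random.choice` (shapes and the EPSILON value are illustrative). Note the sampling itself is non-differentiable either way; detaching the weights, as in the commented-out line above, just makes that explicit and avoids the in-place/version-counter error during backward.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
M, EPSILON = 32, 1e-12
attention_maps = torch.rand(M, 12, 12, requires_grad=True)   # maps for one image

# Sampling weights: detached, since index selection cannot carry gradients anyway.
attention_weights = torch.sqrt(attention_maps.sum(dim=(1, 2)).detach() + EPSILON)
attention_weights = F.normalize(attention_weights, p=1, dim=0)

# torch.multinomial replaces np.random.choice: no NumPy round-trip needed.
k_index = torch.multinomial(attention_weights, 2, replacement=False)
chosen = attention_maps[k_index]        # gradients still flow through this indexing
print(chosen.shape)                     # torch.Size([2, 12, 12])
```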
self.corrects * 100. — why is self.corrects multiplied by 100?
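Most likely (an assumption, not confirmed here) the correct count is divided by the sample count elsewhere, so the factor of 100 just converts the fraction into a percentage:

```python
# Illustrative values: 41 correct predictions out of 50 samples.
corrects, total = 41, 50
accuracy_percent = corrects * 100. / total
print(accuracy_percent)   # 82.0
```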
When I train the model on Stanford Cars, it shows 'Val Acc (0.53, 2.60)'. What do 0.53 and 2.60 mean? Thank you.
Hi
Can this implementation reproduce the performance in the original paper?
Thanks!
Sorry. It was my mistakes.
This is a piece of code in the wsdan.py program.
As I understand it, the M (32) and p (shape [1, 32]) arguments of np.random.choice() have different sizes, so why is there no error?
if self.training:
attention_map = []
for i in range(batch_size):
attention_weights = torch.sqrt(attention_maps[i].sum(dim=(1, 2)).detach() + EPSILON)
attention_weights = F.normalize(attention_weights, p=1, dim=0)
k_index = np.random.choice(self.M, 2, p=attention_weights.cpu().numpy())
attention_map.append(attention_maps[i, k_index, ...])
attention_map = torch.stack(attention_map)
else:
attention_map = torch.mean(attention_maps, dim=1, keepdim=True)
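For reference, when the first argument of np.random.choice is an integer M, it is treated as np.arange(M), so a length-M probability vector matches it and there is no size mismatch. A minimal check (with uniform probabilities for illustration):

```python
import numpy as np

M = 32
p = np.ones(M) / M                       # shape (32,): one probability per index
k_index = np.random.choice(M, 2, p=p)    # samples 2 indices from np.arange(32)
print(k_index.shape)                     # (2,)
```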
Hello, were you able to fully reproduce the paper's accuracy (89.4%) on the CUB dataset?
The val acc doesn't change for several epochs when I train the model. What's wrong with it?
Epoch 85/300: 100%|██████████| 8144/8144 [1:24:14<00:00, 1.61 batches/s, Loss 6.2739, Raw Acc (0.39, 2.49), Crop Acc (0.39, 2.49), Drop Acc (0.39, 2.49), Val Loss 44.7789, Val Acc (0.52, 2.70)]
Epoch 86/300: 100%|██████████| 8144/8144 [1:24:12<00:00, 1.61 batches/s, Loss 6.2739, Raw Acc (0.39, 2.49), Crop Acc (0.39, 2.49), Drop Acc (0.39, 2.49), Val Loss 44.8678, Val Acc (0.52, 2.70)]
Epoch 87/300: 100%|██████████| 8144/8144 [1:24:20<00:00, 1.61 batches/s, Loss 6.2739, Raw Acc (0.39, 2.49), Crop Acc (0.39, 2.49), Drop Acc (0.39, 2.49), Val Loss 44.6306, Val Acc (0.52, 2.72)]
Epoch 88/300: 100%|██████████| 8144/8144 [1:24:12<00:00, 1.61 batches/s, Loss 6.2739, Raw Acc (0.39, 2.49), Crop Acc (0.39, 2.49), Drop Acc (0.39, 2.49), Val Loss 44.8817, Val Acc (0.52, 2.71)]
Epoch 89/300: 100%|██████████| 8144/8144 [1:23:19<00:00, 1.63 batches/s, Loss 6.2739, Raw Acc (0.39, 2.49), Crop Acc (0.39, 2.49), Drop Acc (0.39, 2.49), Val Loss 44.9679, Val Acc (0.52, 2.71)]
Epoch 90/300: 100%|██████████| 8144/8144 [1:22:45<00:00, 1.64 batches/s, Loss 6.2739, Raw Acc (0.39, 2.49), Crop Acc (0.39, 2.49), Drop Acc (0.39, 2.49), Val Loss 44.4493, Val Acc (0.52, 2.71)]
Epoch 91/300: 100%|██████████| 8144/8144 [1:22:45<00:00, 1.64 batches/s, Loss 6.2739, Raw Acc (0.39, 2.49), Crop Acc (0.39, 2.49), Drop Acc (0.39, 2.49), Val Loss 44.8331, Val Acc (0.52, 2.67)]
Epoch 92/300: 97%|█████████▋| 7933/8144 [1:09:51<01:57, 1.80 batches/s, Loss 6.2741, Raw Acc (0.39, 2.53), Crop Acc (0.39, 2.53), Drop Acc (0.39, 2.53)]
I trained the network on Stanford Cars; the loss stays around 1.2 after about 90 epochs, and the accuracy on the test dataset is less than 70%.
The parameters I used are: batch size = 4, pretrained model = resnet50; the others are unchanged.
Thanks for sharing the implementation of the paper. May I know if the code is open sourced? If it is would you mind adding an open source license to it?
When I test the weights, how can I know the confidence score of the class? Thank you
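A minimal sketch: applying softmax to the model's logits gives per-class confidence scores, and the maximum is the confidence of the predicted class (the logits here are placeholders):

```python
import torch

logits = torch.tensor([[2.0, 0.5, -1.0]])     # placeholder logits for 3 classes
probs = torch.softmax(logits, dim=-1)         # per-class confidence scores, sum to 1
confidence, predicted = probs.max(dim=-1)
print(predicted.item(), confidence.item())
```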
Hello, I'd like to ask: in your eval code, how can I display the heatmap of each attention map?
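A minimal sketch of one way to render an attention heatmap, assuming `atten` is a single 2-D attention map array and `img` the corresponding PIL image (both names are placeholders, not from the repo's eval code):

```python
import numpy as np
from PIL import Image

def overlay_heatmap(img, atten, alpha=0.5):
    """Blend a normalized attention map onto an RGB image as a red overlay."""
    atten = (atten - atten.min()) / (atten.max() - atten.min() + 1e-12)
    heat = Image.fromarray(np.uint8(atten * 255)).resize(img.size)    # upsample to image size
    heat = np.asarray(heat, dtype=np.float32)[..., None]              # (H, W, 1)
    red = np.zeros((img.size[1], img.size[0], 3), dtype=np.float32)
    red[..., 0] = 255.0                                               # red channel carries the heat
    base = np.asarray(img, dtype=np.float32)
    out = base * (1 - alpha * heat / 255.0) + red * (alpha * heat / 255.0)
    return Image.fromarray(np.uint8(out))
```

Call `overlay_heatmap(img, attention_map[k]).save(...)` for each map index k; a proper colormap (e.g. via matplotlib) would look nicer, but this keeps the sketch dependency-free.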
As paper described:
For each training image, we randomly choose one of its attention map A k to guide the data augmentation process,
The attention map is randomly chosen, but in the code:
crop_images = batch_augment(X, attention_map[:, :1, :, :], mode='crop', theta=(0.4, 0.6), padding_ratio=0.1)
is only the first attention map used?
Thanks.
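For context, a NumPy sketch of why `attention_map[:, :1]` is still a random map: two map indices are sampled per image (weighted by map energy) before stacking, so channel 0 is the randomly chosen crop map and channel 1 the drop map (the values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 32
attention_maps = rng.random((M, 12, 12))          # M maps for one image

# Sample two indices weighted by each map's total energy, as in wsdan.py.
weights = attention_maps.sum(axis=(1, 2))
weights = weights / weights.sum()
k_index = rng.choice(M, 2, p=weights)             # random every forward pass

crop_map = attention_maps[k_index[0]]             # what attention_map[:, :1] holds
drop_map = attention_maps[k_index[1]]             # what attention_map[:, 1:] holds
print(crop_map.shape)                             # (12, 12)
```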