
dpgn's People

Contributors

aliencegg, yangling0818, zilunzhang

dpgn's Issues

Reproducing results on CIFAR-FS and tieredImageNet

Dear Yang:

Thank you for releasing the repo. Could you share the detailed learning schedule for the CIFAR-FS and tieredImageNet datasets, for reproducing the results reported in the paper?

Best,
TANG, shixiang

About the num_queries

Thanks for your work. In your experiments, 'num_queries' is set to 1 instead of 15. Does this meet the standard few-shot learning setting? @zsc @zxytim

About accuracy

Thanks so much for your extraordinary work! I get very low accuracy using your code, and the only change I made was setting num_workers from 8 to 0.

This is my log.txt in 5way_1shot_resnet12_mini-imagenet:
[2020-07-08 00:57:47,863] [main] test_acc : 0.5358599852919579 step : 48000
[2020-07-08 00:57:47,864] [main] test_best_acc : 0.5789199849367141 step : 13000
......
[2020-07-08 01:09:22,572] [main] -------------command line arguments-------------
[2020-07-08 01:09:22,572] [main] Namespace(checkpoint_dir='./checkpoints', config='config/5way_1shot_resnet12_mini-imagenet.py', dataset_root='dataset', device='cuda:0', display_step=100, exp_name='5way_1shot_resnet12_mini-imagenet', log_dir='./logs', log_step=100, mode='eval', num_gpu=1, seed=222)
[2020-07-08 01:09:22,572] [main] -------------configs-------------
[2020-07-08 01:09:22,572] [main] OrderedDict([('dataset_name', 'mini-imagenet'), ('num_generation', 6), ('num_loss_generation', 3), ('generation_weight', 0.2), ('point_distance_metric', 'l2'), ('distribution_distance_metric', 'l2'), ('emb_size', 128), ('backbone', 'resnet12'), ('train_config', OrderedDict([('num_ways', 5), ('num_shots', 1), ('batch_size', 25), ('iteration', 100000), ('lr', 0.001), ('weight_decay', 1e-05), ('dec_lr', 17000), ('dropout', 0.1), ('loss_indicator', [1, 1, 0]), ('lr_adj_base', 0.1), ('num_queries', 1)])), ('eval_config', OrderedDict([('num_ways', 5), ('num_shots', 1), ('batch_size', 10), ('iteration', 1000), ('interval', 1000), ('num_queries', 1)]))])
[2020-07-08 01:09:22,793] [main] find a checkpoint, loading checkpoint from ./checkpoints/5way_1shot_resnet12_mini-imagenet
[2020-07-08 01:09:27,521] [main] best model pack loaded
[2020-07-08 01:09:27,548] [main] current best test accuracy is: 0.5789199849367141, at step: 13000
[2020-07-08 01:10:52,549] [main] ------------------------------------
[2020-07-08 01:10:52,551] [main] step : 13000 test_edge_loss : 2.522496324658394 test_node_acc : 0.5824999854266644
[2020-07-08 01:10:52,551] [main] evaluation: total_count=999, accuracy: mean=58.25%, std=8.37%, ci95=0.52%

And in 5way_5shot_resnet12_mini-imagenet:
[2020-07-07 13:46:28,276] [main] -------------configs-------------
[2020-07-07 13:46:28,277] [main] OrderedDict([('dataset_name', 'mini-imagenet'), ('backbone', 'resnet12'), ('emb_size', 128), ('num_generation', 6), ('num_loss_generation', 6), ('generation_weight', 0.2), ('point_distance_metric', 'l2'), ('distribution_distance_metric', 'l2'), ('train_config', OrderedDict([('num_ways', 5), ('num_shots', 5), ('batch_size', 8), ('iteration', 100000), ('lr', 0.001), ('weight_decay', 1e-05), ('dec_lr', 15000), ('dropout', 0.1), ('lr_adj_base', 0.1), ('loss_indicator', [1, 1, 1]), ('num_queries', 1)])), ('eval_config', OrderedDict([('num_ways', 5), ('num_shots', 5), ('batch_size', 4), ('iteration', 1000), ('interval', 1000), ('num_queries', 1)]))])
[2020-07-07 13:46:28,445] [main] find a checkpoint, loading checkpoint from ./checkpoints/5way_5shot_resnet12_mini-imagenet
[2020-07-07 13:46:34,063] [main] best model pack loaded
[2020-07-07 13:46:34,090] [main] current best test accuracy is: 0.7321500130593777, at step: 17000
[2020-07-07 13:48:42,438] [main] ------------------------------------
[2020-07-07 13:48:42,439] [main] step : 17000 test_edge_loss : 4.024368638753891 test_node_acc : 0.7245500129163265
[2020-07-07 13:48:42,440] [main] evaluation: total_count=999, accuracy: mean=72.46%, std=12.81%, ci95=0.79%

What should I do to get higher accuracy?

What GPU did you use to run the code?

I tried to run your code on my own machine (a 2080 Ti with 11 GB), but only the 5-way 1-shot ConvNet setting fits in memory; 5-shot or ResNet12 fails with errors like
RuntimeError: CUDA out of memory. Tried to allocate 36.00 MiB (GPU 0; 10.76 GiB total capacity; 9.33 GiB already allocated; 56.69 MiB free; 219.71 MiB cached)
Which GPU did you run on? I checked the paper, and it does not mention the GPU setup either.
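
Not an official recommendation, just one common way to fit an 11 GB card: lower batch_size (and, if needed, num_queries) in the train_config / eval_config sections of the config file. A hypothetical edit, reusing the field names that appear in the config dumps earlier on this page:

    # Hypothetical config tweak in the style of config/5way_5shot_resnet12_mini-imagenet.py:
    # smaller episode batches use less GPU memory, at the cost of noisier gradient estimates.
    train_config = dict(
        num_ways=5,
        num_shots=5,
        batch_size=4,       # reduced from 8 to fit an 11 GB GPU (assumption, not a tuned value)
        iteration=100000,
        lr=1e-3,
        weight_decay=1e-5,
    )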

About the pretrained model and validation set in your codes

Thanks so much for your extraordinary work! I have 2 questions on your codes.

  1. How did you get the pretrained model? When I train from scratch without any pretrained model, the train_edge_loss is around 4, whereas with the pretrained model it is around 1. And the testing accuracy is 66.32 for 5-way 1-shot ResNet12 (which is supposed to be 67.77)... I am a little confused about this.
  2. It seems that you didn't use the validation set in your main.py, since partition = 'test' in your train(). If this is true, are you using the test set directly to pick the best accuracy? Or did you set the partition to 'val' when you were training the model? (See the sketch after this list for what I mean by validating on 'val'.)
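
Not the authors' procedure, just a hedged sketch of the protocol I mean (with stub functions standing in for the repo's training and evaluation loops): select the best checkpoint on 'val' during training and touch 'test' only once at the end.

    import random

    def train_one_step():              # stub standing in for one training iteration
        pass

    def evaluate(partition):           # stub standing in for the repo's evaluation loop
        return random.random()

    best_val_acc, best_step = 0.0, 0
    for step in range(1, 1001):
        train_one_step()
        if step % 100 == 0:
            val_acc = evaluate(partition='val')     # instead of partition='test'
            if val_acc > best_val_acc:
                best_val_acc, best_step = val_acc, step

    print("best val step:", best_step, "test acc:", evaluate(partition='test'))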

The code only works with num_queries=1

DPGN/main.py, line 316 (commit b940111):

    query_node_pred_loss = [
        self.pred_loss(query_node_pred_generation, query_label.long()).mean()

For example, in 5-way 1-shot with num_queries=1:
query_node_pred_generation has shape [batch_size, 5, 5]
query_label has shape [batch_size, 5]

In 5-way 1-shot with num_queries=2:
query_node_pred_generation has shape [batch_size, 10, 5]
query_label has shape [batch_size, 10]

In query_node_pred_generation, which dimension is the class (i.e., N ways)?
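
Not the repository's fix, just a minimal sketch of one way the loss could be computed when num_queries > 1, assuming (as the shapes above suggest) that the class scores sit in the last dimension: flatten the query dimension into the batch dimension before calling cross-entropy.

    import torch
    import torch.nn.functional as F

    # Hypothetical shapes from this issue: predictions [B, N*Q, N], labels [B, N*Q].
    batch_size, num_ways, num_queries = 4, 5, 2
    pred = torch.randn(batch_size, num_ways * num_queries, num_ways)
    label = torch.randint(0, num_ways, (batch_size, num_ways * num_queries))

    # F.cross_entropy expects the class dimension right after the batch dimension,
    # so merge the batch and query dimensions and keep the class scores last.
    loss = F.cross_entropy(pred.reshape(-1, num_ways), label.reshape(-1).long())
    print(loss)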

Can't reproduce the result

Wonderful work, but I cannot reproduce your results. Specifically, I tried to use ResNet12 as the backbone for the 5-way 1-shot task on CIFAR-FS, but the final accuracy I got was only 70.63%.

[2020-07-16 09:08:10,015] [main] step : 67000  test_edge_loss : 2.035566942691803  test_node_acc : 0.7063199849426747
[2020-07-16 09:08:10,016] [main] evaluation: total_count=999, accuracy: mean=70.63%, std=7.18%, ci95=0.44%

My program environment is

CUDA Version: 10.0
Python : 3.6.7

Does the environment have such a big impact on accuracy?

Question about the evaluation partition

Hi,
Interesting work and well done!
If I understand the code correctly, you used the test set to validate accuracy when selecting the best model during training. Should we use the validation set instead?

Cheers

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED.

Hi,
A strange bug occurred when I ran the code, at total_loss.backward():
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
Have you ever come across this problem?
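
Not a confirmed fix for this repository, just the usual workaround for this class of cuDNN error: make the offending tensor contiguous before the op (or the backward pass) that complains.

    import torch

    # Hypothetical illustration: a permuted tensor is a non-contiguous view;
    # .contiguous() copies it into standard memory layout, which cuDNN expects.
    x = torch.randn(2, 3, 4).permute(0, 2, 1)
    print(x.is_contiguous())        # False
    x = x.contiguous()
    print(x.is_contiguous())        # True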

Node label sequence

Hi:
In the 5-way 1-shot setting, the query-set labels of every batch during training and testing are [0,1,2,3,4]; no shuffling is performed. Could this let the network memorize the setup and inflate the accuracy? In other few-shot learning papers I have read (GNN, Relation Network), the query-set labels are shuffled, so I followed that idea and only randomly shuffled the query-set labels of each test batch in the source code, e.g. [1,4,2,0,3], [1,2,4,0,3], [2,0,4,3,1], and so on. init_edge is also built from the modified labels and is still a 10x10 symmetric matrix, yet the accuracy only reaches about 43%, far from the 66.27% I get with the unmodified source code. I also shuffled during both training and testing, and the result was still about 43%. My initial understanding of graph networks was that the order of node labels should not affect accuracy, because the data is organized as a graph and only relative structure should matter, but this huge accuracy gap confuses me. Did I set something up incorrectly?
thank you very much!
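
For what it's worth, here is a minimal sketch (not the repository's code) of shuffling the query labels and rebuilding a label-agreement edge matrix from the shuffled labels, so that init_edge stays consistent with the new order:

    import torch

    num_ways = 5
    support_label = torch.arange(num_ways)                    # [0, 1, 2, 3, 4]
    query_label = support_label[torch.randperm(num_ways)]     # e.g. [1, 4, 2, 0, 3]

    all_label = torch.cat([support_label, query_label])       # 10 node labels
    # edge (i, j) = 1 if node i and node j carry the same label, else 0
    init_edge = (all_label.unsqueeze(0) == all_label.unsqueeze(1)).float()
    print(init_edge.shape, torch.equal(init_edge, init_edge.t()))  # (10, 10) True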

Question about avgpool size in backbone.py

Thanks for your sharing.
I encountered a problem with the size of avgpool.
When I ran your code, I found that the input to avgpool is 512x6x6 for the mini-imagenet dataset. However, since the avgpool filter size is 7x7, the computed output size is too small (i.e., 512x0x0).
Could you help me solve this problem?
Thank you.
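
One possible workaround (an assumption on my side, not the authors' fix): replace the fixed 7x7 average pool with an adaptive pool, which yields a 1x1 output regardless of whether the incoming feature map is 6x6 or 7x7.

    import torch
    import torch.nn as nn

    feat = torch.randn(1, 512, 6, 6)              # the size reported above
    print(nn.AdaptiveAvgPool2d(1)(feat).shape)    # torch.Size([1, 512, 1, 1])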

Error running with CUB

Hi,

I am running with CUB with the following command:
python3 main.py --dataset_root dataset --config config/5way_1shot_resnet12_cub-200.py --num_gpu 1 --mode train

And I got this error
File "main.py", line 579, in
main()
File "main.py", line 570, in main
trainer.train()
File "main.py", line 105, in train
last_layer_data, second_last_layer_data = backbone_two_stage_initialization(all_data, self.enc_module)
File "/data/add_disk0/vhnguyen/cvpr21/DPGN/utils.py", line 197, in backbone_two_stage_initialization
encoded_result = encoder(data.squeeze(1))
File "/home/vhnguyen/anaconda2/envs/py36_torch1_7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/add_disk0/vhnguyen/cvpr21/DPGN/backbone.py", line 101, in forward
x = self.avgpool(x)
File "/home/vhnguyen/anaconda2/envs/py36_torch1_7/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/vhnguyen/anaconda2/envs/py36_torch1_7/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 595, in forward
self.padding, self.ceil_mode, self.count_include_pad, self.divisor_override)
RuntimeError: Given input size: (512x6x6). Calculated output size: (512x0x0). Output size is too small

Could you please let me know how to fix this?

Thank you very much.
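
The mismatch in the traceback can be reproduced outside the repo, which suggests the problem is the pool kernel size rather than the CUB data itself. A small, hypothetical check, and one possible patch direction (an assumption, not the authors' fix):

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 512, 6, 6)        # the input size reported in the traceback
    # F.avg_pool2d(x, 7) would fail just like self.avgpool(x) with a 7x7 kernel;
    # pooling adaptively sidesteps the kernel / feature-map size mismatch.
    x = F.adaptive_avg_pool2d(x, 1)
    print(x.flatten(1).shape)            # torch.Size([1, 512])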

Train CUB-200-2011 error

Hi, can you help me? Thank you very much.
Traceback (most recent call last):
File "main.py", line 580, in
main()
File "main.py", line 571, in main
trainer.train()
File "main.py", line 84, in train
for iteration, batch in enumerate(self.data_loader['train']):
File "/home/wuchenxi/Desktop/DPGN-master/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
return self._process_next_batch(batch)
File "/home/wuchenxi/Desktop/DPGN-master/venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
File "/home/wuchenxi/Desktop/DPGN-master/venv/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/wuchenxi/Desktop/DPGN-master/venv/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/wuchenxi/Desktop/DPGN-master/venv/lib/python3.7/site-packages/torchnet/dataset/listdataset.py", line 54, in getitem
return self.load(self.list[idx])
File "/home/wuchenxi/Desktop/DPGN-master/dataloader.py", line 285, in load_function
support_data, support_label, query_data, query_label = self.get_task_batch()
File "/home/wuchenxi/Desktop/DPGN-master/dataloader.py", line 259, in get_task_batch
task_class_list = random.sample(self.full_class_list, self.num_ways)
File "/usr/local/python3.7.5/lib/python3.7/random.py", line 321, in sample
raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
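
The ValueError means random.sample was asked for more classes than the loader's class list contains, which typically happens when the CUB split files are not found and full_class_list ends up empty. A small, hypothetical reproduction:

    import random

    full_class_list = []        # stand-in for dataloader.full_class_list built from the split files
    num_ways = 5
    try:
        random.sample(full_class_list, num_ways)
    except ValueError as e:
        print(e)                # "Sample larger than population or is negative"
    # Fix direction: make sure the split files are in place so that
    # len(full_class_list) >= num_ways before get_task_batch() runs.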

Show the image and label of query data

Hi,
I want to use my own dataset with this model,
but I also want to display which query image is currently being processed (e.g. with image.show()).
Do you have any idea how to display the image and its label? I can't find the place where the query data is processed. Thank you.
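
A minimal, hypothetical sketch (not tied to this repository's exact tensor layout or normalization stats) of turning one normalized query tensor back into a viewable image together with its label:

    import torch
    from torchvision.transforms.functional import to_pil_image

    # Placeholder normalization stats (ImageNet-style values, an assumption only).
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

    query_image = torch.rand(3, 84, 84)     # stand-in for one real query tensor [C, H, W]
    query_label = 2                         # stand-in for the matching label
    img = to_pil_image((query_image * std + mean).clamp(0, 1))
    img.show()
    print("query label:", query_label)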

pickle file

Could you provide the code for generating the pickle file of miniimagenet?
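
The exact format the DPGN dataloader expects is not documented in this issue, but as a rough, hypothetical sketch, mini-ImageNet pickles are often a dict mapping class names to arrays of 84x84 images:

    import pickle
    import numpy as np

    # Hypothetical structure only; the keys and shapes the DPGN dataloader expects may differ.
    data = {
        "n01532829": np.zeros((600, 84, 84, 3), dtype=np.uint8),   # 600 images for one class
        # ... one entry per class ...
    }
    with open("mini_imagenet_train.pickle", "wb") as f:
        pickle.dump(data, f)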

Question about backbone WRN

Hello DPGN team:
I noticed that you report DPGN results with WRN as the backbone, but there is no class named WRN in your backbone.py. Could you please show us how WRN is used in DPGN? If you could also provide the config file, that would be even better. Thanks a lot!

Performance does not match the paper

When classifying miniImageNet with DPGN using a ConvNet backbone in the 5-way 5-shot setting, my accuracy differs from the number reported in the paper for that setting by about 5%. Could you tell me why?

Shuffled test with a random number of query samples per class

Hello!
Have you tried a test setting such as (1,1,2,3,4) or (0,0,2,2,3), where each class does not contribute exactly one fixed query sample? When I run this kind of test on your model, the accuracy drops a lot. In fact, EGNN's accuracy also drops a lot under this kind of test. Has some prior knowledge been learned?

can't find pretrained model

Dear Yang,
I want to ask for help. When I open the Google Drive link to download the pretrained model mentioned in README.md, the folder is empty. I don't know what to do. Could you share it with me again?
Thank you! Looking forward to your reply.

Node label order question

Hello:
In both the EGNN paper and your DPGN, under the 5-way 1-shot setting the query-set labels of every batch during training and testing are [0,1,2,3,4], without any shuffling. Could this let the network memorize the setup and thereby raise the accuracy? In the other few-shot learning papers I have read (GNN, Relation Network), the query-set labels are shuffled, so following that idea I only randomly shuffled the query-set labels of each test batch in the source code, e.g. [1,4,2,0,3], [1,2,4,0,3], [2,0,4,3,1], and so on. init_edge is also generated from the modified label order and is still a 10x10 symmetric matrix, yet the accuracy only reaches about 43%, far from the 66.27% I get with your unmodified source code. I also shuffled during both training and testing at the same time, and the result was likewise about 43%. My initial view of graph networks was that the order of node labels should not affect accuracy, since the data is organized as a graph and only relative structure matters, but this huge accuracy gap confuses me. Did I set something up incorrectly?
Thank you very much!

Questions about Resnet 12 backbone

Hello, I am reading your code and there are a few questions I want to ask.

  1. What is the purpose of the outputs of the second-to-last layer? I found that they are used to compute the point similarities, but why not just use the outputs of the last layer (i.e., the embedding features of the images)?
  2. What is the exact meaning of num_queries? You describe it as the number of samples in the query set, so when num_queries = 1, does it mean there is only one image in the query set? However, I found that there are five images in the query set when num_queries is set to 1 (see the small check after this list).
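
A tiny check of the counting in question 2 (just restating the setting described above, not repository code): num_queries is per class, so the query set holds num_ways * num_queries images, and num_queries = 1 in a 5-way episode still gives 5 query images, one per class.

    num_ways, num_queries = 5, 1
    print("query set size:", num_ways * num_queries)   # 5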

the performance on mini-ImageNet

Hi, DPGN team:
I am interested in your paper, so I ran your code on my computer.
When I ran the 5-way 1-shot task, I found that the test accuracy rose until it reached a maximum of 57.8% at 18000 steps, and then declined slowly until it finally reached 52.7%. What could be the reason? Is it because I set num_workers in dataloader.py to 0?

unable to reproduce the result

Dear Yang:

I am really impressed with your work. It provides me with a new angle and significantly raises the benchmark for few-shot learning tasks. However, when I tried to reproduce your results with the public code, I found the test accuracy for the 5-way 5-shot mini-ImageNet (ConvNet) task is around 78%, and the final test accuracy is about 76%. I guess there must be some tricks; could you kindly help me?

Thank you.

How to make the dataset?

Dear Yang,

Thank you for releasing the repo. Could you give me detailed instructions on how to prepare the dataset used in the paper?

Best,
Zhongshan Bao

Question about the dataset: CUB-200

With the dataset link provided, I followed the steps in 'download_CUB.sh' and found there is no file named 'split' under my directory; the code can't run on the CUB-200 dataset without this file.

Question about edge initialization

Hello! DPGN is excellent work, but I ran into some questions while reading your code, about the initial edges fed into the GNN, namely the two matrices edge_feature_gd and edge_feature_gp. For 5-way 1-shot with 1 query, each is a 10x10 matrix, and you initialize its bottom-right 5x5 block as a 5x5 identity matrix. Is that reasonable? Doesn't it inject the prior knowledge that any two query samples belong to different classes? In addition, using your code with batch size 25 and a ConvNet backbone, my final 5-way 1-shot test result is 64.42±0.52, which is quite far from the 66.01±0.36 reported in your paper. How should I change the training procedure to reach the reported result?
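
To make the structure in question concrete, here is a minimal, hypothetical reconstruction of a 10x10 initial edge matrix whose support-support entries come from label agreement and whose query-query block is the identity, as described above (this mirrors the description, not necessarily the repository's exact code):

    import torch

    num_ways, num_shots, num_queries = 5, 1, 1
    num_support = num_ways * num_shots
    num_query = num_ways * num_queries
    n = num_support + num_query                        # 10 nodes

    support_label = torch.arange(num_ways)             # [0, 1, 2, 3, 4]
    init_edge = torch.zeros(n, n)
    # support-support block: 1 where labels match
    init_edge[:num_support, :num_support] = (
        support_label.unsqueeze(0) == support_label.unsqueeze(1)).float()
    # query-query block initialized as the identity, as described in the question
    init_edge[num_support:, num_support:] = torch.eye(num_query)
    print(init_edge)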
