tjddus9597 / proxy-anchor-cvpr2020
Official PyTorch Implementation of Proxy Anchor Loss for Deep Metric Learning, CVPR 2020
License: MIT License
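model.model.embedding, which was the FNN part of the original model, has now been moved entirely inside model.cgd.
However, the training code implements a feature that selects only the parameters that do not belong to that module and freezes or unfreezes them, so this causes a problem.
Either the parameters inside model.cgd should be handed over to model.embedding, or another approach needs to be found.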
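One way around this, sketched under assumptions: instead of defining the frozen set as "everything except model.model.embedding" (a set difference against one module), select the head parameters by name prefix and list every head module, here both model.embedding and cgd. The toy modules and the helpers split_params and set_trainable are hypothetical; only the module paths come from the note above.

```python
import torch.nn as nn


class Inner(nn.Module):
    """Hypothetical stand-in for the wrapped backbone (`model.model`)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Linear(8, 8)        # placeholder for the pretrained CNN layers
        self.embedding = nn.Linear(8, 4)   # old-style embedding head


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = Inner()
        self.cgd = nn.Linear(8, 4)         # hypothetical new head that now holds the FNN layers


def split_params(model, head_prefixes):
    """Split parameters by name: anything under one of `head_prefixes` is head, the rest is backbone."""
    backbone, head = [], []
    for name, p in model.named_parameters():
        is_head = any(name.startswith(prefix + ".") for prefix in head_prefixes)
        (head if is_head else backbone).append(p)
    return backbone, head


def set_trainable(params, trainable):
    """Freeze (trainable=False) or unfreeze (trainable=True) a list of parameters."""
    for p in params:
        p.requires_grad_(trainable)


# Warm-up: train only the heads; call set_trainable(backbone, True) once warm-up ends.
model = ToyModel()
backbone, head = split_params(model, head_prefixes=("model.embedding", "cgd"))
set_trainable(backbone, False)
```

Adding a new head module then only means adding its prefix to head_prefixes, rather than rewriting the set-difference logic.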
Thanks for open-sourcing this. When you train the model on In-shop, are the clothes cropped tightly to the ground-truth box, or is the GT box expanded before cropping? I think this affects the image preprocessing in the algorithm. Looking forward to your reply.
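If the boxes are expanded, a common approach is to grow the GT box by a fixed fraction of its size on each side and clip it to the image. A minimal sketch; the function name and the margin value are hypothetical, not taken from the repo's In-shop loader.

```python
def expand_and_clip_box(box, img_w, img_h, margin=0.1):
    """Grow an (x1, y1, x2, y2) box by `margin` x its width/height on each side, clipped to the image."""
    x1, y1, x2, y2 = box
    dw, dh = (x2 - x1) * margin, (y2 - y1) * margin
    return (max(0, x1 - dw), max(0, y1 - dh), min(img_w, x2 + dw), min(img_h, y2 + dh))
```

The result can be passed directly to PIL's Image.crop, which takes a (left, upper, right, lower) tuple; margin=0.0 reproduces a tight crop.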
Did you try hard sample mining in this loss?
Hi, I tried to reproduce the N-pair loss following your hyperparameters, but I can't get the results reported in the original paper. Which hyperparameters did you use for N-pair? The N-pair loss in the metric learning library has been updated, and some of your arguments are no longer accepted. Also, could you share the other parameters, such as the backbone and embedding learning rates, weight decay, batch size, and images per class?
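For reference, the multi-class N-pair loss can be written as a cross-entropy over anchor-positive similarities, where the positives of the other pairs in the batch act as negatives. A NumPy sketch of that formulation (not this repo's or the library's code); the original paper also adds an L2 penalty on the embeddings, which is omitted here.

```python
import numpy as np


def n_pair_loss(anchors, positives):
    """Multi-class N-pair loss (Sohn, 2016) without the embedding L2 penalty.

    anchors[i] and positives[i] share a class and every pair comes from a different class,
    so positives[j] (j != i) serve as the negatives for anchors[i].
    """
    logits = anchors @ positives.T                        # (N, N); diagonal = matching pairs
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise exp()
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                    # cross-entropy with target i for row i
```

With uninformative (all-zero) embeddings the loss equals log N, which is a handy sanity check when comparing implementations.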
The ProxyNCA loss is also used here for comparison with the Proxy Anchor loss.
According to the ProxyNCA loss documentation, this loss has parameters that need to be optimized:
This loss requires an optimizer. You need to create an optimizer and pass this loss's parameters to that optimizer.
However, in your code, you only optimize the parameters of the proposed proxy anchor loss, as shown here.
Are there any hidden reasons untold or just an oversight?
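A minimal PyTorch sketch of the point being raised: if a loss stores its proxies as nn.Parameter, they only move when the optimizer receives them. ToyProxyNCA and make_optimizer are hypothetical names, and the loss uses the cosine-similarity form of Proxy-NCA without a scale factor, which is a simplification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyProxyNCA(nn.Module):
    """Hypothetical Proxy-NCA-style loss whose proxies are learnable parameters."""

    def __init__(self, nb_classes, sz_embed):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(nb_classes, sz_embed))

    def forward(self, X, T):
        s = F.linear(F.normalize(X), F.normalize(self.proxies))    # (M, C) cosine similarities
        pos = F.one_hot(T, num_classes=self.proxies.shape[0]).bool()
        pos_s = s[pos]                                              # similarity to own proxy, one per row
        neg_lse = torch.logsumexp(s.masked_fill(pos, float("-inf")), dim=1)
        return (neg_lse - pos_s).mean()                             # -log(exp(s+) / sum exp(s-))


def make_optimizer(model, criterion, lr=1e-4):
    # The proxies must appear in some param group; otherwise they keep their random init.
    return torch.optim.AdamW([
        {"params": model.parameters()},
        {"params": criterion.parameters()},
    ], lr=lr)
```

Dropping the second param group leaves the proxies at their random initialization, which would make a Proxy-NCA baseline look weaker than it is.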
The training complexity is O(MC), where M is the batch size and C is the number of classes. For many datasets, such as ImageNet, C is much larger than M, which means MC > M^2, possibly even MC > M^3. How do you explain this?
Look at the original Proxy NCA paper: it emphasizes that it takes O((N/b)^3) steps to consider all samples, which does not mean that O(b^3) is a large number; they are totally different things. I think it is meaningless to argue that O(MC) is better than O(M^3).
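A quick count of the per-batch terms, assuming M = 180 (the batch size in the SOP command used elsewhere in this thread) and C equal to the number of training classes: 11,318 for SOP and 100 for CUB-200-2011.

```python
def pair_counts(M, C):
    """Per-batch similarity terms: data-to-proxy (M*C) vs. pairs (M^2) vs. triplets (M^3)."""
    return {"data-to-proxy": M * C, "pairs": M ** 2, "triplets": M ** 3}


sop = pair_counts(180, 11318)   # {'data-to-proxy': 2037240, 'pairs': 32400, 'triplets': 5832000}
cub = pair_counts(180, 100)     # {'data-to-proxy': 18000, 'pairs': 32400, 'triplets': 5832000}
```

For SOP, MC exceeds M^2 but stays below M^3; for CUB, MC is below even M^2. Which term is larger depends on C relative to M, so per-batch counts alone do not settle the comparison.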
Hello,
I suspect the README reports training results rather than evaluation results for SOP? I used your pre-trained model, ran evaluate.py, and got much lower results. Is this expected?
Method | Backbone | R@1 | R@10 | R@100 | R@1000 |
---|---|---|---|---|---|
Proxy-Anchor512 | Inception-BN | 79.2 | 90.7 | 96.2 | 98.6 |
Run code/evaluate.py | Inception-BN | 49.4 | 65.0 | 78.8 | 91.3 |
Here is what I did:
python Proxy-Anchor-CVPR2020/code/evaluate.py --gpu-id -1 --batch-size 120 --model bn_inception --embedding-size 512 --dataset SOP --resume ../pretrained/SOP_bn_inception_best.pth --workers 4
To get it to run on CPU (Ubuntu 20.04 under WSL, torch==1.13.1), I changed the CUDA-related code:
# model = model.cuda()
if args.gpu_id != -1:
    model = model.cuda()
checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
I also fixed an error by adding strict=False:
model.load_state_dict(checkpoint['model_state_dict'], strict=False)
Otherwise, I hit the following errors raised by PyTorch (inside load_state_dict):
if strict:
    if len(unexpected_keys) > 0:
        error_msgs.insert(
            0, 'Unexpected key(s) in state_dict: {}. '.format(
                ', '.join('"{}"'.format(k) for k in unexpected_keys)))
    if len(missing_keys) > 0:
        error_msgs.insert(
            0, 'Missing key(s) in state_dict: {}. '.format(
                ', '.join('"{}"'.format(k) for k in missing_keys)))
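With strict=False, every checkpoint entry whose key does not match the model is skipped silently and that layer keeps its initial weights, which could by itself explain a large recall drop. A sketch for listing the mismatched keys before loading; it assumes (only a guess) that the mismatch is a key prefix such as 'module.' left by nn.DataParallel. The helper names are hypothetical.

```python
def strip_prefix(state_dict, prefix="module."):
    """Drop a leading `prefix` (e.g. added by nn.DataParallel) from every key."""
    return {(k[len(prefix):] if k.startswith(prefix) else k): v for k, v in state_dict.items()}


def key_diff(checkpoint_keys, model_keys):
    """Return (missing, unexpected) key lists, mirroring what load_state_dict reports."""
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)


# With the real objects from the thread:
# sd = strip_prefix(checkpoint['model_state_dict'])
# missing, unexpected = key_diff(sd.keys(), model.state_dict().keys())
```

load_state_dict(..., strict=False) also returns the missing_keys and unexpected_keys it skipped, so printing its return value is a quick check that nothing important was dropped.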
For the "Evaluating Image Retrieval" part:
the cub_resnet50_best.pth checkpoint matches http://data.lip6.fr/cadene/pretrainedmodels/bn_inception-239d2248.pth rather than bn_inception-52deb4733.pth, because the two have different weight dimensions.
python evaluate.py --gpu-id 7 \
--batch-size 10 \
--model bn_inception \
--embedding-size 512 \
--dataset cub \
--resume /.../cub_resnet50_best.pth
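When two pretrained files differ in weight dimensions, comparing shapes key by key shows exactly which layers disagree. A sketch with a hypothetical helper name, operating on plain {name: shape} dicts:

```python
def shape_mismatches(ckpt_shapes, model_shapes):
    """Keys present in both dicts whose shapes differ: {key: (ckpt_shape, model_shape)}."""
    common = ckpt_shapes.keys() & model_shapes.keys()
    return {k: (tuple(ckpt_shapes[k]), tuple(model_shapes[k]))
            for k in sorted(common)
            if tuple(ckpt_shapes[k]) != tuple(model_shapes[k])}


# With real objects:
# ckpt_shapes = {k: v.shape for k, v in checkpoint['model_state_dict'].items()}
# model_shapes = {k: v.shape for k, v in model.state_dict().items()}
```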
I'm sorry for any mistakes.
Hello, I have read some earlier work on metric learning. For fairness, most of it uses BN-Inception as the backbone and takes the output of the final global average pooling (GAP) layer as the embedding. However, your code uses GAP + GMP (global max pooling), which the paper does not describe, while the results of the other methods reported in the paper use GAP only. Could you explain this, and how much of the improvement comes from using GAP and GMP together?
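For concreteness, a NumPy sketch of GAP + GMP pooling on an (N, C, H, W) feature map. It assumes the two pooled vectors are added element-wise; check code/net/ for how this repo actually combines them.

```python
import numpy as np


def gap_gmp_pool(feature_map):
    """(N, C, H, W) feature map -> (N, C): global average pool + global max pool."""
    gap = feature_map.mean(axis=(2, 3))   # global average pooling
    gmp = feature_map.max(axis=(2, 3))    # global max pooling
    return gap + gmp                      # element-wise sum (assumed combination)
```

A GAP-only ablation just returns gap, which makes the comparison asked about here a one-line change.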
Also 'Deep Metric Learning Beyond Binary Supervision'
Do you have any plans to evaluate your loss on ReID datasets? If not, would it be okay for me to implement them in my repo for ReID and share it with you guys? Thanks.
Thanks for this great work !!
I would like to suggest a minor update for a bug that I have found.
ERROR
If you use Proxy_NCA, you will most probably get this error:
TypeError: __init__() missing 2 required positional arguments: 'nb_classes' and 'sz_embed'
Solution:
Line 225 in code/train.py -> criterion = losses.Proxy_NCA().cuda()
change it to:
criterion = losses.Proxy_NCA(nb_classes = nb_classes, sz_embed = args.sz_embedding).cuda()
Thanks again,
Moured
Thank you for sharing your code. Nice work! But I got confused about the sampling method.
I see that the random sampler is used in your code. In this case, for a specific p ∈ P+, |Xp+| is equal to 1 with a high probability. Since |Xp+| is usually no more than 1, there is no "the most dissimilar positive example" to mine. Then do we still need the complicated smooth method of positive sample pairs in your loss function?
I would appreciate it if you could clear up my doubts.
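The question can be checked with a quick simulation: draw random batches and count, among the classes present, how many have exactly one sample (|X_p^+| = 1). The dataset sizes are rough approximations (SOP-like: 11,318 classes × 5 images; CUB-like: 100 classes × 59 images), the batch size of 180 is an assumption, and singleton_fraction is a hypothetical helper.

```python
import random
from collections import Counter


def singleton_fraction(num_classes, imgs_per_class, batch_size, trials=200, seed=0):
    """Among classes present in a uniformly random batch, the fraction with exactly one sample."""
    rng = random.Random(seed)
    labels = [c for c in range(num_classes) for _ in range(imgs_per_class)]
    singles = present = 0
    for _ in range(trials):
        counts = Counter(rng.sample(labels, batch_size))   # samples per class in this batch
        present += len(counts)
        singles += sum(1 for n in counts.values() if n == 1)
    return singles / present


sop_like = singleton_fraction(11318, 5, 180)
cub_like = singleton_fraction(100, 59, 180)
```

On an SOP-sized label set nearly every class in a batch is a singleton, while on a CUB-sized set most classes appear two or more times, so how often the positive-pair weighting matters depends on the dataset.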
Can you please list the parameters (LR, epochs, etc.) for Proxy_NCA?
When I compare Proxy Anchor with Proxy NCA, I see no significant difference. Proxy Anchor still focuses on s(x, p) rather than s(x, x). And what is the difference between s(x, p) and s(p, x)? If x is the anchor, you compute x-to-p pairs; if p is the anchor, you compute p-to-x pairs. What is the difference? I notice that Proxy Anchor assigns different weights to different pairs, but it still works with data-to-proxy relationships, doesn't it?
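A NumPy sketch that makes the difference concrete on the same (M, C) similarity matrix s(x, p): Proxy-NCA normalizes each row (one data point against all proxies), while Proxy Anchor aggregates each column (one proxy against all data in the batch), so each proxy's gradient depends on all samples of the batch, weighted by their relative hardness. These are illustrative re-implementations of the paper's formulas, not the repo's losses.py; Proxy-NCA is written in the cosine form without a scale factor.

```python
import numpy as np


def l2n(a):
    return a / np.linalg.norm(a, axis=1, keepdims=True)


def proxy_anchor_loss(X, y, P, alpha=32.0, delta=0.1):
    """Each proxy is an anchor: aggregate over the data axis (axis 0) of s(x, p)."""
    s = l2n(X) @ l2n(P).T                          # (M, C) cosine similarities s(x, p)
    pos = np.eye(P.shape[0], dtype=bool)[y]        # pos[i, c] is True iff y[i] == c
    pos_exp = np.where(pos, np.exp(-alpha * (s - delta)), 0.0)
    neg_exp = np.where(~pos, np.exp(alpha * (s + delta)), 0.0)
    with_pos = pos.any(axis=0)                     # P+: proxies with positives in the batch
    pos_term = np.log1p(pos_exp.sum(axis=0))[with_pos].sum() / with_pos.sum()
    neg_term = np.log1p(neg_exp.sum(axis=0)).sum() / P.shape[0]
    return pos_term + neg_term


def proxy_nca_loss(X, y, P):
    """Each data point is an anchor: aggregate over the proxy axis (axis 1) of s(x, p)."""
    s = l2n(X) @ l2n(P).T
    pos = np.eye(P.shape[0], dtype=bool)[y]
    pos_s = s[pos]                                 # s(x, p+), one value per row
    neg_lse = np.log(np.exp(np.where(pos, -np.inf, s)).sum(axis=1))
    return np.mean(neg_lse - pos_s)                # mean of -log(exp(s+) / sum_{p-} exp(s-))
```

The only structural change between the two is the axis of the log-sum-exp, which is exactly the s(x, p) versus s(p, x) distinction raised in the question.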
Hello!
First, thank you for sharing your work!
We tried to reproduce the results in the paper using both your codebase and our own code with the pytorch_metric_learning library. All hyper-parameters are the same as in the paper, but in training we only reach 79% R@1 on the Cars-196 dataset. Using your trained weights, we get R@1 of 81.48%, still short of the reported result (~86%).
By the way, for the CUB-200 dataset we managed to reproduce the results (~69% R@1), both by training with the same hyper-parameters and by running inference with your trained weights.
The paper states that the hyper-parameters for CUB-200 and Cars-196 are the same. Is there any hyper-parameter that differs between these datasets? Also, could you please check the trained weights for Cars-196?
Thanks!
Okay, I tried the provided BalancedSampler and found that it is far too slow for larger datasets, say 50,000 classes with 10 images per class.
I did some benchmarking and found that the bottleneck is this line:
ith_class_idxs = np.nonzero(np.array(self.ys) == sampled_classes[i])[0]
You repeatedly build a numpy array from self.ys, which is time-consuming. One way to fix this is to convert self.ys to a numpy array once in __init__(). Do you want me to fix this? I would be happy to provide a PR.
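A variant of the suggested fix, going one step further: build the class-to-indices map once in __init__, so each batch only does work proportional to the batch size. FastBalancedSampler is a hypothetical name; it mirrors the intent of the repo's BalancedSampler (batch_size // images_per_class classes per batch, images_per_class images each) but is not a copy of it, and it samples without replacement unless a class has fewer than images_per_class images.

```python
import numpy as np
from collections import defaultdict


class FastBalancedSampler:
    """Hypothetical sketch: each batch holds batch_size // images_per_class classes,
    images_per_class indices per class. The class -> indices map is built once."""

    def __init__(self, ys, batch_size, images_per_class=3, seed=None):
        self.num_samples = len(ys)
        self.batch_size = batch_size
        self.images_per_class = images_per_class
        self.num_groups = batch_size // images_per_class
        idx_by_class = defaultdict(list)
        for i, y in enumerate(ys):              # one O(N) pass instead of one per class per batch
            idx_by_class[y].append(i)
        self.classes = list(idx_by_class)
        self.idx_by_class = {c: np.asarray(v) for c, v in idx_by_class.items()}
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return (self.num_samples // self.batch_size) * self.num_groups * self.images_per_class

    def __iter__(self):
        ret = []
        for _ in range(self.num_samples // self.batch_size):
            for ci in self.rng.choice(len(self.classes), self.num_groups, replace=False):
                pool = self.idx_by_class[self.classes[ci]]
                # Sample with replacement only when the class has too few images.
                picks = self.rng.choice(pool, self.images_per_class,
                                        replace=len(pool) < self.images_per_class)
                ret.extend(picks.tolist())
        return iter(ret)
```

Since it defines __iter__ and __len__, it can be passed as the sampler argument of a torch DataLoader, assuming batch_size is a multiple of images_per_class and shuffle is left off.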
Has anyone successfully reproduced the CUB result? My result is 67% R@1. Can you give me some advice? Thanks!!
I have implemented the loss function using Keras. Hope the community will find it helpful :-)
Hi,
I have a question about experimental results using other methods such as MS.
Did you run the MS experiments yourself, or did you take the results from the MS paper as is?
I'd be very happy to receive your response.
Thanks
if args.loss == 'Proxy_Anchor':
    param_groups.append({'params': criterion.parameters(), 'lr': float(args.lr) * 100})
elif args.loss == 'Proxy_NCA':
    param_groups.append({'params': criterion.parameters(), 'lr': float(args.lr)})
Why do you multiply the lr by 100 for your method but not for the other one? In the teaser you claim your method trains faster, but it seems this is due to the 100x higher lr.
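One way to test this concern directly is to make the proxy learning-rate multiplier an explicit setting shared by both losses and run each loss at 1x and 100x. A minimal sketch; build_param_groups and the ablation loop are hypothetical, not part of train.py.

```python
def build_param_groups(backbone_params, embedding_params, proxy_params,
                       base_lr, proxy_lr_mult=100.0):
    """Same group layout for every proxy-based loss; only `proxy_lr_mult` differs between runs."""
    return [
        {"params": list(backbone_params), "lr": base_lr},
        {"params": list(embedding_params), "lr": base_lr},
        {"params": list(proxy_params), "lr": base_lr * proxy_lr_mult},
    ]


# Hypothetical ablation grid (torch objects assumed to exist):
# for loss_name in ("Proxy_Anchor", "Proxy_NCA"):
#     for mult in (1.0, 100.0):
#         groups = build_param_groups(backbone, embedding, criterion.parameters(), 1e-4, mult)
#         opt = torch.optim.AdamW(groups, weight_decay=1e-4)
```

Comparing convergence curves across that 2×2 grid separates the effect of the loss from the effect of the proxy learning rate.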
python train.py --gpu-id 0 --loss Proxy_Anchor --model resnet50 --embedding-size 512 --batch-size 180 --lr 6e-4 --dataset SOP --warm 1 --bn-freeze 0 --lr-decay-step 20 --lr-decay-gamma 0.25
wandb: Currently logged in as: shute (use wandb login --relogin to force relogin)
wandb: wandb version 0.10.10 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.10.8
wandb: Syncing run eager-dream-16
wandb: ⭐️ View project at https://wandb.ai/shute/SOP_ProxyAnchor
wandb: 🚀 View run at https://wandb.ai/shute/SOP_ProxyAnchor/runs/2ca7m47a
wandb: Run data is saved locally in wandb/run-20201113_090754-2ca7m47a
wandb: Run wandb off to turn off syncing.
Random Sampling
Training parameters: {'LOG_DIR': '../logs', 'dataset': 'SOP', 'sz_embedding': 512, 'sz_batch': 180, 'nb_epochs': 60, 'gpu_id': 0, 'nb_workers': 4, 'model': 'resnet50', 'loss': 'Proxy_Anchor', 'optimizer': 'adamw', 'lr': 0.0006, 'weight_decay': 0.0001, 'lr_decay_step': 20, 'lr_decay_gamma': 0.25, 'alpha': 32, 'mrg': 0.1, 'IPC': None, 'warm': 1, 'bn_freeze': 0, 'l2_norm': 1, 'remark': ''}
Training for 60 epochs.
0it [00:00, ?it/s]
/home/server8/lst/Proxy-Anchor-CVPR2020-master/code/losses.py:48: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, *, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1595629411241/work/torch/csrc/utils/python_arg_parser.cpp:766.)
with_pos_proxies = torch.nonzero(P_one_hot.sum(dim = 0) != 0).squeeze(dim = 1) # The set of positive proxies of data in the batch
Train Epoch: 0 [330/330 (100%)] Loss: 10.849229: : 330it [01:34, 3.49it/s]
Evaluating...
100%|██████████| 337/337 [01:25<00:00, 3.95it/s]
R@1 : 51.770
R@10 : 67.938
R@100 : 81.594
R@1000 : 92.909
0it [00:00, ?it/s]
Traceback (most recent call last):
File "train.py", line 290, in <module>
m = model(x.squeeze().cuda())
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/server8/lst/Proxy-Anchor-CVPR2020-master/code/net/resnet.py", line 175, in forward
x = self.model.layer1(x)
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torchvision/models/resnet.py", line 112, in forward
out = self.conv3(out)
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in forward
return self._conv_forward(input, self.weight)
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 415, in _conv_forward
return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 552.00 MiB (GPU 0; 7.80 GiB total capacity; 6.09 GiB already allocated; 392.69 MiB free; 6.44 GiB reserved in total by PyTorch)
wandb: Waiting for W&B process to finish, PID 4172153
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb:
wandb: Find user logs for this run at: wandb/run-20201113_090754-2ca7m47a/logs/debug.log
wandb: Find internal logs for this run at: wandb/run-20201113_090754-2ca7m47a/logs/debug-internal.log
wandb: Run summary:
wandb: loss 12.3987
wandb: R@1 0.5177
wandb: R@10 0.67938
wandb: R@100 0.81594
wandb: R@1000 0.92909
wandb: _step 0
wandb: _runtime 193
wandb: _timestamp 1605276667
wandb: Run history:
wandb: loss ▁
wandb: R@1 ▁
wandb: R@10 ▁
wandb: R@100 ▁
wandb: R@1000 ▁
wandb: _step ▁
wandb: _runtime ▁
wandb: _timestamp ▁
wandb:
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb:
wandb: Synced eager-dream-16: https://wandb.ai/shute/SOP_ProxyAnchor/runs/2ca7m47a
CUDA runs out of memory in the second epoch.
I have tried batch sizes of 30, 100, 150, and 180; nothing helps.
pytorch 1.6
CUDA 10.1
GPU RTX2080Super 8G
I have spent many hours on this but still cannot solve it.
Many thanks for your help.
This is great work. Thank you very much for the code. Could you also provide the hyperparameters for the other methods?
Hi, thanks for the nice repo!
I am new to metric learning, so I am confused about how to predict the ID (class) of an image in the CUB dataset. Reading evaluate.py, I found that the evaluate_cos function computes F.linear(X, X) on the L2-normalized embeddings and uses the result together with the ground-truth labels (targets). I don't understand what this means: it looks like you use the ground truth to evaluate the model, not the model's predictions!
Could you add a file with an inference demo, or point out in a comment how to do this directly? Forgive my "dummy question"; I hope to see your answer soon!
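As the evaluation reads, F.linear(X, X) on L2-normalized embeddings is simply the cosine similarity between every pair of test images; the labels are only consulted afterwards, to check whether each query's retrieved neighbors share its class (Recall@K). To predict a class for a new image you need a labeled gallery, since CUB's test classes are not seen during training; the label of the nearest gallery embedding is then the prediction. A NumPy sketch with hypothetical names, assuming the embeddings have already been extracted with the model:

```python
import numpy as np


def l2_normalize(a):
    return a / np.linalg.norm(a, axis=1, keepdims=True)


def predict_nearest(query_emb, gallery_emb, gallery_labels):
    """Label each query with the class of its most similar (cosine) gallery embedding."""
    sim = l2_normalize(query_emb) @ l2_normalize(gallery_emb).T   # same as F.linear(q, g)
    return np.asarray(gallery_labels)[sim.argmax(axis=1)]
```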
I always get the “CUDA out of memory” error in the second training epoch.
OS: Ubuntu 20.04
GPU: RTX2080Super 8G
Pytorch: 1.6
CUDA: 10.1
I even set the batch size to 30, but it still reports the error.
Can anybody help me? Many thanks!
python test.py --gpu-id 0 --loss Proxy_Anchor --model resnet50 --embedding-size 512 --batch-size 60 --lr 1e-4 --dataset SOP --warm 0 --bn-freeze 1 --lr-decay-step 10
wandb: Currently logged in as: shute (use wandb login --relogin to force relogin)
wandb: wandb version 0.10.10 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.10.8
wandb: Syncing run desert-gorge-52
wandb: ⭐️ View project at https://wandb.ai/shute/SOP_ProxyAnchor
wandb: 🚀 View run at https://wandb.ai/shute/SOP_ProxyAnchor/runs/2tahb6vq
wandb: Run data is saved locally in wandb/run-20201116_032833-2tahb6vq
wandb: Run wandb off to turn off syncing.
Random Sampling
Training parameters: {'LOG_DIR': '../logs', 'dataset': 'SOP', 'sz_embedding': 512, 'sz_batch': 60, 'nb_epochs': 60, 'gpu_id': 0, 'nb_workers': 4, 'model': 'resnet50', 'loss': 'Proxy_Anchor', 'optimizer': 'adamw', 'lr': 0.0001, 'weight_decay': 0.0001, 'lr_decay_step': 10, 'lr_decay_gamma': 0.5, 'alpha': 32, 'mrg': 0.1, 'IPC': None, 'warm': 0, 'bn_freeze': 1, 'l2_norm': 1, 'remark': ''}
Training for 60 epochs.
Train Epoch: 0 [992/992 (100%)] Loss: 10.767641: : 992it [04:41, 3.52it/s]
Evaluating...
100%|██████████| 1009/1009 [01:26<00:00, 11.69it/s]
R@1 : 0.020
R@10 : 0.159
R@100 : 1.397
R@1000 : 12.509
Train Epoch: 1 [992/992 (100%)] Loss: 10.454696: : 992it [04:41, 3.52it/s]
Evaluating...
100%|██████████| 1009/1009 [01:25<00:00, 11.77it/s]
Traceback (most recent call last):
File "test.py", line 321, in <module>
Recalls = utils.evaluate_cos_SOP(model, dl_ev)
File "/home/server8/lst/Proxy-Anchor/Proxy-Anchor-CVPR2020-master/code/utils.py", line 148, in evaluate_cos_SOP
cos_sim = F.linear(xs,X)
File "/home/server8/anaconda3/envs/proj/lib/python3.8/site-packages/torch/nn/functional.py", line 1676, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA out of memory. Tried to allocate 2.26 GiB (GPU 0; 7.80 GiB total capacity; 2.85 GiB already allocated; 1.26 GiB free; 5.56 GiB reserved in total by PyTorch)
wandb: Waiting for W&B process to finish, PID 2010371
wandb: Program failed with code 1. Press ctrl-c to abort syncing.
wandb:
wandb: Find user logs for this run at: wandb/run-20201116_032833-2tahb6vq/logs/debug.log
wandb: Find internal logs for this run at: wandb/run-20201116_032833-2tahb6vq/logs/debug-internal.log
wandb: Run summary:
wandb: loss 9.7143
wandb: R@1 0.0002
wandb: R@10 0.00159
wandb: R@100 0.01397
wandb: R@1000 0.12509
wandb: _step 1
wandb: _runtime 745
wandb: _timestamp 1605516058
wandb: Run history:
wandb: loss █▁
wandb: R@1 ▁
wandb: R@10 ▁
wandb: R@100 ▁
wandb: R@1000 ▁
wandb: _step ▁█
wandb: _runtime ▁█
wandb: _timestamp ▁█
wandb:
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb:
wandb: Synced desert-gorge-52: https://wandb.ai/shute/SOP_ProxyAnchor/runs/2tahb6vq
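The second traceback fails at cos_sim = F.linear(xs, X) in evaluate_cos_SOP, i.e. while building a block of query-to-database similarities. Such a block needs chunk × N × 4 bytes in float32; with N = 60,502 SOP test images, 1,000 queries per chunk take about 242 MB. A NumPy sketch of chunked Recall@K under that idea (the function name and default chunk size are assumptions); if utils.py already slices the queries, as the name xs suggests, lowering that slice size should have the same effect on GPU.

```python
import numpy as np


def recall_at_k_chunked(X, labels, ks=(1, 10, 100), chunk_size=1000):
    """Recall@K with cosine similarity, scoring `chunk_size` queries at a time.

    Peak extra memory is one (chunk_size, N) similarity block instead of (N, N).
    Requires max(ks) < N.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    labels = np.asarray(labels)
    max_k = max(ks)
    hits = dict.fromkeys(ks, 0)
    for start in range(0, len(X), chunk_size):
        q_idx = np.arange(start, min(start + chunk_size, len(X)))
        sim = X[q_idx] @ X.T                                         # (chunk, N)
        sim[np.arange(len(q_idx)), q_idx] = -np.inf                  # a query must not retrieve itself
        top = np.argpartition(-sim, max_k - 1, axis=1)[:, :max_k]    # unordered top-max_k
        order = np.argsort(-np.take_along_axis(sim, top, axis=1), axis=1)
        nn_idx = np.take_along_axis(top, order, axis=1)              # most similar first
        match = labels[nn_idx] == labels[q_idx][:, None]
        for k in ks:
            hits[k] += int(match[:, :k].any(axis=1).sum())
    return {k: hits[k] / len(X) for k in ks}
```

The chunk size trades speed for memory only; the recall values do not depend on it.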
In the original ProxyNCA loss, the distance measure seems to be L2 distance. But in your paper, when you cite the ProxyNCA loss, you describe it with cosine similarity. Why is that?
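If both the embedding x and the proxy p are L2-normalized (an assumption; the equivalence fails for unnormalized vectors), then ||x - p||^2 = ||x||^2 + ||p||^2 - 2 x·p = 2 - 2 cos(x, p). A softmax over -||x - p||^2 therefore equals a softmax over 2 cos(x, p): the two descriptions differ only by a constant shift and a scale (temperature) factor. A NumPy check of that identity:

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
x = rng.normal(size=8)
P = rng.normal(size=(5, 8))
x /= np.linalg.norm(x)                           # unit-norm embedding
P /= np.linalg.norm(P, axis=1, keepdims=True)    # unit-norm proxies

sq_l2 = ((x - P) ** 2).sum(axis=1)   # ||x - p||^2 for every proxy
cos = P @ x                          # cos(x, p) for every proxy
```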
Hi,
When I use "adamw" as the optimizer, I get the following error. Does anyone have any idea how I can resolve it? The torch version is 1.8.1.
Traceback (most recent call last):
File "train.py", line 396, in <module>
opt.step()
File "/home/.conda/envs/torchdmgpu38_dev/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
return wrapped(*args, **kwargs)
File "/home/.conda/envs/torchdmgpu38_dev/lib/python3.8/site-packages/torch/optim/optimizer.py", line 89, in wrapper
return func(*args, **kwargs)
File "/home/.conda/envs/torchdmgpu38_dev/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/.conda/envs/torchdmgpu38_dev/lib/python3.8/site-packages/torch/optim/adamw.py", line 117, in step
beta1,
UnboundLocalError: local variable 'beta1' referenced before assignment
I appreciate your time and help.
Best,
Farshad
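One plausible cause, based on a reading of the torch 1.8 AdamW source rather than anything confirmed in this thread: beta1 and beta2 are assigned only inside the loop over parameters that have a gradient, so a param group with no gradients at all (for example the backbone group while it is frozen during warm-up, --warm > 0) reaches the later reference unassigned. Upgrading PyTorch or training with --warm 0 avoids it; as a stopgap, the sketch below hides gradient-free groups for the duration of a step. Both helper names are hypothetical.

```python
import torch


def groups_without_grads(optimizer):
    """Indices of param groups in which no parameter currently has a .grad."""
    return [i for i, g in enumerate(optimizer.param_groups)
            if all(p.grad is None for p in g["params"])]


def step_skipping_empty_groups(optimizer):
    """Stopgap sketch: hide gradient-free groups from optimizer.step(), then restore them."""
    all_groups = optimizer.param_groups
    optimizer.param_groups = [g for g in all_groups
                              if any(p.grad is not None for p in g["params"])]
    try:
        optimizer.step()
    finally:
        optimizer.param_groups = all_groups   # keep the full list for schedulers and later epochs
```

Calling step_skipping_empty_groups(opt) in place of opt.step() leaves frozen groups untouched and restores them once they start receiving gradients.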