hdetr / h-deformable-detr
[CVPR2023] This is an official implementation of paper "DETRs with Hybrid Matching".
License: MIT License
Hi, I have a question about the performance of the hybrid matching scheme.
As you noted, among the three hybrid matching schemes, the hybrid branch scheme seems to work well and achieves faster inference, since it uses only 300 queries, unlike the others.
As far as I know, with two-stage DDETR we can set a different number of queries at training and test time. So, if you don't mind, I would like to know whether there is a potential performance degradation when the other variants are trained with a large number of queries and tested with fewer queries in the two-stage architecture.
Thanks!
Traceback (most recent call last):
File "I:/H-Deformable-DETR/main.py", line 537, in <module>
main(args)
File "I:/H-Deformable-DETR/main.py", line 460, in main
train_stats = train_one_epoch(
File "I:\H-Deformable-DETR\engine.py", line 96, in train_one_epoch
loss_dict = train_hybrid(
File "I:\H-Deformable-DETR\engine.py", line 48, in train_hybrid
loss_dict_one2many = criterion(outputs_one2many, multi_targets)
File "H:\miniconda\envs\H-deformable-detr\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "I:\H-Deformable-DETR\models\deformable_detr.py", line 478, in forward
indices = self.matcher(outputs_without_aux, targets)
File "H:\miniconda\envs\H-deformable-detr\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "I:\H-Deformable-DETR\models\matcher.py", line 104, in forward
C = C.view(bs, num_queries, -1).cpu()
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
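The last frame shows `C.view(bs, num_queries, -1)` being asked for shape `[1, 0, -1]`, which means `num_queries` is 0 for this branch. One possible cause (an assumption, not confirmed in this thread) is that the number of one-to-many queries is configured as zero while the one-to-many loss is still computed. The sketch below is plain Python, not PyTorch's implementation; `infer_view_shape` is a hypothetical helper that mimics how a single `-1` is resolved and why a zero-sized known dimension makes it ambiguous.

```python
from math import prod


def infer_view_shape(numel, shape):
    """Resolve a single -1 in `shape` for a tensor holding `numel` elements.

    Hypothetical helper for illustration only: it mimics the rule behind the
    error message, it is not PyTorch code.
    """
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    if -1 not in shape:
        if prod(shape) != numel:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    known = prod(d for d in shape if d != -1)
    if known == 0:
        # 0 * anything == 0, so every value for -1 would "fit": ambiguous.
        raise ValueError("cannot infer -1: the known dimensions multiply to 0")
    if numel % known != 0:
        raise ValueError("shape does not match the number of elements")
    return [numel // known if d == -1 else d for d in shape]


# Two images, 300 queries, 4 targets: the -1 resolves to 4.
print(infer_view_shape(2 * 300 * 4, [2, 300, -1]))

# Zero queries: the cost matrix is empty and the -1 cannot be resolved.
try:
    infer_view_shape(0, [2, 0, -1])
except ValueError as err:
    print("error:", err)
```

If this is the cause, checking the query counts before the matcher runs (or skipping the one-to-many loss when that branch has no queries) would avoid the reshape.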
tensor([70, 81, 32, 1], device='cuda:0')
Traceback (most recent call last):
File "main50.py", line 565, in <module>
main(args)
File "main50.py", line 418, in main
train_stats = train_one_epoch_burnin(
File "/netscratch/shehzadi/Rego-semi/aH-semi/engine.py", line 377, in train_one_epoch_burnin
loss_dict = train_hybrid(
File "/netscratch/shehzadi/Rego-semi/aH-semi/engine.py", line 54, in train_hybrid
loss_dict = criterion(outputs, targets)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/netscratch/shehzadi/Rego-semi/aH-semi/models/deformable_detr.py", line 457, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/netscratch/shehzadi/Rego-semi/aH-semi/models/matcher.py", line 176, in forward
cost_class = pos_cost_class[:, tgt_ids] - neg_cost_class[:, tgt_ids]
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
I am getting this error.
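A device-side assert raised on an indexing line such as `pos_cost_class[:, tgt_ids]` is commonly caused by an out-of-range index, here a label id that is not smaller than the number of class logits. Whether that is the cause in this report is an assumption. The sketch below shows a check that could be run on the CPU before training; `find_invalid_labels` and both `num_classes` values are hypothetical, and the label list is taken from the tensor printed before the traceback.

```python
def find_invalid_labels(target_labels, num_classes):
    """Return (position, label) pairs for labels outside [0, num_classes)."""
    return [
        (position, label)
        for position, label in enumerate(target_labels)
        if not 0 <= label < num_classes
    ]


labels = [70, 81, 32, 1]  # values from the printed tensor

# With 91 class logits every label is a valid column index.
print(find_invalid_labels(labels, num_classes=91))  # []

# With only 81 class logits, label 81 would index past the last column.
print(find_invalid_labels(labels, num_classes=81))  # [(1, 81)]
```

Running the failing step on the CPU, or with `CUDA_LAUNCH_BLOCKING=1` set, usually turns the asynchronous assert into a precise error at the offending line.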
Will the AP improve if I use fp16?
Will fp16 slow down training, i.e. make it take more time?
I have a V100 32 GB and set batch=4.
I really don't know...
Hi, your work is amazing. But I see that the code can support semantic segmentation or full panoptic segmentation. Can you provide a trained model?
I ran this command: GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8
--coco_path , but I got this error: launch.py: error: the following arguments are required: training_script, training_script_args.
Can you tell me why?
The complete COCO dataset is too large for me as I only have a single 8GiB graphics card, and training on it would take too long. How can I modify this code to train or test on my own dataset?
I apologize for asking such a foolish question, and I have great admiration for the work your team has done.
However, I noticed that the dataset path in the code is set to 'coco_path'. Does this mean that if I want to train on a different dataset, I would need to put a lot of effort into adjusting the structure of the existing code?
Hello @PkuRainBow, thanks for open-sourcing your excellent work!
I have a question about this code snippet (line 244) in deformable_transformer.py:
...
topk = self.two_stage_num_proposals
topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1]
topk_coords_unact = torch.gather(
    enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4)
)
...
Does the tensor enc_outputs_class[..., 0] (with enc_outputs_class.shape = (batch_size, len_flattened_encoder_seq, 91)) represent the classification prediction for the first foreground class?
As I understand it, the purpose here is to select the top-k foreground proposals according to the k highest foreground scores, taken over all foreground classes.
So why not execute topk_proposals = torch.topk(enc_outputs_class.max(dim=-1)[0], topk, dim=1)[1] instead?
Could you please explain? Thanks!
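The difference between the two selection rules in this question can be seen on a toy example. The sketch below uses plain Python lists in place of tensors; `topk_indices` is a hypothetical stand-in for the index output of `torch.topk(...)[1]`, and the logits are made up. In the original Deformable DETR the encoder proposals are supervised with all labels set to 0, so channel 0 acts as a class-agnostic foreground score; whether that also applies in this repository is an assumption.

```python
def topk_indices(scores, k):
    """Indices of the k largest scores, highest first (ties: lower index first)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


# Made-up logits for 4 encoder positions and 3 classes.
enc_logits = [
    [2.0, -1.0, -1.0],
    [-3.0, 4.0, 0.0],
    [1.0, 0.5, 0.2],
    [-2.0, -2.0, 3.0],
]

# Rule in the repository snippet: rank positions by channel 0 only.
by_channel0 = topk_indices([row[0] for row in enc_logits], k=2)

# Rule proposed in the question: rank positions by the best score over all classes.
by_class_max = topk_indices([max(row) for row in enc_logits], k=2)

print(by_channel0)   # [0, 2]
print(by_class_max)  # [1, 3]
```

The two rules agree only when channel 0 already carries the foreground confidence, which is why the training target of the encoder head matters for this question.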
I configured the environment on Windows and ran "python main.py", but it gave me an error like this:
File "main.py", line 536, in <module>
main(args)
File "main.py", line 470, in main
use_fp16=args.use_fp16,
File "F:\SOTA_DETR\H-Deformable-DETR-master\engine.py", line 97, in train_one_epoch
outputs, targets, k_one2many, criterion, lambda_one2many
File "F:\SOTA_DETR\H-Deformable-DETR-master\engine.py", line 48, in train_hybrid
loss_dict_one2many = criterion(outputs_one2many, multi_targets)
File "D:\Anaconda3\envs\detr\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "F:\SOTA_DETR\H-Deformable-DETR-master\models\deformable_detr.py", line 478, in forward
indices = self.matcher(outputs_without_aux, targets)
File "D:\Anaconda3\envs\detr\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "F:\SOTA_DETR\H-Deformable-DETR-master\models\matcher.py", line 106, in forward
C = C.view(bs, num_queries, -1).cpu()
RuntimeError: cannot reshape tensor of 0 elements into shape [2, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
I wonder whether the code cannot run properly on Windows.
I really appreciate the code you have provided. Thank you!
Hi, could you please provide the configs for the LVIS dataset? I would appreciate it!
Thank you very much for your work.
Since the experiments were conducted on DDETR, I would like to know whether this training strategy also improves the original DETR.
I am also concerned about how much the training GPU memory usage grows.
If you have the corresponding data, I would be very grateful.
Thanks for your awesome work!
However, I could not find one key piece of information in the paper or the released code: what is the total batch size of H-DETR, 32 or 16?
Thanks!
Hi,
Really nice job on the paper. I was excited to read it.
I was wondering if you could explain a bit further on the attention masks. FYI, I am referencing your Hybrid branch which I believe was used in the rest of the paper. So if I understood the paper / your code, the attention masks are just being used to prevent information leakage between the two groups (one-to-one and one-to-many) which makes sense.
However, I don't understand why you did not also prevent information leakage between the queries within the one-to-many group. I understand that you repeat the ground truth K times so that multiple queries can match the same object. However, I would think that the self-attention performed in the decoder for the one-to-many group would naturally prevent multiple queries from selecting the same object, since the whole point of self-attention here is to remove duplicates. If you added an attention mask within this group, I would think that would resolve the issue.
I think I may be fundamentally misunderstanding something, as this clearly worked for you. Any insight would be appreciated. I have linked the code I looked at below.
Thanks,
Owen
H-Deformable-DETR/models/deformable_detr.py
Lines 208 to 217 in 5dea6f4
H-Deformable-DETR/models/deformable_transformer.py
Lines 473 to 480 in 5dea6f4
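The group-level mask described in this question can be sketched as follows. This is an illustration in plain Python rather than the repository's code: `build_group_attention_mask` and the query counts are hypothetical, and the convention that True blocks attention follows PyTorch's boolean `attn_mask`. Queries attend freely inside their own group and are blocked only across the one-to-one / one-to-many boundary.

```python
def build_group_attention_mask(num_one2one, num_one2many):
    """Boolean self-attention mask; True means the query may NOT attend to the key."""
    total = num_one2one + num_one2many
    mask = []
    for q in range(total):
        q_in_one2one = q < num_one2one
        # Block a (query, key) pair exactly when the two lie in different groups.
        row = [(k < num_one2one) != q_in_one2one for k in range(total)]
        mask.append(row)
    return mask


# Tiny example: 2 one-to-one queries followed by 3 one-to-many queries.
mask = build_group_attention_mask(num_one2one=2, num_one2many=3)
for row in mask:
    print("".join("x" if blocked else "." for blocked in row))
# ..xxx
# ..xxx
# xx...
# xx...
# xx...
```

Masking inside the one-to-many group as well, as the question proposes, would correspond to additionally setting the off-diagonal entries of the lower-right block to True.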
I exported the model to TorchScript format. When I run inference with the exported model, it only works on the image that I used for exporting; for any other image it fails with this error message:
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return forward_call(*input, **kwargs)
Traceback (most recent call last):
File "/root/autodl-tmp/project/deploy/export_model.py", line 264, in
out1 = m(data)
File "/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/detectron2/export/flatten.py", line 9, in forward
def forward(self: torch.detectron2.export.flatten.TracingAdapter,
argument_1: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
_0, _1, _2, _3, _4, _5, _6, = (self.model).forward(argument_1, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
return (_0, _1, _2, _3, _4, _5, _6)
File "code/torch/adet/modeling/text_spotter.py", line 23, in forward
batched_imgs = torch.unsqueeze(_7, 0)
x0 = torch.contiguous(batched_imgs)
_8, _9, _10, _11, = (_0).forward(x0, image_size, )
~~~~~~~~~~~ <--- HERE
_12 = torch.softmax(_9, -1)
prob = torch.sigmoid(torch.mean(_8, [-2]))
File "code/torch/adet/modeling/model/detection_transformer.py", line 50, in forward
_29 = getattr(self.input_proj, "1")
_30 = getattr(self.input_proj, "0")
_31 = (self.backbone).forward(x, image_size, )
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_32, _33, _34, _35, _36, _37, _38, _39, _40, _41, _42, _43, _44, _45, _46, _47, _48, _49, _50, _51, _52, _53, _54, _55, _56, _57, = _31
_58 = (_30).forward(_32, )
File "code/torch/adet/modeling/text_spotter.py", line 104, in forward
image_size: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
_61 = getattr(self, "1")
_62 = (getattr(self, "0")).forward(x, image_size, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_63, _64, _65, _66, _67, _68, _69, = _62
pos_embed = torch.to((_61).forward(_63, ), 6)
File "code/torch/adet/modeling/text_spotter.py", line 143, in forward
_92 = torch.slice(torch.slice(_91, 0, 0, 125), 1, 0, 138)
_93 = torch.view(CONSTANTS.c2, annotate(List[int], []))
_94 = torch.copy_(_92, torch.expand(_93, [125, 138]))
~~~~~~~~~~~ <--- HERE
masks_per_feature_level0 = torch.ones([_85, _86, _87], dtype=11, layout=None, device=torch.device("cpu"), pin_memory=False)
_95 = torch.select(masks_per_feature_level0, 0, 0)
Traceback of TorchScript, original code (most recent call last):
/root/autodl-tmp/project/adet/modeling/text_spotter.py(60): mask_out_padding
/root/autodl-tmp/project/adet/modeling/text_spotter.py(43): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(21): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/model/detection_transformer.py(168): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(220): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(259):
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(294): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(952): trace_module
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(735): trace
/root/autodl-tmp/project/deploy/export_model.py(125): export_tracing
/root/autodl-tmp/project/deploy/export_model.py(224):
/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py(18): execfile
/root/.pycharm_helpers/pydev/pydevd.py(1496): _exec
/root/.pycharm_helpers/pydev/pydevd.py(1489): run
/root/.pycharm_helpers/pydev/pydevd.py(2177): main
/root/.pycharm_helpers/pydev/pydevd.py(2195):
RuntimeError: The size of tensor a (50) must match the size of tensor b (125) at non-singleton dimension 0
Do you offer an implementation of the original DETR with hybrid matching? I am interested in trying out DETR's performance. Thanks a lot!
Thank you for your great work and well-organized repo.
However, I could not find any code for the Hybrid epoch or Hybrid layer variants. How did you implement Hybrid layer? Are there two decoders whose losses are calculated separately?
Hi, I have a question about the ablation setting in Table 13.
There seem to be no implementation details for the setting that uses only one-to-many label assignment.
Did you set K to 6? Do the encoder outputs also use one-to-many label assignment, which is not used in Hybrid DETR?
Thanks
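For readers unfamiliar with the role of K in this question: in one-to-many assignment the ground truth of each image is repeated K times, so that up to K queries can be matched to the same object. A minimal sketch in plain Python, with lists standing in for tensors; `repeat_targets`, the sample target, and K = 6 are illustrative assumptions rather than the repository's code or the setting behind Table 13.

```python
def repeat_targets(targets, k):
    """Return new targets whose labels and boxes are tiled k times per image."""
    repeated = []
    for target in targets:
        repeated.append({
            # Tile the whole sequence k times: [a, b] -> [a, b, a, b, ...]
            "labels": [label for _ in range(k) for label in target["labels"]],
            "boxes": [list(box) for _ in range(k) for box in target["boxes"]],
        })
    return repeated


# One image with two objects; boxes are made-up (cx, cy, w, h) values.
targets = [{"labels": [3, 7], "boxes": [[0.5, 0.5, 0.2, 0.2], [0.3, 0.4, 0.1, 0.1]]}]
multi_targets = repeat_targets(targets, k=6)

print(len(multi_targets[0]["labels"]))  # 12
print(multi_targets[0]["labels"][:4])   # [3, 7, 3, 7]
```

Running the usual one-to-one Hungarian matcher against these repeated targets then yields up to K matched queries per object.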
Hi, thank you for your work. I would like to ask how MMCV_Custom can be used in my own MMDetection project. I want to use AMP acceleration in MMDetection and wonder whether that is feasible.
Hi, thanks for such great work! I wonder whether you have tested how well the hybrid matching proposed in your paper generalizes. I tried to implement the hybrid matching queries on DINO-Deformable-DETR, and the performance dropped from 48.7 mAP to 46.5 mAP under the standard 1x schedule, which suggests that the hybrid matching strategy in your paper does not transfer easily to other DETR-based object detectors. I hope to get your reply.
I saw in your table that weight decay=0.05 performs better than the usual weight decay=1e-4,
but the table only covers the Swin backbone.
Would weight decay=0.05 also be better with a ResNet backbone?
Hoping for your reply.