hdetr / h-deformable-detr


[CVPR2023] This is an official implementation of the paper "DETRs with Hybrid Matching".

License: MIT License

Python 71.62% Shell 9.98% C++ 1.67% Cuda 16.73%


h-deformable-detr's Issues

Question about variant of hybrid matching

Hi, I have a question about the performance of the hybrid matching schemes.
As you noted, among the three hybrid matching schemes, the hybrid branch scheme seems to work well and gives faster inference, since it uses only 300 queries, unlike the others.
As far as I know, with two-stage DDETR we can set a different number of queries at training and at test time. So, if you don't mind, I would like to know whether performance could degrade when the other variants are trained with a large number of queries and tested with fewer queries in the two-stage architecture.

Thanks!

Is there a bug?

Traceback (most recent call last):
File "I:/H-Deformable-DETR/main.py", line 537, in
main(args)
File "I:/H-Deformable-DETR/main.py", line 460, in main
train_stats = train_one_epoch(
File "I:\H-Deformable-DETR\engine.py", line 96, in train_one_epoch
loss_dict = train_hybrid(
File "I:\H-Deformable-DETR\engine.py", line 48, in train_hybrid
loss_dict_one2many = criterion(outputs_one2many, multi_targets)
File "H:\miniconda\envs\H-deformable-detr\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "I:\H-Deformable-DETR\models\deformable_detr.py", line 478, in forward
indices = self.matcher(outputs_without_aux, targets)
File "H:\miniconda\envs\H-deformable-detr\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "I:\H-Deformable-DETR\models\matcher.py", line 104, in forward
C = C.view(bs, num_queries, -1).cpu()
RuntimeError: cannot reshape tensor of 0 elements into shape [1, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

RuntimeError: CUDA error: device-side assert triggered

tensor([70, 81, 32, 1], device='cuda:0')
Traceback (most recent call last):
File "main50.py", line 565, in
main(args)
File "main50.py", line 418, in main
train_stats = train_one_epoch_burnin(
File "/netscratch/shehzadi/Rego-semi/aH-semi/engine.py", line 377, in train_one_epoch_burnin
loss_dict = train_hybrid(
File "/netscratch/shehzadi/Rego-semi/aH-semi/engine.py", line 54, in train_hybrid
loss_dict = criterion(outputs, targets)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/netscratch/shehzadi/Rego-semi/aH-semi/models/deformable_detr.py", line 457, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/netscratch/shehzadi/Rego-semi/aH-semi/models/matcher.py", line 176, in forward
cost_class = pos_cost_class[:, tgt_ids] - neg_cost_class[:, tgt_ids]
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered

I am getting this error.
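
A device-side assert raised on an indexing line such as `pos_cost_class[:, tgt_ids]` is often caused by an out-of-range index, for example a label id that is not smaller than the number of classes the model was built with. The sketch below is a framework-free check, not code from this repository: `find_bad_labels` and both `num_classes` values are hypothetical, and the label list is copied from the tensor printed before the traceback.

```python
def find_bad_labels(labels, num_classes):
    """Return the labels that cannot index a class dimension of size num_classes."""
    return [label for label in labels if not 0 <= label < num_classes]


labels = [70, 81, 32, 1]  # values printed before the traceback above

# Hypothetical settings: a 91-way head versus an 80-way head.
print(find_bad_labels(labels, num_classes=91))  # []
print(find_bad_labels(labels, num_classes=80))  # [81]
```

If such a check reports offending labels, either the class count or the label mapping of the dataset needs adjusting. Re-running on CPU, or with the environment variable CUDA_LAUNCH_BLOCKING=1, usually makes the failing line easier to identify.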

Will the AP improve if I use fp16?

Will the AP improve if I use fp16?
Will fp16 slow down training and make it take longer?
I have a V100 32 GB and set the batch size to 4.
I really don't know...

Can you provide a model for semantic segmentation?

Hi, your work is amazing. But I see that the code can support semantic segmentation or full panoptic segmentation. Can you provide a trained model?

The shape of outputs_classses_one2_many is 0!

Thanks for sharing such wonderful & interesting work!!!
When I ran the code, I got the error below.

Things about training

I ran this command: GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 --coco_path
but I got this error: launch.py: error: the following arguments are required: training_script, training_script_args.
Can you tell me why?

How can I apply this code to other datasets?

The complete COCO dataset is too large for me as I only have a single 8GiB graphics card, and training on it would take too long. How can I modify this code to train or test on my own dataset?
I apologize for asking such a foolish question, and I have great admiration for the work your team has done.
However, I noticed that the dataset path in the code is set to 'coco_path'. Does this mean that if I want to train on a different dataset, I would need to put a lot of effort into adjusting the structure of the existing code?

Hello, this tricks contains those details.

Question about proposal generation

Hello @PkuRainBow, thanks for open-sourcing your excellent work!

I have a question about this code snippet (line 244) in deformable_transformer.py:

...
            topk = self.two_stage_num_proposals
            topk_proposals = torch.topk(enc_outputs_class[..., 0], topk, dim=1)[1]
            topk_coords_unact = torch.gather(
                enc_outputs_coord_unact, 1, topk_proposals.unsqueeze(-1).repeat(1, 1, 4)
            )
...

Does the tensor enc_outputs_class[..., 0] (with enc_outputs_class.shape = (batch_size, len_flattened_encoder_seq, 91)) represent the class prediction for the first foreground (fg) class?

As I understand it, the purpose here is to select the top-k fg proposals by the k highest fg scores, taken over all fg classes.

So why not execute topk_proposals = torch.topk(enc_outputs_class.max(dim=-1)[0], topk, dim=1)[1]?

Could you please explain? Thanks!
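
The difference between the two selection rules in this question can be seen on a toy example. The sketch below uses plain Python lists in place of tensors for a single image; `topk_indices` and the score values are made up for illustration and are not taken from the repository.

```python
def topk_indices(values, k):
    """Indices of the k largest values, highest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


# Hypothetical class logits for 4 encoder tokens and 3 classes.
enc_outputs_class = [
    [2.0, -1.0, -3.0],
    [-4.0, 5.0, 0.0],
    [1.0, 0.5, 0.2],
    [-2.0, -2.5, 3.0],
]

# Rule in the quoted snippet: rank tokens by channel 0 only.
first_channel = [row[0] for row in enc_outputs_class]
# Rule proposed in the question: rank tokens by their best class score.
max_over_classes = [max(row) for row in enc_outputs_class]

print(topk_indices(first_channel, 2))     # [0, 2]
print(topk_indices(max_over_classes, 2))  # [1, 3]
```

This only shows that the two rules can pick different tokens; it does not say which is better for training. If the encoder head is supervised with class-agnostic (all-zero) labels, channel 0 would act as an objectness score and ranking by it would be reasonable. That is an assumption to verify against the criterion code, not something this thread confirms.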

Can this code work with Windows?

I configured the environment in Windows, and typed "python main.py" to run this code. But it gave me an error like this:

File "main.py", line 536, in
main(args)
File "main.py", line 470, in main
use_fp16=args.use_fp16,
File "F:\SOTA_DETR\H-Deformable-DETR-master\engine.py", line 97, in train_one_epoch
outputs, targets, k_one2many, criterion, lambda_one2many
File "F:\SOTA_DETR\H-Deformable-DETR-master\engine.py", line 48, in train_hybrid
loss_dict_one2many = criterion(outputs_one2many, multi_targets)
File "D:\Anaconda3\envs\detr\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "F:\SOTA_DETR\H-Deformable-DETR-master\models\deformable_detr.py", line 478, in forward
indices = self.matcher(outputs_without_aux, targets)
File "D:\Anaconda3\envs\detr\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "F:\SOTA_DETR\H-Deformable-DETR-master\models\matcher.py", line 106, in forward
C = C.view(bs, num_queries, -1).cpu()
RuntimeError: cannot reshape tensor of 0 elements into shape [2, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

I wonder whether the code can run properly on Windows.
I really appreciate the code you have provided. Thank you!
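
The shape [2, 0, -1] in the message shows that the matcher was called with bs = 2 and num_queries = 0. With zero queries the cost matrix has no elements, so the size of the -1 dimension cannot be inferred. The sketch below mimics that inference in plain Python; `infer_last_dim` is a hypothetical helper, not a PyTorch or repository function, and the element counts are invented.

```python
def infer_last_dim(total_elems, bs, num_queries):
    """Mimic how a reshape to (bs, num_queries, -1) infers the last dimension."""
    known = bs * num_queries
    if known == 0:
        # Any size x satisfies 0 * x == 0, so the -1 dimension is ambiguous.
        raise ValueError(f"cannot infer -1 for shape [{bs}, {num_queries}, -1]")
    if total_elems % known != 0:
        raise ValueError("element count is not divisible by bs * num_queries")
    return total_elems // known


# 2 images, 300 queries, 7 targets in total: the last dimension is 7.
print(infer_last_dim(2 * 300 * 7, bs=2, num_queries=300))  # 7

# 2 images, 0 queries: same situation as in the traceback above.
try:
    infer_last_dim(0, bs=2, num_queries=0)
except ValueError as err:
    print(err)  # cannot infer -1 for shape [2, 0, -1]
```

One possible cause, to be checked against the arguments actually passed to main.py, is that the one-to-many branch was run with zero queries.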

LVIS configs

Hi, could you please provide the configs for the LVIS dataset? I would appreciate it!

Questions about DETR's performance

Thank you very much for your work.
Because the experiments were conducted on DDETR, I would like to know whether this training strategy also improves the original DETR.
I am also concerned about how much it increases GPU memory usage during training.
If you have the corresponding data, I would be grateful if you could share it.

The total batch size

Thanks for your awesome work!
But I cannot find one key piece of information in the paper or the released code: what is the total batch size of H-DETR, 32 or 16?

Thanks!

Question about the design of the self-attention mask

Hi,

Really nice job on the paper. I was excited to read it.

I was wondering if you could explain the attention masks a bit further. FYI, I am referring to your hybrid branch scheme, which I believe was used in the rest of the paper. If I understood the paper and your code correctly, the attention masks are only used to prevent information leakage between the two groups (one-to-one and one-to-many), which makes sense.

However, I don't understand why you did not also want to prevent information leakage between the queries within the one-to-many group. I understand that you repeat the ground truth K times so that multiple queries can match the same object. However, I would think that the self-attention performed in the decoder for the one-to-many group would naturally prevent multiple queries from selecting the same object, since the whole point of self-attention here is to remove duplicates. If you were to add an attention mask here, I would think that would resolve this issue.

I think I may be fundamentally misunderstanding something, as this clearly worked for you. Any insight would be appreciated. I have linked the code I looked at below.

Thanks,
Owen

H-Deformable-DETR/models/deformable_detr.py

Lines 208 to 217 in 5dea6f4

 
    # make attn mask
    """ attention mask to prevent information leakage
    """
    self_attn_mask = (
        torch.zeros([self.num_queries, self.num_queries,]).bool().to(src.device)
    )
    self_attn_mask[self.num_queries_one2one :, 0 : self.num_queries_one2one,] = True
    self_attn_mask[0 : self.num_queries_one2one, self.num_queries_one2one :,] = True

H-Deformable-DETR/models/deformable_transformer.py

Lines 473 to 480 in 5dea6f4

    # self attention
    q = k = self.with_pos_embed(tgt, query_pos)
    tgt2 = self.self_attn(
        q.transpose(0, 1),
        k.transpose(0, 1),
        tgt.transpose(0, 1),
        attn_mask=self_attn_mask,
    )[0].transpose(0, 1)
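
The block structure of the mask built above can be visualised without PyTorch. The sketch below reproduces the two slice assignments with nested lists; `build_self_attn_mask` and the sizes (5 queries, 2 of them one-to-one) are illustrative assumptions. It also assumes that `self_attn` follows the torch.nn.MultiheadAttention convention, where True in a boolean mask means the position may not be attended to.

```python
def build_self_attn_mask(num_queries, num_queries_one2one):
    """True marks a (query, key) pair that is blocked from attending."""
    mask = [[False] * num_queries for _ in range(num_queries)]
    for q in range(num_queries):
        for k in range(num_queries):
            q_is_one2one = q < num_queries_one2one
            k_is_one2one = k < num_queries_one2one
            # Block attention only across the two groups.
            mask[q][k] = q_is_one2one != k_is_one2one
    return mask


mask = build_self_attn_mask(num_queries=5, num_queries_one2one=2)
rows = ["".join("x" if blocked else "." for blocked in row) for row in mask]
for row in rows:
    print(row)
# ..xxx
# ..xxx
# xx...
# xx...
# xx...
```

Within each group every query can still attend to every other query, which is the behaviour the question above asks about. Only the off-diagonal blocks between the one-to-one and one-to-many groups are masked.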

Question about exporting the model to TorchScript

I exported the model to TorchScript format. When I run inference with the exported model, it only works on the image that I used for exporting; on other images it fails with the following error message:
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
return forward_call(*input, **kwargs)
Traceback (most recent call last):
File "/root/autodl-tmp/project/deploy/export_model.py", line 264, in
out1 = m(data)
File "/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/detectron2/export/flatten.py", line 9, in forward
def forward(self: torch.detectron2.export.flatten.TracingAdapter,
argument_1: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
_0, _1, _2, _3, _4, _5, _6, = (self.model).forward(argument_1, )
~~~~~~~~~~~~~~~~~~~ <--- HERE
return (_0, _1, _2, _3, _4, _5, 6)
File "code/torch/adet/modeling/text_spotter.py", line 23, in forward
batched_imgs = torch.unsqueeze(_7, 0)
x0 = torch.contiguous(batched_imgs)
_8, _9, _10, _11, = (_0).forward(x0, image_size, )
~~~~~~~~~~~ <--- HERE
_12 = torch.softmax(_9, -1)
prob = torch.sigmoid(torch.mean(_8, [-2]))
File "code/torch/adet/modeling/model/detection_transformer.py", line 50, in forward
_29 = getattr(self.input_proj, "1")
_30 = getattr(self.input_proj, "0")
_31 = (self.backbone).forward(x, image_size, )
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_32, _33, _34, _35, _36, _37, _38, _39, _40, _41, _42, _43, _44, _45, _46, _47, _48, _49, _50, _51, _52, _53, _54, _55, _56, _57, = _31
_58 = (_30).forward(_32, )
File "code/torch/adet/modeling/text_spotter.py", line 104, in forward
image_size: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]:
_61 = getattr(self, "1")
_62 = (getattr(self, "0")).forward(x, image_size, )
~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
_63, _64, _65, _66, _67, _68, _69, = _62
pos_embed = torch.to((_61).forward(_63, ), 6)
File "code/torch/adet/modeling/text_spotter.py", line 143, in forward
_92 = torch.slice(torch.slice(_91, 0, 0, 125), 1, 0, 138)
_93 = torch.view(CONSTANTS.c2, annotate(List[int], []))
94 = torch.copy(_92, torch.expand(_93, [125, 138]))
~~~~~~~~~~~ <--- HERE
masks_per_feature_level0 = torch.ones([_85, _86, _87], dtype=11, layout=None, device=torch.device("cpu"), pin_memory=False)
_95 = torch.select(masks_per_feature_level0, 0, 0)

Traceback of TorchScript, original code (most recent call last):
/root/autodl-tmp/project/adet/modeling/text_spotter.py(60): mask_out_padding
/root/autodl-tmp/project/adet/modeling/text_spotter.py(43): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(21): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/model/detection_transformer.py(168): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/autodl-tmp/project/adet/modeling/text_spotter.py(220): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(259):
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/detectron2/export/flatten.py(294): forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1039): _slow_forward
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/nn/modules/module.py(1051): _call_impl
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(952): trace_module
/root/miniconda3/envs/deepsolo/lib/python3.8/site-packages/torch/jit/_trace.py(735): trace
/root/autodl-tmp/project/deploy/export_model.py(125): export_tracing
/root/autodl-tmp/project/deploy/export_model.py(224):
/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py(18): execfile
/root/.pycharm_helpers/pydev/pydevd.py(1496): _exec
/root/.pycharm_helpers/pydev/pydevd.py(1489): run
/root/.pycharm_helpers/pydev/pydevd.py(2177): main
/root/.pycharm_helpers/pydev/pydevd.py(2195):
RuntimeError: The size of tensor a (50) must match the size of tensor b (125) at non-singleton dimension 0

an implementation of original DETR with hybrid matching

Are you offering an implementation of the original DETR with hybrid matching? I am interested in trying out its performance with DETR. Thanks a lot!

About the Hybrid-Layer implementation

Thank you for your great work and well-organized repo.
However, I cannot find any code for the Hybrid epoch or Hybrid layer variant schemes. How did you implement the Hybrid layer scheme? Are there two decoders whose losses are calculated separately?

implementation details for only one-to-many branch

Hi, I have a question about the ablation setting in Table 13.
There seem to be no implementation details for the setting that uses only one-to-many label assignment.
Did you set K to 6? Do the encoder outputs also use one-to-many label assignment, which is not used in hybrid DETR?

Thanks

MMCV_Custom

Hi, thank you for your work. I would like to ask how MMCV_Custom can be used in my own MMDetection project. I want to use AMP acceleration in MMDetection and wonder whether that is feasible.

About the generalization of the hybrid matching query setting

Hi, thanks for such great work! I wonder whether you have tested the generalization of the hybrid matching proposed in your paper. I tried to implement the hybrid matching queries on DINO-Deformable-DETR, and the performance dropped from 48.7 mAP to 46.5 mAP under the standard 1x schedule, which suggests that the hybrid matching strategy in your paper does not transfer easily to other DETR-based object detectors. I hope to get your reply.

Question about weight decay

I saw in your table that weight decay = 0.05 performs better than the usual weight decay = 1e-4,
but the table only covers the Swin backbone.
Would weight decay = 0.05 also be better with the ResNet backbone?
I hope for your reply.

