
Comments (7)

Fangyi-Chen commented on July 22, 2024

We also observe a (predictable) GPU memory increase when training any SQR-based method, because the increased number of queries flows through multiple decoding layers with backward gradients stored for each.

We did consider how to reduce the main negative effect of SQR, i.e., the additional training time. Since we used the A100-80GB version, GPU memory was not a concern for us. Different implementations can lead to very different GPU memory overhead. Our implementation is simple and easy to understand -- basically only a few lines of code -- but it is not the most efficient one. We would be glad to receive any advice on a faster implementation of SQR!
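For readers wondering what those "few lines" do conceptually: in SQR, each decoder stage processes not only the queries from the immediately preceding stage but also queries recollected from the stage before that, so the number of query groups grows Fibonacci-style (1, 2, 3, 5, 8, 13). A minimal, framework-free sketch of that recollection loop (all names are illustrative, not the repo's actual code):

```python
def decoder_with_sqr(layers, queries):
    """Sketch of selective query recollection (SQR).

    `layers` is a list of decoder-stage callables; `queries` is the
    initial query group. Each stage processes the previous stage's
    outputs plus the recollected outputs of the stage before that,
    so the group counts grow 1, 2, 3, 5, 8, 13 over six stages.
    """
    batches = [queries]   # query groups entering the current stage
    prev = [queries]      # outputs of the stage before the previous one
    all_outputs = []      # every group, flattened in stage order
    for layer in layers:
        out = [layer(q) for q in batches]
        all_outputs.extend(out)
        # recollect: the next stage sees the earlier stage's groups
        # followed by the current stage's groups
        batches, prev = prev + out, out
    return all_outputs
```

With six stages this yields 1 + 2 + 3 + 5 + 8 + 13 = 32 supervised query groups, which is exactly the extra set of activations (and stored gradients) that inflates training memory.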

I also noticed that Group DETR and H-DETR perform similar operations when handling groups of queries. I will take a look at their implementations to see whether they are faster; if so, I will update the implementation in this repo.

Finally, do not hesitate to ask if you have any further questions. Thanks!


powermano commented on July 22, 2024

SQR does consume some additional GPU memory, but not too much. I found it was mostly due to problems in my code; I have fixed them, and training GPU memory is now within an acceptable range.

On my own dataset, AP increased by 2.5 points compared to DN-DETR. Very good work! The DINO trick (look forward twice) can be combined with SQR to bring greater improvement.


SYH9905 commented on July 22, 2024


Hello, how can look forward twice be combined with SQR? In methods like DN and DAB the queries carry x, y, w, h parameters, and during the refine operation a box may be influenced jointly by several subsequent layers. How should this problem be solved?


SYH9905 commented on July 22, 2024

Previously, when applying look forward twice to DAB-DETR, each layer was isolated with detach().
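For context on the two schemes being discussed, a schematic sketch (hypothetical helper names, following the `inverse_sigmoid`/`bbox_embed` naming used elsewhere in this thread, not the actual DAB/DINO code): "look forward once" supervises a box built on the detach()-ed reference, isolating each layer, while "look forward twice" supervises a box built on the un-detached reference, so each layer's loss also reaches the previous layer's refinement.

```python
import torch

def inverse_sigmoid(x, eps=1e-5):
    x = x.clamp(eps, 1 - eps)
    return torch.log(x / (1 - x))

def refine_boxes(ref_detached, ref_undetached, hidden, bbox_embed,
                 look_forward_twice):
    """One decoder layer's box refinement step (schematic).

    ref_detached / ref_undetached: the previous layer's refined box,
    with and without detach(); bbox_embed predicts a logit-space offset.
    Returns the box to supervise plus the (detached, undetached) boxes
    handed to the next layer.
    """
    delta = bbox_embed(hidden)
    # the reference chained to the next layer is always the detached
    # one, so the decoder chain itself stays layer-isolated
    new_ref = (inverse_sigmoid(ref_detached) + delta).sigmoid()
    if look_forward_twice:
        # supervise a box built on the UN-detached previous reference:
        # this layer's loss also updates the previous layer's offset
        output = (inverse_sigmoid(ref_undetached) + delta).sigmoid()
    else:
        # look forward once: supervise the layer-isolated box
        output = new_ref
    return output, new_ref.detach(), new_ref
```

The question above is precisely about what happens when, under SQR, one recollected reference feeds several later layers through the un-detached path; the detach()-per-layer scheme avoids that interaction at the cost of shorter gradient paths.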


SYH9905 commented on July 22, 2024


Hello, will the code for SQR-DAB-DETR be released? I would like to know how to integrate SQR for the refine operation when DAB-DETR already uses look forward twice.


powermano commented on July 22, 2024


> Hello, how can look forward twice be combined with SQR? In methods like DN and DAB the queries carry x, y, w, h parameters, and during the refine operation a box may be influenced jointly by several subsequent layers. How should this problem be solved?

Following is my sqr_dn-detr code, based on the https://github.com/IDEA-Research/detrex repo.

    hidden_states, reference_boxes = self.transformer(
        features,
        img_masks,
        input_box_query,
        pos_embed,
        target=input_label_query,
        attn_mask=[attn_mask, None],  # None mask for cross attention
    )

    if self.training:
        outputs_classes, outputs_coords = [], []
        for qid in range(hidden_states.shape[0]):
            # version 2: use the corresponding reference to update the new reference point
            if qid < 1:
                lvl = 0
            elif qid < 3:
                lvl = qid - 1
            elif qid < 6:
                lvl = qid - 2
            elif qid < 11:
                lvl = qid - 4
            elif qid < 19:
                lvl = qid - 7
            elif qid < 32:
                lvl = qid - 12
            else:
                raise ValueError(f"unexpected qid: {qid}")

            reference = reference_boxes[lvl]

            # Calculate output coordinates and classes.
            reference = inverse_sigmoid(reference)
            anchor_box_offsets = self.bbox_embed(hidden_states[qid])
            outputs_coord = (reference + anchor_box_offsets).sigmoid()
            outputs_class = self.class_embed(hidden_states[qid])  # (layers, bs, num_q + dn_group * max_gt_per_img, 1)

            outputs_coords.append(outputs_coord)
            outputs_classes.append(outputs_class)

        outputs_class = torch.stack(outputs_classes)
        outputs_coord = torch.stack(outputs_coords)
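The hard-coded branches above appear to encode the Fibonacci stage layout of SQR: six decoder stages emit 1, 2, 3, 5, 8, 13 query groups (32 in total), and within each stage `lvl` is `qid` minus a per-stage constant. As a sanity check, the branch chain can be replaced by a table-driven lookup (the `SIZES`/`OFFSETS` names are mine, read off the code above, not the repo's):

```python
# per-stage query-group counts under SQR, and the per-stage constants
# subtracted in the branch chain above (assumed, read off the code)
SIZES = [1, 2, 3, 5, 8, 13]
OFFSETS = [0, 1, 2, 4, 7, 12]

def qid_to_lvl(qid):
    """Table-driven equivalent of the if/elif chain that maps a
    flattened query-group index to its reference-box index."""
    start = 0
    for size, offset in zip(SIZES, OFFSETS):
        if qid < start + size:
            return qid - offset
        start += size
    raise ValueError(f"unexpected qid: {qid}")
```

For every qid in range(32) this reproduces the chain exactly; note the offsets grow by 1, 1, 2, 3, 5, mirroring the Fibonacci recollection pattern.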


SYH9905 commented on July 22, 2024

> Following is my sqr_dn-detr code, based on the https://github.com/IDEA-Research/detrex repo. […]

Thank you for your help and answers.

