jbwang1997 / crosskd

CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection

License: Other

Dockerfile 0.11% Shell 1.07% Python 98.82%

crosskd's People

tjdhg456 zhijieshen-bjtu hepburn-forever jewelc92 dl-kd jhyuuu xiexing9212 lemo2012 gaocheng0520 guoyangzhao renshengji

crosskd's Issues

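Some confusion when implementing on YOLOv8

Thank you for your work. I am implementing the idea from your paper on YOLOv8, with distillation points set at reg_conv and cls_conv respectively, and I have run into the following odd problems:
1. The distillation loss converges very quickly, and the model overfits easily later in training. Accuracy first rises and then drops; from what I saw the gain is about 0.9%, which is below expectations (worse than other distillation methods I have tried before).
2. Because the teacher's and student's cls_conv channels cannot be aligned, I added a conv+bn to align the channels.
3. I did not apply the distillation loss to the features before SiLU. Does this affect accuracy?
4. Is aligning the statistics of the teacher and the student effective?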
...

Have you tried distilling with the CrossKD method on CenterNet (with a ResNet backbone)?

Clarification about a part of the paper

Hey, great work. Probably the best paper on this subject so far. Discusses and compares with all the major previous works.

Just wanted to know what is meant by this part in the paper:

Compared to directly closing the predictions between the teacher-student pair, CrossKD allows part of the student’s detection head to be only relative with detection losses, resulting in a better optimization towards ground-truth targets

Specifically the part about "to be only relative with detection losses". What does relative mean here and which "part of the student's detection head" is being relative with detection losses?

Thanks.
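
One possible reading of the quoted sentence, sketched as a framework-free toy in Python: the student's intermediate head features are passed through the later layers of the teacher's head, and only that cross-head prediction is compared with the teacher's prediction. Under this reading, the student layers after the split point are never on the distillation path, so they are driven by the detection loss alone. All names here (`make_layer`, `crosskd_losses`, `split`) are hypothetical, the layers are scalar affine maps with ReLU rather than convolutions, and mean squared error stands in for the real detection and distillation losses. This is an illustration of the data flow, not the repository's code.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_layer(weight: float, bias: float) -> Layer:
    """Hypothetical stand-in for one conv layer of a detection head."""
    def layer(x: Vector) -> Vector:
        # Element-wise ReLU(weight * x + bias).
        return [max(0.0, weight * v + bias) for v in x]
    return layer


def run_layers(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the given layers in order."""
    for layer in layers:
        x = layer(x)
    return x


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error, used here as a placeholder for both losses."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def crosskd_losses(
    student_head: Sequence[Layer],
    teacher_head: Sequence[Layer],
    split: int,
    student_feat: Vector,
    teacher_feat: Vector,
    target: Vector,
) -> Tuple[float, float]:
    # Student's own prediction: compared with the ground-truth target only.
    student_pred = run_layers(student_head, student_feat)
    # Teacher's prediction: serves as the distillation target.
    teacher_pred = run_layers(teacher_head, teacher_feat)
    # Cross-head prediction: the student's first `split` layers,
    # followed by the teacher's remaining layers.
    intermediate = run_layers(student_head[:split], student_feat)
    cross_pred = run_layers(teacher_head[split:], intermediate)
    det_loss = mse(student_pred, target)
    kd_loss = mse(cross_pred, teacher_pred)
    return det_loss, kd_loss


student = [make_layer(0.5, 0.0), make_layer(2.0, 0.0)]
teacher = [make_layer(1.0, 0.0), make_layer(3.0, 0.0)]
det, kd = crosskd_losses(student, teacher, 1, [1.0, 2.0], [1.0, 2.0], [2.0, 4.0])
print(det, kd)  # 2.5 5.625
```

In this toy, replacing the student's last layer changes `det` but leaves `kd` unchanged, which is the sense in which that part of the head would be tied only to the detection loss.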

Hello, do you have any configuration for YOLO?

About the figure in paper

Hi
I am confused about Fig. 2 in the paper. I understand the statement 'In the green circled areas, the distillation targets predicted by the teacher have a large discrepancy with the ground-truth targets assigned to the student.' However, I am curious how the marks for the ground-truth targets and the distillation targets were generated.

Hello, I understand that this figure shows that the teacher and the ground truth can give the student conflicting signals. But I would like to ask how Fig. 2 was drawn.

the log of FCOS

Thank you for your work. Could you please provide the FCOS training log so that I can verify the correctness of my experiment?

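Inquiry about applying the CrossKD method to two-stage object detection frameworks

I would like to ask some questions about the CrossKD method proposed in your paper. I am very interested in this method, especially in its application to two-stage object detection frameworks such as Faster R-CNN.

I read your paper and was impressed by how it addresses the inefficiency of prediction mimicking methods. Do you think the CrossKD method can be applied directly to two-stage object detection frameworks, such as Faster R-CNN, to improve the detection performance of the student model?

I would be very grateful if you could share some insights or suggestions on this.

Thank you!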

Was RetinaNet R101-R50 trained with 1x schedule or 2x?

The config in this repo uses a 2x schedule:
https://github.com/jbwang1997/CrossKD/blob/master/configs/crosskd/crosskd_r50_retinanet_r101_fpn_2x_coco.py

But the paper is not clear:

Besides performing CrossKD on GFL, we select three commonly used detectors, i.e., RetinaNet [32], FCOS [49], and ATSS [61], to investigate the effectiveness of CrossKD. We strictly follow the student settings for training and reference the teacher and student results from the MMDetection model zoo. The results are presented in Tab. 8.

About configuration of SwinT

Thank you very much for your outstanding work, which I find very interesting.
Could you please provide the name of the SwinT-based CrossKD configuration file? Thank you for your help.

Does this method require teacher head and student head to have the same number of input channels?

Hi,

Thanks for the paper and code. I get the idea of feeding the student's backbone features to the teacher's prediction head. My question is: does this require the student's backbone to have the same number of output channels as the teacher's (which is rarely the case for networks of different sizes)? Also, how does the method perform if the student's and teacher's backbones have different numbers of output channels and the channels have to be aligned in some way, e.g. by adding a conv layer? Do you have any empirical results on this? Thank you for your help!
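
A minimal sketch of why the question arises, in plain Python with no deep-learning framework: a layer that expects a fixed number of input channels rejects a feature map with a different channel count, and a 1x1 convolution is one generic way to project between the two. The helper `conv1x1`, the weights, and the channel counts are all made up for illustration. Whether CrossKD needs or uses such an adapter, and how it affects accuracy, is exactly what this issue asks and is not answered here.

```python
from typing import List

Pixel = List[float]         # one spatial location; length = number of channels
Weight = List[List[float]]  # 1x1 conv weights, shape (out_channels, in_channels)


def conv1x1(pixels: List[Pixel], weight: Weight) -> List[Pixel]:
    """Apply a bias-free 1x1 convolution: a per-pixel linear map over channels."""
    in_channels = len(weight[0])
    out = []
    for px in pixels:
        if len(px) != in_channels:
            raise ValueError(f"expected {in_channels} input channels, got {len(px)}")
        out.append([sum(w * v for w, v in zip(row, px)) for row in weight])
    return out


# Hypothetical teacher head layer: 4 input channels -> 1 output channel.
teacher_layer = [[1.0, 1.0, 1.0, 1.0]]

# Hypothetical student feature map: 2 pixels with 2 channels each.
student_feat = [[1.0, 2.0], [3.0, 4.0]]

try:
    conv1x1(student_feat, teacher_layer)
except ValueError as err:
    print("direct forward fails:", err)

# Hypothetical adapter: 2 -> 4 channels (it simply duplicates each student channel).
adapter = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
aligned = conv1x1(student_feat, adapter)
print(conv1x1(aligned, teacher_layer))  # [[6.0], [14.0]]
```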

Possible ICCV2023 Violation

Hi,

Congratulations on your impressive and insightful work in the field of object detection.

While browsing through your GitHub repository, I noticed that your README file mentions a paper being under review for ICCV 2023. I thought it might be worth mentioning that, according to ICCV's double-blind policy, which can be found here, you may want to remove this information to maintain the anonymity of the submission process.

Wishing you all the best :).

Some details about processing on different feature dimensions

Great work! I want to find out how the teacher model can process features of different dimensions, i.e., the intermediate student features (which differ mainly in the channel dimension). As far as I know, a mismatch in channel dimensions may prevent the teacher model from performing forward propagation properly. Which part of the code handles this?
Looking forward to your reply very much!