lyuwenyu / rt-detr

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥

License: Apache License 2.0

Python 96.79% C++ 0.20% Cuda 3.02%

rt-detr's Introduction

Hi there 👋

🌱 I'm currently working on computer vision and multimodal large language models

RTDETR, RTDETRv2, PPYOLOE, PPYOLOE+, PicoDet and PPYOLOv2
PaddleDetection, PaddleMIX

🔭 Google scholar

📬 Reach out to me: [email protected]


rt-detr's Issues

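When the RT-DETR backbone is ResNet50 or ResNet101, eval mAP stays at 0 while training on COCO2017

Hello. When I train RT-DETR with a ResNet50 backbone on COCO2017, the mAP reported by eval during training stays at 0 even though the loss keeps decreasing. pretrain_weights loads ResNet50_vd_ssld_v2_pretrained.pdparams, and no other parameters were changed. In addition, the bbox.json file saved at each eval contains 300 detection boxes per image, but every box has the same class, box position, and score. How can I fix this? I would really appreciate an answer.

My training environment is below:
I am using the PaddlePaddle 2.4.2 image with CUDA 11.8.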
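The symptom described above (300 boxes per image, all identical) can be confirmed quickly from bbox.json. The following is only a diagnostic sketch: it assumes the file follows the usual COCO results layout (a list of objects with `image_id`, `category_id`, `bbox`, `score`), and the function name and the sample data are hypothetical.

```python
import json
from collections import defaultdict


def count_distinct_detections(results):
    """Return {image_id: (total detections, distinct detections)}."""
    per_image = defaultdict(list)
    for det in results:
        # Two detections count as identical when class, box and score all match.
        key = (det["category_id"], tuple(det["bbox"]), det["score"])
        per_image[det["image_id"]].append(key)
    return {img: (len(keys), len(set(keys))) for img, keys in per_image.items()}


# Hypothetical miniature of a degenerate results file (assumed COCO results keys).
SAMPLE_JSON = """
[
  {"image_id": 1, "category_id": 1, "bbox": [0.0, 0.0, 10.0, 10.0], "score": 0.5},
  {"image_id": 1, "category_id": 1, "bbox": [0.0, 0.0, 10.0, 10.0], "score": 0.5},
  {"image_id": 2, "category_id": 3, "bbox": [5.0, 5.0, 20.0, 20.0], "score": 0.9}
]
"""

summary = count_distinct_detections(json.loads(SAMPLE_JSON))
for image_id, (total, distinct) in sorted(summary.items()):
    print(f"image {image_id}: {total} detections, {distinct} distinct")
```

For a real run, replace `json.loads(SAMPLE_JSON)` with `json.load(open("bbox.json"))`. An image whose distinct count is 1 while its total is 300 matches the behaviour reported here.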

(InvalidArgument) Sum of Attr(num_or_sections) must be equal to the input's size along the split dimension.

Traceback (most recent call last):
File "tools/train.py", line 183, in <module>
main()
File "tools/train.py", line 179, in main
run(FLAGS, cfg)
File "tools/train.py", line 135, in run
trainer.train(FLAGS.eval)
File "/home/gy/workspace/work/RT-DETR/rtdetr_paddle/ppdet/engine/trainer.py", line 377, in train
outputs = model(data)
File "/home/gy/miniconda3/envs/detr-like/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
return self.forward(*inputs, **kwargs)
File "/home/gy/workspace/work/RT-DETR/rtdetr_paddle/ppdet/modeling/architectures/meta_arch.py", line 60, in forward
out = self.get_loss()
File "/home/gy/workspace/work/RT-DETR/rtdetr_paddle/ppdet/modeling/architectures/detr.py", line 113, in get_loss
return self._forward()
File "/home/gy/workspace/work/RT-DETR/rtdetr_paddle/ppdet/modeling/architectures/detr.py", line 87, in _forward
out_transformer = self.transformer(body_feats, pad_mask, self.inputs)
File "/home/gy/miniconda3/envs/detr-like/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py", line 1012, in __call__
return self.forward(*inputs, **kwargs)
File "/home/gy/workspace/work/RT-DETR/rtdetr_paddle/ppdet/modeling/transformers/rtdetr_transformer.py", line 419, in forward
get_contrastive_denoising_training_group(gt_meta,
File "/home/gy/workspace/work/RT-DETR/rtdetr_paddle/ppdet/modeling/transformers/utils.py", line 296, in get_contrastive_denoising_training_group
dn_positive_idx = paddle.split(dn_positive_idx,
File "/home/gy/miniconda3/envs/detr-like/lib/python3.8/site-packages/paddle/tensor/manipulation.py", line 1982, in split
return _C_ops.split(input, num_or_sections, dim)
ValueError: (InvalidArgument) Sum of Attr(num_or_sections) must be equal to the input's size along the split dimension. But received Attr(num_or_sections) = [84], input(X)'s shape = [2166784], Attr(dim) = 0.
[Hint: Expected sum_of_section == input_axis_dim, but received sum_of_section:84 != input_axis_dim:2166784.] (at /paddle/paddle/phi/infermeta/unary.cc:3285)

print(dn_positive_idx.shape) outputs [2166784],
print([n * num_group for n in num_gts]) outputs [84]
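The error states an invariant: the section sizes passed to a split must add up to the length of the input along the split axis. The sketch below restates that check in plain Python so the mismatch (84 versus 2166784) is caught before the split call. The helper name is hypothetical, and the values of `num_gts` and `num_group` are made up to reproduce the reported `[84]`; the real values are not given in the report.

```python
def check_split_sections(input_len, sections):
    """Raise ValueError unless the sections exactly cover the split axis."""
    total = sum(sections)
    if total != input_len:
        raise ValueError(
            f"sum of sections is {total}, but the input has "
            f"{input_len} elements along the split axis"
        )
    return True


# Hypothetical values chosen only so that the product matches the reported [84].
num_gts = [4]
num_group = 21
sections = [n * num_group for n in num_gts]

try:
    check_split_sections(2166784, sections)
    message = ""
except ValueError as exc:
    message = str(exc)

print(sections)
print(message)
```

Placing such a check just before the split makes it easier to see which side is wrong: here the index tensor is far larger than the number of ground-truth boxes times the group count would suggest.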

Collection of questions/discussions/usage

Star this repo, keep following news

finetune doc

https://github.com/lyuwenyu/RT-DETR/tree/main/rtdetr_paddle#finetune

discussions

RT-DETR training logs

paddle	torch
rtdetr_r18vd_6x_coco_log.txt
rtdetr_r34vd_6x_coco_log.txt
rtdetr_r50vd_6x_coco_log.txt
rtdetr_r101vd_6x_coco_log.txt
rtdetr_hgnetv2_l_6x_coco_log.txt
rtdetr_hgnetv2_x_6x_coco_log.txt
rtdetr_r18vd_1x_objects365_log.txt
rtdetr_r50vd_1x_objects365_log.txt
rtdetr_r50vd_2x_coco_objects365_log.txt
rtdetr_r101vd_1x_objects365_log.txt
rtdetr_r101vd_2x_coco_objects365_log.txt

coco

rtdetr_r18vd_6x_coco_log.txt
rtdetr_r34vd_6x_coco_log.txt
rtdetr_r50vd_6x_coco_log.txt
rtdetr_r101vd_6x_coco_log.txt
rtdetr_hgnetv2_l_6x_coco_log.txt
rtdetr_hgnetv2_x_6x_coco_log.txt

objects365

rtdetr_r50vd_1x_objects365_log.txt
rtdetr_r50vd_2x_coco_objects365_log.txt
rtdetr_r101vd_1x_objects365_log.txt
rtdetr_r101vd_2x_coco_objects365_log.txt
rtdetr_r18vd_1x_objects365_log.txt
rtdetr_r18vd_5x_coco_objects365_log.txt

torch

rtdetr_r18vd_6x_coco_pytorch_log.txt

Star this project to keep following updates

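Channel count problem

Hello, I replaced the 3*3 convolution in the stage1 hgblock with a different convolution method, and now I get a channel count error. No matter how I change it, I get the same error. I hope to hear back from you.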
ValueError: (InvalidArgument) The number of input's channels should be equal to filter's channels * groups for Op(Conv). But received: the input's channels is 64, the input's shape is [4, 64, 136, 136]; the filter's channels is 128, the filter's shape is [64, 128, 1, 1]; the groups is 1, the data_format is NCHW. The error may come from wrong data_format setting.

Small object detection

Is this suitable for small object detection, for example in aerial imagery?

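Reproducing the RT-DETR training process

Hello, I would like to reproduce the whole training and testing process, but training alone takes at least two days on four RTX 3090 cards. I would like to change the number of epochs, but would that affect convergence? If the epoch count can be changed, what range is reasonable?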

res2net replaces resnet resulting in degradation of accuracy

Dear Author, Thank you for your excellent work.
The results of rt-detr have attracted a wave of enthusiasts to try it out, and I am no exception.
I was reading the RT-DETR-R50 code when I had the whim to try replacing resnet with res2net in the backbone, but I was surprised to find that it actually lost 10% accuracy!
I don't quite understand why this would happen, shouldn't res2net supposedly work better than resnet ......
Looking forward to your explanation!

PyTorch training code?

Is there PyTorch training code?

Questions, discussion, and usage notes about PP-YOLOE can be posted here

Pytorch checkpoint release

Hi, thanks for releasing the PyTorch code. I checked the GitHub repo but I can't find the checkpoints for PyTorch, only for Paddle.
Are you planning to release the PyTorch checkpoints? Thank you.

About the channel count problem

ValueError: (InvalidArgument) The number of input's channels should be equal to filter's channels * groups for Op(Conv). But received: the input's channels is 64, the input's shape is [4, 64, 160, 160]; the filter's channels is 128, the filter's shape is [64, 128, 1, 1]; the groups is 1, the data_format is NCHW. The error may come from wrong data_format setting.
[Hint: Expected input_channels == filter_dims[1] * groups, but received input_channels:64 != filter_dims[1] * groups:128.] (at ../paddle/phi/infermeta/binary.cc:534)
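The hint spells out the rule being violated: the input's channel count must equal the filter's second dimension multiplied by `groups`. A small sketch of that check follows, using the shapes from the error message. The helper names are hypothetical, and the filter layout `[out_channels, in_channels_per_group, kh, kw]` is assumed from the NCHW message above.

```python
def expected_input_channels(filter_shape, groups=1):
    """Channels a conv with this filter expects on its input (NCHW layout)."""
    return filter_shape[1] * groups


def conv_channels_match(input_shape, filter_shape, groups=1):
    """True when input_shape[1] equals filter_shape[1] * groups."""
    return input_shape[1] == expected_input_channels(filter_shape, groups)


# Shapes copied from the error message.
input_shape = [4, 64, 160, 160]
filter_shape = [64, 128, 1, 1]

print(conv_channels_match(input_shape, filter_shape))  # False
print(expected_input_channels(filter_shape))           # 128
```

Read this way, the 1x1 convolution that fails expects 128 input channels and produces 64, while the layer feeding it produces only 64. That suggests the replacement layer changed the number of channels being passed on, though the report does not include the modified code to confirm this.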

GPU memory doesn't release

Dear authors,

I used RT-DETR to train on my custom datasets. However, GPU memory is not released when I interrupt the training process. Can you suggest any solutions for this issue?

Best,
Nam

Roughly when will the PyTorch version be released?

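Code implementation in paddle_pytorch

Dear author, I noticed that paddle_pytorch contains a PyTorch version of resnet.py.
I see that the official steps for paddle_pytorch simply run python tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml
But I found that this is the only code file in paddle_pytorch. What is going on?
Looking forward to your reply!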

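Does training RT-DETR for more epochs (>72) still improve performance?

After training RT-DETR for more epochs (>72), does detection performance still improve? Are there training logs I could refer to? Could training for more epochs cause overfitting?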

Training set accuracy

How can I see the accuracy on the training set during training?

Could you provide pretrained weights in pt format?

Could you provide pretrained weights in pt format? Everything in the model zoo is in pdparams format.

Issue with training on user data

I get this error message when I attempt to train. I have all my images in train and val folders, and my annotations in coco format as json files. I don't know why I get this error message.

python tools/train.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml
Traceback (most recent call last):
File "C:\Users\copyc\AiExperts\Eide\RT-DETR\rtdetr_pytorch\tools\train.py", line 9, in <module>
import src.misc.dist as dist
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\copyc\AiExperts\Eide\RT-DETR\rtdetr_pytorch\tools\..\src\__init__.py", line 2, in <module>
from . import data
^^^^^^^^^^^^^^^^^^
File "C:\Users\copyc\AiExperts\Eide\RT-DETR\rtdetr_pytorch\tools\..\src\data\__init__.py", line 6, in <module>
from .transforms import *
File "C:\Users\copyc\AiExperts\Eide\RT-DETR\rtdetr_pytorch\tools\..\src\data\transforms.py", line 30, in <module>
ConvertDtype = register(T.ConvertDtype)
^^^^^^^^^^^^^^
AttributeError: module 'torchvision.transforms.v2' has no attribute 'ConvertDtype'. Did you mean: 'ConvertImageDtype'?
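The traceback shows that the installed `torchvision.transforms.v2` has no `ConvertDtype`, and the interpreter itself suggests `ConvertImageDtype`. One way to tolerate such renames is to look up the first name that exists. The sketch below shows the lookup on a stand-in object so it runs without torchvision; the helper name is hypothetical, and whether the alternative transform behaves identically for a given torchvision version is an assumption that needs checking against its documentation.

```python
from types import SimpleNamespace


def first_available(module, names):
    """Return the first attribute of `module` found among `names`."""
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise AttributeError(f"none of {names} found on {module!r}")


# Stand-in for torchvision.transforms.v2 on an install that lacks ConvertDtype.
fake_v2 = SimpleNamespace(ConvertImageDtype="ConvertImageDtype-class")

resolved = first_available(fake_v2, ["ConvertDtype", "ConvertImageDtype"])
print(resolved)
```

In transforms.py the same lookup could be applied to the real module `T` before the `register(...)` call. Installing the torchvision version that the repository's requirements specify is the other obvious route.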

Pytorch RTDETR

code

https://github.com/lyuwenyu/RT-DETR/tree/main/rtdetr_pytorch

weight

rtdetr_r18vd_5x_coco_objects365_from_paddle.pth
rtdetr_r18vd_1x_objects365_from_paddle.pth

rtdetr_r50vd_6x_coco_from_paddle.pth
rtdetr_r50vd_2x_coco_objects365_from_paddle.pth
rtdetr_r50vd_1x_objects365_from_paddle.pth

rtdetr_r101vd_6x_coco_from_paddle.pth
rtdetr_r101vd_2x_coco_objects365_from_paddle.pth
rtdetr_r101vd_1x_objects365_from_paddle.pth

logs

log_rtdetr_r50vd_coco_pytorch.txt

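Question about improving the backbone

Dear author, first of all, thank you very much for your work.
I tried replacing some convolution layers in the backbone with other convolutions, but the accuracy is either unchanged or lower. I am quite frustrated. Has the combination of convolutions in the backbone already hit its ceiling? I would be grateful for your advice.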

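Question about some parameter settings in rtdetr_r50vd.yml

Hello, I see that rtdetr_r50vd.yml has two parameters set to 300: RTDETRTransformer: num_queries: 300 and DETRPostProcess: num_top_queries: 300.
What does each of these two 300s mean? The paper says that IoU-aware query selection picks the top K scores, with K=300. Which parameter in the code corresponds to K=300 in the paper, num_queries or num_top_queries? And how many predicted boxes does the final prediction head produce for matching against the GT?
Looking forward to your reply.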

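The charts in the paper are beautiful! How were they drawn?

Hello, thank you for open-sourcing the code. The charts in the paper are also very attractive. Regarding the IoU-aware query (similar to VFL) part, how was the scatter plot in Figure 6 drawn? Is there a script for plotting it, and if so, where can I find it? Looking forward to your reply!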

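Problem running inference with an ONNX model exported from RT-DETR

Hello, I trained object detection with focalNet as the RT-DETR backbone, and the results are quite good. I then converted best_model.pdparams to ONNX and ran inference with deploy/third_engine/onnx/infer.py, and ran into two problems:
(1) Is there code for running inference on the GPU? My own modification ran into some problems.
(2) I added two heads to RT-DETR for two additional classification and regression tasks and changed the corresponding code. Inference with tools/infer.py outputs the results of my new heads, but inference with deploy/third_engine/onnx/infer.py outputs only six values (class, score, xmin, ymin, xmax, ymax) and not the results of the new heads. I suspect the predictions keep only the first 6 dimensions. How can I output all of the predictions?
Thanks~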
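The post describes each output row as six values in the order class, score, xmin, ymin, xmax, ymax. The sketch below decodes rows of that shape and keeps any further columns instead of discarding them. It uses plain lists in place of the real ONNX output, the function name and threshold are hypothetical, and it cannot say whether the exported graph actually emits the extra head outputs.

```python
def decode_detections(rows, score_threshold=0.5):
    """Turn [class, score, xmin, ymin, xmax, ymax, *extra] rows into dicts."""
    detections = []
    for row in rows:
        class_id, score, xmin, ymin, xmax, ymax = row[:6]
        if score < score_threshold:
            continue
        detections.append({
            "class_id": int(class_id),
            "score": float(score),
            "box": (xmin, ymin, xmax, ymax),
            # Columns past the sixth (e.g. extra head outputs) are preserved.
            "extra": list(row[6:]),
        })
    return detections


# Made-up rows: the second falls below the threshold, the third has two extra columns.
rows = [
    [0, 0.92, 10.0, 20.0, 110.0, 220.0],
    [3, 0.10, 0.0, 0.0, 5.0, 5.0],
    [1, 0.75, 30.0, 40.0, 60.0, 80.0, 2.0, 0.4],
]

for det in decode_detections(rows):
    print(det)
```

If the rows coming out of the ONNX model never have more than six columns, the extra heads were most likely dropped at export time and not in the inference script, so the export configuration would be the place to look.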

Decreased accuracy when training with non-square inputs

Hello, thank you for the great work.

I have conducted training on both rectangular and square size inputs on my custom dataset. However, the average accuracy of the rectangular trained model is significantly lower. Below is the configuration that was used for training & evaluation:

Square TrainReader:

batch_transforms:
    - BatchRandomResize: {target_size: [480, 512, 544, 576, 608, 640, 640, 640, 672, 704, 736, 768, 800], random_size: True, random_interp: True, keep_ratio: False}

Rect TrainReader:

batch_transforms:
    - BatchRandomResize: {target_size: [352, 608], random_size: False, random_interp: True, keep_ratio: False}

Square EvalReader:

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [608, 608], keep_ratio: False, interp: 2} # target_size: (h, w)

Rect EvalReader:

EvalReader:
  sample_transforms:
    - Decode: {}
    - Resize: {target_size: [352, 608], keep_ratio: False, interp: 2} # target_size: (h, w)

I've read through #13, but it doesn't seem to be related to training with rectangular inputs. When 'random_size: True,' one of the numbers in the target_size is used for square resizing.
I also tried switching w and h as suggested, but the results were similar.
Could I get any help? I would like to know if there's something I'm doing wrong or if a similar issue has been encountered before.
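Because the configs above use keep_ratio: False with target_size given as (h, w), each axis is scaled independently, so one quick sanity check is how much a given target shape stretches the source images. The sketch below computes the per-axis scale and the resulting aspect distortion. The 1080x1920 source size is purely hypothetical, since the dataset's image size is not stated in the post.

```python
def resize_scales(orig_hw, target_hw):
    """Return (scale_y, scale_x, distortion) for a resize without keep_ratio."""
    orig_h, orig_w = orig_hw
    target_h, target_w = target_hw
    scale_y = target_h / orig_h
    scale_x = target_w / orig_w
    # distortion == 1.0 means the aspect ratio is preserved exactly.
    distortion = max(scale_y, scale_x) / min(scale_y, scale_x)
    return scale_y, scale_x, distortion


orig_hw = (1080, 1920)  # hypothetical source frame, (h, w)

for target_hw in [(352, 608), (608, 352), (608, 608)]:
    scale_y, scale_x, distortion = resize_scales(orig_hw, target_hw)
    print(target_hw, round(scale_y, 4), round(scale_x, 4), round(distortion, 3))
```

Running the same calculation on the real image sizes shows whether the rectangular setting, or its swapped variant, distorts objects more than the square one does. That is one possible contributor to an accuracy gap, not a confirmed cause.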

TensorRT Inference

Hello,
I have an issue launching the converted TRT model.
I converted the default model step by step like this:

python tools/export_model.py -c configs/rtdetr/rtdetr_r50vd_6x_coco.yml \
              -o weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_r50vd_6x_coco.pdparams trt=True \
              --output_dir=output_inference

paddle2onnx --model_dir=./output_inference/rtdetr_r50vd_6x_coco/ \
            --model_filename model.pdmodel  \
            --params_filename model.pdiparams \
            --opset_version 16 \
            --save_file rtdetr_r50vd_6x_coco.onnx

The last step was tricky: I downloaded the TensorRT GA archive and built trtexec inside it.

LD_LIBRARY_PATH=TensorRT-8.6.1.6/lib/ TensorRT-8.6.1.6/bin/trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx --workspace=4096 --shapes=image:1x3x640x640 --saveEngine=rtdetr_r50vd_6x_coco.trt --avgRuns=10 --fp16

Convert Log

&&&& RUNNING TensorRT.trtexec [TensorRT v8601] # TensorRT-8.6.1.6/bin/trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx --workspace=4096 --shapes=image:1x3x640x640 --saveEngine=rtdetr_r50vd_6x_coco.trt --avgRuns=10 --fp16
[08/29/2023-19:15:08] [W] --workspace flag has been deprecated by --memPoolSize flag.
[08/29/2023-19:15:08] [I] === Model Options ===
[08/29/2023-19:15:08] [I] Format: ONNX
[08/29/2023-19:15:08] [I] Model: ./rtdetr_r50vd_6x_coco.onnx
[08/29/2023-19:15:08] [I] Output:
[08/29/2023-19:15:08] [I] === Build Options ===
[08/29/2023-19:15:08] [I] Max batch: explicit batch
[08/29/2023-19:15:08] [I] Memory Pools: workspace: 4096 MiB, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default
[08/29/2023-19:15:08] [I] minTiming: 1
[08/29/2023-19:15:08] [I] avgTiming: 8
[08/29/2023-19:15:08] [I] Precision: FP32+FP16
[08/29/2023-19:15:08] [I] LayerPrecisions: 
[08/29/2023-19:15:08] [I] Layer Device Types: 
[08/29/2023-19:15:08] [I] Calibration: 
[08/29/2023-19:15:08] [I] Refit: Disabled
[08/29/2023-19:15:08] [I] Version Compatible: Disabled
[08/29/2023-19:15:08] [I] TensorRT runtime: full
[08/29/2023-19:15:08] [I] Lean DLL Path: 
[08/29/2023-19:15:08] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[08/29/2023-19:15:08] [I] Exclude Lean Runtime: Disabled
[08/29/2023-19:15:08] [I] Sparsity: Disabled
[08/29/2023-19:15:08] [I] Safe mode: Disabled
[08/29/2023-19:15:08] [I] Build DLA standalone loadable: Disabled
[08/29/2023-19:15:08] [I] Allow GPU fallback for DLA: Disabled
[08/29/2023-19:15:08] [I] DirectIO mode: Disabled
[08/29/2023-19:15:08] [I] Restricted mode: Disabled
[08/29/2023-19:15:08] [I] Skip inference: Disabled
[08/29/2023-19:15:08] [I] Save engine: rtdetr_r50vd_6x_coco.trt
[08/29/2023-19:15:08] [I] Load engine: 
[08/29/2023-19:15:08] [I] Profiling verbosity: 0
[08/29/2023-19:15:08] [I] Tactic sources: Using default tactic sources
[08/29/2023-19:15:08] [I] timingCacheMode: local
[08/29/2023-19:15:08] [I] timingCacheFile: 
[08/29/2023-19:15:08] [I] Heuristic: Disabled
[08/29/2023-19:15:08] [I] Preview Features: Use default preview flags.
[08/29/2023-19:15:08] [I] MaxAuxStreams: -1
[08/29/2023-19:15:08] [I] BuilderOptimizationLevel: -1
[08/29/2023-19:15:08] [I] Input(s)s format: fp32:CHW
[08/29/2023-19:15:08] [I] Output(s)s format: fp32:CHW
[08/29/2023-19:15:08] [I] Input build shape: image=1x3x640x640+1x3x640x640+1x3x640x640
[08/29/2023-19:15:08] [I] Input calibration shapes: model
[08/29/2023-19:15:08] [I] === System Options ===
[08/29/2023-19:15:08] [I] Device: 0
[08/29/2023-19:15:08] [I] DLACore: 
[08/29/2023-19:15:08] [I] Plugins:
[08/29/2023-19:15:08] [I] setPluginsToSerialize:
[08/29/2023-19:15:08] [I] dynamicPlugins:
[08/29/2023-19:15:08] [I] ignoreParsedPluginLibs: 0
[08/29/2023-19:15:08] [I] 
[08/29/2023-19:15:08] [I] === Inference Options ===
[08/29/2023-19:15:08] [I] Batch: Explicit
[08/29/2023-19:15:08] [I] Input inference shape: image=1x3x640x640
[08/29/2023-19:15:08] [I] Iterations: 10
[08/29/2023-19:15:08] [I] Duration: 3s (+ 200ms warm up)
[08/29/2023-19:15:08] [I] Sleep time: 0ms
[08/29/2023-19:15:08] [I] Idle time: 0ms
[08/29/2023-19:15:08] [I] Inference Streams: 1
[08/29/2023-19:15:08] [I] ExposeDMA: Disabled
[08/29/2023-19:15:08] [I] Data transfers: Enabled
[08/29/2023-19:15:08] [I] Spin-wait: Disabled
[08/29/2023-19:15:08] [I] Multithreading: Disabled
[08/29/2023-19:15:08] [I] CUDA Graph: Disabled
[08/29/2023-19:15:08] [I] Separate profiling: Disabled
[08/29/2023-19:15:08] [I] Time Deserialize: Disabled
[08/29/2023-19:15:08] [I] Time Refit: Disabled
[08/29/2023-19:15:08] [I] NVTX verbosity: 0
[08/29/2023-19:15:08] [I] Persistent Cache Ratio: 0
[08/29/2023-19:15:08] [I] Inputs:
[08/29/2023-19:15:08] [I] === Reporting Options ===
[08/29/2023-19:15:08] [I] Verbose: Disabled
[08/29/2023-19:15:08] [I] Averages: 10 inferences
[08/29/2023-19:15:08] [I] Percentiles: 90,95,99
[08/29/2023-19:15:08] [I] Dump refittable layers:Disabled
[08/29/2023-19:15:08] [I] Dump output: Disabled
[08/29/2023-19:15:08] [I] Profile: Disabled
[08/29/2023-19:15:08] [I] Export timing to JSON file: 
[08/29/2023-19:15:08] [I] Export output to JSON file: 
[08/29/2023-19:15:08] [I] Export profile to JSON file: 
[08/29/2023-19:15:08] [I] 
[08/29/2023-19:15:08] [I] === Device Information ===
[08/29/2023-19:15:08] [I] Selected Device: NVIDIA GeForce RTX 2080 Ti
[08/29/2023-19:15:08] [I] Compute Capability: 7.5
[08/29/2023-19:15:08] [I] SMs: 68
[08/29/2023-19:15:08] [I] Device Global Memory: 11011 MiB
[08/29/2023-19:15:08] [I] Shared Memory per SM: 64 KiB
[08/29/2023-19:15:08] [I] Memory Bus Width: 352 bits (ECC disabled)
[08/29/2023-19:15:08] [I] Application Compute Clock Rate: 1.65 GHz
[08/29/2023-19:15:08] [I] Application Memory Clock Rate: 7 GHz
[08/29/2023-19:15:08] [I] 
[08/29/2023-19:15:08] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[08/29/2023-19:15:08] [I] 
[08/29/2023-19:15:08] [I] TensorRT version: 8.6.1
[08/29/2023-19:15:08] [I] Loading standard plugins
[08/29/2023-19:15:08] [I] [TRT] [MemUsageChange] Init CUDA: CPU +13, GPU +0, now: CPU 18, GPU 488 (MiB)
[08/29/2023-19:15:13] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +896, GPU +174, now: CPU 991, GPU 662 (MiB)
[08/29/2023-19:15:13] [I] Start parsing network model.
[08/29/2023-19:15:13] [I] [TRT] ----------------------------------------------------------------
[08/29/2023-19:15:13] [I] [TRT] Input filename:   ./rtdetr_r50vd_6x_coco.onnx
[08/29/2023-19:15:13] [I] [TRT] ONNX IR version:  0.0.8
[08/29/2023-19:15:13] [I] [TRT] Opset version:    16
[08/29/2023-19:15:13] [I] [TRT] Producer name:    
[08/29/2023-19:15:13] [I] [TRT] Producer version: 
[08/29/2023-19:15:13] [I] [TRT] Domain:           
[08/29/2023-19:15:13] [I] [TRT] Model version:    0
[08/29/2023-19:15:13] [I] [TRT] Doc string:       
[08/29/2023-19:15:13] [I] [TRT] ----------------------------------------------------------------
[08/29/2023-19:15:13] [W] [TRT] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[08/29/2023-19:15:13] [I] Finished parsing network model. Parse time: 0.339291
[08/29/2023-19:15:13] [W] Dynamic dimensions required for input: im_shape, but no shapes were provided. Automatically overriding shape to: 1x2
[08/29/2023-19:15:13] [W] Dynamic dimensions required for input: scale_factor, but no shapes were provided. Automatically overriding shape to: 1x2
[08/29/2023-19:15:13] [I] [TRT] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[08/29/2023-19:15:13] [W] [TRT] Detected layernorm nodes in FP16: p2o.ReduceMean.10, p2o.Sub.0, p2o.Pow.0, p2o.Add.44, p2o.Sqrt.0, p2o.Div.0, p2o.Mul.2, p2o.Add.46, p2o.Sub.2, p2o.Pow.2, p2o.Add.56, p2o.Sqrt.2, p2o.Div.4, p2o.Mul.7, p2o.Add.58, p2o.Sub.4, p2o.Pow.4, p2o.Add.104, p2o.Sqrt.4, p2o.Div.6, p2o.Mul.57, p2o.Add.106, p2o.Sub.6, p2o.Pow.6, p2o.ReduceMean.14, p2o.Add.134, p2o.Sqrt.6, p2o.Div.8, p2o.Mul.61, p2o.Add.136, p2o.Sub.8, p2o.Pow.8, p2o.ReduceMean.18, p2o.Add.154, p2o.Sqrt.8, p2o.Div.10, p2o.Mul.81, p2o.Add.156, p2o.Sub.10, p2o.Pow.10, p2o.ReduceMean.22, p2o.Add.164, p2o.Sqrt.10, p2o.Div.12, p2o.Mul.83, p2o.Add.166, p2o.Sub.12, p2o.Pow.12, p2o.ReduceMean.26, p2o.Add.194, p2o.Sqrt.12, p2o.Div.16, p2o.Mul.89, p2o.Add.196, p2o.Sub.14, p2o.Pow.14, p2o.ReduceMean.30, p2o.Add.214, p2o.Sqrt.14, p2o.Div.18, p2o.Mul.109, p2o.Add.216, p2o.Sub.16, p2o.Pow.16, p2o.ReduceMean.34, p2o.Add.224, p2o.Sqrt.16, p2o.Div.20, p2o.Mul.111, p2o.Add.226, p2o.Sub.18, p2o.Pow.18, p2o.ReduceMean.38, p2o.Add.254, p2o.Sqrt.18, p2o.Div.24, p2o.Mul.117, p2o.Add.256, p2o.Sub.20, p2o.Pow.20, p2o.ReduceMean.42, p2o.Add.274, p2o.Sqrt.20, p2o.Div.26, p2o.Mul.137, p2o.Add.276, p2o.Sub.22, p2o.Pow.22, p2o.ReduceMean.46, p2o.Add.284, p2o.Sqrt.22, p2o.Div.28, p2o.Mul.139, p2o.Add.286, p2o.Sub.24, p2o.Pow.24, p2o.ReduceMean.50, p2o.Add.314, p2o.Sqrt.24, p2o.Div.32, p2o.Mul.145, p2o.Add.316, p2o.Sub.26, p2o.Pow.26, p2o.ReduceMean.54, p2o.Add.334, p2o.Sqrt.26, p2o.Div.34, p2o.Mul.165, p2o.Add.336, p2o.Sub.28, p2o.Pow.28, p2o.ReduceMean.58, p2o.Add.344, p2o.Sqrt.28, p2o.Div.36, p2o.Mul.167, p2o.Add.346, p2o.Sub.30, p2o.Pow.30, p2o.ReduceMean.62, p2o.Add.374, p2o.Sqrt.30, p2o.Div.40, p2o.Mul.173, p2o.Add.376, p2o.Sub.32, p2o.Pow.32, p2o.ReduceMean.66, p2o.Add.394, p2o.Sqrt.32, p2o.Div.42, p2o.Mul.193, p2o.Add.396, p2o.Sub.34, p2o.Pow.34, p2o.ReduceMean.70, p2o.Add.404, p2o.Sqrt.34, p2o.Div.44, p2o.Mul.195, p2o.Add.406, p2o.Sub.36, p2o.Pow.36, p2o.ReduceMean.74, p2o.Add.434, p2o.Sqrt.36, 
p2o.Div.48, p2o.Mul.201, p2o.Add.436, p2o.Sub.38, p2o.Pow.38, p2o.ReduceMean.78, p2o.Add.454, p2o.Sqrt.38, p2o.Div.50, p2o.Mul.221, p2o.Add.456, p2o.Sub.40, p2o.Pow.40, p2o.ReduceMean.82, p2o.Add.464, p2o.Sqrt.40, p2o.Div.52, p2o.Mul.223, p2o.Add.466, p2o.ReduceMean.2, p2o.ReduceMean.6
[08/29/2023-19:15:13] [W] [TRT] Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
[08/29/2023-19:15:14] [I] [TRT] Graph optimization time: 0.432647 seconds.
[08/29/2023-19:15:14] [I] [TRT] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[08/29/2023-19:15:14] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/29/2023-19:20:31] [I] [TRT] Detected 3 inputs and 2 output network tensors.
[08/29/2023-19:20:31] [I] [TRT] Total Host Persistent Memory: 443840
[08/29/2023-19:20:31] [I] [TRT] Total Device Persistent Memory: 833536
[08/29/2023-19:20:31] [I] [TRT] Total Scratch Memory: 14330880
[08/29/2023-19:20:31] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 172 MiB, GPU 73 MiB
[08/29/2023-19:20:31] [I] [TRT] [BlockAssignment] Started assigning block shifts. This will take 141 steps to complete.
[08/29/2023-19:20:31] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 5.75286ms to assign 9 blocks to 141 nodes requiring 34204160 bytes.
[08/29/2023-19:20:31] [I] [TRT] Total Activation Memory: 34204160
[08/29/2023-19:20:32] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[08/29/2023-19:20:32] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[08/29/2023-19:20:32] [W] [TRT] Check verbose logs for the list of affected weights.
[08/29/2023-19:20:32] [W] [TRT] - 1 weights are affected by this issue: Detected FP32 infinity values and converted them to corresponding FP16 infinity.
[08/29/2023-19:20:32] [W] [TRT] - 223 weights are affected by this issue: Detected subnormal FP16 values.
[08/29/2023-19:20:32] [W] [TRT] - 63 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[08/29/2023-19:20:32] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +65, GPU +81, now: CPU 65, GPU 81 (MiB)
[08/29/2023-19:20:32] [I] Engine built in 324.144 sec.
[08/29/2023-19:20:32] [I] [TRT] Loaded engine size: 85 MiB
[08/29/2023-19:20:32] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +81, now: CPU 0, GPU 81 (MiB)
[08/29/2023-19:20:32] [I] Engine deserialized in 0.0416484 sec.
[08/29/2023-19:20:32] [I] [TRT] [MS] Running engine with multi stream info
[08/29/2023-19:20:32] [I] [TRT] [MS] Number of aux streams is 2
[08/29/2023-19:20:32] [I] [TRT] [MS] Number of total worker streams is 3
[08/29/2023-19:20:32] [I] [TRT] [MS] The main stream provided by execute/enqueue calls is the first worker stream
[08/29/2023-19:20:32] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +33, now: CPU 0, GPU 114 (MiB)
[08/29/2023-19:20:32] [I] Setting persistentCacheLimit to 0 bytes.
[08/29/2023-19:20:32] [I] Using random values for input im_shape
[08/29/2023-19:20:32] [I] Input binding for im_shape with dimensions 1x2 is created.
[08/29/2023-19:20:32] [I] Using random values for input image
[08/29/2023-19:20:32] [I] Input binding for image with dimensions 1x3x640x640 is created.
[08/29/2023-19:20:32] [I] Using random values for input scale_factor
[08/29/2023-19:20:32] [I] Input binding for scale_factor with dimensions 1x2 is created.
[08/29/2023-19:20:32] [I] Output binding for tile_3.tmp_0 with dimensions  is created.
[08/29/2023-19:20:32] [I] Output binding for reshape2_95.tmp_0 with dimensions 300x6 is created.
[08/29/2023-19:20:32] [I] Starting inference
[08/29/2023-19:20:35] [I] Warmup completed 45 queries over 200 ms
[08/29/2023-19:20:35] [I] Timing trace has 671 queries over 3.01199 s
[08/29/2023-19:20:35] [I] 
[08/29/2023-19:20:35] [I] === Trace details ===
[08/29/2023-19:20:35] [I] Trace averages of 10 runs:
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45691 ms - Host latency: 5.00052 ms (enqueue 2.07189 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.44643 ms - Host latency: 4.98667 ms (enqueue 2.07825 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.4878 ms - Host latency: 5.02755 ms (enqueue 2.06254 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46815 ms - Host latency: 5.01013 ms (enqueue 2.06558 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45726 ms - Host latency: 4.99639 ms (enqueue 2.07379 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45436 ms - Host latency: 4.99146 ms (enqueue 2.06379 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46643 ms - Host latency: 5.0048 ms (enqueue 2.06562 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48486 ms - Host latency: 5.02398 ms (enqueue 2.06143 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45832 ms - Host latency: 4.99994 ms (enqueue 2.0717 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46133 ms - Host latency: 4.99756 ms (enqueue 2.0851 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45362 ms - Host latency: 4.99407 ms (enqueue 2.0751 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48015 ms - Host latency: 5.0215 ms (enqueue 2.07388 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48145 ms - Host latency: 5.02426 ms (enqueue 2.07272 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45563 ms - Host latency: 4.99669 ms (enqueue 2.07729 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46913 ms - Host latency: 5.01012 ms (enqueue 2.07491 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.52659 ms - Host latency: 5.0699 ms (enqueue 2.08041 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.52965 ms - Host latency: 5.07217 ms (enqueue 1.68873 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.5099 ms - Host latency: 5.04764 ms (enqueue 1.00558 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.49135 ms - Host latency: 5.03224 ms (enqueue 2.07975 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48328 ms - Host latency: 5.02284 ms (enqueue 2.07551 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46193 ms - Host latency: 5.00437 ms (enqueue 2.07556 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45951 ms - Host latency: 4.9968 ms (enqueue 2.0771 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.4526 ms - Host latency: 4.99493 ms (enqueue 2.08126 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48311 ms - Host latency: 5.02161 ms (enqueue 2.07546 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.4835 ms - Host latency: 5.02534 ms (enqueue 2.07667 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46216 ms - Host latency: 5.00035 ms (enqueue 2.07687 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45424 ms - Host latency: 4.99752 ms (enqueue 2.07772 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48363 ms - Host latency: 5.02688 ms (enqueue 2.07367 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48403 ms - Host latency: 5.02856 ms (enqueue 2.083 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47053 ms - Host latency: 5.01046 ms (enqueue 2.15315 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46604 ms - Host latency: 5.00835 ms (enqueue 2.06006 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46427 ms - Host latency: 5.00811 ms (enqueue 2.08953 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.50837 ms - Host latency: 5.0422 ms (enqueue 1.79258 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48506 ms - Host latency: 5.02858 ms (enqueue 2.10857 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48263 ms - Host latency: 5.02445 ms (enqueue 2.0649 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47365 ms - Host latency: 5.01277 ms (enqueue 2.09479 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47847 ms - Host latency: 5.01809 ms (enqueue 2.04818 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.50176 ms - Host latency: 5.04219 ms (enqueue 2.0748 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.50762 ms - Host latency: 5.04794 ms (enqueue 2.06871 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.50122 ms - Host latency: 5.04116 ms (enqueue 2.07521 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48833 ms - Host latency: 5.02892 ms (enqueue 2.07524 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.4823 ms - Host latency: 5.02545 ms (enqueue 2.06674 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.4752 ms - Host latency: 5.0144 ms (enqueue 2.07634 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47554 ms - Host latency: 5.0187 ms (enqueue 2.07507 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47976 ms - Host latency: 5.01873 ms (enqueue 2.07749 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47539 ms - Host latency: 5.01572 ms (enqueue 2.07463 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47756 ms - Host latency: 5.01377 ms (enqueue 2.0769 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47544 ms - Host latency: 5.01692 ms (enqueue 2.07109 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47598 ms - Host latency: 5.0145 ms (enqueue 2.07634 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47068 ms - Host latency: 5.01055 ms (enqueue 2.06948 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.4782 ms - Host latency: 5.01492 ms (enqueue 2.06895 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47981 ms - Host latency: 5.02217 ms (enqueue 2.06775 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47961 ms - Host latency: 5.02263 ms (enqueue 2.07134 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48474 ms - Host latency: 5.02625 ms (enqueue 2.09255 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.52488 ms - Host latency: 5.06416 ms (enqueue 2.07371 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.50693 ms - Host latency: 5.04375 ms (enqueue 2.07349 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.49211 ms - Host latency: 5.03394 ms (enqueue 2.07925 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48464 ms - Host latency: 5.02498 ms (enqueue 2.07437 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47805 ms - Host latency: 5.02173 ms (enqueue 2.07852 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.47354 ms - Host latency: 5.01377 ms (enqueue 2.07297 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.50503 ms - Host latency: 5.04609 ms (enqueue 2.06426 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.50857 ms - Host latency: 5.04917 ms (enqueue 2.07549 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.4821 ms - Host latency: 5.02339 ms (enqueue 2.07751 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.48254 ms - Host latency: 5.02136 ms (enqueue 2.07803 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.46255 ms - Host latency: 5.003 ms (enqueue 2.06277 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.45852 ms - Host latency: 4.99438 ms (enqueue 2.08047 ms)
[08/29/2023-19:20:35] [I] Average on 10 runs - GPU latency: 4.50474 ms - Host latency: 5.04731 ms (enqueue 2.0748 ms)
[08/29/2023-19:20:35] [I] 
[08/29/2023-19:20:35] [I] === Performance summary ===
[08/29/2023-19:20:35] [I] Throughput: 222.776 qps
[08/29/2023-19:20:35] [I] Latency: min = 4.95123 ms, max = 5.09808 ms, mean = 5.02032 ms, median = 5.01892 ms, percentile(90%) = 5.04999 ms, percentile(95%) = 5.0614 ms, percentile(99%) = 5.0824 ms
[08/29/2023-19:20:35] [I] Enqueue Time: min = 0.812134 ms, max = 2.21973 ms, mean = 2.04985 ms, median = 2.07434 ms, percentile(90%) = 2.09802 ms, percentile(95%) = 2.11633 ms, percentile(99%) = 2.16772 ms
[08/29/2023-19:20:35] [I] H2D Latency: min = 0.506348 ms, max = 0.560791 ms, mean = 0.533722 ms, median = 0.53418 ms, percentile(90%) = 0.541504 ms, percentile(95%) = 0.543945 ms, percentile(99%) = 0.549316 ms
[08/29/2023-19:20:35] [I] GPU Compute Time: min = 4.41754 ms, max = 4.55377 ms, mean = 4.47985 ms, median = 4.47876 ms, percentile(90%) = 4.51099 ms, percentile(95%) = 4.51831 ms, percentile(99%) = 4.53955 ms
[08/29/2023-19:20:35] [I] D2H Latency: min = 0.00415039 ms, max = 0.0128174 ms, mean = 0.00676096 ms, median = 0.0065918 ms, percentile(90%) = 0.00817871 ms, percentile(95%) = 0.00878906 ms, percentile(99%) = 0.00970459 ms
[08/29/2023-19:20:35] [I] Total Host Walltime: 3.01199 s
[08/29/2023-19:20:35] [I] Total GPU Compute Time: 3.00598 s
[08/29/2023-19:20:35] [I] Explanations of the performance metrics are printed in the verbose logs.
[08/29/2023-19:20:35] [I] 
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # TensorRT-8.6.1.6/bin/trtexec --onnx=./rtdetr_r50vd_6x_coco.onnx --workspace=4096 --shapes=image:1x3x640x640 --saveEngine=rtdetr_r50vd_6x_coco.trt --avgRuns=10 --fp16

To run the engine I'm using the TRTInference class from the benchmark directory:

inf = trtinfer.TRTInference(trt_engine_path, backend="torch", max_batch_size=32, verbose=True)

and I get this error:

File ~/Projects/sandbox/unvalidated/RT-DETR/benchmark/trtinfer.py:73, in TRTInference.get_bindings(self, engine, context, max_batch_size, device)
     70 shape = engine.get_tensor_shape(name)
     71 dtype = trt.nptype(engine.get_tensor_dtype(name))
---> 73 if shape[0] == -1:
     74     dynamic = True 
     75     shape[0] = max_batch_size

IndexError: Out of bounds

So one of the shapes is an empty tuple ().

Could you help me? Maybe I am doing something wrong during the conversion.

P.S. paddlepaddle-gpu==2.5.1.post117
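
The traceback above suggests that engine.get_tensor_shape(name) returned a rank-0 shape for at least one tensor, so shape[0] has nothing to index. A minimal sketch of a guard, assuming the shape object can be converted to a plain list; resolve_binding_shape is a hypothetical helper for illustration and is not part of trtinfer.py:

```python
def resolve_binding_shape(shape, max_batch_size):
    """Return (shape_as_list, is_dynamic) for one engine tensor.

    Hypothetical helper: `shape` is anything iterable, e.g. a tuple of ints.
    """
    dims = list(shape)
    # A scalar tensor has an empty shape, so there is no batch axis to inspect.
    if len(dims) == 0:
        return dims, False
    # -1 in the first axis marks a dynamic batch dimension.
    dynamic = dims[0] == -1
    if dynamic:
        dims[0] = max_batch_size
    return dims, dynamic


print(resolve_binding_shape((), 32))
print(resolve_binding_shape((-1, 3, 640, 640), 32))
```

This only avoids the IndexError; whether a scalar output is expected from the exported ONNX model is a separate question.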

Visualizing training loss curves

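Hello! I am using the AutoDL cloud server platform. I would like to know how to tell whether the model is overfitting during training. I have spent a long time trying to plot the training and validation loss curves without success. Looking forward to your reply!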
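
A common rule of thumb is that overfitting shows up as validation loss that stops improving while training loss keeps falling. The sketch below applies that rule to two per-epoch loss lists; it assumes you have already recorded those values yourself, and first_overfit_epoch, patience, and the numbers are all made up for illustration:

```python
def first_overfit_epoch(train_loss, val_loss, patience=3):
    """Return the best validation epoch once validation loss has failed to
    improve for `patience` epochs while training loss kept falling.
    Returns None if that never happens."""
    best_epoch = 0
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] < val_loss[best_epoch]:
            best_epoch = epoch
        elif (epoch - best_epoch >= patience
              and train_loss[epoch] < train_loss[best_epoch]):
            return best_epoch
    return None


# Made-up per-epoch losses: validation bottoms out at epoch 2 (0-indexed).
train = [2.0, 1.5, 1.2, 1.0, 0.8, 0.7, 0.6]
val = [2.1, 1.7, 1.5, 1.6, 1.7, 1.8, 1.9]
print(first_overfit_epoch(train, val))  # 2
```

The same two lists can be passed to any plotting library to draw the curves.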

How to get mAP and other metrics on the test set

AP and AR are 0 after switching to the VisDrone dataset

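Training RT-DETR on math formulas

I want to train the RT-DETR model to detect math formulas. I have a dataset with a single class, math formula, but the training results are unsatisfactory.
Are there any good references for this?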

pretrained models on objects365

Could you please release the pretrained models on objects365 with HGNetv2 backbone?

ERROR: I used PyTorch to train but got this

Accumulating evaluation results...
DONE (t=10.48s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
best_stat:  {'epoch': 0, 'coco_eval_bbox': 3.9432714116975085e-10}

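GPU memory is not released automatically after the program crashes from an oversized batch size

Hello, I am new to Paddle. After I tried an overly large batch size, the program was interrupted, and the GPU memory was not released automatically; I had to free it manually. Does Paddle offer a good way to release GPU memory automatically when the program is interrupted because the batch size is too large?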

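Parameter count and FLOPs statistics

The RT-DETR documentation in PaddleDetection includes parameter count and FLOPs statistics.

But the call does not seem to run (the interface does not match):
paddle.flops(model, None, blob, custom_ops=None, print_detail=False)

The argument positions are wrong, and input_size in flops only accepts a list.
I know this belongs to the PaddleDetection repo, but I'm asking here anyway; there is too much going on over there and replies are slow (laughs through tears)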

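Confusion about inference speed

Hello, regarding the claim in your paper that the speed is 20x faster than DINO, how is this achieved? As far as I can tell, essentially only the resolution and the encoder differ. Can these two differences alone give a 20x speedup? Or does your code contain many optimizations? I am not very familiar with this area and would appreciate your guidance.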

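Viewing the backbone: hgnet_v2

Hello, thank you for your work and for your patient replies so far!
Where can I find a network architecture diagram of the hgnet_v2 used in RT-DETR? So far I have only found hgnet. Looking forward to your reply, and thank you!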

decoder_out_bbox is the same for every layer

Hello, I found that the bbox output by every decoder layer is the same. Is this reasonable? The cause is that feeding output into bbox_head[i] gives: Tensor(shape=[4, 468, 4], dtype=float32, place=Place(gpu:0), stop_gradient=False, [[[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], ..., [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]],

[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
...,
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]],

[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
...,
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]])

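About data augmentation and rectangular input

Hello,
Thank you for your work; it is excellent.
We ran experiments on our own dataset with good results, but a few key questions remain on which we would appreciate your guidance:

We observed that the implementation uses only simple data augmentation, without the more complex augmentations in common use today such as mosaic. Would using such augmentation improve performance? (In fact, we tried it and found that performance was very poor, and the multi-threaded dataloader raised this error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/opt/conda/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/lib/python3.8/site-packages/paddle/io/dataloader/dataloader_iter.py", line 637, in _thread_loop
raise e
File "/opt/conda/lib/python3.8/site-packages/paddle/io/dataloader/dataloader_iter.py", line 618, in _thread_loop
array.append(tensor)
RuntimeError: (PreconditionNotMet) Tensor holds no memory. Call Tensor::mutable_data firstly.
[Hint: holder should not be null.] (at ../paddle/phi/core/dense_tensor_impl.cc:44))

We observed that only square inputs are currently used for training and inference. Can rectangular inputs be used, to match the typical rectangular size of video? (In fact, we tried, for example 1088*1920, and found that performance dropped considerably.)

We find these two questions rather puzzling and hope to get your advice.
Thank you very much.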

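About adding a validation set for validation during training

Hello, how can I add a validation set and run validation during training? I want to know whether my model is overfitting as it trains.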

Does RT-DETR use the width/height modulation from DAB-DETR?

If so, in which part of the code?

pytorch rtdetr based on mmdet [in progress]

open-mmlab/mmdetection#10186
open-mmlab/mmdetection#10498

Error occurs when training on the COCO dataset

ValueError: (InvalidArgument) Sum of Attr(num_or_sections) must be equal to the input's size along the split dimension. But received Attr(num_or_sections) = [99, 66], input(X)'s shape = [1478656], Attr(dim) = 0.
[Hint: Expected sum_of_section == input_axis_dim, but received sum_of_section:165 != input_axis_dim:1478656.] (at /paddle/paddle/phi/infermeta/unary.cc:3285)

Problem training on my own dataset with only a single class

Hi,
thanks for the nice work.
I looked through the issues in this repo and in the Paddle repo as well, but could not find how to fix training RT-DETR for single-class object detection. With COCO and VisDrone, RT-DETR was good, but on my dataset it is worse than YOLOv8. Can you tell me how to increase mAP for single-class training?

Also, does Paddle have data augmentation methods?

Thanks

The _generate_anchors function

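About detection rates for objects of different sizes

Looking at the detection rates for the three sizes (s, m, l) on COCO, the DETR family seems to outperform the YOLO family on large objects and to be somewhat worse on small ones. But when I ran RT-DETR on my own data, small objects performed best and surpassed YOLO, whereas detection of objects around 100-200 pixels in size was much worse than YOLO. The experiment used the r18 model with almost default training parameters and tested at 960*960. What are your thoughts on this?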

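VOC dataset can't eval the model?

Hello, I ran experiments with a VOC-format dataset. The training command was: python tools/train.py -c configs/rtdetr/ppfall_r50vd_voc.yml --eval. During validation it raised the error: trainer object has no attribute '_eval_loader'. I found that the problem occurs in trainer.py, so I commented out lines 409-439 to avoid the error. But I actually need the eval results. Is there a way to solve this? Could my yml file be misconfigured? The dataset is ppfall, from PaddlePaddle's official dataset collection.
trainer.py, as follows: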

            # if validate and is_snapshot:
            #     if not hasattr(self, '_eval_loader'):
            #         # build evaluation dataset and loader
            #         self._eval_dataset = self.cfg.EvalDataset
            #         self._eval_batch_sampler = \
            #             paddle.io.BatchSampler(
            #                 self._eval_dataset,
            #                 batch_size=self.cfg.EvalReader['batch_size'])
            #         # If metric is VOC, need to be set collate_batch=False.
            #         if self.cfg.metric == 'VOC':
            #             self.cfg['EvalReader']['collate_batch'] = False
            #         else:
            #             self._eval_loader = create('EvalReader')(
            #                 self._eval_dataset,
            #                 self.cfg.worker_num,
            #                 batch_sampler=self._eval_batch_sampler)
            #     # if validation in training is enabled, metrics should be re-init
            #     # Init_mark makes sure this code will only execute once
            #     if validate and Init_mark == False:
            #         Init_mark = True
            #         self._init_metrics(validate=validate)
            #         self._reset_metrics()

            #     with paddle.no_grad():
            #         self.status['save_best_model'] = True
            #         self._eval_with_loader(self._eval_loader)

            # if is_snapshot and self.use_ema:
            #     # reset original weight
            #     self.model.set_dict(weight)
            #     self.status.pop('weight')

Thanks for your reply!
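
One possible reading of the pasted snippet: self._eval_loader is created only in the else branch, so with metric == 'VOC' the attribute is never set, which would match the reported error. The toy class below is a self-contained sketch of that control flow. Trainer, eval_reader_cfg, and both method names are hypothetical stand-ins rather than PaddleDetection code, and the diagnosis is an assumption based only on the snippet above:

```python
class Trainer:
    """Toy stand-in for the trainer; only the loader-building branch is modeled."""

    def __init__(self, metric):
        self.metric = metric
        self.eval_reader_cfg = {"collate_batch": True}

    def build_loader_as_pasted(self):
        # Mirrors the pasted snippet: the loader is built only in the else branch.
        if self.metric == "VOC":
            self.eval_reader_cfg["collate_batch"] = False
        else:
            self._eval_loader = ("loader", dict(self.eval_reader_cfg))

    def build_loader_always(self):
        # Variant without the else: adjust the config for VOC, then always build.
        if self.metric == "VOC":
            self.eval_reader_cfg["collate_batch"] = False
        self._eval_loader = ("loader", dict(self.eval_reader_cfg))


pasted = Trainer("VOC")
pasted.build_loader_as_pasted()
print(hasattr(pasted, "_eval_loader"))  # False: the attribute was never set

fixed = Trainer("VOC")
fixed.build_loader_always()
print(hasattr(fixed, "_eval_loader"))  # True
```

If the local trainer.py really contains that else, removing it so the loader is always created may be worth trying before changing the yml file.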