
edgeyolo's Introduction

EdgeYOLO: anchor-free, edge-friendly

1 Intro
2 Updates
3 Coming Soon
4 Models
5 Quick Start
    5.1 setup
    5.2 inference
    5.3 train
    5.4 evaluate
    5.5 export onnx & tensorrt
6 Cite EdgeYOLO
7 Bugs found currently

Tool recommendation: SAMLabeler Pro, an image labeling tool assisted by SAM (Segment Anything Model); multi-person remote labeling is supported.

Intro

  • On embedded devices such as the Nvidia Jetson AGX Xavier, EdgeYOLO reaches 34 FPS with 50.6% AP on the COCO2017 dataset and 25.9% AP on VisDrone2019 (input size 640x640, batch=16, post-processing included). The smaller EdgeYOLO-S reaches 53 FPS with 44.1% AP and 63.3% AP0.5 (SOTA among P5 small models) on COCO2017.
  • We provide a more effective data augmentation method for training.
  • Small- and medium-object detection performance is improved by using RH loss during the last few training epochs.
  • Our preprint paper is available on arXiv.

Updates

[2024/3/16]

  1. Uploaded demo/amct_onnx2om.py, which converts ONNX models to OM models for Huawei Ascend devices (such as the Ascend310), along with C++ deployment code. (Note that you need the corresponding libraries and tools from the Huawei official website; some of them can only be downloaded by customers who purchased the corresponding hardware.)

[2024/3/6]

  1. Docker environment for training and exporting models for edge devices (RKNN, Horizon J5, Jetson...).

[2023/12/6]

  1. RKNN (for RK3588) deployment code is released.

[2023/11/23]

  1. MNN deployment code is released.

[2023/2/28]

  1. Evaluation of TensorRT models is now supported.

[2023/2/24]

  1. EdgeYOLO now supports datasets in YOLO format.
  2. Fixed some errors and bugs (which occurred when using "--loop" in the Linux C++ demo, and when caching labels in distributed training).

[2023/2/20]

  1. TensorRT C++ inference console demo (OpenCV and Qt5 required)
  2. Fixed bugs when exporting models using TensorRT version 7.X

[2023/2/19]

  1. Published TensorRT int8 export code with calibration (torch2trt is required)

Coming Soon

  • Rebuild the TensorRT deployment C++ code for ease of use.
  • More different models
  • C++ code for TensorRT inference with UI
  • EdgeYOLO-mask for segmentation tasks
  • A simple but effective pretraining method

Models

  • models trained on COCO2017-train
| Model | Size | mAPval 0.5:0.95 | mAPval 0.5 | FPS (AGX Xavier, trt fp16 batch=16, include NMS) | Params train / infer (M) | Download |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| EdgeYOLO-Tiny-LRELU | 416<br>640 | 33.1<br>37.8 | 50.5<br>56.7 | 206<br>109 | 7.6 / 7.0 | github |
| EdgeYOLO-Tiny | 416<br>640 | 37.2<br>41.4 | 55.4<br>60.4 | 136<br>67 | 5.8 / 5.5 | github |
| EdgeYOLO-S | 640 | 44.1 | 63.3 | 53 | 9.9 / 9.3 | github |
| EdgeYOLO-M | 640 | 47.5 | 66.6 | 46 | 19.0 / 17.8 | github |
| EdgeYOLO | 640 | 50.6 | 69.8 | 34 | 41.2 / 40.5 | github |
  • models trained on VisDrone2019 (pretrained backbone on COCO2017-train)
  1. We use the VisDrone2019-DET dataset in COCO format for training.
  2. Here are the results without removing detection boxes in ignored regions.
| Model | Size | mAPval 0.5:0.95 | mAPval 0.5 | Download |
|:-:|:-:|:-:|:-:|:-:|
| EdgeYOLO-Tiny-LRELU | 416<br>640 | 12.1<br>18.5 | 22.8<br>33.6 | github |
| EdgeYOLO-Tiny | 416<br>640 | 14.9<br>21.8 | 27.3<br>38.5 | github |
| EdgeYOLO-S | 640 | 23.6 | 40.8 | github |
| EdgeYOLO-M | 640 | 25.0 | 42.9 | github |
| EdgeYOLO | 640 | 25.9<br>26.9 | 43.9<br>45.4 | github(legacy)<br>github(new) |
Some of our detection results on COCO2017 (example images omitted).

Quick Start

setup

git clone https://github.com/LSH9832/edgeyolo.git
cd edgeyolo
pip install -r requirements.txt

If you use TensorRT, please make sure torch2trt and the TensorRT Development Toolkit (version > 7.1.3.0) are installed.

git clone https://github.com/NVIDIA-AI-IOT/torch2trt.git
cd torch2trt
python setup.py install

Or, to make sure you use the same version of torch2trt as ours, download it here.
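A quick sanity check (a minimal sketch, assuming both packages are installed in the active environment) that the TensorRT toolchain is visible from Python:

# verify that TensorRT and torch2trt can be imported, and check the version
import tensorrt as trt
import torch2trt  # the import itself is the check

print("TensorRT version:", trt.__version__)  # should be > 7.1.3.0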

If you want to use Docker, then

docker import edgeyolo_deploy.tar.gz edgeyolo:latest
  • run docker
docker run -it \
           --runtime=nvidia \
           -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
           -e NVIDIA_VISIBLE_DEVICES=all \
           --shm-size 15g \
           -w /code \
           -v "/path/to/your/edgeyolo/parent_dir":/code \
           -v "/path/to/your/dataset/parent_dir":/dataset \
           edgeyolo:latest

then you can use "docker_export.py" instead of "export.py"

inference

First download weights here

python detect.py --weights edgeyolo_coco.pth --source XXX.mp4 --fp16

# all options
python detect.py --weights edgeyolo_coco.pth 
                 --source /XX/XXX.mp4     # or a dir with images, such as /dataset/coco2017/val2017    (jpg/jpeg, png, bmp, webp are supported)
                 --conf-thres 0.25 
                 --nms-thres 0.5 
                 --input-size 640 640 
                 --batch 1 
                 --save-dir ./output/detect/imgs    # if you press "s", the current frame will be saved in this dir
                 --fp16 
                 --no-fuse                # do not reparameterize model
                 --no-label               # do not draw label with class name and confidence
                 --mp                     # use multi-process to show images more smoothly when batch > 1
                 --fps 30                 # max fps limitation, valid only when option --mp is used

train

  • first prepare your dataset and create a dataset config file (./params/dataset/XXX.yaml); make sure your dataset config file contains the entries below (a config sanity-check sketch follows this list):

(COCO, YOLO, VOC, VisDrone and DOTA formats are supported)

type: "coco"                        # dataset format(lowercase),COCO, YOLO, VOC, VisDrone and DOTA formats are supported currently
dataset_path: "/dataset/coco2017"   # root dir of your dataset

kwargs:
  suffix: "jpg"        # suffix of your dataset's images
  use_cache: true      # test on i5-12490f: Total loading time: 52s -> 10s(seg enabled) and 39s -> 4s(seg disabled)

train:
  image_dir: "images/train2017"                   # train set image dir
  label: "annotations/instances_train2017.json"   # train set label file(format with single label file) or directory(multi label files)

val:
  image_dir: "images/val2017"                     # evaluate set image dir
  label: "annotations/instances_val2017.json"     # evaluate set label file or directory

test:
  test_dir: "test2017"     # test set image dir (not used in code now, but will)

segmentaion_enabled: true  # whether this dataset has segmentation labels and you are going to use them instead of bbox labels

names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
        'hair drier', 'toothbrush']    # category names
  • then edit file ./params/train/train_XXX.yaml
  • finally
python train.py --cfg ./params/train/train_XXX.yaml
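Before launching training, it can help to sanity-check that the dataset config parses and the class list is complete (a minimal sketch, assuming PyYAML is installed; the file name is illustrative):

# load the dataset config and print a quick summary
import yaml

with open("./params/dataset/coco.yaml") as f:
    cfg = yaml.safe_load(f)

print("format:", cfg["type"])           # should be lowercase, e.g. "coco"
print("root:", cfg["dataset_path"])
print("classes:", len(cfg["names"]))    # e.g. 80 for COCO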

You can plot loss, learning rate and precision (AP50 and AP50:95) curves using "plot.py":

python plot.py --all \                                   # plot all figures or (--lr, --ap, --loss)
               -f ./output/train/edgeyolo_tiny_lrelu \   # train output path or (output_path/eval.yaml for --ap and output_path/log.txt for --lr and --loss)
               --no-show \                               # do not show by plt.show(), (for device without desktop env, or you just want to save the figs)
               --save    \                               # save figures
               --format pdf png svg jpg eps              # save format
The figures will look like the following: AP curve, loss curve and learning-rate curve (example images omitted).

evaluate

python evaluate.py --weights edgeyolo_coco.pth --dataset params/dataset/XXX.yaml --batch 16 --device 0

# all options
python evaluate.py --weights edgeyolo_coco.pth        # or tensorrt model: output/export/edgeyolo_coco/model.pt
                   --dataset params/dataset/XXX.yaml 
                   --batch 16                         # batch size for each GPU; not valid for TensorRT models
                   --device 0
                   --input-size 640 640               # height, width
                   --trt                              # if you use tensorrt model add this option
                   --save                             # save weights without optimizer params and set epoch to -1

export onnx & tensorrt

  • ONNX
python export.py --onnx --weights edgeyolo_coco.pth --batch 1

# all options
python export.py --onnx   # or --onnx-only if tensorrt and torch2trt are not installed
                 --weights edgeyolo_coco.pth 
                 --input-size 640 640   # height, width
                 --batch 1
                 --opset 11
                 --no-simplify    # do not simplify this model

it generates

output/export/edgeyolo_coco/640x640_batch1.onnx
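To verify that the exported file loads and runs (a hedged sketch, assuming onnxruntime is installed; the (batch, anchors, 5 + num_classes) output layout is inferred from the post-processing code shown in the issues below):

# run the exported ONNX model once on a dummy input
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("output/export/edgeyolo_coco/640x640_batch1.onnx")
inp = sess.get_inputs()[0]
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
pred = sess.run(None, {inp.name: dummy})[0]
print("output shape:", pred.shape)  # e.g. (1, 8400, 85) for the 80-class COCO model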
  • TensorRT
# fp16
python export.py --trt --weights edgeyolo_coco.pth --batch 1 --workspace 8

# int8
python export.py --trt --weights edgeyolo_coco.pth --batch 1 --workspace 8 --int8 --dataset params/dataset/coco.yaml --num-imgs 1024

# all options
python export.py --trt                       # you can add --onnx and relative options to export both models
                 --weights edgeyolo_coco.pth
                 --input-size 640 640        # height, width
                 --batch 1
                 --workspace 10              # (GB)
                 --no-fp16        # fp16 mode by default; use this option to disable it (fp32)
                 --int8           # int8 mode, the following options are needed for calibration
                 --dataset params/dataset/coco.yaml   # generates calibration images from its val images(upper limit:5120)
                 --train          # use train images instead of val images(upper limit:5120)
                 --all            # use all images(upper limit:5120)
                 --num-imgs 512   # (upper limit:5120)

it generates

(optional) output/export/edgeyolo_coco/640x640_batch1.onnx
output/export/edgeyolo_coco/640x640_batch1_fp16(int8).pt       # for python inference
output/export/edgeyolo_coco/640x640_batch1_fp16(int8).engine   # for c++ inference
output/export/edgeyolo_coco/640x640_batch1_fp16(int8).json     # for c++ inference
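The generated .pt file can also be loaded directly in Python via torch2trt (a hedged sketch; the "model" and "names" keys are inferred from the export traceback shown in the issues below):

# load the exported TensorRT model for manual Python inference
import torch
from torch2trt import TRTModule

data = torch.load("output/export/edgeyolo_coco/640x640_batch1_int8.pt")
model_trt = TRTModule()
model_trt.load_state_dict(data["model"])  # deserializes the engine
print("classes:", data["names"])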

Benchmark of TensorRT Int8 Model

  • environment: TensorRT version 8.2.5.1, Windows, i5-12490F, RTX 3060 12GB
  • For TensorRT, different calibration datasets can cause noticeable differences in both precision and speed, which is probably why most official projects do not report int8 quantization results. The table below is therefore of limited reference value.

COCO2017-TensorRT-int8

| Int8 Model | Size | Calibration images | Workspace (GB) | mAPval 0.5:0.95 | mAPval 0.5 | FPS (RTX 3060, trt int8 batch=16, include NMS) |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| Tiny-LRELU | 416<br>640 | 512 | 8 | 31.5<br>36.4 | 48.7<br>55.5 | 730<br>360 |
| Tiny | 416<br>640 | 512 | 8 | 34.9<br>39.8 | 53.1<br>59.5 | 549<br>288 |
| S | 640 | 512 | 8 | 42.4 | 61.8 | 233 |
| M | 640 | 512 | 8 | 45.2 | 64.2 | 211 |
| L | 640 | 512 | 8 | 49.1 | 68.0 | 176 |

for python inference

python detect.py --trt --weights output/export/edgeyolo_coco/640x640_batch1_int8.pt --source XXX.mp4

# all options
python detect.py --trt 
                 --weights output/export/edgeyolo_coco/640x640_batch1_int8.pt 
                 --source XXX.mp4
                 --legacy         # if "img = img / 255" was used when training your model
                 --use-decoder    # if use original yolox tensorrt model before version 0.3.0
                 --mp             # use multi-process to show images more smoothly when batch > 1
                 --fps 30         # max fps limitation, valid only when option --mp is used
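A hedged illustration of what --legacy toggles, inferred from the option help above (legacy models were trained on inputs scaled to 0..1, current models on raw 0..255 values):

# preprocessing difference between legacy and current models (sketch)
import numpy as np

img = np.random.randint(0, 256, (640, 640, 3), dtype=np.uint8)  # stand-in frame
x = img.astype(np.float32)   # current models: raw 0..255 input
x_legacy = x / 255.0         # --legacy models: input normalized to 0..1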

for c++ inference

# build
cd cpp/tensorrt
mkdir build && cd build
cmake ..
make

# help
./yolo -?
./yolo --help

# run
# ./yolo [engine file] [source] [--conf] [--nms] [--loop] [--no-label]
# make sure your engine file and your yaml file are in the same directory
./yolo ../../../output/export/edgeyolo_coco/640x640_batch1_int8.engine ~/Videos/test.avi --conf 0.25 --nms 0.5 --loop --no-label

Cite EdgeYOLO

@article{edgeyolo2023,
  title={EdgeYOLO: An Edge-Real-Time Object Detector},
  author={Liu, Shihan and Zha, Junlin and Sun, Jian and Li, Zhuo and Wang, Gang},
  journal={arXiv preprint arXiv:2302.07483},
  year={2023}
}


Bugs found currently

  • Sometimes the following error is raised during training; downgrading PyTorch to 1.8.0 might solve this problem (see the debugging sketch after this list).
File "XXX/edgeyolo/edgeyolo/train/loss.py", line 667, in dynamic_k_matching
_, pos_idx = torch.topk(cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  • For the DOTA dataset, we currently support single-GPU training only; please do not train DOTA in distributed mode, or the model cannot be trained correctly.
  • Sometimes converting to a TensorRT fp16 model with version 8.4.X.X or higher loses a lot of precision; please use TensorRT version 7.X.X.X or 8.2.X.X.
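A debugging sketch for the device-side assert above (hedged: this only makes the reported stack trace accurate, it is not a fix), following the suggestion in the error message itself:

CUDA_LAUNCH_BLOCKING=1 python train.py --cfg ./params/train/train_XXX.yaml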


edgeyolo's Issues

Error in export to TRT int8

Hi,

I have a question regarding the TRT-export with int8.

I do the following command:

python export.py --trt --weights edgeyolo_visdrone.pth --batch 1 --workspace 8 --int8 --dataset params/dataset/visdrone_coco.yaml --num-imgs 16

Unfortunately, after the calibration part I get the following error.
Do you have any idea or suggestion about what I am doing wrong?

Thanks in advance!

2023-03-03 12:39:32.768 | INFO     | edgeyolo.models:__init__:50 - loading models from weight /opt/ssd/visiontools/edgeyolo/edgeyolo_visdrone.pth
/opt/ssd/visiontools/edgeyolo/venv/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2311.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Reparameterizing models...
2023-03-03 12:39:37.982 | INFO     | edgeyolo.export.calib:__init__:43 - used images: 16
[03/03/2023-12:39:46] [TRT] [I] [MemUsageChange] Init CUDA: CPU +221, GPU +0, now: CPU 763, GPU 9107 (MiB)
[03/03/2023-12:39:49] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +302, GPU +404, now: CPU 1088, GPU 9512 (MiB)
2023-03-03 12:40:04.763 | INFO     | edgeyolo.export.pth2trt:torch2onnx2trt:137 - start to simplify ONNX...
2023-03-03 12:40:08.507 | INFO     | edgeyolo.export.pth2trt:torch2onnx2trt:140 - simplified ONNX successfully.
[03/03/2023-12:40:08] [TRT] [W] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/03/2023-12:40:11] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +535, GPU +539, now: CPU 2739, GPU 11713 (MiB)
[03/03/2023-12:40:11] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +82, GPU +122, now: CPU 2821, GPU 11835 (MiB)
[03/03/2023-12:40:11] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[03/03/2023-12:40:17] [TRT] [I] Total Activation Memory: 9815946752
[03/03/2023-12:40:17] [TRT] [I] Detected 1 inputs and 4 output network tensors.
[03/03/2023-12:40:18] [TRT] [I] Total Host Persistent Memory: 239856
[03/03/2023-12:40:18] [TRT] [I] Total Device Persistent Memory: 0
[03/03/2023-12:40:18] [TRT] [I] Total Scratch Memory: 0
[03/03/2023-12:40:18] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 18 MiB, GPU 384 MiB
[03/03/2023-12:40:18] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 372 steps to complete.
[03/03/2023-12:40:18] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 47.7322ms to assign 10 blocks to 372 nodes requiring 182681600 bytes.
[03/03/2023-12:40:18] [TRT] [I] Total Activation Memory: 182681600
[03/03/2023-12:40:18] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +3, now: CPU 3143, GPU 12633 (MiB)
[03/03/2023-12:40:18] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +5, now: CPU 3143, GPU 12618 (MiB)
[03/03/2023-12:40:18] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +174, now: CPU 0, GPU 430 (MiB)
[03/03/2023-12:40:18] [TRT] [I] Starting Calibration.
[03/03/2023-12:40:20] [TRT] [I]   Calibrated batch 0 in 1.90551 seconds.
[03/03/2023-12:40:22] [TRT] [I]   Calibrated batch 1 in 1.72534 seconds.
[03/03/2023-12:40:24] [TRT] [I]   Calibrated batch 2 in 1.71749 seconds.
[03/03/2023-12:40:25] [TRT] [I]   Calibrated batch 3 in 1.73021 seconds.
[03/03/2023-12:40:27] [TRT] [I]   Calibrated batch 4 in 1.71629 seconds.
[03/03/2023-12:40:29] [TRT] [I]   Calibrated batch 5 in 1.7292 seconds.
[03/03/2023-12:40:31] [TRT] [I]   Calibrated batch 6 in 1.72344 seconds.
[03/03/2023-12:40:32] [TRT] [I]   Calibrated batch 7 in 1.74846 seconds.
[03/03/2023-12:40:34] [TRT] [I]   Calibrated batch 8 in 1.73344 seconds.
[03/03/2023-12:40:36] [TRT] [I]   Calibrated batch 9 in 1.72423 seconds.
[03/03/2023-12:40:38] [TRT] [I]   Calibrated batch 10 in 1.728 seconds.
[03/03/2023-12:40:39] [TRT] [I]   Calibrated batch 11 in 1.74652 seconds.
[03/03/2023-12:40:41] [TRT] [I]   Calibrated batch 12 in 1.75374 seconds.
[03/03/2023-12:40:43] [TRT] [I]   Calibrated batch 13 in 1.7546 seconds.
[03/03/2023-12:40:45] [TRT] [I]   Calibrated batch 14 in 1.75687 seconds.
[03/03/2023-12:40:47] [TRT] [I]   Calibrated batch 15 in 1.7355 seconds.
[03/03/2023-12:40:54] [TRT] [E] 2: [quantization.cpp::DynamicRange::80] Error Code 2: Internal Error (Assertion min_ <= max_ failed. )

2023-03-03 12:40:54.210 | ERROR    | __main__:<module>:182 - An error has been caught in function '<module>', process 'MainProcess' (469532), thread 'MainThread' (281473415237648):
Traceback (most recent call last):

> File "export.py", line 182, in <module>
    main()
    └ <function main at 0xfffeeae42940>

  File "/opt/ssd/visiontools/edgeyolo/venv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
           │     │       └ {}
           │     └ ()
           └ <function main at 0xffff0bf304c0>

  File "export.py", line 169, in main
    data_save["model"] = model_trt.state_dict()
    │                    │         └ <function Module.state_dict at 0xffff21304160>
    │                    └ TRTModule()
    └ {'names': ['pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'], 'img_siz...

  File "/opt/ssd/visiontools/edgeyolo/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1321, in state_dict
    hook_result = hook(self, destination, prefix, local_metadata)
                  │    │     │            │       └ {'version': 1}
                  │    │     │            └ ''
                  │    │     └ OrderedDict()
                  │    └ TRTModule()
                  └ <function TRTModule._on_state_dict at 0xfffed7158700>
  File "/opt/ssd/visiontools/edgeyolo/venv/lib/python3.8/site-packages/torch2trt-0.4.0-py3.8.egg/torch2trt/torch2trt.py", line 572, in _on_state_dict
    state_dict[prefix + "engine"] = bytearray(self.engine.serialize())
    │          │                              │    └ None
    │          │                              └ TRTModule()
    │          └ ''
    └ OrderedDict()

AttributeError: 'NoneType' object has no attribute 'serialize'

Problems during training on VisDrone2019-MOT with default learning and data-augmentation settings

Sorry to bother you again. I was using the VisDrone2019-MOT dataset; I have since checked the data and found no remaining problems with it. During training, the total loss held steady between 2.8 and 3.6 for the first three epochs, but validation accuracy only improved after the first and second epochs. In the third epoch the loss still looked normal, yet validation precision dropped by 0.3. In the fourth epoch, around the midpoint, the total loss suddenly jumped to 6 and kept rising afterwards; precision then fell sharply and dropped to zero after epoch 7.
I later tried lowering the learning rate, but the same thing happened. I also tried loading the epoch-3 weights and training with an even lower learning rate, and the same pattern occurred: a drop at epoch 3 and a collapse at epoch 4. I really cannot find a solution.

Complete comparison with SOTA methods

Hello
How are you?
Thanks for contributing to this project.
I found that your method was compared with only YOLOv5, YOLOv6 and YOLOX among SOTA lightweight object detectors.
Could we also see comparisons with YOLOv7 and YOLOv8?

Why does evaluate raise "list index out of range"?

I ran into this problem when using EdgeYOLO on my own dataset. Training itself completes fine, but evaluation fails with this error, and evaluating directly afterwards fails the same way; the full error is below. My dataset is in COCO format and works with other models. I only have four classes, though; could the small number of classes be the cause? How should I fix it?
File "/home/workspace/edgeyolo/edgeyolo/train/val/coco_evaluator.py", line 192, in convert_to_coco_format
label = self.dataloader.dataset.class_ids[int(cls[ind])]
IndexError: list index out of range

Train Bug

I already use pytorch1.8.0, but still encounter the bug during training, could you give me some help? Thank you

/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [301,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [301,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [301,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [301,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=710 : device-side assert triggered
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=710 : device-side assert triggered
20231218_233328 edgeyolo.train.loss:371 - error msg: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa857cfe2f2 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fa857cfb67b in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7fa857f561f9 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fa857ce63a4 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e44ca (0x7fa8cbb0a4ca in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e4561 (0x7fa8cbb0a561 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #6: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x509306]
frame #7: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0360]
frame #8: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #9: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #10: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #11: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5023c9]
frame #12: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x502019]
frame #13: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x501fdd]
frame #14: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4df468]
frame #15: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5c8443]
frame #16: _PyEval_EvalFrameDefault + 0x4b37 (0x4ec567 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #17: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #18: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x685 (0x4e80b5 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #20: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f8123]
frame #21: _PyEval_EvalFrameDefault + 0x3c7 (0x4e7df7 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #22: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #23: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x1231 (0x4e8c61 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #25: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #26: _PyEval_EvalCodeWithName + 0x47 (0x4e67b7 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #27: PyEval_EvalCodeEx + 0x39 (0x4e6769 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #28: PyEval_EvalCode + 0x1b (0x59466b in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #29: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5c1dc7]
frame #30: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5bddd0]
frame #31: PyRun_StringFlags + 0x9b (0x5b59eb in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #32: PyRun_SimpleStringFlags + 0x3b (0x5b56cb in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #33: Py_RunMain + 0x25c (0x5b4f0c in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #34: Py_BytesMain + 0x39 (0x588719 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #35: __libc_start_main + 0xe7 (0x7fa8ce156c87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #36: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5885ce]

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
(an identical stack trace is printed a second time)

Traceback (most recent call last):
  File "/home/wsy/paper/Edgeyolo-231206/train.py", line 16, in <module>
    train("DEFAULT" if args.default else args.cfg)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/launch_train.py", line 101, in launch
    mp.start_processes(
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 355, in get_losses
    ) = self.get_assignments(  # noqa
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 553, in get_assignments
    ) = self.dynamic_k_matching(cost, pair_wise_ious, gt_classes, num_gt, fg_mask)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 668, in dynamic_k_matching
    cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/launch_train.py", line 73, in train_single
    trainer.train()
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/trainer.py", line 499, in train
    train_one_epoch()
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/trainer.py", line 485, in train_one_epoch
    train_one_iter()
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/trainer.py", line 460, in train_one_iter
    train_in_iter()
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/trainer.py", line 410, in train_in_iter
    outputs = self.loss(outputs, (targets, mask_edge))
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 241, in forward
    loss, bbox_loss, confidence_loss, class_loss, l1_loss, num_fg = self.get_losses(
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 389, in get_losses
    ) = self.get_assignments(  # noqa
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 492, in get_assignments
    gt_bboxes_per_image = gt_bboxes_per_image.cpu().float()
RuntimeError: CUDA error: device-side assert triggered

cannot connect to X server BUG

root@autodl-container-809011af9e-2908fc6e:~/edgeyolo# python detect.py --weights ./weights/edgeyolo_coco.pth --source test.avi --fp16
2023-09-01 18:00:05.169 | INFO | edgeyolo.models:__init__:50 - loading models from weight /root/edgeyolo/weights/edgeyolo_coco.pth
Params: 41.23M, Gflops: 126.71
Reparameterizing models...
After re-parameterization: Params: 40.51M, Gflops: 124.70
574.5ms average:574.5ms: cannot connect to X server
The above is the problem I get when running the code. Could you please help explain it?

pycocotools starts with index 1

Hi guys! Thanks for your great job!

I evaluated my custom one-class (pedestrian) dataset (VisDrone format) and got -1 for the COCO metric values, because pycocotools class indexes start at 1; index 0 means no object.
I fixed it locally, and now it works fine for me.
But maybe that means not all pedestrians are included in your COCO evaluation?

Add this line:
coco_evaluator.py 242: logger.info(f'cocoGT {cocoGt.anns[1]} CatIds: {cocoGt.cats}')

And get result:

cocoGT {'segmentation': [], 'area': 598.0, 'iscrowd': 0, 'image_id': 0, 'bbox': [440.0, 248.0, 13.0, 46.0], 'category_id': 0, 'id': 1} , ...
CatIds: {1: {'supercategory': 'pedestrian', 'id': 1, 'name': 'pedestrian'}, ...
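A hedged illustration of the off-by-one fix described above (this is not the repo's converter; the names and structure are illustrative): when writing a COCO-format annotation file for evaluation with pycocotools, start category ids at 1 and offset every annotation accordingly.

# category ids must start at 1; index 0 means "no object" for pycocotools
names = ["pedestrian"]
categories = [{"id": i + 1, "name": n, "supercategory": "none"}
              for i, n in enumerate(names)]

# and when converting each label:
# ann["category_id"] = class_index + 1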

About detect.py

Thank you very much for sharing. Running evaluate works fine for me, but after training finishes, detect always fails with the error below. Is my environment misconfigured? Could you share the Python, PyTorch, CUDA and OpenCV versions you use? My OpenCV may be the problem, but I have tried many versions and still cannot run prediction.
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
242.7ms average:242.7ms
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/lzy30/anaconda3/envs/edgeyolo/lib/python3.8/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb.

Retraining from a trained custom weight

Hi,

I am planning on re-training using a custom weight, e.g. from epoch 5.
When I start a training with that as the weight file, it begins counting the train epochs at 5 instead of 0.
Is there a way to reset the epoch within the weight file? A pretrained COCO weight starts at 0, but it has certainly been trained for more epochs :)

Thanks in advance!
Lydia
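A hedged sketch of resetting the stored epoch: the "epoch" key and the -1 convention are assumptions, inferred from evaluate.py's --save option above, which saves weights "without optimizer params and set epoch to -1":

# inspect and reset the epoch counter stored in a checkpoint
import torch

ckpt = torch.load("epoch_005.pth", map_location="cpu")
print(ckpt.keys())     # check which keys the weight file actually stores
ckpt["epoch"] = -1     # same convention evaluate.py --save uses
torch.save(ckpt, "epoch_005_reset.pth")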

Cache not saved problem

Hi, I ran into a problem in the training part of your code. Could you help me solve it?
loading VisDrone dataset...
Traceback (most recent call last):
  File "train.py", line 16, in <module>
    train("DEFAULT" if args.default else args.cfg)
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\launch_train.py", line 112, in launch
    train_single(params=params)
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\launch_train.py", line 73, in train_single
    trainer.train()
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\trainer.py", line 497, in train
    before_train()
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\trainer.py", line 337, in before_train
    self.load_init()
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\trainer.py", line 273, in load_init
    load_dataloader()
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\trainer.py", line 118, in load_dataloader
    dataset = get_dataset(
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\data\datasets\get_dataset.py", line 37, in get_dataset
    dataset = dataset(
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\data\datasets\visdrone.py", line 89, in __init__
    self.annotation_list = self._load_visdrone_annotations()
  File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\data\datasets\visdrone.py", line 192, in _load_visdrone_annotations
    with open(cache_file, "wb") as cachef:
FileNotFoundError: [Errno 2] No such file or directory: '/datasets/VisDrone2019-DET/train_cache.edgeyolo'

Data augmentation

Is there any visualization of the paper's data augmentation method? YOLOv5, for example, produces sample images of the augmentation effect during training.

segmentation support

Hi guys,
thanks for your great work!

I wanted to ask if you are planning on supporting segmentation?
And if yes: when?

All the best,
Lydia

CUDA problem

The error message is as follows:
root@autodl-container-7154118f52-44680f8e:~/edgeyolo-main# python train.py --cfg ./params/train/train_coco.yaml
Traceback (most recent call last):
File "train.py", line 16, in
train("DEFAULT" if args.default else args.cfg)
File "/root/edgeyolo-main/edgeyolo/train/launch_train.py", line 101, in launch
mp.start_processes(
File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/root/edgeyolo-main/edgeyolo/train/launch_train.py", line 50, in train_single
torch.cuda.set_device(device)
File "/root/miniconda3/lib/python3.8/site-packages/torch/cuda/init.py", line 261, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
How can I solve this?
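A hedged hint (inferred from the default settings log in another issue below, where device defaults to [0, 1, 2, 3]): "invalid device ordinal" typically means the configured device list names more GPUs than the machine has. Matching the device entry in the train config to the available GPUs should avoid it:

# in ./params/train/train_XXX.yaml (key name taken from the settings log)
device: [0]   # use only GPU 0 on a single-GPU machine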

About quantization

Thank you for opening this repo. I recommend adding int8 model FPS and accuracy to the README.

Can training be resumed from a checkpoint?

I see that train.py only has two argument options. At the start of training there is a message "no weight file found, setup models from cfg file /mnt/edgeyolo-main/params/model/edgeyolo.yaml". Does that mean a previously trained model can be loaded to continue training?

High GPU memory usage

During training, is it normal for a model of similar size to use much more GPU memory than YOLOX? YOLOX can run with batch size 256, while EdgeYOLO can only run with batch size 8.

Can I train your model on my own dataset?

If I have a dataset, can I train it with your model? How do I do that?
When I trained directly, I got the following error:
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'model_cfg' not list in settings file, use default value: model_cfg=params/model/edgeyolo.yaml
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'weights' not list in settings file, use default value: weights=output/train/edgeyolo_coco/last.pth
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'use_cfg' not list in settings file, use default value: use_cfg=False
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'output_dir' not list in settings file, use default value: output_dir=output/train/edgeyolo_coco
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'save_checkpoint_for_each_epoch' not list in settings file, use default value: save_checkpoint_for_each_epoch=True
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'log_file' not list in settings file, use default value: log_file=log.txt
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'dataset_cfg' not list in settings file, use default value: dataset_cfg=params/dataset/coco.yaml
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'batch_size_per_gpu' not list in settings file, use default value: batch_size_per_gpu=8
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'loader_num_workers' not list in settings file, use default value: loader_num_workers=4
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'num_threads' not list in settings file, use default value: num_threads=1
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'device' not list in settings file, use default value: device=[0, 1, 2, 3]
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'fp16' not list in settings file, use default value: fp16=False
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'cudnn_benchmark' not list in settings file, use default value: cudnn_benchmark=False
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'optimizer' not list in settings file, use default value: optimizer=SGD
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'max_epoch' not list in settings file, use default value: max_epoch=300
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'close_mosaic_epochs' not list in settings file, use default value: close_mosaic_epochs=15
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'lr_per_img' not list in settings file, use default value: lr_per_img=0.00015625
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'warmup_epochs' not list in settings file, use default value: warmup_epochs=5
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'warmup_lr_ratio' not list in settings file, use default value: warmup_lr_ratio=0.0
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'final_lr_ratio' not list in settings file, use default value: final_lr_ratio=0.05
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'loss_use' not list in settings file, use default value: loss_use=['bce', 'bce', 'giou']
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'input_size' not list in settings file, use default value: input_size=[640, 640]
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'multiscale_range' not list in settings file, use default value: multiscale_range=5
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'weight_decay' not list in settings file, use default value: weight_decay=0.0005
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'momentum' not list in settings file, use default value: momentum=0.9
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'enhance_mosaic' not list in settings file, use default value: enhance_mosaic=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'use_ema' not list in settings file, use default value: use_ema=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'enable_mixup' not list in settings file, use default value: enable_mixup=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'mixup_scale' not list in settings file, use default value: mixup_scale=[0.5, 1.5]
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'mosaic_scale' not list in settings file, use default value: mosaic_scale=[0.1, 2.0]
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'flip_prob' not list in settings file, use default value: flip_prob=0.5
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'mosaic_prob' not list in settings file, use default value: mosaic_prob=1
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'mixup_prob' not list in settings file, use default value: mixup_prob=1
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'degrees' not list in settings file, use default value: degrees=10
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'hsv_gain' not list in settings file, use default value: hsv_gain=[0.0138, 0.664, 0.464]
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'eval_at_start' not list in settings file, use default value: eval_at_start=False
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'val_conf_thres' not list in settings file, use default value: val_conf_thres=0.001
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'val_nms_thres' not list in settings file, use default value: val_nms_thres=0.65
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'eval_only' not list in settings file, use default value: eval_only=False
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'obj_conf_enabled' not list in settings file, use default value: obj_conf_enabled=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'eval_interval' not list in settings file, use default value: eval_interval=1
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'print_interval' not list in settings file, use default value: print_interval=100
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'load_optimizer_params' not list in settings file, use default value: load_optimizer_params=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'train_backbone' not list in settings file, use default value: train_backbone=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'train_start_layers' not list in settings file, use default value: train_start_layers=51
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'force_start_epoch' not list in settings file, use default value: force_start_epoch=-1
Traceback (most recent call last):
  File "D:\myworkspace\yolov5\edgeyolo-main\train.py", line 16, in <module>
    train("DEFAULT" if args.default else args.cfg)
  File "D:\myworkspace\yolov5\edgeyolo-main\edgeyolo\train\launch_train.py", line 101, in launch
    mp.start_processes(
  File "D:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 188, in start_processes
    while not context.join():
  File "D:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "D:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 59, in _wrap
    fn(i, *args)
  File "D:\myworkspace\yolov5\edgeyolo-main\edgeyolo\train\launch_train.py", line 50, in train_single
    torch.cuda.set_device(device)
  File "D:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal

Training does not stably maintain high GPU usage

When I train on my dataset, the GPU does not stably maintain high usage
(sometimes 10%, sometimes 70%; the number goes up and down).
Batch size is 8 (because the RTX 3060 only has 12 GB of GPU RAM).
num_workers is set to 32, because I have a 32-core CPU.
threads is set to 1, because I only have one GPU.
Do any parameters still need to be changed?

How should the COCO conversion of VisDrone2019 be set up?

I converted VisDrone2019-MOT to COCO format in the same way. When training with the pretrained weights I found that precision kept dropping, and at epoch 15 it was all zeros. I looked at the JSON in the VisDrone_coco you provide: its categories include "ignored regions" and "others", but category_id only ranges from 0 to 9, and with the default configuration num_classes is also set to 10.

Custom Training gives no detections

Hi,
I am training EdgeYOLO on a custom dataset. Unfortunately, the model does not detect any objects.
The training loss also does not decrease as expected.

number of images:
train images: 15000
val images: 4370

and the auto-calculated anchors in YOLOv4 would be:
[5, 10], [13, 30], [25, 64], [38,123], [51,230], [87,158], [82,340], [136,420], [256,492]

My questions now:

Do I need to edit the anchors in params/model/edgeyolo.yaml ?
And do I need to change the learning rate maybe?

Minimum size for training

Hi Shihan Liu,

What is the reason there is a minimum input size for the model? In my experiments 192x192.

What would be needed to adapt the model to handle smaller sizes, eg 96x96?

Best regards,
Ramon

After adding an SE attention block after the head, loss is normal when training from pretrained weights, but the model outputs inf at eval time

The SE module is added correctly, and during training the loss is output normally with no abnormal values. Currently bs=16, trained for only 16 epochs.
But after model.eval(), a Conv layer right after the SE block outputs NaN. All outputs before the SE block contain no NaN, and the input to this Conv layer is also normal, but its output is abnormal. I then checked the BN layers and found the running mean and variance of the Conv after the SE block are NaN, while weight and bias are normal; even after disabling the running stats it still outputs NaN. Checking conv2d also revealed no abnormal weights or biases.

Model detects nothing

I followed the instructions in the readme and ran the model with the pretrained COCO tiny EdgeYOLO weights, but when I ran inference on my clip of cars driving on a street, it detects nothing; the result is always None.
Please help me.

Do the VisDrone weights need other settings? I ran into some problems

Very nice model! Thanks a lot for sharing it. I converted the VisDrone2019 MOT val set to COCO format and ran eval myself, and the mAP was very low, using the best VisDrone weights with default parameters. I then ran detect directly on a batch of images from the MOT dataset, and the results looked fine.

Training time

Hello, how long does it take to train 300 epochs on the COCO dataset with four RTX 3090s?

Edge-side inference

Hello, have you compared inference latency on ARM devices?
The Jetson AGX Xavier has far more compute power than ordinary chips, such as some Qualcomm and MediaTek ARM SoCs.

Speed slow compared to Yolov5

Hi sir,

Thank you once again for your wonderful contribution!

I'm experiencing slower inference compared to YOLOv5, contrary to what is suggested in the paper. Obviously, I must be overlooking some reparameterization or other essential export steps.

Both exported as tflite models:
Edgeyolo Small: Mean inference time of 1059.58ms
YOLOv5 Small: Mean inference time of 372.30ms

Please see the attached benchmark (last two cells): https://colab.research.google.com/drive/1ZrniYC57kWCd5CB39uMIFPL1eGjt-3aS?usp=sharing#scrollTo=szTi0fvV01kq

Do you have any insights into what I might be missing?

Best regards,
Ramon

On my own dataset, the issue of mAP being 0

20230626_220743 edgeyolo.train.trainer:379 - Start Train Epoch 1
20230626_220826 edgeyolo.train.trainer:457 - epoch:1/150 iter:100/488 mem:5829MB t_iter:1.04 lr:3.149e-06 loss:{total:9.27 iou:3.17 conf:5.53 cls:0.57} ETA:8:36:33
20230626_220908 edgeyolo.train.trainer:457 - epoch:1/150 iter:200/488 mem:5829MB t_iter:0.96 lr:1.260e-05 loss:{total:7.25 iou:3.01 conf:3.52 cls:0.72} ETA:8:37:18
20230626_220950 edgeyolo.train.trainer:457 - epoch:1/150 iter:300/488 mem:5831MB t_iter:1.17 lr:2.834e-05 loss:{total:6.50 iou:2.50 conf:3.47 cls:0.53} ETA:8:33:24
20230626_221032 edgeyolo.train.trainer:457 - epoch:1/150 iter:400/488 mem:5831MB t_iter:0.88 lr:5.039e-05 loss:{total:4.61 iou:1.99 conf:1.98 cls:0.65} ETA:8:30:49
20230626_221108 edgeyolo.models:146 - weight file saved to output/train/tiny_lrelu/last.pth
20230626_221108 edgeyolo.models:146 - weight file saved to output/train/tiny_lrelu/epoch_001.pth
0it [00:07, ?it/s]
20230626_221116 edgeyolo.train.val.coco_evaluator:208 - Evaluate in main process...
20230626_221116 edgeyolo.train.trainer:523 -
Average forward time: 0.00 ms, Average NMS time: 0.00 ms, Average inference time: 0.00 ms

20230626_221116 edgeyolo.models:146 - weight file saved to output/train/tiny_lrelu/last.pth
20230626_221116 edgeyolo.models:146 - weight file saved to output/train/tiny_lrelu/epoch_001.pth
Why are the NMS time and inference time showing zero on my own dataset? Also, the mAP0.5 and mAP0.5:0.95 of the evaluation on my model are zero.

Problem with the trained model

Hello, I trained your model on the DeepFashion2 dataset (YOLO-format labels, 13 clothing classes). At inference time the trained model produces detection boxes, but every box is given the same class label: for example, boxes correctly enclose "trousers" and "long-sleeved top", but the output text is always "short sleeve". When training on my own dataset, what else needs to be changed besides the yaml file? Thanks.

Training in Yolo format

I have a YOLO-annotated text dataset in this folder structure:

train:
  images/*.jpg
  labels/*.txt

val:
  images/*.jpg
  labels/*.txt

Got this error after 1 epoch:

Loading and preparing results...
20230314_012420 edgeyolo.train.trainer:480 - error: Results do not correspond to current coco set

Deployment

Hello, after generating an ONNX file with your model, can I deploy it to Android using an existing framework (e.g. ncnn)?

Could you share some details about running inference on RK3588 NPU?

Hi guys,

I have an RK3588 board, which you used in the paper as an edge device to run on.

I trained my weights using edgeyolo_tiny_lrelu, converted them to ONNX, then to RKNN using rknn-toolkit 1.4.

The command used for ONNX export was:

python export.py --onnx-only --weights /workspaces/rocm-ml/edgeyolo/output/train/edgeyolo_lp_2/best.pth --opset 12

However, I am currently unable to use QUANTIZE_ON during the onnx->rknn conversion. I used the same dataset as for validation during training, with anywhere from 10 to 50 images, without success: the resulting RKNN model always outputs bogus values.
You mention quantization being enabled in your paper; could you share some details about how you managed to make it work?

I am running it on an RK3588 (from Radxa, Rock 5B); my inference speed is around 11 FPS.

You mention 32 FPS in your paper.

  • I am curious whether you implemented multi-threaded inference on the RK3588 using all 3 NPU cores yourself, which would give 3x11 FPS?
  • Also, have you used edgeyolo_tiny_lrelu.yaml as model_cfg?

Could you please share some more details about running EdgeYOLO on the RK3588? Neither the paper nor the code here gives much insight into running it on that edge device.

Thank you in advance~

ONNX exported model is outputting bogus - normalize image to 0..1 values

I tried it both with torch 2.0 and torch 1.3, and also messed with different versions of onnx; it all behaves the same.

I am using the model edgeyolo_tiny_lrelu trained on my custom dataset for 100 epochs with 1 class.

The trained model (best.pth) works with detect.py and gives correct results for me.

However, when I export it using the following command line:

python export.py --onnx-only --weights /workspaces/rocm-ml/edgeyolo-output/train/edgeyolo_lp/best.pth

I have to comment out "import tensorrt as trt" in export.py, and I get the following warnings:

/workspaces/rocm-ml/python-venv/edgeyolo-cpu/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Reparameterizing models...
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:963: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if augment:
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:995: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:1010: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
2023-06-17 20:17:26.644 | INFO     | __main__:main:124 - 
start to simplify ONNX...
2023-06-17 20:17:27.221 | INFO     | __main__:main:131 - ONNX export success, saved as output/export/best/640x640_batch1.onnx
2023-06-17 20:17:27.221 | INFO     | __main__:main:178 - All files are saved in output/export/best.

The ONNX model is created, but it's not usable... it does not output anything meaningful...

I am trying to use it via the following commands in a Python notebook:

import numpy as np
import onnxruntime as rt
import cv2
import torch, torchvision

# init rt
sess = rt.InferenceSession("/workspaces/rocm-ml/tmp/edgeyolo/output/export/best/640x640_batch1.onnx")
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# resize to 640x640
original_image: np.ndarray = cv2.imread("/workspaces/rocm-ml/datasets/ds_yolo/valid/images/drive_img_0015.jpg")
[height, width, _] = original_image.shape
length = max((height, width))
image = np.zeros((length, length, 3), np.uint8)
image[0:height, 0:width] = original_image 
scale = length / 640
image = cv2.resize(image, (640, 640))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# prepare and transpose
image = image / 255.0
image = image.transpose(2, 0, 1)  # HWC -> CHW
# batch
image = np.expand_dims(image, axis=0).astype(np.float32)

# run prediction
pred_onx = sess.run([output_name], {input_name: image})[0]

# pred_onx.shape is correct -> (1, 8400, 6)

Then, continuing just by reusing your code:

# convert it to torch tensor
prediction = torch.tensor(pred_onx)

# boxes and pred
box_corner = prediction.new(prediction.shape)
box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
prediction[:, :, :4] = box_corner[:, :, :4]

# conf
conf_thre = 0.1
num_classes = 1

# detections
output = [None for _ in range(len(prediction))]
for i, image_pred in enumerate(prediction):
    # If none are remaining => process next image
    if not image_pred.size(0):
        continue
    # Get score and class with highest confidence
    class_conf, class_pred = torch.max(image_pred[:, 5 : 5 + num_classes], 1, keepdim=True)

    conf_mask = (image_pred[:, 4] * class_conf.squeeze() >= conf_thre).squeeze()
    # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
    detections = torch.cat((image_pred[:, :5], class_conf, class_pred.float(), image_pred[:, 5 + num_classes :]), 1)
    detections = detections[conf_mask]
    if not detections.size(0):
        continue

And this is where it ends: the detections array is (0, 7) instead of (1, 7), which is what detect.py returns at this point.
So, no detections...

The code is taken from your postprocess function.

I can provide the notebook, best.pth and the onnx file if you are interested.

AnchorFreeDetect parameter fusion

After switching YOLOXDetect to AnchorFreeDetect, fusing its ia (implicit) parameters raises an error.
Code:

for i in range(len(self.m)):
    c1, c2, _, _ = self.m[i].weight.shape
    c1_, c2_, _, _ = self.ia[i].implicit.shape
    self.m[i].bias += torch.matmul(self.m[i].weight.reshape(c1, c2),
                                   self.ia[i].implicit.reshape(c2_, c1_)).squeeze(1)

Error:
ValueError: Cannot assign non-leaf Tensor to parameter 'bias'. Model parameters must be created explicitly. To express 'bias' as a function of another Tensor, compute the value in the forward() method.
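A hedged workaround (not an official fix from the repo): update the parameter's .data inside the fusion loop so that no non-leaf tensor is assigned to the 'bias' parameter:

# inside the detect head's fusion method, under no_grad for safety
with torch.no_grad():
    for i in range(len(self.m)):
        c1, c2, _, _ = self.m[i].weight.shape
        c1_, c2_, _, _ = self.ia[i].implicit.shape
        self.m[i].bias.data += torch.matmul(
            self.m[i].weight.reshape(c1, c2),
            self.ia[i].implicit.reshape(c2_, c1_),
        ).squeeze(1)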

best.pth

Hello, I trained your model on my own dataset; after 300 epochs no best.pth appeared, only a last_augmentation_epoch.pth. What is the reason for this? Thanks.
