
edgeyolo's Issues

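High GPU memory usage

During training, is it normal for a model of similar size to use much more GPU memory than YOLOX? YOLOX can run with bs=256, while edgeyolo only manages bs=8.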

Cache file not saved

Hello, I ran into a problem in the training part of your code. Could you help me solve it?
loading VisDrone dataset...
Traceback (most recent call last):
File "train.py", line 16, in
train("DEFAULT" if args.default else args.cfg)
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\launch_train.py", line 112, in launch
train_single(params=params)
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\launch_train.py", line 73, in train_single
trainer.train()
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\trainer.py", line 497, in train
before_train()
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\trainer.py", line 337, in before_train
self.load_init()
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\trainer.py", line 273, in load_init
load_dataloader()
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\train\trainer.py", line 118, in load_dataloader
dataset = get_dataset(
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\data\datasets\get_dataset.py", line 37, in get_dataset
dataset = dataset(
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\data\datasets\visdrone.py", line 89, in init
self.annotation_list = self._load_visdrone_annotations()
File "D:\soft\study\my_project\yolo_detection\edgeyolo-main\edgeyolo\data\datasets\visdrone.py", line 192, in _load_visdrone_annotations
with open(cache_file, "wb") as cachef:
FileNotFoundError: [Errno 2] No such file or directory: '/datasets/VisDrone2019-DET/train_cache.edgeyolo'
annotation_list
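
The traceback ends in `open(cache_file, "wb")`. Opening a file for writing raises `FileNotFoundError` when its parent directory does not exist. Below is a minimal sketch of a guard, assuming the dataset root from the dataset config (here `/datasets/VisDrone2019-DET`) simply does not exist on this machine. `write_cache` is a hypothetical helper, not a function from the repository.

```python
import os
import pickle
import tempfile


def write_cache(cache_file, annotations):
    """Pickle `annotations` to `cache_file`, creating the parent folder first.

    Hypothetical helper for illustration; the repository's own cache format
    may differ.
    """
    parent = os.path.dirname(os.path.abspath(cache_file))
    # open(..., "wb") does not create missing directories, so do it explicitly.
    os.makedirs(parent, exist_ok=True)
    with open(cache_file, "wb") as cachef:
        pickle.dump(annotations, cachef)


# Usage: the nested folder does not exist yet, but the write still succeeds.
demo_root = tempfile.mkdtemp()
demo_cache = os.path.join(demo_root, "VisDrone2019-DET", "train_cache.edgeyolo")
write_cache(demo_cache, [{"image": "0000001.jpg", "boxes": []}])
with open(demo_cache, "rb") as f:
    restored = pickle.load(f)
```

If the folder is missing because the dataset path itself is wrong, creating it only hides the problem, so the dataset paths in the config are worth checking first.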

Slow speed compared to YOLOv5

Hi sir,

Thank you once again for your wonderful contribution!

I'm experiencing slower inference compared to YOLOv5, contrary to what is suggested in the paper. Obviously, I must be overlooking some reparameterization or other essential export steps.

Both exported as tflite models:
Edgeyolo Small: Mean inference time of 1059.58ms
YOLOv5 Small: Mean inference time of 372.30ms

Please see the attached benchmark (last two cells): https://colab.research.google.com/drive/1ZrniYC57kWCd5CB39uMIFPL1eGjt-3aS?usp=sharing#scrollTo=szTi0fvV01kq

Do you have any insights into what I might be missing?

Best regards,
Ramon

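Is it possible to resume training from a checkpoint?

I see that train.py only has two argument options. When training starts, it prints "no weight file found, setup models from cfg file /mnt/edgeyolo-main/params/model/edgeyolo.yaml". Does this mean that a previously trained model can be loaded to continue training?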

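How is the COCO version of VisDrone2019 set up?

I converted VisDrone2019MOT to COCO format in the same way. When training from the pretrained weights, the accuracy kept dropping, and by epoch 15 it was all 0. I looked at the JSON in the VisDrone_coco you provide: its categories include "ignored regions" and "others", but category_id only takes the values 0~9. With the default configuration, num_classes is also set to 10.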

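Problem during training on VisDrone2019MOT with the default learning-rate and data augmentation settings

Sorry to bother you again. I am the one who was using the VisDrone2019MOT dataset earlier. I have since checked the data and found nothing wrong with it. During training, the total loss stays steady between 2.8 and 3.6 for the first three epochs, but validation accuracy only rises after the first and second epochs. In the third epoch the loss still looks normal, yet validation accuracy drops by 0.3. Midway through the fourth epoch the total loss jumps to 6 and then keeps rising from that high level. Accuracy then falls sharply and is zero after the seventh epoch.
I also tried lowering the learning rate, but the result is the same. I also tried loading the epoch-3 weights and training with an even lower learning rate, and the same thing happens: a drop at epoch 3 and a collapse at epoch 4. I really cannot find a solution.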

About quantization.

Thank you for open-sourcing this repo. I recommend adding the FPS and accuracy of the INT8 models to the README.

segmentation support

Hi guys,
thanks for your great work!

I wanted to ask if you are planning on supporting segmentation?
And if yes: when?

All the best,
Lydia

Training time

Hello, how long does it take to train 300 epochs on the COCO dataset with four 3090 GPUs?

Model detects nothing

I followed the instructions in the README and ran the pretrained COCO tiny edgeyolo model, but when I tried to run inference on my clip of cars driving on a street, it detected nothing; the result is always None.
Please help.

Retraining from a trained custom weight

Hi,

I am planning to retrain from a custom weight, e.g. from epoch 5.
When I start training with it as the weight file, it starts counting the training epochs at 5 instead of 0.
Is there a way to reset the epoch stored in the weight file? A pretrained COCO weight starts at 0, but it has certainly been trained for more epochs :)

Thanks in advance!
Lydia

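After adding an SE attention block after the head and training from pretrained weights, the loss is normal, but the model outputs inf in eval

The SE module was added correctly, and the loss during training is normal with no abnormal values. The batch size is currently 16, and I have trained for only 16 epochs.
However, after model.eval(), a Conv layer after the SE block outputs nan. None of the outputs before the SE block contain nan, and the input to this Conv layer is also normal, but its output is not. I then checked the BN layers and found that the running mean and variance of the Conv after the SE block are nan, while weight and bias are normal. Even with the running statistics disabled, it still outputs nan. I then checked conv2d and found no abnormal weight or bias either.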

GPU usage is not stable during training

When I train on my dataset, GPU usage does not stay stably high
(it is sometimes 10% and sometimes 70%; the numbers go up and down).
Batch size is 8 (because the RTX 3060 only has 12 GB of GPU RAM).
The number of workers is set to 32, because I have a 32-core CPU.
Threads is set to 1, because there is only one GPU.
Do any parameters still need to be changed?

cannot connect to X server BUG

root@autodl-container-809011af9e-2908fc6e:~/edgeyolo# python detect.py --weights ./weights/edgeyolo_coco.pth --source test.avi --fp16
2023-09-01 18:00:05.169 | INFO | edgeyolo.models:init:50 - loading models from weight /root/edgeyolo/weights/edgeyolo_coco.pth
Params: 41.23M, Gflops: 126.71
Reparameterizing models...
After re-parameterization: Params: 40.51M, Gflops: 124.70
574.5ms average:574.5ms: cannot connect to X server
The above is the problem I ran into when running the code. Could the author help explain it?
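
The message `cannot connect to X server` typically appears when a GUI call such as OpenCV's `cv2.imshow` runs on a machine without a display, which is common in remote containers. Below is a minimal sketch of a guard, assuming (the log does not confirm this) that the window display in `detect.py` is what triggers it. `can_show_windows` is a hypothetical helper.

```python
import os
import sys


def can_show_windows(environ=None, platform=None):
    """Return True if opening a GUI window is likely to work.

    On Linux, X11 and Wayland clients locate the display server through the
    DISPLAY or WAYLAND_DISPLAY environment variables. Other platforms are
    assumed to have a desktop session.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        return bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    return True


# In a display loop one would branch on the result, for example:
#   if can_show_windows(): cv2.imshow("result", frame)
#   else: writer.write(frame)   # save to a video file instead
headless = can_show_windows(environ={}, platform="linux")
with_x11 = can_show_windows(environ={"DISPLAY": ":0"}, platform="linux")
```

A set `DISPLAY` variable does not guarantee that the server is reachable, so this is only a first-line check.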

Error in export to TRT int8

Hi,

I have a question regarding the TRT-export with int8.

I run the following command:

python export.py --trt --weights edgeyolo_visdrone.pth --batch 1 --workspace 8 --int8 --dataset params/dataset/visdrone_coco.yaml --num-imgs 16

Unfortunately after the calibration part I get the following error.
Do you have any idea or suggestion what I am doing wrong?

Thanks in advance!

2023-03-03 12:39:32.768 | INFO     | edgeyolo.models:__init__:50 - loading models from weight /opt/ssd/visiontools/edgeyolo/edgeyolo_visdrone.pth
/opt/ssd/visiontools/edgeyolo/venv/lib/python3.8/site-packages/torch/functional.py:568: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2311.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Reparameterizing models...
2023-03-03 12:39:37.982 | INFO     | edgeyolo.export.calib:__init__:43 - used images: 16
[03/03/2023-12:39:46] [TRT] [I] [MemUsageChange] Init CUDA: CPU +221, GPU +0, now: CPU 763, GPU 9107 (MiB)
[03/03/2023-12:39:49] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +302, GPU +404, now: CPU 1088, GPU 9512 (MiB)
2023-03-03 12:40:04.763 | INFO     | edgeyolo.export.pth2trt:torch2onnx2trt:137 - start to simplify ONNX...
2023-03-03 12:40:08.507 | INFO     | edgeyolo.export.pth2trt:torch2onnx2trt:140 - simplified ONNX successfully.
[03/03/2023-12:40:08] [TRT] [W] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/03/2023-12:40:11] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +535, GPU +539, now: CPU 2739, GPU 11713 (MiB)
[03/03/2023-12:40:11] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +82, GPU +122, now: CPU 2821, GPU 11835 (MiB)
[03/03/2023-12:40:11] [TRT] [I] Timing cache disabled. Turning it on will improve builder speed.
[03/03/2023-12:40:17] [TRT] [I] Total Activation Memory: 9815946752
[03/03/2023-12:40:17] [TRT] [I] Detected 1 inputs and 4 output network tensors.
[03/03/2023-12:40:18] [TRT] [I] Total Host Persistent Memory: 239856
[03/03/2023-12:40:18] [TRT] [I] Total Device Persistent Memory: 0
[03/03/2023-12:40:18] [TRT] [I] Total Scratch Memory: 0
[03/03/2023-12:40:18] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 18 MiB, GPU 384 MiB
[03/03/2023-12:40:18] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 372 steps to complete.
[03/03/2023-12:40:18] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 47.7322ms to assign 10 blocks to 372 nodes requiring 182681600 bytes.
[03/03/2023-12:40:18] [TRT] [I] Total Activation Memory: 182681600
[03/03/2023-12:40:18] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +3, now: CPU 3143, GPU 12633 (MiB)
[03/03/2023-12:40:18] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +5, now: CPU 3143, GPU 12618 (MiB)
[03/03/2023-12:40:18] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +174, now: CPU 0, GPU 430 (MiB)
[03/03/2023-12:40:18] [TRT] [I] Starting Calibration.
[03/03/2023-12:40:20] [TRT] [I]   Calibrated batch 0 in 1.90551 seconds.
[03/03/2023-12:40:22] [TRT] [I]   Calibrated batch 1 in 1.72534 seconds.
[03/03/2023-12:40:24] [TRT] [I]   Calibrated batch 2 in 1.71749 seconds.
[03/03/2023-12:40:25] [TRT] [I]   Calibrated batch 3 in 1.73021 seconds.
[03/03/2023-12:40:27] [TRT] [I]   Calibrated batch 4 in 1.71629 seconds.
[03/03/2023-12:40:29] [TRT] [I]   Calibrated batch 5 in 1.7292 seconds.
[03/03/2023-12:40:31] [TRT] [I]   Calibrated batch 6 in 1.72344 seconds.
[03/03/2023-12:40:32] [TRT] [I]   Calibrated batch 7 in 1.74846 seconds.
[03/03/2023-12:40:34] [TRT] [I]   Calibrated batch 8 in 1.73344 seconds.
[03/03/2023-12:40:36] [TRT] [I]   Calibrated batch 9 in 1.72423 seconds.
[03/03/2023-12:40:38] [TRT] [I]   Calibrated batch 10 in 1.728 seconds.
[03/03/2023-12:40:39] [TRT] [I]   Calibrated batch 11 in 1.74652 seconds.
[03/03/2023-12:40:41] [TRT] [I]   Calibrated batch 12 in 1.75374 seconds.
[03/03/2023-12:40:43] [TRT] [I]   Calibrated batch 13 in 1.7546 seconds.
[03/03/2023-12:40:45] [TRT] [I]   Calibrated batch 14 in 1.75687 seconds.
[03/03/2023-12:40:47] [TRT] [I]   Calibrated batch 15 in 1.7355 seconds.
[03/03/2023-12:40:54] [TRT] [E] 2: [quantization.cpp::DynamicRange::80] Error Code 2: Internal Error (Assertion min_ <= max_ failed. )

2023-03-03 12:40:54.210 | ERROR    | __main__:<module>:182 - An error has been caught in function '<module>', process 'MainProcess' (469532), thread 'MainThread' (281473415237648):
Traceback (most recent call last):

> File "export.py", line 182, in <module>
    main()
    └ <function main at 0xfffeeae42940>

  File "/opt/ssd/visiontools/edgeyolo/venv/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
           │     │       └ {}
           │     └ ()
           └ <function main at 0xffff0bf304c0>

  File "export.py", line 169, in main
    data_save["model"] = model_trt.state_dict()
    │                    │         └ <function Module.state_dict at 0xffff21304160>
    │                    └ TRTModule()
    └ {'names': ['pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'], 'img_siz...

  File "/opt/ssd/visiontools/edgeyolo/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1321, in state_dict
    hook_result = hook(self, destination, prefix, local_metadata)
                  │    │     │            │       └ {'version': 1}
                  │    │     │            └ ''
                  │    │     └ OrderedDict()
                  │    └ TRTModule()
                  └ <function TRTModule._on_state_dict at 0xfffed7158700>
  File "/opt/ssd/visiontools/edgeyolo/venv/lib/python3.8/site-packages/torch2trt-0.4.0-py3.8.egg/torch2trt/torch2trt.py", line 572, in _on_state_dict
    state_dict[prefix + "engine"] = bytearray(self.engine.serialize())
    │          │                              │    └ None
    │          │                              └ TRTModule()
    │          └ ''
    └ OrderedDict()

AttributeError: 'NoneType' object has no attribute 'serialize'
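
Reading the log bottom-up, the `AttributeError` looks like a secondary symptom: TensorRT reports `Assertion min_ <= max_ failed` right after INT8 calibration, the engine then appears to be `None`, and the later `serialize()` call fails on it. Below is a minimal sketch of failing early with a clearer message. `FakeModule` and `engine_bytes` are hypothetical stand-ins for illustration, not torch2trt code.

```python
class FakeModule:
    """Stand-in for an object holding a built engine (None if the build failed)."""

    def __init__(self, engine=None):
        self.engine = engine


def engine_bytes(module):
    """Serialize the engine, or raise a descriptive error if none was built."""
    if module.engine is None:
        raise RuntimeError(
            "engine build failed; check the builder errors logged above "
            "before trying to save the model"
        )
    return bytearray(module.engine.serialize())


try:
    engine_bytes(FakeModule(engine=None))
    message = ""
except RuntimeError as err:
    message = str(err)
```

The guard does not fix the calibration failure itself; it only points the reader at the first error instead of the last one.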

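Why does evaluation raise "list index out of range"?

I ran into this problem when using edgeyolo on my own dataset. Training itself completes, but this error is raised during evaluation. Running evaluation on its own afterwards gives the same error; the message is shown below. My dataset is in COCO format and works with other models. However, I only have four classes. Could the small number of classes be the cause? How should I fix it?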
File "/home/workspace/edgeyolo/edgeyolo/train/val/coco_evaluator.py", line 192, in convert_to_coco_format
label = self.dataloader.dataset.class_ids[int(cls[ind])]
IndexError: list index out of range
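
The failing line indexes `class_ids` with the predicted class index, so the error means the model predicted an index that is not smaller than `len(class_ids)`. One possible cause, stated here as an assumption, is that the model head is configured for more classes than the dataset defines. Below is a minimal sketch of a pre-check; `check_class_indices` is a hypothetical helper.

```python
def check_class_indices(predicted, class_ids):
    """Return the predicted class indices that cannot index `class_ids`."""
    limit = len(class_ids)
    return sorted({int(c) for c in predicted if not 0 <= int(c) < limit})


# Four dataset classes, but the model emits indices as if it had more.
dataset_class_ids = [1, 2, 3, 4]
out_of_range = check_class_indices([0, 3, 17, 3, 42], dataset_class_ids)
# out_of_range == [17, 42]
```

An empty result would mean the indices are consistent and the cause lies elsewhere.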

pycocotools starts with index 1

Hi guys! Thanks for your great job!

I evaluated my custom one-class (pedestrian) dataset (VisDrone format) and got COCO metric values of -1, because pycocotools class indexes start at 1; index 0 means no object.
I fixed it locally, and now it works fine for me.
But maybe none of the pedestrians are included in your COCO evaluation?

I added this line:
coco_evaluator.py 242: logger.info(f'cocoGT {cocoGt.anns[1]} CatIds: {cocoGt.cats}')

and got this result:

cocoGT {'segmentation': [], 'area': 598.0, 'iscrowd': 0, 'image_id': 0, 'bbox': [440.0, 248.0, 13.0, 46.0], 'category_id': 0, 'id': 1} , ...
CatIds: {1: {'supercategory': 'pedestrian', 'id': 1, 'name': 'pedestrian'}, ...
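
The printed output shows an annotation with `category_id` 0 while the category table starts at id 1. A mismatch like this can be detected before evaluation. Below is a minimal sketch on a plain COCO-style dict; `unknown_category_ids` is a hypothetical helper, and the sample data mirrors the log above.

```python
def unknown_category_ids(coco):
    """Return annotation category ids that are not declared in `categories`."""
    declared = {cat["id"] for cat in coco.get("categories", [])}
    used = {ann["category_id"] for ann in coco.get("annotations", [])}
    return sorted(used - declared)


sample = {
    "categories": [{"supercategory": "pedestrian", "id": 1, "name": "pedestrian"}],
    "annotations": [
        {"id": 1, "image_id": 0, "category_id": 0,
         "bbox": [440.0, 248.0, 13.0, 46.0], "area": 598.0, "iscrowd": 0},
    ],
}
missing = unknown_category_ids(sample)
# missing == [0]
```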

Complete comparison with SOTA methods

Hello
How are you?
Thanks for contributing to this project.
I found that your method is compared only with YOLOv5, YOLOv6, and YOLOX among SOTA lightweight object detectors.
Can we see the comparison with yolov7 and yolov8 too?

Training in Yolo format

I have a YOLO-annotated (txt) dataset with the following folder structure:
train:
images/*.jpg
labels/*.txt

val:
images/*.jpg
labels/*.txt

Got this Error after 1 epoch:

Loading and preparing results...
20230314_012420 edgeyolo.train.trainer:480 - error: Results do not correspond to current coco set
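
This message comes from pycocotools' `COCO.loadRes`, which asserts that every `image_id` in the detection results also exists in the ground-truth annotation set. Below is a minimal sketch of the same check in plain Python, useful for finding the offending ids; `stray_image_ids` is a hypothetical helper.

```python
def stray_image_ids(results, ground_truth):
    """Return image ids that appear in `results` but not in the ground truth."""
    gt_ids = {img["id"] for img in ground_truth.get("images", [])}
    result_ids = {det["image_id"] for det in results}
    return sorted(result_ids - gt_ids)


gt = {"images": [{"id": 0}, {"id": 1}]}
dets = [
    {"image_id": 0, "category_id": 0, "bbox": [1, 2, 3, 4], "score": 0.9},
    {"image_id": 7, "category_id": 0, "bbox": [5, 6, 7, 8], "score": 0.4},
]
stray = stray_image_ids(dets, gt)
# stray == [7]
```

An empty list would mean the ids agree and the cause lies elsewhere.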

ONNX exported model outputs bogus results - normalize image to 0..1 values

I tried it with both torch 2.0 and torch 1.3 and also tried different versions of onnx; it behaves the same in all cases.

I am using the edgeyolo_tiny_lrelu model, trained on my custom dataset for 100 epochs with 1 class.

The trained model (best.pth) works with detect.py and gives correct results for me.

However, when I export it using the following command line:

python export.py --onnx-only --weights /workspaces/rocm-ml/edgeyolo-output/train/edgeyolo_lp/best.pth

I have to comment out the import (# import tensorrt as trt) in export.py, and I get the following warnings:

/workspaces/rocm-ml/python-venv/edgeyolo-cpu/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Reparameterizing models...
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:963: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if augment:
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:995: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
/workspaces/rocm-ml/tmp/edgeyolo/edgeyolo/models/yolo.py:1010: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
2023-06-17 20:17:26.644 | INFO     | __main__:main:124 - 
start to simplify ONNX...
2023-06-17 20:17:27.221 | INFO     | __main__:main:131 - ONNX export success, saved as output/export/best/640x640_batch1.onnx
2023-06-17 20:17:27.221 | INFO     | __main__:main:178 - All files are saved in output/export/best.

The ONNX model is created, but it is not usable: it does not output anything meaningful.

I am testing it with the following commands in a Python notebook:

import numpy as np
import onnxruntime as rt
import cv2
import torch, torchvision

# init rt
sess = rt.InferenceSession("/workspaces/rocm-ml/tmp/edgeyolo/output/export/best/640x640_batch1.onnx")
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# resize to 640x640
original_image: np.ndarray = cv2.imread("/workspaces/rocm-ml/datasets/ds_yolo/valid/images/drive_img_0015.jpg")
[height, width, _] = original_image.shape
length = max((height, width))
image = np.zeros((length, length, 3), np.uint8)
image[0:height, 0:width] = original_image 
scale = length / 640
image = cv2.resize(image, (640, 640))
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# prepare and transpose
image = image / 255.0
image = image.transpose(2, 0, 1)  # HWC -> CHW
# batch
image = np.expand_dims(image, axis=0).astype(np.float32)

# run prediction
pred_onx = sess.run([output_name], {input_name: image})[0]

# pred_onx.shape is correct -> (1, 8400, 6)

Then I continue by reusing your code:

# convert it to torch tensor
prediction = torch.tensor(pred_onx)

# boxes and pred
box_corner = prediction.new(prediction.shape)
box_corner[:, :, 0] = prediction[:, :, 0] - prediction[:, :, 2] / 2
box_corner[:, :, 1] = prediction[:, :, 1] - prediction[:, :, 3] / 2
box_corner[:, :, 2] = prediction[:, :, 0] + prediction[:, :, 2] / 2
box_corner[:, :, 3] = prediction[:, :, 1] + prediction[:, :, 3] / 2
prediction[:, :, :4] = box_corner[:, :, :4]

# conf
conf_thre = 0.1
num_classes = 1

# detections
output = [None for _ in range(len(prediction))]
for i, image_pred in enumerate(prediction):
    # If none are remaining => process next image
    if not image_pred.size(0):
        continue
    # Get score and class with highest confidence
    class_conf, class_pred = torch.max(image_pred[:, 5 : 5 + num_classes], 1, keepdim=True)

    conf_mask = (image_pred[:, 4] * class_conf.squeeze() >= conf_thre).squeeze()
    # Detections ordered as (x1, y1, x2, y2, obj_conf, class_conf, class_pred)
    detections = torch.cat((image_pred[:, :5], class_conf, class_pred.float(), image_pred[:, 5 + num_classes :]), 1)
    detections = detections[conf_mask]
    if not detections.size(0):
        continue

This is where it ends: the detections array has shape (0, 7) instead of (1, 7), which is what detect.py returns at this point.
So there are no detections.

The code is taken from your postprocess function.

I can provide the notebook, best.pth, and the ONNX file if you are interested.

Minimum size for training

Hi Shihan Liu,

What is the reason there is a minimum input size for the model? In my experiments it is 192x192.

What would be needed to adapt the model to handle smaller sizes, e.g. 96x96?

Best regards,
Ramon
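
For context, the ONNX issue above reports an output of shape (1, 8400, 6) at 640x640, which is consistent with three output grids at strides 8, 16, and 32 (80x80 + 40x40 + 20x20 cells). Assuming those strides, which the issues do not state explicitly, the grid sizes for smaller inputs can be sketched as follows. `grid_sizes` and `num_predictions` are hypothetical helpers.

```python
def grid_sizes(height, width, strides=(8, 16, 32)):
    """Feature-map size at each output stride (the strides are an assumption)."""
    return [(height // s, width // s) for s in strides]


def num_predictions(height, width, strides=(8, 16, 32)):
    """Total number of grid cells, assuming one prediction per cell."""
    return sum(h * w for h, w in grid_sizes(height, width, strides))


n_640 = num_predictions(640, 640)   # 80*80 + 40*40 + 20*20
n_192 = num_predictions(192, 192)   # 24*24 + 12*12 + 6*6
n_96 = num_predictions(96, 96)      # 12*12 + 6*6 + 3*3
coarsest_96 = grid_sizes(96, 96)[-1]
```

This does not by itself explain the 192x192 limit; it only shows how small the coarsest grid becomes at 96x96.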

CUDA problem

The error message is as follows:
root@autodl-container-7154118f52-44680f8e:~/edgeyolo-main# python train.py --cfg ./params/train/train_coco.yaml
Traceback (most recent call last):
File "train.py", line 16, in
train("DEFAULT" if args.default else args.cfg)
File "/root/edgeyolo-main/edgeyolo/train/launch_train.py", line 101, in launch
mp.start_processes(
File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/root/edgeyolo-main/edgeyolo/train/launch_train.py", line 50, in train_single
torch.cuda.set_device(device)
File "/root/miniconda3/lib/python3.8/site-packages/torch/cuda/__init__.py", line 261, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
How can I fix this?
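
`invalid device ordinal` is raised when a process asks for a GPU index that does not exist on the machine. The settings dump in a later issue lists `device=[0, 1, 2, 3]` as the default, so a machine with fewer than four GPUs would hit this error unless `device` is overridden. Below is a minimal sketch that trims a requested device list to the GPUs actually present. `usable_devices` is a hypothetical helper; in practice the count would come from `torch.cuda.device_count()`.

```python
def usable_devices(requested, available_count):
    """Keep only the requested GPU indices that exist on this machine."""
    return [d for d in requested if 0 <= d < available_count]


# The config asks for four GPUs, but the machine has only one.
devices = usable_devices([0, 1, 2, 3], available_count=1)
# devices == [0]
```

Setting `device` in the training settings file to the indices that exist, for example `[0]` on a single-GPU machine, has the same effect.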

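Data augmentation

Are there no visualized images for the data augmentation method in the paper? YOLOv5, for example, produces a preview image during training.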

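If I have my own dataset, can I train your model on it?

If I have my own dataset, can I train your model on it? How do I do that?
When I train directly, I get the following error: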
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'model_cfg' not list in settings file, use default value: model_cfg=params/model/edgeyolo.yaml
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'weights' not list in settings file, use default value: weights=output/train/edgeyolo_coco/last.pth
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'use_cfg' not list in settings file, use default value: use_cfg=False
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'output_dir' not list in settings file, use default value: output_dir=output/train/edgeyolo_coco
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'save_checkpoint_for_each_epoch' not list in settings file, use default value: save_checkpoint_for_each_epoch=True
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'log_file' not list in settings file, use default value: log_file=log.txt
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'dataset_cfg' not list in settings file, use default value: dataset_cfg=params/dataset/coco.yaml
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'batch_size_per_gpu' not list in settings file, use default value: batch_size_per_gpu=8
2023-02-22 21:08:33.924 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'loader_num_workers' not list in settings file, use default value: loader_num_workers=4
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'num_threads' not list in settings file, use default value: num_threads=1
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'device' not list in settings file, use default value: device=[0, 1, 2, 3]
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'fp16' not list in settings file, use default value: fp16=False
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'cudnn_benchmark' not list in settings file, use default value: cudnn_benchmark=False
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'optimizer' not list in settings file, use default value: optimizer=SGD
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'max_epoch' not list in settings file, use default value: max_epoch=300
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'close_mosaic_epochs' not list in settings file, use default value: close_mosaic_epochs=15
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'lr_per_img' not list in settings file, use default value: lr_per_img=0.00015625
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'warmup_epochs' not list in settings file, use default value: warmup_epochs=5
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'warmup_lr_ratio' not list in settings file, use default value: warmup_lr_ratio=0.0
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'final_lr_ratio' not list in settings file, use default value: final_lr_ratio=0.05
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'loss_use' not list in settings file, use default value: loss_use=['bce', 'bce', 'giou']
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'input_size' not list in settings file, use default value: input_size=[640, 640]
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'multiscale_range' not list in settings file, use default value: multiscale_range=5
2023-02-22 21:08:33.925 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'weight_decay' not list in settings file, use default value: weight_decay=0.0005
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'momentum' not list in settings file, use default value: momentum=0.9
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'enhance_mosaic' not list in settings file, use default value: enhance_mosaic=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'use_ema' not list in settings file, use default value: use_ema=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'enable_mixup' not list in settings file, use default value: enable_mixup=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'mixup_scale' not list in settings file, use default value: mixup_scale=[0.5, 1.5]
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'mosaic_scale' not list in settings file, use default value: mosaic_scale=[0.1, 2.0]
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'flip_prob' not list in settings file, use default value: flip_prob=0.5
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'mosaic_prob' not list in settings file, use default value: mosaic_prob=1
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'mixup_prob' not list in settings file, use default value: mixup_prob=1
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'degrees' not list in settings file, use default value: degrees=10
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'hsv_gain' not list in settings file, use default value: hsv_gain=[0.0138, 0.664, 0.464]
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'eval_at_start' not list in settings file, use default value: eval_at_start=False
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'val_conf_thres' not list in settings file, use default value: val_conf_thres=0.001
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'val_nms_thres' not list in settings file, use default value: val_nms_thres=0.65
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'eval_only' not list in settings file, use default value: eval_only=False
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'obj_conf_enabled' not list in settings file, use default value: obj_conf_enabled=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'eval_interval' not list in settings file, use default value: eval_interval=1
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'print_interval' not list in settings file, use default value: print_interval=100
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'load_optimizer_params' not list in settings file, use default value: load_optimizer_params=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'train_backbone' not list in settings file, use default value: train_backbone=True
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'train_start_layers' not list in settings file, use default value: train_start_layers=51
2023-02-22 21:08:33.926 | INFO | edgeyolo.train.launch_train:load_train_settings:31 - param 'force_start_epoch' not list in settings file, use default value: force_start_epoch=-1
Traceback (most recent call last):
File "D:\myworkspace\yolov5\edgeyolo-main\train.py", line 16, in
train("DEFAULT" if args.default else args.cfg)
File "D:\myworkspace\yolov5\edgeyolo-main\edgeyolo\train\launch_train.py", line 101, in launch
mp.start_processes(
File "D:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 188, in start_processes
while not context.join():
File "D:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "D:\ProgramData\Anaconda3\lib\site-packages\torch\multiprocessing\spawn.py", line 59, in wrap
fn(i, *args)
File "D:\myworkspace\yolov5\edgeyolo-main\edgeyolo\train\launch_train.py", line 50, in train_single
torch.cuda.set_device(device)
File "D:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 261, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal

mAP is 0 on my own dataset

20230626_220743 edgeyolo.train.trainer:379 - Start Train Epoch 1
20230626_220826 edgeyolo.train.trainer:457 - epoch:1/150 iter:100/488 mem:5829MB t_iter:1.04 lr:3.149e-06 loss:{total:9.27 iou:3.17 conf:5.53 cls:0.57} ETA:8:36:33
20230626_220908 edgeyolo.train.trainer:457 - epoch:1/150 iter:200/488 mem:5829MB t_iter:0.96 lr:1.260e-05 loss:{total:7.25 iou:3.01 conf:3.52 cls:0.72} ETA:8:37:18
20230626_220950 edgeyolo.train.trainer:457 - epoch:1/150 iter:300/488 mem:5831MB t_iter:1.17 lr:2.834e-05 loss:{total:6.50 iou:2.50 conf:3.47 cls:0.53} ETA:8:33:24
20230626_221032 edgeyolo.train.trainer:457 - epoch:1/150 iter:400/488 mem:5831MB t_iter:0.88 lr:5.039e-05 loss:{total:4.61 iou:1.99 conf:1.98 cls:0.65} ETA:8:30:49
20230626_221108 edgeyolo.models:146 - weight file saved to output/train/tiny_lrelu/last.pth
20230626_221108 edgeyolo.models:146 - weight file saved to output/train/tiny_lrelu/epoch_001.pth
0it [00:07, ?it/s]
20230626_221116 edgeyolo.train.val.coco_evaluator:208 - Evaluate in main process...
20230626_221116 edgeyolo.train.trainer:523 -
Average forward time: 0.00 ms, Average NMS time: 0.00 ms, Average inference time: 0.00 ms

20230626_221116 edgeyolo.models:146 - weight file saved to output/train/tiny_lrelu/last.pth
20230626_221116 edgeyolo.models:146 - weight file saved to output/train/tiny_lrelu/epoch_001.pth
Why are the NMS time and inference time shown as zero on my own dataset? Also, mAP0.5 and mAP0.5:0.95 in the evaluation of my model are zero.

Deployment

Hello, after generating an ONNX file from your model, can I deploy it on Android with an existing framework (for example, ncnn)?

best.pth

Hello, I trained your model on my own dataset. After 300 epochs there is no best.pth; instead there is a last_augmentation_epoch.pth. What is the reason for this? Thank you.

Could you share some details about running inference on RK3588 NPU?

Hi guys,

I have an RK3588 board, which you used in the paper as the edge device.

I trained my weights using edgeyolo_tiny_lrelu, converted them to ONNX, and then to RKNN using rknn-toolkit 1.4.

The command used for the export to ONNX was:

python export.py --onnx-only --weights /workspaces/rocm-ml/edgeyolo/output/train/edgeyolo_lp_2/best.pth --opset 12

However, I am currently unable to use QUANTIZE_ON during the ONNX-to-RKNN conversion. I used the same dataset as for validation during training, with between 10 and 50 images, without success: the resulting RKNN model always outputs bogus results.
You mention in your paper that quantization was enabled. Could you share some details about how you made it work?

I am running it on an RK3588 (Radxa ROCK 5B), and my inference speed is around 11 fps.

You mention 32 fps in your paper.

  • I am curious whether you implemented multi-threaded inference on the RK3588 yourself, using all 3 NPU cores, which would give 3x11 fps?
  • Also, did you use edgeyolo_tiny_lrelu.yaml as model_cfg?

Could you please share some more details about running edgeyolo on the RK3588? Neither the paper nor the code here gives any insight into running it on that edge device.

Thank you in advance~

Custom Training gives no detections

Hi,
I am training edgeyolo on a custom dataset. Unfortunately the model does not detect any objects.
Also the training loss does not decrease as expected.

number of images:
train images: 15000
val images: 4370

and the auto calculated anchors in yolov4 would be:
[5, 10], [13, 30], [25, 64], [38,123], [51,230], [87,158], [82,340], [136,420], [256,492]

My questions now:

Do I need to edit the anchors in params/model/edgeyolo.yaml?
And do I need to change the learning rate?

Train Bug

I am already using PyTorch 1.8.0 but still encounter this bug during training. Could you give me some help? Thank you.

/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [301,0,0], thread: [60,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [301,0,0], thread: [61,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [301,0,0], thread: [62,0,0] Assertion `input_val >= zero && input_val <= one` failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [301,0,0], thread: [63,0,0] Assertion `input_val >= zero && input_val <= one` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=710 : device-side assert triggered
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=710 : device-side assert triggered
20231218_233328 edgeyolo.train.loss:371 - error msg: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fa857cfe2f2 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fa857cfb67b in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7fa857f561f9 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fa857ce63a4 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e44ca (0x7fa8cbb0a4ca in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e4561 (0x7fa8cbb0a561 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #6: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x509306]
frame #7: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0360]
frame #8: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #9: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #10: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #11: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5023c9]
frame #12: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x502019]
frame #13: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x501fdd]
frame #14: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4df468]
frame #15: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5c8443]
frame #16: _PyEval_EvalFrameDefault + 0x4b37 (0x4ec567 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #17: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #18: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x685 (0x4e80b5 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #20: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f8123]
frame #21: _PyEval_EvalFrameDefault + 0x3c7 (0x4e7df7 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #22: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #23: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x1231 (0x4e8c61 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #25: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #26: _PyEval_EvalCodeWithName + 0x47 (0x4e67b7 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #27: PyEval_EvalCodeEx + 0x39 (0x4e6769 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #28: PyEval_EvalCode + 0x1b (0x59466b in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #29: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5c1dc7]
frame #30: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5bddd0]
frame #31: PyRun_StringFlags + 0x9b (0x5b59eb in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #32: PyRun_SimpleStringFlags + 0x3b (0x5b56cb in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #33: Py_RunMain + 0x25c (0x5b4f0c in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #34: Py_BytesMain + 0x39 (0x588719 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #35: __libc_start_main + 0xe7 (0x7fa8ce156c87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #36: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5885ce]

terminate called after throwing an instance of 'c10::Error'
  what():  CUDA error: device-side assert triggered
Exception raised from create_event_internal at /pytorch/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f464b8172f2 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f464b81467b in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7f464ba6f1f9 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7f464b7ff3a4 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e44ca (0x7f46bf6234ca in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e4561 (0x7f46bf623561 in /home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #6: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x509306]
frame #7: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0360]
frame #8: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #9: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #10: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f0427]
frame #11: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5023c9]
frame #12: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x502019]
frame #13: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x501fdd]
frame #14: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4df468]
frame #15: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5c8443]
frame #16: _PyEval_EvalFrameDefault + 0x4b37 (0x4ec567 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #17: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #18: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #19: _PyEval_EvalFrameDefault + 0x685 (0x4e80b5 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #20: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4f8123]
frame #21: _PyEval_EvalFrameDefault + 0x3c7 (0x4e7df7 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #22: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #23: _PyFunction_Vectorcall + 0xd4 (0x4f7e54 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x1231 (0x4e8c61 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #25: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x4e6b2a]
frame #26: _PyEval_EvalCodeWithName + 0x47 (0x4e67b7 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #27: PyEval_EvalCodeEx + 0x39 (0x4e6769 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #28: PyEval_EvalCode + 0x1b (0x59466b in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #29: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5c1dc7]
frame #30: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5bddd0]
frame #31: PyRun_StringFlags + 0x9b (0x5b59eb in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #32: PyRun_SimpleStringFlags + 0x3b (0x5b56cb in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #33: Py_RunMain + 0x25c (0x5b4f0c in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #34: Py_BytesMain + 0x39 (0x588719 in /home/wsy/anaconda3/envs/pytorch1.8/bin/python)
frame #35: __libc_start_main + 0xe7 (0x7f46c1c6fc87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #36: /home/wsy/anaconda3/envs/pytorch1.8/bin/python() [0x5885ce]

Traceback (most recent call last):
  File "/home/wsy/paper/Edgeyolo-231206/train.py", line 16, in <module>
    train("DEFAULT" if args.default else args.cfg)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/launch_train.py", line 101, in launch
    mp.start_processes(
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 355, in get_losses
    ) = self.get_assignments(  # noqa
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 553, in get_assignments
    ) = self.dynamic_k_matching(cost, pair_wise_ious, gt_classes, num_gt, fg_mask)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 668, in dynamic_k_matching
    cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/launch_train.py", line 73, in train_single
    trainer.train()
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/trainer.py", line 499, in train
    train_one_epoch()
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/trainer.py", line 485, in train_one_epoch
    train_one_iter()
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/trainer.py", line 460, in train_one_iter
    train_in_iter()
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/trainer.py", line 410, in train_in_iter
    outputs = self.loss(outputs, (targets, mask_edge))
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 241, in forward
    loss, bbox_loss, confidence_loss, class_loss, l1_loss, num_fg = self.get_losses(
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 389, in get_losses
    ) = self.get_assignments(  # noqa
  File "/home/wsy/anaconda3/envs/pytorch1.8/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/wsy/paper/Edgeyolo-231206/edgeyolo/train/loss.py", line 492, in get_assignments
    gt_bboxes_per_image = gt_bboxes_per_image.cpu().float()
RuntimeError: CUDA error: device-side assert triggered
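
The repeated `input_val >= zero && input_val <= one` assertion in Loss.cu is the range check of PyTorch's binary cross-entropy kernel: it fires when a value passed in as a probability is NaN or lies outside [0, 1]. The sketch below reproduces that check in plain Python so it can be run on values copied to the CPU just before the loss call. `find_invalid_probabilities` is a hypothetical helper, not part of EdgeYOLO, and where the bad values come from in this report is not established in the thread.

```python
import math


def find_invalid_probabilities(values):
    """Return indices whose value is NaN or outside the closed interval [0, 1]."""
    bad = []
    for i, v in enumerate(values):
        # NaN fails every comparison, so it needs an explicit test.
        if math.isnan(v) or v < 0.0 or v > 1.0:
            bad.append(i)
    return bad


# Example: positions 1, 2 and 3 would trip the kernel assertion.
print(find_invalid_probabilities([0.2, float("nan"), 1.5, -0.1]))
```

Because CUDA kernels run asynchronously, the Python traceback can point at a later line than the one that actually failed; setting the environment variable `CUDA_LAUNCH_BLOCKING=1` for a debugging run usually makes the reported location more trustworthy.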

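About detect.py

Thank you very much for sharing. Running evaluate works fine, but after training finished I wanted to run detect and it always fails with the error below. Is my environment misconfigured? Could you provide your Python, PyTorch, CUDA, and OpenCV versions? The problem may be with OpenCV on my side, but I have tried many versions and still cannot run prediction.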
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
After re-parameterization: Params: 40.47M, Gflops: 280.00
242.7ms average:242.7ms
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/home/lzy30/anaconda3/envs/edgeyolo/lib/python3.8/site-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb.
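
The `could not connect to display` line usually means the process has no X server to draw on, which is typical over SSH or on a headless server, rather than a broken OpenCV build. A minimal sketch of a workaround, assuming the goal is simply to keep the detection results: check for a display first and write the frame to disk when there is none. `has_display` and `show_or_save` are hypothetical helpers, not part of EdgeYOLO's detect.py.

```python
import os
import sys


def has_display(env=None, platform=None):
    """Best-effort guess at whether an X11 window can be opened."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        # The Qt xcb plugin needs an X server, advertised through DISPLAY.
        return bool(env.get("DISPLAY"))
    return True  # assume a desktop session on other platforms


def show_or_save(image, out_path, env=None, platform=None):
    """Show the image in a window if possible, otherwise write it to out_path."""
    import cv2  # imported lazily so has_display() works without OpenCV

    if has_display(env, platform):
        cv2.imshow("result", image)
        cv2.waitKey(1)
        return "shown"
    cv2.imwrite(out_path, image)
    return "saved"
```

On a headless machine, installing the `opencv-python-headless` package instead of `opencv-python` is another common way to avoid loading the Qt plugin at all, at the cost of losing `cv2.imshow`.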

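Problem with the trained model

Hello, I trained your model on the DeepFashion2 dataset (YOLO-format labels, 13 classes of clothing types). Running inference with the resulting model has a problem: the detection boxes are there, but every box is output with the same class. For example, the boxes correctly localize "trousers" and "long-sleeved top", but the label text is always "short sleeve". When training on my own dataset, what needs to be changed besides the yaml files? Thanks for your help.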

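Does the model with VisDrone weights need other settings? I ran into some problems

A very nice model! Thanks to the author for sharing.
I converted the MOT val split of VisDrone2019 to COCO format and ran eval, and found the mAP to be very low. I used the best VisDrone weights with the default parameters. I then ran Detect directly on a batch of images from the MOT dataset and the results looked fine.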

AnchorFreeDetect parameter fusion

After switching YOLOXDetect to AnchorFreeDetect, the ia parameter fusion in it raises an error.
code:
for i in range(len(self.m)):
    c1, c2, _, _ = self.m[i].weight.shape
    c1_, c2_, _, _ = self.ia[i].implicit.shape
    self.m[i].bias += torch.matmul(self.m[i].weight.reshape(c1, c2),
                                   self.ia[i].implicit.reshape(c2_, c1_)).squeeze(1)
Error:
ValueError: Cannot assign non-leaf Tensor to parameter 'bias'. Model parameters must be created explicitly. To express 'bias' as a function of another Tensor, compute the value in the forward() method.
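
The fusion in the snippet folds the implicit offset a, which is added to the input before a 1x1 convolution, into the convolution bias, using W(x + a) + b = Wx + (Wa + b). The sketch below checks that identity in plain Python for a single pixel; `fold_implicit_into_bias` and `conv1x1` are hypothetical helpers written for illustration only. The ValueError suggests that the in-place update was recorded by autograd, leaving `bias` as a non-leaf tensor; in PyTorch this kind of fusion is usually wrapped in `torch.no_grad()`, but that this is the cause here is an assumption.

```python
def fold_implicit_into_bias(weight, bias, implicit):
    """Return the bias b' such that W @ x + b' == W @ (x + a) + b.

    weight:   out_ch rows of in_ch floats (a flattened 1x1 conv kernel)
    bias:     out_ch floats
    implicit: in_ch floats (per-channel offset added before the conv)
    """
    return [b + sum(w * a for w, a in zip(row, implicit))
            for row, b in zip(weight, bias)]


def conv1x1(weight, bias, x):
    """Apply a 1x1 convolution to one pixel x (a list of in_ch floats)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


W = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
b = [0.1, 0.2, 0.3]
a = [0.5, -0.25]
x = [2.0, 4.0]

# Output with the implicit offset applied explicitly...
unfused = conv1x1(W, b, [xi + ai for xi, ai in zip(x, a)])
# ...and with the offset folded into the bias.
fused = conv1x1(W, fold_implicit_into_bias(W, b, a), x)
```

The identity is exact only for 1x1 kernels, where every output pixel sees the same constant offset.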

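On-device inference

Hello, have you done any inference latency comparison on ARM devices?
The Jetson AGX Xavier has far more compute than typical chips, such as some ARM chips from Qualcomm and MediaTek.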
