libtorch-yolov5's Introduction

libtorch-yolov5's Issues


When I run the code, I get this error. How can I solve it?

terminate called after throwing an instance of 'c10::Error'
what(): isTuple() INTERNAL ASSERT FAILED at "/dxd/libtorch-yolov5/libtorch/include/ATen/core/ivalue_inl.h":842, please report a bug to PyTorch. Expected Tuple but got GenericList
Exception raised from toTuple at /dxd/libtorch-yolov5/libtorch/include/ATen/core/ivalue_inl.h:842 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x69 (0x7f569de58eb9 in /dxd/libtorch-yolov5/libtorch/lib/
frame #1: c10::IValue::toTuple() const & + 0xe5 (0x4206cd in ./libtorch-yolov5)
frame #2: ./libtorch-yolov5() [0x41916a]
frame #3: ./libtorch-yolov5() [0x4316f6]
frame #4: __libc_start_main + 0xf0 (0x7f5654d44840 in /lib/x86_64-linux-gnu/
frame #5: ./libtorch-yolov5() [0x4176b9]

Aborted (core dumped)

Modify LetterboxImage error

Hello, thank you very much for open-sourcing this project; it helped me a lot. I have a question:
When the model input image size is 640×640, the accuracy of the prediction results changes and the inference time becomes longer. I then modified LetterboxImage (following the Python version) so that the model input image size is 640×480, but the following error is reported:

terminate called after throwing an instance of 'std::runtime_error'
what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/models/", line 45, in forward
_35 = (_4).forward(_34, )
_36 = (_2).forward((_3).forward(_35, ), _29, )
_37 = (_0).forward(_33, _35, (_1).forward(_36, ), )
~~~~~~~~~~~ <--- HERE
_38, _39, _40, _41, = _37
return (_41, [_38, _39, _40])
File "code/torch/models/", line 75, in forward
_52 = torch.sub(_51, CONSTANTS.c3, alpha=1)
_53 =, dtype=6, layout=0, device=torch.device("cpu"), pin_memory=None, non_blocking=False, copy=False, memory_format=None)
_54 = torch.mul(torch.add(_52, _53, alpha=1),, 0, 0))
~~~~~~~~~ <--- HERE
_55 = torch.slice(y, 4, 0, 2, 1)
_56 = torch.expand(torch.view(_54, [3, 80, 80, 2]), [1, 3, 80, 80, 2], implicit=True)

Traceback of TorchScript, original code (most recent call last):
/home//PycharmProjects/paper_yolov5/models/ forward
//anaconda3/lib/python3.8/site-packages/torch/nn/modules/ _slow_forward
/home////anaconda3/lib/python3.8/site-packages/torch/nn/modules/ _call_impl
//PycharmProjects/paper_yolov5/models/ forward_once
/home////PycharmProjects/paper_yolov5/models/ forward
//anaconda3/lib/python3.8/site-packages/torch/nn/modules/ _slow_forward
/home////anaconda3/lib/python3.8/site-packages/torch/nn/modules/ _call_impl
//anaconda3/lib/python3.8/site-packages/torch/jit/ trace_module
/home////anaconda3/lib/python3.8/site-packages/torch/jit/ trace
RuntimeError: The size of tensor a (60) must match the size of tensor b (80) at non-singleton dimension 3

How can I solve it? Thank you!
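
The `[3, 80, 80, 2]` view in the trace above, together with the `60` vs. `80` in the error message, suggests that the traced model has an 80×80 grid baked in from a 640×640 trace, while a 480-pixel side produces a 60-cell grid. The sketch below illustrates that arithmetic and the usual aspect-preserving letterbox geometry. It assumes the smallest detection stride is 8; `ComputeLetterbox` and `GridSize` are hypothetical helpers written for illustration and are not the repository's `LetterboxImage`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type for illustration; not part of the repository.
struct LetterboxGeometry {
    int resized_w;   // width after the aspect-preserving resize
    int resized_h;   // height after the aspect-preserving resize
    int pad_left;
    int pad_right;
    int pad_top;
    int pad_bottom;
};

// Computes the resize and padding needed to fit a src_w x src_h image
// into a dst_w x dst_h canvas without changing its aspect ratio.
LetterboxGeometry ComputeLetterbox(int src_w, int src_h, int dst_w, int dst_h) {
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    const double scale = std::min(static_cast<double>(dst_w) / src_w,
                                  static_cast<double>(dst_h) / src_h);
    LetterboxGeometry g{};
    g.resized_w = static_cast<int>(std::round(src_w * scale));
    g.resized_h = static_cast<int>(std::round(src_h * scale));
    const int pad_w = dst_w - g.resized_w;
    const int pad_h = dst_h - g.resized_h;
    // Split the padding as evenly as possible between the two sides.
    g.pad_left = pad_w / 2;
    g.pad_right = pad_w - g.pad_left;
    g.pad_top = pad_h / 2;
    g.pad_bottom = pad_h - g.pad_top;
    return g;
}

// Number of grid cells along one axis for a given stride (assumed: 8, 16, 32).
int GridSize(int input_dim, int stride) {
    assert(stride > 0);
    return input_dim / stride;
}
```

With these helpers, `GridSize(640, 8)` is 80 and `GridSize(480, 8)` is 60, which matches the two sizes in the error. If that reading is right, one likely way out is to re-export the TorchScript model at the input size that will actually be fed to it.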






`// load input images
std::vector<cv::String> filenames;
cv::String folder = "/home/xavier/dataset/DF";
cv::glob(folder, filenames);
for (size_t i = 0; i < filenames.size(); ++i) {
    cv::Mat img = cv::imread(filenames[i]);
    if (img.empty()) {
        std::cerr << "Error loading the image!\n";
        return -1;
    }

    // load network
    std::string weights = opt["weights"].as<std::string>();
    auto detector = Detector(weights, device_type);

    // set up threshold
    float conf_thres = opt["conf-thres"].as<float>();
    float iou_thres = opt["iou-thres"].as<float>();

    // inference
    auto result = detector.Run(img, conf_thres, iou_thres);

    // visualize detections
    if (opt["view-img"].as<bool>()) {
        Demo(img, result[0], class_names);
    }
}`


Export ONNX with CUDA

I have modified the export script to support GPU, but I still receive the following error:

RuntimeError: Input, output and indices must be on the current device

Do you have any suggestions on how this issue can be resolved?


cpu model

python models\ --device cpu


Run once on empty image
----------New Frame----------
pre-process takes : 60 ms
inference takes : 4630 ms
post-process takes : 69 ms
----------New Frame----------
pre-process takes : 77 ms
inference takes : 3762 ms
post-process takes : 155 ms

gpu model

python models\ --device 0


Run once on empty image
----------New Frame----------
pre-process takes : 40 ms
inference takes : 2766 ms
post-process takes : 1 ms
----------New Frame----------
pre-process takes : 32 ms
inference takes : 10285 ms
post-process takes : 11 ms

Error in cmake building

Hi @yasenh
I installed all the dependencies and set everything up as described in the repo. But when I tried to build it with CMake (`cmake .. && make`), I got this error:
Screenshot from 2020-12-29 05-30-36

Can you please tell me what the problem is?
Thank you

How can I use 224×224 images?

I tried changing std::vector pad_info = LetterboxImage(img_input, img_input, cv::Size(640, 640)); to std::vector pad_info = LetterboxImage(img_input, img_input, cv::Size(224, 224)); and ran into a problem.

Post-processing takes too long

The model forward pass takes only ~5 ms to infer the input blob, but post-processing takes about 50 ms. The PyTorch (Python) implementation takes only 15 ms for inference and post-processing combined, so post-processing here seems far too slow. Is there any way to optimize post-processing for low latency?
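
Post-processing for a detector like this typically consists of a confidence filter followed by non-maximum suppression (NMS), and the cost of greedy NMS grows roughly quadratically with the number of candidate boxes. The sketch below is a plain CPU version that filters by confidence first so that NMS only sees the surviving boxes. It is an illustration under those assumptions, not the repository's implementation; `Box`, `IoU`, and `FilterAndNms` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical box type: corner coordinates plus a confidence score.
struct Box {
    float x1, y1, x2, y2, score;
};

// Intersection over union of two axis-aligned boxes.
float IoU(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Drops boxes below conf_thres, then runs greedy NMS on the remainder.
std::vector<Box> FilterAndNms(std::vector<Box> boxes, float conf_thres, float iou_thres) {
    assert(iou_thres >= 0.0f && iou_thres <= 1.0f);
    // Filtering first keeps the quadratic NMS loop small.
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [conf_thres](const Box& b) { return b.score < conf_thres; }),
                boxes.end());
    // Highest score first, so each kept box suppresses lower-scored overlaps.
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });

    std::vector<bool> suppressed(boxes.size(), false);
    std::vector<Box> kept;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (suppressed[i]) continue;
        kept.push_back(boxes[i]);
        for (std::size_t j = i + 1; j < boxes.size(); ++j) {
            if (!suppressed[j] && IoU(boxes[i], boxes[j]) > iou_thres) {
                suppressed[j] = true;
            }
        }
    }
    return kept;
}
```

If post-processing dominates the frame time, it may be worth checking how many candidates survive the confidence threshold, since a very low `conf_thres` can leave thousands of boxes for NMS.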

Performance on Win10 with GPU

My machine has an i5 CPU and a GTX 1080 Ti (11 GB) GPU, and I have successfully compiled and run the project on Windows 10 with GPU. But why does inference take so much time in Release mode? I have already commented out the warm-up part in main.cpp, and it still takes around 500 ms to process a single image. When I use the same model for detection in Python, it is much more efficient, at about 20 FPS. I don't know whether something is wrong with my configuration or whether something in the C++ project reduces performance.

When `PostProcessing()` error occurred


  • libtorch 1.5.0 debug.
  • Visual studio 2019
  • Windows 10

In the first step, PostProcess() with temp_img works fine.
But in the second step, PostProcess() with my custom image raises the error.

When I set half_ to false, the error goes away. Why does this error occur? How can I run this code with half_ enabled?

Why is the detection speed slow on GPU?

Dear author, I use to detect images. It takes 289 ms per frame on CPU and 127 ms per frame on an RTX 3070 graphics card. Why is it so slow on GPU? The PyTorch version of YOLOv5 takes 11 ms per frame on the same RTX 3070.

I look forward to your help!

Python and libtorch model prediction results are inconsistent

Hello, I have updated the version of YOLOv5 (4.0). I found that the prediction results of the python model are a little different from the results predicted by the libtorch model. The prediction results of the 3.1 version are the same. What is the reason? Can you help me, thank you!

Converting the .pt model to TorchScript is unsuccessful

python models/ --weights --img 640 --batch 1

Fusing layers...
Model Summary: 120 layers, 7.06617e+06 parameters, 7.06617e+06 gradients
Traceback (most recent call last):
File "models/", line 41, in
y = model(img) # dry run


type(self).name, name))
torch.nn.modules.module.ModuleAttributeError: 'Detect' object has no attribute 'm'

Post-processing time is way too long


./libtorch-yolov5 error

Thank you for your work, when I run "./libtorch-yolov5 /data_1/train_project/OBJ_Detection/yolov5-forward/module/ /data_1/train_project/OBJ_Detection/yolov5-forward/img/000240_01046820200606110918_0035_670_3cls.jpg -gpu".
The following error occurred

terminate called after throwing an instance of 'c10::Error'
what(): isTuple() INTERNAL ASSERT FAILED at /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/include/ATen/core/ivalue_inl.h:723, please report a bug to PyTorch. Expected Tuple but got GenericList (toTuple at /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/include/ATen/core/ivalue_inl.h:723)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x6a (0x7f7d00dfaaaa in /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/lib/
frame #1: c10::IValue::toTuple() const & + 0x121 (0x559bed24f2b3 in ./libtorch-yolov5)
frame #2: + 0xef9c (0x559bed245f9c in ./libtorch-yolov5)
frame #3: + 0x4176b (0x559bed27876b in ./libtorch-yolov5)
frame #4: __libc_start_main + 0xe7 (0x7f7cabd8eb97 in /lib/x86_64-linux-gnu/
frame #5: + 0xcc6a (0x559bed243c6a in ./libtorch-yolov5)

how to solve this problem


terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Performance difference between running model in python vs c++?

Hi @yasenh,
The code is beautifully written in C++. I tried to run it myself but could not get it working end to end, because I lack experience with C++ and libtorch. Could you provide some statistics showing whether running the model in C++ improves performance compared to running the same model in Python on GPU?

My guess is that there should not be much of a difference, if any. And if there is one, where does it come from? I know I am asking a lot, but an analysis would be really useful.

Memory leak issues, the program will die

There is a memory leak after 3 consecutive runs of ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/ --gpu --view-img

If the memory is not released, the program dies.


Does anyone run GPU inference successfully?

I could not run inference with GPU enabled. I followed the instructions to modify the code and export the TorchScript model for GPU, but when inferring with libtorch, it cannot load the weights.

Does anyone know how to solve it?

My OS is windows 10 and it is able to run the CPU torchscript model.

Thanks in advance.


Unhandled exception at 0x00007FFB68A2A799 (in Demo.exe): Microsoft C++ exception: c10::Error at memory location 0x00000012BCF9BE60.

The debugger breaks and jumps to kernel_lambda.h:
auto operator()(Parameters... args) -> decltype(std::declval()(std::forward(args)...)) {
return kernel_func_(std::forward(args)...);

// get the max classes score at each result (e.g. elements 5-84)
std::tuple<torch::Tensor, torch::Tensor> max_classes = torch::max(det.slice(1, item_attr_size, item_attr_size + num_classes), 1);

I did all the steps but in the make step I get this error

[ 33%] Building CXX object CMakeFiles/libtorch-yolov5.dir/src/detector.cpp.o
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/ArrayRef.h:19:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/MemoryFormat.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/core/TensorBody.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/Tensor.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/Context.h:4,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/C++17.h:24:2: error: #error You need C++14 to compile PyTorch
#error You need C++14 to compile PyTorch
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/Exception.h:5:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:6,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
inline decltype(auto) str(const Args&... args) {
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected ‘)’ before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:8: error: expected unqualified-id before ‘decltype’
inline decltype(auto) str(const Args&... args) {
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:5:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:6,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h: In member function ‘void c10::Device::validate()’:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:96:5: error: ‘str’ is not a member of ‘c10’
TORCH_CHECK(index_ == -1 || index_ >= 0,
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:98:5: error: ‘str’ is not a member of ‘c10’
TORCH_CHECK(!is_cpu() || index_ <= 0,
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h: In member function ‘void* c10::Allocator::raw_allocate(size_t)’:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:163:5: error: ‘str’ is not a member of ‘c10’
AT_ASSERT(dptr.get() == dptr.get_context());
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:163:5: error: ‘str’ is not a member of ‘c10’
AT_ASSERT(dptr.get() == dptr.get_context());

run yolov5 v4.0 error

run: ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/ --gpu --view-img
terminate called after throwing an instance of 'torch::jit::ErrorReport'

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor):
Expected at most 12 arguments but found 13 positional arguments.

Why were 13 positional arguments found?

The system is an NVIDIA Jetson Xavier NX running Docker:
opencv 4.4.0
libtorch 1.6.0
cuda 10.2
yolov5 v4.0



`VideoCapture capture;
std::cout << "finish load network and open the video" << std::endl;
capture.open("/home/****/libtorch-yolov5/test.mp4");
if (!capture.isOpened()) {
    std::cout << "can not open ...\n" << std::endl;
    return -1;
}
Mat frame;
// set up threshold
float conf_thres = 0.4; // opt["conf-thres"].as<float>();
float iou_thres = 0.5;  // opt["iou-thres"].as<float>();
for (;;) {
    capture >> frame;
    //Mat pic;
    if (frame.empty()) break;
    std::cout << "start forward" << std::endl;
    auto result = detector.Run(frame, conf_thres, iou_thres);
    Demo(frame, result, class_names);
    if (waitKey(33) >= 0) break;
}

//return 0;`

----------New Frame----------
img size:1080x1920
pre-process takes : 4 ms
inference takes : 137 ms <-------------------------------------------------------137
post-process takes : 19 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 5 ms
inference takes : 7869 ms <--------------------------------------------------------7869
post-process takes : 24 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 3 ms
inference takes : 8 ms <------------------------------------------------------------8
post-process takes : 25 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 4 ms
inference takes : 8 ms <-------------------------------------------------------------8
post-process takes : 23 ms

Note: I did not know what the warm-up was for, so I removed it.
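
The log above shows the first two frames taking 137 ms and 7869 ms, after which inference settles at 8 ms. That pattern is consistent with one-time initialization cost being paid on the first calls, which is likely what a warm-up pass is meant to absorb. The sketch below shows the general idea of running a few untimed iterations before measuring. `TimeWithWarmup` is a hypothetical helper; in this project the `work` callback would presumably wrap a call such as `detector.Run` on a dummy image, which is an assumption and not taken from the repository.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Runs `work` warmup_iters times without timing, then timed_iters times
// with timing, and returns the duration of each timed call in milliseconds.
std::vector<double> TimeWithWarmup(const std::function<void()>& work,
                                   int warmup_iters, int timed_iters) {
    assert(warmup_iters >= 0 && timed_iters >= 0);
    // Warm-up calls: any one-time setup cost is paid here, not in the results.
    for (int i = 0; i < warmup_iters; ++i) {
        work();
    }
    std::vector<double> durations_ms;
    durations_ms.reserve(static_cast<std::size_t>(timed_iters));
    for (int i = 0; i < timed_iters; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto end = std::chrono::steady_clock::now();
        durations_ms.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return durations_ms;
}
```

With the warm-up removed, the slow first iterations simply show up in the per-frame timings of real input, as they do in the log.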

There is no clip coords process in your code

Hi, while using your code I found a problem. In the Python version of YOLOv5, there is a clip_coords function in /utils/ that clips xyxy bounding boxes to the image shape (height, width). Sometimes my predicted box values fall outside the image, so I added a clip coords step to your detector.cpp code. I wonder if I'm doing the right thing. Thank you for sharing.
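
For reference, clipping an xyxy box to the image bounds amounts to clamping x coordinates to [0, width] and y coordinates to [0, height]. The sketch below shows that step on a single box. `BoxXYXY` and `ClipToImage` are hypothetical names used for illustration; this is not the code the poster added to detector.cpp.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical box type in (x1, y1, x2, y2) corner format.
struct BoxXYXY {
    float x1, y1, x2, y2;
};

// Clamps the box corners so they lie inside an img_w x img_h image.
BoxXYXY ClipToImage(BoxXYXY box, int img_w, int img_h) {
    assert(img_w > 0 && img_h > 0);
    const float w = static_cast<float>(img_w);
    const float h = static_cast<float>(img_h);
    box.x1 = std::max(0.0f, std::min(box.x1, w));
    box.y1 = std::max(0.0f, std::min(box.y1, h));
    box.x2 = std::max(0.0f, std::min(box.x2, w));
    box.y2 = std::max(0.0f, std::min(box.y2, h));
    return box;
}
```

The clipping is meant to be applied after boxes have been scaled back from the letterboxed input to the original image, so that the bounds are those of the original image.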


Hello, thanks for your code. Did you test batch inference?

auto detections = output.toTuple()->elements()[0].toTensor();

Execution breaks with an error at auto detections = output.toTuple()->elements()[0].toTensor();:

inline c10::intrusive_ptr<ivalue::Tuple> IValue::toTuple() const & {
AT_ASSERT(isTuple(), "Expected Tuple but got ", tagKind());
return toIntrusivePtr<ivalue::Tuple>();
}

No CUDA for inference

Hi, just wondering if it is possible to build without CUDA? I don't have an NVIDIA GPU, so I want to do inference on the CPU.

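Question about model.model[-1].export = False

I have a question. When exporting the model, model.model[-1].export = False is set, but the Detect layer also applies a convolution to each feature map. If the exported model does not include the Detect layer, won't the model's output be inconsistent with training?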

How to debug?

Hi, thanks so much for creating this repo; it is really awesome.

My question is how to debug with libtorch? Now I face the problem of "segmentation fault(core dumped)" after running the warm-up.


I tried to debug with VSCode, but I could not go deep into libtorch library.

Is it compulsory to use debug version of libtorch?

No obj problem

Hello,
Let me post a question about your project. In detect.cpp:
// if none remain then process next image
if (det.size(1) == 0) {
Shouldn't det.size(1) == 0 be det.size(0) == 0? Is that right?
