yasenh / libtorch-yolov5
A LibTorch inference implementation of the yolov5
License: MIT License
When I run the code, I get this error. How can I solve it?
terminate called after throwing an instance of 'c10::Error'
what(): isTuple() INTERNAL ASSERT FAILED at "/dxd/libtorch-yolov5/libtorch/include/ATen/core/ivalue_inl.h":842, please report a bug to PyTorch. Expected Tuple but got GenericList
Exception raised from toTuple at /dxd/libtorch-yolov5/libtorch/include/ATen/core/ivalue_inl.h:842 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x69 (0x7f569de58eb9 in /dxd/libtorch-yolov5/libtorch/lib/libc10.so)
frame #1: c10::IValue::toTuple() const & + 0xe5 (0x4206cd in ./libtorch-yolov5)
frame #2: ./libtorch-yolov5() [0x41916a]
frame #3: ./libtorch-yolov5() [0x4316f6]
frame #4: __libc_start_main + 0xf0 (0x7f5654d44840 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: ./libtorch-yolov5() [0x4176b9]
Aborted (core dumped)
Hello, thank you very much for open-sourcing this; it helped me a lot. I have a question:
When the model input image size is 640×640, the accuracy of the prediction results changes and the inference time becomes longer. I then modified LetterboxImage (following the Python version) so that the model input image size is 640×480, but the following error is reported:
terminate called after throwing an instance of 'std::runtime_error'
what(): The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/models/yolo.py", line 45, in forward
_35 = (_4).forward(_34, )
_36 = (_2).forward((_3).forward(_35, ), _29, )
_37 = (_0).forward(_33, _35, (_1).forward(_36, ), )
~~~~~~~~~~~ <--- HERE
_38, _39, _40, _41, = _37
return (_41, [_38, _39, _40])
File "code/torch/models/yolo.py", line 75, in forward
_52 = torch.sub(_51, CONSTANTS.c3, alpha=1)
_53 = torch.to(CONSTANTS.c4, dtype=6, layout=0, device=torch.device("cpu"), pin_memory=None, non_blocking=False, copy=False, memory_format=None)
_54 = torch.mul(torch.add(_52, _53, alpha=1), torch.select(CONSTANTS.c5, 0, 0))
~~~~~~~~~ <--- HERE
_55 = torch.slice(y, 4, 0, 2, 1)
_56 = torch.expand(torch.view(_54, [3, 80, 80, 2]), [1, 3, 80, 80, 2], implicit=True)
Traceback of TorchScript, original code (most recent call last):
/home//PycharmProjects/paper_yolov5/models/yolo.py(57): forward
/home////anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(709): _slow_forward
/home////anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(725): _call_impl
/home////PycharmProjects/paper_yolov5/models/yolo.py(137): forward_once
/home////PycharmProjects/paper_yolov5/models/yolo.py(121): forward
/home////anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(709): _slow_forward
/home////anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py(725): _call_impl
/home////anaconda3/lib/python3.8/site-packages/torch/jit/_trace.py(934): trace_module
/home////anaconda3/lib/python3.8/site-packages/torch/jit/_trace.py(733): trace
/home////PycharmProjects/paper_yolov5/models/export.py(57):
RuntimeError: The size of tensor a (60) must match the size of tensor b (80) at non-singleton dimension 3
How can I solve this? Thank you!
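The `[3, 80, 80, 2]` view in the traceback suggests the traced model has its detection grid fixed for a 640×640 input (80 cells at stride 8), while a 640×480 input yields only 60 rows, which would explain the 60 vs 80 mismatch. One option is to keep letterboxing to the square size the model was exported with. The following is a minimal sketch of the padding arithmetic only; `LetterboxGeometry` and `ComputeLetterbox` are hypothetical names for illustration and are not the repository's `LetterboxImage`.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper type (not part of the repository's API).
struct LetterboxGeometry {
    int resized_w;   // width after the aspect-preserving resize
    int resized_h;   // height after the aspect-preserving resize
    int pad_left;
    int pad_right;
    int pad_top;
    int pad_bottom;
    float scale;     // factor applied to both image dimensions
};

// Compute the resize and padding needed to fit a src_w x src_h image into a
// dst_w x dst_h canvas without changing its aspect ratio.
LetterboxGeometry ComputeLetterbox(int src_w, int src_h, int dst_w, int dst_h) {
    const float scale = std::min(static_cast<float>(dst_w) / src_w,
                                 static_cast<float>(dst_h) / src_h);
    const int resized_w = static_cast<int>(std::round(src_w * scale));
    const int resized_h = static_cast<int>(std::round(src_h * scale));
    const int pad_w = dst_w - resized_w;  // total horizontal padding
    const int pad_h = dst_h - resized_h;  // total vertical padding
    // Split the padding as evenly as possible between the two sides.
    return {resized_w, resized_h,
            pad_w / 2, pad_w - pad_w / 2,
            pad_h / 2, pad_h - pad_h / 2,
            scale};
}
```

With this arithmetic a 640×480 image stays at 640×480 and receives 80 rows of padding above and below, so the tensor passed to the model remains 640×640. Supporting a truly rectangular input would instead require exporting the model for that shape.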
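I don't know the cause. I run images through one at a time. With some models the first image takes 7.8 seconds and the second still takes 1.2 seconds; with other models the first image takes a few hundred milliseconds while the second can take up to 30 seconds. Strangely, after the first two images everything is normal, and the total time is only a few tens of milliseconds.
Has anyone run into the same problem and found the cause? I have no idea where to start investigating the inference time.
After making a simple modification to your code so that it reads local files in a loop, I tested the yolov5s model and found that inference runs at only two or three frames per second, and GPU utilization is very low. So I would like to ask: does this project not support loading the model once and then predicting repeatedly? The modified code is shown below. I look forward to your reply, thank you!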
`// load input image
std::vector<cv::String> filenames;
cv::String folder = "/home/xavier/dataset/DF";
cv::glob(folder, filenames);
for (size_t i = 0; i < filenames.size(); ++i)
{
    cv::Mat img = cv::imread(filenames[i]);
    //std::cout<<"******"<<filenames[i]<<std::endl;
    if (img.empty())
    {
        std::cerr << "Error loading the image!\n";
        return -1;
    }
    // load network
    std::string weights = opt["weights"].as<std::string>();
    auto detector = Detector(weights, device_type);
    // set up threshold
    float conf_thres = opt["conf-thres"].as<float>();
    float iou_thres = opt["iou-thres"].as<float>();
    // inference
    auto result = detector.Run(img, conf_thres, iou_thres);
    // visualize detections
    if (opt["view-img"].as<bool>()) {
        Demo(img, result[0], class_names);
    }
    //cv::destroyAllWindows();
}`
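In the loop above the Detector is constructed for every file, so the weights are reloaded on each iteration, which is a plausible reason for the low frame rate. The usual pattern is to construct the detector once before the loop and reuse it. The sketch below illustrates only that structure with a stand-in class; `MockDetector` and `RunAll` are hypothetical and do not use LibTorch or the repository's Detector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the real detector (hypothetical): it only counts how many
// times it has been constructed, i.e. how many times weights were "loaded".
class MockDetector {
public:
    explicit MockDetector(const std::string& weights) : weights_(weights) {
        ++load_count;
    }
    // Pretend inference: returns the length of the path as a dummy result.
    std::size_t Run(const std::string& image_path) const {
        return image_path.size();
    }
    static int load_count;

private:
    std::string weights_;
};
int MockDetector::load_count = 0;

// Construct the detector once, then reuse it for every input file.
std::vector<std::size_t> RunAll(const std::string& weights,
                                const std::vector<std::string>& files) {
    MockDetector detector(weights);  // loaded a single time, outside the loop
    std::vector<std::size_t> results;
    results.reserve(files.size());
    for (const auto& file : files) {
        results.push_back(detector.Run(file));
    }
    return results;
}
```

Applied to the code in the question, this would mean moving the "load network" and "set up threshold" lines above the for loop and leaving only imread, Run and Demo inside it.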
Hi!
I have modified export.py to support GPU, but I still receive the following error:
RuntimeError: Input, output and indices must be on the current device
Do you have any suggestions on how this issue can be resolved?
Thanks!
I exported the yolov5s.pt model using export.py under models (the file is unmodified and comes from the latest yolov5).
CPU model
Export:
python models\export.py --device cpu
Run:
Run once on empty image
----------New Frame----------
pre-process takes : 60 ms
inference takes : 4630 ms
post-process takes : 69 ms
----------New Frame----------
pre-process takes : 77 ms
inference takes : 3762 ms
post-process takes : 155 ms
GPU model
Export:
python models\export.py --device 0
Run:
Run once on empty image
----------New Frame----------
pre-process takes : 40 ms
inference takes : 2766 ms
post-process takes : 1 ms
----------New Frame----------
pre-process takes : 32 ms
inference takes : 10285 ms
post-process takes : 11 ms
Hi @yasenh
I installed all the dependencies and did the setup as described in the repo. But when I tried to build it with cmake (cmake .. && make), I got this error:
Can you please tell me what the problem is?
Thank you
Loading the model and running prediction work fine.
But when I call .toTuple(), an error occurs.
How can I solve it?
I tried changing std::vector pad_info = LetterboxImage(img_input, img_input, cv::Size(640, 640)); to std::vector pad_info = LetterboxImage(img_input, img_input, cv::Size(224, 224)); and ran into a problem.
The model forward pass takes only ~5 ms on the input blob, but post-processing takes about 50 ms. The PyTorch (Python) implementation takes only 15 ms for inference and post-processing combined, so post-processing here seems too slow. Is there any way to optimize post-processing for low latency?
My machine has an i5 CPU and a GTX 1080 Ti (11 GB) GPU. I have successfully compiled and run the project on Windows 10 with GPU, but why does inference take so much time (Release mode)? I have already commented out the warm-up part in main.cpp, and it still takes around 500 ms to process a single image. When I use the same model for detection in Python, it is much more efficient at 20 FPS. I don't know whether something is wrong with my configuration or whether some other issue in the C++ project degrades performance.
Dear author, I use yolov5s.pt to detect images. It takes 289 ms per frame on CPU and 127 ms per frame on an RTX 3070 graphics card. Why is it slow on GPU? The PyTorch version of yolov5 takes 11 ms per frame on the RTX 3070.
I look forward to your help!
I installed Python>=3.8 and PyTorch>=1.6 and also ONNX>=1.7.
I used export.py with your suggestion here: https://github.com/yasenh/libtorch-yolov5#torchscript-model-export
and I got these results, just got yolov5s.torchscript.pt, I didn't get yolov5s.onnx:
When I use weight (yolov5s.torchscript.pt) to run inference, I got these:
Is there any advice? Thank you.
Hello, I have updated to YOLOv5 version 4.0. I found that the prediction results of the Python model differ slightly from those of the LibTorch model. With version 3.1 the results were the same. What is the reason? Can you help me? Thank you!
Hi
python models/export.py --weights yolov5s.pt --img 640 --batch 1
Fusing layers...
Model Summary: 120 layers, 7.06617e+06 parameters, 7.06617e+06 gradients
Traceback (most recent call last):
File "models/export.py", line 41, in
y = model(img) # dry run
.
.
.
type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'Detect' object has no attribute 'm'
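In the Python code, prediction with the yolov5x model takes 40 ms per image, including both the time to obtain pred and the NMS time.
In the C++ code, obtaining pred takes about 20 ms, but NMS takes 50 ms. In practice, the most time-consuming part of NMS is the data transfer from GPU to CPU; the actual NMS computation should not take this long. There should be a way to optimize this, and I would appreciate any pointers.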
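Since the post attributes most of the cost to the GPU-to-CPU transfer rather than to the suppression itself, one idea is to copy the confidence-filtered detections to host memory once and run greedy NMS over a plain array. The following is a minimal sketch of that CPU step only, under the assumption that boxes are already in xyxy form; `Box`, `IoU` and `Nms` are hypothetical names and this is not the repository's implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical plain-data box: corners in pixels plus a confidence score.
struct Box {
    float x1, y1, x2, y2, score;
};

// Intersection over union of two axis-aligned boxes.
float IoU(const Box& a, const Box& b) {
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: visit boxes by descending score and drop any later box whose
// IoU with an already kept box exceeds iou_thres. Returns kept indices.
std::vector<int> Nms(const std::vector<Box>& boxes, float iou_thres) {
    std::vector<int> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&boxes](int a, int b) { return boxes[a].score > boxes[b].score; });

    std::vector<bool> suppressed(boxes.size(), false);
    std::vector<int> keep;
    for (std::size_t i = 0; i < order.size(); ++i) {
        const int idx = order[i];
        if (suppressed[idx]) continue;
        keep.push_back(idx);
        for (std::size_t j = i + 1; j < order.size(); ++j) {
            const int other = order[j];
            if (!suppressed[other] && IoU(boxes[idx], boxes[other]) > iou_thres) {
                suppressed[other] = true;
            }
        }
    }
    return keep;
}
```

Whether this is faster than the existing post-processing depends on how many candidates survive the confidence threshold; it is worth timing the transfer and the suppression separately before changing anything.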
Hi,
Thank you for your work. When I run "./libtorch-yolov5 /data_1/train_project/OBJ_Detection/yolov5-forward/module/torchscript.pt /data_1/train_project/OBJ_Detection/yolov5-forward/img/000240_01046820200606110918_0035_670_3cls.jpg -gpu",
the following error occurs:
terminate called after throwing an instance of 'c10::Error'
what(): isTuple() INTERNAL ASSERT FAILED at /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/include/ATen/core/ivalue_inl.h:723, please report a bug to PyTorch. Expected Tuple but got GenericList (toTuple at /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/include/ATen/core/ivalue_inl.h:723)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f7d00dfaaaa in /data_1/train_project/OBJ_Detection/yolov5-forward/libtorch/lib/libc10.so)
frame #1: c10::IValue::toTuple() const & + 0x121 (0x559bed24f2b3 in ./libtorch-yolov5)
frame #2: + 0xef9c (0x559bed245f9c in ./libtorch-yolov5)
frame #3: + 0x4176b (0x559bed27876b in ./libtorch-yolov5)
frame #4: __libc_start_main + 0xe7 (0x7f7cabd8eb97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #5: + 0xcc6a (0x559bed243c6a in ./libtorch-yolov5)
How can I solve this problem?
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Hi @yasenh ,
The C++ code is beautifully written. I tried it myself but could not get it running end to end because I lack experience with C++ and LibTorch. Could you also provide some statistics showing whether running the model in C++ improves performance compared to running the same model in Python on GPU?
My guess is that there should not be much difference, if any. And if there is one, where does the performance difference come from? I know I am asking a lot, but an analysis would be really useful.
Hi
There seems to be a memory leak after 3 consecutive runs of ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --gpu --view-img
If the memory is not released, the program dies.
Thanks
Could there be a memory leak on each run? The symptom is that memory usage keeps growing, but the forward pass never runs.
I could not run the inference with GPU enabled. I follow the instructions to modify the export.py code to export the torchscript model with GPU, but when inferring with libtorch, it cannot load the weight.
Does anyone know how to solve it?
My OS is windows 10 and it is able to run the CPU torchscript model.
Thanks in advance.
Unhandled exception at 0x00007FFB68A2A799 (in Demo.exe): Microsoft C++ exception: c10::Error at memory location 0x00000012BCF9BE60.
The debugger breaks in kernel_lambda.h:
auto operator()(Parameters... args) -> decltype(std::declval<FuncType>()(std::forward<Parameters>(args)...)) {
return kernel_func_(std::forward<Parameters>(args)...);
}
Debugging shows the crash originates here:
// get the max classes score at each result (e.g. elements 5-84)
std::tuple<torch::Tensor, torch::Tensor> max_classes = torch::max(det.slice(1, item_attr_size, item_attr_size + num_classes), 1);
Hi, can I use this code for CPU-only inference, without CUDA support?
[ 33%] Building CXX object CMakeFiles/libtorch-yolov5.dir/src/detector.cpp.o
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/ArrayRef.h:19:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/MemoryFormat.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/core/TensorBody.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/Tensor.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/Context.h:4,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/C++17.h:24:2: error: #error You need C++14 to compile PyTorch
#error You need C++14 to compile PyTorch
^~~~~
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/Exception.h:5:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:5,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:6,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
inline decltype(auto) str(const Args&... args) {
^~~~
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected ‘)’ before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:17: error: expected primary-expression before ‘auto’
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/util/StringUtil.h:86:8: error: expected unqualified-id before ‘decltype’
inline decltype(auto) str(const Args&... args) {
^~~~~~~~
In file included from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:5:0,
from /home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:6,
from /home/alikarimi/libtorch-yolov5/libtorch/include/ATen/ATen.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/csrc/api/include/torch/types.h:3,
from /home/alikarimi/libtorch-yolov5/libtorch/include/torch/script.h:3,
from /home/alikarimi/libtorch-yolov5/include/detector.h:5,
from /home/alikarimi/libtorch-yolov5/src/detector.cpp:1:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h: In member function ‘void c10::Device::validate()’:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:96:5: error: ‘str’ is not a member of ‘c10’
TORCH_CHECK(index_ == -1 || index_ >= 0,
^
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Device.h:98:5: error: ‘str’ is not a member of ‘c10’
TORCH_CHECK(!is_cpu() || index_ <= 0,
^
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h: In member function ‘void* c10::Allocator::raw_allocate(size_t)’:
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:163:5: error: ‘str’ is not a member of ‘c10’
AT_ASSERT(dptr.get() == dptr.get_context());
^
/home/alikarimi/libtorch-yolov5/libtorch/include/c10/core/Allocator.h:163:5: error: ‘str’ is not a member of ‘c10’
AT_ASSERT(dptr.get() == dptr.get_context());
^
.....
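The first error in the log, "You need C++14 to compile PyTorch", indicates the compiler is running in a language mode older than C++14; the later `decltype(auto)` errors follow from that, since `decltype(auto)` is a C++14 feature. In a CMake project this is commonly addressed by requesting C++14 or newer (for example through `CMAKE_CXX_STANDARD`). As a quick way to confirm which mode a build actually uses, a small check on the standard `__cplusplus` macro can help; `HasCpp14` and `StandardName` below are hypothetical helper names.

```cpp
#include <string>

// __cplusplus is 201103L for C++11, 201402L for C++14 and 201703L for C++17.
// Note: MSVC reports 199711L unless /Zc:__cplusplus is passed.
constexpr bool HasCpp14() {
    return __cplusplus >= 201402L;
}

// Human-readable name of the language mode this file was compiled with.
std::string StandardName() {
    if (__cplusplus >= 201703L) return "C++17 or newer";
    if (__cplusplus >= 201402L) return "C++14";
    if (__cplusplus >= 201103L) return "C++11";
    return "pre-C++11";
}
```

Printing `StandardName()` from a tiny program built with the same CMake settings shows whether the C++14 requirement is being met before LibTorch headers are involved.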
run: ./libtorch-yolov5 --source ../images/bus.jpg --weights ../weights/yolov5s.torchscript.pt --gpu --view-img
error:
terminate called after throwing an instance of 'torch::jit::ErrorReport'
what():
aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor):
Expected at most 12 arguments but found 13 positional arguments.
Why were 13 positional arguments found?
The system is an NVIDIA Jetson Xavier NX running Docker:
opencv 4.4.0
libtorch 1.6.0
cuda 10.2
yolov5 v4.0
`VideoCapture capture;
std::cout << "finish load network and open the video" << std::endl;
capture.open("/home/****/libtorch-yolov5/test.mp4");
if (!capture.isOpened())
{
    std::cout << "can not open ...\n" << std::endl;
    return -1;
}
Mat frame;
namedWindow("output", WINDOW_AUTOSIZE);
// set up threshold
float conf_thres = 0.4;//opt["conf-thres"].as<float>();
float iou_thres = 0.5;//opt["iou-thres"].as<float>();
for (;;)
{
    capture >> frame;
    //Mat pic;
    if (frame.empty()) break;
    //imshow("output",frame);
    std::cout << "start forward" << std::endl;
    auto result = detector.Run(frame, conf_thres, iou_thres);
    Demo(frame, result, class_names);
    imshow("output", frame);
    if (waitKey(33) >= 0) break;
}
capture.release();
cv::destroyAllWindows();
//return 0;`
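When the program runs, the first frame after loading the model is displayed with its detections right away and the boxes are drawn correctly; this step is fast. The second frame, however, takes hundreds of times longer in the inference stage. After that the time drops back to a much shorter value. This happens every time, even with a different video. Here are the timings I measured: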
----------New Frame----------
img size:1080x1920
pre-process takes : 4 ms
inference takes : 137 ms <-------------------------------------------------------137
post-process takes : 19 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 5 ms
inference takes : 7869 ms <--------------------------------------------------------7869
post-process takes : 24 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 3 ms
inference takes : 8 ms <------------------------------------------------------------8
post-process takes : 25 ms
start forward
----------New Frame----------
img size:1080x1920
pre-process takes : 4 ms
inference takes : 8 ms <-------------------------------------------------------------8
post-process takes : 23 ms
What could be causing this?
Note: I did not know what the warm-up was for, so I removed it.
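A pattern like the timings above, where the first one or two calls are far slower than the rest, is often attributed to one-time initialization work (for example CUDA context setup or TorchScript optimizing the graph during its first runs), although the thread does not confirm the exact cause here. A warm-up pass is the usual way to keep that cost out of measured frames. The sketch below shows the measurement pattern only; `TimeAfterWarmup` is a hypothetical helper and the callable stands in for the real inference call.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Run fn `warmup_runs` times without timing it, then time `timed_runs` further
// calls and return each duration in milliseconds.
std::vector<double> TimeAfterWarmup(const std::function<void()>& fn,
                                    int warmup_runs, int timed_runs) {
    for (int i = 0; i < warmup_runs; ++i) {
        fn();  // discarded: absorbs one-time initialization cost
    }
    std::vector<double> durations_ms;
    durations_ms.reserve(static_cast<std::size_t>(timed_runs));
    for (int i = 0; i < timed_runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto end = std::chrono::steady_clock::now();
        durations_ms.push_back(
            std::chrono::duration<double, std::milli>(end - start).count());
    }
    return durations_ms;
}
```

Given that the slow call in the log above is the second one, two warm-up runs would be a reasonable starting point, for example on a blank image of the same size as the real frames.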
torch::Tensor preds = module.forward({ imgTensor }).toTuple()->elements()[0].toTensor()
This line fails because forward returns a GenericList, not a Tuple.
thx!
OpenCV 4.2.0/4.4.0/4.5.0
DNN API (readNetFromTorch(model_file))
I have not been able to load a yolov5/torch format pt file (or the torchscript CUDA variant) in OpenCV without an exception being thrown. Have you tried this?
Thanks,
Rob
Why do this?
Is batch inference not supported?
Hi, I found a problem when using your code. In the Python version of yolov5 there is a clip_coords function in /utils/general.py (line 240) that clips xyxy bounding boxes to the image shape (height, width). Sometimes my predicted box values fall outside the image, so I added a clip-coords step to your detector.cpp code. I wonder if I'm doing the right thing. Thank you for sharing.
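For reference, the clipping described above amounts to clamping each x coordinate to [0, width] and each y coordinate to [0, height]. A minimal sketch in plain C++ follows; `BoxXYXY` and `ClipCoords` are hypothetical names, and the code is not taken from detector.cpp.

```cpp
#include <algorithm>

// Hypothetical plain-data box in xyxy form (pixel coordinates).
struct BoxXYXY {
    float x1, y1, x2, y2;
};

// Clamp a box to the image bounds: x to [0, img_w], y to [0, img_h].
BoxXYXY ClipCoords(BoxXYXY box, int img_w, int img_h) {
    const float w = static_cast<float>(img_w);
    const float h = static_cast<float>(img_h);
    box.x1 = std::min(std::max(box.x1, 0.0f), w);
    box.y1 = std::min(std::max(box.y1, 0.0f), h);
    box.x2 = std::min(std::max(box.x2, 0.0f), w);
    box.y2 = std::min(std::max(box.y2, 0.0f), h);
    return box;
}
```

The clamp should be applied after the boxes have been scaled back to the original image, using the original image's width and height rather than the 640×640 network input.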
Hello, thanks for your code. Did you test batch inference?
Beyond cmake builds, do you have any suggestions on how to build code like this for Visual Studio?
Thanks in advance...
Execution breaks with an error at auto detections = output.toTuple()->elements()[0].toTensor(); in:
inline c10::intrusive_ptr<ivalue::Tuple> IValue::toTuple() const & {
AT_ASSERT(isTuple(), "Expected Tuple but got ", tagKind());
return toIntrusivePtr<ivalue::Tuple>();
}
How can I export the model with torch==1.3.0?
I tested your code for object detection on a video using GPU. I used a GTX 1050 Ti LP and got 3 FPS. How does that compare with your test results?
Thanks @yasenh
Hi, just wondering if it is possible to build without CUDA? I don't have a NVIDIA GPU so I want to do inference with CPU
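I have a question. When exporting the model, model.model[-1].export = False is set, but the Detect layer also applies a convolution to each feature map. If the model is exported without the Detect layer, won't the model's output be inconsistent with training?
Hi, thanks so much for creating this repo; it is really awesome.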
My question is how to debug with libtorch? Now I face the problem of "segmentation fault(core dumped)" after running the warm-up.
I tried to debug with VSCode, but I could not go deep into libtorch library.
Is it compulsory to use debug version of libtorch?
I trained my own model and converted it to s.torchscript.pt, but it always fails with an error at the following line. What is going on?
output.toTuple()->elements()[0].toTensor();
Hello,
I have a question about your project. In detect.cpp:
// if none remain then process next image
if (det.size(1) == 0) {
continue;
}
Shouldn't det.size(1) == 0 be det.size(0) == 0? Is that right?