zzzxxxttt / pytorch_simple_centernet_45
A simple pytorch implementation of CenterNet (Objects as Points)
A question: how many GPUs did you train on? How long does it take to train 140 epochs on COCO? It must take a long time.
Hi guys,
I am trying to train this repo on my custom dataset,
and I want to know what self.max_objs = 128
means. Does it mean that no image may contain more than 128 objects?
Any answer or idea would be appreciated!
Hi, could you please add the steps required to train this network on a new dataset with new classes?
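Loading offline (locally downloaded) weights fails. It is the same for both resnet and hourglass: every parameter is reported as "No Param". The file path is configured correctly.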
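One way to narrow this down is to compare the parameter names the model expects with the names stored in the checkpoint. The sketch below assumes that "No Param" is printed for every model parameter whose name is missing from the checkpoint, and that the mismatch comes from a "module." prefix left by a DataParallel wrapper; both are guesses, and the helper names `strip_prefix` and `compare_keys` are hypothetical. Plain dicts stand in for `model.state_dict()` and the loaded checkpoint.

```python
def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from matching keys."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


def compare_keys(model_keys, ckpt_keys):
    """Return (names missing from the checkpoint, names the model does not have)."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


# Toy stand-ins for model.state_dict() and a loaded checkpoint (values omitted).
model_sd = {"conv1.weight": None, "bn1.weight": None}
ckpt_sd = {"module.conv1.weight": None, "module.bn1.weight": None}

missing_raw, _ = compare_keys(model_sd, ckpt_sd)
missing_fixed, extra_fixed = compare_keys(model_sd, strip_prefix(ckpt_sd))
print("missing before stripping:", missing_raw)
print("missing after stripping: ", missing_fixed)
```

If every name is still missing after stripping the prefix, the checkpoint probably belongs to a different architecture or stores its weights under a nested key.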
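When I train CenterNet on my own data, a single object gets several overlapping predicted boxes. Even after adding NMS, some objects still end up with two detection boxes.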
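A minimal greedy NMS sketch in plain Python may help explain why two boxes can survive on one object. It is not the repo's implementation, and the function names, the boxes, and the 0.5 threshold are illustrative assumptions. NMS only removes boxes whose IoU with a kept box exceeds the threshold, so a large box around a small one can have a low IoU and pass through.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: visit boxes by descending score, keep those not overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Box 1 nearly coincides with box 0; box 2 encloses box 0 but is much larger.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (10, 10, 90, 90), (200, 200, 240, 240)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = nms(boxes, scores, iou_thresh=0.5)
print(kept)
```

Here the near-duplicate box 1 is removed, while the enclosing box 2 has an IoU of only 0.25 with box 0 and is kept. A lower IoU threshold or a higher score threshold are the usual knobs to try in that situation.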
When I run it in a Docker container, it shows the following:
Epoch: 1 [2020-09-02 08:56:59,475]
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
File "train.py", line 237, in <module>
main()
File "train.py", line 226, in main
train(epoch)
File "train.py", line 143, in train
outputs = model(batch['image'])
File "/opt/conda/envs/centernet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/centernet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/opt/conda/envs/centernet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/dyl/Centernet/nets/resdcn.py", line 220, in forward
x = self.conv1(x)
File "/opt/conda/envs/centernet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/envs/centernet/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:405
Training a model on a large dataset takes a long time, so could you share your pre-trained weights for COCO or Pascal VOC?
Hi @zzzxxxttt,
Thank you for your amazing work. I have completed my training and testing. However, I require two more results for documentation purposes.
Hope to hear from you soon. Thank you.
Cheers,
JiaLim98
Hi guys,
After reading the code in coco.py, it seems to me that the affine transformation there forces every image to a size of 512x512:
trans_img = get_affine_transform(center, scale, 0, [self.img_size['w'], self.img_size['h']])
img = cv2.warpAffine(img, trans_img, (self.img_size['w'], self.img_size['h']))
I am not sure whether this is a common trick for resizing images of different sizes.
Any idea or answer will be appreciated!
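To make the question concrete, here is a sketch of what such a warp does geometrically. It assumes no rotation and assumes the transform is a uniform zoom plus a translation that maps a square of side `scale` around `center` onto the output; `square_crop_affine` and `apply_affine` are hypothetical helpers, not the repo's get_affine_transform.

```python
def square_crop_affine(center, scale, out_w, out_h):
    """2x3 matrix mapping a square of side `scale` around `center` to the output.

    Assumes no rotation and a single uniform zoom factor of out_w / scale.
    """
    zoom = out_w / scale
    cx, cy = center
    return [
        [zoom, 0.0, out_w / 2 - zoom * cx],
        [0.0, zoom, out_h / 2 - zoom * cy],
    ]


def apply_affine(m, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# A 640x480 image: center at the middle, scale set to the longer side.
w, h = 640, 480
m = square_crop_affine(center=(w / 2, h / 2), scale=max(w, h), out_w=512, out_h=512)
top_left = apply_affine(m, 0, 0)
bottom_right = apply_affine(m, w, h)
print(top_left, bottom_right)
```

Under these assumptions the image keeps its aspect ratio and lands in rows 64 to 448 of the 512x512 canvas, with the rest left as padding. Moving `center` or changing `scale` turns the same warp into a random crop or zoom, which is one reason to prefer it over a plain resize.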
I use the COCO dataset with resdcn_18 at 512, following the instructions: "python train.py --log_name coco_resdcn18_512 --dataset coco --arch resdcn_18 --lr 5e-4 --lr_step 90,120 --batch_size 24 --num_epochs 100 --num_workers 8 "
After 100 epochs I get very poor performance. Am I missing an important step? Thanks, everyone.
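Hello everyone, I am a beginner. I used the author's preset command shown above, and after 100 epochs the IoU is very low. Did I get an important step wrong? Thanks for any advice.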
Hello, why are all the results -1 when I run validation, as shown below?
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
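As far as I know, the COCO evaluation summary reports -1 when there is nothing to average, for example when no ground truth was matched because the image IDs or category IDs in the results do not line up with the annotation file. The sketch below only mimics that averaging rule in plain Python; `summarize` here is a hypothetical stand-in, not the evaluation library's code.

```python
def summarize(values):
    """Average the valid entries; -1 marks slots that were never filled."""
    valid = [v for v in values if v > -1]
    return sum(valid) / len(valid) if valid else -1.0


# No category had any ground truth, so every slot is still the -1 placeholder.
nothing_evaluated = [-1.0, -1.0, -1.0]
# Two categories were evaluated, one was not.
partly_evaluated = [0.5, -1.0, 0.7]

print(summarize(nothing_evaluated))
print(summarize(partly_evaluated))
```

If every line of the summary is -1, a reasonable first check is whether the image IDs and category IDs written to the results file actually appear in the ground-truth annotations.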
I trained on my custom dataset using this PyTorch CenterNet, but the following error was raised:
RuntimeError: The size of tensor a (152) must match the size of tensor b (150) at non-singleton dimension 3
Traceback (most recent call last):
File "train.py", line 238, in <module>
main()
File "train.py", line 227, in main
train(epoch)
File "train.py", line 149, in train
hmap_loss = _neg_loss(hmap, batch['hmap'])
File "pytorch_simple_CenterNet_45/utils/losses.py", line 47, in _neg_loss
pos_loss = torch.log(pred) * torch.pow(1 - pred, 2) * pos_inds
RuntimeError: The size of tensor a (152) must match the size of tensor b (150) at non-singleton dimension 3
Could you help me to fix it? Thank you.
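One plausible cause, offered as a guess rather than a diagnosis, is an input size that is not a multiple of the backbone stride. The sketch assumes a backbone that downsamples by 32 (rounding up at each strided layer) and then upsamples by 8, while the target heatmap is built at the image size divided by 4; an image size of 600 is assumed because it reproduces the 152 vs 150 in the error. All function names are hypothetical.

```python
def ceil_div(n, d):
    """Integer division rounding up."""
    return -(-n // d)


def predicted_map_size(img_size, down=32, up=8):
    """Side of the predicted map after downsampling (rounding up) and upsampling."""
    return ceil_div(img_size, down) * up


def target_map_size(img_size, down_ratio=4):
    """Side of the target heatmap built by the dataset code."""
    return img_size // down_ratio


def round_up_to_multiple(img_size, multiple=32):
    """Smallest multiple of `multiple` that is >= img_size."""
    return ceil_div(img_size, multiple) * multiple


img_size = 600
print(predicted_map_size(img_size), target_map_size(img_size))

fixed = round_up_to_multiple(img_size)
print(fixed, predicted_map_size(fixed), target_map_size(fixed))
```

With these assumptions, 600 gives a 152-wide prediction against a 150-wide target, and rounding the image size up to 608 makes both 152.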
Hi @zzzxxxttt,
I have already got this repo working on Google Colab. However, when I try to install it on another Linux computer, it shows this when I try to compile DCNv2.
Do you have any idea why? Hope to hear from you soon.
Best regards,
JiaLim98
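I trained on the Pascal VOC dataset using the command-line interface given in README.md, but the final mAP on validation is only 10%. The w_h_ loss drops to about 10 and then stops decreasing, whereas for the already trained model parameters provided in README.md the w_h_ loss is around 2. Why does my loss not go down?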
python train.py --log_name pascal_resdcn18_384_dp \
--dataset pascal \
--arch resdcn_18 \
--img_size 384 \
--lr 1.25e-4 \
--lr_step 45,60 \
--batch_size 32 \
--num_epochs 70 \
--num_workers 10
RuntimeError: cublas runtime error : the GPU program failed to execute at /opt/conda/conda-bld/pytorch_1556653215914/work/aten/src/THC/THCBlas.cu:425
How can I solve this error? Please help!
How do I compute FPS? Is it implemented anywhere in the code?
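A generic way to measure FPS, independent of this repo, is to time a loop of inference calls after a short warm-up. In the sketch below `dummy_infer` is a hypothetical stand-in for the model's forward pass plus decoding, and `measure_fps` is an illustrative helper.

```python
import time


def measure_fps(infer, inputs, warmup=2):
    """Average frames per second of `infer` over `inputs`, after a short warm-up."""
    for x in inputs[:warmup]:
        infer(x)                      # warm-up runs are not timed
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed


def dummy_infer(x):
    """Hypothetical stand-in for a forward pass plus decoding (about 5 ms)."""
    time.sleep(0.005)
    return x


fps = measure_fps(dummy_infer, list(range(20)))
print(f"{fps:.1f} FPS")
```

When timing a PyTorch model on a GPU, call torch.cuda.synchronize() before reading the clock at the start and at the end, because CUDA kernels run asynchronously and the loop would otherwise finish before the work does.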
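Hi, if I want to modify or replace the backbone, how should I go about it? I mean, are there any format requirements, for example what the model's return value must contain?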
Hi @zzzxxxttt,
Thank you for your interesting work.
I followed everything about setting up this repo. However, when I start to train, it gives me this error. Do you know what's wrong?
Hope to hear from you soon.
Regards,
JiaLim98
Are there any training details I need to pay attention to? I reach 70% mAP using flip test. I'm looking forward to your reply.
Hello, how do I resolve this error? No such file or directory: './data/voc/VOCdevkit/VOC2007/annotations_cache'
Disable cuDNN batch normalization: open torch/nn/functional.py, find the line with torch.batch_norm, and replace torch.backends.cudnn.enabled with False.
The official CenterNet repo is based on the CornerNet repo, and its code for the Gaussian radius is wrong. Please refer to these issues:
princeton-vl/CornerNet#110
Duankaiwen/CenterNet#47
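For readers who want to see the kind of calculation involved, here is a sketch of one case of the radius derivation, solved with the standard quadratic formula. It covers only the case where both corners move inward by r, it is my own derivation rather than code from either repo, and `radius_inner_case` is a hypothetical name; the linked issues remain the reference for what exactly was wrong.

```python
import math


def radius_inner_case(height, width, min_overlap=0.7):
    """Largest r such that a box shrunk by r on every side keeps IoU >= min_overlap.

    Setting (w - 2r)(h - 2r) / (w h) = min_overlap gives
    4 r^2 - 2 (h + w) r + (1 - min_overlap) w h = 0; the smaller root is the answer.
    """
    a = 4.0
    b = -2.0 * (height + width)
    c = (1.0 - min_overlap) * width * height
    disc = math.sqrt(b * b - 4.0 * a * c)
    return (-b - disc) / (2.0 * a)


h, w = 10.0, 10.0
r = radius_inner_case(h, w)
# Check: the shrunken box should overlap the original by exactly min_overlap.
iou = (w - 2 * r) * (h - 2 * r) / (w * h)
print(r, iou)
```

The point of the check at the end is that a correct root must reproduce the requested overlap; a formula that drops the division by 2a or picks the wrong sign fails it.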
Epoch: 1 [2020-08-31 11:13:22,017]
Traceback (most recent call last):
File "train.py", line 241, in <module>
main()
File "train.py", line 229, in main
train(epoch)
File "train.py", line 139, in train
for batch_idx, batch in enumerate(train_loader):
IndexError: index 14 is out of bounds for axis 0 with size 2
A dimension problem occurs when training on the pascal_voc dataset.
If I want to run inference on a single image and draw the results on it, what is the quickest way to do that? I would appreciate it if you could share the inference code. Thank you.
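As the title says: what is the purpose of the get_border function? And how are the three points before and after the affine transformation chosen? I do not understand these two parts. Does this approach improve on a plain resize in any way?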
Hi, thank you for the project. When I train on my custom data, hmap_loss stays at 9.20850 and never changes. What could be the cause?
Hello, with python -m torch.distributed.launch --nproc_per_node NUM_GPUS train.py, setting NUM_GPUS to 4 automatically uses the first 4 cards. How should I set it up if I want to use the last 4 cards instead?
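A common approach is to restrict which devices the processes can see through the CUDA_VISIBLE_DEVICES environment variable. The sketch below assumes an 8-GPU machine with devices numbered 0 to 7 and only builds the environment and command; it does not launch anything.

```python
import os

# Assumption: an 8-GPU machine whose devices are numbered 0-7.
env = dict(os.environ, CUDA_VISIBLE_DEVICES="4,5,6,7")
cmd = [
    "python", "-m", "torch.distributed.launch",
    "--nproc_per_node", "4",
    "train.py",
]
# To actually launch: import subprocess; subprocess.run(cmd, env=env, check=True)
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
```

Inside the launched processes the four visible cards are renumbered 0 to 3, so the training script itself should not need any change.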
line 167, in __getitem__
w_h_[k] = 1. * w, 1. * h
IndexError: index 100 is out of bounds for axis 0 with size 100
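The traceback suggests that per-image targets are stored in arrays of a fixed length (100 here) and that one image has more annotated objects than that. The sketch below shows the idea with plain lists; `fill_targets` and `mask` are hypothetical names, and truncating is only one option, the other being to raise the limit (presumably the max_objs setting) above the largest object count in the dataset.

```python
def fill_targets(boxes, max_objs=100):
    """Copy box sizes into fixed-length buffers, skipping anything past max_objs."""
    w_h = [(0.0, 0.0)] * max_objs     # padded width/height targets
    mask = [0] * max_objs             # 1 where a slot holds a real object
    for k, (x1, y1, x2, y2) in enumerate(boxes[:max_objs]):
        w_h[k] = (1.0 * (x2 - x1), 1.0 * (y2 - y1))
        mask[k] = 1
    return w_h, mask


# 130 boxes in one image, but only 100 slots: the extra 30 are dropped, not indexed.
boxes = [(i, i, i + 10, i + 20) for i in range(130)]
w_h, mask = fill_targets(boxes, max_objs=100)
print(len(w_h), sum(mask))
```

Fixed-length buffers let images with different object counts be stacked into one batch, with the mask telling the loss which slots are real.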
I was not able to download "checkpoint.t7" from the path you referred to.
Could you provide the checkpoint.t7 file some other way? I am overseas, so I cannot create an account at "https://pan.baidu.com/s/1tp9-5CAGwsX3VUSdV276Fg".
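While reading the code, I noticed that the test stage uses soft_nms, but isn't CenterNet supposed to need no NMS post-processing? As I recall, the paper says it finds local peaks and then uses the top 100 peaks to form boxes. I am rather confused about this part. If taking 100 peaks is hard-coded, doesn't that mean every image gets 100 boxes? In practice an image may contain only 3 or 4 boxes.
Could the author write out this step? Thanks!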
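The peak-picking step described in the question can be sketched in plain Python as follows. This is an illustration under assumptions, not the repo's decoder: `top_k_peaks` is a hypothetical helper, a 3x3 neighbourhood is assumed for the local-maximum test, and the 0.3 score threshold is arbitrary.

```python
def top_k_peaks(heatmap, k=100, score_thresh=0.3):
    """Return (score, row, col) for local maxima, best first, above score_thresh."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = heatmap[r][c]
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            if v >= max(neighbours):      # 3x3 local maximum
                peaks.append((v, r, c))
    peaks.sort(reverse=True)
    top = peaks[:k]                       # at most k candidates, each with a score
    return [p for p in top if p[0] >= score_thresh]


heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.6],
    [0.0, 0.0, 0.0, 0.1, 0.2],
]
detections = top_k_peaks(heatmap, k=100, score_thresh=0.3)
print(detections)
```

Taking the top 100 peaks only caps the number of candidates. Each candidate still carries its heatmap score, so an image with three objects yields three high-scoring boxes plus low-scoring ones that a score threshold removes when results are displayed.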