pyramidbox's Issues

Did you use the same head loss as the author?

The paper downsamples the input image region corresponding to the head anchor by 2 to match the face ground truth, but I cannot find where you perform this operation in your code. Did I miss something? Please comment @Goingqs.
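For reference, a minimal sketch of one reading of that step (the factor and the helper name are assumptions for illustration, not taken from this repository):

```python
import torch

def shrink_head_anchors(anchors, factor=0.5):
    """Hypothetical helper for the matching step the paper describes: shrink each
    head anchor (point form: xmin, ymin, xmax, ymax) around its center by `factor`
    so it can be compared against the face ground truth directly. This is one
    interpretation of 'downsample by 2', not code from this repository."""
    cx = (anchors[:, 0] + anchors[:, 2]) / 2
    cy = (anchors[:, 1] + anchors[:, 3]) / 2
    w = (anchors[:, 2] - anchors[:, 0]) * factor
    h = (anchors[:, 3] - anchors[:, 1]) * factor
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
```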

An inf bug, please help me out.

root@41a655357ffb:/app# python3 train.py
Initializing weights...
Loading Dataset...
Training SSD on WiderFace
/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/tensor.py:447: UserWarning: mask is not broadcastable to self, but they have the same number of elements. Falling back to deprecated pointwise behavior.
return tensor.masked_fill_(mask, value)
front and back Timer: 10.275606155395508 sec.
iter 0 || Loss: 90.8623 ||
Loss conf: 51.309993743896484 Loss loc: 11.424317359924316
Loss head conf: 48.754676818847656 Loss head loc: 7.501279354095459
lr: 0.0001
Saving state, iter: 0
front and back Timer: 0.7664861679077148 sec.
iter 10 || Loss: 62.3223 ||
Loss conf: 12.217093467712402 Loss loc: 17.27483367919922
Loss head conf: 43.204200744628906 Loss head loc: 22.456457138061523
lr: 0.0001
front and back Timer: 0.7011480331420898 sec.
iter 20 || Loss: 25.3138 ||
Loss conf: 13.556425094604492 Loss loc: 6.718225955963135
Loss head conf: 4.488845348358154 Loss head loc: 5.589457988739014
lr: 0.0001
front and back Timer: 0.7081215381622314 sec.
iter 30 || Loss: 16.7039 ||
Loss conf: 4.1354851722717285 Loss loc: 6.339954853057861
Loss head conf: 6.332630157470703 Loss head loc: 6.124255180358887
lr: 0.0001
front and back Timer: 0.7141270637512207 sec.
iter 40 || Loss: 12.9531 ||
Loss conf: 2.785682201385498 Loss loc: 5.851232528686523
Loss head conf: 3.0570006370544434 Loss head loc: 5.575360298156738
lr: 0.0001
front and back Timer: 0.7503578662872314 sec.
iter 50 || Loss: 23.3493 ||
Loss conf: 13.566965103149414 Loss loc: 5.035085678100586
Loss head conf: 4.751973628997803 Loss head loc: 4.742441654205322
lr: 0.0001
/app/utils/augmentations.py:266: RuntimeWarning: divide by zero encountered in double_scalars
ratio = float(target_anchor) / rand_Side
front and back Timer: 0.7871015071868896 sec.
iter 60 || Loss: 12.1471 ||
Loss conf: 3.0819289684295654 Loss loc: 5.024109363555908
Loss head conf: 3.064208507537842 Loss head loc: 5.017943382263184
lr: 0.0001
Traceback (most recent call last):
File "train.py", line 241, in
train()
File "train.py", line 175, in train
images, targets = next(batch_iterator)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 281, in next
return self._process_next_batch(batch)
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 301, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
OverflowError: Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 55, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py", line 55, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "/app/data/widerface.py", line 99, in getitem
im, gt, h, w = self.pull_item(index)
File "/app/data/widerface.py", line 119, in pull_item
img, boxes, labels = self.transform(img, target[:, :4], target[:, 4])
File "/app/utils/augmentations.py", line 509, in call
return self.augment(img, boxes, labels)
File "/app/utils/augmentations.py", line 53, in call
img, boxes, labels = t(img, boxes, labels)
File "/app/utils/augmentations.py", line 269, in call
if int(height * ratio * width * ratio) > self.maxSize*self.maxSize:
OverflowError: cannot convert float infinity to integer
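The traceback points at utils/augmentations.py line 266, where ratio = float(target_anchor) / rand_Side becomes inf when rand_Side is 0 (a degenerate box), and int(inf) then overflows at line 269. A minimal guard, offered as an assumption about a possible fix rather than the author's intended one:

```python
def safe_ratio(target_anchor, rand_side, fallback=1.0):
    """Hypothetical guard for the division at utils/augmentations.py:266.
    A zero-width/height sample makes rand_side == 0, so ratio becomes inf and
    int(height * ratio * width * ratio) raises OverflowError. Returning a
    neutral ratio skips the rescale for such samples."""
    if rand_side <= 0:
        return fallback
    return float(target_anchor) / rand_side
```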

Did you use loss_head and loss_head_loc in your public model?

I noticed that you do not use the body loss, so why keep the head loss? I cannot find how you match the head anchors to the ground truth as described in the paper; any clues? Or did you just use LFPN, data augmentation, and max-out to train a typical face detector, without the contextual information from the paper? Is the inf bug caused by the head loss? Would you kindly give me a hint? @Goingqs

shape error

RuntimeError: The shape of the mask [4, 34125] at index 0 does not match the shape of the indexed tensor [136500, 1] at index 0

Hello, I met this error during training. Can you help me? Thanks.
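This looks like the familiar shape mismatch in SSD-style hard-negative mining: loss_c is computed as a flat [batch * num_priors, 1] tensor while pos is [batch, num_priors]. A sketch of the usual fix, assuming the variable names of the standard ssd.pytorch MultiBoxLoss (whether this repository's multibox_loss.py is laid out the same way is an assumption):

```python
import torch

def mask_positive_conf_loss(loss_c, pos, num):
    """Hypothetical step from MultiBoxLoss.forward: reshape the flat
    [batch * num_priors, 1] confidence loss back to [batch, num_priors] before
    indexing it with the [batch, num_priors] positive mask, then zero out the
    positives so hard-negative mining only ranks negatives."""
    loss_c = loss_c.view(num, -1)
    loss_c[pos] = 0
    return loss_c
```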

GPU memory issue.

@hdjsjyl @Goingqs Using batch size 32, I got the following error when running train.py. I use two 1080 Ti GPUs and 32 GB of memory; can you help me out, please?
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

I understand that it is a GPU memory issue, but I use the same settings as you, so why can't I train with batch size 32? I changed the batch size to 16, and now it works for the first iteration but fails on the next one with the following error:
root@bfffba56f59a:/app# python3 train.py
Initializing weights...
Loading Dataset...
Training SSD on WiderFace
/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/tensor.py:447: UserWarning: mask is not broadcastable to self, but they have the same number of elements. Falling back to deprecated pointwise behavior.
return tensor.masked_fill_(mask, value)
front and back Timer: 6.264386892318726 sec.
iter 0 || Loss: 59.1287 ||
Loss conf: 30.951845169067383 Loss loc: 13.884753227233887
Loss head conf: 20.91655158996582 Loss head loc: 7.667636871337891
lr: 0.001
Saving state, iter: 0
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "train.py", line 240, in
train()
File "train.py", line 185, in train
out = net(images)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
raise output
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
output = module(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/app/pyramid.py", line 211, in forward
c3 = self.layer2(c2) #S8
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/app/pyramid.py", line 84, in forward
out = F.relu(self.bn2(self.conv2(out)),inplace=True)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

Why not use head box prediction? And why not predict face, head, and body on the same layers?

https://github.com/Goingqs/PyramidBox/blob/master/pyramid.py#L155, here the face prediction uses range(6);
https://github.com/Goingqs/PyramidBox/blob/master/pyramid.py#L164, here the head prediction uses range(5);
https://github.com/Goingqs/PyramidBox/blob/master/pyramid.py#L173, here the body prediction uses range(4), and the body prediction code is commented out. Why is that?
Why not just use the same layers to predict face, head, and body?

What should gamma be in the learning-rate decay?

While training the model, it reports the following at 8e4 steps:
Traceback (most recent call last):
File "train.py", line 241, in
train()
File "train.py", line 160, in train
adjust_learning_rate(optimizer, args.gamma, step_index)
AttributeError: 'Namespace' object has no attribute 'gamma'
I checked the code and found that train.py lacks the gamma parameter.
Could you tell us how large gamma should be here?
I would really appreciate any advice.
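A minimal sketch of the missing pieces, assuming the SSD-style default of gamma = 0.1 and the usual adjust_learning_rate helper; the value the author actually trained with is not confirmed here:

```python
import argparse

parser = argparse.ArgumentParser(description='PyramidBox training (sketch)')
# Hypothetical addition: train.py calls adjust_learning_rate(optimizer, args.gamma, step_index)
# but never defines --gamma. SSD-style training usually decays the lr by 0.1 per step.
parser.add_argument('--gamma', default=0.1, type=float,
                    help='multiplicative learning-rate decay applied at each lr step')
args = parser.parse_args()

def adjust_learning_rate(optimizer, gamma, step, base_lr=1e-3):
    """Scale the base lr by gamma**step, mirroring the ssd.pytorch helper.
    base_lr=1e-3 matches the lr printed in some of the logs above."""
    lr = base_lr * (gamma ** step)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
```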

Did you miss the softmax function in pyramid.py during training? I think that is what caused the inf error.

I print the x.max value in box_utils.py, and it looks invalid. Did you miss something? Can you check the code against your local versions of box_utils.py and multibox_loss.py? Thank you very much @Goingqs.

root@5cd6eaa6b882:/app# python3 train.py 
Resuming training, loading /app/weights/resnet50.pth...
Loading weights into state dict...
Finished!
Loading Dataset...
Training SSD on WiderFace
Unexpected end of /proc/mounts line `overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/GPIJIOK2KBYQP7A7TWB3AG4C5R:/var/lib/docker/overlay2/l/HYHHDRAD47BRS7TOLC3DMJFGS6:/var/lib/docker/overlay2/l/VOEQ7X6D4LPPCPGL3QXAN4N7GV:/var/lib/docker/overlay2/l/EZCO27VM3STX5Y7W6MWSUGJWBY:/var/lib/docker/overlay2/l/HS3DBFS45ITIEOVQAOCNGIVHFU:/var/lib/docker/overlay2/l/VIP4R7IWAIXVSQEZX2UYBG2U4I:/var/lib/docker/overlay2/l/7YIED5WPH6PRFXHEFI7QO6QP6V:/var/lib/docker/overlay2/l/ATJ2EAYUF5A2N7SKD3QJDPX4QZ:/var/lib/docker/overlay2/l/UW62NT7UIVLPY'
Unexpected end of /proc/mounts line `GRJMHNAXM3KRD:/var/lib/docker/overlay2/l/VFLIQSZXBOVVVDAMILUZZBALZR:/var/lib/docker/overlay2/l/3CNOC6EZA53WJ2IUWP2FDLVPIL:/var/lib/docker/overlay2/l/VYAKZQTGVZI5DI6E5MYHSBQ74N:/var/lib/docker/overlay2/l/JEOVNGVFESTF26OPVFMQKGFLLY:/var/lib/docker/overlay2/l/MFLAX63H4KHKDUIDHTNSRXK77X:/var/lib/docker/overlay2/l/23TWJ4G72S6T6VGAI4XFBJOVER:/var/lib/docker/overlay2/l/WB3G3ZRBQNWQRHN7ZTBRHAS4YC:/var/lib/docker/overlay2/l/EAIMBSOGEWF4LR37TGW5ZY6ZV5:/var/lib/docker/overlay2/l/E6YFTVR3TXU2FHIAT42SBYWSPX:/var/lib/do'
7.3556623458862305
/usr/local/lib/python3.5/dist-packages/torch/autograd/_functions/tensor.py:447: UserWarning: mask is not broadcastable to self, but they have the same number of elements.  Falling back to deprecated pointwise behavior.
  return tensor.masked_fill_(mask, value)
9.192743301391602
[0] Mon Sep 17 11:00:53 2018, lr: 0.001, Loss: 27.25711441040039, Loss conf: 13.937179565429688, Loss loc: 5.17810583114624, Loss head conf: 11.115198135375977, Loss head loc: 5.168460369110107
Saving state, iter: 0
48.86750030517578
25.718935012817383
45.029014587402344
24.79298210144043
54.45554733276367
13.748291015625
128.50917053222656
20.55414581298828
198.4980010986328
20.904644012451172
157.7447052001953
18.93536376953125
115.79173278808594
22.67245101928711
170.63449096679688
39.02762222290039
353.6111145019531
77.53874969482422
140.64254760742188
68.49005126953125
[10] Mon Sep 17 11:01:11 2018, lr: 0.001, Loss: 14.15038776397705, Loss conf: 5.292577743530273, Loss loc: 4.439815521240234, Loss head conf: 4.067444324493408, Loss head loc: 4.768544673919678
67.7098159790039
53.532981872558594
271.6030578613281
72.82318878173828
576.6469116210938
132.18527221679688
1238.272705078125
243.9960479736328
1225.3575439453125
140.73471069335938
1127.5462646484375
43.21701431274414
7262.974609375
4437.2763671875
9383.4140625
7337.30712890625
20645.720703125
16939.15234375
22946.884765625
19082.61328125
[20] Mon Sep 17 11:01:28 2018, lr: 0.001, Loss: 93.56636810302734, Loss conf: 27.614913940429688, Loss loc: 6.290896892547607, Loss head conf: 110.0365219116211, Loss head loc: 9.284590721130371
19387.001953125
19364.396484375
19352.09375
20078.130859375
17651.787109375
18933.884765625
17581.36328125
18586.98046875
15218.2470703125
14269.748046875
12470.7529296875
10245.810546875
13091.4541015625
7110.84765625
13197.767578125
4382.61572265625
10534.1533203125
9500.7861328125
6798.1884765625
11889.32421875
[30] Mon Sep 17 11:01:45 2018, lr: 0.001, Loss: 16.44256591796875, Loss conf: 0.5192383527755737, Loss loc: 4.5175395011901855, Loss head conf: 18.143779754638672, Loss head loc: 4.667797088623047
10810.2294921875
26774.283203125
10667.640625
37191.828125
4724.7783203125
20590.34375
21514.845703125
5066.765625
53443.92578125
13976.7451171875
57733.58203125
13083.5732421875
81190.5703125
13683.9833984375
654372.5
51571.29296875
1796330.5
162691.5625
3683897.75
268608.125
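For what it is worth, a hedged sketch of the kind of guard people add when the printed x.max values diverge like this; it is a workaround, not the author's fix, and the root cause (for example a missing softmax or a diverging run) may still need addressing:

```python
import torch

def log_sum_exp(x, clamp_max=50.0):
    """Guarded variant of the helper in layers/box_utils.py. The max-subtracted
    form is numerically stable on its own, but it still returns inf once the raw
    logits themselves blow up, so the logits are clamped first.
    clamp_max=50.0 is an assumed bound, not a value from the repository."""
    x = torch.clamp(x, min=-clamp_max, max=clamp_max)
    x_max = x.max()
    return torch.log(torch.sum(torch.exp(x - x_max), 1, keepdim=True)) + x_max
```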

PyTorch 0.3: RuntimeError: value cannot be converted to type float without overflow: inf???

/home/lshi22/.local/lib/python3.5/site-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.23) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Initializing weights...
Loading Dataset...
Training SSD on WiderFace
/home/lshi22/.local/lib/python3.5/site-packages/torch/autograd/_functions/tensor.py:447: UserWarning: mask is not broadcastable to self, but they have the same number of elements. Falling back to deprecated pointwise behavior.
return tensor.masked_fill_(mask, value)
front and back Timer: 22.854674100875854 sec.
iter 0 || Loss: 71.4618 ||
Loss conf: 38.856040954589844 Loss loc: 13.214371681213379
Loss head conf: 28.07379722595215 Loss head loc: 10.708907127380371
lr: 0.001
Saving state, iter: 0
Traceback (most recent call last):
File "train.py", line 241, in
train()
File "train.py", line 188, in train
loss_l, loss_c = criterion(tuple(out[0:3]), targets)
File "/home/lshi22/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/home/lshi22/PyramidBox/layers/modules/multibox_loss.py", line 102, in forward
loss_c = log_sum_exp(batch_conf) - batch_conf.gather(1, conf_t.view(-1, 1))
File "/home/lshi22/PyramidBox/layers/box_utils.py", line 246, in log_sum_exp
return torch.log(torch.sum(torch.exp(x-x_max), 1, keepdim=True)) + x_max
RuntimeError: value cannot be converted to type float without overflow: inf

@Goingqs I think this is an error specific to PyTorch 0.3; how did you train your model with PyTorch 0.3?
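One hedged mitigation (an assumption about a workaround, not the author's setup) is to clip the gradient norm each iteration so the confidence logits cannot run off to inf under PyTorch 0.3:

```python
import torch.nn.utils as nn_utils

def training_step(net, optimizer, loss, max_norm=10.0):
    """Hypothetical wrapper around the backward pass in train.py: clip the
    gradient norm before the optimizer step so diverging losses do not push the
    logits to inf. max_norm=10.0 is an assumed value, not from the repository.
    On PyTorch >= 0.4 the helper is named clip_grad_norm_."""
    optimizer.zero_grad()
    loss.backward()
    nn_utils.clip_grad_norm(net.parameters(), max_norm)
    optimizer.step()
```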

So there are 4 invalid images in the train list?

@Goingqs I removed the following four images from the list:

/data/WIDER_train/images/0--Parade/0_Parade_Parade_0_452.jpg
/data/WIDER_train/images/2--Demonstration/2_Demonstration_Political_Rally_2_444.jpg
/data/WIDER_train/images/39--Ice_Skating/39_Ice_Skating_iceskiing_39_380.jpg
/data/WIDER_train/images/46--Jockey/46_Jockey_Jockey_46_576.jpg
num_invalid = 4

Is that correct? But I still cannot train your model.
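A small checker for finding such entries in an annotation list, assuming a simple `path x1 y1 x2 y2 ...` line format; data/widerface.py may parse a different layout, so treat this as a sketch:

```python
def find_invalid_entries(list_path):
    """Hypothetical checker: report list entries containing boxes with zero or
    negative width/height, which is what breaks the augmentation pipeline.
    Assumes one image per line: `path x1 y1 x2 y2 [x1 y1 x2 y2 ...]`."""
    invalid = []
    with open(list_path) as f:
        for line in f:
            parts = line.split()
            path, coords = parts[0], [float(v) for v in parts[1:]]
            boxes = [coords[i:i + 4] for i in range(0, len(coords), 4)]
            if any(x2 <= x1 or y2 <= y1 for x1, y1, x2, y2 in boxes):
                invalid.append(path)
    return invalid
```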

Trouble downloading pretrained model.

Dear Author,

I am having trouble downloading the pretrained model that is provided on Baidu.
Can you please release the same model on Google Drive?

Thank you.

Question about PyramidBox loss

Formula (5) in the original paper shows that we need to define new target box coordinates for the head; however, when I check your code, I cannot find this definition.
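For illustration only, one possible reading of that formula: the head target is the face ground truth scaled up about its center and encoded against the same prior. The factor of 2 and the centering are assumptions, not code from this repository:

```python
import torch

def encode(matched, priors, variances=(0.1, 0.2)):
    """Standard SSD box encoding (center/size offsets), repeated here so the
    sketch is self-contained; matched is point form, priors are center-size."""
    g_cxcy = (matched[:, :2] + matched[:, 2:]) / 2 - priors[:, :2]
    g_cxcy /= variances[0] * priors[:, 2:]
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = torch.log(g_wh) / variances[1]
    return torch.cat([g_cxcy, g_wh], 1)

def head_targets_from_face_gt(face_gt, priors, factor=2.0):
    """Hypothetical head regression target: expand each matched face box by
    `factor` about its center, then encode it against the same prior."""
    cx = (face_gt[:, 0] + face_gt[:, 2]) / 2
    cy = (face_gt[:, 1] + face_gt[:, 3]) / 2
    w = (face_gt[:, 2] - face_gt[:, 0]) * factor
    h = (face_gt[:, 3] - face_gt[:, 1]) * factor
    head_gt = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)
    return encode(head_gt, priors)
```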

Can't download weights

Hi,

First of all, thanks for the great code! Very helpful.

Unfortunately, I can't read Chinese, so I can't download your weights (I only get to some kind of downloader that I can't install, since I can't read anything). Could you maybe also upload the weights somewhere else, or send them to me via whatever tool? That would be great, thanks a lot!
(If you could send them to me, I could also upload them to AWS S3 and provide the link here.)

Thanks a lot!

Can you give a Dockerfile that can run your code perfectly?

I use PyTorch 0.3.1 but have met lots of weird bugs, like:
cuda Runtime Error (77): an illegal memory access was encountered
OverflowError: cannot convert float infinity to integer
RuntimeError: value cannot be converted to type float without overflow: inf

Input size support

Hello, I want to input images with a size larger than 640. Is that supported now? What should I do if I want to rewrite the code? Thank you.

Cannot use the model

Thanks for your code.
But it seems that the downloaded model does not load with PyTorch 0.4.1 (CPU-only):
"Error(s) in loading state_dict for PyramidBox: Unexpected key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight..."
I don't know how to fix it.
I hope you can give some advice.
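A hedged sketch of the usual workarounds when a GPU-trained checkpoint is loaded on a CPU-only PyTorch 0.4.1 install; whether a DataParallel 'module.' prefix mismatch is really the cause of the unexpected keys here is an assumption:

```python
import torch

def load_checkpoint_cpu(model, weights_path):
    """Load a checkpoint on a CPU-only machine, tolerating a DataParallel
    'module.' prefix mismatch and ignoring keys the model does not declare.
    strict=False also hides real architecture mismatches, so inspect what
    failed to load before trusting the result."""
    state = torch.load(weights_path, map_location='cpu')
    fixed = {k[len('module.'):] if k.startswith('module.') else k: v
             for k, v in state.items()}
    model.load_state_dict(fixed, strict=False)
    return model
```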

about priors/anchors

@Goingqs Hi,
In the code, two sets of priors are defined: one for faces and one for heads.
face_priors: 6 prediction branches correspond to 6 groups, with scales 16, 32, 64, 128, 256, 512.
head_priors: 5 prediction branches correspond to 5 groups, with scales 16, 32, 64, 128, 256.
For example:
For face_priors, the first anchor of the first branch, mapped back to the original image (size 640), has coordinates (2, 2, 16, 16).
For head_priors, the first anchor of the second branch, mapped back to the original image (size 640), has coordinates (4, 4, 16, 16).

What is the purpose of designing it this way?
My own understanding: on the first feature branch, the first face anchor has coordinates (2, 2, 16, 16); to represent the head anchor corresponding to this face anchor, the anchor on the second feature branch should be (2, 2, 32, 32). So why is it (4, 4, 16, 16)?
Or is my understanding itself wrong?

Sorry for the trouble, and thanks!
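To make the two coordinates above concrete, here is how an anchor's image-space position follows from a branch's stride and anchor size; which size the head priors actually use on each branch is something only the repository's PriorBox configuration can confirm:

```python
def first_anchor(step, min_size):
    """Center-size anchor (cx, cy, w, h) of the top-left cell of a branch with
    stride `step` and anchor size `min_size`, mapped back to the 640x640 input."""
    cx = 0.5 * step
    cy = 0.5 * step
    return (cx, cy, min_size, min_size)

# Branch 1 (stride 4) with size 16 gives (2.0, 2.0, 16, 16);
# branch 2 (stride 8) with size 16 gives (4.0, 4.0, 16, 16),
# matching the two coordinates quoted in the question above.
print(first_anchor(4, 16), first_anchor(8, 16))
```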

augmentations error

1) When I run augmentations.py on its own to augment data, it raises an error when some image is resized to 0.
2) When I change the resize from 640 to 1024, it also raises an error when some image is resized to 0.

WIDER FACE list

Thanks for your code, it is very good. Can you share your WIDER FACE list? When I use my own WIDER FACE list, some images turn out to have a height or width equal to zero.

shape mismatch

PyramidBox/layers/modules/multibox_loss.py", line 105, in forward
loss_c[pos] = 0 # filter out pos boxes for now
RuntimeError: The shape of the mask [4, 34125] at index 0 does not match the shape of the indexed tensor [136500, 1] at index 0

I set the batch size to 4 and I use my own dataset. When I run train.py I hit this problem. Why does it happen?

train error with CPU

Great work!
I used your code to train on the CPU and it failed; the error is as follows
(my setup: Python 2.7, OpenCV 3, PyTorch 0.4):

File "xxx/PyramidBox/layers/box_utils.py", line 179, in matchNoBipartite
best_truth_overlap = best_truth_overlap.cuda()
RuntimeError: torch.cuda.FloatTensor is not enabled.
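A hedged sketch of the usual device guard, assuming the hard-coded .cuda() in matchNoBipartite is the only blocker (there may be other .cuda() calls elsewhere in the code):

```python
import torch

def to_default_device(tensor):
    """Move a tensor to the GPU only when CUDA is actually available, so the
    matching code in layers/box_utils.py can also run on a CPU-only install."""
    return tensor.cuda() if torch.cuda.is_available() else tensor

# e.g. replace `best_truth_overlap = best_truth_overlap.cuda()` with:
# best_truth_overlap = to_default_device(best_truth_overlap)
```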

shape error

Loading Dataset...
Training SSD on WiderFace
Traceback (most recent call last):
File "train.py", line 241, in
train()
File "train.py", line 185, in train
out = net(images)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
return self.module(*inputs[0], **kwargs[0])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 206, in call
result = self.forward(*input, **kwargs)
File "/home/lei/PyramidBox/pyramid.py", line 241, in forward
prior_boxs.append(self.priorbox.forward(idx, f_layer.shape[3], f_layer.shape[2]))
File "/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py", line 63, in getattr
raise AttributeError(name)
AttributeError: shape
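The AttributeError comes from calling .shape on an autograd Variable under an old PyTorch build, where only .size() exists. A sketch of the substitution, assuming f_layer in pyramid.py holds the feature map:

```python
def spatial_size(feature_map):
    """Width and height of a feature map in a form that works on old autograd
    Variables (no .shape attribute) as well as newer Tensors."""
    return feature_map.size(3), feature_map.size(2)

# In pyramid.py's forward, the failing line would then become something like:
#   w, h = spatial_size(f_layer)
#   prior_boxs.append(self.priorbox.forward(idx, w, h))
```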
