Comments (13)
from face-detection-with-mobilenet-ssd.
@hongrui16 Maybe there is something wrong with your data loader. There is too little information for me to point out a specific problem.
@bruceyang2012
Thank you very much. Let me explain further.
I tried to train the model in Keras.
I converted face_train.ipynb to a .py file. I made some modifications, but they were minor, and after revising, the training code runs.
More importantly, I revised the image loading part, face_generator.py, so that it loads the Pascal VOC dataset directly.
But I am sure that "yield (np.array(batch_X), y_true)" is in the right format:
https://github.com/bruceyang2012/Face-detection-with-mobilenet-ssd/blob/master/face_generator.py#L1098
I think the key parts are the model structure, the loss calculation, the box generator, and the box encoder. I did not change any of them.
I also rewrote the mAP@0.5 calculation code in order to evaluate the model's performance.
Bruce, I want to confirm that the key parts listed above are correct.
After that, I can narrow down where the error is.
In short, generous Bruce, the key parts are correct, right?
@bruceyang2012
this is my email address: [email protected]
Could I add you on WeChat?
If possible, could you send your WeChat ID to my email address?
@hongrui16 I think you should check the parts you have changed. I haven't run any experiments on VOC or COCO. You could first run an experiment on a subset of the data and visualize your labels on the images to make sure that your input is correct. If you haven't changed it, the model part of the code should be fine.
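Alongside drawing the labels on a few images, a quick programmatic check can catch common label-format mistakes before training. The sketch below is an illustration only: the function name `find_bad_boxes` is hypothetical, and it assumes each label row is `[class_id, xmin, xmax, ymin, ymax]` in pixel coordinates with class id 0 reserved for background, which may not match your generator. The second and third example rows are made up to trigger the checks.

```python
def find_bad_boxes(labels, img_width, img_height, n_classes):
    """Return (index, reason) pairs for label rows that look malformed.

    Assumes each row is [class_id, xmin, xmax, ymin, ymax] in pixels and
    that class ids run from 1 to n_classes (0 is reserved for background).
    """
    problems = []
    for i, (class_id, xmin, xmax, ymin, ymax) in enumerate(labels):
        if not 1 <= class_id <= n_classes:
            problems.append((i, "class id out of range"))
        if xmin >= xmax or ymin >= ymax:
            problems.append((i, "min coordinate is not below max"))
        if xmin < 0 or ymin < 0 or xmax > img_width or ymax > img_height:
            problems.append((i, "box extends outside the image"))
    return problems


labels = [
    [1, 41, 231, 90, 190],   # plausible box
    [6, 282, 14, 15, 285],   # made-up row: xmin and xmax swapped
    [25, 10, 50, 10, 50],    # made-up row: class id larger than n_classes
]
bad = find_bad_boxes(labels, img_width=300, img_height=300, n_classes=20)
print(bad)
```

An empty result does not prove the labels are right (a box can be well-formed and still sit on the wrong object), so this complements visual inspection rather than replacing it.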
@bruceyang2012 Hello, generous Bruce, thank you very much.
I uncommented the last several lines and added some print statements to the generator function in face_generator.py, as follows
(to see the input format clearly, I set batch_size = 4):
if train:
    if diagnostics:
        yield (np.array(batch_X), y_true, batch_y, this_filenames, original_images, original_labels)
    else:
        #print "============================", np.ndim((np.array(batch_X), y_true))
        print "Batch Shape :", np.array(batch_X).shape
        print 'this_filenames:', this_filenames
        print "batch_y:", batch_y
        print 'y_true.shape:', y_true.shape
        yield (np.array(batch_X), y_true)
The following input information was printed:
Batch Shape : (4, 300, 300, 3)
this_filenames: ['VOC2012/JPEGImages/2011_001514.jpg', 'VOC2012/JPEGImages/2011_001146.jpg', 'VOC2012/JPEGImages/2009_005015.jpg', 'VOC2012/JPEGImages/2011_003545.jpg']
batch_y: [array([[ 1, 41, 231, 90, 190]]), array([[ 6, 14, 282, 15, 285],
[ 6, 277, 299, 106, 154]]), array([[ 4, 97, 263, 31, 156],
[ 4, 39, 57, 117, 151]]), array([[ 15, 0, 194, 34, 300]])]
y_true.shape: (4, 2268, 33)
The images have been resized to (300, 300).
Apart from the y_true values, I have checked the values and format of batch_y, and they are all correct.
So it seems that I have loaded the data correctly.
y_true is produced by the box encoder. You mentioned that this part is correct, and I did not change it, so we can assume that the y_true values are correct.
The following was the output during training:
Epoch 80/100
lr remains 0.000572594814003
321/321 [==============================] - 123s 384ms/step - loss: 0.0753 - acc: 0.3788 - val_loss: 0.1655 - val_acc: 0.3933
Epoch 00080: val_loss did not improve from 0.14949
Epoch 81/100
lr changed to 0.000526787228882
321/321 [==============================] - 127s 397ms/step - loss: 0.0735 - acc: 0.3794 - val_loss: 0.1681 - val_acc: 0.3966
Epoch 00081: val_loss did not improve from 0.14949
Epoch 82/100
lr remains 0.000526787247509
321/321 [==============================] - 128s 398ms/step - loss: 0.0718 - acc: 0.3785 - val_loss: 0.1646 - val_acc: 0.3924
Epoch 00082: val_loss did not improve from 0.14949
In the training settings:
monitor='val_loss', steps_per_epoch = max(1,int(n_train_samples//batch_size))
In fact, the model has been trained for more than 82 epochs, because I started from previously trained weights.
As we can see above, acc and val_acc are still very low and do not improve.
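The steps_per_epoch expression above is integer division with a floor of one step. A small sketch with made-up sample counts shows how it behaves; the helper name is hypothetical and the numbers are not taken from this training run.

```python
def steps_per_epoch(n_train_samples, batch_size):
    """Number of full batches per epoch, but never fewer than one."""
    return max(1, n_train_samples // batch_size)


print(steps_per_epoch(1000, 32))  # 31: the last 8 samples do not fill a batch
print(steps_per_epoch(10, 32))    # 1: the max(1, ...) floor applies
```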
In the test part, I set
NMS_CONF = 0.4
NMS_IOU = 0.6
The final result is as follows:
num_tp [ 0. 685. 260. 605. 253. 220. 289. 715. 1084. 531. 221. 178.
1035. 360. 331. 4981. 125. 479. 226. 422. 316.]
num_fp [ 0. 1125. 904. 577. 650. 315. 1088. 1348. 1615. 939. 397. 293.
1631. 853. 754. 5541. 266. 987. 1025. 684. 545.]
num_pos [ 0. 527. 456. 680. 539. 789. 377. 1282. 793. 1461. 398. 392.
911. 472. 473. 9375. 645. 515. 342. 393. 512.]
mAP@0.5: 0.3449509485065984
num_tp[i]: number of true positive boxes in classes[i]
num_fp[i]: number of false positive boxes in classes[i]
num_pos[i]: number of positive ground truth boxes in classes[i]
classes[0] stands for background
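As a rough cross-check of these counts, per-class precision and recall follow directly from the definitions above: precision = tp / (tp + fp) and recall = tp / num_pos. The sketch below uses only the first five entries of the posted arrays, and the function name is hypothetical. In the standard Pascal VOC protocol each ground-truth box can be matched by at most one detection, so tp should not exceed num_pos; a class where it does may indicate that duplicate detections are being counted as true positives in the evaluation code.

```python
def precision_recall(num_tp, num_fp, num_pos):
    """Per-class (precision, recall) from raw counts; None where undefined."""
    results = []
    for tp, fp, pos in zip(num_tp, num_fp, num_pos):
        precision = tp / (tp + fp) if tp + fp > 0 else None
        recall = tp / pos if pos > 0 else None
        results.append((precision, recall))
    return results


# First five entries of the arrays posted above (index 0 is background).
num_tp = [0.0, 685.0, 260.0, 605.0, 253.0]
num_fp = [0.0, 1125.0, 904.0, 577.0, 650.0]
num_pos = [0.0, 527.0, 456.0, 680.0, 539.0]

stats = precision_recall(num_tp, num_fp, num_pos)
# Classes whose recall exceeds 1, i.e. more true positives than ground-truth boxes.
suspicious = [i for i, (_, r) in enumerate(stats) if r is not None and r > 1.0]
print(suspicious)
```

Running the same check over all 21 entries would show whether other classes are affected as well.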
Q1: It seems that the training code works correctly, right? @bruceyang2012
We can also see from the above results that the mAP@0.5 is very low,
but another project, https://github.com/chuanqi305/MobileNet-SSD, reports a much better result:
VOC0712 mAP=0.727.
Q2: If the code has been implemented correctly, how can I improve my result? @bruceyang2012
Thank you very, very much.
@bruceyang2012 many thanks
@hongrui16 This code is only a demo for learning, and I did not run more experiments to evaluate it. If you want to use this code to get better results, you can refer to the settings used in other projects, such as data augmentation, anchor settings, learning rate, and so on. I don't have time to do these experiments now. You'll have to try them yourself.
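As one concrete example of the data augmentation mentioned here, a horizontal flip has to mirror the box labels together with the image. The sketch below is illustrative only: `hflip_labels` is a hypothetical helper, labels are assumed to be `[class_id, xmin, xmax, ymin, ymax]` rows with integer pixel indices, and x is mirrored as `img_width - 1 - x`, which is one common convention rather than something this repository is known to use.

```python
def hflip_labels(labels, img_width):
    """Mirror [class_id, xmin, xmax, ymin, ymax] rows about the vertical axis."""
    flipped = []
    for class_id, xmin, xmax, ymin, ymax in labels:
        # The old right edge becomes the new left edge and vice versa.
        new_xmin = img_width - 1 - xmax
        new_xmax = img_width - 1 - xmin
        flipped.append([class_id, new_xmin, new_xmax, ymin, ymax])
    return flipped


flipped = hflip_labels([[1, 41, 231, 90, 190]], img_width=300)
print(flipped)
```

The image itself would be mirrored the same way, for example with `image[:, ::-1]` for a NumPy array in height, width, channel order.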
@bruceyang2012 Okay, many thanks.
@bruceyang2012
Finally, I found where the problem was.
Indeed, there is an error in the box encoder code in 'ssd_box_encode_decode_utils.py'. Because the box output format is set to box_output_format = ['class_id', 'xmin', 'xmax', 'ymin', 'ymax'], the correct code should be as follows:
if self.limit_boxes:
    x_coords = boxes_tensor[:,:,:,[0, 1]]
    x_coords[x_coords >= self.img_width] = self.img_width - 1
    x_coords[x_coords < 0] = 0
    boxes_tensor[:,:,:,[0, 1]] = x_coords
    y_coords = boxes_tensor[:,:,:,[2, 3]]
    y_coords[y_coords >= self.img_height] = self.img_height - 1
    y_coords[y_coords < 0] = 0
    boxes_tensor[:,:,:,[2, 3]] = y_coords
if self.normalize_coords:
    boxes_tensor[:, :, :, :2] /= self.img_width
    boxes_tensor[:, :, :, 2:] /= self.img_height
After I made these modifications, it worked well.
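The clipping and normalization logic above can be tried in isolation. The following is a minimal standalone sketch, not the repository's code: `clip_and_normalize` is a hypothetical function, it assumes the last axis is ordered `[xmin, xmax, ymin, ymax]`, and it uses `np.clip` in place of the boolean-mask assignments. A non-square image size is used so that the x and y handling is visibly different.

```python
import numpy as np


def clip_and_normalize(boxes, img_width, img_height):
    """Clip boxes to the image and scale them to [0, 1).

    Assumes the last axis is [xmin, xmax, ymin, ymax] in pixels.
    """
    boxes = np.array(boxes, dtype=float)  # work on a float copy
    # Columns 0 and 1 are x coordinates, columns 2 and 3 are y coordinates.
    boxes[..., [0, 1]] = np.clip(boxes[..., [0, 1]], 0, img_width - 1)
    boxes[..., [2, 3]] = np.clip(boxes[..., [2, 3]], 0, img_height - 1)
    boxes[..., :2] /= img_width
    boxes[..., 2:] /= img_height
    return boxes


# One made-up box that sticks out of a 640x480 image on three sides.
out = clip_and_normalize([[-5.0, 650.0, 10.0, 500.0]], img_width=640, img_height=480)
print(out)
```

If the column indices did not match the coordinate order, x values would be clipped and scaled by the image height (or y values by the width), so the indices have to follow whatever box format the encoder is configured with.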
@hongrui16 You are right. WIDER FACE and VOC have different labeling formats. I forgot that.
@bruceyang2012 Got it. Thanks.