
deep-landmark's Introduction

deep-landmark

Predict facial landmarks with Deep CNNs powered by Caffe.

This project is a reimplementation of the paper Deep Convolutional Network Cascade for Facial Point Detection.

Data

All training data can be downloaded from here.

Download the images and extract them into dataset, keeping the train and test splits.

Modify level1.py, level2.py and level3.py under dataset to point them at your training data.

Train

./bootstrap.sh

This first generates the prototxt files for the Caffe models and converts the training data (images and landmarks) into HDF5 files. It then trains the level-1 CNNs and uses their output to generate the training data for level-2; level-2 and level-3 are trained the same way. A sketch of the HDF5 conversion step follows below.
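For reference, here is a minimal sketch of the HDF5 conversion step. The dataset names data and landmark and the 1x39x39 face shape follow the level-1 code quoted in the issues below; the file paths are illustrative, and Caffe's HDF5Data layer reads the .h5 paths from a plain text list file.

    import h5py
    import numpy as np

    def write_hdf5(faces, landmarks, h5_path, list_path):
        """Write face patches and landmarks to HDF5 plus the list file Caffe expects."""
        # faces: (N, 1, 39, 39) float32, landmarks: (N, 10) float32 in [0, 1]
        with h5py.File(h5_path, 'w') as f:
            f.create_dataset('data', data=np.asarray(faces, dtype=np.float32))
            f.create_dataset('landmark', data=np.asarray(landmarks, dtype=np.float32))
        # Caffe's HDF5Data layer takes a text file listing one .h5 path per line
        with open(list_path, 'w') as f:
            f.write(h5_path + '\n')

    # illustrative usage (paths are hypothetical):
    # write_hdf5(F_imgs, F_landmarks, 'train/1_F/train.h5', 'train/1_F/train.txt')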

I strongly suggest training every CNN separately. It is very important to watch the loss at the start of training to see whether it is stable; if it is not, stop the training and restart it.

View Training Logs

I have modified the Caffe source code to log the test loss at every test iteration, and view_loss.py plots it. All log files are under log, and the plots are saved there as well. If a loss plot looks unusual, retrain that CNN.

Caffe logs everything during network training; the log file ends up under /tmp unless you give Caffe a hint about where to save it. If you want to see the training loss curve, you need to parse the log file yourself, for example with a small script like the one sketched below.
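A minimal parsing sketch: the "Iteration N, loss = X" line is the format stock Caffe prints via glog, so adjust the regex if your build logs differently; the log/train1.log path follows the bootstrap script quoted later in the issues.

    import re
    import matplotlib.pyplot as plt

    def parse_train_loss(log_path):
        """Extract (iteration, training loss) pairs from a Caffe training log."""
        # typical Caffe/glog line: "... solver.cpp:228] Iteration 100, loss = 0.0123"
        pattern = re.compile(r'Iteration (\d+), loss = ([0-9.eE+-]+)')
        iters, losses = [], []
        with open(log_path) as f:
            for line in f:
                m = pattern.search(line)
                if m:
                    iters.append(int(m.group(1)))
                    losses.append(float(m.group(2)))
        return iters, losses

    iters, losses = parse_train_loss('log/train1.log')
    plt.plot(iters, losses)
    plt.xlabel('iteration')
    plt.ylabel('training loss')
    plt.show()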

Models

All model files are under model; you can modify the *.template files to change the Caffe network structure at every level.

Results

I have created a web page to test the project; all of its code is under webapp.

(figure: error of every landmark at Level-3)

(figure: some test results)

Video test: https://youtu.be/oNiAtu0erEk

References

  1. Caffe
  2. Deep Convolutional Network Cascade for Facial Point Detection

deep-landmark's People

Contributors

luoyetx


deep-landmark's Issues

The test with the landmark.py you provided is not accurate

I used landmark.py and the caffemodel you provided to test an image, and the result was not accurate.
I also looked at the classification.cpp example that ships with Caffe: the input test image should have the mean_file subtracted. Do you think that could be a factor in the inaccuracy?

Your paper and implementation are not the same?

The paper says that absolute value rectification (after tanh) and locally shared weights are effective for facial point detection, but I did not see them in this implementation. Could you give me some clues?

About test time

Thank you very much for sharing your code. Could you tell me the test time of each of the three levels, or at least the proportion of time spent in each level?
Thanks again!

A question about face detection in README

I'm curious about the face detector.
In the README, as shown below, there are several red boxes that indicate detected faces, but I could not figure out what algorithm was used to detect them. Did you manually select the bounding boxes, or did you use the authors' face detector (http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm#ref)? I cannot find any information about the authors' face detector and wonder whether you used your own detector or theirs directly.
If you know anything about the authors' face detector, could you point me to a related paper or other information?
Thanks.

(screenshot from the README showing the red face-detection boxes)

a question about hdf5

I am having a very strange problem with hdf5.
It returns "IOError: unable to create file (File accessibility: unable to open file)". Here is the offending line of code:
h5py.File('/path/to/file', 'w')
I am not trying to create a file that already exists. Do you have any idea about this problem?
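A common cause of that IOError is that the parent directory of the target path does not exist (or is not writable) when h5py tries to create the file. A minimal guard, sketched with a hypothetical path:

    import os
    import h5py

    h5_path = '/path/to/file.h5'  # hypothetical path, for illustration only

    # h5py cannot create a file in a directory that does not exist,
    # so make sure the parent directory is there first.
    parent = os.path.dirname(h5_path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)

    with h5py.File(h5_path, 'w') as f:
        f.create_dataset('data', data=[1, 2, 3])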

The test results of level2 and level3 are worse

Test Number: 3466
Time Consume: 24.520 s
FPS: 141.351
LEVEL - 1
Mean Error:
Left Eye = 0.022420
Right Eye = 0.023296
Nose = 0.030268
Left Mouth = 0.028703
Right Mouth = 0.028891
Failure:
Left Eye = 0.060300
Right Eye = 0.064628
Nose = 0.137046
Left Mouth = 0.110502
Right Mouth = 0.122331
################## Summary #####################
Test Number: 3466
Time Consume: 46.793 s
FPS: 74.071
LEVEL - 2
Mean Error:
Left Eye = 0.090333
Right Eye = 0.015689
Nose = 0.018732
Left Mouth = 0.020073
Right Mouth = 0.091233
Failure:
Left Eye = 0.983266
Right Eye = 0.031448
Nose = 0.040969
Left Mouth = 0.055972
Right Mouth = 0.973745
################## Summary #####################
Test Number: 3466
Time Consume: 71.581 s
FPS: 48.421
LEVEL - 3
Mean Error:
Left Eye = 0.089275
Right Eye = 0.014808
Nose = 0.061951
Left Mouth = 0.063852
Right Mouth = 0.089410
Failure:
Left Eye = 0.980958
Right Eye = 0.031160
Nose = 0.804385
Left Mouth = 0.866128
Right Mouth = 0.970571

Where am I going wrong? Thanks!

Hard to repeat the training loss or accuracy as your trained model

Hi, thanks first of all for sharing the code. I tried to fine-tune the 1_F model on some new data (20,000 pictures). I got a test output error of 0.006 during training, but a test error of about 0.03-0.04 on the five keypoints, which is roughly 0.01 higher than with your model. Moreover, as mentioned in other issues, I get a different training loss every time I start a new training run. Can you share some experience on how to train the model? Can I try repeated fine-tuning, i.e. use the model produced by the last fine-tuning round as the starting point for a new one? Of course the newly trained model should have a lower training loss than the previous one.

How to check the hdf5 data?

After training with bootstrap.sh, I get wrong results
(two attached screenshots of the incorrect results).

I opened the 1_F/train.hd5 file with HDFView. What do data and landmark mean?
(1) landmark, first six rows:
0.27142859 0.27142859 0.6571429 0.27142859 0.3 0.6142857 0.31428573 0.78571427 0.62857145 0.78571427
0.32422328 0.1958884 0.8004285 0.24294785 0.49501193 0.4644839 0.2713394 0.6769809 0.68675554 0.74030954
0.21656966 0.22970417 0.7015257 0.23771922 0.58244133 0.60628456 0.28283 0.8140927 0.68875426 0.7836232
0.2470238 0.22058824 0.6875 0.24411765 0.39583334 0.5029412 0.24107143 0.7617647 0.6279762 0.7852941
0.26837537 0.24787492 0.765999 0.2968669 0.5151268 0.6077065 0.19847062 0.6727469 0.6902065 0.7266793
0.4111675 0.30964467 0.7817259 0.20812182 0.75126904 0.6040609 0.5329949 0.8071066 0.78680205 0.7360406
(2) data, first six rows:
1.5018264
-1.6148947
-1.7687876
-1.742761
-2.0221322
-1.746762
"data" shows only one column and "landmark" has 10 columns. Is that right?
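If you want to check the file outside HDFView, a minimal sketch with h5py is below. The single column you see for data is most likely just how the viewer slices a 4-D dataset; based on the level-1 code quoted elsewhere in these issues, data should have shape (N, 1, 39, 39) and landmark shape (N, 10).

    import h5py

    # path taken from the question above; adjust to your layout
    with h5py.File('1_F/train.hd5', 'r') as f:
        print(f['data'].shape)      # expected (N, 1, 39, 39): N normalized grayscale face patches
        print(f['landmark'].shape)  # expected (N, 10): five (x, y) landmarks in [0, 1]
        print(f['data'][0].min(), f['data'][0].max())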

[libprotobuf ERROR google/protobuf/text_format.cc:245]

who can help me? Thanks!

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.SolverParameter: 25:9: Message type "caffe.SolverParameter" has no field named "log_file".
F0405 10:16:20.896741 2613 upgrade_proto.cpp:1101] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse SolverParameter file: prototxt/3_LE1_solver.prototxt
*** Check failure stack trace: ***
@ 0x7f597c55bdbd google::LogMessage::Fail()
@ 0x7f597c55dc5d google::LogMessage::SendToLog()
@ 0x7f597c55b9ac google::LogMessage::Flush()
@ 0x7f597c55e57e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f597cc8f9be caffe::ReadSolverParamsFromTextFileOrDie()
@ 0x407ce4 train()
@ 0x40590c main
@ 0x7f597b569f45 (unknown)
@ 0x40617b (unknown)
Aborted (core dumped)
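The error itself says what is wrong: the generated solver prototxt contains a log_file field that a stock Caffe build's SolverParameter does not define (the README mentions the author modified Caffe to add extra test-loss logging). Two options: build the author's modified Caffe, or strip the field from the generated solver files. A hedged sketch of the latter, assuming the solvers live under prototxt/ as in the error message:

    import glob

    # Remove the unsupported "log_file" line so an unmodified Caffe can parse the solvers.
    for path in glob.glob('prototxt/*_solver.prototxt'):
        with open(path) as f:
            lines = f.readlines()
        kept = [line for line in lines if not line.strip().startswith('log_file')]
        if len(kept) != len(lines):
            with open(path, 'w') as f:
                f.writelines(kept)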

How to add one more point

Hi,
I have tried your code. It works well.
I am thinking about adding more points to the model. I am trying to add one point on the nose bridge (the center of the two eyes), so there are now 6 points. I modified the h5 data and the num_output of layer "fc2" in the train and deploy prototxt.
But while training 1_F, the training loss stays around 0.03 and the test error is about 0.16, and the other points also end up in incorrect positions when I run the test.
Do you have any suggestions about what else I should modify if I want to add points?

Thank you.
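Not the author's answer, but a minimal sketch of how the extra point could be appended to the landmark arrays before the HDF5 files are written. It assumes landmarks are stored as (N, 10) arrays of five normalized (x, y) pairs in the order left eye, right eye, nose, left mouth, right mouth (the order used in the test summaries above); fc2's num_output then has to match the new length of 12 in both the train and deploy prototxt.

    import numpy as np

    def add_eye_center(landmarks):
        """Append a sixth point (midpoint of the two eyes) to an (N, 10) landmark array."""
        pts = landmarks.reshape(-1, 5, 2)
        eye_center = (pts[:, 0, :] + pts[:, 1, :]) / 2.0   # midpoint of left and right eye
        return np.hstack([landmarks, eye_center])           # result has shape (N, 12)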

not changed NM1 landmark accordingly, it is why 1_F is better?

Hi,
I really appreciate your great work.

Here come my two questions:

code from dataset/level1.py

1. nm_face = cv2.resize(nm_face, (31, 39)).reshape((1, 31, 39))
nm_landmark = landmarkGt[2:, :].reshape((6))
NM_imgs.append(nm_face)

This code seems to keep the landmarks the same for F_face and nm_face except for the input dimensions. But I think the y coordinates of nm_landmark should be (landmarkGt[2:, :] - 8) / 31, because in landmarkGt point_y = y / 39, while point_y_new = (y - 8) / 31, where y is the absolute location in the 39x39 box.
Isn't that why 1_F is better than 1_F + 1_NM + 1_EN?

2. f_bbox = bbox.subBBox(-0.05, 1.05, -0.05, 1.05)
f_face = img[f_bbox.top:f_bbox.bottom+1, f_bbox.left:f_bbox.right+1]

This code changes the bbox to f_bbox but does not change the landmarks accordingly. Isn't that a mistake, or is it just to add noise?

Looking forward to your reply, thanks.

F1 network is not the same as in the original paper

Thanks for sharing your code; I have two questions.
1. The F1 network is not the same as in the original paper; did you modify it?
2. In the paper, the first conv layer of the F1 network is CR(4, 20, 2, 2), where the 2s are the weight-sharing parameters: each conv map is evenly divided into 2 by 2 regions. I don't see where those 2 by 2 regions are reflected in this F1 network.

About how you preprocess the image

I saw that you preprocess each input image by subtracting its own mean intensity, rather than subtracting the mean over the whole training set as people usually do. What is the difference? Is there any reason for it?
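For context, a minimal sketch of the per-image preprocessing being described: each face patch is standardized by its own statistics (subtracting the per-image mean and, commonly, also dividing by the per-image standard deviation), which would also explain why the sample values in the HDF5 question above are centered around zero. This is the questioner's reading of the preprocessing, not necessarily the exact project code.

    import numpy as np

    def process_image(img):
        """Standardize one grayscale face patch by its own mean and standard deviation."""
        img = img.astype(np.float32)
        return (img - img.mean()) / (img.std() + 1e-8)  # epsilon guards against a constant patch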

About the modified caffe

You said that you have modified caffe source code so it would log every test iter, would you mind sharing your code? Thanks.

compilation issue because of looped dependencies

I am getting this error when I run ./dataset/level1.py:
Traceback (most recent call last):
File "./dataset/level1.py", line 18, in
from common import shuffle_in_unison_scary, logger, createDir, processImage
ImportError: cannot import name shuffle_in_unison_scary

how can I overcome this error?

thanks!

training 1_F network

Hi,
I downloaded the recommended data and trained the 1_F network for 1 million iterations, and got the attached training-error plot.
Those errors are consistent between the training data and the test data.
On a test image it is clear that the nose is placed on the bounding box along the x axis (see the attached test image).
I tried to retrain and then got the same kind of error for the left eye (second attached test image).
Every time I retrain I get a different defective point, which made me think it is a random phenomenon related to the data generation, but I was unable to find a bug in the augmentation.
When I run the pre-trained 1_F model you provided, I get results as good as the ones you present.
I'm running Ubuntu 14.04 with Python 2.7.

Let me know what output would be useful for debugging this.

Thanks

2 questions about data_preprocess

1. Here you use subBBox to expand the bounding box (not resize it), but without changing the relative coordinates. Could you explain that?

f_bbox = bbox.subBBox(-0.05, 1.05, -0.05, 1.05)
f_face = img[f_bbox.top:f_bbox.bottom+1,f_bbox.left:f_bbox.right+1]

2. Here you set the bbox's width and height to bbox[1] - bbox[0] and bbox[3] - bbox[2] respectively, which means the last row and column of pixels are not included. But in the other image-cropping code, they are included.

class BBox(object):
    """
    Bounding Box of face
    """
    def __init__(self, bbox):
        self.left = bbox[0]
        self.right = bbox[1]
        self.top = bbox[2]
        self.bottom = bbox[3]
        self.x = bbox[0]
        self.y = bbox[2]
        self.w = bbox[1] - bbox[0]
        self.h = bbox[3] - bbox[2]

patch = img[patch_top: patch_bottom+1, patch_left: patch_right+1]

f_face = img[f_bbox.top:f_bbox.bottom+1,f_bbox.left:f_bbox.right+1]

Could you help me understand these two points? Thanks.

flip() function

where is the flip() function defined?
It is giving me the error:
"
face_flipped, landmark_flipped = flip(f_face, landmarkGt)
NameError: global name 'flip' is not defined
"
am I missing something?

thank you!
saikrishna
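flip() is called by the dataset scripts, so the NameError usually means the helper was not imported or defined in the script being run. For illustration only, here is a minimal sketch of what a horizontal-flip helper for this data typically does, assuming landmarks are a (5, 2) array of normalized (x, y) points in the order left eye, right eye, nose, left mouth, right mouth:

    import numpy as np

    def flip(face, landmark):
        """Horizontally flip a face patch and mirror its landmarks (illustrative sketch)."""
        face_flipped = np.asarray(face)[:, ::-1]   # mirror the 2-D grayscale patch left-right
        lm = np.asarray(landmark, dtype=np.float32).copy()
        lm[:, 0] = 1.0 - lm[:, 0]                  # mirror normalized x coordinates
        lm[[0, 1]] = lm[[1, 0]]                    # swap left/right eyes
        lm[[3, 4]] = lm[[4, 3]]                    # swap left/right mouth corners
        return face_flipped.copy(), lm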

sh: 1: caffe: not found

I met one problem when I run:

python2.7 prototxt/generate.py CPU

# level-1
python2.7 dataset/level1.py
rm -rf log/train1.log
echo "Train LEVEL-1"
python2.7 train/level.py 1 pool_on

echo "=.="

It gives me an error like "sh: 1: caffe: not found",
but when I use Python, I can import caffe successfully.
Could you tell me what the problem is?

Why is np.random.rand() > 0.5 used twice in level1's generate_hdf5? Thanks.

        ### rotation by +5 degrees (applied with probability 0.5)
        if np.random.rand() > 0.5:
            face_rotated_by_alpha, landmark_rotated = rotate(img, f_bbox, \
                bbox.reprojectLandmark(landmarkGt), 5)
            landmark_rotated = bbox.projectLandmark(landmark_rotated)
            face_rotated_by_alpha = cv2.resize(face_rotated_by_alpha, (39, 39))
            F_imgs.append(face_rotated_by_alpha.reshape((1, 39, 39)))
            F_landmarks.append(landmark_rotated.reshape(10))
            ### flip with rotation
            face_flipped, landmark_flipped = flip(face_rotated_by_alpha, landmark_rotated)
            face_flipped = cv2.resize(face_flipped, (39, 39))
            F_imgs.append(face_flipped.reshape((1, 39, 39)))
            F_landmarks.append(landmark_flipped.reshape(10))
        ### rotation by -5 degrees (a second, independent draw, also probability 0.5)
        if np.random.rand() > 0.5:
            face_rotated_by_alpha, landmark_rotated = rotate(img, f_bbox, \
                bbox.reprojectLandmark(landmarkGt), -5)
            landmark_rotated = bbox.projectLandmark(landmark_rotated)
            face_rotated_by_alpha = cv2.resize(face_rotated_by_alpha, (39, 39))
            F_imgs.append(face_rotated_by_alpha.reshape((1, 39, 39)))
            F_landmarks.append(landmark_rotated.reshape(10))
            ### flip with rotation
            face_flipped, landmark_flipped = flip(face_rotated_by_alpha, landmark_rotated)
            face_flipped = cv2.resize(face_flipped, (39, 39))
            F_imgs.append(face_flipped.reshape((1, 39, 39)))
            F_landmarks.append(landmark_flipped.reshape(10))

pre-trained models?

I couldn't find any pretrained models under the model folder. Can you make them available for landmark detection? The original authors only provide a Windows version, so a Linux implementation would be very much appreciated.

Thanks!

why use the flatten layer and concat layer in level 1?

First, thank you for sharing your code. I ran it and got good performance, but I found that it does not follow the paper strictly: in level 1 you use a flatten layer and a concat layer. Why? Could you tell me the reason or point me to a reference? I am totally new to this field. Thank you very much.

Why do some landmarks fail?

I ran your code, and only the nose and right mouth get the right result; the failure rate of the other landmarks is 99%. It is weird.

Hard to repeat level3's mean error at some points

Hi, thanks for sharing your code! I have run it several times using bootstrap.sh. The results are very close to yours at level 1 and level 2, but the mean error at level 3 rises randomly for some points (e.g. Nose 0.121428, Left Mouth 0.062793 at level 3).
Could you give me some suggestions to solve this problem? Thanks a lot!

Training and testing procedures are not the same

I have a question. During training, the results of the previous level are not used to train the next level. In the test phase, however, the previous level's results are used as the reference face position for the next level. Is my understanding right?

How to set the host value when testing?

I am new to Python, and I want to use the web app to test images.
In app.py:

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Host': 'image.baidu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.65 Safari/537.36',
}

I don't know what that means.
How should I set the Host value when testing? I use Chrome.
Thanks.
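For context, those are HTTP request headers the app presumably sends when it downloads a test image; the Host value names the server being requested (image.baidu.com here), not your own machine or browser, so there is normally nothing to change for local testing. A hypothetical sketch of how such headers are used (the URL is illustrative, and the actual fetch logic in app.py may differ):

    import requests

    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Host': 'image.baidu.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/39.0.2171.65 Safari/537.36',
    }

    # hypothetical image URL, for illustration only
    resp = requests.get('http://image.baidu.com/some/test/image.jpg', headers=headers)
    with open('test.jpg', 'wb') as out:
        out.write(resp.content)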
