
deep-landmark's Introduction

deep-landmark

Predict facial landmarks with Deep CNNs powered by Caffe.

This project is a reimplementation of the paper Deep Convolutional Network Cascade for Facial Point Detection.

Data

All training data can be downloaded from here.

Download the images and extract them into dataset, keeping the train and test splits.

Modify level1.py, level2.py and level3.py under dataset to point them at your training data.

Train

./bootstrap.sh

This first generates the prototxt files for the Caffe models and converts the training data (images and landmarks) into HDF5 files. It then trains the level-1 CNNs and uses their output to generate the training data for level-2; level-2 and level-3 are trained the same way. A sketch of the HDF5 conversion step follows below.
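For reference, here is a minimal sketch of the HDF5 conversion step. The dataset names data and landmark and the 1x39x39 face shape follow the level-1 code quoted in the issues below; the file paths are illustrative, and Caffe's HDF5Data layer reads the .h5 paths from a plain text list file.

    import h5py
    import numpy as np

    def write_hdf5(faces, landmarks, h5_path, list_path):
        """Write face patches and landmarks to HDF5 plus the list file Caffe expects."""
        # faces: (N, 1, 39, 39) float32, landmarks: (N, 10) float32 in [0, 1]
        with h5py.File(h5_path, 'w') as f:
            f.create_dataset('data', data=np.asarray(faces, dtype=np.float32))
            f.create_dataset('landmark', data=np.asarray(landmarks, dtype=np.float32))
        # Caffe's HDF5Data layer takes a text file listing one .h5 path per line
        with open(list_path, 'w') as f:
            f.write(h5_path + '\n')

    # illustrative usage (paths are hypothetical):
    # write_hdf5(F_imgs, F_landmarks, 'train/1_F/train.h5', 'train/1_F/train.txt')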

I strongly suggest training every CNN separately. It is very important to watch the loss at the start of training to see whether it is stable; if it is not, stop the training and restart it.

View Training Logs

I have modified the Caffe source code to log the test loss at every test iteration, and view_loss.py plots it. All log files are under log, and the plots are saved there as well. If a loss plot looks unusual, retrain that CNN.

Caffe logs everything during network training; the log file ends up under /tmp unless you give Caffe a hint about where to save it. If you want to see the training loss curve, you need to parse the log file yourself, for example with a small script like the one sketched below.
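A minimal parsing sketch: the "Iteration N, loss = X" line is the format stock Caffe prints via glog, so adjust the regex if your build logs differently; the log/train1.log path follows the bootstrap script quoted later in the issues.

    import re
    import matplotlib.pyplot as plt

    def parse_train_loss(log_path):
        """Extract (iteration, training loss) pairs from a Caffe training log."""
        # typical Caffe/glog line: "... solver.cpp:228] Iteration 100, loss = 0.0123"
        pattern = re.compile(r'Iteration (\d+), loss = ([0-9.eE+-]+)')
        iters, losses = [], []
        with open(log_path) as f:
            for line in f:
                m = pattern.search(line)
                if m:
                    iters.append(int(m.group(1)))
                    losses.append(float(m.group(2)))
        return iters, losses

    iters, losses = parse_train_loss('log/train1.log')
    plt.plot(iters, losses)
    plt.xlabel('iteration')
    plt.ylabel('training loss')
    plt.show()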

Models

All model files are under model; you can modify the *.template files to change the Caffe network structure at every level.

Results

I have created a web page to test the project; all of its code is under webapp.

(figure: error of every landmark at Level-3)

(figure: some test results)

Video test: https://youtu.be/oNiAtu0erEk

References

  1. Caffe
  2. Deep Convolutional Network Cascade for Facial Point Detection

deep-landmark's People

Contributors

luoyetx


deep-landmark's Issues

The test with the landmark.py you provided is not accurate

I used landmark.py and the caffemodel you provided to test an image, and the result was not accurate.
I also looked at the classification.cpp example that ships with Caffe: the input test image should have the mean_file subtracted. Do you think that could be a factor in the inaccuracy?

Your paper and implementation are not the same?

The paper says that absolute value rectification (after tanh) and locally shared weights are effective for facial point detection, but I did not see them in this implementation. Could you give me some clues?

About test time

Thank you very much for sharing your code. Could you tell me the test time of each of the three levels, or at least the proportion of time spent in each level?
Thanks again!

A question about face detection in README

I'm curious about the face detector.
In the README, as shown below, there are several red boxes that indicate detected faces, but I could not figure out what algorithm was used to detect them. Did you manually select the bounding boxes, or did you use the authors' face detector (http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm#ref)? I cannot find any information about the authors' face detector and wonder whether you used your own detector or theirs directly.
If you know anything about the authors' face detector, could you point me to a related paper or other information?
Thanks.

(screenshot from the README showing the red face-detection boxes)

a question about hdf5

I am having a very strange problem with hdf5.
It returns "IOError: unable to create file (File accessibility: unable to open file)". Here is the offending line of code:
h5py.File('/path/to/file', 'w')
I am not trying to create a file that already exists. Do you have any idea about this problem?
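A common cause of that IOError is that the parent directory of the target path does not exist (or is not writable) when h5py tries to create the file. A minimal guard, sketched with a hypothetical path:

    import os
    import h5py

    h5_path = '/path/to/file.h5'  # hypothetical path, for illustration only

    # h5py cannot create a file in a directory that does not exist,
    # so make sure the parent directory is there first.
    parent = os.path.dirname(h5_path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)

    with h5py.File(h5_path, 'w') as f:
        f.create_dataset('data', data=[1, 2, 3])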

The test results of level2 and level3 are worse

Test Number: 3466
Time Consume: 24.520 s
FPS: 141.351
LEVEL - 1
Mean Error:
Left Eye = 0.022420
Right Eye = 0.023296
Nose = 0.030268
Left Mouth = 0.028703
Right Mouth = 0.028891
Failure:
Left Eye = 0.060300
Right Eye = 0.064628
Nose = 0.137046
Left Mouth = 0.110502
Right Mouth = 0.122331
################## Summary #####################
Test Number: 3466
Time Consume: 46.793 s
FPS: 74.071
LEVEL - 2
Mean Error:
Left Eye = 0.090333
Right Eye = 0.015689
Nose = 0.018732
Left Mouth = 0.020073
Right Mouth = 0.091233
Failure:
Left Eye = 0.983266
Right Eye = 0.031448
Nose = 0.040969
Left Mouth = 0.055972
Right Mouth = 0.973745
################## Summary #####################
Test Number: 3466
Time Consume: 71.581 s
FPS: 48.421
LEVEL - 3
Mean Error:
Left Eye = 0.089275
Right Eye = 0.014808
Nose = 0.061951
Left Mouth = 0.063852
Right Mouth = 0.089410
Failure:
Left Eye = 0.980958
Right Eye = 0.031160
Nose = 0.804385
Left Mouth = 0.866128
Right Mouth = 0.970571

Where am I going wrong? Thanks!

Hard to repeat the training loss or accuracy as your trained model

Hi, thanks first of all for sharing the code. I tried to fine-tune the 1_F model on some new data (20,000 pictures). I got a test output error of 0.006 during training, but a test error of about 0.03-0.04 on the five keypoints, which is roughly 0.01 higher than with your model. Moreover, as mentioned in other issues, I get a different training loss every time I start a new training run. Can you share some experience on how to train the model? Can I try repeated fine-tuning, i.e. use the model produced by the last fine-tuning round as the starting point for a new one? Of course the newly trained model should have a lower training loss than the previous one.

How to check the hdf5 data?

After training with bootstrap.sh, I get wrong results
(two attached screenshots of the incorrect results).

I opened the 1_F/train.hd5 file with HDFView. What do data and landmark mean?
(1) landmark, first six rows:
0.27142859 0.27142859 0.6571429 0.27142859 0.3 0.6142857 0.31428573 0.78571427 0.62857145 0.78571427
0.32422328 0.1958884 0.8004285 0.24294785 0.49501193 0.4644839 0.2713394 0.6769809 0.68675554 0.74030954
0.21656966 0.22970417 0.7015257 0.23771922 0.58244133 0.60628456 0.28283 0.8140927 0.68875426 0.7836232
0.2470238 0.22058824 0.6875 0.24411765 0.39583334 0.5029412 0.24107143 0.7617647 0.6279762 0.7852941
0.26837537 0.24787492 0.765999 0.2968669 0.5151268 0.6077065 0.19847062 0.6727469 0.6902065 0.7266793
0.4111675 0.30964467 0.7817259 0.20812182 0.75126904 0.6040609 0.5329949 0.8071066 0.78680205 0.7360406
(2) data, first six rows:
1.5018264
-1.6148947
-1.7687876
-1.742761
-2.0221322
-1.746762
"data" shows only one column and "landmark" has 10 columns. Is that right?
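If you want to check the file outside HDFView, a minimal sketch with h5py is below. The single column you see for data is most likely just how the viewer slices a 4-D dataset; based on the level-1 code quoted elsewhere in these issues, data should have shape (N, 1, 39, 39) and landmark shape (N, 10).

    import h5py

    # path taken from the question above; adjust to your layout
    with h5py.File('1_F/train.hd5', 'r') as f:
        print(f['data'].shape)      # expected (N, 1, 39, 39): N normalized grayscale face patches
        print(f['landmark'].shape)  # expected (N, 10): five (x, y) landmarks in [0, 1]
        print(f['data'][0].min(), f['data'][0].max())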

[libprotobuf ERROR google/protobuf/text_format.cc:245]

who can help me? Thanks!

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.SolverParameter: 25:9: Message type "caffe.SolverParameter" has no field named "log_file".
F0405 10:16:20.896741 2613 upgrade_proto.cpp:1101] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse SolverParameter file: prototxt/3_LE1_solver.prototxt
*** Check failure stack trace: ***
@ 0x7f597c55bdbd google::LogMessage::Fail()
@ 0x7f597c55dc5d google::LogMessage::SendToLog()
@ 0x7f597c55b9ac google::LogMessage::Flush()
@ 0x7f597c55e57e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f597cc8f9be caffe::ReadSolverParamsFromTextFileOrDie()
@ 0x407ce4 train()
@ 0x40590c main
@ 0x7f597b569f45 (unknown)
@ 0x40617b (unknown)
Aborted (core dumped)
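The error itself says what is wrong: the generated solver prototxt contains a log_file field that a stock Caffe build's SolverParameter does not define (the README mentions the author modified Caffe to add extra test-loss logging). Two options: build the author's modified Caffe, or strip the field from the generated solver files. A hedged sketch of the latter, assuming the solvers live under prototxt/ as in the error message:

    import glob

    # Remove the unsupported "log_file" line so an unmodified Caffe can parse the solvers.
    for path in glob.glob('prototxt/*_solver.prototxt'):
        with open(path) as f:
            lines = f.readlines()
        kept = [line for line in lines if not line.strip().startswith('log_file')]
        if len(kept) != len(lines):
            with open(path, 'w') as f:
                f.writelines(kept)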

How to add one more point

Hi,
I have tried your code. It works well.
I am thinking about adding more points to the model. I am trying to add one point on the nose bridge (the center of the two eyes), so there are now 6 points. I modified the h5 data and the num_output of layer "fc2" in the train and deploy prototxt.
But while training 1_F, the training loss stays around 0.03 and the test error is about 0.16, and the other points also end up in incorrect positions when I run the test.
Do you have any suggestions about what else I should modify if I want to add points?

Thank you.
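Not the author's answer, but a minimal sketch of how the extra point could be appended to the landmark arrays before the HDF5 files are written. It assumes landmarks are stored as (N, 10) arrays of five normalized (x, y) pairs in the order left eye, right eye, nose, left mouth, right mouth (the order used in the test summaries above); fc2's num_output then has to match the new length of 12 in both the train and deploy prototxt.

    import numpy as np

    def add_eye_center(landmarks):
        """Append a sixth point (midpoint of the two eyes) to an (N, 10) landmark array."""
        pts = landmarks.reshape(-1, 5, 2)
        eye_center = (pts[:, 0, :] + pts[:, 1, :]) / 2.0   # midpoint of left and right eye
        return np.hstack([landmarks, eye_center])           # result has shape (N, 12)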

not changed NM1 landmark accordingly, it is why 1_F is better?

Hi,
I really appreciate your great work.

Here come my two questions:

code from dataset/level1.py

1. nm_face = cv2.resize(nm_face, (31, 39)).reshape((1, 31, 39))
nm_landmark = landmarkGt[2:, :].reshape((6))
NM_imgs.append(nm_face)

This code seems to keep the landmarks the same for F_face and nm_face except for the input dimensions. But I think the y coordinates of nm_landmark should be (landmarkGt[2:, :] - 8) / 31, because in landmarkGt point_y = y / 39, while point_y_new = (y - 8) / 31, where y is the absolute location in the 39x39 box.
Isn't that why 1_F is better than 1_F + 1_NM + 1_EN?

2. f_bbox = bbox.subBBox(-0.05, 1.05, -0.05, 1.05)
f_face = img[f_bbox.top:f_bbox.bottom+1, f_bbox.left:f_bbox.right+1]

This code changes the bbox to f_bbox but does not change the landmarks accordingly. Isn't that a mistake, or is it just to add noise?

Looking forward to your reply, thanks.

F1 network is not the same as in the original paper

Thanks for sharing your code; I have two questions.
1. The F1 network is not the same as in the original paper; did you modify it?
2. In the paper, the first conv layer of the F1 network is CR(4, 20, 2, 2), where the 2s are the weight-sharing parameters: each conv map is evenly divided into 2 by 2 regions. I don't see where those 2 by 2 regions are reflected in this F1 network.

About how you preprocess the image

I saw that you preprocess each input image by subtracting its own mean intensity, rather than subtracting the mean over the whole training set as people usually do. What is the difference? Is there any reason for it?
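For context, a minimal sketch of the per-image preprocessing being described: each face patch is standardized by its own statistics (subtracting the per-image mean and, commonly, also dividing by the per-image standard deviation), which would also explain why the sample values in the HDF5 question above are centered around zero. This is the questioner's reading of the preprocessing, not necessarily the exact project code.

    import numpy as np

    def process_image(img):
        """Standardize one grayscale face patch by its own mean and standard deviation."""
        img = img.astype(np.float32)
        return (img - img.mean()) / (img.std() + 1e-8)  # epsilon guards against a constant patch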

About the modified caffe

You said that you have modified caffe source code so it would log every test iter, would you mind sharing your code? Thanks.

compilation issue because of looped dependencies

I am getting this error when I run ./dataset/level1.py:
Traceback (most recent call last):
File "./dataset/level1.py", line 18, in
from common import shuffle_in_unison_scary, logger, createDir, processImage
ImportError: cannot import name shuffle_in_unison_scary

how can I overcome this error?

thanks!

training 1_F network

Hi,
I downloaded the recommended data and trained the 1_F network for 1 million iterations, and got the attached training-error plot.
Those errors are consistent between the training data and the test data.
On a test image it is clear that the nose is placed on the bounding box along the x axis (see the attached test image).
I tried to retrain and then got the same kind of error for the left eye (second attached test image).
Every time I retrain I get a different defective point, which made me think it is a random phenomenon related to the data generation, but I was unable to find a bug in the augmentation.
When I run the pre-trained 1_F model you provided, I get results as good as the ones you present.
I'm running Ubuntu 14.04 with Python 2.7.

Let me know what output would be useful for debugging this.

Thanks

2 questions about data_preprocess

1. Here you use subBBox to expand the bounding box (not resize it), but without changing the relative coordinates. Could you explain that?

f_bbox = bbox.subBBox(-0.05, 1.05, -0.05, 1.05)
f_face = img[f_bbox.top:f_bbox.bottom+1,f_bbox.left:f_bbox.right+1]

2. Here you set the bbox's width and height to bbox[1] - bbox[0] and bbox[3] - bbox[2] respectively, which means the last row and column of pixels are not included. But in the other image-cropping code, they are included.

class BBox(object):
    """
    Bounding Box of face
    """
    def __init__(self, bbox):
        self.left = bbox[0]
        self.right = bbox[1]
        self.top = bbox[2]
        self.bottom = bbox[3]
        self.x = bbox[0]
        self.y = bbox[2]
        self.w = bbox[1] - bbox[0]
        self.h = bbox[3] - bbox[2]

patch = img[patch_top: patch_bottom+1, patch_left: patch_right+1]

f_face = img[f_bbox.top:f_bbox.bottom+1,f_bbox.left:f_bbox.right+1]

Could you help me understand these two points? Thanks.

flip() function

where is the flip() function defined?
It is giving me the error:
"
face_flipped, landmark_flipped = flip(f_face, landmarkGt)
NameError: global name 'flip' is not defined
"
am I missing something?

thank you!
saikrishna
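flip() is called by the dataset scripts, so the NameError usually means the helper was not imported or defined in the script being run. For illustration only, here is a minimal sketch of what a horizontal-flip helper for this data typically does, assuming landmarks are a (5, 2) array of normalized (x, y) points in the order left eye, right eye, nose, left mouth, right mouth:

    import numpy as np

    def flip(face, landmark):
        """Horizontally flip a face patch and mirror its landmarks (illustrative sketch)."""
        face_flipped = np.asarray(face)[:, ::-1]   # mirror the 2-D grayscale patch left-right
        lm = np.asarray(landmark, dtype=np.float32).copy()
        lm[:, 0] = 1.0 - lm[:, 0]                  # mirror normalized x coordinates
        lm[[0, 1]] = lm[[1, 0]]                    # swap left/right eyes
        lm[[3, 4]] = lm[[4, 3]]                    # swap left/right mouth corners
        return face_flipped.copy(), lm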

sh: 1: caffe: not found

I met one problem when I run:

python2.7 prototxt/generate.py CPU

# level-1
python2.7 dataset/level1.py
rm -rf log/train1.log
echo "Train LEVEL-1"
python2.7 train/level.py 1 pool_on

echo "=.="

It gives me an error like "sh: 1: caffe: not found",
but when I use Python, I can import caffe successfully.
Could you tell me what the problem is?

Why is np.random.rand() > 0.5 used twice in level1's generate_hdf5? Thanks.

        ### rotation by +5 degrees (applied with probability 0.5)
        if np.random.rand() > 0.5:
            face_rotated_by_alpha, landmark_rotated = rotate(img, f_bbox, \
                bbox.reprojectLandmark(landmarkGt), 5)
            landmark_rotated = bbox.projectLandmark(landmark_rotated)
            face_rotated_by_alpha = cv2.resize(face_rotated_by_alpha, (39, 39))
            F_imgs.append(face_rotated_by_alpha.reshape((1, 39, 39)))
            F_landmarks.append(landmark_rotated.reshape(10))
            ### flip with rotation
            face_flipped, landmark_flipped = flip(face_rotated_by_alpha, landmark_rotated)
            face_flipped = cv2.resize(face_flipped, (39, 39))
            F_imgs.append(face_flipped.reshape((1, 39, 39)))
            F_landmarks.append(landmark_flipped.reshape(10))
        ### rotation by -5 degrees (a second, independent draw, also probability 0.5)
        if np.random.rand() > 0.5:
            face_rotated_by_alpha, landmark_rotated = rotate(img, f_bbox, \
                bbox.reprojectLandmark(landmarkGt), -5)
            landmark_rotated = bbox.projectLandmark(landmark_rotated)
            face_rotated_by_alpha = cv2.resize(face_rotated_by_alpha, (39, 39))
            F_imgs.append(face_rotated_by_alpha.reshape((1, 39, 39)))
            F_landmarks.append(landmark_rotated.reshape(10))
            ### flip with rotation
            face_flipped, landmark_flipped = flip(face_rotated_by_alpha, landmark_rotated)
            face_flipped = cv2.resize(face_flipped, (39, 39))
            F_imgs.append(face_flipped.reshape((1, 39, 39)))
            F_landmarks.append(landmark_flipped.reshape(10))

pre-trained models?

I couldn't find any pretrained models under the model folder. Can you make them available for landmark detection? The original authors only provide a Windows version, so a Linux implementation would be very much appreciated.

Thanks!

why use the flatten layer and concat layer in level 1?

First, thank you for sharing your code. I ran it and got good performance, but I found that it does not follow the paper strictly: in level 1 you use a flatten layer and a concat layer. Why? Could you tell me the reason or point me to a reference? I am totally new to this field. Thank you very much.

Why do some landmarks fail?

I ran your code, and only the nose and right mouth get the right result; the failure rate of the other landmarks is 99%. It is weird.

Hard to repeat level3's mean error at some points

Hi, thanks for sharing your code! I have run it several times using bootstrap.sh. The results are very close to yours at level 1 and level 2, but the mean error at level 3 rises randomly for some points (e.g. Nose 0.121428, Left Mouth 0.062793 at level 3).
Could you give me some suggestions to solve this problem? Thanks a lot!

Training and testing procedures are not the same

I have a question. During training, the results of the previous level are not used to train the next level. In the test phase, however, the previous level's results are used as the reference face position for the next level. Is my understanding right?

How to set the host value when testing?

I am new to Python, and I want to use the web app to test images.
In app.py:

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Host': 'image.baidu.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.65 Safari/537.36',
}

I don't know what that means.
How should I set the Host value when testing? I use Chrome.
Thanks.
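For context, those are HTTP request headers the app presumably sends when it downloads a test image; the Host value names the server being requested (image.baidu.com here), not your own machine or browser, so there is normally nothing to change for local testing. A hypothetical sketch of how such headers are used (the URL is illustrative, and the actual fetch logic in app.py may differ):

    import requests

    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, sdch',
        'Host': 'image.baidu.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/39.0.2171.65 Safari/537.36',
    }

    # hypothetical image URL, for illustration only
    resp = requests.get('http://image.baidu.com/some/test/image.jpg', headers=headers)
    with open('test.jpg', 'wb') as out:
        out.write(resp.content)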
