kpzhang93 / mtcnn_face_detection_alignment
Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks
License: MIT License
Why are the windows input into O-Net two times, and why are the two windows the same? Also, I do not understand points2=[1-points([2,1,3,5,4],:);points([7,6,8,10,9],:)]. Could anybody answer? Thank you very much @kpzhang93
Why is P-Net used to generate the training data for R-Net? Could I instead generate training data for R-Net the same way the training data for P-Net is generated?
Not an Issue.
I forked your repo and modified it a little so it can run with Python 3, in case anyone wants to use or contribute. Here is the link: https://github.com/vplentz/mxnet_mtcnn_face_detection .
Hey, thanks for your excellent work! I have one question about generateBoundingBox.m:
[y x]=find(map>=t);
a=find(map>=t);
if size(y,1)==1
y=y';x=x';score=map(a)';dx1=dx1';dy1=dy1';dx2=dx2';dy2=dy2';
else
score=map(a);
end
When size(y,1)==1, only one point was found in map, so why do you transpose those variables?
BTW, I implemented MTCNN in Python; here is the repo.
Thanks.
In detect_face.m, line 34:
im_data=(imResample(img,[hs ws],'bilinear')-127.5)*0.0078125;
How should I understand these magic numbers (127.5 and 0.0078125)?
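For what it's worth, the two constants implement a simple normalization of 8-bit pixel values: 127.5 is the midpoint of [0, 255], and 0.0078125 = 1/128, so the result lands in roughly [-1, 1]. A minimal illustration:

```python
# Map an 8-bit pixel value into roughly [-1, 1]: subtract the
# midpoint 127.5, then scale by 1/128 (= 0.0078125).
def normalize(pixel):
    return (pixel - 127.5) * 0.0078125

print(normalize(0))      # darkest pixel  -> -0.99609375
print(normalize(255))    # brightest pixel ->  0.99609375
print(normalize(127.5))  # midpoint        ->  0.0
```

Zero-centered inputs of this kind are a common preprocessing choice for CNNs; the exact constants here just need to match whatever was used during training.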
Processing images one by one makes face detection and alignment too slow when the dataset is large. So I wonder: is it possible to change your script a little so it works when the input is a batch?
Hello,
thanks for sharing your code. I have some questions about training.
Hi, I'm new to MATLAB and don't know how to change the bounding-box size. I want to detect a face and then crop it with a margin; what can I do to enlarge the bounding box by a margin? I use MTCNNv2. Any help will be appreciated!
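One simple approach (a sketch, not part of the repo's code; the helper name and signature are hypothetical) is to leave the detector's output alone and expand each returned box by a fixed margin before cropping, clipping to the image bounds:

```python
def expand_box(x1, y1, x2, y2, margin, img_w, img_h):
    """Grow a box by `margin` pixels on each side, clipped to the image.

    Hypothetical helper: apply it to each row of total_boxes
    (x1, y1, x2, y2 corners) before cropping the face patch.
    """
    return (max(x1 - margin, 0),
            max(y1 - margin, 0),
            min(x2 + margin, img_w),
            min(y2 + margin, img_h))

# A 100x100 box with a 20-pixel margin inside a 640x480 image:
print(expand_box(50, 40, 150, 140, 20, 640, 480))  # (30, 20, 170, 160)
```

The same arithmetic is one line per coordinate in MATLAB; a proportional margin (e.g. 0.2 times the box width) works the same way.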
I have successfully installed Caffe on Linux and want to test this project in CPU mode, but I could not find a relevant tutorial and compilation keeps failing.
It would be great if a Java wrapper were available. If there is one, how can we get it? Thanks!
Dear sir:
What thresholds did you use when testing your algorithm on the WIDER FACE validation set?
And did you use the same models that you released in this project?
Thank you!
I want to train the networks myself.
My method:
Input data: four kinds of training data (positive, negative, part, and landmark samples).
For each kind of data, only specific loss layers are backpropagated (e.g., for positives update the regression loss and landmark loss, for negatives update the classification loss, and for landmark faces update the landmark loss).
However, the paper says "We use det:box:landmark = 1:0.5:0.5 in P-Net and R-Net";
how can I implement that?
Or should I set the loss weights
to 2:1:1? I would like to know whether either of the two methods above is correct.
What is your method?
And what data proportion did you use (pos:neg:part:land = ?:?:?:?)?
Thanks a lot
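The weighting the paper describes can be sketched as follows. This is a minimal illustration, not the authors' training code; the indicator flags (has_det, has_box, has_lmk) are hypothetical names for the per-sample task masks, and combining them with the fixed weights is one common way to realize both "only backward specific loss layers" and the 1:0.5:0.5 ratio in a single scalar loss:

```python
# Combine the three task losses with the paper's P-Net/R-Net weights
# (det:box:landmark = 1:0.5:0.5). Per-sample indicator flags switch a
# task off for sample types that do not carry that label.
def total_loss(l_det, l_box, l_lmk, has_det, has_box, has_lmk,
               w_det=1.0, w_box=0.5, w_lmk=0.5):
    return (w_det * l_det * has_det
            + w_box * l_box * has_box
            + w_lmk * l_lmk * has_lmk)

# A negative sample contributes only the detection loss:
print(total_loss(0.7, 0.2, 0.4, has_det=1, has_box=0, has_lmk=0))  # 0.7
```

Under this reading, 1:0.5:0.5 and 2:1:1 differ only by a global scale of the loss, which the learning rate can absorb, so either can work if tuned consistently.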
Hello, I am new to face detection and landmark detection. My question: based on this network, how can I produce 68 landmark points? In this demo it only produces 5 points. Do I need to retrain with 68 landmarks?
-Thank you-
@kpzhang93 MTCNN is slow on a CPU; is there any solution to speed it up to real time?
Hi ya,
At test time, given an image, P-Net outputs a heatmap of classification scores (i.e. H×W×2) and the offsets
(i.e. H×W×4) for each position. Then the Python function generate_bbox is used to produce the bounding boxes for the image. I am confused about how generate_bbox works. What is the meaning of the offsets (i.e. H×W×4) at each position?
Many thanks
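A simplified sketch of what generate_bbox does may help here (this is an illustration written for clarity, not the repo's exact code, which also handles MATLAB's 1-based indexing). Each heatmap position whose score passes the threshold becomes a 12×12 candidate window, mapped back through the net's stride and the pyramid scale; the four offset channels are per-window corner corrections, expressed as fractions of the window size, that are applied to the corners afterwards:

```python
import numpy as np

# `score_map` is the face-classification heatmap (H, W); `reg` holds
# the four offset channels (H, W, 4) = (dx1, dy1, dx2, dy2), each a
# corner correction as a fraction of the 12-pixel window size.
def generate_bbox(score_map, reg, scale, threshold, stride=2, cellsize=12):
    y, x = np.where(score_map > threshold)
    # Map each heatmap position back to a 12x12 window in the original image.
    x1 = np.round(stride * x / scale)
    y1 = np.round(stride * y / scale)
    x2 = np.round((stride * x + cellsize) / scale)
    y2 = np.round((stride * y + cellsize) / scale)
    return np.stack([x1, y1, x2, y2, score_map[y, x],
                     reg[y, x, 0], reg[y, x, 1],
                     reg[y, x, 2], reg[y, x, 3]], axis=1)

# One confident position at heatmap (0, 0), scale 0.5 -> a 24x24 window.
score = np.zeros((3, 3)); score[0, 0] = 0.9
boxes = generate_bbox(score, np.zeros((3, 3, 4)), 0.5, 0.6)
print(boxes[0, :4])  # [ 0.  0. 24. 24.]
```

The refinement step then does something like x1 += dx1 * w (and similarly for the other corners), which is why the offsets are stored per position alongside the scores.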
I need a TensorFlow version, thanks.
When I run demo.m in MTCNNv2, an error occurs:
Undefined function or variable 'rectgenpathangle'.
How can I fix it?
hi @kpzhang93 ,
I have gone through the repository and found that the license used is MIT.
1. Is the model file also covered by the MIT license?
2. To your knowledge, what is the licensing of the datasets you used?
In the generateBoundingBox function of generateBoundingBox.m, there is "boundingbox=[fix((stride*(boundingbox-1)+1)/scale) fix((stride*(boundingbox-1)+cellsize-1+1)/scale) score reg]". My question is: why the plus 1?
Thank you in advance.
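One plausible reading (inferred from the code, not stated by the author): the extra +1 terms come from MATLAB's 1-based indexing. With a 1-based heatmap index b, cell b covers scaled-image pixels stride·(b-1)+1 through stride·(b-1)+cellsize, so that for b=1 the window starts at pixel 1 rather than 0:

```latex
x_{1} = \operatorname{fix}\!\left(\frac{\mathrm{stride}\,(b-1)+1}{\mathrm{scale}}\right),
\qquad
x_{2} = \operatorname{fix}\!\left(\frac{\mathrm{stride}\,(b-1)+\mathrm{cellsize}-1+1}{\mathrm{scale}}\right)
```

In a 0-based port (e.g. Python/NumPy) the same mapping is usually written without these +1 terms.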
How to understand LNet in MTCNNv2 ?
I found that if the image is too big, e.g. 3000×2000, the detection function crashes, and I would like to know why.
Hello,
When I execute MTCNN on Linux, some fields are not recognized by the Caffe library and I get the following errors:
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 12:12: Message type "caffe.MemoryDataParameter" has no field named "transpose".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1112 10:53:38.052264 3463 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: det1-memory.prototxt
*** Check failure stack trace: ***
This error is related to the proto message "memory_data_param", which has no "transpose" field.
When I remove this field from the model "det1-memory.prototxt", a new error appears:
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 196:21: Message type "caffe.LayerParameter" has no field named "predict_box_param".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1112 10:55:39.076514 3523 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: det1-memory.prototxt
*** Check failure stack trace: ***
This time, Caffe does not know the layer type "PredictBox".
If I remove this layer from the file, the model loads successfully, but at runtime I get the following error:
F1112 10:37:54.543467 3165 data_transformer.cpp:290] Check failed: img_height == height (360 vs. 256)
I think that to resolve the problem I need to modify Caffe's proto layers, but I don't know exactly how.
I am using the Caffe version I found in your caffe library.
I have the impression that using greyscale images hurts detection performance. I ran an experiment: I used an RGB image and found a face; after converting it to greyscale, the detector was no longer able to find a face in the image. In my opinion this could happen because the detector was trained with more RGB than greyscale images (or images where R=G=B). Could this be the case?
Hi, for this multi-task network: if there are no valid samples for some task in a batch, such as no -2 labels for the landmark branch or no -1 and 1 labels for the bbox branch, what should the corresponding loss value be: NaN or 0?
Thanks in advance!
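One common convention (an assumption about good practice, not necessarily the authors' choice) is that a task with no valid samples contributes 0 to the batch loss, never NaN. Averaging only over valid samples and guarding the empty case achieves this:

```python
# Mean loss over only the valid samples of one task; an empty task
# contributes 0.0 so the total multi-task loss stays finite.
def masked_mean_loss(per_sample_losses, valid_mask):
    valid = [l for l, m in zip(per_sample_losses, valid_mask) if m]
    if not valid:  # no valid samples for this task in the batch
        return 0.0
    return sum(valid) / len(valid)

print(masked_mean_loss([0.5, 0.25], [0, 0]))  # 0.0 (no valid samples)
print(masked_mean_loss([0.5, 0.25], [1, 1]))  # 0.375
```

A NaN here would poison the summed loss and every gradient, so the guard (or an equivalent framework-level mask) is essential.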
There are 4 caffemodels in code/codes/MTCNNv1/model/, and one of them is based on ResNet-101. Where does that model come from?
If I want to use ResNet-50, what should I do?
Where can I download one, other than training it myself?
Yaw, pitch, and roll are three of the most common problems in face detection.
According to my testing,
roll is a weakness of MTCNN: at much more than 20 degrees it no longer recognizes the face.
I tried using MTCNN's landmark detection to do face alignment in the roll case, but it does not work.
Is there any method to improve this situation, such as increasing the rotation of the training dataset?
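If rotation augmentation is the route taken, the landmark annotations must be rotated together with the image. A minimal sketch of the point side of that (an illustration only, not the authors' pipeline; the image itself would be rotated by the same angle with your image library):

```python
import numpy as np

# Rotate 2-D landmark points by `angle_deg` about `center`, so the
# annotations stay consistent with a rotated training image.
def rotate_points(points, angle_deg, center):
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return (points - center) @ rot.T + center

pts = np.array([[10.0, 0.0]])
# 90-degree rotation about the origin sends (10, 0) to (0, 10):
print(rotate_points(pts, 90, np.array([0.0, 0.0])))
```

Note that after rotation the bounding-box labels need to be recomputed from the rotated landmarks or face region, since an axis-aligned box does not rotate cleanly.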
Hi, is there any variable in the detection function that we can use as the face-detection confidence? Please advise. Thanks in advance.
Before the face detection step, why do we need to build the pyramid of rescaled images?
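For context: detect_face.m builds a pyramid of progressively rescaled copies of the input because P-Net only ever scores 12×12 windows, so faces of every size must be brought down to roughly 12 pixels at some scale. A sketch of the scale computation, following the MATLAB logic (start from m = 12/minsize, multiply by factor until the smaller side would drop below 12; parameter names are illustrative):

```python
# Compute the list of pyramid scales for an image whose smaller side
# is `min_side`, mirroring the loop in detect_face.m.
def pyramid_scales(min_side, minsize=20, factor=0.709, cellsize=12):
    m = cellsize / minsize
    scales, minl = [], min_side * m
    while minl >= cellsize:
        scales.append(m * factor ** len(scales))
        minl *= factor
    return scales

scales = pyramid_scales(250)
print(len(scales), scales[0])  # 8 scales, starting at 0.6
```

Each scale is one forward pass through P-Net, which is why larger minsize or smaller factor values make detection faster (fewer, coarser scales) at some cost in recall.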
Are the four values output by the bounding-box regression relative to [top-left corner, width, height] or to [top-left corner, bottom-right corner]? The code seems inconsistent with the paper. Please advise.
I have been reimplementing this work, but I have a question about P-Net: we need to recover the original window locations from the output feature maps, but why is the stride set to 2? I could not understand it clearly. The code is as follows:
stride=2; boundingbox=[fix((stride*(boundingbox-1)+1)/scale) fix((stride*(boundingbox-1)+cellsize-1+1)/scale) score reg];
Could you explain it simply? Thanks.
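As far as the architecture shows, stride=2 is the total stride of P-Net: its convolutions are stride 1 and it has a single 2×2 max-pool with stride 2, so moving one cell in the output heatmap moves the 12×12 receptive field by 2 pixels in the scaled image. A sketch of the mapping (mirroring the MATLAB expression; the helper name is illustrative):

```python
# Map a 1-based heatmap index b back to the pixel range its 12x12
# window covers in the original (unscaled) image. fix() in MATLAB
# truncates toward zero, which int() reproduces for positive values.
def window_in_original(b, scale, stride=2, cellsize=12):
    first = int((stride * (b - 1) + 1) / scale)
    last = int((stride * (b - 1) + cellsize - 1 + 1) / scale)
    return first, last

print(window_in_original(1, 0.5))  # (2, 24): cell 1, half-scale image
print(window_in_original(2, 0.5))  # (6, 28): one cell over = 2/0.5 = 4 px
```

So the "scans" are implicit: running P-Net fully convolutionally is equivalent to sliding a 12×12 window with step 2 over the scaled image, and the formula undoes both the stride and the pyramid scale.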
For example, if the cropped face image comes out as 150×151, how can I handle that in code?
After running demo.m on MTCNNv1 or v2, how can I align that face with respect to the eye points? Is there code to do so?
The abstract and introduction repeatedly mention the relationship between facial landmark localization and bounding-box regression, and the relationship between face detection and alignment.
I do not quite understand what these relationships are, or what the paper means by them.
Also, I do not understand how the paper exploits these relationships.
Thanks.
Hi,
Thanks for sharing the code. I want to reproduce the results and perhaps make some modifications to the network architecture. Is it possible to share the training code with us, especially how to prepare the dataset? Thanks.
Hello
I am trying to reproduce the MTCNNv1 WIDER score on the validation data,
but I do not get exactly the same result as in the paper or in the mat file eval/plot/baselines/Val/setting_int/multitask-cascade-cnn/wider_pr_info_multitask-cascade-cnn_easy_val.mat.
I set parameters:
threshold=[0.5 0.5 0.3];
factor=0.79;
And I get:
0.834 0.810 0.624
vs
0.848 0.825 0.598
Is a difference of more than 1% mAP significant?
Hello, I have two questions:
1. Of the 10 values output by the third-stage network, the first 5 are x offsets and the last 5 are y offsets, right?
2. In pad, the boxes are slightly adjusted; should total_boxes be adjusted in the same way?
Thanks.
Hello, thank you very much for open-sourcing the model. I want to use it for face detection and landmark localization, but I have some questions, mainly about the following function:
[total_boxes points] = detect_face(img,minsize,PNet,RNet,ONet,threshold,fastresize,factor)
By what criteria should the parameters minsize, threshold, and factor be set? You provide some defaults in the demo, but I do not know how they were chosen.
Thanks again.
I evaluated the author's model on WIDER FACE dataset. The result is different from the paper. I set minsize as 10, scale factor as 0.79 and threshold as [0.5 0.5 0.3].
             paper's result   result using author's model
easy set:    85.1             83.3
medium set:  82.0             80.9
hard set:    60.7             62.2
Has anyone reproduced the author's performance on WIDER FACE?
Hi,
I'm trying to follow the code and understand how MTCNN works. I understand that for each image and each scale, detections come from each of the networks; in particular I am asking about P-Net here.
The image is rescaled according to the scales computed earlier, and the rescaled image goes into P-Net as follows in the code:
%Code file: detect_face.m
if fastresize
im_data=imResample(im_data,[hs ws],'bilinear');
else
im_data=(imResample(img,[hs ws],'bilinear')-127.5)*0.0078125;
end
PNet.blobs('data').reshape([hs ws 3 1]);
out=PNet.forward({im_data});
For reference I have printed out the original size and the rescaled size:
ORIGINAL Height: 340
ORIGINAL Width: 151
SCALE USED (were computed before): 0.107493555074
RESCALED Height: 37
RESCALED Width: 17
The net corresponds to Pnet and in det1.prototxt (PNet) the input size should have h=12 and w=12.
% Code file: det1.prototxt
input_dim: 1
input_dim: 3
input_dim: 12
input_dim: 12
What I don't understand is: how do we get from the rescaled image size to the 12x12 input that the prototxt specifies?
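The resolution of this puzzle is that P-Net is fully convolutional: the 12×12 in det1.prototxt is only the smallest (single-window) input, and the quoted code explicitly reshapes the data blob to [hs ws 3 1] before the forward pass. The net then emits one score per 12×12 window. A sketch of the output-size arithmetic, assuming the standard P-Net layers (three 3×3 valid convolutions plus one 2×2/stride-2 pool) and Caffe's convention of rounding pooling output up:

```python
import math

# Heatmap side length f(n) for an n-pixel input side through P-Net:
def pnet_out(n):
    n = n - 2             # conv1, 3x3, stride 1 (valid)
    n = math.ceil(n / 2)  # pool1, 2x2, stride 2 (Caffe ceil mode)
    n = n - 2             # conv2, 3x3
    n = n - 2             # conv3, 3x3
    return n

print(pnet_out(12))                # 1  -> a 12x12 input yields one score
print(pnet_out(37), pnet_out(17))  # 14 4 -> heatmap for the 37x17 image
```

So the 37×17 rescaled image in the question never gets cut to 12×12; it produces a 14×4 heatmap, each cell of which scores one 12×12 window.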
Does Equation 1 in the paper have a typo? It looks like it should be the cross-entropy loss,
which is -(y_true*log(y_pred)+(1-y_true)*log(1-y_pred)),
and not -(y_true*log(y_pred)+(1-y_true)*(1-log(y_pred)))
as in the paper.
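For reference, the standard binary cross-entropy the question describes, written in the paper's notation (with y_i^det in {0, 1} the ground-truth label and p_i the predicted face probability), is:

```latex
L_i^{det} = -\left( y_i^{det}\,\log(p_i) + \bigl(1 - y_i^{det}\bigr)\,\log(1 - p_i) \right)
```

Different revisions of the paper may print this differently, so it is worth checking which version is being read before concluding there is a typo.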
Regarding the paper's "Evaluation on AFLW for face alignment":
The method is compared with others there; why does this method localize landmarks better than the other methods?
Compared with TCDCN, this method's O-Net (48) architecture is simpler, so why does it perform better?
Is it related to the face bounding-box regression? If so, what is the relationship?
I have prepared 1 million pictures: 0.3 are positives, 0.3 are part faces, and 0.4 are negatives.
I trained P-Net, but the result is very bad.
Could you share your solver for O-Net?