
deep3dfacereconstruction's Introduction

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set

***07/20/2021: A PyTorch implementation with much better performance and much easier usage is now available. This repo will not be maintained in the future.***

This is a TensorFlow implementation of the following paper:

Y. Deng, J. Yang, S. Xu, D. Chen, Y. Jia, and X. Tong, Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set, IEEE Computer Vision and Pattern Recognition Workshop (CVPRW) on Analysis and Modeling of Faces and Gestures (AMFG), 2019. (Best Paper Award!)

The method enforces hybrid-level weakly-supervised training for CNN-based 3D face reconstruction. It is fast, accurate, and robust to pose and occlusions. It achieves state-of-the-art performance on multiple datasets such as FaceWarehouse, MICC Florence and BU-3DFE.

Features

● Accurate shapes

The method reconstructs faces with high accuracy. Quantitative evaluations (shape errors in mm) on several benchmarks show its state-of-the-art performance:

Method FaceWareHouse Florence BU3DFE
Tewari et al. 17 2.19±0.54 - -
Tewari et al. 18 1.84±0.38 - -
Genova et al. 18 - 1.77±0.53 -
Sela et al. 17 - - 2.91±0.60
PRN 18 - - 1.86±0.47
Ours 1.81±0.50 1.67±0.50 1.40±0.31

(Please refer to our paper for more details about these results)

● High fidelity textures

The method produces high-fidelity face textures while preserving the identity information of the input images. Scene illumination is also disentangled to generate a pure albedo.

● Robust

The method can provide reasonable results under extreme conditions such as large pose and occlusions.

● Aligned with images

Our method aligns the reconstructed faces with the input images. It provides face pose estimation and 68 facial landmarks, which are useful for other tasks. We evaluate the alignment performance on the AFLW_2000 dataset (NME), as shown in the table below:

Method [0°,30°] [30°,60°] [60°,90°] Overall
3DDFA 16 3.78 4.54 7.93 5.42
3DDFA+SDM 16 3.43 4.24 7.17 4.94
Bulat et al. 17 2.47 3.01 4.31 3.26
PRN 18 2.75 3.51 4.61 3.62
Ours 2.56 3.11 4.45 3.37

● Easy and Fast

Faces are represented with the Basel Face Model 2009, which is easy to use for further manipulations (e.g., expression transfer). ResNet-50 is used as the backbone network to achieve over 50 fps (on a GTX 1080) for reconstruction.

Getting Started

Testing Requirements

  • Python 3.6 with scipy, pillow, and argparse.
  • TensorFlow-GPU 1.12 (CUDA 9.0).
  • tf_mesh_renderer (pre-compiled binary or compiled from source; see step 3 below).
  • Basel Face Model 2009 and the Expression Basis from Guo et al. (see "Testing with pre-trained network").

Installation

1. Clone the repository

git clone https://github.com/Microsoft/Deep3DFaceReconstruction --recursive
cd Deep3DFaceReconstruction

2. Set up the python environment

If you use anaconda, run the following:

conda create -n deep3d python=3.6
source activate deep3d
conda install tensorflow-gpu==1.12.0 scipy
pip install pillow argparse

Alternatively, you can install TensorFlow via pip (in this case, you need to link /usr/local/cuda to cuda-9.0):

pip install tensorflow-gpu==1.12.0

3. Compile tf_mesh_renderer

If you install TensorFlow using pip, we provide a pre-compiled binary file (rasterize_triangles_kernel.so) of the library. Note that the pre-compiled file can only be used with TensorFlow 1.12.

If you install TensorFlow using conda, you have to compile tf_mesh_renderer from source. Compile tf_mesh_renderer with Bazel, and set -D_GLIBCXX_USE_CXX11_ABI=1 in ./mesh_renderer/kernels/BUILD before compilation:

cd tf_mesh_renderer
git checkout ba27ea1798
git checkout master WORKSPACE
bazel test ...
cd ..

If the library is compiled correctly, there should be a file named "rasterize_triangles_kernel.so" in ./tf_mesh_renderer/bazel-bin/mesh_renderer/kernels.
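As a quick sanity check (a hedged sketch, not part of the repo), the compiled kernel should load without errors in the same TensorFlow 1.12 environment used for compilation:

import tensorflow as tf  # TensorFlow 1.12

# Path follows the Bazel output layout described above; adjust it if your build differs.
so_path = './tf_mesh_renderer/bazel-bin/mesh_renderer/kernels/rasterize_triangles_kernel.so'
rasterize_module = tf.load_op_library(so_path)
print([name for name in dir(rasterize_module) if not name.startswith('_')])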

After compilation, copy corresponding files to ./renderer subfolder:

cp ./tf_mesh_renderer/mesh_renderer/{camera_utils.py,mesh_renderer.py,rasterize_triangles.py} ./renderer/
cp ./tf_mesh_renderer/bazel-bin/mesh_renderer/kernels/rasterize_triangles_kernel.so ./renderer/

If you download our pre-compiled binary file, put it into ./renderer subfolder as well.

Replace the library path on line 26 of ./renderer/rasterize_triangles.py with "./renderer/rasterize_triangles_kernel.so".

Replace the "xrange" call on line 109 of ./renderer/rasterize_triangles.py with "range" for compatibility with Python 3.

Testing with pre-trained network

  1. Download the Basel Face Model. Due to the license agreement of the Basel Face Model, you have to download the BFM09 model after submitting an application on its home page. After getting access to the BFM data, download "01_MorphableModel.mat" and put it into the ./BFM subfolder.

  2. Download the Expression Basis provided by Guo et al. You can find a link named "CoarseData" in the first row of the Introduction part of their repository. Download and unzip Coarse_Dataset.zip, and put "Exp_Pca.bin" into the ./BFM subfolder. The expression basis is constructed using FaceWarehouse data and transferred to the BFM topology.

  3. Download the pre-trained reconstruction network, unzip it and put "FaceReconModel.pb" into ./network subfolder.

  4. Run the demo code.

python demo.py
  5. The ./input subfolder contains several test images and the ./output subfolder stores their reconstruction results. For each input test image, two output files are obtained after running the demo code:
    • "xxx.mat" :
      • cropped_img: an RGB image after alignment, which is the input to the R-Net.
      • recon_img: an RGBA reconstruction image aligned with the input image (only on Linux).
      • coeff: output coefficients of R-Net.
      • face_shape: vertex positions of the 3D face in world coordinates.
      • face_texture: vertex texture of the 3D face, which excludes lighting effects.
      • face_color: vertex color of the 3D face, which takes lighting into account.
      • lm_68p: 68 2D facial landmarks derived from the reconstructed 3D face, aligned with cropped_img.
      • lm_5p: 5 detected landmarks aligned with cropped_img.
    • "xxx_mesh.obj" : 3D face mesh in world coordinates (best viewed in MeshLab).

Training requirements

  • Training is only supported on Linux. To train a new model from scratch, additional requirements are needed on top of those listed for testing.
  • Facenet provided by Sandberg et al. In our paper, we use a network to extract perceptual face features. This network model cannot be publicly released. As an alternative, we recommend using the Facenet from Sandberg et al. This repo uses version 20170512-110547 trained on MS-Celeb-1M. The training process has been tested with this model to ensure similar results.
  • Resnet50-v1 pre-trained on ImageNet from Tensorflow Slim. We use the version resnet_v1_50_2016_08_28.tar.gz to initialize the face reconstruction network.
  • 68-facial-landmark detector. We use 68 facial landmarks for loss calculation during training. To make the training process reproducible, we provide a lightweight detector that produces results comparable to the method of Bulat et al. The detector is trained on 300WLP, LFW, and LS3D-W.

Training preparation

  1. Download the pre-trained weights of Facenet provided by Sandberg et al., unzip it and put all files in ./weights/id_net.
  2. Download the pre-trained weights of Resnet_v1_50 provided by Tensorflow Slim, unzip it and put resnet_v1_50.ckpt in ./weights/resnet.
  3. Download the 68 landmark detector, put the file in ./network.

Data pre-processing

  1. To train our model with custom images, 5 facial landmarks of each image are needed in advance for an image pre-alignment process. We recommend using dlib or MTCNN. Use one of these public face detectors to get the 5 landmarks, and save all images and corresponding landmarks in <raw_img_path>. Note that an image and its detected landmark file should have the same name (see the detection sketch after the commands below).
  2. Align images and generate 68 landmarks as well as skin masks for training:
# Run following command for data pre-processing. By default, the code uses example images in ./input and saves the processed data in ./processed_data
python preprocess_img.py

# Alternatively, you can set your custom image path and save path
python preprocess_img.py --img_path <raw_img_path> --save_path <save_path_for_processed_data>
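For reference, here is a minimal sketch of how the 5-landmark files could be produced with MTCNN (the third-party mtcnn package, installed with pip install mtcnn). The loop below and the one-"x y"-pair-per-line file format are assumptions for illustration; compare the produced files against the sample landmark files shipped in ./input before training.

import os
import numpy as np
from PIL import Image
from mtcnn import MTCNN   # any detector returning eyes, nose tip, and mouth corners works similarly

detector = MTCNN()
raw_img_path = './input'   # <raw_img_path> from the instructions above

for name in os.listdir(raw_img_path):
    if not name.lower().endswith(('.jpg', '.png')):
        continue
    img = np.array(Image.open(os.path.join(raw_img_path, name)).convert('RGB'))
    faces = detector.detect_faces(img)
    if not faces:
        continue   # no face found; skip this image
    k = faces[0]['keypoints']
    lm = np.array([k['left_eye'], k['right_eye'], k['nose'],
                   k['mouth_left'], k['mouth_right']], dtype=np.float32)
    # Save one "x y" pair per line, using the same base name as the image
    # (assumed format; check the sample landmark files in ./input).
    np.savetxt(os.path.join(raw_img_path, os.path.splitext(name)[0] + '.txt'), lm)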

Training networks

  1. Train the reconstruction network with the following command:
# By default, the code uses the data in ./processed_data as training data as well as validation data
python train.py

# Alternatively, you can set your custom data path
python train.py --data_path <custom_data_path> --val_data_path <custom_val_data_path> --model_name <custom_model_name>

  2. Monitor the training process via TensorBoard:
tensorboard --logdir=result/<custom_model_name> --port=10001
  3. Evaluate the trained model:
python demo.py --use_pb 0 --pretrain_weights <custom_weights>.ckpt

Training a model with a batch size of 16 for 200K iterations takes 20 hours on a single Tesla M40 GPU.

Latest Update

2020.4

The face reconstruction process has been fully ported to TensorFlow (the old version used NumPy). We have also integrated the rendering process into the framework, so reconstruction images aligned with the input can be obtained without extra effort. The whole pipeline is TensorFlow-based, which allows gradient back-propagation for other tasks.

2020.6

Uploaded a pre-trained model with the white-light assumption described in the paper.

2020.12

Uploaded the training code for single-image face reconstruction.

Note

  1. An image pre-alignment with 5 facial landmarks is necessary before reconstruction. In our image pre-processing stage, we solve a least-squares problem between the 5 facial landmarks on the image and the 5 facial landmarks of the BFM09 average 3D face to cancel out face scale and misalignment. To get the 5 facial landmarks, you can choose any open-source face detector that returns them, such as dlib or MTCNN. However, these traditional 2D detectors may return wrong landmarks under large poses, which could influence the alignment result. Therefore, we recommend using the method of Bulat et al. to get facial landmarks (3D definition) with semantic consistency for large-pose images. Note that our model is trained without position augmentation, so a bad alignment may lead to inaccurate reconstruction results. We put some examples in the ./input subfolder for reference.

  2. We assume a pinhole camera model for face projection. The camera is positioned at (0,0,10) (dm) in the world coordinate system and points towards the negative z axis. We set the camera fov to 12.6 empirically and fix it during training and inference. Faces in canonical views are located at the origin of the world coordinate system and face the positive z axis. Rotations and translations predicted by the R-Net are all with respect to the world coordinate system (see the projection sketch after these notes).

  3. The current model is trained using 3-channel (r,g,b) scene illumination instead of the white light described in the paper. As a result, the gamma coefficient that controls lighting has a dimension of 27 instead of 9.

  4. We excluded the ear and neck regions of the original BFM09 to let the network concentrate on the face region. To see which vertices of the original model are preserved, check select_vertex_id.mat in the ./BFM subfolder. Note that the indices start from 1.

  5. Our model may give inferior results for images with severe perspective distortions (e.g., some selfies). In addition, we cannot handle faces with closed eyes well, due to the lack of such images in the training data.

  6. If you have any further questions, please contact Yu Deng ([email protected]) and Jiaolong Yang ([email protected]).
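To make the camera convention in note 2 concrete, here is a minimal sketch (not code from this repo) that derives a pixel focal length from the fixed fov and projects world-space vertices onto the aligned crop. The 224x224 crop size is an assumption about the aligned inputs to R-Net; verify the exact constants against the released code.

import numpy as np

# Camera convention from note 2: camera at (0, 0, 10) looking down the negative z axis,
# faces near the world origin facing the positive z axis, fov fixed to about 12.6 degrees.
fov_deg  = 12.6
img_size = 224        # assumed size of the aligned crop; check preprocess_img.py
camera_z = 10.0

# Pinhole relation: half the image (img_size / 2 pixels) spans tan(fov / 2) at unit depth.
focal = (img_size / 2.0) / np.tan(np.deg2rad(fov_deg) / 2.0)   # roughly 1015 pixels

def project(face_shape):
    # face_shape: (N, 3) vertex positions in world coordinates (e.g. from the output .mat).
    depth = camera_z - face_shape[:, 2]                       # distance along the viewing axis
    x = focal * face_shape[:, 0] / depth + img_size / 2.0
    y = img_size / 2.0 - focal * face_shape[:, 1] / depth     # flip y for image coordinates
    return np.stack([x, y], axis=1)                           # (N, 2) pixel coordinates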

Citation

Please cite the following paper if this model helps your research:

@inproceedings{deng2019accurate,
    title={Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set},
    author={Yu Deng and Jiaolong Yang and Sicheng Xu and Dong Chen and Yunde Jia and Xin Tong},
    booktitle={IEEE Computer Vision and Pattern Recognition Workshops},
    year={2019}
}

The face images on this page are from the public CelebA dataset released by MMLab, CUHK.

deep3dfacereconstruction's People

Contributors

aadha3, deconimus, microsoft-github-policy-service[bot], vozf, yangjiaolong, yudeng


deep3dfacereconstruction's Issues

2d alignment seems bad

When I plot landmarks_2d on an image from the input folder, it produces a bad result. Why is this happening?

(attached examples: vd005, vd092, vd034)

Landmark format

What is the format of the landmarks in the .txt file? I was expecting 5 pairs of x,y coordinates but in the example input, it is just 6 integers.

When I use the Python dlib on an image, I get the following points:
points[(715, 266), (667, 269), (551, 270), (599, 270), (637, 359)]
How would I convert this to the expected input format?

Unit conversion during the data loading

Hi, thanks for sharing this awesome work. Just one question about the data loading code: it seems that you perform a unit conversion by dividing the bases and the mean shape by 1e5. I am wondering, is it normal practice to do that? Because I see that the vertex locations in a 3DMM are usually large (on the order of 1e5). What is the unit of those coordinates in the 3DMM model?

Texture and Illumination coefficients

Hi, Yu,
I am re-implementing this paper and I have finished the photometric loss, landmark loss, and regularizations. I find that my network cannot regress the texture coefficients (delta) but easily regresses the illumination coefficients (gamma). In other words, when the network is fed an image of a dark-skinned person, it decides that the scene illumination is too low rather than that the face texture itself is dark.
I want to ask: is this a normal phenomenon?
And are the results posted in your paper rendered with or without the illumination estimated by the network?
Thanks!

suppl. material

Where are the supplementary materials mentioned in the article?

PCA bases scaled with standard deviation

Hi,
I have a question: are the idBase, exBase and texBase you provide already scaled by their respective standard deviations (as stated in Section 3 of your paper)?

Also, I am implementing the regularization in Section 4.3 of your paper; can you tell me the specific values (or order of magnitude) of ||alpha||^2, ||beta||^2 and ||delta||^2 when the training converged?

Thanks!

Is photometric loss a l2 loss with sqrt or not?

In your paper, the photometric loss is an l2 loss with a sqrt operation. But when computing gradients, the sqrt operation always causes NaN. So I just want to make sure: is the photometric loss an l2 loss with sqrt? If it is, can you give some suggestions on how to deal with the NaN situation?

landmark

I have a txt file of 68 landmarks, but the program needs five landmarks. How can I choose these five landmarks?

Question about angles through split coeff

Hi:

thanks for your contribution.

After splitting the coefficients, we have the Euler angles and use them to calculate the rotation matrix.

I am confused: do the angles correspond to (roll, pitch, yaw)?


Question about tf_mesh_render parameter fov_y and 2D projection parameter focal

Hello,

Could you explain how the camera field-of-view parameter fov_y used during rendering and the camera focal parameter used for the 3D-to-2D projection are computed? I am planning to replace tf_mesh_renderer with kaolin, and I found that using tf_mesh_renderer's fov_y value makes the rendered result show only the nose. So I would like to know the relationship between fov_y and the camera focal parameter; if there is any related material, please let me know.

Best regards!

didn't succeed to run the demo

Hello everybody. I'm brand new to Python but very motivated to make Deep3DFaceReconstruction work, so I spent two days downloading and installing all the packages, cloning the repository, and putting the right files in the right folders. Then I asked myself how the program actually runs: should I replace the images in the input folder with one of mine? I don't get it and feel a bit lost, so I ran demo.py as-is and got this message before the window closed:

2019-05-08 23:08:10.069653: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
reconstructing...
1
C:\Users\79\Desktop\neuron\Deep3DFaceReconstruction\Deep3DFaceReconstruction\preprocess_img.py:19: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
k,_,_,_ = np.linalg.lstsq(A,b)

Then I got another message but didn't have time to read or copy it because the window closed directly.

Any help is very welcome. Thanks, David

Do the 68 landmarks index need minus 1?

You provide a 'facemodel_info.mat' in the 'BFM' folder, and it contains the indices of the 68 landmarks (stored as 'keypoints' in the .mat). If I want to use these keypoints, should I subtract 1 before use, just like you did for 'tri'?

IndexError: too many indices for array

When I supply my image to demo.py, this happens:
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
  k,_,_,_ = np.linalg.lstsq(A,b)
Traceback (most recent call last):
  File "demo.py", line 76, in <module>
    demo()
  File "demo.py", line 55, in demo
    input_img,lm_new,transform_params = Preprocess(img,lm,lm3D)
  File "C:\Users\Utente\Desktop\1x\Deep3DFaceReconstruction-master\preprocess_img.py", line 65, in Preprocess
    img_new,lm_new = process_img(img,lm,t,s)
  File "C:\Users\Utente\Desktop\1x\Deep3DFaceReconstruction-master\preprocess_img.py", line 46, in process_img
    img = img[:,:,::-1]
IndexError: too many indices for array
matteo.txt (attached) is the txt input file.

Why does it return this error?

Doesn't work well for Chinese faces?

Hi, I have a question from trying this work: does it not work well for Chinese faces? The result does not look like the Chinese person in the image I input.

TypeError: 'list' object cannot be interpreted as an integer

Hi I tried running demo.py and getting this error
Traceback (most recent call last):
File "demo.py", line 76, in
demo()
File "demo.py", line 64, in demo
shape = np.squeeze(face_shape,[0])
File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 1388, in squeeze
return squeeze(axis=axis)
TypeError: 'list' object cannot be interpreted as an integer

Camera distance 10

Projection is a very important procedure since you use 2D information for weak supervision. You set the camera position to (0,0,10) and, to get a suitable face image, you set the fov to 12.5936. Actually, this is not precise since each photo is taken at a different distance, so it may cause some shape error. What do you think about this issue, and do you have any solution such as a 'dynamic projection procedure'?

Scale and translation for z buffer

Hi

What are the scale and translation values for the z-buffer if I want to align the mesh to the image? For x and y, the rasterization can be done according to process_img in preprocess_img.py, but I cannot find anything similar for the z value. Thank you.

About Tensorflow Version

If we use TensorFlow 2.x, it may report that "placeholder" is missing. Using TensorFlow 1.x solves this problem.
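If downgrading is not possible, a commonly used workaround, shown here only as a hedged sketch and not an officially supported configuration for this repo, is to run the TF1-style graph code through TensorFlow 2's v1 compatibility module. Note that the pre-compiled rasterization kernel was built against TensorFlow 1.12 and will likely not load under 2.x.

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()   # restores graph mode, tf.placeholder, tf.Session, etc.

# Placeholders can then be created as in the original TF1 code; the shape here is illustrative.
images = tf.placeholder(tf.float32, shape=[1, 224, 224, 3], name='input_imgs')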

How to calculate landmark loss?

Hi,
I am implementing the landmark loss, and I am confused about equation (3) in Section 4.1.2 of your paper. There are two questions.
<1> Is N always equal to 68, even if there is self-occlusion?
<2> Do q_n and q'_n include the z-component when calculating equation (3)? (I know that you use [8] to detect 3D landmarks and get (x,y,z) tuples.)

Thanks!

Why are select_vertex_id.mat and BFM_front_idx.mat different?

Hi, thanks for sharing this code. When I read the function transferBFM09() in load_data.py, the chosen indices are based on BFM_front_idx.mat. Why not use select_vertex_id.mat (35709, 1)? And what is the difference between select_vertex_id.mat and BFM_front_idx.mat (35709, 1)?

Hoping for your reply! Thanks.

How to render texture?

Hey! I ran your demo and the results are great!
But your demo only generates meshes without texture, and you mentioned in your readme file that you use tf_mesh_renderer as the renderer during training and testing.
Could you kindly release some example code showing how to render the results with tf_mesh_renderer, since there is not much documentation about it?
I'm just getting started with face reconstruction; I appreciate your help!

Training Process Problems

Hi, thanks for your work, I am really interested!
Now I have implemented the training process according to your paper, but I found some problems during training:

  1. The image-level loss is hard to converge; it stays around 40~50 in the end, leading the reconstruction result to be the mean shape without the real texture of the input image. (I used the skin mask to train, and I want to ask whether there are any tricks to make the image-level loss converge?)
  2. The landmark loss and the image-level loss are hard to balance: if I increase the weight of the landmark loss, the reconstruction result has an accurate pose and expression but the wrong face color. In contrast, if the weight of the image-level loss is increased, the reconstruction result has an accurate face color but the wrong pose and expression. (How should these two losses be balanced?)
  3. The learning rate is also important during training; which learning rate schedule is used to achieve good results?
    Hope for your advice, thank you very much!

How many neurons are there in the last fully-connected layer of R-Net?

In the paper, you said "we use a ResNet-50 network [22] to regress these coefficients by modifying the last fully-connected layer to 239 neurons", but when I debug your code I find the number of output coefficients of R-Net is 257. I think that's because of the number of lighting coefficients: in your paper it is 9, but in your code it is 27. Which is right?

Genova et al result

Hi! Thank you for open sourcing your code! I noticed in your paper that you reported Genova et al.'s average RMSE on MICC to be ~1.78; however, Genova et al. report (what I assume to be the same evaluation metric) ~1.50. Is there a reason for this?

2D face alignment in preprocessing

Hi,

Many thanks for your work!

I have a question: why don't you do rotation alignment in image preprocessing, and how can I do that while keeping the model working properly? I didn't understand how the POS function works; if you could explain it, that would be great.

Best regards
