
4d-facial-avatars's People

Contributors

gafniguy, seriousran


4d-facial-avatars's Issues

Question about the translation vector

Hi, I am wondering how you get the translation vector to work as the camera location. In my own monocular video, the translation vectors do not shift much.

Training time

Wonderful work! I have a question about training time. Could you tell me the average training time for this model to get relatively good results?

Question about Test speed

Hi,
After obtaining a model trained for 400k epochs, I ran eval_transformed_rays.py.
It seems that the time to render each frame is about 15 seconds (I use a GeForce RTX 3060), which means that rendering a 20-second video with 1000 frames takes a couple of hours. Is this normal? May I ask how long it takes you to render each frame?
Could you give some hints on how to accelerate the test phase, and is it possible to use it in real time in practice (now or in the future)?
Thanks!

Rigid Transformations

Hi,

I am trying to run 'real_to_nerf.py' for my research. I am implementing face-tracking code using DECA. However, DECA outputs camera parameters as (tx, ty, scale), while I believe NerFACE uses (tx, ty, tz) for the rigid transformations. So, I would like to know how to convert the DECA camera parameters into the rigid transformations used in NerFACE.
The purpose of my question is to create the text files the script needs: intrinsics.txt, expression.txt, and rigid.txt.
I appreciate your kindness; your research has been very helpful for understanding this area.
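
For reference, DECA's (tx, ty, scale) is a weak-perspective camera, whereas a rigid pose needs a full 3D translation (tx, ty, tz). Under a pinhole model, the weak-perspective scale is approximately the focal length divided by depth, which suggests a rough conversion like the sketch below. The focal-length handling, the NDC normalization, and the function name are assumptions of mine, not part of the NerFACE or DECA pipelines, and the sign/axis conventions (camera-to-world vs. world-to-camera) would still need to be checked against real_to_nerf.py.

  import numpy as np

  def weak_perspective_to_translation(tx, ty, scale, focal_px, image_size):
      # Focal length expressed in normalized device coordinates ([-1, 1] crop range);
      # assumes a square crop of side `image_size` pixels.
      focal_ndc = focal_px / (image_size / 2.0)
      # Weak perspective: scale ~= focal / depth, so depth ~= focal / scale.
      tz = focal_ndc / scale
      # DECA adds (tx, ty) to the mesh in object space before scaling, so they can
      # be reused directly as the in-plane translation, in the same (FLAME) units.
      return np.array([tx, ty, tz])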

Reproduce the Transformation Matrices

Hi Guy,

Thanks for the excellent work! I am using my own face tracker to reproduce the transformation matrices in your provided JSON files. However, I cannot get the same results for both rotation and translation. Moreover, when I substitute the transformation matrices with mine in your JSON files, the validation result is bad. But the weird thing is that the debug results (overlaying transformed face mask on the original image) of my transformation matrix seem good.

I am wondering if you directly utilize the head rotation and translation in the face tracker as the camera-to-world matrix in NeRF. If so, what is the unit of your rotation and translation? Does it matter if my transformation matrix is in a different world coordinate system from yours (e.g. different origins)? Could you provide your debug code that overlays the face mask on the original image so that I could check what is wrong with my own transformation matrices?

Looking forward to hearing from you!

Thanks,
Jeremy

Could this project be trained on multi-gpus?

It doesn't seem to have been designed for training on multiple GPUs.
When I simply add nn.DataParallel in train_transformed_rays.py, the following error appears:

  model_coarse = nn.DataParallel(model_coarse)
  model_coarse.to(device)
  ...
  model_fine = nn.DataParallel(model_fine)
  model_fine.to(device)

Done with data loading
done loading data
Available GPUs: 8
loading GT background to condition on
bg shape torch.Size([512, 512, 3])
should be torch.Size([512, 512, 3])
initialized latent codes with shape 100 X 32
computing boundix boxes probability maps
Starting loop
0%| | 0/1000000 [00:26<?, ?it/s]
Traceback (most recent call last):
  File "train_transformed_rays.py", line 613, in <module>
    main()
  File "train_transformed_rays.py", line 347, in main
    rgb_coarse, _, _, rgb_fine, _, _, weights = run_one_iter_of_nerf(
  File "/data/shenzhonghai/4D-Facial-Avatars-main/nerface_code/nerf-pytorch/nerf/train_utils.py", line 268, in run_one_iter_of_nerf
    pred = [
  File "/data/shenzhonghai/4D-Facial-Avatars-main/nerface_code/nerf-pytorch/nerf/train_utils.py", line 269, in <listcomp>
    predict_and_render_radiance(
  File "/data/shenzhonghai/4D-Facial-Avatars-main/nerface_code/nerf-pytorch/nerf/train_utils.py", line 100, in predict_and_render_radiance
    radiance_field = run_network(
  File "/data/shenzhonghai/4D-Facial-Avatars-main/nerface_code/nerf-pytorch/nerf/train_utils.py", line 36, in run_network
    pred = network_fn(batch, expressions, latent_code)
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/shenzhonghai/4D-Facial-Avatars-main/nerface_code/nerf-pytorch/nerf/models.py", line 248, in forward
    x = self.layers_xyz[i](x)
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/data/shenzhonghai/anaconda3/envs/4DFA/lib/python3.8/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

How to do facial reenactment?

Hi, Thanks for open-sourcing this awesome work. I am trying to re-implement the facial reenactment shown in Fig.5 in the paper. Could you please let me know how to do that? Thanks!

fov_y to focal length

Hi, thanks for this great work.
I have a question about the camera intrinsics.
I implemented a face tracker with a fixed fov_y (I did not do camera calibration because I use online video as input). How can I convert fov_y to the focal length stored in the JSON file?

Is this formula correct?
fy = image_height / (2 * tan(fov_y / 2))
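
That is the standard pinhole relation, assuming fov_y is in radians (convert from degrees first if needed) and the principal point is at the image center. A small self-check, with a hypothetical function name:

  import math

  def fovy_to_focal(fov_y_rad, image_height):
      # Vertical field of view (radians) -> focal length in pixels, pinhole model.
      return image_height / (2.0 * math.tan(fov_y_rad / 2.0))

  # Example: a 30-degree vertical FOV on a 512-pixel-tall image
  fy = fovy_to_focal(math.radians(30.0), 512)
  print(fy)  # ~955.4 pixels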

Qualitative results different from paper: Blurry shoulders

Hi,

Thank you for your work and code. The results in the paper look amazing. I used person_2 from your dataset and the config file attached at the end, and trained the network for 400k iterations following your paper. The face looks good, but the shoulders are far blurrier than in the image in the paper (Fig. 5, second row). Could you let me know if there are different parameters I should use, or if there are parts of the code I should modify?
[attached image: theirs_vs_mine]

Thanks for the help!

# Parameters to setup experiment.
experiment:
  # Unique experiment identifier
  id: dave__fixed_bg_512_paper_model
  # Experiment logs will be stored at "logdir"/"id"
  logdir: logs
  # Seed for random number generators (for repeatability).
  randomseed: 42  # Cause, why not?
  # Number of training iterations.
  train_iters: 1000000
  # Number of training iterations after which to validate.
  validate_every: 1000
  # Number of training iterations after which to checkpoint.
  save_every: 5000
  # Number of training iterations after which to print progress.
  print_every: 100
  device: 0

# Dataset parameters.
dataset:
  # Type of dataset (Blender vs LLFF vs DeepVoxels vs something else)
  type: blender
  # Base directory of dataset.
  basedir: /datasets/nerface/person_2
  #basedir: real_data/andrei_1_light
  #basedir: real_data/debug
  # Optionally, provide a path to the pre-cached dataset dir. This
  # overrides the other dataset options.
  #cachedir: cache/flame_sample
  # For the Blender datasets (synthetic), optionally return images
  # at half the original resolution of 800 x 800, to save space.
  half_res: False
  # Stride (include one per "testskip" images in the dataset).
  testskip: 1
  # Do not use NDC (normalized device coordinates). Usually True for
  # synthetic (Blender) datasets.
  no_ndc: True
  # Near clip plane (clip all depth values closer than this threshold).
  near: 0.2
  # Far clip plane (clip all depth values farther than this threshold).
  far: 0.8

# Model parameters.
models:
  # Coarse model.
  coarse:
    # Name of the torch.nn.Module class that implements the model.
    type: ConditionalBlendshapePaperNeRFModel
    # Number of layers in the model.
    num_layers: 4 # ignore this, I hard coded the model
    # Number of hidden units in each layer of the MLP (multi-layer
    # perceptron).
    hidden_size: 256
    # Add a skip connection once in a while. Note: This parameter
    # won't take effect unless num_layers > skip_connect_every.
    skip_connect_every: 3
    # Whether to include the position (xyz) itself in its positional
    # encoding.
    include_input_xyz: True
    # Whether or not to perform log sampling in the positional encoding
    # of the coordinates.
    log_sampling_xyz: True
    # Number of encoding functions to use in the positional encoding
    # of the coordinates.
    num_encoding_fn_xyz: 10
    # Additionally use viewing directions as input.
    use_viewdirs: True
    # Whether to include the direction itself in its positional encoding.
    include_input_dir: False
    # Number of encoding functions to use in the positional encoding
    # of the direction.
    num_encoding_fn_dir: 4
    # Whether or not to perform log sampling in the positional encoding
    # of the direction.
    log_sampling_dir: True
  # Fine model.
  fine:
    # Name of the torch.nn.Module class that implements the model.
    type: ConditionalBlendshapePaperNeRFModel
    # Number of layers in the model.
    num_layers: 4 # ignore this, I hard coded the model
    # Number of hidden units in each layer of the MLP (multi-layer
    # perceptron).
    hidden_size: 256
    # Add a skip connection once in a while. Note: This parameter
    # won't take effect unless num_layers > skip_connect_every.
    skip_connect_every: 3
    # Number of encoding functions to use in the positional encoding
    # of the coordinates.
    num_encoding_fn_xyz: 10
    # Whether to include the position (xyz) itself in its positional
    # encoding.
    include_input_xyz: True
    # Whether or not to perform log sampling in the positional encoding
    # of the coordinates.
    log_sampling_xyz: True
    # Additionally use viewing directions as input.
    use_viewdirs: True
    # Whether to include the direction itself in its positional encoding.
    include_input_dir: False
    # Number of encoding functions to use in the positional encoding of
    # the direction.
    num_encoding_fn_dir: 4
    # Whether or not to perform log sampling in the positional encoding
    # of the direction.
    log_sampling_dir: True

# Optimizer params.
optimizer:
  # Name of the torch.optim class used for optimization.
  type: Adam
  # Learning rate.
  lr: 5.0E-4

# Learning rate schedule.
scheduler:
  # Exponentially decay learning rate (in 1000 steps)
  lr_decay: 250
  # Rate at which to apply this decay.
  lr_decay_factor: 0.1

# NeRF parameters.
nerf:
  # Use viewing directions as input, in addition to the X, Y, Z coordinates.
  use_viewdirs: True
  # Encoding function for position (X, Y, Z).
  encode_position_fn: positional_encoding
  # Encoding function for ray direction (theta, phi).
  encode_direction_fn: positional_encoding
  # Training-specific parameters.
  train:
    # Number of random rays to retain from each image.
    # These sampled rays are used for training, and the others are discarded.
    num_random_rays: 2048  # 32 * 32 * 4 # was 1024
    # Size of each chunk (rays are batched into "chunks" and passed through
    # the network)
    chunksize: 2048 #16384  #131072  # 131072  # 1024 * 32
    # Whether or not to perturb the sampled depth values.
    perturb: True
    # Number of depth samples per ray for the coarse network.
    num_coarse: 64
    # Number of depth samples per ray for the fine network.
    num_fine: 64
    # Whether to render models using a white background.
    white_background: False
    # Standard deviation of noise to be added to the radiance field when
    # performing volume rendering.
    radiance_field_noise_std: 0.1
    # Sample linearly in disparity space, as opposed to in depth space.
    lindisp: False
  # Validation-specific parameters.
  validation:
    # Number of random rays to retain from each image.
    # These sampled rays are used for training, and the others are discarded.
    chunksize: 65536 #4096  #131072   # 1024 * 32
    # Whether or not to perturb the sampled depth values.
    perturb: True
    # Number of depth samples per ray for the coarse network.
    num_coarse: 64
    # Number of depth samples per ray for the fine network.
    num_fine: 64
    # Whether to render models using a white background.
    white_background: False
    # Standard deviation of noise to be added to the radiance field when
    # performing volume rendering.
    radiance_field_noise_std: 0.
    # Sample linearly in disparity space, as opposed to in depth space.
    lindisp: False

For customized data preprocessing, how to generate the required shape/expression vector?

@gafniguy
The video-head-tracker repo (https://github.com/philgras/video-head-tracker) outputs an expression vector of a completely different dimension (100-D). How can I align it with the NerFACE requirement of a 76-D expression vector?

Also, how do I generate the rigid.txt and transform.txt files needed by real_to_nerf.py? Could you provide samples?

  expressions = read_expressions(os.path.join(args.source, "expression.txt"))
  rigid_poses, scale = read_rigid_poses(os.path.join(args.source, "rigid.txt"))

How to get continuous video, not individual rendered images?

Thanks for the great contribution!
However, I'm running into the same problem as issue #29.
How can I get frame images with different poses and expressions, and a final video just like in the sample?
I set line 420 to None, but got too many errors. Is there code available to synthesize a video?
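
Not from the repo, but as a workaround: if the renderer writes numbered frames to a directory, they can be stitched into a video with imageio. The output directory, filename pattern, and frame rate below are hypothetical, and imageio-ffmpeg must be installed for mp4 writing:

  import glob
  import imageio

  frame_paths = sorted(glob.glob("renders/*.png"))  # hypothetical render output directory
  with imageio.get_writer("reenactment.mp4", fps=25) as writer:
      for path in frame_paths:
          writer.append_data(imageio.imread(path))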

nerf MLP architecture

Hello,

Thank you for your nice work. I noticed that the nerf MLP architecture in your paper is slightly different than the original nerf implementation, i.e. 1 hidden layer for RGB prediction --> 4 hidden layers with halved width.

I wonder whether you've tried the original nerf architecture and whether you changed it because the modified architecture gives better results.

Thank you in advance for your answer:)

Yufeng

How to Render (eval_transformed_rays.py) with Ablation?

I was able to train person3 dataset but when I tried to run eval_transformed_rays.py, the output is just a still video. I'm trying to set the angle of the head to face the camera and to keep the expressions. Any suggestions on a solution that I can try?

I tried to do the following:

  • set ablate = 'expression'
  • commented out the line index_of_image_after_train_shuffle = idx_map[10,1]
  • uncommented the line index_of_image_after_train_shuffle = 10
  • set no_lcode = True

Did I miss anything? Also, what should be the correct value for index_of_image_after_train_shuffle?

Thanks!

eval_transformed_rays.py prints out the same images.

Hello.
Thank you for sharing the great code.
When I used the evaluation code (eval_transformed_rays.py), there was an issue: the code printed out the same images.
[attached screenshot: Screen Shot 2022-02-21 at 19 25 38]
I trained on the person_1 dataset using the train_transformed_rays.py code to get checkpoints. Then, I used the 295000-iteration checkpoint file I made myself.
Has anyone solved this problem or run the evaluation without errors?
Thanks.

Question about training

When I tried to train, a lot of errors appeared. My training steps are as follows:

  1. Download the data you provided
  2. Modify the data path in the configuration file
  3. Run train_transformed_rays.py

When I ran the third step, run_one_iter_of_nerf produced a series of errors because the ray_directions_ablation parameter was not passed in. For now, I have commented out all the relevant code and can start training, but I don't know if this is correct. My training output is below:

[VAL] =======> Iter: 0
Validation loss: 0.10423796623945236 Validation PSNR: 9.81974070622407 Time: 2.703904151916504
================== Saved Checkpoint =================
[TRAIN] Iter: 100 Loss: 0.10412006825208664 BG Loss: 0.0 PSNR: 9.824655558769116 LatentReg: 0.0
[TRAIN] Iter: 200 Loss: 0.03349904716014862 BG Loss: 0.0 PSNR: 14.749675457683708 LatentReg: 0.0
[TRAIN] Iter: 300 Loss: 0.017453620210289955 BG Loss: 0.0 PSNR: 17.581144784878884 LatentReg: 0.0
[TRAIN] Iter: 400 Loss: 0.01741687022149563 BG Loss: 0.0 PSNR: 17.590298842108833 LatentReg: 0.0
[TRAIN] Iter: 500 Loss: 0.029193900525569916 BG Loss: 0.0 PSNR: 15.347078761129636 LatentReg: 0.0

How to decouple the pose and expression

Hi,
Thank you for your wonderful work!
We are trying to re-implement your work with the dataset you provided. The model seems to fit the training set well, but some of the images in the test set cannot be predicted correctly, and the predicted images get worse when changing the pose or expression. It seems that the pose and expression are not decoupled in our model.
Is there any network design or training trick we should be aware of?

Thank you very much.

About real_to_nerf.py

Hello.
Thank you for your excellent work. NerFACE is helpful for my study.
I recorded my own video sequence to make a JSON file. Then, using FFmpeg, I extracted the frame images. However, I got the following error.

####################################################################################
Reading tracked face data
found 2877 images in image folder
Traceback (most recent call last):
  File "real_to_nerf.py", line 1532, in <module>
    generate_driven_test_sequence(configargs, 1000)
  File "real_to_nerf.py", line 1144, in generate_driven_test_sequence
    intrinsics = read_intrinsics(os.path.join(args.source, "intrinsics.txt"), im_size)
  File "real_to_nerf.py", line 64, in read_intrinsics
    all_intrinsics = np.genfromtxt(path_to_intrinsics_txt, dtype=None)
  File "/home/stephencha/.local/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1813, in genfromtxt
    fid = np.lib._datasource.open(fname, 'rt', encoding=encoding)
  File "/home/stephencha/.local/lib/python3.8/site-packages/numpy/lib/_datasource.py", line 193, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
  File "/home/stephencha/.local/lib/python3.8/site-packages/numpy/lib/_datasource.py", line 532, in open
    raise FileNotFoundError(f"{path} not found.")
FileNotFoundError: ./real2nerf/source/intrinsics.txt not found.
####################################################################################

I would like to know how to make an intrinsics.txt file.
I think I need to produce intrinsics.txt from face tracking to move forward, so I searched GitHub and found some useful open-source projects:

https://github.com/kimoktm/Face2face
https://github.com/YadiraF/DECA

The training JSON file contains camera, expression, and bounding box parameters. I think DECA (including FLAME) could produce these parameters. However, I do not know what should be inside the intrinsics.txt file.

Thanks.

Plan of code release

First of all, thank you for your great work!

The problem definition, as well as your method, is quite awesome too.

Are you planning to release the code?

Add details

Hi, I found that there are many configs. Please add some details in the README. @gafniguy

IndexError

After running train_transformed_rays.py, there is an error when running eval_transformed_rays.py:

Traceback (most recent call last):
  File "eval_transformed_rays.py", line 522, in <module>
    main()
  File "eval_transformed_rays.py", line 459, in main
    latent_code = latent_codes[index_of_image_after_train_shuffle].to(device) if use_latent_code else None
IndexError: index 1150 is out of bounds for dimension 0 with size 100

Do you have an idea how to fix this?

Thanks in advance!

How to get expression statistics?

Thanks for the previous support on making a continuous video.

In the real_to_nerf.py code, it says I need "expressions.txt" and "rigid.txt".
Also, in the JSON file of the person_1 dataset, there are "expressions" values that are already prepared.

How can I get these values from my own video or image sequence? I searched for face2face model code, but found nothing except demo code using a pix2pix model.

Questions about how to create learnable latent code at test time

Thanks for the great research!
During training, a latent code is learned for each frame, but during testing, no latent code is learned.
As far as I can see, the latent codes are all set to 0, and on a commented-out line there are signs that the average of the training latent codes was tried.
I would appreciate it if you could tell me how you create the latent code at test time.
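
For anyone with the same question, the two options described above (an all-zero code, or the average of the training codes) would look roughly like this; the tensor shape follows the training log ("initialized latent codes with shape 100 X 32") and the file path is hypothetical:

  import torch

  # Per-frame codes learned during training, shape (num_train_frames, 32)
  latent_codes = torch.load("latent_codes.pt")  # hypothetical checkpoint of the codes

  # Option 1: an all-zero code at test time (the default described above)
  test_code = torch.zeros(latent_codes.shape[1])

  # Option 2: the mean of the training codes (the commented-out variant mentioned above)
  test_code = latent_codes.mean(dim=0)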

Torch Errors

Hello!

I am trying to get this to work and am getting some weird torch errors. I am newish to ML so was a bit confused.

To get it running I had to make some changes to nerface_code/nerf-pytorch/nerf/train_utils.py; hopefully I did not break anything 😅

ray_directions_ablation is used here but when run_one_iter_of_nerf is called here it is not passed. The YML file in
the README has options.dataset.no_ndc as True so it fails. I also commented out some other lines that seemed to
be used for ablation runs:

  • Following the comment here I commented out the paragraph here
  • commented out a line here
  • changed ray_dirs_fake to None here

I am guessing that these were all for ablation studies?

The final error I am getting is this (I included the stdout from the program, and obfuscated the directory structure in the errors):

before signal registration
after registration
starting data loading
Done with data loading
done loading data
loading GT background to condition on
bg shape torch.Size([256, 256, 3])
should be  torch.Size([256, 256, 3])
initialized latent codes with shape 551 X 32
computing boundix boxes probability maps
Starting loop
  0%|          | 0/1000000 [00:00<?, ?it/s]$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1634272092750/work/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|          | 0/1000000 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "$REPODIR/4D-Facial-Avatars/nerface_code/nerf-pytorch/train_transformed_rays.py", line 593, in <module>
    main()
  File "$REPODIR/4D-Facial-Avatars/nerface_code/nerf-pytorch/train_transformed_rays.py", line 398, in main
    loss_total.backward()
  File "$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "$HOME/miniconda3/envs/new_nerf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2048, 128]], which is output 0 of ReluBackward0, is at version 2; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I am hoping it is just a versioning issue, but I am not sure.

This is very cool work and I would love to get it working! I would also echo and ask if there is a pretrained model floating around some where that I (and others) could take a look at!

Thanks!!

Question about background

Hi, Thanks for your amazing work!
Recently I'm trying to re-implement NerFACE. However, I'm struggling with the shaking background problem right now. 😢

  1. As the paper mentions: "The last sample on the ray r is assumed to lie on the background with a fixed color." Does this mean we only replace the RGB output of the last sample per ray while leaving the density output alone? (See the sketch after this list.)
  2. In the original NeRF paper, points on the rays are sampled randomly within N evenly-spaced bins, so does NerFACE use the same point-sampling strategy or any others?
  3. In the dave-dvp dataset, there are multiple background images rather than only one static background as the paper states. And there are compression differences when comparing any two background images, even between training images and backgrounds (also in the still background area!). I'm not sure whether these errors will affect the optimization of the radiance field or not.
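
Regarding point 1, one possible reading of that sentence is sketched below: only the RGB of the last sample per ray is overwritten with the background pixel colour, the predicted density is untouched, and the very large final depth interval used in standard NeRF then makes that sample close to opaque whenever its density is positive. This is my interpretation of the quoted sentence, not confirmed against the authors' code.

  import torch

  def composite_with_fixed_background(rgb, sigma, z_vals, bg_rgb):
      # rgb: (R, N, 3) per-sample colours, sigma: (R, N) densities,
      # z_vals: (R, N) sample depths, bg_rgb: (R, 3) background pixel colours.
      rgb = rgb.clone()
      rgb[:, -1, :] = bg_rgb  # fixed background colour on the last sample only
      dists = z_vals[:, 1:] - z_vals[:, :-1]
      dists = torch.cat([dists, 1e10 * torch.ones_like(dists[:, :1])], dim=-1)
      alpha = 1.0 - torch.exp(-torch.relu(sigma) * dists)
      trans = torch.cumprod(
          torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha[:, :-1] + 1e-10], dim=-1),
          dim=-1,
      )
      weights = alpha * trans
      return (weights.unsqueeze(-1) * rgb).sum(dim=1)  # (R, 3) rendered colours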

Hoping to get some suggestions! Thanks in advance!

Getting cuda memory issue. How to resolve this error?

Hi, I am getting "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.80 GiB total capacity; 4.51 GiB already allocated; 34.31 MiB free; 4.57 GiB reserved in total by PyTorch)" when running the train_transformed_rays.py script.

Please find the below details of error:

before signal registration
after registration
starting data loading
Done with data loading
done loading data
loading GT background to condition on
bg shape torch.Size([512, 512, 3])
should be  torch.Size([512, 512, 3])
initialized latent codes with shape 56 X 32
computing boundix boxes probability maps
Starting loop
  0%|                                                                                                                        | 0/1000000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_transformed_rays.py", line 608, in <module>
    main()
  File "train_transformed_rays.py", line 342, in main
    rgb_coarse, _, _, rgb_fine, _, _, weights = run_one_iter_of_nerf(
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 228, in run_one_iter_of_nerf
    pred = [
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 229, in <listcomp>
    predict_and_render_radiance(
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 129, in predict_and_render_radiance
    radiance_field = run_network(
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 24, in run_network
    preds = [network_fn(batch, expressions, latent_code) for batch in batches]
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/train_utils.py", line 24, in <listcomp>
    preds = [network_fn(batch, expressions, latent_code) for batch in batches]
  File "/home/santu/miniconda3/envs/nerf2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/santu/4D-Facial-Avatars/nerface_code/nerf-pytorch/nerf/models.py", line 249, in forward
    x = self.relu(x)
  File "/home/santu/miniconda3/envs/nerf2/lib/python3.8/site-packages/torch/nn/functional.py", line 1063, in relu
    result = torch.relu(input)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 5.80 GiB total capacity; 4.58 GiB already allocated; 46.19 MiB free; 4.64 GiB reserved in total by PyTorch)

Note: my system configuration is an i7 CPU, 1 TB SSD, 16 GB RAM, and a GTX 2060 graphics card. Can you please confirm whether the system configuration is the problem or something else, and if so, what is the minimum system requirement?

How to get the bounding box?

Hi, I'm trying to train this model on other videos. May I ask how to get the bounding box? So far I have used a face parsing model to get a semantic segmentation and extracted a bounding box that includes both the face and the torso. It seems larger than the bounding box that comes with NerFACE. I'm not sure if this is the correct way. Thanks for your time.
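
For what it's worth, extracting a bounding box from a segmentation mask can be done as in the sketch below; whether it should cover the face only or face plus torso, and whether NerFACE expects it normalized to [0, 1] in (top, bottom, left, right) order, is exactly what the question is about, so treat those choices as assumptions.

  import numpy as np

  def mask_to_bbox(mask, normalize=True):
      # mask: (H, W) array, nonzero where the person (or face) is segmented.
      rows = np.any(mask, axis=1)
      cols = np.any(mask, axis=0)
      top, bottom = np.where(rows)[0][[0, -1]]
      left, right = np.where(cols)[0][[0, -1]]
      bbox = np.array([top, bottom, left, right], dtype=np.float32)
      if normalize:
          h, w = mask.shape
          bbox /= np.array([h, h, w, w], dtype=np.float32)
      return bbox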

Results are incorrect when using render_debug_camera_matrix

Hi Guy,

Thanks for your excellent work.

I use the function render_debug_camera_matrix to render the overlay using the cameras. The parameters fed into the function are the transform_matrix of a single image and the intrinsics provided in your dataset (i.e., transforms_test.json). Could you please tell me the correct way to use this function, or whether any processing should be done on the two parameters before feeding them into the function?
[attached image]

Validation and testing results are not good

Hi,
Thanks for the great work and the kind release of the implementation. I followed the modification in Issue #16 and could train the model with your default .yml file. However, the validation and testing results are not visually good (using checkpoints at 500k/600k/... iterations). Most of them are blurry and some of them contain only the background. The training losses are decreasing, but the coarse loss on the validation set is not. Could you please provide some suggestions on how to tune the parameters or fix the model training? Thanks a lot!

The validation results and loss curves from the TensorBoard snapshots are attached below:
[attached images: test_result, train_loss]

Could not get the PSNR result mentioned in the paper

Hi, thanks for your great work! I trained the model and got visually good results. A frame result is attached below:
[attached image: 0185]
I trained for 40k iterations; however, the validation PSNR is less than 20, far below the reported PSNR of 26.85. I wonder if this is because of a different dataset. Could you please provide the dataset on which the PSNR of 26.85 from the paper can be reproduced?

Questions about the latent code

Hello, gafniguy, thank you for your brilliant paper and wonderful work.

I'm a little curious about the latent code. The paper says the latent code can compensate for errors in the facial expression and pose estimation and make the image sharper. I have three questions:

  1. How is the latent code obtained? Sorry, I didn't find the description in the paper. Is it produced by a trained network, like LeNet or ResNet?
  2. Why can the latent code make the reconstruction sharper? Is it because the code brings in some information such as edges?
  3. Why is a fixed latent code used for the test set? Will it cause a problem if the first frame of the training set is very different from the test set?
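
On question 1, the training log elsewhere in this thread list ("initialized latent codes with shape 100 X 32") suggests one learnable 32-D code per training frame, optimized jointly with the MLP in auto-decoder fashion rather than predicted by an encoder such as LeNet or ResNet. A minimal sketch of that pattern, with a toy stand-in for the conditional NeRF MLP (not the repo's actual model):

  import torch
  import torch.nn as nn

  num_frames, expr_dim, code_dim = 100, 76, 32

  # One learnable 32-D code per training frame, optimized jointly with the network.
  latent_codes = nn.Parameter(torch.zeros(num_frames, code_dim))

  # Toy placeholder for the conditional NeRF MLP (the real model is much deeper).
  nerf_mlp = nn.Linear(3 + expr_dim + code_dim, 4)

  optimizer = torch.optim.Adam(list(nerf_mlp.parameters()) + [latent_codes], lr=5e-4)

  # In the training loop, the code of the sampled frame conditions the MLP and
  # receives gradients through the photometric loss, letting it absorb per-frame
  # expression/pose estimation errors:
  frame_idx, n_pts = 0, 8
  inputs = torch.cat([torch.randn(n_pts, 3),
                      torch.randn(n_pts, expr_dim),
                      latent_codes[frame_idx].expand(n_pts, -1)], dim=-1)
  rgb_sigma = nerf_mlp(inputs)  # (n_pts, 4): RGB + density in a real model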

How to generate customized dataset?

Hello, thanks for your work. Recently I have wanted to apply your approach to my own dataset. I have captured a portrait video. However, I do not know how to get the expression values and the transformation matrices. Could you provide more details about how to estimate those parameters?

Question about the coordinate system

Hi, many thanks for your great work! I have a question about this line of code. In NeRF, there is an OpenGL -> OpenCV conversion, so y and z are multiplied by -1. Is that right? But why do you also multiply by -1? I don't know when to multiply by -1.
Thanks for your time!
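
For context, the convention difference is: OpenCV cameras look along +z with y pointing down, while OpenGL/NeRF cameras look along -z with y pointing up, so converting a camera-to-world matrix between the two flips the y and z camera axes (the "multiply by -1"). A small sketch of that flip, not taken from the repo:

  import numpy as np

  # Right-multiplying a camera-to-world matrix by this flip negates the camera's
  # y and z axes; applying it twice is the identity, which is why the same
  # "multiply y and z by -1" appears in both conversion directions.
  FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

  def opencv_to_opengl_c2w(c2w_cv):
      return c2w_cv @ FLIP_YZ

  def opengl_to_opencv_c2w(c2w_gl):
      return c2w_gl @ FLIP_YZ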

A problem about the test results

Thanks for your wonderful work!

When testing with eval_transformed_rays.py, I got images with clearly visible pixelation, as shown in the figure below. The model was trained for 400K iterations. The configuration file is dave_DVP_lcode_fixed_bg_256_paper_model.yml.

What can I do to get clear generated images?
[attached image: 0000]

How much RAM required for training?

I was able to set up the environment on Windows. Now when I run training I run into this issue:

Traceback (most recent call last):
  File "F:\4D-Facial-Avatars\nerface_code\nerf-pytorch\train_transformed_rays.py", line 602, in <module>
    main()
  File "F:\4D-Facial-Avatars\nerface_code\nerf-pytorch\train_transformed_rays.py", line 62, in main
    images, poses, render_poses, hwf, i_split, expressions, _, bboxs = load_flame_data(
  File "F:\4D-Facial-Avatars\nerface_code\nerf-pytorch\nerf\load_flame.py", line 87, in load_flame_data
    imgs = (np.array(imgs) / 255.0).astype(np.float32)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 4.03 GiB for an array with shape (5507, 512, 512, 3) and data type uint8
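
For reference, the quoted line also creates a float64 intermediate (np.array(imgs) / 255.0) before the float32 cast, so peak memory is several times the size of the uint8 stack; and even as float32, 5507 frames at 512x512x3 are about 16 GiB, so on a 16 GB machine you would likely also need to subsample frames (e.g. via testskip) or use half_res. A memory-leaner variant of that line, assuming downstream code accepts float32:

  import numpy as np

  def stack_images_float32(imgs):
      # Allocate the float32 output once and fill it frame by frame, avoiding both
      # the intermediate uint8 stack and the float64 array produced by `/ 255.0`.
      out = np.empty((len(imgs), *imgs[0].shape), dtype=np.float32)
      for i, im in enumerate(imgs):
          out[i] = im  # implicit cast to float32
      out /= 255.0
      return out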

index out of bounds

After successfully training, there is an error during evaluation:

File "eval_transformed_rays.py", line 433, in main
_, ray_directions_ablation = get_ray_bundle(hwf[0], hwf[1], hwf[2], render_poses[240+i][:3, :4])
IndexError: index 1000 is out of bounds for dimension 0 with size 1000

Do you have an idea how to fix this?

Thanks in advance!
