
ziqiaopeng / synctalk

887 stars · 92 forks · 29.56 MB

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"

Home Page: https://ziqiaopeng.github.io/synctalk/

License: Other

Python 61.72% · Cuda 35.19% · C 2.28% · C++ 0.81%
audio-driven-talking-face cvpr cvpr2024 talking-face talking-face-generation talking-head

synctalk's People

Contributors

ziqiaopeng


synctalk's Issues

How to use the AVE encoder?

I used the AVE audio encoder to replace ER-NeRF's audio encoder, and unfortunately I got a worse result.

UserWarning: No faces were detected.

When I follow the instructions in the README to run the evaluation code, an error occurs. How should I solve it?

python main.py data/May --workspace model/trial_may -O --test --asr_model ave

Namespace(H=450, O=True, W=450, amb_aud_loss=1, amb_dim=2, amb_eye_loss=1, asr=False, asr_model='ave', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='', bg_img='', bound=1, bs_area='upper', ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=20000, init_lips=False, iters=200000, l=10, lambda_amb=0.1, lr=0.01, lr_net=0.001, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/May', portrait=False, preload=0, pyramid_loss=0, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=True, test_train=False, torso=False, torso_shrink=0.8, train_camera=False, unc_loss=1, update_extra_interval=16, upsample_steps=0, warmup_step=10000, workspace='model/trial_may')
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
[INFO] Trainer: ngp | 2024-03-20_03-36-57 | cuda | fp16 | model/trial_may
[INFO] #parameters: 768165
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
[INFO] load 553 test frames.
[INFO] load  aud_features: torch.Size([6072, 1, 512])
Loading test data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 553/553 [00:00<00:00, 6888.24it/s]
[INFO] eye_area: -0.06423357874155045 - 0.9898659586906433
==> Start Test, save results to model/trial_may/results
100% 551/553 [00:12<00:00, 44.98it/s]==> Finished Test.
100% 553/553 [00:14<00:00, 38.84it/s]
++> Evaluate at epoch 0 ...
  0% 0/553 [00:00<?, ?it/s]/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/face_alignment/api.py:147: UserWarning: No faces were detected.
  warnings.warn("No faces were detected.")
Traceback (most recent call last):
  File "main.py", line 211, in <module>
    trainer.evaluate(test_loader)
  File "/home/sg/github/SyncTalk/nerf_triplane/utils.py", line 1039, in evaluate
    self.evaluate_one_epoch(loader, name)
  File "/home/sg/github/SyncTalk/nerf_triplane/utils.py", line 1403, in evaluate_one_epoch
    metric.update(preds, truths)
  File "/home/sg/github/SyncTalk/nerf_triplane/utils.py", line 589, in update
    lms_pred = self.get_landmarks(preds)
  File "/home/sg/github/SyncTalk/nerf_triplane/utils.py", line 560, in get_landmarks
    lms = self.predictor.get_landmarks(img)[-1]
TypeError: 'NoneType' object is not subscriptable
  0% 0/553 [00:00<?, ?it/s]
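Note the "[WARN] No checkpoint found, model randomly initialized." line earlier in the log: with random weights the rendered frames contain no recognizable face, so face_alignment's get_landmarks() returns None and the landmark metric crashes. Loading a trained checkpoint addresses the root cause; a defensive guard around the failing call (a sketch, not the repository's actual code) could look like:

    # In nerf_triplane/utils.py, get_landmarks (sketch):
    # face_alignment's get_landmarks() returns None when no face is
    # detected, so guard before indexing into the result.
    lms_list = self.predictor.get_landmarks(img)
    if lms_list is None:
        return None  # caller should skip the landmark metric for this frame
    lms = lms_list[-1]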

Code

Add code, don't tease SOTA

Can this project run on Windows 10?

Hi, I've recently become interested in talking head generation, and the demo video for this project looks really awesome.
I want to try it, but I'm getting errors while setting up the environment...
Anyway, can this project run on Windows 10?

The expressions and head poses for different driving audio clips are the same

Hi, thanks for sharing the great work.

I managed to test the model using customized audio clips. However, I noticed that the expressions and head poses in the generated videos follow the same motion pattern, even when provided with different audio clips.

Could you please advise how to generate diverse motions? And is it possible to explicitly control the generated expressions and head poses via user inputs?

Looking forward to the responses. Thanks!

The shape of aud_features?

    first_frame, last_frame = outputs[:1], outputs[-1:]
    aud_features = torch.cat([first_frame.repeat(2, 1), outputs, last_frame.repeat(2, 1)], dim=0).numpy()

Why are a few copies of the first and last frames prepended and appended to the extracted audio features? Do I also need to process aud_features this way during training?
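The repeated first/last frames pad the sequence so that a context window centered on any frame stays in bounds at the boundaries, and the same padding would normally be applied wherever these features are consumed, training included (an inference from the snippet, not confirmed by the authors). A minimal sketch (the function name is hypothetical; the pad width of 2 matches the snippet above):

    import torch

    def pad_audio_features(outputs: torch.Tensor, pad: int = 2) -> torch.Tensor:
        # Repeat the first and last frames `pad` times so a context window
        # centered on frame i never indexes outside the sequence.
        first, last = outputs[:1], outputs[-1:]
        return torch.cat([first.repeat(pad, 1), outputs, last.repeat(pad, 1)], dim=0)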

Error in grid_encode_forward — how should I solve it?

project/SyncTalk/gridencoder/grid.py", line 222, in forward
_backend.grid_encode_forward(inputs, embeddings, offsets, outputs, B, D, C, L, S, H, dy_dx, gridtype, align_corners)
TypeError: grid_encode_forward(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: int, arg5: int, arg6: int, arg7: int, arg8: float, arg9: int, arg10: Optional[torch.Tensor], arg11: int, arg12: bool, arg13: int) -> None

Invoked with:
    tensor([[0.5000, 0.5000], ..., [0.5000, 0.5000]], device='cuda:0'),
    Parameter containing: tensor([[-3.6624e-07], [-8.1951e-07], [3.8581e-07], ..., [-4.4871e-02], [4.7605e-02], [3.7443e-01]], device='cuda:0', requires_grad=True),
    tensor([0, 4232, 10480, 19512, 32512, 48896, 65280, 81664, 98048, 114432, 130816, 147200, 163584], device='cuda:0', dtype=torch.int32),
    tensor([[[0.], [0.], [0.], ...]], device='cuda:0'),
    262272, 2, 1, 12, 0.2727272727272726, 64, None, 0, False
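A signature mismatch like this usually means the compiled gridencoder extension was built from a different source revision than the grid.py wrapper calling it (an assumption; the log alone does not show the build state). Rebuilding the extension inside the current environment is worth trying:

    pip install ./gridencoder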

Colab demo

When will you release a Colab notebook? Thanks 😊

"face_rect"?

"face_rect": [
61,
0,
384,
468
] means [xmin, ymin, w, h]?
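If the convention is indeed [xmin, ymin, w, h], a quick visual check against a frame (the image path below is hypothetical):

    import cv2

    img = cv2.imread("data/May/gt_imgs/0.jpg")  # hypothetical frame path
    x, y, w, h = 61, 0, 384, 468
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("0_with_face_rect.jpg", img)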

Is the code being uploaded? Does it train quickly?

Thank you for your outstanding work.
Is the code being uploaded? Does it train quickly?


--asr_model Hubert?

  1. I retrained on the May data with HuBERT, and the resulting character's mouth jitters very quickly. Do I need to change any other settings when training with HuBERT?
  2. Why is HuBERT's self.audio_in_dim = 27 here, when it is 1024 in ER-NeRF?

    if 'esperanto' in self.opt.asr_model:
        self.audio_in_dim = 44
    elif 'deepspeech' in self.opt.asr_model:
        self.audio_in_dim = 29
    elif 'hubert' in self.opt.asr_model:
        self.audio_in_dim = 27
    else:
        self.audio_in_dim = 32
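For reference, the AVE features in the logs elsewhere on this page load with shape [T, 1, 512]. A quick sanity check of the dimension your extracted HuBERT features actually carry (the file name below is hypothetical; adjust it to your extractor's output):

    import numpy as np

    feats = np.load("data/May/aud_hu.npy")  # hypothetical feature file
    print(feats.shape)  # raw HuBERT hidden states are typically 1024-dim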

FileNotFoundError: [Errno 2] No such file or directory: 'data/May/transforms_val.json'

Namespace(path='data/May', O=True, test=True, test_train=False, data_range=[0, -1], workspace='model/trial_may', seed=0, iters=200000, lr=0.01, lr_net=0.001, ckpt='latest', num_rays=65536, cuda_ray=True, max_steps=16, num_steps=16, upsample_steps=0, update_extra_interval=16, max_ray_batch=4096, warmup_step=10000, amb_aud_loss=1, amb_eye_loss=1, unc_loss=1, lambda_amb=0.1, pyramid_loss=0, fp16=True, bg_img='', fbg=False, exp_eye=True, fix_eye=-1, smooth_eye=False, bs_area='upper', torso_shrink=0.8, color_space='srgb', preload=0, bound=1, scale=4, offset=[0, 0, 0], dt_gamma=0.00390625, min_near=0.05, density_thresh=10, density_thresh_torso=0.01, patch_size=1, init_lips=False, finetune_lips=False, smooth_lips=False, torso=False, head_ckpt='', gui=False, W=450, H=450, radius=3.35, fovy=21.24, max_spp=1, att=2, aud='', emb=False, portrait=False, ind_dim=4, ind_num=20000, ind_dim_torso=8, amb_dim=2, part=False, part2=False, train_camera=False, smooth_path=False, smooth_path_window=7, asr=False, asr_wav='', asr_play=False, asr_model='ave', asr_save_feats=False, fps=50, l=10, m=50, r=10)
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth
100% 233M/233M [00:01<00:00, 244MB/s]
Loading model from: /usr/local/lib/python3.10/dist-packages/lpips/weights/v0.1/alex.pth
Downloading: "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" to /root/.cache/torch/hub/checkpoints/s3fd-619a316812.pth
100% 85.7M/85.7M [14:43<00:00, 102kB/s]
Downloading: "https://www.adrianbulat.com/downloads/python-fan/2DFAN4-cd938726ad.zip" to /root/.cache/torch/hub/checkpoints/2DFAN4-cd938726ad.zip
100% 91.9M/91.9M [00:04<00:00, 21.7MB/s]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /usr/local/lib/python3.10/dist-packages/lpips/weights/v0.1/alex.pth
[INFO] Trainer: ngp | 2024-03-11_15-57-00 | cuda | fp16 | model/trial_may
[INFO] #parameters: 768165
[INFO] Loading latest checkpoint ...
[INFO] Latest checkpoint is model/trial_may/checkpoints/ngp_ep0019.pth
[INFO] loaded model.
[INFO] load at epoch 19, global step 104880
[WARN] Failed to load optimizer.
[INFO] loaded scheduler.
[INFO] loaded scaler.
Traceback (most recent call last):
  File "/content/drive/MyDrive/SyncTalk/main.py", line 192, in <module>
    test_loader = NeRFDataset(opt, device=device, type='test').dataloader()
  File "/content/drive/MyDrive/SyncTalk/nerf_triplane/provider.py", line 127, in __init__
    with open(os.path.join(self.root_path, f'transforms_{_split}.json'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/May/transforms_val.json'
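The test split is looking for data/May/transforms_val.json, which the data preprocessing step is expected to produce alongside transforms_train.json. A quick check of which split files exist (a sketch):

    import os

    root = "data/May"
    for split in ("train", "val", "test"):
        p = os.path.join(root, f"transforms_{split}.json")
        print(p, "exists" if os.path.exists(p) else "MISSING")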

size mismatch for sigma_net.net.0.weight: copying a param with shape torch.Size([64, 75]) from checkpoint, the shape in current model is torch.Size([64, 68])

D:\anaconda3\python.exe D:\work\SyncTalk\main.py
Namespace(path='data/May', O=False, test=False, test_train=False, data_range=[0, -1], workspace='model/trial_may', seed=0, iters=200000, lr=0.01, lr_net=0.001, ckpt='latest', num_rays=65536, cuda_ray=True, max_steps=16, num_steps=16, upsample_steps=0, update_extra_interval=16, max_ray_batch=4096, warmup_step=10000, amb_aud_loss=1, amb_eye_loss=1, unc_loss=1, lambda_amb=0.1, pyramid_loss=0, fp16=False, bg_img='', fbg=False, exp_eye=False, fix_eye=-1, smooth_eye=False, bs_area='upper', torso_shrink=0.8, color_space='srgb', preload=0, bound=1, scale=4, offset=[0, 0, 0], dt_gamma=0.00390625, min_near=0.05, density_thresh=10, density_thresh_torso=0.01, patch_size=1, init_lips=False, finetune_lips=False, smooth_lips=False, torso=False, head_ckpt='', gui=False, W=450, H=450, radius=3.35, fovy=21.24, max_spp=1, att=2, aud='demo/test.wav', emb=False, portrait=False, ind_dim=4, ind_num=20000, ind_dim_torso=8, amb_dim=2, part=False, part2=False, train_camera=False, smooth_path=False, smooth_path_window=7, asr=False, asr_wav='', asr_play=False, asr_model='ave', asr_save_feats=False, fps=50, l=10, m=50, r=10)
[INFO] load 5520 train frames.
[INFO] load demo/test.wav aud_features: torch.Size([331, 1, 512])
Loading train data: 100%|██████████| 5520/5520 [00:02<00:00, 2019.28it/s]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
D:\anaconda3\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
D:\anaconda3\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: D:\anaconda3\lib\site-packages\lpips\weights\v0.1\alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
D:\anaconda3\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
D:\anaconda3\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: D:\anaconda3\lib\site-packages\lpips\weights\v0.1\alex.pth
[INFO] Trainer: ngp | 2024-03-10_11-51-37 | cuda | fp32 | model/trial_may
[INFO] #parameters: 767717
[INFO] Loading latest checkpoint ...
[INFO] Latest checkpoint is model/trial_may\checkpoints\ngp_ep0019.pth
Traceback (most recent call last):
  File "D:\work\SyncTalk\main.py", line 237, in <module>
    trainer = Trainer('ngp', opt, model, device=device, workspace=opt.workspace, optimizer=optimizer, criterion=criterion, ema_decay=0.95, fp16=opt.fp16, lr_scheduler=scheduler, scheduler_update_every_step=True, metrics=metrics, use_checkpoint=opt.ckpt, eval_interval=eval_interval)
  File "D:\work\SyncTalk\nerf_triplane\utils.py", line 735, in __init__
    self.load_checkpoint()
  File "D:\work\SyncTalk\nerf_triplane\utils.py", line 1532, in load_checkpoint
    missing_keys, unexpected_keys = self.model.load_state_dict(checkpoint_dict['model'], strict=False)
  File "D:\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for NeRFNetwork:
  size mismatch for sigma_net.net.0.weight: copying a param with shape torch.Size([64, 75]) from checkpoint, the shape in current model is torch.Size([64, 68]).

Process finished with exit code 1
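The Namespace above shows O=False and exp_eye=False, while the checkpoint was evidently trained with the extra conditioning those flags enable (75 vs. 68 input channels on sigma_net's first layer). The likely fix, inferred from the logs rather than confirmed, is to pass the -O preset as in the README:

    python main.py data/May --workspace model/trial_may -O --test --asr_model ave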

Implementation of the neck padding Cn in preprocessing

Hello, I'd like to ask how the neck region is filled during preprocessing. I adapted the ER-NeRF code for this.

Following the idea in the paper, I take the RGB color of the neck region from the corresponding image and fill with it, but the result looks different. Could you explain roughly how this is implemented?

import numpy as np

# My attempt (adapted from ER-NeRF): fill the inpaint region with the mean
# color and alpha of the visible neck pixels. neck_part and inpaint_mask are
# boolean masks; torso_image is HxWx3, torso_alpha is HxW.
index = np.where(neck_part)
neck_part_image = torso_image[index[0], index[1], :]
neck_part_alpha = torso_alpha[index[0], index[1]]
mean_neck_part_color = np.mean(neck_part_image, axis=0)
mean_neck_part_alpha = np.mean(neck_part_alpha)
torso_image[inpaint_mask] = mean_neck_part_color
torso_alpha[inpaint_mask] = mean_neck_part_alpha


About the uncertainty loss

In this project, loss_u and loss_static_uncertainty use different weights from ER-NeRF. Could you explain the specific reason for this?

Inference results have no audio

The inference results are just like the evaluation output and have no audio. Has anyone else had the same problem?
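A common workaround is to mux the driving audio back into the rendered video with ffmpeg (the file names below are assumptions; substitute the actual paths from your run):

    ffmpeg -i model/trial_may/results/output.mp4 -i demo/test.wav -c:v copy -c:a aac output_with_audio.mp4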

preprocessing mask face

Thanks for your amazing work!


What filter did you apply to get the left image after parsing?
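For anyone experimenting before the authors reply: one plausible way to get a cleaned mask from a raw parsing result (a guess at the kind of filter involved, not the confirmed method) is a morphological close followed by a blur:

    import cv2
    import numpy as np

    mask = cv2.imread("parsing_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
    kernel = np.ones((5, 5), np.uint8)
    smoothed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    smoothed = cv2.GaussianBlur(smoothed, (9, 9), 0)            # soften jagged edges
    cv2.imwrite("parsing_mask_smoothed.png", smoothed)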
