
ziqiaopeng / synctalk

887 stars · 92 forks · 29.56 MB

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"

Home Page: https://ziqiaopeng.github.io/synctalk/

License: Other

Python 61.72% · Cuda 35.19% · C 2.28% · C++ 0.81%
audio-driven-talking-face cvpr cvpr2024 talking-face talking-face-generation talking-head

synctalk's People

Contributors

ziqiaopeng


synctalk's Issues

How to use the AVE encoder?

I used the AVE audio encoder to replace ER-NeRF's audio encoder, and unfortunately I got a worse result.

UserWarning: No faces were detected.

When I follow the instructions in the README to run the evaluation code, an error occurs. How should I solve it?

python main.py data/May --workspace model/trial_may -O --test --asr_model ave

Namespace(H=450, O=True, W=450, amb_aud_loss=1, amb_dim=2, amb_eye_loss=1, asr=False, asr_model='ave', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='', bg_img='', bound=1, bs_area='upper', ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=20000, init_lips=False, iters=200000, l=10, lambda_amb=0.1, lr=0.01, lr_net=0.001, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/May', portrait=False, preload=0, pyramid_loss=0, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=True, test_train=False, torso=False, torso_shrink=0.8, train_camera=False, unc_loss=1, update_extra_interval=16, upsample_steps=0, warmup_step=10000, workspace='model/trial_may')
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
[INFO] Trainer: ngp | 2024-03-20_03-36-57 | cuda | fp16 | model/trial_may
[INFO] #parameters: 768165
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
[INFO] load 553 test frames.
[INFO] load  aud_features: torch.Size([6072, 1, 512])
Loading test data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 553/553 [00:00<00:00, 6888.24it/s]
[INFO] eye_area: -0.06423357874155045 - 0.9898659586906433
==> Start Test, save results to model/trial_may/results
100% 551/553 [00:12<00:00, 44.98it/s]==> Finished Test.
100% 553/553 [00:14<00:00, 38.84it/s]
++> Evaluate at epoch 0 ...
  0% 0/553 [00:00<?, ?it/s]/home/sg/miniconda3/envs/synctalk/lib/python3.8/site-packages/face_alignment/api.py:147: UserWarning: No faces were detected.
  warnings.warn("No faces were detected.")
Traceback (most recent call last):
  File "main.py", line 211, in <module>
    trainer.evaluate(test_loader)
  File "/home/sg/github/SyncTalk/nerf_triplane/utils.py", line 1039, in evaluate
    self.evaluate_one_epoch(loader, name)
  File "/home/sg/github/SyncTalk/nerf_triplane/utils.py", line 1403, in evaluate_one_epoch
    metric.update(preds, truths)
  File "/home/sg/github/SyncTalk/nerf_triplane/utils.py", line 589, in update
    lms_pred = self.get_landmarks(preds)
  File "/home/sg/github/SyncTalk/nerf_triplane/utils.py", line 560, in get_landmarks
    lms = self.predictor.get_landmarks(img)[-1]
TypeError: 'NoneType' object is not subscriptable
  0% 0/553 [00:00<?, ?it/s]
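Note the "[WARN] No checkpoint found, model randomly initialized." line earlier in the log: with random weights the rendered frames contain no recognizable face, so face_alignment's get_landmarks() returns None and the landmark metric crashes. Loading a trained checkpoint addresses the root cause; a defensive guard around the failing call (a sketch, not the repository's actual code) could look like:

    # In nerf_triplane/utils.py, get_landmarks (sketch):
    # face_alignment's get_landmarks() returns None when no face is
    # detected, so guard before indexing into the result.
    lms_list = self.predictor.get_landmarks(img)
    if lms_list is None:
        return None  # caller should skip the landmark metric for this frame
    lms = lms_list[-1]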

Code

Add code, don't tease SOTA

Can this project run on Windows 10?

Hi, I've recently become interested in talking head generation, and the demo video for this project looks really awesome.
I want to try it, but I'm getting errors while setting up the environment...
Anyway, can this project run on Windows 10?

The expressions and head poses for different driving audio clips are the same

Hi, thanks for sharing the great work.

I managed to test the model using customized audio clips. However, I noticed that the expressions and head poses in the generated videos follow the same motion pattern, even when provided with different audio clips.

Could you please advise how to generate diverse motions? And is it possible to explicitly control the generated expressions and head poses via user inputs?

Looking forward to the responses. Thanks!

The shape of aud_features?

    first_frame, last_frame = outputs[:1], outputs[-1:]
    aud_features = torch.cat([first_frame.repeat(2, 1), outputs, last_frame.repeat(2, 1)], dim=0).numpy()

Why are a few copies of the first and last frames prepended and appended to the extracted audio features? Do I also need to process aud_features this way during training?
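The repeated first/last frames pad the sequence so that a context window centered on any frame stays in bounds at the boundaries, and the same padding would normally be applied wherever these features are consumed, training included (an inference from the snippet, not confirmed by the authors). A minimal sketch (the function name is hypothetical; the pad width of 2 matches the snippet above):

    import torch

    def pad_audio_features(outputs: torch.Tensor, pad: int = 2) -> torch.Tensor:
        # Repeat the first and last frames `pad` times so a context window
        # centered on frame i never indexes outside the sequence.
        first, last = outputs[:1], outputs[-1:]
        return torch.cat([first.repeat(pad, 1), outputs, last.repeat(pad, 1)], dim=0)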

Error in grid_encode_forward — how should I solve it?

project/SyncTalk/gridencoder/grid.py", line 222, in forward
_backend.grid_encode_forward(inputs, embeddings, offsets, outputs, B, D, C, L, S, H, dy_dx, gridtype, align_corners)
TypeError: grid_encode_forward(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: int, arg5: int, arg6: int, arg7: int, arg8: float, arg9: int, arg10: Optional[torch.Tensor], arg11: int, arg12: bool, arg13: int) -> None

Invoked with:
    tensor([[0.5000, 0.5000], ..., [0.5000, 0.5000]], device='cuda:0'),
    Parameter containing: tensor([[-3.6624e-07], [-8.1951e-07], [3.8581e-07], ..., [-4.4871e-02], [4.7605e-02], [3.7443e-01]], device='cuda:0', requires_grad=True),
    tensor([0, 4232, 10480, 19512, 32512, 48896, 65280, 81664, 98048, 114432, 130816, 147200, 163584], device='cuda:0', dtype=torch.int32),
    tensor([[[0.], [0.], [0.], ...]], device='cuda:0'),
    262272, 2, 1, 12, 0.2727272727272726, 64, None, 0, False
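A signature mismatch like this usually means the compiled gridencoder extension was built from a different source revision than the grid.py wrapper calling it (an assumption; the log alone does not show the build state). Rebuilding the extension inside the current environment is worth trying:

    pip install ./gridencoder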

Colab demo

When will you release a Colab notebook? Thanks 😊

"face_rect"?

"face_rect": [
61,
0,
384,
468
] means [xmin, ymin, w, h]?
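If the convention is indeed [xmin, ymin, w, h], a quick visual check against a frame (the image path below is hypothetical):

    import cv2

    img = cv2.imread("data/May/gt_imgs/0.jpg")  # hypothetical frame path
    x, y, w, h = 61, 0, 384, 468
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("0_with_face_rect.jpg", img)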

Is the code being uploaded? Does it train quickly?

Thank you for your outstanding work.
Is the code being uploaded? Does it train quickly?


--asr_model Hubert?

  1. I retrained on the May data with HuBERT, and the resulting character's mouth jitters very quickly. Do I need to change any other settings when training with HuBERT?
  2. Why is HuBERT's self.audio_in_dim = 27 here, when it is 1024 in ER-NeRF?

    if 'esperanto' in self.opt.asr_model:
        self.audio_in_dim = 44
    elif 'deepspeech' in self.opt.asr_model:
        self.audio_in_dim = 29
    elif 'hubert' in self.opt.asr_model:
        self.audio_in_dim = 27
    else:
        self.audio_in_dim = 32
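For reference, the AVE features in the logs elsewhere on this page load with shape [T, 1, 512]. A quick sanity check of the dimension your extracted HuBERT features actually carry (the file name below is hypothetical; adjust it to your extractor's output):

    import numpy as np

    feats = np.load("data/May/aud_hu.npy")  # hypothetical feature file
    print(feats.shape)  # raw HuBERT hidden states are typically 1024-dim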

FileNotFoundError: [Errno 2] No such file or directory: 'data/May/transforms_val.json'

Namespace(path='data/May', O=True, test=True, test_train=False, data_range=[0, -1], workspace='model/trial_may', seed=0, iters=200000, lr=0.01, lr_net=0.001, ckpt='latest', num_rays=65536, cuda_ray=True, max_steps=16, num_steps=16, upsample_steps=0, update_extra_interval=16, max_ray_batch=4096, warmup_step=10000, amb_aud_loss=1, amb_eye_loss=1, unc_loss=1, lambda_amb=0.1, pyramid_loss=0, fp16=True, bg_img='', fbg=False, exp_eye=True, fix_eye=-1, smooth_eye=False, bs_area='upper', torso_shrink=0.8, color_space='srgb', preload=0, bound=1, scale=4, offset=[0, 0, 0], dt_gamma=0.00390625, min_near=0.05, density_thresh=10, density_thresh_torso=0.01, patch_size=1, init_lips=False, finetune_lips=False, smooth_lips=False, torso=False, head_ckpt='', gui=False, W=450, H=450, radius=3.35, fovy=21.24, max_spp=1, att=2, aud='', emb=False, portrait=False, ind_dim=4, ind_num=20000, ind_dim_torso=8, amb_dim=2, part=False, part2=False, train_camera=False, smooth_path=False, smooth_path_window=7, asr=False, asr_wav='', asr_play=False, asr_model='ave', asr_save_feats=False, fps=50, l=10, m=50, r=10)
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth
100% 233M/233M [00:01<00:00, 244MB/s]
Loading model from: /usr/local/lib/python3.10/dist-packages/lpips/weights/v0.1/alex.pth
Downloading: "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth" to /root/.cache/torch/hub/checkpoints/s3fd-619a316812.pth
100% 85.7M/85.7M [14:43<00:00, 102kB/s]
Downloading: "https://www.adrianbulat.com/downloads/python-fan/2DFAN4-cd938726ad.zip" to /root/.cache/torch/hub/checkpoints/2DFAN4-cd938726ad.zip
100% 91.9M/91.9M [00:04<00:00, 21.7MB/s]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /usr/local/lib/python3.10/dist-packages/lpips/weights/v0.1/alex.pth
[INFO] Trainer: ngp | 2024-03-11_15-57-00 | cuda | fp16 | model/trial_may
[INFO] #parameters: 768165
[INFO] Loading latest checkpoint ...
[INFO] Latest checkpoint is model/trial_may/checkpoints/ngp_ep0019.pth
[INFO] loaded model.
[INFO] load at epoch 19, global step 104880
[WARN] Failed to load optimizer.
[INFO] loaded scheduler.
[INFO] loaded scaler.
Traceback (most recent call last):
  File "/content/drive/MyDrive/SyncTalk/main.py", line 192, in <module>
    test_loader = NeRFDataset(opt, device=device, type='test').dataloader()
  File "/content/drive/MyDrive/SyncTalk/nerf_triplane/provider.py", line 127, in __init__
    with open(os.path.join(self.root_path, f'transforms_{_split}.json'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/May/transforms_val.json'
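The test split is looking for data/May/transforms_val.json, which the data preprocessing step is expected to produce alongside transforms_train.json. A quick check of which split files exist (a sketch):

    import os

    root = "data/May"
    for split in ("train", "val", "test"):
        p = os.path.join(root, f"transforms_{split}.json")
        print(p, "exists" if os.path.exists(p) else "MISSING")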

size mismatch for sigma_net.net.0.weight: copying a param with shape torch.Size([64, 75]) from checkpoint, the shape in current model is torch.Size([64, 68])

D:\anaconda3\python.exe D:\work\SyncTalk\main.py
Namespace(path='data/May', O=False, test=False, test_train=False, data_range=[0, -1], workspace='model/trial_may', seed=0, iters=200000, lr=0.01, lr_net=0.001, ckpt='latest', num_rays=65536, cuda_ray=True, max_steps=16, num_steps=16, upsample_steps=0, update_extra_interval=16, max_ray_batch=4096, warmup_step=10000, amb_aud_loss=1, amb_eye_loss=1, unc_loss=1, lambda_amb=0.1, pyramid_loss=0, fp16=False, bg_img='', fbg=False, exp_eye=False, fix_eye=-1, smooth_eye=False, bs_area='upper', torso_shrink=0.8, color_space='srgb', preload=0, bound=1, scale=4, offset=[0, 0, 0], dt_gamma=0.00390625, min_near=0.05, density_thresh=10, density_thresh_torso=0.01, patch_size=1, init_lips=False, finetune_lips=False, smooth_lips=False, torso=False, head_ckpt='', gui=False, W=450, H=450, radius=3.35, fovy=21.24, max_spp=1, att=2, aud='demo/test.wav', emb=False, portrait=False, ind_dim=4, ind_num=20000, ind_dim_torso=8, amb_dim=2, part=False, part2=False, train_camera=False, smooth_path=False, smooth_path_window=7, asr=False, asr_wav='', asr_play=False, asr_model='ave', asr_save_feats=False, fps=50, l=10, m=50, r=10)
[INFO] load 5520 train frames.
[INFO] load demo/test.wav aud_features: torch.Size([331, 1, 512])
Loading train data: 100%|██████████| 5520/5520 [00:02<00:00, 2019.28it/s]
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
D:\anaconda3\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
D:\anaconda3\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: D:\anaconda3\lib\site-packages\lpips\weights\v0.1\alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
D:\anaconda3\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
D:\anaconda3\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: D:\anaconda3\lib\site-packages\lpips\weights\v0.1\alex.pth
[INFO] Trainer: ngp | 2024-03-10_11-51-37 | cuda | fp32 | model/trial_may
[INFO] #parameters: 767717
[INFO] Loading latest checkpoint ...
[INFO] Latest checkpoint is model/trial_may\checkpoints\ngp_ep0019.pth
Traceback (most recent call last):
  File "D:\work\SyncTalk\main.py", line 237, in <module>
    trainer = Trainer('ngp', opt, model, device=device, workspace=opt.workspace, optimizer=optimizer, criterion=criterion, ema_decay=0.95, fp16=opt.fp16, lr_scheduler=scheduler, scheduler_update_every_step=True, metrics=metrics, use_checkpoint=opt.ckpt, eval_interval=eval_interval)
  File "D:\work\SyncTalk\nerf_triplane\utils.py", line 735, in __init__
    self.load_checkpoint()
  File "D:\work\SyncTalk\nerf_triplane\utils.py", line 1532, in load_checkpoint
    missing_keys, unexpected_keys = self.model.load_state_dict(checkpoint_dict['model'], strict=False)
  File "D:\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for NeRFNetwork:
  size mismatch for sigma_net.net.0.weight: copying a param with shape torch.Size([64, 75]) from checkpoint, the shape in current model is torch.Size([64, 68]).

Process finished with exit code 1
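The Namespace above shows O=False and exp_eye=False, while the checkpoint was evidently trained with the extra conditioning those flags enable (75 vs. 68 input channels on sigma_net's first layer). The likely fix, inferred from the logs rather than confirmed, is to pass the -O preset as in the README:

    python main.py data/May --workspace model/trial_may -O --test --asr_model ave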

Implementation of the neck padding Cn in preprocessing

Hello, I'd like to ask how the neck region is filled during preprocessing. I adapted the ER-NeRF code for this.

Following the idea in the paper, I take the RGB color of the neck region from the corresponding image and fill with it, but the result looks different. Could you explain roughly how this is implemented?

import numpy as np

# My attempt (adapted from ER-NeRF): fill the inpaint region with the mean
# color and alpha of the visible neck pixels. neck_part and inpaint_mask are
# boolean masks; torso_image is HxWx3, torso_alpha is HxW.
index = np.where(neck_part)
neck_part_image = torso_image[index[0], index[1], :]
neck_part_alpha = torso_alpha[index[0], index[1]]
mean_neck_part_color = np.mean(neck_part_image, axis=0)
mean_neck_part_alpha = np.mean(neck_part_alpha)
torso_image[inpaint_mask] = mean_neck_part_color
torso_alpha[inpaint_mask] = mean_neck_part_alpha


About the uncertainty loss

In this project, loss_u and loss_static_uncertainty use different weights from ER-NeRF. Could you explain the specific reason for this?

Inference results have no audio

The inference results are just like the evaluation output and have no audio. Has anyone else had the same problem?
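A common workaround is to mux the driving audio back into the rendered video with ffmpeg (the file names below are assumptions; substitute the actual paths from your run):

    ffmpeg -i model/trial_may/results/output.mp4 -i demo/test.wav -c:v copy -c:a aac output_with_audio.mp4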

preprocessing mask face

Thanks for your amazing work!


What filter did you apply to get the left image after parsing?
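For anyone experimenting before the authors reply: one plausible way to get a cleaned mask from a raw parsing result (a guess at the kind of filter involved, not the confirmed method) is a morphological close followed by a blur:

    import cv2
    import numpy as np

    mask = cv2.imread("parsing_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
    kernel = np.ones((5, 5), np.uint8)
    smoothed = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    smoothed = cv2.GaussianBlur(smoothed, (9, 9), 0)            # soften jagged edges
    cv2.imwrite("parsing_mask_smoothed.png", smoothed)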
