
speech-driven-expressions's Introduction

Hi there 👋

speech-driven-expressions's People

Contributors

youngseng

speech-driven-expressions's Issues

data

Hello, thanks for sharing. I also work on speech-driven lip-sync research. Could you share the training data from this competition? Thanks!

Inference error

Traceback (most recent call last):
  File "./scripts/synthesize.py", line 233, in <module>
    main(args.ckpt_path, args.transcript_path, args.wav_path)
  File "./scripts/synthesize.py", line 64, in main
    args, generator, loss_fn, speaker_model, out_dim = utils.train_utils.load_checkpoint_and_model(checkpoint_path, device)
  File "/home/xiaoduo/liu/Speech-driven-expressions-main/Tri/scripts/utils/train_utils.py", line 190, in load_checkpoint_and_model
    generator.load_state_dict(checkpoint['gen_dict'],strict=False)
  File "/home/xiaoduo/miniconda3/envs/3D/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PoseGenerator:
        size mismatch for audio_encoder.model.masked_spec_embed: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_encoder.model.feature_projection.projection.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
        size mismatch for audio_encoder.model.feature_projection.projection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_encoder.model.encoder.pos_conv_embed.conv.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_encoder.model.encoder.pos_conv_embed.conv.weight_v: copying a param with shape torch.Size([1024, 64, 128]) from checkpoint, the shape in current model is torch.Size([768, 48, 128]).
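
The 1024-vs-768 mismatches above all sit under audio_encoder.model.*, which usually means the checkpoint was trained with a "large" speech encoder (hidden size 1024, e.g. a wav2vec2/HuBERT large variant) while the current configuration builds a "base" one (hidden size 768). A minimal diagnostic sketch, assuming only the checkpoint layout shown in the traceback (the path is a placeholder):

import torch

# Inspect the saved tensor shapes to see which encoder size the checkpoint expects.
# A [1024, 512] projection weight points at a large (1024-dim) speech encoder;
# [768, 512] would match the base variant the current model is building.
ckpt = torch.load("path/to/your_checkpoint.bin", map_location="cpu")
proj = ckpt["gen_dict"]["audio_encoder.model.feature_projection.projection.weight"]
print(proj.shape)

Note that strict=False does not silence these errors: PyTorch still raises a RuntimeError on shape mismatches, so the encoder configuration itself has to match the checkpoint.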

Abnormal training behavior

Hello, thank you very much for sharing.
While trying out your project, I prepared the data as described in the instructions, but when I run train.py the training results are abnormal:
the loss stays fixed at 0. I can guarantee that my data is fine, but I don't know why this happens during training.

2023-02-02 15:36:26,276: Reading data '/mnt/vdh/Speech-driven-expressions-main/data/trn/lmdb/lmdb_train'...
2023-02-02 15:36:26,276: Found pre-loaded samples from /mnt/vdh/Speech-driven-expressions-main/data/trn/lmdb/lmdb_train_cache
2023-02-02 15:36:26,298: Reading data '/mnt/vdh/Speech-driven-expressions-main/data/val/lmdb/lmdb_test'...
2023-02-02 15:36:26,298: Found pre-loaded samples from /mnt/vdh/Speech-driven-expressions-main/data/val/lmdb/lmdb_test_cache
0 0
Some weights of the model checkpoint at /mnt/vdh/Audio2Face/chinese-roberta-wwm-ext were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias']

- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    1
    2023-02-02 15:36:36,444: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:36,445: *** BEST VALIDATION LOSS: 0.000
    2023-02-02 15:36:38,995: Saved the checkpoint
    2023-02-02 15:36:39,542: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:39,542: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:40,084: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:40,084: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:40,615: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:40,616: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:41,163: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:41,164: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:41,726: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:41,727: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:42,268: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:42,268: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:42,829: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:42,830: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:43,382: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:43,382: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:43,903: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:43,903: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:44,440: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:44,441: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:47,256: Saved the checkpoint

Did you run into a similar problem during training, and if so, how did you handle and solve it?
Thanks again for sharing.
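
The "0 0" printed right after the LMDB caches are read, together with validation passes that finish in 0.3 s with loss 0.000, suggests the preprocessed datasets contain no usable samples, so the model never sees real data. A quick sanity check on the caches (paths copied from the log above; whether one LMDB entry corresponds to one training clip is an assumption about this repo's preprocessing):

import lmdb

# Count entries in the preprocessed LMDB caches; zero entries would explain
# the constant 0.000 loss and point back at the data preprocessing step.
for path in ["/mnt/vdh/Speech-driven-expressions-main/data/trn/lmdb/lmdb_train_cache",
             "/mnt/vdh/Speech-driven-expressions-main/data/val/lmdb/lmdb_test_cache"]:
    env = lmdb.open(path, readonly=True, lock=False)
    print(path, env.stat()["entries"])
    env.close()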

Pretrained model load error

epoch 226
Some weights of the model checkpoint at /xhzyssd2/molingqiang/repo/Speech-driven-expressions/data/chinese-roberta-wwm-ext were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "synthesize.py", line 233, in <module>
    main(args.ckpt_path, args.transcript_path, args.wav_path)
  File "synthesize.py", line 64, in main
    checkpoint_path, device)
  File "/xhzyssd2/molingqiang/repo/Speech-driven-expressions/Tri/scripts/utils/train_utils.py", line 187, in load_checkpoint_and_model
    generator.load_state_dict(checkpoint['gen_dict'])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1498, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PoseGenerator:
        Unexpected key(s) in state_dict: "gru.weight_ih_l0", "gru.weight_hh_l0", "gru.bias_ih_l0", "gru.bias_hh_l0", "gru.weight_ih_l0_reverse", "gru.weight_hh_l0_reverse", "gru.bias_ih_l0_reverse", "gru.bias_hh_l0_reverse", "gru.weight_ih_l1", "gru.weight_hh_l1", "gru.bias_ih_l1", "gru.bias_hh_l1", "gru.weight_ih_l1_reverse", "gru.weight_hh_l1_reverse", "gru.bias_ih_l1_reverse", "gru.bias_hh_l1_reverse", "gru.weight_ih_l2", "gru.weight_hh_l2", "gru.bias_ih_l2", "gru.bias_hh_l2", "gru.weight_ih_l2_reverse", "gru.weight_hh_l2_reverse", "gru.bias_ih_l2_reverse", "gru.bias_hh_l2_reverse", "out.0.weight", "out.0.bias", "out.2.weight", "out.2.bias", "mutimodal_fusiom.weight", "mutimodal_fusiom.bias", "mutimodal_fusiom_.layers.0.self_attn.in_proj_weight", "mutimodal_fusiom_.layers.0.self_attn.in_proj_bias", "mutimodal_fusiom_.layers.0.self_attn.out_proj.weight", "mutimodal_fusiom_.layers.0.self_attn.out_proj.bias", "mutimodal_fusiom_.layers.0.linear1.weight", "mutimodal_fusiom_.layers.0.linear1.bias", "mutimodal_fusiom_.layers.0.linear2.weight", "mutimodal_fusiom_.layers.0.linear2.bias", "mutimodal_fusiom_.layers.0.norm1.weight", "mutimodal_fusiom_.layers.0.norm1.bias", "mutimodal_fusiom_.layers.0.norm2.weight", "mutimodal_fusiom_.layers.0.norm2.bias", "con1d_pitch.0.weight", "con1d_pitch.0.bias", "con1d_pitch.1.weight", "con1d_pitch.1.bias", "con1d_pitch.1.running_mean", "con1d_pitch.1.running_var", "con1d_pitch.1.num_batches_tracked", "con1d_pitch.3.weight", "con1d_pitch.3.bias", "con1d_pitch.4.weight", "con1d_pitch.4.bias", "con1d_pitch.4.running_mean", "con1d_pitch.4.running_var", "con1d_pitch.4.num_batches_tracked", "con1d_pitch.6.weight", "con1d_pitch.6.bias", "con1d_pitch.7.weight", "con1d_pitch.7.bias", "con1d_pitch.7.running_mean", "con1d_pitch.7.running_var", "con1d_pitch.7.num_batches_tracked", "con1d_erengy.0.weight", "con1d_erengy.0.bias", "con1d_erengy.1.weight", "con1d_erengy.1.bias", "con1d_erengy.1.running_mean", "con1d_erengy.1.running_var", "con1d_erengy.1.num_batches_tracked", "con1d_erengy.3.weight", "con1d_erengy.3.bias", "con1d_erengy.4.weight", "con1d_erengy.4.bias", "con1d_erengy.4.running_mean", "con1d_erengy.4.running_var", "con1d_erengy.4.num_batches_tracked", "con1d_erengy.6.weight", "con1d_erengy.6.bias", "con1d_erengy.7.weight", "con1d_erengy.7.bias", "con1d_erengy.7.running_mean", "con1d_erengy.7.running_var", "con1d_erengy.7.num_batches_tracked", "con1d_volume.0.weight", "con1d_volume.0.bias", "con1d_volume.1.weight", "con1d_volume.1.bias", "con1d_volume.1.running_mean", "con1d_volume.1.running_var", "con1d_volume.1.num_batches_tracked", "con1d_volume.3.weight", "con1d_volume.3.bias", "con1d_volume.4.weight", "con1d_volume.4.bias", "con1d_volume.4.running_mean", "con1d_volume.4.running_var", "con1d_volume.4.num_batches_tracked", "con1d_volume.6.weight", "con1d_volume.6.bias", "con1d_volume.7.weight", "con1d_volume.7.bias", "con1d_volume.7.running_mean", "con1d_volume.7.running_var", "con1d_volume.7.num_batches_tracked", "transformer_encoder_1.layers.0.self_attn.in_proj_weight", "transformer_encoder_1.layers.0.self_attn.in_proj_bias", "transformer_encoder_1.layers.0.self_attn.out_proj.weight", "transformer_encoder_1.layers.0.self_attn.out_proj.bias", "transformer_encoder_1.layers.0.linear1.weight", "transformer_encoder_1.layers.0.linear1.bias", "transformer_encoder_1.layers.0.linear2.weight", "transformer_encoder_1.layers.0.linear2.bias", "transformer_encoder_1.layers.0.norm1.weight", "transformer_encoder_1.layers.0.norm1.bias", 
"transformer_encoder_1.layers.0.norm2.weight", "transformer_encoder_1.layers.0.norm2.bias", "fusion_1.fusion_layer_1.weight", "fusion_1.fusion_layer_1.bias", "fusion_1.fusion_layer_3.weight", "fusion_1.fusion_layer_3.bias", "fusion_wo_emo.fusion_layer_1.weight", "fusion_wo_emo.fusion_layer_1.bias", "fusion_wo_emo.fusion_layer_3.weight", "fusion_wo_emo.fusion_layer_3.bias". 
        size mismatch for AIWIN.0.layers.0.self_attn.in_proj_weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([198, 66]).
        size mismatch for AIWIN.0.layers.0.self_attn.in_proj_bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([198]).
        size mismatch for AIWIN.0.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([66, 66]).
        size mismatch for AIWIN.0.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.linear1.weight: copying a param with shape torch.Size([2048, 64]) from checkpoint, the shape in current model is torch.Size([2048, 66]).
        size mismatch for AIWIN.0.layers.0.linear2.weight: copying a param with shape torch.Size([64, 2048]) from checkpoint, the shape in current model is torch.Size([66, 2048]).
        size mismatch for AIWIN.0.layers.0.linear2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.self_attn.in_proj_weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([198, 66]).
        size mismatch for TransformerDecoder.layers.0.self_attn.in_proj_bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([198]).
        size mismatch for TransformerDecoder.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([66, 66]).
        size mismatch for TransformerDecoder.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.multihead_attn.in_proj_weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([198, 66]).
        size mismatch for TransformerDecoder.layers.0.multihead_attn.in_proj_bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([198]).
        size mismatch for TransformerDecoder.layers.0.multihead_attn.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([66, 66]).
        size mismatch for TransformerDecoder.layers.0.multihead_attn.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.linear1.weight: copying a param with shape torch.Size([2048, 64]) from checkpoint, the shape in current model is torch.Size([2048, 66]).
        size mismatch for TransformerDecoder.layers.0.linear2.weight: copying a param with shape torch.Size([64, 2048]) from checkpoint, the shape in current model is torch.Size([66, 2048]).
        size mismatch for TransformerDecoder.layers.0.linear2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm3.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm3.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN_.0.weight: copying a param with shape torch.Size([37, 64]) from checkpoint, the shape in current model is torch.Size([37, 66]).

Thank you for sharing your solution! Is the pretrained model provided in result/output_myfastdtw_batchfist_interpolate_normalize_dropout_data_decoder_val3_5_1/train_multimodal_context/Readme.txt ready to use? When I load it in synthesize.py, it yields errors about shape mismatches and unexpected keys. I can fix the shape mismatch errors by setting d_model=64.
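
The unexpected keys (gru.*, con1d_pitch.*, fusion_1.*, ...) indicate the released checkpoint comes from a model variant with extra modules that the current PoseGenerator does not define, and the 64-vs-66 size mismatches match the reporter's observation that rebuilding the generator with d_model=64 removes the shape errors. For loading only the overlapping weights, a hedged partial-load sketch (strict=False alone is not enough, because PyTorch still errors on size mismatches):

import torch

# Keep only checkpoint tensors whose key exists in the current model AND whose
# shape matches, then load non-strictly. `generator` is the already-constructed
# PoseGenerator instance; the checkpoint path is a placeholder.
ckpt = torch.load("path/to/your_checkpoint.bin", map_location="cpu")
model_state = generator.state_dict()
filtered = {k: v for k, v in ckpt["gen_dict"].items()
            if k in model_state and v.shape == model_state[k].shape}
print(f"loading {len(filtered)} of {len(model_state)} tensors")
generator.load_state_dict(filtered, strict=False)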

data preprocess

Could you give an example of the TSV transcript used in data preprocessing?
How do you convert the raw speech into the TSV file?

Thanks!
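
For reference, one common way to obtain a word-level transcript from raw speech is ASR with word timestamps (or a forced aligner such as MFA). The sketch below writes a start/end/word TSV with openai-whisper; the exact columns this repository expects are an assumption, not something documented here:

import csv
import whisper  # pip install openai-whisper

model = whisper.load_model("base")
result = model.transcribe("example.wav", word_timestamps=True)

with open("example.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    for segment in result["segments"]:
        for word in segment["words"]:
            # One row per word: start time (s), end time (s), word text.
            writer.writerow([f"{word['start']:.3f}", f"{word['end']:.3f}", word["word"].strip()])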

How to validate the model

Hello, thank you very much for sharing. I'm interested in this topic but not very familiar with it. Suppose I have trained the model, how do I validate it? In other words, how do I generate a driven video from it? Is there an open-source tool for this?
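
Based on the tracebacks in the issues above, the repository's inference entry point is ./scripts/synthesize.py, whose main() takes a checkpoint path, a transcript path, and a wav path. Assuming those are exposed as command-line flags of the same names (an assumption, not confirmed here), inference would look roughly like:

python ./scripts/synthesize.py --ckpt_path <trained_checkpoint> --transcript_path <transcript.tsv> --wav_path <speech.wav>

The script presumably outputs a sequence of expression parameters rather than a finished video, so rendering them onto a face model requires a separate tool or engine.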

Dataset request

Hello, I have recently been studying algorithms in this area, but I am not sure what the inputs to this dataset look like. Could you share the input data format?
