
speech-driven-expressions's Introduction

Hi there 👋

speech-driven-expressions's People

Contributors

youngseng

speech-driven-expressions's Issues

data

Hello, thanks for sharing. I also work on speech-driven lip-sync research. Could you share the training data from this competition? Thanks!

Inference error

Traceback (most recent call last):
  File "./scripts/synthesize.py", line 233, in <module>
    main(args.ckpt_path, args.transcript_path, args.wav_path)
  File "./scripts/synthesize.py", line 64, in main
    args, generator, loss_fn, speaker_model, out_dim = utils.train_utils.load_checkpoint_and_model(checkpoint_path, device)
  File "/home/xiaoduo/liu/Speech-driven-expressions-main/Tri/scripts/utils/train_utils.py", line 190, in load_checkpoint_and_model
    generator.load_state_dict(checkpoint['gen_dict'],strict=False)
  File "/home/xiaoduo/miniconda3/envs/3D/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1483, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PoseGenerator:
        size mismatch for audio_encoder.model.masked_spec_embed: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_encoder.model.feature_projection.projection.weight: copying a param with shape torch.Size([1024, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
        size mismatch for audio_encoder.model.feature_projection.projection.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_encoder.model.encoder.pos_conv_embed.conv.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_encoder.model.encoder.pos_conv_embed.conv.weight_v: copying a param with shape torch.Size([1024, 64, 128]) from checkpoint, the shape in current model is torch.Size([768, 48, 128]).
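
The 1024-vs-768 mismatches above all sit under audio_encoder.model.*, which usually means the checkpoint was trained with a "large" speech encoder (hidden size 1024, e.g. a wav2vec2/HuBERT large variant) while the current configuration builds a "base" one (hidden size 768). A minimal diagnostic sketch, assuming only the checkpoint layout shown in the traceback (the path is a placeholder):

import torch

# Inspect the saved tensor shapes to see which encoder size the checkpoint expects.
# A [1024, 512] projection weight points at a large (1024-dim) speech encoder;
# [768, 512] would match the base variant the current model is building.
ckpt = torch.load("path/to/your_checkpoint.bin", map_location="cpu")
proj = ckpt["gen_dict"]["audio_encoder.model.feature_projection.projection.weight"]
print(proj.shape)

Note that strict=False does not silence these errors: PyTorch still raises a RuntimeError on shape mismatches, so the encoder configuration itself has to match the checkpoint.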

Abnormal training behavior

Hello, thank you very much for sharing.
While trying out your project, I prepared the data as described in the instructions, but when I run train.py the training results are abnormal:
the loss stays fixed at 0. I can guarantee that my data is fine, but I don't know why this happens during training.

2023-02-02 15:36:26,276: Reading data '/mnt/vdh/Speech-driven-expressions-main/data/trn/lmdb/lmdb_train'...
2023-02-02 15:36:26,276: Found pre-loaded samples from /mnt/vdh/Speech-driven-expressions-main/data/trn/lmdb/lmdb_train_cache
2023-02-02 15:36:26,298: Reading data '/mnt/vdh/Speech-driven-expressions-main/data/val/lmdb/lmdb_test'...
2023-02-02 15:36:26,298: Found pre-loaded samples from /mnt/vdh/Speech-driven-expressions-main/data/val/lmdb/lmdb_test_cache
0 0
Some weights of the model checkpoint at /mnt/vdh/Audio2Face/chinese-roberta-wwm-ext were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias']

- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    1
    2023-02-02 15:36:36,444: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:36,445: *** BEST VALIDATION LOSS: 0.000
    2023-02-02 15:36:38,995: Saved the checkpoint
    2023-02-02 15:36:39,542: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:39,542: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:40,084: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:40,084: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:40,615: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:40,616: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:41,163: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:41,164: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:41,726: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:41,727: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:42,268: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:42,268: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:42,829: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:42,830: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:43,382: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:43,382: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:43,903: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:43,903: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:44,440: [VAL] loss: 0.000, joint mae: 0.000 / 0.3s
    2023-02-02 15:36:44,441: best validation loss so far: 0.000 at EPOCH 0
    2023-02-02 15:36:47,256: Saved the checkpoint

Did you run into a similar problem during training, and if so, how did you handle and solve it?
Thanks again for sharing.
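
The "0 0" printed right after the LMDB caches are read, together with validation passes that finish in 0.3 s with loss 0.000, suggests the preprocessed datasets contain no usable samples, so the model never sees real data. A quick sanity check on the caches (paths copied from the log above; whether one LMDB entry corresponds to one training clip is an assumption about this repo's preprocessing):

import lmdb

# Count entries in the preprocessed LMDB caches; zero entries would explain
# the constant 0.000 loss and point back at the data preprocessing step.
for path in ["/mnt/vdh/Speech-driven-expressions-main/data/trn/lmdb/lmdb_train_cache",
             "/mnt/vdh/Speech-driven-expressions-main/data/val/lmdb/lmdb_test_cache"]:
    env = lmdb.open(path, readonly=True, lock=False)
    print(path, env.stat()["entries"])
    env.close()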

Pretrained model load error

epoch 226
Some weights of the model checkpoint at /xhzyssd2/molingqiang/repo/Speech-driven-expressions/data/chinese-roberta-wwm-ext were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/root/.vscode-server/extensions/ms-python.python-2022.16.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "synthesize.py", line 233, in <module>
    main(args.ckpt_path, args.transcript_path, args.wav_path)
  File "synthesize.py", line 64, in main
    checkpoint_path, device)
  File "/xhzyssd2/molingqiang/repo/Speech-driven-expressions/Tri/scripts/utils/train_utils.py", line 187, in load_checkpoint_and_model
    generator.load_state_dict(checkpoint['gen_dict'])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1498, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PoseGenerator:
        Unexpected key(s) in state_dict: "gru.weight_ih_l0", "gru.weight_hh_l0", "gru.bias_ih_l0", "gru.bias_hh_l0", "gru.weight_ih_l0_reverse", "gru.weight_hh_l0_reverse", "gru.bias_ih_l0_reverse", "gru.bias_hh_l0_reverse", "gru.weight_ih_l1", "gru.weight_hh_l1", "gru.bias_ih_l1", "gru.bias_hh_l1", "gru.weight_ih_l1_reverse", "gru.weight_hh_l1_reverse", "gru.bias_ih_l1_reverse", "gru.bias_hh_l1_reverse", "gru.weight_ih_l2", "gru.weight_hh_l2", "gru.bias_ih_l2", "gru.bias_hh_l2", "gru.weight_ih_l2_reverse", "gru.weight_hh_l2_reverse", "gru.bias_ih_l2_reverse", "gru.bias_hh_l2_reverse", "out.0.weight", "out.0.bias", "out.2.weight", "out.2.bias", "mutimodal_fusiom.weight", "mutimodal_fusiom.bias", "mutimodal_fusiom_.layers.0.self_attn.in_proj_weight", "mutimodal_fusiom_.layers.0.self_attn.in_proj_bias", "mutimodal_fusiom_.layers.0.self_attn.out_proj.weight", "mutimodal_fusiom_.layers.0.self_attn.out_proj.bias", "mutimodal_fusiom_.layers.0.linear1.weight", "mutimodal_fusiom_.layers.0.linear1.bias", "mutimodal_fusiom_.layers.0.linear2.weight", "mutimodal_fusiom_.layers.0.linear2.bias", "mutimodal_fusiom_.layers.0.norm1.weight", "mutimodal_fusiom_.layers.0.norm1.bias", "mutimodal_fusiom_.layers.0.norm2.weight", "mutimodal_fusiom_.layers.0.norm2.bias", "con1d_pitch.0.weight", "con1d_pitch.0.bias", "con1d_pitch.1.weight", "con1d_pitch.1.bias", "con1d_pitch.1.running_mean", "con1d_pitch.1.running_var", "con1d_pitch.1.num_batches_tracked", "con1d_pitch.3.weight", "con1d_pitch.3.bias", "con1d_pitch.4.weight", "con1d_pitch.4.bias", "con1d_pitch.4.running_mean", "con1d_pitch.4.running_var", "con1d_pitch.4.num_batches_tracked", "con1d_pitch.6.weight", "con1d_pitch.6.bias", "con1d_pitch.7.weight", "con1d_pitch.7.bias", "con1d_pitch.7.running_mean", "con1d_pitch.7.running_var", "con1d_pitch.7.num_batches_tracked", "con1d_erengy.0.weight", "con1d_erengy.0.bias", "con1d_erengy.1.weight", "con1d_erengy.1.bias", "con1d_erengy.1.running_mean", "con1d_erengy.1.running_var", "con1d_erengy.1.num_batches_tracked", "con1d_erengy.3.weight", "con1d_erengy.3.bias", "con1d_erengy.4.weight", "con1d_erengy.4.bias", "con1d_erengy.4.running_mean", "con1d_erengy.4.running_var", "con1d_erengy.4.num_batches_tracked", "con1d_erengy.6.weight", "con1d_erengy.6.bias", "con1d_erengy.7.weight", "con1d_erengy.7.bias", "con1d_erengy.7.running_mean", "con1d_erengy.7.running_var", "con1d_erengy.7.num_batches_tracked", "con1d_volume.0.weight", "con1d_volume.0.bias", "con1d_volume.1.weight", "con1d_volume.1.bias", "con1d_volume.1.running_mean", "con1d_volume.1.running_var", "con1d_volume.1.num_batches_tracked", "con1d_volume.3.weight", "con1d_volume.3.bias", "con1d_volume.4.weight", "con1d_volume.4.bias", "con1d_volume.4.running_mean", "con1d_volume.4.running_var", "con1d_volume.4.num_batches_tracked", "con1d_volume.6.weight", "con1d_volume.6.bias", "con1d_volume.7.weight", "con1d_volume.7.bias", "con1d_volume.7.running_mean", "con1d_volume.7.running_var", "con1d_volume.7.num_batches_tracked", "transformer_encoder_1.layers.0.self_attn.in_proj_weight", "transformer_encoder_1.layers.0.self_attn.in_proj_bias", "transformer_encoder_1.layers.0.self_attn.out_proj.weight", "transformer_encoder_1.layers.0.self_attn.out_proj.bias", "transformer_encoder_1.layers.0.linear1.weight", "transformer_encoder_1.layers.0.linear1.bias", "transformer_encoder_1.layers.0.linear2.weight", "transformer_encoder_1.layers.0.linear2.bias", "transformer_encoder_1.layers.0.norm1.weight", "transformer_encoder_1.layers.0.norm1.bias", 
"transformer_encoder_1.layers.0.norm2.weight", "transformer_encoder_1.layers.0.norm2.bias", "fusion_1.fusion_layer_1.weight", "fusion_1.fusion_layer_1.bias", "fusion_1.fusion_layer_3.weight", "fusion_1.fusion_layer_3.bias", "fusion_wo_emo.fusion_layer_1.weight", "fusion_wo_emo.fusion_layer_1.bias", "fusion_wo_emo.fusion_layer_3.weight", "fusion_wo_emo.fusion_layer_3.bias". 
        size mismatch for AIWIN.0.layers.0.self_attn.in_proj_weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([198, 66]).
        size mismatch for AIWIN.0.layers.0.self_attn.in_proj_bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([198]).
        size mismatch for AIWIN.0.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([66, 66]).
        size mismatch for AIWIN.0.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.linear1.weight: copying a param with shape torch.Size([2048, 64]) from checkpoint, the shape in current model is torch.Size([2048, 66]).
        size mismatch for AIWIN.0.layers.0.linear2.weight: copying a param with shape torch.Size([64, 2048]) from checkpoint, the shape in current model is torch.Size([66, 2048]).
        size mismatch for AIWIN.0.layers.0.linear2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN.0.layers.0.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.self_attn.in_proj_weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([198, 66]).
        size mismatch for TransformerDecoder.layers.0.self_attn.in_proj_bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([198]).
        size mismatch for TransformerDecoder.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([66, 66]).
        size mismatch for TransformerDecoder.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.multihead_attn.in_proj_weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([198, 66]).
        size mismatch for TransformerDecoder.layers.0.multihead_attn.in_proj_bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([198]).
        size mismatch for TransformerDecoder.layers.0.multihead_attn.out_proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([66, 66]).
        size mismatch for TransformerDecoder.layers.0.multihead_attn.out_proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.linear1.weight: copying a param with shape torch.Size([2048, 64]) from checkpoint, the shape in current model is torch.Size([2048, 66]).
        size mismatch for TransformerDecoder.layers.0.linear2.weight: copying a param with shape torch.Size([64, 2048]) from checkpoint, the shape in current model is torch.Size([66, 2048]).
        size mismatch for TransformerDecoder.layers.0.linear2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm2.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm3.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for TransformerDecoder.layers.0.norm3.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([66]).
        size mismatch for AIWIN_.0.weight: copying a param with shape torch.Size([37, 64]) from checkpoint, the shape in current model is torch.Size([37, 66]).

Thank you for sharing your solution! Is the pretrained model provided in result/output_myfastdtw_batchfist_interpolate_normalize_dropout_data_decoder_val3_5_1/train_multimodal_context/Readme.txt ready to use? When I load it in synthesize.py, it yields errors about shape mismatches and unexpected keys. I can fix the shape mismatch errors by setting d_model=64.
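
The unexpected keys (gru.*, con1d_pitch.*, fusion_1.*, ...) indicate the released checkpoint comes from a model variant with extra modules that the current PoseGenerator does not define, and the 64-vs-66 size mismatches match the reporter's observation that rebuilding the generator with d_model=64 removes the shape errors. For loading only the overlapping weights, a hedged partial-load sketch (strict=False alone is not enough, because PyTorch still errors on size mismatches):

import torch

# Keep only checkpoint tensors whose key exists in the current model AND whose
# shape matches, then load non-strictly. `generator` is the already-constructed
# PoseGenerator instance; the checkpoint path is a placeholder.
ckpt = torch.load("path/to/your_checkpoint.bin", map_location="cpu")
model_state = generator.state_dict()
filtered = {k: v for k, v in ckpt["gen_dict"].items()
            if k in model_state and v.shape == model_state[k].shape}
print(f"loading {len(filtered)} of {len(model_state)} tensors")
generator.load_state_dict(filtered, strict=False)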

data preprocess

Could you give an example of the TSV transcript used in data preprocessing?
How do you convert the raw speech into the TSV file?

Thanks!
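
For reference, one common way to obtain a word-level transcript from raw speech is ASR with word timestamps (or a forced aligner such as MFA). The sketch below writes a start/end/word TSV with openai-whisper; the exact columns this repository expects are an assumption, not something documented here:

import csv
import whisper  # pip install openai-whisper

model = whisper.load_model("base")
result = model.transcribe("example.wav", word_timestamps=True)

with open("example.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    for segment in result["segments"]:
        for word in segment["words"]:
            # One row per word: start time (s), end time (s), word text.
            writer.writerow([f"{word['start']:.3f}", f"{word['end']:.3f}", word["word"].strip()])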

How to validate the model

Hello, thank you very much for sharing. I'm interested in this topic but not very familiar with it. Suppose I have trained the model, how do I validate it? In other words, how do I generate a driven video from it? Is there an open-source tool for this?
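
Based on the tracebacks in the issues above, the repository's inference entry point is ./scripts/synthesize.py, whose main() takes a checkpoint path, a transcript path, and a wav path. Assuming those are exposed as command-line flags of the same names (an assumption, not confirmed here), inference would look roughly like:

python ./scripts/synthesize.py --ckpt_path <trained_checkpoint> --transcript_path <transcript.tsv> --wav_path <speech.wav>

The script presumably outputs a sequence of expression parameters rather than a finished video, so rendering them onto a face model requires a separate tool or engine.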

Dataset request

Hello, I have recently been studying algorithms in this area, but I am not sure what the inputs to this dataset look like. Could you share the input data format?
