difftalk's People

Contributors

sstzal

difftalk's Issues

about loss function

Thanks to the author for sharing. I have a question: which loss function is used in the model? I only found MSE loss in the code. If the author sees this, could you take a moment to answer? Thank you.

What are the splits for HDTF?

Hey there! You mentioned that 100 videos were randomly selected from HDTF for testing and the remaining ones were used for training. Could you provide the specific filenames so that we can compare our methods with yours?

Thanks a lot~

About DeepSpeechRNN

Your paper mentions using this model, but the project code does not use it. Is this model referenced in the preprocessing program?

Code license

Thank you for your research.
What is the code license?

The usage of RAM is always increasing during one epoch.

After preprocessing the HDTF dataset, I got 415 videos.
249 videos (60%) were randomly selected as the training set; the others (40%) formed the test set.
The first 1500 frames of each video were extracted for training with a stride of 2.
This gave me 277,117 frames in the training set and 179,711 frames in the test set.

My machine has 4 A100 GPUs with 40GB VRAM, 377GB of RAM, and 72GB of swap.
For training, the batch size is set to 16.
During the first epoch, RAM usage increases continuously.
At step 2743, all RAM (and even the swap space) was occupied and training stopped.
Thus, 2743 * 16 * 4 = 175,552 is the maximum number of frames that can be used for training on my machine, and that does not take the test set into account.
I tried reducing both the training and test sets to 10,000 frames, and with that the training process runs fine.
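The frame arithmetic above can be checked with a short sketch. The function name and its parameters are illustrative and not part of the DiffTalk code; it assumes each training step consumes one batch per GPU.

```python
def frames_seen(steps: int, batch_size: int, num_gpus: int) -> int:
    """Frames consumed after `steps` steps, assuming one batch per GPU per step."""
    return steps * batch_size * num_gpus


# Values reported in this issue.
seen = frames_seen(steps=2743, batch_size=16, num_gpus=4)
train_frames = 277_117

print(seen)                            # frames processed before RAM ran out
print(round(seen / train_frames, 3))   # fraction of the first epoch completed
```

Under these assumptions, the run stopped before completing a single pass over the training set.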

Questions for @sstzal:

  • Did you run into the same problem in your training?
  • If so, how did you solve it?
  • Would it be possible to release the weights of the diffusion model?

My guess is that the problem is caused by too much logging during training.

Do we need to have the same number of images, landmarks and audio features?

Thanks for your great work. I am confused about one thing in the preprocessing stage. When we extract images, landmarks, and audio features from a video, do we need the same number of files for each? I got different counts: for example, 2247 images and 2247 landmarks, but only 937 audio feature files. Could someone please help with this?
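A quick way to compare the counts is to tally files per folder and extension. This is a minimal sketch, not part of the repository: the `images` and `landmarks` folder names follow the layout quoted elsewhere in these issues, while the `audio` folder name and the `./data/HDTF` root are assumptions to adjust to your own setup.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(folder: Path) -> Counter:
    """Count files directly inside `folder`, grouped by lowercase extension."""
    return Counter(p.suffix.lower() for p in folder.iterdir() if p.is_file())


def report(root: Path, subdirs=("images", "landmarks", "audio")) -> dict:
    """Return {subdir: Counter of extensions} for each subdir that exists under `root`."""
    return {name: count_by_suffix(root / name) for name in subdirs if (root / name).is_dir()}


# Hypothetical dataset root; prints nothing if the folders do not exist.
for name, counts in report(Path("./data/HDTF")).items():
    print(name, dict(counts))
```

A mismatch between folders shows up directly in the printed counts; whether the pipeline requires equal counts is a question for the author.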

Where can I download the HDTF dataset?

"Please download the HDTF dataset for training and test, and process the dataset as following."

Sorry, I'm a newbie.
Please tell me where I can download the HDTF dataset. If you could give me a URL, that would be great!

How did you split the dataset for training and validation?

In the data dir, there is data_test.txt for validation and data_train.txt for training.
How did you split the dataset? By portrait or by video?
By portrait means that persons in the training set do not appear in the validation set.
By video means that all videos are randomly split into training and validation sets.

train_name.txt lists 98 videos. However, 99 videos can be found in data_train.txt.
What is the relationship between train_name.txt and data_train.txt?

Could you provide a test_name.txt, just like train_name.txt, to indicate the videos used for validation?
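The two splitting strategies described in this issue can be sketched as follows. This is an illustration only, not the author's procedure; the function names, the `person_of` mapping, and the toy video names are all hypothetical.

```python
import random


def split_by_video(videos, test_ratio, seed=0):
    """Shuffle all videos and cut; the same person may land in both sets."""
    rng = random.Random(seed)
    shuffled = list(videos)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def split_by_portrait(videos, person_of, test_ratio, seed=0):
    """Shuffle persons and cut; every video of a person goes to one set only."""
    rng = random.Random(seed)
    persons = sorted({person_of[v] for v in videos})
    rng.shuffle(persons)
    n_test = round(len(persons) * test_ratio)
    test_persons = set(persons[:n_test])
    train = [v for v in videos if person_of[v] not in test_persons]
    test = [v for v in videos if person_of[v] in test_persons]
    return train, test


# Toy data: made-up video names mapped to made-up person ids.
person_of = {"a_0": "a", "a_1": "a", "b_0": "b", "c_0": "c", "c_1": "c", "d_0": "d"}
videos = list(person_of)
train, test = split_by_portrait(videos, person_of, test_ratio=0.25)
print(train, test)
```

With a by-portrait split, validation measures generalization to unseen identities; with a by-video split, it may only measure generalization to unseen clips of identities already seen in training.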

Inference question

When running inference, I only get an incomplete image with landmarks and mask. What do I need to do in order to get a clean image?

pretrained model

Awesome paper!
We are extremely interested in your work, and we want to run an inference program to see the awesome results of the paper. Could you share the entire pretrained model, perhaps through Google or Baidu online storage? Thanks for your time.

confusion in processing

|——data/HDTF
   |——images
      |——0_0.jpg
      |——0_1.jpg
      |——...
      |——N_M.bin
   |——landmarks
      |——0_0.lmd
      |——0_1.lmd
      |——...
      |——N_M.lms

What is N_M.bin? Should it be N_M.jpg under images, and N_M.lmd under landmarks?

What's the data.txt?

Error: No such file or directory: './data/HDTF/data.txt'
Is this file the combination of data_train.txt and data_test.txt?

about data_test

What does each line in data_test.txt mean? I guess that the part before the '_' is the video ID and the part after it is the frame number within that video. But some videos do not have all the frames of the original video listed. So what exactly does each line mean?
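Under the guess stated in this issue (each line is `<video id>_<frame number>`, which the author has not confirmed), the file could be inspected with a sketch like the one below. The function names and sample entries are made up for illustration.

```python
from collections import defaultdict


def parse_entry(line: str) -> tuple[int, int]:
    """Split an entry such as '12_345' into (video_id, frame_index)."""
    video, frame = line.strip().split("_")
    return int(video), int(frame)


def frames_per_video(lines) -> dict[int, list[int]]:
    """Group frame indices by video id, skipping blank lines."""
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():
            video, frame = parse_entry(line)
            grouped[video].append(frame)
    return {video: sorted(frames) for video, frames in grouped.items()}


# Made-up entries, not taken from data_test.txt.
sample = ["0_0", "0_1", "0_5", "3_2", ""]
print(frames_per_video(sample))
```

Grouping the entries this way makes it easy to see which frames of each video are listed and which are absent.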

I encountered some difficulties reproducing the paper: the model cannot be loaded and inference cannot run

Hello friends, has anyone successfully reproduced this paper? I ran into some difficulties while reproducing it. I used the model parameters provided by the author directly. When strict is set to True in m, u = model.load_state_dict(sd, strict=True), the model cannot be loaded and inference cannot run. I also trained the model myself and found that the saved checkpoint reached 8.2 GB. Has anyone else had the same problem? I would appreciate your help. Thank you.
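In PyTorch, `load_state_dict(sd, strict=True)` raises an error when the checkpoint keys and the model keys differ, while `strict=False` returns the missing and unexpected keys instead. The comparison itself can be sketched with plain Python sets, without torch; the function name and the key names below are hypothetical and only illustrate what to look for.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected), in the same sense as load_state_dict reports them."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model expects them, checkpoint lacks them
    unexpected = sorted(checkpoint_keys - model_keys)   # checkpoint has them, model does not
    return missing, unexpected


# Hypothetical key names, for illustration only.
model_keys = ["encoder.weight", "encoder.bias", "audio_net.weight"]
checkpoint_keys = ["encoder.weight", "encoder.bias", "optimizer_state"]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

Listing the two groups usually shows whether the checkpoint belongs to a different model configuration or merely carries extra entries.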

channel error

    elif cond_class == "audio":
        if self.cond_stage_forward is None:
            bs = c.shape[0] # 20
            c = c.reshape(-1,16,29) # [20, 16, 29]
            c = self.cond_stage_model_for_audio(c) # [20, 64]
            c = c.reshape(bs, 8, -1) # [20, 8, 8]
            c = self.cond_stage_model_for_audio_smooth(c)

When processing the audio input, the network expects a tensor of shape (B, 16, 29), which c.reshape(-1,16,29) also confirms. My audio input matches this shape, but the call c = self.cond_stage_model_for_audio_smooth(c) fails with: RuntimeError: Given groups=1, weight of size [16, 32, 3], expected input[20, 8, 8] to have 32 channels, but got 8 channels instead
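The error message can be read as a plain shape check. A 1D convolution weight has shape [out_channels, in_channels / groups, kernel_size], and the input has shape [N, C, L], so C must equal the second weight dimension times `groups`. The sketch below reproduces only that check; the function name is hypothetical and it does not use torch.

```python
def conv1d_channel_check(weight_shape, input_shape, groups=1):
    """Return (ok, expected_channels) for a conv weight [out, in/groups, k] and input [N, C, L]."""
    _, in_per_group, _ = weight_shape
    expected = in_per_group * groups
    return input_shape[1] == expected, expected


# Shapes copied from the error message in this issue.
ok, expected = conv1d_channel_check((16, 32, 3), (20, 8, 8))
print(ok, expected)
```

So the layer expects 32 in dimension 1, while the reshape to (bs, 8, -1) produces 8 there. Which layout the smoothing module actually intends is not stated in this issue.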

deepspeech model version

I am using deepspeech==0.9.3, but it fails with the following error:
graph_def.ParseFromString(f.read())
google.protobuf.message.DecodeError: Error parsing message with type 'tensorflow.GraphDef'
