
Comments (12)

WenbingWei avatar WenbingWei commented on July 22, 2024

Hi, I can't reproduce his results either. What PESQ score have you reached so far?

from cmgan.

hbwu-ntu avatar hbwu-ntu commented on July 22, 2024

Very low, only 3.2, far lower than the paper's. By the way, what PESQ can you reproduce?


WenbingWei avatar WenbingWei commented on July 22, 2024

I trained with the parameters published in the code, and the data processing follows the TSTNN code. At epoch 50 the PESQ is 3.24; after that, gen_loss starts to rise and the model stops converging.


hbwu-ntu avatar hbwu-ntu commented on July 22, 2024

Thanks for making the implementation details clear. It seems our results are similar. Did you downsample the data to 16 kHz the same way as the CMGAN GitHub repo?


WenbingWei avatar WenbingWei commented on July 22, 2024

No, I downsampled the original Voice Bank + DEMAND data to 16 kHz myself. I suspect the results in the paper cannot be reproduced because of the hyperparameters and the learning rate.


hbwu-ntu avatar hbwu-ntu commented on July 22, 2024

Apart from the learning rate, which other hyperparameters do you consider crucial for reproducing the results?
I recall that the authors address the learning rate in the paper, but in my experience I cannot reproduce the results with that learning rate.


WenbingWei avatar WenbingWei commented on July 22, 2024

I think the learning rate in this code is better suited to speech separation models such as the classic Conv-TasNet. I also think the loss weights are very important, but I don't know what weight to assign to each loss term to reach the optimum.
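For context, combining several loss terms with tunable weights, as discussed above, can be sketched as follows. The term names and weight values here are hypothetical placeholders for illustration, not the values CMGAN actually uses; finding good weights is exactly the tuning problem being described:

```python
# Sketch of a weighted multi-term loss; the weights are hypothetical.
def combined_loss(loss_mag, loss_ri, loss_time, loss_gan,
                  w_mag=0.9, w_ri=0.1, w_time=0.2, w_gan=0.05):
    """Weighted sum of individual loss terms (magnitude, real/imaginary,
    time-domain, adversarial). Tuning the w_* values changes which term
    dominates training."""
    return (w_mag * loss_mag + w_ri * loss_ri
            + w_time * loss_time + w_gan * loss_gan)

# Example with dummy scalar loss values:
total = combined_loss(0.5, 0.4, 0.3, 0.2)
# 0.9*0.5 + 0.1*0.4 + 0.2*0.3 + 0.05*0.2 = 0.56
```

Because the terms live on different scales, a small change in one weight can shift which loss the optimizer effectively minimizes, which may explain the sensitivity observed here.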


SherifAbdulatif avatar SherifAbdulatif commented on July 22, 2024

Sorry about that, but this was never an issue for us, and we haven't received this complaint about PESQ before. Did you try the checkpoint in src/best_ckpt?


SherifAbdulatif avatar SherifAbdulatif commented on July 22, 2024

Quoting the question being answered:

  Hi! Your paper and code are excellent! I have learned a lot about speech enhancement from the paper, and I find your code very well structured and clear. Thank you so much!

  I cannot reproduce the results in your paper. I would just like to confirm some settings for running the experiments:

  1. About the loss weights: do you use the setting in your paper or the setting on your GitHub?
  2. About the epoch number: do you use 50 as in the paper or 120 as in the GitHub repo?
  3. How do you select the final model for inference?
  4. Why do you set the utterance length to 16 * 16000 during testing?
  5. How do you downsample the audio? Could you share the script?

Answers:

  1. For reproducing, you can use the checkpoint in src/best_ckpt.
  2. The weights are the same as in the paper; you can find more details here.
  3. When the loss saturates; this can happen between 50 and 75 epochs.
  4. In testing the length is variable, but 10 is the maximum time that can run on our GPU; otherwise we need to split the track.
  5. We already saved the data downsampled; you can download it from https://drive.google.com/file/d/1pGV79T3k030f6uc2SbUpuNhfovtmLJxN/view?usp=sharing
     Alternatively, we used librosa, which follows the same downsampler as torch:

     import librosa
     audio_down, sr = librosa.load(audio_path, sr=16000)
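The original Voice Bank + DEMAND recordings are 48 kHz, so going to 16 kHz is a factor-of-3 rate reduction. As a minimal illustration of the rate arithmetic only, naive decimation looks like the sketch below; a real pipeline should use librosa or torchaudio, which apply a proper anti-aliasing low-pass filter before discarding samples:

```python
def decimate(samples, src_sr=48000, dst_sr=16000):
    """Naive decimation: keep every (src_sr // dst_sr)-th sample.
    Illustration only -- without low-pass filtering this aliases;
    librosa.load(path, sr=16000) handles the filtering properly."""
    assert src_sr % dst_sr == 0, "integer rate ratio assumed"
    step = src_sr // dst_sr
    return samples[::step]

one_second = list(range(48000))   # pretend 1 s of 48 kHz audio
down = decimate(one_second)       # 16000 samples: 1 s at 16 kHz
```

Differences in the resampler (anti-aliasing filter shape, phase) can slightly shift PESQ, which is one reason using the pre-downsampled data linked above is the safer option for reproduction.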


hbwu-ntu avatar hbwu-ntu commented on July 22, 2024

Thank you very much for your warm and detailed response. I will follow the instructions provided and make an effort to reproduce the results:

  • download the data as mentioned in response 5
  • use the same weights as in response 2
  • test all the checkpoints between 50 and 75 epochs as in response 3.

By the way, I have some follow-up questions:

  1. During testing, why do you set the length as variable? Fixing the batch size at 1 for testing shouldn't cause any GPU OOM issues.
  2. I downloaded your checkpoint from src/best_ckpt and ran testing. The numbers are slightly worse but close to the results reported in your paper, and I'm still attempting to reproduce them from scratch. Have you run multiple trials and computed the mean and variance of performance? This would help ensure that the positive results are not solely due to a lucky initialization.


SherifAbdulatif avatar SherifAbdulatif commented on July 22, 2024

Variable length avoids any normalization issues when splitting the tracks, and it is much more convenient than padding tracks to a predefined maximum length or splitting tracks that exceed it.
No, actually the results in the paper are from the best checkpoint, not averaged over multiple trials; however, your point is a very interesting insight and should be considered in our future studies.
Thanks!
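To illustrate the normalization issue mentioned above: if each utterance is scaled by a gain computed from its own samples (a common per-utterance normalization in enhancement pipelines, assumed here for illustration), then splitting a track into chunks produces different gains per chunk than normalizing the whole track once, so the pieces are scaled inconsistently:

```python
import math

def norm_gain(samples):
    """Per-utterance gain c = sqrt(N / sum(x^2)); a common
    normalization choice, assumed here for illustration."""
    return math.sqrt(len(samples) / sum(s * s for s in samples))

track = [0.1] * 100 + [0.9] * 100   # quiet half followed by loud half

whole = norm_gain(track)            # one gain for the full track
first = norm_gain(track[:100])      # gain of the quiet chunk alone
second = norm_gain(track[100:])     # gain of the loud chunk alone

# first > whole > second: the quiet chunk gets boosted far more than
# it would under whole-track normalization, and the loud chunk less,
# so the rejoined halves no longer match.
```

Processing each track at its natural length sidesteps this entirely, since every utterance is normalized exactly once.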


SherifAbdulatif avatar SherifAbdulatif commented on July 22, 2024

However, it is worth mentioning that, based on several training trials, the results are fairly consistent.

