Comments (12)
Hi, I can't reproduce his results either. What PESQ score have you managed to reproduce so far?
from cmgan.
Very low, only 3.2, far lower than the paper. By the way, what PESQ score can you reproduce?
from cmgan.
I trained with the parameters published in the code, and the data processing follows the TSTNN code. At epoch 50 the PESQ is 3.24; after that, gen_loss starts to rise and the model does not converge.
from cmgan.
Thanks for making the implementation details clear. It seems our results are similar. Do you downsample the data to 16 kHz in the same way as the CMGAN GitHub repo?
from cmgan.
No, I downsampled the original Voice Bank+DEMAND data to 16 kHz myself. I suspect the results in the paper cannot be reproduced because of the hyperparameters and the learning rate.
from cmgan.
Apart from the learning rate, which other hyperparameters do you consider crucial for reproducing the results?
I recall that the authors address the learning rates in the paper, but in my experience I cannot reproduce the results using that learning rate.
from cmgan.
I think the learning rate in this code is more suitable for speech separation models such as the classic Conv-TasNet. I also think the loss weights are very important, but I don't know what weight to assign to each loss term to achieve the optimum.
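For illustration, the generator objective here is a weighted sum of several loss terms (magnitude, complex/RI, time-domain, adversarial). The weights below are purely illustrative placeholders, not values confirmed to reproduce the paper; finding the right values is exactly the open question in this thread:

```python
def combined_gen_loss(loss_mag, loss_ri, loss_time, loss_gan,
                      weights=(0.9, 0.1, 0.2, 1.0)):
    """Weighted sum of generator loss terms.

    The weights are ILLUSTRATIVE placeholders only; which values
    reproduce the paper's PESQ is the unresolved question here.
    """
    w_mag, w_ri, w_time, w_gan = weights
    return (w_mag * loss_mag + w_ri * loss_ri
            + w_time * loss_time + w_gan * loss_gan)
```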
from cmgan.
Sorry about that, but this was never an issue for us, and we haven't received this complaint about PESQ before. Did you try using the checkpoint in src/best_ckpt?
from cmgan.
Hi! Your paper and code are excellent! I have learned a lot about speech enhancement from the paper, and I find your code to be very well-structured and clear. Thank you so much!
I cannot reproduce the results in your paper, so I just want to confirm some settings for running the experiments:
- About the loss weights: do you use the setting in your paper or the one in your GitHub repo?
- About the epoch number: do you use 50 as in the paper or 120 as in the GitHub repo?
- How do you select the final model for inference?
- Why do you set the utterance length to 16 * 16000 during testing?
- How do you downsample the audio? Could you share the script?
- For reproducing, you can use the checkpoint in src/best_ckpt.
- The weights are the same as in the paper; you can find more details here.
- When the loss saturates, which can happen between epochs 50 and 75.
- In testing the length is variable, but 10 is the maximum that can run on our GPU; otherwise we need to split the track.
- We already saved the data downsampled; you can download it from here: https://drive.google.com/file/d/1pGV79T3k030f6uc2SbUpuNhfovtmLJxN/view?usp=sharing
Alternatively, we used librosa, which follows the same resampler as torch:
import librosa
# load and resample to 16 kHz in a single call
audio_down, sr = librosa.load(audio_path, sr=16000)
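A minimal batch version of that librosa call might look like the sketch below. The directory paths and function names are hypothetical, and it assumes librosa and soundfile are installed; `expected_length` is just a sanity-check helper for the resampled sample count:

```python
import os

def expected_length(n_samples, orig_sr, target_sr):
    """Sample count after resampling n_samples from orig_sr to target_sr."""
    return round(n_samples * target_sr / orig_sr)

def resample_dir(src_dir, dst_dir, target_sr=16000):
    """Resample every .wav in src_dir to target_sr and write it to dst_dir.

    Hypothetical helper, not the authors' script; imports are deferred
    so the pure helper above works without audio libraries installed.
    """
    import librosa
    import soundfile as sf
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if not name.endswith(".wav"):
            continue
        # librosa.load resamples to the requested rate during loading
        audio, sr = librosa.load(os.path.join(src_dir, name), sr=target_sr)
        sf.write(os.path.join(dst_dir, name), audio, sr)
```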
from cmgan.
Thank you very much for your warm and detailed response. I will follow the instructions provided and make an effort to reproduce the results:
- download the data as mentioned in response 5
- use the same loss weights as in response 2
- test all the checkpoints between epochs 50 and 75, as in response 3
By the way, I have some follow-up questions:
- During testing, why do you make the length variable? Fixing the batch size to 1 for testing should not cause any GPU OOM issues.
- I downloaded your checkpoint from src/best_ckpt and ran the test. The numbers are slightly worse but close to the results in your paper, and I'm still attempting to reproduce them from scratch. Have you run multiple trials and computed the mean and variance of the performance? This would help ensure the positive results are not solely due to a good initialization.
from cmgan.
Variable length avoids any normalization issues when splitting the tracks, and it is much more convenient than padding tracks to a predefined maximum length or splitting tracks that exceed it.
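The normalization issue is easy to see with a toy example: peak-normalizing each fixed-length chunk independently scales the signal differently from normalizing the full track, so the reassembled chunks no longer match. A minimal sketch with hypothetical helpers:

```python
def peak_normalize(x):
    """Scale a sequence so its maximum absolute value is 1."""
    peak = max(abs(v) for v in x)
    return [v / peak for v in x]

def chunked_normalize(x, chunk_len):
    """Normalize each fixed-length chunk independently.

    This mimics what happens when a long track is split before
    per-segment normalization: quiet chunks get boosted relative
    to the whole-track normalization.
    """
    out = []
    for i in range(0, len(x), chunk_len):
        out.extend(peak_normalize(x[i:i + chunk_len]))
    return out
```

With `x = [0.5, 1.0, 0.25, 0.25]` and `chunk_len = 2`, the quiet second chunk is rescaled to full amplitude, diverging from the whole-track result.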
No, actually not: the results in the paper are from the best checkpoint, not from multiple trials. However, your point is a very interesting insight and should be included in our future studies.
Thanks!
from cmgan.
However, it is worth mentioning that, based on several training trials, the results are fairly consistent.
from cmgan.