jasonswfu / metricgan
MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement (ICML 2019, with Travel awards)
While studying MetricGAN, I came up with some questions.
What is a generative component of MetricGAN?
For example, SEGAN[1] has a generative component. It concatenates "random latent vectors" to its encoder output.
Do you classify this work as a generative or discriminative method?
Have you tried a pre-trained discriminator (D) that is trained on the training data and then kept fixed while training the generator (G)?
As you mentioned in the paper, D should be called an evaluator. The training procedure of MetricGAN is not adversarial. G and D help each other.
Therefore, I'm wondering which is better, 'training G with pre-trained D' or 'training G and D together'.
[1] Pascual, Santiago, Antonio Bonafonte, and Joan Serra. "SEGAN: Speech enhancement generative adversarial network." arXiv preprint arXiv:1703.09452 (2017).
Thank you very much for sharing your great work.
Hi,
I was looking for a pretrained SE model so that I could use this method as a baseline for my work. I could not find one here; if possible, could you please point me to the right place? Thanks a lot!
I have a question about how performance was measured on the NSDTSEA (VCTK+DEMAND) dataset in the paper.
I think you measured MOS-LQO and not PESQ. Many open-source implementations convert the WB PESQ result into MOS-LQO and report that instead. If so, the range of values is not -0.5 to 4.5 as you wrote in your paper.
My own measurements on the same dataset give a PESQ of 1.68. For MOS-LQO I get 1.97, as shown in your paper. The remaining metrics (CSIG, CBAK, and COVL) were measured in the same way as in the paper.
I would appreciate it if you could confirm that your PESQ metric is correct.
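To make the distinction concrete, here is a minimal sketch of the logistic mappings that ITU-T P.862.1 (narrowband) and P.862.2 (wideband) define from a raw PESQ score to MOS-LQO. The function names are my own, and the sketch does not try to reproduce the 1.68 and 1.97 figures above, since those depend on which tool and mode produced them. It only shows that the mapped score lives on a different scale: raw scores in -0.5 to 4.5 map to roughly 1.04 to 4.64 in the wideband case.

```python
import math


def raw_pesq_to_mos_lqo_wb(raw):
    """Wideband mapping from ITU-T P.862.2 (raw PESQ -> MOS-LQO)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.3669 * raw + 3.8224))


def raw_pesq_to_mos_lqo_nb(raw):
    """Narrowband mapping from ITU-T P.862.1 (raw PESQ -> MOS-LQO)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw + 4.6607))


if __name__ == "__main__":
    # Compare both mappings at the ends of the raw range and at one mid value.
    for raw in (-0.5, 1.68, 4.5):
        print(raw, raw_pesq_to_mos_lqo_wb(raw), raw_pesq_to_mos_lqo_nb(raw))
```

When comparing numbers across papers, it helps to state which of the two quantities (raw score or MOS-LQO) and which mode (narrowband or wideband) a given tool reports.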
Hi, Jason
Thank you very much for sharing the code. I really like your paper; the proposed scheme is a flexible and efficient way to optimize non-differentiable measures. Regarding the detailed training procedure, I have several questions:
(1) Did you train the enhancement model (G) using the PESQ measure (D) directly? That is, did you add an additional loss such as MSE, or did you use the PESQ measure alone in the training stage?
(2) When training D, I found that you train it using a so-called "previous list". It seems optional; is this stage crucial for getting a better result?
(3) In the released code, G and D are trained alternately for num_sampling=100 steps in one epoch, with a batch size of 1. Are these the same hyperparameters you used to get the results in Table 2?
Sorry to ask so many questions. Thank you again, and I wish you good work in the future!
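For readers unfamiliar with the idea behind question (2), a "previous list" can be pictured as a simple replay scheme: D is trained on the current epoch's enhanced files plus a few files kept from earlier epochs. The sketch below is only an illustration of that general idea under my own assumptions, not the released training code; `build_discriminator_batch`, `num_previous`, and the file names are hypothetical.

```python
import random


def build_discriminator_batch(current_files, previous_files, num_previous, rng=random):
    """Mix the current epoch's files with up to `num_previous` replayed older files."""
    k = min(num_previous, len(previous_files))
    replayed = rng.sample(previous_files, k)  # sample without replacement
    batch = list(current_files) + replayed
    rng.shuffle(batch)
    return batch


if __name__ == "__main__":
    history = []  # hypothetical "previous list" of enhanced files
    for epoch in range(3):
        current = ["epoch%d_utt%d.wav" % (epoch, i) for i in range(4)]
        d_train_files = build_discriminator_batch(current, history, num_previous=2)
        print(epoch, len(d_train_files))
        history.extend(current)  # files from this epoch become replay candidates
```

Whether such replay matters for the final scores is exactly what the question asks; the sketch takes no position on that.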
Hi,
I am curious about the batch size. It is always 1 across different implementations. How long does it take to train for 600 epochs? Why not use a much larger batch? Training takes a lot of time.
Hi, Jason
I am trying to run MetricGAN.py, but it fails while executing the read_pesq function.
    def read_pesq(clean_root, enhanced_file, sr):
        f=enhanced_file.split('/')[-1]
        wave_name=f.split('_')[-1].split('@')[0]
        clean_file=clean_root+'Train_'+wave_name+'.wav'
        cmd = PESQ_path+'/PESQ {} {} +{}'.format(clean_file, enhanced_file, sr)
        proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
        out = proc.communicate()
        pesq = float(out[0][-6:-1])
        return (pesq+0.5)/5.0
The particular error is (line 84):
pesq = float(out[0][-6:-1])
ValueError: could not convert string to float:
I also tried running it separately in bash, in the form
PESQ clean.wav enhanced.wav 16000
The error is "PESQ: command not found".
Could you please share more detail about how to use the PESQ file?
Thank you very much!
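A "command not found" message in bash usually means the executable is not on PATH; invoking it with an explicit path (for example ./PESQ from its own directory) and making sure it has execute permission are the usual first checks. When the binary cannot be run, its stdout is empty, which would explain the ValueError from float(). Below is a hedged sketch of a more defensive variant of the function above. It is not the repository's code: `read_pesq_checked` and `parse_last_float` are hypothetical names, and taking the last decimal number in the output is an assumption that mirrors the original out[0][-6:-1] slice.

```python
import os
import re
import subprocess


def parse_last_float(text):
    """Return the last decimal number in the tool's output, or raise if none."""
    matches = re.findall(r"[-+]?\d+\.\d+", text)
    if not matches:
        raise ValueError("No score found in PESQ output: %r" % text)
    return float(matches[-1])


def read_pesq_checked(pesq_binary, clean_file, enhanced_file, sr):
    """Run the PESQ executable and return the score normalized as (pesq+0.5)/5.0."""
    # Fail early with a clear message instead of parsing empty output.
    if not (os.path.isfile(pesq_binary) and os.access(pesq_binary, os.X_OK)):
        raise IOError("PESQ executable not found or not executable: %s" % pesq_binary)
    # Same arguments as the original command string: clean, enhanced, +sample_rate.
    cmd = [pesq_binary, clean_file, enhanced_file, "+%d" % sr]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    pesq = parse_last_float(out.decode("utf-8", "replace"))
    return (pesq + 0.5) / 5.0
```

With this variant, a missing or non-executable binary raises an error that names the path, rather than failing later inside float().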
Could you share your evaluation metric code? Some of the metrics I measured do not match the paper.
When I run ./MetricGAN.py, I get some numbers (maybe losses) and then an error, as shown below.
How do I fix this error? Which CUDA and cuDNN versions did you use? (My Keras, librosa, and Python versions are exactly the same as yours.)
0.765720826757873
2.1266
/home/users/woody/vocoder/speech_enhancement/metricGAN/local/lib/python2.7/site-packages/matplotlib/axes/_base.py:3152: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=1, right=1
'left=%s, right=%s') % (left, right))
Sample training data for discriminator training...
Discriminator training...
/home/users/woody/vocoder/speech_enhancement/metricGAN/local/lib/python2.7/site-packages/keras/engine/training.py:973: UserWarning: Discrepancy between trainable weights and collected trainable weights, did you set `model.trainable` without calling `model.compile` after ?
'Discrepancy between trainable weights and collected trainable'
Epoch 1/1
2019-09-03 18:42:10.063259: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-09-03 18:42:10.063335: E tensorflow/stream_executor/cuda/cuda_dnn.cc:411] possibly insufficient driver version: 384.130.0
2019-09-03 18:42:10.063354: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)
Hi
Can you share the critical requirements, such as the TensorFlow version and any other important dependencies?
Thanks
Hi,Jason:
When I try to run MetricGAN(table2).py, I run into two problems:
(1) In every epoch, for both G and D, the loss is nan, but only a warning is shown:
/student/home/yll/anaconda3/envs/MetricGAN/lib/python3.6/site-packages/keras/engine/training.py:973: UserWarning: Discrepancy between trainable weights and collected trainable weights, did you set `model.trainable` without calling `model.compile` after ?
'Discrepancy between trainable weights and collected trainable'
Epoch 1/1
- 1498s - loss: nan
(2) During the first epoch of discriminator training, it takes 593M of GPU memory, but GPU utilization is 0%.
Data info: I use the same dataset as SEGAN, already downsampled to 16 kHz.
By the way, keras-gpu=2.1.2, tensorflow-gpu=1.10, librosa=0.5.1, python=3.6.
Do you know why this happens (loss nan)?
Thank you very much.
It operates in the TF domain, right?
In this paper, MetricGAN used PESQ or STOI as the target metric for training.
In addition to PESQ and STOI, would it be possible to use metrics such as CSIG, CBAK, and COVL as target metrics?
Hello authors,
I'm trying to run MetricGAN on multiple GPUs. When I added a mirrored strategy to the MetricGAN model and tried to run it, it still ran on a single GPU. Can you suggest how to resolve this issue?
In this work, Zhijie constructed STOI and PESQ losses directly, so there is no need to use a GAN to approximate them.