jasonswfu / metricgan

MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement (ICML 2019, with Travel awards)

Python 41.84% MATLAB 58.16%

metricgan's Issues

Questions about MetricGAN

While studying MetricGAN, I came up with some questions.

What is the generative component of MetricGAN?
For example, SEGAN [1] has a generative component: it concatenates random latent vectors to its encoder output.
Would you classify this work as a generative or a discriminative method?
Have you tried a pre-trained discriminator (D) that is trained on the training data and then kept fixed while training the generator (G)?
As you mentioned in the paper, D should really be called an evaluator. The training procedure of MetricGAN is not adversarial; G and D help each other.
Therefore, I'm wondering which is better: training G with a pre-trained D, or training G and D together.

[1] Pascual, Santiago, Antonio Bonafonte, and Joan Serra. "SEGAN: Speech enhancement generative adversarial network." arXiv preprint arXiv:1703.09452 (2017).

Thank you very much for sharing your great work.

pretrained SE model available?

Hi,
I am looking for a pretrained SE model so that I can use this method as a baseline for my work. I could not find one here; if possible, could you please point me to the right place? Thanks a lot!

About PESQ measurement

I have a question about how performance was measured on the NSDTSEA (VCTK+DEMAND) dataset in the paper.

I think you measured MOS-LQO rather than raw PESQ. Many open-source implementations convert the wideband (WB) PESQ result into MOS-LQO and report that value. If so, the range of values is not -0.5 to 4.5 as stated in your paper.

My own measurements on the same dataset give a PESQ of 1.68. The MOS-LQO is 1.97, which is the value shown in your paper. The remaining metrics, CSIG, CBAK, and COVL, were measured in the same way as in the paper.

I would appreciate it if you could confirm whether your PESQ metric is correct.
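
For readers comparing the two scales, here is a minimal sketch of the raw-PESQ to MOS-LQO mapping functions, using the coefficients published in ITU-T P.862.1 (narrowband) and P.862.2 (wideband). The function names are illustrative and are not part of the MetricGAN code. Which mapping, if any, the paper's evaluation script applied is not established in this thread.

```python
import math


def raw_pesq_to_mos_lqo_nb(raw_pesq):
    """Narrowband mapping from ITU-T P.862.1 (raw PESQ -> MOS-LQO)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


def raw_pesq_to_mos_lqo_wb(raw_pesq):
    """Wideband mapping from ITU-T P.862.2 (raw PESQ -> MOS-LQO)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.3669 * raw_pesq + 3.8224))


if __name__ == "__main__":
    # Print both mappings over the raw PESQ range of -0.5 to 4.5.
    for raw in (-0.5, 1.0, 2.5, 4.5):
        nb = raw_pesq_to_mos_lqo_nb(raw)
        wb = raw_pesq_to_mos_lqo_wb(raw)
        print("raw=%.2f  NB MOS-LQO=%.3f  WB MOS-LQO=%.3f" % (raw, nb, wb))
```

Both mappings are nonlinear, so converting a dataset-averaged raw score does not give the same number as averaging per-file MOS-LQO values. Any comparison between the two scales has to be made per file before averaging.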

A few questions on training schemes

Hi, Jason

Thank you very much for sharing the code. I really like your paper; the proposed scheme is a flexible and efficient way to optimize non-differentiable measures. I have several questions about the detailed training procedure:

(1) Did you train the enhancement model (G) using the PESQ measure (D) directly? That is, did you add an additional loss, such as MSE, or did you use the PESQ measure alone during training?
(2) When training D, I found that you train it using a so-called "previous list". This seems optional, so I would like to know whether this stage is crucial for getting a better result.
(3) In the released code, G and D are trained alternately for num_sampling=100 steps in one epoch, and the batch size is 1. Are these hyper-parameters the same as those in the recipe you used to get the Table 2 results?

Sorry to ask so many questions. Thank you again, and I wish you good work in the future!

Batch Size = 1

Hi,
I am curious about the batch size. It is always 1 across the different implementations. How long does it take to train for 600 epochs? Why not use a much larger batch? Training takes a lot of time.

Running PESQ file

Hi, Jason

I am trying to run MetricGAN.py, but it fails while executing the read_pesq function.

def read_pesq(clean_root, enhanced_file, sr):
    f=enhanced_file.split('/')[-1]       
    wave_name=f.split('_')[-1].split('@')[0]
    clean_file=clean_root+'Train_'+wave_name+'.wav'
    
    cmd = PESQ_path+'/PESQ {} {} +{}'.format(clean_file, enhanced_file, sr)
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
    out = proc.communicate()
    pesq = float(out[0][-6:-1])
    return (pesq+0.5)/5.0

The particular error is (line 84):

    pesq = float(out[0][-6:-1])
ValueError: could not convert string to float:

I tried running it separately in bash in the following format:

PESQ clean.wav enhanced.wav 16000

The error is "PESQ: command not found".

Could you please share more details about how to use the PESQ file?

Thank you very much!
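
The bash error above means the shell found no executable named PESQ on its PATH. The quoted function, in contrast, calls the binary through an explicit path (PESQ_path + '/PESQ') and passes the sample rate with a leading "+". The sketch below is one defensive way to write such a wrapper so that a missing binary or unparsable output produces a readable error instead of a bare float() failure. It is not the repository's code: the pesq_binary argument is a hypothetical path to a compiled PESQ executable, and the parser assumes the score is the last decimal number the tool prints, which mirrors the tail slicing in the original but should be checked against the actual output.

```python
import os
import re
import subprocess


def parse_pesq_output(text):
    """Return the last decimal number in the tool output (assumed to be the score)."""
    numbers = re.findall(r"-?\d+\.\d+", text)
    if not numbers:
        # Surface the raw output so the real failure (bad path, bad file) is visible.
        raise ValueError("no PESQ score found in output: %r" % text)
    return float(numbers[-1])


def read_pesq(pesq_binary, clean_file, enhanced_file, sr=16000):
    """Run the PESQ executable and return the score normalized as in the quoted code."""
    if not os.path.isfile(pesq_binary):
        raise IOError("PESQ executable not found at %r" % pesq_binary)
    # Argument list instead of a shell string; the sample rate keeps its leading '+'.
    cmd = [pesq_binary, clean_file, enhanced_file, "+%d" % sr]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    pesq = parse_pesq_output(out.decode("utf-8", "replace"))
    # Same normalization as the quoted read_pesq.
    return (pesq + 0.5) / 5.0
```

When testing by hand in bash, the equivalent call would use the explicit path to the executable and the "+16000" form of the sample-rate argument, assuming the binary has been compiled and marked executable.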

What were the PESQ scores of Deep Feature Loss and SERGAN?

Hi, thanks for sharing your great work! Could you elaborate on the numbers that are not listed here? I want to compare exactly which model performs best. Thanks!

Evaluation metrics

Could you provide your evaluation metric code? Some of the metrics I measured do not match those in the paper.

Which CUDA and cuDNN versions did you use?

When I run ./MetricGAN.py, I get some numbers (maybe losses) and then an error, as shown below.
How do I fix this error? Which CUDA and cuDNN versions did you use? (My Keras, librosa, and Python versions are exactly the same as yours.)

0.765720826757873
2.1266
/home/users/woody/vocoder/speech_enhancement/metricGAN/local/lib/python2.7/site-packages/matplotlib/axes/_base.py:3152: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=1, right=1
'left=%s, right=%s') % (left, right))
Sample training data for discriminator training...
Discriminator training...
/home/users/woody/vocoder/speech_enhancement/metricGAN/local/lib/python2.7/site-packages/keras/engine/training.py:973: UserWarning: Discrepancy between trainable weights and collected trainable weights, did you set model.trainable without calling model.compile after ?
'Discrepancy between trainable weights and collected trainable'
Epoch 1/1
2019-09-03 18:42:10.063259: E tensorflow/stream_executor/cuda/cuda_dnn.cc:403] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-09-03 18:42:10.063335: E tensorflow/stream_executor/cuda/cuda_dnn.cc:411] possibly insufficient driver version: 384.130.0
2019-09-03 18:42:10.063354: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)

Is it possible to share detailed requirements?

Hi
Can you share the critical requirements, such as the TensorFlow version and any other important dependencies?

Thanks

loss: NaN

Hi, Jason:

When I try to train with MetricGAN(table2).py, I run into two problems:
(1) At every epoch, for both G and D, the loss is NaN, but only a warning is shown:

/student/home/yll/anaconda3/envs/MetricGAN/lib/python3.6/site-packages/keras/engine/training.py:973: UserWarning: Discrepancy between trainable weights and collected trainable weights, did you set `model.trainable` without calling `model.compile` after ?
  'Discrepancy between trainable weights and collected trainable'
Epoch 1/1
 - 1498s - loss: nan

(2) During the first epoch of discriminator training, the process takes 593M of GPU memory, but GPU utilization is 0%.

Data info: I use the same dataset as SEGAN, already downsampled to 16 kHz.
By the way, my environment is keras-gpu=2.1.2, tensorflow-gpu=1.10, librosa=0.5.1, python=3.6.

Do you know why this happens (loss is NaN)?
Thank you very much.
