
Comments (7)

archiki commented on June 19, 2024

Hi @n1243645679976!
The table you mentioned reports the WER scores on libri-test-clean, so you do not need to add any SNR-related arguments; you can simply use test.py/test_enhanced.py. Your command seems to be missing --model-path; you should add it unless you have hard-coded the path at your end. I have made minor edits to test.py and added `utils_orig.py`, which should eliminate some of the errors. Note that greedy (Viterbi) decoding is the default setting.

Hope this helps!
Best,
Archiki


n1243645679976 commented on June 19, 2024

Thanks for your quick response and explanation!
I am trying test_enhanced.py now, but the noise-related code (L36, L40-41, L90-91, L106-114, L154) still gives me error messages, so I commented those lines out to force the evaluation to run. If any problem comes up, I'll comment in this issue again.
By the way, could I ask for the training/dev loss logs and the checkpoint after fine-tuning?
I want to use them to compare with my experiment results.

Best,
Cheng-Hung Hu.


archiki commented on June 19, 2024

I am not sure which lines you are referring to (L36 is the parser code for epochs), but one quick fix is to supply the SNR arguments while keeping --noise-dir empty or None and --noise-prob at 0. This will not add any noise, and the evaluation can proceed.
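To see why this works, here is a minimal sketch of the gating logic, assuming the test script only mixes in noise when both a noise directory and a nonzero noise probability are supplied; the function name and signature below are illustrative, not the repo's actual code:

```python
import random

# Illustrative sketch only: maybe_add_noise and its signature are
# assumptions, not the actual robust-e2e-asr implementation.
def maybe_add_noise(audio, noise_dir=None, noise_prob=0.0, snr_db=0.0):
    # With --noise-dir empty/None or --noise-prob 0, this branch always
    # fires and the audio passes through untouched.
    if not noise_dir or random.random() >= noise_prob:
        return audio
    # ...otherwise a noise clip from noise_dir would be mixed in at snr_db
    raise NotImplementedError("noise mixing not sketched here")

assert maybe_add_noise([0.1, 0.2]) == [0.1, 0.2]  # no-op with defaults
```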


n1243645679976 commented on June 19, 2024

Hi @archiki,
I'm testing with test_enhanced.py, so the line numbers I mentioned (e.g., L36) refer to that file. I didn't change the audio_conf_noise parameters such as noise-prob and noise-dir, since noise-prob is already set to 0 in your code and the target test set is already noisy.

Here's the process I used to test the noisy dataset:
I modified test_enhanced.py as I commented two days ago, changing L154 to `half=args.half, wer_dict=wer_dict)` (removing the args.SNR argument).
Then, after grouping the .wav files in the custom dataset (test_noisy_speech) by SNR and noise type, I tested them with the pretrained model, and I found that the WER and CER differ from the results shown in Table 2 of your paper.
For instance, the table shows a WER of 35.0 for Car at 0 dB, but the WER I got is 45.683.

My questions are:

  1. Is the pretrained model the one reported as the baseline in your paper? If not, could you provide your model?
  2. If it is, is my evaluation process different from yours (e.g., the dataset or something else)?

Best,
Cheng-Hung Hu


archiki commented on June 19, 2024

Hey @n1243645679976,

I applied the edit you suggested at L154 of test_enhanced.py. However, at my end I am able to reproduce the results in the table. I am attaching an image of the command as well as the results it generated; I hope this gives you some clarity.
[image: test command and the generated results]

So the answer to your question 1 is yes, it's the same model. Question 2 is difficult for me to answer since I don't have access to your setup; however, I have provided the test set used, as you mentioned. I would recommend double-checking your manifest files to ensure that each audio file is matched with the correct transcript text (see the sketch below). You should also check whether you can re-create the clean WER of 10.3 on the standard libri-test-clean set.
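As an illustration of that manifest check, here is a small sketch. It assumes each manifest row has the deepspeech.pytorch-style form `wav_path,txt_path`; the row layout, helper name, and manifest file name are assumptions, not the repo's documented format:

```python
import csv
import os

# Hypothetical sanity check: assumes each row is
# "path/to/audio.wav,path/to/transcript.txt"; adjust if the repo's
# manifest format differs.
def check_manifest(manifest_path):
    with open(manifest_path, newline="") as f:
        for lineno, row in enumerate(csv.reader(f), 1):
            wav_path, txt_path = row[0], row[1]
            if not os.path.isfile(wav_path):
                print(f"row {lineno}: missing audio {wav_path}")
            if not os.path.isfile(txt_path):
                print(f"row {lineno}: missing transcript {txt_path}")
            # Matching basenames are a common convention; a mismatch here
            # often means an audio file was paired with the wrong text.
            wav_id = os.path.splitext(os.path.basename(wav_path))[0]
            txt_id = os.path.splitext(os.path.basename(txt_path))[0]
            if wav_id != txt_id:
                print(f"row {lineno}: basename mismatch {wav_id} vs {txt_id}")

# check_manifest("libri_test_clean_manifest.csv")  # hypothetical name
```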

> Then, after grouping the .wav files in the custom dataset (test_noisy_speech) by SNR and noise type, I tested them with

I am not sure what you mean by this; there is no need for you to group anything. As long as you add all the files in test_noisy_speech to the manifest appropriately, the testing code can group the files by noise type and SNR itself (a sketch of that kind of grouping follows). Hope this helps.
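For intuition, here is a hypothetical sketch of such per-condition grouping. It assumes the noise type and SNR can be read from each file's path, e.g. test_noisy_speech/Car/0dB/utt1.wav; the path layout and function are assumptions, not the repo's actual code:

```python
from collections import defaultdict

# Hypothetical: assumes paths like "test_noisy_speech/Car/0dB/utt1.wav";
# the real test_enhanced.py may recover these labels differently.
def group_by_condition(wav_paths):
    groups = defaultdict(list)
    for path in wav_paths:
        parts = path.split("/")
        noise_type, snr = parts[-3], parts[-2]
        groups[(noise_type, snr)].append(path)
    return groups

buckets = group_by_condition([
    "test_noisy_speech/Car/0dB/utt1.wav",
    "test_noisy_speech/Car/0dB/utt2.wav",
])
print(buckets)  # {('Car', '0dB'): [...]}; per-condition WER per bucket
```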

Best,
Archiki


n1243645679976 commented on June 19, 2024

Hi @archiki,
Thank you for running the experiment; the image is very helpful!
I can reproduce the experimental results now!
I found that the WER in your table and the WER in the row starting with Test Summary are derived differently, and that was the point of confusion.

This is what happened:
First, I misunderstood and assumed that the WER in your table and the WER in the row starting with Test Summary are derived the same way.
I commented out the print_summary part of your code (L106-L114) because I thought that even if I grouped the wav files by SNR and noise type and tested only them, it would still give me the same result as in the table.
So when I grouped the wav files by Car and 0 dB first and tested them, I could only get the WER in the row starting with Test Summary, which gave me a WER (32.56) different from yours (35.0), so I was confused and couldn't see where the difference came from.
[image: reproduced Test Summary WER output]

Thanks for your help!

Best,
Cheng-Hung Hu


archiki commented on June 19, 2024

Yes, the difference between the two is that under Test Summary, WER is calculated as [sum of all edit distances in the test set] / [sum of all transcript lengths], rather than by averaging the per-utterance ratios [edit distance / transcript length]. Hope that makes sense to you.
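To make the two aggregation schemes concrete, here is a small self-contained sketch; the function names are illustrative, not the repo's actual code:

```python
# Sketch of the two WER aggregations described above.

def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance via the classic DP."""
    prev = list(range(len(hyp_words) + 1))
    for i, rw in enumerate(ref_words, 1):
        cur = [i]
        for j, hw in enumerate(hyp_words, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (rw != hw)))    # substitution
        prev = cur
    return prev[-1]

def corpus_wer(pairs):
    """'Test Summary' style: total edits / total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return edits / words

def mean_utterance_wer(pairs):
    """Per-utterance style: average of (edits / reference length)."""
    ratios = [edit_distance(r.split(), h.split()) / len(r.split())
              for r, h in pairs]
    return sum(ratios) / len(ratios)

pairs = [("the cat sat", "the cat sat"), ("hello world", "hello word there")]
print(corpus_wer(pairs))          # 0.4
print(mean_utterance_wer(pairs))  # 0.5 -- the two can legitimately differ
```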

