
Comments (7)

archiki commented on June 19, 2024

Hi @n1243645679976!
The table you mentioned reports the WER scores on libri-test-clean, so you do not need to add any SNR-related arguments; you can simply use test.py/test_enhanced.py. Your command seems to be missing --model-path; you should add it unless you have hard-coded the path at your end. I have made minor edits to test.py and added `utils_orig.py`, which should eliminate some of the errors. Note that greedy (Viterbi) decoding is the default setting.

Hope this helps!
Best,
Archiki


n1243645679976 commented on June 19, 2024

Thanks for your quick response and explanation!
I am trying test_enhanced.py now, but the noise-related code (L36, L40-41, L90-91, L106-114, L154) still gives me error messages, so I commented those lines out to force the evaluation to run. If any problem comes up, I'll comment in this issue again.
By the way, could I ask for the training/dev loss logs and the checkpoint after fine-tuning?
I want to use them to compare with my experiment results.

Best,
Cheng-Hung Hu.


archiki commented on June 19, 2024

I am not sure which lines you are referring to (L36 is the parser code for epochs), but one quick fix is to supply the SNR arguments while keeping --noise-dir empty or None and --noise-prob at 0. This will not add any noise, and the evaluation can proceed.
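To see why this works, here is a minimal sketch of the gating logic, assuming the test script only mixes in noise when both a noise directory and a nonzero noise probability are supplied; the function name and signature below are illustrative, not the repo's actual code:

```python
import random

# Illustrative sketch only: maybe_add_noise and its signature are
# assumptions, not the actual robust-e2e-asr implementation.
def maybe_add_noise(audio, noise_dir=None, noise_prob=0.0, snr_db=0.0):
    # With --noise-dir empty/None or --noise-prob 0, this branch always
    # fires and the audio passes through untouched.
    if not noise_dir or random.random() >= noise_prob:
        return audio
    # ...otherwise a noise clip from noise_dir would be mixed in at snr_db
    raise NotImplementedError("noise mixing not sketched here")

assert maybe_add_noise([0.1, 0.2]) == [0.1, 0.2]  # no-op with defaults
```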


n1243645679976 commented on June 19, 2024

Hi @archiki,
I'm testing with test_enhanced.py, so the line numbers I mentioned (e.g., L36) refer to that file. I didn't change the audio_conf_noise parameters such as noise-prob and noise-dir, since noise-prob is already set to 0 in your code and the target test set is already noisy.

Here's the process I used to test the noisy dataset:
I modified test_enhanced.py as I commented two days ago, changing L154 to `half=args.half, wer_dict=wer_dict)` (removing the args.SNR argument).
Then, after grouping the .wav files in the custom dataset (test_noisy_speech) by SNR and noise type, I tested them with the pretrained model, and I found that the WER and CER differ from the results shown in Table 2 of your paper.
For instance, the table shows a WER of 35.0 for Car at 0 dB, but the WER I got is 45.683.

My questions are:

  1. Is the pretrained model the one reported as the baseline in your paper? If not, could you provide your model?
  2. If it is, is my evaluation process different from yours (e.g., the dataset or something else)?

Best,
Cheng-Hung Hu


archiki commented on June 19, 2024

Hey @n1243645679976,

I applied the edit you suggested at L154 of test_enhanced.py. However, at my end I am able to reproduce the results in the table. I am attaching an image of the command as well as the results it generated; I hope this gives you some clarity.
[image: test command and the generated results]

So the answer to your question 1 is yes, it's the same model. Question 2 is difficult for me to answer since I don't have access to your setup; however, I have provided the test set used, as you mentioned. I would recommend double-checking your manifest files to ensure that each audio file is matched with the correct transcript text (see the sketch below). You should also check whether you can re-create the clean WER of 10.3 on the standard libri-test-clean set.
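As an illustration of that manifest check, here is a small sketch. It assumes each manifest row has the deepspeech.pytorch-style form `wav_path,txt_path`; the row layout, helper name, and manifest file name are assumptions, not the repo's documented format:

```python
import csv
import os

# Hypothetical sanity check: assumes each row is
# "path/to/audio.wav,path/to/transcript.txt"; adjust if the repo's
# manifest format differs.
def check_manifest(manifest_path):
    with open(manifest_path, newline="") as f:
        for lineno, row in enumerate(csv.reader(f), 1):
            wav_path, txt_path = row[0], row[1]
            if not os.path.isfile(wav_path):
                print(f"row {lineno}: missing audio {wav_path}")
            if not os.path.isfile(txt_path):
                print(f"row {lineno}: missing transcript {txt_path}")
            # Matching basenames are a common convention; a mismatch here
            # often means an audio file was paired with the wrong text.
            wav_id = os.path.splitext(os.path.basename(wav_path))[0]
            txt_id = os.path.splitext(os.path.basename(txt_path))[0]
            if wav_id != txt_id:
                print(f"row {lineno}: basename mismatch {wav_id} vs {txt_id}")

# check_manifest("libri_test_clean_manifest.csv")  # hypothetical name
```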

> Then, after grouping the .wav files in the custom dataset (test_noisy_speech) by SNR and noise type, I tested them with

I am not sure what you mean by this; there is no need for you to group anything. As long as you add all the files in test_noisy_speech to the manifest appropriately, the testing code can group the files by noise type and SNR itself (a sketch of that kind of grouping follows). Hope this helps.
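For intuition, here is a hypothetical sketch of such per-condition grouping. It assumes the noise type and SNR can be read from each file's path, e.g. test_noisy_speech/Car/0dB/utt1.wav; the path layout and function are assumptions, not the repo's actual code:

```python
from collections import defaultdict

# Hypothetical: assumes paths like "test_noisy_speech/Car/0dB/utt1.wav";
# the real test_enhanced.py may recover these labels differently.
def group_by_condition(wav_paths):
    groups = defaultdict(list)
    for path in wav_paths:
        parts = path.split("/")
        noise_type, snr = parts[-3], parts[-2]
        groups[(noise_type, snr)].append(path)
    return groups

buckets = group_by_condition([
    "test_noisy_speech/Car/0dB/utt1.wav",
    "test_noisy_speech/Car/0dB/utt2.wav",
])
print(buckets)  # {('Car', '0dB'): [...]}; per-condition WER per bucket
```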

Best,
Archiki


n1243645679976 commented on June 19, 2024

Hi @archiki,
Thank you for running the experiment; the image is very helpful!
I can reproduce the experimental results now!
I found that the WER in your table and the WER in the row starting with Test Summary are derived differently, and that was the point of confusion.

This is what happened:
First, I misunderstood and assumed that the WER in your table and the WER in the row starting with Test Summary are derived the same way.
I commented out the print_summary part of your code (L106-L114) because I thought that even if I grouped the wav files by SNR and noise type and tested only them, it would still give me the same result as in the table.
So when I grouped the wav files by Car and 0 dB first and tested them, I could only get the WER in the row starting with Test Summary, which gave me a WER (32.56) different from yours (35.0), so I was confused and couldn't see where the difference came from.
[image: reproduced Test Summary WER output]

Thanks for your help!

Best,
Cheng-Hung Hu


archiki commented on June 19, 2024

Yes, the difference between the two is that under Test Summary, WER is calculated as [sum of all edit distances in the test set] / [sum of all transcript lengths], rather than by averaging the per-utterance ratios [edit distance / transcript length]. Hope that makes sense to you.
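To make the two aggregation schemes concrete, here is a small self-contained sketch; the function names are illustrative, not the repo's actual code:

```python
# Sketch of the two WER aggregations described above.

def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance via the classic DP."""
    prev = list(range(len(hyp_words) + 1))
    for i, rw in enumerate(ref_words, 1):
        cur = [i]
        for j, hw in enumerate(hyp_words, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (rw != hw)))    # substitution
        prev = cur
    return prev[-1]

def corpus_wer(pairs):
    """'Test Summary' style: total edits / total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return edits / words

def mean_utterance_wer(pairs):
    """Per-utterance style: average of (edits / reference length)."""
    ratios = [edit_distance(r.split(), h.split()) / len(r.split())
              for r, h in pairs]
    return sum(ratios) / len(ratios)

pairs = [("the cat sat", "the cat sat"), ("hello world", "hello word there")]
print(corpus_wer(pairs))          # 0.4
print(mean_utterance_wer(pairs))  # 0.5 -- the two can legitimately differ
```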

