
IAM data testset processing and path for rnn.py (returnn, closed)

rwth-i6 commented on July 17, 2024

IAM data testset processing and path for rnn.py


Comments (6)

pvoigtlaender commented on July 17, 2024

Hi,

we do not write any transcriptions into the test files, as they are not needed. We only use the test data for evaluation, not to train or tune parameters. By leaving out the transcriptions, we wanted to avoid accidentally using them for training or tuning.

We train on train.1.h5 and train.2.h5, while train_valid.h5 is used for validation, for example for early stopping to prevent overfitting.
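As an illustration of how a held-out validation set supports early stopping, here is a minimal sketch in plain Python. It assumes you already have one validation score per epoch (lower is better, e.g. the loss on train_valid.h5); the helper `best_epoch` and its `patience` parameter are hypothetical and not part of RETURNN's API.

```python
from typing import Dict, Optional


def best_epoch(valid_scores: Dict[int, float], patience: int = 3) -> Optional[int]:
    """Pick the epoch with the lowest validation score (hypothetical helper).

    Epochs are visited in order; once `patience` consecutive epochs bring no
    improvement, later epochs are ignored, which is the early-stopping idea.
    """
    best_ep: Optional[int] = None
    best_score = float("inf")
    since_best = 0
    for ep, score in sorted(valid_scores.items()):
        if score < best_score:
            best_ep, best_score, since_best = ep, score, 0
        else:
            since_best += 1
            if since_best >= patience:
                break  # stop: no improvement for `patience` epochs
    return best_ep


# Illustrative, made-up scores: epoch 2 is best before the score starts rising.
scores = {1: 0.9, 2: 0.5, 3: 0.6, 4: 0.7, 5: 0.8, 6: 0.1}
print(best_epoch(scores, patience=3))  # → 2
```

Epoch 6 is never considered with `patience=3`, because training would already have stopped after three epochs without improvement.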
Once the training is done, we forward the trained network on the valid.h5 or test.h5 data, which creates h5 files with posterior probabilities. These are then the input to a decoder which does the recognition. In the demo, there is a script which does the recognition in a very simple way (without a language model or lexicon, and with a greedy search strategy).
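The demo's decoding script is not reproduced here. As a hedged sketch of what greedy decoding without a language model or lexicon typically looks like, the following assumes the network was trained with a CTC-style output where one class index is the blank label, and that the per-frame posteriors of one sequence have already been read from the forwarded h5 file into a list of lists (reading the file is not shown). `greedy_ctc_decode` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def greedy_ctc_decode(posteriors: Sequence[Sequence[float]], blank: int) -> List[int]:
    """Best-path decoding of one sequence (hypothetical helper).

    Takes the most probable class in every frame, merges consecutive
    repeats and drops the blank label. No lexicon, no language model.
    """
    labels: List[int] = []
    prev: Optional[int] = None
    for frame in posteriors:
        best = max(range(len(frame)), key=frame.__getitem__)  # argmax
        if best != prev and best != blank:
            labels.append(best)
        prev = best
    return labels


# Toy example: 3 classes, index 0 assumed to be the blank.
frames = [
    [0.1, 0.8, 0.1],    # class 1
    [0.2, 0.7, 0.1],    # class 1 again (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.6, 0.3],    # class 1 (new label, separated by a blank)
    [0.1, 0.2, 0.7],    # class 2
]
print(greedy_ctc_decode(frames, blank=0))  # → [1, 1, 2]
```

The resulting indices would then be mapped back to characters with the same symbol inventory that was used for training. Which index is the blank depends on how the output layer was set up.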
The final evaluation is done with the result of the decoder, outside of RETURNN.
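The thread does not say which scoring tool was used for that final evaluation. Recognition results on IAM are commonly reported as word and character error rates, so here is a minimal Levenshtein-based sketch of how such a score could be computed from decoder output and ground truth; the function names are hypothetical.

```python
from typing import Hashable, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(refs: Sequence[Sequence[Hashable]], hyps: Sequence[Sequence[Hashable]]) -> float:
    """Total edit errors divided by the total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


ref = "This is only partly true ."
hyp = "This is partly true ."
print(error_rate([ref.split()], [hyp.split()]))  # WER: 1 deleted word out of 6
print(error_rate([ref], [hyp]))  # CER over characters, spaces included
```

Passing token lists gives a word error rate, passing plain strings gives a character error rate; either way, the hypothesis has to use the same tokenization as the ground truth for the numbers to be meaningful.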


aarora8 commented on July 17, 2024

Thank you. Do the test transcriptions also go through the same regex operations before the decoder output is compared with them? Or is the same regex operation applied during scoring?


pvoigtlaender commented on July 17, 2024

Whether you need to post-process the recognition output depends on the recognizer. The important thing is that your output matches the format of the ground truth. I'm not sure anymore whether we changed the tokenization of the transcriptions or whether it was provided like this, but here are a few samples of the ground truth which we used, to give you an idea of how tokenization was done for IAM (I can try to look up more details if necessary):

d06-030: Amen . MOST people would probably regard tiredness as a purely physical thing . The cure for which is sleep . This is only partly true . Many people wake up tired of a morning and no amount of rest seems to make any difference . Sleep , to be effective , must be of that child-like quality which comes from innocence .
d06-050: Nor is she necessarily being deceitful . She really did feel tired until the mind got the necessary injection of a fresh - and an attractive - interest ! Tiredness has , therefore , as much to do with our mental state as with our physical exhaustion . A disturbed mind can bring the healthiest body to a sense of fatigue . They wonder why they get no rest at night , even if they do sleep . 
d06-082: If we ever stopped to consider how much energy - and time - we lose this way in the course of a day we would be staggered . Some of it is inevitable , and we do not want to become too pernickety . Nevertheless , we could all probably be a little more orderly for we so frequently just muddle through .
d04-075: The Messusah , which is nailed on the doorposts contains those two lessons . The third lesson of the Shmah ( Numbers 15 , 37-41 ) is called " the going-out of the land of Egypt " . It starts with the story of the fringes the " tsitsits " . This lesson used to be read only in the morning . And the Talmud tells a story , which is also in the Hagadah that Rabbi Eleasar ben Assarja said , " I am nearly seventy and I had not succeeded that
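To make recognizer output match ground truth like the samples above, a post-processing step could split punctuation into separate tokens. This is a minimal sketch under the assumption that these samples are representative; the regex and the helper names `tokenize` and `parse_line` are hypothetical and not taken from the RETURNN demo.

```python
import re
from typing import List, Tuple

# Hypothetical rule inferred only from the samples above: sentence punctuation,
# brackets and double quotes become separate tokens, while hyphens inside words
# ("child-like", "37-41") are left alone. Apostrophes are not handled because
# the samples do not show how they are tokenized.
_PUNCT = re.compile(r'([.,!?;:()"])')


def tokenize(text: str) -> str:
    """Put spaces around punctuation and collapse repeated whitespace."""
    return " ".join(_PUNCT.sub(r" \1 ", text).split())


def parse_line(line: str) -> Tuple[str, List[str]]:
    """Split a '<id>: <tokens>' line, as in the samples, into id and tokens."""
    seq_id, _, text = line.partition(": ")
    return seq_id, text.split()


print(tokenize("Sleep, to be effective, must be of that child-like quality."))
# → Sleep , to be effective , must be of that child-like quality .
print(parse_line("d06-030: Amen . MOST people"))
# → ('d06-030', ['Amen', '.', 'MOST', 'people'])
```

This rule would also split decimal numbers and abbreviations containing a period, which may not match the real ground truth; checking it against the full transcription files is advisable before scoring.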


aarora8 commented on July 17, 2024

Thank you for the help. It is not necessary, I was just curious.


pvoigtlaender commented on July 17, 2024

Great!
Do you have further problems/questions? Otherwise I'd like to close the issue for now. You can open a new one if you run into further problems later.


aarora8 commented on July 17, 2024

Ok, yeah, I do not have any further questions.
