What is the Kaldi, RASR decoder to decode the posterior file in demo/mdlstm/IAM? about returnn HOT 16 CLOSED

rwth-i6 commented on July 17, 2024

What is the Kaldi, RASR decoder to decode the posterior file in demo/mdlstm/IAM?

from returnn.

Comments (16)

pvoigtlaender commented on July 17, 2024

Hi,

If you just want a quick result, you can do the best path decoding of the posteriors yourself with a simple script. Note, however, that best path decoding uses neither a language model nor a lexicon and decodes greedily.

For best path decoding, you first take the argmax at each frame, then merge all adjacent instances of the same label into one, and then remove the blanks. For example, aa_abcc_d, where "_" is the blank, is first mapped to a_abc_d and then, after removing the blanks, to aabcd. I don't have the script here right now, but I can give you a rough code fragment that you might get working (the chars.txt is from here: https://github.com/rwth-i6/returnn/blob/master/demos/mdlstm/IAM/chars.txt)

import h5py
import numpy

# One label per line; line i is the symbol for output class i.
with open("chars.txt") as f:
  chars = [line.strip() for line in f]

with h5py.File("my_cache.h5", "r") as f:
  for seg in f:
    x = f[seg][...]
    # Best label per frame.
    x = numpy.argmax(x, axis=1)
    x = [chars[idx] for idx in x]
    # Merge adjacent repeats of the same label.
    y = []
    last_char = None
    for c in x:
      if last_char != c:
        y.append(c)
        last_char = c
    # Remove the blanks, then map special tokens to their characters.
    y = filter(lambda z: z != "_blank", y)
    y = map(lambda z: " " if z == "si" else z, y)
    y = map(lambda z: "." if z == "_dot" else z, y)
    y = map(lambda z: "-" if z == "_minus" else z, y)
    y = map(lambda z: "+" if z == "_plus" else z, y)
    y = map(lambda z: "#" if z == "_hash" else z, y)
    output = "".join(y).strip()
    print(seg, output)
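
The three steps described above (argmax per frame, merge adjacent repeats, remove blanks) can be tried without any HDF5 file. The following is a minimal sketch in plain Python, not the original script: the function name best_path_decode, the toy label set, and the one-hot frames are assumptions made for illustration only.

```python
def best_path_decode(posteriors, labels, blank):
    """Greedy (best path) decoding of a single sequence.

    posteriors: list of per-frame score lists, one score per class
    labels:     list mapping class index -> symbol (hypothetical toy set here)
    blank:      the symbol that plays the role of the CTC blank
    """
    # Step 1: pick the best-scoring class in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]
    symbols = [labels[i] for i in best]

    # Step 2: merge adjacent repeats of the same symbol into one.
    merged = []
    for s in symbols:
        if not merged or merged[-1] != s:
            merged.append(s)

    # Step 3: remove the blanks (only after merging).
    return "".join(s for s in merged if s != blank)


# Toy data: one-hot "posteriors" that spell out the frame sequence aa_abcc_d.
labels = ["a", "b", "c", "d", "_"]
frames = [[1.0 if l == ch else 0.0 for l in labels] for ch in "aa_abcc_d"]

decoded = best_path_decode(frames, labels, blank="_")
print(decoded)
```

On this toy input the intermediate result after merging is a_abc_d, and the final output is aabcd, matching the example in the text.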


pvoigtlaender commented on July 17, 2024

The key should be "inputs" (I know, the name is kind of misleading...)
Please give it a try and let me know if this works.
So instead of
x = f[seg][...]
try to use
x = f['inputs'][...]


pvoigtlaender commented on July 17, 2024

The 80th character is the CTC blank, and it is removed by the first filter, so do not remove this line:
y = filter(lambda z: z != "_blank", y)

For better readability, you might replace it with
y = [z for z in y if z != "_blank"]

You can, however, remove all the maps below it. They were used for a different set of characters and are not necessary in your case.
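
The position of the blank filter matters: it belongs after the merge step. If the blank is dropped earlier (for example by skipping its index while mapping the argmax output to characters), genuinely doubled letters collapse into one. A small sketch of the difference, using a made-up frame sequence and a hypothetical helper merge_repeats:

```python
def merge_repeats(symbols):
    """Collapse runs of identical adjacent symbols into a single symbol."""
    merged = []
    for s in symbols:
        if not merged or merged[-1] != s:
            merged.append(s)
    return merged


# Per-frame argmax symbols for an illustrative sequence; "_" is the blank.
frames = list("aa_aab")

# Merge first, then remove blanks: the blank keeps the two "a" runs apart.
merge_then_filter = "".join(s for s in merge_repeats(frames) if s != "_")

# Remove blanks first, then merge: both "a" runs fuse into one.
filter_then_merge = "".join(merge_repeats([s for s in frames if s != "_"]))

print(merge_then_filter)
print(filter_then_merge)
```

Here the first variant yields aab and the second yields ab, so words with double letters would lose a character if the blank were filtered too early.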


pvoigtlaender commented on July 17, 2024

Hi,

that's a point I forgot to mention. The outputs of all sequences are concatenated in the inputs field. Here is an updated script which will do the job:

#!/usr/bin/env python

import h5py
import numpy

with open("chars.txt") as f:
  chars = [l.strip() for l in f.readlines()]

with h5py.File("mdlstm_long_valid.h5", "r") as f:
  # "inputs" holds the outputs of all sequences, concatenated frame by frame.
  x = f["inputs"][...]
  x = numpy.argmax(x, axis=1)
  x = [chars[idx] for idx in x]
  # Frame count and name of each sequence, used to cut x back into sequences.
  lens = f["seqLengths"][...]
  tags = f["seqTags"][...]
  start = 0
  for tag, len_ in zip(tags, lens):
    # Merge adjacent repeats of the same label within this sequence.
    y = []
    last_char = None
    for c in x[start:start+len_]:
      if last_char != c:
        y.append(c)
        last_char = c
    # Show "|" as a space.
    y = [" " if c == "|" else c for c in y]
    output = "".join(y).strip()
    print(tag, output)
    start += len_
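
The core idea of the updated script, cutting one concatenated frame list back into sequences by their lengths, can be shown on its own. This is only a sketch under stated assumptions: split_by_lengths is a hypothetical helper, the lengths are plain integers, and the frames and tags are made up rather than read from a real file.

```python
from itertools import accumulate


def split_by_lengths(frames, lengths):
    """Cut a concatenated list of frames into one chunk per sequence."""
    if sum(lengths) != len(frames):
        raise ValueError("sequence lengths do not add up to the number of frames")
    ends = list(accumulate(lengths))   # end offset of each sequence
    starts = [0] + ends[:-1]           # start offset of each sequence
    return [frames[s:e] for s, e in zip(starts, ends)]


# Made-up per-frame labels of two sequences stored back to back.
frames = list("|yo||note|")
lengths = [4, 6]
tags = ["seq-a", "seq-b"]

chunks = split_by_lengths(frames, lengths)
texts = ["".join(chunk).replace("|", " ").strip() for chunk in chunks]
for tag, text in zip(tags, texts):
    print(tag, text)
```

Because the boundaries come from the stored lengths, the number of decoded strings always equals the number of sequences, even when a sequence decodes to an empty string.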


interxuxing commented on July 17, 2024

@pvoigtlaender
thank you very much for your prompt reply.
I will try your advice asap and report the result to you. :-)


interxuxing commented on July 17, 2024

@pvoigtlaender thank you for your code fragment. I ran config_fwd with the trained model, e.g. mdlstm_real.0020, and got the predicted file "mdlstm_real_valid.h5" (maybe this is the final predicted file with the log-posterior scores).

When I loaded this h5 file, I could not tell which entry holds the log-posterior scores:

In [57]: fid = h5py.File('mdlstm_real_valid.h5','r')

In [58]: fid.keys()
Out[58]: [u'inputs', u'labels', u'seqDims', u'seqLengths', u'seqTags', u'targets']

In the fid, which key represents the log-posterior scores?

In your code fragment, I saw that:

with h5py.File("my_cache.h5", "r") as f:
  for seg in f:
    x = f[seg][...]
maybe 'seg' is the key, but in my fid.keys(), I do not have the 'seg' key...

Thank you!


interxuxing commented on July 17, 2024

@pvoigtlaender I am really grateful for your prompt reply.
Yes, I guess 'inputs' is the 'seg' you mentioned in your code fragment.
I tried x = f['inputs'][...] and saw that the shape of the 'inputs' numpy matrix is (numframe, 80), as follows:

In [69]: inputs2 = fid2['inputs']
In [70]: inputs2.shape
Out[70]: (205, 80)

Since 'chars.txt' contains 79 characters, why does the log-posterior score have dimension 80? I guess the last index is the EOF symbol?
So, for now I just ignore the last index:
x = [chars[idx] for idx in x if idx != 79]

I tested with my trained model; the predictions look normal and are almost correct after 20 epochs.
Thank you very much for your help. Next time I will try a more sophisticated decoding scheme such as RASR or Kaldi.

BTW, I do not understand these lines in the code fragment:

y = filter(lambda z: z != "_blank", y)
y = map(lambda z: " " if z == "si" else z, y)
y = map(lambda z: "." if z == "_dot" else z, y)
y = map(lambda z: "-" if z == "_minus" else z, y)
y = map(lambda z: "+" if z == "_plus" else z, y)
y = map(lambda z: "#" if z == "_hash" else z, y)

Since I cannot find 'si', '_dot', '_minus', '_plus' or '_hash' in chars.txt, can I remove these lines?


interxuxing commented on July 17, 2024

Hi, dear @pvoigtlaender, may I ask a question about forwarding (predicting) multiple samples with a pre-trained model?

Following the config_fwd file, suppose I have 4 test samples whose ground-truth labels are ['note', 'you', 'succeed', 'how']. I first create the .h5 file for these 4 test samples and then run config_fwd. This gives me a predict.h5 file with the log-posteriors.

Following your "greedy decoding" method above, I load predict.h5 and use x = f['inputs'] as the numpy matrix of the final predictions, which has shape x.shape = [18, 80]. Then I remove the EOF (index = 79) from x:
x = [chars[idx] for idx in x if idx != 79]
y = ''.join(x)
Now y is ||yo||nate||suaed|. I assumed that the prediction for each test sample is delimited as '|prediction|', so I split y on '||'. Unfortunately, that gives ['yo', 'nate', 'suaed'], only 3 predictions, which does not match the 4 test samples.
Is the order of the predictions reversed relative to the test samples?
Can the model predict no symbols at all for a test sample, i.e. output no characters?
Also, y contains 7 '|' characters, which is not an even number. Shouldn't '|' come in pairs, one pair per prediction?

So, how can I parse the prediction in x so that the number of predicted words equals the number of test samples? Thank you for your kind help!


interxuxing commented on July 17, 2024

Hi @pvoigtlaender, I am really grateful for your kind help and prompt reply! :-)
Yes, with your example code I can correctly decode the predicted log-posterior sequence.
I need to dig deeper into the returnn source code. :-)
Thank you again.


pvoigtlaender commented on July 17, 2024

It seems like you have a simple working decoder now, so I'm closing for now.
Feel free to reopen if you have more questions


interxuxing commented on July 17, 2024

Hi, dear @pvoigtlaender, I am always grateful for your kind help. These days I have been doing some other data collection work. When I analyzed the decoded log-posteriors of the trained model, I found that the order of the input samples differs from the order of the predicted results obtained with the decoding algorithm above.

Specifically, given 5 images with these ground-truth strings:

image1.jpg Today is sunny
image2.jpg I went to school.
image3.jpg There are 30 classmates in my class
image4.jpg We have four lessons in the morning
image5.jpg And three lessons in the afternoon

When I create the .h5 file for prediction and predict with the trained model, the results usually come out in a different order than the original 5 images. The prediction often looks like this:

image5.jpg And three lessons in the afternoon
image3.jpg There are 30 classmates in my class
image1.jpg Today is sunny
image4.jpg We have four lessons in the morning
image2.jpg I went to school.

Why is the order of the predictions different from the order of the input images? Is it possible to keep the original order?

Thank you!


pvoigtlaender commented on July 17, 2024

Hi,

I think the order in which you write into the hdf5 file is probably the one that is used. Can you verify whether the order of the output is consistent with the order in the hdf5 file? If so, you should make sure to use the right order when creating your file.


interxuxing commented on July 17, 2024

Hi, dear @pvoigtlaender ,
I have checked the hdf5 file that was created from the 5 images (it was created in the order image1, 2, 3, 4, 5). However, when I got the predicted h5 file and parsed it with the decoding algorithm, the order had changed, which was really weird.

Is the order normally the same for creation and prediction in the returnn code?
Thank you!


pvoigtlaender commented on July 17, 2024

Hi,

Can you please also run h5dump on both the file used as input and the file that came out of forwarding, to check the order that is actually stored in the files?
And if you run it multiple times, is the order consistent, or could it be that the order is randomized for some reason?
I'm also not sure why this is happening here. @albertz can you help here?


interxuxing commented on July 17, 2024

Hi, dear @pvoigtlaender, thank you for your help all the time.
Actually, I found that the decoding procedure reorders the batch of images.
For example, if the original order is 0,1,2,3,4,5, the decoded order becomes 2,0,1,4,5,3.

Anyway, when decoding I get both the decoded sequence and the corresponding image name, so even if the order changes, it is still possible to reorder the decoding results by image name.
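
Reordering by image name, as described here, could look roughly like the sketch below. It assumes each decoded result comes as a (name, text) pair and that the original input order is available as a list of names; restore_order is a hypothetical helper, not part of returnn.

```python
def restore_order(decoded, original_order):
    """Return the (name, text) pairs sorted into the original input order."""
    by_name = dict(decoded)
    missing = [name for name in original_order if name not in by_name]
    if missing:
        raise KeyError("no decoded result for: " + ", ".join(missing))
    return [(name, by_name[name]) for name in original_order]


# Decoded results in the order they came out of forwarding (from the example above).
decoded = [
    ("image5.jpg", "And three lessons in the afternoon"),
    ("image3.jpg", "There are 30 classmates in my class"),
    ("image1.jpg", "Today is sunny"),
    ("image4.jpg", "We have four lessons in the morning"),
    ("image2.jpg", "I went to school."),
]
original_order = ["image1.jpg", "image2.jpg", "image3.jpg", "image4.jpg", "image5.jpg"]

restored = restore_order(decoded, original_order)
for name, text in restored:
    print(name, text)
```

Looking results up by name rather than by position keeps the pairing correct no matter how the sequences were shuffled during forwarding.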

Thank you again~


prolaser commented on July 17, 2024

Hi @interxuxing,
since you have experience with returnn, I was wondering if you could help me, as a beginner, to understand some issues. I ran the full IAM dataset using config_real and stopped the training after 72 epochs. After that I ran config_fwd and got a mdlstm_real_valid.h5 file. The prior folder is still empty, and I don't know why.
Now my question is: based on the files I have, how can I test my trained models and see how accurate they are? It seems you went through the same steps, so I would appreciate it if you could help me.

Regards
Arman

