
Comments (16)

pvoigtlaender commented on July 17, 2024

Hi,

if you just want a quick result, you can also do the best path decoding of the posteriors yourself with a simple script. Note, however, that best path decoding does not use a language model or lexicon and decodes greedily.

For best path decoding, you first take the argmax at each frame, then merge all adjacent instances of the same label into one, and then remove the blanks. E.g. aa_abcc_d, where "_" is blank, is first mapped to a_abc_d and then, after removing blanks, to aabcd. I don't have the script here right now, but I can give you a rough code fragment which you might get working (the chars.txt is from here: https://github.com/rwth-i6/returnn/blob/master/demos/mdlstm/IAM/chars.txt)

import h5py
import numpy

# One label per line; build a list so it can be indexed by class index.
with open("chars.txt") as f:
  chars = [line.strip() for line in f]

with h5py.File("my_cache.h5", "r") as f:
  for seg in f:
    x = f[seg][...]
    # Best path: pick the most probable label at each frame.
    x = numpy.argmax(x, axis=1)
    x = [chars[idx] for idx in x]
    # Merge adjacent repeats of the same label.
    y = []
    last_char = None
    for c in x:
      if last_char != c:
        y.append(c)
        last_char = c
    # Drop the CTC blank, then map special label names to characters.
    y = filter(lambda z: z != "_blank", y)
    y = map(lambda z: " " if z == "si" else z, y)
    y = map(lambda z: "." if z == "_dot" else z, y)
    y = map(lambda z: "-" if z == "_minus" else z, y)
    y = map(lambda z: "+" if z == "_plus" else z, y)
    y = map(lambda z: "#" if z == "_hash" else z, y)
    output = "".join(y).strip()
    print(seg, output)
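
The merge-then-remove rule described above can be checked in isolation on the aa_abcc_d example. This is only a minimal sketch on a plain label string, without HDF5 or posteriors; the function name `best_path_collapse` is made up for illustration and is not part of returnn.

```python
from itertools import groupby


def best_path_collapse(labels, blank="_"):
    """Merge adjacent repeats of a label, then drop the blank label."""
    # groupby yields one key per run of equal adjacent items.
    merged = [label for label, _run in groupby(labels)]
    return [label for label in merged if label != blank]


# Intermediate result after merging: a_abc_d
print("".join(best_path_collapse("aa_abcc_d")))
```

The order of the two steps matters: the blank between the two a's is what keeps them from being merged into one.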

from returnn.

pvoigtlaender commented on July 17, 2024

The key should be "inputs" (I know, the name is kind of misleading...).
Please give it a try and let me know if this works.
So instead of
x = f[seg][...]
try to use
x = f['inputs'][...]

pvoigtlaender commented on July 17, 2024

The 80th character is the CTC blank and will be removed by the first filter, so do not remove this line:
y = filter(lambda z: z != "_blank", y)

For better readability, you might replace it with
y = [z for z in y if z != "_blank"]

But you can remove all the maps below it. They were used for a different set of characters and are not necessary in your case.
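
To make the index arithmetic concrete: with 79 entries in chars.txt and 80 scores per frame, the last index has no entry of its own and stands for the CTC blank. The following is a minimal sketch under that assumption, not the script from this thread. It uses plain Python lists instead of the HDF5 data, and the helper names `label_table` and `decode_frames` are hypothetical.

```python
def label_table(chars, num_outputs, blank="_blank"):
    """Return one label per network output.

    Assumption: if the character list is one entry short, the missing
    last index is the CTC blank.
    """
    if len(chars) == num_outputs:
        return list(chars)
    if len(chars) == num_outputs - 1:
        return list(chars) + [blank]
    raise ValueError("character list does not match the output dimension")


def decode_frames(frames, labels, blank="_blank"):
    """Greedy decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frames]
    decoded = []
    previous = None
    for idx in best:
        # Keep a label only when it starts a new run and is not the blank.
        if idx != previous and labels[idx] != blank:
            decoded.append(labels[idx])
        previous = idx
    return "".join(decoded)


labels = label_table(["a", "b"], num_outputs=3)
frames = [
    [0.9, 0.05, 0.05],  # a
    [0.8, 0.1, 0.1],    # a again (merged with the previous frame)
    [0.1, 0.1, 0.8],    # blank
    [0.7, 0.2, 0.1],    # a (kept, because a blank separates it)
    [0.1, 0.8, 0.1],    # b
]
print(decode_frames(frames, labels))
```

Note that skipping the blank index before merging repeats would turn "a, blank, a" into a single "a", which is why the blank is only dropped after the merge.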

pvoigtlaender commented on July 17, 2024

Hi,

that's a point I forgot to mention. The outputs of all sequences are concatenated in the inputs field. Here is an updated script which will do the job:

#!/usr/bin/env python

import h5py
import numpy

# chars.txt must have an entry for every label index that occurs below.
with open("chars.txt") as f:
  chars = [line.strip() for line in f]

with h5py.File("mdlstm_long_valid.h5", "r") as f:
  # "inputs" holds the frames of all sequences, concatenated.
  x = f["inputs"][...]
  x = numpy.argmax(x, axis=1)
  x = [chars[idx] for idx in x]
  lens = f["seqLengths"][...]
  tags = f["seqTags"][...]
  start = 0
  # Slice out each sequence by its length and decode it separately.
  for tag, len_ in zip(tags, lens):
    y = []
    last_char = None
    for c in x[start:start+len_]:
      if last_char != c:
        y.append(c)
        last_char = c
    y = [" " if c == "|" else c for c in y]
    output = "".join(y).strip()
    print(tag, output)
    start += len_
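
The slicing step on its own, as a small sketch on plain lists: because all sequences are stored back to back, the per-sequence lengths are enough to cut them apart again. The helper name `split_by_lengths` is hypothetical, and the check that the lengths add up is an extra safeguard that the script above does not perform.

```python
from itertools import accumulate


def split_by_lengths(frames, lengths):
    """Cut a concatenated frame list into one chunk per sequence."""
    if sum(lengths) != len(frames):
        raise ValueError("lengths do not add up to the number of frames")
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [frames[start:end] for start, end in zip(starts, ends)]


# Three sequences of 2, 1 and 3 frames, stored back to back.
chunks = split_by_lengths(list("abcdef"), [2, 1, 3])
for chunk in chunks:
    print("".join(chunk))
```

Each chunk can then be decoded on its own, so repeats are never merged across a sequence boundary.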

interxuxing commented on July 17, 2024

@pvoigtlaender
thank you very much for your prompt reply.
I will try your advice asap and report the result to you. :-)

interxuxing commented on July 17, 2024

@pvoigtlaender thank you for your code fragment. I ran config_fwd with the trained model, e.g. mdlstm_real.0020, and got the output file "mdlstm_real_valid.h5" (maybe this is the final prediction file with the log-posterior scores).

When I loaded this h5 file, I could not tell which entry holds the log-posterior scores:

In [57]: fid = h5py.File('mdlstm_real_valid.h5','r')

In [58]: fid.keys()
Out[58]: [u'inputs', u'labels', u'seqDims', u'seqLengths', u'seqTags', u'targets']

Which key in fid represents the log-posterior scores?


In your code fragment, I saw:

with h5py.File("my_cache.h5", "r") as f:
  for seg in f:
    x = f[seg][...]
Maybe 'seg' is the key, but my fid.keys() does not contain a 'seg' key...

Thank you!

interxuxing commented on July 17, 2024

@pvoigtlaender really grateful for your prompt reply.
Yes, I guess 'inputs' is the 'seg' you mentioned in your code fragment.
I tried x = f['inputs'][...], and I saw that the shape of the 'inputs' numpy matrix is (numframe, 80):

In [69]: inputs2 = fid2['inputs']
In [70]: inputs2.shape
Out[70]: (205, 80)

Since 'chars.txt' contains 79 characters, why does the log-posterior score have dimension 80? I guess the last index is the EOF symbol?
So, currently I just ignore the last index:
x = [chars[idx] for idx in x if idx != 79]

I tested with my trained model; the predictions look reasonable and are almost correct after 20 epochs.
Thank you very much for your help. Next time I will try a more sophisticated decoding scheme, such as RASR or Kaldi.

BTW, I do not understand these lines in the code fragment:

y = filter(lambda z: z != "_blank", y)
y = map(lambda z: " " if z == "si" else z, y)
y = map(lambda z: "." if z == "_dot" else z, y)
y = map(lambda z: "-" if z == "_minus" else z, y)
y = map(lambda z: "+" if z == "_plus" else z, y)
y = map(lambda z: "#" if z == "_hash" else z, y)

Since I cannot find 'si', '_dot', '_minus', '_plus', or '_hash' in chars.txt, can I remove these lines?

interxuxing commented on July 17, 2024

hi, dear @pvoigtlaender, may I ask a question about forwarding (predicting) multiple samples using a pre-trained model?

Following the config_fwd file: suppose I have 4 test samples whose ground-truth labels are ['note', 'you', 'succeed', 'how']. I first create the .h5 file for these 4 test samples and run config_fwd. This gives me a predict.h5 file with the log-posteriors.

Following your "greedy decoding" method above, I load predict.h5 and use x = f['inputs'] as the numpy matrix of the final predictions, which has x.shape = [18, 80]. Then I remove the EOF (index = 79) from x:
x = [chars[idx] for idx in x if idx != 79]
y = ''.join(x)
Now y is ||yo||nate||suaed|. I guessed that the prediction for each test sample is delimited by '|', as in '|prediction|', so I split y at each '||'. Unfortunately, I get the predictions ['yo', 'nate', 'suaed']: only 3 predictions, not 4, the number of test samples!
And is the order of the predictions reversed relative to the test samples?
Can the model output no symbols at all for a test sample, i.e. no characters?
Also, y contains 7 '|' characters, which is not an even number; shouldn't the '|' come in pairs, one pair per prediction?

So, how can I parse x so that the number of predicted words equals the number of test samples? Thank you for your kind help!

interxuxing commented on July 17, 2024

hi, @pvoigtlaender , really grateful for your kind help and prompt reply! :-)
Yes, according to your example code, I can correctly decode the predicted log-posterior sequence.
I need to dig deeper into the returnn source code. :-)
Thank you again.

pvoigtlaender commented on July 17, 2024

It seems like you have a simple working decoder now, so I'm closing for now.
Feel free to reopen if you have more questions

interxuxing commented on July 17, 2024

Hi, dear @pvoigtlaender, always grateful for your kind help. These days I have been doing some other data collection work. When I analyzed the decoded log-posteriors of the trained model, I found that the order of the input samples differs from the order of the predicted results produced by the decoding algorithm above.

Specifically, given 5 images with the following ground-truth strings:

image1.jpg Today is sunny
image2.jpg I went to school.
image3.jpg There are 30 classmates in my class
image4.jpg We have four lessons in the morning
image5.jpg And three lessons in the afternoon

When I create the .h5 file for prediction and run the trained model, the results usually come out in a different order from the original 5 images. The prediction is often as follows:

image5.jpg And three lessons in the afternoon
image3.jpg There are 30 classmates in my class
image1.jpg Today is sunny
image4.jpg We have four lessons in the morning
image2.jpg I went to school.

Why is the order of the input images different from the order of the predictions? Is it possible to keep the order?

Thank you!

pvoigtlaender commented on July 17, 2024

Hi,

I think the order in which you write into the HDF5 file is probably the one that is used. Can you verify whether the order of the output is consistent with the order in the HDF5 file? If so, make sure to use the right order when creating your file.

interxuxing commented on July 17, 2024

Hi, dear @pvoigtlaender ,
I have checked the HDF5 file that was created from the 5 images (they were written in the order image1, 2, 3, 4, 5). However, when I parsed the predicted h5 file with the decoding algorithm, the order had changed... it was really weird.

Is the order normally the same during creation and prediction in the returnn code?
Thank you!

pvoigtlaender commented on July 17, 2024

Hi,

can you please also run h5dump on both the file used as input and the file produced by forwarding,
to check the order that is actually stored in the files?
And if you run it multiple times, is the order consistent, or might it be randomized for some reason?
I'm also not sure why this is happening here. @albertz, can you help?

interxuxing commented on July 17, 2024

hi, dear @pvoigtlaender, thank you for your help all the time.
Actually, I found that the decoding procedure changed the original order of the batch images.
For example: the original order is 0,1,2,3,4,5, and the decoded order becomes 2,0,1,4,5,3.

Anyway, when decoding I get both the decoded sequence and the corresponding image name, so even if the order changes, it is still possible to reorder the decoding results according to the image names.
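
A minimal sketch of that reordering step, using the image names from the example earlier in this thread. It assumes the decoded tags are plain strings that match the input names exactly (tags read from an HDF5 file may need converting first), and the function name `restore_order` is hypothetical.

```python
def restore_order(decoded, input_order):
    """Reorder (tag, text) pairs so they follow input_order."""
    by_tag = dict(decoded)
    return [(tag, by_tag[tag]) for tag in input_order]


# Decoded results in the order they came out of forwarding.
decoded = [
    ("image5.jpg", "And three lessons in the afternoon"),
    ("image3.jpg", "There are 30 classmates in my class"),
    ("image1.jpg", "Today is sunny"),
    ("image4.jpg", "We have four lessons in the morning"),
    ("image2.jpg", "I went to school."),
]
input_order = ["image%d.jpg" % i for i in range(1, 6)]

for tag, text in restore_order(decoded, input_order):
    print(tag, text)
```

A tag that is missing from the decoded results raises a KeyError here, which makes a dropped sequence visible instead of silently shifting the rest.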

Thank you again~

prolaser commented on July 17, 2024

Hi @interxuxing
Since you have experience with returnn, I was wondering if you could help me, as a beginner, to understand some issues. I ran the full IAM dataset using config_real and stopped the training after 72 epochs. After that I ran config_fwd and got a mdlstm_real_valid.h5 file. The prior folder is still empty, and I don't know why.
Now my question is: based on the files I have, how can I test my trained models to check how accurate they are? It seems like you had the same experience, so I would appreciate it if you could help me.

Regards
Arman
