Comments (16)
Hi,
if you just want a quick result, you can also do the best-path decoding of the posteriors yourself with a simple script. Note, however, that best-path decoding does not use a language model or lexicon; it decodes greedily.
For best-path decoding, you first take the argmax at each frame, then merge all adjacent instances of the same label into one, and then remove the blanks. For example, aa_abcc_d, where "_" is the blank, is first mapped to a_abc_d and then, after removing blanks, to aabcd. I don't have the script here right now, but I can give you a rough code fragment, which you might get working (the chars.txt is from here: https://github.com/rwth-i6/returnn/blob/master/demos/mdlstm/IAM/chars.txt)
import h5py
import numpy
with open("chars.txt") as f:
    chars = map(str.strip, f.readlines())
with h5py.File("my_cache.h5", "r") as f:
    for seg in f:
        x = f[seg][...]
        x = numpy.argmax(x, axis=1)
        x = [chars[idx] for idx in x]
        y = []
        last_char = None
        for c in x:
            if last_char != c:
                y.append(c)
            last_char = c
        y = filter(lambda z: z != "_blank", y)
        y = map(lambda z: " " if z == "si" else z, y)
        y = map(lambda z: "." if z == "_dot" else z, y)
        y = map(lambda z: "-" if z == "_minus" else z, y)
        y = map(lambda z: "+" if z == "_plus" else z, y)
        y = map(lambda z: "#" if z == "_hash" else z, y)
        output = "".join(y).strip()
        print seg, output
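The fragment above is Python 2 code (`print` statement; `map`/`filter` return lists). For anyone on Python 3, here is a hedged sketch of the same greedy decoding, factored into a function; the file names in the commented usage are the ones from the fragment above:

```python
import numpy

def best_path_decode(posteriors, chars, blank="_blank"):
    # Greedy CTC best-path decoding: frame-wise argmax,
    # merge adjacent repeats, then drop blanks.
    idxs = numpy.argmax(posteriors, axis=1)
    labels = [chars[i] for i in idxs]
    merged = [c for i, c in enumerate(labels) if i == 0 or c != labels[i - 1]]
    return "".join(c for c in merged if c != blank)

# Usage sketch (requires h5py and the files mentioned above):
# import h5py
# with open("chars.txt") as f:
#     chars = [l.strip() for l in f]
# with h5py.File("my_cache.h5", "r") as f:
#     for seg in f:
#         print(seg, best_path_decode(f[seg][...], chars))
```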
from returnn.
The key should be "inputs" (I know, the name is kind of misleading...)
Please give it a try and let me know if this works.
So instead of
x = f[seg][...]
try to use
x = f['inputs'][...]
The 80th entry is the CTC blank and is removed by the first filter, so do not remove this line:
y = filter(lambda z: z != "_blank", y)
For better readability, you might replace it with
y = [z for z in y if z != "_blank"]
but you can remove all the maps below. They were used for a different set of characters and are not necessary in your case.
Hi,
that's a point I forgot to mention. The outputs of all sequences are concatenated in the inputs field. Here is an updated script which will do the job:
#!/usr/bin/env python
import h5py
import numpy
with open("chars.txt") as f:
    chars = [l.strip() for l in f.readlines()]
with h5py.File("mdlstm_long_valid.h5", "r") as f:
    x = f["inputs"][...]
    x = numpy.argmax(x, axis=1)
    x = [chars[idx] for idx in x]
    lens = f["seqLengths"][...]
    tags = f["seqTags"][...]
    start = 0
    for tag, len_ in zip(tags, lens):
        y = []
        last_char = None
        for c in x[start:start+len_]:
            if last_char != c:
                y.append(c)
            last_char = c
        y = [" " if c == "|" else c for c in y]
        output = "".join(y).strip()
        print tag, output
        start += len_
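One hedged caveat for Python 3 users: h5py may return the seqTags entries as bytes rather than str, in which case a small decode helper (illustrative name) is handy:

```python
def tag_to_str(tag):
    # h5py string datasets often read back as bytes under Python 3
    return tag.decode("utf-8") if isinstance(tag, bytes) else str(tag)

# tag_to_str(b"image1.jpg") -> "image1.jpg"
```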
@pvoigtlaender
thank you very much for your prompt reply.
I will try your advice ASAP and report the result to you. :-)
@pvoigtlaender thank you for your code fragment. I ran config_fwd with the trained model (e.g. mdlstm_real.0020) and got the predicted file "mdlstm_real_valid.h5" (presumably the final prediction file with log-posterior probability scores).
When I load this h5 file, I cannot find which dataset holds the log-posterior scores:
In [57]: fid = h5py.File('mdlstm_real_valid.h5','r')
In [58]: fid.keys()
Out[58]: [u'inputs', u'labels', u'seqDims', u'seqLengths', u'seqTags', u'targets']
In the fid, which key represents the log-posterior scores?
In your code fragment, I saw that:
with h5py.File("my_cache.h5", "r") as f:
    for seg in f:
        x = f[seg][...]
maybe 'seg' is the key, but in my fid.keys(), I do not have the 'seg' key...
Thank you!
@pvoigtlaender really grateful for your prompt reply.
Yes, I guess 'inputs' is the 'seg' you mentioned in your code fragment.
I tried x = f['inputs'][...] and saw that the shape of the 'inputs' matrix is (num_frames, 80), as follows:
In [69]: inputs2 = fid2['inputs']
In [70]: inputs2.shape
Out[70]: (205, 80)
Since there are 79 characters in 'chars.txt', why is the dimension of the log-posterior scores 80? I guess the last index is an EOF symbol?
So, for now I just ignore the last index:
x = [chars[idx] for idx in x if idx != 79]
I tested with my trained model, and the predictions look normal and almost correct after 20 epochs.
Thank you very much for your help. Next I will try a more principled decoding scheme such as RASR or Kaldi.
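One cautionary note on dropping the blank index before merging: in CTC best-path decoding, repeats must be merged first and blanks removed afterwards, otherwise a genuine double letter separated by a blank collapses into one. A toy illustration, using "_" as a stand-in blank:

```python
def merge_then_filter(frames, blank="_"):
    # correct order: merge adjacent repeats, then drop blanks
    merged = [c for i, c in enumerate(frames) if i == 0 or c != frames[i - 1]]
    return "".join(c for c in merged if c != blank)

def filter_then_merge(frames, blank="_"):
    # wrong order: dropping blanks first loses repeated characters
    kept = [c for c in frames if c != blank]
    return "".join(c for i, c in enumerate(kept) if i == 0 or c != kept[i - 1])

# merge_then_filter(["a", "_", "a"]) -> "aa"
# filter_then_merge(["a", "_", "a"]) -> "a"
```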
BTW, I cannot understand these lines in the code fragment:
y = filter(lambda z: z != "_blank", y)
y = map(lambda z: " " if z == "si" else z, y)
y = map(lambda z: "." if z == "_dot" else z, y)
y = map(lambda z: "-" if z == "_minus" else z, y)
y = map(lambda z: "+" if z == "_plus" else z, y)
y = map(lambda z: "#" if z == "_hash" else z, y)
since I cannot find 'si', '_dot', '_minus', '_plus', '_hash' in the chars.txt, can I remove these lines?
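For reference, those `map` lines just replace placeholder tokens from a different character set with their literal characters; a dict-based equivalent (sketch, not needed for the IAM chars.txt):

```python
# placeholder tokens used for a different chars.txt; not needed for IAM
SPECIALS = {"si": " ", "_dot": ".", "_minus": "-", "_plus": "+", "_hash": "#"}

def replace_specials(labels):
    # map any placeholder token to its literal character, leave others as-is
    return [SPECIALS.get(c, c) for c in labels]

# replace_specials(["a", "si", "b", "_dot"]) -> ["a", " ", "b", "."]
```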
Hi, dear @pvoigtlaender, may I ask a question about forwarding (predicting) multiple test samples with a pre-trained model?
Following the config_fwd file: suppose I have 4 test samples whose ground-truth labels are ['note', 'you', 'succeed', 'how']. I first create the .h5 file for these 4 test samples and run config_fwd, which gives me a predict.h5 file with the log-posteriors.
Following your "greedy decoding" method above, I load predict.h5 and use x = f['inputs'] as the matrix of final predictions, which has shape x.shape = [18, 80]. After removing the EOF index (79) from x:
x = [chars[idx] for idx in x if idx != 79]
y = ''.join(x)
the value of y is ||yo||nate||suaed|. I guessed that each test sample's prediction is delimited as '|prediction|', so I split the string on pairs of '|'. Unfortunately, I got only 3 predictions, ['yo', 'nate', 'suaed'], which does not match the number of test samples (4)!
Also, the order of the predictions seems reversed relative to the test samples?
Can the model predict no symbols at all for a test sample, i.e. output no characters?
And in y the number of '|' is 7, which is odd, although '|' should come in pairs, one pair per prediction?
So, how can I parse the prediction x so that the number of predicted words equals the number of test samples? Thank you for your kind help!
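The robust way to split the concatenated output is by the seqLengths field rather than by '|' delimiters: slice the concatenated frame labels into one chunk per sequence. A minimal sketch with toy data:

```python
def split_by_lengths(frames, lengths):
    # slice the concatenated per-frame labels into one chunk per sequence
    chunks, start = [], 0
    for n in lengths:
        chunks.append(frames[start:start + n])
        start += n
    return chunks

# split_by_lengths(list("aabcc"), [3, 2]) -> [['a', 'a', 'b'], ['c', 'c']]
```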
hi, @pvoigtlaender , really grateful for your kind help and prompt reply! :-)
Yes, according to your example code, I can correctly decode the predicted log-posterior sequence.
I need to dig deeper into the returnn source code. :-)
Thank you again.
It seems like you have a simple working decoder now, so I'm closing for now.
Feel free to reopen if you have more questions.
Hi, dear @pvoigtlaender, always grateful for your kind help. These days I was busy with some other data collection work. When I analyzed the decoded log-posteriors of the trained model, I found that the order of the input samples differs from the order of the predicted results produced by the decoding algorithm above.
Specifically, given 5 images with groundtruth strings as
image1.jpg Today is sunny
image2.jpg I went to school.
image3.jpg There are 30 classmates in my class
image4.jpg We have four lessons in the morning
image5.jpg And three lessons in the afternoon
When I create the .h5 file for prediction and decode the results with the trained model, I usually get results in a different order from the original 5 images; the prediction often looks like this:
image5.jpg And three lessons in the afternoon
image3.jpg There are 30 classmates in my class
image1.jpg Today is sunny
image4.jpg We have four lessons in the morning
image2.jpg I went to school.
Why is the order of the input images different from that of the predictions? Is it possible to keep them in the same order?
Thank you!
Hi,
I think the order in which you write into the HDF5 file is the one used. Can you verify whether the order of the output is consistent with the order in the HDF5 file? If yes, then you should make sure to use the right order when creating your file.
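Once the seqTags of the input file and the forwarded file have been read (e.g. with h5py, as in the scripts above), comparing the orders is a plain list comparison; the helper name here is illustrative:

```python
def same_order(tags_in, tags_out):
    # True only if both files list the sequence tags in identical order
    return list(tags_in) == list(tags_out)

# same_order(["image1", "image2"], ["image1", "image2"]) -> True
# same_order(["image1", "image2"], ["image2", "image1"]) -> False
```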
Hi, dear @pvoigtlaender ,
I have checked the hdf5 file created from the 5 images (written in the order image1, 2, 3, 4, 5); however, when I got the predicted h5 file and parsed it with the decoding algorithm, the order had changed... it is really weird.
Is the order usually preserved between creating and predicting in the returnn code?
Thank you!
Hi,
can you please also run h5dump on both the input file and the file produced by forwarding, to check the order actually stored in the files?
And if you run multiple times, is the order then consistent, or might it be that for some reason the order is randomized?
I'm also not so sure why this is happening here. @albertz, can you help?
Hi, dear @pvoigtlaender, thank you for your help all the time.
Actually, I found that the decoding procedure reorders the original batch of images.
For example: the original order is 0, 1, 2, 3, 4, 5, and the decoded order becomes 2, 0, 1, 4, 5, 3.
Anyway, during decoding I get both the decoded sequence and the corresponding image name, so even if the order changes, I can still reorder the decoding results according to the image names.
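The reordering described above can be sketched as a sort over (image name, decoded text) pairs; note this restores the original order only if the names sort lexicographically in that order:

```python
def reorder_by_tag(pairs):
    # pairs: (image_name, decoded_text); sort by name to restore a known order
    return sorted(pairs, key=lambda p: p[0])

# reorder_by_tag([("image2.jpg", "b"), ("image1.jpg", "a")])
#   -> [("image1.jpg", "a"), ("image2.jpg", "b")]
```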
Thank you again~
Hi @interxuxing
Since you have experience with returnn, I was wondering if you could help me, as a beginner, understand some issues. I ran the full IAM dataset with config_real and stopped training after 72 epochs. After that I ran config_fwd and got an mdlstm_real_valid.h5 file. The prior folder is still empty, for a reason I don't know.
Now my question is: based on the files I have, how can I test my models and check their accuracy? It seems you had the same experience, so I would appreciate your help.
Regards
Arman