Comments (6)
Hi,
Which kind of model do you use for training? Note that the data layout used by the write_to_hdf function in demos/mdlstm/IAM/create_IAM_dataset.py is only suitable for a 2D LSTM network, while the 1D networks use a different layout.
Do the demos work for you? If so, you should try to stick as closely as possible to the way the demo creates the data.
Also note that in the example the data is put under the "inputs" key and not under the "data" key (although I'm not sure whether this matters).
Please also have a look at https://github.com/rwth-i6/returnn/blob/master/demos/mdlstm/artificial/create_test_h5.py, a very simple script that shows how to properly create a data file for a 2D LSTM network.
If this still does not work for you, please let us know.
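The layout described above (each 2D image flattened and all sequences concatenated along the first axis, with the 2D sizes stored separately) can be sketched in NumPy; the function and variable names here are illustrative, not RETURNN's API:

```python
import numpy as np

def pack_images(images):
    """Flatten each 2D image to (height * width, 1) and concatenate
    the results along the first (time) axis, recording each image's
    2D size so the reader can reconstruct the individual sequences."""
    flat_chunks = []
    sizes = []        # (height, width) per image
    seq_lengths = []  # flattened length per image
    for img in images:
        h, w = img.shape
        flat_chunks.append(img.reshape(h * w, 1).astype("float32"))
        sizes.append((h, w))
        seq_lengths.append(h * w)
    inputs = np.concatenate(flat_chunks, axis=0)
    return inputs, sizes, seq_lengths

# Two images of different sizes:
a = np.zeros((4, 6))
b = np.ones((3, 5))
inputs, sizes, seq_lengths = pack_images([a, b])
# inputs.shape == (39, 1); sizes == [(4, 6), (3, 5)]
```

The per-image sizes and lengths are what later allow the single concatenated array to be cut back into individual sequences.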
from returnn.
Thank you for your reply!
I'm using a model trained with the demo included in the code (demos/mdlstm/IAM/go.sh). The data I want to send are just images from the IAM dataset, so it's 2D.
Yeah. The demos work. I'm creating the data pretty much in the same way it's being done in the code.
What I'm trying to do is set up a demo on a web service, where you can load a trained model, send it data to recognize, get the result, and show it on a web page. The issue I'm having is that when I send just one image, I get a result, decode it, and it's fine; but when I send more than one, I get one long sequence, and when I decode it, it's gibberish. I want to send a request with more than one image at the same time, if possible.
Working on an AWS instance (2 GB GPU), if I send three images to the daemon, it crashes:
python2.7: mod.cu:3443: int _GLOBAL__N__38_tmpxft_00005d36_00000000_9_mod_cpp1_ii_5bcebdd5::__struct_compiled_op_b7d1f699ec8aa72531b9afc40db7fbc6::run(): Assertion `V49' failed.
Fatal Python error: Aborted
Current thread 0x00007f8738f30740 (most recent call first):
File "/home/mmedina/ReturRNN/dlenv/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 859 in __call__
File "/home/mmedina/ReturRNN/returnn/Device.py", line 759 in compute_run
File "/home/mmedina/ReturRNN/returnn/Device.py", line 1034 in process_inner
File "/home/mmedina/ReturRNN/returnn/Device.py", line 887 in process
File "/home/mmedina/ReturRNN/returnn/TaskSystem.py", line 1195 in _asyncCall
File "/home/mmedina/ReturRNN/returnn/TaskSystem.py", line 470 in funcCall
File "/home/mmedina/ReturRNN/returnn/TaskSystem.py", line 957 in checkExec
File "/home/mmedina/ReturRNN/returnn/TaskSystem.py", line 1304 in <module>
Dev gpu0 proc died: recv_bytes EOFError:
device crashed on batch 0
If I do the same on a local server with 3 Titan X GPUs (36 GB RAM), it does not crash, but I only get one long sequence as a result, and when I decode it I just get nonsense. Judging by this, my guess is that the library somehow thinks that the data contained in "data" is just one image and processes it as one single sequence.
This is the code I use to create the data and the JSON I pass to the daemon:
import numpy as np
from scipy.misc import imread

def normalize_image(imgfile, pad_x=15, pad_y=15):
    # Read the image, invert it, pad it, flatten it to
    # (height * width, 1), and scale the values to [0, 1].
    img = imread(imgfile)
    img = 255 - img
    img = np.pad(img, ((pad_y, pad_y), (pad_x, pad_x)), 'constant')
    padded_shape = img.shape
    img = img.reshape(img.size, 1)
    img = img.astype("float32") / 255.0
    return img, padded_shape

def build_json_from_file(imgfile):
    # imgfile is a text file listing one image path per line.
    data_structure = {}
    data_structure['classes'] = [79, 1]
    data_structure['data'] = []
    data_structure['sizes'] = []
    imgs = []
    padded_shapes = []
    with open(imgfile, 'r') as f:
        for image in f.read().splitlines():
            img, padded_shape = normalize_image(image, 15, 15)
            imgs.append(img)
            padded_shapes.append(padded_shape)
    imgs = np.concatenate(imgs, axis=0)
    print np.array(imgs).shape
    padded_shapes = np.concatenate(padded_shapes, axis=0)
    data_structure['sizes'] += padded_shapes.tolist()
    # NOTE: after the concatenation above, this loop iterates over
    # individual pixel rows, so all images end up in one flat
    # "data" list.
    for img in imgs:
        data_structure['data'].append(img.tolist())
    return data_structure
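As an alternative to concatenating everything into one structure, the images could be kept separate, one payload per image. This is only a sketch: the exact JSON schema the daemon expects is an assumption here, and the field names simply mirror the snippet above:

```python
import numpy as np

def build_payloads(flat_images, padded_shapes):
    """Build one request payload per image instead of merging all
    images into a single flat sequence. `flat_images` are arrays of
    shape (height * width, 1); field names follow the snippet above
    and are not a documented daemon API."""
    payloads = []
    for img, shape in zip(flat_images, padded_shapes):
        payloads.append({
            'classes': [79, 1],
            'data': img.tolist(),
            'sizes': list(shape),
        })
    return payloads

# Two flattened images of different sizes:
imgs = [np.zeros((12, 1), dtype="float32"),
        np.zeros((20, 1), dtype="float32")]
shapes = [(3, 4), (4, 5)]
payloads = build_payloads(imgs, shapes)
# len(payloads) == 2; each payload carries exactly one image
```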
If sending multiple images to the daemon in one request is not possible, I was thinking of receiving the images in a generic request, creating an .h5 file from them, firing up ./rnn.py with a custom configuration file that includes the path of the created .h5 file, and then, when finished, fetching the results somehow, building a response and sending it back to the caller, but I think that's too much.
Please let me know if I'm not clear enough. I've been stuck on this issue for a couple of days now and I may be missing or omitting something.
I really appreciate your help.
Thanks!
Hi,
first of all, the error "Assertion `V49' failed." indicates that the GPU ran out of memory (sorry for the unspecific error message there, we should improve this).
And yes, it should be possible to forward multiple images at the same time.
You said that the demo for training works. Does it also work when you use the demo data for forwarding?
So far I haven't been able to see where the problem comes from. Can you please send me the config and a small h5 data file you are using to p.voigtlaender [at] gmail.com ?
Edit: Please note that when forwarding to hdf, the result is stored as one long sequence, which has to be split using the seqLengths. So "it does not crash, but I only get a long sequence as result" is expected; however, the result should not be nonsense but the concatenation of the contents of both images in this case.
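The splitting described above can be sketched with NumPy. Here `seq_lengths` stands for the stored seqLengths; the variable names are illustrative:

```python
import numpy as np

def split_by_seq_lengths(output, seq_lengths):
    """Cut one long concatenated output back into per-sequence
    pieces by splitting at the cumulative sequence boundaries."""
    boundaries = np.cumsum(seq_lengths)[:-1]
    return np.split(output, boundaries, axis=0)

out = np.arange(10).reshape(10, 1)          # concatenated result
pieces = split_by_seq_lengths(out, [6, 4])  # sequences of length 6 and 4
# pieces[0].shape == (6, 1), pieces[1].shape == (4, 1)
```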
Btw, the daemon you are using is an experimental and undocumented feature. Maybe first try to get everything working with a "normal" forwarding to hdf5, so we can isolate the problem.
Thanks again for your reply. Really appreciate that you're taking time to do this.
I have not tried forwarding. I'll send you the config file. Also, I'm not using any h5 file so far. I read the image from disk and convert it into the format the JSON expects (based on how you prepare the data for writing to an h5 file in demos/mdlstm/IAM/create_IAM_dataset.py). I'll send you the full program I'm using so you can have a better look at what I'm trying to achieve.
About the daemon: I understand. I found it while studying the code and thought it was the easiest option to get something working.
Hi Manuel,
So you are currently using the daemon functionality within RETURNN as defined in Engine.py? This is a very experimental feature which so far was only used in a toy chat bot experiment.
There might be bugs, but in general you should be able to use it. Each call to classify only accepts a single sequence for now, but you can simply make multiple classify requests asynchronously, remember the hashes it returns, and ask for them in any order to get the results (or a message that they are not done yet). There is no need to wait for previous results before making new requests.
If performance becomes more of a concern, I can look into extending the server to support batches of sequences.
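The request pattern described above (fire off several classify calls, remember the returned hashes, poll in any order) can be sketched with an in-memory stand-in for the daemon. The real endpoints and response format are not documented in this thread, so everything below is illustrative:

```python
import hashlib

class FakeDaemon:
    """Stand-in for the classify daemon: accepts one sequence per
    call and returns a hash that can later be polled for the result."""
    def __init__(self):
        self.results = {}

    def classify(self, seq):
        h = hashlib.sha1(repr(seq).encode()).hexdigest()
        # In the real daemon the result becomes available later;
        # here we "finish" it immediately for the sake of the sketch.
        self.results[h] = "recognized:%s" % seq
        return h

    def poll(self, h):
        return self.results.get(h)  # None means "not done yet"

daemon = FakeDaemon()
# Submit several sequences without waiting for earlier results:
hashes = [daemon.classify(seq) for seq in ["img0", "img1", "img2"]]
# Poll in any order, e.g. reversed:
results = [daemon.poll(h) for h in reversed(hashes)]
# results == ['recognized:img2', 'recognized:img1', 'recognized:img0']
```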