
Comments (12)

pvoigtlaender commented on July 17, 2024

Hi,

unfortunately our MDLSTM implementation is for Theano only, so you can't use the TensorFlow backend for it.

from returnn.

rogerzico commented on July 17, 2024

Hi,

Could you give me some directions if I want to "hack into" it myself?

Thanks

rogerzico commented on July 17, 2024

I removed "use_tensorflow:true" from config_real and ran "./go.sh".

Here is another exception:

......
File "/home/ubuntu/temp/returnn/Updater.py", line 4, in
line: import theano
locals:
theano =
File "/usr/local/lib/python2.7/dist-packages/theano/init.py", line 116, in
line: theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
locals:
theano = <module 'theano' from '/usr/local/lib/python2.7/dist-packages/theano/init.pyc'>
theano.sandbox = <module 'theano.sandbox' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/init.pyc'>
theano.sandbox.cuda = <module 'theano.sandbox.cuda' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/init.pyc'>
theano.sandbox.cuda.tests = <module 'theano.sandbox.cuda.tests' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/tests/init.pyc'>
theano.sandbox.cuda.tests.test_driver = <module 'theano.sandbox.cuda.tests.test_driver' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/tests/test_driver.pyc'>
theano.sandbox.cuda.tests.test_driver = <module 'theano.sandbox.cuda.tests.test_driver' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/tests/test_driver.pyc'>
theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1 = <function test_nvidia_driver1 at 0x7ffb30ab4cf8>
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/tests/test_driver.py", line 41, in test_nvidia_driver1
line: raise Exception("The nvidia driver version installed with this OS "
locals:
Exception = <type 'exceptions.Exception'>
Exception: The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads
Device proc gpuX (gpuZ) died: ProcConnectionDied('recv_bytes EOFError: ',)
Theano flags: compiledir_format=compiledir_%(platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s--dev-gpuZ,device=gpu,force_device=True
EXCEPTION
Traceback (most recent call last):
File "/home/ubuntu/temp/returnn/Device.py", line 347, in startProc
line: self._startProc(*args, **kwargs)
locals:
self = <Device.Device object at 0x7ff35acd5cd0>
self._startProc = <bound method Device._startProc of <Device.Device object at 0x7ff35acd5cd0>>
args = ('gpuZ',)
kwargs = {}
File "/home/ubuntu/temp/returnn/Device.py", line 401, in _startProc
line: interrupt_main()
locals:
interrupt_main = <function interrupt_main at 0x7ff35b881668>
File "/home/ubuntu/temp/returnn/Util.py", line 665, in interrupt_main
line: sys.exit(1) # And exit the thread.
locals:
sys = <module 'sys' (built-in)>
sys.exit =
SystemExit: 1
KeyboardInterrupt
EXCEPTION
Traceback (most recent call last):
File "../../../rnn.py", line 539, in main
line: init(commandLineOptions=argv[1:])
locals:
init = <function init at 0x7ff35accd1b8>
commandLineOptions =
argv = ['../../../rnn.py', 'config_real'], _[0]: {len = 15}
File "../../../rnn.py", line 341, in init
line: devices = initDevices()
locals:
devices =
initDevices = <function initDevices at 0x7ff35acccd70>
File "../../../rnn.py", line 154, in initDevices
line: time.sleep(0.25)
locals:
time = <module 'time' (built-in)>
time.sleep =
KeyboardInterrupt
Quitting

I didn't hit any key though!

pvoigtlaender commented on July 17, 2024

Hi,

the important part of the messages is this:

The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads

Please try another nvidia driver.
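Before swapping drivers, it may help to confirm which driver version is actually loaded. The following is a minimal sketch, assuming Python 3 and that `nvidia-smi` is on the PATH; the helper names are made up for illustration, and the script only reports the version, it does not tell you which version Theano will accept.

```python
import shutil
import subprocess


def parse_driver_versions(text):
    """Return the non-empty lines of nvidia-smi's CSV output (one per GPU)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def installed_driver_versions():
    """Ask nvidia-smi for the driver version; return [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            universal_newlines=True,
        )
    except (subprocess.CalledProcessError, OSError):
        # The tool exists but failed, e.g. because no driver is loaded.
        return []
    return parse_driver_versions(out)


if __name__ == "__main__":
    versions = installed_driver_versions()
    print(versions if versions else "nvidia-smi not found or failed")
```

If the reported version is not the one you just installed, the old kernel module is probably still loaded and a reboot may be needed.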

For MDLSTM with TensorFlow, also have a look at #8.
The most important part would be to wrap our CUDA-based MDLSTM implementation as a TensorFlow kernel. If you want to look further into this, I can give you a few more details. I think, however, that it will require some effort (I'm not sure how much) and cannot just be hacked together in a few minutes.

rogerzico commented on July 17, 2024

Hi,

"If you want to look further into this, I can give you a few more details."
Yes, please send me the details. I am interested in making this work.
Thanks!

rogerzico commented on July 17, 2024

Hi,
With regard to the error message:
"
The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads
"
I upgraded to nvidia-390, but I still get this error.
Here is the HW/SW configuration:

AWS p2.xlarge (Tesla K80)
nvidia 390
theano 0.9.0
pygpu 0.6.9

Tried

%python ../../../rnn.py config_demo

"Could not find cudnn library (looked for v5[.1])"


Tried

%THEANO_FLAGS=device=gpu python ../../../rnn.py config_demo

"... does not give good results ...."


Ran demo/demo-tf-lstm-benchmark.py

KeyError: 'lstmblockfused'

But I can see TFEngine works on the GPU.

Please advise!

Thanks

pvoigtlaender commented on July 17, 2024

"Could not find cudnn library (looked for v5[.1])"

Have a look at http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html on how to set up cuDNN for Theano.
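As a quick sanity check before digging into the Theano settings, you can see whether any cuDNN library is visible on this machine at all. This is only a rough sketch: the function names are made up for illustration, it assumes a Linux-style `libcudnn*` file name, and it does not verify that the version matches what Theano asked for (v5 / v5.1 in the message above).

```python
import ctypes.util
import os


def find_cudnn():
    """Return the library name the system linker resolves for cudnn, or None."""
    return ctypes.util.find_library("cudnn")


def dirs_with_cudnn(path_var="LD_LIBRARY_PATH"):
    """List directories from the given path variable that contain a libcudnn* file."""
    found = []
    for d in os.environ.get(path_var, "").split(os.pathsep):
        if not d or not os.path.isdir(d):
            continue
        try:
            names = os.listdir(d)
        except OSError:
            # Unreadable directory; skip it.
            continue
        if any(name.startswith("libcudnn") for name in names):
            found.append(d)
    return found


if __name__ == "__main__":
    print("linker lookup:", find_cudnn())
    print("LD_LIBRARY_PATH hits:", dirs_with_cudnn())
```

If both come back empty, cuDNN is most likely not installed where the toolchain can see it, and the linked page describes where Theano expects to find it.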

The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem

I don't know much about that either. It might be related to the Theano version: are you using 0.9 or 1.0? It should be worth trying 0.9. A Google search for the error message brings up quite a few results, e.g. Theano/Theano#5530.

pvoigtlaender commented on July 17, 2024

For a possible port of mdlstm to tensorflow:
The mdlstm implementation is mainly here: https://github.com/rwth-i6/returnn/blob/master/cuda_implementation/MultiDirectionalTwoDLSTMOp.py
Here, we derive from theano.sandbox.cuda.GpuOp to define a Theano Op. You would need to create a different wrapper for TensorFlow. You can find some information about this here (pay special attention to the GPU kernel parts): https://www.tensorflow.org/extend/adding_an_op
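On the Python side, a ported kernel would typically be compiled into a shared library and loaded with `tf.load_op_library`. The sketch below shows only that loading step. The library name `mdlstm_op.so`, the build directory, and both helper functions are hypothetical; nothing like this exists in the repository yet, and the real work is the C++/CUDA kernel and its gradient.

```python
import os


def op_library_path(build_dir, lib_name="mdlstm_op.so"):
    """Build the path to the compiled op library (hypothetical file name)."""
    return os.path.join(build_dir, lib_name)


def load_mdlstm_module(build_dir):
    """Load the compiled custom op and return the generated Python module."""
    # Imported lazily so the path helper works without TensorFlow installed.
    import tensorflow as tf

    path = op_library_path(build_dir)
    if not os.path.exists(path):
        raise IOError("compiled op library not found: %s" % path)
    # Registers every op defined in the shared library and exposes Python wrappers.
    return tf.load_op_library(path)
```

The returned module exposes one Python function per op registered in the C++ code, so the wrapper names depend entirely on how the kernel is registered.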

rogerzico commented on July 17, 2024

Danke!

albertz commented on July 17, 2024

Is this bug now only about MDLSTM in TensorFlow? Then this is just a duplicate of #8.

rogerzico commented on July 17, 2024

@albertz
more like an unfinished feature than a bug!

albertz commented on July 17, 2024

Yea. Ok, then I'm closing this now. Anything related to MDLSTM in TF should be discussed in #8. If there is another separate issue, please open a separate issue.
