Comments (12)
Hi,
Unfortunately, our MDLSTM implementation is for Theano only, so you can't use the TensorFlow backend for that.
from returnn.
Hi,
Could you give me some directions if I want to "hack into" it myself?
Thanks
from returnn.
I removed "use_tensorflow:true" from config_real and ran "./go.sh".
Here is another exception:
......
File "/home/ubuntu/temp/returnn/Updater.py", line 4, in
line: import theano
locals:
theano =
File "/usr/local/lib/python2.7/dist-packages/theano/init.py", line 116, in
line: theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1()
locals:
theano = <module 'theano' from '/usr/local/lib/python2.7/dist-packages/theano/init.pyc'>
theano.sandbox = <module 'theano.sandbox' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/init.pyc'>
theano.sandbox.cuda = <module 'theano.sandbox.cuda' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/init.pyc'>
theano.sandbox.cuda.tests = <module 'theano.sandbox.cuda.tests' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/tests/init.pyc'>
theano.sandbox.cuda.tests.test_driver = <module 'theano.sandbox.cuda.tests.test_driver' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/tests/test_driver.pyc'>
theano.sandbox.cuda.tests.test_driver = <module 'theano.sandbox.cuda.tests.test_driver' from '/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/tests/test_driver.pyc'>
theano.sandbox.cuda.tests.test_driver.test_nvidia_driver1 = <function test_nvidia_driver1 at 0x7ffb30ab4cf8>
File "/usr/local/lib/python2.7/dist-packages/theano/sandbox/cuda/tests/test_driver.py", line 41, in test_nvidia_driver1
line: raise Exception("The nvidia driver version installed with this OS "
locals:
Exception = <type 'exceptions.Exception'>
Exception: The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads
Device proc gpuX (gpuZ) died: ProcConnectionDied('recv_bytes EOFError: ',)
Theano flags: compiledir_format=compiledir_%(platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s--dev-gpuZ,device=gpu,force_device=True
EXCEPTION
Traceback (most recent call last):
File "/home/ubuntu/temp/returnn/Device.py", line 347, in startProc
line: self._startProc(*args, **kwargs)
locals:
self = <Device.Device object at 0x7ff35acd5cd0>
self._startProc = <bound method Device._startProc of <Device.Device object at 0x7ff35acd5cd0>>
args = ('gpuZ',)
kwargs = {}
File "/home/ubuntu/temp/returnn/Device.py", line 401, in _startProc
line: interrupt_main()
locals:
interrupt_main = <function interrupt_main at 0x7ff35b881668>
File "/home/ubuntu/temp/returnn/Util.py", line 665, in interrupt_main
line: sys.exit(1) # And exit the thread.
locals:
sys = <module 'sys' (built-in)>
sys.exit =
SystemExit: 1
KeyboardInterrupt
EXCEPTION
Traceback (most recent call last):
File "../../../rnn.py", line 539, in main
line: init(commandLineOptions=argv[1:])
locals:
init = <function init at 0x7ff35accd1b8>
commandLineOptions =
argv = ['../../../rnn.py', 'config_real'], _[0]: {len = 15}
File "../../../rnn.py", line 341, in init
line: devices = initDevices()
locals:
devices =
initDevices = <function initDevices at 0x7ff35acccd70>
File "../../../rnn.py", line 154, in initDevices
line: time.sleep(0.25)
locals:
time = <module 'time' (built-in)>
time.sleep =
KeyboardInterrupt
Quitting
I didn't hit any key though!
from returnn.
Hi,
the important part of the messages is this:
The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads
Please try another nvidia driver.
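Before and after swapping drivers, it can help to confirm which driver version the system actually reports. The following is a minimal sketch, assuming Python 3 and that `nvidia-smi` is on the PATH; the helper names `parse_driver_line` and `query_nvidia_driver` are made up for illustration and are not part of RETURNN or Theano.

```python
import shutil
import subprocess


def parse_driver_line(line):
    """Split one 'driver_version, name' CSV line into a (version, name) tuple."""
    version, name = [field.strip() for field in line.split(",", 1)]
    return version, name


def query_nvidia_driver():
    """Return a list of (driver_version, gpu_name) tuples.

    Returns None if nvidia-smi is not installed or exits with an error.
    """
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    return [parse_driver_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    print(query_nvidia_driver())
```

This only reports what is installed. It does not reproduce the reduction check that Theano runs at import time.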
For MDLSTM with TensorFlow, also have a look at #8.
The most important part would be to wrap our CUDA-based MDLSTM implementation as a TensorFlow kernel. If you want to look further into this, I can give you a few more details. However, I think it will require some effort (I'm not sure how much) and cannot just be hacked together in a few minutes.
from returnn.
Hi,
"If you want to look further into this, I can give you a few more details."
Yes, please send me the details. I am interested in making this work.
Thanks!
from returnn.
Hi,
With regard to the error message:
"
The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem: http://developer.nvidia.com/cuda-downloads
"
I upgraded to nvidia-390, but I still get this error!
Here is the HW/SW configuration:
AWS p2.xlarge (Tesla K80)
nvidia 390
theano 0.9.0
pygpu 0.6.9
Tried
% python ../../../rnn.py config_demo
"Could not find cudnn library (looked for v5[.1])"
Tried
% THEANO_FLAGS=device=gpu python ../../../rnn.py config_demo
"... does not give good results ...."
Ran demo/demo-tf-lstm-benchmark.py
KeyError: 'lstmblockfused'
But I can see TFEngine works on the GPU.
Please advise!
Thanks
from returnn.
"Could not find cudnn library (looked for v5[.1])"
Have a look at http://deeplearning.net/software/theano/library/sandbox/cuda/dnn.html on how to set up cuDNN for Theano.
"The nvidia driver version installed with this OS does not give good results for reduction.Installing the nvidia driver available on the same download page as the cuda package will fix the problem"
I don't know much about that either. It might be related to the Theano version: are you using 0.9 or 1.0? It should be worth trying 0.9. A Google search for the error message brought up quite a few results, e.g. Theano/Theano#5530.
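To make the version question easy to answer, here is a small sketch that reports the installed Theano and pygpu versions and whether the system linker can find a cuDNN library. It assumes Python 3.8+ (for `importlib.metadata`); the function names are hypothetical. Note that `ctypes.util.find_library` only searches the standard linker locations, so a `None` result does not prove that cuDNN is missing from the paths Theano itself is configured to use.

```python
import ctypes.util
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Collect the versions discussed in this thread into one dictionary."""
    return {
        "theano": installed_version("Theano"),
        "pygpu": installed_version("pygpu"),
        # Name of the cuDNN shared library if the linker can locate it, else None.
        "cudnn_library": ctypes.util.find_library("cudnn"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("%s: %s" % (key, value))
```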
from returnn.
For a possible port of MDLSTM to TensorFlow:
The MDLSTM implementation is mainly here: https://github.com/rwth-i6/returnn/blob/master/cuda_implementation/MultiDirectionalTwoDLSTMOp.py
Here, we derive from theano.sandbox.cuda.GpuOp to define a Theano Op. You would need to create a different wrapper for TensorFlow. You can find some information about this here (pay special attention to the GPU kernel parts): https://www.tensorflow.org/extend/adding_an_op
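On the Python side, a TensorFlow custom op built as described in that guide is loaded with `tf.load_op_library`. The following is only a sketch of that last step, not the port itself: the library path and the function name `load_mdlstm_op_library` are hypothetical, and the shared library would first have to be written and compiled from a TensorFlow wrapper around the existing CUDA kernels.

```python
import os


def load_mdlstm_op_library(library_path):
    """Load a compiled custom-op shared library into TensorFlow.

    library_path is hypothetical: it would point at a .so file built from
    a TensorFlow wrapper around the existing CUDA MDLSTM kernels.
    """
    if not os.path.isfile(library_path):
        raise FileNotFoundError("custom op library not found: %s" % library_path)
    # Imported late so the path check above works even without TensorFlow installed.
    import tensorflow as tf
    return tf.load_op_library(library_path)
```

The returned module exposes one Python function per op registered in the library, named after whatever the C++ `REGISTER_OP` call declares.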
from returnn.
Thanks!
from returnn.
Is this bug now only about MDLSTM in TensorFlow? Then this is just a duplicate of #8.
from returnn.
@albertz
More like an unfinished feature than a bug!
from returnn.
Yea. Ok, then I'm closing this now. Anything related to MDLSTM in TF should be discussed in #8. If there is another separate issue, please open a separate issue.
from returnn.