
warp-transducer's Introduction

warp-transducer

A fast parallel implementation of RNN Transducer (Graves 2013 joint network), on both CPU and GPU.

A GPU implementation is now available for the Graves 2012 additive ("add") joint network.

GPU Performance

Benchmarked on a GeForce GTX 1080 Ti GPU.

T=150, L=40, A=28     warp-transducer
N=1                   8.51 ms
N=16                  11.43 ms
N=32                  12.65 ms
N=64                  14.75 ms
N=128                 19.48 ms

T=150, L=20, A=5000   warp-transducer
N=1                   4.79 ms
N=16                  24.44 ms
N=32                  41.38 ms
N=64                  80.44 ms
N=128                 51.46 ms

Interface

The interface is in include/rnnt.h. It supports CPU or GPU execution, and you can specify OpenMP parallelism when running on the CPU, or the CUDA stream when running on the GPU. We took care to ensure that the library does not perform memory allocation internally, in order to avoid the synchronizations and overheads that allocation would cause. Please be careful when using the CPU version of RNNTLoss: log_softmax should be called manually before the loss function. (For the PyTorch binding, this is handled automatically depending on the device of the input tensor.)
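As a minimal sketch (not taken from the repo docs), assuming the warprnnt_pytorch binding is installed and using made-up tensor sizes, a CPU-side call looks roughly like this, with log_softmax applied explicitly as noted above:

import torch
import torch.nn.functional as F
from warprnnt_pytorch import RNNTLoss

rnnt_loss = RNNTLoss()
B, T, U, V = 2, 10, 4, 29                       # batch, input frames, target length, vocab size (incl. blank); made-up sizes
acts = torch.randn(B, T, U + 1, V)              # joint network output
labels = torch.randint(1, V, (B, U), dtype=torch.int32)
act_lens = torch.full((B,), T, dtype=torch.int32)
label_lens = torch.full((B,), U, dtype=torch.int32)

# CPU path: apply log_softmax yourself before calling the loss, as noted above.
loss = rnnt_loss(F.log_softmax(acts, dim=-1), labels, act_lens, label_lens)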

Compilation

warp-transducer has been tested on Ubuntu 16.04 and CentOS 7. Windows is not supported at this time.

First get the code:

git clone https://github.com/HawkAaron/warp-transducer
cd warp-transducer

Create a build directory:

mkdir build
cd build

If you have a non-standard CUDA install, add the -DCUDA_TOOLKIT_ROOT_DIR=/path/to/cuda option to cmake so that CMake detects CUDA.

Run cmake and build:

cmake -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME ..
make

If the cmake output shows

-- cuda found TRUE
-- Building shared library with no GPU support

please remove CMakeCache.txt (rm CMakeCache.txt) and run cmake again.

The C library should now be built along with test executables. If CUDA was detected, then test_gpu will be built; test_cpu will always be built.

Test

To run the tests, make sure the CUDA libraries are in LD_LIBRARY_PATH (DYLD_LIBRARY_PATH for OSX).

Contributing

We welcome improvements from the community, please feel free to submit pull requests.

Reference

Alex Graves. Sequence Transduction with Recurrent Neural Networks. arXiv:1211.3711, 2012.
Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton. Speech Recognition with Deep Recurrent Neural Networks. ICASSP, 2013.


warp-transducer's Issues

Working with GPU? Version Requirements?

Hi,

Just wanted to confirm that this repo is functioning properly with GPU.
After building warp-rnnt and running the training script I get the following error:

  File "train_rnnt.py", line 139, in train
    loss = model(xs, ys, xlen, ylen)
  File "/home/leow/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/leow/speech/E2E-ASR/model.py", line 92, in forward
    loss = self.loss(out, ys, xlen, ylen)
  File "/home/leow/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/leow/anaconda3/lib/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 68, in forward
    return self.rnnt(acts, labels, act_lens, label_lens, self.size_average, self.blank_label)
  File "/home/leow/anaconda3/lib/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 27, in forward
    0)
  File "/home/leow/anaconda3/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 180, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
TypeError: initializer for ctype 'struct THCudaIntTensor *' must be a pointer to same type, not cdata 'struct THCudaLongTensor *'

Is this due to an issue with the GPU version, or perhaps some version incompatibility? This is using Python 3.6 / Pytorch 3.1 / Cuda 9.0.
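For reference, the error above complains that the binding expects an IntTensor (THCudaIntTensor) but received a LongTensor; a hedged workaround, also used in other issues below, is to cast the label and length tensors to int32 before calling the loss. The tensors here are hypothetical placeholders:

import torch

ys = torch.randint(1, 28, (4, 10))      # labels default to int64 (LongTensor)
xlen = torch.tensor([150] * 4)
ylen = torch.tensor([10] * 4)

# cast to int32 before passing them to the RNNT loss
ys, xlen, ylen = ys.int(), xlen.int(), ylen.int()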

cuda runtime error (77) : an illegal memory access was encountered

Hello

When I use PyTorch to train an RNN-Transducer model on a big dataset, I encounter this error randomly.
Sometimes the error occurs in the first epoch, sometimes only after several epochs. I'm sure it's not a problem with my code, because it works well on a 100-hour dataset.

Traceback (most recent call last):
  File "/search/odin/wang/anaconda2/envs/py3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/search/odin/wang/anaconda2/envs/py3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/search/odin/wang/RNNT-Parallel/run_transducer_bmuf.py", line 211, in run
    transcription, prediction, transducer, optimizer, is_training=True, **conf['model_parameter'])
  File "/search/odin/wang/RNNT-Parallel/util/functions.py", line 188, in batch_iterator_transducer
    loss = objective(logits, batch_label, logits_len, label_len)
  File "/search/odin/wang/anaconda2/envs/py3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/search/odin/wang/anaconda2/envs/py3/lib/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 68, in forward
    return self.rnnt(acts, labels, act_lens, label_lens, self.size_average, self.blank_label)
  File "/search/odin/wang/anaconda2/envs/py3/lib/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 33, in forward
    grads = grads / minibatch_size
  File "/search/odin/wang/anaconda2/envs/py3/lib/python3.6/site-packages/torch/tensor.py", line 342, in __div__
    return self.div(other)
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

I tried to set

torch.backends.cudnn.benchmark = False

and changed the batch size from 16 to 10 or 20; the problem went away at first, but it came back after several epochs.

From the trace information, it seems to be related to the batch size.
Do you have any idea about this error?

Thanks in advance

Program test cases do not pass

Thank you for your codes.
I have a problem with the GPU version of warp-transducer.
Source code compilation is OK, but the test program (build/test_gpu) does not respond.

thank you

pytorch 1.0.1
centos 7
cuda 10
Tesla V100-PCIE

TF 2.0 Support

Does it support TF 2.0? Is there any plan to work on that? Thanks!

fatal error: cuda_runtime_api.h: No such file or directory

I set this:
export CUDA_HOME="/usr/local/cuda/"
When I run "python setup.py install" I get the following error. How can I solve it?
!! WARNING !!

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'warprnnt_pytorch.warp_rnnt' extension
gcc -pthread -B /home/imr555/miniconda3/envs/ariyan/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/imr555/Hasan/Project/RRNT/E2E-ASR-master/warp-transducer/include -I/home/imr555/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/lib/include -I/home/imr555/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/home/imr555/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/lib/include/TH -I/home/imr555/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/lib/include/THC -I/home/imr555/miniconda3/envs/ariyan/include/python3.6m -c src/binding.cpp -o build/temp.linux-x86_64-3.6/src/binding.o -std=c++11 -fPIC -DWARPRNNT_ENABLE_GPU -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=warp_rnnt -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /home/imr555/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/lib/include/THC/THCGeneral.h:12:0,
from /home/imr555/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/lib/include/THC/THC.h:4,
from src/binding.cpp:8:
/home/imr555/miniconda3/envs/ariyan/lib/python3.6/site-packages/torch/lib/include/ATen/cuda/CUDAStream.h:6:30: fatal error: cuda_runtime_api.h: No such file or directory
#include "cuda_runtime_api.h"
^
compilation terminated.
error: command 'gcc' failed with exit status 1

Input activation format

I would like to use the PyTorch binding for the transducer loss. The comments show:

acts: Tensor of (batch x seqLength x labelLength x outputDim) containing output from network

However, it is not clear whether the input activations need to be normalized probabilities (i.e. after softmax), log probabilities (i.e. log softmax), or unnormalized values. Could you please clarify this?

Thank you very much.

negative loss in tf version

Hi,
I have tried binding the RNNT loss to TensorFlow 1.4, and when testing the loss function I got negative results. The full code is below. Why does this happen?

import numpy as np
import tensorflow as tf
from warprnnt_tensorflow import rnnt_loss

B = 1; T = 3; U = 2; V = 5; blank = V

logits = tf.placeholder(tf.float32, [None, T, U, V])
labels = tf.placeholder(tf.int32, [None, U])
input_length = tf.placeholder(tf.int32, [None])
label_length = tf.placeholder(tf.int32, [None])

logits = tf.nn.log_softmax(logits)
costs = rnnt_loss(logits, labels, input_length, label_length, blank)
grad = tf.gradients(costs, [logits])

a = np.array(
    [[[[0.1, 0.6, 0.1, 0.1, 0.1],
       [0.1, 0.1, 0.6, 0.1, 0.1]],

      [[0.1, 0.6, 0.1, 0.1, 0.1],
       [0.1, 0.1, 0.6, 0.1, 0.1]],

      [[0.1, 0.6, 0.1, 0.1, 0.1],
       [0.1, 0.3, 0.2, 0.3, 0.1]]]],
    dtype=np.float32)

c = np.array([3], dtype=np.int32)
print(a.shape)
print(c)

b = np.array([[0,5]], dtype=np.int32)
d = np.array([1], dtype=np.int32)
print(b.shape)
print(d)

feed = {logits: a, labels: b, input_length: c, label_length: d}
with tf.Session() as sess:
    logit, cost, grads = sess.run([logits, costs, grad], feed_dict=feed)
    print(cost)

Also, does the rnnt_loss computation include a softmax internally? In other words, when I get logits from the RNN or fully connected layer, do I need to apply softmax to them myself?

Thanks a lot!

Adding rnnt_loss to Keras ("Cannot iterate over a shape with unknown rank.")

I'm trying to build an RNN-T baseline system using Keras.
The RNN-T model cannot be compiled; the error message is below.

ValueError: Cannot iterate over a shape with unknown rank.

Please check the lambda function.

  def rnnt_lambda_func(args):
    # y_trans: the output of transcription networks
    # y_pred: the output of prediction networks
    # labels: true sequence
    y_trans, y_pred, labels, input_length, label_length = args
    import keras.backend as K

    # calculating lattices from the output from the prediction network and the transcription network.
    B = K.shape(y_trans)[0]
    T = K.shape(y_trans)[1]
    U = K.shape(y_pred)[1]
    V = K.shape(y_trans)[2]

    y_trans = K.reshape(K.tile(y_trans, [1, U, 1]), [B, T, U, V])
    y_pred = K.reshape(K.tile(y_pred, [1, T, 1]), [B, T, U, V])

    logit_lattice = K.exp(y_trans + y_pred)
    acts = K.softmax(logit_lattice, axis=3)

    from warprnnt_tensorflow import rnnt_loss
    return rnnt_loss(acts, K.cast(labels, 'int32'),
                                     K.cast(input_length, 'int32'),
                                     K.cast(label_length, 'int32'), 29)

This is the source code for adding a lambda layer to the model.

# y_trans: the output of transcription networks
# y_pred: the output of prediction networks
# labels: true sequence
    loss_out = Lambda(Util.rnnt_lambda_func, output_shape=(1,),
                      name=Constants.KEY_RNTLS)([y_trans, y_pred, labels, input_length, label_length])
    model = Model(inputs=[input_data] + [labels, input_length, label_length],
                             outputs=loss_out)
    optimizer = SGD(lr=lr, decay=lr * 0.0001, momentum=0.9, nesterov=True,
                    clipnorm=5)
    model.compile(loss={Constants.KEY_RNTLS: lambda y_true, y_pred: y_pred[0]},
               optimizer=optimizer)

build against pip-installed tensorflow-gpu gives segfault

I use pip-installed tensorflow-gpu. To complete the build, I changed the following (in addition to #5 ):

1. changed ../../external/nsync/public -> external/nsync/public
2. used tf.sysconfig.get_lib():

if os.path.exists(os.path.join(tf.sysconfig.get_lib(), 'libtensorflow_framework.so')):
    extra_link_args = ['-L' + tf.sysconfig.get_lib(), '-ltensorflow_framework']

But I still get a segfault

$ python3 test_warprnnt_op.py 
2018-09-03 12:51:15.051736: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-03 12:51:15.145283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-09-03 12:51:15.145685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
totalMemory: 11.17GiB freeMemory: 11.09GiB
2018-09-03 12:51:15.145721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-09-03 12:51:15.468689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-03 12:51:15.468750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971]      0 
2018-09-03 12:51:15.468759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0:   N 
2018-09-03 12:51:15.469071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3432 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
*** Received signal 11 ***
*** BEGIN MANGLED STACK TRACE ***
/home/ginter/venv-mozds-gpu/lib/python3.5/site-packages/tensorflow/python/../libtensorflow_framework.so(+0x6a2c5b)[0x7fe9c3251c5b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fe9fc009390]
/home/ginter/venv-mozds-gpu/lib/python3.5/site-packages/warprnnt_tensorflow-0.1-py3.5-linux-x86_64.egg/warprnnt_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so(_ZN9warp_rnnt14WarpRNNTOpBase7ComputeEPN10tensorflow15OpKernelContextE+0x2b7)[0x7fe98b6d2d37]
/home/ginter/venv-mozds-gpu/lib/python3.5/site-packages/tensorflow/python/../libtensorflow_framework.so(_ZN10tensorflow13BaseGPUDevice13ComputeHelperEPNS_8OpKernelEPNS_15OpKernelContextE+0x37d)[0x7fe9c317ecdd]
/home/ginter/venv-mozds-gpu/lib/python3.5/site-packages/tensorflow/python/../libtensorflow_framework.so(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0x8d)[0x7fe9c317f14d]
/home/ginter/venv-mozds-gpu/lib/python3.5/site-packages/tensorflow/python/../libtensorflow_framework.so(+0x61b061)[0x7fe9c31ca061]
/home/ginter/venv-mozds-gpu/lib/python3.5/site-packages/tensorflow/python/../libtensorflow_framework.so(+0x61b87a)[0x7fe9c31ca87a]
/home/ginter/venv-mozds-gpu/lib/python3.5/site-packages/tensorflow/python/../libtensorflow_framework.so(_ZN5Eigen26NonBlockingThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x21a)[0x7fe9c322c22a]
/home/ginter/venv-mozds-gpu/lib/python3.5/site-packages/tensorflow/python/../libtensorflow_framework.so(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x32)[0x7fe9c322b2d2]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7fe9b99d6c80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fe9fbfff6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe9fbd3541d]
*** END MANGLED STACK TRACE ***

*** Begin stack trace ***
	tensorflow::CurrentStackTrace()
	
	
	warp_rnnt::WarpRNNTOpBase::Compute(tensorflow::OpKernelContext*)
	tensorflow::BaseGPUDevice::ComputeHelper(tensorflow::OpKernel*, tensorflow::OpKernelContext*)
	tensorflow::BaseGPUDevice::Compute(tensorflow::OpKernel*, tensorflow::OpKernelContext*)
	
	
	Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop(int)
	std::_Function_handler<void (), tensorflow::thread::EigenEnvironment::CreateThread(std::function<void ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&)
	
	
	clone
*** End stack trace ***
Aborted (core dumped)

negative loss

Hi,
I am using the GPU version of RNNTLoss, but the loss is negative.
the shape of logits and targets is as follows:
targets: [batch_size, U]
logits: [batch_size, T, U+1, hidden]
RNNTloss(logits, targets, input_len, target_len)
(screenshot of the negative GPU loss values omitted)
However when I run on cpu, the loss is normal.
(screenshot of the normal CPU loss values omitted)

Negative value of loss function

Hello,
Have you ever encountered a situation where the loss function has negative values? I have no idea why. The inputs to RNNTLoss are as follows:
loss = RNNTLoss(logits, targets.int(), input_lengths.int(), target_lengths.int())
logits: (batch_size, time_steps, sequence_length, vocab_size)
targets: (batchsize, sequence_length) 2 dimensional Tensor containing all the targets of the batch with zero padded
(screenshot of the negative loss values omitted)
Thanks a lot.

tensorflow test failed at test_multiple_batches_gpu with results mismatch

OS: Ubuntu 16.04.3 LTS
CUDA version: 9.0
GPU: Tesla P100

I built and installed the tensorflow binding and there seemed to be no error. However, when I ran the unit tests with
python setup.py test
it failed with the following information:

setup.py:63: UserWarning: Assuming tensorflow was compiled without C++11 ABI. It is generally true if you are using binary pip package. If you compiled tensorflow from source with gcc >= 5 and didn't set -D_GLIBCXX_USE_CXX11_ABI=0 during compilation, you need to set environment variable TF_CXX11_ABI=1 when compiling this bindings. Also be sure to touch some files in src to trigger recompilation. Also, you need to set (or unsed) this environment variable if getting undefined symbol: _ZN10tensorflow... errors
warnings.warn("Assuming tensorflow was compiled without C++11 ABI. "
running test
running egg_info
writing warprnnt_tensorflow.egg-info/PKG-INFO
writing top-level names to warprnnt_tensorflow.egg-info/top_level.txt
writing dependency_links to warprnnt_tensorflow.egg-info/dependency_links.txt
reading manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.5/warprnnt_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so -> warprnnt_tensorflow
/data/gengjie/workspace/warp-transducer/tensorflow_binding/setup.py:63: UserWarning: Assuming tensorflow was compiled without C++11 ABI. It is generally true if you are using binary pip package. If you compiled tensorflow from source with gcc >= 5 and didn't set -D_GLIBCXX_USE_CXX11_ABI=0 during compilation, you need to set environment variable TF_CXX11_ABI=1 when compiling this bindings. Also be sure to touch some files in src to trigger recompilation. Also, you need to set (or unsed) this environment variable if getting undefined symbol: _ZN10tensorflow... errors
warnings.warn("Assuming tensorflow was compiled without C++11 ABI. "
running test
running egg_info
writing warprnnt_tensorflow.egg-info/PKG-INFO
writing top-level names to warprnnt_tensorflow.egg-info/top_level.txt
writing dependency_links to warprnnt_tensorflow.egg-info/dependency_links.txt
reading manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.5/warprnnt_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so -> warprnnt_tensorflow
2019-07-16 08:09:39.373288: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-16 08:09:39.811372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:22:00.0
totalMemory: 15.90GiB freeMemory: 15.34GiB
2019-07-16 08:09:39.811462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.169127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.169209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.169237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.169744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
[4.280653 3.938437]
[array([[[[-1.86843961e-01, -6.25548363e-02, 2.49398738e-01],
[-2.03376651e-01, 2.02399358e-01, 9.77352262e-04],
[-1.41016066e-01, 7.91234598e-02, 6.18926175e-02]],

    [[-1.15517527e-02, -8.12802464e-02,  9.28320065e-02],
     [-1.54257044e-01,  2.29432672e-01, -7.51756430e-02],
     [-2.46593103e-01,  1.46404594e-01,  1.00188486e-01]],

    [[-1.29182935e-02, -6.15932457e-02,  7.45115280e-02],
     [-5.59856892e-02,  2.19830751e-01, -1.63845122e-01],
     [-4.97626871e-01,  2.09239930e-01,  2.88386971e-01]],

    [[ 1.36048598e-02, -3.02196294e-02,  1.66147705e-02],
     [ 1.13924518e-01,  6.27812073e-02, -1.76705718e-01],
     [-6.67078257e-01,  3.67658854e-01,  2.99419403e-01]]],


   [[[-3.56343776e-01, -5.53474724e-02,  4.11691159e-01],
     [-9.69219282e-02,  2.94590741e-02,  6.74628317e-02],
     [-6.35175705e-02,  2.76544970e-02,  3.58630754e-02]],

    [[-1.54498994e-01, -7.39420503e-02,  2.28441045e-01],
     [-1.66789889e-01, -8.79168510e-05,  1.66877776e-01],
     [-1.72369659e-01,  1.05565324e-01,  6.68043196e-02]],

    [[ 2.38748863e-02, -1.18255839e-01,  9.43809301e-02],
     [-1.04707092e-01, -1.08934462e-01,  2.13641584e-01],
     [-3.69844258e-01,  1.80118084e-01,  1.89726144e-01]],

    [[ 2.57137045e-02, -7.94617534e-02,  5.37480488e-02],
     [ 1.22328229e-01, -2.38788679e-01,  1.16460443e-01],
     [-5.98686934e-01,  3.02203149e-01,  2.96483815e-01]]]],
  dtype=float32)]

test_forward (test_warprnnt_op.WarpRNNTTest) ... 2019-07-16 08:09:40.466216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.466285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.466299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.466309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.466501: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
[4.4956665]
ok
test_multiple_batches_cpu (test_warprnnt_op.WarpRNNTTest) ... /data/gengjie/workspace/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py:14: DeprecationWarning: Please use assertEqual instead.
self.assertEquals(acts.shape, expected_grads.shape)
2019-07-16 08:09:40.505122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.505221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.505236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.505245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.505559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
ok
test_multiple_batches_gpu (test_warprnnt_op.WarpRNNTTest) ... 2019-07-16 08:09:40.522426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.522482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.522506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.522522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.522782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
2019-07-16 08:09:40.538106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-07-16 08:09:40.538146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 08:09:40.538159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-07-16 08:09:40.538169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-07-16 08:09:40.538347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14862 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:22:00.0, compute capability: 6.0)
FAIL
test_session (test_warprnnt_op.WarpRNNTTest)
Use cached_session instead. ... ok

======================================================================
FAIL: test_multiple_batches_gpu (test_warprnnt_op.WarpRNNTTest)

Traceback (most recent call last):
File "/data/gengjie/workspace/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py", line 92, in test_multiple_batches_gpu
self._test_multiple_batches(use_gpu=True)
File "/data/gengjie/workspace/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py", line 85, in _test_multiple_batches
self._run_rnnt(acts, labels, input_lengths, label_lengths, expected_costs, expected_grads, 0, use_gpu)
File "/data/gengjie/workspace/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py", line 27, in _run_rnnt
self.assertAllClose(tf_costs, expected_costs, atol=1e-6)
File "/data/gengjie/env/lib/python3.5/site-packages/tensorflow/python/framework/test_util.py", line 1591, in assertAllClose
self._assertAllCloseRecursive(a, b, rtol=rtol, atol=atol, msg=msg)
File "/data/gengjie/env/lib/python3.5/site-packages/tensorflow/python/framework/test_util.py", line 1561, in _assertAllCloseRecursive
(path_str, path_str, msg)))
File "/data/gengjie/env/lib/python3.5/site-packages/tensorflow/python/framework/test_util.py", line 1496, in _assertArrayLikeAllClose
a, b, rtol=rtol, atol=atol, err_msg="\n".join(msgs), equal_nan=True)
File "/data/gengjie/env/lib/python3.5/site-packages/numpy/testing/_private/utils.py", line 1501, in assert_allclose
verbose=verbose, header=header, equal_nan=equal_nan)
File "/data/gengjie/env/lib/python3.5/site-packages/numpy/testing/_private/utils.py", line 827, in assert_array_compare
raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-06, atol=1e-06
Mismatched value: a is different from b.
not close where = (array([0, 1]),)
not close lhs = [-5.3799906 -5.5812006]
not close rhs = [4.28065 3.93844]
not close dif = [9.660641 9.519641]
not close tol = [5.28065e-06 4.93844e-06]
dtype = float32, shape = (2,)
Mismatch: 100%
Max absolute difference: 9.660641
Max relative difference: 2.4171095
x: array([-5.379991, -5.581201], dtype=float32)
y: array([4.28065, 3.93844], dtype=float32)


Ran 4 tests in 0.095s

FAILED (failures=1)

Can blank be assigned a non-zero label?

Hi, when I use warprnnt_tensorflow.rnnt_loss(), I set the blank_label to vocab_size-1. Is that right?

I looked at the Python interface, and one sentence refers to blank:

  • The label reserved for the blank symbol should be label 0.

I then checked the source code, but there is no constraint on blank...

Are you still working on this repo?

Hi, I am currently having a problem with pytorch_binding. I am using PyTorch 0.4.0 with both Python 2 and 3. It gives me this error when I run 'from warprnnt_pytorch import RNNTLoss':

Traceback (most recent call last):
File "", line 1, in
File "warprnnt_pytorch/init.py", line 8, in
from ._warp_rnnt import *
File "warprnnt_pytorch/_warp_rnnt/init.py", line 3, in
from .__warp_rnnt import lib as _lib, ffi as _ffi
ImportError: No module named __warp_rnnt

tf-binding Registering two gradient with name 'WarpRNNT' !

Hello! First, thanks for your code, it helps a lot!
But I encountered some problems when installing the tensorflow binding. I built libwarprnnt.so successfully and passed the test cases "test_cpu test_gpu test_time test_time_gpu".
Then I ran "cd tensorflow_binding; sudo -E CUDA=/usr/local/cuda python3 setup.py install", which seemed to work well, but when I ran "sudo -E CUDA=/usr/local/cuda python3 setup.py test", it failed.
Here is the output log:

#####################################log start
_setup.py:63: UserWarning: Assuming tensorflow was compiled without C++11 ABI. It is generally true if you are using binary pip package. If you compiled tensorflow from source with gcc >= 5 and didn't set -D_GLIBCXX_USE_CXX11_ABI=0 during compilation, you need to set environment variable TF_CXX11_ABI=1 when compiling this bindings. Also be sure to touch some files in src to trigger recompilation. Also, you need to set (or unsed) this environment variable if getting undefined symbol: _ZN10tensorflow... errors
warnings.warn("Assuming tensorflow was compiled without C++11 ABI. "
running test
running egg_info
writing dependency_links to warprnnt_tensorflow.egg-info/dependency_links.txt
writing warprnnt_tensorflow.egg-info/PKG-INFO
writing top-level names to warprnnt_tensorflow.egg-info/top_level.txt
reading manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.5/warprnnt_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so -> warprnnt_tensorflow
/disk2/syd/warp-transducer/tensorflow_binding/setup.py:63: UserWarning: Assuming tensorflow was compiled without C++11 ABI. It is generally true if you are using binary pip package. If you compiled tensorflow from source with gcc >= 5 and didn't set -D_GLIBCXX_USE_CXX11_ABI=0 during compilation, you need to set environment variable TF_CXX11_ABI=1 when compiling this bindings. Also be sure to touch some files in src to trigger recompilation. Also, you need to set (or unsed) this environment variable if getting undefined symbol: _ZN10tensorflow... errors
warnings.warn("Assuming tensorflow was compiled without C++11 ABI. "
running test
running egg_info
writing dependency_links to warprnnt_tensorflow.egg-info/dependency_links.txt
writing warprnnt_tensorflow.egg-info/PKG-INFO
writing top-level names to warprnnt_tensorflow.egg-info/top_level.txt
reading manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
writing manifest file 'warprnnt_tensorflow.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.5/warprnnt_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so -> warprnnt_tensorflow
2019-05-21 15:44:57.870631: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-21 15:45:02.016217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 11.90GiB freeMemory: 11.74GiB
2019-05-21 15:45:02.607413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 1 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:03:00.0
totalMemory: 11.90GiB freeMemory: 11.74GiB
2019-05-21 15:45:03.151200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 2 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:82:00.0
totalMemory: 11.90GiB freeMemory: 11.74GiB
2019-05-21 15:45:03.778234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 3 with properties:
name: TITAN Xp major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:83:00.0
totalMemory: 11.90GiB freeMemory: 11.74GiB
2019-05-21 15:45:03.781227: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2019-05-21 15:45:05.104860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-21 15:45:05.104906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3
2019-05-21 15:45:05.104912: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y N N
2019-05-21 15:45:05.104915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N N N
2019-05-21 15:45:05.104918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N N N Y
2019-05-21 15:45:05.104921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N N Y N
2019-05-21 15:45:05.105817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11357 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-05-21 15:45:05.208888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11357 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-05-21 15:45:05.312473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 11357 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:82:00.0, compute capability: 6.1)
2019-05-21 15:45:05.415492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 11357 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:83:00.0, compute capability: 6.1)
[4.280653 3.9384367]
[array([[[[-1.86843872e-01, -6.25548586e-02, 2.49398723e-01],
[-2.03376621e-01, 2.02399313e-01, 9.77304531e-04],
[-1.41016081e-01, 7.91234747e-02, 6.18926138e-02]],

    [[-1.15518123e-02, -8.12801942e-02,  9.28320065e-02],
     [-1.54257059e-01,  2.29432628e-01, -7.51755759e-02],
     [-2.46593088e-01,  1.46404624e-01,  1.00188479e-01]],

    [[-1.29182916e-02, -6.15932457e-02,  7.45115355e-02],
     [-5.59857599e-02,  2.19830781e-01, -1.63845018e-01],
     [-4.97627079e-01,  2.09239975e-01,  2.88387090e-01]],

    [[ 1.36048663e-02, -3.02196350e-02,  1.66147687e-02],
     [ 1.13924518e-01,  6.27811924e-02, -1.76705718e-01],
     [-6.67078257e-01,  3.67658824e-01,  2.99419463e-01]]],


   [[[-3.56343716e-01, -5.53474464e-02,  4.11691159e-01],
     [-9.69219282e-02,  2.94591114e-02,  6.74628168e-02],
     [-6.35175407e-02,  2.76544876e-02,  3.58630568e-02]],

    [[-1.54498979e-01, -7.39419907e-02,  2.28440970e-01],
     [-1.66789874e-01, -8.78970968e-05,  1.66877761e-01],
     [-1.72369599e-01,  1.05565295e-01,  6.68042973e-02]],

    [[ 2.38749050e-02, -1.18255846e-01,  9.43809450e-02],
     [-1.04707167e-01, -1.08934328e-01,  2.13641495e-01],
     [-3.69844109e-01,  1.80117995e-01,  1.89726129e-01]],

    [[ 2.57137045e-02, -7.94617534e-02,  5.37480488e-02],
     [ 1.22328207e-01, -2.38788620e-01,  1.16460413e-01],
     [-5.98686934e-01,  3.02203119e-01,  2.96483815e-01]]]],
  dtype=float32)]

test_forward (test_warprnnt_op.WarpRNNTTest) ... 2019-05-21 15:45:05.675877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2019-05-21 15:45:05.676081: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-21 15:45:05.676101: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3
2019-05-21 15:45:05.676113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y N N
2019-05-21 15:45:05.676123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N N N
2019-05-21 15:45:05.676132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N N N Y
2019-05-21 15:45:05.676142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N N Y N
2019-05-21 15:45:05.676874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11357 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-05-21 15:45:05.677075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11357 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-05-21 15:45:05.677804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 11357 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:82:00.0, compute capability: 6.1)
2019-05-21 15:45:05.678136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 11357 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:83:00.0, compute capability: 6.1)
[4.4956665]
ok
test_multiple_batches_cpu (test_warprnnt_op.WarpRNNTTest) ... /disk2/syd/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py:14: DeprecationWarning: Please use assertEqual instead.
self.assertEquals(acts.shape, expected_grads.shape)
2019-05-21 15:45:05.746685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2019-05-21 15:45:05.746852: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-21 15:45:05.746871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3
2019-05-21 15:45:05.746882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y N N
2019-05-21 15:45:05.746891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N N N
2019-05-21 15:45:05.746899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N N N Y
2019-05-21 15:45:05.746908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N N Y N
2019-05-21 15:45:05.747531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11357 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-05-21 15:45:05.747657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11357 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-05-21 15:45:05.747763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 11357 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:82:00.0, compute capability: 6.1)
2019-05-21 15:45:05.747888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 11357 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:83:00.0, compute capability: 6.1)
ok
test_multiple_batches_gpu (test_warprnnt_op.WarpRNNTTest) ... 2019-05-21 15:45:05.769992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2019-05-21 15:45:05.770181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-21 15:45:05.770202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3
2019-05-21 15:45:05.770215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y N N
2019-05-21 15:45:05.770226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N N N
2019-05-21 15:45:05.770237: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N N N Y
2019-05-21 15:45:05.770247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N N Y N
2019-05-21 15:45:05.770935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:0 with 11357 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-05-21 15:45:05.771051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:1 with 11357 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-05-21 15:45:05.771161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:2 with 11357 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:82:00.0, compute capability: 6.1)
2019-05-21 15:45:05.771274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/device:GPU:3 with 11357 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:83:00.0, compute capability: 6.1)
2019-05-21 15:45:05.789642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0, 1, 2, 3
2019-05-21 15:45:05.789855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-21 15:45:05.789882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0 1 2 3
2019-05-21 15:45:05.789896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N Y N N
2019-05-21 15:45:05.789908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 1: Y N N N
2019-05-21 15:45:05.789919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 2: N N N Y
2019-05-21 15:45:05.789931: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 3: N N Y N
2019-05-21 15:45:05.790727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11357 MB memory) -> physical GPU (device: 0, name: TITAN Xp, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-05-21 15:45:05.790897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 11357 MB memory) -> physical GPU (device: 1, name: TITAN Xp, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-05-21 15:45:05.791044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 11357 MB memory) -> physical GPU (device: 2, name: TITAN Xp, pci bus id: 0000:82:00.0, compute capability: 6.1)
2019-05-21 15:45:05.791222: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 11357 MB memory) -> physical GPU (device: 3, name: TITAN Xp, pci bus id: 0000:83:00.0, compute capability: 6.1)
ok
test_session (test_warprnnt_op.WarpRNNTTest)
Returns a TensorFlow Session for use in executing tests. ... ok


Ran 4 tests in 0.159s

OK
test_basic (unittest.loader._FailedTest) ... ERROR
test_warprnnt_op (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: test_basic (unittest.loader._FailedTest)

ImportError: Failed to import test module: test_basic
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/disk2/syd/warp-transducer/tensorflow_binding/tests/test_basic.py", line 3, in
from warprnnt_tensorflow import rnnt_loss
File "/disk2/syd/warp-transducer/tensorflow_binding/warprnnt_tensorflow/__init__.py", line 37, in
@ops.RegisterGradient("WarpRNNT")
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2285, in __call__
_gradient_registry.register(f, self._op_type)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/registry.py", line 62, in register
(self._name, name, function_name, filename, line_number))
KeyError: "Registering two gradient with name 'WarpRNNT' !(Previous registration was in setup /usr/lib/python3.5/distutils/core.py:148)"

======================================================================
ERROR: test_warprnnt_op (unittest.loader._FailedTest)

ImportError: Failed to import test module: test_warprnnt_op
Traceback (most recent call last):
File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
module = self._get_module_from_name(name)
File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
__import__(name)
File "/disk2/syd/warp-transducer/tensorflow_binding/tests/test_warprnnt_op.py", line 3, in
from warprnnt_tensorflow import rnnt_loss
File "/disk2/syd/warp-transducer/tensorflow_binding/warprnnt_tensorflow/__init__.py", line 37, in
@ops.RegisterGradient("WarpRNNT")
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2285, in __call__
_gradient_registry.register(f, self._op_type)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/registry.py", line 62, in register
(self._name, name, function_name, filename, line_number))
KeyError: "Registering two gradient with name 'WarpRNNT' !(Previous registration was in setup /usr/lib/python3.5/distutils/core.py:148)"


Ran 2 tests in 0.000s

FAILED (errors=2)
Test failed: <unittest.runner.TextTestResult run=2 errors=2 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=2 errors=2 failures=0>_
####################################log end

BTW, when I ran "python3 tests/test_basic.py" and "python3 tests/test_warprnnt_op.py" directly, they passed. I'm confused about the difference; it seems "python setup.py test" runs the test cases twice and reports the issue the second time.
Do you have any idea what happened here?

When I use this interface (warprnnt_tensorflow.rnnt_loss) in my own TensorFlow training program (TIMIT as the training dataset), I encounter a "CUDA_ERROR_ILLEGAL_ADDRESS" error randomly; sometimes it occurs after 1 step, sometimes after several steps. The log details are:
###start###
2019-05-21 15:05:25.985187: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:649] failed to record completion event; therefore, failed to create inter-stream dependency
2019-05-21 15:05:25.985270: I tensorflow/stream_executor/stream.cc:4793] stream 0x7fdeba0adbb0 did not memcpy host-to-device; source: 0x104e2486900
2019-05-21 15:05:25.985289: E tensorflow/stream_executor/stream.cc:318] Error recording event in stream: error recording CUDA event on stream 0x7fdeba0adc80: CUDA_ERROR_ILLEGAL_ADDRESS; not marking stream as bad, as the Event object may be at fault. Monitor for further errors.
2019-05-21 15:05:25.985312: E tensorflow/stream_executor/cuda/cuda_event.cc:48] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS
2019-05-21 15:05:25.985329: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:206] Unexpected Event status: 1
Aborted
###end###

Thanks for your time. I would really appreciate it if you could reply.

tf.version: 1.10.1
cuda: 9.0
GPU: TITAN Xp 12G
platform: Linux Ubuntu 16.04.3 LTS
corpus: TIMIT (123-dim fbank to 61 phones)

dynamic module does not define module export function(PyInit_warp_rnnt)

After finishing the install, I ran python test.py in pytorch_binding/test and hit this problem:
ImportError: dynamic module does not define module export function (PyInit_warp_rnnt)
It seems the library cannot be imported.

My installation environment:
cuda 9.0
gcc 4.8.5
pytorch 1.0.0

GPU version gives nan value

Hi. First, thank you for your code.
I have a problem with the GPU version of warp-transducer.
I applied the GPU version of warp-transducer to your E2E-ASR code (with minor adjustments).
In this case, the loss became "nan" after several steps.
Is there any CUDA requirement, or is this another problem?

thank you

pytorch 1.0.1
ubuntu 18
cuda 10

warp rnnt tensorflow binding issue

Hi @HawkAaron ,

I was trying to install your RNNT TensorFlow binding so that I can use the RNNTLoss function in TensorFlow. I followed your steps, built libwarprnnt.so first, then ran python setup.py install. When I tried importing warprnnt_tensorflow into my Python environment, I got the following error:

In [1]: import warprnnt_tensorflow

NotFoundError Traceback (most recent call last)
in ()
----> 1 import warprnnt_tensorflow

~/warp-transducer/tensorflow_binding/warprnnt_tensorflow/__init__.py in <module>()
4
5 lib_file = imp.find_module('kernels', path)[1]
----> 6 _warprnnt = tf.load_op_library(lib_file)
7
8

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py in load_op_library(library_filename)
56 """
57 with errors_impl.raise_exception_on_not_ok_status() as status:
---> 58 lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
59
60 op_list_str = py_tf.TF_GetOpList(lib_handle)

~/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
514 None, None,
515 compat.as_text(c_api.TF_Message(self.status.status)),
--> 516 c_api.TF_GetCode(self.status.status))
517 # Delete the underlying status object from memory o

Any idea how to resolve this issue?

Thanks!

What's the license of this project

I've found license=apache in setup.py. However, I couldn't find any LICENSE file in this project. Would you please provide it? Thanks

AttributeError: module 'warprnnt_pytorch' has no attribute 'cpu_rnnt'

When I run test.py in pytorch_binding/test, I got the following error:

  File "test.py", line 165, in <module>
    small_test()
  File "test.py", line 60, in small_test
    cost, grads = wrap_and_call(acts, labels)
  File "test.py", line 44, in wrap_and_call
    costs = fn(acts, labels, lengths, label_lengths)
  File "/home/zhuminxian/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/zhuminxian/ASR/RNN-T/warp-transducer/pytorch_binding/warprnnt_pytorch/__init__.py", line 100, in forward
    return self.loss(acts, labels, act_lens, label_lens, self.blank, self.reduction)
  File "/data2/zhuminxian/ASR/RNN-T/warp-transducer/pytorch_binding/warprnnt_pytorch/__init__.py", line 23, in forward
    loss_func = warp_rnnt.gpu_rnnt if is_cuda else warp_rnnt.cpu_rnnt
AttributeError: module 'warprnnt_pytorch' has no attribute 'cpu_rnnt'

It seems there is neither gpu_rnnt nor cpu_rnnt at line 23 of pytorch_binding/warprnnt_pytorch/__init__.py.
Is this a problem with my installation, or something else?

RNNT loss decreases while the syllable error rate increases

Hello, I trained an RNN-T on a Chinese speech recognition corpus of more than 300 hours (the encoder was pretrained, but the decoder parameters were randomly initialized). After training for dozens of epochs, the loss first dropped quickly from more than 1000 to 60, then slowly dropped to a bit over 20, but the SER at inference rose from 2 to 20. Is this normal? It seems you mentioned this phenomenon elsewhere.
Thank you very much!

multi-gpu

Hi,

I am using PyTorch with DataParallel on 2 GPUs but get ValueError("Input length mismatch") when computing the loss. Have you tested it on multiple GPUs?

Traceback (most recent call last):
  ... line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/torch/_utils.py", line 385, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "model.py", line 86, in forward
    loss = self.loss(out, ys.int(), xlen, ylen)
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 100, in forward
    return self.loss(acts, labels, act_lens, label_lens, self.blank, self.reduction)
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 21, in forward
    certify_inputs(acts, labels, act_lens, label_lens)
  File "venv_rnnt_pytorch/lib64/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 138, in certify_inputs
    raise ValueError("Input length mismatch")
ValueError: Input length mismatch

Grads mismatch when running tests

Hello,
Which versions of the GPU driver and CUDA library are you working with?
The test cases fail on a Tesla K80 GPU with the 375.26 driver, on CentOS 7.
What confuses me is that I get different results with different CUDA versions, and no other errors happen.

CUDA 8.0 / cuDNN 7.1.3: (results screenshot omitted)

CUDA 7.5 / cuDNN 5.1: (results screenshot omitted)

CUDA 7.5 / cuDNN 6.0: (results screenshot omitted)

Tensorflow_binding?

Hi,
Do you plan to finish the TensorFlow binding?
Looking forward to your reply.

PyTorch reduction='none' is not properly handled

When the loss option reduction='none' is used, self.grads.mul_(grad_output) in the backward function cannot be computed properly,
because grad_output has shape [batch_size] while self.grads has shape [batch x seqLength x labelLength x outputDim].
My ad-hoc solution is to make grad_output broadcastable via grad_output.view(-1, 1, 1, 1), as sketched below.
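A minimal sketch of that broadcast fix, with made-up shapes; grads stands in for the stored per-activation gradients and grad_output for the incoming per-sample gradient:

import torch

B, T, U, V = 2, 5, 3, 10                # made-up sizes
grads = torch.randn(B, T, U, V)         # saved gradients, one entry per activation
grad_output = torch.randn(B)            # per-sample gradient when reduction='none'

# reshape so the per-sample factor broadcasts over the T, U and V axes
grads = grads * grad_output.view(-1, 1, 1, 1)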

tensorflow binding

Hi @HawkAaron ,

The TensorFlow rnnt_loss takes acts as one of the arguments. According to the description, acts is a 4-dimensional tensor shaped [B, T, U, V] coming from the output network. I am wondering how you go about transforming [B, T, V] from the encoder and [B, U] from the decoder into [B, T, U, V]. Could you give more tips about how you derive [B, T, U, V]?

BTW, I think the description in the repo regarding the rnnt_loss TensorFlow binding is off (it says acts is 3-dimensional), and also the test sample in the add_network branch is not in sync with master.

Thanks!
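Not an authoritative answer, but a minimal NumPy sketch of the additive ("add network") combination asked about above, assuming the prediction network has already been projected to a [B, U, V] output: broadcast the encoder output over the U axis and the decoder output over the T axis, then add, giving the [B, T, U, V] lattice (sizes are made up):

import numpy as np

B, T, U, V = 2, 5, 3, 10                # made-up sizes
f = np.random.randn(B, T, V)            # transcription (encoder) network output
g = np.random.randn(B, U, V)            # prediction (decoder) network output, already projected to V

# insert singleton axes so the two outputs broadcast to [B, T, U, V]
acts = f[:, :, None, :] + g[:, None, :, :]
assert acts.shape == (B, T, U, V)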

does this binding support multi-gpus?

When I train the transducer with multiple GPUs, there is an error:

File "/disk2/dongsq/environments/python3.6/lib/python3.6/site-packages/warprnnt_pytorch-0.1-py3.6-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 99, in certify_inputs
    raise ValueError("Output length mismatch")
ValueError: Output length mismatch

It looks like when the data is split across two GPUs, the data on one GPU no longer passes this check.

PyTorch non-static autograd.Function loss cause issue

Since commit 339471a,
the loss function changed from a static autograd.Function to a non-static Function.

With this change, a model using the non-static Function loss cannot be trained (or properly back-propagated).

See the forum comment related to this issue.

My suggestion is to keep the wrapper module with a static inner Function, as sketched below.
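A minimal sketch of that suggestion, keeping an nn.Module wrapper around a static autograd.Function; compute_rnnt is a hypothetical placeholder (not the library's real API) standing in for the compiled forward pass, assumed to return per-sample costs plus the full gradient tensor:

import torch
from torch.autograd import Function

def compute_rnnt(acts, labels, act_lens, label_lens):
    # hypothetical placeholder for the compiled RNNT kernel:
    # returns per-sample costs and the gradient w.r.t. acts
    costs = acts.new_zeros(acts.size(0))
    grads = torch.zeros_like(acts)
    return costs, grads

class _RNNT(Function):
    @staticmethod
    def forward(ctx, acts, labels, act_lens, label_lens):
        costs, grads = compute_rnnt(acts, labels, act_lens, label_lens)
        ctx.save_for_backward(grads)
        return costs

    @staticmethod
    def backward(ctx, grad_output):
        grads, = ctx.saved_tensors
        # broadcast the incoming per-sample gradient over the stored lattice gradients
        return grads * grad_output.view(-1, 1, 1, 1), None, None, None

class RNNTLoss(torch.nn.Module):
    def forward(self, acts, labels, act_lens, label_lens):
        return _RNNT.apply(acts, labels, act_lens, label_lens)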

Error during building warp-transducer: __shfl_down() is not defined

I pulled nvcr.io/nvidia/pytorch:19.06-py3 as my workspace image. I was facing this error for a while during make when building warp-transducer: ../warp-transducer/src/reduce.cu line 32: __shfl_down() is not defined.

I would like to highlight that __shfl_down() is deprecated, removed on newer devices, and no longer supported; it should be changed to __shfl_down_sync() [Ref. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#warp-shuffle-functions].
The fix is to change line 32 in src/reduce.cu
shuff = __shfl_down(x, offset);
To
shuff = __shfl_down_sync(0xFFFFFFFF, x, offset);

Hope this will help others in their build.

GPU test segmentation fault with CUDA 10.0

Thank you for the great RNN-T library :)
I got a "segmentation fault" message while running test_gpu.

(py3-tf-gpu) sephiroce@Sephiroce-BiKE:~/warp-transducer/build$ ./test_gpu
Running gpu tests
Segmentation fault
(py3-tf-gpu) sephiroce@Sephiroce-BiKE:~/warp-transducer/build$ ./test_cpu
Running CPU tests
finish small_test 1
finish options_test 1
finish inf_test 1
finished 1
Tests pass

This is a cmake log message.

(py3-tf-gpu) sephiroce@Sephiroce-BiKE:~/warp-transducer/build$ cmake -DCUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME ..
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "10.0")
-- cuda found TRUE
-- Building shared library with GPU support
-- Configuring done
-- Generating done
-- Build files have been written to: /home/sephiroce/warp-transducer/build

I also ran gdb

Starting program: /home/sephiroce/warp-transducer/build/test_gpu
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Running gpu tests
[New Thread 0x7fffef200700 (LWP 31941)]
[New Thread 0x7fffee9ff700 (LWP 31942)]
[New Thread 0x7fffee17d700 (LWP 31943)]

Thread 1 "test_gpu" received signal SIGSEGV, Segmentation fault.
0x00007ffff56436b0 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) up
#1  0x00007ffff5646998 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) up
#2  0x00007ffff5544b94 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) up
#3  0x00007ffff5544d8e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) up
#4  0x00007ffff569b7d0 in cuLaunchKernel () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
(gdb) up
#5  0x00007ffff74c82e2 in cudart::cudaApiLaunchKernelCommon(void const*, dim3, dim3, void**, unsigned long, CUstream_st*, bool) () from libwarprnnt.so
(gdb) up
#6  0x00007ffff74c84d7 in cudart::cudaApiLaunchKernel(void const*, dim3, dim3, void**, unsigned long, CUstream_st*) () from libwarprnnt.so
(gdb) up
#7  0x00007ffff74fc93b in cudaLaunchKernel () from libwarprnnt.so

Do you have any clues for fixing this segfault?

question about gradient to last blank transition

Hi @HawkAaron ,
according to Eq. (12) from your note, the gradient to last blank transition is:
(screenshot of Eq. (12) omitted)

However, I found the actual implementation differs:

Tp logpk = denom[col] + acts[col * alphabet_size + idx];
// Tp logpk = logp(denom, acts, maxT, maxU, alphabet_size, mb, t, u, idx);
Tp grad = exp(alphas[col] + betas[col] + logpk - logll[mb]);
// grad to last blank transition
if (idx == blank_ && t == T-1 && u == U-1) grad -= 1;
Should it be

if (idx == blank_ && t == T-1 && u == U-1) grad -= exp(alphas[col] + logpk - logll[mb]);

for line 165?

Correct me if I am wrong. Thank you for this great repo.
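
For what it is worth, a short check under two assumptions about the code above (that logll[mb] = alphas(T-1,U-1) + betas(T-1,U-1), and that the backward recursion initialises betas(T-1,U-1) to the blank log-probability of that cell): at t == T-1, u == U-1 and idx == blank_,

exp(alphas[col] + logpk - logll[mb]) = exp(logpk - betas(T-1,U-1)) = 1

so, if those assumptions hold, the proposed grad -= exp(alphas[col] + logpk - logll[mb]) and the existing grad -= 1 would be numerically identical at that cell.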

still pytorch binding GPU test error

running install
running bdist_egg
running egg_info
writing warprnnt_pytorch.egg-info/PKG-INFO
writing dependency_links to warprnnt_pytorch.egg-info/dependency_links.txt
writing top-level names to warprnnt_pytorch.egg-info/top_level.txt
reading manifest file 'warprnnt_pytorch.egg-info/SOURCES.txt'
writing manifest file 'warprnnt_pytorch.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'warprnnt_pytorch.warp_rnnt' extension
gcc -pthread -B /data3/tanghaoyu/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data3/tanghaoyu/espnet/tools/warp-transducer/include -I/data3/tanghaoyu/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include -I/data3/tanghaoyu/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/torch/csrc/api/include -I/data3/tanghaoyu/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/TH -I/data3/tanghaoyu/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/THC -I/data3/tanghaoyu/anaconda3/include/python3.7m -c src/binding.cpp -o build/temp.linux-x86_64-3.7/src/binding.o -std=c++11 -fPIC -DWARPRNNT_ENABLE_GPU -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=warp_rnnt -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /data3/tanghaoyu/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/THC/THCGeneral.h:12:0,
from /data3/tanghaoyu/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/THC/THC.h:4,
from src/binding.cpp:8:
/data3/tanghaoyu/espnet/tools/venv/lib/python3.7/site-packages/torch/lib/include/ATen/cuda/CUDAStream.h:6:30: fatal error: cuda_runtime_api.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

#15
I read that issue but still get the problem. My GCC is 5.4 and the environment variables are set as

export CUDAROOT=/usr/local/cuda
export CUDA_HOME=$CUDAROOT
export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME
export CUDA_PATH=$CUDAROOT

export PATH=$CUDAROOT/bin:$PATH

export LD_LIBRARY_PATH=$CUDA_HOME/extras/CUPTI/lib64:$LD_LIBRARY_PATH
export LIBRARY_PATH=$LIBRARY_PATH:$CUDA_HOME/lib64:$LIBRARY_PATH

export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"
export CFLAGS="-I/usr/local/cuda/include $CFLAGS"

joint network defined in source code

Hi @HawkAaron ,

I was trying to understand your RNNT loss implementation. Could you point me to where in your C++ source code you defined the output network (equations (16), (17), (18) from the 2013 paper, Speech Recognition with Deep Recurrent Neural Networks)?

Thanks!
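
For reference, my reading (treat the exact notation as an assumption, not a quote of the paper) is that those equations describe an additive output network of roughly this form:

h(t,u) = tanh(W_enc * f_t + W_pred * g_u + b),    Pr(k | t,u) = softmax_k(W_out * h(t,u) + b_out)

Since the loss here takes the already-computed acts tensor as an argument, such an output network would live in the caller's model code (e.g. around the Python bindings) rather than inside the C++/CUDA loss sources.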

question about input shape

Hi:
I have a question about the rnnt_loss input shape. The shape of acts is batch x seqLength x (labelLength + 1) x outputDim. However, in a normal RNN model the network output has only three dimensions, batch x seqLength x outputDim, and a linear layer could then be used to transform the output shape to batch x seqLength x (labelLength + 1). Why does the rnnt_loss input have four dimensions here?

getting negative loss

Hello,
I'm trying to feed a random matrix into the loss function, and a negative loss is returned.
(screenshot of the returned loss values omitted)
tf1.9/cuda9.0/cudnn7.1
What's the normal range of the loss value? Or any suggestions for debugging? Thanks a lot!

tensorflow binding undefined symbol: _ZTIN10tensorflow8OpKernelE

Hi,
I used a Docker image with pip-installed TensorFlow 1.14, CUDA 10, Python 3.5 and nvidia-415 on my desktop, and it passed test_cpu and test_gpu.
When I ran

python setup.py install

and there seemed no error.

However, when I ran

python setup.py test

the error shows that 'tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python3.5/dist-packages/warprnnt_tensorflow-0.1-py3.5-linux-x86_64.egg/warprnnt_tensorflow/kernels.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZTIN10tensorflow8OpKernelE'

Before running setup.py install, I found that it first had an error like 'fatal error: not found tensorflow/core/kernels/bounds_check.h', so I just changed kernels to framework and got past the setup install step.

(But on my desktop, which has CUDA 9 and tensorflow-1.12, there seems to be no problem at all, since bounds_check.h is under kernels/ in the tf1.12 source code. I successfully built this on my desktop.)

UPDATE:
Previously I encountered /usr/bin/ld: cannot find -ltensorflow_framework. So, following a solution from warp-ctc, I made a soft link:

ln -s /usr/local/lib/python3.5/dist-packages/tensorflow/libtensorflow_framework.so.1 /usr/local/lib/ibtensorflow_framework.so

Then I encountered the "xxxxx as list()" problem, so I changed the registered op according to here. Then I encountered the _ZTIN10tensorflow8OpKernelE error.

I successfully solved this problem by just

cd  /usr/local/lib/python3.5/dist-packages/tensorflow/; ln -s libtensorflow_framework.so.1 libtensorflow_framework.so

After that, if I directly run

python setup.py test

it will encounter "Registering two gradient with name 'WarpRNNT'!".
So to run the test cases, just

python tests/test_basic.py
&
python tests/test_warprnnt_op.py

error in make

Hello, following your instructions, the following error occurs when executing the make command. How can I solve this problem? Thank you.
/home/gaoliqing/lhb/warp-transducer-master/src/rnnt_entrypoint.cu(1): error: this declaration has no storage class or type specifier

/home/gaoliqing/lhb/warp-transducer-master/src/rnnt_entrypoint.cu(1): error: expected a ";"

2 errors detected in the compilation of "/tmp/tmpxft_000014f2_00000000-13_rnnt_entrypoint.compute_70.cpp1.ii".
CMake Error at warprnnt_generated_rnnt_entrypoint.cu.o.cmake:266 (message):
Error generating file
/home/gaoliqing/lhb/warp-transducer-master/build/CMakeFiles/warprnnt.dir/src/./warprnnt_generated_rnnt_entrypoint.cu.o

CMakeFiles/warprnnt.dir/build.make:192: recipe for target 'CMakeFiles/warprnnt.dir/src/warprnnt_generated_rnnt_entrypoint.cu.o' failed
make[2]: *** [CMakeFiles/warprnnt.dir/src/warprnnt_generated_rnnt_entrypoint.cu.o] Error 1
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/warprnnt.dir/all' failed
make[1]: *** [CMakeFiles/warprnnt.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

tensorflow setup on CPU error

@HawkAaron
Hi, I installed the GPU version successfully with the latest setup.py script, but I got an error when installing a CPU version on Mac, for warprnnt_tensorflow.kernels.

ld: library not found for -l:libtensorflow_framework.1.dylib

I checked that this dylib exists in the directory where I installed TensorFlow 1.14. I didn't find a similar issue on this. Do you have any clue?

GPU Test Failed

Getting test.py output as:

[ 9.53674316e-07]
CPU Tests passed!
GPU execution requested, but not compiled with GPU support
Traceback (most recent call last):
File "/home/tanish/multi_gpu_ds3/warp-transducer/pytorch_binding/test/test.py", line 171, in
small_test()
File "/home/tanish/multi_gpu_ds3/warp-transducer/pytorch_binding/test/test.py", line 77, in small_test
"small_test costs mismatch."
AssertionError: small_test costs mismatch.

print(cost - expected_cost) gives [-4.49566603].
Also, the cost comes out as 0.
Any help on this?

unable to run test.py file

Even after properly running make, I am unable to run the test.py file.
I get this stack trace:

Traceback (most recent call last):
File "/home/rajeev/pb/ds3/libs/warp-transducer/pytorch_binding/test/test.py", line 17, in
from warprnnt_pytorch import RNNTLoss
ModuleNotFoundError: No module named 'warprnnt_pytorch'

Any help on this?

Can log_probs be torch.float16?

@HawkAaron

I'm trying to add mixed precision from [apex](https://github.com/NVIDIA/apex) to the training. But I found that the RNNT loss requires log_probs to be torch.float32. Can it be torch.float16?

I tried to comment out [this line], and there is an error:

File "/ssd4/exec/sqdong/environments/anaconda3/lib/python3.7/site-packages/warprnnt_pytorch-0.1-py3.7-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 100, in forward
    return self.loss(acts, labels, act_lens, label_lens, self.blank, self.reduction)
  File "/ssd4/exec/sqdong/environments/anaconda3/lib/python3.7/site-packages/warprnnt_pytorch-0.1-py3.7-linux-x86_64.egg/warprnnt_pytorch/__init__.py", line 40, in forward
    grads /= minibatch_size
  File "/ssd4/exec/sqdong/environments/anaconda3/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/wrap.py", line 53, in wrapper
RuntimeError: CUDA error: an illegal memory access was encountered

Is there a way to solve this?
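
For reference, a minimal sketch of one possible workaround (not an official answer; it assumes a GPU build of warprnnt_pytorch and uses dummy data): keep the joint output in float16 inside the model, but cast the copy handed to the loss back to float32, so the dtype check passes and autograd converts the gradient back to float16 at the cast during backward.

import torch
from warprnnt_pytorch import RNNTLoss   # assumed installed with GPU support, as in this issue

rnnt_loss = RNNTLoss()

B, T, U, V = 2, 10, 4, 6
# Stand-in for a float16 joint-network output produced under apex/amp.
acts = torch.randn(B, T, U + 1, V, device="cuda", dtype=torch.float16, requires_grad=True)
labels = torch.randint(1, V, (B, U), device="cuda", dtype=torch.int32)
act_lens = torch.tensor([T] * B, device="cuda", dtype=torch.int32)
label_lens = torch.tensor([U] * B, device="cuda", dtype=torch.int32)

loss = rnnt_loss(acts.float(), labels, act_lens, label_lens)   # cast only the loss input
loss.backward()
print(acts.grad.dtype)   # torch.float16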
