iamjanvijay / rnnt

An implementation of RNN-Transducer loss in TF-2.0.

License: MIT License

Python 100.00%

transducer-loss rnnt ctc-loss asr-decoder asr-model

rnnt's Introduction

RNN-Transducer Loss

This package provides an implementation of the Transducer loss in TensorFlow 2.0.

Using the package

First, install the module with pip.

pip install rnnt

Then use the rnnt_loss function from the rnnt module, as described in the sample script (Sample Train Script):

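import tensorflow as tf
from rnnt import rnnt_loss

def loss_grad_gradtape(logits, labels, label_lengths, logit_lengths):
    # Record operations on logits so the loss can be differentiated with respect to them.
    with tf.GradientTape() as g:
        g.watch(logits)
        loss = rnnt_loss(logits, labels, label_lengths, logit_lengths)
    grad = g.gradient(loss, logits)
    return loss, grad

# logits, labels, label_lengths and logit_lengths are tensors with the shapes listed below.
pred_loss, pred_grads = loss_grad_gradtape(logits, labels, label_lengths, logit_lengths)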

The input parameters of the rnnt_loss method have the following shapes:
logits - (batch_size, input_time_steps, output_time_steps+1, vocab_size+1)
labels - (batch_size, output_time_steps)
label_length - (batch_size) - number of time steps for each output sequence in the minibatch.
logit_length - (batch_size) - number of time steps for each input sequence in the minibatch.
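
The shape rules above can be checked before calling the loss. The following is a minimal sketch in plain Python, not part of the rnnt package: check_rnnt_shapes is a hypothetical helper, and it assumes the shapes and lengths are passed as ordinary tuples and lists of integers.

```python
def check_rnnt_shapes(logits_shape, labels_shape, label_lengths, logit_lengths):
    """Return a list of problems; an empty list means the shapes are consistent.

    Hypothetical helper: it only encodes the shape rules listed in the README.
    """
    if len(logits_shape) != 4:
        return ["logits must have rank 4, got rank %d" % len(logits_shape)]
    if len(labels_shape) != 2:
        return ["labels must have rank 2, got rank %d" % len(labels_shape)]

    problems = []
    batch, input_steps, output_steps_plus_1, _ = logits_shape
    if labels_shape[0] != batch:
        problems.append("labels batch size differs from logits batch size")
    # logits axis 2 is output_time_steps + 1, labels axis 1 is output_time_steps.
    if output_steps_plus_1 != labels_shape[1] + 1:
        problems.append(
            "logits axis 2 is %d, expected labels axis 1 + 1 = %d"
            % (output_steps_plus_1, labels_shape[1] + 1)
        )
    if len(label_lengths) != batch or len(logit_lengths) != batch:
        problems.append("length vectors must have one entry per batch item")
    if any(n > labels_shape[1] for n in label_lengths):
        problems.append("a label length exceeds the padded label width")
    if any(n > input_steps for n in logit_lengths):
        problems.append("a logit length exceeds the padded input width")
    return problems


# Consistent example: 3 output steps, so logits axis 2 is 4.
print(check_rnnt_shapes((2, 5, 4, 10), (2, 3), [3, 2], [5, 4]))  # []
```

An empty list means the four inputs agree with the documented layout; it does not validate the values stored in the tensors.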

rnnt's Issues

Are there benchmarks with other implementations?

Invalid argument: indices[0,0,2] = [0, 0, 2, -1] does not index into shape [10,79,18,6484]

logits shape: (10, 79, 18, 6485)
labels shape (10, 17)
labels_length: tf.Tensor([ 2 14 9 9 9 13 17 9 9 17], shape=(10,), dtype=int64)
logit_length : tf.Tensor([20 47 36 35 41 58 64 38 45 78], shape=(10,), dtype=int64)

I got this error:
2020-12-11 10:48:09.516460: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at scatter_nd_op.cc:133 : Invalid argument: indices[0,0,2] = [0, 0, 2, -1] does not index into shape [10,79,18,6484]
Traceback (most recent call last):
File "/home/dapeng/PycharmProjects/convTT/train.py", line 68, in
train(model, train_set, optimizer, train_loss, epoch)
File "/home/dapeng/PycharmProjects/convTT/train.py", line 33, in train
label_length=labels_length)
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/rnnt/rnnt.py", line 204, in rnnt_loss
return compute_rnnt_loss_and_grad(*args)
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/custom_gradient.py", line 264, in call
return self._d(self._f, a, k)
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/custom_gradient.py", line 218, in decorated
return _eager_mode_decorator(wrapped, args, kwargs)
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/custom_gradient.py", line 412, in _eager_mode_decorator
result, grad_fn = f(*args, **kwargs)
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/rnnt/rnnt.py", line 195, in compute_rnnt_loss_and_grad
result = compute_rnnt_loss_and_grad_helper(**kwargs)
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/rnnt/rnnt.py", line 168, in compute_rnnt_loss_and_grad_helper
[batch_size, input_max_len, target_max_len, vocab_size - 1])
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8842, in scatter_nd
indices, updates, shape, name=name, ctx=_ctx)
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8885, in scatter_nd_eager_fallback
attrs=_attrs, ctx=ctx, name=name)
File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0,2] = [0, 0, 2, -1] does not index into shape [10,79,18,6484] [Op:ScatterNd]
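
One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the failing index is at batch item 0, position 2, and labels_length for item 0 is 2, so position 2 would be padding. If the padding value is 0 and the helper subtracts 1 from every label, the result is the -1 seen in the message. The sketch below uses a hypothetical helper, find_nonpositive_labels, on plain nested lists to locate such entries.

```python
def find_nonpositive_labels(labels):
    """Return (batch_index, position, value) for every label id below 1.

    Hypothetical diagnostic: assumes valid label ids start at 1, so that
    subtracting 1 never yields a negative index.
    """
    return [
        (b, u, value)
        for b, row in enumerate(labels)
        for u, value in enumerate(row)
        if value < 1
    ]


# Toy batch: the first sequence has length 2 and is padded with zeros.
labels = [[5, 9, 0, 0], [3, 7, 2, 4]]
print(find_nonpositive_labels(labels))  # [(0, 2, 0), (0, 3, 0)]
```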

Support for Tensorflow >= 2.3

Hello, I like your work. I wonder if you could support newer versions of TensorFlow and tf-nightly with this package?

This multiply operator might fail.

https://github.com/mejanvijay/tensorflow_rnnt/blob/e18f10d82c8b0b815b80094dae5777aeae257e1b/rnnt_loss.py#L30

This multiply operator fails when input_max_len != (target_max_len-1).

Basically, labels is batch x (target_max_len-1). When converted to one_hot_labels, it becomes batch x (target_max_len-1) x (target_max_len-1) x vocab_size.

logits is batch x input_max_len x target_max_len x vocab_size.

And when we do tf.multiply(log_probs[:, :, :-1, :], one_hot_labels), it should fail if input_max_len != (target_max_len-1).

Our test cases succeed only because input_max_len == (target_max_len-1) in all of them, i.e. input_max_len = 5 and target_max_len = 6.
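
The shape argument in this issue can be checked without TensorFlow. The sketch below implements the usual trailing-dimension broadcasting rule in plain Python; broadcast_shape is a hypothetical helper, and the one_hot_labels shape is taken from the description in this issue, not from the package source.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of a and b, or None if they are incompatible."""
    result = []
    # Compare dimensions from the last axis backwards, padding with 1s.
    for i in range(1, max(len(a), len(b)) + 1):
        x = a[-i] if i <= len(a) else 1
        y = b[-i] if i <= len(b) else 1
        if x != y and x != 1 and y != 1:
            return None
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


batch, target_max_len, vocab_size = 2, 6, 4
one_hot = (batch, target_max_len - 1, target_max_len - 1, vocab_size)

# input_max_len == target_max_len - 1: the multiply is shape-compatible.
print(broadcast_shape((batch, 5, target_max_len - 1, vocab_size), one_hot))  # (2, 5, 5, 4)

# input_max_len != target_max_len - 1: the shapes cannot be broadcast.
print(broadcast_shape((batch, 7, target_max_len - 1, vocab_size), one_hot))  # None
```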

Dimension error

2020-07-10 09:52:09.715599: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 09:52:13.875948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-10 09:52:13.910317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.23GiB/s
2020-07-10 09:52:13.910441: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 09:52:13.966109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 09:52:14.009304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 09:52:14.028718: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 09:52:14.068297: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 09:52:14.094965: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 09:52:14.171005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 09:52:14.171229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 09:52:14.173103: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-10 09:52:14.199127: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x18498c47600 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-10 09:52:14.199354: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-10 09:52:14.200187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.23GiB/s
2020-07-10 09:52:14.200339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 09:52:14.200510: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 09:52:14.200624: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 09:52:14.200728: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 09:52:14.200834: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 09:52:14.200937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 09:52:14.201032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 09:52:14.201209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 09:52:15.580864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-10 09:52:15.581199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-07-10 09:52:15.581289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-07-10 09:52:15.582336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6609 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:08:00.0, compute capability: 7.5)
2020-07-10 09:52:15.586161: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x184c2fd8b50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-10 09:52:15.586307: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
Model: "EncoderModel"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(4, None, 20)]           0
_________________________________________________________________
EncoderBlock (EncoderBlock)  (4, None, 100)            370000
=================================================================
Total params: 370,000
Trainable params: 370,000
Non-trainable params: 0
_________________________________________________________________
None
Model: "PredictorModel"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(4, None, 28)]           0
_________________________________________________________________
PredictionBlock (PredictionB (4, None, 100)            212400
=================================================================
Total params: 212,400
Trainable params: 212,400
Non-trainable params: 0
_________________________________________________________________
None
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(4, None, 20)]      0
__________________________________________________________________________________________________
input_2 (InputLayer)            [(4, None, 28)]      0
__________________________________________________________________________________________________
EncoderBlock (EncoderBlock)     (4, None, 100)       370000      input_1[0][0]
__________________________________________________________________________________________________
PredictionBlock (PredictionBloc (4, None, 100)       212400      input_2[0][0]
__________________________________________________________________________________________________
tf_op_layer_ExpandDims (TensorF [(4, None, 1, 100)]  0           EncoderBlock[0][0]
__________________________________________________________________________________________________
tf_op_layer_ExpandDims_1 (Tenso [(4, 1, None, 100)]  0           PredictionBlock[0][0]
__________________________________________________________________________________________________
tf_op_layer_AddV2 (TensorFlowOp [(4, None, None, 100 0           tf_op_layer_ExpandDims[0][0]
                                                                 tf_op_layer_ExpandDims_1[0][0]
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, None, 1 10100       tf_op_layer_AddV2[0][0]
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, None, 2 2828        time_distributed[0][0]
==================================================================================================
Total params: 595,328
Trainable params: 595,328
Non-trainable params: 0
__________________________________________________________________________________________________
None
Epoch 1/10
(4, 391, 172, 28)
(4, 172)
(4, 1)
(4, 1)
Traceback (most recent call last):
  File "run_model.py", line 74, in <module>
    train(t_model)
  File "run_model.py", line 67, in train
    loss = train_step(t_model, t_data, optimizer)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\def_function.py", line 627, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\def_function.py", line 506, in _initialize
    *args, **kwds))
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\function.py", line 2446, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\function.py", line 2667, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    run_model.py:54 train_step  *
        loss = loss_fn(logits, labels, label_lens, mfcc_lens)
    run_model.py:42 loss_fn  *
        return rnnt_loss(logits, labels, label_length, logit_length)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\rnnt-0.0.5-py3.7.egg\rnnt\rnnt.py:195 compute_rnnt_loss_and_grad  *
        result = compute_rnnt_loss_and_grad_helper(**kwargs)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\rnnt-0.0.5-py3.7.egg\rnnt\rnnt.py:112 compute_rnnt_loss_and_grad_helper  *
        blank_probs, truth_probs = transition_probs(one_hot_labels, log_probs)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\rnnt-0.0.5-py3.7.egg\rnnt\rnnt.py:36 transition_probs  *
        truth_probs = tf.reduce_sum(tf.multiply(log_probs[:, :, :-1, :], one_hot_labels), axis=-1)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\util\dispatch.py:180 wrapper  **
        return target(*args, **kwargs)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\ops\math_ops.py:381 multiply
        return gen_math_ops.mul(x, y, name)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\ops\gen_math_ops.py:6092 mul
        "Mul", x=x, y=y, name=name)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\op_def_library.py:744 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\func_graph.py:595 _create_op_internal
        compute_device)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\ops.py:3327 _create_op_internal
        op_def=op_def)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\ops.py:1817 __init__
        control_input_ops, op_def)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\ops.py:1657 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimensions must be equal, but are 171 and 172 for '{{node rnnt_loss/Mul}} = Mul[T=DT_FLOAT](rnnt_loss/strided_slice_1, rnnt_loss/one_hot)' with input shapes: [4,391,171,28], [4,391,172,28].

I got this error.
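
The numbers in the final ValueError follow from the shapes printed just before the traceback. A minimal sketch of that arithmetic, using a hypothetical helper label_axis_sizes and the README rule that logits axis 2 is output_time_steps+1:

```python
def label_axis_sizes(logits_shape, labels_shape):
    """Return (sliced logits width, labels width, logits width the README expects)."""
    sliced = logits_shape[2] - 1    # width after log_probs[:, :, :-1, :]
    labels_width = labels_shape[1]  # width of one_hot_labels along the same axis
    expected = labels_shape[1] + 1  # README: output_time_steps + 1
    return sliced, labels_width, expected


print(label_axis_sizes((4, 391, 172, 28), (4, 172)))  # (171, 172, 173)
```

With labels of width 172, the README layout implies logits of width 173 along axis 2, while the printed logits have 172. The two length tensors are also printed as (4, 1), whereas the README lists them as (batch_size). Whether either point is the root cause here is an assumption.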

What is the use of pred_grads?

Hi, I have studied the source code, but the matrix operations are used in such an unusual way that they really confuse me. Could you kindly point to some literature on the algorithm, with its diagonal matrix layout, to help us understand it? Now to the problem. In the example:
pred_loss, pred_grads = loss_grad_gradtape(logits, labels, label_lengths, logit_lengths)
Is pred_loss meant to be the loss function for a TensorFlow model? What is the use of pred_grads?

And when I check the source code, I find that the loss is
loss = -final_state_probs
and
final_state_probs = beta[:, 0, 0]

The loss is obtained only from backward_dp(), without any connection to forward_dp(). So I think pred_loss can't simply be used in a TensorFlow model. What is the correct training method for TensorFlow? Is the following correct?

logits = some_deep_network(...)
pred_loss, pred_grads = loss_grad_gradtape(logits, labels, label_lengths, logit_lengths)
rnnt_model = tf.keras.Model(inputs=[logits, labels, label_lengths, logit_lengths], outputs=pred_loss)
rnnt_model.compile(optimizer='adam', loss=lambda y_true, y_pred: y_pred)
rnnt_model.fit(...)
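
On the point that the loss is read from the backward pass alone: in the standard transducer formulation, the forward and backward recursions both sum over the same set of alignments, so either one yields the total sequence probability. The sketch below is a plain-Python illustration in the probability domain on a tiny lattice. It assumes the package follows that standard formulation; forward_total and backward_total are hypothetical names, not the package's forward_dp and backward_dp.

```python
import math


def forward_total(blank, emit):
    """Sum over all alignments using the forward variable alpha."""
    T, U1 = len(blank), len(blank[0])
    alpha = [[0.0] * U1 for _ in range(T)]
    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U1):
            if t > 0:  # arrive by emitting blank at (t-1, u)
                alpha[t][u] += alpha[t - 1][u] * blank[t - 1][u]
            if u > 0:  # arrive by emitting label u at (t, u-1)
                alpha[t][u] += alpha[t][u - 1] * emit[t][u - 1]
    return alpha[T - 1][U1 - 1] * blank[T - 1][U1 - 1]


def backward_total(blank, emit):
    """Sum over all alignments using the backward variable beta."""
    T, U1 = len(blank), len(blank[0])
    beta = [[0.0] * U1 for _ in range(T)]
    for t in reversed(range(T)):
        for u in reversed(range(U1)):
            if t == T - 1 and u == U1 - 1:
                beta[t][u] = blank[t][u]  # final blank ends the alignment
                continue
            if t < T - 1:
                beta[t][u] += beta[t + 1][u] * blank[t][u]
            if u < U1 - 1:
                beta[t][u] += beta[t][u + 1] * emit[t][u]
    return beta[0][0]


# Toy lattice: 2 input steps, 1 label. blank[t][u] and emit[t][u] are probabilities.
blank = [[0.6, 0.7], [0.5, 0.9]]
emit = [[0.4], [0.5]]
total = backward_total(blank, emit)
print(math.isclose(total, forward_total(blank, emit)))  # True
loss = -math.log(total)
```

In the README example, pred_grads is the gradient of the loss with respect to logits, as computed by g.gradient(loss, logits). In a training loop one would more commonly take the gradient with respect to the model's trainable variables; that is a general TensorFlow pattern, not something this README states.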

Why ‘labels - 1’ in compute_rnnt_loss_and_grad_helper?

Hello, I have a question: why is ‘labels - 1’ used in compute_rnnt_loss_and_grad_helper?
b = tf.reshape(labels - 1, shape=(batch_size, 1, target_max_len - 1, 1))
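
One plausible explanation, stated as an assumption: if index 0 of the vocabulary axis is reserved for the blank symbol, then a tensor that holds only the non-blank channels has size vocab_size - 1 (the shape that appears in the scatter_nd traceback in an earlier issue), and a label id k sits at position k - 1 in it. The sketch below shows that shift on a plain list; the function names are hypothetical.

```python
def pick_with_blank(log_probs, label):
    """Index the full vector, where position 0 is assumed to be blank."""
    return log_probs[label]


def pick_without_blank(log_probs, label):
    """Drop the assumed blank channel, then index with label - 1."""
    non_blank = log_probs[1:]
    return non_blank[label - 1]


# Toy vector laid out as [blank, token 1, token 2, token 3].
log_probs = [-0.5, -1.2, -2.3, -3.1]
for label in (1, 2, 3):
    assert pick_with_blank(log_probs, label) == pick_without_blank(log_probs, label)
```

Under this reading, a label id of 0 maps to index -1. A Python list silently wraps that to the last element, whereas an op such as tf.scatter_nd can reject it as out of range.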
