
rnnt's Issues

Dimension error

2020-07-10 09:52:09.715599: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 09:52:13.875948: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-10 09:52:13.910317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.23GiB/s
2020-07-10 09:52:13.910441: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 09:52:13.966109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 09:52:14.009304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 09:52:14.028718: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 09:52:14.068297: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 09:52:14.094965: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 09:52:14.171005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 09:52:14.171229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 09:52:14.173103: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-10 09:52:14.199127: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x18498c47600 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-10 09:52:14.199354: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-10 09:52:14.200187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 computeCapability: 7.5
coreClock: 1.71GHz coreCount: 46 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.23GiB/s
2020-07-10 09:52:14.200339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-07-10 09:52:14.200510: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-10 09:52:14.200624: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-10 09:52:14.200728: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-10 09:52:14.200834: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-10 09:52:14.200937: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-10 09:52:14.201032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-10 09:52:14.201209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-07-10 09:52:15.580864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-10 09:52:15.581199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0
2020-07-10 09:52:15.581289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N
2020-07-10 09:52:15.582336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6609 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:08:00.0, compute capability: 7.5)
2020-07-10 09:52:15.586161: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x184c2fd8b50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-10 09:52:15.586307: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
Model: "EncoderModel"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(4, None, 20)]           0
_________________________________________________________________
EncoderBlock (EncoderBlock)  (4, None, 100)            370000
=================================================================
Total params: 370,000
Trainable params: 370,000
Non-trainable params: 0
_________________________________________________________________
None
Model: "PredictorModel"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_2 (InputLayer)         [(4, None, 28)]           0
_________________________________________________________________
PredictionBlock (PredictionB (4, None, 100)            212400
=================================================================
Total params: 212,400
Trainable params: 212,400
Non-trainable params: 0
_________________________________________________________________
None
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(4, None, 20)]      0
__________________________________________________________________________________________________
input_2 (InputLayer)            [(4, None, 28)]      0
__________________________________________________________________________________________________
EncoderBlock (EncoderBlock)     (4, None, 100)       370000      input_1[0][0]
__________________________________________________________________________________________________
PredictionBlock (PredictionBloc (4, None, 100)       212400      input_2[0][0]
__________________________________________________________________________________________________
tf_op_layer_ExpandDims (TensorF [(4, None, 1, 100)]  0           EncoderBlock[0][0]
__________________________________________________________________________________________________
tf_op_layer_ExpandDims_1 (Tenso [(4, 1, None, 100)]  0           PredictionBlock[0][0]
__________________________________________________________________________________________________
tf_op_layer_AddV2 (TensorFlowOp [(4, None, None, 100 0           tf_op_layer_ExpandDims[0][0]
                                                                 tf_op_layer_ExpandDims_1[0][0]
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, None, 1 10100       tf_op_layer_AddV2[0][0]
__________________________________________________________________________________________________
time_distributed_1 (TimeDistrib (None, None, None, 2 2828        time_distributed[0][0]
==================================================================================================
Total params: 595,328
Trainable params: 595,328
Non-trainable params: 0
__________________________________________________________________________________________________
None
Epoch 1/10
(4, 391, 172, 28)
(4, 172)
(4, 1)
(4, 1)
Traceback (most recent call last):
  File "run_model.py", line 74, in <module>
    train(t_model)
  File "run_model.py", line 67, in train
    loss = train_step(t_model, t_data, optimizer)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\def_function.py", line 627, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\def_function.py", line 506, in _initialize
    *args, **kwds))
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\function.py", line 2446, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\function.py", line 2667, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\eager\def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    run_model.py:54 train_step  *
        loss = loss_fn(logits, labels, label_lens, mfcc_lens)
    run_model.py:42 loss_fn  *
        return rnnt_loss(logits, labels, label_length, logit_length)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\rnnt-0.0.5-py3.7.egg\rnnt\rnnt.py:195 compute_rnnt_loss_and_grad  *
        result = compute_rnnt_loss_and_grad_helper(**kwargs)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\rnnt-0.0.5-py3.7.egg\rnnt\rnnt.py:112 compute_rnnt_loss_and_grad_helper  *
        blank_probs, truth_probs = transition_probs(one_hot_labels, log_probs)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\rnnt-0.0.5-py3.7.egg\rnnt\rnnt.py:36 transition_probs  *
        truth_probs = tf.reduce_sum(tf.multiply(log_probs[:, :, :-1, :], one_hot_labels), axis=-1)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\util\dispatch.py:180 wrapper  **
        return target(*args, **kwargs)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\ops\math_ops.py:381 multiply
        return gen_math_ops.mul(x, y, name)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\ops\gen_math_ops.py:6092 mul
        "Mul", x=x, y=y, name=name)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\op_def_library.py:744 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\func_graph.py:595 _create_op_internal
        compute_device)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\ops.py:3327 _create_op_internal
        op_def=op_def)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\ops.py:1817 __init__
        control_input_ops, op_def)
    C:\Users\jtdut\anaconda3\envs\rnnt\lib\site-packages\tensorflow\python\framework\ops.py:1657 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimensions must be equal, but are 171 and 172 for '{{node rnnt_loss/Mul}} = Mul[T=DT_FLOAT](rnnt_loss/strided_slice_1, rnnt_loss/one_hot)' with input shapes: [4,391,171,28], [4,391,172,28].

I got this error.

This multiply operator can fail:

https://github.com/mejanvijay/tensorflow_rnnt/blob/e18f10d82c8b0b815b80094dae5777aeae257e1b/rnnt_loss.py#L30

It fails whenever input_max_len != (target_max_len - 1).

Basically, labels has shape batch x (target_max_len - 1). When converted to one_hot_labels, it becomes batch x (target_max_len - 1) x (target_max_len - 1) x vocab_size.

logits has shape batch x input_max_len x target_max_len x vocab_size.

So when we do tf.multiply(log_probs[:, :, :-1, :], one_hot_labels), it fails whenever input_max_len != (target_max_len - 1).

Our test cases succeed only because input_max_len == (target_max_len - 1) in all of them, i.e. input_max_len = 5 and target_max_len = 6.
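For concreteness, here is a minimal sketch (not the library's exact code) that works through the shapes reported in the error above: the element-wise multiply requires both operands to agree on the target axis, which in that report means the logits' target dimension must be one larger than the label length.

import tensorflow as tf

batch, input_max_len, vocab = 4, 391, 28
label_max_len = 172  # labels shape in the report: (4, 172)

labels = tf.zeros([batch, label_max_len], dtype=tf.int32)
# One-hot labels tiled across the time axis: (4, 391, 172, 28),
# matching the one_hot shape in the error message.
one_hot_labels = tf.tile(
    tf.expand_dims(tf.one_hot(labels, depth=vocab), axis=1),
    [1, input_max_len, 1, 1])

# With logits of shape (4, 391, 172, 28) as in the model summary, the slice
# log_probs[:, :, :-1, :] is (4, 391, 171, 28) and the multiply against the
# (4, 391, 172, 28) one-hot raises exactly the error above. The shapes only
# line up when the target axis of the logits is label_max_len + 1:
log_probs = tf.zeros([batch, input_max_len, label_max_len + 1, vocab])
truth_probs = tf.reduce_sum(
    tf.multiply(log_probs[:, :, :-1, :], one_hot_labels), axis=-1)
print(truth_probs.shape)  # (4, 391, 172)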

What is the use of pred_grads?

Hi, I have studied the source code, but the matrix operations are used in such an unusual way that they really confuse me. Could you kindly point to a reference on the algorithm, in particular the diagonal-matrix formulation, so we can understand it? Now to the problem. In the example:
pred_loss, pred_grads = loss_grad_gradtape(logits, labels, label_lengths, logit_lengths)
Is pred_loss meant to be the TensorFlow model's loss? And what is the use of pred_grads?

When I check the source code, I find the loss
loss = -final_state_probs
with
final_state_probs = beta[:, 0, 0]

so the loss is computed only from backward_dp(), with no connection to forward_dp(). I therefore think pred_loss cannot simply be used as a TensorFlow model loss. What is the correct way to train in TensorFlow? Is the following correct?

logits = some_deep_network(...)
pred_loss, pred_grads = loss_grad_gradtape(logits, labels, label_lengths, logit_lengths)
rnnt_model = tf.keras.Model(inputs=[logits, labels, label_lengths, logit_lengths], outputs=pred_loss)
rnnt_model.compile(optimizer='adam', loss=lambda y_true, y_pred: y_pred)
rnnt_model.fit(...)
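For what it's worth, a common alternative to wiring the loss into a Keras Model is to call the loss inside a tf.GradientTape training step, which is roughly what run_model.py in the first issue appears to do. This is only a sketch under assumptions: rnnt_loss is the function seen in the tracebacks above, the import path and batch reduction may differ in your installed version, and model(features) stands in for whatever joint network produces the [batch, input_max_len, target_max_len, vocab_size] logits.

import tensorflow as tf
from rnnt.rnnt import rnnt_loss  # assumption: import path as in the tracebacks above

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(model, features, labels, label_lengths, logit_lengths):
    with tf.GradientTape() as tape:
        # logits: [batch, input_max_len, target_max_len, vocab_size]
        logits = model(features, training=True)
        per_example_loss = rnnt_loss(logits, labels, label_lengths, logit_lengths)
        loss = tf.reduce_mean(per_example_loss)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

If the loss registers its own custom gradient (the custom_gradient frames in the tracebacks suggest it does), the tape handles backpropagation and you would not call pred_grads yourself; that is my assumption, not something verified against the repository.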

Invalid argument: indices[0,0,2] = [0, 0, 2, -1] does not index into shape [10,79,18,6484]

logits shape: (10, 79, 18, 6485)
labels shape: (10, 17)
labels_length: tf.Tensor([ 2 14 9 9 9 13 17 9 9 17], shape=(10,), dtype=int64)
logit_length: tf.Tensor([20 47 36 35 41 58 64 38 45 78], shape=(10,), dtype=int64)

I got this error:
2020-12-11 10:48:09.516460: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at scatter_nd_op.cc:133 : Invalid argument: indices[0,0,2] = [0, 0, 2, -1] does not index into shape [10,79,18,6484]
Traceback (most recent call last):
  File "/home/dapeng/PycharmProjects/convTT/train.py", line 68, in <module>
    train(model, train_set, optimizer, train_loss, epoch)
  File "/home/dapeng/PycharmProjects/convTT/train.py", line 33, in train
    label_length=labels_length)
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/rnnt/rnnt.py", line 204, in rnnt_loss
    return compute_rnnt_loss_and_grad(*args)
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/custom_gradient.py", line 264, in __call__
    return self._d(self._f, a, k)
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/custom_gradient.py", line 218, in decorated
    return _eager_mode_decorator(wrapped, args, kwargs)
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/custom_gradient.py", line 412, in _eager_mode_decorator
    result, grad_fn = f(*args, **kwargs)
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/rnnt/rnnt.py", line 195, in compute_rnnt_loss_and_grad
    result = compute_rnnt_loss_and_grad_helper(**kwargs)
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/rnnt/rnnt.py", line 168, in compute_rnnt_loss_and_grad_helper
    [batch_size, input_max_len, target_max_len, vocab_size - 1])
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8842, in scatter_nd
    indices, updates, shape, name=name, ctx=_ctx)
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8885, in scatter_nd_eager_fallback
    attrs=_attrs, ctx=ctx, name=name)
  File "/home/dapeng/anaconda3/envs/tf/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0,2] = [0, 0, 2, -1] does not index into shape [10,79,18,6484] [Op:ScatterNd]
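One possible reading of the numbers, offered only as a guess: the scatter target's last dimension is vocab_size - 1 (6484), which suggests non-blank labels are shifted down by one before being used as indices, and batch 0 has label_length 2, so label position 2 is padding. If padded positions hold the blank index 0 (or -1 directly), the shifted index becomes -1, which is exactly the invalid coordinate reported. A minimal illustration of that arithmetic, with made-up label values:

import numpy as np

labels = np.zeros((10, 17), dtype=np.int64)  # padded with 0 for illustration
labels[0, :2] = [5, 9]                       # label_length[0] == 2 in the report
shifted = labels - 1                         # hypothetical shift into the vocab_size - 1 axis
print(shifted[0, 2])                         # -1, the out-of-range index in the error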

Support for TensorFlow >= 2.3

Hello, I like your work. I wonder if you could add support for newer versions of TensorFlow and tf-nightly?
