maghoumi / pytorch-softdtw-cuda

Fast CUDA implementation of (differentiable) soft dynamic time warping for PyTorch

License: MIT License

Python 100.00%

dynamic-time-warping pytorch cuda deep-learning soft-dtw

pytorch-softdtw-cuda's Introduction

Hi there 👋

My name is Mehran Maghoumi and I'm a senior deep learning engineer at NVIDIA. My primary area of work is parking space perception using surround camera setups for autonomous driving. I also hold a Ph.D. degree in computer science from the University of Central Florida. Feel free to check out my full profile on my homepage.

What's all this? 🤔

Below is the list of my open source projects that ✨ I'm the most proud of ✨. I've worked on these either during my spare time or as a part of my Ph.D. dissertation. Countless hours of my time have gone into the development of each one, and nothing makes me happier than seeing people use them in their projects.

If you see something you like, please consider ⭐ starring ⭐ the repo. It gives me a better idea of where to focus my efforts!

Happy browsing! 💥

pytorch-softdtw-cuda's Issues

package in pypi or conda

Hello,

thanks for the repository, it is very useful to have a soft-dtw algorithm available for GPU computing.

I am wondering whether you are interested in pushing this to pypi and/or conda for ease of use.
This might require some restructuring of the code as a package.

Value for bandwidth pruning?

Should it be in the range 0-1 or 0-max_len?

GPU memory is still affected when using CPU Soft DTW

Here is my testing with dim_feature = 80

The space complexity seems to be O(Frame time^2)

I just wonder why GPU memory is still affected when using the CPU Soft DTW implementation.

NvvmSupportError: libNVVM cannot be found.

Hi guys, this work is very interesting and helpful for my project. However, when I try to run your sample code, I encounter the following error:

FileNotFoundError                         Traceback (most recent call last)
File d:\Miniconda3\envs\usad\lib\site-packages\numba\cuda\cudadrv\nvvm.py:126, in NVVM.__new__(cls)
    125 try:
--> 126     inst.driver = open_cudalib('nvvm')
    127 except OSError as e:

File d:\Miniconda3\envs\usad\lib\site-packages\numba\cuda\cudadrv\libs.py:60, in open_cudalib(lib)
     59 path = get_cudalib(lib)
---> 60 return ctypes.CDLL(path)

File d:\Miniconda3\envs\usad\lib\ctypes\__init__.py:374, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    373 if handle is None:
--> 374     self._handle = _dlopen(self._name, mode)
    375 else:

FileNotFoundError: Could not find module 'nvvm.dll' (or one of its dependencies). Try using the full path with constructor syntax.

During handling of the above exception, another exception occurred:

NvvmSupportError                          Traceback (most recent call last)
Cell In [9], line 12
      9 x = x.cuda()
     10 y = y.cuda()
---> 12 loss_soft_dtw(x, y)
...
    133 # Find & populate functions
    134 for name, proto in inst._PROTOTYPES.items():

NvvmSupportError: libNVVM cannot be found. Do `conda install cudatoolkit`:
Could not find module 'nvvm.dll' (or one of its dependencies). Try using the full path with constructor syntax.

I'm pretty sure I have installed the numba library, how can I fix this?
Thanks!

Batch of variable sequence length

Hi, Thanks for the implementation!
I was wondering if there is a way to handle batch with sequences of variable size?
Suppose I have length1 and length2 variables that contain all the sequence lengths.

I guess for the forward function we just have to change the R[:, -2, -2] to something like R[:, -length1-1, -length2-1], but I'm not sure about what to do with the backward function.

Do you know if there is any mathematical document with the detailed computation for backward function?

AssertionError when dims increase

Distance function and comparison with no warping

Hi,
I am trying to use an L1 distance instead of L2.
Currently I have only changed line 321 in soft_dtw_cuda.py:

    def _calc_distance_matrix(self, x, y):
        """
        Calculates the L1 distance between each element in x and y per timestep
        """
        n = x.size(1)
        m = y.size(1)
        d = x.size(2)
        x = x.unsqueeze(2).expand(-1, n, m, d)
        y = y.unsqueeze(1).expand(-1, n, m, d)
        # Original (squared Euclidean): return torch.pow(x - y, 2).sum(3)
        return torch.abs(x - y).sum(3)

is this correct or is there more to be changed?

Also, I'm comparing against the non-warped loss (i.e. mean(abs(X-Y))), and the DTW loss is much higher.
I would expect DTW to be roughly less than or equal to the normal L1 (averaged over the timesteps), or at least comparable, but I get values that are 2 or even 3 orders of magnitude higher. What am I missing?
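One likely source of the gap is scale rather than warping: the distance function above sums over the feature dimension (`sum(3)`), and DTW then sums those costs along the alignment path, whereas `mean(abs(X - Y))` averages over every element. The sketch below is plain Python, not the repository's code; `T`, `d` and the random data are made up for illustration. It compares the two quantities on the straight, no-warping alignment.

```python
import random

random.seed(0)
T, d = 100, 8  # hypothetical sequence length and feature size
x = [[random.random() for _ in range(d)] for _ in range(T)]
y = [[random.random() for _ in range(d)] for _ in range(T)]


def l1(a, b):
    """Summed absolute difference between two feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


# Cost of the no-warping (diagonal) path: a sum over T steps and d features.
diagonal_cost = sum(l1(x[t], y[t]) for t in range(T))

# The plain L1 loss averages the same terms over T * d elements.
mean_abs = diagonal_cost / (T * d)

# The two quantities differ by a factor of T * d.
print(diagonal_cost / mean_abs)
```

For equal-length sequences, hard DTW can only lower the diagonal cost, so dividing the DTW loss by `T * d` should give a number that is comparable to the mean L1 loss.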

output the alignments of two sequences

Hello, thanks for the beautiful implementation of DTW!

I'm facing an issue when using this code.
First, I want to output the alignments of two sequences.
Therefore, I changed the output of the forward function in class _SoftDTWCUDA.
The new output is R instead of R[:, -2, -2].
When running loss.backward(), this error occurs: TypeError: backward() takes 2 positional arguments but 3 were given.

I would greatly appreciate it if you could assist me in understanding the cause of this issue and providing guidance on resolving it. Any insights you can offer would be invaluable.

Thank you in advance for your help!
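The error message is what PyTorch reports when `forward` of a custom `torch.autograd.Function` returns more outputs than `backward` has gradient parameters, since `backward` is called with one gradient per forward output. The toy function below is a sketch of that rule, not the repository's `_SoftDTWCUDA`: `SquareSumWithAux` and its outputs are hypothetical names, and the auxiliary output stands in for an extra tensor such as `R`.

```python
import torch


class SquareSumWithAux(torch.autograd.Function):
    """Toy function (hypothetical) that returns a loss plus an auxiliary tensor."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        total = (x ** 2).sum()
        aux = x.detach().cumsum(0)  # extra output that should not carry gradients
        ctx.mark_non_differentiable(aux)
        return total, aux

    @staticmethod
    def backward(ctx, grad_total, grad_aux):
        # One gradient argument per forward output, one return value per forward input.
        (x,) = ctx.saved_tensors
        return grad_total * 2 * x


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
total, aux = SquareSumWithAux.apply(x)
total.backward()
print(x.grad, aux.requires_grad)
```

Applied to this issue, that would mean giving `backward` one extra gradient parameter for the added output while still returning gradients only for the inputs of `forward`. This is a suggestion under the assumption that `forward` was changed to return two values.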

can DTW be negative?

Can DTW return a negative difference between the two sequences? (dist_func was not given; in that case, as far as I know, the Euclidean distance is used.)

In this image, those two sequences are compared by DTW cuda (dist_func was not given.)

...self.sdtw = SoftDTW(use_cuda=True, gamma=0.1)

diffs = self.sdtw(y_pred, y)
"""y_pred shape = (1000, 100, 1), y shape = (1000, 100, 1)"""

if diffs[0] < 0:  # current diffs[0] = -2.0720e+00
    plt.figure(figsize=(10, 5))
    plt.plot(y_pred[0, :, [0]].cpu().detach().numpy(), label="y_pred")
    plt.plot(y[0, :, [0]].cpu().detach().numpy(), label="y")
    plt.legend()
    plt.show()  # => above image
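A negative value is expected behaviour for soft-DTW in general, because the soft-min that replaces the hard minimum is never larger than it. The standard definition is written out below; the notation (`R` for the accumulated cost, `D` for the pairwise cost matrix) is chosen here for illustration and is not taken from this repository.

```latex
% Soft-min with smoothing parameter \gamma > 0 and its bounds
\operatorname{min}^{\gamma}(a_1,\dots,a_k) = -\gamma \log \sum_{i=1}^{k} e^{-a_i/\gamma},
\qquad
\min_i a_i - \gamma \log k \;\le\; \operatorname{min}^{\gamma}(a_1,\dots,a_k) \;\le\; \min_i a_i .

% Soft-DTW recursion over the pairwise cost matrix D
R_{i,j} = D_{i,j} + \operatorname{min}^{\gamma}\bigl(R_{i-1,j-1},\, R_{i-1,j},\, R_{i,j-1}\bigr)
```

Because every step takes a value at or below the hard minimum, the final value sits at or below the hard DTW cost. When y_pred and y nearly coincide, the hard DTW cost is close to 0 and the soft value can dip below it, so a small negative number such as the one reported here is compatible with a correct computation.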

The loss value of softDTW is inf when choosing bandwidth values

Hello @Maghoumi,

I have tried the example code you provided, as follows, and I noticed that when choosing bandwidth = 1 or 2, the loss values are all inf. Could you please help me solve this issue?

Sample code:

import torch
from soft_dtw_cuda import SoftDTW
batch_size, len_x, len_y, dims = 8, 15, 12, 5
x = torch.rand((batch_size, len_x, dims), requires_grad=True)
y = torch.rand((batch_size, len_y, dims))

sdtw = SoftDTW(use_cuda=False, gamma=0.1, bandwidth=2)
loss = sdtw(x, y)
loss
--- OUTPUT:
tensor([inf, inf, inf, inf, inf, inf, inf, inf], grad_fn=<_SoftDTWBackward>)

sdtw = SoftDTW(use_cuda=False, gamma=0.1, bandwidth=1)
loss = sdtw(x, y)
loss
--- OUTPUT:
tensor([inf, inf, inf, inf, inf, inf, inf, inf], grad_fn=<_SoftDTWBackward>)

Thank you very much.

Regards,
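A possible explanation, sketched in plain Python: if pruning skips every cell with |i - j| > bandwidth (a Sakoe-Chiba band measured in time steps), then the end cell of a 15 x 12 alignment is 3 steps off the diagonal and is never filled for bandwidth 1 or 2. That pruning rule is an assumption here and has not been checked against the repository's kernel; `soft_min` and `soft_dtw` are reference helpers written for this note.

```python
import math
import random


def soft_min(values, gamma):
    """Numerically stable -gamma * log(sum(exp(-v / gamma))); inf if nothing is reachable."""
    finite = [v for v in values if math.isfinite(v)]
    if not finite:
        return math.inf
    lo = min(finite)
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in finite))


def soft_dtw(x, y, gamma=0.1, bandwidth=0):
    """Reference soft-DTW for 1-D sequences with an assumed |i - j| band (0 disables it)."""
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if 0 < bandwidth < abs(i - j):
                continue  # pruned cell keeps its initial value of inf (assumed rule)
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + soft_min((R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]), gamma)
    return R[n][m]


random.seed(0)
x = [random.random() for _ in range(15)]  # len_x = 15, as in the sample code
y = [random.random() for _ in range(12)]  # len_y = 12
for bw in (0, 1, 2, 3):
    print(bw, soft_dtw(x, y, bandwidth=bw))
```

Under this assumption the bandwidth would need to be at least the length difference of the two sequences for the loss to be finite.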

Does .backward() contain grads for both sequences?

Hi, first of all: Awesome repo! :) Thanks.
To the point: I would like to train an audio-phoneme aligner. I have two encoders.

Spectrogram encoder - returns as many embeddings as there are feature frames
Phoneme encoder - returns as many embeddings as there are phonemes

I want to align these two sequences with DTW and optimize both the SpecEncoder and PhonemeEncoder.
Is this possible out of the box? I looked at the code, but I am not sure whether the loss returns gradients w.r.t. both input sequences.

Update: I just checked with a dummy code if the backward can be used to modify both sequences and it seems that both input sequences are changing:

import torch
from soft_dtw_cuda import SoftDTW

criterion = SoftDTW(use_cuda=False, gamma=0.1)
x = torch.randn(1, 5, 80).permute(0,2,1)
y = torch.randn(1, 5, 120).permute(0,2,1)
x.requires_grad = True
y.requires_grad = True

opt = torch.optim.Adam([x, y], lr=0.001)

for i in range(10):

    opt.zero_grad()
    loss = criterion(x, y)
    loss.backward()
    opt.step()
    print(y[0, 0])

The question remains. Is the gradient calculated correctly for both input sequences? Also do you think that this loss could be used to solve forced alignment tasks such as the one I described above? I will update you with how it is going later this week :)

soft-DTW loss became inf when setting bandwidth bigger than 0

[2021-05-10 17:26:05,123][torchtts.hooks.logging][INFO] - epoch = 1, step = 763, soft_dtw_mel_loss = inf, total_dur_loss = 37.49, loss = inf, lr = 2.721e-06
pred_mel.shape= torch.Size([6, 386, 80])
batch[mel].shape= torch.Size([6, 302, 80])
[2021-05-10 17:26:05,962][torchtts.hooks.logging][INFO] - epoch = 1, step = 764, soft_dtw_mel_loss = inf, total_dur_loss = 45.59, loss = inf, lr = 2.724e-06
pred_mel.shape= torch.Size([24, 113, 80])
batch[mel].shape= torch.Size([24, 78, 80])
[2021-05-10 17:26:06,781][torchtts.hooks.logging][INFO] - epoch = 1, step = 765, soft_dtw_mel_loss = 134.2, total_dur_loss = 14.37, loss = 148.6, lr = 2.728e-06
pred_mel.shape= torch.Size([6, 288, 80])
batch[mel].shape= torch.Size([6, 257, 80])
[2021-05-10 17:26:07,412][torchtts.hooks.logging][INFO] - epoch = 1, step = 766, soft_dtw_mel_loss = 305.9, total_dur_loss = 12.17, loss = 318.1, lr = 2.731e-06
pred_mel.shape= torch.Size([4, 526, 80])
batch[mel].shape= torch.Size([4, 499, 80])
[2021-05-10 17:26:08,161][torchtts.hooks.logging][INFO] - epoch = 1, step = 767, soft_dtw_mel_loss = inf, total_dur_loss = 41.76, loss = inf, lr = 2.735e-06
pred_mel.shape= torch.Size([4, 532, 80])
batch[mel].shape= torch.Size([4, 495, 80])
[2021-05-10 17:26:08,953][torchtts.hooks.logging][INFO] - epoch = 1, step = 768, soft_dtw_mel_loss = 629.2, total_dur_loss = 38.73, loss = 668, lr = 2.739e-06
pred_mel.shape= torch.Size([12, 156, 80])
batch[mel].shape= torch.Size([12, 135, 80])
[2021-05-10 17:26:09,657][torchtts.hooks.logging][INFO] - epoch = 1, step = 769, soft_dtw_mel_loss = 195.1, total_dur_loss = 18.41, loss = 213.6, lr = 2.742e-06
pred_mel.shape= torch.Size([6, 343, 80])
batch[mel].shape= torch.Size([6, 311, 80])
[2021-05-10 17:26:10,349][torchtts.hooks.logging][INFO] - epoch = 1, step = 770, soft_dtw_mel_loss = inf, total_dur_loss = 36.64, loss = inf, lr = 2.746e-06
pred_mel.shape= torch.Size([6, 344, 80])
batch[mel].shape= torch.Size([6, 279, 80])
[2021-05-10 17:26:11,073][torchtts.hooks.logging][INFO] - epoch = 1, step = 771, soft_dtw_mel_loss = inf, total_dur_loss = 30.31, loss = inf, lr = 2.749e-06
pred_mel.shape= torch.Size([9, 264, 80])
batch[mel].shape= torch.Size([9, 214, 80])
[2021-05-10 17:26:11,904][torchtts.hooks.logging][INFO] - epoch = 1, step = 772, soft_dtw_mel_loss = inf, total_dur_loss = 20.1, loss = inf, lr = 2.753e-06
pred_mel.shape= torch.Size([6, 316, 80])
batch[mel].shape= torch.Size([6, 283, 80])
[2021-05-10 17:26:12,653][torchtts.hooks.logging][INFO] - epoch = 1, step = 773, soft_dtw_mel_loss = 373.4, total_dur_loss = 27.09, loss = 400.5, lr = 2.756e-06
pred_mel.shape= torch.Size([4, 428, 80])
batch[mel].shape= torch.Size([4, 399, 80])
[2021-05-10 17:26:13,337][torchtts.hooks.logging][INFO] - epoch = 1, step = 774, soft_dtw_mel_loss = 500.6, total_dur_loss = 20.63, loss = 521.2, lr = 2.760e-06
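One reading of this log, offered as a hypothesis only: if each pair of shape lines belongs to the log line printed just before it, then the inf steps are exactly the ones where the two sequence lengths differ the most. The snippet below tabulates the numbers from the log under that pairing assumption; the issue does not state which bandwidth value was used.

```python
# (loss_is_inf, pred_len, target_len), read from the log above under the
# assumption that each shape pair belongs to the log line printed just before it.
steps = [
    (True, 386, 302),
    (True, 113, 78),
    (False, 288, 257),
    (False, 526, 499),
    (True, 532, 495),
    (False, 156, 135),
    (False, 343, 311),
    (True, 344, 279),
    (True, 264, 214),
    (True, 316, 283),
    (False, 428, 399),
]

inf_gaps = sorted(abs(a - b) for is_inf, a, b in steps if is_inf)
finite_gaps = sorted(abs(a - b) for is_inf, a, b in steps if not is_inf)

print("length gaps with inf loss:   ", inf_gaps)
print("length gaps with finite loss:", finite_gaps)
```

Every finite step has a length gap of at most 32 and every inf step a gap of at least 33, which would be consistent with a band of width 32 around the diagonal that excludes the end cell of the alignment.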

Can the loss drop below 0?

Epoch 56 train DTW: -2.6562428356693495
Hi, I found that my loss can be negative. Is that possible?

Does it support time-series classification tasks?

@Maghoumi How are you?
Thank you for sharing your code.
In this paper, it is shown that Soft-DTW can be used for classification tasks. Is your code able to do that?

Example fails on GPU

When running the example from the README:

from soft_dtw_cuda import SoftDTW
import torch
# Create the sequences
batch_size, len_x, len_y, dims = 8, 15, 12, 5
x = torch.rand((batch_size, len_x, dims), requires_grad=True)
y = torch.rand((batch_size, len_y, dims))
# Create the "criterion" object
sdtw = SoftDTW(use_cuda=True, gamma=0.1)
# Compute the loss value
loss = sdtw(x, y)  # Just like any torch.nn.xyzLoss()
# Aggregate and call backward()
loss.mean().backward()

I get a TypeError:

Traceback (most recent call last):
  File "test_dtw.py", line 10, in <module>
    loss = sdtw(x, y)  # Just like any torch.nn.xyzLoss()
  File "/home/dockeruser/env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dockeruser/soft_dtw_cuda.py", line 352, in forward
    return func_dtw(D_xy, self.gamma, self.bandwidth)
  File "/home/dockeruser/soft_dtw_cuda.py", line 140, in forward
    compute_softdtw_cuda[B, threads_per_block](cuda.as_cuda_array(D.detach()),
  File "/home/dockeruser/env/lib/python3.7/site-packages/numba/cuda/api.py", line 74, in as_cuda_array
    raise TypeError("*obj* doesn't implement the cuda array interface.")
TypeError: *obj* doesn't implement the cuda array interface.

Potentially relevant packages from my env:
Cython 0.29.24
numba 0.54.0
torch 1.6.0

Any suggestions?

BTW, thanks a lot for porting to CUDA!
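One likely cause: in the snippet above, `x` and `y` are created on the CPU while `use_cuda=True`, and CPU tensors do not expose `__cuda_array_interface__`, which is what Numba's `as_cuda_array` looks for. The helper below is a hypothetical guard (`check_on_gpu` is not part of the repository) that turns this into a readable error before the kernel is launched.

```python
import torch


def check_on_gpu(**tensors):
    """Raise a readable error if any named tensor is not on a CUDA device."""
    for name, tensor in tensors.items():
        if not tensor.is_cuda:
            raise ValueError(
                f"{name} is on {tensor.device}; move it to the GPU "
                f"(for example {name}.cuda()) before using use_cuda=True"
            )


if torch.cuda.is_available():
    # Create the inputs directly on the GPU, then verify before calling the loss.
    x = torch.rand((8, 15, 5), device="cuda", requires_grad=True)
    y = torch.rand((8, 12, 5), device="cuda")
    check_on_gpu(x=x, y=y)
```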

Negative value

When the input and labels have only one feature and the values of both are between 0 and 1, the loss takes negative values.
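With values in [0, 1] the pairwise costs are small, so the negative offset introduced by the soft-min can dominate. The sketch below uses a plain-Python reference (`soft_dtw` and `soft_dtw_divergence` are helpers written for this note, not the repository's implementation) to show the raw value going negative for identical inputs, alongside the normalized form `out_xy - 1/2 * (out_xx + out_yy)` that another issue on this page quotes from the `normalize` branch.

```python
import math


def soft_dtw(x, y, gamma):
    """Plain-Python soft-DTW with squared-difference cost for 1-D sequences."""
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = (R[i - 1][j - 1], R[i - 1][j], R[i][j - 1])
            lo = min(prev)  # at least one predecessor is always finite here
            soft_min = lo - gamma * math.log(sum(math.exp(-(p - lo) / gamma) for p in prev))
            R[i][j] = (x[i - 1] - y[j - 1]) ** 2 + soft_min
    return R[n][m]


def soft_dtw_divergence(x, y, gamma):
    """Normalized form: sdtw(x, y) - 0.5 * (sdtw(x, x) + sdtw(y, y))."""
    return soft_dtw(x, y, gamma) - 0.5 * (soft_dtw(x, x, gamma) + soft_dtw(y, y, gamma))


x = [0.1, 0.4, 0.8, 0.5, 0.2]  # made-up values in [0, 1]
y = [0.2, 0.3, 0.9, 0.6, 0.1]

raw_same = soft_dtw(x, x, gamma=0.1)             # negative even for identical inputs
div_same = soft_dtw_divergence(x, x, gamma=0.1)  # zero by construction
print(raw_same, div_same, soft_dtw_divergence(x, y, gamma=0.1))
```

If a loss that is zero for identical sequences is needed, the normalized form is one option to try.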

[Feature request] Add backtracking/compatibility with other tools

There seems to be no out-of-the-box backtracking option to extract.
Are the results expected to be compatible with non-differentiable DTW toolkits, such as dtw, that provide the backtracking functionality? I tried using dtw-python for alignment extraction and it seems to work pretty well. :)
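For reference, backtracking on hard DTW can be sketched in a few lines of plain Python. This is an illustration of the general technique, not something the repository provides: `dtw_path` is a hypothetical helper, it handles 1-D sequences with a squared-difference cost, and it follows the hard minimum rather than the soft one.

```python
import math


def dtw_path(x, y):
    """Hard DTW on 1-D sequences; returns (total_cost, list of (i, j) index pairs)."""
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + min(R[i - 1][j - 1], R[i - 1][j], R[i][j - 1])

    # Walk back from the end cell, always moving to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (R[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (R[i - 1][j], i - 1, j),          # step in x only
            (R[i][j - 1], i, j - 1),          # step in y only
        ]
        _, i, j = min(candidates)
    return R[n][m], path[::-1]


cost, path = dtw_path([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 2.0])
print(cost, path)
```

The same walk could be run on a cost matrix computed from trained embeddings, with the caveat that the soft-DTW value itself aggregates over all paths rather than one.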

Grid size (4) < 2 * SM count (164) will likely result in GPU under utilization due to low occupancy.

Thank you for your great work! This is very useful and works like a charm for my use case. Here is one quick question:

/net/papilio/storage2/bowenz/anaconda3/envs/zbw/lib/python3.9/site-packages/numba/cuda/compiler.py:726: NumbaPerformanceWarning: Grid size (4) < 2 * SM count (164) will likely result in GPU under utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
/net/papilio/storage2/bowenz/anaconda3/envs/zbw/lib/python3.9/site-packages/numba/cuda/compiler.py:726: NumbaPerformanceWarning: Grid size (4) < 2 * SM count (164) will likely result in GPU under utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))
/net/papilio/storage2/bowenz/anaconda3/envs/zbw/lib/python3.9/site-packages/numba/cuda/compiler.py:726: NumbaPerformanceWarning: Grid size (4) < 2 * SM count (164) will likely result in GPU under utilization due to low occupancy.
  warn(NumbaPerformanceWarning(msg))

At the very beginning of the run, I got this warning three times, but the training seems to be proceeding successfully. What does this imply? Can I simply ignore this warning?
Thank you again!

Gradients are gone when moving the code to CUDA

Hi,
First of all, thank you so much for this amazing implementation! I am trying to use your code (the example code), but I run into a problem when I move everything to CUDA.

device = torch.device("cuda")
# Create the sequences
batch_size, len_x, len_y, dims = 8, 15, 12, 5
x = torch.rand((batch_size, len_x, dims), requires_grad=True)
y = torch.rand((batch_size, len_y, dims))
# Transfer tensors to the GPU
x = x.to(device)
y = y.to(device)

# Create the "criterion" object
sdtw = SoftDTW(use_cuda=True, gamma=0.1)

# Compute the loss value
loss = sdtw(x, y)  # Just like any torch.nn.xyzLoss()

# Aggregate and call backward()
loss.mean().backward()

If I print x.grad, the result is empty and I get the following warning message:

/usr/local/lib/python3.7/dist-packages/torch/_tensor.py:1083: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at  aten/src/ATen/core/TensorBody.h:477.)
  return self._grad

I'm running the code using Google Colab. Any idea why this is happening? Again thank you so much!
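The warning points at the cause: when `x.to(device)` copies the tensor to another device, the result is a new non-leaf tensor, and the gradient is accumulated on the original CPU tensor instead. A minimal sketch of one way around it is to create the tensor on the target device from the start; the squared-error loss here is only a stand-in so the snippet runs without the repository.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the leaf tensor directly on the target device so that .grad is populated on it.
batch_size, len_x, dims = 8, 15, 5
x = torch.rand((batch_size, len_x, dims), device=device, requires_grad=True)
y = torch.rand((batch_size, len_x, dims), device=device)

# Stand-in loss; the same reasoning applies to sdtw(x, y).mean().
loss = ((x - y) ** 2).sum()
loss.backward()

print(x.is_leaf, x.grad.shape)
```

An alternative with the same effect is to move first and enable gradients afterwards, for example `torch.rand(...).to(device).requires_grad_()`.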

sympy error

Hi Maghoumi!
I am reporting this issue, though I am not sure whether it is due to the PyTorch CUDA version.
I installed PyTorch via pip because I could not get conda (version 23+) to find pytorch-cuda from the nvidia channel.

pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

When I import the library, I get:

AttributeError                            Traceback (most recent call last)
Cell In[19], line 4
      1 import sys
      2 sys.path.insert(0, "../../pytorch-softdtw-cuda")
----> 4 from soft_dtw_cuda import SoftDTW
      5 import torch

File /data0/home/h21/luas6629/Thesis/classifiers_bat_231/../../pytorch-softdtw-cuda/soft_dtw_cuda.py:25
      1 # MIT License
      2 #
      3 # Copyright (c) 2020 Mehran Maghoumi
   (...)
     21 # SOFTWARE.
     22 # ----------------------------------------------------------------------------------------------------------------------
     24 import numpy as np
---> 25 import torch
     26 import torch.cuda
     27 from numba import jit, prange

File ~/miniconda3/lib/python3.10/site-packages/torch/__init__.py:1465
   1463 from . import library
   1464 if not TYPE_CHECKING:
-> 1465     from . import _meta_registrations
   1467 # Enable CUDA Sanitizer
   1468 if 'TORCH_CUDA_SANITIZER' in os.environ:

File ~/miniconda3/lib/python3.10/site-packages/torch/_meta_registrations.py:7
      5 import torch._prims_common as utils
      6 from torch import Tensor
----> 7 from torch._decomp import _add_op_to_registry, global_decomposition_table, meta_table
      8 from torch._ops import OpOverload
      9 from torch._prims import _elementwise_meta, ELEMENTWISE_PRIM_TYPE_PROMOTION_KIND

File ~/miniconda3/lib/python3.10/site-packages/torch/_decomp/__init__.py:169
    165     return decompositions
    168 # populate the table
--> 169 import torch._decomp.decompositions
    170 import torch._refs
    172 # This list was copied from torch/_inductor/decomposition.py
    173 # excluding decompositions that results in prim ops
    174 # Resulting opset of decomposition is core aten ops

File ~/miniconda3/lib/python3.10/site-packages/torch/_decomp/decompositions.py:10
      7 from typing import Callable, cast, Iterable, List, Optional, Tuple, Union
      9 import torch
---> 10 import torch._prims as prims
     11 import torch._prims_common as utils
     12 import torch.nn.functional as F

File ~/miniconda3/lib/python3.10/site-packages/torch/_prims/__init__.py:33
     17 from torch._prims_common import (
     18     check,
     19     Dim,
   (...)
     30     type_to_dtype,
     31 )
     32 from torch._prims_common.wrappers import backwards_not_supported
---> 33 from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode
     34 from torch.overrides import handle_torch_function, has_torch_function
     35 from torch.utils._pytree import tree_flatten, tree_map, tree_unflatten

File ~/miniconda3/lib/python3.10/site-packages/torch/_subclasses/__init__.py:3
      1 import torch
----> 3 from torch._subclasses.fake_tensor import (
      4     DynamicOutputShapeException,
      5     FakeTensor,
      6     FakeTensorMode,
      7     UnsupportedFakeTensorException,
      8 )
     10 from torch._subclasses.fake_utils import CrossRefFakeMode
     12 __all__ = [
     13     "FakeTensor",
     14     "FakeTensorMode",
   (...)
     17     "CrossRefFakeMode",
     18 ]

File ~/miniconda3/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py:13
     10 from weakref import ReferenceType
     12 import torch
---> 13 from torch._guards import Source
     14 from torch._ops import OpOverload
     15 from torch._prims_common import (
     16     elementwise_dtypes,
     17     ELEMENTWISE_TYPE_PROMOTION_KIND,
     18     is_float_dtype,
     19     is_integer_dtype,
     20 )

File ~/miniconda3/lib/python3.10/site-packages/torch/_guards.py:78
     74 class GuardBuilderBase:
     75     pass
---> 78 class ShapeGuard(NamedTuple):
     79     expr: sympy.Expr
     80     stack: str

File ~/miniconda3/lib/python3.10/site-packages/torch/_guards.py:79, in ShapeGuard()
     78 class ShapeGuard(NamedTuple):
---> 79     expr: sympy.Expr
     80     stack: str

AttributeError: module 'sympy' has no attribute 'Expr'

Regarding GPU memory footprint

Hi @Maghoumi,

Thanks a lot for sharing your work!

I tried using your implementation to train a model that matches two sequences, and found that it consumes a lot of GPU memory, more than twice that of an optimal transport (OT) loss. With the exact same training configuration (batch size, etc.), OT could fit into 16GB of memory, while DTW could not fit into a 32GB GPU. Do you think that this is expected?

Could you please tell me if it's possible to somehow reduce the memory footprint of DTW? I'm not sure if I should play with the bandwidth argument...

Thank you very much in advance for your reply!
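A rough way to see where the memory may go: the `_calc_distance_matrix` helper quoted in an earlier issue on this page expands both inputs to shape (batch, n, m, d) before summing over d, so the intermediate difference tensor grows with the product of both sequence lengths and the feature size. The estimate below is back-of-the-envelope arithmetic under stated assumptions (float32 elements, one such tensor counted; autograd may keep more than one alive), and `distance_tensor_bytes` is a hypothetical helper.

```python
def distance_tensor_bytes(batch, n, m, dims, bytes_per_element=4):
    """Size of the (batch, n, m, dims) difference tensor formed before the feature sum."""
    return batch * n * m * dims * bytes_per_element


# Shapes taken from one of the training logs quoted earlier on this page.
size = distance_tensor_bytes(batch=4, n=532, m=495, dims=80)
print(f"{size / 2**20:.1f} MiB for a single intermediate tensor")
```

If this intermediate is the bottleneck, a distance function that avoids materializing the 4-D tensor would be one thing to try.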

Steps chosen / choice penalty

Hi,
thanks for the implementation!
I'm still unfamiliar with it. Would it be possible to extract or store the steps (or choices) taken during the optimization, as is doable with normal DTW implementations?
Or, equivalently, adding penalty terms for the different choices?

Incorrect batch size for Euclidean distance when using normalization

Dear author,

Using your SoftDTW implementation in normalization mode (= SoftDTW divergence) throws an exception due to an incorrect batch shape.

 File ".../softDTWLoss.py", line 109, in jacobean_product_squared_euclidean
    return 2 * (ones.matmul(Bt) * X - Y.matmul(Bt))
RuntimeError: The size of tensor a (128) must match the size of tensor b (384) at non-singleton dimension 0

I suspect there is a small mistake in the implementation:

if self.normalize:
    # Stack everything up and run
    x = torch.cat([X, X, Y])
    y = torch.cat([Y, X, Y])
    D = self.dist_func(x, y)
    out = func_dtw(X, Y, D, self.gamma, self.bandwidth)
    out_xy, out_xx, out_yy = torch.split(out, X.shape[0])
    return out_xy - 1 / 2 * (out_xx + out_yy)

I think line 275 needs to be changed to
out = func_dtw(x, y, D, self.gamma, self.bandwidth)

Can you check if this is correct?

Why do I get a negative loss value?

obj doesn't implement the cuda array interface. at cuda.as_cuda_array(D.detach())

Dear Maghoumi,
Firstly, I would like to thank you for your effort on this implementation.

I am trying to use your code, but I am getting an error: *obj* doesn't implement the cuda array interface.
It occurs at this code block:

# Run the CUDA kernel.
# Set CUDA's grid size to be equal to the batch size (every CUDA block processes one sample pair)
# Set the CUDA block size to be equal to the length of the longer sequence (equal to the size of the largest diagonal)
compute_softdtw_cuda[B, threads_per_block](cuda.as_cuda_array(D.detach()),
                                        gamma.item(), bandwidth.item(), N, M, n_passes,
                                        cuda.as_cuda_array(R))

Could you please help me find the cause?
My configuration:
GPU: GeForce RTX 2080 Ti
Cuda: Cuda compilation tools, release 10.0, V10.0.130
Cuda driver: version 452.06
torch: version 1.6.0

CUDA_ERROR_INVALID_VALUE

Hi guys,

I'm trying to run SoftDTW between two tensors with the following dimensions:

bs x num_points x data_dimensionality

relevant_centerlines = 1024 x 30 x 2
predictions = 1024 x 30 x 2

As follows:

error = sdtw(relevant_centerlines, predictions)

However, I'm getting the following error. What can I do?

[INFO:  246]: init
[ERROR:  385]: Call to cuPointerGetAttribute results in CUDA_ERROR_INVALID_VALUE
*** numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuPointerGetAttribute results in CUDA_ERROR_INVALID_VALUE

[Feature Request] Compatibility with Pytorch packed sequences

Can a newer version be created which is capable of handling PackedSequence? Working with time series requires batching different length sequences together, to which PackedSequence & these two are a great help.

Hence, compatibility with this data structure would be a great plus.

Does this implementation support distributed training in PyTorch?

I have tried to use DistributedDataParallel to run your script, which is included in my model, but there are some unexpected problems. So I would like to ask whether your script supports distributed training, as the original version (https://github.com/Sleepwalking/pytorch-softdtw) works well for distributed training.

Comparison with the Cython implementation

Thank you @Maghoumi, I am investigating your amazing source code, but have you compared this with the Cython implementation: https://github.com/mblondel/soft-dtw

I have used your softDTW in CUDA, but with batch size = 1 my GPU is nearly out of memory, so I cannot increase it further.

Here is my input size,

I wonder how you can use a batch size of up to 256.
