
Comments (5)

yongsiang-fb commented on May 14, 2024

Yes, you are absolutely right. It's the "current" device, not the "default" device. CUDAGuard stores the original current device and sets the current device to the specified device. When the CUDAGuard goes out of scope, it sets the current device back to the original one.
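
For reference, the same save/set/restore behaviour can be seen from Python, where torch.cuda.device is the rough analogue of the C++ CUDAGuard. This is just an illustrative sketch, not something from tiny-cuda-nn itself:

```python
import torch

# Rough Python analogue of c10::cuda::CUDAGuard: torch.cuda.device saves the
# current device on entry, switches to the requested device, and restores the
# previous device on exit.
print(torch.cuda.current_device())          # e.g. 0, the original current device

with torch.cuda.device(1):                  # "guard" in scope: current device is 1
    print(torch.cuda.current_device())      # 1
    x = torch.randn(8, 16, device="cuda")   # allocated on cuda:1

print(torch.cuda.current_device())          # back to 0 once the scope ends
```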

Tom94 commented on May 14, 2024

Hi there, yes, unfortunately tiny-cuda-nn does not support multi-GPU operation as of now. This is something that'd be cool to have in the future, but currently is not a high priority.

I'm going to leave this issue open to serve as a TODO marker.

Cheers!

yongsiang-fb commented on May 14, 2024

@Tom94 Thanks for the quick response! Hopefully it can be implemented one day. But even if multi-GPU is not supported for now, shouldn't it still be possible to support the single-GPU case where both the input and the network are on cuda:1? In particular, I think we probably just need a CUDAGuard to change the default GPU to match the one net.params is using?

I am guessing the error was caused by a mismatch between the default GPU and the GPU of the inputs/net.params. I even have some hope that th.nn.DataParallel would also work as long as there is a CUDAGuard, but maybe that's just wishful thinking. Would be great to hear your thoughts. Thanks!
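
For illustration, here is a minimal, untested sketch of what such a guard-based single-GPU-on-cuda:1 setup might look like from Python, using torch.cuda.device as the guard. The network_config values are placeholders, and whether tcnn actually picks up the current device this way is exactly the open question:

```python
import torch
import tinycudann as tcnn

device = torch.device("cuda:1")

# Keep the current device pinned to cuda:1 while the network is created and
# used, so that tcnn's kernels run on the same GPU as the inputs and net.params.
with torch.cuda.device(device):
    net = tcnn.Network(
        n_input_dims=3,
        n_output_dims=16,
        network_config={
            "otype": "FullyFusedMLP",       # placeholder config
            "activation": "ReLU",
            "output_activation": "None",
            "n_neurons": 64,
            "n_hidden_layers": 2,
        },
    )
    x = torch.rand(1024, 3, device=device)
    y = net(x)                              # expected to run on cuda:1
    print(y.device)
```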

Tom94 commented on May 14, 2024

tcnn uses whichever CUDA device is "current" on the CPU thread, i.e. the device returned by cudaGetDevice. This device needs to remain the same across all calls into tcnn.

If CUDAGuard controls this (I haven't tried it myself), then yes, it should work the way you describe. If not, please let me know and I can add a tcnn-specific version of CUDAGuard that'll do the trick.
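
One way to ensure that from the Python side (a hypothetical helper, not part of tcnn) is to align the thread's current device with the device holding the network's parameters before calling into tcnn:

```python
import torch

def ensure_current_device_matches(net):
    """Make the CPU thread's current CUDA device (what cudaGetDevice returns)
    match the device holding net's parameters before any call into tcnn."""
    param_device = next(net.parameters()).device   # e.g. device('cuda', 1)
    if torch.cuda.current_device() != param_device.index:
        torch.cuda.set_device(param_device)
```

Calling something like this once per thread, or scoping the calls with torch.cuda.device as above, should keep the device returned by cudaGetDevice constant across all calls into tcnn.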

MultiPath commented on May 14, 2024

Hi, do you have any plans to support multi-GPU training soon? Or do you have any hints about why the current code prevents using multiple GPUs with PyTorch distributed training? I think it would be super useful to be able to use tinycudann to train large-scale models with tiny MLPs. Thanks!
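
For context, the usual PyTorch distributed pattern is one process per GPU with the current device pinned per process, which in principle satisfies the constraint that cudaGetDevice stays constant across all tcnn calls within a process. Below is an untested sketch of that generic pattern with placeholder config values; whether tiny-cuda-nn's parameters and autograd actually cooperate with DDP is precisely what this issue asks:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
import tinycudann as tcnn

def main():
    # Standard one-process-per-GPU setup; launch with
    #   torchrun --nproc_per_node=<num_gpus> train.py
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)        # pin the current device for this process
    dist.init_process_group(backend="nccl")

    net = tcnn.Network(
        n_input_dims=3,
        n_output_dims=16,
        network_config={                     # placeholder config
            "otype": "FullyFusedMLP",
            "activation": "ReLU",
            "output_activation": "None",
            "n_neurons": 64,
            "n_hidden_layers": 2,
        },
    )
    ddp_net = DDP(net, device_ids=[local_rank])

    x = torch.rand(1024, 3, device=f"cuda:{local_rank}")
    loss = ddp_net(x).float().sum()
    loss.backward()

if __name__ == "__main__":
    main()
```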
