Comments (9)
Have you tried DataParallelTable at all (with NCCL)? With the default minibatch size of 32 I doubt it'll work that well, but it's worth double-checking. If not, then an optional 2-GPU switch for the target network might be decent.
As for a cluster, there's always DistLearn, though I would rather focus on integrating single-machine async Q-learning first.
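For reference, a minimal sketch of what such a check might look like with cunn's DataParallelTable (the variable names and GPU ids {1, 2} are placeholders; the third constructor argument turns on NCCL if it is installed):

```lua
-- Minimal sketch: wrap an existing CUDA network in DataParallelTable.
-- Assumes cutorch/cunn are installed and `net` is the DQN convnet
-- already built elsewhere; the GPU ids {1, 2} are placeholders.
require 'cutorch'
require 'cunn'

local function makeDataParallel(net, gpus)
  -- Split minibatches along dimension 1; flatten parameters and,
  -- with the third argument set to true, use NCCL for the all-reduce.
  local dpt = nn.DataParallelTable(1, true, true)
  dpt:add(net, gpus)
  -- Per-thread setup for the worker threads.
  dpt:threads(function()
    require 'cunn'
  end)
  return dpt:cuda()
end

-- Usage: policyNet = makeDataParallel(policyNet, {1, 2})
-- With the default minibatch of 32 each GPU only sees 16 samples,
-- which is why the speed-up may be small for this network.
```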
@Kaixhin Haven't tried DataParallel with the DQN as I thought it would not speed up the small convnet much (in supervised learning it only helps beyond a certain network size). But I will give it a try to compare with the policy/target net split.
@lake4790k Worth checking just in case (with NCCL for sure). How does this code interact with your async code? As this is GPU-only and the other is CPU-only, I can imagine there'll be a lot of added complexity in supporting both.
@Kaixhin I think multi-GPU support is not that complicated to add to the existing Atari (master) code. The async mode needs more refactoring, but doesn't need any of the GPU-related functionality as it should be CPU-only, so I would do that in the separate async branch for now.
It's definitely a challenge to support all the modes in a single codebase (but it makes sense). Maybe I'll add a basic test case (e.g. Catch) that can be run quickly, to check that nothing is broken by adding new stuff on top...
@Kaixhin Strange, when I first set up multi-GPU I compared with and without NCCL and saw no speed difference (I used the Torch blog CIFAR-10 code with R4). I have NCCL installed, so I will test with that.
I had a quick look at the speed of running the policy and target nets in parallel, but in the Atari code I didn't see much speed difference. This could be because I had tried a bigger network before, or because in Atari memory access is also a dominant factor, not only the network forward passes.
I'll try the DataParallelTable approach later, but it could be that the Atari convnet is not big enough to gain much, in which case there's no point in complicating the code. One can also just run multiple separate experiments on multiple GPUs; that scales perfectly...
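For comparison, a rough sketch of the policy/target split discussed above, assuming `net` is the DQN built on the CPU elsewhere and two GPUs are available (all names here are hypothetical):

```lua
-- Rough sketch of the policy/target split: policy net on GPU 1, target net
-- on GPU 2, so their forward passes can overlap. Assumes `net` is the DQN
-- built as a float network elsewhere; names here are placeholders.
require 'cutorch'
require 'cunn'

cutorch.setDevice(1)
local policyNet = net:cuda()                  -- policy net lives on GPU 1
local policyParams = policyNet:getParameters()

cutorch.setDevice(2)
local targetNet = net:clone():cuda()          -- target net lives on GPU 2
local targetParams = targetNet:getParameters()

-- Periodic target update: cutorch handles the cross-device copy.
local function updateTargetNet()
  targetParams:copy(policyParams)
end

-- Note: inputs for targetNet:forward() must be on GPU 2 and the resulting
-- Q-values copied back, which adds the memory traffic mentioned above.
cutorch.setDevice(1)  -- train the policy net on its own GPU
```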
I also think that in this setup there's a lot of overhead from other sources. Unless DataParallelTable produces significant gains (unlikely), go ahead and close this issue. It's probably not worth the extra complication of implementing this.
@lake4790k Any update? I think we can close this unless you want to try more experiments.
Haven't looked at this further, but agreed, I'll close this: if one has multiple GPUs, the best use is to just run multiple separate experiments to make the most of the resources. It makes more sense to work on algorithmic improvements than this.
Related Issues (20)
- Implement Memory Q-networks
- Implement Retrace(λ)
- Finish prioritised experience replay HOT 2
- Allow non-visual environments
- Can I convert rank-based prioritized experience replay to a python version HOT 2
- Async A3C Network Outputs NaN HOT 4
- Load models like environments HOT 2
- Disagreements with the async paper HOT 2
- Possible improvements on speeding up HOT 1
- problem in Agent.lua HOT 1
- gnuplots memory unreleased HOT 1
- Why is the current sharedRmsprop thread safe? HOT 2
- Implement optimality tightening HOT 8
- What is the actual performance? HOT 7
- Refactor DQN train function into separate functions
- Partition number and segments HOT 1
- How to process with the salient map? HOT 4
- actor-critic based HOT 2
- About A3C HOT 1
- Questions about training A3C HOT 1