Comments (9)
Have you tried DataParallelTable at all (with NCCL)? With the default minibatch size of 32 I doubt it'll work that well, but it's worth double-checking. If not, then an optional 2-GPU switch for the target network might be decent.
As for a cluster, there's always DistLearn, though I would rather focus on integrating single-machine async Q-learning first.
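For reference, a minimal sketch of what such a check might look like with cunn's DataParallelTable (the variable names and GPU ids {1, 2} are placeholders; the third constructor argument turns on NCCL if it is installed):

```lua
-- Minimal sketch: wrap an existing CUDA network in DataParallelTable.
-- Assumes cutorch/cunn are installed and `net` is the DQN convnet
-- already built elsewhere; the GPU ids {1, 2} are placeholders.
require 'cutorch'
require 'cunn'

local function makeDataParallel(net, gpus)
  -- Split minibatches along dimension 1; flatten parameters and,
  -- with the third argument set to true, use NCCL for the all-reduce.
  local dpt = nn.DataParallelTable(1, true, true)
  dpt:add(net, gpus)
  -- Per-thread setup for the worker threads.
  dpt:threads(function()
    require 'cunn'
  end)
  return dpt:cuda()
end

-- Usage: policyNet = makeDataParallel(policyNet, {1, 2})
-- With the default minibatch of 32 each GPU only sees 16 samples,
-- which is why the speed-up may be small for this network.
```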
@Kaixhin Haven't tried DataParallel with the DQN as I thought it would not speed up the small convnet much (in supervised learning it only helps beyond a certain network size). But I will give it a try to compare with the policy/target net split.
@lake4790k Worth checking just in case (with NCCL for sure). How does this code interact with your async code? As this is GPU-only and the other is CPU-only, I can imagine there'll be a lot of added complexity in supporting both.
@Kaixhin I think multi-GPU support is not that complicated to add to the existing Atari (master) code. The async mode needs more refactoring, but doesn't need any of the GPU-related functionality as it should be CPU-only, so I would do that in the separate async branch for now.
It's definitely a challenge to support all the modes in a single codebase (but it makes sense). Maybe I'll add a basic test case (e.g. Catch) that can be run quickly, to check that nothing is broken by adding new stuff on top...
@Kaixhin Strange, when I first set up multi-GPU I compared with and without NCCL and saw no speed difference (I used the Torch blog CIFAR-10 code with R4). I have NCCL installed, so I will test with that.
I had a quick look at the speed of running the policy and target nets in parallel, but in the Atari code I didn't see much speed difference. This could be because I had tried a bigger network before, or because in Atari memory access is also a dominant factor, not only the network forward passes.
I'll try the DataParallelTable approach later, but it could be that the Atari convnet is not big enough to gain much, in which case there's no point in complicating the code. One can also just run multiple separate experiments on multiple GPUs; that scales perfectly...
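For comparison, a rough sketch of the policy/target split discussed above, assuming `net` is the DQN built on the CPU elsewhere and two GPUs are available (all names here are hypothetical):

```lua
-- Rough sketch of the policy/target split: policy net on GPU 1, target net
-- on GPU 2, so their forward passes can overlap. Assumes `net` is the DQN
-- built as a float network elsewhere; names here are placeholders.
require 'cutorch'
require 'cunn'

cutorch.setDevice(1)
local policyNet = net:cuda()                  -- policy net lives on GPU 1
local policyParams = policyNet:getParameters()

cutorch.setDevice(2)
local targetNet = net:clone():cuda()          -- target net lives on GPU 2
local targetParams = targetNet:getParameters()

-- Periodic target update: cutorch handles the cross-device copy.
local function updateTargetNet()
  targetParams:copy(policyParams)
end

-- Note: inputs for targetNet:forward() must be on GPU 2 and the resulting
-- Q-values copied back, which adds the memory traffic mentioned above.
cutorch.setDevice(1)  -- train the policy net on its own GPU
```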
I also think that in this setup there's a lot of overhead from other sources. Unless DataParallelTable produces significant gains (unlikely), go ahead and close this issue. It's probably not worth the extra complication of implementing this.
@lake4790k Any update? I think we can close this unless you want to try more experiments.
Haven't looked at this further, but agreed, I'll close this: if one has multiple GPUs, the best use is to just run multiple separate experiments to make the most of the resources. It makes more sense to work on algorithmic improvements than this.
Related Issues (20)
- Implement Memory Q-networks
- Implement Retrace(λ)
- Finish prioritised experience replay HOT 2
- Allow non-visual environments
- Can I convert rank-based prioritized experience replay to a python version HOT 2
- Async A3C Network Outputs NaN HOT 4
- Load models like environments HOT 2
- Disagreements with the async paper HOT 2
- Possible improvements on speeding up HOT 1
- problem in Agent.lua HOT 1
- gnuplots memory unreleased HOT 1
- Why is the current sharedRmsprop thread safe? HOT 2
- Implement optimality tightening HOT 8
- What is the actual performance? HOT 7
- Refactor DQN train function into separate functions
- Partition number and segments HOT 1
- How to process with the salient map? HOT 4
- actor-critic based HOT 2
- About A3C HOT 1
- Questions about training A3C HOT 1