Comments (5)
Hi,
Thanks for the answer. After my colleague posted this question, we actually managed to add `DataParallel` logic to the learner and got roughly a 2.5x-3x speedup (when distributing the learner across 3 GPUs while using another GPU for the actor). We would be happy to open a pull request if this would be useful to the community.
Having done that, we are looking for other ways to speed up training further, and would greatly appreciate feedback from your end on the following ideas:
- Switching from `DataParallel` to `DistributedDataParallel`, like you suggested. I haven't used `DistributedDataParallel` before, but from the docs, it seems that I would have to wrap the `train()` function from `polybeast_learner.py` with DDP. Could you confirm this intuition?
- Also distributing the actor model across multiple GPUs. We have already tried this, but currently it does not work due to dynamic batching. AFAIK, `DataParallel` should handle indivisible batch sizes, but in our case it seems to divide the batch incorrectly, leading to errors.
- Increasing `num_actors`. It seems like the current bottleneck for us is actually the number of actors. Is there a way to decide on the optimal value for this hyperparameter (in terms of runtime), or should I just try a bunch of different values and pick the fastest? For reference, our machine has 128 cores. Also, is it possible with the current code to run agents across multiple machines? Finally, does this hyperparameter affect the learning dynamics?
Thanks a lot for your time.
from torchbeast.
That's a great question, and doing so is not currently easy. One way to go about this would be to add PyTorch `DistributedDataParallel` logic to the learner. Another good (perhaps easier) way to use several GPUs is to run several experiments and have an additional "evolution" controller on top, which copies hyperparameters from one agent to another.
In practice, we've used our GPU fleet to run more experiments in parallel instead. Later this year we also hope to share an update to TorchBeast that allows using more GPUs for a single learner, but it isn't quite ready yet.
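The "evolution" controller idea above can be sketched as a toy PBT-style loop. Everything here (the population structure, the score field, the perturbation range) is illustrative, not a TorchBeast API:

```python
import random

def exploit_and_explore(population):
    """Toy 'evolution' controller step: copy hyperparameters from the
    best-scoring experiment to the worst, with a small random perturbation.
    population: list of dicts with "score" and "hparams" keys."""
    ranked = sorted(population, key=lambda p: p["score"])
    worst, best = ranked[0], ranked[-1]
    # Exploit: copy from the best agent. Explore: jitter each value by +/-20%.
    worst["hparams"] = {k: v * random.uniform(0.8, 1.2)
                        for k, v in best["hparams"].items()}
    return population
```

In a real setup this would run periodically on checkpointed experiments rather than in-memory dicts, but the exploit/explore structure is the same.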
Hey @ege-k,
That's great and we're more than happy to have a pull request for this.
As for your questions:
`DataParallel` vs `DistributedDataParallel`: I forget the exact details of these implementations, but I believe the main difference is that the latter uses multiprocessing to get around contention on the Python GIL. This issue is more acute for RL agents, and slightly harder to deal with, since in this case the data itself is produced by the model and hence cannot simply be loaded from disk ahead of time. Getting this to work in TorchBeast likely requires not using the top-level abstractions but the underlying tools.
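For reference, the multiprocess `DistributedDataParallel` pattern looks roughly like the minimal sketch below; it is generic, not TorchBeast's actual `train()`. The linear model and random batch are stand-ins for the learner network and the batches the actors produce, and the address/port are arbitrary:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def learner(rank, world_size):
    # One process per GPU (CPU + gloo here for portability); DDP averages
    # gradients across processes with an all-reduce during backward().
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(8, 4)   # stand-in for the actual learner network
    model = DDP(model)

    batch = torch.randn(16, 8)      # stand-in for a batch from the actors
    loss = model(batch).sum()
    loss.backward()                 # gradients are synchronized here

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(learner, args=(world_size,), nprocs=world_size)
```

The awkward part for RL, as noted above, is that each process also needs a stream of actor data, which is why the top-level abstractions don't drop in cleanly.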
Distributing the actor model: I'd assume a ratio of 1:3 or 1:4 for actor GPUs to learner GPUs is ideal in a typical setting. Once you want to use many more learner GPUs, distributing the actor model makes sense. This could be done by having different learners with different addresses and telling each actor which one to use. Dynamic batching would still happen, but only on that learner.
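A static actor-to-learner assignment along those lines could look like this sketch; the addresses and the round-robin rule are purely illustrative:

```python
# Hypothetical sketch: statically shard actors across several learner
# endpoints. Each learner runs its own dynamic batching; every actor
# only ever talks to the one learner it is assigned to.
learner_addresses = [
    "learner-host:4431",
    "learner-host:4432",
]

def learner_for_actor(actor_id: int) -> str:
    # Round-robin keeps the actor load roughly balanced across learners.
    return learner_addresses[actor_id % len(learner_addresses)]
```

Each actor process would then be launched with `learner_for_actor(i)` as its target address instead of a single global one.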
Your third question is the hardest one. Unfortunately, in RL often "everything depends on everything", so I cannot rule out that the number of actors influences the learning dynamics and therefore also changes the optimal hyperparameter setting. It certainly would if you also change batch sizes, which is likely required in order to find the best throughput. I don't think I know of a better way than to try various settings -- aiming to slightly overshoot, as modern Linux kernels are quite efficient at thread/process scheduling, so context switching does not generate much waste.
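Concretely, such a sweep boils down to measuring throughput (steps per second) at a few candidate `num_actors` values and keeping the best. The numbers below are made up purely for illustration; in practice each entry would come from a short timed training run:

```python
def best_num_actors(throughput, candidates):
    """Pick the candidate with the highest measured steps/second.
    throughput: callable mapping num_actors -> steps/second."""
    return max(candidates, key=throughput)

# Toy throughput curve that saturates somewhat past the 128-core count
# and then degrades (all values invented for illustration):
measured = {32: 20000, 64: 35000, 128: 41000, 160: 42000, 256: 39000}
print(best_num_actors(measured.get, measured))  # -> 160
```

The "slightly overshoot" advice above corresponds to the curve peaking a bit beyond the physical core count before oversubscription costs take over.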
As for the second part of your question: TorchBeast can be fully distributed if you find a way to tell each node the names of its peers. E.g., if you know your setup and have fixed IP addresses, you could hardcode them. Often that's not the case, and you'll need some other means of communicating the names/addresses. E.g., you could use a shared file system (lots of issues around that, but it can work "in practice"), or a real name service, or a lock service on top of something like etcd.
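A minimal shared-file-system rendezvous could look like the following sketch; the file path and JSON schema are assumptions, and real deployments would also need polling, retries, and staleness handling:

```python
import json
import os

def publish_address(path: str, address: str) -> None:
    """Learner side: write our address to a well-known file on the
    shared file system. Write to a temp file and rename so readers
    never see a half-written file (os.replace is atomic on POSIX)."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"learner": address}, f)
    os.replace(tmp, path)

def lookup_address(path: str) -> str:
    """Actor side: read the learner's address before connecting."""
    with open(path) as f:
        return json.load(f)["learner"]
```

The atomic-rename trick is what makes this workable despite shared file systems' usual caveats; a name service or etcd-based lock service avoids the file system entirely.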
BTW, we are working on somewhat alternative designs currently and might have an update on that in a few weeks. Feel free to drop me an email if you would like to get an early idea of what we want to do.
Are there any updates on this @heiner?
I was also wondering if it would be possible for @ege-k @Batom3 to share their code with me?
Hey everyone!
I've since left Facebook, but my amazing colleagues have written https://github.com/facebookresearch/moolib, which you should check out for a multi-GPU IMPALA.