Hello @AymenTlili131,
We really appreciate your feedback. I was able to reproduce the same error on my end, and it seems to me that this is not a matter of NCCL setup but of the number of GPUs you're trying to assign. When running on NCCL, torch distributed receives the argument --nproc_per_node, the number of processes to launch, which is set to the number of GPUs available in your system to run the simulation. However, FLUTE requires at least 2 in order to launch: 1 Server and 1 Worker that can execute many clients, and I can see you only have 1 available (GPU 0).
This is the stack trace; as you can see, the problem occurs at assignment time.
I took a look at the NCCL tests repo and noticed that the -g argument corresponds to the number of available GPUs. This is the reason for the failure: given that you only have 1 available, it's not able to run with a higher number.
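To make the launch constraint described above concrete, here is a minimal sketch of how process ranks might map to FLUTE roles. It assumes rank 0 is the Server, every other rank is a Worker, and each process needs its own GPU (as is typical with the NCCL backend). The helper assign_roles is hypothetical and not part of FLUTE or PyTorch. It reflects the 2-GPU minimum as it stood at the time of this reply, before the restriction was removed in a later release.

```python
# Hypothetical helper, not part of FLUTE: illustrates the
# "1 Server + at least 1 Worker, one GPU per process" constraint.

def assign_roles(nproc_per_node: int, available_gpus: int) -> dict:
    """Map each local rank to a role, assuming rank 0 is the Server."""
    # At least one Server and one Worker are needed to launch.
    if nproc_per_node < 2:
        raise ValueError("need at least 2 processes: 1 Server + 1 Worker")
    # Assumption: each process is pinned to its own GPU.
    if nproc_per_node > available_gpus:
        raise ValueError(
            f"requested {nproc_per_node} processes but only "
            f"{available_gpus} GPU(s) available"
        )
    return {rank: ("server" if rank == 0 else "worker")
            for rank in range(nproc_per_node)}


if __name__ == "__main__":
    print(assign_roles(2, 2))
    try:
        assign_roles(2, 1)  # the single-GPU case from this issue
    except ValueError as err:
        print("launch rejected:", err)
```

Under these assumptions, a single-GPU machine fails the check. This matches the situation in this issue, where only GPU 0 was available.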
You can find more information about the FLUTE architecture here. There is already an open issue for this situation here: #15. We apologize for the inconvenience at this moment.
Regarding the comments about the requirements/Python versions, we will make sure to update them in the next commit.
Let me know if this information is useful or if we can provide more support on this. 🙂
Thanks,
Mirian
Hey @Mirian-Hipolito
Things are up and running on my end. I'm grateful for your explanation and support, and I hope you and the maintainers have a wonderful rest of the week.
I'll make sure to cite the FLUTE team if I find anything useful!
Thanks again,
Kind regards
Hello @AymenTlili131, we are happy to share that we have removed the minimum-number-of-GPUs restriction for running FLUTE in our latest release. For more documentation about how to run experiments using a single GPU, please refer to the README.
Hey Mirian,
This is great news. I gained access to other GPUs in the meantime and experimented with working on them remotely, but thanks to your efforts and your colleagues', I can now experiment with tweaks and proofs of concept at a much smaller scale. Much appreciated, and thanks to the entire Microsoft family.
Thanks for writing back so soon,
I'll request access to a workstation with 2 or more GPUs and test it myself, but this is a solid explanation of why the error was raised, thanks!
I'd still like to keep the issue open until I confirm that it indeed works (no more than a week).
Based on the linked FLUTE architecture it should work, and hopefully the environment setup and testing won't take long before I get back to you.
Thanks @AymenTlili131! Let us know if this issue persists.
Regards,
Mirian.