
Comments (8)

ekourlit commented on May 18, 2024

To Reproduce
Set up an HPS and execute:

deephyper hps ambs --problem <myProblem> --run <myRun>

I have forced my system not to detect the available GPUs with export CUDA_VISIBLE_DEVICES=. I can confirm this is the case in a Python shell:

>>> import tensorflow as tf
>>> gpus = tf.config.experimental.list_physical_devices('GPU')
>>> logical_gpus = tf.config.experimental.list_logical_devices('GPU')
>>> print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
0 Physical GPUs, 0 Logical GPUs

Expected behavior
I was expecting the different Ray tasks created to run in parallel on my CPU threads. Ray creates 24 tasks (as many as my CPU threads), but only one is actually running; all the rest are IDLE.

Screenshots
[Screenshot: Ray task list showing only one task running and the rest IDLE (Screen Shot 2020-06-19 at 5 43 25 PM)]

Desktop

  • OS: Ubuntu 18.04.4 LTS
  • System: Local machine
  • Python version: 3.7.7
  • DeepHyper Version: 0.1.11
  • TensorFlow Version: 2.2.0

Additional context
I don't know if Ray is actually made to distribute tasks on a single machine, but if it is, this would speed up a local search.
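What I had in mind is something like the following minimal sketch (independent of DeepHyper; the evaluate function is just a stand-in for running one configuration):

import time
import ray

ray.init()  # on a single machine this picks up all local CPU cores

@ray.remote(num_cpus=1)  # each task reserves one CPU
def evaluate(config_id):
    time.sleep(1)  # stand-in for training/evaluating one configuration
    return config_id

# with N CPU cores, roughly N of these tasks should run concurrently
results = ray.get([evaluate.remote(i) for i in range(24)])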


ekourlit commented on May 18, 2024

Thanks a lot @Deathn0t! This works for me! Here is a screenshot too:
[Screenshot: all Ray tasks now running in parallel (Screen Shot 2020-06-22 at 11 10 50 AM)]

So at the moment I'm using the CPU and all the processes run in parallel on the different threads. I'm wondering then which parallelization strategy is more efficient to use:

  1. Use the different CPU threads, 1 for each task (this issue).
  2. Use 1 task and distribute the training into multiple GPUs (2 in my case).
  3. Use as many tasks as the GPUs (2 in my case) and parallelize the HPS, not the training.

Of course the answer might differ for each problem at hand (model and batch size versus GPU memory are important considerations), but if there is interest you can keep this issue open and I'll post the results of my findings.
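For concreteness, option 2 would look roughly like the sketch below (build_model and dataset are placeholders, not code from this issue):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicates the model on every visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = build_model()  # placeholder model factory
    model.compile(optimizer="adam", loss="mse")

# the global batch is split across the replicas, so scale it with the replica count
global_batch_size = 64 * strategy.num_replicas_in_sync
model.fit(dataset.batch(global_batch_size), epochs=10)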


Deathn0t commented on May 18, 2024

@ekourlit,

Could you please fill in the following form so that I can better help you resolve this issue:

To Reproduce
Steps to reproduce the behavior:

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. Ubuntu]
  • System: [optional, e.g. Theta]
  • Python version [e.g. 3.8]
  • DeepHyper Version [e.g. 0.1.11]

Additional context
Add any other context about the problem here.


Deathn0t commented on May 18, 2024

I found an easy solution to this kind of behavior: just replace this line (link to code) with:

# size the worker pool from the total CPU count reported by Ray
self.num_cpus = int(sum([node["Resources"]["CPU"] for node in ray.nodes()]))
self.num_workers = self.num_cpus

Let me know if it works well; I tried it with ray==0.7.6 on macOS.


Deathn0t commented on May 18, 2024

@ekourlit I added it in commit 266bb2b, as well as a --num-workers command-line argument to set the number of jobs launched in parallel by all evaluators.
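With it, something like the following should launch up to 24 jobs in parallel (command adapted from the one above; the value is illustrative):

deephyper hps ambs --problem <myProblem> --run <myRun> --num-workers 24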


Deathn0t commented on May 18, 2024

Awesome!

It is definitely a good question to ask... As you said, it probably depends on the model (for neural networks: the size and type of layers). For CNNs it definitely makes sense to use the GPUs.

Also, AMBS is Bayesian-based, which is a sequential process. In DeepHyper we use a liar strategy to make it asynchronous, but still, if you optimize sequentially it can be very efficient too. Finally, the acquisition function estimator can be parallelized on the CPU cores. So a good setup is probably to use the CPU for this estimator (at least part of it) and use the GPUs to distribute the models. I will leave this issue open so that you can share your findings. Thanks!
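To make the liar idea concrete, here is a rough sketch with scikit-optimize (not DeepHyper's actual code; run_model is a placeholder objective):

from skopt import Optimizer

opt = Optimizer(
    dimensions=[(1e-4, 1e-1, "log-uniform"), (8, 128)],  # e.g. learning rate, batch size
    base_estimator="RF",  # the surrogate model; its fit can itself use several CPU cores
)

# ask for several points at once: after proposing each point the optimizer "lies"
# about its outcome, so the next point is chosen as if that evaluation had finished
points = opt.ask(n_points=2, strategy="cl_min")  # e.g. one point per GPU
results = [run_model(lr, batch_size) for lr, batch_size in points]  # run these in parallel in practice
opt.tell(points, results)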


ekourlit commented on May 18, 2024

Hi @Deathn0t,

I'm attaching some slides summarizing the studies I made of the 3 parallelization strategies mentioned above. In short, distributing the HPS with as many Ray tasks as I have GPUs performs best in my case. I agree with you that, apart from the fully-connected-layer case, using the GPUs should greatly improve performance. In my case the data/batch size is also not large enough to benefit from distributing the training (data) over multiple GPUs.

Regarding the memory problem I faced, I tried the solution using multiprocessing proposed here, but it didn't work for me. When I use GPUs I get the error:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1045] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED: initialization error; GPU dst: 0x7f7f63000d00; host src: 0x5614a238a040; size: 8=0x8

while when I use the CPU, the Keras model.fit() method hangs. I should note, though, that in that thread people comment that the solution does not work for TF 2.2 on RTX cards, which is exactly my case.
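For reference, the general idea of that workaround is to run each training in a child process so that the GPU memory is released when the process exits; roughly (train_model is a placeholder for the function that builds and fits the model):

import multiprocessing as mp

def train_in_subprocess(config):
    # "spawn" starts a fresh interpreter, so the child does not inherit a CUDA
    # context from the parent (one common cause of CUDA_ERROR_NOT_INITIALIZED)
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        # the GPU memory is freed when the child process exits after fit()
        return pool.apply(train_model, (config,))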

Slides: DH-distributed.pdf


Deathn0t commented on May 18, 2024

Hello @ekourlit, the slides are awesome! I am quite happy with the results shown in them because they confirm some experiments I did at scale (without GPUs), doing data-parallelism across different CPUs. For data-parallelism it is important to increase the batch size and learning rate, as mentioned in this work (link to article). But, as you mentioned, there is a limitation in the size of the dataset...
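As a back-of-the-envelope illustration (the exact rule is in the linked article, which is not reproduced here), the common recipe keeps the per-replica batch fixed and scales the global batch size and learning rate with the number of replicas:

base_lr, per_replica_batch = 1e-3, 64
n_replicas = 8  # e.g. CPU workers or GPUs

global_batch_size = per_replica_batch * n_replicas  # batch size grows with the replicas
scaled_lr = base_lr * n_replicas                    # learning rate scaled linearly to match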

The speed-up with parallel training on GPUs looks quite encouraging! I am not running on GPUs right now, so it will be hard for me to help you with this memory issue...

I will now close this issue because the questions raised have been answered, but I encourage you to open a new one describing the memory issue.

Thank you again for these well-documented results!

from deephyper.
