
Comments (8)

ekourlit commented on May 18, 2024

To Reproduce
Set up an HPS and execute:

deephyper hps ambs --problem <myProblem> --run <myRun>

I have forced my system not to detect the available GPUs with export CUDA_VISIBLE_DEVICES=. I can confirm this is the case in a Python shell:

>>> import tensorflow as tf
>>> gpus = tf.config.experimental.list_physical_devices('GPU')
>>> logical_gpus = tf.config.experimental.list_logical_devices('GPU')
>>> print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
0 Physical GPUs, 0 Logical GPUs

Expected behavior
I was expecting the different Ray tasks created to run in parallel on my CPU threads. Ray creates 24 tasks (as many as my CPU threads), but only one is actually running; all the rest are IDLE.

Screenshots
[Screenshot: Ray task list showing only one task running and the rest IDLE (Screen Shot 2020-06-19 at 5 43 25 PM)]

Desktop

  • OS: Ubuntu 18.04.4 LTS
  • System: Local machine
  • Python version: 3.7.7
  • DeepHyper Version: 0.1.11
  • TensorFlow Version: 2.2.0

Additional context
I don't know if Ray is actually made to distribute tasks on a single machine, but if it is, this would speed up a local search.
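What I had in mind is something like the following minimal sketch (independent of DeepHyper; the evaluate function is just a stand-in for running one configuration):

import time
import ray

ray.init()  # on a single machine this picks up all local CPU cores

@ray.remote(num_cpus=1)  # each task reserves one CPU
def evaluate(config_id):
    time.sleep(1)  # stand-in for training/evaluating one configuration
    return config_id

# with N CPU cores, roughly N of these tasks should run concurrently
results = ray.get([evaluate.remote(i) for i in range(24)])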


ekourlit commented on May 18, 2024

Thanks a lot @Deathn0t! This works for me! Here is a screenshot too:
[Screenshot: all Ray tasks now running in parallel (Screen Shot 2020-06-22 at 11 10 50 AM)]

So at the moment I'm using the CPU and all the processes run in parallel on the different threads. I'm wondering then which parallelization strategy is more efficient to use:

  1. Use the different CPU threads, 1 for each task (this issue).
  2. Use 1 task and distribute the training into multiple GPUs (2 in my case).
  3. Use as many tasks as the GPUs (2 in my case) and parallelize the HPS, not the training.

Of course the answer might differ for each problem at hand (model and batch size versus GPU memory are important considerations), but if there is interest you can keep this issue open and I'll post the results of my findings.
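For concreteness, option 2 would look roughly like the sketch below (build_model and dataset are placeholders, not code from this issue):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicates the model on every visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = build_model()  # placeholder model factory
    model.compile(optimizer="adam", loss="mse")

# the global batch is split across the replicas, so scale it with the replica count
global_batch_size = 64 * strategy.num_replicas_in_sync
model.fit(dataset.batch(global_batch_size), epochs=10)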


Deathn0t commented on May 18, 2024

@ekourlit,

Could you please fill in the following form so that I can better help you resolve this issue:

To Reproduce
Steps to reproduce the behavior:

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. Ubuntu]
  • System: [optional, e.g. Theta]
  • Python version [e.g. 3.8]
  • DeepHyper Version [e.g. 0.1.11]

Additional context
Add any other context about the problem here.


Deathn0t commented on May 18, 2024

I found an easy solution to this kind of behavior: just replace this line (link to code) with:

# size the worker pool from the total CPU count reported by Ray
self.num_cpus = int(sum([node["Resources"]["CPU"] for node in ray.nodes()]))
self.num_workers = self.num_cpus

Let me know if it works well; I tried it with ray==0.7.6 on macOS.


Deathn0t commented on May 18, 2024

@ekourlit I added it in commit 266bb2b, as well as a --num-workers command-line argument to set the number of jobs launched in parallel by all evaluators.
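With it, something like the following should launch up to 24 jobs in parallel (command adapted from the one above; the value is illustrative):

deephyper hps ambs --problem <myProblem> --run <myRun> --num-workers 24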


Deathn0t commented on May 18, 2024

Awesome!

It is definitely a good question to ask... As you said, it probably depends on the model (for neural networks: the size and type of layers). For CNNs it definitely makes sense to use the GPUs.

Also, AMBS is Bayesian-based, which is a sequential process. In DeepHyper we use a liar strategy to make it asynchronous, but still, if you optimize sequentially it can be very efficient too. Finally, the acquisition function estimator can be parallelized on the CPU cores. So a good setup is probably to use the CPU for this estimator (at least part of it) and use the GPUs to distribute the models. I will leave this issue open so that you can share your findings. Thanks!
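To make the liar idea concrete, here is a rough sketch with scikit-optimize (not DeepHyper's actual code; run_model is a placeholder objective):

from skopt import Optimizer

opt = Optimizer(
    dimensions=[(1e-4, 1e-1, "log-uniform"), (8, 128)],  # e.g. learning rate, batch size
    base_estimator="RF",  # the surrogate model; its fit can itself use several CPU cores
)

# ask for several points at once: after proposing each point the optimizer "lies"
# about its outcome, so the next point is chosen as if that evaluation had finished
points = opt.ask(n_points=2, strategy="cl_min")  # e.g. one point per GPU
results = [run_model(lr, batch_size) for lr, batch_size in points]  # run these in parallel in practice
opt.tell(points, results)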


ekourlit commented on May 18, 2024

Hi @Deathn0t,

I'm attaching some slides summarizing the studies I made of the 3 parallelization strategies mentioned above. In short, distributing the HPS with as many Ray tasks as I have GPUs performs best in my case. I agree with you that, apart from the fully-connected-layer case, using the GPUs should greatly improve performance. In my case the data/batch size is also not large enough to benefit from distributing the training (data) over multiple GPUs.

Regarding the memory problem I faced, I tried the solution using multiprocessing proposed here, but it didn't work for me. When I use GPUs I get the error:

E tensorflow/stream_executor/cuda/cuda_driver.cc:1045] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED: initialization error; GPU dst: 0x7f7f63000d00; host src: 0x5614a238a040; size: 8=0x8

while when I use the CPU, the Keras model.fit() method hangs. I should note, though, that in that thread people comment that the solution does not work for TF 2.2 on RTX cards, which is exactly my case.
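For reference, the general idea of that workaround is to run each training in a child process so that the GPU memory is released when the process exits; roughly (train_model is a placeholder for the function that builds and fits the model):

import multiprocessing as mp

def train_in_subprocess(config):
    # "spawn" starts a fresh interpreter, so the child does not inherit a CUDA
    # context from the parent (one common cause of CUDA_ERROR_NOT_INITIALIZED)
    ctx = mp.get_context("spawn")
    with ctx.Pool(1) as pool:
        # the GPU memory is freed when the child process exits after fit()
        return pool.apply(train_model, (config,))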

Slides: DH-distributed.pdf


Deathn0t commented on May 18, 2024

Hello @ekourlit, the slides are awesome! I am quite happy with the results shown in them because they confirm some experiments I did at scale (without GPUs), doing data-parallelism across different CPUs. For data-parallelism it is important to increase the batch size and learning rate, as mentioned in this work (link to article). But, as you mentioned, there is a limitation in the size of the dataset...
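As a back-of-the-envelope illustration (the exact rule is in the linked article, which is not reproduced here), the common recipe keeps the per-replica batch fixed and scales the global batch size and learning rate with the number of replicas:

base_lr, per_replica_batch = 1e-3, 64
n_replicas = 8  # e.g. CPU workers or GPUs

global_batch_size = per_replica_batch * n_replicas  # batch size grows with the replicas
scaled_lr = base_lr * n_replicas                    # learning rate scaled linearly to match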

The speed-up with parallel training on GPUs looks quite encouraging! I am not running on GPUs right now, so it will be hard for me to help you with this memory issue...

I will now close this issue because the questions raised have been answered, but I encourage you to open a new one describing the memory issue.

Thank you again for these well-documented results!

from deephyper.
