Comments (8)
To Reproduce
Set up an HPS and execute
deephyper hps ambs --problem <myProblem> --run <myRun>
I have forced my system not to detect the available GPUs with export CUDA_VISIBLE_DEVICES=. I can confirm this is the case in a Python shell:
>>> import tensorflow as tf
>>> gpus = tf.config.experimental.list_physical_devices('GPU')
>>> logical_gpus = tf.config.experimental.list_logical_devices('GPU')
>>> print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
0 Physical GPUs, 0 Logical GPUs
Expected behavior
I was expecting the different Ray tasks created to run in parallel on my CPU threads. Ray is creating 24 tasks (as many as my CPU threads), but only one is actually running; all the rest are IDLE.
Desktop
- OS: Ubuntu 18.04.4 LTS
- System: Local machine
- Python version: 3.7.7
- DeepHyper Version: 0.1.11
- TensorFlow Version: 2.2.0
Additional context
I don't know if Ray is actually made to distribute tasks on a single machine, but if it is, it would speed up a local search.
Thanks a lot @Deathn0t! This works for me! Here is a screenshot too:
So at the moment I'm using the CPU and all the processes run in parallel on the different threads. I'm wondering then which parallelization strategy is more efficient to use:
- Use the different CPU threads, 1 for each task (this issue).
- Use 1 task and distribute the training into multiple GPUs (2 in my case).
- Use as many tasks as the GPUs (2 in my case) and parallelize the HPS, not the training.
Of course, the answer might differ for each problem at hand (model and batch size vs. GPU memory are important to consider), but if there is interest you can keep this issue open and I'll post the results of my findings.
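For the third strategy, each HPS task needs to see exactly one GPU. A minimal sketch of one way to do that (the helper name is hypothetical, not a DeepHyper API): build a per-worker environment that pins CUDA_VISIBLE_DEVICES to a single device, round-robin over the available GPUs.

```python
# Hypothetical helper (not DeepHyper API): pin worker i to GPU i % n_gpus by
# setting CUDA_VISIBLE_DEVICES in the child's environment, so each HPS task
# sees exactly one GPU.
import os

def env_for_worker(worker_index, n_gpus):
    env = dict(os.environ)  # copy the parent environment
    env["CUDA_VISIBLE_DEVICES"] = str(worker_index % n_gpus)
    return env

# e.g. with 2 GPUs, workers 0..3 are pinned to GPUs 0, 1, 0, 1
print([env_for_worker(i, 2)["CUDA_VISIBLE_DEVICES"] for i in range(4)])
```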
Could you please fill the following form so that I can help you better to resolve this issue:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Screenshots
If applicable, add screenshots to help explain your problem.
Desktop (please complete the following information):
- OS: [e.g. Ubuntu]
- System: [optional, e.g. Theta]
- Python version: [e.g. 3.8]
- DeepHyper Version: [e.g. 0.1.11]
Additional context
Add any other context about the problem here.
I found an easy solution to this kind of behavior: just replace this line (link to code) with:
self.num_cpus = int(sum([node["Resources"]["CPU"] for node in ray.nodes()]))  # total CPUs across all Ray nodes
self.num_workers = self.num_cpus  # one worker per CPU
Let me know if it works well; I tried it with ray==0.7.6 on macOS.
@ekourlit I added it in commit 266bb2b, as well as a --num-workers argument from the command line to set the number of jobs launched in parallel by all evaluators.
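For reference, such a flag could be wired roughly like this. This is a minimal sketch only; the parser, defaults, and fallback logic are illustrative, not DeepHyper's actual CLI code.

```python
# Illustrative only: a --num-workers flag that falls back to the detected
# CPU count when not given (not DeepHyper's actual CLI implementation).
import argparse
import os

parser = argparse.ArgumentParser(description="illustrative HPS launcher")
parser.add_argument(
    "--num-workers",
    type=int,
    default=None,  # None means "auto-detect below"
    help="number of parallel evaluations; defaults to the CPU count",
)

args = parser.parse_args(["--num-workers", "4"])
num_workers = args.num_workers or os.cpu_count() or 1
print(num_workers)  # 4
```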
Awesome!
It is definitely a good question to ask... As you said, it probably depends on the model (for neural networks: size and type of layers). For CNNs it definitely makes sense to use the GPUs.
Also, AMBS is based on Bayesian optimization, which is a sequential process. In DeepHyper we use a liar strategy to make it asynchronous, but still, optimizing sequentially can be very efficient too. Finally, the acquisition-function estimator can be parallelized on the CPU cores. So a good setup is probably to use the CPU for this estimator (at least part of it) and use the GPUs to distribute models. I will leave this issue open so that you can share your findings. Thanks!
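A toy sketch of the liar idea, with all names illustrative rather than DeepHyper internals: while a candidate is still being evaluated, the optimizer is told a constant "lie" (e.g. the current best objective) so the surrogate can already propose the next point instead of blocking.

```python
# Toy sketch of the constant-liar strategy that makes a sequential Bayesian
# optimizer asynchronous. DummyOptimizer is a trivial stand-in surrogate;
# none of these names come from DeepHyper's code.
class DummyOptimizer:
    def __init__(self):
        self.history = []  # (x, y) pairs, real or lied

    def ask(self):
        # Trivial proposal rule just to make the loop run: next integer.
        return len(self.history)

    def tell(self, x, y):
        self.history.append((x, y))

def propose_batch(opt, batch_size, lie):
    # Ask for batch_size points, feeding back the lie after each ask so the
    # next ask can proceed without waiting for a real evaluation.
    batch = []
    for _ in range(batch_size):
        x = opt.ask()
        opt.tell(x, lie)  # constant lie, replaced by the real value later
        batch.append(x)
    return batch

opt = DummyOptimizer()
print(propose_batch(opt, 4, lie=0.0))  # [0, 1, 2, 3]
```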
Hi @Deathn0t ,
I attach some slides summarizing the studies I made on the 3 parallelization strategies I mentioned above. In short, distributing the HPS with as many Ray tasks as I have GPUs gives the highest performance in my case. I agree with you that, apart from the fully connected layer case, GPU usage should greatly enhance performance. In my case, the data/batch size is also not large enough to benefit from distributing the training (data) over multiple GPUs.
Regarding the memory problem I faced, I tried the solution using multiprocessing proposed here, but it didn't work for me. When I use GPUs I get the error:
E tensorflow/stream_executor/cuda/cuda_driver.cc:1045] failed to enqueue async memcpy from host to device: CUDA_ERROR_NOT_INITIALIZED: initialization error; GPU dst: 0x7f7f63000d00; host src: 0x5614a238a040; size: 8=0x8
while when I use the CPU, the Keras model.fit() method hangs. I should note, though, that in that thread people comment that the solution does not work for TF 2.2 on RTX cards, which is exactly my case.
Slides: DH-distributed.pdf
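For completeness, the pattern from that thread is sketched below with a dummy objective in place of Keras: each training runs in a fresh child process, so all GPU memory is released when the process exits. Whether this actually works depends on the TF/driver combination, as noted above.

```python
# Sketch of the subprocess-per-training pattern, with a dummy objective in
# place of a Keras model. In a real version, tensorflow would be imported
# *inside* train_once so CUDA initializes per-process, and a "spawn" start
# method is typically needed to avoid inheriting CUDA state via fork.
import multiprocessing as mp

def train_once(config, queue):
    # Stand-in for building + fitting a Keras model and scoring it.
    score = -(config - 3) ** 2
    queue.put(score)

def evaluate(config):
    queue = mp.Queue()
    proc = mp.Process(target=train_once, args=(config, queue))
    proc.start()
    score = queue.get()  # read the result before joining the child
    proc.join()          # child exit releases all of its (GPU) memory
    return score

if __name__ == "__main__":
    print(evaluate(3))  # 0
```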
Hello @ekourlit, the slides are awesome! I am quite happy with the results shown in them because they confirm some experiments I did at scale (without GPUs), using data parallelism across different CPUs. For data parallelism it is important to increase the batch size and learning rate, as mentioned in this work (link to article). But, as you mentioned, there is a limitation on the size of the dataset...
The speed-up with parallel training on GPUs looks quite encouraging! I am not running on GPUs right now, so it will be hard for me to help you with this memory issue...
I will now close this issue because the questions raised have been answered, but I encourage you to open a new one describing the memory issue.
Thank you again for these well-documented results!