Comments (19)
Browsing either site is extremely slow on my end, but from the home pages they both look promising for the job! As long as installation can be done via pip, setting them up should not be a problem.
from sos.
I have implemented a simple version of parallel execution that allows step processes within the same step to run in parallel. It is very crude, but to get there I already had to change the SoS syntax so that step processes are executed in separate processes, independent of SoS itself. This is necessary in any case, because otherwise no step process could be executed safely outside of SoS (e.g. submitted to Celery as an independent task).
I am using a multiprocessing pool, but we can switch to Python 3's async libraries or to Celery once we have the DAG correct. I suspect we can swap the backend executor for different running environments.
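The swappable-backend idea can be sketched with `concurrent.futures`, whose thread and process executors share one interface. The names below (`run_steps`, `run_one`) are illustrative, not SoS's actual API:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def run_one(step):
    # placeholder for executing one SoS step process
    return step.upper()

def run_steps(steps, executor_cls=ProcessPoolExecutor, max_workers=2):
    # Any executor implementing the concurrent.futures interface can serve
    # as the backend; a Celery-based executor could expose the same API,
    # so the scheduling code never has to change.
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(run_one, steps))
```

Calling `run_steps(steps, ThreadPoolExecutor)` switches the backend without touching the scheduling logic, which is the flexibility described above.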
from sos.
Great, looking forward to testing this feature in the next couple of days!
from sos.
Celery is now officially used in SoS, because a multiprocessing bug (feature) prevents us from spawning a new workflow from a workflow step (nested workflow). `celery.billiard` is used instead and works OK now.
Note that we do not need a full DAG to execute steps in parallel. All we need to do is:
- get the dependencies of all steps;
- put all steps in a pool and execute any step (up to `-j`) whose dependencies are met;
- whenever a step completes, update the pool and execute any newly unblocked steps.
SoS is now Celery-ready (things are submitted in separate processes), so once the above is done we can easily extend the code to clusters.
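The three-step scheduling loop above can be sketched as follows. Threads stand in here for the separate processes SoS actually uses, so the example stays self-contained; `run_step` is a placeholder, not SoS code:

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_step(name):
    # placeholder for executing one step process
    return name

def schedule(deps, max_workers=2):
    """Run steps as soon as their dependencies are met.

    deps maps each step to the set of steps it depends on; the -j limit
    corresponds to the executor's worker count.
    """
    done, running = set(), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(done) < len(deps):
            # submit every step that is not finished, not running,
            # and whose dependencies are all completed
            for step, reqs in deps.items():
                if step not in done and step not in running.values() \
                        and reqs <= done:
                    running[pool.submit(run_step, step)] = step
            # block until at least one running step completes,
            # then update the pool and loop again
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                done.add(running.pop(fut))
    return done
```

No DAG object is ever built; readiness is rechecked against the `done` set after each completion, exactly as the bullet points describe.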
from sos.
Great! But I noticed big performance issues after I upgraded. For example, with the attached test.sos.txt, the command
`sos run test.sos.txt DSC -d`
takes 12 seconds, which I believe was 1 second before the update. Is it the same problem on your end, or is something wrong with my packages?
from sos.
Confirmed. The script took about 4 s without the patch and 12 s with it. So the Celery stuff is that costly? This is certainly unexpected, although I tend to think 8 s is nothing for workflows if that is the cost of creating processes.
from sos.
I just worry that the slowness is proportional to the number of commands implied by the script (e.g. concurrent for loops)... in that case parallelization with Celery will actually hurt...
from sos.
The problem is with `pool.close()` and `pool.join()`, which wait about 1 s. So there is roughly a 1 second wait for the completion of each workflow, and your example is the worst case because it has 9 workflows. I am still investigating, because some `pool.close()` calls are fast.
```
ERROR: start waiting
ERROR: Step completed 0.006140947341918945
ERROR: Step completed 0.0060999393463134766
ERROR: results returned 0.007298946380615234
ERROR: wait join 1.0097160339355469
ERROR: Step completed 3.094782829284668
ERROR: start waiting
ERROR: results returned 1.7881393432617188e-05
ERROR: wait join 9.083747863769531e-05
```
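Timings like these can be collected with a small harness around the pool shutdown. The sketch below uses the thread-backed `multiprocessing.dummy.Pool` (same API) so it is self-contained; the ~1 s wait in the log came from the process-backed `Pool`:

```python
import time
from multiprocessing.dummy import Pool  # thread-backed, same API as Pool

def step(i):
    # placeholder step process
    return i * i

pool = Pool(2)
results = pool.map(step, range(4))
pool.close()

start = time.time()
pool.join()  # the ~1 s wait in the log was observed at this call
print('wait join', time.time() - start)
```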
from sos.
If the slowness is at the workflow level then I agree we can live with it. There must be a good reason Celery decides to wait. But it is interesting that the wait times differ by several orders of magnitude!
from sos.
Not sure how easy it would be, but if we make `-j1` not use Celery at all, then at least dryrun will not be that frustrating, e.g. by faking a "null" interface behind the Celery one and using that for `-j1`.
from sos.
I have investigated this issue further and it turns out that `pool.join` is slow; the join is only fast when there is a single process. The current behavior is wrong anyway, because `-j1` should not trigger the pool even when `concurrent=True`. I have fixed this issue, so your example should now run in sequential mode and be fast, without all the overhead of spawning processes.
Overall I do not think this is a real problem, because step processes are supposed to run much longer than 1 s and can benefit from multiprocessing. Fast statements should be placed outside of or before `process` so that they are always executed sequentially. SoS certainly allows such flexibility.
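The `-j1` fix can be sketched as a dispatcher that bypasses the pool entirely for sequential runs. Names here are illustrative, and the thread-backed dummy `Pool` keeps the sketch self-contained:

```python
from multiprocessing.dummy import Pool  # thread-backed stand-in for Pool

def run(step):
    # placeholder for executing one step process
    return step.upper()

def execute(steps, jobs=1):
    """With jobs == 1 (-j1), use a plain loop so sequential runs never
    pay the pool.close()/pool.join() overhead described above."""
    if jobs <= 1:
        return [run(s) for s in steps]  # no pool, no join wait
    pool = Pool(jobs)
    try:
        return pool.map(run, steps)
    finally:
        pool.close()
        pool.join()
```

The design point is that the concurrency flag decides whether a pool exists at all, not merely how many workers it has.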
from sos.
Good! I confirm that `-j1` works, and I agree the overhead is acceptable.
from sos.
If I understand correctly, a so-called Celery cluster requires us to:
- start a Celery worker process on each of a few computing nodes;
- set up message passing between the nodes;
- distribute tasks to the workers.
I like this approach because the current VPT approach requires us to use `ssh headnode qsub jobs` to submit big jobs; we have no control over the submitted jobs and can only quit or wait for the tasks to complete.
Also, `celery.group` etc. and the Flower monitoring system might be helpful for us.
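A minimal sketch of that setup, assuming a Redis broker and a hypothetical `sos_tasks` module; this is illustrative configuration, not SoS's actual Celery integration, and it needs `celery` installed plus a running broker:

```python
# sos_tasks.py -- hypothetical Celery app for running step processes
from celery import Celery

app = Celery('sos_tasks',
             broker='redis://headnode:6379/0',    # message passing
             backend='redis://headnode:6379/0')   # result storage

@app.task
def run_step(script):
    # execute one step process on whichever worker picks up the task
    import subprocess
    return subprocess.call(['bash', '-c', script])

# on each computing node:   celery -A sos_tasks worker --loglevel=info
# from the submitting node: run_step.delay('echo done')
```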
from sos.
http://dask.pydata.org/en/latest/ also looks promising.
For the record:
- http://distributed.readthedocs.org/en/latest/related-work.html
- dask vs. spark: http://dask.pydata.org/en/latest/spark.html
- more on task queues and Celery: https://www.fullstackpython.com/task-queues.html
from sos.
You can also have a look at snakemake's DAG class. At this point we can actually learn many things from snakemake, such as DAG construction and cluster support. It is called stealing, though. :-)
from sos.
Yes, that is a 900-line Python script. I was under the impression that it requires the graph structure to be known at the beginning, so I thought it might not be a good option. I will check whether that is the case.
Snakemake's cluster support might have problems, though:
from sos.
I see from that thread that snakemake runs on each node... this is not what I have in mind, because I would like to send the jobs to the computing nodes. But that approach has the advantage that it might run a whole branch of jobs on a node, instead of a single job...
from sos.
To me, if we cannot interact with the cluster directly and have to rely on qsub as snakemake does, it is no better than supporting an `--export` command that prepares all the resources and exports commands/scripts as "parallelizable" batches, so that users can easily submit the jobs themselves. I think this is also easier to troubleshoot; on a cluster system it helps to be more transparent, and attempts to interact with the cluster may be ill-fated. The question in that thread may well be specific to one cluster environment, but it ended up becoming snakemake's headache.
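The `--export` idea could work roughly as follows; `export_batches` is a hypothetical helper, not an existing SoS feature:

```python
import os

def export_batches(commands, outdir, batch_size=4):
    """Write the commands implied by a script into numbered shell
    scripts that users can submit to the cluster (e.g. via qsub)
    themselves, keeping the process transparent and easy to debug."""
    paths = []
    for i in range(0, len(commands), batch_size):
        path = os.path.join(outdir, 'batch_%d.sh' % (i // batch_size))
        with open(path, 'w') as f:
            f.write('#!/bin/bash\n')
            f.write('\n'.join(commands[i:i + batch_size]) + '\n')
        paths.append(path)
    return paths
```

Each batch is an ordinary shell script, so a failing job can be rerun or inspected without involving the workflow engine at all.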
from sos.
Discussed on more specific threads.
from sos.