Comments (4)
Buried in the help page ?SnowParam is this note:
NOTE: The \code{PSOCK} cluster from the \code{parallel} package does not
support cluster options \code{scriptdir} and \code{useRscript}. \code{PSOCK}
is not supported because these options are needed to re-direct to an
alternate worker script located in BiocParallel.
But naive testing suggests this no longer seems to be the case (either because of changes in parallel or BiocParallel) so I have started a 'PSOCK' branch.
Is there an easy way to generate the socket connection error?
from biocparallel.
Buried in the help page ?SnowParam is this note:
NOTE: The \code{PSOCK} cluster from the \code{parallel} package does not support cluster options \code{scriptdir} and \code{useRscript}. \code{PSOCK} is not supported because these options are needed to re-direct to an alternate worker script located in BiocParallel.
But naive testing suggests this no longer seems to be the case (either because of changes in parallel or BiocParallel) ...
I missed that note. I don't think I've ever seen argument scriptdir or useRscript in the parallel package. They don't appear if one searches https://hughjonesd.shinyapps.io/rcheology/.
Looking at snow, it looks like scriptdir is used to point to the R script that runs the parallel workers. If so, then that's handled by parallel without scripts using an internal function, e.g.
'/path/to/lib/R/bin/Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.workRSOCK()' MASTER=localhost PORT=11312 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential
... so I have started a 'PSOCK' branch.
Excellent.
Is there an easy way to generate the socket connection error?
I don't think so. It's a race condition that appears when many R processes try to create a cluster using the same port. Given that the default is to pick a random port from 11000:11999, it only happens once in a while, but if you run enough checks in parallel you hit it often enough for it to add friction. Before R 4.0.0, I saw it happen to the future package once in a while on the CRAN servers, because I do tons of testing there. It disappeared at the next round of checks.
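To put a rough number on "once in a while", here is a minimal sketch of the birthday-problem arithmetic. It assumes each of k R processes draws its port uniformly and independently from the 1000 ports in 11000:11999, and that all k are setting up their clusters at the same moment. Both are simplifying assumptions, and collision_prob is a name made up for this illustration, not something from parallel or BiocParallel.

```r
# Probability that at least two of k processes pick the same port when each
# draws uniformly at random from n_ports candidates (assumed model, see above).
collision_prob <- function(k, n_ports = 1000L) {
  stopifnot(k >= 1, k <= n_ports)
  # P(no collision) = prod_{i = 0}^{k - 1} (1 - i / n_ports), computed on the
  # log scale for numerical stability.
  i <- seq_len(k) - 1
  1 - exp(sum(log1p(-i / n_ports)))
}

# Collision probability for a few levels of concurrency.
probs <- sapply(c(2, 5, 10, 20), collision_prob)
print(round(probs, 4))
```

The probability grows roughly quadratically in k, which is consistent with the problem being rare for a single check but noticeable when many checks run side by side.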
BTW, I'm not sure, but I think the race condition could also play out like this: one R CMD check launches parallel workers, and another one actually connects to those workers. If the latter were fast enough, it could complete successfully, but if the original check terminated first, it would shut down those workers, breaking the check for the other package. The SOCK/PSOCK protocol does not prevent a non-owning R process from connecting, including processes run by other users. This is actually a security issue on multi-user servers, but that's another story.
Yes, parallel's implementation doesn't allow customization of the worker startup script, whereas snow (& therefore SOCK, MPI, FORK) can (and are, by BiocParallel) be customized.
Looking a little more deeply makes it seem likely that BiocParallel's log = TRUE option would be affected, which you can see in the 'Log messages' and 'stdout' sections:
> res <- bplapply(1:2, message, BPPARAM = SnowParam(type = "PSOCK", log = TRUE))
############### LOG OUTPUT ###############
Task: 1
Node: 6
Timestamp: 2022-11-21 17:34:29.946785
Success: TRUE
Task duration:
   user  system elapsed
  0.186   0.007   0.198
Memory used:
          used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
Ncells 1209511 64.6    2057557 109.9         NA  2057557 109.9
Vcells 2873984 22.0    8388608  64.0      32768  8388267  64.0
Log messages:
stderr and stdout:
...
versus
res <- bplapply(1:2, message, BPPARAM = SnowParam(type = "SOCK", log = TRUE))
############### LOG OUTPUT ###############
Task: 2
Node: 5
Timestamp: 2022-11-21 17:34:36.612367
Success: TRUE
Task duration:
   user  system elapsed
  0.090   0.006   0.109
Memory used:
          used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
Ncells 1209512 64.6    2057557 109.9         NA  2057557 109.9
Vcells 2873982 22.0    8388608  64.0      32768  8388267  64.0
Log messages:
INFO [2022-11-21 17:34:36] loading futile.logger package
stderr and stdout:
2
############### LOG OUTPUT ###############
Yes, parallel's implementation doesn't allow customization of the worker startup script, whereas snow (& therefore SOCK, MPI, FORK) can (and are, by BiocParallel) be customized.
You can probably use rscript_args to customize the startup process of each worker, e.g. rscript_args = c("-e", shQuote('setwd("/path/to")')).
FWIW, I've made some of these things easier and more robust in parallelly::makeClusterPSOCK().
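As a minimal sketch of the rscript_args suggestion, the snippet below starts two PSOCK workers with an extra -e expression and then asks each worker for its working directory. The choice of tempdir() as the target directory and the variable names are assumptions made for illustration. It also assumes a Unix-like shell for quoting; the quoting may need adjusting on Windows.

```r
library(parallel)

# Illustrative target directory (an assumption); tempdir() always exists.
worker_dir <- normalizePath(tempdir(), winslash = "/")

# Extra -e expression for Rscript, intended to run before the worker loop starts.
startup <- shQuote(sprintf('setwd("%s")', worker_dir))

# rscript_args is passed through to the Rscript call that launches each worker.
cl <- makePSOCKcluster(2L, rscript_args = c("-e", startup))

# Ask every worker where it is running.
worker_dirs <- unlist(clusterEvalQ(cl, normalizePath(getwd(), winslash = "/")))
stopCluster(cl)

print(worker_dirs)
```

If the startup expression took effect, each worker should report the directory set above rather than the directory of the calling session.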