Comments (4)
To fix this I would add a parameter foreachPars to the l*ply functions. Within llply, a small change would be needed in the call to foreach:
result = foreach(i = seq_len(n)) %dopar% do.ply(i)
changes into:
foreachPars = c(list(i = seq_len(n)), foreachPars)
result = do.call('foreach', foreachPars) %dopar% do.ply(i)
I think this would work the way I intend. The use would be:
ldply(dat, .(category), bla, .parallel = TRUE, foreachPars = list(.export = 'y'))
The advantage would be that all foreach parameters can be used in a call to ldply without adding each of them explicitly to ldply.
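The do.call idea above can be tried outside plyr. This is a minimal sketch, assuming the foreach package is installed. It uses the sequential %do% operator so that no parallel backend has to be registered, and the option list and loop body are made up for illustration. The iteration variable is wrapped in list() so that c() yields a single named argument i rather than one argument per element.

```r
library(foreach)

# Extra foreach options supplied by the caller (illustrative value only)
foreachPars <- list(.combine = "c")

# Wrap the iteration variable in list() so it stays one argument named 'i'
args <- c(list(i = seq_len(3)), foreachPars)

# Build the foreach object from the argument list, then run the loop body
result <- do.call(foreach, args) %do% (i * 2)
print(result)
```

Swapping %do% for %dopar% and adding .export or .packages to the list would follow the same pattern. plyr's documentation describes a .paropts argument on the *ply functions that passes such a list through to foreach.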
from plyr.
I have a fix for this issue which involves no changes to plyr. Once a cluster is active, one can use clusterExport to load variables into the workers. An example:
library(plyr)  # provides ddply() and .()
# Functions
createCluster = function(noCores, logfile = "") {
  require(doSNOW)
  cl = makeCluster(noCores, type = "SOCK", outfile = logfile)
  registerDoSNOW(cl)
  return(cl)
}
bla = function(arg) {
  return(arg$x*y)
}
# Constants
y = 10
dat = data.frame(x = 1:10, category = LETTERS[1:10])
# Create a cluster
cl = createCluster(2)
# Fails
# Error in do.ply(i) : task 1 failed - "object 'y' not found"
ddply(dat, .(category), bla, .parallel = TRUE)
# Works!
clusterExport(cl, list("y"))
ddply(dat, .(category), bla, .parallel = TRUE)
The same approach is possible for libraries (found this on stackoverflow):
clusterEvalQ(cl, library(boot))
I was far too focused on solving this within foreach, while the solution was already there: load things directly into the workers :).
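The same two calls can be tried without plyr or doSNOW. This is a minimal sketch, assuming the parallel package that ships with R, which provides makeCluster, clusterExport and clusterEvalQ with the same roles as in snow. The helper functions scale_by_y and ext_of are made up for illustration.

```r
library(parallel)

cl <- makeCluster(2)

y <- 10
scale_by_y <- function(x) x * y    # needs 'y' to exist on each worker
ext_of <- function(f) file_ext(f)  # needs 'tools' attached on each worker

# Copy 'y' from the current environment into every worker
clusterExport(cl, "y", envir = environment())
# Attach a package on every worker
invisible(clusterEvalQ(cl, library(tools)))

scaled <- parSapply(cl, 1:3, scale_by_y)
exts <- parSapply(cl, c("a.txt", "b.csv"), ext_of, USE.NAMES = FALSE)

stopCluster(cl)
```

Without the clusterExport and clusterEvalQ lines, the workers would have no way to resolve y or file_ext on their own.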
The following example includes an extended version of 'createCluster' that supports passing objects to export and libraries to load. It requires an adapted version of clusterExport, because the variable to be exported has to be looked up in the environment of the calling function rather than in .GlobalEnv.
library(plyr)     # provides l_ply() and ddply()
library(ggplot2)
# Functions
# Adapted clusterExport: looks up each variable in 'envir' instead of only in .GlobalEnv
clusterExport = local({
  gets = function(n, v) { assign(n, v, envir = .GlobalEnv); NULL }
  function(cl, list, envir = .GlobalEnv) {
    ## do this with only one clusterCall--loop on slaves?
    for (name in list) {
      clusterCall(cl, gets, name, get(name, envir = envir))
    }
  }
})
createCluster = function(noCores, logfile = "/dev/null", export = NULL, lib = NULL) {
  require(doSNOW)
  cl = makeCluster(noCores, type = "SOCK", outfile = logfile)
  # Export the requested objects from .GlobalEnv to every worker
  if(!is.null(export)) clusterExport(cl, export)
  # Load each requested library on every worker
  if(!is.null(lib)) {
    l_ply(lib, function(dum) {
      # 'dum' lives in this function's environment, hence envir = environment()
      clusterExport(cl, "dum", envir = environment())
      clusterEvalQ(cl, library(dum, character.only = TRUE))
    })
  }
  registerDoSNOW(cl)
  return(cl)
}
library(ggplot2)
library(doSNOW)
bla = function(arg) {
  dum = ggplot(aes(x = x, y = x), data = arg)
  summary(dum)
  xi = bla2(arg$x)
  return(arg$x*xi)
}
bla2 = function(arg) {
  return(arg + 1)
}
# Constants
y = 10
dat = data.frame(x = 1:10, category = LETTERS[1:10])
# Create a cluster
# Fails
# Error in do.ply(i) : task 1 failed - "could not find function "ggplot""
cl = createCluster(2)
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
# Fails: the package is loaded, but function 'bla2' is not exported
# Error in do.ply(i) : task 1 failed - "could not find function "bla2""
cl = createCluster(2, lib = list("ggplot2"))
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
# Works! Also export the function 'bla2' and object 'y'
cl = createCluster(2, export = list("bla2","y"), lib = list("ggplot2"))
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
# Sanity check
all.equal(res, ddply(dat, .(category), bla, .parallel = FALSE))
# TRUE!
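The environment lookup that the adapted clusterExport adds can also be sketched with the parallel package that ships with R, whose clusterExport accepts an envir argument. This is only an illustration under that assumption. The helper start_cluster and the variable offset are hypothetical names, not part of plyr, snow or the example above.

```r
library(parallel)

# Hypothetical helper: start a cluster and export variables that live in the
# caller's environment instead of .GlobalEnv.
start_cluster <- function(n_cores, export = NULL, envir = parent.frame()) {
  cl <- makeCluster(n_cores)
  if (!is.null(export)) clusterExport(cl, export, envir = envir)
  cl
}

run <- function() {
  offset <- 5  # local to run(), not visible in .GlobalEnv
  cl <- start_cluster(2, export = "offset")
  on.exit(stopCluster(cl))
  # Ask every worker for its copy of 'offset'
  unlist(clusterEvalQ(cl, offset))
}

out <- run()
```

Because envir defaults to parent.frame(), the helper picks up variables from whichever function calls it, which is the behaviour the adapted clusterExport was written to provide.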
Duplicate of #84 (closing this one because there's more discussion there).