Expose more foreach settings (plyr issue, closed)

hadley commented on July 17, 2024

Expose more foreach settings

from plyr.

Comments (4)

PaulHiemstra commented on July 17, 2024

To fix this I would add a parameter foreachPars to the l*ply functions. Within llply a small change would be needed in the call to foreach:

result = foreach(i = seq_len(n)) %dopar% do.ply(i)

changes into:

# wrap i in list() so that c() does not splice the index vector into separate arguments
foreachPars = c(list(i = seq_len(n)), foreachPars)
result = do.call('foreach', foreachPars) %dopar% do.ply(i)

I think this should work the way I intend. The use would be:

ddply(dat, .(category), bla, .parallel = TRUE, foreachPars = list(.export = 'y'))

The advantage would be that every foreach parameter can be used in a call to ddply without adding each of them explicitly to the d*ply and l*ply signatures.
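
A minimal sketch of the do.call idea outside plyr, assuming the foreach package is installed. The name foreachPars follows the proposal above and is not an existing plyr argument; %do% is used instead of %dopar% so that no parallel backend has to be registered, and the squared index stands in for do.ply(i).

```r
library(foreach)

n <- 3

# Extra foreach settings a caller might want to pass through.
# `foreachPars` is the hypothetical argument name from the proposal.
foreachPars <- list(.combine = "c")

# Wrap the iteration variable in list() so that c() produces one named
# argument `i` followed by the extra settings.
args <- c(list(i = seq_len(n)), foreachPars)

# Build the foreach object from the argument list, then run the body
# sequentially with %do%.
result <- do.call(foreach, args) %do% (i^2)
print(result)
```

Any other foreach setting, such as .export or .packages, could be added to the list in the same way.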

PaulHiemstra commented on July 17, 2024

I have a fix for this issue which involves no changes to plyr. Once a cluster is active, one can use clusterExport to load variables into the workers. An example:

library(plyr)  # provides ddply(); recent ggplot2 versions no longer attach plyr

# Functions
createCluster = function(noCores, logfile = "") {
  require(doSNOW)
  cl = makeCluster(noCores, type = "SOCK", outfile = logfile)
  registerDoSNOW(cl)
  return(cl)
}

bla = function(arg) {
  return(arg$x*y)
}

# Constants
y = 10
dat = data.frame(x = 1:10, category = LETTERS[1:10])

# Create a cluster
cl = createCluster(2)

# Fails
#   Error in do.ply(i) : task 1 failed - "object 'y' not found"
ddply(dat, .(category), bla, .parallel = TRUE)

# Works!
clusterExport(cl, list("y"))
ddply(dat, .(category), bla, .parallel = TRUE)

The same approach works for loading libraries on the workers (found this on Stack Overflow):

clusterEvalQ(cl, library(boot))

I was far too obsessed with solving this within foreach, while the solution was already there: load things directly into the workers :).
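
The same export-then-run pattern can be sketched with the base parallel package, which provides makeCluster, clusterExport and clusterEvalQ without needing snow or doSNOW. This is an illustration under that substitution, not the setup used above; the helper scale_by_y is a made-up name, and library(stats) is loaded on the workers only as a stand-in for any package.

```r
library(parallel)

y <- 10

# Hypothetical worker function: relies on `y` being visible on the worker.
scale_by_y <- function(x) x * y

cl <- makeCluster(2)

# Copy `y` from the calling environment into each worker's global environment.
clusterExport(cl, "y", envir = environment())

# Attach a package on every worker.
invisible(clusterEvalQ(cl, library(stats)))

res <- unlist(parLapply(cl, 1:3, scale_by_y))
stopCluster(cl)
print(res)
```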

PaulHiemstra commented on July 17, 2024

The following example includes an extended version of 'createCluster' which supports passing on objects to export and libraries to load. It requires an adapted version of clusterExport because it needs to find the variable to be exported not in the .GlobalEnv, but in the environment of the function.

library(plyr)     # provides ddply() and l_ply()
library(ggplot2)

# Functions
clusterExport = local({
  gets = function(n, v) { assign(n, v, envir = .GlobalEnv); NULL }
  function(cl, list, envir = .GlobalEnv) {
    ## do this with only one clusterCall--loop on slaves?
    for (name in list) {
      clusterCall(cl, gets, name, get(name, envir = envir))
    }
  }
})
 
# Functions
createCluster = function(noCores, logfile = "/dev/null", export = NULL, lib = NULL) {
  require(doSNOW)
  cl = makeCluster(noCores, type = "SOCK", outfile = logfile)
  if(!is.null(export)) clusterExport(cl, export)
  if(!is.null(lib)) {
    l_ply(lib, function(dum) { 
      clusterExport(cl, "dum", envir = environment())
      clusterEvalQ(cl, library(dum, character.only = TRUE))
    })
  }
  registerDoSNOW(cl)
  return(cl)
}

library(ggplot2)
library(doSNOW)
 
bla = function(arg) {
  dum = ggplot(aes(x = x, y = x), data = arg)
  summary(dum)
  xi = bla2(arg$x)
  return(arg$x*xi)
}
 
bla2 = function(arg) {
  return(arg + 1)
}
 
# Constants
y = 10
dat = data.frame(x = 1:10, category = LETTERS[1:10])
 
# Create a cluster
 
# Fails
# Error in do.ply(i) : task 1 failed - "could not find function "ggplot""
cl = createCluster(2)
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
 
# Fails: the package is loaded, but the function 'bla2' is not exported
# Error in do.ply(i) : task 1 failed - "could not find function "bla2""
cl = createCluster(2, lib = list("ggplot2"))
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
 
# Works! Also export the function 'bla2' and object 'y'
cl = createCluster(2, export = list("bla2","y"), lib = list("ggplot2"))
res = ddply(dat, .(category), bla, .parallel = TRUE)
stopCluster(cl)
 
# Sanity check
all.equal(res, ddply(dat, .(category), bla, .parallel = FALSE))
# TRUE!
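
For comparison, clusterExport in the base parallel package takes an envir argument, so with that package an adapted copy may not be needed to export a variable that lives inside a function. A small sketch, assuming parallel rather than snow; run_with_offset and its arguments are made-up names for illustration.

```r
library(parallel)

# Hypothetical helper: `offset` exists only in this function's frame,
# not in the caller's global environment.
run_with_offset <- function(cl, values, offset) {
  # Look up `offset` in the current function environment and assign it
  # into each worker's global environment.
  clusterExport(cl, "offset", envir = environment())
  # Read `offset` back from the worker's global environment to show
  # that the export arrived there.
  parSapply(cl, values, function(v) v + get("offset", envir = .GlobalEnv))
}

cl <- makeCluster(2)
out <- run_with_offset(cl, 1:3, offset = 100)
stopCluster(cl)
print(out)
```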

hadley commented on July 17, 2024

Duplicate of #84 (closing this one because there's more discussion there).
