
foreach's People

Contributors

beccadaniel, hongooi73, richcalaway, steveweston


foreach's Issues

Move packages into their own repo and update README?

Currently both the {foreach} and {iterators} packages live in the same repository, inside a directory called pkgs.

This layout is quite uncommon in the R community and might scare some people away from contributing.

Also, a Markdown-formatted README introducing the packages would help make these projects more welcoming to potential contributors.

Parameter '.combine' does not give consistent results

I was expecting all of these to return a matrix, but the first one outputs a simple vector.

library(foreach)

foreach(i = 1, .combine = "cbind") %do% {
  i + 1:10
}

foreach(i = 1:2, .combine = "cbind") %do% {
  i + 1:10
}

do.call("cbind", foreach(i = 1) %do% {
  i + 1:10
})
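A likely explanation (not confirmed in the docs): the .combine function is only applied when there is more than one result to combine, so a single-iteration loop returns the task's value as-is. A minimal sketch of a workaround, if a matrix is always wanted, is to normalise the result afterwards:

library(foreach)

res <- foreach(i = 1, .combine = "cbind") %do% { i + 1:10 }

# Guard against the single-task case so downstream code always sees a matrix
if (!is.matrix(res)) res <- cbind(res)
dim(res)  # 10 x 1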

.noexport not working as expected

Hi there,

The .noexport argument of foreach() doesn't seem to work as expected when both the function being looped over and the call to foreach() itself live inside other functions. Please see the example below.

library(dplyr)
library(foreach)
library(timetk)

# function to call foreach and run in parallel
submit_par <- function(iterator, fn) {
  
  cl <- parallel::makeCluster(3)
  doParallel::registerDoParallel(cl)
  
  temp <- foreach::foreach(i = iterator, 
                           .combine = 'rbind',
                           .packages = c("dplyr"),
                           .export = NULL, 
                           .errorhandling = "stop", 
                           .verbose = FALSE, 
                           .inorder = FALSE, 
                           .multicombine = TRUE, 
                           .noexport = c("combos")
  ) %dopar% {fn(i)}
  
  parallel::stopCluster(cl)
  
  return(temp)
}

# main function that takes some data and calls foreach
outer_fn <- function(input_tbl) {
  
  data <- input_tbl %>%
    dplyr::filter(date > "2010-01-01")
  
  combos <- unique(data$id)
  
  par_fn <- function(i) {
    
    df <- data %>%
      dplyr::filter(id == i)
    
    return(exists("combos"))
  }
  
  output <- submit_par(combos, par_fn)
  
  return(output)
}

# call to function with some example data
outer_fn(timetk::m4_monthly)

Here is what is returned from the outer_fn call

         [,1]
result.1 TRUE
result.2 TRUE
result.3 TRUE
result.4 TRUE

I'm expecting to see all FALSE values, since "combos" was listed in the .noexport argument and therefore shouldn't be exported to the cluster, but it turns out it is being exported anyway. Is there anything I need to change in the functions above to ensure certain objects are not exported? I've played around with removing objects from specific environments inside the "outer_fn" environment before calling foreach, but that turns into a slippery slope if those objects need to be used in any way after the foreach call.

Thanks for your help! I'm trying to migrate my CRAN package (finnts) to run on Spark using sparklyr, and there is a limit to the amount of data that can be serialized when running foreach on Spark, so I need an easy way to keep specific objects from being exported to the compute clusters (whether through doParallel or sparklyr).
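A likely explanation: .noexport only controls which globals foreach exports directly; here "combos" also travels inside the enclosing environment of fn (the outer_fn frame), which is serialized together with the function when it is sent to the workers. A possible workaround, shown as a sketch rather than a foreach feature, is to give par_fn a trimmed environment that contains only what it needs:

outer_fn <- function(input_tbl) {
  
  data <- input_tbl %>%
    dplyr::filter(date > "2010-01-01")
  
  combos <- unique(data$id)
  
  par_fn <- function(i) {
    
    df <- data %>%
      dplyr::filter(id == i)
    
    return(exists("combos"))
  }
  
  # Keep only `data` in par_fn's environment, so combos no longer rides along
  environment(par_fn) <- list2env(list(data = data), parent = globalenv())
  
  output <- submit_par(combos, par_fn)
  
  return(output)
}

With this change the workers see only data (plus whatever .packages loads), so exists("combos") should now be FALSE on every worker.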

How to combine 'foreach' with "shinybusy"?

Hi, thanks for creating this excellent package!

I want to add a progress bar for "foreach" progress in my Shiny app. However, the progress bar stops (at around 88%) before reaching the end. I can't figure out why this happens; could you help?
The following is a demo of the issue:
library(shiny)
library(shinybusy)
library(doParallel); registerDoParallel(cores = 4)

n <- 100
f <- function(n) {
  m <- 0
  function(...) {
    m <<- m + length(list(...)) - 1
    Sys.sleep(0.1)
    update_modal_progress(value = m/n)
  }
}

ui <- fluidPage(
  tags$h1("Modal with progress bar"),
  actionButton("sleep1", "Launch a long calculation")
)

server <- function(input, output, session) {
  observeEvent(input$sleep1, {
    show_modal_progress_line()
    result <- foreach(i = 1:n, .combine = f(n)) %dopar% {
      rnorm(10)
    }
    remove_modal_progress()
  })
}

shinyApp(ui, server)

Keep names in the output list

This is a feature request. It would be really nice to keep the names of the iterated list in the results. Here's an example.

xList <- list(a = 1, b = 2)
foreach(x = xList) %do% x^2

returns

[[1]]
[1] 1

[[2]]
[1] 4

but a named list similar to that returned by lapply() would be useful:

$a
[1] 1

$b
[1] 4

CONSISTENCY: %dopar% with doSEQ should evaluate expression in `local()` environment

I'd like to suggest that the doSEQ backend evaluates the %dopar% expression in a local() environment. This will help clarify that "global" assignments should not be made within %dopar% expressions. Currently, the latter is a common misunderstanding and one of the FAQs on foreach.

Details

Currently, we have that %dopar% falls back to using the doSEQ backend if no %dopar% backend is registered. Now, doSEQ evaluates the expression in the parent frame, which means that all assignments end up there, e.g.

> library(foreach)
> rm(a)
> y <- foreach(i=1:2) %dopar% { a <- i; i }
Warning message:
executing %dopar% sequentially: no parallel backend registered 
> a
[1] 2

This has the unfortunate side effect that users believe that making assignments from within a %dopar% loop should work, and as soon as they switch to a real parallel backend their code no longer works.

My proposal is to have the doSEQ expression be evaluated in a local environment, effectively achieving something like:

> library(foreach)
> rm(a)
> y <- foreach(i=1:2) %dopar% local({ a <- i; i })
Warning message:
executing %dopar% sequentially: no parallel backend registered 
> a
Error: object 'a' not found

This can probably be implemented with something as simple as adding:

if (local) envir <- new.env(parent=envir)

to the top of doSEQ(), plus some way of providing a setting/argument local so that doSEQ can still be used for both %dopar% (local=TRUE) and %do% (local=FALSE).

R crash accessing object created in parallel, foreach()

I am moving to a new Azure VM and am all of a sudden getting crashes and errors in places I never have before. I don't know if this is related to foreach, but if not I'm hoping you can point me in the right direction. I've tracked down one spot where I can reproduce the problem with the following code:

# load packages
library(foreach)
library(randomForest)
library(iterators)
library(parallel)
library(doParallel)

numCores <- detectCores() - 1
ntrees <- 8000
treeSubs <- ntrees/numCores
# initialize
cl <- makeCluster(numCores)
registerDoParallel(cl)
# dummy datasets
x <- as.data.frame(matrix(runif(100000), 20000))
y <- gl(2, 10000)

parRf <- foreach(ntree = rep(treeSubs, numCores), .combine = randomForest::combine,
                 .packages = "randomForest", .multicombine = TRUE) %dopar%
  randomForest(x = x, y = y,
               importance = TRUE, mtry = 2, ntree = ntree,
               replace = TRUE)

z <- matrix(runif(1000), 200)

pred <- predict(parRf, z, type = "prob")

Notice it is the predict step that causes the failure, but when I run the randomForest call sequentially rather than in parallel, the predict step works fine. It also works if I make the data sets smaller. In RStudio I get the grey "bomb" and in RGui the session just disappears.

Here are some details of the crash report from the Windows Event Log. Is it indicating that memory registers are getting dropped and then the predict is trying to access them again? (and thus illegal access?)

Faulting application name: rsession.exe, version: 1.1.463.0, time stamp: 0x5bd11fb5
Faulting module name: randomForest.dll, version: 0.0.0.0, time stamp: 0x609f54bd
Exception code: 0xc0000005
Fault offset: 0x0000000000001b42
Faulting process id: 0x1e48
Faulting application start time: 0x01d752f21b6d7a79
Faulting application path: C:\Program Files\RStudio\bin\x64\rsession.exe

My session info:

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doParallel_1.0.16   iterators_1.0.13    randomForest_4.6-14 foreach_1.5.1      

loaded via a namespace (and not attached):
[1] compiler_4.0.5   tools_4.0.5      codetools_0.2-18
> 

Thanks in advance for any tips.

Edit: I should add that I asked on Stack Overflow, but the only reply so far is from someone who can get the predict to run. Perhaps this person is using a different OS? (They haven't said.)
https://stackoverflow.com/questions/67722541/r-crash-accessing-object-created-in-parallel-foreach

Again, thanks.

package the license file

The Apache 2.0 license requires a license file to be packaged for end users, but none is provided in the repo. Can one be added?

Foreach set comprehension

In his exciting talk at RStudio::Conf 2020 (https://resources.rstudio.com/rstudio-conf-2020/parallel-computing-with-r-using-foreach-future-and-other-packages-bryan-lewis), Bryan Lewis mentioned the similarity between foreach and Haskell's list comprehensions and showed this example on his slides:

foreach(i = 1:10, j = 1:10) %:%
  when(i < j) %dopar% {
    i + j
  }

However, in practice this produces an empty list because, unlike Haskell, foreach does not evaluate the expression for every combination of i and j; it only iterates over the length of the shortest argument.

A shorter example:

foreach(i = 1:2, j = 1:2) %do% c(i, j)
#> [[1]]
#> [1] 1 1
#> 
#> [[2]]
#> [1] 2 2

But the corresponding list comprehension in Haskell does this:

[(i, j) | i <- [1..2], j <- [1..2]]
-- [(1,1),(1,2),(2,1),(2,2)]

Is this the intended behavior?
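For reference, the cross-product behaviour can be obtained with the nesting operator %:% together with when(), which is presumably what the slide intended; a sketch based on the documented nesting syntax:

library(foreach)

foreach(i = 1:10, .combine = "c") %:%
  foreach(j = 1:10, .combine = "c") %:%
  when(i < j) %do% {
    i + j
  }

This iterates over every (i, j) combination and keeps only those with i < j, similar to the Haskell comprehension above.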

How to tell if you are already within %dopar% loop?

Is there a sure-fire way to determine whether you are already inside a foreach loop that uses %dopar%? As it currently stands, I have a workaround function that parses sys.status() for any foreach calls that also use %dopar%:

isAlreadyInParallel <- function() {
  status <- sys.status()
  return(any(grepl(
    "%dopar%",
    status$sys.calls[grepl("foreach(", status$sys.calls, fixed = TRUE)],
    fixed = TRUE
  )))
}

However, this feels pretty limited and doesn't cover the case in which someone uses a variable to hold %dopar%, like below:

doFunction = if (foreach::getDoParRegistered()) `%dopar%` else `%do%`

doFunction(foreach(1:5), {...})

"socketselect" takes long time when the number of process increases.

Hi RevolutionAnalytics,

I find "socketSelect" takes long time when I make large number of process .
I am working on R4.0.3 in Windows10. The computer has 32 core (Ryzen 3970X).
Because the CPU have 32 core, I try to parallelize my code for 32 process.
However, when I register 32 process, socketSelect takes a long time.
It does not occur when I run the same code in Ubuntu in WSL2.
For your information, I attached screen shots of profvis that use 8, 12 and 32 process.

Thank you,
Tooshifumi

(Attached: profvis screenshots for 8, 12, and 32 processes.)

Is it possible to save and register some previous parallel backend again later?

E.g., is there a better way to do something like the following?

library(doParallel)

registerDoParallel(cl1 <- makeCluster(4))
cl_save <- foreach:::getDoPar()$data

registerDoParallel(cl2 <- makeCluster(2))
foreach(ic = 1:10, .combine = 'c') %dopar% { ic }
stopCluster(cl2)

registerDoParallel(cl_save)
foreach(ic = 1:10, .combine = 'c') %dopar% { ic }

stopCluster(cl1)

Parameters for unregistering a foreach backend: where do they live in the environment? (linting)

Hi, I'm working on a project where a foreach backend is registered and then unregistered on exit:

old_do_par <- doFuture::registerDoFuture()
on.exit(
  with(
    old_do_par,
    foreach::setDoPar(fun = fun, data = data, info = info)
  ),
  add = TRUE
)

This code probably comes from: https://www.rdocumentation.org/packages/doFuture/versions/0.12.2/topics/registerDoFuture

And when we run the linter we get:

lintr::lint_package(".")
file.R:124:58: warning: [object_usage_linter] no visible binding for global variable ‘info’
        foreach::setDoPar(fun = fun, data = data, info = info)

The lint's point is clear: even with that definition, fun, data, and info are not declared anywhere visible. Do these variables belong to the foreach package, or where do they come from?

Thx!.
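One way to silence that lint, sketched here under the assumption (matching the snippet above) that registerDoFuture() returns the old backend as a list with fun, data, and info elements, is to index the list explicitly instead of relying on with():

  old_do_par <- doFuture::registerDoFuture()
  on.exit(
    foreach::setDoPar(
      fun  = old_do_par$fun,
      data = old_do_par$data,
      info = old_do_par$info
    ),
    add = TRUE
  )

The three names are then ordinary list elements rather than apparent global variables, so object_usage_linter has nothing to flag.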

Passing .options causes eager evaluation of all arguments in foreach()

Passing any .options causes all arguments to foreach() to be evaluated eagerly in a context where that may not be appropriate:

library(foreach)
foreach(i = 1:3) %:% foreach(j = i:3, .options.foo = "bar") %do% seq(i, j) |> str()
#> Error in eval(expr, envir, enclos): object 'i' not found
# Expected result
foreach(i = 1:3) %:% foreach(j = i:3) %do% seq(i, j) |> str()
#> List of 3
#>  $ :List of 3
#>   ..$ : int 1
#>   ..$ : int [1:2] 1 2
#>   ..$ : int [1:3] 1 2 3
#>  $ :List of 2
#>   ..$ : int 2
#>   ..$ : int [1:2] 2 3
#>  $ :List of 1
#>   ..$ : int 3

Discovered via this StackOverflow question: https://stackoverflow.com/q/76291120/4550695

Spelling: At least three typos

Release/foreach-1.4.8
> spelling::spell_check_package()
DESCRIPTION does not contain 'Language' field. Defaulting to 'en-US'.
  WORD              FOUND IN
cbind             foreach.Rd:44
doPar             getDoParWorkers.Rd:8,20,24,28,32
doSeq             getDoSeqWorkers.Rd:8,20,23,27,31
evalution         foreach.Rd:74
faciliates        foreach.Rd:134
familar           foreach.Rmd:458
icount            nested.Rmd:269
lapply            description:6
NetWorkSpaces     foreach-package.Rd:14
parallelization   foreach.Rd:134
                  nested.Rmd:244,283
parallelize       nested.Rmd:100,120,175,283
parallelizing     nested.Rmd:100
registerDoSEQ     registerDoSEQ.Rd:5
RMarkdown         foreach.Rmd:11
                  nested.Rmd:11
setDoPar          setDoPar.Rd:5
setDoSeq          setDoSeq.Rd:5
sinc              foreach-package.Rd:29
suboptimal        nested.Rmd:283
vectorized        foreach.Rmd:69

Using the future package to aid in determining symbols and packages to export

Background

Original implementation

In April 2018, in an unreleased version 1.4.6 of foreach, Rich Calaway introduced a new feature that used Henrik Bengtsson's future package (if available) to determine the appropriate symbols and packages to export to the foreach workers. In the initial implementation, future was used if it was installed and its namespace could be loaded via requireNamespace. Henrik responded with the following comments:

COMMENT #1: Surprised users

When future::getGlobalsAndPackages() will be used or not might
surprise users. Your proposal to use it when

requireNamespace("future", quietly=TRUE) == TRUE

risks confusing the end users and whoever helps troubleshooting issues
related to globals. They might get one result one day, and another
result the next day, just because the 'future' package happened to be
installed in between. The same if they run on different systems. If
someone runs into issues related to globals, the troubleshooting
counter questions have to involve: "Do you have the 'future' package
installed?" (unless they share a proper sessionInfo() that is).

One slightly less confusing condition would be to condition it on:

("future" %in% loadedNamespaces())

I've seen that style used in some packages. The user has to enable
the "feature" by making sure a package is loaded. However, that is
still not transparent to the end user since the 'future' package might
be loaded as a side effect by some other package. For instance, it
may be loaded when using registerDoSeq(), but then not when they retry
with registerDoParallel().

It might be better to let the user explicitly control this via an R option, e.g.

useFuture <- getOption("foreach.globalsAs", default = "future")

or default = "foreach". A middle ground, during your migration, is to
use something like:

useFuture <- getOption("foreach.globalsAs", default = NULL)
if (is.null(useFuture)) {
  useFuture <- requireNamespace("future", quietly = TRUE)
  if (useFuture) {
    warning('foreach() will identify globals and packages using the ',
            'future package because that package is installed. To suppress this ',
            'warning set options(foreach.globalsAs = "future"). To use the ',
            'traditional approach of foreach for identifying globals, use ',
            'options(foreach.globalsAs = "foreach").')
  } else {
    warning('foreach() will identify globals and packages using the ',
            'foreach package because the alternative based on the future package ',
            'requires that future is installed. To suppress this warning set ',
            'options(foreach.globalsAs = "foreach"). To use future for identifying ',
            'globals, install that package.')
  }
}

COMMENT #2: Maximum total size of globals

future::getGlobalsAndPackages() has a built-in protection for
exporting too large amounts of globals. It's currently set to 500
MiB, and currently only controlled via option
'future.globals.maxSize'. To minimize the surprise here, you might
wanna temporarily set this to +Inf to disable this check/assertion,
e.g.

gp <- local({
  oopts <- options(future.globals.maxSize = +Inf)
  on.exit(options(oopts))
  future::getGlobalsAndPackages(ex, envir = env)
})

FYI, in the next release of the future package, you'll be able to do
future::getGlobalsAndPackages(..., maxSize = +Inf).

COMMENT #3: future+globals vs foreach (<= 1.4.5)

I've identified one use case where some people might say that the
'globals' package is too conservative. The case is when a variable is
global or local conditionally on some other variable/state, e.g.

{
  if (runif(1) < 1/2) y <- 0
  y
}

In future+globals, the above will NOT pick up 'y' as a global variable
(*), whereas in foreach (<= 1.4.5), it is identified as a global
variable (because you are using a more liberal approach).
Unfortunately, I don't think there is an easy solution to handle the
above ambiguous case, where a variable is global or local depending on
the run-time state, without adding significant processing overhead.
OTOH, I think this is an oversight by the developer (or at least a
deliberate hack) and I don't mind "forcing" code to break in these
ambiguous cases since it's quite easy to make it non-ambiguous. So
far I've only identified one case of this in the wild:

example("avNNet", package = "caret", run.dontrun = TRUE)

so I think you shouldn't expect much reports on that.

(*) The main reason for future+globals failing to find 'y' here is
because it is a side-effect of its "ordered" code inspection for the
purpose of identifying 'x' as a global in { x <- x + 1 }. I'm
tracking this over at HenrikBengtsson/globals#31

First revised implementation

Rich revised the implementation in foreach 1.5.0/1.5.1, adding a check for a global option foreachGlobals to see if the user preferred the original foreach functionality (which found some global symbols but did not search for additional package exports), and expanding the symbol search to include the union of those found by future's getGlobalsAndPackages function and those found by foreach's original method.

This revision addressed issue #3 and part of #1, but did not address the size of globals problem.

Henrik responded with the following additional comments (and Rich responded with >RBC inline comments):

  • DOCUMENTATION: A user who sets options(foreachGlobals = "foreach")
    may wonder how to undo that. From the code, it looks like one can do
    this by unsetting the option, i.e. options(foreachGlobals = NULL). It
    would help to document that too.

RBC --Agreed

  • DOCUMENTATION: It says "Beginning with foreach 0.5.0, foreach will
    use the future package, if available, to automatically detect
    needed packages." Without seeing the code, the term "available" is a
    bit ambiguous to the user. Maybe write "..., if it is installed and
    can be loaded, ..." instead. And clarify further by "If the future
    package is not installed, then the identification of globals will be
    done as if options(foreachGlobals = "foreach") was set."

RBC I'll work on this.

  • SURPRISE FACTOR: The silent, conditional requireNamespace("future")
    behavior may still be confusing, surprising, ... and makes it hard to
    troubleshoot. It's likely that there'll be comments like "weird, it
    works for me" kind of discussions.

RBC -- Agreed, but that's part of the point--I want to get the benefit of future as seamlessly as possible.
RBC If users need to set an option, most users won't.

  • CLEANUP: To avoid loading 'future' when not needed, you might wanna
    swap the order to if (!identical(getOption("foreachGlobals"),
    "foreach") && requireNamespace("future", quietly=TRUE)) { ... }

RBC: Good idea!

  • PREDICTABILITY: You could introduce options(foreachGlobals =
    "future+foreach"), which, if set, will produce an error if the 'future'
    package is not installed. OTOH, not sure what the current default
    should be named - maybe "future+foreach-or-foreach"?
  • doFuture: If you could support/standardize on options(foreachGlobals
    = "future+foreach") and possibly also options(foreachGlobals =
    "future"), then I could rely on that in doFuture rather than adding
    another option.

RBC: Yes, I think I can do this.

  • ROBUSTNESS: I've also played around with the idea of something like
    options(foreachGlobals = "manual") which will disable the automatic
    identification of globals and rely solely on the arguments .export and
    .packages.

RBC: Another good idea.

Henrik responded to Rich's responses with the following additional comments:

I still argue it's a bad idea that the behavior is conditioned on
whether you have a package installed or not. What is even worse is
that the dependent package (here 'future') may be installed at a
random time because it is installed together with some other package.
All of a sudden the behavior changes and the user has no idea why
because they did not change anything.

BTW, do you have a better place than an email thread where this can be
discussed and be properly documented? There's a great risk that
things are getting lost in nested email threads. I would love if you
would bring foreach to GitHub where the issue tracker provides an
excellent communication channel. I believe there's valuable feedback
from other developers that would reach you if you'd be on GitHub.
Also, I have two suggestions that I think would improve foreach a lot,
but I'll wait with those for now.

The second of these final comments is addressed by this issue; foreach has indeed been brought to GitHub, and this issue is the first to be entered into the issue tracker.

Current Development

Rich has handed off maintenance of foreach to Hong Ooi, but is leaving one final pass at the future integration in the branch richcala/foreachFutureTake3. This addresses most of the remaining issues, including the surprise factor. In this implementation, if the user is not running Microsoft R Open, the future-based behavior is enabled only if one of the foreachGlobals options "future+foreach", "foreach+future", or "future" is explicitly set. All of those act the same, however: they use the union of the future and foreach approaches implemented previously.

foreach package environment error: "'rho' must be an environment not promise"

I'm running an R script for Markov chain Monte Carlo that parallelizes its for loops with the foreach package. I have been trying to run it for about 20,000 iterations, but after several thousand iterations it stops with the following error:

Error in { :
task 217 failed - "'rho' must be an environment not promise: detected in C-level eval"

It appears at different points within the loop. For example, if I run the same code several times, it may appear at the 3000th iteration one time and at the 2000th iteration another.

Here's a part of the code with the for loop that produced the error this time.

u1i_i <- foreach(j = 1:Ni, .combine = 'c', .inorder = T, .packages = "dplyr") %dopar% {
  df_i <- subset(df, pid == pt[j])
  f_u1i(beta10 = beta10[i-1], beta11 = beta11[i-1], tau1sq = tau1sq[i],
        sigma1sq = sigma1sq[i], w1ij = w1ij_i_1[ni_l[j]:ni_u[j]],
        B1ij = B1ij_i_1[ni_l[j]:ni_u[j]], u1i = u1i_i_1[j],
        beta20 = beta20[i-1], beta21 = beta21[i-1],
        w2ij = w2ij_i_1[ni_l[j]:ni_u[j]], B2ij = B2ij_i_1[ni_l[j]:ni_u[j]],
        u2i = u2i_i_1[j], gamma_h01 = gamma_h01[i-1], gamma_h02 = gamma_h02[i-1],
        gamma1 = gamma1[i-1], a1 = a1[i-1], a2 = a2[i-1], b1 = b1[i-1], b2 = b2[i-1],
        y1ij = df_i$FEV1, x1ij = df_i[c("timeSince_t0")], ni = ni[j],
        df_t_Ti = df_t_Ti[ni_l[j]:ni_u[j],], Ti = s_df$timeSince_t0[j],
        omegai = s_df[j, "omegai"], deltai = s_df$event[j], sd_u1i_star = 0.1)
}

f_u1i is a function that takes values from parameters updated in the previous iteration (i-1) and in the current iteration (i).

The complete code is too long to share.

My OS is macOS Big Sur 11.3, the processor is an Apple M1 chip, and I have 8 GB of RAM.

WISH: Make it possible to undo a registerNnn() call

Background

Some packages register dopar adaptors internally, e.g.

pkg_fcn <- function() {
  registerDoMC(4)
  ...
}

This rather common design pattern clobbers whatever backend the user has previously registered for their own foreach purposes. For example, the following will not do what the user expects:

registerDoRedis()
y <- pkg_fcn()  ## <= silently registers another dopar adaptor
z <- foreach(x = 1:3) %dopar% { sqrt(x) }

Sometimes these re-registrations happen deep down in the package dependency graph, making them really hard to locate.

Suggestion

Provide a way for package developers to temporarily register a dopar adapter using:

pkg_fcn <- function() {
  oldDoPar <- registerDoMC(4)
  on.exit(foreach::setDoPar(oldDoPar))
  ...
}
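Until such a mechanism exists, a rough sketch of the intended pattern is below; it relies on the non-exported foreach:::getDoPar(), so treat the internals as an assumption rather than a supported API:

with_temporary_backend <- function(register, code) {
  old <- foreach:::getDoPar()   # capture the currently registered adaptor
  on.exit(do.call(foreach::setDoPar, old), add = TRUE)
  register()                    # e.g. function() doMC::registerDoMC(4)
  force(code)                   # evaluate the foreach expression under the temporary backend
}

# Hypothetical usage inside a package function:
# with_temporary_backend(function() doMC::registerDoMC(4),
#                        foreach(x = 1:3) %dopar% sqrt(x))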

Vignette title errors

The vignette indices do not match the intended vignette titles, e.g.

title: Using the `foreach` package
author: Steve Weston
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{foreach}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{utf8}
---

The title and the %\VignetteIndexEntry{} value do not agree, and the latter is what R uses for the vignette index.
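A minimal fix, assuming the title block above is the intended one, is to make the index entry match the title:

  %\VignetteIndexEntry{Using the foreach package}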

Native support for a reproducible parallel RNG streams?

Currently, the {doRNG} package fills the gap for reproducible parallel streams in combination with the %dopar% operator.

@HenrikBengtsson and I were wondering if there ever was a discussion about an integrated support for this in the {foreach} package?

Currently, there are multiple ways to achieve this in R, but none is documented really well here or in {doRNG}. We are a bit worried about possible confusion for the end user and the lack of documentation.

Would there be motivation/resources from your side to simplify things here?

cc @renozao
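For reference, a minimal sketch of the current doRNG-based approach (assuming the doRNG and doParallel packages are installed):

library(doParallel)
library(doRNG)

registerDoParallel(2)

# %dorng% behaves like %dopar% but gives reproducible, independent RNG streams
set.seed(123)
r1 <- foreach(i = 1:4, .combine = "c") %dorng% runif(1)
set.seed(123)
r2 <- foreach(i = 1:4, .combine = "c") %dorng% runif(1)
identical(r1, r2)  # TRUE

stopImplicitCluster()

Native support in {foreach} would presumably make something equivalent available without the extra package.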

Proxy settings are not exported inside foreach block

Hello,

A call to a function that uses a proxy gateway defined in the global environment does not work inside a foreach block, despite setting .export = ls(.GlobalEnv) and .packages = c("httr", "crul", "curl").

However, I am able to execute the code successfully when I explicitly call crul::set_proxy(proxy(url)) inside the foreach block. I would like to avoid setting the proxy inside the foreach block.

Can someone please suggest a workaround for this?

Thanks!
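One possible workaround, sketched here with a hypothetical proxy_url and reusing the set_proxy() call from the report: configure the proxy once per worker when the cluster is created, since settings applied this way persist for the life of the worker process, unlike objects exported via .export.

library(doParallel)

proxy_url <- "http://proxy.example.com:8080"  # hypothetical proxy address

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Run the proxy setup once on every worker before the loop
parallel::clusterCall(cl, function(url) crul::set_proxy(crul::proxy(url)), url = proxy_url)

res <- foreach(i = 1:2, .packages = c("httr", "crul", "curl")) %dopar% {
  i  # requests made here would go through the proxy configured above
}

parallel::stopCluster(cl)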

Update tooling

  • testthat over RUnit
  • roxygen2
  • Rmarkdown for vignettes
  • proper testing harness for reverse deps

%dopar% doesn't work but %do% works

foreach parallel computation with %dopar% gives me the error "Error in { : task 1 failed - "$ operator is invalid for atomic vectors"" when I run the function fitme from the R package spaMM, but it works when I simply switch %dopar% to %do%. Note: I have registered a parallel backend.

Formula defined outside %dopar% cannot get variables in Global environment

Hi,

I ran into an error while running analyses in parallel with a formula defined before the %dopar% loop that needs to find variables in the global environment.
The following code throws an error:

library(foreach)
library(doParallel)

x <- rnorm(100)
y <- 5 + 2 * x + rnorm(100, 0, .2)
form <- y ~ x

cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i = 1:2, .packages = "mixmeta") %dopar% {
  model <- lm(form)
}
stopCluster(cl)

On the other hand, either putting the variables into a data.frame or defining the formula within the %dopar% loop works fine.

x <- rnorm(100)
y <- 5 + 2 * x + rnorm(100, 0, .2)
df <- data.frame(y = y, x = x)
form <- y ~ x

cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i = 1:2) %dopar% {
  model <- lm(form, data = df)
}
stopCluster(cl)

x <- rnorm(100)
y <- 5 + 2 * x + rnorm(100, 0, .2)

cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i = 1:2) %dopar% {
  model <- lm(y ~ x)
}
stopCluster(cl)

This is not a crippling issue as it is easy to find a workaround, but I thought you might want to know.

Thanks!

no visible binding for global variable

Hi,

I have the following code:

.getTSNEresults <- function(theObject, expressionMatrix, cores, PCs,
                            perplexities, randomSeed) {

    PCAData <- prcomp(t(expressionMatrix))$x
    myCluster <- parallel::makeCluster(cores, type = "PSOCK")
    doParallel::registerDoParallel(myCluster)

    tSNECoordinates <- foreach::foreach(PCA = rep(PCs, length(perplexities)),
                                        perp = rep(perplexities, each = length(PCs)),
                                        .combine = 'cbind',
                                        .packages = "SingleCellExperiment") %dopar% {

        listsce <- list(logcounts = t(PCAData[, 1:PCA]))
        sce <- SingleCellExperiment::SingleCellExperiment(assays = listsce)

        tsneCoord <- scater::runTSNE(sce, scale_features = FALSE,
                                     perplexity = perp, rand_seed = randomSeed,
                                     theme_size = 13, return_SCESet = FALSE)
        scater::plotTSNE(tsneCoord)
    }

    parallel::stopCluster(myCluster)
    message("Calculated ", length(PCs)*length(perplexities), " 2D-tSNE plots.")
    return(tSNECoordinates)
}

When doing the R CMD check I get:

.getTSNEresults: no visible binding for global variable 'PCA'
.getTSNEresults: no visible binding for global variable 'perp'

I have tried to use global variables as follows:

.getTSNEresults <- function(theObject, expressionMatrix, cores, PCs,
                            perplexities, randomSeed) {

    PCAData <- prcomp(t(expressionMatrix))$x
    myCluster <- parallel::makeCluster(cores, type = "PSOCK")
    doParallel::registerDoParallel(myCluster)

    utils::globalVariables(c("PCA", "perp"))

    tSNECoordinates <- foreach::foreach(PCA = rep(PCs, length(perplexities)),
                                        perp = rep(perplexities, each = length(PCs)),
                                        .combine = 'cbind',
                                        .packages = "SingleCellExperiment") %dopar% {

        listsce <- list(logcounts = t(PCAData[, 1:PCA]))
        sce <- SingleCellExperiment::SingleCellExperiment(assays = listsce)

        tsneCoord <- scater::runTSNE(sce, scale_features = FALSE,
                                     perplexity = perp, rand_seed = randomSeed,
                                     theme_size = 13, return_SCESet = FALSE)
        scater::plotTSNE(tsneCoord)
    }

    parallel::stopCluster(myCluster)
    message("Calculated ", length(PCs)*length(perplexities), " 2D-tSNE plots.")
    return(tSNECoordinates)
}

But it does not solve the problem. I am not sure whether this depends on the foreach package; my apologies if it does not.

Thanks for your help.
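A common fix is to call utils::globalVariables() at the top level of the package code rather than inside the function, where it has no effect at run time. A sketch, with a hypothetical file name:

# R/globals.R
# Silence the "no visible binding" NOTE for the foreach() iteration variables
utils::globalVariables(c("PCA", "perp"))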

Bug? "verbose" argument changes output of foreach() function

Reproducible example:

library(foreach)
nIter <- 10  # not stated in the original report; 10 matches the output below

v_out2 <- foreach(i = 1:nIter, verbose = TRUE, .combine = 'c') %do% {
  set.seed(i)
  data <- rnorm(1)
  return(mean(data))
}
v_out2
[1] -0.6264538

I get the same for "verbose=FALSE". However, if I don't specify "verbose" at all I get:

v_out2 <- foreach(i = 1:nIter, .combine = 'c') %do% {
  set.seed(i)
  data <- rnorm(1)
  return(mean(data))
}

v_out2
[1] -0.62645381 -0.89691455 -0.96193342 0.21675486 -0.84085548
[6] 0.26960598 2.28724716 -0.08458607 -0.76679604 0.01874617

as I should.

I am using the latest version from CRAN (foreach_1.5.2).
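This is likely not a bug in the .verbose handling: verbose (without the leading dot) is not an argument of foreach(), so it is treated as a second iteration variable of length one, and the loop stops after the shortest iterator, i.e. after a single task. A small sketch illustrating the difference; the documented argument is .verbose:

library(foreach)
nIter <- 10

# `verbose` becomes an iteration variable of length 1, so only one task runs
length(foreach(i = 1:nIter, verbose = TRUE, .combine = "c") %do% i)    # 1

# `.verbose` is the real argument, so all nIter tasks run
length(foreach(i = 1:nIter, .verbose = FALSE, .combine = "c") %do% i)  # 10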

foreach in packages and tests

Hi, I actually can't find documentation on how to use this package in tests; it seems it does not work very well with rcmdcheck.

foreach::foreach(x = iterators::iter(matrix, by = "col"), .combine = 'cbind', .packages = 'dplyr') %dopar% {
 x
}

Then the message:

❯ checking R code for possible problems ... NOTE
Undefined global functions or variables:
    x

We can use globalVariables(), but that is a very bad idea and can even hide problems. I notice there are tests in this package and rcmdcheck works! I would like to know how to use this package inside other packages, and ideally to add that information to the docs, please.

Thx!
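One pattern that avoids globalVariables(), shown as a sketch rather than an official recommendation, is to bind the iteration variable to NULL before the loop, which gives the R CMD check code analysis a visible binding without affecting foreach. (%do% and library(foreach) are used here so the sketch runs standalone; in a package you would import the operator instead.)

library(foreach)

my_fn <- function(m) {
  x <- NULL  # gives R CMD check a visible binding for the iteration variable
  foreach(x = iterators::iter(m, by = "col"), .combine = "cbind") %do% {
    x
  }
}

my_fn(matrix(1:6, nrow = 2))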

BUG: setDoSeq() attempts to remove the incorrect variables on error

Analogously to setDoPar(), the setDoSeq() function attempts to undo partially set variables in case there's an error:

foreach/R/setDoSeq.R

Lines 31 to 47 in 731cff6

setDoSeq <- function(fun, data=NULL, info=function(data, item) NULL) {
  tryCatch(
    {
      assign('seqFun', fun, pos=.foreachGlobals, inherits=FALSE)
      assign('seqData', data, pos=.foreachGlobals, inherits=FALSE)
      assign('seqInfo', info, pos=.foreachGlobals, inherits=FALSE)
    }, error = function(e) {
      if (exists('fun', where=.foreachGlobals, inherits=FALSE))
        remove('fun', envir = .foreachGlobals)
      if (exists('data', where=.foreachGlobals, inherits=FALSE))
        remove('data', envir = .foreachGlobals)
      if (exists('info', where=.foreachGlobals, inherits=FALSE))
        remove('info', envir = .foreachGlobals)
      e
    })
}

However, due to what looks like a cut'n'paste mistake from setDoPar(), setDoSeq() removes the wrong variables.
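A corrected error handler would presumably remove the seq-prefixed variables that the tryCatch block assigns; a sketch based on the code above (it touches the internal .foreachGlobals environment, so it only makes sense inside the package itself):

setDoSeq <- function(fun, data=NULL, info=function(data, item) NULL) {
  tryCatch(
    {
      assign('seqFun', fun, pos=.foreachGlobals, inherits=FALSE)
      assign('seqData', data, pos=.foreachGlobals, inherits=FALSE)
      assign('seqInfo', info, pos=.foreachGlobals, inherits=FALSE)
    }, error = function(e) {
      # Remove the same variables that were (partially) assigned above
      if (exists('seqFun', where=.foreachGlobals, inherits=FALSE))
        remove('seqFun', envir = .foreachGlobals)
      if (exists('seqData', where=.foreachGlobals, inherits=FALSE))
        remove('seqData', envir = .foreachGlobals)
      if (exists('seqInfo', where=.foreachGlobals, inherits=FALSE))
        remove('seqInfo', envir = .foreachGlobals)
      e
    })
}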

Speed is not as good as sequential foreach?

I created a data frame with 4 columns, each with 10 crore (100 million) rows, in a Kaggle notebook. Running foreach(...) sequentially takes 1.08 seconds, but running it multicore takes 31 seconds. Why is it so slow in the Kaggle notebook?
