Comments (40)

theboocock avatar theboocock commented on May 20, 2024 2

@jlmelville Yes! That fixed it. The patch added to rcpp-parallel on conda works for me.

conda install -c conda-forge r-rcppparallel

from uwot.

jlmelville avatar jlmelville commented on May 20, 2024 1

FWIW, I directly ran valgrind and RDcsan in the container provided by https://github.com/wch/r-debug (RDsan seems to not work well with building RcppEigen) and did not see anything flagged that wasn't already something that shows up in the CRAN checks for RcppAnnoy and RcppParallel.

A new version of uwot is now on CRAN with the two fixes unearthed in this issue.

jlmelville avatar jlmelville commented on May 20, 2024

Sorry you are having trouble. I am unable to reproduce the crash you give, even with R-devel installed. If you get a stack trace, do you get the same error as in the other issues, i.e. memory not mapped when RcppParallel::setThreadOptions() is called?

If so, what does RcppParallel::defaultNumThreads() say? My guess is that a non-integer value of n_threads < 1 is being passed in, which I have just discovered does seem to cause RcppParallel some grief.

jlmelville avatar jlmelville commented on May 20, 2024

The current master might solve the issue. Please give it a try if possible and let me know.

LTLA avatar LTLA commented on May 20, 2024

Yeah, looks nasty. I can confirm that 0 < n_threads < 1 blows up on my mac. Probably RcppParallel::defaultNumThreads() is giving 1 on affected machines so you get n_threads=0.5. Interesting that n_threads=0 works; I would have thought that there would be a cast to integer at some point such that a fractional value would get truncated to zero pretty quickly...
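
A caller-side guard against fractional values is easy to sketch in R. This is only an illustration under stated assumptions: `sanitize_n_threads` is a hypothetical helper, not part of uwot or RcppParallel, and rounding up is one possible policy rather than what either package actually does.

```r
# Hypothetical helper (not from uwot or RcppParallel): coerce a thread
# count to a whole number before it reaches any threading library.
sanitize_n_threads <- function(n_threads) {
  if (!is.numeric(n_threads) || length(n_threads) != 1 ||
      !is.finite(n_threads) || n_threads < 0) {
    stop("n_threads must be a single finite, non-negative number")
  }
  # Round up so that 0 < n_threads < 1 becomes 1 instead of staying fractional.
  as.integer(ceiling(n_threads))
}

print(sanitize_n_threads(0.5))
print(sanitize_n_threads(0))
print(sanitize_n_threads(8))
```

With this policy a fractional request such as 0.5 becomes 1, while 0 and whole numbers pass through unchanged.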

jlmelville avatar jlmelville commented on May 20, 2024

Thanks for confirming the issue @LTLA. That's one problem solved, but is it the problem? I suppose with uninitialized memory anything is possible, although it's odd that it works once then fails, and that more than one person reports in satijalab/seurat#2256 that installing RcppParallel from conda-forge solves the issue. Some compiler difference that initializes the memory differently?

LTLA avatar LTLA commented on May 20, 2024

Hmm. The fact that it fails on the second go does suggest it's a memory leak of some sort rather than the n_threads problem (which always fails immediately for me). Valgrind gives me a whole stack of warnings if run on the OP's code, but they all relate to base::eval rather than anything in uwot.

The question is whether this is a memory leak in uwot or RcppParallel, given that the problem was "fixed" by reinstalling the latter from conda. Though given how much conda messes with the libraries, it feels like a house of cards to rely on that to solve this kind of problem.

It would be nice to see what happens if someone can run Valgrind on a machine where the above code crashes. Might be pretty painful to do on Windows, though.

aldojongejan avatar aldojongejan commented on May 20, 2024

Dear @LTLA and @jlmelville ,

I just installed the latest uwot code, reinstalled RcppParallel and ran the code again... it now works!
I should not have reinstalled RcppParallel, so that I could confirm that the changes in the uwot code did the trick and not the reinstallation of RcppParallel. I am sorry for that ;-)

RcppParallel::defaultNumThreads() gives 8, by the way (as it did before).
Just to let you know, I had been running the code setting different values for n_threads, but to no avail.

Thanks for all your help!!

jlmelville avatar jlmelville commented on May 20, 2024

I'm glad it's working now, but I am mystified as to why. Did you reinstall RcppParallel from CRAN or from conda forge?

Edited to add: if n_threads was being set manually, then I am even more baffled. Seems like I will have to do another check of the parallel code to make sure it's not calling an R API at any point before I submit a new version to CRAN.

aldojongejan avatar aldojongejan commented on May 20, 2024

I reinstalled using CRAN (should have said so in previous comment). I am also mystified, but I am not that well versed in programming/tracing/debugging to be able to find the source of what went wrong...
And I don't know how 'RcppParallel' in R works together with RcppParallel in conda.

LTLA avatar LTLA commented on May 20, 2024

I will note that in the following chunk of code:

out <- prcomp(as.matrix(iris[,-5]))

library(irlba)
out <- irlba(as.matrix(iris[,-5]), nu=1, nv=1)

library(Rtsne)
out <- Rtsne(as.matrix(iris[,-5]), check_duplicates=FALSE)

library(uwot)
iris_umap <- umap(as.matrix(iris[,-5]), pca = 50, n_threads=1)

# And a second time
iris_umap2 <- umap(iris[,-5], pca = 50, n_threads=1)

Only umap triggers Valgrind warnings. So it actually doesn't seem like a pure eval problem, there seems to be some interaction between something happening in uwot and eval.

The first message looks something like this:

> iris_umap <- umap(as.matrix(iris[,-5]), pca = 50, n_threads=1)
==20371== Invalid read of size 32
==20371==    at 0x7154C91: __wcsnlen_avx2 (strlen-avx2.S:62)
==20371==    by 0x7082EC1: wcsrtombs (wcsrtombs.c:104)
==20371==    by 0x7008B20: wcstombs (wcstombs.c:34)
==20371==    by 0x1BE142: wcstombs (stdlib.h:154)
==20371==    by 0x1BE142: do_makenames (character.c:938)
==20371==    by 0x238822: bcEval (eval.c:7041)
==20371==    by 0x24519F: Rf_eval (eval.c:688)
==20371==    by 0x246F4E: R_execClosure (eval.c:1852)
==20371==    by 0x247C44: Rf_applyClosure (eval.c:1778)
==20371==    by 0x23C1C4: bcEval (eval.c:7009)
==20371==    by 0x24519F: Rf_eval (eval.c:688)
==20371==    by 0x246F4E: R_execClosure (eval.c:1852)
==20371==    by 0x247C44: Rf_applyClosure (eval.c:1778)
==20371==  Address 0x1136db90 is 0 bytes inside a block of size 12 alloc'd
==20371==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==20371==    by 0x27A750: R_chk_calloc (memory.c:3422)
==20371==    by 0x1BE0D0: do_makenames (character.c:931)
==20371==    by 0x238822: bcEval (eval.c:7041)
==20371==    by 0x24519F: Rf_eval (eval.c:688)
==20371==    by 0x246F4E: R_execClosure (eval.c:1852)
==20371==    by 0x247C44: Rf_applyClosure (eval.c:1778)
==20371==    by 0x23C1C4: bcEval (eval.c:7009)
==20371==    by 0x24519F: Rf_eval (eval.c:688)
==20371==    by 0x246F4E: R_execClosure (eval.c:1852)
==20371==    by 0x247C44: Rf_applyClosure (eval.c:1778)
==20371==    by 0x23C1C4: bcEval (eval.c:7009)

Definitely cryptic enough to be a parallelization issue. Rtsne also parallelizes but via OpenMP, which is generally more restrictive so it's harder to accidentally put in R API calls.

jlmelville avatar jlmelville commented on May 20, 2024

Oops. Possibly maybe someone who shall remain nameless (spoiler alert: it's me) is calling the R random number generator from inside a thread? Fixing this isn't conceptually difficult, but requires a fair bit of typing (because it's C++) so might not get finished until later today.

jlmelville avatar jlmelville commented on May 20, 2024

I forgot to tag the commit with this issue, but what's currently on master should hopefully behave. @LTLA, if you ever install from master, re-running valgrind would be an interesting exercise.

LTLA avatar LTLA commented on May 20, 2024

master doesn't get rid of the valgrind warnings, but I did manage to track them down to find_ab_params(), most likely the stats::nls() call therein. Running umap() with specified a and b arguments avoids the warnings. This may well be a false positive, it's hard to believe that a base function would be compromised like that; I call nls all over the place in my own functions.
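
A minimal sketch of that workaround, assuming uwot is installed. The values a = 1 and b = 1 are arbitrary placeholders chosen for illustration, not the values that would be fitted for the default min_dist and spread, so the embedding will differ from a default run.

```r
X <- as.matrix(iris[, -5])

if (requireNamespace("uwot", quietly = TRUE)) {
  # Supplying a and b directly means they do not have to be fitted,
  # which is the step the valgrind warnings were traced to above.
  # a = 1 and b = 1 are placeholder values, not recommended settings.
  iris_umap <- uwot::umap(X, a = 1, b = 1, n_threads = 1)
  print(dim(iris_umap))
} else {
  message("uwot is not installed; skipping the example")
}
```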

theboocock avatar theboocock commented on May 20, 2024

I am still having issues with this even with the new version on CRAN. I reinstalled everything and tried again but, same as before, I get a segfault on the second run of the example.

LTLA avatar LTLA commented on May 20, 2024

Operating system?

theboocock avatar theboocock commented on May 20, 2024

It is a Linux cluster, which unfortunately is running the 2.6 kernel. However, there don't seem to be any major issues with any other R package.

Linux n6426 2.6.32-754.14.2.el6.x86_64 #1 SMP Tue May 14 19:35:42 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

LTLA avatar LTLA commented on May 20, 2024

Do you have valgrind installed?

If so, could you try copying the OP's code into some file (e.g., test.R) and running:

R CMD BATCH --no-save -d valgrind test.R

and seeing what test.Rout gives? If you don't have valgrind installed, the top should just say that valgrind isn't available. If you do have it installed, it should have some blurb at the top with memcheck blah blah blah and then hopefully give some diagnostics before the crash.

Those diagnostics would be extremely helpful.

jlmelville avatar jlmelville commented on May 20, 2024

Also could you run

iris_umap <- umap(iris, pca = 50, verbose = TRUE, n_threads = 0)

twice in a row, as well as repeating twice with:

iris_umap <- umap(iris, pca = 50, verbose = TRUE, n_threads = 1)

and see if either makes a difference, providing the output for the second crashing run. Getting a clue to where the second crash occurs would be helpful (although I suspect the damage is already done at some point in the first run).

jlmelville avatar jlmelville commented on May 20, 2024

Edit: there is an explicit check that the number of components does not exceed the number of columns in the input, so for iris the pca = 50 argument can be omitted without affecting the crash.
Also, does omitting pca = 50 make a difference? For iris, that step should be skipped anyway (I think; I'm away from my computer at the moment) because the input data doesn't have sufficient rank to extract 50 components. It would be good to get a minimal reproducible example.

aldojongejan avatar aldojongejan commented on May 20, 2024

Reinstalling RcppParallel from CRAN after you have installed uwot etc? It seemed to help for me...

LTLA avatar LTLA commented on May 20, 2024

If @theboocock has valgrind installed, please try running our suggested commands above before attempting reinstallation. This would be a rare opportunity to identify the problem on a known failing machine and to fix it once and for all - such chances are hard to come by.

aldojongejan avatar aldojongejan commented on May 20, 2024

@LTLA , you're completely right!! Apologies for suggesting reinstallation BEFORE checking....

theboocock avatar theboocock commented on May 20, 2024

Hey all,

Reinstalling never does anything for me anyways. Here is the valgrind error. Seems like it is coming from libtbb.

Is this post relevant https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/641654?


> 
> library(irlba)
Loading required package: Matrix
> out <- irlba(as.matrix(iris[,-5]), nu=1, nv=1)
> 
> library(Rtsne)
> out <- Rtsne(as.matrix(iris[,-5]), check_duplicates=FALSE)
> 
> library(uwot)
> iris_umap <- umap(as.matrix(iris[,-5]), pca = 50, n_threads=1)
==264614== Invalid read of size 8
==264614==    at 0x1BB2A5E8: ??? (in /u/project/kruglyak/smilefre/anaconda3/lib/R/library/RcppParallel/lib/libtbb.so.2)
==264614==    by 0x1BDAB1FF: ???
==264614==    by 0x1BDC757F: ???
==264614==  Address 0xfffffffffffffff7 is not stack'd, malloc'd or (recently) free'd
==264614== 

 *** caught segfault ***
address 0xfffffffffffffff7, cause 'memory not mapped'

Traceback:
 1: RcppParallel::setThreadOptions(numThreads = n_threads)
 2: uwot(X = X, n_neighbors = n_neighbors, n_components = n_components,     metric = metric, n_epochs = n_epochs, alpha = learning_rate,     scale = scale, init = init, init_sdev = init_sdev, spread = spread,     min_dist = min_dist, set_op_mix_ratio = set_op_mix_ratio,     local_connectivity = local_connectivity, bandwidth = bandwidth,     gamma = repulsion_strength, negative_sample_rate = negative_sample_rate,     a = a, b = b, nn_method = nn_method, n_trees = n_trees, search_k = search_k,     method = "umap", approx_pow = approx_pow, n_threads = n_threads,     n_sgd_threads = n_sgd_threads, grain_size = grain_size, y = y,     target_n_neighbors = target_n_neighbors, target_weight = target_weight,     target_metric = target_metric, pca = pca, pca_center = pca_center,     pcg_rand = pcg_rand, fast_sgd = fast_sgd, ret_model = ret_model,     ret_nn = ret_nn, tmpdir = tempdir(), verbose = verbose)
 3: umap(as.matrix(iris[, -5]), pca = 50, n_threads = 1)
An irrecoverable exception occurred. R is aborting now ...
--264614-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--264614-- si_code=1;  Faulting address: 0x20000038;  sp: 0x402efbf50

valgrind: the 'impossible' happened:
   Killed by fatal signal
==264614==    at 0x38047487: vgPlain_get_StackTrace_wrk (m_stacktrace.c:334)
==264614==    by 0x3804756B: vgPlain_get_StackTrace (m_stacktrace.c:1086)
==264614==    by 0x3802F82E: record_ExeContext_wrk (m_execontext.c:314)
==264614==    by 0x38002A84: die_and_free_mem (mc_malloc_wrappers.c:361)
==264614==    by 0x3807A59A: vgPlain_scheduler (scheduler.c:1665)
==264614==    by 0x3803B63E: final_tidyup (m_main.c:2656)
==264614==    by 0x3803B767: shutdown_actions_NORETURN (m_main.c:2457)
==264614==    by 0x380A656B: run_a_thread_NORETURN (syswrap-linux.c:199)

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable

jlmelville avatar jlmelville commented on May 20, 2024

@theboocock, thank you for running valgrind. Do you know if you are running any other packages that use RcppParallel? There are definitely some similar issues with memset and gcc6 but I'm loath to prematurely put the blame on TBB.

LTLA avatar LTLA commented on May 20, 2024

Well, it looks like it isn't even hitting uwot's C++ code, so it's hard to blame anything else... An even simpler test would be whether running RcppParallel::setThreadOptions() triggers the error, i.e.,

# valgrind me:
library(RcppParallel)
setThreadOptions(numThreads = 1)

If so, that seems like a slam dunk, though the use of conda does complicate matters.

jlmelville avatar jlmelville commented on May 20, 2024

@LTLA, seeing as we get through one run without a crash, is it possible that uwot just stomps all over some memory that RcppParallel or tbb is using? Seems like it's hard to completely rule out uwot being the villain. I'll have a look at finding a container with gcc6 in it and see if it can be reproduced.

LTLA avatar LTLA commented on May 20, 2024

I was looking at @theboocock's valgrind output above, where umap fails the first time it runs. (The difference from a non-valgrind context is expected.) Either that, or I've had one too many G&T's.

It's also possible that irlba() or Rtsne() are doing something Bad... which would be even more concerning. The minimal example would be clarifying. So, either just:

# Put into test1.R with nothing else, and run under valgrind:
library(RcppParallel)
setThreadOptions(numThreads = 1)

Or, if the above doesn't trigger the error, then:

# Put into test2.R with nothing else, and run under valgrind:
library(uwot)
iris_umap <- umap(as.matrix(iris[,-5]), pca = 50, n_threads=1)

jlmelville avatar jlmelville commented on May 20, 2024

I wonder if benjjneb/dada2#684 is a related problem? There are some suspicious similarities.

theboocock avatar theboocock commented on May 20, 2024

library(RcppParallel)
setThreadOptions(numThreads = 1)

This triggers the error for me. I am going to try the dada2 solution now.

theboocock avatar theboocock commented on May 20, 2024

This seems to be the key piece, in build.sh:


if [[ $target_platform =~ linux.* ]]; then
  # The vendored TBB library adds compile-time flags based on a probe of gcc,
  # this little "hack" ensures that the `gcc` executable is available when
  # TBB is built.
  mkdir $PWD/hack
  export PATH="$PWD/hack:$PATH"
  ln -s $CC $PWD/hack/gcc
  chmod +x $PWD/hack/gcc
fi

LTLA avatar LTLA commented on May 20, 2024

So the takeaway is that if you're running R under conda, you should be installing RcppParallel via conda as well? Not the most intuitive outcome, but tolerable. Possibly another thing to throw into the README; maybe it's worth having an entire section on "Known problems" along with the .Rprofile issue.
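
For anyone unsure which situation they are in, the following is a rough diagnostic sketch in base R. It relies on two assumptions that may not hold on every setup: that conda sets the `CONDA_PREFIX` environment variable when an environment is active, and that conda records its installed packages as `conda-meta/r-rcppparallel-*.json` under that prefix. A conda record only shows that conda installed the package at some point; it does not prove that a later CRAN install has not overwritten the files.

```r
# Where R itself lives, and whether a conda environment is active.
r_home <- normalizePath(R.home(), mustWork = FALSE)
conda_prefix <- Sys.getenv("CONDA_PREFIX", unset = "")

# Locate RcppParallel without loading it (loading would also load TBB).
pkg_path <- find.package("RcppParallel", quiet = TRUE)

cat("R home:      ", r_home, "\n")
cat("CONDA_PREFIX:", if (nzchar(conda_prefix)) conda_prefix else "(not set)", "\n")

if (length(pkg_path) == 0) {
  cat("RcppParallel was not found in any library on .libPaths()\n")
} else {
  cat("RcppParallel:", pkg_path[1], "\n")
  # Bundled shared libraries, such as the libtbb seen in the valgrind output.
  print(list.files(file.path(pkg_path[1], "lib"), recursive = TRUE))
}

# Assumption: conda keeps one JSON record per installed package in conda-meta.
if (nzchar(conda_prefix)) {
  conda_records <- list.files(file.path(conda_prefix, "conda-meta"),
                              pattern = "^r-rcppparallel-.*\\.json$")
  cat("conda records for r-rcppparallel:", length(conda_records), "\n")
}
```

If R home sits under the conda prefix but there is no conda record for r-rcppparallel, the package was most likely installed from CRAN into the conda R library, which is the combination reported as crashing above.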

jlmelville avatar jlmelville commented on May 20, 2024

Yes, I was hoping to work out if there is a lesson learned here. I don't want to mislead anyone. I don't have any experience using conda for R packages, just Python, and I have no knowledge of bioconda. Is it safe in general to mix CRAN and conda packages or is that always ill-advised?

jlmelville avatar jlmelville commented on May 20, 2024

@aldojongejan, you mentioned that you reinstalled RcppParallel from CRAN to fix the issue. Do you know if you had previously installed from conda? Or if you had a mix of conda-installed and CRAN-installed packages?

aldojongejan avatar aldojongejan commented on May 20, 2024

I worked with the development version of R to get the latest version of Seurat and SingleR working. I guess that also installed RcppParallel. Then, as a possible fix for my problem, I installed via conda as suggested here (cole-trapnell-lab/monocle3#186). That didn't solve it for me; it only worked when I later reinstalled RcppParallel from CRAN again (removing the RcppParallel directory from the 'library' folder etc.)... I should have paid more attention and documented the exact steps...

jlmelville avatar jlmelville commented on May 20, 2024

@aldojongejan, no problem, thank you for opening the issue here in the first place.

aldojongejan avatar aldojongejan commented on May 20, 2024

Ha ha, opening the issue was not a real problem ;-)
I am sorry I couldn't be of more help, and I really appreciate all the work you guys put into helping me out and solving the problem!

jlmelville avatar jlmelville commented on May 20, 2024

uwot 0.1.8 removes the dependency on RcppParallel so hopefully these problems are gone.

aldojongejan avatar aldojongejan commented on May 20, 2024

jlmelville avatar jlmelville commented on May 20, 2024

Hopefully this is solved. Closing.
