
rtsne.multicore's People

Contributors

gfinak, pkharchenko


rtsne.multicore's Issues

Results not reproducible

Hi,
The documentation suggests that reproducible results can be achieved by setting the seed in R. This works for the original Rtsne function, but it doesn't seem to work for Rtsne.multicore.

library(Rtsne.multicore)

iris_unique <- unique(iris)
mat <- as.matrix(iris_unique[,1:4])

# repeat calculation
set.seed(42)
tsne_out1 <- Rtsne.multicore(mat)

set.seed(42)
tsne_out2 <- Rtsne.multicore(mat)

# plot results
plot(tsne_out1$Y, col=iris_unique$Species, main="first run")
plot(tsne_out2$Y, col=iris_unique$Species, main="second run")

[Plots: "first run", "second run"]
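A minimal sketch for checking reproducibility numerically instead of comparing two plots by eye. same_embedding() is a hypothetical helper, not part of Rtsne.multicore. The usage part assumes the package is installed and that its result stores the embedding in $Y, as in the report above. It is skipped if the package is missing.

```r
# Hypothetical helper (not part of Rtsne.multicore): TRUE only if two
# embeddings match bit for bit.
same_embedding <- function(y1, y2) {
  stopifnot(is.matrix(y1), is.matrix(y2), identical(dim(y1), dim(y2)))
  identical(y1, y2)
}

# Usage, assuming Rtsne.multicore is installed and returns a list with $Y
if (requireNamespace("Rtsne.multicore", quietly = TRUE)) {
  mat <- as.matrix(unique(iris)[, 1:4])
  set.seed(42)
  y1 <- Rtsne.multicore::Rtsne.multicore(mat)$Y
  set.seed(42)
  y2 <- Rtsne.multicore::Rtsne.multicore(mat)$Y
  cat("identical:", same_embedding(y1, y2), "\n")
  cat("max abs difference:", max(abs(y1 - y2)), "\n")
}
```

If the seed were fully respected, the first line would print TRUE and the second 0.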

Cannot install on Mac OSX 10.11.6 (15G1217)

> devtools::install_github("RGLab/Rtsne.multicore")
Downloading GitHub repo RGLab/Rtsne.multicore@master
from URL https://api.github.com/repos/RGLab/Rtsne.multicore/zipball/master
Installing Rtsne.multicore
'/Users/jespinoz/anaconda/lib/R/bin/R' --no-site-file --no-environ --no-save  \
  --no-restore --quiet CMD INSTALL  \
  '/private/var/folders/6z/5vbtz_gmkr76ftgc3149dvtr0003c0/T/RtmpCDQkG7/devtoolsd9a499115dc/RGLab-Rtsne.multicore-6789e40'  \
  --library='/Users/jespinoz/anaconda/lib/R/library' --install-tests

* installing *source* package ‘Rtsne.multicore’ ...
** libs
clang++ -I/Users/jespinoz/anaconda/lib/R/include -DNDEBUG -DROUT -fopenmp -I/Users/jespinoz/anaconda/include -I"/Users/jespinoz/anaconda/lib/R/library/Rcpp/include"   -fPIC  -I/Users/jespinoz/anaconda/include  -c RcppExports.cpp -o RcppExports.o
clang: error: unsupported option '-fopenmp'
make: *** [RcppExports.o] Error 1
ERROR: compilation failed for package ‘Rtsne.multicore’
* removing ‘/Users/jespinoz/anaconda/lib/R/library/Rtsne.multicore’
Installation failed: Command failed (1)

Verbose not printing

I am using the Rtsne.multicore package and I noticed that when I set verbose=T, the progress is not printed until the whole run is complete. This makes it more of a "log" than a "progress" report.

Example: Rtsne.multicore(as.matrix(unique(iris[, 1:4])), theta=0.01, verbose=T) runs very fast and prints the log after it has finished. However, Rtsne.multicore(as.matrix(unique(iris[, 1:4])), theta=0.01, verbose=T, max_iter=1000000000) prints nothing at all while it is running. The first run shows that each iteration is very fast, so verbose output is clearly not being printed while the program is running.

Rtsne.multicore gives different results from Rtsne in higher dimensions

Hi,
I've found Rtsne.multicore to be very useful in 2-dimensions; it produces embeddings that are very similar to those produced by the original Rtsne package. However, things seem to break down when moving to higher dimensions (in particular 3). The groupings produced by Rtsne.multicore aren't as coherent as those produced by Rtsne.

library(Rtsne)
library(Rtsne.multicore)

iris_unique <- unique(iris)
mat <- as.matrix(iris_unique[,1:4])

# run calculation 
set.seed(42)
tsne_out <- Rtsne(mat, dims=3)

set.seed(42)
tsne_out_multi <- Rtsne.multicore(mat, dims=3)

# plot results
pairs(tsne_out$Y, col=iris_unique$Species, main="Rtsne")
pairs(tsne_out_multi$Y, col=iris_unique$Species, main="Rtsne.multicore")

[Plots: "Rtsne", "Rtsne.multicore"]
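One way to put a number on "less coherent groupings" is the mean silhouette width of the known species labels in each 3D embedding. Higher values mean tighter, better-separated groups. This metric is an editorial choice, not something from the report. mean_silhouette() is a hypothetical helper built on cluster::silhouette. The usage part assumes both Rtsne and Rtsne.multicore are installed.

```r
# Hypothetical helper: mean silhouette width of known labels in an embedding.
# Values near 1 mean well-separated groups; values below 0 mean mixed groups.
mean_silhouette <- function(Y, labels) {
  sil <- cluster::silhouette(as.integer(factor(labels)), dist(Y))
  mean(sil[, "sil_width"])
}

# Usage, assuming both packages are installed and return a list with $Y
if (requireNamespace("Rtsne", quietly = TRUE) &&
    requireNamespace("Rtsne.multicore", quietly = TRUE)) {
  iris_unique <- unique(iris)
  mat <- as.matrix(iris_unique[, 1:4])
  set.seed(42)
  y_single <- Rtsne::Rtsne(mat, dims = 3)$Y
  set.seed(42)
  y_multi <- Rtsne.multicore::Rtsne.multicore(mat, dims = 3)$Y
  cat("Rtsne:          ", mean_silhouette(y_single, iris_unique$Species), "\n")
  cat("Rtsne.multicore:", mean_silhouette(y_multi, iris_unique$Species), "\n")
}
```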

Results differ based on number of threads

I observed that the results differ based on the number of threads specified.

My application uses BH-SNE to create a 2D embedding and then clusters it automatically with DBSCAN. I replaced the single-threaded Rtsne call with a call to your multi-threaded Rtsne.multicore. This was nice and easy because the two interfaces are so similar.

However, when I run the application, the results differ ever so slightly, as shown below (only the first ten points for each setting):
Using 1 thread

-4.3473001944841 -9.88816236259427
-0.264536173449281 2.26121958696939
-11.8037471711157 -1.23420653192463
18.5043209507443 -13.4638139443446
1.51823629529208 -27.2209786228982
8.44296382274354 11.5004388863181
17.0385503073606 -19.5842234534257
-1.80122124653633 -35.1542911986375
-14.9339466535662 11.4724805072396
-16.7179891732902 10.300907221322

Using 2 threads

-4.33102494052646 -9.94346771160292
-0.300330796745644 2.47627128482164
-14.4865548712467 3.83169546954971
18.0266761572745 -13.3481838170748
1.55009711170931 -27.3536683521347
8.57133969496983 11.704078885386
16.8146752705904 -19.4804761345993
-1.67702875389705 -35.6116919363096
-16.328562693303 10.9834569354747
-17.9212513482976 10.1738069116024

Using 3 threads

-4.15202535615338 -9.91628914440292
-0.266922842312901 2.30165398545058
-12.0458514750223 -1.26327092092668
18.3116039523395 -13.4472311793933
1.8728867702686 -27.0478452540983
8.21259960134093 11.338018514761
16.938103908809 -19.4664656504238
-1.51129210868152 -35.5926372619633
-15.7107052664802 10.622091607029
-16.9275577907434 10.5760540704756

Using 4 threads

-4.40493207317474 -10.2542865145978
-0.240311071414228 2.34386945654285
-11.613066543124 -1.22167721092907
17.978213066292 -13.6367838896947
1.68103298346623 -27.3950001130062
8.48320430773571 11.5841961868582
16.5975194709815 -19.6467988772466
-1.21063128661383 -35.6738754692542
-16.2962040171112 11.6000609166704
-16.4988660902924 10.7927849813962

The results for a given number of threads seem to be consistent between runs, though, which is good at least :)

Using 1 thread - a second run

-4.3473001944841 -9.88816236259427
-0.264536173449281 2.26121958696939
-11.8037471711157 -1.23420653192463
18.5043209507443 -13.4638139443446
1.51823629529208 -27.2209786228982
8.44296382274354 11.5004388863181
17.0385503073606 -19.5842234534257
-1.80122124653633 -35.1542911986375
-14.9339466535662 11.4724805072396
-16.7179891732902 10.300907221322

And here are the MD5 sums computed over all the points:

cat ./one_threads/one.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
2410c2539be68ffe1f52d1be0f04bfac  -
cat ./one_threads_old/one.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
2410c2539be68ffe1f52d1be0f04bfac  -
cat ./two_threads/two.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
1f7dd4212d74b162420c79e619b3b91b  -
cat ./three_threads/three.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
f659b3527318c9545766fed14fc72daa  -
cat ./four_threads/four.bin.embedding.tsv | awk '{print $1,$2}' | gmd5sum
0e7425b7acf3438d047fb1550bbd069f  -

The differences are hard to spot by eye in a 2D scatterplot, but they do affect the automatic clustering.

Your input is greatly appreciated!

Best,

Cedric
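The following is a sketch of the thread-count comparison above, done in R instead of with md5sum. It runs the same seeded embedding with 1 to 4 threads and reports the largest coordinate difference from the 1-thread run. It uses iris as a stand-in for the reporter's data. It assumes Rtsne.multicore accepts a num_threads argument, based on the traceback in the "Memory not mapped" issue, which shows num_threads being passed to Rtsne_cpp. max_abs_diff() is a hypothetical helper.

```r
# Hypothetical helper: largest absolute coordinate difference between two
# embeddings of the same points.
max_abs_diff <- function(y_ref, y) {
  stopifnot(identical(dim(y_ref), dim(y)))
  max(abs(y_ref - y))
}

# Usage, assuming Rtsne.multicore is installed and accepts num_threads
if (requireNamespace("Rtsne.multicore", quietly = TRUE)) {
  mat <- as.matrix(unique(iris)[, 1:4])
  runs <- lapply(1:4, function(n) {
    set.seed(42)  # same seed for every thread count
    Rtsne.multicore::Rtsne.multicore(mat, num_threads = n)$Y
  })
  for (n in 2:4) {
    cat(n, "threads vs 1 thread, max abs diff:",
        max_abs_diff(runs[[1]], runs[[n]]), "\n")
  }
}
```

A value of exactly 0 would mean that thread count reproduces the 1-thread result bit for bit. Any nonzero value is the kind of drift described in this report.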

Symbol not found: ___kmpc_barrier - OpenMP issue

Hi,
I am trying to install Rtsne.multicore on macOS Sierra and am running into problems. I know this is not a package problem, but I am hoping you might have some insight. I installed a clang compiler with OpenMP support, but I still get this error when I try to install the package:

devtools::install_github("RGLab/Rtsne.multicore")

The error is as follows:
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object '/usr/local/lib/R/3.3/site-library/Rtsne.multicore/libs/Rtsne.multicore.so':
  dlopen(/usr/local/lib/R/3.3/site-library/Rtsne.multicore/libs/Rtsne.multicore.so, 6): Symbol not found: ___kmpc_barrier
  Referenced from: /usr/local/lib/R/3.3/site-library/Rtsne.multicore/libs/Rtsne.multicore.so
  Expected in: flat namespace
 in /usr/local/lib/R/3.3/site-library/Rtsne.multicore/libs/Rtsne.multicore.so
Error: loading failed
Execution halted

A Stack Overflow answer gives some hints:
http://stackoverflow.com/questions/13715979/parallel-program-giving-error-undefined-reference-to-kmpc-ok-to-fork

Any suggestion is welcome!
Thanks!
Hena

Memory not mapped

> library(Rtsne.multicore) # Load package                  
> iris_unique <- unique(iris) # Remove duplicates          
> mat <- as.matrix(iris_unique[,1:4])                      
> set.seed(42) # Sets seed for reproducibility             
> tsne_out <- Rtsne.multicore(mat) # Run TSNE    

*** caught segfault ***
address 0x6541, cause 'memory not mapped'

Traceback:
1: .Call("_Rtsne_multicore_Rtsne_cpp", PACKAGE = "Rtsne.multicore", X, no_dims_in, perplexity_in, theta_in, num_threads, max_iter, distance_precomputed)
2: Rtsne_cpp(X, dims, perplexity, theta, num_threads, max_iter, is_distance)
3: eval(expr, pf)
4: eval(expr, pf)
5: withVisible(eval(expr, pf))
6: evalVis(expr)
7: capture.output(res <- Rtsne_cpp(X, dims, perplexity, theta, num_threads, max_iter, is_distance))
8: Rtsne.multicore.default(mat)
9: Rtsne.multicore(mat)

I have no idea how to inspect a core dump...

This happens only sometimes, and I cannot reliably reproduce the error.
What further info should I provide?
