Comments (5)

jlmelville commented on June 13, 2024

Thanks for the report, there might be a bug here, but I'll need to do some checking.

Just to follow up on your last question now: if you use the correlation distance, then the underlying Annoy calculation uses the cosine distance. This is because the correlation distance is equivalent to the cosine distance after mean-centering each row. So the annoy_metric bit is just an implementation detail.
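
As a rough illustration of that equivalence (a base R sketch only, not uwot's or Annoy's actual code; the helper names `cosine_dist` and `correlation_dist` are made up for this example), the correlation distance between two vectors matches the cosine distance between their mean-centered versions:

```r
# Cosine distance between two numeric vectors: 1 - cos(angle between them).
cosine_dist <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Correlation distance: 1 - Pearson correlation.
correlation_dist <- function(a, b) {
  1 - cor(a, b)
}

set.seed(42)
x <- rnorm(10)
y <- rnorm(10)

# Mean-center each vector (each "row"), then take the cosine distance.
centered <- cosine_dist(x - mean(x), y - mean(y))

# The two agree up to floating-point error.
all.equal(correlation_dist(x, y), centered)

# Without centering, the cosine distance is generally a different number.
cosine_dist(x, y)
```

So if the mean-centering step is skipped, the same index returns plain cosine distances rather than correlation distances.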

from uwot.

mdrnao commented on June 13, 2024

Thanks for the swift reply and clarification! If there is anything I can do to assist, let me know.

I just double-checked the NN idx and dist output of the transform function using the fresh and loaded-in versions of the same model. I get the same neighbour indices, but the distances are different. Since the neighbours match, this should not be a problem for the embeddings, but I was hoping to use the NN correlation distances.

the fresh model:

> umap.trans$nn$correlation$dist[1:5,1:5]
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.2502186 0.2512920 0.2519980 0.2521188 0.2528236
[2,] 0.2529842 0.2531742 0.2532540 0.2532705 0.2550313
[3,] 0.2392112 0.2398928 0.2402835 0.2411647 0.2420704
[4,] 0.2447240 0.2447956 0.2453944 0.2458417 0.2471814
[5,] 0.2545708 0.2578911 0.2581325 0.2582460 0.2583417

and the loaded in version:

> umap.trans2$nn$cosine$dist[1:5,1:5]
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.7458150 0.7461820 0.7464244 0.7464666 0.7467052
[2,] 0.7494166 0.7494699 0.7495013 0.7495075 0.7500989
[3,] 0.7464465 0.7466761 0.7468034 0.7470980 0.7474063
[4,] 0.7483876 0.7484159 0.7486170 0.7487704 0.7492193
[5,] 0.7448720 0.7460172 0.7460974 0.7461340 0.7461688

Here are the first few lines of the verbose output from the transform function with the former:

15:49:04 Setting model random seed 42
15:49:04 Read 36 rows and found 16824 numeric columns
15:49:04 Processing block 1 of 1
15:49:04 Annoy search: subtracting row means for correlation
15:49:04 Writing NN index file to temp file /tmp/user/444605590/RtmpIsZEeM/file26ee2758fd73
15:49:05 Searching Annoy index using 36 threads, search_k = 7500
15:49:05 Commencing smooth kNN distance calibration using 36 threads with target n_neighbors = 75

but with the latter, the "subtracting row means for correlation" step is missing:

15:49:24 Setting model random seed 42
15:49:24 Read 36 rows and found 16824 numeric columns
15:49:24 Processing block 1 of 1
15:49:24 Writing NN index file to temp file /tmp/user/444605590/RtmpIsZEeM/file26ee310a685b
15:49:25 Searching Annoy index using 36 threads, search_k = 7500
15:49:26 Commencing smooth kNN distance calibration using 36 threads with target n_neighbors = 75

mdrnao commented on June 13, 2024

For what it's worth: if I change the loaded-in model's nn_index metric to 'correlation', I restore the behaviour of a fresh model, and the transform function returns correlation values again.

jlmelville commented on June 13, 2024

@mdrnao yes, this is definitely a bug. I just pushed a fix, so it will be resolved in the next release of uwot. I don't know if this is feasible for your workflow, but until uwot is updated, a workaround is to do what you did: set nn_index$metric back to correlation after calling load_uwot. (There are some ongoing dependency issues with the irlba package that may make submitting a new version a bit painful until they are remedied.)
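
For anyone else who hits this, the workaround might look roughly like the sketch below. It makes several assumptions: the iris data stands in for the real input, `nn_method = "annoy"` is passed so that an Annoy index is built, the model was trained with `metric = "correlation"`, and `nn_index$metric` is the field described in this thread. On a uwot version that already contains the fix, the metric should already be correct and the patch is skipped.

```r
library(uwot)

# Stand-in data; the model in this thread was built on a much wider matrix.
X <- as.matrix(iris[, 1:4])

set.seed(42)
model <- umap(X, metric = "correlation", nn_method = "annoy",
              ret_model = TRUE)

# Save the model and load it back, as in the report.
model_file <- tempfile("uwot-model")
save_uwot(model, file = model_file)
loaded <- load_uwot(model_file)

# Workaround from this thread: restore the metric if loading changed it
# (affected versions report "cosine" here).
if (!identical(loaded$nn_index$metric, "correlation")) {
  loaded$nn_index$metric <- "correlation"
}

# Ask the transform to return the nearest neighbour data as well.
res <- umap_transform(X[1:5, ], loaded, ret_extra = "nn")
names(res$nn)
```

With the metric restored, the neighbour distances returned by the transform should be stored under the correlation entry again, as they are for a freshly trained model.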

Thank you for the assistance in tracking down what was happening and apologies for the oversight.

mdrnao commented on June 13, 2024

Ah, thank you so much! The workaround is absolutely fine for me.

Good luck with the new version, and thanks again for your support.
