Comments (5)
Thanks for the report, there might be a bug here, but I'll need to do some checking.
Just to follow up on your last question now: if you use the correlation distance, then the underlying Annoy calculation uses the cosine distance. This is because the correlation distance is equivalent to the cosine distance after mean-centering each row. So the annoy_metric bit is just an implementation detail.
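As a quick illustration of that equivalence, here is a minimal sketch in base R. It uses random vectors rather than anything from uwot or Annoy, and the helper name cosine_dist is made up for the example:

```r
# Two arbitrary numeric vectors standing in for two rows of a data matrix
set.seed(1)
x <- rnorm(10)
y <- rnorm(10)

# Hypothetical helper: cosine distance = 1 - cosine similarity
cosine_dist <- function(a, b) {
  1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Correlation distance = 1 - Pearson correlation
correlation_dist <- 1 - cor(x, y)

# Cosine distance after subtracting each vector's own mean
centered_cosine <- cosine_dist(x - mean(x), y - mean(y))

# The two agree up to floating point error
same <- isTRUE(all.equal(correlation_dist, centered_cosine))
print(same)
```

Without the mean-centering step the cosine distance is generally a different number, which is consistent with the differing distances reported further down the thread.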
from uwot.
Thanks for the swift reply and clarification! If there is anything I can do to assist, let me know.
I just double-checked the NN idx and dist output using the fresh and loaded-in versions of the same model in the transform function, and I do get the same indexed neighbours but the distances are different. The results should be the same and not a problem for the embeddings, but I was hoping to utilise the NN correlations.
the fresh model:
> umap.trans$nn$correlation$dist[1:5,1:5]
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.2502186 0.2512920 0.2519980 0.2521188 0.2528236
[2,] 0.2529842 0.2531742 0.2532540 0.2532705 0.2550313
[3,] 0.2392112 0.2398928 0.2402835 0.2411647 0.2420704
[4,] 0.2447240 0.2447956 0.2453944 0.2458417 0.2471814
[5,] 0.2545708 0.2578911 0.2581325 0.2582460 0.2583417
and the loaded in version:
> umap.trans2$nn$cosine$dist[1:5,1:5]
          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.7458150 0.7461820 0.7464244 0.7464666 0.7467052
[2,] 0.7494166 0.7494699 0.7495013 0.7495075 0.7500989
[3,] 0.7464465 0.7466761 0.7468034 0.7470980 0.7474063
[4,] 0.7483876 0.7484159 0.7486170 0.7487704 0.7492193
[5,] 0.7448720 0.7460172 0.7460974 0.7461340 0.7461688
if we look at the first few lines of the verbose output from the transform function with the former:
15:49:04 Setting model random seed 42
15:49:04 Read 36 rows and found 16824 numeric columns
15:49:04 Processing block 1 of 1
15:49:04 Annoy search: subtracting row means for correlation
15:49:04 Writing NN index file to temp file /tmp/user/444605590/RtmpIsZEeM/file26ee2758fd73
15:49:05 Searching Annoy index using 36 threads, search_k = 7500
15:49:05 Commencing smooth kNN distance calibration using 36 threads with target n_neighbors = 75
but with the latter:
15:49:24 Setting model random seed 42
15:49:24 Read 36 rows and found 16824 numeric columns
15:49:24 Processing block 1 of 1
15:49:24 Writing NN index file to temp file /tmp/user/444605590/RtmpIsZEeM/file26ee310a685b
15:49:25 Searching Annoy index using 36 threads, search_k = 7500
15:49:26 Commencing smooth kNN distance calibration using 36 threads with target n_neighbors = 75
For what it's worth - if I change the loaded in model nn_index metric to 'correlation' I restore the behaviour from a fresh model, returning correlation values from the transform function.
@mdrnao yes, this is definitely a bug and I just pushed a fix, so it will be fixed in the next release of uwot. Although I don't know if this is feasible for your workflow, doing what you did by changing the nn_index$metric back to correlation after a call to load_uwot would be a workaround until uwot is updated (there are some ongoing dependency issues with the irlba package that may make submitting a new version a bit painful until those get remedied).
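A minimal sketch of that workaround, assuming the reloaded model is an R list whose nn_index element carries a metric field, as described in this thread. The helper name restore_correlation_metric and the file path in the commented usage are hypothetical, and the layout of nn_index may differ between uwot versions:

```r
# Hypothetical helper: reset the stored metric on a reloaded model.
# Only meant for models originally trained with metric = "correlation".
restore_correlation_metric <- function(model) {
  if (identical(model$nn_index$metric, "cosine")) {
    model$nn_index$metric <- "correlation"
  }
  model
}

# Stand-in for a reloaded model so the sketch runs without uwot installed
mock_model <- list(nn_index = list(metric = "cosine"))
fixed <- restore_correlation_metric(mock_model)
print(fixed$nn_index$metric)

# With uwot itself (the path is a placeholder):
# model <- uwot::load_uwot("model.uwot")
# model <- restore_correlation_metric(model)
```

Models trained with any other metric pass through unchanged, so the helper should be harmless once a fixed release of uwot is installed.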
Thank you for the assistance in tracking down what was happening and apologies for the oversight.
Ah thank you so much! The workaround is absolutely fine for me.
Good luck with the new version, and thanks again for your support.