Comments (7)
Well, that was a lot more work than I intended. For the record (since others may face this, and so that I will remember in the future): the issue is that numba very cleverly swaps out np.random calls for something lower level (to avoid round trips back to Python, I presume), and this does not (or may not) play nicely with setting a random seed for numpy. Once I worked out what the problem was and rewrote everything to deal with it, the issue resolved itself nicely and we get something repeatable. I believe setting random_state now works and should provide consistent embeddings (with a consistent random state).
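To illustrate the general idea of not depending on a global numpy seed, here is a minimal sketch in which the random number generator's state is passed around explicitly, so the same seed always gives the same draws. This is not the code that went into umap; the generator (a plain 32-bit xorshift) and the names xorshift32 and sample_indices are assumptions made up for this example.

```python
MASK32 = 0xFFFFFFFF  # keep all arithmetic within 32 bits


def xorshift32(state):
    """Advance a one-element list holding the RNG state; return a float in [0, 1).

    Hypothetical helper: the state lives in a mutable list so the caller
    owns it, instead of relying on any global seed.
    """
    x = state[0] & MASK32
    x ^= (x << 13) & MASK32
    x ^= x >> 17
    x ^= (x << 5) & MASK32
    state[0] = x
    return x / 2**32


def sample_indices(n_samples, n_items, seed):
    """Draw n_samples indices in [0, n_items) from an explicitly seeded state."""
    # xorshift must not start from 0, so fall back to 1 in that case
    state = [(seed & MASK32) or 1]
    return [int(xorshift32(state) * n_items) for _ in range(n_samples)]


first = sample_indices(5, 10, seed=42)
second = sample_indices(5, 10, seed=42)
print(first == second)  # same seed, same explicit state, same draws
```

Because the state is an ordinary argument, nothing else in the process can reset or advance it behind the caller's back, which is the property a compiled code path needs for repeatable results.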
from umap.
I agree, that would make a lot of sense. Ideally it won't be too hard, but there are some quirks given how I am currently handling random number generation. It is certainly on my list of things to do (which is unfortunately long).
It might be a good idea to publish a roadmap, so that the community will be able to contribute!
Sounds like a good plan -- any suggestions for where and how best to do that?
Well, an issue on GitHub will do. People will add comments, and you will be able to update the issue after each release.
Basic random seed support is now in place via the random_seed parameter, which takes an int. Ideally things would work a little differently, as per standard sklearn, with a random_state that supports more input types (e.g. numpy random states), but that will take a little thought as to the best way to do it.
Edit: and it doesn't actually achieve the desired result :-( Not sure why, though. It should still provide slightly more consistency.
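For reference, the sklearn-style random_state mentioned above usually means accepting None, an int, or an existing numpy RandomState and normalizing them into one object. A minimal sketch of that normalization follows, assuming only numpy; the name as_random_state is hypothetical, and scikit-learn ships its own helper for this (sklearn.utils.check_random_state), whose behaviour differs in some details (for example, how None is handled).

```python
import numbers

import numpy as np


def as_random_state(seed=None):
    """Turn None, an int, or a RandomState into a numpy RandomState.

    Hypothetical helper sketching the sklearn convention; not umap's code.
    """
    if seed is None:
        # No seed given: a fresh, unpredictably seeded generator
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        # Integer seed: a new generator that is reproducible for that seed
        return np.random.RandomState(int(seed))
    if isinstance(seed, np.random.RandomState):
        # Already a generator: use it as-is so the caller keeps control of it
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


rng_a = as_random_state(42)
rng_b = as_random_state(42)
print((rng_a.randint(0, 100, 5) == rng_b.randint(0, 100, 5)).all())
```

Accepting an existing generator lets a caller share one stream of random numbers across several estimators, while an int keeps the simple "same seed, same result" usage.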
Okay, that helps more. I have a nagging feeling there will be more minor things like the eigenvector solver to track down if I want to truly eliminate variability.