Comments (5)
UMAP will tend pull outliers in. It will find extreme outliers, but this is not the approach you probably want. I think the 'outlier' notions in this gist are more what you are after. Ultimately this is a sort of co-UMAP (reverse the arrows) for clustering, and dual co-UMAP for outlier detection. I haven't written code to do all of this efficiently yet, but it is on my todo list.
from umap.
@lmcinnes so essentially first pre-filter with hdbscan and then apply umap?
from umap.
I think it really depends on what you are trying to do, but yes, something like that would represent something that bears similarities to Robust PCA. The again I think you really want some sort of regularized UMAP to do that properly. I would have to think about what that would mean/look like -- certainly an intriguing problem. Thanks for the ideas!
from umap.
Sometimes the outliers are so bad that it is hard to regularize them, just excluding is easier. For example that's why I like RANSAC regression more than regularizers for linear regression.
from umap.
That makes a lot of sense -- it does certainly depend on the data and your use case. At that rate filtering things out with hdbscan would probably work well.
from umap.
Related Issues (20)
- Setting a random state still leads to stochastic results
- Implementation of sciki-learn's get_feature_names_out() API is not correct
- Is 'n_training_epochs' working for parameteric UMAP?
- visualize video data
- How to combine UMAP models in new data?
- Edit instructions to make them compatible with zsh
- Empty API page on UMAP API Guide? HOT 1
- PCA diagnostic error HOT 2
- Speed inquries HOT 2
- UMAP crashes when torch also imported before first run HOT 2
- Unable to pickle trained UMAP instance
- Reducing Model Size for UMAP on Large Datasets HOT 2
- umap.UMAP accepts strings as n_neighbors and min_dist, causing later failures
- Optimal dimensions
- RunUMAP Failing HOT 1
- Semi-deterministic output even though randon_state is set
- TypeError: Dispatcher._rebuild() got an unexpected keyword argument 'impl_kind' HOT 1
- illegal hardware instruction python HOT 2
- Transform new input with composite model HOT 1
- Inquiry on Utilizing UMAP for Text Similarity and Clustering HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from umap.