Hi @sgbaird! Thank you for your interest in TopOMetry.
it might not be clear when to use UMAP vs. TopOMetry
I had assumed people would read the introduction, specifically pages 6-8 of our preprint. I'm not telling people to 'ditch UMAP'. TopOMetry's assumptions about data structure are looser than UMAP's. We essentially assume only that the number of nearest neighbors k, divided by the total number of samples n, approaches zero (i.e., that the data comprises a set of topological manifolds, so that we can do calculus on them). When data topology is highly non-uniform, as is common in biological data, TopOMetry recovers finer detail, as in the PBMC68K example (Fig. 2 of the manuscript). Even in non-biological data, such as in Natural Language Processing, TopOMetry can better separate clusters and provide denoised affinity matrices on which further clustering algorithms can be trained. An important hint that data may fall outside UMAP's assumptions is when the UMAP and TopOMetry embeddings differ markedly.
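If it helps to make "differ markedly" concrete, one hedged option (an assumption on my part, not a TopOMetry feature) is to measure how many k-nearest neighbors each sample keeps across two embeddings of the same data, e.g. a UMAP embedding and a TopOMetry embedding. The sketch below uses only NumPy; `knn_indices` and `knn_overlap` are hypothetical helpers, and any threshold for "too different" is left to the user.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k nearest neighbors of each row (self excluded)."""
    # Brute-force squared Euclidean distances; only suitable for small inputs.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_overlap(emb_a: np.ndarray, emb_b: np.ndarray, k: int = 10) -> float:
    """Mean Jaccard overlap of k-NN sets in two embeddings of the same samples.

    1.0 means every sample has identical neighborhoods in both embeddings;
    values near 0 mean the embeddings largely disagree on local structure.
    """
    if emb_a.shape[0] != emb_b.shape[0]:
        raise ValueError("embeddings must contain the same samples in the same order")
    nn_a = knn_indices(emb_a, k)
    nn_b = knn_indices(emb_b, k)
    scores = []
    for a, b in zip(nn_a, nn_b):
        sa, sb = set(a.tolist()), set(b.tolist())
        scores.append(len(sa & sb) / len(sa | sb))
    return float(np.mean(scores))
```

Because the distances are computed brute-force, memory grows as O(n²); for real datasets an approximate kNN backend would be the natural substitute.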
A second point is that TopOMetry is intended to be a comprehensive framework. Separate steps can be pipelined at the user's will (e.g., use only a diffusion model followed by a specific layout technique, or reuse the same model across several steps). I'm not saying the default workflows are necessarily the best, nor that they are the best methods for approximating the Laplace-Beltrami operator (LBO); they are simply the ones currently resting on the most solid mathematical ground. The idea is that TopOMetry works within a scikit-learn compatible workflow, and that users can use its approximate kNN, affinity learning, orthogonal decomposition, and layout optimization modules separately, in any combination, as they see fit. My intent is to invite the community to share thoughts, contributions, and extensions building on this initial work. After all, I have done everything so far by myself.
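To illustrate the modular idea (kNN, then affinity learning, then orthogonal decomposition, then layout), here is a minimal NumPy sketch. It is not TopOMetry's API: every function name is hypothetical, exact brute-force kNN stands in for approximate kNN, an adaptive-bandwidth Gaussian kernel is just one possible affinity, and the decomposition is a plain diffusion-maps eigendecomposition. Each stage is a separate function so any one of them can be swapped out.

```python
import numpy as np


def knn_graph(X, k):
    """Stage 1: exact k-NN (a stand-in for an approximate k-NN backend)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :k]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx, dist


def gaussian_affinity(idx, dist, n):
    """Stage 2: symmetric Gaussian affinities with a per-point adaptive bandwidth."""
    sigma = dist[:, -1] + 1e-12  # bandwidth = distance to the k-th neighbor
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), idx.shape[1])
    W[rows, idx.ravel()] = np.exp(-(dist**2) / (sigma[:, None] * sigma[idx])).ravel()
    return np.maximum(W, W.T)  # symmetrize the directed kNN graph


def diffusion_eigenmaps(W, n_components):
    """Stage 3: orthogonal decomposition of the normalized affinity matrix."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2 (symmetric)
    evals, evecs = np.linalg.eigh(A)  # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    psi = d_inv_sqrt[:, None] * evecs  # right eigenvectors of the Markov matrix D^-1 W
    # Drop the trivial constant eigenvector (eigenvalue 1), scale by eigenvalues.
    return psi[:, 1:n_components + 1] * evals[1:n_components + 1]


def run_pipeline(X, k=10, n_components=2):
    """Chain the stages; each call could be replaced independently."""
    idx, dist = knn_graph(X, k)
    W = gaussian_affinity(idx, dist, X.shape[0])
    return diffusion_eigenmaps(W, n_components)
```

The returned coordinates could then be handed to any layout method (e.g., UMAP or PaCMAP) as the final stage; that step is omitted to keep the sketch dependency-free. If the kNN graph is disconnected, eigenvalue 1 repeats and the leading coordinates mostly encode the components, which is one reason real implementations check connectivity.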
Might consider adding to the docs and/or README
I'm indeed considering it, as this was the first question I received after sharing the manuscript. I will do so this week, along with some new tutorials.
Came across this via Leland's twitter post btw.
Prof. Leland was very helpful, sharing his insights and believing in me in the early stages of this project. I'm thankful he shared this. UMAP is seminal, groundbreaking work, and if I have seen a little further, it is by standing on the shoulders of giants.