Giter VIP home page Giter VIP logo

Comments (6)

lmcinnes avatar lmcinnes commented on April 28, 2024 7

The complexity is essentially bounded by the nearest neighbor search phase. By making use of algorithms for approximate nearest neighbor graph construction I can get the scaling with input dimension down to essentially linear (effectively scaling with regard to the complexity of the distance computation). With regard to number of sample the scaling is a little harder to pin down (the algorithm I'm using, NN-descent, doesn't have a published theoretical complexity) and is, of course, dataset dependent (i.e. it will depend on properties of the distribution of the samples). Empirically I believe it should be something like O(d*n^1.14) or so. A better theoretical analysis of the complexity is definitely on my list of things to do, but its currently a long list.

from umap.

lmcinnes avatar lmcinnes commented on April 28, 2024 2

It certainly does, but it is an empirical estimate of an average case complexity. I have not seen any complete estimates of NNDescent complexity. Some work by Richard Darling and Jacob Baron demonstrate both some cases where NNDescent will fail (with complexity greater than O(N^2)), and classes of problem for which a complexity of O(N log N) is provable. The catch is that both are relatively theoretical cases: the first involves metrics that fail friend-of-a-friend principles, which isn't true for most of the metrics UMAP would be used with; the second is a fairly constrained class of problem that one is unlikely to see in practice (but for which complexity is at least provable).

So, given that worst cases do exist, I guess the real answer (though it is a highly deceptive one, and would equally apply to any algorithm that requires nearest neighbor computations) is O(N^2). For practical purposes it certainly empirically scales at O(N log N) and there are certainly problem classes for which that is the provable complexity, so perhaps that is a reasonable estimate?

from umap.

jc-healy avatar jc-healy commented on April 28, 2024 2

from umap.

matanox avatar matanox commented on April 28, 2024 1

The max statistic is useful in matching the human trait of taking assurance and building social trust by being able to commit to some amount with what they consider certainty. It's helpful for planning. Of course the finer view of considering the distribution of the complexity can be very useful in augmenting that approach.

If we have a particular way to frame the factors affecting this algorithm's runtime complexity beyond just the size of the input, or even testing/sampling the data for predicting the runtime, that might be pragmatically helpful.

from umap.

vmarkovtsev avatar vmarkovtsev commented on April 28, 2024

Thanks a lot!

from umap.

matanox avatar matanox commented on April 28, 2024

O(d*n^1.14) sounds almost too good to be true doesn't it?

from umap.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.