Comments (9)

TKlerx commented on August 19, 2024

May I add some things?!

I have seen some comparisons of doubles using ==, which is not numerically robust, e.g.

if(a == b && a == x)

in distributions.Uniform:26 (and in many other lines).
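To illustrate the general concern, here is a minimal Java sketch with no external dependencies. It shows an exact == comparison failing after ordinary rounding, next to a tolerance measured in units in the last place (ULPs). The class and the `nearlyEqual` helper are hypothetical names made up for this example. The helper is only loosely in the spirit of ULP-based comparison and is not the Apache Commons Math implementation.

```java
final class UlpCompareSketch {
    /**
     * Hypothetical helper: true when a and b differ by at most maxUlps
     * units in the last place, measured at the larger magnitude.
     */
    static boolean nearlyEqual(double a, double b, int maxUlps) {
        if (a == b) {
            return true; // exact match, including equal infinities
        }
        if (Double.isNaN(a) || Double.isNaN(b)
                || Double.isInfinite(a) || Double.isInfinite(b)) {
            return false; // NaN equals nothing; unequal infinities never match
        }
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= maxUlps * Math.ulp(scale);
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);               // false
        System.out.println(nearlyEqual(sum, 0.3, 4)); // true
    }
}
```

A ULP-based tolerance scales with the magnitude of the operands, which is why it is often preferred over a fixed epsilon when the value range is not known in advance.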

I would change every occurrence in the code, but usually I use apache-commons-math.Precision.
Is it your goal to keep JSAT without any external dependencies?
References:
[https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/]
[https://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math3/util/Precision.html#equals%28double,%20double,%20int%29]
[http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#689]

In addition, I already corrected some java docs. You can see it in my dev branch. Hell, you wrote a lot of code. How long have you been developing JSAT?

from jsat.

EdwardRaff commented on August 19, 2024

May I add some things?!

Of course!

in distributions.Uniform:26 (and in many other lines).

In that case it is actually safe. if a == b && a == x is the case where the uniform distribution is over one infinitesimal point, which really doesn't make sense, but is needed because otherwise the (a <= x && x <= b) case would apply, and result in a NaN from a division by zero.

I'm generally aware of the dangers of floating point, but there are instances where such == checks are done intentionally, such as the case in the Uniform distribution. There are a few other instances of code smells in JSAT that are done intentionally; hopefully I did a better job documenting them than I did that one in Uniform.java.
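The degenerate case described above can be sketched as follows. This is not the code in Uniform.java: the class, the method names, and the value returned by the guard are assumptions chosen for illustration, using a CDF-style formula where the zero denominator is easy to see. In Java, 0.0 / 0.0 yields NaN, while 1.0 / 0.0 yields Infinity.

```java
final class DegenerateUniformSketch {
    /** Unguarded CDF of Uniform(a, b); hypothetical, for illustration only. */
    static double cdfUnguarded(double x, double a, double b) {
        if (x < a) {
            return 0.0;
        }
        if (x > b) {
            return 1.0;
        }
        return (x - a) / (b - a); // 0.0 / 0.0 is NaN when a == b == x
    }

    /** Same formula with an exact == guard for the single-point case. */
    static double cdfGuarded(double x, double a, double b) {
        if (a == b && a == x) {
            return 1.0; // assumed convention: all mass sits on the single point
        }
        return cdfUnguarded(x, a, b);
    }

    public static void main(String[] args) {
        System.out.println(cdfUnguarded(2.0, 2.0, 2.0)); // NaN
        System.out.println(cdfGuarded(2.0, 2.0, 2.0));   // 1.0
    }
}
```

The exact comparison is appropriate here because it targets precisely the inputs that make the denominator zero; a tolerance-based check would also catch narrow but valid intervals.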

I already corrected some java docs.

Feel free to do a pull request on them!

Hell, you wrote a lot of code. How long have you been developing JSAT?

Been a free time project for just over 4 years now :) I'm very much a "learn by doing" kinda guy, so whenever I see a paper I'm interested in I just go and try and implement it when I have the time. Though I did have more free time before working for money :-/

TKlerx commented on August 19, 2024

Feel free to do a pull request on them!

I will try to do so, but I already added some things which I would say are necessary (apache-commons-math for safe double comparison; plan to add more, e.g. trove for primitive lists).
My plan is to create a Maven repo that contains my JSAT version so I also changed the pom.xml and will even add more stuff.
I will try to cherry pick the safe changes without adding dependencies etc.

I ran findbugs on the project and it found some bad double compares and x == Double.NaN checks.
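The x == Double.NaN pattern is worth a short illustration, since it can never succeed: under IEEE 754 rules, NaN compares unequal to everything, including itself. The sketch below uses hypothetical class and method names and contrasts the broken check with the standard `Double.isNaN`.

```java
final class NaNCheckSketch {
    /** Broken check: always false, because NaN == NaN is false. */
    static boolean brokenIsNaN(double x) {
        return x == Double.NaN;
    }

    /** Correct check using the standard library. */
    static boolean isNaN(double x) {
        return Double.isNaN(x);
    }

    public static void main(String[] args) {
        double nan = 0.0 / 0.0;
        System.out.println(brokenIsNaN(nan)); // false
        System.out.println(isNaN(nan));       // true
        System.out.println(nan != nan);       // true: equivalent self-inequality test
    }
}
```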
What do you think about including some de facto standard libraries for such hassles?

EdwardRaff commented on August 19, 2024

As I said, some of those checks are done intentionally.

Right now I do not want to have any dependencies in JSAT.

Also, FYI - there is some code where you will experience a serious performance regression in using Trove instead of the code I have for primitive maps. (And some code that is on my local repo and not committed yet).

TKlerx commented on August 19, 2024

As I said, some of those checks are done intentionally.

I think I found some places where the checks should not be done but I will write a testcase first which may take some time. Will let you know when it's done.

Do you have any advice on how to compare two doubles for equality in JSAT without an external library? I have an array of doubles and want to check whether they are all the same.

Right now I do not want to have any dependencies in JSAT.

Ok :(

Also, FYI - there is some code where you will experience a serious performance regression in using Trove instead of the code I have for primitive maps. (And some code that is on my local repo and not committed yet).

I would use them e.g. instead of List to prevent a lot of autoboxing (I had an algorithm where removing boxing/unboxing sped it up by a factor of 2).
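For readers unfamiliar with the boxing issue, here is a minimal sketch of the difference. `SimpleDoubleList` is a hypothetical class written for this example; it is neither JSAT's nor Trove's implementation. The boxed version stores each element as a separate `Double` object, while the primitive version keeps values in a plain `double[]`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PrimitiveListSketch {
    /** Hypothetical minimal growable list backed by a double[]. */
    static final class SimpleDoubleList {
        private double[] data = new double[8];
        private int size = 0;

        void add(double v) {
            if (size == data.length) {
                data = Arrays.copyOf(data, size * 2); // grow geometrically
            }
            data[size++] = v;
        }

        double get(int i) {
            if (i < 0 || i >= size) {
                throw new IndexOutOfBoundsException("index " + i);
            }
            return data[i];
        }

        int size() {
            return size;
        }
    }

    /** Each iteration unboxes a Double object. */
    static double sumBoxed(List<Double> xs) {
        double s = 0.0;
        for (double x : xs) {
            s += x;
        }
        return s;
    }

    /** No boxing: values are read straight from the array. */
    static double sumPrimitive(SimpleDoubleList xs) {
        double s = 0.0;
        for (int i = 0; i < xs.size(); i++) {
            s += xs.get(i);
        }
        return s;
    }

    public static void main(String[] args) {
        List<Double> boxed = new ArrayList<>();
        SimpleDoubleList primitive = new SimpleDoubleList();
        for (int i = 1; i <= 100; i++) {
            boxed.add((double) i); // allocates or reuses a Double wrapper
            primitive.add(i);      // stores the raw value
        }
        System.out.println(sumBoxed(boxed) == sumPrimitive(primitive)); // true
    }
}
```

Whether this yields a measurable speedup depends on the workload and the JVM, so it should be benchmarked rather than assumed.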

EdwardRaff commented on August 19, 2024

Feel free to email me the places where you have concerns and I will review them

You need to decide what level of "sameness" you need. In ML we usually don't care about differences in value smaller than 1e-3, so checking abs(a-b) < 1e-3 should be good for most cases. In the Vec class there is a compare to method that takes a threshold on the absolute difference.
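Applied to the array question, the absolute-threshold idea could look like the sketch below. The class and method names are hypothetical and this is not the Vec method mentioned above. It treats the values as "all the same" when the spread between the smallest and largest element is below the tolerance, which is equivalent to checking abs(a-b) < tol for every pair.

```java
final class ToleranceSketch {
    /**
     * Hypothetical helper: true when every pair of values differs by less
     * than tol in absolute terms. An array containing NaN is never "all same".
     */
    static boolean allWithin(double[] values, double tol) {
        if (values.length == 0) {
            return true; // vacuously the same
        }
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            if (Double.isNaN(v)) {
                return false;
            }
            if (v < min) {
                min = v;
            }
            if (v > max) {
                max = v;
            }
        }
        return max - min < tol;
    }

    public static void main(String[] args) {
        System.out.println(allWithin(new double[] {1.0, 1.0004, 0.9998}, 1e-3)); // true
        System.out.println(allWithin(new double[] {1.0, 1.002}, 1e-3));          // false
    }
}
```

Comparing against the min/max spread avoids the drift that can occur when each element is only compared with its neighbour.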

As I said, I have my own primitive collections in JSAT, I did not say to use generics.

TKlerx commented on August 19, 2024

As I said, I have my own primitive collections in JSAT, I did not say to use generics.

Ahh, found them by now. Is there a reason why you use List/List then? E.g. NaiveBayes.getSampleVariableVector (l. 388)

EdwardRaff commented on August 19, 2024

That code was written before I created the DoubleList class, and I missed changing that one.

semper-omnia-paratus commented on August 19, 2024

I would separate abstract classes/interfaces from concrete implementations by putting them into sub-packages.
