
Comments (3)

syclik commented on June 18, 2024

@jsocolar, thanks for bringing this up. I'm putting up a minimal example in C++ to walk through it.

If you pop this into a file called test/unit/math/normal_lcdf_test.cpp, you'll be able to run it from the command line like: python runTests.py test/unit/math/normal_lcdf_test.cpp

#include <stan/math.hpp>
#include <gtest/gtest.h>
#include <cmath>

TEST(normal_lcdf, inf_y) {
  double y = stan::math::INFTY;
  stan::math::var mu = 0.0;
  stan::math::var sigma = 1.0;

  stan::math::var target = stan::math::normal_lcdf(y, mu, sigma);

  // log(CDF) at y = +inf is log(1) = 0.
  EXPECT_FLOAT_EQ(0.0, target.val());

  // Propagate adjoints back to mu and sigma.
  target.grad();

  // Observed behavior: the partial wrt mu is 0, the partial wrt sigma is NaN.
  EXPECT_FLOAT_EQ(0.0, mu.adj());
  EXPECT_TRUE(std::isnan(sigma.adj()));

  stan::math::recover_memory();
}

> When y is infinite (and given that mu and sigma are finite, here ensured by declaring them as parameters), then the gradient is zero. However, it seems that Stan yields infinite gradients here.

Just curious... how'd you come to this conclusion? And which gradient did you think was "infinite"? (The gradient has two elements.) It'd be good to know how you're thinking about what the math library does and how it's generating gradients, especially at the boundary conditions.

To get into the example, if you look at the test:

  • d normal_lcdf(y, mu, sigma) / dmu = 0
  • d normal_lcdf(y, mu, sigma) / dsigma = NaN

(NaN is not the same as infinity, but that doesn't make it a good result.)

I think you're expecting d / dsigma = 0? Is that right?
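For reference, here is a short sketch of why zero is the natural expectation for both partials. It assumes the usual closed form, with \phi and \Phi denoting the standard normal density and CDF and z a shorthand introduced here; it says nothing about how the library actually evaluates these terms.

```latex
\[
  \texttt{normal\_lcdf}(y \mid \mu, \sigma) = \log \Phi(z),
  \qquad z = \frac{y - \mu}{\sigma}
\]
\[
  \frac{\partial}{\partial \mu} \log \Phi(z) = -\frac{\phi(z)}{\sigma\,\Phi(z)},
  \qquad
  \frac{\partial}{\partial \sigma} \log \Phi(z) = -\frac{z\,\phi(z)}{\sigma\,\Phi(z)}
\]
\[
  y \to \infty \;\Rightarrow\; z \to \infty,\quad
  \phi(z) \to 0,\quad z\,\phi(z) \to 0,\quad \Phi(z) \to 1,
  \quad\text{so both partials} \to 0 .
\]
```

The only difference between the two partials is the extra factor of z in the sigma term, which is the factor that becomes infinite at the boundary.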

I didn't trace through to figure out why it's having trouble computing that term, but I'm sure it could be addressed. I'd lean towards throwing an exception at the boundaries, but I think that would change the behavior of Stan in a way that we'd have to have a larger discussion to implement.
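One plausible source of the NaN, offered as a guess and not as a trace of the Stan Math implementation: if the partial with respect to sigma is evaluated naively from its closed form, it multiplies an infinite z by a density that has underflowed to zero, and in IEEE arithmetic inf * 0 is NaN. The sketch below uses plain doubles and a hypothetical helper, naive_normal_lcdf_partials, that is not part of Stan Math.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper, NOT part of Stan Math: closed-form partials of
// log Phi((y - mu) / sigma), evaluated naively in double precision.
struct LcdfPartials {
  double d_mu;
  double d_sigma;
};

inline LcdfPartials naive_normal_lcdf_partials(double y, double mu,
                                               double sigma) {
  const double inv_sqrt_two_pi = 0.3989422804014327;
  const double z = (y - mu) / sigma;
  // Standard normal density; exactly 0 when z is +inf.
  const double pdf = inv_sqrt_two_pi * std::exp(-0.5 * z * z);
  // Standard normal CDF; exactly 1 when z is +inf.
  const double cdf = 0.5 * std::erfc(-z / std::sqrt(2.0));

  LcdfPartials out;
  out.d_mu = -pdf / (sigma * cdf);         // (-0) / 1 is a signed zero
  out.d_sigma = -z * pdf / (sigma * cdf);  // (-inf) * 0 is NaN
  return out;
}
```

Calling it with y set to std::numeric_limits<double>::infinity(), mu = 0, and sigma = 1 reproduces the same pattern as the test: a zero partial for mu and a NaN partial for sigma. If that is indeed the mechanism, special-casing an infinite y before forming the product would be enough to return zero.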

Thoughts?


ASKurz commented on June 18, 2024

Thank you for opening this issue, @jsocolar.


jsocolar commented on June 18, 2024

@syclik Thanks for taking a look!

> Just curious... how'd you come to this conclusion? And which gradient did you think was "infinite"? (The gradient has two elements.) It'd be good to know how you're thinking about what the math library does and how it's generating gradients, especially at the boundary conditions.

I concluded that the gradient should be zero because that's the well-defined limit of the gradient with respect to both mu and sigma as y approaches infinity. I concluded (incorrectly, I think) that Stan's gradient was infinite (correct conclusion: not finite) by running the Stan program from the OP via cmdstanr with

lcdf_mod <- cmdstan_model("/Users/JacobSocolar/Desktop/lcdf.stan")
lcdf_mod$sample(data = list(y = Inf))

and getting back a bunch of

Chain 1 Rejecting initial value:
Chain 1   Gradient evaluated at the initial value is not finite.
Chain 1   Stan can't start sampling from this initial value.

Based on your follow-up, I assume that the problem is the partial wrt sigma, that the NaN is the problem, and that when I said "infinite" I really meant "not finite".

The use case for returning zero here is given in the Discourse thread linked from the OP. It shouldn't be particularly burdensome to code around the current behavior, which I agree isn't necessarily wrong, but if there's appetite on the Stan side for returning zero for the partial wrt sigma, then that'd also solve the original Discourse issue.
