
Comments (10)

janosg avatar janosg commented on June 14, 2024

Interface 1

The general_options dict gets an entry called "scaling" that can take the following values:

  1. None: no scaling.
  2. "custom": scale by the "scaling_factor" column in the params DataFrame (multiply in to_internal, divide in from_internal).
  3. "start_values": use the inverse of the absolute values of the start parameters as scaling factors.
  4. "gradient": use the inverse of the absolute values of the gradient at the start values as scaling factors. We should calculate this gradient on the internal parameter vector, and the step size should be quite large.

In 3 and 4 we have to clip start values or gradients that are too close to 0 before taking the inverse.
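The clipping in cases 3 and 4 could be sketched like this (function name and clip threshold are illustrative assumptions, not estimagic's actual API):

```python
import numpy as np

def inverse_scaling_factor(values, clip=0.1):
    # Hypothetical sketch: clip magnitudes away from zero before inverting,
    # so entries at or near 0 cannot divide by zero or blow up.
    return 1.0 / np.clip(np.abs(values), clip, None)

# The same helper would work for start values (case 3) and gradients (case 4):
factors = inverse_scaling_factor(np.array([100.0, 0.0, -0.5]))
# -> array([0.01, 10., 2.])
```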

Interface 2

No entry in general_options. If there is a "scaling_factor" column in params, we apply it. Case 3 can be done in a one-liner anyway. For case 4 we could add a function.

Discussion

I think users would find it slightly more convenient if scaling were just an option and they did not have to call a separate function to calculate a scaling factor. But I prefer Interface 2 because it is much leaner, and you can't accidentally specify a scaling column that is then silently ignored because "scaling" was not set to "custom" in the general options.

from estimagic.

tobiasraabe avatar tobiasraabe commented on June 14, 2024

I have only minor issues with both interfaces, but I am leaning towards interface 1 because I really value convenience, and I think that is what makes estimagic superior to other optimizers.

What I do not like about interface 2 is that...

  • "start_values" can be a one-liner for the user, but to come up with df['scaling_factor'] = 1 / np.abs(df['value'].clip(1)), I am sure most people get a ZeroDivisionError first or have to think hard about the numbers in [-1, 1]. estimagic can reduce this mental effort.
  • Doing the same as "gradient" requires the additional effort of looking up the function in the documentation, importing it, and passing in all the arguments.
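As an aside, the quoted one-liner illustrates exactly the pitfall it warns about: `clip(1)` sets the lower bound to 1 on the raw values, which mangles negative entries. A version that instead bounds the magnitude away from zero (the clip threshold of 1 is kept only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"value": [100.0, -0.001, 0.5]})
# Clip the *absolute* value away from zero, then invert. Calling clip(1)
# on the raw values would replace every negative entry with 1 instead.
df["scaling_factor"] = 1 / df["value"].abs().clip(lower=1)
# -> [0.01, 1.0, 1.0]
```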

I would make the following changes to interface 1:

  • The options are None (default), False, "start_values", and "gradient".
  • False exists for convenience to turn off the column in params. It could also be dropped, since you can always do params.drop(columns="scaling_factor").
  • If any option other than None is provided, it overrides the usage of the column in params.
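The proposed precedence could be resolved roughly as follows (a sketch of the suggestion above; names and return values are made up, not estimagic code):

```python
def resolve_scaling(scaling_option, params_columns):
    # Hypothetical precedence sketch:
    #   False          -> scaling disabled, the params column is ignored
    #   None (default) -> use the "scaling_factor" column if it exists
    #   anything else  -> "start_values"/"gradient" overrides the column
    if scaling_option is False:
        return "no_scaling"
    if scaling_option is None:
        if "scaling_factor" in params_columns:
            return "custom_column"
        return "no_scaling"
    return scaling_option
```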


janosg avatar janosg commented on June 14, 2024

@hmgaudecker and @peisenha, maybe you can let us know which option you prefer?


hmgaudecker avatar hmgaudecker commented on June 14, 2024

Guys, you are moving too fast for old folks who spend a couple of days away from their computers :-)

Looks good, but a couple of comments / suggestions.

  • I feel pretty strongly that the interface should be exhaustive, i.e., not depend on anything specified elsewhere. As a concrete suggestion, scaling could take on the values {None, "column_name", "start_values", "gradient"}, where column_name would be any column in params_df. Anything that is non-exhaustive at that point (as is now implemented, IIRC, with a scaling_factor column overriding stuff or not) is prone to cause confusion on the part of the user. I see lots of conversations à la "I added that column and it does nothing" - "Yes, because you still specified 'gradient' in the options" on the horizon.
  • Specifically on the "gradient" option:
    • Where does the 0.01 come from? In the docs, please use 0.01 instead of 1e-2; scientific notation is not so common among economists.
    • Did I miss a safeguard on the case where the gradient is essentially zero?
    • In general, I think you may want to add that this may make sense if you have no idea about your starting values, but that it is not too useful if you are already reasonably close to the optimum (I am guessing wildly here).


tobiasraabe avatar tobiasraabe commented on June 14, 2024

Just trying to clarify some things.

  • We had to abandon the custom scaling factor approach, because the scaling factor is added to the non-reparametrized version of the parameter vector and applying/recalculating this scaling to the reparametrized version does not work.
  • You are right on the 1e-2. Could be more intuitive.
  • What do you mean by "if the gradient is zero"? A gradient of zero is clipped to 0.01.
  • I like the last comment because I do not have any experience with scaling so some advice is very welcome.


hmgaudecker avatar hmgaudecker commented on June 14, 2024
* We had to abandon the custom scaling factor approach, because the scaling factor is added to the non-reparametrized version of the parameter vector and applying/recalculating this scaling to the reparametrized version does not work.

Ah, okay, sorry I missed that. But what is different about this than bounds?

* You are right on the `1e-2`. Could be more intuitive.

So.... ?

* What do you mean by "if the gradient is zero"? A gradient of zero is clipped to 0.01.

Okay, then the docstring is misleading:

divides the parameter vector by the inverse of the gradient for each parameter not in ...

I would always interpret this as "parameter not in ..." as opposed to "gradient not in ...", which apparently is what is going on.

* I like the last comment because I do not have any experience with scaling so some advice is very welcome.

I have only used the "start_values" version myself, and intuitively I do not fully see where gradient scaling helps. I guess that ideally you would want the final values to be in the same ballpark range. Is that what gradient scaling aims to achieve? Some references on that would be good.


tobiasraabe avatar tobiasraabe commented on June 14, 2024
* We had to abandon the custom scaling factor approach, because the scaling factor is added to the non-reparametrized version of the parameter vector and applying/recalculating this scaling to the reparametrized version does not work.

Ah, okay, sorry I missed that. But what is different about this than bounds?

I am not sure how bounds are handled, and I guess that custom scaling factors are just weird after the transformation. @janosg probably knows more.

The rest of the issues should be solved in another PR.


janosg avatar janosg commented on June 14, 2024

I googled a bit on the topic last week. Scaling based on the gradient seems to be more common than scaling based on start values because some optimizers work best when the gradient has similar magnitudes in all directions.
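A toy example of why gradient magnitudes matter (my own illustration, not from estimagic): for a badly scaled objective, rescaling the parameters by the clipped inverse gradient equalizes the gradient magnitudes of the transformed problem.

```python
import numpy as np

# For f(x) = 1000*x0**2 + 0.01*x1**2, the gradient components at
# x = (1, 1) differ by five orders of magnitude.
grad = np.array([2 * 1000 * 1.0, 2 * 0.01 * 1.0])

# Rescale parameters as x = D @ y with D = diag(1 / clip(|grad|, 0.01)).
# The gradient of the rescaled problem with respect to y is D * grad,
# which has magnitude 1 in every direction.
factors = 1.0 / np.clip(np.abs(grad), 0.01, None)
rescaled_grad = factors * grad
# -> array([1., 1.])
```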

I think we forgot to handle bounds if scaling is used.

I suggest we postpone this issue until after my presentation next week. I have some ideas and will make a new proposal.


hmgaudecker avatar hmgaudecker commented on June 14, 2024

Sounds good. I think the docs should tell users that in both scaling cases one should use start values that are of roughly the order of magnitude one would expect (in the gradient case because the step sizes might otherwise be very far off).


janosg avatar janosg commented on June 14, 2024

Solved in #192

