Comments (7)

dementrock commented on August 29, 2024

Hi @alexbeloi, the step size is computed according to the TRPO paper: https://arxiv.org/pdf/1502.05477v4.pdf. You can find the formula in Appendix C.
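For reference, Appendix C scales the search direction s so that the quadratic approximation of the KL constraint is met with equality, giving a step size of beta = sqrt(2 * delta / (s^T H s)), where H is the KL Hessian. The sketch below is a minimal illustration only: it assumes a small dense matrix H in place of the Hessian-vector product Hx, and the function name and max_kl value are hypothetical. It shows why a negative s^T H s turns the step into NaN.

```python
import numpy as np

def trpo_initial_step_size(descent_direction, H, max_kl):
    """Hypothetical helper: beta = sqrt(2 * max_kl / (s^T H s)).

    H is a dense stand-in for the KL Hessian (Fisher matrix); a real
    implementation would only compute Hessian-vector products Hx(s).
    """
    shs = descent_direction.dot(H.dot(descent_direction))
    # If s^T H s < 0, the argument of sqrt is negative and the result is NaN.
    with np.errstate(invalid="ignore"):
        return np.sqrt(2.0 * max_kl / shs)

s = np.array([1.0, 0.5])
H_psd = np.array([[2.0, 0.0], [0.0, 1.0]])   # positive definite: valid step
H_bad = np.array([[-2.0, 0.0], [0.0, 1.0]])  # indefinite: s^T H s = -1.75
print(trpo_initial_step_size(s, H_psd, max_kl=0.01))
print(trpo_initial_step_size(s, H_bad, max_kl=0.01))  # nan
```

Once a NaN step size appears, every parameter update that uses it is NaN as well, which is why the sign of descent_direction.dot(Hx(descent_direction)) is worth checking first.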

How negative is the computed value of descent_direction.dot(Hx(descent_direction)), and can you describe your setup in more detail? This can happen if a bug causes the mean KL to be nonzero (or not sufficiently close to zero) before the step is taken. We have also occasionally seen it with recurrent networks, although changing the nonlinearity seems to have fixed it.
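Why a nonzero KL before the step matters: the KL Hessian is only guaranteed to be positive semi-definite at the point where the old and current distributions coincide, since that point is the KL's minimum. Away from it, curvature from a nonlinear policy can make descent_direction.dot(Hx(descent_direction)) negative. The toy sketch below assumes a hypothetical one-parameter policy whose action mean is tanh(w) with unit variance, so KL(old || current) = 0.5 * (tanh(w) - mu_old)^2; it is an illustration, not rllab code.

```python
import numpy as np

def kl(w, mu_old):
    # KL(N(mu_old, 1) || N(tanh(w), 1)) for a hypothetical unit-variance policy.
    return 0.5 * (np.tanh(w) - mu_old) ** 2

def kl_second_derivative(w, mu_old):
    # Second derivative of kl(w, mu_old) with respect to w, derived by hand:
    # sech^4(w) - 2 * (tanh(w) - mu_old) * sech^2(w) * tanh(w)
    t = np.tanh(w)
    sech2 = 1.0 - t ** 2
    return sech2 ** 2 - 2.0 * (t - mu_old) * sech2 * t

w = 1.0
# Old distribution equals the current one: KL is zero, curvature is positive.
print(kl(w, np.tanh(w)), kl_second_derivative(w, np.tanh(w)))
# Stale "old" distribution: KL is positive and the curvature is negative.
print(kl(w, -0.5), kl_second_derivative(w, -0.5))
```

In the second case the first term (sech^4) is outweighed by the term proportional to the KL mismatch, which is exactly the situation a pre-step KL bug creates.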

from rllab.

alexbeloi commented on August 29, 2024

Hi @dementrock, it appears that the mean KL is nonzero before taking the step because of something I'm doing. This issue came up when debugging the ISSampler with TRPO.

What I'm doing is taking (off-policy) stored paths, computing the agent_infos for those paths with respect to the current policy using _, agent_infos = policy.get_action(observations), and then those agent_infos get passed to old_dist_info_vars_list in the optimizer.

What I expected was that the on-policy agent_infos that I computed would be identical to the dist_info_vars = policy.dist_info_sym(obs_var, state_info_vars) evaluated by the optimizer before taking the step, so kl = dist.kl_sym(old_dist_info_vars, dist_info_vars) would be zero before the step, but this isn't the case.
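One way to test this expectation outside the optimizer is to evaluate the mean KL between the stored "old" distribution parameters and freshly computed ones; if they describe the same policy on the same observations, it should be zero. A minimal NumPy sketch, assuming a diagonal Gaussian policy whose distribution info is a dict with "mean" and "log_std" arrays of shape (n_samples, action_dim) (treat those key names as an assumption about your policy); the helper name mean_kl_diag_gaussian is hypothetical.

```python
import numpy as np

def mean_kl_diag_gaussian(old, new):
    """Hypothetical helper: mean over samples of KL(old || new).

    Uses the closed form for diagonal Gaussians, summed over action dims:
    log(s_new / s_old) + ((m_old - m_new)^2 + s_old^2) / (2 s_new^2) - 1/2
    """
    old_std = np.exp(old["log_std"])
    new_std = np.exp(new["log_std"])
    numerator = (old["mean"] - new["mean"]) ** 2 + old_std ** 2 - new_std ** 2
    denominator = 2.0 * new_std ** 2 + 1e-8  # small constant avoids division by zero
    kl = numerator / denominator + new["log_std"] - old["log_std"]
    return np.mean(np.sum(kl, axis=-1))

rng = np.random.default_rng(0)
info = {"mean": rng.normal(size=(5, 2)), "log_std": np.zeros((5, 2))}
print(mean_kl_diag_gaussian(info, info))  # 0.0

# "Old" parameters that do not match the current policy (e.g. wrong data passed in).
stale = {"mean": info["mean"] + 0.1, "log_std": info["log_std"]}
print(mean_kl_diag_gaussian(stale, info))  # small but nonzero (about 0.01)
```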

Is there a difference between agent_info computed from _, agent_infos = policy.get_action(observations) and the evaluation of dist_info_vars = policy.dist_info_sym(obs_var, state_info_vars) for obs_var evaluated at observations?


alexbeloi commented on August 29, 2024

I feel there is some confusion on my part. Where does the NPO algorithm get values for old_dist_info_vars and dist_info_vars from?


alexbeloi commented on August 29, 2024

Oh wow, super silly bug on my part. The last line of is_sampler.py should be return samples, not return paths. This was the root of the issue.


dementrock commented on August 29, 2024

@alexbeloi Re the difference between agent_infos and evaluating dist_info_vars: agent_infos may contain more entries than dist_info_vars, but for the keys they share, the values should be the same. Otherwise there is a bug somewhere.
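That rule translates into a simple consistency check: compare only the keys both dicts share and ignore any extras. A minimal sketch, assuming both dicts hold NumPy arrays; the function name, the tolerance, and the "extra" key in the usage example are hypothetical.

```python
import numpy as np

def check_common_keys(agent_infos, dist_info, atol=1e-6):
    """Hypothetical helper: return the shared keys whose values disagree.

    Keys present in only one dict are ignored, since agent_infos may
    legitimately contain more entries than dist_info.
    """
    common = sorted(set(agent_infos) & set(dist_info))
    return [k for k in common
            if not np.allclose(agent_infos[k], dist_info[k], atol=atol)]

agent_infos = {"mean": np.array([[0.1, -0.2]]), "log_std": np.zeros((1, 2)),
               "extra": np.ones(1)}  # extra key only agent_infos has
dist_info = {"mean": np.array([[0.1, -0.2]]), "log_std": np.zeros((1, 2))}
print(check_common_keys(agent_infos, dist_info))  # []
```

A non-empty result points at the kind of bug described above, where the "old" distribution parameters do not correspond to the data the optimizer actually evaluates.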

Does replacing return paths with return samples solve the NaN issue?


alexbeloi commented on August 29, 2024

@dementrock yes, that one line fix solves the NaN issue. I made a pull request with the patch and a (now working) example of TRPO with ISSampler.


dementrock commented on August 29, 2024

Awesome, thanks!
