Comments (5)

stokasto commented on July 17, 2024

And I win for misspelling function in the issue title ;).

from async-rl.

etienne87 commented on July 17, 2024

@stokasto, probably not for 1. Look closely at #L54 of policy_output.py:

    def sampled_actions_log_probs(self):
        return F.select_item(
            self.log_probs,
            chainer.Variable(np.asarray(self.action_indices, dtype=np.int32)))

I assume the function is equivalent to multiplying by a one-hot encoded target (and summing over the action axis).
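To make that equivalence concrete, here is a minimal plain-Python sketch. It does not use chainer: `select_item` and `one_hot_select` below are hypothetical stand-ins written only for illustration, and the logits and action indices are made-up values.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one row of logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def select_item(rows, indices):
    # Stand-in for the indexing behaviour: pick one entry per row.
    return [row[i] for row, i in zip(rows, indices)]

def one_hot_select(rows, indices):
    # Same result via an explicit one-hot mask, summed over the action axis.
    out = []
    for row, i in zip(rows, indices):
        one_hot = [1.0 if j == i else 0.0 for j in range(len(row))]
        out.append(sum(p * h for p, h in zip(row, one_hot)))
    return out

# Made-up batch of two states with three actions each.
log_probs = [log_softmax([1.0, 2.0, 0.5]), log_softmax([0.1, 0.1, 3.0])]
action_indices = [1, 2]

selected = select_item(log_probs, action_indices)
masked = one_hot_select(log_probs, action_indices)
```

Both lists hold only the log probability of the action that was actually taken in each state, so the other actions contribute nothing to the policy loss.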

stokasto commented on July 17, 2024

@etienne87 Ah yes, that is fine then. As I wrote, I did not study the code very carefully during my re-implementation; I just thought I would bring these two issues up since, if they were real, they would be hard to detect and could lurk in the code base for a while.
Thank you for falsifying the first issue.

muupan commented on July 17, 2024

Thank you for your comments.

  1. As @etienne87 wrote, F.select_item returns only the corresponding log probability of the selected action (which is stored in self.action_indices of SoftmaxPolicyOutput).
  2. float(advantage.data) is what makes sure the gradients won't flow through v. advantage is a chainer.Variable, which can back-propagate gradients, but float(advantage.data) is literally just a Python float, so no gradients will be back-propagated through it.
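The effect described in point 2 can be illustrated without chainer. The sketch below is an assumption-laden stand-in: it uses a central finite difference instead of back-propagation, the helper names (`policy_loss`, `numerical_grad`) are hypothetical, and the values of `R`, `v` and `log_prob` are made up.

```python
import math

def policy_loss(log_prob, advantage):
    # Policy-gradient term: -log pi(a|s) * advantage.
    return -log_prob * advantage

def numerical_grad(f, x, eps=1e-6):
    # Central finite difference, used here in place of backprop.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

R = 1.0                    # made-up return
v = 0.3                    # made-up value estimate
log_prob = math.log(0.25)  # made-up log probability of the taken action

# "Attached": the advantage is recomputed from v, so the loss depends on v.
grad_attached = numerical_grad(lambda v_: policy_loss(log_prob, R - v_), v)

# "Detached": the advantage is frozen as a plain float first, which is the
# role float(advantage.data) plays in the thread; changing v has no effect.
frozen_advantage = float(R - v)
grad_detached = numerical_grad(lambda v_: policy_loss(log_prob, frozen_advantage), v)
```

With the frozen advantage the policy term has zero gradient with respect to v, so only a separate value loss would update the value estimate.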

stokasto commented on July 17, 2024

Hey, great, that explains it. I was wondering how the implementation could have converged to a reasonable value function if that error were indeed there.
As I said, I am not experienced with chainer, so I did not catch this little interaction. My results also look fairly similar to the graphs you have posted. I'll upload my implementation later and link to yours.
Feel free to close this issue.