Hey, first off: great work. I just re-implemented the paper myself using tensorflo



Potential errors in the loss funktion (async-rl, closed)

muupan commented on July 17, 2024

Potential errors in the loss funktion

from async-rl.

Comments (5)

stokasto commented on July 17, 2024

And I win for misspelling function in the issue title ;).


etienne87 commented on July 17, 2024

@stokasto, probably not for point 1. Look closely at policy_output.py#L54:

    def sampled_actions_log_probs(self):
        return F.select_item(
            self.log_probs,
            chainer.Variable(np.asarray(self.action_indices, dtype=np.int32)))

I assume this function is equivalent to multiplying the log probabilities by a one-hot encoded target and summing over the action axis.
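To make that assumed equivalence concrete, here is a small NumPy-only sketch. It does not call chainer: `select_item` below is a hypothetical stand-in that mimics what `F.select_item` is described as doing in this thread (picking one entry per row), and the logits and action indices are made-up illustration values.

```python
import numpy as np


def select_item(log_probs, action_indices):
    """NumPy stand-in (an assumption, not chainer's code): pick one entry per row."""
    rows = np.arange(log_probs.shape[0])
    return log_probs[rows, action_indices]


def one_hot_select(log_probs, action_indices):
    """Multiply by a one-hot mask, then sum over the action axis."""
    n_actions = log_probs.shape[1]
    one_hot = np.eye(n_actions, dtype=log_probs.dtype)[action_indices]
    return (log_probs * one_hot).sum(axis=1)


# Made-up batch of two states with three actions each.
logits = np.array([[1.0, 2.0, 0.5], [0.1, 0.2, 0.3]], dtype=np.float32)
# Log-softmax over the action axis.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
action_indices = np.asarray([1, 2], dtype=np.int32)

picked = select_item(log_probs, action_indices)
masked = one_hot_select(log_probs, action_indices)
print(np.allclose(picked, masked))
```

Both routes give one log probability per sample, so a loss built on either one only involves the action that was actually taken.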


stokasto commented on July 17, 2024

@etienne87 Ah yes, that is fine then. As I wrote, I did not study the code very carefully during my re-implementation; I just thought I would bring these two issues up since (in case they are true) they are hard to detect and could lurk in the code base for a while.
Thank you for falsifying the first issue.


muupan commented on July 17, 2024

Thank you for your comments.

As @etienne87 wrote, F.select_item returns only the corresponding log probability of the selected action (which is stored in self.action_indices of SoftmaxPolicyOutput).
float(advantage.data) is what makes sure the gradients won't flow through v. advantage is a chainer.Variable, which can back-propagate gradients, but float(advantage.data) is literally just a Python float, so no gradients will be back-propagated through it.
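The effect of that conversion can be illustrated without chainer. The sketch below uses a tiny hand-written scalar autodiff class, `Var`, which is a hypothetical stand-in for `chainer.Variable` and not chainer's implementation; the numbers and the exact form of the loss are made up for illustration. It only shows the general idea: once the advantage is turned into a plain float, the value estimate `v` receives no gradient from the policy loss.

```python
class Var:
    """Tiny scalar autodiff node (hypothetical stand-in for chainer.Variable)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        # Pairs of (parent node, d(self)/d(parent)).
        self.parents = tuple(parents)

    @staticmethod
    def _wrap(x):
        # Plain numbers become constant nodes with no parents.
        return x if isinstance(x, Var) else Var(x)

    def __sub__(self, other):
        other = Var._wrap(other)
        return Var(self.data - other.data, [(self, 1.0), (other, -1.0)])

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.data * other.data,
                   [(self, other.data), (other, self.data)])

    def backward(self, upstream=1.0):
        # Accumulate the gradient along every path back to the inputs.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def value_grad(detach):
    """Return d(pi_loss)/dv with or without the float() conversion."""
    log_prob = Var(-0.7)       # made-up log probability of the taken action
    v = Var(0.4)               # made-up value estimate
    advantage = Var(1.0) - v   # made-up return of 1.0 minus the value estimate
    # float(...) keeps only the number and drops the link back to v.
    weight = float(advantage.data) if detach else advantage
    pi_loss = log_prob * weight * -1.0
    pi_loss.backward()
    return v.grad


print(value_grad(detach=True))   # v gets no gradient from the policy loss
print(value_grad(detach=False))  # v would get a gradient without float()
```

Keeping the advantage as a graph node would let the policy loss push on the value estimate, which is the error the issue was worried about; the plain float avoids that.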


stokasto commented on July 17, 2024

Hey, great, that explains it. I was wondering how the implementation could have converged to a reasonable value function if that error were indeed there.
As I said, I am not experienced with chainer, so I did not catch this little interaction. My results also look fairly similar to the graphs you have posted. I'll upload my implementation later and link to yours.
Feel free to close this issue.

