Are policy and value gradients propagated back through the world model? (dreamer, closed)

danijar commented on July 17, 2024

Are policy and value gradients propagated back through the world model?

from dreamer.

Comments (2)

danijar commented on July 17, 2024

The value loss is not propagated through multi-step predictions because we stop the gradients around the value targets as usual, turning it into a per-step loss.
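
As a rough illustration of what stopping the gradient around the target does, here is a minimal JAX sketch. It is not taken from the dreamer code: `value_fn`, `value_loss`, the linear value function, and the one-step bootstrapped target are all assumptions made for the example (Dreamer itself regresses onto lambda returns). The only point is that with `stop_gradient` the target acts as a constant, so each step becomes an independent regression.

```python
import jax
import jax.numpy as jnp


def value_fn(w, states):
    # Hypothetical linear value function: one scalar value per state.
    return states @ w


def value_loss(w, states, next_states, rewards, gamma=0.99, stop=True):
    # Simplified one-step bootstrapped target (an assumption for this sketch).
    target = rewards + gamma * value_fn(w, next_states)
    if stop:
        # Treat the target as a constant: no gradient flows into it.
        target = jax.lax.stop_gradient(target)
    return 0.5 * jnp.mean((value_fn(w, states) - target) ** 2)


w = jnp.array([0.5, -1.0])
states = jnp.array([[1.0, 0.0], [0.0, 1.0]])
next_states = jnp.array([[0.0, 1.0], [1.0, 1.0]])
rewards = jnp.array([1.0, 0.0])

# Gradient with the target held fixed (per-step regression).
grad_stopped = jax.grad(value_loss)(w, states, next_states, rewards)
# Gradient that also differentiates through the target, for comparison.
grad_full = jax.grad(
    lambda w_: value_loss(w_, states, next_states, rewards, stop=False)
)(w)
```

With the target stopped, `grad_stopped` equals the plain regression gradient, the mean of `(v(s) - target) * s` over the batch. `grad_full` picks up an extra term from the dependence of the target on `w`.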

The actor loss is the negative of the lambda returns. This is backpropagated through imagined sequences of multiple states.
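
For readers unfamiliar with lambda returns, here is a minimal JAX sketch of one common recursive form, assuming `rewards[t]` is the reward at imagined step t and `next_values[t]` is the predicted value of the state that follows it. The function names, the argument layout, and the default `gamma` and `lam` are assumptions for the example; the exact indexing and discount handling in the dreamer code may differ.

```python
import jax.numpy as jnp


def lambda_returns(rewards, next_values, gamma=0.99, lam=0.95):
    # Bootstrap from the predicted value of the last imagined state.
    ret = next_values[-1]
    out = []
    # Walk backwards: G_t = r_t + gamma * ((1 - lam) * v_{t+1} + lam * G_{t+1}).
    for r, v in zip(rewards[::-1], next_values[::-1]):
        ret = r + gamma * ((1.0 - lam) * v + lam * ret)
        out.append(ret)
    return jnp.stack(out[::-1])


def actor_loss(rewards, next_values, gamma=0.99, lam=0.95):
    # The actor maximizes the returns, so the loss is their negative mean.
    return -jnp.mean(lambda_returns(rewards, next_values, gamma, lam))


rewards = jnp.array([1.0, 1.0, 1.0])
next_values = jnp.array([0.0, 0.0, 0.0])
returns = lambda_returns(rewards, next_values, gamma=0.5, lam=1.0)
```

With `lam=1.0` and zero values this reduces to the discounted reward sum; with `lam=0.0` each entry is the one-step target `r_t + gamma * v_{t+1}`.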

Precisely, the gradient flows from the predicted value of a future state through the neural network value function, through the sequence of earlier states, through the sampled action, into the actor.

The stop gradient you're pointing to makes sure it doesn't flow further. In other words, we only consider how a current action influences future states and their values. But we don't consider how a current action influences future actions.
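
The effect of that stop gradient can be seen in a tiny JAX sketch. Everything here is a toy assumption, not the dreamer implementation: a scalar state, a deterministic linear actor `a = theta * s` (Dreamer samples actions with reparameterization), additive dynamics `s' = s + a`, and an identity value function. The sketch compares the actor gradient with and without detaching the actor's input.

```python
import jax
import jax.numpy as jnp
from jax import lax


def imagine_value(theta, s0=1.0, horizon=2, stop_actor_input=True):
    """Roll out the toy model and return the value of the final state."""
    s = jnp.asarray(s0, dtype=jnp.float32)
    for _ in range(horizon):
        # Optionally detach the state before it enters the actor, so the
        # action's dependence on earlier actions is ignored in the gradient.
        actor_in = lax.stop_gradient(s) if stop_actor_input else s
        a = theta * actor_in  # hypothetical linear actor
        s = s + a             # hypothetical dynamics
    return s                  # hypothetical value function: identity


# Gradient path: value -> later state -> earlier state -> action -> actor.
grad_stopped = jax.grad(lambda th: imagine_value(th, stop_actor_input=True))(0.5)
# Also includes how the first action changes the second action.
grad_full = jax.grad(lambda th: imagine_value(th, stop_actor_input=False))(0.5)

print(float(grad_stopped), float(grad_full))  # 2.5 3.0
```

In this toy setup the two gradients differ by 0.5, which is exactly the contribution of the first action changing the input to the second action.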

Hope this helps. Regarding pcont, please reply to the previous ticket on that topic so we keep the discussion organized and easier for others to follow.

xlnwel commented on July 17, 2024

Hi,

Thanks for your explanation. I see now that the gradient of the actor comes directly from the predicted values, which is different from the traditional policy gradient method. That makes sense now. By the way, I've moved my question about pcont to the previous issue.
