It looks like the gradients of the y_mlp_out and all components involving y in the las


None gradients for 'y' layers (digress, closed)

najwalb commented on September 28, 2024

None gradients for 'y' layers


Comments (4)

cvignac commented on September 28, 2024

Hello,
all transformer layers take X, E and y as input. Even if the output dimension of y is eventually 0, y is still useful as an input. The only part that is not trained is mlp_out_y, which you can disable if you want.

For the regressor model in the conditional generation experiments, by contrast, the output dimensions of X and E are 0, but the output dimension of y is 1 or 2.

Clement
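To see why mlp_out_y ends up with a None gradient while y is still useful as an input, here is a toy reverse-mode autodiff sketch in plain Python. It is not DiGress code: Value, the scalar weights (w_hx, w_he, w_hy), the single hidden state h and the three scalar "heads" are hypothetical stand-ins, and the loss reads only the X and E outputs, as described in the thread. Backpropagation only visits nodes the loss depends on, so the y head's weight is never reached, while the trunk weight that consumes y (w_hy) does receive a gradient.

```python
class Value:
    """A scalar node in a tiny reverse-mode autodiff graph (toy, illustration only)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = None                # stays None unless backward reaches this node

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Collect only the nodes this output depends on, in topological order.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Propagate gradients from the output back to its ancestors (and nowhere else).
        for node in reversed(order):
            for parent, local in zip(node.parents, node.local_grads):
                parent.grad = (parent.grad or 0.0) + local * node.grad


# Hypothetical scalar stand-ins for the node (X), edge (E) and graph-level (y) inputs.
x, e, y = Value(0.5), Value(-1.0), Value(2.0)

# Shared trunk: every input, including y, contributes to the hidden state.
w_hx, w_he, w_hy = Value(0.1), Value(0.2), Value(0.3)
h = w_hx * x + w_he * e + w_hy * y

# Three output heads; these names are hypothetical stand-ins for the real output MLPs.
mlp_out_x, mlp_out_e, mlp_out_y = Value(1.0), Value(1.0), Value(1.0)
out_x, out_e, out_y = mlp_out_x * h, mlp_out_e * h, mlp_out_y * h

# The loss only uses the X and E predictions; out_y is computed but never used.
loss = out_x * out_x + out_e * out_e
loss.backward()

print("mlp_out_x grad:", mlp_out_x.grad)
print("mlp_out_y grad:", mlp_out_y.grad)  # None: no path from this weight to the loss
print("w_hy grad:", w_hy.grad)            # not None: y still shapes the X and E outputs
```

PyTorch behaves the same way: a parameter that never reaches the loss typically keeps .grad set to None after loss.backward(). Listing such parameters after one training step is a quick way to find heads like mlp_out_y; freezing them with requires_grad_(False) or not building them at all are two ways to act on that.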


haoming-codes commented on September 28, 2024

y is indeed not used to compute the loss. The input y to the transformer is the graph-level feature of noisy_data, computed by compute_extra_data. The output y of the transformer is not used as input to the next denoising step.


najwalb commented on September 28, 2024

@haoming-codes Yes, and this means that the network layers that use y are not updated during training.


najwalb commented on September 28, 2024

The part of the network that transforms y in the last transformer layer (y_y, e_y, x_y) is also not trained. But I see what you mean by "y is still useful": it at least incorporates the timestep into the other variables in the network. Thanks for clarifying!

Best,
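Whether a parameter gets a gradient at all is a structural question: it must lie on some path to the loss. The sketch below checks that by graph reachability on a hypothetical, heavily simplified stack in which y conditions the X and E updates and is itself updated from X, E and y. The parameter names only loosely mirror y_y, x_y and e_y from the comment above; the real layers are MLPs and attention, not these placeholders, and here the loss reads the last layer's X and E directly (no output heads). Because nothing downstream reads the final y, only the last layer's y-update branch is left without a gradient, while the same branch in earlier layers still feeds later X and E updates.

```python
from typing import Dict, List, Set


def build_graph(num_layers: int) -> Dict[str, List[str]]:
    """Map each node to the nodes it is computed from (hypothetical, simplified stack)."""
    graph: Dict[str, List[str]] = {"x0": [], "e0": [], "y0": []}
    for i in range(1, num_layers + 1):
        prev = i - 1
        # Parameters are leaves; names loosely mirror the thread (y_y, x_y, e_y).
        for p in ("x_x", "x_from_y", "e_e", "e_from_y", "y_y", "x_y", "e_y"):
            graph[f"L{i}.{p}"] = []
        # y conditions the X and E updates (the path through which y stays useful) ...
        graph[f"x{i}"] = [f"x{prev}", f"y{prev}", f"L{i}.x_x", f"L{i}.x_from_y"]
        graph[f"e{i}"] = [f"e{prev}", f"y{prev}", f"L{i}.e_e", f"L{i}.e_from_y"]
        # ... while y itself is updated from X, E and y.
        graph[f"y{i}"] = [f"x{prev}", f"e{prev}", f"y{prev}",
                          f"L{i}.y_y", f"L{i}.x_y", f"L{i}.e_y"]
    # The loss only reads the final X and E, never the final y.
    graph["loss"] = [f"x{num_layers}", f"e{num_layers}"]
    return graph


def ancestors(graph: Dict[str, List[str]], target: str) -> Set[str]:
    """All nodes the target depends on; only these can receive a gradient."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in graph[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def params_without_grad(num_layers: int) -> List[str]:
    """Parameters (names containing a dot) with no path to the loss."""
    graph = build_graph(num_layers)
    reached = ancestors(graph, "loss")
    return sorted(n for n in graph if "." in n and n not in reached)


print(params_without_grad(2))  # ['L2.e_y', 'L2.x_y', 'L2.y_y']
```

Adding layers does not change the picture: however deep the stack, only the last layer's y-update parameters fall outside the loss's ancestors.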
