I am trying to reproduce the tutorial <a href="https://learningjulia.com/2019/10/11/a-

Flux new explicit API does not work but old implicit API works for a simple RNN

liuyxpp commented on September 26, 2024

Comments (4)

ToucheSir commented on September 26, 2024

If you're OK with the initial state being non-trainable, then wrapping the reset! call in one of the functions listed under https://juliadiff.org/ChainRulesCore.jl/stable/api.html#Ignoring-gradients should work, e.g. @ignore_derivatives Flux.reset!(model). Moving the call to reset! outside of the loss function would also do the trick.
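For anyone landing here later, a minimal sketch of both workarounds might look like the following. It assumes a Flux 0.13/0.14-style Recur-based RNN, that ChainRulesCore has been added to the active environment, and a made-up toy model and data (`model`, `xs`, `ys`, `seq_loss` and `loss_with_reset` are hypothetical names, not from the original post).

```julia
using Flux
using ChainRulesCore: @ignore_derivatives  # assumes ChainRulesCore is installed in this environment

# Hypothetical toy model and data, for illustration only.
model = Chain(RNN(1 => 8), Dense(8 => 1))
xs = [rand(Float32, 1, 4) for _ in 1:5]   # 5 timesteps, 1 feature, batch of 4
ys = [rand(Float32, 1, 4) for _ in 1:5]

# Sum of per-timestep MSE; the Recur wrapper carries the hidden state between calls.
function seq_loss(m, xs, ys)
    total = 0f0
    for t in eachindex(xs, ys)
        total += Flux.mse(m(xs[t]), ys[t])
    end
    return total
end

# Option 1: keep reset! inside the loss, but hide it from the AD.
function loss_with_reset(m, xs, ys)
    @ignore_derivatives Flux.reset!(m)
    return seq_loss(m, xs, ys)
end
grads1 = Flux.gradient(m -> loss_with_reset(m, xs, ys), model)

# Option 2: call reset! before taking the gradient, outside the loss.
Flux.reset!(model)
grads2 = Flux.gradient(m -> seq_loss(m, xs, ys), model)
```

In either case the initial state is treated as a constant, so it will not receive a gradient.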

liuyxpp commented on September 26, 2024

Ah, thanks! Can you explain why this fails in explicit mode but not in implicit mode?

BTW, if I have extra data to train the initial state for each time sequence, how should I do that?

ToucheSir commented on September 26, 2024

I'm not sure why it fails. The RNN API is a weird one because it uses some of the implicit mode machinery even when you use explicit mode.

> if I have extra data to train the initial state for each time sequence, how should I do that?

If you want to have separate initial states for each sample like you mentioned in #2185 (comment), the best bet would be to use the underlying RNN cell API (e.g. RNN -> RNNCell) and write your own loop over the timesteps. It'll be more manual work than using the Recur-based API, but it should just work and also avoid the MethodError shown above.
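A rough sketch of that manual loop, assuming the Flux 0.13/0.14 cell convention where `cell(h, x)` returns `(new_state, output)` (later Flux releases redesigned the recurrent layers, so the call signature may differ there). The sizes, the data, and the names `head`, `h0` and `manual_loss` are hypothetical, chosen only for illustration:

```julia
using Flux

# Hypothetical sizes and data, for illustration only.
cell = Flux.RNNCell(1 => 8)
head = Dense(8 => 1)
h0 = zeros(Float32, 8, 4)                 # one initial-state column per sequence in the batch
xs = [rand(Float32, 1, 4) for _ in 1:5]   # 5 timesteps, 1 feature, batch of 4
ys = [rand(Float32, 1, 4) for _ in 1:5]

# Explicit loop over timesteps; the state is threaded through by hand,
# so there is no Recur wrapper and no reset! call.
function manual_loss(cell, head, h0, xs, ys)
    h = h0
    total = 0f0
    for t in eachindex(xs, ys)
        h, out = cell(h, xs[t])           # assumed convention: (new state, output)
        total += Flux.mse(head(out), ys[t])
    end
    return total
end

# Differentiate with respect to the cell, the output layer, and the initial state.
grads = Flux.gradient((c, d, h) -> manual_loss(c, d, h, xs, ys), cell, head, h0)
```

Because `h0` is an ordinary argument here, `grads[3]` holds its gradient, with one column per sequence, and it can be updated like any other parameter.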

liuyxpp commented on September 26, 2024

Got that and I will report back once I figure it out. Many thanks!
