
Comments (6)

pabloppp commented on August 28, 2024

> Just read this. That's what I have!

Okay, so maybe I was not able to understand the implementation... You're saying that if the user passes both an input and a keys parameter to the model, like decoder(yi, keys = enc_keys), then your implementation will apply masked self-attention to yi and then non-masked regular attention over enc_keys?

If that's so, it's really awesome, and then this issue makes no sense and you could close it.

Again, thanks for your awesome work!! I am currently using your library for a couple of projects and it definitely works like a charm.
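For reference, the call pattern I'm describing is roughly the encoder-decoder example from the reformer-pytorch README; the hyperparameter values below are only illustrative:

```python
import torch
from reformer_pytorch import ReformerLM

# Encoder: returns its final hidden states (not logits) so they can be
# handed to the decoder as keys.
encoder = ReformerLM(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    heads = 8,
    max_seq_len = 1024,
    return_embeddings = True
)

# Decoder: causal, and accepts the encoder output through the `keys` kwarg.
decoder = ReformerLM(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    heads = 8,
    max_seq_len = 1024,
    causal = True
)

x  = torch.randint(0, 20000, (1, 1024))   # source tokens
yi = torch.randint(0, 20000, (1, 1024))   # target tokens (shifted right)

enc_keys = encoder(x)                     # (1, 1024, 512) hidden states
yo = decoder(yi, keys = enc_keys)         # (1, 1024, 20000) logits
```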


pabloppp commented on August 28, 2024

Oh, seems like a clever approach 😮 thank you!


lucidrains commented on August 28, 2024

@pabloppp Good observations! In the past, for simplicity's sake, I have combined the self-attention and "regular" attention into one layer, which has worked for me for a number of tasks, although I have not done a head-to-head comparison with the original architecture. To keep the readme concise, and to make the reversible architecture work (it only accepts two functions F and G), I kept it the simplified way. However, I wouldn't oppose trying to be more faithful to the original architecture, but it would require some decisions about the reversible net, namely which two of the now three components get slotted into F and G.
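For context, a reversible residual block couples exactly two sub-functions, conventionally F (the attention) and G (the feed-forward), which is why a third component has no obvious slot. A minimal sketch of the coupling (ignoring the memory-saving backward pass the real implementation uses):

```python
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Couples two sub-layers F and G so the inputs can be reconstructed
    from the outputs, which is what lets activations be freed during training."""
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f = f  # e.g. the (LSH) attention sub-layer
        self.g = g  # e.g. the feed-forward sub-layer

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # recover the inputs without ever having stored them
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```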


lucidrains commented on August 28, 2024

(The most faithful reproduction would be to do self-attention + feedforward and then regular-attention + feedforward, alternating in that manner.)
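Concretely, that layout would alternate which attention occupies the F slot of each reversible pair; a purely hypothetical schedule (not the library's actual wiring) would look like:

```python
# Hypothetical schedule for the "faithful" layout: every reversible pair holds
# one attention (F) and one feed-forward (G), with the attention alternating
# between causal self-attention and unmasked attention over enc_keys.
def faithful_schedule(depth):
    pairs = []
    for i in range(depth):
        attention = 'causal self-attention' if i % 2 == 0 else 'regular attention over enc_keys'
        pairs.append((attention, 'feed-forward'))
    return pairs

for f, g in faithful_schedule(4):
    print(f'F = {f:35} G = {g}')
```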


lucidrains commented on August 28, 2024

> Would it make sense to allow the choice to combine self-attention (k == v == q), where the mask should be applied if we want it to be causal, with regular attention (q != k == v) using the passed keys, where causality no longer makes much sense, because we might want to be able to focus on a word at the end of the sentence if the language has a different word ordering?

Just read this. That's what I have!


lucidrains commented on August 28, 2024

@pabloppp Yes, exactly, except they are done in one layer, in the same attention matrix: each token attends to all tokens of the past as well as to the enc_keys.
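A minimal sketch of that combined attention (not the library's actual code): the decoder's own key/value projections are concatenated with enc_keys, and the causal mask only covers the self portion, so every position can also attend to the whole encoder output:

```python
import torch

def combined_attention(q, k_self, v_self, k_ctx, v_ctx):
    """One attention over [self tokens ; encoder context], causal only on the self part.

    q, k_self, v_self: (batch, n, dim) projections of the decoder sequence
    k_ctx, v_ctx:      (batch, m, dim) projections of the encoder output (enc_keys)
    """
    n, d = q.shape[-2], q.shape[-1]
    m = k_ctx.shape[-2]

    k = torch.cat((k_self, k_ctx), dim = 1)        # (batch, n + m, d)
    v = torch.cat((v_self, v_ctx), dim = 1)

    scores = q @ k.transpose(-2, -1) / d ** 0.5    # (batch, n, n + m)

    # causal mask over the self portion only; the encoder portion is always visible
    causal  = torch.ones(n, n).triu(1).bool()      # True above the diagonal -> masked out
    context = torch.zeros(n, m).bool()             # never masked
    mask = torch.cat((causal, context), dim = 1)   # (n, n + m), broadcast over the batch

    scores = scores.masked_fill(mask, float('-inf'))
    return scores.softmax(dim = -1) @ v            # (batch, n, d)
```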

