
Comments (7)

WenWeiTHU commented on May 27, 2024

It is a really good question. The original Transformer does distinguish tokens by adding positional encoding. However, the tokens of the original Transformer represent different time steps; only in our inverted version do the tokens represent different variates.
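
The difference between the two tokenizations can be shown in a minimal NumPy sketch. It assumes a single linear layer as the embedding. The sizes and the weight names `W_time` and `W_var` are illustrative and are not taken from the iTransformer code. Vanilla tokens are time steps, while inverted tokens are variates:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, d_model = 96, 7, 16   # lookback length, number of variates, token width (illustrative)
x = rng.standard_normal((T, N))  # one multivariate series: T time steps x N variates

# Vanilla Transformer style: the N values at each time step become one token.
W_time = rng.standard_normal((N, d_model))
temporal_tokens = x @ W_time      # shape (T, d_model): one token per time step

# Inverted (iTransformer style): the whole T-step series of each variate becomes one token.
W_var = rng.standard_normal((T, d_model))
variate_tokens = x.T @ W_var      # shape (N, d_model): one token per variate

print(temporal_tokens.shape, variate_tokens.shape)  # (96, 16) (7, 16)
```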

from itransformer.

WenWeiTHU commented on May 27, 2024

With that in mind, let us discuss whether the tokens need to be distinguished. First, distinguishing time steps is essential: if the time points are permuted, the predictions should be totally different. Attention by itself is insensitive to token order, so the original Transformer must add position embeddings to let attention tell its tokens (the time points) apart. In our inverted version, however, the FFNs are applied along the time points. Each time point feeds a fixed input neuron of the FFN, so the temporal order is already encoded in the FFN weights, and position embeddings are no longer needed.
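
The argument can be checked numerically with a small sketch. For simplicity, it assumes single-head attention with identity query/key/value projections, and it uses one linear layer in place of the FFN. Both are simplifications for illustration, not the actual iTransformer layers. Shuffling time-point tokens only shuffles the attention outputs, but shuffling the inputs of a linear layer over the time axis changes its result:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4                      # number of time points, token width (illustrative)
perm = np.roll(np.arange(T), 1)  # a fixed, non-trivial reordering of the time points

def self_attention(tokens):
    # Single-head attention with identity Q/K/V projections and NO positional encoding.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ tokens

# Vanilla view: time points are tokens. Reordering them only reorders the outputs,
# so attention alone cannot tell which time point came first.
tokens = rng.standard_normal((T, d))
out = self_attention(tokens)
out_perm = self_attention(tokens[perm])
print(np.allclose(out[perm], out_perm))  # True

# Inverted view: the T time points are input neurons of a linear layer.
# Each weight row is tied to one time index, so reordering changes the output.
series = rng.standard_normal(T)
W = rng.standard_normal((T, d))
print(np.allclose(series @ W, series[perm] @ W))  # False
```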

from itransformer.

WenWeiTHU commented on May 27, 2024

On the other hand, consider what happens if the variates are permuted: the only requirement is that the output for variate X still corresponds to the input of variate X. So distinguishing variates is not essential, but keeping them independent is. In fact, the original Transformer does not distinguish them either; it first mixes them into indistinguishable channels. In our model, we simply keep each variate as an independent token. The tokens can still correlate with each other in the inverted attention and are finally mapped back to their own variates.
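
The property described here is permutation equivariance over variates. Below is a minimal sketch of a toy inverted block. It uses a shared linear embedding, attention over variate tokens with identity projections, a residual connection, and a shared linear projection to a 12-step forecast. The weight names, sizes, and horizon are hypothetical choices for illustration. Permuting the input variates permutes the forecasts in the same way, so the output for variate X stays tied to the input of variate X:

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, d_model, H = 24, 5, 8, 12  # lookback, variates, token width, horizon (illustrative)

W_embed = rng.standard_normal((T, d_model)) / np.sqrt(T)  # shared by every variate
W_proj = rng.standard_normal((d_model, H))                # shared forecast head

def attention(tokens):
    # Attention across variate tokens (identity projections, no positional encoding).
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    scores -= scores.max(axis=1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ tokens

def forecast(x):
    # x: (T, N) lookback window -> (H, N) forecast
    tokens = x.T @ W_embed               # (N, d_model): one token per variate
    tokens = tokens + attention(tokens)  # variates exchange information here
    return (tokens @ W_proj).T

x = rng.standard_normal((T, N))
perm = np.array([3, 0, 4, 1, 2])
y = forecast(x)
y_perm = forecast(x[:, perm])
print(np.allclose(y[:, perm], y_perm))  # True: each forecast follows its own variate
```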

from itransformer.

jiqizaisikao commented on May 27, 2024

Thank you for your reply. What happens if the variables are not independent of each other?

from itransformer.

WenWeiTHU commented on May 27, 2024

At the very least, when originally independent variables are mixed into indistinguishable feature channels, it becomes difficult to explicitly identify the correlations among variables, which are important for multivariate time series forecasting.

from itransformer.

jiqizaisikao commented on May 27, 2024

You use the same MLP to tokenize different variables, so after tokenization it is impossible to tell which variable a token came from. This assumes that the same linear MLP can distinguish different variables. In fact, if the time series of two variables are similar, the same MLP cannot map them to different tokens. What do you think? The original Transformer distinguishes its tokens through position embedding because tokens at different positions may be identical. Similarly, the time series of different variables may be similar, but you have not distinguished them.
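
The concern can be illustrated with a minimal sketch that assumes a single shared linear embedding. If two variates have identical series, they get identical tokens. The per-variate vector `variate_emb` shows one hypothetical way to break the tie. It is an assumption added for illustration and is not part of iTransformer:

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, d_model = 24, 3, 8
W_embed = rng.standard_normal((T, d_model)) / np.sqrt(T)  # one embedding shared by all variates

t = np.linspace(0, 4 * np.pi, T)
x = np.stack([np.sin(t), np.sin(t), np.cos(t)], axis=1)  # variates 0 and 1 are identical

tokens = x.T @ W_embed                    # (N, d_model)
print(np.allclose(tokens[0], tokens[1]))  # True: the shared embedding cannot tell them apart

# Hypothetical variant (not iTransformer): add a learnable vector per variate index.
variate_emb = rng.standard_normal((N, d_model))
tokens_with_id = tokens + variate_emb
print(np.allclose(tokens_with_id[0], tokens_with_id[1]))  # False
```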

from itransformer.

jiqizaisikao commented on May 27, 2024

In fact, you are assuming that the time series of different variables are completely different from one another, so that each series by itself is enough to distinguish its variable.