
Comments (6)

WenWeiTHU commented on May 26, 2024

Yes, the MLP in our model is used to learn the series representations (each series treated as a token), which are aggregated from past observations and projected into future predictions.

Position embedding is needed in the vanilla Transformer since attention is permutation-invariant; the MLP, however, is not (as we mentioned in our paper: "since the order of sequence is already stored in the permutation of neurons of feed-forward network").
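A minimal sketch of this point (PyTorch, with hypothetical toy dimensions, not the repo's actual code): self-attention without position embedding is permutation-equivariant, while an MLP over the flattened series is not, since each time step is tied to fixed weight columns.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
tokens = torch.randn(1, 8, 16)  # (batch, sequence length, model dim)
perm = torch.randperm(8)

# Self-attention is permutation-equivariant: permuting the input tokens
# merely permutes the output tokens, so order carries no signal by itself.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
out, _ = attn(tokens, tokens, tokens)
out_perm, _ = attn(tokens[:, perm], tokens[:, perm], tokens[:, perm])
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))  # True

# An MLP on the flattened series is order-sensitive: each time step hits
# different weight columns, so permuting the input changes the output.
mlp = nn.Linear(8 * 16, 32)
print(torch.allclose(mlp(tokens.flatten(1)), mlp(tokens[:, perm].flatten(1))))  # False
```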


WenWeiTHU commented on May 26, 2024

A visualization example: the sequential modeling is reflected in the permutation of MLP neurons.


WenWeiTHU commented on May 26, 2024

A related issue may be helpful for you: #13


blakezy commented on May 26, 2024

Thanks for your reply! As I see in #13, you say 'distinguishing variate is not essential but it is essential to keep them independent'. Along the temporal dimension, however, the well-accepted formulation is p(x_t|x_{t-1}) or p(x_t|x_{1:t-1}), where the time points are not independent. It seems this paper gives up modeling this kind of sequential dependency and instead focuses on the correlations between variables. Does this suggest that the relations between variables are more significant?


WenWeiTHU commented on May 26, 2024

Very interesting!

We think both are important for good multivariate time series forecasting (MTSF) performance. However, the relations of variates can hardly be considered in the vanilla Transformer: at the embedding step, the variates of each time step are projected into the channels of a single token embedding. This mixes inconsistent physical measurements and fails to maintain the independence of variates, let alone capture and utilize the multivariate correlations, which are essential for forecasting with numerous variates, as well as in complicated systems driven by latent physical processes (such as meteorological systems).
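A minimal sketch of this contrast (PyTorch, with hypothetical dimensions, not the repo's actual code): the vanilla embedding mixes all variates of one time step into a token, while the inverted embedding turns each variate's whole lookback series into a token.

```python
import torch
import torch.nn as nn

B, T, N, D = 32, 96, 7, 128  # batch, lookback length, variates, model dim
x = torch.randn(B, T, N)     # multivariate series

# Vanilla Transformer embedding: each time step becomes a token, mixing all
# variates (possibly with inconsistent physical units) into one channel vector.
temporal_embed = nn.Linear(N, D)
temporal_tokens = temporal_embed(x)                # (B, T, D): T tokens

# Inverted embedding: each variate's whole lookback series becomes one token,
# keeping variates separate; attention across the N tokens can then model
# the multivariate correlations.
variate_embed = nn.Linear(T, D)
variate_tokens = variate_embed(x.transpose(1, 2))  # (B, N, D): N tokens
```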

In addition, even though the FFN and layernorm seem simpler than attention blocks, they are efficient and competent in learning the temporal dependencies of a series, which can be traced back to statistical forecasters such as ARIMA and Holt-Winters. They also have no problem with inconsistent measurements, since they work on the time points of the same variate, and they enjoy an enlarged receptive field, since the whole lookback series is embedded as the variate token.
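A minimal sketch of such a block (PyTorch, hypothetical dimensions), assuming the variate tokens from the inverted embedding above:

```python
import torch
import torch.nn as nn

B, N, D = 32, 7, 128
variate_tokens = torch.randn(B, N, D)  # one token per variate

ffn = nn.Sequential(nn.Linear(D, 4 * D), nn.GELU(), nn.Linear(4 * D, D))
norm = nn.LayerNorm(D)

# Applied along the last dim, each variate token is transformed independently,
# so no units are mixed across variates; the temporal patterns of a series are
# encoded in the fixed ordering of the FFN neurons.
out = norm(variate_tokens + ffn(variate_tokens))   # (B, N, D)
```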


blakezy commented on May 26, 2024

Good points. Thanks!


