
Comments (6)

WenWeiTHU commented on May 26, 2024

Yes, the MLP in our model is used to learn the series representations (each series treated as a token), which are aggregated from past observations and projected into future predictions.

Position embedding is needed in the vanilla Transformer since attention is permutation-invariant; the MLP, however, is not (as we mentioned in our paper: "since the order of sequence is already stored in the permutation of neurons of feed-forward network").
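A minimal sketch of this point (PyTorch, with hypothetical toy dimensions, not the repo's actual code): self-attention without position embedding is permutation-equivariant, while an MLP over the flattened series is not, since each time step is tied to fixed weight columns.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
tokens = torch.randn(1, 8, 16)  # (batch, sequence length, model dim)
perm = torch.randperm(8)

# Self-attention is permutation-equivariant: permuting the input tokens
# merely permutes the output tokens, so order carries no signal by itself.
attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
out, _ = attn(tokens, tokens, tokens)
out_perm, _ = attn(tokens[:, perm], tokens[:, perm], tokens[:, perm])
print(torch.allclose(out[:, perm], out_perm, atol=1e-5))  # True

# An MLP on the flattened series is order-sensitive: each time step hits
# different weight columns, so permuting the input changes the output.
mlp = nn.Linear(8 * 16, 32)
print(torch.allclose(mlp(tokens.flatten(1)), mlp(tokens[:, perm].flatten(1))))  # False
```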


WenWeiTHU commented on May 26, 2024

A visualization example: the sequential modeling is reflected in the permutation of MLP neurons.


WenWeiTHU commented on May 26, 2024

A related issue may be helpful for you: #13


blakezy commented on May 26, 2024

Thanks for your reply! As I see in #13, you say 'distinguishing variate is not essential but it is essential to keep them independent'. Along the temporal dimension, however, the well-accepted formulation is p(x_t|x_{t-1}) or p(x_t|x_{1:t-1}), where the time points are not independent. It seems this paper gives up modeling this kind of sequential dependency and instead focuses on the correlations between variables. Does this suggest that the relations between variables are more significant?


WenWeiTHU commented on May 26, 2024

Very interesting!

We think both are important for good multivariate time series forecasting (MTSF) performance. However, the relations of variates can hardly be considered in the vanilla Transformer: at the embedding step, the variates of each time step are projected into the channels of a single token embedding. This mixes inconsistent physical measurements and fails to maintain the independence of variates, let alone capture and utilize the multivariate correlations, which are essential for forecasting with numerous variates, as well as in complicated systems driven by latent physical processes (such as meteorological systems).
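A minimal sketch of this contrast (PyTorch, with hypothetical dimensions, not the repo's actual code): the vanilla embedding mixes all variates of one time step into a token, while the inverted embedding turns each variate's whole lookback series into a token.

```python
import torch
import torch.nn as nn

B, T, N, D = 32, 96, 7, 128  # batch, lookback length, variates, model dim
x = torch.randn(B, T, N)     # multivariate series

# Vanilla Transformer embedding: each time step becomes a token, mixing all
# variates (possibly with inconsistent physical units) into one channel vector.
temporal_embed = nn.Linear(N, D)
temporal_tokens = temporal_embed(x)                # (B, T, D): T tokens

# Inverted embedding: each variate's whole lookback series becomes one token,
# keeping variates separate; attention across the N tokens can then model
# the multivariate correlations.
variate_embed = nn.Linear(T, D)
variate_tokens = variate_embed(x.transpose(1, 2))  # (B, N, D): N tokens
```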

In addition, even though the FFN and layernorm seem simpler than attention blocks, they are efficient and competent in learning the temporal dependencies of a series, which can be traced back to statistical forecasters such as ARIMA and Holt-Winters. They also have no problem with inconsistent measurements, since they work on the time points of the same variate, and they enjoy an enlarged receptive field, since the whole lookback series is embedded as the variate token.
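A minimal sketch of such a block (PyTorch, hypothetical dimensions), assuming the variate tokens from the inverted embedding above:

```python
import torch
import torch.nn as nn

B, N, D = 32, 7, 128
variate_tokens = torch.randn(B, N, D)  # one token per variate

ffn = nn.Sequential(nn.Linear(D, 4 * D), nn.GELU(), nn.Linear(4 * D, D))
norm = nn.LayerNorm(D)

# Applied along the last dim, each variate token is transformed independently,
# so no units are mixed across variates; the temporal patterns of a series are
# encoded in the fixed ordering of the FFN neurons.
out = norm(variate_tokens + ffn(variate_tokens))   # (B, N, D)
```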


blakezy commented on May 26, 2024

Good points. Thanks!


