Hello, I would like to ask the original transformer to distinguish variables by adding

How to distinguish different tokens about itransformer HOT 7 CLOSED

jiqizaisikao commented on May 27, 2024

How to distinguish different tokens

from itransformer.

Comments (7)

WenWeiTHU commented on May 27, 2024

That is a really good question. The original Transformer does distinguish tokens by adding positional encoding. However, the tokens of the original Transformer represent different time steps. Only in our inverted version do tokens represent different variates.


WenWeiTHU commented on May 27, 2024

With that in mind, let us discuss whether the tokens need to be distinguished. First, distinguishing time steps is essential: if the time points are permuted, the predictions will be totally different. The original Transformer therefore needs position embedding so that attention can tell apart the time points used as tokens. In our inverted version, however, the FFNs are applied along the time points. Since the order of the time points is already reflected in the ordering of the FFN neurons, position embedding is no longer needed here.
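
The argument above can be checked numerically. The following is a minimal NumPy sketch, not the iTransformer code: `self_attention`, `W_embed`, and all sizes are hypothetical names and values chosen for illustration. It shows that attention without positional encoding only reorders its outputs when the tokens are reordered, while a linear layer applied along the time axis produces a different token when the time points are shuffled.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 4  # lookback length and token dimension (illustrative values)

def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Single-head self-attention with NO positional encoding.
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

# (a) Time steps as tokens: permuting the tokens only permutes the outputs,
# so attention by itself carries no information about temporal order.
time_tokens = rng.normal(size=(T, D))
perm = rng.permutation(T)
out = self_attention(time_tokens, Wq, Wk, Wv)
out_perm = self_attention(time_tokens[perm], Wq, Wk, Wv)
attention_is_order_blind = np.allclose(out[perm], out_perm)

# (b) Inverted view: one whole series is embedded by a linear layer over time.
# Each time index meets its own row of weights, so shuffling changes the token.
W_embed = rng.normal(size=(T, D))
series = rng.normal(size=T)
shift = np.roll(np.arange(T), 1)  # a guaranteed non-identity permutation
token = series @ W_embed
token_shuffled = series[shift] @ W_embed
linear_sees_order = not np.allclose(token, token_shuffled)

print(attention_is_order_blind, linear_sees_order)
```

In case (b) the weights play the role that position embedding plays in case (a): the time order is encoded by which weight each time point is multiplied with.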


WenWeiTHU commented on May 27, 2024

On the other hand, if the variates are permuted, the only requirement is that the output for variate X still corresponds to the input for variate X. So distinguishing variates is not essential, but keeping them independent is. In fact, the original Transformer does not distinguish them and even mixes them into indistinguishable channels first. In our model, we simply keep them as independent tokens. They can still correlate with each other in the inverted attention and finally come back to themselves.
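
The property described here, that permuting the input variates should simply permute the outputs in the same way, can be illustrated with a toy model. This is a rough sketch under stated assumptions: `inverted_forward` is a hypothetical one-layer stand-in (shared embedding over time, attention across variate tokens, shared projection), not the actual iTransformer implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, D, H = 8, 4, 6, 3  # lookback, variates, token dim, horizon (illustrative)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical weights, shared by every variate.
W_embed = rng.normal(size=(T, D))
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
W_out = rng.normal(size=(D, H))

def inverted_forward(x):
    # x has shape (T, N); the result has shape (H, N).
    tokens = x.T @ W_embed                      # one token per variate: (N, D)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(D)) @ v    # variates attend to each other
    tokens = tokens + attn                      # residual connection
    return (tokens @ W_out).T                   # each token maps back to its own variate

x = rng.normal(size=(T, N))
perm = np.array([2, 0, 3, 1])                   # reorder the variates

y = inverted_forward(x)
y_perm = inverted_forward(x[:, perm])

# Forecast for each variate is unchanged; only the column order follows the input.
variates_equivariant = np.allclose(y[:, perm], y_perm)
print(variates_equivariant)
```

No variate-specific parameter appears anywhere in the sketch, which is why the column order of the output simply follows the column order of the input.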


jiqizaisikao commented on May 27, 2024

Thank you for your reply. What happens if the variables are not independent of each other?


WenWeiTHU commented on May 27, 2024

At the very least, when originally independent variables are mixed into indistinguishable feature channels, it becomes difficult to explicitly find the correlations among variables, which is important for multivariate time series forecasting.
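
To make "mixed into indistinguishable feature channels" concrete, here is a small NumPy sketch. It assumes a plain linear embedding in both cases, and `W_time` and `W_var` are hypothetical weight names. Perturbing a single variate changes every per-time-step token, but only one per-variate token.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, D = 8, 3, 4  # lookback, variates, token dim (illustrative)
x = rng.normal(size=(T, N))

W_time = rng.normal(size=(N, D))  # embeds the N variates observed at one time step
W_var = rng.normal(size=(T, D))   # embeds the T time points of one variate

temporal_tokens = x @ W_time      # (T, D): every channel mixes all variates
variate_tokens = x.T @ W_var      # (N, D): every token belongs to one variate

# Perturb variate 0 only and see which tokens react.
x2 = x.copy()
x2[:, 0] += 1.0

changed_temporal = np.any(~np.isclose(x2 @ W_time, temporal_tokens), axis=1)
changed_variate = np.any(~np.isclose(x2.T @ W_var, variate_tokens), axis=1)

print(changed_temporal)  # one flag per time-step token
print(changed_variate)   # one flag per variate token
```

With per-variate tokens, the attention map has one row and one column per variate, so pairwise relations between variates stay readable. With per-time-step tokens, each variate is spread across all channels of every token.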


jiqizaisikao commented on May 27, 2024

You use the same MLP to tokenize different variables. After that, it is impossible to tell which variable a token came from. The assumption behind this is that the same linear layer (MLP) can distinguish different variables, but in fact the time series may be similar, and similar series cannot be mapped to different tokens by the same MLP. What do you think? The original Transformer distinguishes different variables through position embedding, because the tokens at different positions may be the same. Similarly, the time series of different variables may be similar, but you have not distinguished them.


jiqizaisikao commented on May 27, 2024

In fact, you assume that the time series of different variables are themselves completely different, so that the series alone are enough to distinguish them.
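
The concern raised in these two comments can be reproduced in a few lines. In this sketch, `W_embed` stands in for the shared tokenizer. The `variate_embed` table is purely hypothetical: it is an assumption added for illustration, by analogy with position embedding, and not something the thread says iTransformer does.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 4  # lookback length and token dimension (illustrative)

W_embed = rng.normal(size=(T, D))  # shared embedding applied to every variate

# Two different variables that happen to have the same history.
series_a = np.sin(np.linspace(0.0, 2.0 * np.pi, T))
series_b = series_a.copy()

x = np.stack([series_a, series_b])  # (2, T)
tokens = x @ W_embed                # (2, D)

# A shared, deterministic tokenizer maps identical inputs to identical tokens.
tokens_identical = np.allclose(tokens[0], tokens[1])

# Hypothetical remedy (an assumption, not from the thread): add one learnable
# vector per variate index, analogous to a position embedding.
variate_embed = rng.normal(size=(2, D))
tagged = tokens + variate_embed
tagged_distinct = not np.allclose(tagged[0], tagged[1])

print(tokens_identical, tagged_distinct)
```

Such an identity embedding would tie the model to a fixed variate ordering and count, which is a trade-off against keeping the variate tokens interchangeable.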

