Comments (7)
This is a really good question. Of course, the original Transformer distinguishes tokens by adding positional encodings. However, in the original Transformer the tokens represent different time steps; it is only in our inverted version that tokens represent different variates.
from itransformer.
With that in mind, let us discuss whether the tokens need to be distinguished at all. First, distinguishing time steps is essential: if the time points are permuted, the predictions change completely. So the original Transformer must use positional embeddings so that attention can tell the time-point tokens apart. In our inverted version, however, the FFNs are applied along the time dimension. Since the order of the time points is already reflected in the arrangement of the FFN's input neurons, positional embeddings are no longer needed here.
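The point above can be sketched numerically. The snippet below is a minimal illustration, not the repository's actual code: it assumes a shared linear map over the time dimension (a stand-in for the inverted embedding/FFN) with made-up shapes `N`, `T`, `d`. It shows that permuting time steps changes the tokens (temporal order is baked into the weights), while permuting variates merely permutes the tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: N variates, T time steps, d-dimensional tokens.
N, T, d = 4, 96, 8
X = rng.normal(size=(N, T))   # multivariate series: one row per variate
W = rng.normal(size=(T, d))   # shared linear embedding over the time axis

tokens = X @ W                # one token per variate, no positional embedding

# Permuting the TIME axis changes every token, because each time step
# feeds a distinct row of W -- temporal order lives in the weights.
perm_t = rng.permutation(T)
tokens_tperm = X[:, perm_t] @ W
assert not np.allclose(tokens, tokens_tperm)

# Permuting the VARIATE axis merely permutes the tokens -- the embedding
# treats variates as an unordered set.
perm_v = rng.permutation(N)
tokens_vperm = X[perm_v] @ W
assert np.allclose(tokens[perm_v], tokens_vperm)
```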
On the other hand, imagine the variates being permuted: all that must hold is that the output for variate X corresponds to the input of variate X. So distinguishing the variates is not essential, but keeping them independent is. In fact, the original Transformer does not distinguish them at all; it even mixes them into indistinguishable channels in the first place. In our model, we simply keep them as independent tokens: they can still correlate with one another in the inverted attention, and each one finally maps back to itself.
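The "comes back to itself" property above is just the permutation equivariance of self-attention. A minimal sketch, assuming a single-head attention over variate tokens with hypothetical dimensions: permuting the input tokens permutes the outputs identically, so variate i's output always corresponds to variate i's input.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Plain single-head scaled dot-product attention over variate tokens.
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return A @ V

rng = np.random.default_rng(0)
N, d = 4, 8                       # N variate tokens of dimension d (illustrative)
tokens = rng.normal(size=(N, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = self_attention(tokens, Wq, Wk, Wv)

# Permuting the variate tokens permutes the outputs in the same way:
# each variate's output still lines up with its own input.
perm = rng.permutation(N)
out_perm = self_attention(tokens[perm], Wq, Wk, Wv)
assert np.allclose(out[perm], out_perm)
```

This is why no positional embedding is needed to keep variates apart: attention never relies on token order, only on token content.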
Thank you for your reply. What happens if the variables are not independent of each other?
At the least, once originally independent variables are mixed into indistinguishable feature channels, it becomes difficult to explicitly recover the correlations among variables, which are important for multivariate time series forecasting.
You use the same MLP to tokenize different variables. After doing so, it is impossible to tell which variable a token came from. Your implicit assumption is that the same linear layer (MLP) can distinguish different variables, but in practice the time series may be similar and therefore cannot be mapped to different tokens by the same MLP. What do you think? The original Transformer distinguishes positions through positional embeddings, because tokens at different positions may be identical. Similarly, the time series of different variables may be similar, yet you do not distinguish them.
In effect, you assume that the time series of different variables are themselves completely different, and that the series alone can distinguish them.
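The commenter's concern can be made concrete. A minimal sketch under the same illustrative shared-linear-embedding assumption as above: two variates with identical histories necessarily receive identical tokens, because the shared map carries no variate identity of its own.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 96, 8
W = rng.normal(size=(T, d))    # shared linear embedding (illustrative)

s = rng.normal(size=T)
X = np.stack([s, s.copy()])    # two variates with identical series

tokens = X @ W
# Identical input series yield identical tokens: any distinction between
# such variates must come from the series themselves, not the embedding.
assert np.allclose(tokens[0], tokens[1])
```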
Related Issues (20)
- ValueError: could not convert string to float: '2020-01-01 00:20:00' HOT 1
- Error when using a Custom dataset with weekly frequency HOT 1
- Question: Support for Dynamic Categorical Inputs in iTransformer HOT 2
- Cannot reproduce the results in the paper HOT 2
- How to visualize the results? HOT 5
- Can't reproduce the result of PEMS03_96_96 task HOT 5
- Why not using Decoder-only Transformer?
- How to get the figures in the paper? HOT 1
- How to visualize the results? HOT 1
- How to obtain better model parameters HOT 6
- './scripts/variate_generalization/Electricity/iTransformer.sh': No such file or directory HOT 1
- Code fails to run when seq_len is less than 48 HOT 1
- How to specify the forecasting target in the M task HOT 1
- Here, are the following two lines redundant? batch_x = batch_x[:, :, partial_start:partial_end] batch_y = batch_y[:, :, partial_start:partial_end] HOT 1
- Question about training time and memory usage with and without the .sh scripts HOT 1
- CLS Token
- Fine-tuning?
- Positional encoding HOT 2
- Question about the data_loader.py file
- Question about the label_len parameter HOT 1