
itransformer's People

Contributors

eltociear avatar kashif avatar wenweithu avatar zdandsomsp avatar


itransformer's Issues

PEMS dataset

I notice that the PEMS dataset includes four subsets: PEMS03, PEMS04, PEMS07, and PEMS08. Could you please tell me which one was used in the experiments of the original paper? Thanks!

How to deal with spatial information?

I've observed that your model architecture is excellently designed to facilitate multi-agent interactions. However, it appears that current models are limited to handling scalar values. Could you elaborate on how your system can manage information based on XYZ coordinates?

Regarding the figures in the article

Hello, I am a beginner. May I ask how the attention heat maps in the article are drawn, and how I can obtain the variables needed for plotting? Could you please give me an example?
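A minimal plotting sketch (not the authors' code), assuming you have already extracted an attention matrix attn of shape [n_variates, n_variates], e.g. by averaging the attention weights returned by the attention layer over heads and batches; the random matrix below is only a placeholder:

import numpy as np
import matplotlib.pyplot as plt

# attn: variate-by-variate attention scores; replace the placeholder with real weights
attn = np.random.rand(21, 21)

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(attn, cmap="viridis")           # draw the score matrix as a heat map
ax.set_xlabel("key variate index")
ax.set_ylabel("query variate index")
fig.colorbar(im, ax=ax, label="attention score")
plt.tight_layout()
plt.savefig("attention_heatmap.png", dpi=200)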

How to build comparison models such as MLP, LSTM in the framework?

Hi,
I was thinking of adding a comparison of some basic models such as MLP and LSTM.
But I encountered some issues with tensor dimensions when changing the model.

For example, the following is not working:

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, seq_len, pred_len):
        super(MLP, self).__init__()
        self.seq_len = seq_len
        self.pred_len = pred_len
        self.mlp = nn.Sequential(
            nn.Linear(self.seq_len, 110),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(110, 60),
            nn.GELU(),
            # nn.Dropout(0.05),
            nn.Linear(60, self.pred_len)
        )

    def forward(self, x):
        x = x.view(-1, self.seq_len)
        return self.mlp(x)

Thank you very much for checking this basic question.
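A possible direction, offered only as a sketch rather than an official fix: in this codebase the encoder input typically has shape [batch, seq_len, n_variates] and the model is expected to return [batch, pred_len, n_variates], so an MLP baseline likely needs to act along the time axis of each variate instead of flattening the batch. Under that assumption:

import torch
import torch.nn as nn

class VariateWiseMLP(nn.Module):
    def __init__(self, seq_len, pred_len):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(seq_len, 110),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(110, 60),
            nn.GELU(),
            nn.Linear(60, pred_len),
        )

    def forward(self, x):
        # x: [batch, seq_len, n_variates] -> [batch, n_variates, seq_len]
        x = x.permute(0, 2, 1)
        y = self.mlp(x)               # [batch, n_variates, pred_len]
        return y.permute(0, 2, 1)     # [batch, pred_len, n_variates]

# quick shape check
model = VariateWiseMLP(seq_len=96, pred_len=96)
print(model(torch.randn(32, 96, 7)).shape)  # torch.Size([32, 96, 7])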

Why does the time scale affect prediction accuracy?

Hello, thanks for this excellent work!
I used iTransformer to train on my own dataset (univariate, no 'date' column), and the results were much better when the data points were spaced 1 second apart than 1 hour. What could be the reason? Here is a comparison of the results:
1s: mse: 0.05796017870306969, mae: 0.1638924479484558
1h: mse: 0.4430641829967499, mae: 0.46550384163856506
Thanks again.

Memory Footprint?

Dear Authors,
Your research is highly interesting. It is a great finding for time-series forecasting, and we are very interested in it. However, regarding the memory footprint reported in Figure 10 of the paper, we are a bit confused about how it is measured. Is there any code available to measure this, or can you provide some instructions on how to measure it? For example, does it include GPU-only memory consumption or GPU+RAM memory consumption?

Thank you in advance.

Univariate forecast with exogenous variables

Hi there, thanks for the excellent work. I wonder if the model is able to perform univariate forecasting with exogenous variables. At first glance it does not seem so to me. Any insights would be appreciated.

Inverse transform issue in forecasting

During testing, if inverse is set to True, the following code

if test_data.scale and self.args.inverse:
    outputs = test_data.inverse_transform(outputs)

raises an error: ValueError: Found array with dim 3. None expected <= 2.
The reason is that the sklearn StandardScaler being used only accepts 2-D arrays. Please fix this when you get a chance.
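One workaround, sketched under the assumption that outputs is already a NumPy array of shape [batch, pred_len, n_variates] at this point: flatten it to 2-D before calling the scaler and restore the shape afterwards.

# StandardScaler only handles 2-D arrays, so reshape around the call
b, t, n = outputs.shape
outputs = test_data.inverse_transform(outputs.reshape(-1, n)).reshape(b, t, n)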

Dimension issue for backtesting

Hi! I really appreciate your great work!

I am using it to test on my custom data. I have changed the Dataset_Custom class in data_loader.py. However, I have a question about the roles of seq_len, pred_len and label_len. Can you please explain them?

And I want to train the model in a sliding-window way, say train on data[i:i + window_size] and test on data[i + window_size + 1]. How should I set my seq_len, pred_len and label_len? Thank you very much!
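For context, a sketch of the slicing convention usually found in this family of codebases (please verify against data_loader.py): each sample has an encoder window of length seq_len, and a decoder/target window of length label_len + pred_len that overlaps the last label_len steps of the encoder window.

def slice_window(data, index, seq_len, label_len, pred_len):
    """Return (encoder input, decoder target) for one sample.
    Follows the usual Informer/Time-Series-Library convention; verify locally."""
    s_begin = index
    s_end = s_begin + seq_len                 # encoder sees data[s_begin:s_end]
    r_begin = s_end - label_len               # target overlaps the last label_len steps
    r_end = r_begin + label_len + pred_len    # the final pred_len steps are the forecast horizon
    return data[s_begin:s_end], data[r_begin:r_end]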

Question about the OT column in the dataset

Hello, may I ask what the OT column in the csv files means? Is it the single output target in a multi-input, single-output setting?

About the normalization of the input data

In the official example scripts, self.scale and self.use_norm are both enabled by default, so self.scaler.transform is called once in data_loader.py to standardize the data, and then another Normalization is applied in the forecast function of iTransformer.py. Is this redundant, or is there another consideration? Thanks.

Basic configuration issues

Hello, author!
1. In the requirements file, the package is listed as sklearn, which cannot be installed when I run the code on Colab; I manually changed it to scikit-learn on my server and it installed successfully. Please correct this in the requirements file.
2. Please also provide the PyTorch, CUDA and cuDNN version numbers that this open-source code was tested with, so that people reproducing the results can set up a local environment more quickly. Thanks!

Very poor forecasting results?

Hello,

I ran bash ./scripts/multivariate_forecasting/Traffic/iTransformer.sh and the results are very poor: only the periodic component is captured (which a plain FFT could give without any AI), with no other information.
[screenshot 2024-01-30 11:59:50; attachment 3140.pdf]

The results on Exchange are also very poor:
[screenshot 2024-01-30 11:50:52; attachment 780.pdf]

I ran the code on a Mac mini M2 and made small changes to the scripts, replacing cuda with mps, roughly like this:

if torch.backends.mps.is_available():
    device = torch.device("mps")

Is there anywhere I can see the contents of your official test_results directory?

How to use the model for classification

Hello, does the model also have an advantage on classification tasks, and how can it be applied to classification?

A question about embedding types

I don't understand the embed argument.
What is the difference between timeF, fixed, and the other options?

How are different tokens distinguished?

Hello, the original Transformer distinguishes positions by adding positional encoding. How does your method distinguish between different variate tokens?

Can't load a custom dataset

Hello guys, thanks for this amazing work.
I am having some issues loading a custom dataset; the parameters are:

python -u run.py \
  --is_training 1 \
  --root_path ./dataset/earth/ \
  --data_path es.csv \
  --model_id es_96_96 \
  --model $model_name \
  --data custom \
  --features MS \
  --seq_len 5 \
  --pred_len 1 \
  --e_layers 2 \
  --enc_in 6 \
  --dec_in 6 \
  --c_out 1 \
  --des 'Exp' \
  --d_model 512 \
  --d_ff 512 \
  --itr 1 \
  --freq d \
  --target_root_path ./dataset/earth/ \
  --target_data_path 'es.csv'

The dataset looks like this:

date,0,1,2,3,4,OT
2001-01-02,1320.280029296875,1320.280029296875,1276.050048828125,1283.27001953125,1129400000,1347.56005859375
2001-01-03,1283.27001953125,1347.760009765625,1274.6199951171875,1347.56005859375,1880700000,1333.3399658203125

But I keep getting:
RuntimeError: Trying to resize storage that is not resizable
Any idea what I am doing wrong?

Parameter settings

What are the exact training parameters for ETTh1? Using the parameter settings in the code, I cannot reach the accuracy reported in the paper.

About the experimental results

Hello, may I ask why the results of the comparison model PatchTST in this paper differ so much from those reported in the PatchTST paper?

Experimental results from iTransformer:
[screenshot of experimental results]

Experimental results from PatchTST:
[screenshot of experimental results]

StandardScaler vs iTransformer normalization

Hi, I am playing with a custom dataset and exploring different parameters.

I noticed that the custom dataset loader uses StandardScaler by default. However, iTransformer also supports normalization that uses the same formula as StandardScaler.

  1. Are these two doing the same thing (over the same data)?
  2. If yes, does that mean I should use --use_norm 0?

Thank you for your response.
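For what it's worth, my reading of the code (please correct me if wrong): the dataset-level StandardScaler standardizes each variate with statistics computed once over the training split, while use_norm applies per-window instance normalization inside the model, using the mean and standard deviation of each individual lookback window. A rough sketch of the latter, with illustrative shapes:

import torch

x_enc = torch.randn(32, 96, 7)  # [batch, seq_len, n_variates], already globally standardized
means = x_enc.mean(dim=1, keepdim=True)                                    # per-window, per-variate mean
stdev = torch.sqrt(x_enc.var(dim=1, keepdim=True, unbiased=False) + 1e-5)  # per-window, per-variate std
x_norm = (x_enc - means) / stdev   # what the backbone sees when use_norm is enabled
# predictions are later de-normalized with the same means/stdev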

Dataset_PEMS in data_loader.py

Hello, I found that when I run models that require x_mark on the PEMS dataset, such as TimesNet, Autoformer, etc., there is a dimension mismatch in the embedding. I guess the reason is that x_mark is directly set to torch.zeros((seq_x.shape[0], 1)) in Dataset_PEMS. Thanks!

What should be changed when X and Y have different numbers of variates?

I see that the examples all use the same number of input and output features. What kind of modification should be made to the model when there are several input features but only one (or a different number of) output features, so that it still performs well in this case? For example, is simply adding a fully connected layer at the end a good choice?
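Not an authoritative answer, but one common pattern is to keep the backbone unchanged and add a small projection over the variate dimension at the end; the module below is purely illustrative:

import torch
import torch.nn as nn

class VariateProjection(nn.Module):
    """Map [batch, pred_len, n_in_variates] -> [batch, pred_len, n_out_variates]."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.proj = nn.Linear(n_in, n_out)

    def forward(self, y):
        return self.proj(y)  # linear mix of the per-variate forecasts

# usage sketch
head = VariateProjection(n_in=8, n_out=1)
y_backbone = torch.randn(32, 96, 8)   # forecasts for all 8 input variates
y_target = head(y_backbone)           # [32, 96, 1]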

Questions about how to improve model training speed

Hello dear author, thank you very much for the work you shared; it is really fascinating. After reading the article and source code, we have some questions and look forward to your guidance.
Q1: When our GPU performance is insufficient, can we use FP16 or AMP to improve training speed without affecting model performance? (A standard mixed-precision sketch is included after this issue for reference.)
Q2: When we ran the "bash ./scripts/multivariate_forecasting/Traffic/iTransformer.sh" script, we found that training takes about 2-3 days on a single V100-16GB. Is this normal? What methods can we try to improve training speed? For example, using multiple GPUs, such as 4 P100s, or switching to a GPU with more compute, such as a 4090, 4080 or 3090?
Q3: If we plan to run this model on a more powerful machine, what should we prioritize? For example, CPU single-core performance, clock frequency, or core count? GPU memory bus width and bandwidth? Does PCIe 3.0 vs. 4.0 make a big difference for connecting the GPU to the motherboard?
I'm very sorry for asking so many questions, and I apologize again for taking up your time and energy! We look forward to your answers and guidance, and wish you continued success at top conferences!
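Regarding Q1, a sketch of the standard PyTorch mixed-precision recipe (not something benchmarked by the authors here); the model, batch layout, criterion and optimizer names are illustrative:

import torch

def train_step_amp(model, batch, criterion, optimizer, scaler):
    """One mixed-precision training step; batch format and model signature are illustrative."""
    batch_x, batch_y = batch
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # run the forward pass in reduced precision where safe
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
    scaler.scale(loss).backward()             # scale the loss so fp16 gradients do not underflow
    scaler.step(optimizer)                    # unscale gradients and apply the optimizer step
    scaler.update()
    return loss.item()

scaler = torch.cuda.amp.GradScaler()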

About the radar figure

Hi, thanks for sharing the code. Figure 1 in the paper is very impressive; however, when I tried to reproduce it, I found it difficult to set a customized ylim for each dataset. Would you mind also providing the visualization code for Figure 1? I believe it would help me a lot!

Sequential modeling problem

Thanks for sharing. This is awesome work!

In my opinion, compared to existing work, iTransformer essentially uses an MLP to handle sequential dependencies and then uses attention scores to analyze the relations between variables. Is this correct?

If so, how can it handle temporal dependencies? There is no position embedding and no explicit sequential modeling like an RNN. (I tried to replace the MLP with an RNN but got worse performance.) It seems that sequential modeling is not very important and may even have a negative effect, which is a little counter-intuitive. Do you have any ideas about this? Thanks!

How to get predictions without normalization?

I get the prediction results, but they are normalized, which is not intuitive and difficult to show on a diagram.
Could you please tell me how to get un-normalized results? Thanks for your attention.

Can iTransformers be used as a Large Language Model architecture?

Have you done any research on the performance per parameter of iTransformers for language tasks? If the gains shown in the graph transfer over well to natural language tasks, this could be extremely valuable for LLMs. If you haven't done any research on that specifically, can you tell me in what ways it could reasonably be projected to behave differently? For example, would context still work the same way? Could a model trained on a fixed context length also be able to scale well to a larger context length without hacks like RoPE scaling?

Questions about the paper

Hello, author!
First of all, thank you for your work; the paper is logically clear and the source code is well organized!
I have two questions after reading the paper.
First:
In the "Variate generalization" part of Section 4.2 (iTransformers Generality), the paper states: "Firstly, benefiting from the flexibility of the number of input tokens, the amount of variate channels is no longer restricted and thus feasible to vary from training and inference", followed by Figure 5.
My understanding is that "variate" here refers to the dimensionality of the time series, i.e. N, the number of variate tokens, which corresponds to the enc_in parameter, and that the experiment in Figure 5 masks 80% of the variates for the comparison.
If my understanding is correct, how can a different number of variates be used at training and inference time? I can see how this works at the embedding and FFN level (the parameter shapes do not depend on the number of variates), but in self-attention Q, K, V have shape N x D_k; if N differs between training and inference, how are these parameters shared?
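A small illustrative sketch of my own reading (not the authors' answer): the learned attention parameters are the projection matrices W_Q, W_K, W_V of shape [D, D], which act only on the embedding dimension; N is just the token axis, so Q = X W_Q has shape [N, D] for any N, and the same weights can be reused when the number of variate tokens changes:

import torch
import torch.nn as nn

d_model = 512
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

tokens_train = torch.randn(4, 137, d_model)   # 137 variate tokens at training time
tokens_infer = torch.randn(4, 862, d_model)   # 862 variate tokens at inference time

out_train, _ = attn(tokens_train, tokens_train, tokens_train)  # same weights,
out_infer, _ = attn(tokens_infer, tokens_infer, tokens_infer)  # different token count
print(out_train.shape, out_infer.shape)  # (4, 137, 512) and (4, 862, 512)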

Second:
If I want to apply the model to three-dimensional data (time, feature, entity) rather than two-dimensional data (time, feature), what is the appropriate way to handle it?

Thank you for the clarification!

Market dataset

Good day. I raised this issue because I could not find the market dataset anywhere in the repo or the paper. Would you mind sharing the link, if possible?

Question about the traffic dataset

My research involves interpretation with real-world physical meaning. In the dataset provided by iTransformer on Google Drive, the traffic data contains 17,545 time steps and 862 columns, of which the first, date, records the timestamp. Are the remaining 860 variables the traffic flow recorded at 860 freeway monitoring points in San Francisco? Also, I do not know the physical meaning of the last variable, OT; for now I assume it is not the total or average occupancy, since that could be computed by simply weighting the other 860 variables and would not require such a complex forecasting model. Could you explain where OT comes from?
In addition, descriptions of this traffic dataset online say it is based on 2015-2016 San Francisco freeway data, but the records in the file actually span July 2016 to July 2018, and it does not match any PEMS subset; I also cannot find documentation on paperswithcode. Could you give an official link that documents this dataset? Since it has been used in many top-conference papers, there should be a source. Thanks!

Can you write a tutorial on how to use ITransformer to run custom datasets?

Your work is very nice. Running the demo datasets you provide is very simple, but I don't know how to run a custom dataset (for example, household energy consumption data, where each sample has only one variable: the overall energy consumption value). I hope you can provide a tutorial. Thank you!

How to add my own dataset

Hello, can this model run on my own dataset, i.e. a custom dataset? I don't know how to get it running on my own data.

PatchTST performance

Hi, I found that the performance of PatchTST on the traffic dataset is significantly different from the original paper. In the original paper, the MSE results of PatchTST on the traffic dataset are [0.477, 0.471, 0.485, 0.518] when fixing the lookback window to 96. However, in iTransformer's baselines, the corresponding figures are [0.544, 0.540, 0.551, 0.586]. Besides, I cannot reproduce the results of the original PatchTST paper either. Thanks! 😄

How to determine the optimal input length (seq_len)

I obtained good predictions on the data in electricity.csv, but when forecasting with my own meteorological dataset the fitted curves are poor. What method can I use to determine the optimal input length? Do other parameters need to be adjusted as well?
[screenshot of the prediction curves]

Export model to ONNX

Hi, great open-source implementation.
I am very new to the ML ecosystem and I am trying to export a trained iTransformer model to ONNX, but I am currently failing to do so.
Do you think it would be possible to include a sample, or could you guide me on how to do it?
Thank you.
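A rough, untested sketch of what the export call might look like, assuming model is your trained instance and that its forward signature follows the Time-Series-Library convention of (x_enc, x_mark_enc, x_dec, x_mark_dec); the shapes below are illustrative and should be adjusted to your configuration:

import torch

model.eval()
# dummy inputs matching the shapes used at training time (illustrative values)
x_enc      = torch.randn(1, 96, 7)    # [batch, seq_len, n_variates]
x_mark_enc = torch.randn(1, 96, 4)    # time features for the encoder window
x_dec      = torch.randn(1, 144, 7)   # label_len + pred_len steps
x_mark_dec = torch.randn(1, 144, 4)

torch.onnx.export(
    model,
    (x_enc, x_mark_enc, x_dec, x_mark_dec),
    "itransformer.onnx",
    input_names=["x_enc", "x_mark_enc", "x_dec", "x_mark_dec"],
    output_names=["forecast"],
    opset_version=17,
    dynamic_axes={"x_enc": {0: "batch"}, "forecast": {0: "batch"}},
)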

Can we do prediction?

In run.py I found train and test. Is there somewhere that allows us to do prediction on new data? Thanks.

Forecasting variables other than the OT column

Hello, I want to forecast a variable in the dataset other than OT. After changing the target variable (target) and retraining, the resulting MSE and MAE are exactly the same as those for OT. Is this correct?
