Comments (5)
I tried turning norm off (use_norm=0) and found that the scale of the MSE became normal, but I still couldn't achieve the performance described in the paper.
Args in experiment:
Namespace(activation='gelu', batch_size=32, c_out=358, channel_independence=False, checkpoints='./checkpoints/', class_strategy='projection', d_ff=512, d_layers=1, d_model=512, data='PEMS', data_path='PEMS03.npz', dec_in=358, des='Exp', devices='0,1,2,3', distil=True, do_predict=False, dropout=0.1, e_layers=4, efficient_training=False, embed='timeF', enc_in=358, exp_name='MTSF', factor=1, features='M', freq='h', gpu=0, inverse=False, is_training=1, itr=1, label_len=48, learning_rate=0.001, loss='MSE', lradj='type1', model='iTransformer', model_id='PEMS03_96_96', moving_avg=25, n_heads=8, num_workers=10, output_attention=False, partial_start_index=0, patience=3, pred_len=96, root_path='./dataset/PEMS/', seq_len=96, target='OT', target_data_path='electricity.csv', target_root_path='./data/electricity/', train_epochs=10, use_amp=False, use_gpu=True, use_multi_gpu=False, use_norm=0)
Use GPU: cuda:0
start training : PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0>>>>>>>>>>>>>>>>>>>>>>>>>>
train 15533
val 5051
test 5051
iters: 100, epoch: 1 | loss: 0.2624913
speed: 0.0681s/iter; left time: 323.3598s
iters: 200, epoch: 1 | loss: 0.2046000
speed: 0.0414s/iter; left time: 192.7160s
iters: 300, epoch: 1 | loss: 0.1952162
speed: 0.0416s/iter; left time: 189.4265s
iters: 400, epoch: 1 | loss: 0.1915123
speed: 0.0417s/iter; left time: 185.6636s
Epoch: 1 cost time: 22.924086570739746
Epoch: 1, Steps: 485 | Train Loss: 0.2236028 Vali Loss: 0.1956492 Test Loss: 0.2641603
Validation loss decreased (inf --> 0.195649). Saving model ...
Updating learning rate to 0.001
iters: 100, epoch: 2 | loss: 0.1867465
speed: 0.4899s/iter; left time: 2090.0281s
iters: 200, epoch: 2 | loss: 0.1924414
speed: 0.0424s/iter; left time: 176.4389s
iters: 300, epoch: 2 | loss: 0.1788835
speed: 0.0427s/iter; left time: 173.5791s
iters: 400, epoch: 2 | loss: 0.1569002
speed: 0.0430s/iter; left time: 170.4055s
Epoch: 2 cost time: 21.624907970428467
Epoch: 2, Steps: 485 | Train Loss: 0.1681240 Vali Loss: 0.1649887 Test Loss: 0.2362694
Validation loss decreased (0.195649 --> 0.164989). Saving model ...
Updating learning rate to 0.0005
iters: 100, epoch: 3 | loss: 0.1209449
speed: 0.4986s/iter; left time: 1885.1409s
iters: 200, epoch: 3 | loss: 0.1395759
speed: 0.0429s/iter; left time: 157.9302s
iters: 300, epoch: 3 | loss: 0.1247872
speed: 0.0433s/iter; left time: 154.9850s
iters: 400, epoch: 3 | loss: 0.1239995
speed: 0.0436s/iter; left time: 151.5990s
Epoch: 3 cost time: 21.9407217502594
Epoch: 3, Steps: 485 | Train Loss: 0.1301747 Vali Loss: 0.1335264 Test Loss: 0.2049689
Validation loss decreased (0.164989 --> 0.133526). Saving model ...
Updating learning rate to 0.00025
iters: 100, epoch: 4 | loss: 0.1209110
speed: 0.5015s/iter; left time: 1652.8661s
iters: 200, epoch: 4 | loss: 0.1202964
speed: 0.0432s/iter; left time: 138.1157s
iters: 300, epoch: 4 | loss: 0.1029550
speed: 0.0435s/iter; left time: 134.5850s
iters: 400, epoch: 4 | loss: 0.1273251
speed: 0.0438s/iter; left time: 131.2271s
Epoch: 4 cost time: 22.013235807418823
Epoch: 4, Steps: 485 | Train Loss: 0.1163763 Vali Loss: 0.1264918 Test Loss: 0.1891304
Validation loss decreased (0.133526 --> 0.126492). Saving model ...
Updating learning rate to 0.000125
iters: 100, epoch: 5 | loss: 0.1112179
speed: 0.5003s/iter; left time: 1406.2395s
iters: 200, epoch: 5 | loss: 0.1191604
speed: 0.0433s/iter; left time: 117.5131s
iters: 300, epoch: 5 | loss: 0.1151067
speed: 0.0439s/iter; left time: 114.6108s
iters: 400, epoch: 5 | loss: 0.1100017
speed: 0.0441s/iter; left time: 110.7392s
Epoch: 5 cost time: 22.304459810256958
Epoch: 5, Steps: 485 | Train Loss: 0.1107414 Vali Loss: 0.1210305 Test Loss: 0.1781913
Validation loss decreased (0.126492 --> 0.121031). Saving model ...
Updating learning rate to 6.25e-05
iters: 100, epoch: 6 | loss: 0.1029362
speed: 0.5167s/iter; left time: 1201.8610s
iters: 200, epoch: 6 | loss: 0.1482204
speed: 0.0434s/iter; left time: 96.5248s
iters: 300, epoch: 6 | loss: 0.1002703
speed: 0.0437s/iter; left time: 92.8868s
iters: 400, epoch: 6 | loss: 0.1189812
speed: 0.0440s/iter; left time: 89.1635s
Epoch: 6 cost time: 22.138785123825073
Epoch: 6, Steps: 485 | Train Loss: 0.1079675 Vali Loss: 0.1200538 Test Loss: 0.1769875
Validation loss decreased (0.121031 --> 0.120054). Saving model ...
Updating learning rate to 3.125e-05
iters: 100, epoch: 7 | loss: 0.0990048
speed: 0.5107s/iter; left time: 940.2505s
iters: 200, epoch: 7 | loss: 0.1185950
speed: 0.0435s/iter; left time: 75.7064s
iters: 300, epoch: 7 | loss: 0.0920268
speed: 0.0438s/iter; left time: 71.8683s
iters: 400, epoch: 7 | loss: 0.1048138
speed: 0.0441s/iter; left time: 67.9027s
Epoch: 7 cost time: 22.343345403671265
Epoch: 7, Steps: 485 | Train Loss: 0.1064951 Vali Loss: 0.1190044 Test Loss: 0.1757530
Validation loss decreased (0.120054 --> 0.119004). Saving model ...
Updating learning rate to 1.5625e-05
iters: 100, epoch: 8 | loss: 0.1264642
speed: 0.5120s/iter; left time: 694.2418s
iters: 200, epoch: 8 | loss: 0.1057447
speed: 0.0435s/iter; left time: 54.5842s
iters: 300, epoch: 8 | loss: 0.1032038
speed: 0.0439s/iter; left time: 50.7019s
iters: 400, epoch: 8 | loss: 0.1228641
speed: 0.0441s/iter; left time: 46.6026s
Epoch: 8 cost time: 22.21909260749817
Epoch: 8, Steps: 485 | Train Loss: 0.1056254 Vali Loss: 0.1186761 Test Loss: 0.1778394
Validation loss decreased (0.119004 --> 0.118676). Saving model ...
Updating learning rate to 7.8125e-06
iters: 100, epoch: 9 | loss: 0.1092731
speed: 0.5034s/iter; left time: 438.5034s
iters: 200, epoch: 9 | loss: 0.1012566
speed: 0.0435s/iter; left time: 33.5258s
iters: 300, epoch: 9 | loss: 0.1146228
speed: 0.0438s/iter; left time: 29.3889s
iters: 400, epoch: 9 | loss: 0.1040003
speed: 0.0456s/iter; left time: 26.0287s
Epoch: 9 cost time: 22.3890643119812
Epoch: 9, Steps: 485 | Train Loss: 0.1051820 Vali Loss: 0.1180485 Test Loss: 0.1759509
Validation loss decreased (0.118676 --> 0.118049). Saving model ...
Updating learning rate to 3.90625e-06
iters: 100, epoch: 10 | loss: 0.1146770
speed: 0.5025s/iter; left time: 193.9549s
iters: 200, epoch: 10 | loss: 0.1029931
speed: 0.0435s/iter; left time: 12.4420s
iters: 300, epoch: 10 | loss: 0.0811080
speed: 0.0441s/iter; left time: 8.1950s
iters: 400, epoch: 10 | loss: 0.0947617
speed: 0.0442s/iter; left time: 3.7979s
Epoch: 10 cost time: 22.295016527175903
Epoch: 10, Steps: 485 | Train Loss: 0.1048957 Vali Loss: 0.1181574 Test Loss: 0.1758959
EarlyStopping counter: 1 out of 3
Updating learning rate to 1.953125e-06
testing : PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
test 5051
test shape: (5051, 1, 96, 358) (5051, 1, 96, 358)
test shape: (5051, 96, 358) (5051, 96, 358)
mse:0.1759510487318039, mae:0.28603628277778625
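For readers unfamiliar with the flag being discussed: `--use_norm` toggles a per-series instance normalization of the input window, with the statistics restored on the model output. A minimal sketch of the idea (hypothetical helper names, not the repository's exact code):

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    # Normalize each series of each sample over the time axis.
    # x has shape (batch, seq_len, n_series), as in the PEMS runs above.
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True) + eps
    return (x - mean) / std, mean, std

def instance_denormalize(y, mean, std):
    # Restore the original scale on the forecast (batch, pred_len, n_series).
    return y * std + mean
```

With use_norm=0 the model instead trains directly on the raw (already dataset-scaled) values, which is why the loss scale differs between the two settings.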
from itransformer.
Have you tried anything else?
from itransformer.
I'm also experiencing this issue. On PEMS03, the {12, 24}-step prediction results are consistent with the paper. However, the predicted {48, 96}-step results are far from the authors' report. I then changed the --use_norm hyperparameter to 0, just as @Secilia-Cxy did, and the performance reported by the paper still could not be reached.
Results produced by scripts/multivariate_forecasting/PEMS/iTransformer_03.sh (--use_norm uses its default value 1 in this script):
[close to the results in the paper] PEMS03_96_12_iTransformer_PEMS_M_ft96_sl48_ll12_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:0.06867164373397827, mae:0.17403899133205414
[close to the results in the paper]
PEMS03_96_24_iTransformer_PEMS_M_ft96_sl48_ll24_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:0.09742313623428345, mae:0.20868924260139465
[worse than the results in the paper]
PEMS03_96_48_iTransformer_PEMS_M_ft96_sl48_ll48_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:0.16373442113399506, mae:0.2756078243255615
[far worse than the results in the paper]
PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:1.262604832649231, mae:0.8765576481819153
With --use_norm set to 0:
[close to the results in the paper]
PEMS03_96_48_iTransformer_PEMS_M_ft96_sl48_ll48_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:0.13765227794647217, mae:0.24710021913051605,rmse:0.37101519107818604,mape:2.035256862640381
[close to the results in the paper]
PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:0.17429415881633759, mae:0.28364065289497375
What's more, I find that results for the PEMS07 dataset can't be reproduced either when the prediction horizon is 48 or 96.
Results produced by scripts/multivariate_forecasting/PEMS/iTransformer_07.sh (--use_norm is 0 in this script):
PEMS07_96_48_iTransformer_PEMS_M_ft96_sl48_ll48_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:0.23394542932510376, mae:0.3273244798183441,rmse:0.48367905616760254,mape:3.564363479614258
PEMS07_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:1.0463123321533203, mae:0.8682820796966553,rmse:1.0228941440582275,mape:2.4747323989868164
With --use_norm changed to 1:
PEMS07_96_48_iTransformer_PEMS_M_ft96_sl48_ll48_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:1.5068509578704834, mae:1.0126152038574219,rmse:1.2275385856628418,mape:5.98640775680542
PEMS07_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:0.8029258847236633, mae:0.6813836693763733,rmse:0.8960613012313843,mape:4.556532859802246
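For reference, the rmse and mape figures quoted above follow the usual definitions; a minimal sketch, assuming `pred` and `true` are NumPy arrays on the same scale as the reported metrics. The repository's own metric code may differ in details such as how zero targets are handled in MAPE:

```python
import numpy as np

def forecast_metrics(pred, true):
    # Standard point-forecast metrics, averaged over every element.
    err = pred - true
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(mse))
    mape = float(np.mean(np.abs(err / true)))  # assumes no zeros in `true`
    return mse, mae, rmse, mape
```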
from itransformer.
[much better than the results in the paper]
PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0
mse:1.262604832649231, mae:0.8765576481819153
Bro, I think you've got that label wrong.
from itransformer.
I also ran into this issue. On PEMS03, the {12, 24}-step prediction results are consistent with the paper, but the predicted {48, 96}-step results are far from the authors' report. I then changed the --use_norm hyperparameter to 0, and the performance reported by the paper still could not be reached, which is close to the first commenter's conclusion.
Results produced by scripts/multivariate_forecasting/PEMS/iTransformer_03.sh (--use_norm uses its default value 1 in this script):
[close to the results in the paper] PEMS03_96_12_iTransformer_PEMS_M_ft96_sl48_ll12_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:0.06867164373397827, mae:0.17403899133205414
[close to the results in the paper] PEMS03_96_24_iTransformer_PEMS_M_ft96_sl48_ll24_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:0.09742313623428345, mae:0.20868924260139465
[slightly worse than the results in the paper] PEMS03_96_48_iTransformer_PEMS_M_ft96_sl48_ll48_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:0.16373442113399506, mae:0.2756078243255615
[far worse than the results in the paper] PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:1.262604832649231, mae:0.8765576481819153
With --use_norm set to 0:
[close to the results in the paper] PEMS03_96_48_iTransformer_PEMS_M_ft96_sl48_ll48_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:0.13765227794647217, mae:0.24710021913051605, rmse:0.37101519107818604, mape:2.035256862640381
[close to the results in the paper] PEMS03_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:0.17429415881633759, mae:0.28364065289497375
I also find that results on the PEMS07 dataset cannot be reproduced when the prediction horizon is 48 or 96.
Results produced by scripts/multivariate_forecasting/PEMS/iTransformer_07.sh (--use_norm is 0 in this script):
PEMS07_96_48_iTransformer_PEMS_M_ft96_sl48_ll48_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:0.23394542932510376, mae:0.3273244798183441, rmse:0.48367905616760254, mape:3.564363479614258
PEMS07_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:1.0463123321533203, mae:0.8682820796966553, rmse:1.0228941440582275, mape:2.4747323989868164
With --use_norm changed to 1:
PEMS07_96_48_iTransformer_PEMS_M_ft96_sl48_ll48_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:1.5068509578704834, mae:1.0126152038574219, rmse:1.2275385856628418, mape:5.98640775680542
PEMS07_96_96_iTransformer_PEMS_M_ft96_sl48_ll96_pl512_dm8_nh4_el1_dl512_df1_fctimeF_ebTrue_dtExp_projection_0 mse:0.8029258847236633, mae:0.6813836693763733, rmse:0.8960613012313843, mape:4.556532859802246
I don't think I got it wrong; I checked it again. However, I tried lowering the learning rate in the .sh file, and the resulting performance on PEMS07 is now close to the paper's.
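The learning-rate values printed in the training log above ('lradj=type1': 0.001, 0.0005, 0.00025, ...) halve after every epoch, so the base rate set in the .sh file dominates the whole run. A sketch inferred from those printed values (not the repository's exact code):

```python
def type1_lr(base_lr, epoch):
    # Learning rate announced after epoch `epoch` (1-indexed):
    # base_lr, base_lr/2, base_lr/4, ...
    return base_lr * 0.5 ** (epoch - 1)
```

Lowering `--learning_rate` in the script shifts this entire curve down, which is presumably why it helped on PEMS07.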
from itransformer.