
fsnet's Introduction

Learning Fast and Slow for Online Time Series Forecasting

This project contains the Pytorch implementation of the following paper from Salesforce Research Asia:

Title: Learning Fast and Slow for Online Time Series Forecasting

Authors: Quang Pham, Chenghao Liu, Doyen Sahoo, and Steven Hoi

Introduction

Learning Fast and Slow for Online Time Series Forecasting introduces FSNet to forecast time series on the fly. FSNet augments a standard deep neural network (a TCN in this repo) with the ability to adapt quickly while simultaneously handling both abruptly changing and repeating patterns in time series. In particular, FSNet improves the slowly-learned backbone by dynamically balancing fast adaptation to recent changes against retrieval of similar old knowledge. FSNet achieves this via the interaction of two complementary components: an adapter that monitors each layer's contribution to the loss, and an associative memory that supports remembering, updating, and recalling repeating events.

FSNet

Requirements

  • python == 3.7.3
  • pytorch == 1.8.0
  • matplotlib == 3.1.1
  • numpy == 1.19.4
  • pandas == 0.25.1
  • scikit_learn == 0.21.3
  • tqdm == 4.62.3
  • einops == 0.4.0

Benchmarking

1. Data preparation

We follow the same data formatting as the Informer repo (https://github.com/zhouhaoyi/Informer2020), which also hosts the raw data. Please put all raw data (csv) files in the ./data folder.

2. Run experiments

To replicate our results on the ETT, ECL, Traffic, and WTH datasets, run

chmod +x scripts/*.sh
bash scripts/run.sh

3. Arguments

Method: Our implementation supports the following training strategies:

  • ogd: OGD training
  • large: OGD training with a large backbone
  • er: experience replay
  • derpp: dark experience replay
  • nomem: FSNET without the associative memory
  • naive: FSNET without both the memory and adapter, directly trains the adaptation coefficients.
  • fsnet: the proposed FSNet framework

You can specify one of the above methods via the --method argument.

Dataset: Our implementation currently supports the following datasets: Electricity Transformer - ETT (including ETTh1, ETTh2, ETTm1, and ETTm2), ECL, Traffic, and WTH. You can specify the dataset via the --data argument.

Other arguments: Other useful arguments for experiments are:

  • --test_bsz: batch size used for testing; must be set to 1 for online learning,
  • --seq_len: length of the look-back window, set to 60 by default,
  • --pred_len: length of the forecast window, set to 1 for online learning.
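For example, a single online-learning run combining these arguments might look like the following sketch. The entry-point name (main.py) is an assumption; check scripts/run.sh for the exact command this repo uses.

```shell
# Hypothetical single-run invocation; the entry point (main.py) is an
# assumption -- see scripts/run.sh for the actual command.
python -u main.py \
  --method fsnet \
  --data ETTh2 \
  --test_bsz 1 \
  --seq_len 60 \
  --pred_len 1
```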

fsnet's People

Contributors

chenghaoliu89, phquang


fsnet's Issues

FSNet may not beat the naive model

I have run some experiments with FSNet and it works well. However, after plotting the ground truth against FSNet's predictions, I realized that the naive model, which simply shifts the ground truth one step to the left, is also a strong baseline. By shifting one step, I mean using only the latest data point as the prediction of the next one. Astonishingly, the naive model beats FSNet on the ETTh2 dataset! I don't have the code and results at hand now, but roughly speaking, the MSE of the naive model on ETTh2 is 0.40 while the MSE of FSNet is 0.466. I believe the results are easy to reproduce.
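The persistence baseline described above can be sketched in a few lines (a minimal illustration, not the repo's code; the synthetic random walk stands in for a real series):

```python
import numpy as np

def naive_forecast_mse(series):
    """MSE of the naive (persistence) baseline: y_hat[t] = y[t-1]."""
    preds = series[:-1]    # yesterday's value predicts today's
    target = series[1:]
    return float(np.mean((target - preds) ** 2))

# Synthetic random walk as a stand-in for a real dataset such as ETTh2.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=1000))
print(naive_forecast_mse(y))
```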

Another interesting observation is that the precision achieved by the backbone, TS2Vec, is far better than that achieved by FSNet. For univariate forecasting on ETTh2 with a forecasting horizon of 24, TS2Vec reaches an MSE of 0.090 while FSNet reaches 0.687 on the same dataset and horizon. The following tables are reported in the FSNet and TS2Vec papers, respectively.
[screenshots of the result tables from the FSNet and TS2Vec papers]

The comparison between TS2Vec and FSNet is not surprising, because FSNet assumes streaming data and gives up batch training. However, FSNet is also beaten by the naive model, which is a little embarrassing. The naive model is strong in the online learning setting because it adapts instantly to abrupt change points of the time series, thereby reducing the prediction error: it lags the target series by only one step, while a neural network still needs multiple gradient steps to follow an abrupt change. (I had a good figure to illustrate this, but it is not available now.) The neural network is sensitive to abrupt changes in online learning because the loss becomes huge when they happen and the network is forced to adapt to reduce the error. However, the fixed step size limits the network's adaptability, leading to a dilemma between fast learning and overreaction. (I also had a good figure for this, but ...)

Good evidence for the strong naive model, and for the dilemma, lies in how much of the input sequence is actually used for prediction. Although FSNet and its backbone, TS2Vec, claim to use multiple past time steps to predict the future value, the regressor in both networks takes in only the last intermediate representation. The statement in the TS2Vec paper is as follows:
[screenshot of the relevant passage from the TS2Vec paper]

The corresponding code in TS2Vec and FSNet is here:

https://github.com/yuezhihan/ts2vec/blob/main/tasks/forecasting.py

https://github.com/salesforce/fsnet/blob/main/exp/exp_fsnet.py

I also tried using all of the intermediate representations to predict the future values, but it turns out the precision is worse than predicting from only the last representation. This result breaks my belief that prediction from a longer sequence input should perform better. I guess the reason is that a longer input makes the fast-learning behavior harder to achieve and makes the mapping between past and future harder to learn. I guess that's also why TS2Vec uses only the last hidden representation.
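The two regression heads being compared can be sketched as follows (shapes and names are illustrative, not the actual repo code): a linear head applied only to the final hidden state, versus one applied to the flattened full sequence of representations, whose input dimension grows with the look-back length.

```python
import numpy as np

rng = np.random.default_rng(0)
B, T, D, H = 4, 60, 32, 7   # batch, look-back length, hidden dim, output dim

reprs = rng.normal(size=(B, T, D))   # encoder output for every time step

# Head 1: regress from the LAST representation only (what TS2Vec/FSNet do).
W_last = rng.normal(size=(D, H))
pred_last = reprs[:, -1, :] @ W_last          # (B, H)

# Head 2: regress from ALL representations (the variant tried above);
# the input dimension T*D grows with the look-back length.
W_all = rng.normal(size=(T * D, H))
pred_all = reprs.reshape(B, T * D) @ W_all    # (B, H)

print(pred_last.shape, pred_all.shape)
```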

The FSNet paper precisely describes the difficulties of online learning and the dilemma between fast learning and persistent memory. One-batch training is computation-efficient, but the performance could be improved further.

Questions about data mask

Why is the dimension of the mask here +7? This looks like it is specific to the ETT dataset; won't it affect the other datasets?


How to reproduce the experimental results of Informer run online?

I have experimented with FSNet, and it has proven to be effective. I am keen on replicating the results of Informer as presented in your paper. This is in pursuit of conducting further comparisons for a survey I am currently working on. Your assistance in this matter is greatly appreciated. Thank you!

There may be a memory leak in your code

I have thoroughly investigated your code's CPU memory usage. It is evident that available CPU memory decreases gradually while the program is running. Fortunately, your process does not get killed only because the datasets in your experiments are small.
However, I tested your code on a dataset with 100,000 entries, and the process was killed about one tenth of the way through the test because of the memory leak.
Long data series are not uncommon in the real world, and online learning is geared towards real-world applications. Therefore, I hope you will take this issue seriously.
Looking forward to your reply.

fsnet for univariate forecasting

First of all, congratulations on your outstanding work!

I've been attempting to execute your code, specifically with the FSNet model and the baselines, for a univariate forecasting task. To achieve this, I modified the '--features' argument to 'MS' for multivariate predicting univariate or 'S' for univariate predicting univariate. Additionally, I defined the target features.
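A univariate run along these lines might look like the following sketch. The entry point (main.py) and the --target flag and its value are assumptions based on the Informer-style argument set; only --features S/MS is confirmed above.

```shell
# Hypothetical univariate invocation; main.py and --target OT are
# assumptions -- check the repo's argument parser for the exact names.
python -u main.py \
  --method fsnet \
  --data ETTh2 \
  --features S \
  --target OT \
  --test_bsz 1 \
  --pred_len 1
```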

However, I observed that the performance is significantly subpar across all datasets. Are there any other arguments that I should consider adjusting? I had anticipated that the predictions would be more accurate, given that univariate forecasting is generally considered simpler than multivariate forecasting.

Thank you for your assistance!

Information Leakage: Infeasible online learning with horizon > 1

As we can know the ground-truth label only after $H$ steps, we CANNOT immediately update the forecast model online when $H>1$.
The current implementation actually uses future information during online learning.

Is it invalid to experiment on large horizons? I'm looking forward to your ideas.
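One leakage-free protocol is to buffer predictions and update the model only once a prediction's ground truth has actually arrived, H steps later. A minimal sketch (names and structure are illustrative, not the repo's implementation):

```python
from collections import deque

def online_loop(stream, H, predict, update):
    """Leakage-free online loop: a prediction made at time t targets the
    value at t + H, so it can be scored (and the model updated) only when
    that value arrives, H steps later."""
    pending = deque()          # (time_made, prediction) awaiting labels
    losses = []
    for t, y in enumerate(stream):
        # y is the ground truth for the prediction made at time t - H.
        if pending and pending[0][0] + H == t:
            t_made, y_hat = pending.popleft()
            losses.append((y - y_hat) ** 2)
            update(y_hat, y)   # model sees only labels that have arrived
        pending.append((t, predict()))
    return losses
```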
