
PatchMixer

Introduction

This is the official implementation of PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting.

Model Overview

(Figure: overall architecture of PatchMixer.)

PatchMixer is primarily composed of two convolutional layers and two forecasting heads. Its distinguishing feature is the "patch-mixing" design: the model first segments the input time series into smaller temporal patches and then integrates information both within and between these patches.
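A minimal PyTorch sketch of the patch-mixing idea is below. It is illustrative only: the layer sizes and the exact mixing operations are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PatchMixingBlock(nn.Module):
    """Toy patch-mixing block: embed each patch, then mix information
    across patches (depthwise conv) and within each patch embedding
    (pointwise conv). All hyperparameters here are illustrative."""

    def __init__(self, seq_len=336, patch_len=16, stride=16, d_model=64):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        self.embed = nn.Linear(patch_len, d_model)  # per-patch embedding
        # depthwise conv: each channel mixes across neighbouring patches
        self.inter_patch = nn.Conv1d(d_model, d_model, kernel_size=3,
                                     padding=1, groups=d_model)
        # pointwise conv: channels within one patch embedding mix together
        self.intra_patch = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):                                    # x: [B, seq_len]
        patches = x.unfold(-1, self.patch_len, self.stride)  # [B, n_patches, patch_len]
        z = self.embed(patches).transpose(1, 2)              # [B, d_model, n_patches]
        z = z + self.act(self.inter_patch(z))                # mix between patches
        z = z + self.act(self.intra_patch(z))                # mix within patches
        return z

print(PatchMixingBlock()(torch.randn(8, 336)).shape)  # torch.Size([8, 64, 21])
```

The real model's layer shapes differ; this only illustrates the within-patch and between-patch information flow.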

Getting Started

  1. Install requirements.
pip install -r requirements.txt
  2. Download data. You can download all the datasets from Autoformer. Create a separate folder ./dataset and put all the csv files in that directory.

  3. Training. All the scripts are in the directory ./scripts/PatchMixer. For example, to obtain the multivariate forecasting results for the Weather dataset, just run the following command. Once training is done, you can open ./result.txt to see the results; the log file is in ./logs/LongForecasting/*.log:

sh ./scripts/PatchMixer/weather.sh

You can also add --use_multi_gpu for multi-GPU training. The hyperparameters can be adjusted to your needs (e.g., different patch lengths, sequence lengths, and prediction lengths); a sample invocation is sketched below. We also provide code for the baseline models.
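For example, a direct invocation with custom hyperparameters might look like the following. The entry-point name and most flag names are assumptions based on the PatchTST/Autoformer-style codebases this repo builds on (only --use_multi_gpu is confirmed above); check the shell scripts for the exact interface.

python run_longExp.py --model PatchMixer --data weather --seq_len 336 --pred_len 96 --patch_len 16 --use_multi_gpu --devices 0,1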

Results

๐Ÿ† Achieve state-of-the-art in Long-Term Time series Forecasting

Quantitatively, PatchMixer achieves an overall relative reduction of $\mathbf{3.9\%}$ in MSE and $\mathbf{3.0\%}$ in MAE compared with the state-of-the-art Transformer (PatchTST). Against the best-performing MLP-based model (DLinear), it achieves an overall reduction of $\mathbf{11.6\%}$ in MSE and $\mathbf{9.4\%}$ in MAE. Compared with the best CNN-based model (TimesNet), the overall relative reduction is $\mathbf{21.2\%}$ in MSE and $\mathbf{12.5\%}$ in MAE.
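For reference, each percentage above is the standard relative reduction, aggregated over the benchmark datasets and prediction horizons:

$\text{relative reduction} = \frac{\text{metric}_{\text{baseline}} - \text{metric}_{\text{PatchMixer}}}{\text{metric}_{\text{baseline}}} \times 100\%$

For illustration (with made-up numbers), an aggregate MSE falling from 0.400 to 0.315 corresponds to $(0.400 - 0.315)/0.400 \approx 21.2\%$.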

(Figure: main long-term forecasting results.)

🌟 Training and Inference Efficiency

Our results highlight two key improvements. First, PatchMixer achieves 3x faster inference and 2x faster training than PatchTST. Second, PatchTST's runtime is highly sensitive to the length of the look-back window, particularly once it reaches or exceeds 1440, whereas PatchMixer's inference and training times fluctuate far less as the history length grows, giving it both higher accuracy and better computational efficiency.
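How such timings are typically collected is sketched below: a generic GPU timing harness, not the authors' benchmarking code.

```python
import time
import torch

def time_inference(model, batch, n_warmup=10, n_runs=100):
    """Median per-batch inference latency in seconds (generic sketch)."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):              # warm up kernels and caches
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()           # flush pending GPU work
        times = []
        for _ in range(n_runs):
            start = time.perf_counter()
            model(batch)
            if torch.cuda.is_available():
                torch.cuda.synchronize()       # CUDA launches are asynchronous
            times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]
```

Synchronizing after each forward pass matters: without it, the timer would mostly measure kernel-launch overhead rather than the actual GPU work.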

(Figure: inference and training time versus look-back length.)

🌟 Efficiency on Long Look-back Windows

In principle, a large receptive field is beneficial for performance, and in time series analysis the look-back window determines that receptive field. Generally speaking, a powerful LTSF model with strong temporal-relation extraction should achieve better results on longer input histories. Recent baselines such as PatchTST, DLinear, and our PatchMixer consistently reduce their MSE scores as the look-back window grows, which confirms our model's capability to learn from longer histories.

(Figure: MSE versus look-back window length.)

Acknowledgement

We are grateful to the following GitHub repositories for their valuable code bases and datasets:

https://github.com/yuqinie98/PatchTST

https://github.com/wanghq21/MICN

https://github.com/thuml/TimesNet

https://github.com/cure-lab/LTSF-Linear

https://github.com/zhouhaoyi/Informer2020

https://github.com/thuml/Autoformer

https://github.com/MAZiqing/FEDformer

https://github.com/ts-kim/RevIN

Contact

If you have any questions or concerns, please submit an issue on GitHub. For matters that require more direct communication, you can also email us at [email protected]. However, we kindly encourage the use of issues for better transparency and tracking.

Citation

If you find this repository useful in your research, please consider citing our paper:

@inproceedings{Gong2023PatchMixerAP,
  title={PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting},
  author={Zeying Gong and Yujin Tang and Junwei Liang},
  year={2023},
  url={https://api.semanticscholar.org/CorpusID:263334059}
}


Issues

Environment versions

Please provide the exact versions of Python and of the Python packages you used.
Thank you

Code

Is it possible to release the code on this platform?

Dual Forecasting Heads

May I ask how the outputs of the dual forecasting heads are combined? Is the final prediction obtained by weighting each head's output differently?

Metrics (MSE and MAE) calculation

Hi, first of all, very good work on your paper. I have read it several times, but I could not find whether the MSE and MAE metrics were computed over all the training data or over held-out windows treated as a test set. In the paper yours builds on (A Time Series is Worth 64 Words: Long-Term Forecasting with Transformers), the metrics are an average of forecasts over all input series (features), but were those forecasts made over all the data or only the last window, or did you simply take the final MSE and MAE loss values from training? Could you please clarify?

Thank you,

Regards.
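(For context, in the PatchTST/Autoformer family of codebases that this repository builds on, MSE and MAE are usually computed over every sliding window of the held-out test split and averaged across windows, horizon steps, and channels. A sketch of that convention, which is an assumption about this repo rather than a confirmed answer:)

```python
import numpy as np

# Assumed evaluation convention (PatchTST/Autoformer style, not confirmed here):
# preds/trues collect the forecasts for *every* test window, not just the last.
preds = np.random.rand(2785, 96, 21)   # [num_test_windows, pred_len, channels]
trues = np.random.rand(2785, 96, 21)

mse = np.mean((preds - trues) ** 2)    # average over windows, steps, channels
mae = np.mean(np.abs(preds - trues))
print(f"MSE={mse:.4f}  MAE={mae:.4f}")
```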

Question

Hi, when will the complete code be shared?

Questions about error metrics

Hello, while reading the code I noticed that the data are scaled before being fed to the model, but the outputs are never inverse-transformed, which makes the reported MAE and MSE very small. When I computed MAPE it came out at around 10; MAPE is not mentioned in the original paper. Is it just this standardization that makes non-percentage errors like MAE look small? I look forward to your reply.
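(A toy illustration of the effect described above: errors computed in standardized space are smaller than errors on the original scale by roughly a factor of the training standard deviation.)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(100.0, 20.0, size=(1000, 1))
scaler = StandardScaler().fit(train)          # fit on training data only

trues = rng.normal(100.0, 20.0, size=(200, 1))
preds = trues + rng.normal(0.0, 2.0, size=trues.shape)  # small forecast error

t_std, p_std = scaler.transform(trues), scaler.transform(preds)
print("MAE, standardized:  ", np.mean(np.abs(p_std - t_std)))   # ~0.08
print("MAE, original scale:", np.mean(np.abs(preds - trues)))   # ~1.6 (20x larger)
```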

Bug issue: drop_last=True for the test dataset leads to inaccurate test results

Hi, thanks to the authors for their valuable work; the results are impressive. However, I think a potential bug may have led to incorrect reported performance. As https://github.com/vewoxic/fits has shown, using drop_last=True during the testing phase can lead to incorrect results. In this work the batch size was set to 1024, and such a large batch size may have amplified the impact of the bug. I hope the authors will fix the bug and correct the experimental results.
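(For reference, a small demonstration of the effect: with drop_last=True the final incomplete batch is silently discarded, so those test windows never contribute to the metrics. Setting drop_last=False in the test DataLoader scores every window.)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 2500 test windows, batch_size=1024: drop_last=True would discard the final
# 452 windows; drop_last=False keeps all of them.
test_set = TensorDataset(torch.randn(2500, 336, 21))
loader = DataLoader(test_set, batch_size=1024, shuffle=False, drop_last=False)
print(sum(b[0].shape[0] for b in loader))  # 2500 (2048 with drop_last=True)
```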

thop

How do I install thop? I simply cannot get it to install.
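(For reference, thop is published on PyPI and is normally installed with:)

pip install thop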
