
Short-Term Photovoltaic Power Forecasting Competition

Jiaxiang Li, Ruiqi Wu, Xiaosong Jin, 2023-02-06

Model Ensembling

The ensembling approaches we tried:

  1. Neural network models
  2. XGBoost models
  3. Time-series models
  4. Probabilistic-model-based ensembling
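The README does not say how the probabilistic-model-based ensembling was done. As an illustration only, one common probabilistic combination is inverse-variance weighting: assuming each model's errors are unbiased and independent, weighting forecasts by the reciprocal of their error variance gives the minimum-variance weighted average. The sketch below estimates each model's variance by its mean squared error on a held-out validation period; the function name `inverse_variance_blend` and its arguments are hypothetical.

```python
import numpy as np


def inverse_variance_blend(test_preds, val_preds, y_val, eps=1e-12):
    """Blend forecasts with weights proportional to 1 / (validation MSE).

    Hypothetical helper, not the team's code.
    test_preds: (n_models, n_test) forecasts to combine
    val_preds:  (n_models, n_val) forecasts for a held-out validation period
    y_val:      (n_val,) observed power for that validation period
    """
    test_preds = np.asarray(test_preds, dtype=float)
    residuals = np.asarray(val_preds, dtype=float) - np.asarray(y_val, dtype=float)
    # Mean squared validation error stands in for each model's error variance.
    mse = np.mean(residuals ** 2, axis=1)
    precision = 1.0 / np.maximum(mse, eps)   # guard against a perfect fit
    weights = precision / precision.sum()    # weights sum to 1
    return weights, weights @ test_preds


# Made-up example: model 0 has a quarter of model 1's validation MSE.
weights, blended = inverse_variance_blend(
    test_preds=[[10.0, 20.0], [20.0, 40.0]],
    val_preds=[[1.1, 1.9, 3.1], [1.2, 1.8, 3.2]],
    y_val=[1.0, 2.0, 3.0],
)
print(weights, blended)  # weights ≈ [0.8, 0.2], blended ≈ [12.0, 24.0]
```

If model errors are strongly correlated (likely when models share features), these weights are no longer optimal, and stacking on out-of-fold predictions is a common alternative.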

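Conclusion

Our main approach in this competition was a neural network model, and we finished 52nd. Our feature engineering covered time-related variables, squared and cubic terms, ratios, rolling SMAs, rolling variances, PCA principal components, test-set predictions of the measured irradiance, NMF-derived variables, and Prophet-based features. For ensembling we tried neural network models, XGBoost models, time-series models, and probabilistic-model-based ensembling.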

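This project is our final write-up for the Short-Term Photovoltaic Power Forecasting Competition organized by State Power Rixin (国能日新). Our team, PHotoVoltaic (phv), finished 52nd.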

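During the competition we experimented with a range of feature-engineering and ensembling techniques to improve model performance. On the feature side we added time-related variables, squared terms, cubic terms, ratios, rolling SMAs, rolling variances, PCA principal components, test-set predictions of the measured irradiance, NMF-derived variables, and Prophet features. On the ensembling side we tried neural network models, XGBoost models, time-series models, and probabilistic-model-based ensembling.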
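A minimal pandas sketch of several of the listed features: time variables, squared and cubic terms, a ratio, a rolling SMA and a rolling variance. It assumes a DataFrame with a DatetimeIndex and a hypothetical `irradiance` column; the window length and the choice of ratio (value divided by its own rolling mean) are illustrative assumptions, not the team's actual settings. PCA, NMF and Prophet features are left out.

```python
import numpy as np
import pandas as pd


def add_features(df, col="irradiance", window=4):
    """Return a copy of df (DatetimeIndex) with illustrative engineered features.

    `col` and `window` are hypothetical defaults, not the competition's settings.
    """
    out = df.copy()
    # Time-related variables taken from the timestamp index.
    out["hour"] = out.index.hour
    out["dayofyear"] = out.index.dayofyear
    # Squared and cubic terms of the raw input.
    out[f"{col}_sq"] = out[col] ** 2
    out[f"{col}_cube"] = out[col] ** 3
    # Rolling SMA and rolling sample variance over the last `window` rows.
    roll = out[col].rolling(window=window, min_periods=1)
    out[f"{col}_sma{window}"] = roll.mean()
    out[f"{col}_var{window}"] = roll.var()
    # One possible ratio: current value relative to its rolling mean
    # (NaN where the mean is 0, e.g. at night).
    out[f"{col}_ratio{window}"] = out[col] / out[f"{col}_sma{window}"].replace(0, np.nan)
    return out


idx = pd.date_range("2023-01-01", periods=6, freq="15min")
demo = pd.DataFrame({"irradiance": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)
feats = add_features(demo, window=2)
print(feats)
```

Because the rolling window only looks backwards, these features do not leak future values; centred windows would.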

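Our implementation is primarily a neural network model; see the Python notebook wushen.ipynb. The XGBoost ensemble is in the R Markdown file note.Rmd. We also used trelliscope for EDA: it makes interactive exploration easy, but it is not suited to online deployment and is hard to share with others.

In the end our model performed reasonably well and placed 52nd.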

EDA

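We used trelliscope, which makes interactive exploration easy but is not suited to online deployment and is hard to share. The displays: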

  1. trelliscope/p
  2. trelliscope/tsi
  3. trelliscope/tsi_real

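Future Work

Deep Learning Approaches

  1. Dilated convolutions could be used (A. van den Oord et al. 2016a; A. van den Oord et al. 2016b; Sprangers, Schelter, and Rijke 2022; Kechyn et al. 2018). The technique has also been applied to other problems, such as audio spectra and long time series.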
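A minimal NumPy sketch of the core operation: a causal 1-D convolution whose taps are spaced `dilation` steps apart, so the output at time t only sees x[t], x[t-d], x[t-2d], and so on. With kernel size 2, stacking layers with dilations 1, 2, 4, ... doubles the receptive field per layer, which is how WaveNet-style models cover long histories cheaply. This shows only the linear convolution with fixed, hypothetical kernel weights; a real model adds learned weights, gated nonlinearities and residual connections.

```python
import numpy as np


def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x as zero before t = 0."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left padding keeps the filter causal
    y = np.zeros_like(x)
    for k, w in enumerate(kernel):
        start = pad - k * dilation            # xp[start + t] == x[t - k * dilation]
        y += w * xp[start:start + len(x)]
    return y


def receptive_field(kernel_size, dilations):
    """Number of time steps (including t) a stack of dilated causal layers can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Push an impulse through three layers with kernel size 2 and dilations 1, 2, 4.
h = np.zeros(10)
h[0] = 1.0
for d in (1, 2, 4):
    h = dilated_causal_conv1d(h, [1.0, 1.0], d)
print(h)                              # nonzero for t = 0..7 only
print(receptive_field(2, (1, 2, 4)))  # 8
```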

XGBoost

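  1. Because the organizers changed the dataset and the evaluation function during the competition, we could not reproduce our earlier historical predictions, so we never ensembled the neural network with XGBoost. This is something to plan for in the next competition.
  2. We could adopt a more principled window-based feature extraction (Elsayed et al. 2021) and consider a multi-task framework such as MT-GBM (Ying et al. 2022) to improve performance.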
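Elsayed et al. (2021) report that gradient-boosted trees fed with flattened sliding windows of past values, plus covariates, can be competitive with deep models. A minimal NumPy sketch of that windowing, assuming a single target series and optional exogenous covariates that are known at the forecast step (for example, numerical weather forecasts); the function name and arguments are hypothetical.

```python
import numpy as np


def make_windows(series, window, horizon=1, exog=None):
    """Turn a 1-D series into a supervised (X, y) table of sliding windows.

    Row i holds series[i : i + window]; its target is series[i + window + horizon - 1].
    If exog (n_steps, n_features) is given, the covariates at the target step are
    appended, assuming they are known in advance (e.g. a weather forecast).
    """
    series = np.asarray(series, dtype=float)
    first_target = window + horizon - 1
    n_rows = len(series) - first_target
    if n_rows <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[i:i + window] for i in range(n_rows)])
    y = series[first_target:first_target + n_rows]
    if exog is not None:
        exog = np.asarray(exog, dtype=float).reshape(len(series), -1)
        X = np.hstack([X, exog[first_target:first_target + n_rows]])
    return X, y


series = np.arange(6.0)
X, y = make_windows(series, window=3)                                # 3 rows, 1-step ahead
X2, y2 = make_windows(series, window=3, horizon=2, exog=series * 10)  # 2 rows + covariate
print(X, y)
print(X2, y2)
```

The resulting `X` and `y` can be passed to any tabular regressor, for instance `xgboost.XGBRegressor().fit(X, y)`.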

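EDA and Feature Engineering

  1. EDA needs to be done thoroughly: look at how the target variable fluctuates over time and check for outliers.
  2. In feature engineering, instead of adding squared and cubic terms by hand to capture nonlinear relationships, we could use Ramsey's RESET test to check more efficiently whether nonlinear terms are needed; see GitHub.
  3. We could also draw on the prediction-shift problem: it suggests the model may be underfitting, which could be addressed with the methods in the model-correction section.
  4. Since there are four PV panels, each producing a time series, an LSTM could be trained here; see Section 6, Neural Network Applications.
  5. Since we used PCA to build clustering features, we should also consider DTW (Salvador and Chan 2007; Izakian, Pedrycz, and Jamal 2015) and TS-PCA (Chang, Guo, and Yao 2018).
  6. Since we used Prophet, we should also try its neural-network counterpart, NeuralProphet (Triebe et al. 2021).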
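For item 2 of the EDA and feature-engineering list, Ramsey's RESET test fits the linear model, adds powers of its fitted values (ŷ², ŷ³) as extra regressors, and F-tests whether they jointly improve the fit; a large F statistic signals neglected nonlinearity. A minimal NumPy sketch on synthetic data; it returns the F statistic and its degrees of freedom rather than a p-value, which could be obtained from an F distribution (e.g. `scipy.stats.f.sf`).

```python
import numpy as np


def reset_test(X, y, max_power=3):
    """Ramsey's RESET: F-test on powers 2..max_power of the OLS fitted values.

    Returns (F statistic, numerator df, denominator df).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Restricted model: intercept + original regressors.
    Xr = np.column_stack([np.ones(n), X])
    beta_r, *_ = np.linalg.lstsq(Xr, y, rcond=None)
    fitted = Xr @ beta_r
    rss_r = np.sum((y - fitted) ** 2)
    # Unrestricted model: also include yhat^2, ..., yhat^max_power.
    powers = [fitted ** p for p in range(2, max_power + 1)]
    Xu = np.column_stack([Xr] + powers)
    beta_u, *_ = np.linalg.lstsq(Xu, y, rcond=None)
    rss_u = np.sum((y - Xu @ beta_u) ** 2)
    q = len(powers)
    df_denom = n - Xu.shape[1]
    F = ((rss_r - rss_u) / q) / (rss_u / df_denom)
    return F, q, df_denom


# Synthetic check: a linear relation versus a quadratic one.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 200)
y_lin = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(200)
y_quad = x ** 2 + 0.1 * rng.standard_normal(200)
F_lin, q, df_denom = reset_test(x, y_lin)
F_quad, _, _ = reset_test(x, y_quad)
print(F_lin, F_quad, q, df_denom)  # F_quad is far larger than F_lin
```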
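For item 4 of the EDA and feature-engineering list, the four panel series can be treated as a 4-dimensional input at each time step. The NumPy sketch below runs a forward pass only, with random untrained weights, to show the data layout (windows of shape samples × time × 4) and the standard LSTM gate equations. The window length, hidden size and synthetic data are hypothetical; in practice training would use a framework layer such as PyTorch's `nn.LSTM` or Keras's `LSTM`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(x, W, U, b, hidden):
    """Run one LSTM layer over x (batch, time, n_in); return the last hidden state.

    Gates are stacked as [input, forget, candidate, output]:
    W: (n_in, 4*hidden), U: (hidden, 4*hidden), b: (4*hidden,)
    """
    batch, steps, _ = x.shape
    h = np.zeros((batch, hidden))
    c = np.zeros((batch, hidden))
    for t in range(steps):
        z = x[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, :hidden])                # input gate
        f = sigmoid(z[:, hidden:2 * hidden])      # forget gate
        g = np.tanh(z[:, 2 * hidden:3 * hidden])  # candidate cell state
        o = sigmoid(z[:, 3 * hidden:])            # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def panel_windows(panels, window):
    """panels: (T, 4), one column per PV panel -> (T - window + 1, window, 4)."""
    panels = np.asarray(panels, dtype=float)
    return np.stack([panels[s:s + window] for s in range(len(panels) - window + 1)])


rng = np.random.default_rng(0)
panels = rng.random((96, 4))          # hypothetical data: 96 steps x 4 panels
x = panel_windows(panels, window=16)  # (81, 16, 4)
hidden = 8
W = rng.normal(scale=0.1, size=(4, 4 * hidden))
U = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
b = np.zeros(4 * hidden)
h_last = lstm_forward(x, W, U, b, hidden)  # (81, 8); a dense head would map this to power
print(x.shape, h_last.shape)
```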
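For item 5 of the EDA and feature-engineering list, dynamic time warping (DTW) measures the distance between two series after optimally aligning them in time, so similarly shaped curves that are shifted or stretched still come out close. Below is the exact O(nm) dynamic-programming version in NumPy; FastDTW (Salvador and Chan 2007) approximates it in linear time and is not what this sketch implements. A matrix of pairwise DTW distances could then feed a clustering method, as in the fuzzy clustering of Izakian, Pedrycz, and Jamal (2015).

```python
import numpy as np


def dtw_distance(a, b):
    """Exact dynamic time warping distance with absolute-difference local cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # D[i, j] = cheapest cost of aligning a[:i] with b[:j].
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Step from a match, an insertion, or a deletion.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]


print(dtw_distance([0, 1, 2, 3], [0, 0, 1, 2, 3]))  # 0.0: same shape, one step stretched
print(dtw_distance([0, 1, 2], [1, 2, 3]))           # 2.0, versus a pointwise L1 of 3.0
```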

**Code of Conduct**

Please note that the ‘phv’ project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms.

**License**

MIT © [Jiaxiang Li, Ruiqi Wu, Xiaosong Jin](LICENSE.md)

References

Chang, Jinyuan, Bin Guo, and Qiwei Yao. 2018. “Principal Component Analysis for Second-Order Stationary Vector Time Series.” The Annals of Statistics 46 (5). https://doi.org/10.1214/17-aos1613.

Elsayed, Shereen, Daniela Thyssens, Ahmed Rashed, Hadi Samer Jomaa, and Lars Schmidt-Thieme. 2021. “Do We Really Need Deep Learning Models for Time Series Forecasting?” arXiv Preprint arXiv:2101.02118.

Izakian, Hesam, Witold Pedrycz, and Iqbal Jamal. 2015. “Fuzzy Clustering of Time Series Data Using Dynamic Time Warping Distance.” Engineering Applications of Artificial Intelligence 39: 235–44.

Kechyn, Glib, Lucius Yu, Yangguang Zang, and Svyatoslav Kechyn. 2018. “Sales Forecasting Using WaveNet Within the Framework of the Kaggle Competition.” arXiv: Learning.

Oord, Aaron van den, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016a. “Wavenet: A Generative Model for Raw Audio.” arXiv Preprint arXiv:1609.03499.

Oord, Aaron van den, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. 2016b. “Conditional Image Generation with PixelCNN Decoders.” Neural Information Processing Systems.

Salvador, Stan, and Philip Chan. 2007. “Toward Accurate Dynamic Time Warping in Linear Time and Space.” Intelligent Data Analysis 11 (5): 561–80.

Sprangers, Olivier, Sebastian Schelter, and Maarten de Rijke. 2022. “Parameter-Efficient Deep Probabilistic Forecasting.” International Journal of Forecasting.

Triebe, Oskar, Hansika Hewamalage, Polina Pilyugina, Nikolay Laptev, Christoph Bergmeir, and Ram Rajagopal. 2021. “NeuralProphet: Explainable Forecasting at Scale.” arXiv Preprint arXiv:2111.15397.

Ying, ZhenZhe, Zhuoer Xu, Weiqiang Wang, and Changhua Meng. 2022. “MT-GBM: A Multi-Task Gradient Boosting Machine with Shared Decision Trees.” arXiv Preprint arXiv:2201.06239.
