Wind Power Forecasting - Application tool

This repository contains the source code of my Master's final project in Decision Systems Engineering at Rey Juan Carlos University, titled Wind Power Forecasting using Machine Learning techniques. It is based on the Data Science challenge posed by the Compagnie nationale du Rhône.

For further information, you can read the master's thesis here.

Introduction

This application is a flexible, configurable tool for building and analyzing models for this forecasting problem. It is built on Kedro, which applies software engineering best practices to data and machine-learning pipelines. MLflow tracking is used to record and query experiments (code, data, config, and results).

Installation

The packages to re-create the necessary conda environment are listed in ./requirements.txt.

Implemented pipelines

The main pipelines implemented are:

  1. Prepare data for EDA (eda). Transforms raw data into a proper format for Exploratory Data Analysis.
  2. Data engineering (de). Gets the data ready to be consumed by Machine Learning algorithms.
  3. Feature engineering (fe). Lets you explore and add new features to the data sets.
  4. Modeling (mdl). Trains the selected algorithm from among the following: MARS, KNN, RF, SVM. It also optimizes model hyperparameters and makes predictions on the test set.

There are two additional pipelines:

  1. CNR pipeline. It contains several subpipelines that produce the predictions and the submission file for the CNR Data Science Challenge.
  2. Neural Networks. In progress ...

Configuration files

Every pipeline has its own configuration, consisting of a parameters.yml file and a catalog.yml file. The first contains all the parameters required for the pipeline run. The second is the project-shareable Data Catalog: a registry of all data sources available to the project, which also manages loading and saving of data. Both configuration files are located in conf/base.

CLI commands

Because this is a Kedro application, pipelines are run through the Kedro CLI; the Kedro documentation describes the other available options. The following basic commands run the main pipelines of this project, choosing the wind farm (wf) and the algorithm (alg) used to build the model:

  1. Prepare data for EDA: kedro run --pipeline eda --params wf:WF1
  2. Data engineering: kedro run --pipeline de --params wf:WF1
  3. Feature engineering: kedro run --pipeline fe --params wf:WF1,max_k_bests:3
  4. Modeling: kedro run --pipeline mdl --params wf:WF1,alg:KNN
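The four commands above can also be assembled from a script. The following is a minimal sketch, not part of the repository: the function name build_commands and its defaults are hypothetical, and it only builds the argument lists in the key:value form shown in this README, without running anything.

```python
import shlex

# Pipeline names as listed in this README, in the order they appear.
PIPELINES = ["eda", "de", "fe", "mdl"]


def build_commands(wf, alg, max_k_bests=3):
    """Return one `kedro run` argument list per pipeline (hypothetical helper)."""
    # Extra parameters per pipeline, mirroring the README examples.
    extra = {
        "fe": {"max_k_bests": max_k_bests},
        "mdl": {"alg": alg},
    }
    commands = []
    for name in PIPELINES:
        params = {"wf": wf, **extra.get(name, {})}
        # The README passes parameters as comma-separated key:value pairs.
        params_arg = ",".join(f"{key}:{value}" for key, value in params.items())
        commands.append(["kedro", "run", "--pipeline", name, "--params", params_arg])
    return commands


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in build_commands("WF1", "KNN"):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from the project root to execute the pipelines one after another.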

You can override any parameter value defined in the parameter configuration files. You can also override the data set used as the first input, provided it is defined in one of the existing data catalogs.
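The override behaviour can be pictured as command-line values taking precedence over the defaults in the configuration files. The sketch below illustrates that idea only; it is not Kedro's own parser. The function names and the default values are hypothetical, and all values are kept as strings for simplicity.

```python
def parse_params(text):
    """Parse 'key:value,key:value' into a dict of strings (illustrative only)."""
    params = {}
    for item in text.split(","):
        if not item.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"expected key:value, got {item!r}")
        params[key.strip()] = value.strip()
    return params


def apply_overrides(defaults, text):
    """Return a copy of defaults with command-line values taking precedence."""
    merged = dict(defaults)
    merged.update(parse_params(text))
    return merged


if __name__ == "__main__":
    # Hypothetical defaults standing in for the contents of parameters.yml.
    defaults = {"wf": "WF1", "alg": "RF", "max_k_bests": "5"}
    print(apply_overrides(defaults, "wf:WF2,alg:KNN"))
```

Parameters that are not mentioned on the command line keep their configured values, which is why only the settings that change need to be passed.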

Important: It's necessary to put raw data in data/01_raw/. Raw data is available here (free registration for the challenge is required).
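A quick pre-flight check can confirm that the raw data is in place before any pipeline runs. This is a small sketch under stated assumptions: the helper name raw_data_files is hypothetical, and because the expected file names are not listed here, it only reports whatever non-hidden files exist in data/01_raw/.

```python
from pathlib import Path


def raw_data_files(project_root="."):
    """List non-hidden files found in data/01_raw/ under the project root."""
    raw_dir = Path(project_root) / "data" / "01_raw"
    if not raw_dir.is_dir():
        return []
    # Skip hidden placeholder files such as .gitkeep.
    return sorted(
        p.name for p in raw_dir.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


if __name__ == "__main__":
    files = raw_data_files()
    if files:
        print("Raw data files found:", ", ".join(files))
    else:
        print("No raw data found in data/01_raw/")
```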

Pipeline visualization

With the kedro-viz plugin installed, running kedro viz displays the data and machine-learning pipelines. For instance, this is the visualization of the data engineering pipeline:

Other useful commands

  • MLflow tracking UI: kedro mlflow ui. It serves the tracking UI as a web application on localhost (port 5000 by default).
  • Jupyter notebook: kedro jupyter notebook. It launches Jupyter Notebook with all the Kedro context variables loaded, so you can easily access pipelines, data catalogs, parameters, and other useful objects from your notebook.

To use the MLflow UI command, you need to install the kedro-mlflow plugin.

License: CC BY 4.0
