anomaly_detection_in_wind_turbines's Introduction

Automated anomaly detection in wind turbine power curve data.

Identifying anomalies in wind turbine performance using wind turbine operational data produced during a single year.

This repository contains the final 4-week capstone project on Anomaly detection in Time Series Data of Wind Turbines.

The project was done in collaboration with Fluence Nispera.

Purpose

Fluence, a global leader in energy storage and digital applications for renewable energy, offers a wind power plant monitoring platform. This platform helps wind power plant operators understand plant performance and identify opportunities for improvement. The ability to automatically detect deviations from normal turbine operation is one of the most sought-after features of such a platform. But understanding what “normal” operation is proves especially challenging, as not all wind turbines, nor all wind power plant locations, are created equal. What is normal operation for one location or turbine might be less-than-optimal operation for another. Our team’s challenge was to identify anomalies in wind turbine performance using wind turbine operational data produced during a single year.

Anomaly Cloud Clustering with Isolation Forest and KMeans

Fluence was interested in machine learning based approaches, especially those which require no human intervention. As a first step, we applied several machine learning models designed for anomaly detection from the PyOD and scikit-learn libraries, including Local Outlier Factor (LOF), Clustering-Based Local Outlier Factor (CBLOF), Copula-Based Outlier Detection (COPOD), Gaussian Mixture Model (GMM) and Isolation Forest.
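As a rough illustration of this first step, the sketch below runs two of the listed detectors, Isolation Forest and LOF, from scikit-learn on a synthetic power curve. The project code is not public, so the data, the S-shaped curve, the injected curtailment, the 5% contamination value and the neighbour count are all assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a power curve: wind speed (m/s) vs. power (kW).
wind_speed = rng.uniform(3.0, 20.0, size=2000)
rated_power = 2000.0  # assumed rating, for illustration only
power = rated_power / (1.0 + np.exp(-(wind_speed - 9.0)))  # idealised S-shaped curve
power += rng.normal(0.0, 30.0, size=wind_speed.size)

# Inject a hypothetical curtailment: 100 high-wind points held near 40% of rated power.
curtailed = rng.choice(np.flatnonzero(wind_speed > 12.0), size=100, replace=False)
power[curtailed] = 0.4 * rated_power + rng.normal(0.0, 30.0, size=100)

# Put both features on a comparable scale (matters for distance-based LOF).
X_scaled = StandardScaler().fit_transform(np.column_stack([wind_speed, power]))

contamination = 0.05  # assumed share of anomalous points
iso_labels = IsolationForest(
    contamination=contamination, random_state=0
).fit_predict(X_scaled)
lof_labels = LocalOutlierFactor(
    n_neighbors=35, contamination=contamination
).fit_predict(X_scaled)

# Both detectors return +1 for inliers and -1 for outliers.
print("Isolation Forest anomalies:", int((iso_labels == -1).sum()))
print("LOF anomalies:", int((lof_labels == -1).sum()))
```

Plotting wind speed against power and colouring the points by each label array gives the kind of side-by-side view used for visual inspection.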

While all models were able to identify anomalies, visual inspection revealed that Isolation Forest was best at correctly identifying anomalous data points. LOF provided comparable yet inferior results. CBLOF, COPOD, and GMM struggled to differentiate between anomalous and normal data.

Select Comparison of ML Model Outputs

Although Isolation Forest gave the most promising results of the models tested, its performance varied widely depending on the data set. In many cases Isolation Forest failed to deliver the same quality of results as statistics-based approaches, labelling regions of “normal” data as anomalous.

Additional Example of Isolation Forest Output

Because of this shortcoming, the use of Isolation Forest alone did not seem sufficient. However, our team came up with an idea which might reduce the impact of the mislabelled data while also giving us the opportunity to change the resolution of the anomaly labels from individual data points to larger time periods, so that an entire time period is labelled as anomalous.

After Isolation Forest had labelled the entire data set (one year of data), the data set was split into weeks. Any data which Isolation Forest labelled as ‘normal’ was removed from each week’s data, leaving only a ‘cloud’ of anomalous data. Each cloud was then fed to a KMeans model, which assigned it to one of two groups.
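The labelling and weekly split can be sketched with pandas and scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's code: the hourly synthetic data, the column names `wind_speed` and `power`, and the 5% contamination value are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Hypothetical year of hourly measurements; column names are illustrative.
index = pd.date_range("2021-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"wind_speed": rng.uniform(3.0, 20.0, size=index.size)}, index=index)
df["power"] = 2000.0 / (1.0 + np.exp(-(df["wind_speed"] - 9.0)))
df["power"] += rng.normal(0.0, 30.0, size=index.size)

# Label the whole year in one pass: -1 = anomalous, +1 = normal.
model = IsolationForest(contamination=0.05, random_state=0)
df["label"] = model.fit_predict(df[["wind_speed", "power"]])

# Remove the points labelled normal, then group what remains by calendar week.
anomalies = df[df["label"] == -1]
clouds = {
    week: cloud
    for week, cloud in anomalies.groupby(anomalies.index.to_period("W"))
}

# Each value in `clouds` is one week's anomaly cloud.
for week, cloud in list(clouds.items())[:3]:
    print(week, len(cloud))
```

Weeks in which no point was labelled anomalous simply do not appear as keys in `clouds`.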

Anomaly Cloud

Regardless of their inherent normality, each week of data contained some volume of anomalies identified by the Isolation Forest model. By using KMeans to cluster the weeks into two groups, weeks which were ‘more normal’ would group together while weeks which were ‘less normal’ would group together as well. The results of this approach were mixed, but did show promise in specific situations.
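One way to cluster weeks rather than individual points is to summarise each week's anomaly cloud as a small feature vector and pass those vectors to KMeans. The write-up does not say how the clouds were represented, so the summary features below (anomaly count, mean wind speed, mean power) are an assumption, as are the synthetic clouds and the simulated curtailment in weeks 20 to 29.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Hypothetical anomaly clouds for 52 weeks. Weeks 20-29 mimic a sustained
# curtailment: many anomalous points near 40% of an assumed 2000 kW rating.
rows = []
for week in range(52):
    is_curtailed = 20 <= week < 30
    n = 60 if is_curtailed else 8
    wind = rng.uniform(12.0, 20.0, n) if is_curtailed else rng.uniform(3.0, 20.0, n)
    power = rng.normal(800.0, 30.0, n) if is_curtailed else rng.uniform(0.0, 2000.0, n)
    rows.append(pd.DataFrame({"week": week, "wind_speed": wind, "power": power}))
anomalies = pd.concat(rows, ignore_index=True)

# One feature row per week (an assumed summary, not taken from the project).
features = anomalies.groupby("week").agg(
    n_anomalies=("power", "size"),
    mean_wind=("wind_speed", "mean"),
    mean_power=("power", "mean"),
)

# Standardise so that no single feature dominates the distance metric,
# then split the weeks into two groups.
scaled = StandardScaler().fit_transform(features)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
features["cluster"] = kmeans.fit_predict(scaled)

print(features.groupby("cluster")["n_anomalies"].agg(["count", "mean"]))
```

KMeans cluster numbers are arbitrary, so deciding which group is the ‘less normal’ one still takes a look at the cluster contents, for example the mean anomaly count per cluster.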

This combined model successfully identified power curtailments in three data sets: ID06, ID07, and ID14. These were the only data sets in the provided data with clear and sustained power curtailments. As long as the change in operation was clear and sustained, the model did a great job of identifying and labelling the change in operation mode.

Promising Results

However, this approach struggled to deliver clear signals when the operation of the turbine was erratic, or when multiple failure modes were present in the same data set. In such cases, weeks which seemed very similar were labelled as different, and weeks which were clearly very different were not singled out.

Less-Than-Clear Labels

  • Due to proprietary data restrictions, the code for this project is not publicly available.
