This is a regression project using the Combined Cycle Power Plant Data Set from the UCI Machine Learning Repository. The data was provided as a .DATA file, which is also included in this repository.
This project builds regression models of several types and compares them to find which type gives the best accuracy and best suits the dataset. The types of regression used are multiple linear regression, polynomial regression, support vector regression, decision tree regression, and random forest regression. The project covers data preprocessing, basic visualization, splitting the dataset into training and test sets so that each model is evaluated on data it has not seen (which exposes overfitting), building the regression models, and comparing the accuracy of all models.
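The comparison described above can be sketched with scikit-learn as follows. This is a minimal illustration, not this project's actual code: it assumes R² on a held-out test set as the "accuracy" measure, uses synthetic data in place of the CCPP features, and the hyperparameters (polynomial degree 2, RBF kernel for SVR, 100 trees, an 80/20 split) are illustrative choices. The helper names `build_models` and `compare_models` are hypothetical.

```python
# Minimal sketch of comparing the five regression types on one train/test split.
# Assumptions: R^2 is the "accuracy" metric, hyperparameters are illustrative,
# and synthetic data stands in for the CCPP features T, AP, RH, V.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def build_models(random_state=0):
    """Return one estimator per regression type listed in this README."""
    return {
        "Multiple linear regression": LinearRegression(),
        # Polynomial regression: linear regression fitted on polynomial feature terms.
        "Polynomial regression": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
        # SVR is sensitive to feature scale, so standardize the inputs first.
        "Support vector regression": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
        "Decision tree regression": DecisionTreeRegressor(random_state=random_state),
        "Random forest regression": RandomForestRegressor(n_estimators=100, random_state=random_state),
    }


def compare_models(X, y, test_size=0.2, random_state=0):
    """Fit each model on the training split and return its R^2 on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scores = {}
    for name, model in build_models(random_state).items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(500, 4))  # synthetic stand-in for T, AP, RH, V
    y = 3 * X[:, 0] - 2 * X[:, 1] ** 2 + X[:, 2] * X[:, 3] + rng.normal(0, 0.05, 500)
    # Print the models from best to worst test-set R^2.
    for name, score in sorted(compare_models(X, y).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: R^2 = {score:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair; a model that scores far better on its training data than on the test split is a sign of overfitting.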
These are the variables in the dataset:
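Variable Name | Symbol | Unit | Role |
---|---|---|---|
Temperature | T | °C | Feature |
Ambient Pressure | AP | millibar | Feature |
Relative Humidity | RH | % | Feature |
Exhaust Vacuum | V | cm Hg | Feature |
Energy Output | EP | MW | Target |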
The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006-2011), while the power plant was set to work at full load. The features are the hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant.
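A minimal sketch of separating the four features from the target with pandas, assuming the data has already been loaded into a DataFrame whose columns use the abbreviations above (T, AP, RH, V, EP). The names `FEATURES`, `TARGET` and `split_features_target` are hypothetical. The missing-value check uses `DataFrame.isna()`, a text counterpart to the visual check that missingno provides.

```python
# Minimal sketch: split a CCPP DataFrame into features X and target y.
# Assumption: columns are named with this README's abbreviations.
import pandas as pd

FEATURES = ["T", "AP", "RH", "V"]  # hourly average ambient variables
TARGET = "EP"                      # net hourly electrical energy output


def split_features_target(df: pd.DataFrame):
    """Validate the columns, check for missing values, and return (X, y) as NumPy arrays."""
    absent = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if absent:
        raise KeyError(f"missing columns: {absent}")
    # Count missing values per column; stop early rather than train on gaps.
    na_counts = df[FEATURES + [TARGET]].isna().sum()
    if na_counts.any():
        raise ValueError(f"missing values found:\n{na_counts[na_counts > 0]}")
    X = df[FEATURES].to_numpy()
    y = df[TARGET].to_numpy()
    return X, y
```

For example, with openpyxl installed, `df = pd.read_excel("ccpp.xlsx")` would load a spreadsheet copy of the data (the file name is a placeholder). If the headers in your copy use other abbreviations, rename them with `DataFrame.rename` before calling the function.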
A combined cycle power plant (CCPP) is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. In a CCPP, electricity is generated by gas and steam turbines combined in one cycle, with heat transferred from one turbine to the other: the heat recovery steam generators use the hot exhaust of the gas turbines to raise steam for the steam turbines. The exhaust vacuum is collected from, and affects, the steam turbine, while the other three ambient variables affect GT performance.
This project uses Python's scikit-learn machine learning package, along with pandas, NumPy, Matplotlib, and missingno. It also uses the warnings module to hide extra warnings, openpyxl to read the .xlsx format dataset, and Matplotlib's ggplot style sheet for better-looking graphs. The warnings module is part of the Python standard library and does not need to be installed. If you do not have the other packages installed, run the following in your terminal:
pip install missingno matplotlib numpy pandas scikit-learn openpyxl
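Once the packages are installed, the warnings filter and ggplot style mentioned above might be set up as in this sketch. The helper `plot_feature_vs_target` is hypothetical, included only to show the style applied to a basic feature-versus-target scatter plot; the project's own plotting code may differ.

```python
# Assumed setup: silence extra warnings and switch Matplotlib to the ggplot style.
import warnings

import matplotlib.pyplot as plt

# Hide all warnings to keep notebook or console output readable.
warnings.filterwarnings("ignore")

# Apply Matplotlib's built-in ggplot style sheet to all subsequent figures.
plt.style.use("ggplot")


def plot_feature_vs_target(df, feature, target="EP"):
    """Scatter one feature column against the target column (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.scatter(df[feature], df[target], s=5, alpha=0.5)
    ax.set_xlabel(feature)
    ax.set_ylabel(target)
    return fig, ax
```

A narrower filter such as `warnings.filterwarnings("ignore", category=FutureWarning)` hides only one kind of warning and keeps the rest visible.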
According to the UCI Machine Learning Repository website, any material published using databases obtained from the repository must include an acknowledgement and a note of any assistance received from using the repository, so that others can obtain the same datasets and replicate the experiments. For this dataset, the following two papers must be cited:
Pınar Tüfekci, Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods, International Journal of Electrical Power & Energy Systems, Volume 60, September 2014, pp. 126-140, ISSN 0142-0615.
Heysem Kaya, Pınar Tüfekci, Sadık Fikret Gürgen, Local and Global Learning Methods for Predicting Power of a Combined Gas & Steam Turbine, Proceedings of the International Conference on Emerging Trends in Computer and Electronics Engineering ICETCEE 2012, pp. 13-18 (Mar. 2012, Dubai).