Capstone: Forecasting Water Levels in Chennai, India

By: Asher Lewis

Problem Statement

For this project, we try to forecast the average monthly water level of the four main reservoirs of Chennai, India, using time-series data. Our threshold for success is that the model scores better than the baseline model. The motivation is that in 2019 Chennai experienced a water crisis that left millions of people without water and required many trains and trucks to bring water into the city. If we can forecast the monthly level of a given reservoir, we can get an idea of how and when the city's reservoirs run out of water. This information could later be used to predict future water demand. The water level is measured in millions of cubic feet. We score our predictions using the Mean Squared Error (MSE). Water demand forecasting is hard in general, so our goal is rather modest: our model should score a lower MSE than the baseline model, that is, an MSE closer to zero than the baseline's.
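For reference, the MSE over n forecast months can be written as follows, where y_t is the observed monthly average level and \hat{y}_t is the forecast for the same month (the symbols are our own notation, not taken from the notebooks):

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2
```

Because the levels are in millions of cubic feet, the MSE is in the square of that unit. A model meets the goal when its MSE is smaller than the baseline's MSE on the same test months.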

Executive Summary

On 19 June 2019, Chennai city officials declared that "Day Zero", the day when almost no water is left, had been reached, as all four main reservoirs supplying water to the city had run dry. In this project, we first combined our two given data sets and saved the result as a new CSV for analysis and forecasting.

The workflow was then broken up into four separate notebooks, with this fifth one serving as the place where they all come together.

In each notebook, we analyzed trends and the nature of both the water level and rain. We then explained some elements of time-series data, such as the potential problem of data not being stationary.
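As a rough illustration of the stationarity issue: a series is (weakly) stationary when its mean and variance do not drift over time, and one common remedy for a trend is differencing. The sketch below uses made-up numbers and a hypothetical helper named `difference`; it is not taken from the notebooks, which may handle stationarity differently.

```python
def difference(series, lag=1):
    """Return the lag-differenced series: series[t] - series[t - lag]."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up monthly levels with a steady upward trend (not real reservoir data).
levels = [100.0, 110.0, 120.0, 130.0, 140.0]

# After first-order differencing the trend is gone: every step is the same size.
diffed = difference(levels)
print(diffed)  # [10.0, 10.0, 10.0, 10.0]
```

Using `lag=12` on monthly data would instead subtract the value from the same month one year earlier, which targets a yearly seasonal pattern instead of a trend.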

After this, we split our data and modeled. We ran a baseline model on each one of the reservoirs. After that, we ran an ARIMA model on each reservoir. For the ARIMA model, we looked at the residuals and plotted the predictions.
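The split-then-compare step can be sketched as below. This is a minimal illustration under stated assumptions, not the notebooks' code: the baseline here is assumed to predict the mean of the training data (the README does not say which baseline was used), the helper names are hypothetical, and the numbers, including the stand-in model forecasts, are made up.

```python
def chronological_split(series, test_size):
    """Split without shuffling: the last `test_size` points form the test set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def mean_baseline(train, horizon):
    """Assumed baseline: forecast the training mean for every future step."""
    mean = sum(train) / len(train)
    return [mean] * horizon


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up monthly levels (not real reservoir data).
levels = [10.0, 12.0, 14.0, 12.0, 16.0, 8.0]
train, test = chronological_split(levels, test_size=2)

baseline_mse = mse(test, mean_baseline(train, len(test)))

# Stand-in for forecasts from a fitted model such as ARIMA (hypothetical values).
model_forecasts = [15.0, 9.0]
model_mse = mse(test, model_forecasts)

print(baseline_mse, model_mse, model_mse < baseline_mse)  # 16.0 1.0 True
```

Keeping the split chronological matters for time series: shuffling would let the model see months that come after the ones it is asked to forecast.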

The workflow for this project has been divided into five notebooks: four notebooks, one for each individual reservoir, and a fifth notebook that summarizes our problems and findings.

Data Dictionary

Feature	Type	Dataset	Description
Date	datetime64	chennai_reservoir_levels.csv	The date in year, month and day
Poondi_water_level	Float64	chennai_reservoir_levels.csv	Water level of Poondi lake in Millions of Cubic Feet
Cholavaram_level	Float64	chennai_reservoir_levels.csv	Water level of Cholavaram lake in Millions of Cubic Feet
Redhills_water_level	Float64	chennai_reservoir_levels.csv	Water level of Redhills lake in Millions of Cubic Feet
Chembarambakkam_water_level	Float64	chennai_reservoir_levels.csv	Water level of Chembarambakkam lake in Millions of Cubic Feet
Cholavaram_rain	Float64	chennai_reservoir_rainfall.csv	Rainfall for Cholavaram lake in millimeters
Poondi_rain	Float64	chennai_reservoir_rainfall.csv	Rainfall for Poondi lake in millimeters
Redhills_rain	Float64	chennai_reservoir_rainfall.csv	Rainfall for Redhills lake in millimeters
Chembarambakkam_rain	Float64	chennai_reservoir_rainfall.csv	Rainfall for Chembarambakkam lake in millimeters

Our data comes from Chennai Metro and Sewer and was gathered together on Kaggle. It contains daily data from 2004 to the end of 2019.
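Since the source data is daily and the forecast target is the average monthly level, the readings have to be aggregated by month at some point. A minimal standard-library sketch of that aggregation is shown here; the helper `monthly_mean` is hypothetical and the readings are made up, so treat it as an illustration and not as the notebooks' actual code.

```python
from collections import defaultdict
from datetime import date


def monthly_mean(daily):
    """Average daily (date, value) readings into {(year, month): mean}."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {month: sum(vals) / len(vals) for month, vals in sorted(buckets.items())}


# Made-up daily readings in millions of cubic feet (not real reservoir data).
daily = [
    (date(2019, 5, 30), 30.0),
    (date(2019, 5, 31), 50.0),
    (date(2019, 6, 1), 10.0),
    (date(2019, 6, 2), 20.0),
    (date(2019, 6, 3), 30.0),
]

monthly = monthly_mean(daily)
print(monthly)  # {(2019, 5): 40.0, (2019, 6): 20.0}
```

With a pandas DataFrame indexed by date, `resample` followed by `mean` performs the same kind of aggregation.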

Conclusions and Recommendations

All of our models met the goal set in the problem statement: each scored a lower MSE than its baseline model.

Time-series forecasting is a very difficult task, and I wish I had more time to go in depth and draw out patterns that are elusive. The seasons define us, and the trends need to be explored.

There are many things we can do in the future, such as implementing more complex models like SARIMA and VAR. We could also rerun the existing models on differenced data. Another option is to regularize the data.

It goes without saying, but getting more data is always better. It would be nice to have features such as temperature and exact water usage.

In terms of the data, it was fascinating to see how much of it is man-made, from the reservoirs themselves to the water scarcity problem. I would suggest better methods of collecting water during the monsoon season. I would also suggest keeping a better record of how people use the water. This is truly a crisis that unfortunately awaits most cities unless we take proper action.
