Capstone: Forecasting Water Levels in Chennai, India

By: Asher Lewis

Problem Statement

For this project, we try to forecast the average monthly water level of the four main reservoirs serving Chennai, India, using time-series data. The motivation is that in 2019 Chennai experienced a water crisis that left millions of people without water and required trains and trucks to bring water into the city. If we can forecast the monthly water level of a given reservoir, we can get an idea of how and when the city's reservoirs run out of water. This information could later be used to help predict future water demand. The water level is measured in millions of cubic feet, and we score our predictions using the Mean Squared Error (MSE). Water demand forecasting is hard in general, so our goal is modest: the model succeeds if it outperforms a baseline model, that is, if its MSE is lower (closer to zero) than the baseline's MSE.
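The success criterion above can be made concrete with a short sketch. MSE averages the squared gaps between actual and predicted values, MSE = (1/n) Σ (yᵢ − ŷᵢ)², so a lower value is better. The function names and all numbers below are hypothetical illustrations, not code or results from the project notebooks.

```python
# Minimal sketch: score forecasts with Mean Squared Error (MSE) and apply the
# success criterion "model MSE is lower than baseline MSE".
# All values here are hypothetical, not results from this project.


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def beats_baseline(actual, model_pred, baseline_pred):
    """True if the model's MSE is closer to zero than the baseline's MSE."""
    return mean_squared_error(actual, model_pred) < mean_squared_error(actual, baseline_pred)


# Hypothetical monthly water levels (millions of cubic feet) for one reservoir.
actual = [1200.0, 950.0, 700.0, 500.0]
model_pred = [1150.0, 980.0, 720.0, 460.0]
baseline_pred = [900.0, 900.0, 900.0, 900.0]  # e.g. a constant forecast

print(mean_squared_error(actual, model_pred))              # 1350.0
print(beats_baseline(actual, model_pred, baseline_pred))   # True
```

Because the errors are squared, one badly missed month (such as a sudden drop before a dry spell) weighs more heavily than several small misses.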

Executive Summary

On 19 June 2019, Chennai city officials declared that "Day Zero", the day when almost no water is left, had been reached, as all four main reservoirs supplying water to the city had run dry. As a first step in this project, we combined our two given datasets and saved them into a new CSV for analysis and forecasting.
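Combining the two datasets amounts to joining them on their shared date. The sketch below uses only the standard library and assumes both files have a `Date` column; the column names follow the data dictionary, while the sample rows are made up and in-memory buffers stand in for the real files.

```python
# Standard-library sketch: join the reservoir-level and rainfall tables on Date
# and write the result as a new CSV. Sample rows are hypothetical.
import csv
import io


def combine_on_date(levels_file, rain_file):
    """Inner-join two CSV files on their Date column; returns merged row dicts."""
    rain_by_date = {row["Date"]: row for row in csv.DictReader(rain_file)}
    combined = []
    for row in csv.DictReader(levels_file):
        rain = rain_by_date.get(row["Date"])
        if rain is None:  # skip dates missing from the rainfall table (inner join)
            continue
        merged = dict(row)
        merged.update({k: v for k, v in rain.items() if k != "Date"})
        combined.append(merged)
    return combined


def write_csv(rows, out_file):
    """Write merged rows, using the first row's keys as the header."""
    if not rows:
        return
    writer = csv.DictWriter(out_file, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# In-memory stand-ins for chennai_reservoir_levels.csv and chennai_reservoir_rainfall.csv.
levels_csv = io.StringIO(
    "Date,Poondi_water_level,Redhills_water_level\n"
    "2004-01-01,100.0,250.0\n"
    "2004-01-02,99.5,249.0\n"
    "2004-01-03,99.0,248.0\n"
)
rain_csv = io.StringIO(
    "Date,Poondi_rain,Redhills_rain\n"
    "2004-01-01,0.0,0.0\n"
    "2004-01-03,2.0,1.5\n"
)

rows = combine_on_date(levels_csv, rain_csv)
out = io.StringIO()  # a real output file should be opened with newline=""
write_csv(rows, out)
print(out.getvalue())
```

An inner join drops any date that appears in only one table; whether that is acceptable depends on how many dates are missing, so it is worth counting them before discarding.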

The workflow was then broken up into four separate notebooks, one per reservoir, with a fifth, main notebook bringing them all together.

In each notebook, we analyzed the trends and behavior of both the water level and rainfall. We then explained some elements of time-series data, such as the potential problem of non-stationary data.
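One informal way to see non-stationarity is to compute rolling statistics: for a stationary series the rolling mean and rolling standard deviation stay roughly flat over time. This is a rough standard-library sketch on a synthetic series, not the project's own check, and it is not a substitute for a formal test such as the augmented Dickey-Fuller (ADF) test.

```python
# Informal stationarity check: rolling mean and rolling (population) standard
# deviation over a sliding window. The series below is synthetic.
from statistics import mean, pstdev


def rolling_stats(values, window):
    """Return (rolling_means, rolling_stdevs) over a sliding window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    means, stdevs = [], []
    for i in range(window, len(values) + 1):
        chunk = values[i - window:i]
        means.append(mean(chunk))
        stdevs.append(pstdev(chunk))
    return means, stdevs


# Synthetic series with a steady downward trend, like a reservoir draining.
trending = [1000.0 - 50.0 * t for t in range(12)]
means, stdevs = rolling_stats(trending, window=3)
print(means[0], means[-1])  # 950.0 500.0 -> the mean drifts, so the series is not stationary
```

Here the rolling standard deviation is constant but the rolling mean keeps falling, which is the signature of a trend that a model like ARIMA would typically handle through differencing.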

After this, we split our data and built models. We first ran a baseline model on each reservoir, then an ARIMA model on each reservoir. For the ARIMA models, we examined the residuals and plotted the predictions.
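The split, baseline, and residual steps can be sketched as follows. This assumes a chronological split (no shuffling, so the test months lie strictly after the training months) and assumes the baseline forecasts the training-set mean; the source does not say which baseline was used, and the function names and values are hypothetical.

```python
# Sketch of a time-ordered split, a mean baseline, and residuals.
# The baseline choice (training mean) is an assumption; values are hypothetical.
from statistics import mean


def train_test_split_by_time(series, test_size):
    """Keep order: the last `test_size` points become the test set."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


def mean_baseline(train, horizon):
    """Naive baseline: predict the training mean for every future step."""
    return [mean(train)] * horizon


def residuals(actual, predicted):
    """Residual = actual - predicted; a good model leaves patternless residuals near zero."""
    return [a - p for a, p in zip(actual, predicted)]


monthly_levels = [800.0, 760.0, 700.0, 640.0, 900.0, 1000.0, 1050.0, 980.0]
train, test = train_test_split_by_time(monthly_levels, test_size=2)
baseline = mean_baseline(train, len(test))
print(residuals(test, baseline))  # [250.0, 180.0]
```

The same `residuals` check applies to ARIMA predictions, for example from `statsmodels.tsa.arima.model.ARIMA`; consistently positive residuals like the ones above mean the forecast is systematically too low.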

Table of Contents

The workflow for this project is divided into five notebooks: one for each of the four reservoirs, plus a main notebook that summarizes our problems and findings.

  1. Main notebook
  2. Notebook for Chembarambakkam
  3. Notebook for Poondi
  4. Notebook for Redhills
  5. Notebook for Cholavaram

Data Dictionary

Feature | Type | Dataset | Description
Date | datetime64 | chennai_reservoir_levels.csv | Date of the reading (year, month, day)
Poondi_water_level | float64 | chennai_reservoir_levels.csv | Water level of Poondi lake, in millions of cubic feet
Cholavaram_level | float64 | chennai_reservoir_levels.csv | Water level of Cholavaram lake, in millions of cubic feet
Redhills_water_level | float64 | chennai_reservoir_levels.csv | Water level of Redhills lake, in millions of cubic feet
Chembarambakkam_water_level | float64 | chennai_reservoir_levels.csv | Water level of Chembarambakkam lake, in millions of cubic feet
Cholavaram_rain | float64 | chennai_reservoir_rainfall.csv | Rainfall at Cholavaram lake, in millimeters
Poondi_rain | float64 | chennai_reservoir_rainfall.csv | Rainfall at Poondi lake, in millimeters
Redhills_rain | float64 | chennai_reservoir_rainfall.csv | Rainfall at Redhills lake, in millimeters
Chembarambakkam_rain | float64 | chennai_reservoir_rainfall.csv | Rainfall at Chembarambakkam lake, in millimeters

Our data comes from the Chennai Metropolitan Water Supply and Sewerage Board (Chennai Metrowater) and was compiled on Kaggle. It contains daily data from 2004 to the end of 2019.
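Since the raw data is daily but the forecasting target is the monthly average, the daily readings have to be aggregated first. The sketch below does this with the standard library only; it assumes ISO-formatted dates (YYYY-MM-DD), and the readings are hypothetical.

```python
# Standard-library sketch: average daily readings into monthly means.
# Assumes ISO dates (YYYY-MM-DD); the readings below are hypothetical.
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_average(daily):
    """Map (year, month) -> mean of the daily values recorded in that month."""
    buckets = defaultdict(list)
    for day_str, value in daily:
        d = date.fromisoformat(day_str)
        buckets[(d.year, d.month)].append(value)
    # Sort by (year, month) so the result stays in chronological order.
    return {key: mean(vals) for key, vals in sorted(buckets.items())}


daily_readings = [
    ("2004-01-01", 100.0),
    ("2004-01-02", 96.0),
    ("2004-01-31", 92.0),
    ("2004-02-01", 90.0),
    ("2004-02-02", 86.0),
]
print(monthly_average(daily_readings))  # {(2004, 1): 96.0, (2004, 2): 88.0}
```

Averaging over each month also smooths out day-to-day noise, at the cost of hiding short events such as a single heavy-rain day.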

Conclusions and Recommendations

All of our models met the goal set in our problem statement: an MSE lower than the baseline model's.

Time-series forecasting is a difficult task, and I wish I had more time to go in depth and tease out the subtler patterns hidden in the data. Seasonality shapes much of this data, and the trends deserve further exploration.

There are many things we could do in the future, such as implementing more complex models like SARIMA and VAR. We could also rerun our existing models on differenced data, or regularize the data.
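Differencing, mentioned above, is simple to sketch. First differencing (lag 1) removes a linear trend, and seasonal differencing (lag 12 for monthly data) removes a pattern that repeats every year, which is the idea behind the seasonal part of SARIMA. The series below is synthetic and the function name is hypothetical.

```python
# Sketch of first and seasonal differencing on a synthetic monthly series.


def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t >= lag."""
    if lag < 1:
        raise ValueError("lag must be >= 1")
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


# Synthetic monthly series: a linear decline plus a repeating 12-month pattern.
pattern = [0.0, 5.0, 10.0, 5.0, 0.0, -5.0, -10.0, -5.0, 0.0, 20.0, 40.0, 20.0]
series = [1000.0 - 2.0 * t + pattern[t % 12] for t in range(36)]

seasonal_diff = difference(series, lag=12)
print(set(seasonal_diff))  # {-24.0}: the yearly pattern cancels, leaving the trend's yearly drop
both_diff = difference(seasonal_diff, lag=1)
print(set(both_diff))  # {0.0}: the remaining trend is removed as well
```

Real reservoir data will not cancel this cleanly, but the differenced series should look much closer to stationary, which is what ARIMA-family models assume.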

It goes without saying that more data is always better. It would be useful to have features such as temperature and exact water usage.

In terms of the data, it was fascinating to see how much of the situation is man-made, from the reservoirs themselves to the water scarcity problem. I would suggest better methods for collecting water during the monsoon season, as well as better records of how people use water. This is truly a crisis that unfortunately awaits most cities unless we take the proper action.


