
time-series-analysis-and-forecasting-using-tunnel-traffic-data's Introduction

Time-series-forecasting-for-tunnel-traffic

Steps of the project:

1. Goal/Objective:

Tunnel Traffic is a time series describing the number of vehicles traveling through the Baregg Tunnel in Switzerland each day from November 2003 to November 2005. Here, I explore components of the time series such as trend and seasonality, and forecast out-of-sample data using a linear regression model.

2. Data Understanding:

To understand the dataset, the pandas describe() function has been used to obtain descriptive statistics such as the number of records, the mean and standard deviation of the target variable, and its minimum and maximum values.

[Figure: data description]
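As a rough illustration of this step, the sketch below calls describe() on a small synthetic series. The values are randomly generated stand-ins, and the column name NumVehicles is an assumption, since the README only names the "Day" column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the tunnel data; the values are made up for illustration.
rng = np.random.default_rng(0)
days = pd.date_range("2003-11-01", periods=30, freq="D")
tunnel = pd.DataFrame(
    {"Day": days, "NumVehicles": rng.integers(90_000, 120_000, size=30)}
)

# describe() reports count, mean, std, min, quartiles and max for a numeric column.
summary = tunnel["NumVehicles"].describe()
print(summary)
```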

3. Data preprocessing and Feature Engineering:

To facilitate time series analysis, the pandas package has been used to transform the dataset. The "Day" column has been parsed as a date type by passing parse_dates when loading the data, and has then been set as the index. Moreover, a time dummy (time-step) feature has been created by counting out the length of the series; lag features are derived from the same series. To engineer time dummy features, DeterministicProcess from the statsmodels library has also been used, which helps to avoid some tricky failure cases that can arise with time series and linear regression. Its order argument refers to the polynomial order of the trend: 1 for linear, 2 for quadratic, 3 for cubic, and so on.
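The preprocessing described above might look roughly like the following sketch. The inline CSV, its values, and the column name NumVehicles are assumptions made so that the example is self-contained; the real project loads its own file. Converting the index to daily periods is also an assumption, not something the README states.

```python
import io

import numpy as np
import pandas as pd
from statsmodels.tsa.deterministic import DeterministicProcess

# Small inline CSV standing in for the real file; the counts are invented.
csv_text = """Day,NumVehicles
2003-11-01,100000
2003-11-02,91000
2003-11-03,99000
2003-11-04,101000
2003-11-05,104000
"""

# Parse "Day" as a date while loading, then use it as the index.
tunnel = pd.read_csv(io.StringIO(csv_text), parse_dates=["Day"])
tunnel = tunnel.set_index("Day").to_period("D")  # assumed: daily PeriodIndex

# Time dummy: count out the length of the series (0, 1, 2, ...).
tunnel["Time"] = np.arange(len(tunnel.index))

# The same kind of feature via DeterministicProcess:
# an intercept column plus a linear trend (order=1).
dp = DeterministicProcess(index=tunnel.index, constant=True, order=1)
X = dp.in_sample()
print(X)
```

Note that the hand-made Time column starts at 0 while the trend column from DeterministicProcess starts at 1; for a linear fit this only shifts the intercept.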

Next, lag features are created to capture the relationship between the number of vehicles on a given day and on the days before it. The lag plot shows how well we were able to fit the relationship between the number of vehicles on one day and the number on the previous day. Creating lag features produces missing values at the start of the series, and we need to decide what to do with them. Filling them in is one option, for example with 0.0 or by "backfilling" with the first known value. Instead, we simply drop the missing values, making sure to also drop the target values for the corresponding dates.

[Figure: lag plot]
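A minimal sketch of the lag-feature step, using a handful of made-up counts. The column names NumVehicles and Lag_1 are assumptions for illustration.

```python
import pandas as pd

# Made-up daily counts; "NumVehicles" is an assumed column name.
idx = pd.period_range("2003-11-01", periods=6, freq="D")
df = pd.DataFrame({"NumVehicles": [100, 90, 98, 101, 105, 107]}, index=idx)

# Lag_1 holds the previous day's value; the first day has no predecessor,
# so its lag is NaN.
df["Lag_1"] = df["NumVehicles"].shift(1)

# Drop the row with the missing lag, then keep only the matching target dates.
X = df[["Lag_1"]].dropna()
y = df.loc[X.index, "NumVehicles"]

print(X.join(y))
```

Dropping the rows costs one observation per lag, which is negligible for a series of about two years of daily data.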

4. Model Development:

Linear regression has been used as the modeling algorithm for fitting the data and making predictions on "out of sample" data. "Out of sample" refers to times outside of the observation period of the training data. The fitted model is (approximately) Vehicles = 22.5 * Time + 98176. Plotting the fitted values over time shows how fitting linear regression to the time dummy creates the trend line defined by this equation.

[Figure: linear trend forecast]
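The sketch below shows the general shape of this step with scikit-learn. It does not use the real data: the series is synthetic, generated from the approximate equation quoted above plus random noise, and the series length, noise level, and 30-day horizon are arbitrary assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic series built from the approximate equation above, plus noise.
rng = np.random.default_rng(42)
n_obs = 730  # assumed: roughly two years of daily observations
time = np.arange(n_obs)
vehicles = 22.5 * time + 98176 + rng.normal(0, 2000, size=n_obs)

X = pd.DataFrame({"Time": time})
y = pd.Series(vehicles, name="NumVehicles")

# Fit the trend line and compute the in-sample fitted values.
model = LinearRegression().fit(X, y)
y_fit = pd.Series(model.predict(X), index=X.index)

# Out of sample: time steps that continue past the end of the training data.
X_future = pd.DataFrame({"Time": np.arange(n_obs, n_obs + 30)})
y_forecast = pd.Series(model.predict(X_future), index=X_future.index)

print(f"Vehicles = {model.coef_[0]:.1f} * Time + {model.intercept_:.0f}")
```

Because the time dummy is the only feature, the forecast is simply the fitted line extended to later time steps.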

5. Analysis:

A moving average plot has been generated to see what kind of trend this series has. Since the series has daily observations, a window of 365 days has been chosen to smooth over any short-term changes within the year. To create the moving average, the rolling method is first used to begin a windowed computation, followed by the mean method to compute the average over each window. As we can see, the trend of Tunnel Traffic appears to be roughly linear.
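The rolling-then-mean pattern can be sketched as follows on a synthetic daily series. The README only states the 365-day window; the center and min_periods settings here are assumptions chosen so that the ends of the series still get a value.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: a linear trend plus noise (values are invented).
idx = pd.date_range("2003-11-01", periods=730, freq="D")
rng = np.random.default_rng(1)
series = pd.Series(
    100_000 + 20.0 * np.arange(730) + rng.normal(0, 3000, size=730), index=idx
)

moving_average = series.rolling(
    window=365,       # one year of daily observations
    center=True,      # assumed: place each average at the centre of its window
    min_periods=183,  # assumed: about half the window, so the ends are not NaN
).mean()

print(moving_average.head())
```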

To analyse seasonality in the dataset, seasonal plots have been created at two scales: days within a week and days within a year.

[Figure: seasonal plot week]
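A weekly seasonal plot draws one line per week across the days of the week. The table behind such a plot can be sketched as below. The data are synthetic, with a weekend dip deliberately built in for illustration, so the numbers say nothing about the real series.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2003-11-01", periods=28, freq="D")

# Synthetic counts with an artificial weekend dip (dayofweek 5 and 6).
counts = np.where(idx.dayofweek >= 5, 90_000, 105_000)
df = pd.DataFrame({"NumVehicles": counts}, index=idx)

df["day"] = df.index.dayofweek  # 0 = Monday ... 6 = Sunday
df["week"] = df.index.isocalendar().week.astype("int64")

# Each row is one week and each column one day of the week:
# a seasonal plot draws one line per row.
weekly_profile = df.pivot_table(index="week", columns="day", values="NumVehicles")

# Averaging over weeks gives the typical within-week pattern.
mean_by_day = df.groupby("day")["NumVehicles"].mean()
print(mean_by_day)
```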

A periodogram has also been plotted to identify the dominant periods, or frequencies, in the time series.

[Figure: periodogram]
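One way to compute a periodogram is scipy.signal.periodogram, sketched below on a synthetic series with a built-in weekly cycle. Whether the project used this function, and with which settings, is not stated in the README; the fs, detrend, window and scaling choices here are assumptions.

```python
import numpy as np
from scipy.signal import periodogram

# Synthetic daily series with a built-in 7-day cycle (values are invented).
n_days = 728  # 104 whole weeks
t = np.arange(n_days)
rng = np.random.default_rng(7)
series = 100_000 + 5_000 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 500, n_days)

# fs=365 expresses frequencies in cycles per year for daily data.
freqs, spectrum = periodogram(
    series, fs=365, detrend="linear", window="boxcar", scaling="spectrum"
)

# The strongest peak marks the dominant seasonal frequency.
dominant = freqs[np.argmax(spectrum)]
print(f"dominant frequency: {dominant:.1f} cycles/year")
print(f"corresponding period: {365 / dominant:.1f} days")
```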

6. Results:

After incorporating the lag features into the dataset, the linear regression model's predictions for dates beyond the available data are shown below.

[Figure: seasonal forecast]
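The README does not show how the out-of-sample predictions were produced. One common approach with a lag feature, assumed here purely for illustration, is a recursive forecast in which each prediction is fed back in as the lag for the next step. The series below is synthetic and noise-free so the behaviour is easy to follow.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free series: each day is exactly 25 higher than the last.
y = pd.Series(100_000 + 25.0 * np.arange(200), name="NumVehicles")

# One lag feature; drop the first row, which has no previous value.
X = pd.DataFrame({"Lag_1": y.shift(1)}).dropna()
model = LinearRegression().fit(X, y.loc[X.index])

# Recursive forecast: each prediction becomes the lag input for the next step.
last_value = y.iloc[-1]
forecast = []
for _ in range(5):
    next_value = model.predict(pd.DataFrame({"Lag_1": [last_value]}))[0]
    forecast.append(next_value)
    last_value = next_value

print(forecast)
```

With real, noisy data the errors of a recursive forecast accumulate from step to step, which is one reason predictions can drift as the horizon grows.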

7. Result Interpretation:

The trend discovered by our LinearRegression model is almost identical to the moving average plot, which suggests that a linear trend was the right choice in this case. However, although the model captures the trend reasonably well, its predictions start to deviate from the linear trend as time progresses (e.g. starting around the year 2006). Hence, testing more sophisticated forecasting models would be the next step for the project.
