Grab AI Traffic Management Challenge

Can we accurately forecast travel demand based on historical Grab bookings to predict areas and times with high travel demand?

Getting Started

I wasn't sure whether the test dataset could be used to train the models that generate the predictions, so I uploaded backup models just in case.

Main Files:

  • GenerateFeatures.ipynb - builds features from the source data
  • training.ipynb - trains the necessary models (if allowed)
  • PrepareNextFewPredictions.ipynb - creates the T+1 to T+5 predictions
  • GeneratePredictionT.py - creates the predictions for T+1
  • GenerateFeatures.py - script used to generate features during the process
  • submission.csv - final output

Other Files:

  • StaticValues - static values used to map features that were generated previously
  • Scaler - the scaler is generated during training to prevent data leakage

Prerequisites

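Please ensure the following Python packages are installed before running the code from Jupyter Notebook:

  • pandas
  • numpy
  • matplotlib (for matplotlib.pyplot)
  • xgboost
  • scikit-learn (imported as sklearn)

The random and pickle modules are also used. They are part of the Python standard library and need no installation.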

How to generate submission files

Run GenerateFeatures.ipynb to generate features from the test set. Replace training.csv in the notebook with the name of the test CSV file.
Run training.ipynb to train the models on those features. (Training took around 10-15 minutes on a 16-core AWS instance when tested with 14 days of data.)
Run PrepareNextFewPredictions.ipynb to generate the T+1 to T+5 predictions. The T+1 prediction is used as a feature for T+2.
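
The last step feeds earlier predictions back in as features. A minimal sketch of that idea follows. It is not the notebook's code: `forecast_recursive`, `predict_next` and `mean_of_lags` are hypothetical names, and a simple average stands in for the trained XGBoost models.

```python
from typing import Callable, List, Sequence


def forecast_recursive(
    history: Sequence[float],
    predict_next: Callable[[Sequence[float]], float],
    n_lags: int = 7,
    horizon: int = 5,
) -> List[float]:
    """Predict `horizon` steps ahead, feeding each prediction back as a lag."""
    window = list(history[-n_lags:])
    predictions = []
    for _ in range(horizon):
        y_hat = predict_next(window)   # stand-in for a trained model (assumption)
        predictions.append(y_hat)
        window = window[1:] + [y_hat]  # the newest prediction becomes lag T-1
    return predictions


def mean_of_lags(lags: Sequence[float]) -> float:
    """Toy stand-in model: the average of the lag window."""
    return sum(lags) / len(lags)


if __name__ == "__main__":
    demand = [0.10, 0.12, 0.15, 0.20, 0.18, 0.16, 0.14]  # made-up values
    print(forecast_recursive(demand, mean_of_lags))
```

Because every step after T+1 depends on an earlier prediction, errors can accumulate towards T+5.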

Preprocessing

I spent most of my time coming up with features and testing how effective they were.

Static features

  • Longitude, latitude
  • X,Y,Z coordinates
  • Total distance from everywhere - used to find places that are far from everyone
  • Zones - the geohashes are clustered into 10 zones and reordered from highest demand to lowest
  • Distance to high demand 5 - distance to the highest demand zone
  • Distance to high demand 7 - distance to the second highest demand zone
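
As an illustration of the coordinate and distance features, here is a rough sketch. It assumes each geohash has already been decoded to a latitude/longitude pair, that X, Y, Z means a projection onto the unit sphere, and that distances are great-circle (haversine) distances. None of this is confirmed by the repository, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]


def to_xyz(lat_deg: float, lon_deg: float) -> Tuple[float, float, float]:
    """Project latitude/longitude (degrees) onto the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def haversine_km(a: LatLon, b: LatLon, radius_km: float = 6371.0) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(h))


def total_distance(points: Dict[str, LatLon]) -> Dict[str, float]:
    """Sum of distances from each location to every other location."""
    return {
        name: sum(haversine_km(p, q) for other, q in points.items() if other != name)
        for name, p in points.items()
    }
```

A location with a large total distance sits far from most other locations, which is the signal the "total distance from everywhere" feature is after. The distance to a high demand zone could be computed with the same `haversine_km` helper against that zone's centre.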

[Image: ClusterImage]

Feature Engineering

Temporal features

  • Hour and Minutes
  • Sine and cosine of the hour
  • Lagged demand (T-1 to T-7)
  • Day of Week
  • Peak Hours
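
The sine/cosine and lag features can be sketched as below. This is an illustration only: the function names are hypothetical, and the inclusion of minutes in the angle and the `None` placeholder for missing history are assumptions rather than what GenerateFeatures does.

```python
import math
from typing import Dict, Optional, Sequence


def cyclical_hour(hour: int, minute: int = 0) -> Dict[str, float]:
    """Encode time of day as a point on the unit circle (period = 24 hours)."""
    angle = 2 * math.pi * (hour + minute / 60) / 24
    return {"hour_sin": math.sin(angle), "hour_cos": math.cos(angle)}


def lag_features(
    series: Sequence[float], t: int, n_lags: int = 7
) -> Dict[str, Optional[float]]:
    """Return demand at T-1 .. T-n_lags for position t; None where history is missing."""
    return {
        f"lag_{k}": series[t - k] if t - k >= 0 else None
        for k in range(1, n_lags + 1)
    }
```

The point of the cyclical encoding is that 23:45 and 00:00 end up close together, whereas the raw hour values 23 and 0 are far apart.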

Spatial features

  • Split into cluster zones by demand with K-means
  • Split into geohash4 and geohash5 zones
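
A small sketch of the two spatial groupings follows. It assumes the cluster labels already come from some K-means run, and that "reordered" means relabelling so that zone 0 has the highest average demand. The numbering used in the repository may differ, and all names and sample geohashes here are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def reorder_zones(labels: Dict[str, int], mean_demand: Dict[str, float]) -> Dict[str, int]:
    """Relabel clusters so that zone 0 has the highest average demand."""
    per_zone = defaultdict(list)
    for geohash, zone in labels.items():
        per_zone[zone].append(mean_demand[geohash])
    # Rank the original cluster ids by their average demand, highest first.
    ranked = sorted(
        per_zone, key=lambda z: sum(per_zone[z]) / len(per_zone[z]), reverse=True
    )
    new_id = {old: new for new, old in enumerate(ranked)}
    return {geohash: new_id[zone] for geohash, zone in labels.items()}


def geohash_prefixes(geohash: str) -> Dict[str, str]:
    """A shorter geohash prefix names the larger cell that contains the original one."""
    return {"geohash4": geohash[:4], "geohash5": geohash[:5]}


if __name__ == "__main__":
    labels = {"qp03wc": 2, "qp03wf": 2, "qp09d9": 0}        # made-up cluster ids
    demand = {"qp03wc": 0.1, "qp03wf": 0.3, "qp09d9": 0.9}  # made-up mean demand
    print(reorder_zones(labels, demand))
    print(geohash_prefixes("qp03wc"))
```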

Statistical features

  • Moving averages
  • Exponential Moving averages
  • Moving median
  • Variance
  • Standard Deviation
  • Min
  • Max
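
These statistics can all be computed over one window of past demand. The sketch below uses only the standard library. The window length, the smoothing factor `alpha` and the choice of population (rather than sample) variance are assumptions, and the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def ema(values: Sequence[float], alpha: float = 0.5) -> float:
    """Exponential moving average; later values carry more weight."""
    result = values[0]
    for v in values[1:]:
        result = alpha * v + (1 - alpha) * result
    return result


def window_stats(window: Sequence[float], alpha: float = 0.5) -> Dict[str, float]:
    """Summary statistics over one window of past demand (oldest value first)."""
    return {
        "mean": statistics.fmean(window),
        "ema": ema(window, alpha),
        "median": statistics.median(window),
        "var": statistics.pvariance(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
    }
```

The window should hold only values from before the time being predicted, otherwise the features leak the target.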

Additional features

  • High demand percentage for geohash
  • Last week's demand at the same time

Not so useful features

  • Nearest neighbours
  • Duration of high demand
  • Log scale distance

Validation

  • Validation was done on the training dataset, using 4-fold validation in which each fold trains on 14 days of data and is tested on the next day.
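
One way to lay out such folds is sketched below. It assumes the 14-day window slides forward by one day per fold. The description above does not say how the folds were spaced, so `step` and the function name are hypothetical.

```python
from typing import List, Tuple


def rolling_folds(
    first_day: int, n_folds: int = 4, train_days: int = 14, step: int = 1
) -> List[Tuple[range, int]]:
    """Return (training days, test day) pairs; each fold tests on the day after its window."""
    folds = []
    for i in range(n_folds):
        start = first_day + i * step
        train = range(start, start + train_days)
        folds.append((train, start + train_days))
    return folds


if __name__ == "__main__":
    for train, test in rolling_folds(first_day=1):
        print(f"train on days {train.start}-{train.stop - 1}, test on day {test}")
```

Keeping the test day strictly after the training window matches how the models are used at prediction time and avoids training on future data.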

Models

  • XGBoost: 6 XGBoost models trained in a rolling-window fashion on 14 days of data
  • XGBoost (backup): 5 XGBoost models trained on 4-week rolling windows from the 60-day dataset
  • Model hyperparameters were adjusted manually.
  • Tried an ensemble with KNNRegressor and a stacked LSTM, but the results weren't very good.
  • Tried adding models trained on a specific day of the week only, with bagging applied, and using them to predict that day of the week. This did not improve the results significantly.

Acknowledgments

Many thanks to Grab and the sponsor, AWS, for providing credits. I am truly grateful for the opportunity to work on this challenge, as I have learnt a lot from it. My laptop crashed a day before submission and I had to rewrite most of the code, so I was thankful that the cloud credits let me continue my work. Some of the code that generated the static values might be missing, as I have not yet been able to recover it from my laptop, but it can be provided upon request.

Contributors

rongronggg, limcrong
