Taxi Demand Prediction around Central Park - README

Introduction

This repository contains the source code and data for a project that aims to predict taxi demand around Central Park using time series data. The project focuses on converting the time series data into a tabular format and utilizing it to predict the number of pickups for a given hour based on the previous hour's data.
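Converting a time series into a tabular format usually means sliding a fixed-length window over it. Each row holds the pickup counts of a few consecutive hours, and the target is the count in the hour that follows. The README does not say how many past hours feed each prediction, so this plain-Python sketch treats the window length `n_lags` as a free, hypothetical parameter. With `n_lags=1` it reduces to predicting from the previous hour alone. It illustrates the idea and is not the repository's own function.

```python
from typing import List, Tuple


def time_series_to_tabular(
    hourly_counts: List[int], n_lags: int
) -> Tuple[List[List[int]], List[int]]:
    """Slide a window of n_lags hours over an hourly pickup series.

    Each feature row holds the counts of n_lags consecutive hours; the
    matching target is the count in the hour immediately after them.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    features: List[List[int]] = []
    targets: List[int] = []
    for start in range(len(hourly_counts) - n_lags):
        end = start + n_lags
        features.append(hourly_counts[start:end])  # the n_lags input hours
        targets.append(hourly_counts[end])         # the hour to predict
    return features, targets


if __name__ == "__main__":
    # Hypothetical pickup counts for six consecutive hours at one location.
    counts = [3, 5, 2, 8, 6, 4]
    X, y = time_series_to_tabular(counts, n_lags=3)
    for row, target in zip(X, y):
        print(row, "->", target)
```

With six hours and a three-hour window this yields three rows: [3, 5, 2] -> 8, [5, 2, 8] -> 6 and [2, 8, 6] -> 4.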

Table of Contents

  1. Introduction
  2. Data
  3. Notebooks
  4. Installation
  5. Usage
  6. Contributing
  7. License

Data

The data folder contains two subdirectories, raw and transformed, and the file taxi_zones.csv.

  • raw: This directory contains the raw taxi ride records for 2022 as Parquet files, one per month. The files are named rides_2022-MM.parquet, where MM is the two-digit month (01 to 12).

  • transformed: This directory contains transformed data files in Parquet format, including the final tabular data used for modeling (tabular_data.parquet). Additionally, there are intermediate files generated during the data transformation process.

  • taxi_zones.csv: This file is required to geolocate the taxi zones.
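The monthly naming scheme makes it easy to build the twelve raw file paths and load them together. This sketch assumes the data folder sits at the repository root, so the files live under data/raw. `monthly_ride_files` and `load_rides` are hypothetical helper names and do not come from 00_functions.ipynb. pandas (plus a Parquet engine such as pyarrow) is imported only inside `load_rides`, so the path helper works without it.

```python
from pathlib import Path
from typing import List


def monthly_ride_files(raw_dir: Path, year: int = 2022) -> List[Path]:
    """Return the expected rides_YYYY-MM.parquet paths for all twelve months."""
    return [raw_dir / f"rides_{year}-{month:02d}.parquet" for month in range(1, 13)]


def load_rides(raw_dir: Path, year: int = 2022):
    """Concatenate every monthly file that exists into a single DataFrame.

    Requires pandas and a Parquet engine (e.g. pyarrow); missing months are skipped.
    """
    import pandas as pd  # imported lazily so monthly_ride_files has no dependencies

    frames = [
        pd.read_parquet(path)
        for path in monthly_ride_files(raw_dir, year)
        if path.exists()
    ]
    if not frames:
        raise FileNotFoundError(f"no rides_{year}-MM.parquet files found in {raw_dir}")
    return pd.concat(frames, ignore_index=True)
```

For example, `load_rides(Path("data/raw"))` would return all 2022 rides in one table, provided the files are present locally.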

Notebooks

The notebooks directory consists of Jupyter notebooks used for different stages of the project:

  1. 00_functions.ipynb: A notebook containing custom functions used throughout the project.

  2. 01_load_and_validate_raw_data.ipynb: A notebook for loading and validating the raw data.

  3. 02_transform_raw_data_to_time_series.ipynb: A notebook that transforms raw data into time series format.

  4. 03_time_series_data.ipynb: A notebook exploring and analyzing time series data.

  5. 04_transform_raw_data_into_features_and_targets.ipynb: A notebook responsible for feature engineering.

  6. 05_visualize_training_data.ipynb: A notebook used to visualize the training data.

  7. 06_baseline_model.ipynb: A notebook implementing a baseline model for prediction.

  8. 07_XGBoost_model.ipynb: A notebook presenting an XGBoost model for prediction.

  9. 08_catboost.ipynb: A notebook demonstrating the CatBoost model.

  10. 09_catboost_model_with_feature_engineering.ipynb: A notebook combining CatBoost with feature engineering.

  11. 10_catboost_with_hyperparameter_tuning.ipynb: A notebook for hyperparameter tuning of CatBoost.
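The step from raw rides to a time series (notebook 02) amounts to counting pickups per location and hour, with explicit zeros for hours that had no rides. The README does not describe the baseline in notebook 06. This sketch therefore assumes the simplest one, predicting each hour's pickups as the previous hour's count, and scores it with mean absolute error (also an assumed metric). Rides are taken as hypothetical (pickup_datetime, location_id) pairs rather than the real Parquet columns, and all function names are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def to_hourly_series(
    rides: Iterable[Tuple[datetime, int]], start: datetime, end: datetime
) -> Dict[int, List[int]]:
    """Count pickups per (location, hour) from start (inclusive) to end (exclusive).

    `rides` yields (pickup_datetime, location_id) pairs. Both bounds are assumed
    to be aligned to the hour; rides outside [start, end) are ignored.
    """
    # Truncate each pickup timestamp to the start of its hour and count.
    counts = Counter(
        (location, ts.replace(minute=0, second=0, microsecond=0))
        for ts, location in rides
        if start <= ts < end
    )
    n_hours = int((end - start) / timedelta(hours=1))
    hours = [start + timedelta(hours=h) for h in range(n_hours)]
    locations = sorted({location for location, _ in counts})
    # Counter returns 0 for missing keys, which fills the hours without pickups.
    return {loc: [counts[(loc, hour)] for hour in hours] for loc in locations}


def previous_hour_baseline(series: List[int]) -> List[int]:
    """Predict the pickups of each hour as the count observed one hour earlier.

    The prediction for series[i + 1] is series[i], so the result aligns with series[1:].
    """
    return series[:-1]


def mean_absolute_error(y_true: List[int], y_pred: List[int]) -> float:
    """Average absolute difference between observed and predicted counts."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For hourly counts [2, 0, 3, 1], the baseline predicts [2, 0, 3] for the last three hours, giving an MAE of 7/3 (about 2.33). Any trained model such as XGBoost or CatBoost should beat a number like this to be worth its complexity.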

The catboost_info directory holds the training logs and metrics that CatBoost writes while fitting models (catboost_info is CatBoost's default training directory).

Installation

To set up the environment for this project, use Poetry with the provided pyproject.toml and poetry.lock files, which declare and lock the required dependencies. Run the following command to create the environment and install them:

poetry install

Usage

After installing the dependencies, open the Jupyter notebooks in the notebooks directory to explore the project. Run the notebooks in numerical order, executing the code cells of each one sequentially.
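To run every notebook non-interactively, execute them in the order of their numeric prefixes. The sketch below does this with `jupyter nbconvert --to notebook --execute --inplace`. It assumes Jupyter and nbconvert are installed in the Poetry environment, which the README does not confirm. The function names are hypothetical. Note that `--inplace` overwrites each notebook with its freshly computed outputs.

```python
import re
import subprocess
from pathlib import Path
from typing import List

# Matches names such as 00_functions.ipynb or 10_catboost_with_hyperparameter_tuning.ipynb.
NOTEBOOK_PATTERN = re.compile(r"^(\d+)_.*\.ipynb$")


def ordered_notebooks(notebook_dir: Path) -> List[Path]:
    """Return notebooks with a numeric prefix, sorted by that prefix as an integer."""
    numbered = []
    for path in notebook_dir.iterdir():
        match = NOTEBOOK_PATTERN.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


def run_all(notebook_dir: Path) -> None:
    """Execute each notebook in place, stopping at the first one that fails."""
    for path in ordered_notebooks(notebook_dir):
        subprocess.run(
            ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(path)],
            check=True,  # raise CalledProcessError if a notebook errors
        )
```

Saved as a hypothetical run_notebooks.py that calls `run_all(Path("notebooks"))`, it could be launched with `poetry run python run_notebooks.py` from the repository root. Sorting by the integer prefix rather than the raw filename keeps the order correct even if a prefix is not zero-padded.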

Contributing

Contributions to this project are welcome! If you find any issues or have ideas for improvements, please open an issue or submit a pull request.

Contributors

llctrautmann