kaggle_expedia's Introduction

ML_Expedia

The data was retrieved from Kaggle and is available at the link below.

Note: train.csv is too big to upload. The data in this repo is a reduced version that contains only 10 hotel cluster IDs.

https://www.kaggle.com/c/expedia-hotel-recommendations/data

Group Members

  • Mikio Tada
  • Dillon Quan
  • Shrikar Thodla

Project Summary

The goal of this project was to predict which hotel cluster a user will book, based on their behavior on the website and other available information. Accurate predictions can help Expedia provide personalized hotel recommendations, improving users' vacation-planning experience and ultimately increasing Expedia's revenue.

We reduced the original dataset, which covers hotel cluster IDs 0-99 (37 million samples), to only hotel cluster IDs 0-9 (4 million samples). For a more efficient workflow, we then randomly sampled 1 million rows to keep runtime low. The models we used for this project are Random Forest, Decision Tree, Multiclass Logistic Regression, and k-nearest neighbors. For this problem, precision is the north star metric because we want the model's hotel cluster predictions to be as precise as possible.
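The reduction described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes pandas, the `hotel_cluster` column from the Kaggle data, and a small random toy frame standing in for train.csv. The helper name `reduce_clusters` and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def reduce_clusters(df, max_cluster=9, n_samples=None, seed=0):
    """Keep rows with hotel_cluster in 0..max_cluster, then optionally subsample.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    reduced = df[df["hotel_cluster"].between(0, max_cluster)]
    # Subsample only when a size is requested and enough rows remain.
    if n_samples is not None and n_samples < len(reduced):
        reduced = reduced.sample(n=n_samples, random_state=seed)
    return reduced.reset_index(drop=True)


# Toy stand-in for train.csv (the real file has about 37 million rows).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "user_id": rng.integers(0, 1000, size=5000),
    "hotel_cluster": rng.integers(0, 100, size=5000),
})

small = reduce_clusters(toy, max_cluster=9, n_samples=200)
print(small["hotel_cluster"].value_counts().sort_index())
```

On the real data, the same call with `n_samples=1_000_000` would correspond to the 1-million-row sample, assuming the full file fits in memory or is read in chunks first.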

After running the four models with default hyperparameters, we found that Decision Tree and Random Forest were good baseline models to improve. After tuning the hyperparameters of these two models, we ran them on the 4-million-row dataset and achieved a weighted average precision of around 63% for both Random Forest and Decision Tree.
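A baseline comparison and tuning step of this kind might look like the sketch below. It assumes scikit-learn (the README does not name the library), uses synthetic data in place of the Expedia features, and tunes only a hypothetical `max_depth` grid. The scores it prints say nothing about the 63% result above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 10-class data standing in for the reduced Expedia dataset.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10,
    n_classes=10, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four model families named in the summary, near their defaults.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    # max_iter is raised only so the solver converges on this toy data.
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Weighted average precision: per-class precision weighted by class support.
    scores[name] = precision_score(
        y_test, pred, average="weighted", zero_division=0
    )
    print(f"{name}: {scores[name]:.3f}")

# Illustrative tuning of one baseline; the grid values are assumptions.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [5, 10, None]},
    scoring="precision_weighted",
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_)
```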

From a business point of view, a weighted average precision of around 63% means that, averaged over hotel cluster IDs 0-9 and weighted by how often each cluster occurs, about 63% of the model's predictions for a cluster are correct. Our model does a better job than randomly predicting a hotel cluster. However, whether it is worth productizing depends on how well Expedia's current model performs. One advantage of our model is how quickly it returns a result: once trained, Random Forest and Decision Tree produce a predicted hotel cluster quickly, so from a user experience perspective, a user does not have to wait long to see the predicted hotel cluster.
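To make the metric concrete, here is a small worked example of weighted average precision in plain Python. The labels are made up for illustration, and the function `weighted_precision` is a hypothetical helper that mirrors the usual definition: per-class precision, weighted by the number of true samples in each class.

```python
from collections import Counter


def weighted_precision(y_true, y_pred):
    """Per-class precision averaged with weights equal to class support."""
    support = Counter(y_true)    # true samples per class
    predicted = Counter(y_pred)  # predictions made per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # Precision is taken as 0 when a class is never predicted.
        precision = correct[label] / predicted[label] if predicted[label] else 0.0
        score += precision * n_true / total
    return score


# Made-up labels for three clusters.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

# Cluster 0: 2 of 2 predictions correct (1.0), support 3
# Cluster 1: 1 of 2 predictions correct (0.5), support 2
# Cluster 2: 1 of 2 predictions correct (0.5), support 1
result = weighted_precision(y_true, y_pred)
print(result)  # (1.0*3 + 0.5*2 + 0.5*1) / 6 = 0.75
```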
