Booking-Prediction-EDA

======================

Business Understanding and Set-up

Analysis of the booking data of a hotel chain (hotel cooperation) with five properties in different locations.

DataSet

The dataset contains bookings made in five different hotels of one hotel chain. Each row corresponds to one person, so a double room usually appears as two rows with the same booking number (two such rows could also be two single rooms). The customer ID is likewise assigned per person and shows whether a customer has booked at several hotels or more than once at the same hotel.
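
Because the real data cannot be published, the row-per-person layout described above can only be illustrated on toy data. The following is a minimal sketch, assuming hypothetical column names (`booking_no`, `customer_id`, `hotel`) that stand in for the real features; it is not taken from the project notebooks.

```python
import pandas as pd

# Toy data: one row per person. Column names and values are hypothetical.
bookings = pd.DataFrame({
    "booking_no":  [1001, 1001, 1002, 1003, 1004],
    "customer_id": ["A", "B", "A", "C", "A"],
    "hotel":       ["H1", "H1", "H1", "H2", "H2"],
})

# Number of persons travelling under each booking number
# (a double room typically shows up as 2).
persons_per_booking = bookings.groupby("booking_no")["customer_id"].nunique()

# Per customer: how many distinct bookings and how many distinct hotels.
per_customer = bookings.groupby("customer_id").agg(
    n_bookings=("booking_no", "nunique"),
    n_hotels=("hotel", "nunique"),
)

# A simple working definition of a repeating customer: more than one booking.
per_customer["is_repeater"] = per_customer["n_bookings"] > 1
print(per_customer)
```

Counting distinct booking numbers rather than rows matters here, since a single booking can span several rows.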

Dimensions:

  • 98 features (57 original + 41 new)
  • 245k assignments

Goal

  • Help identify repeating customers, as well as the features used for prediction.
  • Recognise guests who are willing to spend above-average amounts of money (called VIPs).
  • Predict in which quarter and at which destination a customer will book their next stay.
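
One possible reading of "above-average spending" is sketched below: total spend per customer is compared with the mean across customers. The column names (`customer_id`, `amount`) and the threshold are assumptions for illustration; the notebooks may define VIPs differently.

```python
import pandas as pd

# Toy spend records; names and amounts are hypothetical.
spend = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "D"],
    "amount":      [200.0, 300.0, 100.0, 150.0, 50.0],
})

# Total spend per customer: A=500, B=100, C=150, D=50.
total_per_customer = spend.groupby("customer_id")["amount"].sum()

# Flag customers whose total spend exceeds the mean across customers (200).
is_vip = total_per_customer > total_per_customer.mean()
print(is_vip)
```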

Final Deliverables

  • Jupyter notebook following PEP8 designed for data science / technical audience.
  • Slide deck (pdf / 10min presentation) pushed to GitHub designed for non-technical stakeholders outlining findings and recommendations, as well as future work.

Key Questions

  • How to identify repeating customers?
  • What distinguishes VIPs from the other guests?
  • What do repeating customers/ VIPs have in common?

Step by step

This notebook:

  • Merging the three datasets
  • Cleaning: handling missing values, inconsistency checks, changing variable types
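
The merging and cleaning steps listed above could look roughly like the following sketch. The three toy tables, their keys, and all column names are hypothetical, since the README does not describe the actual source tables.

```python
import pandas as pd

# Three toy tables; names, keys, and columns are hypothetical.
bookings = pd.DataFrame({
    "booking_no":  [1, 2, 3],
    "customer_id": [10, 11, 10],
    "hotel_id":    [1, 2, 1],
    "arrival":     ["2019-01-05", "2019-02-10", "not a date"],
    "price":       ["120.5", None, "99"],
})
customers = pd.DataFrame({"customer_id": [10, 11], "country": ["AT", None]})
hotels = pd.DataFrame({"hotel_id": [1, 2], "city": ["X", "Y"]})

# Merge: left joins keep every booking row; validate raises if a key
# is duplicated on the right-hand side.
df = (
    bookings
    .merge(customers, on="customer_id", how="left", validate="many_to_one")
    .merge(hotels, on="hotel_id", how="left", validate="many_to_one")
)

# Inconsistency check: the merges must not add or drop rows.
assert len(df) == len(bookings)

# Change variable types; unparseable entries become NaT / NaN.
df["arrival"] = pd.to_datetime(df["arrival"], errors="coerce")
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["city"] = df["city"].astype("category")

# Handle missing values: a placeholder label for categories,
# the median for the numeric column.
df["country"] = df["country"].fillna("unknown")
df["price"] = df["price"].fillna(df["price"].median())
print(df.dtypes)
```

Whether to impute, flag, or drop missing values depends on the feature; the median fill here is only one option.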

Next notebooks:

  • Feature engineering
  • EDA
  • Basic and advanced modelling
  • Time series
  • Conclusion

Outcome/TakeAways

  • The columns reise_special_event and flag_old have no value for the analysis.
  • The columns lkz and sprache_deutsch have low value for the analysis.
  • Guests of the Viana do Castelo property show different booking behaviour from the rest.
  • Guests from Linz and Düsseldorf are very similar.
  • AdaBoost is best suited for forecasting regular guests.
  • To identify solvent customers, a logistic regression with dummy variables delivered the best result.
  • Hotels should focus on winning customers over to book less via travel agencies.
  • An advertising ban does not have a negative impact on follow-up bookings.
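
The two model types named above can be sketched with scikit-learn as follows. The data is synthetic and the features (`booking_channel`, `nights`), the target rule, and all hyperparameters are invented for illustration; they do not reproduce the project's models or results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Synthetic features; columns and relationships are hypothetical.
X = pd.DataFrame({
    "booking_channel": rng.choice(["direct", "agency", "online"], size=n),
    "nights": rng.integers(1, 10, size=n),
})
# Invented target: direct bookers with longer stays count as positives.
y = ((X["booking_channel"] == "direct") & (X["nights"] > 3)).astype(int)

# Dummy (one-hot) encoding of the categorical column; the first level
# is dropped to avoid a redundant column.
X_enc = pd.get_dummies(X, columns=["booking_channel"], drop_first=True, dtype=int)

X_train, X_test, y_train, y_test = train_test_split(
    X_enc, y, test_size=0.25, random_state=0, stratify=y
)

# Boosted decision stumps and a logistic regression on the same features.
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

ada_acc = ada.score(X_test, y_test)
logreg_acc = logreg.score(X_test, y_test)
print(f"AdaBoost accuracy: {ada_acc:.2f}, logistic regression accuracy: {logreg_acc:.2f}")
```

Accuracy is used only to keep the sketch short; with imbalanced classes such as repeaters or VIPs, metrics like precision, recall, or ROC AUC are usually more informative.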

To-Do-List / Outlook

  • Add further information, such as:
  • Events at the destination
  • Cancellations
  • Revenue per available room (required capacity for each destination)
  • Feedback from TripAdvisor, Yelp, etc., as well as from internal hotel surveys

Files and Folders

  • The dataset cannot be published due to an NDA
  • data/glossary.xlsx: list of all features
  • slides/slides.pdf: slide deck visualising the findings (10min.)
  • 1-Cleaning.ipynb: Data cleansing for subsequent EDA
  • 2-EDA.ipynb: Jupyter notebook with exploratory data analysis (EDA), visualizations, and further documentation
  • 3-Modelling_Repeater.ipynb: predicting repeating customers
  • 4-Modelling_VIPs.ipynb: predicting solvent guests

Python Modules used:

pandas / NumPy / Matplotlib / seaborn / scikit-learn

License

This code is licensed under the GNU General Public License v3.0. For more details, please take a look at the LICENSE file.
