
Data-Science-and-Big-Data-Analytics

Final project for the NCCU 1052 Data Science and Big Data Analytics course.
The goal is to complete a machine learning challenge hosted by Kaggle in conjunction with Expedia: Expedia-personalized-sort.

 Members

  1. Liu Z.Y.
  2. Tammykan
  3. Cwsu

Features I chose

  1. visitor_hist_starrating
  2. prop_starrating
  3. prop_review_score
  4. prop_brand_bool
  5. prop_location_score1
  6. prop_location_score2
  7. promotion_flag
  8. orig_destination_distance

Target I predict

  1. booking_bool
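To make the feature list concrete, here is a minimal stdlib sketch that turns CSV rows into a (label, features) pair, using the eight columns above as features and booking_bool as the label. It assumes the file still has its Kaggle header row and that missing values appear as `NULL` or empty cells; mapping them to 0.0 is an illustrative choice, not necessarily what Main.py does. The names `parse_rows` and `to_float` are hypothetical.

```python
import csv
import io

# The eight feature columns listed above, in order, and the prediction target.
FEATURES = [
    "visitor_hist_starrating",
    "prop_starrating",
    "prop_review_score",
    "prop_brand_bool",
    "prop_location_score1",
    "prop_location_score2",
    "promotion_flag",
    "orig_destination_distance",
]
TARGET = "booking_bool"


def to_float(value, default=0.0):
    """Convert a CSV cell to float; cells assumed to mean "missing" ("NULL" or empty) become `default`."""
    if value in ("", "NULL"):
        return default
    return float(value)


def parse_rows(csv_text):
    """Yield (label, features) tuples from CSV text that starts with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        label = to_float(row[TARGET])
        features = [to_float(row[name]) for name in FEATURES]
        yield label, features
```

In a Spark job, each pair could then be wrapped in `pyspark.mllib.regression.LabeledPoint(label, features)` before training.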

Preprocessing

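After downloading the data, I used awk to separate the booked rows (booking_bool = 1, the 54th column) from the non-booked rows (booking_bool = 0), then kept the same number of rows from each group so the training set is balanced: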

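    # awk substr() is 1-indexed; split rows by the first character of column 54 (booking_bool)
    awk -F "," 'substr($54,1,1)=="1"' data.csv > book.csv
    awk -F "," 'substr($54,1,1)=="0"' data.csv > no_book.csv
    # keep as many non-booking rows as there are booking rows
    tail -n "$(wc -l < book.csv)" no_book.csv > no_book_tmp.csv
    cat book.csv no_book_tmp.csv > data.csv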
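The step above is undersampling of the majority class. A minimal Python sketch of the same idea follows; unlike the awk pipeline, which takes the last rows of no_book.csv with `tail`, it draws the non-booking rows at random, which is a swapped-in choice and not what the pipeline does. The function name, its `label_of` parameter, and the seed are hypothetical.

```python
import random


def balance_by_undersampling(rows, label_of, seed=0):
    """Return every positive row plus an equal-size random sample of negative rows.

    `label_of` maps a row to its booking_bool value (1 or 0).
    """
    positives = [r for r in rows if label_of(r) == 1]
    negatives = [r for r in rows if label_of(r) == 0]
    rng = random.Random(seed)
    # random.sample draws without replacement; cap the size at the negatives available.
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    balanced = positives + sampled
    # Shuffle so positives and negatives are interleaved.
    rng.shuffle(balanced)
    return balanced
```

Random sampling avoids any ordering effect in the file (for example, if rows are grouped by search id), at the cost of needing all rows in memory.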

Run

Use Spark to run the Python script:

    spark-submit --master local ./Main.py

Algorithm

I use RandomForest.trainClassifier from Spark MLlib. Because the data set is balanced, the null model (guessing) has an accuracy of 1/2.
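On a balanced data set, any prediction that ignores the features is right about half the time. This small stdlib sketch (hypothetical helper names) checks that for both a constant guess and a uniform coin-flip guess.

```python
import random


def constant_accuracy(labels, guess):
    """Accuracy of always predicting the same class."""
    return sum(1 for y in labels if y == guess) / len(labels)


def random_guess_accuracy(labels, trials=1000, seed=0):
    """Average accuracy over `trials` rounds of predicting 0 or 1 uniformly at random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        hits = sum(1 for y in labels if rng.randint(0, 1) == y)
        total += hits / len(labels)
    return total / trials
```

For labels with 50 ones and 50 zeros, both helpers return 0.5 or very close to it, so the reported accuracy should be read against that baseline.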

Result

  1. Accuracy: 77.89%
  2. Area under Precision/Recall (PR) curve: 87%
  3. Area under ROC curve: 78.251%
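The README does not show how these metrics were computed. As a plain-Python illustration (not Spark's implementation), accuracy and the area under the ROC curve can be computed from true labels and predicted scores as below. The ROC area uses the pairwise rank definition, which is quadratic in the number of rows and only meant for small examples; the function names are hypothetical.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that equal the true label."""
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a random positive row
    scores higher than a random negative row, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With hard 0/1 predictions as scores, the ROC area reduces to the mean of the true positive and true negative rates, which equals the accuracy when both classes have the same number of rows.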

Reference

  1. Spark documentation
  2. Personalize Expedia Hotel Searches - ICDM 2013
  3. benhamner
