

Winning the Data Mining Cup 2015

Our team of students from Humboldt-Universität zu Berlin (HU Berlin), taking Professor Lessmann's course "Applied Predictive Analytics", won the international machine learning competition Data Mining Cup (DMC) 2015.

DMC 2015 winners!

The Competition

A short overview of the competition task and the datasets can be found on the competition website.

  • Setting: Coupons in online shops
  • Questions:
    • “Who responds to coupons?”
    • “What is the impact on the basket value?”
  • Data: historical order data from an online shop
    • Originally only around 6k observations (augmented to about 20k by us).
    • Original/raw features (28):
      • order ID, order time, user ID
      • product data (price, category, product line, premium product?, ...)
      • coupon data (ID, time of generation)
      • An overview of all provided features was part of the task description (link).
    • Targets (4):
      • 3 different coupons (binary classification: redeemed or not?)
      • shopping basket value (regression)

Evaluation

The predictions were evaluated using a custom evaluation function:

DMC 2015 evaluation function

This function looked harmless at first but had many implications:

  • 3 parts of the sum relate to the coupons, 1 to the basket value
  • Errors in the coupon predictions are weighted inversely to that coupon's average redemption rate. (E.g. errors for a coupon that is redeemed 20 % of the time are much more costly than errors for a coupon that is redeemed 70 % of the time.)
  • It seemed important to detect very large basket value outliers in the test data since these had a potentially huge impact. (This is where our hand-crafted econometric models outperformed the machine learning models and might have given us the edge over teams only applying standard machine learning tools.)
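To make the inverse weighting concrete, here is a minimal sketch of an error function in the spirit of the one shown above. The exact formula is only given in the figure, so this sketch assumes that each target contributes the sum of squared deviations divided by that target's mean over the actual values; the function names are hypothetical and not taken from the competition or our code.

```python
from typing import List, Sequence, Tuple


def normalized_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Sum of squared deviations, each scaled by the mean of the actual values.

    Assumption: squared errors normalized by the target mean (the official
    formula is only available as an image). Assumes the mean is non-zero.
    """
    mean = sum(actual) / len(actual)
    return sum(((a - p) / mean) ** 2 for a, p in zip(actual, predicted))


def total_error(targets: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Sum the normalized error over all targets (3 coupons + basket value)."""
    return sum(normalized_squared_error(a, p) for a, p in targets)


# One wrong prediction for a coupon redeemed 20 % of the time ...
coupon_rare = ([1, 0, 0, 0, 0], [0, 0, 0, 0, 0])
# ... versus one wrong prediction for a coupon redeemed 70 % of the time.
coupon_common = ([1, 1, 1, 1, 1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 1, 1, 1, 0, 0, 0])

print(round(normalized_squared_error(*coupon_rare), 2))    # 25.0
print(round(normalized_squared_error(*coupon_common), 2))  # 2.04
```

Under this assumption, the same single misclassification costs more than ten times as much for the rarely redeemed coupon, and a large basket value outlier inflates the basket term in the same way relative to the average basket value.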

Our Approach

We believe our success in this competition can be ascribed to:

  • Great guidance throughout the process by Professor Lessmann (general advice only, as this was a student-only competition).
  • Loose division of responsibilities: this enabled parts of the team to start training models right away while new data sets kept coming from the feature engineering team and everyone gained more insight into the specifics of the task.
  • A very deep dive into the data and extensive feature engineering.
  • Training lots of machine learning models.
  • Forecast combination of machine learning and econometric approaches using hold-out data.
  • A good portion of luck. :)
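The forecast combination step can be illustrated with a simple blend of two models whose weight is chosen on hold-out data. This is a minimal sketch, not our actual combination procedure (see the report for that): it assumes a linear blend of a machine learning forecast and an econometric forecast, a grid search over the weight, and mean squared error as the hold-out criterion. All names are hypothetical.

```python
from typing import List, Sequence


def blend(pred_a: Sequence[float], pred_b: Sequence[float], w: float) -> List[float]:
    """Linear combination: w * model A + (1 - w) * model B."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error (assumed hold-out criterion for this sketch)."""
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def best_blend_weight(actual: Sequence[float], pred_a: Sequence[float],
                      pred_b: Sequence[float], steps: int = 101) -> float:
    """Grid-search the blend weight in [0, 1] that minimizes hold-out error."""
    weights = [i / (steps - 1) for i in range(steps)]
    return min(weights, key=lambda w: mse(actual, blend(pred_a, pred_b, w)))


# Hold-out basket values and two models that err in opposite directions.
holdout = [10.0, 20.0, 30.0]
ml_forecast = [11.0, 21.0, 31.0]           # hypothetical ML model predictions
econometric_forecast = [9.0, 19.0, 29.0]   # hypothetical econometric predictions

print(best_blend_weight(holdout, ml_forecast, econometric_forecast))  # 0.5
```

Fitting the weight on hold-out data rather than on the training data keeps the combination from simply favoring whichever model overfits the training set most.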

More details can be found in our report and short slide deck.

Final Ranking

  1. HU Berlin 2 (Germany)
  2. Iowa State 2 (USA)
  3. Iowa State 1 (USA)
  4. École Polytechnique Fédérale De Lausanne (Switzerland)
  5. École Polytechnique Fédérale De Lausanne (Switzerland)
  6. HU Berlin 1 (Germany)
  7. Gadjah-Mada U (Indonesia)
  8. U Marburg (Germany)
  9. TU Dortmund (Germany)
  10. KIT (Germany)
