
instacart-market-basket-analysis's Introduction


*The score is very close to the competition winner's score, i.e. 0.409. Competition Leaderboard

  • Tech and Algo used: Auto-Encoder, Logistic Regression, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, Feature Engineering, Python, TensorFlow, Pandas, Sklearn, Matplotlib, Seaborn, Plotly.

Problem Overview:

Goal: Predict which products an Instacart consumer will purchase again.

  1. Instacart is a grocery ordering and delivery app.
  2. Currently they use transactional data to develop models that predict which products a user will buy again, try for the first time, or add to their cart next during a session.
  3. The goal is to predict which previously purchased products will be in a user’s next order.
  4. For each order ID in the test set, we should predict a space-delimited list of product IDs for that order.
  5. Predict an explicit 'None' value for orders with no reordered items.
  6. The data provided contains over 3 million grocery orders.
  7. These orders come from more than 200,000 Instacart users.
  8. For each user, Instacart provided between 4 and 100 of their orders in the dataset, with the sequence of products purchased in each order.
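The submission format described in points 4 and 5 can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `format_submission_row`, the comma-separated row layout, and the sample order and product IDs are all assumptions made for the example.

```python
def format_submission_row(order_id, predicted_product_ids):
    """Build one submission row: the order ID, then a space-delimited product list.

    An empty prediction becomes the explicit string 'None', as the
    problem statement requires for orders with no reordered items.
    """
    if predicted_product_ids:
        products = " ".join(str(pid) for pid in predicted_product_ids)
    else:
        products = "None"
    # Assumed layout: "<order_id>,<products>" (one CSV row per test order).
    return f"{order_id},{products}"


# Hypothetical predictions: order 17 has two reordered products, order 34 has none.
predictions = {17: [13107, 21463], 34: []}
rows = [format_submission_row(oid, pids) for oid, pids in sorted(predictions.items())]
print("\n".join(rows))
```

Writing 'None' explicitly matters because an empty product list and a missing prediction would otherwise be indistinguishable in the output.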

Key Takeaways:

  • Reorder of a product by a user highly depends on the frequency and recency of past purchases.
  • Fruits and vegetables are reordered much more often than any other kind of product.
  • Personal care products are reordered very rarely.
  • Gradient Boosting gave the best result for the dataset.
  • Probability Calibration was needed since the dataset was highly imbalanced.
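To make the calibration point concrete, here is a small sketch of one simple calibration technique, histogram binning, in plain Python. The README does not say which calibration method was actually used, so this is only an assumed stand-in; the function names, the bin count, and the validation scores and labels are hypothetical. With scikit-learn, `CalibratedClassifierCV` would be the more usual route.

```python
def fit_bin_calibrator(scores, labels, n_bins=5):
    """Learn the observed positive rate per score bin from held-out data.

    scores: model outputs in [0, 1]; labels: 0/1 ground truth.
    """
    totals = [0] * n_bins
    positives = [0] * n_bins
    for score, label in zip(scores, labels):
        b = min(int(score * n_bins), n_bins - 1)  # clamp score == 1.0 into the last bin
        totals[b] += 1
        positives[b] += label
    # Empty bins fall back to their midpoint so every score still maps to a value.
    return [
        positives[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def calibrate(score, bin_rates):
    """Replace a raw score with the observed positive rate of its bin."""
    n_bins = len(bin_rates)
    return bin_rates[min(int(score * n_bins), n_bins - 1)]


# Hypothetical held-out scores and labels (mostly negatives, as in an imbalanced set).
val_scores = [0.05, 0.10, 0.15, 0.55, 0.60, 0.65, 0.95]
val_labels = [0, 0, 0, 0, 1, 0, 1]
rates = fit_bin_calibrator(val_scores, val_labels, n_bins=2)
print(rates, calibrate(0.9, rates))
```

The calibrated value is the fraction of positives actually observed among held-out examples with similar scores, which is what makes a probability threshold meaningful when positives are rare.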

Top 10 Feature Engineering:

  1. purchase_weight_order_up: Weight of user-product pair based on frequency of purchase and recency (in orders) of purchase.
  2. reorder_weight_order_up: Weight of user-product pair based on frequency of reorder and recency (in orders) of reorder.
  3. #orders_since_last_purchase_up: No. of orders placed by the user after their last purchase of the given product.
  4. #reorders_in_last_3_orders_up: No. of times the user has reordered the given product in their last 3 orders.
  5. purchase_weight_days_up: Weight of user-product pair based on frequency of purchase and recency (in days) of purchase.
  6. #purchases_in_last_3_orders_up: No. of times the user has purchased the given product in their last 3 orders.
  7. p(reorder|user,product)_up: (#orders where given product was reordered by user) / (Total #orders by user)
  8. p(reorder|product)_p: (#reorders of product p) / (#purchases of product p)
  9. exceed_in_max_lifetime_orders_up: (No. of orders placed by the user after their last purchase of the given product) - (Max no. of orders after which user u purchased product p again in the past).
  10. days_since_last_purchase_up: No. of days passed since the user's last purchase of the given product.
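Three of the features above (items 3, 6 and 7) follow directly from their definitions and can be sketched in plain Python for a single user. This is an illustrative sketch, not the repository's implementation: the function name, the dictionary keys, and the toy order history are assumptions, and a real pipeline would more likely compute these with Pandas group-bys over the full order table.

```python
from collections import defaultdict


def user_product_features(orders):
    """Compute per-product features for one user.

    orders: list of sets of product IDs, oldest order first.
    """
    n_orders = len(orders)
    purchases = defaultdict(int)  # product -> number of orders containing it
    last_idx = {}                 # product -> index of the most recent order containing it
    for idx, basket in enumerate(orders):
        for pid in basket:
            purchases[pid] += 1
            last_idx[pid] = idx

    last3 = orders[-3:]
    features = {}
    for pid, count in purchases.items():
        features[pid] = {
            # Item 3: orders placed after the last purchase of this product.
            "orders_since_last_purchase": n_orders - 1 - last_idx[pid],
            # Item 6: how many of the last 3 orders contain this product.
            "purchases_in_last_3_orders": sum(pid in basket for basket in last3),
            # Item 7: orders where the product was reordered / total orders.
            # The first purchase is not a reorder, hence count - 1.
            "p_reorder_given_user_product": (count - 1) / n_orders,
        }
    return features


# Hypothetical history of four orders for one user.
history = [{"banana", "milk"}, {"banana"}, {"soap"}, {"banana", "milk"}]
feats = user_product_features(history)
print(feats["banana"])
```

The weight features (items 1, 2 and 5) are left out because the README does not give their exact formulas.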

Click here to download the dataset with all the 96 engineered features.

Description of data provided by Kaggle: Link

If you find this helpful, please do star the repo.

