
shopee-product-price-match-guarantee's Introduction

Shopee - Price Match Guarantee: Match products with descriptions and images

Machine learning project


Duke University (MIDS) - Spring 2023


Project Overview

Product matching is a competitive feature among retail platforms: it allows companies to offer products at rates competitive with those of other retailers selling similar products. Many methods combine deep learning and traditional machine learning to analyze image and text information and calculate similarity between products; however, there is little research comparing the effectiveness of integrating multimodal data (product images and descriptions) in this domain (Łukasik et al., 2021). Here, we compare the performance of unimodal and multimodal models. We trained separate models for text (SBERT and DistilBERT) and images (ResNet50 and MobileNet); DistilBERT and ResNet50 achieved the higher F1 scores within their respective modalities. The multimodal model used joint embeddings from DistilBERT and MobileNet to predict product labels and outperformed both unimodal approaches. The integration of product images and titles offers the most useful information for finding product matches on a particular platform.
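As a rough illustration of what "calculating similarity between products" can look like once each product has an embedding vector, here is a minimal sketch using cosine similarity. The function names, the toy vectors, and the 0.8 threshold are all hypothetical and are not taken from the project notebooks, which may compare products differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def find_matches(query, catalog, threshold=0.8):
    """Return ids of catalog items whose similarity to `query` meets `threshold`.

    `threshold` is an illustrative value, not one reported by the project.
    """
    return [
        product_id
        for product_id, embedding in catalog.items()
        if cosine_similarity(query, embedding) >= threshold
    ]


# Toy embeddings (hypothetical); real ones would come from a text or image model.
catalog = {
    "item_a": [1.0, 0.0, 0.0],
    "item_b": [0.9, 0.1, 0.0],
    "item_c": [0.0, 1.0, 0.0],
}
matches = find_matches([1.0, 0.0, 0.0], catalog)
print(matches)
```

In practice the threshold would need to be tuned on a validation set rather than fixed in advance.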

Presentation

Click on the image to watch the presentation

image

Report

Shopee Final Report

Data

Shopee is the leading e-commerce platform in Southeast Asia and Taiwan; their platform contains products from vendors all over the world, predominantly in Singapore and Indonesia. In 2021, the company launched a Kaggle competition aimed at improving product matching algorithms to optimize their customers’ online shopping experience (Dane et al., 2021).

Link to Data

Data Split

Methods

We used the following methods to train our models:

Results

The following table shows the performance of the models trained on the Shopee dataset. Among the text models, DistilBERT matches SBERT on accuracy (0.45) and achieves a higher F1 score (0.48 vs. 0.43); among the image models, ResNet50 outperforms MobileNet on both metrics. The multimodal model used joint embeddings from DistilBERT and MobileNet* to predict product labels and outperformed all four unimodal models on both F1 score and accuracy. The integration of product images and titles offers the most useful information for finding product matches on a particular platform.

Note: Due to computational restrictions, we substituted MobileNet for ResNet50 in the multimodal model.
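The README does not state how the text and image embeddings are combined into a joint embedding. One common option is to L2-normalize each modality and concatenate the results, sketched below. Both the fusion strategy and the names `l2_normalize` and `joint_embedding` are assumptions made for illustration, not the authors' confirmed method.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def joint_embedding(text_emb, image_emb):
    """Concatenate normalized text and image embeddings (assumed fusion strategy)."""
    return l2_normalize(text_emb) + l2_normalize(image_emb)


# Toy vectors (hypothetical); real ones would come from DistilBERT and MobileNet.
text_emb = [3.0, 4.0]
image_emb = [0.0, 2.0, 0.0]
joint = joint_embedding(text_emb, image_emb)
print(len(joint))
```

Normalizing before concatenation keeps one modality from dominating the joint vector simply because its raw embeddings have a larger magnitude.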

Performance on Test Set

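| Model Type   | Model      | F1 Score | Accuracy |
|--------------|------------|----------|----------|
| Text         | SBERT      | 0.43     | 0.45     |
| Text         | DistilBERT | 0.48     | 0.45     |
| Image        | ResNet50   | 0.45     | 0.48     |
| Image        | MobileNet  | 0.38     | 0.40     |
| Text & Image | Multimodal | 0.50     | 0.53     |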
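For readers unfamiliar with the two metrics in the table, the sketch below computes accuracy and a macro-averaged F1 score from true and predicted labels. The README does not say which F1 averaging was used, so macro averaging is an assumption here, and the labels are toy values rather than project data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores (macro averaging is assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical).
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

With many product labels and few examples per label, macro and weighted averaging can give noticeably different F1 values, so the choice is worth stating alongside any reported score.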

Reproducibility

To reproduce our results, please follow the steps below:

  1. Clone the repository
  2. Install the requirements in requirements.txt using pip install -r requirements.txt
  3. If you cannot access data in 00_source_data in this repo, download the data from the Shopee Kaggle competition
  4. Under 10_code, run 01_train_test_split.ipynb to split the data into train, validation and test sets
  5. Under 10_code, run 02_Bert_Model.ipynb to train and use the embeddings from SBERT and DistilBERT
  6. Under 10_code, run 03_ResNet50_Embeddings.ipynb to train and use the embeddings from ResNet50
  7. Under 10_code, run 04_MobileNet_Embeddings.ipynb to train and use the embeddings from MobileNet
  8. Under 10_code, run 05_Multimodal_Model_Embeddings.ipynb to train and use the embeddings from DistilBERT and MobileNet
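Step 4 above splits the data into train, validation and test sets. The actual logic lives in 01_train_test_split.ipynb and is not reproduced here; the following is only a minimal sketch of a seeded random split. The 70/15/15 proportions, the seed, and the `posting_{i}` identifiers are hypothetical.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle `items` with a fixed seed and cut them into three disjoint lists.

    The default fractions and seed are illustrative assumptions.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical row identifiers standing in for rows of the Shopee data.
rows = [f"posting_{i}" for i in range(100)]
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

A purely random split like this one ignores product labels; if the notebook stratifies or groups by label, its splits will differ from what this sketch produces.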

shopee-product-price-match-guarantee's People

Contributors

alisa0705, ishasingh01, sanil72900, yer1k


shopee-product-price-match-guarantee's Issues

Rules for the Project Repo

Revised on April 13th, 2023

As with our other projects, please follow the agreed guidelines below:

  1. No direct changes to the Main branch

    • Instead, create a new branch with an INFORMATIVE branch name; for example, "EDA", "regression_model", etc.
    • When you are happy with the branch and want to merge it into Main, please open a PR (Pull Request) and add team members as reviewers.
  2. Use Issues for discussion

    • If you have found something worth discussing, please use issues and tag other team members, so that everyone is on the same page; for instance, @sanil72900, @IshaSingh01, @alisa0705
    • Please leave a comment or emoji after you have read this Issue, thanks.
    • Try describing the problems in more detail. Please do not use issues as a place to take notes or track progress.
    • REMEMBER, you can always discuss in the Slack group and use the progress tracker (I know we just created the progress tracker
  3. More to come
