
Product Matching for E-Commerce using Deep Learning

Deployment stopped due to GCP charges.

Deployed web app (currently not working): http://product-matching-webapp.el.r.appspot.com/

E-commerce has seen an incredible surge in users over the past few years, and the transition has been further accelerated by the COVID-19 pandemic. For e-commerce companies, it has therefore become increasingly important to provide high-quality search results and recommendations. With millions of third-party sellers operating on their websites, distinguishing between products has become increasingly difficult.

The goal of this project is to develop an efficient strategy for finding similar products on an e-commerce website by utilizing each product's image and text label.

A few examples:

Example 1:

example 1.1

example 1.2

example 1.3

Example 2:

example 2.1

example 2.2

example 2.3

Structure of Web App

Web app structure

Why don't we just compare image and text directly?

Each image cannot be compared one by one against the whole image dataset: that approach would be computationally expensive and excessively time-consuming given the sheer number of images. Comparing texts directly also may not give the desired outcome.

Hence, fine-tuned pre-trained CNN models are used to generate image embeddings, and a similar approach converts the text data into word embeddings using a TfidfVectorizer and a Transformer. This approach produces an average F1 score of 0.87, compared to a baseline score of 0.55.

What are Embeddings?

Embeddings are vector representations of data, formed by converting high-dimensional data (images, text, sound files, etc.) into relatively low-dimensional vectors. They make it easier to perform machine learning on large inputs. More Information

The dataset was provided by Shopee, a Singaporean multinational technology company focused mainly on e-commerce, from its Indonesian division for a data science competition.

Approach Utilized

Image-based strategy

Rather than creating our own model for embedding generation, the best method is to use state-of-the-art image models and fine-tune them on our dataset. Using these pre-trained models without any fine-tuning provides an average result (average F1 score of 0.59), whereas the fine-tuned models perform much better (average F1 score of 0.73). The image below represents the model used to generate image embeddings.

Image Model


The fine-tuning process is borrowed from facial recognition systems: an ArcFace margin layer is used in place of the softmax layer during fine-tuning.

Advantage of ArcFace Layer

Unlike softmax, ArcFace explicitly optimizes the feature embeddings to enforce higher similarity between samples of the same class, which in turn leads to higher-quality embeddings.

Softmax vs Arcface Margin
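The core of the ArcFace idea can be sketched in a few lines of NumPy: an angular margin is added between each sample and its ground-truth class centre before scaling, so the network must push same-class embeddings closer together to recover the same logit. The scale and margin values below are common defaults, not values taken from this project.

```python
import numpy as np

def arcface_logits(embeddings, class_weights, labels, s=30.0, m=0.5):
    """ArcFace-style logits: add an angular margin m to the angle between
    each sample and its ground-truth class centre, then scale by s."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)      # cosine to every class centre
    theta = np.arccos(cos)
    target = np.zeros_like(cos, dtype=bool)
    target[np.arange(len(labels)), labels] = True
    # Penalize only the ground-truth class, forcing tighter clusters.
    return s * np.where(target, np.cos(theta + m), cos)
```

Non-target classes keep their plain scaled cosine, so the margin only makes the correct class harder to score, which is what drives the tighter clustering.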


After embedding generation, the goal is to generate accurate predictions using the k-nearest-neighbours algorithm and cosine similarity. Due to the large amount of input data, the sklearn implementation cannot be used, as it leads to an out-of-memory error. Hence the RAPIDS library is used: an open-source framework that accelerates data science by executing end-to-end data science and analytics pipelines entirely on GPUs.
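A minimal sketch of the neighbour search, shown here with scikit-learn for portability; RAPIDS cuML exposes a near-identical `NearestNeighbors` API, so the same code can run on GPU by swapping the import. The threshold value is illustrative, not the project's tuned value.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 32)).astype(np.float32)

# Retrieve the 5 nearest neighbours of every item under cosine distance.
knn = NearestNeighbors(n_neighbors=5, metric="cosine")
knn.fit(embeddings)
distances, indices = knn.kneighbors(embeddings)

# Keep only neighbours closer than a chosen distance threshold.
threshold = 0.3
matches = [idx[d < threshold] for d, idx in zip(distances, indices)]
```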

Image Predictions


Predictions from the different image models are merged using either of the prediction approaches discussed later in this document to generate the final image-based predictions.

Text-based strategy

The product's text label is converted into word embeddings using two different approaches: a TfidfVectorizer and a Sentence Transformer are used to encode every text label.

Text Model


TfidfVectorizer

TF-IDF (term frequency–inverse document frequency) is used to understand the relevance of a word within a document. TfidfVectorizer uses the TF-IDF values to generate word embeddings for each label.
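A minimal sketch of this step with scikit-learn's `TfidfVectorizer`; the product titles below are made-up examples, not data from the Shopee dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

titles = [
    "wireless bluetooth headphones black",
    "bluetooth wireless headphone (black)",
    "stainless steel water bottle 750ml",
]

vectorizer = TfidfVectorizer()
tfidf_embeddings = vectorizer.fit_transform(titles)  # sparse, one row per title

# The two headphone titles share most tokens, so they score far higher
# against each other than against the water-bottle title.
sim = cosine_similarity(tfidf_embeddings)
```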

Sentence Transformers

Sentence Transformers is a framework that provides an easy way to generate vector representations of text using transformer networks such as BERT and RoBERTa. For this application, a pre-trained transformer is used to generate sentence embeddings for finding the semantic similarity between text data.

Text Predictions


After embedding generation, both the TF-IDF and transformer embeddings can be used by either of the prediction approaches for the final prediction calculation.

Prediction generation using Cosine Similarity and the KNearestNeighbour Algorithm + Merging Approach

Cosine Similarity

cosine similarity

Cosine similarity measures the similarity between two vectors by calculating the cosine of the angle between them, i.e. whether the two vectors point in the same direction. To generate the final predictions, a minimum similarity threshold is chosen, and all data points with a similarity value greater than the threshold are taken as predictions. (The higher the similarity value, the closer the relation between data points.)
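The thresholding step above can be sketched directly in NumPy; the threshold value here is illustrative, not the project's tuned value.

```python
import numpy as np

def cosine_matches(embeddings, threshold=0.7):
    """For every row, return the indices whose cosine similarity with
    that row exceeds the threshold (each row always matches itself)."""
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = norm @ norm.T                      # pairwise cosine similarities
    return [np.flatnonzero(row > threshold) for row in sim]

emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
groups = cosine_matches(emb, threshold=0.7)
# rows 0 and 1 point in nearly the same direction and match each other;
# row 2 is orthogonal to both and stands alone
```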

KNearestNeighbour Algorithm

Nearest neighbours is a common algorithm used to find a required number of nearest data points according to a chosen metric. Accurate predictions are obtained by choosing a maximum threshold distance: all data points with a distance less than the threshold are taken as predictions. (The lower the distance, the closer the relation between data points.)

Merging Approach

First Approach

1st approach

Second Approach

2nd approach

The first approach performs slightly better than the second. The first approach generates predictions from the merged embeddings (image and text embeddings combined), whereas the second approach merges independently generated predictions.
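The two merging strategies can be sketched as follows; the per-modality normalization in the first function is an assumption about how the concatenation is balanced, not a detail stated in the README.

```python
import numpy as np

def merge_embeddings(image_emb, text_emb):
    """First approach: normalize each modality, then concatenate, so a
    single neighbour search runs over one combined embedding."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return np.hstack([img, txt])

def merge_predictions(image_preds, text_preds):
    """Second approach: run the search per modality, then take the union
    of the two independent prediction sets for each item."""
    return [sorted(set(i) | set(t)) for i, t in zip(image_preds, text_preds)]
```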

Both prediction methods are implemented using RAPIDS, the open-source library developed by NVIDIA.

Results

Metric Used and How It's Calculated:

The metric used to judge performance is the average F1 score. For each data entry, the F1 score is calculated, and then the mean of all F1 scores is taken. The F1 score measures a test's accuracy and is calculated from the precision and recall of the test.

F1 Score
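The metric described above amounts to a per-row set F1 averaged over all rows, which can be sketched as:

```python
def row_f1(pred, true):
    """F1 for one item: precision/recall over its predicted match set."""
    pred, true = set(pred), set(true)
    tp = len(pred & true)                  # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)

def mean_f1(preds, trues):
    """Average the per-row F1 scores over the whole dataset."""
    scores = [row_f1(p, t) for p, t in zip(preds, trues)]
    return sum(scores) / len(scores)
```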

Setting baseline using pHash

The dataset provides baseline predictions using pHash. A pHash (perceptual hash) is a fingerprint of a multimedia file derived from various features of its content; if two pHashes are 'close' enough, the data points are considered similar.
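A simplified pHash sketch in pure NumPy: take the low-frequency block of a 2-D DCT and threshold it at its median to get a 64-bit hash, then compare hashes by Hamming distance. This assumes the image is already a 32x32 grayscale array (real pipelines resize with PIL or OpenCV first) and is illustrative, not the dataset's exact hashing code.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis; a 2-D DCT is then C @ img @ C.T.
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def phash(gray32):
    """64-bit perceptual hash of a 32x32 grayscale image: keep the 8x8
    low-frequency DCT block and threshold it at its median."""
    c = dct_matrix(32)
    low = (c @ gray32 @ c.T)[:8, :8]
    return (low > np.median(low)).flatten()

def hamming(h1, h2):
    """Number of differing bits; small distance means similar images."""
    return int(np.count_nonzero(h1 != h2))
```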

Baseline Average F1 Score: 0.55
Image Only Score (Fine-tuned CNN): 0.72
Text Only Score: 0.62

Results

Final Merged Score (Image+Text): 0.87

Results

