bairagisaurabh / project-i-recommendation-system-amazon

Recommending women's fashion wear using "Title similarity" and "Nearest Neighbours"

Home Page: https://clothes-recommendation-system.streamlit.app/

Python 0.03% Jupyter Notebook 99.97%
amazon-fashion-discovery-engine data-cleaning data-visualization text-similarity unsupervised-learning item-item-similarity recommendation-system

Amazon Fashion Recommendation System

Online shopping can save time for both the buyer and the retailer by reducing phone calls about availability, specifications, hours of operation, or other information that can easily be found on company and product pages. There are many reasons why customers today prefer shopping online:

  1. Convenience.
  2. Price comparisons.
  3. You can send gifts more easily.
  4. No need to travel.

The model developed in this project is based on 'Item-Item' similarity: products are recommended based on the item the user is currently viewing, without taking the user's previous shopping habits into account.

I have built 2 recommendation models, which use:

  1. Title similarity and cosine distance.
  2. Unsupervised learning algorithm (Nearest Neighbours).

Find more details about the models in the Models section below :)

Acknowledgements

Demo screenshot

Demo

https://bairagisaurabh-project-i-recommendation-system-amazo-app-emd83f.streamlit.app/

Data Overview

The dataset contains 183,138 rows and 19 features. Here, we only consider the features that help us give sensible recommendations:

  1. product_type
  2. formatted_price
  3. color
  4. brand
  5. title
  6. image_url

Data Visualisation Tools

⚪Bar plot (color) ⚪Pie chart (product)

⚪Funnel chart (brand) ⚪PDF (price)

These plots give us an idea of the distribution and structure of the data and of the number of occurrences of each categorical value.
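The bar plot and pie chart are both drawn from per-value counts of a categorical column. A minimal pandas sketch of those counts, on a few hypothetical rows (the column names follow the feature list above):

```python
import pandas as pd

# Hypothetical toy rows standing in for the real dataset.
df = pd.DataFrame({
    "color": ["Black", "White", "Black", "Blue", "Black", "White"],
    "product_type": ["SHIRT", "DRESS", "SHIRT", "SHIRT", "DRESS", "SHIRT"],
})

# Number of products per color: the bar heights of the bar plot.
color_counts = df["color"].value_counts()

# Share of each product type: the slices of the pie chart.
product_share = df["product_type"].value_counts(normalize=True)

print(color_counts)
print(product_share)
# With matplotlib installed, color_counts.plot.bar() and
# product_share.plot.pie() would draw the two charts.
```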

Data Cleaning

⚪Missing Values:

Price: Most values of this feature are missing. Since price is an important feature, we drop the rows where it is missing. Mean imputation would misrepresent the data, because the mean would be computed from only the small fraction of rows that do have a price.
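A minimal pandas sketch of this step, assuming the price is stored in the `formatted_price` column from the feature list and using hypothetical toy rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, two of which have no price.
df = pd.DataFrame({
    "title": ["floral dress", "cotton shirt", "denim jacket", "maxi skirt"],
    "formatted_price": ["$19.99", None, "$45.00", np.nan],
})

# Fraction of rows with a missing price (None and NaN both count).
missing_share = df["formatted_price"].isna().mean()

# Drop those rows instead of imputing a mean from the few known prices.
df_priced = df.dropna(subset=["formatted_price"]).reset_index(drop=True)

print(missing_share)
print(df_priced)
```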

⚪Cleaning categorical features:

  • We remove special characters, numeric values and punctuation.

  • We strip extra whitespace and join multi-word values with an underscore (_).

  • NaN values are filled with the most frequent value (the mode) of that feature.

  • Some 'price' entries are strings rather than numbers; we replace them with 0.
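A minimal sketch of these cleaning steps with pandas and `re`. The helper `clean_category`, the regex, the toy values and the new `price` column are hypothetical illustrations, not the project's own code:

```python
import re

import pandas as pd


def clean_category(value: str) -> str:
    """Keep letters only, lowercase, and join the words with '_' (hypothetical helper)."""
    # Replace digits, punctuation and other special characters with spaces.
    letters_only = re.sub(r"[^A-Za-z\s]", " ", value)
    # split() drops extra whitespace; the remaining words are joined by "_".
    return "_".join(letters_only.lower().split())


# Hypothetical toy rows.
df = pd.DataFrame({
    "color": ["Light Blue", None, "Black", "Black", "Navy-Blue 2"],
    "formatted_price": ["$19.99", "Too low to display", "$5.00", "$7.50", "$12.00"],
})

# Fill NaNs with the most frequent value (the mode) before cleaning.
df["color"] = df["color"].fillna(df["color"].mode()[0])
df["color"] = df["color"].map(clean_category)

# Strip "$" and "," then parse; strings that are not numbers become 0.
df["price"] = pd.to_numeric(
    df["formatted_price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
).fillna(0)

print(df)
```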

⚪Cleaning Text data:

  • Dropping duplicate titles.
  • Removing stopwords, punctuation and numeric values.
  • Applying lemmatization to reduce words to their dictionary form.
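A rough sketch of the title cleaning, assuming scikit-learn's built-in `ENGLISH_STOP_WORDS` as the stopword list (the project's own list may differ). Lemmatization is left out to keep the sketch self-contained; NLTK's `WordNetLemmatizer` is one option, but it needs the WordNet corpus downloaded first. `clean_title` and the sample titles are hypothetical:

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS


def clean_title(title: str) -> str:
    """Lowercase, drop punctuation and digits, then drop stopwords (hypothetical helper)."""
    # Anything that is not a lowercase letter or whitespace becomes a space.
    text = re.sub(r"[^a-z\s]", " ", title.lower())
    return " ".join(word for word in text.split() if word not in ENGLISH_STOP_WORDS)


# Hypothetical titles; the first one appears twice.
titles = pd.Series([
    "The Floral Print Dress, Size 12",
    "The Floral Print Dress, Size 12",
    "A Casual Cotton Blouse for Women",
])

unique_titles = titles.drop_duplicates().reset_index(drop=True)
cleaned = unique_titles.map(clean_title)
print(list(cleaned))
```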

Data preprocessing

⚪ One-hot encoding of categorical features:

               (product_type, color, brand)

⚪ TF-IDF vectorization of text data:

                        Product titles
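One way to build these features with scikit-learn, sketched on hypothetical toy rows: one-hot encode the three categorical columns, TF-IDF vectorize the titles, and stack both sparse matrices side by side so each product becomes a single row vector:

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical, already-cleaned toy rows.
df = pd.DataFrame({
    "product_type": ["dress", "shirt", "dress"],
    "color": ["black", "white", "blue"],
    "brand": ["brand_a", "brand_b", "brand_a"],
    "title": ["floral print dress", "white cotton shirt", "blue maxi dress"],
})

# One column per distinct value of each categorical feature (sparse output).
encoder = OneHotEncoder(handle_unknown="ignore")
categorical = encoder.fit_transform(df[["product_type", "color", "brand"]])

# One column per vocabulary word, weighted by TF-IDF.
tfidf = TfidfVectorizer()
title_vectors = tfidf.fit_transform(df["title"])

# Title columns followed by categorical columns, one row per product.
features = hstack([title_vectors, categorical]).tocsr()
print(features.shape)
```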

⚪ Removing similar titles:

  • To judge title similarity we use the fuzzywuzzy library, which returns a score between 0 and 100 for any two strings.
  • A higher score means the two strings are more similar.
  • We use the token sort ratio, which sorts the words of each string before comparing them, so word order does not affect the score.
  • Fuzzywuzzy library
  • Below we see how this library works!

  • We define a function that returns the indexes of the titles most similar to a given title.
  • We then drop those near-duplicate titles using these indexes.
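A stdlib-only sketch of this de-duplication. `token_sort_ratio` here only approximates fuzzywuzzy's `fuzz.token_sort_ratio` using `difflib` (scores can differ slightly, for example when python-Levenshtein is installed), and `similar_title_indexes`, `remove_similar_titles` and the threshold of 90 are hypothetical choices:

```python
import re
from difflib import SequenceMatcher


def token_sort_ratio(a: str, b: str) -> int:
    """Approximation of fuzzywuzzy's token sort ratio, scored 0-100."""
    def normalise(s: str) -> str:
        # Lowercase, split on non-alphanumerics, sort the words, rejoin.
        tokens = re.sub(r"[^a-z0-9]+", " ", s.lower()).split()
        return " ".join(sorted(tokens))
    return round(100 * SequenceMatcher(None, normalise(a), normalise(b)).ratio())


def similar_title_indexes(titles, query_index, threshold=90):
    """Indexes of the other titles scoring at least `threshold` against the query."""
    query = titles[query_index]
    return [i for i, title in enumerate(titles)
            if i != query_index and token_sort_ratio(query, title) >= threshold]


def remove_similar_titles(titles, threshold=90):
    """Keep the first title of each near-duplicate group; return kept indexes."""
    kept, dropped = [], set()
    for i in range(len(titles)):
        if i in dropped:
            continue
        kept.append(i)
        dropped.update(similar_title_indexes(titles, i, threshold))
    return kept


titles = [
    "Floral Print Maxi Dress",
    "Maxi Dress Floral Print",    # same words, different order
    "Cotton Shirt White",
    "Floral Print Maxi Dresses",  # near-duplicate of the first title
]
print(remove_similar_titles(titles))
```

Comparing every pair of titles is quadratic in the number of titles, so on the full dataset this step is slow; the simple loop trades speed for readability.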

Models

⚪ Model I (Recommendations through title similarity)

  • We vectorize the product titles by applying TF-IDF.
  • We define a function that returns the cosine similarity (between 0 and 1) of two such vectors.
  • For a given title, we compute its cosine similarity with every other title, select the 10 highest values and return their indexes.
  • These 10 indexes are used to fetch the 10 recommended products.
  • We display the results with a function that shows a product image given its URL.
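A minimal sketch of Model I with scikit-learn, on hypothetical already-cleaned titles. `recommend_by_title` is a hypothetical name; it excludes the query product itself and returns up to `top_n` indexes, which could then be mapped to `image_url` values for display:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical, already-cleaned titles.
titles = [
    "floral print maxi dress",
    "floral maxi dress",
    "white cotton shirt",
    "black cotton shirt",
    "blue denim jacket",
]

title_vectors = TfidfVectorizer().fit_transform(titles)


def recommend_by_title(query_index: int, top_n: int = 10) -> list:
    """Indexes of the top_n titles most cosine-similar to titles[query_index]."""
    # Similarity of the query title with every title (including itself).
    scores = cosine_similarity(title_vectors[query_index], title_vectors).ravel()
    # Highest score first; the stable sort keeps ties in their original order.
    ranked = np.argsort(-scores, kind="stable")
    # Skip the query itself before taking the top_n.
    return [int(i) for i in ranked if i != query_index][:top_n]


print(recommend_by_title(0, top_n=3))
```

Because `TfidfVectorizer` L2-normalizes each row by default, the cosine similarity here equals the plain dot product of the two vectors.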

⚪ Model II (Using Nearest Neighbours algorithm)

  • We first train the model on the title feature alone to check whether this algorithm gives good recommendations.
  • Since it does, we then add the other features to train the model.
  • Using the model.kneighbors method, we get the distances to, and the indexes of, the k nearest points, which are our recommendations.
  • We get the image URLs corresponding to these indexes and pass them to the display function defined earlier to show our results.
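A minimal sketch of Model II with scikit-learn's `NearestNeighbors`, trained on title TF-IDF vectors only, as in the first step of this model. `recommend_by_neighbours` and the toy titles are hypothetical; for the full model, the one-hot categorical columns could be stacked next to the TF-IDF columns before calling `fit`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical, already-cleaned titles.
titles = [
    "floral print maxi dress",
    "floral maxi dress",
    "white cotton shirt",
    "black cotton shirt",
    "blue denim jacket",
]
features = TfidfVectorizer().fit_transform(titles)

# Cosine distance = 1 - cosine similarity; with this metric and sparse
# input, scikit-learn falls back to a brute-force search.
model = NearestNeighbors(metric="cosine")
model.fit(features)


def recommend_by_neighbours(query_index: int, k: int = 10) -> list:
    """Indexes of the k nearest titles to titles[query_index], excluding itself."""
    # Ask for one extra neighbour: the query item normally comes back first.
    n = min(k + 1, features.shape[0])
    distances, indexes = model.kneighbors(features[query_index], n_neighbors=n)
    return [int(i) for i in indexes[0] if i != query_index][:k]


print(recommend_by_neighbours(0, k=2))
```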

Results

  • Although Models I and II give roughly similar recommendations, Model II runs much faster than Model I.
  • Below is a snapshot of the execution times:

⚪ For Title Similarity:

⚪ For Nearest Neighbours:

  • Adding more features gives more sensible recommendations, since the model then also takes into account the brand, color and product type of the product currently being viewed.

🛠 Skills

Python, Streamlit, Heroku
