
dhruv16s / clothing-similarity-search


A simple web application that performs NLP tasks: it scrapes product content from a website and provides a custom search interface for finding those products. It uses Selenium, Sentence Transformers and text similarity techniques to scrape, embed and find the most relevant products.

Jupyter Notebook 97.33% Python 1.22% CSS 0.35% HTML 1.09%


Objective

The goal of this project is to create a machine learning model that receives text describing a clothing item and returns a ranked list of links to similar items from different websites. The solution must be a function deployed on Google Cloud that accepts a text string and returns a JSON response containing the ranked suggestions.
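The required interface (text in, ranked JSON out) can be sketched as below. This is a minimal sketch under stated assumptions, not the project's code: it assumes every candidate product has already been scored against the query (for example with cosine similarity), and the function name `suggestions_to_json` and the JSON fields `query`, `results`, `rank`, `link` and `score` are hypothetical.

```python
from __future__ import annotations

import json


def suggestions_to_json(query: str, scored: list[tuple[str, float]], top_k: int = 5) -> str:
    """Sort (link, score) pairs by descending score and serialise the top_k as JSON.

    Hypothetical response shape; the field names are illustrative only.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
    payload = {
        "query": query,
        "results": [
            {"rank": i, "link": link, "score": round(score, 4)}
            for i, (link, score) in enumerate(ranked, start=1)
        ],
    }
    return json.dumps(payload)


# Example with placeholder links.
print(suggestions_to_json(
    "red party dress",
    [("https://example.com/skirt", 0.42), ("https://example.com/dress", 0.91)],
    top_k=1,
))
# {"query": "red party dress", "results": [{"rank": 1, "link": "https://example.com/dress", "score": 0.91}]}
```

Keeping the scoring step outside this function means the same serialisation works whether the scores come from a local script, a Flask route or a cloud function.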

Note: The solution was not deployed on Google Cloud because of issues with containerizing the application and creating Docker images. Any help in this regard would be greatly appreciated.

1. Introduction

For the purposes of this project, all product information was scraped from the ASOS website. More specifically, products were retrieved from the Men and Women categories and included subcategories such as shoes, t-shirts, shirts, party dresses, trainers, skirts, and accessories such as sunglasses and watches.

2. Project Structure

./data: Contains the relevant .csv files, including the scraped data, the preprocessed data and the text embeddings.
./jupyter-files: Contains the relevant Jupyter notebooks.
./static and ./template: Supporting folders containing the HTML and CSS code for the Flask application.
app.py: The executable Flask application.

3. Installation

  1. Clone this repository:
git clone https://github.com/Dhruv16S/Clothing-Similarity-Search.git

  2. Make sure ChromeDriver is installed for your Chrome version so that Selenium works properly. Follow this link to install the suitable version of ChromeDriver.

  3. Install the required dependencies:

pip install -r requirements.txt

4. Project Components

  1. Web Scraping: Web scraping was implemented using Selenium. For further details, refer to the Jupyter notebook web-scraping.ipynb in the jupyter-files directory.

  2. Text Preprocessing: Text preprocessing was performed using the nltk library. The source code can be found in preprocesing.ipynb.

  3. Text Embedding: Text embeddings were generated using the Sentence Transformers library; the resulting embeddings are stored in the data directory as embeddings.csv.

  4. Text Similarity: Cosine similarity was used to compare the embedding of the input text with the embeddings of the available products.

  5. Deployment and Usage: After cloning the repository and installing the necessary packages, run the following command from the root of the cloned repository:

    python app.py 
    

    The application will then be available locally at:

    localhost:5000
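The preprocessing and similarity steps listed above (components 2 and 4) can be sketched with the standard library alone. This is a minimal, hypothetical sketch rather than the project's code: the small `STOP_WORDS` set stands in for NLTK's stop-word list, the toy three-dimensional vectors and example.com links stand in for the Sentence Transformer embeddings stored in embeddings.csv, and the helper names `preprocess`, `cosine_similarity` and `rank_products` are illustrative.

```python
from __future__ import annotations

import math
import re

# Hypothetical stop-word set: a stand-in for the NLTK preprocessing used in the
# project, kept tiny so this sketch runs with the standard library only.
STOP_WORDS = {"a", "an", "and", "the", "for", "with", "of", "in", "on", "to"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split on non-alphanumeric characters and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_products(query_vec: list[float], product_vecs: dict[str, list[float]],
                  top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (link, score) pairs sorted by descending cosine similarity."""
    scored = [(link, cosine_similarity(query_vec, vec)) for link, vec in product_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Example with toy vectors (placeholders for real sentence embeddings).
products = {
    "https://example.com/red-dress": [0.9, 0.1, 0.0],
    "https://example.com/black-trainers": [0.0, 0.2, 0.9],
    "https://example.com/red-skirt": [0.7, 0.3, 0.1],
}
print(preprocess("A Red Party-Dress for the Summer!"))  # ['red', 'party', 'dress', 'summer']
print(rank_products([1.0, 0.0, 0.0], products, top_k=2))
```

If the product embeddings are normalised to unit length once when they are stored, cosine similarity reduces to a plain dot product, which saves work on every query against a large catalogue.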
    

As mentioned above, the project was not deployed on Google Cloud because of issues with containerization and Docker images. The project is live at the following URL: Clothing Similarity Search.

Clothing.Similarity.Search.mp4
