

Make It Count - A Fashion Recommender


Julia Zimpel

Full-Time Data Analytics 2020-08, Campus Berlin, 09 Oct. 2020

Repository: (https://github.com/Yulizzz123/final_project)

Content

1. Project Description

In today's fast-moving world of brief real-life encounters, fashion becomes all the more crucial for presenting oneself and leaving a lasting impression on other people. Covid-19 has heightened this situation: leaving the home has become severely restricted, and dressing up is a rare chance to 'make it count'.

At the same time, general interest in street fashion over haute couture is rising. At the end of 2019, Amazon released the app 'StyleSnap', which allows people to take pictures of fashion pieces they like and receive purchase recommendations from the online shop.

This project seeks to develop a model that recommends fashion for women after having classified an input image according to clothing type and color. The following rules are applied:

| Input Image | Recommended |
|---|---|
| Top | Skirt |
| Skirt | Top |
| Dress | Dress |

Since many women do not wish to be suspected of wearing the same clothes on two consecutive days (unpublished survey), a fashion piece of a different color is recommended.
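The pairing and color rules above can be sketched as a small lookup plus a filter. This is only an illustration under stated assumptions: the `PAIRING` dictionary, the `recommend` function, and the catalog entries are hypothetical names and data, not taken from the project notebooks.

```python
import random

# Pairing rules from the table above: detected type -> recommended type.
PAIRING = {"top": "skirt", "skirt": "top", "dress": "dress"}


def recommend(item_type, item_color, catalog, rng=random):
    """Return a catalog item of the paired type in a different color, or None."""
    wanted_type = PAIRING[item_type]
    candidates = [
        item for item in catalog
        if item["type"] == wanted_type and item["color"] != item_color
    ]
    return rng.choice(candidates) if candidates else None


# Hypothetical catalog entries, for illustration only.
catalog = [
    {"id": 1, "type": "skirt", "color": "red"},
    {"id": 2, "type": "skirt", "color": "blue"},
    {"id": 3, "type": "dress", "color": "red"},
]

# A red top is matched with a skirt that is not red.
print(recommend("top", "red", catalog))
```

Returning `None` when no candidate exists keeps the case "nothing suitable in the catalog" explicit, rather than falling back to an item of the same color.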

Recommenders may benefit several parties. Companies may increase revenue and customer satisfaction through cross-selling, while customers may save time in purchasing multiple items at once as well as in finding favored choices. Hence, this project is also a contribution to the study of business-customer interactions.

2. Methodology

This project is mainly written in Python. For image recognition, a Convolutional Neural Network (CNN) is built with TensorFlow. CNNs are a deep learning method that typically achieves higher accuracy in image recognition than classical methods such as k-nearest neighbors (kNN). For color detection, kNN is sufficient and is therefore used.
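To illustrate why kNN is enough for color detection, here is a minimal pure-Python sketch: the image is reduced to its mean RGB value, which is then labelled by majority vote among the nearest reference colors. The reference palette, the function names, and the choice of `k=3` are assumptions for illustration; the project notebooks may implement this differently.

```python
from collections import Counter

# Hypothetical labelled reference colors (RGB); a real palette would be larger.
REFERENCE = [
    ((255, 0, 0), "red"), ((200, 30, 30), "red"),
    ((0, 0, 255), "blue"), ((30, 30, 200), "blue"),
    ((0, 0, 0), "black"), ((40, 40, 40), "black"),
    ((255, 255, 255), "white"), ((230, 230, 230), "white"),
]


def mean_color(pixels):
    """Average an iterable of (r, g, b) pixels into one RGB triple."""
    pixels = list(pixels)
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def knn_color(rgb, k=3):
    """Label an RGB value by majority vote among its k nearest reference colors."""
    by_distance = sorted(
        REFERENCE,
        key=lambda ref: sum((a - b) ** 2 for a, b in zip(ref[0], rgb)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Three reddish pixels stand in for a cropped clothing image.
pixels = [(250, 10, 10), (240, 20, 5), (210, 40, 35)]
print(knn_color(mean_color(pixels)))
```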

Two workbook series (A and B) are created with sample sizes of 12,000 (A) and 1,500 (B) images. While series A achieves a higher model accuracy in image recognition, series B allows for higher flexibility in adjusting statements and parameters because it demands less processing capacity. Hence, B was created before moving to the larger sample of A.

Working with large data volumes led to the following principles for faster processing:

No. Principles
1. Images are shrunk in size for deep learning.
2. Rather than creating long functions, code is split up into small components, and batches are employed.
3. To avoid long reruns of code, modules and separate worksheets are created.
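The first two principles can be sketched in a few lines. This is a simplified illustration, not the project's code: `shrink` uses plain subsampling as a stand-in for a proper image resize, and both function names are hypothetical.

```python
def shrink(image, factor):
    """Downsample a 2-D image (a list of rows) by keeping every factor-th pixel."""
    return [row[::factor] for row in image[::factor]]


def batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# An 8x8 dummy image shrinks to 4x4 with factor 2.
image = [[x + 8 * y for x in range(8)] for y in range(8)]
small = shrink(image, 2)
print(len(small), len(small[0]))

# Ten items are processed in batches of at most four.
for batch in batches(list(range(10)), 4):
    print(batch)
```

Processing in batches keeps only a slice of the data in memory at a time, which matters more for the 12,000 images of series A than for series B.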

3. Questions and Hypotheses

The model is built with 6 convolutional and pooling layers, through which the input passes sequentially.

After tuning the hyperparameters, the following settings gave the best results with the data at hand:

No. Findings
1. 6 convolutional and pooling layers rather than fewer or more layers
2. 96 filters rather than a mix of 32, 64 and 96 filters
3. 512 neurons rather than 128 neurons
4. Adding a dropout layer
5. A dropout rate of 70% rather than 50% or 90%
6. 15 epochs rather than 10, 12 or 20
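The findings above could translate into a Keras model along the following lines. This is a sketch under several assumptions that the README does not state: the input size of 128x128 RGB, the 3x3 kernels, `padding="same"`, the Adam optimizer, the loss function, and the reading of "6 convolutional and pooling layers" as six convolution/pooling pairs. The actual architecture in the notebooks may differ.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(input_shape=(128, 128, 3), num_classes=3):
    """Build a CNN with six conv/pool pairs, 96 filters each (assumed layout)."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    for _ in range(6):
        # 96 filters in every block, as in finding 2.
        model.add(layers.Conv2D(96, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))  # finding 3
    model.add(layers.Dropout(0.7))                   # findings 4 and 5
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
# Training would then use 15 epochs (finding 6), for example:
# model.fit(x_train, y_train, epochs=15)
```

The three output classes correspond to the categories top, skirt, and dress; `x_train` and `y_train` are placeholders for the prepared image arrays and integer labels.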

Series A, with 12,000 images, achieves a test accuracy of 72%, while series B, with 1,500 images, achieves a test accuracy of 65%. A drop of only 7 percentage points with an eight times smaller sample suggests that well-chosen hyperparameters can already yield considerable results.

4. Dataset

The following sample datasets are used in the project:

| Category | Number of Images | Selected Images (A) | Selected Images (B) |
|---|---|---|---|
| Tops | 10,078 | 4,000 | 500 |
| Skirt | 12,742 | 4,000 | 500 |
| Dress | 60,768 | 4,000 | 500 |

Since the samples are comparatively small, only 10% of the images in A and B respectively are held out for testing, so that as many images as possible remain for training.
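A 90/10 split of this kind can be sketched as follows. The function name, the fixed seed, and the use of integer placeholders instead of image files are assumptions for illustration.

```python
import random


def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle a copy of items and hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Integer placeholders stand in for the selected images of each series.
series_a = list(range(12_000))
series_b = list(range(1_500))

train_a, test_a = split_train_test(series_a)
train_b, test_b = split_train_test(series_b)

print(len(train_a), len(test_a))  # 10800 1200
print(len(train_b), len(test_b))  # 1350 150
```

With these sizes, series B is tested on only 150 images, so its reported accuracy is a noisier estimate than that of series A.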

5. Database

The DeepFashion database contains over 800,000 images. From this, the attribute prediction subset of 290,000 images is selected, from which further subsets for training and testing are formed (see 4. Dataset).

"DeepFashion" (http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) "Attribute Prediction" (http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion/AttributePrediction.html)

For testing with a different database, the Fashion Product Image database is used, which comprises 44,440 images. "Fashion Product Images" (https://www.kaggle.com/paramaggarwal/fashion-product-images-dataset)

6. Workflow

No. Activity
1. Clean Data
2. Develop Models for Image Recognition
3. Test Models for Image Recognition
4. Develop Model for Color Recognition
5. Issue Recommendation

7. Organization

For the project, two series, A and B, are created with different sample sizes (see 2. Methodology). Workbooks AB refer to files that concern both series A and B, i.e. images from a different dataset are prepared for the model to be applied to.

The project folder contains the following files:

No. File Description
1. README - summary of project outline
2. A1_Data_Preprocessing.ipynb - prepare data for series A
3. A2_Extract_Transform.ipynb - transform data for series A
4. A3_Create_and_Test_Model.ipynb - create and test model for series A
5. A4_Recommendations.ipynb - create cross-selling offers
6. B1_Data_Preprocessing.ipynb - prepare data for series B
7. B2_Extract_Transform.ipynb - transform data for series B
8. B3_Create_and_Test_Model.ipynb - create and test the model for series B
9. B4_Recommendations.ipynb - create cross-selling offers
10. AB1_Data_Wrangling_New_Test.ipynb - prepare different dataset for simulation
11. AB2_Extract.ipynb - prepare data access
12. style_data.csv - selection of images for the AB series
13. styles.csv - original data of the AB series
14. a_data.pickle - load data for series A
15. b_data.pickle - load data for series B
16. a_recognition.h5 - model of series A
17. b_recognition.h5 - model of series B

8. Next Steps

Next I will increase the test accuracy of my model by adjusting further hyperparameters and the model's architecture. Moreover, I will incorporate a recommendation engine based on deep learning in this project.

9. Sources

The following sources have been used in this project:

Image Recognition in Python with TensorFlow and Keras (https://stackabuse.com/image-recognition-in-python-with-tensorflow-and-keras/)

Flower Classification with Deep Neural Network with Tensorflow and Python Programming (https://www.youtube.com/watch?v=POO1gdUJ7yE)

Classify Images Using Convolutional Neural Networks (https://medium.com/@randerson112358/classify-images-using-convolutional-neural-networks-python-a89cecc8c679)

Deep Transfer Learning for Image Classification (https://towardsdatascience.com/deep-transfer-learning-for-image-classification-f3c7e0ec1a14)
