Myntra Discount Prediction Model Using Machine Learning

This repository contains a machine learning model for predicting discounts on fashion clothing items on Myntra. The project involves data cleaning, preprocessing, feature engineering, exploratory data analysis, and regression analysis. The model performance is improved by applying logarithmic scaling to the features.

Data Cleaning and Preprocessing

In this step, the dataset is cleaned and preprocessed to handle missing values and convert data types. Missing values in the "DiscountOffer" column are filled with 0, and the column is converted to the string data type. A new column called "DiscountOffer_len" stores the length of each string in the "DiscountOffer" column. The data is then split into groups based on that string length, and the discount amount is extracted into separate columns for each group. The discounted price is calculated for each group and stored in a new column. Finally, all the groups are concatenated back into one dataframe.
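The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the "OriginalPrice", "DiscountPercent", "DiscountAmount", and "DiscountedPrice" column names, the offer formats ("NN% OFF" and "Rs. NNN OFF"), and the mapping from string length to format are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy data. Only "DiscountOffer" and "DiscountOffer_len" are named in the
# README; every other column name and the offer formats are assumptions.
df = pd.DataFrame({
    "OriginalPrice": [1000.0, 2000.0, 1500.0, 800.0],
    "DiscountOffer": pd.Series(
        ["10% OFF", "Rs. 500 OFF", np.nan, "55% OFF"], dtype=object
    ),
})

# Fill missing offers with 0, then convert the column to strings.
df["DiscountOffer"] = df["DiscountOffer"].fillna(0).astype(str)

# The string length is used to tell the offer formats apart.
df["DiscountOffer_len"] = df["DiscountOffer"].str.len()

# Split into groups by length (assumed mapping for this toy data).
no_offer = df[df["DiscountOffer_len"] == 1].copy()      # "0"
percent_off = df[df["DiscountOffer_len"] == 7].copy()   # "NN% OFF"
flat_off = df[df["DiscountOffer_len"] == 11].copy()     # "Rs. NNN OFF"

# No offer: the price is unchanged.
no_offer["DiscountedPrice"] = no_offer["OriginalPrice"]

# Percentage offer: the first two characters hold the percentage.
percent_off["DiscountPercent"] = percent_off["DiscountOffer"].str[:2].astype(float)
percent_off["DiscountedPrice"] = percent_off["OriginalPrice"] * (
    1 - percent_off["DiscountPercent"] / 100
)

# Flat offer: characters 4..6 hold the rupee amount.
flat_off["DiscountAmount"] = flat_off["DiscountOffer"].str[4:7].astype(float)
flat_off["DiscountedPrice"] = flat_off["OriginalPrice"] - flat_off["DiscountAmount"]

# Concatenate the groups back into one dataframe, restoring row order.
cleaned = pd.concat([no_offer, percent_off, flat_off]).sort_index()
```

Columns that apply to only one group (such as "DiscountAmount") are left as NaN for rows from the other groups after concatenation.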

Feature Engineering

Feature engineering creates new features and merges them back into the data. The dataset is first filtered to separate out instances where the discount percentage is zero, and the filtered data is split into training, validation, and test sets. The average rating and total number of reviews are calculated for each brand and used to create a new column called "Brand_importance", which is merged back into the datasets. The number of unique brands in each category is stored in a column called "ind_cat_popularity", and the number of products in each category is stored in a column called "cat_popularity"; both are also merged back into the datasets.
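A rough sketch of these features is shown below. Several details are assumptions rather than facts from the repository: the input column names ("BrandName", "Category", "Ratings", "Reviews", "DiscountPercent"), the choice to keep the non-zero discount rows, the 60/20/20 split, computing the lookup tables on the training set only, and defining "Brand_importance" as average rating times total reviews.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data; all input column names here are hypothetical.
data = pd.DataFrame({
    "BrandName": ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A", "B", "C"],
    "Category": ["Shirts", "Jeans", "Shirts", "Shirts", "Jeans", "Jeans",
                 "Shirts", "Jeans", "Shirts", "Jeans", "Shirts", "Jeans"],
    "Ratings": [4.1, 3.9, 4.5, 4.0, 3.2, 3.8, 4.4, 4.2, 3.5, 4.0, 4.3, 3.6],
    "Reviews": [120, 80, 300, 150, 20, 45, 200, 90, 30, 60, 110, 25],
    "DiscountPercent": [10, 20, 0, 35, 50, 15, 40, 25, 0, 30, 45, 5],
})

# Assumption: rows with a zero discount are set aside and the rest are modelled.
discounted = data[data["DiscountPercent"] > 0]

# Train / validation / test split (assumed 60 / 20 / 20).
train, temp = train_test_split(discounted, test_size=0.4, random_state=0)
val, test = train_test_split(temp, test_size=0.5, random_state=0)

# Brand-level statistics, combined into one score (assumed formula).
brand_stats = (
    train.groupby("BrandName")
    .agg(avg_rating=("Ratings", "mean"), total_reviews=("Reviews", "sum"))
    .reset_index()
)
brand_stats["Brand_importance"] = brand_stats["avg_rating"] * brand_stats["total_reviews"]

# Number of unique brands per category.
ind_cat = (
    train.groupby("Category")["BrandName"].nunique()
    .rename("ind_cat_popularity").reset_index()
)

# Number of products per category.
cat = train.groupby("Category").size().rename("cat_popularity").reset_index()


def add_features(frame):
    """Merge the lookup tables onto one split; unseen keys become NaN."""
    out = frame.merge(
        brand_stats[["BrandName", "Brand_importance"]], on="BrandName", how="left"
    )
    out = out.merge(ind_cat, on="Category", how="left")
    return out.merge(cat, on="Category", how="left")


train_fe, val_fe, test_fe = add_features(train), add_features(val), add_features(test)
```

Building the lookup tables from the training split alone is one way to keep validation and test information out of the features; the repository may compute them differently.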

Exploratory Data Analysis (EDA)

EDA is performed on the "model_data" dataset using the Seaborn and Matplotlib libraries. A heatmap is created to visualize the correlation between different features. The correlation between the target variable and the features is plotted as a bar chart. A pairplot is also created to visualize the relationships between variables.

Regression Analysis

Three regression models are used: Linear Regression, KNeighbors Regressor, and Random Forest Regressor. For each model, a model object is created and fitted on the training data. The models are then evaluated on the test and validation data using the r2_score function, which reports the coefficient of determination (R²). The feature importances of the Random Forest Regressor model are also calculated and stored in a dataframe.
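A minimal sketch of this fit-and-score loop with scikit-learn follows. The synthetic data, the feature names, the split ratios, and the hyperparameters are assumptions for illustration; the model classes and r2_score are the ones named above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic features and target (assumed; stands in for the real dataset).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "Brand_importance": rng.uniform(1, 100, n),
    "cat_popularity": rng.uniform(1, 50, n),
    "ind_cat_popularity": rng.uniform(1, 20, n),
})
y = 0.4 * X["Brand_importance"] + 0.2 * X["cat_popularity"] + rng.normal(0, 2, n)

# Train / validation / test split (assumed 60 / 20 / 20).
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=5),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training data and record R^2 on validation and test.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = {
        "val_r2": r2_score(y_val, model.predict(X_val)),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }
scores_df = pd.DataFrame(scores).T

# Feature importances of the random forest, stored in a dataframe.
importance_df = pd.DataFrame({
    "feature": X.columns,
    "importance": models["RandomForestRegressor"].feature_importances_,
}).sort_values("importance", ascending=False)
```

Note that `r2_score` takes the true values first and the predictions second; swapping them changes the result.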

Logarithmic Scaling

To improve the model performance, logarithmic scaling is applied to the features of the training, testing, and validation data. This transformation compresses large values, which helps balance the magnitude of the features and can bring right-skewed features closer to a normal distribution. The logarithm is applied to the feature values after adding a small value, which avoids taking the logarithm of zero.
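The transformation can be sketched as below. The offset of 1e-6 and the feature names are assumptions; the description above only says that "a small value" is added.

```python
import numpy as np
import pandas as pd

EPSILON = 1e-6  # assumed offset; the README only says "a small value"


def log_scale(features, epsilon=EPSILON):
    """Return the natural log of (features + epsilon), element-wise."""
    return np.log(features + epsilon)


# Toy splits with hypothetical feature names; note the zero in the first row.
X_train = pd.DataFrame({"Brand_importance": [0.0, 10.0, 1000.0],
                        "cat_popularity": [1.0, 50.0, 500.0]})
X_val = pd.DataFrame({"Brand_importance": [5.0], "cat_popularity": [20.0]})
X_test = pd.DataFrame({"Brand_importance": [250.0], "cat_popularity": [3.0]})

# The same transformation is applied to every split.
X_train_log = log_scale(X_train)
X_val_log = log_scale(X_val)
X_test_log = log_scale(X_test)
```

The offset only guards against zeros; negative feature values would still produce NaN. `np.log1p`, which computes log(1 + x), is a common alternative to adding a small constant.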

Results

The model performance improves after applying logarithmic scaling to the features: the R² scores of the Linear Regression, KNeighbors Regressor, and Random Forest Regressor models all improve. Detailed results and analysis can be found in the project code.

Contributors

roshancharlie
