The goal of this project is to demonstrate how to efficiently analyze e-commerce sales data.
First, I will walk through basic data wrangling, exploratory data analysis (EDA), and product analysis for a retail company's inventory. Next, I will show how to segment users effectively using the RFM framework and k-means clustering.
I selected the Online Retail II Data Set from the UC Irvine Machine Learning Repository as the data source for this analysis. I chose it for a few reasons:
- The data set contains online retail transactions from a real UK e-commerce wholesaler.
- There are over 500,000 records in the data set, which is large enough to support robust analysis.
- The data set has a number of data-quality problems, so it requires cleaning and transformation before it can be analyzed effectively.
For these reasons, I felt this data set would be a compelling example for an end-to-end project.