This project takes general demographic data about the population of Germany (provided at the individual level) and clusters it to segment the population into similar groups. The fitted clustering model can then be applied to data about the customers of a mail-order sales company, segmenting that customer base and showing in which demographic clusters customers are overrepresented. This knowledge can inform business decisions, such as determining which groups of non-customers in the German population would be most receptive to direct advertising campaigns.
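The comparison of cluster membership between the general population and the customer base can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes cluster labels have already been assigned to both groups by the same clustering model, the function names `cluster_shares` and `representation_ratio` are hypothetical, and the labels are toy values rather than project data.

```python
from collections import Counter


def cluster_shares(labels):
    """Return the fraction of individuals assigned to each cluster."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: n / total for cluster, n in counts.items()}


def representation_ratio(population_labels, customer_labels):
    """Ratio of customer share to population share for each cluster.

    A value above 1 means the cluster is overrepresented among customers;
    a value below 1 means it is underrepresented.
    """
    pop = cluster_shares(population_labels)
    cust = cluster_shares(customer_labels)
    # Clusters with no customers get a customer share of 0.0.
    return {c: cust.get(c, 0.0) / pop[c] for c in sorted(pop)}


# Toy cluster labels (hypothetical, for illustration only).
population_labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
customer_labels = [0, 1, 1, 1, 1]

ratios = representation_ratio(population_labels, customer_labels)
for cluster, ratio in ratios.items():
    print(f"cluster {cluster}: customer/population share ratio = {ratio:.2f}")
```

In this toy example, cluster 1 holds a larger share of customers than of the population, so non-customers in that cluster would be the natural first candidates for an advertising campaign.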
As mentioned in the earlier section, two datasets are utilized in this project:
- Individual German citizen demographics, derived from public datasets and statistical approximations. These data cover 891,221 individuals with 85 features.
- Individual customer demographics (presumably German customers, although the project details are somewhat unclear on this) for a mail-order sales company. These data cover 191,652 individuals with 85 features.
Note: these data are provided for this project under a restricted-use license and thus cannot be shared publicly. The code and results are included here because they are presumably aggregated enough that sharing them is not an issue.
Please see References.md for citations of the literature, code, and datasets used or adapted in this project.