Clustering is one of the most frequently used forms of unsupervised learning: it automatically discovers natural groupings in data.
Clustering is especially useful for exploring data you know nothing about; you might find connections you would never have thought of. Clustering can also serve as a form of feature engineering, where existing and new examples are mapped to one of the clusters identified in the data and labeled accordingly.
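As a minimal sketch of the feature-engineering idea (assuming scikit-learn and a synthetic dataset; the data and parameters here are illustrative, not from this tutorial), a fitted clustering model can assign both existing and new examples to clusters, and that cluster label can be used as an extra feature:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data standing in for a real dataset (illustrative only)
X, _ = make_blobs(n_samples=200, centers=3, random_state=42)

# Fit a clustering model; the assigned cluster becomes a new feature
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_feature = kmeans.fit_predict(X)  # one label per existing example

# New, unseen examples can be mapped onto the same clusters
new_examples = np.array([[0.0, 0.0], [5.0, 5.0]])
new_labels = kmeans.predict(new_examples)
```

The resulting labels can then be appended to the feature matrix (e.g. one-hot encoded) before training a downstream supervised model.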
Typical real-world applications of clustering include fraud detection, categorizing books in a library, and customer segmentation in marketing.
Table of Contents:
- Types of clustering algorithms
- Set-up
  - 2.1 Data sets
  - 2.2 Import libraries
  - 2.3 Import data
  - 2.4 Some visualisations
  - 2.5 Feature engineering
  - 2.6 Outlier detection
  - 2.7 Scaling data
- Determining the optimal number of clusters
  - 3.1 Elbow method
  - 3.2 Silhouette method
  - 3.3 Dendrogram
- K-Means
  - 4.1 Advantages and disadvantages of K-Means
  - 4.2 Variations of K-Means
  - 4.3 Training the K-Means model on the datasets
  - 4.4 Comparing results
  - 4.5 K-Means on online retail data
- Hierarchical clustering
  - 5.1 Advantages and disadvantages of hierarchical clustering
  - 5.2 Variations of hierarchical clustering
  - 5.3 Training the hierarchical clustering model on the datasets
  - 5.4 Comparing results
  - 5.5 Hierarchical clustering on online retail data
- DBSCAN clustering algorithm
  - 6.1 Advantages and disadvantages of DBSCAN
  - 6.2 Choosing the right initial parameters
  - 6.3 Variations of DBSCAN
  - 6.4 Training the DBSCAN clustering model on the datasets
  - 6.5 Comparing results
  - 6.6 DBSCAN clustering model on online retail data
- Gaussian Mixture Models (GMM)
  - 7.1 Advantages and disadvantages of Gaussian Mixture Models
  - 7.2 Variations of GMM
  - 7.3 Training of GMM on the datasets
  - 7.4 Comparing results
  - 7.5 GMM clustering model on online retail data
- All algorithm comparison