An attempt at clustering text from the Kleister-Charity dataset using TF-IDF and MiniBatchKMeans. The data actually used is `wim.csv`, extracted from the original Kleister-Charity TSV file and provided in the RAR archive.
Future work and improvements:
- Add evaluation metrics for the clustering
- Try a different clustering algorithm, e.g. DBSCAN
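The future-work items above could be prototyped roughly as follows. This is a sketch under assumptions, not a plan from the original project: the silhouette score with cosine distance is only one possible choice of metric, the helper name `evaluate` is hypothetical, and the DBSCAN parameters (`eps=0.7`, `min_samples=2`) are arbitrary placeholders that would need tuning on the real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MiniBatchKMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score


def evaluate(X, labels):
    """Cosine silhouette score, or None when the score is undefined."""
    # DBSCAN marks noise points with -1; leave them out of the score.
    idx = np.flatnonzero(labels != -1)
    n_labels = len(set(labels[idx].tolist()))
    # Silhouette needs 2 <= number of clusters <= n_samples - 1.
    if n_labels < 2 or n_labels >= len(idx):
        return None
    return silhouette_score(X[idx], labels[idx], metric="cosine")


# Toy documents, made up for illustration only.
docs = [
    "charity donations fund children education",
    "charity donations support children",
    "annual financial report income expenditure",
    "financial report income accounts",
]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)

km_labels = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3).fit_predict(X)
# Placeholder eps/min_samples; cosine distance suits sparse TF-IDF vectors.
db_labels = DBSCAN(eps=0.7, min_samples=2, metric="cosine").fit_predict(X)

km_score = evaluate(X, km_labels)
db_score = evaluate(X, db_labels)
print("MiniBatchKMeans silhouette:", km_score)
print("DBSCAN silhouette:", db_score)
```

Unlike k-means, DBSCAN does not need the number of clusters up front, but its results are sensitive to `eps`, so a metric like the one above is useful for comparing settings.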