Comments (3)
Sounds like a case where:
- The data points are very close to each other, relative to the distance between the centroids.
- One centroid captures all the data points in the first assignment step and moves to the centre of those points, while the other centroids have no data with which to update themselves.
Without looking at the exact data, you can try the following methods:
- Normalise all the records, in case the distances are too small in the original space.
- Manually set the initial K-means centroids. If you intend to have K clusters, randomly pick K data points as the initial centroids.
It is hard to tell which algorithm is better without looking at the data. We can focus on making K-means work first.
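The two suggestions above can be sketched roughly as follows. This is only a minimal sketch, assuming scikit-learn and NumPy are available; the function name `cluster_with_manual_init` and the `seed` parameter are made up for illustration and are not from the original issue.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_with_manual_init(X, k, seed=0):
    """Hypothetical helper: normalise the records, then start K-means
    from k randomly chosen data points."""
    # Step 1: rescale every column to zero mean and unit variance,
    # so that tiny raw distances are stretched to a comparable scale.
    X_scaled = StandardScaler().fit_transform(np.asarray(X, dtype=float))

    # Step 2: pick k distinct rows as the initial centroids.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_scaled), size=k, replace=False)
    init_centroids = X_scaled[idx]

    # n_init=1 because the starting centroids are given explicitly.
    model = KMeans(n_clusters=k, init=init_centroids, n_init=1)
    labels = model.fit_predict(X_scaled)
    return labels, model.cluster_centers_
```

Calling `np.bincount(labels, minlength=k)` on the returned labels shows how many points each cluster received, which makes it easy to spot the "one cluster takes everything" situation.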
from python-for-data-and-media-communication-gitbook.
In sklearn.cluster.KMeans, if you set init="random", it seems the centroids will be picked randomly. If the centroids are chosen randomly and many runs still give the same result, I guess that rules out the possibility that the distance between data points is smaller than the distance between centroids, because the starting positions are already random.
That seems to leave only one situation: there is no distance at all between the data points, i.e. they are all identical.
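One way to check this reasoning is sketched below, assuming scikit-learn and NumPy. The helper names `count_distinct_rows` and `cluster_sizes` are hypothetical. The first counts how many distinct records the data contains; the second runs K-means once with init="random" and reports the size of each cluster, so several seeds can be compared.

```python
import numpy as np
from sklearn.cluster import KMeans


def count_distinct_rows(X):
    """Number of distinct records; 1 means every data point is identical."""
    return len(np.unique(np.asarray(X), axis=0))


def cluster_sizes(X, k, seed):
    """Run K-means once with random initial centroids and
    return how many points fall into each of the k clusters."""
    model = KMeans(n_clusters=k, init="random", n_init=1, random_state=seed)
    labels = model.fit_predict(X)
    return np.bincount(labels, minlength=k)
```

If `count_distinct_rows` returns a value smaller than K, K-means cannot produce K non-empty clusters regardless of initialisation. Otherwise, comparing `cluster_sizes` across a few different seeds shows whether the single-cluster result really persists.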
If the dataset has infinitely many data points, it seems that under some distance constraints a data shape such as a paraboloid may result in only one non-empty cluster. For a dataset with a finite number of data points, however, that result is unlikely.