weighted_k_means's Introduction

weighted_k_means's People

Contributors

oliviaguest

weighted_k_means's Issues

Scaling factor

I think there is a bug in the calculation of the scaling factor. With the following formula, the resulting scaling factor does not depend on the value of beta.

    scaling_factor = (1 - self.beta) * scaling_factor + (self.beta) * self.scaling_factor

Can you please advise me how to fix this bug? Thanks!
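
One way to read this report: the formula is a weighted average of a freshly computed value and the stored self.scaling_factor. If both operands hold the same number when the line runs (an assumption, since the surrounding code is not shown here), then (1 - beta) * s + beta * s = s and beta cancels out. The sketch below illustrates this with a hypothetical helper named blend, which is not part of weighted_k_means:

```python
def blend(old, new, beta):
    """Weighted average: `beta` weights the old value, `1 - beta` the new one."""
    return (1 - beta) * new + beta * old


# Same value on both sides: beta has no effect on the result.
for beta in (0.0, 0.3, 0.9):
    assert abs(blend(2.0, 2.0, beta) - 2.0) < 1e-12

# Distinct old and new values: beta now controls the result.
print(blend(old=4.0, new=2.0, beta=0.0))  # only the new value counts
print(blend(old=4.0, new=2.0, beta=1.0))  # only the old value counts
```

If that reading is right, the thing to check is whether the stored value and the newly computed value are kept in separate variables before they are combined.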

ValueError for large weights

I am using a table of latitude, longitude and weight with about 10,000 entries, and I would like to create 10 clusters. The maximum weight is 150,000. The table contains a few (~20) zero weights.

WKmeans always throws the same error:

ValueError: One or more clusters disappeared because all points rushed away to other cluster(s). Try increasing the stickiness parameter (beta).

Changing beta or alpha makes no difference. However, I noticed that when I divided my weights by, say, 1000, it worked again.

This led me to the function causing the error:

    def _has_converged(self):
        """Check if the items in clusters have stabilised between two runs.

        This checks to see if the distance between the centroids is lower than
        a fixed constant.
        """
        diff = 1000
        if self.clusters:
            for clu in self.clusters:
                # For each cluster, check the length. If it is zero, we have
                # a problem: we have lost a cluster.
                if len(clu) == 0:
                    raise ValueError('One or more clusters disappeared because'
                                     ' all points rushed away to other'
                                     ' cluster(s). Try increasing the'
                                     ' stickiness parameter (beta).')

It seems that diff = 1000 is causing the issue, as it is probably not suited to larger numbers or to a large difference between the minimum and maximum weight, as in my case.

Could anyone recommend a way to generate this value dynamically instead of using a static one? Could it simply be something like the maximum weight divided by some factor?
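
The observation that dividing the weights by 1000 made the error go away suggests rescaling the weights before clustering, rather than changing a constant inside the library. Below is a minimal sketch, assuming only the relative sizes of the weights matter for the clustering. The names rescale_weights and target_max are hypothetical and not part of weighted_k_means, and whether this avoids the error on a given dataset is untested:

```python
def rescale_weights(weights, target_max=1.0):
    """Scale non-negative weights so that the largest one equals `target_max`.

    Ratios between weights are preserved and zero weights stay zero.
    """
    weights = [float(w) for w in weights]
    largest = max(weights)
    if largest <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / largest * target_max for w in weights]


print(rescale_weights([0, 75000, 150000]))
```

For a NumPy array c and the default target_max, this amounts to c / c.max().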

[Advice] How to pass unique IDs with coordinates?

First of all: great repo!
This is not an issue but a request for advice. When passing my data, I would like to pass an ID for every coordinate so that I can process the data afterwards.

Assume the data below, where lat and lon are the coordinates (X) and weight is the variable c. My aim is to fill the cluster column, which I would normally do with a simple pandas merge, much like in this Medium article for sklearn's kmeans (see step 6).

id   lat       lon      weight  cluster
1    50.34623  8.23523  2000    ?
2    50.2345   9.23552  1000    ?
3    50.643    9.23523  1000    ?
...  ...       ...      ...     ...

As your wkmeans implementation has no equivalent of predicted_kmeans = kmeans.predict(X, sample_weight = Y), I was wondering what the easiest way to fill the cluster column would be.

My main idea was to create a fake ID by concatenating lat and lon, but this would lead to issues with duplicates.

I tried creating a 3D array with lat, lon and id, but for my use case it returned "false" 3D results.

Currently, I'm just passing the data from a pandas df.

    X = np.array(df[["lon","lat"]].astype(float))
    c = np.array(df['weight'].astype(float))

Any advice?

Thanks in advance!
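
One possible approach is to recover a per-row label from the clustering result by position instead of building a fake ID. The sketch below assumes the result is available as a list of clusters, each holding the coordinate pairs assigned to it, with coordinates returned unchanged. That layout is an assumption about wkmeans, not something confirmed here, and labels_from_clusters is a hypothetical helper:

```python
from collections import defaultdict, deque


def labels_from_clusters(points, clusters):
    """Return one cluster index per input row, in input order.

    `points` is the sequence of coordinate pairs passed to the clustering;
    `clusters` is assumed to be a list of lists of those same pairs.
    """
    # Map each coordinate pair to the queue of row positions that carry it,
    # so duplicate coordinates are consumed one row at a time.
    positions = defaultdict(deque)
    for row, point in enumerate(points):
        positions[tuple(point)].append(row)

    labels = [None] * len(points)
    for cluster_index, cluster in enumerate(clusters):
        for point in cluster:
            # Raises IndexError if a point was not among the inputs.
            row = positions[tuple(point)].popleft()
            labels[row] = cluster_index
    return labels
```

Because the labels come back in row order, they could be assigned directly with df['cluster'] = labels, which keeps the id column aligned without a merge. Rows with identical coordinates cannot be told apart this way: if they end up in different clusters, which row receives which label is arbitrary.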
