fkmeans's Introduction

fkmeans is a tiny C library that lets you run the k-means clustering
algorithm over arbitrary sets of n-dimensional data. All you need to do is:

- Include the file kmeans.h in your sources;

- Represent your data set as a vector of vectors of double items (double**),
  where each inner vector is one n-dimensional item of your data set;

- If you want to run the k-means algorithm over your data and you already
  know the number k of clusters it contains, or an estimate of it, use code
  like this (in this example the data set is 3-dimensional, i.e. it contains
  N vectors of size 3, and we know it contains n_clus clusters):

    kmeans_t *km;
    double **dataset;
    ...
    km = kmeans_new ( dataset, N, 3, n_clus );
    kmeans ( km );
    ...
    kmeans_free ( km );

  If you don't already know the number of clusters contained in your data set,
  you can use the function kmeans_auto() to attempt to find the best one
  automatically using Schwarz's criterion. Be careful: this operation can be
  very slow, especially on data sets with many elements. The example above
  would simply become something like:

    kmeans_t *km;
    double **dataset;
    ...
    km = kmeans_auto ( dataset, N, 3 );
    ...
    kmeans_free ( km );

- Once the clustering has been performed, the clusters can be accessed
  directly from your kmeans_t* structure, where they are held in a double***
  field named "clusters". Each entry clusters[i] is one cluster, whose size is
  given by the field cluster_sizes[i] of the structure. Each cluster holds the
  items that form it, each of which is an n-dimensional vector. The number of
  clusters is given by the field "k", the number of dimensions of each item by
  the field "dataset_dim", and the number of items in the original data set by
  the field "dataset_size". So, for example:

    for ( i=0; i < km->k; i++ )
    {
        printf ( "cluster %d: [ ", i );

        for ( j=0; j < km->cluster_sizes[i]; j++ )
        {
            printf ( "(" );

            /* walk the dataset_dim components of item j in cluster i */
            for ( k=0; k < km->dataset_dim; k++ )
            {
                printf ( "%f, ", km->clusters[i][j][k] );
            }

            printf ( "), ");
        }

        printf ( "]\n" );
    }

  The library also ships with a sample program, contained in "test.c", which
  is built by typing "make". This example takes 0, 1, 2 or 3 command-line
  arguments, in the format

  $ ./kmeans-test [num_elements] [min_value] [max_value]

  and randomly generates a 2-dimensional data set containing num_elements
  items whose coordinates lie between min_value and max_value. The clustering
  is then performed and the results are printed to stdout, with each cluster
  shown in a different colour;

- After you write your source, remember to add the file "kmeans.c",
  containing the implementation of the library, to the list of your source
  files;

- That's all. Include "kmeans.h", write your code using
  kmeans_new()+kmeans()+kmeans_free() or kmeans_auto()+kmeans_free(), explore
  your clusters, remember to include "kmeans.c" in the list of your source
  files, and you're ready for k-means clustering.
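
The double** layout described above (one pointer per item, each pointing to
dataset_dim doubles) can be built in several ways. What follows is only a
minimal sketch: dataset_alloc() and dataset_free() are hypothetical helpers
and are not part of fkmeans. This README does not say whether kmeans_new()
copies the data or keeps your pointer, so check kmeans.h before freeing a
data set that a kmeans_t still refers to.

```c
#include <stdlib.h>

/* Hypothetical helper, NOT part of fkmeans: allocate a data set of n items,
 * each with dim components, as an array of n row pointers. All values are
 * zero-initialised. Returns NULL on invalid sizes or allocation failure. */
double **dataset_alloc(size_t n, size_t dim)
{
    double **rows;
    size_t i;

    if (n == 0 || dim == 0)
        return NULL;

    rows = malloc(n * sizeof *rows);
    if (rows == NULL)
        return NULL;

    for (i = 0; i < n; i++) {
        rows[i] = calloc(dim, sizeof **rows);
        if (rows[i] == NULL) {
            /* undo the rows allocated so far */
            while (i > 0)
                free(rows[--i]);
            free(rows);
            return NULL;
        }
    }

    return rows;
}

/* Hypothetical helper, NOT part of fkmeans: release a data set created by
 * dataset_alloc(). Passing NULL is allowed and does nothing. */
void dataset_free(double **rows, size_t n)
{
    size_t i;

    if (rows == NULL)
        return;

    for (i = 0; i < n; i++)
        free(rows[i]);
    free(rows);
}
```

A data set built this way would be filled with dataset[i][d] = value and then
passed to kmeans_new() or kmeans_auto() as shown in the examples above.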

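As a further illustration of exploring the clusters, the sketch below sums
the cluster sizes and computes the component-wise mean of one cluster. Both
functions are hypothetical helpers, not part of fkmeans, and they take plain
arrays rather than a kmeans_t*. Using int for the sizes and dimensions is an
assumption; check the actual field types in kmeans.h.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of fkmeans: sum the per-cluster sizes.
 * The int element type is an assumption about cluster_sizes. */
int total_items(const int *cluster_sizes, int k)
{
    int i, total = 0;

    for (i = 0; i < k; i++)
        total += cluster_sizes[i];

    return total;
}

/* Hypothetical helper, NOT part of fkmeans: write the component-wise mean
 * of one cluster into out[0..dim-1]. Returns 0 on success, or -1 if the
 * cluster is empty (out is left untouched in that case). */
int cluster_mean(double **cluster, int size, int dim, double *out)
{
    int j, d;

    if (size <= 0)
        return -1;

    for (d = 0; d < dim; d++) {
        out[d] = 0.0;
        for (j = 0; j < size; j++)
            out[d] += cluster[j][d];
        out[d] /= size;
    }

    return 0;
}
```

With a clustered kmeans_t *km, and assuming the fields have the types guessed
above, a call would look like cluster_mean ( km->clusters[i],
km->cluster_sizes[i], km->dataset_dim, out ), where out has room for
dataset_dim doubles.
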
Author: Fabio "BlackLight" Manganiello,
        <[email protected]>,
        http://0x00.ath.cx
