consensus_clustering's Introduction

Consensus clustering

An implementation of Consensus clustering in Python

This repository contains a Python implementation of consensus clustering, following the paper "Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data" (Monti et al., 2003).

ConsensusCluster

The class containing the implementation.

Attributes

  • cluster : the class used to perform the clustering (e.g. KMeans from sklearn)
    • NOTE: the class must accept an n_clusters parameter when instantiated and provide a fit_predict method, which is invoked on the data.
  • L : smallest number of clusters to try
  • K : largest number of clusters to try
  • H : number of resamplings for each number of clusters
  • resample_proportion : proportion of the data to sample in each resampling
  • Mk : consensus matrices for each k (shape = (K, data.shape[0], data.shape[0]))
    • NOTE: every consensus matrix is retained, as specified in the paper
  • Ak : area under CDF for each number of clusters
    • (see paper: section 3.3.1. Consensus distribution.)
  • deltaK : changes in areas under CDF
    • (see paper: section 3.3.1. Consensus distribution.)
  • bestK : number of clusters that was found to be best
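As a rough illustration of what Ak and deltaK summarise, here is a minimal sketch that follows the definitions in section 3.3.1 of the paper rather than this repository's code, which may compute them differently. The function names `area_under_cdf` and `delta_k` are hypothetical, and a consensus matrix is represented as a plain list of lists.

```python
from bisect import bisect_right


def area_under_cdf(consensus):
    """Area under the empirical CDF of the upper-triangle consensus entries."""
    n = len(consensus)
    # Only pairs i < j are used; the diagonal is ignored.
    values = sorted(consensus[i][j] for i in range(n) for j in range(i + 1, n))
    total = len(values)
    xs = sorted(set(values))  # distinct consensus values, ascending
    area = 0.0
    for prev, cur in zip(xs, xs[1:]):
        # CDF(cur) = fraction of entries less than or equal to cur
        cdf = bisect_right(values, cur) / total
        area += (cur - prev) * cdf
    return area


def delta_k(areas, smallest_k=2):
    """Change in area between consecutive k; areas[i] belongs to k = smallest_k + i.

    Assumes every area used as a denominator is positive.
    """
    deltas = []
    for i in range(len(areas) - 1):
        k = smallest_k + i
        if k == 2:
            deltas.append(areas[i])  # the paper defines the value at k = 2 as the area itself
        else:
            deltas.append((areas[i + 1] - areas[i]) / areas[i])
    return deltas
```

A perfectly clean consensus matrix (only 0s and 1s off the diagonal) gives an area of 1.0 under this definition; intermediate consensus values pull the area down.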

Methods

ConsensusCluster.__init__

Parameters:
    * cluster : the class used to perform the clustering (e.g. KMeans from sklearn)
      * NOTE: the class must accept an `n_clusters` parameter when instantiated
        and provide a `fit_predict` method, which is invoked on the data.
    * L : smallest number of clusters to try
    * K : largest number of clusters to try
    * H : number of resamplings for each number of clusters
    * resample_proportion : proportion of the data to sample in each resampling
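The requirement on `cluster` can be illustrated with a toy class. This is only a sketch of the interface described in the note above: `QuantileClusterer` is a hypothetical name, not part of this repository or of sklearn, and it returns a plain list where a real clusterer would typically return an array.

```python
class QuantileClusterer:
    """Hypothetical toy clusterer: groups rows by the rank of their first feature."""

    def __init__(self, n_clusters):
        # The constructor must accept n_clusters as a keyword argument.
        self.n_clusters = n_clusters

    def fit_predict(self, data):
        # Returns one label per row, in the order the rows were given.
        n = len(data)
        order = sorted(range(n), key=lambda i: data[i][0])
        labels = [0] * n
        for rank, idx in enumerate(order):
            # Split the ranked rows into n_clusters roughly equal groups.
            labels[idx] = rank * self.n_clusters // n
        return labels
```

Any class with this shape, such as sklearn's KMeans, can be passed as `cluster` without being instantiated first.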

ConsensusCluster.fit

Fits all attributes of the class to data

Parameters:
    * data : data.shape == (n_examples,n_features) 
    * verbose : whether to print progress while fitting
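For intuition about what a consensus matrix holds, the sketch below follows the paper's description (for each pair of examples, the number of times they were clustered together divided by the number of times they were sampled together). It is not this repository's implementation; `consensus_matrix`, `make_clusterer`, and the other parameter names are hypothetical, and plain lists stand in for arrays.

```python
import random


def consensus_matrix(data, make_clusterer, k, n_resamples, proportion, seed=0):
    """Fraction of co-sampled resamplings in which each pair shared a label."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]  # times i and j received the same label
    sampled = [[0] * n for _ in range(n)]   # times i and j were drawn together
    size = min(n, max(1, round(proportion * n)))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)    # subsample without replacement
        subset = [data[i] for i in idx]
        labels = make_clusterer(n_clusters=k).fit_predict(subset)
        for a in range(size):
            for b in range(size):
                i, j = idx[a], idx[b]
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    # Pairs that were never sampled together are reported as 0.0.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

With this definition every entry lies between 0 and 1, and the matrix is symmetric.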

ConsensusCluster.predict

Predicts the clustering on the consensus matrix, for the best number of clusters found

Returns:
    * Cluster labels for each example

ConsensusCluster.predict_data

Predicts the clustering on the data, for the best number of clusters found

Parameters:
    * data : data.shape == (n_examples,n_features)

Returns:
    * Cluster labels for each example 

consensus_clustering's People

Contributors

snarles, zigasajovic

consensus_clustering's Issues

Pip Package

Is it possible to additionally maintain this on PyPI using setuptools?

I can submit a PR for this if needed.

Resulting matrix has very large value?

I am using KMeans with the ConsensusCluster class (other clusterers behave the same way), then calling fit and predict right after. The resulting matrix has 1s on the diagonal, but some entries hold very large values such as 500000.0.
Is that a bug? Thanks.

Here is full code:

        c = ConsensusCluster(cluster.KMeans, 2, 3, 4, 1)
        c.fit(self.d1)
        _, similarity1 = c.predict()

BTW, I modified the source code so that predict also returns the matrix for the best k. This similarity1 is the matrix that shows the issue.

License for repo

Hi - I would like to use this implementation.

Can you add a license for this code?

Set n_jobs of the clustering model

Amazing implementation.

One important suggestion: it is currently not possible to set n_jobs on the cluster model. This would be nice to add.

E.g. just as the input argument n_clusters=k is set in Mh = self.cluster_(n_clusters=k).fit_predict(resample_data), the n_jobs input argument could be passed in the same way (most sklearn models have this argument).
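One possible workaround that needs no change to the class is to pre-bind the extra argument with `functools.partial`. This is only a sketch, and it assumes that ConsensusCluster does nothing with `cluster` other than calling it as `cluster(n_clusters=k)`, as in the line quoted above. `ToyClusterer` is a hypothetical stand-in for an estimator whose constructor accepts `n_jobs`.

```python
from functools import partial


class ToyClusterer:
    """Hypothetical stand-in for an estimator with an extra constructor option."""

    def __init__(self, n_clusters, n_jobs=1):
        self.n_clusters = n_clusters
        self.n_jobs = n_jobs

    def fit_predict(self, data):
        # Placeholder labelling, only here so the object is usable.
        return [i % self.n_clusters for i in range(len(data))]


# Pre-bind n_jobs; the result can still be called as cluster(n_clusters=k).
cluster_with_jobs = partial(ToyClusterer, n_jobs=4)
model = cluster_with_jobs(n_clusters=3)
```

The partial object could then be passed wherever the bare class would be, for example as the `cluster` argument.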

name 'bisect' is not defined

Hey there :)

I'm using your script and I get an error saying:
name 'bisect' is not defined

This is the code I'm using:

kmeans_=KMeans
cc = ConsensusCluster(cluster=kmeans_, L= 10, K= 30, H=10)
cc = cc.fit(np.array(data), verbose = True)

Thanks in advance
