Giter VIP home page Giter VIP logo

gactoolbox's Introduction

GACToolbox

Graph Agglomerative Clustering (GAC) toolbox

GACluster library supports large datasets, and provides demo scripts for reproducing the state-of-the-art benchmark results.

Introduction

Gactoolbox is a summary of our research of agglomerative clustering on a graph. Agglomerative clustering, which iteratively merges small clusters, is commonly used for clustering because it is conceptually simple and produces a hierarchy of clusters. Classifical aggolomerative clustering algorithms, such as average linkage and DBSCAN, were widely used in many areas. Those algorithms, however, are not designed for clustering on a graph. This toolbox implements the following algorithms for agglomerative clustering on a directly graph.

  1. Structural descriptor based algorithms (gacCluster.m). We define a cluster descriptor based on the graph structure, and each merging is determined by maximizes the increment of the descriptor. Two descriptors, including zeta function and path integral, are implemented. You can also design new descriptor (creating functions similar to gacPathEntropy.m and gacPathCondEntropy.m) and develop new algorithms with our code.

  2. Graph degree linkage (gdlCluster.m). It is a simple and effective algorithm, with better performance than normalized cuts and spectral clustering, and is faster.

This toolbox is written and maintained by Wei Zhang (wzhang009 at gmail.com). Please send me an email if you find any bugs or have any suggestions.

Examples

Preparations:

  1. Compile mex functions
  2. Add 'gacfiles' and 'gdlfiles' to your matlab paths
  3. Calculate a pairwise distance matrix from your data
K = 20;
a = 1;
z = 0.01;

% path integral

clusteredLabels = gacCluster (distance_matrix, groupNumber, 'path', K, a, z);

% zeta function

clusteredLabels = gacCluster (distance_matrix, groupNumber, 'zeta', K, a, z);

% GDL-U algorithm

clusteredLabels = gdlCluster(distance_matrix, groupNumber, K, a, false);

% AGDL algorithm

clusteredLabels = gdlCluster(distance_matrix, groupNumber, K, a, true);

Citations

Please cite the following papers, if you find the code is helpful.

  • W. Zhang, D. Zhao, and X. Wang. Agglomerative clustering via maximum incremental path integral. Pattern Recognition, 46 (11): 3056-3065, 2013.

  • W. Zhang, X. Wang, D. Zhao, and X. Tang. Graph Degree Linkage: Agglomerative Clustering on a Directed Graph. in Proceedings of European Conference on Computer Vision (ECCV), 2012.

Additional Notes

  1. How to compile mex files?

    I include mexw64 files. If you use a system other than win64, you can find a file called compileMex.m to help you build the mex files.

  2. We provide MATLAB implementation of structural descriptor based clustering and MATLAB-C++ mixed implementation of graph degree linkage. The MATLAB implementation is for ease of understanding, although it's inefficient. In the future we will add MATLAB implementation of graph degree linkage.

    In speed: AGDL > GDL-U > path integral > zeta function

  3. GDL-U and AGDL have similar performance. GDL-U is for small datasets and AGDL is for large datasets.

    AGDL has an additional parameter Kc in gdlMergingKNN_c.m. The larger Kc is, the closer performance AGDL has to GDL-U and slower the algorithm is. Default Kc = 10 is a good trade-off for most datasets.

gactoolbox's People

Contributors

waynezhanghk avatar weilinear avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

gactoolbox's Issues

Error when using gdlCluster()

I get this error when using gdlCluster

Error using gacPartial_sort
Rank is out of the range.

Error in gacMink (line 11)
        [sortedDist, NNIndex] = gacPartial_sort(X, k, dim);

Error in gacBuildDigraph_c (line 24)
[sortedDist,NNIndex] = gacMink(distance_matrix,max(K+1,4),2);

Error in gdlCluster (line 38)
    [graphW, NNIndex] = gacBuildDigraph_c(distance_matrix, K, a);

The distance matrix
Columns 1 through 9

     0    0.2963    0.7295    0.6844    0.4773    0.1356    0.0587    0.0943    0.5658
0.2963         0    0.4333    0.3882    0.1811    0.1607    0.2376    0.2019    0.2695
0.7295    0.4333         0    0.0451    0.2522    0.5940    0.6708    0.6352    0.1638
0.6844    0.3882    0.0451         0    0.2071    0.5488    0.6257    0.5901    0.1187
0.4773    0.1811    0.2522    0.2071         0    0.3418    0.4186    0.3830    0.0884
0.1356    0.1607    0.5940    0.5488    0.3418         0    0.0769    0.0413    0.4302
0.0587    0.2376    0.6708    0.6257    0.4186    0.0769         0    0.0356    0.5071
0.0943    0.2019    0.6352    0.5901    0.3830    0.0413    0.0356         0    0.4714
0.5658    0.2695    0.1638    0.1187    0.0884    0.4302    0.5071    0.4714         0
0.3502    0.0540    0.3793    0.3342    0.1271    0.2146    0.2915    0.2559    0.2155

Column 10

0.3502
0.0540
0.3793
0.3342
0.1271
0.2146
0.2915
0.2559
0.2155
     0

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.