
SCGC: Self-Supervised Contrastive Graph Clustering & Influence Augmented Contrastive (IAC) loss

License: MIT

This repository contains a PyTorch implementation of "SCGC: Self-Supervised Contrastive Graph Clustering" (https://arxiv.org/abs/2204.12656).

The repo, including datasets and pretrained models, was initially forked from SDCN. We also use model code from AGCN and portions of the contrastive loss code from Graph-MLP.

Setup

  • Our code was tested on CUDA 11.3.0, Python 3.6.9 and PyTorch 1.3.1.
  • pip install -q munkres is needed for the Hungarian algorithm used by the evaluation metrics (a sketch of how it is used for clustering accuracy follows this list).
  • The code also runs on Google Colab (April 2022) with no modifications, once munkres is installed.
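
A minimal sketch of how munkres is typically used to compute clustering accuracy (ACC), assuming NumPy-style integer label arrays; this is illustrative, not the repo's exact evaluation code:

import numpy as np
from munkres import Munkres

def cluster_acc(y_true, y_pred):
    # Count the overlap between every (label, cluster) pair.
    labels, clusters = np.unique(y_true), np.unique(y_pred)
    overlap = np.array([[np.sum((y_true == l) & (y_pred == c))
                         for c in clusters] for l in labels])
    # Munkres minimises cost, so convert maximum overlap to minimum cost.
    cost = (overlap.max() - overlap).tolist()
    mapping = Munkres().compute(cost)       # optimal (label, cluster) pairs
    matched = sum(overlap[i][j] for i, j in mapping)
    return matched / len(y_true)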

Note: SCGC can also run without a GPU if the GPU timing code is commented out.
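
The GPU timing referred to above is the usual CUDA-event pattern; a hedged sketch follows (the repo's actual timing code may differ in detail):

import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
# ... training code being timed ...
end.record()
torch.cuda.synchronize()                 # wait until both events have completed
elapsed_ms = start.elapsed_time(end)     # GPU time in milliseconds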

Datasets

The dataset consists of two folders, data and graph. Please obtain them from the dataset Google Drive links. You will need to set --data_path to the parent folder containing data and graph. Note that the data folder also contains the pre-trained .pkl models, which we use directly from SDCN. The expected layout is sketched below.
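
Based on the description above, the layout pointed to by --data_path should be:

<data_path>/
    data/     # the datasets and the pre-trained .pkl models (from SDCN)
    graph/    # the graph files for each dataset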

Usage

  • All parameters are defined in train.py with comments and explanations.

  • To run SCGC on the 6 datasets, for 10 iterations each, use the following commands. This code has GPU time and memory profiling enabled, which can be turned off by commenting out the relevant code; our published ACC, NMI, ARI and F1 figures were obtained with profiling commented out.

python train.py --name usps --iterations 10 --epochs 200 --model SCGC --verbosity 0   --alpha 1 --beta 0.1 --order 4 --tau 0.5 --lr 0.001 
python train.py --name hhar --iterations 10 --epochs 200 --model SCGC --verbosity 0   --alpha 1 --beta 10  --order 4 --tau 2.25 --lr 0.001 
python train.py --name reut --iterations 10 --epochs 200 --model SCGC --verbosity 0   --alpha 3 --beta 0.1 --order 3 --tau 1 --lr 0.001 
python train.py --name acm  --iterations 10 --epochs 200 --model SCGC --verbosity 0   --alpha 0.5 --beta 0.1 --order 2 --tau 0.25 --lr 0.001 
python train.py --name dblp --iterations 10 --epochs 200 --model SCGC --verbosity 0   --alpha 0.5 --beta 0.1 --order 1 --tau 0.25 --lr 0.001 
python train.py --name cite --iterations 10 --epochs 200 --model SCGC --verbosity 0   --alpha 1 --beta 0.1 --order 1 --tau 0.25 --lr 0.0001 
  • To replicate SCGC*, which uses the Influence Augmented Contrastive (IAC) loss and 50% fewer model parameters, run the following (a sketch of the underlying contrastive loss follows these commands).
python train.py --name usps --iterations 10 --epochs 200 --model SCGC_TRIM --verbosity 0   --alpha 4 --beta 0.1 --order 4 --tau 0.25 --lr 0.001 --influence
python train.py --name hhar --iterations 10 --epochs 200 --model SCGC_TRIM --verbosity 0   --alpha 1 --beta 10  --order 3 --tau 2.25 --lr 0.001 --influence
python train.py --name reut --iterations 10 --epochs 200 --model SCGC_TRIM --verbosity 0   --alpha 0.5 --beta 0.1 --order 3 --tau 0.25 --lr 0.001 --influence
python train.py --name acm  --iterations 10 --epochs 200 --model SCGC_TRIM --verbosity 0   --alpha 1 --beta 0.1 --order 1 --tau 0.25 --lr 0.001 --influence
python train.py --name dblp --iterations 10 --epochs 200 --model SCGC_TRIM --verbosity 0   --alpha 1 --beta 0.1 --order 1 --tau 0.25 --lr 0.001 --influence
python train.py --name cite --iterations 10 --epochs 200 --model SCGC_TRIM --verbosity 0   --alpha 1 --beta 0.1 --order 1 --tau 0.25 --lr 0.0001 --influence
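
For orientation, here is a hedged sketch of the Graph-MLP-style neighbourhood-contrastive loss that SCGC's contrastive term builds on; the influence weighting that distinguishes IAC is omitted (see the paper for the exact formulation), and all names here are illustrative:

import torch
import torch.nn.functional as F

def neighbour_contrast_loss(z, adj_mask, tau):
    # z: (N, d) node embeddings; adj_mask: (N, N) 0/1 tensor marking
    # (multi-hop) neighbours as positives; tau: the --tau temperature.
    z = F.normalize(z, dim=1)
    sim = torch.exp(torch.mm(z, z.t()) / tau)    # pairwise similarities
    pos = (sim * adj_mask).sum(dim=1)            # similarity to neighbours only
    total = sim.sum(dim=1) - sim.diag()          # all pairs except self
    return -torch.log(pos / total + 1e-8).mean()

A reasonable reading of the --order flag, following Graph-MLP, is the number of adjacency powers (hops) used to build such a neighbour mask; see the code and paper for the authoritative definition.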

The usps output is:

Namespace(alpha=4.0, batch_size=2048, beta=0.1, cuda=True, data_path='/content/drive/MyDrive/001_Clustering/_Dataset_SDCN', epochs=200, influence=True, iterations=1, k=3, lr=0.001, mode='full', model='SCGC_TRIM', n_clusters=10, n_input=256, n_z=10, name='usps', note='-', order=4, seed=42, tau=0.25, verbosity=0)
---------------PROFILING CODE--------------
Loaded PAE acc:0.7098  nmi:0.6748  ari:0.5874  f1:0.6968
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls  Total GFLOPs  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                       _MODEL_TRAIN_ALL        27.73%       50.714s        99.25%      181.531s      181.531s       0.000us         0.00%       26.670s       26.670s          -4 b      -2.41 Gb           0 b    -367.99 Gb             1            --  
                                           _MODEL_TRAIN         0.06%     109.462ms         0.23%     421.977ms       2.110ms       0.000us         0.00%        3.386s      16.931ms       2.34 Kb        -800 b     876.65 Mb     -42.76 Gb           200            --  
                                              _MODEL_KL         0.01%      19.598ms         0.07%     118.949ms     594.745us       0.000us         0.00%      11.177ms      55.885us         800 b      -2.34 Kb      71.00 Mb    -199.50 Kb           200            --  
                                            _MODEL_DIST         1.03%        1.888s        14.81%       27.096s     135.478ms       0.000us         0.00%       15.566s      77.831ms        -800 b     -64.41 Gb     258.14 Gb    -193.11 Gb           200            --  
                                     _MODEL_CONTRASTIVE         0.03%      48.284ms         0.08%     149.847ms     749.235us       0.000us         0.00%        4.008s      20.040ms         800 b      -2.34 Kb      64.48 Gb    -128.92 Gb           200            --  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 182.898s
Self CUDA time total: 50.388s
Z:acc-nmi-ari-F1-gpu-clock:  0.8489,  0.8489,  0.8489,  0.0000,|, 0.8411,  0.8411,  0.8411,  0.0000,|, 0.7941,  0.7941,  0.7941,  0.0000,|, 0.8152,  0.8152,  0.8152,  0.0000,|,106441.4219, 106441.4219, 106441.4219,  0.0000,|,106.4405, 106.4405, 106.4405,  0.0000,||, Namespace(alpha=4.0, batch_size=2048, beta=0.1, cuda=True, data_path='/content/drive/MyDrive/001_Clustering/_Dataset_SDCN', epochs=200, influence=True, iterations=1, k=3, lr=0.001, mode='full', model='SCGC_TRIM', n_clusters=10, n_input=256, n_z=10, name='usps', note='-', order=4, seed=42, tau=0.25, verbosity=0)
Namespace(alpha=1.0, batch_size=2048, beta=10.0, cuda=True, data_path='/content/drive/MyDrive/001_Clustering/_Dataset_SDCN', epochs=200, influence=True, iterations=1, k=5, lr=0.001, mode='full', model='SCGC_TRIM', n_clusters=6, n_input=561, n_z=10, name='hhar', note='-', order=3, seed=42, tau=2.25, verbosity=0)

The line beginning Z:acc-nmi-ari-F1-gpu-clock gives the min, max, avg and std of ACC, NMI, ARI, F1, GPU time and CPU time, followed by || and all the args. The _MODEL_XXX profiling contexts capture logical model functions and training, roughly as sketched below; please see the code for more information.
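
A hedged sketch of how _MODEL_XXX rows like those in the table above can be produced with PyTorch's profiler (this assumes PyTorch >= 1.8 and may differ from the repo's actual profiling code):

import torch
from torch.profiler import profile, record_function, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
             profile_memory=True) as prof:
    with record_function("_MODEL_TRAIN_ALL"):
        for epoch in range(200):
            with record_function("_MODEL_TRAIN"):
                pass  # forward pass, losses, backward, optimiser step
print(prof.key_averages().table(sort_by="self_cpu_time_total"))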

  • To run the profiling for SDCN and AGCN we used the following commands, which use the SDCN and AGCN defaults.
python train.py --name usps --iterations 1 --epochs 200 --model SDCN --verbosity 0   --lr 0.001 --alpha 0.01 --beta 0.1
python train.py --name hhar --iterations 1 --epochs 200 --model SDCN --verbosity 0   --lr 0.001 --alpha 0.01 --beta 0.1
python train.py --name reut --iterations 1 --epochs 200 --model SDCN --verbosity 0   --lr 0.0001 --alpha 0.01 --beta 0.1
python train.py --name acm  --iterations 1 --epochs 200 --model SDCN --verbosity 0   --lr 0.001 --alpha 0.01 --beta 0.1
python train.py --name dblp --iterations 1 --epochs 200 --model SDCN --verbosity 0   --lr 0.001 --alpha 0.01 --beta 0.1
python train.py --name cite --iterations 1 --epochs 200 --model SDCN --verbosity 0   --lr 0.0001 --alpha 0.01 --beta 0.1

python train.py --name usps --iterations 1 --epochs 200 --model AGCN --verbosity 0   --lr 0.001 --alpha 1000 --beta 1000
python train.py --name hhar --iterations 1 --epochs 200 --model AGCN --verbosity 0   --lr 0.001 --alpha 0.1 --beta 1
python train.py --name reut --iterations 1 --epochs 200 --model AGCN --verbosity 0   --lr 0.0001 --alpha 10 --beta 10
python train.py --name acm  --iterations 1 --epochs 200 --model AGCN --verbosity 0   --lr 0.001 --alpha 0.01 --beta 0.1
python train.py --name dblp --iterations 1 --epochs 200 --model AGCN --verbosity 0   --lr 0.001 --alpha 0.01 --beta 0.1
python train.py --name cite --iterations 1 --epochs 200 --model AGCN --verbosity 0   --lr 0.0001 --alpha 0.01 --beta 0.1

Data sources and code

Datasets and code are forked from SDCN. We also use model code from AGCN and portions of the contrastive loss code from Graph-MLP. We acknowledge and thank the authors of these works for sharing their code.

Citation

@article{kulatilleke2022scgc,
  title={SCGC : Self-Supervised Contrastive Graph Clustering}, 
  author={Kulatilleke, Gayan K and Portmann, Marius and Chandra, Shekhar S},
  journal={arXiv preprint arXiv:2204.12656},
  year={2022}
}
