
harp's Introduction

HARP

Code for the AAAI 2018 paper "HARP: Hierarchical Representation Learning for Networks". HARP is a meta-strategy to improve several state-of-the-art network embedding algorithms, such as DeepWalk, LINE and Node2vec.

You can read the preprint of our paper on arXiv.

This code runs with Python 2.

Installation

The following Python packages are required to install HARP.

magicgraph is a library for processing graph data. To install, run the following commands:

git clone https://github.com/phanein/magic-graph.git
cd magic-graph
python setup.py install

Then, install HARP and the other requirements:

git clone https://github.com/GTmac/HARP.git
cd HARP
pip install -r requirements.txt

Usage

To run HARP on the CiteSeer dataset using LINE as the underlying network embedding model, run the following command:

python src/harp.py --input example_graphs/citeseer/citeseer.mat --model line --output citeseer.npy --sfdp-path bin/sfdp_linux

Parameters available:

--input: input_filename

  1. --format mat for a Matlab .mat file containing an adjacency matrix. By default, the variable name of the adjacency matrix is network; you can also specify it with --matfile-variable-name.

  2. --format adjlist for an adjacency list, e.g.:

    1 2 3 4 5 6 7 8 9 11 12 13 14 18 20 22 32

    2 1 3 4 8 14 18 20 22 31

    3 1 2 4 8 9 10 14 28 29 33

    ...

  3. --format edgelist for an edge list, e.g.:

    1 2

    1 3

    1 4

    2 5

    ...
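The two text formats above can be read with a few lines of Python. This is only a sketch of what the formats look like when parsed, not HARP's own loader: the function names parse_adjlist and parse_edgelist are hypothetical, and the graph is assumed to be undirected.

```python
from collections import defaultdict


def parse_adjlist(lines):
    """Each line holds a node ID followed by the IDs of its neighbors."""
    graph = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        node = int(fields[0])
        graph[node]  # make sure isolated nodes are recorded too
        for neighbor in (int(f) for f in fields[1:]):
            # Store both directions (assumption: undirected graph).
            graph[node].add(neighbor)
            graph[neighbor].add(node)
    return graph


def parse_edgelist(lines):
    """Each line holds the two node IDs of one edge."""
    graph = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        u, v = int(fields[0]), int(fields[1])
        graph[u].add(v)
        graph[v].add(u)
    return graph


adj = parse_adjlist(["2 1 3 4", "3 1 2"])
edges = parse_edgelist(["1 2", "1 3", "1 4", "2 5"])
```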

--output: output_filename The output representations in NumPy .npy format. Note that we assume the nodes in your input file are indexed from 0 to N - 1.
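If your node IDs are not already 0 to N - 1, one option is to relabel them before running HARP and keep the mapping for later lookups. The sketch below is an illustration under assumptions: remap_to_contiguous is a hypothetical helper, the saved array is made-up stand-in data rather than real HARP output, and row i of the .npy file is assumed to hold the vector for node i.

```python
import os
import tempfile

import numpy as np


def remap_to_contiguous(edges):
    """Relabel node IDs to 0..N-1; return the new edges and the ID mapping."""
    ids = sorted({node for edge in edges for node in edge})
    mapping = {old: new for new, old in enumerate(ids)}
    remapped = [(mapping[u], mapping[v]) for u, v in edges]
    return remapped, mapping


edges = [(1, 2), (2, 1719), (1719, 5)]
new_edges, mapping = remap_to_contiguous(edges)

# Stand-in for a HARP output file: one 4-dimensional row per node.
fake = np.arange(len(mapping) * 4, dtype=np.float32).reshape(len(mapping), 4)
path = os.path.join(tempfile.mkdtemp(), "example.npy")
np.save(path, fake)

# Load the embeddings and look up the vector of original node 1719.
embeddings = np.load(path)
vector = embeddings[mapping[1719]]
```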

--model model_name The underlying network embedding model to use. Could be deepwalk, line or node2vec. Note that node2vec uses the default parameters, which are p=1.0 and q=1.0.

--sfdp-path sfdp_path Path to the binary file of SFDP, which is the module we used for graph coarsening. You can set it to sfdp_linux, sfdp_osx or sfdp_windows.exe depending on your operating system.
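A wrapper script could pick the binary name from the current platform instead of hard-coding it. This is a minimal sketch: default_sfdp_path is a hypothetical helper that is not part of HARP, and the bin directory layout is taken from the description above.

```python
import os
import platform

# Binary names as listed above, keyed by the value of platform.system().
SFDP_BINARIES = {
    "Linux": "sfdp_linux",
    "Darwin": "sfdp_osx",
    "Windows": "sfdp_windows.exe",
}


def default_sfdp_path(bin_dir="bin", system=None):
    """Return the path of the bundled SFDP binary for the given OS."""
    system = system or platform.system()
    if system not in SFDP_BINARIES:
        raise ValueError("No bundled SFDP binary for %r" % system)
    return os.path.join(bin_dir, SFDP_BINARIES[system])


linux_path = default_sfdp_path("bin", "Linux")
```

The result can then be passed to --sfdp-path.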

More options: The full list of command line options is available with python src/harp.py --help.

Evaluation

To evaluate the embeddings on a multi-label classification task, run the following command:

python src/scoring.py -e citeseer.npy -i example_graphs/citeseer/citeseer.mat -t 1 2 3 4 5 6 7 8 9

Here, -e specifies the embeddings file, -i specifies the .mat file containing node labels, and -t specifies the list of training example ratios to use.

Note

SFDP is a library for multi-level graph drawing, which is part of GraphViz. We use SFDP for graph coarsening in this implementation. Note that SFDP is included as a binary file under /bin; please choose the proper binary file for your operating system. Currently we provide binaries for OSX, Linux and Windows.

Citation

If you find HARP useful in your research, please cite our paper:

@inproceedings{harp,
	title={HARP: Hierarchical Representation Learning for Networks},
	author={Chen, Haochen and Perozzi, Bryan and Hu, Yifan and Skiena, Steven},
	booktitle={Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence},
	year={2018},
	organization={AAAI Press}
}

harp's People

Contributors

example123, gtmac, mongooma


harp's Issues

Is some extra code required for disconnected graphs?

Hi, I'm trying to apply your method to my project.

But when I ran the code on a disconnected graph, I got an index error for every sub-graph except the largest one (in graph_coarsening.py, in the skipgram_coarsening_disconnected function).
I think this is because you pass the parameter 'recursive_graphs' to the skipgram_coarsening_hs function (line 307) without re-initializing it for the next sub-graph.

Am I understanding your code correctly?
If so, I suggest adding a line like 'recursive_graphs = None' between lines 343 and 344.

The code does not coarsen the graph

I have run your code, but I found that it did not coarsen the graph. When I tried to find the reason, it led me to sfdp, the method you use to coarsen the graph. Since I could not find any useful information on the GraphViz website, I am turning to you for help.
I ran the code on Colab, using Python 2, with the CiteSeer dataset. I got the following output.

Number of nodes: 3312
Number of edges: 9072
Underlying network embedding model: node2vec
{'scale': -1, 'num_paths': 10, 'path_length': 80, 'sg': 1, 'sfdp_path': 'bin/sfdp_linux', 'iter_count': 1, 'sample': 0.1, 'window_size': 10, 'lr_scheme': 'default', 'representation_size': 128, 'hs': 0, 'coarsening_scheme': 2, 'alpha': 0.025, 'min_alpha': 0.001}
Subgraph 1 with 2110 nodes and 7336 edges
Graph Coarsening...
Training negative sampling model...
Start building Skip-gram + Negative Sampling model on the coarsened graphs...
Training on graph level 0...
Finish building Skip-gram model on the coarsened graphs.
Subgraph 2 with 1202 nodes and 1784 edges
Training the Negative Sampling Model...
Finish training the Skip-gram model.

Then I checked the 'tempdir' used by 'read_coarsening_info' and 'external_ec_coarsening' in graph_coarsening. There were only two files there.

['x', 'tmp.mtx']

Looking forward to your reply.
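One thing that may be worth checking when the temporary directory contains no coarsening output is whether the SFDP binary can be run at all. This is a guess rather than a confirmed diagnosis, and check_binary is a hypothetical helper that is not part of HARP. The demo uses a throwaway file, not the real binary.

```python
import os
import tempfile


def check_binary(path):
    """Report whether a file exists and has execute permission."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


# Demo on a throwaway file standing in for bin/sfdp_linux.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "sfdp_fake")
open(demo_path, "w").close()

os.chmod(demo_path, 0o644)  # no execute bit set
before = check_binary(demo_path)

os.chmod(demo_path, 0o755)  # roughly what `chmod +x` would do
after = check_binary(demo_path)

absent = check_binary(os.path.join(demo_dir, "does_not_exist"))
```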

graph coarsening methods don't work

The source code uses sfdp library for graph coarsening.
But when I tried these three standard datasets, I found that the graph was not coarsened.
That is, after the statement "recursive_graphs, recursive_merged_nodes = [], read_coarsening_info( temp_dir)" in the external_ec_coarsening() function in graph_coarsening.py, the recursive_merged_nodes I received was an empty list.
I want to know what happened because this caused the external_ec_coarsening() function to directly return the original graph.

Environment issue

I would like to ask why version 0.19.1 of scipy cannot be installed.

TypeError: sequence item 0: expected string or Unicode, int found

When I run
python src/harp.py --input example_graphs/citeseer/citeseer.mat --model deepwalk --output citeseer.npy --sfdp-path bin/sfdp_linux
it throws the following error:
Number of nodes: 3312
Number of edges: 9072
Underlying network embedding model: deepwalk
{'scale': -1, 'num_paths': 40, 'path_length': 10, 'sg': 1, 'sfdp_path': 'bin/sfdp_linux', 'iter_count': 1, 'sample': 0.1, 'window_size': 10, 'lr_scheme': 'default', 'representation_size': 128, 'hs': 1, 'coarsening_scheme': 2, 'alpha': 0.025, 'min_alpha': 0.001}
Subgraph 1 with 2110 nodes and 7336 edges
Graph Coarsening...
Original graph with 2110 nodes and 7336 edges
Coarsening Round 1:
Generate coarsened graph with 1316 nodes and 4948 edges
Coarsening Round 2:
Generate coarsened graph with 867 nodes and 3386 edges
Coarsening Round 3:
Generate coarsened graph with 599 nodes and 2394 edges
Coarsening Round 4:
Generate coarsened graph with 426 nodes and 1790 edges
Coarsening Round 5:
Generate coarsened graph with 312 nodes and 1356 edges
Coarsening Round 6:
Generate coarsened graph with 178 nodes and 792 edges
Coarsening Round 7:
Generate coarsened graph with 108 nodes and 454 edges
Coarsening Round 8:
Generate coarsened graph with 71 nodes and 254 edges
Coarsening Round 9:
Generate coarsened graph with 53 nodes and 150 edges
Coarsening Round 10:
Generate coarsened graph with 36 nodes and 84 edges
Coarsening Round 11:
Generate coarsened graph with 27 nodes and 54 edges
Coarsening Round 12:
Generate coarsened graph with 19 nodes and 36 edges
Coarsening Round 13:
Generate coarsened graph with 13 nodes and 24 edges
Coarsening Round 14:
Generate coarsened graph with 9 nodes and 16 edges
Coarsening Round 15:
Generate coarsened graph with 6 nodes and 10 edges
Coarsening Round 16:
Generate coarsened graph with 4 nodes and 6 edges
{'window_size': 10, 'num_paths': 40, 'path_length': 10, 'sg': 1, 'iter': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'progress_threshold': 100000, 'sample': 0.1, 'scale': -1, 'lr_scheme': 'default', 'representation_size': 128, 'hs': 1, 'alpha': 0.025, 'report_loss': False, 'min_alpha': 0.001}
Start building Skip-gram + Hierarchical Softmax model on the coarsened graphs...
Training on graph level 16...
Traceback (most recent call last):
File "src/harp.py", line 77, in <module>
sys.exit(main())
File "src/harp.py", line 62, in main
lr_scheme='default',alpha=0.025,min_alpha=0.001,sg=1,hs=1,coarsening_scheme=2, sample=0.1)
File "/home/rainsong/Desktop/HARP-master/src/graph_coarsening.py", line 312, in skipgram_coarsening_disconnected
sample=sample)
File "/home/rainsong/Desktop/HARP-master/src/graph_coarsening.py", line 385, in skipgram_coarsening_hs
edges = build_deepwalk_corpus(recursive_graphs[level], num_paths, path_length, output)
File "/home/rainsong/Desktop/HARP-master/src/graph_coarsening.py", line 542, in build_deepwalk_corpus
num_workers=20)
File "/usr/local/lib/python2.7/dist-packages/deepwalk-1.0.3-py2.7.egg/deepwalk/walks.py", line 85, in write_walks_to_disk
for file_ in executor.map(_write_walks_to_disk, args_list):
File "/home/rainsong/.local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 641, in result_iterator
yield fs.pop().result()
File "/home/rainsong/.local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 455, in result
return self.__get_result()
File "/home/rainsong/.local/lib/python2.7/site-packages/concurrent/futures/_base.py", line 414, in __get_result
raise exception_type, self._exception, self._traceback
TypeError: sequence item 0: expected string or Unicode, int found
Is there anyone who can help me?
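For context, this kind of TypeError is what Python raises when str.join receives items that are not strings. One possible cause, not confirmed against the deepwalk or HARP sources, is that node IDs reach the walk writer as integers. The snippet below reproduces the error on a made-up walk and shows the usual conversion.

```python
walk = [12, 7, 3]  # made-up node IDs stored as integers

try:
    line = " ".join(walk)  # fails: join only accepts strings
except TypeError as err:
    message = str(err)

# Converting each ID to a string first avoids the error.
line = " ".join(str(node) for node in walk)
```

The exact wording of the message differs between Python 2 and Python 3, but both start with "sequence item 0".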

WindowsError: [Error 2]

F:\Users\Administrator\miniconda3\envs\HARP\python.exe D:/workspace-2022/reappear/HARP/src/harp.py --input D:\workspace-2022\reappear\HARP\example_graphs\citeseer\citeseer.mat --model line --output citeseer.npy --sfdp-path D:\workspace-2022\reappear\HARP\bin\sfdp_windows.exe
Number of nodes: 3312
Number of edges: 9072
Underlying network embedding model: line
{'window_size': 1, 'sg': 1, 'sfdp_path': 'D:\workspace-2022\reappear\HARP\bin\sfdp_windows.exe', 'iter_count': 50, 'sample': 0.001, 'scale': 1, 'lr_scheme': 'default', 'representation_size': 64, 'hs': 0, 'alpha': 0.025, 'min_alpha': 0.001}
Subgraph 1 with 2110 nodes and 7336 edges
Graph Coarsening...
Traceback (most recent call last):
File "D:/workspace-2022/reappear/HARP/src/harp.py", line 77, in <module>
sys.exit(main())
File "D:/workspace-2022/reappear/HARP/src/harp.py", line 73, in main
lr_scheme='default',alpha=0.025,min_alpha=0.001,sg=1,hs=0,sample=0.001)
File "D:\workspace-2022\reappear\HARP\src\graph_coarsening.py", line 295, in skipgram_coarsening_disconnected
recursive_graphs, recursive_merged_nodes = external_ec_coarsening(subgraph, sfdp_path)
File "D:\workspace-2022\reappear\HARP\src\graph_coarsening.py", line 221, in external_ec_coarsening
subprocess.call(['rm', '-r', temp_dir])
File "F:\Users\Administrator\miniconda3\envs\HARP\lib\subprocess.py", line 172, in call
return Popen(*popenargs, **kwargs).wait()
File "F:\Users\Administrator\miniconda3\envs\HARP\lib\subprocess.py", line 394, in __init__
errread, errwrite)
File "F:\Users\Administrator\miniconda3\envs\HARP\lib\subprocess.py", line 644, in _execute_child
startupinfo)
WindowsError: [Error 2]

I would be grateful if you could help me. Can you help me?
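The traceback ends in subprocess.call(['rm', '-r', temp_dir]), and [Error 2] means the system could not find the file, which is consistent with the rm command not being available on Windows. A portable way to delete a directory tree is shutil.rmtree from the standard library. The sketch below is a suggestion rather than a fix from the maintainers, and it runs on a throwaway directory.

```python
import os
import shutil
import tempfile

# Throwaway directory standing in for the coarsening temp_dir.
temp_dir = tempfile.mkdtemp()
open(os.path.join(temp_dir, "tmp.mtx"), "w").close()

# Portable replacement for subprocess.call(['rm', '-r', temp_dir]).
shutil.rmtree(temp_dir)

removed = not os.path.exists(temp_dir)
```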

WindowsError: [Error 2]

When I run the example F:\develop\Python\project\HARP>python src/harp.py --input example_graphs/citeseer/citeseer.mat --model deepwalk --output citeseer.npy --sfdp-path bin/sfdp_linux, I get the following error:

(python27) F:\develop\Python\project\HARP>python src/harp.py --input example_graphs/citeseer/citeseer.mat --model deepwalk --output citeseer.npy --sfdp-path bin/sfdp_linux
Number of nodes: 3312
Number of edges: 9072
Underlying network embedding model: deepwalk
{'scale': -1, 'num_paths': 40, 'path_length': 10, 'sg': 1, 'sfdp_path': 'bin/sfdp_linux', 'iter_count': 1, 'sample': 0.1, 'window_size': 10, 'lr_scheme': 'default', 'representation_size': 128, 'hs': 1, 'coarsening_scheme': 2, 'alpha': 0.025, 'min_alpha': 0.001}
Subgraph 1 with 2110 nodes and 7336 edges
Graph Coarsening...
Traceback (most recent call last):
File "src/harp.py", line 77, in <module>
sys.exit(main())
File "src/harp.py", line 62, in main
lr_scheme='default',alpha=0.025,min_alpha=0.001,sg=1,hs=1,coarsening_scheme=2, sample=0.1)
File "F:\develop\Python\project\HARP\src\graph_coarsening.py", line 295, in skipgram_coarsening_disconnected
recursive_graphs, recursive_merged_nodes = external_ec_coarsening(subgraph, sfdp_path)
File "F:\develop\Python\project\HARP\src\graph_coarsening.py", line 221, in external_ec_coarsening
subprocess.call(['rm', '-r', temp_dir])
File "F:\develop\Anaconda3\envs\python27\lib\subprocess.py", line 172, in call
return Popen(*popenargs, **kwargs).wait()
File "F:\develop\Anaconda3\envs\python27\lib\subprocess.py", line 394, in __init__
errread, errwrite)
File "F:\develop\Anaconda3\envs\python27\lib\subprocess.py", line 644, in _execute_child
startupinfo)
WindowsError: [Error 2]

I would be grateful if you could help me. Can you help me?

Execution fails with HARP (DeepWalk)

Hi, I can run HARP using LINE as the underlying network embedding model, but it fails when using the DeepWalk embedding model. I have attached an image of each run. The first image is for the LINE run and the second image is for the DeepWalk run, which fails. Please advise. Thanks.
Untitled1
Untitled2

IndexError: index 1719 is out of bounds for axis 0 with size 1710

Hello, when I use a .mat file with a 2159*2159 network and a 2159*573 group, I get the following error:

Finish building Skip-gram model on the coarsened graphs.
Traceback (most recent call last):
File "src/harp.py", line 77, in <module>
sys.exit(main())
File "src/harp.py", line 62, in main
lr_scheme='default',alpha=0.025,min_alpha=0.001,sg=1,hs=1,coarsening_scheme=2, sample=0.5)
File "/home/rainsong/Desktop/2/HARP-master/src/graph_coarsening.py", line 333, in skipgram_coarsening_disconnected
embeddings[real_ind] = vec
IndexError: index 1719 is out of bounds for axis 0 with size 1710

I left the parameters at their defaults.
Can you help me? Thank you very much.
