Peregrine

Peregrine: A Pattern-Aware Graph Mining System

Peregrine is an efficient, single-machine system for performing data mining tasks on large graphs. Some graph mining applications include:

  • Finding frequent subgraphs
  • Generating the motif/graphlet distribution
  • Finding all occurrences of a subgraph

Peregrine is highly programmable, so you can easily develop your own graph mining applications using its novel, declarative, graph-pattern-centric API. To write a Peregrine program, you describe which graph patterns you are interested in mining, and what to do with each occurrence of those patterns. You provide the what and the runtime handles the how.

For full details, you can read our paper published in EuroSys 2020 or the longer version on arXiv. For a deeper look at optimizations under the hood, check out our article in the 2021 ACM Operating Systems Review.

For an in-depth summary, watch the video presentation:

EuroSys 2020 Presentation Thumbnail

TL;DR: compared to other state-of-the-art open-source graph mining systems, Peregrine:

  • executes up to 700x faster
  • consumes up to 100x less memory
  • scales to 100x larger data sets
  • on 8x fewer machines
  • with a simpler, more expressive API

Table of Contents

  1. Quick start
  2. Writing your own programs
  3. Data graphs
  4. Reproducing our EuroSys 2020 paper results
  5. Contributing
  6. Acknowledgements
  7. Resources

1. Quick start

Peregrine has been tested on Ubuntu 18.04 and Arch Linux but should work on any POSIX-y OS. It requires C++20 support (GCC version >= 10.1). Additionally, the tests require UnitTest++.

Ubuntu 18.04 prerequisites:

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt install g++-10 libunittest++-dev

To build Peregrine:

$ git clone https://github.com/pdclab/Peregrine.git
$ cd Peregrine
$ source tbb2020/bin/tbbvars.sh intel64
$ make -j CC=g++-10
$ bin/test

Several sample applications, query patterns, and a sample dataset are released with the code. Calling any of the applications without arguments will show you what they expect:

$ bin/count
USAGE: bin/count <data graph> <pattern | #-motifs | #-clique> [# threads]

These applications print their results in <pattern>: <aggregation value> format.

For example, motif-counting:

$ bin/count data/citeseer 3-motifs 8
Counting 3-motifs
Finished reading datagraph: |V| = 3264 |E| = 4536
[...]
All patterns finished after 0.030265s
[2-3][1-3]: 23380
[1-2][1-3][2-3]: 1166

The string [2-3][1-3] encodes the pattern consisting of edges (2, 3) and (1, 3), and 23380 is the number of unique occurrences Peregrine found in the citeseer graph.

Other applications give similar output:

$ bin/count data/citeseer 4-clique 8
[...]
All patterns finished after 0.030265s
[3-4][1-2][1-3][1-4][2-3][2-4]: 255
$
$ bin/count data/citeseer query/p1.graph 8
[...]
All patterns finished after 0.003368s
[3-4][1-2][1-3][1-4][2-3]: 3730

FSM provides support values instead of counts:

$ bin/fsm data/citeseer 3 300 8 # size 3 FSM with support 300
[...]
Frequent patterns:
[1,0-2,0][1,0-3,0][2,0-4,0]: 303
[1,1-2,1][1,1-3,1][2,1-4,1]: 335
Finished in 0.078629s

The sample FSM app performs edge-induced FSM by default. For vertex-induced FSM, simply call it as

$ bin/fsm data/citeseer 3 300 8 v # vertex-induced

The existence-query application simply states whether the desired pattern exists or not:

$ bin/existence-query data/citeseer 14-clique 8
[...]
All patterns finished after 0.005509s
[pattern omitted due to length] doesn't exist in data/citeseer

The output application stores matches grouped by pattern in either CSV or packed binary format:

$ bin/output data/citeseer 3-motifs results bin 8
[...]
all patterns finished after 0.002905s
[1-3](1~2)[2-3]: 23380 matches stored in "results/[1-3](1~2)[2-3]"
[1-2][1-3][2-3]: 1166 matches stored in "results/[1-2][1-3][2-3]"

In the CSV format, for a pattern with n vertices, each line contains one match in the form:

v1,v2,...,vn
...

In the binary format, matches are written as sequences of n 4-byte vertex IDs in binary, with no delimiters:

bits(v1)bits(v2)...bits(vn)...

2. Writing your own programs

In Peregrine's programming model, you provide a data graph, a set of patterns you're interested in, and a callback the system will apply to each occurrence of these patterns in your data graph. We present a brief overview of the API here, beginning with constructing patterns.

For all of the following code snippets, assume we are using namespace Peregrine.

We have not released support for directed graphs yet; the code currently assumes all graphs are undirected.

2.1 Constructing patterns directly

Pattern graphs can be constructed in two ways using the SmallGraph data structure:

2.1.1 Statically, from a file

Given a file in the following edge-list format:

<vertex-id> [label] <vertex-id> [label]

where the labels are optional 32-bit integers and vertex IDs are contiguous integers starting from 1. To indicate that a vertex is unlabelled in a partially-labelled pattern, assign it label -1. To indicate an anti-edge, place any extra integer at the end of the line.

For example, a triangle:

1 2
1 3
2 3

A vertex-induced 3-star; notice that the last edge is an anti-edge:

1 2
1 3
2 3 1

A partially-labelled triangle (vertex 3 is unlabelled):

1 100 2 101
1 100 3 -1
2 101 3 -1

Then, construct the SmallGraph:

SmallGraph p("pattern_graph.txt");

2.1.2 Dynamically, using the builder methods

Construct an empty graph and add edges/anti-edges one by one:

SmallGraph p;
p.add_edge(1, 2)
  .add_edge(1, 3)
  .add_anti_edge(2, 3)
  .set_label(1, 100)
  .set_label(2, 101)
  .set_label(3, -1);

2.2 Constructing patterns using the PatternGenerator

The PatternGenerator class is useful for:

Quickly generating common patterns:

SmallGraph triangle = PatternGenerator::clique(3);
SmallGraph wedge = PatternGenerator::star(3);

Quickly generating many patterns:

int size = 4;
std::vector<SmallGraph> vertex_induced = PatternGenerator::all(size,
        PatternGenerator::VERTEX_BASED,        // 4 vertices
        PatternGenerator::INCLUDE_ANTI_EDGES); // anti-edges are inserted between all unadjacent vertices

std::vector<SmallGraph> vertex_induced_edge_based = PatternGenerator::all(size,
        PatternGenerator::EDGE_BASED,          // 4 edges
        PatternGenerator::INCLUDE_ANTI_EDGES); // anti-edges are inserted between all unadjacent vertices

std::vector<SmallGraph> edge_induced = PatternGenerator::all(size,
        PatternGenerator::EDGE_BASED,          // 4 edges
        PatternGenerator::EXCLUDE_ANTI_EDGES); // no extra anti-edges are inserted

std::vector<SmallGraph> edge_induced_vertex_based = PatternGenerator::all(size,
        PatternGenerator::VERTEX_BASED,        // 4 vertices
        PatternGenerator::EXCLUDE_ANTI_EDGES); // no extra anti-edges are inserted

Extending existing patterns:

std::vector<SmallGraph> vertex_extensions = PatternGenerator::extend({triangle}, PatternGenerator::VERTEX_BASED);
std::vector<SmallGraph> edge_extensions = PatternGenerator::extend({triangle}, PatternGenerator::EDGE_BASED);

2.3 Importing data graphs

Data graphs can be constructed from a preprocessed edge-list (see Data graphs) or a SmallGraph.

DataGraph dg("preprocessed_data/");   // from a preprocessed edge-list

SmallGraph g("small_data_graph.txt"); // or from a SmallGraph
DataGraph dg2(g);

2.4 Matching patterns

Pattern matching is done through the match, count, and output functions.

Counting instances of patterns is very simple. Consider the following minimal motifs program:

#include "Peregrine.hh"
using namespace Peregrine;
void motifs(int size)
{
  int nthreads = 16;
  DataGraph g("data/citeseer/");

  auto patterns = PatternGenerator::all(size, PatternGenerator::VERTEX_BASED, PatternGenerator::INCLUDE_ANTI_EDGES);

  auto results = count(g, patterns, nthreads);

  for (const auto &[pattern, count] : results)
  {
    std::cout << pattern << ": " << count << std::endl;
  }
}

The arguments to count are straightforward: the data graph you wish to use, the patterns whose instances you wish to count, and the number of worker threads to use.

For arbitrary aggregations, use the match or output template functions. For example, you could replace count above with this snippet (note that using count will be faster):

  const auto callback = [](auto &&handle, auto &&match) { handle.map(match.pattern, 1); };
  auto results = match<Pattern, uint64_t, AT_THE_END, UNSTOPPABLE>(g, patterns, nthreads, callback);

First, let's explain the template arguments to match:

  • Peregrine::Pattern is the aggregation key type
  • uint64_t is the aggregation value type
  • AT_THE_END describes when aggregation happens: either AT_THE_END of execution or ON_THE_FLY
  • UNSTOPPABLE indicates that you will not be using early termination

The regular parameters are the same as the count example except for callback. This is a function that will be called on each match for a pattern. It takes two arguments: a handle to Peregrine's aggregator, and the match itself.

Here, callback is mapping the pattern of the match to 1. Note that this lines up with the types passed to match: the aggregation key is Pattern and the value is an integer. Peregrine automatically takes care of simple types like uint64_t, aggregating them with the built-in sum operator.

Using more complicated aggregation values requires them to implement a few methods and a view function. The sample FSM application is a great example: check out both apps/fsm.cc and apps/Domain.hh to see a view function and the aggregation value methods.

Users can call three methods on the handle:

  • map(key, value): maps key to value
  • read_value(key): returns a view of the value mapped by key
  • stop(): halts exploration early

The output function takes the same template parameters and arguments as match, but allows you to write matches to disk using the callback:

  const auto callback = [](auto &&handle, auto &&match)
  {
    handle.map(match.pattern, 1);
    handle.template output<CSV>(match.mapping); // this writes out matches
  };
  auto results = output<Pattern, uint64_t, AT_THE_END, UNSTOPPABLE>(g, patterns, nthreads, callback);

You can pass output an additional argument to specify where to put results (by default it writes them under "."):

  auto results = output<Pattern, uint64_t, AT_THE_END, UNSTOPPABLE>(g, patterns, nthreads, callback, default_view<uint64_t>, "/destination/path/");

In the simple case that you want to output all matches and don't require any other processing, you can call an overloaded definition of output:

auto results = output<CSV>(g, patterns, nthreads, "/destination/path/");

You can output matches in CSV or binary formats (Peregrine::CSV or Peregrine::BIN).

3. Data graphs

Peregrine's data processor ingests graph edge-lists and stores them in binary adjacency-list format. For labeled data graphs, a separate file of vertex-label pairs is used. Both vertex IDs and labels should be unsigned 32-bit integers.

Edge-list file format:

<vertex-id> <vertex-id>

Label file format:

<vertex-id> <label>

Given such files edges.txt and labels.txt, the processor is used as follows:

$ cd Peregrine
$ mkdir data/my-graph
$ make convert_data
$ bin/convert_data edges.txt labels.txt data/my-graph/

To convert an unlabeled dataset, labels.txt can be omitted.

4. Reproducing our EuroSys 2020 paper results

See the guide.

Note that these instructions and datasets only work with the eurosys20-experiments branch.

5. Contributing

A sincere thank you for deciding to spend your valuable time on this project! PRs are welcome, and will be carefully considered so long as they do not degrade performance or compromise correctness.

If you want to understand the system, the most important bits are in:

  1. Peregrine.hh, which contains the main entry-points to the system
  2. PatternMatching.hh, which contains the pattern matching engine
  3. Graph.hh, where pattern analysis happens

You can also contribute in several ways without having to dig into Peregrine internals:

  • Submit your applications to be included as samples
  • Point out or improve missing or unsatisfactory documentation
  • Complain about confusing errors
  • Suggest features you wish Peregrine had
  • Report correctness or performance bugs

6. Acknowledgements

We are grateful for the other open-source projects this codebase relies on.

7. Resources

If you use this software project, please cite:

@inproceedings{10.1145/3342195.3387548,
  author = {Jamshidi, Kasra and Mahadasa, Rakesh and Vora, Keval},
  title = {Peregrine: A Pattern-Aware Graph Mining System},
  year = {2020},
  isbn = {9781450368827},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3342195.3387548},
  doi = {10.1145/3342195.3387548},
  booktitle = {Proceedings of the Fifteenth European Conference on Computer Systems},
  articleno = {13},
  numpages = {16},
  location = {Heraklion, Greece},
  series = {EuroSys ’20}
}

@article{10.1145/3469379.3469381,
  author = {Jamshidi, Kasra and Vora, Keval},
  title = {A Deeper Dive into Pattern-Aware Subgraph Exploration with PEREGRINE},
  year = {2021},
  issue_date = {July 2021},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {55},
  number = {1},
  issn = {0163-5980},
  url = {https://doi.org/10.1145/3469379.3469381},
  doi = {10.1145/3469379.3469381},
  journal = {SIGOPS Oper. Syst. Rev.},
  month = jun,
  pages = {1–10},
  numpages = {10}
}

peregrine's People

Contributors

cristiancantoro, kiran-r, kjamsh, pdclab

peregrine's Issues

segmentation fault when converting simple input & seemingly incorrect FSM results

Hello,

I'm new to frequent graph mining and hope to use this tool as a black box.

I ran into the following issues. Could you help take a look?

  1. A simple input seems to crash the bin/convert_data. I'm running Ubuntu 18.04

graph.txt

0 1
1 2
2 3
3 0

Command:
bin/convert_data graph.txt test/graph_converted

Result:

Read edge-list in 5.7e-05s
Segmentation fault (core dumped)
  2. Another input could be converted by bin/convert_data, but I could not understand the FSM results:
1 2
2 3
3 4
1 5
2 6
3 7
4 8
5 6
6 7
7 8
5 9
6 10
7 11
8 12
9 10
10 11
11 12
9 13
10 14
11 15
12 16
13 14
14 15
15 16

command:
./bin/fsm test/sample1_bin/ 4 4 v

results:

4-FSM with threshold 4
-------
all patterns finished after 0.014621s
-------
all patterns finished after 0.005576s
-------
all patterns finished after 0.001604s
-------
all patterns finished after 0.015546s
-------
all patterns finished after 0.036467s
3 frequent patterns: 
[1,0-2,0][1,0-3,0][2,0-4,0](1,0~4,0)(2,0~3,0)(3,0~4,0): 15
[1,0-2,0][1,0-3,0][2,0-4,0][3,0-4,0](1,0~4,0)(2,0~3,0): 15
[1,0-2,0][1,0-3,0][1,0-4,0](2,0~3,0)(2,0~4,0)(3,0~4,0): 11
finished in 0.15677s

I'm not sure how to interpret the results. For example, [2,0-4,0] probably represents an edge between vertex 2 and vertex 4? But there isn't such an edge in the input graph. Also, I'm not sure what the difference is between [] pairs and () pairs?

Thanks for your help!

About the Experimental Dataset

Thank you for your previous answer to the matching-order question. Now I have a new problem. I initially only ran the code with the citeseer dataset; now I want to run it with the other datasets mentioned in the paper, downloaded from https://github.com/pdclab/peregrine/blob/master/experiments-guide.md, but I can't run it successfully. I added a statement to the DataGraph function to output the number of vertices and edges. When CLion runs with the mico dataset, the reported number of vertices and edges is not consistent with the mico dataset as I know it, and the same is true for the other datasets. The feedback obtained on the command line is "Segmentation fault (core dumped)". Have you encountered any related problems? Do you have any relevant solutions?

A question about subgraph match

Hello guys,

I have a ring data graph, and a chain pattern graph like these.
Data
1 2
2 3
3 4
4 5
5 1
Pattern
1 2
2 3
3 4

I was expecting the match results to be 1 2 3 4, 2 3 4 5, 3 4 5 1, 4 5 1 2, and 5 1 2 3, but Peregrine only outputs 4 3 2 1 and 5 4 3 2. Is this correct? I am not sure whether this is due to the canonical checks. How should I find the other 3 matches?

Output only given number of results

Hi,

For the subgraph matching task, I want to only output a given number of results (say 1000) and terminate. Can I do that in Peregrine?

Thanks

compilation error with TBB

Hi,
I got this error when doing make:

g++ -c core/DataGraph.cc -o core/DataGraph.o -O3 -std=c++2a -Wall -Wextra -Wpedantic -fPIC -fconcepts -I/net/ohm/export/cdgc/cxh/peregrine/core/
In file included from /org/centers/cdgc/sw/gcc-9.2.0/include/c++/9.2.0/pstl/parallel_backend.h:14,
from /org/centers/cdgc/sw/gcc-9.2.0/include/c++/9.2.0/pstl/algorithm_impl.h:25,
from /org/centers/cdgc/sw/gcc-9.2.0/include/c++/9.2.0/pstl/glue_execution_defs.h:52,
from /org/centers/cdgc/sw/gcc-9.2.0/include/c++/9.2.0/execution:32,
from core/Graph.hh:4,
from core/DataGraph.cc:13:
/org/centers/cdgc/sw/gcc-9.2.0/include/c++/9.2.0/pstl/parallel_backend_tbb.h:28:2: error: #error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.
28 | #error Intel(R) Threading Building Blocks 2018 is required; older versions are not supported.
| ^~~~~
/org/centers/cdgc/sw/gcc-9.2.0/include/c++/9.2.0/pstl/parallel_backend_tbb.h: In function ‘void __pstl::__par_backend::__parallel_for(_ExecutionPolicy&&, _Index, _Index, _Fp)’:
/org/centers/cdgc/sw/gcc-9.2.0/include/c++/9.2.0/pstl/parallel_backend_tbb.h:99:10: error: ‘tbb::this_task_arena’ has not been declared
99 | tbb::this_task_arena::isolate(= {
| ^~~~~~~~~~~~~~~

Does that mean I need to install TBB? Which TBB version should I install? I have both gcc 9.2.0 and 9.3.0 installed.

Thank you!

About partial order、matching orders

Thanks for opening Peregrine's source code. I have some questions: in which variables are the partial order and the matching orders stored? Is the partial order applied by default? I could not find where the matching orders are computed.

About the dataset

Can you provide the patent dataset file.
Looking forward to your reply.

Minor Documentation Issue/Enhancement Request

Greetings,
Currently, your #4-reproducing-our-eurosys-2020-paper-results docs are slightly wrong for the EuroSys branch, since they mention tbb2020 while your EuroSys branch is still on 2019.

In addition, it is not possible to reproduce those results on the latest branch since the binary graph format has changed, but the provided datasets have not. However, if the tutorial instead included the links to the datasets and the DataConverter commands, then it would apply to all versions regardless of the binary graph format.

Just a minor suggestion.

compile error with libtbb.so.2

I want to use Peregrine on a CentOS server, with gcc-10.1.0 installed from source.
Following the project's instructions, I ran

source tbb2020/bin/tbbvars.sh

until I hit errors while running "make". The error says:

make -C ./core/bliss-0.73
make[1]: Entering directory '/home/hulin/gitFile/peregrine/core/bliss-0.73'
rm -f libbliss.a
ar cr libbliss.a defs.o graph.o partition.o orbit.o uintseqhash.o heap.o timer.o utils.o bliss_C.o
ranlib libbliss.a
g++ -I. -g -Wall --pedantic -O9 -fPIC -o bliss bliss.o defs.o graph.o partition.o orbit.o uintseqhash.o heap.o timer.o utils.o bliss_C.o
make[1]: Leaving directory '/home/hulin/gitFile/peregrine/core/bliss-0.73'
g++ apps/fsm.cc core/DataGraph.o core/PO.o core/utils.o core/PatternGenerator.o /home/hulin/gitFile/peregrine/core/showg.o core/roaring.o -o bin//fsm -L/home/hulin/gitFile/peregrine/core/bliss-0.73/ -lbliss -L/usr/local/lib -lpthread -latomic -Ltbb2020/lib/intel64/gcc4.8 -ltbb -O3 -std=c++2a -Wall -Wextra -Wpedantic -fPIC -fconcepts -I/home/hulin/gitFile/peregrine/core/ -Itbb2020/include

tbb2020/lib/intel64/gcc4.8/libtbb.so.2:undefined reference to ‘typeinfo for std::invalid_argument@GLIBCXX_3.4’

tbb2020/lib/intel64/gcc4.8/libtbb.so.2:undefined reference to‘std::length_error::length_error(std::string const&)@GLIBCXX_3.4’

tbb2020/lib/intel64/gcc4.8/libtbb.so.2:undefined reference to‘std::range_error::range_error(std::string const&)@GLIBCXX_3.4’

tbb2020/lib/intel64/gcc4.8/libtbb.so.2:undefined reference to‘__cxa_allocate_exception@CXXABI_1.3’

tbb2020/lib/intel64/gcc4.8/libtbb.so.2:undefined reference to‘std::bad_alloc::~bad_alloc()@GLIBCXX_3.4’

tbb2020/lib/intel64/gcc4.8/libtbb.so.2:undefined reference to‘__cxa_get_globals@CXXABI_1.3’

... ...

collect2: error: ld returned 1

I am a little confused, and I doubt the tbb2020 doesn't support gcc 10.1.0. What's the possible problem?
Thanks!

Understanding output

Hello,

For the sample FSM app in the README, the output is:

Frequent patterns:
[1,0-2,0][1,0-3,0][2,0-4,0]: 303
[1,1-2,1][1,1-3,1][2,1-4,1]: 335

I'm struggling to understand what this output means. What do the numbers in the brackets correspond to?

Question - newer algorithm identification and improvements?

Hi,

hopefully an easy question.
The performance of your solution is quite amazing.
I see other graph solutions are implementing newer and better-performing algorithms, e.g. https://arxiv.org/pdf/1910.05971.pdf in Gunrock/GraphBlast.

My question is: are you actively looking for newer algorithms to implement, to improve the performance of your solution? Or is your solution more academic, a snapshot in time that is now in maintenance/bug-handling mode only?

Thanks in advance!

issue about graph6

Is there any problem with ./core/graphs/graphc.g6? The code can read the file reg82.g6, which is from the official web site, but ./core/graphs/graphc.g6 cannot be read. The output is ">E readgg: illegal character" (from readgg()).

Way to output found embeddings?

Hi authors,

Thank you for this excellent graph mining codebase.

I'm wondering: is there an argument to output all found embeddings that fit the pattern (rather than just producing counts)?

Thanks

Kan

Unable to compile

Hi, when I make, I get a lot of "undefined reference to 'tbb::~~~'" errors like the following.
core/DataConverter.o:(.data.rel.ro+0x10): undefined reference to typeinfo for tbb::task

I'm working on Ubuntu 18.04 and faithfully followed the instructions in README.
Is there anything that I can try?

Thank you

Wrong output and different results using the data converter

Hello there,
I had some issues while testing your framework, and I hope you can help me understand what is wrong.

The edges of my graph are:

0 1
1 2
2 3
4 5
5 6
6 7
8 9
9 10
10 11

and the node labels are:

0 1
1 1
2 1
3 1
4 1
5 2
6 2
7 1
8 1
9 2
10 1
11 2

FIRST ISSUE
I used your data converter convert_data on a Mac and on an Ubuntu PC, obtaining different results. On the Ubuntu PC, the datagraph has the correct number of nodes and edges, while on the Mac it has 11 nodes and 8 edges.
Could you help me understand what could be the reason?

SECOND ISSUE
Then, I ran the fsm algorithm using size = 1 and support = 4:

    bin/fsm data/graph 1 4 5

and obtained the output:

1-FSM with threshold 4
Finished reading datagraph: |V| = 12 |E| = 9
-------
all patterns finished after 0.000489s
0 frequent patterns: 

However, the edge with node labels equal to 1 has MNI support equal to 4, because the matches for the first vertex are {0,1,2,3} and the matches for the second vertex are {0,1,2,3}, and hence the min is 4.
I do not understand why my output does not contain any pattern.

THIRD ISSUE
Finally, I tried to set the frequency to 0, to generate all the possible patterns of a given size:

    bin/fsm data/graph 3 0 5

My output contains patterns like the following one:

    [1,2-2,1][1,2-3,1][1,2-4,0]: 0

Given that the nodes in the input graph have label equal to either 1 or 2, why are there patterns with node label equal to 0?

Issue with make command

/usr/bin/ld: cannot open output file bin//count: No such file or directory
collect2: error: ld returned 1 exit status
Makefile:24: recipe for target 'count' failed
make: *** [count] Error 1
make: *** Waiting for unfinished jobs....
/usr/bin/ld: cannot open output file bin//existence-query: No such file or directory
collect2: error: ld returned 1 exit status
Makefile:21: recipe for target 'existence-query' failed
make: *** [existence-query] Error 1
/usr/bin/ld: cannot open output file bin//fsm: No such file or directory
collect2: error: ld returned 1 exit status
Makefile:18: recipe for target 'fsm' failed
make: *** [fsm] Error 1

TBB compiled with gcc 4.7

Since Peregrine requires C++20 support, I'm using gcc 10.1.0 to compile Peregrine.
When I run source tbb2019/bin/tbbvars.sh intel64, it told me ERROR: libtbb.so.2 library does not exist in /home/sth/download/peregrine/tbb2019//lib/intel64/gcc4.1.
I found this code in tbbvars.sh:

get_library_directory () {

gcc_version_full=$(gcc --version | grep "gcc" | egrep -o " [0-9]+\.[0-9]+\.[0-9]+.*" | sed -e "s/^\ //")
if [ $? -eq 0 ]; then
    gcc_version=$(echo "$gcc_version_full" | egrep -o "^[0-9]+\.[0-9]+\.[0-9]+")
fi
case "${gcc_version}" in
4.[7-9]*|[5-9]* )
    lib_dir="gcc4.7";;
4.[4-6]* )
    lib_dir="gcc4.4";;
* )
    lib_dir="gcc4.1";;
esac
echo $lib_dir

}
It seems that gcc 10.1.0 is regarded as gcc4.1, and the directory /tbb2019//lib/intel64/gcc4.1 doesn't exist.
What should I do? Do I need to compile TBB2019 with gcc 10.1.0 by myself?

Unable to compile Peregrine on Ubuntu 16.04

Hi,

I am unable to compile Peregrine on ubuntu 16.04 with GCC 10.2 or 9.3. I am setting the TBB available in the repo by:

source tbb2020/bin/tbbvars.sh intel64

However, I see linker errors when compiling as shown below. Do you have any workarounds for this problem?

In function `Peregrine::DataConverter::convert_data(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)': DataConverter.cc:(.text+0x42a6): undefined reference to `tbb::interface7::internal::isolate_within_arena(tbb::interface7::internal::delegate_base&, long)' DataConverter.cc:(.text+0x44da): undefined reference to `tbb::interface7::internal::isolate_within_arena(tbb::interface7::internal::delegate_base&, long)' DataConverter.cc:(.text+0x46d7): undefined reference to `tbb::interface7::internal::isolate_within_arena(tbb::interface7::internal::delegate_base&, long)' DataConverter.cc:(.text+0x47fc): undefined reference to `tbb::interface7::internal::isolate_within_arena(tbb::interface7::internal::delegate_base&, long)' DataConverter.cc:(.text+0x4c25): undefined reference to `tbb::interface7::internal::isolate_within_arena(tbb::interface7::internal::delegate_base&, long)' collect2: error: ld returned 1 exit status

P.S. This problem is only on 16.04. It compiles and runs fine in 18.04 and 20.04.

Unable to compile on MacOS

Sorry to bother you; I am having trouble compiling the program on macOS.
It shows that the compiler cannot find some symbols, as detailed below.

Undefined symbols for architecture x86_64:
"__ZN3tbb8internal23allocate_via_handler_v3Em", referenced from:
__ZN3tbb10interface58internal25concurrent_unordered_baseINS0_31concurrent_unordered_map_traitsISt6vectorIjSaIjEEyNS1_12hash_compareIS6_St4hashIS6_ESt8equal_toIS6_EEENS_13tbb_allocatorISt4pairIKS6_yEEELb0EEEE11init_bucketEm in ccDGi3UY.o
_ZN3tbb10interface524concurrent_unordered_mapISt6vectorIjSaIjEEySt4hashIS4_ESt8equal_toIS4_ENS_13tbb_allocatorISt4pairIKS4_yEEEEixERSB in ccDGi3UY.o
_ZN9Peregrine11match_multiISt6vectorIjSaIjEE15DiscoveryDomainILj1EELNS_14OnTheFlyOptionE1ELNS_15StoppableOptionE1ELNS_12OutputOptionE1ERKZ4mainEUlOT_OT0_E0_RZNS_5matchIS3_S5_LS6_1ELS7_1ERNS_9DataGraphESF_Z4mainEUlSA_E_LS8_1EEENSt11conditionalIXeqT6_LS8_0EES1_ISt5tupleIJNS_10SmallGraphEDTclcl7declvalIT5_EEcl7declvalISB_EEEENSt10filesystem7__cxx114pathEEESaISS_EES1_ISt4pairISM_SO_ESaISW_EEE4typeEOT3_RKS1_ISM_SaISM_EEjOT4_SN_EUlSA_E_EENSK_IXeqT3_LS8_0EESU_SY_E4typeES18_OSN_jS16 in ccDGi3UY.o
_ZN9Peregrine11match_multiISt6vectorIjSaIjEE6DomainLNS_14OnTheFlyOptionE1ELNS_15StoppableOptionE1ELNS_12OutputOptionE1ERKZ4mainEUlOT_OT0_E1_RZNS_5matchIS3_S4_LS5_1ELS6_1ERNS_9DataGraphESE_Z4mainEUlS9_E_LS7_1EEENSt11conditionalIXeqT6_LS7_0EES1_ISt5tupleIJNS_10SmallGraphEDTclcl7declvalIT5_EEcl7declvalISA_EEEENSt10filesystem7__cxx114pathEEESaISR_EES1_ISt4pairISL_SN_ESaISV_EEE4typeEOT3_RKS1_ISL_SaISL_EEjOT4_SM_EUlS9_E_EENSJ_IXeqT3_LS7_0EEST_SX_E4typeES17_OSM_jS15 in ccDGi3UY.o
"__ZN3tbb8internal25deallocate_via_handler_v3EPv", referenced from:
__ZN3tbb10interface58internal18split_ordered_listISt4pairIKSt6vectorIjSaIjEEyENS_13tbb_allocatorIS8_EEED1Ev in ccDGi3UY.o
__ZN9Peregrine13MapAggregatorISt6vectorIjSaIjEE15DiscoveryDomainILj1EELNS_14OnTheFlyOptionE1ELNS_15StoppableOptionE1ERZNS_5matchIS3_S5_LS6_1ELS7_1ERNS_9DataGraphERKZ4mainEUlOT_OT0_E0_Z4mainEUlSC_E_LNS_12OutputOptionE1EEENSt11conditionalIXeqT6_LSJ_0EES1_ISt5tupleIJNS_10SmallGraphEDTclcl7declvalIT5_EEcl7declvalISD_EEEENSt10filesystem7__cxx114pathEEESaISS_EES1_ISt4pairISM_SO_ESaISW_EEE4typeEOT3_RKS1_ISM_SaISM_EEjOT4_SN_EUlSC_E_LSJ_1EED1Ev in ccDGi3UY.o
__ZN3tbb10interface58internal25concurrent_unordered_baseINS0_31concurrent_unordered_map_traitsISt6vectorIjSaIjEEyNS1_12hash_compareIS6_St4hashIS6_ESt8equal_toIS6_EEENS_13tbb_allocatorISt4pairIKS6_yEEELb0EEEE11init_bucketEm in ccDGi3UY.o
_ZN3tbb10interface524concurrent_unordered_mapISt6vectorIjSaIjEEySt4hashIS4_ESt8equal_toIS4_ENS_13tbb_allocatorISt4pairIKS4_yEEEEixERSB in ccDGi3UY.o
_ZN9Peregrine11match_multiISt6vectorIjSaIjEE15DiscoveryDomainILj1EELNS_14OnTheFlyOptionE1ELNS_15StoppableOptionE1ELNS_12OutputOptionE1ERKZ4mainEUlOT_OT0_E0_RZNS_5matchIS3_S5_LS6_1ELS7_1ERNS_9DataGraphESF_Z4mainEUlSA_E_LS8_1EEENSt11conditionalIXeqT6_LS8_0EES1_ISt5tupleIJNS_10SmallGraphEDTclcl7declvalIT5_EEcl7declvalISB_EEEENSt10filesystem7__cxx114pathEEESaISS_EES1_ISt4pairISM_SO_ESaISW_EEE4typeEOT3_RKS1_ISM_SaISM_EEjOT4_SN_EUlSA_E_EENSK_IXeqT3_LS8_0EESU_SY_E4typeES18_OSN_jS16 in ccDGi3UY.o
_ZN9Peregrine11match_multiISt6vectorIjSaIjEE6DomainLNS_14OnTheFlyOptionE1ELNS_15StoppableOptionE1ELNS_12OutputOptionE1ERKZ4mainEUlOT_OT0_E1_RZNS_5matchIS3_S4_LS5_1ELS6_1ERNS_9DataGraphESE_Z4mainEUlS9_E_LS7_1EEENSt11conditionalIXeqT6_LS7_0EES1_ISt5tupleIJNS_10SmallGraphEDTclcl7declvalIT5_EEcl7declvalISA_EEEENSt10filesystem7__cxx114pathEEESaISR_EES1_ISt4pairISL_SN_ESaISV_EEE4typeEOT3_RKS1_ISL_SaISL_EEjOT4_SM_EUlS9_E_EENSJ_IXeqT3_LS7_0EEST_SX_E4typeES17_OSM_jS15 in ccDGi3UY.o
_ZN9Peregrine11match_multiISt6vectorIjSaIjEE6DomainLNS_14OnTheFlyOptionE1ELNS_15StoppableOptionE1ELNS_12OutputOptionE1ERKZ4mainEUlOT_OT0_E1_RZNS_5matchIS3_S4_LS5_1ELS6_1ERNS_9DataGraphESE_Z4mainEUlS9_E_LS7_1EEENSt11conditionalIXeqT6_LS7_0EES1_ISt5tupleIJNS_10SmallGraphEDTclcl7declvalIT5_EEcl7declvalISA_EEEENSt10filesystem7__cxx114pathEEESaISR_EES1_ISt4pairISL_SN_ESaISV_EEE4typeEOT3_RKS1_ISL_SaISL_EEjOT4_SM_EUlS9_E_EENSJ_IXeqT3_LS7_0EEST_SX_E4typeES17_OSM_jS15.cold in ccDGi3UY.o
...
ld: symbol(s) not found for architecture x86_64

Extending Multi Label Functionality

Hi,

I have been playing around with the software and I really like it. I am wondering if it is possible for me to extend the software to handle multi labels on vertices? Right now it can only support 1 label per vertex. If so, how should I proceed in doing this?
