Giter VIP home page Giter VIP logo

libgspanforsofia's Introduction

LibgSpanForSofia

А wrap-up c++ library for gSpan algorithm implemented by Taku Kudo

The detailes about the algorithm can be found in

X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Mining,” in Data Mining, 2002. ICDM 2003. Proceedings. 2002, pp. 721–724.

This implementation is referenced by Xifeng Yan, the author of gSpan, in his web page. And is taken from library gBoost, Graph Boosting Toolbox for Matlab. gBoost is dual-licensed under both the GNU General Public License, version 2 and the Mozilla Public License, version 1.1. Accordingly we remain among these settings.

Some details about the algorithm

gSpan finds frequent graph patterns starting from an empty graph and moving to larger and large ones. It uses canoncial code DFS to avoid redandancy in generated subgraphs.

Compilation of the library

You will need

The next step is to convert the project to your most loved envirement. For that run

$ premake4 {configuration}

The most widely used configurations are

$ premake4 vs2005|vs2008|vs2010|vs2012 # for diferent version of visual studio sln* file

$ premake4 gmake # for GNU make file

$ premake4 codeblocks # for cbp file of code::blocks IDE

$ premake4 clean # for removing the created files

The script converts the description from the file premake4.lua to the format of your IDE in love. The file is placed in the build subdirectory. Then, I guess, you know what to do with the resulting file.

Interface of the LibgSpanForSofia library

The interface of the library can be found in the include file.

It consist of one function RungSpan that accepts the same parameters as the original implementation plus a reference to the callback function and an arbitrary data pointer passed to the callback function. The introduction of the callback function allows for a flexibal interaction with the gSpan algorithm.

/**
 * This function that runs gaston on the input dataset.
 * @param inputFileName is the name of the file containing the descriptions of graphs
 * @param support is minimal support (the number of graphs from the dataset) of a discovered graph pattern (a subgraph)
 * @param callback is the callback function, which is used for reporting discovered graphs
 * @param minsize is the minimal number of nodes in discovered graphs 
 * @param maxsize is the maximal number of nodes in discovered graphs (negative number means 'no limitations')
 * @param directed is the flag for indicating if the graphs in the database are directed
 * @return is the success of the operation.
 */
typedef bool (LibgSpanAPI *RungSpanFunc)( LibgSpanDataRef data,
                                          const char* inputFileName, int support, ReportGraphCallback callback, int minsize /*= 0*/, int maxsize /*= -1*/, bool directed /*= false*/ );
bool LibgSpanAPI RungSpan( LibgSpanDataRef data,
    const char* inputFileName, int support, ReportGraphCallback callback, int minsize = 0, int maxsize = -1, bool directed = false );

Callback function

Below you can see the prototype of the callback function

/**
 * A type of function that is used for reporting newly found graphs
 * @param graph is the reported graph
 * @return returns if the graph should be further expanded
 */
typedef bool (LibgSpanAPI *ReportGraphCallback)( LibgSpanDataRef data, const LibgSpanGraph* graph );

This callback function is used to report found graphs, encoded by structure LibgSpanGraph. The reference to data coincides with the one passed to RungSpan function. It is never used inside the algorithm and required only for the user if he/she is needed to pass some additional data to the callback function (e.g., an instance of the callback-processing class).

Graph encoding in callback

The graph is encoded by the following structure and basically is just an enumeration of graph vertices, edges, and zero-based objects (transactions) IDs that refer to graphs from the dataset (tidsets).

// Type for storing label of vertices
typedef int VertexLabelType;
// Type for storing label of edges
typedef int EdgeLabelType;
/**
 * A sructure encoding a graph edge
 */
struct LibgSpanGraphEdge {
  // The 0-based indices of the vertices for this edge.
  int From;
  int To;
  // Label of the edge.
  EdgeLabelType Label;
};

/**
 * A Sructure encoding a graph
 */
struct LibgSpanGraph {
  // Number of vertices and edges in the graph
  int VertexCount;
  int EdgeCount;
  // List of vertices, the size is equal VertexCount
  VertexLabelType* Vertices;
  // List of edges, the size is equal EdgesCount
  LibgSpanGraphEdge* Edges;
  // The support of this graph
  int Support;
  // The set of objects covering this pattern. Length of Objects is @var Support.
  int* Objects;
};

// Any data that should be passed to callback.
typedef void* LibgSpanDataRef;

The return value of the callback function.

The callback function returns a boolean value that adds an aditional control over gSpan. It is very usefull for branch cutting. Basically it returns the necessity of the expanision of the current graph pattern. If the user knows somehow that the current graph and all its supergraph are of no use, it can return false from the callback function saving some computational time. If the callback function always returns true, then the result is exactly the same as the one returned by original gSpan with the corresponding parameters.

libgspanforsofia's People

Contributors

alekseybuzmakov avatar

Stargazers

Vladimir avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.