
graphblas-algorithms's Introduction

GraphBLAS Algorithms

graphblas-algorithms is a collection of GraphBLAS algorithms written using python-graphblas. It may be used directly or as an experimental backend to NetworkX.

Why use GraphBLAS Algorithms? Because it is fast, flexible, and familiar: it follows the NetworkX API.

Are we missing any algorithms that you want? Please let us know!

Benchmark comparisons:
  • GraphBLAS vs NetworkX
  • GraphBLAS vs igraph

Installation

conda install -c conda-forge graphblas-algorithms
pip install graphblas-algorithms

Basic Usage

First, create a GraphBLAS Matrix.

import graphblas as gb

M = gb.Matrix.from_coo(
  [0, 0, 1, 2, 2, 3],
  [1, 3, 0, 0, 1, 2],
  [1., 2., 3., 4., 5., 6.],
  nrows=4, ncols=4, dtype='float32'
)

Next, wrap the Matrix as a ga.Graph.

import graphblas_algorithms as ga

G = ga.Graph(M)

Finally, call an algorithm.

hubs, authorities = ga.hits(G)

When the result is a value per node, a gb.Vector will be returned. In the case of HITS, two Vectors are returned representing the hubs and authorities values.
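
For example, a gb.Vector can be exported with to_coo() and zipped into a plain dict (a small sketch; here the node ids are simply the integer positions used to build the Matrix above):

# Sketch: export the hubs Vector to {node id: score}
indices, values = hubs.to_coo()
hub_scores = dict(zip(indices.tolist(), values.tolist()))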

Algorithms whose result is a subgraph will return a ga.Graph.

Plugin for NetworkX

Dispatching to plugins is a new feature in NetworkX 3.0. When both networkx and graphblas-algorithms are installed in the same environment, calls to NetworkX algorithms can be dispatched to the equivalent version in graphblas-algorithms.

Dispatch Example

import networkx as nx
import graphblas_algorithms as ga

# Generate a random graph (5000 nodes, 1_000_000 edges)
G = nx.erdos_renyi_graph(5000, 0.08)

# Explicitly convert to ga.Graph
G2 = ga.Graph.from_networkx(G)

# Pass G2 to NetworkX's k_truss
T5 = nx.k_truss(G2, 5)

G2 is not an nx.Graph, but it does have an attribute __networkx_plugin__ = "graphblas". This tells NetworkX to dispatch the k_truss call to graphblas-algorithms. This connection exists because graphblas-algorithms registers itself as a "networkx.plugin" entry point.
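
To peek at the dispatch hookup yourself (a sketch: the attribute is what the paragraph above describes, while the entry-point group name may differ between NetworkX versions, so treat that string as an assumption):

from importlib.metadata import entry_points  # Python 3.10+ keyword form

print(G2.__networkx_plugin__)  # "graphblas"
print([ep.name for ep in entry_points(group="networkx.plugin")])  # group name assumed from above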

The result T5 is a ga.Graph representing the 5-truss structure of the original graph. To convert to a NetworkX Graph, use:

T5.to_networkx()

Note that even with the conversions to and from ga.Graph, this example still runs 10x faster than using the native NetworkX k-truss implementation. Speed improvements scale with graph size, so larger graphs will see an even larger speed-up relative to NetworkX.
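
To reproduce a rough comparison locally (a sketch; absolute timings vary by machine, and the 10x figure above is the README's claim rather than something this snippet guarantees):

import time

t0 = time.perf_counter()
nx.k_truss(G, 5)                              # pure NetworkX
print("networkx :", time.perf_counter() - t0)

t0 = time.perf_counter()
nx.k_truss(ga.Graph.from_networkx(G), 5)      # dispatched, including the conversion cost
print("graphblas:", time.perf_counter() - t0)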

Plugin Algorithms

The following NetworkX algorithms have been implemented by graphblas-algorithms and can be used following the dispatch pattern shown above.

graphblas_algorithms.nxapi
├── boundary
│   ├── edge_boundary
│   └── node_boundary
├── centrality
│   ├── degree_alg
│   │   ├── degree_centrality
│   │   ├── in_degree_centrality
│   │   └── out_degree_centrality
│   ├── eigenvector
│   │   └── eigenvector_centrality
│   └── katz
│       └── katz_centrality
├── cluster
│   ├── average_clustering
│   ├── clustering
│   ├── generalized_degree
│   ├── square_clustering
│   ├── transitivity
│   └── triangles
├── community
│   └── quality
│       ├── inter_community_edges
│       └── intra_community_edges
├── components
│   ├── connected
│   │   ├── is_connected
│   │   └── node_connected_component
│   └── weakly_connected
│       └── is_weakly_connected
├── core
│   └── k_truss
├── cuts
│   ├── boundary_expansion
│   ├── conductance
│   ├── cut_size
│   ├── edge_expansion
│   ├── mixing_expansion
│   ├── node_expansion
│   ├── normalized_cut_size
│   └── volume
├── dag
│   ├── ancestors
│   └── descendants
├── dominating
│   └── is_dominating_set
├── efficiency_measures
│   └── efficiency
├── generators
│   └── ego
│       └── ego_graph
├── isolate
│   ├── is_isolate
│   ├── isolates
│   └── number_of_isolates
├── isomorphism
│   └── isomorph
│       ├── fast_could_be_isomorphic
│       └── faster_could_be_isomorphic
├── linalg
│   ├── bethehessianmatrix
│   │   └── bethe_hessian_matrix
│   ├── graphmatrix
│   │   └── adjacency_matrix
│   ├── laplacianmatrix
│   │   ├── laplacian_matrix
│   │   └── normalized_laplacian_matrix
│   └── modularitymatrix
│       ├── directed_modularity_matrix
│       └── modularity_matrix
├── link_analysis
│   ├── hits_alg
│   │   └── hits
│   └── pagerank_alg
│       ├── google_matrix
│       └── pagerank
├── lowest_common_ancestors
│   └── lowest_common_ancestor
├── operators
│   ├── binary
│   │   ├── compose
│   │   ├── difference
│   │   ├── disjoint_union
│   │   ├── full_join
│   │   ├── intersection
│   │   ├── symmetric_difference
│   │   └── union
│   └── unary
│       ├── complement
│       └── reverse
├── reciprocity
│   ├── overall_reciprocity
│   └── reciprocity
├── regular
│   ├── is_k_regular
│   └── is_regular
├── shortest_paths
│   ├── dense
│   │   ├── floyd_warshall
│   │   ├── floyd_warshall_numpy
│   │   └── floyd_warshall_predecessor_and_distance
│   ├── generic
│   │   └── has_path
│   ├── unweighted
│   │   ├── all_pairs_shortest_path_length
│   │   ├── single_source_shortest_path_length
│   │   └── single_target_shortest_path_length
│   └── weighted
│       ├── all_pairs_bellman_ford_path_length
│       ├── bellman_ford_path
│       ├── bellman_ford_path_length
│       ├── negative_edge_cycle
│       └── single_source_bellman_ford_path_length
├── simple_paths
│   └── is_simple_path
├── smetric
│   └── s_metric
├── structuralholes
│   └── mutual_weight
├── tournament
│   ├── is_tournament
│   ├── score_sequence
│   └── tournament_matrix
├── traversal
│   └── breadth_first_search
│       ├── bfs_layers
│       └── descendants_at_distance
└── triads
    └── is_triad


graphblas-algorithms's Issues

multi-property graphs and handling of "weight" parameter

Currently, graphblas_algorithms.Graph objects only have a single edge attribute, and the "weight" parameter is ignored when a Graph object from our library is passed (it is used when a networkx graph is passed). We should strive to match networkx, so we should allow multi-property graphs and handle weight parameters.

To do this, it probably makes the most sense to use multiple graphblas matrices to hold edge data. One matrix should be boolean of all True to indicate the structure, and then we should have a new matrix for each edge property (which may have fewer elements than the structural matrix).

We should also consider how this will affect caching of properties (which probably needs to be reconsidered anyway). Perhaps we should create a pseudo-syntax (one that doesn't use eval) such as A.select(tril) for the structural matrix and A[myweight].select(tril) for a property.
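
A minimal sketch of that layout using plain python-graphblas (variable names such as structure and weight are illustrative only, not existing API):

import graphblas as gb
from graphblas import select

rows = [0, 0, 1, 2, 2, 3]
cols = [1, 3, 0, 0, 1, 2]

# boolean matrix of all True describing the structure
structure = gb.Matrix.from_coo(rows, cols, [True] * len(rows), nrows=4, ncols=4, dtype=bool)

# one matrix per edge property; it may have fewer entries than the structural matrix
weight = gb.Matrix.from_coo([0, 2], [1, 0], [1.5, 4.0], nrows=4, ncols=4, dtype="float64")

# roughly what the pseudo-syntax above would mean:
structural_tril = select.tril(structure).new()   # A.select(tril)
weight_tril = select.tril(weight).new()          # A[myweight].select(tril)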

add `id_to_key` as argument to graph constructor

In ga.Graph.__init__ we can specify key_to_id. I can't see any reason why this must be a stdlib dict rather than any other class that implements the typing.Mapping protocol (__getitem__, __len__, __iter__). I have a use case where I have a more memory-efficient mapping than a dict and would like to use it.

However, whenever the id_to_key property is used, this creates a full inverse mapping, undoing any memory efficiency. It would be great to optionally pass an inverse mapping (id_to_key) to avoid this computation when possible. Currently I'm doing:

G = ga.Graph(..., key_to_id=...)
G._id_to_key = ...

but obviously this relies on setting the 'private' _id_to_key member, which has no API stability guarantees. I'd be happy to implement this change!
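
For illustration, here is a hypothetical memory-lean Mapping that could serve as key_to_id under this proposal (the class name and the "n0", "n1", ... key scheme are made up for the example):

from collections.abc import Mapping

class RangeKeyToId(Mapping):
    # Maps node keys "n0", "n1", ..., "n{n-1}" to ids 0 .. n-1 without storing a dict
    def __init__(self, n):
        self.n = n

    def __getitem__(self, key):
        try:
            i = int(key[1:])
        except (TypeError, ValueError):
            raise KeyError(key) from None
        if key != f"n{i}" or not 0 <= i < self.n:
            raise KeyError(key)
        return i

    def __iter__(self):
        return (f"n{i}" for i in range(self.n))

    def __len__(self):
        return self.n

# An optional id_to_key argument would let a similarly cheap inverse mapping be passed in,
# instead of materializing a full dict.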

square_clustering output using networkx api

digraphTCC2 = ga.Graph.from_networkx(digraphTCC)
clus = nx.square_clustering(digraphTCC2)
print(clus)

for i in clus:
    print(i)

I've just found graphblas-algorithms as a way to speed up computations on a networkx graph, and I've integrated it into my work for finding the square clustering coefficient of a graph. The speedup is immense, but I'm not sure how to interpret the result.

Based on the code snippet above (digraphTCC is a valid networkx graph), the result of the function, clus, prints as
<graphblas_algorithms.classes.nodemap.NodeMap object at 0x00000206BC590550>.

I'm not sure how to interact with a NodeMap object, but when I iterate through it, the values seem to just be node labels. How should I interpret this result, and how can I transform it into the dictionary of {node: clustering} that you would expect from the slower NetworkX version?

Thank you for your time!
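
As noted in the NodeMap repr issue further down, a NodeMap is a MutableMapping, so converting the result to a plain dict should give the expected {node: coefficient} mapping (a sketch, assuming that Mapping behavior):

clus_dict = dict(clus)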

Redo/refactor calculations and caching of properties such as degrees

We compute and cache many properties such as has_self_edges, degrees- (w/o diagonals), degrees+ (with diagonals), etc., which is actually very handy. However, the code is a little complicated (cough cough, my bad) and may be error prone (see: #82 (comment)). It would be great if we could refactor this to be more clear, maintainable, and safe. I imagine such a solution would be more declarative and less procedural.
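
One declarative direction (a generic sketch of the idea, not the project's plan) is to compute each derived property lazily behind functools.cached_property:

from functools import cached_property
import graphblas as gb

class CachedProperties:
    # Hypothetical holder of derived graph properties, computed once on first access
    def __init__(self, A):
        self.A = A  # adjacency Matrix

    @cached_property
    def has_self_edges(self):
        # any entries on the diagonal indicate self-edges
        return gb.select.diag(self.A).new().nvals > 0

    @cached_property
    def offdiag(self):
        # adjacency with the diagonal removed (a building block for "degrees-" style properties)
        return gb.select.offdiag(self.A).new()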

Raw vs Wrapped objects

Algorithms should accept two types of input: a raw Matrix or a wrapped Graph.

  • When given a Matrix, the return type should be raw (i.e. a Matrix or Vector).
  • When given a Graph, the return type should be wrapped (i.e. a Graph or NodeMap).

This will make the interface more uniform when interacting with nxapi: Graphs will always be passed in, so a wrapped instance will always be returned. When using graphblas-algorithms without nxapi, the user controls whether results are wrapped by how they call the algorithms.
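
As a hypothetical illustration of the proposed convention (not necessarily current behavior):

import graphblas_algorithms as ga

v = ga.pagerank(M)   # raw Matrix in    -> raw result out (a gb.Vector)
d = ga.pagerank(G)   # wrapped Graph in -> wrapped result out (a NodeMap)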

`NodeMap` and other results need better reprs

From #82, showing <graphblas_algorithms.classes.nodemap.NodeMap object at 0x00000206BC590550> is not the most helpful even though a NodeMap is a MutableMapping.

So, the repr of NodeMap should be updated to look more dict-like. Same for VectorMap, VectorNodeMap, NodeNodeMap, and NodeSet (these could probably use basic docstrings too!).
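
A dict-like repr could be as small as the following method (a sketch, not the project's implementation; it relies only on the Mapping behavior mentioned above):

def __repr__(self):
    # dropped into NodeMap (and friends): show contents like a dict, prefixed by the class name
    return f"{type(self).__name__}({dict(self)!r})"

For NodeSet, dict(self) would be replaced by set(self).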

Improving onboarding and enabling contributions

It will be an ongoing effort to make it easier to contribute to graphblas-algorithms, so let's gather ideas and prioritize them in this thread. Heh, right now, I encourage potential contributors to be very patient. Things will get better/easier, I promise!

  • Improve the overall documentation of python-graphblas
  • Add contributing documentation to python-graphblas and graphblas-algorithms
    • How to set up local dev environment, etc. Should borrow liberally from contributing docs from other libraries.
  • Create issues for "starter algorithms"
  • Create "algorithm template" notebook that helps guide development
    • Create example data, load datasets, guidelines for benchmarking, run networkx tests, etc.
  • Create notebooks with exercises as e.g. self-guided tutorials
    • Example starter exercises:
      • Find neighbors of node X
      • Find "friends of friends"
      • Find predecessors (or successors) of directed graphs
      • etc.; there are lots of "simple" algorithms we can build up to
    • I think some notebooks should encourage hands-on experimentation, not just reading
  • I think dev notebooks could be a nice side-effect of adding algorithms
    • For example, for triangle counting, there were lots of possibilities to choose from, and I decided based on benchmarks
    • These notebooks can help demonstrate how algorithm development is done, and help revise algorithms later if necessary
    • These may be messy. Not intended as tutorial notebooks. I have some notebooks around that probably belong somewhere.
    • I don't want to require a notebook for each algorithm, but it's a nice-to-have.
  • Don't forget about contributions other than algorithms!

I'm the most familiar with both python-graphblas and graphblas-algorithms, so I trust my judgement the least regarding how to improve onboarding and enabling contributions. @MridulS, @z3y50n, @ParticularMiner, what do you think would be the most helpful? Also, thanks for your patience :)

Update README

The package is now available on conda-forge. Update the README to indicate that.

Automated testing to compare with networkx

It would be super-duper handy to be able to automatically generate input graphs and node selection lists (i.e., masks) to run with both NetworkX and graphblas-algorithms and compare results.

For example, it would be nice to cover:

  • Graph and DiGraph
  • With and w/o self-edges
  • Symmetric and asymmetric DiGraph
  • Purely structural (all edges 1)
  • Edge values contain all combinations of {positive, 0, negative}, and with ints or floats
  • Different edge densities, including full graph (w/ and w/o self-edges)
  • Empty graph, only self-edges
  • Some rows and/or columns are empty (i.e., no in-edges or out-edges or both)
  • Perhaps some graphs with specific shapes: ring, tree, DAG, etc
  • Bipartite graphs
  • More than one group of connected components (by construction)
  • etc.

Perhaps we could leverage hypothesis to help generate random inputs. I would be delighted if we began with a very small subset of the above.
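
For example, a tiny starting point with hypothesis might look like this (a sketch under a few assumptions: pagerank dispatches as in the README example above, the dispatched result converts cleanly to a dict, and the loose tolerance is adequate):

import hypothesis.strategies as st
from hypothesis import given, settings
import networkx as nx
import graphblas_algorithms as ga


@st.composite
def random_graphs(draw):
    # small Erdos-Renyi graphs to start; extend later with DiGraphs, self-edges, value ranges, etc.
    n = draw(st.integers(min_value=2, max_value=20))
    p = draw(st.floats(min_value=0.0, max_value=1.0))
    seed = draw(st.integers(min_value=0, max_value=2**32 - 1))
    return nx.gnp_random_graph(n, p, seed=seed)


@given(random_graphs())
@settings(max_examples=50, deadline=None)
def test_pagerank_matches_networkx(G):
    expected = nx.pagerank(G)
    result = dict(nx.pagerank(ga.Graph.from_networkx(G)))  # dispatched; assumed dict-convertible
    assert expected.keys() == result.keys()
    for node in expected:
        assert abs(expected[node] - result[node]) < 1e-4  # loose tolerance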

CC @jim22k who has done similar work in the past. Having this functionality would be incredibly useful in making sure we match NetworkX. For one thing, it would help us determine what to do about self-edges, which may be poorly defined at times for NetworkX (I really don't know if it is or not), but should be well-defined for us: do what NetworkX does.
