Comments (6)
Nope. It was my bad experience using Spark for network algorithms that motivated this benchmark, as a way to find a better alternative. My hunch is that it would perform at about the NetworkX level, and even worse if distributed.
If you are mainly working with the graph structure itself, I find it's often beneficial to put the graph in memory using one of these libraries and have an external (possibly distributed) storage for the meta information.
from graph-benchmarks.
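The split suggested in the comment above (graph structure in memory, metadata in external storage) can be sketched as follows. This is only an illustration using the Python standard library: the compact adjacency arrays stand in for what a library such as graph-tool would hold, the in-memory SQLite database stands in for an external store, and the names `build_csr`, `neighbors`, `neighbor_labels`, and `node_meta` are hypothetical.

```python
import sqlite3
from array import array


def build_csr(num_nodes, edges):
    """Build a compact adjacency (offsets, targets) for an undirected graph."""
    degree = [0] * num_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # offsets[i]..offsets[i + 1] delimits the neighbors of node i in targets.
    offsets = array("q", [0] * (num_nodes + 1))
    for node in range(num_nodes):
        offsets[node + 1] = offsets[node] + degree[node]
    targets = array("q", [0] * offsets[num_nodes])
    cursor = list(offsets[:num_nodes])  # next free slot per node
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
        targets[cursor[v]] = u
        cursor[v] += 1
    return offsets, targets


def neighbors(offsets, targets, node):
    """Return the neighbor ids of a node as a plain list."""
    return list(targets[offsets[node]:offsets[node + 1]])


# Hypothetical toy data: the graph only knows integer node ids.
edges = [(0, 1), (1, 2), (2, 3)]
offsets, targets = build_csr(4, edges)

# Metadata lives elsewhere; in-memory SQLite stands in for an external database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node_meta (id INTEGER PRIMARY KEY, label TEXT)")
db.executemany(
    "INSERT INTO node_meta VALUES (?, ?)",
    [(0, "alice"), (1, "bob"), (2, "carol"), (3, "dave")],
)


def neighbor_labels(node):
    """Run the graph query in memory, then look up labels in the metadata store."""
    ids = neighbors(offsets, targets, node)
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(
        f"SELECT label FROM node_meta WHERE id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    return [label for (label,) in rows]
```

The point of the design is that graph algorithms only touch the integer arrays, and the metadata is fetched by id once a result is known.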
It's definitely worth trying graph-tool. It uses the C++ Boost library internally and should be on par with NetworKit in terms of memory usage. It's also pretty easy to install the package and try it out!
Distributed graph computation is a hard task, and I try to avoid Spark as much as possible.
I typically use graph-tool (Python) or LightGraphs (Julia), and both are fine memory-wise. LightGraphs even has a squash method that returns the graph using the smallest integer type required to represent it. NetworkX is less efficient, especially if you are using a multigraph, as it uses a dict structure instead of an adjacency list (if I recall correctly).
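To illustrate the idea behind picking the smallest integer type, here is a minimal Python sketch. It is not the LightGraphs implementation; `smallest_typecode` and `squash_ids` are hypothetical helpers built on the standard `array` module, and they only show how a narrower index type shrinks the storage for vertex ids.

```python
from array import array


def smallest_typecode(num_vertices):
    """Return the narrowest unsigned typecode that can hold ids 0..num_vertices-1."""
    for code in ("B", "H", "I", "L", "Q"):
        bits = 8 * array(code).itemsize
        if num_vertices <= 2 ** bits:
            return code
    raise ValueError("too many vertices for a 64-bit index")


def squash_ids(targets):
    """Re-store a list of vertex ids using the narrowest fitting typecode."""
    num_vertices = max(targets) + 1 if targets else 1
    return array(smallest_typecode(num_vertices), targets)


small = squash_ids([0, 5, 300])          # ids fit in 2 bytes each
wide = array("q", [0, 5, 300])           # same ids stored as 8-byte integers
print(small.itemsize, wide.itemsize)
```

A dict-based structure, by contrast, stores a full Python object per neighbor entry, which is where most of the extra memory goes.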
Thank you! This is very helpful :) I’ll check them out. Btw, have you considered including GraphX in the scope? Even on a single machine, the Spark-based package might do a good job at parallelizing the computation.
Thank you for sharing your experiences!
Amazing work.
Between NetworKit and graph-tool, which one do you consider to be more efficient in terms of memory usage?
I have an undirected weighted graph with 50M nodes and 100M edges. I tried several Python libraries, and the only one that supported this workload was NetworKit. The graph takes about 6 GB of RAM (7 GB during creation). My main use case is shortest path queries.
I haven't tried graph-tool, but if its memory footprint is worse it won't solve my problem, even if the shortest path queries are faster, as your benchmark showed.
I'm using Databricks and a Spark cluster. I also tried GraphFrames (distributed) with a 5-node cluster, but for shortest paths and most types of queries this lib is trash. All the other libs I've tested run on the cluster's driver machine; since they support multi-threading, they use all of its cores (8 cores, 28 GB RAM).
Considering a graph this size, in your opinion is graph-tool worth a try?
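As a rough point of comparison for the figures above, the arithmetic below estimates a lower bound for storing such a graph in a plain compressed adjacency layout. This is a back-of-envelope sketch under stated assumptions (32-bit neighbor ids, 64-bit weights, 64-bit offsets, each undirected edge stored in both directions); it does not describe the internal layout of NetworKit or graph-tool, and `csr_memory_bytes` is a hypothetical helper.

```python
def csr_memory_bytes(num_nodes, num_edges, index_bytes=4, weight_bytes=8, offset_bytes=8):
    """Rough lower bound for an undirected weighted graph in a compressed adjacency layout."""
    directed_entries = 2 * num_edges  # each undirected edge is stored in both directions
    targets = directed_entries * index_bytes
    weights = directed_entries * weight_bytes
    offsets = (num_nodes + 1) * offset_bytes
    return targets + weights + offsets


estimate = csr_memory_bytes(50_000_000, 100_000_000)
print(f"{estimate / 1e9:.2f} GB")  # prints "2.80 GB"
```

Real libraries add overhead on top of this floor (edge ids, property storage, allocator slack), so the number is only useful as a baseline when comparing reported footprints.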