Giter VIP home page Giter VIP logo

cost's Introduction

COST

COST is an acronym for the "Configuration that Outperforms a Single Thread", indicating the hardware resources required by a distributed system before it begins to outperform a single-threaded implementation. This repository contains the single-threaded implementations providing the baseline performance.

Specifically, this repository contains single-threaded implementations of three graph algorithms, PageRank, label propagation, and union-find, supporting performance measurements taken on two graphs, twitter_rv and uk_2007_05. The code is intended to be instructive, rather than a meaningful replacement for a graph-processing system.

Instructions

The project consists of several independent binaries and a supporting library. The src/bin/ directory has one file for each binary, each of which can be executed by typing

cargo run --release --bin <binary_name> -- <arguments>

All binaries take at least one argument, and can be run with zero arguments to present their usage (and any warnings about files that may be overwritten as a result of executing the binary).

Introducing graph data

The most common first binary to use is to_vertex, which creates a binary representation of data presented as a textual list of pairs of vertex identifiers (one per line). You should be able to type:

% cargo run --release --bin to_vertex
    Finished release [optimized] target(s) in 0.0 secs
     Running `target/release/to_vertex`
Usage: to_vertex <source> <prefix>
NOTE: <prefix>.nodes and <prefix>.edges will be overwritten.
%

If you acquire some excellent graph data, you could for example type

% cargo run --release --bin to_vertex -- my_graph.txt my_graph

which will create files my_graph.nodes and my_graph.edges. These files will generally be smaller than the textual representation, though the .nodes file will use space proportional to the largest vertex identifier.

Once you have ingressed some graph data, you can also re-arrange the data according to a Hilbert curve, which is an excellent bit of mathematics you can search for and read about if you so care.

% cargo run --release --bin to_hilbert -- my_graph

will produce my_graph.upper and my_graph.lower for pre-existing my_graph.nodes and my_graph.edges. The Hilbert representation can be even a bit tighter, and often has improved performance for several of the algorithms.

Graph algorithms

There are three algorithms here: pagerank, label propagation, and union find. Each has their own binary, and each expects you to supply three arguments: the "mode", which is one of vertex, hilbert, and compressed, the graph filename prefix, and a number greater than the largest vertex identifier (a size for per-vertex state allocation). If you don't know the last number, the stats binary can help you out by scanning the graph for you.

For example,

% cargo run --release --bin union_find -- hilbert ./friendster 66000000
    Finished release [optimized] target(s) in 0.0 secs
     Running `target/release/union_find hilbert ./friendster 66000000`
65608365 non-roots found
%

which reports the number of nodes in the graph minus the number of connected components.

Notes

There is a companion COST repository managed by Microsoft Research, including the state of the project several months ago. This may be helpful if you are interested in the corresponding C# implementations. The repository also contains Naiad implementations that were done more recently. I am no longer affiliated with Microsoft and cannot commit to the repository (nor, historically, do they accept pull requests), and must apologize for the sorry state I left the code in. It may be cleaned up in the future (either by me, or other more industrious souls), given the right incentives.

cost's People

Contributors

frankmcsherry avatar lucasmaystre avatar maxdemarzi avatar vaastav avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

cost's Issues

Doesn't build with rust 1.0.0-beta

Hi,

apparently std::os::MemoryMap and std::os::MapOption got removed from the beta release. According to the rust IRC channel, there are bindings to mmap in https://github.com/carllerche/nix-rust/ but it's not a drop-in replacement.

The error message is:

Compiling COST v0.0.1 (file:///Users/hesk/unix/src/COST)
src/typedrw.rs:6:26: 6:37 error: unresolved import `std::os::MapOption::MapReadable`. Could not find `MapOption` in `std::os`
src/typedrw.rs:6 use std::os::MapOption::{MapReadable, MapFd};
                                          ^~~~~~~~~~~
src/typedrw.rs:6:39: 6:44 error: unresolved import `std::os::MapOption::MapFd`. Could not find `MapOption` in `std::os`
src/typedrw.rs:6 use std::os::MapOption::{MapReadable, MapFd};
                                                       ^~~~~
src/typedrw.rs:7:5: 7:23 error: unresolved import `std::os::MemoryMap`. There is no `MemoryMap` in `std::os`
src/typedrw.rs:7 use std::os::MemoryMap;
                     ^~~~~~~~~~~~~~~~~~
error: aborting due to 3 previous errors
Could not compile `COST`.

execution error

Hello,

I am trying to run ./target/release/COST twitter twitter_rv_15066953.net small as written in the instructions and getting the following error message:

thread '

' panicked at 'malformed dst', /home/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/option.rs:330

I am using RUST 1.0.0 64bit on Ubuntu 14.04.

Do you have any suggestions how I can fix this error?

Thank you in advance

Larissa

Error compiling COST

Hi,

I've tried to compile COST but it fails with the following error message:

$ cargo build
   Compiling COST v0.0.1 (file:///Users/hesk/Desktop/tmp/COST)
src/twitter_parser.rs:47:44: 47:67 error: type `core::result::Result<_, _>` does not implement any method in scope named `expect`
src/twitter_parser.rs:47             let src: u32 = elts[0].parse().expect("malformed src");
                                                                    ^~~~~~~~~~~~~~~~~~~~~~~
src/twitter_parser.rs:48:44: 48:67 error: type `core::result::Result<_, _>` does not implement any method in scope named `expect`
src/twitter_parser.rs:48             let dst: u32 = elts[1].parse().expect("malformed dst");
                                                                    ^~~~~~~~~~~~~~~~~~~~~~~
error: aborting due to 2 previous errors
Could not compile `COST`.

Significant improvements for graph computation tasks during reproduction of results compared to original paper

Hi,
Sorry for bugging like this but I am seeing significant improvements from running the binaries on the twitter and uk-2007-05 datasets as compared to the original results and I was wondering if I could get some insight as to why the results differ so drastically.

Here are the results for the tasks:

Label_Prop :

Scalable System | twitter | uk

  • Original SSD | 153s | 417s
  • New Hilbert SSD | 30s | 57s
  • New Vertex SSD | 93s | 58.5s

PageRank :

Scalable System | twitter | uk

  • Original SSD | 300s | 651s
  • New Vertex SSD | 289.6s | 119.7s
  • New Hilbert SSD | 84s | 141.6s

Union_Find :

Scalable System | twitter | uk

  • Original SSD | 15s | 30s
  • New Vertex SSD | 24.2s | 11.744s
  • New Hilbert SSD | 5.9s | 13.528s

Here are the specifications for the machine I am running this on:
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz with 32GB RAM and 250GB SSD.
The machine was running Ubuntu 16.04 LTS.
I am certain that the machine I ran the experiments on are more powerful than a 2014 Macbook Pro but I am not sure if it should create such a different in the performance numbers.

PS Thanks for making this Open Source!!! :)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.