graphalytics-platforms-giraph's Introduction

Graphalytics Giraph platform driver

Getting started

This is a Graphalytics benchmark driver for Apache Giraph. Giraph is an iterative graph processing system built for high scalability. It originated as the open-source counterpart to Google's Pregel; both systems are inspired by the Bulk Synchronous Parallel model of distributed computation introduced by Leslie Valiant.

  • Make sure that you have installed Graphalytics.
  • Download the source code from this repository.
  • Execute mvn clean package in the root directory (see Software Build for more details).
  • Extract the distribution from graphalytics-{graphalytics-version}-giraph-{platform-version}.tar.gz.

Verify the necessary prerequisites

The software listed below is required by the Giraph platform driver and must be available in the cluster environment. Software marked as provided is already included in the platform driver.

Software     | Version (tested) | Usage      | Description                       | Provided
Giraph       | 1.2.0            | Platform   | Providing Giraph implementation   | ✔ (maven)
Graphalytics | 1.0              | Driver     | Graphalytics benchmark suite      | ✔ (maven)
Granula      | 1.0              | Driver     | Fine-grained performance analysis | ✔ (maven)
YARN         | 2.6.1            | Deployment | Job provisioning and allocation   | -
ZooKeeper    | 3.4.1            | Deployment | Synchronizing Giraph workers      | -
JDK          | 1.7+             | Build      | Java virtual machine              | -
Maven        | 3.3.9            | Build      | Building the platform driver      | -

  • YARN: should be reachable from the compute node where the benchmark will be executed.
  • ZooKeeper: should be running on a compute node accessible via the network.

Adjust the benchmark configurations

Adjust the Giraph configurations in config/platform.properties:

  • platform.giraph.zoo-keeper-address: Set to the hostname and port on which ZooKeeper is running.
  • platform.giraph.job.heap-size: Set to the amount of heap space (in MB) each worker should have. As Giraph runs on MapReduce, this setting corresponds to the JVM heap specified for each map task, i.e., mapreduce.map.java.opts.
  • platform.giraph.job.memory-size: Set to the amount of memory (in MB) each worker should have. This corresponds to the amount of memory requested from the YARN resource manager for each worker, i.e., mapreduce.map.memory.mb.
  • platform.giraph.job.worker-count: Set to an appropriate number of workers for the Hadoop cluster. Note that Giraph launches an additional master process.
  • platform.hadoop.home: Set to the root of your Hadoop installation ($HADOOP_HOME).

Known Issues

  • Benchmark reports show nan as the processing time when YARN log aggregation is disabled. To fix this, enable log aggregation by setting yarn.log-aggregation-enable to true in the yarn-site.xml file.
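Whether log aggregation is enabled can be checked before running a benchmark. The sketch below is a hypothetical helper, not part of the driver; it assumes yarn-site.xml uses the standard Hadoop layout of property elements with name and value children, and the sample XML is illustrative only.

```python
import xml.etree.ElementTree as ET

KEY = "yarn.log-aggregation-enable"

# Illustrative yarn-site.xml content; a real file usually holds more properties.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>"""


def log_aggregation_enabled(yarn_site_xml):
    """Return True if yarn.log-aggregation-enable is set to true in the XML text."""
    root = ET.fromstring(yarn_site_xml)
    enabled = False  # Hadoop's default when the property is absent
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip().lower()
        if name == KEY:
            # If the property appears more than once, the last entry wins here.
            enabled = value == "true"
    return enabled


if __name__ == "__main__":
    if log_aggregation_enabled(SAMPLE_YARN_SITE):
        print("Log aggregation is enabled.")
    else:
        print("Log aggregation is disabled; processing time may be reported as nan.")
```

For a real cluster, the text of yarn-site.xml from the Hadoop configuration directory could be read and passed to log_aggregation_enabled instead of the sample string.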

Running a benchmark

To execute a Graphalytics benchmark on Giraph (using this driver), follow the steps in the Graphalytics tutorial on Running Benchmark.

graphalytics-platforms-giraph's People

Contributors

alexandru-uta, amusaafir, mihaic, stijnh, szarnyasg, thegeman, wlngai

graphalytics-platforms-giraph's Issues

Differences and relations between certain configurations in `platform.properties`

From platform.properties:

#platform.giraph.job.worker-cores: 1
#platform.giraph.options.numComputeThreads: 4
#platform.giraph.options.userPartitionCount = 4

What exactly do these settings mean, and how do they differ from each other? Also, I understand that numComputeThreads only takes effect when userPartitionCount is set. What exactly is the relation between these two?

Timeout is reached after 900 seconds

Hi, when I run the benchmark with the command ./bin/sh/run-benchmark.sh, I get the following errors:

06:23 [ERROR] Timeout is reached after 900 seconds. This benchmark run is forcibly terminated.
06:23 [INFO ] Terminating Yarn job: application_1633923033889_0003
21/10/11 06:23:01 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
Killing application application_1633923033889_0003
06:23 [ERROR] A benchmark failure (EXE) is caught by the runner.
21/10/11 06:23:02 INFO impl.YarnClientImpl: Killed application application_1633923033889_0003
06:23 [INFO ] Terminated Yarn job: application_1633923033889_0003
06:23 [WARN ] Terminating runner process forcibly.
06:23 [WARN ] Terminating process 92060 focibly.
06:23 [WARN ] Executing command "kill -9 92060"
06:23 [ERROR] Failed to kill runner process.
06:23 [INFO ] The benchmark run is sucessfully terminated.
06:23 [ERROR] Skipped generation of Granula archive due to benchmark failure.

Is this because the data set is too large?
