ldbc / ldbc_graphalytics
Generic driver for the LDBC Graphalytics implementation
Home Page: https://ldbcouncil.org/benchmarks/graphalytics/
License: Apache License 2.0
Using validation graph from:
https://s3-eu-west-1.amazonaws.com/graphalytics-graphs/index.html
The result contains numeric_limits<int64_t>::max(), but the validation expects Infinity:
18:34:31.206 [INFO ] Validating contents of '/var/scratch/mcapota/output-graphmat-s/datagen-300-SSSP'...
18:34:43.340 [INFO ] - Vertex 6 has value '9.223372036854776E18', but valid value is 'Infinity'
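For illustration, a minimal sketch of the output-side conversion that would avoid this mismatch, assuming the platform stores unreachable distances as the int64 maximum (the writer below is hypothetical, not the actual GraphMat driver code):

```java
import java.io.BufferedWriter;
import java.io.IOException;

// Hypothetical output writer for SSSP distances: maps the int64 sentinel
// used for unreachable vertices to the "Infinity" literal the validator expects.
final class SsspOutputWriter {
    static void writeDistance(BufferedWriter out, long vertexId, double distance)
            throws IOException {
        // 9.223372036854776E18 is numeric_limits<int64_t>::max() cast to double.
        String value = distance >= (double) Long.MAX_VALUE ? "Infinity"
                                                           : Double.toString(distance);
        out.write(vertexId + " " + value);
        out.newLine();
    }
}
```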
Edges per second = number of edges / runtime.
We will examine (semantic) TEPS in the future.
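As a sketch of the metric as defined above (assuming runtime is measured in seconds; the class and method names are illustrative):

```java
// A sketch of the proposed throughput metric: raw edges processed per second.
final class Throughput {
    static double edgesPerSecond(long edgeCount, double runtimeSeconds) {
        return edgeCount / runtimeSeconds;
    }
}
```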
E.g., number of nodes and edges, file size. This allows shipping one big configuration file that includes the configuration for all known graphs.
For Java platforms, two jar files are needed: one for the default build and another for Granula. However, for C++ platforms, having two jar files is entirely unnecessary (they are identical).
Hello
I would have reported this problem to an administrator, but I did not find any contact information on the project page or in the GitHub space. The server at atlarge.ewi.tudelft.nl, which hosts the graph files, has been down for several days. I would appreciate it if you could resolve the server issue or provide a secondary mirror for accessing those files.
Best regards,
Fadishei
As far as I can tell, the only way to determine if there were validation failures in a Graphalytics run is to examine each benchmark validation result. It would be useful to have an indication that the validation succeeded overall, e.g., "Validation successful for all benchmarks" printed at the end of the command line output and at the top of the experimental tab in the HTML report.
Add an option to output the benchmark report in a machine-readable format, e.g., TSV, and mention it in the documentation.
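A minimal sketch of what such an export could look like (the column names are hypothetical; the actual report model may differ):

```java
import java.io.PrintWriter;
import java.util.List;

// Hypothetical TSV export: one header line, then one row per benchmark run.
final class TsvReport {
    static void write(PrintWriter out, List<String[]> rows) {
        out.println("graph\talgorithm\truntime_ms\tvalidated");
        for (String[] row : rows) {
            out.println(String.join("\t", row));
        }
    }
}
```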
I am unable to access the graph datasets. Is there any other place I can access them?
It's a waste of time to run the benchmark and see this at the end:
[main] ERROR nl.tudelft.graphalytics.Graphalytics - Failed to write report:
java.io.IOException: Output directory of report is non-empty: "neo4j-report".
The Graphalytics benchmark launches an independent JVM to execute each benchmark run, so that the time-out can be guaranteed.
However, the timeout currently includes the preprocess, postprocess, and validation time, which makes the measurement inaccurate.
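A sketch of one way to scope the timeout to the execution phase only, assuming the phases can be invoked separately (the helper below is hypothetical):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: apply the timeout only to the algorithm execution, so that
// preprocessing, postprocessing and validation do not count toward it.
final class TimedExecution {
    static long runTimed(Runnable execute, long timeoutMillis) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        try {
            pool.submit(execute).get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();
        }
        return (System.nanoTime() - start) / 1_000_000; // measured runtime in ms
    }
}
```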
The path prescribed in the README for downloading the graphs, http://atlarge.ewi.tudelft.nl/graphalytics/, returns a 404 error. Please change the link to point to the correct location.
When none of the graphs defined in "benchmark.run.graphs" can be found, the graphs defined in "graphs.names" are benchmarked instead.
Hello,
in the paper, you mention the following:
"To support the ability to generate graphs of different characteristics, we have extended Datagen with the capability to dynamically reproduce different distributions by means of plugins. We have already implemented those for the Zeta and Geometric distribution models, but more will be added in the future as more real graphs are analysed."
Where can I find the modified LDBC graph generators for the Zeta and Geometric distributions?
For any benchmark, the chance is high that some benchmark runs will fail. Failures are easier to diagnose if each benchmark run passes through well-classified states (a sketch follows the list), e.g.,
initialization
execution
completion
validation
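A minimal sketch of how such states could be modeled (a hypothetical enum, not the actual Graphalytics domain model):

```java
// Hypothetical lifecycle states for a single benchmark run.
enum BenchmarkRunState {
    INITIALIZATION, // graph loading and platform setup
    EXECUTION,      // the algorithm is running
    COMPLETION,     // the run has finished, successfully or not
    VALIDATION      // the output is checked against the reference
}
```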
In graphalytics-core, it's called SingleSourceShortestPaths, e.g.:
https://github.com/tudelft-atlarge/graphalytics/blob/master/graphalytics-core/src/main/java/nl/tudelft/graphalytics/domain/algorithms/SingleSourceShortestPathsParameters.java
In graphalytics-platforms-giraph and the other platforms, it's called SingleSourceShortestPath, e.g.:
https://github.com/tudelft-atlarge/graphalytics-platforms-giraph/blob/master/graphalytics-platforms-giraph-platform/src/main/java/nl/tudelft/graphalytics/giraph/algorithms/sssp/SingleSourceShortestPathJob.java
https://github.com/tudelft-atlarge/graphalytics-platforms-graphx/blob/master/graphalytics-platforms-graphx-platform/src/main/scala/nl/tudelft/graphalytics/graphx/sssp/SingleSourceShortestPathJob.scala
The Graphalytics benchmark operates on ports 8011 and 8012. The port numbers should be configurable for easier deployment.
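A sketch of reading the ports from configuration instead of hard-coding them (the property keys are hypothetical):

```java
import java.util.Properties;

// Hypothetical: read the two service ports from configuration,
// falling back to the currently hard-coded defaults.
final class PortConfig {
    static int port(Properties props, String key, int fallback) {
        return Integer.parseInt(props.getProperty(key, Integer.toString(fallback)));
    }

    public static void main(String[] args) {
        Properties props = new Properties(); // would be loaded from the benchmark config
        int portA = port(props, "benchmark.port.primary", 8011);
        int portB = port(props, "benchmark.port.secondary", 8012);
        System.out.println("Using ports " + portA + " and " + portB);
    }
}
```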
An exception occurs in Thymeleaf when running a benchmark with an algorithm that is not yet configured for all graphs. No result is added for the (graph, algorithm) pair, but there should be a "not completed"/"not started" result. Exception text:
Exception evaluating OGNL expression: "report.getResult(graph, algorithm).completedSuccessfully"
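A sketch of the kind of null guard that would produce the missing placeholder result; the accessor names are taken from the OGNL expression in the exception, while the surrounding interfaces are hypothetical:

```java
// Hypothetical guard around the expression from the exception: return a
// placeholder status instead of dereferencing a missing (graph, algorithm) result.
interface BenchmarkResult { boolean completedSuccessfully(); }
interface BenchmarkReport { BenchmarkResult getResult(String graph, String algorithm); }

final class ReportStatus {
    static String statusOf(BenchmarkReport report, String graph, String algorithm) {
        BenchmarkResult result = report.getResult(graph, algorithm);
        if (result == null) {
            return "not started";
        }
        return result.completedSuccessfully() ? "succeeded" : "not completed";
    }
}
```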
Some paths are not converted correctly by the core to absolute paths before they are passed to the platform extensions. To do: find out which paths are affected and fix them.
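For illustration, resolving a possibly-relative path to an absolute one with java.nio is straightforward (a sketch, not the actual core code):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Normalize a possibly-relative configured path before handing it
// to a platform extension.
final class PathFix {
    static Path toAbsolute(String configuredPath) {
        return Paths.get(configuredPath).toAbsolutePath().normalize();
    }
}
```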
In the past month, since Build 51, builds have not run graphalytics-tests, even when graphalytics-core compiles. Is this normal behavior (because -giraph, -neo4j, etc. were not recompiled), or does it need a fix?
Hint: could it be graphalytics-validation (previously named graphalytics-tests)? Credit to @mihaic for the idea.
Included test report posted by Jenkins:
Refer to this link for build results (access rights to CI server needed):
http://jenkins.tribler.org//job/Graphalytics_pull_request_tester/21/
Graphalytics will only include core and validation (currently named graphalytics-tests). Each platform in platforms will be a separate repository. Platform configuration files will be extracted from config.
Each platform will also be a separate Maven project, depending on the core project. Once packaged, each platform's code will be used by the core code for actual benchmarking. run-benchmark.sh will still be used for running the benchmark.
Obtaining binaries will work as follows:
compile-benchmark.sh without flags will install core and validation in the local Maven repository.
compile-platform.sh (a new script) will be run in each platform to obtain the platform binaries for graphalytics.
package-benchmark.sh (a new script) will create a redistributable archive of core (validation?) and the platforms, as well as run-benchmark.sh and the platform-specific prepare-benchmark.sh scripts.

A full validation is currently performed on every benchmark run, which is quite heavy and verbose, and the validation dataset might not always be available. It would be easier to determine whether a benchmark execution has completed by checking whether the output has the correct size.
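A sketch of such a lightweight completeness check, under the assumption that each vertex produces exactly one output line:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Cheap completeness check: the output is plausibly complete if it has
// exactly one line per vertex. This complements, not replaces, full validation.
final class OutputSizeCheck {
    static boolean hasExpectedSize(Path output, long vertexCount) throws IOException {
        try (Stream<String> lines = Files.lines(output)) {
            return lines.count() == vertexCount;
        }
    }
}
```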
E.g., Jenkins @ TUDelft. At least compilation + unit tests, perhaps integration tests.
The link to http://atlarge.ewi.tudelft.nl/graphalytics/ is broken.