LDBC-SNB Data Generator

The LDBC-SNB Data Generator (Datagen) is responsible for providing the datasets used by all the LDBC benchmarks. It is designed to produce directed labeled graphs that mimic the characteristics of real-world graphs. A detailed description of the schema produced by Datagen, as well as the format of the output files, can be found in the latest version of the official LDBC SNB specification document.

ldbc_snb_datagen is part of the LDBC project. It is licensed under GPLv3; see the LICENSE.txt file for details.

Quick start

Configuration

Initialize the params.ini file as needed. For example, to generate the basic CSV files, issue:

cp params-csv-basic.ini params.ini
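If you switch between templates often, a small shell helper can catch typos in the template name before a long generation run starts. This is a minimal sketch: use_params_template is a hypothetical function, not part of the repository, and it only assumes the params-<name>.ini naming used by params-csv-basic.ini above. Source it into your shell from the repository directory.

```shell
# Hypothetical helper (not shipped with ldbc_snb_datagen): copies params-<name>.ini
# to params.ini in the current directory, and fails if the template does not exist.
use_params_template() {
  if [ $# -ne 1 ]; then
    echo "usage: use_params_template <name>" >&2
    return 2
  fi
  local template="params-$1.ini"
  if [ ! -f "$template" ]; then
    echo "template not found: $template" >&2
    # List the templates that do exist, if any, to help spot typos.
    for f in params-*.ini; do
      [ -e "$f" ] && echo "  available: $f" >&2
    done
    return 1
  fi
  cp "$template" params.ini
}
```

For example, use_params_template csv-basic has the same effect as the cp command above.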

There are three main ways to run Datagen, each using a different approach to configure the amount of memory available.

  1. using a pseudo-distributed Hadoop installation,
  2. running the same setup in a Docker image,
  3. running on a distributed Hadoop cluster.

Pseudo-distributed Hadoop node

To configure the amount of memory available, set the HADOOP_CLIENT_OPTS environment variable. The following script downloads and extracts Hadoop, sets the environment variables to sensible defaults, and generates the data as specified in the params-csv-basic.ini template file:

cp params-csv-basic.ini params.ini
wget http://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar xf hadoop-3.2.1.tar.gz
export HADOOP_CLIENT_OPTS="-Xmx2G"
# set this to the Hadoop 3.2.1 directory
export HADOOP_HOME=`pwd`/hadoop-3.2.1
# set this to the repository's directory
export LDBC_SNB_DATAGEN_HOME=`pwd`
./run.sh
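Before calling ./run.sh, it can help to confirm that the variables exported in the script above point to real locations. The following is a sketch only: check_datagen_env is a hypothetical function, not part of the repository, and it checks nothing beyond what the script above sets up (HADOOP_HOME, LDBC_SNB_DATAGEN_HOME, params.ini, and HADOOP_CLIENT_OPTS).

```shell
# Hypothetical pre-flight check (not part of ldbc_snb_datagen).
# Returns 0 if HADOOP_HOME and LDBC_SNB_DATAGEN_HOME are existing directories and
# params.ini exists in LDBC_SNB_DATAGEN_HOME; returns 1 otherwise.
check_datagen_env() {
  local status=0
  if [ -z "${HADOOP_HOME:-}" ] || [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME is unset or not a directory" >&2
    status=1
  fi
  if [ -z "${LDBC_SNB_DATAGEN_HOME:-}" ] || [ ! -d "$LDBC_SNB_DATAGEN_HOME" ]; then
    echo "LDBC_SNB_DATAGEN_HOME is unset or not a directory" >&2
    status=1
  elif [ ! -f "$LDBC_SNB_DATAGEN_HOME/params.ini" ]; then
    echo "params.ini not found in $LDBC_SNB_DATAGEN_HOME" >&2
    status=1
  fi
  # Only a warning: the run can proceed without an explicit heap setting.
  if [ -z "${HADOOP_CLIENT_OPTS:-}" ]; then
    echo "warning: HADOOP_CLIENT_OPTS is unset; no explicit -Xmx is passed" >&2
  fi
  return "$status"
}
```

A typical use is check_datagen_env && ./run.sh, so that generation only starts when the environment looks complete.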

Docker image

SNB Datagen images are available via Docker Hub, where you can find both the latest version of the generator and previous stable versions.

Alternatively, the image can be built with the provided Dockerfile. To build, execute the following command from the repository directory:

docker build . --tag ldbc/datagen

Running

Set up params.ini in the repository as for the pseudo-distributed case. The file is mounted into the container by the --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" option. If required, the source can be changed to point to a params.ini file at a different path.

The container writes its results to the /opt/ldbc_snb_datagen/out/ directory, which contains two subdirectories: social_network/ and substitution_parameters/. To keep the results of the generation, a host directory must be mounted at this location. The driver requires the results to be in the Datagen repository directory. To generate the data, run the following command, which also changes the owner (chown) of the Docker-mounted volumes to the current user.

โš ๏ธ This removes the previously generated social_network directory:

rm -rf social_network/ substitution_parameters && \
  docker run --rm --mount type=bind,source="$(pwd)/",target="/opt/ldbc_snb_datagen/out" --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" ldbc/datagen; \
  sudo chown -R $USER:$USER social_network/ substitution_parameters/

If you need to raise the memory limit, use the -e HADOOP_CLIENT_OPTS="-Xmx..." parameter to override the default value (-Xmx2G).
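For example, to give the generator an 8 GB heap instead of the 2 GB default, add -e HADOOP_CLIENT_OPTS="-Xmx8G" to the docker run command. The wrapper below is a hypothetical convenience function (run_datagen_with_memory is not part of the repository); it reuses the mounts and image name from the command above and only adds the -e option. It does not delete old output or fix file ownership, so keep the rm and chown steps from the command above around it.

```shell
# Hypothetical wrapper (not part of ldbc_snb_datagen): runs the ldbc/datagen image
# from the repository directory, overriding the JVM heap via HADOOP_CLIENT_OPTS.
# The argument is a JVM size such as 4G or 8G; it defaults to 2G, the documented default.
run_datagen_with_memory() {
  local heap="${1:-2G}"
  docker run --rm \
    --mount type=bind,source="$(pwd)/",target="/opt/ldbc_snb_datagen/out" \
    --mount type=bind,source="$(pwd)/params.ini",target="/opt/ldbc_snb_datagen/params.ini" \
    -e HADOOP_CLIENT_OPTS="-Xmx${heap}" \
    ldbc/datagen
}
```

Usage: run_datagen_with_memory 8G. Passing the heap size as a single argument keeps the -Xmx prefix in one place, so a typo such as a missing -Xmx cannot slip into the environment variable.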

Hadoop cluster

Instructions are currently not provided.

Graph schema

The graph schema is as follows:

Community-provided tools
