
blueprints-accumulo-graph's Introduction

NOTE: This code is no longer maintained. See https://github.com/JHUAPL/AccumuloGraph for a newer implementation.

Blueprints for Accumulo

This is an implementation of the TinkerPop Blueprints API backed by Accumulo. The graph is stored in a single Accumulo table. The implementation supports key/value indexing and includes some performance tweaks. If indexing is enabled, the index is stored in a separate table.

How to use it

AccumuloGraphOptions opts = new AccumuloGraphOptions();

opts.setConnectorInfo(instance, zookeepers, username, password);
// OR
opts.setConnector(connector);

opts.setGraphTable(graphTable);

// Optional
opts.setIndexTable(indexTable);
opts.setAutoflush(...);
opts.setReturnRemovedPropertyValues(...);
opts.setMock(...);

AccumuloGraph graph = new AccumuloGraph(opts);

Options are as follows:

  • Connector info: The information needed to connect to Accumulo (instance, zookeepers, username, password). Alternatively, pass in an Accumulo Connector object that represents the connection. If neither is supplied, a mock instance is required (see below).

  • Graph table: Where to store the graph.

  • Index table: Where to store the key/value index.

  • Autoflush (default: true): Flush changes to Accumulo immediately instead of holding them back for performance. Disabling it may cause timing issues (see Caveats).

  • Return removed property values (default: true): The removeProperty method is specified to return the value of the removed property, which may require an extra read from Accumulo. If you don't care what is returned, disable this to speed things up.

  • Use mock instance (default: false): If you don't have an Accumulo cluster lying around, but still want to use this, you can use a "mock" instance of Accumulo which runs in memory and simulates a real cluster.

Caveats

There are definitely bugs.

Timing issues: There may be a lag between when you add a vertex or edge, set its properties, etc., and when the change is reflected in the backing Accumulo table. This is done for performance reasons, but as a result, if you set values and then immediately read them back, the results may be inconsistent. The same holds for key/value indexes. This isn't a problem for bulk loads or for read-only use of the graph, but otherwise it may be. The autoflush option mitigates it somewhat by flushing changes to Accumulo immediately, at the cost of write performance. I have tried to reduce these timing issues as much as possible, but some may remain, and this needs more testing.
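The read-after-write behavior described above can be illustrated with a small in-memory model. This is only a sketch under assumptions: BufferedWriterSketch and its methods are hypothetical names, a HashMap stands in for the Accumulo table, and nothing here is the library's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of buffered writes; not part of blueprints-accumulo-graph. */
final class BufferedWriterSketch {
    // Stands in for the backing table that reads are served from.
    private final Map<String, String> table = new HashMap<>();
    // Writes that have been accepted but are not yet visible to reads.
    private final List<String[]> pending = new ArrayList<>();
    private final boolean autoflush;

    BufferedWriterSketch(boolean autoflush) {
        this.autoflush = autoflush;
    }

    /** Queue a write; with autoflush enabled it is applied to the table right away. */
    void write(String key, String value) {
        pending.add(new String[] {key, value});
        if (autoflush) {
            flush();
        }
    }

    /** Apply all queued writes to the table. */
    void flush() {
        for (String[] kv : pending) {
            table.put(kv[0], kv[1]);
        }
        pending.clear();
    }

    /** Reads only see flushed data, so an unflushed write returns null here. */
    String read(String key) {
        return table.get(key);
    }
}
```

With autoflush disabled, a write followed immediately by a read of the same key returns null until flush() is called; with autoflush enabled, the read sees the value at the price of one flush per write.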

TODO

  • Hadoop integration.
  • Read-only usage. This would enforce read-only operations, allow caching strategies, and avoid timing issues.
  • Element/property cache, to increase performance for read-only usage.
  • Bulk loading of graph elements.
  • Regular-style indexes, in addition to key/value index.
  • Tuned querying.
  • Benchmarking.
  • Documentation.

Implementation details

The graph is stored in a single table with the following schema.

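| Row      | CF      | CQ        | Val       | Purpose          |
|----------|---------|-----------|-----------|------------------|
| [v id]   | MVERTEX | -         | -         | Vertex id        |
| [v id]   | EOUT    | [e id]    | [e label] | Vertex out-edge  |
| [v id]   | EIN     | [e id]    | [e label] | Vertex in-edge   |
| [e id]   | MEDGE   | [e label] | -         | Edge id          |
| [e id]   | VOUT    | [v id]    | -         | Edge out-vertex  |
| [e id]   | VIN     | [v id]    | -         | Edge in-vertex   |
| [v/e id] | PROP    | [pname]   | [pval]    | Element property |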
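To make the graph table schema concrete, here is a minimal sketch of which cells one edge and one property would produce. It assumes a TreeMap as a stand-in for a sorted Accumulo table and a NUL character as the separator between row, CF and CQ; GraphTableSketch and its methods are hypothetical and do not reflect how the library actually encodes keys.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of the graph table; keys sort by (row, CF, CQ). */
final class GraphTableSketch {
    // Separator between key parts; an assumption made for this sketch only.
    private static final char SEP = '\u0000';
    final TreeMap<String, String> cells = new TreeMap<>();

    private void put(String row, String cf, String cq, String val) {
        cells.put(row + SEP + cf + SEP + cq, val);
    }

    /** One marker cell per vertex. */
    void addVertex(String vId) {
        put(vId, "MVERTEX", "", "");
    }

    /** An edge writes three cells in its own row and one in each endpoint's row. */
    void addEdge(String eId, String outV, String inV, String label) {
        put(eId, "MEDGE", label, "");
        put(eId, "VOUT", outV, "");
        put(eId, "VIN", inV, "");
        put(outV, "EOUT", eId, label);
        put(inV, "EIN", eId, label);
    }

    /** Properties of vertices and edges share the PROP column family. */
    void setProperty(String id, String name, String value) {
        put(id, "PROP", name, value);
    }

    /** Out-edges of a vertex: a range scan over EOUT within the vertex row. */
    Map<String, String> outEdges(String vId) {
        String prefix = vId + SEP + "EOUT" + SEP;
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, String> e
                : cells.subMap(prefix, prefix + Character.MAX_VALUE).entrySet()) {
            // Key suffix is the edge id, value is the edge label.
            result.put(e.getKey().substring(prefix.length()), e.getValue());
        }
        return result;
    }
}
```

Because each vertex row carries its own EOUT and EIN cells, listing a vertex's edges is a scan confined to that row, at the cost of writing every edge in three places.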

If the index table is enabled, it has the following schema.

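| Row      | CF       | CQ       | Val | Purpose              |
|----------|----------|----------|-----|----------------------|
| PVLIST   | [p name] | -        | -   | Vertex property list |
| PELIST   | [p name] | -        | -   | Edge property list   |
| [p name] | [p val]  | [v/e id] | -   | Property index       |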
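A key/value lookup against the index schema could then be a prefix scan on (property name, property value). The sketch below assumes a TreeSet as a stand-in for the sorted index table and a NUL separator; KeyIndexSketch and its methods are hypothetical and are not the library's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory model of the index table; keys sort by (row, CF, CQ). */
final class KeyIndexSketch {
    // Separator between key parts; an assumption made for this sketch only.
    private static final char SEP = '\u0000';
    private final TreeSet<String> keys = new TreeSet<>();

    /** Record that a vertex has the given property value. */
    void indexVertexProperty(String vId, String name, String value) {
        keys.add("PVLIST" + SEP + name);            // property name is listed as indexed
        keys.add(name + SEP + value + SEP + vId);   // property index entry
    }

    /** Element ids whose property `name` equals `value`, in sorted order. */
    List<String> lookup(String name, String value) {
        String prefix = name + SEP + value + SEP;
        List<String> ids = new ArrayList<>();
        for (String key : keys.subSet(prefix, prefix + Character.MAX_VALUE)) {
            ids.add(key.substring(prefix.length()));
        }
        return ids;
    }
}
```

Storing the element id in the last key component means that all elements sharing a property value sit next to each other, so a lookup reads one contiguous range.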


Please contact me if you find any bugs! Thanks!

blueprints-accumulo-graph's People

Contributors

mikelieberman


blueprints-accumulo-graph's Issues

Compilation issues.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project blueprints-accumulo-graph: Compilation failure: Compilation failure:
[ERROR] /Users/dev/Software/Blueprints/blueprints-accumulo-graph/src/main/java/accumulograph/AccumuloKeyIndex.java:[86,68] inconvertible types
[ERROR] required: java.lang.Iterable
[ERROR] found: java.lang.Iterable<com.tinkerpop.blueprints.Vertex>
[ERROR] /Users/dev/Software/Blueprints/blueprints-accumulo-graph/src/main/java/accumulograph/AccumuloKeyIndex.java:[91,65] inconvertible types
[ERROR] required: java.lang.Iterable
[ERROR] found: java.lang.Iterable<com.tinkerpop.blueprints.Edge>

AccumuloVertex implements tinkerpop.Vertex and extends AccumuloElement. I do not think that a cast from Iterable to Iterable is legal.
Thx,
Maciek
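For readers hitting the same error, the following sketch shows the general Java rule that the report appears to run into: generic types are invariant, so javac rejects a direct cast between two Iterable types with different element types. The Element and Vertex types here are hypothetical stand-ins, not the Blueprints interfaces, and this is not necessarily the fix needed in AccumuloKeyIndex.java.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in types; not the Blueprints or AccumuloGraph classes. */
final class IterableCastSketch {
    interface Element {
        String id();
    }

    static final class Vertex implements Element {
        private final String id;

        Vertex(String id) {
            this.id = id;
        }

        public String id() {
            return id;
        }
    }

    // A direct cast, (Iterable<Element>) vertices, does not compile because
    // Iterable<Vertex> and Iterable<Element> are unrelated types.
    // Casting through a wildcard type compiles, with an unchecked warning.
    @SuppressWarnings("unchecked")
    static Iterable<Element> viaWildcardCast(Iterable<Vertex> vertices) {
        return (Iterable<Element>) (Iterable<?>) vertices;
    }

    // Often preferable: accept a bounded wildcard so no cast is needed at all.
    static List<String> ids(Iterable<? extends Element> elements) {
        List<String> out = new ArrayList<>();
        for (Element e : elements) {
            out.add(e.id());
        }
        return out;
    }
}
```

The wildcard cast is safe here only because the result is read from and never written to; where the method signature can be changed, the bounded-wildcard parameter avoids the unchecked cast entirely.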
