orientdb-neo4j-importer's Introduction

Neo4j to OrientDB Importer

Documentation

http://orientdb.com/docs/last/OrientDB-Neo4j-Importer.html

Internals

Compile

mvn clean install

To skip tests:

mvn clean install -DskipTests

To run only a specific test, e.g. shouldImportEmptyDb:

mvn -Dtest=ONeo4jImporterTest#shouldImportEmptyDb test

Tests

The test databases are created using the following queries:

graphdb_empty_db (test shouldImportEmptyDb)

Empty database

graphdb_unique_constraints_only (test shouldImportUniqueConstraintsOnlyDb)

CREATE CONSTRAINT ON (n:NodeLabelA) ASSERT n.p_number   IS UNIQUE
CREATE CONSTRAINT ON (n:NodeLabelB) ASSERT n.p_string   IS UNIQUE
CREATE CONSTRAINT ON (n:NodeLabelC) ASSERT n.p_boolean  IS UNIQUE

graphdb_nodes_only (test shouldImportNodesOnlyDb)

foreach(x in range(1,10) | create (:NodeLabelA {p_number:x, other_property: "NodeLabelA-"+x}))
foreach(x in range(1,10) | create (:NodeLabelB {p_string:"string_value_" + x, other_property: "NodeLabelB-"+x}))
foreach(x in range(1,5)  | create (:NodeLabelC {p_boolean:false, other_property: "NodeLabelC-"+x}))
foreach(x in range(6,10) | create (:NodeLabelC {p_boolean:true, other_property: "NodeLabelC-"+x}))

graphdb_nodes_only_no_labels (test shouldImportNodesOnlyNoLabelsDb)

foreach(x in range(1,10) | create ( {p_number:x, other_property: "string-"+x}))
foreach(x in range(1,10) | create ( {p_string:"string_value_" + x, other_property: "string-"+x}))
foreach(x in range(1,5)  | create ( {p_boolean:false, other_property: "string-"+x}))
foreach(x in range(6,10) | create ( {p_boolean:true, other_property: "string-"+x}))

graphdb_nodes_only_mixed_labels_and_no_labels (test shouldImportNodesOnlyMixedLabelsNoLabelsDb)

foreach(x in range(1,10) | create (:NodeLabelA {p_number:x, other_property: "NodeLabelA-"+x}))
foreach(x in range(1,10) | create ( {p_string:"string_value_" + x, other_property: "string-"+x}))
foreach(x in range(1,5)  | create (:NodeLabelC {p_boolean:false, other_property: "NodeLabelC-"+x}))
foreach(x in range(6,10) | create ( {p_boolean:true, other_property: "string-"+x}))

graphdb_nodes_only_label_case_test (test shouldImportNodesOnlyLabelCaseDb)

foreach(x in range(1,10) | create (:NodeLabelA {p_number:x, other_property: "NodeLabelA-"+x}))
foreach(x in range(1,10) | create (:NodeLABELA {p_string:"string_value_" + x, other_property: "NodeLABELA-"+x}))
foreach(x in range(1,5)  | create (:NodeLabelC {p_boolean:false, other_property: "NodeLabelC-"+x}))
foreach(x in range(6,10) | create ( {p_boolean:true, other_property: "string-"+x}))

graphdb_nodes_only_label_case_test_constraints (test shouldImportNodesOnlyLabelCaseConstraintsDb)

CREATE CONSTRAINT ON (n:NodeLabelA) ASSERT n.p_number        IS UNIQUE
CREATE CONSTRAINT ON (n:NodeLabelB) ASSERT n.p_number        IS UNIQUE
CREATE CONSTRAINT ON (n:NodeLABELB) ASSERT n.p_number        IS UNIQUE

foreach(x in range(1,10) | create (:NodeLabelA {p_number:x, other_property: "NodeLabelA-"+x}))
foreach(x in range(1,10) | create (:NodeLABELA {p_string:"string_value_" + x, other_property: "NodeLABELA-"+x}))
foreach(x in range(1,10) | create (:NodeLabelB {p_number:x, other_property: "NodeLabelB-"+x}))
foreach(x in range(1,10) | create (:NodeLABELB {p_number:x, other_property: "NodeLABELB-"+x}))

graphdb_nodes_only_multiple_labels (test shouldImportNodesOnlyMultipleLabelsDb)

foreach(x in range(1,10) | create (:NodeLabelA:NodeLabelB {p_number:x, other_property: "NodeLabelA-NodeLabelB-"+x}))
foreach(x in range(1,10) | create (:NodeLabelC:NodeLabelD {p_string:"string_value_" + x, other_property: "NodeLabelC-NodeLabelD"+x}))
foreach(x in range(1,10) | create (:NodeLabelE {p_boolean:true, other_property: "NodeLabelE-"+x}))

graphdb_multiple_labels_and_constraints (test shouldImportMultipleLabelsAndConstraintsDb)

CREATE CONSTRAINT ON (n:NodeLabelA) ASSERT n.p_number        IS UNIQUE
CREATE CONSTRAINT ON (n:NodeLabelB) ASSERT n.p_number        IS UNIQUE
CREATE CONSTRAINT ON (n:NodeLabelC) ASSERT n.p_string        IS UNIQUE
CREATE CONSTRAINT ON (n:NodeLabelE) ASSERT n.other_property  IS UNIQUE

foreach(x in range(1,10) | create (:NodeLabelA:NodeLabelB {p_number:x, other_property: "NodeLabelA-NodeLabelB-"+x}))
foreach(x in range(11,20) | create (:NodeLabelB {p_number:x, other_property: "NodeLabelB-"+x}))
foreach(x in range(1,10) | create (:NodeLabelC:NodeLabelD {p_string:"string_value_" + x, other_property: "NodeLabelC-NodeLabelD"+x}))
foreach(x in range(1,10) | create (:NodeLabelE {p_boolean:true, other_property: "NodeLabelE-"+x}))

orientdb-neo4j-importer's People

Contributors

lvca, robfrank

Forkers

isabella232

orientdb-neo4j-importer's Issues

Allow customized strategy in case labels (or relationship types) have same name but different cases

With current version, 2.2.14:

  • Neo4j nodes whose labels differ only in case, e.g. LABEL and LAbel, are aggregated into a single OrientDB vertex class
  • Neo4j relationships whose types differ only in case, e.g. relaTIONship and RELATIONSHIP, are aggregated into a single OrientDB edge class

This is an enhancement request to allow users to customize the strategy. The behavior above can be one strategy, but others are possible, e.g. appending a random string to the label or relationship type and migrating them into separate classes.

This does not seem to be a common case, but it could still be useful to improve how the importer behaves here.
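
The two strategies discussed in this issue could be sketched as follows. This is only an illustration: `LabelCaseResolver`, its `Strategy` enum and `classNameFor` are hypothetical names, not part of the importer, and a numeric suffix is used instead of a random string so that the result is reproducible.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: maps Neo4j labels that differ only in case to class names. */
final class LabelCaseResolver {

    /** MERGE mirrors the 2.2.14 behavior; SEPARATE keeps one class per spelling. */
    enum Strategy { MERGE, SEPARATE }

    private final Strategy strategy;
    // Lower-cased label -> (original spelling -> assigned class name).
    private final Map<String, Map<String, String>> assigned = new HashMap<>();

    LabelCaseResolver(Strategy strategy) {
        this.strategy = strategy;
    }

    /** Returns the class name to use for the given Neo4j label. */
    String classNameFor(String label) {
        String key = label.toLowerCase(Locale.ROOT);
        Map<String, String> spellings = assigned.computeIfAbsent(key, k -> new HashMap<>());
        String existing = spellings.get(label);
        if (existing != null) {
            return existing;
        }
        String className;
        if (spellings.isEmpty()) {
            // First spelling seen: keep the label as the class name.
            className = label;
        } else if (strategy == Strategy.MERGE) {
            // Every known spelling already shares one class name; reuse it.
            className = spellings.values().iterator().next();
        } else {
            // SEPARATE: add a suffix so the name no longer collides ignoring case.
            className = label + "_" + (spellings.size() + 1);
        }
        spellings.put(label, className);
        return className;
    }
}
```

With `Strategy.SEPARATE`, the labels LABEL and LAbel would end up in the classes `LABEL` and `LAbel_2`; with `Strategy.MERGE` both end up in `LABEL`.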

Improve automated tests

Improve automated test suite by adding more databases:

  • movie
  • northwind
  • reactome
  • panama papers

and

  • more asserts

Nodes with multiple labels are not recognized

By design, nodes with multiple labels should be recognized, and only the first label should be used during the import into OrientDB.

However, due to a bug, the number of nodes with multiple labels is not written in the migration summary, and the last label is used during the migration instead of the first.

Generate kind of "check" file

Generate a file that can be used, after the migration, to compare the original and migrated databases.

In particular, it would be useful to have:

  • list of neo4j labels
  • list of neo4j constraints and indexes

compared to:

  • list of orientdb classes
  • list of orientdb constraints and indexes
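
The comparison itself could look roughly like the sketch below. `MigrationCheck` and `missingIgnoringCase` are hypothetical names, and the sketch assumes both name lists have already been collected from Neo4j and OrientDB. Names are compared ignoring case, on the assumption that OrientDB class names are case-insensitive.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: reports source names that have no counterpart after migration. */
final class MigrationCheck {

    private MigrationCheck() {
    }

    /** Returns the names in source that are missing from target, ignoring case. */
    static Set<String> missingIgnoringCase(Collection<String> source, Collection<String> target) {
        Set<String> targetLower = new TreeSet<>();
        for (String name : target) {
            targetLower.add(name.toLowerCase(Locale.ROOT));
        }
        Set<String> missing = new TreeSet<>();
        for (String name : source) {
            if (!targetLower.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

The same helper could be applied to constraints and indexes, provided they are first reduced to comparable names.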

Improve migration in case of multi-labels

Right now only the first label is migrated and the information on the other labels is lost.

The list of labels can be stored in an additional vertex property of type EMBEDDEDLIST STRING. This property can be indexed, which allows fast queries "by label".

Even if the multi-label nodes are migrated into a single class, it will then be possible to filter nodes that had a specific label, e.g.

select * from V where Neo4jLabelList IN ['NodeLabelB']

and equivalent match, e.g.

MATCH {class: V, as: NodeLabelB, where: (Neo4jLabelList IN ['NodeLabelB']) } RETURN NodeLabelB.Neo4jNodeID, NodeLabelB.other_property, NodeLabelB.Neo4jLabelList

Too much info written in the migration log on certain conditions

Under certain conditions, too much information is written to the migration log.

As a consequence, the migration slows down and the log becomes unreadable.

One case is when a Neo4j relationship type has the same name as a Neo4j label. In this case a message similar to the following is written:

WARNING: Found a Neo4j Relationship ('summation') with same name of a Neo4j node Label ('Summation'). Importing this relationship in OrientDB as 'E_summation

However, this message is written for every relationship whose type matches a node label. As a consequence, if there are many such relationships, many similar messages are written to the log, which impacts migration performance.

A possible fix is to write the message only the first time a relationship type is found (and not for every relationship of that type).

Easily reproducible with reactome.graphdb
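
The proposed fix amounts to remembering which relationship types have already been reported. A minimal sketch, assuming a hypothetical helper class `OncePerTypeWarner` that the importer would consult before logging:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: lets a warning through only once per relationship type. */
final class OncePerTypeWarner {

    private final Set<String> alreadyWarned = new HashSet<>();

    /** Returns true only the first time the given relationship type is seen. */
    boolean shouldWarn(String relationshipType) {
        // Set.add returns false when the element was already present.
        return alreadyWarned.add(relationshipType);
    }
}
```

The caller would wrap the existing log statement in `if (warner.shouldWarn(type)) { ... }`, so the cost per relationship is a single set lookup.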

Improve migration in case of Neo4j nodes with multiple labels: allow customized mapping between label and classes

Version 2.2.14 slightly improved how the migration is done when the original Neo4j nodes have multiple labels.

But there is an important additional step to support and release to improve the user experience: allowing a customized mapping between Neo4j labels and OrientDB classes.

Let's suppose there are N labels in Neo4j. Using a configuration file, the user should be able to decide how these N labels are migrated. A few cases:

A. Single label nodes

N labels -migrated-to-the-corresponding-> N classes (happens already today)

N labels -migrated-to-> M classes , where M<>N. Users decide their own mapping, e.g.

N1 label --> M1 class
N2 label --> M2 class
N3 & N4 labels --> M3 class
N5 label --> M4 class
etc

B. Multiple label nodes

Customized mapping is important in particular in case of neo4j nodes with multiple labels:

N1 & N2 labels --> M1 class
N3 & N4 labels --> M2 class
N5 & N6 labels --> M3 class
etc

Customized mapping details

The customization can be done using a configuration file that the importer reads at run time.

In the first stage the user is expected to edit the configuration file manually.

In the second stage, after the integration between the importer and the Studio tool has been done, the user should be able to do this mapping visually, similarly to how visual mapping works in the Teleporter tool.

Note that during the visual mapping process a configuration file is created (transparently for the user). The importer then reads it, so no additional modification to the importer code is needed. In other words, in this second stage only the way the configuration file is created changes: it can be created either manually or visually, but the importer code remains the same.

Implementation details

From a technical point of view, to allow customized mapping between labels and classes only the part related to node migration has to be changed. Relationship and schema migration are not affected.

In particular, only the part where the vertex class (to be used) is determined has to be changed. To keep code clean and clear, a function that returns the class name can be used, and all the logic (read the configuration file and determine the class) can be put into this function.
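
A function of that kind could look like the sketch below. Everything here is an assumption made for illustration: the class `LabelClassMapper`, its methods, the lookup order, and the idea that the configuration file has already been parsed into rules. The format of the configuration file is not defined in this issue and is left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: decides which OrientDB class a Neo4j node is migrated into. */
final class LabelClassMapper {

    // Key: a set of Neo4j labels; value: the target OrientDB class name.
    private final Map<Set<String>, String> mapping = new HashMap<>();

    /** Registers one rule, e.g. addRule("M1", "N1", "N2") for "N1 & N2 labels --> M1 class". */
    void addRule(String className, String... labels) {
        mapping.put(new TreeSet<>(Arrays.asList(labels)), className);
    }

    /**
     * Returns the class name for a node. Lookup order: a rule for the full
     * label set, then a rule for the first label alone, then the first label
     * itself. The fallback class is used for nodes without labels.
     */
    String classNameFor(List<String> nodeLabels, String fallbackClass) {
        if (nodeLabels.isEmpty()) {
            return fallbackClass;
        }
        String byFullSet = mapping.get(new TreeSet<>(nodeLabels));
        if (byFullSet != null) {
            return byFullSet;
        }
        String first = nodeLabels.get(0);
        String byFirstLabel = mapping.get(Collections.singleton(first));
        return byFirstLabel != null ? byFirstLabel : first;
    }
}
```

Because the rules are keyed by label set, the order in which a node lists its labels does not affect the result when a rule matches the full set. Nodes without a matching rule fall back to their first label, which is the behavior described as the current design.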

Edge Classes: create a property and a unique index on Neo4jRelID

This will improve how migrated relationships can be queried by original Neo4j relationship ID

Possible Workaround:

  • Using Studio or the Console, create a property named Neo4jRelID on the E (edge) class and then a unique index on this property. This allows you to query relationships by Neo4j ID using a query like:

select * from E where Neo4jRelID = your_id_here

enhancement: configuration file that defines prefs/destination targets

I'd like to suggest something...
an import prefs configuration file that defines

the DB name for the import
the names of the Neo4Jid and Neo4jLabels properties to import to
whether or not to overwrite or append to (possible) existing database

IMHO, I would like to use a particular DB rather than neo4j_import (otherwise I have to rename it when I'm done importing)
I would like the ID and labels to be something I use going forward rather than renaming those properties or copying those properties to something else
I would like to import into an existing DB so I don't have to repeat the process over and over.

...maybe there are other configuration items possible

3.0 - Remove TP2 and use new API

Examples

Vertex Type

- OrientVertexType writer = graph.createVertexType("Writer");
+ OClass writer = schema.createClass("Writer", v);

EdgeType

- graph.createEdgeType("Writes");
+ schema.createClass("Writes", e);

AddEdge

- db.addEdge("class:Writes", writer, post, null);
+ db.newEdge(writer, post, "Writes");

AddVertex

- OrientVertex post = db.addVertex("class:Post");
+ OVertex post = db.newVertex("Post");

Dep

-import com.tinkerpop.blueprints.impls.orient.OrientBaseGraph;
-import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;
-import com.tinkerpop.blueprints.impls.orient.OrientVertex;
-import com.tinkerpop.blueprints.impls.orient.OrientVertexType;

+import com.orientechnologies.orient.core.record.OVertex;
+import com.orientechnologies.orient.core.db.ODatabase;

Pom

- <artifactId>orientdb-graphdb</artifactId>
+ <artifactId>orientdb-server</artifactId>

Optimize logging

Avoid writing progress every time a vertex or edge is created; instead, update the progress percentage at a fixed interval, say every 1 second.

This should improve performance
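
Time-based throttling of this kind could be sketched as follows. `ThrottledProgress` is a hypothetical helper, not existing importer code. The current time is passed in by the caller (for example `System.nanoTime()`), which keeps the class easy to test.

```java
/** Hypothetical sketch: allows a progress update at most once per interval. */
final class ThrottledProgress {

    private final long intervalNanos;
    private long lastReportNanos;
    private boolean reportedOnce;

    ThrottledProgress(long intervalMillis) {
        this.intervalNanos = intervalMillis * 1_000_000L;
    }

    /** Returns true if a progress line should be written at the given time. */
    boolean shouldReport(long nowNanos) {
        if (!reportedOnce || nowNanos - lastReportNanos >= intervalNanos) {
            reportedOnce = true;
            lastReportNanos = nowNanos;
            return true;
        }
        return false;
    }
}
```

Inside the import loop, the existing progress output would be guarded by `if (progress.shouldReport(System.nanoTime())) { ... }`, with a final unconditional update once the loop ends so that 100% is always printed.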

Improve logging: print vertex / edge speed creation

Right now we have it at the end of the import process, as "average" speed

It would be good to also have a "point" or instantaneous speed. This would make it possible to see whether the speed is changing or degrading over time.
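
One way to obtain such a point speed is to measure the records created between two consecutive samples rather than since the start of the import. The class below is a hypothetical sketch (`SpeedMeter` and `sample` are illustrative names); timestamps are supplied by the caller, for example from `System.nanoTime()`.

```java
/** Hypothetical sketch: computes records per second between consecutive samples. */
final class SpeedMeter {

    private long lastCount;
    private long lastNanos;

    SpeedMeter(long startNanos) {
        this.lastNanos = startNanos;
    }

    /**
     * Returns the creation speed, in records per second, since the previous
     * sample, then stores the current values as the new reference point.
     */
    double sample(long totalCreated, long nowNanos) {
        long elapsed = nowNanos - lastNanos;
        double speed = elapsed <= 0
            ? 0.0
            : (totalCreated - lastCount) * 1_000_000_000.0 / elapsed;
        lastCount = totalCreated;
        lastNanos = nowNanos;
        return speed;
    }
}
```

Sampling at the same interval used for progress updates would show a degrading import as a falling sequence of values, while the average printed at the end stays unchanged.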
