janusgraph / janusgraph

JanusGraph: an open-source, distributed graph database

License: Other

Java 98.82% Shell 0.65% Batchfile 0.10% Groovy 0.31% Python 0.03% Dockerfile 0.08%

graph-database tinkerpop gremlin hbase cassandra elasticsearch solr bigtable graphdb graph

janusgraph's Introduction

JanusGraph is a highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster. JanusGraph is a transactional database that can support thousands of concurrent users, complex traversals, and analytic graph queries.

Learn More

The project homepage contains more information on JanusGraph and provides links to documentation, getting-started guides and release downloads.

Visualization

To visualize graphs stored in JanusGraph, you can use any of the following tools:

Community

GitHub Discussions: see GitHub Discussions for all general discussions and questions about JanusGraph
Discord for interactive discussions and questions about JanusGraph: Join the server
Stack Overflow: see the janusgraph tag
Twitter: follow @JanusGraph for news and updates
LinkedIn: follow JanusGraph for news and updates
Mailing lists:
- janusgraph-users (at) lists.lfaidata.foundation (archives) for questions about using JanusGraph, installation, configuration, integrations
  
  To join with a LF AI & Data account, use the web UI; to subscribe/unsubscribe with an arbitrary email address, send an email to:
  - janusgraph-users+subscribe (at) lists.lfaidata.foundation
  - janusgraph-users+unsubscribe (at) lists.lfaidata.foundation
- janusgraph-dev (at) lists.lfaidata.foundation (archives) for internal implementation of JanusGraph itself
  
  To join with a LF AI & Data account, use the web UI; to subscribe/unsubscribe with an arbitrary email address, send an email to:
  - janusgraph-dev+subscribe (at) lists.lfaidata.foundation
  - janusgraph-dev+unsubscribe (at) lists.lfaidata.foundation
- janusgraph-announce (at) lists.lfaidata.foundation (archives) for new releases and news announcements
  
  To join with a LF AI & Data account, use the web UI; to subscribe/unsubscribe with an arbitrary email address, send an email to:
  - janusgraph-announce+subscribe (at) lists.lfaidata.foundation
  - janusgraph-announce+unsubscribe (at) lists.lfaidata.foundation

Contributing

Please see CONTRIBUTING.md for more information, including CLAs and best practices for working with GitHub.

Powered by JanusGraph

Apache Atlas - metadata management for governance (website)
Eclipse Keti - access control service to protect RESTful APIs (website)
Exakat - PHP static analysis (website)
Open Network Automation Platform (ONAP) - automation and orchestration for Software-Defined Networks

Uber Knowledge Graph (event info)
Express-Cassandra - Cassandra ORM/ODM/OGM for Node.js with optional support for Elassandra & JanusGraph

Windup by RedHat - application migration and assessment tool (website)

Users

The following users have deployed JanusGraph in production.

CELUM
Crédit Agricole CIB - use case
eBay - video
FiNC
G DATA - blog post series about malware analysis use case
Netflix - video and slides (graph discussion starts at #86)
Qihoo 360 (about)
Red Hat - application migration and assessment tool built on Windup
Times Internet
Uber

janusgraph's Issues

Cassandra 3 Support

I am interested in using JanusGraph with Cassandra 3.x. Are there any steps I need to perform to make JanusGraph work with Cassandra 3.x? Are there any special build instructions?

Cassandra node down - Broken pipe not recovering connection

I noticed that when my Cassandra node is down or a network issue occurs, Titan (now JanusGraph) does not recover the connection. From the stack trace below it appears to be a Thrift issue, but with some try/catch handling we could attempt to reconnect once the network is back up.

2017-01-23 11:28:38.047  WARN 51233 --- [pool-4-thread-1] c.t.titan.diskstorage.log.kcvs.KCVSLog   : Could not read messages for timestamp [2017-01-21T16:20:00Z] (this read will be retried)

com.thinkaurelius.titan.core.TitanException: Could not execute operation due to backend exception
	at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:44) ~[titan-core-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:144) ~[titan-core-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:703) ~[titan-core-1.0.0.jar:na]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_25]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_25]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_25]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_25]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_25]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_25]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_25]
Caused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Permanent failure in storage backend
	at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.convertException(CassandraThriftKeyColumnValueStore.java:249) ~[titan-cassandra-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.getNamesSlice(CassandraThriftKeyColumnValueStore.java:148) ~[titan-cassandra-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.getNamesSlice(CassandraThriftKeyColumnValueStore.java:91) ~[titan-cassandra-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.getSlice(CassandraThriftKeyColumnValueStore.java:80) ~[titan-cassandra-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:769) ~[titan-core-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.log.kcvs.KCVSLog$MessagePuller$1.call(KCVSLog.java:766) ~[titan-core-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:133) ~[titan-core-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.util.BackendOperation$1.call(BackendOperation.java:147) ~[titan-core-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:56) ~[titan-core-1.0.0.jar:na]
	at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:42) ~[titan-core-1.0.0.jar:na]
	... 9 common frames omitted
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
	at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147) ~[libthrift-0.9.2.jar:0.9.2]
	at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156) ~[libthrift-0.9.2.jar:0.9.2]
	at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65) ~[libthrift-0.9.2.jar:0.9.2]
	at org.apache.cassandra.thrift.Cassandra$Client.send_multiget_slice(Cassandra.java:735) ~[cassandra-thrift-2.1.9.jar:2.1.9]
	at org.apache.cassandra.thrift.Cassandra$Client.multiget_slice(Cassandra.java:724) ~[cassandra-thrift-2.1.9.jar:2.1.9]
	at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.getNamesSlice(CassandraThriftKeyColumnValueStore.java:129) ~[titan-cassandra-1.0.0.jar:na]
	... 17 common frames omitted
Caused by: java.net.SocketException: Broken pipe
	at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_25]
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_25]
	at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_25]
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_25]
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126) ~[na:1.8.0_25]
	at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145) ~[libthrift-0.9.2.jar:0.9.2]
	... 22 common frames omitted
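
The reconnect idea mentioned above can be sketched as a generic retry loop with exponential backoff. This is only an illustration under stated assumptions: `Retry` and `withBackoff` are hypothetical names, not Titan or JanusGraph APIs, and a real fix would also need the retried operation to open a fresh connection rather than reuse the broken socket.

```java
import java.util.concurrent.Callable;

// Hypothetical helper; not part of Titan or JanusGraph.
final class Retry {
    // Runs the operation, retrying with exponential backoff when it throws.
    static <T> T withBackoff(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (InterruptedException e) {
                // Never swallow interrupts: restore the flag and stop retrying.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2; // back off before the next attempt
                }
            }
        }
        throw last; // all attempts failed; surface the most recent error
    }
}
```

A caller would wrap the backend read in a lambda, for example `Retry.withBackoff(() -> readSlice(), 5, 100L)`, where `readSlice` is a placeholder for the failing call.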

Add support for fuzz testing

Per Wikipedia:

Fuzzing or fuzz testing is a software testing technique, often automated or semi-automated, that involves providing invalid, unexpected, or random data to the inputs of a computer program. The program is then monitored for exceptions such as crashes, or failing built-in code assertions or for finding potential memory leaks. Fuzzing is a form of random testing commonly used to test for security problems in software or computer systems.

Fuzz testing helps find hard-to-trigger bugs with less manual effort than via hand-written test cases. There are various open-source tools for fuzz testing, but it's possible that we may need to write a custom tool that will provide randomly-generated but valid input for testing JanusGraph.
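
As a rough illustration of the idea, the sketch below feeds random strings to a small parser and counts any exception other than the documented rejection. `KeyValueParser` and `SimpleFuzzer` are hypothetical stand-ins, not JanusGraph classes; a real fuzzer for JanusGraph would need to generate valid graph operations instead of raw strings.

```java
import java.util.Random;

// Hypothetical code under test: splits "key=value" strings.
final class KeyValueParser {
    static String[] parse(String input) {
        if (input == null) {
            throw new IllegalArgumentException("input is null");
        }
        int idx = input.indexOf('=');
        if (idx < 0) {
            throw new IllegalArgumentException("missing '='");
        }
        return new String[] { input.substring(0, idx), input.substring(idx + 1) };
    }
}

// Hypothetical minimal fuzzer; returns the number of unexpected failures.
final class SimpleFuzzer {
    static int fuzz(long seed, int iterations) {
        Random random = new Random(seed); // fixed seed keeps failures reproducible
        int unexpectedFailures = 0;
        for (int i = 0; i < iterations; i++) {
            String input = randomString(random, 16);
            try {
                KeyValueParser.parse(input);
            } catch (IllegalArgumentException expected) {
                // Documented rejection of malformed input; not a bug.
            } catch (RuntimeException unexpected) {
                unexpectedFailures++;
                System.err.println("seed=" + seed + " iteration=" + i + ": " + unexpected);
            }
        }
        return unexpectedFailures;
    }

    // Builds a string of random length with ASCII characters, including control characters.
    static String randomString(Random random, int maxLength) {
        int length = random.nextInt(maxLength + 1);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) random.nextInt(128));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("unexpected failures: " + fuzz(42L, 10_000));
    }
}
```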

Fix double-checked locking

Fortify - Double checked locking does not work
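
For context, the classic form of this idiom is unsafe because, without `volatile`, another thread may observe a non-null reference to an object whose constructor has not finished. The sketch below shows one common fix on Java 5 and later; `LazyConfig` is a hypothetical class used only for illustration and is not the JanusGraph code in question.

```java
// Hypothetical example class; not taken from JanusGraph.
final class LazyConfig {
    // 'volatile' guarantees that a thread seeing a non-null reference
    // also sees the fully constructed object.
    private static volatile LazyConfig instance;

    private final String name;

    private LazyConfig(String name) {
        this.name = name;
    }

    static LazyConfig getInstance() {
        LazyConfig local = instance; // one volatile read on the fast path
        if (local == null) {
            synchronized (LazyConfig.class) {
                local = instance; // re-check while holding the lock
                if (local == null) {
                    local = new LazyConfig("default");
                    instance = local;
                }
            }
        }
        return local;
    }

    String name() {
        return name;
    }
}
```

Where lazy initialization of a static singleton is all that is needed, a nested holder class or an enum avoids explicit locking altogether.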

Occurs in three places:

Use static analysis tools to find bugs early

We should use static analysis tools to find and avoid bugs at compile time, before they become issues at runtime, when they are much harder (and hence costlier) to find and fix.

These can be tools we run offline ourselves, such as

SpotBugs (formerly known as "FindBugs")
ErrorProne
NullAway – issue #807

or online services such as

Coverity – PR #59

Wikipedia has a list of static analysis tools for Java.

Replace uses of Whirr with supported tooling

Apache Whirr has been retired; however, we have several uses of Whirr in the docs:

These should all be removed and replaced with a modern, supported deployment tool, of which there are many (e.g., Chef, Ansible, Puppet, Salt, Terraform, etc.).

Rename Titan* class names to JanusGraph

As per the issue title, our fork needs to rename classes to remove the Titan name. JanusGraph is the trademark, so use it everywhere except in cases where 'Graph' would be duplicated sequentially.

Add developer contribution guideline docs

Document the JanusGraph development process including:

policy for commits
design docs and project decision making
release policy

Update the Version Compatibility Matrix

http://docs.janusgraph.org/0.1.0-SNAPSHOT/version-compat.html

Update the support matrix after the recent upgrades of the components.
We can wait until the major upgrades are all in before updating this doc for the first release.

Upgrade to BerkeleyDB JE 7.3.7

Oracle Berkeley DB Java Edition, 12c Release 1
Library 12.1.7.3, Version 7.3.7, 2017-02-01 03:44:57 UTC

Release notes

As of version 7.3, JE is licensed under the Apache 2.0 license. See the LICENSE file for the complete license.

w00t

Change log

In JE 7.3 the on-disk file format moved to 14. The file format change is forward compatible in that JE files created with earlier releases can be read when opened with JE 7.3 or later. The change is not backward compatible in that files created with JE 7.3 or later cannot be read by earlier releases. After an existing environment is opened read/write using JE 7.3, the environment can no longer be read by earlier releases.

My initial testing has been successful. Reading in a BerkeleyJE graph with the old version worked fine. Updating the version in janusgraph-berkeleyje/pom.xml didn't introduce any new dependencies.

Add code coverage

We should use a service such as Coveralls, Codecov, or Circle CI's built-in support to track code coverage by tests.

Code coverage is an imperfect metric of test quality (necessary, but not sufficient), but it's an additional useful signal to keep in mind when assessing the quality of the overall test suite.

Python 3.5 support for querying a JanusGraph graph?

Lovely project that will hopefully carry on the great work Titan has started. However, are you guys planning on releasing Python drivers to make it possible to query a JanusGraph graph intuitively?

Mitigate path manipulation risks

Fortify - High - Require absolute paths and validate file / path inputs
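
One way to validate path inputs is to resolve them against a fixed base directory and reject anything that ends up outside it. The sketch below assumes that policy; `SafePaths` and `resolveInside` are hypothetical names, not JanusGraph code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of JanusGraph.
final class SafePaths {
    // Resolves userInput under baseDir and rejects results that escape baseDir.
    static Path resolveInside(Path baseDir, String userInput) {
        Path base = baseDir.toAbsolutePath().normalize();
        // normalize() collapses "." and ".." segments before the containment check.
        Path resolved = base.resolve(userInput).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + userInput);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path base = Paths.get("data");
        System.out.println(resolveInside(base, "conf/graph.properties"));
    }
}
```

This check is purely lexical and does not follow symbolic links; if links are a concern, `Path.toRealPath()` can be applied to existing files before comparing.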

Resource leak vulnerabilities

Fortify - should close streams
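
A typical remedy is try-with-resources, which closes the stream on every exit path, including when an exception is thrown. The sketch below is illustrative only; `FirstLine` is a hypothetical class, not one of the locations flagged by Fortify.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example; not taken from JanusGraph.
final class FirstLine {
    // The reader (and the wrapped source) is closed automatically when the block exits.
    static String readFirstLine(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readFirstLine(new StringReader("first\nsecond")));
    }
}
```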

Add support for mutation testing

Mutation testing can uncover ineffective tests or uncovered conditions without significant manual effort by mutating source code before running the tests and seeing if they still pass.

Per Wikipedia:

Mutation testing (or Mutation analysis or Program mutation) is used to design new software tests and evaluate the quality of existing software tests. Mutation testing involves modifying a program in small ways. Each mutated version is called a mutant and tests detect and reject mutants by causing the behavior of the original version to differ from the mutant. This is called killing the mutant. Test suites are measured by the percentage of mutants that they kill. New tests can be designed to kill additional mutants. Mutants are based on well-defined mutation operators that either mimic typical programming errors (such as using the wrong operator or variable name) or force the creation of valuable tests (such as dividing each expression by zero). The purpose is to help the tester develop effective tests or locate weaknesses in the test data used for the program or in sections of the code that are seldom or never accessed during execution. Mutation testing is a form of white-box testing.

There is a list of mutation testing tools at the bottom of that article.

Here are a few examples of running various mutation testing tools for Java projects.
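
To make "killing a mutant" concrete, here is a hand-written sketch; it is not output from any mutation testing tool, and all names are hypothetical. The mutant changes `>=` to `>`. A test that checks only values far from the boundary passes for both versions, so the mutant survives; a test at the boundary fails for the mutant and kills it.

```java
import java.util.function.IntPredicate;

// Hypothetical production code and a manually created mutant.
final class Discount {
    // Original: totals of 100 or more qualify.
    static boolean qualifies(int total) {
        return total >= 100;
    }

    // Mutant: the boundary operator >= was replaced with >.
    static boolean qualifiesMutant(int total) {
        return total > 100;
    }
}

final class MutationDemo {
    // Weak test: only checks values far from the boundary.
    static boolean weakTestPasses(IntPredicate impl) {
        return impl.test(150) && !impl.test(50);
    }

    // Boundary test: checks the exact threshold.
    static boolean boundaryTestPasses(IntPredicate impl) {
        return impl.test(100);
    }

    public static void main(String[] args) {
        System.out.println("weak test vs mutant passes: "
                + weakTestPasses(Discount::qualifiesMutant));
        System.out.println("boundary test vs mutant passes: "
                + boundaryTestPasses(Discount::qualifiesMutant));
    }
}
```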

Categorize tests by runtime

The full test suite has a timeout of 21600 sec (6 hours) in the pom.xml, which far exceeds Travis' 50 min timeout.

In practice, even when following the TESTING.md category designations and running the different groups of tests (OrderedKeyStoreTests, UnorderedKeyStoreTests and default) in separate jobs, each job still times out.

Tests vary dramatically in runtimes: some tests run in < 1 sec, while others take > 300 sec.

One idea is to create categories such as SmallTests, MediumTests, LargeTests (see blog post) or even simply FastTests vs. SlowTests and use JUnit Category annotations for grouping, and then select subsets of tests for different jobs within a single Travis run, so we can get faster feedback on a change during a code review, even if other parts of the test suite take longer to verify.

Fix access specifier manipulation in Hex.java

Fortify says we should not modify accessibility:

Hex.java

Failure seen while porting JanusGraph on ppc64le

Hi All,

I am trying to run the JanusGraph build and automated tests on Ubuntu 16.10 and RHEL 7.3 ppc64le VMs. For the module janusgraph-hadoop-parent, the test case verify-janusgraph-cassandra-test fails on both distros.

The test case summary file (/janusgraph/janusgraph-hadoop-parent/janusgraph-hadoop-1/target/failsafe-reports/failsafe-janusgraph-cassandra.xml) shows errors in 21 tests.

Further analysis showed that 7 test cases failed with IncompatibleClassChangeError and 3 test cases failed with IllegalStateException.

Analysis of IncompatibleClassChangeError:
java.lang.IncompatibleClassChangeError: Implementing class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.cassandra.hadoop.HadoopCompat.(HadoopCompat.java:71)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:120)
at org.janusgraph.hadoop.formats.cassandra.CassandraBinaryInputFormat.getSplits(CassandraBinaryInputFormat.java:62)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at org.janusgraph.hadoop.scan.HadoopScanRunner.runJob(HadoopScanRunner.java:136)
at org.janusgraph.hadoop.MapReduceIndexManagement.updateIndex(MapReduceIndexManagement.java:186)
at org.janusgraph.hadoop.AbstractIndexManagementIT.testRemoveGraphIndex(AbstractIndexManagementIT.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)

Analysis on IllegalStateException
java.lang.IllegalStateException: java.lang.ExceptionInInitializerError
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at org.apache.tinkerpop.gremlin.process.computer.traversal.step.map.ComputerResultStep.processNextStart(ComputerResultStep.java:80)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:126)
at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:37)
at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:157)
at org.janusgraph.hadoop.CassandraInputFormatIT.testReadGraphOfTheGods(CassandraInputFormatIT.java:55)
Caused by: java.lang.ExceptionInInitializerError: null
at org.apache.spark.storage.DiskBlockManager.addShutdownHook(DiskBlockManager.scala:147)
at org.apache.spark.storage.DiskBlockManager.(DiskBlockManager.scala:54)
at org.apache.spark.storage.BlockManager.(BlockManager.scala:75)
at org.apache.spark.storage.BlockManager.(BlockManager.scala:173)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:347)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:194)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.(SparkContext.scala:450)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2258)
at org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
at org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
at org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$28(SparkGraphComputer.java:138)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchFieldException: SHUTDOWN_HOOK_PRIORITY
at java.lang.Class.getField(Class.java:1703)
at org.apache.spark.util.SparkShutdownHookManager.install(ShutdownHookManager.scala:220)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks$lzycompute(ShutdownHookManager.scala:50)
at org.apache.spark.util.ShutdownHookManager$.shutdownHooks(ShutdownHookManager.scala:48)
at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:189)
at org.apache.spark.util.ShutdownHookManager$.(ShutdownHookManager.scala:58)
at org.apache.spark.util.ShutdownHookManager$.(ShutdownHookManager.scala)
at org.apache.spark.storage.DiskBlockManager.addShutdownHook(DiskBlockManager.scala:147)
at org.apache.spark.storage.DiskBlockManager.(DiskBlockManager.scala:54)
at org.apache.spark.storage.BlockManager.(BlockManager.scala:75)
at org.apache.spark.storage.BlockManager.(BlockManager.scala:173)

Steps I followed were:
git clone https://github.com/JanusGraph/janusgraph && cd janusgraph
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
mvn install
I am doing this build on Ubuntu 16.04 and RHEL 7.3 ppc64le. Please help me resolve this issue.

Thanks,
Archa

Improve support for traversal interrupts

Adding issue as requested in #78 (comment):

Traversal interrupts in at least the HBase backend currently lead to a JanusGraphException rather than a TraversalInterruptException during read in BackendTransaction. Goal here would be to refactor to support throwing and propagating a TraversalInterruptException directly.

Remove hadoop 1 support

Do we still need to keep the hadoop 1 profile/module?

Remove HBase 0.96 support

HBase 0.96 is an old release and has reached end of service. Let's deprecate its support.

Hadoop InputFormat tests should check property values

Adding issue as requested in #81 (comment):

CassandraInputFormatIT and HBaseInputFormatIT tests (via AbstractInputFormat in #81) check property counts but should also check property values.

No Releases Available yet

0.1.0-SNAPSHOT is listed in documentation, but is not in releases in github yet.

http://docs.janusgraph.org/0.1.0-SNAPSHOT/getting-started.html

Add license headers to all files

We need to add a license header to all source code (mostly *.java but also *.sh, etc.) as follows, and replace existing ones, if they don't match it:

// Copyright 2017 JanusGraph Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

This depends on issue #13 since we first need to have a list of authors (aka copyright holders) before we can use the name "JanusGraph Authors".

Use string constants for Elasticsearch config properties

Adding issue as requested in #79 (comment):

ElasticSearchIndex and ElasticSearchSetup use configuration properties as hardcoded strings. These properties should be collected and used as string constants.

Upgrade Apache Commons Collections to v3.2.2

Version 3.2.1 has a CVSS 10.0 vulnerability. That's the worst kind of
vulnerability that exists. By merely existing on the classpath, this
library causes the Java serialization parser for the entire JVM process
to go from being a state machine to a Turing machine. A Turing machine
with an exec() function!

https://commons.apache.org/proper/commons-collections/security-reports.html
http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-and-your-application-have-in-common-this-vulnerability/

thinkaurelius/titan#1277

Support Apache TinkerPop 3.2

3.2.3 is latest at time of this writing.

Add timeouts to test classes and methods

The entire test suite today takes an extremely long time; while there appears to be a global timeout of 21600 seconds (6 hours), running this command:

$ mvn clean install surefire:test --fail-never

may not terminate (and hasn't in over two days). None of the currently defined test categories finishes within the 50-minute timeout that Travis sets for tests.

We should add timeouts on a per-test basis via:

@Test(timeout=1000)  // in milliseconds
public void testWithTimeout() { ... }

or a single global timeout that will be applied to all test methods individually:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.Timeout;

public class HasGlobalTimeout {
    public static String log;
    private final CountDownLatch latch = new CountDownLatch(1);

    @Rule
    public Timeout globalTimeout = Timeout.seconds(10); // 10 seconds max per method tested

    @Test
    public void testSleepForTooLong() throws Exception {
        log += "ran1";
        TimeUnit.SECONDS.sleep(100); // sleep for 100 seconds
    }

    @Test
    public void testBlockForever() throws Exception {
        log += "ran2";
        latch.await(); // blocks until the timeout rule interrupts the test
    }
}

Remove HBase 0.94 support

HBase 0.94 is an old release. Let's deprecate its support.

Support HBase 1.2

HBase's current stable release is 1.2.4.
https://archive.apache.org/dist/hbase/stable/

Let's add support for HBase 1.2.x line.

Various Gremlin query patterns throw NPE when query.batch=true

These queries fail when running with query.batch=true.

Here are a few simple examples:

g.V().has('id', 1).emit().repeat(out('knows'))
g.V().match(__.as('a').out().as('b'))

Right now the TinkerPop test suite is not executed with query.batch enabled so I believe this should be enabled first under this issue. With that running, we'll have a better idea of the full scope of query.batch issues and can address the problem.

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-cassandra.properties')
==>standardjanusgraph[cassandrathrift:[127.0.0.1]]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> v1 = graph.addVertex('id', 1)
==>v[4240]
gremlin> v2 = graph.addVertex('id', 2)
==>v[4096]
gremlin> v1.addEdge('knows', v2)
==>e[176-39s-1lh-35s][4240-knows->4096]
gremlin> g.tx().commit()
==>null
gremlin> g.V().has('id', 1).emit().repeat(out('knows'))
16:24:29 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(id = 1)]. For better performance, use indexes
==>v[4240]
java.lang.NullPointerException
Display stack trace? [yN] y
java.lang.NullPointerException
	at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphVertexStep.flatMap(JanusGraphVertexStep.java:109)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:47)
	at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphVertexStep.processNextStart(JanusGraphVertexStep.java:102)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:54)
	at org.apache.tinkerpop.gremlin.process.traversal.step.branch.RepeatStep$RepeatEndStep.standardAlgorithm(RepeatStep.java:251)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:47)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
	at org.apache.tinkerpop.gremlin.process.traversal.step.branch.RepeatStep.standardAlgorithm(RepeatStep.java:162)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:47)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:147)
	at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:182)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)

gremlin> g.V().match(__.as('a').out().as('b'))
16:28:46 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>[a:v[4240], b:v[4096]]
java.lang.NullPointerException
Display stack trace? [yN] y
java.lang.NullPointerException
	at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphVertexStep.flatMap(JanusGraphVertexStep.java:109)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.FlatMapStep.processNextStart(FlatMapStep.java:47)
	at org.janusgraph.graphdb.tinkerpop.optimize.JanusGraphVertexStep.processNextStart(JanusGraphVertexStep.java:102)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ExpandableStepIterator.next(ExpandableStepIterator.java:54)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.MatchStep$MatchEndStep.processNextStart(MatchStep.java:460)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:147)
	at org.apache.tinkerpop.gremlin.process.traversal.step.map.MatchStep.standardAlgorithm(MatchStep.java:313)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.ComputerAwareStep.processNextStart(ComputerAwareStep.java:47)
	at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.hasNext(AbstractStep.java:140)
	at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.hasNext(DefaultTraversal.java:147)
	at org.apache.tinkerpop.gremlin.console.Console$_closure3.doCall(Console.groovy:182)

Make snapshot builds available

https://twitter.com/funkatron/status/819972558871293952

Making snapshot distribution zips available before the first official release will lower the bar for people wanting to try out JanusGraph.

Add list of authors and contributors

We need to keep and maintain a list of all authors (copyright holders, which may be individuals or corporations) and contributors (individuals who actually did the work).

For an example, see Go authors and contributors. Some projects conflate the two categories, which makes it unclear who owns the copyright vs. who contributed effort to the project.

We need to be precise in this regard, because our copyright statement will be something akin to:

Copyright 2017 JanusGraph Authors

so we need to list the authors and not just contributors, who may have signed away their copyright to someone else or an organization.

Note, however, that the Go project re-uses the list of contributors also as a list of folks who are eligible to contribute due to having signed the CLA, but not who specifically contributed. We will not be making the list of CLA signatories public, but only the list of folks who actually contributed to the project.

Fix NPE vulnerabilities

Fortify - High - can crash the program

Add a guide for contributors

We should add a guide for how to contribute to the project, including:

how / where to sign the ICLA / CCLA
how to branch and where to push branches for review (personal repo vs. main repo)
whether or not to squash / rebase / etc.
how to review PRs (e.g., watching for cla: yes vs. cla: no labels, test results, etc.)

Fix log forging vulnerabilities

Fortify - validate input from main, or don't use input from main in log statements

ConfigurationLint.java - Fixed in #469 (verify)
GraphDatabaseConfiguration.java
GraphDatabaseConfiguration.java
JanusGraphFactory - Fixed in #469 (verify)
JanusGraphFactory - Fixed in #469 (verify)
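
A minimal sketch of the "validate input" option, assuming the forged entries come from line breaks embedded in command-line arguments. The class and method names (`LogSanitizer`, `sanitize`) are hypothetical and are not the fix applied in #469; the idea is only to neutralize CR/LF before the value reaches a log statement.

```java
import java.util.logging.Logger;

final class LogSanitizer {
    private static final Logger LOG = Logger.getLogger(LogSanitizer.class.getName());

    // Replace carriage returns and line feeds so that untrusted input
    // cannot start a new, forged log line. (Hypothetical helper.)
    static String sanitize(String input) {
        if (input == null) {
            return "null";
        }
        return input.replaceAll("[\\r\\n]", "_");
    }

    public static void main(String[] args) {
        // Simulates an argument passed to main that tries to forge a log entry.
        String untrusted = args.length > 0 ? args[0] : "conf.properties\nINFO: forged entry";
        LOG.info("Loading configuration from " + sanitize(untrusted));
    }
}
```

The alternative mentioned above, not logging input from main at all, avoids the problem without any sanitizing step.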

calls to notify - should we keep track of the thread to call notify on?

Fortify low

Fix insecure randomness

Fortify - Use SecureRandom instead of Random
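
A minimal sketch of the suggested swap, assuming the random values are used somewhere security-sensitive (for example, generating identifiers or tokens). The class and method names (`SecureTokenExample`, `newToken`) are hypothetical and do not refer to existing JanusGraph code.

```java
import java.security.SecureRandom;
import java.util.Base64;

final class SecureTokenExample {
    // SecureRandom is thread-safe, so a single shared instance is enough.
    private static final SecureRandom RNG = new SecureRandom();

    // Returns numBytes of cryptographically strong randomness,
    // encoded as URL-safe Base64 without padding.
    static String newToken(int numBytes) {
        byte[] bytes = new byte[numBytes];
        RNG.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(newToken(16));
    }
}
```

Since `SecureRandom` extends `java.util.Random`, call sites that only use methods such as `nextInt()` or `nextBytes()` can usually keep their code unchanged once the field type or constructor call is replaced.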

Update ElasticSearch dependency

JanusGraph is officially compatible only with ElasticSearch 1.5.x, a release from late 2014 that is no longer supported (EOL was September 2016; see https://www.elastic.co/support/eol).

http://docs.janusgraph.org/0.1.0-SNAPSHOT/version-compat.html

I am running Titan in production with ES 1.7.5 with no issues, but that just hit EOL as well. The 2.0 branch is still maintained, and ES recently jumped to 5.x releases. There are security concerns with the older, unmaintained branches.

Related: Titan requires the dynamic scripting feature of ES, which is off by default in newer versions. I do not know how strict of a requirement this is, but it would be nice to not rely on it if possible.
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-security.html

Looks like we are missing modes-rexster.png file?

In the document page for HBase:

http://docs.janusgraph.org/0.1.0-SNAPSHOT/hbase.html

There is a broken link to an image:

http://docs.janusgraph.org/0.1.0-SNAPSHOT/images/modes-rexster.png

New Cassandra backend using CQL w/ prepared statements

Thrift is deprecated and will be removed in Cassandra 4.0.

For perf reasons, avoid IN() clauses for selects, prefer concurrent async queries.
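
A rough sketch of the fan-out pattern, using only `java.util.concurrent` so it runs without a Cassandra driver. `fetchAll` and the `fetchOne` function are hypothetical; in a real backend `fetchOne` would bind one key to a prepared single-partition SELECT and execute it asynchronously, which is an assumption about the eventual design rather than existing code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AsyncFanOut {
    // Issues one asynchronous query per key instead of a single IN() query,
    // then waits for all of them and returns the results in key order.
    static <K, V> List<V> fetchAll(List<K> keys, Function<K, CompletableFuture<V>> fetchOne) {
        List<CompletableFuture<V>> futures = keys.stream()
                .map(fetchOne)
                .collect(Collectors.toList());
        // Block until every per-key query has completed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for an async driver call; a real fetchOne would execute
        // a bound prepared statement for the given key.
        List<String> rows = fetchAll(
                Arrays.asList("k1", "k2", "k3"),
                key -> CompletableFuture.supplyAsync(() -> "row-for-" + key));
        System.out.println(rows);
    }
}
```

Each per-key query can be routed to a replica that owns that key, whereas a multi-partition IN() makes one coordinator gather everything.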

adding vertex with custom id?

When calling g.addV(), there is no way to specify the id. The id is auto-generated.

Is it possible to specify the id directly? We have a UUID for all entities. Avoiding another level of indirection would save the extra index on the UUID.

Address DNS vulnerabilities

Fortify - High - DNS can be spoofed - should pass in IP addresses instead.

Cassandra keyspace (storage.cassandra.keyspace) and (titan and titan-version)

I am trying to migrate my graph to JanusGraph. The default Cassandra keyspace changed from titan to janusgraph, which makes sense, so I set the property storage.cassandra.keyspace to titan to access my existing data. Now it complains about janusgraph-version.
Some debugging showed me that it works if I also provide "titan-version", but I don't have that option as of now. What should I do? Do I have to migrate (please don't tell me that) to a new JanusGraph graph?

It works if I replace the following line https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-core/src/main/java/org/janusgraph/graphdb/configuration/GraphDatabaseConfiguration.java#L1384
with:

globalWrite.get(new ConfigOption<String>(GRAPH_NS,"titan-version",
            "The version of JanusGraph with which this database was created. Automatically set on first start. Don't manually set this property.",
            ConfigOption.Type.FIXED, String.class).hide())

This is the error I'm getting with the keyspace set to titan.

Caused by: java.lang.IllegalStateException: Need to set configuration value: root.graph.janusgraph-version
	at com.google.common.base.Preconditions.checkState(Preconditions.java:197) ~[guava-18.0.jar:na]
	at org.janusgraph.diskstorage.configuration.ConfigOption.get(ConfigOption.java:230) ~[janusgraph-core-0.1.0-SNAPSHOT.jar:na]
	at org.janusgraph.diskstorage.configuration.BasicConfiguration.get(BasicConfiguration.java:70) ~[janusgraph-core-0.1.0-SNAPSHOT.jar:na]
	at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1384) ~[janusgraph-core-0.1.0-SNAPSHOT.jar:na]
	at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:108) ~[janusgraph-core-0.1.0-SNAPSHOT.jar:na]
	at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:98) ~[janusgraph-core-0.1.0-SNAPSHOT.jar:na]
	at org.janusgraph.core.JanusGraphFactory$Builder.open(JanusGraphFactory.java:153) ~[janusgraph-core-0.1.0-SNAPSHOT.jar:na]

Without the keyspace set to titan, the graph just appears empty, for obvious reasons.

Publish snapshot artifacts somewhere with source

... so that people can build on janus while we incubate and be able to debug with source code.

Update documentation

There are several places where documentation appears:

top-level of the repo: UPGRADE.asc, CHANGELOG.asc, etc.
in the docs/ directory
in the janusgraph-docs/ module

These should be changed as follows:

CHANGELOG.asc should be emptied out, and we should start from scratch with a section on "Changes from Titan"
UPGRADE.asc should either be removed or emptied out or include the section on "Upgrading from Titan" which would essentially duplicate "Changes from Titan"
TESTING.md file actually uses the string github.com/thinkaurelius/titan/wiki which needs to be replaced separately with a pointer to documentation or removed entirely
Documentation should be updated once the licensing issues are finalized

This issue is a carryover from PR #8.

Enable automated testing on Travis CI

Right now, our Travis config only runs a build, but no tests, because the full suite times out: Travis CI build/test jobs time out after 50 minutes, and the test suite requires far longer than that.

Note that Travis CI has a concept of a Build which consists of one or more Jobs: each Job has a 50 minute timeout, but there's no limit (to my knowledge) on the number of Jobs that can be included in a single Build, so in theory, we can split up the test suite as needed to fit into the 50 minute timeout.

The goal of this issue is to enable automated test runs for each PR and on the master branch on Travis CI. To accomplish this, we may also need to address one or more of the following issues:

[COMMITTERS] Register with Sonatype

In order for committers to push artifacts to maven central as part of the release process, each must have a Sonatype Nexus account with permissions for the "org.janusgraph" group ID.

I have a Nexus account and can set this up, but it would be best if I can set up the "org.janusgraph" users in bulk.

Anyone who is interested in acting as a release manager for JanusGraph should register for a JIRA account at the following URL:

https://issues.sonatype.org/secure/Dashboard.jspa

Once you have registered, respond to this issue with your JIRA ID. Once we have a list of IDs, I will proceed with setting up the Nexus account.

ScyllaDB support for Thrift and CQL backends

Is it possible to get some links to, or add documentation on, storing JanusGraph data in ScyllaDB?

If it is already in the works... no issues....

migrate es to HTTP client and not TransportClient (on deprecation path)

At some point in the future, support for the Transport Client (native Java client) will be removed.
To mitigate, we need to use the HTTP API.
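
As a rough illustration of what talking to Elasticsearch over HTTP looks like, here is a sketch that builds a `_search` request with the JDK's `java.net.http` client (Java 11+). The class name, the `buildSearchRequest` helper, the host `localhost:9200` and the index name `janusgraph` are all assumptions for the example; an actual migration would more likely use a dedicated REST client library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class EsHttpSearch {
    // Builds a POST to <baseUrl>/<index>/_search carrying a JSON query body.
    // (Hypothetical helper; not part of JanusGraph.)
    static HttpRequest buildSearchRequest(String baseUrl, String index, String jsonQuery) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonQuery))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildSearchRequest(
                "http://localhost:9200",            // assumed local node
                "janusgraph",                       // assumed index name
                "{\"query\":{\"match_all\":{}}}");
        // Sending it requires a running node, e.g.:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(request.method() + " " + request.uri());
    }
}
```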