cayley's Introduction

Cayley is an open-source database for Linked Data. It is inspired by the graph database behind Google's Knowledge Graph (formerly Freebase).

Features

  • Built-in query editor, visualizer and REPL
  • Multiple query languages
  • Modular: easy to connect to your favorite programming languages and back-end stores
  • Production ready: well tested and used by various companies for their production workloads
  • Fast: optimized specifically for usage in applications

Performance

Rough performance testing shows that, on 2014 consumer hardware and an average disk, 134m quads in LevelDB is no problem and a multi-hop intersection query -- films starring X and Y -- takes ~150ms.

Community

cayley's People

Contributors

andrew-d, barakmich, bcleenders, brendanball, dennwc, derekrliang, dsymonds, dwhitena, h4ck3rm1k3, iddan, jf87, joostverdoorn, josephschorr, jtorvald, jzelinskie, kortschak, lalbertalli, mbrukman, mguentner, mikaelcabot, neonstalwart, oren, panamafrancis, pbnjay, quentin-m, robertmeta, sayden, schmichael, tmlbl, yannic

cayley's Issues

Run golint and deadcode and consider cleaning

golint . | wc -l prints 534. 191 of these are not due to lack of documentation and can be cleaned relatively easily. The rest probably deserves a "needs documentation" issue.

find -type d -exec deadcode '{}' \; identifies two non-codegen instances of unused code. These are also easily cleanable (one - cpuprofile is the subject of a PR, the other is graph/iterator/fixed_iterator.go:53:1: newFixed).

panic: runtime error: index out of range

drr@minty ~/git/cayley $ ./cayley repl --dbpath=testdata.nt
cayley> graph.Vertex("dani").Is()

=> [internal Iterator]
-----------
1 Results
Elapsed time: 1.824459 ms

cayley> graph.Vertex("dani").Is().All()

panic: runtime error: index out of range [recovered]
    panic: runtime error: index out of range [recovered]
    panic: runtime error: index out of range

goroutine 9 [running]:
runtime.panic(0x9631c0, 0x110f677)
    /usr/lib/go/src/pkg/runtime/panic.c:266 +0xb6
gremlin.func·009()
    /home/drr/git/cayley/src/gremlin/gremlin-session.go:120 +0xe5
runtime.panic(0x9631c0, 0x110f677)
    /usr/lib/go/src/pkg/runtime/panic.c:248 +0x106
github.com/robertkrimen/otto.func·024()
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/error.go:240 +0x43d
runtime.panic(0x9631c0, 0x110f677)
    /usr/lib/go/src/pkg/runtime/panic.c:248 +0x106
graph.(*FixedIterator).DebugString(0xc2100c3690, 0x4, 0x0, 0x0)
    /home/drr/git/cayley/src/graph/fixed-iterator.go:92 +0x1e6
graph.(*AndIterator).DebugString(0xc21005e480, 0x0, 0xc21005e480, 0x851301)
    /home/drr/git/cayley/src/graph/and-iterator.go:117 +0x50c
gremlin.runIteratorOnSession(0x7faed00052d8, 0xc21005e480, 0xc2100d0000)
    /home/drr/git/cayley/src/gremlin/gremlin-finals.go:244 +0xd3
gremlin.func·004(0xc21006e6e0, 0x0, 0x0, 0x5, 0x9ee9e0, ...)
    /home/drr/git/cayley/src/gremlin/gremlin-finals.go:42 +0xc3
github.com/robertkrimen/otto.(*_object).call(0xc210195ea0, 0x5, 0x9ee9e0, 0xc210195180, 0x11211b0, ...)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/type_function.go:140 +0x415
github.com/robertkrimen/otto.(*_runtime).cmpl_evaluate_nodeCallExpression(0xc21006e6e0, 0xc210184c90, 0x0, 0x0, 0x0, ...)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/cmpl_evaluate_expression.go:240 +0x78e
github.com/robertkrimen/otto.(*_runtime).cmpl_evaluate_nodeExpression(0xc21006e6e0, 0x7faed0000fe8, 0xc210184c90, 0xc21014ebb0, 0x9ee901, ...)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/cmpl_evaluate_expression.go:44 +0x7f8
github.com/robertkrimen/otto.(*_runtime).cmpl_evaluate_nodeStatement(0xc21006e6e0, 0x7faed0000d90, 0xc21014ebb0, 0x0, 0x0, ...)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/cmpl_evaluate_statement.go:49 +0xd76
github.com/robertkrimen/otto.(*_runtime).cmpl_evaluate_nodeStatementList(0xc21006e6e0, 0xc21014eba0, 0x1, 0x1, 0x7faecfe88d40, ...)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/cmpl_evaluate_statement.go:108 +0x95
github.com/robertkrimen/otto.(*_runtime).cmpl_evaluate_nodeProgram(0xc21006e6e0, 0xc2100c3000, 0x990400, 0x0, 0x0, ...)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/cmpl_evaluate.go:17 +0x113
github.com/robertkrimen/otto.func·040()
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/runtime.go:293 +0x43
github.com/robertkrimen/otto.catchPanic(0x7faecfe88df8, 0x0, 0x0)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/error.go:243 +0x6a
github.com/robertkrimen/otto.(*_runtime).cmpl_run(0xc21006e6e0, 0x990420, 0xc210171780, 0xc210187360, 0x7faecfe84fa0, ...)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/runtime.go:294 +0x120
github.com/robertkrimen/otto.Otto.Run(0xc210180c60, 0xc21006e6e0, 0x990420, 0xc210171780, 0xc210186c40, ...)
    /home/drr/git/cayley/src/github.com/robertkrimen/otto/otto.go:292 +0x49
gremlin.(*GremlinSession).runUnsafe(0xc2100d0000, 0x990420, 0xc210171780, 0x0, 0x0, ...)
    /home/drr/git/cayley/src/gremlin/gremlin-session.go:139 +0x14a
gremlin.(*GremlinSession).ExecInput(0xc2100d0000, 0xc210186b40, 0x1f, 0xc210093420, 0x64)
    /home/drr/git/cayley/src/gremlin/gremlin-session.go:151 +0x22b
created by cayley_cmd.RunQuery
    /home/drr/git/cayley/src/cayley_cmd/cayley-repl.go:52 +0x17a

goroutine 1 [runnable]:
cayley_cmd.RunQuery(0xc210186b40, 0x1f, 0x7faed0000bf8, 0xc2100d0000)
    /home/drr/git/cayley/src/cayley_cmd/cayley-repl.go:53 +0x19d
cayley_cmd.CayleyRepl(0x7faed0000a20, 0xc210048c00, 0xa1d060, 0x7, 0xc210048960)
    /home/drr/git/cayley/src/cayley_cmd/cayley-repl.go:133 +0xa6c
main.main()
    /home/drr/git/cayley/src/cayley/main.go:77 +0x7c1

goroutine 3 [chan receive]:
github.com/barakmich/glog.(*loggingT).flushDaemon(0x1118680)
    /home/drr/git/cayley/src/github.com/barakmich/glog/glog.go:923 +0x50
created by github.com/barakmich/glog.init·1
    /home/drr/git/cayley/src/github.com/barakmich/glog/glog.go:408 +0x33e

goroutine 4 [syscall]:
runtime.goexit()
    /usr/lib/go/src/pkg/runtime/proc.c:1394

goroutine 8 [sleep]:
time.Sleep(0x6fc23ac00)
    /usr/lib/go/src/pkg/runtime/time.goc:31 +0x31
gremlin.func·011()
    /home/drr/git/cayley/src/gremlin/gremlin-session.go:128 +0x3c
created by gremlin.(*GremlinSession).runUnsafe
    /home/drr/git/cayley/src/gremlin/gremlin-session.go:136 +0x10f

goroutine 10 [sleep]:
time.Sleep(0x6fc23ac00)
    /usr/lib/go/src/pkg/runtime/time.goc:31 +0x31
gremlin.func·011()
    /home/drr/git/cayley/src/gremlin/gremlin-session.go:128 +0x3c
created by gremlin.(*GremlinSession).runUnsafe
    /home/drr/git/cayley/src/gremlin/gremlin-session.go:136 +0x10f

Add Benchmarking Tests

For each backend, run the same set of (grunty) queries across some dataset -- starting with the dataset already in the repository.

Probably off by default for a straight go test, but with the right extraction and flag, measure how fast things actually get traversed. If this changes drastically for the better or worse, well, good to know.

And set up travis to run them, to avoid bitrot.
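
A minimal sketch of the idea, assuming a hypothetical `backend` interface and two toy in-memory stores rather than Cayley's real backends. It runs the same fixed query against each backend through `testing.Benchmark`, which works outside `go test`. In a repository the same loops would more likely live in `func BenchmarkXxx(b *testing.B)` functions in `_test.go` files, which `go test` skips unless `-bench` is passed, so they are off by default.

```go
package main

import (
	"fmt"
	"testing"
)

// quad is a simplified statement; the field names are illustrative.
type quad struct{ subject, predicate, object string }

// backend is a hypothetical, minimal store interface (not Cayley's API).
type backend interface {
	Objects(subject, predicate string) []string
}

// scanStore answers queries with a linear scan over all quads.
type scanStore struct{ quads []quad }

func (s scanStore) Objects(subject, predicate string) []string {
	var out []string
	for _, q := range s.quads {
		if q.subject == subject && q.predicate == predicate {
			out = append(out, q.object)
		}
	}
	return out
}

// indexStore answers the same queries from a (subject, predicate) index.
type indexStore struct{ index map[[2]string][]string }

func newIndexStore(quads []quad) indexStore {
	s := indexStore{index: make(map[[2]string][]string)}
	for _, q := range quads {
		key := [2]string{q.subject, q.predicate}
		s.index[key] = append(s.index[key], q.object)
	}
	return s
}

func (s indexStore) Objects(subject, predicate string) []string {
	return s.index[[2]string{subject, predicate}]
}

// sampleQuads builds a small synthetic dataset: n films with two actors each.
func sampleQuads(n int) []quad {
	quads := make([]quad, 0, 2*n)
	for i := 0; i < n; i++ {
		film := fmt.Sprintf("film%d", i)
		quads = append(quads,
			quad{film, "starring", fmt.Sprintf("actor%d", i)},
			quad{film, "starring", fmt.Sprintf("actor%d", i+1)},
		)
	}
	return quads
}

// benchObjects times one fixed query against a backend.
func benchObjects(be backend) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			be.Objects("film42", "starring")
		}
	})
}

func main() {
	quads := sampleQuads(1000)
	backends := []struct {
		name string
		be   backend
	}{
		{"scan", scanStore{quads}},
		{"index", newIndexStore(quads)},
	}
	for _, b := range backends {
		fmt.Printf("%-6s %s\n", b.name, benchObjects(b.be))
	}
}
```

Keeping the query set in one place and looping over backends makes a regression in any single backend show up as a changed number for the same workload.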

Feature: Get a random vertex

Feature request: Get a random vertex (in constant time).

Even better: given a range, get a random vertex in that range in constant time.

Why? Because sometimes a datastore (e.g., hash tables) can easily get random entries when the application cannot. Getting a random thing can be very helpful for sampling, tests, and statistical analysis in general.

Being able to get a random in/out edge for a given vertex could also be handy in some cases.

I haven't looked at the hashing for the different back ends, but hopefully the non-range-based version turns out to be easy to implement. I'll take a look soon, I hope.
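
For illustration, one common way to get constant-time random selection is to pair a slice with a map from value to slice position. This is a sketch of that general technique only; `vertexSet` and its methods are hypothetical names, and nothing here reflects how any Cayley back end stores its nodes. The ranged variant is not covered.

```go
package main

import (
	"fmt"
	"math/rand"
)

// vertexSet is a hypothetical in-memory index of vertex names.
// items holds the vertices; pos maps each vertex to its index in items.
type vertexSet struct {
	items []string
	pos   map[string]int
}

func newVertexSet() *vertexSet {
	return &vertexSet{pos: make(map[string]int)}
}

// Len reports the number of vertices in the set.
func (s *vertexSet) Len() int { return len(s.items) }

// Add inserts v in O(1); adding an existing vertex is a no-op.
func (s *vertexSet) Add(v string) {
	if _, ok := s.pos[v]; ok {
		return
	}
	s.pos[v] = len(s.items)
	s.items = append(s.items, v)
}

// Remove deletes v in O(1) by moving the last element into its slot.
func (s *vertexSet) Remove(v string) {
	i, ok := s.pos[v]
	if !ok {
		return
	}
	last := len(s.items) - 1
	s.items[i] = s.items[last]
	s.pos[s.items[i]] = i
	s.items = s.items[:last]
	delete(s.pos, v)
}

// Random returns a uniformly chosen vertex in O(1), or false if empty.
func (s *vertexSet) Random() (string, bool) {
	if len(s.items) == 0 {
		return "", false
	}
	return s.items[rand.Intn(len(s.items))], true
}

func main() {
	s := newVertexSet()
	for _, v := range []string{"alice", "bob", "carol"} {
		s.Add(v)
	}
	s.Remove("bob")
	v, _ := s.Random()
	fmt.Println(s.Len(), v)
}
```

The cost is one extra integer per vertex, and the slice order changes on removal, so this suits sampling but not ordered iteration.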

go get github.com/google/cayley error

go get github.com/google/cayley
go: missing Bazaar command. See http://golang.org/s/gogetcmd
package github.com/google/cayley
imports labix.org/v2/mgo: exec: "bzr": executable file not found in $PATH
package github.com/google/cayley
imports labix.org/v2/mgo/bson
imports labix.org/v2/mgo/bson
imports labix.org/v2/mgo/bson: cannot find package "labix.org/v2/mgo/bson" in any of:
/usr/local/go/src/pkg/labix.org/v2/mgo/bson (from $GOROOT)
/Users/ghj1976/project/mygocode/src/labix.org/v2/mgo/bson (from $GOPATH)

Process does not start with a huge database

I've downloaded the Freebase data dump (27 GB, freebase-rdf-2014-07-06-00-00.gz) and uncompressed it into a 330 GB freebase.nt (gzip -cd freebase-rdf-2014-07-06-00-00.gz > freebase.nt). When I start the process, it runs for a long time and then gets killed. Log:
`root@ns501558:/home# time ./cayley_0.3.0-pre_linux_amd64/cayley http --dbpath=freebase.nt
Killed

real 16m34.086s
user 14m44.880s
sys 0m13.356s`

Is there any solution for this?

Maybe it's better to use compressed databases, as I suggested before in #57?
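
As a rough illustration of reading a compressed dump without first writing the uncompressed file, the sketch below streams gzip data line by line. It assumes line-oriented input such as N-Triples; `forEachLine`, `gzipString` and `countLines` are hypothetical helpers, not part of Cayley. It only bounds the memory used for reading, not the memory a store needs to hold the loaded graph. For a real file, the reader passed to `forEachLine` would come from `os.Open`.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// forEachLine decompresses r on the fly and calls fn for every non-empty
// line, so read-side memory is bounded by the longest line, not file size.
func forEachLine(r io.Reader, fn func(line string) error) error {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return err
	}
	defer zr.Close()

	sc := bufio.NewScanner(zr)
	sc.Buffer(make([]byte, 64*1024), 1<<20) // accept lines up to 1 MiB
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		if err := fn(line); err != nil {
			return err
		}
	}
	return sc.Err()
}

// gzipString compresses s in memory; it exists only to feed the demo.
func gzipString(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s)) // writing into a bytes.Buffer does not fail
	zw.Close()
	return buf.Bytes()
}

// countLines reports how many non-empty lines the gzip data holds.
func countLines(data []byte) (int, error) {
	n := 0
	err := forEachLine(bytes.NewReader(data), func(string) error {
		n++
		return nil
	})
	return n, err
}

func main() {
	data := gzipString("<a> <b> <c> .\n\n<c> <b> <d> .\n")
	n, err := countLines(data)
	fmt.Println(n, err)
}
```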

Fix Sessions to properly manage worker routines

I used the REPL console to execute some queries, but after a while the application crashes. Below is the stack trace. Do you know what the cause is?

cayley> panic: runtime error: close of closed channel

goroutine 50 [running]:
runtime.panic(0xa46720, 0xf1f7d5)
    pkg/runtime/panic.c:279 +0xf5
github.com/google/cayley/query/gremlin.func·011()
    github.com/google/cayley/query/gremlin/session.go:131 +0x4d
created by github.com/google/cayley/query/gremlin.(*Session).runUnsafe
    github.com/google/cayley/query/gremlin/session.go:140 +0x136

goroutine 16 [syscall]:
syscall.Syscall(0x0, 0x0, 0xc208089000, 0x1000, 0x7f0b67402318, 0xc20803a008, 0x43c338)
    pkg/syscall/asm_linux_amd64.s:21 +0x5
syscall.read(0x0, 0xc208089000, 0x1000, 0x1000, 0x4cb355, 0x0, 0x0)
    pkg/syscall/zsyscall_linux_amd64.go:838 +0x75
syscall.Read(0x0, 0xc208089000, 0x1000, 0x1000, 0x8, 0x0, 0x0)
    pkg/syscall/syscall_unix.go:136 +0x5c
os.(*File).read(0xc20803a000, 0xc208089000, 0x1000, 0x1000, 0x4, 0x0, 0x0)
    pkg/os/file_unix.go:190 +0x62
os.(*File).Read(0xc20803a000, 0xc208089000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    pkg/os/file.go:95 +0x98
bufio.(*Reader).fill(0xc208400de0)
    pkg/bufio/bufio.go:97 +0x1b3
bufio.(*Reader).ReadSlice(0xc208400de0, 0xc20803a00a, 0x0, 0x0, 0x0, 0x0, 0x0)
    pkg/bufio/bufio.go:298 +0x22c
bufio.(*Reader).ReadLine(0xc208400de0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    pkg/bufio/bufio.go:326 +0x69
github.com/google/cayley/db.Repl(0x7f0b67411048, 0xc20811e0f0, 0xad11c0, 0x7, 0xc208004540, 0x0, 0x0)
    github.com/google/cayley/db/repl.go:85 +0x266
main.main()
    github.com/google/cayley/cayley.go:120 +0xbd6

goroutine 19 [finalizer wait, 4 minutes]:
runtime.park(0x414c00, 0xf27020, 0xf24a49)
    pkg/runtime/proc.c:1369 +0x89
runtime.parkunlock(0xf27020, 0xf24a49)
    pkg/runtime/proc.c:1385 +0x3b
runfinq()
    pkg/runtime/mgc0.c:2644 +0xcf
runtime.goexit()
    pkg/runtime/proc.c:1445

goroutine 20 [chan receive]:
github.com/barakmich/glog.(*loggingT).flushDaemon(0xf29a80)
    github.com/barakmich/glog/glog.go:923 +0x75
created by github.com/barakmich/glog.init·1
    github.com/barakmich/glog/glog.go:408 +0x37a

goroutine 17 [syscall, 4 minutes]:
runtime.goexit()
    pkg/runtime/proc.c:1445

goroutine 21 [select, 1 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).compactionError(0xc2080246c0)
    github.com/syndtr/goleveldb/leveldb/db_compaction.go:113 +0x1fa
created by github.com/syndtr/goleveldb/leveldb.openDB
    github.com/syndtr/goleveldb/leveldb/db.go:117 +0x4b2

goroutine 22 [select, 1 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).tCompaction(0xc2080246c0)
    github.com/syndtr/goleveldb/leveldb/db_compaction.go:666 +0x777
created by github.com/syndtr/goleveldb/leveldb.openDB
    github.com/syndtr/goleveldb/leveldb/db.go:120 +0x4f4

goroutine 23 [select, 1 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).mCompaction(0xc2080246c0)
    github.com/syndtr/goleveldb/leveldb/db_compaction.go:615 +0x1b9
created by github.com/syndtr/goleveldb/leveldb.openDB
    github.com/syndtr/goleveldb/leveldb/db.go:121 +0x50c

goroutine 24 [select, 1 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).jWriter(0xc2080246c0)
    github.com/syndtr/goleveldb/leveldb/db_write.go:37 +0x143
created by github.com/syndtr/goleveldb/leveldb.openDB
    github.com/syndtr/goleveldb/leveldb/db.go:122 +0x524

goroutine 38 [sleep]:
time.Sleep(0x6fc23ac00)
    pkg/runtime/time.goc:39 +0x31
github.com/google/cayley/query/gremlin.func·011()
    github.com/google/cayley/query/gremlin/session.go:130 +0x35
created by github.com/google/cayley/query/gremlin.(*Session).runUnsafe
    github.com/google/cayley/query/gremlin/session.go:140 +0x136
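
The panic above is Go's runtime complaint when `close` is called twice on the same channel. A common guard, sketched here under the assumption that several worker goroutines may all try to signal shutdown, is to wrap the close in a `sync.Once`. The `killSwitch` type is hypothetical and is not taken from the session code in the trace.

```go
package main

import (
	"fmt"
	"sync"
)

// killSwitch is a hypothetical helper that lets many goroutines request
// shutdown while guaranteeing the underlying channel is closed only once.
type killSwitch struct {
	once sync.Once
	done chan struct{}
}

func newKillSwitch() *killSwitch {
	return &killSwitch{done: make(chan struct{})}
}

// Kill closes the done channel the first time it is called and does
// nothing on later calls, so it is safe to call from any goroutine.
func (k *killSwitch) Kill() {
	k.once.Do(func() { close(k.done) })
}

// Done returns a channel that is closed once Kill has been called.
func (k *killSwitch) Done() <-chan struct{} { return k.done }

func main() {
	k := newKillSwitch()

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			k.Kill() // every worker may call this; only one close happens
		}()
	}
	wg.Wait()

	<-k.Done()
	fmt.Println("shut down cleanly")
}
```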

Gizmo: accumulate results in map()

Other finals seem to work fine. For example this works:

g.Emit({
    "films" : g.V().Has("/film/film/directed_by", director_id).TagArray()
});

But this gets: {"error" : "(anonymous): Line 6:1 Unexpected token } (and 3 more errors)"}

g.Emit({
    "films" : g.V().Has("/film/film/directed_by", director_id).ForEach(function(d){
      return d;
    });
});

... or any other variation of function body (like a nested g.Emit());

You might wonder what I'm trying to do... it's to format an entire tree. g.Emit() seems to be the only way to do optional fields in an object, so I was trying to figure out how to nest that.

Use gitflow?

For projects like this, it's important to be able to easily work on unstable things, have stable branches, and easily push out version releases. gitflow (and associated tooling), as you might already know, works well for this. Can we strictly adopt this development flow? All development is done on develop, master is stable, tag/v0.0.1 is a tagged version (which GitHub will automatically make a release), et cetera.

Thoughts?

- Jonathan

Note: I've signed the CLA.

Figure out better persistence format?

Currently, the persistence format (the triples) for data in a Cayley database is essentially a plain-text log (somewhat similar to Redis' AOF files). With graph data, it is very easy for a database to grow very quickly, and as it does, the current persistence format will occupy a lot of space. I'd like to at least have a discussion on ways to improve this. Maybe using some sort of compressed binary format?

Thanks!

- Jonathan

Note: I've signed the CLA.

NIFTY!

i was wondering when someone would get round to doing a graphdb in go!

i thought i might eventually get there...but started by focusing on just trying to nail down interfaces for interacting with graphs: https://github.com/sdboyer/gogl

so, basically, this is just me putting out there that i've been playing in this area, too. maybe there's some potential for collaboration. or not, either way :)

Revise N-Quads parser

While in nquads I was a little concerned about some possible brittleness. Nothing I could really pin down to a concrete issue, but I figured we could formalise a parser. I have written a ragel-based parser that we could use based on http://www.w3.org/TR/n-quads/#sec-grammar.

The parser definition (below) does not yet handle initial or internal comments, but that can be added when I understand the exact semantics of comments in the N-Quads spec (it's a little ambiguous, though I think I agree with the likely correctness of the cayley implementation). I think initial comments should probably be handled outside the parser and internal comments don't make sense to me, although the spec seems to say they are allowable.

The code below demonstrates it working if you ragel -Z $FILE.rl && go run $FILE.go, or try running a cayley-sanitised generated sample here.

Note that what this parses is not exactly what is in 30kmoviedata.nt. Is there a spec for that format? It seems to be a relaxed and slightly altered form of the spec above.

// GO SOURCE FILE MACHINE GENERATED BY RAGEL; DO NOT EDIT

// Copyright 2014 The Cayley Authors. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package main

import (
    "bytes"
    "errors"
    "fmt"
    "strconv"

    "github.com/google/cayley/graph"
)

var (
    ErrInvalid = errors.New("invalid N-Quad")
)

func main() {
    for _, input := range []string {
        // N-Triples example 1.
        `<http://one.example/subject1> <http://one.example/predicate1> <http://one.example/object1> . # comments here`,
        `_:subject1 <http://an.example/predicate1> "object1" .`,
        `_:subject2 <http://an.example/predicate2> "object2" .`,

        // N-Triples example 2.
        `<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> .`,

        // N-Triples example 3.
        `_:alice <http://xmlns.com/foaf/0.1/knows> _:bob .`,
        `_:bob <http://xmlns.com/foaf/0.1/knows> _:alice .`,

        // N-Quads example 1.
        `<http://one.example/subject1> <http://one.example/predicate1> <http://one.example/object1> <http://example.org/graph3> . # comments here`,
        `_:subject1 <http://an.example/predicate1> "object1" <http://example.org/graph1> .`,
        `_:subject2 <http://an.example/predicate2> "object2" <http://example.org/graph5> .`,

        // N-Quads example 2.
        `<http://example.org/#spiderman> <http://www.perceive.net/schemas/relationship/enemyOf> <http://example.org/#green-goblin> <http://example.org/graphs/spiderman> .`,

        // N-Quads example 3.
        `_:alice <http://xmlns.com/foaf/0.1/knows> _:bob <http://example.org/graphs/john> .`,
        `_:bob <http://xmlns.com/foaf/0.1/knows> _:alice <http://example.org/graphs/james> .`,

        // N-Triples tests.
        `<http://example.org/bob#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .`,
        `<http://example.org/bob#me> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice#me> .`,
        `<http://example.org/bob#me> <http://schema.org/birthDate> "1990-07-04"^^<http://www.w3.org/2001/XMLSchema#date> .`,
        `<http://example.org/bob#me> <http://xmlns.com/foaf/0.1/topic_interest> <http://www.wikidata.org/entity/Q12418> .`,
        `<http://www.wikidata.org/entity/Q12418> <http://purl.org/dc/terms/title> "Mona Lisa" .`,
        `<http://www.wikidata.org/entity/Q12418> <http://purl.org/dc/terms/creator> <http://dbpedia.org/resource/Leonardo_da_Vinci> .`,
        `<http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619> <http://purl.org/dc/terms/subject> <http://www.wikidata.org/entity/Q12418> .`,

        // N-Quads tests.
        `<http://example.org/bob#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> <http://example.org/bob> .`,
        `<http://example.org/bob#me> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice#me> <http://example.org/bob> .`,
        `<http://example.org/bob#me> <http://schema.org/birthDate> "1990-07-04"^^<http://www.w3.org/2001/XMLSchema#date> <http://example.org/bob> .`,
        `<http://example.org/bob#me> <http://xmlns.com/foaf/0.1/topic_interest> <http://www.wikidata.org/entity/Q12418> <http://example.org/bob> .`,
        `<http://www.wikidata.org/entity/Q12418> <http://purl.org/dc/terms/title> "Mona Lisa" <https://www.wikidata.org/wiki/Special:EntityData/Q12418> .`,
        `<http://www.wikidata.org/entity/Q12418> <http://purl.org/dc/terms/creator> <http://dbpedia.org/resource/Leonardo_da_Vinci> <https://www.wikidata.org/wiki/Special:EntityData/Q12418> .`,
        `<http://data.europeana.eu/item/04802/243FA8618938F4117025F17A8B813C5F9AA4D619> <http://purl.org/dc/terms/subject> <http://www.wikidata.org/entity/Q12418> <https://www.wikidata.org/wiki/Special:EntityData/Q12418> .`,
        `<http://example.org/bob> <http://purl.org/dc/terms/publisher> <http://example.org> .`,
        `<http://example.org/bob> <http://purl.org/dc/terms/rights> <http://creativecommons.org/licenses/by/3.0/> .`,
    } {
        t, err := parse([]rune(input))
        fmt.Printf("%+v %v\n", t, err)
    }
}

%%{
    machine quads;

    alphtype rune;

    action Escape {
        isEscaped = true
    }

    action StartSubject {
        subject = p
    }

    action StartPredicate {
        predicate = p
    }

    action StartObject {
        object = p
    }

    action StartLabel {
        label = p
    }

    action SetSubject {
        if subject < 0 {
            panic("unexpected parser state: subject start not set")
        }
        triple.Subject = unEscape(data[subject:p], isEscaped)
        isEscaped = false
    }

    action SetPredicate {
        if predicate < 0 {
            panic("unexpected parser state: predicate start not set")
        }
        triple.Predicate = unEscape(data[predicate:p], isEscaped)
        isEscaped = false
    }

    action SetObject {
        if object < 0 {
            panic("unexpected parser state: object start not set")
        }
        triple.Object = unEscape(data[object:p], isEscaped)
        isEscaped = false
    }

    action SetLabel {
        if label < 0 {
            panic("unexpected parser state: label start not set")
        }
        triple.Provenance = unEscape(data[label:p], isEscaped)
        isEscaped = false
    }

    action Return {
        return triple, nil
    }

    action Comment {
    }

    action Error {
        if p < len(data) {
            return graph.Triple{}, fmt.Errorf("%v: unexpected rune %q at %d", ErrInvalid, data[p], p)
        }
        return graph.Triple{}, fmt.Errorf("%v: unexpected rune at %d", ErrInvalid, p)
    }

    PN_CHARS_BASE           = [A-Za-z]
                            | 0x00c0 .. 0x00d6
                            | 0x00d8 .. 0x00f6
                            | 0x00f8 .. 0x02ff
                            | 0x0370 .. 0x037d
                            | 0x037f .. 0x1fff
                            | 0x200c .. 0x200d
                            | 0x2070 .. 0x218f
                            | 0x2c00 .. 0x2fef
                            | 0x3001 .. 0xd7ff
                            | 0xf900 .. 0xfdcf
                            | 0xfdf0 .. 0xfffd
                            | 0x10000 .. 0xeffff
                            ;

    PN_CHARS_U              = PN_CHARS_BASE | '_' | ':' ;

    PN_CHARS                = PN_CHARS_U
                            | '-'
                            | [0-9]
                            | 0xb7
                            | 0x0300 .. 0x036f
                            | 0x203f .. 0x2040
                            ;

    ECHAR                   = ('\\' [tbnrf"'\\]) %Escape ;

    UCHAR                   = ('\\u' xdigit {4}
                            | '\\U' xdigit {8}) %Escape
                            ;

    BLANK_NODE_LABEL        = '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)? ;

    STRING_LITERAL_QUOTE    = '"' (
                              0x00 .. 0x09
                            | 0x0b .. 0x0c
                            | 0x0e .. '!'
                            | '#' .. '['
                            | ']' .. '~'
                            | ECHAR | UCHAR)*
                              '"'
                            ;

    IRIREF                  = '<' (
                              '!' .. ';'
                            | '='
                            | '?' .. '['
                            | ']'
                            | '_'
                            | 'a' .. 'z'
                            | '~' | UCHAR)*
                              '>'
                            ;

    LANGTAG                 = '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)* ;

    whitespace              = [ \t] ;

    literal                 = STRING_LITERAL_QUOTE ('^^' IRIREF | LANGTAG)? ;

    subject                 = IRIREF | BLANK_NODE_LABEL ;
    predicate               = IRIREF ;
    object                  = IRIREF | BLANK_NODE_LABEL | literal ;
    graphLabel              = IRIREF | BLANK_NODE_LABEL ;

    statement := (
                    whitespace*  subject    >StartSubject   %SetSubject
                    whitespace+  predicate  >StartPredicate %SetPredicate
                    whitespace+  object     >StartObject    %SetObject
                    (whitespace+ graphLabel >StartLabel     %SetLabel)?
                    whitespace*  '.' whitespace* ('#' any*)? >Comment
                 ) %Return @!Error ;

    write data;
}%%

func parse(data []rune) (graph.Triple, error) {
    var (
        cs, p int
        pe    = len(data)
        eof   = pe

        subject   = -1
        predicate = -1
        object    = -1
        label     = -1

        isEscaped bool

        triple graph.Triple
    )

    %%write init;

    %%write exec;

    return graph.Triple{}, ErrInvalid
}

func unEscape(r []rune, isEscaped bool) string {
    if !isEscaped {
        return string(r)
    }

    buf := bytes.NewBuffer(make([]byte, 0, len(r)))

    for i := 0; i < len(r); {
        switch r[i] {
        case '\\':
            i++
            var c byte
            switch r[i] {
            case 't':
                c = '\t'
            case 'b':
                c = '\b'
            case 'n':
                c = '\n'
            case 'r':
                c = '\r'
            case 'f':
                c = '\f'
            case '"':
                c = '"'
            case '\'':
                c = '\''
            case '\\':
                c = '\\'
            case 'u':
                rc, err := strconv.ParseInt(string(r[i+1:i+5]), 16, 32)
                if err != nil {
                    panic(fmt.Errorf("internal parser error: %v", err))
                }
                buf.WriteRune(rune(rc))
                i += 5
                continue
            case 'U':
                rc, err := strconv.ParseInt(string(r[i+1:i+9]), 16, 32)
                if err != nil {
                    panic(fmt.Errorf("internal parser error: %v", err))
                }
                buf.WriteRune(rune(rc))
                i += 9
                continue
            }
            buf.WriteByte(c)
        default:
            buf.WriteRune(r[i])
        }
        i++
    }

    return buf.String()
}

Tagging with properties works only for 1st record in result set

I'm playing with the sample movies db and trying to use the name of the start node as a tag, but something goes wrong:

Here's my script:

var filmToActor = g.Morphism().Out("/film/film/starring").Out("/film/performance/actor")  
g.V().Has("name", "Casablanca").Save("name", "source").Follow(filmToActor).Tag("target").All()

And here's the output. Only the first record gets the correct "source" value, and others are somehow getting "/film/film":

{
 "result": [
  {
   "id": ":/en/humphrey_bogart",
   "source": "Casablanca",
   "target": ":/en/humphrey_bogart"
  },
  {
   "id": ":/en/ingrid_bergman",
   "source": "/film/film",
   "target": ":/en/ingrid_bergman"
  },
  {
   "id": ":/en/paul_henreid",
   "source": "/film/film",
   "target": ":/en/paul_henreid"
  },
  ...
 ]
}

I tried replacing .Save("name", "source") with .Out("name").Tag("source") and then going .Back, but got the same result.

graph db and query interface as Go library

A TODO for @kortschak ;-)

it would be great to have a way to use cayley (core) as a go library.

  • go-gettable components (mainly/only the graph package with a Go API)
  • separate appengine, leveldb, ... so there is no need to import those packages.
    ...

thanks.

move iterators into their own package

Currently all the iterators are in graph, with the consequence that we have a collection of g/*Iterator/ names. I think this would be improved by moving all the iterators into cayley/graph/iterator and applying s/(.+)Iterator/\1/g, giving the following types:

iterator.And
iterator.Base
iterator.Fixed
iterator.Optional
iterator.Or
iterator.Hasa
iterator.Int64All // or iterator.Int64
iterator.LinksTo // or iterator.Link
iterator.Null
iterator.ValueComparison // or iterator.Comparison

The Iterator interface is left in graph.

Postgres backend!

One of the things in TODO.md is supporting Postgres as a backend. As someone who loves Postgres when dealing with relational data, I'd like to hear more thoughts on this, because I'd be interested in implementing it. For example, would native data types be best or could every entry just make use of Postgres' awesome native support for JSON documents?

As stated in the document, why not 😄?

- Jonathan

Note: I've signed the CLA.

Connecting to mongodb

I have tried to configure cayley to use mongodb: cayley http --config=cayley.cfg.test --port=64211 where cayley.cfg.test is:
{ "database": "mongodb", "db_path": "localhost/test:27017" }
but I get an error: F0808 12:43:02.479565 06352 cayley.go:137] triplestore: name 'mongodb' is not registered.
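
The wording of the error suggests that backends are looked up by name in a registry. The sketch below shows that general pattern only, to make clear why an unknown name fails at startup. Every identifier in it (`register`, `open`, the backend names "memory" and "disk") is hypothetical, and it says nothing about which names a given Cayley build accepts.

```go
package main

import (
	"fmt"
	"sort"
)

// opener is a hypothetical constructor signature for a backend; it returns
// a description string instead of a real store to keep the sketch small.
type opener func(path string) (string, error)

// registry maps a backend name to its constructor.
var registry = map[string]opener{}

// register makes a backend available under name. In this pattern each
// backend package usually calls it from its own init function.
func register(name string, fn opener) { registry[name] = fn }

// registered returns the known backend names in sorted order.
func registered() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// open looks the backend up by name and fails with a "not registered"
// error that also lists what is available.
func open(name, path string) (string, error) {
	fn, ok := registry[name]
	if !ok {
		return "", fmt.Errorf("name %q is not registered (available: %v)", name, registered())
	}
	return fn(path)
}

func init() {
	// Hypothetical backend names, chosen for illustration only.
	register("memory", func(path string) (string, error) {
		return "memory store loaded from " + path, nil
	})
	register("disk", func(path string) (string, error) {
		return "disk store at " + path, nil
	})
}

func main() {
	if _, err := open("mongodb", "localhost/test:27017"); err != nil {
		fmt.Println("error:", err)
	}
	s, _ := open("memory", "testdata.nt")
	fmt.Println(s)
}
```

Listing the registered names in the error message is a small design choice that would make a misspelled or uncompiled backend name easier to diagnose.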

MQL query returns unexpected values

Using the 30kmovies sample and running the following MQL ("find all movies by David Fincher and list actors for each movie")

[{
  "type": "/film/film",
  "name": null,
  "/film/film/directed_by": {
    "name": "David Fincher"
  },
  "/film/film/starring": [{
    "/film/performance/actor": {
      "name": null
    }
  }]
}]

Results in the following output:

{
 "result": [
  {
   "/film/film/directed_by": {
    "name": "David Fincher"
   },
   "/film/film/starring": [
    {
     "/film/performance/actor": {
      "name": "Edward Norton"
     }
    },
    // [...]
   ],
   "name": "/film/film",
   "type": "/film/film"
  },
  // [...]
 ]
}

Notice the wrong value of the last "name" key.

Moving "name": null from the top down to the end...

[{
  "type": "/film/film",
  "/film/film/directed_by": {
    "name": "David Fincher"
  },
  "/film/film/starring": [{
    "/film/performance/actor": {
      "name": null
    }
  }],
  "name": null
}]

...results in the correct output...

{
 "result": [
  {
   "/film/film/directed_by": {
    "name": "David Fincher"
   },
   "/film/film/starring": [
    {
     "/film/performance/actor": {
      "name": "Edward Norton"
     }
    },
    // [...]
   ],
   "name": "Fight Club",
   "type": "/film/film"
  }
  // [...]
 ]
}

There seems to be an unexpected key order sensitivity when using MQL.

Rocksdb backend

See rocksdb and its curious benchmarks.

github.com/DanielMorsing/rocksdb provides Go bindings for rocksdb. That repo is a fork of github.com/jmhodges/levigo, which is a wrapper around the C++ LevelDB. That API isn't too different from github.com/syndtr/goleveldb's API, which is the basis for the current LevelDB support.

I've got rough code running; not too bad.

The obvious problem is that standard rocksdb support would cost an external native code dependency. Requiring librocksdb.{so,a} to run Cayley using another back end is bad. Not sure how to address that problem. Maybe for now just include the graph/rocksdb directory, but leave rocksdb out of db/{load,init}.go?

Higher-level Go API

Tracking for this discussion that keeps happening -- now it's got a number.

Basically, it'd be great to write queries (or, more explicitly, build iterator trees) without having to be so mechanistic in building the iterator trees. Exposing such an API makes it easier for people using Cayley as a library, makes it easier to write query languages (that have repeated patterns in a lot of places) and, perhaps, makes it easier to talk about external API connections -- essentially, the wire format for building iterator trees. That would allow interesting bindings for other languages without catting together Javascript.

This may be directly related to the Gremlin API, in that one may influence the other and vice versa.

Cassandra backend

Cassandra would be a nice option for storage since it's a popular and proven key/value store.

All Iterator.Optimize() calls should defer to TripleStore

Not just LinksTo, but And and Or Iterators too. This is especially true for SQL backends where we can save enormously on wire transfers and take advantage of indexes!

PR will probably be forthcoming, but graph.iterator types don't keep a reference to their TripleStore so I'm not sure how to approach this.
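
One possible approach, as a minimal sketch only: give composite iterators an optional reference to something the store implements, and let Optimize() hand the tree to it. Every name here (Optimizer, OptimizeIterator, the cut-down Iterator interface, sqlStore, sqlIterator) is hypothetical and not part of the existing graph package; the INTERSECT folding merely stands in for whatever a real SQL backend would do.

```go
package main

import (
	"fmt"
	"strings"
)

// Iterator is a cut-down, hypothetical stand-in for graph.Iterator.
type Iterator interface {
	Type() string
	Optimize() (Iterator, bool)
}

// Optimizer is a hypothetical interface a store could implement
// to rewrite iterator trees it knows how to execute natively.
type Optimizer interface {
	OptimizeIterator(it Iterator) (Iterator, bool)
}

// And keeps an optional reference to its store's optimizer.
type And struct {
	store Optimizer // nil means: no store-specific optimization
	subs  []Iterator
}

func (a *And) Type() string { return "and" }

// Optimize defers to the store when one is attached.
func (a *And) Optimize() (Iterator, bool) {
	if a.store != nil {
		return a.store.OptimizeIterator(a)
	}
	return a, false
}

// sqlIterator represents a query executed entirely by the backend.
type sqlIterator struct{ query string }

func (s *sqlIterator) Type() string               { return "sql" }
func (s *sqlIterator) Optimize() (Iterator, bool) { return s, false }

// sqlStore folds an And over SQL iterators into a single query.
type sqlStore struct{}

func (sqlStore) OptimizeIterator(it Iterator) (Iterator, bool) {
	and, ok := it.(*And)
	if !ok || len(and.subs) == 0 {
		return it, false
	}
	parts := make([]string, 0, len(and.subs))
	for _, sub := range and.subs {
		s, ok := sub.(*sqlIterator)
		if !ok {
			return it, false // mixed tree: leave it alone
		}
		parts = append(parts, s.query)
	}
	return &sqlIterator{query: strings.Join(parts, " INTERSECT ")}, true
}

func main() {
	and := &And{store: sqlStore{}, subs: []Iterator{
		&sqlIterator{query: "SELECT a"},
		&sqlIterator{query: "SELECT b"},
	}}
	opt, changed := and.Optimize()
	fmt.Println(opt.Type(), changed)
}
```

The cost of this design is that every constructor for And, Or and friends has to be handed the store, which is the invasive part of the change.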

Explain Use Cases

Can you explain the use cases for this DB? Is Cayley meant to be the universal DB of a project (substituting, e.g., PostgreSQL)? Or is it meant only for access that is not real-time?
Also, some links for the client API are missing (e.g. how to access it from Go).

Make cayley go gettable

Are you interested in a PR to make cayley go gettable? The changes are fairly invasive, but will make the project more usable.

Accumulo backend

I'm interested in using Accumulo as a back end. Opinions? Should be relatively easy since Accumulo presents an API similar to LevelDB. Then you'd have pretty decent horizontal scalability (with the usual caveats re graphs, round trips, etc).

regression introduced in 62785 or 19124

During the merger of materializer into the nexter changes, I broke something. I have only just noticed while working on b/llrb changes.

62785d2 - breaking merge (expected damage)
191244c - merge resolution (introduction of failure)

The breakage shows up as a failure of the helpless checker - gives no results. It also scrambles the results of other queries but leaves their counts the same. This test is missed by travis, because we don't run long tests there, and we obviously need to implement less slipshod tests (my fault).

I'm looking into what I did now, but since it was an interaction with @barakmich's changes, it would be good if he could look as well.

cc:@barakmich

nquads silently drops lines containing quotes

If you apply the patch below or equivalent, you see 137 lines output as having been dropped when you execute cayley http --dbpath=30kmoviedata.nt. All have at least one quotation mark (single or double).

    diff --git a/nquads/nquads.go b/nquads/nquads.go
    index 1f534b0..c555cc2 100644
    --- a/nquads/nquads.go
    +++ b/nquads/nquads.go
    @@ -16,6 +16,7 @@ package nquads

     import (
            "bufio"
    +       "fmt"
            "io"
            "strings"

    @@ -185,11 +186,13 @@ func ReadNQuadsFromReader(c chan *graph.Triple, reader io.Reader) {
                            continue
                    }
                    triple := Parse(line)
    -               line = ""
                    if triple != nil {
                            nTriples++
                            c <- triple
    +               } else {
    +                       fmt.Printf("dropped line: %q\n", line)
                    }
    +               line = ""
            }
            glog.Infoln("Read", nTriples, "triples")
            close(c)

Write log, as_of time and prep for replication

Long story short, it'd be good to be more durable. A proper write log (and then transactions and, potentially, rollbacks) would go a long way. It then also opens up replication and federation beyond just running atop a distributed store. It also helps the consistency of those as well.

I've been thinking about how to go about this and it's not too crazy, just some work and checking in the iterators.

panic: runtime error: index out of range

$ ./cayley repl --dbpath=30kmoviedata.nt.gz
panic: runtime error: index out of range

goroutine 21 [running]:
runtime.panic(0x85245e0, 0x89b029c)
/fs/home/barak/local/go/src/pkg/runtime/panic.c:279 +0xe9
github.com/google/cayley/nquads.getQuotedPart(0x18c97282, 0xe9, 0x18c97214, 0x0, 0x0)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/nquads/nquads.go:109 +0x5c1
github.com/google/cayley/nquads.getUnquotedPart(0x18c97259, 0x112, 0x18c25ea0, 0x0, 0x0)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/nquads/nquads.go:154 +0x10f
github.com/google/cayley/nquads.getTripleComponent(0x18c97259, 0x112, 0x18c97259, 0x0, 0x0)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/nquads/nquads.go:86 +0x124
github.com/google/cayley/nquads.ParseLineToTriple(0x18c97259, 0x112, 0x18c971e0)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/nquads/nquads.go:47 +0x11a
github.com/google/cayley/nquads.ReadNQuadsFromReader(0x18c1a750, 0xb76db238, 0x18c244e0)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/nquads/nquads.go:187 +0x339
github.com/google/cayley/db.ReadTriplesFromFile(0x18c1a750, 0xbfb0c6a1, 0x12)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/db/load.go:64 +0x16a
created by github.com/google/cayley/db.LoadTriplesFromFileInto
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/db/load.go:69 +0x65

goroutine 16 [chan receive]:
github.com/google/cayley/db.LoadTriplesFromFileInto(0xb76db6f0, 0x18c6c080, 0xbfb0c6a1, 0x12, 0x2710)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/db/load.go:72 +0xd9
github.com/google/cayley/db.CayleyLoad(0xb76db6f0, 0x18c6c080, 0x18c1a1e0, 0xbfb0c6a1, 0x12, 0x18c94001)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/db/load.go:41 +0x12c
github.com/google/cayley/db.OpenTSFromConfig(0x18c1a1e0, 0x0, 0x0)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/db/open.go:36 +0x255
main.main()
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/google/cayley/cayley.go:78 +0x660

goroutine 19 [finalizer wait]:
runtime.park(0x805a6d0, 0x89b4434, 0x89b2fa9)
/fs/home/barak/local/go/src/pkg/runtime/proc.c:1369 +0x94
runtime.parkunlock(0x89b4434, 0x89b2fa9)
/fs/home/barak/local/go/src/pkg/runtime/proc.c:1385 +0x3f
runfinq()
/fs/home/barak/local/go/src/pkg/runtime/mgc0.c:2644 +0xc5
runtime.goexit()
/fs/home/barak/local/go/src/pkg/runtime/proc.c:1445

goroutine 20 [chan receive]:
github.com/barakmich/glog.(*loggingT).flushDaemon(0x89b5b60)
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/barakmich/glog/glog.go:923 +0x67
created by github.com/barakmich/glog.init·1
/home/barak/src/all-cayley/barakmich-cayley/src/github.com/barakmich/glog/glog.go:408 +0x26f

Javascript API source?

I'm the author of LevelGraph, congrats for doing this project! I have been using the very same technique for storing graph database in LevelDB for quite some time and it works great!

I would like to know where the sources of the JS APIs are. I would love to see if I can adopt the same interface for LevelGraph, so we can have a Gremlin.js thing.

expanded triplestore interface and pluggable backends

I'd suggest a slightly expanded TripleStore interface which would let you move a lot of the backend-specific code out of db/init.go db/load.go and db/open.go. It would make the integration points a little more obvious for new backends.

Here's an initial list of methods based on the above files:

Open(dbpath string, opts graph.OptionsDict) // aka NewTripleStore
CreateSchema() // aka CreateNewMongoGraph etc
BeginBulkLoad() (chan *graph.Triple)

I'd also suggest a central registry / factory of backends to make the above files cleaner and allow new modules to use them transparently.
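
A central registry along these lines could look roughly like the sketch below. It is only an illustration: Store, NewStoreFunc, Register and Open are hypothetical names rather than the existing graph API, and the options map stands in for graph.OptionsDict.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical stand-in for the TripleStore interface.
type Store interface {
	Name() string
}

// NewStoreFunc opens a store at dbpath with backend-specific options.
type NewStoreFunc func(dbpath string, opts map[string]interface{}) (Store, error)

var (
	mu       sync.RWMutex
	backends = make(map[string]NewStoreFunc)
)

// Register makes a backend available under name. It panics on a nil
// constructor or a duplicate name, since both are programming errors.
func Register(name string, fn NewStoreFunc) {
	mu.Lock()
	defer mu.Unlock()
	if fn == nil {
		panic("registry: nil constructor for " + name)
	}
	if _, dup := backends[name]; dup {
		panic("registry: backend registered twice: " + name)
	}
	backends[name] = fn
}

// Open looks up the named backend and calls its constructor.
func Open(name, dbpath string, opts map[string]interface{}) (Store, error) {
	mu.RLock()
	fn, ok := backends[name]
	mu.RUnlock()
	if !ok {
		return nil, fmt.Errorf("registry: name %q is not registered", name)
	}
	return fn(dbpath, opts)
}

// memStore is a toy backend used to exercise the registry.
type memStore struct{}

func (memStore) Name() string { return "mem" }

// Each backend package would register itself from its own init().
func init() {
	Register("mem", func(string, map[string]interface{}) (Store, error) {
		return memStore{}, nil
	})
}

func main() {
	s, err := Open("mem", "", nil)
	fmt.Println(s.Name(), err)

	_, err = Open("rocksdb", "/tmp/db", nil)
	fmt.Println(err)
}
```

With this shape, db/init.go, db/load.go and db/open.go would only need to call Open with the configured name, and a new backend becomes available simply by being imported.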

cayley init and cayley load disagree on behaviour when using mem back end

db.Init() returns true and does nothing, while db.Load() depends on the existence of a file via db.Open().

This is primarily a documentation issue. When running cayley with a mem back end, init and load should print meaningful messages - that those commands are not necessary.

The behaviour for loading triples differs when using mem too; the data are specified through the config and not via the triples parameter. This is confusing.

gremlin query halting is racey

This is trivial to fix using a chan (maybe a time.Timer for ease) rather than a bool. I'll pick it up in the next few days. While I make that change I will change the timeout to behave as a time.Duration rather than seconds.

.../github.com/google/cayley$ go test -race
==================
WARNING: DATA RACE
Write by goroutine 10:
  github.com/google/cayley/query/gremlin.func·011()
      /home/daniel/Development/src/github.com/google/cayley/query/gremlin/session.go:132 +0x96

Previous read by goroutine 8:
  [failed to restore the stack]

Goroutine 10 (running) created at:
  github.com/google/cayley/query/gremlin.(*Session).runUnsafe()
      /home/daniel/Development/src/github.com/google/cayley/query/gremlin/session.go:139 +0x1fd
  github.com/google/cayley/query/gremlin.(*Session).ExecInput()
      /home/daniel/Development/src/github.com/google/cayley/query/gremlin/session.go:154 +0x457

Goroutine 8 (running) created at:
  testing.RunTests()
      /usr/local/src/go/src/pkg/testing/testing.go:504 +0xb46
  testing.Main()
      /usr/local/src/go/src/pkg/testing/testing.go:435 +0xa2
  main.main()
      github.com/google/cayley/_test/_testmain.go:65 +0xdc
==================
==================
WARNING: DATA RACE
Read by goroutine 10:
  github.com/google/cayley/query/gremlin.func·011()
      /home/daniel/Development/src/github.com/google/cayley/query/gremlin/session.go:133 +0xc8

Previous write by goroutine 9:
  github.com/google/cayley/query/gremlin.(*Session).ExecInput()
      /home/daniel/Development/src/github.com/google/cayley/query/gremlin/session.go:169 +0x2fd

Goroutine 10 (running) created at:
  github.com/google/cayley/query/gremlin.(*Session).runUnsafe()
      /home/daniel/Development/src/github.com/google/cayley/query/gremlin/session.go:139 +0x1fd
  github.com/google/cayley/query/gremlin.(*Session).ExecInput()
      /home/daniel/Development/src/github.com/google/cayley/query/gremlin/session.go:154 +0x457

Goroutine 9 (finished) created at:
  github.com/google/cayley.TestQueries()
      /home/daniel/Development/src/github.com/google/cayley/cayley_test.go:325 +0x3f0
  testing.tRunner()
      /usr/local/src/go/src/pkg/testing/testing.go:422 +0x10f
==================
PASS
Found 2 data race(s)
exit status 66
FAIL  github.com/google/cayley  552.029s

Documentation

This is a catchall issue.

As highlighted in #91, a significant proportion of the exported codebase lacks doc comments or the comments are not associated with the relevant labels according to the godoc conventions (and so are missed by golint and godoc).

It might be worth considering adding a page to the wiki on graph databases in general (and links to resources). This kind of approach to client education has been very successful in growing the Go userbase with the {talks,blog}.golang.org pages and the go-tour.

other graph models

So far the examples use the RDF graph model. Is it possible, or planned, that other graph models such as TinkerPop Blueprints will be supported?
