Effortless JSON storage for Tarantool

Use this module to receive and store structured data that arrives from the outside world. It has a few important strengths:

  • You are not forced to define any kind of schema for your documents
  • Still, they are stored with very little redundancy
  • You can build indices on arbitrary fields (even nested)
  • There are convenient high-level functions for data manipulation
  • The module works transparently for local spaces, remote spaces and even sharded spaces!
  • You can do "eventually consistent" selects and joins across sharded spaces!

Use cases

This module is suitable for projects where a strict schema is not desirable, and especially for small codebases where you don't want to write lots of boilerplate.

Setup

This module has no external dependencies, so you can just drop document.lua into the root of your project.

Alternatively, you can use the Tarantool package manager:

tarantoolctl rocks install document

Usage

Boilerplate:

doc = require('document')
json = require('json')

box.cfg{}

box.schema.create_space('test', {if_not_exists = true})
doc.create_index(box.space.test, 'primary',
                 {parts={'id', 'unsigned'}, if_not_exists=true})

Actual data manipulation:

doc.insert(box.space.test, {id=1, foo="foo", bar={baz=3}})
doc.insert(box.space.test, {id=2, foo="bar", bar={baz=0}})

print('All tuples')
for _, r in doc.select(box.space.test) do
    print('tuple:', json.encode(r))
end

print('Tuples where bar.baz > 0')
for _, r in doc.select(box.space.test, {{'$bar.baz', '>', 0}}) do
    print('tuple:', json.encode(r))
end

print('Deleting a tuple where primary key == 2')
doc.delete(box.space.test, {{"$id", "==", 2}})

How it works

A naive implementation would simply store JSON documents as strings inside a tuple and extract the indexed values into separate fields of the tuple.

A more optimized approach is what MongoDB and PostgreSQL do: instead of storing JSON documents as text, they use a compact binary format, which could then be stored inside a tuple.

We decided to take another approach and figure out the document schema dynamically. We walk through the incoming document and put each leaf element into a separate tuple field, essentially "flattening" it. If we have already seen a field, the schema already contains a mapping between its path in the document and a position inside the tuple. If not, we extend the schema with a new field and assign it a new rightmost column in the tuple to store its data.

When data is selected back, we reconstruct the original object using the document schema.

Our experiments show that most documents can achieve 5x to 10x compression with this method, because the schema is stored only once per space.
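The flattening idea described above can be sketched in plain Lua. This is only an illustrative model under simplifying assumptions, not the module's actual code: the names new_schema, flatten and unflatten, and the schema layout (positions and paths tables) are hypothetical, and arrays, nulls and schema persistence are ignored.

```lua
-- Simplified, hypothetical model of schema-driven flattening.
-- This is NOT the module's implementation.

-- positions: dotted path -> tuple position; paths: tuple position -> dotted path
local function new_schema()
    return {positions = {}, paths = {}}
end

-- Put every leaf of `doc` into its own slot of `out`, extending the schema
-- with a new rightmost position whenever an unseen path shows up.
local function flatten(schema, doc, prefix, out)
    out = out or {}
    for key, value in pairs(doc) do
        local path = prefix and (prefix .. '.' .. key) or tostring(key)
        if type(value) == 'table' then
            flatten(schema, value, path, out)
        else
            local pos = schema.positions[path]
            if pos == nil then
                pos = #schema.paths + 1
                schema.paths[pos] = path
                schema.positions[path] = pos
            end
            out[pos] = value
        end
    end
    return out
end

-- Rebuild the nested document from a flat tuple using the same schema.
local function unflatten(schema, tuple)
    local doc = {}
    for pos, path in ipairs(schema.paths) do
        local value = tuple[pos]
        if value ~= nil then
            local node, last = doc, nil
            for part in path:gmatch('[^.]+') do
                if last then
                    -- descend, creating intermediate tables on demand
                    node[last] = node[last] or {}
                    node = node[last]
                end
                last = part
            end
            node[last] = value
        end
    end
    return doc
end

local schema = new_schema()
local first = flatten(schema, {id = 1, foo = 'foo', bar = {baz = 3}})
local second = flatten(schema, {id = 2, email = 'test2@mail'})
local restored = unflatten(schema, first)
print(#schema.paths)     -- 4
print(restored.bar.baz)  -- 3
```

The second document reuses the position already assigned to id and adds only one new column for email, which is why the schema grows to four paths rather than five. The exact position of each path depends on table iteration order, so the sketch never relies on it.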

Queries

Queries are written using Lua tables, and are just lists of conditions of the following form:

{left, op, right}

Here, left and right are either regular values or references to a field name, and op is a comparison operator.

Example values for left and right:

  • 1
  • nil
  • "foo"
  • "$id"

Here, "$id" is a special form that references a tuple field by name. You can also put a path there, with components separated by ".", such as "$foo.bar.val".

Example values for op:

  • ">"
  • ">="
  • "=="
  • "<="
  • "<"

Query examples:

  • {{"$id", ">", 10}}
  • {{"$id", ">", 10}, {"$id", "<", 100}}
  • {{"$user.name", "==", "foo"}, {"$qty", "==", 0}}
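To make the condition format concrete, here is a minimal sketch of how such a list could be evaluated against a plain Lua table. It is a hypothetical model, not the module's query engine: the helpers resolve and matches are invented for illustration, and the sketch assumes that all conditions in a list must hold (logical AND) and that ordering comparisons between values of different types simply do not match.

```lua
-- Hypothetical evaluator for lists of {left, op, right} conditions.
-- Not the module's implementation.

local ops = {
    ['>']  = function(a, b) return a > b end,
    ['>='] = function(a, b) return a >= b end,
    ['=='] = function(a, b) return a == b end,
    ['<='] = function(a, b) return a <= b end,
    ['<']  = function(a, b) return a < b end,
}

-- Only numbers and strings can be ordered in plain Lua.
local orderable = {number = true, string = true}

-- Resolve "$a.b.c" to doc.a.b.c; return any other value unchanged.
local function resolve(doc, operand)
    if type(operand) ~= 'string' or operand:sub(1, 1) ~= '$' then
        return operand
    end
    local node = doc
    for part in operand:sub(2):gmatch('[^.]+') do
        if type(node) ~= 'table' then
            return nil
        end
        node = node[part]
    end
    return node
end

-- Assumption: every condition must hold for the document to match.
local function matches(doc, query)
    for _, cond in ipairs(query) do
        local left, op, right = resolve(doc, cond[1]), cond[2], resolve(doc, cond[3])
        local compare = ops[op]
        if compare == nil then
            error('unsupported operator: ' .. tostring(op))
        end
        if op ~= '==' and not (type(left) == type(right) and orderable[type(left)]) then
            return false
        end
        if not compare(left, right) then
            return false
        end
    end
    return true
end

local order = {id = 50, qty = 0, user = {name = 'foo'}}
print(matches(order, {{'$id', '>', 10}, {'$id', '<', 100}}))             -- true
print(matches(order, {{'$user.name', '==', 'foo'}, {'$qty', '==', 0}}))  -- true
print(matches(order, {{'$id', '>', 100}}))                               -- false
```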

Status

  • The functionality for dealing with regular spaces is feature-complete
  • Serialization/deserialization should be reasonably fast for most use cases (though there are no benchmarks at the moment)
  • Selects/joins across sharded spaces may have bugs. There is no automated test coverage for this case.

API

doc.insert(space, tbl)

Insert document tbl into space.

doc.delete(space, query)

Delete documents from space that match query (see Queries above).

doc.select(space, query, options)

Select documents from space that match query (see Queries above) and return an iterator to the result set.

options is a table with the following optional keys:

  • limit: maximum number of results to return
  • offset: the offset from the beginning of the result set
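With the documented signature, a paged call would look like doc.select(box.space.test, {{'$id', '>', 0}}, {limit = 10, offset = 20}). The sketch below models what the two keys mean on a plain Lua array instead of the module's iterator. The paginate helper is hypothetical, and it assumes that offset is applied before limit, which is the usual convention but is not stated explicitly here.

```lua
-- Hypothetical model of limit/offset semantics on a plain array.
-- Assumption: skip `offset` results first, then return at most `limit`.
local function paginate(results, options)
    options = options or {}
    local offset = options.offset or 0
    local limit = options.limit or math.huge
    local page = {}
    for i = offset + 1, math.min(#results, offset + limit) do
        page[#page + 1] = results[i]
    end
    return page
end

local page = paginate({'a', 'b', 'c', 'd', 'e'}, {offset = 1, limit = 2})
print(table.concat(page, ','))  -- b,c
```

Both keys are optional in this model: omitting them returns the whole result set.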

doc.join(space1, space2, query, options)

Perform an inner join of spaces space1 and space2, where both items satisfy query (see Queries above).

options is a table with the following optional keys:

  • limit: maximum number of results to return
  • offset: the offset from the beginning of the result set

Low level API

doc.flatten(space, tbl)

Converts document tbl to a flat array, updating the schema for space as necessary.

doc.unflatten(space, tbl)

Converts flat array tbl to a nested document, according to the schema for space.

doc.create_index(space, index_name, options)

Behaves similarly to Tarantool's native create_index() method on a space, but lets you specify string field names in parts in addition to numeric ones.

doc.field_key(space, field_name)

Returns the integer key for the field named field_name in a flattened document. If you need a key for a field in a nested document, use dot notation, like "foo.bar.id".

Contacts

This module was initially written by Konstantin Nazarov.

You can reach out to him at [email protected].

document's Issues

Vinyl engine

Hi, I've found out that tarantool crashes when I insert a tuple with a different format.

-- Tarantool: 1.9.0-52-g38b2a29ff

local profileSpace = box.schema.create_space('profile', {
    engine = 'vinyl'
})

local doc = require("document")

doc.create_index(profileSpace, 'primary', {parts={'oid', 'string'}})
doc.insert(profileSpace, {oid = 'test1'})
doc.insert(profileSpace, {oid = 'test2', email = 'test2@mail'})

Error: Vinyl does not support changing space format of a non-empty space

RPM and Debian packages

It would be nice to have RPM and Debian packages, like tarantool-avro-schema or tarantool-http.

When I want to build a RPM package for my application, I specify in the rpm.spec:

Requires: tarantool >= x.y.z
Requires: tarantool-module-required >= x.y.z

With tarantool-document I can't do this: I need to copy document.lua into my source code every time I want to update the application's dependencies.

Documents are not returned with SQL interface

Steps to reproduce:

  1. Create a space and insert documents as in the sample
    tarantool> box.schema.create_space('TEST_DOC', {if_not_exists = true})
    tarantool> doc.create_index(box.space.TEST_DOC, 'primary',{parts={'id', 'unsigned'}, if_not_exists=true})
    tarantool> doc.insert(box.space.TEST_DOC, {id=1, foo="foo", bar={baz=3}})

  2. Verify that the data is inserted
    tarantool> for _, r in doc.select(box.space.TEST_DOC) do
        print('tuple:', json.encode(r))
    end

    tuple:  {"bar":{"baz":3},"foo":"foo","id":1}
    ---
    ...

  3. Select data with the SQL interface
    tarantool> box.sql.execute([[select * from TEST_DOC]]);

Expected: tuples with documents.

Actual: SQL error

- error: 'no such table: TEST_DOC'
...

Additional data:
tarantool> box.sql.execute([[SELECT * FROM _space;]])

  - [528, 1, 'TEST_DOC', 'memtx', 0, !!binary gA==, !!binary k4OkdHlwZah1bnNpZ25lZKRuYW1lomlkq2lzX251bGxhYmxlwoOkdHlwZaZzY2FsYXKkbmFtZadiYXIuYmF6q2lzX251bGxhYmxlw4OkdHlwZaZzdHJpbmekbmFtZaNmb2+raXNfbnVsbGFibGXD]

tarantool> box.info.version

---
- 1.8.3-77-ge5b5dbc
...
