Giter VIP home page Giter VIP logo

Comments (5)

tv42 avatar tv42 commented on July 23, 2024

I'm not sure if this means fully hierarchical keys, or just moving from bucket+key to potentially bucket+key+subkey. Everything before "Caveats" seems to imply the former, but the "Only support single-level" phrase conflicts with that. Personally, I don't see a single, fixed, extra level of depth worth the effort. Complex uses will end up having to construct a compound key in a []byte anyway.

Fully hierarchical keys would be nice, assuming the API and internals can still remain clean. What if it's just Buckets all the way down?

// variadic so you can go multiple levels deep in one call.
// returns nil if bucket does not exist.
func (db *DB) Bucket(keys ...[]byte) *Bucket
func (db *DB) CreateBucket(key []byte) error
func (db *DB) CreateBucketIfNotExists(key []byte) error

// just like DB.Bucket, goes deeper, returns nil for non-existing buckets.
// calling Bucket on nil receivers is valid and returns nil.
func (b *Bucket) Bucket(keys ...[]byte) *Bucket
// gives error if b==nil.
// buckets and keys directly under this bucket are in separate namespaces; that is,
// a bucket can have both a sub-bucket and a data item with the same key.
func (b *Bucket) CreateBucket(key []byte) error
func (b *Bucket) CreateBucketIfNotExists(key []byte) error

// get a key inside this bucket; returns nil if b==nil or key is not found
func (b *Bucket) Get(key []byte) []byte
// walk the bucket; returns nil if b==nil
func (b *Bucket) Cursor() *Cursor

The variadic versions make it easier to use, and guaranteeing behavior for b==nil helps write error checks just at the end. For example, if I have a compound key <timestamp, userid, itemid, mumble>, these two would be interchangeable:

a := db.Bucket([]byte("widgets")).Bucket(timestamp).Bucket(userid).Bucket(itemid).Get(mumble)
b := db.Bucket([]byte("widgets"), timestamp, userid, itemid).Get(mumble)

So .Bucket() becomes a lot like cd, it's a reference deeper into the tree, and using it avoids repeating & re-walking the path there. I don't know much about the internals, if this is feasible at all.

And, well, then you could also support direct Get and Put with hierarchical keys. Using a Bucket is means of 1. avoiding the walking cost 2. often, staying within one transaction.

func (db *DB) Get(keys ...[]byte) []byte
func (b *Bucket) Get(keys ...[]byte) []byte

Also, I switched bucket keys to []byte in the examples. I feel like using both string and []byte for keys in one API misleading.

Anyway, appreciate all the work and please be careful with api growth.

P.S. I've been looking for a way to encode sql-style compound keys in a []byte while preserving sort order. Everything I've been able to find assumes that all variable length fields are nul-terminated, and thus do not contain all possible binary values; and quoting nuls is hard if you're trying to preserve sort order. Hierarchical keys would eliminate the need.

from bolt.

benbjohnson avatar benbjohnson commented on July 23, 2024

@tv42 I think Buckets all the way down is probably the way to go. I like the idea of variadic args too.

One of my concerns with infinitely-nestable buckets is cursor allocations. I wanted to be able to allocate a main cursor and a subcursor initially and just reuse them. Although as long as we can reuse the same cursor at each depth in the hierarchy then I'm fine with n-levels.

You bring up a good point about bucket names being bytes instead of strings. I'll probably have to change that.

from bolt.

kardianos avatar kardianos commented on July 23, 2024

I think []byte is more appropriate for keys due to the cost of copying []byte to string.

from bolt.

tv42 avatar tv42 commented on July 23, 2024

I wrote a library that might be relevant: https://github.com/tv42/compound
not as in BoltDB using it, but because of the underlying trick.

It can combine multiple []byte into a single key, in a way that does not confuse "foo"+"bar" with "foob"+"ar", while preserving sort order.

The trick is: quote 0x00 as 0x00 0xFF, separate items with 0x00 0xNN where NN is any value other than FF. I use that space to store the data type, for extra safety.

So for Bolt, you could make the bucket (or key) space be nested, and join the []byte names with this logic. The API could look like the varargs stuff above.

It seems lmdb internals have a big separation between buckets and keys, not sure if blowing up the number of buckets would hurt too much. Basically, logically you have hierarchical keys:

val := tx.Get(a, b, c, d, e)

but where do you draw the boundary between "buckets" and "keys"? In my mind, the bucket separation is largely an artifice of not having this hierarchy in the first place, so it might not make sense to keep that separation, if the API has this power.. but then again, I don't know much that would change the implementation details.

from bolt.

benbjohnson avatar benbjohnson commented on July 23, 2024

@tv42 Thanks for the link to the library. I'll check it out tomorrow. Just as a heads up, I'm going to be implementing nested keys this week because we're looking to do Bolt integration and production-level testing next week and that's the last missing piece.

My plan right now is to utilize the leafPageElement.flags field to create two new leaf element types: an embedded bucket type and a regular bucket type. (The names of those types will probably change)

The embedded bucket will be used if the subbucket contents are small enough to fit inline on the leaf page. The regular bucket type will be used if the subbucket contents are large and then a root page pointer will be used just like the current bucket type.

I'll add some more details tomorrow. I gotta run.

from bolt.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.