knazarov commented on September 15, 2024

I would do it differently. Sharding key should be described as a part of DDL, for multiple reasons:

  • it will make inserting tuples with the wrong bucket_id impossible, as opposed to the proposal, where you specify sharding_key as part of the get/put/select/update call
  • it will make it possible for clients to ask how the crud module computes the sharding key, and compute it themselves
  • computing the sharding key on the client opens up all sorts of optimizations, including sending requests to the correct router when routers and storages are on the same Tarantool instance

So, I'd only give the ability to specify a list of fields that are used to calculate bucket_id and make the rules of calculation explicitly documented.

I'd expect the information about sharding keys to be in the _sharding_key space (like what you have with the DDL module). Just make the _sharding_key space a requirement for crud if the user wants sharding keys other than the primary key. As a bonus, this keeps crud loosely coupled with the DDL module: the contract is exactly the same, but the DDL module itself is not required, and users can implement the space themselves if they choose to do so.
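
To make the idea concrete, here is a minimal sketch in plain Lua of computing bucket_id from a declared list of fields. Everything in it is an assumption for illustration: the `sharding_keys` table only mimics what a record in a _sharding_key space might carry, `customers_format` is a made-up space format, and `toy_hash` stands in for a real vshard hash such as vshard.router.bucket_id_strcrc32.

```lua
-- Hypothetical sharding metadata: space name -> ordered list of field names.
local sharding_keys = {
    customers = { 'location' },
}

-- Hypothetical space format: field name -> position in the tuple.
local customers_format = { id = 1, name = 2, location = 3 }

-- Stand-in hash for illustration only; a real router would call a vshard
-- function such as vshard.router.bucket_id_strcrc32 instead.
local function toy_hash(str)
    local h = 0
    for i = 1, #str do
        h = (h * 31 + str:byte(i)) % 4294967296
    end
    return h
end

-- Collect the sharding key values from the tuple by field name.
-- Returns nil and a readable error if a declared field is unknown.
local function extract_sharding_key(tuple, field_names, format)
    local key = {}
    for i, name in ipairs(field_names) do
        local fieldno = format[name]
        if fieldno == nil then
            return nil, ('sharding key field %q is not in the space format'):format(name)
        end
        key[i] = tostring(tuple[fieldno])
    end
    return key
end

-- Map a tuple to a bucket in the range 1..bucket_count.
local function bucket_id_for(tuple, space_name, format, bucket_count)
    local key, err = extract_sharding_key(tuple, sharding_keys[space_name], format)
    if key == nil then
        return nil, err
    end
    return toy_hash(table.concat(key, '\0')) % bucket_count + 1
end
```

Because the rule depends only on the declared fields, two tuples with the same location land in the same bucket regardless of their primary keys, and a client that knows the field list and the hash can compute the bucket itself.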

from crud.

dokshina commented on September 15, 2024

RFC

Currently CRUD supports sharding only by primary key value.
But vshard documentation says that bucket_id "can be assigned in arbitrary way by client application".
So, here we are.

1. Specifying the way bucket_id is computed.

I see two different ways to specify bucket_id:

  • Specify a bucket_id_func that accepts a tuple (and maybe a space object) and computes bucket_id.
    The default function is:

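    local vshard = require('vshard')
    local utils = require('crud.common.utils') -- assumed path of crud's internal helpers

    -- Default: shard by the primary key (index 0) of the tuple.
    function default_bucket_id(tuple, space)
      local key = utils.extract_key(tuple, space.index[0].parts)
      local bucket_id = vshard.router.bucket_id_strcrc32(key)
      return bucket_id
    end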
  • The other way is to specify a sharding key (which in general need not be an index, but can be a set of fields) and a function that computes bucket_id from that key:

    sharding_key = 'location', -- index, field name, or something else? what about a set of fields?
    bucket_id_func = vshard.router.bucket_id -- pass the function itself, do not call it

It seems that the second approach is better: it allows us to provide clear error messages (rather than something like "failed to index space.index[5] - a nil value").
But for now I don't see a clear way to specify the sharding key.

The first approach seems more general and it respects what vshard says about bucket_id: "can be assigned in arbitrary way by client application".

Whichever approach we choose, I call it sharding_opts below.

2. Usage

Insert (also replace, upsert)

It seems that everything is OK here. Insert accepts the whole tuple and sharding_opts. So, it can easily compute bucket_id and insert a tuple.

crud.insert('customers', tuple, { sharding_opts = '<some-magic>' })
-- tuple + sharding_opts -> bucket_id

Get (also update and delete)

Get accepts primary key, which isn't a sharding key in general. What can we do?

We can support different scenarios:

  1. Default scenario - no sharding_opts specified:
crud.get('customers', primary_key)
-- primary_key + default_sharding_opts -> bucket_id
  2. sharding_opts are specified:
crud.get('customers', primary_key, { sharding_opts = '<some-magic>' })

If sharding_opts isn't the default value, we perform an honest map-reduce: a get by the specified primary_key on all replica sets.

XXX: What should we do if two values on different storages are found?

  3. sharding_opts and bucket_id are specified (for the bucket_id_func(tuple, space) approach):
crud.get('customers', primary_key, { sharding_opts = '<some-magic>', bucket_id = '<bucket-id>' })

We just use the specified bucket_id value to select one replica set.

XXX: It's a bit strange for get - if we want to get tuple by ID, how could we know bucket_id? But it seems to be OK for delete and update.

  4. sharding_opts and sharding_key are specified (for the sharding_key + bucket_id_func(key) approach):
crud.get('customers', primary_key, { sharding_opts = '<some-magic>', sharding_key = '<a-key-to-pass-to-bucket-id-func>' })

If sharding_opts isn't the default value, we use the specified sharding_key to compute the bucket_id value.

XXX: It's a bit strange for get - if we want to get tuple by ID, how could we know sharding_key? But it seems to be OK for delete and update.
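
The map-reduce fallback from the second scenario can be sketched as follows. This is plain Lua with in-memory tables standing in for replica sets, so `replicasets` and `map_reduce_get` are hypothetical names, not crud or vshard API. Returning an error when the key is found on more than one replica set is only one possible answer to the open question above, not a decision made in this RFC.

```lua
-- Each entry simulates one replica set's storage: primary key -> tuple.
local replicasets = {
    rs1 = { [1] = { 1, 'Ann', 'Paris' } },
    rs2 = { [2] = { 2, 'Bob', 'Oslo' } },
}

-- Map step: ask every replica set for the key.
-- Reduce step: merge the answers into a single result.
local function map_reduce_get(storages, primary_key)
    local found = {}
    for name, storage in pairs(storages) do
        local tuple = storage[primary_key]
        if tuple ~= nil then
            table.insert(found, { replicaset = name, tuple = tuple })
        end
    end
    if #found > 1 then
        -- Assumed policy: treat duplicates as an error instead of picking one.
        return nil, ('key %s found on %d replica sets'):format(tostring(primary_key), #found)
    end
    if found[1] == nil then
        return nil -- not found anywhere
    end
    return found[1].tuple
end
```

The cost is one request per replica set for every get, which is why passing bucket_id or sharding_key explicitly (scenarios 3 and 4) is attractive when the caller knows them.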

3. Danger

While we shard by primary key, Tarantool protects us from modifying the sharding key.
But once we start using a custom sharding key, there is no guarantee that it won't be changed accidentally.
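
One way to reduce this risk would be to reject update operations that touch sharding key fields before they reach the storage. The sketch below is an assumption, not something crud is stated to do here: `check_update_keeps_sharding_key` is a hypothetical helper, and it only handles operations that address fields by name, in the usual Tarantool update form {operator, field, value}.

```lua
-- Returns true if no update operation targets a sharding key field,
-- otherwise nil and an error message.
local function check_update_keeps_sharding_key(operations, sharding_key_fields)
    -- Build a lookup set of protected field names.
    local protected = {}
    for _, name in ipairs(sharding_key_fields) do
        protected[name] = true
    end
    for _, op in ipairs(operations) do
        local field = op[2] -- second element of an update operation is the field
        if protected[field] then
            return nil, ('update would modify sharding key field %q'):format(field)
        end
    end
    return true
end

-- Usage: the first call passes, the second is rejected.
local ok = check_update_keeps_sharding_key({ { '=', 'name', 'Ann' } }, { 'location' })
local rejected, reason = check_update_keeps_sharding_key({ { '=', 'location', 'Oslo' } }, { 'location' })
```

Operations that address fields by number would need the space format to be resolved to names first, which this sketch leaves out.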
