Comments (2)
I would do it differently. The sharding key should be described as part of the DDL, for several reasons:
- it makes inserting tuples with the wrong bucket_id impossible, as opposed to the proposal, where you specify sharding_key as part of the get/put/select/update call
- it lets clients ask how the crud module computes the sharding key and compute it themselves. Computing the sharding key on the client opens up all sorts of optimizations, including sending requests to the correct router when routers and storages are on the same Tarantool instance
So, I'd only give the ability to specify a list of fields that are used to calculate bucket_id, and explicitly document the rules of calculation.
I'd expect the information about sharding keys to be in the _sharding_key space (like what you have with the DDL module). Just make the _sharding_key space a requirement for crud if the user wants sharding keys other than the primary key. As a bonus, this keeps the coupling with the DDL module loose: the contract is exactly the same, but the DDL module is not required, and the user can implement the space themselves if they choose to do so.
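To make the idea concrete, here is a minimal sketch in plain Lua of what "sharding key as part of the schema" could look like. Everything in it is an assumption for illustration: the `sharding_keys` table stands in for the contents of a _sharding_key space, `toy_hash` is a stand-in and not the hash vshard uses, and the function names are hypothetical.

```lua
-- Assumption: total number of buckets in the cluster.
local BUCKET_COUNT = 3000

-- Stand-in string hash for illustration only; NOT the hash used by vshard.
local function toy_hash(str)
    local h = 0
    for i = 1, #str do
        h = (h * 31 + str:byte(i)) % 4294967296
    end
    return h
end

-- Hypothetical registry mirroring what a _sharding_key space could hold:
-- space name -> ordered list of field names.
local sharding_keys = {
    customers = { 'location', 'name' },
}

-- What a client would ask for in order to compute bucket_id itself.
local function sharding_key_fields(space_name)
    return sharding_keys[space_name]
end

-- Computes bucket_id from the declared fields of an object (a table
-- keyed by field name). Returns nil and an error message on failure.
local function compute_bucket_id(space_name, object)
    local fields = sharding_key_fields(space_name)
    if fields == nil then
        return nil, ('no sharding key declared for space %q'):format(space_name)
    end
    local parts = {}
    for i, field in ipairs(fields) do
        local value = object[field]
        if value == nil then
            return nil, ('sharding key field %q is missing'):format(field)
        end
        parts[i] = tostring(value)
    end
    return toy_hash(table.concat(parts, '|')) % BUCKET_COUNT + 1
end

-- Rejects an object whose explicit bucket_id contradicts the declared key.
local function check_bucket_id(space_name, object)
    local expected, err = compute_bucket_id(space_name, object)
    if expected == nil then
        return nil, err
    end
    if object.bucket_id ~= nil and object.bucket_id ~= expected then
        return nil, ('bucket_id %s does not match the sharding key (expected %s)')
            :format(tostring(object.bucket_id), tostring(expected))
    end
    return expected
end
```

Because the field list and the calculation rule are both fixed by the schema, a router and a client that share them would arrive at the same bucket_id, and an insert with a mismatching bucket_id can be refused before it reaches a storage.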
from crud.
RFC
Currently, CRUD supports sharding only by primary key value.
But the vshard documentation says that bucket_id "can be assigned in arbitrary way by client application".
So, here we are.
1. Specifying the way bucket_id is computed.
I see two different ways to specify bucket_id:
- Specify bucket_id_func that accepts tuple (and maybe space object) and computes bucket_id. The default function is:
function default_bucket_id(tuple, space)
    -- extract the primary key (index 0) from the tuple
    local key = utils.extract_key(tuple, space.index[0].parts)
    -- hash the key to get the bucket
    local bucket_id = vshard.router.bucket_id_strcrc32(key)
    return bucket_id
end
- The other way is to specify a sharding key (which in general need not be an index, it can be any set of fields) and a function that maps the key to bucket_id:
sharding_key = 'location', -- index or field name or something else? What about a set of fields?
bucket_id_func = vshard.router.bucket_id -- function reference, called with the key
It seems that the second approach is better: it allows us to provide clear error messages (rather than something like "failed to index space.index[5] - a nil value").
But for now I don't see a clear way to specify the sharding key.
The first approach seems more general, and it respects what the vshard documentation says about bucket_id: "can be assigned in arbitrary way by client application".
Whichever approach we choose, I call it sharding_opts below.
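For comparison, here is a rough sketch of how both forms of sharding_opts could be normalized into a single function from tuple to bucket_id. It is plain Lua under stated assumptions: the `customers` table is a hypothetical stand-in for a space object, `toy_hash` is not the vshard hash, and `make_bucket_id_func` is an illustrative name, not an existing crud function. It also shows where the second approach gets its clearer error message.

```lua
local BUCKET_COUNT = 3000  -- assumption: total number of buckets

-- Stand-in string hash for illustration only; NOT the hash used by vshard.
local function toy_hash(str)
    local h = 0
    for i = 1, #str do
        h = (h * 31 + str:byte(i)) % 4294967296
    end
    return h
end

-- Hypothetical space description (in Tarantool it would come from box.space).
local customers = {
    name = 'customers',
    format = { { name = 'id' }, { name = 'bucket_id' }, { name = 'name' }, { name = 'location' } },
    primary_key = { 1 },  -- field numbers of the primary index parts
}

-- Resolves field names to field numbers, failing early with a clear message.
local function field_numbers(space, field_names)
    local by_name = {}
    for fieldno, field in ipairs(space.format) do
        by_name[field.name] = fieldno
    end
    local numbers = {}
    for i, name in ipairs(field_names) do
        if by_name[name] == nil then
            error(('space %q has no field %q to use in the sharding key')
                :format(space.name, name), 0)
        end
        numbers[i] = by_name[name]
    end
    return numbers
end

-- Default "function by key": hash the stringified key parts.
local function default_key_hash(key)
    local parts = {}
    for i, value in ipairs(key) do
        parts[i] = tostring(value)
    end
    return toy_hash(table.concat(parts, '|')) % BUCKET_COUNT + 1
end

-- Turns either form of sharding_opts into function(tuple) -> bucket_id.
local function make_bucket_id_func(space, sharding_opts)
    sharding_opts = sharding_opts or {}
    local key_names = sharding_opts.sharding_key
    local func = sharding_opts.bucket_id_func
    if key_names == nil and func ~= nil then
        -- First approach: bucket_id_func(tuple, space).
        return function(tuple) return func(tuple, space) end
    end
    -- Second approach: sharding key + function by key.
    -- With no options at all this falls back to the primary key.
    local fieldnos = key_names and field_numbers(space, key_names) or space.primary_key
    func = func or default_key_hash
    return function(tuple)
        local key = {}
        for i, fieldno in ipairs(fieldnos) do
            if tuple[fieldno] == nil then
                error(('tuple has no value for field #%d of the sharding key'):format(fieldno), 0)
            end
            key[i] = tuple[fieldno]
        end
        return func(key)
    end
end
```

With the second form, a misspelled field name fails when the options are resolved, before any tuple is touched, which is the property argued for above.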
2. Usage
Insert (also replace, upsert)
It seems that everything is OK here. Insert accepts the whole tuple and sharding_opts, so it can easily compute bucket_id and insert the tuple.
crud.insert('customers', tuple, { sharding_opts = '<some-magic>' })
-- tuple + sharding_opts -> bucket_id
Get (also update and delete)
Get accepts primary key, which isn't a sharding key in general. What can we do?
We can support different scenarios:
- Default scenario - no sharding_opts specified:
crud.get('customers', primary_key)
-- primary_key + default_sharding_opts -> bucket_id
- sharding_opts are specified:
crud.get('customers', primary_key, { sharding_opts = '<some-magic>' })
If sharding_opts isn't the default value, we perform an honest map-reduce - a get by the specified primary_key on all replica sets.
XXX: What should we do if two values are found on different storages?
- sharding_opts and bucket_id are specified (for the bucket_id_func(tuple, space) approach):
crud.get('customers', primary_key, { sharding_opts = '<some-magic>', bucket_id = '<bucket-id>' })
We just use the specified bucket_id value to select one replica set.
XXX: It's a bit strange for get - if we want to get a tuple by ID, how could we know bucket_id? But it seems to be OK for delete and update.
- sharding_opts and sharding_key are specified (for the sharding_key + bucket_id_func(key) approach):
crud.get('customers', primary_key, { sharding_opts = '<some-magic>', sharding_key = '<a-key-to-pass-to-bucket-id-func>' })
If sharding_opts isn't the default value, we use the specified sharding_key to compute the bucket_id value.
XXX: It's a bit strange for get - if we want to get a tuple by ID, how could we know sharding_key? But it seems to be OK for delete and update.
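The four scenarios above can be sketched as a small routing decision. This is plain Lua with hypothetical names (`plan_get`, `get_map_reduce`), and replica sets are modelled as in-memory tables keyed by primary key. Returning an error when the same key is found on more than one replica set is only one possible answer to the open question above, not a decision made by this RFC.

```lua
-- Decides how a get should be routed.
-- Returns { mode = 'single', bucket_id = ... } or { mode = 'map_reduce' }.
-- default_bucket_id is the function used when sharding by primary key.
local function plan_get(primary_key, opts, default_bucket_id)
    opts = opts or {}
    if opts.bucket_id ~= nil then
        -- bucket_id is given explicitly: go straight to one replica set.
        return { mode = 'single', bucket_id = opts.bucket_id }
    end
    if opts.sharding_opts == nil then
        -- Default scenario: the primary key is the sharding key.
        return { mode = 'single', bucket_id = default_bucket_id(primary_key) }
    end
    if opts.sharding_key ~= nil then
        -- The sharding key is given: compute bucket_id from it.
        local bucket_id = opts.sharding_opts.bucket_id_func(opts.sharding_key)
        return { mode = 'single', bucket_id = bucket_id }
    end
    -- Custom sharding, nothing to route by: ask every replica set.
    return { mode = 'map_reduce' }
end

-- Map-reduce get over in-memory "replica sets" (name -> { [pk] = tuple }).
-- Assumption: finding the key on several replica sets is reported as an error.
local function get_map_reduce(replicasets, primary_key)
    local found = {}
    for name, storage in pairs(replicasets) do
        local tuple = storage[primary_key]
        if tuple ~= nil then
            found[#found + 1] = { replicaset = name, tuple = tuple }
        end
    end
    if #found > 1 then
        return nil, ('key %s found on %d replica sets')
            :format(tostring(primary_key), #found)
    end
    return found[1] and found[1].tuple
end
```

Only the last branch pays the cost of visiting every replica set, so the more the caller knows (bucket_id or sharding_key), the cheaper the request becomes.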
3. Danger
While we are sharding by primary key, Tarantool saves us from modifying the sharding key.
But once we start using a custom sharding key, there is no guarantee that it won't be changed accidentally.
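One way to reduce this risk would be to reject update operations that touch sharding key fields before they reach a storage. The sketch below is plain Lua and makes two assumptions: operations have the usual {operator, field, value} shape, and fields are referenced by name. `check_update` is a hypothetical helper, not part of crud.

```lua
-- Returns true if none of the update operations touches a sharding key field,
-- otherwise nil and an error message.
-- Assumption: each operation is { operator, field_name, value }.
local function check_update(operations, sharding_key_fields)
    local protected = {}
    for _, field in ipairs(sharding_key_fields) do
        protected[field] = true
    end
    for _, operation in ipairs(operations) do
        local field = operation[2]
        if protected[field] then
            return nil, ('update would modify sharding key field %q')
                :format(tostring(field))
        end
    end
    return true
end

-- Usage: the first update is fine, the second one is refused.
print(check_update({ { '=', 'name', 'Ivan' } }, { 'location' }))
print(check_update({ { '=', 'location', 'Kazan' } }, { 'location' }))
```

A check like this only covers updates that go through the router; it does not protect against direct writes on a storage.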