Use cases: a) customer wants to see counts of templates in message template catalo

Implement count() method about crud HOT 15 CLOSED

tarantool commented on August 11, 2024

Implement count() method

from crud.

Comments (15)

knazarov commented on August 11, 2024

I would agree to add count() and bsize() methods for the whole space, but I won't support adding counts by a condition. In too many cases this will be a full scan.

In almost all cases I know, it is safe to not show the total count of whatever you return to the user.

akudiyar commented on August 11, 2024

len() and bsize() don't help if the customer wants to count filtered tuples without actually loading them to the client (over the network, in the general case).

A full scan on a sharded space is not a big deal if it is not done often. We cannot avoid full scans completely in other cases; developers will always need to know how it works.

knazarov commented on August 11, 2024

It is a big deal because it stops other things from accessing a database. And yes, we definitely can avoid full scans. We just won't allow them through the API.

akudiyar commented on August 11, 2024

If we don't allow full scans, they will not disappear from the customers' tasks. The pain will just shift to another place.

We need some kind of support for such tasks to be able to implement these things in connectors.

knazarov commented on August 11, 2024

Your statement is demonstrably false. Aerospike and Redis can exist without such queries. They have the ability to iterate over the collection on the client the same way we propose to do with select or pairs.

To count the items of a large collection you can create a separate space with counters. With interactive transactions, you can atomically update both of those spaces from the client. This will not require you to write any additional code.

If you have only a few items to count, you can just select them all.
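A minimal sketch of the counter-space idea described above, written in Python rather than Tarantool Lua so it can run standalone. The class `CountedSpace` and its methods are hypothetical and are not part of crud; two in-memory dictionaries stand in for the data space and the counters space, and the lock only models the atomic update of both that interactive transactions would provide.

```python
import threading
from collections import Counter


class CountedSpace:
    """Toy stand-in for a data space plus a separate counters space."""

    def __init__(self, counted_field):
        self.counted_field = counted_field
        self.tuples = {}               # primary key -> record (the data space)
        self.counters = Counter()      # field value -> count (the counters space)
        self._lock = threading.Lock()  # models one transaction over both spaces

    def replace(self, key, record):
        # Insert or overwrite a record and keep the counters in step.
        with self._lock:
            old = self.tuples.get(key)
            if old is not None:
                self._decrement(old[self.counted_field])
            self.tuples[key] = record
            self.counters[record[self.counted_field]] += 1

    def delete(self, key):
        with self._lock:
            old = self.tuples.pop(key, None)
            if old is not None:
                self._decrement(old[self.counted_field])

    def _decrement(self, value):
        self.counters[value] -= 1
        if self.counters[value] == 0:
            del self.counters[value]  # do not keep empty counters around

    def count(self, value):
        # A single lookup in the counters space; the data space is not scanned.
        return self.counters.get(value, 0)
```

The trade-off is that every write pays for the counter update, and only values of the field chosen in advance can be counted this way.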

akudiyar commented on August 11, 2024

But counters in special spaces look like an implementation detail. Why can't we have a count() method in the CRUD API that does all this boilerplate under the hood?

I see that for every simple task like count that may involve scan complexity, we are going to push customers to reinvent the wheel. And connectors cannot help avoid this, because there is no DDL API for now.

UPD: There is also the problem that the CRUD API doesn't rely on any DDL API at the moment either.

no1seman commented on August 11, 2024

It seems it's time to triage this once more, because we have the following use case:
The user has to get a count by arbitrary conditions, and the user agrees that the result will not be accurate.

So I suggest making count_async:

arguments and options like crud.select/crud.pairs;
implement storage_count_async as a loop over pairs that counts the rows in the space, yielding every batch_size rows;
the router must call storage_count_async on all replicasets.

To avoid any locks and slowdowns, implement a mutex that guarantees storage_count_async runs no more than N times simultaneously on each storage.
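The proposal above can be sketched roughly as follows. This is a Python asyncio model, not Tarantool code: `Storage`, `router_count_async`, and the value of `MAX_CONCURRENT_COUNTS` are assumptions made for illustration, `asyncio.sleep(0)` stands in for a fiber yield, and a semaphore plays the role of the proposed mutex.

```python
import asyncio

MAX_CONCURRENT_COUNTS = 2  # the "N" from the proposal; the value is an assumption


class Storage:
    """Toy stand-in for one storage replicaset holding a list of rows."""

    def __init__(self, rows):
        self.rows = rows
        self._slots = None  # semaphore, created lazily inside the event loop

    async def storage_count_async(self, predicate, batch_size=100):
        if self._slots is None:
            self._slots = asyncio.Semaphore(MAX_CONCURRENT_COUNTS)
        # At most MAX_CONCURRENT_COUNTS counts run on this storage at once.
        async with self._slots:
            count = 0
            for seen, row in enumerate(self.rows, start=1):
                if predicate(row):
                    count += 1
                if seen % batch_size == 0:
                    # Give other tasks a chance to run every batch_size rows.
                    await asyncio.sleep(0)
            return count


async def router_count_async(storages, predicate, batch_size=100):
    # Ask every storage for its partial count, then sum the results.
    partials = await asyncio.gather(
        *(s.storage_count_async(predicate, batch_size) for s in storages)
    )
    return sum(partials)
```

In a live system rows may be added or removed while a storage is yielding, which is why the result would only be approximate; in this static sketch the count happens to be exact.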

artur-barsegyan commented on August 11, 2024

@no1seman
Here we are solving a special case of a general problem: map-reduce calls for crud.
I suggest thinking about this in the direction of sending a stored procedure with a special contract for its return value and calling this procedure from the router.

Because, for example, there is still a frequent task on the cluster: writing a set of data to a storage in a transaction. And in this transaction on the storage, you need to perform many different operations.

It is not necessary to send the procedure code through crud; you can simply teach it to call a procedure that already exists on the storage.

unera commented on August 11, 2024

local count = crud.count({{'=', 'status', 'NEW'}})

Let's do something like

crud.count({ '=', 'status', 'NEW' }, {options})

Where options:

sec_scan, default value is false

Implementation:

count looks through the space indexes and finds an index for status.

If the index is found, count iterates using it.
If the index is not found, count iterates using the pk if options.sec_scan == true.

The same for bsize, pairs.
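A minimal Python sketch of the index-selection rule described above. The dictionary layout of `space` is a hypothetical stand-in for a Tarantool space and its indexes, only equality conditions are handled, and raising an error when no index exists and sec_scan is off is an assumption, since the proposal does not say what should happen in that case.

```python
def count(space, condition, sec_scan=False):
    """Count records matching an equality condition.

    space is a dict with two keys (a layout invented for this sketch):
      "tuples":  {primary_key: record}
      "indexes": {field_name: {field_value: [primary_key, ...]}}
    """
    op, field, value = condition
    if op != "=":
        raise ValueError("this sketch only handles equality conditions")

    index = space["indexes"].get(field)
    if index is not None:
        # An index on the field exists: count through it, no full scan.
        return len(index.get(value, []))

    if not sec_scan:
        # Assumption: refuse to scan unless the caller opted in.
        raise LookupError(f"no index on {field!r} and sec_scan is disabled")

    # Fall back to walking the primary key and filtering each record.
    return sum(1 for record in space["tuples"].values() if record.get(field) == value)
```

With this shape, a condition on an unindexed field fails fast by default, and the caller has to pass `sec_scan=True` explicitly to accept the cost of a full scan.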

no1seman commented on August 11, 2024

@unera Why not use the same API as select/pairs? The main difference from select/pairs: count does not return data and does its work with yields. So it seems we need the following options:
batch_size (the number of pairs iterations after which to yield)
use_box_count: in some cases, for example when the space is not huge, we may need to count precisely by index, but if the space is huge we need to count approximately with yields (this option may be automatic, because we can get the len of the space on this particular instance; if it is larger than COUNT_HARD_LIMIT, we have to fall back to the approximate algorithm)
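The automatic choice described for use_box_count could look roughly like this Python sketch. The function name `choose_count_mode` and the value of `COUNT_HARD_LIMIT` are assumptions; the thread names the constant but does not give it a value.

```python
COUNT_HARD_LIMIT = 1_000_000  # hypothetical threshold; not specified in the thread


def choose_count_mode(space_len, use_box_count=None, hard_limit=COUNT_HARD_LIMIT):
    """Return "exact" (count precisely by index) or "approximate" (loop with yields)."""
    if use_box_count is not None:
        # The caller set the option explicitly, so respect it.
        return "exact" if use_box_count else "approximate"
    # Automatic mode: fall back to the approximate algorithm for large spaces.
    return "exact" if space_len <= hard_limit else "approximate"
```

The decision would be made per storage instance, since each instance only knows the length of its own part of the space.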

Mons commented on August 11, 2024

One more way to kill the whole cluster with one wrong query.
Full scans and filters are pure evil for Tarantool.

unera commented on August 11, 2024

@no1seman

Why not to use the same API as select/pairs?

I agree :)

I didn't think that the question and select/pairs are different.

So, let's do it like select/pairs. Drop my comment from 1 Oct.

unera commented on August 11, 2024

like here

local objects, err = crud.count(space_name, conditions, opts)

The syntax is the same, excluding these options:

first
after
batch_size
fields

no1seman commented on August 11, 2024

@unera batch_size may be used as the number of pairs iterations between yields, or there may be some other option for that.

R-omk commented on August 11, 2024

What about this case:
select count(field) from t1 when the field can be nullable?

Instead of inventing one more non-working 'killer feature', can we make a general map/reduce?
