knazarov commented on August 11, 2024

I would agree to adding count() and bsize() methods for the whole space, but I won't support counting by a condition: in too many cases that would be a full scan.

In almost all cases I know of, it is safe not to show the user the total count of whatever you return.

from crud.

akudiyar commented on August 11, 2024

len() and bsize() don't help if the customer wants to count filtered tuples without actually loading them onto the client (over the network, in the general case).

A full scan on a sharded space is not a big deal if it is not done often. We cannot avoid full scans completely in other cases either; developers will always need to know how this works.

knazarov commented on August 11, 2024

It is a big deal, because it stops other things from accessing the database. And yes, we definitely can avoid full scans: we just won't allow them through the API.

akudiyar commented on August 11, 2024

If we don't allow full scans, they will not disappear from customers' tasks; the pain will just shift elsewhere.

We need some kind of support for such tasks to be able to implement these things in connectors.

knazarov commented on August 11, 2024

Your statement is demonstrably false. Aerospike and Redis get by without such queries: they let the client iterate over a collection, the same way we propose to do with select or pairs.

To count the items of a large collection you can create a separate space with counters. With interactive transactions, you can atomically update both of those spaces from the client. This will not require you to write any additional code.
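A minimal sketch of the counter-space idea, in plain Lua. The two tables stand in for the data space and the counter space, and every name here (`orders`, `counters`, `put_order`, `delete_order`, `count_by_status`) is hypothetical. With real spaces, the two writes in each function would go into one interactive transaction; this sketch has no concurrency, so it only shows the bookkeeping that keeps the counter in step with the data.

```lua
-- Hypothetical stand-ins for two spaces: the data and the per-status counters.
local orders = {}    -- id -> { id = ..., status = ... }
local counters = {}  -- status -> number of orders with that status

local function bump(status, delta)
  counters[status] = (counters[status] or 0) + delta
end

-- Every write touches both "spaces". With real spaces, both writes would
-- go into one transaction so the counter never drifts from the data.
local function put_order(id, status)
  local old = orders[id]
  if old ~= nil then
    bump(old.status, -1)  -- replacing a record moves it out of its old bucket
  end
  orders[id] = { id = id, status = status }
  bump(status, 1)
end

local function delete_order(id)
  local old = orders[id]
  if old == nil then return end
  orders[id] = nil
  bump(old.status, -1)
end

-- Counting is a single lookup instead of a scan.
local function count_by_status(status)
  return counters[status] or 0
end

put_order(1, 'NEW')
put_order(2, 'NEW')
put_order(3, 'DONE')
put_order(2, 'DONE')  -- a status change updates both counters
delete_order(3)
print(count_by_status('NEW'), count_by_status('DONE'))
```

The cost moves to the write path: every insert, update, and delete pays for one extra counter update, and only counts that were planned for in advance are available.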

If you have only a few items to count, you can just select them all.

akudiyar commented on August 11, 2024

But counters in special spaces look like an implementation detail. Why can't we have a count() method in the CRUD API that does all this boilerplate under the hood?

I see that for every simple task like count, which may involve a scan, we are going to push customers to reinvent the wheel. And connectors cannot help avoid this, because there is no DDL API for now.

UPD: Another problem is that the CRUD API doesn't rely on any DDL API at the moment either.

no1seman commented on August 11, 2024

It seems it's time to triage this once more, because we have the following use case:
the user has to get a count by arbitrary conditions and accepts that the result will not be accurate.

So I suggest implementing count_async:

  • arguments and options like crud.select/crud.pairs;
  • implement storage_count_async as a loop over pairs that counts the rows in the space, yielding every batch_size rows;
  • the router must call storage_count_async on all replicasets.

To avoid locks and slowdowns, implement a mutex (in effect, a counting semaphore) guaranteeing that storage_count_async runs no more than N times simultaneously on each storage.
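The proposal can be sketched in plain Lua as a hypothetical model, not crud's implementation: storages are in-memory lists, `coroutine.yield()` stands in for a fiber yield, a per-storage `running` counter stands in for the limit of N, and the router drives the storage jobs itself instead of calling replicasets over the network. All names are assumptions. Because the scan yields between batches, writes can land mid-scan, which is why the count is only approximate.

```lua
-- Hypothetical model: each storage is an in-memory list of tuples, and
-- coroutine.yield() plays the role of a fiber yield on a real storage.
local MAX_CONCURRENT = 2  -- "N" from the proposal; the value is an assumption

local function new_storage(tuples)
  return { tuples = tuples, running = 0 }
end

-- Storage side: count matching tuples, yielding every batch_size tuples.
-- Must run inside a coroutine, because it yields.
local function storage_count_async(storage, predicate, batch_size)
  if storage.running >= MAX_CONCURRENT then
    return nil, 'too many concurrent counts'
  end
  storage.running = storage.running + 1
  local count = 0
  for i, tuple in ipairs(storage.tuples) do
    if predicate(tuple) then
      count = count + 1
    end
    if i % batch_size == 0 then
      coroutine.yield()  -- let other requests run between batches
    end
  end
  storage.running = storage.running - 1
  return count
end

-- Router side: start a count on every storage, drive the jobs round-robin,
-- and sum the partial results.
local function router_count(storages, predicate, batch_size)
  local jobs = {}
  for _, storage in ipairs(storages) do
    jobs[#jobs + 1] = coroutine.create(function()
      return storage_count_async(storage, predicate, batch_size)
    end)
  end
  local total, first_err, pending = 0, nil, #jobs
  while pending > 0 do
    for i, job in ipairs(jobs) do
      if job then
        local ok, count, err = coroutine.resume(job)
        assert(ok, count)  -- re-raise an error thrown inside a job
        if coroutine.status(job) == 'dead' then
          if count == nil then
            first_err = first_err or err
          else
            total = total + count
          end
          jobs[i] = false
          pending = pending - 1
        end
      end
    end
  end
  if first_err ~= nil then
    return nil, first_err
  end
  return total
end
```

A larger `batch_size` makes each storage yield less often (faster per count, but holding the storage longer); a smaller one does the opposite. A real implementation would also have to release the concurrency slot if the scan raises an error.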

artur-barsegyan commented on August 11, 2024

@no1seman
Here we are solving a special case of a more general problem: a map-reduce call in crud.
I suggest moving toward sending a stored procedure with a special contract for its return value and calling that procedure from the router.

For example, a frequent task on a cluster is still to write a set of data on a storage in a single transaction, performing many different operations inside that transaction.

It is not necessary to send the procedure code through crud; crud could simply be taught to call a stored procedure that already exists on the storage.
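A plain-Lua sketch of the "call an existing procedure, then reduce" idea. The storage model, the procedure registry, the return contract (a partial result, or nil plus an error), and the names `make_storage`, `count_where`, and `map_reduce` are all hypothetical; in a real cluster the map step would be a remote call to each replicaset.

```lua
local unpack = table.unpack or unpack  -- Lua 5.1/LuaJIT vs 5.2+

-- Hypothetical storage with a registry of procedures that already exist on it.
-- Assumed contract: a procedure returns a partial result, or nil plus an error.
local function make_storage(tuples)
  local storage = { tuples = tuples, procedures = {} }
  storage.procedures.count_where = function(field, value)
    local n = 0
    for _, tuple in ipairs(storage.tuples) do
      if tuple[field] == value then
        n = n + 1
      end
    end
    return n
  end
  return storage
end

-- Router side: "map" calls the named procedure on every storage,
-- "reduce" folds the partial results into one value.
local function map_reduce(storages, proc_name, args, reduce, init)
  local acc = init
  for _, storage in ipairs(storages) do
    local proc = storage.procedures[proc_name]
    if proc == nil then
      return nil, 'no such procedure: ' .. proc_name
    end
    local part, err = proc(unpack(args))
    if part == nil then
      return nil, err
    end
    acc = reduce(acc, part)
  end
  return acc
end
```

With `reduce = function(a, b) return a + b end` and `init = 0`, calling `map_reduce(storages, 'count_where', { 'status', 'NEW' }, reduce, 0)` sums the per-storage counts. The router code stays the same for other procedures, such as a transactional write on each storage; only the procedure name and the reduce step change.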

unera commented on August 11, 2024

local count = crud.count({{'=', 'status', 'NEW'}})

Let's do something like

crud.count({{ '=', 'status', 'NEW' }}, {options})

where the options are:

  • sec_scan, default value is false

Implementation:

count looks through the space indexes and finds an index on status.

  • If the index is found, count iterates using it.
  • If no such index is found, count iterates over the primary key, but only if options.sec_scan == true.

The same applies to bsize and pairs.
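The index lookup described above can be sketched in plain Lua. The `indexes` table format and `choose_index` are hypothetical; a real implementation would read the space's index definitions from the schema instead.

```lua
-- Hypothetical schema description: indexes are listed in order, the first
-- one is the primary key, and parts lists the indexed fields.
local function choose_index(indexes, field, opts)
  opts = opts or {}
  for _, index in ipairs(indexes) do
    -- Only an index whose first part is the field can serve the condition.
    if index.parts[1] == field then
      return index.name
    end
  end
  if opts.sec_scan == true then
    return indexes[1].name  -- explicit opt-in: full scan over the primary key
  end
  return nil, string.format('no index on field %q; set sec_scan = true to scan', field)
end
```

Defaulting `sec_scan` to false makes a full scan something the caller has to ask for explicitly; the same check could gate bsize and pairs.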

no1seman commented on August 11, 2024

@unera Why not use the same API as select/pairs? The main difference from select/pairs is that count doesn't fetch data and works with yields. So it seems we need the following options:
batch_size (the number of pairs iterations after which to yield);
use_box_count: in some cases, for example a space that is not huge, we may need to count precisely by index, but if the space is huge we need to count approximately, with yields. (This option could be automatic: we can get the len of the space on this particular instance, and if it is larger than COUNT_HARD_LIMIT, fall back to the approximate algorithm.)
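The automatic choice in the last option could look like the following sketch. The value of `COUNT_HARD_LIMIT` and the name `pick_count_mode` are assumptions; on a real instance `space_len` would come from the space's length (for example `space:len()`), and the exact path would be a single count by index.

```lua
local COUNT_HARD_LIMIT = 100000  -- assumed threshold; the thread fixes no value

-- Choose how one storage instance should count.
-- space_len: number of tuples in the space on this instance.
local function pick_count_mode(space_len, opts)
  opts = opts or {}
  if opts.use_box_count ~= nil then
    -- An explicit option wins over the automatic choice.
    return opts.use_box_count and 'exact' or 'approximate'
  end
  if space_len > COUNT_HARD_LIMIT then
    return 'approximate'  -- scan with a yield every batch_size tuples
  end
  return 'exact'          -- count precisely by index, in one go
end
```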

Mons commented on August 11, 2024

One more way to kill the whole cluster with one wrong query.
Full scans and filters are pure evil for Tarantool.

unera commented on August 11, 2024

@no1seman

Why not use the same API as select/pairs?

I agree :)

I didn't think of this question as different from select/pairs.

So, let's do it like select/pairs. Disregard my comment from 1 Oct.

unera commented on August 11, 2024

Like this:

local count, err = crud.count(space_name, conditions, opts)

The syntax is the same as for select/pairs, excluding these options:

  • first
  • after
  • batch_size
  • fields
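To make "the same syntax" concrete, here is a hypothetical in-memory count that takes select-style conditions and rejects the options listed above. The operator subset, the tuple-as-table representation, and the names `local_count`, `matches`, and `UNSUPPORTED_OPTS` are assumptions for illustration, not crud's code; the real call would also take a space name and run on every storage.

```lua
-- Options that make sense for select/pairs but not for a count,
-- per the list above.
local UNSUPPORTED_OPTS = { first = true, after = true, batch_size = true, fields = true }

-- Hypothetical evaluator for a subset of select-style operators.
-- Each condition is {operator, field_name, value}.
local OPS = {
  ['=']  = function(a, b) return a == b end,
  ['=='] = function(a, b) return a == b end,
  ['<']  = function(a, b) return a < b end,
  ['<='] = function(a, b) return a <= b end,
  ['>']  = function(a, b) return a > b end,
  ['>='] = function(a, b) return a >= b end,
}

local function matches(tuple, conditions)
  for _, cond in ipairs(conditions) do
    local op = OPS[cond[1]]
    if op == nil then
      error('unsupported operator: ' .. tostring(cond[1]))
    end
    local value = tuple[cond[2]]
    if value == nil or not op(value, cond[3]) then
      return false  -- a missing field never matches
    end
  end
  return true
end

-- Count tuples of one in-memory "space" that match all conditions.
local function local_count(tuples, conditions, opts)
  for key in pairs(opts or {}) do
    if UNSUPPORTED_OPTS[key] then
      return nil, 'option is not supported by count: ' .. key
    end
  end
  local n = 0
  for _, tuple in ipairs(tuples) do
    if matches(tuple, conditions or {}) then
      n = n + 1
    end
  end
  return n
end
```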

no1seman commented on August 11, 2024

@unera batch_size could be used as the number of pairs iterations between yields, or there could be a separate option for that.

R-omk commented on August 11, 2024

What about this case:
select count(field) from t1 when the field is nullable?

Instead of inventing one more "killer feature" that doesn't work, can we make a general map/reduce?
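For reference, SQL's count(field) counts only rows where field is not NULL, while count(*) counts every row. A plain-Lua sketch of the difference, with hypothetical data and function names, where a missing key stands for NULL:

```lua
-- In this sketch a missing key stands for NULL; the tuples are plain tables.
local function count_star(tuples)
  return #tuples  -- count(*): every row
end

local function count_field(tuples, field)
  local n = 0
  for _, tuple in ipairs(tuples) do
    if tuple[field] ~= nil then  -- count(field): only rows where field is not NULL
      n = n + 1
    end
  end
  return n
end

local t1 = {
  { id = 1, email = 'a@example.com' },
  { id = 2 },                          -- email is NULL
  { id = 3, email = 'c@example.com' },
}
print(count_star(t1), count_field(t1, 'email'))
```

A count that takes only conditions behaves like count(*) with a WHERE clause, so covering count(field) would need either a not-NULL condition or a separate option.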
