Giter VIP home page Giter VIP logo

Comments (9)

gousiosg avatar gousiosg commented on August 29, 2024 1

As our queries will be mostly reading from non-updating data, we don't need cursors as they will be complicating our lives in terms of session management (which we also don't need). If we can restrict our browsing on indexed columns, I 'd recommend we do keyset pagination and always return ordered results (which actually makes sense).

from fasten.

marcomicera avatar marcomicera commented on August 29, 2024 1

I would avoid anything more sophisticated until there is a need for it.

Yeah, so we agree on this. I dropped the keyset pagination for the offset-limit one d1b7acb. If you're ok with this, we can close this issue already.

@romatallinn you now have pagination! Same offset and limit query params as in the good old endpoints table. Should something change in the future, you'll be promptly informed!

from fasten.

marcomicera avatar marcomicera commented on August 29, 2024

Alright, doable with JOOQ's seek() method.

Since we won't be having fixed-size pages, the limit and offset query params don't make much sense anymore. I see two ways:

  1. We get rid of them and use a unique query parameter, e.g., page. I can then emulate pagination by storing page boundaries (i.e., last page values of columns we order by) in a dedicated materialized view that we should then somehow keep updated (immediately with a trigger, deferred with a temporal trigger, or on-demand with a stored procedure).
    Example from that blog post:

    select
      x.value, 
      x.id,
      row_number() 
        over(order by x.value, x.id) + 1 page_number
    from (
      select
        t.id, 
        t.value,
        case row_number() 
             over(order by t.value, t.id) % 5 
          when 0 then 1 
          else 0 
        end page_boundary
      from t
      order by t.value, t.id
    ) x
    where x.page_boundary = 1

    This will then yield:

    | VALUE |  ID | PAGE_NUMBER |
    |-------|-----|-------------|
    |     0 | 786 |           2 |
    |     1 | 250 |           3 |
    |     1 | 959 |           4 |
    |     2 | 229 |           5 |
    |     2 | 533 |           6 | <-- Before page 6
    |     3 |  37 |           7 |
    |     3 | 768 |           8 |
    
  2. We keep them and use the aforementioned table to determine the page number given a certain offset value. Then we limit the result size with the LIMIT SQL clause.

Both of these solutions require these materialized views that contain page boundaries.
Am I missing a simpler solution, maybe?

from fasten.

marcomicera avatar marcomicera commented on August 29, 2024
  1. We keep them and use the aforementioned table to determine the page number given a certain offset value. Then we limit the result size with the LIMIT SQL clause.

I'm doing this, but with page and limit as query parameters.

from fasten.

marcomicera avatar marcomicera commented on August 29, 2024

I was right in the middle of it when I realized that keyset pagination isn't really suited for our case as we're already filtering those entries for package names and versions all the time.

So for instance such a table:

create materialized view packages_page_limits as
select package_name,
       id,
       row_number()
       over (order by package_name, id) + 1 page_number
from (
         select id,
                package_name,
                case row_number()
                     over (order by package_name, id) % 100
                    when 0 then 1
                    else 0
                    end page_boundary
         from packages
         order by package_name
     ) as x
where x.page_boundary = 1;
select * from packages_page_limits limit 10;
                       package_name                       |    id    | page_number
----------------------------------------------------------+----------+-------------
 aero.m-click:mcpdf                                       |   526850 |           2
 ai.h2o:xgboost4j                                         |   592988 |           3
 ai.tock:tock-bot-engine-jackson                          |  8812137 |           4
 android.compatibility:android-support-v7-appcompat       | 20842521 |           5
 ant:ant-junit                                            |   298684 |           6
 app.myoss.cloud.boot:myoss-starter-web                   |  8548776 |           7
 at.bestsolution.eclipse:org.eclipse.fx.ui.animation      |  7573640 |           8
 at.lux:imageanalysis                                     |  7487539 |           9
 au.com.codeka:carrot                                     |    88521 |          10
 au.com.redboxresearchdata.fascinator:plugin-sso-rapidaaf |  5939684 |          11

would only help me when returning all packages, if I'm not wrong.

I'm switching back to the classic (yet inconsistent & inefficient) offset-limit.

from fasten.

gousiosg avatar gousiosg commented on August 29, 2024

Materialized views are not a great solution, as they will need to be updated every time we add a new package. Also, for large tables, like callables and edges they will just replicate the whole (humongous) table with just an extra column. In sort, let's just do page and limit without pre-computing stuff. This is what most APIs do, so it will be more familiar with devs as well.

from fasten.

gousiosg avatar gousiosg commented on August 29, 2024

The biggest package in the current database is com.groupdocs:groupdocs-signature, version 18.4 (371k callables). Postgres-based offset/limit pagination works nicely:

fasten_java=> \timing
fasten_java=> select c.fasten_uri
from package_versions pv join packages p on pv.package_id = p.id
  join modules m on m.package_version_id = pv.id
  join callables c on c.module_id = m.id
where
  c.is_internal_call is true
  and p.package_name = 'com.groupdocs:groupdocs-signature'
          AND pv.version = '18.4' limit 100 offset 45000;
Time: 365.154 ms

fasten_java=> select c.fasten_uri
from package_versions pv join packages p on pv.package_id = p.id
  join modules m on m.package_version_id = pv.id
  join callables c on c.module_id = m.id
where
  c.is_internal_call is true
  and p.package_name = 'com.groupdocs:groupdocs-signature'
          AND pv.version = '18.4' limit 100 offset 12000;
Time: 303.800 ms

I would avoid anything more sophisticated until there is a need for it.

from fasten.

gousiosg avatar gousiosg commented on August 29, 2024

It is indeed a bit slower if we order by callable.id, but not prohibitively so:

fasten_java=> select c.fasten_uri
from package_versions pv join packages p on pv.package_id = p.id
  join modules m on m.package_version_id = pv.id
  join callables c on c.module_id = m.id
where
  c.is_internal_call is true
  and p.package_name = 'com.groupdocs:groupdocs-signature'
          AND pv.version = '18.4'
order by c.id       
limit 100 offset 12000;
Time: 1477.363 ms (00:01.477)

from fasten.

marcomicera avatar marcomicera commented on August 29, 2024

@gousiosg can we close this?

from fasten.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.