
Comments (4)

findepi commented on June 22, 2024

Trino streams results back to the client as they become available.
You can observe this if you run

SELECT * FROM very_large_table

You will get some results back in your client (e.g. the CLI) while the query is still running, and the query won't consume a lot of memory.

So I believe that Trino already does what is being asked for.

Regarding "On occasion, and I'm not sure why, the trino CLI will spit out small amount of results but then continue with the query":

The CLI uses a pager (e.g. less).

from trino.

electrum commented on June 22, 2024

If you pipe the results of the CLI, disable the pager (export TRINO_PAGER=cat), or use the JDBC driver, you will see that the results are streamed immediately.


Sartan4455 commented on June 22, 2024


I am using the HTTP API, if that makes a difference. I don't mean streamed results; I mean incremental results. Maybe that's what you mean, and that's what it's doing. That's totally fine.

Our use case involves time-series data. Suppose a user wants information with a complex query over a time period, say one day, and the order doesn't matter. We have billions of rows in each table, so such a query will take a while, 30 minutes or more. During that time, no results will be streamed back, or at least it doesn't appear that way: I won't see 100 results come back and then more results trickle in. This might be due to how things need to be collated on the master and how SQL works (I am not an expert on that).

Our older proprietary system (which is one reason we are moving to open source), because of how it was laid out on disk, would already have data in time order, so it would get incremental results in seconds as it walked through the data.

One solution we have considered is breaking apart the day into subqueries, such as breaking a day query into hours. This would then run on much less data and could provide incremental results. Each hour query might take 1 minute, and we could show results every minute instead of waiting 30 minutes. This could be further reduced to 30 seconds, 15 seconds, and so on. This led me to think that if Trino could stream results more incrementally, it would be much less complicated on our end. I saw that Postgres has a single row mode, which seems similar to this idea, or perhaps even using a CURSOR.
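A minimal sketch of the hour-by-hour split described above, in Python. The table name `events` and the timestamp column `event_time` are hypothetical placeholders, and the real query would be more complex than `SELECT *`. It only builds the SQL strings; running them and showing each hour's rows as they finish is left to the client.

```python
from datetime import datetime, timedelta


def hourly_queries(day_start, table="events", ts_column="event_time", hours=24):
    """Build one SQL string per hour-long window starting at day_start.

    `table` and `ts_column` are hypothetical names used for illustration.
    Each window is half-open [lo, hi), so no row is counted twice.
    """
    queries = []
    for h in range(hours):
        lo = day_start + timedelta(hours=h)
        hi = lo + timedelta(hours=1)
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {ts_column} >= TIMESTAMP '{lo:%Y-%m-%d %H:%M:%S}' "
            f"AND {ts_column} < TIMESTAMP '{hi:%Y-%m-%d %H:%M:%S}'"
        )
    return queries


if __name__ == "__main__":
    # Print the first two of the 24 windows for June 22, 2024.
    for sql in hourly_queries(datetime(2024, 6, 22))[:2]:
        print(sql)
```

Shrinking the window (30 minutes, 15 minutes, and so on) only changes the `timedelta` step; whether it pays off depends on how well the table layout lets each window skip unrelated data.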

Anyway, maybe I am misunderstanding or wanting something unrealistic. That's absolutely possible and even highly probable.

Thanks for looking into it! My team and I appreciate all the work you put into Trino.


hashhar commented on June 22, 2024

As long as the query can compute results fast enough, the results are streamed back to the client.

For example, SELECT * FROM my_large_table streams results perfectly, since rows can be returned as soon as they are available.

However, SELECT max(id) FROM my_large_table cannot stream results, since it needs to wait for the global aggregation to be computed.

SELECT part_key, count(1) FROM my_large_table GROUP BY part_key, on the other hand, would behave somewhere between the other two examples, because it can return results for one part_key as soon as its aggregation is computed. However, if the results for all of the part_key values take a long time to compute, it will look like the query isn't streaming results.

The same is true, by the way, for Postgres in single-row mode: it can return rows only once the computation is done, so if the computation itself takes a long time, no results are returned during that time.
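For anyone on the HTTP API, here is a rough Python sketch of how a client can surface rows as each page arrives, rather than collecting everything first. It assumes the usual client protocol shape: a POST to `/v1/statement` returns JSON that may carry `data` and a `nextUri`, and the client keeps fetching `nextUri` until it is absent. The coordinator URL and user name are hypothetical, and a production client would also handle retries and cancellation.

```python
import json
import urllib.request


def submit(sql, coordinator="http://localhost:8080", user="example"):
    """POST the query text; coordinator URL and user are hypothetical."""
    req = urllib.request.Request(
        coordinator + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Trino-User": user},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def http_get_json(uri):
    """Fetch one result page from a nextUri."""
    with urllib.request.urlopen(uri) as resp:
        return json.load(resp)


def iter_rows(first_page, fetch_page=http_get_json):
    """Yield rows page by page, following nextUri until it disappears.

    Pages without a "data" key (query still queued or computing) yield
    nothing, which is why a blocking aggregation looks like silence.
    """
    page = first_page
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        for row in page.get("data", []):
            yield row
        next_uri = page.get("nextUri")
        if next_uri is None:
            return
        page = fetch_page(next_uri)
```

Usage would look like `for row in iter_rows(submit("SELECT * FROM my_large_table")): ...`. With a plain scan, rows show up in the loop while the query is still running; with a global aggregation, the loop just polls empty pages until the single result is ready.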

