In Postgres there is a row-by-row results mode, which, if possible, will start to stream d

[Feature Request] Support "Single Row Mode" similar to PostgreSQL

Sartan4455 commented on June 22, 2024

[Feature Request] Support "Single Row Mode" similar to PostgreSQL

Comments (4)

findepi commented on June 22, 2024

Trino streams results back to the client as they become available.
You can observe this if you run

SELECT * FROM very_large_table

You will get some results back in your client (e.g. the CLI) while the query is still running. The query won't consume a lot of memory.

So I believe Trino already does what is being asked for.

On occasion, and I'm not sure why, the Trino CLI will spit out a small amount of results but then continue with the query.

The CLI uses a pager (e.g. less).

electrum commented on June 22, 2024

If you pipe the results of the CLI, disable the pager (export TRINO_PAGER=cat), or use the JDBC driver, you will see that the results are streamed immediately.

Sartan4455 commented on June 22, 2024

I am using the HTTP API, if that makes a difference. I don't mean streamed results; I mean incremental results. Maybe that's what you mean, and that's what it's doing. That's totally fine.
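For the HTTP API specifically, results arrive in pages: the client POSTs the SQL text to /v1/statement and then keeps following the nextUri field of each response, and any response along the way may carry a data array of rows. A minimal Python sketch of that loop is below. It is an illustration, not a reference client: iter_rows and the fetch callable are hypothetical names, and fetch stands in for whatever HTTP GET your client performs.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = Dict[str, Any]


def iter_rows(first_page: Page, fetch: Callable[[str], Page]) -> Iterator[List[Any]]:
    """Yield rows as soon as each page of a statement response arrives.

    first_page: parsed JSON returned by POST /v1/statement.
    fetch: hypothetical helper that GETs a URL and returns the parsed JSON.
    """
    page = first_page
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # "data" is absent on pages where no rows are ready yet.
        for row in page.get("data", []):
            yield row
        next_uri = page.get("nextUri")
        if next_uri is None:
            return  # no nextUri means there is nothing more to fetch
        page = fetch(next_uri)
```

Because this is a generator, the caller can display each row the moment its page arrives instead of collecting the whole result first. Whether rows show up early still depends on the query shape, not on the client.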

Our use case involves time-series data. Suppose a user wants information with a complex query over a time period, say one day, and the order doesn't matter. We have billions of rows in each table, so such a query will take a while, 30 minutes or more. During that time, no results will be streamed back, or at least it doesn't appear that way: I won't see 100 results come back and then more results trickle in. This might be due to how things need to be collated on the master and how SQL works (I am not an expert on that).

Our older proprietary system (which is one reason we are moving to open source), because of how it was laid out on disk, would already have data in time order, so it would get incremental results in seconds as it walked through the data.

One solution we have considered is breaking apart the day into subqueries, such as breaking a day query into hours. This would then run on much less data and could provide incremental results. Each hour query might take 1 minute, and we could show results every minute instead of waiting 30 minutes. This could be further reduced to 30 seconds, 15 seconds, and so on. This led me to think that if Trino could stream results more incrementally, it would be much less complicated on our end. I saw that Postgres has a single row mode, which seems similar to this idea, or perhaps even using a CURSOR.
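The splitting idea could look roughly like the Python sketch below, which cuts a time range into half-open sub-windows and builds one query per window. The table name events and the column name ts are placeholders, not names from our system, and split_window and window_query are hypothetical helpers.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def split_window(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open [lo, hi) sub-windows that together cover [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # the last window may be shorter than step
        yield lo, hi
        lo = hi


def window_query(lo: datetime, hi: datetime) -> str:
    """Build the SQL for one sub-window (placeholder table and column names)."""
    return (
        "SELECT * FROM events "
        f"WHERE ts >= TIMESTAMP '{lo:%Y-%m-%d %H:%M:%S}' "
        f"AND ts < TIMESTAMP '{hi:%Y-%m-%d %H:%M:%S}'"
    )
```

Half-open windows mean a row on an hour boundary belongs to exactly one subquery, so the pieces can be shown as they finish without deduplication. Shrinking step gives faster first results at the cost of more queries.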

Anyway, maybe I am misunderstanding or wanting something unrealistic. That's absolutely possible and even highly probable.

Thanks for looking into it! My team and I appreciate all the work you put into Trino.

hashhar commented on June 22, 2024

As long as the query can compute results fast enough, the results are streamed back to the client.

So, for example, SELECT * FROM my_large_table streams results perfectly, since rows can be returned as soon as they are available.

However, SELECT max(id) FROM my_large_table cannot stream results, since it needs to wait for the global aggregation to be computed.

SELECT part_key, count(1) FROM my_large_table GROUP BY part_key, on the other hand, would behave somewhere between the other two examples, because it can return results for one part_key as soon as its aggregation is computed. However, if all of the part_key results take a long time to compute, it will look like the query isn't streaming results.

The same is, by the way, true for Postgres in single-row mode: it can return rows only once the computation is done, so if the computation itself takes a long time, no results are returned during that time.
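The difference between the first two query shapes can be illustrated with a toy model in plain Python generators. This is only an analogy, not how Trino or Postgres execute queries, and all names in it are made up for the illustration. It counts how many input rows an operator has to read before it can emit its first output row.

```python
from typing import Callable, Iterable, Iterator


def scan(rows: Iterable[int]) -> Iterator[int]:
    """Like SELECT *: each row can be emitted as soon as it is read."""
    for row in rows:
        yield row


def global_max(rows: Iterable[int]) -> Iterator[int]:
    """Like SELECT max(id): nothing can be emitted until every row is seen."""
    yield max(rows)


def rows_read_before_first_output(
    operator: Callable[[Iterable[int]], Iterator[int]], n: int
) -> int:
    """Count the input rows consumed before the operator's first output row."""
    consumed = 0

    def source() -> Iterator[int]:
        nonlocal consumed
        for i in range(n):
            consumed += 1
            yield i

    next(operator(source()))  # pull exactly one output row
    return consumed
```

In this model, scan produces its first row after reading one input row, while global_max has to read all n rows first. No client-side mode can change that for the aggregation case.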
