Comments (4)
Trino streams results back to the client as they become available.
You can observe this if you run
SELECT * FROM very_large_table
You will get some results back in your client (e.g. the CLI) while the query is still running, and the query won't consume a lot of memory.
So I believe that Trino already does what is being asked for.
On occasion, and I'm not sure why, the Trino CLI will spit out a small amount of results but then continue with the query.
The CLI uses a pager (e.g. less).
from trino.
If you pipe the results of the CLI, disable the pager (export TRINO_PAGER=cat), or use the JDBC driver, you will see that the results are streamed immediately.
from trino.
I am using the HTTP API, if that makes a difference. I don't mean streamed results; I mean incremental results. Maybe that's what you mean, and that's what it's doing. That's totally fine.
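For readers also consuming the HTTP API, here is a minimal sketch of reading results page by page. It assumes the usual client protocol shape, where the initial POST to /v1/statement returns a JSON document and each response may carry a "data" list of rows plus a "nextUri" to fetch next. The names iter_rows and fetch are hypothetical; fetch stands in for whatever HTTP GET plus JSON decoding your client uses.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_rows(
    first_page: Dict[str, Any],
    fetch: Callable[[str], Dict[str, Any]],
) -> Iterator[List[Any]]:
    """Yield rows as each page arrives by following nextUri links.

    first_page: decoded JSON response of the initial statement request.
    fetch: hypothetical helper that GETs a URI and returns decoded JSON.
    """
    page = first_page
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # A page may carry no rows at all, for example while the query
        # is still queued or no output has been produced yet.
        for row in page.get("data", []):
            yield row
        next_uri = page.get("nextUri")
        if next_uri is None:
            return  # no more pages: the query has finished
        page = fetch(next_uri)
```

Because this is a generator, the caller can display each row as soon as the page containing it arrives, instead of waiting for the final page.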
Our use case involves time-series data. Suppose a user wants information with a complex query over a time period, say one day, and the order doesn't matter. We have billions of rows in each table, so such a query will take a while, 30 minutes or more. During that time, no results are streamed back, or at least it doesn't appear that way: I won't see 100 results come back and then more results trickle in. This might be due to how things need to be collated on the master and how SQL works (I am not an expert on that).
Our older proprietary system (which is one reason we are moving to open source), because of how it was laid out on disk, would already have data in time order, so it would get incremental results in seconds as it walked through the data.
One solution we have considered is breaking apart the day into subqueries, such as breaking a day query into hours. This would then run on much less data and could provide incremental results. Each hour query might take 1 minute, and we could show results every minute instead of waiting 30 minutes. This could be further reduced to 30 seconds, 15 seconds, and so on. This led me to think that if Trino could stream results more incrementally, it would be much less complicated on our end. I saw that Postgres has a single row mode, which seems similar to this idea, or perhaps even using a CURSOR.
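The hour-by-hour split described above could be sketched roughly as follows. This is only an illustration: the table name events and the timestamp column ts are hypothetical, and a real version would need the original query's projections and filters rather than SELECT *.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def hourly_windows(
    day_start: datetime, hours: int = 24
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open [start, end) one-hour windows."""
    for h in range(hours):
        start = day_start + timedelta(hours=h)
        yield start, start + timedelta(hours=1)


def window_query(start: datetime, end: datetime) -> str:
    """Build the SQL for one window (hypothetical table and column)."""
    return (
        "SELECT * FROM events "
        f"WHERE ts >= TIMESTAMP '{start:%Y-%m-%d %H:%M:%S}' "
        f"AND ts < TIMESTAMP '{end:%Y-%m-%d %H:%M:%S}'"
    )


if __name__ == "__main__":
    # Print the 24 per-hour queries for one day; each could be submitted
    # separately and its results shown as soon as it completes.
    for start, end in hourly_windows(datetime(2024, 1, 1)):
        print(window_query(start, end))
```

Half-open windows keep a row that falls exactly on an hour boundary from being counted twice.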
Anyway, maybe I am misunderstanding or wanting something unrealistic. That's absolutely possible and even highly probable.
Thanks for looking into it! My team and I appreciate all the work you put into Trino.
from trino.
As long as the query can compute results fast enough, the results are streamed back to the client.
For example, SELECT * FROM my_large_table streams results perfectly, since rows can be returned as soon as they are available.
However, SELECT max(id) FROM my_large_table cannot stream results, since it needs to wait for the global aggregation to be computed.
SELECT part_key, count(1) FROM my_large_table GROUP BY part_key, on the other hand, behaves somewhere between the other two examples, because it can return results for one part_key as soon as its aggregation is computed. However, if all of the part_key results take a long time to compute, it will look like the query isn't streaming results.
The same is true for Postgres in single-row mode, by the way: it can return rows only once the computation is done, so if the computation itself takes a lot of time, no results are returned for that amount of time.
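The difference between the first two queries can be illustrated with a toy model built on Python generators. This is not how Trino executes queries; the names counting_scan, select_all, and select_max are made up for the illustration. It only shows why a pass-through can emit its first row after reading one input row, while a global aggregate must read everything first.

```python
from typing import Iterable, Iterator, List


def counting_scan(rows: List[int], consumed: List[int]) -> Iterator[int]:
    """Yield rows one at a time, counting how many have been read so far."""
    for row in rows:
        consumed[0] += 1
        yield row


def select_all(source: Iterable[int]) -> Iterator[int]:
    """Toy analogue of SELECT *: each row is passed on as soon as it is read."""
    for row in source:
        yield row


def select_max(source: Iterable[int]) -> Iterator[int]:
    """Toy analogue of SELECT max(id): nothing can be emitted until all input is read."""
    yield max(source)


if __name__ == "__main__":
    consumed = [0]
    first = next(select_all(counting_scan([3, 1, 2], consumed)))
    print("pass-through: first row", first, "after reading", consumed[0], "row(s)")

    consumed = [0]
    first = next(select_max(counting_scan([3, 1, 2], consumed)))
    print("global max: first row", first, "after reading", consumed[0], "row(s)")
```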
from trino.