Comments (6)
Hello @zeroflaw, this is not a bug but expected behaviour. Arguably it's a quirk of SQL: you have to add an ORDER BY
clause into the first
and last
functions to ensure a deterministic result:
SELECT
time_bucket(INTERVAL '5m', timestamp) AS open_timestamp,
FIRST(open ORDER BY close) as open,
MAX(high) as max,
MIN(low) as min,
LAST(close ORDER BY close) as close
FROM (SELECT * FROM 'kline_sample.parquet' ORDER BY timestamp ASC)
GROUP BY open_timestamp
ORDER BY open_timestamp ASC;
┌──────────────────────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┐
│ open_timestamp │ open │ max │ min │ close │
│ timestamp with time zone │ decimal(24,12) │ decimal(24,12) │ decimal(24,12) │ decimal(24,12) │
├──────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ 2021-01-01 01:00:00+01 │ 123.300000000000 │ 123.600000000000 │ 123.300000000000 │ 123.600000000000 │
└──────────────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┘
This consistently returns 123.3
for first
and 123.6
for last
.
from duckdb.
Thanks, yes indeed that works.
Should this not be the default - ordering by itself e.g.
FIRST(open ORDER BY open)
LAST(close ORDER BY close)
They seem like they should be convince methods as you could just do FIRST(open ORDER BY open desc) which would actually give you the 'last', which works but is not intuitive at all. I mean its technically correct, so it not a great argument. But I hope you can see where I'm coming from. The ordering should have a consistent sensible default (otherwise it does nothing meaningful).
Anyway, thanks again.
from duckdb.
Yes, regrettably this is one of the many instances of "undefined behavior" in the SQL standard, that's likely there for performance reasons.
In any case, it's unlikely that we'll change this in DuckDB in the near future.
from duckdb.
any_value(open ORDER BY open) = FIRST(open ORDER BY open)
So the aggregate functions 'any_value', 'last' and 'first' are basically all the same function, if performance mattered you could just use 'any_value' which would not have a ordering default (this would be sensible), while last and first should have sensible default orderings. I'm not familiar with the SQL standard, but i'm now going to find and read them because this seems ridiculous.
Thanks for your time.
from duckdb.
I'm not familiar with the SQL standard, but i'm now going to find and read them because this seems ridiculous.
None of these functions are defined in the SQL standard, and in general in SQL no ordering is present unless an explicit ORDER BY
is provided. FIRST
and LAST
are generally used when you want to order by a different column than the one provided as the argument, e.g. LAST(open ORDER BY timestamp)
.
What you are suggesting (FIRST(open ORDER BY open)
) just means that FIRST(open)
is equivalent to MIN(open)
which does not seem particularly useful as you could just use MIN
. In the current implementation, without an explicit ORDER BY
specified, FIRST(open)
means `the first value of open that was found by any thread. You can make this deterministic by either (1) setting the system to use a single thread, or (2) explicitly specifying an ORDER BY as discussed.
from duckdb.
I agree not particularly useful, but its consistent, and I potentially wouldn't have been thrown off by a value inconsistency flipping between two results (something you would expect of ANY_VALUE). So it's arguably more useful than an arbitrary value, considering the function is named FIRST, at least you are getting the FIRST of something, even if its effectively defaulting to MIN, least it is not defaulting to ANY_VALUE.
I'm happy with the resolution, and I'll take this as a lesson and be more careful with FIRST and LAST going forward.
from duckdb.
Related Issues (20)
- 0.10.0: Build breaks: error: virtual function 'StartTransaction' has a different return type ('Transaction *') than the function it overrides (which has return type 'Transaction &') HOT 3
- Database file not decreasing in size HOT 2
- 2 tests fail HOT 1
- JDBC DatabaseMetaData should return COMMENT values as REMARKS
- Reading from Hive partitioned parquet files: criteria from generate_series are not pushed down for use as File Filter.
- Out of Memory Error when COPY to encrypted parquet HOT 1
- ARRAY and parquet: not quite complete HOT 2
- The C-API and ARRAYs HOT 2
- Unable to install custom extensions from S3 bucket HOT 2
- New Secret Manager not working on the AWS Graviton instance type HOT 11
- Performance regression: DISTINCT ON got slower and more memory hungry HOT 5
- JDBC ResultSet.getTime() returns null if time value has fractional seconds
- /bigobj not only needed for amalgamation builds HOT 2
- JOIN clause can be aliased, but alias cannot be referenced HOT 1
- Queries to a long df that worked in pip duckdb==0.9.2 does not work in duckdb==0.10.0 HOT 2
- Wrong result when using POSITION() function with only bind values from JDBC HOT 1
- [Python] Unable to substitute "COPY ... TO" target using prepared statement HOT 3
- gcs filesystem query that worked in 0.9.2 no longer works in 0.10.0 HOT 2
- Crash when interrupting complex query HOT 5
- [python] unit test `test_query_interrupt.py` sending `KeyboardInterrupt` to wrong target on aarch64-darwin sometimes HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from duckdb.