Giter VIP home page Giter VIP logo

Comments (6)

szarnyasg avatar szarnyasg commented on September 26, 2024

Hello @zeroflaw, this is not a bug but expected behaviour. Arguably it's a quirk of SQL: you have to add an ORDER BY clause into the first and last functions to ensure a deterministic result:

SELECT
    time_bucket(INTERVAL '5m', timestamp) AS open_timestamp,
    FIRST(open ORDER BY close) as open,
    MAX(high) as max,
    MIN(low) as min,
    LAST(close ORDER BY close) as close
FROM (SELECT * FROM 'kline_sample.parquet' ORDER BY timestamp ASC)
GROUP BY open_timestamp
ORDER BY open_timestamp ASC;
┌──────────────────────────┬──────────────────┬──────────────────┬──────────────────┬──────────────────┐
│      open_timestamp      │       open       │       max        │       min        │      close       │
│ timestamp with time zone │  decimal(24,12)  │  decimal(24,12)  │  decimal(24,12)  │  decimal(24,12)  │
├──────────────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┤
│ 2021-01-01 01:00:00+01   │ 123.300000000000 │ 123.600000000000 │ 123.300000000000 │ 123.600000000000 │
└──────────────────────────┴──────────────────┴──────────────────┴──────────────────┴──────────────────┘

This consistently returns 123.3 for first and 123.6 for last.

from duckdb.

zeroflaw avatar zeroflaw commented on September 26, 2024

Thanks, yes indeed that works.

Should this not be the default - ordering by itself e.g.
FIRST(open ORDER BY open)
LAST(close ORDER BY close)

They seem like they should be convince methods as you could just do FIRST(open ORDER BY open desc) which would actually give you the 'last', which works but is not intuitive at all. I mean its technically correct, so it not a great argument. But I hope you can see where I'm coming from. The ordering should have a consistent sensible default (otherwise it does nothing meaningful).

Anyway, thanks again.

from duckdb.

szarnyasg avatar szarnyasg commented on September 26, 2024

Yes, regrettably this is one of the many instances of "undefined behavior" in the SQL standard, that's likely there for performance reasons.

In any case, it's unlikely that we'll change this in DuckDB in the near future.

from duckdb.

zeroflaw avatar zeroflaw commented on September 26, 2024

any_value(open ORDER BY open) = FIRST(open ORDER BY open)

So the aggregate functions 'any_value', 'last' and 'first' are basically all the same function, if performance mattered you could just use 'any_value' which would not have a ordering default (this would be sensible), while last and first should have sensible default orderings. I'm not familiar with the SQL standard, but i'm now going to find and read them because this seems ridiculous.

Thanks for your time.

from duckdb.

Mytherin avatar Mytherin commented on September 26, 2024

I'm not familiar with the SQL standard, but i'm now going to find and read them because this seems ridiculous.

None of these functions are defined in the SQL standard, and in general in SQL no ordering is present unless an explicit ORDER BY is provided. FIRST and LAST are generally used when you want to order by a different column than the one provided as the argument, e.g. LAST(open ORDER BY timestamp).

What you are suggesting (FIRST(open ORDER BY open)) just means that FIRST(open) is equivalent to MIN(open) which does not seem particularly useful as you could just use MIN. In the current implementation, without an explicit ORDER BY specified, FIRST(open) means `the first value of open that was found by any thread. You can make this deterministic by either (1) setting the system to use a single thread, or (2) explicitly specifying an ORDER BY as discussed.

from duckdb.

zeroflaw avatar zeroflaw commented on September 26, 2024

I agree not particularly useful, but its consistent, and I potentially wouldn't have been thrown off by a value inconsistency flipping between two results (something you would expect of ANY_VALUE). So it's arguably more useful than an arbitrary value, considering the function is named FIRST, at least you are getting the FIRST of something, even if its effectively defaulting to MIN, least it is not defaulting to ANY_VALUE.

I'm happy with the resolution, and I'll take this as a lesson and be more careful with FIRST and LAST going forward.

from duckdb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.