Giter VIP home page Giter VIP logo

Comments (4)

soerenwolfers avatar soerenwolfers commented on September 22, 2024 1

"To reproduce" means you are supposed to provide concrete code that reproduces the observed problem, at least on your hardware and software setup. In particular, you have to add the definition and contents of the table in your snippet. If you find it impossible to reproduce the problem on a toy table with code-generated or manually inserted fake values, you should upload the contents of your table, e.g. as parquet file, and include the code that reads the parquet file in your snippet.

from duckdb.

ASaltedFishBoy avatar ASaltedFishBoy commented on September 22, 2024

"To reproduce" means you are supposed to provide concrete code that reproduces the observed problem, at least on your hardware and software setup. In particular, you have to add the definition and contents of the table in your snippet. If you find it impossible to reproduce the problem on a toy table with a few manually inserted values, you should upload the contents of your table, e.g. as parquet file, and include the code that reads the parquet file in your snippet.

This is my table construction statement,the part that does not contain data insertion of CAN_TEST.I have nothing more can offer.

CREATE SEQUENCE id_sequence START 1
CREATE TABLE CAN_TEST (trace_id BIGINT PRIMARY KEY DEFAULT nextval('id_sequence'),timestamp BIGINT, others...)
CREATE UNIQUE INDEX CAN_TEST_trace_id_IDX ON main.CAN_TEST  (trace_id)

CREATE TABLE CAN_TEST_timestamp (trace_id BIGINT PRIMARY KEY ,timestamp BIGINT)
INSERT INTO CAN_TEST_timestamp (SELECT trace_id,timestamp FROM main.CAN_TEST ORDER BY "timestamp")

Then there are the two statements in issue.

from duckdb.

soerenwolfers avatar soerenwolfers commented on September 22, 2024

You could have turned this into a MWE like this:

CREATE OR REPLACE SEQUENCE id_sequence START 1;

CREATE OR REPLACE TABLE CAN_TEST (trace_id BIGINT PRIMARY KEY DEFAULT nextval('id_sequence'), timestamp BIGINT, other1 REAL, other2 TIMESTAMP);
CREATE UNIQUE INDEX CAN_TEST_trace_id_IDX ON main.CAN_TEST  (trace_id);

INSERT INTO CAN_TEST (SELECT 
    unnest(ARRAY(SELECT * FROM range(10_000_000))),
    unnest(ARRAY(SELECT random() * 100 FROM range(10_000_000))),
    unnest(ARRAY(SELECT random() FROM range(10_000_000))),
    unnest(ARRAY(SELECT '2020-01-01'::DATE + to_seconds(range) FROM range(10_000_000)))
);

CREATE OR REPLACE TABLE CAN_TEST_timestamp (trace_id BIGINT PRIMARY KEY, timestamp BIGINT);
INSERT INTO CAN_TEST_timestamp (SELECT trace_id,timestamp FROM main.CAN_TEST ORDER BY "timestamp");
SELECT * FROM (SELECT * FROM (SELECT * FROM main.CAN_TEST_timestamp ORDER BY "timestamp") LIMIT 4000 OFFSET 8000000) AS timeOrder INNER JOIN main.CAN_TEST ON main.CAN_TEST.trace_id == timeOrder.trace_id ORDER BY (timeOrder."timestamp",timeOrder.trace_id);
449 ms ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
SELECT * FROM (SELECT * FROM (SELECT * FROM main.CAN_TEST_timestamp ORDER BY "timestamp") LIMIT 4000 OFFSET 8000000) AS timeOrder LEFT JOIN main.CAN_TEST ON main.CAN_TEST.trace_id == timeOrder.trace_id ORDER BY (timeOrder."timestamp",timeOrder.trace_id);
603 ms ± 32.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I do see a performance difference, but nowhere near the size you observe. Maybe it's the "other" columns? Maybe it's the number of rows? Maybe it's the datatypes of the other columns?

from duckdb.

ASaltedFishBoy avatar ASaltedFishBoy commented on September 22, 2024

You could have turned this into a MWE like this:

CREATE OR REPLACE SEQUENCE id_sequence START 1;

CREATE OR REPLACE TABLE CAN_TEST (trace_id BIGINT PRIMARY KEY DEFAULT nextval('id_sequence'), timestamp BIGINT, other1 REAL, other2 TIMESTAMP);
CREATE UNIQUE INDEX CAN_TEST_trace_id_IDX ON main.CAN_TEST  (trace_id);

INSERT INTO CAN_TEST (SELECT 
    unnest(ARRAY(SELECT * FROM range(10_000_000))),
    unnest(ARRAY(SELECT random() * 100 FROM range(10_000_000))),
    unnest(ARRAY(SELECT random() FROM range(10_000_000))),
    unnest(ARRAY(SELECT '2020-01-01'::DATE + to_seconds(range) FROM range(10_000_000)))
);

CREATE OR REPLACE TABLE CAN_TEST_timestamp (trace_id BIGINT PRIMARY KEY, timestamp BIGINT);
INSERT INTO CAN_TEST_timestamp (SELECT trace_id,timestamp FROM main.CAN_TEST ORDER BY "timestamp");
SELECT * FROM (SELECT * FROM (SELECT * FROM main.CAN_TEST_timestamp ORDER BY "timestamp") LIMIT 4000 OFFSET 8000000) AS timeOrder INNER JOIN main.CAN_TEST ON main.CAN_TEST.trace_id == timeOrder.trace_id ORDER BY (timeOrder."timestamp",timeOrder.trace_id);
449 ms ± 18.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
SELECT * FROM (SELECT * FROM (SELECT * FROM main.CAN_TEST_timestamp ORDER BY "timestamp") LIMIT 4000 OFFSET 8000000) AS timeOrder LEFT JOIN main.CAN_TEST ON main.CAN_TEST.trace_id == timeOrder.trace_id ORDER BY (timeOrder."timestamp",timeOrder.trace_id);
603 ms ± 32.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I do see a performance difference, but nowhere near the size you observe. Maybe it's the "other" columns? Maybe it's the number of rows? Maybe it's the datatypes of the other columns?

There are roughly 30 columns of data here, of different types(int64,string or bytes), but the data itself is not large (1 row of total bytes is roughly 200bytes), and the other differences seem to be absent.
I am so sorry that I cannot provide a specific table definition, which involves confidentiality.

from duckdb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.