
Comments (7)

szarnyasg commented on June 25, 2024

> Out of curiosity, what are the differences between main and feature? I looked at the diff but it was quite large. I see the changes made in #12499, but are there any others that would impact this query?

Currently, main is the code leading up to v1.0.1, so it only contains bugfixes compared to v1.0.0. feature will eventually be v1.1.0, so it contains new features as well.

from duckdb.

Mytherin commented on June 25, 2024

Hard to say without looking at your custom code - but this query is not just a scan, it generates a large result set of strings. How does your custom code deal with generating the result set?


rootjalex commented on June 25, 2024

The custom code returns a row store of the result (a std::vector of tuples of the selected type). For this example, it just selects a VariableText class, which is a thin wrapper around std::string.

This issue persists even with aggregations. For example, when the query does COUNT(*) instead of selecting a column, a simple counter loop is significantly faster than DuckDB:

sf=15
duckdb (single):     0.459s,  0.355s,  0.354s
duckdb (8 threads):  0.0474s, 0.0496s, 0.0570s
my scan (single):    0.0147s, 0.0137s, 0.0152s
my scan (8 threads): 0.022s,  0.0214s, 0.0212s

(I know my multi-threaded scan is slower than the single-threaded one, which is weird; I think there is just some std::thread overhead I haven't debugged.)
I'm not sure why DuckDB is so much slower (over 20x single-threaded in the runs above) than a sequential for loop over the data.


Mytherin commented on June 25, 2024

Profiling this it seems most of the time is going into the extract(month from l_shipdate) function - what does your implementation there look like?


szarnyasg commented on June 25, 2024

@rootjalex can you please try building DuckDB on the feature branch and see whether it improved the performance of the queries in your issue?


rootjalex commented on June 25, 2024

> Profiling this it seems most of the time is going into the extract(month from l_shipdate) function - what does your implementation there look like?

Date is stored in a 32-bit integer: 14 bits for the year, 4 for the month, and 5 for the day, with a guard bit between fields. Extracting a month looks like:

struct Date {
private:
    // 14 bits for yyyy, 14 > log2(9999)
    // 4 bits for mm, 4 > log2(12)
    // 5 bits for dd, 5 > log2(31)
    // guard bits are in between each
    // yyyy - bit - mm - bit - dd
    uint32_t value = 0;

    static constexpr uint32_t YEAR_SHIFT = 11;
    static constexpr uint32_t YEAR_MASK = 0b11111111111111 << YEAR_SHIFT;
    static constexpr uint32_t MONTH_SHIFT = 6;
    static constexpr uint32_t MONTH_MASK = 0b1111 << MONTH_SHIFT;
    static constexpr uint32_t DAY_MASK = 0b11111;

public:
    ...
    Date(uint16_t year, uint8_t month, uint8_t day) {
        value = (((uint32_t)year << YEAR_SHIFT) & YEAR_MASK) | (((uint32_t)month << MONTH_SHIFT) & MONTH_MASK) | ((uint32_t)day & DAY_MASK);
    }
    ALWAYS_INLINE uint16_t get_month() const {
        return (this->value & MONTH_MASK) >> MONTH_SHIFT;
    }
    ...
};

I know this is significantly simpler than DuckDB's implementation, but that should only account for a small constant-factor difference in runtime, and the gaps here don't look like small constant factors.
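For anyone who wants to check the bit layout described above, here is a minimal self-contained sketch. `PackedDate` and `count_month` are hypothetical names, not the original code; the field widths and shifts follow the struct above, and the month-equality filter in the loop is an assumption, since the exact query predicate is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the Date struct above. Layout, low to high bits:
// 5 day bits, 1 guard bit, 4 month bits, 1 guard bit, 14 year bits.
struct PackedDate {
    static constexpr uint32_t YEAR_SHIFT = 11;
    static constexpr uint32_t YEAR_MASK = 0x3FFFu << YEAR_SHIFT;  // 14 bits
    static constexpr uint32_t MONTH_SHIFT = 6;
    static constexpr uint32_t MONTH_MASK = 0xFu << MONTH_SHIFT;   // 4 bits
    static constexpr uint32_t DAY_MASK = 0x1Fu;                   // 5 bits

    uint32_t value = 0;

    PackedDate(uint16_t year, uint8_t month, uint8_t day) {
        // Range checks are an addition for this sketch.
        assert(year <= 9999 && month >= 1 && month <= 12 && day >= 1 && day <= 31);
        value = ((uint32_t{year} << YEAR_SHIFT) & YEAR_MASK)
              | ((uint32_t{month} << MONTH_SHIFT) & MONTH_MASK)
              | (uint32_t{day} & DAY_MASK);
    }

    uint16_t get_year() const {
        return static_cast<uint16_t>((value & YEAR_MASK) >> YEAR_SHIFT);
    }
    uint16_t get_month() const {
        return static_cast<uint16_t>((value & MONTH_MASK) >> MONTH_SHIFT);
    }
    uint16_t get_day() const {
        return static_cast<uint16_t>(value & DAY_MASK);
    }
};

// Hypothetical sequential scan: counts the rows whose month equals `month`.
// Each row costs one mask, one shift, and one compare.
std::size_t count_month(const std::vector<PackedDate>& dates, uint16_t month) {
    std::size_t n = 0;
    for (const PackedDate& d : dates) {
        n += (d.get_month() == month);
    }
    return n;
}
```

With this layout, month extraction needs no calendar arithmetic at all. A date stored as a day count since an epoch would instead have to be converted back to year/month/day before the month can be read.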


rootjalex commented on June 25, 2024

> @rootjalex can you please try building DuckDB on the feature branch and see whether it improved the performance of the queries in your issue?

Runtime definitely improved: single-threaded on sf=15 is around 1.15s for the SELECT and 0.2s for the COUNT query, and 8-threaded is around 0.16s for the SELECT and 0.03s for the COUNT query. It is still some way off my simple implementation, though.

Out of curiosity, what are the differences between main and feature? I looked at the diff but it was quite large. I see the changes made in #12499, but are there any others that would impact this query?

