Comments (7)
Out of curiosity, what are the differences between main and feature? I looked at the diff but it was quite large. I see the changes made in #12499, but are there any others that would impact this query?
Currently, main is the code leading up to v1.0.1, so it only contains bugfixes compared to v1.0.0. The feature branch will eventually be v1.1.0, so it contains new features as well.
Hard to say without looking at your custom code - but this query is not just a scan, it generates a large result set of strings. How does your custom code deal with generating the result set?
The custom code returns a row store of the result (a std::vector of tuples of the selected type). For this example, it just selects a VariableText class, which is a thin wrapper around std::string.
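A minimal sketch of what such a row store could look like, for readers following along. This is not the actual custom code: the `VariableText` member name, the `RowStore` alias, and the `materialize` helper are all assumptions made for illustration.

```cpp
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the VariableText class: a thin wrapper
// around std::string. The member name is an assumption.
struct VariableText {
    std::string text;
};

// One tuple per result row, one tuple element per selected column.
template <typename... Columns>
using RowStore = std::vector<std::tuple<Columns...>>;

// Materialize a single-column result by copying each string into a row.
// Every row owns its own std::string, so the cost grows with both the
// number of rows and the total string length.
inline RowStore<VariableText> materialize(const std::vector<std::string>& column) {
    RowStore<VariableText> rows;
    rows.reserve(column.size());
    for (const std::string& s : column) {
        rows.emplace_back(VariableText{s});
    }
    return rows;
}
```

With a layout like this, producing a large string result is dominated by per-row string copies rather than by the scan itself, which is why the result-set handling matters for the comparison.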
This issue persists even with aggregations: for example, when doing COUNT(*) instead of selecting a column, a simple counter loop is significantly faster than duckdb:
sf=15
duckdb (single): 0.459s, 0.355s, 0.354s
duckdb (8 threads): 0.0474s, 0.0496s, 0.0570s
my scan (single): 0.0147s, 0.0137s, 0.0152s
my scan (8 threads): 0.022s, 0.0214s, 0.0212s
(I know my multi-threaded scan is slower than the single-threaded one, which is weird; I think there's just some std::thread overhead I haven't debugged.)
I'm not sure why duckdb is so much slower than a sequential for loop over the data.
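For reference, the kind of sequential counter loop being compared against can be sketched as follows. This is an illustration only: the contiguous `std::vector` column, the `count_matching` name, and the predicate parameter are assumptions, since the actual scan code and the query's filter are not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column layout: one 32-bit value per row, stored contiguously.
// The predicate stands in for whatever filter the query applies.
template <typename Pred>
std::size_t count_matching(const std::vector<std::uint32_t>& column, Pred pred) {
    std::size_t count = 0;
    for (std::uint32_t v : column) {
        // Adding the boolean avoids a branch in the loop body.
        count += pred(v) ? 1 : 0;
    }
    return count;
}
```

A loop of this shape touches each value once, does no allocation, and is easy for the compiler to vectorize, so it is a fairly aggressive baseline for any general-purpose engine.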
Profiling this, it seems most of the time is going into the extract(month from l_shipdate) function - what does your implementation there look like?
@rootjalex can you please try building DuckDB on the feature branch and see whether it improved the performance of the queries in your issue?
Profiling this, it seems most of the time is going into the extract(month from l_shipdate) function - what does your implementation there look like?
Date is stored as 32 bits, 14 for the year, 4 for the month, and 5 for the day. Extracting a month looks like:
struct Date {
private:
    // 14 bits for yyyy, 14 > log2(9999)
    // 4 bits for mm, 4 > log2(12)
    // 5 bits for dd, 5 > log2(31)
    // guard bits are in between each
    // yyyy - bit - mm - bit - dd
    uint32_t value = 0;
    static constexpr uint32_t YEAR_SHIFT = 11;
    static constexpr uint32_t YEAR_MASK = 0b11111111111111 << YEAR_SHIFT;
    static constexpr uint32_t MONTH_SHIFT = 6;
    static constexpr uint32_t MONTH_MASK = 0b1111 << MONTH_SHIFT;
    static constexpr uint32_t DAY_MASK = 0b11111;
public:
    ...
    Date(uint16_t year, uint8_t month, uint8_t day) {
        value = (((uint32_t)year << YEAR_SHIFT) & YEAR_MASK) |
                (((uint32_t)month << MONTH_SHIFT) & MONTH_MASK) |
                ((uint32_t)day & DAY_MASK);
    }
    ALWAYS_INLINE uint16_t get_month() const {
        return (this->value & MONTH_MASK) >> MONTH_SHIFT;
    }
    ...
};
I know this is significantly simpler than duckdb's implementation, but that should only account for a small constant-factor difference in runtime, and the gaps I'm measuring don't look like small constant factors.
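To make the bit layout concrete, here is a self-contained sketch of the same packing scheme with a round trip through it. `PackedDate` and its `year()`, `month()`, and `day()` accessors are hypothetical names chosen for this illustration; only the field widths, shifts, and masks follow the description above.

```cpp
#include <cstdint>

// Illustrative stand-in for the packed date described above.
// Layout, low to high: 5 day bits, 1 guard bit, 4 month bits,
// 1 guard bit, 14 year bits.
struct PackedDate {
    static constexpr std::uint32_t MONTH_SHIFT = 6;   // 5 day bits + 1 guard bit
    static constexpr std::uint32_t YEAR_SHIFT = 11;   // + 4 month bits + 1 guard bit
    static constexpr std::uint32_t DAY_MASK = 0x1Fu;
    static constexpr std::uint32_t MONTH_MASK = 0xFu << MONTH_SHIFT;
    static constexpr std::uint32_t YEAR_MASK = 0x3FFFu << YEAR_SHIFT;

    std::uint32_t value;

    constexpr PackedDate(std::uint32_t y, std::uint32_t m, std::uint32_t d)
        : value(((y << YEAR_SHIFT) & YEAR_MASK) |
                ((m << MONTH_SHIFT) & MONTH_MASK) |
                (d & DAY_MASK)) {}

    // Each accessor is a single mask and shift: no division and no
    // calendar arithmetic.
    constexpr std::uint32_t year() const { return (value & YEAR_MASK) >> YEAR_SHIFT; }
    constexpr std::uint32_t month() const { return (value & MONTH_MASK) >> MONTH_SHIFT; }
    constexpr std::uint32_t day() const { return value & DAY_MASK; }
};
```

For example, 1998-09-02 packs to 1998 * 2048 + 9 * 64 + 2 = 4092482, and `month()` recovers 9 with one AND and one shift. Extracting a month from a value that instead stores a count of days since an epoch needs real calendar arithmetic, so some per-row cost gap between the two representations is expected.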
@rootjalex can you please try building DuckDB on the feature branch and see whether it improved the performance of the queries in your issue?
Runtime was definitely improved (single threaded on sf=15 is around 1.15s for the SELECT and .2s for the COUNT query, and 8-threaded is around .16s for the SELECT and .03s for the COUNT query). It still is a bit far off from my simple implementation though.
Out of curiosity, what are the differences between main and feature? I looked at the diff but it was quite large. I see the changes made in #12499, but are there any others that would impact this query?