Comments (7)
This can be done in several ways.
- groupArray -> arraySort -> arrayReduce
SELECT arrayReduce('functionA', arraySort(groupArray(x))) FROM R_table;
- materialized CTE #61086
WITH t AS MATERIALIZED (SELECT x FROM R_table ORDER BY x) SELECT functionA(x) FROM t;
from clickhouse.
@Alex-Cheng which aggregate function? https://fiddle.clickhouse.com/88a7eba5-01f5-41d9-9577-4865ca007def
Modern optimizer removes excessive order by if result of a query does not depend on it.
You may use limit
to overcome optimizer.
select functionA(x) from (select `x` from `R_table` order by `x`
limit toUInt64(-1) ------<<<<<<<<---
) settings max_threads=1
from clickhouse.
CH is not designed to do this, and you will face multiple problems and changes in the future because of this:
- CH is multithreaded by design, without any order guarantees. You can use
max_threads=1
- CH does not guarantee in which order it will read the data.
- CH does not guarantee the size of the batches. It's usually max_block_size, but it's also not guaranteed.
- CH also does not guarantee in which order the aggregation states will be merged.
You can try the suggested workarounds, but they might stop working at any time.
The only similar thing to what you are describing are window functions.
from clickhouse.
CH is not designed to do this, and you will face multiple problems and changes in the future because of this:
- CH is multithreaded by design, without any order guarantees. You can use
max_threads=1
- CH does not guarantee in which order it will read the data.
- CH does not guarantee the size of the batches. It's usually max_block_size, but it's also not guaranteed.
- CH also does not guarantee in which order the aggregation states will be merged.
You can try the suggested workarounds, but they might stop working at any time.
The only similar thing to what you are describing are window functions.
Yes, the approaches you suggests were used for resolving the problem, e.g. max_threads=1
and max_block_size=<very big number>
. However as same as you mentioned, it stops working after upgrading to 24.3. I understood that the aggregation functions should not depend on data order. However my aggregation algorithm requires data odering for my special requirement. I am wondering if there is same requirement that aggregation depends on data ordering?
Is my requirement too special?
Maybe I should transform the aggregation function into a regular function that accepts Array as inputs, just as mentioned by @canhld94 .
from clickhouse.
The approach suggested by @den-crane (thank you, it is very skillful workaround) is working and the solution requires the least workload. It could be a temporary solution for a while, and finally I need to think out of an alternative algorithm that does not depend on the order of input data.
from clickhouse.
@Alex-Cheng check the code of groupArray
.
I think there is (or was) some trait which allows to mark a function as dependent of data order.
Earlier groupArray 100% respected the order of input blocks. And even now it's mentioned in the doc: https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/grouparray
I think this trait forbids the optimizer to change a plan (remove orderbys) or something like this.
from clickhouse.
Earlier groupArray 100% respected the order of input blocks.
With important exception of GROUP BY in external memory AFAIK
from clickhouse.
Related Issues (20)
- "Amount of memory requested to allocate is more than allowed" in Variant + Map type HOT 1
- Cannot assign requested address when trying to insert data to the s3 storage HOT 1
- Is there a way to avoid cancelling merges when running ALTER TABLE DROP PARTITION?
- Superfluous query cancelled exception HOT 3
- `test_move_partition_throttling` santizier assert HOT 2
- Net Exception: Socket is not connected logs error logs with every insert HOT 10
- Clickhouse 23.12.6.19 randomly turned to readonly HOT 17
- Cannot perform insert in readonly mode
- `02362_part_log_merge_algorithm` is flaky
- map inside if statement with default empty map value fails when accessed through distributed table HOT 1
- Analyzer: Bad cast from type DB::FunctionNode to DB::ColumnNode. (LOGICAL_ERROR)
- Variant type: "Invalid Field get from type Null to type Array"
- Support dot syntax access field for Map type column HOT 1
- Select Query Raise Error: Size of filter doesn't match size of column HOT 2
- `select count(distinct xx) from tab` does not support external storage ? HOT 2
- Antlr g4 file insertStmt cant parser SQL correctly. HOT 2
- Performance of Alpine vs. Ubuntu based docker images of Clickhouse: any difference? HOT 4
- tcp_keep_alive_timeout seems not set for MySQL connection
- Clickhouse remote table function returns incomplete results HOT 4
- `key16` aggregation method should be two level
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from clickhouse.