A few more notes. I think there was a deleted comment, but I'll leave the reply:
The FixedString(26) version of the ULID still compresses reasonably well, since the extra bits are all zeroes. Getting it down from 26 bytes to 16 is more about uncompressed memory usage and other benefits, such as fitting in 128 bits for memory alignment and being able to convert to and from UInt128 directly.
Put another way, it would be nice to be able to take any UInt128 and get a ULID string from it, which should be possible.
The ULID generation spec, with its 48-bit timestamp and 80 bits of randomness, happens to be a good way to generate well-behaved UInt128 values that compress well. That's what I was trying to get at with the example here, showing how python-ulid can accept the max UInt128 and give this result. ClickHouse could do the same thing with something like UInt128ToULIDString and ULIDStringToUInt128.
Round-trip examples:
select UInt128ToULIDString(340282366920938463463374607431768211455);
-- '7ZZZZZZZZZZZZZZZZZZZZZZZZZ'
select ULIDStringToUInt128('7ZZZZZZZZZZZZZZZZZZZZZZZZZ');
-- 340282366920938463463374607431768211455
select UInt128ToULIDString(0);
-- '00000000000000000000000000'
select ULIDStringToUInt128('00000000000000000000000000');
-- 0
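For reference, the conversion itself is just a base-32 encoding of the integer using the Crockford alphabet from the ULID spec. Here is a minimal Python sketch of the round trip above. It assumes nothing about how ClickHouse would implement this, and the function names are illustrative stand-ins rather than an existing API:

```python
# Crockford base32 alphabet used by the ULID spec (no I, L, O or U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def uint128_to_ulid_string(value: int) -> str:
    """Encode any unsigned 128-bit integer as a 26-character ULID string."""
    if not 0 <= value < (1 << 128):
        raise ValueError("value does not fit in 128 bits")
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))  # most significant character first


def ulid_string_to_uint128(text: str) -> int:
    """Decode a 26-character ULID string back into an unsigned 128-bit integer."""
    if len(text) != 26:
        raise ValueError("a ULID string must be 26 characters long")
    value = 0
    for ch in text.upper():
        if ch not in DECODE:
            raise ValueError(f"invalid ULID character: {ch!r}")
        value = (value << 5) | DECODE[ch]
    # 26 characters carry 130 bits, so the top 2 bits must be zero.
    if value >= (1 << 128):
        raise ValueError("ULID string overflows 128 bits")
    return value


print(uint128_to_ulid_string((1 << 128) - 1))
print(ulid_string_to_uint128("00000000000000000000000000"))
```

Since 26 characters hold 130 bits, the first character can only be 0 through 7, which is why the max UInt128 starts with a 7.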
The process of generating ULIDs would still behave the same as in the spec, so we get those benefits with locality and compression. The internal representation would just be those 128 bits with the behaviour built in to display as a ULID string.
select generateULID();
-- '01HYSRYCB8G8DF3DT60C0K9GX1'
select generateULID();
-- '01HYSRYCB8PNC1T5DA7JRRR66V'
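To make the "128 bits with a ULID display" idea concrete, here is a rough Python sketch of spec-style generation: a 48-bit millisecond timestamp in the high bits and 80 random bits below it, stored as one integer and only rendered as a string for display. The helper names are hypothetical, and this skips the spec's optional monotonic handling for IDs created within the same millisecond:

```python
import os
import time

# Crockford base32 alphabet used by the ULID spec (no I, L, O or U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def generate_ulid_int(timestamp_ms=None) -> int:
    """Build a 128-bit value: 48-bit ms timestamp (high) + 80 random bits (low)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < (1 << 48):
        raise ValueError("timestamp does not fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 10 bytes = 80 bits
    return (timestamp_ms << 80) | randomness


def encode_ulid(value: int) -> str:
    """Render a 128-bit integer as a 26-character ULID string."""
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


value = generate_ulid_int()
print(value)               # the internal 128-bit representation
print(encode_ulid(value))  # the ULID string shown to the user
```

Because the timestamp sits in the most significant bits and the alphabet is in ASCII order, values generated in different milliseconds sort the same way as integers and as strings.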
If it's useful, this could be generalized to do the same thing with UInt256 and strings that are twice as long but otherwise follow the ULID convention. It's a nice way to deal with such long numbers. Hypothetically the same style of generation would work too, with a 48-bit timestamp + 208 random bits, but it's hard to think of a scenario where that would be needed. That could be called a ULID256.
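A quick Python sketch of what that generalization might look like, with the bit width as a parameter. The names are made up for illustration, and "ULID256" is only the hypothetical format described above, not an existing standard:

```python
# Crockford base32 alphabet used by the ULID spec (no I, L, O or U).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_uint(value: int, bits: int) -> str:
    """Encode an unsigned integer of the given width in ULID-style base32."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"value does not fit in {bits} bits")
    length = -(-bits // 5)  # ceil(bits / 5): 26 for 128 bits, 52 for 256 bits
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def decode_uint(text: str, bits: int) -> int:
    """Decode a ULID-style string back into an unsigned integer of the given width."""
    if len(text) != -(-bits // 5):
        raise ValueError("unexpected string length for this width")
    value = 0
    for ch in text.upper():
        if ch not in DECODE:
            raise ValueError(f"invalid character: {ch!r}")
        value = (value << 5) | DECODE[ch]
    if value >> bits:
        raise ValueError(f"string overflows {bits} bits")
    return value


print(len(encode_uint(0, 256)))         # string length for the 256-bit variant
print(encode_uint((1 << 256) - 1, 256))  # max UInt256 in this encoding
```

With 256 bits the string is 52 characters (260 bits of capacity), so only one bit lands in the first character and it can only be 0 or 1.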