(copied from <a href="https://bitbucket.org/eyalroz/db-kernel-testbench/issues/163/sup

Support segmentation and non-segmentation in more decompression kernels (libgiddy; status: OPEN)

eyalroz commented on September 28, 2024

Support segmentation and non-segmentation in more decompression kernels

from libgiddy.

eyalroz commented on September 28, 2024

For the DICT scheme, we'll need to choose between uniformity and flexibility.

At the uniform extreme of the spectrum, we'll have:

- Fixed-size dictionaries
- A fixed element size per dictionary
- A complete new dictionary copied in for every segment of the compressed data, even if it is nearly identical to the previous segment's dictionary
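To make the uniform extreme concrete, here is a minimal host-side C++ sketch. Everything in it is an assumption for illustration rather than libgiddy's actual API: the names (`uniform_dictionary_size`, `uniform_segment_length`, `uniform_dictionary_t`, `decompress_uniform`), the 256-entry dictionary, the 32-bit element type and the 8-bit index type are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed parameters; the discussion above does not pin any of them down.
constexpr std::size_t uniform_dictionary_size = 256; // entries in every dictionary
constexpr std::size_t uniform_segment_length  = 4;   // compressed elements per segment

using uniform_element_t = std::uint32_t; // same element size for all dictionaries
using uniform_index_t   = std::uint8_t;  // can never exceed 255, so always in range

using uniform_dictionary_t = std::array<uniform_element_t, uniform_dictionary_size>;

// Segment i always decodes against its own full copy, dictionaries[i],
// regardless of how similar it is to dictionaries[i - 1].
std::vector<uniform_element_t> decompress_uniform(
    const std::vector<uniform_index_t>&      compressed,
    const std::vector<uniform_dictionary_t>& dictionaries)
{
    // One dictionary is required per (possibly partial) segment.
    assert(dictionaries.size() * uniform_segment_length >= compressed.size());

    std::vector<uniform_element_t> decompressed(compressed.size());
    for (std::size_t pos = 0; pos < compressed.size(); ++pos) {
        const uniform_dictionary_t& dict = dictionaries[pos / uniform_segment_length];
        decompressed[pos] = dict[compressed[pos]];
    }
    return decompressed;
}
```

The upside of this layout is that locating a segment's dictionary is pure arithmetic; the cost is that every segment pays for a full dictionary, however little it differs from its neighbour's.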

And at the flexible extreme (or close to it), we'll have:

- A variable-length, variable-width array of dictionary entry data
- For each segment, a dictionary descriptor containing:
  - The position at which the dictionary begins within the dictionary entry data
  - The dictionary's length (number of entries)
  - (Possibly) the dictionary index size, in bytes or in bits. This could theoretically be deduced from the dictionary's length, but that perhaps depends too much on the decompressing software's capabilities

Note that a segment might simply refer to the same dictionary as its predecessor; we might even allow it to expand its predecessor's dictionary by starting at the same place and extending further.
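A rough C++ sketch of the flexible layout might look as follows. All names here (`segment_dictionary_descriptor`, `min_index_bits`, `decompress_flexible`) are hypothetical, not existing libgiddy identifiers. For brevity it also assumes 32-bit entries and indices that have already been unpacked to 32 bits, so the variable-width aspect is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-segment descriptor; field names are illustrative only.
struct segment_dictionary_descriptor {
    std::size_t   offset;     // where the dictionary begins in the shared entry array
    std::uint64_t length;     // number of entries
    unsigned      index_bits; // (possibly) stored explicitly instead of being deduced
};

// Smallest index width, in bits, able to address `length` entries.
// Assumes at least 1 bit is used, even for a single-entry dictionary.
unsigned min_index_bits(std::uint64_t length)
{
    unsigned bits = 1;
    while (bits < 64 && (std::uint64_t{1} << bits) < length) { ++bits; }
    return bits;
}

std::vector<std::uint32_t> decompress_flexible(
    const std::vector<std::uint32_t>&                 indices, // already unpacked
    const std::vector<std::uint32_t>&                 entries, // all dictionaries' data
    const std::vector<segment_dictionary_descriptor>& descriptors,
    std::size_t                                       segment_length)
{
    std::vector<std::uint32_t> decompressed(indices.size());
    for (std::size_t pos = 0; pos < indices.size(); ++pos) {
        const segment_dictionary_descriptor& d = descriptors[pos / segment_length];
        assert(indices[pos] < d.length);               // index within this dictionary
        assert(d.offset + d.length <= entries.size()); // dictionary within entry data
        decompressed[pos] = entries[d.offset + indices[pos]];
    }
    return decompressed;
}
```

With this representation, a segment reuses its predecessor's dictionary by repeating the same `offset` and `length`, and expands it by keeping `offset` while increasing `length`; neither case copies any entry data.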

I'm leaning toward the more flexible extreme.
