
Comments (3)

petermattis commented on August 20, 2024

The original message for this issue is about compaction heuristics, but there is also the compaction policy/strategy to consider. Pebble currently implements a tiered+leveled strategy: L0 has N tiers, while L1 through L6 use a leveled compaction strategy. This is the same default strategy that RocksDB uses. There appear to be better strategies that dominate tiered+leveled on both read-amplification and write-amplification (though space-amplification suffers). The balanced rent-or-buy strategy is one example, and its benefit to write-amplification can be very large. The table below compares the write amplification of tiered+leveled, lazy-leveling, and balanced-rent-or-buy at various level counts:

levels    tiered+leveled    lazy-leveling    balanced-rent-or-buy
     2            8192.5           8192.5                   119.9
     3            8193.5           4096.5                    33.8
     4             342.8           4097.5                    18.2
     5              76.9           2049.5                    13.2
     6              73.9           1367.2                    10.1
     7              36.6            912.3                     8.5
     8              26.8            913.3                     7.9
     9              25.8            458.4                     7.0
    10              22.4            306.1                     6.4
    20              21.7             15.8                     4.4
    30              31.5              7.7                     3.8

We'd never actually want to use 30 levels, but lazy-leveling would require that many to reach competitive write-amplification numbers. Note that the numbers above come from an append-only simulation. It still remains to extend the simulation to account for updates and a steady-state data size.
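
For intuition about where such numbers come from, here is a minimal append-only write-amplification simulator in Go for a purely leveled LSM with an assumed growth factor of 10. This is a toy model with made-up parameters, not the simulator that produced the table above:

```go
package main

import "fmt"

// Toy append-only model of purely leveled compaction: each level holds at
// most growthFactor times the previous level, and an overfull level is
// rewritten into the level below it. Parameters are illustrative.
func main() {
	const (
		numLevels    = 7
		growthFactor = 10
		flushes      = 1_000_000
	)
	// limit[i] is the maximum size of level i (in flushed units) before
	// it is compacted into level i+1.
	limit := make([]int, numLevels)
	limit[0] = 4
	for i := 1; i < numLevels; i++ {
		limit[i] = limit[i-1] * growthFactor
	}
	size := make([]int, numLevels)
	written := 0
	for f := 0; f < flushes; f++ {
		size[0]++
		written++ // the flush itself writes one unit
		for i := 0; i < numLevels-1; i++ {
			if size[i] > limit[i] {
				// Merging level i into level i+1 rewrites the bytes of
				// both levels (append-only: output = inputs combined).
				written += size[i] + size[i+1]
				size[i+1] += size[i]
				size[i] = 0
			}
		}
	}
	fmt.Printf("write amplification ≈ %.1f\n", float64(written)/float64(flushes))
}
```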

Chapter 6 of Steven Jacobs' PhD thesis goes over various compaction (merge) policies.


msbutler commented on August 20, 2024

Here's an explanation of Bigtable's balanced rent-or-buy compaction policy and a proposal for applying it to Pebble.

Bigtable Rent-or-Buy Explanation

To best understand it, first free your mind of Pebble. Consider a storage engine that organizes potentially overlapping SSTs on a stack. The overlying DB flushes SSTs to the top of the stack, and the engine serves reads by iterating through these SSTs. There's no notion of levels, just files in a stack.

To prevent unfettered growth of the stack and huge read amplification, the engine compacts files on the stack. To choose files for compaction, the engine pops files off the top of the stack until a size threshold is met. The engine then merges those files and pushes the new file back onto the stack. This implies that newer files get compacted more frequently, and older files get compacted iff all newer files are also involved. So, if the file at the bottom of the stack gets compacted, the stack reduces to one file after the compaction (assuming no flushes occur during the compaction).
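
A minimal Go sketch of this stack engine may help; the type and method names here are illustrative, not Bigtable's or Pebble's:

```go
package main

import "fmt"

// engine is a toy model of the stack engine described above.
type engine struct {
	stack []int // file sizes; the last element is the top of the stack
}

// flush pushes a newly flushed file onto the top of the stack.
func (e *engine) flush(size int) {
	e.stack = append(e.stack, size)
}

// compact pops files off the top until their combined size meets the
// threshold, merges them, and pushes the merged file back.
func (e *engine) compact(threshold int) {
	merged := 0
	for len(e.stack) > 0 && merged < threshold {
		merged += e.stack[len(e.stack)-1]
		e.stack = e.stack[:len(e.stack)-1]
	}
	e.stack = append(e.stack, merged)
}

func main() {
	e := &engine{}
	for _, sz := range []int{5, 3, 2, 1} {
		e.flush(sz)
	}
	e.compact(4)         // pops 1, 2, 3 (combined size 6 ≥ 4), pushes 6
	fmt.Println(e.stack) // [5 6]
}
```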

The engine’s high-level goal is to minimize total compaction cost (write amp) while maintaining the constraint that at most K files remain on the stack after a compaction (an upper bound on read amp). So, if the stack had K+J files before a compaction, the engine must compact at least J+1 of them: the merged output itself stays on the stack, leaving at most K files. A compaction’s cost is measured as the total length of the files involved.

Now, how does the engine choose how many files to compact? At a high level, it weighs the following tradeoff: the more files it pops into a given compaction, the lower future write and read amp become, but the more expensive the merge itself is. This tradeoff is where the rent-or-buy framing comes from: buying is analogous to conducting a big, expensive compaction now to lower future write/read amp; renting is the opposite.

Here’s the actual algorithm:

  1. Assume there are K + J files in the stack. To maintain the K-file constraint, the engine will certainly compact the Kth file and all J files above it. The engine must then decide whether to include the (K-1)th file:
    • Don’t include the (K-1)th file if the current compaction (without it) would cost less than the compaction that created the (K-1)th file. Execute the compaction.
    • Else, include the (K-1)th file in the compaction, and consider adding the (K-2)th file by the same test: add it if the current compaction cost is greater than the candidate file’s original compaction cost. Note: in the general case, the current compaction cost equals the cost of ALL previous compactions to the right of the candidate file. The engine could continue recursing until all files compact into one! Because of tombstones, the cost of a compaction isn’t necessarily the size of the file it creates. (A Go sketch of this decision loop follows the list.)
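
Here is a hedged Go sketch of that decision loop, under one reading of the costs above: the running cost of the current compaction is the total length of the files popped so far, and each file remembers the cost of the compaction that created it. The type, fields, and function are hypothetical, not Bigtable or Pebble code:

```go
package main

import "fmt"

// file is a hypothetical record of an SST on the stack.
type file struct {
	size       int // total length of the file
	createCost int // cost of the compaction that produced this file
}

// pickCompaction returns how many files to pop off the top of the stack,
// given the read-amp bound maxFiles (K). stack[0] is the oldest file.
func pickCompaction(stack []file, maxFiles int) int {
	n := len(stack)
	if n <= maxFiles {
		return 0 // constraint already satisfied
	}
	// Compacting m files leaves n-m+1 files, so m must be at least
	// n-maxFiles+1 (the Kth file and the J files above it).
	take := n - maxFiles + 1
	cost := 0
	for i := n - take; i < n; i++ {
		cost += stack[i].size
	}
	// "Buy": extend the compaction downward while it already costs more
	// than the compaction that created the next-older candidate file.
	for take < n {
		cand := stack[n-take-1]
		if cost <= cand.createCost {
			break // "rent": stop here and execute the compaction
		}
		cost += cand.size
		take++
	}
	return take
}

func main() {
	// Oldest to newest; the older files were created by expensive merges.
	stack := []file{{100, 90}, {40, 35}, {10, 8}, {4, 0}, {2, 0}, {1, 0}}
	fmt.Println(pickCompaction(stack, 4)) // 3: the cheap newest files only
}
```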

Here’s a screenshot from a slide deck that illustrates the algorithm in action. As you can see, each file’s size roughly exceeds the sum of the file sizes to its right. (The slides explain the algorithm starting from the bottom of the stack instead; a “big merge” involves all files in the stack.)

[screenshot: slide illustrating the rent-or-buy merge decision on a stack of files]

Applying To Pebble

Suppose we always choose a file to compact from L0. We could then find the cleanly overlapping files in LBase (and a few more in L0) as we do now, and then consider adding overlapping files from LBase+1, LBase+2, ..., using a metric similar to the one above. Specifically:

  1. Consider the files in candidate level i that overlap the keyspace of the current compaction. Sum the compaction costs incurred to create these candidate files, C^cand_i.
  2. Add all of the candidate files if C^cand_i < C_curr, where C_curr, the compaction's current cost, is the sum of the costs of ALL compactions required to create the compaction's current input files and their ancestors. So for each file we must maintain two new pieces of information: the cost of the compaction that created it, and a running sum of the costs of all compactions required to create this file and the files that merged into it (see the sketch after this list).
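
A minimal sketch of that bookkeeping and the inclusion test, with hypothetical names (Pebble's real table metadata carries no such fields):

```go
package main

import "fmt"

// tableMeta is hypothetical per-sstable bookkeeping for this proposal.
type tableMeta struct {
	createCost   int // cost of the compaction that produced this table
	ancestorCost int // createCost plus the ancestorCost of every input merged in
}

// currCost is C_curr: the summed ancestor costs of the compaction's
// current input tables.
func currCost(inputs []tableMeta) int {
	c := 0
	for _, t := range inputs {
		c += t.ancestorCost
	}
	return c
}

// shouldAddLevel reports whether to pull a level's overlapping tables into
// the compaction: yes if C^cand_i, the cost paid to create the candidates,
// is less than the compaction's current cost C_curr.
func shouldAddLevel(candidates []tableMeta, curr int) bool {
	cand := 0
	for _, t := range candidates {
		cand += t.createCost
	}
	return cand < curr
}

func main() {
	inputs := []tableMeta{{createCost: 10, ancestorCost: 25}, {createCost: 4, ancestorCost: 4}}
	next := []tableMeta{{createCost: 12, ancestorCost: 40}}
	fmt.Println(shouldAddLevel(next, currCost(inputs))) // true: 12 < 29
}
```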

To maintain a target size ratio across levels, the compaction heuristic must be slightly tweaked (i.e. trade some write amp to reduce space amplification). Add a scalar A_i to the heuristic: add files from level i if C^cand_i < A_i * C_curr. I haven’t worked out how to compute A_i yet, though I can if this general approach is worth pursuing!
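
In code the tweak is a one-line change to the inclusion test, shown here as a fragment that would slot into the sketch above; how to derive the scalar a (A_i) is left open, as noted:

```go
// shouldAddLevelScaled extends the sketch above with the per-level scalar
// A_i: candCost and curr are the C^cand_i and C_curr sums as before.
func shouldAddLevelScaled(candCost, curr int, a float64) bool {
	return float64(candCost) < a*float64(curr)
}
```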

Overall, this proposal looks like a multi-input compaction strategy with a new heuristic and the constraint that the input file always comes from L0. #136


github-actions commented on August 20, 2024

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to Pebble!


