
LSMRangeDeletes

Making range deletes better in LSMs

Problem Statement

Checking whether an item has been deleted can be time-consuming because range deletes are stored inside SSTable files. A point lookup first reads the file's metadata to determine whether the key may exist in the file. To decide whether the key is covered by a range delete, the lookup must then read a dedicated range-tombstone block within the file, which costs an additional disk I/O. If the range deletes are instead held in an in-memory buffer, no extra I/O is needed to consult them.

To determine whether an entry exists, the lookup may have to open and read the file unless the file is ruled out by the fence-pointer index and bloom filter. Since fetching data from disk is significantly slower than fetching it from memory, these extra reads slow down the query workload.
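The extra read described above can be sketched with a toy SSTable. The class, block names, and read counter below are illustrative assumptions for counting simulated disk reads, not RocksDB's actual file format or API:

```python
class SSTable:
    """Toy SSTable: point entries plus a separate range-tombstone block."""

    def __init__(self, entries, range_deletes):
        self.entries = dict(entries)              # key -> value
        self.range_deletes = list(range_deletes)  # [(start, end)], end exclusive
        self.reads = 0                            # simulated disk-read counter

    def get(self, key):
        # Read 1: the data block that may contain the key.
        self.reads += 1
        value = self.entries.get(key)
        # Read 2: the range-tombstone block -- needed even when the key was
        # found, because a range delete in the file may cover it.
        self.reads += 1
        for start, end in self.range_deletes:
            if start <= key < end:
                return None  # key is covered by a range tombstone
        return value

sst = SSTable({"b": 1, "m": 2}, [("a", "c")])
assert sst.get("b") is None   # deleted by the tombstone ("a", "c")
assert sst.get("m") == 2
assert sst.reads == 4         # two simulated disk reads per lookup
```

Keeping the range tombstones in memory would remove the second read from every lookup, which is the motivation for the filters below.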

Solution: "Per-Level Range Delete Filter"

Range deletes (RDs) can be stored either on each level (PLRDF) or only on the top level (TopLevel-RDF). In PLRDF, when range deletes are flushed to disk they are merged into the level-0 vector; later, when compaction happens, the corresponding range deletes are moved and merged into the next level's vector. In TopLevel-RDF, range deletes are stored only at the top level, and ranges that intersect the newly inserted files are removed. On a flush or compaction, the newer range deletes from the upper level remove the obsolete entries from the files they are compacted into on the lower level. As a result, if an entry overlaps a range delete tombstone in the same file, the entry must be newer than the tombstone. Sequence numbers are therefore not required: when files overlap across two levels, the files on the upper level are newer, and if an entry and a range delete co-exist in a file, the entry is guaranteed to be live.
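The flush-time merge into a level's vector can be sketched as a standard interval merge. The helper name below is hypothetical, not the project's code:

```python
def merge_ranges(ranges):
    """Merge overlapping or touching [start, end) ranges into a sorted,
    disjoint list -- the shape of one level's range-delete vector."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it in place.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Flushing buffered tombstones into the level-0 vector:
level0 = merge_ranges([("a", "d"), ("c", "f"), ("k", "m")])
assert level0 == [("a", "f"), ("k", "m")]
```

During compaction the same merge can combine a level's vector with the next level's vector before the upper one is discarded.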

  • None. The original RocksDB without an RDF; it performs similarly to PLRDF, although it searches for overlapping range deletes first each time before searching for a specific key in a file.
  • PLRDF. Range deletes are stored on each level and moved down to the next level during compaction. When a point query is blocked on a certain level, it means the key doesn’t exist on the levels below, so disk access is required only on the current level, just as when range deletes are stored inside SST files. It searches for the key in a file first; if the key is found, it is returned directly without searching for an overlapping range delete in the same file.
  • Split-PLRDF. Whenever new entries arrive at a certain level, ranges that overlap with the newly inserted entries are split. This ensures that an entry existing on a certain level won’t be covered by ranges in the RDF. However, this approach results in a larger memory footprint due to fragmented ranges.
  • TopLevel-RDF. RDF is limited to the top level, such that it blocks a query if an entry doesn’t exist in the database. It performs a split function on all incoming entries, leading to a larger memory footprint.
  • Skyline-RDF. All inserted range deletes are maintained in this single RDF. Keys are looked up first and then compared against the RDF. Because we must decide whether a key with an arbitrary sequence number is deleted, each range within this RDF also carries a sequence number, so that it can tell whether a key is truly deleted. Due to this post-filtering, it spends more effort than the previous RDFs: where a query could previously return as soon as a covering range delete was found on some level, it now keeps searching until the key is found or the last level is reached.
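The PLRDF lookup order described above (file first, then the level's range vector) can be sketched as follows. The class and function names are illustrative, and the per-level files are simplified to dictionaries:

```python
class PLRDF:
    """Toy per-level range delete filter: one vector of [start, end)
    tombstone ranges per LSM level (names are illustrative)."""

    def __init__(self, num_levels):
        self.levels = [[] for _ in range(num_levels)]

    def covered(self, level, key):
        return any(s <= key < e for s, e in self.levels[level])

def point_query(plrdf, level_files, key):
    # Walk levels top-down; lower levels hold older data.
    for level, files in enumerate(level_files):
        if key in files:
            # An entry co-existing with a range on this level is newer,
            # so it can be returned without checking the range vector.
            return files[key]
        if plrdf.covered(level, key):
            return None  # blocked: the key is deleted on all older levels below
    return None

rdf = PLRDF(2)
rdf.levels[0] = [("a", "c")]
files = [{"m": 1}, {"b": 2, "z": 3}]
assert point_query(rdf, files, "b") is None   # tombstone on level 0 blocks it
assert point_query(rdf, files, "m") == 1
assert point_query(rdf, files, "z") == 3
```

In the query for "b", the level-0 tombstone blocks the search before the older level-1 file is ever touched, which is the disk access PLRDF saves.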

Contributors

shubham-sudo
