<a href="https://github.com/petermattis/pebble/blob/31f4d2c3b39e95b7027b2c72bb2701995a

I believe our merges are associative <p di

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

iterators deviating from RocksDB merge semantics in reverse iteration about pebble HOT 11 CLOSED

tbg commented on August 20, 2024

iterators deviating from RocksDB merge semantics in reverse iteration

from pebble.

Comments (11)

petermattis commented on August 20, 2024

I haven't looked closely at the code, but in forward iteration we merge: ((T3 T2) T1). In reverse iteration we merge (T3 (T2 T1)). At least, that is the idea. This requires the merge operator to be associative. I thought that was called out somewhere in the doc comments, but perhaps it isn't. RocksDB allows non-associative merge operators.

Since floating point operations don't commute, this would lead to different values observed depending on the direction of iteration.

Maybe that's OK, but it seems very awkward. OTOH, buffering all of the internal versions of a key for a merge is, too.

Buffering all of the versions is quite awkward. My feeling is that it is better to restrict what a merge operator can do.

from pebble.

tbg commented on August 20, 2024

At least, that is the idea. This requires the merge operator to be associative.

This restriction makes sense. Alas, floating point operations are not associative neither, so there's a problem with our usage. Or perhaps I missed something and we're not actually using floating point ops in the merges.

from pebble.

petermattis commented on August 20, 2024

This restriction makes sense. Alas, floating point operations are not associative neither, so there's a problem with our usage. Or perhaps I missed something and we're not actually using floating point ops in the merges.

I believe our merges are associative because they only concatenate floating point arrays and don't actually perform any floating point operations. Additionally, I believe we never use reverse iteration to retrieve timeseries data.

from pebble.

tbg commented on August 20, 2024

Additionally, I believe we never use reverse iteration to retrieve timeseries data.

That's correct, but means little more that we wouldn't hit this inconsistency.

I believe our merges are associative

I'm not sure. There's so much code that I'd be surprised.

Either way, I'm also not sure anything needs to be done here other than keeping it in mind. I agree that the merge operators pebble supports should be associative, but it also seems that the semantics in the absence of that are only vaguely undefined.

from pebble.

petermattis commented on August 20, 2024

I believe our merges are associative

I'm not sure. There's so much code that I'd be surprised.

I'm pretty sure we had to make timeseries merges associative as before that we ran into subtle issues that were caught by the replica consistency checks. See MergeTimeSeriesValues. I'm not quite sure why full merges are handled differently than partial merges.

Either way, I'm also not sure anything needs to be done here other than keeping it in mind. I agree that the merge operators pebble supports should be associative, but it also seems that the semantics in the absence of that are only vaguely undefined.

I wonder if we can detect a non-associative merge operator, perhaps only in race-enabled code or something like that.

from pebble.

petermattis commented on August 20, 2024

@ajkr Do you have any thoughts on this issue? It sure is nice that Pebble doesn't need to buffer all of the values for merging as RocksDB does. But the downside is a subtle difference in behavior between forward and reverse iteration. My preference is to document this subtlety. What are your thoughts?

from pebble.

ajkr commented on August 20, 2024

Hm it is difficult to tell whether that merge operator is associative or not. I would guess it is associative after reading the code, even considering that the merge logic behaves differently depending on whether it's doing a full merge or a partial merge. My guess is that code path is an optimization but we'd have to delete it and test to verify.

Side note: We should migrate to FullMergeV2 as it avoids making copies of merge operands.

from pebble.

ajkr commented on August 20, 2024

I'm not sure about documentation being sufficient for having an API that behaves non-intuitively (I may have done that before :)).

Also how does it know when to start merging when scanning backwards? Consider the below situation:

x#40,kTypeMerge: +1, x#30,kTypeValue: 2, x#20,kTypeMerge:+2, x#10,kTypeMerge: +3

If we're scanning backwards we'll first see merge ops +3, +2 and merge them together to get +5. Then we'll see the base value 2, and since what we've computed so far became irrelevant do we throw it away? Then we continue and merge +1 with 2 to get a final value of 3.

from pebble.

petermattis commented on August 20, 2024

I'm not sure about documentation being sufficient for having an API that behaves non-intuitively (I may have done that before :)).

Yeah, that's why it is useful to have someone else sanity check the API.

The commentary around full vs partial merges is hardly intuitive either. I wonder what an intuitive merge API looks like. My instinct is to leave the order of merge operations unspecified so that the user doesn't rely on that leaving flexibility to the implementation.

Also how does it know when to start merging when scanning backwards? Consider the below situation:

x#40,kTypeMerge: +1, x#30,kTypeValue: 2, x#20,kTypeMerge:+2, x#10,kTypeMerge: +3

If we're scanning backwards we'll first see merge ops +3, +2 and merge them together to get +5. Then we'll see the base value 2, and since what we've computed so far became irrelevant do we throw it away? Then we continue and merge +1 with 2 to get a final value of 3.

That is exactly what happens: a kTypeValue entry "resets" the merge buffer overwriting whatever it contains.

from pebble.

ajkr commented on August 20, 2024

Hm it is difficult to tell whether that merge operator is associative or not. I would guess it is associative after reading the code, even considering that the merge logic behaves differently depending on whether it's doing a full merge or a partial merge. My guess is that code path is an optimization but we'd have to delete it and test to verify.

I tried commenting out the lines specific to partial merges. Then all merges took the same code path that formerly only full merges took. Ran the tests in "pkg/storage/engine" and "pkg/ts". They all pass so I still think it is associative.

The commentary around full vs partial merges is hardly intuitive either. I wonder what an intuitive merge API looks like. My instinct is to leave the order of merge operations unspecified so that the user doesn't rely on that leaving flexibility to the implementation.

Got it. My intuition is totally biased from working on RocksDB, but my default would be merges start from the base value and work their way towards newer values. That could be achieved by repeatedly calling a function that takes a pair of arguments - starting with the base value (which could be a Put or the oldest merge operand) and the next oldest value (which would be a merge operand). Or it could be enabled by passing a list of values to FullMerge as long as the order is specified and the list includes the base value.

To me that feels right because people might want to use merge operator like in the JSON example here: https://github.com/facebook/rocksdb/wiki/Merge-Operator#associativity-vs-non-associativity. In that example the base value and merge operands held different kinds of values, so it's a bit strange. But it also helps with cases like Tobias mentioned where all values are floating point but still the operator cannot be associative.

After writing it out, I don't see much problem with losing the ability to support those cases, so don't have a strong opinion either way. To me it'd be mildly surprising to receive merge operands in different orders depending on which way the iterator goes, but certainly I could learn to work with it :).

from pebble.

petermattis commented on August 20, 2024

To me that feels right because people might want to use merge operator like in the JSON example here: https://github.com/facebook/rocksdb/wiki/Merge-Operator#associativity-vs-non-associativity. In that example the base value and merge operands held different kinds of values, so it's a bit strange. But it also helps with cases like Tobias mentioned where all values are floating point but still the operator cannot be associative.

The JSON example seems pretty esoteric to me. I don't mind not supporting it in Pebble. It isn't something we need in CockroachDB and that is the bar for inclusion: "Pebble intentionally does not aspire to include every feature in RocksDB and is specifically targeting the use case and feature set needed by CockroachDB".

After writing it out, I don't see much problem with losing the ability to support those cases, so don't have a strong opinion either way. To me it'd be mildly surprising to receive merge operands in different orders depending on which way the iterator goes, but certainly I could learn to work with it :).

In RocksDB, if a merge operator supports PartialMerge then the merges can be processed in an arbitrary order depending on compactions.

I'm going to improve the documentation around the merge operator, and then call it a day on this issue.

from pebble.

iterators deviating from RocksDB merge semantics in reverse iteration about pebble HOT 11 CLOSED

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent