Giter VIP home page Giter VIP logo

Comments (13)

Ryanfsdf avatar Ryanfsdf commented on August 20, 2024 1

The only part I didn't get is the discussion around ignoring UpperBound when SeekPrefixGE is used. Let's say I want to lookup user key "a" with timestamp > 100 (the timestamps are ordered descending per our comparator). My intuition would be to set UpperBound = "a@100" and call SeekPrefixGE("a@MaxTimestamp"). I'd expect it to both use bloom filter and not return keys at or above the upper bound.

Hi Andrew, I initially stated in my PR that {Lower,Upper}Bound would be ignored but I later revised it so that {Lower,Upper}Bound are both respected. So your assumption that it would both use bloom filters and not return keys at or above the upper bound is correct.

However, in the current state of Pebble, setting UpperBound requires creating a new iterator (Which doesn't seem ideal if we want to set UpperBound on the fly, like what you mentioned is done in libroach).

from pebble.

petermattis avatar petermattis commented on August 20, 2024

@tbg This is done, right?

from pebble.

tbg avatar tbg commented on August 20, 2024

from pebble.

petermattis avatar petermattis commented on August 20, 2024

I think this would make a reasonable starter project. What remains to be done is to implement IterOptions.Prefix. The comment above that field says:

// If Prefix is true, the iterator will only be used to iterate over keys
// matching that of the key it is first positioned at. If the Comparer was
// supplied with a user-defined Split function and bloom filters are
// enabled, this allows for improved performance by skipping SSTables known
// not to contain the given prefix. The iterator will not properly observe
// keys not matching the prefix.
//
// TODO(tbg): should an assertion trip if the first key's prefix is unstable?
// TODO(tbg): should Prefix override (or sharpen) {Lower,Upper}Bound? When
// we see the first key, we get the prefix and a separator which should be
// a good {Lower,Upper}Bound.
// TODO(tbg): unimplemented.
// Prefix bool

from pebble.

Ryanfsdf avatar Ryanfsdf commented on August 20, 2024

@petermattis @tbg

I'm wondering if IterOptions.Prefix should even be required, or if the existence of Split(Prefix extractor) should indicate that all iterators for the Reader use prefix iteration. This seems to be the approach that RocksDB is taking now (as stated here in the “What’s Changed” section). I don’t like the idea of Options.Comparer.Split being a prerequisite for IterOptions.Prefix, since setting Options.Comparer.Split already communicates that prefix iterators should be used.

The benefit that IterOptions.Prefix offers is that we can still do full key iteration without creating a new Reader, which comes at the cost of the clarify of Options and IterOptions. I don’t think this benefit is worth the cost, but I’m open to suggestions.

Also, I think {Lower,Upper}Bound can be ignored for prefix iterators, since I don’t think there’s a use case for specifying {Lower,Upper}Bound when iterating through a specific prefix. As mentioned here, it seems that CockroachDB never sets {Lower,Upper}Bound for prefix iterators anyways.

In terms of First() and Last(), I’m in favor of leaving them unimplemented. One use case of First() and Last() for prefix iterators is that we may support these methods only if Seek() has already been called, in which case we can let First() return the first key with the prefix and Last() return the last key with the prefix. However, this would change the function contract to require a call to Seek() before calling First() or Last(), which I don’t particularly like. Supporting this would further increase the divide between regular Iterators and Prefix Iterators, in which case we should create a new class PrefixIter entirely, instead of making it an option.

from pebble.

petermattis avatar petermattis commented on August 20, 2024

I'm wondering if IterOptions.Prefix should even be required, or if the existence of Split(Prefix extractor) should indicate that all iterators for the Reader use prefix iteration. This seems to be the approach that RocksDB is taking now (as stated here in the “What’s Changed” section). I don’t like the idea of Options.Comparer.Split being a prerequisite for IterOptions.Prefix, since setting Options.Comparer.Split already communicates that prefix iterators should be used.

Note that we don't want all iterators to be prefix iterators as most scan operations need to find multiple keys, not just keys matching the prefix of the seek key. In CRDB, we specify ReadOptions.total_order_seek to disable prefix iteration. My preference is for prefix iteration to be the opt-in mode. @ajkr looks like he had input into the new RocksDB API. Perhaps he can shed light on why it was chosen.

The benefit that IterOptions.Prefix offers is that we can still do full key iteration without creating a new Reader, which comes at the cost of the clarify of Options and IterOptions. I don’t think this benefit is worth the cost, but I’m open to suggestions.

Most of the time, the Reader is the DB, so "creating a new Reader" is not even an option.

Also, I think {Lower,Upper}Bound can be ignored for prefix iterators, since I don’t think there’s a use case for specifying {Lower,Upper}Bound when iterating through a specific prefix. As mentioned here, it seems that CockroachDB never sets {Lower,Upper}Bound for prefix iterators anyways.

Agreed.

In terms of First() and Last(), I’m in favor of leaving them unimplemented.

Agreed.

from pebble.

ajkr avatar ajkr commented on August 20, 2024

Note that we don't want all iterators to be prefix iterators as most scan operations need to find multiple keys, not just keys matching the prefix of the seek key. In CRDB, we specify ReadOptions.total_order_seek to disable prefix iteration. My preference is for prefix iteration to be the opt-in mode. @ajkr looks like he had input into the new RocksDB API. Perhaps he can shed light on why it was chosen.

I don't recall discussions around the total_order_seek / prefix_same_as_start API. It must've been decided a very long time ago. I do not know the reasons for it either, but forgetting to set total_order_seek=true is incredibly common, indicating the API needs improvement. Fortunately it'll be getting the attention it deserves soon via making use of iterate_{lower,upper}_bound together with the Seek / SeekForPrev targets to automatically infer whether the user is doing a prefix scan or larger range scan.

from pebble.

petermattis avatar petermattis commented on August 20, 2024

@ajkr How do you feel about the proposed API in this PR: Iterator.SeekPrefixGE? This makes it clear that prefix iteration is being requested, and allows to switch between prefix and non-prefix iteration using the same iterator.

from pebble.

ajkr avatar ajkr commented on August 20, 2024

Taking a look now. Sorry I got addicted to the release problem for too long this week.

from pebble.

ajkr avatar ajkr commented on August 20, 2024

It's a great API. I can barely think of how to misuse it, but let's see :).

The only part I didn't get is the discussion around ignoring UpperBound when SeekPrefixGE is used. Let's say I want to lookup user key "a" with timestamp > 100 (the timestamps are ordered descending per our comparator). My intuition would be to set UpperBound = "a@100" and call SeekPrefixGE("a@MaxTimestamp"). I'd expect it to both use bloom filter and not return keys at or above the upper bound.

from pebble.

ajkr avatar ajkr commented on August 20, 2024

Do we plan for UpperBound to be dynamically changeable? libroach currently changes iterate_upper_bound before a Seek (which might or might not be safe), so I wonder whether we'd need to use Pebble in that way as well.

from pebble.

petermattis avatar petermattis commented on August 20, 2024

Do we plan for UpperBound to be dynamically changeable? libroach currently changes iterate_upper_bound before a Seek (which might or might not be safe), so I wonder whether we'd need to use Pebble in that way as well.

However, in the current state of Pebble, setting UpperBound requires creating a new iterator (Which doesn't seem ideal if we want to set UpperBound on the fly, like what you mentioned is done in libroach).

I had forgotten that libroach was adjusting the upper-bound for an iterator on the fly. Let's hold off on supporting that in Pebble for now until I figure out why that is being done and if there is a better approach.

from pebble.

Ryanfsdf avatar Ryanfsdf commented on August 20, 2024

This was fixed by #139.

from pebble.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.