Comments (9)
I'm trying to implement this. While implementing it, I ran into a problem. The command has three steps:
1. Get the set size
2. Get all indices, i.e. compute the random indices to fetch
3. Traverse the set
Steps 1-3 should be done in one snapshot. Otherwise, this can happen:
- Get all indices, size == 10
- The random pick needs to access the 10th member
- A user deletes an element
- The 10th member can no longer be fetched
So GetMetadata and the set traversal should run under one RocksDB snapshot. Do you think this is safe and OK? @git-hulk @PragmaTwice
from kvrocks.
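The race described above can be sketched without RocksDB. This is a minimal illustration, not Kvrocks code: `MemberSet`, `NthMember`, `PickLastWithoutSnapshot`, and `PickLastWithSnapshot` are hypothetical names, a `std::set` stands in for the stored set, a plain copy stands in for a snapshot, and the concurrent delete is simulated inline. In RocksDB itself the frozen view would presumably come from `DB::GetSnapshot()` passed through `ReadOptions::snapshot` for both the metadata read and the iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for a stored set; not a Kvrocks type.
using MemberSet = std::set<std::string>;

// Returns the member at position `index` in iteration order, or nullopt
// if the view holds fewer than index + 1 members.
std::optional<std::string> NthMember(const MemberSet &view, std::size_t index) {
  if (index >= view.size()) return std::nullopt;
  auto it = view.begin();
  std::advance(it, index);
  return *it;
}

// Unsafe ordering: the size is read from one state, the traversal sees a later one.
std::optional<std::string> PickLastWithoutSnapshot(MemberSet &live) {
  std::size_t size = live.size();  // step 1: read the size
  assert(size > 0);
  std::size_t index = size - 1;    // step 2: choose an index (fixed here, random in practice)
  live.erase(live.begin());        // a concurrent delete, simulated inline
  return NthMember(live, index);   // step 3: traversal no longer finds that index
}

// Safe ordering: size, index choice, and traversal all use one frozen view.
std::optional<std::string> PickLastWithSnapshot(MemberSet &live) {
  const MemberSet snapshot = live;    // assumption: a copy models a snapshot
  std::size_t size = snapshot.size();
  assert(size > 0);
  std::size_t index = size - 1;
  live.erase(live.begin());           // the delete still happens on live data
  return NthMember(snapshot, index);  // but the frozen view is unaffected
}
```

With three members, the first function comes back empty after the delete, while the second still returns the last member as of the snapshot.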
I'm wondering if it would be better to narrow the scope of the candidate set to simplify the implementation, like this:
- If size <= N (e.g. 128 or 256), iterate over all members and pick one at random.
- If size > N and the cached random point is empty, pick at random among the first N elements. Otherwise, seek to the cached random point and pick at random among the next N elements. Then cache the new random point, or remove the cached point if the iteration reached the end.
from kvrocks.
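A minimal sketch of the windowed pick proposed above, under stated assumptions: `WindowedPicker`, `window`, and `cursor` are hypothetical names, a `std::set` stands in for the RocksDB iterator over the set's members, and the cursor lives in the struct rather than in any shared cache. Restarting from the beginning when the cursor points past the last member is also an assumption; the proposal does not say what to do in that case.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper; not part of Kvrocks.
struct WindowedPicker {
  std::size_t window;                 // N: maximum number of candidates per call
  std::optional<std::string> cursor;  // cached random point; empty means "start at the beginning"

  template <typename Rng>
  std::optional<std::string> Pick(const std::set<std::string> &members, Rng &rng) {
    if (members.empty()) return std::nullopt;
    assert(window > 0);
    const bool large = members.size() > window;

    // Small sets are scanned from the start; large sets resume at the cursor.
    auto it = members.begin();
    if (large && cursor) it = members.lower_bound(*cursor);
    // Assumption: if everything at or after the cursor was deleted, restart.
    if (it == members.end()) it = members.begin();

    // Collect at most `window` candidates.
    std::vector<std::string> candidates;
    for (; it != members.end() && candidates.size() < window; ++it) {
      candidates.push_back(*it);
    }

    // Cache the next start point, or clear it once the end has been reached.
    if (large && it != members.end()) {
      cursor = *it;
    } else {
      cursor.reset();
    }

    std::uniform_int_distribution<std::size_t> dist(0, candidates.size() - 1);
    return candidates[dist(rng)];
  }
};
```

Each call touches at most N members, at the cost of uniformity: a single call can only return a member from the current window, so the choice is random within that window rather than across the whole set.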
> if size > N, random among the first N elements if the cached random point is empty. Otherwise, seek from the cached random point and then random among the next N elements, then cache the random point again or remove the cache random point if it reaches the end.

I considered this earlier; it is what this issue calls "statistics". But where should we store it, and how would we update it?
from kvrocks.
> but where should we storing this and how do updating this?

We already use an LRU cache to hold the iterator key while scanning the DB/hash/set; maybe we can do the same here.
from kvrocks.
That's a bit complex 🤔. In my current implementation I may just need to iterate the set under one snapshot...
from kvrocks.
> That's a bit complex 🤔, in my current impl I may just need to iter the set with one snapshot...

Yes, it would be more complex than the current implementation. In that case I think the single-snapshot implementation is good.
from kvrocks.
Yeah. Another point is that supporting metadata reads through a snapshot can improve our isolation. It might slightly affect LSM-tree garbage collection, but I think that's OK.
from kvrocks.
Aha: 95301c5#diff-4bae2c37c513c915c0528134abd47edff2a52833806777feb088b6d09990a74e
@git-hulk should we sample at most 60 keys in spop?
from kvrocks.
@mapleFU For Kvrocks, I'm fine with randomizing over only part of them if there are too many keys.
from kvrocks.