Comments (3)
I wonder if the grander idea is to slowly build out a "config diagnostics console" that eventually can evolve into an autotuner.
E.g. hashtable load factor is the metric behind the problem in the issue title. For segment size, we want a max object size to segment size ratio, or internal fragmentation %.
There are generally two ways to approach this objective. One is self-contained: codify in Pelikan some intelligence that runs as a little state machine (or ML agent, if we want to sound trendy) and "scores" the main configuration values based on the metrics they are responsible for, like the few mentioned here. The other is to outsource that work to an external entity and simply curate a stream of events/logs to provide as data. In both cases, though, I suspect the trigger and frequency of the internal action(s) will be somewhat independent of debug logging.
Given we have a very generic and flexible logging backend, we can potentially create a new log type to support this functionality, and gate the logging differently too (e.g. evaluate and/or log hash table load factor only when we have to allocate a new overflow hash bucket; calculate internal fragmentation only when the most recent write wasted more than X% of segment space) to keep it very lightweight.
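The event-gated diagnostics idea above can be sketched roughly as follows. This is an illustrative Python sketch, not Pelikan's actual API; the class, method names, and the 20% fragmentation threshold are all assumptions made for the example.

```python
class ConfigDiagnostics:
    """Sketch of config diagnostics gated behind rare events, so the
    hot paths pay (almost) nothing. Hypothetical names throughout."""

    def __init__(self, nbucket, frag_threshold=0.2):
        self.nbucket = nbucket            # primary hash buckets
        self.frag_threshold = frag_threshold  # assumed X% = 20%
        self.noverflow = 0                # overflow buckets allocated
        self.nitem = 0                    # items currently stored

    def on_overflow_alloc(self):
        # Evaluate the load factor only when forced to chain a new
        # overflow bucket, instead of on every insert.
        self.noverflow += 1
        load = self.nitem / self.nbucket
        print(f"hash load factor {load:.2f}, "
              f"{self.noverflow} overflow bucket(s)")

    def on_write(self, wasted, seg_size):
        # Score internal fragmentation only when this write wasted
        # more than frag_threshold of the segment.
        self.nitem += 1
        if wasted / seg_size > self.frag_threshold:
            print(f"write wasted {wasted}/{seg_size} bytes "
                  f"({100 * wasted / seg_size:.0f}% of segment)")
```

The design point is that the metric is computed exactly when the triggering event already signals a potential problem, which also makes the resulting log stream a natural feed for an external tuner.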
from pelikan.
That'd be a big improvement. I wonder if in the interim we should just adjust some of our default values, maybe making them match what we currently have in the example config. The current default is hash_power = 16 with an overflow factor of 1.0, so effectively we have only 114688 item slots if launched without a config file.
I guess as an alternative, we could make the config file be a mandatory argument.
My biggest immediate concern is that the "no config provided" defaults are so conservative that it's easy to run into problems.
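One plausible reading of the 114688 figure, as a back-of-the-envelope check. The layout assumptions here (2^hash_power raw slots, buckets of 8 slots with one reserved for bucket metadata, and overflow factor 1.0 permitting an equal amount of chained capacity) are guesses for illustration, not verified against the source:

```python
# Assumed layout; every named quantity below is an assumption.
hash_power = 16
overflow_factor = 1.0

raw_slots = 1 << hash_power          # 65536 raw slots
buckets = raw_slots // 8             # 8192 buckets of 8 slots each
usable = buckets * 7                 # 7 item slots per bucket
item_slots = int(usable * (1 + overflow_factor))
print(item_slots)                    # 114688
```

Whatever the exact layout, the point stands: with the no-config defaults, capacity is on the order of 100K items, which is easy to exhaust.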
Agree on improving the current defaults. Lacking a config file, I will probably base default values on a presumed average key-value size, e.g. 1KB (we can add an internal constant called TARGET_OBJECT_SIZE). So if people ask for 4GB of data memory, we assume we will have 4 million objects.
Related (but no action needed now): we also have target sizes for read and write buffers. The current value (16KB) agrees with a 1KB object size under moderate pipelining. Eventually we could provide a calculator and config generator that sets multiple parameters based on a few key assumptions (object size; object life cycle, i.e. creation rate and desirable TTL; concurrency level) that map closer to users' mental model of caching.
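A minimal sketch of the kind of derivation such a generator could do, starting from TARGET_OBJECT_SIZE. The constant, the function, and the rule of "at least one hash slot per expected object" are hypothetical, not Pelikan's actual defaults:

```python
import math

# Hypothetical: presumed average key-value size of 1 KB.
TARGET_OBJECT_SIZE = 1024

def derive_defaults(data_memory_bytes):
    """Derive config defaults from requested data memory (sketch)."""
    nobj = data_memory_bytes // TARGET_OBJECT_SIZE
    # Pick a hash_power large enough for one slot per expected object,
    # never going below the current default of 16.
    hash_power = max(16, math.ceil(math.log2(nobj)))
    return {"hash_power": hash_power, "estimated_objects": nobj}

print(derive_defaults(4 * 2**30))
# 4 GB / 1 KB -> 4194304 objects, hash_power 22
```

A fuller generator would take the other assumptions mentioned (object life cycle, concurrency level) and emit buffer sizes and segment parameters the same way.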
Related Issues (16)
- Publish on crates.io
- Add basic Redis compatible backend
- Add storage library to support datastructures
- Add support for ssd/nvme storage
- Add support for tiered storage
- Add support for drop-in replacement of guava
- Add support for unix domain sockets
- `Set` command option's like 'EX 1000' and NX failing to parse
- benchmark: part of threads never stopped when run with memtier_benchmark
- Add fuzzer for RESP protocol
- Resp protocol parser crashes when given a command ending with a newline
- Question about segcache eviction policy
- Segcache integration test is flaky within MacOS CI
- Handle `SIGINT` in segcache
- Building Dockerfile results in missing protoc error