Commands are currently processed synchronously in the channel thread, with one thread per channel. This limits parallelism on setups that open very few Sonic Channel instances while running Sonic on many CPUs. We would like to make the channel protocol fully asynchronous, so that Sonic can dispatch commands to a thread pool for execution.
The remaining question is how to rework the protocol so that the RES to a given REQ can be received later on, perhaps with a marker ID, as search queries already do with PENDING.
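A minimal sketch of the marker-ID idea follows. It assumes the channel answers every request immediately with `PENDING <marker>` and later receives the result tagged with that marker, much like search queries do today. `Dispatcher`, `Event` and the counter-based hex markers are hypothetical names chosen for illustration. A fixed pool of `std::thread` workers pulls jobs from a shared `mpsc` queue, and the command execution itself is a placeholder.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// An asynchronous response, tagged with the marker of the request it answers.
#[derive(Debug, PartialEq)]
pub struct Event {
    pub marker: String,
    pub payload: String,
}

/// A queued job: (marker, raw command line).
type Job = (String, String);

/// Hypothetical dispatcher backed by a fixed-size worker pool.
pub struct Dispatcher {
    next_id: u64,
    jobs_tx: Sender<Job>,
}

impl Dispatcher {
    /// Spawn `workers` threads; results are delivered on the returned receiver.
    pub fn new(workers: usize) -> (Self, Receiver<Event>) {
        let (jobs_tx, jobs_rx) = mpsc::channel::<Job>();
        let (events_tx, events_rx) = mpsc::channel::<Event>();
        let jobs_rx = Arc::new(Mutex::new(jobs_rx));

        for _ in 0..workers {
            let jobs_rx = Arc::clone(&jobs_rx);
            let events_tx = events_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so it is only held while pulling one job.
                let job = jobs_rx.lock().unwrap().recv();
                match job {
                    Ok((marker, command)) => {
                        // Placeholder for real command execution.
                        let payload = format!("done: {}", command);
                        if events_tx.send(Event { marker, payload }).is_err() {
                            break; // The channel side hung up.
                        }
                    }
                    Err(_) => break, // Dispatcher dropped: no more jobs.
                }
            });
        }

        (Dispatcher { next_id: 0, jobs_tx }, events_rx)
    }

    /// Queue `command` and return the acknowledgement line written back at once.
    pub fn dispatch(&mut self, command: &str) -> String {
        let marker = format!("{:08x}", self.next_id);
        self.next_id += 1;
        self.jobs_tx
            .send((marker.clone(), command.to_string()))
            .expect("worker pool is gone");
        format!("PENDING {}", marker)
    }
}
```

Because each result carries its marker, the channel can write results back in completion order rather than request order, which is what would let a single channel keep several CPUs busy.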
Data compactness can likely be improved further by making better use of XxHash, and by reducing the maximum cardinality of the database system, which could save 50% of RocksDB space when storing IIDs.
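To illustrate where the 50% figure could come from, assume (this is not stated above) that reducing the maximum cardinality means storing IIDs as 32-bit instead of 64-bit integers. The hypothetical helpers below serialize the same IID list both ways; the 32-bit variant rejects any IID above `u32::MAX`, which is the price paid for the lower cardinality.

```rust
/// Serialize IIDs as 64-bit big-endian integers (8 bytes each).
pub fn encode_iids_u64(iids: &[u64]) -> Vec<u8> {
    iids.iter().flat_map(|iid| iid.to_be_bytes()).collect()
}

/// Serialize IIDs as 32-bit big-endian integers (4 bytes each).
/// Returns None if any IID does not fit, i.e. the reduced maximum
/// cardinality of 2^32 objects has been exceeded.
pub fn encode_iids_u32(iids: &[u64]) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(iids.len() * 4);
    for &iid in iids {
        if iid > u32::MAX as u64 {
            return None;
        }
        out.extend_from_slice(&(iid as u32).to_be_bytes());
    }
    Some(out)
}
```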
If the n-gram detector is not confident enough, scan for the locale that has the most stopwords in the given terms text. Restrict the scan to locales using the alphabet detected by n-grams, as alphabet detection is 100% reliable.
This will fix the undetectable-locale problem on short texts, which do not contain enough words for the n-gram algorithm to perform reliably.
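The stopword fallback could look roughly like the sketch below. `Script`, `Locale` and `guess_locale_by_stopwords` are hypothetical names, and the stopword lists used with it are meant to be tiny illustrative samples, not real ones. The detected script is taken as an input coming from the n-gram pass.

```rust
use std::collections::HashSet;

/// Writing system detected by the n-gram pass (assumed reliable).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Script {
    Latin,
    Cyrillic,
}

/// A candidate locale with its script and stopword list.
pub struct Locale {
    pub code: &'static str,
    pub script: Script,
    pub stopwords: &'static [&'static str],
}

/// Pick the locale whose stopwords occur most often in `text`, considering
/// only locales written in `script`. Returns None when no stopword matches.
pub fn guess_locale_by_stopwords(
    text: &str,
    script: Script,
    locales: &[Locale],
) -> Option<&'static str> {
    let words: Vec<String> = text.split_whitespace().map(|w| w.to_lowercase()).collect();

    locales
        .iter()
        .filter(|locale| locale.script == script)
        .map(|locale| {
            let stopwords: HashSet<&str> = locale.stopwords.iter().copied().collect();
            let hits = words.iter().filter(|w| stopwords.contains(w.as_str())).count();
            (locale.code, hits)
        })
        .filter(|&(_, hits)| hits > 0)
        .max_by_key(|&(_, hits)| hits)
        .map(|(code, _)| code)
}
```

Returning `None` when no stopword matches lets the caller keep the low-confidence n-gram guess, or use no locale at all.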
A per-collection, per-bucket FST would help complete partial words when they cannot be found as-is in the RocksDB key-value store. E.g. searching for "Bapt" would auto-complete to the indexed word "Baptiste".
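A sketch of per-collection, per-bucket prefix completion follows. The plan above calls for an FST; here a `BTreeSet<String>` stands in for it, because it supports the same ordered prefix scan using only the standard library. `WordGraphs`, `push_word` and `complete` are hypothetical names, and lowercasing words is an assumption made for the example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for per-(collection, bucket) FSTs: one sorted word set each.
/// A real FST would answer the same prefix query from a compact automaton.
#[derive(Default)]
pub struct WordGraphs {
    graphs: HashMap<(String, String), BTreeSet<String>>,
}

impl WordGraphs {
    /// Record an indexed word in the graph of its collection and bucket.
    pub fn push_word(&mut self, collection: &str, bucket: &str, word: &str) {
        self.graphs
            .entry((collection.to_string(), bucket.to_string()))
            .or_default()
            .insert(word.to_lowercase());
    }

    /// Return up to `limit` indexed words starting with `prefix`, in lexicographic order.
    pub fn complete(&self, collection: &str, bucket: &str, prefix: &str, limit: usize) -> Vec<String> {
        let prefix = prefix.to_lowercase();
        match self.graphs.get(&(collection.to_string(), bucket.to_string())) {
            // Seek to the first word >= prefix, then stop at the first non-match.
            Some(words) => words
                .range(prefix.clone()..)
                .take_while(|word| word.starts_with(prefix.as_str()))
                .take(limit)
                .cloned()
                .collect(),
            None => Vec::new(),
        }
    }
}
```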
If the partial-word search does not return enough results, attempt to complete the search results using FST auto-completion.
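The fallback flow might be sketched as below, assuming the exact-word lookup and the auto-completion are available as plain functions. `search_with_completion`, `lookup` and `complete` are hypothetical names, not existing Sonic APIs. Only the last term is expanded, on the assumption that it is the word the user is still typing; results are deduplicated and capped at `limit`.

```rust
/// Run the exact lookup first; if it yields fewer than `limit` results,
/// expand the last (possibly partial) term through auto-completion and
/// append the new results, skipping duplicates.
pub fn search_with_completion<L, C>(terms: &[&str], limit: usize, lookup: L, complete: C) -> Vec<String>
where
    L: Fn(&[&str]) -> Vec<String>,
    C: Fn(&str) -> Vec<String>,
{
    let mut results = lookup(terms);
    results.truncate(limit);
    if results.len() >= limit {
        return results;
    }

    if let Some((last, head)) = terms.split_last() {
        for completed in complete(*last) {
            // Same query, with the partial last term replaced by a completion.
            let mut expanded: Vec<&str> = head.to_vec();
            expanded.push(&completed);

            for iid in lookup(expanded.as_slice()) {
                if results.len() >= limit {
                    return results;
                }
                if !results.contains(&iid) {
                    results.push(iid);
                }
            }
        }
    }
    results
}
```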
On daemon boot, some configuration values should be cross-checked against each other, and a configuration mistake should result in a panic!().
E.g. the FST "consolidate after" delay should be lower than the "inactive after" delay; otherwise some FST graphs may never be consolidated in time, as their store would expire before that.
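A minimal sketch of such a boot-time check, using the example above. `StoreConfig`, its two fields and `check_config` are hypothetical stand-ins for the real configuration structure, with both delays assumed to be expressed in seconds.

```rust
/// Hypothetical subset of the daemon configuration (delays in seconds).
pub struct StoreConfig {
    pub fst_consolidate_after: u64,
    pub fst_inactive_after: u64,
}

/// Cross-check related values once at boot; panic on an inconsistent setup.
pub fn check_config(config: &StoreConfig) {
    if config.fst_consolidate_after >= config.fst_inactive_after {
        panic!(
            "invalid configuration: FST consolidate after ({}s) must be lower than inactive after ({}s)",
            config.fst_consolidate_after, config.fst_inactive_after
        );
    }
}
```

Failing at boot, rather than logging a warning, surfaces the mistake before any FST graph can silently miss its consolidation.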
Binary format of a key (planned; more compact, but less legible):
[IDX@u8][BUCKET_HASH@u32][ROUTE_HASH@u32]
The key generator would simply concatenate the key bytes (1 byte + 4 bytes + 4 bytes = 9 bytes); reading a field back would only require a fixed byte offset.
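A sketch of the planned key generator and reader. `encode_key`, `decode_key` and `KEY_LEN` are hypothetical names, and big-endian byte order is an assumption: it makes the bytewise ordering of keys match the numeric ordering of their fields. Placing IDX and the bucket hash first keeps all keys of a given bucket contiguous, which suits prefix scans.

```rust
/// Key layout: [IDX@u8][BUCKET_HASH@u32][ROUTE_HASH@u32] = 9 bytes.
pub const KEY_LEN: usize = 9;

/// Concatenate the three fields into a fixed-size key (big-endian assumed).
pub fn encode_key(idx: u8, bucket_hash: u32, route_hash: u32) -> [u8; KEY_LEN] {
    let mut key = [0u8; KEY_LEN];
    key[0] = idx;
    key[1..5].copy_from_slice(&bucket_hash.to_be_bytes());
    key[5..9].copy_from_slice(&route_hash.to_be_bytes());
    key
}

/// Read the fields back at fixed byte offsets; returns None on a wrong length.
pub fn decode_key(key: &[u8]) -> Option<(u8, u32, u32)> {
    if key.len() != KEY_LEN {
        return None;
    }
    let bucket_hash = u32::from_be_bytes([key[1], key[2], key[3], key[4]]);
    let route_hash = u32::from_be_bytes([key[5], key[6], key[7], key[8]]);
    Some((key[0], bucket_hash, route_hash))
}
```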