valeriansaliou / sonic

🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.

Home Page: https://crates.io/crates/sonic-server

License: Mozilla Public License 2.0

Dockerfile 0.07% Rust 98.46% Shell 0.87% JavaScript 0.60%
rust infrastructure search index graph database server backend search-server search-engine

sonic's Issues

Make all commands ASYNC (dispatch to worker thread pool)

Currently, commands are processed synchronously in the channel thread (1 channel = 1 thread). This limits parallelization on setups that open very few Sonic Channel instances while Sonic runs on many CPUs. We'd like to make the channel protocol fully asynchronous, so that Sonic can dispatch commands to a worker thread pool.

We just need to figure out how to rework the protocol so that the RES for a given REQ can be caught later on (e.g. with a marker ID, as we already do for search queries with PENDING?).
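One possible shape for the marker-ID idea, as a minimal sketch using only the standard library. The names `Dispatcher`, `Job`, `Event` and `execute` are hypothetical and do not come from the Sonic code base; `execute` is a stand-in for real command handling. The channel thread hands each command to a pool and immediately gets a marker back, and the result is matched to that marker when it arrives later.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Marker ID tying a later response to its request (hypothetical type).
type Marker = u64;

struct Job {
    marker: Marker,
    command: String,
}

struct Event {
    marker: Marker,
    result: String,
}

/// Stand-in for real command execution (assumption, not Sonic's logic).
fn execute(command: &str) -> String {
    format!("done: {}", command)
}

struct Dispatcher {
    next_marker: Marker,
    jobs: Sender<Job>,
}

impl Dispatcher {
    /// Spawns `workers` threads and returns the dispatcher plus the
    /// receiving end on which completed events show up.
    fn new(workers: usize) -> (Dispatcher, Receiver<Event>) {
        let (job_tx, job_rx) = channel::<Job>();
        let (event_tx, event_rx) = channel::<Event>();
        let job_rx = Arc::new(Mutex::new(job_rx));

        for _ in 0..workers {
            let job_rx = Arc::clone(&job_rx);
            let event_tx = event_tx.clone();

            thread::spawn(move || loop {
                // The lock guard is released at the end of this statement,
                // so the command itself runs without holding the lock.
                let received = job_rx.lock().unwrap().recv();

                let job = match received {
                    Ok(job) => job,
                    Err(_) => break, // dispatcher dropped: stop the worker
                };

                let event = Event {
                    marker: job.marker,
                    result: execute(&job.command),
                };

                if event_tx.send(event).is_err() {
                    break; // nobody is listening for events anymore
                }
            });
        }

        (Dispatcher { next_marker: 0, jobs: job_tx }, event_rx)
    }

    /// Queues a command and returns its marker right away; this is the
    /// point where the channel could answer with a PENDING-style reply.
    fn dispatch(&mut self, command: &str) -> Marker {
        let marker = self.next_marker;
        self.next_marker += 1;

        self.jobs
            .send(Job { marker, command: command.to_string() })
            .expect("worker pool is gone");

        marker
    }
}
```

Events may arrive in any order, so the client side would need to match each one against the marker it was given at dispatch time.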

Black-list some locales by default + make it configurable

Jav: Javanese
Orm: Oromo
Hau: Hausa
Kur: Kurdish
Yor: Yoruba
Uzb: Uzbek
Ibo: Igbo
Ceb: Cebuano
Tgl: Tagalog
Mlg: Malagasy
Nya: Chewa
Kin: Kinyarwanda
Zul: Zulu
Som: Somali
Ilo: Ilokano
Uig: Uyghur
Hat: Haitian Creole
Aka: Akan
Sna: ChiShona
Afr: Afrikaans
Run: Ikirundi
Tuk: Turkmen
Epo: Esperanto

FST-based search term correction

A per-collection + per-bucket FST would help complete incomplete words when they cannot be found as-is in the RocksDB key-value store. E.g. searching "Bapt" would auto-complete to the indexed word "Baptiste".

If the partial-word search does not return enough results, attempt to fill in the remaining results using the FST auto-completion.
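The lookup-then-complete fallback could look roughly like the sketch below. Note the swap: it uses a sorted `BTreeSet` from the standard library as a stand-in for a real FST, purely to show the control flow, and the function names `complete_word` and `resolve_term` are hypothetical.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// Returns up to `limit` indexed words starting with `prefix`.
/// A sorted set stands in for the FST here (assumption for illustration).
fn complete_word<'a>(
    words: &'a BTreeSet<String>,
    prefix: &str,
    limit: usize,
) -> Vec<&'a str> {
    words
        // Start scanning at the first word >= prefix in sorted order.
        .range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        // Words sharing the prefix are contiguous; stop at the first miss.
        .take_while(|word| word.starts_with(prefix))
        .take(limit)
        .map(|word| word.as_str())
        .collect()
}

/// Uses the term as-is when it is indexed, otherwise falls back to the
/// first completion, if any.
fn resolve_term<'a>(words: &'a BTreeSet<String>, term: &str) -> Option<&'a str> {
    words
        .get(term)
        .map(|word| word.as_str())
        .or_else(|| complete_word(words, term, 1).into_iter().next())
}
```

With "Baptiste" indexed, `resolve_term(&words, "Bapt")` would fall through the exact lookup and return the completed word.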

Configuration checker on boot

Some configuration values should be checked against each other on daemon boot; a configuration mistake would result in a panic!().

E.g.: the FST "consolidate after" value should be lower than the "inactive after" value; otherwise some FST graphs may never be consolidated in time, as their store would expire first.
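A minimal sketch of such a check, assuming both values are durations in seconds. The struct and field names (`FstConfig`, `consolidate_after`, `inactive_after`) are hypothetical and only mirror the wording above, not the actual configuration keys.

```rust
/// Hypothetical subset of the configuration, both values in seconds.
struct FstConfig {
    consolidate_after: u64,
    inactive_after: u64,
}

/// Compares related values and reports the first inconsistency found.
fn check_config(config: &FstConfig) -> Result<(), String> {
    // Consolidation must happen strictly before the store expires.
    if config.consolidate_after >= config.inactive_after {
        return Err(format!(
            "consolidate_after ({}) must be lower than inactive_after ({})",
            config.consolidate_after, config.inactive_after
        ));
    }

    Ok(())
}

/// Boot-time wrapper: aborts the daemon on a configuration mistake.
fn check_config_or_panic(config: &FstConfig) {
    if let Err(reason) = check_config(config) {
        panic!("invalid configuration: {}", reason);
    }
}
```

Keeping the check itself in a function that returns `Result` leaves the panic at the boot call site, which also makes the rule easy to unit-test.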

Re-introduce SUGGEST (core + libraries)

With the use of an FST, word-based search suggestion will now work once again.

Beware: limit SUGGEST to word-based suggestion, and not term-based suggestion.

Store all database keys in binary format (no UTF-8, please)

UTF-8 format of a key (current, bad but legible!):

  • [IDX@BASE10-UTF8]:[BUCKET_HASH@BASE36-UTF8]:[ROUTE_HASH@BASE36-UTF8]

Binary format of a key (planned, better but less legible):

  • [IDX@u8][BUCKET_HASH@u32][ROUTE_HASH@u32]

The key generator would simply concatenate the key bytes (1 byte + 4 bytes + 4 bytes = 9 bytes); reading a field back would be as simple as slicing the key at a fixed byte offset.
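The planned layout could be encoded and decoded along these lines. This is a sketch: the `StoreKey` type and its method names are hypothetical, and big-endian byte order is an assumption that the issue does not specify (it keeps keys ordered by index, then bucket, then route under byte-wise comparison).

```rust
/// Encoded key size: 1 byte (IDX) + 4 bytes (BUCKET_HASH) + 4 bytes (ROUTE_HASH).
const KEY_SIZE: usize = 9;

/// Hypothetical in-memory form of a key; field names follow the layout above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct StoreKey {
    idx: u8,
    bucket_hash: u32,
    route_hash: u32,
}

impl StoreKey {
    /// Concatenates the three fields into a fixed-size binary key.
    /// Big-endian order is an assumption of this sketch.
    fn to_bytes(&self) -> [u8; KEY_SIZE] {
        let mut key = [0u8; KEY_SIZE];

        key[0] = self.idx;
        key[1..5].copy_from_slice(&self.bucket_hash.to_be_bytes());
        key[5..9].copy_from_slice(&self.route_hash.to_be_bytes());

        key
    }

    /// Reads the fields back from fixed byte offsets.
    /// Returns `None` when the input is not exactly `KEY_SIZE` bytes long.
    fn from_bytes(key: &[u8]) -> Option<StoreKey> {
        if key.len() != KEY_SIZE {
            return None;
        }

        Some(StoreKey {
            idx: key[0],
            bucket_hash: u32::from_be_bytes([key[1], key[2], key[3], key[4]]),
            route_hash: u32::from_be_bytes([key[5], key[6], key[7], key[8]]),
        })
    }
}
```

Compared with the UTF-8 form, every key has the same length and needs no separator parsing, at the cost of not being readable in a raw database dump.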
