calibur's People

Contributors

jmpotato, little-wallace, tennyzhuang, w41ter

calibur's Issues

Feature: do not acquire mutex for snapshot

Every time we create an iterator, we acquire the lock on version_set, and the call may be blocked by other background threads.
To avoid acquiring the mutex, we need a thread-local snapshot pointer or an atomic pointer.
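
One way to picture the thread-local variant is a generation counter next to the mutex: each thread caches the last version it saw and only takes the lock when the counter has moved. The sketch below is a minimal illustration using only the standard library. Version, VersionSet, install and snapshot are hypothetical names, not calibur's actual types, and the single global thread-local assumes one VersionSet per process.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the engine's version metadata.
pub struct Version {
    pub files: Vec<u64>,
}

pub struct VersionSet {
    current: Mutex<Arc<Version>>,
    // Bumped (while holding the mutex) every time a new version is installed.
    generation: AtomicU64,
}

thread_local! {
    // Per-thread cache of (generation, version). Assumes a single VersionSet
    // per process; a real engine would key this by DB instance.
    static CACHED: RefCell<Option<(u64, Arc<Version>)>> = RefCell::new(None);
}

impl VersionSet {
    pub fn new(initial: Version) -> Self {
        VersionSet {
            current: Mutex::new(Arc::new(initial)),
            generation: AtomicU64::new(0),
        }
    }

    // Called by flush/compaction threads when the file set changes.
    pub fn install(&self, next: Version) {
        let mut current = self.current.lock().unwrap();
        *current = Arc::new(next);
        self.generation.fetch_add(1, Ordering::Release);
    }

    // Called when creating an iterator. The fast path touches only an
    // atomic counter and thread-local state.
    pub fn snapshot(&self) -> Arc<Version> {
        let seen = self.generation.load(Ordering::Acquire);
        CACHED.with(|cell| {
            let mut cached = cell.borrow_mut();
            if let Some((cached_gen, version)) = &*cached {
                if *cached_gen == seen {
                    return version.clone();
                }
            }
            // Slow path: the cache is empty or stale, so take the mutex once
            // and read the generation again while holding it.
            let current = self.current.lock().unwrap();
            let fresh_gen = self.generation.load(Ordering::Acquire);
            let version = current.clone();
            *cached = Some((fresh_gen, version.clone()));
            version
        })
    }
}
```

With this layout, repeated snapshot calls on the same thread skip the mutex until a background job installs a new version.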

Feature: Support compression algorithm for block-based table

Description

To save disk space, RocksDB compresses data block by block using popular algorithms such as LZ4 and ZSTD. To keep the code simple to understand, I think we only need to implement these two algorithms at first, so that we stay compatible with RocksDB's current data format. We may find a better file format or a better compression algorithm in the future, but that is out of scope for now.
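
As a rough sketch of how the table code could stay independent of the algorithm, the block writer can talk to a small trait and record which algorithm was used in a one-byte tag after the block. The tag values 0, 4 and 7 follow RocksDB's compression type numbering for no compression, LZ4 and ZSTD. Compressor, Passthrough and seal_block are hypothetical names, the real LZ4 and ZSTD codecs would come from external crates that are not shown, and the block checksum is omitted.

```rust
// Compression tag stored with each block. The numeric values mirror
// RocksDB's compression type numbering for these three entries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum CompressionType {
    NoCompression = 0,
    Lz4 = 4,
    Zstd = 7,
}

impl CompressionType {
    // Decode the tag byte; unsupported algorithms yield None.
    pub fn from_u8(tag: u8) -> Option<Self> {
        match tag {
            0 => Some(CompressionType::NoCompression),
            4 => Some(CompressionType::Lz4),
            7 => Some(CompressionType::Zstd),
            _ => None,
        }
    }
}

// Hypothetical trait the block builder and block reader would depend on.
pub trait Compressor {
    fn kind(&self) -> CompressionType;
    fn compress(&self, raw: &[u8]) -> Vec<u8>;
    fn decompress(&self, data: &[u8]) -> Option<Vec<u8>>;
}

// Trivial implementation that stores blocks uncompressed.
pub struct Passthrough;

impl Compressor for Passthrough {
    fn kind(&self) -> CompressionType {
        CompressionType::NoCompression
    }
    fn compress(&self, raw: &[u8]) -> Vec<u8> {
        raw.to_vec()
    }
    fn decompress(&self, data: &[u8]) -> Option<Vec<u8>> {
        Some(data.to_vec())
    }
}

// Compress a block and append the one-byte type tag (checksum omitted).
pub fn seal_block(compressor: &dyn Compressor, raw: &[u8]) -> Vec<u8> {
    let mut out = compressor.compress(raw);
    out.push(compressor.kind() as u8);
    out
}
```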

pread returns EINVAL in linux

Run the simple example from PR #6 with cargo run --example simple_example, and the program fails in Engine::open. The errno is EINVAL.

According to man 2 read:

EINVAL fd is attached to an object which is unsuitable for reading; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the file offset is not suitably aligned.

I noticed that get_current_manifest_path invokes FileSystem::read_file_content, which eventually invokes AsyncRandomAccessFile::open. O_DIRECT is passed to open, but read_file_content does not seem to do any alignment.
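
To illustrate the kind of alignment an O_DIRECT read needs, the sketch below widens a requested (offset, len) to block boundaries and records how many leading bytes to skip afterwards. The 4096-byte block size is an assumption (the required alignment depends on the device and filesystem), AlignedRange and aligned_range are hypothetical names, and the read buffer itself would also have to be allocated at an aligned address, which is not shown.

```rust
// Assumed alignment; the real value depends on the device and filesystem.
pub const ALIGN: u64 = 4096;

#[derive(Debug, PartialEq, Eq)]
pub struct AlignedRange {
    // Aligned file offset to pass to pread.
    pub offset: u64,
    // Aligned byte count to pass to pread.
    pub len: u64,
    // Bytes at the start of the buffer that precede the requested data.
    pub skip: usize,
}

// Widen [offset, offset + len) so that both ends fall on ALIGN boundaries.
pub fn aligned_range(offset: u64, len: u64) -> AlignedRange {
    let start = offset / ALIGN * ALIGN; // round down
    let end = (offset + len + ALIGN - 1) / ALIGN * ALIGN; // round up
    AlignedRange {
        offset: start,
        len: end - start,
        skip: (offset - start) as usize,
    }
}
```

The caller reads the aligned range and then returns only the slice starting at skip with the originally requested length. Opening small metadata files without O_DIRECT would be another way to avoid the error.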

Feature: Support prefix-seek and seek bound

Description

RocksDB can build a bloom filter over the prefixes of keys. When a user seeks to a key, RocksDB can use the prefix of the seek key to tell whether a matching key may exist in the DB.

RocksDB can also give an iterator a bound, so that the iterator does not have to skip too many tombstones.
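
The iteration semantics of the two features can be sketched over an in-memory sorted map: start at the prefix, stop once keys no longer share it, and stop early at an exclusive upper bound. This only models what the iterator returns; the bloom filter itself is not modeled, and prefix_scan is a hypothetical helper rather than an existing API.

```rust
use std::collections::BTreeMap;

// Return the keys that start with `prefix` and sort below `upper_bound`
// (exclusive), in key order. A BTreeMap stands in for the engine's data.
pub fn prefix_scan(
    db: &BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
    upper_bound: Option<&[u8]>,
) -> Vec<Vec<u8>> {
    db.range(prefix.to_vec()..)
        .map(|(key, _value)| key)
        // Keys are sorted, so the first key without the prefix ends the scan.
        .take_while(|key| key.starts_with(prefix))
        // The bound lets the scan stop before walking past it.
        .take_while(|key| upper_bound.map_or(true, |ub| key.as_slice() < ub))
        .cloned()
        .collect()
}
```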

Feature: refactor compaction picker

Description

Currently we sort the files of every level by priority and only pick from the level with the highest score. But in some cases the highest-score level is already being compacted, and we cannot pick any file from it. In that case we need to skip that level and find another level to compact.
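
A minimal sketch of the proposed behaviour: order the levels by score and take the first one that is not already busy, instead of giving up when the top one is. LevelState and pick_level are hypothetical names, and the score >= 1.0 threshold for "needs compaction" is an assumption borrowed from RocksDB's convention.

```rust
// Hypothetical per-level summary that the picker works from.
pub struct LevelState {
    pub level: usize,
    pub score: f64,
    pub being_compacted: bool,
}

// Pick the highest-scoring level that needs compaction and is not busy.
pub fn pick_level(levels: &[LevelState]) -> Option<usize> {
    let mut candidates: Vec<&LevelState> = levels
        .iter()
        .filter(|l| l.score >= 1.0) // assumed threshold
        .collect();
    // Highest score first.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    candidates
        .into_iter()
        .find(|l| !l.being_compacted)
        .map(|l| l.level)
}
```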

Feature: Support multi thread to finish L0 compaction

Feature Description

Currently we use only one thread to compact files from L0 to the base level. In level-style compaction, the job that merges multiple files from L0 into the base level must run as a single job. With only one thread, this job is slow. RocksDB therefore splits the job into multiple key ranges and lets each thread process one range to speed it up.

Module
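
The range-splitting idea can be sketched with scoped threads: choose split keys, let each thread merge only the entries inside its own range, and concatenate the per-range outputs in order. In this sketch each L0 file is simplified to a sorted list of u64 keys, the boundaries are assumed to be sorted in ascending order, and compact_range and parallel_compact are hypothetical names.

```rust
use std::thread;

// Merge the keys from all runs that fall in [lo, hi), dropping duplicates.
fn compact_range(runs: &[Vec<u64>], lo: u64, hi: Option<u64>) -> Vec<u64> {
    let mut out: Vec<u64> = runs
        .iter()
        .flat_map(|run| run.iter().copied())
        .filter(|k| *k >= lo && hi.map_or(true, |h| *k < h))
        .collect();
    out.sort_unstable();
    out.dedup();
    out
}

// Split the key space at `boundaries` and compact every range on its own
// thread. The ranges are disjoint, so their outputs can be concatenated.
pub fn parallel_compact(runs: &[Vec<u64>], boundaries: &[u64]) -> Vec<u64> {
    let mut edges = vec![u64::MIN];
    edges.extend_from_slice(boundaries);
    thread::scope(|s| {
        let handles: Vec<_> = (0..edges.len())
            .map(|i| {
                let lo = edges[i];
                let hi = edges.get(i + 1).copied(); // None for the last range
                s.spawn(move || compact_range(runs, lo, hi))
            })
            .collect();
        // Join in range order so the combined output stays sorted.
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}
```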

  • compaction
  • db.rs run_compaction_job

Skills

  • You need to know LSM Tree compaction well.

Feature: Support block-cache to avoid frequent IO requests.

Feature Description

To complete the database prototype as quickly as possible, I did not design a cache earlier, so every request goes directly to the data on disk. I would like a RocksDB-like block cache, but if anyone has a better idea, I'll gladly accept it.

Module

  • Add a new module, cache.
  • table: create the cache object in table_factory and pass it to every TableReader.
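
As a starting point for discussion, a block cache can be sketched as a mutex-protected LRU map keyed by (file id, block offset). The sketch below bounds the cache by entry count rather than bytes and uses a logical clock plus a BTreeMap to track recency. BlockCache and BlockKey are hypothetical names, and a production cache would more likely be sharded and sized in bytes.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::{Arc, Mutex};

// (file id, block offset) identifies a block.
pub type BlockKey = (u64, u64);

struct Inner {
    capacity: usize,
    tick: u64, // logical clock; larger means more recently used
    blocks: HashMap<BlockKey, (Arc<Vec<u8>>, u64)>,
    order: BTreeMap<u64, BlockKey>, // last-use tick -> key
}

pub struct BlockCache {
    inner: Mutex<Inner>,
}

impl BlockCache {
    pub fn new(capacity: usize) -> Self {
        BlockCache {
            inner: Mutex::new(Inner {
                capacity,
                tick: 0,
                blocks: HashMap::new(),
                order: BTreeMap::new(),
            }),
        }
    }

    // Look up a block and mark it as most recently used.
    pub fn get(&self, key: BlockKey) -> Option<Arc<Vec<u8>>> {
        let mut guard = self.inner.lock().unwrap();
        let inner = &mut *guard;
        inner.tick += 1;
        let tick = inner.tick;
        let (block, last_used) = inner.blocks.get_mut(&key)?;
        inner.order.remove(&*last_used);
        *last_used = tick;
        inner.order.insert(tick, key);
        Some(block.clone())
    }

    // Insert a block, evicting the least recently used entries if needed.
    pub fn insert(&self, key: BlockKey, block: Vec<u8>) {
        let mut guard = self.inner.lock().unwrap();
        let inner = &mut *guard;
        inner.tick += 1;
        let tick = inner.tick;
        if let Some((_, old_tick)) = inner.blocks.insert(key, (Arc::new(block), tick)) {
            inner.order.remove(&old_tick);
        }
        inner.order.insert(tick, key);
        while inner.blocks.len() > inner.capacity {
            match inner.order.pop_first() {
                Some((_, victim)) => {
                    inner.blocks.remove(&victim);
                }
                None => break,
            }
        }
    }
}
```

A TableReader would call get before issuing a read and insert after a miss. Returning Arc lets readers keep a block alive even after it has been evicted.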

Feature: Remove wal files when flush has been finished

Description

In an LSM Tree engine, all data is first persisted to write-ahead log (WAL) files with append-only IO and then applied to an in-memory structure called the memtable. Once the data in the memtable has been flushed to disk, we can remove the WAL files to release disk space.

Design

  • How do we decide whether a log file can be removed? We keep a log number for each column family: the largest log number whose data has been persisted on disk. The minimum of these numbers across all column families is the last log whose data has been fully flushed to SST files on disk.
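
The rule above can be sketched as a small function: take the minimum log number across column families and treat every WAL file at or below it as removable. This follows the description here, where a column family's log number is the largest WAL whose data is already flushed. If the number instead marks the first WAL still needed, as in RocksDB, the comparison would become strict. obsolete_logs is a hypothetical helper.

```rust
// Return the WAL file numbers that can be deleted.
// `cf_log_numbers` holds one flushed-log number per column family.
pub fn obsolete_logs(cf_log_numbers: &[u64], wal_files: &[u64]) -> Vec<u64> {
    // With no column families known, nothing is safe to delete.
    let Some(min_flushed) = cf_log_numbers.iter().copied().min() else {
        return Vec::new();
    };
    wal_files
        .iter()
        .copied()
        // Assumption: logs up to and including `min_flushed` are flushed
        // for every column family.
        .filter(|number| *number <= min_flushed)
        .collect()
}
```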

Feature: using table cache to avoid read all sst files when open DB

Problem Description

To finish the engine quickly, we open every file when it is generated, and we open all files when the DB is opened.
But if most files are never accessed because they hold cold data, keeping their filter blocks in memory is wasteful.

Design && Work Item

  • Add buffered reads to avoid multiple IO requests when opening a file.
  • Add a thread-safe LRU cache structure for both the table cache and the block cache.
  • Refactor filemeta and the SST read interface, because we need to look up the table id first and then get the file from the cache.
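
The lookup order in the last item (table id first, then the file from the cache) can be sketched as a cache that opens a table only on first access. Eviction is left out to keep the sketch short, and TableReader, TableCache and the opener closure are hypothetical stand-ins for the real types.

```rust
use std::collections::HashMap;
use std::io;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for an opened SST file and its index/filter blocks.
pub struct TableReader {
    pub file_id: u64,
}

// Opens tables lazily and shares the opened readers between threads.
pub struct TableCache<F> {
    opener: F,
    readers: Mutex<HashMap<u64, Arc<TableReader>>>,
}

impl<F> TableCache<F>
where
    F: Fn(u64) -> io::Result<TableReader>,
{
    pub fn new(opener: F) -> Self {
        TableCache {
            opener,
            readers: Mutex::new(HashMap::new()),
        }
    }

    // Return the cached reader for `file_id`, opening the file on a miss.
    pub fn get(&self, file_id: u64) -> io::Result<Arc<TableReader>> {
        let mut readers = self.readers.lock().unwrap();
        if let Some(reader) = readers.get(&file_id) {
            return Ok(reader.clone());
        }
        // Holding the lock while opening keeps the sketch simple, but it
        // serializes opens; a real cache would avoid that.
        let reader = Arc::new((self.opener)(file_id)?);
        readers.insert(file_id, reader.clone());
        Ok(reader)
    }
}
```

With this shape, opening the DB only needs the file metadata, and cold files are never opened unless a read actually reaches them.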
