fsdb's Introduction

fsdb project - file-based database for partitioning and event-sourced data

  • concept of a file-based database for partitioning and event-sourced data
    • like a local file storage version of Kafka
  • 20230930041146 fsdb developing ideas
  • run the build script build-fsdb to build the optimized components.
    • I tried to build the whole thing with scripts alone, but that is too slow to handle billions of rows.
  • fsdb can load roughly 30 million rows of real data per hour on a 12-core machine with an SSD
    • 20231004133128 an optimized hashcode generator for partitioning work into multiple processes
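
The idea of hashing IDs so that work can be split across partitions and processes can be sketched as follows. This is only an illustration: fsdb's actual hashcode generator is not described here, so `zlib.crc32` is a stand-in, and the names `partition_for` and `split_by_partition` are hypothetical.

```python
import zlib


def partition_for(record_id: str, num_partitions: int) -> int:
    """Map an ID to a partition number in [0, num_partitions)."""
    # Assumption: crc32 stands in for fsdb's own hash function.
    # It is stable across runs and machines, unlike Python's built-in
    # hash(), which is salted per process for strings.
    return zlib.crc32(record_id.encode("utf-8")) % num_partitions


def split_by_partition(lines, num_partitions, delimiter="\t"):
    """Group rows by the partition of their first field (the ID)."""
    buckets = {p: [] for p in range(num_partitions)}
    for line in lines:
        record_id = line.split(delimiter, 1)[0]
        buckets[partition_for(record_id, num_partitions)].append(line)
    return buckets
```

Because the same ID always hashes to the same partition, each bucket could be handed to a separate worker process without two workers ever touching the same partition file.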

how data is stored

  • the first field of the data is regarded as an ID
  • data is stored as TSV or CSV in two files per partition:
    • a compressed file holds all existing data
    • an uncompressed file holds recently added data.
      • this data can be compressed and appended to the compressed file periodically as data is inserted so the overall size of the database doesn't grow too fast
    • data is stored in partition files named after the partition number
  • optionally store a timestamp with each row
    • timestamps in fsdb
    • open question: should the timestamp be ignored when printing unless a command line option asks for it?
    • initialize the database with -t to enable timestamps
    • search for timestamps using the searchtime subcommand
  • a bloom filter could be an optional feature implemented with hooks
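
The two-file layout described above can be sketched in a few functions. This is a minimal sketch under stated assumptions, not fsdb's implementation: the file naming (`<n>` for recent rows, `<n>.gz` for compressed data) and the function names are hypothetical. It relies on the fact that concatenated gzip members form a valid gzip file, so compressed data can be appended without rewriting what is already there.

```python
import gzip
import os


def paths(db_dir, partition):
    """Hypothetical naming: '<n>.gz' holds compressed data, '<n>' recent rows."""
    base = os.path.join(db_dir, str(partition))
    return base + ".gz", base


def append_rows(db_dir, partition, rows):
    """Append newly ingested rows to the uncompressed file."""
    _, recent = paths(db_dir, partition)
    with open(recent, "a", encoding="utf-8") as f:
        for row in rows:
            f.write(row + "\n")


def compact(db_dir, partition):
    """Compress recent rows and append them as a new gzip member."""
    compressed, recent = paths(db_dir, partition)
    if not os.path.exists(recent):
        return
    with open(recent, "rb") as src, open(compressed, "ab") as dst:
        dst.write(gzip.compress(src.read()))
    os.remove(recent)


def read_all(db_dir, partition):
    """Return every row: compressed history first, then recent rows."""
    compressed, recent = paths(db_dir, partition)
    rows = []
    if os.path.exists(compressed):
        # gzip.open reads all concatenated members as one stream.
        with gzip.open(compressed, "rt", encoding="utf-8") as f:
            rows.extend(line.rstrip("\n") for line in f)
    if os.path.exists(recent):
        with open(recent, encoding="utf-8") as f:
            rows.extend(line.rstrip("\n") for line in f)
    return rows
```

Compacting in small appended members keeps ingestion cheap, at the cost of a slightly worse compression ratio than recompressing the whole partition at once.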

use cases

  • can be used as a large lookup table, like DynamoDB
  • 20231003062001 fsdb use case - using as a set
  • join with another file or a stream piped to standard input. This is possible when the ID is the first column.
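
The lookup, set, and join use cases above all come down to keying rows by their first field. The sketch below illustrates that with an in-memory table. The function names are hypothetical, and keeping only the latest row per ID is an assumption made for the example, not documented fsdb behavior.

```python
def load_lookup(rows, delimiter="\t"):
    """Build an ID -> remaining-fields table from database rows."""
    table = {}
    for row in rows:
        record_id, _, rest = row.partition(delimiter)
        # Assumption: a later row for the same ID replaces the earlier one.
        table[record_id] = rest
    return table


def join_stream(stream, table, delimiter="\t"):
    """Yield each input line whose ID is known, extended with the stored fields."""
    for line in stream:
        line = line.rstrip("\n")
        record_id = line.split(delimiter, 1)[0]
        if record_id in table:
            yield line + delimiter + table[record_id]


def missing_ids(candidate_ids, table):
    """Set difference: IDs from the input that are not in the database."""
    return [i for i in candidate_ids if i not in table]
```

To join against piped input, something like `for out in join_stream(sys.stdin, table): print(out)` would work after `import sys`.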

implementation

  • basic set of features / subcommands needed for database

    • initialize and set up number of partitions
    • search for one ID or multiple
    • search for IDs missing from the database - set difference
    • ingest data - pipe it into standard input and an awk script routes each row to its partition
    • print all data
    • compress subcommand - compress the text files and append the result to the existing gzip streams (gzip streams can be concatenated). Called by ingest when a partition gets too large
  • testing timestamps for data

awk '
BEGIN {
  print systime()
}
'
1696194880
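
The same epoch-seconds timestamp can be used to stamp rows and to filter them by time range, roughly what a searchtime-style lookup needs. This is a sketch under assumptions: this README does not say where fsdb stores the timestamp, so it is appended here as the last field, and the names `stamp`, `search_time`, and `strip_timestamp` are hypothetical.

```python
import time


def stamp(row, now=None, delimiter="\t"):
    """Append an epoch-seconds timestamp as the last field (assumed layout)."""
    ts = int(time.time()) if now is None else int(now)
    return f"{row}{delimiter}{ts}"


def search_time(rows, start, end, delimiter="\t"):
    """Yield rows whose timestamp falls in the inclusive range [start, end]."""
    for row in rows:
        ts = int(row.rsplit(delimiter, 1)[1])
        if start <= ts <= end:
            yield row


def strip_timestamp(row, delimiter="\t"):
    """Drop the trailing timestamp, e.g. when printing without it."""
    return row.rsplit(delimiter, 1)[0]
```

Keeping the timestamp in the last field leaves the ID in the first column, so ID-based search and joins are unaffected.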

zet/20230929145418/README.md

Related

Tags:

#data #file #database #project #shortcmd

fsdb's People

Contributors

nicholas-long
