welo's Introduction

HLDB

A peer-to-peer database protocol

Summary

HLDB can be used to build local-first applications. It is best suited for social/collaborative applications that do not require consensus.

Each peer has its own copy of the database, called a replica. The peer's local replica is used as the source of truth. Updated remote replicas are merged with the local replica to produce the new state.

In this way, the applications are edge-computed by the participating peers. Applications designed this way give users more control with potential to make large scale database breaches a thing of the past.
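
The merge step described above can be illustrated with a minimal sketch. It assumes a replica can be modeled as a grow-only set of entry identifiers, which is a simplification for illustration and not the data structure HLDB uses; the names `ReplicaState` and `mergeReplicas` are hypothetical.

```typescript
// Hypothetical model: a replica is a grow-only set of entry identifiers.
type ReplicaState = ReadonlySet<string>

// Merging is a set union, so it is commutative, associative, and idempotent.
// Those properties let peers merge in any order and still converge.
function mergeReplicas (localState: ReplicaState, remoteState: ReplicaState): ReplicaState {
  const merged = new Set<string>()
  localState.forEach(id => merged.add(id))
  remoteState.forEach(id => merged.add(id))
  return merged
}
```

Because the merge is a union, receiving the same remote replica twice leaves the local state unchanged.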

Encryption?

There is no encryption built into the protocol yet.

Access Control

Currently only write access can be controlled, and it cannot be updated for now. Access is controlled and enforced by correct peers on their own replicas.

Papers

At the core of the database replica is a Merkle-CRDT. This type of CRDT satisfies Byzantine eventual consistency (BEC). This property ensures strong eventual consistency (SEC) and that faulty replicas, however many there are, cannot affect correct ones.
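
As a rough illustration of the Merkle-DAG idea, the sketch below derives each entry's identifier from a hash of its content, including the identifiers of the entries it follows. It is an assumption-laden toy: it uses SHA-256 over JSON from Node's `crypto` module rather than CIDs and IPLD encoding, and `DagEntry` and `entryId` are hypothetical names.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical entry shape: a payload plus links to the entries it follows.
interface DagEntry {
  payload: string
  next: string[] // identifiers of causally preceding entries
}

// The identifier is a hash of the entry's content, so changing the payload
// or any link produces a different identifier.
function entryId (entry: DagEntry): string {
  const canonical = JSON.stringify({
    payload: entry.payload,
    next: entry.next.slice().sort() // link order should not affect the id
  })
  return createHash('sha256').update(canonical).digest('hex')
}

const genesisEntry: DagEntry = { payload: 'create', next: [] }
const updateEntry: DagEntry = { payload: 'set x = 1', next: [entryId(genesisEntry)] }
```

Since `updateEntry` embeds the hash of `genesisEntry`, a peer that knows the identifier of the latest entry can verify every entry reachable from it.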

The foundation of the protocol is built on these two papers:

  1. Merkle-CRDTs: Merkle-DAGs meet CRDTs
  2. Byzantine Eventual Consistency and the Fundamental Limits of Peer-to-Peer Databases

Specification

The protocol specification can be found in hldb/specs.

Implementations

Name    Language
welo    TypeScript

welo's People

Contributors

github-actions[bot], saul-jb, tabcat

Forkers

tabcat saul-jb

welo's Issues

generate docs with typedoc

Use typedoc to generate api documentation.

The default setting renders html, I would prefer markdown added to /docs.

Refactor to use Libp2p directly

I have been avoiding the libp2p instance provided by ipfs because it is not available when using ipfs via the ipfs-http-client. Recently I have cared less about this use case.

The two pieces to be refactored to use libp2p directly are:

The peer monitor will benefit from being able to listen for updates instead of polling.

Replacing the Keychain with libp2p.keychain will nullify headaches around the KeyChain not being exported in the first place.

Further in the future, it will be possible to use libp2p protocol handlers to build the BEC push replicator.

design and spec of live replicator

Spec:

design overview:

  • join libp2p pubsub channel for finding live replicator peers for the database

    • on peer join attempt to join a one-to-one purpose pubsub channel with them
    • on peer join one-to-one purpose channel
  • on received heads advertisement

    • if heads advert has unknown CIDs
      • request CIDs and traverse valid entries
    • if heads advert CIDs are all known
      • implementation may choose to advertise local heads or not on that one-to-one channel
  • on change to replica heads:

    • advertise to all open one-to-one pubsub channels
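
The handling of a received heads advertisement in the overview above could look roughly like the following. This is a sketch under assumptions: CIDs are represented as plain strings, and `AdvertAction` and `onHeadsAdvert` are hypothetical names, not part of the spec.

```typescript
// Hypothetical result of inspecting a heads advertisement.
type AdvertAction =
  | { kind: 'fetch', cids: string[] } // unknown CIDs to request and traverse
  | { kind: 'idle' }                  // everything advertised is already known

function onHeadsAdvert (known: ReadonlySet<string>, advertised: readonly string[]): AdvertAction {
  const unknown = advertised.filter(cid => !known.has(cid))
  return unknown.length > 0 ? { kind: 'fetch', cids: unknown } : { kind: 'idle' }
}
```

In the `idle` case an implementation may still choose to advertise its own heads on the one-to-one channel, as the overview notes.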

not sure if one-to-one channels should be used by multiple databases.

Improve Tests

  • Separate unit and integration tests
  • Export test utils

Browser Support

Browser support has not been added yet.
It should be straightforward.
Not many Node APIs are in use currently.
Most of the work will be running tests in the browser and adding them to CI.

chore: cleanup scripts, filenames, types

Focus here is on cleaning up the package scripts, file structure and names, their defined types, and setting up linting again.

A lot of the scripts were copied from ipjs when the plan was to write everything in js with types in jsdoc. This has not worked, especially after the move to typescript.

  • remove unused package scripts
  • clean up src and test
    • refine the types
    • make type interfaces for manifest components
    • rename manifest component files to match their component.type names
    • fix linter errors

Status - Oct 2022

This month is focused on adding a live replicator, benchmarks, and the first published release (alpha).

Still need to finish local persistence for database from last month. Will try to finish before Monday.
I anticipate this area to be reworked a few times before finding a good, general solution for databases.

Going to try to have a replication demo by the 10th.

From last month:

This month:

  • #17
  • #20
  • test replication and replicated states
  • write benchmarks
  • automate release with generated API docs and changelog
  • hldb/specs#2
  • release alpha with expected public API changes

🎃

Status - Nov 2022

November focus is on building a Zzzync replicator; planned Opal tasks are a bit sparse.

IPFS Camp 22 was the end of last month and has set back development a bit. There are a few things to catch up on in November. Most things are small and can be done together in a day. Other things are larger but mostly finished like the draft spec. And the largest thing to catch up on will be the live replicator.

Tracklist:

  • #20
  • #27
  • write benchmarks
  • automate release with generated API docs and changelog
  • hldb/specs#2
  • release alpha with expected public API changes
  • Test Opal + Zzzync interop

🥧

add pubsub peer monitor

The Shared Channel for the Live Replicator needs to be able to see peers joining and leaving. Since this is not provided by the libp2p API directly, pubsub.getSubscribers will need to be polled.
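
A polling monitor of this kind mostly comes down to diffing two snapshots of the subscriber list. The sketch below is illustrative only: peer ids are plain strings, and `getSubscribers` is an injected function standing in for a call such as `pubsub.getSubscribers` on the shared topic. `PeerChanges`, `diffPeers`, and `pollPeers` are hypothetical names.

```typescript
interface PeerChanges {
  joined: string[]
  left: string[]
}

// Compare the previous and current subscriber snapshots.
function diffPeers (previous: ReadonlySet<string>, current: ReadonlySet<string>): PeerChanges {
  const joined: string[] = []
  const left: string[] = []
  current.forEach(peer => { if (!previous.has(peer)) joined.push(peer) })
  previous.forEach(peer => { if (!current.has(peer)) left.push(peer) })
  return { joined, left }
}

// Poll the injected getSubscribers function and report changes.
// Returns a function that stops the polling.
function pollPeers (
  getSubscribers: () => string[],
  onChange: (changes: PeerChanges) => void,
  intervalMs: number = 1000
): () => void {
  let previous = new Set<string>()
  const timer = setInterval(() => {
    const current = new Set<string>(getSubscribers())
    const changes = diffPeers(previous, current)
    if (changes.joined.length > 0 || changes.left.length > 0) onChange(changes)
    previous = current
  }, intervalMs)
  return () => { clearInterval(timer) }
}
```

The polling interval trades detection latency against overhead; 1000 ms here is an arbitrary default.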

Blocks abstraction from ipfs.block api

Will be creating an abstraction for working with ipfs.block api. This api sounds like it would take and return Block instances but instead it takes byte arrays. The abstraction is needed to simplify specifying the cid version and byte encoding used so that the CID returned is correct. It will also make encoding and decoding easier by providing default codec and hasher options.

Status - Dec 2022

December focus is on heavy testing, perf and reliability, and then documentation and a beta release.

Tracklist:

  • #34
  • automate release with generated API docs and changelog
  • release alpha with expected public API changes
  • #37
  • Test Opal + Zzzync interop
  • Network simulated testing with testground
    • stress-test and benchmark replicators
    • check for replication bugs and perf improvements
  • Write base FAQ.md document for common questions to be added
  • Basic Tutorial document added to repo or blogged
  • NodeJS and Create React App examples
  • Release 1.0-beta with public API locked until 1.0

🎁

Cleanup testing utilities

Right now the tests are kind of thrown together and pull in and use things like IPFS and Identity in a very messy way.

Plans:

  • provide preset IPFS configs
  • identity fixtures
  • storage/keychain fixtures
  • remove ipfs repo data, no reason to commit it.
  • remove identities/keychain saved data after testing

Design use of Storage Abstraction

When creating/opening a database a storage api will be provided. The api can be used by the database and its components to read/write persisted data. Each component will have control over its own datastore.
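
One way to give each component control over its own datastore is to hand out namespaced views over a shared backing store. The following is a minimal in-memory sketch of that idea, not the planned API; `ComponentStore` and `createStorageProvider` are hypothetical names, and a `Map` stands in for a real datastore.

```typescript
// Hypothetical per-component view of persisted data.
interface ComponentStore {
  get: (key: string) => string | undefined
  put: (key: string, value: string) => void
}

// Returns a function that creates a namespaced store for a named component.
function createStorageProvider (
  backing: Map<string, string> = new Map<string, string>()
): (component: string) => ComponentStore {
  return (component: string): ComponentStore => {
    const prefix = `/${component}/`
    return {
      get: (key) => backing.get(prefix + key),
      put: (key, value) => { backing.set(prefix + key, value) }
    }
  }
}
```

With this shape, a replica and a store index can both write a key named `root` without overwriting each other.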

Benchmark Welo

It's important to write benchmarks to understand the capability of the system and to track the effect of changes.

Will add benchmarks with benchmark.js and track them with github-action-benchmark.

  • #43
  • #44
  • #45
  • #46
  • #47
  • add benchmark script to package.json
  • github-action-benchmark pr comment
  • github-action-benchmark gh pages chart

Keyvalue updates read and write to storage

Now that the replica/graph state is kept on disk, the state of the database's replica is available immediately. The same needs to be done for the keyvalue store's index. When completed, the index should be queryable immediately, as opposed to loading and processing the entries again.

Behavior:

  • read the store index on start without loading entries into memory
  • compute new index and write root hash to storage (not doing perf improvements yet)

rewrite manifest module registry

The registry contains components. The components are referenced by a key in a database manifest. If the components referenced by the manifest are registered, the database can be opened and SEC is ensured.

The plan is to use something like protocol ids for registration and resolution so that components can be upgraded since versioning is included. Also the prefix will make sure that components of different categories are not mixed up, e.g. a store component is not registered as an access component.
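
A registry keyed by protocol-id-like strings might look like the sketch below. The id format `/hldb/<category>/<name>/<version>`, the category names, and the `ComponentRegistry` class are all assumptions made for illustration; the issue does not fix any of them.

```typescript
// Hypothetical component categories.
type ComponentCategory = 'store' | 'access' | 'entry' | 'identity'

class ComponentRegistry<T> {
  private readonly components = new Map<string, T>()
  private readonly prefix: string

  constructor (category: ComponentCategory) {
    this.prefix = `/hldb/${category}/`
  }

  // Reject ids from another category, e.g. a store registered as access.
  register (protocol: string, component: T): void {
    if (!protocol.startsWith(this.prefix)) {
      throw new Error(`protocol ${protocol} does not start with ${this.prefix}`)
    }
    this.components.set(protocol, component)
  }

  // Resolve the component referenced by a manifest key.
  resolve (protocol: string): T {
    const component = this.components.get(protocol)
    if (component === undefined) {
      throw new Error(`no component registered for ${protocol}`)
    }
    return component
  }
}
```

Because the version is part of the id, two versions of the same component can be registered side by side and a manifest picks the one it was created with.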

Base Feature Set

Tracking base feature set for Opal's 1.0-beta release in December 2022.

  • Locally persisted Database replicas
  • Easy Custom Database States
  • Pubsub Heads Exchange Replicator
  • #65
  • IPLD Schema Validation (if ready in javascript)
  • Automated Release: ci, change log, and api docs

Opal Milestones:

Status - Sept 2022

Many of the project files have just been committed. Almost every source file has a unit test for it.

Bi-directional iterative traversal of the merkle-dag has been implemented and tested thoroughly (although there is always more room for testing; especially crucial components like this). This is done by keeping all known entry cids inside of an adjacency list with incoming and outgoing links being tracked. Source files with respective unit tests:

Two key features still need to be added: persistence and replication. This means that the utility of Opal is currently limited to mutating local states in memory. These two features are the priority this month.

On adding persistence, the goal is to also allow opening databases in O(1) time, including the time before the database can be edited. To do this, the states of a few different components will need to be persisted, specifically:

  • index: the reduced state to allow for immediate reading of the current state of the database.
  • graph: an adjacency list which includes reverse resolutions for all links in the merkle-dag.
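
The graph component described above can be pictured as an adjacency list that records both directions of every link. The sketch below is a simplified in-memory version under assumptions: CIDs are plain strings, and `EntryGraph`, `heads`, and `referencedBy` are hypothetical names rather than the project's API.

```typescript
interface EntryLinks {
  out: Set<string> // entries this entry links to
  in: Set<string>  // entries that link to this entry (reverse resolution)
}

class EntryGraph {
  private readonly nodes = new Map<string, EntryLinks>()

  private node (cid: string): EntryLinks {
    let links = this.nodes.get(cid)
    if (links === undefined) {
      links = { out: new Set<string>(), in: new Set<string>() }
      this.nodes.set(cid, links)
    }
    return links
  }

  // Record an entry and its outgoing links, updating reverse links as well.
  add (cid: string, next: string[]): void {
    const links = this.node(cid)
    for (const target of next) {
      links.out.add(target)
      this.node(target).in.add(cid)
    }
  }

  // Heads are entries that no other known entry links to.
  heads (): string[] {
    const result: string[] = []
    this.nodes.forEach((links, cid) => { if (links.in.size === 0) result.push(cid) })
    return result
  }

  // Reverse resolution: which entries link to the given one.
  referencedBy (cid: string): string[] {
    const links = this.nodes.get(cid)
    return links === undefined ? [] : Array.from(links.in)
  }
}
```

Tracking incoming links is what makes traversal possible in both directions without rescanning every entry.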

Unfortunately, this kind of usability will rely on more statefulness, at least as I understand the problem now. What is nice is that these states may be verifiable as correct/up-to-date, potentially by reference hashes. In that case the components generating and editing those states can be sure that everything is working.


Focus this month is building a solid project foundation:

prep for adding features

Rework and polish some things before adding features like local database persistence.

  • #11
  • build interfaces for manifest modules
  • rework classes to use Libp2p's startable interface
  • rework store module and use in database class
  • fix tests
  • npm audit fix

refactors and cleanup

  • Replace src/storage with a util that just makes handling Datastores easier (does the same thing, but as a util instead of a class).
  • Swap out node's EventEmitter with the standardized EventTarget

move to Typescript

All cypsela projects from now on will probably use typescript. It's becoming widely adopted and improves the process of writing javascript. Maybe types in javascript will be standardized in the future.

Iterative and Concurrent Traversal

The traverser needs to be turned into an async iterator.
The reason this hasn't been done yet is mainly time and difficulty.
The traverser needs to traverse efficiently for both ordered and unordered traversal so that there is no duplicate code.
Fetching and caching entries ahead of when they are needed would be nice.

IIRC the difficulty is doing this with the yield statement.
May need to make a custom AsyncGenerator.

  • Traverser Function returns Async-Iterator
  • Replica.traverse returns an Async-Iterator
  • Store API consumes Async-Iterator
  • IPFS/Pubsub Heads Exchange Replicator consumes Async-Iterator
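
For the unordered case, an async generator that fetches each level of the DAG concurrently is one possible shape. This is a sketch under assumptions and not the planned traverser: CIDs are plain strings, `fetchLinks` is a hypothetical function that resolves an entry's outgoing links, and ordered traversal and read-ahead caching are not covered.

```typescript
// Hypothetical function that loads an entry and returns the CIDs it links to.
type FetchLinks = (cid: string) => Promise<string[]>

// Breadth-first traversal that yields each reachable entry once.
// All entries in the current frontier are fetched concurrently.
async function * traverse (heads: string[], fetchLinks: FetchLinks): AsyncGenerator<string> {
  const seen = new Set<string>(heads)
  let frontier = heads.slice()
  while (frontier.length > 0) {
    const fetched = await Promise.all(
      frontier.map(async cid => ({ cid, links: await fetchLinks(cid) }))
    )
    const nextFrontier: string[] = []
    for (const { cid, links } of fetched) {
      yield cid
      for (const link of links) {
        if (!seen.has(link)) {
          seen.add(link)
          nextFrontier.push(link)
        }
      }
    }
    frontier = nextFrontier
  }
}
```

A consumer such as a store or replicator can then use `for await (const cid of traverse(heads, fetchLinks))` and stop early by breaking out of the loop.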
