celestiaorg / celestia-node
Celestia Data Availability Nodes
License: Apache License 2.0
During sync discussions, we've touched upon the need for nodes to differentiate peers by their mode/type. To accomplish this, we need to concretely define those types.
// node.Type defines a type for Nodes to be differentiated.
type Type uint8
const (
// node.Full is a full-featured Celestia Node.
Full Type = iota + 1
// node.Light is a stripped-down Celestia Node which aims to be lightweight while preserving highest possible
// security guarantees.
Light
)
// String converts Type to its string representation.
func (t Type) String() string {
if !t.IsValid() {
return "unknown"
}
return typeToString[t]
}
// IsValid reports whether the Type is valid.
func (t Type) IsValid() bool {
_, ok := typeToString[t]
return ok
}
// typeToString keeps string representations of all valid Types.
var typeToString = map[Type]string{
Full: "Full",
Light: "Light",
}
NOTE: Instead of Type there could be Mode
Move providing outside of the PutBlock function and into its own.
Only the data availability header and access to the content-routing IPFS API are needed in order to provide to the IPFS DHT, so it is possible to decouple providing from PutBlock. This gives us the advantage of having more control over when we provide and when we do not. This has already been done in the PR that inspired this issue, celestiaorg/celestia-core#427, but should likely be decoupled from that PR into its own.
Another interesting suggestion brought up by @Wondertan is to use ipld-prime. In particular, the IPLD selectors look extremely cool and relevant to what we are doing. E.g. they could be used to download a range of leaves that belong to a particular namespace at once (#221).
Note: If it turns out to be much simpler to use ipld-selectors instead of manually adding logic to traverse the NMT for #5, we should prioritize this issue. Let's keep in mind though that ipld-prime is a relatively young project and it might be better to wait until it matures and stabilizes a bit further.
We should have software-level support for different network types, e.g. celestia-devnet or celestia-mainnet.
In upcoming #57 this is solved by simply extending Config with a network type field.
The network type should be hardcoded in the binary, as different network types usually run over different software versions, and allowing users to manually change the network type is a source of bugs from users joining a network with a software version we were not expecting.
We should be prepared for the flood of issues opened by us and others as development progresses.
I like the approach Tendermint takes in its main repo, and it will help us automate the triaging process for future tickets.
Here is the list of initial templates (more could be added in the future if needed):
Detailed information for each point:
No. 1 - Similar to Tendermint's bug report template
No. 2 - If users see that a new feature would benefit the product, they can propose it here
No. 3 - StackOverflow-like questions specific to the repo (can be moved elsewhere if necessary)
No. 4 - If users find that an existing feature should be reworked/changed/improved with a different lib/architecture, they can propose it here
No. 5 - If none of the above fits the user's need
For now, it seems like No. 2 and No. 4 can be merged into one template. Nevertheless, as the product grows, it will become hard and labour-intensive for the team/contributors to triage which requests are brand-new features and which are updates to existing ones.
Implement SharesService in package shares.
Implement:
Share
SharesService
The SharesService contains the following methods / functionality:
Start
Stop
For ShareExchange, a light node only requires the ability to request shares, perform sampling (random share requests across random peers), and store the shares it receives, while a full node should be able to serve those share requests as well as perform them. ShareStore only applies to light nodes, as full nodes store full blocks; so if the node is a full node, ShareStore should be disabled.
type Service interface {
	GetShare(ctx context.Context, dah header.DataAvailabilityHeader, row, col int) (Share, error)
	GetShares(context.Context, header.DataAvailabilityHeader) ([][]Share, error)
	GetSharesByNamespace(context.Context, header.DataAvailabilityHeader, namespace.ID) ([]Share, error)
	Start(context.Context) error
	Stop(context.Context) error
}
We need a way to embed a Core node process into the Full node so we don't have to start a Core node separately and pass in its RPC endpoint to the celestia node on initialisation. Ideally, via either the Node or the rpc component, we can control the lifecycle of the Core node (Start/Stop).
This would not only help with testing, but is the desired end-goal for devnet (to run a "trusted" Core node alongside a Full node by default). This will of course be optional, as we also have the goal to implement Full <> Full communication instead of relying only on a trusted Core node. But for now, our source of new blocks is Core nodes, and we must have some way of spinning both up simultaneously with as little configuration on the user side as possible.
Implement the ability for a Celestia light or full node to request account information from a Celestia Core node via gRPC for the purposes of submitting a transaction.
Optional: implement SubmitTx throttling for DoS protection of the trusted Celestia Core node.
Implement an RPC server in package rpc that will handle inbound requests from users to submit transactions. This feature should be available for both light and full nodes.
/submit_tx -- should trigger an AccountQuery in the node to get the state for the given account in order to submit a tx
/submit_tx_sync -- which would be blocking, vs /submit_tx_async
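A minimal sketch of what the /submit_tx endpoint could look like, assuming a JSON body with a base64-encoded tx. The submitTxRequest wire format, the handler name, and the stubbed-out AccountQuery step are all assumptions for illustration, not the decided API.

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
)

// submitTxRequest is a hypothetical wire format; the real payload shape is TBD.
type submitTxRequest struct {
	Tx []byte `json:"tx"` // base64-encoded in JSON
}

func decodeSubmitTx(body []byte) (submitTxRequest, error) {
	var req submitTxRequest
	err := json.Unmarshal(body, &req)
	return req, err
}

// submitTxHandler decodes the tx and would then run an AccountQuery against
// the trusted Core node before signing and forwarding; both steps are stubbed.
func submitTxHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	if _, err := decodeSubmitTx(body); err != nil {
		http.Error(w, "malformed request", http.StatusBadRequest)
		return
	}
	// AccountQuery + tx submission to Core would happen here.
	w.WriteHeader(http.StatusOK)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/submit_tx", submitTxHandler)
	// mux could also expose /submit_tx_sync and /submit_tx_async variants.
	_ = mux
}
```

The sync variant would block on the Core response inside the handler; the async variant would return as soon as the tx is accepted into a local queue.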
Implement a basic Node structure that contains node-specific information, for example:
The following should be implemented:
Basic go linting / testing pipeline. Will add to it later when we need.
Application Clients (e.g. from ORUs) need a way to simply download all data specific to their namespace (application sovereignty).
Add a simple library that traverses the NMT and returns all data of the requested namespace ID.
It might be valuable to consider adding an RPC endpoint for this too. E.g. for block explorers who want to serve Txs of certain applications (or ORU chains), too.
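The namespace-traversal library could look roughly like this. The nmtNode view, its fields, and the test tree are hypothetical stand-ins for illustration; the real tree lives in the nmt library and is addressed through IPLD, where each visit is a network fetch.

```go
package main

import (
	"bytes"
	"fmt"
)

// nmtNode is a hypothetical view over an NMT node.
type nmtNode struct {
	minNs, maxNs []byte // namespace range covered by this subtree
	data         []byte // leaf data (nil for inner nodes)
	children     []*nmtNode
}

// leavesByNamespace walks the tree, pruning any subtree whose namespace range
// cannot contain nsID, and collects matching leaf data in order.
func leavesByNamespace(n *nmtNode, nsID []byte) [][]byte {
	if n == nil || bytes.Compare(n.maxNs, nsID) < 0 || bytes.Compare(n.minNs, nsID) > 0 {
		return nil // namespace not in this subtree: prune
	}
	if len(n.children) == 0 {
		return [][]byte{n.data}
	}
	var out [][]byte
	for _, c := range n.children {
		out = append(out, leavesByNamespace(c, nsID)...)
	}
	return out
}

func main() {
	// Tiny hand-built tree: namespaces 1, 2, 2, 3 across four leaves.
	leafA := &nmtNode{minNs: []byte{1}, maxNs: []byte{1}, data: []byte("a")}
	leafB := &nmtNode{minNs: []byte{2}, maxNs: []byte{2}, data: []byte("b")}
	leafC := &nmtNode{minNs: []byte{2}, maxNs: []byte{2}, data: []byte("c")}
	leafD := &nmtNode{minNs: []byte{3}, maxNs: []byte{3}, data: []byte("d")}
	left := &nmtNode{minNs: []byte{1}, maxNs: []byte{2}, children: []*nmtNode{leafA, leafB}}
	right := &nmtNode{minNs: []byte{2}, maxNs: []byte{3}, children: []*nmtNode{leafC, leafD}}
	root := &nmtNode{minNs: []byte{1}, maxNs: []byte{3}, children: []*nmtNode{left, right}}
	fmt.Println(len(leavesByNamespace(root, []byte{2}))) // leaves "b" and "c"
}
```

The pruning step is what makes namespaced retrieval cheap: whole subtrees outside the requested namespace range are never fetched.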
We allocate plain byte slices extensively throughout the project repos, and after some work with them we discard the allocated slices, causing the GC to clean them up. In some hot paths, like the whole block processing flow, this surely causes additional pressure on both the allocator and the GC. Fortunately, there is a relatively simple trick to avoid that and reduce stress for all data allocations in general: reusing fixed-size allocated buffers through sync.Pool as the basic primitive, though we can rely on existing libs with simple APIs so as not to reinvent the wheel.
I would like to provide real numbers, but this is very application-specific, so for us they could be very different. For my previous project, htop showed almost a 2x RAM usage reduction after applying this in multiple places.
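The sync.Pool trick itself can be sketched as below. The 256-byte buffer size is a placeholder, and processShare is a hypothetical hot-path function for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sharePool hands out reusable 256-byte buffers (the size is a placeholder
// value here) instead of allocating a fresh slice per operation.
var sharePool = sync.Pool{
	New: func() interface{} { return make([]byte, 256) },
}

func getBuf() []byte  { return sharePool.Get().([]byte) }
func putBuf(b []byte) { sharePool.Put(b) } // caller must not use b afterwards

// processShare copies the raw data into a pooled buffer, works on it, and
// returns the buffer to the pool instead of leaving it for the GC.
func processShare(raw []byte) int {
	buf := getBuf()
	defer putBuf(buf)
	n := copy(buf, raw)
	// ... decode/verify using buf[:n] ...
	return n
}

func main() {
	fmt.Println(processShare([]byte("share data")))
}
```

The contract is the usual pool one: once Put is called, the buffer may be handed to another goroutine, so no references to it may escape processShare.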
Has fewer stars and is less used within the Go community, but has a smarter API. Actually, we are already using it as we rely on IPFS/libp2p, so using it directly would allow us to share allocated bytes between IPFS and LL code, which is generally good. I would just stick with this one.
From the gopher who created fasthttp. BTW, he is from Kyiv and I know him personally. He is obsessed with optimizations.
pls add more
Implement node configuration functionality in a separate config package or under the node package (I have no preference):
./node init
Config structure (we will be using TOML as that is what Tendermint uses)

Another kind of fraud proof that we need to implement is data availability fraud proofs: nodes need to be able to generate proofs in case they observe an invalid erasure coding.
Details are laid out in:
Ideally, the implementation would be accompanied with a brief ADR.
Tasks
Make sure GitHub Actions does not have admin rights over this repo (or any celestiaorg repos, for that matter).
The current RetrieveBlockData implementation spawns a goroutine per share in the ExtendedDataSquare. In the worst case, with the max 128x128 block, that is 16384 goroutines. This causes the race detector to stop working in tests, as it has a limit of 8k routines. Furthermore, this is a lot of routines for a single operation, and in fact most of the time they are idle.
Why the routines are idle: shares in a block are addressed and committed to with multiple Merkle trees, so when we request a whole block we need to walk through all the trees. The walking step is: request a blob of data by its hash (1), unpack the block (2), check whether we unpacked more hashes or a share (3), and proceed with the unpacked hashes onto the next step recursively (4). Those steps are executed until no more hashes are left and we have all the shares. From this, we can see that every walking step is a full roundtrip - network request, unpack, request again, and so on. Now, imagine those 16384 goroutines walking down the same trees. In practice, those routines mostly wait for every single roundtrip to finish and then compete to initiate the next roundtrip. Competing, in this case, is useless, and we need to avoid it.
Instead of spawning a goroutine per share, we should spawn a routine per tree we need to walk through, specifically per DAHeader root. This way, we don't have any competing routines, and only one routine initiates all the roundtrips. In numbers, for the largest block that would be 128 routines for the Row DAHeader roots, and 256 routines if we want to fetch and store all the inner nodes of the trees.
NOTE: All the shares are addressed and committed to twice, with both Row and Column Merkle trees, thus it is not required to traverse both trees to fetch the shares; but again, if we need to store all inner nodes of both trees (cc @adlerjohn), then both trees should be traversed.
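The goroutine-per-root shape can be sketched as below. fetchTree here is a hypothetical stand-in that returns fake share indices; in the real code each step inside it would be a network roundtrip down one Merkle tree.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchTree is a stand-in for walking one Merkle tree down from a DAHeader
// root; it returns fake share indices instead of performing roundtrips.
func fetchTree(root int, sharesPerTree int) []int {
	out := make([]int, 0, sharesPerTree)
	for i := 0; i < sharesPerTree; i++ {
		out = append(out, root*sharesPerTree+i)
	}
	return out
}

// retrieveShares spawns one goroutine per row root instead of one per share,
// so only numRoots goroutines drive roundtrips concurrently.
func retrieveShares(numRoots, sharesPerTree int) [][]int {
	res := make([][]int, numRoots)
	var wg sync.WaitGroup
	for r := 0; r < numRoots; r++ {
		wg.Add(1)
		go func(r int) {
			defer wg.Done()
			res[r] = fetchTree(r, sharesPerTree) // each routine owns one tree walk
		}(r)
	}
	wg.Wait()
	return res
}

func main() {
	rows := retrieveShares(128, 128) // largest block: 128 row roots
	fmt.Println(len(rows), len(rows[0]))
}
```

Because each goroutine writes only to its own res[r] slot, no locking is needed, and the routine count is bounded by the number of roots rather than the number of shares.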
As already mentioned, the original spark of this issue was the race detector complaining about exceeding the limit of 8k routines. The solution for that was to detect if the race detector is running and simply skip the test 😬. Later, after some discussions, celestiaorg/celestia-core#357 was created, and the solution for it is being implemented in celestiaorg/celestia-core#424.
Also, supersede celestiaorg/celestia-core#278
There is a plan to remove DAHeader and use only one root hash to address and commit to all the shares. In such a case, we would not need to spawn any routines at all. The routine calling the RetrieveBlockData needs to be blocking anyway and it can then take care of initiating all the round trips.
CI often fails due to the lint timeout being exceeded, like here: https://github.com/celestiaorg/celestia-node/pull/52/checks?check_run_id=3554301525#step:3:73
We already merged a PR (#49) to increase the timeout by 1 minute, but that didn't fix it.
Broadcast
/broadcast_tx_async (non-blocking)
/broadcast_tx_sync (blocking)

We haven't decided which DB to use in order to store erasure-coded blocks in Full nodes. I think this warrants a small ADR of its own (which can also be combined with our decision on how to store blocks, how many to cache, etc.)
From what I remember at the offsite, we discussed using badgerdb. I'd like to start a thread here and then have either me or @Wondertan write up the final decision in an ADR.
For obvious reasons, we have to choose a way of logging things. Instead of arguing about which library to use, let's first agree on the most convenient way for us to use a logger. Currently, I see only two options, but feel free to suggest others (NOTE: each option would produce identical output):
package foo
// def
type FooService struct {
...
log log.Logger
}
func (fs *FooService) Foo() {
fs.log.Debug("Foo fooed!")
}
// usage
func logic() {
toplvlLogger := log.Logger("")
fs := &FooService{..., toplvlLogger.With("foo")}
fs.Foo()
}
// def
package foo
var log = logging.Logger("foo")
type FooService struct {
...
}
func (fs *FooService) Foo() {
log.Debug("Foo fooed!")
}
// usage
func logic() {
fs := &FooService{...}
fs.Foo()
}
Personally, the second way seems much cleaner to me and is my favorite, as it does not require us to pass the logger around everywhere, all the time.
WDYT?
Ideally, we want to connect only to nodes that run some lazyledger node, too.
IPFS nodes connect to a large set of nodes by default. That's cool but we want our nodes to only connect to a set of nodes that 1) load the LL ipld plugin and 2) run some other portion of the lazyledger software (light client, (full) node, archival node etc).
The project is intended to grow with a variety of components and services in it. To remove the time and mental overhead of writing and updating node initialization logic, where we build and order all the components ourselves, we should delegate that to a DI container. There are multiple options, but Uber's is solely based on reflection and does not involve code generation. Even though reflection is considered slow, the slowness manifests only at node build time, not at runtime.
celestiaorg/celestia-core#375 disables providing in Bitswap here. Unfortunately, this option also disables reproviding.
Reproviding is basically re-execution of providing. Another important conceptual aspect of the DHT is that its entries are not persistent and are cleaned up over time; thus there should be logic that automatically renews entries on the DHT.
As celestiaorg/celestia-core#375 disables it automatically, we need to enable it manually. Luckily, we can implement our own reproviding strategy which reprovides only roots, and not all the CIDs in the blockstore as IPFS excessively does.
Importantly, this is required for the network to work properly; otherwise recently proposed blocks won't be available after ~12 hours. Thus, this needs to be part of celestiaorg/celestia-core#381
Implement some kind of interface for the Node so that any other application using the Node object and/or the local CLI could remotely interact with the running daemon using the same standardised interface.
Components:
Node
Sub-issue of #25
Implement NewBlockEventSubscription such that requesting the following information from the following Celestia Core endpoints is supported:
Request:
/block endpoint

Currently, we are following the pattern of celestia-core and tendermint by including docs in the repo here, but ideally docs/specs/ADRs should be extracted into a separate repo.
Before incentivised testnet, we should have a set of test plans that contain such scenarios:
In order to put the tests above onto a regular run cadence/schedule, we need to do the following:
There are multiple things that Celestia Node needs to store:
With the purpose of encapsulation - sealing all on-disk footprint away from the Node - let's introduce a Repository. It should manage the root directory, versioning, categorizing, and grouping of any generated or user data to be stored on disk.
// Note that other things can be done later
type core.Repository interface {
Config() (Config, error)
PutConfig(*Config) error
}
type node.Repository interface {
Keystore() (keystore.Keystore, error)
Datastore() (datastore.Batching, error)
Core() (core.Repository, error)
Config() (*Config, error)
PutConfig(*Config) error
Path() string
Close() error
// Optional
DiskUsage() (uint64, error)
SetAPI(string) error
GetAPI() (string, error)
}
Implement BlockService in a separate block package:
RawBlock -- a "raw" block received from Celestia Core (that is not erasure coded)
ErasuredBlock -- an erasure coded block
BlockService
Start
Stop
NewBlockEventSubscription -- ability for full nodes to "subscribe" to new RawBlocks from Celestia Core via RPC
BlockExchange -- ability for full nodes to request / send RawBlocks to their other full node peers
ErasureCodedBlockStore
Note that full nodes will be able to ask for RawBlocks from either a trusted Celestia Core node (that is running simultaneously with the full node), or from other full node peers. Full nodes will learn of new blocks via a header announcement, either through ExtendedHeaderSub (in which case they will be notified by their other full node peers), or via announcement from their trusted Celestia Core node via RPC.
Describe the high-level architecture of the Celestia node and its different components (and then start the implementation accordingly).
This could be several ADRs:
And more granular descriptions of:
@renaynay @Wondertan feel free to amend / edit.
I recommend using cobra, as the maintainer is directly affiliated with Golang and is generally more active in maintenance than urfave/cli.
Requirements:
celestia light init
celestia light start
celestia full init
celestia full start
--config flag on init
Flags for light mode are grouped under command ./celestia light --help (as well as other issue areas beyond the node type, like metrics, utils, etc.)
Nice to haves:
/ ____/__ / /__ _____/ /_(_)___ _
/ / / _ \/ / _ \/ ___/ __/ / __ `/
/ /___/ __/ / __(__ ) /_/ / /_/ /
\____/\___/_/\___/____/\__/_/\__,_/
Currently, we spin up a full ipfs node which is kinda bloated for what we are trying to achieve.
@Wondertan brought up some very good alternatives:
We should carefully understand the pros/cons before we jump into either of these. I'm leaning towards the first approach as it is the most light-weight and gives us more freedom on how to interact with ipfs. It would especially be the most reasonable approach if we ever go all-in and replace the tendermint p2p stack with something libp2p based. This is also related to the work over at optimint: https://github.com/lazyledger/optimint/labels/C%3Ap2p
Consider using graphsync instead of bitswap
Note that this currently has low priority. Only if bitswap performs unacceptably for our case (which it does not seem to: celestiaorg/ipld-plugin-experiments#9 (comment)) would we need to reprioritize this before launch.
Celestia-node is p2p-centric and requires integration of the libp2p services/components listed below:
Config fields for each component should also be added.
The node types section only states who (which node type) is able to generate proofs of invalid erasure coding. Nowhere is it explained what happens after generating them. This leaves a lot of room for interpretation. Who should care about those proofs, and how will they be propagated (and to whom)?
Consumers of fraud proofs would be anyone that just does DAS. I guess that could be made more explicit.
Also, do they trigger slashable events? If yes, who will be slashed?
In terms of what penalties fraud proofs are associated with, that's more of a consensus/evidence concern.
These should be separate issues that should be handled successively:
related issue about evidence types: celestiaorg/celestia-specs#23
also related: celestiaorg/celestia-specs#110
This is out of scope for devnet, but we should think about how we want Share pruning to work for light nodes. Our goal is to keep light nodes truly light, so we shouldn't impose large storage requirements on them.
Questions:
(e.g. via SharesByNamespace) or via sampling? If so, why?

For ShareExchange and DASing to work, we need to use the IPLD implementation of the NMT, which is currently located in Core. The Node needs to use it as well, so we need to extract the plugin with supporting functions into a separate repo to be used by both the Core and Node repos.
ipld package from celestiaorg/celestia-core#427

Partial storage nodes are nodes that only store some of the blocks in the blockchain, and can be queried by any other nodes (including light clients and other partial nodes) to download data from parts of the chain.
There are two main questions:
I propose a method of answering the above, with a scalable "tree-based" approach.
Let's assume a network-wide constant MIN_GRANULARITY = 1000 blocks, where MIN_GRANULARITY is the minimum number of consecutive blocks you can advertise that you are storing to the network (which we call a "blockset"), and a constant BASE = 10. We call a range of blocksets a "blockrange" (e.g. blockrange 0-10K consists of blocksets 0-1K, 1K-2K, ..., 9K-10K). We can organise the blocksets into a directory structure, where each directory has BASE subdirectories (blockranges) or files (blocksets). Let's say there are 10 million blocks in the chain; the directory would look as follows:
0-10M/
├─ 0-1M/
│ ├─ 0-100K/
│ │ ├─ 0-10K/
│ │ │ ├─ 0-1K
│ │ │ ├─ 1K-2K
│ │ │ ├─ ...
│ │ │ ├─ 9K-10K
│ ├─ 100K-200K/
│ ├─ .../
│ ├─ 900K-1M/
├─ 1M-2M/
├─ .../
├─ 9M-10M/
Each subdirectory (blockrange) or file (blockset) would be its own network topic. For example, a topic could be 0-10K (blockrange) or 0-1K (blockset). The network has the following interfaces:
GetPeers(topic) returns some IP addresses of peers that have advertised that they are serving the blockrange/set for topic.
Advertise(topic, ip) advertises that a node with IP address ip is serving the blockrange/set for topic.
The above operations might be expensive or time-consuming. Therefore, depending on how many blocks and blockranges there are in the network, partial storage nodes may only advertise up to a certain height of blockranges, and likewise clients querying the nodes might only try to get peers from a certain height of blockranges. Let's assume a client-side variable GRANULARITY, where GRANULARITY >= MIN_GRANULARITY, on both partial storage nodes and client nodes.
When a partial storage node wants to call Advertise() on blockranges that it's serving, it will only do so on blockranges whose granularity is at least GRANULARITY. For example, if a partial storage node is serving blocks 0-1M and GRANULARITY = 100,000, then it will call Advertise() on 0-1M, 0-100K, ..., 900K-1M, but not on 0-10K, ..., 9K-10K, etc.
Similarly, if a client wants to download data in block 1500, for example, the deepest blockrange it would try to GetPeers() for is 0-100K. One can also construct different algorithms to find peers using a top-to-bottom approach. For example, the client can first call GetPeers() on blocks 0-10M, but if no node is storing 10M blocks, it could then try calling GetPeers() on blocks 0-1M, and so on.
This would allow the network to self-adjust the acceptable data in each shard, depending on how big blocks are or how much storage resources partial nodes have.
Note: GRANULARITY is a client-side variable that can be adjusted automatically by the client itself based on its success in downloading blocks at different granularities. On the other hand, MIN_GRANULARITY and BASE are network-wide variables that have to be agreed network-wide as part of the p2p protocol.
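The advertisement rule above can be sketched as a small function. This is a sketch under stated assumptions: the half-open [start, end) range convention, the topic naming as "lo-hi" strings, and the alignment rule (only ranges fully contained in the served interval) are my interpretation of the scheme, not part of the proposal.

```go
package main

import "fmt"

// advertiseTopics returns the blockrange/blockset topics a partial storage node
// serving blocks [start, end) should Advertise, given the client-side
// GRANULARITY and the network-wide MIN_GRANULARITY and BASE from the text.
func advertiseTopics(start, end, minGranularity, base, granularity int) []string {
	var topics []string
	// Walk range sizes from one blockset upward: 1K, 10K, 100K, ...
	for size := minGranularity; size <= end-start; size *= base {
		if size < granularity {
			continue // too fine-grained to advertise
		}
		// Advertise every aligned range of this size fully inside [start, end).
		for lo := (start / size) * size; lo+size <= end; lo += size {
			if lo >= start {
				topics = append(topics, fmt.Sprintf("%d-%d", lo, lo+size))
			}
		}
	}
	return topics
}

func main() {
	// Serving 0-1M with GRANULARITY = 100K: ten 100K ranges plus 0-1M itself.
	fmt.Println(len(advertiseTopics(0, 1_000_000, 1000, 10, 100_000)))
}
```

This reproduces the worked example from the text: a node serving 0-1M at GRANULARITY = 100K advertises the eleven topics 0-100K through 900K-1M plus 0-1M, skipping everything at 10K and 1K granularity.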
An alternative to a subnet-based peer discovery approach is one where there is only one network of partial storage nodes, which have status messages representing which blocks they have. Partial storage nodes would have the following interface:
GetStatus(GRANULARITY), where GRANULARITY >= MIN_GRANULARITY, returns a bit field where the index of each bit in the field is a blockrange corresponding to GRANULARITY, and an on-bit means that the node has the blocks in that blockrange.
For example, if GetStatus(1M) is called in a chain with 10M blocks, and the partial storage node is only storing blocks 1M-2M, the bit field would be as follows:
0100000000
^
|
blockrange 1M-2M
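The bit-field construction can be sketched as below. Representing the field as a string of '0'/'1' characters and requiring a node to hold the entire blockrange before setting its bit are assumptions for illustration; a real implementation would use a packed bit field.

```go
package main

import (
	"fmt"
	"strings"
)

// statusBits returns a bit string for a node storing blocks [storeLo, storeHi)
// in a chain of chainLen blocks, at the requested granularity.
func statusBits(storeLo, storeHi, chainLen, granularity int) string {
	var b strings.Builder
	for lo := 0; lo < chainLen; lo += granularity {
		hi := lo + granularity
		// The bit is set only when the node has the whole blockrange.
		if storeLo <= lo && hi <= storeHi {
			b.WriteByte('1')
		} else {
			b.WriteByte('0')
		}
	}
	return b.String()
}

func main() {
	// Node storing blocks 1M-2M in a 10M-block chain, GRANULARITY = 1M.
	fmt.Println(statusBits(1_000_000, 2_000_000, 10_000_000, 1_000_000)) // 0100000000
}
```

This matches the diagram above: only the bit for blockrange 1M-2M is set.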
In the fraud proofs paper, the client picks the (x, y) co-ordinates, but the node decides whether to return a response from the row or column root.
In the current implementation, the client also decides whether the response should be from a row or column root.
We should consider the security implications of this, if any.
Implement a Service interface that will represent all of the "services" that will be constructed and registered on the node via DI (such as dig). I recommend this be implemented in the node package rather than a separate package, but I am open to hearing arguments for separating it from node.
The interface should contain at a bare minimum the following behaviours:
Implement a simple RPC client on the Node that can be started and stopped, and that can dial specific endpoints on the Celestia Core node, like /block, to get the raw block from Celestia Core.
We can either import the RPC interface from tendermint or implement our own.
The Data Availability model we use requires data discovery. We rely on IPFS's Kademlia DHT, which basically allows any network participant to find a host for a certain piece of data by its hash.
To describe the way we use it, let's introduce a simple pseudo-code interface for it:
interface DHT {
// Find the nearest peers to the hash and ask them to keep a record of us hosting the data.
// By default, records are stored for 24h.
Provide(hash)
// Find peers hosting the data by its hash.
FindProviders(hash) []peer
// Periodically execute Provide for a given hash to keep the record around.
Reprovide(hash)
}
When a block producer creates a block, it saves it and calls Provide for every Data Availability root of the block, making it discoverable and, in turn, available. Afterwards, any other node that wants to get the block's data or validate its availability can call FindProviders, detect the block producer, and finally access the block data through Bitswap. The block producer and block requester also call Reprovide. Overall, with the described flow, we aim for maximum confidence that the data of any particular block is always discoverable from peers storing it.
The current state of the implementation does not conform to the flow above, and these things are left to be done:
Records of someone hosting data are stored on peers selected not by their qualities but by the simple XOR metric. Unfortunately, this eventually makes different light clients store those records unreliably, as they are not meant to be full-featured daemons. Therefore, some data may become undiscoverable for some period of time.
We need to ensure providing takes less time than the time between two subsequent block proposals by a node. Otherwise, DHT providing wouldn't keep up with block production, creating an ever-growing providing queue. Unfortunately, for the standard DHT client, providing can take up to 3 mins on a large-scale network.
From this also comes a rule: the bigger the committee is, the more time the node has to proceed with providing. So naturally, the larger the network, the larger the committee, and the longer the allowed providing time; altogether, these can overlap organically without causing any issues. But if we still observe slow providing being an issue, a full-routing-table DHT client for block producers would be a solution, as it significantly reduces providing time.
To validate a block's availability, we need to randomly sample some of its parts by requesting them from the network. Network requests must have a timeout so that we don't wait for a response infinitely. Also, the IPFS software does not specify any timeout for its users' data requests, so, to avoid waiting endlessly, we have to specify an adequate limit for sample response time.
I defined an arbitrary timeout of 1 min for the request, which is not backed by any rationale.
I am not entirely sure about the proper way to find ideal timings here, but I think we need to benchmark median numbers over a real-world environment and add slightly more time on top for inaccuracies.
A build command that can be executed with various kinds of params.
ValidateAvailability currently panics if the number of samples is greater than squareWidth**2.
This behavior should either be changed (implicitly use min(numSamples, squareWidth**2) for the actual number of samples), or the documentation should be improved to state that the caller is responsible for ensuring that the number of samples meaningfully depends on the block size.
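The first suggested fix is a one-line clamp. A minimal sketch, with the function name clampSamples chosen for illustration:

```go
package main

import "fmt"

// clampSamples implements the suggested fix: never request more samples than
// there are shares in the extended square.
func clampSamples(numSamples, squareWidth int) int {
	max := squareWidth * squareWidth
	if numSamples > max {
		return max
	}
	return numSamples
}

func main() {
	fmt.Println(clampSamples(100, 4)) // capped at the 16 shares of a 4x4 square
}
```

With this in place, ValidateAvailability could accept any sample count without panicking, while callers that care about sampling strength still get a meaningful number.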