vulcanize / eth-block-extractor
License: Apache License 2.0
As a developer running the block watcher,
I want to be able to sync to the most recent block by default,
So that I do not need to know and specify an ending block number
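One way to express this default, as a minimal sketch: treat an unspecified ending block as a sentinel and fall back to whatever the node reports as its latest block. The names `unsetBlock`, `resolveEndingBlock`, and the `latest` callback are hypothetical and are not taken from this project.

```go
package main

import (
	"errors"
	"fmt"
)

// unsetBlock is a hypothetical sentinel meaning "no ending block was specified".
const unsetBlock int64 = -1

// resolveEndingBlock returns the requested ending block, or falls back to the
// most recent block reported by latest when none was specified.
func resolveEndingBlock(requested int64, latest func() (int64, error)) (int64, error) {
	if requested != unsetBlock {
		if requested < 0 {
			return 0, errors.New("ending block number must not be negative")
		}
		return requested, nil
	}
	return latest()
}

func main() {
	// Stand-in for a lookup against the node or database; the value is made up.
	latest := func() (int64, error) { return 5000000, nil }

	end, err := resolveEndingBlock(unsetBlock, latest)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("syncing through block", end)
}
```

Passing the lookup as a function keeps the fallback testable without a running node.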
As a developer using the eth-block-extractor,
I want to be able to read ethereum data off of IPFS from this project,
So that I don't need to run any other processes to use the data.
NOTES
IPFS doesn't natively know how to parse ethereum data structures. We can get around this by running a separate program to register parsers for the data structures we're persisting, but it'd be nice to be able to execute that work internally from this project.
Might also want to consider just returning a raw byte array and letting the user do their own parsing.
This story should probably be broken up into separate ones for each data structure as we get closer to crossing that bridge.
As a developer running the createIpldsForStateTrie command,
I want to get some sort of feedback about the state of the execution within a block,
So that I have an idea of how things are going when syncing a block for which there are very many state/storage trie nodes.
NOTES
Right now, the sync is performed at the block level (all state/storage trie nodes are fetched, and then all are persisted to IPFS). This means that I can let the program run for a long time without seeing any feedback (and, furthermore, it means that I can lose all progress toward constructing a block's state/storage trie if I halt execution). Perhaps there's a way to intermittently halt and resume the process of iterating through these tries, within a single block, to persist already fetched data and provide feedback about the state of the execution.
The current dag putter for receipts uses the Receipt.EncodeRLP() method, which encodes only the consensus fields:
// EncodeRLP implements rlp.Encoder, and flattens the consensus fields of a receipt
// into an RLP stream. If no post state is present, byzantium fork is assumed.
func (r *Receipt) EncodeRLP(w io.Writer) error {
	return rlp.Encode(w, &receiptRLP{r.statusEncoding(), r.CumulativeGasUsed, r.Bloom, r.Logs})
}
// receiptRLP is the consensus encoding of a receipt.
type receiptRLP struct {
	PostStateOrStatus []byte
	CumulativeGasUsed uint64
	Bloom             Bloom
	Logs              []*Log
}
I'm wondering if we should be publishing the ReceiptForStorage instead:
// receiptStorageRLP is the storage encoding of a receipt.
type receiptStorageRLP struct {
	PostStateOrStatus []byte
	CumulativeGasUsed uint64
	TxHash            common.Hash
	ContractAddress   common.Address
	Logs              []*LogForStorage
	GasUsed           uint64
}
I think we want that TxHash, but I need to give it some more thought. Opening this so that anybody with insights can share them!
Maybe something like eth-block-extractor in preparation for tm-block-extractor and ln-db-extractor
As a developer running the block watcher,
I want to be able to compute historical state trie nodes,
So that I do not need to be running an archive node to create IPLDs for state trie nodes.
As a developer computing historical state,
I want to be able to compute the historical state trie without maintaining previous tries in memory,
So that performance improves for larger numbers of blocks.
NOTES
Right now the operation to compute historical state must begin at a state root for which the trie already exists in the db (e.g. the genesis block). After we've computed the state for subsequent blocks, we should be able to discard the trie associated with the root from earlier blocks.
As a developer running the createIpldsForStateTrie command,
I want to be able to see execution complete more quickly,
So that I can sync the state/storage tries for more blocks in less time.
NOTES
Is there a more efficient way to fetch every node in a block's state trie?
Could we cache nodes known from previous blocks and only fetch those known to be new?
As a developer running the eth-block-extractor,
I want to be able to fetch ethereum data over the RPC,
So that I can execute my extraction while my node is running.
NOTES
Currently, syncing requires connecting to a cold db. This optimizes for performance but disallows execution while a node is running. Adding an optional --rpc flag would enable users to accept degraded performance in exchange for the ability to sync while the node is running.
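A minimal sketch of how such an opt-in flag could be parsed, using Go's standard `flag` package purely for illustration (the project's actual CLI wiring may differ). The `config` type and `parseConfig` function are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// config is a hypothetical holder for the extractor's data-source choice.
type config struct {
	UseRPC bool // when false, read from a cold db as the tool does today
}

// parseConfig parses command-line arguments into a config. The rpc flag
// defaults to false so that existing invocations keep their behavior.
func parseConfig(args []string) (config, error) {
	fs := flag.NewFlagSet("eth-block-extractor", flag.ContinueOnError)
	useRPC := fs.Bool("rpc", false, "fetch data over RPC from a running node instead of a cold db")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return config{UseRPC: *useRPC}, nil
}

func main() {
	cfg, err := parseConfig([]string{"--rpc"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if cfg.UseRPC {
		fmt.Println("fetching over RPC (slower, but the node can keep running)")
	} else {
		fmt.Println("reading from the cold db")
	}
}
```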
Note: need a codec for logs. This may not fit within the existing Ethereum IPLD codec structure (see here)
As a developer running the eth-block-extractor,
I want to be able to persist the IPLDs I'm generating to postgres,
So that I can make fast queries and link data more efficiently.
NOTES
The preferred method would probably be to set up PG as a datastore for go-ipfs, but we could also persist nodes to a vulcanize table if we can't avoid exceeding the maximum available connections.
There was a new release recently. It would be good to take advantage of whatever performance improvements are possible, but this could entail updates to the wrappers package (if there are changes to the exported functions).