vulcanize / eth-block-extractor



License: Apache License 2.0

Languages: Makefile 0.78%, Go 99.22%

eth-block-extractor's Introduction

vulcanize


eth-block-extractor's Issues

Add default option for most recent block

As a developer running the block watcher,
I want to be able to sync to the most recent block by default,
So that I do not need to know and specify an ending block number.
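The requested default could look like the minimal Go sketch below. It assumes a hypothetical sentinel value of `-1` meaning "no ending block given" and an injected `latest` function; in a real watcher that function might wrap the node's `eth_blockNumber` JSON-RPC method or a head lookup in the database. `resolveEndingBlock` and `latestSentinel` are illustrative names, not part of this project.

```go
package main

import (
	"errors"
	"fmt"
)

// latestSentinel is a hypothetical flag value meaning "no ending block was given".
const latestSentinel int64 = -1

// resolveEndingBlock returns the block number to sync up to. If the caller did
// not specify one, it asks for the node's current head through latest.
func resolveEndingBlock(requested int64, latest func() (uint64, error)) (uint64, error) {
	if requested == latestSentinel {
		head, err := latest()
		if err != nil {
			return 0, fmt.Errorf("fetching latest block: %w", err)
		}
		return head, nil
	}
	if requested < 0 {
		return 0, errors.New("ending block must be non-negative")
	}
	return uint64(requested), nil
}

func main() {
	// Stand-in for an eth_blockNumber call against a running node.
	fakeHead := func() (uint64, error) { return 17000000, nil }

	end, err := resolveEndingBlock(latestSentinel, fakeHead)
	if err != nil {
		panic(err)
	}
	fmt.Println("sync until block", end) // defaults to the head

	end, err = resolveEndingBlock(42, fakeHead)
	if err != nil {
		panic(err)
	}
	fmt.Println("sync until block", end) // explicit ending block wins
}
```

Keeping the head lookup behind a function makes the default easy to test without a live node.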

Parsing eth data off of IPFS

As a developer using the eth-block-extractor,
I want to be able to read Ethereum data off of IPFS from this project,
So that I don't need to run any other processes to use the data.

NOTES
IPFS doesn't natively know how to parse Ethereum data structures. We can get around this by running a separate program to register parsers for the data structures we're persisting, but it'd be nice to be able to execute that work internally from this project.
Might also want to consider just returning a raw byte array and letting the user do their own parsing.
This story should probably be broken up into separate ones for each data structure as we get closer to crossing that bridge.
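If the project returns raw bytes, the caller is left to decode Ethereum's RLP encoding. The minimal Go sketch below splits an RLP list into its still-encoded elements using only the standard library. `rlpHeader`, `readLongLength`, and `splitList` are hypothetical names, lengths longer than four bytes are rejected for simplicity, and in practice go-ethereum's `rlp` package (for example `rlp.SplitList`) would be the natural choice.

```go
package main

import (
	"errors"
	"fmt"
)

// rlpHeader decodes the prefix of the RLP item at the start of b and reports
// whether it is a list, how many prefix bytes it uses, and its payload length.
func rlpHeader(b []byte) (isList bool, headerLen, payloadLen int, err error) {
	if len(b) == 0 {
		return false, 0, 0, errors.New("rlp: empty input")
	}
	p := b[0]
	switch {
	case p < 0x80: // single byte that encodes itself
		return false, 0, 1, nil
	case p <= 0xb7: // short string
		return false, 1, int(p - 0x80), nil
	case p <= 0xbf: // long string: length-of-length follows
		return readLongLength(b, false, int(p-0xb7))
	case p <= 0xf7: // short list
		return true, 1, int(p - 0xc0), nil
	default: // long list: length-of-length follows
		return readLongLength(b, true, int(p-0xf7))
	}
}

// readLongLength parses the big-endian payload length after the prefix byte.
func readLongLength(b []byte, isList bool, lenOfLen int) (bool, int, int, error) {
	if lenOfLen > 4 {
		return false, 0, 0, errors.New("rlp: length too large for this sketch")
	}
	if len(b) < 1+lenOfLen {
		return false, 0, 0, errors.New("rlp: truncated length prefix")
	}
	n := 0
	for _, c := range b[1 : 1+lenOfLen] {
		n = n<<8 | int(c)
	}
	return isList, 1 + lenOfLen, n, nil
}

// splitList returns the raw (still-encoded) elements of the RLP list in b.
func splitList(b []byte) ([][]byte, error) {
	isList, hl, pl, err := rlpHeader(b)
	if err != nil {
		return nil, err
	}
	if !isList {
		return nil, errors.New("rlp: not a list")
	}
	if hl+pl > len(b) {
		return nil, errors.New("rlp: truncated list")
	}
	payload := b[hl : hl+pl]
	var items [][]byte
	for len(payload) > 0 {
		_, ihl, ipl, err := rlpHeader(payload)
		if err != nil {
			return nil, err
		}
		size := ihl + ipl
		if size > len(payload) {
			return nil, errors.New("rlp: truncated element")
		}
		items = append(items, payload[:size])
		payload = payload[size:]
	}
	return items, nil
}

func main() {
	// ["cat", "dog"] encoded as RLP.
	enc := []byte{0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g'}
	items, err := splitList(enc)
	if err != nil {
		panic(err)
	}
	for _, it := range items {
		fmt.Printf("%x\n", it) // prints 83636174, then 83646f67
	}
}
```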

State/storage trie "progress bar"

As a developer running the createIpldsForStateTrie command,
I want to get some sort of feedback about the state of the execution within a block,
So that I have an idea of how things are going when syncing a block for which there are very many state/storage trie nodes.

NOTES
Right now, the sync is performed at the block level: all state/storage trie nodes are fetched, and only then are they all persisted to IPFS. As a result, the program can run for a long time without any feedback, and halting execution loses all progress toward constructing that block's state/storage trie. Perhaps there's a way to periodically pause the iteration through these tries within a single block, persist the data fetched so far, and report progress before resuming.
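A minimal Go sketch of intermittent persistence with feedback, assuming the trie walk can be exposed as a `next` function that yields one encoded node at a time (for example, by wrapping a trie node iterator). `persistInBatches`, `put`, and `progress` are hypothetical names: `put` stands in for the IPFS write and `progress` for whatever feedback the command prints. Resuming after a halt would additionally require recording where the walk stopped, which this sketch does not attempt.

```go
package main

import "fmt"

// persistInBatches drains nodes from next and writes them with put every
// batchSize nodes, instead of once at the end of the block. After each flush
// it reports the running total through progress.
func persistInBatches(next func() ([]byte, bool), batchSize int,
	put func(batch [][]byte) error, progress func(persisted int)) (int, error) {
	if batchSize <= 0 {
		return 0, fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	persisted := 0
	batch := make([][]byte, 0, batchSize)
	flush := func() error {
		if len(batch) == 0 {
			return nil
		}
		if err := put(batch); err != nil {
			return err
		}
		persisted += len(batch)
		progress(persisted)
		// Start a fresh slice so put may safely retain the one it was given.
		batch = make([][]byte, 0, batchSize)
		return nil
	}
	for {
		node, ok := next()
		if !ok {
			break
		}
		batch = append(batch, node)
		if len(batch) == batchSize {
			if err := flush(); err != nil {
				return persisted, err
			}
		}
	}
	// Persist whatever is left over from the final partial batch.
	if err := flush(); err != nil {
		return persisted, err
	}
	return persisted, nil
}

func main() {
	nodes := [][]byte{{1}, {2}, {3}, {4}, {5}}
	i := 0
	next := func() ([]byte, bool) {
		if i == len(nodes) {
			return nil, false
		}
		i++
		return nodes[i-1], true
	}
	put := func([][]byte) error { return nil } // stand-in for an IPFS write
	total, err := persistInBatches(next, 2, put, func(n int) {
		fmt.Println("persisted", n, "nodes so far")
	})
	fmt.Println(total, err)
}
```

Smaller batches give more frequent feedback and lose less work on a halt, at the cost of more write calls.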

Use `ReceiptForStorage`?

The current DAG putter for receipts uses the `Receipt.EncodeRLP()` method, which encodes only the consensus fields:

```go
// EncodeRLP implements rlp.Encoder, and flattens the consensus fields of a receipt
// into an RLP stream. If no post state is present, byzantium fork is assumed.
func (r *Receipt) EncodeRLP(w io.Writer) error {
	return rlp.Encode(w, &receiptRLP{r.statusEncoding(), r.CumulativeGasUsed, r.Bloom, r.Logs})
}

// receiptRLP is the consensus encoding of a receipt.
type receiptRLP struct {
	PostStateOrStatus []byte
	CumulativeGasUsed uint64
	Bloom             Bloom
	Logs              []*Log
}
```

I'm wondering whether we should be publishing the `ReceiptForStorage` encoding instead:

```go
// receiptStorageRLP is the storage encoding of a receipt.
type receiptStorageRLP struct {
	PostStateOrStatus []byte
	CumulativeGasUsed uint64
	TxHash            common.Hash
	ContractAddress   common.Address
	Logs              []*LogForStorage
	GasUsed           uint64
}
```

I think we want that `TxHash`, but I need to give it some more thought. I'm opening this issue so that anyone with insights can share them!

Rename repo

Maybe something like `eth-block-extractor`, in preparation for `tm-block-extractor` and `ln-db-extractor`.

Compute historical state trie nodes for non-archive node

As a developer running the block watcher,
I want to be able to compute historical state trie nodes,
So that I do not need to be running an archive node to create IPLDs for state trie nodes.

Add Travis

Add garbage collection when computing historical state

As a developer computing historical state,
I want to be able to compute the historical state trie without maintaining previous tries in memory,
So that performance improves for larger numbers of blocks.

NOTES
Right now, the operation to compute historical state must begin at a state root for which the trie already exists in the db (e.g. the genesis block). After we've computed the state for subsequent blocks, we should be able to discard the tries associated with earlier blocks' state roots.
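A toy Go sketch of the garbage-collection idea: states are kept per block number, each new state is derived from its predecessor plus a per-block change set, and the predecessor is dropped once it is no longer needed. `State`, `Diff`, `store`, and `advance` are hypothetical names; a plain map stands in for a state trie, and the per-block diff stands in for replaying the block's transactions. Real tries share most of their nodes between consecutive roots, so a real implementation would need reference counting rather than deleting whole copies.

```go
package main

import "fmt"

// State is a toy stand-in for a state trie: account -> balance.
type State map[string]uint64

// Diff is a hypothetical per-block change set: account -> new balance.
type Diff map[string]uint64

// store holds computed states keyed by block number, standing in for the trie db.
type store map[uint64]State

// advance derives the state for block n+1 from block n and, when gc is true,
// deletes block n's state once the new state has been stored.
func (s store) advance(n uint64, d Diff, gc bool) error {
	prev, ok := s[n]
	if !ok {
		return fmt.Errorf("no state for block %d", n)
	}
	next := make(State, len(prev))
	for k, v := range prev {
		next[k] = v
	}
	for k, v := range d {
		next[k] = v
	}
	s[n+1] = next
	if gc {
		delete(s, n)
	}
	return nil
}

func main() {
	s := store{0: State{"alice": 100}} // genesis state already present in the db
	diffs := []Diff{{"bob": 5}, {"alice": 90}, {"carol": 1}}
	for i, d := range diffs {
		if err := s.advance(uint64(i), d, true); err != nil {
			panic(err)
		}
	}
	// With gc enabled, only the latest block's state is retained.
	fmt.Println(len(s), s[3])
}
```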

Create IPLDs for Transaction Receipts

Create IPLDs for State Tries

Create IPLDs for transactions

State/storage trie performance improvements

As a developer running the createIpldsForStateTrie command,
I want to be able to see execution complete more quickly,
So that I can sync the state/storage tries for more blocks in less time.

NOTES
Is there a way to more efficiently fetch every node in the block's state trie?
Could we cache nodes known from previous blocks and only fetch those known to be new?
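One way the caching idea could work, sketched in Go: because trie nodes are content-addressed, a node whose hash was already seen in an earlier block has an identical subtree, so the walk can skip it entirely. The toy `node` type, `syncTrie`, and `mapFetcher` are hypothetical; `fetch` stands in for the database or RPC lookup, and `known` for a cache that persists across blocks.

```go
package main

import "fmt"

// node is a toy trie node, addressed by its hash, listing its children's hashes.
type node struct {
	children []string
}

// syncTrie walks the trie rooted at root, calling fetch only for hashes not
// yet in known, and returns how many nodes it fetched. A known hash means the
// whole subtree below it is already known, so that subtree is skipped.
// Note: nodes are marked known as soon as they are fetched; a production
// version would record them only after their subtree is persisted, so an
// interrupted run cannot leave gaps behind a "known" hash.
func syncTrie(root string, fetch func(hash string) (node, error), known map[string]bool) (int, error) {
	fetched := 0
	stack := []string{root}
	for len(stack) > 0 {
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if known[h] {
			continue
		}
		n, err := fetch(h)
		if err != nil {
			return fetched, err
		}
		fetched++
		known[h] = true
		stack = append(stack, n.children...)
	}
	return fetched, nil
}

// mapFetcher adapts an in-memory map into a fetch function for demonstration.
func mapFetcher(db map[string]node) func(string) (node, error) {
	return func(h string) (node, error) {
		n, ok := db[h]
		if !ok {
			return node{}, fmt.Errorf("missing node %q", h)
		}
		return n, nil
	}
}

func main() {
	db := map[string]node{
		"r1": {children: []string{"a", "b"}}, // block 1 root
		"a":  {children: []string{"a1"}},
		"b":  {},
		"a1": {},
		"r2": {children: []string{"a", "c"}}, // block 2 root reuses subtree "a"
		"c":  {},
	}
	known := map[string]bool{}
	for _, root := range []string{"r1", "r2"} {
		n, err := syncTrie(root, mapFetcher(db), known)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: fetched %d nodes\n", root, n) // r1: 4, then r2: 2
	}
}
```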

Sync IPFS data from running node

As a developer running the eth-block-extractor,
I want to be able to fetch Ethereum data over RPC,
So that I can execute my extraction while my node is running.

NOTES
Currently, syncing requires connecting to a cold db. This optimizes for performance but rules out extraction while the node is running. Adding an optional `--rpc` flag would let users accept degraded performance in exchange for the ability to sync while the node is running.
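A hedged Go sketch of the RPC path. The flag names `--rpc` and `--rpc-url` and the helper names are hypothetical, while `eth_getBlockByNumber` is a standard Ethereum JSON-RPC method that takes a hex-encoded block number and a boolean selecting full transaction objects. A cold-db reader and an RPC reader could then share a common interface, with the flag choosing between them.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
)

// rpcRequest is the JSON-RPC 2.0 envelope an Ethereum node expects.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
	ID      int           `json:"id"`
}

// blockByNumberRequest builds the body of an eth_getBlockByNumber call.
// Block numbers are sent as hex quantities; fullTxs requests full transactions.
func blockByNumberRequest(number uint64, fullTxs bool, id int) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		Method:  "eth_getBlockByNumber",
		Params:  []interface{}{fmt.Sprintf("0x%x", number), fullTxs},
		ID:      id,
	})
}

// parseSourceFlags reads the hypothetical --rpc and --rpc-url flags.
func parseSourceFlags(args []string) (useRPC bool, url string, err error) {
	fs := flag.NewFlagSet("extract", flag.ContinueOnError)
	fs.BoolVar(&useRPC, "rpc", false, "read blocks over JSON-RPC instead of a cold db")
	fs.StringVar(&url, "rpc-url", "http://127.0.0.1:8545", "JSON-RPC endpoint used with --rpc")
	err = fs.Parse(args)
	return useRPC, url, err
}

func main() {
	useRPC, url, err := parseSourceFlags([]string{"--rpc"})
	if err != nil {
		panic(err)
	}
	if !useRPC {
		fmt.Println("reading from the cold db")
		return
	}
	body, err := blockByNumberRequest(436, false, 1)
	if err != nil {
		panic(err)
	}
	// An HTTP client would POST this body to url.
	fmt.Printf("POST %s %s\n", url, body)
}
```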