
eth-block-extractor's Introduction

vulcanize

eth-block-extractor's People

Contributors

i-norden, rmulhol

Forkers

isabella232

eth-block-extractor's Issues

Add default option for most recent block

As a developer running the block watcher,
I want to be able to sync to the most recent block by default,
So that I do not need to know and specify an ending block number
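
A minimal sketch of how such a default could be resolved, assuming the ending block arrives as an integer option. The sentinel `noEndingBlock`, the function `resolveEndingBlock`, and the `latest` callback are hypothetical names for illustration, not part of this project:

```go
package main

import (
	"errors"
	"fmt"
)

// noEndingBlock is a hypothetical sentinel meaning "no ending block was specified".
const noEndingBlock int64 = -1

// resolveEndingBlock returns the requested ending block, or falls back to the
// most recent block reported by latest when none was specified.
func resolveEndingBlock(requested int64, latest func() (int64, error)) (int64, error) {
	if requested != noEndingBlock {
		if requested < 0 {
			return 0, errors.New("ending block must be non-negative")
		}
		return requested, nil
	}
	head, err := latest()
	if err != nil {
		return 0, fmt.Errorf("looking up most recent block: %w", err)
	}
	return head, nil
}

func main() {
	// Stand-in for a real lookup against the database or node (assumption).
	latest := func() (int64, error) { return 5000000, nil }

	end, err := resolveEndingBlock(noEndingBlock, latest)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("syncing through block", end)
}
```

Passing the lookup as a callback keeps the default logic independent of where the chain head actually comes from.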

Parsing eth data off of IPFS

As a developer using the eth-block-extractor,
I want to be able to read ethereum data off of IPFS from this project,
So that I don't need to run any other processes to use the data.

NOTES
IPFS doesn't natively know how to parse Ethereum data structures. We can get around this by running a separate program that registers parsers for the data structures we're persisting, but it'd be nice to be able to do that work internally from this project.
We might also want to consider just returning a raw byte array and letting users do their own parsing.
This story should probably be broken up into separate ones for each data structure as we get closer to crossing that bridge.

State/storage trie "progress bar"

As a developer running the createIpldsForStateTrie command,
I want to get some sort of feedback about the state of the execution within a block,
So that I have an idea of how things are going when syncing a block for which there are very many state/storage trie nodes.

NOTES
Right now, the sync is performed at the block level: all state/storage trie nodes are fetched, and then all are persisted to IPFS. This means the program can run for a long time without giving any feedback. It also means that halting execution loses all progress toward constructing a block's state/storage trie. Perhaps there's a way to intermittently pause and resume the iteration through these tries, within a single block, so that already fetched data is persisted and feedback about the state of the execution is provided.
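
One way to picture the idea is to persist nodes in batches as they come off an iterator and report a running count after each batch. This is only a sketch under assumptions: `persistIncrementally`, the `next` iterator, and the `persist` and `report` callbacks are hypothetical and do not reflect the project's actual interfaces:

```go
package main

import "fmt"

// persistIncrementally pulls nodes from next, persists them in batches of
// batchSize, and calls report with the cumulative count after each batch.
// It returns the number of nodes persisted before any error occurred.
// The batch slice is reused, so persist must not retain it after returning.
func persistIncrementally(
	next func() ([]byte, bool),
	batchSize int,
	persist func(batch [][]byte) error,
	report func(persisted int),
) (int, error) {
	if batchSize <= 0 {
		return 0, fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	persisted := 0
	batch := make([][]byte, 0, batchSize)

	// flush persists whatever is buffered and reports progress.
	flush := func() error {
		if len(batch) == 0 {
			return nil
		}
		if err := persist(batch); err != nil {
			return err
		}
		persisted += len(batch)
		report(persisted)
		batch = batch[:0]
		return nil
	}

	for {
		node, ok := next()
		if !ok {
			break
		}
		batch = append(batch, node)
		if len(batch) == batchSize {
			if err := flush(); err != nil {
				return persisted, err
			}
		}
	}
	// Persist the final, possibly partial, batch.
	if err := flush(); err != nil {
		return persisted, err
	}
	return persisted, nil
}

func main() {
	// Stand-in iterator yielding seven fake trie nodes.
	i := 0
	next := func() ([]byte, bool) {
		if i >= 7 {
			return nil, false
		}
		i++
		return []byte{byte(i)}, true
	}
	persist := func(batch [][]byte) error { return nil } // pretend to write to IPFS
	report := func(n int) { fmt.Println("persisted", n, "nodes so far") }

	if _, err := persistIncrementally(next, 3, persist, report); err != nil {
		fmt.Println("error:", err)
	}
}
```

Because each batch is persisted before the next one is fetched, an interrupted run keeps whatever was already written. Resuming from that point would need additional bookkeeping that this sketch does not attempt.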

Use `ReceiptForStorage` ?

The current dag putter for receipts uses the Receipt.EncodeRLP() method, which encodes only the consensus fields:

// EncodeRLP implements rlp.Encoder, and flattens the consensus fields of a receipt
// into an RLP stream. If no post state is present, byzantium fork is assumed.
func (r *Receipt) EncodeRLP(w io.Writer) error {
	return rlp.Encode(w, &receiptRLP{r.statusEncoding(), r.CumulativeGasUsed, r.Bloom, r.Logs})
}

// receiptRLP is the consensus encoding of a receipt.
type receiptRLP struct {
	PostStateOrStatus []byte
	CumulativeGasUsed uint64
	Bloom             Bloom
	Logs              []*Log
}

I'm wondering if we should be publishing the ReceiptForStorage instead:

// receiptStorageRLP is the storage encoding of a receipt.
type receiptStorageRLP struct {
	PostStateOrStatus []byte
	CumulativeGasUsed uint64
	TxHash            common.Hash
	ContractAddress   common.Address
	Logs              []*LogForStorage
	GasUsed           uint64
}

I think we want that TxHash, but I need to give it some more thought. Opening this so that anybody with insights can share them!

Rename repo

Maybe something like eth-block-extractor, in preparation for tm-block-extractor and ln-db-extractor.

Add garbage collection when computing historical state

As a developer computing historical state,
I want to be able to compute the historical state trie without maintaining previous tries in memory,
So that performance improves for larger numbers of blocks.

NOTES
Right now the operation to compute historical state must begin at a state root for which the trie already exists in the db (e.g. the genesis block). After we've computed the state for subsequent blocks, we should be able to discard the trie associated with the root from earlier blocks.
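
The bookkeeping described here could look roughly like the following sketch, which keeps only the most recent state root and hands the previous one to a release callback. `rootTracker` and `release` are hypothetical names, and how a trie is actually freed is left to that callback (an assumption, since tries for different roots can share nodes):

```go
package main

import "fmt"

// rootTracker remembers only the most recently computed state root and
// releases the previous one each time a new root is recorded.
type rootTracker struct {
	current    string
	hasCurrent bool
	release    func(root string) // hypothetical hook that discards a trie
}

// advance records newRoot as the current root and releases the old root,
// unless the root did not change between blocks.
func (rt *rootTracker) advance(newRoot string) {
	if rt.hasCurrent && rt.current != newRoot {
		rt.release(rt.current)
	}
	rt.current = newRoot
	rt.hasCurrent = true
}

func main() {
	tracker := &rootTracker{release: func(root string) {
		fmt.Println("releasing trie for root", root)
	}}
	for _, root := range []string{"root-0", "root-1", "root-2"} {
		tracker.advance(root)
	}
	fmt.Println("retained root:", tracker.current)
}
```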

State/storage trie performance improvements

As a developer running the createIpldsForStateTrie command,
I want to be able to see execution complete more quickly,
So that I can sync the state/storage tries for more blocks in less time.

NOTES
Is there a way to fetch every node in the block's state trie more efficiently?
Could we cache nodes known from previous blocks and only fetch those known to be new?
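
The caching question can be illustrated with a set of already seen node hashes that filters each block's nodes down to the new ones. This is a sketch only: `nodeCache` and `filterNew` are hypothetical, hashes are plain strings for brevity, and the cache grows without bound:

```go
package main

import "fmt"

// nodeCache remembers the hashes of trie nodes that were already fetched.
type nodeCache struct {
	seen map[string]struct{}
}

func newNodeCache() *nodeCache {
	return &nodeCache{seen: make(map[string]struct{})}
}

// filterNew returns the hashes that have not been seen before, in their
// original order, and marks them as seen.
func (c *nodeCache) filterNew(hashes []string) []string {
	var fresh []string
	for _, h := range hashes {
		if _, ok := c.seen[h]; ok {
			continue
		}
		c.seen[h] = struct{}{}
		fresh = append(fresh, h)
	}
	return fresh
}

func main() {
	cache := newNodeCache()

	// Node hashes referenced by two consecutive blocks (made-up values).
	blockOne := []string{"a", "b", "c"}
	blockTwo := []string{"a", "b", "d"}

	fmt.Println("fetch for block one:", cache.filterNew(blockOne))
	fmt.Println("fetch for block two:", cache.filterNew(blockTwo))
}
```

Since a node's hash commits to its contents, including the hashes of its children, a traversal that meets an already seen hash could in principle skip that whole subtree rather than filtering node by node.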

Sync IPFS data from running node

As a developer running the eth-block-extractor,
I want to be able to fetch ethereum data over the RPC,
So that I can execute my extraction while my node is running.

NOTES
Currently, syncing requires connecting to a cold db. This optimizes for performance but rules out execution while a node is running. Adding an optional --rpc flag would let users accept degraded performance in exchange for the ability to sync while the node is running.
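
A rough sketch of how an optional flag could select the data source, using only Go's standard `flag` package. It assumes the flag takes an endpoint URL, which the story does not specify. `syncConfig`, `parseSyncFlags`, and `source` are hypothetical names, and the project's real CLI wiring may differ:

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// syncConfig holds the options relevant to choosing a data source.
type syncConfig struct {
	rpcURL string // empty means read from the cold database
}

// parseSyncFlags parses the hypothetical --rpc flag from args.
func parseSyncFlags(args []string) (syncConfig, error) {
	fs := flag.NewFlagSet("sync", flag.ContinueOnError)
	var cfg syncConfig
	fs.StringVar(&cfg.rpcURL, "rpc", "", "RPC endpoint of a running node (assumed flag)")
	if err := fs.Parse(args); err != nil {
		return syncConfig{}, err
	}
	return cfg, nil
}

// source reports which data source the configuration selects.
func (c syncConfig) source() string {
	if c.rpcURL != "" {
		return "rpc"
	}
	return "cold-db"
}

func main() {
	cfg, err := parseSyncFlags(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("data source:", cfg.source())
}
```

Leaving the flag unset keeps the current cold db behavior, so the faster path stays the default.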

Persist IPLDs to Postgres

As a developer running the eth-block-extractor,
I want to be able to persist the IPLDs I'm generating to postgres,
So that I can make fast queries and link data more efficiently.

NOTES
The preferred method would probably be to set up PG as a datastore for go-ipfs, but we could also persist nodes to a vulcanize table if we can't avoid exceeding the maximum available connections.

Upgrade go-ethereum dep

There was a new release recently. It would be good to take advantage of whatever performance improvements it offers, but upgrading could entail updates to the wrappers package (if there are changes to the exported functions).
