
grayedfox / blockscrape


Blockscrape is a utility program that scrapes data from a blockchain and exports it to CSV format.

License: MIT License

JavaScript 100.00%

blockscrape's Introduction

Hello Saturn 🪐

I'm Che and I like philosophy, coding, martial arts, and music.

📝 TypeScript, JavaScript, NodeJS, C#, Python, BASH
💜 Automation, Tooling, Process Management
📚 Nietzsche, Simone de Beauvoir, Socrates
🌱 Sustainability, Equality, Innovation, AI, Data Science
🏳️‍🌈 he/him


blockscrape's Issues

Ability to specify which attributes should be exported to data dump

This feature is promised in the Coming Soon section and is pretty self-explanatory.
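As a rough illustration of what attribute selection could look like, here is a minimal sketch. The `toCsvRow` helper and the sample block fields are hypothetical and are not taken from blockscrape's code; the idea is only that the caller passes a list of attribute names and gets one CSV row back.

```javascript
// Hypothetical helper: builds one CSV row from only the requested attributes.
// Neither the function name nor the field names come from blockscrape itself.
function toCsvRow(record, attributes) {
  return attributes
    .map((name) => {
      const value = record[name];
      const text = value === undefined || value === null ? '' : String(value);
      // Quote fields that contain commas, quotes, or newlines.
      return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
    })
    .join(',');
}

// Made-up block data for illustration only.
const sampleBlock = { hash: 'abc', height: 1000, size: 285, miner: 'a,b' };

console.log(toCsvRow(sampleBlock, ['height', 'hash'])); // 1000,abc
```

Missing attributes become empty fields here, which keeps the column count stable across rows; rejecting unknown attribute names up front would be an equally reasonable choice.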

Update docs to reflect remote API support

update requirements to say that local chain is optional (although strongly recommended)
update coming soon section(s)
update supported blockchains section and specify which ones are supported remotely via BlockCypher
move "remote-able" into current features

Write a guide for adding support of a new blockchain

The initial goal was to allow for easy extendability in terms of supporting different blockchains, and although it's certainly possible, it's a bit more complex than my initial estimates.

write a guide which explains the steps for adding a new blockchain (ideally: check their docs, map paths to local api commands like getTransactionByHash, then add default params where required)
encourage users to make requests if they're happy to wait a little and don't want to touch the code
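The mapping step in such a guide might be sketched roughly as follows. Everything here is an assumption made for illustration: the `chains` table, the `examplecoin` entry, the `buildRequest` helper, and the Bitcoin-style method names and default params are placeholders that would have to be checked against the target chain's own docs.

```javascript
// Hypothetical chain table: maps scraper-level commands to a chain's local
// API method plus any default params that method needs.
const chains = {
  examplecoin: {
    getBlockHashByHeight: { method: 'getblockhash', defaults: [] },
    // Placeholder default: "1" standing in for a verbose/decoded flag.
    getTransactionByHash: { method: 'getrawtransaction', defaults: [1] },
  },
};

// Hypothetical helper: turns a scraper-level command into a chain-specific request.
function buildRequest(chain, command, args) {
  const definition = chains[chain] && chains[chain][command];
  if (!definition) {
    throw new Error(`Unsupported command "${command}" for chain "${chain}"`);
  }
  return {
    method: definition.method,
    params: [...args, ...definition.defaults],
  };
}

console.log(buildRequest('examplecoin', 'getTransactionByHash', ['deadbeef']));
```

With a table like this, adding a chain would mostly mean adding one more entry, and an unsupported command fails loudly instead of silently sending a bad request.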

Current data export mechanism relies too heavily on workers being error free

The current logic of the program writes a block, and all of its transactions, in a single write and retains the scraped order when writing to the exportedData.csv file.

While this is good for later data analysis (and saves us having to sort hundreds of millions of lines later on), the write mechanism waits until a worker sends it the next block in sequence before it writes, which can result in the following bug:

master process writes block 1000 and all of its transactions to file
worker 1 finishes block 1002 and sends the required data to master
workers 2, 3, and 4 are still busy with blocks 1001, 1003, and 1004
master stores block 1002 data in memory and tells worker 1 to go to the next available block
worker 1 starts scraping block 1005
workers 3 and 4 finish and start scraping the next blocks; since master hasn't received block 1001 yet, it also adds their blocks (1003 and 1004) to memory
worker 2 dies/freezes/poos for some reason, so block 1001 is never sent to master
remaining workers continue to scrape the next available blocks in order of blockHeight, but master never restarts worker 2, so block 1001 is never scraped and master (currently) will never actually write any more data

This can result in the scraper doing all of its good work but never exporting new data, since it refuses to write blocks out of order.
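The stall can be reproduced with a simplified model of the in-order write buffer. This is a sketch, not the actual master process code: `createOrderedWriter` and its fields are hypothetical, and the real program writes to exportedData.csv rather than to an array.

```javascript
// Simplified, hypothetical model of a master that only writes blocks in order.
function createOrderedWriter(startHeight) {
  const pending = new Map(); // height -> block data waiting to be written
  const written = [];        // heights that have been "written to file"
  let nextHeight = startHeight;

  return {
    receive(height, data) {
      pending.set(height, data);
      // Flush only while the next expected height is available.
      while (pending.has(nextHeight)) {
        written.push(nextHeight);
        pending.delete(nextHeight);
        nextHeight += 1;
      }
    },
    written,
    pendingCount: () => pending.size,
  };
}

const writer = createOrderedWriter(1000);
writer.receive(1000, 'block 1000');

// Block 1001 never arrives because its worker died.
[1002, 1003, 1004, 1005].forEach((height) => {
  writer.receive(height, `block ${height}`);
});

console.log(writer.written);        // [ 1000 ]
console.log(writer.pendingCount()); // 4
```

Every later block piles up in memory while nothing after block 1000 is written, which is the behaviour described in the steps above.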

Possible fixes:

allow blocks to be written and skip a block sometimes if the blocksToWrite array is getting too full (more of a workaround)
make workers smarter so that they will only scrape the next available block if a previous block is not available to work on (this case should only happen when a worker errors or dies)
above solution would require proper error handling so that workers save failed blocks somewhere
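The second and third ideas could be sketched as a small scheduler that remembers which block each worker holds and hands failed blocks out before new ones. All names here (`createScheduler`, `nextBlockFor`, `workerFailed`, the worker ids) are hypothetical; how a failure is detected, for example from a worker exit handler or a timeout, is left out of the sketch.

```javascript
// Hypothetical scheduler: failed blocks are retried before new blocks are issued.
function createScheduler(startHeight) {
  let nextNew = startHeight;
  const retry = [];             // heights whose worker errored or died
  const inProgress = new Map(); // workerId -> height currently being scraped

  return {
    nextBlockFor(workerId) {
      // Prefer a previously failed block; otherwise move on to a new one.
      const height = retry.length > 0 ? retry.shift() : nextNew++;
      inProgress.set(workerId, height);
      return height;
    },
    workerFinished(workerId) {
      inProgress.delete(workerId);
    },
    workerFailed(workerId) {
      const height = inProgress.get(workerId);
      inProgress.delete(workerId);
      if (height !== undefined) retry.push(height);
    },
  };
}

const scheduler = createScheduler(1001);
const first = scheduler.nextBlockFor('worker-2');  // 1001
const second = scheduler.nextBlockFor('worker-1'); // 1002

scheduler.workerFailed('worker-2');   // block 1001 goes back into the retry queue
scheduler.workerFinished('worker-1');

const retried = scheduler.nextBlockFor('worker-1'); // 1001 again
const afterRetry = scheduler.nextBlockFor('worker-3'); // 1003
console.log(first, second, retried, afterRetry);
```

Because the missing block is rescheduled instead of lost, the master can keep its in-order write logic and still make progress after a worker dies.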

Clean up JSON.parse usage by placing inside appropriate API calls

Ideally, string responses that are JSON-formatted should be parsed once inside the API, not multiple times inside the scraper file.
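One way this could look is a thin API wrapper that owns the parsing. This is a sketch under assumptions: `createApi`, `transport`, and the `getBlockByHeight` command are hypothetical names, and the transport is synchronous only to keep the example short; a real one would most likely be asynchronous.

```javascript
// Hypothetical API wrapper: JSON.parse happens here, once, for every command.
// `transport` stands in for whatever call returns the raw response string.
function createApi(transport) {
  function call(command, params) {
    const raw = transport(command, params);
    try {
      return typeof raw === 'string' ? JSON.parse(raw) : raw;
    } catch (error) {
      throw new Error(`Invalid JSON from "${command}": ${error.message}`);
    }
  }

  return {
    getBlockByHeight: (height) => call('getBlockByHeight', [height]),
    getTransactionByHash: (hash) => call('getTransactionByHash', [hash]),
  };
}

// Fake transport returning a canned JSON string, for illustration only.
const fakeTransport = () => '{"height":1000,"tx":["deadbeef"]}';
const api = createApi(fakeTransport);

const parsedBlock = api.getBlockByHeight(1000);
console.log(parsedBlock.height); // 1000
```

The scraper then only ever sees parsed objects, and a malformed response produces a single error that names the command that caused it.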
