blockscrape's Introduction

Hello Saturn 🪐

I'm Che and I like philosophy, coding, martial arts, and music.

  • ๐Ÿ“ TypeScript, JavaScript, NodeJS, C#, Python, BASH
  • ๐Ÿ’œ Automation, Tooling, Process Management
  • ๐Ÿ“š Nietzsche, Simone de Beauvoir, Socrates
  • ๐ŸŒฑ Sustainability, Equality, Innovation, AI, Data Science
  • ๐Ÿณ๏ธโ€๐ŸŒˆ he/him

blockscrape's People

Contributors

grayedfox

blockscrape's Issues

Update docs to reflect remote API support

  • update requirements to say that a local chain is optional (although strongly recommended)
  • update the coming soon section(s)
  • update the supported blockchains section and specify which ones are supported remotely via BlockCypher
  • move "remote-able" into current features

Write a guide for adding support of a new blockchain

The initial goal was to allow for easy extendability in terms of supporting different blockchains, and although it's certainly possible, it's a bit more complex than my initial estimates.

  • write a guide which explains the steps for adding a new blockchain (ideally: check their docs, map paths to local api commands like getTransactionByHash, then add default params where required)
  • encourage users to make requests if they're happy to wait a little and don't want to touch the code
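To give the guide something concrete to point at, here is a minimal sketch of what "map paths to local API commands, then add default params" could look like. Everything in it is hypothetical: the `ChainAdapter` shape, the `getBlockByHeight` command, the `examplecoin` paths, and the base URL are made up for illustration and are not taken from blockscrape's code or from any real chain's docs. Only `getTransactionByHash` is a name mentioned above.

```typescript
// Hypothetical shapes: none of these types exist in blockscrape itself.
type LocalCommand = "getBlockByHeight" | "getTransactionByHash";

interface RouteSpec {
  // Path template on the remote API; ":arg" is replaced by the call argument.
  path: string;
  // Default query parameters, used unless the caller overrides them.
  defaults?: Record<string, string>;
}

type ChainAdapter = Record<LocalCommand, RouteSpec>;

// Adapter for an imaginary chain; the paths are assumptions, not a real API.
const exampleCoin: ChainAdapter = {
  getBlockByHeight: { path: "/blocks/:arg", defaults: { limit: "50" } },
  getTransactionByHash: { path: "/txs/:arg" },
};

// Turns a local command plus its argument into a remote URL.
function buildUrl(
  baseUrl: string,
  adapter: ChainAdapter,
  command: LocalCommand,
  arg: string,
  params: Record<string, string> = {},
): string {
  const spec = adapter[command];
  const path = spec.path.replace(":arg", encodeURIComponent(arg));
  // Caller-supplied params win over the adapter's defaults.
  const merged: Record<string, string> = { ...spec.defaults, ...params };
  const query = Object.keys(merged)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(merged[key])}`)
    .join("&");
  return query ? `${baseUrl}${path}?${query}` : `${baseUrl}${path}`;
}
```

With a layout like this, adding a chain would mostly mean writing one new adapter object after reading that chain's API docs, which is roughly the workflow the guide should describe.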

Current data export mechanism relies too heavily on workers being error free

The current logic of the program writes a block, and all of its transactions, in a single write and retains the scraped order when writing to the exportedData.csv file.

While this is good for later data analysis (and saves us having to sort hundreds of millions of lines later on) the write mechanism will wait until a worker sends it the next descending block before it writes, which can result in the following bug:

  • master process writes block 1000 and all transactions to file
  • worker 1 finishes block 1002, sending required data to master
  • worker 2, 3, and 4 are still busy with blocks 1001, 1003, 1004
  • master stores block 1002 data in memory, tells worker 1 to go to next available block
  • worker 1 starts scraping block 1005
  • workers 3 and 4 finish and start scraping the next blocks; since master hasn't received block 1001 yet, it also holds their blocks (1003 and 1004) in memory
  • worker 2 dies or freezes for some reason, so block 1001 is never sent to master
  • the remaining workers keep scraping the next available blocks in order of blockHeight, but master never restarts worker 2, so block 1001 never arrives and master (currently) never writes any more data

This can result in the scraper doing all of its good work but never exporting new data, since it refuses to write blocks out of order.
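The stall can be reproduced with a small model of an in-order writer. This is only a sketch of the behaviour described above, not blockscrape's actual write logic: the `OrderedWriter` class and its members are invented for illustration, and it assumes ascending block heights as in the example steps.

```typescript
// Hypothetical model of a master that only writes blocks in height order.
class OrderedWriter {
  private pending: Record<number, string> = {};
  private nextHeight: number;
  readonly written: number[] = [];

  constructor(firstHeight: number) {
    this.nextHeight = firstHeight;
  }

  // Buffers the block, then flushes every consecutive block that is ready.
  receive(height: number, rows: string): void {
    this.pending[height] = rows;
    while (this.nextHeight in this.pending) {
      delete this.pending[this.nextHeight];
      this.written.push(this.nextHeight); // stand-in for appending to the CSV
      this.nextHeight += 1;
    }
  }

  // Number of blocks held in memory, waiting for an earlier block.
  bufferedCount(): number {
    return Object.keys(this.pending).length;
  }
}

const writer = new OrderedWriter(1000);
writer.receive(1000, "rows for block 1000");
// Block 1001 never arrives because its worker died.
for (const height of [1002, 1003, 1004, 1005]) {
  writer.receive(height, `rows for block ${height}`);
}
const writtenWhileStalled = writer.written.length; // only block 1000
const bufferedWhileStalled = writer.bufferedCount(); // 1002 to 1005 pile up
```

As long as block 1001 is missing, `written` stays at one entry while the buffer keeps growing. Delivering block 1001 later would flush everything in a single pass.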

Possible fixes:

  • allow blocks to be written and skip a block sometimes if the blocksToWrite array is getting too full (more of a workaround)
  • make workers smarter so that they will only scrape the next available block if a previous block is not available to work on (this case should only happen when a worker errors or dies)
  • above solution would require proper error handling so that workers save failed blocks somewhere
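A rough sketch of the second and third ideas: master keeps a list of failed blocks and hands those out before any new block. The `BlockScheduler` class and its method names are hypothetical and assume master can detect a failed or dead worker, which is the error handling the last point asks for.

```typescript
// Hypothetical scheduler: failed blocks are retried before fresh ones.
class BlockScheduler {
  private retryQueue: number[] = [];
  private nextFresh: number;

  constructor(firstHeight: number) {
    this.nextFresh = firstHeight;
  }

  // Called when a worker errors or is found dead while scraping a block.
  reportFailure(height: number): void {
    this.retryQueue.push(height);
    // Keep the lowest height first so the writer is unblocked soonest.
    this.retryQueue.sort((a, b) => a - b);
  }

  // Returns a previously failed block if there is one, else the next new block.
  nextBlock(): number {
    const retry = this.retryQueue.shift();
    if (retry !== undefined) {
      return retry;
    }
    const height = this.nextFresh;
    this.nextFresh += 1;
    return height;
  }
}
```

In the scenario above, the four workers would receive blocks 1001 to 1004, worker 2's failure would put 1001 back in the queue, and the next free worker would pick up 1001 instead of 1005.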
