onchainification / candlestick_retriever

Retrieve all historical candlestick data from crypto exchange Binance and upload it to Kaggle.

Home Page: https://www.kaggle.com/jorijnsmit/binance-full-history

License: GNU General Public License v3.0

Python 100.00%
candlesticks cryptocurrencies market-data kaggle-dataset klines binance-api defi

candlestick_retriever's Introduction

candlestick_retriever

Dependencies

  • pandas
  • requests
  • pyarrow
  • kaggle

Running

Simply run ./main.py to either download or update every single pair available:

[...]
2020-08-22 17:44:24.178846 959/970 Wrote 83000 new lines to file for DOGE-BTC 
2020-08-22 17:45:13.963455 960/970 Wrote 83000 new lines to file for NULS-ETH 
2020-08-22 17:45:14.573595 961/970 Already up to date with BTCB-BTC
2020-08-22 17:46:06.781870 962/970 Wrote 83000 new lines to file for ATOM-BTC 
2020-08-22 17:46:08.669972 963/970 Already up to date with LSK-BNB
[...]

Once the run completes, you should have a directory containing one Parquet file per pair: 970 files totaling ~12 GB at the time of the run shown above.

candlestick_retriever's People

Contributors

gosuto-inzasheru, yo2x

candlestick_retriever's Issues

storage size can be drastically decreased

  • use .parquet files
  • convert to a lower precision dtype, e.g. float32 and int16.
  • drop some unnecessary columns

This reduced a 170 MB CSV file to 52 MB, about 30% of its original size. At that ratio, the 50 GB dataset (all pairs) would shrink to roughly 15 GB.
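The three suggestions above can be sketched with pandas roughly as follows. The column names and the `shrink` helper are assumptions made for illustration, not the retriever's actual schema or code, and the target dtypes should be checked against the value ranges in the real data (int16, for instance, only holds values up to 32767).

```python
import numpy as np
import pandas as pd


def shrink(df: pd.DataFrame, drop: list[str]) -> pd.DataFrame:
    """Drop unneeded columns and downcast 64-bit numeric columns."""
    out = df.drop(columns=drop)
    for col in out.columns:
        if pd.api.types.is_float_dtype(out[col]):
            # float32 keeps about 7 significant digits; assumed sufficient here.
            out[col] = out[col].astype(np.float32)
        elif pd.api.types.is_integer_dtype(out[col]):
            # Pick the smallest integer dtype that can hold the values present.
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


# Hypothetical candlestick frame; the column names are illustrative only.
raw = pd.DataFrame({
    "open": [0.1, 0.2, 0.3],
    "close": [0.2, 0.3, 0.4],
    "number_of_trades": [12, 7, 30],
    "ignore": [0.0, 0.0, 0.0],
})
small = shrink(raw, drop=["ignore"])
print(small.dtypes)
```

The resulting frame can then be written with `small.to_parquet(...)` instead of `to_csv(...)`, which covers the first bullet.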

use zstd for improved compression

The pandas to_parquet() method uses snappy compression by default. You can get significantly better compression (file sizes 20% smaller or more) while keeping good decompression speed by passing compression='zstd' when saving to Parquet.
It's worth noting that zstd supports many compression levels, and I'm not sure whether pandas automatically chooses the highest one. Importing the DataFrame into a DuckDB table and writing the Parquet file from there, with zstd specified as the compression, could result in smaller files.

Catch other connection errors

Add the ability to catch other connection errors, such as timeouts and connection resets.
I just made a pull request.
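One way to handle such errors is a small retry wrapper around each request. The sketch below is not the code from the pull request: the helper name `with_retries`, the retry count, and the backoff are hypothetical. It uses Python's built-in ConnectionError (which covers ConnectionResetError) and TimeoutError so that it runs without dependencies; with requests, the corresponding exceptions would be requests.exceptions.ConnectionError and requests.exceptions.Timeout.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 3,
    backoff: float = 0.0,
) -> T:
    """Call func, retrying on the given exceptions up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: let the caller see the error
            time.sleep(backoff * attempt)  # linear backoff between tries
    raise ValueError("attempts must be at least 1")


# Stand-in for a flaky download that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("connection reset by peer")
    return "ok"


result = with_retries(flaky_fetch)
print(result, calls["n"])
```

Passing the exception types as a parameter keeps the wrapper independent of the HTTP library, and re-raising on the last attempt means a persistent outage still surfaces to the caller.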
