dataset-cache's Introduction

Dataset Cache

A solution for downloading and caching datasets

How does it work?

dataset-cache lets you download your datasets, cache them, and verify that the data you are using matches the hash you expect.

When a file is downloaded, it is saved to the output directory, hashed with SHA-256, and renamed to its hash for caching.
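The hash-and-rename step can be sketched with Node's built-in modules. This is a minimal illustration, not the library's actual code: `cacheFile` is a hypothetical helper name, and it assumes the file has already been downloaded into the output directory.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

// Hypothetical helper (not part of dataset-cache's API):
// hash a downloaded file with SHA-256 and rename it to its hex digest.
function cacheFile (filePath) {
  const digest = crypto
    .createHash('sha256')
    .update(fs.readFileSync(filePath))
    .digest('hex')
  // The cached copy lives next to the original, named after the digest.
  const cachedPath = path.join(path.dirname(filePath), digest)
  fs.renameSync(filePath, cachedPath)
  return { path: cachedPath, hash: digest }
}
```

Because the file name is the digest, a later request for the same hash can be served by checking whether that path already exists.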

Directories can be provided in the form of a tarball (tar.gz). After downloading, the contents are extracted and the directory is hashed using hash-then. The contents are then saved in a directory named after the hash.
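One way to hash an extracted directory deterministically is sketched below. It is an assumption made for illustration, not the algorithm the library uses: `listFiles` and `hashDirectory` are hypothetical names, and the scheme (sorted relative paths with `/` separators, each followed by the file contents) is just one reasonable choice.

```javascript
const crypto = require('crypto')
const fs = require('fs')
const path = require('path')

// Hypothetical helper: list all files under root as relative paths
// joined with '/', sorted so the order does not depend on the filesystem.
function listFiles (root, rel = '') {
  const entries = fs.readdirSync(path.join(root, rel), { withFileTypes: true })
  let files = []
  for (const entry of entries) {
    const relPath = rel ? rel + '/' + entry.name : entry.name
    if (entry.isDirectory()) files = files.concat(listFiles(root, relPath))
    else files.push(relPath)
  }
  return files.sort()
}

// Hypothetical helper: feed each relative path and its contents into a
// single SHA-256 digest, separated by NUL bytes.
function hashDirectory (root) {
  const hash = crypto.createHash('sha256')
  for (const relPath of listFiles(root)) {
    hash.update(relPath)
    hash.update('\0')
    hash.update(fs.readFileSync(path.join(root, relPath)))
    hash.update('\0')
  }
  return hash.digest('hex')
}
```

Sorting the paths and fixing the separator keeps the digest independent of directory listing order and of the platform's path conventions.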

Use

As a library

Download and cache a file

dataset.get({
  url: 'https://raw.githubusercontent.com/empiricalci/fixtures/master/my-file.txt',
  hash: '8b4781a921e9f1a1cb5aa3063ca8592cac3ee39276d8e8212b336b6e73999798'
}, '/output_dir').then(function (data) {
  console.log(data) // {path: '..', hash: '..', valid: '..', cached: '..'} 
})

Download and cache a directory

If directory: true is set, the library will extract the .zip or .tar.gz archive.

dataset.get({
  url: 'https://github.com/empiricalci/fixtures/raw/master/my-files.tar.gz',
  hash: '0e4710c220e7ed2d11288bcf3cf111ac01bdd0cb2a4d64f81455c5b31f1a4fbe',
  directory: true
}, '/output_dir').then(function (data) {
  console.log(data) // {path: '..', hash: '..', valid: '..', cached: '..'} 
})

Install multiple datasets at once

dataset.install({
  resource1: {url: '..', hash: '..'},
  dataset2: {url: '..', hash: '..'}
}, function (datasets) {
  console.log(datasets)
})

dataset-cache's People

Contributors

alantrrs

dataset-cache's Issues

Minimum Viable Library

  • Download the resources
  • Uncompress them
  • Put each one into its own directory
  • Test with multiple datasets

Caching directories using zip files is not working on Windows

The resulting checksum for directories extracted from zip files differs between Windows and Linux:

  1) Get directory from zip should download and uncompress the file:
      AssertionError: '24490b407969812c10e687d38edebda165b94d6c4f8f9ad9306066acf76f8dfa' == '5c9c3ff715bac9faa626f6e0e1da60c976c279c94d931388c6aaaf38456957d8'
      + expected - actual
      -24490b407969812c10e687d38edebda165b94d6c4f8f9ad9306066acf76f8dfa
      +5c9c3ff715bac9faa626f6e0e1da60c976c279c94d931388c6aaaf38456957d8

      at C:\projects\dataset-cache\test\index.js:112:14
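The issue does not state the cause. As an illustration only, and not a diagnosis, the sketch below shows how a directory digest can differ between platforms if relative file names are hashed with the platform's own path separator. `hashNames` is a hypothetical helper; `path.win32` and `path.posix` are Node's built-in platform-specific path implementations.

```javascript
const crypto = require('crypto')
const path = require('path')

// Hypothetical helper: hash a list of relative file names.
function hashNames (names) {
  const hash = crypto.createHash('sha256')
  for (const name of names) hash.update(name)
  return hash.digest('hex')
}

// The same relative path, as each platform would spell it.
const posixNames = [path.posix.join('my-files', 'a.txt')] // 'my-files/a.txt'
const win32Names = [path.win32.join('my-files', 'a.txt')] // 'my-files\a.txt'

// Normalizing separators to '/' before hashing removes the difference.
const normalized = win32Names.map(function (name) {
  return name.split(path.win32.sep).join('/')
})

console.log(hashNames(posixNames) === hashNames(win32Names)) // false
console.log(hashNames(posixNames) === hashNames(normalized)) // true
```

Other platform differences, such as line-ending conversion or file mode bits, could have the same effect, so this would need to be confirmed against the failing test.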
