ArrayStream

ArrayStream is a Node.js utility for streaming large quantities of JSON data to files, with each file formatted as one large JSON array. The files can later be read back with a streaming parser such as https://www.npmjs.com/package/stream-json.
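To make the idea of "a stream formatted as one JSON array" concrete, here is a minimal sketch built only on Node's built-in stream module. It is not ArrayStream's implementation: the class name JsonArraySketch, the sink callback, and the toJsonArray helper are hypothetical names chosen for illustration, and the sketch collects output in memory rather than writing to files.

```javascript
const { Writable } = require('node:stream');

// Hypothetical illustration only: not ArrayStream's actual implementation.
class JsonArraySketch extends Writable {
	constructor(sink) {
		super({ objectMode: true });
		this.sink = sink; // any function that accepts string fragments
		this.count = 0; // number of items written so far
	}

	_write(item, _encoding, callback) {
		try {
			const json = JSON.stringify(item);
			if (json === undefined) {
				throw new TypeError('item is not JSON-serializable');
			}
			// Open the array before the first item, separate later items with commas.
			this.sink((this.count === 0 ? '[' : ',') + json);
			this.count += 1;
			callback();
		} catch (err) {
			callback(err);
		}
	}

	_final(callback) {
		// Close the array, or emit an empty one if nothing was written.
		this.sink(this.count === 0 ? '[]' : ']');
		callback();
	}
}

// Collects the fragments in memory and resolves with the complete JSON text.
function toJsonArray(items) {
	return new Promise((resolve, reject) => {
		const parts = [];
		const stream = new JsonArraySketch(fragment => parts.push(fragment));
		stream.on('error', reject);
		stream.on('finish', () => resolve(parts.join('')));
		items.forEach(item => stream.write(item));
		stream.end();
	});
}
```

Because the opening bracket, separators, and closing bracket are emitted incrementally, the full array never has to be held as a single object before serialization, which is the property that makes this approach suitable for large data sets.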

Features

  • Extends the Node.js Writable stream class
  • Will automatically convert JS objects into valid JSON
  • Validation on each JSON chunk
  • Automatically creates output folder and uses a basic increment file naming strategy
  • Wait for the stream to end using a bluebird Promise
  • High-level metadata and statistics, such as the total count of items written and the list of filenames written to
  • Option to append to or overwrite existing files
  • Option to avoid existing files and start from the nearest filename increment (if file_0001.json and file_0002.json exist, write to file_0003.json)
  • Set a maximum number of objects per file, and ArrayStream will auto-increment the filename when the limit is reached
  • Option to set the stream to "lazy" mode so that no file handles are opened until needed
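The increment naming strategy described above can be sketched as follows. This is an illustration under assumptions, not the library's code: buildFileName and nextFreeIndex are hypothetical helpers, the "$$" placeholder is taken from the keyPattern shown in the usage example, and numbering is assumed to start at 1.

```javascript
// Hypothetical helpers: a sketch of the naming scheme, not ArrayStream's internals.

// Replace the "$$" placeholder with a zero-padded counter and add the extension.
function buildFileName(keyPattern, index, numDigits, fileType) {
	const counter = String(index).padStart(numDigits, '0');
	// A function replacement avoids any special handling of "$" in the replacement.
	return `${keyPattern.replace('$$', () => counter)}.${fileType}`;
}

// Given the names already present in the output folder, return the next unused index.
function nextFreeIndex(existingNames, keyPattern, fileType) {
	const [prefix, suffix = ''] = keyPattern.split('$$');
	const tail = `${suffix}.${fileType}`;
	let highest = 0; // assumption: numbering starts at 1
	for (const name of existingNames) {
		if (!name.startsWith(prefix) || !name.endsWith(tail)) continue;
		const digits = name.slice(prefix.length, name.length - tail.length);
		if (/^\d+$/.test(digits)) {
			highest = Math.max(highest, Number(digits));
		}
	}
	return highest + 1;
}
```

With these helpers, a folder that already holds file_0001.json and file_0002.json yields index 3, and buildFileName('file_$$', 3, 4, 'json') produces file_0003.json, matching the behaviour described in the feature list.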

Usage

const ArrayStream = require('@kevin-coelho/json-arraystream');
// assumption: the schema below is built with the joi package; the original snippet omits this import
const joi = require('joi');

// see lib/types.js for config definition
const config = {
	file: {
		folder: './output',
		keyPattern: 'testFile_$$',
		fileType: 'json',
		numDigits: 4, // will create files like output/testFile_0001.json
	},
	maxItems: 10, // maximum number of objects per file
	itemValidationSchema: joi
		.object()
		.keys({
			someNumber: joi.number().required(),
		})
		.required(),
	lazy: true, // do not open file handles until needed
};

const out = new ArrayStream(config);
out.on('error', err => console.error('an error occurred', err));

// write like a normal stream
out.write({
	someNumber: 5,
});

// stream produces joi validation error when chunk fails to pass itemValidationSchema
out.write({
	foo: 'bar',
});

// out will create a new file testFile_0001.json, testFile_0002.json, ... for every 10 chunks written
const hugeArray = [/* ... */];
hugeArray.forEach(chunk => out.write(chunk));

// wait for the stream to finish (run this inside an async function)
await out.closeArrayStream();
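Writing a very large array with forEach ignores the return value of write(), so chunks may pile up in the stream's internal buffer. Since ArrayStream extends the Node.js Writable class, the standard backpressure pattern should apply. The sketch below uses only Node's built-in events module; writeAll is a hypothetical helper, not part of the package.

```javascript
const { once } = require('node:events');

// Hypothetical helper: writes items one by one and pauses whenever the
// stream reports that its internal buffer is full.
async function writeAll(stream, items) {
	for (const item of items) {
		if (!stream.write(item)) {
			// write() returned false: wait for 'drain' before continuing.
			// once() rejects if the stream emits 'error' while waiting.
			await once(stream, 'drain');
		}
	}
}
```

Under that assumption, the forEach line in the usage example could be replaced with await writeAll(out, hugeArray), followed by the same await out.closeArrayStream() call.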

Installation & Test

  • npm install @kevin-coelho/json-arraystream
  • npm run test

Roadmap

  • Support limiting files by size rather than item limit
  • Support streaming output to AWS S3 instead of local file
