readdir-cluster


Create a cluster of workers to iterate through the filesystem

Usage

Complete API Documentation.

import readdirCluster, { Stat } from 'readdir-cluster'

// note that Stat is not the same as fs.Stats as it has functions removed, as it needed to be serialisable
function iterator(path: string, filename: string, stat: Stat) {
    // skip directories and files that start with .
    if (filename[0] === '.') return false
    // do not recurse into directories
    if (stat.directory) return false
}

const paths = await readdirCluster({ directory: '.', iterator })
console.log(paths)
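
The comment above notes that Stat has its functions removed so it can be serialised between processes. As a rough illustration of what that implies (not the package's actual implementation), the sketch below flattens an fs.Stats-like object into plain data. Only the `directory` field appears in the usage above; `StatsLike`, `PlainStat`, `toPlainStat`, and the `file` and `size` fields are hypothetical names chosen for this example.

```typescript
// Hypothetical minimal shape of the parts of fs.Stats used here.
interface StatsLike {
    isDirectory(): boolean
    isFile(): boolean
    size: number
}

// Hypothetical plain-data equivalent: only values, no methods,
// so it survives JSON serialisation between processes.
interface PlainStat {
    directory: boolean
    file: boolean
    size: number
}

// Call the methods once and store their results as plain properties.
function toPlainStat(stats: StatsLike): PlainStat {
    return {
        directory: stats.isDirectory(),
        file: stats.isFile(),
        size: stats.size,
    }
}
```

Methods are dropped by JSON.stringify, which is why the results of the method calls are stored as plain boolean properties.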

Performance

Benchmarks:

  • Running readdir-cluster . returns 7388 files in 500ms

  • Running readdir with recursive: true returns 7388 files in 100ms

    import { readdir } from 'fs'
    readdir('.', { recursive: true }, (err, files) => {
        if (err) console.error(err)
        else if (files.length) process.stdout.write(files.join('\n') + '\n')
    })
  • Running fdir returns 6480 files in 100ms

    import { fdir } from 'fdir'
    const api = new fdir().withBasePath().crawl(process.argv[2])
    api.withPromise().then((files) => {
        if (files.length) process.stdout.write(files.join('\n') + '\n')
    })
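
To compare approaches on your own directory tree, a small timing helper can be enough. This is a minimal sketch, not how the figures above were produced; `measure` and `Measurement` are hypothetical names, and it only times synchronous work.

```typescript
// Hypothetical helper: result of a task plus how long it took.
interface Measurement<T> {
    result: T
    ms: number
}

// Run a synchronous task once and report the elapsed wall-clock time.
function measure<T>(task: () => T): Measurement<T> {
    const start = Date.now()
    const result = task()
    return { result, ms: Date.now() - start }
}
```

Wrapping a synchronous listing call in `measure` gives a file count and a duration that can be compared across approaches. A single run is noisy, so repeat the measurement several times before drawing conclusions.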

Recommendations:

  • if you target Node.js 18.17 and above, you should use fs.readdir with recursive: true
  • if you target older Node.js versions, you should use @bevry/fs-list
  • if you target older Node.js versions and you want a stat object, use readdir-cluster
  • if you target Node.js 12 and above, and want a lot of customisation, use fdir
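
If one codebase has to run across several Node.js versions, the first recommendation can be turned into a runtime check. The sketch below is an assumption-laden illustration: `supportsRecursiveReaddir` is a hypothetical helper, and the version thresholds (18.17.0 on the 18.x line, 20.1.0 onwards) are taken from the Node.js documentation history for the recursive option, so verify them against the versions you target.

```typescript
// Hypothetical helper: does this Node.js version support
// fs.readdir(path, { recursive: true })?
// Assumed thresholds: 18.17.0 on the 18.x line, 20.1.0 and later otherwise.
function supportsRecursiveReaddir(nodeVersion: string): boolean {
    const [major = 0, minor = 0] = nodeVersion
        .replace(/^v/, '')
        .split('.')
        .map(Number)
    if (major > 20) return true
    if (major === 20) return minor >= 1
    if (major === 18) return minor >= 17
    // Everything else, including 19.x and unparseable input, is treated as unsupported.
    return false
}
```

Calling it with process.versions.node lets the caller pick fs.readdir when it returns true and one of the other packages listed above otherwise.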

As for why this package exists: readdir-cluster was created in 2005, recursive was added to Node.js in 2023, and fdir was created in 2020. That said, several open issues propose changes that could improve readdir-cluster's performance.

Install

npm

Install Globally

  • Install: npm install --global readdir-cluster
  • Executable: readdir-cluster

Install Locally

  • Install: npm install --save readdir-cluster
  • Executable: npx readdir-cluster
  • Import: import pkg from 'readdir-cluster'
  • Require: const pkg = require('readdir-cluster').default

Editions

This package is published with the following editions:

  • readdir-cluster aliases readdir-cluster/index.cjs which uses the Editions Autoloader to automatically select the correct edition for the consumer's environment
  • readdir-cluster/source/index.ts is TypeScript source code with Import for modules
  • readdir-cluster/edition-es2022/index.js is TypeScript compiled against ES2022 for Node.js 14 || 16 || 18 || 20 || 21 with Require for modules
  • readdir-cluster/edition-es2017/index.js is TypeScript compiled against ES2017 for Node.js 8 || 10 || 12 || 14 || 16 || 18 || 20 || 21 with Require for modules
  • readdir-cluster/edition-es2015/index.js is TypeScript compiled against ES2015 for Node.js 6 || 8 || 10 || 12 || 14 || 16 || 18 || 20 || 21 with Require for modules
  • readdir-cluster/edition-es5/index.js is TypeScript compiled against ES5 for Node.js 4 || 6 || 8 || 10 || 12 || 14 || 16 || 18 || 20 || 21 with Require for modules
  • readdir-cluster/edition-es2017-esm/index.js is TypeScript compiled against ES2017 for Node.js 12 || 14 || 16 || 18 || 20 || 21 with Import for modules
  • readdir-cluster/edition-types/index.d.ts is TypeScript compiled Types with Import for modules

History

Discover the release history by heading on over to the HISTORY.md file.

Backers

Code

Discover how to contribute via the CONTRIBUTING.md file.

Authors

Maintainers

Contributors

Finances


Sponsors

  • Andrew Nesbitt — Software engineer and researcher
  • Balsa — We're Balsa, and we're building tools for builders.
  • Codecov — Empower developers with tools to improve code quality and testing.
  • Poonacha Medappa
  • Rob Morris
  • Sentry — Real-time crash reporting for your web apps, mobile apps, and games.
  • Syntax — Syntax Podcast

Donors

License

Unless stated otherwise all works are:

and licensed under:

readdir-cluster's People

Contributors

balupton, dependabot-preview[bot], dependabot[bot], github-actions[bot]


Forkers

odnodn

readdir-cluster's Issues

Your .dependabot/config.yml contained invalid details

Dependabot encountered the following error when parsing your .dependabot/config.yml:

The property '#/update_configs/0/automerged_updates/0/match/update_type' value "security" did not match one of the following values: all, security:patch, semver:patch, semver:minor, in_range

Please update the config file to conform with Dependabot's specification using our docs and online validator.

Figure out if batch work for the cluster workers is even faster

Right now we send each task to a worker individually, which may incur a communication cost. Instead, we could batch or queue the tasks: send a batch of 10 at a time, or however many tasks queued up while the worker was busy. This may reduce communication cost and increase performance, at the cost of additional complexity.
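
A minimal sketch of the batching idea, assuming tasks wait in a plain array until a worker is free. `drainBatch` is a hypothetical helper, not part of readdir-cluster, and whether batching is actually faster still needs measuring.

```typescript
// Hypothetical helper: remove up to maxBatchSize tasks from the front of
// the queue and return them, so they can be sent to a worker in one message.
function drainBatch<T>(queue: T[], maxBatchSize: number): T[] {
    if (!Number.isInteger(maxBatchSize) || maxBatchSize < 1) {
        throw new RangeError('maxBatchSize must be a positive integer')
    }
    // splice mutates the queue, leaving any remaining tasks for the next batch
    return queue.splice(0, maxBatchSize)
}
```

With this in place, the sender would issue one message per batch rather than one per task, and the worker would reply with one message carrying all the results for that batch.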

TypeError: cluster.setupMaster is not a function

TypeError: cluster.setupMaster is not a function
    at openWorkers (/Users/balupton/Projects/esnextguardian-to-editions/readdir-cluster/source/index.js:31:11)
    at module.exports (/Users/balupton/Projects/esnextguardian-to-editions/readdir-cluster/source/index.js:75:2)
    at EventEmitterGrouped.<anonymous> (/Users/balupton/Projects/esnextguardian-to-editions/readdir-cluster/source/test.js:25:3)
    at ambi (/Users/balupton/Projects/esnextguardian-to-editions/readdir-cluster/node_modules/ambi/out/lib/ambi.js:55:18)
    at Domain.fireMethod (/Users/balupton/Projects/esnextguardian-to-editions/readdir-cluster/node_modules/taskgroup/out/lib/taskgroup.js:385:23)
    at Domain.run (domain.js:221:14)
    at EventEmitterGrouped.Task.fire (/Users/balupton/Projects/esnextguardian-to-editions/readdir-cluster/node_modules/taskgroup/out/lib/taskgroup.js:423:27)
    at Immediate._onImmediate (/Users/balupton/Projects/esnextguardian-to-editions/readdir-cluster/node_modules/taskgroup/out/lib/taskgroup.js:440:26)
    at tryOnImmediate (timers.js:543:15)
    at processImmediate [as _immediateCallback] (timers.js:523:5)

See if just using spawn is faster than clusters

@jokeyrhyme: You may not necessarily need to use the cluster module, as you don't need port-level load-balancing for this (you don't even open ports). I wonder if just using the built-in child_process module instead would be faster?

This is a good point. PR and performance comparison welcome.

Fallback to non-cluster readdir if not in master process

Using node -e "require('readdir-cluster')(process.cwd(), console.log, console.log)" will fail, because under node eval readdir-cluster does not run in the master process, so cluster.setupMaster does not exist. In such circumstances, which we cannot control, we should fall back gracefully, either to spawn (#3) or to a normal readdir.
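
One way the fallback could be decided is by feature detection on the cluster module before any workers are opened. This is a sketch under assumptions, not the package's code: `chooseStrategy`, `ClusterLike`, and `Strategy` are hypothetical names, and the check also accepts setupPrimary, the name newer Node.js versions use for setupMaster.

```typescript
type Strategy = 'cluster' | 'fallback'

// Hypothetical minimal view of the cluster module: only the setup
// functions matter for deciding whether workers can be opened.
interface ClusterLike {
    setupPrimary?: unknown
    setupMaster?: unknown
}

// Use the cluster only when a setup function exists; otherwise signal
// that the caller should fall back to spawn or a normal readdir.
function chooseStrategy(clusterModule: ClusterLike): Strategy {
    const canSetup =
        typeof clusterModule.setupPrimary === 'function' ||
        typeof clusterModule.setupMaster === 'function'
    return canSetup ? 'cluster' : 'fallback'
}
```

The caller would pass in the imported cluster module and branch on the returned strategy instead of letting the TypeError surface.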

generalise to run provided async function on all files?

I have a use case where I'd like the MD5 hashes of each file. I was about to request that, but then I realised other use cases might want different types of hashes, and other use cases might need something else entirely.

Is it possible to make this run an alternative function in the worker, provided by the consumer of this project (instead of hard coded in this project)?

Is this something you'd accept a PR for?
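
To make the request concrete, here is a rough sketch of the shape such a consumer-provided task could take. It is an in-memory stand-in only: `FileTask`, `runTask`, and `md5Task` are hypothetical names, file contents are passed in as strings rather than read by a worker, and a real implementation would also have to get the task into the worker process (for example by module path, since functions cannot be sent between processes).

```typescript
import { createHash } from 'node:crypto'

// Hypothetical task signature: receives a path and its contents,
// returns whatever the consumer wants to collect for that file.
type FileTask<T> = (path: string, contents: string) => T

// Apply the consumer's task to every file and collect results by path.
function runTask<T>(files: Map<string, string>, task: FileTask<T>): Map<string, T> {
    const results = new Map<string, T>()
    for (const [path, contents] of files) {
        results.set(path, task(path, contents))
    }
    return results
}

// Example task for the MD5 use case described above.
const md5Task: FileTask<string> = (_path, contents) =>
    createHash('md5').update(contents).digest('hex')
```

Swapping `md5Task` for another function covers the other hash types and unrelated use cases without changing `runTask`.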
