js-spark's Introduction

What is JS-Spark

Distributed real-time computation/job/work queue using JavaScript. A JavaScript reimagining of the fabulous Apache Spark and Storm projects.

If you know underscore.js or lodash.js, you may use JS-Spark as a distributed version of them.

If you know distributed RPC systems like Storm, you will feel at home.

If you've ever worked with distributed work queues such as Celery, you will find JS-Spark easy to use.

[Screenshot: main page computing queue]

Why

There are no JS tools that can offload your processing to 1000+ CPUs. Existing tools in other languages, such as SETI@home or Gearman, take time, require an expensive server setup, and then need client machines to be set up and supervised.

We want to do better. With JS-Spark your clients just need to open a URL, and the server side is a one-line installation (less than 5 minutes).

Hadoop is quite slow and requires maintaining a cluster; we can do better. Imagine that there's no need to set up expensive cluster/cloud solutions. Use web browsers! Easily scale to multiple clients. Clients do not need to install anything like Java or other plugins.

Set up in a matter of minutes and you are good to go.

The possibilities are endless:

No need to set up expensive clusters. The setup takes 5 minutes and you are good to go. You can do it on one machine. Even on a Raspberry Pi.

  • Use as an ML tool to process huge streams of data in real time... while all clients still browse their favorite websites

  • Use for big data analytics. Connect to Hadoop HDFS and process even terabytes of data.

  • Use to safely transfer huge amounts of data to remote computers.

  • Use as a CDN... Today most websites run slower as more clients use them. With JS-Spark you can totally reverse this trend: build websites that run FASTER the more people use them.

  • Synchronize data between multiple smartphones... even in Africa.

  • No expensive cluster setup required!

  • Free to use.

How (Getting started with npm)

To add a distributed job queue to any Node app, run:

    npm i --save js-spark

Then see Usage with npm below.

Example: running multicore jobs in JS:

Simple example with node multicore jobs

example-js-spark-usage

git clone git@github.com:syzer/example-js-spark-usage.git && cd $_
npm install

Game of life example

distributed-game-of-life

git clone https://github.com/syzer/distributed-game-of-life.git && cd $_
npm install

Example: NLP

This example shows how to use n-grams, one of the basic Natural Language Processing tools, in a distributed manner using JS-Spark:

Distributed-N-Gram

If you'd like to know more about N-grams please read:

http://en.wikipedia.org/wiki/N-gram

How (Getting started)

Prerequisites: install Node.js, then install grunt and bower:

sudo npm install -g bower
sudo npm install -g grunt

Install js-spark

npm i --save js-spark
# or use:
git clone git@github.com:syzer/JS-Spark.git && cd $_
npm install

Then run:

    node index & 
    node client

Or:

    npm start        

After that you may see how the clients do the heavy lifting.

Usage with npm

// the package installed above is js-spark
var core = require('js-spark')({workers: 8});
var jsSpark = core.jsSpark;

jsSpark([20, 30, 40, 50])
    // this is executed on the client
    .map(function addOne(num) {
        return num + 1;
    })
    .reduce(function sumUp(sum, num) {
        return sum + num;
    })
    .thru(function addString(num){
        return "It was a number but I will convert it to " + num; 
    })
    .run()
    .then(function(data) {
        // this is executed back on the server
        console.log(data);
    });

Usage (Examples)

Client side heavy CPU computation (MapReduce)

var task = jsSpark([20, 30, 40, 50])
    // this is executed on the client side
    .map(function addOne(num) {
        return num + 1;
    })
    .reduce(function sumUp(sum, num) {
        return sum + num;
    })
    .run();
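
The handle returned by run() is promise-like (the Combined usage section below chains .then() onto it), so a minimal sketch of consuming the result directly could look like this; the callback name and log label are illustrative:

task.then(function whenClientsFinished(sum) {
    // each element was incremented on a client, then summed up there
    console.log('Distributed sum:', sum);
});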

Distributed version of lodash/underscore

var _ = require('lodash');

jsSpark(_.range(10))
     // https://lodash.com/docs#sortBy
    .add('sortBy', function _sortBy(el) {
        return Math.sin(el);
    })
    .map(function multiplyBy2(el) {
        return el * 2;
    })
    .filter(function remove5and10(el) {
        return el % 5 !== 0;
    })
    // sum of  [ 2, 4, 6, 8, 12, 14, 16, 18 ] => 80
    .reduce(function sumUp(arr, el) {
        return arr + el;
    })
    .run();

Multiple retries and client elections

If you run calculations via unknown clients, it is better to recalculate the same tasks on different clients:

jsSpark(_.range(10))
    .reduce(function sumUp(sum, num) {
        return sum + num;
    })
    // how many times to repeat calculations
    .run({times: 6})
    .then(function whenClientsFinished(data) {
        // may also get 2 most relevant answers
        console.log('Most clients believe that:');
        console.log('Total sum of numbers from 1 to 10 is:', data);
    })
    .catch(function whenClientsArgue(reason) {
        console.log('Most clients could not agree: ' + reason.toString());
    });

Combined usage with server side processing

var task3 = task
    .then(function serverSideComputingOfData(data) {
        var basesNumber = data + 21;
        // All your 101 base are belong to us
        console.log('All your ' + basesNumber + ' base are belong to us');
        return basesNumber;
    })
    .catch(function (reason) {
        console.log('Task could not compute ' + reason.toString());
    });

More references

This project reimplements some nice ideas from the world of big data, so the documentation of the original Apache Spark and Storm projects is a good place to dive deeper into the topic.

Running with UI

Normally you do not need to start the UI server, but if you want to build an application on top of the js-spark UI server, feel free to do so.

    git clone git@github.com:syzer/JS-Spark.git && cd $_
    npm install
    grunt build
    grunt serve

To spawn more lightweight (headless) clients:

    node client
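
If you want several headless clients on one machine, here is a hedged launcher sketch (the file name spawn-clients.js and the client count are illustrative; it simply runs `node client` several times):

// spawn-clients.js -- illustrative helper, not part of js-spark itself
var spawn = require('child_process').spawn;

var CLIENT_COUNT = 4; // however many cores you want to donate

for (var i = 0; i < CLIENT_COUNT; i++) {
    // each child process runs the same headless client entry point as `node client`
    var child = spawn('node', ['client'], { stdio: 'inherit' });
    console.log('Started headless client #%d (pid %d)', i + 1, child.pid);
}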

Required to run UI

  • mongoDB, with default connection parameters: mongodb://localhost/jssparkui-dev, user: 'js-spark', pass: 'js-spark1'

  • Install mongo, make sure mongod (the mongo service) is running, then create the user from the mongo shell:

mongo
use jssparkui-dev
db.createUser({
  user: "js-spark",
  pwd: "js-spark1",
  roles: [
    { role: "readWrite", db: "jssparkui-dev" }
  ]
})

  • Old mongodb engines can use db.addUser() with the same API.

  • To run without the UI, the db code is not required!

  • On the first run you need to seed the db: change the option seedDB: false => seedDB: true in ./private/srv/server/config/environment/development.js (illustrative sketch below).
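
A minimal sketch of what the relevant part of development.js might look like; only the seedDB flag comes from this README, the surrounding module.exports structure is an assumption:

// ./private/srv/server/config/environment/development.js (illustrative sketch)
'use strict';

module.exports = {
    // ...other development options...

    // seed the database with sample data on the first run,
    // then switch this back to false once the data is in place
    seedDB: true
};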

Tests

npm test

TODO

  • service/file -> removed for other module
  • di -> separate module
  • [!] bower for js-spark client
  • config-> merge different config files
  • [!] server/auth -> split to js-spark-ui module
  • [!] server/api/jobs -> split to js-spark-ui module
  • split ui
  • more examples
  • example with cli usage (not daemon)
  • example with using thru
  • [?] .add() might be broken... maybe fix or remove

js-spark's People

Contributors

amacfie, e-jigsaw, jordymoos, kichooo, pfiver, syzer


js-spark's Issues

fork issue

    Unhandled rejection TypeError: Incorrect value of args option
        at exports.fork (child_process.js:529:11)
        at tryCatcher (/Mypath/distributedNgram/node_modules/bluebird/js/main/util.js:24:31)
        at ret (eval at <anonymous> (/Mypath/distributedNgram/node_modules/bluebird/js/main/promisify.js:154:12), <anonymous>:13:23)
        at /Mypath/distributedNgram/node_modules/js-spark/private/src/server/service/fork.js:18:24
        at Function.times (/Mypath/distributedNgram/node_modules/js-spark/node_modules/lodash/dist/lodash.js:6350:25)
        at Object.forkWorker (/Mypath/distributedNgram/node_modules/js-spark/private/src/server/service/fork.js:16:11)
        at module.exports (/Mypath/distributedNgram/node_modules/js-spark/index.js:31:32)
        at Object.<anonymous> (/Mypath/distributedNgram/index.js:1:96)
        at Module._compile (module.js:456:26)
        at Object.Module._extensions..js (module.js:474:10)
        at Module.load (module.js:356:32)
        at Function.Module._load (module.js:312:12)
        at Function.Module.runMain (module.js:497:10)
        at startup (node.js:119:16)
        at node.js:929:3

EVALing

Maybe we can just use it? Just catch all the exceptions, and we are fine. Clients need to trust the host anyway. The only remaining thing is to make it HTTPS-ready, so that nobody can MitM multiple browsers and inject anything into the logic code.
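
A minimal sketch of that idea, assuming the client receives a serialized function body as a string; the function name runTask and the result shape are illustrative, not js-spark's actual internals:

// evaluate a received task body and report errors
// instead of letting a bad task crash the client
function runTask(taskSource, input) {
    try {
        // the client has to trust the host anyway, as noted above
        var fn = eval('(' + taskSource + ')');
        return { ok: true, result: fn(input) };
    } catch (err) {
        // catch everything so one failing task cannot take the whole client down
        return { ok: false, error: err.toString() };
    }
}

// e.g. runTask('function (n) { return n + 1; }', 41) => { ok: true, result: 42 }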

Functionality demos

Check out crowdprocess.com, they have a nice raytracing demo: http://distracer.io/
It would be cool if js-spark had the same... if I have a spare moment I'll see if I can help with that.

Stream files to clients

Maybe when constructing JS-Spark with a string (not an array), this string could be a file path name.
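
Until such a string constructor exists, here is a hedged workaround sketch using the current array-based API (jsSpark as required in Usage with npm; the file path and chunk size are illustrative): read the file on the server, split it into chunks, and let the clients process one chunk each.

var fs = require('fs');

// split the file into ~64 KB chunks and distribute them with the existing API
var chunks = fs.readFileSync('./data/huge-input.txt', 'utf8')
    .match(/[\s\S]{1,65536}/g) || [];

jsSpark(chunks)
    .map(function countWords(chunk) {
        return chunk.split(/\s+/).filter(Boolean).length;
    })
    .reduce(function sumUp(sum, n) {
        return sum + n;
    })
    .run()
    .then(function (total) {
        console.log('Approximate word count:', total);
    });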

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because we are using your CI build statuses to figure out when to notify you about breaking changes.

Since we did not receive a CI status on the greenkeeper/initial branch, we assume that you still need to configure it.

If you have already set up a CI for this repository, you might need to check your configuration. Make sure it will run on all new branches. If you don’t want it to run on every branch, you can whitelist branches starting with greenkeeper/.

We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.
