Giter VIP home page Giter VIP logo

mapreduce's Introduction

MapReduce

A local multi-process MapReduce implementation written in C++

The mapper and reducer functions are implemented in mapreduce/task.h, they are wrapped by the wrapper functions implemented in mapreduce/task_wrapper.h, so that the users can focus only on the implementation of mapper and reducer, but not the details of writting files and sending messages. The mapreduce/task.h file already contains an example implementation of mapper and reducer for word count application.

To run the MapReduce system for word count, first compile the master and worker:

# make

Run the master:

# ./master -d <file> -m <map task> -r <reduce task> -p <port>

with the below arguments provided:

  -d <file>         the file input for the task (required)
  -m <map task>     the number of map tasks (default 10)
  -r <reduct task>  the number of reduce tasks (default 10)
  -p <port>         the port worker listens on (default 5057)

For example, to run the master for word count with the provided example test data, run:

# ./master -d data.txt

The master will then listen on port 5057 and wait for the work request from workers.

After running up the master, we can run multiple workers to complete the map tasks and reduce tasks concurrently. A shell script run-workers.sh is provided to easily execute multiple worker processes:

# ./run-workers.sh <workers> <port>

The first argument should be the number of workers (default 8), and the second argument should be the port for connecting to the master (default 5057). If the master does run on the default port, this argument should be specified to the same port so that the workers can reach the master.

For example, to run the workers for word count, run:

# ./run-workers.sh

The workers will then send requests for tasks to master, and complete the tasks concurrently. After all tasks are completed, the master response the workers with termination message so that the workers can terminate, and the master itself also terminates after all workers are terminated.

The result will be written in result.txt:

GENERICIZED 1
OR 1
INFRASTRUCTURE 1
ALSO 1
COMMUNICATION 2
BENEFICIAL 1
THIS 1
ENGINE 1
...

mapreduce's People

Contributors

lklxx avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.