mapreduceframework's Introduction

MapReduce Infrastructure

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication.
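
As a rough illustration of the model described above, here is a minimal single-process word-count sketch. It is not this project's API: the names `KeyValue`, `word_count_map`, `word_count_reduce`, and `run_word_count` are hypothetical, and the driver only stands in for the run-time system (no partitioning, scheduling, fault handling, or gRPC communication).

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for this sketch; not part of the project's library.
using KeyValue = std::pair<std::string, std::string>;

// Map: take one input pair (line id, line text) and emit an
// intermediate pair (word, "1") for every whitespace-separated word.
std::vector<KeyValue> word_count_map(const std::string& /*key*/,
                                     const std::string& value) {
    std::vector<KeyValue> out;
    std::istringstream words(value);
    std::string word;
    while (words >> word) {
        out.emplace_back(word, "1");
    }
    return out;
}

// Reduce: merge all intermediate values that share one intermediate key
// by summing them.
std::string word_count_reduce(const std::string& /*key*/,
                              const std::vector<std::string>& values) {
    long total = 0;
    for (const auto& v : values) {
        total += std::stol(v);
    }
    return std::to_string(total);
}

// Stand-in for the run-time: apply map to every line, group the
// intermediate values by key, then apply reduce once per key.
std::map<std::string, std::string> run_word_count(
        const std::vector<std::string>& lines) {
    std::map<std::string, std::vector<std::string>> grouped;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        for (const auto& kv : word_count_map(std::to_string(i), lines[i])) {
            grouped[kv.first].push_back(kv.second);
        }
    }
    std::map<std::string, std::string> result;
    for (const auto& entry : grouped) {
        result[entry.first] = word_count_reduce(entry.first, entry.second);
    }
    return result;
}
```

For example, calling `run_word_count({"a b a", "b c"})` would yield the counts a: 2, b: 2, c: 1. In a real deployment the grouping step is where intermediate pairs would be partitioned and shipped between machines.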

Install Dependency

How to set up the project

Source Code Structure

Code walk through

How to Run MapReduce Job

This project runs on both Linux and Mac OS X.

  1. First, make sure gRPC and its dependency, Protocol Buffers v3.0, are already installed; see the Install Dependency section for details.

  2. Compile code and generate libraries

    • Go to the src directory and run make. Two libraries are created in the external directory: libmapreduce.a and libmr_worker.a.

          cd src && make
    • Now go to the test directory and run make. Two binaries are created: mrdemo and mr_worker.

          cd test && make
  3. Run the demo once all the binaries and libraries have been built.

    • Clear any existing files in the output directory:

          rm test/output/*
    • Start all the worker processes as follows:

          ./mr_worker localhost:50051 & ./mr_worker localhost:50052 & ./mr_worker localhost:50053 & ./mr_worker localhost:50054 & ./mr_worker localhost:50055 & ./mr_worker localhost:50056;
    • Then start the main MapReduce process:

          ./mrdemo
    • Once ./mrdemo finishes, kill all the worker processes you started.

      On both Mac OS X and Linux:

            killall mr_worker
    • Check the output directory to see whether you have the correct results (once your library is properly implemented, of course):

      .
      ├── output0.txt
      ├── output1.txt
      ├── output2.txt
      ├── output3.txt
      ├── output4.txt
      ├── output5.txt
      ├── output6.txt
      ├── output7.txt
      ├── temp0.txt
      ├── temp1.txt
      ├── temp2.txt
      ├── temp3.txt
      ├── temp4.txt
      ├── temp5.txt
      ├── temp6.txt
      └── temp7.txt
      
      0 directories, 16 files
      

Reference

Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 2004

Contributors

gangliao