
cs-441-dofcc-hw1

Scala Hadoop MapReduce Program

Alessandro Martinolli
Email: [email protected]
YouTube video: https://youtu.be/xvdkpopnPFg

Overview

This is a Scala-based Hadoop MapReduce program for processing graph data. It pairs the nodes of two graphs, scores each pair for similarity, and then computes statistics from the results. The code lives in the com.lsc package and uses MapReduce to distribute the comparison work.

Features

  • Node Combination Generator:

    • Generates all possible node combinations between two graphs.
  • Hadoop MapReduce Job:

    • Processes the node combinations, applies the analysis logic, and produces the output.
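
As a rough illustration of the combination step, the sketch below pairs every node of one graph with every node of the other (a Cartesian product). The `GraphNode` type and the `NodeCombinations` object are hypothetical names chosen for this example; the repository's own node representation is likely richer.

```scala
// Hypothetical node type used only for this sketch.
final case class GraphNode(id: Int)

object NodeCombinations {
  // Pair every node of the first graph with every node of the second,
  // so that each pair can later be compared independently.
  def allPairs(first: Seq[GraphNode], second: Seq[GraphNode]): Seq[(GraphNode, GraphNode)] =
    for {
      a <- first
      b <- second
    } yield (a, b)
}
```

Note that the number of pairs is the product of the two node counts, which is one reason to split the work into shards and process them in parallel.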

Modules

1. Main

  • Acts as the main driver.
  • Sets up and executes the Hadoop MapReduce job.

2. MapReduce

  • Contains the Mapper and Reducer definitions.
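
To make the data flow concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Scala. It does not use the Hadoop API and is not the project's actual Mapper or Reducer. The input line format (`idA,idB,score`) and the choice to keep the highest score per node are assumptions made for illustration.

```scala
import scala.util.Try

object MapReduceSketch {
  // Map phase: turn one line "idA,idB,score" (assumed format) into a
  // (key, value) pair keyed by the first node's id. Malformed lines are dropped.
  def mapLine(line: String): Option[(String, Double)] =
    line.split(",").map(_.trim) match {
      case Array(a, _, score) => Try(score.toDouble).toOption.map(s => (a, s))
      case _                  => None
    }

  // Reduce phase: keep the best score seen for each key (placeholder logic).
  def reduce(key: String, values: Iterable[Double]): (String, Double) =
    (key, values.max)

  def run(lines: Seq[String]): Map[String, Double] =
    lines
      .flatMap(mapLine)                                   // map
      .groupBy(_._1)                                      // shuffle: group by key
      .map { case (k, kvs) => reduce(k, kvs.map(_._2)) }  // reduce
}
```

In a real Hadoop job the shuffle is performed by the framework, and the map and reduce functions run on separate tasks across the cluster.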

3. FileManager

  • Manages file operations for creating and handling shards.
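
Shard creation might look roughly like the sketch below, which splits a sequence of lines into fixed-size chunks and writes each chunk to its own file. The `ShardWriter` object, the `shard_N.txt` naming scheme, and the `shardSize` parameter are all hypothetical; the repository's FileManager may organize its shards differently.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object ShardWriter {
  // Split `lines` into chunks of at most `shardSize` lines and write each
  // chunk to its own file under `dir`. Returns the paths that were written.
  def writeShards(lines: Seq[String], shardSize: Int, dir: Path): Seq[Path] = {
    require(shardSize > 0, "shardSize must be positive")
    Files.createDirectories(dir)
    lines.grouped(shardSize).zipWithIndex.map { case (chunk, i) =>
      val target = dir.resolve(s"shard_$i.txt") // hypothetical file name
      Files.write(target, chunk.mkString("\n").getBytes(StandardCharsets.UTF_8))
    }.toList // force evaluation so every file is written before returning
  }
}
```

Each shard file can then serve as one input split for the MapReduce job.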

4. Parser

  • Handles node parsing logic.
  • Calculates node similarities.
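
The README does not say how similarity is computed, so the following is only one plausible sketch: it treats a node as a map of numeric properties and scores two nodes by the fraction of properties whose values agree. The `Props` type, the `tolerance` parameter, and the scoring rule itself are assumptions, not the project's method.

```scala
object NodeSimilarity {
  // Hypothetical node representation: property name -> numeric value.
  type Props = Map[String, Double]

  // Fraction of properties whose values agree within `tolerance`, measured
  // over the union of property names. The score lies in [0, 1].
  def similarity(a: Props, b: Props, tolerance: Double = 1e-9): Double = {
    val keys = a.keySet ++ b.keySet
    if (keys.isEmpty) 1.0 // two nodes with no properties are treated as identical
    else {
      val matching = keys.count { k =>
        (a.get(k), b.get(k)) match {
          case (Some(x), Some(y)) => math.abs(x - y) <= tolerance
          case _                  => false // property missing on one side
        }
      }
      matching.toDouble / keys.size
    }
  }
}
```

A threshold on this score could then decide whether two nodes count as matched, modified, or unrelated.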

5. Comparison

  • Computes required statistics.
  • Analyzes the reducer output and YAML input to produce various metrics.

Usage

  1. Ensure your Hadoop environment is up and running.
  2. Clone the repository.
  3. Compile and run the program using SBT:
    sbt clean compile
    sbt "run <input dir> <output dir>"
    
  4. Run the test suite using SBT:
    sbt clean test
    
  5. Run the main method in the Comparison class.
    Note: The JAR for this program is already included in the repository.

Additional Information

  • The files in the mapper_input folder must keep the same names they have in the repository.
  • The code includes commented-out sections that preserve earlier workflows and logic. Uncomment them if you need that behavior.
  • Logging uses SLF4J and reports the status of graph loading, shard creation, and MapReduce job execution.
