Alessandro Martinolli
mail:[email protected]
youtube video:https://youtu.be/xvdkpopnPFg
This is a Scala-based Hadoop MapReduce program for processing graph data. It analyzes the nodes of two graphs and produces results based on a specific comparison logic. The code lives in the com.lsc package and uses MapReduce for distributed processing.
- Node Combination Generator:
  - Generates all possible node combinations between two graphs.
- Hadoop MapReduce Job:
  - Processes node combinations, applies the analysis logic, and produces the outputs.
- Acts as the main driver.
- Sets up and executes the Hadoop MapReduce job.
- Contains the Mapper and Reducer definitions.
- Manages file operations for creating and handling shards.
- Handles node parsing logic.
- Calculates node similarities.
- Computes required statistics.
- Analyzes the reducer output and YAML input to produce various metrics.
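
The combination, sharding, and similarity steps listed above can be illustrated with a minimal sketch. This is not the repository's code: the `Node` case class, the `CombinationSketch` object, the fixed shard size, and the position-wise similarity measure are all assumptions made for illustration, since this README does not show the actual node fields or comparison logic.

```scala
// Hypothetical node model; the real project's node fields are not shown in this README.
final case class Node(id: Int, properties: Vector[Double])

object CombinationSketch {
  // Every (first-graph node, second-graph node) pair: the Cartesian product of the two lists.
  def combinations(a: Seq[Node], b: Seq[Node]): Seq[(Node, Node)] =
    for { x <- a; y <- b } yield (x, y)

  // Split the items into consecutive shards of at most shardSize elements,
  // e.g. so that each shard can be written to its own mapper input file.
  def shards[T](items: Seq[T], shardSize: Int): Seq[Seq[T]] = {
    require(shardSize > 0, "shardSize must be positive")
    items.grouped(shardSize).toSeq
  }

  // Assumed similarity: the fraction of property positions whose values agree
  // within a tolerance, relative to the longer of the two property vectors.
  def similarity(x: Node, y: Node, tolerance: Double = 1e-9): Double = {
    val n = math.max(x.properties.length, y.properties.length)
    if (n == 0) 1.0
    else {
      val matches = x.properties.zip(y.properties).count {
        case (p, q) => math.abs(p - q) <= tolerance
      }
      matches.toDouble / n
    }
  }
}
```

In a MapReduce setting, each shard would typically become one input split, the mapper would score each pair, and the reducer would aggregate the scores per node; how this project divides that work is not stated here.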
- Ensure your Hadoop environment is up and running.
- Clone the repository.
- Compile and run the program using SBT:
sbt clean compile
sbt "run <input dir> <output dir>"
- Compile and test the program using SBT:
sbt clean test
- Run the main method in the Comparison class.
Note: The JAR for this program is already included in the repository.
- The files in the mapper_input folder must keep the same names they have in the repository.
- The program includes commented sections that indicate prior workflows and logic. These sections can be uncommented based on requirements.
- Logging is handled through slf4j, which provides detailed logs on the status of graph loading, shard creation, and MapReduce job execution.