
Catla

Catla is a self-tuning system for Hadoop parameters that improves the performance of MapReduce jobs on Hadoop clusters. It is template-driven, which makes it flexible enough to handle complex job execution, monitoring and self-tuning of MapReduce performance.

Components

  1. Task Runner: submits a single MapReduce job to a Hadoop cluster and retrieves its analysis results and logs after the job completes.
  2. Project Runner: submits a group of MapReduce jobs organized in a project folder and monitors their status until completion; finally, all analysis results and logs, which include the running time of every MapReduce phase, are downloaded to a specified location in the project folder.
  3. Optimizer Runner: creates a series of MapReduce jobs with different combinations of parameter values, according to parameter configuration files, and reports the parameter values with the lowest running time once the tuning process finishes. Two tuning approaches are supported: exhaustive search and derivative-free optimization (DFO).
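As a rough illustration of how an exhaustive search over several parameters could enumerate candidate jobs, the Java sketch below expands a set of parameters, each with a list of candidate values, into every combination. It is not Catla's code: the class and method names are hypothetical, and the two Hadoop parameter names in `main` are only example inputs. Catla reads its candidate values from its own parameter configuration files, whose format is not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the Cartesian product of candidate parameter values.
public class ParameterGrid {

    /** Expands parameter -> candidate values into every combination (one map per job). */
    public static List<Map<String, String>> combinations(Map<String, List<String>> space) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start from one empty combination
        for (Map.Entry<String, List<String>> param : space.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : result) {
                for (String value : param.getValue()) {
                    // Extend each partial combination with one value of this parameter.
                    Map<String, String> extended = new LinkedHashMap<>(partial);
                    extended.put(param.getKey(), value);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Example inputs only; real candidate values would come from configuration files.
        Map<String, List<String>> space = new LinkedHashMap<>();
        space.put("mapreduce.job.reduces", List.of("1", "2", "4"));
        space.put("mapreduce.task.io.sort.mb", List.of("100", "200"));
        for (Map<String, String> combo : combinations(space)) {
            System.out.println(combo); // one candidate job configuration per line
        }
    }
}
```

The number of jobs grows as the product of the value-list sizes (six here), which is why exhaustive search becomes expensive as more parameters are tuned.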

Fig. 1 Architecture of Catla

Prerequisites

  1. Run Catla on a Windows computer on the same network as the Hadoop cluster, so that Catla can reach the master host over the network.
  2. A standard Java environment must be properly installed on that computer.
  3. Hadoop must have YARN log aggregation enabled, i.e. 'yarn.log-aggregation-enable' must be set to true.
  4. Critical information about the master host, such as username, password, SSH port, etc., must be known, because Catla needs it to run MapReduce jobs.
  5. Before running any of the examples, update the master host's information in the env_* files in the examples folder.
  6. On the master host (Ubuntu), create the folder /usr/hadoop_apps with the 'sudo mkdir' command and change its permissions so that every user can access it.
  7. This project is built on Hadoop 2.7.2, so it may also work with other Hadoop 2.x.x versions.
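For prerequisite 3, here is a hedged Java sketch that checks whether a yarn-site.xml file sets 'yarn.log-aggregation-enable' to true, using only the JDK's DOM parser. The class name is hypothetical and the check is not part of Catla. It reads the file given as the first argument, or a built-in sample otherwise, and treats a missing property as false, which matches Hadoop's default for this key.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical pre-flight check, not part of Catla.
public class LogAggregationCheck {

    static final String KEY = "yarn.log-aggregation-enable";

    /** Returns true if the given yarn-site.xml content sets yarn.log-aggregation-enable to true. */
    public static boolean isLogAggregationEnabled(InputStream yarnSite) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(yarnSite);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse yarn-site.xml", e);
        }
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            NodeList names = property.getElementsByTagName("name");
            NodeList values = property.getElementsByTagName("value");
            if (names.getLength() == 0 || values.getLength() == 0) {
                continue; // skip incomplete <property> entries
            }
            if (KEY.equals(names.item(0).getTextContent().trim())) {
                // In this sketch the first matching entry wins.
                return Boolean.parseBoolean(values.item(0).getTextContent().trim());
            }
        }
        return false; // property absent: Hadoop's default for this key is false
    }

    public static void main(String[] args) throws IOException {
        String sample = "<configuration><property><name>" + KEY + "</name>"
                + "<value>true</value></property></configuration>";
        try (InputStream in = args.length > 0
                ? Files.newInputStream(Paths.get(args[0]))
                : new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(KEY + " enabled: " + isLogAggregationEnabled(in));
        }
    }
}
```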

Simple steps

  1. Copy Catla.jar from '/catla-dist' in the GitHub repo to the 'examples' folder, so that Catla.jar sits in the same folder as the example folders.
  2. Change the master host's information in the file 'HadoopEnv.txt' to match your actual Hadoop cluster, such as the master's IP, username, password, port, Hadoop bin path, and the App root folder (the folder created in item 6 of the Prerequisites).
  3. Open a Windows Command Prompt and change the current directory to the '/examples' folder with the 'CD' command.
  4. Run the following Java command: 'java -jar Catla.jar -tool task -dir task_wordcount'.
  5. When the job finishes, a new folder 'downloaded_results' should appear inside the 'task_wordcount' folder, storing the analysis results of the WordCount MapReduce job.
  6. The steps above are only a simple demonstration; see the advanced example for more.
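If you would rather launch the WordCount task from another Java program than from the command prompt, a minimal sketch with `ProcessBuilder` follows. The class and method names are hypothetical. It only reproduces the 'java -jar Catla.jar -tool task -dir task_wordcount' command from step 4, assumes the examples folder is passed as the first argument (or is the current directory), and then checks for the 'downloaded_results' folder described in step 5.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical launcher for the simple WordCount example.
public class RunCatlaTask {

    /** Builds: java -jar Catla.jar -tool <tool> -dir <projectDir> */
    public static List<String> buildCommand(String tool, String projectDir) {
        return List.of("java", "-jar", "Catla.jar", "-tool", tool, "-dir", projectDir);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumption: the examples folder (containing Catla.jar) is args[0] or the current directory.
        File examples = new File(args.length > 0 ? args[0] : ".");
        ProcessBuilder pb = new ProcessBuilder(buildCommand("task", "task_wordcount"))
                .directory(examples)
                .inheritIO(); // show Catla's console output in this terminal
        int exitCode = pb.start().waitFor();
        File results = new File(new File(examples, "task_wordcount"), "downloaded_results");
        System.out.println("Exit code: " + exitCode
                + ", results folder present: " + results.isDirectory());
    }
}
```

Only the `task` tool value appears in this README, so the sketch does not guess the values for the other runners.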

Analysis using Catla

1) Exhaustive search

Fig. 2 Three-dimensional surface plot of running time of a MapReduce job over two Hadoop configuration parameters using the exhaustive search method
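The exhaustive search behind Fig. 2 amounts to evaluating the running time at every point of a grid over two parameters and keeping the fastest one. The Java sketch below uses a hypothetical stand-in function in place of real job submissions, and its parameter grids are illustrative only; in Catla each grid point would correspond to an actual MapReduce run.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical two-parameter grid search; not Catla's implementation.
public class ExhaustiveSearch {

    /** Best grid point found and its running time. */
    public static final class Best {
        public final int x;
        public final int y;
        public final double time;

        Best(int x, int y, double time) {
            this.x = x;
            this.y = y;
            this.time = time;
        }
    }

    /** Evaluates every (x, y) pair and returns the one with the lowest time (null if a grid is empty). */
    public static Best search(int[] xs, int[] ys, ToDoubleBiFunction<Integer, Integer> runningTime) {
        Best best = null;
        for (int x : xs) {
            for (int y : ys) {
                double t = runningTime.applyAsDouble(x, y); // one job run per grid point
                if (best == null || t < best.time) {
                    best = new Best(x, y, t);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] reduces = {1, 2, 4, 8};       // illustrative values for mapreduce.job.reduces
        int[] sortMb = {50, 100, 200, 400}; // illustrative values for mapreduce.task.io.sort.mb
        // Hypothetical stand-in for submitting a job and timing it.
        ToDoubleBiFunction<Integer, Integer> fakeTime =
                (r, mb) -> 100.0 + Math.pow(r - 4, 2) * 5 + Math.abs(mb - 200) * 0.1;
        Best best = search(reduces, sortMb, fakeTime);
        System.out.printf("best reduces=%d, sort.mb=%d, time=%.1f%n", best.x, best.y, best.time);
    }
}
```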

2) Derivative-free optimization-based search

Fig. 3 Change in running time of a MapReduce job over the number of iterations when tuning with a BOBYQA optimizer
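BOBYQA is a derivative-free optimizer, and Apache Commons Math3, which this project builds on, provides one as `BOBYQAOptimizer`. To keep the example dependency-free, the sketch below swaps in a much simpler derivative-free method, compass (coordinate pattern) search, which likewise needs only function values and so suits objectives such as job running time that have no usable gradient. It is not BOBYQA and not Catla's implementation; the class name and the cost function are hypothetical.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

// Compass search: a simple derivative-free method, used here in place of BOBYQA.
public class CompassSearch {

    /**
     * Tries +/- step along each coordinate, moves to the first improving point,
     * and halves the step when no neighbour improves. maxEvals is an approximate budget.
     */
    public static double[] minimize(ToDoubleFunction<double[]> f, double[] start,
                                    double step, double minStep, int maxEvals) {
        double[] x = start.clone();
        double fx = f.applyAsDouble(x);
        int evals = 1;
        while (step >= minStep && evals < maxEvals) {
            boolean improved = false;
            for (int i = 0; i < x.length && !improved; i++) {
                for (int sign = -1; sign <= 1; sign += 2) {
                    double[] trial = x.clone();
                    trial[i] += sign * step;
                    double ft = f.applyAsDouble(trial); // e.g. one job run per evaluation
                    evals++;
                    if (ft < fx) {
                        x = trial;
                        fx = ft;
                        improved = true;
                        break;
                    }
                }
            }
            if (!improved) {
                step /= 2; // no better neighbour: refine the search mesh
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Hypothetical smooth stand-in for running time over two tuned parameters.
        ToDoubleFunction<double[]> fakeTime =
                p -> 100 + Math.pow(p[0] - 3, 2) + Math.pow(p[1] - 150, 2) / 100;
        double[] best = minimize(fakeTime, new double[] {1, 100}, 16, 0.01, 500);
        System.out.println("best parameters: " + Arrays.toString(best));
    }
}
```

Each pass costs up to 2n evaluations for n parameters. BOBYQA typically needs fewer because it fits a quadratic model to past evaluations, which matters when every evaluation is a full MapReduce job.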

Contributors

This project is built upon Apache Hadoop, Apache Commons Math3 and Apache MINA SSHD, which are licensed under the Apache License, Version 2.0.

LICENSE

See the LICENSE file for license rights and limitations (GNU GPLv3).
