Hawk/Eagle-beta

What is it?

Hawk

Hawk is a hybrid data center scheduler presented at USENIX ATC 2015.

It takes the best of both worlds by combining centralized and distributed scheduling. Its main features are:

  1. Hybrid scheduling. Long jobs are scheduled in a centralized way (better scheduling decisions) and Short jobs in a distributed way (lower scheduling latency).

  2. Work stealing. To improve load balance, a free node contacts another node and 'steals' the latency-sensitive Short jobs waiting in its queue.

  3. Partitioning. It prevents Long jobs from taking all the resources in the cluster so that Short jobs do not experience head-of-line blocking.
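
To make the stealing step concrete, here is a minimal Java sketch of the idea; the class, method, and field names are hypothetical, and this is not the actual ch.epfl.eagle daemon code:

    // Minimal sketch of Hawk-style work stealing; all names are hypothetical.
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    class StealingSketch {
        private final Random random = new Random();
        private final List<String> nodes;   // addresses of all node monitors
        private final int attempts;         // cf. nodemonitor.stealing_attempts

        StealingSketch(List<String> nodes, int attempts) {
            this.nodes = nodes;
            this.attempts = attempts;
        }

        // When this node becomes free, contact random peers and steal the
        // Short-job probes waiting in their queues.
        void onIdle() {
            for (int i = 0; i < attempts; i++) {
                String peer = nodes.get(random.nextInt(nodes.size()));
                List<Runnable> stolen = requestQueuedShortProbes(peer);
                if (!stolen.isEmpty()) {
                    stolen.forEach(Runnable::run);  // execute the stolen tasks locally
                    return;                         // stop after the first successful steal
                }
            }
        }

        // Placeholder for the RPC that asks a peer to hand over its queued
        // Short probes; returns an empty list if the peer has none.
        private List<Runnable> requestQueuedShortProbes(String peer) {
            return Collections.emptyList();
        }
    }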

Eagle

Eagle is currently a work in progress; a beta version is available here. Eagle aims to avoid the head-of-line blocking that Short jobs experience in distributed schedulers by providing an approximate, fast view of where the Long jobs run.
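
Concretely (in the spirit of the eagle.piggybacking and eagle.retry_rounds parameters described below), node monitors piggyback their view of which nodes are occupied by Long jobs onto probe replies, and a distributed scheduler retries for a few rounds before falling back to the Short-job partition. A conceptual Java sketch, with hypothetical names rather than the real implementation:

    // Conceptual sketch of Eagle's approximate view of Long jobs;
    // hypothetical names, not the real ch.epfl.eagle implementation.
    import java.util.BitSet;

    class EagleViewSketch {
        private final BitSet busyWithLongJobs = new BitSet(); // piggybacked on probe replies
        private final int retryRounds;                        // cf. eagle.retry_rounds

        EagleViewSketch(int retryRounds) {
            this.retryRounds = retryRounds;
        }

        // Node monitors attach their (approximate) view of Long-job placement
        // to probe replies; the scheduler merges it into its own view.
        void mergePiggybackedView(BitSet piggybacked) {
            busyWithLongJobs.or(piggybacked);
        }

        // Try for up to retryRounds rounds to find a node not occupied by a
        // Long job; -1 means "fall back to the small partition".
        int placeShortProbe(int[] candidateNodes) {
            for (int round = 0; round < retryRounds; round++) {
                for (int node : candidateNodes) {
                    if (!busyWithLongJobs.get(node)) {
                        return node;
                    }
                }
                // In the real system the view would be refreshed between rounds.
            }
            return -1;
        }
    }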

Installation

To run Hawk and Eagle you need Java JDK 1.7 and Maven installed.

This command installs Hawk/Eagle locally.

$ mvn install -DskipTests 

Getting started

Create a configuration file with the following parameters.

Hawk

deployment.mode = configbased		# currently only this mode is supported
static.node_monitors =<hostname_1>:20502	# comma separated list of nodes where jobs will run
static.app.name = spark			# the application name; this can also be changed as a java opt
system.memory =10240000			# amount of memory available on each node
system.cpus=1				# currently only one slot per machine is supported
sample.ratio=2				# number of probes per task for distributed schedulers ('power of two choices')
cancellation=no				# if enabled, cancels the remaining probes once a job finishes; in practice makes no difference
scheduler.centralized=<centralized_scheduler_ip>	# centralized scheduler IP; if there is no centralized scheduler, set 0.0.0.0
big.partition=80				# the percentage of nodes where Long jobs can run
small.partition=100				# the percentage of nodes where Short jobs can run
nodemonitor.stealing=yes			# enable Hawk stealing
nodemonitor.stealing_attempts=10		# number of stealing attempts
eagle.piggybacking=no			# enable Eagle piggybacking
eagle.retry_rounds=0			# number of rounds distributed schedulers try before falling back to the small partition
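
For reference, a filled-in configuration could look like this (the hostnames and the scheduler IP are illustrative placeholders):

deployment.mode = configbased
static.node_monitors =node1.example.com:20502,node2.example.com:20502
static.app.name = spark
system.memory =10240000
system.cpus=1
sample.ratio=2
cancellation=no
scheduler.centralized=10.0.0.1
big.partition=80
small.partition=100
nodemonitor.stealing=yes
nodemonitor.stealing_attempts=10
eagle.piggybacking=no
eagle.retry_rounds=0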

Eagle-beta

To enable Eagle you need to change the following parameters:

nodemonitor.stealing=no			# disable Hawk stealing
nodemonitor.stealing_attempts=0		# no stealing attempts
eagle.piggybacking=yes			# enable Eagle piggybacking
eagle.retry_rounds=3			# number of rounds distributed schedulers try before falling back to the small partition

After creating the configuration file you can run the Hawk/Eagle daemon with the following command (replace JAVA_DIR with the path to your java binary, and EAGLE_JAR and CONF_FILE with the paths to the Eagle jar and the configuration file):

$ JAVA_DIR -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStamps -Xmx2046m -XX:+PrintGCDetails -cp EAGLE_JAR ch.epfl.eagle.daemon.EagleDaemon -c CONF_FILE
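
For example, assuming the Maven build produced target/eagle-1.0-SNAPSHOT.jar (the actual jar name may differ) and the configuration file is eagle.conf:

$ /usr/bin/java -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStamps -Xmx2046m -XX:+PrintGCDetails -cp target/eagle-1.0-SNAPSHOT.jar ch.epfl.eagle.daemon.EagleDaemon -c eagle.conf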

Now you need to run a front-end application; you can test it with a Spark program, for example.

Spark plugin

We also have a plugin for Spark; you can find it here.

You can compile it using the following command, provided you installed Eagle first.

$ build/sbt assembly

You can run an example with JavaSleep; for that you need to create a file with the jobs' sleep times. The input file should have the following format:

[Each line describes one job]
Col1: job arrival time
Col2: number of tasks in the job
Col3: estimated job runtime (we normally use the mean)
Col4 (and as many cols as needed): the real duration of each task in the job (used for the sleep)

Example (a job arriving at time 570, with 2 tasks, an estimated runtime of 2722, and real task durations of 2722 and 2722):
570    2 2722 2722 2722
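
If you need a longer trace, here is a small illustrative Java snippet that writes a file in this format (the file name and the values are made up):

    // Illustrative generator for a JavaSleep input file; file name and values are made up.
    import java.io.IOException;
    import java.io.PrintWriter;

    class TraceGen {
        public static void main(String[] args) throws IOException {
            try (PrintWriter out = new PrintWriter("sleep_jobs.txt")) {
                // arrival time, number of tasks, estimated runtime, per-task real durations
                out.println("570 2 2722 2722 2722");
                out.println("1200 3 500 400 500 600");
            }
        }
    }
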
  1. Start the driver.

    $ spark/bin/spark-run -Dspark.driver.host=<driver_hostname> \
    -Dspark.driver.port=60501 \
    -Dspark.scheduler=eagle \
    -Deagle.app.name=spark_<driver_hostname> \
    -Dspark.serializer=org.apache.spark.serializer.KryoSerializer \
    -Dspark.broadcast.port=33644 \
    org.apache.spark.examples.JavaSleep "eagle@$SCHEDULER:20503" 5 3 hostname $SMALL "<path_to_input_file>"

SMALL can take the values "small" or "big", depending on whether the job goes to the distributed or the centralized scheduler (centralized --> "big").

  2. Start the backends. This command should run on each of the nodes.

    $ spark/bin/spark-run \
    -Dspark.scheduler=eagle \
    -Dspark.master.port=7077 \
    -Dspark.hostname=<thismachine_hostname> \
    -Dspark.serializer=org.apache.spark.serializer.KryoSerializer \
    -Dspark.kryoserializer.buffer=128 \
    -Dspark.driver.host=<driver_hostname> \
    -Dspark.driver.port=60501 \
    -Deagle.app.name=spark_<driver_hostname> \
    -Dspark.httpBroadcast.uri=http://<driver_hostname>:33644 \
    -Dspark.rpc.message.maxSize=2047 \
    org.apache.spark.scheduler.eagle.EagleExecutorBackend --driver-url spark://EagleSchedulerBackend@<driver_hostname>:60501
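
If the Spark plugin is installed at the same path on every node, one way to start all the backends is an ssh loop from the driver machine. This is only a sketch; the hostnames (node1, node2, driver.example.com) are placeholders for your own machines:

    $ for host in node1 node2; do
        ssh "$host" "spark/bin/spark-run \
          -Dspark.scheduler=eagle \
          -Dspark.master.port=7077 \
          -Dspark.hostname=$host \
          -Dspark.serializer=org.apache.spark.serializer.KryoSerializer \
          -Dspark.kryoserializer.buffer=128 \
          -Dspark.driver.host=driver.example.com \
          -Dspark.driver.port=60501 \
          -Deagle.app.name=spark_driver.example.com \
          -Dspark.httpBroadcast.uri=http://driver.example.com:33644 \
          -Dspark.rpc.message.maxSize=2047 \
          org.apache.spark.scheduler.eagle.EagleExecutorBackend \
          --driver-url spark://EagleSchedulerBackend@driver.example.com:60501" &
      done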

Simulator

Hawk and Eagle are meant to improve job completion times in large clusters. To simulate clusters with tens of thousands of nodes we used a simulator, written in Python; please refer to its README for further information.
