

Hawk/Eagle-beta

What is it?

Hawk

Hawk is a hybrid data center scheduler presented at USENIX ATC 2015.

It takes the best of both worlds by combining centralized and distributed scheduling. Its main features are:

  1. Hybrid scheduling. Long jobs are scheduled in a centralized way (better scheduling decisions), while Short jobs are scheduled in a distributed way (lower scheduling latency).

  2. Work stealing. To improve load balance, a node that becomes free contacts another node and 'steals' the short, latency-sensitive jobs waiting in its queue (see the sketch after this list).

  3. Partitioning. Long jobs are prevented from taking all the resources in the cluster, so Short jobs do not experience head-of-line blocking.
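The following sketch illustrates the stealing idea conceptually. It is not Hawk's actual implementation; all class and method names are made up for illustration, and the number of probing attempts mirrors the nodemonitor.stealing_attempts parameter described below.

// Conceptual sketch of Hawk-style work stealing (illustration only).
import java.util.*;
import java.util.concurrent.ConcurrentLinkedDeque;

class Task {
    final String id;
    final boolean isShort;        // Short jobs are the latency-sensitive ones
    Task(String id, boolean isShort) { this.id = id; this.isShort = isShort; }
}

class Node {
    final Deque<Task> queue = new ConcurrentLinkedDeque<>();

    // Called by an idle node: hand over all Short tasks waiting in this queue.
    List<Task> yieldShortTasks() {
        List<Task> stolen = new ArrayList<>();
        for (Iterator<Task> it = queue.iterator(); it.hasNext(); ) {
            Task t = it.next();
            if (t.isShort) { it.remove(); stolen.add(t); }
        }
        return stolen;
    }
}

class IdleNode {
    // An idle node probes up to 'attempts' random peers and steals the
    // Short tasks queued (possibly behind Long ones) at the first peer
    // that has any.
    static List<Task> steal(List<Node> peers, int attempts, Random rng) {
        for (int i = 0; i < attempts; i++) {
            Node victim = peers.get(rng.nextInt(peers.size()));
            List<Task> stolen = victim.yieldShortTasks();
            if (!stolen.isEmpty()) return stolen;
        }
        return Collections.emptyList();
    }
}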

Eagle

Eagle is currently a work in progress; a beta version is available here. Eagle aims to avoid the head-of-line blocking that Short jobs experience in distributed schedulers by providing an approximate, fast view of the Long jobs.

Installation

To run Hawk and Eagle you need Java JDK 1.7 and Maven installed.

This command installs Hawk/Eagle locally.

$ mvn install -DskipTests 
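A successful build installs the Eagle artifacts into your local Maven repository and leaves the jar (the EAGLE_JAR referenced below) under target/. The exact file name depends on the project version; the name shown here is hypothetical:

$ ls target/*.jar
target/eagle-1.0-SNAPSHOT.jar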

Getting started

Create a configuration file with the following parameters.

Hawk

deployment.mode = configbased			# currently the only supported mode
static.node_monitors = <hostname_1>:20502	# comma-separated list of nodes where jobs will run
static.app.name = spark				# the application name; can also be set as a java opt
system.memory = 10240000			# memory available per node
system.cpus = 1					# currently only one slot per machine is supported
sample.ratio = 2				# number of probes per task for distributed schedulers ('power of two choices')
cancellation = no				# cancel the remaining probes once a job finishes; in practice makes no difference
scheduler.centralized = <centralized_scheduler_ip>	# centralized scheduler IP; set to 0.0.0.0 if there is no centralized scheduler
big.partition = 80				# percentage of nodes where Long jobs can run
small.partition = 100				# percentage of nodes where Short jobs can run
nodemonitor.stealing = yes			# enable Hawk stealing
nodemonitor.stealing_attempts = 10		# number of stealing attempts
eagle.piggybacking = no				# enable Eagle piggybacking
eagle.retry_rounds = 0				# number of rounds distributed schedulers try before falling back to the small partition
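For example, a complete Hawk configuration for a hypothetical three-node cluster (the hostnames and scheduler IP below are placeholders) could look like this:

deployment.mode = configbased
static.node_monitors = node1.example.com:20502,node2.example.com:20502,node3.example.com:20502
static.app.name = spark
system.memory = 10240000
system.cpus = 1
sample.ratio = 2
cancellation = no
scheduler.centralized = 10.0.0.1
big.partition = 80
small.partition = 100
nodemonitor.stealing = yes
nodemonitor.stealing_attempts = 10
eagle.piggybacking = no
eagle.retry_rounds = 0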

Eagle-beta

To enable Eagle you need to change the following parameters:

nodemonitor.stealing = no			# disable Hawk stealing
nodemonitor.stealing_attempts = 0		# no stealing attempts
eagle.piggybacking = yes			# enable Eagle piggybacking
eagle.retry_rounds = 3				# number of rounds distributed schedulers try before falling back to the small partition

After creating the configuration file you can run the Hawk/Eagle daemon with the following command (replace JAVA_DIR with the path to the java binary, and EAGLE_JAR and CONF_FILE with the corresponding paths):

$ JAVA_DIR -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStamps -Xmx2046m -XX:+PrintGCDetails -cp EAGLE_JAR ch.epfl.eagle.daemon.EagleDaemon -c CONF_FILE
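For example, with hypothetical paths:

$ /usr/bin/java -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCTimeStamps -Xmx2046m -XX:+PrintGCDetails -cp /opt/eagle/eagle.jar ch.epfl.eagle.daemon.EagleDaemon -c /opt/eagle/eagle.conf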

Now you need to run a front-end application; you can test it with a Spark program, for example.

Spark plugin

We also have a plugin for Spark; you can find it here.

You can compile it using the following command, provided you have installed Eagle first:

$ build/sbt assembly

You can run an example with JavaSleep; for that you need to create a file with the jobs' sleep times. The input file should have the following format:

Each line describes one job:
Col 1: job arrival time
Col 2: number of tasks in the job
Col 3: estimated job runtime (we normally use the mean)
Col 4 (and as many columns as needed): the real duration of each task in the job (used for the sleep)

Example:
570    2 2722 2722 2722 
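For instance, a file with two jobs could look like this (values are hypothetical): the first job arrives at time 570 with 2 tasks of estimated runtime 2722, the second arrives at time 1200 with 3 tasks of estimated runtime 1500:

570    2 2722 2722 2722
1200   3 1500 1400 1500 1600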
  1. Start the driver.

    $ spark/bin/spark-run -Dspark.driver.host=<driver_hostname>
    -Dspark.driver.port=60501
    -Dspark.scheduler=eagle
    -Deagle.app.name=spark_<driver_hostname>
    -Dspark.serializer=org.apache.spark.serializer.KryoSerializer
    -Dspark.broadcast.port=33644
    org.apache.spark.examples.JavaSleep "eagle@$SCHEDULER:20503" 5 3 hostname $SMALL "<path_to_input_file>"

SMALL can take the values "small" or "big", depending on whether the job goes to the centralized or the distributed scheduler (centralized --> big).

  2. Start the backends. This should run on each of the nodes.

    $ spark/bin/spark-run
    -Dspark.scheduler=eagle
    -Dspark.master.port=7077
    -Dspark.hostname=<thismachine_hostname>
    -Dspark.serializer=org.apache.spark.serializer.KryoSerializer
    -Dspark.kryoserializer.buffer=128
    -Dspark.driver.host=<driver_hostname>
    -Dspark.driver.port=60501
    -Deagle.app.name=spark_<driver_hostname>
    -Dspark.httpBroadcast.uri=http://<driver_hostname>:33644
    -Dspark.rpc.message.maxSize=2047
    org.apache.spark.scheduler.eagle.EagleExecutorBackend --driver-url spark://EagleSchedulerBackend@<driver_hostname>:60501

Simulator

Hawk and Eagle are meant to improve job completion times in large clusters; to experiment with tens of thousands of nodes we used a simulator. The simulator is written in Python; please refer to its README for further information.

Contact
