armadaproject / armada

A multi-cluster batch queuing system for high-throughput workloads on Kubernetes.

Home Page: https://armadaproject.io

License: Apache License 2.0


armada's Introduction


Armada

Armada is a system built on top of Kubernetes for running batch workloads. With Armada as middleware for batch, Kubernetes can be a common substrate for batch and service workloads. Armada is used in production and can run millions of jobs per day across tens of thousands of nodes.

Armada addresses the following limitations of Kubernetes:

  1. Scaling a single Kubernetes cluster beyond a certain size is challenging. Hence, Armada is designed to effectively schedule jobs across many Kubernetes clusters. Many thousands of nodes can be managed by Armada in this way.
  2. Achieving very high throughput using the in-cluster storage backend, etcd, is challenging. Hence, Armada performs queueing and scheduling out-of-cluster using a specialized storage layer. This allows Armada to maintain queues composed of millions of jobs.
  3. The default kube-scheduler is not suitable for batch. Instead, Armada includes a novel multi-Kubernetes cluster scheduler with support for important batch scheduling features, such as:
    • Fair queuing and scheduling across multiple users, based on dominant resource fairness.
    • Resource and job scheduling rate limits.
    • Gang-scheduling, i.e., atomically scheduling sets of related jobs.
    • Job preemption, both to run urgent jobs in a timely fashion and to balance resource allocation between users.
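To illustrate the fairness model above, here is a minimal sketch of how a dominant resource share could be computed under dominant resource fairness (DRF). The resource names and numbers are illustrative, not Armada's actual implementation:

```go
package main

import "fmt"

// dominantShare returns a user's dominant resource share: the maximum, over
// all resource types, of (user's allocation / cluster capacity). Under DRF,
// the scheduler prioritizes the user with the lowest dominant share.
func dominantShare(alloc, capacity map[string]float64) float64 {
	max := 0.0
	for res, a := range alloc {
		if c := capacity[res]; c > 0 && a/c > max {
			max = a / c
		}
	}
	return max
}

func main() {
	capacity := map[string]float64{"cpu": 1000, "gpu": 100}
	alice := map[string]float64{"cpu": 300, "gpu": 5} // dominant resource: cpu (0.3)
	bob := map[string]float64{"cpu": 100, "gpu": 40}  // dominant resource: gpu (0.4)
	// Alice has the lower dominant share, so a DRF scheduler serves her next.
	fmt.Println(dominantShare(alice, capacity), dominantShare(bob, capacity)) // 0.3 0.4
}
```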

Armada also provides features to help manage large compute clusters effectively, including:

  • Detailed analytics exposed via Prometheus showing how the system behaves and how resources are allocated.
  • Automatically removing nodes exhibiting high failure rates from consideration for scheduling.
  • A mechanism to earmark nodes for a particular set of jobs, while allowing other jobs to use them when they are not needed for their primary purpose.

Armada is designed with the enterprise in mind; all components are secure and highly available.

Armada is a CNCF Sandbox project and is used in production at G-Research.

For an overview of Armada, see the following videos:

The Armada project adheres to the CNCF Code of Conduct.

Documentation

For documentation, see the following:

We expect readers of the documentation to have a basic understanding of Docker and Kubernetes; see, e.g., the following links:

Contributions

Thank you for considering contributing to Armada! We want everyone to feel that they can contribute to the Armada project. Your contributions are valuable, whether you are fixing a bug, implementing a new feature, improving documentation, or suggesting enhancements. We appreciate your time and effort in helping make this project better for everyone. For more information about contributing to Armada, see CONTRIBUTING.md, and before contributing, see CODE_OF_CONDUCT.md.

Discussion

If you are interested in discussing Armada, you can find us on Slack.


armada's Issues

Improve scheduling using current usage information

The current algorithm uses priority to decide how much resource from a cluster lease request should be allocated to a particular queue. If the requests are small enough, the queue with the highest priority becomes oversubscribed.
We should leverage information about current queue usage to schedule more optimally and distribute resources more evenly.

Support having pod resource request != pod resource limit

Problem
Currently the executor reports pod usage to the server as simply the pod request values.

This restricts usage to:

  • Pod request = pod limit

If request != limit, the executor could over-allocate the cluster (potentially by a lot if request and limit differ significantly).

Potential solution
The executor should report pod usage as:
max(pod request, pod actual usage)

This means that when pod usage > pod request, we don't over-allocate the cluster.

This will be further complicated if we want to start doing preemption based on pods exceeding their request.

Considerations

  • How accurately we can measure a pod's current actual usage
  • How much impact we have on the Kubernetes API if we ask for real usage too frequently when there are many pods
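The reporting rule suggested above can be sketched as an element-wise maximum over resources. This is a minimal sketch assuming resource quantities simplified to int64 (e.g. millicores or bytes); the function name and representation are illustrative, not Armada's actual API:

```go
package main

import "fmt"

// reportedUsage returns, per resource, the maximum of the pod's request and
// its observed actual usage, so the scheduler never under-counts a pod that
// has burst above its request.
func reportedUsage(request, actual map[string]int64) map[string]int64 {
	out := make(map[string]int64)
	for res, req := range request {
		out[res] = req
	}
	for res, used := range actual {
		if used > out[res] {
			out[res] = used
		}
	}
	return out
}

func main() {
	request := map[string]int64{"cpu": 500, "memory": 1 << 30}
	actual := map[string]int64{"cpu": 750, "memory": 1 << 29}
	// cpu reports the actual usage (750 > 500); memory reports the request.
	fmt.Println(reportedUsage(request, actual))
}
```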

Write commandline formatter so we can use log in commandline tools rather than fmt

Problem

All of our code in commandline tools (armadactl/armada-load-tester) has to use fmt.Println to log messages to the user.

This is inconsistent with the rest of the code and less flexible (if, say, we also wanted to write a log of what the user had done to a file).

Suggested solution
We could implement a custom logrus formatter so that output looks like fmt.Println but actually runs through logrus.

Some details on how to do that:
https://stackoverflow.com/questions/43022607/how-do-i-disable-field-name-in-logrus-while-logging-to-file

Add metrics on executor background loop duration/frequency

Currently there is no way to see how long the background loops are taking or when they slow down.

If they slow down significantly, it could impact the component's overall ability to work.

For example, if leasing takes 10 minutes, every lease will expire.

Implement basic job submission

This functionality will be implemented in the Executor component.
Acceptance criteria: we can submit and run a job on a Kubernetes cluster.

Add to watch to allow watching multiple job sets

Problem
If you run a load test with multiple job sets, the load test will watch all of them.

However, there is no way to stop or restart this watching of multiple job sets, meaning you have to leave it running and/or can't re-watch an experiment as it happened.

Suggested solution

  • armadactl watch should be updated to support watching multiple job sets
  • armada-load-tester should be updated to support watching multiple job sets (potentially on multiple connections, as is the case in loadtest)

Handle jobs not getting cleaned up

Sometimes Kubernetes is slow to delete jobs:

  • This appears to happen when there is a lot of scheduling and deletion is given a lower priority (I don't know if there is a way to fix it)

  • Our application tells Kubernetes to delete a pod, but it doesn't get deleted

We only delete when Armada says the job has been "cleaned" when calling ReportDone.

If you have already called ReportDone on the job, Armada will not report it "cleaned" anymore. So we won't tell Kubernetes to delete it again, and the job just never goes away.

Let's label all pods before deletion so we know which ones were reported to the Armada server.

Naively report cluster usage based on resource request

Swap the current reporting from limit to request.

For now, all batch pods will use request = limit.

This makes scheduling better match the current state in Kubernetes (which schedules based on request rather than limit).

Investigate Helm not updating deployments correctly

Problem
Sometimes Helm doesn't update configmaps/deployments correctly, particularly when going from A -> B -> A.

There seem to have been many Helm issues tracking this over time; the current one appears to be:
helm/helm#5915

Fix ideas

  • Use the --force flag
  • Update the Helm version (apparently helps some people)
  • helm delete then helm install (a bit rubbish)

Alternatives

  • Use Helm template and kubectl apply + kubectl delete by label
  • Wait for Helm 3 to see if it fixes the issues
  • Kustomize

Properly publish all artifacts for multiple platforms.

@itamarst suggested

  1. Build user-facing CLIs for Mac/Linux/maybe even Windows in CI (Azure Pipelines is good if you decide on all 3), and put them somewhere downloadable, especially releases.
  2. Have the CLI do a version check (which users can disable), so it can say "oh, you're out of date, you should update".
  3. Distribute CLI via means that allow for automatic updates, depending on expected user base (e.g. if you expect Brew users, Homebrew for Mac, and then either Snap/Flatpak for Linux users or Deb/RPM channels).

Implement job storage

Job definitions need to be stored and placed into the queue.
A job should be removed from storage after it finishes (job history will be recorded in the Events Recording).

Make helm charts more flexible

  • ServiceMonitors custom labels
  • Executor node selector customisation (maybe flag for on master node / not on master node)

Allow submitting jobs to the api in bulk

Problem
When submitting 10,000 to 100,000 jobs at a time, it takes a long time to complete submission.

This is because the jobs are submitted one at a time.

Suggested solution

  • The API should be updated to support bulk submission

  • The client should be updated to submit in groups of, say, 100 at a time

This should significantly speed up submitting many jobs at once.
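The client-side batching could be sketched as follows; the batch size and the submit callback signature are illustrative assumptions, not Armada's real client API:

```go
package main

import "fmt"

// submitInBatches splits jobs into fixed-size chunks so the client issues one
// API call per batch instead of one per job.
func submitInBatches(jobs []string, batchSize int, submit func(batch []string) error) error {
	for start := 0; start < len(jobs); start += batchSize {
		end := start + batchSize
		if end > len(jobs) {
			end = len(jobs)
		}
		if err := submit(jobs[start:end]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	jobs := make([]string, 250)
	calls := 0
	submitInBatches(jobs, 100, func(batch []string) error {
		calls++
		return nil
	})
	fmt.Println(calls) // 3 batches: 100 + 100 + 50
}
```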

Add .Net client

Problem
There is no native support for .Net client applications.

This means there is no way for users to programmatically interact with Armada without writing their own gRPC client.

Suggested solution
Create a .Net client that implements all the gRPC endpoints.

Hopefully this can be generated from the .proto files, and possibly for several languages.

Make watch client code usage more efficient

Problem
When watching a very large job set (100,000+ jobs), it takes a very long time to catch up to the most recent events in the stream.

Cause
Currently, the code that uses watch generates a summary of the current state each time an event comes in.

This involves looping through all jobs seen so far and reporting what state each one is in.

For large job sets this is highly inefficient: the application uses a lot of CPU and is slow, because every incoming event loops over 100,000 objects and prints a line.

Suggested solution

We should keep a state struct that holds the current state. Each new event would simply update that state, rather than regenerating a state summary each time.
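A minimal sketch of such an incremental state struct; the type, field, and state names are illustrative, not the actual watch implementation:

```go
package main

import "fmt"

// jobSetState keeps running per-state counts, updated incrementally per event
// instead of re-scanning every job seen so far.
type jobSetState struct {
	jobState map[string]string // jobID -> current state
	counts   map[string]int    // state -> number of jobs in it
}

func newJobSetState() *jobSetState {
	return &jobSetState{jobState: map[string]string{}, counts: map[string]int{}}
}

// apply moves a job to a new state in O(1), adjusting the counts.
func (s *jobSetState) apply(jobID, newState string) {
	if old, ok := s.jobState[jobID]; ok {
		s.counts[old]--
	}
	s.jobState[jobID] = newState
	s.counts[newState]++
}

func main() {
	s := newJobSetState()
	s.apply("job-1", "Queued")
	s.apply("job-2", "Queued")
	s.apply("job-1", "Running")
	fmt.Println(s.counts["Queued"], s.counts["Running"]) // 1 1
}
```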

Report terminated event from executor for killed pods

Problem
When a pod is cancelled, the executor never sends an event to confirm that the pod was killed.

Instead, it is just assumed that because the API will refuse to renew the job's lease, the pod will be cancelled. However, there is no confirmation of this from a user's point of view.

Suggested solution
For pods cancelled due to lease renewal failure, report a terminated event.
