TaskCluster - API Design

This repository outlines the formal API design for TaskCluster.

Design Goals

  • AMQP is only for message exchange, not for keep-alive, state, or data. We should only use it for events that are relevant right now.
  • The state of tasks should be eventually consistent on S3. Data that doesn't change should be uploaded directly to S3.
  • The database should hold state while a task is executing. Once a task is resolved as completed or failed, it can be removed from the database.
  • Keep it simple.
  • Be dynamic initially, while allowing for less dynamic behavior as we evolve the system.

Terminology

  • Task, a unit of work executed by the task cluster.
  • Artifact, a result generated by a worker.
  • Queue, the place that holds the state of pending and running tasks and ensures they are eventually scheduled (or fail if they time out).
  • Resolution, a task is resolved exactly once; the two resolved states are 'completed' and 'failed'. If a task is canceled, it fails.
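
The resolution rule can be sketched as a small state check. This is a minimal illustration under stated assumptions, not the queue's implementation: the names `RESOLVED_STATES`, `is_resolved`, and `cancel` are hypothetical, while the state and reason strings are the ones listed in the task status structure in this document.

```python
# States a task can be in; the two resolved states are final.
STATES = ("pending", "running", "completed", "failed")
RESOLVED_STATES = frozenset({"completed", "failed"})


def is_resolved(state: str) -> bool:
    """Return True once a task has reached one of the two resolved states."""
    if state not in STATES:
        raise ValueError(f"unknown task state: {state!r}")
    return state in RESOLVED_STATES


def cancel(status: dict) -> dict:
    """Cancel a task: a canceled task is resolved as 'failed'.

    A task is resolved only once, so an already resolved task is
    returned unchanged.
    """
    if is_resolved(status["state"]):
        return status
    return {**status, "state": "failed", "reason": "canceled"}
```

Returning a new dict rather than mutating the input keeps the "resolved once" rule easy to check: a resolved status passes through `cancel` untouched.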

Common Identifiers

Whenever we talk about an id, it is, with the exception of run-id, a string of at most 36 alphanumeric characters (plus _ and -).

Name             Description
task-id          Identifies a unique task
run-id           Identifies a run of a task (this is an integer, max 999)
worker-group     Identifies a group of workers
worker-id        Identifies a specific worker within a group
provisioner-id   Identifies a provisioner
worker-type      Identifies a worker type for a given provisioner

Note that worker-id and worker-type are not globally unique; they are merely identifiers within a given worker-group and provisioner-id, respectively.
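
The identifier rules above could be checked with something like the following sketch. It assumes that "alphanumeric" means ASCII letters and digits and that an identifier must be non-empty; neither is stated in this document. The names `IDENTIFIER_RE`, `is_valid_identifier`, and `is_valid_run_id` are hypothetical.

```python
import re

# Assumption: ASCII letters and digits plus "_" and "-", 1 to 36 characters.
IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_-]{1,36}")

# run-id is the one exception: an integer, max 999.
MAX_RUN_ID = 999


def is_valid_identifier(value: str) -> bool:
    """Check a task-id, worker-group, worker-id, provisioner-id or worker-type."""
    return IDENTIFIER_RE.fullmatch(value) is not None


def is_valid_run_id(value: int) -> bool:
    """Check a run-id; the task status structure numbers runs starting from 1."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 1 <= value <= MAX_RUN_ID
```

A UUID in its usual hyphenated form is exactly 36 characters long, so it fits within the limit.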

The only identifiers assigned by the queue are the run-id and the task-id. The run-id is assigned by the queue because we want run-ids to form a numerically increasing sequence per task. The task-id is assigned by the queue to ensure that it is a random UUID, as a security measure. Non-random task-ids would allow new tasks to overwrite older tasks, which is a security issue.
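
As a rough sketch of this assignment, the queue could hand out a random UUID for each task and keep a per-task counter for runs. The class `IdAllocator` and its in-memory counter are hypothetical stand-ins; a real queue would keep this state in its database.

```python
import uuid
from collections import defaultdict


class IdAllocator:
    """Hypothetical in-memory stand-in for the queue's id assignment."""

    MAX_RUN_ID = 999  # run-id is an integer, max 999

    def __init__(self) -> None:
        # Maps task-id to the last run-id handed out for that task.
        self._last_run_id = defaultdict(int)

    def new_task_id(self) -> str:
        """Return a random (version 4) UUID so task-ids cannot be chosen by clients."""
        return str(uuid.uuid4())

    def next_run_id(self, task_id: str) -> int:
        """Return the next run-id for a task: 1, 2, 3, ... per task."""
        run_id = self._last_run_id[task_id] + 1
        if run_id > self.MAX_RUN_ID:
            raise OverflowError(f"task {task_id} has no run-ids left")
        self._last_run_id[task_id] = run_id
        return run_id
```

Usage: each call to `next_run_id` for the same task returns the previous value plus one, while a different task starts again from 1.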

The other identifiers are dynamically allocated when you call into the queue. For example, if you want to add a new machine type and provisioner, you just give them unique names and start submitting tasks for them.

By convention, non-UUID identifiers should either be prefixed with the IRC nickname of the person who invented them or be registered in the queue documentation to ensure that they are unique. For details, see the Future Security Design section below.

Worker identification: the alert reader will notice that a worker is identified by two ids, worker-group and worker-id. A worker-group could identify a master node that manages a cluster of specialized hardware. The worker-group could also identify a multi-core EC2 instance, under which each worker-id identifies a process. The worker-group identifier is often useful for routing, whereas the worker-id (in combination with the worker-group) identifies the process, specialized hardware node, or folder within which the task ran.

Task Status Structure

The task status structure contains all data stored by the queue about a task. The purpose of this structure is to track the state of a task until it is resolved.

{
  "task_id":            // Unique task identifier
  "provisioner_id":     // Provisioner identifier
  "worker_type":        // Type of worker to be provisioned by provisioner
  "runs": [
    {
      "run_id":         // run-id, an integer starting from 1
      "worker_group":   // worker group identifier
      "worker_id":      // worker identifier
    }
  ],
  "state":              // pending|running|completed|failed
  "reason":             // String such as none, retries-failed, timeout, canceled
  "routing":            // Task specific routing keys
  "retries":            // Number of retries left
  "priority":           // Double relative priority
  "created":            // Creation time (ISO 8601)
  "deadline":           // Deadline for resolution; by this time the task is either failed or completed
  "taken_until":        // Time at which the task reverts from running to pending
}
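
To make the structure concrete, the following sketch builds an example status and shows one possible reading of `taken_until`: once that time has passed, a running task goes back to pending. All values and the helper names `example_status` and `reclaim_if_expired` are made up for illustration; only the field names come from the structure above. The type of `routing` is assumed to be a string, and what happens to `retries` when a claim expires is not specified here, so the sketch leaves it untouched.

```python
from datetime import datetime, timedelta, timezone


def example_status(now: datetime) -> dict:
    """Build an illustrative task status; every value here is hypothetical."""
    return {
        "task_id": "123e4567-e89b-12d3-a456-426614174000",
        "provisioner_id": "example-provisioner",
        "worker_type": "example-worker-type",
        "runs": [
            {
                "run_id": 1,
                "worker_group": "example-group",
                "worker_id": "example-worker",
            }
        ],
        "state": "running",
        "reason": "none",
        "routing": "example.routing.key",  # assumption: a single string
        "retries": 3,
        "priority": 1.0,
        "created": now.isoformat(),
        "deadline": (now + timedelta(hours=1)).isoformat(),
        "taken_until": (now + timedelta(minutes=20)).isoformat(),
    }


def reclaim_if_expired(status: dict, now: datetime) -> dict:
    """Return the status as pending if a running task's claim has expired."""
    if status["state"] != "running":
        return status
    if now < datetime.fromisoformat(status["taken_until"]):
        return status
    return {**status, "state": "pending"}
```

For example, with `now = datetime(2014, 1, 1, tzinfo=timezone.utc)`, the example status is still running five minutes later and is returned as pending thirty minutes later.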

The actual task definition, results, and logs should be stored in S3. The queue will sign URLs for the worker so that it can upload these files without AWS credentials.

Future Security Design

As the system evolves, we may want to shift away from ensuring identifier uniqueness by convention. Specifically, we will probably want provisioners to register with the queue and provide a JSON schema of the task payloads they accept, as well as define the set of OAuth scopes required to post tasks for workers provisioned by the provisioner...

Essentially, we'll need to lock down the system so that there are different scopes for posting and consuming tasks with a given provisioner-id. Additionally, registering JSON schemas for each worker-type would allow us to reject invalid tasks much sooner.
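
One way to picture separate scopes for posting and consuming is the sketch below. Nothing here is decided by this document: the scope format `queue:<action>:<provisioner-id>` and the function names are purely hypothetical and only illustrate the idea of checking a different scope per action and provisioner-id.

```python
ACTIONS = ("post", "consume")


def required_scope(action: str, provisioner_id: str) -> str:
    """Build the (hypothetical) scope string needed for an action."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return f"queue:{action}:{provisioner_id}"


def is_allowed(granted_scopes, action: str, provisioner_id: str) -> bool:
    """Return True if the caller holds the scope for this action and provisioner."""
    return required_scope(action, provisioner_id) in set(granted_scopes)
```

With this shape, a client granted only `queue:post:example-provisioner` could submit tasks for that provisioner but could not consume them.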

This document leaves these security considerations as future work, as initially we'll just want something fairly dynamic.

Contributors

ccooper, jonasfj, lightsofapollo
